Prerequisites
- KalmForge SDK installed.
- API key configured in Window → KalmForge.
- At least one running experiment with exposures and conversions.
- No Play Mode required. The window talks to the backend over HTTP using the project API key.
What it is
Open it from KalmForge → A/B Test Analysis. It is a persistent editor window that tracks the statistical significance of your experiments and does not require Play Mode.
How to use
Click Fetch Stats to load all experiments. The window calls the project API directly using your KalmForge API key, so no intermediate dashboard hop is required.
Results table
| Name | Type | Description |
|---|---|---|
| Variant | string | Variant key (control first). |
| Exposures | int | Unique players exposed. |
| Conversions | int | Players that converted at least once. |
| Conv. Rate | % | Conversions / exposures, rendered with a mini bar. |
| vs Control | % | Relative lift vs the control variant. |
| Confidence | % | (1 - p_value) × 100 from a two-proportion z-test. |
| Verdict | Winner / Losing / Inconclusive | Computed from confidence and lift sign. |
A winner badge is rendered on the winning variant row.
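The derived columns can be reproduced from the raw counts. The following is a minimal Python sketch, not the SDK's code: the function names are hypothetical, relative lift is assumed to be (variant rate - control rate) / control rate, and the verdict rule assumes "significant" means confidence above 95% (the p < 0.05 threshold used by this window).

```python
from typing import Optional


def conversion_rate(conversions: int, exposures: int) -> float:
    """Conversions / exposures; 0.0 when there are no exposures yet."""
    return conversions / exposures if exposures > 0 else 0.0


def relative_lift(variant_rate: float, control_rate: float) -> Optional[float]:
    """Relative change vs control as a fraction (0.25 means +25%)."""
    if control_rate == 0:
        return None  # lift is undefined without a control baseline
    return (variant_rate - control_rate) / control_rate


def verdict(confidence_pct: float, lift: Optional[float]) -> str:
    """Assumed rule: significant only when confidence exceeds 95%."""
    if lift is None or confidence_pct <= 95.0:
        return "Inconclusive"
    return "Winner" if lift > 0 else "Losing"
```

For example, a control at 168 / 2100 and a treatment at 211 / 2110 give rates of about 8% and 10%, which is a relative lift of about +25%.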
Statistical method
Two-proportion z-test, two-tailed. Confidence = (1 - p_value) × 100. Significance threshold: p < 0.05.
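As a runnable illustration of this test, here is a minimal Python sketch. It is not the SDK's implementation: the function names are hypothetical, and the normal CDF is computed with `math.erf` from the standard library. It follows the same steps as the formulas listed in this section, and the example counts are the ones from the sample response in the Backend contract section.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_proportion_z_test(c1: int, n1: int, c2: int, n2: int):
    """Return (z, p_value) for a two-tailed, pooled two-proportion z-test.

    c1, n1: conversions and exposures for control
    c2, n2: conversions and exposures for the variant
    """
    p1, p2 = c1 / n1, c2 / n2
    pooled_p = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled_p * (1 - pooled_p) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # no variance (all or no conversions): nothing to detect
    z = abs(p2 - p1) / se
    p_value = 2 * (1 - normal_cdf(z))  # two-tailed
    return z, p_value


z, p_value = two_proportion_z_test(168, 2100, 211, 2110)
confidence = (1 - p_value) * 100
print(f"z={z:.3f} p={p_value:.4f} confidence={confidence:.1f}%")
```

With these counts the p-value comes out below 0.05, so the variant would count as significant under the threshold above.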
```
pooled_p = (c1 + c2) / (n1 + n2)
se = sqrt(pooled_p * (1 - pooled_p) * (1/n1 + 1/n2))
z = |p2 - p1| / se
p_value = 2 * (1 - normal_cdf(z)) // two-tailed
```
MDE slider & sample size
A minimum sample size warning is shown per experiment until results are reliable. The required sample size per variant is:
```
n = 16 * p * (1 - p) / MDE^2
// p = control conversion rate
// MDE = configurable relative minimum detectable effect
```
The toolbar exposes an MDE slider from 5% to 50% (default 20%). Changing it instantly recalculates required sample sizes and verdicts without re-fetching. A Running only filter toggle is also in the toolbar.
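To see how the slider changes the requirement, here is a minimal Python sketch of the sample size formula. It applies the formula literally as documented, with MDE passed as a fraction (0.20 for 20%). The function name is hypothetical, and rounding up to a whole player is an assumption; whether the window converts the relative MDE into an absolute difference before applying the formula is not stated here.

```python
import math


def required_sample_size(p: float, mde: float) -> int:
    """Per-variant sample size from n = 16 * p * (1 - p) / MDE^2.

    p:   control conversion rate as a fraction (0 < p < 1)
    mde: minimum detectable effect as a fraction (e.g. 0.20 for 20%)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if mde <= 0.0:
        raise ValueError("mde must be positive")
    # Assumption: round up so the result is a whole number of players.
    return math.ceil(16 * p * (1 - p) / mde ** 2)


# Sweep the slider's range for an 8% control conversion rate.
for mde in (0.05, 0.20, 0.50):
    print(f"MDE {mde:.0%}: {required_sample_size(0.08, mde)} per variant")
```

Because MDE is squared in the denominator, halving it roughly quadruples the required sample size.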
Conclusion banners
One of four states per experiment:
- "Keep running. Not enough data yet." Shown while the experiment is below the minimum sample size.
- "Ready to conclude. [variant] is the winner with +X% lift at Y% confidence." Shown for a significant positive result.
- "Null result. No significant difference detected." Shown when there is sufficient data but no significance.
- "One or more variants are significantly hurting conversion. Consider rolling back." Shown for a significant negative result.
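The four states can be expressed as a small decision function. This is a hedged Python sketch, not the window's actual logic: the function name, the state labels, and the input shape are hypothetical. It also assumes that the sample size check uses the smallest exposure count across variants, and that a significantly negative variant takes priority over a winner. Neither assumption is confirmed by this page.

```python
from typing import List, Tuple

ALPHA = 0.05  # significance threshold used by this window


def conclusion_state(min_exposures: int, required_n: int,
                     results: List[Tuple[float, float]]) -> str:
    """Pick a banner state for one experiment.

    min_exposures: smallest exposure count across variants (assumption)
    required_n:    required sample size per variant
    results:       (p_value, relative_lift) for each non-control variant
    """
    if min_exposures < required_n:
        return "keep_running"
    significant_lifts = [lift for p, lift in results if p < ALPHA]
    if any(lift < 0 for lift in significant_lifts):
        return "hurting"  # assumed to take priority over a winner
    if any(lift > 0 for lift in significant_lifts):
        return "ready_to_conclude"
    return "null_result"
```

For instance, `conclusion_state(2100, 30, [(0.02, 0.25)])` returns `"ready_to_conclude"`, while the same result with only 20 exposures returns `"keep_running"`.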
Backend contract
For teams self-hosting, the editor window calls:
```
GET /api/public/sdk/ab-tests/stats
Headers: X-API-Key: kf_xxx_yyy

# Response
{
  "experiments": [
    {
      "experiment_key": "checkout_v2",
      "status": "running",
      "total_exposures": 4210,
      "variants": [
        { "key": "control", "exposures": 2100, "conversions": 168, "conversion_rate": 0.08 },
        { "key": "treatment", "exposures": 2110, "conversions": 211, "conversion_rate": 0.10 }
      ]
    }
  ]
}
```
Next steps
- Read the A/B Tests runtime API reference.
- Generate config-aware experiments via the Remote Config + A/B Setup Wizard.