Question 1

What is an A/B test calculator used for?

Accepted Answer

An A/B test calculator determines whether the difference between two variants is statistically significant or could plausibly be explained by random chance. You input visitors and conversions for each variant. The calculator then tells you three things: the p-value (how likely a difference at least this large would be if the variants actually performed the same), the confidence level, and whether you need more data before calling a winner. Marketers use it to validate landing page tests, email subject lines, ad creatives, and pricing experiments before shipping changes. Product teams use it to confirm that feature changes improve conversion rates. The alternative is eyeballing the numbers or waiting until one variant is "obviously" better. That leads to false positives (shipping a variant that didn't actually win) or wasted time (testing long after you already had enough data to decide). Use the conversion rate calculator after determining significance to translate percentage lifts into projected revenue. Use the CTR calculator alongside this tool when testing email or ad campaigns where click-through rate matters as much as final conversion.

Question 2

What is statistical significance in an A/B test?

Accepted Answer

Statistical significance is usually calculated with a two-proportion z-test that compares conversion rates between variants. The calculator takes visitors and conversions for Variant A and Variant B and computes each conversion rate. It then calculates the z-score, which measures how many standard errors apart the two rates are. The z-score converts to a p-value: the probability of seeing a difference at least this large if the two variants actually converted at the same rate. A p-value below 0.05 means a gap this big would show up less than 5% of the time from random noise alone, so you can treat the difference as real. Most A/B test calculators use a 95% confidence threshold (p-value < 0.05). Some teams accept 90% confidence (p-value < 0.10) for faster decisions on low-traffic tests. The math also produces confidence intervals, showing the range where the true conversion rate likely falls for each variant. If the 95% intervals don't overlap, the difference is significant at p < 0.05. You don't need to calculate this manually; paste your numbers into this tool and it runs the z-test instantly. After confirming significance, use the conversion rate calculator to project business impact.
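To make the z-test concrete, here is a minimal Python sketch of a pooled two-proportion z-test using only the standard library. The function name `two_proportion_z_test` and the sample counts are illustrative assumptions. This is a common textbook form of the test, not necessarily the exact formula this calculator runs.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate: the best estimate of the shared rate if there is no real difference.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    # Standard error of the difference under the "no difference" assumption.
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value: chance of a gap at least this large in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical test: 4.0% vs 4.6% conversion on 10,000 visitors each.
z, p = two_proportion_z_test(10_000, 400, 10_000, 460)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

For these hypothetical counts the z-score is about 2.09 and the p-value lands just under 0.04, so the lift clears the 95% threshold.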

Question 3

What is a good sample size for an A/B test?

Accepted Answer

A good sample size depends on four things: your baseline conversion rate, the minimum detectable effect (smallest lift worth detecting), your desired confidence level, and statistical power. As a rough rule of thumb, plan for at least 1,000 conversions per variant. If your conversion rate is 2%, that means 50,000 visitors per variant (100,000 total). If your conversion rate is 10%, you need 10,000 visitors per variant (20,000 total). The smaller the expected lift, the more visitors you need. Detecting a 50% improvement (2% to 3%) requires far fewer visitors than detecting a 10% improvement (2% to 2.2%). This calculator shows recommended sample size based on your current data, so you know whether to keep testing or call it. Stopping too early produces unreliable results. Running far past the required sample size delays your next test while rarely changing the decision. If you don't have enough traffic to reach significance in a reasonable time frame (say, two weeks), test a bigger change or accept a lower confidence threshold like 90%. Use the CTR calculator to analyze traffic by source so you know which channels bring enough volume for valid testing.
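The rule of thumb above is only a starting point. A common way to estimate sample size is the normal-approximation formula for comparing two proportions, sketched below in Python. The 80% power default and the function name are assumptions for illustration; this calculator may use a different formula or defaults.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, target, confidence=0.95, power=0.80):
    """Visitors needed in EACH variant to detect a change from baseline to target."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)  # chance of detecting a real effect
    p_bar = (baseline + target) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(baseline * (1 - baseline) + target * (1 - target))
    ) ** 2
    return math.ceil(numerator / (target - baseline) ** 2)


big_lift = sample_size_per_variant(0.02, 0.03)     # 50% relative lift
small_lift = sample_size_per_variant(0.02, 0.022)  # 10% relative lift
print(big_lift, small_lift)
```

Under these assumptions, the 50% lift needs roughly 3,800 visitors per variant while the 10% lift needs roughly 80,000. That gap shows how quickly small effects become expensive to detect.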

Question 4

What does p-value mean in A/B testing?

Accepted Answer

The p-value is the probability of seeing a difference at least as large as the one you observed if there were actually no difference between the variants. A p-value of 0.03 means that if Variant A and Variant B truly converted at the same rate, a gap this large would show up only 3% of the time. It does not mean there is a 97% chance that Variant B is better, although the two ideas are often conflated. The standard threshold is p < 0.05, which corresponds to a 95% confidence level. If the p-value is 0.12, a gap this size would appear 12% of the time from noise alone. That is too common to call a winner, so you keep testing. Lower p-values mean stronger evidence. A p-value of 0.001 clears a 99.9% confidence threshold, which is rare in marketing tests but common in scientific experiments. Suppose you stop a test at p = 0.15 because one variant is ahead. You are then acting on a difference that pure noise would produce about 15% of the time, so there is a real risk of shipping a change that doesn't actually work. That's why calculators flag results as "not significant" when p > 0.05. The p-value changes as you collect more data. A test might start with p = 0.20 after 500 visitors, drop to p = 0.08 at 2,000 visitors, and finally cross p = 0.04 at 5,000 visitors. You can check this calculator daily to monitor progress. Make the call at your planned sample size, though, not on the first day p dips below 0.05. Stopping at the first significant peek inflates the false positive rate well above 5%. After reaching significance, use the conversion rate calculator to estimate revenue impact before implementing the winner.
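The risk of stopping a test the moment it looks significant is easy to check with a simulation. The Python sketch below runs hypothetical A/A tests, where both variants share the same true 10% conversion rate, so any "winner" is a false positive. It compares checking the p-value every day against checking once at the end. The traffic numbers, day count, and seed are arbitrary assumptions chosen to keep the run fast.

```python
import math
import random
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(rate=0.10, daily_visitors=100, days=20, sims=500, seed=1):
    """Share of A/A tests declared significant when peeking daily vs. looking once."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(sims):
        conv_a = conv_b = visitors = 0
        stopped_early = False
        for _ in range(days):
            # Both variants draw from the same true rate: there is no real winner.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            visitors += daily_visitors
            if p_value(conv_a, visitors, conv_b, visitors) < 0.05:
                stopped_early = True  # a daily checker would have called a winner here
        peeking_hits += stopped_early
        final_hits += p_value(conv_a, visitors, conv_b, visitors) < 0.05
    return peeking_hits / sims, final_hits / sims


peeking, single_look = false_positive_rates()
print(f"daily peeking: {peeking:.1%}, single final look: {single_look:.1%}")
```

With these settings, the single final look should stay near the nominal 5%. Daily peeking flags a "winner" several times as often, even though the variants are identical.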

Question 5

How long should you run an A/B test?

Accepted Answer

Run an A/B test until you hit the recommended sample size, reach statistical significance (p-value < 0.05), and cover at least one full weekly cycle, ideally two full weeks, so you capture weekly traffic patterns. Most tests need 1,000 to 5,000 conversions per variant, which translates to one to four weeks depending on traffic volume. Stopping early because one variant is ahead after three days risks false positives. Running forever because you want 99.9% confidence wastes time on diminishing returns. The right stopping rule is significance plus sample size plus time coverage. Significance tells you the difference is unlikely to be noise. Sample size confirms you have enough data. Time coverage confirms you've seen weekday and weekend traffic, which often converts differently. If your test reaches significance after five days but your traffic varies by day of week, let it run to 14 days. If it's been three weeks and you're nowhere near significance, the variants are probably too similar. Call it a tie and test a bigger change. Use this calculator daily to track p-value and sample size progress. Once all three conditions are met, stop the test and use the conversion rate calculator to project the impact of shipping the winner.
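Turning a sample size into a run time is simple arithmetic. The Python sketch below divides the total visitors needed by daily traffic, then rounds up to whole weeks with a two-week floor. That follows the time-coverage rule above. The function name, the even traffic split, and the steady daily traffic are assumptions.

```python
import math


def required_test_days(visitors_per_variant, daily_visitors, variants=2, min_days=14):
    """Days to run a test, rounded up to full weeks with a minimum run length."""
    # Assumes traffic is split evenly across variants and is steady day to day.
    days_for_sample = math.ceil(visitors_per_variant * variants / daily_visitors)
    full_weeks = math.ceil(days_for_sample / 7)
    return max(min_days, full_weeks * 7)


print(required_test_days(50_000, 5_000))  # prints 21: 20 days of traffic, rounded to 3 weeks
print(required_test_days(5_000, 5_000))   # prints 14: sample reached in 2 days, floor applies
```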

Question 6

What is a confidence interval in A/B testing?

Accepted Answer

A confidence interval shows the range where the true conversion rate likely falls. If Variant A has a 95% confidence interval of 3.5% to 4.5%, that means you're 95% confident the real conversion rate is somewhere in that range. Narrow intervals (like 4.0% to 4.2%) mean you know the true rate precisely because you have lots of data. Wide intervals (like 2% to 8%) mean high uncertainty because the sample size is too small. In A/B testing, you compare the intervals for both variants. If Variant A's interval is 3.5% to 4.5% and Variant B's is 4.8% to 5.8%, the ranges don't overlap, which confirms a significant difference. If Variant A is 3.5% to 4.5% and Variant B is 4.0% to 5.0%, they overlap, meaning the difference might be noise. The calculator shows confidence intervals automatically alongside p-values. Both metrics describe the same data from different angles, but they are not perfectly interchangeable. Non-overlapping 95% intervals correspond to p < 0.05. Overlapping intervals do not guarantee p > 0.05: two intervals can overlap slightly while the difference is still significant. So when intervals overlap, check the p-value before calling the result noise. Use the intervals when explaining results to non-technical stakeholders because "the ranges don't overlap" is easier to grasp than "p-value of 0.03." After confirming significance via intervals or p-value, use the conversion rate calculator to translate the lift into expected revenue.
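The sketch below computes simple normal-approximation (Wald) 95% intervals for each variant, plus an interval for the difference between them. The counts are hypothetical. They were chosen to show a case where the two per-variant intervals overlap slightly even though the difference is significant. Wald intervals are one common choice; Wilson intervals behave better at low counts, and this calculator's exact method is not specified here.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def rate_interval(conversions, visitors, z=Z_95):
    """Wald interval for a single conversion rate."""
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate - margin, rate + margin


def difference_interval(conv_a, n_a, conv_b, n_b, z=Z_95):
    """Wald interval for rate_b - rate_a (unpooled standard error)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    diff = rate_b - rate_a
    return diff - z * se, diff + z * se


a_low, a_high = rate_interval(400, 10_000)  # Variant A: 4.0%
b_low, b_high = rate_interval(460, 10_000)  # Variant B: 4.6%
d_low, d_high = difference_interval(400, 10_000, 460, 10_000)

print(f"A: {a_low:.2%} to {a_high:.2%}")
print(f"B: {b_low:.2%} to {b_high:.2%}")
print(f"B - A: {d_low:.2%} to {d_high:.2%}")
print("intervals overlap:", a_high >= b_low and b_high >= a_low)
print("difference excludes zero:", d_low > 0)
```

Here A runs from about 3.62% to 4.38% and B from about 4.19% to 5.01%, so the ranges overlap. Even so, the interval for the difference stays above zero.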

Question 7

Can you run an A/B test with unequal sample sizes?

Accepted Answer

Yes, you can run an A/B test with unequal sample sizes, but equal splits (50/50 traffic) are better for reaching significance faster. If Variant A gets 10,000 visitors and Variant B gets 2,000 visitors, the calculator still works, but the confidence interval for Variant B will be wider because smaller sample size means higher uncertainty. Unequal splits happen when you're testing a risky change and want to limit exposure. You might send 90% of traffic to the proven control and 10% to the new variant to avoid tanking conversions if the test goes badly. The trade-off is the test takes longer to reach significance because the smaller variant accumulates data slowly. If you're testing two equally safe variants, split traffic evenly to minimize test duration. If you're testing something risky (like a totally new checkout flow), skew traffic toward the control until early data confirms the new variant isn't broken. This calculator handles unequal splits automatically; just enter the actual visitor and conversion counts for each variant. After the test, use the conversion rate calculator to model the full-traffic impact before rolling out the winner to 100% of users.
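To see why a skewed split takes longer, note that the variance of the difference between two conversion rates is roughly proportional to 1/n_A + 1/n_B. That holds when both variants convert at similar rates. The Python sketch below uses this approximation to compare how much total traffic a given split needs, relative to 50/50, for the same precision. The approximation is an assumption that ignores any difference in the rates themselves.

```python
def traffic_multiplier(share_to_variant):
    """Total traffic needed relative to a 50/50 split for equal precision."""
    share_to_control = 1 - share_to_variant
    # Variance of the difference scales with 1/n_control + 1/n_variant.
    # A 50/50 split gives 1/0.5 + 1/0.5 = 4, the smallest possible value.
    return (1 / share_to_control + 1 / share_to_variant) / 4


for split in (0.5, 0.3, 0.2, 0.1):
    print(f"{1 - split:.0%}/{split:.0%} split: {traffic_multiplier(split):.2f}x the traffic")
```

Under this approximation, a 90/10 split needs roughly 2.8x the total traffic of a 50/50 split to reach the same precision.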

Question 8

What is the difference between A/B testing and multivariate testing?

Accepted Answer

A/B testing compares two versions of one variable (like Headline A vs Headline B). Multivariate testing compares multiple variables simultaneously (like Headline A vs B, Button Color Red vs Blue, and Image X vs Y, all at once). A/B testing is simpler and requires less traffic. If you have 10,000 visitors per week, you can run an A/B test and get results in one to two weeks. Multivariate testing splits traffic across all combinations. In the example above, that's 2 headlines × 2 button colors × 2 images = 8 combinations, so each combination gets one-eighth of your traffic instead of half. Comparing combinations head to head therefore needs roughly 4x the traffic of an A/B test to reach significance in the same time frame. Use A/B testing when you have a hypothesis about one specific change. Use multivariate testing when you want to test interactions between variables (like "Does Headline A work better with a red or a blue button?"). Most teams stick to A/B tests because traffic is limited and testing one variable at a time is easier to implement and analyze. This calculator is built for A/B tests (two variants). If you're running multivariate tests, you'll need a specialized tool that handles more than two groups. After determining which single change works best via A/B testing, use the CTR calculator to break down performance by traffic source or device.
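The traffic cost of a full multivariate test follows from counting combinations. This Python sketch enumerates the 2 × 2 × 2 example with `itertools.product`. It then compares visitors per combination with visitors per variant in an A/B test, assuming the same 10,000 weekly visitors split evenly.

```python
from itertools import product

headlines = ["Headline A", "Headline B"]
button_colors = ["Red", "Blue"]
images = ["Image X", "Image Y"]

# Every combination of the three variables is its own test cell.
cells = list(product(headlines, button_colors, images))

weekly_visitors = 10_000
per_cell = weekly_visitors / len(cells)  # multivariate: 8 cells
per_variant_ab = weekly_visitors / 2     # A/B test: 2 variants

print(len(cells), "combinations")
print(f"{per_cell:,.0f} visitors per combination vs {per_variant_ab:,.0f} per A/B variant")
```

Each combination gets a quarter of the visitors an A/B variant would receive in the same week.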

Question 9

How do you interpret A/B test results?

Accepted Answer

Interpret A/B test results by checking three things in order: statistical significance, confidence interval overlap, and practical impact. First, look at the p-value. If it's below 0.05, the difference is statistically significant and you can treat it as real. If it's above 0.05, the test hasn't reached significance yet, so keep running it to the planned sample size or conclude the variants are too similar. Second, check the confidence intervals. If they don't overlap, the difference is significant. If they overlap, one variant might appear ahead while the true rates are the same, so let the p-value decide. Third, calculate practical impact using the conversion rate calculator. A 0.1% lift might be statistically significant but economically meaningless if you only get 1,000 visitors per month. A 2% lift on 100,000 monthly visitors can be both significant and valuable. Also consider the cost of implementation. If Variant B requires a full site redesign to ship, the lift needs to justify the engineering time. If it's a one-line copy change, ship it even for a small lift. Avoid common interpretation mistakes. These include calling a winner based on conversion rate alone (ignoring p-value), stopping too early because one variant is ahead, and testing forever because you want 99% confidence when 95% is enough.
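The third check, practical impact, is mostly arithmetic: extra conversions equal traffic times the absolute change in conversion rate. The Python sketch below applies that to a hypothetical 0.1 percentage-point lift (4.0% to 4.1%) at two traffic levels. The rates and the `value_per_conversion` figure are illustrative assumptions, not outputs of this calculator.

```python
def monthly_impact(monthly_visitors, rate_control, rate_variant, value_per_conversion):
    """Extra conversions and revenue per month if the variant's rate holds."""
    extra_conversions = monthly_visitors * (rate_variant - rate_control)
    return extra_conversions, extra_conversions * value_per_conversion


for visitors in (1_000, 100_000):
    extra, revenue = monthly_impact(visitors, 0.040, 0.041, value_per_conversion=50)
    print(f"{visitors:,} visitors/month: {extra:.0f} extra conversions, ${revenue:,.0f}")
```

The same lift is worth about one extra conversion a month at low traffic and about a hundred at high traffic. That is why implementation cost belongs in the decision.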

Question 10

What is the minimum detectable effect in A/B testing?

Accepted Answer

The minimum detectable effect (MDE) is the smallest conversion rate lift you can reliably detect given your sample size and significance threshold. If your baseline conversion rate is 4% and your MDE is 0.5 percentage points, you can detect a change from 4% to 4.5% (a 12.5% relative lift) with 95% confidence. Smaller effects require more visitors, and the growth is steep. Required sample size scales with the inverse square of the effect, so detecting a 0.1 percentage point change (4% to 4.1%) needs roughly 25x the sample size of detecting a 0.5 point change. Most teams set MDE based on what's worth implementing. If a 10% relative lift would meaningfully impact revenue, set MDE to 0.4 percentage points (4% to 4.4%). If only a 25% lift justifies the engineering cost, set MDE to 1 percentage point (4% to 5%). This calculator doesn't ask for MDE explicitly. Instead, it shows recommended sample size based on the difference you're seeing in real data. Suppose the calculator says you need 50,000 visitors per variant to reach significance, and each variant only gets 5,000 visitors per month. Your test would then take 10 months. At that point, either test a bigger change (larger MDE) or accept a lower confidence threshold (90% instead of 95%). Use the conversion rate calculator to model revenue impact at different lift sizes so you know which MDE is worth testing.
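The relationship between MDE and sample size can be sketched with a simplified version of the two-proportion sample size formula. It uses the baseline rate's variance for both variants, which is an approximation; the exact formula adjusts for the variant's rate. Because required sample size scales with 1/MDE², shrinking the MDE by a factor of five multiplies the sample by about 25. The function names and the 80% power default are assumptions for illustration.

```python
import math
from statistics import NormalDist


def _z_total(confidence=0.95, power=0.80):
    """Sum of the two-sided significance z-value and the power z-value."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2) + NormalDist().inv_cdf(power)


def approx_sample_per_variant(baseline, mde, confidence=0.95, power=0.80):
    """Approximate visitors per variant to detect an absolute lift of `mde`."""
    variance = 2 * baseline * (1 - baseline)  # baseline variance used for both arms
    return math.ceil(_z_total(confidence, power) ** 2 * variance / mde**2)


def approx_mde(baseline, visitors_per_variant, confidence=0.95, power=0.80):
    """Smallest absolute lift detectable with the given visitors per variant."""
    variance = 2 * baseline * (1 - baseline)
    return _z_total(confidence, power) * math.sqrt(variance / visitors_per_variant)


n_half_point = approx_sample_per_variant(0.04, 0.005)   # 4% -> 4.5%
n_tenth_point = approx_sample_per_variant(0.04, 0.001)  # 4% -> 4.1%
print(n_half_point, n_tenth_point, round(n_tenth_point / n_half_point, 1))
print(f"MDE with 5,000 visitors per variant: {approx_mde(0.04, 5_000):.2%}")
```

Under these assumptions, the 4% to 4.5% test needs roughly 24,000 visitors per variant. With only 5,000 visitors per variant, the smallest detectable lift from a 4% baseline is about 1.1 percentage points.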

Question 11

What does it mean if an A/B test result is not statistically significant?

Accepted Answer

A result that is not statistically significant means the data collected so far cannot confirm that the observed difference between variants is real rather than random. It does not mean Variant B is worse or that the test failed. It means you do not yet have enough evidence to call a winner. Take a p-value above 0.05, such as 0.12 or 0.18. It means a difference this large would show up more than 5% of the time even if the variants performed identically, which is too uncertain to make a decision.

There are three common reasons for a non-significant result. First, your sample size is too small and you need more visitors. The calculator shows how many more you need. Second, the difference between variants is genuinely small and detecting it requires much larger traffic volume than you have. Third, both variants actually perform the same, and there is no real winner.

If the result is not significant after reaching the recommended sample size, treat it as a tie. Do not ship Variant B hoping the trend holds. Do not treat your original variant as the proven winner either. Call it a draw and test a bigger, more meaningful change instead. Use the conversion rate calculator to model what lift size would actually move revenue. Then design your next test around that target rather than testing incremental changes that require unrealistic sample sizes to detect.

Question 12

Does A/B testing actually work?

Accepted Answer

Yes, A/B testing works reliably when implemented correctly. The core principle is sound: randomly split traffic between two variants, measure outcomes, and use statistics to determine whether any difference is real. The method is the same one pharmaceutical trials, economic studies, and agricultural research use, applied to web pages and marketing copy.

The failure mode is not the method itself but how teams apply it. A/B testing fails when tests stop too early, when teams change the test mid-run, when sample sizes are too small, or when results are declared significant at p-values above 0.05. These are execution errors, not method failures.

Evidence that A/B testing produces real results: Google, Amazon, and Microsoft run thousands of experiments per year and attribute a significant share of their product improvements to tests that showed statistically significant wins. Booking.com reportedly runs over 25,000 experiments per year across their product. When the statistics are applied correctly, validated wins tend to replicate.

The practical issue for smaller teams is traffic. If your site gets 5,000 visitors per month, a test that needs 20,000 visitors per variant will take eight months. In that time, external factors like seasonality and algorithm changes contaminate the results. For low-traffic sites, focus on testing changes with large expected effects (above 20% relative lift). Use the CTR calculator to identify which traffic sources are large enough to run valid experiments on.

A/B Test Calculator


Why eyeballing A/B test numbers gets you in trouble

How to use this A/B test calculator

Why statistical significance matters more than conversion rate alone

Common mistakes

Advanced tips


Frequently Asked Questions

Related free tools