A/B Testing That Actually Makes Sense
Learn how to run statistically valid A/B tests. Avoid common mistakes like small samples and stopping too early. A practical framework for meaningful conversion testing.
"We changed the CTA button and conversions jumped 20%!" Maybe. Or maybe it was a coincidence. Most A/B tests we see are not valid: the sample is too small, the test is stopped too early, or the hypothesis is wrong. A bad A/B test is worse than no test at all, because it pushes you to make decisions based on noise. Here is how to set up tests the right way.
When A/B Testing Makes Sense
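- You have sufficient traffic (as a rough floor, at least 1,000 visitors per variant per month; the exact number depends on your conversion rate and the size of the effect)
- The change is large enough that you expect at least a 10% relative difference in results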
- You have a clear hypothesis for why the change should work
- You can set up tracking that measures the right metric (not just CTR, but actual conversions)
If any of those is a "no", A/B testing is not for you right now. Focus on other things first (more traffic, fixing obvious problems).
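Traffic and effect size are linked: the fewer visitors you have, the bigger the lift must be before a test can see it. Here is a rough Python sketch of that relationship (our own illustration, not a formula taken from any particular tool). It estimates the smallest relative lift a test can reliably detect for a given number of visitors per variant, assuming a two-sided test at 95% significance with 80% power and a normal approximation that uses the baseline rate in both arms. The function name `relative_mde` is hypothetical.

```python
from statistics import NormalDist


def relative_mde(baseline: float, visitors_per_variant: int,
                 alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate smallest relative lift a two-sided test can reliably detect."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    # Normal approximation: the baseline variance is assumed in both arms
    std_err = (2 * baseline * (1 - baseline) / visitors_per_variant) ** 0.5
    absolute_mde = (z_alpha + z_power) * std_err   # absolute difference in conversion rate
    return absolute_mde / baseline                 # expressed relative to the baseline


for n in (1_000, 5_000, 25_000):
    print(f"{n:>6} visitors/variant -> ~{relative_mde(0.03, n):.0%} relative lift")
```

With a 3% baseline this prints roughly 71%, 32% and 14%. At 1,000 visitors per variant only very large effects are detectable, which is why that traffic level is a floor rather than a comfortable target.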
What to Test · Ranked by Impact
High Impact · Always Worth Testing
- Hero headline (H1) of your site
- Ad headline in your campaign
- Primary CTA button copy
- Pricing structure (one plan vs. 3 plans, monthly vs. annual)
- Lead form length (3 fields vs. 6 fields)
The best test results come from changes to the hero section and the CTA button copy. Both have dedicated frameworks in separate guides.
Medium Impact · If High-Impact Has Already Been Tested
- CTA button color
- CTA position (above/below hero text)
- Type of social proof (logos vs. testimonials)
- Landing page length (short vs. long)
Low Impact · Often a Waste of Time
- Exact shade of the button color
- The font you use
- Exact section label ('Pricing' vs. 'Plans')
- Stock photo A vs. stock photo B
How to Set Up a Proper A/B Test · 5 Steps
1. Define Your Hypothesis BEFORE the Test
Formula: "If I change X, I expect Y because Z."
Example: "If I replace 'Learn More' with 'Request a Quote in 24h' on the primary CTA, I expect 15–25% more clicks because it communicates a concrete outcome and a time frame."
Without a hypothesis, a test is not science; it is random guessing.
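One way to keep yourself honest is to write the hypothesis down as a structured record before launch, together with the single metric that will decide the test. The sketch below is a hypothetical Python format (the `Hypothesis` class and its field names are our own, not part of any testing tool) that renders the X/Y/Z formula from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical pre-launch record of an A/B test hypothesis."""
    change: str          # X: what you change
    expected: str        # Y: the measurable outcome you expect
    reason: str          # Z: why you expect it
    primary_metric: str  # the one metric that decides the test, fixed before launch

    def statement(self) -> str:
        """Render the 'If I change X, I expect Y because Z.' formula."""
        return f"If I {self.change}, I expect {self.expected} because {self.reason}."


cta_test = Hypothesis(
    change="replace 'Learn More' with 'Request a Quote in 24h' on the primary CTA",
    expected="15-25% more clicks",
    reason="it communicates a concrete outcome and a time frame",
    primary_metric="clicks on the primary CTA",
)
print(cta_test.statement())
```

Freezing the record (`frozen=True`) is a small design nudge: once the test is live, the hypothesis and its primary metric should not be quietly edited to match whatever the data shows.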
2. Calculate Sample Size Before You Start
Tool: optimizely.com/sample-size-calculator (free). Enter:
- Baseline conversion rate (e.g. 3%)
- Minimum detectable effect, relative to the baseline (e.g. 15%, meaning a lift from 3% to 3.45%)
- Statistical significance (95% is standard)
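You will get the number of visitors you need per variant. It is often around 5,000–15,000 visitors per variant, and small differences on a low baseline push it higher: a classic fixed-sample calculation for a 3% baseline and a 15% relative effect (95% significance, 80% power) asks for roughly 24,000 per variant. If you do not have that traffic, the test is too small to be valid.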
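If you want to sanity-check a calculator, the classic fixed-sample formula for comparing two conversion rates fits in a few lines of Python. This is a sketch under stated assumptions: a two-sided test, 95% significance and 80% statistical power (many calculators default to 80% power, although the inputs listed above do not mention it). Online tools, especially sequential ones, can return different numbers. The helper names and the weekly traffic figure are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors per variant for a two-sided, two-proportion comparison (fixed sample)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # rate if the expected lift is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% significance -> about 1.96
    z_power = NormalDist().inv_cdf(power)          # 80% power -> about 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def weeks_needed(per_variant: int, variants: int, weekly_visitors: int) -> int:
    """Whole weeks of traffic needed to fill every variant (never less than one)."""
    return max(1, math.ceil(per_variant * variants / weekly_visitors))


n = sample_size_per_variant(baseline=0.03, relative_mde=0.15)
print(n)
print(weeks_needed(n, variants=2, weekly_visitors=10_000))  # hypothetical traffic
```

For a 3% baseline and a 15% relative effect this gives roughly 24,200 visitors per variant. At a hypothetical 10,000 visitors per week split across two variants, that is 5 weeks of traffic.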
3. Test One Thing Per Test
If you change the headline, color, and image at the same time, you will not know what drove the difference. One element per test. If you want to test more elements, run the tests one after another, not in parallel.
4. Let the Test Run a Full Cycle · No Shortcuts
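Run for a minimum of 1 week to cover a full weekly cycle of user behavior (Tuesday visitors behave differently from Saturday visitors), ideally 2–4 weeks. Do not stop the test early even if you see an "obvious winner": checking results repeatedly and stopping at the first significant-looking result pushes the real false-positive rate far above the nominal 5%, so a test stopped too soon is statistically invalid.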
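Why is stopping early a problem? A small simulation makes it concrete. The Python sketch below runs A/A tests, where both variants share the same true conversion rate, so every "winner" is a false positive. It compares a peeker who runs a two-proportion z-test every 100 visitors and stops at the first p < 0.05 with a tester who analyses once at the planned end. The conversion rate, number of looks and random seed are arbitrary assumptions chosen to keep it fast.

```python
import random
from statistics import NormalDist


def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no (or only) conversions so far: no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(sims: int = 500, visitors: int = 2_000,
                         look_every: int = 100, rate: float = 0.10,
                         alpha: float = 0.05, seed: int = 1) -> tuple[float, float]:
    """Simulate A/A tests and return (peeking, fixed-horizon) false-positive rates."""
    rng = random.Random(seed)
    peeking_wins = fixed_wins = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        stopped_early = False
        for i in range(1, visitors + 1):
            # Both arms share the same true rate, so any "winner" is noise
            conv_a += rng.random() < rate  # True counts as 1
            conv_b += rng.random() < rate
            if (not stopped_early and i % look_every == 0
                    and z_test_p_value(conv_a, i, conv_b, i) < alpha):
                stopped_early = True  # the peeker would declare a winner here
        peeking_wins += stopped_early
        # The disciplined tester analyses once, at the planned sample size
        fixed_wins += z_test_p_value(conv_a, visitors, conv_b, visitors) < alpha
    return peeking_wins / sims, fixed_wins / sims


peeking, fixed = false_positive_rates()
print(f"stop at first p < 0.05: {peeking:.0%} false positives")
print(f"one analysis at the end: {fixed:.0%} false positives")
```

The exact figures depend on the seed and on how often you look, but the peeking policy should come out several times higher than the roughly 5% produced by a single planned analysis.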
5. Interpret Results with Statistical Significance
Most A/B tools report statistical significance, as a p-value or as a confidence level. Rule: do not make a decision if the p-value is above 0.05 (i.e., confidence below 95%). Above that threshold, a difference as large as the one you observed could easily be pure chance.
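For a concrete picture of what such a tool computes, here is a pooled two-proportion z-test in Python. It is one common way to get a p-value for conversion rates; your tool may use a different method, such as a Bayesian or sequential one. The visitor and conversion counts are made-up examples that show the same 20% relative lift (3.0% vs. 3.6%) at two sample sizes.

```python
from statistics import NormalDist


def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (A = control, B = variant)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)  # shared rate if A and B were identical
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        raise ValueError("need at least one conversion and one non-conversion")
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same 20% relative lift (3.0% -> 3.6%) at two sample sizes (made-up numbers)
large = z_test_p_value(conv_a=300, n_a=10_000, conv_b=360, n_b=10_000)
small = z_test_p_value(conv_a=30, n_a=1_000, conv_b=36, n_b=1_000)
print(f"10,000 per variant: p = {large:.3f}")
print(f" 1,000 per variant: p = {small:.3f}")
```

With 10,000 visitors per variant the p-value is about 0.02, below 0.05. With 1,000 visitors per variant the same 20% "jump" gives a p-value of about 0.45, which is exactly the kind of result that can be pure chance.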
Tools · What to Use
- Google Optimize · shut down in September 2023 (RIP), look for an alternative
- VWO (free up to 50,000 visitors/month) · solid for small businesses
- Convert.com · for mid-sized businesses with a serious CRO budget
- PostHog Experiments (free up to 1M events) · open source, great for tech companies
- Server-side A/B testing (custom code) · most flexible, requires a developer
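If you go the custom-code route, the core of server-side assignment is a deterministic split: the same visitor must always land in the same variant. A minimal Python sketch, assuming you have a stable user ID; the function name, the experiment name and the choice of SHA-256 are our own, not taken from any particular platform.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministically map a user to a variant with (near-)equal probability."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big")  # first 8 bytes as an unsigned integer
    return variants[bucket % len(variants)]


# Check the split over a batch of synthetic user IDs
counts = Counter(assign_variant(f"user-{i}", "cta-copy") for i in range(10_000))
print(counts)
```

Keying the hash on the experiment name as well as the user ID gives each experiment its own independent split. Running the same count over your real user IDs is a quick way to confirm the split is close to 50/50 before launch.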
Most Common Mistakes · That Turn Tests Into Noise
- Stopping the test when one variant takes an early 'lead' (a one-day conversion spike is not a signal)
- Running a test during seasonal anomalies (Black Friday, New Year, public holidays)
- An unplanned uneven traffic split (you configured 50/50 but the data shows 70/30), which usually signals a bug in assignment or tracking
- Cherry-picking metrics (you tested conversions, but report CTR because that's where you 'won')
- Not documenting the test (three months later you have no idea what you tested or why)
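An uneven split is easy to check for. The Python sketch below runs a chi-square goodness-of-fit test on visitor counts against the split you configured. A very small p-value means the imbalance is unlikely to be chance, so assignment or tracking should be investigated before trusting any result. The counts are hypothetical, and the cutoff is a convention: a strict threshold such as p < 0.001 is a common choice for this check.

```python
import math


def srm_p_value(visitors_a: int, visitors_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-way split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(5_050, 4_950))  # small imbalance, configured 50/50
print(srm_p_value(7_000, 3_000))  # 70/30 when 50/50 was configured
```

A 5,050 vs. 4,950 split gives p ≈ 0.32 (ordinary noise), while 7,000 vs. 3,000 gives a p-value that is effectively zero.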
A/B testing is a powerful tool when applied correctly: with sufficient traffic, a clear hypothesis, one change per test, and a full run cycle. Without those elements, you are making decisions based on noise, which is worse than not collecting data at all.