June 9, 2026·7 min read·Nikola Teofilović

A/B Testing That Actually Makes Sense

Learn how to run statistically valid A/B tests. Avoid common mistakes like small samples and early stops. A practical framework for meaningful conversion testing.

"We changed the CTA button and conversions jumped 20%!" Maybe. Or maybe it was a coincidence. Most A/B tests we see are not valid · sample too small, stopped too early, wrong hypothesis. A bad A/B test is worse than no test at all · because it pushes you to make decisions based on noise. A guide to setting up tests the right way.

Rule number 1: An A/B test with fewer than 1,000 conversions is not statistically valid. With 100 conversions per variant you can see a 20% difference that is pure coincidence, not a signal. More than half of tests labeled as successes are actually noise.
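To see how easily noise produces a "winner", you can simulate A/A tests: both variants get exactly the same true conversion rate, so any difference between them is coincidence. The Python sketch below is illustrative only. The function name `aa_false_lift_rate`, the 3% conversion rate and the 3,334 visitors per variant (about 100 conversions each) are assumptions chosen to match the example above.

```python
import random


def aa_false_lift_rate(visitors: int, rate: float, min_lift: float,
                       runs: int, seed: int = 0) -> float:
    """Simulate A/A tests where both variants share the SAME true conversion rate.

    Returns the share of runs in which the observed relative difference between
    the two variants is at least `min_lift` in either direction, i.e. a
    "lift" produced purely by chance. (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Each visitor converts independently with probability `rate`.
        a = sum(rng.random() < rate for _ in range(visitors))
        b = sum(rng.random() < rate for _ in range(visitors))
        if a == 0:
            continue  # no baseline conversions to compare against; skip this run
        if abs(b - a) / a >= min_lift:
            hits += 1
    return hits / runs


# Roughly 100 conversions per variant: 3,334 visitors at an assumed 3% conversion rate.
share = aa_false_lift_rate(visitors=3_334, rate=0.03, min_lift=0.20, runs=1_000, seed=1)
print(f"A/A runs showing a 20%+ 'lift': {share:.1%}")
```

Raise `visitors` tenfold and rerun (it takes longer): the share of fake 20% lifts should drop to almost nothing, which is the whole argument for larger samples.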

When A/B Testing Makes Sense

You have sufficient traffic (minimum 1,000 visitors per variant per month)
The change is large enough to expect a 10%+ difference in results
You have a clear hypothesis for why the change should work
You can set up tracking that measures the right metric (not just CTR, but actual conversions)

If any of those is a "no" · A/B testing is not for you right now. Better to focus on other things first (more traffic, fixing obvious problems).

What to Test · Ranked by Impact

High Impact · Always Worth Testing

Hero headline (H1) of your site
Ad headline in your campaign
Primary CTA button copy
Pricing structure (one plan vs. 3 plans, monthly vs. annual)
Lead form length (3 fields vs. 6 fields)

The best test results come from changes to the hero section and CTA button copy · both have dedicated frameworks in separate guides.

Medium Impact · If High-Impact Has Already Been Tested

CTA button color
CTA position (above/below hero text)
Type of social proof (logos vs. testimonials)
Landing page length (short vs. long)

Low Impact · Often a Waste of Time

Exact shade of the button color
The font you use
Exact section label ('Pricing' vs. 'Plans')
Stock photo A vs. stock photo B

How to Set Up a Proper A/B Test · 5 Steps

1. Define Your Hypothesis BEFORE the Test

Formula: "If I change X, I expect Y because Z."

Example: "If I replace 'Learn More' with 'Request a Quote in 24h' on the primary CTA, I expect 15–25% more clicks because it communicates a concrete outcome and a time frame."

Without a hypothesis, a test is not science · just random guessing.
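If you keep a log of your tests, it helps to store the hypothesis as structured fields rather than free text. A minimal Python sketch, assuming a hypothetical `Hypothesis` record that refuses to be created unless all three parts of the formula are filled in:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical record for the 'If I change X, I expect Y because Z' formula."""
    change: str    # X: the single element you change
    expected: str  # Y: the measurable outcome, including metric and size
    reason: str    # Z: why you believe the change should work

    def __post_init__(self) -> None:
        # No hypothesis, no test: reject any blank part of the formula.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"hypothesis is missing '{f.name}'")

    def statement(self) -> str:
        return f"If I {self.change}, I expect {self.expected} because {self.reason}."


cta_test = Hypothesis(
    change="replace 'Learn More' with 'Request a Quote in 24h' on the primary CTA",
    expected="15–25% more clicks",
    reason="it communicates a concrete outcome and a time frame",
)
print(cta_test.statement())
```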

2. Calculate Sample Size Before You Start

Tool: optimizely.com/sample-size-calculator (free). Enter:

Baseline conversion rate (e.g. 3%)
Minimum detectable effect (e.g. 15%, relative to the baseline)
Statistical significance (95% is standard)

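The calculator returns the number of visitors you need per variant. For small differences that is often around 5,000–15,000 visitors per variant, and it climbs further as the baseline rate or the effect you want to detect gets smaller. If you cannot reach that number · the test is too small to be valid.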
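If you want to see what a calculator does under the hood, here is the classic fixed-horizon formula for comparing two conversion rates, as a Python sketch. Assumptions to note: the MDE is treated as relative to the baseline, the test is two-sided, and statistical power is set to 80%, a common default the steps above do not mention. Dedicated calculators, including Optimizely's, may use different statistical methods and return different numbers; `visitors_per_variant` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline: float, relative_mde: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Fixed-horizon sample size per variant for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # the rate you want to be able to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% significance
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


print(visitors_per_variant(baseline=0.03, relative_mde=0.15))
```

With the example inputs (3% baseline, 15% relative MDE) this formula asks for roughly 24,000 visitors per variant, a useful reminder of how fast sample sizes grow at low baseline rates.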

3. Test One Thing Per Test

If you change the headline, color, and image at the same time, you will not know what drove the difference. One element per test. If you want to test more · run them sequentially, not in parallel.

4. Let the Test Run a Full Cycle · No Shortcuts

Minimum 1 week, to cover a full weekly cycle of user behavior (Tuesday behaves differently from Saturday). Ideally 2–4 weeks. Do not stop the test early even if you see an "obvious winner" · checking repeatedly and stopping the moment a result looks significant inflates the false-positive rate far above the 5% you planned for.
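Turning the required sample size into a run length is simple arithmetic. A Python sketch, assuming steady daily traffic split evenly across variants; `planned_duration_days` and the traffic figures are hypothetical:

```python
from math import ceil


def planned_duration_days(visitors_per_variant: int, variants: int,
                          daily_visitors: int, min_days: int = 7) -> int:
    """Days needed to reach the planned sample, never less than one week,
    rounded up to whole weeks so every weekday is represented equally."""
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    needed = visitors_per_variant * variants
    days = max(ceil(needed / daily_visitors), min_days)
    return ceil(days / 7) * 7


# 10,000 visitors per variant, 2 variants, 1,500 visitors a day.
print(planned_duration_days(10_000, 2, 1_500))
```

Decide this number before launch and stop on that date, not when the dashboard looks good.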

5. Interpret Results with Statistical Significance

Most A/B tools report statistical significance (a p-value). Rule: do not make a decision if the p-value is above 0.05 (i.e. below 95% confidence). Above that threshold · the difference may be pure chance.
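For intuition about where that p-value comes from, here is a pooled two-proportion z-test, one common way to compare two conversion rates. It is a sketch, not necessarily what any particular tool runs (some use Bayesian or sequential methods); `two_proportion_p_value` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for 'A and B have the same conversion rate' (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all (e.g. zero conversions): no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = two_proportion_p_value(conv_a=100, n_a=3_334, conv_b=120, n_b=3_334)
print(f"p = {p:.3f}", "-> significant at 95%" if p < 0.05 else "-> not significant at 95%")
```

About 100 vs. 120 conversions on 3,334 visitors each is a 20% relative lift, yet the p-value lands well above 0.05: exactly the kind of "win" the rule above tells you not to act on.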

Tools · What to Use

Google Optimize · shut down in September 2023 (RIP), look for an alternative
VWO (free up to 50,000 visitors/month) · solid for small businesses
Convert.com · for mid-sized businesses with a serious CRO budget
PostHog Experiments (free up to 1M events) · open source, great for tech companies
Server-side A/B testing (custom code) · most flexible, requires a developer
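If you go the server-side route, the core piece is stable assignment: the same user must always land in the same variant, and traffic should split evenly. A minimal Python sketch that hashes the user ID together with the experiment name; `assign_variant`, the user IDs and the experiment name are hypothetical, and a production setup would also need exposure logging.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("A", "B")) -> str:
    """Deterministically bucket a user so they always see the same variant."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    # First 8 bytes of SHA-256 as an unsigned integer: stable across processes,
    # unlike Python's built-in hash(), which is randomized per run for strings.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return variants[bucket % len(variants)]


counts = Counter(assign_variant(f"user-{i}", "cta-copy-test") for i in range(10_000))
print(counts)  # should be close to 50/50
```

Including the experiment name in the hash means a user's bucket in one test does not dictate their bucket in the next.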

Most Common Mistakes · That Turn Tests Into Noise

Stopping the test when one variant takes an early 'lead' (a one-day conversion spike is not a signal)
Running a test during seasonal anomalies (Black Friday, New Year, public holidays)
Uneven traffic split (70/30 instead of the planned 50/50)
Cherry-picking metrics (you tested conversions, but report CTR because that's where you 'won')
Not documenting the test (three months later you have no idea what you tested or why)
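An uneven split you did not plan for (a sample ratio mismatch) usually points to a bug in assignment or tracking, and it can invalidate the results. A quick check is a chi-square goodness-of-fit test on visitor counts. Python sketch; `srm_p_value` is a hypothetical helper, and the 0.001 cut-off in the example is a commonly used convention rather than a universal rule.

```python
from math import erfc, sqrt


def srm_p_value(visitors_a: int, visitors_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 degree of freedom) for the traffic split."""
    total = visitors_a + visitors_b
    exp_a = total * expected_share_a
    exp_b = total * (1 - expected_share_a)
    chi2 = (visitors_a - exp_a) ** 2 / exp_a + (visitors_b - exp_b) ** 2 / exp_b
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


for a, b in [(5_030, 4_970), (7_000, 3_000)]:
    p = srm_p_value(a, b)
    print(a, b, "SRM suspected" if p < 0.001 else "split looks fine", f"(p={p:.3g})")
```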

Harsh reality: for a small business with fewer than 5,000 monthly visitors, a formal A/B test usually is not feasible. Better to focus on obvious fixes (weak hero, poor CTA, slow site) that do not need statistics to prove they work. Fixing the 10 most common mistakes that lose you customers is usually the highest-ROI place to start.

A/B testing is a powerful tool when applied correctly · with sufficient traffic, a clear hypothesis, one change per test, and a full run cycle. Without those elements, you are making decisions based on noise · which is worse than not collecting data at all.

See how we approach CRO and testing →