How to A/B Test Cold Emails (And What to Actually Test)
Most A/B tests are statistically meaningless — not enough emails per variant or testing the wrong things. Here is what to test, in what order, and how many sends you actually need.
You send 100 emails with Subject Line A and 100 with Subject Line B. Subject A gets 45 opens. Subject B gets 38 opens. You declare A the winner and roll it out to 500 more prospects.
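Statistically, you just made a decision based on noise. If the two subject lines performed identically, a gap of 7 opens or more on 100 emails per variant would still show up about 32% of the time through random chance alone (two-sided two-proportion z-test, p ≈ 0.32). That is nowhere near the conventional 5% threshold. You might be optimizing toward the wrong subject line based on data that can't be told apart from luck.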
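To check the 45-versus-38 example yourself, one common option is a pooled two-proportion z-test. The sketch below is a minimal version using only the Python standard library. The function name `two_proportion_p_value` is made up for illustration, and the normal approximation it relies on gets shaky when counts are very small.

```python
from statistics import NormalDist


def two_proportion_p_value(successes_a, sends_a, successes_b, sends_b):
    """Two-sided p-value for the gap between two rates (pooled z-test)."""
    rate_a = successes_a / sends_a
    rate_b = successes_b / sends_b
    # Pooled rate: the best single estimate if both variants were identical.
    pooled = (successes_a + successes_b) / (sends_a + sends_b)
    std_err = (pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b)) ** 0.5
    if std_err == 0:
        return 1.0  # both rates are 0% or 100%: no evidence of a difference
    z = (rate_a - rate_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The example from the text: 45 opens vs 38 opens, 100 sends each.
p = two_proportion_p_value(45, 100, 38, 100)
print(f"p-value: {p:.2f}")
```

The p-value comes out at roughly 0.3, far above the usual 0.05 cutoff, so the test gives no solid reason to prefer Subject A.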
Here's how to A/B test cold emails correctly.
The One Variable Rule
Change exactly one thing per test. Not two. Not "subject line and opener together." One variable. If you change the subject line and the opener simultaneously and reply rate goes up, you have no idea which change caused the improvement. You can't replicate it. The test was wasted. Lock in winners one variable at a time.
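If your variants live in a script or spreadsheet export, a small guard can enforce the one-variable rule before anything is sent. This is only a sketch: the field names (`subject`, `opener`, `cta`, `body`) and the sample copy are hypothetical, so adapt them to however you store your templates.

```python
def changed_fields(variant_a, variant_b):
    """Return the sorted names of fields whose values differ between variants."""
    keys = set(variant_a) | set(variant_b)
    return sorted(k for k in keys if variant_a.get(k) != variant_b.get(k))


def assert_single_variable(variant_a, variant_b):
    """Raise ValueError unless exactly one field differs; return that field."""
    diff = changed_fields(variant_a, variant_b)
    if len(diff) != 1:
        raise ValueError(f"expected exactly 1 changed field, found {len(diff)}: {diff}")
    return diff[0]


# Hypothetical templates: only the subject line differs.
control = {
    "subject": "Quick question about your onboarding",
    "opener": "Saw you just opened a second office.",
    "cta": "Worth a look?",
    "body": "(shared body copy)",
}
variant = {**control, "subject": "Your onboarding flow"}

print(assert_single_variable(control, variant))  # subject
```

Changing both `subject` and `cta` in the variant would raise a `ValueError` that lists both fields, which is the signal to split the idea into two separate tests.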
What to Test First (Priority Order)
1. Subject Line — controls your open rate. Test length (short vs medium), format (question vs statement), personalization (with company name vs without), and pattern type (observation vs compliment vs gap-spotting). Minimum 200-300 emails per variant. Recommended: 500+.
2. Opener — controls your reply rate. Test: business observation vs trigger event vs compliment vs direct relevance vs problem-first. Minimum 300 emails per variant. Recommended: 500+.
3. CTA — controls conversion from read to reply. Test: soft ask ("Worth a look?") vs direct ask ("Can we book 15 minutes?") vs value-first ("Can I send you an example?") vs no ask (statement close). Minimum 500 emails per variant. Recommended: 1,000+.
4. Body Length — refinement. Test short (50-80 words) vs medium (80-125 words) vs long (125-175 words). Data suggests 80-125 words is the sweet spot, but your audience may differ. Test this last — by the time you're optimizing body length, your subject lines, openers, and CTAs should already be working.
How to Read Results: Winner vs Noise
- Under 200 emails/variant: Don't even look at the results. The data isn't meaningful yet.
- 200-500 emails/variant: A 20%+ relative difference (say, a 40% open rate vs 48%) is probably real. A 5-10% relative difference is noise.
- 500-1,000 emails/variant: A 10-15% relative difference is probably real.
- 1,000+ emails/variant: Even 5-10% differences start to be meaningful.
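To see where thresholds like these come from, you can estimate the smallest relative gap that would reach significance at a given sample size. The sketch below assumes a two-sided 5% significance level and, as a simplification, uses the baseline rate for both variants when computing the standard error. The 40% open rate and 5% reply rate are illustrative assumptions, not figures from this article, and `min_significant_relative_gap` is a made-up name.

```python
from statistics import NormalDist


def min_significant_relative_gap(baseline_rate, sends_per_variant, alpha=0.05):
    """Approximate smallest relative gap that reaches two-sided significance.

    Simplification: the standard error is computed from the baseline rate
    alone, so treat the result as a rough guide rather than an exact bound.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_err = (2 * baseline_rate * (1 - baseline_rate) / sends_per_variant) ** 0.5
    return z_crit * std_err / baseline_rate


for n in (200, 500, 1000):
    opens = min_significant_relative_gap(0.40, n)    # assumed 40% open rate
    replies = min_significant_relative_gap(0.05, n)  # assumed 5% reply rate
    print(f"{n:>5} sends/variant: opens {opens:.0%}, replies {replies:.0%}")
```

Under these assumptions the open-rate figures land at roughly 24%, 15%, and 11%, which is in line with the rules of thumb above. The reply-rate figures are much larger (still above 35% at 1,000 sends), so tests judged on replies need bigger lists or bigger differences before a winner can be called.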
Common A/B Testing Mistakes
- Testing too many things at once — you learn nothing.
- Calling a winner too early — a 10% difference after 50 sends per variant is meaningless.
- Testing the wrong things first — fix subject lines before testing body length.
- Not segmenting test groups — split your list randomly between variants.
- Testing on a dirty list — if your bounce rate is 10%, your A/B test data is garbage.
- Ignoring external factors — send both variants at the same time, same day, to similar segments.
A Simple A/B Testing Workflow
- Define the test: "I'm testing whether a short subject line (under 40 chars) or a medium subject line (50-60 chars) generates higher open rates."
- Set your sample size: 500 emails per variant (1,000 total).
- Split your list randomly.
- Send both variants at the same time, on the same day, to segments of similar quality.
- Wait at least 72 hours for opens and replies to come in.
- Compare results — is the difference large enough to be real?
- If winner: lock it in, test the next variable. If tie: keep default, test something else.
- Document everything: winning variant, sample size, absolute numbers, date. Build a knowledge base.
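The "split your list randomly" step is the one most often done by hand (first half vs second half, or alphabetically), which can bias the groups. A minimal sketch of a seeded random split, assuming your prospects are in a plain Python list; the addresses and the `split_randomly` helper are placeholders:

```python
import random


def split_randomly(prospects, seed=None):
    """Shuffle a copy of the list and cut it into two near-equal groups."""
    shuffled = list(prospects)  # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Placeholder list of 1,000 prospects: 500 per variant.
prospects = [f"prospect{i}@example.com" for i in range(1000)]
group_a, group_b = split_randomly(prospects, seed=42)
print(len(group_a), len(group_b))  # 500 500
```

Passing a fixed `seed` makes the split reproducible, which helps when you document the test and want to reconstruct later which prospect received which variant.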
XSendFlow supports multi-variant campaigns — set up Variant A and Variant B side by side, split your list automatically, and see real-time stats per variant. Once a winner is clear, roll it out to the rest of your list with one click.