June 5, 2026

A/B Testing Holdout Groups Explained: How to Measure the True Impact of Your Experimentation Program

Holdout groups reserve a small slice of traffic from all experiments so you can measure the cumulative impact of your testing program — here's how they work and when you need one.

You ran 20 A/B tests last quarter. Eight of them won. You shipped the winners and tallied up the lifts: +4% here, +7% there, +12% on that pricing page tweak. Add it all up and your testing program delivered a 31% conversion lift.

Except it didn't. Summing individual experiment wins almost always overstates real impact. Effects interact. Some wins cancel each other out. Seasonal trends inflate numbers. The only reliable way to know how much your entire experimentation program actually moved the needle is a holdout group.

What Is a Holdout Group?

A holdout group is a small percentage of your traffic — typically 2-5% — that never sees any experiment. While 95-98% of your visitors experience every winning variant, feature rollout, and optimization you ship, the holdout group sees only your original, untouched experience.

After weeks or months, you compare overall performance between the two groups. The difference is your experimentation program's true cumulative lift — not the sum of individual wins, but the real, measured impact on your business metrics.

Companies like Microsoft, Meta, and Booking.com use global holdouts to justify experimentation budgets and prove ROI to leadership. Optimizely launched self-service Global Holdouts in March 2026, making the technique more accessible beyond big tech.

Holdout Groups vs. Control Groups: What's the Difference?

Every A/B test has a control group — the users who see the original version of the specific element you're testing. A holdout group is different. It's a global control that spans your entire testing program, not just one experiment.

Feature        A/B Test Control Group   Holdout Group
Scope          One experiment           All experiments
Duration       Days to weeks            Weeks to quarters
Purpose        Test one change          Measure cumulative program impact
Traffic share  50% of test traffic      2-5% of all traffic

This distinction matters because individual test results can be misleading. A headline change might win in isolation, but when combined with a new CTA color and a layout shift from two other experiments, the interactions could reduce or amplify the effect. Holdout groups capture these interactions automatically.

When Should You Set Up a Holdout Group?

Holdout groups are not for everyone. They make sense when your experimentation program has matured beyond occasional one-off tests. Here are the signals that you're ready:

  • You run 3+ tests per month. With fewer tests, the cumulative difference between holdout and non-holdout groups will be too small to measure reliably.
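  • You have enough traffic. Reserving 5% of 1,000 monthly visitors gives you just 50 visitors in the holdout, far too few to detect a realistic lift. As a rough floor, you need at least 10,000-20,000 monthly visitors before holdouts become practical, and the exact requirement depends on your baseline conversion rate and the size of the lift you expect to see.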
  • Leadership asks about ROI. If your VP asks "what has our testing program actually delivered?" and you can only point to a spreadsheet of individual test results, a holdout gives you a single, defensible number.
  • You ship multiple changes simultaneously. When feature flags and experiments overlap, a holdout is the most direct way to measure their combined impact.

If you're still in the early stages — running your first few tests to learn what moves the needle — focus on getting the fundamentals right before adding holdout complexity.

How to Set Up a Holdout Group

  1. Choose your holdout percentage. 5% is the standard starting point. Go lower (2-3%) if you can't afford to withhold optimizations from too many users. Optimizely warns you if your holdout exceeds 5%.
  2. Define your primary metric. Revenue per visitor, conversion rate, or activation rate — pick the metric your program is designed to move.
  3. Assign users persistently. Holdout assignment must be sticky. If a user is in the holdout today, they must stay there for the entire measurement period. Cookie-based or user-ID-based hashing works well.
  4. Run for at least one full quarter. Short holdouts don't accumulate enough experiment wins to show a meaningful gap.
  5. Rotate or refresh periodically. After each measurement period, release the holdout group so those users benefit from your optimizations, and draw a new holdout sample.
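
Steps 3 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool implements holdouts: the function name in_holdout, the salt string, and the 10,000-bucket scheme are all assumptions made for the example. Hashing a stable user ID together with a salt keeps assignment sticky without storing anything, and changing the salt draws a fresh holdout sample for the next measurement period.

```python
import hashlib

HOLDOUT_PERCENT = 5  # assumed holdout size, matching the 5% starting point
HOLDOUT_SALT = "global-holdout-period-1"  # hypothetical salt; change it to draw a new holdout


def in_holdout(user_id: str, salt: str = HOLDOUT_SALT, percent: float = HOLDOUT_PERCENT) -> bool:
    """Return True if this user falls in the holdout for the given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Turn the first 32 bits of the hash into a bucket in [0, 10000).
    bucket = int(digest[:8], 16) % 10_000
    # With percent=5, buckets 0-499 form the holdout.
    return bucket < percent * 100


# The same user always gets the same answer for the same salt.
print(in_holdout("user-123"))
```

Every surface that serves an experiment or a shipped winner would need to call the same check before rendering, so holdout users keep seeing the original experience.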

Tools like Statsig, Eppo, LaunchDarkly, and Optimizely offer built-in holdout management. If you're using PageDuel for your A/B tests, you can track which users see winning variants and manually segment a holdout cohort in your analytics tool while your experimentation program matures.

Common Holdout Pitfalls

Even well-run holdouts can go wrong. Watch for these issues:

  • Holdout contamination. If any team ships a change to holdout users — even a bug fix that touches the experiment surface — the holdout is compromised. Every team must respect the holdout boundary.
  • Too-small sample size. A 2% holdout on a low-traffic site produces noisy data. Use a sample size calculator to verify you'll have enough statistical power to detect the lift you expect within your measurement window.
  • Novelty effects. Early holdout comparisons may overstate impact because non-holdout users benefit from the novelty of changes. Wait until post-novelty stabilization before drawing conclusions.
  • Survivorship bias. If your analysis only includes users who were still active at the end of the period, you're excluding users who churned, potentially the ones most affected by your experiments. Compare everyone who was assigned at the start of the period, in both groups.
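
For the sample size pitfall, here is a rough sketch of what a calculator does under the hood. It assumes a conversion-rate metric, a two-sided z-test for two proportions, and the standard normal-approximation formula adjusted for unequal group sizes. The function name and default values are illustrative, not taken from any specific tool.

```python
import math
from statistics import NormalDist


def holdout_visitors_needed(baseline_rate, relative_lift,
                            holdout_share=0.05, alpha=0.05, power=0.80):
    """Approximate number of holdout visitors needed to detect a relative lift."""
    p_holdout = baseline_rate
    p_exposed = baseline_rate * (1 + relative_lift)
    # Exposed visitors per holdout visitor (19 for a 5% holdout).
    ratio = (1 - holdout_share) / holdout_share
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # The exposed group is much larger, so it adds little variance.
    variance = p_holdout * (1 - p_holdout) + p_exposed * (1 - p_exposed) / ratio
    n = (z_alpha + z_beta) ** 2 * variance / (p_exposed - p_holdout) ** 2
    return math.ceil(n)


# Hypothetical inputs: 3% baseline conversion, hoping to detect a 6% relative lift.
needed = holdout_visitors_needed(baseline_rate=0.03, relative_lift=0.06)
print(f"Holdout visitors needed: {needed:,}")
print(f"Total visitors needed:   {math.ceil(needed / 0.05):,}")
```

The required traffic falls quickly as the expected lift or the baseline rate goes up, so run the numbers with your own inputs before committing to a measurement window.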

What Does a Holdout Result Look Like?

After running a global holdout for a quarter, you might find that your experimentation program delivered a 6% lift in revenue per visitor — even though the sum of individual test wins suggested 18%. That 6% is the real number. It accounts for interaction effects, novelty decay, and all the tests that won on a proxy metric but didn't move the bottom line.
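
To make that concrete, here is a minimal sketch of the end-of-quarter comparison. It uses conversion rate rather than revenue per visitor to keep the math simple, the counts are invented for illustration, and the confidence interval relies on a normal approximation for the difference between two proportions.

```python
import math
from statistics import NormalDist


def holdout_lift(holdout_conversions, holdout_visitors,
                 exposed_conversions, exposed_visitors, confidence=0.95):
    """Compare conversion rates between the holdout and everyone else."""
    p_h = holdout_conversions / holdout_visitors
    p_e = exposed_conversions / exposed_visitors
    diff = p_e - p_h
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_h * (1 - p_h) / holdout_visitors
                   + p_e * (1 - p_e) / exposed_visitors)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "relative_lift": diff / p_h,
        "abs_diff": diff,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }


# Hypothetical quarter: 5% holdout converting at 3.00%, everyone else at 3.18%.
result = holdout_lift(1_500, 50_000, 30_210, 950_000)
print(f"Relative lift: {result['relative_lift']:.1%}")
print(f"Absolute difference CI: {result['ci_low']:.4f} to {result['ci_high']:.4f}")
```

If the interval on the absolute difference includes zero, the holdout has not yet shown a measurable program-level effect, whatever the individual test results suggested.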

That 6% is also the number that justifies your statistical methodology, your tools budget, and your team's headcount. It's the number that turns experimentation from a cost center into a proven growth engine.

Start Testing, Then Measure the Program

Holdout groups are the gold standard for experimentation program measurement — but you need a testing program to measure first. If you haven't started running experiments yet, PageDuel lets you launch your first A/B test in minutes, for free. Once you're running tests consistently, holdouts will tell you exactly how much they're worth.
