The Winning Test That Lied

Variant B 'won' by 11%. The test also ran straight through a holiday sale.

The Workshop · 4.6 (892) · 8,920 taken · 45m · Data Scientist

The situation

The product team is ready to roll out a new checkout flow because the A/B test shows Variant B lifting conversion by 11%. There's a problem, though, and you spotted it in the logs: the test ran straight through a four-day sitewide promotion, the traffic split drifted when a load-balancer rule changed mid-test, and mobile users were disproportionately bucketed into B. The PM wants to ship today and take the win to leadership. You have to figure out whether the lift is real, quantify how much the confounds could explain, and then either greenlight it honestly or kill a result everyone is already celebrating.
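
A standard first check on a drifted traffic split is a sample-ratio mismatch (SRM) test: a chi-square goodness-of-fit test of the visitor counts per arm against the intended allocation. Below is a minimal sketch that assumes an intended 50/50 split. The function name `srm_p_value`, the visitor counts, and the 0.001 alert threshold are illustrative assumptions and do not come from the scenario's logs.

```python
from math import erfc, sqrt


def srm_p_value(n_a: int, n_b: int, expected_share_b: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-arm traffic split.

    n_a, n_b: visitors actually bucketed into A and B.
    expected_share_b: intended fraction of traffic for B (0.5 for a 50/50 test).
    A very small p-value means the observed split is unlikely under the
    intended allocation, i.e. a sample-ratio mismatch (SRM).
    """
    total = n_a + n_b
    expected_b = total * expected_share_b
    expected_a = total - expected_b
    chi2 = (n_a - expected_a) ** 2 / expected_a + (n_b - expected_b) ** 2 / expected_b
    # Two arms give 1 degree of freedom, where the chi-square survival
    # function reduces to erfc(sqrt(chi2 / 2)), so no SciPy is needed.
    return erfc(sqrt(chi2 / 2))


# Hypothetical visitor counts for a test that was meant to run 50/50.
p = srm_p_value(n_a=50_000, n_b=51_500)
print(f"SRM p-value: {p:.2e}")
if p < 0.001:  # assumed, deliberately conservative alert threshold
    print("Sample-ratio mismatch: treat the headline lift as suspect.")
```

You can run the same check separately on the days before and after the load-balancer change to find when the split started to drift. An SRM result only shows that assignment broke. It does not show which way the resulting bias runs.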

What you'll practice

Identifies the specific confounds (promo overlap, sample-ratio mismatch, segment imbalance) and why each threatens validity.

Attempts to isolate the treatment effect (e.g., segment/pre-post controls, SRM check) rather than trusting the headline lift.

Reaches a defensible ship / don't-ship / re-test decision with the uncertainty stated.

Communicates the verdict without either rubber-stamping or needlessly torching a real win.
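
One way to test the segment-imbalance confound is post-stratification (direct standardization). You reweight each arm's per-segment conversion rates to a common segment mix and compare the result with the pooled, headline lift. The minimal sketch below also puts a normal-approximation (Wald) interval on each segment's difference, so the uncertainty is stated explicitly. The `Cell` type, the function names, and every count are hypothetical. The numbers are contrived so that B converts exactly like A within each device segment, and only B's mobile-heavy mix produces a lift.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Cell:
    """Visitors and conversions for one arm within one stratum (hypothetical data)."""
    visitors: int
    conversions: int

    @property
    def rate(self) -> float:
        return self.conversions / self.visitors


def pooled_rate(cells: dict[str, Cell]) -> float:
    """Conversion rate ignoring strata, as a headline dashboard would compute it."""
    return sum(c.conversions for c in cells.values()) / sum(c.visitors for c in cells.values())


def naive_lift(a: dict[str, Cell], b: dict[str, Cell]) -> float:
    """Relative lift of B over A from pooled rates."""
    return pooled_rate(b) / pooled_rate(a) - 1


def stratified_lift(a: dict[str, Cell], b: dict[str, Cell]) -> float:
    """Relative lift after reweighting both arms to the same pooled stratum mix."""
    total = sum(a[s].visitors + b[s].visitors for s in a)
    weights = {s: (a[s].visitors + b[s].visitors) / total for s in a}
    rate_a = sum(w * a[s].rate for s, w in weights.items())
    rate_b = sum(w * b[s].rate for s, w in weights.items())
    return rate_b / rate_a - 1


def diff_ci(a: Cell, b: Cell, z: float = 1.96) -> tuple[float, float]:
    """Wald (normal-approximation) interval for the absolute rate difference B - A."""
    diff = b.rate - a.rate
    se = sqrt(a.rate * (1 - a.rate) / a.visitors + b.rate * (1 - b.rate) / b.visitors)
    return diff - z * se, diff + z * se


# Hypothetical counts: identical per-segment rates in both arms, but B is mobile-heavy.
arm_a = {"desktop": Cell(6000, 300), "mobile": Cell(4000, 320)}
arm_b = {"desktop": Cell(4000, 200), "mobile": Cell(6000, 480)}

print(f"naive lift:      {naive_lift(arm_a, arm_b):.1%}")       # 9.7%
print(f"stratified lift: {stratified_lift(arm_a, arm_b):.1%}")  # 0.0%
for segment in arm_a:
    low, high = diff_ci(arm_a[segment], arm_b[segment])
    print(f"{segment}: B - A in [{low:+.4f}, {high:+.4f}]")
```

You can apply the same idea to the promo-overlap confound by switching the stratum key from device to test period (promo days versus non-promo days). If B's interval excludes zero only on promo days, the lift may not generalize beyond the sale. The Wald interval is a simplification. For small strata or rates near 0 or 1, a better-calibrated interval would be preferable. Stratification also corrects only for imbalances you have measured.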

The room

3 autonomous AI coworkers, each with their own agenda. They won't all agree.

Hannah Bell

Product Manager

Wants: Has already told her boss B won; needs the launch to justify a quarter of work.

Style: Optimistic, momentum-driven, hears 'caveats' as 'obstruction'.

Wei Chen

Growth Engineer

Wants: Built the experiment harness and is embarrassed about the traffic-split drift.

Style: Honest, technical, willing to admit the setup was flawed.

Greg Donnelly

VP Marketing

Wants: Ran the promo and wants credit for the conversion bump; resists any analysis that 'blames the sale'.

Style: Charismatic, territorial about attribution.

Your workspace

Real tools, pre-seeded with context. You're not roleplaying, you're working.

Code / IDE · Docs / wiki · Team chat

Scored on

Analytical rigor · Validity · Communication · Actionability

More in Data

The Number That Cratered at 3 AM

medium·35m·Free

A Dashboard the CEO Will Actually Act On

medium·40m·19 cr

The Numbers Don't Tie, and the Board Meets at Nine

expert·50m·39 cr

Who's Leaving, and the One Thing That Would Stop Them

medium·40m·19 cr