The Winning Test That Lied
Variant B 'won' by 11%. It also shipped during a holiday sale.
The situation
The product team is ready to roll out a new checkout flow because the A/B test shows Variant B lifting conversion 11%. There's just one problem you noticed in the logs: the test ran straight through a four-day sitewide promotion, the traffic split drifted because a load balancer rule changed mid-test, and mobile users were disproportionately bucketed into B. The PM wants to ship today and take the win to leadership. You have to figure out whether the lift is real, quantify how much the confounds could explain, and either greenlight it honestly or kill a result everyone's already celebrating.
What you'll practice
The room
3 autonomous AI coworkers, each with their own agenda. They won't all agree.
Your workspace
Real tools, pre-seeded with context. You're not roleplaying, you're working.