Your new ranking model is up 8% on metrics — and quietly harming a user segment

The launch is greenlit, but a fairness check just flagged a regression you can't unsee.

FAtlas Guild · 4.2 (656) · 4,592 taken · 40m · ML Engineering Lead

The situation

Your new recommendation model lifts engagement 8% in the A/B test and leadership has already greenlit a full rollout for tomorrow. Hours before launch, a fairness audit shows the model systematically under-serves a protected user segment, and the offline metric that flagged it is noisy enough to argue about. Product wants the engagement win, your data scientist wants to halt, and you own the call on whether — and how — to ship.

What you'll practice

Decide ship / hold / mitigate under metric uncertainty

Distinguish noise from a real fairness regression

Weigh the engagement win against harm to a user segment

Define guardrails and monitoring for whatever you ship

For each, show it clearly, with evidence a reviewer can point to.
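
One way to approach the noise-versus-real-regression question is a bootstrap confidence interval on the gap between the affected segment and everyone else. The sketch below is illustrative only: it assumes you have a per-user metric delta (new model minus control) for each group, and the function names, the 95% level, and the resample count are all hypothetical choices, not part of the scenario's tooling.

```python
import random


def bootstrap_gap_ci(segment, others, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(segment) - mean(others).

    `segment` and `others` are lists of per-user metric deltas
    (assumed inputs; the scenario does not specify a data format).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    gaps = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        s = rng.choices(segment, k=len(segment))
        o = rng.choices(others, k=len(others))
        gaps.append(sum(s) / len(s) - sum(o) / len(o))
    gaps.sort()
    lo = gaps[int((alpha / 2) * n_boot)]
    hi = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def gap_is_distinguishable(segment, others, **kwargs):
    """True when the bootstrap interval for the gap excludes zero."""
    lo, hi = bootstrap_gap_ci(segment, others, **kwargs)
    return hi < 0 or lo > 0
```

An interval that excludes zero suggests the gap is unlikely to be sampling noise alone; an interval that straddles zero means the data cannot rule noise out. Neither result settles the ship / hold / mitigate call by itself, but it gives a reviewer concrete evidence to point to.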

The room

4 autonomous AI coworkers, each with their own agenda. They won't all agree.

Dr. Okafor

Senior Data Scientist

Wants: To block the launch; convinced the fairness regression is real and serious.

Style: Rigorous, principled

Blake

Product Lead

Wants: To ship the 8% win; argues the fairness metric is noisy and unproven.

Style: Growth-driven, persuasive

Sunita

ML Engineer

Wants: To build a mitigation (re-weighting), though it will cost some of the engagement gain.

Style: Solution-oriented

Reggie

Responsible-AI Counsel

Wants: To avoid the reputational and regulatory exposure of shipping a known disparity.

Style: Cautious, formal

Your workspace

Real tools, pre-seeded with context. You're not roleplaying, you're working.

Code / IDE · Kanban board · Docs / wiki · Team chat

Scored on

Decision quality · Evidence usage · Stakeholder handling · Written clarity
