ML Engineering Lead
Engineering · expert · 29 credits

Your new ranking model is up 8% on metrics — and quietly harming a user segment

The launch is greenlit, but a fairness check just flagged a regression you can't unsee.

FAtlas Guild · 4.2 (656) · 4,592 taken · 40m · ML Engineering Lead

The situation

Your new recommendation model lifts engagement 8% in the A/B test and leadership has already greenlit a full rollout for tomorrow. Hours before launch, a fairness audit shows the model systematically under-serves a protected user segment, and the offline metric that flagged it is noisy enough to argue about. Product wants the engagement win, your data scientist wants to halt, and you own the call on whether — and how — to ship.

What you'll practice

Decide ship / hold / mitigate under metric uncertainty
Distinguish noise from a real fairness regression
Weigh the engagement win against harm to a user segment
Define guardrails and monitoring for whatever you ship

For each, show it clearly, with evidence a reviewer can point to.
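One common way to separate noise from a real regression is to put a confidence interval around the segment-level gap instead of arguing over a point estimate. The sketch below is only an illustration under stated assumptions: it assumes you have a per-user outcome (for example, relevant items served) for the affected segment under both the control and the new model, and it uses a percentile bootstrap. The function names `bootstrap_diff_ci` and `classify_gap` and the sample data are hypothetical; the scenario does not specify which metric or test the audit used.

```python
import random
import statistics


def bootstrap_diff_ci(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treatment) - mean(control).

    `control` and `treatment` are per-user outcomes for the SAME segment
    under the old and new model (an assumption of this sketch).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each arm with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def classify_gap(lo, hi):
    """Label the interval: entirely below zero, entirely above, or straddling."""
    if hi < 0:
        return "regression"
    if lo > 0:
        return "improvement"
    return "inconclusive"


# Hypothetical data: 1.0 = user was served relevant items, 0.0 = not.
control = [1.0] * 50 + [0.0] * 50
treatment = [1.0] * 10 + [0.0] * 90
lo, hi = bootstrap_diff_ci(control, treatment)
print(classify_gap(lo, hi), round(lo, 3), round(hi, 3))
```

An interval that sits entirely below zero is evidence a reviewer can point to; one that straddles zero supports "collect more data" rather than either "ship" or "block".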

The room

4 autonomous AI coworkers, each with their own agenda. They won't all agree.

Dr. Okafor
Senior Data Scientist
Wants: To block the launch; convinced the fairness regression is real and serious.
Style: Rigorous, principled
Blake
Product Lead
Wants: The 8% win shipped; argues the fairness metric is noisy and unproven.
Style: Growth-driven, persuasive
Sunita
ML Engineer
Wants: To build a mitigation (re-weighting), though it will cost some of the engagement gain.
Style: Solution-oriented
Reggie
Responsible-AI Counsel
Wants: To avoid the reputational and regulatory exposure of shipping a known disparity.
Style: Cautious, formal
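
The re-weighting mitigation Sunita mentions is not specified in the scenario. As a rough illustration only, one simple variant is inverse-frequency sample weighting, where each training example is weighted so that every segment contributes equally to the loss. The function name `inverse_frequency_weights` and the segment labels are hypothetical, and a real mitigation might use a different scheme entirely.

```python
from collections import Counter


def inverse_frequency_weights(segments):
    """Return one weight per example so each segment's weights sum to n / k.

    n is the number of examples and k the number of distinct segments,
    so the weights also sum to n overall (the average weight is 1.0).
    """
    counts = Counter(segments)
    n, k = len(segments), len(counts)
    return [n / (k * counts[s]) for s in segments]


# Hypothetical labels: three majority-segment users, one minority-segment user.
weights = inverse_frequency_weights(["majority"] * 3 + ["minority"])
print(weights)
```

Up-weighting the smaller segment pulls the model toward serving it better, which is also why some of the aggregate engagement gain is likely to be given back.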

Your workspace

Real tools, pre-seeded with context. You're not roleplaying, you're working.

Code / IDE · Kanban board · Docs / wiki · Team chat

Scored on

Decision quality · Evidence usage · Stakeholder handling · Written clarity
