ML Engineering Lead
Engineering · expert · 29 credits
Your new ranking model is up 8% on metrics — and quietly harming a user segment
The launch is greenlit, but a fairness check just flagged a regression you can't unsee.
FAtlas Guild · 4.2 (656) · 4,592 taken · 40m · ML Engineering Lead
The situation
Your new recommendation model lifts engagement 8% in the A/B test and leadership has already greenlit a full rollout for tomorrow. Hours before launch, a fairness audit shows the model systematically under-serves a protected user segment, and the offline metric that flagged it is noisy enough to argue about. Product wants the engagement win, your data scientist wants to halt, and you own the call on whether — and how — to ship.
What you'll practice
Decide ship / hold / mitigate under metric uncertainty. Show it clearly — with evidence a reviewer can point to.
Distinguish noise from a real fairness regression. Show it clearly — with evidence a reviewer can point to.
Weigh the engagement win against harm to a user segment. Show it clearly — with evidence a reviewer can point to.
Define guardrails and monitoring for whatever you ship. Show it clearly — with evidence a reviewer can point to.
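One way to approach the noise-versus-regression question above is a bootstrap confidence interval on the gap in a per-user metric for the affected segment. The sketch below is a minimal illustration, not the scenario's own tooling: the function names, the per-user score lists, and the 95% level are all assumptions, and the data in the usage note is synthetic.

```python
import random
from statistics import mean


def bootstrap_gap_ci(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treatment) - mean(control).

    `control` and `treatment` are hypothetical lists of per-user scores
    for the affected segment under the old and new model.
    """
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    gaps = []
    for _ in range(n_resamples):
        # Resample each arm with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        gaps.append(mean(t) - mean(c))
    gaps.sort()
    lo = gaps[int((alpha / 2) * n_resamples)]
    hi = gaps[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def looks_like_regression(control, treatment, **kwargs):
    """True when the whole interval sits below zero (new model is worse)."""
    _, hi = bootstrap_gap_ci(control, treatment, **kwargs)
    return hi < 0
```

For example, with synthetic scores `control = [1.0] * 50 + [0.0] * 50` and `treatment = [1.0] * 10 + [0.0] * 90`, `looks_like_regression(control, treatment)` returns `True`. An interval that straddles zero does not show the segment is unharmed; it only means this sample cannot separate the gap from noise.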
The room
4 autonomous AI coworkers, each with their own agenda. They won't all agree.
Dr. Okafor
Senior Data Scientist
Wants: To block the launch; convinced the fairness regression is real and serious.
Style: Rigorous, principled
Blake
Product Lead
Wants: The 8% win shipped; argues the fairness metric is noisy and unproven.
Style: Growth-driven, persuasive
Sunita
ML Engineer
Wants: Can build a mitigation (re-weighting) but it'll cost some of the engagement gain.
Style: Solution-oriented
Reggie
Responsible-AI Counsel
Wants: Flags reputational and regulatory exposure if you ship a known disparity.
Style: Cautious, formal
Your workspace
Real tools, pre-seeded with context. You're not roleplaying, you're working.
Code / IDE · Kanban board · Docs / wiki · Team chat
Scored on
Decision quality · Evidence usage · Stakeholder handling · Written clarity