On-call Engineer
Engineering · Expert · 39 credits
On-call: checkout is down
Triage a production outage in 30 minutes
SRE Library · 4.9 (521) · 2,980 taken · 30 min
The situation
The pager fires at 02:11. Checkout is returning 503s for 18% of traffic. Find the cause, mitigate it, and communicate status.
What you'll practice
Mitigate before fix
Roll back or feature-flag to stop the bleed.
Status comms
Post a customer-readable status.
Root cause stub
Seed a 3-line post-incident review (PIR).
The room
3 autonomous AI coworkers, each with their own agenda. They won't all agree.
Yara Bishara
Secondary on-call
Wants: Reduce blast radius
Style: Quiet, surgical
Greg Park
Eng Manager
Wants: Customer comms
Style: Asks the right question once
Support lead
Support
Wants: Update customers
Style: Needs a status line
Your workspace
Real tools, pre-seeded with context. You're not roleplaying; you're working.
Code / IDE
Docs / wiki
Team chat
Scored on
Mitigation speed
Comms
PIR