# Incident Response and Stabilization

## Purpose

Guide high-pressure response to live or high-impact issues by separating immediate stabilization from deeper root-cause correction.

## When to use

- A production issue is actively impacting users or operators
- A regression needs containment before a complete fix is ready
- The team needs a calm sequence for triage, mitigation, and follow-up
- Communication and operational clarity matter as much as code changes

## Inputs to gather

- Current symptoms, severity, affected users, and timing
- Available logs, metrics, alerts, dashboards, and recent changes
- Safe rollback, feature flag, degrade, or traffic-shaping options
- Stakeholders who need updates and what they need to know

## How to work

- Stabilize user impact first if a safe containment path exists.
- Keep mitigation, diagnosis, and communication distinct but coordinated.
- Prefer reversible steps under uncertainty.
- Record what is confirmed versus assumed while the incident is active.
- After stabilization, convert the incident into structured debugging and prevention work.

## Output expectations

- Stabilization plan or incident response summary
- Clear mitigation status and next actions
- Follow-up work for root cause, observability, and prevention

## Quality checklist

- User impact reduction is prioritized appropriately.
- Risky irreversible changes are avoided under pressure.
- Communication is clear enough for collaborators to act.
- Post-incident follow-up is not lost after immediate recovery.

## Handoff notes

- Note what was mitigated versus actually fixed.
- Pair with debugging workflow and observability once the system is stable enough for deeper work.
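
One possible way to keep "mitigated versus actually fixed" and "confirmed versus assumed" visibly separate in a handoff is a small structured note. The sketch below is illustrative only: the names `Status`, `Finding`, and `HandoffNote`, the field layout, and the sample incident are all hypothetical and not part of this guide.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status values; adapt to your own incident process.
    MITIGATED = "mitigated"  # user impact contained, cause may remain
    FIXED = "fixed"          # root cause corrected


@dataclass
class Finding:
    text: str
    confirmed: bool  # False means the statement is still an assumption


@dataclass
class HandoffNote:
    summary: str
    status: Status
    findings: list[Finding] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the note as plain text, keeping confirmed and assumed apart."""
        lines = [f"Incident: {self.summary}", f"Status: {self.status.value}"]
        lines.append("Confirmed:")
        lines += [f"  - {f.text}" for f in self.findings if f.confirmed]
        lines.append("Assumed:")
        lines += [f"  - {f.text}" for f in self.findings if not f.confirmed]
        lines.append("Follow-up:")
        lines += [f"  - {item}" for item in self.follow_ups]
        return "\n".join(lines)


# Made-up example incident, for illustration only.
note = HandoffNote(
    summary="Elevated checkout errors",
    status=Status.MITIGATED,
    findings=[
        Finding("Errors dropped after the feature flag was disabled", confirmed=True),
        Finding("The new code path exhausts the connection pool", confirmed=False),
    ],
    follow_ups=["Root-cause the flagged code path", "Add an alert on pool saturation"],
)
print(note.render())
```

Making the status an explicit field, rather than free text, is meant to stop a contained incident from being read as a fixed one; the follow-up list carries the root-cause and observability work forward.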