Build agent instruction suite with deployment profiling

2026-03-23 14:14:19 -05:00
parent 7a09b60623
commit 41cfbb0151
44 changed files with 2281 additions and 1 deletions
@@ -0,0 +1,46 @@
+# Bug Triage
+
+## Purpose
+
+Frame a defect clearly, before or during debugging, so the team understands its impact, how to reproduce it, its suspected scope, and the next actions.
+
+## When to use
+
+- A bug report arrives with incomplete detail
+- A failing test or regression needs initial framing
+- Multiple issues compete for attention and severity matters
+- You need a reliable reproduction path before deeper debugging
+
+## Inputs to gather
+
+- Reported symptoms, expected behavior, and actual behavior
+- Reproduction steps, environment, and frequency
+- User impact, severity, and likely affected surfaces
+- Recent changes, logs, or tests that may be related
+
+## How to work
+
+- Turn vague symptoms into a concrete problem statement.
+- Reproduce the issue when possible and tighten the reproduction steps.
+- Separate confirmed facts from assumptions.
+- Estimate impact and likely blast radius before diving into fixes.
+- Identify the next best debugging step if the root cause is not yet known.
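+
+One way to tighten reproduction status is to rerun the reproduction several times and record how often the defect appears. This is a minimal sketch that assumes a hypothetical `repro` callable returning `True` when the defect manifests. The function name, the default attempt count, and the three labels (chosen to match the deterministic, intermittent, or unconfirmed states recorded at handoff) are illustrative and do not come from an existing tool.
+
+```python
+from typing import Callable
+
+
+def classify_reproduction(repro: Callable[[], bool], attempts: int = 10) -> tuple[str, float]:
+    """Rerun a reproduction and label it deterministic, intermittent, or unconfirmed.
+
+    `repro` is a hypothetical hook that returns True when the defect appears.
+    """
+    if attempts <= 0:
+        raise ValueError("attempts must be positive")
+    # Count how many of the attempts actually showed the defect.
+    failures = sum(1 for _ in range(attempts) if repro())
+    rate = failures / attempts
+    if failures == attempts:
+        status = "deterministic"
+    elif failures > 0:
+        status = "intermittent"
+    else:
+        status = "unconfirmed"
+    return status, rate
+```
+
+Zero failures in a small number of attempts does not prove the bug is gone. It only means the reproduction is not yet confirmed in this environment.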
+
+## Output expectations
+
+- Clear bug statement and reproduction status
+- Impact and severity assessment
+- Suspected scope or likely component area
+- Recommended next debugging or fix step
+
+## Quality checklist
+
+- The issue is described in observable terms rather than guesses.
+- Reproduction details are specific enough to be reused.
+- Impact is clear enough to prioritize intelligently.
+- Unknowns are named instead of hidden.
+
+## Handoff notes
+
+- Record environment details and whether the issue is deterministic, intermittent, or unconfirmed.
+- Pair with the debugging workflow once the problem is framed well enough to investigate in depth.
@@ -0,0 +1,46 @@
+# Debugging Workflow
+
+## Purpose
+
+Find the root cause efficiently and verify fixes with a disciplined workflow that avoids premature assumptions and shallow symptom treatment.
+
+## When to use
+
+- The defect is real but the cause is unclear
+- A failing test needs investigation
+- The system has inconsistent or environment-specific behavior
+- A regression may have multiple plausible causes
+
+## Inputs to gather
+
+- Reproduction path or failing signal
+- Relevant code paths, logs, traces, state transitions, and recent changes
+- Existing tests or ways to validate a hypothesis
+- Environment details that may influence behavior
+
+## How to work
+
+- Reproduce first when possible, then narrow scope by isolating the smallest failing path.
+- Form hypotheses from evidence, not instinct alone, and actively try to disprove them.
+- Inspect boundaries: inputs, outputs, state mutations, async timing, external dependencies, and configuration.
+- When feasible, fix the root cause rather than only masking the symptoms.
+- Re-run the original failing signal and add regression protection if appropriate.
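+
+When a regression has an ordered history of recent changes, binary search is one way to isolate the smallest failing step, in the spirit of `git bisect`. This is a minimal sketch, assuming the first revision is known good, the last is known bad, and the failure persists once introduced. `first_failing` and `is_bad` are hypothetical names, and `is_bad` stands in for re-running the original failing signal against one revision.
+
+```python
+from typing import Callable, Sequence, TypeVar
+
+T = TypeVar("T")
+
+
+def first_failing(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
+    """Return the first revision in an ordered history for which `is_bad` holds.
+
+    Assumes the failure is monotonic: every revision after the first bad one
+    is also bad (the same assumption `git bisect` relies on).
+    """
+    if len(revisions) < 2:
+        raise ValueError("need at least a known-good and a known-bad revision")
+    lo, hi = 0, len(revisions) - 1
+    if is_bad(revisions[lo]) or not is_bad(revisions[hi]):
+        raise ValueError("expected the first revision good and the last revision bad")
+    # Invariant: revisions[lo] is good and revisions[hi] is bad.
+    while hi - lo > 1:
+        mid = (lo + hi) // 2
+        if is_bad(revisions[mid]):
+            hi = mid
+        else:
+            lo = mid
+    return revisions[hi]
+```
+
+With n revisions this takes roughly log2(n) checks after the two endpoint checks, which matters when each check is a slow build or test run. If the failure is intermittent, a single `is_bad` call can mislead the search, so confirm the reproduction is deterministic first.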
+
+## Output expectations
+
+- Root cause explanation tied to evidence
+- Fix or recommended fix approach
+- Verification that the original issue is resolved
+- Remaining uncertainty, if any
+
+## Quality checklist
+
+- The explanation connects cause to symptom clearly.
+- The chosen fix addresses the real failure mechanism.
+- Verification includes the original failing path.
+- Regression protection is considered when the bug is likely to recur.
+
+## Handoff notes
+
+- Note whether the issue was fully reproduced, partially inferred, or fixed based on a probable cause.
+- Mention monitoring or follow-up checks if confidence is limited by environment or observability.
@@ -0,0 +1,45 @@
+# Incident Response and Stabilization
+
+## Purpose
+
+Guide the response to live or high-impact issues under pressure, separating immediate stabilization from deeper root-cause correction.
+
+## When to use
+
+- A production issue is actively impacting users or operators
+- A regression needs containment before a complete fix is ready
+- The team needs a calm sequence for triage, mitigation, and follow-up
+- Communication and operational clarity matter as much as code changes
+
+## Inputs to gather
+
+- Current symptoms, severity, affected users, and timing
+- Available logs, metrics, alerts, dashboards, and recent changes
+- Safe rollback, feature flag, degrade, or traffic-shaping options
+- Stakeholders who need updates and what they need to know
+
+## How to work
+
+- Stabilize user impact first if a safe containment path exists.
+- Keep mitigation, diagnosis, and communication distinct but coordinated.
+- Prefer reversible steps under uncertainty.
+- Record what is confirmed versus assumed while the incident is active.
+- After stabilization, convert the incident into structured debugging and prevention work.
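+
+Keeping confirmed facts, assumptions, mitigations, and fixes in separate buckets makes the final summary easier to trust. The sketch below is a hypothetical in-memory log (`IncidentLog`, `Action`, and `Certainty` are illustrative names, not an existing API). It also encodes the preference for reversible steps by requiring an explicit opt-in before an irreversible action is recorded.
+
+```python
+from dataclasses import dataclass, field
+from enum import Enum
+
+
+class Certainty(Enum):
+    CONFIRMED = "confirmed"
+    ASSUMED = "assumed"
+
+
+@dataclass
+class Action:
+    description: str
+    reversible: bool
+    mitigates_only: bool  # True: contains impact without fixing the root cause
+
+
+@dataclass
+class IncidentLog:
+    # Hypothetical structure for illustration; not an existing library API.
+    observations: list[tuple[str, Certainty]] = field(default_factory=list)
+    actions: list[Action] = field(default_factory=list)
+
+    def note(self, text: str, certainty: Certainty) -> None:
+        self.observations.append((text, certainty))
+
+    def act(self, action: Action, allow_irreversible: bool = False) -> None:
+        # Prefer reversible steps: irreversible ones need an explicit opt-in.
+        if not action.reversible and not allow_irreversible:
+            raise ValueError(f"irreversible action needs approval: {action.description}")
+        self.actions.append(action)
+
+    def summary(self) -> dict[str, list[str]]:
+        # Keep what is confirmed, assumed, mitigated, and fixed distinct.
+        return {
+            "confirmed": [t for t, c in self.observations if c is Certainty.CONFIRMED],
+            "assumed": [t for t, c in self.observations if c is Certainty.ASSUMED],
+            "mitigated": [a.description for a in self.actions if a.mitigates_only],
+            "fixed": [a.description for a in self.actions if not a.mitigates_only],
+        }
+```
+
+A rollback or feature-flag toggle would be logged with `mitigates_only=True`, so the summary keeps it apart from the eventual root-cause fix.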
+
+## Output expectations
+
+- Stabilization plan or incident response summary
+- Clear mitigation status and next actions
+- Follow-up work for root cause, observability, and prevention
+
+## Quality checklist
+
+- User impact reduction is prioritized appropriately.
+- Risky irreversible changes are avoided under pressure.
+- Communication is clear enough for collaborators to act.
+- Post-incident follow-up is not lost after immediate recovery.
+
+## Handoff notes
+
+- Note what was mitigated versus actually fixed.
+- Pair with the debugging workflow and observability work once the system is stable enough for deeper investigation.