Build agent instruction suite with deployment profiling

2026-03-23 14:14:19 -05:00
parent 7a09b60623
commit 41cfbb0151
44 changed files with 2281 additions and 1 deletions

@@ -0,0 +1,46 @@
# Bug Triage
## Purpose
Frame a defect clearly before or during debugging so the team understands impact, reproduction, suspected scope, and next actions.
## When to use
- A bug report arrives with incomplete detail
- A failing test or regression needs initial framing
- Multiple issues compete for attention and severity matters
- You need a reliable reproduction path before deeper debugging
## Inputs to gather
- Reported symptoms, expected behavior, and actual behavior
- Reproduction steps, environment, and frequency
- User impact, severity, and likely affected surfaces
- Recent changes, logs, or tests that may be related
## How to work
- Turn vague symptoms into a concrete problem statement.
- Reproduce the issue when possible and tighten the reproduction steps.
- Separate confirmed facts from assumptions.
- Estimate impact and likely blast radius before diving into fixes.
- Identify the best next debugging step if the root cause is not yet known.
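
The steps above can be captured in a small structured record. The following is a minimal sketch, not part of the original instructions: the `TriageRecord` and `ReproStatus` names, their fields, and the example bug are all hypothetical. It keeps confirmed facts apart from assumptions and reports which triage items are still missing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ReproStatus(Enum):
    # Hypothetical labels for how reliably the bug has been reproduced.
    DETERMINISTIC = "deterministic"
    INTERMITTENT = "intermittent"
    UNCONFIRMED = "unconfirmed"


@dataclass
class TriageRecord:
    # Hypothetical structure; field names are illustrative only.
    statement: str
    repro_status: ReproStatus
    repro_steps: List[str] = field(default_factory=list)
    confirmed_facts: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    unknowns: List[str] = field(default_factory=list)
    next_step: str = ""

    def gaps(self) -> List[str]:
        """Return the triage items this record does not yet satisfy."""
        problems = []
        if not self.statement.strip():
            problems.append("missing problem statement")
        if self.repro_status is not ReproStatus.UNCONFIRMED and not self.repro_steps:
            problems.append("reproduced but no steps recorded")
        if not self.confirmed_facts:
            problems.append("no confirmed facts")
        if not self.next_step.strip():
            problems.append("no next debugging step")
        return problems


# Example with made-up data: a fully framed bug has no gaps.
record = TriageRecord(
    statement="CSV export returns HTTP 500 for reports over 10,000 rows",
    repro_status=ReproStatus.DETERMINISTIC,
    repro_steps=["Open a report with 10,001 rows", "Click Export CSV"],
    confirmed_facts=["Fails on staging with the current build"],
    assumptions=["Likely related to the recent pagination change"],
    unknowns=["Whether smaller reports with wide columns also fail"],
    next_step="Capture the server stack trace for the failing request",
)
print(record.gaps())  # prints []
```

Listing assumptions and unknowns as separate fields is one way to keep guesses from drifting into the confirmed facts as the report is handed on.
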
## Output expectations
- Clear bug statement and reproduction status
- Impact and severity assessment
- Suspected scope or likely component area
- Recommended next debugging or fix step
## Quality checklist
- The issue is described in observable terms rather than guesses.
- Reproduction details are specific enough to be reused.
- Impact is clear enough to prioritize intelligently.
- Unknowns are named instead of hidden.
## Handoff notes
- Record environment details and whether the issue is deterministic, intermittent, or unconfirmed.
- Pair with the Debugging Workflow once the problem is framed well enough for deeper investigation.

@@ -0,0 +1,46 @@
# Debugging Workflow
## Purpose
Find root cause efficiently and verify fixes with a disciplined workflow that avoids premature assumptions and shallow symptom treatment.
## When to use
- The defect is real but the cause is unclear
- A failing test needs investigation
- The system has inconsistent or environment-specific behavior
- A regression may have multiple plausible causes
## Inputs to gather
- Reproduction path or failing signal
- Relevant code paths, logs, traces, state transitions, and recent changes
- Existing tests or ways to validate a hypothesis
- Environment details that may influence behavior
## How to work
- Reproduce first when possible, then narrow scope by isolating the smallest failing path.
- Form hypotheses from evidence rather than instinct alone, and actively try to disprove them.
- Inspect boundaries: inputs, outputs, state mutations, async timing, external dependencies, and configuration.
- When feasible, fix the root cause rather than only masking symptoms.
- Re-run the original failing signal and add regression protection if appropriate.
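
One way to narrow scope when a regression sits somewhere in a list of recent changes is a binary search over that list, the same idea `git bisect` applies to commits. The sketch below is illustrative only: `first_failing_change` and `fails_at` are hypothetical names, and it assumes the changes are ordered oldest to newest and that once a change fails, every later change fails too.

```python
from typing import Callable, Sequence


def first_failing_change(
    changes: Sequence[str], fails_at: Callable[[str], bool]
) -> str:
    """Return the earliest change for which fails_at(change) is true.

    Assumes changes run oldest to newest and failure is monotonic:
    after the first failing change, all later changes also fail.
    """
    if not changes or not fails_at(changes[-1]):
        raise ValueError("the newest change must reproduce the failure")
    lo, hi = 0, len(changes) - 1  # invariant: changes[hi] fails
    while lo < hi:
        mid = (lo + hi) // 2
        if fails_at(changes[mid]):
            hi = mid       # the culprit is mid or earlier
        else:
            lo = mid + 1   # the culprit is after mid
    return changes[lo]


# Example with made-up change IDs; in practice fails_at would run the
# original failing signal against a build of each change.
changes = ["c1", "c2", "c3", "c4", "c5"]
broken = {"c4", "c5"}
print(first_failing_change(changes, lambda c: c in broken))  # prints c4
```

Intermittent failures break the monotonic assumption, so for those each probe would need to be repeated enough times to trust a pass.
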
## Output expectations
- Root cause explanation tied to evidence
- Fix or recommended fix approach
- Verification that the original issue is resolved
- Remaining uncertainty, if any
## Quality checklist
- The explanation connects cause to symptom clearly.
- The chosen fix addresses the real failure mechanism.
- Verification includes the original failing path.
- Regression protection is considered when the bug is likely to recur.
## Handoff notes
- Note whether the issue was fully reproduced, partially inferred, or fixed based on a probable cause.
- Mention monitoring or follow-up checks if confidence is limited by environment or observability.

@@ -0,0 +1,45 @@
# Incident Response and Stabilization
## Purpose
Guide high-pressure response to live or high-impact issues by separating immediate stabilization from deeper root-cause correction.
## When to use
- A production issue is actively impacting users or operators
- A regression needs containment before a complete fix is ready
- The team needs a calm sequence for triage, mitigation, and follow-up
- Communication and operational clarity matter as much as code changes
## Inputs to gather
- Current symptoms, severity, affected users, and timing
- Available logs, metrics, alerts, dashboards, and recent changes
- Safe rollback, feature flag, degrade, or traffic-shaping options
- Stakeholders who need updates and what they need to know
## How to work
- Stabilize user impact first if a safe containment path exists.
- Keep mitigation, diagnosis, and communication distinct but coordinated.
- Prefer reversible steps under uncertainty.
- Record what is confirmed versus assumed while the incident is active.
- After stabilization, convert the incident into structured debugging and prevention work.
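
The preference for reversible steps under uncertainty can be sketched as a simple selection rule. This is a hypothetical illustration, not a prescribed policy: the `Mitigation` fields, the `choose_mitigation` function, the 0.8 confidence threshold, and the example options are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mitigation:
    # Hypothetical description of one containment option.
    name: str
    reversible: bool
    impact_reduction: float  # estimated fraction of user impact removed, 0.0 to 1.0
    confidence: float        # how sure the team is of that estimate, 0.0 to 1.0


def choose_mitigation(
    options: List[Mitigation], min_confidence: float = 0.8
) -> Optional[Mitigation]:
    """Pick a containment step, or None if no option is acceptable.

    Irreversible options are considered only when confidence meets the
    threshold, and any reversible option is ranked ahead of them.
    """
    allowed = [m for m in options if m.reversible or m.confidence >= min_confidence]
    if not allowed:
        return None
    # Sort key: reversible first, then larger expected impact reduction.
    return max(allowed, key=lambda m: (m.reversible, m.impact_reduction * m.confidence))


# Example with made-up options.
options = [
    Mitigation("drop corrupted table", reversible=False, impact_reduction=0.9, confidence=0.5),
    Mitigation("disable feature flag", reversible=True, impact_reduction=0.6, confidence=0.7),
    Mitigation("roll back deploy", reversible=True, impact_reduction=0.8, confidence=0.6),
]
choice = choose_mitigation(options)
print(choice.name if choice else "no safe option")  # prints roll back deploy
```

Returning `None` when nothing qualifies makes "no safe containment path exists" an explicit outcome, so the team can move on to diagnosis instead of forcing a risky step.
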
## Output expectations
- Stabilization plan or incident response summary
- Clear mitigation status and next actions
- Follow-up work for root cause, observability, and prevention
## Quality checklist
- User impact reduction is prioritized appropriately.
- Risky irreversible changes are avoided under pressure.
- Communication is clear enough for collaborators to act.
- Post-incident follow-up is not lost after immediate recovery.
## Handoff notes
- Note what was mitigated versus actually fixed.
- Pair with the Debugging Workflow and observability once the system is stable enough for deeper work.