# Observability and Operability

## Purpose

Make systems easier to understand, debug, and run by improving signals, diagnostics, and operational readiness around important behavior.

## When to use

- A system is hard to diagnose in production or staging
- New functionality needs useful logs, metrics, traces, or alerts
- Operational ownership is unclear during failures or rollout
- Reliability work needs better visibility before deeper changes

## Inputs to gather

- Critical workflows, failure modes, and current diagnostic signals
- Existing logging, metrics, tracing, dashboards, and alerts
- Operator needs during rollout, incident response, and debugging
- Noise constraints and performance or cost considerations

## How to work

- Instrument the questions a responder will need answered during failure.
- Prefer signals tied to user-impacting behavior over vanity metrics.
- Make logs structured and actionable when possible.
- Add observability close to important boundaries and state transitions.
- Keep signal quality high by avoiding low-value noise.

## Output expectations

- Improved observability or an operability plan for the target area
- Clear explanation of what new signals reveal
- Notes on alerting, dashboard, or rollout support when relevant

## Quality checklist

- Signals help detect and diagnose meaningful failures.
- Instrumentation is focused and not excessively noisy.
- Operational usage is considered, not just implementation convenience.
- Added visibility maps to critical user or system outcomes.

## Handoff notes

- Mention what incidents or debugging tasks the new observability should make easier.
- Pair with debugging workflow, incident response, or performance optimization when diagnosis is the main bottleneck.
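
## Example sketch

The "How to work" guidance on structured logs and on instrumenting state transitions can be illustrated with a small sketch. It uses only the Python standard library. The `checkout` logger name, the `transition_order` function, the `charge` callback, the event names, and the in-process `OUTCOMES` counter are all hypothetical names chosen for illustration; a real system would likely send the counts to whatever metrics client it already uses.

```python
import io
import json
import logging
import time
from collections import Counter

# Hypothetical in-process counter standing in for a real metrics client.
OUTCOMES = Counter()


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so fields can be queried."""

    def format(self, record):
        payload = {"level": record.levelname, "event": record.getMessage()}
        # Fields passed via extra={"fields": {...}} are merged into the payload.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    """Create a logger that writes one JSON line per event to `stream`."""
    logger = logging.getLogger("checkout")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def transition_order(logger, order_id, old_state, new_state, charge):
    """Move an order between states, recording outcome and latency."""
    start = time.monotonic()
    fields = {"order_id": order_id, "from": old_state, "to": new_state}
    try:
        charge(order_id)  # the boundary call a responder will ask about
    except Exception as exc:
        OUTCOMES["order_transition.failure"] += 1
        fields.update(
            error=type(exc).__name__,
            duration_ms=round((time.monotonic() - start) * 1000, 1),
        )
        logger.error("order_transition_failed", extra={"fields": fields})
        raise
    OUTCOMES["order_transition.success"] += 1
    fields["duration_ms"] = round((time.monotonic() - start) * 1000, 1)
    logger.info("order_transition", extra={"fields": fields})
    return new_state


def failing_charge(order_id):
    """Simulate a downstream failure for the usage example."""
    raise TimeoutError("payment gateway timed out")


# Usage: one successful and one failed transition, captured in memory.
buffer = io.StringIO()
log = build_logger(buffer)
transition_order(log, "o-1", "pending", "paid", lambda order_id: None)
try:
    transition_order(log, "o-2", "pending", "paid", failing_charge)
except TimeoutError:
    pass
events = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

Each event carries the order identifier, the state transition, the duration, and on failure the error type, which are the questions a responder would likely ask first. Emitting one event per transition, instead of one per internal step, is one way to keep noise low.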