AGENTS.md Template: Production Debugging Workflows
The AGENTS.md Template for Production Debugging Workflows provides a formal operating manual for AI coding agents tasked with diagnosing and resolving issues in live systems. It covers both single-agent operation and multi-agent orchestration, and it defines roles, handoffs, memory, and governance to maintain reliability and safety during debugging.
Target User
Developers, founders, product teams, engineering leaders
Use Cases
- Define a production debugging workflow for single-agent or multi-agent orchestration
- Standardize the runbook for incident triage, reproduction, and fixes
- Govern tool access and memory/context across agents
- Ensure human review and escalation for risky changes
Markdown Template
# AGENTS.md
## Project role
Incident Debugging Operator responsible for triage, reproduction, and remediation in production
## Agent roster and responsibilities
- Planner: defines tasks, coordinates sub-agents, tracks progress
- Implementer: develops fixes and changes to services in a controlled environment
- Tester: validates fixes against scenarios including synthetic events
- Reviewer: approves changes before deployment
- Researcher: gathers logs, traces, metrics, and domain context
- Domain Specialist: provides domain-specific insights for accurate remediation
- Orchestrator: supervises the workflow, enforces handoffs and memory updates
## Supervisor or orchestrator behavior
- The orchestrator maintains a shared memory of issue state, assigns tasks, triggers handoffs, and enforces escalation gates
- It chooses safe execution paths and halts on prohibited actions
## Handoff rules between agents
- When a task completes, hand off artifacts including context, logs, and test results to the next agent
- If a task fails, escalate to the orchestrator and apply rollback if necessary
## Context memory and source of truth rules
- Store issue context, logs, traces, and remediation notes in a persistent memory
- Use a single source of truth that includes dashboard metrics and incident tickets
## Tool access and permission rules
- Agents may read logs and metrics, execute limited read/write actions in staging, and request production changes via the orchestrator
- Secrets must be retrieved from secure vaults and never logged
## Architecture rules
- Node-based microservice architecture with clear service boundaries and observability
- Agents communicate via defined interfaces and do not perform unapproved global config changes
## File structure rules
- Keep debugging artifacts under a dedicated production_debugging folder
- Do not mix debugging artifacts with application source; keep concerns separate
## Data API or integration rules
- When possible, use production data only via controlled simulators or synthetic events
- Do not write to production data or settings without approval
## Validation rules
- All fixes must pass automated tests in CI and be validated in staging before canary deployment
## Security rules
- Keep secrets out of messages, fetch all credentials from vaults, and rotate them after remediation
## Testing rules
- Include unit tests, integration tests, and end-to-end tests in staging where applicable
## Deployment rules
- Deploy changes through canary releases with monitoring and alerting
## Human review and escalation rules
- High-risk changes require human sign-off
## Failure handling and rollback rules
- If remediation fails, revert code and configuration to a known-good state; preserve logs and memory
## Things agents must not do
- Do not modify production system configurations without approval
- Do not bypass governance or run unsupported tooling in production
- Do not disclose sensitive data in messages
Overview
The AGENTS.md Template for Production Debugging Workflows provides a formal operating manual for AI coding agents tasked with diagnosing and resolving issues in live systems. It covers both single-agent operation and multi-agent orchestration, and it defines roles, handoffs, memory, and governance to maintain reliability and safety during debugging.
When to Use This AGENTS.md Template
- During active production incidents requiring rapid triage, investigation, and remediation
- To standardize debugging processes across teams and services
- When introducing multi-agent orchestration for incident handling
- To ensure tool governance and human review for high-risk changes
- When documenting project-level operating context for AI coding agents
Recommended Agent Operating Model
Roles and decision boundaries for effective production debugging:
- Planner determines scope, assigns tasks, and coordinates handoffs
- Implementer makes targeted changes in staging first, then in production with approvals
- Tester validates changes against scenarios and live event emulations
- Reviewer approves changes before deployment to production
- Researcher collects logs and traces to inform decisions
- Domain Specialist provides context on domain-specific constraints
- Orchestrator enforces governance and escalations
Recommended Project Structure
production_debugging/
├─ agents/
│ ├─ planner/
│ ├─ implementer/
│ ├─ tester/
│ ├─ reviewer/
│ ├─ researcher/
│ └─ domain_specialist/
├─ data/
│ ├─ logs/
│ ├─ traces/
│ └─ memory/
├─ integrations/
├─ tests/
└─ docs/
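One way to create this layout is a small scaffold script. The sketch below is only an illustration using the Python standard library; the function name `scaffold` and the idea of creating the tree under a caller-supplied root directory are assumptions, not part of the template.

```python
from pathlib import Path

# Directory layout copied from the tree above, relative to production_debugging/.
LAYOUT = [
    "agents/planner",
    "agents/implementer",
    "agents/tester",
    "agents/reviewer",
    "agents/researcher",
    "agents/domain_specialist",
    "data/logs",
    "data/traces",
    "data/memory",
    "integrations",
    "tests",
    "docs",
]


def scaffold(root: Path) -> list[Path]:
    """Create the production_debugging tree under `root` (hypothetical helper).

    Existing directories are left untouched, so the call is safe to repeat.
    """
    base = root / "production_debugging"
    created = []
    for rel in LAYOUT:
        path = base / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    import tempfile

    # Demonstrate in a throwaway directory rather than the real repository.
    with tempfile.TemporaryDirectory() as tmp:
        for directory in scaffold(Path(tmp)):
            print(directory.relative_to(tmp))
```

Keeping the layout in a single list makes it easy to review alongside the file structure rules in AGENTS.md.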
Core Operating Principles
- Explicit handoffs with complete context
- Single source of truth for issue state
- Limit tool access to minimize blast radius
- Human review for high-risk actions
- Traceable decisions and auditable changes
Agent Handoff and Collaboration Rules
Rules for the planner, implementer, tester, reviewer, researcher, and domain specialist
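The template's handoff rules (pass context, logs, and test results to the next agent; escalate failures to the orchestrator) could be enforced with a small completeness check. The sketch below is a minimal illustration, not a prescribed implementation: the `Handoff` record, its field names, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Artifacts passed from one agent to the next (illustrative field names)."""
    from_agent: str
    to_agent: str
    context: str = ""
    logs: list[str] = field(default_factory=list)
    test_results: dict[str, bool] = field(default_factory=dict)


def missing_artifacts(handoff: Handoff) -> list[str]:
    """Return the names of required artifacts that are empty."""
    missing = []
    if not handoff.context.strip():
        missing.append("context")
    if not handoff.logs:
        missing.append("logs")
    if not handoff.test_results:
        missing.append("test_results")
    return missing


def route(handoff: Handoff) -> str:
    """Return the agent that should receive the handoff.

    Assumption: an incomplete handoff or any failed test counts as a failed
    task, so it goes to the orchestrator instead of the next agent.
    """
    if missing_artifacts(handoff):
        return "orchestrator"
    if not all(handoff.test_results.values()):
        return "orchestrator"
    return handoff.to_agent
```

For example, a tester handing off to the reviewer with `test_results={"checkout_flow": False}` would be routed to the orchestrator rather than the reviewer. Teams whose early handoffs legitimately carry no test results (researcher to planner, say) would need to relax the required-artifact list per role.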
Tool Governance and Permission Rules
Rules for command execution, file edits, API calls, secrets, and production systems
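The tool access rules in the template (read logs and metrics, limited read/write in staging, production changes requested via the orchestrator) can be expressed as a single authorization check. This is a sketch under stated assumptions: the `Decision` values, the `authorize` function, and the action and environment strings are hypothetical names chosen for illustration.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


def authorize(action: str, environment: str, approved: bool = False) -> Decision:
    """Decide whether an agent action may proceed (illustrative policy).

    action:      "read" or "write" (assumed vocabulary)
    environment: "staging" or "production" (assumed vocabulary)
    approved:    True once the orchestrator has approved a production change
    """
    if environment not in ("staging", "production"):
        return Decision.DENY  # unknown environments are rejected outright
    if action == "read":
        return Decision.ALLOW  # reading logs and metrics is always permitted
    if action == "write":
        if environment == "staging":
            return Decision.ALLOW
        # Production writes must go through the orchestrator first.
        return Decision.ALLOW if approved else Decision.NEEDS_APPROVAL
    return Decision.DENY  # anything not explicitly listed is denied
```

Denying by default keeps the blast radius small: a new action type has to be added to the policy deliberately before any agent can use it.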
Code Construction Rules
Specific implementation constraints for this workflow
Security and Production Rules
Security policy for debugging in production, including access controls and data handling
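To keep secrets out of logs and agent messages, text can be scrubbed before it is stored or forwarded. The sketch below is a minimal example using a regular expression; the key names it looks for and the `[REDACTED]` placeholder are assumptions, and the pattern is deliberately simple rather than exhaustive.

```python
import re

# Matches key=value or key: value pairs for a few common secret-like keys.
# The key list is an illustrative assumption; extend it for your own systems.
SECRET_PATTERN = re.compile(
    r"\b(password|passwd|secret|token|api[_-]?key)\b(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def scrub(line: str) -> str:
    """Replace the value of any matched secret-like key with a placeholder."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", line)


if __name__ == "__main__":
    print(scrub("login ok password=hunter2 user=bob"))
```

Pattern-based scrubbing is a safety net, not a guarantee: compound keys such as `secret_key` and secrets embedded in URLs or JSON strings are not caught by this pattern, so fetching credentials from a vault at use time remains the primary control.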
Testing Checklist
- Unit tests for each component
- Integration tests across services
- Staging-to-production canary checks
- Rollout monitoring and automatic rollback on anomalies
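The last checklist item, automatic rollback on anomalies, needs a concrete rule for what counts as an anomaly. One simple possibility is to compare the canary's error rate against the baseline. The sketch below assumes that approach; the `WindowStats` type, the `should_rollback` function, and both thresholds are hypothetical and would need tuning for a real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request and error counts observed over one monitoring window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when no traffic reached this deployment.
        return self.errors / self.requests if self.requests else 0.0


def should_rollback(
    baseline: WindowStats,
    canary: WindowStats,
    max_ratio: float = 2.0,      # assumed: canary may be at most 2x baseline
    min_error_rate: float = 0.01,  # assumed: ignore canary rates under 1%
) -> bool:
    """Return True when the canary looks clearly worse than the baseline."""
    if canary.error_rate < min_error_rate:
        return False  # too few errors to act on
    return canary.error_rate > baseline.error_rate * max_ratio
```

For instance, a baseline of 50 errors in 10,000 requests (0.5%) and a canary of 40 errors in 1,000 requests (4%) would trigger a rollback under these defaults. The absolute floor prevents rollbacks caused by noise when the baseline error rate is near zero.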
Common Mistakes to Avoid
- Skipping human review for high-risk changes
- Overwriting production data or configurations without sign-off
- Ignoring drift between agent memory and the source of truth
FAQ
How does this AGENTS.md template define a production debugging workflow?
This template defines roles, handoffs, and governance for debugging incidents in production, enabling single- or multi-agent coordination.
Who should be on the agent roster for production debugging?
The roster includes Planner, Implementer, Tester, Reviewer, Researcher, Domain Specialist, and Orchestrator for coordination and control.
How are handoffs between agents handled?
Handoffs include complete context, artifacts, and test results; failures escalate to the orchestrator with a rollback plan if needed.
What are the tool governance requirements?
Access is limited, secrets are retrieved from vaults, and production changes require orchestrator approval and audit trails.
How is security ensured during debugging?
Secrets are fetched from vaults and never logged or placed in messages, credentials are rotated after remediation, and logs are scrubbed to avoid exposing sensitive data.