Team Protocols
Master the request-response FSM with request_id for reliable multi-agent coordination.
What You’ll Learn
When multiple agents collaborate, they need a shared language. Without structured protocols, messages get lost, responses arrive out of order, and agents waste cycles waiting for replies that already came. Team protocols solve this with a finite state machine (FSM) that governs every interaction.
By the end, you’ll understand:
- The request-response FSM and its states
- How request_id ensures message correlation
- Structured message formats for team communication
- Error handling: timeouts, retries, and fallback strategies
The Problem
Imagine two agents working together. Agent A asks Agent B to review some code. Agent B finishes and sends back feedback. But Agent A has already sent another request. Now Agent B’s response arrives — which request does it answer?
Without protocols, team communication has three failure modes:
1. LOST MESSAGES
   Agent A sends request → Agent B never sees it → Agent A waits forever
2. MISMATCHED RESPONSES
   Agent A sends request #1, then request #2
   Agent B responds twice: which response goes with which request?
3. RACE CONDITIONS
   Both agents edit the same file without knowing
   Result: merge conflict or overwritten work
How It Works
The Request-Response FSM
Every agent-to-agent interaction follows a finite state machine:
               send request
┌──────┐    (with request_id)     ┌──────────────┐
│      │ ───────────────────────▶ │              │
│ IDLE │                          │ REQUEST_SENT │
│      │ ◀────────┐               │              │
└──────┘          │               └──────┬───────┘
    ▲             │                      │
    │             │ timeout/             │ acknowledgment
    │             │ error                │ received
    │             │                      ▼
    │             │               ┌──────────────┐
    │             ├────────────── │   WAITING    │
    │             │               └──────┬───────┘
    │             │                      │ response arrives
    │             │                      ▼
    │             │             ┌──────────────────┐
    │             └──────────── │ RESPONSE_RECEIVED│
    │                           └────────┬─────────┘
    │           process result           │
    └────────────────────────────────────┘
- IDLE — Agent is available. Can send or receive requests.
- REQUEST_SENT — Message dispatched. Waiting for acknowledgment.
- WAITING — Acknowledged. Waiting for the actual result.
- RESPONSE_RECEIVED — Result arrived. Process it, return to IDLE.
The FSM ensures every request has exactly one lifecycle. No message exists in two states at once.
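The transitions above can be written down as a small lookup table. The following is a minimal Python sketch rather than a reference implementation: the event names (`send_request`, `ack_received`, and so on) and the `RequestLifecycle` class are hypothetical, and the sketch assumes a timeout/error edge out of REQUEST_SENT as well, which the diagram appears to draw only from WAITING and RESPONSE_RECEIVED.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    REQUEST_SENT = "REQUEST_SENT"
    WAITING = "WAITING"
    RESPONSE_RECEIVED = "RESPONSE_RECEIVED"


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.IDLE, "send_request"): State.REQUEST_SENT,
    (State.REQUEST_SENT, "ack_received"): State.WAITING,
    (State.WAITING, "response_arrived"): State.RESPONSE_RECEIVED,
    (State.RESPONSE_RECEIVED, "result_processed"): State.IDLE,
    # Timeouts and errors abandon the request and return to IDLE.
    # The REQUEST_SENT edge is an assumption added for this sketch.
    (State.REQUEST_SENT, "timeout_or_error"): State.IDLE,
    (State.WAITING, "timeout_or_error"): State.IDLE,
    (State.RESPONSE_RECEIVED, "timeout_or_error"): State.IDLE,
}


class RequestLifecycle:
    """Tracks the state of one request, identified by its request_id."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.state = State.IDLE

    def fire(self, event: str) -> State:
        """Apply an event; reject it if the current state does not allow it."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```

Because every legal move is a key in `TRANSITIONS`, an out-of-order event such as `response_arrived` while still in IDLE raises `ValueError` instead of silently changing state.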
The request_id Pattern
The request_id ties requests to responses. Every request gets a unique identifier, and every response echoes it back:
Agent A (Implementer)      Agent B (Reviewer)
│                                  │
│  ┌──────────────────────────┐    │
│  │ type: "request"          │    │
├─▶│ request_id: "req-0042"   │───▶│
│  │ action: "review_code"    │    │
│  │ payload: { file, diff }  │    │
│  └──────────────────────────┘    │
│                                  │
│        ... time passes ...       │
│                                  │
│  ┌──────────────────────────┐    │
│◀─│ type: "response"         │◀───│
│  │ request_id: "req-0042"   │    │
│  │ status: "completed"      │    │
│  │ payload: { feedback }    │    │
│  └──────────────────────────┘    │
Even if Agent A has sent other requests in the meantime, request_id: "req-0042" tells it exactly which request this response answers.
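One way to picture the correlation step is a table of in-flight requests keyed by request_id. The sketch below is illustrative only: the `Requester` class and its `send` and `receive` methods are hypothetical names, and the sequential counter is just one possible way to generate unique identifiers (a UUID would work equally well).

```python
import itertools


class Requester:
    """Keeps a table of in-flight requests keyed by request_id."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.pending = {}

    def send(self, to: str, action: str, payload: dict) -> dict:
        """Build a request with a fresh request_id and remember it as pending."""
        request_id = f"req-{next(self._counter):04d}"
        message = {
            "type": "request",
            "to": to,
            "request_id": request_id,
            "action": action,
            "payload": payload,
        }
        self.pending[request_id] = message
        return message

    def receive(self, response: dict) -> dict:
        """Return the original request that this response answers."""
        request_id = response["request_id"]
        if request_id not in self.pending:
            raise KeyError(f"no pending request with id {request_id!r}")
        return self.pending.pop(request_id)


# Two requests go out; the responses come back in the opposite order.
agent_a = Requester()
first = agent_a.send("reviewer", "review_code", {"file": "src/auth/login.ts"})
second = agent_a.send("reviewer", "run_tests", {"suite": "auth"})

late = {"type": "response", "request_id": second["request_id"], "status": "completed"}
early = {"type": "response", "request_id": first["request_id"], "status": "completed"}

matched_second = agent_a.receive(late)   # matched to the run_tests request
matched_first = agent_a.receive(early)   # matched to the review_code request
```

Arrival order no longer matters, since each response is looked up by the identifier it echoes. A response with an unknown or already-consumed request_id fails loudly instead of being attached to the wrong request.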
Structured Message Format
Every protocol message follows a consistent schema:
{
  "type": "request | response | status | broadcast",
  "from": "agent-name",
  "to": "agent-name | all",
  "request_id": "req-0042",
  "action": "review_code | run_tests | fix_issue",
  "payload": { "data": {} },
  "metadata": { "priority": "normal", "timeout_ms": 30000 }
}
Four message types:
- request — “Please do this work” (requires a response)
- response — “Here is the result” (references a request_id)
- status — “Here is my current state” (informational, no response needed)
- broadcast — “Everyone should know this” (sent to all team members)
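The schema and the four message types can be enforced at construction time. The following is a sketch under stated assumptions: the `Message` dataclass is hypothetical, the `from` field is stored as `sender` because `from` is a reserved word in Python, and the validation rules are inferred from the descriptions above (requests and responses carry a request_id, broadcasts are addressed to "all") rather than taken from a formal specification.

```python
from dataclasses import dataclass, field
from typing import Optional

MESSAGE_TYPES = {"request", "response", "status", "broadcast"}


@dataclass
class Message:
    type: str
    sender: str  # serialized as "from"; renamed because `from` is a keyword
    to: str
    request_id: Optional[str] = None
    action: Optional[str] = None
    payload: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # These rules are assumptions derived from the type descriptions.
        if self.type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {self.type!r}")
        if self.type in {"request", "response"} and not self.request_id:
            raise ValueError(f"a {self.type} must carry a request_id")
        if self.type == "broadcast" and self.to != "all":
            raise ValueError('a broadcast must be addressed to "all"')

    def to_dict(self) -> dict:
        """Serialize using the field names from the schema."""
        return {
            "type": self.type,
            "from": self.sender,
            "to": self.to,
            "request_id": self.request_id,
            "action": self.action,
            "payload": self.payload,
            "metadata": self.metadata,
        }


review = Message(
    type="request",
    sender="implementer",
    to="reviewer",
    request_id="req-0042",
    action="review_code",
    payload={"file": "src/auth/login.ts"},
    metadata={"priority": "normal", "timeout_ms": 30000},
)
```

Rejecting a malformed message when it is built keeps the error next to the code that caused it, instead of surfacing later in the receiving agent.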
Error Handling
Protocols must handle three failure scenarios:
1. TIMEOUT
   No response in 30s → retry once → escalate

2. REJECTION
   Agent cannot handle request
   { status: "rejected", reason: "out_of_scope" }
   → reassign to another agent

3. PARTIAL FAILURE
   Agent completed part of the work
   { status: "partial", completed: [...], failed: [...] }
   → retry failed parts or fallback
When retries are exhausted and no other agent can take the work, the requesting agent falls back to doing the work itself (solo execution) rather than blocking indefinitely.
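The three failure scenarios reduce to a decision the requester makes each time a reply arrives or a timeout fires. The sketch below is one possible encoding, not the prescribed one: the function name `next_step` and the returned labels are hypothetical, and what "escalate" or "fallback" actually triggers (for example, solo execution) is left to the caller.

```python
from typing import Optional


def next_step(response: Optional[dict], retries_used: int, max_retries: int = 1) -> str:
    """Decide what the requester does next.

    `response` is None when the timeout elapsed without any reply.
    """
    if response is None:
        # TIMEOUT: retry while retries remain, then escalate.
        return "retry" if retries_used < max_retries else "escalate"

    status = response.get("status")
    if status == "completed":
        return "process_result"
    if status == "rejected":
        # REJECTION: this agent cannot help, so try another one.
        return "reassign"
    if status == "partial":
        # PARTIAL FAILURE: keep what succeeded and retry only the failed
        # parts; fall back once retries are exhausted.
        return "retry_failed_parts" if retries_used < max_retries else "fallback"
    raise ValueError(f"unexpected status: {status!r}")


# A timed-out request is retried once, then escalated.
first_timeout = next_step(None, retries_used=0)
second_timeout = next_step(None, retries_used=1)
rejected = next_step({"status": "rejected", "reason": "out_of_scope"}, retries_used=0)
```

Keeping the decision in a pure function makes the policy easy to test in isolation from whatever transport delivers the messages.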
Key Insight
Without protocols, team communication becomes chaotic. The FSM makes every interaction explicit: each request is in exactly one known state, and a lost message surfaces as a timeout instead of a silent hang.
The request_id pattern solves a core problem in asynchronous systems: correlation. The same idea appears in HTTP/2 stream identifiers, database transaction IDs, and message queue acknowledgments. Agents can work asynchronously without losing track: an implementer can send code for review, continue with the next task, and correctly handle the feedback whenever it arrives, because the request_id ties the response to its request.
Hands-On Example
A complete team interaction: Implementer sends code to Reviewer, receives feedback, fixes issues.
STEP 1: Implementer sends review request
─────────────────────────────────────────
{
  type: "request", from: "implementer", to: "reviewer",
  request_id: "req-0042", action: "review_code",
  payload: {
    file: "src/auth/login.ts",
    diff: "+function validateToken(token) { ... }"
  }
}
State: IDLE → REQUEST_SENT
STEP 2: Reviewer acknowledges
─────────────────────────────────────────
{
  type: "status", from: "reviewer", to: "implementer",
  request_id: "req-0042",
  payload: { status: "acknowledged", eta_ms: 15000 }
}
State: REQUEST_SENT → WAITING
STEP 3: Reviewer sends feedback
─────────────────────────────────────────
{
  type: "response", from: "reviewer", to: "implementer",
  request_id: "req-0042", status: "completed",
  payload: {
    approved: false,
    issues: [
      { line: 12, severity: "high", message: "Token expiry not checked" },
      { line: 25, severity: "medium", message: "Missing error handling" }
    ]
  }
}
State: WAITING → RESPONSE_RECEIVED
STEP 4: Implementer processes, returns to IDLE
─────────────────────────────────────────
Reads issues, applies fixes, sends new request (req-0043) for re-review.
State: RESPONSE_RECEIVED → IDLE
Each step has a clear state transition. The Implementer never guesses whether the Reviewer received the request or what the feedback refers to.
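The four steps can be replayed in code by letting each message drive the Implementer's state. This is a hedged sketch: the `advance` function is hypothetical, the messages are trimmed copies of the ones above, and the rule that a status message with `status: "acknowledged"` triggers REQUEST_SENT → WAITING is read off Step 2 rather than defined anywhere as a formal rule.

```python
def advance(state: str, message: dict) -> str:
    """Return the Implementer's next state after sending or receiving a message."""
    kind = message["type"]
    if state == "IDLE" and kind == "request":
        return "REQUEST_SENT"
    if (
        state == "REQUEST_SENT"
        and kind == "status"
        and message.get("payload", {}).get("status") == "acknowledged"
    ):
        return "WAITING"
    if state == "WAITING" and kind == "response":
        return "RESPONSE_RECEIVED"
    raise ValueError(f"message of type {kind!r} is not expected in state {state}")


messages = [
    {"type": "request", "request_id": "req-0042", "action": "review_code"},
    {"type": "status", "request_id": "req-0042",
     "payload": {"status": "acknowledged", "eta_ms": 15000}},
    {"type": "response", "request_id": "req-0042", "status": "completed",
     "payload": {"approved": False}},
]

state = "IDLE"
trace = [state]
for message in messages:
    # Every message in this exchange must belong to the same request.
    if message["request_id"] != "req-0042":
        raise ValueError("message belongs to a different request")
    state = advance(state, message)
    trace.append(state)

# Step 4: the result has been processed, so the lifecycle closes.
state = "IDLE"
trace.append(state)

print(" -> ".join(trace))
# prints: IDLE -> REQUEST_SENT -> WAITING -> RESPONSE_RECEIVED -> IDLE
```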
Protocol Design Principles
DO:    Specific actions ("review_code", "run_tests", "fix_lint")
       Enough context in payload for standalone execution
       Realistic timeouts based on expected work duration
DON'T: Generic actions ("do_work"), which are too ambiguous
       Large file contents when a path suffices
       Timeouts that are too short, which cause unnecessary retries
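These guidelines can be checked mechanically before a request is sent. The sketch below is only an illustration: `lint_request`, the `GENERIC_ACTIONS` set, and the `MAX_INLINE_CHARS` threshold are all assumptions chosen for the example, not values from the protocol.

```python
# All three constants are illustrative assumptions.
GENERIC_ACTIONS = {"do_work", "handle", "process"}
MAX_INLINE_CHARS = 2000


def lint_request(message: dict, expected_duration_ms: int) -> list:
    """Return a list of warnings for a request that breaks the guidelines."""
    warnings = []

    # Generic actions leave the receiver guessing what is wanted.
    if message.get("action") in GENERIC_ACTIONS:
        warnings.append("action is too generic")

    # Large inline strings suggest file contents where a path would do.
    for key, value in message.get("payload", {}).items():
        if isinstance(value, str) and len(value) > MAX_INLINE_CHARS:
            warnings.append(f"payload[{key!r}] is large; consider sending a path")

    # A timeout below the expected duration guarantees a needless retry.
    timeout_ms = message.get("metadata", {}).get("timeout_ms", 0)
    if timeout_ms < expected_duration_ms:
        warnings.append("timeout is shorter than the expected work duration")

    return warnings


good = {
    "action": "review_code",
    "payload": {"file": "src/auth/login.ts"},
    "metadata": {"timeout_ms": 30000},
}
bad = {
    "action": "do_work",
    "payload": {"contents": "x" * 5000},
    "metadata": {"timeout_ms": 1000},
}

good_warnings = lint_request(good, expected_duration_ms=15000)  # no warnings
bad_warnings = lint_request(bad, expected_duration_ms=15000)    # three warnings
```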
What Changed
| Without Team Protocols | With Team Protocols |
|---|---|
| Messages have no structure | Every message follows a schema |
| No way to match responses to requests | request_id provides correlation |
| Agents block waiting for replies | FSM tracks state, enables async work |
| Failures cause silent hangs | Timeouts and retries handle errors |
| Race conditions corrupt state | Deterministic state transitions |
Next Session
Session 11 covers Autonomous Agents — how agents transition between WORK and IDLE phases, auto-claim tasks from shared queues, and operate independently without human-in-the-loop supervision.