Module 2: Multi-Agent 4 / 6
Intermediate Session 10 Protocols FSM

Team Protocols

Master the request-response FSM with request_id for reliable multi-agent coordination.

March 20, 2026 17 min read

What You’ll Learn

When multiple agents collaborate, they need a shared language. Without structured protocols, messages get lost, responses arrive out of order, and agents waste cycles waiting for replies that already came. Team protocols solve this with a finite state machine (FSM) that governs every interaction.

By the end, you’ll understand:

  • The request-response FSM and its states
  • How request_id ensures message correlation
  • Structured message formats for team communication
  • Error handling: timeouts, retries, and fallback strategies

The Problem

Imagine two agents working together. Agent A asks Agent B to review some code. Agent B finishes and sends back feedback. But Agent A has already sent another request. Now Agent B’s response arrives — which request does it answer?

Without protocols, team communication has three failure modes:

1. LOST MESSAGES
   Agent A sends request → Agent B never sees it → Agent A waits forever

2. MISMATCHED RESPONSES
   Agent A sends request #1, then request #2
   Agent B responds twice — which response goes with which request?

3. RACE CONDITIONS
   Both agents edit the same file without knowing
   Result: merge conflict or overwritten work

How It Works

The Request-Response FSM

Every agent-to-agent interaction follows a finite state machine:

                    send request
    ┌──────┐      (with request_id)     ┌──────────────┐
    │      │ ──────────────────────────▶ │              │
    │ IDLE │                             │ REQUEST_SENT │
    │      │ ◀────────┐                 │              │
    └──────┘          │                 └──────┬───────┘
       ▲              │                        │
       │              │ timeout/               │ acknowledgment
       │              │ error                  │ received
       │              │                        ▼
       │              │                 ┌──────────────┐
       │              ├──────────────── │   WAITING    │
       │              │                 └──────┬───────┘
       │              │                        │ response arrives
       │              │                        ▼
       │              │                 ┌──────────────────┐
       │              └──────────────── │ RESPONSE_RECEIVED│
       │                                └────────┬─────────┘
       │              process result             │
       └─────────────────────────────────────────┘
  • IDLE — Agent is available. Can send or receive requests.
  • REQUEST_SENT — Message dispatched. Waiting for acknowledgment.
  • WAITING — Acknowledged. Waiting for the actual result.
  • RESPONSE_RECEIVED — Result arrived. Process it, return to IDLE.

The FSM gives every request exactly one lifecycle: at any moment a request is in exactly one state. An agent with several requests in flight tracks one state per request_id.
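The diagram can be sketched as a transition table. This is a minimal illustration, not a reference implementation: the event names and the RequestFSM class are hypothetical. The diagram draws the timeout/error edge from WAITING and RESPONSE_RECEIVED, and the sketch assumes one from REQUEST_SENT as well, since an unacknowledged request also has to time out.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    REQUEST_SENT = auto()
    WAITING = auto()
    RESPONSE_RECEIVED = auto()


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.IDLE, "send_request"): State.REQUEST_SENT,
    (State.REQUEST_SENT, "ack_received"): State.WAITING,
    (State.WAITING, "response_arrived"): State.RESPONSE_RECEIVED,
    (State.RESPONSE_RECEIVED, "result_processed"): State.IDLE,
    # Timeout or error abandons the request and returns to IDLE.
    (State.REQUEST_SENT, "timeout_or_error"): State.IDLE,  # assumed edge
    (State.WAITING, "timeout_or_error"): State.IDLE,
    (State.RESPONSE_RECEIVED, "timeout_or_error"): State.IDLE,
}


class RequestFSM:
    """Tracks the lifecycle of a single request (hypothetical helper)."""

    def __init__(self):
        self.state = State.IDLE

    def fire(self, event):
        """Apply an event, rejecting anything the table does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the transitions in one table means an out-of-order event, such as a response arriving before any acknowledgment, fails loudly instead of silently corrupting state.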

The request_id Pattern

The request_id ties requests to responses. Every request gets a unique identifier, and every response echoes it back:

Agent A (Implementer)              Agent B (Reviewer)
       │                                  │
       │  ┌──────────────────────────┐    │
       │  │ type: "request"          │    │
       ├─▶│ request_id: "req-0042"   │───▶│
       │  │ action: "review_code"    │    │
       │  │ payload: { file, diff }  │    │
       │  └──────────────────────────┘    │
       │                                  │
       │         ... time passes ...      │
       │                                  │
       │  ┌──────────────────────────┐    │
       │◀─│ type: "response"         │◀───│
       │  │ request_id: "req-0042"   │    │
       │  │ status: "completed"      │    │
       │  │ payload: { feedback }    │    │
       │  └──────────────────────────┘    │

Even if Agent A has sent other requests in the meantime, request_id: "req-0042" tells it exactly which request this response answers.
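One way to picture the correlation step is a table of outstanding requests keyed by request_id. The sketch below is an assumption-laden illustration: the PendingRequests class, its method names, and the counter starting at 42 (chosen only to reproduce "req-0042") are hypothetical.

```python
import itertools


class PendingRequests:
    """Tracks outstanding requests by request_id (hypothetical helper)."""

    def __init__(self):
        self._counter = itertools.count(42)  # start value is arbitrary
        self._pending = {}

    def send(self, action, payload):
        """Build a request with a fresh request_id and remember it."""
        request_id = f"req-{next(self._counter):04d}"
        message = {
            "type": "request",
            "request_id": request_id,
            "action": action,
            "payload": payload,
        }
        self._pending[request_id] = message
        return message

    def match(self, response):
        """Return the request this response answers, or None if unknown.

        The entry is removed on a match, so a duplicate response for the
        same request_id is not matched twice.
        """
        return self._pending.pop(response.get("request_id"), None)


tracker = PendingRequests()
review = tracker.send("review_code", {"file": "src/auth/login.ts"})
tests = tracker.send("run_tests", {})

# Responses may arrive in any order; the echoed request_id resolves them.
late = {"type": "response", "request_id": review["request_id"]}
early = {"type": "response", "request_id": tests["request_id"]}
print(tracker.match(early)["action"])
print(tracker.match(late)["action"])
```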

Structured Message Format

Every protocol message follows a consistent schema:

{
  "type": "request | response | status | broadcast",
  "from": "agent-name",
  "to": "agent-name | all",
  "request_id": "req-0042",
  "action": "review_code | run_tests | fix_issue",
  "payload": { "data": {} },
  "metadata": { "priority": "normal", "timeout_ms": 30000 }
}

Four message types:

  • request — “Please do this work” (requires a response)
  • response — “Here is the result” (references a request_id)
  • status — “Here is my current state” (informational, no response needed)
  • broadcast — “Everyone should know this” (sent to all team members)
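The schema and the four message types could be enforced with a small validated record. This is a sketch under stated assumptions: the Message class and its validation rules are illustrative, the field is named sender because "from" is a Python keyword, and treating request_id as mandatory only for request and response messages is a guess the schema above does not confirm.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

MESSAGE_TYPES = {"request", "response", "status", "broadcast"}


def default_metadata():
    # Defaults copied from the schema example above.
    return {"priority": "normal", "timeout_ms": 30000}


@dataclass
class Message:
    type: str
    sender: str  # serialized as "from"
    to: str
    request_id: Optional[str] = None
    action: Optional[str] = None
    payload: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=default_metadata)

    def __post_init__(self):
        if self.type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {self.type!r}")
        # Assumption: only request/response must carry a request_id.
        if self.type in ("request", "response") and not self.request_id:
            raise ValueError(f"{self.type} messages need a request_id")
        # Assumption: broadcasts are always addressed to "all".
        if self.type == "broadcast" and self.to != "all":
            raise ValueError('broadcast messages must be addressed to "all"')

    def to_json(self):
        """Serialize using the wire field name "from" instead of sender."""
        data = asdict(self)
        data["from"] = data.pop("sender")
        return json.dumps(data)


msg = Message(
    type="request",
    sender="implementer",
    to="reviewer",
    request_id="req-0042",
    action="review_code",
)
print(msg.to_json())
```

Validating at construction time means a malformed message is rejected by the sender rather than discovered by the receiver.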

Error Handling

Protocols must handle three failure scenarios:

┌────────────────────────────────────────────────┐
│  1. TIMEOUT                                     │
│     No response in 30s → retry once → escalate  │
│                                                  │
│  2. REJECTION                                    │
│     Agent cannot handle request                  │
│     { status: "rejected", reason: "out_of_scope"}│
│     → reassign to another agent                  │
│                                                  │
│  3. PARTIAL FAILURE                              │
│     Agent completed part of the work             │
│     { status: "partial",                         │
│       completed: [...], failed: [...] }          │
│     → retry failed parts or fallback             │
└────────────────────────────────────────────────┘

When retries and reassignment are both exhausted, the requesting agent falls back to doing the work itself (solo execution) rather than blocking indefinitely.
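The three scenarios and the solo fallback can be combined in one dispatch loop. The sketch below is only an illustration: the dispatch function, the send and solo callables, and the idea of splitting work into a list of items are all assumptions. The transport is injected as send(agent, items), which either returns a response dict or raises TimeoutError.

```python
def dispatch(items, agents, send, solo, max_retries=1):
    """Try each agent in turn; fall back to solo execution at the end.

    send(agent, items) returns a dict with a "status" key or raises
    TimeoutError. solo(items) does the remaining work locally and
    returns the list of items it finished.
    """
    remaining = list(items)
    done = []
    for agent in agents:
        attempts = 0
        while remaining and attempts <= max_retries:
            attempts += 1
            try:
                response = send(agent, remaining)
            except TimeoutError:
                continue  # TIMEOUT: retry the same agent
            status = response["status"]
            if status == "rejected":
                break  # REJECTION: reassign to the next agent
            if status == "completed":
                done += remaining
                remaining = []
            elif status == "partial":
                # PARTIAL FAILURE: keep what worked, retry only the rest.
                done += response["completed"]
                remaining = list(response["failed"])
        if not remaining:
            return done
    # Every agent timed out, rejected, or left work unfinished.
    return done + solo(remaining)
```

Because the fallback always runs on whatever is left, the function returns in bounded time even if no agent ever answers.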

Key Insight

Without protocols, team communication becomes chaotic. The FSM makes every interaction deterministic: a lost message surfaces as a timeout instead of a silent hang, and explicit state transitions leave no ambiguity about who is waiting on whom, which removes a common source of race conditions.

The request_id pattern solves a core problem in distributed systems: correlation. The same principle appears in HTTP/2 stream identifiers, database transaction IDs, and message queue correlation IDs. Agents can work asynchronously without losing track: an implementer can send code for review, continue on the next task, and correctly handle feedback whenever it arrives, because the request_id ties the request and its response together.

Hands-On Example

A complete team interaction: Implementer sends code to Reviewer, receives feedback, fixes issues.

STEP 1: Implementer sends review request
─────────────────────────────────────────
{
  "type": "request", "from": "implementer", "to": "reviewer",
  "request_id": "req-0042", "action": "review_code",
  "payload": {
    "file": "src/auth/login.ts",
    "diff": "+function validateToken(token) { ... }"
  }
}
State: IDLE → REQUEST_SENT

STEP 2: Reviewer acknowledges
─────────────────────────────────────────
{
  "type": "status", "from": "reviewer", "to": "implementer",
  "request_id": "req-0042",
  "payload": { "status": "acknowledged", "eta_ms": 15000 }
}
State: REQUEST_SENT → WAITING

STEP 3: Reviewer sends feedback
─────────────────────────────────────────
{
  "type": "response", "from": "reviewer", "to": "implementer",
  "request_id": "req-0042", "status": "completed",
  "payload": {
    "approved": false,
    "issues": [
      { "line": 12, "severity": "high", "message": "Token expiry not checked" },
      { "line": 25, "severity": "medium", "message": "Missing error handling" }
    ]
  }
}
State: WAITING → RESPONSE_RECEIVED

STEP 4: Implementer processes, returns to IDLE
─────────────────────────────────────────
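Reads issues and applies fixes. This closes the lifecycle of req-0042.
State: RESPONSE_RECEIVED → IDLE
Sending the fixed code for re-review is a new request (req-0043) with its own lifecycle, starting again at IDLE → REQUEST_SENT.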

Each step has a clear state transition. The Implementer never guesses whether the Reviewer received the request or what the feedback refers to.
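The four steps can be replayed from the Implementer's side by letting each message drive the state of its request_id. This is a rough sketch: the advance and finish functions, the "out"/"in" direction flag, and the rule that an acknowledgment is a status message whose payload.status is "acknowledged" are assumptions drawn from the example messages, not a defined API.

```python
def advance(states, message, direction):
    """Update states[request_id] for one sent ("out") or received ("in") message."""
    rid = message["request_id"]
    current = states.get(rid, "IDLE")  # an untracked request is IDLE
    kind = message["type"]
    acked = message.get("payload", {}).get("status") == "acknowledged"

    if direction == "out" and kind == "request" and current == "IDLE":
        new = "REQUEST_SENT"
    elif direction == "in" and kind == "status" and acked and current == "REQUEST_SENT":
        new = "WAITING"
    elif direction == "in" and kind == "response" and current == "WAITING":
        new = "RESPONSE_RECEIVED"
    else:
        raise ValueError(f"unexpected {kind} for {rid} in state {current}")
    states[rid] = new
    return new


def finish(states, rid):
    """Step 4: the result is processed and the request returns to IDLE."""
    if states.get(rid) != "RESPONSE_RECEIVED":
        raise ValueError(f"{rid} has no response to process")
    del states[rid]


states = {}
request = {"type": "request", "request_id": "req-0042", "action": "review_code"}
ack = {"type": "status", "request_id": "req-0042",
       "payload": {"status": "acknowledged", "eta_ms": 15000}}
feedback = {"type": "response", "request_id": "req-0042",
            "status": "completed", "payload": {"approved": False}}

print(advance(states, request, "out"))
print(advance(states, ack, "in"))
print(advance(states, feedback, "in"))
finish(states, "req-0042")
```

Because state is stored per request_id, a follow-up request such as req-0043 starts from IDLE regardless of what happened to req-0042.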

Protocol Design Principles

DO:  Specific actions ("review_code", "run_tests", "fix_lint")
     Enough context in payload for standalone execution
     Realistic timeouts based on expected work duration

DON'T: Generic actions ("do_work"), which are too ambiguous
       Large file contents when a path suffices
       Timeouts that are too short, which cause unnecessary retries
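These guidelines could be checked mechanically before a request is sent. The sketch below is purely illustrative: the lint_request function, the deny-list of generic actions, and both thresholds are hypothetical values picked for the example, not recommendations from this session.

```python
import json

GENERIC_ACTIONS = {"do_work", "handle", "process"}  # illustrative deny-list
MAX_PAYLOAD_BYTES = 16_384  # assumed budget; tune per team
MIN_TIMEOUT_MS = 5_000      # assumed floor; tune per action


def lint_request(message):
    """Return a list of warnings for a request message (empty if clean)."""
    warnings = []

    if message.get("action") in GENERIC_ACTIONS:
        warnings.append("action is too generic; name the specific task")

    # A large payload often means file contents were inlined where a
    # path would have been enough.
    size = len(json.dumps(message.get("payload", {})).encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        warnings.append(f"payload is {size} bytes; consider sending a path")

    timeout = message.get("metadata", {}).get("timeout_ms")
    if timeout is not None and timeout < MIN_TIMEOUT_MS:
        warnings.append(f"timeout_ms={timeout} is likely to cause retries")

    return warnings


good = {
    "action": "review_code",
    "payload": {"file": "src/auth/login.ts"},
    "metadata": {"timeout_ms": 30000},
}
bad = {
    "action": "do_work",
    "payload": {"contents": "x" * 20_000},
    "metadata": {"timeout_ms": 500},
}
print(lint_request(good))
print(len(lint_request(bad)))
```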

What Changed

Without Team Protocols                 │ With Team Protocols
───────────────────────────────────────┼─────────────────────────────────────
Messages have no structure             │ Every message follows a schema
No way to match responses to requests  │ request_id provides correlation
Agents block waiting for replies       │ FSM tracks state, enables async work
Failures cause silent hangs            │ Timeouts and retries handle errors
Race conditions corrupt state          │ Deterministic state transitions

Next Session

Session 11 covers Autonomous Agents — how agents transition between WORK and IDLE phases, auto-claim tasks from shared queues, and operate independently without human-in-the-loop supervision.