Module 4: Mastery 6 / 6
Advanced Session 24 Production Monitoring

Production Patterns

Master observability, monitoring, and deployment patterns for production use.

March 20, 2026 22 min read

What You’ll Learn

So far you have used Claude Code as a personal tool — one developer in a terminal. But Claude Code can also run headless in CI/CD pipelines, serve entire teams, and operate as part of automated workflows. Moving to production requires the same rigor as any production system: monitoring, testing, error handling, and cost control.

By the end of this final session, you’ll understand:

  • Headless mode for non-interactive automation
  • CI/CD integration with GitHub Actions
  • Observability: logging, metrics, and alerting
  • Cost tracking and budgeting
  • Testing agent workflows
  • The maturity model for scaling AI-assisted development

The Problem

A tool that works on your laptop is not the same as a tool that runs reliably in production:

Development:                    Production:
One user                        Many users/triggers
Interactive terminal            Headless, no UI
Errors → you see them           Errors → silent failure
Cost → your API bill            Cost → team/org budget
Testing → "it worked for me"    Testing → automated validation

Every gap is a potential failure mode.

How It Works

Headless Mode

Claude Code runs without a terminal UI using the -p flag and structured output:

# One-shot mode with JSON output
claude -p "Summarize changes in the last 3 commits" --output-format json

# Pipe input for processing
cat requirements.txt | claude -p "Check for outdated dependencies"

The --output-format json flag is critical for automation. Instead of terminal output, you get parseable JSON:

{
  "result": "Found 2 issues: ...",
  "cost_usd": 0.023,
  "duration_ms": 4500,
  "model": "claude-sonnet-4-20250514",
  "session_id": "sess_abc123"
}
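In a script, that envelope is consumed with jq. A minimal sketch — the claude call is stubbed here with the sample payload above, so the field names follow this article's example output:

```shell
# Parse the JSON envelope with jq. In a real pipeline this would be:
#   OUTPUT=$(claude -p "..." --output-format json)
# Here the sample payload from above stands in.
OUTPUT='{"result":"Found 2 issues: ...","cost_usd":0.023,"duration_ms":4500}'

RESULT=$(echo "$OUTPUT" | jq -r '.result // empty')
COST=$(echo "$OUTPUT" | jq -r '.cost_usd // 0')

# Treat an empty result as a failure so downstream steps notice
[ -n "$RESULT" ] || { echo "claude produced no result" >&2; exit 1; }

echo "cost: \$$COST"
echo "$RESULT"
```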

CI/CD Integration

The most common production pattern: Claude Code in GitHub Actions.

# .github/workflows/ai-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run AI Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          npm install -g @anthropic-ai/claude-code
          DIFF=$(git diff origin/main...HEAD)
          echo "$DIFF" | claude -p \
            "Review this diff for bugs and security issues." \
            --output-format json --max-turns 20 \
            > review.json 2>claude-errors.log || true

      - name: Post Review Comment
        if: always()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            let review;
            try {
              review = JSON.parse(fs.readFileSync('review.json', 'utf8'));
            } catch (e) {
              review = { result: 'Review failed: ' + e.message };
            }
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `## AI Code Review\n\n${review.result || 'No output'}\n\n---\nCost: \`$${review.cost_usd || '?'}\``
            });

Key production details:

  • timeout-minutes prevents runaway sessions
  • --max-turns caps agent iterations
  • || true prevents the step from failing the job
  • if: always() posts results even on error

Observability

┌──────────────────────────────────────────────────────┐
│  Layer 1: Logging                                     │
│  What prompt, what tools called, what output, errors  │
│                                                       │
│  Layer 2: Metrics                                     │
│  Tokens per session, cost per project, duration,      │
│  tool call counts, error rates                        │
│                                                       │
│  Layer 3: Alerting                                    │
│  Cost exceeds threshold, error rate spikes,           │
│  agent stuck in loop, unexpected tool calls           │
└──────────────────────────────────────────────────────┘
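Layer 1 can start as a one-line JSONL append around each headless call. A sketch, with the claude call stubbed by the sample envelope from earlier — the log path is an arbitrary choice, not a Claude Code convention:

```shell
# Layer 1 sketch: append one JSONL record per headless session.
LOG_FILE="${LOG_FILE:-$HOME/.claude/session-log.jsonl}"
mkdir -p "$(dirname "$LOG_FILE")"

# In a real wrapper: OUTPUT=$(claude -p "$PROMPT" --output-format json)
OUTPUT='{"result":"ok","cost_usd":0.01,"duration_ms":1200,"session_id":"sess_abc123"}'

# Keep only the fields worth querying later; flag missing results as errors
echo "$OUTPUT" | jq -c '{ts: (now | todate), session_id, cost_usd,
                         duration_ms, error: (.result == null)}' >> "$LOG_FILE"

tail -1 "$LOG_FILE"
```

Because each line is a complete JSON object, the log can be filtered and aggregated with jq later without any parsing code.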

A simple cost tracking wrapper:

#!/bin/bash
# track-cost.sh - Log Claude Code costs per project
PROJECT="$1"; PROMPT="$2"
OUTPUT=$(claude -p "$PROMPT" --output-format json 2>/dev/null)
COST=$(echo "$OUTPUT" | jq -r '.cost_usd // 0')
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
echo "$TIMESTAMP,$PROJECT,$COST" >> ~/.claude/cost-log.csv
echo "$OUTPUT" | jq -r '.result'

Cost Monitoring

┌──────────────────────────────────────────────────────┐
│  Layer 1: Per-session limits                          │
│  claude --max-tokens 50000                            │
│                                                       │
│  Layer 2: Model selection                             │
│  Haiku ~$0.001/call │ Sonnet ~$0.01 │ Opus ~$0.05   │
│                                                       │
│  Layer 3: Budget dashboards                           │
│  Track cost per project, per developer, over time     │
│                                                       │
│  Layer 4: Alerts                                      │
│  "Project X spent $50 today (limit: $20)"             │
└──────────────────────────────────────────────────────┘
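Layer 4 can be a cron-friendly awk pass over the cost-log.csv produced by the tracker above. A sketch — the $20 limit and the sample rows are illustrative:

```shell
# Layer 4 sketch: daily budget check over cost-log.csv (timestamp,project,cost_usd)
LOG="${LOG:-$HOME/.claude/cost-log.csv}"
LIMIT=20
mkdir -p "$(dirname "$LOG")"

# Sample rows in the tracker's format
cat > "$LOG" <<'EOF'
2026-03-20T10:00:00Z,project-x,12.50
2026-03-20T14:30:00Z,project-x,9.75
2026-03-20T15:00:00Z,project-y,0.40
EOF

TODAY="2026-03-20"   # in practice: $(date -u +%F)

# Sum today's spend per project; print an alert for any project over the limit
ALERTS=$(awk -F, -v day="$TODAY" -v limit="$LIMIT" '
  index($1, day) == 1 { spend[$2] += $3 }
  END {
    for (p in spend)
      if (spend[p] > limit)
        printf "ALERT: %s spent $%.2f today (limit: $%d)\n", p, spend[p], limit
  }' "$LOG")
echo "$ALERTS"
```

Piped to a chat webhook or mail, this is the "Project X spent $50 today" alert with no infrastructure beyond cron.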

Testing Agent Workflows

Agent workflows need testing at three levels:

Unit tests: Validate skill and agent files exist and are well-formed.

for skill_dir in .claude/skills/*/; do
  [ ! -f "$skill_dir/SKILL.md" ] && echo "FAIL: Missing $skill_dir/SKILL.md" && exit 1
done
echo "PASS: All skill files present"

Integration tests: Run Claude Code with a known prompt, verify expected behavior.

OUTPUT=$(claude -p "/review" --output-format json 2>/dev/null)
echo "$OUTPUT" | jq -r '.result' | grep -q "Security" && echo "PASS" || echo "FAIL"

End-to-end tests: Run a complete workflow in a test environment, verify the outcome.
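For the CI workflow above, "verify the outcome" means checking the review.json artifact a run leaves behind. A sketch — a sample artifact stands in here for a real workflow run, and the $1.00 cost ceiling is an arbitrary sanity bound:

```shell
# E2E sketch: validate the review.json artifact a workflow run produces.
# (Sample artifact stands in for a real run.)
cat > review.json <<'EOF'
{"result": "Found 2 issues: ...", "cost_usd": 0.023, "session_id": "sess_abc123"}
EOF

# A sane run has a non-empty result and a plausible cost
jq -e '.result | length > 0' review.json >/dev/null || { echo "FAIL: empty result"; exit 1; }
jq -e '.cost_usd < 1.0'      review.json >/dev/null || { echo "FAIL: cost too high"; exit 1; }
echo "PASS: end-to-end artifact looks sane"
```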

The Maturity Model

┌──────────────────────────────────────────────────────┐
│  Stage 1: Ad-hoc Usage                               │
│  Individual devs use Claude Code, no shared config    │
│  → Personal productivity boost                        │
│                                                       │
│  Stage 2: Personal Automation                         │
│  Custom CLAUDE.md, personal skills, allowlists        │
│  → Consistent personal workflows                      │
│                                                       │
│  Stage 3: Team Standardization                        │
│  Shared CLAUDE.md in repo, team skills, CI/CD         │
│  → Team-wide consistency and quality                   │
│                                                       │
│  Stage 4: Platform Integration                        │
│  All pipelines, custom MCP servers, dashboards        │
│  → Organizational capability                           │
└──────────────────────────────────────────────────────┘

Most teams are at Stage 1-2. Stage 3 is where the multiplier kicks in — shared knowledge means every team member benefits from the best workflows.

Key Insight

Production Claude Code is about reliability and predictability. The same principles that make software production-ready apply directly to AI agent systems.

An agent that works 95% of the time is not production-ready. Production-ready means: you know when it fails (monitoring), you know why (logging), you know the cost (budgeting), you know it behaves consistently (testing), and you know it will not do something unexpected (permissions).

The tools are not exotic — logs, metrics, alerts, tests, CI/CD. The only difference is that the “service” is an AI agent instead of a web server.

Hands-On Example

Combine everything into a production checklist for your first CI/CD integration:

1. Create .claude/skills/review/SKILL.md     (review workflow)
2. Create .claude/agents/security-reviewer.md (security checks)
3. Add .github/workflows/ai-review.yml       (CI/CD above)
4. Set ANTHROPIC_API_KEY in repo secrets
5. Open a PR → AI review posts automatically
6. Track cost-log.csv weekly → set budget alerts
7. Review allowlists monthly → expand as trust builds

Start with one pipeline. Measure quality and cost. Expand to more workflows as confidence grows. This is Stage 2 to Stage 3 in the maturity model — the transition from personal tool to team infrastructure.

What Changed

Development Usage               Production Usage
Interactive terminal            Headless, JSON output
One developer                   Team-wide, CI/CD triggers
Manual monitoring               Dashboards and alerts
Cost is personal                Cost is tracked and budgeted
"It works for me"               Automated validation
Ad-hoc workflows                Standardized skills, version-controlled

Course Conclusion

You have completed the Claude Code Architecture Mastery course — 24 sessions across four modules.

Module 1: Core Agent (Sessions 1-6) — The agent loop, tools and permissions, planning with TodoWrite, subagents and context isolation, skills and knowledge loading, context compaction. You learned that Claude Code is not a chatbot but an agent loop with tools, permissions, and context management.

Module 2: Multi-Agent (Sessions 7-12) — Task graphs, background tasks, agent teams, team protocols, autonomous agents, worktree isolation. You learned that complex work requires multiple agents coordinating through well-defined protocols.

Module 3: Real Architecture (Sessions 13-18) — Control protocols, MCP integration, hooks, session storage, CLAUDE.md design, permission models. You gained practical knowledge to configure, extend, and secure Claude Code.

Module 4: Mastery (Sessions 19-24) — Multi-CLI workflows, error recovery, cost optimization, human-in-the-loop, custom agents and skills, production patterns. You reached production readiness.

The Core Principles

Five ideas appeared across all 24 sessions:

  1. The agent loop is the foundation. Tools, subagents, skills, teams — all built on: call API, check stop reason, execute tools, repeat.
  2. Context is the most precious resource. Subagents, compaction, skills, worktrees — every design decision manages context.
  3. Permissions are the safety net. Trust builds gradually, from restrictive to autonomous.
  4. Markdown is the extension language. No compilation, no deployment. Drop a file, gain a capability.
  5. Production requires production discipline. Monitoring, testing, error handling, cost management — familiar tools, new application.

Where to Go Next

  • Build your skill library. Start with your team’s three most common workflows.
  • Set up CI/CD integration. Start with automated PR reviews. Tune based on results.
  • Contribute to the community. Share skills, agents, and patterns.
  • Push the boundaries. Multi-agent architectures, custom MCP servers, novel coordination patterns.

You now have the architectural knowledge to build on top of Claude Code, not just use it. That is the difference between a user and a builder. Build something.