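Error Recovery

How to build retry logic, error classification, and fallback strategies.

March 20, 2026 · 18 min read

What You'll Learn

Errors are inevitable in agent workflows. APIs return 429. Builds fail. Tests break. Files go missing. What separates a brittle script from a resilient agent is not whether it encounters errors, but how it responds to failure.

After this session, you will understand:

How to classify errors into actionable categories
Retry patterns with exponential backoff
Fallback strategies for when the primary approach fails
How Claude Code turns errors into information
Common failure modes and their recovery patterns

The Problem

A naive agent treats every error the same way: it stops. But not all errors are alike.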

Error: ENOENT: no such file or directory     ← Wrong path, fixable
Error: 429 Too Many Requests                 ← Temporary, just wait
Error: Cannot find module 'express'          ← Missing dependency, install it
Error: SyntaxError: Unexpected token         ← Code bug, must fix
Error: EPERM: operation not permitted         ← Needs user action

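Each one needs a completely different response. The key is classification: understand what kind of error you are dealing with before deciding what to do about it.

How It Works

Error Classification

Every error falls into one of three categories: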

┌──────────────────────────────────────────────────┐
│            Error Classification                   │
│                                                   │
│  TRANSIENT → Retry with backoff                   │
│  │  Rate limits, network errors, timeouts,        │
│  │  server 5xx. Will succeed if you wait.         │
│  │                                                │
│  PERMANENT → Change approach                      │
│  │  File not found, syntax error, type mismatch,  │
│  │  missing module. Retrying won't help.          │
│  │                                                │
│  USER-ACTIONABLE → Ask the user                   │
│     Permission errors, auth required, config      │
│     missing. The agent cannot resolve alone.      │
│                                                   │
└──────────────────────────────────────────────────┘
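The three categories above can be sketched as a small classifier. This is a minimal illustration, not a definitive implementation: the `ErrorInfo` shape and the `classifyError` name are hypothetical, and the specific status codes and errno codes chosen for each bucket are assumptions you would tune for your own tools.

```typescript
type ErrorCategory = "transient" | "permanent" | "user-actionable";

// Hypothetical normalized shape. Real errors differ by library,
// so an agent would map them into something like this first.
interface ErrorInfo {
  code?: string;       // Node-style errno code, e.g. "ENOENT"
  httpStatus?: number; // e.g. 429 or 503
  message: string;
}

// Assumed bucket membership; extend as needed.
const TRANSIENT_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "EAI_AGAIN"]);
const USER_CODES = new Set(["EACCES", "EPERM"]);

function classifyError(err: ErrorInfo): ErrorCategory {
  const status = err.httpStatus;
  if (status !== undefined) {
    // Rate limits and server-side 5xx usually clear up if you wait.
    if (status === 429 || (status >= 500 && status <= 599)) return "transient";
    // Authentication and authorization need a human.
    if (status === 401 || status === 403) return "user-actionable";
  }
  if (err.code !== undefined) {
    if (TRANSIENT_CODES.has(err.code)) return "transient";
    if (USER_CODES.has(err.code)) return "user-actionable";
  }
  // Everything else: retrying will not help, so change approach.
  return "permanent";
}
```

Defaulting to "permanent" is a deliberate choice in this sketch: an unknown error that gets retried blindly wastes time, while one that triggers a change of approach at least produces new information.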

Retry Pattern

For transient errors, use exponential backoff:

Attempt 1: Try → Fails (429) → Wait 1s
Attempt 2: Try → Fails (429) → Wait 2s
Attempt 3: Try → Fails (429) → Wait 4s
Attempt 4: Try → Succeeds!

  Max retries: 3-5 for most operations
  Max wait: cap at 30-60 seconds
  Doubling wait prevents hammering rate-limited APIs
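The schedule above (1s, 2s, 4s, with a cap) can be sketched as a small helper. The names `retryWithBackoff`, `backoffDelayMs`, and the option fields are hypothetical; the defaults (3 retries, 1s base, 30s cap) are taken from the ranges listed above. The `sleep` option is injectable only so the behavior can be exercised without real waiting.

```typescript
interface RetryOptions {
  maxRetries?: number;  // retries after the first attempt
  baseDelayMs?: number; // wait before the first retry
  maxDelayMs?: number;  // upper bound on any single wait
  isTransient?: (err: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>;
}

// Delay before retry number `retryIndex` (0-based): base * 2^retryIndex, capped.
function backoffDelayMs(retryIndex: number, baseDelayMs = 1000, maxDelayMs = 30000): number {
  return Math.min(baseDelayMs * 2 ** retryIndex, maxDelayMs);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(op: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    maxDelayMs = 30000,
    isTransient = () => true,
    sleep = defaultSleep,
  } = opts;

  for (let retry = 0; ; retry++) {
    try {
      return await op();
    } catch (err) {
      // Give up when retries are exhausted or the error is not worth retrying.
      if (retry >= maxRetries || !isTransient(err)) throw err;
      await sleep(backoffDelayMs(retry, baseDelayMs, maxDelayMs));
    }
  }
}
```

With the defaults, an operation that fails three times with a 429 and then succeeds waits 1000 ms, 2000 ms, and 4000 ms, which matches the four-attempt trace above. Passing an `isTransient` predicate keeps permanent errors from being retried at all.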

Fallback Strategies

When the primary approach fails permanently, try an alternative:

┌──────────────────────────────────────────────────┐
│           Fallback Chain                          │
│                                                   │
│  Primary: npm install express                     │
│     │  FAILS (npm not found)                      │
│     ▼                                             │
│  Fallback 1: yarn add express                     │
│     │  FAILS (yarn not found)                     │
│     ▼                                             │
│  Fallback 2: pnpm add express                     │
│     │  SUCCEEDS → Continue with pnpm              │
│     ▼                                             │
│  All failed → Report what was tried               │
│                                                   │
└──────────────────────────────────────────────────┘

Fallbacks work at every level:

File Reading:     Try config.ts → config.js → config.json → search
Build Commands:   Try pnpm build → npm run build → yarn build → read package.json
Test Runners:     Try pnpm test → npx jest → npx vitest → find config
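A fallback chain like the ones above can be sketched as an ordered list of strategies that are tried until one succeeds. The `Strategy` and `FallbackOutcome` types and the `runWithFallbacks` function are hypothetical names for illustration. How each strategy actually runs its command is left to the caller, so the sketch stays independent of any particular process API.

```typescript
interface Strategy<T> {
  name: string;           // human-readable label, e.g. the command being tried
  run: () => Promise<T>;  // rejects if this strategy fails
}

interface Failure {
  name: string;
  error: string;
}

type FallbackOutcome<T> =
  | { ok: true; value: T; used: string; failures: Failure[] }
  | { ok: false; failures: Failure[] };

// Try each strategy in order. Record every failure so that, if all of
// them fail, the caller can report exactly what was tried.
async function runWithFallbacks<T>(strategies: Strategy<T>[]): Promise<FallbackOutcome<T>> {
  const failures: Failure[] = [];
  for (const strategy of strategies) {
    try {
      const value = await strategy.run();
      return { ok: true, value, used: strategy.name, failures };
    } catch (err) {
      failures.push({
        name: strategy.name,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return { ok: false, failures };
}
```

Returning the list of failures alongside a success matters as much as returning it on total failure: an agent that continued with pnpm should remember that npm and yarn were unavailable, so it does not try them again later in the same task.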

Tool Error Handling in Claude Code

When a tool call fails, the error comes back as a tool_result. The AI reads it and adapts:

┌──────────────────────────────────────────────────┐
│        Error as Tool Result                       │
│                                                   │
│  AI calls: Bash("cat src/auth.ts")                │
│                                                   │
│  Tool returns:                                    │
│    "cat: src/auth.ts: No such file or directory"  │
│    is_error: true                                 │
│                                                   │
│  AI reasons: "Wrong path. Let me search."         │
│                                                   │
│  AI calls: Grep("auth", glob="*.ts")              │
│                                                   │
│  Tool returns:                                    │
│    "src/middleware/authenticate.ts"               │
│    "src/lib/auth-utils.ts"                        │
│                                                   │
│  AI adapts: reads the correct file                │
│                                                   │
└──────────────────────────────────────────────────┘

This is the "errors as information" principle. An error is not a stop signal. It is data the AI uses to make a better next decision.
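To make the idea concrete, here is a minimal sketch of wrapping a tool's outcome so that a failure becomes a result instead of an exception. The field names (`type`, `tool_use_id`, `content`, `is_error`) follow the tool_result content block of the Anthropic Messages API. The `toToolResult` helper is hypothetical, and how Claude Code builds these results internally is not described here, so treat this as an assumption-laden illustration.

```typescript
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string; // id of the tool call this result answers
  content: string;
  is_error?: boolean;  // set only when the tool failed
}

// Run a tool and always return a result block. A thrown error is
// captured as content with is_error: true so the model can read it.
function toToolResult(toolUseId: string, run: () => string): ToolResultBlock {
  try {
    return { type: "tool_result", tool_use_id: toolUseId, content: run() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool_result", tool_use_id: toolUseId, content: message, is_error: true };
  }
}
```

The point of the design is that the error text reaches the model verbatim. "No such file or directory" is exactly the clue that prompts a search for the correct path.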

Error Recovery Loop

┌──────────────────────────────────────────┐
│          Error Recovery Loop              │
│                                           │
│  1. Attempt the operation                 │
│          │                                │
│          ▼                                │
│  2. Succeed? YES → Continue               │
│              NO  → Classify               │
│          │                                │
│          ▼                                │
│  3. TRANSIENT → Retry (max 3-5x)         │
│     PERMANENT → Try fallback              │
│     USER-ACT  → Report and ask            │
│          │                                │
│          ▼                                │
│  4. Recovery worked? YES → Continue       │
│                      NO  → Escalate       │
│                                           │
└──────────────────────────────────────────┘
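The four steps of the loop can be put together in one function. This is a sketch under stated assumptions: `recover`, `RecoveryPlan`, and `Outcome` are hypothetical names, the classifier is supplied by the caller, and "escalate" is modeled simply as returning the list of everything that was tried.

```typescript
type Category = "transient" | "permanent" | "user-actionable";

type Outcome<T> =
  | { status: "ok"; value: T }
  | { status: "ask-user"; reason: string }
  | { status: "escalate"; tried: string[] };

interface RecoveryPlan<T> {
  attempts: { name: string; run: () => Promise<T> }[]; // primary first, then fallbacks
  classify: (err: unknown) => Category;
  maxRetries?: number;
  sleep?: (ms: number) => Promise<void>;
}

async function recover<T>(plan: RecoveryPlan<T>): Promise<Outcome<T>> {
  const maxRetries = plan.maxRetries ?? 3;
  const sleep = plan.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const tried: string[] = [];

  for (const attempt of plan.attempts) {
    for (let retry = 0; ; retry++) {
      try {
        // Steps 1-2: attempt the operation; on success, continue.
        return { status: "ok", value: await attempt.run() };
      } catch (err) {
        // Step 3: classify, then pick the matching recovery.
        const category = plan.classify(err);
        const message = err instanceof Error ? err.message : String(err);
        if (category === "user-actionable") {
          return { status: "ask-user", reason: `${attempt.name}: ${message}` };
        }
        if (category === "transient" && retry < maxRetries) {
          await sleep(Math.min(1000 * 2 ** retry, 30000));
          continue;
        }
        // Permanent, or retries exhausted: move on to the next fallback.
        tried.push(`${attempt.name}: ${message}`);
        break;
      }
    }
  }
  // Step 4: nothing worked, so escalate with a record of what was tried.
  return { status: "escalate", tried };
}
```

Note that a transient error whose retries run out is treated like a permanent one in this sketch and falls through to the next fallback. Whether that is right depends on the operation; for a rate-limited API with no alternative, escalating immediately may be the better choice.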

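Key Takeaways

Good error recovery is what separates a brittle script from a resilient agent. The AI should treat errors as learning opportunities, not as stop conditions.

When an error occurs, it carries information:

"File not found" means the path is wrong, so search for the correct path
"Module not found" means a dependency is missing, so install it
"Type error" means the code has a bug, so read the types and fix it
"Permission denied" means the operation needs elevated access, so ask the user

Each error narrows the solution space. The most common mistake in agent design is treating every error as fatal. The second most common is retrying everything indiscriminately. Classification is the skill that makes recovery effective.

Hands-On

Building a Resilient Package Install Flow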

Install the "sharp" image processing library.
If it fails, diagnose the error and try alternative approaches.

How an agent with good error recovery behaves:

Step 1: pnpm add sharp
  → Error: node-gyp build failed (missing libvips)
  → Classify: PERMANENT (missing system dependency)

Step 2: Fallback → Try pre-built binaries
  → pnpm add sharp --ignore-scripts && npx sharp-install
  → Still fails

Step 3: Fallback → Try alternative library
  → pnpm add jimp (pure JavaScript, no native deps)
  → Succeeds!

Step 4: Update code to use jimp instead of sharp
  → Adjust imports and API calls
  → Run tests to verify

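Common Failure Modes and Recovery

Failure	Classification	Recovery
`ENOENT: no such file or directory`	Permanent	Search for the file, check the spelling
`429 Too Many Requests`	Transient	Exponential backoff, retry up to 5 times
`EACCES: permission denied`	User-actionable	Report it, suggest `chmod` or `sudo`
`Build failed: type error`	Permanent	Read the error, fix the type mismatch
`Test failed: assertion`	Permanent	Read the test, fix the logic or update the test
`Connection timeout`	Transient	Retry with a longer timeout
`Module not found`	Permanent	Install the missing dependency
`Git conflict`	Permanent	Read the conflict markers and resolve them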
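The table above can also be kept as data, so that a raw error message maps straight to a suggested recovery. This is only a sketch: the regular expressions are assumptions about how these messages typically look, and `RECOVERY_RULES` and `lookupRecovery` are hypothetical names. Matching on message text is fragile, so prefer structured error codes where your tools expose them.

```typescript
interface RecoveryRule {
  pattern: RegExp; // assumed shape of the error message
  category: "transient" | "permanent" | "user-actionable";
  recovery: string;
}

// One rule per table row. The first matching rule wins.
const RECOVERY_RULES: RecoveryRule[] = [
  { pattern: /ENOENT/, category: "permanent", recovery: "Search for the file, check the spelling" },
  { pattern: /\b429\b/, category: "transient", recovery: "Exponential backoff, retry up to 5 times" },
  { pattern: /EACCES/, category: "user-actionable", recovery: "Report it, suggest chmod or sudo" },
  { pattern: /type error/i, category: "permanent", recovery: "Read the error, fix the type mismatch" },
  { pattern: /assertion/i, category: "permanent", recovery: "Read the test, fix the logic or update the test" },
  { pattern: /timeout/i, category: "transient", recovery: "Retry with a longer timeout" },
  { pattern: /module not found|cannot find module/i, category: "permanent", recovery: "Install the missing dependency" },
  { pattern: /conflict/i, category: "permanent", recovery: "Read the conflict markers and resolve them" },
];

// Returns undefined for unrecognized messages so the caller can decide
// how to handle errors the table does not cover.
function lookupRecovery(message: string): RecoveryRule | undefined {
  return RECOVERY_RULES.find((rule) => rule.pattern.test(message));
}
```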

CLAUDE.md Configuration for Error Handling

Encode recovery strategies directly in your project configuration:

# Error Recovery Rules
- Build fails: read FULL error output, not just last line
- Type error: check the relevant type definitions
- Missing import: search for the correct path
- Dependency issue: run pnpm install first
- Test fails: run failing test in isolation, read both test and implementation
- Never retry more than 3 times for the same error
- If stuck after 3 attempts, report what you tried

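What Changed

Without error recovery	With error recovery
Agent stops at the first error	Agent classifies and adapts
All errors are treated the same	Errors are classified as transient, permanent, or user-actionable
No retry logic	Exponential backoff for transient errors
No fallback options	Alternative approaches are tried
Errors are failures	Errors are information
User must always intervene	Agent self-heals wherever possible

Next Session

Session 21 covers cost optimization. You will learn how to choose the right model for each task, manage tokens efficiently, and design workflows that balance quality and cost.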