Context Compaction
Learn how Claude Code manages context window limits with a 3-layer compression strategy — keeping conversations productive even through long sessions.
What You’ll Learn
Context windows are finite. Even with 200K tokens, a long coding session with many file reads and tool calls will eventually hit the limit. When it does, Claude Code doesn’t just stop — it compacts.
By the end, you’ll understand:
- Why context limits matter in practice
- The 3-layer compression strategy
- What information survives compaction
- How to structure work for better context efficiency
The Problem
A typical coding session might look like:
System prompt: ~3,000 tokens
CLAUDE.md: ~1,000 tokens
Skill definitions: ~2,000 tokens
File reads (10 files): ~15,000 tokens
Tool outputs: ~5,000 tokens
Conversation: ~10,000 tokens
─────────────────────────────────────
Total: ~36,000 tokens
Now imagine a longer session with 50 file reads, multiple searches, and several rounds of edits. You can easily hit 100K+ tokens. At some point, the context window needs to be trimmed.
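The arithmetic above can be sketched directly. All token counts here are illustrative estimates from the table, not measured values, and the scale-up factors for a long session are assumptions:

```python
# Rough token budget for the session sketched above (estimates, not measurements).
BUDGET = {
    "system_prompt": 3_000,
    "claude_md": 1_000,
    "skills": 2_000,
    "file_reads": 15_000,   # ~10 files
    "tool_outputs": 5_000,
    "conversation": 10_000,
}

total = sum(BUDGET.values())
print(total)  # 36000 tokens for a modest session

# Scale the variable parts for a long session: ~50 file reads,
# several times the tool output and conversation.
long_session = (total
                + 4 * BUDGET["file_reads"]      # 40 more files
                + 5 * BUDGET["tool_outputs"]    # many more tool calls
                + 2 * BUDGET["conversation"])   # longer back-and-forth
print(long_session)  # 141000 — comfortably past 100K
```

Fixed overhead (system prompt, CLAUDE.md, skills) stays constant; it is the file reads and tool outputs that grow without bound, which is why Layer 1 targets them first.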
How It Works
The 3-Layer Strategy
Claude Code uses a progressive compression approach:
┌──────────────────────────────────────────────┐
│ Context Compaction │
│ │
│ Layer 1: Trim Tool Outputs │
│ ┌───────────────────────────────────────┐ │
│ │ Long tool results → truncated │ │
│ │ "File contents (5000 lines)..." │ │
│ │ → "File contents (first 200 lines)..." │ │
│ └───────────────────────────────────────┘ │
│ │
│ Layer 2: Summarize Old Messages │
│ ┌───────────────────────────────────────┐ │
│ │ Early conversation turns → │ │
│ │ compressed summary │ │
│ │ │ │
│ │ Turn 1-15: "User asked to fix auth │ │
│ │ bug. Found issue in middleware. │ │
│ │ Fixed JWT validation. Tests pass." │ │
│ └───────────────────────────────────────┘ │
│ │
│ Layer 3: Preserve Recent Context │
│ ┌───────────────────────────────────────┐ │
│ │ Last N turns kept in full │ │
│ │ System prompt always kept │ │
│ │ CLAUDE.md always kept │ │
│ │ Active task list always kept │ │
│ └───────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────┘
What Survives Compaction
| Always Preserved | Compressed | Removed |
|---|---|---|
| System prompt | Old conversation turns | Redundant tool outputs |
| CLAUDE.md | Early file reads | Superseded edits |
| Current task list | Previous search results | Intermediate exploration |
| Recent conversation | Old error messages | Resolved error details |
The Compaction Trigger
Compaction happens automatically when context usage approaches the limit:
Context usage: ████████████████░░░░ 80% ← Normal
Context usage: ██████████████████░░ 90% ← Approaching limit
Context usage: ████████████████████ 95% ← COMPACT NOW
After compaction:
Context usage: ████████████░░░░░░░░ 60% ← Room to continue
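The trigger logic implied by the bars above can be sketched as two small functions. The 95% trigger and 60% post-compaction target are taken from the diagram; treating them as fixed constants is an assumption, not Claude Code's actual implementation:

```python
CONTEXT_LIMIT = 200_000   # tokens
COMPACT_AT = 0.95         # trigger threshold (from the diagram)
TARGET = 0.60             # post-compaction target (from the diagram)

def should_compact(used_tokens: int) -> bool:
    """Fire compaction when usage reaches the threshold."""
    return used_tokens / CONTEXT_LIMIT >= COMPACT_AT

def tokens_to_free(used_tokens: int) -> int:
    """How many tokens compaction should reclaim to land at the target."""
    return max(0, used_tokens - int(CONTEXT_LIMIT * TARGET))

print(should_compact(160_000))   # 80% -> False
print(should_compact(190_000))   # 95% -> True
print(tokens_to_free(190_000))   # 70000 tokens to reclaim
```

Compacting down to 60% rather than just under the limit is the important design choice: it buys enough headroom that compaction runs occasionally, not on every turn.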
Preserving Critical Information
The system is smart about what to preserve:
- System instructions — Always kept (defines behavior)
- User’s original request — Always kept (the goal)
- Current state — What files have been modified, what’s pending
- Recent turns — The last several interactions in full
- Summary of earlier work — Compressed but not lost
Key Insight
Compaction is not deletion — it’s summarization. The AI doesn’t lose awareness of what happened earlier; it loses the verbatim details but keeps the essence.
This has practical implications for how you work:
- Long sessions are fine — compaction keeps them productive
- Critical context should be in CLAUDE.md — it’s never compacted
- Break large tasks into phases — each phase can start with a fresh context
- Use subagents for research — their context is separate and discarded after
The most common mistake is micromanaging context limits yourself. In practice, compaction handles them automatically. The system degrades gracefully: you'll notice slightly less detailed recall of earlier conversation, but the work continues.
Hands-On Example
Optimizing for Context Efficiency
Structure your work to minimize context waste:
# Inefficient: loading everything then working
"Read all files in src/, then fix the bug in auth.ts"
→ Loads 50 files, only 1 is relevant
# Efficient: targeted exploration
"The bug is in authentication. Check src/auth/ first."
→ Loads 3-5 files, context stays lean
# Even better: use subagents for exploration
"Use an Explore agent to find where authentication
is handled, then fix the bug."
→ Exploration context is isolated in subagent
Monitoring Context Usage
Claude Code shows context usage in the UI. Watch for the indicator:
─────────────────────────────────────
Context: ████████████████░░░░ 80%
─────────────────────────────────────
When you see high context usage:
- Consider starting a new session for unrelated tasks
- Use /compact to manually trigger compaction
- Delegate research to subagents
What Makes a Good Summary
When compaction summarizes your earlier work, it captures:
Summary of turns 1-25:
- User asked to implement user authentication for Express app
- Explored codebase: Express 4.x, PostgreSQL, no existing auth
- Decided on JWT + bcrypt approach
- Created: src/middleware/auth.ts, src/routes/auth.ts
- Modified: src/app.ts (added auth middleware)
- Added: user table migration
- Tests written and passing (4 tests)
- Current task: Adding password reset flow
This summary preserves the key decisions and state without the 25 turns of back-and-forth.
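One way to picture why this works is to treat the summary as a structured record rather than prose: goal, findings, decisions, file changes, and current task each get a field. The field names and `render` helper below are hypothetical, invented for this sketch:

```python
# Illustrative structured form of the summary above (field names are assumptions).
summary = {
    "turns": "1-25",
    "goal": "Implement user authentication for Express app",
    "findings": ["Express 4.x", "PostgreSQL", "no existing auth"],
    "decisions": ["JWT + bcrypt"],
    "created": ["src/middleware/auth.ts", "src/routes/auth.ts"],
    "modified": ["src/app.ts (added auth middleware)"],
    "current_task": "Adding password reset flow",
}

def render(s: dict) -> str:
    """Collapse the record into the compact text form that replaces old turns."""
    return (f"Summary of turns {s['turns']}: goal={s['goal']}; "
            f"decisions={', '.join(s['decisions'])}; "
            f"created={', '.join(s['created'])}; "
            f"next={s['current_task']}")

print(render(summary))
```

A few hundred characters of this kind of record can stand in for tens of thousands of tokens of transcript, because decisions and state are what later turns actually need.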
What Changed
| No Compaction | With 3-Layer Compaction |
|---|---|
| Session ends when context fills | Sessions can continue indefinitely |
| All history at full detail | Progressive compression |
| Context crisis is abrupt | Gradual, graceful degradation |
| Must restart for long tasks | Automatic management |
| Critical info can be lost | System prompt + CLAUDE.md preserved |
Next Session
This completes Module 1: Core Agent! You now understand the fundamental building blocks: the agent loop, tools, planning, subagents, knowledge loading, and context management.
In Module 2, we scale up to multi-agent systems. Session 7 starts with Task Graph & Dependencies — how Claude Code coordinates multiple tasks with dependency edges, enabling complex workflows.