How to Reduce Claude Code Token Usage
Before you change anything, measure where your tokens actually go. Cutting the wrong thing saves nothing — and you can't see the right thing without real data.
Quick answer
npx whoburnedmore to see which sessions and models are burning most of your budget. Then apply the high-impact fixes: clear context with /clear, use /compact to compress long sessions, scope your file reads, and route simple tasks to cheaper models. 📉Most developers who complain about high Claude Code bills are surprised when they see the breakdown: the majority of their spend comes from two or three sessions where context grew out of control, not from their everyday usage. The 10 techniques below are ranked by typical impact — but the ranking might be wrong for you. Measure first, then cut.
Measure
npx whoburnedmore
Identify
biggest drain
Fix
apply techniques
Verify
re-measure
Step 0: measure before you cut
Every token-reduction technique below has a different payoff depending on how you use Claude Code. A developer whose cost spikes from a handful of deep refactoring sessions needs different fixes than one whose cost is spread evenly across dozens of short tasks each day. The only way to know which techniques will help is to look at your actual data first.
$ npx whoburnedmore↳ reading ~/.claude/usage/ … TOKEN BREAKDOWN (last 30 days) date sessions input tokens output tokens cost ───────────────────────────────────────────────────────────── Jun 12 3 sess. 88.4M 14.2M $19.80 Jun 11 8 sess. 14.1M 2.1M $3.10 Jun 10 5 sess. 11.6M 1.8M $2.50 ───────────────────────────────────────────────────────────── Jun 12 alone = 79% of the 3-day total cost
In the example above, June 12 had only 3 sessions but burned nearly 80% of the 3-day budget. That tells you the problem isn't the frequency of sessions — it's the depth of context in those three long sessions. The fix for that scenario is /compact and /clear, not fewer sessions.
The 10 fixes, ranked by typical impact
Apply these in order. The first four account for the majority of savings in most workflows; the later ones give incremental gains for high-volume users.
- 1
Use /clear between unrelated tasks
The single highest-impact change for most developers. When you finish a task and start something unrelated, run/clearbefore the new message. This resets the context window to zero, so the next request doesn't re-send the entire conversation history as input tokens. A context window that drifted to 60k tokens costs you 60k input tokens on every subsequent message — clearing it costs nothing. - 2
Run /compact when sessions go long
When you can't clear (you're mid-task and need the history), run/compact. Claude Code compresses the conversation into a dense summary, replacing the full history with a compact version that preserves task state without the raw token weight. This can reduce a 100k-token context to under 20k with no meaningful loss for most coding tasks. - 3
Scope your file reads — don't load the whole repo
When you ask Claude to “look at the codebase,” it reads whatever files it can reach. Be specific: name the exact files relevant to the current task. Loading 50 files you don't need costs you those 50 files' worth of input tokens on every message in the session. - 4
Route simple tasks to a cheaper model
Not every task needs Sonnet. Formatting fixes, docstring generation, type annotation, and quick lookups run well on Haiku at roughly one-tenth the per-token cost. Check your model split in whoburnedmore — if it shows 100% Sonnet usage, you're over-spending on easy tasks. - 5
Prune your CLAUDE.md
CLAUDE.md is sent at the top of every session. A bloated CLAUDE.md with 3,000 words of project context costs you those 3,000 tokens before you've typed a single message. Trim it to the essentials — the parts that actually change how Claude behaves on most tasks — and move verbose documentation elsewhere. - 6
Avoid re-reading files you already read this session
If you read a large file early in a session, Claude already has it in context. Asking to read it again is redundant and burns input tokens for data Claude already has. Keep a mental model of what's been read; start a new task with /clear rather than re-reading on top of an existing context. - 7
Break long multi-step tasks into checkpoints
A session that attempts to refactor an entire module in one go grows context linearly with each subtask. Break the refactor into independent checkpoints, /clear between them, and feed only the necessary context into each. You get the same result with a fraction of the accumulated context cost. - 8
Use /context to monitor window growth in real time
The built-in/contextcommand shows your current window size. If you run it periodically during a long session, you can catch context bloat before it becomes expensive and trigger /compact or /clear proactively. - 9
Avoid verbose output instructions unless you need them
Prompts like “write detailed comments for every line” or “explain each step thoroughly” generate long outputs. Output tokens cost roughly 5× more than input tokens on Sonnet models. Keep output instructions scoped to what you actually need; ask for verbose output only when you need it. - 10
Batch related questions into one message
Each message round-trip re-sends the entire context as input. If you have three follow-up questions, ask them together in one message rather than sending them one by one. This alone can cut input token usage by 40–60% in interactive Q&A sessions.
How much can you realistically save?
Applying the top four fixes — /clear between tasks, /compact mid-session, scoped file reads, and model routing — typically reduces monthly usage by 30–60% for developers with context-heavy workflows. The savings equation is straightforward:
Re-measure after two weeks
Apply two or three fixes, run whoburnedmore again after two weeks, and compare the cost-per-session metric. Focus on the sessions that changed most — those are your highest-leverage wins.Quick-reference: command vs. savings type
Different techniques attack different parts of the cost. The table below maps each approach to the type of waste it eliminates:
| Technique | Cuts input tokens | Cuts output tokens | Effort |
|---|---|---|---|
| /clear between tasks | large | — | One command |
| /compact mid-session | medium | — | One command |
| Scope file reads | large | — | Prompt habit |
| Route to cheaper model | via pricing | via pricing | Claude.md or prompt |
| Prune CLAUDE.md | small | — | One-time edit |
| Batch questions | medium | — | Prompt habit |
| Avoid verbose output | — | large | Prompt habit |
highest single impact
haiku cheaper than sonnet
typical savings from top 4 fixes
The verify-and-iterate loop
Reducing token usage is an iterative process. After applying fixes, you need new data to know what actually worked. The fastest feedback loop:
Week 1: measure baseline
Run npx whoburnedmore and note your total cost, peak session cost, and model split. Screenshot or copy the output to have a before comparison.
Week 2: apply the top 3 fixes
Add /clear and /compact to your workflow, and scope your file reads more tightly. Don't change everything at once — you won't know what worked.
Week 3: re-measure and adjust
Run whoburnedmore again. Compare week-over-week. If cost-per-session dropped, the context fixes worked. If total sessions rose, you might need to batch more aggressively.
Ongoing: watch weekly trend
Add whoburnedmore to your weekly check-in habit. Spikes are visible immediately, before they compound into a large monthly bill.
Related: understand your context window
If /compact and /clear feel like band-aids rather than fixes, you may want to understand how the context window fills up in the first place. See why your Claude Code context window fills up for a detailed breakdown of what eats your window and how to structure sessions around it.The cost of Claude Code is almost entirely within your control once you can see it clearly. Running npx whoburnedmore takes 10 seconds; knowing which of these 10 techniques will actually move your number saves real money over a full month. 🔥
Related guides
How Much Does Claude Code Cost Per Month?
Stop guessing from averages — measure your own monthly Claude Code cost from real usage.
Why Your Claude Code Context Window Fills Up
What eats your context window — and how token totals reveal it.
How to Check Claude Code Token Usage
See your Claude Code tokens by day, model, and project — and how the built-in /usage compares.