How to Reduce Claude Code Token Usage

Before you change anything, measure where your tokens actually go. Cutting the wrong thing saves nothing — and you can't see the right thing without real data.

By Arham Wani · Last updated June 2026 · whoburnedmore guide

Quick answer

To reduce Claude Code token usage, start by running npx whoburnedmore to see which sessions and models are burning most of your budget. Then apply the high-impact fixes: clear context with /clear, use /compact to compress long sessions, scope your file reads, and route simple tasks to cheaper models. 📉

Most developers who complain about high Claude Code bills are surprised when they see the breakdown: the majority of their spend comes from two or three sessions where context grew out of control, not from their everyday usage. The 10 techniques below are ranked by typical impact — but the ranking might be wrong for you. Measure first, then cut.

Measure (npx whoburnedmore) → Identify (biggest drain) → Fix (apply techniques) → Verify (re-measure)

The right order: measure your usage, find the biggest drain, apply targeted fixes, verify the result.

Step 0: measure before you cut

Every token-reduction technique below has a different payoff depending on how you use Claude Code. A developer whose cost spikes from a handful of deep refactoring sessions needs different fixes than one whose cost is spread evenly across dozens of short tasks each day. The only way to know which techniques will help is to look at your actual data first.

zsh — ~/code

$ npx whoburnedmore
↳ reading ~/.claude/usage/ …

  TOKEN BREAKDOWN  (last 30 days)

  date        sessions   input tokens   output tokens   cost
  ─────────────────────────────────────────────────────────────
  Jun 12      3 sess.    88.4M          14.2M           $19.80
  Jun 11      8 sess.    14.1M           2.1M            $3.10
  Jun 10      5 sess.    11.6M           1.8M            $2.50
  ─────────────────────────────────────────────────────────────
  Jun 12 alone = 78% of the 3-day total cost

In the example above, June 12 had only 3 sessions but burned nearly 80% of the 3-day budget. That tells you the problem isn't the frequency of sessions — it's the depth of context in those three long sessions. The fix for that scenario is /compact and /clear, not fewer sessions.
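To check a breakdown like this by hand, a minimal sketch follows. It uses only the three daily costs and session counts from the sample output; the function names are illustrative and are not part of whoburnedmore.

```python
# Daily totals copied from the sample output above (USD and session counts).
daily_cost = {"Jun 12": 19.80, "Jun 11": 3.10, "Jun 10": 2.50}
daily_sessions = {"Jun 12": 3, "Jun 11": 8, "Jun 10": 5}


def cost_shares(costs):
    """Return each day's fraction of the total cost."""
    total = sum(costs.values())
    return {day: cost / total for day, cost in costs.items()}


def cost_per_session(costs, sessions):
    """Return the average cost of one session for each day."""
    return {day: costs[day] / sessions[day] for day in costs}


shares = cost_shares(daily_cost)
per_session = cost_per_session(daily_cost, daily_sessions)

for day in daily_cost:
    print(f"{day}: {shares[day]:.0%} of total, ${per_session[day]:.2f} per session")
```

The per-session figure is the one to watch: a day with few sessions but a high average cost points at context depth rather than session count.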

The 10 fixes, ranked by typical impact

Apply these in order. The first four account for the majority of savings in most workflows; the later ones give incremental gains for high-volume users.

1. Use /clear between unrelated tasks
The single highest-impact change for most developers. When you finish a task and start something unrelated, run /clear before the new message. This wipes the conversation history from the context window, so the next request doesn't re-send it as input tokens. A context window that drifted to 60k tokens costs you roughly 60k input tokens on every subsequent message; clearing it costs nothing.

2. Run /compact when sessions go long
When you can't clear (you're mid-task and need the history), run /compact. Claude Code compresses the conversation into a dense summary, replacing the full history with a compact version that preserves task state without the raw token weight. This can reduce a 100k-token context to under 20k with no meaningful loss for most coding tasks.

3. Scope your file reads: don't load the whole repo
When you ask Claude to “look at the codebase,” it reads whatever files it can reach. Be specific: name the exact files relevant to the current task. Loading 50 files you don't need costs you those 50 files' worth of input tokens on every message in the session.

4. Route simple tasks to a cheaper model
Not every task needs Sonnet. Formatting fixes, docstring generation, type annotations, and quick lookups run well on Haiku at a fraction of the per-token cost (roughly one-tenth by this guide's estimate; the exact ratio depends on which Haiku and Sonnet versions you compare). Check your model split in whoburnedmore. If it shows 100% Sonnet usage, you're over-spending on easy tasks.

5. Prune your CLAUDE.md
CLAUDE.md is loaded at the top of every session. A bloated CLAUDE.md with 3,000 words of project context costs you well over 3,000 tokens (a word is usually more than one token) before you've typed a single message. Trim it to the essentials (the parts that actually change how Claude behaves on most tasks) and move verbose documentation elsewhere.

6. Avoid re-reading files you already read this session
If you read a large file early in a session, Claude already has it in context. Asking to read it again is redundant and burns input tokens for data Claude already has. Keep a mental model of what's been read; start a new task with /clear rather than re-reading on top of an existing context.

7. Break long multi-step tasks into checkpoints
A session that attempts to refactor an entire module in one go grows context linearly with each subtask. Break the refactor into independent checkpoints, /clear between them, and feed only the necessary context into each. You get the same result with a fraction of the accumulated context cost.

8. Use /context to monitor window growth in real time
The built-in /context command shows your current window size. If you run it periodically during a long session, you can catch context bloat before it becomes expensive and trigger /compact or /clear proactively.

9. Avoid verbose output instructions unless you need them
Prompts like “write detailed comments for every line” or “explain each step thoroughly” generate long outputs. Output tokens cost roughly 5× as much as input tokens on Sonnet models. Keep output instructions scoped to what you actually need; ask for verbose output only when you need it.

10. Batch related questions into one message
Each message round-trip re-sends the entire context as input. If you have three follow-up questions, ask them together in one message rather than sending them one by one. This alone can cut input token usage by 40–60% in interactive Q&A sessions.
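
The arithmetic behind fixes 1, 7, and 10 can be sketched with a toy model. It assumes every message re-sends the full accumulated history as input and ignores prompt caching, system prompts, replies, and tool output, so real bills will differ. The function name and all token sizes are hypothetical.

```python
def total_input_tokens(turn_sizes, start_context=0):
    """Sum the input tokens billed across a session in which each
    message re-sends everything accumulated so far (toy model)."""
    context = start_context
    billed = 0
    for size in turn_sizes:
        context += size    # the new message joins the history
        billed += context  # the whole history is sent as input on this turn
    return billed


# Ten turns of 2,000 tokens each, stacked on 60,000 tokens of stale history.
carried = total_input_tokens([2_000] * 10, start_context=60_000)
# The same ten turns after /clear, starting from an empty history.
cleared = total_input_tokens([2_000] * 10)

# Three 500-token follow-ups on a 40,000-token context: one by one vs. batched.
one_by_one = total_input_tokens([500, 500, 500], start_context=40_000)
batched = total_input_tokens([1_500], start_context=40_000)

print(f"carried over: {carried:,}")   # carried over: 710,000
print(f"after /clear: {cleared:,}")   # after /clear: 110,000
print(f"one by one:   {one_by_one:,}")  # one by one:   123,000
print(f"batched:      {batched:,}")     # batched:      41,500
```

In this model the stale 60k tokens are paid for on all ten turns, which is why clearing before an unrelated task dominates the other fixes.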

How much can you realistically save?

Applying the top four fixes — /clear between tasks, /compact mid-session, scoped file reads, and model routing — typically reduces monthly usage by 30–60% for developers with context-heavy workflows. The savings equation is straightforward:

savings = cost_before − cost_after = Σ_d (Δtokens_d × price_model)

Savings from context management = prior cost minus post-fix cost.
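
As a worked instance of the savings formula, here is a minimal sketch that reads d as a day index and Δtokens_d as the drop in tokens on day d. The prices and token counts are made-up placeholders, not real rates, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DayUsage:
    tokens_before: int     # tokens on day d before the fixes
    tokens_after: int      # tokens on a comparable day d after the fixes
    price_per_mtok: float  # placeholder USD price per million tokens for the model used


def savings(days):
    """Sum over days d of (delta tokens_d * price_model)."""
    return sum(
        (d.tokens_before - d.tokens_after) / 1_000_000 * d.price_per_mtok
        for d in days
    )


days = [
    DayUsage(tokens_before=10_000_000, tokens_after=6_000_000, price_per_mtok=3.0),
    DayUsage(tokens_before=4_000_000, tokens_after=3_000_000, price_per_mtok=3.0),
]

print(f"${savings(days):.2f}")  # $15.00
```

Input and output tokens are usually priced differently, so in practice you would run the sum once per token type and per model and add the results.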

Re-measure after two weeks

Apply two or three fixes, run whoburnedmore again after two weeks, and compare the cost-per-session metric. Focus on the sessions that changed most — those are your highest-leverage wins.

Quick-reference: command vs. savings type

Different techniques attack different parts of the cost. The table below maps each approach to the type of waste it eliminates:

Technique	Cuts input tokens	Cuts output tokens	Effort
/clear between tasks	large	—	One command
/compact mid-session	medium	—	One command
Scope file reads	large	—	Prompt habit
Route to cheaper model	via pricing	via pricing	CLAUDE.md or prompt
Prune CLAUDE.md	small	—	One-time edit
Batch questions	medium	—	Prompt habit
Avoid verbose output	—	large	Prompt habit

For most developers, the first three rows alone deliver most of the typical 30–60% savings.

/clear: highest single impact
~10×: Haiku cheaper than Sonnet
30–60%: typical savings from top 4 fixes

The verify-and-iterate loop

Reducing token usage is an iterative process. After applying fixes, you need new data to know what actually worked. The fastest feedback loop:

Week 1: measure baseline

Run npx whoburnedmore and note your total cost, peak session cost, and model split. Screenshot or copy the output so you have a baseline to compare against.

Week 2: apply the top 3 fixes

Add /clear and /compact to your workflow, and scope your file reads more tightly. Don't change everything at once — you won't know what worked.

Week 3: re-measure and adjust

Run whoburnedmore again. Compare week-over-week. If cost-per-session dropped, the context fixes worked. If total sessions rose, you might need to batch more aggressively.

Ongoing: watch weekly trend

Add whoburnedmore to your weekly check-in habit. Spikes are visible immediately, before they compound into a large monthly bill.
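
The week-over-week check can be sketched as a small comparison, assuming you have copied each week's total cost and session count by hand. The figures and function names below are hypothetical, and nothing here reads whoburnedmore output directly.

```python
def cost_per_session(total_cost, sessions):
    """Average cost of one session in a given week."""
    return total_cost / sessions


def compare_weeks(before, after):
    """Compare two weeks, each given as (total_cost_usd, session_count)."""
    cps_before = cost_per_session(*before)
    cps_after = cost_per_session(*after)
    return {
        "cost_per_session_change": (cps_after - cps_before) / cps_before,
        "context_fixes_helped": cps_after < cps_before,
        "sessions_rose": after[1] > before[1],
    }


# Hypothetical baseline week vs. the week after applying the fixes.
result = compare_weeks(before=(40.0, 20), after=(27.0, 30))

print(f"cost per session: {result['cost_per_session_change']:+.0%}")
print(f"context fixes helped: {result['context_fixes_helped']}")
print(f"sessions rose: {result['sessions_rose']}")
```

Separating the two signals matters: a lower cost per session with more sessions means the context fixes worked and batching is the next lever.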

Related: understand your context window

If /compact and /clear feel like band-aids rather than fixes, you may want to understand how the context window fills up in the first place. See the guide “Why Your Claude Code Context Window Fills Up” for a detailed breakdown of what eats your window and how to structure sessions around it.

The cost of Claude Code is almost entirely within your control once you can see it clearly. Running npx whoburnedmore takes 10 seconds; knowing which of these 10 techniques will actually move your number saves real money over a full month. 🔥

Related guides

How Much Does Claude Code Cost Per Month?

Stop guessing from averages — measure your own monthly Claude Code cost from real usage.

Why Your Claude Code Context Window Fills Up

What eats your context window — and how token totals reveal it.

How to Check Claude Code Token Usage

See your Claude Code tokens by day, model, and project — and how the built-in /usage compares.

