OpenAI Codex CLI Cost: Estimate and Track It

Codex CLI costs are driven by the model you use and how much context you accumulate per session. Here's how to estimate costs before you start — and measure them after.

By Arham Wani | Last updated June 2026 | whoburnedmore guide

Quick answer

OpenAI Codex CLI cost depends on which model backs it and whether you're on a flat ChatGPT plan or the API. On a flat plan, “cost” is what your tokens would cost at API rates — the honest way to check if the subscription pays off. Run npx whoburnedmore to see your actual Codex token spend next to every other tool you run. 💸

Codex CLI is OpenAI's terminal agent: you give it a task, it writes and runs code on your machine. The cost story is more complicated than a simple per-token rate because the underlying model changes, the ChatGPT flat-rate plan doesn't directly bill tokens, and the built-in /status command only shows your current 5-hour window rather than your full history. This guide covers the estimation math, the tracking options, and where to look when your usage is higher than expected.

How Codex CLI token pricing works

When you use Codex CLI through a ChatGPT Plus or Pro subscription, you don't pay per token — you pay a flat monthly fee and OpenAI manages how many “compute units” each session consumes. When you use Codex through the OpenAI API directly, you pay per token at the rate of the underlying model (currently o3 or o4-mini for most tasks).

The estimation formula

For API users, the cost of a Codex CLI session depends on three variables: the size of the context you feed it (files, conversation history), the length of its response (code it writes, explanations it gives), and the per-token price of the model. Reasoning models like o3 also generate internal “thinking tokens” that count toward your bill even though you never see them.

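cost_session = ctx_tokens × p_in + out_tokens × p_out + think_tokens × p_think

Session cost = context tokens × input price + output tokens × output price + thinking tokens × thinking price. On OpenAI's API, reasoning tokens are billed at the output-token rate, so p_think equals p_out in practice.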
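The formula translates directly into a few lines of Python. This is a minimal sketch, not part of Codex CLI or whoburnedmore: the `Prices` class, the `session_cost` function, and the dollar figures are all hypothetical placeholders, and the example sets `p_think` equal to `p_out` on the assumption that reasoning tokens are billed at the output rate. Substitute the current published rates for your model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """USD per one million tokens, supplied by the caller."""
    p_in: float
    p_out: float
    p_think: float


def session_cost(ctx_tokens: int, out_tokens: int, think_tokens: int, prices: Prices) -> float:
    """cost_session = ctx_tokens*p_in + out_tokens*p_out + think_tokens*p_think."""
    total_per_million = (
        ctx_tokens * prices.p_in
        + out_tokens * prices.p_out
        + think_tokens * prices.p_think
    )
    return total_per_million / 1_000_000


# Placeholder prices, NOT real rates: check current pricing for your model.
example_prices = Prices(p_in=2.00, p_out=8.00, p_think=8.00)
example = session_cost(
    ctx_tokens=60_000, out_tokens=4_000, think_tokens=16_000, prices=example_prices
)
print(f"${example:.2f}")  # prints $0.28
```

Even at these made-up rates, the 16,000 thinking tokens account for $0.128 of the $0.28 total, more than the 60,000 context tokens do.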

Reasoning tokens are invisible but real

o3 and similar reasoning models perform internal chain-of-thought before responding. These “thinking tokens” appear on your API bill but are not returned to you. A task that generates a short code snippet might actually cost 3–5× more in thinking tokens than in output tokens. This is the most common source of “why did this session cost so much?” surprises.

Where does your Codex spend go?

In most Codex CLI workflows, the cost breakdown by token type looks something like this (illustrative example for a heavy-use developer session):

thinking tokens: 52%
context / input: 31%
output (code + text): 17%

If you're surprised that thinking tokens are the largest slice, you're not alone. This proportion is model-dependent — o4-mini has lower thinking token overhead than o3 for simple tasks, which is why routing straightforward tasks to the lighter model makes a real difference. Measuring this split requires reading the raw API response metadata that tools like whoburnedmore surface.
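If you already have per-component dollar costs for a session, turning them into a percentage split like the one above is simple arithmetic. The sketch below uses a hypothetical `cost_shares` helper and made-up costs chosen to reproduce the illustrative 52/31/17 split; it does not read any real Codex data.

```python
def cost_shares(costs: dict[str, float]) -> dict[str, float]:
    """Return each component's share of the total cost as a percentage."""
    total = sum(costs.values())
    if total <= 0:
        raise ValueError("total cost must be positive")
    return {name: 100 * value / total for name, value in costs.items()}


# Hypothetical dollar costs for one session, for illustration only.
shares = cost_shares({"thinking": 0.52, "context": 0.31, "output": 0.17})
for name, pct in shares.items():
    print(f"{name:<10}{pct:5.1f}%")
```

Running the same function on costs from an o3 session and an o4-mini session makes the difference in thinking-token overhead easy to compare side by side.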

Estimate your monthly Codex cost before running

Use this four-step estimate before a heavy Codex session to avoid unexpected bills:

1. Estimate your average context size
What files will you need Codex to read? A typical source file is 5–15k tokens. A large module with tests might be 50k. Add the conversation history: a session of 10 back-and-forth exchanges at 500 tokens each adds about 5k more.

2. Estimate output per exchange
If Codex is writing a new component, estimate the code output at 200–500 tokens per function. If it's refactoring, it outputs the full revised file, which could be 2–10k tokens per exchange.

3. Add a 3–5× multiplier for thinking tokens on o3
Multiply your output estimate by 3–5 to account for invisible reasoning tokens. For o4-mini on simple tasks, the multiplier is closer to 1–2×. If you're not sure which model backs your Codex sessions, run whoburnedmore, which shows the model split.

4. Compare to the flat-plan threshold
Multiply your session cost by the number of sessions per month. If that API-rate estimate is well below your ChatGPT subscription price, API pay-as-you-go is the cheaper option. If it is close to or above the flat rate, the subscription is the better deal, provided its usage caps fit your workload.
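The four steps can be chained into one rough calculation. This is a sketch under stated assumptions: `estimate_monthly_cost` is a hypothetical helper, the per-million-token prices are placeholders rather than real rates, thinking tokens are priced at the output rate, and context is counted once per session as the steps describe (an agent that re-sends context on every exchange will cost more).

```python
def estimate_monthly_cost(
    ctx_tokens: int,
    out_tokens_per_exchange: int,
    exchanges: int,
    think_multiplier: float,
    p_in: float,
    p_out: float,
    sessions_per_month: int,
) -> float:
    """Rough monthly USD estimate; p_in and p_out are USD per million tokens."""
    out_tokens = out_tokens_per_exchange * exchanges   # step 2: visible output
    think_tokens = out_tokens * think_multiplier       # step 3: hidden reasoning
    session = (ctx_tokens * p_in + (out_tokens + think_tokens) * p_out) / 1_000_000
    return session * sessions_per_month                # input to step 4


# Step 1: one 50k-token module plus about 5k of conversation history.
monthly = estimate_monthly_cost(
    ctx_tokens=55_000,
    out_tokens_per_exchange=2_000,
    exchanges=10,
    think_multiplier=4.0,   # middle of the 3-5x range for o3
    p_in=2.00,              # placeholder price, not a real rate
    p_out=8.00,             # placeholder price, not a real rate
    sessions_per_month=40,
)
plan_price = 200.00         # flat monthly fee to compare against
difference = plan_price - monthly
print(f"API-rate estimate: ${monthly:.2f}/mo, plan minus estimate: ${difference:.2f}")
```

A positive difference means the API-rate estimate is below the subscription price; a negative one means the same usage would cost more on pay-as-you-go than the flat fee.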

Actually measuring what you spent

Estimates are useful before a session. After your session, you need real numbers. Whoburnedmore reads the Codex CLI local logs and tallies your token usage per day and per model so you can see the actual cost, not the estimate:

zsh — ~/code

$ npx whoburnedmore
↳ reading ~/.codex/logs/ …

  CODEX CLI  (last 30 days)

  model         sessions   input + think    output     est. cost
  ───────────────────────────────────────────────────────────────
  o3              12        41.2M            6.8M       $58.40
  o4-mini         28         8.9M            2.1M        $4.20
  ───────────────────────────────────────────────────────────────
  total           40        50.1M            8.9M       $62.60

  ↳ on ChatGPT Pro ($200/mo): saving ~$137 this month

The /status command: useful but limited

Codex CLI's built-in /status shows your usage within the current 5-hour rolling window. That's useful for knowing how much headroom you have before hitting the rate limit, but it only looks back 5 hours and gives you no monthly aggregate. For the full picture, you need to read the persistent local logs, which is what whoburnedmore does.

What you want to know	/status	whoburnedmore
Current 5-hour remaining	yes	estimated
Full session history	—	yes
Monthly aggregate cost	—	yes
Model split (o3 vs o4-mini)	—	yes
Thinking vs output token split	—	yes
Cross-tool comparison	—	12+ tools

For anything beyond the current window, the built-in /status command is not enough.

Understanding the 5-hour window

Codex CLI on ChatGPT plans enforces a 5-hour rolling usage window. Once you hit the cap, new Codex sessions are queued or refused until the window rolls forward. This means the cost behavior of Codex is less about per-token pricing and more about time-based throughput on flat plans.
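To make "rolling" concrete: usage counts against the window only while it is less than 5 hours old, so headroom returns gradually rather than all at once. The sketch below illustrates that idea with a hypothetical `usage_in_window` function and made-up events. How OpenAI actually meters the window is not specified here, so treat the token-sum approach as an assumption.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)


def usage_in_window(events: list[tuple[datetime, int]], now: datetime) -> int:
    """Sum tokens for events whose timestamp falls in (now - WINDOW, now]."""
    start = now - WINDOW
    return sum(tokens for ts, tokens in events if start < ts <= now)


# Made-up (timestamp, tokens) events, for illustration only.
now = datetime(2026, 6, 1, 12, 0)
events = [
    (now - timedelta(hours=6), 1_000),    # older than 5 hours: no longer counts
    (now - timedelta(hours=4), 2_000),    # inside the window
    (now - timedelta(minutes=10), 500),   # inside the window
]
used = usage_in_window(events, now)
print(used)  # prints 2500
```

One hour later the 2,000-token event ages out as well, which is why waiting a short time can free up capacity even though no fixed reset has occurred.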

For API users, there's no equivalent hard cap — you're limited by OpenAI's rate limits on requests per minute and tokens per minute, which are typically much higher for paid API tiers. The relevant limit on the API is your account credit balance, not a usage window.

Track Codex alongside your other tools

If you also run Claude Code, Gemini CLI, or other agents, your Codex usage is just one piece of your total AI coding spend. For details on the 5-hour and weekly caps, see the Codex usage limits guide listed below, or use whoburnedmore to see all tools together.

3–5×: thinking token multiplier on o3
local: logs stay on your machine
5-hour: rolling window on flat plans

Codex CLI cost is tractable once you separate the three token types and know your model mix. Running npx whoburnedmore after a heavy Codex week gives you the real breakdown — not the estimate — so your next round of sessions is better scoped and less surprising. 🔥

Related guides

How to Check Codex CLI Usage and Token Cost

Codex CLI's /status only shows the current window — here's how to see your full history and cost.

Codex Usage Limits: 5-Hour and Weekly Caps

The 5-hour vs weekly windows on a ChatGPT plan — and how to track consumption.

AI Coding Cost: Claude Code vs Codex vs Gemini

A real cost comparison of the big three — measured from your logs, not marketing.

