OpenAI Codex CLI Cost: Estimate and Track It
Codex CLI costs are driven by the model you use and how much context you accumulate per session. Here's how to estimate costs before you start — and measure them after.
Quick answer
Run npx whoburnedmore to see your actual Codex token spend next to every other tool you run. 💸
Codex CLI is OpenAI's terminal agent: you give it a task, and it writes and runs code on your machine. The cost story is more complicated than a simple per-token rate for three reasons. The underlying model varies, the ChatGPT flat-rate plan doesn't bill tokens directly, and the built-in /status command shows only your current 5-hour window rather than your full history. This guide covers the estimation math, the tracking options, and where to look when your usage is higher than expected.
How Codex CLI token pricing works
When you use Codex CLI through a ChatGPT Plus or Pro subscription, you don't pay per token — you pay a flat monthly fee and OpenAI manages how many “compute units” each session consumes. When you use Codex through the OpenAI API directly, you pay per token at the rate of the underlying model (currently o3 or o4-mini for most tasks).
The estimation formula
For API users, the cost of a Codex CLI session depends on three variables: the size of the context you feed it (files, conversation history), the length of its response (the code it writes and the explanations it gives), and the per-token price of the model. Reasoning models like o3 also generate internal “thinking tokens” that are billed at the output-token rate even though you never see them.
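As a rough sketch of that formula, the Python function below multiplies each token count by its per-million price. It assumes OpenAI's convention that reasoning tokens are billed at the output-token rate. The $2 and $8 per-million prices are placeholders, not quoted rates, so substitute the figures from OpenAI's pricing page for your model.

```python
def session_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the dollar cost of one session.

    Reasoning ("thinking") tokens are never shown to you, but they are
    billed at the same per-token rate as visible output tokens.
    """
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = (output_tokens + reasoning_tokens) * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Placeholder prices in USD per million tokens (assumptions, not quoted rates).
PRICE_IN, PRICE_OUT = 2.00, 8.00

cost = session_cost(input_tokens=60_000, output_tokens=4_000,
                    reasoning_tokens=16_000,
                    input_price_per_m=PRICE_IN, output_price_per_m=PRICE_OUT)
print(f"${cost:.2f}")  # → $0.28
```

In this example the 16k hidden reasoning tokens cost more than the entire 60k-token input, which is the effect described in the next section.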
Reasoning tokens are invisible but real
o3 and similar reasoning models run an internal chain of thought before responding. These “thinking tokens” appear on your API bill but are not returned to you. A task that produces a short code snippet might use 3–5× as many thinking tokens as visible output tokens. This is the most common source of “why did this session cost so much?” surprises.
Where does your Codex spend go?
In most Codex CLI workflows, the cost breakdown by token type looks something like this (illustrative example for a heavy-use developer session):
- thinking tokens: 52%
- context / input: 31%
- output (code + text): 17%
If you're surprised that thinking tokens are the largest slice, you're not alone. This proportion is model-dependent — o4-mini has lower thinking token overhead than o3 for simple tasks, which is why routing straightforward tasks to the lighter model makes a real difference. Measuring this split requires reading the raw API response metadata that tools like whoburnedmore surface.
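If you call OpenAI's API directly, the split is in the usage object returned with each response. The sketch below assumes the Responses API usage shape (input_tokens, output_tokens, and output_tokens_details.reasoning_tokens), where output_tokens already includes the reasoning tokens. The sample records are made-up numbers, not real Codex data.

```python
from collections import Counter


def split_usage(usage: dict) -> dict:
    """Split one response's usage into input, thinking, and visible output.

    Assumes output_tokens already includes reasoning tokens and that
    output_tokens_details breaks them out (missing details count as 0).
    """
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "context / input": usage["input_tokens"],
        "thinking": reasoning,
        "output (code + text)": usage["output_tokens"] - reasoning,
    }


def token_mix(usages: list[dict]) -> dict:
    """Sum the split over many responses and return each slice's share in percent."""
    totals = Counter()
    for usage in usages:
        totals.update(split_usage(usage))
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: round(100 * n / grand_total, 1) for kind, n in totals.items()}


# Hypothetical usage records for illustration only.
sample = [
    {"input_tokens": 12_000, "output_tokens": 9_000,
     "output_tokens_details": {"reasoning_tokens": 7_000}},
    {"input_tokens": 8_000, "output_tokens": 11_000,
     "output_tokens_details": {"reasoning_tokens": 9_000}},
]
print(token_mix(sample))
# → {'context / input': 50.0, 'thinking': 40.0, 'output (code + text)': 10.0}
```

Note that subtracting reasoning_tokens from output_tokens is what exposes the hidden slice; reading output_tokens alone would lump thinking and visible code together.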
Estimate your monthly Codex cost before running
Use this four-step estimate before a heavy Codex session to avoid unexpected bills:
1. Estimate your average context size. What files will you need Codex to read? A typical source file is 5–15k tokens, and a large module with tests might be 50k. Then add conversation history: a session of 10 back-and-forth exchanges at 500 tokens each adds another 5k.
2. Estimate output per exchange. If Codex is writing a new component, expect roughly 200–500 tokens of code per function. If it's refactoring, it outputs the full revised file, which could be 2–10k tokens per exchange.
3. Add a 3–5× multiplier for thinking tokens on o3. Multiply your output estimate by 3–5 to account for invisible reasoning tokens. For o4-mini on simple tasks, the multiplier is closer to 1–2×. If you're not sure which model backs your Codex sessions, run whoburnedmore; it shows the model split.
4. Compare to the flat-plan threshold. Multiply your per-session estimate by the number of sessions you expect per month. If the result is lower than your ChatGPT subscription price, pay-as-you-go API billing is the better deal. If it is close to or higher than the subscription price, the flat plan is likely cheaper for heavy usage, within the limits of its 5-hour window.
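The steps above translate directly into arithmetic. This is a minimal sketch under stated assumptions: the API prices are placeholders rather than quoted rates, the thinking multiplier is a single number you pick from the ranges above, and the $200 plan price is the ChatGPT Pro figure from the sample output below. The function names are illustrative, not part of any tool.

```python
def estimate_session_cost(context_tokens: int, exchanges: int,
                          history_per_exchange: int, output_per_exchange: int,
                          thinking_multiplier: float,
                          input_price_per_m: float,
                          output_price_per_m: float) -> float:
    """Steps 1-3: estimate one session's API cost in USD."""
    # Step 1: files Codex reads plus the conversation history it accumulates.
    input_tokens = context_tokens + exchanges * history_per_exchange
    # Step 2: visible output (code and explanations) across all exchanges.
    output_tokens = exchanges * output_per_exchange
    # Step 3: invisible reasoning tokens, billed at the output rate.
    thinking_tokens = output_tokens * thinking_multiplier
    return (input_tokens * input_price_per_m
            + (output_tokens + thinking_tokens) * output_price_per_m) / 1_000_000


def compare_to_flat_plan(session_cost: float, sessions_per_month: int,
                         plan_price: float) -> str:
    """Step 4: compare the monthly API estimate with a flat subscription price."""
    monthly = session_cost * sessions_per_month
    if monthly < plan_price:
        return f"~${monthly:.2f}/mo on the API: cheaper than the ${plan_price:.0f} plan"
    return f"~${monthly:.2f}/mo on the API: the ${plan_price:.0f} plan is likely the better deal"


cost = estimate_session_cost(
    context_tokens=50_000,       # step 1: a large module with tests
    exchanges=10,
    history_per_exchange=500,    # step 1: 10 exchanges x 500 tokens
    output_per_exchange=2_000,   # step 2: small refactors
    thinking_multiplier=4,       # step 3: mid-range for o3
    input_price_per_m=2.00,      # placeholder price
    output_price_per_m=8.00,     # placeholder price
)
print(f"per session: ${cost:.2f}")  # → per session: $0.91
print(compare_to_flat_plan(cost, sessions_per_month=40, plan_price=200.0))
```

With these inputs, 40 sessions come to roughly $36 a month, well under a $200 plan. Raising the multiplier or the session count shows how quickly that flips. The sketch counts the file context once, as the steps do; if your agent re-sends the full context on every request, real input usage will be higher.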
Actually measuring what you spent
Estimates are useful before a session. After your session, you need real numbers. Whoburnedmore reads the Codex CLI local logs and tallies your token usage per day and per model so you can see the actual cost, not the estimate:
```
$ npx whoburnedmore
↳ reading ~/.codex/logs/ …

CODEX CLI (last 30 days)
model      sessions   input + think   output   est. cost
───────────────────────────────────────────────────────────────
o3               12           41.2M     6.8M      $58.40
o4-mini          28            8.9M     2.1M       $4.20
───────────────────────────────────────────────────────────────
total            40           50.1M     8.9M      $62.60

↳ on ChatGPT Pro ($200/mo): saving ~$137 this month
```
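Conceptually, a per-day, per-model tally is a group-by over usage records. The sketch below does that over a hypothetical in-memory list. The record fields (ts, model, input, output) and the placeholder prices are illustrative assumptions, not whoburnedmore's implementation or the actual layout of the Codex log files.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records; a real tool would parse these from local logs.
# "output" here is assumed to include reasoning tokens (billed at the output rate).
records = [
    {"ts": "2025-06-02T09:15:00", "model": "o3", "input": 30_000, "output": 12_000},
    {"ts": "2025-06-02T14:40:00", "model": "o4-mini", "input": 20_000, "output": 5_000},
    {"ts": "2025-06-03T10:05:00", "model": "o3", "input": 25_000, "output": 9_000},
]

# Placeholder (input, output) prices in USD per million tokens.
PRICES = {"o3": (2.00, 8.00), "o4-mini": (1.00, 4.00)}


def tally(records):
    """Sum tokens and estimated cost per (day, model)."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "cost": 0.0})
    for r in records:
        day = datetime.fromisoformat(r["ts"]).date().isoformat()
        bucket = totals[(day, r["model"])]
        in_price, out_price = PRICES[r["model"]]
        bucket["input"] += r["input"]
        bucket["output"] += r["output"]
        bucket["cost"] += (r["input"] * in_price + r["output"] * out_price) / 1_000_000
    return dict(totals)


for (day, model), t in sorted(tally(records).items()):
    print(f"{day}  {model:8} in={t['input']:>7,} out={t['output']:>7,} ${t['cost']:.2f}")
```

Summing the buckets per model instead of per (day, model) gives the 30-day view shown in the sample output.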
The /status command: useful but limited
Codex CLI's built-in /status shows your usage within the current 5-hour rolling window. That's useful for knowing how much headroom you have before hitting the usage limit, but it only covers the last 5 hours and gives you no monthly aggregate. For the full picture, you need to read the persistent local logs, which is what whoburnedmore does.
| What you want to know | /status | whoburnedmore |
|---|---|---|
| Remaining usage in current 5-hour window | ✓ | estimated |
| Full session history | — | ✓ |
| Monthly aggregate cost | — | ✓ |
| Model split (o3 vs o4-mini) | — | ✓ |
| Thinking vs output token split | — | ✓ |
| Cross-tool comparison | — | 12+ tools |
Understanding the 5-hour window
Codex CLI on ChatGPT plans enforces a 5-hour rolling usage window. Once you hit the cap, new Codex sessions are queued or refused until the window rolls forward. This means the cost behavior of Codex is less about per-token pricing and more about time-based throughput on flat plans.
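A rolling window means that, at any moment, only usage from the last five hours counts against the cap. The sketch below illustrates that idea with made-up timestamps and a hypothetical cap in arbitrary units. Neither the unit nor the cap corresponds to OpenAI's actual plan accounting.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)
CAP = 100  # hypothetical cap in arbitrary usage units


def usage_in_window(events, now):
    """Sum usage from events whose timestamp falls within the last 5 hours."""
    return sum(units for ts, units in events if now - ts < WINDOW)


def headroom(events, now):
    """Units left before the hypothetical cap; never negative."""
    return max(0, CAP - usage_in_window(events, now))


start = datetime(2025, 6, 2, 9, 0)
events = [
    (start, 40),
    (start + timedelta(hours=2), 35),
    (start + timedelta(hours=4), 20),
]

print(headroom(events, start + timedelta(hours=4, minutes=30)))  # → 5
print(headroom(events, start + timedelta(hours=5, minutes=30)))  # → 45
```

Headroom comes back gradually as old usage ages out of the window rather than all at once, which is why a heavy morning can still constrain you in the early afternoon.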
For API users, there's no equivalent hard cap — you're limited by OpenAI's rate limits on requests per minute and tokens per minute, which are typically much higher for paid API tiers. The relevant limit on the API is your account credit balance, not a usage window.
Track Codex alongside your other tools
If you also run Claude Code, Gemini CLI, or other agents, your Codex usage is just one piece of your total AI coding spend. See Codex usage limits in detail for the 5-hour and weekly caps, or use whoburnedmore to see all tools together.
Codex CLI cost is tractable once you separate the three token types and know your model mix. Running npx whoburnedmore after a heavy Codex week gives you the real breakdown — not the estimate — so your next round of sessions is better scoped and less surprising. 🔥
Related guides
How to Check Codex CLI Usage and Token Cost
Codex CLI's /status only shows the current window — here's how to see your full history and cost.
Codex Usage Limits: 5-Hour and Weekly Caps
The 5-hour vs weekly windows on a ChatGPT plan — and how to track consumption.
AI Coding Cost: Claude Code vs Codex vs Gemini
A real cost comparison of the big three — measured from your logs, not marketing.