OpenAI Codex Credit Billing in 2026, Explained
In 2026 OpenAI Codex switched to a metered, credit-based bill priced per million tokens. Here is what changed, how to estimate the new charge, and how to watch it day by day.
Quick answer
npx whoburnedmore. 🔥
For a long time the deal was simple: pay for a ChatGPT plan, get a fixed pool of Codex activity, and stop worrying about per-token math. That arrangement changed in 2026. Codex now draws down a balance of credits, and each credit is spent against the tokens your sessions actually move: the prompts and context you send in, the cached fragments that get reused, and the code the model writes back out. The headline rates live on OpenAI's pricing page; what this guide does is explain the shape of the new bill so the numbers there stop looking like noise.
What is Codex credit billing?
Credit billing means Codex meters consumption rather than handing you a flat quota. You hold a credit balance — topped up by your plan or bought directly — and every Codex exchange decrements it in proportion to the tokens processed. There is no “50 requests per window” abstraction sitting between you and the cost; a heavyweight refactor that drags a large repository into context spends far more than a one-line tweak, even though both are a single command. The current per-million rates for each token class are published at developers.openai.com/codex/pricing, and they move, so treat that page — not a number memorised from a blog post — as the source of truth.
Why OpenAI moved to metering
A flat allowance is easy to reason about but hard to price fairly: a developer running autonomous multi-file agents burns orders of magnitude more compute than someone asking the occasional question, yet both paid the same. Token-priced credits let the bill follow the work. The trade-off is that you now have to estimate ahead of time instead of relying on a comfortable ceiling.
How do Codex credits map to tokens?
Three token classes are priced separately, and the gap between them is the whole reason the bill is hard to eyeball:
1. Input tokens: the most common, mid-priced
   Everything you send into a turn: your instruction, the files Codex pulls into context, tool results, and prior conversation. On a long agentic session this dominates volume.
2. Cached input tokens: the cheap reuse discount
   When the same context prefix recurs across turns, Codex can serve it from cache at a steeply reduced rate. Long-running sessions on one codebase benefit the most, which is why a chatty back-and-forth can cost less per turn than its raw token count suggests.
3. Output tokens: the priciest per token
   The code, diffs, and explanations the model generates. Output is the dearest class, so a task that writes a lot (scaffolding a module, generating tests) skews your bill upward out of proportion to how much you typed.
The practical consequence: two sessions that move an identical total number of tokens can cost wildly different amounts depending on how that total splits across the three classes. A bill dominated by cached input is cheap; one dominated by fresh output is not.
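A quick sketch makes the point concrete. The rates below are hypothetical placeholders, chosen only so that cached input is cheapest and output is dearest; they are not OpenAI's published prices, and the function name `session_cost` is invented for this illustration. Substitute the current figures from the pricing page before trusting any number it prints.

```python
# Hypothetical per-million-token rates in credits; NOT OpenAI's published
# prices. Only the ordering (cached < input < output) mirrors the article.
RATES = {"input": 10.0, "cached": 1.0, "output": 40.0}


def session_cost(uncached_input: int, cached_input: int, output: int) -> float:
    """Credits for one session: each token class is billed at its own rate."""
    return (
        uncached_input / 1_000_000 * RATES["input"]
        + cached_input / 1_000_000 * RATES["cached"]
        + output / 1_000_000 * RATES["output"]
    )


# Two sessions that each move exactly 2,000,000 tokens in total.
cache_heavy = session_cost(uncached_input=200_000, cached_input=1_700_000, output=100_000)
output_heavy = session_cost(uncached_input=800_000, cached_input=200_000, output=1_000_000)

print(f"cache-heavy session:  {cache_heavy:.1f} credits")
print(f"output-heavy session: {output_heavy:.1f} credits")
```

With these placeholder rates the output-heavy session comes out roughly six times more expensive than the cache-heavy one, even though both move the same total. The exact ratio depends entirely on the real rates.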
How much does Codex cost per token in 2026?
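Rather than quote a figure that will be stale by next quarter, estimate it. Take your token counts for a period, split input into cached and uncached, multiply each class by its current published rate (per million tokens), and sum:
cost ≈ (uncached input ÷ 1M × input rate) + (cached input ÷ 1M × cached rate) + (output ÷ 1M × output rate)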
Plug in your own usage. Say a week moved 8M input tokens, 5M of which hit cache, plus 600K output tokens. You would price the 3M uncached input at the input rate, the 5M cached portion at the cheaper cached rate, and the 600K output at the (highest) output rate, then add the three. The reason this is worth doing by hand once is that it builds intuition for which lever — trimming context, leaning on cache, or generating less — actually shaves your bill.
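The same arithmetic can be sketched as a small helper. Everything here is illustrative: the `Rates` class, the `estimate_credits` function, and above all the rate values are assumptions made for the example, not published prices or any official API. Only the token counts come from the week described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Credits per million tokens for each class (fill in from the pricing page)."""
    input: float
    cached: float
    output: float


def estimate_credits(total_input: int, cached_input: int, output: int, rates: Rates) -> float:
    """Price uncached input, cached input, and output separately, then sum."""
    if not 0 <= cached_input <= total_input:
        raise ValueError("cached_input must be between 0 and total_input")
    uncached_input = total_input - cached_input
    per_million = 1_000_000
    return (
        uncached_input / per_million * rates.input
        + cached_input / per_million * rates.cached
        + output / per_million * rates.output
    )


# Placeholder rates for illustration only; they are NOT published prices.
example_rates = Rates(input=10.0, cached=1.0, output=40.0)

# The week from the text: 8M input (5M of it cached) plus 600K output.
week = estimate_credits(8_000_000, 5_000_000, 600_000, example_rates)
print(f"estimated weekly spend: {week:.1f} credits")
```

Changing one argument at a time (less output, a higher cached share, a smaller total input) shows which lever moves the estimate most for your own usage pattern.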
Estimate before you commit to a heavy run
Before kicking off an autonomous multi-file task, glance at the rates and your recent token-per-task average. A rough product of the two tells you whether this is a 10-credit errand or a 500-credit campaign, which is useful when the balance is finite.
How is this different from the old plan allowance?
Under the previous model your ChatGPT plan included a flat pool of Codex activity, metered in coarse units like requests-per-window. You either had room or you were rate-limited; the dollar cost of any single task was invisible. Credit billing inverts that — the cost is legible per task, but the comfortable “unlimited within the cap” feeling is gone.
| | Old plan allowance | 2026 credit billing |
|---|---|---|
| Unit | requests / window | tokens → credits |
| Cost per task | opaque | computable |
| Cheap reuse | — | cached input rate |
| Risk | mid-session lockout | balance drawdown |
| Best move | pace requests | estimate + track spend |
How do I track my Codex spend?
Estimating ahead is half of it; the other half is checking what you actually burned. Codex writes a record of every session to disk locally, and npx whoburnedmore reads those logs, totals the tokens by class and model, applies current rates, and gives you a per-day, per-model cost estimate — no API key, no dashboard login:
    $ npx whoburnedmore
    ↳ reading Codex session logs…

    CODEX CREDIT SPEND — by day
    ──────────────────────────────────────────
    Jun 17   in 1.9M   cached 1.1M   out 142K   ≈ 84 cr
    Jun 16   in 0.8M   cached 0.5M   out  61K   ≈ 33 cr
    Jun 15   in 2.4M   cached 1.6M   out 188K   ≈ 97 cr

    7-day est. spend: ~480 credits
Splitting the meter into input, cached, and output is exactly what lets the estimate match the bill; a tool that only reports a single token total cannot, because it has no way to apply the three different rates. Seeing the per-class columns also tells you where a heavy day came from: a spike in the output column is a generation-heavy day; a spike in uncached input means context kept getting rebuilt from scratch.
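To illustrate that kind of diagnosis, here is a minimal sketch that takes the three daily rows from the sample output above and names the class contributing most to each day's estimate. It rests on two assumptions: the rates are placeholders rather than published prices (so the totals will not reproduce the credit figures in the sample), and the "in" column is taken to count all input, cached tokens included. The function names are invented for this example and are not part of whoburnedmore.

```python
# Placeholder credits-per-million rates (assumptions, not published prices).
RATES = {"uncached input": 10.0, "cached input": 1.0, "output": 40.0}


def cost_by_class(total_input: int, cached: int, output: int) -> dict[str, float]:
    """Split one day's token counts into a per-class credit estimate.

    Assumes the "in" column counts all input, cached tokens included.
    """
    tokens = {
        "uncached input": total_input - cached,
        "cached input": cached,
        "output": output,
    }
    return {name: count / 1_000_000 * RATES[name] for name, count in tokens.items()}


def main_driver(total_input: int, cached: int, output: int) -> str:
    """Name the token class that contributes the most to the day's estimate."""
    costs = cost_by_class(total_input, cached, output)
    return max(costs, key=costs.get)


# Rows copied from the sample output above: (in, cached, out).
days = {
    "Jun 17": (1_900_000, 1_100_000, 142_000),
    "Jun 16": (800_000, 500_000, 61_000),
    "Jun 15": (2_400_000, 1_600_000, 188_000),
}
for day, counts in days.items():
    print(day, "->", main_driver(*counts))
```

Which class wins depends on the rates you plug in, so rerun the comparison whenever the published prices change.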
What leaves your machine?
Only daily aggregates — date, token totals, and the cost estimate — are sent if you submit to the leaderboard. The prompts, file contents, and code Codex handled stay local and are never read or transmitted. Prefer to keep everything on the machine? Run npx whoburnedmore --local and nothing is uploaded at all. 🛡️
Compare with the limits and cost guides
For how usage caps still interact with credits, see Codex usage limits; for a broader breakdown of what drives the bill, see Codex CLI cost. This page is the one about the billing model itself.
Bottom line: credit billing rewards developers who understand the three-rate structure and keep an eye on the meter. Estimate with the equation above, point npx whoburnedmore at your logs to see the real numbers, and the 2026 bill stops being a surprise.
Related guides
How to Check Codex CLI Usage and Token Cost
Codex CLI's /status only shows the current window — here's how to see your full history and cost.
OpenAI Codex CLI Cost: Estimate and Track It
The token math behind Codex CLI pricing — plus how to track what you really spend.
Codex Usage Limits: 5-Hour and Weekly Caps
The 5-hour vs weekly windows on a ChatGPT plan — and how to track consumption.