We've spent the past several weeks dissecting Claude Code's cache management bugs, building fixes, and documenting hidden costs. There's still one question we owe a clean answer to: what does heavy Claude Code usage actually cost?
Not in subscription dollars — that's easy. The harder question is: what would that same usage cost at API rates? And how much of that value depends on the cache actually working?
We instrumented our sessions to find out, then re-ran the analysis after Anthropic shipped Opus 4.7 mid-April. The numbers got more dramatic.
The Setup
We captured 28,358 API calls over 25 days (April 4–28, 2026) using claude-code-meter, the public usage-collector we ship alongside our cache-fix proxy. These are real production sessions — multi-agent development work, blog writing, infrastructure management — running on a Max plan with our cache interceptor active.
Every call was logged with full token counts: input, output, cache creation, and cache read. We priced each call at Anthropic's published API rates (verified 2026-04-14, prior to the Opus 4.7 pricing shift on subsequent calls).
The window covers both the Opus 4.6 era (April 4 through ~April 13) and the Opus 4.7 era (~April 14 onward). That mix matters — we'll come back to it.
The Bottom Line
| Metric | Value |
|---|---|
| API calls (25 days) | 28,358 |
| Average API calls/day | 1,134 |
| API-equivalent cost (25 days) | $4,664.61 |
| API-equivalent monthly projection | $5,597/mo |
| Max 5x subscription cost | $100/mo |
| Max 5x effective discount | 98% |
Read that again: the same usage that costs $100/month on the Max 5x plan would cost $5,597/month at API rates. That's a 56x leverage ratio.
For reference, on the Max 20x tier ($200/mo), the leverage is 28x — still substantial. The same usage that costs $200/mo subscribed would cost $5,597 at API rates.
Per-Call Economics
At 1,134 calls per day, each API call's share of the subscription works out to fractions of a penny:
| Metric | Max 5x | Max 20x |
|---|---|---|
| Average API-equivalent cost per call | $0.1645 | $0.1645 |
| Effective subscription cost per call | $0.00294 | $0.00588 |
| Leverage ratio | 56x | 28x |
At $0.1645 per call at API rates, $100 buys you just 608 calls — about 20 per day. We're making 1,134 calls per day. The subscription isn't just a discount; at our usage level the API alternative is economically unsustainable for most operators at this rate.
For lighter users the leverage is lower but still substantial. For heavier users hitting the quota ceiling, cache efficiency determines how much work you actually get done within the subscription.
Where the Money Goes
| Model | Calls | Share of Calls | API-Equiv Cost | Share of Cost | Avg $/Call |
|---|---|---|---|---|---|
| Opus 4.6 | 20,064 | 71% | $3,230.26 | 69% | $0.1610 |
| Opus 4.7 | 5,794 | 20% | $1,400.68 | 30% | $0.2417 |
| Haiku 4.5 | 2,465 | 9% | $25.86 | 1% | $0.0105 |
| Sonnet 4.6 | 29 | <1% | $7.81 | <1% | $0.2693 |
| Opus 4 | 6 | <1% | ~$0 | <1% | — |
| Total | 28,358 | 100% | $4,664.61 | 100% | $0.1645 |
Two things stand out:
Opus dominates cost. Across both 4.6 and 4.7, Opus accounts for 91% of calls and 99% of cost. Haiku handles 9% of calls for 1% of cost. Optimizing an Opus call has roughly 17× the cost impact of optimizing a Haiku call, and since Opus dominates volume, that's where cache efficiency matters most.
Opus 4.7 is ~50% more expensive per call than 4.6. Per-token rates are unchanged — Opus 4.7 ships at the same input / output / cache pricing as 4.6. What changed is the tokenizer, which inflates token counts by roughly 35% on the same input, plus shifts in average call shape (more reasoning tokens, slightly heavier prompts on average). The combination shows up empirically here: the same workload, on a different model version, costs 1.5× as much per call to process at API-equivalent rates.
This has a counterintuitive consequence we'll cover next.
The Opus 4.7 Transition: Why Leverage Went Up, Not Down
When Anthropic shipped Opus 4.7 mid-April, the natural assumption was that Max plan economics would tighten — higher per-call cost at API rates, same flat subscription, less "headroom" for users.
Our data shows the opposite. Leverage went up.
The mechanism: subscription cost is flat. When per-call API cost goes up, the gap between API rates and subscription gets wider, not narrower. As long as you keep doing the same work, your subscription is now buying MORE notional API value, not less.
| Metric | Pre-4.7 (3 weeks earlier) | Now (25 days, mixed) |
|---|---|---|
| Avg cost per Opus call | $0.0752 | $0.1645 |
| Calls per day | 1,423 | 1,134 |
| Monthly projection | $3,209 | $5,597 |
| Max 5x leverage | 32x | 56x |
| Max 20x leverage | n/a* | 28x |
*Max 20x wasn't a current tier in our earlier measurement window.
Two caveats this paints over but worth naming:
-
Quota burn is what actually limits you, not API cost. Subscription absorbs the API cost difference; Anthropic eats it. But the quota ceiling sits in token-processing units, and Opus 4.7 burns more tokens per equivalent task. So the SAME work hits the quota faster on 4.7. You can "afford" more in a leverage sense; you can DO less per Q5h window.
-
Cache hit rate is the gate. All these numbers assume the cache is working. The 56x leverage ratio is conditional on us maintaining a 99% cache hit rate during this period. Drop that hit rate, and quota burn accelerates while leverage compresses.
Cache Efficiency: The Real Story
Here's where this connects to everything we've been writing about.
| Metric | Value |
|---|---|
| Overall cache hit rate | 99% |
| Total cache reads | 7.7 billion tokens (7,665.4M) |
| Total cache writes | 65.2 million tokens |
| Actual API-equivalent cost | $4,664.61 |
| Cost if 0% cache hit rate | $38,647.92 |
| Savings from caching | $33,983.31 |
| Cache savings as % of no-cache cost | 87.9% |
That's the number: $33,983 in API-equivalent savings over 25 days, purely from the cache working correctly. Extrapolated monthly, roughly $40,800 in avoided API-equivalent cost — all from prefix matching doing its job.
Or put differently: the cache is responsible for an 88% reduction in what we'd otherwise be charged. The subscription discount handles another layer on top of that.
What Happens When the Cache Breaks
We've documented the failure modes extensively in this series:
- Part 1 — The Problem: Cache hit rates dropping from 90%+ to single digits after session resume, turning every API call into a full rebuild
- Part 2 — Cache Architecture: Three separate bugs that break prefix stability — resume block scatter, fingerprint instability, tool ordering
- Part 4 — The TTL Discovery: Two independent mechanisms that can downgrade your TTL from 1 hour to 5 minutes — a quota-based server-side switch at 100% Q5h, and a client-side allowlist gate that can lock you to 5m regardless of quota. Both create feedback loops that accelerate waste.
- Silent Context Degradation and The Invisible Tax: Even when the cache works, hidden context-bloat mechanisms quietly increase per-turn token consumption.
If our 99% cache hit rate dropped to 50% — which we've measured happening without the interceptor — the API-equivalent cost for this period roughly doubles. More critically, the quota burn rate accelerates, pushing you toward the TTL downgrade zone — dropping cache lifetime from 1 hour to 5 minutes and creating a vicious cycle of rebuilds. (Some users are stuck on 5m TTL from the start due to a separate client-side gating mechanism — see Part 4.)
The 56x leverage ratio assumes the cache works. Break the cache, and the economics start working against you.
The Interceptor's ROI
Our community fix — the JavaScript interceptor that stabilizes cache prefixes at the network layer — is what maintained the 99% hit rate during this measurement period.
The math is direct:
- Cache savings over 25 days with interceptor: $33,983
- Cache savings at 50% hit rate (without interceptor, projected): ~$17,000
- Interceptor value: ~$17,000 in avoided API-equivalent waste over 25 days
That's roughly $680/day in API-equivalent value, or $20,400/month — from a JavaScript file that sorts tool definitions, normalizes resume blocks, and stabilizes the request prefix.
The interceptor is open source. The data we used to write this post comes from claude-code-meter, also open source. Both ship as a single npm package:
npm install -g claude-code-cache-fix
What This Means for Max Plan Users
The Max plan is an extraordinary value proposition for power users. At our usage level, the effective discount is 98% on Max 5x and 96% on Max 20x. Even moderate users making 500 calls/day are getting significant leverage.
But that value proposition has a dependency: the cache has to work.
The subscription covers all usage regardless of cache efficiency — Anthropic absorbs the cost difference. But quota consumption is tied to actual token processing, and the quota is finite. When the cache breaks:
- Each API call consumes more quota — cache misses mean full-price token processing instead of discounted cache reads
- You hit the quota ceiling faster — triggering the server-side TTL downgrade from 1h to 5m (and some users never get 1h in the first place due to a client-side gating mechanism — Part 4 has the full story)
- The downgrade makes caching harder — 5-minute TTL means any pause longer than 5 minutes busts the cache entirely
- The cycle feeds itself — more busts → more quota burn → more downgrades → more busts
The subscription price doesn't change. But the amount of work you can get done within that subscription absolutely does. And the Opus 4.7 transition made each cache miss more expensive in quota-units, not less.
The Takeaway
Three numbers to remember:
- 56x — the leverage ratio between Max 5x cost and API-equivalent cost at our usage level (28x on Max 20x)
- 99% — the cache hit rate that makes that leverage possible
- $33,983 — what the cache saved us in 25 days of work
The Max plan is a great deal. Cache efficiency is what keeps it that way. And right now, keeping that cache efficient requires fixes that Anthropic hasn't fully shipped yet.
This is a companion post to our Claude Code cache investigation series: Part 1 · Part 2 · Part 3 · Part 4 · Part 5 · Part 6 · plus follow-ups Silent Context Degradation and The Invisible Tax.
Data collected from 28,358 API calls between April 4–28, 2026 on Max-tier accounts running multi-agent workloads with the VSITS cache interceptor active. The window covers both the Opus 4.6 era (Apr 4–~13) and the Opus 4.7 era (~Apr 14 onward). API-equivalent costs calculated at published Anthropic API rates verified 2026-04-14. All logging performed locally via claude-code-meter. Subscription billing may differ from API-rate estimates; verify at platform.claude.com/docs/en/about-claude/pricing.
*Investigation conducted by Veritas Supera IT Solutions (VSITS LLC).*