Don't let your CFO cancel your AI
Frontier AI is getting expensive. It's time to plan around that.
The frontier AI subsidy is ending. Microsoft just canceled a chunk of Claude Code licenses, and Uber is questioning whether AI is worth it after burning through its entire 2026 budget in four months. Both companies saw their renewal numbers before you did and stepped back from the edge, and both are trying to solve the problem by walking away. This is the wrong move.
The cliff is real and the math is rather unforgiving. We’re going to see quite a few more companies either adjust course or abandon their AI efforts entirely. But the ones that are smart about it won’t just be fine; they’ll have a significant advantage over the companies that cut loose after letting overuse run rampant.
The key is cheaper models and smarter strategies (and to get your staff to quit tokenmaxxing, for Pete’s sake).
The subsidies are running out
Anthropic shipped Opus 4.7 at $5 / $25 per million input / output tokens, with a bizarre tokenizer change that makes the exact same prompts consume 1.0-1.35x as many tokens as on Opus 4.6 (which released just two months prior).
GPT-5.5 launched at $5 / $30, a massive 2x jump from GPT-5.4. The Pro version of 5.5 sits at $30 / $180 (so forget about that).
ChatGPT Pro and Claude Max are both $200 a month, and Sam Altman has already said Pro is a money-loser. The prices are subsidized. The Information reported that OpenAI lost $5 billion in 2024 and is projected to lose $14 billion in 2026 and $44 billion by 2028. Anthropic raised $13 billion at a $183 billion valuation last September to keep the inference layer running. Every Claude Code session, every Codex run, every Cowork operation you launched this week was paid for in part by venture capital and Microsoft’s balance sheet, not by your subscription.
The walk-back has started (and later than I expected, to be honest). Anthropic moved Claude Code and the Claude Agent SDK off “included with Claude Max” and onto metered API credits in late April. GitHub Copilot flips to usage-based billing on June 1. Cursor raised its Pro seat in March. The flat-rate enterprise agreement your CFO signed in 2024 is going to reprice against actual inference cost in 2026 and 2027, and pooled-credit allowances are getting cut.
Microsoft and Uber ran the same math twelve months ahead of the rest of the market and landed in the same place. The two enterprises with the clearest view of frontier inference costs both told their orgs to slow down within the same quarter. If you’re waiting for a more obvious signal to move, you missed it.
The cheap models got good
While the Anthropic and OpenAI behemoths grabbed the headlines and the budgets, the little guys underneath have slowly caught up. A million output tokens a day on GPT-5.5 costs $30. The same tokens on Kimi K2.6 cost $2.50 (Cursor’s Composer 2.5, a wildly impressive model I’ll talk about more later, doesn’t publish its exact pricing but is likely similar).
That’s roughly $10,000 a year in savings per engineer on a single line item. Kimi, DeepSeek, and GLM are open-weight, so you can run them in your own VPC or on-prem when compliance or cost demands it. The cheap models also burn a third of the kilowatt-hours and water that the same job takes on Opus, which is the environmental side of the same procurement argument (a vital argument that is only going to get louder in the coming year).
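To make the per-engineer figure concrete, here is the arithmetic as a small Python sketch. It assumes one million output tokens every calendar day, counts output tokens only (input tokens and caching are ignored), and uses the list prices quoted above.

```python
# List prices per million output tokens quoted in the article (USD).
# Assumption: output tokens only, one million per day, every day of the year.
GPT_5_5_OUTPUT_PER_M = 30.00
KIMI_K2_6_OUTPUT_PER_M = 2.50


def annual_output_cost(price_per_million: float,
                       million_tokens_per_day: float = 1.0,
                       days: int = 365) -> float:
    """Yearly output-token spend for one engineer at a steady daily volume."""
    return price_per_million * million_tokens_per_day * days


frontier = annual_output_cost(GPT_5_5_OUTPUT_PER_M)  # 30.00 * 365
cheap = annual_output_cost(KIMI_K2_6_OUTPUT_PER_M)    # 2.50 * 365
savings = frontier - cheap

print(f"GPT-5.5:   ${frontier:,.2f} per year")  # $10,950.00
print(f"Kimi K2.6: ${cheap:,.2f} per year")     # $912.50
print(f"Savings:   ${savings:,.2f} per year")   # $10,037.50
```

Real usage is burstier than a flat daily volume, so treat the result as an order-of-magnitude check rather than a forecast.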
Opus still tops SWE-Bench Pro and GPT-5.5 still leads FrontierMath Tier 4. But that gap matters for maybe 10% of the work your team does, and I’m being generous. The other 90% runs the same on Kimi, DeepSeek, GLM, or Composer for an eighth of the price.
Your daily driver and your standby tier
I don’t want the takeaway here to be either “buy the enterprise frontier contract” or “abandon AI.” The former is becoming unreasonable, and the latter will put you behind the companies with cash.
The answer is tiering, and it comes out to less than $50 a month per person.
Daily driver: a Cursor seat at $20 a month, running Composer 2.5 as the default in-editor model. Cursor 3 shipped a simplified surface for casual users. Non-engineers can use the editor for non-code work without learning vim or knowing what a buffer is, and the unified-agents view puts cloud runs and editor sessions on the same panel. Composer 2.5 is fast, responsive, and built for iterative work. I’ve been using it as my default for the past two weeks and I am blown away by what it can do. There have been times when I’ve churned through 20 bucks of Opus on a problem with no success, only to swing in with some rapid Composer 2.5 iteration and get it solved for pennies. I don’t know what sort of voodoo Cursor did with this model, but it’s working.
Standby tier: a $20 Claude Code or Codex Plus subscription, in your pocket for the few times a week the work goes frontier-shaped. Hard architectural refactors. Long-horizon agentic runs. Research-flavored math problems. Truth-critical legal or financial review. You don’t need an annual enterprise Anthropic agreement for these. You need access on the days you need it, billed in a way that won’t scare your finance team.
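As a sanity check on the under-$50 figure, here is the per-person arithmetic as a Python sketch. It assumes both tiers are flat $20 subscriptions and that nobody goes past what those subscriptions include; metered overage is not modeled, and the dictionary keys are descriptive labels, not product SKUs.

```python
# Per-person monthly list prices cited in the article (USD).
# Assumption: flat subscription prices only; metered overage is not modeled.
DAILY_DRIVER = {"Cursor seat, Composer 2.5 as default": 20}
STANDBY = {"Claude Code or Codex subscription": 20}
SINGLE_TOP_PLAN = 200  # ChatGPT Pro or Claude Max


def monthly_stack_cost(*tiers: dict[str, int]) -> int:
    """Add up the flat monthly price of every tool across all tiers."""
    return sum(price for tier in tiers for price in tier.values())


tiered = monthly_stack_cost(DAILY_DRIVER, STANDBY)
print(f"Tiered stack: ${tiered}/month")                               # $40/month
print(f"Under $50 per person: {tiered < 50}")                          # True
print(f"Saved vs. one $200 plan: ${SINGLE_TOP_PLAN - tiered}/month")  # $160/month
```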
I think the tiering above, with some model variations, will become standard. So much so that I wouldn’t be surprised if Anthropic and OpenAI see the writing on the wall and try to whip up their own version of Composer 2.5.
This is the new AI stack for engineers and PMs.
“I’m a leader at my company. What do I do?”
Get the full bill. Per-seat Copilot. Anthropic and OpenAI API. Cursor. Codex. ChatGPT Enterprise. Notion AI. Glean. Jasper. HubSpot AI add-ons. Bedrock. Vertex. Sum it. Most leaders can’t say what that total is, which is exactly why the line keeps growing.
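Summing the bill doesn’t need tooling, but it helps to see it ranked. The Python sketch below assumes you’ve already pulled one monthly number per vendor; every dollar amount in it is an illustrative placeholder, not a real price, and `summarize_bill` is a hypothetical helper.

```python
# Illustrative placeholder amounts (USD per month), NOT real prices.
# Replace each value with the number from your actual invoices.
monthly_bill = {
    "GitHub Copilot seats": 4_000,
    "Anthropic API": 12_000,
    "OpenAI API": 6_000,
    "Cursor": 2_000,
    "Codex": 1_000,
    "ChatGPT Enterprise": 5_000,
    "Notion AI": 1_000,
    "Glean": 3_000,
    "Jasper": 500,
    "HubSpot AI add-ons": 500,
    "Bedrock": 2_500,
    "Vertex": 2_500,
}


def summarize_bill(bill: dict[str, float]) -> tuple[float, list[tuple[str, float, float]]]:
    """Return the total and each line item with its share of the total, largest first."""
    total = sum(bill.values())
    if total <= 0:
        raise ValueError("The bill is empty; collect the invoices first.")
    rows = sorted(
        ((name, amount, amount / total) for name, amount in bill.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return total, rows


total, rows = summarize_bill(monthly_bill)
print(f"Total AI spend: ${total:,.0f}/month (${total * 12:,.0f}/year)")
for name, amount, share in rows:
    print(f"  {name:<22} ${amount:>8,.0f}  {share:6.1%}")
```

Sorting by amount puts the two or three line items worth renegotiating at the top of the printout.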
Run two A/Bs: one on the biggest engineering workload and one on the biggest non-engineering workload. Two weeks each on Composer 2.5, Kimi K2.6, DeepSeek V4, or GLM-4.6. Measure shipped output. People never measure output properly and it’s infuriating.
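One way to measure shipped output is to compare cost per shipped unit across the two arms of the A/B. In this Python sketch, `PilotArm` and `compare` are hypothetical helpers, “shipped units” stands for whatever your team counts as shipped (merged PRs, closed tickets, published docs), and the numbers are made up purely to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class PilotArm:
    label: str
    spend_usd: float    # total model spend over the two-week window
    shipped_units: int  # hypothetical unit: merged PRs, closed tickets, published docs, etc.

    @property
    def cost_per_unit(self) -> float:
        if self.shipped_units <= 0:
            raise ValueError(f"{self.label} shipped nothing; cost per unit is undefined.")
        return self.spend_usd / self.shipped_units


def compare(baseline: PilotArm, candidate: PilotArm) -> dict[str, float]:
    """Ratios of candidate to baseline: below 1.0 means less output or a lower unit cost."""
    base_cpu = baseline.cost_per_unit  # raises if the baseline shipped nothing
    return {
        "output_ratio": candidate.shipped_units / baseline.shipped_units,
        "cost_per_unit_ratio": candidate.cost_per_unit / base_cpu,
    }


# Made-up numbers, only to show the calculation.
baseline = PilotArm("frontier (current default)", spend_usd=1_200, shipped_units=40)
candidate = PilotArm("cheap model under test", spend_usd=144, shipped_units=36)
result = compare(baseline, candidate)
print(f"Output vs. baseline:        {result['output_ratio']:.0%}")         # 90%
print(f"Cost per unit vs. baseline: {result['cost_per_unit_ratio']:.0%}")  # 13%
```

Read the two ratios together: a model that is far cheaper per unit but ships much less than the baseline is not a win.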
Tier your model menu. The cheap or in-editor model is the default. Frontier access requires written justification or deliberate restrictions.
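A tiered menu can be enforced as a simple routing rule. This Python sketch is a hypothetical policy, not a feature of Cursor, Claude Code, or Codex: the category names are invented labels based on the standby-tier examples earlier in the article, and a real setup would hook something like `route` into whatever approval or proxy layer you already run.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DAILY = "daily driver"         # cheap or in-editor model; the default
    FRONTIER = "standby frontier"  # needs a written justification


# Hypothetical category labels, based on the article's standby-tier examples.
FRONTIER_ELIGIBLE = {
    "architectural_refactor",
    "long_horizon_agent_run",
    "research_math",
    "legal_or_financial_review",
}


@dataclass
class Request:
    task_category: str
    wants_frontier: bool = False
    justification: str = ""


def route(req: Request) -> Tier:
    """Default to the cheap tier; escalate only for eligible work with a written reason."""
    if not req.wants_frontier or req.task_category not in FRONTIER_ELIGIBLE:
        return Tier.DAILY
    if not req.justification.strip():
        raise PermissionError("Frontier requests need a written justification.")
    return Tier.FRONTIER


print(route(Request("bug_fix")).value)  # daily driver
print(route(Request("architectural_refactor", wants_frontier=True,
                    justification="Split the billing monolith")).value)  # standby frontier
```

In this sketch, frontier requests for ineligible work quietly fall back to the daily tier rather than erroring; a stricter policy could reject them outright.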
Pilot self-hosted open-weight models by Q4. Kimi K2.6 or DeepSeek V4 on internal infra. The point is to have the option ready when the renewal hits, not necessarily to mandate it. This kind of setup takes time.
Pull your renewal forward. If your enterprise agreement with Anthropic or OpenAI renews in 2026, the new number probably won’t be pretty. Renegotiate now or migrate part of the workload off-frontier before the cliff.
“I don’t want to lose these tools. How do I convince my company to keep them?”
Walk in with the bill. Pitch a specific number, not a moving model target. “We spent X on Opus. Same workload on Composer at Y. Delta is Z.” Numbers are what will convince CFOs (and make you look smart).
Run a two-person, two-week pilot. On a non-critical workload, please. Then write up a one-page memo with the numbers.
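For the memo, the pilot numbers need to be scaled to something a CFO cares about. This Python sketch assumes costs scale linearly with headcount and time, which is a simplification; `annualize` and `memo_line` are hypothetical helpers, and the inputs in the final line are placeholders.

```python
PILOT_PEOPLE = 2
PILOT_WEEKS = 2
WEEKS_PER_YEAR = 52


def annualize(pilot_spend: float, team_size: int) -> float:
    """Scale a two-person, two-week pilot spend linearly to a full team for a year (an assumption)."""
    per_person_week = pilot_spend / (PILOT_PEOPLE * PILOT_WEEKS)
    return per_person_week * team_size * WEEKS_PER_YEAR


def memo_line(frontier_model: str, frontier_spend: float,
              cheap_model: str, cheap_spend: float, team_size: int) -> str:
    """The X / Y / Z sentence, with pilot spend projected to a year for the whole team."""
    x = annualize(frontier_spend, team_size)
    y = annualize(cheap_spend, team_size)
    return (f"At pilot rates we’d spend ${x:,.0f}/yr on {frontier_model}. "
            f"Same workload on {cheap_model} at ${y:,.0f}/yr. "
            f"Delta is ${x - y:,.0f}/yr.")


# Placeholder inputs: $800 vs. $100 of pilot spend, scaled to a 50-person team.
print(memo_line("Opus", 800, "Composer", 100, team_size=50))
```

This only compares spend; it says nothing about whether the cheaper run shipped as much, so report output alongside the delta.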
Flip the risk frame. The risk isn’t “what if the cheap model is worse.” The risk is “what if our competitor figured this out three months before we did.” The business advantage in this next era of AI pricing comes from finding the balance between maximum intelligence and affordable, sustainable pricing.
Argue workflow fit, not benchmarks. Composer isn’t better than Opus on the leaderboard, but it’s likely better for the work you do. We don’t use industrial ovens to bake birthday cakes.
Name your escalation path upfront. Frontier still wins frontier work; throwing smaller models at huge repo-wide tasks will only create more technical debt. Conceding that is what gives the rest of the pitch credibility (and keeps your boss from feeling like they’ve been steering a sinking ship since onboarding Claude Code in January).
The bet
Don’t abandon these tools. The companies and people who learn to use them well will spend the next two years running AI at a third of the cost of the ones still buying into the tokenmaxxing leaderboard. Strive to ship faster on the same budget, and keep frontier on standby for the days it’s needed. The ones who ride the cliff down to the renewal repricing without changing anything are going to look up in twelve months and realize their competitor is doing more work for less money on the same playing field.
Composer in the editor. Claude Code or Codex in your pocket. Cheap by default. Good for your wallet (and better for the planet).
Pace yourself. Or somebody else will pace you.



