
Anthropic Alone Priced for a Smaller AI Boom

April 17, 2026 · 5 min read · 999 words
Anthropic · Dario Amodei · AI Infrastructure · NVIDIA · Generative AI
CNBC's Deirdre Bosa on the AI demand signal and Anthropic's pricing bet

Key insights

  • Token demand can be gamed. When employees compete on leaderboards to burn the most tokens, the metric stops measuring real usage
  • The AI infrastructure cycle is priced on an assumption that must survive an 18-month lag between capacity decisions and actual demand
  • Anthropic's per-token billing is a bet that only verified demand counts. It sits opposite the flat-rate unlimited model used elsewhere
  • If Anthropic and OpenAI both go public this year, investors get their first direct comparison of two opposite bets on how big AI demand really is
Source: CNBC
Host: Deirdre Bosa

This is an AI-generated summary. The source video may include demos, visuals and additional context.


In Brief

CNBC's Deirdre Bosa opens her April 17 segment with a blunt claim: the AI demand signal is broken, and only one company is acting like it. The clip is under five minutes. The argument inside is that the token consumption number everyone is building for may not be real.

She points to three things. Employees gaming tokenmaxxing leaderboards at Meta and Shopify. AI agents running in the background and burning millions of tokens without supervision. Flat-rate subscriptions that let a single $200 plan generate thousands of dollars in compute. Meanwhile NVIDIA sells hundreds of billions in chips, data center operators commit to 30 gigawatts of new capacity (enough power for several major cities), and the whole cycle assumes usage keeps climbing.

Anthropic is the only major lab that has priced for a smaller version of that world. It killed flat-rate access for third-party tools, moved enterprise customers to per-token billing, and CEO Dario Amodei said on the Dwarkesh podcast that other AI companies are "just doing stuff because it sounds cool" without writing the spreadsheet. Both Anthropic and OpenAI are expected to go public this year. That will be the first moment the public market sees which of the two priced reality right.

What a token is, and the rented-car analogy

A token is the basic unit of AI usage. Every prompt you type, every response the model writes, every line of code an agent produces is billed in tokens. A quick chat costs a few hundred. An AI agent that browses the web, writes code, and runs tasks on its own can burn millions of tokens in a single session.

Bosa's analogy captures the shift. Using a chatbot is like renting a car and driving it yourself. Deploying an agent is like sending that rented car out to run errands all day, on your credit card, while you're not watching.

The numbers say the same thing. One estimate Bosa cites suggests a single $200/month Max plan running an autonomous agent could generate $2,000 to $5,000 in compute. That is a 10x to 25x mismatch between what the user pays and what the compute actually costs.
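That mismatch is simple division. A minimal sketch, using the segment's estimates (the $200 plan price and the $2,000–$5,000 compute range are from the clip; nothing else is assumed):

```python
# Back-of-the-envelope check of the flat-rate mismatch Bosa describes.
PLAN_PRICE = 200  # $/month for a flat-rate Max plan, per the segment

def mismatch_ratio(compute_cost: float, plan_price: float = PLAN_PRICE) -> float:
    """Dollars of compute consumed per subscription dollar paid."""
    return compute_cost / plan_price

low = mismatch_ratio(2_000)   # low end of the segment's estimate
high = mismatch_ratio(5_000)  # high end
print(f"mismatch: {low:.0f}x to {high:.0f}x")  # mismatch: 10x to 25x
```

Every dollar the heavy user pays buys ten to twenty-five dollars of someone else's compute, which is why the flat-rate side of the trade eventually has to move.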

The budget math is breaking

The first cracks are showing up in company budgets. Uber's CTO told The Information this week that AI coding tools have already maxed out Uber's full-year AI budget. It is only April. Goldman Sachs Research adds that enterprise customers are overrunning their inference budgets by orders of magnitude, with AI costs on track to rival engineering headcount this year.

On top of that, there is the tokenmaxxing problem. Meta and Shopify have put employees on leaderboards that track how many tokens they use, not what they ship. NVIDIA CEO Jensen Huang said it plainly: if a $500,000 engineer has not consumed at least $250,000 worth of tokens, he would be "deeply alarmed."

Two other CEOs said the quiet part out loud. Ali Ghodsi, CEO of Databricks, which processes AI workloads for thousands of companies, told CNBC that "if your goal is to just burn a lot of money, there are plenty of easy ways to do that." And Eric Glyman, CEO of Ramp, which tracks AI spend for thousands of businesses, put it just as bluntly: "You can use the most advanced model on the planet to edit your email, but maybe you don't need to."

When the metric can be gamed by the people being measured, the number stops measuring. That is the core of Bosa's argument.

Anthropic's counter-move

Anthropic's response has been, so far, unique in the industry. It moved first on three things.

First, it killed flat-rate access for third-party tools. Popular AI harnesses like OpenClaw, an open-source agent that can run autonomously for hours, can no longer be used under a flat-rate Claude subscription. Users who want that now pay per token on top. The math made it unsustainable: a $200/month plan that generates thousands of dollars in compute every day costs Anthropic money each time the user runs it.

Second, it is moving enterprise customers off flat-rate seats to per-token billing. If you want Anthropic models inside your company, you pay for what you actually use, not for the number of seats that might one day use them.

Third, the framing. On the Dwarkesh podcast, Amodei said other AI companies "have not written down the spreadsheet" and "don't really understand the risks they're taking. They're just kind of doing stuff because it sounds cool."

That is unusually blunt language about his largest rivals. It is also a position with a built-in trade-off. If demand really takes off, Anthropic misses the upside. If demand turns out to be a leaderboard mirage, Anthropic is the one still standing on solid ground.
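The trade-off between the two billing models can be sketched in a few lines. The prices here are hypothetical placeholders, not Anthropic's or anyone's actual rates; only the shape of the comparison is the point:

```python
# Two revenue models, hypothetical prices. Flat-rate revenue is capped;
# per-token revenue tracks whatever usage actually happens.
def flat_rate_revenue(seats: int, price_per_seat: float) -> float:
    """Fixed revenue, regardless of how many tokens users burn."""
    return seats * price_per_seat

def per_token_revenue(tokens_used: int, price_per_million: float) -> float:
    """Revenue that scales with verified usage."""
    return tokens_used / 1_000_000 * price_per_million

# A heavy agentic user burning 500M tokens in a month:
heavy_tokens = 500_000_000
print(flat_rate_revenue(1, 200.0))            # 200.0 -- capped
print(per_token_revenue(heavy_tokens, 15.0))  # 7500.0 -- scales with use
```

Under flat-rate pricing the heavy user is a loss; under per-token pricing the same user is the best customer. That is the entire bet in two functions.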

The cone of uncertainty

Amodei calls the gap between today's capacity decisions and tomorrow's real demand the cone of uncertainty. Data centers take one to two years to build. That means every AI company is making billion-dollar bets today on demand that has not materialized yet. Buy too little and you lose customers. Buy too much and the revenue does not show up. As Amodei puts it: "If you're off by a couple of years, that can be ruinous."
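A toy model makes the asymmetry concrete. All numbers below are hypothetical illustrations of the lag, not figures from the segment:

```python
# Toy model of the capacity lag: capacity ordered today comes online
# ~18 months later, sized to a *projected* demand growth rate, and then
# meets whatever demand *actually* materialized.
def capacity_gap(current_demand: float, projected_growth: float,
                 actual_growth: float, lag_years: float = 1.5) -> float:
    """Positive = idle capacity (overbuilt); negative = unmet demand."""
    capacity = current_demand * (1 + projected_growth) ** lag_years
    demand = current_demand * (1 + actual_growth) ** lag_years
    return capacity - demand

print(capacity_gap(100, 1.0, 1.0))  # ~0: the projection was right
print(capacity_gap(100, 1.0, 0.2))  # positive: billions in idle capacity
print(capacity_gap(100, 0.2, 1.0))  # negative: customers you cannot serve
```

The function is symmetric but the consequences are not: idle capacity is sunk capital, while unmet demand is customers lost to a rival who overbuilt. That asymmetry is what makes the "ruinous" framing more than rhetoric.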

The CNBC segment closes on a moment that has not arrived yet but will. Both Anthropic and OpenAI are expected to IPO this year. When that happens, public markets get a direct look at two opposite bets on AI demand. One company will show up with per-token revenue that mirrors what customers pay for. The other will show up with a much bigger top line, built on flat-rate plans and capacity that assumes today's usage keeps growing.

The market tends to reward the company that knows what it sold. Bosa's thesis is that 2026 will be the year we find out which of the two labs actually does.

Glossary

Token: The basic unit of AI usage. Text is chopped into tokens (a short word or a piece of one), and every prompt and response is billed in tokens.
AI agent: An AI that runs on its own. Instead of answering one question, it keeps working: opens websites, writes code, sends messages, runs for hours.
Inference: The actual work the AI model does to produce an answer. It is what you pay for each time the model runs.
Tokenmaxxing: The practice of ranking engineers by how many AI tokens they burn rather than by what they ship. The word blends "token" and "maxxing" (maxing out).
Flat-rate plan: A subscription that lets you use a service as much as you want for a fixed price. The risk: a few heavy users can cost the provider more than they pay.
Cone of uncertainty: Dario Amodei's term for the risk of making long-lead decisions, like building a data center, on demand numbers that may not be real.
