
Tokenmaxxing: Silicon Valley's New AI Contest

April 17, 2026 · 7 min read · 1,381 words
Anthropic · OpenAI · AI Infrastructure · Generative AI

Key insights

  • Tokenmaxxing turns AI usage into a performance metric, but Goodhart's law says any measure that becomes a target stops measuring what it was meant to
  • Ramp's data shows a K-shaped AI economy. The top 25 percent of AI spenders have more than doubled their revenue while the bottom 25 percent are flat
  • OpenAI is building for maximum demand. Anthropic is throttling and charging. Only one of these strategies has read the market correctly
  • Investor Dan Niles points to Amazon in 1999: even if you picked the winner, the stock fell 95 percent peak to trough. Overbuild is guaranteed in revolutionary tech
Published April 9, 2026
CNBC Television
Host: Deirdre Bosa
Guest: Eric Glyman, CEO of Ramp

This is an AI-generated summary. The source video may include demos, visuals and additional context.


In Brief

There's a new thing in Silicon Valley and it's called tokenmaxxing. Engineers are competing to see who can burn the most AI tokens. Nvidia CEO Jensen Huang has said he'd be alarmed if a top engineer weren't burning $250K a year in AI compute. Shopify uses token usage as a performance signal. Meta employees reportedly burned through about 900 million tokens in a month.

In a 42-minute CNBC livestream on April 9, 2026, Deirdre Bosa asks the big question: how much of the inflated AI demand is real, and how much is just the game? She interviews Eric Glyman, CEO of Ramp, who is launching a new product to track enterprise AI spend, and investor Dan Niles, who warns of a bubble.

Both agree there is a real gap between companies that adopt AI and those that don't. But when AI usage becomes the yardstick itself, the metric breaks. And over all of it sits the bigger question: has OpenAI or Anthropic read the market correctly?

What is tokenmaxxing?

A token is the base unit of AI usage. When you type to a chatbot or an agent, the text is split into tokens, and both your prompt and the model's response are billed per token. More tokens mean more compute, which means a bigger bill. The more powerful the model (Claude Opus or GPT-5, say), the more each token costs.
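The billing arithmetic above can be sketched in a few lines. The prices and the words-to-tokens heuristic here are illustrative placeholders, not any vendor's actual rates:

```python
# Back-of-envelope token cost estimate. Prices are hypothetical
# placeholders expressed per million tokens, not real vendor rates.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: one token is about 3/4 of an English word.
    return round(len(text.split()) / 0.75)

def estimate_cost(prompt: str, response: str,
                  price_in_per_mtok: float,
                  price_out_per_mtok: float) -> float:
    # You pay per token in (the prompt) and per token out (the response).
    tokens_in = estimate_tokens(prompt)
    tokens_out = estimate_tokens(response)
    return (tokens_in * price_in_per_mtok
            + tokens_out * price_out_per_mtok) / 1_000_000
```

Real tokenizers are model-specific, so treat this as a rough mental model rather than a billing tool.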

Tokenmaxxing is the culture where AI usage becomes a status indicator. It's a little like how lines of code once were a performance metric among programmers. Now it's tokens burned in AI systems.

Ramp is building a product that gives finance leaders visibility into a company's full AI spend: how many tokens, on which tasks, with which model, and whether a cheaper model could have done the job equally well. Glyman says AI spend across Ramp's customer base has grown 13x in a year, with 50 percent growth each quarter. And nobody knows how to budget for it.

The metric trap: when the target distorts the measure

Here's where Goodhart's law shows up: "a measure that becomes a target stops being a good measure." Glyman puts it this way:

"Once you incentivize people to go and get off the call, people hang up quickly. Once you incentivize using as many tokens as possible, you'll see engineers go and count all the digits of pi and use these tokens."

It's not a hypothetical. Bosa opens the broadcast with the story of Amazon, which once measured call center reps on how quickly they ended calls. Reps started hanging up on customers to get the time down. The metric went up. Customer service collapsed. Jeff Bezos killed it overnight.

The same failure mode is now playing out with AI tokens. Except this time the stakes are more than a trillion dollars in planned AI infrastructure.

The Ferrari-for-groceries analogy

Glyman's most-quoted analogy: using a Ferrari to deliver groceries. Meaning: you send a frontier model like Claude Opus or GPT-5 on a task a much cheaper model (Claude Haiku, an older GPT variant, or an open-source model) could have handled.

"You can use the most advanced model on the planet to edit your email, but maybe you don't need to."

The point isn't that no one should use the most powerful models. The point is that finance leaders have no way to know how much AI spend is actually generating returns and how much is just habit or engineer-leaderboard posturing.

Glyman adds an interesting data point from OpenRouter (a service that routes requests between different AI models): the frontier models' share of all tokens used has fallen from over 20 percent to around 4 percent. People pick cheaper models when they can.
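That shift toward cheaper models is essentially a routing decision. A minimal sketch of a cost-aware router, assuming made-up model names, prices, and a single "capability tier" score (none of which come from the broadcast):

```python
# Toy cost-aware model router: pick the cheapest model whose capability
# tier covers the task. Names, tiers, and prices are invented for
# illustration only.
MODELS = [
    {"name": "small-model",    "tier": 1, "price_per_mtok": 0.25},
    {"name": "mid-model",      "tier": 2, "price_per_mtok": 3.00},
    {"name": "frontier-model", "tier": 3, "price_per_mtok": 15.00},
]

def route(required_tier: int) -> str:
    # Filter to models at least as capable as the task needs,
    # then take the cheapest one.
    candidates = [m for m in MODELS if m["tier"] >= required_tier]
    return min(candidates, key=lambda m: m["price_per_mtok"])["name"]
```

Routing an email edit (tier 1) lands on the cheap model; only a genuinely hard task (tier 3) justifies the frontier price. That is the Ferrari-for-groceries test turned into code.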

Two strategies: OpenAI vs Anthropic

Here's where the real tension in the broadcast lives. The two most visible AI companies in the US have picked opposite strategies:

  • OpenAI. Main move: build maximum capacity, cut prices. Assumption: demand keeps going straight up. Risk: can't recoup the investment if growth flattens.
  • Anthropic. Main move: cap usage, require payment, cut off third-party access. Assumption: only real, paying demand counts. Risk: miss the market if demand really takes off.

Glyman hints that his own instinct sits closer to Anthropic's discipline: aggressive goals, but realistic about return on investment. Niles is more direct. OpenAI had, by his numbers, roughly $20 billion in annualized revenue but $1.4 trillion in capital commitments, and is expected to burn $220 billion in cash through 2029 before reaching profitability in 2030.

The result of that imbalance is what you're seeing in the stocks: Oracle down hard (more than half of Oracle's backlog is tied to OpenAI), Microsoft down 20 percent year-to-date (it owns 27 percent of OpenAI). Google, on the other hand, is up on the year, because the company is still free-cash-flow positive.

The K-shaped AI economy

Ramp's data across more than 50,000 businesses shows a clear split:

  • Bottom 25 percent (lowest AI spenders): revenue growth of about 12 percent over three years. Basically flat.
  • Top 25 percent (highest AI spenders, including roofers and contractors, not just tech): revenue more than doubled.

The message is nuanced. AI adoption works, but that doesn't mean every dollar spent is optimal. You can use AI well and still waste. It's also possible that top performers simply use more tokens because they have more real work to do, not because they're gaming a leaderboard.

Historical mirror: Amazon 1999 or Cisco 2000?

Dan Niles is a portfolio manager who has watched two major tech waves run through the markets. His frame is the dot-com bubble:

  • Amazon (1999 to 2003): revenue went from $1.6 billion to $3.1 billion, nearly doubling. But the stock fell 95 percent peak to trough. Even if you picked the winner, you had to sit through the entire drop before getting back to even.
  • Cisco: the clear winner of the broadband buildout. Its stock took more than 20 years to get back to 1999 levels.
  • The NASDAQ: down 78 percent over two and a half years. Around a thousand internet companies went to zero.

Niles's point: when the infrastructure opportunity is this big, you are guaranteed to get overbuild. It's not a question of if. It's a question of which companies survive the drawdown.

His current read, as of April 2026:

  • Positioned well: Anthropic, Amazon, Apple. Amazon hosts Anthropic's workload and has physical infrastructure; Apple can sit back with 1.5 billion iPhones and license AI from Google.
  • Cautious on: Microsoft, Oracle. Both have heavy OpenAI exposure, and Niles sees OpenAI squeezed between Anthropic (enterprise) and Google (consumer).

The agent shift changes the math

One underdiscussed point from the interview: once agent-driven AI products (Niles specifically names Anthropic's Claude Code) rolled out in winter 2026, OpenRouter token growth jumped from 20 percent to 130 percent over two months.

But agents need a different kind of hardware than classic AI workloads. Today's AI infrastructure is built for GPUs (graphics processing units) that do repetitive math in parallel at extreme speed. Agents do many different things in sequence: hit a website, open a spreadsheet, call an API, remember the last step. That's a job for CPUs (central processing units), more memory, and optical networking between servers.
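The sequential, state-carrying shape of agent work described above can be sketched in a few lines. The step functions are stand-ins for "hit a website, open a spreadsheet, call an API", not real integrations:

```python
# Minimal agent-loop sketch: heterogeneous steps run one after another,
# each seeing the result of the last. This is sequential, branchy CPU
# work, not the parallel math GPUs are built for.

def fetch_page(state):
    state["page"] = "<html>...</html>"    # placeholder for an HTTP fetch
    return state

def extract_rows(state):
    state["rows"] = [1, 2, 3]             # placeholder for spreadsheet parsing
    return state

def call_api(state):
    state["result"] = sum(state["rows"])  # placeholder for an API call
    return state

def run_agent(steps, state=None):
    state = state or {}
    for step in steps:        # strictly in order: each step depends on
        state = step(state)   # the state the previous step left behind
    return state
```

Nothing here parallelizes: each step waits on the last, which is why the text above argues agents pull demand toward CPUs, memory, and networking rather than raw GPU throughput.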

That's why Intel, written off for three years, is suddenly interesting again. This week Google and Intel announced a deal. Niles predicts the agent boom could pull a whole cluster of "dead" tech names back to life.

What this is really about

Tokenmaxxing is more than a weird competition among engineers. It's a proxy metric for a much harder question: how much of the giant AI infrastructure being built right now will actually be used?

If OpenAI is right, we're still early. Demand is growing exponentially, agent workloads are only just starting, and the companies building the most capacity win.

If Anthropic is right, the market is already picking the winners, and discipline matters more than volume. Willingness to pay is the real signal, not tokens burned on a leaderboard.

Both can't be right at the same time. And in the meantime, Silicon Valley's engineers keep competing to see who can burn the most tokens.

Glossary

  • Token: The base unit of AI usage. One token is roughly 3/4 of an English word. You pay per token in and per token out.
  • Tokenmaxxing: The culture of treating AI usage as a performance metric; engineers and companies compete on who can burn the most tokens.
  • Frontier model: The most powerful and expensive AI models (Claude Opus, GPT-5, Gemini Ultra). Built for complex tasks, but often overkill for everyday ones.
  • Goodhart's law: "A measure that becomes a target stops being a good measure." Once you incentivize a metric, people optimize for the metric itself instead of the thing it was supposed to measure.
  • GPU (graphics processing unit): Built to run repetitive math in parallel at extreme speed. Today's AI stack is built on GPUs.
  • CPU (central processing unit): Built to run many different tasks in sequence. Useful for agents that hop between different kinds of work.
  • Agent: An AI that performs tasks on its own instead of just answering questions. It can open spreadsheets, read websites, call APIs.
  • OpenRouter: A third-party service that routes AI requests across different models and providers, and publishes usage data.
  • Innovator's dilemma: The theory that incumbents struggle to adopt disruptive technologies because they threaten the existing business model.
  • Jevons paradox: When something gets cheaper or more efficient to use, total consumption often rises rather than falls.
