
Box CEO: Why Enterprises Keep Failing at AI

April 28, 2026 · 9 min read · 1,838 words
AI Agents · AI Infrastructure · AI and Employment · Generative AI
Steven Sinofsky, Aaron Levie, and Martin Casado discussing enterprise AI on the a16z podcast
Image: Screenshot from YouTube.
Source: YouTube
Hosts: Steven Sinofsky and Martin Casado (a16z)
Guest: Aaron Levie (Box)

This is an AI-generated summary. The source video may include demos, visuals and additional context.


In Brief

Most large companies are still terrible at AI, and the standard explanation is that they are slow. Aaron Levie, CEO of Box, thinks that gets it wrong. On a recent a16z panel with board partner Steven Sinofsky and general partner Martin Casado, Levie laid out a different read: the technology fundamentally doesn't fit how large organizations work yet, and the gap will take years to close.

The conversation cuts through the hype on three fronts: why top-down "more AI" mandates collapse, why agents hit the same integration wall humans do, and why the people predicting mass job loss are repeating a forecast that has been wrong for sixty years.

The Silicon Valley vs. enterprise gap

Levie spends his weeks visiting customers, and he describes his job as "bringing reality to the valley and the valley to reality." The gap is bigger than most people in tech realize.

A Silicon Valley engineer has near-ideal conditions for AI. High technical aptitude. Modern tools they pick themselves. The freedom to debug a broken pipeline and fix it. And the work itself is verifiable, which is exactly what models are best at. None of that holds in the rest of knowledge work. Workflows are different. Users are less technical. Data is fragmented. Systems are old. So when Silicon Valley promises that AI agents will run your company by next quarter, people in actual enterprises just look confused.

Casado adds an angle worth paying attention to. The deepest secular trends in technology, like the early internet, start with individuals and only later move into organizations. ChatGPT is already inside every big company through people who use it on their own. What is failing is the centralized, top-down deployment, not AI itself. The MIT figure that "95 percent of enterprise AI efforts fail" is misread, in his view, because it is measuring the corporate program and not the actual usage by employees.

The board, the consultant, and the doomed AI project

Sinofsky and Casado have both sat on enterprise boards, and they describe a now-familiar cycle. The board tells the CEO they need more AI. The CEO hires consultants. The consultants build a centralized project that no one in the operations side understands. It quietly fails. Then everyone blames AI.

Two things make this worse. The first is the token-counting incentive. Several large companies are now measuring employee productivity by how many AI tokens (the units of AI usage you pay for) they burn. The predictable result is engineers asking agents to do useless work just to inflate the metric. Casado quotes someone who works at one of these companies admitting it directly. You get whatever you measure.

The second is architectural paralysis. Three years ago, the answer to "how do you deploy agents" was completely different from today. Should the agent run in your cloud or theirs? In a browser or as a process? With which tools? Companies that picked early often got burned. Levie says he now sees CIOs stuck mid-debate between two or three frameworks, unwilling to commit, because the tech is moving faster than enterprise architecture cycles can absorb. Standing still feels safer than betting wrong.

Treat AI like a user, not like software

This is the architectural shift Casado wants people to internalize, and it might be the most useful idea in the entire conversation. Stop trying to integrate AI into your software. Treat the AI as a user of that software.

Six months ago, every product company was bolting AI features into the existing user interface, the chat-with-your-product pattern. That hybrid is collapsing. The new approach: take your product, expose it as a CLI (a text-only interface a program can drive), or as a set of clean APIs, and let the agent use it the same way a person would. The agent runs in a separate harness like Claude Code or Codex, and your product becomes something it consumes.
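To make the pattern concrete, here is a minimal sketch of what that inversion could look like. Everything product-specific is an assumption: the `acmeproduct` name, the `list` and `update` subcommands, and the record fields are invented for illustration, not taken from the panel.

```python
# Hypothetical sketch: exposing an existing product's operations as a CLI
# so an agent harness (Claude Code, Codex, etc.) can drive it like a user.
# The product name, subcommands, and fields are illustrative assumptions.
import argparse
import json
import sys

def list_records(status: str) -> list[dict]:
    # In a real product this would call your internal service layer.
    return [{"id": "rec_001", "status": status, "owner": "sally"}]

def update_record(record_id: str, status: str) -> dict:
    return {"id": record_id, "status": status}

def main() -> None:
    parser = argparse.ArgumentParser(prog="acmeproduct")
    sub = parser.add_subparsers(dest="command", required=True)

    p_list = sub.add_parser("list", help="List records by status")
    p_list.add_argument("--status", default="open")

    p_update = sub.add_parser("update", help="Update a record's status")
    p_update.add_argument("record_id")
    p_update.add_argument("--status", required=True)

    args = parser.parse_args()

    # Plain JSON on stdout: trivial for a program (or an LLM) to consume,
    # with no screen-scraping of a graphical interface required.
    if args.command == "list":
        json.dump(list_records(args.status), sys.stdout, indent=2)
    else:
        json.dump(update_record(args.record_id, args.status), sys.stdout, indent=2)

if __name__ == "__main__":
    main()
```

The design choice that matters is the output: structured JSON on stdout that a harness can parse reliably, instead of a user interface it has to interpret.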

Why does this work better? Because LLMs (large language models) are non-deterministic and they handle the long tail of messy real-world cases. Those are properties of humans, not of software. Casado's point lands hard: we have spent forty years building access controls, processes, and design patterns to deal with messy humans. If you treat the agent as a new hire, give it an email address, a license, an identity, and the same access rights as a peer at its level, you get to draft off all of that infrastructure. If you treat it like software, you fight it.
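To make the new-hire framing concrete, here is a hypothetical sketch of provisioning an agent through the same role-based access machinery a person would get. The `Identity` class, role table, and permission strings are all invented for illustration.

```python
# Hypothetical sketch: provisioning an agent through the same identity and
# access-control path as a human hire, rather than a special software path.
# All names, roles, and permission strings are illustrative assumptions.
from dataclasses import dataclass, field

ROLE_PERMISSIONS = {
    # The same role table a human analyst at this level would get.
    "sales_analyst": {"crm:read", "calendar:read", "email:send"},
}

@dataclass
class Identity:
    principal_id: str       # unique identity, never a shared human credential
    email: str              # the agent gets its own address, like any hire
    role: str
    is_agent: bool = False  # flagged for audit, but governed by the same rules
    permissions: set[str] = field(init=False)

    def __post_init__(self) -> None:
        self.permissions = ROLE_PERMISSIONS[self.role]

    def can(self, action: str) -> bool:
        return action in self.permissions

# The agent is onboarded exactly like a peer at its level.
agent = Identity(
    principal_id="agent-7f3a",
    email="crm-agent@example.com",
    role="sales_analyst",
    is_agent=True,
)
assert agent.can("crm:read")
assert not agent.can("payroll:read")  # denied by the same policy that binds humans
```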

It is also why he is bearish on what some call the "SaaS apocalypse," the theory that agents will replace the seat licenses companies pay for. An agent is another seat. There is no way around that. It needs an identity. It can't share a human's credentials without breaking security. The pricing model may bend, but the seat doesn't go away.

The integration wall AI doesn't fix

Sinofsky pushes back on the optimistic part of this. "Treat the agent as a user" sounds clean, but real users hit walls all the time. Any company with more than a thousand employees, or older than ten years, is a mass of stuff sitting there waiting to be integrated, and AI does not actually help integrate anything.

He describes the human version. You call customer service. The rep on the line can't help you, so they bounce you to another person. The next person can't help either because that's a different department's system. You eventually find the person with the right access. An AI agent has no instinct for any of that. It will hit a permissions wall and get stuck, because nobody told it to call Sally in HR or Bob in finance.
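A hypothetical sketch of that failure mode, with invented system names: the human workaround (call Sally) is out-of-band knowledge, so it exists in no API the agent can reach.

```python
# Hypothetical sketch of Sinofsky's wall: the agent hits a permission
# boundary and, unlike a human, has no out-of-band path around it.
# The system names and roles are illustrative assumptions.

class PermissionDenied(Exception):
    pass

def fetch_compensation_record(employee_id: str, caller_role: str) -> dict:
    # The HR system belongs to a different department; the agent's role
    # was never granted access, exactly like a new employee's wouldn't be.
    if caller_role != "hr_admin":
        raise PermissionDenied(f"{caller_role} cannot read HR records")
    return {"employee_id": employee_id, "band": "L5"}

try:
    fetch_compensation_record("e-1024", caller_role="sales_analyst")
except PermissionDenied:
    # A human would now call Sally in HR. The agent has no such move:
    # the escalation path lives in tribal knowledge, not in any API,
    # so the task simply stalls here.
    print("Task blocked: no programmatic escalation path exists")
```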

This is also why the announcement that OpenAI is partnering with Accenture and Deloitte to roll out enterprise AI was, to Sinofsky, the most obvious news of the year. Big companies will need armies of system integrators just to wire AI into their existing data and processes. The "agents will replace the consultants" headline got it backwards. You need consultants to make agents work in the first place.

Salesforce goes headless: the bellwether moment

The biggest enterprise news Levie wanted to highlight: Salesforce announced it is going fully headless. A "headless" product has no user interface; it is meant to be driven by other software. Salesforce is openly conceding that the most important user of CRM data going forward is not a salesperson, it is an agent acting on behalf of one.

That decision opens a different scale of use. A human queries a CRM a few times a day. An agent fans out 500 parallel queries instantly to map every account before a meeting. Suddenly the bottleneck is not how fast the human types, it is how much the SaaS backend can handle. Levie warns that many SaaS products will collapse the first time they are hit at agent scale, the same way ERP APIs broke when business intelligence tools first started snapshotting them every night.
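A hypothetical sketch of the scale mismatch Levie is describing. The fan-out is trivial to write on the agent side, which is exactly the problem; the endpoint, account IDs, and concurrency cap are assumed for illustration.

```python
# Hypothetical sketch: an agent fanning out hundreds of CRM queries at once.
# Trivial on the client side, brutal on a backend sized for human typing speed.
# The endpoint, account IDs, and limits are illustrative assumptions.
import asyncio

import aiohttp

ACCOUNTS = [f"acct-{i:04d}" for i in range(500)]

async def fetch_account(session: aiohttp.ClientSession,
                        sem: asyncio.Semaphore, account_id: str) -> dict:
    # A client-side semaphore is the polite version; many agent harnesses
    # skip it entirely, which is when the backend falls over.
    async with sem:
        url = f"https://crm.example.com/api/accounts/{account_id}"
        async with session.get(url) as resp:
            return await resp.json()

async def main() -> None:
    sem = asyncio.Semaphore(50)  # cap in-flight requests; a human issues ~1 at a time
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(fetch_account(session, sem, a) for a in ACCOUNTS)
        )
    print(f"mapped {len(results)} accounts before the meeting")

if __name__ == "__main__":
    asyncio.run(main())
```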

It also forces a new pricing question. Are agents seats? An API tax? A separate identity tier? Nobody has the answer yet. But as Salesforce goes, so does most of enterprise software. The race is on to expose every SaaS product as something an agent can drive directly, and the companies that get the architecture right early will have a structural advantage over the ones still building chat boxes.

AI coding makes systems more complex, not less

Levie is one of the loudest CEO voices on AI coding, and he says something his peers rarely admit on stage. Box gets a 2 to 3x productivity gain from AI coding, not 10x. The reason is not the model. The model writes 80 to 90 percent of new feature code at Box already. The bottleneck is everything around it: code review, security review, the deploy pipeline. You can ship faster, but only as fast as the constraints allow.
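The ceiling Levie describes is essentially Amdahl's law applied to the delivery pipeline: speed up only the coding stage and the unchanged stages dominate. A back-of-envelope sketch, with time splits that are assumed for illustration rather than Box's actual numbers:

```python
# Back-of-envelope Amdahl's-law sketch of why a 10x coding speedup yields
# far less than 10x end to end. The time splits are assumed illustrations.
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    # Review, security, and deploy time is unchanged; only coding shrinks.
    return 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)

# If coding is 30% of cycle time and AI makes it 10x faster:
print(overall_speedup(0.30, 10))   # ~1.37x overall

# Even if coding were 80% of the cycle, a 10x coding speedup gives:
print(overall_speedup(0.80, 10))   # ~3.6x, close to the 2-3x Levie reports
```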

Casado pushes the point further. AI-written code degrades over time more aggressively than human-written code, and the industry has not figured out how to manage that yet. Vibe coding (the style where you let AI generate code with little manual review, named after the loose, intuitive feel of the workflow) works fine for one-off prototypes. It does not work for systems that have to keep running for ten years.

In Levie's words, the funniest concept in the discourse is that more code means fewer engineers. It is the opposite. Every new system is more complex than the one it replaced. More software means more upgrades, more downtime, more security incidents, more humans needed to keep the whole thing standing. Big companies survive precisely because they wrap engineers in constraints (reviews, audits, slow deploys) that AI accelerates but never removes.

The jobs misread: a sixty-year track record of being wrong

The closing argument is the most optimistic part of the panel, and it leans on history. Sinofsky pulls examples that map cleanly to today's predictions:

Year | Prediction | What actually happened
1965 | IBM's pitch: computers will replace accountants | Accounting expanded massively because companies could afford more analysis
1981 | Time magazine cover: computers will automate paper out of offices | Paper consumption rose for two more decades; new categories of office work appeared
1995 | "The End of Work" predicts mass joblessness from automation | The internet boom created millions of new categories of jobs
2026 | AI will eliminate knowledge work | Same mistake, new tools

Levie's argument is straightforward. A company is an information processor, and the binding constraint has never been how fast information gets created, only how effectively it gets consumed and acted on. AI accelerates creation. It does not relieve the consumption side, which still requires judgment, context, and someone willing to be accountable.

That is why the AI-native companies are hiring fastest, why infrastructure firms are growing not shrinking, and why engineering jobs will spread far beyond Silicon Valley. The next generation of engineers won't be at Google. They will be at John Deere automating tractors, at Caterpillar building AI for heavy equipment, and at Eli Lilly designing drugs.

What this is really about

The panel's diagnosis is consistent across all three speakers, even when they disagree on details. Enterprise AI is hard, but not because enterprises are slow or stupid. It is hard because:

  • The technology is changing faster than architecture decisions can settle
  • AI does not solve integration, which is where most of the real work has always been
  • Productivity gains are real but always bounded by the review and trust processes that keep big companies from imploding
  • Treating agents as users, not as software, is the unlock most companies haven't made yet

The diffusion will take years. In the meantime, the people getting AI to work in enterprises right now are individual employees, not centralized programs. That is the real signal, and it is the one most boardrooms keep missing.

Glossary

Term | Definition
Agent | An AI that performs tasks on its own, opening apps, calling APIs, and acting across multiple steps, instead of just answering questions
LLM (large language model) | The kind of AI behind ChatGPT, Claude, and Gemini. Trained on huge text datasets; returns probabilistic answers
Non-deterministic | The same input can give different outputs, like a person reasoning through a problem, not a database lookup
Headless | A product with no user interface, meant to be driven by other software (or agents) through APIs
CLI (command line interface) | A text-only way to interact with software, easier for programs and agents to drive than a graphical interface
Token | The base unit of AI usage, billed per input and per output. Engineers competing on token volume is a recent enterprise antipattern
MCP (Model Context Protocol) | An emerging standard for letting agents call into a software product's data and actions
Vibe coding | A loose, AI-led coding style where the developer prompts an agent and accepts most of what it writes without deep review

