Nvidia GTC 2026: $1 Trillion Demand and a New Agent OS

Key insights
- The Groq acqui-hire reveals that no single chip architecture can serve all inference tiers. Nvidia is admitting its GPUs need a fundamentally different type of silicon for ultra-fast token generation.
- Jensen's 'token factory' framing redefines how data centers are valued: not by storage or compute cycles, but by tokens per watt per dollar. Every AI company CEO will track this metric.
- Calling OpenClaw 'the Linux of agents' implies the $2 trillion enterprise IT industry faces a platform shift, not incremental AI features. Every SaaS company either becomes an agent platform or risks disruption.
- The robo-taxi partnerships with BYD, Hyundai, Nissan, and Geely cover manufacturers that together build 18 million cars per year, with Uber deploying the vehicles in its ride-hailing network. Nvidia is positioning as the platform layer for autonomous vehicles, not the car maker.
This is an AI-generated summary. The source video includes demos, visuals and context not covered here.
In Brief
Jensen Huang, founder and CEO of Nvidia, used his GTC 2026 (GPU Technology Conference, Nvidia's annual developer conference) keynote to double his AI infrastructure demand forecast from $500 billion to $1 trillion through 2027. He unveiled the Vera Rubin computing platform with Groq integration, announced Nvidia's support for the open-source agent framework OpenClaw, and revealed new robo-taxi partnerships with BYD, Hyundai, Nissan, and Geely, plus a deployment deal with Uber. The 2-hour-and-20-minute keynote covered everything from chip architecture to a Disney robot walking on stage.
The $1 trillion forecast
Huang opened with a claim that set the tone for the entire event. Last year at GTC, he told the audience he could see $500 billion in high-confidence demand for Blackwell and Vera Rubin systems through 2026. This year, he doubled it: "through 2027 at least $1 trillion" in demand.
The demand increase stems from what Huang calls the "inference inflection." AI has moved through three phases in two years. First, generative AI (ChatGPT in late 2022) taught models to create content. Then reasoning models (like GPT-5) allowed AI to think through problems step by step. Finally, agentic AI made models capable of doing actual work: reading files, writing code, testing it, and iterating. Coding tools like Claude Code (Anthropic), Codex (OpenAI), and Cursor let developers describe what they want, and the AI writes, tests, and fixes the code.
Each phase multiplied the compute required. Huang claimed that computing demand has increased by 1 million times in the last two years, driven by a roughly 10,000-fold increase in tokens needed per task and a 100-fold increase in usage. He also pointed to $150 billion in venture capital invested in AI startups, calling it "the largest in human history."
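The arithmetic behind the "1 million times" figure is straightforward compounding; a quick sketch using the keynote's own multipliers makes it explicit:

```python
# Back-of-envelope check of Huang's "1 million times" compute-demand claim.
# Both multipliers come from the keynote; the decomposition is illustrative.
tokens_per_task_growth = 10_000   # reasoning/agentic models use far more tokens per task
usage_growth = 100                # more users, and more tasks per user

total_demand_growth = tokens_per_task_growth * usage_growth
print(f"{total_demand_growth:,}x")  # 1,000,000x
```

The point of the decomposition is that neither factor alone explains the curve: per-task token counts and overall usage multiply rather than add.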
Demand doubled to $1 trillion. The 60/40 split shows hyperscalers on the left, everything else on the right. Screenshot from YouTube.
Nvidia's current business reflects this breadth. About 60% of revenue comes from the top five hyperscalers (large cloud providers like Amazon, Google, and Microsoft). The remaining 40% spans regional clouds, sovereign AI projects (AI infrastructure controlled by specific countries rather than foreign providers), enterprise, industrial, and robotics applications. Huang framed this diversity as resilience: "This is not a one app technology. This is absolutely a new computing platform shift."
The keynote also marked the 20th anniversary of CUDA, the programming platform that lets developers use Nvidia's graphics chips (GPUs) for AI, scientific research, and other heavy computation. With 450 companies sponsoring GTC, 1,000 technical sessions, and 2,000 speakers, the conference itself illustrated how deeply Nvidia's ecosystem has penetrated every major industry.
Nvidia spans nine industry verticals: automotive, financial services, healthcare and life sciences, industrial, media and entertainment, quantum computing, retail and consumer goods, robotics, and telecommunications. Screenshot from YouTube.
Vera Rubin: seven chips, five racks, one system
The hardware centerpiece of the keynote was Vera Rubin, Nvidia's next-generation AI computing platform. Previous launches featured Huang holding up a single chip. Vera Rubin is different. It is an entire vertically integrated system: seven chip types across five rack-scale computers, delivering 3.6 exaflops of computing power (an exaflop is a billion billion calculations per second).
Huang's "best slide": Nvidia's cuDF library now integrates into the entire $120 billion structured data ecosystem. Screenshot from YouTube.
The biggest surprise was the Groq integration. Nvidia revealed that its acqui-hire of Groq produced a third-generation chip called the LP30, manufactured by Samsung. Groq's Language Processing Units (LPUs) use large amounts of on-chip memory (SRAM) instead of the external memory that graphics processing units (GPUs) rely on, making them exceptionally fast at generating tokens. Paired with Vera Rubin's GPUs, the combination delivers 35 times more throughput per megawatt.
The practical result, according to Huang: token generation speed would jump from 2 million to 700 million tokens per second per gigawatt, a 350-fold increase in two years. For context, one gigawatt is roughly the power output of a large nuclear plant.
This matters because modern AI workloads have two distinct phases. The "prefill" step (reading and processing your input) demands raw mathematical power, which GPUs handle well. The "decode" step (generating output tokens one by one) is bottlenecked by memory bandwidth, which is where Groq's SRAM-heavy LPUs excel. By splitting the work across two types of silicon, Nvidia can optimize both phases simultaneously.
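A rough roofline-style estimate shows why the two phases favor different silicon. All hardware numbers below are invented for illustration; they are not Vera Rubin, LP30, or any published specifications:

```python
# Illustrative (not measured) estimate of why prefill is compute-bound
# and decode is memory-bandwidth-bound. All figures are hypothetical.

def prefill_time_s(prompt_flops, peak_flops):
    # Prefill processes the whole prompt in parallel: limited by raw compute.
    return prompt_flops / peak_flops

def decode_time_s(tokens, bytes_per_token, mem_bandwidth):
    # Decode re-reads weights/KV cache for every token: limited by bandwidth.
    return tokens * bytes_per_token / mem_bandwidth

# Hypothetical accelerator figures, chosen only to show the contrast:
GPU_PEAK_FLOPS = 1e15   # 1 PFLOP/s dense compute
GPU_BANDWIDTH  = 3e12   # 3 TB/s external HBM
LPU_BANDWIDTH  = 8e13   # 80 TB/s on-chip SRAM (order-of-magnitude guess)

prefill    = prefill_time_s(prompt_flops=2e14, peak_flops=GPU_PEAK_FLOPS)
gpu_decode = decode_time_s(1000, 1e9, GPU_BANDWIDTH)  # 1,000 tokens, ~1 GB moved each
lpu_decode = decode_time_s(1000, 1e9, LPU_BANDWIDTH)

print(f"prefill {prefill:.2f}s, GPU decode {gpu_decode:.2f}s, SRAM decode {lpu_decode:.3f}s")
```

With these toy numbers, decode on the bandwidth-rich SRAM part is many times faster than on the GPU, while prefill barely cares which chip it runs on: that asymmetry is the rationale for splitting the two phases across two kinds of silicon.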
Dylan Patel, founder of semiconductor research firm SemiAnalysis, independently validated the performance claims. Last year, Huang said Blackwell (Nvidia's current-generation AI chip architecture) delivered 35 times the performance per watt over the previous generation, Hopper. Patel's benchmarks found the actual number was higher: "Jensen sandbagged. It's actually 50 times."
The roadmap extends further. Rubin Ultra will connect 144 GPUs in a single NVLink domain (Nvidia's custom high-speed interconnect between GPUs) using a new rack design called Kyber. After that comes Feynman in 2028, with a next-generation GPU, a new LPU (LP40), and a new CPU codenamed Rosa.
OpenClaw: the agent OS moment
The software announcement that got the most stage time was Nvidia's endorsement of OpenClaw, the open-source AI agent framework created by Peter Steinberger. Huang called it "the most popular open-source project in the history of humanity," noting that it exceeded Linux's 30-year adoption curve in just a few weeks.
NemoClaw: Nvidia's enterprise-ready agent toolkit built on OpenClaw, with OpenShell for security, LLM connections, sub-agents, and Nemotron integration. Screenshot from YouTube.
Huang compared OpenClaw's significance to Linux, HTTP, and Kubernetes (the container orchestration system that made cloud computing practical), each of which arrived at the right moment to unlock an entire computing era. OpenClaw connects to large language models (LLMs), the AI systems that power tools like ChatGPT and Claude. It manages resources, accesses tools, handles scheduling, decomposes problems into steps, and spawns sub-agents. In Huang's framing, it is the operating system layer that the agentic AI era needed. He showed a video montage of real-world use cases: a 60-year-old father automating his lobster business, developers building other tools on top of OpenClaw, and an already-established "ClawCon" conference.
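The "operating system for agents" pattern Huang describes, decomposing a task into steps, routing each step to a tool, and spawning sub-agents for the rest, can be sketched in a few lines. This is a hypothetical illustration; none of the names below come from OpenClaw's actual API:

```python
# Minimal sketch of the agent-OS loop described above: plan, dispatch to tools,
# spawn sub-agents for steps this agent cannot handle. Hypothetical, not OpenClaw.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    tools: dict = field(default_factory=dict)  # tool name -> callable

    def plan(self, task: str):
        # Stand-in for LLM-driven decomposition of the task into steps.
        return [("read", task), ("write", task.upper())]

    def run(self, task: str) -> str:
        results = []
        for tool, arg in self.plan(task):
            if tool in self.tools:
                results.append(self.tools[tool](arg))
            else:
                # Spawn a sub-agent for work outside this agent's toolset.
                sub = Agent(name=f"{self.name}/sub", tools=self.tools)
                results.append(sub.run(arg))
        return "; ".join(results)

agent = Agent("root", tools={"read": lambda t: f"read:{t}",
                             "write": lambda t: f"wrote:{t}"})
print(agent.run("report"))  # read:report; wrote:REPORT
```

A real framework replaces the hard-coded `plan` with an LLM call and adds scheduling, resource limits, and tool sandboxing around the same loop.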
The enterprise implications were blunt. Huang predicted that "every single SaaS company will become an agentic-as-a-service company." SaaS (software as a service) means cloud software you rent, like Spotify or Google Docs. Agentic-as-a-service is Huang's term for the next step: instead of giving people tools, you rent out AI agents that do the work for them. The entire IT industry of tools, file systems, and consulting would be restructured around agents that can access sensitive information, execute code, and communicate externally.
Huang's vision: enterprise IT shifts from data centers storing files to AI factories producing tokens for agentic-as-a-service providers. Screenshot from YouTube.
That capability creates obvious security risks. As Huang put it: agents in a corporate network can access sensitive information, execute code, and communicate externally. "Just say that out loud," he told the audience. To address this, Nvidia announced NemoClaw, an enterprise-ready reference design built on top of OpenClaw. NemoClaw adds policy guardrails, a privacy router, and security controls so that companies can deploy AI agents without giving them unchecked access to corporate networks. Nvidia also announced a Nemotron coalition of companies partnering to build the next generation of open AI models, including coding tool Cursor, developer platform LangChain, French AI company Mistral, and search engine Perplexity.
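The guardrail idea is easiest to see as an interception layer between the agent and its tools: every call is checked against a policy before it executes. The sketch below is invented for illustration and is not NemoClaw code:

```python
# Hypothetical illustration of a policy guardrail: wrap each agent tool so
# calls outside an allowlist are blocked. Not NemoClaw's actual API.
class PolicyError(Exception):
    pass

def guarded(tool_name, policy, fn):
    """Wrap a tool callable so the policy is checked on every invocation."""
    def wrapper(*args, **kwargs):
        if tool_name not in policy["allowed_tools"]:
            raise PolicyError(f"tool '{tool_name}' blocked by policy")
        return fn(*args, **kwargs)
    return wrapper

policy = {"allowed_tools": {"read_file", "search"}}

read_file  = guarded("read_file",  policy, lambda path: f"contents of {path}")
send_email = guarded("send_email", policy, lambda to: f"mailed {to}")

print(read_file("report.txt"))         # allowed: file access is on the list
try:
    send_email("outside@example.com")  # external communication: blocked
except PolicyError as e:
    print(e)
```

A production system would layer on auditing, data-privacy routing, and per-agent credentials, but the core mechanism, deny by default and allow explicitly, is the same.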
Nvidia's six open model families span from biomedical research to self-driving cars. Screenshot from YouTube.
Physical AI and robo-taxis at scale
The final major section covered physical AI, meaning robots and autonomous vehicles. Huang announced four new partners for Nvidia's robo-taxi platform: BYD, Hyundai, Nissan, and Geely. Together, these manufacturers build 18 million cars per year, joining existing partners Mercedes, Toyota, and GM. Nvidia also announced a partnership with Uber to deploy robo-taxi-ready vehicles into Uber's ride-hailing network across multiple cities.
The strategy positions Nvidia as the "Intel Inside" of autonomous vehicles rather than a competitor to car makers. The company provides the computing platform (a chip called Thor for in-car processing), its self-driving AI called Alpamayo, and simulation tools for training robots in virtual environments. The car makers build the cars.
Alpamayo in action: the AI explains its reasoning to the passenger in real time. "There's a double-parked vehicle in my lane. I'm going around." Screenshot from YouTube.
Huang declared that "the ChatGPT moment of self-driving cars has arrived," meaning autonomous driving has reached the point where the technology demonstrably works, and the remaining challenges are deployment and regulation rather than fundamental research.
The keynote's most memorable visual was a Disney Olaf robot walking onto the stage. The robot uses Nvidia's Newton physics simulator (software that models gravity, friction, and movement) and Jetson, a small computer built for robots. Olaf learned to walk inside Omniverse, Nvidia's virtual world where robots can practice before facing reality. Nvidia developed the physics engine together with Disney Research and DeepMind. Huang noted that 110 robots were on display at GTC, spanning manufacturing (ABB, KUKA, Foxconn), medical (Paratas AI), and entertainment applications.
How to interpret these claims
Huang's forecast of $1 trillion in demand is a projection, not confirmed orders. It represents what Nvidia expects customers will spend on Blackwell and Vera Rubin infrastructure through 2027. The actual figure depends on continued growth in AI usage, successful deployment of reasoning and agentic models, and sustained investment from hyperscalers and startups.
The "1 million times" computing demand figure is Huang's own estimate, arrived at by multiplying increased token requirements per task (10,000x) by increased usage (100x). Independent verification of this specific claim is difficult since the underlying data is not public.
The Groq integration is real hardware in production, but the 350x token generation improvement compares a future system (Vera Rubin with Groq LP30) against the current generation over a two-year timeline. Whether customers see that improvement in practice depends on the specific workloads they run.
The robo-taxi partnerships are announcements of intent, not deployments. The timeline from "robo-taxi ready platform" to actual self-driving cars operating on public roads involves regulatory approval, safety validation, and infrastructure buildout that could take years.
Glossary
| Term | Definition |
|---|---|
| GTC (GPU Technology Conference) | Nvidia's annual developer conference where the company launches new technology and showcases partnerships. |
| CUDA | Nvidia's programming platform that lets developers use graphics chips (GPUs) for AI, research, and other heavy computation. |
| GPU (Graphics Processing Unit) | A graphics chip, originally built for gaming, now best known as the engine behind AI training and inference. |
| Inference | When an AI model generates a response or makes a prediction. The "thinking" step, as opposed to training. |
| Token | The smallest unit an AI model works with. Roughly 3-4 characters of text. |
| Tokens per watt | How many AI outputs a data center can produce per unit of energy. The key efficiency metric for AI factories. |
| NVLink | Nvidia's custom high-speed connection between GPUs, much faster than standard Ethernet or InfiniBand cables. |
| LPU (Language Processing Unit) | Groq's specialized chip optimized for fast token generation using on-chip memory (SRAM) instead of external memory. |
| Acqui-hire | When a company buys another mainly to hire its engineering team and license its technology. |
| Prefill | The first step of AI inference where the model reads and processes your input text. |
| Decode | The second step where the AI generates output tokens one by one. This is the speed-sensitive part. |
| Exaflops | A billion billion calculations per second. A measure of raw computing power. |
| Co-packaged optics (CPO) | Technology where light-based communication is built directly onto the chip, replacing copper cables for faster data transfer. |
| Agentic AI | AI that can take actions, use tools, and complete tasks autonomously, not just answer questions. |
| Sovereign AI | AI infrastructure and models controlled by a specific country, not dependent on foreign providers. |
| Agentic-as-a-service | Jensen Huang's term for the shift where every SaaS company rents out AI agents that do the work, instead of just providing tools for humans to use. |
Sources and resources
Want to go deeper? Watch the full video on YouTube.