
NVIDIA DRIVE AV: Why Two AI Stacks Beat One

March 11, 2026 · 6 min read · 1,186 words
autonomous vehicles · NVIDIA DRIVE AV · self-driving car technology · AI safety · Alpamayo
Jensen Huang and Xinzhou Wu driving through San Francisco in an NVIDIA DRIVE AV vehicle
Image: Screenshot from YouTube.

Key insights

  • DRIVE AV runs two parallel systems: Alpamayo, an end-to-end AI model that drives like a human, and a classical safety stack that acts as a guardrail.
  • When Alpamayo makes a mistake, the classical stack catches it, giving 'a loss of capability but not a loss of safety.'
  • The system has iterated through roughly 2,300 model versions at about 7 per day, supported by 2 million simulation tests running daily.
Source: YouTube · Published March 11, 2026
NVIDIA
Hosts: Jensen Huang, Xinzhou Wu

This is an AI-generated summary. The source video includes demos, visuals, and context not covered here. Watch the video → · How our articles are made →

In Brief

NVIDIA DRIVE AV is the company's autonomous driving platform, and its defining feature is an unusual design choice: it runs two completely separate AI systems at the same time. One system, called Alpamayo, is a modern end-to-end AI model that has learned to drive by observing humans. The other is a classical, rule-based safety system built to strict automotive standards. In a drive through San Francisco, Jensen Huang (NVIDIA's CEO) and Xinzhou Wu (VP of Automotive) explained why this dual-stack architecture is the central bet behind their autonomous driving strategy.


What is the dual-stack architecture?

DRIVE AV is built on two parallel systems that run simultaneously rather than replacing one with the other. Most self-driving approaches choose either a classical rule-based system or a modern AI model. NVIDIA runs both.

The first system is Alpamayo, an end-to-end AI model. "End-to-end" means it takes raw sensor data as input (camera images, radar readings, and ultrasonic signals) and outputs a driving decision directly, without breaking the task into separate modules for perception, prediction, and planning. It learned to drive by studying large amounts of real human driving data, which is why it handles situations like lane changes and speed bumps in a way that feels natural rather than mechanical.

The second system is a classical stack: a traditional software system built entirely from human-written rules, following the ASIL (Automotive Safety Integrity Level) safety standard used across the automotive industry. Unlike the AI model, every decision the classical stack makes can be traced, audited, and formally verified.

Analogy:

Think of it like a jazz musician (Alpamayo) performing alongside a strict conductor (the classical stack). The musician improvises fluidly and sounds natural. The conductor ensures no one plays something that breaks the rules. The analogy breaks down in one respect: the two systems here don't take turns; they run in parallel, with the conductor able to override the musician at any moment.


How the two stacks work together

The key insight is that the classical stack acts as a safety guardrail for Alpamayo, not a replacement for it.

What Alpamayo does well

Because it learned from human driving data, Alpamayo handles unscripted situations gracefully. In the San Francisco drive, the car encountered a speed bump that was never explicitly programmed into the system. Alpamayo slowed down appropriately anyway, because it had seen humans slow down for speed bumps and generalized from that. Lane changes in dense city traffic felt smooth and human-like rather than rigid, because Alpamayo learned to read gaps in traffic the way drivers do rather than following a checklist of rules.

The practical advantage of the end-to-end approach is iteration speed. With a classical multi-module system, fixing one module can break another. "You're chasing cats and dogs around the table," as Huang put it. With a single end-to-end model, improvements flow through the whole system via backpropagation (the standard technique for training neural networks by adjusting the model based on errors). Alpamayo has gone through roughly 2,300 model versions, at an average of about 7 per day.

What the classical stack prevents

The core problem with any AI model is that it can behave unexpectedly on inputs it has never seen before. This is called out-of-distribution behavior: the model encounters a situation that falls outside its training data, and its output becomes unpredictable.
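As a toy illustration of this failure mode (the numbers are invented; nothing below is from the video), a model fit on one input range can look fine inside that range and still be badly wrong outside it:

```python
# Toy out-of-distribution example: fit a straight line to braking
# distances that are actually quadratic in speed, then extrapolate.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (stdlib only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# "Training data": speeds 5-20 m/s, distances following d = 0.06 * v^2.
speeds = [5, 10, 15, 20]
distances = [1.5, 6.0, 13.5, 24.0]
a, b = fit_line(speeds, distances)

# Interpolation (in-distribution) stays in the right ballpark...
print(round(a * 12 + b, 1))   # 10.5 (true quadratic value: 8.6)

# ...but extrapolating to 40 m/s (out-of-distribution) predicts
# 52.5 m, underestimating the true 96 m by almost half.
print(round(a * 40 + b, 1))   # 52.5
```

The model never "knows" it has left its training range; its output simply stops being trustworthy, which is exactly what a guardrail must assume.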

The classical stack prevents Alpamayo from acting on those unpredictable outputs. If Alpamayo produces a decision that violates safety rules, the classical stack overrides it before it reaches the wheels. Wu summarized the result: "you have a loss of capability but not a loss of safety". When a new model version causes a regression somewhere, the safety envelope holds even if a specific capability temporarily gets worse.
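The relationship can be sketched as a minimal arbitration loop. Everything below is a hypothetical sketch with invented names and thresholds; NVIDIA has not published DRIVE AV's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """A planned path: target speed (m/s) and steering angle (degrees)."""
    speed: float
    steering: float

# Hypothetical rule-based limits standing in for the classical stack.
SPEED_LIMIT = 13.4     # ~30 mph urban limit
MAX_STEERING = 35.0    # mechanical steering limit

def classical_stack_check(t: Trajectory) -> bool:
    """Return True if the trajectory satisfies every safety rule."""
    return 0.0 <= t.speed <= SPEED_LIMIT and abs(t.steering) <= MAX_STEERING

def safe_fallback() -> Trajectory:
    """Conservative trajectory the rule-based stack can always justify."""
    return Trajectory(speed=5.0, steering=0.0)

def arbitrate(ai_proposal: Trajectory) -> Trajectory:
    """Let the AI drive unless it breaks a rule; then override.

    A rejected proposal costs capability (the smooth AI behavior),
    never safety (the fallback is verifiable rule-based code).
    """
    if classical_stack_check(ai_proposal):
        return ai_proposal
    return safe_fallback()

# The AI proposes an unsafe speed; the guardrail overrides it.
chosen = arbitrate(Trajectory(speed=20.0, steering=2.0))
print(chosen.speed)   # 5.0 — fallback engaged
```

The design choice to keep the checker entirely rule-based is what preserves auditability: every override can be traced to a named rule.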

This also matters for development speed: because the classical stack is always there as a backstop, engineers can deploy new Alpamayo versions more aggressively without worrying that a regression will cause an unsafe situation.


The data flywheel

An AI system is only as good as the data it trains on. DRIVE AV's development relies on three connected tools to build a large, high-quality simulation environment.

NuRec (Neural Reconstruction) replays real recorded drives at the pixel level, reconstructing the entire scene so the model can be tested in situations that already happened on real roads.

Cosmos is NVIDIA's world model for generating synthetic data. It can take a NuRec reconstruction and modify it: change the weather, alter the background, or create rare scenarios that haven't been recorded yet.

The functional scenario tree is a structured catalog of all the driving situations the system needs to handle. When a new scenario appears during road testing, a new branch is added to the tree, and the data curation pipeline works to fill it with examples.

Together, these tools run 2 million simulation tests every day.
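A scenario catalog like the one described above could be organized as a simple branching structure. This is an illustrative sketch under that assumption; the real tree's schema is not public.

```python
class ScenarioNode:
    """One branch of a functional scenario tree (illustrative only)."""
    def __init__(self, name: str):
        self.name = name
        self.children: dict[str, "ScenarioNode"] = {}
        self.examples: list[str] = []   # IDs of curated data samples

    def add_branch(self, path: list[str]) -> "ScenarioNode":
        """Insert a newly observed scenario, creating branches as needed."""
        node = self
        for part in path:
            node = node.children.setdefault(part, ScenarioNode(part))
        return node

    def coverage_gaps(self) -> list[str]:
        """Leaf scenarios that have no curated examples yet."""
        gaps = []
        if not self.children and not self.examples:
            gaps.append(self.name)
        for child in self.children.values():
            gaps.extend(child.coverage_gaps())
        return gaps

root = ScenarioNode("driving")
root.add_branch(["urban", "speed_bump"]).examples.append("drive_0042")
root.add_branch(["urban", "double_parked_truck"])   # new branch, no data yet
print(root.coverage_gaps())   # ['double_parked_truck']
```

The `coverage_gaps` pass is the sketch's stand-in for the curation pipeline's job: find empty branches and fill them with real or synthetic examples.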


Why it matters

The path to robotaxis

Self-driving systems are classified by automation level. L2++ is not an official SAE designation, but an industry term used by NVIDIA and others:

| Level | What it means | Driver needed? |
| --- | --- | --- |
| L2++ | Advanced driver assist (unofficial term) | Yes, full responsibility |
| L3 | Car handles driving in certain conditions | Yes, must be ready to take over |
| L4 | Fully autonomous in defined areas | No |

DRIVE AV currently operates at L2++. The same safety architecture is designed to scale from L2++ through L3 to L4.

For robotaxi operations, teleoperation acts as a final backup: if the car gets stuck, a remote operator injects a few waypoints and the car navigates itself out, without any safety compromise.
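The waypoint-injection idea can be sketched as follows; the interface and the interpolation step are invented for illustration, not taken from the video.

```python
# Hypothetical sketch of teleoperation as waypoint injection. The
# operator only supplies a few (x, y) points; the vehicle's own
# planner drives between them, so the safety stack stays in the loop.
from typing import List, Tuple

Point = Tuple[float, float]

def densify(waypoints: List[Point], step: float = 1.0) -> List[Point]:
    """Interpolate sparse operator waypoints into a dense path
    the onboard planner can track, one point per `step` meters."""
    path: List[Point] = [waypoints[0]]
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        n = max(1, int(dist / step))
        for i in range(1, n + 1):
            t = i / n
            path.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return path

# Operator drops three waypoints to route around a blockage.
route = densify([(0.0, 0.0), (0.0, 4.0), (3.0, 4.0)])
print(len(route))   # 8
```

Because the remote operator never steers directly, the same onboard guardrail logic applies to teleoperated maneuvers as to autonomous ones.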

Industry validation

Mercedes-Benz has already deployed DRIVE AV in its vehicles. On its first attempt, the platform received the highest rating in Euro NCAP 2025, the most demanding safety benchmark for vehicles sold in Europe.

What's next

The team described the next generation as adding reasoning and memory to Alpamayo. A model with memory can learn from past mistakes in a persistent way; reasoning lets it handle rare situations by breaking unfamiliar scenarios into familiar pieces rather than treating them as entirely new. The current model has about 1 billion parameters, small by comparison with large language models, which suggests that safe driving depends on far less raw information than it might appear to.


Common misconceptions

"End-to-end AI makes safety systems obsolete"

The DRIVE AV approach is a direct counter to this idea. An end-to-end model is harder to verify formally: you can't audit every decision path the way you can with rule-based code. The classical stack exists precisely because end-to-end models, however capable, can fail in ways nobody can predict in advance. Replacing the safety stack with more AI would remove the ability to make formal safety guarantees.

"More sensor data means better decisions"

The video makes the opposite point: the key details needed to predict a safe driving path are likely very small compared to everything the sensors capture. A billion-parameter model is effective not because it processes everything, but because it learns which parts of the input actually matter.


Glossary

| Term | Definition |
| --- | --- |
| End-to-end model | An AI model that takes raw sensor inputs and outputs driving decisions directly, without separate modules for each step. |
| Classical stack | A rule-based software system built with traditional programming logic and formal safety verification. |
| ASIL (Automotive Safety Integrity Level) | The ISO 26262 standard for functional safety in automotive systems, ranging from A (lowest) to D (highest). |
| Out-of-distribution | A situation the AI hasn't seen during training, where its behavior becomes unpredictable. |
| Backpropagation | The standard algorithm for training neural networks: errors are fed back through the model to adjust its parameters. |
| NuRec (Neural Reconstruction) | NVIDIA's tool for pixel-level reconstruction of real recorded drives, used to replay scenarios in simulation. |
| Cosmos | NVIDIA's world model for generating synthetic driving scenarios, including rare events and weather variations. |
| Functional scenario tree | A structured catalog of all driving situations that need to be covered in testing, organized as a branching tree. |
| L2++ / L3 / L4 | Automation levels. L2++ is an unofficial industry term for advanced driver assist (human must supervise). L3 and L4 are official SAE levels: L3 = car handles driving, human takes over on request; L4 = fully autonomous in defined areas. |
| Teleoperation | Remote human control of a vehicle, used as a fallback when the autonomous system gets stuck. |
| Euro NCAP | European New Car Assessment Programme, the main independent safety rating body for vehicles in Europe. |
