
Karpathy: From Vibe Coding to Agentic Engineering

May 7, 2026 · 7 min read · 1,393 words
Andrej Karpathy · Vibe Coding · AI Agents · Machine Learning · Claude Code
Andrej Karpathy interviewed by Stephanie Zhan at Sequoia Capital AI Ascent 2026
Image: Screenshot from YouTube.
Source: YouTube
Published: April 29, 2026
Host: Stephanie Zhan, Sequoia Capital
Guest: Andrej Karpathy, Eureka Labs

This is an AI-generated summary. The source video may include demos, visuals and additional context.

Watch the video

In Brief

A year ago, Andrej Karpathy, former head of AI at Tesla and co-founder of OpenAI, coined the term "vibe coding." At Sequoia Capital's AI Ascent 2026 conference, he sat down with partner Stephanie Zhan to explain why that is no longer enough. Vibe coding raises the floor for everyone, but professional software still has a quality bar to clear. That is where the new discipline comes in: agentic engineering, the craft of coordinating unreliable AI agents without losing the responsibility that made old-school engineering serious.


The turning point in December

Karpathy opens by admitting something surprising from one of AI's most experienced voices: he has never felt more behind as a programmer. It sounds dramatic, but it isn't said with regret. It's an acknowledgement of a real shift.

In December 2025, he was on a break and had some extra time. Using agentic tools like Claude Code, he noticed the latest models had started returning whole chunks of code that simply worked, and he stopped correcting them. After a few weeks he trusted the system so completely that he was just vibe coding all the time. His side-projects folder is now overflowing, he says, laughing.

The point isn't that Karpathy is slow. The point is that he's a canary in the coal mine. When one of the few people in the world who has built modern AI from the inside swaps out his entire workflow in four or five weeks, that says something about the pace for everyone else.


Software 3.0: when the prompt is the program

So what actually changed? Karpathy uses his own framing of Software 3.0:

  • Software 1.0: You write code. Explicit logic, line by line.
  • Software 2.0: You "program" by collecting datasets and training neural networks. Weights replace code.
  • Software 3.0: You write text. The prompt is the program, and a large language model (LLM) is the interpreter.

"What's in your context window is your lever over the interpreter that is the LLM", he says. Context window is everything the AI can see when it answers you. Like a workbench: anything you place on the table, it can use. Anything you don't put there doesn't exist for the model.

The clearest example is the installer for OpenClaw, the AI coding tool by Peter Steinberger. Traditionally, an installer is a bash script: a long sequence of commands that has to cover every platform, every Linux version, every edge case. Such scripts inevitably grow brittle and unreadable.

The OpenClaw installer is just a block of text you paste into your AI agent. The agent reads the text, looks at your machine, debugs along the way, and gets things working. You're no longer giving commands. You're giving intent.
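
The actual OpenClaw installer text isn't reproduced here, but the shape of the idea might look like the sketch below. The "program" is a paragraph of intent, and `agent_run` is a hypothetical callable standing in for pasting the text into Claude Code or a similar agent.

```python
# Hypothetical sketch: the "installer" is just intent, handed to an agent.
INSTALL_INTENT = """
Install OpenClaw on this machine.
- Detect my OS and package manager yourself.
- Install any missing dependencies and explain what you did.
- Verify the install by running the tool's version command.
- If something fails, debug it and retry before asking me.
"""

def install(agent_run) -> None:
    # No platform branches, no edge-case handling: the agent reads the
    # intent, inspects the machine, and works out the steps itself.
    agent_run(INSTALL_INTENT)
```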


When the app shouldn't exist

Karpathy built an experiment he calls MenuGen: snap a photo of a restaurant menu, get pictures of the dishes back. He vibe-coded a whole web app, ran it on Vercel, used OCR (text recognition from images) to read the menu, and used an image generator to create the dishes.

Then he saw the Software 3.0 version, which blew his mind: take your photo, give it to Gemini, and ask it to use Nano Banana to overlay the dishes directly onto the menu. You get back the exact same menu picture, but now with little illustrations rendered over each dish.
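
A minimal sketch of that workflow, assuming the google-genai Python SDK; the exact model name and the prompt are illustrative assumptions, not from the talk.

```python
from google import genai   # pip install google-genai
from PIL import Image      # pip install pillow

client = genai.Client()    # reads the Gemini API key from the environment

menu = Image.open("menu.jpg")
response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # "Nano Banana"; model name is an assumption
    contents=[menu, "Overlay a small illustration of each dish onto this menu."],
)

# Save the returned image: the answer is the same menu, now illustrated.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("menu_illustrated.png", "wb") as f:
            f.write(part.inline_data.data)
```

No OCR stage, no separate image-generation pipeline: one call carries both the intent and the image.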

"All of MenuGen is spurious", he concludes. It's working in the old paradigm. The new paradigm is raw: the prompt is the image, the answer is the image, and the neural network does all the work in between.

Apps will get faster to build, sure. But the deeper point is that some apps simply shouldn't exist. The difference between "a faster version of what we have" and "new things that couldn't exist before" is fundamental.


Why the AI is jagged

You've probably seen it: the model crushes a hard task and fumbles a trivial one. Karpathy offers an example from that very week: "How is it possible that state-of-the-art Claude Opus 4.7 can refactor a 100,000-line codebase or find zero-day vulnerabilities, and at the same time tell me to walk to the car wash 50 meters away?"

He calls this jagged intelligence. The reason lies in how the models are trained with reinforcement learning (a method where AI learns by trying things and getting rewards for correct answers). Reinforcement learning requires that you can verify the answer. Math, code, and logic are easy to verify. Empathy, everyday judgment, and humor are not.

The result is that models peak in the domains that are easy to verify and stagnate in everything else. Not because they're dumber on the rest, but because the training signal never reaches there.
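
A toy illustration, not from the talk, of why verifiability decides where the training signal flows: a coding reward can be computed mechanically, while an "empathy reward" has nothing to check against.

```python
import subprocess
import sys
import tempfile
import textwrap

def code_reward(candidate_solution: str) -> float:
    """Verifiable domain: run the model's code against known tests."""
    program = candidate_solution + textwrap.dedent("""
        assert add(2, 3) == 5
        assert add(-1, 1) == 0
    """)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
    # The reward is mechanical: the tests pass or they don't.
    result = subprocess.run([sys.executable, f.name], capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0

def empathy_reward(candidate_reply: str) -> float:
    # Unverifiable domain: there is no test suite for a kind answer,
    # so RL's training signal never reaches here with the same force.
    raise NotImplementedError("no mechanical check exists")
```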

Karpathy goes further: the labs choose what goes into the training data. When GPT-3.5 became GPT-4, chess ability improved dramatically, not because the model got smarter overall, but because someone at OpenAI decided to add a large amount of chess games. You're always working with a model that has an invisible priority list from the labs. If your application lives inside the priority list, it flies. If not, you have to fine-tune yourself or build your own training environment.


Vibe coding raises the floor. Agentic engineering keeps the ceiling.

This is the line that gave the talk its title. Stephanie Zhan asks: what's the difference, and where are we now?

Vibe coding is about raising the floor, Karpathy says. Anyone can vibe-code anything now. That's amazing. It's a real revolution for who gets to make software at all.

But professional software still has a quality bar. You're not allowed to introduce vulnerabilities just because you vibe-coded the solution. You're still responsible for ensuring the result works, is safe, and does what it should. The question becomes: how do you go faster without losing that bar?

That's agentic engineering: a distinct engineering discipline in which you coordinate agents like a flock of skilled but unreliable interns. Karpathy calls them spiky entities: a bit fallible, a bit stochastic (unpredictable), but extremely powerful when used correctly.

He thinks the ceiling is much higher than people realize. People used to talk about the 10x engineer, the one who's ten times more productive than average. The best agentic engineers go far beyond 10x today.

For hiring, that means puzzle questions don't cut it anymore. You have to give a candidate a real project, say "build a secure Twitter clone for agents," let them use agentic tooling, and then point ten Codex agents at the result and try to break it. That's the new touchstone.
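
What "point ten agents at it" might look like in practice is sketched below; `run_agent` is a hypothetical async wrapper around whatever agent CLI or API you use, not a real Codex interface.

```python
import asyncio

ATTACK_BRIEF = (
    "You are reviewing a Twitter clone built for agents. "
    "Try to find one security flaw: injection, auth bypass, leaked "
    "secrets, or unsafe agent-to-agent messages. "
    "Report the flaw with a reproduction, or answer 'no finding'."
)

async def red_team(repo_path: str, run_agent, n_agents: int = 10) -> list[str]:
    # Fan out N independent, stochastic agents: each run probes the
    # codebase differently, so the flock finds more than any single pass.
    tasks = [run_agent(ATTACK_BRIEF, repo_path) for _ in range(n_agents)]
    findings = await asyncio.gather(*tasks)
    # Treat agents as fallible: keep only substantive reports.
    return [f for f in findings if f.strip().lower() != "no finding"]
```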


Ghosts, not animals

Karpathy has written an essay arguing that LLMs are not animals but ghosts. It sounds mystical, but the point is practical.

An animal has innate motivation, curiosity, play. It grows in a body that carries millions of years of evolution's training. An LLM has none of that. It's a statistical simulation circuit, built on pre-training (learning statistics from the entire internet) with reinforcement learning bolted on top.

If you yell at an LLM, it doesn't perform better or worse. It doesn't care. It can't care.

That's not philosophical finger-wagging. It's a warning against using the wrong mental model. If you treat an agent like a colleague you can motivate, you'll be disappointed. If you treat it like a statistical system you have to steer with precise instructions and smart sequencing, you make progress.


What you can't outsource

To close, Zhan asks about education: what is still worth learning deeply when intelligence gets cheap?

Karpathy points to a tweet he says he thinks about every other day: "You can outsource your thinking, but you can't outsource your understanding."

You can let the agents do the code, the planning, and the research. But you still have to know what you're trying to build, why it's worth building, and how to direct the agents toward the goal. You become the bottleneck, not because you're slow, but because you're the only one who actually understands why anything matters.

His own LLM knowledge bases are an attempt to fix this: have an agent build wikis from articles he reads, so he can ask questions and see information from different angles. Tools to enhance understanding, not replace it.
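
A minimal sketch of that loop; the `llm` helper is again a hypothetical stand-in for any chat-completion call, and the prompt is illustrative. Each article becomes a wiki entry the reader can interrogate later.

```python
from pathlib import Path

WIKI_PROMPT = (
    "Turn the following article into a wiki entry: a short summary, "
    "key claims, open questions, and related concepts.\n\n{article}"
)

def add_to_wiki(article_text: str, title: str, llm, wiki_dir: str = "wiki") -> Path:
    # The agent does the distilling; the human does the understanding,
    # by asking the wiki questions from different angles later.
    entry = llm(WIKI_PROMPT.format(article=article_text))
    path = Path(wiki_dir) / f"{title}.md"
    path.parent.mkdir(exist_ok=True)
    path.write_text(entry)
    return path
```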


Glossary

Vibe coding: Building software by describing what you want to an AI and trusting it to do the rest, without reading or understanding the code it writes.
Agentic engineering: The serious discipline emerging on top of vibe coding: coordinating AI agents to go fast while keeping responsibility for safety and quality.
Software 3.0: The third era of programming, where you "program" by writing text (prompts) to an AI instead of writing code (1.0) or training neural networks (2.0).
LLM (large language model): An AI model trained on massive amounts of text from the internet, which answers by predicting the next word based on what it has seen before.
Context window: Everything the AI can see when it answers you. Like a workbench: only what's on the table can be used.
Reinforcement learning (RL): A training method where the model learns by trial and error, getting rewards for correct answers, like training a dog with treats.
Verifiability: The ability to check whether an answer is correct. AI is best in domains where this is easy (math, code), worse where it's hard (creative writing, everyday logic).
Jagged intelligence: When AI can master something complex (refactoring 100,000 lines of code) and fail at something trivial (a car wash 50 meters away).
Fine-tuning: Specializing a finished AI model on your own dataset for better results in your specific context.
