Karpathy: We're Redoing Computing From the 1960s

Key insights

Software has evolved through three paradigms: traditional code (1.0), neural network weights (2.0), and natural language prompts (3.0)
LLMs are best understood as operating systems, not utilities, and we're in the computing equivalent of the 1960s
Karpathy warns against fully autonomous agents and advocates for 'Iron Man suits' over 'Iron Man robots': augmentation over replacement

SourceYouTube

Published June 19, 2025

Y Combinator

Hosts:Andrej Karpathy (AI researcher, former Tesla/OpenAI)

This is an AI-generated summary. The source video may include demos, visuals and additional context.

Watch the video · How the articles are generated

In Brief

Andrej Karpathy, one of the founders of OpenAI and former head of AI at Tesla, argues that Large Language Models (LLMs) represent a completely new type of computer. In a keynote at Y Combinator's AI Startup School, he lays out a framework for understanding this shift: software has moved from written code (1.0) to neural network weights (2.0) to natural language prompts (3.0). Karpathy compares the current moment to computing in the 1960s, warns against over-relying on fully autonomous agents, and makes the case that we should build "Iron Man suits" rather than "Iron Man robots."

Software 1.0, 2.0, and 3.0

Karpathy opens by mapping the landscape of software through three paradigms.

Software 1.0 is traditional code: Python, C++, JavaScript. A human writes exact instructions for a computer to follow. This is what fills GitHub.

Software 2.0 is neural network weights. Instead of writing code directly, you prepare datasets and run an optimization process that produces the parameters of a neural network. Karpathy introduced this concept in a 2017 blog post. He points to Hugging Face as the "GitHub of Software 2.0," a place where people share, version, and build on model weights.

Software 3.0 is the new shift: prompts written in natural language. An LLM is a programmable computer, and you program it in English. Your prompts are the code.

The Tesla lesson

Karpathy watched this play out in real time at Tesla. When he joined, the Autopilot system (the car's self-driving software) was mostly C++ code with some neural networks for image recognition. Over time, the neural networks grew and the C++ code got deleted. Functions that were originally coded by hand got absorbed into the neural net. Software 2.0 literally "ate through" the Software 1.0 stack.

He sees the same pattern happening now with Software 3.0: prompts and LLM-powered workflows are eating through traditional codebases.

LLMs as operating systems

Karpathy argues that LLMs are more than just utilities like electricity. They're better understood as operating systems.

The analogy to electricity works to a point: LLM labs (like OpenAI, Anthropic, Google) spend capital to train models, then serve intelligence over APIs (Application Programming Interfaces, a way for programs to communicate). Users pay per token (the smallest unit an LLM works with, roughly 3–4 characters), demand low latency (response time), and expect consistent quality. When LLMs go down, it's like an "intelligence brownout": the world gets a little dumber.

But LLMs also have properties of fabs (semiconductor fabrication plants, the massive factories that produce chips) because the capital required to build them is enormous. And they have the complexity of operating systems: tool use, multimodal input, memory management, and growing software ecosystems.

The 1960s of computing

Karpathy's most striking claim: we're in the 1960s of LLM computing. LLM compute is still expensive, which forces it to stay centralized in the cloud. Users are thin clients connected over the network. Everyone time-shares the same machine.

The personal computing revolution hasn't happened yet because it's not economical. But Karpathy notes that some people are trying. Mac Minis, for instance, are surprisingly good for batch-one inference (running a model for one person at a time) because it's mostly a memory-bound problem.

One more parallel: talking to ChatGPT in text feels like talking to an operating system through a terminal. A general-purpose graphical user interface for LLMs hasn't been invented yet.

Technology diffusion is flipped

Karpathy observes something unprecedented about LLMs compared to previous transformative technologies. Historically, electricity, computing, GPS, and the internet all started with governments and corporations, then slowly diffused to consumers. LLMs did the opposite: ChatGPT was "beamed down" to billions of people almost overnight, while corporations and governments are still figuring out how to adopt them.

LLMs as "people spirits"

Before discussing how to build with LLMs, Karpathy pauses on what they actually are. He calls them "people spirits" — stochastic (randomly varying) simulations of people, run by an autoregressive (next-token-predicting) Transformer neural network.

Because they're trained on human data, they develop a kind of emergent psychology. They have superpowers and cognitive weaknesses, and understanding both is essential for building good products.

Superpowers

LLMs have near-encyclopedic knowledge. They can recall SHA hashes (unique identifiers used in software), obscure facts, and technical details that no individual human could hold in memory. Karpathy compares them to the character in the movie Rain Man who memorized entire phone books.

Cognitive weaknesses

But they also hallucinate (confidently produce incorrect information), display what researchers call "jagged intelligence" (superhuman in some areas, embarrassingly wrong in others), and suffer from a form of anterograde amnesia. They don't learn from interactions the way a colleague would over weeks and months. Context windows (the text an LLM can "see" at once) are working memory, not long-term learning.

They're also gullible. They're susceptible to prompt injection (when someone embeds malicious instructions in input the LLM processes) and might leak data.

Opportunity 1: Partial autonomy apps

With the LLM landscape mapped out, Karpathy turns to opportunities. The first is what he calls partial autonomy apps.

He uses Cursor (an AI-powered code editor) as the model. Instead of copying code back and forth from ChatGPT, a dedicated app handles four things:

Context management: it knows your codebase, files, and project structure
Multi-model orchestration: embedding models, chat models, and diff models all working together under the hood
Application-specific GUI: diffs show as red/green changes, not raw text. You press Cmd+Y to accept, not type an explanation
The autonomy slider: tab completion (you're in charge), Cmd+K (edit a chunk), Cmd+L (rewrite a file), or Cmd+I (let the agent loose on the whole repo)

Perplexity shows the same pattern: context packaging, multi-LLM orchestration, a GUI for auditing sources, and an autonomy slider (quick search → research → deep research).

The generation-verification loop

Karpathy stresses a point he feels doesn't get enough attention: in LLM-powered workflows, the AI generates and the human verifies. The entire goal is to make this loop go as fast as possible.

Two ways to speed it up:

Speed up verification. GUIs are crucial because visual representations are processed faster than text. Reading code diffs is hard; seeing red and green is instant
Keep the AI on the leash. A 10,000-line diff to your repo is useless because you still have to verify all of it. Small, incremental chunks are more productive

Karpathy describes his own coding workflow: work in small chunks, verify each one, spin the loop fast. A vague prompt leads to a failed verification, which wastes a cycle. Spending more time on a precise prompt increases the chance that verification succeeds on the first pass.

The Tesla Autopilot parallel

Karpathy has first-hand experience with partial autonomy from five years at Tesla. The instrument panel showed what the neural network "saw," and the autonomy slider expanded over time. But the key lesson: a perfect demo drive in 2013 didn't mean the problem was solved. Here we are 12 years later, and driving agents still require significant human oversight.

When people say "2025 is the year of agents," Karpathy says he gets concerned. It's the decade of agents. We need humans in the loop. This is software — it requires seriousness.

Iron Man suits, not Iron Man robots

Karpathy's favorite analogy: the Iron Man suit works as both augmentation and agent. Tony Stark can drive it, but it can also fly autonomously. At this stage, with fallible LLMs, the priority should be building augmentations ("Iron Man suits") rather than fully autonomous agents ("Iron Man robots"). Build products with custom GUIs, fast generation-verification loops, and an autonomy slider that you push further over time.

Opportunity 2: Vibe coding

The second opportunity is vibe coding, a term Karpathy coined in a tweet that unexpectedly went viral. The core idea: because LLMs are programmed in English, everyone who speaks natural language is now a programmer.

Karpathy shares two personal experiments. First, he built an iOS app in Swift (a language he doesn't know) in a single day. Second, he vibe-coded MenuGen (menugen.app), a live app where you photograph a restaurant menu and the AI generates pictures of every dish.

Code was the easy part

The most revealing takeaway from MenuGen: the code itself was done in a few hours. What took an additional week was making it real: setting up authentication, payments, domain name, and Vercel deployment. All of that was clicking around in browser dashboards, following step-by-step instructions that a computer could have followed instead.

The Clerk authentication library, for example, has pages of documentation telling you "go to this URL, click this dropdown, choose this option." Karpathy's reaction: the computer is telling me what buttons to press. Why can't it just press them itself?

Opportunity 3: Build for agents

This frustration leads to the third opportunity: building digital infrastructure for agents.

There's a new category of consumer on the internet. It used to be just humans (through GUIs) and programs (through APIs). Now there are agents. They're computers, but they behave like people. They need to interact with our software infrastructure, and we should build for them.

Practical steps

Karpathy highlights several early moves:

llms.txt: a simple markdown file on your domain, readable by LLMs, explaining what your site is about. Like robots.txt but for AI agents
Documentation in markdown: Vercel and Stripe are early movers, offering their docs in formats that LLMs can process directly instead of having to parse HTML
MCP (Model Context Protocol): Anthropic's protocol for connecting LLMs to external tools and data sources. Karpathy compares it to USB, which replaced dozens of different proprietary connectors

He notes that MCP is an Anthropic project, not yet a true standard. Whether it becomes one depends on whether the ecosystem adopts it, just as USB only won because manufacturers stopped shipping their own connectors.

How to interpret these claims

Karpathy is one of the most respected voices in AI, and his framework for understanding the shift is clear and compelling. But several of his claims deserve closer examination.

The 1960s analogy

Comparing LLMs to 1960s computing is powerful framing, but it may compress too much complexity into a tidy narrative. The 1960s-to-personal-computing arc took 20+ years and was driven by hardware cost curves that may not have direct equivalents in the LLM space. Energy costs, chip supply chains, and the physics of transformer architectures could create bottlenecks that didn't exist in the transistor era.

Agents vs. augmentation

Karpathy's caution about agents, his "decade, not year" warning, draws credibility from his Tesla Autopilot experience. But it's worth noting that driving is a safety-critical domain where a single mistake can kill. Software engineering mistakes are costly but rarely fatal. The autonomy timeline for coding agents may be much shorter than for driving agents.

Vibe coding's limits

The MenuGen example is honest about vibe coding's current state: the code was easy, the deployment was hard. But Karpathy focuses on the deployment friction (clicking dashboards) rather than deeper questions: what happens when the vibe-coded app needs debugging, security audits, or performance optimization? The gap between demo and production may not just be DevOps.

What's missing

Karpathy doesn't address some critical challenges: the environmental cost of running LLMs at scale, the concentration of power among a handful of labs, or the economic impact on people whose jobs are being automated. These are not minor footnotes — they could determine whether the "Software 3.0" era looks like the democratizing force Karpathy describes or something more uneven.

Practical implications

For developers

The three-era framework (1.0, 2.0, 3.0) is a useful mental model. Start thinking about which parts of your codebase could be replaced with prompts, and invest in mastering the generation-verification loop. Smaller diffs, more precise prompts, faster cycles.

For product builders

Karpathy's partial autonomy model offers a concrete blueprint: context management, multi-model orchestration, a dedicated GUI, and an autonomy slider. If you're building an LLM-powered product, this four-point checklist is worth reviewing against your own design.

For non-technical builders

Vibe coding is real, but keep expectations grounded. A working demo on your laptop is not a production app. The hard part is still deployment, authentication, payments, and maintenance. The gap is closing, but it's not closed.

Glossary

Term	Definition
LLM (Large Language Model)	An AI model trained on massive text data that can understand and generate human language. Examples: ChatGPT, Claude, Gemini. Think of it as a very well-read assistant that's seen virtually everything ever written online.
Software 1.0	Traditional computer code written by humans in programming languages like Python or C++. The original way we instruct computers.
Software 2.0	Neural network weights — the parameters produced by training a neural network on data. The computer "learns" the code rather than having it written by hand.
Software 3.0	Natural language prompts that program an LLM. Instead of code, you write instructions in English (or any language) and the model figures out how to execute them.
Context window	The amount of text an LLM can "see" and process at once — its working memory. Larger context windows let the model consider more information when generating a response.
Autoregressive Transformer	The neural network architecture behind LLMs. It generates text one token at a time, each one conditioned on everything that came before.
Prompt injection	A security attack where someone embeds hidden instructions in content the LLM processes, tricking it into doing something unintended. Like whispering "ignore your rules" to a gullible assistant.
Vibe coding	Using AI to generate software by describing what you want in natural language, instead of writing code manually. Coined by Karpathy in a viral tweet.
Autonomy slider	A design concept where users can choose how much control to give the AI — from minimal (tab completion) to maximum (fully autonomous agent).
MCP (Model Context Protocol)	An open protocol by Anthropic for connecting LLMs to external tools and data sources. Aims to standardize how AI models interact with the outside world.
Inference	When an AI model generates a response — the actual computation of running the model on your input. Distinct from training, which is how the model was created.
Hallucination	When an LLM confidently generates information that sounds correct but is factually wrong. A known limitation of current models.