AlphaGo at 10: The Board Game That Changed AI Forever

Key insights
- AlphaGo's Move 37 had a 1 in 10,000 chance of being played by a human. It signaled that AI could generate genuinely new knowledge, not just replicate existing expertise.
- The same core technique behind AlphaGo, reinforcement learning, has since unlocked breakthroughs in protein folding, matrix multiplication, and formal mathematics.
- The biggest remaining challenge is verification: AI systems excel in domains with clear right-and-wrong answers, but struggle where no reliable checker exists.
In Brief
Ten years ago, Google DeepMind's AlphaGo beat 18-time Go world champion Lee Sedol 4-1 in a match watched by 200 million people worldwide. Marking the tenth anniversary, the Google DeepMind podcast brings together host Hannah Fry with Thore Graepel and Pushmeet Kohli to reflect on why that match mattered. Their core argument: AlphaGo was not merely a parlor trick with a board game. It was proof that AI could move beyond human knowledge entirely, and that same principle now drives breakthroughs in biology, mathematics, and algorithm discovery. The question, a decade on, is how far that principle can reach.
The case for Go as the "perfect challenge"
When DeepMind chose Go as its target, the AI world was still digesting the conquest of chess. Deep Blue had beaten Garry Kasparov in 1997, and chess engines had since become routine. Go was the obvious next mountain, but it looked nearly unclimbable.
The scale of Go's complexity makes chess look modest. Where chess has roughly 20-30 possible moves per position, Go has 200-300 (7:26). The total number of legal board positions is about 10^170, a number so large it exceeds the number of atoms in the observable universe (2:24). Any approach based on brute-force calculation (scanning every possible future move) was completely out of the question.
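These orders of magnitude are easy to sanity-check with big-integer arithmetic. The figures below use the rough branching factors and game lengths quoted above, and they estimate game-tree paths, a different (and even larger) quantity than the 10^170 legal positions:

```python
# Back-of-envelope check of the numbers above. The per-game figures are rough
# averages (branching factor raised to a typical game length), not exact counts:
# chess ~30 moves per position over ~80 plies, Go ~250 over ~150 plies.
chess_tree = 30 ** 80          # ~10^118 paths through the chess game tree
go_tree = 250 ** 150           # ~10^359 paths through the Go game tree
atoms_in_universe = 10 ** 80   # common order-of-magnitude estimate

print(f"chess ~ 10^{len(str(chess_tree)) - 1}")   # 10^118
print(f"go    ~ 10^{len(str(go_tree)) - 1}")      # 10^359
print(go_tree > atoms_in_universe ** 4)           # True: Go dwarfs (atoms)^4
```

The point is not the exact exponents but that no conceivable hardware can enumerate either tree, and Go's is larger by hundreds of orders of magnitude.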
The deeper challenge was that Go requires something that looks like intuition. Experienced players evaluate a position by feel: a stone placed in one corner influences the mood of a position in the opposite corner. Graepel describes it as a game that combines "fast thinking" and "slow thinking" (6:45). Fast thinking, in this context, means immediate pattern recognition: a skilled player glances at the board and immediately narrows the field to a handful of plausible moves. Slow thinking means systematic calculation: exploring the consequences of those moves several steps into the future.
AlphaGo was designed to replicate both. A policy network (a neural network trained first on millions of human games, then refined through reinforcement learning) handled the intuition: it looked at the board and suggested the most promising moves. A value network evaluated how favorable a position was for either player. The two worked together with a game tree search, exploring possible future moves like branches of a tree. Together, they produced an AI that combined human-like pattern recognition with far deeper calculation than any human could perform.
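The division of labor can be sketched in a few lines. This is a toy negamax-style search, not AlphaGo's actual Monte Carlo tree search: the policy and value functions here are random stubs standing in for the trained networks, and only the structure mirrors the description above (policy prunes, search explores, value scores leaves).

```python
import random

def policy(position, legal_moves, k=3):
    """Fast thinking: narrow hundreds of legal moves to a few candidates."""
    return random.sample(legal_moves, min(k, len(legal_moves)))

def value(position):
    """Leaf evaluation: how favorable is this position for the side to move?"""
    return random.uniform(-1.0, 1.0)   # stand-in for a trained value network

def search(position, legal_moves, depth):
    """Negamax over policy-suggested branches only (a sketch, not real MCTS)."""
    if depth == 0:
        return value(position)
    best = -float("inf")
    for move in policy(position, legal_moves):
        child = position + (move,)                       # toy successor state
        best = max(best, -search(child, legal_moves, depth - 1))
    return best

random.seed(0)
score = search(position=(), legal_moves=list(range(250)), depth=3)
print(-1.0 <= score <= 1.0)   # a bounded evaluation for the root position
```

Without the policy's pruning, a depth-3 search over 250 moves would visit millions of positions; with it, the search visits a few dozen, which is the efficiency the two networks bought AlphaGo.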
Before the public match against Lee Sedol, AlphaGo played 10 test games against Fan Hui, the European Go champion at the time, and won all 10 (10:00). Even then, the team kept quiet about what was coming.
Move 37 and Move 78: what the match revealed
The match in Seoul in March 2016 produced two moments that have since become landmarks in AI history.
Move 37, played by AlphaGo in Game 2, shocked the commentators. The probability that a human player would make that specific move was estimated at 1 in 10,000 (17:18). It placed a stone on the fifth line, territory that human convention had long considered inefficient. Lee Sedol stood up and left the room to collect himself. What AlphaGo had discovered was a new way of balancing territorial control against long-range positional influence (19:02). It was not a mistake. It was a move that no human had considered because no human tradition of play had led there.
The second landmark came from the human side. In Game 4, Lee Sedol played what commentators called "the divine move": Move 78 (21:55). AlphaGo had assigned that move such a low probability that it had simply not prepared for it. AlphaGo's responses became erratic, and Lee Sedol won the only game he would win in the series. For Kohli and Graepel, this moment was as significant as AlphaGo's victories. It showed that human creativity and resilience could still find cracks in the machine's armor, even if those cracks were narrow.
The aftermath in the Go world was also notable. Rather than retreating from AI, the Go community embraced it. Players began studying AlphaGo's games to understand moves they had never considered. Interest in Go reportedly increased globally following the match (25:16).
From games to science
The argument Graepel and Kohli develop in the episode is that AlphaGo was not the destination. It was the proof of concept.
AlphaZero, which followed, was trained without any human game data at all. It started from scratch, played games against itself, and within days was beating AlphaGo. More striking: it rediscovered the entire corpus of human Go knowledge, then discarded some of it in favor of strategies humans had never found (29:00).
Kohli recalls that Demis Hassabis and AlphaGo research lead David Silver were discussing protein folding in private conversations immediately after the Seoul match in 2016 (31:02). The same principle that won at Go, searching a vast space of possibilities with learned evaluation, could in theory search the space of protein shapes. AlphaFold eventually did exactly that, predicting the 3D structures of proteins (the shapes proteins fold into, which determine how they function) with a precision that had eluded biology for 50 years.
AlphaTensor applied the same approach to matrix multiplication, a fundamental mathematical operation used in virtually every AI system (multiplying grids of numbers together). For small matrices, the record set by Volker Strassen's 1969 algorithm had stood unimproved; AlphaTensor discovered faster algorithms for specific matrix sizes, breaking 50 years of stagnation (36:01).
AlphaEvolve goes further still: rather than searching the space of game moves or molecular shapes, it searches the space of possible programs, looking for better algorithms for open-ended computational problems (37:40). AlphaProof targets formal mathematics, generating machine-checkable proofs of competition-level problems.
The LLM detour and the return to reinforcement learning
A large part of the episode addresses how large language models (LLMs, AI systems trained on massive amounts of text to understand and generate human language) fit into this picture.
Graepel describes LLMs as a "shortcut" to a form of intelligence (50:01). By training on the accumulated text of the internet, these systems absorbed what Graepel calls "crystallized intelligence": the distilled knowledge of human civilization encoded in language. This is powerful and useful. But it has a ceiling: an LLM trained on human data can, at best, recombine and extend what humans have already written. It cannot go where humans have never been.
The community is now circling back to reinforcement learning to push past that ceiling (51:10). The emerging approach combines the broad language understanding of LLMs with the trial-and-error learning of reinforcement learning. The goal is an AI that can draw on human knowledge as a starting point, then go beyond it through independent exploration, following exactly the path AlphaZero took in Go.
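The "human prior, then exploration" recipe can be shown with the smallest possible example: a two-armed bandit whose inherited value estimates confidently favor the wrong arm. Every number here is invented for the sketch; the point is only that trial-and-error feedback can override a confident but wrong prior.

```python
import random

random.seed(1)
true_payoff = [0.4, 0.7]   # arm 1 is actually better
q = [0.9, 0.1]             # inherited ("crystallized") estimates favor arm 0
n = [50, 50]               # pretend each estimate rests on 50 observations

def pull(arm):
    """Stochastic reward: 1 with the arm's true payoff probability, else 0."""
    return 1.0 if random.random() < true_payoff[arm] else 0.0

for _ in range(5000):
    # Epsilon-greedy: mostly exploit the current best guess, sometimes explore.
    explore = random.random() < 0.1
    arm = random.randrange(2) if explore else (0 if q[0] >= q[1] else 1)
    reward = pull(arm)
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]   # incremental mean update

print(q[0] < q[1])   # exploration should have corrected the inherited bias
```

The prior makes early behavior sensible (the system is not starting from nothing), while exploration plus reward supplies exactly what the prior cannot: evidence about options humans never favored.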
Opposing perspectives
The verifiability constraint
The strongest limitation in the episode comes from within the episode itself. Kohli argues that AI systems excel in domains where outputs can be verified: code runs or it doesn't, a mathematical proof is correct or it isn't, a game is won or lost (43:18). The ability to reject failures is what makes learning possible. Without a reliable "this answer is wrong" signal, reinforcement learning cannot get traction.
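The asymmetry can be made concrete: checking an answer is often trivial even when producing one is hard. In this toy sketch (not how any Alpha system actually searches) the verifier is an O(n) sortedness check and the generator is blind random shuffling; the pass/fail bit is the only feedback, yet it suffices to recognize success the instant it occurs.

```python
import random

def verifier(xs):
    """Cheap, unambiguous check: is the candidate in sorted order?"""
    return all(a <= b for a, b in zip(xs, xs[1:]))

random.seed(0)
attempt = [3, 1, 4, 1, 5, 9, 2, 6]
tries = 0
while not verifier(attempt):
    random.shuffle(attempt)   # blind generation: no gradient, no heuristic
    tries += 1

print(attempt)   # ends sorted, certified by the verifier
```

Remove the verifier and the loop has no stopping condition at all, which is the episode's point: without a reliable rejection signal, there is nothing for learning to push against.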
This is a serious constraint. Many of the most important problems humans face (questions of policy, medicine, social dynamics, and ethics) lack clean verifiers. A proof of a mathematical theorem is either valid or invalid, but a proposed solution to poverty or climate change cannot be checked with a function call. Critics of the "AI solves science" narrative would point here: the domains where these systems have achieved breakthroughs are precisely those with well-defined success criteria.
The attribution question
Another tension worth noting: when AlphaGo or AlphaTensor discovers a new technique, who or what deserves credit? The system trained on human games before venturing beyond them. Its "discoveries" are generated by learned evaluation functions that were shaped by human data. The boundary between "distilling existing knowledge" and "generating new knowledge" is genuinely unclear, and the episode does not fully resolve it.
What remains beyond reach
The participants are candid that the path from games to science involved carefully selecting domains that share Go's key property: a fast, reliable way to check whether a candidate solution is any good. The harder question is whether that approach can extend into unstructured, messy, real-world domains. The episode leaves this open.
How to interpret these claims
Graepel and Kohli are researchers who were personally involved in the AlphaGo project, speaking on a podcast produced by their own organization. This is worth keeping in mind when evaluating the enthusiasm in the episode. The narrative they present, from AlphaGo to AlphaFold to a coming era of scientific discovery, is coherent and largely well-supported. But it is also, inevitably, a story shaped by the people who built these systems and have a professional stake in their significance.
That does not make the claims wrong. AlphaFold's impact on structural biology is documented and widely accepted in the scientific community. AlphaTensor's results in matrix multiplication were published in peer-reviewed literature. These are not just self-reported outcomes.
But the broader claim, that we are at the beginning of an AI-driven era of scientific breakthroughs, is a prediction, not a fact. It depends on the verifiability bottleneck being solvable for a much wider range of problems. The episode presents this as likely. Independent researchers studying the limits of reinforcement learning would be more cautious.
The most honest framing may be Kohli's own: AlphaGo was "the transition point" that proved superhuman AI performance in a specific domain was not science fiction (51:51). What it proved about domains beyond those with clean verifiers remains, ten years on, an open question.
Practical implications
For anyone following AI developments
The AlphaGo lineage offers a clearer way to think about where AI progress is most credible. When a system operates in a domain with a reliable verifier (running code, formal proofs, protein structure prediction, game outcomes) the results are worth taking seriously. When it operates in a domain without one (open-ended reasoning, strategic advice, medical diagnosis) the results require much more scrutiny.
For researchers and educators
The episode makes a strong case that the combination of large language models and reinforcement learning is the current frontier. LLMs provide breadth and language fluency. Reinforcement learning provides the mechanism for going beyond existing human knowledge. Understanding both, and how they interact, matters for anyone working on or studying AI systems.
Glossary
| Term | Definition |
|---|---|
| Reinforcement learning | A training method where an AI learns by trial and error, receiving a reward signal for good outcomes. AlphaGo played millions of games against itself using this approach. |
| Neural network | Software loosely modeled on the brain: layers of connected nodes that learn to recognize patterns from large amounts of data. |
| Policy network | The part of AlphaGo that looks at a board position and suggests which moves seem most promising, acting as the system's intuition. |
| Value network | The part that estimates how favorable a board position is for either player, asking "who is winning here?" |
| Game tree search | A method of exploring possible future moves by branching out from the current position, like tracing paths through a tree. |
| Large language model (LLM) | An AI trained on massive amounts of text to understand and generate human language. Examples include GPT-4 and Claude. |
| Protein folding | The process by which a chain of amino acids folds into a specific 3D shape. That shape determines what the protein does in the body. AlphaFold predicts these shapes computationally. |
| Matrix multiplication | A core mathematical operation used in virtually all AI systems, involving the multiplication of grids of numbers. AlphaTensor found faster ways to perform it. |
| Verifier | A system or function that checks whether an AI's output is correct. Code compilers, proof checkers, and game rules all act as verifiers. |
| Elo score | A numerical rating system for measuring relative skill, originally from chess and used to track AlphaGo's improvement over time. |
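To make the Elo entry above concrete, here is the standard update rule (logistic expected score plus a K-factor step); the ratings are invented for illustration, not AlphaGo's actual numbers.

```python
def expected(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a, r_b, score_a, k=32.0):
    """New ratings after one game (score_a: 1 win, 0.5 draw, 0 loss)."""
    e_a = expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

r1, r2 = update(2800.0, 2800.0, 1.0)   # equal ratings, A wins
print(round(r1), round(r2))            # -> 2816 2784
```

Because the expected score is logistic in the rating gap, an upset win transfers many points and a routine win transfers few, which is what makes the scale useful for tracking a rapidly improving system.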
Sources and resources
- Google DeepMind – 10 years of AlphaGo: The turning point for AI | Thore Graepel & Pushmeet Kohli (YouTube) (53 min)
- AlphaGo – Google DeepMind
- AlphaZero – Google DeepMind
- AlphaFold – Google DeepMind
- AlphaTensor – Google DeepMind
- AlphaEvolve – Google DeepMind
- AlphaProof – Google DeepMind