How the World's Top Mathematician Uses AI

Key insights
- AI has driven the cost of idea generation to near zero, shifting the bottleneck in science from hypothesis creation to verification at scale.
- 50 AI-solved Erdős problems look revolutionary until you see the full picture: systematic sweeps show only a 1–2% success rate per problem. AI buys scale, and we only see the winners.
- Tao expects human-AI hybrids to dominate frontier math much longer than most expect, because current AI cannot build cumulatively from partial progress.
This is an AI-generated summary. The source video may include demos, visuals and additional context.
In Brief
Terence Tao, a UCLA professor who won the Fields Medal (the highest prize in mathematics) at just 31, sat down with podcast host Dwarkesh Patel for a wide-ranging conversation about AI and scientific discovery. Tao describes using AI tools daily for secondary work, but says the hardest part of his job, the core creative breakthrough, still happens the old-fashioned way: pen and paper. The conversation covers why AI is changing the economics of science, what a closer look at AI's math performance actually reveals, and why the future of frontier research is human-AI collaboration rather than AI replacement.
Kepler was a "high-temperature LLM"
The conversation opens with a striking analogy. Dwarkesh Patel suggests that Johannes Kepler, the 17th-century astronomer who discovered the laws of planetary motion, was doing something that looks a lot like what large language models (LLMs) do today.
Kepler spent twenty years testing candidate relationships against the massive astronomical dataset compiled by the Danish astronomer Tycho Brahe, and most of those attempts went nowhere. He explored Platonic solids, musical harmonies, and geometric patterns before stumbling onto the actual laws. As Dwarkesh puts it at 4:09: "Kepler was a high-temperature LLM." The "temperature" here refers to how random or creative an AI's outputs are, a setting you can dial up or down in most AI systems.
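The temperature setting has a precise meaning in language models: the model's raw scores (logits) are divided by the temperature before being turned into probabilities, so high temperature flattens the distribution and low temperature sharpens it. The sketch below illustrates this with made-up toy scores, not values from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize into probabilities.
    Higher temperature flattens the distribution (more random picks);
    lower temperature sharpens it (more predictable picks)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores for three candidate tokens
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.2)   # near-deterministic: top token dominates
high = softmax_with_temperature(logits, 5.0)  # close to uniform: all tokens plausible
```

At temperature 0.2 the top-scoring token gets essentially all of the probability mass; at temperature 5.0 the three tokens become nearly interchangeable, which is the "high-temperature" regime of the Kepler analogy.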
Tao agrees the parallel is striking, but points to something crucial: it only worked because Brahe's dataset was there to verify ideas against. Without reliable data to check hypotheses, the hypothesis generation was worthless. This sets the stage for everything that follows.
Idea generation is now almost free
Historically, coming up with the right hypothesis was considered the genius part of science. It was rare, it was celebrated, and it was the bottleneck.
Just as the internet made information abundant and shifted value to curation, cheap hypothesis generation doesn't by itself advance science. The new bottleneck is verification, validation, and figuring out which ideas actually move science forward. Scientists can now generate thousands of theories for any given problem. Journals are already being flooded with AI-generated submissions. Human reviewers are overwhelmed. The traditional peer review system was built for a world where ideas were scarce. It isn't built for a world where ideas are abundant.
Tao's point is that science needs to redesign itself to handle this shift. The prestige hierarchy has flipped: the "eureka moment" is now the cheap part.
The Erdős reality check
One of the most-discussed recent AI achievements in mathematics is the solving of problems from Paul Erdős's collection. Erdős (1913-1996) was a legendary mathematician who published more papers than almost anyone in history and left behind over a thousand unsolved problems, offering cash prizes for whoever could crack them.
In recent months, AI tools solved 50 of these Erdős problems. That number circulated widely as a sign of AI's growing mathematical power. But Tao brings it back to earth.
At 44:40, he says: "on any given problem an AI tool has a success rate of maybe 1% or 2%." There are roughly 600 Erdős problems still unsolved. The 50 solved ones represent the low-hanging fruit: problems that had almost no existing literature, where an obscure technique could be combined with something else to produce a solution. AI found all of those at its current capability level, then essentially plateaued.
What happened next is textbook selection bias. When researchers ran systematic sweeps throwing frontier AI models at every single Erdős problem simultaneously, the results weren't impressive. But the successes got shared on social media, and the failures didn't. So from the outside, it looked like a revolution.
This matters for how we interpret AI progress in science generally. Tao isn't dismissing what happened: solving 50 long-standing problems is genuinely impressive. But he's asking you to read the denominator, not just the numerator.
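The arithmetic of the selection effect is easy to simulate. The sketch below uses the article's rough figures (about 600 open problems, a 1–2% per-problem success rate) plus an assumed number of attempts per problem; the numbers are illustrative, not the actual sweep results.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

N_PROBLEMS = 600          # roughly the number of open Erdős problems (from the article)
P_SUCCESS = 0.02          # Tao's estimated per-problem success rate per attempt
ATTEMPTS_PER_PROBLEM = 4  # hypothetical: several models/runs thrown at each problem

# A problem counts as solved if any attempt succeeds
solved = [
    i for i in range(N_PROBLEMS)
    if any(random.random() < P_SUCCESS for _ in range(ATTEMPTS_PER_PROBLEM))
]

# Only the successes circulate on social media; the ~550 failures are invisible,
# so the headline number looks far more impressive than the underlying rate.
print(f"{len(solved)} of {N_PROBLEMS} problems solved "
      f"({len(solved) / N_PROBLEMS:.1%} overall)")
```

A few dozen solved problems emerge even though roughly 90% of the sweep fails, which is exactly the numerator-without-denominator picture Tao warns about.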
AI excels at breadth, humans at depth
Tao draws a clear distinction between what AI does well and what humans do well in mathematics.
AI excels at breadth: taking every known technique and trying it on a problem, often with fewer careless errors than a human. It can survey the entire literature, try all standard approaches, and identify which ones get 80% of the way there.
Humans excel at depth: building on partial progress, one step at a time. When a proof attempt gets stuck, a good mathematician doesn't start over. They remember what they tried, what almost worked, and they build on that. They pull collaborators in, share what they've found, and iterate.
Current AI systems can't do this. They can jump high, fail, and jump again. But they can't reach a handhold, stay there, and pull other people up from it. Each new session starts from scratch. "They've really sped up lots of secondary tasks. They haven't yet sped up the core thing that I do," Tao says at 48:46.
His current papers would take 5x longer to produce without AI for the auxiliary work: code generation, deeper literature searches, more numerical examples, reformatting. But the core mathematical insight, the part that makes a paper worth publishing in a top journal, still comes from Tao working alone with a pen.
The implication is important: science needs to redesign itself to take advantage of AI's breadth capability. Right now, science is structured around depth, because that's what humans are good at. The new opportunity is to use AI to map out entire fields first, covering all the easy ground systematically, then send human experts to the specific islands of difficulty where depth still matters.
The missing piece: a language for strategies
One of the most thought-provoking parts of the conversation concerns something that doesn't exist yet.
Tools like Lean (a programming language that lets you write mathematical proofs and have them verified automatically by a computer) have been transformative because they give AI a precise formal language for proofs. AI can learn from Lean proofs, generate Lean proofs, and verify them.
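To make this concrete, here is a minimal sketch of what Lean code looks like, using only Lean 4's built-in natural numbers and the core lemma `Nat.add_comm`. These theorem names are chosen for illustration.

```lean
-- A tiny Lean 4 theorem about specific values: `rfl` asks Lean to
-- check that both sides compute to the same number, and the kernel
-- verifies it automatically.
theorem two_plus_three : 2 + 3 = 5 := rfl

-- A general statement, discharged by a lemma from Lean's core library.
theorem my_add_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

Because every step is machine-checked, an AI that emits Lean gets an unambiguous pass/fail signal, which is precisely the kind of verification signal that informal mathematical reasoning lacks.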
But mathematics is more than proofs. It's also strategies, conjectures, intuitions, and judgment calls about what's worth trying. Scientists talk to each other in a semi-formal language that combines data, argument, narrative, and hunches. We don't have a formal system for that, and Tao isn't sure we ever will.
Without such a system, reinforcement learning (a method for training AI by rewarding correct behavior, like training a dog with treats) can't learn how to evaluate whether a half-formed idea is worth pursuing. You can't reward or penalize a hunch. This is why Tao believes the human element will remain essential for a long time.
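Reinforcement learning works when the reward is checkable, as in the sketch below: a toy two-armed bandit where the yes/no reward stands in for a proof checker accepting or rejecting a proof. The tactic names and success rates are invented for illustration; the point is that the learner needs exactly the kind of verifiable signal a hunch doesn't provide.

```python
import random

random.seed(1)

# Hidden per-tactic success rates (illustrative assumptions, unknown to the learner)
TRUE_SUCCESS = {"tactic_a": 0.8, "tactic_b": 0.3}

counts = {arm: 0 for arm in TRUE_SUCCESS}
values = {arm: 0.0 for arm in TRUE_SUCCESS}  # running estimate of each arm's reward

for step in range(2000):
    # Epsilon-greedy: mostly exploit the best current estimate, sometimes explore
    if random.random() < 0.1:
        arm = random.choice(list(TRUE_SUCCESS))
    else:
        arm = max(values, key=values.get)
    # The reward is a verifiable yes/no outcome (did the "proof checker" accept?)
    reward = 1.0 if random.random() < TRUE_SUCCESS[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update

best = max(values, key=values.get)
```

After enough trials the learner reliably identifies the better tactic, but only because each attempt returns an objective success signal. There is no analogous signal for "this half-formed idea feels promising," which is Tao's point.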
Human-AI hybrids will dominate for longer than expected
The conversation ends with Tao's prediction about the future. Dwarkesh asks: when will AI be doing frontier mathematics at least as well as the best human mathematicians?
Tao is more conservative than most AI optimists. At 1:19:57, he says: "I guess I do believe that hybrid human plus AIs will dominate mathematics for a lot longer."
His reasoning: current AI is excellent at certain things and terrible at others. You can add frameworks and scaffolding to reduce error rates, but the fundamental ingredients for a full replacement of creative intellectual work aren't there yet. The missing piece isn't more compute or a larger model. It's the ability to build on partial progress step by step, maintain a thread of understanding, and make judgment calls about what's worth pursuing.
This isn't pessimism. Tao clearly finds the tools valuable. He uses them daily. His papers are richer, broader, and cover more ground than they would have in 2020. But richer and broader is not the same as deeper, and depth is what separates a paper worth reading from one that isn't.
Glossary
| Term | Definition |
|---|---|
| Fields Medal | The highest prize in mathematics, awarded every four years to mathematicians under 40. Often described as the "Nobel Prize of mathematics." |
| LLM (Large Language Model) | An AI system trained on massive amounts of text to predict and generate language. ChatGPT, Claude, and Gemini are examples. |
| Lean (proof assistant) | A programming language where you write mathematical proofs that a computer checks automatically. Increasingly used to verify complex math results. |
| Reinforcement learning | A method for training AI by rewarding correct behavior and penalizing mistakes, similar to training a dog with treats. Used to fine-tune language models. |
| Erdős problems | A collection of over a thousand unsolved math problems posed by the legendary Hungarian mathematician Paul Erdős, who often offered cash prizes for solutions. |
| Selection bias | When you only see the winners of a process, giving a misleading picture of the overall success rate. If AI solves 50 problems and fails 2,950 others, but only the 50 are reported, the picture looks better than it is. |
Sources and resources
Want to go deeper? Watch the full video on YouTube →