What Happens Now That AI Is Good at Math?

This is an AI-generated summary. The source video may include demos, visuals and additional context.
In Brief
The night Ernest Ryu solved it, he had four hours to himself. His son's bedtime was 8 p.m. and he made a point of not staying up past midnight. Over three evenings, twelve hours in total, he worked with ChatGPT on a problem that had been open in optimization theory for forty-two years. He got the proof. In episode 17 of the OpenAI Podcast, Ryu and colleague Sébastien Bubeck sit down with host Andrew Mayne to answer the question the title poses: what happens now that AI is good at math?
From Camping Splits to an IMO Gold Medal
Two years ago, Sébastien Bubeck ran simple tests on ChatGPT: imagine three people go camping and each pays for different things. Can the model split the bill fairly across seventeen items? It couldn't. Ask it to find a Zoom time that works across three time zones, and it would fail that too.
A year and a half ago, Bubeck participated in a workshop debate among mathematicians: would scaling large language models (LLMs, AI systems trained on vast amounts of text) ever help resolve major open problems? The room took a vote. Eighty percent said no. Eight months later, the models were doing research-level mathematics.
The jump did not come from one thing. Better training techniques, the arrival of reasoning models, and improved consistency over long chains of logic all came together at once. The result: in summer 2025, ChatGPT achieved a gold medal performance at the International Mathematical Olympiad (IMO), the world's most prestigious math competition for high school students. The question of whether AI could do competition math was settled.
But competition problems are not research problems. They have known solutions. They are designed to be solved in a few hours. Ernest Ryu, then a math professor at UCLA, decided to find out what happened on the other side of that line.
The 42-Year-Old Problem
The question Ryu chose had been open since 1983. It concerned the Nesterov accelerated gradient method, a foundational algorithm in optimization theory: could it, in the worst case, diverge (produce runaway results instead of converging to a solution)? Most evidence suggested the algorithm was safe. No one had proven it either way.
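For context, this is roughly what the algorithm in question looks like. The sketch below is the standard textbook form of Nesterov's accelerated gradient iteration; the quadratic objective, step size, and iteration count are illustrative choices, not details from the episode:

```python
import numpy as np

def nesterov_agd(grad, x0, step, iters):
    """Nesterov accelerated gradient method (standard textbook form).

    Iterates:
        x_{k+1} = y_k - step * grad(y_k)
        y_{k+1} = x_{k+1} + ((t_k - 1) / t_{k+1}) * (x_{k+1} - x_k)
    with momentum schedule t_{k+1} = (1 + sqrt(1 + 4 * t_k**2)) / 2.
    """
    x, y, t = x0, x0, 1.0
    for _ in range(iters):
        x_next = y - step * grad(y)                      # gradient step from the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2  # update momentum parameter
        y = x_next + ((t - 1.0) / t_next) * (x_next - x) # extrapolate (the "acceleration")
        x, t = x_next, t_next
    return x

# Illustrative objective: f(x) = 0.5 * x^T A x, minimized at the origin.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_star = nesterov_agd(grad, np.array([5.0, 5.0]), step=0.1, iters=500)
```

On well-behaved problems like this quadratic, the iterates converge; the open question Ryu settled concerned whether a worst-case problem exists on which they do not.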
Over twelve hours spread across three evenings, Ryu did not simply type the problem and wait. He played the role of verifier. When ChatGPT made a mistake, he corrected it. When the conversation drifted, he steered it toward approaches he recognized as promising. The model explored. He guided. After forty-two years, the answer turned out to be yes: the algorithm can diverge. Ryu checked the proof by hand and had ChatGPT verify it as well. It was correct.
He posted about it on X. People paid attention. It was one of the earliest cases of a genuinely open mathematical problem being resolved with AI assistance.
Ryu is now a researcher at OpenAI. His calibration of where things stand today: any physicist, chemist, or engineer who uses complex mathematics but is not inventing new mathematics can have ChatGPT handle it for them. For 99 percent of people who encounter hard math, the models are already at the right level.
The Erdős Test: From Literature Search to New Mathematics
Paul Erdős was one of the most prolific mathematicians of the twentieth century. Over 1,500 papers. No permanent address. He traveled from university to university asking questions. His unsolved problems are catalogued on a public website: roughly a thousand questions, each tracked as open or resolved.
Once OpenAI's models started handling research-level math, the Erdős list became a natural test. The first result surprised everyone: GPT performed a deep literature search, scanning thousands of papers across unrelated fields, and found an answer buried in a completely different area of mathematics. The problem was marked open. The solution had been out there, invisible, in a language no one would have thought to search.
From there, OpenAI researcher Mark Sellke ran a systematic sweep of all the problems. The models returned solutions to ten Erdős problems. Bubeck tweeted about it, triggering a debate about whether this counted as real discovery, including a public disagreement with DeepMind's Demis Hassabis about how to describe the results.
Then came the answer that settled it: OpenAI now has more than ten solutions to Erdős problems that are completely new — not found in the literature, not previously published anywhere. Results publishable in top combinatorics journals, produced by ChatGPT and internal models. The jump from "found it somewhere" to "invented it" happened within a few months.
Why Math Is the Perfect AGI Benchmark
Mathematics did more than give AI a new skill. It gave it a discipline. Bubeck explains the core property: to resolve a mathematical problem, you have to think for a long time (days, weeks, sometimes years), and the thinking must be consistent all the way through. One mistake anywhere in the chain destroys the entire argument, regardless of how correct everything else is. No partial credit.
This is exactly what OpenAI wants from reasoning models: the capacity to catch and correct errors before they compound. The hope is that the consistency developed through mathematics will generalize to other domains, for the same reason we teach humans math even when they will never use it professionally. It trains a way of thinking.
Bubeck calls this "AGI time": how long can an AI sustain coherent thought, as if it were a human researcher? Two years ago, models matched a high school student thinking for a few minutes. Today, they can sustain a researcher's focus for hours, possibly a few days. The target is weeks, then months. The arc over four years (from seconds to minutes to hours to days) has been consistent, and Bubeck sees no reason for it to stop.
The Automated Researcher
The current way of working is what Bubeck calls the professor-student interaction: the human poses a problem, the AI works on it, comes back, they talk, the AI goes again. That is how Ryu's twelve-hour session worked. It is also the ceiling of what is currently possible.
The limitation is the context window, roughly fifty pages of mathematical thinking in a single session. Many groundbreaking papers are longer than that. And the human thought behind a thirty-page paper represents far more raw thinking than the final text shows.
The next step is the automated researcher: a model that works autonomously over long periods, accumulates knowledge the way a mathematician fills notebooks across months of work, and produces results that required more sustained effort than any single session allows. Ryu points to OpenAI Codex as the current analogy: Codex already handles large code repositories across extended sessions, compacting and continuing as it goes. The same architecture, applied to mathematics, is where OpenAI is actively working.
Expertise Matters More, Not Less
Both researchers are clear about the risk. Bubeck states it directly: "Expertise is even more valuable than it ever was."
The reason Ryu could solve a forty-two-year-old problem in twelve hours is decades of accumulated mathematical intuition — not ChatGPT's. He knew which approaches were novel. He caught the model's mistakes because he understood what a correct argument had to look like. Without that depth, the same twelve hours would have produced nothing useful.
Experience bears this out. Non-mathematicians who have tried using AI to prove theorems have produced long, confident proofs that were simply wrong. The tool amplifies expertise. Without expertise, there is nothing to amplify.
The risk Bubeck names is mental atrophy: letting AI do the hard foundational work before you have done it yourself, asking ChatGPT to explain results rather than sitting with them for weeks until you genuinely understand. The deep understanding that makes AI collaboration productive takes years of difficult work to build. Shortcutting that development undermines the very thing that makes the tool powerful. We need more scientists, not fewer. And they need to be genuinely good.
What This Means
If you have avoided mathematics: this is the best possible moment to start. ChatGPT can explain Maxwell's equations, adapt to your exact level of background, and generate problems tailored to where your understanding breaks down. The path in has never been more accessible.
If you are already a scientist or engineer: AI can now handle the mathematics your work requires without you having to master it yourself. A biologist can run complex optimization. A physicist can work with advanced differential geometry. The barrier between disciplines is lower than it has ever been.
And if you are a mathematician: according to Bubeck, the field is about to become far more connected. Results buried in obscure papers will surface when a model finds the link. Harder problems, faster feedback, and the dopamine hit of a solved problem compressed into shorter timelines.
Forty-two years of an open problem, resolved over three evenings. The timelines are compressing.
Glossary
| Term | Definition |
|---|---|
| Open problem | A mathematical question that no one has yet managed to prove or disprove |
| IMO | International Mathematical Olympiad, the world's most prestigious math competition for high school students |
| AGI time | How long an AI can sustain coherent thought, as if it were a human researcher working over days or weeks |
| Context window | The amount of text an AI can hold in memory during a single work session, currently around fifty pages for math tasks |
| Erdős number | The steps between you and mathematician Paul Erdős in a chain of co-authored papers. Bubeck's is two, Ryu's is three. |
Sources and resources
Want to go deeper? Watch the full video on YouTube.