The purpose of this question is to collect examples where large language models (LLMs) like ChatGPT have led to notable mathematical developments.
The emphasis in this question is on LLMs, but answers about other machine-learning tools are also welcome.
This question complements two questions that I asked before: Experimental mathematics leading to major advances (January 2010) and The use of computers leading to major mathematical advances II (June 2021). I think it will be useful to keep track of mathematical achievements based on LLMs, or assisted by LLMs, since it is considered a serious possibility that LLMs have the potential to change (and automate), or at least assist, research in mathematics.
I relaxed the threshold from "major" (in the previous two questions) to "notable" to allow more answers.
A related question specifically about DeepMind is: What mathematical problems can be attacked using DeepMind's recent mathematical breakthroughs? Another related question, about deep learning, is: What are possible applications of deep learning to research mathematics?
-
[+4] There have been many similar questions on MO about the use of AI/machine learning in research math; see, e.g., mathoverflow.net/questions/463937 and other questions linked there. – Sam Hopkins ♦, Oct 26 at 17:43
-
[+8] I haven't voted on the question (either way), but I consider it likely that answers, if you get some, will lead to a lot of discussion regarding how significant the LLM contribution actually was. – Jochen Glueck, Oct 26 at 17:48
-
[+38] My instinct is to downvote the question, though I don't have any better justification than that I hate the intrusion of AI into every sphere, and would rather not see it here; but that's unreasonable personal bias, so I just won't vote. But it does seem nonsensical to me that the question would be at 9 – 7 while both answers, reasonable as far as I can tell, are at 0 – 2. I hope downvoters will consider leaving a comment about what they think is an appropriate answer. – LSpice, Oct 26 at 20:27
-
[+4] I think that there should be a special badge for controversial questions :). – Gil Kalai, Oct 27 at 19:59
-
[+4] Re, just to be clear, I meant my rant to express dissatisfaction with the ubiquity of AI, not with you or this question; I hope I gave no offence. Re, I thought there was, but searching just turned up a post Can we have a badge for controversy?, which seems to indicate that the answer to the titular question is, or 15 years ago was, "no." – LSpice, Oct 28 at 14:57
15 Answers
Boris Alexeev and Dustin Mixon posted their paper Forbidden Sidon subsets of perfect difference sets last week, featuring a human-assisted proof: they had an LLM generate the Lean formalization of their proof. In my view this is one of the more promising uses of LLMs, because the verifier naturally guards against hallucinations.
The problem is notable: they give a counterexample to a 1,ドル000$ Erdős problem (while also noting that Marshall Hall had published a counterexample before Erdős made the conjecture).
My caveat: a human must still verify that the definitions and the statement of the main theorem are correct, lest the LLM generate a correct proof, but of a different theorem.
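To make the caveat concrete, here is a toy Lean sketch (my illustration, not from the paper): both theorems below type-check and are "verified", but only a human reading the statements can tell that the second is not the theorem anyone wanted.

```lean
-- Toy illustration (not from the paper). The kernel certifies that each
-- proof matches its written statement; it cannot tell you whether the
-- statement itself is the one you intended to formalize.
theorem intended (n : Nat) : n + 1 > n := Nat.lt_succ_self n

-- A subtly wrong statement: trivially true, happily verified, useless.
theorem misstated (n : Nat) : n + 1 > 0 := Nat.succ_pos n
```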
-
[+6] This is a very interesting paper. But I think it's important to point out that this use of ChatGPT was a mixed success. They do cite one instance where one of their intermediate results (Proposition 20) was formally proved by the LLM autonomously. On the other hand, they also say that their efforts at vibe coding the nearly trivial result that if f is a fixed-point-free involution on a finite set S, then S has even cardinality were "a multi-day struggle." – Timothy Chow, Oct 31 at 14:32
-
[+3] IMO, what Alexeev and Mixon did was closer to "autoformalization" than to automated discovery of new theorems. Another impressive example of an autoformalization effort is the development by Math.Inc of a tool called Gauss, which helped them complete a challenging formalization project that Tao and Kontorovich had proposed but had not completed. – Timothy Chow, Oct 31 at 14:39
-
[+2] It wasn't the 1,ドル000$ problem; it was a stronger statement that would have proven the 1,ドル000$ problem had it been true. But even Erdős said this formulation was most likely false. – NooneAtAll3, Nov 7 at 2:12
-
[+2] Note that in this particular case, the statement of the main theorem had already been formalized in a Lean repository of Erdős problems maintained by Google DeepMind. In particular, in this case the statement had already been inspected by experts. You are absolutely right that in general people probably won't be so lucky. – Kevin Buzzard, Nov 8 at 1:57
-
[+1] The paper quotes Erdős clearly stating "I offer a thousand dollars for a proof or disproof of this conjecture." – aorq, Nov 9 at 13:32
Here is an example: the note Counterexample to majority optimality in NICD with erasures.
From the abstract:
We asked GPT-5 Pro to look for counterexamples among a public list of open problems (the Simons "Real Analysis in Computer Science" collection). After several numerical experiments, it suggested a counterexample for the Non-Interactive Correlation Distillation (NICD) with erasures question: namely, a Boolean function on 5 bits that achieves a strictly larger value of E|f(z)| than the 5-bit majority function when the erasure parameter is p=0.40. In this very short note we record the finding, state the problem precisely, give the explicit function, and verify the computation step by step by hand so that it can be checked without a computer. In addition, we show that for each fixed odd n the majority is optimal (among unbiased Boolean functions) in a neighborhood of p=0. We view this as a little spark of an AI contribution in Theoretical Computer Science: while modern Large Language Models (LLMs) often assist with literature and numerics, here a concrete finite counterexample emerged.
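For readers who want to check such claims themselves, here is a minimal Python sketch (mine, not from the note) that evaluates $\mathbb{E}|f(z)|$ exactly by enumeration, assuming the standard formulation in which $f(z)$ denotes the multilinear extension, so an erased coordinate averages $f$ over that bit. The explicit 5-bit counterexample from the note can be plugged in as `f` in place of majority.

```python
from itertools import product

def erasure_value(f, n, p):
    """Exact E|f(z)|: z comes from a uniform x in {-1,1}^n by independently
    erasing each coordinate to 0 with probability p, and f(z) denotes the
    multilinear extension (erased coordinates are averaged over)."""
    total = 0.0
    for erased in product([False, True], repeat=n):      # erasure pattern
        weight = 1.0
        for e in erased:
            weight *= p if e else (1 - p)
        kept = [i for i in range(n) if not erased[i]]
        gone = [i for i in range(n) if erased[i]]
        for xs in product([-1, 1], repeat=len(kept)):    # visible bits, uniform
            z = [0] * n
            for i, v in zip(kept, xs):
                z[i] = v
            # multilinear extension: average f over the erased coordinates
            vals = []
            for ys in product([-1, 1], repeat=len(gone)):
                for i, v in zip(gone, ys):
                    z[i] = v
                vals.append(f(z))
            mean = sum(vals) / len(vals)
            total += weight * abs(mean) / 2 ** len(kept)
    return total

maj5 = lambda z: 1 if sum(z) > 0 else -1   # 5-bit majority
print(erasure_value(maj5, 5, 0.40))        # the value to beat at p = 0.40
```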
At the request of Gil Kalai, I'm converting a comment to an answer.
The paper "Mathematical exploration and discovery at scale" by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner was just posted to the arXiv: https://arxiv.org/abs/2511.02864.
Below is the abstract of the paper.
AlphaEvolve is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and refines algorithmic solutions to challenging scientific and practical problems. In this paper we showcase AlphaEvolve as a tool for autonomously discovering novel mathematical constructions and advancing our understanding of long-standing open problems. To demonstrate its breadth, we considered a list of 67 problems spanning mathematical analysis, combinatorics, geometry, and number theory. The system rediscovered the best known solutions in most of the cases and discovered improved solutions in several. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof-assistants and reasoning systems provide automated proof generation and further mathematical insights. These results demonstrate that large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, highlighting the potential for significant new ways of interaction between mathematicians and AI systems. We present AlphaEvolve as a powerful new tool for mathematical discovery, capable of exploring vast search spaces to solve complex optimization problems at scale, often with significantly reduced requirements on preparation and computation time.
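In rough outline, the core of such a system is an evolutionary search over programs, with an LLM supplying the mutations. Here is a toy sketch of mine, not the actual AlphaEvolve implementation; `llm_rewrite` and `evaluate` stand in for the LLM call and a problem-specific scorer.

```python
import random

def evolve(seed_program, evaluate, llm_rewrite, steps=1000, pop_size=20):
    """Toy AlphaEvolve-style loop: evolve *programs* that construct
    combinatorial objects, scoring each program by the quality of the
    object it builds. The LLM plays the role of the mutation operator,
    since it can perturb code while keeping it syntactically valid."""
    population = [(evaluate(seed_program), seed_program)]
    for _ in range(steps):
        _, parent = random.choice(population)   # pick a parent program
        child = llm_rewrite(parent)             # LLM proposes a code edit
        try:
            score = evaluate(child)             # run it and score the result
        except Exception:
            continue                            # discard programs that crash
        population.append((score, child))
        population.sort(key=lambda pair: pair[0], reverse=True)
        population = population[:pop_size]      # keep only the best programs
    return population[0]                        # best (score, program) found
```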
EDIT:
Some further developments are in the paper "New Nikodym set constructions over finite fields" by Terence Tao (https://arxiv.org/abs/2511.07721) whose abstract reads
For any fixed dimension $d \geq 3$ we construct a Nikodym set in $\mathbb{F}_q^d$ of cardinality $q^d - (\frac{d-2}{\log 2} +1+o(1)) q^{d-1} \log q$ in the limit $q \to \infty$, when $q$ is an odd prime power. This improves upon the naive random construction, which gives a set of cardinality $q^d - (d-1+o(1)) q^{d-1} \log q$, and is new in the regime where $\mathbb{F}_q$ has unbounded characteristic and $q$ not a perfect square. While the final proofs are completely human generated, the initial ideas of the construction were inspired by output from the tools AlphaEvolve and DeepThink. We also give a new construction of Nikodym sets in $\mathbb{F}_q^2$ for $q$ a perfect square that match the existing bounds of $q^2 - q^{3/2} + O(q \log q)$, assuming that $q$ is not the square of a prime $p \equiv 3 \pmod{4}$.
And also in the paper "Sum-difference exponents for boundedly many slopes, and rational complexity" again by Terence Tao (https://arxiv.org/abs/2511.15135) whose abstract reads
The dimension of Kakeya sets can be bounded using sum-difference exponents $\mathrm{SD}(R;s)$ for various sets of rational slopes $R$ and output slope $s$; the arithmetic Kakeya conjecture, which implies the Kakeya conjecture in all dimensions, asserts that the infimum of such exponents is 1ドル$. The best upper bound on this infimum currently is 1ドル.67513\dots$. In this note, inspired by numerical explorations from the tool AlphaEvolve, we study the regime where the cardinality of the set of slopes $R$ is bounded. In this regime, we establish that these exponents converge to 2ドル$ at a rate controlled by the rational complexity of $s$ relative to $R$, which measures how efficiently $s$ can be expressed as a rational combination of slopes in $R$.
-
[+2] In a nutshell, the idea is to solve a combinatorial optimization problem by evolving code for generating combinatorial objects rather than evolving the combinatorial objects themselves. To do this, one needs to be able to make small random perturbations of the code while still having the code compile; this is where LLMs come in, since writing code is one thing LLMs are good at. – Timothy Chow, Nov 7 at 5:05
-
[+1] @TimothyChow I believe the approach used to find better cap sets by DeepMind (discussed at mathoverflow.net/questions/463937) was along the same lines. – Nov 7 at 13:17
-
[+1] Yes, AlphaEvolve is "FunSearch 2.0". – Timothy Chow, Nov 7 at 13:50
-
[+1] Thanks, Sam. There is a blog post about the paper here: terrytao.wordpress.com/2025/11/05/… – Gil Kalai, Nov 11 at 9:37
This paper
Sergey Avvakumov, Roman Karasev, Tensor rank of the determinant and periodic triangulations of $\mathbb{R}^n$
https://arxiv.org/abs/2509.22333
includes in the Acknowledgments "We also thank ChatGPT 5 for pointing out that the lower bound in the proof of Theorem 1.5 can be stated in tensor language and is thus equal to the determinant’s tensor rank."
-
[+3] Thanks, Zach! I knew the paper and I met Sergey today, but did not know about the role of ChatGPT :) – Gil Kalai, Oct 26 at 20:35
The paper "Point Convergence of Nesterov's Accelerated Gradient Method: An AI-Assisted Proof" by Uijeong Jang and Ernest Ryu, posted to Arxiv October 27, 2025, states in the abstract:
The Nesterov accelerated gradient method, introduced in 1983, has been a cornerstone of optimization theory and practice. Yet the question of its point convergence had remained open. In this work, we resolve this longstanding open problem in the affirmative. The discovery of the proof was heavily assisted by ChatGPT, a proprietary large language model, and we describe the process through which its assistance was elicited.
https://arxiv.org/abs/2510.23513
See also this discussion by Damek Davis that helps put the result in perspective: https://x.com/damekdavis/status/1982529760505782510?s=46
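For context, here is the iteration in question, in one standard form (a minimal sketch of mine; the open problem was whether the iterates $x_k$ themselves converge to a minimizer, not merely whether $f(x_k)$ converges to the minimum value):

```python
import numpy as np

def nesterov(grad, x0, L, steps):
    """Nesterov's accelerated gradient method (one standard form, with the
    common k/(k+3) momentum schedule); L is a Lipschitz constant of grad."""
    x = y = np.asarray(x0, dtype=float)
    for k in range(steps):
        x_next = y - grad(y) / L                   # gradient step at the extrapolated point
        y = x_next + (k / (k + 3)) * (x_next - x)  # momentum extrapolation
        x = x_next
    return x

# Quadratic test problem f(x) = x^T A x / 2, minimizer at the origin.
A = np.diag([1.0, 10.0])
print(nesterov(lambda v: A @ v, [1.0, 1.0], L=10.0, steps=200))  # iterates approach 0
```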
Not exactly a notable result, but in my recent preprint Evaluation of GPT-5 on an Advanced Extension of Kashihara's Problem I describe how GPT-5 was able to improve on the general version of an extended combinatorial problem that I originally solved in 2010.
Scott Aaronson, Phillip Harris, and Freek Witteveen have a recent paper on bounds for amplification of QMA (quantum Merlin-Arthur). A critical part of the paper involved a linear algebra trick suggested by GPT-5. See Aaronson's blog entry here.
I have hesitated to post this example because I don't think it's really a "notable mathematical development" as such, but after seeing the other answers, I think this one is worth mentioning.
As reported in Scientific American, Epoch AI invited several mathematicians, including Ken Ono, to a meeting designed to generate challenge problems for "FrontierMath". Among other things, Ono came up with what he thought was a Ph.D.-thesis-level problem: "What is the 5th power moment of Tamagawa numbers of elliptic curves over $\mathbb{Q}$?" To Ono's amazement, the AI autonomously solved the problem. You can read Ono's account on his Facebook page, or listen to him talk about it here.
Even if this is a cherry-picked example—the best one from the whole meeting—this strikes me as a very impressive achievement. But see also this tweet by Daniel Litt, who was also one of the invited mathematicians but was not too impressed when he read over the chat log.
-
[+2] A similar project, but on a smaller scale and led by Christian Stump, for using PhD-level mathematics problems to benchmark AI is: math.science-bench.ai – Nov 6 at 15:04
The abstract of Early science acceleration experiments with GPT-5 by Sébastien Bubeck, Christian Coester, Ronen Eldan, Timothy Gowers, Yin Tat Lee, Alexandru Lupsasca, Mehtaab Sawhney, Robert Scherrer, Mark Sellke, Brian K. Spears, Derya Unutmaz, Kevin Weil, Steven Yin, and Nikita Zhivotovskiy states in part, "Of note, this paper includes four new results in mathematics (carefully verified by the human authors), underscoring how GPT-5 can help human mathematicians settle previously unsolved problems."
-
[+11] I found this passage on pg. 29 really interesting: "Our experience illustrates a pitfall in using AI: although GPT-5 possesses enormous internal knowledge and the capability to locate even more using the internet, it may not always report the original information sources accurately. This has the potential to deceive even seasoned researchers into thinking their findings are novel. We expect that our experience is not unique, and urge others to take special care in attribution when working with LLM-assisted proofs." – Nov 21 at 4:11
-
[+2] I agree. Even without AI, humans often believe they've discovered something new only to learn it was proved earlier by someone else. What's interesting now is to understand how the rate of such misattributions from LLM-assisted work compares to the natural rate of human rediscovery. – Paata Ivanisvili, Nov 21 at 7:16
-
[+1] @SamHopkins I find the quote telling, although as someone who studied history as one of their subjects in the UK's 16-18 high-school specialisms, I find the apparent surprise of a lot of scientists and mathematicians in this regard to be a bit depressing. (Any historian worth their salt will know the distinction between primary and secondary sources and have done some basic training in source analysis, etc.) – Yemon Choi, Nov 21 at 14:23
-
[+1] @YemonChoi Yes. It shouldn't even require any formal training (just common sense) to know that you shouldn't just blindly copy a reference from someone else's bibliography without checking its accuracy, but of course many scientists and mathematicians have been doing this since time immemorial. – Timothy Chow, Nov 21 at 15:19
Here is a paper on bottleneck duality in flow networks with lattice coefficients, from fall 2024:
https://arxiv.org/abs/2410.00315
The appendix to this paper details how the main result and its proof were generated by o1-mini in September 2024. It was very difficult to get a correct proof at the time; current models nail a correct proof immediately.
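To give a flavor of the kind of statement involved (my paraphrase of the classical special case, not the paper's lattice-valued version): over the max-min semiring, the best bottleneck over s-t paths equals the best bottleneck over s-t cuts. A brute-force check on a toy graph:

```python
from itertools import combinations

# Toy check of bottleneck duality on a small directed graph:
#   max over s-t paths of (min edge weight) == min over s-t cuts of (max crossing edge weight)
edges = {('s', 'a'): 3, ('s', 'b'): 5, ('a', 't'): 4, ('b', 't'): 2, ('a', 'b'): 1}
nodes = {'s', 'a', 'b', 't'}

def path_bottlenecks(u, seen):
    """Yield the min edge weight of every simple path from u to 't'."""
    if u == 't':
        yield float('inf')
        return
    for (x, y), w in edges.items():
        if x == u and y not in seen:
            for rest in path_bottlenecks(y, seen | {y}):
                yield min(w, rest)

widest_path = max(path_bottlenecks('s', {'s'}))

inner = nodes - {'s', 't'}
cut_values = []
for r in range(len(inner) + 1):
    for chosen in combinations(sorted(inner), r):
        side = {'s'} | set(chosen)  # the s-side of the cut
        crossing = [w for (x, y), w in edges.items() if x in side and y not in side]
        cut_values.append(max(crossing))
narrowest_cut = min(cut_values)

print(widest_path, narrowest_cut)  # both equal 3
```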
-
This is funny! I once tried in vain to detropicalize max-flow-min-cut (you can find some traces of that on MO), while you have managed to tropicalize it even further (+ becomes max) and then extend it to distributive lattices :) – darij grinberg, Oct 28 at 17:54
Using deep neural networks, DeepMind and collaborators numerically found a class of unstable singularities of the porous media equation and of the 3D Euler equations (with boundary). Notable here is the fact that the level of precision of their solutions "meets the stringent requirements for rigorous mathematical validation via computer-assisted proofs" (quote from the paper).
Paper: https://arxiv.org/abs/2509.14185
Blog article: https://deepmind.google/discover/blog/discovering-new-solutions-to-century-old-problems-in-fluid-dynamics/
-
[+4] Note that this is an application of neural networks, but not LLMs. (I think they tried to use AlphaEvolve, but this wasn't the main ingredient in the paper...) – Geordie Williamson, Oct 28 at 21:42
There have been some notable recent examples of LLMs playing an important role in solving certain Erdős problems, e.g., Problem 124 and Problem 481 (although I think the latter turned out to be implied by a result of Klarner in 1982), which were purportedly solved entirely by Aristotle, and Problem 367, which was reduced by Wouter van Doorn to a lemma that was solved by Gemini Deep Think. While these are not famous Erdős problems and turned out to have relatively short and simple solutions, they differ from Olympiad problems in that the computer solved problems without known solutions.
A recent paper by Nagda, Raghavan, and Thakurta, "Reinforced generation of combinatorial structures: Applications to complexity theory", reports that they received help from AlphaEvolve to improve the best-known bounds for Max-3CUT and Max-4CUT. Their idea seems quite general, so I would not be surprised if more complexity-theory results were improved this way.
The computational complexity paper "Search versus Decision for $S_2^P$" by Lance Fortnow writes in the acknowledgements:
While the results are fully due to the author, this paper was mostly generated using the large language model Gemini 3 Pro with prompting from the author. The author takes full responsibility for its contents.
EDIT (additional context): The author further elaborated on Twitter/X. When asked, "It looks like you only told it the theorem statement and didn't give it the sketch," the author replied, "Yes, it came up with the proof on its own. Surprised me as well."
-
[+1] That acknowledgement reads to me like it says the AI was not used in the mathematical development itself. It only helped write the paper. – Wojowu, Dec 4 at 12:19
-
[+1] I agree that the acknowledgement could be interpreted this way. However, the author clarified on Twitter/X: when asked "It looks like you only told it the theorem statement and didn't give it the sketch." the author replied "Yes, it came up with the proof on its own. Surprised me as well." – JoS, Dec 4 at 14:08
-
[+1] You should edit your answer to clarify and include this then. I'm happy to withdraw my downvote if you do so. – Wojowu, Dec 4 at 14:57
-
Thank you, I have edited the answer accordingly! – JoS, Dec 4 at 17:46
This is a modest development and a modest use of AI, but nonetheless, since it may seem that only discrete-ish mathematics is mentioned in most answers: the proof of Lemma 6 in https://arxiv.org/pdf/2511.06849 is due to ChatGPT 5. The techniques are standard but their use is elegant. Honestly, we were stressed out about the proof (when we discovered that an earlier one we had was flawed), and ChatGPT came to our rescue.