The purpose of this question is to collect examples where large language models (LLMs) like ChatGPT have led to notable mathematical developments.
The emphasis in this question is on LLMs, but answers about other machine-learning tools are also welcome.
This question complements two questions that I asked before: Experimental mathematics leading to major advances (January 2010) and The use of computers leading to major mathematical advances II (June 2021). I think it will be useful to keep track of mathematical achievements based on LLMs, or assisted by LLMs, since it is considered a serious possibility that LLMs have the potential to change (and automate), or at least assist, research in mathematics.
I relaxed the threshold from "major" (in the previous two questions) to "notable" to allow more answers.
A related question specifically about DeepMind is: What mathematical problems can be attacked using DeepMind's recent mathematical breakthroughs? Another related question, about deep learning, is: What are possible applications of deep learning to research mathematics?
-
[+4] There have been many similar questions on MO about the use of AI/machine learning in research math; see, e.g., mathoverflow.net/questions/463937 and other questions linked there. – Sam Hopkins ♦, Oct 26 at 17:43
-
[+8] I haven't voted on the question (either way), but I consider it likely that answers, if you get some, will lead to a lot of discussion regarding how significant the LLM contribution actually was. – Jochen Glueck, Oct 26 at 17:48
-
[+38] My instinct is to downvote the question, though I don't have any better justification than that I hate the intrusion of AI into every sphere, and would rather not see it here; but that's unreasonable personal bias, so I just won't vote. But it does seem nonsensical to me that the question would be at 9 – 7 while both answers, reasonable as far as I can tell, are at 0 – 2. I hope downvoters will consider leaving a comment about what they think is an appropriate answer. – LSpice, Oct 26 at 20:27
-
[+4] I think that there should be a special badge for controversial questions :). – Gil Kalai, Oct 27 at 19:59
-
[+4] Re, just to be clear, I meant my rant to express dissatisfaction with the ubiquity of AI, not with you or this question; I hope I gave no offence. Re, I thought there was, but searching just turned up a post Can we have a badge for controversy?, which seems to indicate that the answer to the titular question is, or 15 years ago was, "no." – LSpice, Oct 28 at 14:57
15 Answers
Boris Alexeev and Dustin Mixon posted their paper Forbidden Sidon subsets of perfect difference sets last week, featuring a human-assisted proof: they had an LLM generate the Lean formalization of their proof. In my view this is one of the more promising uses of LLMs, because the verifier naturally guards against hallucinations.
The problem is notable: they give a counterexample to a 1,ドル000$ Erdős problem (while also noting that Marshall Hall had published a counterexample before Erdős made the conjecture).
My caveat: a human must still verify that the definitions and the statement of the main theorem are correct, lest the LLM generate a correct proof, but of a different theorem.
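To make the caveat concrete, here is a toy Lean sketch (my illustration, not from the paper): both theorems below type-check and are "verified", but only a human reading the statements can tell that the second is not the theorem anyone wanted.

```lean
-- Toy illustration (not from the paper). The kernel certifies that each
-- proof matches its written statement; it cannot tell you whether the
-- statement itself is the one you intended to formalize.
theorem intended (n : Nat) : n + 1 > n := Nat.lt_succ_self n

-- A subtly wrong statement: trivially true, happily verified, useless.
theorem misstated (n : Nat) : n + 1 > 0 := Nat.succ_pos n
```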
-
[+6] This is a very interesting paper. But I think it's important to point out that this use of ChatGPT was a mixed success. They do cite one instance where one of their intermediate results (Proposition 20) was formally proved by the LLM autonomously. On the other hand, they also say that their efforts at vibe coding the nearly trivial result that if f is a fixed-point-free involution on a finite set S, then S has even cardinality were "a multi-day struggle." – Timothy Chow, Oct 31 at 14:32
-
[+3] IMO, what Alexeev and Mixon did was closer to "autoformalization" than to automated discovery of new theorems. Another impressive example of an autoformalization effort is the development by Math.Inc of a tool called Gauss, which helped them complete a challenging formalization project that Tao and Kontorovich had proposed but had not completed. – Timothy Chow, Oct 31 at 14:39
-
[+2] It wasn't the 1,ドル000$ problem; it was a stronger statement that would have proven the 1,ドル000$ problem had it been true. But even Erdős said this formulation was most likely false. – NooneAtAll3, Nov 7 at 2:12
-
[+2] Note that in this particular case, the statement of the main theorem had already been formalized in a Lean repository of Erdős problems maintained by Google DeepMind. In particular, in this case the statement had already been inspected by experts. You are absolutely right that in general people probably won't be so lucky. – Kevin Buzzard, Nov 8 at 1:57
-
[+1] The paper quotes Erdős clearly stating "I offer a thousand dollars for a proof or disproof of this conjecture." – aorq, Nov 9 at 13:32
Here is an example: the note Counterexample to majority optimality in NICD with erasures.
From the abstract:
We asked GPT-5 Pro to look for counterexamples among a public list of open problems (the Simons "Real Analysis in Computer Science" collection). After several numerical experiments, it suggested a counterexample for the Non-Interactive Correlation Distillation (NICD) with erasures question: namely, a Boolean function on 5 bits that achieves a strictly larger value of E|f(z)| than the 5-bit majority function when the erasure parameter is p=0.40. In this very short note we record the finding, state the problem precisely, give the explicit function, and verify the computation step by step by hand so that it can be checked without a computer. In addition, we show that for each fixed odd n the majority is optimal (among unbiased Boolean functions) in a neighborhood of p=0. We view this as a little spark of an AI contribution in Theoretical Computer Science: while modern Large Language Models (LLMs) often assist with literature and numerics, here a concrete finite counterexample emerged.
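For readers who want to check such claims themselves, here is a minimal Python sketch (mine, not from the note) that evaluates $\mathbb{E}|f(z)|$ exactly by enumeration, assuming the standard formulation in which $f(z)$ denotes the multilinear extension, so an erased coordinate averages $f$ over that bit. The explicit 5-bit counterexample from the note can be plugged in as `f` in place of majority.

```python
from itertools import product

def erasure_value(f, n, p):
    """Exact E|f(z)|: z comes from a uniform x in {-1,1}^n by independently
    erasing each coordinate to 0 with probability p, and f(z) denotes the
    multilinear extension (erased coordinates are averaged over)."""
    total = 0.0
    for erased in product([False, True], repeat=n):      # erasure pattern
        weight = 1.0
        for e in erased:
            weight *= p if e else (1 - p)
        kept = [i for i in range(n) if not erased[i]]
        gone = [i for i in range(n) if erased[i]]
        for xs in product([-1, 1], repeat=len(kept)):    # visible bits, uniform
            z = [0] * n
            for i, v in zip(kept, xs):
                z[i] = v
            # multilinear extension: average f over the erased coordinates
            vals = []
            for ys in product([-1, 1], repeat=len(gone)):
                for i, v in zip(gone, ys):
                    z[i] = v
                vals.append(f(z))
            mean = sum(vals) / len(vals)
            total += weight * abs(mean) / 2 ** len(kept)
    return total

maj5 = lambda z: 1 if sum(z) > 0 else -1   # 5-bit majority
print(erasure_value(maj5, 5, 0.40))        # the value to beat at p = 0.40
```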
At the request of Gil Kalai, I'm converting a comment to an answer.
The paper "Mathematical exploration and discovery at scale" by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner was just posted to the arXiv: https://arxiv.org/abs/2511.02864.
Below is the abstract of the paper.
AlphaEvolve is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and refines algorithmic solutions to challenging scientific and practical problems. In this paper we showcase AlphaEvolve as a tool for autonomously discovering novel mathematical constructions and advancing our understanding of long-standing open problems. To demonstrate its breadth, we considered a list of 67 problems spanning mathematical analysis, combinatorics, geometry, and number theory. The system rediscovered the best known solutions in most of the cases and discovered improved solutions in several. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof-assistants and reasoning systems provide automated proof generation and further mathematical insights. These results demonstrate that large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, highlighting the potential for significant new ways of interaction between mathematicians and AI systems. We present AlphaEvolve as a powerful new tool for mathematical discovery, capable of exploring vast search spaces to solve complex optimization problems at scale, often with significantly reduced requirements on preparation and computation time.
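In rough outline, the core of such a system is an evolutionary search over programs, with an LLM supplying the mutations. Here is a toy sketch of mine, not the actual AlphaEvolve implementation; `llm_rewrite` and `evaluate` stand in for the LLM call and a problem-specific scorer.

```python
import random

def evolve(seed_program, evaluate, llm_rewrite, steps=1000, pop_size=20):
    """Toy AlphaEvolve-style loop: evolve *programs* that construct
    combinatorial objects, scoring each program by the quality of the
    object it builds. The LLM plays the role of the mutation operator,
    since it can perturb code while keeping it syntactically valid."""
    population = [(evaluate(seed_program), seed_program)]
    for _ in range(steps):
        _, parent = random.choice(population)   # pick a parent program
        child = llm_rewrite(parent)             # LLM proposes a code edit
        try:
            score = evaluate(child)             # run it and score the result
        except Exception:
            continue                            # discard programs that crash
        population.append((score, child))
        population.sort(key=lambda pair: pair[0], reverse=True)
        population = population[:pop_size]      # keep only the best programs
    return population[0]                        # best (score, program) found
```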
EDIT:
Some further developments are in the paper "New Nikodym set constructions over finite fields" by Terence Tao (https://arxiv.org/abs/2511.07721) whose abstract reads
For any fixed dimension $d \geq 3$ we construct a Nikodym set in $\mathbb{F}_q^d$ of cardinality $q^d - (\frac{d-2}{\log 2} +1+o(1)) q^{d-1} \log q$ in the limit $q \to \infty$, when $q$ is an odd prime power. This improves upon the naive random construction, which gives a set of cardinality $q^d - (d-1+o(1)) q^{d-1} \log q$, and is new in the regime where $\mathbb{F}_q$ has unbounded characteristic and $q$ not a perfect square. While the final proofs are completely human generated, the initial ideas of the construction were inspired by output from the tools AlphaEvolve and DeepThink. We also give a new construction of Nikodym sets in $\mathbb{F}_q^2$ for $q$ a perfect square that match the existing bounds of $q^2 - q^{3/2} + O(q \log q)$, assuming that $q$ is not the square of a prime $p \equiv 3 \pmod{4}$.
And also in the paper "Sum-difference exponents for boundedly many slopes, and rational complexity" again by Terence Tao (https://arxiv.org/abs/2511.15135) whose abstract reads
The dimension of Kakeya sets can be bounded using sum-difference exponents $\mathrm{SD}(R;s)$ for various sets of rational slopes $R$ and output slope $s$; the arithmetic Kakeya conjecture, which implies the Kakeya conjecture in all dimensions, asserts that the infimum of such exponents is 1ドル$. The best upper bound on this infimum currently is 1ドル.67513\dots$. In this note, inspired by numerical explorations from the tool AlphaEvolve, we study the regime where the cardinality of the set of slopes $R$ is bounded. In this regime, we establish that these exponents converge to 2ドル$ at a rate controlled by the rational complexity of $s$ relative to $R$, which measures how efficiently $s$ can be expressed as a rational combination of slopes in $R$.
-
[+2] In a nutshell, the idea is to solve a combinatorial optimization problem by evolving code for generating combinatorial objects rather than evolving the combinatorial objects themselves. To do this, one needs to be able to make small random perturbations of the code while still having the code compile; this is where LLMs come in, since writing code is one thing LLMs are good at. – Timothy Chow, Nov 7 at 5:05
-
[+1] @TimothyChow I believe the approach used to find better cap sets by DeepMind (discussed at mathoverflow.net/questions/463937) was along the same lines. – Nov 7 at 13:17
-
[+1] Yes, AlphaEvolve is "FunSearch 2.0". – Timothy Chow, Nov 7 at 13:50
-
[+1] Thanks, Sam. There is a blog post about the paper here: terrytao.wordpress.com/2025/11/05/… – Gil Kalai, Nov 11 at 9:37
This paper
Sergey Avvakumov, Roman Karasev, Tensor rank of the determinant and periodic triangulations of $\mathbb{R}^n$
https://arxiv.org/abs/2509.22333
includes in the Acknowledgments "We also thank ChatGPT 5 for pointing out that the lower bound in the proof of Theorem 1.5 can be stated in tensor language and is thus equal to the determinant’s tensor rank."
-
[+3] Thanks, Zach! I knew the paper and I met Sergey today, but did not know about the role of ChatGPT :) – Gil Kalai, Oct 26 at 20:35
The paper "Point Convergence of Nesterov's Accelerated Gradient Method: An AI-Assisted Proof" by Uijeong Jang and Ernest Ryu, posted to Arxiv October 27, 2025, states in the abstract:
The Nesterov accelerated gradient method, introduced in 1983, has been a cornerstone of optimization theory and practice. Yet the question of its point convergence had remained open. In this work, we resolve this longstanding open problem in the affirmative. The discovery of the proof was heavily assisted by ChatGPT, a proprietary large language model, and we describe the process through which its assistance was elicited.
https://arxiv.org/abs/2510.23513
See also this discussion by Damek Davis that helps put the result in perspective: https://x.com/damekdavis/status/1982529760505782510?s=46
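For context, here is the iteration in question, in one standard form (a minimal sketch of mine; the open problem was whether the iterates $x_k$ themselves converge to a minimizer, not merely whether $f(x_k)$ converges to the minimum value):

```python
import numpy as np

def nesterov(grad, x0, L, steps):
    """Nesterov's accelerated gradient method (one standard form, with the
    common k/(k+3) momentum schedule); L is a Lipschitz constant of grad."""
    x = y = np.asarray(x0, dtype=float)
    for k in range(steps):
        x_next = y - grad(y) / L                   # gradient step at the extrapolated point
        y = x_next + (k / (k + 3)) * (x_next - x)  # momentum extrapolation
        x = x_next
    return x

# Quadratic test problem f(x) = x^T A x / 2, minimizer at the origin.
A = np.diag([1.0, 10.0])
print(nesterov(lambda v: A @ v, [1.0, 1.0], L=10.0, steps=200))  # iterates approach 0
```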
Not exactly a notable result, but in my recent preprint Evaluation of GPT-5 on an Advanced Extension of Kashihara's Problem I describe how GPT-5 was able to improve on the general version of an extended combinatorial problem that I originally solved in 2010.
Scott Aaronson, Phillip Harris, and Freek Witteveen have a recent paper on bounds for amplification of QMA (quantum Merlin-Arthur). A critical part of the paper involved a linear algebra trick suggested by GPT-5. See Aaronson's blog entry here.
I have hesitated to post this example because I don't think it's really a "notable mathematical development" as such, but after seeing the other answers, I think this one is worth mentioning.
As reported in Scientific American, Epoch AI invited several mathematicians, including Ken Ono, to a meeting designed to generate challenge problems for "FrontierMath". Among other things, Ono came up with what he thought was a Ph.D.-thesis-level problem: "What is the 5th power moment of Tamagawa numbers of elliptic curves over $\mathbb{Q}$?" To Ono's amazement, the AI autonomously solved the problem. You can read Ono's account on his Facebook page, or listen to him talk about it here.
Even if this is a cherry-picked example—the best one from the whole meeting—this strikes me as a very impressive achievement. But see also this tweet by Daniel Litt, who was also one of the invited mathematicians but was not too impressed when he read over the chat log.
-
[+2] A similar project, but on a smaller scale and led by Christian Stump, for using PhD-level mathematics problems to benchmark AI is: math.science-bench.ai – Nov 6 at 15:04
The abstract of Early science acceleration experiments with GPT-5 by Sébastien Bubeck, Christian Coester, Ronen Eldan, Timothy Gowers, Yin Tat Lee, Alexandru Lupsasca, Mehtaab Sawhney, Robert Scherrer, Mark Sellke, Brian K. Spears, Derya Unutmaz, Kevin Weil, Steven Yin, and Nikita Zhivotovskiy states in part, "Of note, this paper includes four new results in mathematics (carefully verified by the human authors), underscoring how GPT-5 can help human mathematicians settle previously unsolved problems."
-
[+11] I found this passage on pg. 29 really interesting: "Our experience illustrates a pitfall in using AI: although GPT-5 possesses enormous internal knowledge and the capability to locate even more using the internet, it may not always report the original information sources accurately. This has the potential to deceive even seasoned researchers into thinking their findings are novel. We expect that our experience is not unique, and urge others to take special care in attribution when working with LLM-assisted proofs." – Nov 21 at 4:11
-
[+2] I agree. Even without AI, humans often believe they've discovered something new only to learn it was proved earlier by someone else. What's interesting now is to understand how the rate of such misattributions from LLM-assisted work compares to the natural rate of human rediscovery. – Paata Ivanisvili, Nov 21 at 7:16
-
[+1] @SamHopkins I find the quote telling, although as someone who studied history as one of their subjects in the UK's 16-18 high-school specialisms, I find the apparent surprise of a lot of scientists and mathematicians in this regard to be a bit depressing. (Any historian worth their salt will know the distinction between primary and secondary sources and have done some basic training in source analysis, etc.) – Yemon Choi, Nov 21 at 14:23
-
[+1] @YemonChoi Yes. It shouldn't even require any formal training (just common sense) to know that you shouldn't just blindly copy a reference from someone else's bibliography without checking its accuracy, but of course many scientists and mathematicians have been doing this since time immemorial. – Timothy Chow, Nov 21 at 15:19
Here is a paper on bottleneck duality in flow networks with lattice coefficients, from fall 2024:
https://arxiv.org/abs/2410.00315
The appendix to this paper details how the main result and its proof were generated by o1-mini in September 2024. It was very difficult to get a correct proof at the time; current models nail a correct proof immediately.
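To give a flavor of the kind of statement involved (my paraphrase of the classical special case, not the paper's lattice-valued version): over the max-min semiring, the best bottleneck over s-t paths equals the best bottleneck over s-t cuts. A brute-force check on a toy graph:

```python
from itertools import combinations

# Toy check of bottleneck duality on a small directed graph:
#   max over s-t paths of (min edge weight) == min over s-t cuts of (max crossing edge weight)
edges = {('s', 'a'): 3, ('s', 'b'): 5, ('a', 't'): 4, ('b', 't'): 2, ('a', 'b'): 1}
nodes = {'s', 'a', 'b', 't'}

def path_bottlenecks(u, seen):
    """Yield the min edge weight of every simple path from u to 't'."""
    if u == 't':
        yield float('inf')
        return
    for (x, y), w in edges.items():
        if x == u and y not in seen:
            for rest in path_bottlenecks(y, seen | {y}):
                yield min(w, rest)

widest_path = max(path_bottlenecks('s', {'s'}))

inner = nodes - {'s', 't'}
cut_values = []
for r in range(len(inner) + 1):
    for chosen in combinations(sorted(inner), r):
        side = {'s'} | set(chosen)  # the s-side of the cut
        crossing = [w for (x, y), w in edges.items() if x in side and y not in side]
        cut_values.append(max(crossing))
narrowest_cut = min(cut_values)

print(widest_path, narrowest_cut)  # both equal 3
```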
-
This is funny! I once tried in vain to detropicalize max-flow-min-cut (you can find some traces of that on MO), while you have managed to tropicalize it even further (+ becomes max) and then extend it to distributive lattices :) – darij grinberg, Oct 28 at 17:54
Using deep neural networks, DeepMind and collaborators numerically found a class of unstable singularities of the porous media equation and of the 3D Euler equations (with boundary). Notable here is the fact that the level of precision of their solutions "meets the stringent requirements for rigorous mathematical validation via computer-assisted proofs" (quote from the paper).
Paper: https://arxiv.org/abs/2509.14185
Blog article: https://deepmind.google/discover/blog/discovering-new-solutions-to-century-old-problems-in-fluid-dynamics/
-
[+4] Note that this is an application of neural networks, but not LLMs. (I think they tried to use AlphaEvolve, but this wasn't the main ingredient in the paper...) – Geordie Williamson, Oct 28 at 21:42
There have been some notable recent examples of LLMs playing an important role in solving certain Erdős problems, e.g., Problem 124 and Problem 481 (although I think the latter turned out to be implied by a result of Klarner in 1982), which were purportedly solved entirely by Aristotle, and Problem 367, which was reduced by Wouter van Doorn to a lemma that was solved by Gemini Deep Think. While these are not famous Erdős problems and turned out to have relatively short and simple solutions, they differ from Olympiad problems in that the computer solved problems without known solutions.
A recent paper by Nagda, Raghavan, and Thakurta, "Reinforced generation of combinatorial structures: Applications to complexity theory", reports that they received help from AlphaEvolve to improve the best-known bounds for Max-3CUT and Max-4CUT. Their idea seems quite general, so I would not be surprised if more complexity-theory results were improved this way.
The computational complexity paper "Search versus Decision for $S_2^P$" by Lance Fortnow writes in the acknowledgements:
While the results are fully due to the author, this paper was mostly generated using the large language model Gemini 3 Pro with prompting from the author. The author takes full responsibility for its contents.
EDIT (additional context): The author further elaborated on Twitter/X. When asked, "It looks like you only told it the theorem statement and didn't give it the sketch," the author replied, "Yes, it came up with the proof on its own. Surprised me as well."
-
[+1] That acknowledgement reads to me like it says the AI was not used in the mathematical development itself. It only helped write the paper. – Wojowu, Dec 4 at 12:19
-
[+1] I agree that the acknowledgement could be interpreted this way. However, the author clarified on Twitter/X: when asked "It looks like you only told it the theorem statement and didn't give it the sketch." the author replied "Yes, it came up with the proof on its own. Surprised me as well." – JoS, Dec 4 at 14:08
-
[+1] You should edit your answer to clarify and include this then. I'm happy to withdraw my downvote if you do so. – Wojowu, Dec 4 at 14:57
-
Thank you, I have edited the answer accordingly! – JoS, Dec 4 at 17:46
This is a modest development and a modest use of AI, but nonetheless, since it may seem that only discrete-ish mathematics is mentioned in most answers: the proof of Lemma 6 in https://arxiv.org/pdf/2511.06849 is due to ChatGPT 5. The techniques are standard but their use is elegant. Honestly, we were stressed out about the proof (when we discovered that an earlier one we had was flawed), and ChatGPT came to our rescue.