Polynomial towers and inverse Gowers theory for bounded-exponent groups
6 January, 2026 in math.CO, math.DS, paper | Tags: Asgar Jamneshan, Gowers-Host-Kra seminorms, inverse conjecture for the Gowers norm, Or Shalom | by Terence Tao | 4 comments
Asgar Jamneshan, Or Shalom and I have uploaded to the arXiv our paper “Polynomial towers and inverse Gowers theory for bounded-exponent groups“. This continues our investigation into the ergodic-theory approach to the inverse theory of Gowers norms over finite abelian groups {G}. In this regard, our main result establishes a satisfactory (qualitative) inverse theorem for groups {G} of bounded exponent:
Theorem 1 Let {G} be a finite abelian group of some exponent {m}, and let {f: G \rightarrow {\bf C}} be {1}-bounded with {\|f\|_{U^{k+1}(G)} > \delta}. Then there exists a polynomial {P: G \rightarrow {\bf R}/{\bf Z}} of degree at most {k} such that

\displaystyle |\mathop{\bf E}_{x \in G} f(x) e(-P(x))| \gg_{k,m,\delta} 1.
This type of result was previously known in the case of vector spaces over finite fields (by work of myself and Ziegler), groups of squarefree order (by work of Candela, González-Sánchez, and Szegedy), and in the {k \leq 2} case (by work of Jamneshan and myself). The case {G = ({\bf Z}/4{\bf Z})^n}, for instance, is treated by this theorem but not covered by previous results. In the aforementioned paper of Candela et al., a result similar to the above theorem was also established, except that the polynomial {P} was defined in an extension of {G} rather than in {G} itself (or equivalently, {f} correlated with a projection of a phase polynomial, rather than directly with a phase polynomial on {G}). This result is consistent with a conjecture of Jamneshan and myself regarding what the “right” inverse theorem should be in any finite abelian group {G} (not necessarily of bounded exponent).
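As a concrete (and purely illustrative) numerical aside, not taken from the paper: the Gowers norm appearing in Theorem 1 can be computed by brute force on a small cyclic group, and one can check that a quadratic phase has full {U^3} norm while a random sign pattern has a much smaller one. The choice of group {{\bf Z}/16{\bf Z}} and the sample functions below are arbitrary.

```python
import itertools
import numpy as np

def gowers_u3(f):
    """Brute-force Gowers U^3 norm of f: Z/N -> C (O(N^4); small N only)."""
    N = len(f)
    total = 0.0
    for x, h1, h2, h3 in itertools.product(range(N), repeat=4):
        prod = 1.0 + 0.0j
        for w in itertools.product((0, 1), repeat=3):
            val = f[(x + w[0] * h1 + w[1] * h2 + w[2] * h3) % N]
            prod *= val.conjugate() if sum(w) % 2 else val
        total += prod.real  # the full average is real and nonnegative
    return max(total / N**4, 0.0) ** (1 / 8)

N = 16
quad_phase = np.exp(2j * np.pi * np.arange(N) ** 2 / N)     # e(P(x)) with deg P = 2
random_signs = np.random.choice([-1.0, 1.0], N).astype(complex)

print(gowers_u3(quad_phase))    # equals 1 up to rounding: quadratic phases have full U^3 norm
print(gowers_u3(random_signs))  # noticeably smaller, and tending to 0 as N grows
```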
In contrast to previous work, we do not need to treat the “high characteristic” and “low characteristic” cases separately; in fact, many of the delicate algebraic questions about polynomials in low characteristic do not need to be directly addressed in our approach, although this is at the cost of making the inductive arguments rather intricate and opaque.
As mentioned above, our approach is ergodic-theoretic, deriving the above combinatorial inverse theorem from an ergodic structure theorem of Host–Kra type. The most natural ergodic structure theorem one could establish here, which would imply the above theorem, would be the statement that if {\Gamma} is a countable abelian group of bounded exponent, and {X} is an ergodic {\Gamma}-system of order at most {k} in the Host–Kra sense, then {X} would be an Abramov system – generated by polynomials of degree at most {k}. This statement was conjectured many years ago by Bergelson, Ziegler, and myself, and is true in many “high characteristic” cases, but unfortunately fails in low characteristic, as recently shown by Jamneshan, Shalom, and myself. However, we are able to recover a weaker version of this statement here, namely that {X} admits an extension which is an Abramov system. (This result was previously established by Candela et al. in the model case when {\Gamma} is a vector space over a finite field.) By itself, this weaker result would only recover a correlation with a projected phase polynomial, as in the work of Candela et al.; but the extension we construct arises as a tower of abelian extensions, and in the bounded exponent case there is an algebraic argument (hinging on a certain short exact sequence of abelian groups splitting) that allows one to map the functions in this tower back to the original combinatorial group {G} rather than an extension thereof, thus recovering the full strength of the above theorem.
It remains to prove the ergodic structure theorem. The standard approach would be to describe the system {X} as a Host–Kra tower
\displaystyle Z^{\leq 0}(X) \leq Z^{\leq 1}(X) \leq \dots \leq Z^{\leq k}(X) = X,
where each extension {Z^{\leq j+1}(X)} of {Z^{\leq j}(X)} is a compact abelian group extension by a cocycle of “type” {j}, and then attempt to show that each such cocycle is cohomologous to a polynomial cocycle. However, this appears to be impossible in general, particularly in low characteristic, as certain key short exact sequences fail to split in the required ways. To get around this, we have to work with a different tower, extending various levels of this tower as needed to obtain additional good algebraic properties of each level that enable one to split the required short exact sequences. The precise properties needed are rather technical, but the main ones can be described informally as follows:
- We need the cocycles to obey an “exactness” property, in that there is a sharp correspondence between the type of the cocycle (or any of its components) and its degree as a polynomial cocycle. (By general nonsense, any polynomial cocycle of degree {\leq d-1} is automatically of type {\leq d}; exactness, roughly speaking, asserts the converse.) Informally, the cocycles should be “as polynomial as possible”.
- The systems in the tower need to have “large spectrum” in that the set of eigenvalues of the system form a countable dense subgroup of the Pontryagin dual of the acting group {\Gamma} (in fact we demand that a specific countable dense subgroup {\tilde \Gamma} is represented).
- The systems need to be “pure” in the sense that the sampling map {\iota_{x_0} P(\gamma) := P(T^\gamma x_0)} that maps polynomials on the system to polynomials on the group {\Gamma} is injective for a.e. {x_0}, with the image being a pure subgroup. Informally, this means that the problem of taking roots of a polynomial in the system is equivalent to the problem of taking roots of the corresponding polynomial on the group {\Gamma}. In low characteristic, the root-taking problem becomes quite complicated, and we do not give a good solution to this problem either in the ergodic theory setting or the combinatorial one; however, purity at least lets one show that the two problems are (morally) equivalent to each other, which turns out to be what is actually needed to make the arguments work. There is also a technical “relative purity” condition we need to impose at each level of the extension to ensure that this purity property propagates up the tower, but I will not describe it in detail here.
It is then possible to recursively construct a tower of extensions that eventually reaches an extension of {X}, for which the above useful properties of exactness, large spectrum, and purity are obeyed, and for which the system remains Abramov at each level of the tower. This requires a lengthy process of “straightening” the cocycle by differentiating it, obtaining various “Conze–Lesigne” type equations for the derivatives, and then “integrating” those equations to place the original cocycle in a good form. At multiple stages in this process it becomes necessary to have various short exact sequences of (topological) abelian groups split, which necessitates the various good properties mentioned above. To close the induction one then has to verify that these properties can be maintained as one ascends the tower, which is a non-trivial task in itself.
The maximal length of the Erdős–Herzog–Piranian lemniscate in high degree
15 December, 2025 in math.CV, paper | Tags: Erdos, lemniscate, polynomials | by Terence Tao | 20 comments
I’ve just uploaded to the arXiv my preprint The maximal length of the Erdős–Herzog–Piranian lemniscate in high degree. This paper resolves (in the asymptotic regime of sufficiently high degree) an old question about the polynomial lemniscates
\displaystyle \partial E_1(p) := \{ z: |p(z)| = 1\}
attached to monic polynomials {p} of a given degree {n}, and specifically the question of bounding the arclength {\ell(\partial E_1(p))} of such lemniscates. For instance, when {p(z)=z^n}, the lemniscate is the unit circle and the arclength is {2\pi}; this in fact turns out to be the minimum possible length amongst all (connected) lemniscates, a result of Pommerenke. However, the question of the largest lemniscate length is open. The leading candidate for the extremizer is the polynomial

\displaystyle p_0(z)= z^n-1
whose lemniscate is quite convoluted, with an arclength that can be computed asymptotically as

\displaystyle \ell(\partial E_1(p_0)) = 2^{1/n} B(1/2, 1/(2n)) = 2n + 4 \log 2 + O(1/n)
where {B} is the beta function. (The images here were generated using AlphaEvolve and Gemini.) A reasonably well-known conjecture of Erdős, Herzog, and Piranian (Erdős problem 114) asserts that this is indeed the maximizer, thus {\ell(\partial E_1(p)) \leq \ell(\partial E_1(p_0))} for all monic polynomials of degree {n}.
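As a quick numerical sanity check (mine, not from the paper): writing {z = re^{i\phi}}, the equation {|z^n-1|=1} becomes {r^n = 2\cos(n\phi)} on each of the {n} congruent petals, so the arclength can be approximated by tracing one petal polygonally and comparing with {2n + 4\log 2}.

```python
import numpy as np

def lemniscate_length_p0(n, samples=200000):
    """Polygonal approximation to the arclength of the lemniscate |z^n - 1| = 1."""
    # One petal: r^n = 2 cos(n*phi) for phi in [-pi/(2n), pi/(2n)]; the n petals are congruent.
    phi = np.linspace(-np.pi / (2 * n), np.pi / (2 * n), samples)
    r = np.maximum(2 * np.cos(n * phi), 0.0) ** (1.0 / n)
    z = r * np.exp(1j * phi)
    return n * np.abs(np.diff(z)).sum()

for n in (10, 100, 1000):
    print(n, lemniscate_length_p0(n), 2 * n + 4 * np.log(2))
```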
There have been several partial results towards this conjecture. For instance, Eremenko and Hayman verified the conjecture when {n=2}. Asymptotically, bounds of the form {\ell(\partial E_1(p)) \leq (C+o(1)) n} had been known for various {C} such as {\pi}, {2\pi}, or {4\pi}; a significant advance was made by Fryntov and Nazarov, who obtained the asymptotically sharp upper bound
\displaystyle \ell(\partial E_1(p)) \leq 2n + O(n^{7/8})
and also obtained the sharp conjecture {\ell(\partial E_1(p)) \leq \ell(\partial E_1(p_0))} for {p} sufficiently close to {p_0}. In that paper, the authors comment that the {O(n^{7/8})} error could be improved, but that {O(n^{1/2})} seemed to be the limit of their method. I recently explored this problem with the optimization tool AlphaEvolve, where I found that, when I assigned this tool the task of optimizing {\ell(\partial E_1(p))} for a given degree {n}, the tool rapidly converged to choosing {p} to be equal to {p_0} (up to the rotation and translation symmetries of the problem). This suggested to me that the conjecture was true for all {n}, though of course this was far from a rigorous proof. AlphaEvolve also provided some useful visualization code for these lemniscates which I have incorporated into the paper (and this blog post), and which helped build my intuition for this problem; I view this sort of “vibe-coded visualization” as another practical use-case of present-day AI tools.
In this paper, we iteratively improve upon the Fryntov-Nazarov method to obtain the following bounds, in increasing order of strength:
- (i) {\ell(\partial E_1(p)) \leq 2n + O(n^{1/2})}.
- (ii) {\ell(\partial E_1(p)) \leq 2n + O(1)}.
- (iii) {\ell(\partial E_1(p)) \leq 2n + 4 \log 2 + o(1)}.
- (iv) {\ell(\partial E_1(p)) \leq \ell(\partial E_1(p_0))} for sufficiently large {n}.
The proof of these bounds is somewhat circuitous and technical, with the analysis from each part of this result used as a starting point for the next one. For this blog post, I would like to focus on the main ideas of the arguments.
A key difficulty is that there are relatively few tools for upper bounding the arclength of a curve; indeed, the coastline paradox already shows that curves can have infinite length even when bounded. Thus, one needs to utilize some smooth or algebraic structure on the curve to hope for good upper bounds. One possible approach is via the Crofton formula, using Bezout’s theorem to control the intersection of the curve with various lines. This is already good enough to get bounds of the form {\ell(\partial E_1(p)) \leq O(n)} (for instance by combining it with other known tools to control the diameter of the lemniscate), but it seems challenging to use this approach to get bounds close to the optimal {\sim 2n}.
Instead, we follow Fryntov–Nazarov and utilize Stokes’ theorem to convert the arclength into an area integral. A typical identity used in that paper is
\displaystyle \ell(\partial E_1(p)) = 2 \int_{E_1(p)} |p'|\ dA - \int_{E_1(p)} \frac{|p\varphi|}{\varphi} \psi\ dA
where {dA} is area measure, {\varphi = p'/p} is the log-derivative of {p}, {\psi = p''/p'} is the log-derivative of {p'}, and {E_1(p)} is the region {\{ |p| \leq 1 \}}. This and the triangle inequality already let one prove bounds of the form {\ell(\partial E_1(p)) \leq 2\pi n + O(n^{1/2})} by controlling {\int_{E_1(p)} |\psi|\ dA} using the triangle inequality, the Hardy–Littlewood rearrangement inequality, and the Pólya capacity inequality. But this argument does not fully capture the oscillating nature of the phase {\frac{|\varphi|}{\varphi}} on one hand, and the oscillating nature of {E_1(p)} on the other. Fryntov–Nazarov exploited these oscillations with some additional decompositions and integration by parts arguments. By optimizing these arguments, I was able to establish an inequality of the form

\displaystyle \ell(\partial E_1(p)) \leq \frac{1}{\pi} \int_{E_2(p)} |\psi|\ dA + X_1 + \dots + X_5 \ \ \ \ \ (1)
where {E_2(p) = \{ |p| \leq 2\}} is an enlargement of {E_1(p)} (that is significantly less oscillatory, as displayed in the figure below), and {X_1,\dots,X_5} are certain error terms that can be controlled by a number of standard tools (e.g., the Grönwall area theorem). One can heuristically justify (1) as follows. Suppose we work in a region where the functions {\psi}, {p} are roughly constant: {\psi(z) \approx \psi_0}, {p(z) \approx c_0}. For simplicity let us normalize {\psi_0} to be real, and {c_0} to be negative real. In order to have a non-trivial lemniscate in this region, {c_0} should be close to {-1}. Because the unit circle {\partial D(0,1)} is tangent to the line {\{ w: \Re w = -1\}} at {-1}, the lemniscate condition {|p(z)|=1} is then heuristically approximated by the condition that {\Re (p(z) + 1) = 0}. On the other hand, the hypothesis {\psi(z) \approx \psi_0} suggests that {p'(z) \approx A e^{\psi_0 z}} for some amplitude {A}, which heuristically integrates to {p(z) \approx c_0 + \frac{A}{\psi_0} e^{\psi_0 z}}. Writing {\frac{A}{\psi_0}} in polar coordinates as {Re^{i\theta}} and {z} in Cartesian coordinates as {x+iy}, the condition {\Re(p(z)+1)=0} can then be rearranged after some algebra as
\displaystyle \cos (\psi_0 y + \theta) \approx -\frac{1+c_0}{Re^{\psi_0 x}}.
If the right-hand side is much larger than {1} in magnitude, we thus expect the lemniscate to be empty in this region; but if instead the right-hand side is much less than {1} in magnitude, we expect the lemniscate to behave like a periodic sequence of horizontal lines of spacing {\frac{\pi}{\psi_0}}. This makes the main terms on both sides of (1) approximately agree (per unit area). A graphic illustration of (1) (provided by Gemini) is shown below, where the dark spots correspond to small values of {\psi} that act to “repel” (and shorten) the lemniscate. (The bright spots correspond to the critical points of {p}, which in this case consist of six critical points at the origin and one at each of {+1/2} and {-1/2}.)
By choosing parameters appropriately, one can show that {X_1,\dots,X_5 = O(\sqrt{n})} and {\frac{1}{\pi} \int_{E_2(p)} |\psi|\ dA \leq 2n + O(1)}, yielding the first bound {\ell(\partial E_1(p)) \leq 2n + O(n^{1/2})}. However, one can do better by inspecting the arguments more carefully, and in particular by measuring the defect in the triangle inequality
\displaystyle |\psi(z)| = |\sum_\zeta \frac{1}{z-\zeta}| \leq \sum_\zeta \frac{1}{|z-\zeta|},
where {\zeta} ranges over critical points. From some elementary geometry, one can show that the more the critical points {\zeta} are dispersed away from each other (in an {\ell^1} sense), the more one can gain over the triangle inequality here; conversely, the {L^1} dispersion {\sum_\zeta |\zeta|} of the critical points (after normalizing so that these critical points have mean zero) can be used to improve the control on the error terms {X_1,\dots,X_5}. Optimizing this strategy leads to the second bound

\displaystyle \ell(\partial E_1(p)) \leq 2n + O(1).
At this point, the only remaining cases that need to be handled are the ones with bounded dispersion: {\sum_\zeta |\zeta| \leq 1}. In this case, one can do some elementary manipulations of the factorization
\displaystyle p'(z) = n \prod_\zeta (z-\zeta)
to obtain some quite precise control on the asymptotics of {p(z)} and {p'(z)}; for instance, we will be able to obtain an approximation of the form

\displaystyle p(z) \approx -1 + \frac{z p'(z)}{n}
with high accuracy, as long as {z} is not too close to the origin or to critical points (for the candidate extremizer {p_0}, whose critical points all lie at the origin, this approximation is in fact an identity). This, combined with direct arclength computations, can eventually lead to the third estimate

\displaystyle \ell(\partial E_1(p)) \leq 2n + 4 \log 2 + o(1).
The last remaining cases to handle are those of small dispersion, {\sum_\zeta |\zeta| = o(1)}. An extremely careful version of the previous analysis can now give an estimate of the shape

\displaystyle \ell(\partial E_1(p)) \leq \ell(\partial E_1(p_0)) - c \|p\| + o(\|p\|)
for an absolute constant {c>0}, where {\|p\|} is a measure of how close {p} is to {p_0} (it is equal to the dispersion {\sum_\zeta |\zeta|} plus an additional term {n |1 + p(0)|^{1/n}} to deal with the constant term {p(0)}). This establishes the final bound (for {n} large enough), and even shows that the only extremizer is {p_0} (up to translation and rotation symmetry).

The Equational Theories Project: Advancing Collaborative Mathematical Research at Scale
9 December, 2025 in math.RA, paper | Tags: Aaron Hill, Alex Meiburg, Amir Livne Bar-on, Anand Rao Tadipatri, Andrés Goens, Bernhard Reinke, Bruno Le Floch, Cody Roux, Daniel Weber, David Renshaw, Douglas McNeil, Emmanuel Osalotioman Osazuwa, Equational Theory Project, Fan Zheng, Fernando Vaquerizo-Villar, Floris van Doorn, Giovanni Paolini, Harald Husum, Hernán Ibarra Mejia, Jérémy Scanvic, Joachim Breitner, Jose Brox, Lorenzo Luccioli, Marco Petracci, Marcus Rossel, Mario Carneiro, Martin Dvorak, Matthew Bolan, Nicholas Carlini, Pace P. Nielsen, Pietro Monticone, Shreyas Srinivas, universal algebra, Vlad Tsyrklevich, Zoltan Kocsis | by Terence Tao | 18 comments
Matthew Bolan, Joachim Breitner, Jose Brox, Nicholas Carlini, Mario Carneiro, Floris van Doorn, Martin Dvorak, Andrés Goens, Aaron Hill, Harald Husum, Hernán Ibarra Mejia, Zoltan Kocsis, Bruno Le Floch, Amir Livne Bar-on, Lorenzo Luccioli, Douglas McNeil, Alex Meiburg, Pietro Monticone, Pace P. Nielsen, Emmanuel Osalotioman Osazuwa, Giovanni Paolini, Marco Petracci, Bernhard Reinke, David Renshaw, Marcus Rossel, Cody Roux, Jérémy Scanvic, Shreyas Srinivas, Anand Rao Tadipatri, Vlad Tsyrklevich, Fernando Vaquerizo-Villar, Daniel Weber, Fan Zheng, and I have just uploaded to the arXiv our preprint The Equational Theories Project: Advancing Collaborative Mathematical Research at Scale. This is the final report for the Equational Theories Project, which was proposed in this blog post and also showcased in this subsequent blog post. The aim of this project was to see whether one could collaboratively achieve a large-scale systematic exploration of a mathematical space, which in this case was the implication graph between 4694 equational laws of magmas. A magma is a set {G} equipped with a binary operation {\diamond: G \times G \rightarrow G} (or, equivalently, a {G \times G} multiplication table). An equational law is an equation involving this operation and a number of indeterminate variables. Some examples of equational laws, together with the number that we assigned to that law, include
- {E1} (Trivial law): {x = x}
- {E2} (Singleton law): {x = y}
- {E43} (Commutative law): {x \diamond y = y \diamond x}
- {E168} (Central groupoid law): {x = (y \diamond x) \diamond (x \diamond z)}
- {E4512} (Associative law): {x \diamond (y \diamond z) = (x \diamond y) \diamond z}
The aim of the project was to work out which of these laws imply which others. For instance, all laws imply the trivial law {E1}, and conversely the singleton law {E2} implies all the others. On the other hand, the commutative law {E43} does not imply the associative law {E4512} (because there exist magmas that are commutative but not associative), nor is the converse true. All in all, there are {22,028,942} implications of this type to settle; most of these are relatively easy and could be resolved in a matter of minutes by an expert in college-level algebra, but prior to this project, it was impractical to actually do so in a manner that could be feasibly verified. Also, this problem is known to become undecidable for sufficiently long equational laws. Nevertheless, we were able to resolve all the implications informally after two months, and have them completely formalized in Lean after a further five months.
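To give a flavour of a single implication, here is a tiny Python sketch (illustrative only, not taken from the project's codebase) that brute-forces all multiplication tables on a two-element carrier and exhibits a magma satisfying the commutative law E43 but not the associative law E4512, thereby refuting that implication; the project's tooling did this kind of search, and much more, at scale.

```python
import itertools

def commutative(table, n):
    return all(table[x][y] == table[y][x] for x in range(n) for y in range(n))

def associative(table, n):
    return all(table[table[x][y]][z] == table[x][table[y][z]]
               for x in range(n) for y in range(n) for z in range(n))

n = 2  # a two-element carrier already suffices
for flat in itertools.product(range(n), repeat=n * n):
    table = [list(flat[i * n:(i + 1) * n]) for i in range(n)]
    if commutative(table, n) and not associative(table, n):
        print("E43 does not imply E4512; witness multiplication table:", table)
        break
```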
After a rather hectic setup process (documented in this personal log), progress came in various waves. Initially, huge swathes of implications could be resolved first by very simple-minded techniques, such as brute-force searching all small finite magmas to refute implications; then, automated theorem provers such as Vampire or Mace4 / Prover9 were deployed to handle a large fraction of the remainder. A few equations had existing literature that allowed for many implications involving them to be determined. This left a core of just under a thousand implications that did not fall to any of the “cheap” methods, and which occupied the bulk of the efforts of the project. As it turns out, all of the remaining implications were negative; the difficulty was to construct explicit magmas that obeyed one law but not another. To do this, we discovered a number of general constructions of magmas that were effective at this task. For instance:
- Linear models, in which the carrier {G} was a (commutative or noncommutative) ring and the magma operation took the form {x \diamond y = ax + by} for some coefficients {a,b}, turned out to resolve many cases.
- We discovered a new invariant of an equational law, which we call the “twisting semigroup” of that law, which also allowed us to construct further examples of magmas that obeyed one law {E} but not another {E'}, by starting with a base magma {M} that obeyed both laws, taking a Cartesian power {M^n} of that magma, and then “twisting” the magma operation by certain permutations of {\{1,\dots,n\}} designed to preserve {E} but not {E'}.
- We developed a theory of “abelian magma extensions”, similar to the notion of an abelian extension of a group, which allowed us to flexibly build new magmas out of old ones in a manner controlled by a certain “magma cohomology group” {H^1_E(G,M)}, which was tractable to compute, and again gave ways to construct magmas that obeyed one law {E} but not another {E'}.
- Greedy methods, in which one fills out an infinite multiplication table in a greedy manner (somewhat akin to naively solving a Sudoku puzzle), subject to some rules designed to avoid collisions and maintain a law {E}, as well as some seed entries designed to enforce a counterexample to a separate law {E'}. Despite the apparent complexity of this method, it can be automated in a manner that allowed for many outstanding implications to be refuted.
- Smarter ways to utilize automated theorem provers, such as strategically adding in additional axioms to the magma to help narrow the search space, were developed over the course of the project.
Even after applying all these general techniques, though, there were about a dozen particularly difficult implications that resisted even these more powerful methods. Several ad hoc constructions were needed in order to understand the behavior of magmas obeying such equations as E854, E906, E1323, E1516, and E1729, with the latter taking months of effort to finally solve and then formalize.
A variety of GUI interfaces were also developed to facilitate the collaboration (most notably the Equation Explorer tool mentioned above), and several side projects were also created within the project, such as the exploration of the implication graph when the magma was also restricted to be finite. In this case, we resolved all of the {22,028,942} implications except for one (and its dual):
Problem 1 Does the law {x = y \diamond (x \diamond ((y \diamond x) \diamond y))} (E677) imply the law {x = ((x \diamond x) \diamond x) \diamond x} (E255) for finite magmas?
See this blueprint page for some partial results on this problem, which we were unable to resolve even after months of effort.
Interestingly, modern AI tools did not play a major role in this project (though it was largely completed in 2024, before the most recent advanced models became available); while they could resolve many implications, the older “good old-fashioned AI” of automated theorem provers were far cheaper to run and already handled the overwhelming majority of the implications that the advanced AI tools could. But I could imagine that such tools would play a more prominent role in future similar projects.
The story of Erdős problem #1026
8 December, 2025 in expository, math.CA, math.CO | Tags: AI, AlphaEvolve, Erdos | by Terence Tao | 17 comments
Problem 1026 on the Erdős problem web site recently got solved through an interesting combination of existing literature, online collaboration, and AI tools. The purpose of this blog post is to try to tell the story of this collaboration, and also to supply a complete proof.
The original problem of Erdős, posed in 1975, is rather ambiguous. Erdős starts by recalling his famous theorem with Szekeres that says that given a sequence of {k^2+1} distinct real numbers, one can find a subsequence of length {k+1} which is either increasing or decreasing; and that one cannot improve the {k^2+1} to {k^2}, by considering for instance a sequence of {k} blocks of length {k}, with the numbers in each block decreasing, but the blocks themselves increasing. He also noted a result of Hanani that every sequence of length {k(k+3)/2} can be decomposed into the union of {k} monotone sequences. He then wrote “As far as I know the following question is not yet settled. Let {x_1,\dots,x_n} be a sequence of distinct numbers, determine
\displaystyle S(x_1,\dots,x_n) = \max \sum_r x_{i_r}
where the maximum is to be taken over all monotonic sequences {x_{i_1},\dots,x_{i_m}}“. This problem was added to the Erdős problem site on September 12, 2025, with a note that the problem was rather ambiguous. For any fixed {n}, this is an explicit piecewise linear function of the variables {x_1,\dots,x_n} that could be computed by a simple brute force algorithm, but Erdős was presumably seeking optimal bounds for this quantity under some natural constraint on the {x_i}. The day the problem was posted, Desmond Weisenberg proposed studying the quantity {c(n)}, defined as the largest constant such that
\displaystyle S(x_1,\dots,x_n) \geq c(n) \sum_{i=1}^n x_i
for all choices of (distinct) real numbers {x_1,\dots,x_n}. Desmond noted that for this formulation one could assume without loss of generality that the {x_i} were positive, since deleting negative or vanishing {x_i} does not increase the left-hand side and does not decrease the right-hand side. By a limiting argument one could also allow collisions between the {x_i}, so long as one interpreted monotonicity in the weak sense. Though not stated on the web site, one can formulate this problem in game theoretic terms. Suppose that Alice has a stack of {N} coins for some large {N}. She divides the coins into {n} piles consisting of {x_1,\dots,x_n} coins each, so that {\sum_{i=1}^n x_i = N}. She then passes the piles to Bob, who is allowed to select a monotone subsequence of the piles (in the weak sense) and keep all the coins in those piles. What is the largest fraction {c(n)} of the coins that Bob can guarantee to keep, regardless of how Alice divides up the coins? (One can work with either a discrete version of this problem where the {x_i} are integers, or a continuous one where the coins can be split fractionally, but in the limit {N \rightarrow \infty} the problems can easily be seen to be equivalent.)
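For concreteness, here is a short Python sketch (my own, purely illustrative) of the quantity Bob is optimizing: {S(x_1,\dots,x_n)} is the maximum total size of a weakly monotone subsequence of piles, computable by a standard quadratic-time dynamic program; {c(n)} is then the worst case of {S(x)/\sum_i x_i} over Alice's choices.

```python
def best_monotone_sum(x):
    """Maximum sum of a weakly increasing or weakly decreasing subsequence of x."""
    def best(cmp):
        dp = list(x)  # dp[i] = best monotone sum ending at position i
        for i in range(len(x)):
            for j in range(i):
                if cmp(x[j], x[i]):
                    dp[i] = max(dp[i], dp[j] + x[i])
        return max(dp)
    return max(best(lambda a, b: a <= b), best(lambda a, b: a >= b))

# Alice's near-even split for n = 3 limits Bob to about 2/3 of the coins:
x = [334, 332, 334]
print(best_monotone_sum(x), sum(x))  # 668 out of 1000
```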
AI-generated images continue to be problematic for a number of reasons, but here is one such image that somewhat manages at least to convey the idea of the game:
For small {n}, one can work out {c(n)} by hand. For {n=1}, clearly {c(1)=1}: Alice has to put all the coins into one pile, which Bob simply takes. Similarly {c(2)=1}: regardless of how Alice divides the coins into two piles, the piles will either be increasing or decreasing, so in either case Bob can take both. The first interesting case is {n=3}. Bob can again always take the two largest piles, guaranteeing himself {2/3} of the coins. On the other hand, if Alice almost divides the coins evenly, for instance into piles {((1/3 + \varepsilon)N, (1/3-2\varepsilon) N, (1/3+\varepsilon)N)} for some small {\varepsilon>0}, then Bob cannot take all three piles as they are non-monotone, and so can only take two of them, allowing Alice to limit the payout fraction to be arbitrarily close to {2/3}. So we conclude that {c(3)=2/3}.
An hour after Desmond’s comment, Stijn Cambie noted (though not in the language I used above) that a similar construction to the one above, in which Alice divides the coins into {k^2} almost-even piles, in such a way that the longest monotone sequence is of length {k}, gives the upper bound {c(k^2) \leq 1/k}. It is also easy to see that {c(n)} is a non-increasing function of {n}, so this gives a general bound {c(n) \leq (1+o(1))/\sqrt{n}}. Less than an hour after that, Wouter van Doorn noted that the Hanani result mentioned above gives the lower bound {c(n) \geq (\frac{1}{\sqrt{2}}-o(1))/\sqrt{n}}, and posed the problem of determining the asymptotic limit of {\sqrt{n} c(n)} as {n \rightarrow \infty}, given that this was now known to range between {1/\sqrt{2}-o(1)} and {1+o(1)}. This version was accepted by Thomas Bloom, the moderator of the Erdős problem site, as a valid interpretation of the original problem.
The next day, Stijn computed the first few values of {c(n)} exactly:
\displaystyle 1, 1, 2/3, 1/2, 1/2, 3/7, 2/5, 3/8, 1/3.
While the general pattern was not yet clear, this was enough data for Stijn to conjecture that {c(k^2)=1/k}, which would also imply that {\sqrt{n} c(n) \rightarrow 1} as {n \rightarrow \infty}. (EDIT: as later located by an AI deep research tool, this conjecture was also made in Section 12 of this 1980 article of Steele.) Stijn also described the extremizing sequences for this range of {n}, but did not continue the calculation further (a naive computation would take runtime exponential in {n}, due to the large number of possible subsequences to consider). The problem then lay dormant for almost two months, until December 7, 2025, when Boris Alexeev, as part of a systematic sweep of the Erdős problems using the AI tool Aristotle, was able to get this tool to autonomously solve this conjecture {c(k^2)=1/k} in the proof assistant language Lean. The proof converted the problem to a rectangle-packing problem.
This was one further addition to a recent sequence of examples where an Erdős problem had been automatically solved in one fashion or another by an AI tool. Like the previous cases, the proof turned out to not be particularly novel. Within an hour, Koishi Chan gave an alternate proof deriving the required bound {c(k^2) \geq 1/k} from the original Erdős-Szekeres theorem by a standard “blow-up” argument which we can give here in the Alice-Bob formulation. Take a large {M}, and replace each pile of {x_i} coins with {(1+o(1)) M^2 x_i^2} new piles, each of size {(1+o(1)) x_i}, chosen so that the longest monotone subsequence in this collection is {(1+o(1)) M x_i}. Among all the new piles, the longest monotone subsequence has length {(1+o(1)) M S(x_1,\dots,x_n)}. Applying Erdős-Szekeres, one concludes the bound
\displaystyle M S(x_1,\dots,x_n) \geq (1-o(1)) (\sum_{i=1}^{k^2} M^2 x_i^2)^{1/2}
and on canceling the {M}‘s, sending {M \rightarrow \infty}, and applying Cauchy-Schwarz, one obtains {c(k^2) \geq 1/k} (in fact the argument gives {c(n) \geq 1/\sqrt{n}} for all {n}). Once this proof was found, it was natural to try to see if it had already appeared in the literature. AI deep research tools have successfully located such prior literature in the past, but in this case they did not succeed, and a more “old-fashioned” Google Scholar search turned up some relevant references: a 2016 paper by Tidor, Wang and Yang contained this precise result, citing an earlier paper of Wagner as inspiration for applying “blowup” to the Erdős-Szekeres theorem.
But the story does not end there! Upon reading the above story the next day, I realized that the problem of estimating {c(n)} was a suitable task for AlphaEvolve, which I have used recently as mentioned in this previous post. Specifically, one could task it to obtain upper bounds on {c(n)} by directing it to produce real numbers (or integers) {x_1,\dots,x_n} summing up to a fixed sum (I chose {10^6}) with as small a value of {S(x_1,\dots,x_n)} as possible. After an hour of run time, AlphaEvolve produced the following upper bounds on {c(n)} for {1 \leq n \leq 16}, with some intriguingly structured potential extremizing solutions:
The numerical scores (divided by {10^6}) were pretty obviously trying to approximate simple rational numbers. There were a variety of ways (including modern AI) to extract the actual rational numbers they were close to, but I searched for a dedicated tool and found this useful little web page of John Cook that did the job:

\displaystyle 1, 1, 2/3, 1/2, 1/2, 3/7, 2/5, 3/8, 1/3, 1/3, 4/13, 3/10, 4/14, 3/11, 4/15, 1/4.
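(As an aside, the same rational reconstruction can be done with the Python standard library; the raw scores below are illustrative stand-ins for the optimizer's output, and the denominator cap of 50 is an arbitrary choice.)

```python
from fractions import Fraction

scores = [0.666667, 0.500000, 0.428572, 0.307693]  # hypothetical raw values of S(x)/10^6
for s in scores:
    print(s, Fraction(s).limit_denominator(50))    # -> 2/3, 1/2, 3/7, 4/13
```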
I could not immediately see the pattern here, but after some trial and error in which I tried to align numerators and denominators, I eventually organized this sequence into a more suggestive form:

\displaystyle 1,
\displaystyle 1/1, \mathbf{2/3}, 1/2,
\displaystyle 2/4, \mathbf{3/7}, 2/5, \mathbf{3/8}, 2/6,
\displaystyle 3/9, \mathbf{4/13}, 3/10, \mathbf{4/14}, 3/11, \mathbf{4/15}, 3/12.
This gave a somewhat complicated but predictable conjecture for the values of the sequence {c(n)}. On posting this, Boris found a clean formulation of the conjecture, namely that

\displaystyle c(k^2 + 2a + 1) = \frac{k}{k^2+a} \ \ \ \ \ (1)
whenever {k \geq 1} and {-k \leq a \leq k}. After a bit of effort, he also produced an explicit upper bound construction:

Proposition 1 If {k \geq 1} and {-k \leq a \leq k}, then {c(k^2+2a+1) \leq \frac{k}{k^2+a}}.
Proof: Consider a sequence {x_1,\dots,x_{k^2+2a+1}} of numbers clustered around the “red number” {|a|} and “blue number” {|a+1|}, consisting of {|a|} blocks of {k-|a|} “blue” numbers, followed by {|a+1|} blocks of {|a+1|} “red” numbers, and then {k-|a|} further blocks of {k} “blue” numbers. When {a \geq 0}, one should take all blocks to be slightly decreasing within each block, but the blue blocks should be increasing between each other, and the red blocks should also be increasing between each other. When {a < 0}, all of these orderings should be reversed. The total number of elements is indeed
\displaystyle |a| \times (k-|a|) + |a+1| \times |a+1| + (k-|a|) \times k
\displaystyle = k^2 + 2a + 1
and the total sum is close to\displaystyle |a| \times (k-|a|) \times |a+1| + |a+1| \times |a+1| \times |a|
+ (k-|a|) \times k \times |a+1| = (k^2 + a) |a+1|.
With this setup, one can check that any monotone sequence consists either of at most {|a+1|} red elements and at most {k-|a|} blue elements, or no red elements and at most {k} blue elements, in either case giving a monotone sum that is bounded by either\displaystyle |a+1| \times |a| + (k-|a|) \times |a+1| = k |a+1|
or\displaystyle 0 + k \times |a+1| = k |a+1|,
giving the claim. \Box

Here is a figure illustrating the above construction in the {a \geq 0} case (obtained after starting with a ChatGPT-provided file and then manually fixing a number of placement issues):
Here is a plot of {1/c(n)} (produced by ChatGPT Pro), showing that it is basically a piecewise linear approximation to the square root function:
Shortly afterwards, Lawrence Wu clarified the connection between this problem and a square packing problem, which was also due to Erdős (Problem 106). Let {f(n)} be the least number such that, whenever one packs {n} squares of sidelength {d_1,\dots,d_n} into a square of sidelength {D}, with all sides parallel to the coordinate axes, one has
\displaystyle \sum_{i=1}^n d_i \leq f(n) D.
Proposition 2 For any {n}, one has

\displaystyle c(n) \geq \frac{1}{f(n)}.
Proof: Given {x_1,\dots,x_n} and {1 \leq i \leq n}, let {S_i} be the maximal sum over all increasing subsequences ending in {x_i}, and {T_i} be the maximal sum over all decreasing subsequences ending in {x_i}. For {i < j}, we have either {S_j \geq S_i + x_j} (if {x_j \geq x_i}) or {T_j \geq T_i + x_j} (if {x_j \leq x_i}). In particular, the squares {[S_i-x_i,S_i] \times [T_i-x_i,T_i]} and {[S_j-x_j,S_j] \times [T_j-x_j,T_j]} (of sidelengths {x_i} and {x_j} respectively) are disjoint. These squares pack into the square {[0, S(x_1,\dots,x_n)]^2}, so by definition of {f}, we have
\displaystyle \sum_{i=1}^n x_i \leq f(n) S(x_1,\dots,x_n),
and the claim follows. \Box

This idea of using packing to prove Erdős-Szekeres type results goes back to a 1959 paper of Seidenberg, although it was a discrete rectangle-packing argument that was not phrased in such an elegantly geometric form. It is possible that Aristotle was “aware” of the Seidenberg argument via its training data, as it had incorporated a version of this argument in its proof.
Here is an illustration of the above argument using the AlphaEvolve-provided example
\displaystyle[99998, 99997, 116305, 117032, 116304,
\displaystyle 58370, 83179, 117030, 92705, 99080]
for {n=10} to convert it to a square packing (image produced by ChatGPT Pro):
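Here is a small Python sketch (illustrative, not taken from any of the cited sources) that carries out this conversion on the example above: it computes the quantities {S_i, T_i} from the proof of Proposition 2 by dynamic programming, builds the squares {[S_i-x_i,S_i]\times[T_i-x_i,T_i]}, and checks that they have pairwise disjoint interiors and fit inside {[0,S(x_1,\dots,x_n)]^2}.

```python
def packing_from_sequence(x):
    """Seidenberg-style packing: one axis-parallel square of sidelength x[i] per term."""
    n = len(x)
    S = list(x)  # S[i]: max sum of a weakly increasing subsequence ending at i
    T = list(x)  # T[i]: max sum of a weakly decreasing subsequence ending at i
    for i in range(n):
        for j in range(i):
            if x[j] <= x[i]:
                S[i] = max(S[i], S[j] + x[i])
            if x[j] >= x[i]:
                T[i] = max(T[i], T[j] + x[i])
    squares = [((S[i] - x[i], T[i] - x[i]), x[i]) for i in range(n)]  # (corner, sidelength)
    return squares, max(S + T)  # max(S + T) is the best monotone sum S(x)

def interiors_overlap(sq1, sq2):
    (a1, b1), s1 = sq1
    (a2, b2), s2 = sq2
    return a1 < a2 + s2 and a2 < a1 + s1 and b1 < b2 + s2 and b2 < b1 + s1

x = [99998, 99997, 116305, 117032, 116304, 58370, 83179, 117030, 92705, 99080]
squares, big = packing_from_sequence(x)
assert not any(interiors_overlap(squares[i], squares[j])
               for i in range(len(x)) for j in range(i))
assert all(0 <= a and 0 <= b and a + s <= big and b + s <= big for (a, b), s in squares)
print("total =", sum(x), "; packed into a square of side S(x) =", big)
```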
At this point, Lawrence performed another AI deep research search, this time successfully locating a paper from just last year by Baek, Koizumi, and Ueoro, where they show that
Theorem 3 For any {k \geq 1}, one has

\displaystyle f(k^2+1) \leq k
which, when combined with a previous argument of Praton, implies
Theorem 4 For any {k \geq 1} and {c \in {\bf Z}} with {k^2+2c+1 \geq 1}, one has

\displaystyle f(k^2+2c+1) \leq k + \frac{c}{k}.
This proves the conjecture!
There just remained the issue of putting everything together. I did feed all of the above information into a large language model, which was able to produce a coherent proof of (1) assuming the results of Baek-Koizumi-Ueoro and Praton. Of course, LLM outputs are prone to hallucination, so it would be preferable to formalize that argument in Lean, but this looks quite doable with current tools, and I expect this to be accomplished shortly. But I was also able to reproduce the arguments of Baek-Koizumi-Ueoro and Praton, which I include below for completeness.
Proof: (Proof of Theorem 3, adapted from Baek-Koizumi-Ueoro) We can normalize {D=k}. It then suffices to show that if we pack the length {k} torus {({\bf R}/k{\bf Z})^2} by {k^2+1} axis-parallel squares of sidelength {d_1,\dots,d_{k^2+1}}, then
\displaystyle \sum_{i=1}^{k^2+1} d_i \leq k^2.
Pick {x_0, y_0 \in {\bf R}/k{\bf Z}}. Then we have a {k \times k} grid
\displaystyle (x_0 + {\bf Z}) \times (y_0 + {\bf Z}) \pmod {k{\bf Z}^2}
inside the torus. The {i^{th}} square, when restricted to this grid, becomes a discrete rectangle {A_{i,x_0} \times B_{i,y_0}} for some finite sets {A_{i,x_0}, B_{i,y_0}} with\displaystyle |\# A_{i,x_0} -\# B_{i,y_0}| \leq 1. \ \ \ \ \ (2)
By the packing condition, we have\displaystyle \sum_{i=1}^{k^2+1} \# A_{i,x_0} \# B_{i,y_0} \leq k^2.
From (2) we have\displaystyle (\# A_{i,x_0} - 1) (\# B_{i,y_0} - 1) \geq 0
hence\displaystyle \# A_{i,x_0} \# B_{i,y_0} \geq \# A_{i,x_0} + \# B_{i,y_0} - 1.
Inserting this bound and rearranging, we conclude that\displaystyle \sum_{i=1}^{k^2+1} \# A_{i,x_0} + \sum_{i=1}^{k^2+1} \# B_{i,y_0} \leq 2k^2 + 1.
Taking the supremum over {x_0,y_0} we conclude that\displaystyle \sup_{x_0} \sum_{i=1}^{k^2+1} \# A_{i,x_0} + \sup_{y_0} \sum_{i=1}^{k^2+1} \# B_{i,y_0} \leq 2k^2 + 1
so by the pigeonhole principle one of the summands is at most {k^2}. Let’s say it is the former, thus\displaystyle \sup_{x_0} \sum_{i=1}^{k^2+1} \# A_{i,x_0} \leq k^2.
In particular, the average value of {\sum_{i=1}^{k^2+1} \# A_{i,x_0}} is at most {k^2}. But this average can be computed to be {\sum_{i=1}^{k^2+1} d_i} (since, for each {i}, the number of points of {x_0+{\bf Z}} lying in an interval of length {d_i} averages to exactly {d_i} as {x_0} varies), giving the claim. Similarly if it is the other sum. \Box

UPDATE: Actually, the above argument also proves Theorem 4 with only minor modifications. Nevertheless, we give the original derivation of Theorem 4 using the embedding argument of Praton below for the sake of completeness.
Proof: (Proof of Theorem 4, adapted from Praton) We write {c = \varepsilon |c|} with {\varepsilon = \pm 1}. We can rescale so that the square one is packing into is {[0,k]^2}. Thus, we pack {k^2+2\varepsilon |c|+1} squares of sidelength {d_1,\dots,d_{k^2+2\varepsilon |c|+1}} into {[0,k]^2}, and our task is to show that
\displaystyle \sum_{i=1}^{k^2+2\varepsilon|c|+1} d_i \leq k^2 + \varepsilon |c|.
We pick a large natural number {N} (in particular, larger than {k}), and consider the three nested squares\displaystyle [0,k]^2 \subset [0,N]^2 \subset [0,N + |c| \frac{N}{N-\varepsilon}]^2.
We can pack {[0,N]^2 \backslash [0,k]^2} by {N^2-k^2} unit squares. We can similarly pack\displaystyle [0,N + |c| \frac{N}{N-\varepsilon}]^2 \backslash [0,N]^2
\displaystyle =[0, \frac{N}{N-\varepsilon} (N+|c|-\varepsilon)]^2 \backslash [0, \frac{N}{N-\varepsilon} (N-\varepsilon)]^2
into {(N+|c|-\varepsilon)^2 - (N-\varepsilon)^2} squares of sidelength {\frac{N}{N-\varepsilon}}. All in all, this produces\displaystyle k^2+2\varepsilon |c|+1 + N^2-k^2 + (N+|c|-\varepsilon)^2 - (N-\varepsilon)^2
\displaystyle = (N+|c|)^2 + 1
squares, of total length\displaystyle (\sum_{i=1}^{k^2+2\varepsilon |c|+1} d_i) +(N^2-k^2) + ((N+|c|-\varepsilon)^2 - (N-\varepsilon)^2) \frac{N}{N-\varepsilon}.
Applying Theorem 3, we conclude that\displaystyle (\sum_{i=1}^{k^2+2\varepsilon |c|+1} d_i) +(N^2-k^2)
\displaystyle + ((N+|c|-\varepsilon)^2 - (N-\varepsilon)^2) \frac{N}{N-\varepsilon} \leq (N+|c|) (N + |c| \frac{N}{N-\varepsilon}).
The right-hand side is\displaystyle N^2 + 2|c| N + |c|^2 + \varepsilon |c| + O(1/N)
and the left-hand side similarly evaluates to\displaystyle (\sum_{i=1}^{k^2+2c+1} d_i) + N^2 -k^2 + 2|c| N + |c|^2 + O(1/N)
and so we simplify to\displaystyle \sum_{i=1}^{k^2+2\varepsilon |c|+1} d_i \leq k^2 + \varepsilon |c| + O(1/N).
Sending {N \rightarrow \infty}, we obtain the claim. \Box

One striking feature of this story for me is how important it was to have a diverse set of people, literature, and tools to attack this problem. To be able to state and prove the precise formula for {c(n)} required multiple observations, including some version of the following:
- The sequence can be numerically computed as a sequence of rational numbers.
- When appropriately normalized and arranged, visible patterns in this sequence appear that allow one to conjecture the form of the sequence.
- This problem is a weighted version of the Erdős-Szekeres theorem.
- Among the many proofs of the Erdős-Szekeres theorem is the proof of Seidenberg in 1959, which can be interpreted as a discrete rectangle packing argument.
- This problem can be reinterpreted as a continuous square packing problem, and in fact is closely related to (a generalized axis-parallel form of) Erdős problem 106, which concerns such packings.
- The axis-parallel form of Erdős problem 106 was recently solved by Baek-Koizumi-Ueoro.
- The paper of Praton shows that Erdős Problem 106 implies the generalized version needed for this problem. This implication specializes to the axis-parallel case.
Another key ingredient was the balanced AI policy on the Erdős problem website, which encourages disclosed AI usage while strongly discouraging undisclosed use. To quote from that policy: “Comments prepared with the assistance of AI are permitted, provided (a) this is disclosed, (b) the contents (including mathematics, code, numerical data, and the existence of relevant sources) have been carefully checked and verified by the user themselves without the assistance of AI, and (c) the comment is not unreasonably long.”
Quantitative correlations and some problems on prime factors of consecutive integers
1 December, 2025 in math.NT, paper | Tags: correlation, Joni Teravainen, prime divisors | by Terence Tao | 4 comments
Joni Teravainen and I have uploaded to the arXiv our paper “Quantitative correlations and some problems on prime factors of consecutive integers“. This paper applies modern analytic number theory tools – most notably, the Maynard sieve and the recent correlation estimates for bounded multiplicative functions of Pilatte – to resolve (either partially or fully) some old problems of Erdős, Straus, Pomerance, Sárközy, and Hildebrand, mostly regarding the prime factor counting function
\displaystyle \omega(n) := \sum_{p|n} 1
and its relatives. The famous Hardy–Ramanujan and Erdős–Kac laws tell us that asymptotically for {n \sim x}, {\omega(n)} should behave like a gaussian random variable with mean and variance both close to {\log\log x}; but the question of the joint distribution of consecutive values such as {\omega(n), \omega(n+1)} is still only partially understood. Aside from some lower order correlations at small primes (arising from such observations as the fact that precisely one of {n,n+1} will be divisible by {2}), the expectation is that such consecutive values behave like independent random variables. As an indication of the state of the art, it was recently shown by Charamaras and Richter that any bounded observables {f(\omega(n))}, {g(\omega(n+1))} will be asymptotically decorrelated in the limit {n \rightarrow \infty} if one performs a logarithmic statistical averaging. Roughly speaking, this confirms the independence heuristic at the scale {\sqrt{\log\log x}} of the standard deviation, but does not resolve finer-grained information, such as precisely estimating the probability of the event {\omega(n)=\omega(n+1)}. Our first result, answering a question of Erdős, shows that there are infinitely many {n} for which one has the bound
\displaystyle \omega(n+k) \ll k
for all {k \geq 1}. For {k \gg \log\log n}, such a bound is already to be expected (though not completely universal) from the Hardy–Ramanujan law; the main difficulty is thus with the short shifts {k = o(\log\log n)}. If one only had to demonstrate this type of bound for a bounded number of {k}, then this type of result would be well within standard sieve theory methods, which can make any bounded number of shifts {n+k} “almost prime” in the sense that {\omega(n+k)} becomes bounded. Thus the problem is that the “sieve dimension” {\sim \log\log n} grows (slowly) with {n}. When writing about this problem in 1980, Erdős and Graham write “we just know too little about sieves to be able to handle such a question (“we” here means not just us but the collective wisdom (?) of our poor struggling human race)”. However, with the advent of the Maynard sieve (also sometimes referred to as the Maynard–Tao sieve), it turns out to be possible to sieve for the conditions {\omega(n+k) \ll k} for all {k = o(\log\log n)} simultaneously (roughly speaking, by sieving out any {n} for which {n+k} is divisible by a prime {p \ll x^{1/\exp(Ck)}} for a large {C}), and then performing a moment calculation analogous to the standard proof (due to Turán) of the Hardy–Ramanujan law, but weighted by the Maynard sieve. (In order to get good enough convergence, one needs to control fourth moments as well as second moments, but these are standard, if somewhat tedious, calculations.)
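As a quick illustration of the Hardy–Ramanujan scale mentioned above (my own sketch, not from the paper), one can tabulate {\omega(n)} by a sieve and compare its average over {n \leq x} with {\log\log x}:

```python
import math

def omega_up_to(x):
    """omega[n] = number of distinct prime factors of n, for 0 <= n <= x."""
    omega = [0] * (x + 1)
    for p in range(2, x + 1):
        if omega[p] == 0:            # p has no smaller prime factor, so p is prime
            for m in range(p, x + 1, p):
                omega[m] += 1
    return omega

x = 10**6
omega = omega_up_to(x)
print(sum(omega[1:]) / x, math.log(math.log(x)))
# roughly 2.88 versus 2.63 for x = 10^6; the gap is a bounded (Mertens-type) constant
```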
Our second result, which answers a separate question of Erdős, establishes that the quantity
\displaystyle \sum_n \frac{\omega(n)}{2^n} = 0.5169428\dots
is irrational; this had recently been established by Pratt under the assumption of the prime tuples conjecture, but we are able to establish this result unconditionally. The binary expansion of this number is of course closely related to the distribution of {\omega}, but in view of the Hardy–Ramanujan law, the {n^{th}} digit of this number is influenced by about {\log\log\log n} nearby values of {\omega}, which is too many correlations for current technology to handle. However, it is possible to do some “Gowers norm” type calculations to decouple things to the point where pairwise correlation information is sufficient. To see this, suppose for contradiction that this number was a rational {a/q}, thus

\displaystyle q \sum_n \frac{\omega(n)}{2^n} = 0 \hbox{ mod } 1.
Multiplying by {2^n}, we obtain some relations between shifts {\omega(n+h)}:\displaystyle q \sum_{h=1}^\infty \frac{\omega(n+h)}{2^h} = 0 \hbox{ mod } 1.
Using the additive nature of {\omega}, one then also gets similar relations on arithmetic progressions, for many {n} and {p}:\displaystyle q \sum_{h=1}^\infty \frac{\omega(n+ph)}{2^h} = 0 \hbox{ mod } 1.
Taking alternating sums of this sort of identity for various {n} and {p} (in analogy to how averages involving arithmetic progressions can be related to Gowers norm-type expressions over cubes), one can eventually eliminate the contribution of small {H}, and arrive at an identity of the form

\displaystyle q \sum_{h=1}^\infty \frac{\sum_{\epsilon \in \{0,1\}^K} (-1)^{|\epsilon|} \omega(n + r_{\epsilon,h+K})}{2^{h+K}} = 0 \hbox{ mod } 1 \ \ \ \ \ (1)
for many {n}, where {K} is a parameter (we eventually take {K \sim \log\log\log\log n}) and {r_{\epsilon,h+K}} are various shifts that we will not write out explicitly here. This looks like quite a messy expression; however, one can adapt proofs of the Erdős–Kac law and show that, as long as one ignores the contribution of really large prime factors (of order {\gg n^{1/10}}, say) to the {\omega(n + r_{\epsilon,h+K})}, this sort of sum behaves like a gaussian, and in particular once one can show a suitable local limit theorem, one can contradict (1). The contribution of the large prime factors does cause a problem though, as a naive application of the triangle inequality bounds this contribution by {O(1)}, which is an error that overwhelms the information provided by (1). To resolve this we have to adapt the pairwise correlation estimates of Pilatte mentioned earlier to demonstrate that these contributions are in fact {o(1)}. Here it is important that the error estimates of Pilatte are quite strong (of order {O(\log^{-c} n)}); previous correlation estimates of this type (such as those used in this earlier paper with Joni) turn out to be too weak for this argument to close. Our final result concerns the asymptotic behavior of the density
\displaystyle \frac{1}{x} \# \{n \leq x: \omega(n+1) = \omega(n)\}
(we also address similar questions for {\Omega(n+1)=\Omega(n)} and {\tau(n+1)=\tau(n)}). Heuristic arguments led Erdős, Pomerance, and Sárközy to conjecture that this quantity was asymptotically {\frac{1}{2\sqrt{\pi \log\log x}}}. They were able to establish an upper bound of {O(1/\log\log x)}, while Hildebrand obtained a lower bound of {\gg 1/(\log\log x)^3}. Here, we obtain the asymptotic for almost all {x} (the limitation here is the standard one, which is that the current technology on pairwise correlation estimates either requires logarithmic averaging, or is restricted to almost all scales rather than all scales). Roughly speaking, the idea is to use the circle method to rewrite the above density in terms of expressions
for various frequencies {\alpha}, use the estimates of Pilatte to handle the minor arc {\alpha}, and convert the major arc contribution back into physical space (in which {\omega(n+1)} and {\omega(n)} are now permitted to differ by a large amount) and use more traditional sieve theoretic methods to estimate the result.

Growth rates of sequences governed by the squarefree properties of its translates
1 December, 2025 in math.NT, paper | Tags: Erdos, Wouter van Doorn | by Terence Tao | 3 comments
Wouter van Doorn and I have uploaded to the arXiv our paper “Growth rates of sequences governed by the squarefree properties of its translates“. In this paper we answer a number of questions of Erdős (Problem 1102 and Problem 1103 on the Erdős problem web site) regarding how quickly a sequence {A = \{a_1 < a_2 < \dots\}} of increasing natural numbers can grow if one constrains its translates {n+A} to interact with the set {\mathcal{SF} = \{1,2,3,5,6,7,10,\dots\}} of squarefree numbers in various ways. For instance, Erdős defined a sequence {A} to have “Property {P}” if each of its translates {n+A} only intersected {\mathcal{SF}} in finitely many points. Erdős believed this to be quite a restrictive condition on {A}, writing “Probably a sequence having property P must increase fairly fast, but I have no results in this direction.” Perhaps surprisingly, we show that while these sequences must be of density zero, they can in fact grow arbitrarily slowly in the sense that one can have {a_j \leq j f(j)} for all sufficiently large {j} and any specified function {f(j)} that tends to infinity as {j \rightarrow \infty}. For instance, one can find a sequence that grows like {O(j \log\log\log j)}. The density zero claim can be proven by a version of the Maier matrix method, and also follows from known moment estimates on the gaps between squarefree numbers; the latter claim is proven by a greedy construction in which one slowly imposes more and more congruence conditions on the sequence to ensure that various translates of the sequence stop being squarefree after a certain point.
Erdős also defined a somewhat complementary property {Q}, which asserts that for infinitely many {n}, all the elements {n+a} of {n+A} with {a \leq n} are squarefree. Since the squarefree numbers themselves have density {6/\pi^2}, it is easy to see that a sequence with property {Q} must have (upper) density at most {6/\pi^2} (because it must be “admissible” in the sense of avoiding one residue class modulo {p^2} for each {p}). Erdős observed that any sufficiently rapidly growing (admissible) sequence would obey property {Q}, but beyond that, Erdős writes “I have no precise information about the rate of increase a sequence having property Q must have.” Our results in this direction may also be surprising: we show that there exist sequences with property {Q} with density exactly {6/\pi^2} (or equivalently, {a_j \sim \frac{\pi^2}{6} j}). This requires a recursive sieve construction, in which one starts with an initial scale {n} and finds a much larger number {n'} such that {n'+a} is squarefree for most of the squarefree numbers {a \leq n'} (and all of the squarefree numbers {a \leq n}). We quantify Erdős’s remark by showing that an (admissible) sequence will necessarily obey property {Q} once it grows significantly faster than {\exp( C j \log j)}, but need not obey this property if it only grows like {\exp(O(j^{1/2} \log^{1/2} j))}. This is achieved through further application of sieve methods.
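A toy version of the first step of this recursive construction (my own illustration; the choice {n=10} is arbitrary) can be carried out in a few lines of Python: starting from the squarefree numbers up to {n}, one searches for an {n'} whose translate keeps all of them squarefree.

```python
def squarefree(m):
    d = 2
    while d * d <= m:
        if m % (d * d) == 0:
            return False
        d += 1
    return True

n = 10
base = [a for a in range(1, n + 1) if squarefree(a)]      # 1, 2, 3, 5, 6, 7, 10
nprime = next(m for m in range(n + 1, 10**6)
              if all(squarefree(m + a) for a in base))
print(nprime, [nprime + a for a in base])                  # 28, and every 28 + a is squarefree
```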
A third property studied by Erdős is the property of having squarefree sums, so that {a_i + a_j} is squarefree for all {i,j}. Erdős writes, “In fact one can find a sequence which grows exponentially. Must such a sequence really increase so fast? I do not expect that there is such a sequence of polynomial growth.” Here our results are relatively weak: we can construct such a sequence that grows like {\exp(O(j \log j))}, but do not know if this is optimal; the best lower bound we can produce on the growth, coming from the large sieve, is {\gg j^{4/3}}. (Somewhat annoyingly, the precise form of the large sieve inequality we needed was not in the literature, so we have an appendix supplying it.) We suspect that further progress on this problem requires advances in inverse sieve theory.
A weaker property than squarefree sums (but stronger than property {Q}), referred to by Erdős as property {\overline{P}}, asserts that there are infinitely many {n} such that all elements of {n+A} (not just the small ones) are squarefree. Here, the situation is close to, but not quite the same as, that for property {Q}; we show that sequences with property {\overline{P}} must have upper density strictly less than {6/\pi^2}, but can have density arbitrarily close to this value.
Finally, we looked at a further question of Erdős on the size of an admissible set {A}. Because the squarefree numbers are admissible, the maximum number {A(x)} of elements of an admissible set {A} up to {x} (OEIS A083544) is at least the number {|{\mathcal SF} \cap [x]|} of squarefree elements up to {x} (A013928). It was observed by Ruzsa that the former sequence is greater than the latter for infinitely many {x}. Erdős asked, “Probably this holds for all large x. It would be of some interest to estimate A(x) as accurately as possible.”
We are able to show
\displaystyle \frac{\sqrt{x}}{\log x} \ll A(x) - \frac{6}{\pi^2} x \ll x^{4/5},
with the upper bound coming from the large sieve and the lower bound from a probabilistic construction. In contrast, a classical result of Walfisz shows that\displaystyle |{\mathcal SF} \cap [x]| - \frac{6}{\pi^2} x \ll x^{1/2} \exp(-c \log^{3/5} x / (\log\log x)^{1/5}).
Together, these bounds imply that Erdős’s conjecture {A(x) > |{\mathcal SF} \cap [x]|} holds for all sufficiently large {x}. Numerically, it appears that in fact the conjecture holds for all {x>17}; however, we do not currently have enough numerical data for the sequence {A(x)} to completely confirm the conjecture in all cases. This could potentially be a crowdsourced project (similar to the Erdős-Guy-Selfridge project reported on in this previous blog post).
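For readers who would like to experiment numerically, here is a small Python sketch (illustrative only; it makes no attempt to compute the much harder quantity {A(x)}) that compares the squarefree counting function {|{\mathcal SF} \cap [x]|} with the main term {\frac{6}{\pi^2} x}:

import math

def squarefree_count(x):
    # Sieve out the multiples of p^2 for every p with p^2 <= x
    # (sieving over composite p as well is wasteful but harmless).
    squarefree = [True] * (x + 1)
    for p in range(2, math.isqrt(x) + 1):
        for m in range(p * p, x + 1, p * p):
            squarefree[m] = False
    return sum(squarefree[1:])

for x in [10**3, 10**4, 10**5, 10**6]:
    print(x, squarefree_count(x), round(6 / math.pi**2 * x, 1))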
Call for industry sponsors for IPAM’s RIPS program
24 November, 2025 in advertising | Tags: ipam | by Terence Tao | 6 comments
Over the last 25 years, the Institute for Pure and Applied Mathematics (IPAM) at UCLA (where I am now director of special projects) has run the popular Research in Industrial Projects for Students (RIPS) program every summer, in which industry sponsors with research projects are matched with talented undergraduates and a postdoctoral mentor to work at IPAM on the sponsored project for nine weeks. IPAM is now putting out a call for industry sponsors who can contribute a suitable research project for the summer of 2026, as well as funding to cover the costs of the research; details are available here.
(Student applications for this program will open at a later date, once the list of projects is finalized.)
Climbing the cosmic distance ladder: another sample chapter
21 November, 2025 in advertising, book, update | Tags: cosmic distance ladder, Tanya Klowden | by Terence Tao | 14 comments
Five years ago, I announced a popular science book project with Tanya Klowden on the cosmic distance ladder, in which we released a sample draft chapter of the book, covering the “fourth rung” of the ladder, which for us meant the distances to the planets. In the intervening time, a number of unexpected events have slowed down this project significantly; but I am happy to announce that we have completed a second draft chapter, this time on the “seventh rung” of measuring distances across the Milky Way, which required the maturation of the technologies of photography and spectroscopy, as well as the dawn of the era of “big data” in the early twentieth century, as exemplified for instance by the “Harvard computers“.
We welcome feedback of course, and are continuing to work to complete the book despite the various delays. In the meantime, you can check out our Instagram account for the project, or the pair of videos that Grant Sanderson (3blue1brown) produced with us on this topic, which I previously blogged about here.
Thanks to Clio Cresswell, Riley Tao, and Noah Klowden for comments and corrections.
Sum-difference exponents for boundedly many slopes, and rational complexity
19 November, 2025 in math.CO, paper | Tags: additive combinatorics, AlphaEvolve, entropy, Kakeya conjecture | by Terence Tao | 11 comments
I have uploaded to the arXiv my paper “Sum-difference exponents for boundedly many slopes, and rational complexity“. This is the second spinoff of my previous project with Bogdan Georgiev, Javier Gómez–Serrano, and Adam Zsolt Wagner that I recently posted about. One of the many problems we experimented on with the AlphaEvolve tool was that of computing sum-difference constants. While AlphaEvolve did modestly improve one of the known lower bounds on sum-difference constants, it also revealed an asymptotic behavior of these constants that had not been previously observed, which I then gave a rigorous demonstration of in this paper.
In the original formulation of the sum-difference problem, one is given a finite subset {E} of {{\bf R}^2} with some control on projections, such as
\displaystyle |\{ a: (a,b) \in E \}| \leq N
\displaystyle |\{ b: (a,b) \in E \}| \leq N
\displaystyle |\{ a+b: (a,b) \in E \}| \leq N
and one then asks to obtain upper bounds on the quantity\displaystyle |\{ a-b: (a,b) \in E \}|. \ \ \ \ \ (1)
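To illustrate the setup, here is a short Python sketch (with a randomly generated set {E}, purely for illustration) computing the four projection counts appearing above:

import random

random.seed(0)
# A random finite set E of integer pairs (a, b), purely illustrative.
E = {(random.randrange(100), random.randrange(100)) for _ in range(200)}

proj_a = {a for (a, b) in E}
proj_b = {b for (a, b) in E}
proj_sum = {a + b for (a, b) in E}
proj_diff = {a - b for (a, b) in E}

# The sum-difference problem: if the first three counts are at most N,
# how large can the fourth be?  Trivially it is at most N^2.
print(len(proj_a), len(proj_b), len(proj_sum), len(proj_diff))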
This is related to Kakeya sets because if one joins a line segment between {(a,0)} and {(b,1)} for every {(a,b) \in E}, one gets a family of line segments whose set of directions has cardinality equal to (1), but whose slices at heights {0,1,1/2} have cardinality at most {N}.
Because {a-b} is clearly determined by {a} and {b}, one can trivially get an upper bound of {N^2} on (1). In 1999, Bourgain utilized what was then the very recent “Balog–Szemerédi–Gowers lemma” to improve this bound to {N^{2-\frac{1}{13}}}, which gave a new lower bound of {\frac{13d+12}{25}} on the (Minkowski) dimension of Kakeya sets in {{\bf R}^d}, improving upon the previous bounds of Tom Wolff in high dimensions. (A side note: Bourgain challenged Tom to also obtain a result of this form, but when they compared notes, Tom obtained the slightly weaker bound of {N^{2-\frac{1}{14}}}, which gave Jean great satisfaction.) Currently, the best upper bound known for this quantity is {N^{2-\frac{1}{6}}}.
One can get better bounds by adding more projections. For instance, if one also assumes
\displaystyle |\{ a+2b: (a,b) \in E \}| \leq N
then one can improve the upper bound for (1) to {N^{2-\frac{1}{4}}}. The arithmetic Kakeya conjecture asserts that, by adding enough projections, one can get the exponent arbitrarily close to {1}. If one could achieve this, it would imply the Kakeya conjecture in all dimensions. Unfortunately, even with arbitrarily many projections, the best exponent we can reach asymptotically is {1.67513\dots}.
It was observed by Ruzsa that all of these questions can be equivalently formulated in terms of Shannon entropy. For instance, the upper bound of {N^{2-\frac{1}{6}}} on (1) turns out to be equivalent to the entropy inequality
\displaystyle {\bf H}(X-Y) \leq (2-\frac{1}{6}) \max( {\bf H}(X), {\bf H}(Y) , {\bf H}(X+Y) )
holding for all discrete random variables {X, Y} (not necessarily independent) taking values in {{\bf R}^2}. In the language of this paper, we write this as\displaystyle SD(\{0,1,\infty\}; -1) \leq 2-\frac{1}{6}.
Similarly we have\displaystyle SD(\{0,1,2,\infty\}; -1) \leq 2-\frac{1}{4}.
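As a quick numerical illustration of the entropy formulation (this merely evaluates the relevant quantities for one example distribution, with scalar values for simplicity; it is not a verification of the stated inequalities), one can compute the entropies of {X}, {Y}, {X+Y}, and {X-Y} for {(X,Y)} drawn uniformly from a finite set of pairs:

import math
from collections import Counter

def entropy(values):
    # Shannon entropy (in bits) of the empirical distribution of `values`.
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# (X, Y) drawn uniformly from an illustrative finite set of integer pairs.
pairs = [(a, b) for a in range(8) for b in range(8) if a <= b]
X = [a for (a, b) in pairs]
Y = [b for (a, b) in pairs]
X_plus_Y = [a + b for (a, b) in pairs]
X_minus_Y = [a - b for (a, b) in pairs]

ratio = entropy(X_minus_Y) / max(entropy(X), entropy(Y), entropy(X_plus_Y))
print(ratio, "which should not exceed", 2 - 1 / 6)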
As part of the AlphaEvolve experiments, we directed this tool to obtain lower bounds for {SD(\{0,1,\infty\}; \frac{a}{b})} for various rational numbers {\frac{a}{b}}, defined as the best constant in the inequality
\displaystyle {\bf H}(X+\frac{a}{b} Y) \leq SD(\{0,1,\infty\}; \frac{a}{b}) \max( {\bf H}(X), {\bf H}(Y) , {\bf H}(X+Y) ).
We did not figure out a way for AlphaEvolve to efficiently establish upper bounds on these quantities, so the lower bounds provided by AlphaEvolve were of unknown accuracy. Nevertheless, they were sufficient to give a strong indication that these constants decayed logarithmically to {2} as {|a|+|b| \rightarrow \infty}.
The first main result of this paper is to confirm that this is indeed the case, in that
\displaystyle 2 - \frac{c_2}{\log(2+|a|+|b|)} \leq SD(\{0,1,\infty\}; \frac{a}{b}) \leq 2 - \frac{c_1}{\log(2+|a|+|b|)}
whenever {a/b} is in lowest terms and not equal to {0, 1, \infty}, where {c_1,c_2>0} are absolute constants. The lower bound was obtained by observing the shape of the examples produced by AlphaEvolve, which resembled a discrete Gaussian on a certain lattice determined by {a,b}. The upper bound was established by an application of the “entropic Plünnecke–Ruzsa calculus”, relying particularly on the entropic Ruzsa triangle inequality, the entropic Balog–Szemerédi–Gowers lemma, as well as an entropy form of an inequality of Bukh.
The arguments also apply to settings where there are more projections under control than just the {0,1,\infty} projections. If one also controls the projections {X + r_i Y} for various rationals {r_1,\dots,r_k}, and {R} denotes the set of slopes of the projections under control, then it turns out that the associated sum-difference constant {SD(\{0,1,\infty,r_1,\dots,r_k\}; s)} still decays to {2}, but now the key parameter is not the height {|a|+|b|} of {s}, but rather what I call the rational complexity of {s} with respect to {R}, defined as the smallest integer {D} for which one can write {s} as a ratio {P(r_1,\dots,r_k)/Q(r_1,\dots,r_k)}, where {P,Q} are integer-coefficient polynomials of degree at most {D} with coefficients of size at most {2^D}. Specifically, {SD(\{0,1,\infty,r_1,\dots,r_k\}; s)} decays to {2} at a polynomial rate in {D}, although I was not able to pin down the exponent of this decay exactly. The concept of rational complexity may seem somewhat artificial, but it roughly speaking measures how difficult it is to use the entropic Plünnecke–Ruzsa calculus to pass from control of {X, Y, X+Y}, and {X+r_i Y} to control of {X+sY}.
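To illustrate the notion of rational complexity, here is a brute-force Python sketch (my own illustration rather than anything from the paper, restricted for simplicity to a single slope {r} and to very small values of {D}, since the search space grows very quickly with {D}) that looks for the smallest {D} with {s = P(r)/Q(r)} for integer polynomials {P,Q} of degree at most {D} and coefficients of size at most {2^D}:

from fractions import Fraction
from itertools import product

def rational_complexity(s, r, max_D=2):
    # Smallest D <= max_D such that s = P(r)/Q(r) for integer-coefficient
    # polynomials P, Q of degree at most D with coefficients of absolute
    # value at most 2**D.  Brute force; only feasible for tiny D.
    for D in range(max_D + 1):
        bound = 2 ** D
        powers = [r ** i for i in range(D + 1)]
        # Map each attainable value P(r) to one representative coefficient vector.
        values = {}
        for coeffs in product(range(-bound, bound + 1), repeat=D + 1):
            values.setdefault(sum(c * p for c, p in zip(coeffs, powers)), coeffs)
        for q_val, q_coeffs in values.items():
            if q_val != 0 and s * q_val in values:
                return D, values[s * q_val], q_coeffs
    return None

# Example: s = 4/3 has complexity 1 with respect to r = 2, since for
# instance 4/3 = (2r)/(1+r) when r = 2.
print(rational_complexity(Fraction(4, 3), Fraction(2)))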
While this work does not make noticeable advances towards the arithmetic Kakeya conjecture (we only consider regimes where the sum-difference constant is close to {2}, rather than close to {1}), it does highlight the fact that these constants are extremely arithmetic in nature, in that the influence of projections {X+r_iY} on {X+sY} is highly dependent on how efficiently one can represent {s} as a rational combination of the {r_i}.
New Nikodym set constructions over finite fields
12 November, 2025 in math.CO, paper | Tags: AlphaEvolve, finite fields, Nikodym set, probabilistic method | by Terence Tao | 14 comments
I have uploaded to the arXiv my paper “New Nikodym set constructions over finite fields“. This is a spinoff of my previous project with Bogdan Georgiev, Javier Gómez–Serrano, and Adam Zsolt Wagner that I recently posted about. In that project we experimented with using AlphaEvolve (and other tools, such as DeepThink and AlphaProof) to explore various mathematical problems which were connected somehow to an optimization problem. For one of these — the finite field Nikodym set problem — these experiments led (by a somewhat convoluted process) to an improved asymptotic construction of such sets, the details of which are written up (by myself rather than by AI tools) in this paper.
Let {{\mathbb F}_q} be a finite field of some order {q} (which must be a prime or a power of a prime), and let {d} be a fixed dimension. A Nikodym set in {{\mathbb F}_q^d} is a subset {N} of {{\mathbb F}_q^d} with the property that for every point {x \in {\mathbb F}_q^d}, there exists a line {\ell} passing through {x} such that all points of {\ell} other than {x} lie in {N}. Such sets are close cousins of Kakeya sets (which contain a line in every direction); indeed, roughly speaking, applying a random projective transformation to a Nikodym set will yield (most of) a Kakeya set. As a consequence, any lower bound on Kakeya sets implies a similar bound on Nikodym sets; in particular, one has a lower bound
\displaystyle |N| \geq \frac{q^d}{2^{d-1}} + O(q^{d-1})
on the size of a Nikodym set {N}, coming from a similar bound on Kakeya sets due to Bukh and Chao using the polynomial method.
For Kakeya sets, Bukh and Chao showed this bound to be sharp up to the lower order error {O(q^{d-1})}; but for Nikodym sets it is conjectured that in fact such sets should asymptotically have full density, in the sense that
\displaystyle |N| \geq q^d - o(q^d).
This is known in two dimensions thanks to work by Szönyi et al. on blocking sets, and was also established in bounded torsion cases (and in particular for even {q}) by Guo, Kopparty, and Sudan by combining the polynomial method with the theory of linear codes. Outside of these cases, however, the conjecture remains open in three and higher dimensions.
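To make the definition concrete, here is a short Python sketch (illustrative only) which checks the Nikodym property by brute force in the plane {{\mathbb F}_q^2} for a prime {q}, and verifies it for the trivial example of the plane with a single point removed:

def is_nikodym_plane(N, q):
    # N is a set of points (x, y) with coordinates in {0, ..., q-1}, q prime.
    # Through every point x there must be a line all of whose points *other
    # than x* lie in N.
    directions = [(1, t) for t in range(q)] + [(0, 1)]
    for x in range(q):
        for y in range(q):
            if not any(all(((x + s * dx) % q, (y + s * dy) % q) in N
                           for s in range(1, q))
                       for (dx, dy) in directions):
                return False
    return True

q = 7
plane = {(x, y) for x in range(q) for y in range(q)}
# The plane minus a point is a Nikodym set (the point x itself need not lie in N).
print(is_nikodym_plane(plane - {(3, 4)}, q))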
In our experiments we focused on the opposite problem of constructing Nikodym sets of size as small as possible. In the plane {d=2}, constructions of size \displaystyle |N| = q^2 - q^{3/2} + O(q \log q) \ \ \ \ \ (1)
are known, at least when {q} is a perfect square (see the discussion of the two-dimensional case at the end of this post).
We set AlphaEvolve to try to optimize the three-dimensional problem with a variable field size {q} (which we took to be prime for simplicity), with the intent of getting this tool to come up with a construction that worked asymptotically for large {q}, rather than just for a fixed value of {q}. After some rounds of evolution, it arrived at a construction which empirically had size about {q^3 - 8q^2}. On inspecting the code, we found that AlphaEvolve had constructed a Nikodym set {N} by (mostly) removing eight low-degree algebraic surfaces (all of the form {\{ (x,y,x^i y)\}} for various {i}). We used the tool DeepThink to verify the Nikodym property of this construction, and then asked it to generalize the method. By removing many more than eight surfaces, and using some heuristic arguments based on the Chebotarev density theorem, DeepThink claimed a construction of size \displaystyle |N| = q^3 - 2 q^2 \log q + o(q^2 \log q). \ \ \ \ \ (2)
The argument can be sketched as follows. Let {V} be a random surface of degree {D}, and let {x} be a point in {{\mathbb F}_q^3} which does not lie in {V}. A random line through {x} then meets {V} in a set of points which is essentially the set of zeroes in {{\mathbb F}_q} of a random polynomial of degree {D}. The (function field analogue of the) Chebotarev density theorem predicts that the probability that this polynomial has no roots in {{\mathbb F}_q} is about {\delta_D}, where
\displaystyle \delta_D = 1 - \frac{1}{1!} + \frac{1}{2!} - \dots + \frac{(-1)^D}{D!}
is the proportion of permutations on {D} elements that are derangements (no fixed points). So, if one removes {k} random surfaces of degrees {D_1,\dots,D_k}, the probability that a random line avoids all of these surfaces is about {\delta_{D_1} \dots \delta_{D_k}}. If this product is significantly greater than {1/q^2}, then the law of large numbers (and concentration of measure) then predicts (with high probability) that out of the {\sim q^2} lines through {x}, at least one will avoid the removed surfaces, thus giving (most of) a Nikodym set. The Lang-Weil estimate predicts that each surface has cardinality about {q^2}, so this should give a Nikodym set of size about {q^3 - kq^2}.DeepThink took the degrees {D_1,\dots,D_k} to be large, so that the derangement probabilities {\delta_{D_i}} were close to {1/e}. This led it to predict that {k} could be taken to be as large as {2 \log q}, leading to the claimed bound (2). However, on inspecting this argument we realized that these moderately high degree surfaces were effectively acting as random sets, so one could dramatically simplify DeepThink’s argument by simply taking {N} to be a completely random set of the desired cardinality (2), in which case the verification of the Nikodym set property (with positive probability) could be established by a standard Chernoff bound-type argument (actually, I ended up using Bennett’s inequality rather than Chernoff’s inequality, but this is a minor technical detail).
On the other hand, the derangement probabilities {\delta_D} oscillate around {1/e}, and in fact are as large as {1/2} when {D=2}. This suggested that one could do better than the purely random construction by only removing quadratic surfaces instead of higher degree surfaces: the product {(1/2)^k} now only drops to {1/q^2} when {k} is about {\frac{2}{\log 2} \log q}, which heuristically predicts the improvement \displaystyle |N| = q^3 - \frac{2}{\log 2} q^2 \log q + o(q^2 \log q). \ \ \ \ \ (3)
This naive construction runs into trouble, however, when the point {x} itself lies on one of the removed surfaces {V}: a line through {x} which is not tangent to {V} will meet {V} in a second point, and hence exit the putative Nikodym set. I realized that one could repair the construction by adding back a small random portion of the removed quadratic surfaces, to allow a non-zero number of lines through {x} to stay inside the putative Nikodym set even when {x} was in one of the surfaces {V} and the line was not tangent to {V}. Pursuing this idea, and performing various standard probabilistic calculations and projective changes of variable, I found that the problem essentially reduced to the following: given {k} random quadratic polynomials in the plane {{\mathbb F}_q^2}, is it true that these polynomials simultaneously take quadratic residue values for {\gg 2^{-k}} of the points in that plane? Heuristically this should be true even for {2^{-k}} close to {1/q^2}. It proved difficult, however, to accurately control this simultaneous quadratic residue event; standard algebraic geometry tools such as the Weil conjectures seemed to require some vanishing of étale cohomology groups in order to obtain adequate error terms, and this was not something I was eager to try to work out. Nevertheless, by exploiting projective symmetry (and the {2}-transitive nature of the projective linear group), I could get satisfactory control of such intersections as long as {2^{-k}} was a little bit larger than {1/q} rather than {1/q^2} (a crude numerical illustration of the simultaneous quadratic residue question is sketched at the end of this post). This gave an intermediate construction of size
\displaystyle |N| = q^3 - (\frac{1}{\log 2} + 1) q^2 \log q + o(q^2 \log q),
which still beat the purely random construction, but fell short of the heuristic prediction (3). This argument (generalized to higher dimensions) is what is contained in the paper. I pose the problem of locating a construction attaining the improved bound (3) (perhaps by some modification of the strategy of removing quadratic varieties) as an open question.
We also looked at the two-dimensional case, to see how well AlphaEvolve could recover known results in the case that {q} was a perfect square. It came up with a construction, in which one removed a large number of parabolas from the plane, that was slightly worse than the best known construction; after manually optimizing this construction we were able to recover the known bound (1). This final construction is somewhat similar to existing constructions (it has a strong resemblance to a standard construction formed by taking the complement of a Hermitian unital), but is still technically new, so we have also added it to this paper.
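Finally, here is a crude numerical illustration of the simultaneous quadratic residue question mentioned above (my own exploratory sketch, not from the paper; in particular, the treatment of zero values as failures is an assumed convention), estimating the proportion of points of {{\mathbb F}_q^2} at which {k} random quadratic polynomials all take nonzero quadratic residue values, and comparing it with {2^{-k}}:

import random

def is_qr(v, q):
    # Euler's criterion: a nonzero v is a quadratic residue mod a prime q
    # iff v^((q-1)/2) = 1 mod q.  (Zero is treated as a failure here.)
    return v % q != 0 and pow(v, (q - 1) // 2, q) == 1

def simultaneous_qr_fraction(q, k, seed=0):
    # Fraction of points (x, y) in F_q^2 at which k random quadratic
    # polynomials are all nonzero quadratic residues.
    random.seed(seed)
    polys = [[random.randrange(q) for _ in range(6)] for _ in range(k)]
    good = 0
    for x in range(q):
        for y in range(q):
            if all(is_qr(a*x*x + b*x*y + c*y*y + d*x + e*y + f, q)
                   for (a, b, c, d, e, f) in polys):
                good += 1
    return good / q ** 2

q, k = 101, 3
print(simultaneous_qr_fraction(q, k), 2 ** -k)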