How can I estimate the probability that the sum $S_n$ of $n$ uniform random 48-bit signed integers overflows a 64-bit signed integer?
Edit: the overflow can occur at any step, not only in the final result.
Since the sum is approximately normally distributed, I guess that the cumulative sums more or less oscillate around zero, and that the probability that they ever exceed 64 bits is vanishingly small. I would like to formalise this intuition.
- By overflow, do you mean that the final result is different from what it would have been if evaluated with unlimited integers, or that any of the individual additions overflows? (If the latter happens, the final result can still be correct.) – user555045, Feb 2, 2024
- Re: "the probability that they ever exceed 64 bits is vanishingly small": note that if we're assuming two's complement, there is a slight bias toward negative numbers (because $-2^{47}$ is possible but 2ドル^{47}$ is not, so the average possible value is $-0.5$), and that bias will eventually add up. So if by "ever" you literally mean ever, i.e. the probability as $n$ goes to infinity, then the probability is 1, no? (Even without that bias I actually wonder whether the probability might be 1, but if so, that's not as obvious.) – ruakh, Feb 2, 2024
- @ruakh Since this is a sum of independent variables, the variance will continually grow, so for very large $n$ the probability of being in the 64-bit range goes to zero. – qwr, Feb 3, 2024
- stats.stackexchange.com/questions/317852/… – qwr, Feb 3, 2024
- Since you edited the question, I don't know the distribution of the max of the partial sums. You're probably better off asking on the statistics Stack Exchange, because this is not really a CS question. – qwr, Feb 3, 2024
3 Answers
Suppose the 48-bit signed integers are drawn uniformly at random from $[-2^{47}, 2^{47}-1]$. Since each draw has magnitude at most 2ドル^{47}$, an overflow requires at least $n = 2^{63}/2^{47} = 2^{16}$ integers drawn, which is large enough for the Central Limit Theorem to apply, so the sum is approximately Normal (but discretized).
For independent variables $X_i$ with mean $\mu$ and variance $\sigma^2$, the sum of $n$ of them has mean $n \mu$ and variance $n \sigma^2$. For our uniformly random integers, the mean $\mu = -1/2$ is slightly biased negative as ruakh notes, and the variance is about $(2^{48})^2/12 = 2^{96}/12 \approx 6.6 \times 10^{27}$, so the resulting distribution is approximately $N(-n/2,\ 6.6 \times 10^{27} \cdot n)$.
Since the standard deviation is only about 8ドル.1 \times 10^{13} \sqrt{n}$, for $n = 10^8$ the threshold 2ドル^{63}$ would be more than 10 SDs away, but for $n = 10^9$ it's only 3.6 SDs away, so pick your $n$ based on z-score probabilities. (At distances this large, the biased mean has little effect on the result.)
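As a sanity check, here is a minimal Python sketch of that calculation (standard library only; the function name `overflow_tail` and the choice to report just the upper tail $P(S_n > 2^{63})$ are my own):

```python
import math

def overflow_tail(n: int) -> float:
 """Normal-approximation estimate of P(S_n > 2^63)."""
 mean = -0.5 * n # each draw has mean -1/2
 sd = math.sqrt(2**96 / 12 * n) # each draw has variance ~ 2^96/12
 z = (2**63 - mean) / sd # z-score of the overflow threshold
 return 0.5 * math.erfc(z / math.sqrt(2)) # standard Normal upper tail

for n in (10**8, 10**9):
 print(n, overflow_tail(n)) # roughly 3e-30 and 1.7e-4, respectively
```

The symmetric event $S_n < -2^{63}$ has essentially the same probability, since the $-n/2$ drift is negligible next to the standard deviation, so the two-sided estimate is about twice this.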
Calculate the average and variance of a single signed 48-bit variable. Multiply by n to get the average and variance of the sum of n such variables. The standard deviation is the square root of the variance.
Calculate how many standard deviations k the overflow threshold is away from the average. The probability of being more than k standard deviations away from the mean is then roughly exp(-k^2 / 2).
Now that is the probability of overflow after summing all n numbers. It ignores the possibility that the sum might overflow at an intermediate step and then shrink back into range, so the probability of overflow at some point within the n steps is higher; for a walk with negligible drift, the reflection principle suggests roughly twice as high (see the sketch below).
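This factor can be probed empirically. Below is a rough Monte Carlo sketch (all names and the scaled-down 9-bit accumulator are my own choices, since the real 64-bit overflow is far too rare to sample directly): it compares the probability that any partial sum leaves the representable range against the probability that the final sum does.

```python
import random

def walk(n: int, draw_bits: int = 6, acc_bits: int = 9) -> tuple[bool, bool]:
 """Sum n uniform signed draw_bits-bit draws; report whether any partial
 sum left the signed acc_bits-bit range, and whether the final sum did."""
 lo, hi = -2**(acc_bits - 1), 2**(acc_bits - 1) - 1
 s, ever = 0, False
 for _ in range(n):
 s += random.randint(-2**(draw_bits - 1), 2**(draw_bits - 1) - 1)
 ever = ever or not lo <= s <= hi
 return ever, not lo <= s <= hi

random.seed(1)
trials = 20_000
results = [walk(n=64) for _ in range(trials)]
p_any = sum(a for a, _ in results) / trials
p_end = sum(e for _, e in results) / trials
print(f"any step: {p_any:.3f}, final: {p_end:.3f}, ratio: {p_any / p_end:.2f}")
```

With these toy parameters the threshold sits about 1.7 SDs out and the observed ratio comes out near 2, consistent with the reflection-principle heuristic, so the endpoint estimate should be off by a small constant factor, not by orders of magnitude.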
- You're using the Central Limit Theorem, correct? I didn't know whether the CLT applied to sums the way it applies to normalized means. See my comment on the question post. – qwr, Feb 3, 2024
- As long as you can calculate the mean, the variance, and the standard deviation, for the CLT you only need to know how many standard deviations you are allowed to be away from the mean. If you had n values from -1000 to +10000, each with a mean of 4,500, then the mean of n values would be 4,500n. And for very large n, if 4,500n gets close to 2^63, the number of standard deviations allowed without overflow shrinks very quickly. – gnasher729, Feb 3, 2024
- I mean: how big does n have to be for the sum to be approximately normal? But I think 2^16 is way more than enough. – qwr, Feb 3, 2024
Hoeffding's inequality gives non-asymptotic probability bounds. If each $X_i$ lies in $[a,b]$ and $C \geq b-a$, then $$P\left(\left|\sum_i X_i - \sum_i E[X_i]\right|>t\right)\leq 2\exp\left(\frac{-2t^2}{nC^2} \right).$$
This gives bounds qualitatively similar to the CLT estimate that @qwr used, but slightly more rigorous, since the CLT statement is asymptotic while Hoeffding's inequality holds for all $n$ and $t$.
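A minimal sketch evaluating the bound for the question's parameters (the function name is mine; I take $C = b-a = 2^{48}$ and measure $t$ from the mean $-n/2$ to the nearer overflow boundary):

```python
import math

def hoeffding_overflow_bound(n: int) -> float:
 """Hoeffding upper bound on P(|S_n - E[S_n]| > t) with C = 2^48."""
 c = 2.0**48 # width of the support [-2^47, 2^47 - 1]
 t = 2.0**63 - n / 2 # distance from the mean -n/2 to -2^63,
 # the nearer of the two overflow boundaries
 return 2.0 * math.exp(-2.0 * t * t / (n * c * c))

for n in (10**8, 10**9):
 print(n, hoeffding_overflow_bound(n)) # roughly 9e-10 and 0.23
```

As expected for a bound built from the full range rather than the actual variance, it is much looser than the Normal estimate (the exponent works out to about $-z^2/6$ instead of $-z^2/2$), but it holds for every $n$.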