
How can I estimate the probability that the sum $S_n$ of $n$ uniform random 48-bit signed integers overflows a 64-bit signed integer?

Edit: the overflow can occur at any step, not only in the final result.

Since the sum is approximately normally distributed, I guess that the cumulative sums more or less oscillate around zero, and that the probability that they ever leave the 64-bit range is vanishingly small. I would like to formalise this intuition.
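
For what it's worth, here is a rough simulation sketch of that intuition (my own illustration; the function name and parameters are just for this example). It tracks how far the running sum wanders compared with the signed 64-bit limit, which for $n$ around 10ドル^6$ stays far below it:

```python
import random

# Rough illustration only: draw n uniform 48-bit signed integers, accumulate
# them with Python's unbounded ints, and report the largest magnitude any
# partial sum reaches, compared with the signed 64-bit limit.
INT48_MIN, INT48_MAX = -2**47, 2**47 - 1
INT64_MAX = 2**63 - 1

def max_partial_sum_magnitude(n):
    s = 0
    worst = 0
    for _ in range(n):
        s += random.randint(INT48_MIN, INT48_MAX)
        worst = max(worst, abs(s))
    return worst

if __name__ == "__main__":
    n = 10**6
    worst = max_partial_sum_magnitude(n)
    print(f"n = {n}: max |partial sum| ≈ {worst:.3e}, "
          f"about {worst / INT64_MAX:.2%} of 2**63 - 1")
```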

asked Feb 2, 2024 at 11:08
  • By overflow do you mean that the final result is different from what it would have been if evaluated with unlimited integers, or that any of the individual additions overflows? (If an intermediate addition overflows, the final result can still be correct.) Commented Feb 2, 2024 at 21:28
  • Re: "the probability that they ever exceed 64 bits is vanishingly small": note that if we're assuming two's complement, then there is a slight bias toward negative numbers (because $-2^{47}$ is possible but 2ドル^{47}$ is not, so the average possible value is $-0.5$), and that bias will eventually add up. So if by "ever" you literally mean "ever" -- the probability as $n$ goes to infinity -- then the probability is 1, no? (Even without that bias I actually wonder if the probability might be 1, but if so then that's not as obvious.) Commented Feb 2, 2024 at 22:59
  • @ruakh Since this is a sum of independent variables, the variance grows without bound, so for very large $n$ the probability of the sum staying within the 64-bit range goes to zero. Commented Feb 3, 2024 at 1:39
  • stats.stackexchange.com/questions/317852/… Commented Feb 3, 2024 at 3:06
  • Since you edited the question, I don't know the distribution of the maximum of the partial sums. You're probably better off asking on Stats Stack Exchange, because this is not really a CS question. Commented Feb 3, 2024 at 20:30

3 Answers


Suppose the 48-bit signed integers are drawn uniformly at random from $[-2^{47}, 2^{47}-1]$. To overflow, at least $n = 2^{63}/2^{47} = 2^{16}$ integers must be drawn, which is large enough for the Central Limit Theorem to apply, so the result is approximately[citation needed] Normal, but discretized.

For independent variables $X_i$ with mean $\mu$ and variance $\sigma^2$, the sum of $n$ of them is approximately Normal with mean $n\mu$ and variance $n\sigma^2$. For our uniformly random integers, the mean $\mu = -1/2$ is slightly biased negatively, as ruakh notes, and the variance is about $(2^{48})^2/12 = 2^{96}/12 \approx 6.6 \times 10^{27}$, so the resulting distribution is approximately $N(-n/2,\; 6.6 \times 10^{27}\, n)$.

Since the standard deviation is only about 8ドル.1 \times 10^{13}\sqrt{n}$, even for $n = 10^8$ the limit 2ドル^{63}$ is more than 10 SDs away, but for $n = 10^9$ it is only about 3.6 SDs away, so pick your $n$ based on z-score probabilities. (For SDs this large, the biased mean has little effect on the result.)
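
As a rough numeric check of this estimate (my own sketch; it uses SciPy's normal CDF and assumes the CLT approximation above):

```python
import math
from scipy.stats import norm

# CLT-based sketch of the estimate above (illustration only):
# S_n is approximately Normal(mean = -n/2, variance = n * Var(X)),
# with X uniform on [-2**47, 2**47 - 1].
VAR_X = (2**96 - 1) / 12          # variance of the discrete uniform
LIMIT = 2**63

def overflow_prob_clt(n):
    mean = -n / 2
    sd = math.sqrt(n * VAR_X)
    # Two-sided tail: below -2**63 or above 2**63 - 1 (the -1 is negligible).
    return norm.sf((LIMIT - mean) / sd) + norm.cdf((-LIMIT - mean) / sd)

for n in (10**8, 10**9, 10**10):
    print(f"n = {n:.0e}: P(sum beyond 64-bit range) ≈ {overflow_prob_clt(n):.3g}")
```

For $n = 10^9$ this gives a two-sided tail of roughly 3ドル \times 10^{-4}$, consistent with the 3.6-SD figure above.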

answered Feb 3, 2024 at 1:29

Calculate the mean and variance of a single signed 48-bit variable. Multiply each by $n$ to get the mean and variance of the sum of $n$ such variables. The standard deviation is the square root of the variance.

Calculate how many standard deviations $k$ away from the mean you need to be for an overflow. The probability of being more than $k$ standard deviations away from the mean falls off roughly like $\exp(-k^2/2)$.

That is the probability of overflow after summing all $n$ numbers. It ignores the possibility that the sum overflows at an intermediate step and then comes back within range, so the probability of overflow at some point within the $n$ steps is somewhat higher.
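
A minimal sketch of that recipe (my own illustration; the $\exp(-k^2/2)$ factor is the rough Gaussian tail estimate above, not an exact probability):

```python
import math

# Rough sketch of the recipe above (illustration only).
MEAN_X = -0.5                      # mean of a uniform draw from [-2**47, 2**47 - 1]
VAR_X = (2**96 - 1) / 12           # variance of that discrete uniform
LIMIT = 2**63

def rough_overflow_estimate(n):
    mean_s = n * MEAN_X            # mean of the sum of n draws
    sd_s = math.sqrt(n * VAR_X)    # standard deviation of the sum
    k = (LIMIT - mean_s) / sd_s    # SDs between the mean and the 64-bit limit
    return k, math.exp(-k**2 / 2)  # rough Gaussian tail estimate

for n in (10**8, 10**9):
    k, p = rough_overflow_estimate(n)
    print(f"n = {n:.0e}: limit is {k:.1f} SDs away, rough tail ≈ {p:.2g}")
```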

answered Feb 2, 2024 at 16:59
  • You're using the Central Limit Theorem, correct? I didn't know whether the CLT applies to sums the way it applies to normalized means. See my comment on the question post. Commented Feb 3, 2024 at 3:11
  • As long as you can calculate the mean, the variance and the standard deviation, for the CLT you only need to know "how many standard deviations am I allowed to be away from the mean". If you had $n$ values from $-1000$ to $+10000$, so each has a mean of 4,500, then your mean for $n$ values would be $4500n$. And for very large $n$, if $4500n$ gets close to 2ドル^{63}$, the number of standard deviations allowed without overflow gets small very quickly. Commented Feb 3, 2024 at 12:35
  • I mean: how big does $n$ have to be for the sum to be approximately normal? But I think 2ドル^{16}$ is way more than enough. Commented Feb 3, 2024 at 20:22

Hoeffding's inequality gives non-asymptotic probability bounds. If each $X_i$ takes values in $[a,b]$ and $C \ge b-a$, then $$P\left(\left|\sum_i X_i - \sum_i E[X_i]\right|>t\right)\leq 2\exp\left(\frac{-2t^2}{nC^2}\right).$$

This gives bounds qualitatively similar to the CLT estimate that @qwr used, but it is slightly more rigorous, since the CLT statement is asymptotic while Hoeffding's inequality holds for all $n$ and $t$.
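
A quick numeric sketch of this bound for the numbers in the question (my own; it plugs $a = -2^{47}$, $b = 2^{47}-1$, $C = b-a$ and $t \approx 2^{63}$ into the inequality above):

```python
import math

# Hoeffding bound sketch (illustration only).
A, B = -2**47, 2**47 - 1           # range of one signed 48-bit draw
LIMIT = 2**63

def hoeffding_overflow_bound(n):
    # Deviation needed to reach the nearer boundary (-2**63), measured from
    # the expected sum -n/2; reaching +2**63 - 1 needs an even larger deviation.
    t = LIMIT - n / 2
    return 2 * math.exp(-2 * t**2 / (n * (B - A)**2))

for n in (10**8, 10**9):
    print(f"n = {n:.0e}: Hoeffding bound on P(overflow) <= {hoeffding_overflow_bound(n):.3g}")
```

As expected, the bound is looser than the CLT estimate (for $n = 10^9$ it comes out around 0ドル.2$ rather than a few times 10ドル^{-4}$), but it is a genuine upper bound rather than an approximation.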

answered Feb 4, 2024 at 4:32