Prove that an expression can create duplicates

Question 1

Let's suppose we have this expression:

(i % 1000000) * 1000 + ms

where i is an always-increasing number and ms is the millisecond part of the current time (ranging 0..999). We call the expression above at random intervals. Although i keeps increasing and only ever takes unique values, intuitively the result of the expression can still return duplicates. How can this be shown in an acceptable form? Is there a way to show the probability that the next iteration will generate a duplicate?

Question 2

To show that there can be duplicates, it is sufficient to provide a concrete example. But here, it's possible to describe the set of duplicates: you'll get duplicates for all x, n>0, ms: f(i=x, ms) = f(i=x+n*1000000, ms)
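
A minimal sketch of that family of duplicates in Python. The helper name f and the sample values x = 42, ms = 7 are assumptions introduced here for illustration; only the expression itself comes from the question.

```python
MODULUS = 1_000_000  # the 1000000 from the expression


def f(i: int, ms: int) -> int:
    """Evaluate (i % 1000000) * 1000 + ms for a counter i and a millisecond value ms."""
    return (i % MODULUS) * 1000 + ms


# Any two counters that differ by a multiple of 1,000,000 collide when ms is the same.
x, ms = 42, 7  # arbitrary illustrative values
for n in range(1, 4):
    assert f(x, ms) == f(x + n * MODULUS, ms)

print(f(x, ms), f(x + MODULUS, ms))  # 42007 42007
```

A single such pair is already enough to show that duplicates are possible.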

Question 3

"Is there a way to show the probability next iteration will generate a duplicate?" Probability doesn't make sense here. Is any iteration liable to produce an outcome that has been seen before? If i<1000000, does not happen. If i>=1000000, 100% guaranteed.

Question 4

The problem is underspecified. "i is an always increasing number". Increasing by how much? Without specifying that, the first 6 digits of the result can behave wildly differently: anything from a never-changing 6-digit number to completely random 6-digit numbers. Why? Because "always increasing" can be satisfied by digits we never see. This doesn't change the fact that 1000000001 evaluations force a duplicate, but it affects the probability of a duplicate in earlier evaluations.

Question 5

OK so in practice you will get a repeat at 1 million and a few. Not at 1 billion and 1.

You have 1000 random values, 0-999. Each number after the first million has a 1 in a thousand chance of hitting the same random number as its previous twin number.

So the chance of getting a duplicate on each number after 1m is 1/1000. Each time you take another number, the cumulative chance of getting a duplicate is higher. After 1,000,000 + n numbers the probability of a duplicate having occurred, p, is:

p = 1 - (999/1000)^n

i.e. 693 rolls of that die later you have a 50% chance of having had a duplicate. After 5000 rolls that has risen to 99.3%.

(Obviously if you get to 2m then you have a 2/1000 chance for each number and so on, so you would need a step function to model the probability for all n)
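
The figures can be checked with a short calculation. This is a sketch under the answer's assumption that ms behaves like a uniform random value independent of i, so each call after the first million misses its twin with probability 999/1000 and the chance of at least one duplicate in n calls is 1 - (999/1000)^n. The function names are made up for this example.

```python
import math


def p_duplicate(n: int, per_call: float = 1 / 1000) -> float:
    """Probability of at least one duplicate in n calls after the first million,
    assuming each call independently matches its twin with probability per_call."""
    return 1 - (1 - per_call) ** n


def calls_for(target: float, per_call: float = 1 / 1000) -> int:
    """Smallest n for which p_duplicate(n) reaches the target probability."""
    return math.ceil(math.log(1 - target) / math.log(1 - per_call))


print(round(p_duplicate(693), 3))   # 0.5
print(round(p_duplicate(5000), 3))  # 0.993
print(calls_for(0.5))               # 693
```

The model only covers the second million calls; past 2,000,000 each call has two earlier twins and per_call would have to be adjusted.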

Question 6

Pretty sure your maths is wrong here. There is zero chance that the 1,000,002nd invocation collides with anything other than the 2nd invocation, because it has the form 000002xxx and the only previous invocation with that pattern was the 2nd invocation. It's 1,000,000 separate buckets each of size 1,000, so you start to get a reasonable chance of a collision when there are sqrt(1000) ≈ 32 or so in each bucket, or about 32M invocations. Still less than 1B but much more than 1,000,693.

Question 7

So 1,000,002 has a 1/1000 chance, 1,000,003 has a 1/1000 chance, etc. By 1,000,693 you would have had to be lucky for none of those chances to come up.

Question 8

You would only hit 1b without a repeat if all the first 1m were xxxxxx000, the second 1m were xxxxxx001, etc.

Question 9

+1 It's quite plausible to treat the ms value as a random value, independent of i. And then the math is correct.

Question 10

The minimum value of your expression is 0, when i % 1000000 = 0 and ms = 0.
The maximum value of your expression is 999999999, when i % 1000000 = 999999 and ms = 999.

Therefore via the pigeonhole principle after 1000000001 evaluations it must generate a duplicate.
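
The bounds can be checked directly, and the pigeonhole argument can be illustrated by brute force on a scaled-down variant. This is only a sketch: the helper f and the small constants (a modulus of 10 and 5 possible ms values) are assumptions chosen to keep the enumeration tiny.

```python
from itertools import product


def f(i: int, ms: int, modulus: int = 1_000_000, ms_range: int = 1000) -> int:
    """Evaluate (i % modulus) * ms_range + ms; the defaults match the question."""
    return (i % modulus) * ms_range + ms


# Bounds of the real expression.
lowest = f(0, 0)
highest = f(999_999, 999)
print(lowest, highest, highest - lowest + 1)  # 0 999999999 1000000000

# Scaled-down check: with modulus=10 and ms in 0..4 only 50 outputs exist,
# so 51 evaluations cannot all be distinct.
small = {f(i, ms, modulus=10, ms_range=5) for i, ms in product(range(100), range(5))}
print(len(small))  # 50
```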

Question 11

Let's suppose I have a call when i is one million and ms is 1, then a call with i = 20000001 and ms is zero. I have a duplicate, am I wrong?

Question 12

I realised I mis-parsed your expression so I've re-written my answer. This is now pretty much the classic solution to this problem.

Question 13

Although true, I think this is misleading. Say you are in the meeting where someone claims this is a fine method of generating non-repeating numbers. You say, "But it will definitely repeat after 1 billion!" They reply, "PAH! 1 billion!! We will never get to that many!!" In actual fact, you have a very high chance of getting a duplicate after 1 million.

Question 14

@Ewan True. With this specification I can implement it so that it must repeat after 1000. Ever increasing doesn't have to mean +1.
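
A sketch of such an implementation, assuming the counter advances by exactly 1,000,000 per call (a hypothetical step chosen to make the point) and using a seeded random ms as a stand-in for the clock. The function name is made up for this example.

```python
import random


def calls_until_duplicate(step: int, seed: int = 0) -> int:
    """Count calls until (i % 1000000) * 1000 + ms first repeats,
    with i growing by `step` per call and ms simulated as uniform in 0..999."""
    rng = random.Random(seed)
    seen = set()
    i = 0
    calls = 0
    while True:
        calls += 1
        value = (i % 1_000_000) * 1000 + rng.randrange(1000)
        if value in seen:
            return calls
        seen.add(value)
        i += step  # strictly increasing, but not by 1


# With step = 1,000,000 the term i % 1000000 never changes, so only 1000 outputs
# exist and a repeat is forced within 1001 calls (usually far sooner).
print(calls_until_duplicate(step=1_000_000))
```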

Question 15

With size-limited data types, there must eventually be duplicates, as they can only have a limited number of different states (e.g. 2^32 for the typical 32-bit integer).

In your case, it's even more limited:

(i % 1000000) can have one million different values (if we allow negative i values and the typical interpretation of the % operator, it's nearly two million, 1999999 values, but I assume i will always be positive).
Then (i % 1000000) * 1000 can also have exactly one million different values.
ms can have one thousand different values.
The sum (i % 1000000) * 1000 + ms combines one million cases with one thousand cases, giving one billion different cases.

So, at least after one billion invocations, there must be a duplicate.

But there's the peculiar combination of a counter i with a real-time milliseconds value.

If you systematically increment i by one with every invocation, the earliest possible duplicate comes after one million invocations.

Depending on the time span for a million invocations, it's highly unlikely that you get one billion different values before the first duplicate. I'd expect something between one and two million invocations, unless you reach multi-million invocations per second.
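
A small simulation illustrates this. It is a sketch that assumes i is incremented by one per call and that ms can be modelled as a uniform random value; the function name and the scaled-down modulus in the example run are assumptions made to keep it fast.

```python
import random


def first_duplicate(modulus: int = 1_000_000, ms_range: int = 1000, seed: int = 0) -> int:
    """Return the 1-based invocation number at which the first duplicate appears,
    with i incremented by one per call and ms simulated as uniform random."""
    rng = random.Random(seed)
    seen = set()
    i = 0
    while True:
        value = (i % modulus) * ms_range + rng.randrange(ms_range)
        i += 1
        if value in seen:
            return i
        seen.add(value)


# Scaled-down run (modulus 1000 instead of 1000000): the first duplicate
# cannot come before invocation 1001, because i % modulus has not wrapped yet.
print(first_duplicate(modulus=1000))
```

Calling first_duplicate() with its defaults uses the question's parameters; it then loops over a million times and keeps about a million integers in memory, and it cannot return less than 1000001.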

If you want to make the best duplicates-free use of the range 0...999999999, simply use:

i % 1000000000

Then you're sure that you have the first duplicate exactly after one billion invocations.

Question 16

Very minor point: the first duplicate will occur after 1 billion plus one invocations, not one billion.

Accepted answer: Ewan's answer (Question 5 above), posted 2022-07-12 22:00:25Z.
