13

I just can't seem to understand how this would work.

Question:
Given a sequential file that contains at most four billion 32 bit integers in random order, find a 32-bit integer that isn't in the file (and there must be at least one missing)

Answer:
it is helpful to view this binary search in terms of the 32 bits that represent each integer. In the first pass of the algorithm we read the (at most) four billion input integers and write those with a leading zero bit to one sequential file and those with a leading one bit to another file.

One of those files contains at most two billion integer, so we next use that file as the current input and repeat the probe process, but this time on the second bit.

So by splitting the file over and over again (binary search) how would this actually lead me to the missing 32 bit integer?

John Saunders
162k26 gold badges252 silver badges403 bronze badges
asked Feb 16, 2011 at 1:28

2 Answers 2

12

After each pass your next pass will be on the smaller of the two lists you've compiled.

At some point, you MUST encounter an empty list and this will determine your number. For example let's just use 3 bit numbers.

000
001
110
100
111

after the first pass we have

000
001
110
100
111

Then we look at the 2nd bits in the first list because it is smaller than (or equal to) the second. We would split them into

000
001
empty list

notice how the file that would start with 01 is empty, this means that there are no numbers that start with 01 so 010 and 011 are missing.

The reason we must eventually have a missing list is because we are choosing the smaller list for our next pass each time.

answered Feb 16, 2011 at 1:34

5 Comments

What about 101? Shouldn't it be missing too
Your question says to find A number, it doesn't say to find ALL the numbers. This is why this method works.
O I see, I thought it would spit out all the numbers, thanks!
@WuHoUnited "Then we look at the 2nd bits in the first list because it is smaller than (or equal to) the second" ...smaller than or equal to which second ? Also what will be big O complexity ? O(lg n) ?
@Geek In that sentence, the first list refers to 000, 001. (the second list would be 110, 100, 111). By smaller I mean that there are fewer elements in it. The big O(n) complexity depends on what you want to be the "n" (is n in this case 32 bits or is it 2^32 possible numbers or is it four billion selected numbers)
0

Eventually, if you keep splitting, you will have (at most) 4 billion files, each with one integer in it. In theory, you will then "know" which one is missing because there won't be a file for it.

You might also end up with a situation where you have an odd number of integers. In that case, the smaller half will be missing a number. This makes it easier to home in on the missing number.

In the case where you have an even number, you know that two are missing. In this case, you must find the parts that are smaller than their respective halves, and then proceed with the solution above.

answered Feb 16, 2011 at 1:35

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.