Picking high index for binary search of sorted array

Question 1

The following pseudocode returns index of the given element x within sorted array A where p is starting index and r is last index of array A.

I am confused about though process behind picking initial value of high. I understand that if while loop has low < high then we pick high = r + 1 but if while loop was low <= high then we would pick high = r so that element at highest index of array is always included in the search (thus not excluded from the search).

Please elaborate on the crux of thought process behind picking high index value for initializing.

For example, one thing that I could think of was following:

Is there a relation to picking mid value using floor? Floor biases mid towards the left of the sorted array but high = r + 1 biases the mid towards the right so it kind of balances out.

If I pick high = r, in addition to changing while condition to low <= high, should I change mid to ceiling((low+high)/2) ?

FIND-SPLIT-POINT(A, p, r, x)
low = p // low end of search range
high = r + 1 // high end of search range
while low < high // more than one element?
 mid = floor((low + high)/2) // midpoint of range
 if x <= A[mid] // is answer q <= mid?
 high = mid // narrow search to A[low : mid]
 else low = mid + 1 // narrow search to A[mid + 1 : high]
return low

Question 2

I would suggest you first do some dry runs on paper with, say, an array of size 10. Let us know if you still have doubts.

Question 3

This seems to be an iterative version of binary search; there should be plenty of materials on the internet. :)

Question 4

The algorithm itself is pretty simple but choice of high index is not elaborated on. Both versions can be proven correct with loop invariants. I have edited the question to reflect that I don't particularly want a loop invariant but elaboration of the process of picking high index value.

Question 5

The thought process:

I have an interval containing x.
I can half the size of the interval, repeatedly.
This is tricky to get right so I write down a loop invariant.
I’ll need a loop invariant that is true before the loop starts.
I’ll need a loop invariant that stays true after each iteration.
Every iteration must make progress.

Question 6

You said in a comment that you don't want a loop invariant, but I would strongly advise you to approach this in terms of loop invariants. The appropriate choices for low and high are those that lend themselves to an easy-to-state loop invariant.

The loop invariant for this algorithm appears to be that the split point between elements < x and those ≥ x is in the range low to high inclusive. If x occurs exactly once in A then the split point is also its index, but the algorithm is more general than that. There are length(A) + 1 = r + 2 possible split points, so I don't think assigning a lower value to high would make sense.

If you replaced floor with ceil in the algorithm as written, it wouldn't work at all (consider what happens when high - low = 1). The reason floor makes sense here is that low and high point between array elements (to split points) while mid points to an array element. If (low + high)/2 is not an integer, then interpreted in the same way as low and high, it points to the middle of an array element, and floor gets you that element's index with no left/right bias (assuming 0-based indexing; in a 1-based language you should use ceil). The bias in this algorithm is actually when (low + high)/2 is an integer: the array element to the right of center is tested in that case.

If you know that x occurs in the array, you could pick a loop invariant like "the index of the leftmost occurrence of x is in the range low to high inclusive" and set high = r initially. If x may not occur in the array, it's much harder to come up with a good invariant. You could start with "the range low to high inclusive includes the smallest element that is at least as large as x", but that doesn't handle the case that all elements are smaller than x. You would probably have to imagine that there is an infinite element past the end of the array, and allow the current range to include it, which takes you back to the original algorithm.

gnasher729 gnasher729 32.4k36 silver badges56 bronze badges · Answer 1 · 2024-05-15 22:18:37Z

The thought process:

I have an interval containing x.
I can half the size of the interval, repeatedly.
This is tricky to get right so I write down a loop invariant.
I’ll need a loop invariant that is true before the loop starts.
I’ll need a loop invariant that stays true after each iteration.
Every iteration must make progress.

benrg benrg 2,5217 silver badges13 bronze badges · Answer 2 · 2024-05-15 22:06:01Z

You said in a comment that you don't want a loop invariant, but I would strongly advise you to approach this in terms of loop invariants. The appropriate choices for low and high are those that lend themselves to an easy-to-state loop invariant.

The loop invariant for this algorithm appears to be that the split point between elements < x and those ≥ x is in the range low to high inclusive. If x occurs exactly once in A then the split point is also its index, but the algorithm is more general than that. There are length(A) + 1 = r + 2 possible split points, so I don't think assigning a lower value to high would make sense.

If you replaced floor with ceil in the algorithm as written, it wouldn't work at all (consider what happens when high - low = 1). The reason floor makes sense here is that low and high point between array elements (to split points) while mid points to an array element. If (low + high)/2 is not an integer, then interpreted in the same way as low and high, it points to the middle of an array element, and floor gets you that element's index with no left/right bias (assuming 0-based indexing; in a 1-based language you should use ceil). The bias in this algorithm is actually when (low + high)/2 is an integer: the array element to the right of center is tested in that case.

If you know that x occurs in the array, you could pick a loop invariant like "the index of the leftmost occurrence of x is in the range low to high inclusive" and set high = r initially. If x may not occur in the array, it's much harder to come up with a good invariant. You could start with "the range low to high inclusive includes the smallest element that is at least as large as x", but that doesn't handle the case that all elements are smaller than x. You would probably have to imagine that there is an infinite element past the end of the array, and allow the current range to include it, which takes you back to the original algorithm.

Stack Exchange Network

Picking high index for binary search of sorted array

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Picking high index for binary search of sorted array

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions