I'm trying to understand the basic concepts of algorithms through the classes offered at Coursera (in bits and pieces), I came across the deterministic linear time selection algorithm that works as follows:
- Select(A,n,i)
- If n = 1 return A[1].
- p = ChoosePivot(A, n)
- B = Partition(A, n, p)
- Suppose p is the jth element of B (i.e., the jth order statistic of A). Let the "first part of B" denote
its first j − 1 elements and the "second part" its last n − j elements.
- If i = j, return p.
- If i < j, return Select(1st part of B, j − 1, i).
- Else return Select(2nd part of B, n − j, i − j).
And sorts the array internally in the ChoosePivot
subroutine to calculate the median of median using a comparison based sorting algorithm. But isnt the lower bound on comparison based sorting O(nlogn)
? So how would it be possible for us to acheive O(n)
for the entire algorithm then?
Am I missing something here?
1 Answer 1
And sorts the array internally in the ChoosePivot subroutine to calculate the median of median using a comparison based sorting algorithm.
This is the incorrect premise (you are right that if we did do a comparison-based sort, we couldn't be linear). The point of median-of-medians is that it is linear - you don't need to sort the whole set if all you want is the median. See proof of O(n) running time on wikipedia for more detail.
ChoosePivot
does?ChoosePivot
computes the median of medians of the original array for which it employs comparison based sorting, which theoretically has a worst case ofO(nlogn)
if Im not mistaken