Here is pseudocode for the algorithm:
select(L,k)
{
    if (L has 10 or fewer elements)
    {
        sort L
        return the element in the kth position
    }

    partition L into subsets S[i] of five elements each
        (there will be n/5 subsets total)
    for (i = 1 to n/5) do
        x[i] = select(S[i],3)
    M = select({x[i]}, n/10)

    partition L into L1<M, L2=M, L3>M
    if (k <= length(L1))
        return select(L1,k)
    else if (k > length(L1)+length(L2))
        return select(L3,k-length(L1)-length(L2))
    else
        return M
}
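For reference, here is a minimal Python sketch of that pseudocode (my own translation, not taken from the linked notes). It uses a 1-based $k$ and copies sublists rather than partitioning in place, so it is meant only to mirror the pseudocode, not to be efficient; the name `median_select` is just my choice.

```python
def median_select(L, k):
    # Return the k-th smallest element of L (k is 1-based), mirroring the
    # pseudocode above. Sublists are copied rather than partitioned in
    # place, so this is for illustration, not performance.
    if len(L) <= 10:
        return sorted(L)[k - 1]

    # Split into groups of (at most) five and take each group's median.
    groups = [L[i:i + 5] for i in range(0, len(L), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]

    # Pivot M = the median of the group medians (select({x[i]}, n/10) above).
    M = median_select(medians, (len(medians) + 1) // 2)

    # Three-way partition of L around M.
    L1 = [x for x in L if x < M]
    L2 = [x for x in L if x == M]
    L3 = [x for x in L if x > M]

    if k <= len(L1):
        return median_select(L1, k)
    elif k > len(L1) + len(L2):
        return median_select(L3, k - len(L1) - len(L2))
    else:
        return M
```

For example, `median_select(list(range(100, 0, -1)), 5)` returns `5`.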
Here is some analysis of the algorithm: http://www.ics.uci.edu/~eppstein/161/960130.html
The analysis uses the recurrence relation $T(n) \leq \frac{12n}{5} + T\left(\frac{n}{5}\right) + T\left(\frac{7n}{10}\right)$. Solving this gives linear work per call. But aren't there logarithmically many recursive calls? Wouldn't that make the total $n\log n$?
Put another way, conceptually the algorithm seems to be: "on each call, cut the search area somewhat like binary search, but without the guarantee of cutting it by as much, and add a linear-time partition." Binary search runs in $O(\log n),ドル so adding linear work per call would seem to make it $O(n\log n)$.
What am I not understanding about the linked analysis?
1 Answer
Consider the following series:
$S = n + \lceil n/2 \rceil + \lceil n/4 \rceil + \cdots + 1$
It has $\lceil \log_2 n \rceil + 1$ terms, yet its sum is always less than 2ドル^{\lceil \log_2 n \rceil+1} < 4n$.
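For instance, with $n = 13$ the series is 13ドル + 7 + 4 + 2 + 1 = 27,ドル which is indeed below 2ドル^{\lceil \log_2 13 \rceil + 1} = 32 < 4 \cdot 13 = 52$.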
Similarly, in your algorithm, even though there are $\Theta(\log n)$ levels of recursion, each level does only a constant fraction of the work of the previous one. Hence the total work is linear in $n,ドル not $\Theta(n \log n)$ as the naive analysis suggests, just like the sum of the series above.
Also note that $T(n)$ is $O(n)$ as well as $O(n\log n)$: anything that is $O(n)$ is automatically $O(n\log n),ドル since the latter is the weaker upper bound. So the question is not whether $T(n)$ is $O(n\log n)$ (it is), but whether the tighter bound $O(n)$ also holds (it does).
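One way to make this precise is the standard substitution (guess-and-verify) argument; the constant 24ドル$ below is just one value that happens to make the inequality go through, not something taken from the linked notes. Assume inductively that $T(m) \le c\,m$ for all $m < n$. Then

$T(n) \;\le\; \frac{12n}{5} + c \cdot \frac{n}{5} + c \cdot \frac{7n}{10} \;=\; \frac{12n}{5} + \frac{9cn}{10} \;\le\; c\,n$ whenever $c \ge 24$.

The crucial fact is that the subproblem fractions satisfy $\frac{1}{5} + \frac{7}{10} = \frac{9}{10} < 1,ドル so the work per recursion level shrinks geometrically, exactly like the series above. In merge sort the fractions sum to exactly 1ドル,ドル so every level does a full $n$ worth of work and the $\log n$ levels add up to $\Theta(n\log n)$.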
- I still don't get it. This recurrence: $T(n) \le \frac{12n}{5} + T(n/5) + T(7n/10)$. MergeSort recurrence: $T(n) = 2T(n/2) + n$. MergeSort's complexity is $O(n\log n)$ for the reasons I stated before. If your explanation is right, why is MergeSort's complexity not $O(n)$? – justAnotherGuy, Mar 12, 2016 at 6:28
- Merge sort is $n + (n/2 + n/2) + (n/4 + n/4 + n/4 + n/4) + \cdots$ At each level the work does not diminish; it stays equal to $n$. That is why merge sort comes out to $n \log n$. – Sarvottamananda, Mar 12, 2016 at 6:30
- The total recursive work in the $k$-th order statistic algorithm is $\sim T(n/5) + T(7n/10) \sim c \cdot 9n/10$. – Sarvottamananda, Mar 12, 2016 at 6:34
- OK, the intuition is clicking now. Thank you. – justAnotherGuy, Mar 12, 2016 at 7:03
- It did not help my understanding that, in my implementation of the algorithm, I was doing a linear scan of the entire initial array to find the index of the element to partition around. Instead, I should scan only within the bounds of the current subarray. – justAnotherGuy, Mar 12, 2016 at 7:13
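To make the comparison with merge sort in the comments above concrete, here is a small numerical sketch of the two recurrences. The base-case values, the use of integer (floor) division, and the function names are my own simplifications, not part of the linked analysis.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t_select(n):
    # Selection recurrence from the question: T(n) <= 12n/5 + T(n/5) + T(7n/10).
    if n <= 10:
        return 1.0
    return 12 * n / 5 + t_select(n // 5) + t_select(7 * n // 10)

@lru_cache(maxsize=None)
def t_merge(n):
    # Merge sort recurrence: T(n) = 2*T(n/2) + n.
    if n <= 1:
        return 1.0
    return n + 2 * t_merge(n // 2)

for n in (10**4, 10**6, 10**8, 10**10, 10**12):
    # t_select(n)/n stays bounded (it never exceeds 24 for this recurrence),
    # while t_merge(n)/n keeps growing roughly like log2(n).
    print(n, round(t_select(n) / n, 1), round(t_merge(n) / n, 1))
```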