Here is pseudocode for the algorithm:
select(L,k)
{
    if (L has 10 or fewer elements)
    {
        sort L
        return the element in the kth position
    }

    partition L into subsets S[i] of five elements each
        (there will be n/5 subsets total)
    for (i = 1 to n/5) do
        x[i] = select(S[i],3)
    M = select({x[i]}, n/10)

    partition L into L1<M, L2=M, L3>M
    if (k <= length(L1))
        return select(L1,k)
    else if (k > length(L1)+length(L2))
        return select(L3,k-length(L1)-length(L2))
    else
        return M
}
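For reference, here is a minimal Python sketch of that pseudocode (my own translation, not taken from the linked notes). It uses a 1-based $k$ and copies sublists rather than partitioning in place, so it is meant only to mirror the pseudocode, not to be efficient; the name `median_select` is just my choice.

```python
def median_select(L, k):
    # Return the k-th smallest element of L (k is 1-based), mirroring the
    # pseudocode above. Sublists are copied rather than partitioned in
    # place, so this is for illustration, not performance.
    if len(L) <= 10:
        return sorted(L)[k - 1]

    # Split into groups of (at most) five and take each group's median.
    groups = [L[i:i + 5] for i in range(0, len(L), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]

    # Pivot M = the median of the group medians (select({x[i]}, n/10) above).
    M = median_select(medians, (len(medians) + 1) // 2)

    # Three-way partition of L around M.
    L1 = [x for x in L if x < M]
    L2 = [x for x in L if x == M]
    L3 = [x for x in L if x > M]

    if k <= len(L1):
        return median_select(L1, k)
    elif k > len(L1) + len(L2):
        return median_select(L3, k - len(L1) - len(L2))
    else:
        return M
```

For example, `median_select(list(range(100, 0, -1)), 5)` returns `5`.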
Here is some analysis of the algorithm: http://www.ics.uci.edu/~eppstein/161/960130.html
The analysis uses the recurrence relation $T(n) \leq \frac{12n}{5} + T\left(\frac{n}{5}\right) + T\left(\frac{7n}{10}\right)$. Solving this gives linear work per call. But aren't there logarithmically many recursive calls? Wouldn't that make the total $n\log n$?
Put another way, conceptually the algorithm seems to be: "on each call, cut the search area somewhat like binary search, but without the guarantee of cutting it by as much, and add a linear-time partition." Binary search runs in $O(\log n),ドル so adding linear work per call would seem to make it $O(n\log n)$.
What am I not understanding about the linked analysis?
1 Answer
Consider the following series:
$S = n + \lceil n/2 \rceil + \lceil n/4 \rceil + \cdots + 1$
It has $\lceil \log_2 n \rceil + 1$ terms, yet its sum is always less than 2ドル^{\lceil \log_2 n \rceil+1} < 4n$.
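For instance, with $n = 13$ the series is 13ドル + 7 + 4 + 2 + 1 = 27,ドル which is indeed below 2ドル^{\lceil \log_2 13 \rceil + 1} = 32 < 4 \cdot 13 = 52$.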
Similarly, in your algorithm, even though there are $\Theta(\log n)$ levels of recursion, each level does only a constant fraction of the work of the previous one. Hence the total work is linear in $n,ドル not $\Theta(n \log n)$ as the naive analysis suggests, just like the sum of the series above.
Also note that $T(n)$ is $O(n)$ as well as $O(n\log n)$: anything that is $O(n)$ is automatically $O(n\log n),ドル since the latter is the weaker upper bound. So the question is not whether $T(n)$ is $O(n\log n)$ (it is), but whether the tighter bound $O(n)$ also holds (it does).
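One way to make this precise is the standard substitution (guess-and-verify) argument; the constant 24ドル$ below is just one value that happens to make the inequality go through, not something taken from the linked notes. Assume inductively that $T(m) \le c\,m$ for all $m < n$. Then

$T(n) \;\le\; \frac{12n}{5} + c \cdot \frac{n}{5} + c \cdot \frac{7n}{10} \;=\; \frac{12n}{5} + \frac{9cn}{10} \;\le\; c\,n$ whenever $c \ge 24$.

The crucial fact is that the subproblem fractions satisfy $\frac{1}{5} + \frac{7}{10} = \frac{9}{10} < 1,ドル so the work per recursion level shrinks geometrically, exactly like the series above. In merge sort the fractions sum to exactly 1ドル,ドル so every level does a full $n$ worth of work and the $\log n$ levels add up to $\Theta(n\log n)$.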
- I still don't get it. This recurrence: $T(n) \le \frac{12n}{5} + T(n/5) + T(7n/10)$. MergeSort recurrence: $T(n) = 2T(n/2) + n$. MergeSort's complexity is $O(n\log n)$ for the reasons I stated before. If your explanation is right, why is MergeSort's complexity not $O(n)$? – justAnotherGuy, Mar 12, 2016 at 6:28
- Merge sort is $n + (n/2 + n/2) + (n/4 + n/4 + n/4 + n/4) + \cdots$ At each level the work does not diminish; it stays equal to $n$. That is why merge sort comes out to $n \log n$. – Sarvottamananda, Mar 12, 2016 at 6:30
- The total recursive work in the $k$-th order statistic algorithm is $\sim T(n/5) + T(7n/10) \sim c \cdot 9n/10$. – Sarvottamananda, Mar 12, 2016 at 6:34
- OK, the intuition is clicking now. Thank you. – justAnotherGuy, Mar 12, 2016 at 7:03
- It did not help my understanding that, in my implementation of the algorithm, I was doing a linear scan of the entire initial array to find the index of the element to partition around. Instead, I should scan only within the bounds of the current subarray. – justAnotherGuy, Mar 12, 2016 at 7:13
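To make the comparison with merge sort in the comments above concrete, here is a small numerical sketch of the two recurrences. The base-case values, the use of integer (floor) division, and the function names are my own simplifications, not part of the linked analysis.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t_select(n):
    # Selection recurrence from the question: T(n) <= 12n/5 + T(n/5) + T(7n/10).
    if n <= 10:
        return 1.0
    return 12 * n / 5 + t_select(n // 5) + t_select(7 * n // 10)

@lru_cache(maxsize=None)
def t_merge(n):
    # Merge sort recurrence: T(n) = 2*T(n/2) + n.
    if n <= 1:
        return 1.0
    return n + 2 * t_merge(n // 2)

for n in (10**4, 10**6, 10**8, 10**10, 10**12):
    # t_select(n)/n stays bounded (it never exceeds 24 for this recurrence),
    # while t_merge(n)/n keeps growing roughly like log2(n).
    print(n, round(t_select(n) / n, 1), round(t_merge(n) / n, 1))
```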