I have just proved that for every $\alpha, \beta>0 : (\log n)^\alpha=O(n^\beta)$.
Now, given an array of $n$ elements, I want to find an efficient comparison based algorithm for finding the $\log n$ largest elements in the array, and returning them in sorted order.
I would appreciate your help on the best way to solve this problem, and on how to think when encountering such a question.
- You can omit the requirement $\alpha \gt 0$. (zkutch, Apr 25, 2021 at 10:57)
- It just makes the statement stronger; take $\alpha > 1$. (nir shahar, Apr 25, 2021 at 11:00)
- Thanks, but this is not what I asked... I have proved the above and should use it. (Math4me, Apr 25, 2021 at 11:11)
2 Answers
One possibility (though probably not the simplest way) is:
- Transform the array into a max-heap;
- find (without extracting) the $\log n$ largest elements in the heap.
The first step can be done in $O(n)$. For the second step, let's go into a bit more detail:
- The largest element is found among one element (it's the root);
- the second largest must be found among two elements (the two children of the root);
- in general, the $k$-th largest element must be found among $k$ elements, because each time you select an element to be the largest, you remove it from the candidates and add its two children.
That means that after having found the $k$ largest elements, finding the $(k+1)$-th is done by searching for the maximum among $k+1$ candidates, so it takes $O(k)$ time. Since we want to find at most $\log n$ elements, the total search takes $O((\log n)^2)$ time, and by the property you proved, this is $O(n)$.
That means the total complexity of the algorithm is $O(n)$, which is optimal since you need to examine every element of the array.
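To make the candidate-set argument concrete, here is a sketch of this approach in Python. The function name `log_n_largest` and the explicit in-place heapify are my own choices; the candidate scan follows the answer's $O(k)$ linear search exactly.

```python
import math

def log_n_largest(arr):
    """Return the ceil(log2 n) largest elements of arr in decreasing order."""
    n = len(arr)
    k = max(1, math.ceil(math.log2(n)))
    heap = list(arr)

    # Step 1: build a max-heap in place with the standard sift-down
    # heapify, which runs in O(n) overall.
    def sift_down(i, size):
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < size and heap[c] > heap[largest]:
                    largest = c
            if largest == i:
                return
            heap[i], heap[largest] = heap[largest], heap[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)

    # Step 2: find the k largest without extracting. The candidates
    # are heap indices whose parent has already been output; each
    # selection removes one candidate and adds its two children.
    result = []
    candidates = [0]  # start with the root, the overall maximum
    for _ in range(k):
        # Linear scan over at most k+1 candidates: O(k) per step,
        # O((log n)^2) = O(n) in total.
        best = max(candidates, key=lambda i: heap[i])
        candidates.remove(best)
        result.append(heap[best])
        for c in (2 * best + 1, 2 * best + 2):
            if c < n:
                candidates.append(c)
    return result
```

The invariant that makes step 2 correct is the one stated above: the next largest element is always a child (in the heap) of some already-selected element, so it is always in the candidate set.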
Edit: Actually, I realize (that confirms what I initially said) that there is a simpler way to do it:
Use the quickselect algorithm to find the $\log n$-th largest element (no plural here) of the array; this takes $O(n)$ expected time. By doing so in place, the $\log n$ largest elements end up in consecutive positions at the end of the array. You can then sort those $\log n$ elements in time $O(\log n \log \log n) \subset O((\log n)^2) \subset O(n)$.
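A sketch of this quickselect-based variant in Python (the function name, the Lomuto partition scheme, and the random pivot choice are my own; any in-place quickselect works):

```python
import math
import random

def top_log_n_sorted(arr):
    """Return the ceil(log2 n) largest elements of arr in decreasing order,
    via in-place quickselect followed by sorting only the selected tail."""
    a = list(arr)
    n = len(a)
    k = max(1, math.ceil(math.log2(n)))
    target = n - k  # after selection, a[target:] holds the k largest

    def partition(lo, hi):
        # Lomuto partition around a randomly chosen pivot
        # (random pivot gives O(n) expected time overall).
        pivot_i = random.randint(lo, hi)
        a[pivot_i], a[hi] = a[hi], a[pivot_i]
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store

    # Iterative quickselect: narrow the search range until position
    # `target` holds its final sorted value.
    lo, hi = 0, n - 1
    while lo < hi:
        p = partition(lo, hi)
        if p == target:
            break
        elif p < target:
            lo = p + 1
        else:
            hi = p - 1

    # Only the k selected elements need sorting: O(k log k) = O(n).
    return sorted(a[target:], reverse=True)
```

Note that quickselect is $O(n)$ only in expectation with a random pivot; the deterministic median-of-medians pivot rule gives a worst-case $O(n)$ bound if that is required.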
- Excellent! Thank you so much, that was very helpful! I appreciate it! :D (Math4me, Apr 25, 2021 at 13:25)
- Can you please explain more about the quickselect option? (I know this sorting algorithm.) (Math4me, Apr 25, 2021 at 14:16)
- @Math4me It is quite long to explain, so I suggest you read the Wikipedia article I linked in my answer. Note that the quickselect algorithm is not the same as quicksort (though they have similarities). (Nathaniel, Apr 25, 2021 at 14:25)
- No need for an edit tag, see here: cs.meta.stackexchange.com/q/657/472 (Juho, Apr 25, 2021 at 16:31)
Further to what Nathaniel said, you can think of this as a simple modification of quicksort, where you ignore any partitions which fall outside the range of interest.
Given:
algorithm quicksort(A, lo, hi) is
if lo < hi then
p := partition(A, lo, hi)
quicksort(A, lo, p - 1)
quicksort(A, p + 1, hi)
Then this algorithm sorts the top $k$ elements in order (taking partition to order in decreasing order, so that the largest elements come first), leaving the other elements in an unknown order:
algorithm quicksort_k(A, k, lo, hi) is
if lo < hi then
p := partition(A, lo, hi)
quicksort_k(A, k, lo, p - 1)
if k > p then
quicksort_k(A, k, p + 1, hi)
The expected running time is $O(n \log k)$.
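The pseudocode above can be transcribed directly into Python. With 0-indexed arrays the test `k > p` becomes `p < k - 1`; the partition here compares with `>` so the $k$ largest elements land, sorted, in the first $k$ positions:

```python
def partition(A, lo, hi):
    """Lomuto partition of A[lo..hi] around A[hi], in decreasing order
    (the '>' comparison puts larger elements first)."""
    pivot = A[hi]
    store = lo
    for i in range(lo, hi):
        if A[i] > pivot:
            A[store], A[i] = A[i], A[store]
            store += 1
    A[store], A[hi] = A[hi], A[store]
    return store

def quicksort_k(A, k, lo, hi):
    """Sort A[0:k] in place (here: the k largest, in decreasing order);
    the remaining elements are left in an unspecified order."""
    if lo < hi:
        p = partition(A, lo, hi)
        quicksort_k(A, k, lo, p - 1)
        if p < k - 1:  # 0-indexed form of the pseudocode's 'k > p'
            quicksort_k(A, k, p + 1, hi)
```

For example, after `quicksort_k(A, 3, 0, len(A) - 1)`, the slice `A[:3]` holds the three largest elements in decreasing order.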
The same trick works with any sort algorithm that is conceptually based on partitioning, such as MSD radix sort.