Find median of unsorted array in $O(n)$ time

Question 1

To find the median of an unsorted array, we can make a min-heap in $O(n\log n)$ time for $n$ elements, and then we can extract one by one $n/2$ elements to get the median. But this approach would take $O(n \log n)$ time.

Can we do the same by some method in $O(n)$ time? If we can, then how?

Question 2

Note the related meta discussion; as it turns out, simple web searches lead to the answer to this question.

Question 3

stackoverflow.com/questions/2571358/median-of-a-billion-numbers

Question 4

One can make a heap in O(n) time (check any good book or the internet for the build_heap method). Secondly, this is a very standard and well-known problem that can be found in any standard algorithm book or by simply doing a quick search on the internet, as pointed out by others.
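As a rough illustration of the heap approach from the question, here is a minimal Python sketch. It assumes the standard-library heapq module, whose heapify builds the heap in linear time; the function name median_via_heap and the choice of the lower median are my own. Building the heap is cheap, but the roughly $n/2$ extractions still cost $O(n \log n)$ in total.

```python
import heapq


def median_via_heap(values):
    """Return the lower median of a non-empty sequence using a min-heap."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    heap = list(values)
    heapq.heapify(heap)           # bottom-up heap construction, O(n)
    k = (len(heap) - 1) // 2      # zero-based rank of the lower median
    for _ in range(k):            # k extractions, O(log n) each
        heapq.heappop(heap)
    return heap[0]                # smallest remaining element has rank k
```

So the $O(n)$ build does not rescue this method; the extraction phase dominates.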

Question 5

Does this answer your question? Computational complexity of calculating mean vs median?

Question 6

This is a special case of a selection algorithm, which finds the $k$th smallest element of an array; for the median, $k$ is half the size of the array. There is an implementation that is linear in the worst case.

Generic selection algorithm

First let's see an algorithm find-kth that finds the $k$th smallest element of an array:

find-kth(A, k)
 pivot = random element of A
 (L, R) = split(A, pivot)
 if k = |L|+1, return pivot
 if k ≤ |L| , return find-kth(L, k)
 if k > |L|+1, return find-kth(R, k-(|L|+1))

The function split(A, pivot) returns L, R such that R contains all elements greater than pivot and L contains all the others (minus one occurrence of pivot). The search then continues recursively in whichever part contains the $k$th element.

This is $O(n)$ on average but $O(n^2)$ in the worst case.
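The pseudocode above can be sketched in Python roughly as follows. This is only one possible transcription: k is counted from 1 as in the pseudocode, and the names find_kth and split are kept from it, while the list-building split and the argument check are my own choices.

```python
import random


def split(A, pivot):
    """Return (L, R): R holds the elements greater than pivot,
    L holds all the others except one occurrence of pivot."""
    L, R = [], []
    pivot_skipped = False
    for x in A:
        if x > pivot:
            R.append(x)
        elif x == pivot and not pivot_skipped:
            pivot_skipped = True      # drop exactly one copy of the pivot
        else:
            L.append(x)
    return L, R


def find_kth(A, k):
    """Return the k-th smallest element of A, with k counted from 1."""
    if not 1 <= k <= len(A):
        raise ValueError("k must satisfy 1 <= k <= len(A)")
    pivot = random.choice(A)
    L, R = split(A, pivot)
    if k == len(L) + 1:
        return pivot
    if k <= len(L):
        return find_kth(L, k)
    return find_kth(R, k - (len(L) + 1))
```

For an array of odd length n, the median would then be find_kth(A, (n + 1) // 2).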

Linear worst case: the median-of-medians algorithm

A better pivot is the median of the medians of the subarrays of A of size 5, found by calling the procedure recursively on the array of these medians.

find-kth(A, k)
 B = [median(A[1], .., A[5]), median(A[6], .., A[10]), ..]
 pivot = find-kth(B, |B|/2)
 ...

This guarantees $O(n)$ in all cases, although that is not obvious. These PowerPoint slides are helpful for explaining both the algorithm and its complexity.
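Since the pseudocode elides the rest of the procedure, here is a hedged Python sketch of how the whole thing might look. The name find_kth_linear is hypothetical, and the three-way split into smaller, equal and greater elements is a substitution of mine rather than the split described earlier. Arrays of at most 5 elements are simply sorted, which serves as the base case.

```python
def find_kth_linear(A, k):
    """Return the k-th smallest element of A (k counted from 1),
    choosing the pivot as the median of the medians of groups of 5."""
    if not 1 <= k <= len(A):
        raise ValueError("k must satisfy 1 <= k <= len(A)")
    if len(A) <= 5:
        return sorted(A)[k - 1]       # base case: constant-size sort

    # Median of each group of (at most) 5 consecutive elements.
    groups = [A[i:i + 5] for i in range(0, len(A), 5)]
    B = [sorted(g)[(len(g) - 1) // 2] for g in groups]

    # The pivot is found by a recursive call on the medians.
    pivot = find_kth_linear(B, (len(B) + 1) // 2)

    smaller = [x for x in A if x < pivot]
    equal_count = sum(1 for x in A if x == pivot)
    if k <= len(smaller):
        return find_kth_linear(smaller, k)
    if k <= len(smaller) + equal_count:
        return pivot
    greater = [x for x in A if x > pivot]
    return find_kth_linear(greater, k - len(smaller) - equal_count)
```

Counting the elements equal to the pivot separately keeps the recursion shrinking even when the array contains many duplicates.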

Note that, in practice, using a random pivot is usually faster.

Question 7

Is this size 5 standard? What if the size of A is less than 5?

Question 8

On the first algorithm: return A[k] is incorrect (unless A is sorted which would make the algorithm moot). If split happened to divide A such that k = |L| + 1 you still don't know where the kth element is. Your base case is when |A| = 1 else you still need to make one of the two recursive calls.

Question 9

@NickCaplinger fixed using web.archive.org

Question 10

Isn't the worst case for the generic selection algorithm O(NlogN)? Even if the recursive call leaves only 10% of the array after each call, then it's still a logarithm in base 10.

Question 11

@octavian $O(N^2)$ seems right, in the worst case exactly one is removed, so there will be $N$ iterations, each of complexity $N$
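To make the two cases in these comments concrete, here is a short worked calculation. It assumes that one split of an array of size $m$ costs at most $c\,m$ for some constant $c$, a symbol introduced here only for illustration. The first line is the case where each call removes only the pivot; the second is the case where every call keeps 10% of the array, which gives a geometric series rather than a logarithm.

```latex
% Worst case: each call discards only the pivot.
T(N) = \sum_{i=1}^{N} c\,i = \frac{c\,N(N+1)}{2} = \Theta(N^2)

% Each call keeps one tenth of the array.
T(N) \le T\!\left(\frac{N}{10}\right) + cN
     \le cN \sum_{i \ge 0} \frac{1}{10^{i}}
     = \frac{10}{9}\,cN = O(N)
```

So a constant shrinking factor gives $O(N)$ overall: the depth of the recursion is logarithmic, but the work per level shrinks geometrically.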

Question 12

There is a randomized Monte Carlo algorithm to do this. It is described in [MU2017]. It is a linear time algorithm that fails to produce a solution with probability at most $n^{-1/4}$. As discussed in [MU2017], it is significantly simpler than the deterministic $O(n)$ algorithm and yields a smaller constant factor in the linear running time.

The main idea of the algorithm is to use sampling. We have to find two elements that are close together in the sorted order of the array and that have the median lie between them. See the reference [MU2017] for a complete discussion.
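A hedged Python sketch of the sampling idea follows. The sample size of about $n^{3/4}$, the window of about $\sqrt{n}$ ranks around the middle of the sample, and the size limit of $4n^{3/4}$ are my reading of the textbook description, so treat them as assumptions and check them against [MU2017]. The function name randomized_median and the convention of returning None on failure are my own.

```python
import math
import random


def randomized_median(S):
    """Return the element of zero-based rank len(S) // 2 in sorted order,
    or None if the sampling step fails (the caller may simply retry)."""
    n = len(S)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    k = n // 2
    r = math.ceil(n ** 0.75)
    R = sorted(random.choice(S) for _ in range(r))   # sample with replacement
    root = math.sqrt(n)
    lo = max(0, math.floor(r / 2 - root) - 1)        # zero-based index of d
    hi = min(r - 1, math.ceil(r / 2 + root) - 1)     # zero-based index of u
    d, u = R[lo], R[hi]

    C = [x for x in S if d <= x <= u]                # candidates between d and u
    below = sum(1 for x in S if x < d)               # elements smaller than d
    if not below <= k < below + len(C):
        return None                                  # median not between d and u
    if len(C) > 4 * n ** 0.75:
        return None                                  # too many candidates to sort cheaply
    C.sort()                                         # o(n) work when C is small
    return C[k - below]
```

Whenever a value is returned it is the exact median; only the failure case is probabilistic, and the caller can retry.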

[MU2017] Michael Mitzenmacher and Eli Upfal. "Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis," chapter 3, pages 57-62. Cambridge University Press, Second edition, 2017.

Question 13

Remember the Quicksort algorithm?

The difference is that you don't need the whole array sorted; you only need the portion containing the median in the right place. If you have n elements 0 to n-1, then the median is element (n - 1)/2 if n is odd, and the average of elements n/2 - 1 and n/2 if n is even.

So you start with one quicksort partition. Then you look at the two partitions: Instead of sorting them both, you only partition the one containing the one or two elements that you are interested in, and that half gets partitioned further.

(In the unlucky case that you end up with two partitions, one ending at n/2 - 1 and one starting at n/2, you can just take the maximum of the left side and the minimum of the right side.)

You can easily adapt this method to find the k largest or k smallest items, or the items with index k to k', in expected linear time.
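One way this could look in Python is sketched below. It uses a Hoare-style in-place partition with a random pivot, which is my choice and not necessarily what the answer has in mind; the names select_in_place and median are hypothetical. For even n, the lower middle element is recovered as the maximum of the left side after the upper one has been placed.

```python
import random


def select_in_place(a, k):
    """Rearrange list a so that a[k] holds the element of zero-based rank k,
    with nothing larger to its left and nothing smaller to its right."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:                    # partition a[lo..hi] around pivot
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Now a[lo..j] <= pivot <= a[i..hi]; keep only the side containing k.
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            break                        # j < k < i, so a[k] equals the pivot
    return a[k]


def median(values):
    """Return the median of a non-empty sequence without fully sorting it."""
    a = list(values)
    n = len(a)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    upper = select_in_place(a, n // 2)
    if n % 2 == 1:
        return upper
    lower = max(a[:n // 2])              # left side holds only elements <= upper
    return (lower + upper) / 2
```

Only the side containing the index of interest is partitioned again, which is where the saving over a full quicksort comes from.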

Question 14

(Is this a useful addition to the accepted answer?)

Question 15

It tells you how you can find an algorithm yourself. That's much more important than being given the algorithm.

Question 16

I have one more solution for finding the median of an unsorted array in O(n) time, but it increases the space complexity.

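#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Assumes all values are non-negative and max(a) is small enough to allocate.
    int a[11] = {10, 45, 35, 3, 24, 23, 78, 89, 33, 34, 11};
    size_t n = sizeof a / sizeof a[0];
    int max = a[0], mid = 0;
    size_t co = 0;

    for (size_t i = 1; i < n; i++)   // O(n): find the maximum
        if (a[i] > max)
            max = a[i];

    // O(max): zero-initialised counters, one per possible value
    int *ar = calloc((size_t)max + 1, sizeof *ar);
    if (ar == NULL)
        return 1;

    for (size_t i = 0; i < n; i++)   // O(n): count occurrences, so duplicates are kept
        ar[a[i]]++;

    for (int i = 0; i <= max; i++) { // O(max): walk the values in increasing order
        co += (size_t)ar[i];
        if (co >= n / 2 + 1) {       // reached the (n/2 + 1)-th smallest element
            mid = i;
            break;
        }
    }

    printf("%d\n", mid);
    free(ar);
    return 0;
}
// Total: O(n + max) time and O(max) space; this is O(n) only when max = O(n).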

Question 17

I appreciate your participation. However, we're not a coding site and we're not looking for answers that consist entirely or primarily of code. Instead, we're looking for ideas, concise pseudocode, explanation, justification, etc. Also, not everyone here reads C code. I encourage you to edit your answer accordingly. Thank you.

Question 18

Please include an argument how the last loop requires $O(n)$ operations.

Question 19

The memory allocation is $O(k)$, where $k$ is the $\max (a)$. Since each number is represented with $O(n)$ bits, $k$ might be exponential in $n$. This also means the last loop is $O(2^n)$ like greybeard suggested.
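A small Python sketch may make the dependence on $k$ visible. It assumes non-negative integers, and the function name counting_median is hypothetical; it returns the size of the counting table next to the median so that the $O(k)$ cost can be observed directly.

```python
def counting_median(values):
    """Return (median, table_size) for a non-empty list of non-negative ints,
    where the median is the element of zero-based rank len(values) // 2."""
    n = len(values)
    k = max(values)
    counts = [0] * (k + 1)            # O(k) time and space, independent of n
    for v in values:                  # O(n)
        counts[v] += 1
    seen = 0
    for value, c in enumerate(counts):    # up to k + 1 iterations
        seen += c
        if seen >= n // 2 + 1:
            return value, len(counts)
```

For example, counting_median([1, 2, 1000000]) has only three inputs but allocates a table of 1000001 counters.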

Question 20

@sejpalsinh $n$ means the number of elements in the given array. It does NOT mean the max or an upper bound of the numbers in the given array.

Accepted answer by jmad · 2012-05-20 00:23:13Z

