To find the median of an unsorted array, we can make a min-heap in $O(n\log n)$ time for $n$ elements, and then we can extract one by one $n/2$ elements to get the median. But this approach would take $O(n \log n)$ time.
Can we do the same by some method in $O(n)$ time? If we can, then how?
-
3$\begingroup$ Note the related meta discussion; as it turns out, simple web searches lead to the answer to this question. $\endgroup$Raphael– Raphael2012年05月19日 10:30:53 +00:00Commented May 19, 2012 at 10:30
-
2$\begingroup$ stackoverflow.com/questions/2571358/median-of-a-billion-numbers $\endgroup$Evgeny Zislis– Evgeny Zislis2015年08月04日 20:51:44 +00:00Commented Aug 4, 2015 at 20:51
-
$\begingroup$ One can make a heap in O(n) time (check any good book or the internet for the build_heap method). Secondly, this is a very standard and well-known problem that can be found in any standard algorithm book or by simply doing a quick search on the internet, as pointed out by others. $\endgroup$codeR– codeR2024年01月31日 09:42:22 +00:00Commented Jan 31, 2024 at 9:42
-
$\begingroup$ Does this answer your question? Computational complexity of calculating mean vs median? $\endgroup$codeR– codeR2024年01月31日 09:44:47 +00:00Commented Jan 31, 2024 at 9:44
4 Answers 4
This is a special case of a selection algorithm that can find the $k$th smallest element of an array with $k$ is the half of the size of the array. There is an implementation that is linear in the worst case.
Generic selection algorithm
First let's see an algorithm find-kth
that finds the $k$th smallest element of an array:
find-kth(A, k)
pivot = random element of A
(L, R) = split(A, pivot)
if k = |L|+1, return pivot
if k ≤ |L| , return find-kth(L, k)
if k > |L|+1, return find-kth(R, k-(|L|+1))
The function split(A, pivot)
returns L,R
such that all elements in R
are greater than pivot
and L
all the others (minus one occurrence of pivot
). Then all is done recursively.
This is $O(n)$ in average but $O(n^2)$ in the worst case.
Linear worst case: the median-of-medians algorithm
A better pivot is the median of all the medians of sub arrays of A
of size 5, by using calling the procedure on the array of these medians.
find-kth(A, k)
B = [median(A[1], .., A[5]), median(A[6], .., A[10]), ..]
pivot = find-kth(B, |B|/2)
...
This guarantees $O(n)$ in all cases. It is not that obvious. These powerpoint slides are helpful both at explaining the algorithm and the complexity.
Note that most of the time using a random pivot is faster.
-
$\begingroup$ Is this size
5
standard ? What if size of A is less than 5 ? $\endgroup$Jayesh– Jayesh2016年02月21日 20:02:02 +00:00Commented Feb 21, 2016 at 20:02 -
4$\begingroup$ On the first algorithm:
return A[k]
is incorrect (unlessA
is sorted which would make the algorithm moot). Ifsplit
happened to divideA
such thatk = |L| + 1
you still don't know where thek
th element is. Your base case is when|A| = 1
else you still need to make one of the two recursive calls. $\endgroup$wcochran– wcochran2016年04月30日 04:48:34 +00:00Commented Apr 30, 2016 at 4:48 -
2$\begingroup$ @NickCaplinger fixed using web.archive.org $\endgroup$jmad– jmad2016年05月17日 01:06:50 +00:00Commented May 17, 2016 at 1:06
-
2$\begingroup$ Isn't the worst case for the generic selection algorithm O(NlogN)? Even if the recursive call leaves only 10% of the array after each call, then it's still a logarithm in basis 10. $\endgroup$bsky– bsky2017年03月29日 14:40:33 +00:00Commented Mar 29, 2017 at 14:40
-
1$\begingroup$ @octavian $O(N^2)$ seems right, in the worst case exactly one is removed, so there will be $N$ iterations, each of complexity $N$ $\endgroup$rtpax– rtpax2018年08月23日 18:32:24 +00:00Commented Aug 23, 2018 at 18:32
There is a randomized Monte Carlo algorithm to do this. It is described in [MU2017]. It is a linear time algorithm that fails to produce a solution with probability at most $n^{-1/4}$. As discussed in [MU2017], it is significantly simpler than the deterministic $O(n)$ algorithm and yields a smaller constant factor in the linear running time.
The main idea of the algorithm is to use sampling. We have to find two elements that are close together in the sorted order of the array and that have the median lie between them. See the reference [MU2017] for a complete discussion.
[MU2017] Michael Mitzenmacher and Eli Upfal. "Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis," chapter 3, pages 57-62. Cambridge University Press, Second edition, 2017.
Remember the Quicksort algorithm?
The difference is that you don't need the whole array sorted, you only need the portion containing the median in the right place. If you have n elements 0 to n-1, then the median is element (n - 1)/2 if n is even, and the average of elements n/2 - 1 and n/2 if n is odd.
So you start with one quicksort partition. Then you look at the two partitions: Instead of sorting them both, you only partition the one containing the one or two elements that you are interested in, and that half gets partitioned further.
(In the unlucky case that you end up with two partitions, one ending at n/2 - 1 and one starting at n, you can just take the maximum and minimum of each side)
You can easily adapt this method to find the k largest or k smallest items, or items with index k to k' in linear time.
-
$\begingroup$ (Is this a useful addition to the accepted answer?) $\endgroup$greybeard– greybeard2020年02月01日 12:40:00 +00:00Commented Feb 1, 2020 at 12:40
-
$\begingroup$ It tells you how you can find an algorithm yourself. That's much more important than being given the algorithm. $\endgroup$gnasher729– gnasher7292020年02月02日 15:56:01 +00:00Commented Feb 2, 2020 at 15:56
I have one more solution for Find median of unsorted array in O(n) time but it will Increase Space Complexity.
#include<stdio.h>
#include<conio.h>
int main()
{
int a[11] = {10,45,35,3,24,23,78,89,33,34,11};
int max=0,*ar,co=0,mid=0;
for(int i=0; i<sizeof(a)/4; i++) // O(n)
{
if(i==0)
max = a[i];
if(a[i]>max)
max = a[i];
}
ar =(int*)malloc((sizeof(int)*max)+1); // O(n)
for(int i=0; i<sizeof(a)/4; i++)
ar[a[i]] = 1;
for(int i=0; i<=max; i++) // 0(n)
if(ar[i]==1)
{
co++;
if(co == ((sizeof(a)/8)+1))
mid = i;
}
printf("%d",mid);
}
//Total = n+n+n = O(n)
-
1$\begingroup$ I appreciate your participation. However, we're not a coding site and we're not looking for answers that consist entirely or primarily of code. Instead, we're looking for ideas, concise pseudocode, explanation, justification, etc. Also, not everyone here reads C code. I encourage you to edit your question accordingly. Thank you. $\endgroup$2020年02月01日 06:09:56 +00:00Commented Feb 1, 2020 at 6:09
-
1$\begingroup$ Please include an argument how the last loop requires $O(n)$ operations. $\endgroup$greybeard– greybeard2020年02月01日 08:54:02 +00:00Commented Feb 1, 2020 at 8:54
-
$\begingroup$ The memory allocation is $O(k),ドル where $k$ is the $\max (a)$. Since each number is represented with $O(n)$ bits, $k$ might be exponential in $n$. This also means the last loop is $O(2^n)$ like greybeard suggested. $\endgroup$Shahaf Finder– Shahaf Finder2020年02月01日 12:46:36 +00:00Commented Feb 1, 2020 at 12:46
-
1$\begingroup$ @sejpalsinh $n$ means the number of element in the given array. It does NOT mean the max or an upper bound of the numbers in the given array. $\endgroup$喜欢算法和数学– 喜欢算法和数学2020年02月01日 14:42:20 +00:00Commented Feb 1, 2020 at 14:42