I am looking for an algorithm for selecting A [N/4] the element in an unsorted array A where N is the Number of elements of the array A. I want the algorithm to do the selection in sublinear times .I have knowledge of basic structures like a BST etc? Which one will be the best algorithm for me keeping in mind I want it to be the fastest possible and should not be too tough for me to implement.Here N can vary upto 250000.Any help will be highly appreciated.Note array can have non unique elements
-
Isn't this a projecteuler.net problem?Jon Egeland– Jon Egeland2012年07月07日 21:40:41 +00:00Commented Jul 7, 2012 at 21:40
-
2Given unsorted input, I think the best you can hope for is linear complexity. Unless one element can give you information about another, if you don't look at all elements, one you didn't look at could be the one you're looking for. For linear time, you're looking for the median of medians algorithm.Jerry Coffin– Jerry Coffin2012年07月07日 21:43:50 +00:00Commented Jul 7, 2012 at 21:43
-
can u help me with implementation as well because I cant find any good pseudo code for it?Ajax Aristodemos– Ajax Aristodemos2012年07月07日 22:03:51 +00:00Commented Jul 7, 2012 at 22:03
-
1wikipedia or any decent algorithms book (eg: MIT)Karoly Horvath– Karoly Horvath2012年07月07日 22:42:39 +00:00Commented Jul 7, 2012 at 22:42
1 Answer 1
As @Jerry Coffin mentioned, you cannot hope to get a sublinear time algorithm here unless you are willing to do some preprocessing up front. If you want a linear-time algorithm for this problem, you can use the quickselect algorithm, which runs in expected O(n) time with an O(n2) worst-case. The median-of-medians algorithm has worst-case O(n) behavior, but has a high constant factor. One algorithm that you might find useful is the introselect algorithm, which combines the two previous algorithms to get a worst-case O(n) algorithm with a low constant factor. This algorithm is typically what's used to implement the std::nth_element algorithm in the C++ standard library.
If you are willing to do some preprocessing ahead of time, you can put all of the elements into an order statistic tree. From that point forward, you can look up the kth element for any k in time O(log n) worst-case. The preprocessing time required is O(n log n), though, so unless you are making repeated queries this is unlikely to be the best option.
Hope this helps!