Probabilistic selection algorithm

Question 1

Given is:

An array of length N.
The array contains integers.
The integers are not necessarily sorted.

Find an algorithm that:

Returns (a close approximation of) the K-th smallest array element.
Has a runtime complexity of O(N log N) and a space complexity of O(log N).
The algorithm needn't be deterministic. In case of a probabilistic algorithm also provide a measure for the quality of the approximated result.

Question 2

Please add more information. What's the "selection algorithm" being modified? Also, if it's homework you should tag it as such.

Question 3

A selection algorithm is an algorithm that selects the k-th order statistic (the k-th smallest element) of an array without sorting it.

Question 4

@Algos - is this homework? (If it is, then I'm sure that the point of it is to get you to work the answer out for yourself.)

Question 5

Yes, it's homework. I've been stuck on it for a long time, so I was looking for a hint.

Question 6

We need a randomized algorithm that runs in O(n log n) time with high probability.

Answer 1 · Jeremiah Willcock · 2011-02-19 02:27:48Z

Treat the problem as something analogous to Quicksort. Given an element in the array, you can get its rank in O(n) time and O(lg n) space. You can use binary search to find an element with a given rank in O(lg n) iterations of that, for a total of O(lg n) space and O(n lg n) time.
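The rank computation this answer relies on can be sketched as a single counting pass. This is a minimal illustration, not the answerer's code: the function name `rank` and the 0-based convention (rank = number of strictly smaller elements) are assumptions made here. The array is only read, and the only extra state is one counter.

```python
def rank(arr, x):
    """Return the 0-based rank of x: the number of elements of arr smaller than x.

    Assumed convention for this sketch: the smallest element has rank 0.
    """
    count = 0
    for value in arr:  # one read-only pass over the array, O(n) time
        if value < x:
            count += 1
    return count


data = [7, 2, 9, 4, 1]
print(rank(data, 4))  # 2, because only 1 and 2 are smaller than 4
```

The counter holds a number no larger than n, so it fits in O(log n) bits, which is presumably where the O(lg n) space bound comes from.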

13 Comments

Algos · 2011-02-19T02:34:53.243Z

How do you get the rank of an element in O(n) when the array is read-only? The problem is to get the rank of an element in O(n log n) time.

Jeremiah Willcock · 2011-02-19T02:36:26.317Z

@Algos: You can get the rank of a given element in O(n); getting an element with a given rank is what we are trying to solve.

Algos · 2011-02-19T02:39:07.997Z

Yep, got that. Can you explain the algorithm further? I'm still somewhat confused.

Jeremiah Willcock · 2011-02-19T02:47:36.543Z

@Algos: Assume rank 0 is the smallest array element. Get the smallest and largest elements of the array (call them lo and hi) and their ranks (lo_rank and hi_rank). Label A: Then, pick some element m between lo and hi (in O(n) time) and compute its rank m_rank (in O(n)). If m_rank == k, we are done. If m_rank < k, replace lo and lo_rank with m and m_rank respectively and go to A; if m_rank > k, do the same with hi and go to A.

Algos · 2011-02-19T02:53:47.263Z

By rank, do you mean the index? The min element will have rank 0 and the max element rank n.
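The lo/hi loop from Jeremiah Willcock's comment above can be sketched as follows. This is only an illustration under stated assumptions: the elements are distinct, ranks are 0-based, and "pick some element m between lo and hi" is done by reservoir sampling, which is one possible choice and not something the comment specifies. The names `random_between` and `select_by_rank` are made up for this sketch.

```python
import random


def rank(arr, x):
    """Number of elements smaller than x (0-based rank), one read-only pass."""
    return sum(1 for v in arr if v < x)


def random_between(arr, lo, hi, rng):
    """Pick uniformly among elements strictly between lo and hi.

    Reservoir sampling: one pass, O(1) extra space (an assumption of this sketch).
    """
    chosen, seen = None, 0
    for v in arr:
        if lo < v < hi:
            seen += 1
            if rng.randrange(seen) == 0:  # keep v with probability 1/seen
                chosen = v
    return chosen


def select_by_rank(arr, k, rng=random):
    """Return the element of rank k (0-based); assumes distinct elements."""
    if not 0 <= k < len(arr):
        raise ValueError("k out of range")
    lo, hi = min(arr), max(arr)  # ranks 0 and len(arr) - 1
    if k == 0:
        return lo
    if k == len(arr) - 1:
        return hi
    while True:  # "Label A"
        # Invariant: rank(lo) < k < rank(hi), so a candidate always exists.
        m = random_between(arr, lo, hi, rng)
        m_rank = rank(arr, m)
        if m_rank == k:
            return m
        if m_rank < k:
            lo = m
        else:
            hi = m


print(select_by_rank([7, 2, 9, 4, 1], 2))  # 4
```

Each iteration costs two passes over the array, and with a uniformly random m the number of iterations behaves like the recursion depth of randomized quicksort, so the expected total is O(n log n) under these assumptions.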

Answer 2 · Dave O. (community wiki) · 2011-02-19 03:43:48Z

1. Iterate over the array once to find the minimum and maximum element.
2. Iterate over the array to find a random element pivot between minimum and maximum (exclusive).
3. Iterate over the array and count the number of elements less than or equal to pivot (numSmallerEqual) and the number of elements greater than pivot (numBigger).
   1. If K <= numSmallerEqual, set maximum=pivot.
   2. Else set minimum=pivot.
4. If (maximum - minimum)==0, output minimum, terminate.
5. If (maximum - minimum)==1
   1. If K <= numSmallerEqual, output minimum.
   2. Else output maximum.
   3. Terminate.
6. GOTO 2.

EDIT: Corrected the error pointed out by IVlad, still not tested.
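One way to make the idea of narrowing a [minimum, maximum] value range terminate is to bisect the range at its integer midpoint instead of using a random pivot. The sketch below is a swapped-in variant, not Dave O.'s algorithm as posted: it assumes integer elements and a 1-based K, and the function name is hypothetical. Moving the lower bound to `mid + 1` means the range shrinks on every pass.

```python
def kth_smallest_by_value_bisection(arr, k):
    """Return the k-th smallest (1-based) integer in arr without modifying it."""
    if not 1 <= k <= len(arr):
        raise ValueError("k out of range")
    lo, hi = min(arr), max(arr)  # the answer always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2  # floor division, so lo <= mid < hi
        num_smaller_equal = sum(1 for v in arr if v <= mid)
        if k <= num_smaller_equal:
            hi = mid      # at least k elements are <= mid, so the answer is too
        else:
            lo = mid + 1  # fewer than k elements are <= mid, so the answer is larger
    return lo


print(kth_smallest_by_value_bisection([1, 3], 2))  # 3
```

This variant handles duplicates and uses O(1) extra space, but the number of passes depends on the value range (maximum - minimum) and not on the array length.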

edited Feb 20, 2011 at 2:29

community wiki

5 revs, 2 users 88%
Dave O.

14 Comments

corsiKa · 2011-02-19T05:29:23.027Z

This in fact does not require the O(lg n) space that the constraint allows, but only O(1). Very nice!

IVlad · 2011-02-19T11:08:13.73Z

Except it's wrong.

1 3, k = 2. pivot = 2, minimum = 1, maximum = 3, numSmallerEqual = 1, minimum = 2. pivot = 2, numSmallerEqual = 1, minimum = 2

... infinite loop.

IVlad · 2011-02-19T13:52:45.817Z

@Dave - I would have suggested a fix if I had one. I'm not sure how to avoid the infinite loop.

IVlad · 2011-02-19T18:55:05.223Z

I haven't given a lot of thought to the correctness of your updated algorithm, but it's worth pointing out that it doesn't really follow the requirements. The complexity is O(n log v), where v is the difference between the maximum and minimum values of the array, not O(n log n). v can be independent of n.

Aryabhatta · 2011-02-19T22:33:46.303Z

Why did you change the question?

Answer 3 · aaz · 2011-02-19 02:27:49Z

Don't construct partitions. Describe what the partitions are (in constant space), and recursively select on this.

Each subarray that quickselect recurses into can be described by its bounds (min and max element values, not their indexes). Iterating over a subarray so described requires O(n) comparisons, which are made at every recursion level, up to the same depth as in quicksort: O(log n) in the average case.

Quicksort also makes O(n) comparisons at every recursion level, while ordinary permuting quickselect makes O(n) comparisons in total in the average case (because it always recurses into only one partition).

Here's a sample implementation for distinct elements with an ordinary quickselect implementation for comparison.
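The linked implementation is not reproduced here, so the following is only a sketch of the idea as described: a quickselect whose "partitions" exist purely as a pair of value bounds. It assumes distinct integers (so that `pivot - 1` and `pivot + 1` are valid inclusive bounds) and a 0-based k relative to the current partition. The name `select_readonly` and the reservoir-sampled pivot are assumptions of this sketch.

```python
import random


def select_readonly(arr, k, rng=random):
    """k-th smallest (0-based) of distinct integers; arr is never modified."""
    if not 0 <= k < len(arr):
        raise ValueError("k out of range")
    lo, hi = min(arr), max(arr)  # the current partition is {x : lo <= x <= hi}
    while True:
        # Pass 1: choose a uniformly random pivot inside the partition.
        pivot, seen = None, 0
        for x in arr:
            if lo <= x <= hi:
                seen += 1
                if rng.randrange(seen) == 0:
                    pivot = x
        # Pass 2: count partition elements smaller than the pivot.
        smaller = sum(1 for x in arr if lo <= x < pivot)
        if k == smaller:
            return pivot
        if k < smaller:
            hi = pivot - 1       # continue in the lower partition
        else:
            lo = pivot + 1       # continue in the upper partition
            k -= smaller + 1     # skip the lower partition and the pivot


print(select_readonly([15, 3, 9, 27, 1], 1))  # 3
```

Every step scans all n elements and ignores those outside the bounds, so unlike ordinary quickselect the per-step cost does not shrink with the partition. The loop is the tail call of the recursion, which is why only constant extra space is needed.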

edited Feb 21, 2011 at 0:14

18 Comments

aaz · 2011-02-19T02:38:23.447Z

@Moron – You want O(1) per iteration. "Numbers less than a given element."

aaz · 2011-02-19T02:53:35.043Z

@Moron – You recurse on all the elements less than the one at rank 100. Say you then find one at rank 50. You recurse on all the elements between the one ranked 50 and the one ranked 100. Iteration, instead of being free, now uses 2*n comparisons per recursion.

aaz · 2011-02-19T11:17:29.057Z

@Moron – That's not my assumption, and I don't need a contiguous subarray. lo and hi are element values. I will iterate over all n elements of the original array in each step, but I will ignore those x that don't satisfy lo ≤ x ≤ hi.

aaz · 2011-02-19T22:18:42.35Z

@Moron – Only an intuitive one. Randomized quickselect uses O(n) comparisons per step and recurses an expected O(log n) times, giving O(n log n) runtime. This uses an extra 2n comparisons per step and recurses exactly like quickselect, so it's still O(n log n).

aaz · 2011-02-19T23:43:22.52Z

@Moron – You're right. I was thinking that quickselect performs the same as quicksort (expected O(n log n) comparisons). So my algorithm uses more comparisons than quickselect, but approximately the same number of comparisons as quicksort, which Wikipedia says is expected O(n log n). Maybe "with high probability" from the question means "low probability of worst-case behaviour"? Also my algorithm uses only constant space (one recursion which is a tail call), so it's probably not the required answer.

Answer 4 · jswolf19 · 2011-02-24 11:38:23Z

You can take the parameterization approach:

Since k is specified in the problem, we can treat it as a constant, so O(k log N) space should be acceptable.

Divide the array into k partitions of equal length (O(1) time, and O(k log N) space to store the partition borders, because each key should only require log N space).
Find the smallest element in each partition (O(N) time, and O(k log N) space to store the smallest elements).
Return the max of the k elements you have (O(k) time).

Since there's room for more cycles in there, there's probably a better way to do it. Also note that this would perform very poorly on an array that is sorted...
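The three steps above can be sketched as follows. This is an illustration under assumptions made here: k is 1-based, 1 <= k <= N, and the partitions are contiguous index ranges of near-equal length. The function name is hypothetical. The result is only an approximation: because the k partition minima sit at k different positions, their maximum can never be below the true k-th smallest element, but it can be well above it.

```python
def approx_kth_smallest(arr, k):
    """Approximate the k-th smallest (1-based) element as the max of k partition minima."""
    n = len(arr)
    if not 1 <= k <= n:
        raise ValueError("k out of range")
    minima = []
    for i in range(k):
        # Partition i covers indexes [start, end); non-empty because k <= n.
        start, end = i * n // k, (i + 1) * n // k
        minima.append(min(arr[j] for j in range(start, end)))
    return max(minima)


shuffled = [5, 1, 4, 2, 3, 9, 7, 8, 6]
print(approx_kth_smallest(shuffled, 3))               # 6 (the true 3rd smallest is 3)
print(approx_kth_smallest(list(range(1, 10)), 3))     # 7 on sorted input
```

The sorted input shows the weakness mentioned above: every partition minimum is the first element of its partition, so the result lands near the start of the last partition instead of near rank k.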
