I need to sort an array of n real numbers that was randomly generated as follows:
I have a given set of k closed intervals [a1,b1], [a2,b2], ..., [ak,bk] whose endpoints are natural numbers. These intervals may overlap. Each interval [ai,bi] is assigned a number ci, the probability of drawing it. After drawing an interval [ai,bi], we draw a number from [ai,bi] uniformly at random and place it in the array.
For example, I have a given array T = [6.1, 1.2, 1.5, 3.5, 4.5, 2.5, 3.9, 7.8], and an additional array with interval information consisting of triples (ai, bi, ci): P = [(1, 5, 0.75), (4, 8, 0.25)]
The answer is obviously T = [1.2, 1.5, 2.5, 3.5, 3.9, 4.5, 6.1, 7.8]
My question is, knowing how these numbers in the array were drawn, can we sort this array more efficiently? For example, in linear time? I was thinking that perhaps some modified version of bucket sort could be used here, but I have no idea what that would look like.
If anything I described is unclear, please tell me; my English is far from perfect.
-
$\begingroup$ I can see the information from $\text P$ helping to pick a pivot or two in the first few partitions of quicksort. I don't expect it to reduce run time, let alone the order of growth. Science, anyone? $\endgroup$ – greybeard, Commented Mar 21, 2022 at 20:52
-
$\begingroup$ Cross-posted: cs.stackexchange.com/q/150037/755, stackoverflow.com/q/71591126/781723. Please do not post the same question on multiple sites. $\endgroup$ – D.W. ♦, Commented May 4, 2022 at 0:35
1 Answer
I'm going to assume that $k$ is small compared to $n$, because otherwise there isn't a lot of point in trying to optimise this; the cost of preprocessing the $k$ intervals into some kind of order would overwhelm the cost of just sorting $n$ elements at the end.
One possible solution is to maintain $k$ arrays, one for each interval, sort them independently, and then perform a $k$-way merge. Using a min-heap, this final step will take $O(n \log k)$ time in the worst case, and more like $O(n + k \log k)$ time if the intervals do not overlap.
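The answer's approach can be sketched as follows, assuming that each value's source interval is recorded at generation time (the function name and the `sources` array are illustrative, not from the answer):

```python
import heapq

def sort_by_intervals(values, sources, k):
    """Sort `values`, where sources[i] is the index of the interval
    that values[i] was drawn from (assumed to be known)."""
    # Distribute each value into the bucket for its source interval: O(n)
    buckets = [[] for _ in range(k)]
    for v, s in zip(values, sources):
        buckets[s].append(v)
    # Sort each bucket independently
    for b in buckets:
        b.sort()
    # k-way merge of the k sorted buckets via a min-heap: O(n log k)
    return list(heapq.merge(*buckets))
```

If the intervals happen to be disjoint, the buckets can simply be concatenated in interval order instead of merged, which brings the final step down to linear time.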
-
$\begingroup$ "maintain $k$ arrays" How do you set up the arrays? $\endgroup$ – greybeard, Commented Mar 22, 2022 at 0:23
-
$\begingroup$ Yes, $k$ is smaller than $n$, much smaller. Thanks, I will try to do it that way. $\endgroup$ – PK96, Commented Mar 22, 2022 at 8:59
-
$\begingroup$ But @greybeard is right: how do we quickly move these $n$ numbers into the appropriate arrays? $\endgroup$ – PK96, Commented Mar 22, 2022 at 9:35
-
$\begingroup$ Perhaps I didn't understand the problem correctly, but the idea is that you have $k$ arrays corresponding to the intervals, and you assign each value to the interval it came from. I didn't realise that you don't necessarily have that information, in which case you would need to do something a little more clever, like splitting the whole range into subranges of roughly equal probability. $\endgroup$ – Commented Mar 22, 2022 at 10:35
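One way to read the comment's suggestion, when the source interval of each value is unknown, is a bucket sort whose boundaries are quantiles of the mixture distribution, so that each bucket receives roughly $n/m$ elements in expectation. This is a sketch under that interpretation; the function name, the parameter $m$, and the numeric CDF inversion are my own choices, not from the thread:

```python
import bisect

def bucket_sort_mixture(values, intervals, m):
    """Bucket sort where bucket boundaries are the j/m quantiles of the
    mixture distribution described by `intervals` = [(a, b, c), ...]."""
    lo = min(a for a, _, _ in intervals)
    hi = max(b for _, b, _ in intervals)

    # CDF of the mixture: each uniform component contributes c * (x-a)/(b-a)
    def cdf(x):
        s = 0.0
        for a, b, c in intervals:
            if x >= b:
                s += c
            elif x > a:
                s += c * (x - a) / (b - a)
        return s

    # Invert the (nondecreasing, piecewise-linear) CDF by bisection
    def quantile(q):
        l, r = lo, hi
        for _ in range(50):
            mid = (l + r) / 2
            if cdf(mid) < q:
                l = mid
            else:
                r = mid
        return (l + r) / 2

    bounds = [quantile(j / m) for j in range(1, m)]

    # Place each value into its bucket, sort buckets, concatenate
    buckets = [[] for _ in range(m)]
    for v in values:
        buckets[bisect.bisect_right(bounds, v)].append(v)
    out = []
    for b in buckets:
        b.sort()
        out.extend(b)
    return out
```

With equal-probability buckets the expected bucket sizes are balanced, which is what makes the standard bucket-sort analysis give expected linear time; choosing $m$ proportional to $n$ is the usual setting.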