I need to sort an array of n real numbers that was randomly generated as follows:
I have a given set of k closed intervals [a1,b1], [a2,b2], ..., [ak,bk] whose endpoints are natural numbers. These intervals may overlap. Each interval [ai,bi] is assigned a number ci, the probability of drawing it. After drawing an interval [ai,bi], we draw a number from [ai,bi] uniformly at random and place it in the array.
For example, I have a given array T = [6.1, 1.2, 1.5, 3.5, 4.5, 2.5, 3.9, 7.8], and an additional array with interval information consisting of triples (ai, bi, ci): P = [(1, 5, 0.75), (4, 8, 0.25)]
The answer is obviously T = [1.2, 1.5, 2.5, 3.5, 3.9, 4.5, 6.1, 7.8]
My question is, knowing how these numbers in the array were drawn, can we sort this array more efficiently? For example, in linear time? I was thinking that perhaps some modified version of bucket sort could be used here, but I have no idea what that would look like.
If anything I described is unclear, please tell me; my English is far from perfect.
-
$\begingroup$ I can see the information from $\text P$ helping to pick a pivot or two in the first few partitions of quicksort. I don't expect it to reduce run time, let alone the order of growth. Science, anyone? $\endgroup$ – greybeard, Commented Mar 21, 2022 at 20:52
-
$\begingroup$ Cross-posted: cs.stackexchange.com/q/150037/755, stackoverflow.com/q/71591126/781723. Please do not post the same question on multiple sites. $\endgroup$ – D.W. ♦, Commented May 4, 2022 at 0:35
1 Answer
I'm going to assume that $k$ is small compared to $n$, because otherwise there isn't a lot of point in trying to optimise this; the cost of preprocessing the $k$ intervals into some kind of order would overwhelm the cost of just sorting $n$ elements at the end.
One possible solution is to maintain $k$ arrays, one for each interval, sort them independently, and then perform a $k$-way merge. Using a min-heap, this final step will take $O(n \log k)$ time in the worst case, and more like $O(n + k \log k)$ time if the intervals do not overlap.
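The answer's approach can be sketched as follows, assuming that each value's source interval is recorded at generation time (the function name and the `sources` array are illustrative, not from the answer):

```python
import heapq

def sort_by_intervals(values, sources, k):
    """Sort `values`, where sources[i] is the index of the interval
    that values[i] was drawn from (assumed to be known)."""
    # Distribute each value into the bucket for its source interval: O(n)
    buckets = [[] for _ in range(k)]
    for v, s in zip(values, sources):
        buckets[s].append(v)
    # Sort each bucket independently
    for b in buckets:
        b.sort()
    # k-way merge of the k sorted buckets via a min-heap: O(n log k)
    return list(heapq.merge(*buckets))
```

If the intervals happen to be disjoint, the buckets can simply be concatenated in interval order instead of merged, which brings the final step down to linear time.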
-
$\begingroup$ "maintain $k$ arrays" How do you set up the arrays? $\endgroup$ – greybeard, Commented Mar 22, 2022 at 0:23
-
$\begingroup$ Yes, $k$ is smaller than $n$, much smaller. Thanks, I will try to do it that way. $\endgroup$ – PK96, Commented Mar 22, 2022 at 8:59
-
$\begingroup$ But @greybeard is right: how do we quickly move these $n$ numbers into the appropriate arrays? $\endgroup$ – PK96, Commented Mar 22, 2022 at 9:35
-
$\begingroup$ Perhaps I didn't understand the problem correctly, but the idea is that you have $k$ arrays corresponding to the intervals, and you assign each value to the interval it came from. I didn't realise that you don't necessarily have that information, in which case you would need to do something a little more clever, like splitting the whole range into subranges of roughly equal probability. $\endgroup$ – Commented Mar 22, 2022 at 10:35
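One way to read the comment's suggestion, when the source interval of each value is unknown, is a bucket sort whose boundaries are quantiles of the mixture distribution, so that each bucket receives roughly $n/m$ elements in expectation. This is a sketch under that interpretation; the function name, the parameter $m$, and the numeric CDF inversion are my own choices, not from the thread:

```python
import bisect

def bucket_sort_mixture(values, intervals, m):
    """Bucket sort where bucket boundaries are the j/m quantiles of the
    mixture distribution described by `intervals` = [(a, b, c), ...]."""
    lo = min(a for a, _, _ in intervals)
    hi = max(b for _, b, _ in intervals)

    # CDF of the mixture: each uniform component contributes c * (x-a)/(b-a)
    def cdf(x):
        s = 0.0
        for a, b, c in intervals:
            if x >= b:
                s += c
            elif x > a:
                s += c * (x - a) / (b - a)
        return s

    # Invert the (nondecreasing, piecewise-linear) CDF by bisection
    def quantile(q):
        l, r = lo, hi
        for _ in range(50):
            mid = (l + r) / 2
            if cdf(mid) < q:
                l = mid
            else:
                r = mid
        return (l + r) / 2

    bounds = [quantile(j / m) for j in range(1, m)]

    # Place each value into its bucket, sort buckets, concatenate
    buckets = [[] for _ in range(m)]
    for v in values:
        buckets[bisect.bisect_right(bounds, v)].append(v)
    out = []
    for b in buckets:
        b.sort()
        out.extend(b)
    return out
```

With equal-probability buckets the expected bucket sizes are balanced, which is what makes the standard bucket-sort analysis give expected linear time; choosing $m$ proportional to $n$ is the usual setting.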