Consider the Merge-sort algorithm, modified as follows:
Suppose we extend it to divide the input array into $m$ not necessarily equal parts and, after sorting each of them, merge them. Can we still conclude that the running time will be $O(n\log n)$?
I think the running time can be $O(n^2)$, for the following reason:
If the algorithm divides the array into three sections such that the first two sections contain only a constant number of elements and the third section contains almost all of the elements, and it keeps dividing in this way at every level of recursion, then the recursion has linear depth with a linear-cost merge at each level, so the running time will be $O(n^2)$.
- You are giving no justification for the $O(n^2)$ behavior, except "I think". – user16034, Mar 13, 2023 at 8:12
- There seems to be at least one word missing from the supposition. – greybeard, Mar 13, 2023 at 8:49
- Yes, if I split $n$ numbers into 1ドル$ / 1ドル$ / $n-2$ numbers then the runtime will be $\Theta(n^2)$. But that would be utterly stupid. – gnasher729, Mar 13, 2023 at 9:08
2 Answers
The linearithmic behavior of MergeSort is due to the fact that the sizes of the subsets decrease geometrically rather than arithmetically, so that the number of passes over the whole data is logarithmic.
You might very well roll a modified MergeSort that subdivides into fractions $\frac12,\frac13,\frac16$ and keeps the same complexity.
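To see why, note the recurrence $T(n) = T(n/2) + T(n/3) + T(n/6) + \Theta(n)$: the fractions sum to 1ドル$, so each level of the recursion tree does $\Theta(n)$ total work, and each fraction is bounded away from 1ドル$, so the depth is $\Theta(\log n)$, giving $\Theta(n\log n)$ overall. Here is a minimal Python sketch of such a three-way MergeSort (the rounding of the cut points and the pairwise merging are my own illustrative choices):

```python
def merge(a, b):
    # standard stable linear-time merge of two sorted lists
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def merge_sort_uneven(xs):
    # split into roughly 1/2, 1/3 and 1/6 of the input; every part
    # holds at most about half the elements, so the recursion depth
    # is O(log n) and the total running time O(n log n)
    n = len(xs)
    if n <= 1:
        return list(xs)
    i, j = n // 2, n // 2 + n // 3  # cut points for the three parts
    return merge(merge(merge_sort_uneven(xs[:i]), merge_sort_uneven(xs[i:j])),
                 merge_sort_uneven(xs[j:]))

data = [5, 3, 8, 1, 9, 2, 7, 6]
assert merge_sort_uneven(data) == sorted(data)
```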
But if we pursue your idea of handling subarrays of constant size, the archetypal algorithm that you obtain is StraightInsertionSort (recursively splitting $n$ into sizes $n-1$ and 1ドル$), which notoriously has $O(n^2)$ behavior, because the number of passes is linear.
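Written as a recurrence, with a linear-cost pass at each level, this degenerate split gives
$$T(n) = T(n-1) + \Theta(n) = \Theta\!\left(\sum_{k=1}^{n} k\right) = \Theta(n^2),$$
in contrast with the logarithmic depth of the geometric splits above.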
You get a similar situation when comparing dichotomic search and linear search in a sorted array.
The question leaves some room for interpretation, so I apologize in advance if my answer misses the mark.
You can in fact merge (sort) $m \le n$ sorted "runs" in at most $\mathcal{O}(n\log n)$ time. This is one of the many great ideas that went into the TimSort algorithm. The original design document is Tim Peters' listsort.txt, which ships with the CPython source.
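As a quick illustration (a heap-based $m$-way merge rather than TimSort's pairwise strategy), merging $m$ runs with $n$ elements in total costs $O(n\log m) \subseteq O(n\log n)$; Python's standard heapq.merge does exactly this:

```python
import heapq

# m sorted runs of arbitrary, unequal sizes (illustrative data)
runs = [[1, 4, 9], [2, 3], [0, 5, 6, 7, 8]]

# heapq.merge keeps a heap of the current head of each run, so
# producing all n elements costs O(n log m)
merged = list(heapq.merge(*runs))
assert merged == sorted(x for run in runs for x in run)
```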
One of the ideas of TimSort is to look for already sorted subsequences (Tim Peters dubbed them runs). This of course may lead to runs of vastly different size, just like You proposed. In order to merge the runs efficiently, TimSort delays the merging, putting the runs on a stack, looking for opportunities to merge sequences of roughly equal size. To prevent the stack from growing super-logarithmically, TimSort ensures that the sizes of the runs on the stack (top to bottom) grow at least as fast as the Fibonacci sequence.
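A minimal sketch of that stack discipline, with runs stored as plain Python lists (CPython's real implementation keeps (base, length) pairs, merges in place, and adds galloping; the names below are my own):

```python
import heapq

def two_way_merge(a, b):
    # stable linear-time merge of two sorted runs
    return list(heapq.merge(a, b))

def merge_collapse(runs):
    # Restore TimSort's invariants on the run stack.  With A, B, C the
    # three topmost runs, it enforces |A| > |B| + |C| and |B| > |C|,
    # so run lengths grow at least as fast as the Fibonacci numbers
    # and the stack stays O(log n) deep.
    while len(runs) > 1:
        if len(runs) >= 3 and len(runs[-3]) <= len(runs[-2]) + len(runs[-1]):
            if len(runs[-3]) < len(runs[-1]):
                # merge A with B, keeping C on top
                b = runs.pop(-2)
                a = runs.pop(-2)
                runs.insert(-1, two_way_merge(a, b))
            else:
                # merge B with C
                b, c = runs.pop(-2), runs.pop()
                runs.append(two_way_merge(b, c))
        elif len(runs[-2]) <= len(runs[-1]):
            b, c = runs.pop(-2), runs.pop()
            runs.append(two_way_merge(b, c))
        else:
            break  # invariants hold; wait for the next run to be pushed
```

Each newly detected run is pushed and merge_collapse is called; once the input is exhausted, a final pass merges whatever remains on the stack into a single sorted run.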
So if You use such a "Fibonacci" stack for merging, then You can even handle arbitrary splits during recursion (provided Your programming language and/or runtime supports call stacks of size $\mathcal{O}(n)$). Of course it is not exactly merge sort anymore...