Slow merge sort

Question 1

template<class Compare, class Iterator>
void merge_sort(Iterator start, Iterator fin, int sort_type) {
 Compare comp;
 typedef typename iterator_traits<Iterator>::value_type _value_type; //black type magic to infer data type
 if (distance(start, fin) > 1) {
 vector<_value_type> left (start, start+distance(start, fin)/2);
 vector<_value_type> right (start+distance(start, fin)/2, fin);
 merge_sort<Compare>(left.begin(), left.end(), 2);
 merge_sort<Compare>(right.begin(), right.end(), 2);
 auto i = left.begin();
 auto j = right.begin();
 while (i!= left.end() or j!= right.end()) {
 if (i == left.end()) {
 *start++ = *j++;
 } else if (j == right.end()) {
 *start++ = *i++;
 } else if (comp(*i, *j)) {
 *start++ = *i++;
 } else {
 *start++ = *j++;
 }
 }
 }

So, I've written this implementation of merge_sort, but it seems to be quite slow --- it took 1500ms to sort a vector of 1 000 000 random ints, while standard qsort did the same in less than a tenth of that --- 130ms. Is there (and there definitely is) something wrong with my code, and how can I fix it so that it's more effective?

UPDATE: So, umm, I've updated the code and is should use only one auxiliary vector. The speed did not improve much, though. Anything else?

template<class Compare, class Iterator>
void merge_sort(Iterator start, Iterator fin, int sort_type) {
 Compare comp;
 typedef typename iterator_traits<Iterator>::value_type _value_type; //black type magic to infer data type
 if (distance(start, fin) > 1) {
 static vector<_value_type> temp (distance(start, fin), 0);
 auto i = start;
 auto j = start + distance(start, fin)/2;
 auto k = temp.begin();
 merge_sort<Compare>(i, j, 2);
 merge_sort<Compare>(j, fin, 2);
 while (i != start+distance(start, fin)/2 or j != fin) {
 if (i == start+distance(start, fin)/2) {
 *(k++) = *(j++);
 } else if (j == fin){
 *(k++) = *(i++);
 } else if (comp(*i, *j)) {
 *(k++) = *(i++);
 } else {
 *(k++) = *(j++);
 }
 }
 copy(temp.begin(), temp.begin()+distance(start, fin), start);
 }

UPDATE 1.5 It's a miracle this even works; it definitely shouldn't. I'll fix this later.

Question 2

Your code makes too many allocations and deallocations. For a million-item array the recursion goes about 20 levels deep, on each level you allocate and deallocate two new vectors. On the deepest level you allocate a million (yes, million!) one-item vectors. Luckily, not all at the same time; anyway you do it. And you do about two millions allocations in total. Twice the length of an array being sorted. Like almost all, who try to implement the merge sort...

Question 3

@CiaPan That's why bottom up merge sort is better IMO. You can easily reuse the temp array you need for the merge.

Question 4

@ratchetfreak You can do that in both approaches, actually. Just fill the temp array with a copy of data and merge alternatively from appropriate part of one to the other, switching the direction on each level of recursion. Thanks to the initial copying, the source is ready to use on the deepest level, despite a direction which it works.

Question 5

This example shows, how good algorithms, tailored for specific data structures (sequential file or list, in this case), may degenerate when simply implemented on a too high, abstract level (iterator over array).

Question 6

IMO it also shows how recursion may look elegant it can still limit the implementation in unexpected ways.

Question 7

Don't allocate a vector in each recursive call

You just need one copy of the whole vector as a parameter, then use both as the auxiliary interchangeably at each recursive call.

Example in Java so you can see what I mean (Taken from this source here ):

private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi)
{
 if (hi <= lo) return;
 int mid = lo + (hi - lo) / 2;
 sort (aux, a,lo, mid);
 sort (aux, a, mid+1, hi);
 merge(a, aux, lo, mid, hi);
}

Use insertion sort as base case

You can use insertion sort as a base case for a given threshold (Use some benchmarks to find it. Insertion sort is faster for smaller inputs.

Question 8

But shouldn't static take care of reallocation? Doesn't that mean that temp will be created only once?

Question 9

In a merge sort (and quick sort) you need to stop your recursion when there is a handful of iterations left (say 32 elements) and then use an in-place sort (some even use bubblesort shrugs) to sort the last elements in that range. The reason is that the recursion and splitting overhead becomes too large when the number of elements are small.

MAG MAG 2,97415 silver badges30 bronze badges · Answer 1 · 2016-01-29 23:45:48Z

Don't allocate a vector in each recursive call

You just need one copy of the whole vector as a parameter, then use both as the auxiliary interchangeably at each recursive call.

Example in Java so you can see what I mean (Taken from this source here ):

private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi)
{
 if (hi <= lo) return;
 int mid = lo + (hi - lo) / 2;
 sort (aux, a,lo, mid);
 sort (aux, a, mid+1, hi);
 merge(a, aux, lo, mid, hi);
}

Use insertion sort as base case

You can use insertion sort as a base case for a given threshold (Use some benchmarks to find it. Insertion sort is faster for smaller inputs.

But shouldn't static take care of reallocation? Doesn't that mean that temp will be created only once?

Emily L. Emily L. 16.7k1 gold badge39 silver badges89 bronze badges · Answer 2 · 2016-01-29 21:34:57Z

In a merge sort (and quick sort) you need to stop your recursion when there is a handful of iterations left (say 32 elements) and then use an in-place sort (some even use bubblesort shrugs) to sort the last elements in that range. The reason is that the recursion and splitting overhead becomes too large when the number of elements are small.

Stack Exchange Network

Slow merge sort

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Slow merge sort

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions