Do you see any inefficient parts in the following code? On the internet I see 30 ms for 100,000 integers, but I can only sort 100,000 integers in 300 ms. The data is random. I am using a late-2013 MacBook Pro, so I don't expect a 10x slowdown from the CPU, but who knows.
void mergesort2(int* list, int begin, int end, int *tmplist=0){
    bool allocated = false;
    if (end-begin < 1) {
        return;
    }
    if (tmplist == 0) {
        allocated = true;
        tmplist = new int[end-begin+1];
    }
    int middle = (begin+end)/2;
    mergesort2(list, begin, middle, tmplist);
    mergesort2(list, middle+1, end, tmplist);
    for (int first=begin, second=middle+1, tmp=0; tmp<end-begin+1; tmp++) {
        if (first > middle) {
            memcpy(tmplist+tmp, list+second, sizeof(int)*(end-begin-tmp+1));
        }
        else if (second > end) {
            memcpy(tmplist+tmp, list+first, sizeof(int)*(end-begin-tmp+1));
        }
        else if (list[first] <= list[second]) {
            tmplist[tmp] = list[first++];
        }else{
            tmplist[tmp] = list[second++];
        }
    }
    memcpy(list+begin, tmplist, (end-begin+1)*sizeof(int));
    if (allocated) {
        delete tmplist;
    }
}
4 Answers
A few things here...

- The format of your input parameters is inconvenient. You should probably just replace the begin and end parameters with a single size. The method can then be called as:

mergesort2(list, middle, tmplist);
mergesort2(list + middle, end - middle, tmplist);

This also fixes a lot of the +1 offsets you have where the end index is not convenient. It will also remove a lot of math where you are doing end-begin+1 or equivalent.

- Consider changing your mid-point calculation to be:

int middle = begin + (end - begin) / 2;

This avoids a bug when begin and end are large values and the sum of them overflows the int. Actually, if it were me, I would be using the size method above, and I would declare middle to be 1 different than you:

int middle = (size + 1) / 2;

This makes the second 'half' the 'smaller' half, and it removes all the '+1' offsets you have to apply later in the code (it also changes some other range checks/limits), but overall it will simplify the code.

- Small nit-pick: I dislike magic numbers, especially 'vagrant' + 1 values... consider this line here:

for (int first=begin, second=middle+1, tmp=0; tmp<end-begin+1; tmp++) {

This +1 could be avoided with (if you have a good size and middle):

for (int first=begin, second=middle, tmp=0; tmp<size; tmp++) {

- Performance/correctness: declare begin and end to be constants in the method signature.

- Performance/correctness: inside your loop you have two conditions:

if (first > middle) { .... } else if (second > end) { .... }

Inside each of those blocks you should break; so that you do not need to repeatedly copy the memory which does not need merging.
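Putting those suggestions together (including the break from the last point), a size-based version might look roughly like the sketch below. This is my own consolidation for illustration, not benchmarked code; it assumes the same headers as the original (e.g. string.h/cstring for memcpy):

void mergesort2(int* list, const int size, int* tmplist = 0) {
    bool allocated = false;
    if (size < 2) {
        return;
    }
    if (tmplist == 0) {
        allocated = true;
        tmplist = new int[size];
    }
    const int middle = (size + 1) / 2;         // second 'half' is the smaller one
    mergesort2(list, middle, tmplist);
    mergesort2(list + middle, size - middle, tmplist);
    for (int first = 0, second = middle, tmp = 0; tmp < size; tmp++) {
        if (first >= middle) {                 // left run exhausted
            memcpy(tmplist + tmp, list + second, sizeof(int) * (size - tmp));
            break;
        } else if (second >= size) {           // right run exhausted
            memcpy(tmplist + tmp, list + first, sizeof(int) * (size - tmp));
            break;
        } else if (list[first] <= list[second]) {
            tmplist[tmp] = list[first++];
        } else {
            tmplist[tmp] = list[second++];
        }
    }
    memcpy(list, tmplist, size * sizeof(int));
    if (allocated) {
        delete[] tmplist;
    }
}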
- I have always struggled with +1 -1s, I will try to follow these guidelines and I hope they will help me in the long run! – EralpB, Mar 1, 2014 at 15:20
- Don't like the idea of using size rather than begin and end. It goes against the normal way C++ code is written. If the code had been tagged C then I would have agreed. – Loki Astari, Mar 4, 2014 at 1:34
- @LokiAstari – yeah, I was trapped by the original question not having a language tag at all, and the OP never really paid much attention at the time – rolfl, Mar 4, 2014 at 1:59
A C++ merge sort in the style of a Standard Library algorithm could be written as
template<class BiDirIt,
         class Compare = std::less<typename std::iterator_traits<BiDirIt>::value_type>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare())
{
    auto const N = std::distance(first, last);
    if (N < 2) return;
    auto middle = std::next(first, N / 2);
    merge_sort(first, middle, cmp);
    merge_sort(middle, last, cmp);
    std::inplace_merge(first, middle, last, cmp);
}
Let's look at how this improves upon your design:
- generic interface: it takes two iterators and a comparison function of arbitrary types, instead of two indices into a pointer to an array of int and the implicit but fixed < comparison.
- in-place: behind the covers it calls std::inplace_merge, which may allocate extra memory, but it does so in a way that users don't have to supply it, unlike your tmplist parameter.
- half-open intervals: it splits the input range [first, last) into [first, middle) and [middle, last), which automatically takes care of easy-to-make "off-by-one" errors in code like middle + 1 (in your code it is correct, but you still have to think about it; half-open intervals are easier to get right).
- named algorithms: it uses std::inplace_merge as a ready-to-use component instead of a somewhat long and certainly tricky final step in your algorithm. Not only does this make your merge_sort more compact, the inplace_merge itself is a reusable component in its own right!
- heavily optimized: Standard Library experts are very likely to know about the state of the art in writing performance-optimized algorithms. A factor of 10 in performance compared to your code seems a lot, but if you forget some hidden copies, or use a hand-written loop that has O(N^2) complexity instead of O(N log N), you can quickly get there.
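For reference, the merge_sort above needs <algorithm> (std::inplace_merge), <iterator> (std::distance, std::next, std::iterator_traits) and <functional> (std::less). A small usage sketch, assuming the template is defined as shown:

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <vector>

// merge_sort as defined above

int main()
{
    std::vector<int> v{5, 2, 9, 1, 7, 3};
    merge_sort(v.begin(), v.end());                       // ascending, default std::less
    merge_sort(v.begin(), v.end(), std::greater<int>());  // descending, via a custom comparison
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}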
Back to the original question: I created a small test fixture for experimenting with your code (after all, I had to confirm it actually worked first!):
#include <cstdio>
#include <cstdlib>

// mergesort2 as defined in the question

#define NUM_OF_INTS 100000
#define DEBUG 0

int main()
{
    int numbers[NUM_OF_INTS];
    int i;
    int *tmplist;
    tmplist = new int[NUM_OF_INTS];
    srand(0);
    for( i = 0; i < NUM_OF_INTS; i++ )
        numbers[i] = rand()%10000;
    if (DEBUG == 1)
        for( i = 0; i < NUM_OF_INTS; i++ )
            printf( "%03i:%04i\n", i, numbers[i] );
    mergesort2( &numbers[0], 0, NUM_OF_INTS - 1, tmplist );  // end is the inclusive last index
    if (DEBUG == 1) printf( "\n" );
    if (DEBUG == 1)
        for( i = 0; i < NUM_OF_INTS; i++ )
            printf( "%03i:%04i\n", i, numbers[i] );
    delete [] tmplist;
    return 0;
}
Running this test fixture in my environment, I see about .7s for 100k ints. I initially thought checking for tmplist in every recursive call might be causing a performance degradation, but it made a negligible change in execution time. I then started down the path of reimplementing mergesort without a temporary array, thinking that the array memcpys might be the root cause of the performance issue, when I stumbled upon what is wrong with your code.
Hint
- Think about why the memcpy's are there inside the for loop. What is their purpose? Why are they necessary? What does it mean algorithmically when those memcpy's are done executing? What should happen next?
Spoiler (really think & test your code before looking at this)
You need to do a break; to quit the for loop after each memcpy for the two degenerate cases (first advances past the midpoint, second advances past the end). Otherwise you are doing repeated memcpys (which, lucky or unlucky, have no effect on the output) until tmp finally reaches the end.
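In code, the fix is just two break statements in the existing loop (only the affected branches shown):

if (first > middle) {
    memcpy(tmplist+tmp, list+second, sizeof(int)*(end-begin-tmp+1));
    break;  // left run exhausted; the rest of the right run was just copied in one go
}
else if (second > end) {
    memcpy(tmplist+tmp, list+first, sizeof(int)*(end-begin-tmp+1));
    break;  // right run exhausted; the rest of the left run was just copied in one go
}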
With the fix, the code can now do 100k ints in .08s in my environment, hence the 10x improvement you were looking for.
After fixing the problem @RichHendricks pointed out, this still has... not a problem exactly, but a design that's less than optimal (at least in my opinion).
In the initial phase, you basically do:

- calculate middle
- recurse on left side
- recurse on right side
Each of those recursive calls does the same thing again. Modern processors have done quite a bit to make calls and returns more efficient, but that still adds quite a bit of overhead when the fundamental operation involved is basically just computing the arithmetic mean of two numbers.
You can avoid that by using a bottom-up merge sort. Instead of recursively splitting the array in half until it gets down to some manageable size, you can simply take the first N elements and sort them with some low-overhead sort (typically an insertion sort). Repeat, N elements at a time, until you reach the end of the collection (though, of course, the last group might not be the same size as the others).
Then take the first two sorted groups of N elements and merge them. Repeat for the next pair of groups until you reach the end. Repeat that process until all the groups have been merged into one.
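To make that shape concrete, here is a rough sketch (my illustration only: the names, the run size of 32, and the use of std::inplace_merge for the merge step are all arbitrary choices, and that merge step is exactly what the next paragraphs discuss replacing):

#include <algorithm>
#include <cstddef>

static const std::size_t RUN = 32;  // small-run size; worth tuning for the target machine

// Sort a[0..n) with a simple insertion sort (low overhead for short runs).
void insertion_sort(int* a, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) { a[j] = a[j - 1]; --j; }
        a[j] = key;
    }
}

void bottom_up_merge_sort(int* a, std::size_t n) {
    // Phase 1: sort each run of RUN elements (the last run may be shorter).
    for (std::size_t i = 0; i < n; i += RUN)
        insertion_sort(a + i, std::min(RUN, n - i));

    // Phase 2: merge adjacent pairs of runs, doubling the run width each pass,
    // until a single sorted run remains.
    for (std::size_t width = RUN; width < n; width *= 2) {
        for (std::size_t i = 0; i + width < n; i += 2 * width) {
            std::size_t mid = i + width;
            std::size_t end = std::min(i + 2 * width, n);
            std::inplace_merge(a + i, a + mid, a + end);
        }
    }
}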
As to how to do the merging: a couple of papers have been written about in-place merging (see the examples linked at the end of this answer). I should add that quite a few algorithms to do this have been devised, but many (most?) are primarily theoretical: in theory they have low enough computational complexity to maintain the O(N log N) complexity of the merge sort as a whole, but in fact they impose so much overhead that other methods are faster for almost any practical amount of data. I've tried to restrict the list of citations below to ones that seem to have at least some chance of being practical, not just theoretical.
One last point: contrary to one widespread belief, this is an area where you actually stand a pretty decent chance of writing code significantly better than what the standard library provides. The C++ standard library does include an inplace_merge algorithm, but it's allowed to be significantly slower than the best algorithms now known. At least, if I recall correctly, the requirements on it haven't been updated since the 1998 standard, which was before many of the best in-place merging algorithms were invented. I don't claim to have looked at every implementation of in-place merging in every standard library, but I have seen at least a few relatively recent ones that used older, substantially less efficient algorithms.
- For a few examples:
  - http://akira.ruc.dk/~keld/teaching/algoritmedesign_f04/Artikler/04/Huang88.pdf
  - http://www.sofsem.cz/sofsem06/data/prezentace/23/A/Kim-OnOptimalandEfficientInPlaceMerging.ppt
  - http://www.dcs.kcl.ac.uk/technical-reports/papers/TR-04-05.pdf
- You allocate tmplist with operator new[] but delete with operator delete. You should clean up tmplist with delete [] tmplist;
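In other words, the cleanup at the end of mergesort2 should read:

if (allocated) {
    delete [] tmplist;  // matches the new int[...] used to allocate the buffer
}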