Mergesort performance in C

Question 1

I have two questions about my code:

Are there any performance improvements in this implementation or generally (register int ?) that I could make?
What could I improve in my coding style?

void mergesortArray(int data[], int amount){
 if(amount == 1) return;
 //Precomputing values
 int sizeint = sizeof(int);
 int amountLeft = amount / 2;
 int amountRight = (amount % 2 == 0) ? amountLeft : (amountLeft + 1);
 //Splitting the array in right and left
 int *left = calloc(amountLeft, sizeint);
 int *right = calloc(amountRight, sizeint);
 if(left == NULL || right == NULL) return;
 //Copying the splitted content
 memcpy(left, data, amountLeft * sizeint);
 memcpy(right, data + amountLeft, amountRight * sizeint);
 //Recursive sorting the splitted arrays
 mergesortArray(left,amountLeft);
 mergesortArray(right,amountRight);
 //Merging the numbers
 int *pos1 = &left[0];
 int *pos2 = &right[0];
 int i = 0;
 for(i = 0; i < amount; i++) {
 if(*pos1 <= *pos2) {
 data[i] = *pos1;
 if (pos1 == &right[amountRight - 1])
 break;
 if(pos1 == &left[amountLeft - 1])
 pos1 = &right[amountRight - 1];
 else
 pos1++;
 }
 else {
 data[i] = *pos2;
 if(pos2 == &right[amountRight - 1])
 pos2 = &left[amountLeft - 1];
 else
 pos2++;
 }
 }
}

Question 2

in a modern C compiler, the only effect of using the register modifier is that the address of the variable cannot be acquired. I.E. a total waste of effort.

Question 3

the posted code will have a very large recursion level for any data set, where the recursion level has a direct relationship to the size of the data set. Each recursion is calling calloc() twice, but there is no calls to free() so this algorithm will leak memory like a sieve.

Question 4

The code seems to be performing the merge at every exit from a lower level recursion. That seems like way to many merges.

Question 5

The posted code could call free() for each of the pointers from the two local calls to calloc() just before returning from the recursion level

Question 6

perhaps could eliminate any calls to calloc() and free() by sorting/merging the sub arrays in place rather than copying them and sorting the copies

Question 7

The register keyword will do nothing. Current day compilers are smart enough to find a good register allocation themselves without you running in its way.

Why calloc? you initialize the values through memcpy anyway so no need for the zero initialization, malloc will work fine. Speaking of you need to free what you allocate otherwise you'll get a memory leak.

You should only need to allocate a single extra array for the merge.

When the amount left to sort is small (amount < 5) you can switch over to another sort that is more efficient for small numbers like insertion sort.

Finally brace position on your else is a bit odd:

 if(*pos1 <= *pos2) {
 }
 else {
 }

I'd expect either the opening braces to also be on its own line or else to be on the same as the closing brace of the if.

Question 8

What do you mean exactly by "The amount left to sort"?

Question 9

You can optimize callocs away by allocating a buffer of equal size and contents as the input array data at the very beginning of sorting. After that, you keep alternating the roles of the two arrays (data and the buffer). See what I mean:

static void mergesortArrayImprovedImpl(int* source, 
 int* target, 
 size_t offset, 
 size_t length) 
{
 if (length < 2) return;
 size_t left_range_length = length >> 1;
 // Divide.
 mergesortArrayImprovedImpl(target, 
 source, 
 offset, 
 left_range_length);
 mergesortArrayImprovedImpl(target, 
 source, 
 offset + left_range_length, 
 length - left_range_length);
 // Conquer.
 size_t left_index = offset;
 size_t left_index_bound = offset + left_range_length;
 size_t right_index = left_index_bound;
 size_t right_index_bound = offset + length;
 size_t target_index = offset;
 while (left_index < left_index_bound && right_index < right_index_bound) 
 {
 target[target_index++] = source[left_index] < source[right_index] ?
 source[left_index++] :
 source[right_index++];
 }
 memcpy(target + target_index, 
 source + left_index, 
 sizeof(int) * (left_index_bound - left_index));
 memcpy(target + target_index,
 source + right_index,
 sizeof(int) * (right_index_bound - right_index));
}
void mergesortArrayImproved(int data[], size_t amount) 
{
 if (amount < 2) return;
 int* aux = malloc(sizeof(int) * amount);
 memcpy(aux, data, sizeof(int) * amount);
 mergesortArrayImprovedImpl(aux, data, 0, amount);
 free(aux);
}

I leave profiling the two implementations, however, to you.

Question 10

As usual, I recommend random pausing as a way to see what takes the most time.

That said (I hate saying "that said" because it implies guessing is a good thing - it's not) here's my strong suspicion: A call to calloc only takes a line of code. Does that mean it's cheap? If you single-step it in the disassembly window you'd better bring lunch, because it will take quite a while. So I suspect the samples will show that as your "bottleneck".

You don't really need any extra memory (as @ratchet said) except another temporary array of the same size as the original. You can do your splitting in the original array. Do the merge into the temporary array, and copy back to the original array. That way you malloc only once. (And you should probably free it when you're done.)

And also as @ratchet said, when you get down to small amount, an unrolled bubble sort would probably be faster. Example:

if (amount == 1){
} else if (amount == 2){
 if (data[0] > data[1]) swap(data[0], data[1]);
} else if (amount == 3){
 if (data[0] > data[1]) swap(data[0], data[1]);
 if (data[0] > data[2]) swap(data[0], data[2]);
 if (data[1] > data[2]) swap(data[1], data[2]);
} else if (amount == 4){
 if (data[0] > data[1]) swap(data[0], data[1]);
 if (data[0] > data[2]) swap(data[0], data[2]);
 if (data[0] > data[3]) swap(data[0], data[3]);
 if (data[1] > data[2]) swap(data[1], data[2]);
 if (data[1] > data[3]) swap(data[1], data[3]);
 if (data[2] > data[3]) swap(data[2], data[3]);
} else {
 ... your code ...
}

BTW: You could say:

int amountRight = amount - amountLeft;

Question 11

Do you know switch?

Question 12

@Deduplicator: Sure, but the only purpose of switch is to give the compiler permission to generate a jump table, and it only does that if the number of branches is sufficiently large, and the branch codes are compact. Otherwise it just generates an if-ladder.

ratchet freak ratchet freak 13k22 silver badges46 bronze badges · Accepted Answer · 2015-11-17 11:13:31Z

The register keyword will do nothing. Current day compilers are smart enough to find a good register allocation themselves without you running in its way.

Why calloc? you initialize the values through memcpy anyway so no need for the zero initialization, malloc will work fine. Speaking of you need to free what you allocate otherwise you'll get a memory leak.

You should only need to allocate a single extra array for the merge.

When the amount left to sort is small (amount < 5) you can switch over to another sort that is more efficient for small numbers like insertion sort.

Finally brace position on your else is a bit odd:

 if(*pos1 <= *pos2) {
 }
 else {
 }

I'd expect either the opening braces to also be on its own line or else to be on the same as the closing brace of the if.

\$\begingroup\$ What do you mean exactly by "The amount left to sort"? \$\endgroup\$

iPh1ps99
– iPh1ps99

2015年11月17日 12:15:14 +00:00
Commented Nov 17, 2015 at 12:15

Stack Exchange Network

Mergesort performance in C

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Mergesort performance in C

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions