I implemented mergesort in C as an exercise. I have two main questions:
Is the code future-proof?
2.1 Will I have problems modifying it to sort float and double?
2.2 What could be problematic for creating a multithreaded version? I haven't learned multithreading in C yet, but I plan on a multithreaded version later on.
Secondary question:
- What could I change in the code to make it better?
#include <stdio.h>
void join (int* array, int* array_left, int* array_right, size_t len, size_t len_left)
{
size_t counter_left = 0;
size_t counter_right = 0;
for (size_t i = 0; i < len; i++)
{
if (counter_right == len - len_left || counter_left < len_left && array_left[counter_left] <= array_right[counter_right])
{
array[i] = array_left[counter_left];
counter_left++;
}
else
{
array[i] = array_right[counter_right];
counter_right++;
}
}
return;
}
void partition (int* array, int* array_part, size_t start_p, size_t len_p)
{
for (size_t i = start_p; i < len_p; i++)
{
array_part[i-start_p] = array[i];
}
return;
}
void mergesort(int* array, size_t len)
{
if (len <= 1)
{
return;
}
size_t len2 = len / 2;
int array_left[len2];
int array_right[len-len2];
//Partitioning array into left and right
partition (array, array_left, 0, len2);
partition (array, array_right, len2, len);
//recursiv call
mergesort(array_left, len2);
mergesort(array_right, len - len2);
//joining / sorting of the partitions
join(array, array_left, array_right, len, len2);
return;
}
int main()
{
int array[] = {1,2,3,7,8,9,6,5,4,0};
size_t len = sizeof array / sizeof array[0];
printf("Array unsorted: ");
for (int i = 0; i < len; i++)
{
printf("%d", array[i]);
}
printf("\n");
mergesort(array, len);
printf("Array sorted: ");
for (int i = 0; i < len; i++)
{
printf("%d", array[i]);
}
printf("\n");
return 0;
}
2 Answers 2
The condition
counter_right == len - len_left || counter_left < len_left && array_left[counter_left] <= array_right[counter_right])
looks scary. You'd be in much better shape separatingjoin
into two phases: actual merge, and handling tails:void join (int* array, int* array_left, int* array_right, size_t len, size_t len_left) { size_t counter_left = 0; size_t counter_right = 0; size_t i; // Phase 1: merge while (counter_left < len_left && counter_right < len - len_left) { if (array_left[counter_left] <= array_right[counter_right]) { array[i++] = array_left[counter_left++]; } else { array[i++] = array_right[counter_right++]; } } // Phase 2: tails // Now one of the arrays is exhausted. Another one (possibly) still // has data. Copy them to the target array. // Notice that we don't care which one is empty: copying from an empty // array is no-op. while (counter_left < len_left) { array[i++] = array_left[counter_left++]; } while (counter_right < len - len_left) { array[i++] = array_right[counter_right++]; } }
Of course, the two tail loops above implement a
copy
algorithm, and deserve to be factored out into a function. Interestingly, you already have this function. You just misnamed it.partition
is actuallycopy
.In fact, you may consider dropping altogether, and use
memcpy
instead.Recursion is expensive. For the arrays small enough, insertion sort is faster. Consider switching to insertion sort when
len
is less than a threshold, say16
. The exact value of the threshold is fine tuning, and requires some profiling experiments.Using VLAs is scary. Given a huge array it may overflow the stack. Consider heap allocation. Notice that the scratch array can be allocated only once.
Kudos for
<=
when comparing elements. Many people miss that this is how mergesort maintains stability.
-
\$\begingroup\$ What I think is weird is that the
copy
in quicksort also copies the element. \$\endgroup\$S.S. Anne– S.S. Anne2020年04月22日 20:36:30 +00:00Commented Apr 22, 2020 at 20:36 -
\$\begingroup\$ @vnp First of thanks for your help. I was wondering, if I really need to check for
counter_right < len - len_left
in the first while_loop. I thought len_left is always smaller or equal to len - len_left, becausesize_t len2 = len / 2;
will make sure of that. Or is that depending on the system the code is running on? And can you tell me, what you mean with scratch array? I tried google, but didn't help. \$\endgroup\$Miha– Miha2020年04月23日 10:09:04 +00:00Commented Apr 23, 2020 at 10:09 -
\$\begingroup\$ Note that
<=
is insufficient to maintain stability with floating point values that include not-a-number. \$\endgroup\$chux– chux2020年04月23日 14:21:53 +00:00Commented Apr 23, 2020 at 14:21
2.1 Will I have problems modifying it to sort
float
anddouble
?
The "sort" comes down to one line:
array_left[counter_left] <= array_right[counter_right]
With floating point, sorting concerns include:
- Signed zeros: -0.0 and +0.0
Typically the solution is to consider these of equal value. So an array of [-0.0, +0.0, -0.0, +0.0 ...], with the code, should produce a stable sort. If an order is needed (e.g. -0.0 before +0.0) , the compare will need to expand to account for the sign.
When x
is a NaN, comparing usually1 results in false
as in x <= y
and even x == x
.
With code's singular use of <=
for comparing, I foresee a problem, perhaps like qsort()
which require total ordering. A more advanced compare would be needed to handle arrays which include NaN.
1 C does not specify how compares behave with NaN. Most systems do follow IEEE of comparing with <= , <, >, >=, ==
involving at least 1 NaN is false.
i < len
orfor (int i = 0; i < len; i++)
, even in test code. \$\endgroup\$