Radix sort with benchmarks

Question 1

I am trying to time radix sort on 128 bit unsigned integers for different base sizes (that I call bitwidth). My code has at least the following problems:

The radix sort itself may run slower than it would otherwise as the bit width, number of passes etc are not constants that the compiler can understand and also maybe the use of variable length arrays.
I use variable length arrays. It might be better without those
It doesn't work (crashes due to stack overflow) for bitwidth > 18.

Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <time.h>
#include <string.h>
#include <time.h>
#include <sys/sysinfo.h>
#define LENGTH 1000000
void print128(__uint128_t u) {
 if (u>9) print128(u/10);
 putchar(48+(int)(u%10));
}
// 2^(bitwidth) == count_size for counting sort. 16 is 128/bitwidth == num_passes (number of radix sort passes)
typedef __uint128_t u128;
typedef uint32_t u32;
u128* radix_sort128(u128* array, u32 size, u32 count_size, u32 num_passes, u32 bitwidth) {
 u32 counts[num_passes][count_size];
 memset(&counts, 0, count_size * num_passes * sizeof(u32));
 u128* cpy = (u128*) malloc(size * sizeof(u128));
 u32 o[num_passes];
 memset(o, 0, sizeof o);
 // u32 o[num_passes] = {0};
 u32 t, x, pos;
 u128* array_from, * array_to;
 for (x = 0; x < size; x++) {
 for (pos = 0; pos < num_passes; pos++) {
 t = (array[x] >> bitwidth * pos) % count_size;
 counts[pos][t]++;
 }
 }
 for (x = 0; x < count_size; x++) {
 for (pos = 0; pos < num_passes; pos++) {
 t = o[pos] + counts[pos][x];
 counts[pos][x] = o[pos];
 o[pos] = t;
 }
 }
 for (pos = 0; pos < num_passes; pos++) {
 array_from = pos % 2 == 0 ? array : cpy;
 array_to = pos % 2 == 0 ? cpy : array;
 for (x = 0; x < size; x++) {
 t = (array_from[x] >> bitwidth * pos) & 0xff;
 array_to[counts[pos][t]] = array_from[x];
 counts[pos][t]++;
 }
 }
 free(cpy);
 return array;
}
uint64_t wyhash64_x;
uint64_t wyhash64() {
 wyhash64_x += 0x60bee2bee120fc15;
 __uint128_t tmp;
 tmp = (__uint128_t) wyhash64_x * 0xa3b195354a39b70d;
 uint64_t m1 = (tmp >> 64) ^ tmp;
 tmp = (__uint128_t)m1 * 0x1b03738712fad5c9;
 uint64_t m2 = (tmp >> 64) ^ tmp;
 return m2;
}
int main() {
 u128* vals128 = malloc(LENGTH * sizeof(*vals128));
 u128 lower;
 u128 higher;
 struct timespec start, end;
 // make an array of random u128s 
 for (int i = 0; i < LENGTH; i++) {
 lower = (u128) wyhash64();
 higher = (u128) wyhash64();
 higher = higher << 64;
 vals128[i] = lower + higher;
 }
 for (u32 bitwidth = 8; bitwidth < 20; bitwidth+=2){
 u32 num_passes = 128/bitwidth;
 u32 count_size = 1 << bitwidth;
 printf("bitwidth %d num_passes %d count_size %d\n", bitwidth, num_passes, count_size);
 clock_gettime(CLOCK_MONOTONIC, &start);
 radix_sort128(vals128, LENGTH, count_size, num_passes, bitwidth);
 clock_gettime(CLOCK_MONOTONIC, &end);
 printf("%f seconds\n", (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1000000000.0);
 }
 free(vals128);
}

Any improvements gratefully received.

Edit

Corrected bug. & 0xff is now % count_size.

Question 2

The problem is the 0xff. That needs fixing

Question 3

Hi, you can refer to this codereview.stackexchange.com/q/216460/190450. The code is in c++. But you can go through it as a reference.

Question 4

num_passes is wrong unless 128 is divisible by bitwidth. You need to round up.

Question 5

There's an opaque magic number here, that reduces portability:

 putchar(48+(int)(u%10));

I think that was intended to be '0', and we unnecessarily created an ASCII-only program.

This function might be better if it works in larger chunks, and recursed less:

void print128(__uint128_t u) {
 static const uint64_t ten18 =わ 10000000000000000000u;
 if (u < ten18) {
 printf("%" PRIu64, (uint64_t)u);
 } else {
 print128(u/ten18);
 printf("%018" PRIu64, (uint64_t)(u % ten18));
 }
}

 u128* cpy = (u128*) malloc(size * sizeof(u128));

Don't cast the return from malloc() - it's a void*, which is convertible to any pointer type in C. And it's good practice to refer the size directly to the variable being initialised, rather than to its type - that makes it easier for the reader to see correctness without having to find the declaration, which may be further away. Another good habit is to multiply beginning with the size_t quantity, which can avoid overflow when there are more terms in the product.

 u128* cpy = malloc(sizeof *cpy * size);

Whatever we do, we must not dereference that pointer until we know it's not a null pointer.

 if (!cpy) {
 return NULL;
 }

uint64_t wyhash64()

This function declaration should be a prototype (as should main()):

uint64_t wyhash64(void)

There's no need for wyhash64_x to have global scope - it would be better as a static local within the function.

Question 6

On linux at least, I am not sure malloc ever returns NULL . See stackoverflow.com/questions/16674370/…

Question 7

It certainly does when you hit your ulimit -v value.

Question 8

That is true. It looks like the code as written is actually incorrect because of the 0xFF. Would you be able to add a correction to your great answer? That is it doesn't work for bitwidth other than 8. I think the mask just has to be change to count-size - 1 .

Question 9

It's probably better for you to add an answer of your own to explain that (and will help you build reputation here). (We like having multiple different answers here on Code Review.)

Question 10

As I mention in a comment above, there seems to be a fatal bug in any case

Question 11

Not much left after @Toby Speight review.

Bug?

Code has a mysterious & 0xff. Perhaps % count_size?

u32 counts[num_passes][count_size];
...
// t = (array[x] >> bitwidth * pos) & 0xff;
t = (array[x] >> bitwidth * pos) % count_size;
counts[pos][t]++;

Minor bug

Assuming int is 32-bit`.

When int is 16-bit, 1 << bitwidth shifts outside int range.

// u32 count_size = 1 << bitwidth;
u32 count_size = ((u32)1) << bitwidth; // or the like.

for loop like-wise fails. As i is used to index arrays, recommend size_t.

#define LENGTH 1000000
// for (int i = 0; i < LENGTH; i++) { 
for (size_t i = 0; i < LENGTH; i++) {

Multiplication order for size

Rather than assume 32-bit for size_t, (e.g. it may be 64-bit), perform the size multiplication starting with size_t to take advantage where a wide size_t may prevent overflow of u32 * u32. (I now see Toby mentioned part of this.)

... u32 count_size, u32 num_passes ...
// u32 counts[num_passes][count_size];
// memset(&counts, 0, count_size * num_passes * sizeof(u32));
// ^---------------------^ may overflow 32 bit math 
memset(&counts, 0, sizeof(u32) * count_size * num_passes);
 ^----------------------^ Math done using wider of size_t, u32

Even simpler, just use the size of the array.

// memset(&counts, 0, count_size * num_passes * sizeof(u32));
memset(&counts, 0, sizeof counts);

u32 vs uint32_t

Rather than a user/implementation defined fixed width type, consider using the standard one.

To know if u32 code is correct obliges a look-up of u32 definition.

Superfluous casts

Not needed with up conversions.

//lower = (u128) wyhash64();
// higher = (u128) wyhash64();
lower = wyhash64();
higher = wyhash64();

Question 12

And If you do both malloc() and memset(), use calloc() instead, and you don't have to worry about those sizes.

Question 13

@G.Sliepen Sounds like a good idea, perhaps post that, even if it is only a small review.

Question 14

Will % count_size be as fast as & (count_size - 1)?

Question 15

Can you also see how to fix " It doesn't work (crashes due to stack overflow) for bitwidth > 18."?

Question 16

There is another bug it seems. I don't think num_passes is correct if 128 is not divisible by bitwidth,

Toby Speight Toby Speight 87.2k14 gold badges104 silver badges322 bronze badges · Answer 1 · 2021-08-01 08:35:55Z

There's an opaque magic number here, that reduces portability:

 putchar(48+(int)(u%10));

I think that was intended to be '0', and we unnecessarily created an ASCII-only program.

This function might be better if it works in larger chunks, and recursed less:

void print128(__uint128_t u) {
 static const uint64_t ten18 =わ 10000000000000000000u;
 if (u < ten18) {
 printf("%" PRIu64, (uint64_t)u);
 } else {
 print128(u/ten18);
 printf("%018" PRIu64, (uint64_t)(u % ten18));
 }
}

 u128* cpy = (u128*) malloc(size * sizeof(u128));

Don't cast the return from malloc() - it's a void*, which is convertible to any pointer type in C. And it's good practice to refer the size directly to the variable being initialised, rather than to its type - that makes it easier for the reader to see correctness without having to find the declaration, which may be further away. Another good habit is to multiply beginning with the size_t quantity, which can avoid overflow when there are more terms in the product.

 u128* cpy = malloc(sizeof *cpy * size);

Whatever we do, we must not dereference that pointer until we know it's not a null pointer.

 if (!cpy) {
 return NULL;
 }

uint64_t wyhash64()

This function declaration should be a prototype (as should main()):

uint64_t wyhash64(void)

There's no need for wyhash64_x to have global scope - it would be better as a static local within the function.

On linux at least, I am not sure malloc ever returns NULL . See stackoverflow.com/questions/16674370/…
That is true. It looks like the code as written is actually incorrect because of the 0xFF. Would you be able to add a correction to your great answer? That is it doesn't work for bitwidth other than 8. I think the mask just has to be change to count-size - 1 .
It's probably better for you to add an answer of your own to explain that (and will help you build reputation here). (We like having multiple different answers here on Code Review.)
As I mention in a comment above, there seems to be a fatal bug in any case

chux chux 36.4k2 gold badges43 silver badges96 bronze badges · Answer 2 · 2021-08-01 21:10:22Z

Not much left after @Toby Speight review.

Bug?

Code has a mysterious & 0xff. Perhaps % count_size?

u32 counts[num_passes][count_size];
...
// t = (array[x] >> bitwidth * pos) & 0xff;
t = (array[x] >> bitwidth * pos) % count_size;
counts[pos][t]++;

Minor bug

Assuming int is 32-bit`.

When int is 16-bit, 1 << bitwidth shifts outside int range.

// u32 count_size = 1 << bitwidth;
u32 count_size = ((u32)1) << bitwidth; // or the like.

for loop like-wise fails. As i is used to index arrays, recommend size_t.

#define LENGTH 1000000
// for (int i = 0; i < LENGTH; i++) { 
for (size_t i = 0; i < LENGTH; i++) {

Multiplication order for size

Rather than assume 32-bit for size_t, (e.g. it may be 64-bit), perform the size multiplication starting with size_t to take advantage where a wide size_t may prevent overflow of u32 * u32. (I now see Toby mentioned part of this.)

... u32 count_size, u32 num_passes ...
// u32 counts[num_passes][count_size];
// memset(&counts, 0, count_size * num_passes * sizeof(u32));
// ^---------------------^ may overflow 32 bit math 
memset(&counts, 0, sizeof(u32) * count_size * num_passes);
 ^----------------------^ Math done using wider of size_t, u32

Even simpler, just use the size of the array.

// memset(&counts, 0, count_size * num_passes * sizeof(u32));
memset(&counts, 0, sizeof counts);

u32 vs uint32_t

Rather than a user/implementation defined fixed width type, consider using the standard one.

To know if u32 code is correct obliges a look-up of u32 definition.

Superfluous casts

Not needed with up conversions.

//lower = (u128) wyhash64();
// higher = (u128) wyhash64();
lower = wyhash64();
higher = wyhash64();

And If you do both malloc() and memset(), use calloc() instead, and you don't have to worry about those sizes.
@G.Sliepen Sounds like a good idea, perhaps post that, even if it is only a small review.
Can you also see how to fix " It doesn't work (crashes due to stack overflow) for bitwidth > 18."?
There is another bug it seems. I don't think num_passes is correct if 128 is not divisible by bitwidth,

Stack Exchange Network

Radix sort with benchmarks

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Radix sort with benchmarks

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions