I am trying to time radix sort on 128 bit unsigned integers for different base sizes (that I call bitwidth). My code has at least the following problems:
- The radix sort itself may run slower than it would otherwise as the bit width, number of passes etc are not constants that the compiler can understand and also maybe the use of variable length arrays.
- I use variable length arrays. It might be better without those
- It doesn't work (crashes due to stack overflow) for bitwidth > 18.
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <time.h>
#include <string.h>
#include <time.h>
#include <sys/sysinfo.h>
#define LENGTH 1000000
void print128(__uint128_t u) {
if (u>9) print128(u/10);
putchar(48+(int)(u%10));
}
// 2^(bitwidth) == count_size for counting sort. 16 is 128/bitwidth == num_passes (number of radix sort passes)
typedef __uint128_t u128;
typedef uint32_t u32;
u128* radix_sort128(u128* array, u32 size, u32 count_size, u32 num_passes, u32 bitwidth) {
u32 counts[num_passes][count_size];
memset(&counts, 0, count_size * num_passes * sizeof(u32));
u128* cpy = (u128*) malloc(size * sizeof(u128));
u32 o[num_passes];
memset(o, 0, sizeof o);
// u32 o[num_passes] = {0};
u32 t, x, pos;
u128* array_from, * array_to;
for (x = 0; x < size; x++) {
for (pos = 0; pos < num_passes; pos++) {
t = (array[x] >> bitwidth * pos) % count_size;
counts[pos][t]++;
}
}
for (x = 0; x < count_size; x++) {
for (pos = 0; pos < num_passes; pos++) {
t = o[pos] + counts[pos][x];
counts[pos][x] = o[pos];
o[pos] = t;
}
}
for (pos = 0; pos < num_passes; pos++) {
array_from = pos % 2 == 0 ? array : cpy;
array_to = pos % 2 == 0 ? cpy : array;
for (x = 0; x < size; x++) {
t = (array_from[x] >> bitwidth * pos) & 0xff;
array_to[counts[pos][t]] = array_from[x];
counts[pos][t]++;
}
}
free(cpy);
return array;
}
uint64_t wyhash64_x;
uint64_t wyhash64() {
wyhash64_x += 0x60bee2bee120fc15;
__uint128_t tmp;
tmp = (__uint128_t) wyhash64_x * 0xa3b195354a39b70d;
uint64_t m1 = (tmp >> 64) ^ tmp;
tmp = (__uint128_t)m1 * 0x1b03738712fad5c9;
uint64_t m2 = (tmp >> 64) ^ tmp;
return m2;
}
int main() {
u128* vals128 = malloc(LENGTH * sizeof(*vals128));
u128 lower;
u128 higher;
struct timespec start, end;
// make an array of random u128s
for (int i = 0; i < LENGTH; i++) {
lower = (u128) wyhash64();
higher = (u128) wyhash64();
higher = higher << 64;
vals128[i] = lower + higher;
}
for (u32 bitwidth = 8; bitwidth < 20; bitwidth+=2){
u32 num_passes = 128/bitwidth;
u32 count_size = 1 << bitwidth;
printf("bitwidth %d num_passes %d count_size %d\n", bitwidth, num_passes, count_size);
clock_gettime(CLOCK_MONOTONIC, &start);
radix_sort128(vals128, LENGTH, count_size, num_passes, bitwidth);
clock_gettime(CLOCK_MONOTONIC, &end);
printf("%f seconds\n", (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1000000000.0);
}
free(vals128);
}
Any improvements gratefully received.
Edit
Corrected bug. & 0xff
is now % count_size
.
-
1\$\begingroup\$ The problem is the 0xff. That needs fixing \$\endgroup\$Simd– Simd2021年08月01日 09:24:03 +00:00Commented Aug 1, 2021 at 9:24
-
\$\begingroup\$ Hi, you can refer to this codereview.stackexchange.com/q/216460/190450. The code is in c++. But you can go through it as a reference. \$\endgroup\$Deepeshkumar– Deepeshkumar2021年08月01日 13:52:30 +00:00Commented Aug 1, 2021 at 13:52
-
1\$\begingroup\$ num_passes is wrong unless 128 is divisible by bitwidth. You need to round up. \$\endgroup\$Simd– Simd2021年08月02日 08:21:16 +00:00Commented Aug 2, 2021 at 8:21
2 Answers 2
There's an opaque magic number here, that reduces portability:
putchar(48+(int)(u%10));
I think that was intended to be '0'
, and we unnecessarily created an ASCII-only program.
This function might be better if it works in larger chunks, and recursed less:
void print128(__uint128_t u) {
static const uint64_t ten18 =わ 10000000000000000000u;
if (u < ten18) {
printf("%" PRIu64, (uint64_t)u);
} else {
print128(u/ten18);
printf("%018" PRIu64, (uint64_t)(u % ten18));
}
}
u128* cpy = (u128*) malloc(size * sizeof(u128));
Don't cast the return from malloc()
- it's a void*
, which is convertible to any pointer type in C. And it's good practice to refer the size directly to the variable being initialised, rather than to its type - that makes it easier for the reader to see correctness without having to find the declaration, which may be further away. Another good habit is to multiply beginning with the size_t
quantity, which can avoid overflow when there are more terms in the product.
u128* cpy = malloc(sizeof *cpy * size);
Whatever we do, we must not dereference that pointer until we know it's not a null pointer.
if (!cpy) {
return NULL;
}
uint64_t wyhash64()
This function declaration should be a prototype (as should main()
):
uint64_t wyhash64(void)
There's no need for wyhash64_x
to have global scope - it would be better as a static
local within the function.
-
\$\begingroup\$ On linux at least, I am not sure malloc ever returns NULL . See stackoverflow.com/questions/16674370/… \$\endgroup\$Simd– Simd2021年08月01日 14:33:32 +00:00Commented Aug 1, 2021 at 14:33
-
1\$\begingroup\$ It certainly does when you hit your
ulimit -v
value. \$\endgroup\$Toby Speight– Toby Speight2021年08月01日 14:36:18 +00:00Commented Aug 1, 2021 at 14:36 -
\$\begingroup\$ That is true. It looks like the code as written is actually incorrect because of the 0xFF. Would you be able to add a correction to your great answer? That is it doesn't work for bitwidth other than 8. I think the mask just has to be change to count-size - 1 . \$\endgroup\$Simd– Simd2021年08月01日 14:38:57 +00:00Commented Aug 1, 2021 at 14:38
-
1\$\begingroup\$ It's probably better for you to add an answer of your own to explain that (and will help you build reputation here). (We like having multiple different answers here on Code Review.) \$\endgroup\$Toby Speight– Toby Speight2021年08月01日 14:44:53 +00:00Commented Aug 1, 2021 at 14:44
-
\$\begingroup\$ As I mention in a comment above, there seems to be a fatal bug in any case \$\endgroup\$Simd– Simd2021年08月02日 11:02:58 +00:00Commented Aug 2, 2021 at 11:02
Not much left after @Toby Speight review.
Bug?
Code has a mysterious & 0xff
. Perhaps % count_size
?
u32 counts[num_passes][count_size];
...
// t = (array[x] >> bitwidth * pos) & 0xff;
t = (array[x] >> bitwidth * pos) % count_size;
counts[pos][t]++;
Minor bug
Assuming int
is 32-bit`.
When int
is 16-bit, 1 << bitwidth
shifts outside int
range.
// u32 count_size = 1 << bitwidth;
u32 count_size = ((u32)1) << bitwidth; // or the like.
for
loop like-wise fails. As i
is used to index arrays, recommend size_t
.
#define LENGTH 1000000
// for (int i = 0; i < LENGTH; i++) {
for (size_t i = 0; i < LENGTH; i++) {
Multiplication order for size
Rather than assume 32-bit for size_t
, (e.g. it may be 64-bit), perform the size multiplication starting with size_t
to take advantage where a wide size_t
may prevent overflow of u32 * u32
. (I now see Toby mentioned part of this.)
... u32 count_size, u32 num_passes ...
// u32 counts[num_passes][count_size];
// memset(&counts, 0, count_size * num_passes * sizeof(u32));
// ^---------------------^ may overflow 32 bit math
memset(&counts, 0, sizeof(u32) * count_size * num_passes);
^----------------------^ Math done using wider of size_t, u32
Even simpler, just use the size of the array.
// memset(&counts, 0, count_size * num_passes * sizeof(u32));
memset(&counts, 0, sizeof counts);
u32
vs uint32_t
Rather than a user/implementation defined fixed width type, consider using the standard one.
To know if u32
code is correct obliges a look-up of u32
definition.
Superfluous casts
Not needed with up conversions.
//lower = (u128) wyhash64();
// higher = (u128) wyhash64();
lower = wyhash64();
higher = wyhash64();
-
1\$\begingroup\$ And If you do both
malloc()
andmemset()
, usecalloc()
instead, and you don't have to worry about those sizes. \$\endgroup\$G. Sliepen– G. Sliepen2021年08月01日 22:33:28 +00:00Commented Aug 1, 2021 at 22:33 -
2\$\begingroup\$ @G.Sliepen Sounds like a good idea, perhaps post that, even if it is only a small review. \$\endgroup\$chux– chux2021年08月01日 23:39:49 +00:00Commented Aug 1, 2021 at 23:39
-
\$\begingroup\$ Will % count_size be as fast as & (count_size - 1)? \$\endgroup\$Simd– Simd2021年08月02日 06:33:14 +00:00Commented Aug 2, 2021 at 6:33
-
\$\begingroup\$ Can you also see how to fix " It doesn't work (crashes due to stack overflow) for bitwidth > 18."? \$\endgroup\$Simd– Simd2021年08月02日 07:29:33 +00:00Commented Aug 2, 2021 at 7:29
-
\$\begingroup\$ There is another bug it seems. I don't think num_passes is correct if 128 is not divisible by bitwidth, \$\endgroup\$Simd– Simd2021年08月02日 08:22:01 +00:00Commented Aug 2, 2021 at 8:22