Sorting Floating Point Values

Question 1

I found sorting large arrays by a comparator that looks at the floating-point difference problematic, especially -ffast-math, (https://stackoverflow.com/questions/24442725/is-floating-point-addition-commutative-in-c.) This is intended to compare 32-bit floats exactly using integer arithmetic. Most references cited concern themselves with radix sort:

Nicholas Chapman, https://www.forwardscattering.org/post/34;
Michael Herf, http://stereopsis.com/radix.html;
Pierre Terdiman, http://codercorner.com/RadixSortRevisited.htm;
https://hbfs.wordpress.com/2010/03/09/radix-sort-on-floating-point-numbers/;
https://randomascii.wordpress.com/category/floating-point/.

Using a comparison sort, like qsort, it only has to decide on a total order of IEEE-754 numbers by comparing two at a time.

#include <stdlib.h> /* EXIT_SUCCESS qsort */
#include <stdio.h> /* printf */
#include <assert.h> /* assert */
#include <time.h> /* clock */
#include <limits.h> /* INT_MAX */
#include <math.h> /* C99 floating point macros */
#include <stdint.h> /* C99 uint32_t */
struct Foo { float x; };
/** Compares float {x} values of {Foo} exactly. Assumes IEEE-754-ish 32-bit
 float has the same endianness as {uint32_t}. References:
 \cite{KimYoonKim2011FastSortFloatingPoint},
 Nicholas Chapman \url{ https://www.forwardscattering.org/post/34 },
 Michael Herf \url{ http://stereopsis.com/radix.html },
 Pierre Terdiman \url{ http://codercorner.com/RadixSortRevisited.htm },
 \url{ https://hbfs.wordpress.com/2010/03/09/radix-sort-on-floating-point-numbers/ },
 \url{ https://randomascii.wordpress.com/category/floating-point/ },
 \url{ https://stackoverflow.com/questions/10632237/any-c-compiler-where-evaluates-to-larger-than-one }.
 @implements <Foo>Comparator
 @return Greater than, equal to, or less than 0, if the {Foo.x} pointed to by
 {av} is greater than, equal to, or less than the {Foo.x} pointed to by {bv}. */
static int x_cmp(const void *av, const void *bv) {
 const struct Foo *const a = av, *const b = bv;
 union { float f; uint32_t u; } ax, bx;
 ax.f = a->x, bx.f = b->x;
 {
 /* @chux unsigned -> uint32_t critical fix */
 const uint32_t ax_abs = ax.u & 0x7fffffff, ax_sign = ax.u & 0x80000000;
 const uint32_t bx_abs = bx.u & 0x7fffffff, bx_sign = bx.u & 0x80000000;
 const uint32_t same = !(ax_sign ^ bx_sign);
 /* if(!same) return ax_sign^1; else return ax_sign^(ax_abs - bx_abs); */
 return ax_sign ^ (!same + same * (ax_abs - bx_abs));
 }
}
/** Tests {x_cmp}. */
int main(void) {
 struct Foo foos[32]; /* could be bigger */
 const size_t foos_size = sizeof foos / sizeof *foos;
 unsigned i;
 union { float f; uint32_t u; } x;
 assert(sizeof(float) == 4);
 srand((unsigned)clock()), rand();
 foos[0].x = 0.0f;
 foos[1].x = 0.0f;
 foos[2].x = -0.0f;
 foos[3].x = INFINITY; /* C99 */
 foos[4].x = -INFINITY; /* C99 */
 foos[5].x = NAN; /* C99 */
 foos[6].x = (x.u = 0x807fffff, x.f); /* subnormal */
 foos[7].x = (x.u = 1, x.f); /* subnormal */
 for(i = 8; i < 2 * foos_size / 3; i++) {
 foos[i].x = rand() / (0.5f * RAND_MAX / INT_MAX) - INT_MAX;
 } for( ; i < foos_size; i++) {
 foos[i].x = rand() / (0.5f * RAND_MAX / 100.0f) - 100.0f;
 }
 qsort(foos, foos_size, sizeof foos[0], &x_cmp);
 for(i = 1; i < foos_size; i++) {
 printf("%g <= %g\n", foos[i - 1].x, foos[i].x);
 /* isnan C99 */
 assert(isnan(foos[i - 1].x) || isnan(foos[i].x)
 || foos[i - 1].x <= foos[i].x);
 }
 return EXIT_SUCCESS;
}

I compiled it on GCC 4.2.1 MacOS, GCC 5.4.0 Linux, and MSVC15, (the order of the NaN macro switched, thus isnan,) I think it's valid? Could the comparator be done in less steps?

Edit:

int32_t: increasing injection mod the branch cut. int32_t

IEEE 754-ish float: (one way) to get rid of the singularity and make it monotonic is to invert the negative values and flip the sign bit on positive values, Radix Sort, Sorting a float data.

IEEE 754 floating point.

Question 2

You shouldn't rely on internal FP representation like that, it won't work properly with every target CPU/FPU.

Question 3

Hmmmm: Idea: If x_cmp() only did uint32_t *a = av; uint32_t *b = bv; return (*a > *b ) - (*a < *b);, then the array would be quickly "sorted" per its binary image. Assuming a matching float/int32_t.endian and binary32 FP, a post analysis could fold, in O(n) time, the resultant array as needed. I suspect it would be faster. Perhaps I'll try it later.

Question 4

If you care about performance, don't use qsort. Here is an old post on the performance of sorting. qsort was very slow. I don't know how much it stands nowadays, but calling function pointers that frequently hurts anyway. In addition, qsort is badly designed. Generic sorting algorithms can implemented with a single "less than" operator (like in C++ STL). You don't need any arithmetic operations.

Question 5

@πάνταῥεῖ I think I'm relying on: (1) float is 32 bits (I check this with an assert,) (2) negative sign bit is 31, (3) ~int32_t(float)++ is 1 ULP bigger. Specifically, this applies to IBM floats as well. The answer by @chux made no such assumption. @user172818 true; I assume this applies to all comparison sorts, on average.

Question 6

ax_sign ^ (!same + same * (ax_abs - bx_abs)); returns the incorrect signed result if int is not 32-bit. Certainly a problem if int is 16-bit and likely if int is 64-bit. If unsigned/int needs to be 32-bit, use (u)int32_t

, operator reduces clarity here. Suggest 2 lines of code

// ax.f = a->x, bx.f = b->x;
ax.f = a->x;
bx.f = b->x;

I ran test cases and found OP's x_cmp() functionally correct for finite float over a 2,000,000,000 test cases.
As a test case, I tried OP's original x_cmp() versus the below and was at least 10% faster with the new code. Of course, that is just one platform comparison, yet aside from NaN issues, the below code is functionally similar to OP's and as a plus, is highly portable - unlike OP's. The point being that OP's compare method needs some reference point to justify the bit magic.
```
static int x_cmp_ref(const void *av, const void *bv) {
 return (*(float*)av > *(float*)bv) - (*(float*)av < *(float*)bv);
}
```
OP's has not stated the compare functionality of Not-a-number floats. A desirable aspect is that all NaN sort to one side, either all greater or all less than any other number, regardless of the NaN's "sign". x_cmp() considers sign first without regard to NaN-ness.

As a reference, I used the following to generate random float

float randf() {
 union {
 float f;
 unsigned char uc[sizeof (float)];
 } u;
 do {
 for (unsigned i=0; i<sizeof u.uc; i++) {
 u.uc[i] = (unsigned char) rand();
 }
 } while (!isfinite(u.f));
 return u.f;
}

At a later time, I may try to implement the idea in this comment

Question 7

1. Good catch, it should be all uint32_t to match float; 2. thanks; 4. that would be great! but it didn't work for negative values; stackoverflow.com/questions/4640906/…; 5. I agree that it's arguably best to keep them all NaNs one place; otoh, why change arbitrarily no-work placement without good reason?

Question 8

@NeilEdelman Thanks for your thoughts. #4 returns the correct int when comparing float of which 1 or both are negative. Unclear on "it didn't work for negative values" as it applies to qsort() used here. #5 There is a good reason: A typical application for a list with NaNs is to sort them out to the end, that way a simple reduction in N can lop them off the array. As I see it, a robust compare concenring NaN can be done with low cost. Yet as you imply, NaN handling is not universal.

Question 9

I was reinterpreting them as uint32_t, which did not sort the negative values correctly; leaving them as floats produced the right output. I measure it to be 25% faster on my machine. I did a sequence of values that differed by 1 ulp, and it appears to be stable, and looks cool.

Question 10

*As you said, the placement of NaNs in a comparison sort with float is different than int32_t; comparisons with float NaNs are not treated as equivalence relations.

chux chux 36.2k2 gold badges43 silver badges96 bronze badges · Accepted Answer · 2017-04-19 02:41:55Z

ax_sign ^ (!same + same * (ax_abs - bx_abs)); returns the incorrect signed result if int is not 32-bit. Certainly a problem if int is 16-bit and likely if int is 64-bit. If unsigned/int needs to be 32-bit, use (u)int32_t

, operator reduces clarity here. Suggest 2 lines of code

// ax.f = a->x, bx.f = b->x;
ax.f = a->x;
bx.f = b->x;

I ran test cases and found OP's x_cmp() functionally correct for finite float over a 2,000,000,000 test cases.
As a test case, I tried OP's original x_cmp() versus the below and was at least 10% faster with the new code. Of course, that is just one platform comparison, yet aside from NaN issues, the below code is functionally similar to OP's and as a plus, is highly portable - unlike OP's. The point being that OP's compare method needs some reference point to justify the bit magic.
```
static int x_cmp_ref(const void *av, const void *bv) {
 return (*(float*)av > *(float*)bv) - (*(float*)av < *(float*)bv);
}
```
OP's has not stated the compare functionality of Not-a-number floats. A desirable aspect is that all NaN sort to one side, either all greater or all less than any other number, regardless of the NaN's "sign". x_cmp() considers sign first without regard to NaN-ness.

As a reference, I used the following to generate random float

float randf() {
 union {
 float f;
 unsigned char uc[sizeof (float)];
 } u;
 do {
 for (unsigned i=0; i<sizeof u.uc; i++) {
 u.uc[i] = (unsigned char) rand();
 }
 } while (!isfinite(u.f));
 return u.f;
}

At a later time, I may try to implement the idea in this comment

1. Good catch, it should be all uint32_t to match float; 2. thanks; 4. that would be great! but it didn't work for negative values; stackoverflow.com/questions/4640906/…; 5. I agree that it's arguably best to keep them all NaNs one place; otoh, why change arbitrarily no-work placement without good reason?
@NeilEdelman Thanks for your thoughts. #4 returns the correct int when comparing float of which 1 or both are negative. Unclear on "it didn't work for negative values" as it applies to qsort() used here. #5 There is a good reason: A typical application for a list with NaNs is to sort them out to the end, that way a simple reduction in N can lop them off the array. As I see it, a robust compare concenring NaN can be done with low cost. Yet as you imply, NaN handling is not universal.
I was reinterpreting them as uint32_t, which did not sort the negative values correctly; leaving them as floats produced the right output. I measure it to be 25% faster on my machine. I did a sequence of values that differed by 1 ulp, and it appears to be stable, and looks cool.
*As you said, the placement of NaNs in a comparison sort with float is different than int32_t; comparisons with float NaNs are not treated as equivalence relations.

Stack Exchange Network

Sorting Floating Point Values

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Sorting Floating Point Values

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions