I have a comparison function for comparing two points in 2 (or more) dimensions based on the angle of each point in standard polar coordinates. In other words, a point a
should compare less than a point b
if the angle through the origin from the +x axis to a
is less than the corresponding angle for b
. For the origin, the normalized angle is 0
BUT for sanity it compares equal with any point.
This can be done pretty trivially using the atan2
function, which has built in handling for different quadrants and simply returns a normalized angle (between 0 inclusive and 2 pi exclusive) for any given point besides the origin. Unfortunately, trig is slow so it is better to compare b.x*a.y <=> a.x*b.y
. This is also nice since it plays nicer with integers than trig does. This comparison can be derived from the fact that tan is strictly increasing on the interval 0 to pi/2. However, it only works if the points a
and b
are in the same quadrant, so we need to do some preprocessing before we can apply it.
This is where my dilemma with my current code comes in. I originally had a fairly long and hard to read forest of if
statements that handled all cases where a
and b
were not in the same quadrant, but I replaced it with the code you see here. I first get the "region number" for each point a
and b
, and then use the b.x*a.y <=> a.x*b.y
comparison if necessary. I define the region number as 0
for the origin and 1-8
for the +x axis through quadrant IV in counterclockwise order. Namely, 2
is quadrant I, 3
is the +y axis, 4
is quadrant II, 5
is the -x axis, 6
is quadrant III, and 7
is the -y axis.
Thus I can compare two points by
- getting the region number for each
- if either is
0
, return0
since one of the points is the origin so they should compare equal since - otherwise, if their region numbers are not the same, return
-1
if the region number fora
is smaller and1
if the region number forb
is smaller - otherwise, the region numbers for
a
andb
are the same. If the region number fora
is odd, return0
because the points are on the same axis in the same direction (+x, +y, -x, or -y) - otherwise,
a
andb
are both in the interior of the same quadrant, so we can use theb.x*a.y <=> a.x*b.y
comparison
Finding the region number for each point does a bit of redundant work in some cases, but this is probably something the compiler can optimize away. I'm not at the point where I need to profile and optimize this, though I do want to avoid floating point computations.
Here is the code for the comparison function:
/// Return an identifier for the region of the xy plane containing a point
///
/// Returns 0 for origin or 1 to 8 for the +/- axis and quadrants in
/// counterclockwise order starting from the +x axis (ie 1 for +x axis,
/// 2 for quadrant I, etc, and 8 for quadrant IV)
static inline int find_x_y_region(const int64_t *a){
if(a[0] == 0){// a is on the y axis
if(a[1] == 0){// a is the origin
return 0;
}
return a[1] > 0 ? 3 : 7;// +y axis and -y axis respectively
}else if(a[1] == 0){// a is on the x axis (but not the origin)
return a[0] > 0 ? 1 : 5;// +x axis and -x axis respectively
}else if(a[1] > 0){// a is in the upper half plane
return a[0] > 0 ? 2 : 4;// quadrant I and quadrant II respectively
}// otherwise a is in the lower half plane
return a[0] > 0 ? 8 : 6;// quadrant IV and quadrant III respectively
}
static inline int cmp_x_y(const void *_a, const void *_b){
const int64_t *a = _a, *b = _b;
// consider the points a and b projected into the xy plane (ie ignore their z components)
// we want to find which point, a or b, has a smaller angle in its standard polar representation
// return -1 if a has a smaller angle, 1 if b does, 0 if they have the same angle
// we do not need to do trig for this, but there are a lot of cases
// for a fixed z, all xy points have a fixed radius, so comparison of points at the same z
// can be done instantly with one coordinate. However, when a and b have different z coordinates we
// need to work harder.
int a_region = find_x_y_region(a), b_region = find_x_y_region(b);
if(a_region == 0 || b_region == 0){
return 0;
}else if(a_region < b_region){
return -1;
}else if(a_region > b_region){
return 1;
}else if(a_region & 1){// both regions are the same, if they are odd both points are on the same axis (+/- x/y)
return 0;
}// otherwise both points are in the same quadrant
int64_t ord = b[0]*a[1] - a[0]*b[1];
if(ord < 0){
return -1;
}else if(ord == 0){
return 0;
}
return 1;
}
Factoring out find_x_y_region
helped a lot, but can I simplify this comparison function even further? Also, should I worry about dealing with overflow in the computation of ord
? As hinted at in the comments, I'm going to use this to make a KD tree out of points on the surface of a sphere, so I'm already requiring the square of any coordinate be representable as an int64_t
and for such points overflow is not a concern. I could coalesce a couple of branches in the comparison function to ternary operators which would shorten it by 4 lines, but that wouldn't really increase readability.
2 Answers 2
should I worry about dealing with overflow in the computation of ord?
Yes. The difference may overflow.
Instead compare:
// Compute cross product sign (see next bullet)
int64_t ba = b[0]*a[1];
int64_t ab = a[0]*b[1];
if(ba < ab) return -1;
if(ba == ab) return 0;
return 1;
can I simplify this comparison function even further?
Excessive comparing is done.
To expand on @vnp idea:
...If both vectors are in the upper half plane, the cross-product works. If they are both in the lower half plane, it also works (backwards). If they are in the different half planes, the one in the upper is less.
int cmp_x_y(const int64_t a[2], const int64_t b[2]) {
if (a[1] < 0 == b[1] < 0) { // same upper/lower plane?
int cps = cross_product_sign(a,b);
if (a[1] < 0) cps = -cps;
return cps;
}
return a[1] < 0 ? -1 : 1;
}
Above may need some edge-case review, but does present the idea to use the cross-product as the general solution path.
void *
vs. int64_t *
*a = _a
leads to undefined behavior (UB) should *_a
not meet the alignment requirements of int64_t
. void *
also loses some type checking.
Suggest simple using int64_t *
or int64_t a[2]
.
//static inline int cmp_x_y(const void *_a, const void *_b){
// const int64_t *a = _a, *b = _b;
static inline int cmp_x_y(const int64_t *a, const int64_t *b) {
// or
static inline int cmp_x_y(const int64_t a[2], const int64_t b[2]) {
Minor: discussion error
This can be done pretty trivially using the atan2 function, which has built in handling for different quadrants and simply returns a normalized angle (between 0 inclusive and 2 pi exclusive)
Above is amiss: atan2(y, x)
returns "arctan y=x in the interval [-π π] radians."
-
\$\begingroup\$ Oh right! I've used the cross product approach a lot in the past but I got stuck on a tan based approach. I have to stick with
void*
since this is a generic callback. (I have strict aliasing disabled which works around casting flexible length char arrays in generic data structs and alignment on platforms allowing unaligned access. I think the drawbacks ofvoid*
with no strict aliasing are more benign than of macro based or intrusive structs.) \$\endgroup\$hacatu– hacatu2021年07月20日 02:34:23 +00:00Commented Jul 20, 2021 at 2:34 -
\$\begingroup\$ @hacatu It is not the aliasing issue, but the alignment that can break code here. If stuck with
void* _a
, could useint64_t a; memcpy(&a, _a, sizeof a):
as that fixes alignment issues and a good compiler will talk advantage as it can to emit efficient code without a function call. \$\endgroup\$chux– chux2021年07月20日 02:39:58 +00:00Commented Jul 20, 2021 at 2:39
Early returns
You have a bunch of them. I'm all for them (not everyone is), but you're inconsistent in your else
style after an early return. Sometimes you do this:
if(a[1] == 0){
return a[0] > 0 ? 1 : 5;
}else // ...
and sometimes you dispense with the else
:
if(a[1] > 0){
return a[0] > 0 ? 2 : 4;
}
return a[0] > 0 ? 8 : 6;
My preferred style is the latter, since else
is redundant.
Inline
Even if a compiler were to respect your explicit inline
(which it typically doesn't), you wouldn't want it to. Compilers are better than humans at deciding what should be auto-inlined and what shouldn't. Just drop the inline
.
Further reading e.g.: https://www.greenend.org.uk/rjk/tech/inline.html, particularly
Sometimes it is necessary for the compiler to emit a stand-alone copy of the object code for a function even though it is an inline function
[...]
You can have a separate (not inline) definition in another translation unit, and the compiler might choose either that or the inline definition.
and https://stackoverflow.com/questions/2082551/what-does-inline-mean, including
nowadays most compilers don't even acknowledge it as a hint to inline the function.
Explore related questions
See similar questions with these tags.
if
statements by just returning signed-integer differences in many of these cases. \$\endgroup\$