
Consider the following function:

  • ƒ(x) = sin(sqrt(x)) / sqrt(x) if x > 0
  • ƒ(x) = 1 if x = 0
  • ƒ(x) = sinh(sqrt(-x)) / sqrt(-x) if x < 0

This is an entire function whose Taylor series at 0 has the coefficients 1/1!, -1/3!, 1/5!, -1/7!, ... and I want to implement it in floating point (think IEEE-754 single, IEEE-754 double, or arbitrary precision as with MPFR).
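Equivalently, expanding in powers of x rather than sqrt(x), the series is f(x) = 1 - x/3! + x^2/5! - x^3/7! + ... = sum over n >= 0 of (-1)^n x^n / (2n+1)!.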

My current C-ish implementation with a floating-point type F looks something like:

F func (F x)
{
    F q = sqrt (fabs (x));
    if (x * x == 0)
        return 1 - x / 6;
    else
        return x > 0
            ? sin (q) / q
            : sinh (q) / q;
}

From what I can tell the implementation works reasonably well in a neighborhood of 0, but I am wondering if there are loophole values where it does not work (well).

asked May 10, 2025 at 14:23
2 Comments

  • What is your concern and why are you bothering with x * x == 0 instead of just x == 0? For x such that x * x == 0, I would expect sin(q) and sinh(q) to return q because the actual value of sin q and sinh q is so close to q that q is the closest representable value. Then both sin (q) / q and sinh (q) / q produce 1, as does 1 - x / 6 (because x is so small that 1 is the closest representable value to 1 − x/6). So what are you gaining with x * x == 0 instead of just x == 0? Commented May 10, 2025 at 14:41
  • @EricPostpischil My concern was that when |x| is small, there is "not enough" information in the subnormal mantissa to yield results within the error margin. Commented May 15, 2025 at 11:05

1 Answer


Loophole values where it does not work

Infinities: When x is +infinity, the expected math result is 0.0, yet OP's code may return Not-a-number due to sin(infinity).

When x is -infinity, the expected math result is infinity, yet OP's code may return Not-a-number due to sinh(infinity)/infinity.

Code could test for these non-finite x values.
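A minimal sketch of such a guard using the float flavors (the name funcf_guarded and the overall structure are illustrative, not OP's code):

#include <math.h>

// Illustrative sketch: handle NaN and +/-infinity before the main computation,
// assuming IEEE-754 float and the standard <math.h> classification macros.
float funcf_guarded(float x) {
    if (isnan(x)) return x;              // propagate NaN
    if (isinf(x)) {
        // sin(q)/q -> 0 as q -> +infinity; sinh(q)/q -> +infinity
        return x > 0 ? 0.0f : INFINITY;
    }
    if (x == 0) return 1.0f;
    float q = sqrtf(fabsf(x));
    return x > 0 ? sinf(q) / q : sinhf(q) / q;
}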

Overflow: When x is modestly negative, sinh(q) may overflow to infinity, yet the exact mathematical quotient sinh(q)/q is still finite.

// float example
float yn = sinhf(q) / q;
// If a non-finite value resulted, try a slower, less precise, yet wider-range approach.
if (!isfinite(yn)) {
    yn = expf(q - logf(q) - logf(2));
}
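This works because for large q, sinh(q) is essentially e^q/2, so sinh(q)/q ≈ exp(q - ln(q) - ln(2)); evaluating it in logarithmic space avoids the intermediate overflow of sinhf(q), at the cost of some extra error from the log/exp round trip.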

@njuffa offers a good alternative.

Another improvement:

float yn = sinhf(q) / q;
if (!isfinite(yn)) {
    // I expect this to work for all `x` where the math `sinhf(q) / q <= FLT_MAX`.
    yn = expf(q/2);
    yn *= yn/(2*q);
}
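Here expf(q/2) stays finite well beyond the point where sinhf(q) overflows; squaring it and dividing by 2*q reconstructs e^q/(2q) ≈ sinh(q)/q without ever forming e^q directly, so the computation only overflows when the result itself does.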

Modest magnitude x: I found that, using float, q = sqrtf(fabsf(x)); return sinhf(q) / q; loses about 6 bits of correctness before sinhf(q) hits overflow. This is because the rounding error in forming the square root q, when propagated through sinhf(q), is magnified by a factor roughly proportional to q.

To really reduce this function's error, a better, TBD, approach is needed as x grows negative (about x < -4).

Trig errors

q = sqrtf(x) has up to 0.5 ULP error in the result. When q < 99.99% * π, sinf(q) is well behaved. Yet at/near every following multiple of π, the ULP error of sinf(q) can be terrible, even if the absolute error is very reasonable. As q grows and its unit in the last place (ULP) nears 1.0, the sign and magnitude of sinf(q) become meaningless.

Additional code is needed to compensate for square-root errors. ULP-error considerations start at about x > 1.57.

There are no absolute-error concerns for finite x.
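A minimal sketch illustrating this magnification (the value of x is just an example chosen so that q lands near π; the double computation serves only as a reference):

#include <math.h>
#include <stdio.h>

int main(void) {
    float x = 9.8696044f;                 // roughly pi*pi, so q = sqrtf(x) lands near pi
    float q = sqrtf(x);
    double q_ref = sqrt((double) x);      // higher-precision square root of the same x
    printf("q        = %.9g\n", q);
    printf("sinf(q)  = %.9g\n", sinf(q));
    printf("sin(ref) = %.9g\n", sin(q_ref));
    // Near pi, sin() is close to zero, so the sub-ULP difference between q and
    // q_ref can change sinf(q) by many ULPs, even though the absolute error stays tiny.
    return 0;
}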


OP's likely area of concern

By using if (x * x == 0) return 1 - x / 6;, the code is well behaved near zero. The end points of this region are about x = [-sqrt(DBL_TRUE_MIN) ... +sqrt(DBL_TRUE_MIN)]. Now let us look at the end points, with x just within this region and just outside (e.g. near ±sqrt(DBL_TRUE_MIN)).
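(For IEEE-754 double, sqrt(DBL_TRUE_MIN) = sqrt(2^-1074) = 2^-537, roughly 2.2e-162; for float, sqrtf(FLT_TRUE_MIN) = 2^-74.5, which is the 0x1.6a09e6p-75 printed by the test program below.)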

Taylor's series for sin(x) is x - x^3/3! + x^5/5! +/- ...

Taylor's series for sinh(x) is x + x^3/3! + x^5/5! + ...
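Dividing each series by q and substituting q^2 = |x| shows why the single expression 1 - x/6 covers both sides: for x > 0, sin(q)/q = 1 - x/6 + x^2/120 - ..., and for x < 0, sinh(q)/q = 1 + q^2/6 + q^4/120 + ... = 1 - x/6 + x^2/120 + ... (since q^2 = -x).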

Mathematically, all looks fine: the function is continuous, and so are its first and second derivatives.

We could look closely, compute with double x values near this area, chart that, and report loophole values where the y steps are a tad jagged. Yet doing this examination is of dubious value, as there is scant benefit over the simpler:

F func2 (F x)
{
    if (x == 0)
        return 1;
    F q = sqrt (fabs (x));
    return x > 0
        ? sin (q) / q
        : sinh (q) / q;
}

Here, the behavior of sin(x) and sinh(x) for x near 0.0 is well researched and tested.

Still, to test, consider using float, as it is feasible to test every float value.

Find first difference

#include <float.h>
#include <math.h>
#include <stdio.h>

int main() {
    float x = sqrtf(FLT_TRUE_MIN);
    printf("sqrtf(FLT_TRUE_MIN):%24a\n", sqrtf(FLT_TRUE_MIN));
    x = FLT_TRUE_MIN;
    for (int i = 0; 1; i++) {
        float q = sqrtf(x);
        float yp = sinf(q) / q;
        float yn = sinhf(q) / q;
        float y0 = 1.0f - x / 6.0f;
        if (y0 != yp || y0 != yn) {
            printf("x:%a, q:%a, sin(q)/q:%a, sinh(q)/q:%a, 1-x/6:%a\n", //
                    x, q, yp, yn, y0);
            break;
        }
        x = nextafterf(x, 2 * x);
    }
    return 0;
}

Sample output:

sqrtf(FLT_TRUE_MIN): 0x1.6a09e6p-75
x: 0x1p-46, q:0x1p-23, sin(q)/q:0x1p+0, sinh(q)/q:0x1.fffffep-1, 1-x/6:0x1p+0

It is clear that OP's original code has no discernible issues in the ±sqrtf(FLT_TRUE_MIN) range, as all 3 sub-expressions produce the same y values until x is much larger than sqrtf(FLT_TRUE_MIN), as expected.

Still, I do not see a benefit (unless most usage values are near 0.0) in doing the if (x*x == 0) test versus the simple if (x==0).

Soap box

Certainly OP was most interested in x at or near 0.0 and the "loopholes" thereabouts from x*x == 0. So was I when I first saw this post. Later I realized that many x values in the range [-8828.3 ... -7995.2] failed.

It highlights the importance of functional testing across the spectrum of x before attempting to optimize for speed or for low error.
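For a float implementation, such spectrum-wide testing can even be exhaustive. A minimal sketch (the double reference simply recomputes the same expressions and is assumed accurate enough to grade a float version; names like func_ref and func_f are illustrative):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Double-precision reference for f(x); assumed good enough to grade a float version.
static double func_ref(double x) {
    if (x == 0) return 1.0;
    double q = sqrt(fabs(x));
    return x > 0 ? sin(q) / q : sinh(q) / q;
}

// Float implementation under test (OP's approach, with the simple x == 0 test).
static float func_f(float x) {
    if (x == 0) return 1.0f;
    float q = sqrtf(fabsf(x));
    return x > 0 ? sinf(q) / q : sinhf(q) / q;
}

int main(void) {
    double worst = 0.0;
    float worst_x = 0.0f;
    // Walk every finite float bit pattern, both signs: about 4 billion cases.
    for (uint32_t bits = 0;; bits++) {
        float x;
        memcpy(&x, &bits, sizeof x);
        if (isfinite(x)) {
            double y = func_ref(x);
            double err = fabs((double) func_f(x) - y);
            if (isfinite(y) && y != 0) err /= fabs(y);   // relative error when sensible
            if (err > worst) { worst = err; worst_x = x; }
        }
        if (bits == UINT32_MAX) break;
    }
    printf("worst relative error: %.3g at x = %a\n", worst, worst_x);
    return 0;
}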

answered May 10, 2025 at 19:15

6 Comments

(+1) For "very negative" x, why not simply compute 1 / (2 * sqrt (-x) * exp (-sqrt (-x))), taking advantage of subnormals? This should produce results with smaller error than going through logarithmic space and exponentiating at the end, which incurs a substantial amount of error magnification at the extreme end of the range.
@njuffa Interesting idea. Answer updated. Just went through a similar issue here with best approach. Will ponder some more.
I decided to be less lazy and actually tried my idea and your proposed fix, and the results of a quick experiment seem clear. For a double implementation, the maximum error with my approach is about 6050 ulp, while it is about 1500 ulp with yours (i.e., when going through logarithmic space). My initial thought was that both approaches would lose on the order of 12 bits due to error magnification, but mine would reduce the error being magnified a bit for overall better accuracy. Clearly that is not the case.
@njuffa The trick brings the computation into the fat FP zone near 0.0 to 1.0, forms the pre-answer, and then scales exactly by 2^n. I'll look more into it later - real life calls.
Your latest proposal results in a maximum error of 513 ulps. I am sampling double arguments in [-514165, -504776], BTW.
