First, it is the most compact sine approximation ever. It seems I can do better than Remez in terms of precision/performance. The approximation range here is [0, pi/2].
double p[] = {
    -0.0020836519336891552,
    -0.016434778826652056,
    -0.14814815034514656};
double P1(double x, double a) {
    return x + (a * x) * (x * x); // one stage: x + a*x^3
}
//err Inf: 7.415237313068701e-9
double sine_approximate(double x) {
return P1(P1(P1(x,p[0]),p[1]),p[2]);
}
And the same practical solution for float32 looks like this:
//maxErrULP=2.49524, maxErrInf=9.88345e-08
float p[] = {
    -0.0020836557f,
    -0.016434766f,
    -0.14814815f};
A practical double64 version:
// error inf 2.22e-16
double p[] = {
    -0.00002581064752010963,
    -0.00020280816250387983,
    -0.00182899436357092772,
    -0.01646090534492345922,
    -0.14814814814814814};
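Applied through the same nested P1() form as above, this would look like the following sketch (the function name is mine):
double sine_approximate64(double x) {
    static const double p5[] = {
        -0.00002581064752010963, -0.00020280816250387983,
        -0.00182899436357092772, -0.01646090534492345922,
        -0.14814814814814814};
    for (int i = 0; i < 5; i++) // innermost coefficient first, as in the 3-term version
        x = P1(x, p5[i]);
    return x;
}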
So my question: can you compare this solution with Remez in terms of performance versus precision?
UPD:
Pro:
- Compact (important where we approximate a lot)
- General
- Higher-order derivatives
- Better behavior outside the optimization range
- Simple "inverse" function of the approximation (sketched below, after these lists)
Neg:
- No parallelism available
- Only numerical minimization; no mathematical basis yet
- Don't know the rules for selecting the best "primitive" function to compose
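To illustrate the "inverse" point from the Pro list: each stage y = x + a*x^3 is monotone for these small negative a, so the whole composition can be inverted stage by stage. A rough sketch (the names invP1/asin_approximate and the Newton iteration are my illustration, not part of the original code):
double invP1(double y, double a) { // invert y = x + a*x^3 via Newton steps
    double x = y; // good initial guess: a*x^3 is only a small correction
    for (int i = 0; i < 4; i++)
        x -= (x + a * x * x * x - y) / (1.0 + 3.0 * a * x * x);
    return x;
}
double asin_approximate(double y) { // undo the three stages in reverse order
    return invP1(invP1(invP1(y, -0.14814815034514656),
                       -0.016434778826652056), -0.0020836519336891552);
}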
UPDATE2: As further examples, I have a COS approximation of the form "1 - 0.5*x*P1(P1(P1))" and a TAN approximation built from R1 = x / (1.0f + k * x * x):
inline float r(float x, float k) { return x / (1.0f + k * x * x); }
inline float tan_approx(float x)
{
//maxulperr = 3.82904 maxrelerr = 2.70230e-07 maxabserr = 2.28229e-07
float a = -0.078562208;
float b = -0.0089859413;
float c = -0.24578492;
return r(r(r(x, a), b), c);
}
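For the COS form above, here is a purely structural sketch (my reading of the formula, taking the innermost argument as x; the inner chain then targets 2*(1 - cos x)/x = x - x^3/12 + ...). The coefficients below are crude placeholders that only match the leading x^3 term and would be fitted numerically, just like the sine and tan coefficients:
inline float cos_approx_sketch(float x) {
    const float t = -1.0f / 36; // placeholder: the three coefficients must sum to -1/12
    float g = x + (t * x) * (x * x);
    g = g + (t * g) * (g * g);
    g = g + (t * g) * (g * g);
    return 1.0f - 0.5f * x * g; // 1 - 0.5*x*(x - x^3/12 + ...) ~= cos(x)
}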
2 Answers
First off, I am delighted by the innovative approach demonstrated in the question. The titular question is unanswerable in its generality. For various reasons the three-term single-precision approximation of sine on [-π/2, π/2] was of most interest to me, and I narrowed the question to this:
"Using the same number of mathematical operations, does the proposed algorithm deliver superior accuracy compared to a minimax approximation generated using the Remes algorithm and evaluated in simple Horner form?"
The answer to that for this specific case is "No", as demonstrated by the test program below which I compiled with the Intel classic compiler (/fp:strict; -fp-model=strict). The output of the program is as follows:
Testing: sine_approximate(x)
maxulperr = 2.41809 maxrelerr= 1.74602e-007 maxabserr= 1.37514e-007
Testing: sine_remes(x)
maxulperr = 1.87379 maxrelerr= 1.12022e-007 maxabserr= 1.11686e-007
In modern floating-point computation assessing error in ulp is the most relevant metric. sine_approximate() uses 9 multiplies and 3 additions, while sine_remes() uses 7 multiplies and 5 additions, so both approaches require 12 mathematical operations. However, many modern floating-point units are based on fused multiply-add (FMA) as the basic building block, and looking at machine instructions sine_approximate() requires 3 FMAs and 6 FMULs, while sine_remes() requires 5 FMAs and 2 FMULs, so the latter requires fewer floating-point instructions.
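For concreteness, here is how those instruction counts arise when both kernels are written with explicit fmaf() calls (a sketch; a compiler with FMA contraction enabled produces the same shape from the plain expressions in the test program below):
float sine_approximate_fma (float x) // per stage: 2 FMUL + 1 FMA -> 6 FMUL + 3 FMA
{
    x = fmaf (-2.08365335e-3f * x, x * x, x);
    x = fmaf (-1.64347887e-2f * x, x * x, x);
    x = fmaf (-1.48148164e-1f * x, x * x, x);
    return x;
}
float sine_remes_fma (float x) // 2 FMUL (x*x, t*x) + 5 FMA in the Horner chain
{
    float t = x * x;
    float p = fmaf (-2.38889015e-8f, t, 2.75253933e-6f);
    p = fmaf (p, t, -1.98408685e-4f);
    p = fmaf (p, t, 8.33333377e-3f);
    p = fmaf (p, t, -1.66666672e-1f);
    return fmaf (p, t * x, x);
}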
My assessment of practical trade-offs is as follows. sine_approximate() is multiply-intensive, which may have a negative impact on energy expenditure and latency. It offers better instruction level parallelism than sine_remes() and may result in more compact code, as seen in compilation for ARM64 at Compiler Explorer, for example. sine_remes() offers smaller peak and average error than sine_approximate() and makes better use of FMA where that is available. When based on the simple Horner scheme, it lacks instruction-level parallelism, which could be addressed by switching to a second-order Horner scheme at the cost of one additional instruction.
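A sketch of that second-order Horner variant (my arrangement, reusing the sine_remes() coefficients): splitting p(t) into even and odd parts lets the two halves proceed in parallel, at the cost of one extra multiply (5 FMA + 3 FMUL):
float sine_remes_horner2 (float x)
{
    float t = x * x;
    float u = t * t;
    float pe = fmaf (fmaf (-2.38889015e-8f, u, -1.98408685e-4f), u,
                     -1.66666672e-1f);                   // c4*t^4 + c2*t^2 + c0
    float po = fmaf (2.75253933e-6f, u, 8.33333377e-3f); // c3*t^2 + c1
    float p = fmaf (po, t, pe);                          // recombine the two halves
    return fmaf (p, t * x, x);
}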
In view of these trade-offs, I would consider the proposed scheme most suitable for lower-precision computations in certain embedded applications where its compactness can provide an advantage, possibly also in some DSP or AI applications depending on platform specifics. While it is questionable to generalize from this one example, this approach is likely not suitable for use in modern standard math libraries (for example glibc) which need to provide tight error bounds.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
#define FUNC(x) sine_remes(x)
float sine_approximate (float x)
{
x = (-2.08365335e-3f * x) * (x * x) + x; // -0x1.111bcep-9
x = (-1.64347887e-2f * x) * (x * x) + x; // -0x1.0d4480p-6
x = (-1.48148164e-1f * x) * (x * x) + x; // -0x1.2f684ep-3
return x;
}
float sine_remes (float x)
{
float p, t;
t = x * x;
p = - 2.38889015e-8f; // -0x1.9a6880p-26
p = p * t + 2.75253933e-6f; // 0x1.717088p-19
p = p * t - 1.98408685e-4f; // -0x1.a017dap-13
p = p * t + 8.33333377e-3f; // 0x1.111112p-7
p = p * t - 1.66666672e-1f; // -0x1.555556p-3
t = t * x;
p = p * t + x;
return p;
}
uint32_t float_as_uint32 (float a)
{
uint32_t r;
memcpy (&r, &a, sizeof r);
return r;
}
uint64_t double_as_uint64 (double a)
{
uint64_t r;
memcpy (&r, &a, sizeof r);
return r;
}
double floatUlpErr (float res, double ref)
{
uint64_t i, j, err, refi;
int expoRef;
/* ulp error cannot be computed if either operand is NaN, infinity, zero */
if (isnan (res) || isnan (ref) || isinf (res) || isinf (ref) ||
(res == 0.0f) || (ref == 0.0f)) {
return 0.0;
}
/* Convert the float result to an "extended float". This is like a float
with 56 instead of 24 effective mantissa bits, and reclaims the exponent
value of 0xff for a normal numeric binade
*/
i = ((uint64_t)float_as_uint32 (res)) << 32;
/* Convert the double reference to an "extended float". If the reference is
>= 2^129, we need to clamp to the maximum "extended float". If reference
is < 2^-126, we need to denormalize because of float's limited exponent
range.
*/
refi = double_as_uint64 (ref);
expoRef = (int)(((refi >> 52) & 0x7ff) - 1023);
if (expoRef >= 129) {
j = 0x7fffffffffffffffULL;
} else if (expoRef < -126) {
j = ((refi << 11) | 0x8000000000000000ULL) >> 8;
j = j >> (-(expoRef + 126));
} else {
j = ((refi << 11) & 0x7fffffffffffffffULL) >> 8;
j = j | ((uint64_t)(expoRef + 127) << 55);
}
j = j | (refi & 0x8000000000000000ULL);
err = (i < j) ? (j - i) : (i - j);
return err / 4294967296.0;
}
#define XSTR(s) STR(s)
#define STR(s) #s
int main (void)
{
const float PI = 3.14159265358979323f;
double maxulperr, maxabserr, maxrelerr;
printf ("Testing: %s\n", XSTR ( FUNC (x)));
maxulperr = 0;
maxabserr = 0;
maxrelerr = 0;
for (float x = 0.0f; x <= (PI / 2); x = nextafterf (x, INFINITY)) {
double ref = sin ((double)x);
float res = FUNC (x);
double abserr = fabs (res - ref);
if (abserr > maxabserr) maxabserr = abserr;
double relerr = fabs (abserr / ref);
if (relerr > maxrelerr) maxrelerr = relerr;
double ulperr = floatUlpErr (res, ref);
if (ulperr > maxulperr) maxulperr = ulperr;
}
for (float x = -0.0f; x >= (-PI / 2); x = nextafterf (x, -INFINITY)) {
double ref = sin ((double)x);
float res = FUNC (x);
double abserr = fabs (res - ref);
if (abserr > maxabserr) maxabserr = abserr;
double relerr = fabs (abserr / ref);
if (relerr > maxrelerr) maxrelerr = relerr;
double ulperr = floatUlpErr (res, ref);
if (ulperr > maxulperr) maxulperr = ulperr;
}
printf ("maxulperr = %.5f maxrelerr=%13.5e maxabserr=%13.5e\n",
maxulperr, maxrelerr, maxabserr);
return EXIT_SUCCESS;
}
Addendum: While Remes's exchange algorithm as originally published [1,2] covered just polynomial approximations, it was extended to rational approximations by later authors. One might thus want to include rational approximations like the one shown below (demonstrating just one of several possible arrangements) in a comparison. IEEE-754 compliant divisions on float operands can be quite fast on some modern processors (e.g. latency of 7 cycles on ARM Neoverse N2), while others (notably GPUs) offer facilities for the rapid computation of an approximate float division, accurate to within a couple of ulps or so.
// maxulperr = 1.89825 maxrelerr = 1.14488e-7 maxabserr = 1.13145e-7
float sine_remes_rat (float x)
{
float p, q, s, t;
s = x * x;
q = - 2.91885994e-3f; // -0x1.7e94b0p-9
p = - 1.64994039e-2f; // -0x1.0e5384p-6
t = s * x;
q = q * s - 2.00993896e-1f; // -0x1.9ba2b0p-3
p = p * s;
q = q * s - 6.00000238e+0f; // -0x1.80000ap+2
p = p * t + t;
return (p / q) + x;
}
With CUDA on NVIDIA GPUs, which provide a floating-point reciprocal instruction, one can then implement sine_remes_rat() with twelve mathematical operations (without the use of FMA), just as many as were used for sine_approximate() and sine_remes(). Accuracy suffers only slightly from the resulting use of approximate float division. This is just an exercise for demonstration purposes, as NVIDIA GPUs have built-in hardware support for computing float sine, exposed via the __sinf() device function intrinsic.
// -fmad=false (7 FMUL,4 FADD,1 RCP) maxulperr=2.35552 maxrelerr=1.41263e-7 maxabserr=1.40400e-7
// -fmad=true (4 FMA, 3 FMUL,1 RCP) maxulperr=2.19167 maxrelerr=1.48479e-7 maxabserr=1.48479e-7
__device__ float sine_remes_rat (float x)
{
float p, q, s, t;
s = x * x;
q = - 2.91885994e-3f; // -0x1.7e94b0p-9
p = - 1.64994057e-2f; // -0x1.0e5386p-6
t = s * x;
q = q * s - 2.00993910e-1f; // -0x1.9ba2b2p-3
p = p * s;
q = q * s - 6.00000238e+0f; // -0x1.80000ap+2
p = p * t + t;
asm ("rcp.approx.ftz.f32 %0,%0;" : "+f"(q)); // q = 1.0f / q
return (p * q) + x;
}
[1] E. Remes, "Sur un procédé convergent d'approximations successives pour déterminer les polynomes d'approximation", Comptes rendus hebdomadaires des séances de l'Académie des sciences, 198, Jan.-Jun. 1934, pp. 2063-2065 (presented at the session of June 4, 1934).
[2] Eugène Remes, "Sur le calcul effectif des polynomes d'approximation de Tchebichef," Comptes rendus hebdomadaires des séances de l'Académie des sciences, 199, Jul.-Dec. 1934, pp. 337-340 (presented at the session of July 23, 1934).
20 Comments
floatUlpErr() gives very similar results as my fp_nth_difff(), although yours is absolute value and mine is signed. Someday I'll dig deep to understand the slight differences.
Conclusion
At least on my machine, OP's approach is fast, though not the fastest, and its ULP error was bested by some faster code.
It is arguably compact (that is a source-code comparison) and a worthy effort deserving more study - though I would echo @njuffa: it does not clearly beat Remez.
The single biggest weakness of OP's approach is that x*x needs to be recalculated for each term of p[], whereas other approaches like Remez can compute x*x once.
I understand that OP is looking to compare a general approach, and not necessarily this sine_approximate(). Yet to do an apples-to-apples comparison and advance this post, I suggest some changes to sine_approximate() as a prelude to real tests:
Start with float
float is simpler and exhaustive tests of all float values, if needed, are feasible. After assessing float, move to double.
Stitching
sine_approximate(x) is good over the [-π/2 ... +π/2] range and is useful as a sub-function for a general sin(). Yet when stitching the end points of one sub-function call to the results for an x just outside the [-π/2 ... +π/2] range (which takes a different computational path), it is desirable to minimize the discontinuity. A sin(x) that has jumps in value along the wider x range is undesirable in real-world applications that look at the difference at these stitch points.
I recommend that the error at the end points be brought down to zero, even if this constraint means a slightly higher overall worst case error.
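For the composed form, pinning the endpoint is straightforward if the inner coefficients are held fixed (a sketch under that assumption; pinned_outer_coeff is a hypothetical helper): the outermost stage y + a*y^3 = target is linear in a, so the zero-error coefficient has a closed form.
double pinned_outer_coeff(double p0, double p1, double end, double target) {
    double y = end + (p0 * end) * (end * end); // first stage at the endpoint
    y = y + (p1 * y) * (y * y);                // second stage
    return (target - y) / (y * y * y);         // solve y + a*y^3 == target for a
}
For example, pinned_outer_coeff(p[0], p[1], acos(-1) / 2, 1.0) with OP's double coefficients yields an outer coefficient near -0.148148 that makes f(π/2) land on 1.0, up to rounding.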
Consider [-π/4 ... +π/4]
To form a quality alternative sinf(), range reduction is needed. For me, that has been over the [-π/4 ... π/4] range (i.e. +/- 45 degrees).
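A sketch of the quadrant-style reduction this implies, using a two-constant Cody-Waite split of pi/2 (constants as in fdlibm's asinf; adequate only for modest |x| - huge arguments need Payne-Hanek style reduction):
static float reduce_pio2(float x, int *quadrant) {
    const float two_over_pi = 0.636619772f;
    const float pio2_hi = 1.5707962513e+00f; // high bits of pi/2
    const float pio2_lo = 7.5497894159e-08f; // pi/2 - pio2_hi
    float k = rintf(x * two_over_pi);
    *quadrant = (int)k & 3;                   // picks sine/cosine kernel and sign
    return (x - k * pio2_hi) - k * pio2_lo;   // r = x - k*pi/2, in [-pi/4 ... +pi/4]
}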
Review key values
In the quest for speed and low error, review calls like sine_approximate(0) and sine_approximate(M_PI/2). A user would readily expect these to return 0.0 and 1.0 exactly.
Error assessment: consider ULP
OP graphed absolute error: sine_approximate(x) - math_sine(x). Also discussed was its relative error: (sine_approximate(x) - math_sine(x))/x. Yet, IMHO, it is the Unit in the last place (ULP) error that should be used for a general error assessment of a function like sine_approximate().
Sample approach: unit-in-the-last-place (ULP) differences.
For a float version of sine_approximate(x), I assessed a worst case error of 2.495 ULP.
x:0x1.00c894p-2 sin(x):0x1.fc3393027p-3 sine_approximate(x):0x1.fc3398p-3 ulp error:2.49524
Putting this all together
In preparation for a good precision/performance compare, I offer a variation on OP's excellent sine_approximate(x):
// For 0 to pi/4.
// x:0.0625054762 sin(x):0x1.ffb625313p-5
// sine_approximate(x):0x1.ffb62ap-5
// Max ulp error:2.4039
float sine_approximate_c(float x) {
static const float p[] = {
-0.00208876099975342f, -0.0164295475661017f, -0.148148357068155f };
float y = x + (p[0] * x) * (x * x);
float z = y + (p[1] * y) * (y * y);
float a = z + (p[2] * z) * (z * z);
return a;
}
For the endpoints of 0, M_PI/4:
x:0 sine_approximate_c(x):0
x:0.785398 sine_approximate_c(x):0.707107
[Graph: illustrative ULP error per x - needs work (real-life calls)]
[Update]
Added a test harness to demo a sample compare of various sin() functions.
Your results may vary.
// https://stackoverflow.com/questions/79601848/sine-approximation-did-i-beat-remez
static inline float P1f(float x, float a) {
return x + (a * x) * (x * x);
}
// https://stackoverflow.com/q/79601848/2410359
float sine_approximate(float x) {
static const float p[] = {-0.002083651976749301f, -0.016434777528047562f,
-0.14814814925193787f};
return P1f(P1f(P1f(x, p[0]), p[1]), p[2]);
}
float sine_approx_c45(float x) {
static const float p[] = {
-0.00208876099975342f, -0.0164295475661017f, -0.148148357068155f};
float y = x + (p[0] * x) * (x * x);
float z = y + (p[1] * y) * (y * y);
float a = z + (p[2] * z) * (z * z);
return a;
}
float sine_approx_c90(float x) {
static const float p[] = {
-2.08365118E-03f, -1.64347888E-02f, -1.48148148E-01f};
float y = x + (p[0] * x) * (x * x);
float z = y + (p[1] * y) * (y * y);
float a = z + (p[2] * z) * (z * z);
return a;
}
float sine_approx_c90x(float x) {
static const float p[] = {
-2.08365118E-03f, -1.64347888E-02f, -1.48148148E-01f};
x = x + (p[0] * x) * (x * x);
x = x + (p[1] * x) * (x * x);
x = x + (p[2] * x) * (x * x);
return x;
}
float sine_approximate_nj(float x) {
x = (-2.08365335e-3f * x) * (x * x) + x; // -0x1.111bcep-9
x = (-1.64347887e-2f * x) * (x * x) + x; // -0x1.0d4480p-6
x = (-1.48148164e-1f * x) * (x * x) + x; // -0x1.2f684ep-3
return x;
}
float sine_remes_nj(float x) {
float p, t;
t = x * x;
p = -2.38889015e-8f; // -0x1.9a6880p-26
p = p * t + 2.75253933e-6f; // 0x1.717088p-19
p = p * t - 1.98408685e-4f; // -0x1.a017dap-13
p = p * t + 8.33333377e-3f; // 0x1.111112p-7
p = p * t - 1.66666672e-1f; // -0x1.555556p-3
t = t * x;
p = p * t + x;
// x:1.49331403 sin(x):0x1.fe76c040ap-1 sine_approximate(x):0x1.fe76c4p-1 ulp error:1.87379
return p;
}
float sine_remes_rat(float x) {
float p, q, s, t;
s = x * x;
q = -2.91886134e-3f; // -0x1.7e94bcp-9
p = -1.64994095e-2f; // -0x1.0e538ap-6
t = s * x;
q = q * s - 2.00993851e-1f; // -0x1.9ba2aap-3
p = p * s;
q = q * s - 6.00000238e+0f; // -0x1.80000ap+2
p = p * t + t;
return (p / q) + x;
}
float sine_chux3nice(float x) {
static const float p[] = {1, -1.6666654514001200E-01f,
8.3321437734004800E-03f, -0.000195113260505579f};
float x2 = x * x;
float y = x * (1 + x2 * (p[1] + x2 * (p[2] + (x2 * p[3]))));
return y;
}
float sine_chux3(float x) {
static const float p[] = {1, -1.6666017174279000E-01f,
8.3161538759826100E-03f, -0.000185941285974709f};
float x2 = x * x;
float y = x * (1 + x2 * (p[1] + x2 * (p[2] + (x2 * p[3]))));
return y;
}
/////////////////////////
// Test code
/////////////////////////
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
// https://codereview.stackexchange.com/q/288625/29485
#include "fp.h"
double test_sine(const char *name, float xhi, float (*f)(float)) {
static float xhi_prior = 0.0;
static unsigned count = 0;
static clock_t offset = 0;
clock_t c0, c1;
float xlo = 1 / 16.0f;
if (1 || xhi_prior != xhi) {
xhi_prior = xhi;
c0 = clock();
for (float x = xlo; x <= xhi; x = nextafterf(x, 2)) {
count++;
}
c1 = clock();
offset = c1 - c0;
}
double err_max = 0;
float err_x = 0;
double err_y0 = NAN;
float err_y1 = NAN;
for (float x = xlo; x <= xhi; x = nextafterf(x, 2)) {
double y0 = sin(x);
float y1 = f(x);
double err = fabs(fp_nth_difff(y1, y0));
if (err >= err_max) {
err_max = err;
err_x = x;
err_y0 = y0;
err_y1 = y1;
}
}
unsigned dummy = 0;
c0 = clock();
for (float x = xlo; x <= xhi; x = nextafterf(x, 2)) {
dummy++;
volatile float y = f(x);
(void) y;
}
c1 = clock();
// printf("%u %lld %lld %lld\n", count, (long long) offset,(long long) c0,(long long) c1);
double function_time = (double)(c1 - c0 - offset)/CLOCKS_PER_SEC/count;
function_time *= 1.0e9; // Scale time from seconds to ns.
(void) err_x; (void) err_y0; (void) err_y1;
printf(
"f(x):%-20s, Max ULP error in x:[0...%5.3g]: %.4g ULP. Time:%6.3f nsec, f(%-13.9g) --> %-13.9g\n",
name, xhi, err_max, function_time, xhi, f(xhi));
fflush(stdout);
return function_time;
}
// Test N times
double test_sineN(const char *name, float xhi, float (*f)(float)) {
double t = 0;
int n = 10;
for (int i = 0; i < n; i++) {
t += test_sine(name, xhi, f);
}
t /= n;
printf(
"f(x):%-20s, Average time:%6.3f nsec\n", name, t);
return t;
}
int main() {
time_t t0,t1;
time(&t0);
printf("FLT_EVAL_METHOD: %d\n", FLT_EVAL_METHOD);
float pi = acosf(-1);
test_sineN("sine_approx_c45", pi / 4, sine_approx_c45);
test_sineN("sine_approx_c90", pi / 2, sine_approx_c90);
test_sineN("sine_approx_c90x", pi / 2, sine_approx_c90x);
test_sineN("sine_approximate", pi / 2, sine_approximate);
test_sineN("sine_approximate_nj", pi / 2, sine_approximate_nj);
test_sineN("sine_remes_nj", pi / 2, sine_remes_nj);
test_sineN("sine_remes_rat", pi / 2, sine_remes_rat);
time(&t1);
printf("Run time: %g\n", difftime(t1, t0));
}
Results
FLT_EVAL_METHOD: 0
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 3.138 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 1.586 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 1.224 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 0.785 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 0.521 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 0.523 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 0.525 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 0.321 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 0.345 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Max ULP error in x:[0...0.785]: 2.404 ULP. Time: 0.310 nsec, f(0.785398185 ) --> 0.707106769
f(x):sine_approx_c45 , Average time: 0.928 nsec
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.370 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.375 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.374 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.309 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.287 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.236 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.222 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.233 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.195 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.205 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90 , Average time: 0.281 nsec
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.151 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.145 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.157 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.132 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.125 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.137 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.116 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.126 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.122 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Max ULP error in x:[0... 1.57]: 2.444 ULP. Time: 0.131 nsec, f(1.57079637 ) --> 1
f(x):sine_approx_c90x , Average time: 0.134 nsec
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.127 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.122 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.119 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.089 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.125 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.096 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.094 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.068 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.077 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Max ULP error in x:[0... 1.57]: 2.495 ULP. Time: 0.075 nsec, f(1.57079637 ) --> 1
f(x):sine_approximate , Average time: 0.099 nsec
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.073 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.092 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.100 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.078 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.087 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.065 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.081 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.072 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.061 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Max ULP error in x:[0... 1.57]: 2.418 ULP. Time: 0.060 nsec, f(1.57079637 ) --> 0.99999994
f(x):sine_approximate_nj , Average time: 0.077 nsec
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.084 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.074 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.088 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.079 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.069 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.076 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.081 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.073 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.072 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Max ULP error in x:[0... 1.57]: 1.874 ULP. Time: 0.071 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_nj , Average time: 0.077 nsec
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.070 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.068 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.067 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.066 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.078 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.064 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.069 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.055 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.055 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Max ULP error in x:[0... 1.57]: 1.923 ULP. Time: 0.060 nsec, f(1.57079637 ) --> 1
f(x):sine_remes_rat , Average time: 0.065 nsec
Run time: 199
15 Comments
For sine(), using double only adds marginal precision improvement, at typically 2x the run-time cost. With double, might as well use 5+ terms.
x^7 ~ 5.9e-7 and to x^9 ~ 3.3e-9. Your recursion P1() approximates the Remez coefficient in x^9 and higher quite well. It is an interesting compact form that I haven't seen before. The latency won't be great because of data dependencies, but the accuracy here is impressive for so few coefficients.