I have written a C program to compute values of the Hardy Z function, \$Z(t)\$. The Hardy Z function is a continuous, real-valued function that takes a positive real number \$t\$ as its input and outputs a real number that can be either positive or negative. Since \$Z(t)\$ is continuous, if it is positive (negative) at the beginning of a real interval and negative (positive) at the end of the interval, then it must be zero at some point inside that interval. It turns out that \$Z(t)\$ is zero at a given \$t\$ if and only if the complex-valued Riemann zeta function, \$\zeta(s)\$, is zero at the complex number \$s = 1/2 + it\$.
I will briefly describe the mathematical algorithm for \$Z(t)\$ before getting to my programming questions. \$Z(t) = M + E\$, where \$M =\$ the main term (easy formula) and \$E =\$ the error term (complicated formula). So, \$E\$ computes the error introduced if you only use the main term. There is a very good estimate of the \$E\$ term; call it \$R\$ (easy formula). The estimate \$Z(t) \approx M + R\$ is accurate to within \$\pm 10^{-6}\$ for \$t > 200\$. Here, I am focused only on the \$M\$ term. Let
\begin{align*} N=\left[\left(\frac{t}{2\pi}\right)^{1/2}\right]\quad\text{and}\quad \theta(t) = \frac{t}{2}\log\left(\frac{t}{2\pi}\right) - \frac{\pi}{8} - \frac{t}{2} + \frac{1}{48 t} + \frac{7}{5760 t^3}, \end{align*}
where \$[x]\$ means the integral part of \$x\$. Then
\begin{align*} M(t) = 2 \sum_{n=1}^{N} n^{-1/2}\cos[\theta(t) - t \log n]. \end{align*}
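For concreteness, the definition of \$N\$ translates directly into C; the sketch below is illustrative only (it assumes a long double \$t\$ and is not my program's exact code):

#include <math.h>

// N = [sqrt(t / (2*pi))], where [x] is the integral part of x.
static unsigned ComputeN(long double t)
{
    const long double pi = acosl(-1.0L);
    return (unsigned) floorl(sqrtl(t / (2.0L * pi)));
}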
Using \$Z(t) = M + R\$, my program calculates \$Z(t)\$ to an accuracy of at least 6 digits to the right of the decimal for \$t\$ values having up to 9 total digits (including digits to the left and right of the decimal). Then, accuracy starts to fall off. Looking at the above formula for \$M(t)\$, I can see two problems:
For large \$t\$, \$N\$ gets fairly large. (For \$t = 10,000,000,000\$, \$N = 39,894\$). A very small error introduced by floating point calculations grows to a "real error" when multiplied 39,894 times.
For large \$t\$, both \$\theta(t)\$ and \$t \log n\$ become significant multiples of \$t\$. Thus, at each iteration of the sum, floating point calculations are being made on numbers much larger than \$t\$.
To address the second problem, my code avoids any numbers larger than \$t\$ in the calculations, using the fact that the cosine function is \2ドル\pi\$ periodic. To explain my code, assume we want the \2ドル\pi\$ remainder of \$A\cdot B\$, but direct multiplication yields a number that is too large. We can ensure that all intermediate values are less than \$\max(A,B)\$ as follows.
Let \$i,j,k,m\$ be positive integers and let \$w,x,y,z\$ be positive real numbers. Choose any \$K\$ that is a positive even integer. Now let
\begin{align*} A = K\pi i + w,\kern6pt B = K\pi j + x,\kern6pt (ix + jw) = k+y,\kern6pt(K\pi - [K\pi])ij = m + z, \end{align*}
where again \$[x]\$ means the integral part of \$x\$. We then have
\begin{align*} A \cdot B &= (K\pi i + w) \cdot (K\pi j + x) = K\pi\{ (K\pi - [K\pi] + [K\pi])(ij)\} + K\pi\ (ix + jw) + wx \\ &= K\pi \cdot [K\pi]ij + K\pi\{ (K\pi - [K\pi] )ij\} + K\pi(ix + jw) + wx \\ &= K\pi \cdot [K\pi]ij + K\pi (m+z) + K\pi (k+y) + wx \\ &= K\pi \cdot [K\pi]ij + K\pi (k+m) + K\pi (y+z) + wx. \end{align*}
We can now drop the terms evenly divisible by \2ドル\pi\$, leaving
\begin{align*} K\pi(y+z) + wx. \end{align*}
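As a standalone illustration, the identity above transcribes into C as follows. This is only a sketch of the reduction with \$K = 2\$ (so \$K\pi = 2\pi\$ and \$[K\pi] = 6\$); my actual routine below inlines the same steps, using \$K = 2\$ for \$\theta(t)\$ and \$K = 4\$ for \$t \log n\$:

#include <math.h>

// Returns a value congruent to A*B modulo 2*pi without ever forming A*B directly.
static long double prod_mod_2pi(long double A, long double B)
{
    const long double Kpi    = 2.0L * acosl(-1.0L);      // K*pi with K = 2 (even)
    const long double KpiInt = 6.0L;                      // [K*pi], its integral part
    long double i, j, k;                                  // integer parts, stored by modfl

    long double w = Kpi * modfl(A / Kpi, &i);             // A = K*pi*i + w
    long double x = Kpi * modfl(B / Kpi, &j);             // B = K*pi*j + x
    long double y = modfl(i * x + j * w, &k);             // i*x + j*w = k + y
    long double z = fmodl((Kpi - KpiInt) * i * j, 1.0L);  // (K*pi - [K*pi])*i*j = m + z

    return Kpi * (y + z) + w * x;                         // K*pi*(y+z) + w*x, congruent to A*B (mod 2*pi)
}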
Now for my actual C code, and then my questions.
typedef long double xdouble;
// -------------------------------------------------------------------
// We compute the main term of the Riemann-Siegel formula. We will
// need the uiN variable from ComputeNandP, along with the
// current dt variable. The first step is to compute theta (one time).
// Using that theta, we make uiN passes through a calculating loop.
// -------------------------------------------------------------------
xdouble ComputeMain(xdouble dt, unsigned int uiN)
{
    // -------------------------------------------------------------------
    // We compute theta. The formula from our book is:
    //
    //   xdouble dTheta = ((dt/2) * (log(dt / (2 * M_PI)))) - M_PI/8 - dt/2
    //                    + 1/(48 * dt) + 7/(5760 * powl(dt, 3));
    //
    // The problem with that formula is dTheta can be multiples of the
    // value of dt. As dt approaches our floating point limits, this
    // will cause problems. To resolve this issue, note that dTheta
    // is only used in the 2 pi periodic cosine function. Thus, all we
    // need is the remainder of dTheta / (2 * pi). We implement here
    // the algorithm discussed in our paper.
    // ------------------------------------------------------------------
    #define PI_TIMESP    (2 * M_PI)   // it may be optimal to increase this
    #define PI_TIMESPINT 6.0          // depends on PI_TIMESP
    #define PI_TIMESK    (4 * M_PI)   // it may be optimal to increase this
    #define PI_TIMESKINT 12.0         // depends on PI_TIMESK

    xdouble dMain = 0;
    xdouble i, j, k;
    xdouble w, x, y, z;

    w = PI_TIMESP * modfl(dt/(2 * PI_TIMESP), &i);
    x = PI_TIMESP * modfl((log(dt / (2 * M_PI))-1) / PI_TIMESP, &j);
    y = modfl(((i * x) + (j * w)), &k);
    z = fmodl(((PI_TIMESP)-PI_TIMESPINT) * i * j, 1);

    xdouble dTheta = ((y + z) * PI_TIMESP) + (w * x) - M_PI/8
                     + 1/(48 * dt) + 7/(5760 * powl(dt, 3));

    // -------------------------------------------------------------------
    // Like dTheta, computing the term (dt * log((xdouble)n)) inside
    // the cosine function can evaluate to multiples of dt, with the same
    // floating point issues. We again would like to use the algorithm
    // discussed in our paper to reduce the size of both (dt * log(n))
    // and its intermediate variables. We can compute the algorithmic
    // dt term outside the for loop. The other parts of the algorithm
    // depend on n and must be done at each iteration of the for loop.
    // ------------------------------------------------------------------
    w = PI_TIMESK * modfl(dt/PI_TIMESK, &i);

    xdouble dtLogN;
    for (unsigned int n = 1; n <= uiN; ++n) {
        x = PI_TIMESK * modfl(log((xdouble)n)/ PI_TIMESK, &j);
        y = modfl(((i * x) + (j * w)), &k);
        z = fmodl(((PI_TIMESK)-PI_TIMESKINT) * i * j, 1);
        dtLogN = ((y + z) * PI_TIMESK) + (w * x);
        dMain += ((1 / sqrtl((xdouble)n)) * cosl(dTheta - dtLogN));
    }
    return(2 * dMain);
}
My primary question is whether my "avoid variables larger than t" algorithm is useful, neutral or harmful. The useful part is avoiding numbers larger than \$t\$. The harmful part is including even more floating point calculations. I am hoping someone with floating point knowledge can give me their perspective.
Regarding the first problem, I’m not sure much can be done other than improve the accuracy of each iteration of the sum. When I switched from a compiler with a 64-bit long double to a compiler (gcc/mingw-64) with an 80-bit long double, that helped somewhat, but accuracy again falls off at ten total digits (at least if the left-most digit is above 5). Other than moving to a 128-bit long double, any other thoughts?
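For reference, a quick way to confirm what a given compiler's long double actually provides (not part of my program, just a check):

#include <float.h>
#include <stdio.h>

int main(void)
{
    // 80-bit long double: 64-bit significand, ~18 decimal digits, epsilon ~1e-19.
    // 64-bit long double: 53-bit significand, ~15 decimal digits, epsilon ~2e-16.
    printf("LDBL_MANT_DIG = %d, LDBL_DIG = %d, LDBL_EPSILON = %Le\n",
           LDBL_MANT_DIG, LDBL_DIG, LDBL_EPSILON);
    return 0;
}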
1 Answer
Use long double constants
M_PI, although not part of the standard C library, is often coded as a double constant like

#define M_PI 3.1415926535897932384626433832795
Yet OP is using long double and mod math. I recommend using a long double constant:

//                                                   v Append 'L'
#define M_PI_L 3.141592653589793238462643383279502884L
//             1 23456789 123456789 123456789 1234567

Consider at least 3+ decimal digits more than binary128 needs, or calculate it at runtime:

const long double pi = acosl(-1.0L);
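To see the gap, a small illustration (not code from the post) that prints the difference between the usual double M_PI and a long double π; when long double is wider than double, the difference is on the order of 1e-16:

#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846   /* typical double-precision definition */
#endif

int main(void)
{
    const long double pi = acosl(-1.0L);
    printf("pi - (long double)M_PI = %.3Le\n", pi - (long double)M_PI);
    return 0;
}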
Enable all compiler warnings. Use long double functions

Unclear why OP uses cosl(), yet not logl(), for long double math.

// log((xdouble)n)/ PI_TIMESK
logl((xdouble)n)/ PI_TIMESK

This hints that all warnings are not enabled.
Missing #include <math.h>
Avoid obfuscation: typedef long double xdouble;

Perhaps OP wants to also try typedef double xdouble;, etc. Yet the non-uniform use of long double functions and constants leads to unneeded complications. For now, choose one type and stick to it, using its common name long double.

For some generic math type matching, research <tgmath.h>. If used, be cautious.
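A minimal illustration of what <tgmath.h> dispatch does (illustration only, not suggested code for this program):

#include <stdio.h>
#include <tgmath.h>

int main(void)
{
    long double x = 1.0e10L;
    long double c = cos(x);   // type-generic: resolves to cosl() for a long double argument
    printf("%.18Le\n", c);
    return 0;
}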
Use the same object names as reference equations

The undocumented casual name changes used by OP might make sense for OP, yet they add more cognitive overhead than simply being consistent.

// xdouble ComputeMain(xdouble dt, unsigned int uiN)
long double ComputeMain(long double t, unsigned N)
M_PI is not π

M_PI is a rational number (some double) that is near π (an irrational). OP's attempt at doing mod (M_PI*someInt) math makes things worse than the mod-π reduction that a good cosl() already does, likely using an extended-precision π constant internally.

That OP is not using a long double version of log, nor a long double M_PI, underscores this.
If you really want to do mod π math, see
K.C. Ng's "ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit"
Code directly the equation

First code a fully long double calculation without your mod improvements and use it as a reference.

Did this really perform unacceptably for OP?

long double theta = ((t/2) * (logl(t / (2 * M_PI_L)))) //
                    - M_PI_L/8 - t/2 + 1/(48 * t) + 7/(5760 * powl(t, 3));
...
long double sum = 0.0;
for (unsigned n = 1; n <= N; n++) {
    sum += cosl(theta - t*logl(n))/sqrtl(n);
}
long double M_t = sum*2.0L;
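For comparison against OP's ComputeMain(), the fragment can be packaged as a self-contained reference function; the wrapping and the name ComputeMainDirect are mine:

#include <math.h>

#define M_PI_L 3.141592653589793238462643383279502884L

// Direct long double evaluation of the main term, intended only as a
// reference to measure the OP's reduction scheme against.
long double ComputeMainDirect(long double t, unsigned N)
{
    long double theta = (t / 2) * logl(t / (2 * M_PI_L))
                        - M_PI_L / 8 - t / 2 + 1 / (48 * t) + 7 / (5760 * powl(t, 3));
    long double sum = 0.0L;
    for (unsigned n = 1; n <= N; n++) {
        sum += cosl(theta - t * logl(n)) / sqrtl(n);
    }
    return sum * 2.0L;
}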
Now consider mod

Maybe some user-code mod math is good; I just do not see it yet. I do see casual type changes between double and long double that should be rectified first.

It is possible cospi() (slated for C2X) may help in OP's mod code:

    The cospi functions compute the cosine of π × x, thus regarding x as a measurement in half revolutions.
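A hedged sketch of the idea, mine and not working cospi() code (cospil() is not yet widely available): carry the phase in units of π, so the periodic reduction becomes an exact fmodl(u, 2.0L) and no rounded 2π constant enters the reduction.

#include <math.h>

// u is a phase expressed in units of pi (half revolutions).
// fmodl(u, 2.0L) is exact because the period, 2, is exactly representable.
static long double cos_pi_times(long double u)
{
    u = fmodl(u, 2.0L);
    return cosl(acosl(-1.0L) * u);   // with C23: return cospil(u);
}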
Note: % and fmod(), fmodl() are better thought of as remainder functions than as Euclidean mod.
Maybe: Summation order

A general guide for summation order: if the terms are all the same sign, sum the smallest magnitude parts first (from 'N' to 1). If the signs alternate, going from largest magnitude to least, use the forward order (from '1' to N). This may only improve things by a few bits.

In this case, it's more complicated and deserves a study.
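One thing such a study might try is compensated (Kahan) summation of the main loop; a sketch, reusing the names from the direct version above:

long double sum = 0.0L, comp = 0.0L;   // comp carries the low-order bits lost by each add
for (unsigned n = 1; n <= N; n++) {
    long double term = cosl(theta - t * logl(n)) / sqrtl(n);
    long double y = term - comp;
    long double s = sum + y;
    comp = (s - sum) - y;              // what the addition just dropped
    sum = s;
}
long double M_t = sum * 2.0L;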
Perhaps more later. A very intriguing and advanced math post.
- All great comments. (I'll wait just a bit longer before up-voting). I am new to this site. I could update the post with the full program code, but it is about 400 lines. Is that too much for this site? If that is OK, I should also explain HOW I verify the accuracy of the output. – TMurphy, Oct 14, 2024 at 14:10
- @TMurphy, take whatever time you want. If you want to update, consider another post. Review the help center for guidelines. Since you ask about accuracy, it makes sense to see your assessment criteria. – chux, Oct 14, 2024 at 14:47
- @TMurphy Using the simplified ComputeMain() to compute zeros, I found "Zero: HardyZ( 0 ,0) --> 0 Zero: HardyZ( 0.783403098773051473899861 ,1) --> -5.42101e-20 Zero: HardyZ( 0.627526795358594805462286 ,2) --> 5.69206e-19 Zero: HardyZ( 0.55437338863809121607823 ,3) --> -2.71051e-20 Zero: HardyZ( 0.5098041507833285227347 ,4) --> 4.87891e-19 Zero: HardyZ( 0.478990168127584384422709 ,5) --> -5.42101e-20 Zero: HardyZ( 0.411232898663775866637772 ,9) --> 0". How does this compare with expectations? – chux, Oct 15, 2024 at 14:10
- @chux - Reinstate Monika Not sure about your ComputeMain() values. Almost all values will be in the range -20 to 20. If you want to look at the source code, you can access it via link, along with a method to verify computed values. – TMurphy, Oct 16, 2024 at 20:36
- xdouble dt in your code represents t in your math and unsigned int uiN in your code represents N in your math – is this the case?