Accurate computation of the inverse gamma function with the standard C math library

Question 1

The inverse of the gamma function over the reals is multivalued with an infinite number of branches. This self-answered question is about the principal inverse of the gamma function, Γ₀^-1(x), whose domain is [α, ∞], where α ≈ 0.8856031944108887 is the minimum of Γ(x) in the positive half-plane.

This inverse if useful in various contexts in mathematics and computer science. One example:

Gaston H. Gonnet, "Expected Length of the Longest Probe Sequence in Hash Code Searching." Journal of the Association for Computing Machinery, Vol. 28, No. 2, April 1981, pp. 289-304:

An investigation is made of the expected value of the maximum number of accesses needed to locate any element in a hashing file under various collision resolution schemes. [...] Finally, for the separate chaining (or direct chaining) method both expected values are found to be ≈ Γ^-1(n).

The Stirling approximation shows that the gamma function can be considered to be fairly closely related to exponentiation. Therefore the Lambert W function seems like a suitable basis for the computation of the principal inverse of the gamma function.

The question is: How does one compute it with good accuracy and reasonable performance?

My experience working on mathematical libraries suggests that a maximum error of 4 ulp is an appropriate bound for a special function implementation that is reasonably straightforward. An implementation should restrict itself to ISO-C99 and ideally require only the standard math functions plus an accurate ISO-C99 implementation of the principal branch of the Lambert W function, W₀, such as the one I provided earlier on this site.

The mapping of C floating-point types to IEEE-754 formats (double→ binary64; float → binary32) can be assumed, as can the presence of hardware support for the fused multiply-add operation performed by the fma(), fmaf() functions.

Question 2

In the 1990s the Dutch mathematician Nico Temme observed that in numerical computations involving the gamma function, it is often advantageous to represent the gamma function as the product of a "base" approximation very close to Γ(x) which can be computed in a straightforward manner and an adjustment factor, which I will call the "series", that is near unity for x > 1. In

N. M. Temme, "A set of algorithms for the incomplete gamma functions", Probability in the Engineering and Informational Sciences, Vol. 8, No. 2, April 1994, pp. 291-307,

the author chose the following generally advantageous arrangement, where the "base" provides an underestimate to Γ(x) while the "series" in the form of the polynomial P() is always close to unity and tends to 1 from above for x → ∞.

Γ(x) = x^x-1⁄2e^-x√(2π) P(1/x) = x^x-1⁄2e^-x√(2π) Γ^*(x)

The mathematical series expansion is P(1/x) = 1 + (1/12)(1/x) + (1/288)(1/x)² - (139/51840)(1/x)x³ ... , where the coefficients are given by OEIS sequence A001163 for the numerators and OEIS sequence A001164 for the denominators. Practical implementations of Γ^* on a fixed, restricted domain would be implemented with polynomial or rational minimax approximations. Some computational frameworks offer an implementation of Γ^*, for example GSL provides it as gsl_sf_gammastar().

Unfortunately, Temme's "base" x^x-1⁄2e^-x√(2π) is not invertible with the Lambert W function. Curiously, an alternate "base" that is readily invertible is provided by Stirling's original approximation, which in modern terminology (using z=x-1⁄2) is log(Γ(x)) = z log(z)-z+log(√(2π)) - 1/(24z) + 7/(2880z³) - 31/(40320z⁵) ...

James Stirling (tr. Francis Holliday), Differential Method: or, A Treatise concerning Summation and Interpolation of Infinite Series. London: E. Cave 1749, p. 123ff

Exponentiation of the above yields Γ(x) = z^ze^-z√(2π) P(1/z). The "base" z^ze^-z√(2π) provides an overestimate to Γ(x) while the "series" in the form of the polynomial P() is always close to unity and tends to 1 from below for x → ∞. The mathematical series expansion is P(1/z) = 1 - (1/24)(1/z) + (1/1152)(1/z)² + (1003/414720)(1/z)³ ..., where the coefficients are given by OEIS sequence A182935 for the numerators and OEIS sequence A144618 for the denominators. As with Temme's partitioning, practical computation of P() on a fixed and restricted domain will utilize polynomial or rational minimax approximations.

To avoid spurious overflow in intermediate computations, it seems highly advisable to remove the factor √(2π) from the "base" and combine it with the "series". This arrangement is represented by the functions gamma_stirling_base_excl_root() and gamma_stirling_series_incl_root() below. The former is easily inverted with the Lambert W function, as shown in the function gamma_stirling_base_excl_root_inv(). One can use gamma_stirling_base_excl_root(x) * gamma_stirling_series_incl_root(x) to compute Γ(x).

The basic idea for an accurate computation x = Γ₀^-1(y) is to use an iterative scheme. For a starting approximation accurate to within a few percent, one simply uses y in place of the "base" and computes the inverse of this "base". Since the "base" provides a slight overestimate of the gamma function, this results in a starting approximation that is a slight underestimate to the inverse of the gamma function.

The starting approximation is then refined using a simple derivative-free iteration scheme that exploits the fact that the "series" has a small positive slope but is almost a horizontal line across much of the domain, so gamma_stirling_series_incl_root(x) and gamma_stirling_series_incl_root(x + ε) are so close together that one can be substituted for the other.

However, this iteration scheme works poorly as x → α. Furthermore, at roughly x = 0.92, the domain of the real-valued Lambert W funcion is exceeded. For x near α an alternative computation is therefore required. A polynomial approximation appears suitable. Since the shape of Γ₀^-1(x) roughly resembles the square root function, the argument is transformed accordingly. As α is not exactly representable in a finite-precision floating-point format, this transformation is affected by catastrophic cancellation in the immediate vicinity of α . Therefore α must be represented with extended precision. With FMA available, the simplest way is to represent it as the product of two target-precision constants, thus representing α to almost quadruple precision.

The resulting code is found in my_gammainv(). Its accuracy depend on the quality of the standard math library used. I utilized the highly-accurate math library of the Intel toolchain ^[1] and observed a maximum error of 3.555852 ulp in reasonably extensive testing. Since the test cannot be performed in an exhaustive manner, this is not a guaranteed bound. I found that the number of iterations required for small x is relatively large, and for improved efficiency it would likely be advantageous to make use of Newton iterations instead. However, this requires an implementation of the digamma function. The code shown below is therefore best considered to be a reasonable baseline for future, improved, implementations.

/* For x in [1, 171.62437695630271] with z = x - 1/2, compute 
 Gamma(x) / (pow(z,z) * exp(-z))
*/
double gamma_stirling_series_incl_root (double x)
{
 const double sqrt_2pi = 2.5066282746310005;
 double z, ooz, p, q, r;
 z = x - 0.5;
 ooz = 1.0 / z;
 p = 1.1651261098836142e-6;
 p = fma (p, ooz, -6.6984025798790317e-5);
 p = fma (p, ooz, -3.4279581207190289e-2);
 p = fma (p, ooz, -3.6105933793714484e-1);
 p = fma (p, ooz, -1.1118476663536905e+0);
 p = fma (p, ooz, -1.7258577047636312e+0);
 p = fma (p, ooz, -1.5534865998228795e+0);
 p = fma (p, ooz, -8.4722630042414326e-1);
 p = fma (p, ooz, -2.6668992127945762e-1);
 p = fma (p, ooz, -4.1666666666672333e-2);
 p = p * ooz;
 q = 1.2812295488170272e-1;
 q = fma (q, ooz, 2.0812181436639463e+0);
 q = fma (q, ooz, 1.1349479778411878e+1);
 q = fma (q, ooz, 2.9631178954083069e+1);
 q = fma (q, ooz, 4.3371327947326982e+1);
 q = fma (q, ooz, 3.8081583482347888e+1);
 q = fma (q, ooz, 2.0525254178847312e+1);
 q = fma (q, ooz, 6.4213914440607534e+0);
 q = fma (q, ooz, 1.0000000000000000e+0);
 r = fma (p / q, sqrt_2pi, sqrt_2pi);
 return r;
}
// For x in [1/2, 171.62437695630271], with z = x-1/2, compute pow(z,z)*exp(-z)
double gamma_stirling_base_excl_root (double x)
{
 double z = x - 0.5;
 double h = pow (z, 0.5 * z);
 double r = (exp (-z) * h) * h;
 return r;
}
// For x in [1/e, INF), compute inverse of gamma_stirling_base_excl_root()
double gamma_stirling_base_excl_root_inv (double x)
{
 const double one_over_e = 0.3678794411714423; 
 const double base_e = 2.7182818284590452;
 double r, t;
 t = log (x);
 r = lambert_w0 (t * one_over_e);
#if 0
 // less accurate due to error magnification
 r = fma (exp (r), base_e, 0.5);
#else
 // more accurate but with potential for division by zero
 r = (t == 0) ? (base_e + 0.5) : fma (1.0 / r, t, 0.5);
#endif
 if (isinf (x)) r = x;
 return r;
}
/* For x in [1, 171.62437695630271], compute Gamma(x). With Intel math library
 observed maximum ulp error is 5.35774649 @ 49.362983703613281
*/
double my_gamma (double x)
{
 return gamma_stirling_base_excl_root(x)*gamma_stirling_series_incl_root(x);
}
/* For x in [0.8856031944108887, INF), compute Gamma^{-1}(x). With Intel math
 library, observed maximum ulp error is 3.555852 @ 5.1508555598026167
*/
double my_gammainv (double x)
{
 double r;
 if (x < 1.2456) {
 const double gamma_min_fac1 = 1.1070789203770544e+0;
 const double gamma_min_fac2 = 7.9994585581059174e-1;
 double red_x;
 red_x = fma (-gamma_min_fac1, gamma_min_fac2, x); // x - gamma_min
 red_x = sqrt (red_x);
 r = -1.7764958145211656e-2;
 r = fma (r, red_x, 3.7692812636504647e-2);
 r = fma (r, red_x, 9.4258030628319348e-2);
 r = fma (r, red_x, -4.5612222321620560e-1);
 r = fma (r, red_x, 7.0005412242014109e-1);
 r = fma (r, red_x, -4.3880715889789362e-1);
 r = fma (r, red_x, -2.1191527305483765e-2);
 r = fma (r, red_x, 5.9972071220247239e-2);
 r = fma (r, red_x, 2.0013353305108039e-1);
 r = fma (r, red_x, -1.2244091522441247e-1);
 r = fma (r, red_x, -2.2186691361367650e-1);
 r = fma (r, red_x, 1.4375172514492868e-1);
 r = fma (r, red_x, 2.8902441794995354e-1);
 r = fma (r, red_x, -1.9468736664744374e-1);
 r = fma (r, red_x, -4.6201898517252749e-1);
 r = fma (r, red_x, 3.5594630082461931e-1);
 r = fma (r, red_x, 1.5276760433849161e+0);
 r = fma (r, red_x, 1.4616321449683622e+0);
 } else {
 const double one_over_sqrt_2pi = 0.3989422804014327;
 double stirling_base_excl_root_est;
 // initial approximation; always an underestimate
 stirling_base_excl_root_est = x * one_over_sqrt_2pi;
 r = gamma_stirling_base_excl_root_inv (stirling_base_excl_root_est);
 // number of refinement steps determined experimentally
 int iters = 2;
 if (x < 1e52) iters += 1;
 if (x < 2e09) iters += 1;
 if (x < 2e02) iters += 1;
 if (x < 8e00) iters += 1;
 if (x < 3e00) iters += 1;
 if (x < 2e00) iters += 1;
 for (int i = 0; i < iters; i++) {
 stirling_base_excl_root_est = x /gamma_stirling_series_incl_root(r);
 r = gamma_stirling_base_excl_root_inv (stirling_base_excl_root_est);
 }
 }
 return r;
}

A corresponding single-precision implementation is structurally identical but needs at most three iterations of the derivative-free iteration scheme. Utilizing the Intel math library, a maximum error of 3.24856128 ulps was observed in an exhaustive test.

// For x in [1, 35.0400981] with z = x-1/2, compute Gamma(x)/(pow(z,z)*exp(-z))
float gamma_stirling_series_incl_rootf (float x)
{
 const float sqrt_2pi = 2.50662827f;
 float z, ooz, p, r;
 z = x - 0.5f;
 ooz = 1.0f / z;
 p = 1.40736238e-5f;
 p = fmaf (p, ooz, -1.25343795e-4f);
 p = fmaf (p, ooz, 4.35243332e-4f);
 p = fmaf (p, ooz, -6.09739101e-4f);
 p = fmaf (p, ooz, -3.85110063e-4f);
 p = fmaf (p, ooz, 2.55142688e-3f);
 p = fmaf (p, ooz, 8.42256821e-4f);
 p = fmaf (p, ooz, -4.16650288e-2f);
 p = p * ooz;
 r = fmaf (p, sqrt_2pi, sqrt_2pi);
 return r;
}
// For x in [1/2, 35.0400981], with z = x - 1/2, compute pow (z,z) * exp (-z)
float gamma_stirling_base_excl_rootf (float x)
{
 float z = x - 0.5f;
 float h = powf (z, 0.5f * z);
 float r = (expf (-z) * h) * h;
 return r;
}
// For x in [1/e, INF), compute inverse of gamma_stirling_base_excl_root()
float gamma_stirling_base_excl_root_invf (float x)
{
 const float one_over_e = 0.36787944f; 
 const float e = 2.71828183f;
 float r, t;
 t = logf (x);
 r = lambert_w0f (t * one_over_e);
#if 0
 // less accurate due to error magnification
 r = fmaf (expf (r), e, 0.5f);
#else
 // more accurate but with potential for division by zero!
 r = (t == 0) ? (e + 0.5f) : fmaf (1.0f / r, t, 0.5f);
#endif
 if (isinf (x)) r = x;
 return r;
}
/* For x in [1, 35.0400981], compute Gamma(x). With Intel math library, 
 observed maxulperr = 5.01100129 @ 18.3913364 
*/
float my_gammaf (float x)
{
 float r;
 r = gamma_stirling_base_excl_rootf(x) * gamma_stirling_series_incl_rootf(x);
 return r;
}
/* For x in [0.88560322, INF) compute Gamma^{-1}(x). With Intel math library,
 observed maxulperr = 3.24856128 @ 5.60469484
*/
float my_gammainvf (float x)
{
 float r;
 if (x < 1.3212f) {
 const float gamma_min_fac1 = 1.05837047e+0f;
 const float gamma_min_fac2 = 8.36761057e-1f;
 float red_x;
 red_x = fmaf (-gamma_min_fac1, gamma_min_fac2, x); // x - gamma_min
 red_x = sqrtf (red_x);
 r = 1.08589509e-2f;
 r = fmaf (r, red_x, -2.04678846e-1f);
 r = fmaf (r, red_x, 4.90045016e-1f);
 r = fmaf (r, red_x, -2.54904563e-1f);
 r = fmaf (r, red_x, -4.52452428e-1f);
 r = fmaf (r, red_x, 3.55210263e-1f);
 r = fmaf (r, red_x, 1.52769733e+0f);
 r = fmaf (r, red_x, 1.46163205e+0f);
 } else {
 const float one_over_sqrt_2pi = 0.39894228f;
 float stirling_base_excl_root_est;
 // initial approximation; always an underestimate
 stirling_base_excl_root_est = x * one_over_sqrt_2pi;
 r = gamma_stirling_base_excl_root_invf (stirling_base_excl_root_est);
 // number of refinement steps determined experimentally
 int iters = (x > 5500) ? 1 : ((x > 15) ? 2 : 3);
 for (int i = 0; i < iters; i++) {
 stirling_base_excl_root_est = x/gamma_stirling_series_incl_rootf(r);
 r = gamma_stirling_base_excl_root_invf(stirling_base_excl_root_est);
 }
 }
 return r;
}

^[1] Brian Gladman, Vincenzo Innocente, John Mather, and Paul Zimmermann: Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision. 2025. (hal-03141101v8)

Question 3

"a maximum error of 3.24856128 ulps was observed in an exhaustive test." --> as able, please post that test harness. (Also, IMO, sufficient to report ULP difference to x.xx)

njuffa 27.1k5 gold badges93 silver badges150 bronze badges · Accepted Answer · 2025-11-17 00:51:11Z