Prerequisites
The typedef
name uint128_t
designates an unsigned integer type with width exactly 128 bits.
The UINT128_MAX
is maximum value for an object of type uint128_t
.
Function int ctz(uint128_t n)
returns the number of trailing 0-bits in n
, starting at the least significant bit position. If n
is 0, the result is undefined.
The macro UINT128_C(n)
shall expand to an integer constant expression corresponding to the type uint128_t
.
The following macros are defined.
/* all 3^n for n < 41 fits into uint64_t */
#define LUT_SIZE64 41
/* all 3^n for n < 81 fits into uint128_t */
#define LUT_SIZE128 81
The following array is defined and initialized with corresponding values.
/* lut[n] contains 3^n */
uint128_t lut[LUT_SIZE128];
Problem
My program is concerned with verifying the convergence of the Collatz problem, using this algorithm.
The convergence for all values n
≤ 87 ×ばつ 260 has been proven. [Source: Christian Hercher, Uber die Lange nicht-trivialer Collatz-Zyklen, Artikel in der Zeitschrift "Die Wurzel" Hefte 6 und 7/2018.]
The following function is called for n
of the form \4ドルn+3\$, in order from the smallest one to the largest one, only if all preceding calls returned zero.
The following function should either
- return 0 if the Collatz problem for the
n
is convergent, - return 1 if the function cannot verify the convergence using 128-bit arithmetic,
- loop infinitely if the trajectory for the
n
is cyclic.
Code
int check_convergence(uint128_t n)
{
uint128_t n0 = n;
int e;
do {
if (n <= UINT128_C(87) << 60) {
return 0;
}
n++;
e = ctz(n);
n >>= e;
if (n < UINT128_C(1) << 64 && e < LUT_SIZE64) {
return 0;
}
if (n > UINT128_MAX >> 2*e || e >= LUT_SIZE128) {
return 1;
}
n *= lut[e];
n--;
n >>= ctz(n);
if (n < n0) {
return 0;
}
} while (1);
}
1 Answer 1
Given that (from your comments) there is one and only one input which would cause overflow, I propose the following check at the beginning of the function:
int check_convergence(uint128_t n)
{
const uint128_t n0 = n;
int e;
if (n == UINT128_MAX)
return 1;
do {
...
} while (true);
}
I also added const
to n0
, given that it's constant through all the function.
if (n < UINT128_C(1) << 64 && e < LUT_SIZE64)
return 0;
That can be rewritten as:
if (n <= UINT64_MAX && e < LUT_SIZE64)
return 0;
Although maybe unneeded, I prefer to always parenthesize macros that evaluate to a value, just in case:
#define LUT_SIZE128 (81)
-
1\$\begingroup\$
(unsigned long)(n>>64) == 0
seems to be much faster thann < UINT128_C(1) << 64
orn <= UINT64_MAX
. \$\endgroup\$DaBler– DaBler2019年08月28日 12:27:01 +00:00Commented Aug 28, 2019 at 12:27 -
\$\begingroup\$ @DaBler Nice. Even with
-O3
or-Ofast
? Does the cast affect performance (in theory it is unneeded)? I would remove that cast, or useuint64_t
instead if it affects performance. \$\endgroup\$alx - recommends codidact– alx - recommends codidact2019年08月28日 12:50:09 +00:00Commented Aug 28, 2019 at 12:50 -
1\$\begingroup\$ Using
-march=native -O3
and gcc 4.6.3 (-Ofast
should only have effect on floating-point math, right?). The(unsigned long)(n>>64) == 0
gives speedup factor about 1.1 overn < UINT128_C(1) << 64
, depending on the particular input range. \$\endgroup\$DaBler– DaBler2019年08月28日 13:01:40 +00:00Commented Aug 28, 2019 at 13:01 -
\$\begingroup\$ @DaBler Curious. I guess GCC doesn't know how to optimize
unsigned __int128
very much, which is what I guess you are using for thetypedef
. I guessn <= UINT64_MAX
is also slower. There's some other thing you may try, but which relies on implementation defined behaviour: Use aunion
that contains auint128_t
and auint64_t [2]
, and test which of the two elements of the array contains the MSbits. Then just compare that element of the union to0
. It may be even faster than your shift, or it may not. Just try ;-) \$\endgroup\$alx - recommends codidact– alx - recommends codidact2019年08月28日 13:21:51 +00:00Commented Aug 28, 2019 at 13:21 -
1\$\begingroup\$ That's exactly what I've tried for a few days back. But it did not bring any acceleration over mere
uint128_t
. See my attempt here Look fortypedef union { unsigned long ul[2]; uint128_t ull; } uint128_u;
\$\endgroup\$DaBler– DaBler2019年08月28日 13:36:22 +00:00Commented Aug 28, 2019 at 13:36
Explore related questions
See similar questions with these tags.
n++
can overflow for initialn = UINT128_MAX
. However,n++
in subsequent iterations of the do-while loop cannot overflow since those immediately precedingn >>= ctz(n);
will always make room for at least one bit. \$\endgroup\$n *= lut[e];
cannot overflow since the conditionn > UINT128_MAX >> 2*e
ensures the result of that multiplication will surely fit theuint128_t type
. \$\endgroup\$ctz(n)
is always greater than 0 since the argumentn
is even. \$\endgroup\$