Computational verification of Collatz conjecture

Question 1

Prerequisites

The typedef name uint128_t designates an unsigned integer type with a width of exactly 128 bits.

UINT128_MAX is the maximum value for an object of type uint128_t.

The function int ctz(uint128_t n) returns the number of trailing 0-bits in n, starting at the least significant bit position. If n is 0, the result is undefined.

The macro UINT128_C(n) shall expand to an integer constant expression corresponding to the type uint128_t.
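The post does not show how these prerequisites are defined. A minimal sketch, assuming a GCC or Clang 64-bit target where the unsigned __int128 extension and __builtin_ctzll are available, might look like the following. The definitions are illustrative and not taken from the original program. Note that this UINT128_C only accepts constants that fit into an ordinary 64-bit literal.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang extension). */
typedef unsigned __int128 uint128_t;

/* All 128 bits set. */
#define UINT128_MAX (~(uint128_t)0)

/* Converts an integer constant to uint128_t. */
#define UINT128_C(n) ((uint128_t)(n))

/* Number of trailing 0-bits in n; n must be nonzero. */
static int ctz(uint128_t n)
{
    assert(n != 0);
    uint64_t lo = (uint64_t)n; /* low 64 bits */
    if (lo != 0) {
        return __builtin_ctzll(lo);
    }
    /* The low half is all zeros, so count in the high half. */
    return 64 + __builtin_ctzll((uint64_t)(n >> 64));
}
```

Splitting the value into two 64-bit halves keeps the sketch within builtins that are known to exist; a compiler may offer something more direct.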

The following macros are defined.

/* all 3^n for n < 41 fit into uint64_t */
#define LUT_SIZE64 41
/* all 3^n for n < 81 fit into uint128_t */
#define LUT_SIZE128 81

The following array is defined and initialized with corresponding values.

/* lut[n] contains 3^n */
uint128_t lut[LUT_SIZE128];
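The initialization of the table is not shown in the post. One way it could be filled, sketched here with a hypothetical helper named init_lut and assuming the unsigned __int128 extension, is repeated multiplication by 3:

```c
#include <assert.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang extension). */
typedef unsigned __int128 uint128_t;

#define LUT_SIZE128 81

/* lut[n] contains 3^n */
static uint128_t lut[LUT_SIZE128];

/* Hypothetical helper (not in the original post): fill lut[] with powers of 3. */
static void init_lut(void)
{
    uint128_t p = 1;
    for (int i = 0; i < LUT_SIZE128; ++i) {
        lut[i] = p;
        /* Do not multiply after the last entry: 3^81 no longer fits. */
        if (i + 1 < LUT_SIZE128) {
            assert(p <= (~(uint128_t)0) / 3);
            p *= 3;
        }
    }
}
```

Calling init_lut() once before the first call to check_convergence would be enough under these assumptions.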

Problem

My program is concerned with verifying the convergence of the Collatz problem, using this algorithm.

The convergence for all values n ≤ 87 × 2⁶⁰ has been proven. [Source: Christian Hercher, Über die Länge nicht-trivialer Collatz-Zyklen, article in the journal "Die Wurzel", issues 6 and 7/2018.]

The following function is called for n of the form \$4n+3\$, in order from the smallest one to the largest one, only if all preceding calls returned zero.

The function should do one of the following:

return 0 if the Collatz trajectory of n converges,
return 1 if it cannot verify the convergence using 128-bit arithmetic,
loop infinitely if the trajectory of n is cyclic.

Code

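int check_convergence(uint128_t n)
{
    uint128_t n0 = n;
    int e;
    do {
        /* convergence is already known for all n <= 87 * 2^60 */
        if (n <= UINT128_C(87) << 60) {
            return 0;
        }
        /* n is odd here: e steps of n -> (3n+1)/2 give (n+1) * 3^e / 2^e - 1 */
        n++;
        e = ctz(n);
        n >>= e;
        if (n < UINT128_C(1) << 64 && e < LUT_SIZE64) {
            return 0;
        }
        /* give up if n * 3^e might not fit into 128 bits */
        if (n > UINT128_MAX >> 2*e || e >= LUT_SIZE128) {
            return 1;
        }
        n *= lut[e];
        n--;
        /* strip the remaining factors of two */
        n >>= ctz(n);
        /* the trajectory dropped below its starting value */
        if (n < n0) {
            return 0;
        }
    } while (1);
}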
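The calling protocol described in the Problem section (inputs of the form 4k+3, in increasing order, stopping at the first nonzero result) could be sketched as below. The names scan and demo_check are hypothetical. demo_check is an arbitrary stand-in that only exists to keep the sketch self-contained; a real run would pass check_convergence instead.

```c
#include <assert.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang extension). */
typedef unsigned __int128 uint128_t;

/*
 * Hypothetical driver: calls check(4*k + 3) for k = k_begin .. k_end - 1
 * and stops at the first nonzero result. Returns the k at which it stopped
 * (k_end if every call returned 0) and stores the last result in *status.
 */
static uint128_t scan(uint128_t k_begin, uint128_t k_end,
                      int (*check)(uint128_t), int *status)
{
    uint128_t k;
    *status = 0;
    for (k = k_begin; k < k_end; ++k) {
        /* 4*k + 3 must not wrap around */
        assert(k <= ((~(uint128_t)0) - 3) / 4);
        *status = check(4 * k + 3);
        if (*status != 0) {
            break;
        }
    }
    return k;
}

/* Arbitrary stand-in for check_convergence: reports failure from 43 on. */
static int demo_check(uint128_t n)
{
    return n >= 43;
}
```

With the stand-in, scan(0, 20, demo_check, &status) stops at k = 10, because 4*10 + 3 = 43 is the first input that demo_check rejects.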

Question 2

@CacahueteFrito Good point. The n++ can overflow for the initial n = UINT128_MAX. However, n++ in subsequent iterations of the do-while loop cannot overflow, because the immediately preceding n >>= ctz(n); always shifts by at least one bit and so leaves room for the increment.

Question 3

@CacahueteFrito The n *= lut[e]; cannot overflow: the preceding check returns 1 whenever n > UINT128_MAX >> 2*e, so at the multiplication n ≤ UINT128_MAX >> 2*e holds. Since 3^e ≤ 4^e = 2^(2e), the product n × 3^e is at most UINT128_MAX and fits the uint128_t type.
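The bound can be checked numerically. The sketch below uses hypothetical helpers guard_accepts and pow3 and assumes the unsigned __int128 extension. It adds one thing that is not in the posted code: an explicit e >= 64 test, because a shift count of 2*e = 128 or more is undefined for a 128-bit operand in C.

```c
#include <assert.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang extension). */
typedef unsigned __int128 uint128_t;
#define UINT128_MAX (~(uint128_t)0)

/*
 * Hypothetical helper: returns 1 if n <= UINT128_MAX >> 2*e, i.e. if the
 * overflow guard would let the multiplication by 3^e proceed.
 * Added assumption: e >= 64 is rejected first so the shift count stays
 * below 128.
 */
static int guard_accepts(uint128_t n, int e)
{
    assert(e >= 0);
    if (e >= 64) {
        return 0;
    }
    return n <= (UINT128_MAX >> (2 * e));
}

/* 3^e by repeated multiplication; valid for 0 <= e <= 80. */
static uint128_t pow3(int e)
{
    assert(e >= 0 && e <= 80);
    uint128_t p = 1;
    while (e-- > 0) {
        p *= 3;
    }
    return p;
}
```

For every e below 64, the largest accepted n is UINT128_MAX >> 2*e, and that value is no larger than UINT128_MAX / pow3(e), so the product cannot wrap.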

Question 4

@CacahueteFrito ctz(n) is always greater than 0 since the argument n is even.

Question 5

@CacahueteFrito Let's say we have a 3 GHz CPU executing 3 000 000 000 simple instructions per second. Then going over 2^128 states would take roughly 3.6 x 10^21 years.
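The estimate can be reproduced with a few lines of floating-point arithmetic. This sketch assumes one state per instruction and a year of 365.25 days; the function name is hypothetical.

```c
#include <assert.h>

/* Years needed to step through 2^128 states at the given rate. */
static double years_for_2_pow_128(double states_per_second)
{
    assert(states_per_second > 0.0);
    const double two_pow_64 = 18446744073709551616.0;      /* exactly 2^64 */
    const double states = two_pow_64 * two_pow_64;          /* exactly 2^128 */
    const double seconds_per_year = 365.25 * 24.0 * 3600.0; /* 31 557 600 */
    return states / states_per_second / seconds_per_year;
}
```

At 3e9 states per second this gives about 3.4 x 10^38 / 3 x 10^9 ≈ 1.13 x 10^29 seconds, which is roughly 3.6 x 10^21 years.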

Question 6

Good to read that; sometimes I tend to forget the most basic things :-)

Question 7

Given that (from your comments) there is one and only one input which would cause overflow, I propose the following check at the beginning of the function:

int check_convergence(uint128_t n)
{
    const uint128_t n0 = n;
    int e;
    if (n == UINT128_MAX)
        return 1;
    do {
        ...
    } while (true); /* true requires <stdbool.h> */
}

I also added const to n0, given that it stays constant throughout the function.

if (n < UINT128_C(1) << 64 && e < LUT_SIZE64)
    return 0;

That can be rewritten as:

if (n <= UINT64_MAX && e < LUT_SIZE64)
    return 0;

Although maybe unneeded, I prefer to always parenthesize macros that evaluate to a value, just in case:

#define LUT_SIZE128 (81)
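For a plain literal such as 81 the parentheses change nothing. They start to matter once the macro body is an expression. The macros below are hypothetical and only illustrate the difference:

```c
#include <assert.h>

/* Hypothetical macros for illustration; not from the original program. */
#define SIZE_BARE  80 + 1
#define SIZE_PAREN (80 + 1)

/* The parenthesized form keeps its value inside a larger expression. */
static_assert(2 * SIZE_PAREN == 162, "parenthesized macro behaves as one value");

/* Expands to 2 * 80 + 1. */
static int twice_bare(void)
{
    return 2 * SIZE_BARE;
}

/* Expands to 2 * (80 + 1). */
static int twice_paren(void)
{
    return 2 * SIZE_PAREN;
}
```

Here twice_bare() yields 161 while twice_paren() yields 162, which is the kind of surprise the habit guards against.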

Question 8

(unsigned long)(n>>64) == 0 seems to be much faster than n < UINT128_C(1) << 64 or n <= UINT64_MAX.
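The three tests mentioned here are logically equivalent, so any speed difference comes from code generation and has to be measured. The sketch below only shows the equivalence. It assumes the unsigned __int128 extension and uses uint64_t for the cast, since unsigned long is 64 bits wide only on some platforms. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang extension). */
typedef unsigned __int128 uint128_t;
#define UINT128_C(n) ((uint128_t)(n))

/* Comparison against 2^64. */
static int fits64_compare(uint128_t n)
{
    return n < UINT128_C(1) << 64;
}

/* Comparison against the largest 64-bit value. */
static int fits64_max(uint128_t n)
{
    return n <= UINT64_MAX;
}

/* Test of the high 64 bits only. */
static int fits64_shift(uint128_t n)
{
    return (uint64_t)(n >> 64) == 0;
}

/* Returns 1 if n fits into 64 bits; checks that all variants agree. */
static int fits64(uint128_t n)
{
    int r = fits64_shift(n);
    assert(r == fits64_compare(n));
    assert(r == fits64_max(n));
    return r;
}
```

Benchmarking each variant inside the real loop, with the same compiler flags, is the only reliable way to pick one.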

Question 9

@DaBler Nice. Even with -O3 or -Ofast? Does the cast affect performance (in theory it is unneeded)? I would remove that cast, or use uint64_t instead if it affects performance.

Question 10

Using -march=native -O3 and gcc 4.6.3 (-Ofast should only affect floating-point math, right?). The (unsigned long)(n>>64) == 0 gives a speedup factor of about 1.1 over n < UINT128_C(1) << 64, depending on the particular input range.

Question 11

@DaBler Curious. I guess GCC doesn't optimize unsigned __int128 very well, which is what I assume you are using for the typedef. I guess n <= UINT64_MAX is also slower. There's another thing you may try, although it relies on implementation-defined behaviour: use a union that contains a uint128_t and a uint64_t[2], and find out which of the two array elements holds the most significant bits. Then just compare that element of the union to 0. It may be even faster than your shift, or it may not. Just try ;-)

Question 12

That's exactly what I tried a few days ago, but it brought no speedup over plain uint128_t. See my attempt here; look for typedef union { unsigned long ul[2]; uint128_t ull; } uint128_u;
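For reference, the union approach discussed in these comments could look roughly like the sketch below. It assumes the unsigned __int128 extension and uses uint64_t where the quoted typedef uses unsigned long. Which array element holds the high half depends on the platform, so the hypothetical helper high_index determines it at run time instead of hard-coding it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang extension). */
typedef unsigned __int128 uint128_t;

/* Overlay of one 128-bit value and two 64-bit halves. */
typedef union {
    uint64_t ul[2];
    uint128_t ull;
} uint128_u;

/* Index of the element holding the most significant 64 bits. */
static int high_index(void)
{
    uint128_u probe;
    probe.ull = (uint128_t)1 << 64; /* only the lowest bit of the high half set */
    assert(probe.ul[0] == 1 || probe.ul[1] == 1);
    return probe.ul[1] == 1 ? 1 : 0;
}

/* Returns 1 if n fits into 64 bits, by testing the high half directly. */
static int fits64_union(uint128_t n)
{
    uint128_u u;
    u.ull = n;
    return u.ul[high_index()] == 0;
}
```

In a hot loop the index would be fixed once for the target rather than recomputed. Whether this beats the shift is a matter for measurement, and the comment above reports that it did not help.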

score 2 · Accepted Answer · 2019-08-27 12:47:18Z

edited Aug 27, 2019 at 17:55

answered Aug 27, 2019 at 12:47

alx - recommends codidact

Comments on the answer (their text appears under Questions 8 to 12 above):

DaBler, Aug 28, 2019 at 12:27
alx - recommends codidact, Aug 28, 2019 at 12:50
DaBler, Aug 28, 2019 at 13:01
alx - recommends codidact, Aug 28, 2019 at 13:21
DaBler, Aug 28, 2019 at 13:36
