"Longest Collatz sequence" in C slower than in Python 3

Question 1

This tight-looped memoization-less C solution to Project Euler's problem 14 seems efficient, but ran much slower than a similarly trivial Python 3 program. For the desired upper of 1000000, the C version ran practically forever, while Python 3 fairly quickly printed the correct result.

#include <stdio.h>
#define upper 1000000
int main() {
 int max = 1;
 int maxl = 1;
 for (int n = 1; n < upper; ++ n) {
 int c = n; // After v5r's answer: Should be `long c = n;`
 int cl = 1;
 while (c != 1) {
 if (c % 2 == 0) {
 c /= 2;
 }
 else {
 c = 3 * c + 1;
 }
 ++ cl;
 if (cl > maxl) {
 max = n;
 maxl = cl;
 }
 }
 }
 printf("%i\n", max);
 return 0;
}

Python 3:

#! /usr/bin/env python3
max = 1
maxl = 1
for n in range(1, 1000000):
 c = n
 cl = 1
 while c != 1:
 if c % 2 == 0:
 c /= 2
 else:
 c = 3 * c + 1
 cl += 1
 if cl > maxl:
 max = n
 maxl = cl
print(max)

Question 2

Interesting result. What insight were you hoping for from a code review?

Question 3

I was puzzled by the stark performance difference; would have expected C code to run much faster than Python 3. The C execution ran and ran and ran, without printing anything. I must have missed something relevant.

Question 4

Compiled the C code with Apple LLVM version 8.0.0 (clang-800.0.42.1)

Question 5

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers .

Question 6

"You must not edit the code in the question, as that would violate the question-and-answer nature of this site. (An exception to that would be if a user wrote a comment saying that your code is completely broken, and needs to be fixed before it can be reviewed.)" The latter does not apply here, so please stick to the first part of that rule and don't touch the code. Feel free to post a follow-up and link to that from this question. That's the preferred way of handling this.

Question 7

Assuming int is 32 bit long on your system, an integer overflow occurs on c with upper = 1000000. The multiplication in c = 3 * c + 1 can cause c to grow quite large, eventually exceeding the \2ドル^{31}-1\$ upper limit of a 32-bit signed integer, and thus rolling over into negative values. Obviously, the algorithm does not work correctly for negative values of c, and gets stuck.

In Python on the other hand, none of this occurs, since it extends integers to arbitrary length automatically, and thus avoids overflows.

You can "postpone" this problem (i.e. make your program work for larger values of upper) by using larger (and unsigned) integer types, such as uint64_t from stdint.h for c up to \2ドル^{64}-1\$.

However, if you want your C code to work for arbitrarily large upper values, you'll have to use an external library for arbitrary-precision integers, such as GNU GMP (or roll your own, which is really not advisable except as an exercise). The disadvantage is slightly worse performance, which would likely be comparable to the Python version for such a simple and arithmetics-heavy program.

Question 8

On my MacBook Pro, after I set upper=100000 bound just to get the timings back into human scale, the Python code executes (with python, i.e. Python 2.7.10) in 5.169 seconds and the C code executes in 0.073 seconds (after compilation with clang x.c -W -Wall -pedantic -O3).

If you're compiling without -O3, well, that could inflate your numbers quite a bit. With clang x.c -W -Wall -pedantic (implicitly -O0), my test case takes 0.195 seconds, i.e., almost three times as slow.

One pedantic thing that could be going horribly wrong is "you're on a 16-bit machine, so i < 1000000 is invariably false and your loop never terminates." But (A) you're not, and (B) if you were, then compiling with -W -Wall -pedantic would suffice to diagnose that issue on any modern compiler.

However, after thinking about that a little more, I found your actual bug. It's not overflow on the ++i loop; it's overflow on the 3*c+1 multiplication!

Specifically, once c overflows, because it's a signed int, it'll go negative; and then it may enter the attractor

-25, -74, -37, -110, -55, -164, -82, -41, -122,
-61, -182, -91, -272, -136, -68, -34, -17, -50,
(-25, -74, ...)

which never hits 1 and so your inner loop (the cl loop) never terminates.

In this particular case, you can fix the infinite loop by checking for c > 1 instead of c != 1, so that the inner loop will stop on overflow-to-negative as well as on reduction-to-1. However, this will not solve the general problem that 3*c+1 can overflow to a positive number; and it won't remove the undefined behavior that results on signed integer overflow. (Which is to say, the compiler is allowed by the language standard to assume that overflow never happens, which means that if overflow does happen anyway, all bets are off. It's your job as a programmer to make sure overflow can't happen.)

So what you need to do is check for overflow before doing the multiplication, like this:

if (c >= INT_MAX / 3) {
 printf("Input %d hits 32-bit overflow before producing a result\n", n);
 break;
}
c = 3 * c + 1;

With this modification, your C code runs in 0.572 seconds and produces the mathematically incorrect answer 910107. (The correct answer is 837799, and as v5r suggests, you can get that answer by changing int to int64_t throughout your code. The revised code runs in 0.402 seconds on my machine, i.e., the 64-bit version runs faster — but I suspect that's partly due to getting to skip over all those error-reporting printfs that were slowing down the 32-bit code. Code runs faster when it doesn't have any errors to report. :))

One more improvement: You can, and therefore should, move the code block

 if (cl > maxl) {
 max = n;
 maxl = cl;
 }

from inside the inner loop to outside it. There's no point doing all that computation every time through the inner loop. Just do it once, after the end of the inner loop.

Question 9

Thanks, I missed a break statement after maxl = cl;

Question 10

Adding break to that code block, in its current location, would be incorrect, because it would break out of the loop long before c=1 and therefore long before cl had attained its maximum value.

v5r v5r 3763 silver badges3 bronze badges · Accepted Answer · 2017-03-12 18:47:02Z

Assuming int is 32 bit long on your system, an integer overflow occurs on c with upper = 1000000. The multiplication in c = 3 * c + 1 can cause c to grow quite large, eventually exceeding the \2ドル^{31}-1\$ upper limit of a 32-bit signed integer, and thus rolling over into negative values. Obviously, the algorithm does not work correctly for negative values of c, and gets stuck.

In Python on the other hand, none of this occurs, since it extends integers to arbitrary length automatically, and thus avoids overflows.

You can "postpone" this problem (i.e. make your program work for larger values of upper) by using larger (and unsigned) integer types, such as uint64_t from stdint.h for c up to \2ドル^{64}-1\$.

However, if you want your C code to work for arbitrarily large upper values, you'll have to use an external library for arbitrary-precision integers, such as GNU GMP (or roll your own, which is really not advisable except as an exercise). The disadvantage is slightly worse performance, which would likely be comparable to the Python version for such a simple and arithmetics-heavy program.

Stack Exchange Network

"Longest Collatz sequence" in C slower than in Python 3

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

"Longest Collatz sequence" in C slower than in Python 3

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions