First of all, if you're unfamiliar with the Collatz sequence, it recursively applies the following function to an integer:
$$ f(n) = \begin{cases} n/2 & \text{if } n \equiv 0 \pmod{2}, \\ 3n+1 & \text{if } n \equiv 1 \pmod{2}. \end{cases} $$
What I wanted to do was to make a program that would:
- Quit if 0 was inputted
- Reprompt the user if something invalid was inputted
- For positive integers, e.g. 7, output "7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1"
- For negative integers, e.g. -3, where doing the above would output "-3 -8 -4 -2 -1 -2 -1 -2... [forever repeating]", output something like "-3 -8 -4 (-2 -1)", as "-2 -1" repeats forever.
This is what I have, which fits most of the criteria, but could definitely be improved somehow. What do you guys think?:
#include <stdio.h>
#include <stdbool.h>
long long cltz(long long x);
int main() {
long long n;
char input[256];
while (1) {
printf("Enter a positive or negative integer (0 to quit): ");
if (fgets(input, sizeof(input), stdin) == NULL) break;
if (sscanf(input, "%lld", &n) != 1) continue;
if (n == 0) break;
bool sign = n < 0;
long long on = n, slow = n, fast = n;
while (sign) {
slow = cltz(slow);
fast = cltz(cltz(fast));
if (fast == slow) break;
}
if (on == slow && sign) printf("( ");
printf("%lld ", n);
while (n != 1) {
n = cltz(n);
if (n == slow) break;
printf("%lld ", n);
}
long long c_slow = cltz(slow);
if (sign && on != slow) {
printf("( %lld ", slow);
while (c_slow != slow) {
printf("%lld ", c_slow);
c_slow = cltz(c_slow);
}
printf(")");
}
if (on == slow && sign) printf(")");
printf("\n");
}
return 0;
}
long long cltz(long long x) {
return (x & 1) ? ((x << 1) + x + 1) : (x >> 1);
}
2 Answers 2
Let the compile optimize
Rather than (x << 1) + x + 1) : (x >> 1)
, use (x*3 + 1) : (x/2)
to match the algorithm and enable your compiler's optimizations.
If you can optimize better than the compiler, get a better compiler.
Note that x%2
is not the same as x mod 2
.
Stay with x & 1
or use x % 2llu
.
Detect overflow
cltz()
may overflow. Should you want to detect this:
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
// Highest odd value valid for `cltz()`
#define CLTX_ODD_HIGH ((LLONG_MAX - 1)/3)
long long cltz(long long x) {
// return (x & 1) ? ((x << 1) + x + 1) : (x >> 1);
if (x & 1) {
if (llabs(x) > CLTX_ODD_HIGH) {
fprintf(stderr, "Overflow with %lld\n", x);
exit(EXIT_FAILURE);
}
return x*3 + 1;
}
return x / 2;
}
Design
Consider bool sign = n < 0; ... if (on == slow && sign) printf(")"); printf("\n");
as a helper function. That is, separate user input from doing the task.
Easier to perform testing and recover from overflow.
-
3\$\begingroup\$ Do you say that
x%2
is not the same as mathematicalx
mod 2 because the result is different for negative numbers? Whilst that's true in general, we're in a boolean context here (i.e. implicitlyx&2 != 0
), for which the result should be the same, and a compiler may be able to produce identical code (though I haven't confirmed that the ones I care about actually do). \$\endgroup\$Toby Speight– Toby Speight2023年06月10日 07:46:38 +00:00Commented Jun 10, 2023 at 7:46 -
1\$\begingroup\$ @TobySpeight Agree that
if (x%2)
is likeif (x&1)
ifx
inunsigned
orsigned
(aside from the so to be gone non- 2's complement). IAC, mod and signed%
are a general source of errors given-1%2 --> -1
. \$\endgroup\$chux– chux2023年06月10日 16:45:09 +00:00Commented Jun 10, 2023 at 16:45 -
\$\begingroup\$ Using
x/2
instead ofx>>1
would make the code slower unless you also changex
fromlong long
tounsigned long long
. See Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?. That would also sidestep the whole%2
issue @TobySpeight discussed. (Although compilers are smart enough to optimizex%2 != 0
the same asx&1
even for signed types, the problem is only when code insists on writingx%2 == 1
that forces compilers to make asm such that it's false for negative odd integers.) \$\endgroup\$Peter Cordes– Peter Cordes2023年06月11日 02:25:04 +00:00Commented Jun 11, 2023 at 2:25 -
1\$\begingroup\$ Oh, I read the question more carefully and see they are actually playing with negative starting values for the sequence. In general
x/2
(truncating toward zero) is not equivalent tox>>1
(rounding toward -inf), but interestingly in this case it's only done on even numbers, so either works. But GCC and clang don't notice that, and do the full rounding correction usingshr reg, 63
/add
beforesar
: godbolt.org/z/osT5c94Ph So your version is less efficient.x*3+1
is good, though; compilers know how to split it intox*2 + x
for an x86 LEA, no need to do that manually. \$\endgroup\$Peter Cordes– Peter Cordes2023年06月11日 02:36:09 +00:00Commented Jun 11, 2023 at 2:36 -
1\$\begingroup\$ "If you can optimize better than the compiler, get a better compiler." this should be told in class. \$\endgroup\$fudo– fudo2023年06月12日 09:21:52 +00:00Commented Jun 12, 2023 at 9:21
Adding to chux's review:
Naming things
Avoid unnecessarily abbreviating variable and function names. Some use of single-letter variables is OK, like i
for a loop index, or n
for a count.
Since n
is not a count but just the current value, I would write current
or value
. I know that it is used in the mathematical definition of the Collatz sequence, but unless you add a comment explaining this and showing that function, then a reader of your code has no idea what n
means.
Instead of cltz
and c
, write collatz
. Instead of o
, write old_value
or previous
(depending on whether you renamed n
to value
or current
).
Don't special-case negative integers
If you rewrite your code to always check if there is a repetition, you don't need to look at the sign bit. I would have a small array of two entries to remember the past two values. This also avoids making more calls to cltz()
than necessary: you are calculating fast
only to check if there is a repetition, but then you discard it even though you'll recalculate the same value soon afterwards.
Fill the array first, then calculate a new value every step, check if there is a repetition, if not print the oldest value in the array and replace it with the newly calculated one. If there is a repetition, then you can print the parentheses and the two values in the array, as those will be the repeating ones.
(-2 -1)
but not the positive loop4 2 1
? \$\endgroup\$cltz
just stand for Collatz? I thought at first it was some bithack helper function since the name is very close toclz
(count leading zeros) andctz
(count trailing zeros). Or clear trailing zeros? But that doesn't make sense because clearing means zeroing. Anyway, see Why does C++ code for testing the Collatz conjecture run faster than hand-written asm? re: versions that will compile efficiently for x86, specifically that using an unsigned type (uint64_t
) was faster there, using/2
which has to truncate toward 0 for signed, unlike>>1
. \$\endgroup\$