Context:
I want to verify that, with 32 bits, 0x80000000 - 1 = 0x7FFFFFFF, so if both values are interpreted as signed integers, the sign changes from negative to positive.
Here goes the first version:
#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("x's value is Ox%x, representing integer %d\n", x, x);
    if (x - 1 > 0)
        printf("Ox%x - 1 > 0\n", x);
    else
        printf("Ox%x - 1 = Ox%x, which reprensents %d\n", x, x-1, x-1);
    return 0;
}
Running it, I get:
x's value is Ox80000000, representing integer -2147483648
Ox80000000 - 1 = Ox7fffffff, which reprensents 2147483647
The second line of output suggests that x - 1 > 0, yet the statement inside the if was not executed, which means x - 1 > 0 evaluated as false. That is a contradiction.
Then I made the second version:
#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("x's value is Ox%x, representing integer %d\n", x, x);
    int y = x - 1;
    if (y > 0)
        printf("Ox%x - 1 > 0\n", x);
    else
        printf("Ox%x - 1 = Ox%x, which reprensents %d\n", x, x-1, x-1);
    return 0;
}
This time the program runs as expected:
x's value is Ox80000000, representing integer -2147483648
Ox80000000 - 1 > 0
Question:
I don't see what the difference is. From my understanding, if (x - 1 > 0) first calculates x - 1 and then compares it to 0.
I am using gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
3 Answers
Signed over/underflow is undefined behavior, meaning that anything can happen and we can't assume any particular outcome. The flaw in your reasoning is "as expected" - nothing is expected here.
Analyzing how we ended up in one particular behavior out of multiple possible isn't often very meaningful, but sure we can do that...
In this case, while running your code in gcc with maximum optimization, the first code results in this:
- The value -2147483648 is pre-loaded into registers then printed with the first printf.
- The if never happens but is optimized out since the compiler can predict it.
- Since a negative signed number in C can never become positive by subtracting from it (since that would invoke undefined behavior), the compiler is free to assume that any expression x - 1 where x is known to be negative can never be > 0.
- Therefore the else branch is taken and 2147483647 is pre-loaded into registers for the second printf and printed along with -2147483648.
In the second example:
- The value -2147483648 is pre-loaded into registers then printed with the first printf.
- int y = x - 1; never happens, nor does the if, all of it optimized away.
- Now the compiler can't just assume "this can never be positive" but it has to consider some sort of value getting loaded into y, because the optimized code must behave similarly to storing a value inside an int and then comparing the result with > 0. Storing a value is a side effect and a compiler is only allowed to optimize out side effects if it can deduce that such an optimization doesn't change the way the code behaves. (Which is kind of silly here since there is no expected behavior.)
- So it takes the first branch because apparently on this particular attempt on this particular system, an underflow resulted in wrap-around behavior.
So by analyzing the code we learnt basically nothing of value, except that code without bugs is good and relying on undefined behavior is bad, since small tweaks to code with undefined behavior can result in a completely different outcome.
Note that assigning 0x80000000 to a 32-bit int is an unsigned-to-signed conversion of an out-of-range value, which is implementation-defined. This is because hex literals that can't fit inside an int are given the type unsigned int if they fit there, which is the case here.
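The type of each literal can be inspected with C11 _Generic. This is only a sketch: TYPE_NAME and the two helper functions are hypothetical names introduced here, and the results in the note below assume a platform where int is 32 bits wide.

```c
/* Hypothetical helper macro (not from the original post): maps an
   expression to the name of its type using C11 _Generic. */
#define TYPE_NAME(e) _Generic((e),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

/* 0x7FFFFFFF fits in a 32-bit int, so the literal keeps type int. */
const char *type_of_max_literal(void) { return TYPE_NAME(0x7FFFFFFF); }

/* 0x80000000 does not fit in a 32-bit int, so the hex literal takes
   the next candidate type in the list, unsigned int. */
const char *type_of_min_literal(void) { return TYPE_NAME(0x80000000); }
```

With a 32-bit int, type_of_max_literal() should yield "int" and type_of_min_literal() should yield "unsigned int", so int x = 0x80000000; converts an unsigned value that is out of range for int.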
When signed integer subtraction overflows, the result is undefined behavior. Unlike unsigned subtraction, signed subtraction is not required to behave modulo one greater than the largest representable value of the result type. It may behave that way, or it could trap, or it could do something else. The standard imposes no requirements on what happens.
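For contrast, here is a minimal sketch of the unsigned case, where the wrap-around is guaranteed. The function name minus_one_u32 is made up for illustration.

```c
#include <stdint.h>

/* Unsigned arithmetic wraps modulo 2^32 for uint32_t, so this
   subtraction has a well-defined result for every input. */
uint32_t minus_one_u32(uint32_t x) {
    return x - 1u;
}
```

Here minus_one_u32(0x80000000u) is 0x7FFFFFFFu and minus_one_u32(0u) is 0xFFFFFFFFu, with no undefined behavior involved.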
In your first example, the optimizer presumably sees that x is negative, and therefore concludes that subtracting one from it will produce a negative result if it doesn't overflow. And if it does overflow, the behavior is undefined, so it's free to ignore it.
In the second example, it stores the subtraction result to a variable, which forces some value to be stored. The behavior is still undefined, but on your platform it looks like it's just using the low-order bits of the result, which is what you expected. But you can't rely on that.
If you want guaranteed behavior, it's best to avoid overflow conditions. Or you could make x an unsigned integer type, in which case the result is defined.
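One way to avoid the overflow condition is to test the operands before subtracting. The following is a sketch only; checked_sub is a hypothetical helper, not a standard library function.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a - b in *out only when the
   mathematical result fits in an int; returns false otherwise. */
bool checked_sub(int a, int b, int *out) {
    /* Both range checks are written so that they cannot overflow. */
    if ((b > 0 && a < INT_MIN + b) || (b < 0 && a > INT_MAX + b)) {
        return false;  /* would overflow: a - b is never evaluated */
    }
    *out = a - b;
    return true;
}
```

With this, checked_sub(INT_MIN, 1, &r) returns false instead of invoking undefined behavior. GCC and Clang also offer the __builtin_sub_overflow builtin for the same purpose, if relying on a compiler extension is acceptable.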
Congratulations! You are right to be confused.
You have found a compiler bug of sorts. The C language police will say what you have done invokes undefined behaviour of integer underflow (and they are right). But it can be confusing when it happens.
The problem is easily seen by throwing the sample code into Godbolt and then examining the assembler generated. The moral of the story: beware of edge cases near the ends of the permitted range of integer variables:
https://godbolt.org/z/hcWKsazo9
The first form rearranges the conditional expression ignoring underflow (even though at compile time it can see that will happen). Some compilers use higher precision variables at compile time to avoid under/overflow:
if (x - 1 > 0)
is simplified to the following, which is equivalent except in the under/overflow edge cases:
if (x > 1)
And in that form it branches the wrong way for the tricky edge case. I think it is a bit naughty for the compiler to do this when it can see the underflow problem at compile time, since everything in the expression is a constant - YMMV. Input data that triggers such an edge case at runtime can cause trouble in the same way.
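For comparison, the same test can be evaluated in a wider type, where the subtraction cannot overflow. This is a sketch under the assumption that the inputs are 32-bit; minus_one_is_positive is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluates "x - 1 > 0" in 64-bit arithmetic. Every int32_t value
   minus 1 fits in int64_t, so no overflow can occur here. */
bool minus_one_is_positive(int32_t x) {
    return (int64_t)x - 1 > 0;
}
```

Computed this way, the result for INT32_MIN is false, which is the same answer x > 1 gives. The two forms only disagree if x - 1 is assumed to wrap around in 32 bits, which the language does not promise for signed types.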
Assembler code for the failing version takes a cunning short cut:
cmp DWORD PTR [rbp-4], 1
jle .L6
Ignoring an underflow that is certain to happen.
Assembler code for the working version, toy2(), explicitly computes y = x - 1; and then does the comparison to zero exactly as it is written in C:
mov eax, DWORD PTR [rbp-4]
sub eax, 1
mov DWORD PTR [rbp-8], eax
cmp DWORD PTR [rbp-8], 0
Moral is beware of the edge cases where underflow and overflow may result in undefined behaviour.
1 Comment
printf("Ox%x\n", x); is also UB, as it tries to print a signed type with a negative value using a specifier for unsigned. int x = 0x80000000; leads to implementation-defined behavior, as 0x80000000 is an unsigned whose value is outside the int range.