Confused by difference between expression inside if and expression outside if

Question 1

Context:

I want to verify that, with 32-bit integers, 0x8000 0000 - 1 = 0x7FFF FFFF, so if both values are interpreted as signed integers, the sign changes from negative to positive.

Here goes the first version:

#include <stdio.h>
int main() {
    int x = 0x80000000;
    printf("x's value is Ox%x, representing integer %d\n", x, x);
    if (x - 1 > 0)
        printf("Ox%x - 1 > 0\n", x);
    else
        printf("Ox%x - 1 = Ox%x, which reprensents %d\n", x, x-1, x-1);
    return 0;
}

Running it, I get:

x's value is Ox80000000, representing integer -2147483648
Ox80000000 - 1 = Ox7fffffff, which reprensents 2147483647

The second line of output says that x - 1 > 0, yet the statement inside the if did not run, which means the condition x - 1 > 0 was false. That is a contradiction.

Then I made the second version:

#include <stdio.h>
int main() {
    int x = 0x80000000;
    printf("x's value is Ox%x, representing integer %d\n", x, x);
    int y = x - 1;
    if (y > 0)
        printf("Ox%x - 1 > 0\n", x);
    else
        printf("Ox%x - 1 = Ox%x, which reprensents %d\n", x, x-1, x-1);
    return 0;
}

This time the program runs as expected:

x's value is Ox80000000, representing integer -2147483648
Ox80000000 - 1 > 0

Question:

I don't see what the difference is. From my understanding, if (x - 1 > 0) first calculates x - 1 and then compares the result to 0.

I am using gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Question 2

printf("Ox%x\n", x); is also UB: it passes a negative value of a signed type to a conversion specifier that expects an unsigned int.

Question 3

Note that int x = 0x80000000; leads to implementation-defined behavior, since 0x80000000 has type unsigned int and its value is outside the range of int.

Question 4

The value of "most negative value minus one" sounds a lot like "one more than you can say". How useful is that value?

Question 5

Signed over/underflow is undefined behavior, meaning that anything can happen and we can't assume any particular outcome. The flaw in your reasoning is "as expected" - nothing is expected here.

Analyzing how we ended up with one particular behavior out of several possible ones often isn't very meaningful, but sure, we can do that...

In this case, when compiling your code with gcc at maximum optimization, the first code results in this:

The value -2147483648 is pre-loaded into registers then printed with the first printf.
The if is never executed at run time; it is optimized out because the compiler can predict its outcome.
Since a negative signed number in C can never become positive by subtracting from it (since that would invoke undefined behavior), the compiler is free to assume that any expression x - 1 where x is known to be negative can never be > 0.
Therefore the else branch is taken and 2147483647 is pre-loaded into registers for the second printf and printed along with -2147483648.

In the second example:

The value -2147483648 is pre-loaded into registers then printed with the first printf.
int y = x - 1; is never executed, nor is the if; all of it is optimized away.
Now the compiler can't just assume "this can never be positive" but has to consider some sort of value getting loaded into y, because the optimized code must behave similarly to storing a value inside an int and then comparing the result with > 0. Storing a value is a side effect, and a compiler is only allowed to optimize out side effects if it can deduce that such an optimization doesn't change the way the code behaves. (Which is kind of silly here since there is no expected behavior.)
So it takes the first branch because apparently on this particular attempt on this particular system, an underflow resulted in wrap-around behavior.

So by analyzing the code we learnt basically nothing of value, except that code without bugs is good and relying on undefined behavior is bad, since small tweaks to code with undefined behavior can result in a completely different outcome.

Note that assigning 0x80000000 to a 32-bit int is an unsigned-to-signed conversion, which is implementation-defined (compiler specific). This is because hex literals that don't fit inside an int are given the type unsigned int if they fit there, which is the case here.

Question 6

When signed integer subtraction overflows, the behavior is undefined. Unlike unsigned subtraction, signed subtraction is not required to behave modulo one greater than the largest representable value of the result type. It may behave that way, or it could trap, or it could do something else. The standard imposes no requirement on what happens.

In your first example, the optimizer presumably sees that x is negative, and therefore concludes that subtracting one from it will produce a negative result if it doesn't overflow. And if it does overflow, the behavior is undefined, so it's free to ignore it.

In the second example, it stores the subtraction result to a variable, which forces some value to be stored. The behavior is still undefined, but on your platform it looks like it's just using the low-order bits of the result, which is what you expected. But you can't rely on that.

If you want guaranteed behavior, it's best to avoid overflow conditions. Or you could make x an unsigned integer type, in which case the result is defined.

Question 7

Congratulations! You are right to be confused.

You have found a compiler bug of sorts. The C language police will say what you have done invokes undefined behaviour of integer underflow (and they are right). But it can be confusing when it happens.

The problem is easily seen by throwing the sample code into Godbolt and then examining the generated assembler. Moral of the story: beware of edge cases near the ends of the permitted range of integer variables.

https://godbolt.org/z/hcWKsazo9

The first form rearranges the conditional expression, ignoring underflow (even though the compiler can see at compile time that it will happen). Some compilers use higher-precision variables at compile time to avoid under/overflow. Here, the condition

 if (x - 1 > 0)

is simplified to the following, which is almost but not quite equivalent in the under/overflow edge cases:

 if (x > 1)

And in that form it branches the wrong way for the tricky edge case. I think it is a bit naughty for the compiler to do this when it can see the underflow problem at compile time, since everything in the expression is a constant - YMMV. Input data that triggers such an underflow at runtime can cause trouble in the same way.

Assembler code for the failing version takes a cunning short cut:

 cmp DWORD PTR [rbp-4], 1
 jle .L6

This ignores an underflow that is certain to happen.

Assembler code for the working version, toy2(), explicitly computes y = x - 1; and then does the comparison to zero exactly as it is written in C:

 mov eax, DWORD PTR [rbp-4]
 sub eax, 1
 mov DWORD PTR [rbp-8], eax
 cmp DWORD PTR [rbp-8], 0

The moral: beware of the edge cases where underflow and overflow may result in undefined behaviour.

Question 8

You shouldn't call this a compiler bug since that implies that the responsibility for making this code work lies with the compiler and not the programmer. The issue is that the compiler has no obligation to act on potential overflows/underflows - this responsibility belongs to the programmer.

Accepted answer (the answer under Question 5 above): Lundin · 2025-02-27 14:56:19Z
