Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,
Calling a Spade a Spade, Keeping Tongue in Cheek
If you’re not writing in C/C++, please ignore this post; matters discussed here are related to mitigation of certain decisions which were made 50+ years ago and which are pretty much carved in stone now by this 50-years-old tradition. In other words – this is one of those darker pages of C/C++ programming (and an attempt to make it less dark using some recent C++ developments).
Very recently, I was once again tasked with a yet another project of “enabling all the warnings and getting rid of them”. And as always, after enabling -Wall, by far the most annoying warning was “signed/unsigned mismatch” (known as C4018 in MSVC1, and -Wsign-compare in GCC/Clang); all the other warnings (well, if not enabling -pedantic) were very straightforward to deal with, but signed/unsigned mismatch was (as usual) severely non-trivial.
While dealing with all those pesky warnings – I recalled several experiences when
developers, while trying to get rid exactly of signed/unsigned warning, turned a perfectly-working-program-with-warnings, into a program-without-warnings-but-which-fails-once-a-day-or-so
(and however strange it might sound, clients didn’t consider removed warnings in such an occasionally-failing program as a mitigating circumstance, demanding immediate rollback of the offending “fixes”).
1 enabled starting with /W3 level, which is IIRC default for IDE projects
The Root of The Problem
— Title of a novel by Alexander Herzen, Russia, 1845 —
TBH, the problem is rather nasty (and IIRC, specific to C/C++ ). It stems from a very generic rule (existing at least since K&R C) which (translating from standardese into my-broken-English) says that if both operands of some operator are of the same size, with one of them being unsigned, and another signed – are converted into unsigned one before the operator is really invoked. When applying this generic rule to comparisons, it means that if we’re comparing a negative signed integer x with an unsigned one y, then, first of all, the signed integer will be converted into unsigned one. At this point, as our signed integer x is negative, and at least when our system uses universal-in-practice “two’s complement” representation,2 it will become a very-large unsigned value (to be exact – 2^N+x, where N is number-of-bits-in-our-int-type). Then, our requested comparison operator will be invoked, comparing 2^N+x with y, which is most of the time not what we really meant when we wrote our code. For example (if you don’t believe me, see https://godbolt.org/g/npTtjY ):
static_assert( -1 < 1U );//fails!
[画像:Hare with omg face:]“we're just narrowly avoiding a disaster without understanding how close we were.And it is not a quirk of a specific compiler: it is how ALL the existing compilers/platforms SHOULD behave, and perfectly according to the standard too. We’re just lucky that this doesn’t happen too often, so most of the time, we’re just narrowly avoiding a disaster without understanding how close we were.
To make things even worse, whenever sizeof(short) is less than sizeof(int), with short types it works differently-from-the-above and actually becomes intuitive on this occasion (because shorts – as well as all other “smaller-than-int” types – are implicitly “promoted” to signed ints before comparison):3
static_assert( short(-1) < (unsigned short)(1) );//stands!
And if we take operands of different sizes, the whole thing becomes very-strange-unless-you-know-how-conversions-are-written-in-the-standard:4
static_assert( int64_t(-1) < 1U );//stands! - because converted to larger which is signed
static_assert( -1 < uint64_t(1) );//fails! - because converted to larger which is unsigned
Overall, the logic above (while formally defined in all C/C++ standards to date) – is very non-obvious and extremely-easy-to-forget-about even if you happen to know it by heart (I have to admit that I don’t, and have to think for some time until providing an answer to “what the result will be” in each particular case). Therefore – it makes perfect sense for the compiler to issue that signed/unsigned warning.
2 formally – it seems that other representations should behave “as if” they’re two’s complement during such conversions, though as no C++ compiler exists with other-than-two’s-complement representation (except for Unisys which you have about as many chances to code for as chances for a cold day in hell), it is pretty much a moot issue
3 As Shafik Yaghmour noted in comments below, comparison involving conversions-from-shorts will lead to different results under K&R C; under ANSI C and C++ the result below stands
4 assuming sizeof(int64_t)> sizeof(int)
How To Fix It
— Title of a novel by Nikolai Chernyshevsky, Russia, 1863 —
While this signed/unsigned situation (which we’re informed about with a compiler warning) is indeed rather nasty,
one thing which is even worse than having this admittedly-nasty warning – is trying to fix it with explicit casts
While this warning does indicate potential problems with our code – explicit casts (whether it is C-style cast or static_cast<>) are usually even worse to our code than those-potential-signed-unsigned-problems above. In particular,IIRC, all the problems-I’ve-seen with a perfectly-working-program becoming an occasionally-non-working one as a result of “fixing” the warning, were due to improper use of explicit casts.
In other words,
Never ever use explicit casts merely to get rid of warnings (whether signed/unsigned or otherwise)
TBH, changing some type (usually from signed to unsigned) – which is usually an ideal-solution-for-this-kind-of-issues at least from a performance point of view – is not that much safer than explicit casts in the short run (though it is indeed less fragile in the long run). Personally, I usually prefer to add an assert(x>=0) for each such type change (to ensure that the value-which-I-think-should-never-be-negative, is indeed never negative; as long as this stands, the conversion from signed to unsigned becomes perfectly safe). But even such an assertion (which we might or might not run into during our tests) doesn’t qualify as really safe.
An Idea
After thinking for a while about it – I came up with an idea, which IMO might help us to keep relevant code reasonably-readable – and without those pesky warnings (and more importantly, without results of such comparisons being counter-intuitive).
It is apparently possible to write a template function which performs all the comparisons between arbitrary integer types in a manner which is perfectly consistent with our intuitive expectations; moreover, with C++17 if constexpr, it is possible to write such a function in a manner which (saving for compiler optimization glitches) won’t cause any performance degradation beyond absolutely necessary. Saving for potential bugs, such a function is presented below.
namespace intuitive {
template<class TA, class TB>
inline constexpr bool lt(TA a, TB b) {
using MAX_EFFICIENT_INT = intptr_t;//platform-dependent, but intptr_t is not a bad starting point
static_assert(std::is_integral<TA>::value);
static_assert(std::is_integral<TB>::value);
constexpr bool aSigned = std::is_signed<TA>::value;
constexpr bool bSigned = std::is_signed<TB>::value;
if constexpr(aSigned == bSigned)
return a < b;//both signed or both unsigned - no casts required, C promotions will do just fine
else {//different is_signed, let's make TSIGNED always-signed, and TUNSIGNED - always-unsigned
using TSIGNED = typename std::conditional<aSigned,TA,TB>::type;
using TUNSIGNED = typename std::conditional<aSigned,TB,TA>::type;
static_assert(sizeof(TSIGNED)+sizeof(TUNSIGNED)==sizeof(TA)+sizeof(TB));//self-check
if constexpr(sizeof(TSIGNED)>sizeof(TUNSIGNED))
return a < b;//again, no casts required, C promotions will do just fine (promoting b to TA which is signed)
if constexpr(sizeof(TUNSIGNED)<sizeof(int)) {
return a < b;//again, no casts required, C promotion-to-int will do just fine (promoting both to int which is signed)
}
//at this point, we have sizeof(TUNSIGNED) >= sizeof(TSIGNED) => no-cast will be counterintuitive
if constexpr(sizeof(TUNSIGNED)<sizeof(MAX_EFFICIENT_INT)) {
//we can cast unsigned to MAX_EFFICIENT_INT
if constexpr(aSigned) {
assert(!bSigned);
return a < MAX_EFFICIENT_INT(b);
}
else {
assert(bSigned);
return MAX_EFFICIENT_INT(a) < b;
}
}
else { //last resort: expression
assert(sizeof(TUNSIGNED)>=sizeof(TSIGNED));
if constexpr(aSigned) {
//return a<0 ? true : TUNSIGNED(a) < b;
return (a<0) | (TUNSIGNED(a) < b);//sic - single '|';
// equivalent to the above but seems to perform better under GCC (avoids branch)
}
else {
assert(bSigned);
//return b<0 ? false : a < TUNSIGNED(b);
return (b>=0) & (a < TUNSIGNED(b));//sic - single '&';
// equivalent to the above but seems to perform better under GCC (avoids branch)
}
}
}
}
}//namespace
NB: very very formally, it is probably required to replace sizeof() comparisons with std::numeric_limits comparisons, but for any platform I can imagine, I expect sizeof() to produce exactly the same result (but if you know it is not the case – please give me a shout).
The idea is pretty simple: with a template function intuitive::lt having both its parameters as template parameters, we have both types in our hands, so we can perform some type math to determine an optimal way of performing intuitive conversion (and in quite a few cases, simplistic a<b can do the trick). Then, using results of this math, and if constexpr, we can force our compiler to choose only that variation which is really necessary for given TA and TB.
With this function, all the following static_asserts will stand (to test it, see https://godbolt.org/g/znwD8y ; the code works starting from clang 3.9 and GCC7; also works under MSVC 19.12, but 19.12 is not available on godbolt (yet?)):
static_assert( intuitive::lt(-1,1U) );//stands
static_assert( intuitive::lt(short(-1),(unsigned short)(1)) );//stands
static_assert( intuitive::lt(int64_t(-1), 1U) );//stands
static_assert( intuitive::lt(-1,uint64_t(1)) );//stands
IMO, it is pretty intuitive that -1 is less than 1, regardless of the types we’re using to represent -1 and 1 (one point which might be less intuitive is that intuitive::eq(-1,unsigned(-1)) returns false, but it should be this way for consistency).
NB: when the post was ready, I run into [Walker], where implementation is different (on the first glnce, [Walker] seems to be slower, causing unnecessary branching at least under GCC), but the overall idea of achieving “safe” [Walker] or “intuitive” (this-post) comparison is very similar.
Generated Asm
TBH, I didn’t perform proper performance testing of intuitive::lt function (for such small pieces, it way too much depends on the context); instead, I used Compiler Explorer to see generated asm for intuitive::lt for different types under different compilers and to compare it to the code generated for usual C-style comparisons. Results (for x64, and as always, to be taken with a huge pile of salt):
Operation |
Clang 5 (asm ops/CPU cycles estimate) |
GCC 7.3 (asm ops/CPU cycles estimate) |
Comment |
C-style int<int |
1/1 |
1/1 |
C-style int<unsigned |
1/1 |
1/1 |
C-style int<unsigned short |
2/2 |
2/2 |
C-style int<uint64_t |
2/2 |
2/2 |
C-style int64_t<unsigned |
2/2 |
2/2 |
intuitive::lt(int,int) |
1/1 |
1/1 |
Same result, same code |
intuitive::lt(int,unsigned) |
3/2.5 |
3/2.5 |
Improved result, still very fast |
intuitive::lt(int,unsigned short) |
2/2 |
2/2 |
Same result, same code |
intuitive::lt(int,uint64_t) |
5/4 |
5/45 |
Improved result, the worst case for intuitive::lt performance-wise. Kudos to clang (and see footnote re. GCC) |
intuitive::lt(int64_t,unsigned) |
2/2 |
2/2 |
Same result, same code |
5 to avoid GCC generating branched code with potentional misprediction, it took rather weird (but hopefully still valid) code with bitwise operations on booleans
[画像:Femida hare:]“there is a chance that intuitive::lt MIGHT be a good thing to make behaviour of our C++ code to correspond better to our-expectations-when-we're-reading-itAs we can observe:
- whenever intuitive version happens to provide the same result as C-style one – the cost is exactly the same. In other words, we’re NOT paying for whatever-we’re-NOT-using
- Costs in other cases are still very low; even in the worst-case, the costs are comparable to the cost of usual conversions. Moreover, for most of the app-level code, conversions (including casts) are perceived to be “free”, and even being 2x more than that shouldn’t be too bad.
Conclusion
With all the troubles which arise both from the extremely counter-intuitive signed/unsigned comparisons (and well-intended but poorly-implemented attempts to get rid of them) – there is a chance that intuitive::lt (which, with its counterparts gt, le, ge, eq, and ne is BTW available at https://github.com/ITHare/util) – MIGHT be a good thing to make behaviour of our C++ code to correspond better to our-expectations-when-we’re-reading-it. Whether it really happens to be a Good Thing(tm) in the long run – is yet to be seen.
Oh, and if somebody can rewrite the code above so it flies even with a C++11-based compiler (replacing if constexprs with template instantiations, for example, along the lines of [Walker]) – such contributions will be very welcome 6.
6 I know how to do it in general, but my level of attention to details is not sufficient to make sure if-constexpr-less version works
Acknowledgement
Cartoons by Sergey GordeevIRL from Gordeev Animation Graphics, Prague.