On Thu, Feb 27, 2020 at 10:51:39AM -0500, Charalampos Stratakis wrote:
Hello folks,
I recently observed a failure on the s390x fedora rawhide buildbot, on the
clang builds, when clang got updated to version 10:
https://bugs.python.org/issue39689
The call:
struct.unpack('>?', b'\xf0')
means to unpack a "native bool", i.e. native size and alignment. Internally,
this does:
static PyObject *
nu_bool(const char *p, const formatdef *f)
{
_Bool x;
memcpy((char *)&x, p, sizeof x);
return PyBool_FromLong(x != 0);
}
i.e., copies "sizeof x" (1 byte) of memory to a temporary buffer x, and then
treats that as _Bool.
While I don't have access to the C standard, I believe it says that assignment of a true
value to _Bool can coerce to a unique "true" value. It seems that if a char
doesn't have the exact bit pattern for true or false, casting to _Bool is undefined
behavior. Is that correct?
Clang 10 on s390x seems to take advantage of this: it probably only looks at
the last bit(s) so a _Bool with a bit pattern of 0xf0 turns out false.
But the tests assume that 0xf0 should unpack to True.
I don't think it's specific to Clang 9, or the s390x arch. Have a look to
https://godbolt.org/z/3n-LqN
clang indeed just checks for the lowest bit. Is it correct? I think so. _Bool
can only holds two value, 0 and 1, [0] which is different from an int whose
value is
true or false whether its different or equal to 0. GCC and Clang agree on that:
https://godbolt.org/z/koc4Pb
So yeah, according to that rule, the value set in `p` wasn't from a _Bool if it
has the 0xf0 value. So you're re-interepreting memory between two different
types type-punning, and that's UB.
Quick and obvious fix:
static PyObject *
nu_bool(const char *p, const formatdef *f)
{
char x;
memcpy((char *)&x, p, sizeof x);
return PyBool_FromLong(x != 0);
}