I wrote a short function whose purpose is simple: Print the binary representation of number to the console. I also aim to be as portable as possible, hence my use of CHAR_BIT
and +
instead of |
in case the character encoding for '0'
happens to be odd. Is my code as good and portable as I intended?
#include <stdio.h>
#include <limits.h>
#define SIZE (sizeof(unsigned long long)*CHAR_BIT)
int putb(const unsigned long long n) {
char b[SIZE+1];
for (unsigned i = SIZE; i--; b[i^SIZE-1] = ((char)(n>>i)&1)+'0');
b[SIZE] = 0;
return puts(b);
}
2 Answers 2
The obvious portability problem is that we have b[i^SIZE-1]
where I'd expect b[SIZE-1-i]
. That looks like an error of judgement: it produces the same results when SIZE
is an exact power of 2, but not otherwise.
Instead of the SIZE
preprocessor macro, I'd probably use a constant within the function:
static const size_t length = sizeof n * CHAR_BIT;
If you do stick with the macro, consider #undef SIZE
afterwards so it's available to other code.
As a style issue, I don't like the for
loop with empty body on the same line. That looks more like code-golf than something that's intended to be readable - especially with the "work" of the loop stuffed into the control expression.
We really want the cast to char
to happen to the sum, since char
+char
yields int
. That said, gcc -pedantic -Wconversion
doesn't complain without it.
I would write the character constant 0
as '0円'
to better convey the intent. That's especially important as we have '0'
close by.
You might find it easier, clearer and more efficient to work backwards from the least-significant bit:
int putb(unsigned long long n)
{
char b[(sizeof n * CHAR_BIT) + 1];
char *p = b + sizeof b;
*--p = '0円';
for (; p-- > b; n >>= 1) {
*p = '0' + (char)(n & 1);
}
return puts(b);
}
This gives the option of a version that doesn't print leading zeros (and therefore modified to accept any size integer):
#include <stdint.h>
int putb(uintmax_t n)
{
char b[(sizeof n * CHAR_BIT) + 1];
char *p = b + sizeof b;
*--p = '0円';
do {
*--p = '0' + (n & 1);
} while (n >>= 1);
return puts(p);
}
This one is trivially modified to add a 0b
prefix, should that be desired.
-
1\$\begingroup\$ it happens to produce the same results when SIZE is an exact power of 2, but not otherwise - It's not just a coincidence; it's a known bithack which is mostly useful in assembly language, where
31 - reg
can't be done in a single instruction on most ISAs (ARM has anrsb
reverse-subtract instruction) butreg ^= 31
can. I've maybe only seen it in the context of how GCC implements__builtin_clz()
in terms of x86'sbsr
instruction, which produces a bit-index instead of leading-zero count. The builtin (like the instruction) doesn't define the result for input=0, so GCC can use ^=31. \$\endgroup\$Peter Cordes– Peter Cordes2023年06月10日 19:15:22 +00:00Commented Jun 10, 2023 at 19:15 -
1\$\begingroup\$ 100% agreed the XOR bithack is not a good idea in this code, though! If you want to write something easily readable that's more likely to compile to efficient asm given C's lack of a rotate builtin or capturing carry-out from a left shift, looping backwards like you're doing and taking the low bit is good. (@user16217248). (In x86 asm you'd want something like
shl rcx, 1
/mov rax, '0'
/adc rax, 0
to turn each bit into an ASCII digit via the carry flag, starting from the top bit. Or various other options likeshl
/setc al
/add al, '0'
. Or use SSE2 to do 16 bits in parallel.) \$\endgroup\$Peter Cordes– Peter Cordes2023年06月10日 19:21:34 +00:00Commented Jun 10, 2023 at 19:21 -
1\$\begingroup\$
char b[length+1];
Is this considered a variable-length array, even though in practice it's not? \$\endgroup\$CPlus– CPlus2023年06月10日 19:27:34 +00:00Commented Jun 10, 2023 at 19:27 -
2\$\begingroup\$ @user16217248: Yes in C it's a VLA, but compilers will in practice see the compile-time constant and not waste instructions, as long as you enable optimization. With
-O0
(godbolt.org/z/jenhMx7W3), clang stores a pointer to the array into a local var, like it would do with an actual VLA. Fun fact: in C++ it's not a VLA. In C++, aconst int
initialized with a constant expression becomesconstexpr
. \$\endgroup\$Peter Cordes– Peter Cordes2023年06月11日 06:50:26 +00:00Commented Jun 11, 2023 at 6:50 -
1\$\begingroup\$ (Unfortunately with
-O3
, both GCC and clang compile this to amazingly bad code, especially clang which does 45vpinsrb
instructions + 3xvmovd
to shuffle 1 byte at a time into SIMD vectors. As in How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)? / is there an inverse instruction to the movemask instruction in intel avx2? . And Convert 16 bits mask to 16 bytes mask has a version that does printing order, since that's what the question secretly wanted) \$\endgroup\$Peter Cordes– Peter Cordes2023年06月11日 07:02:38 +00:00Commented Jun 11, 2023 at 7:02
Is my code as good and portable as I intended?
Not quite.
C23
Use specifier "%b"
.
#include <stdio.h>
#include <limits.h>
printf("%0*llb\n", ULLONG_WIDTH, n); // Print all bits
printf("%llb\n", n); // Print significant bits
For highly portability C99 onward
OP used a constant sized buffer - good. Variable length arrays since C11 are optionally supported.
Do not assume no padding bits as with
sizeof(unsigned long long)*CHAR_BIT
. The number of value bits in anunsigned long long
depends onULLONG_MAX
.
#include <stdio.h>
#include <limits.h>
// https://stackoverflow.com/a/4589384/2410359
/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
#define SIZE (IMAX_BITS(ULLONG_MAX))
...
Consider simply code such as below which looks at the least significant bit per iteration.
int putb(unsigned long long n) {
char b[SIZE+1];
b[SIZE] = '0円';
for (unsigned i = SIZE; i-- > 0; ) {
b[i] = (char) ((n&1) + '0');
n >>= 1;
}
return puts(b);
}
Pre-C99
There is no unsigned long long
.
-
1\$\begingroup\$ Presence of padding bits may cause the buffer to be slightly oversized, but is that particularly harmful? Compared to the readability of the alternative, and the fact we're doing I/O, it seems irrelevant. \$\endgroup\$Toby Speight– Toby Speight2023年06月12日 12:27:08 +00:00Commented Jun 12, 2023 at 12:27
-
1\$\begingroup\$ @TobySpeight "Presence of padding bits may cause the buffer to be slightly oversized, but is that particularly harmful?" --> No - an extra
char
or so in a small local buffer is insignificant. Yet code miscalculates withfor (; p-- > b; n >>= 1) {
the number of bits to print. It prints extra leading "0" characters when the type is padded. \$\endgroup\$chux– chux2023年06月12日 12:31:12 +00:00Commented Jun 12, 2023 at 12:31 -
\$\begingroup\$ Ah yes, of course. An alternative is to use an all-ones constant of same type (i.e.
~0ULL
) and shift that along withn
as the count (for unsigned long long i = ~0ULL; i; i >>= 1)
). \$\endgroup\$Toby Speight– Toby Speight2023年06月12日 12:35:15 +00:00Commented Jun 12, 2023 at 12:35 -
\$\begingroup\$ @TobySpeight true, yet I find
for (unsigned i = SIZE; i-- > 0; ) {
more direct. \$\endgroup\$chux– chux2023年06月12日 12:37:20 +00:00Commented Jun 12, 2023 at 12:37