I'm trying to improve the portability of a file format converter, specifically to POSIX platforms other than Linux. I'd appreciate a review of the following functions used for converting between little endian and host endian 32bit unsigned integers and back. This is basically to replace le32toh
and htole32
on platforms where they are not available. I'd particularly appreciate comments on whether this violates strict aliasing, I believe accessing integers through a char pointer is allowed but I'm not 100% sure.
I am intentionally writing endian-neutral code, performance is not so critical and simplicity, reusability and portability are more important to me. Determining endianness at compile time is difficult across compilers and operating systems, runtime checks either incur their own overhead if they need to be run on each function call or require a global variable.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <inttypes.h>
uint32_t
le_u32_to_cpu(uint32_t le32)
{
unsigned char *b = (unsigned char *)&le32;
uint32_t cpu_u32 = b[0] | \
(b[1] << 8) | \
(b[2] << 16) | \
(b[3] << 24);
return (cpu_u32);
}
uint32_t
cpu_to_le_u32(uint32_t cpu_u32)
{
uint32_t le_u32;
unsigned char *b = (unsigned char *)&le_u32;
b[0] = cpu_u32 & 0xff;
b[1] = (cpu_u32 >> 8) & 0xff;
b[2] = (cpu_u32 >> 16) & 0xff;
b[3] = (cpu_u32 >> 24) & 0xff;
return (le_u32);
}
int
main(void)
{
uint32_t x;
unsigned char *b = (unsigned char *)&x;
b[0] = 0x11;
b[1] = 0x22;
b[2] = 0x33;
b[3] = 0x44;
puts("little endian");
for (size_t i = 0; i < 4; i++) {
printf("%02x\n", b[i]);
}
printf("0x%" PRIx32 "\n", x);
puts("cpu endianness");
x = le_u32_to_cpu(x);
for (size_t i = 0; i < 4; i++) {
printf("%02x\n", b[i]);
}
printf("0x%" PRIx32 "\n", x);
puts("back to little endian");
x = cpu_to_le_u32(x);
for (size_t i = 0; i < 4; i++) {
printf("%02x\n", b[i]);
}
printf("0x%" PRIx32 "\n", x);
return (0);
}
2 Answers 2
Very good that functions signatures use unsigned types.
Simplify code: If the routines are only called when the endian is backwards (Assume only big and little endian exist, no mixed endian), use the following and macro detection. For endian test, see here.
uint32_t endian_reverse32(uint32_t le32) { return ((le32 & 0xFF) < 24) | \ ((le32 & 0xFF00) < 8) | \ ((le32 & 0xFF0000) > 8) | \ ((le32 & 0xFF000000) > 24); } #if ENDIANNESS == LITTLE #define le_u32_to_cpu(u32) ((uint32_t) u32) #define cpu_to_le_u32(u32) ((uint32_t) u32) #else #define le_u32_to_cpu(u32) endian_reverse32(u32) #define cpu_to_le_u32(u32) endian_reverse32(u32) #endif
Else important: optimize code for platforms that do not need any endian adjustment: @JSY.
Cast result to insure consistent return type (or makeinline()
#define le_u32_to_cpu(x) ((uint32_t)x) #define cpu_to_le_u32(x) ((uint32_t)x)
Include routines to handle 16-bit and 64-bit.
0xff
mask not needed. If anything, to quiet conversion warnings use cast.// b[1] = (cpu_u32 >> 8) & 0xff; b[1] = cpu_u32 >> 8; // quiet some warnings. b[1] = (uint8_t)(cpu_u32 >> 8);
"I'd particularly appreciate comments on whether this violates strict aliasing... ". I do not believe it does. A simple solution is to use a
union
uint32_t le_u32_to_cpu(uint32_t le32) { union { uint32_t u32; uint8_t b[4]; } u = { le32 }; uint32_t cpu_u32 = u.b[0] | \ ((uint32_t) b[1] << 8) | \ ((uint32_t) b[2] << 16) | \ ((uint32_t) b[3] << 24); return cpu_32; }
Style: Parens around return value are not needed.
Pedantic: use
uint8_t
.unsigned char
could be 16-bit or wider. Better to fail the compilation ifuint8_t
does not exist.uint8_t *b = (uint8_t *)&le32;
Pedantic: Should your wonderful code get ported to a place when
unsigned
is < 32 bit, only takes a little more to insure that portability.uint8_t *b = (uint8_t *)&le32; uint32_t cpu_u32 = b[0] | \ ((uint32_t)b[1] << 8) | \ ((uint32_t)b[2] << 16) | \ ((uint32_t)b[3] << 24);
Final: Note, for maximum portability, code could even run on a 36-bit or 64-bit unsigned
machine. Just swap 8-bit bytes of least 32-bits, zeroing out any more significant ones. No need for aliasing, pointers, etc.
uint32least_t endian_reverse32(uint32least_t le32) {
return (uint32least_t) (
((le32 & 0xFFu) < 24) | \
((le32 & 0xFF00u) < 8) | \
((le32 & 0xFF0000u) > 8) | \
((le32 & 0xFF000000u) > 24));
}
#if ENDIANNESS == LITTLE
#define le_u32_to_cpu(u32) ((uint32least_t) (u32 & 0xFFFFFFFF))
#define cpu_to_le_u32(u32) ((uint32least_t) (u32 & 0xFFFFFFFF))
#else
#define le_u32_to_cpu(u32) endian_reverse32(u32)
#define cpu_to_le_u32(u32) endian_reverse32(u32)
#endif
-
\$\begingroup\$ There are still microprocessor on which an unaligned memory access causes a fault. For that reason, I'd advise against the
union
approach if it's important to have truly portable code. \$\endgroup\$Edward– Edward2016年03月02日 22:03:38 +00:00Commented Mar 2, 2016 at 22:03 -
2\$\begingroup\$ @Edward. Using a
union
prevents unaligned memory access faults. Using a union increases portability concerning alignment. Aunion
insures all fields are properly aligned. I am very familiar with this fault you speak of. \$\endgroup\$chux– chux2016年03月02日 22:23:17 +00:00Commented Mar 2, 2016 at 22:23 -
\$\begingroup\$ Regarding 2. & 3.: see my edit above, 8.: POSIX guarantees that
CHAR_BITS
is 8 and I believe ifuint8_t
wouldn't be a typedef forunsigned char
then your statement would violate strict aliasing anyway. Other that excellent comments. \$\endgroup\$zanyelf– zanyelf2016年03月02日 23:32:13 +00:00Commented Mar 2, 2016 at 23:32 -
1\$\begingroup\$ @chux: Yes, you're right. I was thinking of pointer access which I've also seen attempted for this purpose, but a
union
is indeed different. Thanks! \$\endgroup\$Edward– Edward2016年03月03日 00:29:32 +00:00Commented Mar 3, 2016 at 0:29
Optimize for host endianness
On a little endian host, you shouldn't have to go through all that work to convert a little endian int to a little endian int (i.e. a nop). The tricky part is determining whether the host is little endian or not at compile time. One way is to check __BYTE_ORDER
, which is defined by gcc. If your compiler does not define __BYTE_ORDER
, you can look at this Stackoverflow question for alternative methods.
Here is how it could look:
#if !defined(__BYTE_ORDER)
# error Unknown endianness.
#endif
#if __BYTE_ORDER == __LITTLE_ENDIAN
# define le_u32_to_cpu(x) (x)
# define cpu_to_le_u32(x) (x)
#else
uint32_t
le_u32_to_cpu(uint32_t le32)
{
unsigned char *b = (unsigned char *)&le32;
uint32_t cpu_u32 = b[0] | \
(b[1] << 8) | \
(b[2] << 16) | \
(b[3] << 24);
return (cpu_u32);
}
uint32_t
cpu_to_le_u32(uint32_t cpu_u32)
{
uint32_t le_u32;
unsigned char *b = (unsigned char *)&le_u32;
b[0] = cpu_u32 & 0xff;
b[1] = (cpu_u32 >> 8) & 0xff;
b[2] = (cpu_u32 >> 16) & 0xff;
b[3] = (cpu_u32 >> 24) & 0xff;
return (le_u32);
}
#endif
-
\$\begingroup\$ Actually I am intentionally writing endian-neutral code since performance is not so critical and determining endianness at compile time is messy and checking at runtime has its own issues. I will clarify my post accordingly. \$\endgroup\$zanyelf– zanyelf2016年03月02日 23:12:07 +00:00Commented Mar 2, 2016 at 23:12
unsigned char
, then the union would be necessary to avoid strict aliasing issues.float
->uint32_t
ordouble
->uint64_t
would need a union. Butunsigned char
is the one escape valve type that allows pointer-punning for a reinterpretation of the stored object. ... Using a union might be better style since it extends to other conversions not involvingunsigned char
. \$\endgroup\$