Portable byte order conversion

Question 1

I'm trying to improve the portability of a file format converter, specifically to POSIX platforms other than Linux. I'd appreciate a review of the following functions used for converting between little endian and host endian 32bit unsigned integers and back. This is basically to replace le32toh and htole32 on platforms where they are not available. I'd particularly appreciate comments on whether this violates strict aliasing, I believe accessing integers through a char pointer is allowed but I'm not 100% sure.

I am intentionally writing endian-neutral code, performance is not so critical and simplicity, reusability and portability are more important to me. Determining endianness at compile time is difficult across compilers and operating systems, runtime checks either incur their own overhead if they need to be run on each function call or require a global variable.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <inttypes.h>
uint32_t
le_u32_to_cpu(uint32_t le32)
{
 unsigned char *b = (unsigned char *)&le32;
 uint32_t cpu_u32 = b[0] | \
 (b[1] << 8) | \
 (b[2] << 16) | \
 (b[3] << 24);
 return (cpu_u32);
}
uint32_t
cpu_to_le_u32(uint32_t cpu_u32)
{
 uint32_t le_u32;
 unsigned char *b = (unsigned char *)&le_u32;
 b[0] = cpu_u32 & 0xff;
 b[1] = (cpu_u32 >> 8) & 0xff;
 b[2] = (cpu_u32 >> 16) & 0xff;
 b[3] = (cpu_u32 >> 24) & 0xff;
 return (le_u32);
}
int
main(void)
{
 uint32_t x;
 unsigned char *b = (unsigned char *)&x;
 b[0] = 0x11;
 b[1] = 0x22;
 b[2] = 0x33;
 b[3] = 0x44;
 puts("little endian");
 for (size_t i = 0; i < 4; i++) {
 printf("%02x\n", b[i]);
 }
 printf("0x%" PRIx32 "\n", x);
 puts("cpu endianness");
 x = le_u32_to_cpu(x);
 for (size_t i = 0; i < 4; i++) {
 printf("%02x\n", b[i]);
 }
 printf("0x%" PRIx32 "\n", x);
 puts("back to little endian");
 x = cpu_to_le_u32(x);
 for (size_t i = 0; i < 4; i++) {
 printf("%02x\n", b[i]);
 }
 printf("0x%" PRIx32 "\n", x);
 return (0);
}

Question 2

why not use unions?

Question 3

well, why use unions, can you please elaborate the advantages of one over the other?

Question 4

AIUI if one of the types you're reinterpreting is not unsigned char, then the union would be necessary to avoid strict aliasing issues. float->uint32_t or double -> uint64_t would need a union. But unsigned char is the one escape valve type that allows pointer-punning for a reinterpretation of the stored object. ... Using a union might be better style since it extends to other conversions not involving unsigned char.

Question 5

Very good that functions signatures use unsigned types.

Simplify code: If the routines are only called when the endian is backwards (Assume only big and little endian exist, no mixed endian), use the following and macro detection. For endian test, see here.

uint32_t endian_reverse32(uint32_t le32) {
 return ((le32 & 0xFF) < 24) | \
 ((le32 & 0xFF00) < 8) | \
 ((le32 & 0xFF0000) > 8) | \
 ((le32 & 0xFF000000) > 24);
}
#if ENDIANNESS == LITTLE
 #define le_u32_to_cpu(u32) ((uint32_t) u32)
 #define cpu_to_le_u32(u32) ((uint32_t) u32)
#else
 #define le_u32_to_cpu(u32) endian_reverse32(u32)
 #define cpu_to_le_u32(u32) endian_reverse32(u32)
#endif

Else important: optimize code for platforms that do not need any endian adjustment: @JSY.
Cast result to insure consistent return type (or make inline()
```
#define le_u32_to_cpu(x) ((uint32_t)x)
#define cpu_to_le_u32(x) ((uint32_t)x)
```
Include routines to handle 16-bit and 64-bit.

0xff mask not needed. If anything, to quiet conversion warnings use cast.

// b[1] = (cpu_u32 >> 8) & 0xff;
b[1] = cpu_u32 >> 8;
// quiet some warnings.
b[1] = (uint8_t)(cpu_u32 >> 8);

"I'd particularly appreciate comments on whether this violates strict aliasing... ". I do not believe it does. A simple solution is to use a union

uint32_t le_u32_to_cpu(uint32_t le32) {
 union {
 uint32_t u32;
 uint8_t b[4];
 } u = { le32 };
 uint32_t cpu_u32 = u.b[0] | \
 ((uint32_t) b[1] << 8) | \
 ((uint32_t) b[2] << 16) | \
 ((uint32_t) b[3] << 24);
 return cpu_32;
}

Style: Parens around return value are not needed.
Pedantic: use uint8_t. unsigned char could be 16-bit or wider. Better to fail the compilation if uint8_t does not exist.
```
uint8_t *b = (uint8_t *)&le32;
```

Pedantic: Should your wonderful code get ported to a place when unsigned is < 32 bit, only takes a little more to insure that portability.

uint8_t *b = (uint8_t *)&le32;
uint32_t cpu_u32 = b[0] | \
 ((uint32_t)b[1] << 8) | \
 ((uint32_t)b[2] << 16) | \
 ((uint32_t)b[3] << 24);

Final: Note, for maximum portability, code could even run on a 36-bit or 64-bit unsigned machine. Just swap 8-bit bytes of least 32-bits, zeroing out any more significant ones. No need for aliasing, pointers, etc.

 uint32least_t endian_reverse32(uint32least_t le32) {
 return (uint32least_t) (
 ((le32 & 0xFFu) < 24) | \
 ((le32 & 0xFF00u) < 8) | \
 ((le32 & 0xFF0000u) > 8) | \
 ((le32 & 0xFF000000u) > 24));
 }
 #if ENDIANNESS == LITTLE
 #define le_u32_to_cpu(u32) ((uint32least_t) (u32 & 0xFFFFFFFF))
 #define cpu_to_le_u32(u32) ((uint32least_t) (u32 & 0xFFFFFFFF))
 #else
 #define le_u32_to_cpu(u32) endian_reverse32(u32)
 #define cpu_to_le_u32(u32) endian_reverse32(u32)
 #endif

Question 6

There are still microprocessor on which an unaligned memory access causes a fault. For that reason, I'd advise against the union approach if it's important to have truly portable code.

Question 7

@Edward. Using a union prevents unaligned memory access faults. Using a union increases portability concerning alignment. A union insures all fields are properly aligned. I am very familiar with this fault you speak of.

Question 8

Regarding 2. & 3.: see my edit above, 8.: POSIX guarantees that CHAR_BITS is 8 and I believe if uint8_t wouldn't be a typedef for unsigned char then your statement would violate strict aliasing anyway. Other that excellent comments.

Question 9

@chux: Yes, you're right. I was thinking of pointer access which I've also seen attempted for this purpose, but a union is indeed different. Thanks!

Question 10

Optimize for host endianness

On a little endian host, you shouldn't have to go through all that work to convert a little endian int to a little endian int (i.e. a nop). The tricky part is determining whether the host is little endian or not at compile time. One way is to check __BYTE_ORDER, which is defined by gcc. If your compiler does not define __BYTE_ORDER, you can look at this Stackoverflow question for alternative methods.

Here is how it could look:

#if !defined(__BYTE_ORDER)
# error Unknown endianness.
#endif
#if __BYTE_ORDER == __LITTLE_ENDIAN
# define le_u32_to_cpu(x) (x)
# define cpu_to_le_u32(x) (x)
#else
uint32_t
le_u32_to_cpu(uint32_t le32)
{
 unsigned char *b = (unsigned char *)&le32;
 uint32_t cpu_u32 = b[0] | \
 (b[1] << 8) | \
 (b[2] << 16) | \
 (b[3] << 24);
 return (cpu_u32);
}
uint32_t
cpu_to_le_u32(uint32_t cpu_u32)
{
 uint32_t le_u32;
 unsigned char *b = (unsigned char *)&le_u32;
 b[0] = cpu_u32 & 0xff;
 b[1] = (cpu_u32 >> 8) & 0xff;
 b[2] = (cpu_u32 >> 16) & 0xff;
 b[3] = (cpu_u32 >> 24) & 0xff;
 return (le_u32);
}
#endif

Question 11

Actually I am intentionally writing endian-neutral code since performance is not so critical and determining endianness at compile time is messy and checking at runtime has its own issues. I will clarify my post accordingly.

chux chux 36.1k2 gold badges43 silver badges96 bronze badges · Accepted Answer · 2016-03-02 21:11:33Z

Very good that functions signatures use unsigned types.

Simplify code: If the routines are only called when the endian is backwards (Assume only big and little endian exist, no mixed endian), use the following and macro detection. For endian test, see here.

uint32_t endian_reverse32(uint32_t le32) {
 return ((le32 & 0xFF) < 24) | \
 ((le32 & 0xFF00) < 8) | \
 ((le32 & 0xFF0000) > 8) | \
 ((le32 & 0xFF000000) > 24);
}
#if ENDIANNESS == LITTLE
 #define le_u32_to_cpu(u32) ((uint32_t) u32)
 #define cpu_to_le_u32(u32) ((uint32_t) u32)
#else
 #define le_u32_to_cpu(u32) endian_reverse32(u32)
 #define cpu_to_le_u32(u32) endian_reverse32(u32)
#endif

Else important: optimize code for platforms that do not need any endian adjustment: @JSY.
Cast result to insure consistent return type (or make inline()
```
#define le_u32_to_cpu(x) ((uint32_t)x)
#define cpu_to_le_u32(x) ((uint32_t)x)
```
Include routines to handle 16-bit and 64-bit.

0xff mask not needed. If anything, to quiet conversion warnings use cast.

// b[1] = (cpu_u32 >> 8) & 0xff;
b[1] = cpu_u32 >> 8;
// quiet some warnings.
b[1] = (uint8_t)(cpu_u32 >> 8);

"I'd particularly appreciate comments on whether this violates strict aliasing... ". I do not believe it does. A simple solution is to use a union

uint32_t le_u32_to_cpu(uint32_t le32) {
 union {
 uint32_t u32;
 uint8_t b[4];
 } u = { le32 };
 uint32_t cpu_u32 = u.b[0] | \
 ((uint32_t) b[1] << 8) | \
 ((uint32_t) b[2] << 16) | \
 ((uint32_t) b[3] << 24);
 return cpu_32;
}

Style: Parens around return value are not needed.
Pedantic: use uint8_t. unsigned char could be 16-bit or wider. Better to fail the compilation if uint8_t does not exist.
```
uint8_t *b = (uint8_t *)&le32;
```

Pedantic: Should your wonderful code get ported to a place when unsigned is < 32 bit, only takes a little more to insure that portability.

uint8_t *b = (uint8_t *)&le32;
uint32_t cpu_u32 = b[0] | \
 ((uint32_t)b[1] << 8) | \
 ((uint32_t)b[2] << 16) | \
 ((uint32_t)b[3] << 24);

Final: Note, for maximum portability, code could even run on a 36-bit or 64-bit unsigned machine. Just swap 8-bit bytes of least 32-bits, zeroing out any more significant ones. No need for aliasing, pointers, etc.

 uint32least_t endian_reverse32(uint32least_t le32) {
 return (uint32least_t) (
 ((le32 & 0xFFu) < 24) | \
 ((le32 & 0xFF00u) < 8) | \
 ((le32 & 0xFF0000u) > 8) | \
 ((le32 & 0xFF000000u) > 24));
 }
 #if ENDIANNESS == LITTLE
 #define le_u32_to_cpu(u32) ((uint32least_t) (u32 & 0xFFFFFFFF))
 #define cpu_to_le_u32(u32) ((uint32least_t) (u32 & 0xFFFFFFFF))
 #else
 #define le_u32_to_cpu(u32) endian_reverse32(u32)
 #define cpu_to_le_u32(u32) endian_reverse32(u32)
 #endif

There are still microprocessor on which an unaligned memory access causes a fault. For that reason, I'd advise against the union approach if it's important to have truly portable code.
@Edward. Using a union prevents unaligned memory access faults. Using a union increases portability concerning alignment. A union insures all fields are properly aligned. I am very familiar with this fault you speak of.
Regarding 2. & 3.: see my edit above, 8.: POSIX guarantees that CHAR_BITS is 8 and I believe if uint8_t wouldn't be a typedef for unsigned char then your statement would violate strict aliasing anyway. Other that excellent comments.
@chux: Yes, you're right. I was thinking of pointer access which I've also seen attempted for this purpose, but a union is indeed different. Thanks!

Stack Exchange Network

Portable byte order conversion

2 Answers 2

Optimize for host endianness

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Portable byte order conversion

2 Answers 2

Optimize for host endianness

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions