Generically convert bytes to integers

Question 1

I wanted to know If there was a way to make it more versatile or shorter/simpler. Here's the code:

// Take only std::byte parameter and return an unsigned integral
constexpr auto bytes_to_uint(std::same_as<std::byte> auto... bytes) -> std::unsigned_integral auto {
 constexpr auto N = sizeof...(bytes);
 // Integral types large enough to hold N bytes
 using types = std::tuple<
 std::uint8_t,
 std::uint16_t,
 std::uint32_t,
 std::uint32_t,
 std::uint64_t,
 std::uint64_t,
 std::uint64_t,
 std::uint64_t
 >;
 
 using result = std::tuple_element_t<N, types>;
 
 return [&]<std::size_t... S>(std::index_sequence<S...>) {
 // Accumulate the part of the number using the bitwise or operator for each bytes
 return ((static_cast<result>(bytes) << CHAR_BIT * (N - S - 1)) | ... );
 }(std::make_index_sequence<N>{});
}

It is meant to be used like this:

bytes_to_uint(std::byte{0xaa}, std::byte{0xbb}); // std::uint16_t: 0xaabb
bytes_to_uint(
 std::byte{0x11},
 std::byte{0x22},
 std::byte{0x33},
 std::byte{0x44}
); // std::uint32_t: 0x11223344

Question 2

The function signature strikes me as difficult to read, thanks to the constraint std::same_as<std::byte> auto... and the trailing "return type" std::unsigned_integral auto. I might rather write something like

constexpr auto bytes_to_uint(std::initializer_list<std::byte> bytes) {

...Ah, but then you couldn't use bytes.size() as a constant expression; I see. So I'd think about writing an overload set, like this:

constexpr std::uint8_t bytes_to_uint(std::byte a) {
 return a;
}
constexpr std::uint16_t bytes_to_uint(std::byte a, std::byte b) {
 return (a << 8) | b;
}
constexpr std::uint32_t bytes_to_uint(std::byte a, std::byte b, std::byte c, std::byte d) {
 return (a << 24) | (b << 16) | (c << 8) | d;
}

But I guess this is messy because you need 16 different overloads. You can't even use default function arguments, because you want bytes_to_uint(a,b,c) to be equal to bytes_to_uint(0,a,b,c) and not bytes_to_uint(a,b,c,0). Of course you could still write

#define B std::byte
constexpr std::uint8_t bytes_to_uint(B a)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,a); }
constexpr std::uint16_t bytes_to_uint(B a, B b)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,0,0,0,a,b); }
constexpr std::uint32_t bytes_to_uint(B a, B b, B c)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,0,0,a,b,c); }
constexpr std::uint32_t bytes_to_uint(B a, B b, B c, B d)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,0,a,b,c,d); }
constexpr std::uint64_t bytes_to_uint(B a, B b, B c, B d, B e)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,a,b,c,d,e); }
[...22 more lines...]
#undef B

but I bet you don't want to do that. Okay, let's stick with the templatey thing you did.

using result = std::tuple_element_t<N, types>;

I'd rather see this dependent typedef use CamelCase (like a template parameter) or suffixedwith_type (like an STL member typedef). Calling it result makes it look too much like a variable, and makes it hard to pick out the one place where you use it.

Instead of spending 13 lines and a <tuple> dependency, I'd rather just do

using ResultType = std::conditional_t<
 (N == 1), std::uint8_t, std::conditional_t<
 (N == 2), std::uint16_t, std::conditional_t<
 (N <= 4), std::uint32_t, std::uint64_t>>>;

Which reminds me, you need something like

static_assert(N <= 16);

to stop you from trying to handle an argument list of 17 bytes or more.

And I didn't even notice until I tried it in Godbolt, but, you've got an off-by-one bug here! You meant tuple_element_t<N-1, types>. Remember that indexing always starts at zero (everywhere except <regex>).

If you don't like conditional_t, another option is to use plain old ifs. Factor out the computation while it's still parameterized on ResultType, and then use an if-else chain to decide on the right type to plug in for ResultType later:

auto do_it = [&]<class ResultType, std::size_t... S>(ResultType, std::index_sequence<S...>) {
 return ((static_cast<ResultType>(bytes) << CHAR_BIT * (N - S - 1)) | ... );
};
if constexpr (N == 1) {
 return do_it(std::uint8_t{}, std::make_index_sequence<N>{});
} else if constexpr (N == 2) {
 return do_it(std::uint16_t{}, std::make_index_sequence<N>{});
} else if constexpr (N <= 4) {
 return do_it(std::uint32_t{}, std::make_index_sequence<N>{});
} else if constexpr (N <= 8) {
 return do_it(std::uint64_t{}, std::make_index_sequence<N>{});
}

Even better, permit the compiler to do the math at its preferred bit-width (which is 64 bits on all the desktop platforms I care about) and then truncate it at the end. This produces similar codegen and reads nicer yet:

std::uint64_t result = [&]<std::size_t... S>( std::index_sequence<S...>) {
 return ((static_cast<std::uint64_t>(bytes) << CHAR_BIT * (N - S - 1)) | ... );
}(std::make_index_sequence<N>{});
if constexpr (N == 1) {
 return std::uint8_t(result);
} else if constexpr (N == 2) {
 return std::uint16_t(result);
} else if constexpr (N <= 4) {
 return std::uint32_t(result);
} else {
 return std::uint64_t(result);
}

You have another bug when N == 1 (besides the off-by-one bug). When N == 1, the fold-expression has no | operations at all, and so it's just a uint8_t shifted by zero. That shift expression has type int. Which is not an unsigned integral type. So your return-type-constraint fails!

This is just another reason to do all the math first in uint64_t, and then cast down to uint8_t right before you return, as shown in my last example above.

Writing any test cases would have caught both this bug and the off-by-one bug. Test cases are always important! Especially when you're planning to put the code up for public review. (Or for review by coworkers, for that matter.)

Finally, I recommend parentheses to clarify the precedence of x << CHAR_BIT * y. In context it's obvious what you expected the precedence to be; but as a reader, I'm not sure you're right. Put the parentheses so that I don't have to think about it even for a second.

However, in this context, that's a very minor point, because you clearly aren't expecting anybody to read the expression ((static_cast<result>(bytes) << CHAR_BIT * (N - S - 1)) | ... ). It's a "trust me" line of code.

It's also silly to pretend that CHAR_BIT is relevant here. This code explodes spectacularly if CHAR_BIT is anything other than 8. So just write 8; and if you are compelled to work in a reference to CHAR_BIT, do so by writing

static_assert(CHAR_BIT == 8);

at the top of the function.

score 3 · Accepted Answer · 2020-09-01 14:09:28Z

The function signature strikes me as difficult to read, thanks to the constraint std::same_as<std::byte> auto... and the trailing "return type" std::unsigned_integral auto. I might rather write something like

constexpr auto bytes_to_uint(std::initializer_list<std::byte> bytes) {

...Ah, but then you couldn't use bytes.size() as a constant expression; I see. So I'd think about writing an overload set, like this:

constexpr std::uint8_t bytes_to_uint(std::byte a) {
 return a;
}
constexpr std::uint16_t bytes_to_uint(std::byte a, std::byte b) {
 return (a << 8) | b;
}
constexpr std::uint32_t bytes_to_uint(std::byte a, std::byte b, std::byte c, std::byte d) {
 return (a << 24) | (b << 16) | (c << 8) | d;
}

But I guess this is messy because you need 16 different overloads. You can't even use default function arguments, because you want bytes_to_uint(a,b,c) to be equal to bytes_to_uint(0,a,b,c) and not bytes_to_uint(a,b,c,0). Of course you could still write

#define B std::byte
constexpr std::uint8_t bytes_to_uint(B a)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,a); }
constexpr std::uint16_t bytes_to_uint(B a, B b)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,0,0,0,a,b); }
constexpr std::uint32_t bytes_to_uint(B a, B b, B c)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,0,0,a,b,c); }
constexpr std::uint32_t bytes_to_uint(B a, B b, B c, B d)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,0,a,b,c,d); }
constexpr std::uint64_t bytes_to_uint(B a, B b, B c, B d, B e)
 { return bytes_to_uint(0,0,0,0,0,0,0,0,0,0,0,a,b,c,d,e); }
[...22 more lines...]
#undef B

but I bet you don't want to do that. Okay, let's stick with the templatey thing you did.

using result = std::tuple_element_t<N, types>;

I'd rather see this dependent typedef use CamelCase (like a template parameter) or suffixedwith_type (like an STL member typedef). Calling it result makes it look too much like a variable, and makes it hard to pick out the one place where you use it.

Instead of spending 13 lines and a <tuple> dependency, I'd rather just do

using ResultType = std::conditional_t<
 (N == 1), std::uint8_t, std::conditional_t<
 (N == 2), std::uint16_t, std::conditional_t<
 (N <= 4), std::uint32_t, std::uint64_t>>>;

Which reminds me, you need something like

static_assert(N <= 16);

to stop you from trying to handle an argument list of 17 bytes or more.

And I didn't even notice until I tried it in Godbolt, but, you've got an off-by-one bug here! You meant tuple_element_t<N-1, types>. Remember that indexing always starts at zero (everywhere except <regex>).

If you don't like conditional_t, another option is to use plain old ifs. Factor out the computation while it's still parameterized on ResultType, and then use an if-else chain to decide on the right type to plug in for ResultType later:

auto do_it = [&]<class ResultType, std::size_t... S>(ResultType, std::index_sequence<S...>) {
 return ((static_cast<ResultType>(bytes) << CHAR_BIT * (N - S - 1)) | ... );
};
if constexpr (N == 1) {
 return do_it(std::uint8_t{}, std::make_index_sequence<N>{});
} else if constexpr (N == 2) {
 return do_it(std::uint16_t{}, std::make_index_sequence<N>{});
} else if constexpr (N <= 4) {
 return do_it(std::uint32_t{}, std::make_index_sequence<N>{});
} else if constexpr (N <= 8) {
 return do_it(std::uint64_t{}, std::make_index_sequence<N>{});
}

Even better, permit the compiler to do the math at its preferred bit-width (which is 64 bits on all the desktop platforms I care about) and then truncate it at the end. This produces similar codegen and reads nicer yet:

std::uint64_t result = [&]<std::size_t... S>( std::index_sequence<S...>) {
 return ((static_cast<std::uint64_t>(bytes) << CHAR_BIT * (N - S - 1)) | ... );
}(std::make_index_sequence<N>{});
if constexpr (N == 1) {
 return std::uint8_t(result);
} else if constexpr (N == 2) {
 return std::uint16_t(result);
} else if constexpr (N <= 4) {
 return std::uint32_t(result);
} else {
 return std::uint64_t(result);
}

You have another bug when N == 1 (besides the off-by-one bug). When N == 1, the fold-expression has no | operations at all, and so it's just a uint8_t shifted by zero. That shift expression has type int. Which is not an unsigned integral type. So your return-type-constraint fails!

This is just another reason to do all the math first in uint64_t, and then cast down to uint8_t right before you return, as shown in my last example above.

Writing any test cases would have caught both this bug and the off-by-one bug. Test cases are always important! Especially when you're planning to put the code up for public review. (Or for review by coworkers, for that matter.)

Finally, I recommend parentheses to clarify the precedence of x << CHAR_BIT * y. In context it's obvious what you expected the precedence to be; but as a reader, I'm not sure you're right. Put the parentheses so that I don't have to think about it even for a second.

However, in this context, that's a very minor point, because you clearly aren't expecting anybody to read the expression ((static_cast<result>(bytes) << CHAR_BIT * (N - S - 1)) | ... ). It's a "trust me" line of code.

It's also silly to pretend that CHAR_BIT is relevant here. This code explodes spectacularly if CHAR_BIT is anything other than 8. So just write 8; and if you are compelled to work in a reference to CHAR_BIT, do so by writing

static_assert(CHAR_BIT == 8);

at the top of the function.

Stack Exchange Network

Generically convert bytes to integers

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Generically convert bytes to integers

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions