I've written a bytes-to-number converter for C++, similar to Python's int.from_bytes.
It does not make any assumptions about:
- The endianness of the input buffer
- The endianness of the system
- The endianness consistency of the system (i.e. no assumption that all numeric types share the same endianness)
Additionally, the source buffer can be smaller than the target type.
#include <cstdint>
#include <cstring>
#include <span>
#include <stdexcept>

namespace Binary
{
    // Used to indicate the endianness of a type.
    // Better readability than a simple bool.
    enum class Endianness
    {
        Little,
        Big,
    };

    namespace Int
    {
        // Base is the underlying type that will be used to store the eventual result.
        template< typename Base >
        Base FromBytes( std::span< std::uint8_t > inBuffer, Binary::Endianness inEndianness )
        {
            // Check if we have enough space
            if( sizeof( Base ) < inBuffer.size() )
            {
                throw std::length_error( "unable to fit data in requested type" );
            }

            // Determine the endianness of the Base type on this system
            union {
                Base i;
                char c[ sizeof( Base ) ];
            } test = { .i = 1 };
            bool tmp = test.c[ sizeof( Base ) - 1 ] == '1円';
            Binary::Endianness type_endianness = tmp ? Binary::Endianness::Big : Binary::Endianness::Little;

            // Always initialize your variables
            Base result = Base{ 0 };
            std::uint8_t * destination = reinterpret_cast< std::uint8_t * >( &result );

            // If the type is big endian and the buffer is smaller, the first bytes
            // should remain zero, meaning we should start filling further back
            if( type_endianness == Binary::Endianness::Big ) { destination += sizeof( Base ) - inBuffer.size(); }

            if( type_endianness == inEndianness )
            {
                // equal endianness => simple copy
                std::memcpy( destination, inBuffer.data(), inBuffer.size() );
            }
            else
            {
                // differing endianness => copy in reverse
                for( std::size_t i = 0; i < inBuffer.size(); i += 1 )
                {
                    destination[ i ] = inBuffer[ inBuffer.size() - i - 1 ];
                }
            }

            return result;
        }
    }
}
Are there any improvements I could make?
1 Answer
I like the enum, though note that it duplicates the std::endian enum that C++20 already defines in <bit>.
We're missing some review context, such as the range of use cases we're trying to address. This submission contains no unit tests or other demo code, and it's unclear whether the caller is likely to call into this repeatedly through an array loop. Perhaps "exotic" 24-bit quantities motivated this code.
use constexpr

Every call to FromBytes asks about native endianness. It's a nice piece of code, very clear, but we should cache the result as it won't change. Or better, find the result at compile time. Even better, dispense with it and rely on std::endian::native. Or write down the underlying requirements so we understand what systems are being targeted and what language / compiler variants are within scope.
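Since the code already relies on std::span, C++20 appears to be available, so the runtime union test could be replaced with something along these lines (a sketch; NativeEndianness is a name I made up for illustration):

#include <bit>

// Compile-time equivalent of the union/tmp test, built on std::endian.
constexpr Binary::Endianness NativeEndianness()
{
    return std::endian::native == std::endian::big ? Binary::Endianness::Big
                                                   : Binary::Endianness::Little;
}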
nit: Assigning tmp = test.c[ 0 ] == '1円'; seems a little more convenient, but perhaps you wanted the order of the ternary to be "big, little" for aesthetic reasons. Consider renaming tmp to is_big.
Delete uninformative comments like "Always initialize your variables".
I am glad to see that .size() is the first thing we check. Good.
nit: I find the second form slightly easier to read.

... inBuffer[ inBuffer.size() - i - 1 ]
... inBuffer[ inBuffer.size() - 1 - i ]

Why? I read it as "constant minus i", where I can reason about the constant's location independent of the loop. Rather than having to puzzle out the numeric relationship between size and i, I observe that the constant is always positive, and then decrement to get to the zero-th element.
document language version

std::reverse_copy has been available since C++98 (and constexpr since C++20), yet we choose not to use it; it's unclear why. Using a standard routine might yield performance identical to a naïve loop, but maybe a vendor went to some trouble to vectorize it for your target processor, and it will go quicker. Use it for the same reason you prefer memcpy().
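For example, the two copy branches inside FromBytes could collapse to something like this (a sketch reusing the destination pointer that was already set up):

#include <algorithm>   // std::copy, std::reverse_copy

if( type_endianness == inEndianness )
{
    // same endianness => forward copy
    std::copy( inBuffer.begin(), inBuffer.end(), destination );
}
else
{
    // differing endianness => reverse copy
    std::reverse_copy( inBuffer.begin(), inBuffer.end(), destination );
}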
Tell us which language variants are acceptable for this project, and write it down so future maintainers will know the rules.
I was slightly surprised not to see ntohl mentioned, nor be64toh, but maybe {3, 5, 6, 7}-byte inBuffers are just as important for the use case we're addressing. Those standard functions should absolutely appear in your automated test suite.
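A cross-check of that sort might look like this (a sketch; be64toh is glibc/BSD, not standard C++, so treat <endian.h> as an assumption about the platform):

#include <endian.h>   // be64toh -- glibc/BSD, not standard C++
#include <cstring>

// The big-endian interpretation of eight bytes should agree with be64toh.
bool MatchesBe64toh( std::span< std::uint8_t > bytes )   // expects bytes.size() == 8
{
    std::uint64_t raw = 0;
    std::memcpy( &raw, bytes.data(), sizeof raw );
    return Binary::Int::FromBytes< std::uint64_t >( bytes, Binary::Endianness::Big )
        == be64toh( raw );
}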
document the sign bit

The brief review context explicitly says that we implement a well-known interface, and we do support a corresponding endianness parameter. But the interpretation of the source sign bit is left implicit; presumably it matches the Base type. This should be explicitly written down.
(Also, "Base" seems a slightly odd name as it suggests "Radix" which isn't relevant here. Maybe "NumericType"?)
API design
Imagine the caller is assigning a great many 16-bit integers via an array loop.
I worry about whether we're giving the compiler a good chance at vectorizing such a bulk assignment. Consider offering a second method that moves K integers of identical size.
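Such a method might look roughly like this if added next to FromBytes in the Int namespace (a sketch; FromBytesBulk is a name I made up, and it still funnels through FromBytes rather than being hand-vectorized):

// Hypothetical bulk variant: converts outValues.size() integers packed
// back-to-back in inBuffer, each sizeof( Base ) bytes wide.
template< typename Base >
void FromBytesBulk( std::span< std::uint8_t > inBuffer,
                    std::span< Base > outValues,
                    Binary::Endianness inEndianness )
{
    if( inBuffer.size() != outValues.size() * sizeof( Base ) )
    {
        throw std::length_error( "buffer / output size mismatch" );
    }
    for( std::size_t k = 0; k < outValues.size(); ++k )
    {
        outValues[ k ] = FromBytes< Base >(
            inBuffer.subspan( k * sizeof( Base ), sizeof( Base ) ), inEndianness );
    }
}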
Imagine the input numbers are already in native order and they have a standard {16, 32, 64}-bit size.
Would it help your use case if this routine were given the flexibility to report a no-op by returning the address of the input buffer, avoiding the memcpy()?
This code achieves its design goals.
Before merging it to main it should be better documented and accompanied by unit tests.
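A minimal starting point (plain asserts rather than a test framework; the expected values follow from the code posted above, assuming it is visible via a header):

#include <cassert>

int main()
{
    std::uint8_t bytes[] = { 0x12, 0x34 };
    assert( Binary::Int::FromBytes< std::uint32_t >( bytes, Binary::Endianness::Big ) == 0x1234u );
    assert( Binary::Int::FromBytes< std::uint32_t >( bytes, Binary::Endianness::Little ) == 0x3412u );
}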
I would be happy to delegate or accept maintenance tasks on this codebase.