Bit manipulator (reader / writer)

Question 1

Please take a review of my simple bit manipulator:

#ifndef BIT_STREAM_H
#define BIT_STREAM_H
#include <cstdint>
class BitStream {
public:
 explicit BitStream(uint8_t* buffer, size_t buffer_size) : m_Buffer(buffer), m_BufferSize(buffer_size) {
 }
 virtual ~BitStream() = default;
public:
 inline uint64_t Mask(uint64_t offset, size_t size) const {
 return ((1 << size) - 1) << offset;
 }
public:
 inline size_t GetBufferSize() const {
 return m_BufferSize;
 }
 inline size_t GetOffset() const {
 return m_Offset;
 }
 inline bool GetErrorState() const {
 return m_ErrorState;
 }
protected:
 uint8_t* m_Buffer = nullptr;
 size_t m_BufferSize = 0;
 size_t m_Offset = 0;
 bool m_ErrorState = false;
};
class BitReader : public BitStream {
public:
 explicit BitReader(uint8_t* buffer, size_t buffer_size) : BitStream(buffer, buffer_size) {
 }
 template <size_t buffer_size>
 explicit BitReader(uint8_t(&buffer)[buffer_size]) : BitStream(buffer, buffer_size) {
 }
public:
 template<const size_t bit_size, typename value_t = uint64_t>
 inline value_t Read(value_t df = 0) {
 const size_t bpb = 8;
 const size_t byte_offset = m_Offset / bpb;
 const size_t relative_offset = m_Offset - (byte_offset * bpb);
 if (m_ErrorState) {
 return df;
 }
 if (((m_Offset + bit_size) / bpb) >= m_BufferSize) {
 m_ErrorState = true;
 return df;
 }
 m_Offset += bit_size;
 return (*(value_t*)&m_Buffer[byte_offset] & Mask(relative_offset, bit_size)) >> relative_offset;
 }
};
class BitWriter : public BitStream {
public:
 explicit BitWriter(uint8_t* buffer, size_t buffer_size) : BitStream(buffer, buffer_size) {
 }
 template <size_t buffer_size>
 explicit BitWriter(uint8_t(&buffer)[buffer_size]) : BitStream(buffer, buffer_size) {
 }
public:
 template<const size_t bit_size, typename value_t = uint64_t>
 inline void Write(value_t value) {
 const size_t bpb = 8;
 const size_t byte_offset = m_Offset / bpb;
 const size_t relative_offset = m_Offset - (byte_offset * bpb);
 if (m_ErrorState) {
 return;
 }
 if (((m_Offset + bit_size) / bpb) >= m_BufferSize) {
 m_ErrorState = true;
 return;
 }
 m_Offset += bit_size;
 *(value_t*)&m_Buffer[byte_offset] = (*(value_t*)&m_Buffer[byte_offset] & ~Mask(relative_offset, bit_size)) | (value << relative_offset);
 }
};
#endif // BIT_STREAM_H

Example of usage:

#include "BitStream.h"
#include <iostream>
#include <cassert>
int main() {
 uint8_t buffer[2];
 memset(buffer, 0, sizeof(buffer));
 BitWriter writer(buffer);
 writer.Write<4>(15);
 writer.Write<2>(0);
 writer.Write<6>(63);
 assert(!writer.GetErrorState());
 BitReader reader(buffer);
 std::cout << reader.Read<4>() << std::endl;
 std::cout << reader.Read<2>() << std::endl;
 std::cout << reader.Read<6>() << std::endl;
 assert(!reader.GetErrorState());
 std::cin.ignore();
}

Later in plan add exceptions, to string converter method.

Question 2

Unnecessary use of template arguments

In the functions BitReader::Read() and BitWrite::Write(), the bit_size should just be a normal function argument instead of a template argument:

template<typename value_t>
void Write(size_t bit_size, value_t value) {
 ...
}
...
writer.Write(4, 15);

Restrict the allowed value types to integers

You currently allow any type for value_t. This can be problematic though. What happens if I do:

writer.Write(4, 3.1415);

Or something like:

int a = 15;
writer.Write(4, &a);

Either explicitly restrict the allowed types to integers using std::enable_if:

template <typename value_t, std::enable_if_t<std::is_integral<value_t>::value>::type>
void Write(size_t bit_size, value_t value) {
 ...
}

Or don't use template at all, and just use uint64_t unconditionally, and rely on implicit casts between different size integers:

void Write(size_t bit_size, uint64_t value) {
 ...
}

Use `memcpy()` to avoid unaligned reads and writes

The following code has undefined behavior in C++, and might cause crashes on platforms that do not allow unaligned reads and writes to memory:

*(value_t *)&m_Buffer[byte_offset]

Apart from the alignment issue, the above statement might cause data to be read or written beyond the end of m_Buffer.

The correct way to do this is to use memcpy() to move data from the buffer into a value_t variable, and when moving data from a value_t variable to the buffer. While that might seem less optimal, the compiler will most likely be able to convert it back to an unaligned read or write on platforms that support it. So for example in Read(), you should write something like:

value_t value = 0;
memcpy(&value, &m_Buffer[byte_offset], std::min(sizeof value, m_BufferSize - byte_offset));
value &= Mask(relative_offset, bit_size);
value >>= relative_offset;
return value;

Instead of updating m_Buffer() directly in every call to Read() and Write(), you can also consider having a uint64_t staging buffer. For example, in Write(), you can add bits to the staging buffer until it is full, and then copy the staging buffer into m_Buffer, advance byte_offset by sizeof(uint64_t) and then continue from an empty staging buffer.

Consider big-endian platforms

Your code assumes that integers are stored in little-endian format. If you need to support big-endian platforms, then your code is not correct.

Consider asserting that the `value` does not exceed the given `bit_size`

You might want to assert that the value passed to Write() fits in the given bit_size. You could throw an exception if it doesn't, or just add an assert() statement that helps debugging but doesn't impact performance on release builds:

void Write(size_t bit_size, uint64_t value) {
 assert(bit_size <= 8 * sizeof value);
 assert(bit_size == 8 * sizeof value || value >> bit_size == 0);
 ...
}

Shifting `uint64_t` by 64 is undefined behavior

In C++, you can only reliably shift a value by up to 8 * sizeof(value) - 1 bits. That means you cannot shift an uint64_t by 64. Currently, your code has undefined behavior if you Read() or Write() 64 bits in one go.

Reading or writing more than 57 bits at a time might fail

Your code does not handle reading or writing more than 57 bits at a time correctly. Consider that Mask() generates a bitmask for the given size, and then shifts it by offset. Depending on what value came before it, offset can be between 0 and 7. So if offset is 7 and you want to write a 58-bit value to the buffer, you get a mask larger than an uint64_t can hold.

Unnecessary use of `inline`

Member functions that are defined inside a class declaration are automatically inline, so there is no need to specify that explicitly.

Avoid `std::endl`

You should use \n instead of std::endl. The latter is equivalent to \n, but forces a flush of the output stream, which can be bad for performance.

Question 3

bit_size should just be a normal function argument instead of a template argument - Really? What is your justification? If there is a small and well-known set of sizes, the compiler will be better able to produce optimized code.

Question 4

@Reinderien: The functions are inlined anyway, so the compiler can optimize it regardless of whether it is a template argument or a normal function argument.

Question 5

Inline isn't

This pops up on CodeReview regularly. inline is effectively ignored by most modern compilers, and even if it weren't, it isn't a good idea to force the matter -- programmers do not know what's best for their program when compared to the compiler when full optimization is enabled. It's safe to omit this.

Redundant initial values

uint8_t* m_Buffer = nullptr;
size_t m_BufferSize = 0;

These are initialized by your constructor, so there's no point in pre-initializing them here.

Type punning

This:

*(value_t*)&m_Buffer[byte_offset]

is a little gross. There are better ways to do inline pointer casts, particularly in C++. Here, since you're punning from uint8_t* to a pointer of arbitrary type, you'll probably want reinterpret_cast.

I guess bit_size should be 64 if you pass uint64_t for value_t, but I don't see why it's passed separately. You can move bpb up in scope; then bit_size would equal bpb * sizeof(value_t). Perhaps you want to support bit_size smaller than the value type, which is fine; but you should at least assert that bit_size is less than or equal to the number of bits in value_t.

There are subtler risks in the way that you're packing bits - from what I can tell, it assumes little-endian, which will not be universally true.

Question 6

inline is not ignored, it allows different compilation units to redefine the function with exactly same body, and potentially, optimize by combining to a single implementation at the linking time.

Question 7

for my personal point, it is not redundant initial, but default, for comfortable expand class in future, I not like worries about it each time when adding new construction method, and in this nothing bad for compile or run time, in current case m_Buffer & m_BufferSize will be initialized only once a time - in the defined constructor. proof

G. Sliepen G. Sliepen 68.9k3 gold badges74 silver badges179 bronze badges · Accepted Answer · 2020-07-11 21:49:59Z

Unnecessary use of template arguments

In the functions BitReader::Read() and BitWrite::Write(), the bit_size should just be a normal function argument instead of a template argument:

template<typename value_t>
void Write(size_t bit_size, value_t value) {
 ...
}
...
writer.Write(4, 15);

Restrict the allowed value types to integers

You currently allow any type for value_t. This can be problematic though. What happens if I do:

writer.Write(4, 3.1415);

Or something like:

int a = 15;
writer.Write(4, &a);

Either explicitly restrict the allowed types to integers using std::enable_if:

template <typename value_t, std::enable_if_t<std::is_integral<value_t>::value>::type>
void Write(size_t bit_size, value_t value) {
 ...
}

Or don't use template at all, and just use uint64_t unconditionally, and rely on implicit casts between different size integers:

void Write(size_t bit_size, uint64_t value) {
 ...
}

Use `memcpy()` to avoid unaligned reads and writes

The following code has undefined behavior in C++, and might cause crashes on platforms that do not allow unaligned reads and writes to memory:

*(value_t *)&m_Buffer[byte_offset]

Apart from the alignment issue, the above statement might cause data to be read or written beyond the end of m_Buffer.

The correct way to do this is to use memcpy() to move data from the buffer into a value_t variable, and when moving data from a value_t variable to the buffer. While that might seem less optimal, the compiler will most likely be able to convert it back to an unaligned read or write on platforms that support it. So for example in Read(), you should write something like:

value_t value = 0;
memcpy(&value, &m_Buffer[byte_offset], std::min(sizeof value, m_BufferSize - byte_offset));
value &= Mask(relative_offset, bit_size);
value >>= relative_offset;
return value;

Instead of updating m_Buffer() directly in every call to Read() and Write(), you can also consider having a uint64_t staging buffer. For example, in Write(), you can add bits to the staging buffer until it is full, and then copy the staging buffer into m_Buffer, advance byte_offset by sizeof(uint64_t) and then continue from an empty staging buffer.

Consider big-endian platforms

Your code assumes that integers are stored in little-endian format. If you need to support big-endian platforms, then your code is not correct.

Consider asserting that the `value` does not exceed the given `bit_size`

You might want to assert that the value passed to Write() fits in the given bit_size. You could throw an exception if it doesn't, or just add an assert() statement that helps debugging but doesn't impact performance on release builds:

void Write(size_t bit_size, uint64_t value) {
 assert(bit_size <= 8 * sizeof value);
 assert(bit_size == 8 * sizeof value || value >> bit_size == 0);
 ...
}

Shifting `uint64_t` by 64 is undefined behavior

In C++, you can only reliably shift a value by up to 8 * sizeof(value) - 1 bits. That means you cannot shift an uint64_t by 64. Currently, your code has undefined behavior if you Read() or Write() 64 bits in one go.

Reading or writing more than 57 bits at a time might fail

Your code does not handle reading or writing more than 57 bits at a time correctly. Consider that Mask() generates a bitmask for the given size, and then shifts it by offset. Depending on what value came before it, offset can be between 0 and 7. So if offset is 7 and you want to write a 58-bit value to the buffer, you get a mask larger than an uint64_t can hold.

Unnecessary use of `inline`

Member functions that are defined inside a class declaration are automatically inline, so there is no need to specify that explicitly.

Avoid `std::endl`

You should use \n instead of std::endl. The latter is equivalent to \n, but forces a flush of the output stream, which can be bad for performance.

bit_size should just be a normal function argument instead of a template argument - Really? What is your justification? If there is a small and well-known set of sizes, the compiler will be better able to produce optimized code.
@Reinderien: The functions are inlined anyway, so the compiler can optimize it regardless of whether it is a template argument or a normal function argument.

Stack Exchange Network

Bit manipulator (reader / writer)

2 Answers 2

Unnecessary use of template arguments

Restrict the allowed value types to integers

Use `memcpy()` to avoid unaligned reads and writes

Consider big-endian platforms

Consider asserting that the `value` does not exceed the given `bit_size`

Shifting `uint64_t` by 64 is undefined behavior

Reading or writing more than 57 bits at a time might fail

Unnecessary use of `inline`

Avoid `std::endl`

Inline isn't

Redundant initial values

Type punning

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Bit manipulator (reader / writer)

2 Answers 2

Unnecessary use of template arguments

Restrict the allowed value types to integers

Use memcpy() to avoid unaligned reads and writes

Consider big-endian platforms

Consider asserting that the value does not exceed the given bit_size

Shifting uint64_t by 64 is undefined behavior

Reading or writing more than 57 bits at a time might fail

Unnecessary use of inline

Avoid std::endl

Inline isn't

Redundant initial values

Type punning

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Use `memcpy()` to avoid unaligned reads and writes

Consider asserting that the `value` does not exceed the given `bit_size`

Shifting `uint64_t` by 64 is undefined behavior

Unnecessary use of `inline`

Avoid `std::endl`