Please take a review of my simple bit manipulator:
#ifndef BIT_STREAM_H
#define BIT_STREAM_H
#include <cstdint>
class BitStream {
public:
explicit BitStream(uint8_t* buffer, size_t buffer_size) : m_Buffer(buffer), m_BufferSize(buffer_size) {
}
virtual ~BitStream() = default;
public:
inline uint64_t Mask(uint64_t offset, size_t size) const {
return ((1 << size) - 1) << offset;
}
public:
inline size_t GetBufferSize() const {
return m_BufferSize;
}
inline size_t GetOffset() const {
return m_Offset;
}
inline bool GetErrorState() const {
return m_ErrorState;
}
protected:
uint8_t* m_Buffer = nullptr;
size_t m_BufferSize = 0;
size_t m_Offset = 0;
bool m_ErrorState = false;
};
class BitReader : public BitStream {
public:
explicit BitReader(uint8_t* buffer, size_t buffer_size) : BitStream(buffer, buffer_size) {
}
template <size_t buffer_size>
explicit BitReader(uint8_t(&buffer)[buffer_size]) : BitStream(buffer, buffer_size) {
}
public:
template<const size_t bit_size, typename value_t = uint64_t>
inline value_t Read(value_t df = 0) {
const size_t bpb = 8;
const size_t byte_offset = m_Offset / bpb;
const size_t relative_offset = m_Offset - (byte_offset * bpb);
if (m_ErrorState) {
return df;
}
if (((m_Offset + bit_size) / bpb) >= m_BufferSize) {
m_ErrorState = true;
return df;
}
m_Offset += bit_size;
return (*(value_t*)&m_Buffer[byte_offset] & Mask(relative_offset, bit_size)) >> relative_offset;
}
};
class BitWriter : public BitStream {
public:
explicit BitWriter(uint8_t* buffer, size_t buffer_size) : BitStream(buffer, buffer_size) {
}
template <size_t buffer_size>
explicit BitWriter(uint8_t(&buffer)[buffer_size]) : BitStream(buffer, buffer_size) {
}
public:
template<const size_t bit_size, typename value_t = uint64_t>
inline void Write(value_t value) {
const size_t bpb = 8;
const size_t byte_offset = m_Offset / bpb;
const size_t relative_offset = m_Offset - (byte_offset * bpb);
if (m_ErrorState) {
return;
}
if (((m_Offset + bit_size) / bpb) >= m_BufferSize) {
m_ErrorState = true;
return;
}
m_Offset += bit_size;
*(value_t*)&m_Buffer[byte_offset] = (*(value_t*)&m_Buffer[byte_offset] & ~Mask(relative_offset, bit_size)) | (value << relative_offset);
}
};
#endif // BIT_STREAM_H
Example of usage:
#include "BitStream.h"
#include <iostream>
#include <cassert>
int main() {
uint8_t buffer[2];
memset(buffer, 0, sizeof(buffer));
BitWriter writer(buffer);
writer.Write<4>(15);
writer.Write<2>(0);
writer.Write<6>(63);
assert(!writer.GetErrorState());
BitReader reader(buffer);
std::cout << reader.Read<4>() << std::endl;
std::cout << reader.Read<2>() << std::endl;
std::cout << reader.Read<6>() << std::endl;
assert(!reader.GetErrorState());
std::cin.ignore();
}
Later in plan add exceptions, to string converter method.
2 Answers 2
Unnecessary use of template arguments
In the functions BitReader::Read()
and BitWrite::Write()
, the bit_size
should just be a normal function argument instead of a template argument:
template<typename value_t>
void Write(size_t bit_size, value_t value) {
...
}
...
writer.Write(4, 15);
Restrict the allowed value types to integers
You currently allow any type for value_t
. This can be problematic though. What happens if I do:
writer.Write(4, 3.1415);
Or something like:
int a = 15;
writer.Write(4, &a);
Either explicitly restrict the allowed types to integers using std::enable_if
:
template <typename value_t, std::enable_if_t<std::is_integral<value_t>::value>::type>
void Write(size_t bit_size, value_t value) {
...
}
Or don't use template at all, and just use uint64_t
unconditionally, and rely on implicit casts between different size integers:
void Write(size_t bit_size, uint64_t value) {
...
}
Use memcpy()
to avoid unaligned reads and writes
The following code has undefined behavior in C++, and might cause crashes on platforms that do not allow unaligned reads and writes to memory:
*(value_t *)&m_Buffer[byte_offset]
Apart from the alignment issue, the above statement might cause data to be read or written beyond the end of m_Buffer
.
The correct way to do this is to use memcpy()
to move data from the buffer into a value_t
variable, and when moving data from a value_t
variable to the buffer. While that might seem less optimal, the compiler will most likely be able to convert it back to an unaligned read or write on platforms that support it. So for example in Read()
, you should write something like:
value_t value = 0;
memcpy(&value, &m_Buffer[byte_offset], std::min(sizeof value, m_BufferSize - byte_offset));
value &= Mask(relative_offset, bit_size);
value >>= relative_offset;
return value;
Instead of updating m_Buffer()
directly in every call to Read()
and Write()
, you can also consider having a uint64_t
staging buffer. For example, in Write()
, you can add bits to the staging buffer until it is full, and then copy the staging buffer into m_Buffer
, advance byte_offset
by sizeof(uint64_t)
and then continue from an empty staging buffer.
Consider big-endian platforms
Your code assumes that integers are stored in little-endian format. If you need to support big-endian platforms, then your code is not correct.
Consider asserting that the value
does not exceed the given bit_size
You might want to assert that the value
passed to Write()
fits in the given bit_size
. You could throw an exception if it doesn't, or just add an assert()
statement that helps debugging but doesn't impact performance on release builds:
void Write(size_t bit_size, uint64_t value) {
assert(bit_size <= 8 * sizeof value);
assert(bit_size == 8 * sizeof value || value >> bit_size == 0);
...
}
Shifting uint64_t
by 64 is undefined behavior
In C++, you can only reliably shift a value by up to 8 * sizeof(value) - 1
bits. That means you cannot shift an uint64_t
by 64. Currently, your code has undefined behavior if you Read()
or Write()
64 bits in one go.
Reading or writing more than 57 bits at a time might fail
Your code does not handle reading or writing more than 57 bits at a time correctly. Consider that Mask()
generates a bitmask for the given size
, and then shifts it by offset
. Depending on what value came before it, offset
can be between 0 and 7. So if offset
is 7 and you want to write a 58-bit value to the buffer, you get a mask larger than an uint64_t
can hold.
Unnecessary use of inline
Member functions that are defined inside a class declaration are automatically inline
, so there is no need to specify that explicitly.
Avoid std::endl
You should use \n
instead of std::endl
. The latter is equivalent to \n
, but forces a flush of the output stream, which can be bad for performance.
-
\$\begingroup\$ bit_size should just be a normal function argument instead of a template argument - Really? What is your justification? If there is a small and well-known set of sizes, the compiler will be better able to produce optimized code. \$\endgroup\$Reinderien– Reinderien2020年07月11日 22:05:56 +00:00Commented Jul 11, 2020 at 22:05
-
\$\begingroup\$ @Reinderien: The functions are inlined anyway, so the compiler can optimize it regardless of whether it is a template argument or a normal function argument. \$\endgroup\$G. Sliepen– G. Sliepen2020年07月11日 22:49:18 +00:00Commented Jul 11, 2020 at 22:49
Inline isn't
This pops up on CodeReview regularly. inline
is effectively ignored by most modern compilers, and even if it weren't, it isn't a good idea to force the matter -- programmers do not know what's best for their program when compared to the compiler when full optimization is enabled. It's safe to omit this.
Redundant initial values
uint8_t* m_Buffer = nullptr;
size_t m_BufferSize = 0;
These are initialized by your constructor, so there's no point in pre-initializing them here.
Type punning
This:
*(value_t*)&m_Buffer[byte_offset]
is a little gross. There are better ways to do inline pointer casts, particularly in C++. Here, since you're punning from uint8_t*
to a pointer of arbitrary type, you'll probably want reinterpret_cast
.
I guess bit_size
should be 64 if you pass uint64_t
for value_t
, but I don't see why it's passed separately. You can move bpb
up in scope; then bit_size
would equal bpb * sizeof(value_t)
. Perhaps you want to support bit_size
smaller than the value type, which is fine; but you should at least assert that bit_size
is less than or equal to the number of bits in value_t
.
There are subtler risks in the way that you're packing bits - from what I can tell, it assumes little-endian, which will not be universally true.
-
\$\begingroup\$ inline is not ignored, it allows different compilation units to redefine the function with exactly same body, and potentially, optimize by combining to a single implementation at the linking time. \$\endgroup\$Ilkhd– Ilkhd2020年07月12日 09:11:22 +00:00Commented Jul 12, 2020 at 9:11
-
\$\begingroup\$ for my personal point, it is not redundant initial, but default, for comfortable expand class in future, I not like worries about it each time when adding new construction method, and in this nothing bad for compile or run time, in current case m_Buffer & m_BufferSize will be initialized only once a time - in the defined constructor. proof \$\endgroup\$Harry– Harry2020年07月13日 07:49:27 +00:00Commented Jul 13, 2020 at 7:49