Motivation:
I am working with sequences of n-bit values coming from an ADC. I was initially using a std::vector<unsigned short>
to store these values (12 bits in my case), but I wanted to improve on memory efficiency, since this approach uses 16 bits per value instead of the 12 bits of information.
My main concern is disk usage (I am saving these sequences to disk as byte strings) rather than runtime memory, but I thought both could be improved with my solution. I am aware that any improvement in memory usage will likely introduce overhead somewhere else, but I am OK with that (though I would like to keep it to a minimum). Ease of use is also a priority, as I don't want to overcomplicate things too much compared to a naive solution such as std::vector.
Solution:
My solution stores the data in a std::bitset
and provides methods to access / insert the data. These methods are slower since they need to perform a conversion of the data.
The full code can be viewed at https://github.com/lobis/adc-array. I have some experience writing C++ code but not that much outside of my comfort zone. I am not very familiar with templates or with most of the STL. This is the first time I have written a library that could be used by other projects, so I would also love feedback in this regard, such as on my approach to testing (which can be viewed on the GitHub page).
#include <bitset>
template<std::size_t ResolutionInNumberOfBits = 12, std::size_t NumberOfElements = 512>
class ADCArray {
private:
std::bitset<ResolutionInNumberOfBits * NumberOfElements> data;
public:
inline constexpr ADCArray() = default;
inline constexpr static std::size_t size() { return NumberOfElements; }
inline constexpr static std::pair<std::size_t, std::size_t> GetRange() {
unsigned int max = 1;
for (std::size_t i = 0; i < ResolutionInNumberOfBits; i++) {
max *= 2;
}
return {0, max - 1};
}
inline constexpr unsigned int at(size_t position) const {
assert(position < size());
unsigned int result = 0;
unsigned int powerOf2 = 1;
for (size_t i = 0; i < ResolutionInNumberOfBits; i++) {
if (data[position * ResolutionInNumberOfBits + i]) {
result += powerOf2;
}
powerOf2 *= 2;
}
return result;
}
inline constexpr std::array<unsigned int, NumberOfElements> GetValues() const {
std::array<unsigned int, NumberOfElements> result{};
for (size_t i = 0; i < size(); i++) {
result[i] = at(i);
}
return result;
}
inline constexpr void insert(size_t position, unsigned int value) {
assert(position < size());
assert(value <= GetRange().second);
for (size_t i = 0; i < ResolutionInNumberOfBits; i++) {
data[position * ResolutionInNumberOfBits + i] = value % 2;
value /= 2;
}
}
inline constexpr bool operator==(const ADCArray& rhs) const {
return data == rhs.data;
}
inline constexpr bool operator!=(const ADCArray& rhs) const {
return !(*this == rhs);
}
// Serialization
using byte = unsigned char;
inline constexpr std::array<byte, sizeof(data)> ToBytes() const {
std::array<byte, sizeof(data)> bytes;
const byte* begin = reinterpret_cast<const byte*>(std::addressof(data));
std::copy(begin, begin + sizeof(data), std::begin(bytes));
return bytes;
}
inline constexpr ADCArray<ResolutionInNumberOfBits, NumberOfElements> static FromBytes(const std::array<byte, sizeof(data)>& bytes) {
auto array = ADCArray<ResolutionInNumberOfBits, NumberOfElements>();
byte* begin = reinterpret_cast<byte*>(std::addressof(array.data));
std::copy(std::begin(bytes), std::end(bytes), begin);
return array;
}
// Other constructors
inline constexpr ADCArray(const std::array<unsigned int, size()>& values) {
for (size_t i = 0; i < values.size(); i++) {
insert(i, values.at(i));
}
}
inline constexpr ADCArray(const ADCArray& array) {
data = array.data;
}
};
Usage:
#include "adc_array/adc_array.h"
const size_t resolutionInBits = 10;
const size_t numberOfElements = 1000000; // 1E6
auto adcArray = ADCArray<resolutionInBits, numberOfElements>();
std::srand(0);
for (int i = 0; i < numberOfElements; ++i){
adcArray.insert(i, rand() % 1024);
}
sizeof(adcArray); // 1250000 bytes
auto values = std::array<unsigned short, numberOfElements>{};
for (int i = 0; i < numberOfElements; ++i){
values[i] = adcArray.at(i);
}
sizeof(values); // 2000000 bytes, 60% more memory usage
3 Answers
Design review
Your requirements are a bit vague; or rather, a bit contradictory. You say you want maximum memory efficiency (in the first paragraph)... but then you say you don’t, really (in the first sentence of the second paragraph)... but then you say you do (second sentence, "okay with the overhead")... but then you say you don’t ("would like to keep it to a minimum"). You already know that there are going to be trade-offs, so pick a team, and stick with it. You can’t have it all. Accept that. None of this "would like to minimize all inefficiencies". Yeah, wouldn’t we all. Decide and clarify what you are trying to minimize, and what you are willing to sacrifice (but, obviously should be minimal as well, because why else use C++).
In any case, I’m not sure bitset
is the right tool for this job. bitset
is a decent abstraction for a single N-bit chunk of data... but it isn’t really for sequences. For statically-sized sequences of N-bit values, your best bet would be an array<byte, M>
(where M
has to be calculated as the ratio of N and CHAR_BIT
, as I’ll show later). For dynamically-sized sequences, your best bet would be a vector<byte>
. For a really sexy interface, you could follow the example of span
, and support both static and dynamic extents.
You might also consider the example of span
, and not make a container, but rather just a view; more on that later.
What, exactly do you need?
There are basically two options for arbitrary packed byte sequences, both with pros and cons:
- Store the data packed in memory.
- Store the data unpacked in memory, and only pack it on demand (such as when reading/writing to disk).
The one that sounds like what you want is option 1: to store the data packed in memory.
Solution 1: pack in memory
That option uses byte arrays in memory, with the data packed tightly in. For example, you have 12-bit values. Let’s assume 8-bit bytes. That means you would pack 1 value in 2 bytes, 2 values in 3 bytes, 3 values in 5 bytes, 4 values in 6 bytes, and so on:
| . : ! . : | . : ! . : | . : ! . : | . : ! . : |
Value |<-- 1 -->|<-- 2 -->|<-- 3 -->|<-- 4 -->|
Byte |<- 1 ->|<- 2 ->|<- 3 ->|<- 4 ->|<- 5 ->|<- 6 ->|
| . : . | . : . | . : . | . : . | . : . | . : . |
Generally, the number of bytes you need for n N-bit values (with B-bit bytes) is \$\left\lceil \left( n \times N \right) \div B \right\rceil\$. Value i will start at byte number \$\left\lfloor \left( i \times N \right) \div B \right\rfloor\$, and bit position \$\left( i \times N \right) \mod B\$.
This solution looks a lot like what you already have (warning: untested code, written blindly, for illustration purposes only):
template <
std::size_t Bits,
std::size_t N
>
class ADCArray
{
private:
// Better static_assert that (Bits * N) < numeric_limits<size_t>::max()!
// Also better static_assert that Bits is <= (CHAR_BIT * sizeof(unsigned int)).
static constexpr auto _size_in_bits = std::size_t(Bits * N);
static constexpr auto _size_in_bytes = std::size_t((_size_in_bits / CHAR_BIT) + bool(_size_in_bits % CHAR_BIT));
// Naturally, this will have to be calculated based on Bits, not hard-coded,
// but I'm just illustrating.
static constexpr auto _mask = 0b1111'1111'1111u;
std::array<unsigned char, _size_in_bytes> data;
public:
// For example:
constexpr auto at(std::size_t n) const -> unsigned int
{
// How many bits left to read:
auto bits_left = Bits;
// What byte position to start at:
auto byte_pos = static_cast<unsigned int>((n * Bits) / CHAR_BIT);
// What bit position to start at:
auto bit_pos = static_cast<unsigned int>((n * Bits) % CHAR_BIT);
// Result value:
auto res = 0u;
// Do the first (possibly partial) byte read, masking off any high bits
// that belong to the previous value:
res = data[byte_pos++] & ((1u << (CHAR_BIT - bit_pos)) - 1u);
bits_left -= (CHAR_BIT - bit_pos);
// Do any remaining (possibly partial) byte reads:
while (bits_left != 0)
{
// How many bits to read this round:
auto const bits_to_read = std::min(bits_left, std::size_t{CHAR_BIT});
// Read those bits:
auto const v = static_cast<unsigned int>(data[byte_pos++] >> (CHAR_BIT - bits_to_read));
// Shift the result by that many bits, and then add them in:
res <<= bits_to_read;
res |= v;
// Count the bits that were read:
bits_left -= bits_to_read;
}
return res & _mask;
}
// ... and so on ...
Getting this right for arbitrary values of N
and CHAR_BIT
is tricky, but not insanely hard.
The benefit of this solution is that your memory usage can be absolutely minimal; you can’t possibly use less memory (assuming the data isn’t regular and compressible). The downside is that accesses and writes by value will be slow... although reading/writing the data to byte-based file storage will be as fast as it can possibly be.
So basically, the more you want to muck around with the values in memory, the worse this solution is. If all you want to do is collect/generate the data, and read/write it to byte-based binary streams (including files), then this is perfect. If you want to do a lot of calculations/transforms on the data in memory, then this is not great.
Solution 2: pack on demand
The other solution is to simply store the data as values in memory, with no packing, and do all the (un)packing only when requested (for example, when reading/writing to a file). There is no real magic to this solution; it’s just an array<unsigned int, NumberOfElements>
. Or a vector, or anything, really... which is maximally flexible.
For that reason, you may not want to bother making a class at all—not even a view—and instead only make functions to pack/unpack on demand. This is the solution you want if you are going to be doing a lot of processing/transformation of the data. It uses more memory during processing, but will be a lot faster, and you can still pack it when saving/transmitting.
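For illustration, here is a rough sketch of what such standalone pack/unpack functions might look like for the 12-bit case (the function names, and the big-endian-within-each-pair layout, are illustrative choices, not anything from the question):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 12-bit samples tightly: two samples fit in three bytes, big-endian
// within each pair. A trailing odd sample takes two bytes on its own.
std::vector<unsigned char> pack12(const std::vector<std::uint16_t>& values)
{
    std::vector<unsigned char> out;
    out.reserve((values.size() * 12 + 7) / 8);
    std::size_t i = 0;
    for (; i + 1 < values.size(); i += 2)
    {
        auto const a = values[i] & 0xFFFu;
        auto const b = values[i + 1] & 0xFFFu;
        out.push_back(static_cast<unsigned char>(a >> 4));              // high 8 bits of a
        out.push_back(static_cast<unsigned char>((a << 4) | (b >> 8))); // low 4 of a, high 4 of b
        out.push_back(static_cast<unsigned char>(b));                   // low 8 bits of b
    }
    if (i < values.size())  // trailing odd sample
    {
        auto const a = values[i] & 0xFFFu;
        out.push_back(static_cast<unsigned char>(a >> 4));
        out.push_back(static_cast<unsigned char>(a << 4));
    }
    return out;
}

// Undo the packing; `count` (the number of samples) is passed explicitly
// for simplicity.
std::vector<std::uint16_t> unpack12(const std::vector<unsigned char>& bytes,
                                    std::size_t count)
{
    std::vector<std::uint16_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        auto const bit = i * 12;
        auto const byte = bit / 8;
        std::uint16_t v;
        if (bit % 8 == 0)   // sample starts on a byte boundary
            v = static_cast<std::uint16_t>((bytes[byte] << 4) | (bytes[byte + 1] >> 4));
        else                // sample starts mid-byte
            v = static_cast<std::uint16_t>(((bytes[byte] & 0x0Fu) << 8) | bytes[byte + 1]);
        out.push_back(v);
    }
    return out;
}
```

Everything stays a plain `vector<uint16_t>` during processing; the packing only happens at the I/O boundary.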
Solution 3: hybrid solution
There is a third solution, which is halfway between solutions 1 and 2.
In this solution, you do partial packing in memory. For example, let’s assume you have 20-bit values. With solution 1, you’d pack them like this:
Value |<-- 1 -->|<-- 2 -->|<-- 3 -->|<-- 4 -->|
Byte |<- 1 ->|<- 2 ->|<- 3 ->|<- 4 ->|<- 5 ->|<- 6 ->|<- 7 ->|<- 8 ->|<- 9 ->|<-10 ->|
So 4 values fit in 10 bytes.
This is maximally packed, with absolutely no wasted memory, but it means that every other value starts midway through a byte. The bit-twiddling required to access each value is costly.
To avoid that, we can sacrifice a little bit of memory, and ensure each value starts on a byte boundary:
Value |<-- 1 -->|###|<-- 2 -->|###|<-- 3 -->|###|<-- 4 -->|
Byte |<- 1 ->|<- 2 ->|<- 3 ->|<- 4 ->|<- 5 ->|<- 6 ->|<- 7 ->|<- 8 ->|<- 9 ->|<-10 ->|<-11 ->|<-12 ->|
Now 4 values require 12 bytes... but each value access requires no bit-twiddling. Which will be a lot faster.
You’ll probably want to perfectly pack it (as in solution 1) when reading/writing to disk, so this hybrid solution actually requires the most code. But it will give you perhaps the best trade-off of speed and memory efficiency.
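As a sketch of the hybrid idea, here is what byte-aligned storage of 20-bit values might look like (the class and its member names are illustrative): every value occupies a full 3 bytes, so access is three plain byte loads and a couple of shifts, with no cross-byte masking.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hybrid layout for 20-bit values: each value is padded out to a full
// 3 bytes (24 bits), wasting 4 bits per value, but every value starts
// on a byte boundary.
template <std::size_t N>
class PaddedADCArray20
{
    std::array<unsigned char, 3 * N> data{};

public:
    std::size_t size() const { return N; }

    // Three byte loads, two shifts, no cross-byte bit-twiddling.
    std::uint32_t at(std::size_t i) const
    {
        auto const p = 3 * i;
        return (std::uint32_t{data[p]} << 16)
             | (std::uint32_t{data[p + 1]} << 8)
             |  std::uint32_t{data[p + 2]};
    }

    void set(std::size_t i, std::uint32_t value)
    {
        auto const p = 3 * i;
        data[p]     = static_cast<unsigned char>(value >> 16);  // top byte holds the 4 pad bits
        data[p + 1] = static_cast<unsigned char>(value >> 8);
        data[p + 2] = static_cast<unsigned char>(value);
    }
};
```

The perfectly-packed form for disk would then be produced by a separate serialization step.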
I’m assuming solution 1
It’s not clear from your problem description which solution you actually want. You say you mostly care about disk usage and not runtime memory... but then you say you are okay with the costs of minimizing runtime memory. So... which is it? You will minimize the disk space either way (with any of solution 1, 2, or 3), so do you want to minimize the runtime memory at the cost of making processing expensive, or do you want good runtime performance? (Or half-and-half, as in solution 3?)
I’m going to assume you really don’t care about the runtime processing costs, and you really want to minimize the memory usage. Okay, in that case, we’re talking about packing the data in memory (as opposed to leaving it unpacked in memory, and just packing it when reading/writing to disk). So, solution 1.
Consider a view, not a container
Now, as I said, I’m not sure bitset
is the right tool for the job. I’m not even sure array
is the right tool, because:
- how often will you know at compile time how much data you have? and
- arrays are terrible for performance when they get huge.
When dealing with large amounts of data, even vector
gets a bit iffy. deque
is often a better choice, because you get most of the benefits of vector
, but because the data is chunked, it’s a lot easier on memory.
The moral here is that you want to avoid forcing a choice of container on users if at all possible. And... it is possible, because you can let users use whatever container they please, and just create a view over it.
Suppose I want to read some unknown amount of 12-bit values from some source. I could create an array of bytes, take a guess at the size, and then try to fill it. I don’t want to use a vector
, because a vector
will zero-initialize all the data, which is wasteful (because it will all be overwritten anyway). So I manually allocate an array, and then create a view over it:
// Allocate 1024 bytes of memory.
auto size = std::size_t(1024);
auto buffer = std::make_unique_for_overwrite<std::byte[]>(size);
// Create a view of 12-bit values over that memory. The view is smart enough
// to know how many 12-bit values you get with 1024 bytes of raw memory.
auto view = adc_view<12>{buffer.get(), buffer.get() + size};
// Read 12-bit values from wherever, and put them in the memory the view
// represents.
auto p = view.begin();
while (true)
{
while (p != view.end() and have_more_data())
*p++ = read_12bit_value();
// If we've read all the data, we're done.
if (not have_more_data())
break;
// If we have more data to read, then we need to reallocate, copy the existing
// data, and then continue reading after it.
// Reallocate:
auto temp = std::make_unique_for_overwrite<std::byte[]>(size + 1024);
// Copy old data over:
std::copy_n(buffer.get(), size, temp.get());
// Replace old memory buffer:
buffer = std::move(temp);
// Reset the view to the new memory:
view = adc_view<12>{buffer.get(), buffer.get() + size + 1024};
p = view.begin() + (size * 8) / 12; // offset in 12-bit values, not bytes
size += 1024;
}
// But if we read everything, then we *could* do a shrink-to-fit, or we could
// just accept a little wasted memory, and return a new view that is just the
// data read.
auto data = adc_view<12>(buffer.get(), buffer.get() + ((p - view.begin()) * 12 + 7) / 8);
// Now what you have is:
// data.begin() => an iterator to the first 12-bit value
// data.end() => an iterator to one-past-the-last 12-bit value
// data.size() => the number of 12-bit values
// data[i] => the i-th 12-bit value
// data.at(i) => "
// ... any other range/view functions you want
Now, in the example above, the underlying buffer is a std::unique_ptr<std::byte[]>... but it could be a std::vector<std::byte>, or a std::array<std::byte, N>, or a std::deque<std::byte>, or a custom circular buffer type, or... anything that is a sequence of bytes.
A view is a lightweight object that doesn’t own anything, but instead merely provides a view of some other data. In this case, you would have a view over a bunch of bytes, that presents those bytes as the N-bit values you want. It doesn’t matter what kind of object the view is looking at—could be a vector, an array, any sequence of bytes. This is maximal flexibility, at minimal cost.
I would suggest that instead of trying to create a container for N-bit values, you just make a view. Then you no longer need to worry about whether bitset
is a good fit, or vector<bool>
for dynamic sizes, or whether array<byte>
or vector<byte>
or deque<byte>
are a better choice, or whether you should allow custom allocators so users can use memory pools or stack memory or whatever else. Let the user decide all that crap, and just provide a view.
Code review
#include <bitset>
You’re missing a number of headers; not least being <array>
, <utility>
(for pair
), and more.
template<std::size_t ResolutionInNumberOfBits = 12, std::size_t NumberOfElements = 512>
class ADCArray {
Not sure it really makes much sense to provide defaults for the template parameters in this case.
Also... these names are a bit much. I mean, it’s a good thing to use descriptive names, but there comes a point when you’re just being silly. Is ResolutionInNumberOfBits
really that much more descriptive than just Bits
? And NumberOfElements
can just be N
, really; that’s kinda standard practice (it’s used that way even in the standard library, for example).
You should also consider putting all your stuff in your own namespace. The global namespace is a precious place, with a lot of special rules. I would suggest avoiding it if you can (and you can).
Also also, you probably want a bunch of static asserts to verify that the numbers make sense. ResolutionInNumberOfBits
probably can’t be zero, for example. And if you are going to use unsigned int
as the value type, it can’t be greater than the number of bits in an unsigned int
(but more on that later).
inline constexpr ADCArray() = default;
This is cool except for the inline
. In fact, all of the inline
s in your code are unnecessary. When you write a member function within the class, it’s inline by default anyway; and even if it weren’t, compilers are way smarter than you or I these days, and will inline or not on their own whims, completely ignoring it when we manually specify inline
.
I would also point out that you’ve tagged your code C++14. That’s... almost a decade old at this point. But alright, fine, if you want to peg your code to C++14, it could be worse... however, keep in mind that constexpr
was still fairly primitive in C++14. In particular, neither bitset
nor array
were constexpr
in C++14. (pair
was, I think... but that’s about it.)
inline constexpr static std::size_t size() { return NumberOfElements; }
This isn’t wrong... but... the standard form for ranges/containers/views has size()
as either a non-static member function... or (as of more recent versions of C++, like C++20) a non-member function. By making this function static... you’ve pretty much broken all the rules, and now your container won’t work with std::ranges::size()
, it won’t be detected as a sized range... basically nothing will work.
So either make this non-static, or make it a non-member.
If you are going to make it non-static, you should also make it const
and noexcept
(because it cannot fail).
inline constexpr static std::pair<std::size_t, std::size_t> GetRange() {
unsigned int max = 1;
for (std::size_t i = 0; i < ResolutionInNumberOfBits; i++) {
max *= 2;
}
return {0, max - 1};
}
This is a poorly named function. Without any comments to explain things (which is another issue; comment your code!), I have no clue what range this is getting.
It looks like it’s getting the maximum and minimum values of the value type of the container... but it doesn’t really make sense for that to be a property of the container. Consider array<int>
. It doesn’t have any functions that tell you what numeric_limits<int>::max()
or numeric_limits<int>::min()
are. Nor should it. That’s not information relevant to the array.
It’s also not information you should need an array to get. Consider if I just want to know the min and max values of a 12-bit unsigned number. Do I really need to do: ADCArray<12, DOES_NOT_MATTER>::GetRange().first
and ADCArray<12, DOES_NOT_MATTER>::GetRange().second
? Doesn’t it make more sense to have a separate, specific type, like something<12>::min
and something<12>::max
(where something
could be ADCValue
, I guess)?
Also, you don’t need a loop to calculate the max value. You can just do (1u << ResolutionInNumberOfBits) - 1
.
inline constexpr unsigned int at(size_t position) const {
assert(position < size());
unsigned int result = 0;
unsigned int powerOf2 = 1;
for (size_t i = 0; i < ResolutionInNumberOfBits; i++) {
if (data[position * ResolutionInNumberOfBits + i]) {
result += powerOf2;
}
powerOf2 *= 2;
}
return result;
}
You have hard-coded unsigned int
as the value type of the container... but is that wise?
It would be better to let the user specify what they want. Maybe they want unsigned int
. Maybe they want std::uint_fast32_t
. Let them choose. You could provide a sensible default, where it defaults to unsigned int
, of course.
This brings up another issue with using bitset
as the underlying type: if you are working with bits, you are strictly limiting the size. Let’s say SIZE_MAX
is 65,535. That means if you used an array of bytes, you could access up to 65,535 bytes (in theory). But with bits, you can only access up to 65,535... bits... which, assuming 8-bit bytes, is only ~8,000 bytes.
And, of course, you end up with silliness like this, where you have to go bit-by-bit to get the actual value. (As opposed to getting it up to a byte at a time, as I illustrated in the design review.)
Even so, this is not a particularly efficient, nor readable way to go about it. All you need to do is start at the starting bit, and go to the final bit, and set the bits in the result as needed, like so:
constexpr auto at(std::size_t position) const
{
auto result = 0u;
auto const start_pos = position * ResolutionInNumberOfBits;
auto const end_pos = start_pos + ResolutionInNumberOfBits;
for (auto i = start_pos; i != end_pos; ++i)
result = (result << 1) + data[i];
return result;
}
Simple as that. But, again, doing this bit-by-bit is not particularly efficient.
inline constexpr std::array<unsigned int, NumberOfElements> GetValues() const {
This doesn’t seem to be a particularly useful function. It would make more sense to have things like begin()
/end()
, and use an algorithm:
auto data = ADCArray<resolutionInBits, numberOfElements>{};
// ... fill data somehow...
auto values = std::array<unsigned int, numberOfElements>{};
std::copy(data.begin(), data.end(), values.begin());
// Or:
// std::copy_n(data.begin(), numberOfElements, values.begin());
// std::copy_n(data.begin(), data.size(), values.begin());
// C++20:
// std::ranges::copy(data, values.begin());
Doing it this way allows for values
to be a vector
, or anything else, rather than a hard-coded array.
inline constexpr void insert(size_t position, unsigned int value) {
assert(position < size());
assert(value <= GetRange().second);
for (size_t i = 0; i < ResolutionInNumberOfBits; i++) {
data[position * ResolutionInNumberOfBits + i] = value % 2;
value /= 2;
}
}
insert()
is not a great name for this function, because you are not actually inserting anything. You are, at best, replacing a value in the container, not inserting one. (But "replace" is not a standard name, so probably not a great choice. Maybe just "set", for this case?)
I would suggest looking at the containers library, and seeing what functions the standard containers have (check out the member function table), what those functions actually do, and then using the same interface on your own container. The standard container library is the standard container library... it’s the library all C++ coders know (or should know), and it is the metric for what all C++ coders expect from a container. You will almost never be able to match the standard interfaces exactly, because every container is different (even the standard containers all vary in their interface). But you should try to match them as much as possible, and certainly in spirit.
As for the actual algorithm here... I would suggest rethinking the way you’re doing it. It is much easier to work backwards here:
constexpr auto set(std::size_t position, unsigned int value)
{
auto const start_pos = position * ResolutionInNumberOfBits;
auto const end_pos = start_pos + ResolutionInNumberOfBits;
for (auto i = end_pos; i != start_pos; --i)
{
// Read the least-significant bit.
data[i - 1] = value & 0x1u;
// Shift.
value >>= 1;
}
}
This avoids the very expensive divisions and modulo operations.
inline constexpr std::array<byte, sizeof(data)> ToBytes() const {
std::array<byte, sizeof(data)> bytes;
const byte* begin = reinterpret_cast<const byte*>(std::addressof(data));
std::copy(begin, begin + sizeof(data), std::begin(bytes));
return bytes;
}
This is a really terrible idea. You don’t want to be digging into the guts of a class you have no control over, and no interface guarantees. You have no idea what’s going on inside std::bitset
. You seem to assume that under the hood it’s just an array of bytes with the bits you want packed into it in logical (network) order. I guarantee you that that assumption is at least sometimes wrong—if the number of bits is less than the number of bits in an unsigned int
, it is almost certain that it’s just using an unsigned int
, where the bits you want are possibly scattered all about due to endianness. And even that assumes that there is no padding, no extra data (like a size value stored before the bit data), or anything else going on.
Even if written correctly, this function is just GetValues()
all over again, which, again, doesn’t really serve much purpose. If you want to convert the data to an array of bytes, it would make much more sense to provide the public interface functions to allow it, and just let the user do it themselves.
inline constexpr ADCArray<ResolutionInNumberOfBits, NumberOfElements> static FromBytes(const std::array<byte, sizeof(data)>& bytes) {
auto array = ADCArray<ResolutionInNumberOfBits, NumberOfElements>();
byte* begin = reinterpret_cast<byte*>(std::addressof(array.data));
std::copy(std::begin(bytes), std::end(bytes), begin);
return array;
}
Same problems as with ToBytes()
, with the added bonus of sizeof(data)
. You assume that a bitset is absolutely nothing but a byte array containing the data you want. You have no way of knowing that; if bitset
has any other data members, and/or if they are ordered in a way you don’t expect, you’ll just be writing garbage into the bit set.
Incidentally, if you think this function is useful, you should seriously consider making a view rather than a container. If you have an array of bytes containing the raw data you want, why bother creating a whole other array just to copy that data into it? Why not just provide a lightweight view over the original data? There’s no need to copy anything.
inline constexpr ADCArray(const std::array<unsigned int, size()>& values) {
for (size_t i = 0; i < values.size(); i++) {
insert(i, values.at(i));
}
}
This isn’t wrong... but it’s terribly inflexible. What if the values are in a vector? Or literally anything else?
What you should do is provide a constructor that takes an iterator pair, so you can read values in from anything: an array, a vector, hell, even directly out of a file or network stream, using stream iterators.
And then you should consider providing a range constructor, that can not only take an array, but any other kind of container or view or range.
Now, if you are trying to do this in C++14... it’s not impossible, but it’s fraught. If you were doing this in C++20, it would be trivial:
template <std::input_iterator I, std::sentinel_for<I> S>
requires std::unsigned_integral<std::iter_value_t<I>>
constexpr ADCArray(I first, S last)
{
auto n = std::size_t{0};
while ((n < size()) and (first != last))
set(n++, *first++);
// If you want, check whether there's leftover data, or whether there was
// not enough, and handle it as you please.
}
template <std::ranges::input_range R>
requires std::unsigned_integral<std::range_value_t<R>>
constexpr ADCArray(R&& rng)
: ADCArray{std::ranges::begin(rng), std::ranges::end(rng)}
{}
Doing the same thing in C++14 is possible, but much, much trickier, requiring much more work.
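For a taste of what the C++14 version involves, here is a sketch using enable_if in place of concepts; the tiny stand-in container (with unpacked storage, for brevity) exists only so the snippet compiles on its own, and all the names are illustrative:

```cpp
#include <array>
#include <cstddef>
#include <iterator>
#include <type_traits>

// C++14-style constrained constructor: enable_if plays the role that the
// input_iterator / unsigned_integral concepts play in the C++20 version.
template <std::size_t N>
class MiniADC
{
    std::array<unsigned int, N> values_{};  // unpacked storage, for brevity

public:
    MiniADC() = default;

    std::size_t size() const { return N; }

    template <typename I,
              typename = typename std::enable_if<
                  std::is_unsigned<
                      typename std::iterator_traits<I>::value_type>::value>::type>
    MiniADC(I first, I last)
    {
        auto n = std::size_t{0};
        while (n < N && first != last)
            values_[n++] = *first++;
    }

    unsigned int at(std::size_t i) const { return values_.at(i); }
};
```

Note how much harder the constraint is to read (and how much worse the error messages get) compared to the concepts version above.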
Summary
- Put everything in a namespace.
- Don’t work with bits. Work with bytes (at least). (So, no bitset; use array<byte> or vector<byte> instead.)
- Don’t make a container, make a view. It’s much more flexible.
- Try to match standard interfaces wherever possible.
- static_assert everything that you assume must be true about your template parameters.
- Ditch all the inlines.
- Don’t assume anything about the internals of classes, not even standard library classes.
- C++14 is ancient. At least move to C++17; even Google is there now. Even better, move to C++20.
- Your proposed at() bit-reverses the value relative to the storage in the bitset. The highest-numbered bit will be the last one your loop reads, and will become the least-significant one in result = (result << 1) + data[i];. So even if a compiler miraculously saw through that mess and was able to load the containing 4-byte or 8-byte chunk and shift/mask it, it would need an ARM rbit instruction. (Most other ISAs don't have a bit-reverse instruction, only byte-reverse.) I'm assuming of course a normal implementation of std::bitset where [0] is in the byte with lowest address. (Peter Cordes, Sep 4, 2022 at 22:10)
- The OP's implementation is horrible in a different way, branching on each bit instead of shifting it into an integer, so yeah it needed changing. But it does put bits in the "right" order. If you have to write something like this, probably loop from the high to low bit, so at least in theory a sufficiently-smart compiler could do a 2-byte unaligned load starting at bitset + a byte offset of position * 3/2 or something, then right shift by 4 bits or not depending on whether position was odd. (Peter Cordes, Sep 4, 2022 at 22:16)
- Anyway, then set() can go forwards, starting from the lowest bit of the value at the lowest position in the bitset, like in the OP's code. (Peter Cordes, Sep 4, 2022 at 22:18)
- @PeterCordes I did not propose any at(). I just provided a very vague illustration of the idea, with code that was untested and frankly not even all that carefully thought out. I don't even know if it works. The text explains all that. (indi, Sep 5, 2022 at 0:03)
- It's pretty long so I was skimming. I think the general idea of at() working that way was the OP's idea; I meant your example implementation, looping forward and doing (result << 1) + data[i];. Agreed that it's mostly a moot point because it's a terrible idea to go bit-by-bit anyway. Since std::bitset<> doesn't provide a bit-range accessor itself, 100% agreed that we shouldn't use it at all, instead of building one on top of its bit-at-a-time interface. (Peter Cordes, Sep 5, 2022 at 0:06)
You're missing the header includes to provide std::array
, std::pair
, std::addressof()
, std::begin()
, std::end()
, std::copy()
and assert()
. Possibly others - these are just the ones I noticed.
You've misspelt std::size_t
as size_t
in more than one place.
You could use operator!=() = default
to simplify.
We seem to have an assumption that ResolutionInNumberOfBits
is no more than the number of bits in the platform's unsigned int
, but we really ought to check that.
Consider using a proper compression algorithm
Your method saves 25% of storage space, and is relatively simple to implement, but if the goal is to reduce memory and disk usage as much as possible, you can do much better than that.
For ADC values, it is likely that not all possible values are equally likely. Huffman coding is a way to store data in fewer bits, making use of the fact that some values are more common than others, and storing those common values in fewer bits. In fact, if every value between 0 and 4095 were equally likely, and no other 16-bit value ever occurred, Huffman coding would compress the data to exactly 12 bits per sample for you, but it can produce even better ratios depending on the distribution of values.
Often Huffman coding is not used on its own, but as part of a more sophisticated compression algorithm. For example, ZIP files, JPEG and MP3 files are all using Huffman at some point. But they first massage the data so it will be compressed even better.
Consider a typical signal that is being sampled by an ADC: it's probably a signal that doesn't jump from a random value to another between readouts, but is varying smoothly over time. Instead of compressing the value directly, you can instead just record the value of the initial sample, and then only record the difference between the current and the previous sample.
You can also decide to do lossy compression: if you know the signal or the ADC itself has some unwanted noise, you can apply a filter to the signal before compressing it.
There are many libraries available that can compress data for you.
# Inefficient code

You are optimizing the size requirements for your data, but your code is not as efficient as possible when you look at the CPU time required to pack and unpack the data.

Consider `GetRange()`: basically it returns the minimum and maximum value that you can store per sample. Using the left shift operator you can calculate this without a `for` loop:
```cpp
constexpr static std::pair<std::size_t, std::size_t> GetRange() {
    // Shift a std::size_t, not a plain int literal, so wide
    // resolutions don't overflow the shift.
    return {0, (std::size_t{1} << ResolutionInNumberOfBits) - 1};
}
```
In `at()` you can do something similar:

```cpp
constexpr unsigned int at(std::size_t position) const {
    assert(position < size());
    unsigned int result = 0;
    for (std::size_t i = 0; i < ResolutionInNumberOfBits; i++) {
        result |= static_cast<unsigned int>(data[position * ResolutionInNumberOfBits + i]) << i;
    }
    return result;
}
```
But reassembling a value bit by bit is not very efficient. If instead of a `std::bitset` you store the data inside a `std::array` of `uint64_t`, then you can often get a 12-bit value out of a `uint64_t` with just one shift and one bitwise AND operation, and in the worst case you need to read two consecutive `uint64_t` values and piece the two parts together.
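A sketch of that extraction, assuming samples are packed least-significant-bit first into the `uint64_t` array (the helper name and layout are illustrative):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Read the i-th 12-bit sample from a packed uint64_t array.
// Most samples sit entirely inside one word: one shift, one mask.
// A sample that straddles a word boundary splices in bits from the
// next word as well.
template <std::size_t NumWords>
uint32_t get12(const std::array<uint64_t, NumWords>& words, std::size_t i) {
    const std::size_t bit = i * 12;     // absolute bit offset of the sample
    const std::size_t word = bit / 64;  // containing word
    const std::size_t shift = bit % 64; // offset inside that word
    uint64_t value = words[word] >> shift;
    if (shift > 64 - 12) {
        // Sample crosses into the next word: fetch its low bits and
        // place them above the bits we already have.
        value |= words[word + 1] << (64 - shift);
    }
    return static_cast<uint32_t>(value & 0xFFFu);
}
```

With 12-bit fields, 64 / 12 is not an integer, so every sixth sample or so crosses a word boundary; the branch above handles exactly those cases.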
# Don't make assumptions about the memory layout of `std::bitset`

Don't assume that a `std::bitset` stores all bits consecutively in memory. While it is likely implemented that way, the C++ standard does not guarantee it. Your `ToBytes()` function is therefore not safe.
# Consider just writing load/save functions

Since you have a `ToBytes()` member function and a constructor that takes a `std::array<unsigned int>`, it looks like you have ADC samples stored in regular `int`s, then you convert these to a `std::bitset`, then you convert that to a `std::array` of bytes, and then you write that to disk? That sounds complicated, and it actually uses a lot more memory than necessary.
Consider keeping the data in memory in regular `int` types, and just have functions to load and save data to disk in 12-bit format. Such functions would not have to convert all the data at once: they can read the first four 16-bit values from memory, convert those to 12-bit format, save 6 bytes to disk, and then move on to the next group of 16-bit values. This way, very little extra memory is necessary. Of course, writing 6 bytes at a time is not ideal either; you probably want to write a larger batch at a time, but you certainly don't have to convert the whole array before writing it out.
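A sketch of the packing step such a save function could use (names are illustrative; it packs two 12-bit samples into 3 bytes, the same ratio as the four-samples-to-6-bytes grouping described above):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 12-bit samples held in uint16_t into a tight byte stream,
// two samples (24 bits) at a time. A save function would run this
// over one chunk of the in-memory array, write the resulting bytes
// to disk, then move on to the next chunk. Assumes an even number
// of samples for simplicity.
std::vector<uint8_t> pack12(const std::vector<uint16_t>& samples) {
    std::vector<uint8_t> out;
    out.reserve(samples.size() * 3 / 2);
    for (std::size_t i = 0; i + 1 < samples.size(); i += 2) {
        uint32_t a = samples[i] & 0xFFFu;
        uint32_t b = samples[i + 1] & 0xFFFu;
        out.push_back(static_cast<uint8_t>(a & 0xFF));                    // low 8 bits of a
        out.push_back(static_cast<uint8_t>((a >> 8) | ((b & 0xF) << 4))); // top 4 of a, low 4 of b
        out.push_back(static_cast<uint8_t>(b >> 4));                      // top 8 bits of b
    }
    return out;
}
```

Unpacking is the mirror image: read 3 bytes, reassemble two 12-bit values with shifts and ORs.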
- The current state of the art in general purpose lossless compression is ZSTD; about as good a ratio as DEFLATE (gzip) on most data, but much faster. And there are LZ4 and similar for light-weight fast compression and decompression. But yes, with any of these you'd probably want to do your own delta-encoding pass, since they won't do it for you. (Some archivers like `7z` have filters they can use for you, including a delta filter, but that's separate from libzstd functions.) – Peter Cordes, Sep 4, 2022 at 18:59
- Depending on how they need to transform their data, keeping it in memory in `unsigned short` format would make the most sense to me. But yeah, for reading or writing a file, convert in chunks, cache-blocked for L1d or L2 size. (e.g. pack 16 KiB down to 12 KiB, then write that buffer to a file. The kernel will memcpy that 12K somewhere into the pagecache, and the original buffer will still be hot at least in L2, maybe in a 32K L1d cache. Then convert another 16K to 12K (loading + storing that amount) and call the kernel again. It's a tradeoff of syscall overhead vs. cache locality.) – Peter Cordes, Sep 4, 2022 at 19:08
- Or if you keep it in memory in a 32-bit format, then you're loading 32K for every 12K of packed bytes you store. This can go pretty fast if the compiler manages to auto-vectorize it to SIMD shuffles, or if you manually vectorize with NEON or SSE, or especially AVX-512VBMI (`vpmultishiftqb`, a parallel bitfield extract; it can grab 8x unaligned 8-bit chunks from anywhere in the containing 8-byte SIMD element), if any of those are available on the machine you're running on. – Peter Cordes, Sep 4, 2022 at 19:10
- For ARM NEON, see Compacting data in buffer from 16 bit per element to 12 bits for a hand-written asm loop that the author suggests should run at better than 1 element (12 bits) converted per clock cycle. (On a typical low-end in-order ARM from 10 years ago.) – Peter Cordes, Sep 4, 2022 at 19:15
- For unpacking on an x86 with AVX2 or AVX-512VBMI, see SIMD unpack 12-bit fields to 16-bit. (This operation is useful for video / image formats with 12-bit color depth, to unpack into a format suitable for working with, so it's not surprising to find existing code for it.) – Peter Cordes, Sep 4, 2022 at 19:25