\$\begingroup\$

I'm writing a program that makes heavy use of std::bitsets and occasionally needs to read/write them to a file. std::bitset does overload the << and >> operators, but using them produces an ASCII-encoded file (each bit is written as a one-byte '0' or '1' character), which is ~8x bigger than a bit-for-bit encoding would be.
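
As a rough illustration of that size difference, here is a minimal sketch. The helper names text_size and packed_size are made up for this example and are not part of the class below.

```cpp
#include <bitset>
#include <cstddef>
#include <sstream>
#include <string>

// Number of characters std::bitset's operator<< writes: one '0' or '1' per bit.
template <std::size_t N>
std::size_t text_size(const std::bitset<N>& bs)
{
    std::ostringstream os;
    os << bs;
    return os.str().size();
}

// Number of bytes needed when eight bits are packed into each byte.
template <std::size_t N>
constexpr std::size_t packed_size()
{
    return (N + 7) / 8;
}
```

For a std::bitset<16>, text_size reports 16 characters while packed_size<16>() is 2 bytes.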

I've seen a few questions on Stack Overflow relating to this, such as this question, but it seems there is no standard or easy way to do bitset I/O. I therefore set about writing a general bitset I/O class that can easily read and write multiple bitsets.

#include <iostream>
#include <vector>
#include <bitset>

template <std::size_t N>
class BitIo
{
public:
    void push_back(const std::bitset<N>& bs)
    {
        std::vector<Byte> result((N + 7) >> 3);
        for (int j = 0; j < int(N); ++j) {
            result[j >> 3] |= (bs[j] << (j & 7));
        }
        for (const Byte& byte : result) {
            bytes.push_back(byte);
        }
        num_bytes += NUM_BYTES_PER_BITSET;
    }

    std::bitset<N> pop_front()
    {
        std::bitset<N> result;
        for (int j = 0; j < int(N); ++j) {
            result[j] = ((bytes[(j >> 3) + offset] >> (j & 7)) & 1);
        }
        offset += NUM_BYTES_PER_BITSET;
        num_bytes -= NUM_BYTES_PER_BITSET;
        return result;
    }

    bool empty()
    {
        return num_bytes < NUM_BYTES_PER_BITSET;
    }

    void clear()
    {
        bytes.clear();
        num_bytes = 0;
    }

    std::size_t size()
    {
        return num_bytes;
    }

private:
    using Byte = unsigned char;
    static constexpr std::size_t NUM_BYTES_PER_BITSET = N / 8;

    template <std::size_t T>
    friend std::ostream& operator<<(std::ostream& os, const BitIo<T>& bio);
    template <std::size_t T>
    friend std::istream& operator>>(std::istream& is, BitIo<T>& bio);

    std::istream& read_file(std::istream& is)
    {
        bytes.clear();
        std::streampos current_pos, file_size;
        current_pos = is.tellg();
        is.seekg(0, std::ios::end);
        file_size = is.tellg() - current_pos;
        is.seekg(current_pos, std::ios::beg);
        bytes.resize(file_size);
        is.read((char*) &bytes[0], file_size);
        num_bytes += file_size;
        return is;
    }

    std::vector<Byte> bytes;
    std::size_t offset = 0;
    std::size_t num_bytes = 0;
};

template <std::size_t N>
std::ostream& operator<<(std::ostream& os, const BitIo<N>& bio)
{
    for (const auto& byte : bio.bytes) {
        os << byte;
    }
    return os;
}

template <std::size_t N>
std::istream& operator>>(std::istream& is, BitIo<N>& bio)
{
    if (!is) {
        is.setstate(std::ios::failbit);
    }
    bio.read_file(is);
    return is;
}

Here is an example usage:

std::ofstream bin_out("~/bf.bin", std::ios::out | std::ios::binary);
BitIo<16> bio;
bio.push_back(std::bitset<16>("1001011010010110"));
bio.push_back(std::bitset<16>("0000000011111111"));
bio.push_back(std::bitset<16>("1111111100000000"));
bio.push_back(std::bitset<16>("0011001111001100"));
bin_out << bio;
bin_out.close(); // bf.bin is 8 bytes
std::ifstream bin_in("~/bf.bin", std::ios::binary);
BitIo<16> bio2;
bin_in >> bio2;
while (!bio2.empty()) {
    std::cout << bio2.pop_front() << std::endl; // Prints the four 16-bit bitsets in the order they were pushed.
}

I'm looking for any performance optimisations and design improvements.

At the moment, only one file can be read; it might be nice to be able to read multiple files into a single object. If anyone can suggest a method for doing this without impacting performance, that would be good!

asked Oct 17, 2014 at 13:47
\$\endgroup\$
  • \$\begingroup\$ Not the interface I would want. I would have liked to go: std::cout << BitIO(myBitset) << "\n"; for output or std::cin >> BitIO(myBitset) for input. \$\endgroup\$ Commented Oct 18, 2014 at 17:22
  • \$\begingroup\$ I don't see how that could work well when there are multiple bitsets involved. \$\endgroup\$ Commented Oct 20, 2014 at 9:03
  • \$\begingroup\$ That's another problem I have with this code. Buffering everything in a vector before printing makes the whole interface terrible to use. If I already have multiple bitsets (let's say in a vector, or any other container), the technique from my comment works beautifully with std::copy() and std::ostream_iterator. If I have a single bitset, I don't need to create a vector just to print it, as this code does. \$\endgroup\$ Commented Oct 20, 2014 at 17:19
  • \$\begingroup\$ std::copy(std::begin(data), std::end(data), std::ostream_iterator<BitIO>(std::cout)); \$\endgroup\$ Commented Oct 20, 2014 at 17:20
  • \$\begingroup\$ std::copy(std::istream_iterator<BitIO>(file), std::istream_iterator<BitIO>(), std::back_inserter(data)); \$\endgroup\$ Commented Oct 20, 2014 at 17:22

2 Answers

\$\begingroup\$

You are using a std::vector for temporary storage inside push_back(). This is a possible point of optimization, since its size is a compile-time constant, (N + 7) >> 3. You could use a std::array here so that no dynamic memory is allocated. If, however, you are concerned that N may in some cases be big enough to cause a stack overflow, then the vector is indeed the better choice.
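
A minimal sketch of the std::array variant, written as a free function so that it stands alone. The name pack is hypothetical, and the bit order follows the loop in the question. Note the empty braces: unlike the sized std::vector constructor, a default-initialized std::array of unsigned char is not zeroed, and the |= needs zeroed bytes.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

// Hypothetical free function: packs a bitset into a fixed-size byte array,
// lowest bit of each byte first, as push_back() does in the question.
template <std::size_t N>
std::array<unsigned char, (N + 7) / 8> pack(const std::bitset<N>& bs)
{
    std::array<unsigned char, (N + 7) / 8> result{};   // {} zero-fills the bytes
    for (std::size_t j = 0; j < N; ++j) {
        result[j >> 3] |= static_cast<unsigned char>(bs[j] << (j & 7));
    }
    return result;
}
```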


Appending the vectors inside push_back() can be simplified:

for (const Byte& byte : result) {
    bytes.push_back(byte);
}

You can use std::vector::insert():

bytes.insert(std::end(bytes), std::begin(result), std::end(result));

This is also more efficient, since insert() can compute the distance between begin and end and grow the storage at most once, instead of possibly reallocating on several of the individual push_back() calls.


for (int j = 0; j < int(N); ++j)

This int(N) cast is silly. Declare j with std::size_t type.

Also, why are you keeping a separate byte count in num_bytes if the bytes vector has that same info in its size() method?


Avoid C-style casts:

is.read((char*) &bytes[0], file_size);

Change to:

is.read(reinterpret_cast<char *>(&bytes[0]), file_size);

Methods that don't mutate member state should be const:

bool empty() const;
std::size_t size() const;
answered Oct 17, 2014 at 18:44
\$\endgroup\$
  • \$\begingroup\$ Nice suggestions. The reason I'm keeping a byte count (num_bytes) and an offset into the vector (offset) is so I don't have to actually modify the underlying vector. \$\endgroup\$ Commented Oct 20, 2014 at 9:05
  • \$\begingroup\$ @Daniel, Oh I see, you are not removing data on pop_front. Okay. \$\endgroup\$ Commented Oct 20, 2014 at 13:33
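
Keeping the read offset does not by itself require a second counter: the number of unread bytes can be derived from the vector's size and the offset. A reduced sketch of that bookkeeping follows; ByteCursor and its member names are hypothetical and only model the idea.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reduced model: only the byte buffer and a read offset are
// stored; the remaining count is computed on demand.
class ByteCursor
{
public:
    void append(unsigned char byte) { bytes.push_back(byte); }

    // Bytes not yet consumed.
    std::size_t remaining() const { return bytes.size() - offset; }

    // Marks up to `count` bytes as consumed without erasing them.
    void consume(std::size_t count)
    {
        offset += (count < remaining()) ? count : remaining();
    }

    // Resets both the buffer and the read position.
    void clear()
    {
        bytes.clear();
        offset = 0;
    }

private:
    std::vector<unsigned char> bytes;
    std::size_t offset = 0;
};
```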
\$\begingroup\$
static constexpr std::size_t NUM_BYTES_PER_BITSET = N / 8;

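I fear that by choosing this as NUM_BYTES_PER_BITSET you are underestimating the number of required bytes when N is not a multiple of 8, because integer division truncates: for N = 12 it yields 1, although push_back() appends (N + 7) >> 3 = 2 bytes. Writing still works, since the vector simply grows, but when reading, offset advances by too few bytes per bitset, so every pop_front() after the first starts at the wrong position!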
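
The difference between the two formulas can be checked at compile time. This is only a sketch; the function names are made up for the illustration.

```cpp
#include <cstddef>

// Truncating division, as in the question: drops a partial final byte.
constexpr std::size_t bytes_truncated(std::size_t bits) { return bits / 8; }

// Rounding up, matching the (N + 7) >> 3 used for the temporary in push_back().
constexpr std::size_t bytes_rounded_up(std::size_t bits) { return (bits + 7) / 8; }

static_assert(bytes_truncated(16) == bytes_rounded_up(16), "multiples of 8 agree");
static_assert(bytes_truncated(12) == 1, "12 bits truncate to 1 byte");
static_assert(bytes_rounded_up(12) == 2, "12 bits occupy 2 bytes");
```

Defining NUM_BYTES_PER_BITSET with the rounded-up expression would keep the write and read sides consistent for any N.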

answered Feb 14, 2016 at 8:46
\$\endgroup\$
