I'm writing a program that makes heavy use of std::bitsets and occasionally needs to read/write these to file. std::bitset does overload the << and >> operators, but using these will result in an ASCII-encoded file (i.e. each 0 or 1 takes 1 byte), which is ~8x bigger than it would be with a bit-for-bit encoding.
I've seen a few questions on Stack Overflow relating to this, such as this question, but it seems there is no standard or easy way to do bitset I/O. I therefore set about writing a general bitset I/O class that is able to easily read and write multiple bitsets.
#include <iostream>
#include <vector>
#include <bitset>
template <std::size_t N>
class BitIo
{
public:
void push_back(const std::bitset<N>& bs)
{
std::vector<Byte> result((N + 7) >> 3);
for (int j = 0; j < int(N); ++j) {
result[j >> 3] |= (bs[j] << (j & 7));
}
for (const Byte& byte : result) {
bytes.push_back(byte);
}
num_bytes += NUM_BYTES_PER_BITSET;
}
std::bitset<N> pop_front()
{
std::bitset<N> result;
for (int j = 0; j < int(N); ++j) {
result[j] = ((bytes[(j >> 3) + offset] >> (j & 7)) & 1);
}
offset += NUM_BYTES_PER_BITSET;
num_bytes -= NUM_BYTES_PER_BITSET;
return result;
}
bool empty()
{
return num_bytes < NUM_BYTES_PER_BITSET;
}
void clear()
{
bytes.clear();
num_bytes = 0;
}
std::size_t size()
{
return num_bytes;
}
private:
using Byte = unsigned char;
static constexpr std::size_t NUM_BYTES_PER_BITSET = N / 8;
template <std::size_t T>
friend std::ostream& operator<<(std::ostream& os, const BitIo<T>& bio);
template <std::size_t T>
friend std::istream& operator>>(std::istream& is, BitIo<T>& bio);
std::istream& read_file(std::istream& is)
{
bytes.clear();
std::streampos current_pos, file_size;
current_pos = is.tellg();
is.seekg(0, std::ios::end);
file_size = is.tellg() - current_pos;
is.seekg(current_pos, std::ios::beg);
bytes.resize(file_size);
is.read((char*) &bytes[0], file_size);
num_bytes += file_size;
return is;
}
std::vector<Byte> bytes;
std::size_t offset = 0;
std::size_t num_bytes = 0;
};
template <std::size_t N>
std::ostream& operator<<(std::ostream& os, const BitIo<N>& bio)
{
for (const auto& byte : bio.bytes) {
os << byte;
}
return os;
}
template <std::size_t N>
std::istream& operator>>(std::istream& is, BitIo<N>& bio)
{
if(!is) {
is.setstate(std::ios::failbit);
}
bio.read_file(is);
return is;
}
Here is an example usage:
std::ofstream bin_out("~/bf.bin", std::ios::out | std::ios::binary);
BitIo<16> bio;
bio.push_back(std::bitset<16>("1001011010010110"));
bio.push_back(std::bitset<16>("0000000011111111"));
bio.push_back(std::bitset<16>("1111111100000000"));
bio.push_back(std::bitset<16>("0011001111001100"));
bin_out << bio;
bin_out.close(); // bf.bin is 8 bytes
std::ifstream bin_in("~/bf.bin", std::ios::binary);
BitIo<16> bio2;
bin_in >> bio2;
while (!bio2.empty()) {
std::cout << bio2.pop_front() << std::endl; // Prints the 4 16-bit bitsets in correct order.
}
I'm looking for any performance optimisations and design improvements.
At the moment, only one file can be read; it might be nice to be able to read multiple files into a single object. If anyone can suggest a method for doing this without impacting performance, that would be good!
2 Answers
You are using a std::vector for temporary storage inside push_back(). This is a possible point of optimization, since its size is constant ((N + 7) >> 3). You could use a std::array in this case to make sure no dynamic memory is allocated. If, however, you are concerned that N will in some cases be big enough to cause a stack overflow, then the vector would indeed be the best choice.
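As a rough sketch of the std::array variant, the packing step could look like the following. The names pack_bits and pack_bits_demo are hypothetical and not part of the original class; the bit order (bit 0 in the lowest bit of byte 0) is assumed to match the loop in push_back().

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in the original class): packs a bitset into a
// fixed-size byte array, so no dynamic memory is allocated.
template <std::size_t N>
std::array<unsigned char, (N + 7) / 8> pack_bits(const std::bitset<N>& bs)
{
    std::array<unsigned char, (N + 7) / 8> result{};  // all bytes start at zero
    for (std::size_t j = 0; j < N; ++j) {
        if (bs[j]) {
            // Bit j goes into byte j / 8, at bit position j % 8.
            result[j / 8] |= static_cast<unsigned char>(1u << (j % 8));
        }
    }
    return result;
}

// Usage sketch: 10 bits round up to 2 bytes.
inline void pack_bits_demo()
{
    const auto packed = pack_bits(std::bitset<10>("1000000001"));
    assert(packed.size() == 2);
    assert(packed[0] == 0x01);  // bit 0 of the bitset
    assert(packed[1] == 0x02);  // bit 9 is bit 1 of the second byte
}
```

The array lives on the stack, so the caveat about very large N still applies.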
Appending the vectors inside push_back() can be simplified:
for (const Byte& byte : result) {
bytes.push_back(byte);
}
You can use std::vector::insert():
bytes.insert(std::end(bytes), std::begin(result), std::end(result));
This is also likely to be more efficient, since insert() can compute the distance between begin and end and grow the vector once to the size that will be needed, instead of possibly reallocating several times.
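A minimal sketch of the range insert in isolation, assuming the temporary is a std::array as suggested earlier. The function name append_bytes is made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: appends a fixed-size block of bytes to a vector with
// a single range insert instead of one push_back() per byte.
template <std::size_t M>
void append_bytes(std::vector<unsigned char>& bytes,
                  const std::array<unsigned char, M>& block)
{
    const std::size_t old_size = bytes.size();
    bytes.insert(std::end(bytes), std::begin(block), std::end(block));
    assert(bytes.size() == old_size + M);  // exactly M bytes were appended
}
```

The same call works unchanged if the temporary stays a std::vector, since insert() only needs a pair of iterators.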
for (int j = 0; j < int(N); ++j)
This int(N) cast is silly. Declare j with std::size_t type.
Also, why are you keeping a separate byte count in num_bytes if the bytes vector has that same info in its size() method?
Avoid C-style casts:
is.read((char*) &bytes[0], file_size);
Change to:
is.read(reinterpret_cast<char *>(&bytes[0]), file_size);
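For context, here is one way the whole read step might be written with the C++-style cast. This is only a sketch: read_remaining is a hypothetical free function, it assumes a seekable stream, and it uses data() rather than &bytes[0] so that an empty vector is never indexed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this out
#include <string>
#include <vector>

// Hypothetical helper: reads everything from the current position to the end
// of a seekable stream into a byte vector.
std::vector<unsigned char> read_remaining(std::istream& is)
{
    const std::istream::pos_type start = is.tellg();
    is.seekg(0, std::ios::end);
    const std::istream::pos_type stop = is.tellg();
    is.seekg(start);
    if (!is) {
        return {};  // stream is not seekable or was already in a failed state
    }
    const std::streamoff count = stop - start;
    assert(count >= 0);
    std::vector<unsigned char> bytes(static_cast<std::size_t>(count));
    if (!bytes.empty()) {
        is.read(reinterpret_cast<char*>(bytes.data()), count);
        // Keep only what was actually read, in case the read came up short.
        bytes.resize(static_cast<std::size_t>(is.gcount()));
    }
    return bytes;
}
```

It can be tried without a file by passing a std::istringstream built from a std::string of raw bytes.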
Methods that don't mutate member state should be const:
bool empty() const;
std::size_t size() const;
Nice suggestions. The reason I'm keeping a byte count (num_bytes) and an offset into the vector (offset) is so I don't have to actually modify the underlying vector. – Daniel, Oct 20, 2014 at 9:05
@Daniel, Oh I see, you are not removing data on pop_back. Okay. – glampert, Oct 20, 2014 at 13:33
static constexpr std::size_t NUM_BYTES_PER_BITSET = N / 8;
I fear that by choosing N / 8 as NUM_BYTES_PER_BITSET you are underestimating the number of required bytes when N is not a multiple of 8. Writing does not overflow anything, since push_back() sizes its temporary vector with (N + 7) >> 3, but when reading, the offset may be wrong, because pop_front() advances by NUM_BYTES_PER_BITSET!
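The difference is easy to check at compile time. The sketch below uses a hypothetical helper, bytes_per_bitset, that rounds up in the same way as the (N + 7) >> 3 expression already used in push_back().

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes needed to hold n_bits bits, rounded up.
constexpr std::size_t bytes_per_bitset(std::size_t n_bits)
{
    return (n_bits + 7) / 8;
}

// For multiples of 8 both formulas agree.
static_assert(bytes_per_bitset(16) == 16 / 8, "16 bits fit in exactly 2 bytes");

// For other sizes, plain integer division drops the partial byte.
static_assert(10 / 8 == 1, "N / 8 rounds down to 1 byte for 10 bits");
static_assert(bytes_per_bitset(10) == 2, "10 bits actually need 2 bytes");
```

With N = 10, push_back() would append 2 bytes per bitset while the offset only advances by 1, so the second pop_front() would start in the middle of the first bitset.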
[...] std::cout << BitIO(myBitset) << "\n"; for output or std::cin >> BitIO(myBitset) for input.
[...] bitsets involved.
[...] std::copy() and std::ostream_iterator(). If I have a single bitset I don't need to create a vector to print it like this technique uses.
[...] std::copy(std::begin(data), std::end(data), std::ostream_iterator<BitIO>(std::cout));
[...] std::copy(std::istream_iterator<BitIO>(file), std::istream_iterator<BitIO>(), std::back_inserter(data));