The following code is my answer to a Stack Overflow question:
#include <string>
#include <vector>
#include <fstream>
#include <cstdio>
#include <cassert>
// #include <stdint.h> => DWORD
#ifndef int32_t
#define uint32_t DWORD
#define int32_t int
#endif
struct MapItem
{
std::string term;
std::vector<int32_t> pl;
// this function also emphasizes the way this objects are serialized to file; if something changes here,
// ReadNext & WriteNext must also be updated (unfortunate dependency)
size_t SizeWrittenToFile() const
{
return 2*sizeof(uint32_t)+term.length()+pl.size()*sizeof(int32_t);
}
};
bool ReadNext(std::istream& in, MapItem& item)
{
size_t size;
in.read(reinterpret_cast<char*>(&size), sizeof(uint32_t));
if (!in)
return false;
std::istreambuf_iterator<char> itIn(in);
std::string& out = item.term;
out.reserve(size);
out.clear(); // this is necessary if the string is not empty
// copy_n available in C++11
for (std::insert_iterator<std::string> itOut(out, out.begin()); in && (out.length() < size); itIn++, itOut++)
*itOut = *itIn;
assert(in);
if (!in)
return false;
in.read(reinterpret_cast<char*>(&size), sizeof(uint32_t));
if (!in)
return false;
std::vector<int32_t>& out2 = item.pl;
out2.resize(size); // reserve doesn't work here
in.read(reinterpret_cast<char*>(&out2[0]), size * sizeof(int32_t));
assert(in);
return true;
}
// a "header" should be added to mark complete data (to write "atomically")
bool WriteNext(std::ostream& out, const MapItem& item)
{
size_t size = item.term.length();
out.write(reinterpret_cast<const char*>(&size), sizeof(uint32_t));
if (!out)
return false;
out.write(item.term.c_str(), size);
if (!out)
return false;
size = item.pl.size();
out.write(reinterpret_cast<const char*>(&size), sizeof(uint32_t));
if (!out)
return false;
out.write(reinterpret_cast<const char*>(&item.pl[0]), size * sizeof(int32_t));
if (!out)
return false;
return true;
}
bool UpdateItem(std::ifstream& in, std::ofstream& out, const MapItem& item)
{
MapItem it;
bool result;
for (result = ReadNext(in, it); result && (it.term != item.term); result = ReadNext(in, it))
if (!WriteNext(out, it))
return false;
if (!result)
return false;
assert(it.term == item.term);
if (!WriteNext(out, item))
return false;
for (result = ReadNext(in, it); result; result = ReadNext(in, it))
if (!WriteNext(out, it))
return false;
return in.eof();
}
bool UpdateItem(const char* filename, const MapItem& item)
{
std::ifstream in(filename, std::ios_base::in | std::ios_base::binary);
assert(in);
std::string filename2(filename);
filename2 += ".tmp";
std::ofstream out(filename2.c_str(), std::ios_base::out | std::ios_base::binary);
assert(out);
bool result = UpdateItem(in, out, item);
in.close();
out.close();
int err = 0;
if (result)
{
err = remove(filename);
assert(!err && "remov_140");
result = !err;
}
if (!result)
{
err = remove(filename2.c_str());
assert(!err && "remov_147");
}
else
{
err = rename(filename2.c_str(), filename);
assert(!err && "renam_151");
result = !err;
}
return result;
}
Beside correctness, I'm concerned with two problems:
- performance when reading to string in
ReadNext
: "in && (out.length() < size)
" A better implementation would be capable of reading directly a block of data. - To update an item a second file is used. Is there a safe solution using only one file?
EDIT My interess goes beyond the initial Stack Overflow question as this is a (re)learning exercise of the C++ streams and I'll use it as a sample. I also added the code here, with additional real tests, VS 2008, but it should be easily ported to other platforms.
1 Answer 1
I'm going to focus on readability rather than performance for this review. I will also (hopefully) avoid C++11 features since you're using VS2008.
Change
<assert.h>
to<cassert>
.
Use the C++ headers instead of the C headers.Rename
mapItem
to eithermap_item
orMapItem
.
The former is favored in Unix/Linux circles while the latter is favored in Windows circles.Comment and explain the function
SizeWrittenToFile
.
Since you didn't explain the binary format, it's not clear why this returns what it does without looking at yourWriteNext
function.Depending on
sizeof (size_t)
can lead to incompatibilities for the binary file format if this program is compiled on different machines or even just with different build settings.For example, let's say I have:
Machine A -- Assume
sizeof (size_t)
returns 4 (Perhaps a 32-bit machine).
Machine B -- Assumesizeof (size_t)
returns 8 (Perhaps a 64-bit machine).If I write a binary file on A and copy it over to B, B will not be able to correctly read or update the file.
Having fixed width formats for the size fields would solve this problem.
This is subjective, but I think exceptions would make your code a lot cleaner to read than returning error codes.
See below.
performance when reading to string in
ReadNext
:in && (out.length() < size)
A better implementation would be capable of reading directly a block of data.I can't think of any standard way of getting around this.
A non-standard way would be to do:out.resize (size) ; in.read (&out[0], size) ;
I don't recommend it, but it should work on many modern compilers.
Because this is a binary file format, don't forget to use the
std::ios_base::binary
flag when opening files.I noticed you omitted that flag on your tests.
See below.
To update an item a second file is used. Is there a safe solution using only one file?
I actually like your two-file approach.
For a one-file approach, you would basically create a
std::vector <mapItem>
andReadNext
the entire file into memory. You would then modify whatever you want in memory and write everything back to the file afterwards.