I am dealing with a lot of legacy, reverse-engineered binary file formats, often with lost source code, and the code for reading and writing these files needs to be rewritten in C++.
I am wondering if there are good examples or ideas for simplifying the process of converting documentation of a file format into code, with the goal of loading the data into a class that can be loaded, saved, and processed.
From my investigation so far, I think Boost.Serialization may be one of the best options ( http://www.boost.org/doc/libs/1_61_0/libs/serialization/doc/ ), although I am not sure whether there is a simpler way using just C++ and the STL.
I am mostly concerned about the ease of describing the data, and minimizing rework for each new type of binary file format being worked on.
- Boost Serialization doesn't do what you think it does. It doesn't reconstruct an arbitrary sequence of bytes into C++ objects; it deconstructs a C++ object into a non-arbitrary sequence of bytes for later reconstruction (in other words, the serialization format is already well-defined). I'm afraid you'll probably have to do this the old-fashioned way, by writing custom code. – Robert Harvey, Jun 21, 2016 at 14:20
- For the parsing of the binary data, many libraries and code generators are available (including Boost.Spirit). I'm not sure whether this is what you need, though. – 5gon12eder, Jun 21, 2016 at 21:50
- OK, Boost.Spirit may be more appropriate for what I'm trying to achieve. I'll spend some time investigating that option. – Malcolm McCaffery, Jun 22, 2016 at 2:07
1 Answer
I am wondering if there are good examples or ideas for simplifying the process of converting documentation of a file format into code, with the goal of loading the data into a class that can be loaded, saved, and processed.
This can be solved at multiple levels:
You can use Boost.Spirit parsing, or a custom serializer/deserializer (as suggested in the comments).
You can hide the implementation behind a custom set of Boost.Iostreams device and stream buffer types.
I am mostly concerned about the ease of describing the data, and minimizing rework for each new type of binary file format being worked on.
I would do this by creating some custom types that map i/o bytes to semantic information, transparently to the user:
#include <cstdint>

/// Maps the raw file header word onto BlaBla information.
class BlaBlaHeaderField
{
public:
    explicit BlaBlaHeaderField(std::uint32_t raw) : binary_header(raw) {}

    /// Custom property (interprets the lowest bit of the header word).
    int BlaBlaParity() const { return binary_header & 0x01; }

private:
    std::uint32_t binary_header;
};
This way, the format will be close to self-documenting from the code, later.
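To cut down on repeated shift-and-mask code when a header word packs several fields, one possible extension is a small template that names each field by bit offset and width. This is only a sketch: `BitField`, `ExampleHeaderWord`, and the field layout are assumptions made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: extracts Width bits starting at bit Offset
// (bit 0 is the least significant bit) from a 32-bit word.
template <unsigned Offset, unsigned Width>
struct BitField
{
    static_assert(Width > 0 && Offset + Width <= 32,
                  "field must fit inside a 32-bit word");

    static std::uint32_t get(std::uint32_t word)
    {
        const std::uint32_t mask = 0xFFFFFFFFu >> (32 - Width);
        return (word >> Offset) & mask;
    }
};

// Hypothetical layout: bit 0 = parity, bits 1-7 = version,
// bits 8-31 = payload length.
class ExampleHeaderWord
{
public:
    explicit ExampleHeaderWord(std::uint32_t raw) : raw_(raw) {}

    std::uint32_t parity()  const { return BitField<0, 1>::get(raw_); }
    std::uint32_t version() const { return BitField<1, 7>::get(raw_); }
    std::uint32_t length()  const { return BitField<8, 24>::get(raw_); }

private:
    std::uint32_t raw_;
};
```

Each accessor then reads almost like a line of the format documentation, which is the self-documenting effect described above.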
You can also use a union and overlay the fields with an integer/long/whatever.
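One caveat on the overlay idea: in standard C++, reading a union member other than the one last written is undefined behaviour, even though many compilers tolerate it. The sketch below therefore swaps in `std::memcpy` to get the same overlay effect portably. It assumes the file's byte order matches the host's and that the struct has no padding; `ExampleRecord` and `overlay` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical fixed-layout record; the static_assert guards against
// the compiler inserting padding that the file does not contain.
struct ExampleRecord
{
    std::uint32_t id;
    std::uint16_t kind;
    std::uint16_t flags;
};
static_assert(sizeof(ExampleRecord) == 8, "unexpected padding in ExampleRecord");

// Copies raw bytes into a T. Assumes the data uses the host byte order.
template <typename T>
T overlay(const unsigned char* data, std::size_t size)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "overlay requires a trivially copyable type");
    if (size < sizeof(T))
        throw std::length_error("buffer too small for overlay");
    T out;
    std::memcpy(&out, data, sizeof(T));
    return out;
}
```

If the file's byte order differs from the host's, each multi-byte field would still need to be byte-swapped after the copy.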