Strategy for Binary File Format Description to C++ Implementation

Question 1

I am dealing with a lot of legacy, reverse-engineered binary file formats, often with the original source code lost, and the code that reads and writes these files needs to be rewritten in C++.

I am wondering whether there are good examples or ideas for simplifying the process of converting documentation of a file format into code, with the goal of loading the data into a class that can be loaded, saved, and processed.

From my investigation so far, I think Boost.Serialization may be one of the best options (http://www.boost.org/doc/libs/1_61_0/libs/serialization/doc/), although I am not sure whether there is a simpler way using just C++ and the STL.

I am mostly concerned about the ease of describing the data, and minimizing rework for each new type of binary file format being worked on.

Comment 1

Boost.Serialization doesn't do what you think it does. It doesn't reconstruct an arbitrary sequence of bytes into C++ objects; it deconstructs a C++ object into a non-arbitrary sequence of bytes for later reconstruction (in other words, the serialization format is already well-defined). I'm afraid you'll probably have to do this the old-fashioned way, by writing custom code.

Comment 2

For the parsing of the binary data, many libraries and code generators are available (including Boost.Spirit). I'm not sure whether this is what you need, though.

Comment 3

OK, Boost.Spirit may be more appropriate for what I'm trying to achieve; I'll spend some time investigating that option.

Answer

I am wondering whether there are good examples or ideas for simplifying the process of converting documentation of a file format into code, with the goal of loading the data into a class that can be loaded, saved, and processed.

This can be solved at multiple levels:

You can use Boost.Spirit parsing, or a custom serializer/deserializer (as suggested in the comments).
You can hide the implementation behind a custom set of Boost.Iostreams device and buffer types.
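
One possible shape for the custom serializer/deserializer option is sketched below: each record describes its fields once, and the same description drives both reading and writing. The `Reader`, `Writer`, `ExampleRecord`, and `describe` names are hypothetical, and the sketch assumes little-endian unsigned integer fields only.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Decodes little-endian unsigned integers from a byte buffer.
// The buffer must outlive the Reader, which only keeps a reference to it.
class Reader {
public:
    explicit Reader(const Bytes& data) : data_(data) {}

    template <typename T>
    void field(T& value)
    {
        if (data_.size() - pos_ < sizeof(T))
            throw std::runtime_error("buffer too short");
        T v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            v = static_cast<T>(v | (static_cast<T>(data_[pos_ + i]) << (8 * i)));
        pos_ += sizeof(T);
        value = v;
    }

private:
    const Bytes& data_;
    std::size_t pos_ = 0;
};

// Encodes little-endian unsigned integers into a byte buffer.
// field() takes a non-const reference only so that it matches Reader::field.
class Writer {
public:
    template <typename T>
    void field(T& value)
    {
        for (std::size_t i = 0; i < sizeof(T); ++i)
            out_.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }

    const Bytes& bytes() const { return out_; }

private:
    Bytes out_;
};

// Hypothetical record: the field list is written once and used for
// both loading (with a Reader) and saving (with a Writer).
struct ExampleRecord {
    std::uint16_t id = 0;
    std::uint32_t offset = 0;
    std::uint8_t flags = 0;

    template <typename Archive>
    void describe(Archive& ar)
    {
        ar.field(id);
        ar.field(offset);
        ar.field(flags);
    }
};
```

With this arrangement, supporting a new format mostly means writing new `describe` functions; the byte-level code in `Reader` and `Writer` stays the same.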

I am mostly concerned about the ease of describing the data, and minimizing rework for each new type of binary file format being worked on.

I would do this by creating some custom types that map I/O bytes to semantic information, transparently to the user:

#include <cstdint>

/// Maps the raw file header word to BlaBla information.
class BlaBlaHeaderField
{
public:
    explicit BlaBlaHeaderField(std::uint32_t binary_header)
        : binary_header(binary_header) {}

    /// Custom property (interprets individual bits of the header).
    int BlaBlaParity() const { return binary_header & 0x01; }

private:
    std::uint32_t binary_header;
};

This way, the code itself will later serve as near-complete documentation of the format.
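
The same idea extends to fields that span several bits. The sketch below is only an illustration: the `bits` helper, the `ExampleHeaderWord` type, and the bit layout in the comment are all made up for this example.

```cpp
#include <cstdint>

// Extract `width` bits starting at bit `shift` (bit 0 = least significant).
// Precondition: shift < 32.
constexpr std::uint32_t bits(std::uint32_t word, unsigned shift, unsigned width)
{
    return (word >> shift) & ((width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u));
}

// Hypothetical header word (layout invented for illustration):
//   bit 0      parity
//   bits 1-3   format version
//   bits 8-15  channel count
class ExampleHeaderWord {
public:
    explicit constexpr ExampleHeaderWord(std::uint32_t raw) : raw_(raw) {}

    constexpr bool parity() const { return bits(raw_, 0, 1) != 0; }
    constexpr unsigned version() const { return bits(raw_, 1, 3); }
    constexpr unsigned channel_count() const { return bits(raw_, 8, 8); }
    constexpr std::uint32_t raw() const { return raw_; }

private:
    std::uint32_t raw_;
};
```

Each accessor names one entry of the format description, so the class reads much like the layout table it was written from.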

You can also use a union to overlay the fields on an integer type (for example std::uint32_t), although standard C++ only defines reading the union member that was last written.
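
A sketch of the overlay idea using `std::memcpy` rather than a union, which gives the same effect with defined behavior. The `ExampleRawHeader` struct and the `overlay` function are hypothetical, and the sketch assumes the file's byte order matches the host's and that the compiler inserts no padding (checked by the `static_assert`).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical on-disk layout; the fields are ordered so that no padding
// is needed on common platforms.
struct ExampleRawHeader {
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t record_count;
};

static_assert(sizeof(ExampleRawHeader) == 8,
              "unexpected padding in ExampleRawHeader");
static_assert(std::is_trivially_copyable<ExampleRawHeader>::value,
              "memcpy requires a trivially copyable type");

// Copy raw bytes into the struct. Values are interpreted in host byte
// order, so this only matches the file when both use the same order.
// Returns false if the buffer is too small to hold a full header.
inline bool overlay(const unsigned char* data, std::size_t size,
                    ExampleRawHeader& out)
{
    if (size < sizeof(ExampleRawHeader))
        return false;
    std::memcpy(&out, data, sizeof(ExampleRawHeader));
    return true;
}
```

When the file's byte order differs from the host's, each field would need a byte swap after the copy, or the fields can be decoded byte by byte instead.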

utnapistim · Accepted Answer · 2016-06-22 09:22:35Z
