Reading a Binary File that was generated with C++ data types Using Numpy

Question 1

I am attempting to read a binary file with a known data type that was generated in C++ with the following structure:

 uint64_t shot;
 uint32_t status;
 double easting, northing, altitude;
 uint32_t zoneNumber;
 char zoneLetter;
 float vNorth, vEast, vDown;
 float qw, qx, qy, qz;
 float acel_x, acel_y, acel_z, gyro_x, gyro_y, gyro_z;
 float rollStdDev, pitchStdDev, yawStdDev;
 float northingStdDev, eastingStdDev, altitudeStdDev;
 float vNorthStdDev, vEastStdDev, vDownStdDev;

The only strangeness happening here is that the 'zoneletter' is a 1byte character with 3 bytes of padding following. So the character should start on byte 40 and then vNorth should start on byte 44.

This is my attempting at reading this binary file with python using numpy. I am unsure about if I converted the c++ doubles correctly, and I am pretty sure my problems are coming from reading the character.

The problem being that on the first iteration shot and status are reading correctly, but thereafter everything that is reading doesn't make any sense to what the actual data should look like. So I believe there is a problem with either the character or double reading from bytes 12-44.

dt2 = np.dtype([('ShotNum', np.uint64), ('Status', np.uint32), ('easting', np.float_),\
 ('northing', np.float_),('alt', np.float_), ('Znum', np.uint32),('letter', np.character),\
 ('vnorth', np.float32),('veast', np.float32),('vdown', np.float32),\
 ('qw', np.float32),('qx', np.float32),('qy', np.float32),('qz', np.float32),\
 ('acX', np.float32),('acY',np.float32),('acZ', np.float32),('gyX', np.float32),\
 ('gyY', np.float32),('gyZ', np.float32),('rollSTD', np.float32),('pitSTD', np.float32),\
 ('yawSTD', np.float32),('northSTD', np.float32),('eastSTD', np.float32),('altSTD', np.float32),\
 ('vnorthSTD', np.float32),('veastSTD', np.float32),('vDownSTD', np.float32)])

Here is the c++ explicit struct

[thread: 892] mainwindow.cpp(65): Sizeof NavigationSolution: 136 bytes
[thread: 892] mainwindow.cpp(66): shot
[thread: 892] mainwindow.cpp(67): bytes: 8
[thread: 892] mainwindow.cpp(68): offset: 0
[thread: 892] mainwindow.cpp(69): status
[thread: 892] mainwindow.cpp(70): bytes: 4
[thread: 892] mainwindow.cpp(71): offset: 8
[thread: 892] mainwindow.cpp(72): easting
[thread: 892] mainwindow.cpp(73): bytes: 8
[thread: 892] mainwindow.cpp(74): offset: 16
[thread: 892] mainwindow.cpp(75): northing
[thread: 892] mainwindow.cpp(76): bytes: 8
[thread: 892] mainwindow.cpp(77): offset: 24
[thread: 892] mainwindow.cpp(78): altitude
[thread: 892] mainwindow.cpp(79): bytes: 8
[thread: 892] mainwindow.cpp(80): offset: 32
[thread: 892] mainwindow.cpp(81): zoneNumber
[thread: 892] mainwindow.cpp(82): bytes: 4
[thread: 892] mainwindow.cpp(83): offset: 40
[thread: 892] mainwindow.cpp(84): zoneLetter
[thread: 892] mainwindow.cpp(85): bytes: 1
[thread: 892] mainwindow.cpp(86): offset: 44
[thread: 892] mainwindow.cpp(87): vNorth
[thread: 892] mainwindow.cpp(88): bytes: 4
[thread: 892] mainwindow.cpp(89): offset: 48
[thread: 892] mainwindow.cpp(90): vEast
[thread: 892] mainwindow.cpp(91): bytes: 4
[thread: 892] mainwindow.cpp(92): offset: 52
[thread: 892] mainwindow.cpp(93): vDown
[thread: 892] mainwindow.cpp(94): bytes: 4
[thread: 892] mainwindow.cpp(95): offset: 56

...etc

Question 2

"I am pretty sure my problems ..." You haven't explained your problems. That makes it difficult to help you solve them. :)

Question 3

I don't think ('letter', np.character) will work. You need to account for all four bytes (letter and padding). Try replacing that with ('letter', 'S1'), ('padding', 'S3').

Question 4

edited problem for a little more clarity. Ill give it a shot Warren

Question 5

I tried your solution Warren, and the output still doesn't make sense. I am unsure if a C++ double is a python numpy float_

Question 6

Exactly how was the binary file created? Are the C++ variables that you show actually the fields in a struct, and was that struct written out as a single block of memory? Or is each field written separately? If the former, be aware that the compiler may add padding to the struct (perhaps to improve memory alignment). For example, the compiler might have added four bytes after the status field to give easting an eight-byte alignment.

Question 7

You have to account for the padding that the compiler adds to the struct. For example, the offset of easting is 16 bytes, not 12. The compiler added four bytes of padding, presumably to give easting an eight-byte alignment.

If this work is part of a long-term project, think twice about using this format. It is compiler-dependent. If the C++ code that generates the binary file is compiled with a different compiler (or perhaps even with same compiler but different compiler options), the padding in the binary file could change, and the Python code would not read the file correctly.

Question 8

Warren Weckesser pointed out that your problems stemmed from missing padding in your NumPy dtype. While it's true that you can fix it manually by adding explicit pad fields, you can also fix it automatically with the align=True option to np.dtype:

align : bool, optional

Add padding to the fields to match what a C compiler would output for a similar C-struct.

Also, you might consider representing your structure in NumPy with nested arrays, for example instead of this:

('qw', np.float32),('qx', np.float32),('qy', np.float32),('qz', np.float32)

You could do:

('q', np.float32, 4)

Then any value in arr.q will be an array of length 4, which depending on your application may be easier to deal with.

Question 9

Great tip! Is there a C/C++ standard for padding the fields of a struct? Or does it just happen to work out that the popular C/C++ compilers all do the same thing?

Question 10

@WarrenWeckesser: It is implementation-defined. However, if OP is happy with the implementation-defined nature of his C++ structs, he may be satisfied with the same in Python. If OP wants to be cross-platform, the solution is the opposite: align=False (the default) in NumPy, and compiler-specific pragmas to enable structure packing in C++. Then, add explicit padding back in if required.

Question 11

As an example, double easting would have no padding before it on 32-bit x86 (we can surmise OP is using a 64-bit system).

Warren Weckesser 116k20 gold badges207 silver badges224 bronze badges · Accepted Answer · 2017-11-25 23:44:44Z

You have to account for the padding that the compiler adds to the struct. For example, the offset of easting is 16 bytes, not 12. The compiler added four bytes of padding, presumably to give easting an eight-byte alignment.

If this work is part of a long-term project, think twice about using this format. It is compiler-dependent. If the C++ code that generates the binary file is compiled with a different compiler (or perhaps even with same compiler but different compiler options), the padding in the binary file could change, and the Python code would not read the file correctly.

CollectivesTM on Stack Overflow

Reading a Binary File that was generated with C++ data types Using Numpy

2 Answers 2

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

2 Answers 2

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related