I am attempting to read a binary file with a known data type that was generated in C++ with the following structure:
uint64_t shot;
uint32_t status;
double easting, northing, altitude;
uint32_t zoneNumber;
char zoneLetter;
float vNorth, vEast, vDown;
float qw, qx, qy, qz;
float acel_x, acel_y, acel_z, gyro_x, gyro_y, gyro_z;
float rollStdDev, pitchStdDev, yawStdDev;
float northingStdDev, eastingStdDev, altitudeStdDev;
float vNorthStdDev, vEastStdDev, vDownStdDev;
The only strangeness happening here is that the 'zoneletter' is a 1byte character with 3 bytes of padding following. So the character should start on byte 40 and then vNorth should start on byte 44.
This is my attempting at reading this binary file with python using numpy. I am unsure about if I converted the c++ doubles correctly, and I am pretty sure my problems are coming from reading the character.
The problem being that on the first iteration shot and status are reading correctly, but thereafter everything that is reading doesn't make any sense to what the actual data should look like. So I believe there is a problem with either the character or double reading from bytes 12-44.
dt2 = np.dtype([('ShotNum', np.uint64), ('Status', np.uint32), ('easting', np.float_),\
('northing', np.float_),('alt', np.float_), ('Znum', np.uint32),('letter', np.character),\
('vnorth', np.float32),('veast', np.float32),('vdown', np.float32),\
('qw', np.float32),('qx', np.float32),('qy', np.float32),('qz', np.float32),\
('acX', np.float32),('acY',np.float32),('acZ', np.float32),('gyX', np.float32),\
('gyY', np.float32),('gyZ', np.float32),('rollSTD', np.float32),('pitSTD', np.float32),\
('yawSTD', np.float32),('northSTD', np.float32),('eastSTD', np.float32),('altSTD', np.float32),\
('vnorthSTD', np.float32),('veastSTD', np.float32),('vDownSTD', np.float32)])
Here is the c++ explicit struct
[thread: 892] mainwindow.cpp(65): Sizeof NavigationSolution: 136 bytes
[thread: 892] mainwindow.cpp(66): shot
[thread: 892] mainwindow.cpp(67): bytes: 8
[thread: 892] mainwindow.cpp(68): offset: 0
[thread: 892] mainwindow.cpp(69): status
[thread: 892] mainwindow.cpp(70): bytes: 4
[thread: 892] mainwindow.cpp(71): offset: 8
[thread: 892] mainwindow.cpp(72): easting
[thread: 892] mainwindow.cpp(73): bytes: 8
[thread: 892] mainwindow.cpp(74): offset: 16
[thread: 892] mainwindow.cpp(75): northing
[thread: 892] mainwindow.cpp(76): bytes: 8
[thread: 892] mainwindow.cpp(77): offset: 24
[thread: 892] mainwindow.cpp(78): altitude
[thread: 892] mainwindow.cpp(79): bytes: 8
[thread: 892] mainwindow.cpp(80): offset: 32
[thread: 892] mainwindow.cpp(81): zoneNumber
[thread: 892] mainwindow.cpp(82): bytes: 4
[thread: 892] mainwindow.cpp(83): offset: 40
[thread: 892] mainwindow.cpp(84): zoneLetter
[thread: 892] mainwindow.cpp(85): bytes: 1
[thread: 892] mainwindow.cpp(86): offset: 44
[thread: 892] mainwindow.cpp(87): vNorth
[thread: 892] mainwindow.cpp(88): bytes: 4
[thread: 892] mainwindow.cpp(89): offset: 48
[thread: 892] mainwindow.cpp(90): vEast
[thread: 892] mainwindow.cpp(91): bytes: 4
[thread: 892] mainwindow.cpp(92): offset: 52
[thread: 892] mainwindow.cpp(93): vDown
[thread: 892] mainwindow.cpp(94): bytes: 4
[thread: 892] mainwindow.cpp(95): offset: 56
...etc
2 Answers 2
You have to account for the padding that the compiler adds to the struct. For example, the offset of easting is 16 bytes, not 12. The compiler added four bytes of padding, presumably to give easting an eight-byte alignment.
If this work is part of a long-term project, think twice about using this format. It is compiler-dependent. If the C++ code that generates the binary file is compiled with a different compiler (or perhaps even with same compiler but different compiler options), the padding in the binary file could change, and the Python code would not read the file correctly.
Comments
Warren Weckesser pointed out that your problems stemmed from missing padding in your NumPy dtype. While it's true that you can fix it manually by adding explicit pad fields, you can also fix it automatically with the align=True option to np.dtype:
align : bool, optional
Add padding to the fields to match what a C compiler would output for a similar C-struct.
Also, you might consider representing your structure in NumPy with nested arrays, for example instead of this:
('qw', np.float32),('qx', np.float32),('qy', np.float32),('qz', np.float32)
You could do:
('q', np.float32, 4)
Then any value in arr.q will be an array of length 4, which depending on your application may be easier to deal with.
3 Comments
align=False (the default) in NumPy, and compiler-specific pragmas to enable structure packing in C++. Then, add explicit padding back in if required.double easting would have no padding before it on 32-bit x86 (we can surmise OP is using a 64-bit system).
('letter', np.character)will work. You need to account for all four bytes (letter and padding). Try replacing that with('letter', 'S1'), ('padding', 'S3').struct, and was thatstructwritten out as a single block of memory? Or is each field written separately? If the former, be aware that the compiler may add padding to the struct (perhaps to improve memory alignment). For example, the compiler might have added four bytes after thestatusfield to giveeastingan eight-byte alignment.