I am writing a Python 3 script that writes some numbers to a file in binary form, and I ran into something strange. For example, the following code writes an unsigned short and a float to a file named tmp:
    import struct
    with open('tmp', "wb") as f:
        id1 = 1
        i = 0.5785536878880112
        fmt = "Hf"
        data = struct.pack('Hf', id1, i)
        f.write(data)
        print("no. of bytes:%d"%struct.calcsize(fmt))
According to the docs, "H" (unsigned short) is 2 bytes and "f" (float) is 4 bytes, so I'd expect a 6-byte file. However, the output is 8 bytes:
    01 00 00 00 18 1c 14 3f
This is confirmed by struct.calcsize(fmt), which reports that "Hf" is 8 bytes in size.
If I pack and write the two values separately, e.g.
    data = struct.pack('H', id1)
    f.write(data)
    data = struct.pack('f', i)
    f.write(data)
then the output is the expected 6-byte file:
    01 00 18 1c 14 3f
What is happening here?
What Tim said. If you put the float before the unsigned short, you won't need padding, because everything will already be aligned correctly. – PM 2Ring, Nov 16, 2015 at 10:59
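The reordering suggested in this comment can be tried with a minimal sketch like the one below. It reuses the values from the question and stays in native mode; the variable name reordered is only illustrative. It relies on the documented behavior that struct inserts padding only between members, never at the end.

```python
import struct

id1 = 1
i = 0.5785536878880112

# Float first: it starts at offset 0, and the unsigned short that follows
# lands at offset 4, which already satisfies its alignment.
reordered = struct.pack("fH", i, id1)
print(struct.calcsize("fH"), len(reordered))

# The fields must be unpacked in the same (swapped) order.
value, ident = struct.unpack("fH", reordered)
print(ident, value)
```

Note that the float read back is the nearest single-precision value, not the exact double that was packed. A C compiler would normally pad such a struct at the end as well; the struct module does not, unless the format ends with a zero-count code such as "0f".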
2 Answers
According to the documentation, specifying the byte order removes any padding:
No padding is added when using non-native size and alignment, e.g. with ‘<’, ‘>’, ‘=’, and ‘!’.
Therefore, assuming you require little-endian packing, the following gives the required output:
    >>> struct.pack('<Hf', id1, i)
    b'\x01\x00\x18\x1c\x14?'
Note the <. (The byte 0x3f is the ASCII code for ?, which is why the last byte is displayed as ? rather than \x3f.)
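To see the effect of each prefix side by side, a small sketch along these lines compares the packed size of "Hf" in native mode ("@", the default) and in the standard modes. The sizes dictionary is just an illustrative helper, and the native result depends on the platform and its C compiler.

```python
import struct

id1 = 1
i = 0.5785536878880112

# "@" is the default native mode; the other prefixes select standard size
# and alignment, where no pad bytes are inserted.
sizes = {}
for prefix in ("@", "=", "<", ">", "!"):
    fmt = prefix + "Hf"
    sizes[prefix] = struct.calcsize(fmt)
    print(fmt, sizes[prefix], struct.pack(fmt, id1, i).hex())
```

On a typical x86-64 machine the "@" line should report 8 bytes and the others 6. Choosing an explicit prefix also makes the file layout independent of the machine that wrote it.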
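In native mode (the default), struct.pack() aligns each value according to its size, so on typical platforms a 4-byte value always starts at an offset divisible by 4. Here the 2-byte unsigned short is followed by 2 pad bytes so that the float can start at offset 4. If you write the data in separate chunks, as in your second example, each pack() call starts again at offset 0, so no padding is inserted.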
As the docs you linked to say:
By default, C numbers are represented in the machine’s native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler).
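The alignment rule can be reproduced by hand. The sketch below is only an illustration: align_up is a hypothetical helper, and it assumes the float's alignment equals its size, which holds on common platforms but is not guaranteed everywhere. It computes where the float must start and then spells the padding out with explicit "x" pad bytes in little-endian mode.

```python
import struct


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


id1 = 1
i = 0.5785536878880112

h_size = struct.calcsize("<H")  # 2 bytes in standard mode
f_size = struct.calcsize("<f")  # 4 bytes in standard mode

# Assumption: the float must start at a multiple of its own size.
f_offset = align_up(h_size, f_size)
pad = f_offset - h_size

# Write the pad bytes explicitly instead of relying on native alignment.
explicit_fmt = "<H%dxf" % pad
explicit = struct.pack(explicit_fmt, id1, i)
print(explicit_fmt, explicit.hex())
```

The resulting bytes match the 8-byte output shown in the question, with the two zero bytes after 01 00 being the padding.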