I'm using the struct module, and things aren't going as I expected. It's due to some misunderstanding I have with the module, I'm sure.
import struct
s = struct.Struct('Q')
print(s.size)
s = struct.Struct('H L Q')
print(s.size)
s = struct.Struct('H I Q')
print(s.size)
s = struct.Struct('H I Q H')
print(s.size)
The output of this is:
8
24
16
18
What am I missing here? Why are the second and third different sizes, and why is the fourth not 16?
4 Answers
Alignment issue.
Assuming you're running on a 64-bit non-Windows platform: Q and L are 8 bytes long, I is 4 bytes, and H is 2 bytes.
In native mode, each of these types is placed at an offset that is a multiple of its size (its alignment) for efficient access, so padding bytes are inserted where needed.
Therefore, the 2nd struct would be arranged as:
HH______ LLLLLLLL QQQQQQQQ
The 3rd struct:
HH__IIII QQQQQQQQ
and the 4th struct, where the trailing H simply adds 2 bytes (no padding is added after the last field):
HH__IIII QQQQQQQQ HH
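The padding rule from these diagrams can be checked in code. Below is a minimal sketch with a hypothetical helper, native_layout, which assumes each field goes at the next multiple of its native alignment and that nothing is added after the last field. It probes each field's alignment with the 1-byte 'c' code and compares its prediction with struct.calcsize. It only handles single-character codes without spaces or repeat counts.

```python
import struct

def native_layout(codes):
    """Hypothetical helper: predict field offsets and total size for a native ('@') format.

    Assumes every character in `codes` is a single format code (no spaces, no counts).
    """
    offsets = []
    offset = 0
    for code in codes:
        size = struct.calcsize(code)
        # 'c' takes 1 byte; the field after it is pushed to its alignment boundary,
        # so calcsize('c' + code) - size equals that field's native alignment.
        align = struct.calcsize('c' + code) - size
        # Round the current offset up to the next multiple of the alignment.
        offset = (offset + align - 1) // align * align
        offsets.append(offset)
        offset += size
    # Native mode adds no padding after the last field.
    return offsets, offset

for fmt in ['Q', 'HLQ', 'HIQ', 'HIQH']:
    offsets, total = native_layout(fmt)
    print(fmt, offsets, total, struct.calcsize(fmt))
```

On a 64-bit Linux build this should print offsets [0, 8, 16] for HLQ and [0, 4, 8, 16] for HIQH, matching the layouts above, with the predicted and actual sizes agreeing.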
If you don't want alignment, and want L to be 4 bytes (the "standard" size), you'll need to start the format string with =, <, > or !, as described in http://docs.python.org/library/struct.html#struct-alignment:
import struct
s = struct.Struct('=Q')
print(s.size)
s = struct.Struct('=HLQ')
print(s.size)
s = struct.Struct('=HIQ')
print(s.size)
s = struct.Struct('=HIQH')
print(s.size)
Demo: http://ideone.com/EMlgm
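A quick way to see that the standard-size formats carry no padding is to pack some values and look at the bytes. This sketch uses the question's fourth format with the little-endian prefix; the values 1 to 4 are arbitrary.

```python
import struct

values = (1, 2, 3, 4)

# '<' = little-endian, standard sizes, no padding: 2 + 4 + 8 + 2 = 16 bytes.
packed = struct.pack('<HIQH', *values)
print(len(packed), packed.hex())  # 16 01000200000003000000000000000400

# Unpacking with the same format returns the original values.
print(struct.unpack('<HIQH', packed))  # (1, 2, 3, 4)

# Native mode ('@', the default) inserts padding, so the buffer is larger
# (18 bytes on a typical 64-bit Linux build).
print(len(struct.pack('HIQH', *values)))
```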
If you look at the documentation of struct:
Alternatively, the first character of the format string can be used to indicate the byte order, size and alignment of the packed data, according to the following table:
Character  Byte order              Size      Alignment
@          native                  native    native
=          native                  standard  none
<          little-endian           standard  none
>          big-endian              standard  none
!          network (= big-endian)  standard  none
If the first character is not one of these, '@' is assumed.
Since your format strings don't start with one of these characters, native size and alignment are used, so the sizes depend on the platform's type sizes and padding rules. This should fix the issue:
import struct
print(struct.calcsize('!Q'))
print(struct.calcsize('!H L Q'))
print(struct.calcsize('!H I Q'))
print(struct.calcsize('!H I Q H'))
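The prefix also picks the byte order, which is the main thing to think about when choosing among =, <, > and !. A small sketch packing a 2-byte value with each prefix (0x0102 is just an arbitrary value with two distinct bytes):

```python
import struct
import sys

value = 0x0102  # two distinct bytes, so their order is visible in the output

for prefix in '<>!=':
    print(prefix, struct.pack(prefix + 'H', value).hex())

# '=' follows the machine's native byte order.
print('native byte order:', sys.byteorder)
```

On a little-endian machine such as x86, < and = should both give 0201, while > and ! give 0102.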
= would be correct, unless you are unpacking data that you know for certain is little-endian; in that case use <.
If you're on a 64-bit architecture (other than Windows, where long stays 4 bytes), int is 4 bytes and long is 8 bytes:
>>> struct.Struct('I').size
4
>>> struct.Struct('L').size
8
For the last one, this is what we call "alignment": http://docs.python.org/library/struct.html#struct-alignment:
>>> struct.Struct('I').size
4
>>> struct.Struct('H').size
2
>>> struct.Struct('HI').size
8
# => I is aligned to a 4-byte boundary, so 2 padding bytes follow H (2 + 2 + 4 = 8).
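Because native mode only pads before a field, field order changes the size, and a zero repeat count (documented in the struct module) can be used to pad the end up to a type's alignment. A sketch; the sizes in the comments assume a typical 64-bit Linux build:

```python
import struct

# Native mode pads *before* a field to reach its alignment...
print(struct.calcsize('HI'))    # 8: H, 2 padding bytes, then I
print(struct.calcsize('IH'))    # 6: I then H, no padding needed

# ...so ordering fields from largest to smallest avoids internal padding.
print(struct.calcsize('HIQH'))  # 18
print(struct.calcsize('QIHH'))  # 16

# A zero repeat count pads the end up to that type's alignment,
# much like the trailing padding a C compiler adds to a struct.
print(struct.calcsize('HIQH0Q'))  # 24
```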
It has to do with alignment. If you start the format with one of the byte order characters =, <, > or !, you will get the sizes you expect.
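To confirm that any of these prefixes works, this sketch compares the native size with the sizes under each of =, <, > and ! for the question's formats (spaces between codes are allowed in format strings):

```python
import struct

formats = ['Q', 'H L Q', 'H I Q', 'H I Q H']

for fmt in formats:
    native = struct.calcsize('@' + fmt)
    # Collect the sizes for all four standard-size prefixes into a set.
    standard = {struct.calcsize(p + fmt) for p in '=<>!'}
    print(repr(fmt), 'native:', native, 'standard:', standard)
```

Each standard set should contain a single value ({8}, {14}, {14}, {16}), since the four prefixes differ only in byte order, while the native column depends on the platform.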