I have a binary file and specifications:
After 'abst' (0x61627374):
var1: unsigned 8-bit integer
var2: unsigned 24-bit integer
var3: sequence of Unicode 8-bit characters (UTF-8), terminated with 0x00
How can I read var1, var2, and var3 from the file?
Quick and dirty and not tested:
# Assumption: the file is small enough to fit into RAM,
# and 'abst' does not occur inside the data itself.
# `data` holds the file contents as bytes, e.g. open(path, 'rb').read()
for hunk in data.split(b'abst')[1:]:  # skip the first hunk: it is whatever precedes the first 'abst'
    var1 = hunk[0]
    # 24-bit value, assuming little-endian byte order (the spec does not say)
    var2 = hunk[1] + hunk[2] * 256 + hunk[3] * 256 * 256
    var3 = hunk[4:].split(b'\x00')[0]  # bytes up to, not including, the terminator
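The sum above treats the first of the three bytes as the least significant one (little-endian), while the specification quoted in the question does not state a byte order. As a small sketch of how that choice changes the result, the hypothetical helper `uint24` below (not part of the original answer) decodes the same three bytes both ways with `int.from_bytes`:

```python
def uint24(raw, byteorder):
    """Decode exactly three bytes as an unsigned 24-bit integer.

    Hypothetical helper for illustration; byteorder is 'little' or 'big'.
    """
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes")
    return int.from_bytes(raw, byteorder)


sample = b"\x01\x02\x03"
little = uint24(sample, "little")  # 0x01 + 0x02*256 + 0x03*256*256
big = uint24(sample, "big")        # 0x01*256*256 + 0x02*256 + 0x03
print(little, big)
```

If the values read from a real file look implausibly large, trying the other byte order is a quick sanity check.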
answered Jul 20, 2011 at 12:01
Rudi
2 Comments
John Machin
also if there is guff before 'abst' you will unpack that.
Rudi
@John Thank you. (And I even got caught by the first hunk error yesterday grml)
The bitstring module might be helpful here as you have unusual bit lengths, and it can be a bit more readable than unpacking values 'by hand':
import bitstring
bitstring.bytealigned = True
# `your_file` is assumed to be a file object opened in binary mode
s = bitstring.ConstBitStream(your_file)
if s.find('0x61627374'):  # seeks to your start code
    start_code, var1, var2 = s.readlist('bytes:4, uint:8, uint:24')
    p1 = s.pos
    found = s.find('0x00', start=p1)  # find next '\x00'; returns a tuple, empty if no match
    if found:
        p2 = found[0]
        var3 = s[p1:p2 + 8].bytes  # interpret the slice as bytes (includes the terminator)
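For comparison, the same steps (seek to the start code, read an 8-bit and a 24-bit unsigned integer, then read up to the terminating 0x00) can be sketched with only the standard library. The function name `read_abst` and the sample bytes are made up for illustration. The 24-bit value is read as big-endian, as `uint:24` does, which is an assumption because the specification in the question does not state a byte order.

```python
def read_abst(data):
    """Return (var1, var2, var3) for the first 'abst' record in data, or None.

    Hypothetical helper: data is the whole file as bytes, and var2 is
    assumed to be big-endian.
    """
    start = data.find(b"abst")
    if start == -1:
        return None                       # start code not present
    body = start + 4                      # first byte after the start code
    if len(data) < body + 4:
        return None                       # truncated before var1/var2
    var1 = data[body]                     # unsigned 8-bit integer
    var2 = int.from_bytes(data[body + 1:body + 4], "big")  # unsigned 24-bit
    end = data.find(b"\x00", body + 4)    # terminator of var3
    if end == -1:
        return None                       # no terminator found
    var3 = data[body + 4:end].decode("utf-8")  # terminator excluded
    return var1, var2, var3


# Made-up sample: junk, start code, var1=1, var2=0x000203, "héllo", 0x00, junk
sample = b"junk" + b"abst" + b"\x01\x00\x02\x03" + "héllo".encode("utf-8") + b"\x00rest"
result = read_abst(sample)
print(result)
```

Unlike the slice in the bitstring version, this sketch drops the terminator and decodes var3 to a string.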
answered Jul 21, 2011 at 15:36
Scott Griffiths