Python - reading 10 bit integers from a binary file

Question 1

I have a binary file containing a stream of 10-bit integers. I want to read it and store the values in a list.

It is working with the following code, which reads my_file and fills pixels with integer values:

file = open("my_file", "rb")
pixels = []
new10bitsByte = ""
try:
    byte = file.read(1)
    while byte:
        bits = bin(ord(byte))[2:].rjust(8, '0')
        for bit in reversed(bits):
            new10bitsByte += bit
            if len(new10bitsByte) == 10:
                pixels.append(int(new10bitsByte[::-1], 2))
                new10bitsByte = ""
        byte = file.read(1)
finally:
    file.close()

It doesn't seem very elegant to read the bytes into bits, and read it back into "10-bit" bytes. Is there a better way to do it?

With 8 or 16 bit integers I could just use file.read(size) and convert the result to an int directly. But here, as each value is stored in 1.25 bytes, I would need something like file.read(1.25)...

Question 2

Check out the first two answers here: stackoverflow.com/questions/10689748/…

Question 3

@juanpa.arrivillaga Thank you! So from what I understand there is no way to read a file 10 bit by 10 bit in Python, I have to read it byte by byte and then "cut" the bytes to get my "10-bit" bytes.

Question 4

From what I understand, yes, but I am not certain. I just found that answer and it looked like it might be useful.

Question 5

You may want to read 40 bits at a time, i.e. 5 bytes. Those contain 4 full 10 bit numbers, which you should be able to extract in one go.

Question 6

What MisterMiyagi said. It looks like you're using Python 2. Is that correct? Unless the input file is really huge, it's probably a little more efficient to read it all into memory, rather than reading it byte by byte. FWIW, bits = format(ord(byte), '08b') is a little more efficient than using the bin function. But really, it's better to use MisterMiyagi's suggestion instead of this roundabout conversion algorithm.
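As a rough sketch of the "read it all into memory" idea combined with the 5-byte grouping, assuming Python 3 (int.from_bytes does not exist in Python 2): the helper name unpack_10bit and its byteorder parameter are made up for illustration, and any trailing bytes that do not fill a complete 5-byte group are simply ignored here.

```python
def unpack_10bit(data, byteorder="little"):
    """Unpack packed 10-bit unsigned values from a bytes object.

    Hypothetical helper: 'data' would typically come from
    open("my_file", "rb").read().
    """
    values = []
    # Only complete 5-byte groups (4 values each) are decoded.
    usable = len(data) - len(data) % 5
    for offset in range(0, usable, 5):
        n = int.from_bytes(data[offset:offset + 5], byteorder)
        # Pull out four 10-bit fields, lowest bits first.
        group = [(n >> shift) & 0x3FF for shift in (0, 10, 20, 30)]
        if byteorder == "big":
            # With big-endian packing the first value sits in the top bits.
            group.reverse()
        values.extend(group)
    return values
```

Which byte order is right depends on how the file was written, so it is worth checking the first few values against known output.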

Question 7

Here's a generator that does the bit operations without using text string conversions. Hopefully, it's a little more efficient. :)

To test it, I write all the numbers in range(1024) to a BytesIO stream, which behaves like a binary file.

from io import BytesIO

def tenbitread(f):
    ''' Generate 10 bit (unsigned) integers from a binary file '''
    while True:
        b = f.read(5)
        if len(b) == 0:
            break
        n = int.from_bytes(b, 'big')
        #Split n into 4 10 bit integers
        t = []
        for i in range(4):
            t.append(n & 0x3ff)
            n >>= 10
        yield from reversed(t)

# Make some test data: all the integers in range(1024),
# and save it to a byte stream
buff = BytesIO()
maxi = 1024
n = 0
for i in range(maxi):
    n = (n << 10) | i
    #Convert the 40 bit integer to 5 bytes & write them
    if i % 4 == 3:
        buff.write(n.to_bytes(5, 'big'))
        n = 0

# Rewind the stream so we can read from it
buff.seek(0)

# Read the data in 10 bit chunks
a = list(tenbitread(buff))

# Check it
print(a == list(range(maxi)))

output

True

Doing list(tenbitread(buff)) is the simplest way to turn the generator output into a list, but you can easily iterate over the values instead, eg

for v in tenbitread(buff):

or

for i, v in enumerate(tenbitread(buff)):

if you want indices as well as the data values.

Here's a little-endian version of the generator which gives the same results as your code.

def tenbitread(f):
    ''' Generate 10 bit (unsigned) integers from a binary file '''
    while True:
        b = f.read(5)
        if not len(b):
            break
        n = int.from_bytes(b, 'little')
        #Split n into 4 10 bit integers
        for i in range(4):
            yield n & 0x3ff
            n >>= 10
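One way to convince yourself that the little-endian generator matches the bit-string code from the question is to run both over the same bytes. This is only a sketch: the names tenbitread_le and bitstring_read are made up here, the test bytes are arbitrary, and the input length is assumed to be a multiple of 5.

```python
from io import BytesIO

def tenbitread_le(f):
    """Little-endian generator, same logic as the version above."""
    while True:
        b = f.read(5)
        if not b:
            break
        n = int.from_bytes(b, 'little')
        for _ in range(4):
            yield n & 0x3ff
            n >>= 10

def bitstring_read(f):
    """The question's approach: collect bits LSB-first, 10 at a time."""
    pixels = []
    acc = ""
    byte = f.read(1)
    while byte:
        bits = format(ord(byte), '08b')
        for bit in reversed(bits):
            acc += bit
            if len(acc) == 10:
                pixels.append(int(acc[::-1], 2))
                acc = ""
        byte = f.read(1)
    return pixels

data = bytes(range(40))  # arbitrary test input, 8 groups of 5 bytes
same = list(tenbitread_le(BytesIO(data))) == bitstring_read(BytesIO(data))
print(same)
```

Both functions treat the file as a stream of bits with the least significant bit of each byte first, which is why 'little' is the matching byte order.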

We can improve this version slightly by "un-rolling" that for loop, which lets us get rid of the final masking and shifting operations.

def tenbitread(f):
    ''' Generate 10 bit (unsigned) integers from a binary file '''
    while True:
        b = f.read(5)
        if not len(b):
            break
        n = int.from_bytes(b, 'little')
        #Split n into 4 10 bit integers
        yield n & 0x3ff
        n >>= 10
        yield n & 0x3ff
        n >>= 10
        yield n & 0x3ff
        n >>= 10
        yield n

This should give a little more speed...

Question 8

It's working perfectly with the little-endian version, and 7 times faster than my initial code :) Thank you very much!

Question 9

@Jean-BaptisteMartin: There's a slight optimization that can be made. I don't know if it will speed things up much, but it's worth trying. I'll add it to my answer shortly.

Question 10

This is great! But how would you get signed ints instead of unsigned?

Question 11

@Taaam That's not too hard. We can use the ^ bitwise exclusive-or operator for that. To get 10 bit signed numbers, change the n & 0x3ff to ((n & 0x3ff) ^ 512) - 512. You can drop the inner parentheses: (n & 0x3ff ^ 512) - 512, but I think they make it a little easier to read.
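A small sketch of that XOR trick, wrapped in a function so the edge cases are easy to check. The name to_signed10 is made up for illustration; it assumes the values are 10-bit two's complement.

```python
def to_signed10(v):
    """Interpret the low 10 bits of v as a two's complement number."""
    # Flipping the sign bit (512) and then subtracting 512 maps
    # 0..511 to themselves and 512..1023 to -512..-1.
    return ((v & 0x3ff) ^ 512) - 512

for raw in (0, 511, 512, 1023):
    print(raw, to_signed10(raw))
```

The four values printed cover both ends of the positive and negative ranges: 0 and 511 stay as they are, 512 becomes -512 and 1023 becomes -1.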

Question 12

Adding a Numpy based solution suitable for unpacking large 10-bit packed byte buffers like the ones you might receive from AVT and FLIR cameras.

This is a 10-bit version of @cyrilgaudefroy's answer to a similar question; there you can also find a Numba alternative capable of yielding an additional speed increase.

import numpy as np

def read_uint10(byte_buf):
    data = np.frombuffer(byte_buf, dtype=np.uint8)
    # 5 bytes contain 4 10-bit pixels (5x8 == 4x10)
    b1, b2, b3, b4, b5 = np.reshape(data, (data.shape[0]//5, 5)).astype(np.uint16).T
    o1 = (b1 << 2) + (b2 >> 6)
    o2 = ((b2 % 64) << 4) + (b3 >> 4)
    o3 = ((b3 % 16) << 6) + (b4 >> 2)
    o4 = ((b4 % 4) << 8) + b5
    unpacked = np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), 4*o1.shape[0])
    return unpacked

Reshape can be omitted if returning a buffer instead of a Numpy array:

unpacked = np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1).tobytes()

Or if image dimensions are known it can be reshaped directly, e.g.:

unpacked = np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), (1024, 1024))

If the use of the modulus operator appears confusing, try playing around with:

np.unpackbits(np.array([255%64], dtype=np.uint8))
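The same point can be seen without Numpy. For non-negative integers, taking the remainder by a power of two keeps only the low bits, so % 64 is the same as masking with 0x3F. A quick sketch with an arbitrary example byte:

```python
b = 0b10110111  # arbitrary example byte

low6_mod = b % 64     # remainder after dividing by 2**6
low6_and = b & 0x3F   # mask that keeps the low 6 bits

print(format(b, '08b'))         # 10110111
print(format(low6_mod, '08b'))  # 00110111
print(low6_mod == low6_and)     # True
```

So (b2 % 64) << 4 takes the low 6 bits of b2 and makes room for 4 more bits from the next byte.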

Edit: It turns out that the Allied Vision Mako-U cameras employ a different ordering than the one I originally suggested above:

o1 = ((b2 % 4) << 8) + b1
o2 = ((b3 % 16) << 6) + (b2 >> 2)
o3 = ((b4 % 64) << 4) + (b3 >> 4)
o4 = (b5 << 2) + (b4 >> 6)

So you might have to test different orders if images come out looking wonky initially for your specific setup.
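As far as I can tell, the two orderings correspond to reading each 5-byte group as a big-endian or a little-endian 40-bit integer, which ties them back to the generator answer. A plain-Python sketch to check this on one group; the function names and the sample bytes are made up for illustration:

```python
def unpack_first_ordering(b1, b2, b3, b4, b5):
    """Ordering from the original version of this answer."""
    o1 = (b1 << 2) + (b2 >> 6)
    o2 = ((b2 % 64) << 4) + (b3 >> 4)
    o3 = ((b3 % 16) << 6) + (b4 >> 2)
    o4 = ((b4 % 4) << 8) + b5
    return [o1, o2, o3, o4]

def unpack_second_ordering(b1, b2, b3, b4, b5):
    """Ordering from the edit (reported for Mako-U cameras)."""
    o1 = ((b2 % 4) << 8) + b1
    o2 = ((b3 % 16) << 6) + (b2 >> 2)
    o3 = ((b4 % 64) << 4) + (b3 >> 4)
    o4 = (b5 << 2) + (b4 >> 6)
    return [o1, o2, o3, o4]

chunk = bytes([0x12, 0x34, 0x56, 0x78, 0x9A])  # arbitrary sample group

n_big = int.from_bytes(chunk, 'big')
n_little = int.from_bytes(chunk, 'little')

# Big-endian: first value in the top 10 bits.
big_fields = [(n_big >> s) & 0x3ff for s in (30, 20, 10, 0)]
# Little-endian: first value in the bottom 10 bits.
little_fields = [(n_little >> s) & 0x3ff for s in (0, 10, 20, 30)]

print(unpack_first_ordering(*chunk) == big_fields)
print(unpack_second_ordering(*chunk) == little_fields)
```

If neither ordering produces a sensible image, the sensor may use yet another packing, so treat this as a starting point rather than a complete list.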

Question 13

As there is no direct way to read a file x bits at a time in Python, we have to read it byte by byte. Following MisterMiyagi's and PM 2Ring's suggestions, I modified my code to read the file in 5-byte chunks (i.e. 40 bits) and then split the resulting bit string into four 10-bit numbers, instead of looping over the bits individually. It turned out to be twice as fast as my previous code.

file = open("my_file", "rb")
pixels = []
exit_loop = False
try:
    while not exit_loop:
        # Read 5 consecutive bytes into fiveBytesString
        fiveBytesString = ""
        for i in range(5):
            byte = file.read(1)
            if not byte:
                exit_loop = True
                break
            byteString = format(ord(byte), '08b')
            fiveBytesString += byteString[::-1]
        # Split fiveBytesString into 4 10-bit numbers, and add them to pixels
        pixels.extend([int(fiveBytesString[i:i+10][::-1], 2) for i in range(0, 40, 10) if len(fiveBytesString[i:i+10]) > 0])
finally:
    file.close()

Question 14

1). I'm not sure why you are doing those reversals with [::-1]. 2). You need to check that fiveBytesString isn't empty before attempting to convert it to integer. 3). exit isn't a great variable name because it shadows the exit() function. It's not an error to use it as a flag like that, just a little confusing for others reading your code. :)

Question 15

1) It is because I already know what my output is supposed to be (I'm trying to do the conversion myself but I already have the output file). For example, the 5 first bytes are 01001011, 01010100, 11100001, 10000101, 00011000. I know that the first output numbers should be 20, 23, 21, 37. To find the right output I had to reverse the bytes, concatenate them, split them and reverse the result again. I don't know how the input file was created, I just guessed that I had to do these reverses to get my output... 2) and 3) Edited, thanks!

Question 16

Ah, ok. I've added a new version. It now gives the same values as your code. However, I don't see how you get [20, 23, 21, 37] from [0b01001011, 0b01010100, 0b11100001, 0b10000101, 0b00011000].

Question 17

I just realized I gave you the wrong bytes, I'm really sorry! But your updated generator is working fine with my file, and now I think I have a better understanding of how binary file manipulation works. Thank you for your help!

Question 18

My pleasure! And thanks for the accept. If you'd posted the right bytes I would have been a bit faster with my answer. :) BTW, you may like to look at this answer I wrote last year that takes a slightly different approach to bit fiddling.

PM 2Ring · Accepted Answer · 2016-07-11 11:59:44Z

