
I have a binary file containing a stream of 10-bit integers. I want to read it and store the values in a list.

It is working with the following code, which reads my_file and fills pixels with integer values:

    file = open("my_file", "rb")
    pixels = []
    new10bitsByte = ""
    try:
        byte = file.read(1)
        while byte:
            bits = bin(ord(byte))[2:].rjust(8, '0')
            for bit in reversed(bits):
                new10bitsByte += bit
                if len(new10bitsByte) == 10:
                    pixels.append(int(new10bitsByte[::-1], 2))
                    new10bitsByte = ""
            byte = file.read(1)
    finally:
        file.close()

It doesn't seem very elegant to convert the bytes into bit strings and then reassemble those bits into 10-bit values. Is there a better way to do it?

With 8 or 16 bit integers I could just use file.read(size) and convert the result to an int directly. But here, as each value is stored in 1.25 bytes, I would need something like file.read(1.25)...
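For comparison, the byte-aligned case is simple because every value starts on a byte boundary. A minimal Python 3 sketch, assuming unsigned little-endian 16-bit values (the helper name read_uint16_le is hypothetical):

```python
from io import BytesIO

def read_uint16_le(f):
    """Hypothetical helper: read unsigned 16-bit little-endian integers from a binary stream."""
    values = []
    while True:
        chunk = f.read(2)       # each value occupies exactly 2 bytes
        if len(chunk) < 2:      # stop at end of file (a trailing odd byte is ignored)
            break
        values.append(int.from_bytes(chunk, 'little'))
    return values

# Usage: three 16-bit values stored as little-endian bytes
stream = BytesIO(b'\x01\x00\x00\x01\xff\xff')
print(read_uint16_le(stream))  # [1, 256, 65535]
```

With 10-bit values there is no such byte boundary for every value, which is why the answers below work on groups of bytes instead.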

asked Jul 11, 2016 at 8:52
  • Check out the first two answers here: stackoverflow.com/questions/10689748/… Commented Jul 11, 2016 at 9:09
  • @juanpa.arrivillaga Thank you! So from what I understand there is no way to read a file 10 bits at a time in Python; I have to read it byte by byte and then "cut" the bytes to get my "10-bit" bytes. Commented Jul 11, 2016 at 9:27
  • From what I understand, yes, but I am not certain. I just found that answer and it looked like it might be useful. Commented Jul 11, 2016 at 9:31
  • You may want to read 40 bits at a time, i.e. 5 bytes. Those contain 4 full 10-bit numbers, which you should be able to extract in one go. Commented Jul 11, 2016 at 9:51
  • What MisterMiyagi said. It looks like you're using Python 2. Is that correct? Unless the input file is really huge, it's probably a little more efficient to read it all into memory rather than reading it byte by byte. FWIW, bits = format(ord(byte), '08b') is a little more efficient than using the bin function. But really, it's better to use MisterMiyagi's suggestion instead of this roundabout conversion algorithm. Commented Jul 11, 2016 at 10:05
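A minimal sketch of the two suggestions in these comments combined: read the whole file into memory once, then take it 5 bytes (40 bits) at a time and extract four 10-bit values from each group. It assumes Python 3 (for int.from_bytes), little-endian packing with the first value in the lowest bits, and a file length that is a multiple of 5; unpack_tenbit_le is a hypothetical name.

```python
def unpack_tenbit_le(data):
    """Hypothetical helper: split packed 10-bit values out of a bytes object.

    Assumes little-endian packing (first value in the lowest bits of each
    5-byte group) and that len(data) is a multiple of 5.
    """
    pixels = []
    for start in range(0, len(data), 5):
        # 5 bytes = 40 bits = exactly four 10-bit values
        n = int.from_bytes(data[start:start + 5], 'little')
        for _ in range(4):
            pixels.append(n & 0x3ff)  # keep the low 10 bits
            n >>= 10                  # move on to the next value
    return pixels

# Usage sketch: read the whole file into memory once, then unpack it
# with open("my_file", "rb") as f:
#     pixels = unpack_tenbit_le(f.read())
```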

3 Answers


Here's a generator that does the bit operations without using text string conversions. Hopefully, it's a little more efficient. :)

To test it, I write all the numbers in range(1024) to a BytesIO stream, which behaves like a binary file.

    from io import BytesIO

    def tenbitread(f):
        ''' Generate 10 bit (unsigned) integers from a binary file '''
        while True:
            b = f.read(5)
            if len(b) == 0:
                break
            n = int.from_bytes(b, 'big')
            # Split n into 4 10 bit integers
            t = []
            for i in range(4):
                t.append(n & 0x3ff)
                n >>= 10
            yield from reversed(t)

    # Make some test data: all the integers in range(1024),
    # and save it to a byte stream
    buff = BytesIO()
    maxi = 1024
    n = 0
    for i in range(maxi):
        n = (n << 10) | i
        # Convert the 40 bit integer to 5 bytes & write them
        if i % 4 == 3:
            buff.write(n.to_bytes(5, 'big'))
            n = 0

    # Rewind the stream so we can read from it
    buff.seek(0)
    # Read the data in 10 bit chunks
    a = list(tenbitread(buff))
    # Check it
    print(a == list(range(maxi)))

output

True

Doing list(tenbitread(buff)) is the simplest way to turn the generator output into a list, but you can easily iterate over the values instead, e.g.

for v in tenbitread(buff):

or

for i, v in enumerate(tenbitread(buff)):

if you want indices as well as the data values.


Here's a little-endian version of the generator which gives the same results as your code.

    def tenbitread(f):
        ''' Generate 10 bit (unsigned) integers from a binary file '''
        while True:
            b = f.read(5)
            if not b:
                break
            n = int.from_bytes(b, 'little')
            # Split n into 4 10 bit integers
            for i in range(4):
                yield n & 0x3ff
                n >>= 10

We can improve this version slightly by unrolling that for loop, which lets us drop the final masking and shifting operations.

    def tenbitread(f):
        ''' Generate 10 bit (unsigned) integers from a binary file '''
        while True:
            b = f.read(5)
            if not b:
                break
            n = int.from_bytes(b, 'little')
            # Split n into 4 10 bit integers
            yield n & 0x3ff
            n >>= 10
            yield n & 0x3ff
            n >>= 10
            yield n & 0x3ff
            n >>= 10
            # Only the top 10 bits are left, so no mask is needed
            yield n

This should give a little more speed...
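If you want to check the speed claim on your own machine, here is a rough timeit sketch. The function names tenbit_loop and tenbit_unrolled are hypothetical labels for the two little-endian versions above, and the test data is arbitrary; timings depend on the interpreter and hardware, so no numbers are shown here.

```python
import timeit
from io import BytesIO

def tenbit_loop(f):
    """Little-endian reader with an inner for loop."""
    while True:
        b = f.read(5)
        if not b:
            break
        n = int.from_bytes(b, 'little')
        for _ in range(4):
            yield n & 0x3ff
            n >>= 10

def tenbit_unrolled(f):
    """Same reader with the loop unrolled; the last value needs no mask."""
    while True:
        b = f.read(5)
        if not b:
            break
        n = int.from_bytes(b, 'little')
        yield n & 0x3ff
        n >>= 10
        yield n & 0x3ff
        n >>= 10
        yield n & 0x3ff
        n >>= 10
        yield n

data = bytes(range(256)) * 40   # 10240 bytes of arbitrary test data (a multiple of 5)

# Both versions must agree before comparing their speed
assert list(tenbit_loop(BytesIO(data))) == list(tenbit_unrolled(BytesIO(data)))

for func in (tenbit_loop, tenbit_unrolled):
    seconds = timeit.timeit(lambda: list(func(BytesIO(data))), number=20)
    print(func.__name__, round(seconds, 4), "s")
```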

answered Jul 11, 2016 at 11:59

4 Comments

It's working perfectly with the little-endian version, and 7 times faster than my initial code :) Thank you very much!
@Jean-BaptisteMartin: There's a slight optimization that can be made. I don't know if it will speed things up much, but it's worth trying. I'll add it to my answer shortly.
This is great! But how would you get signed ints instead of unsigned?
@Taaam That's not too hard. We can use the ^ bitwise exclusive-or operator for that. To get 10 bit signed numbers, change the n & 0x3ff to ((n & 0x3ff) ^ 512) - 512. You can drop the inner parentheses: (n & 0x3ff ^ 512) - 512, but I think they make it a little easier to read.
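Putting that comment into code: a minimal sketch of a signed variant of the little-endian generator, using the ((n & 0x3ff) ^ 512) - 512 expression to map each unsigned field 0..1023 onto the two's complement range -512..511. The name tenbitread_signed and the test values are illustrative assumptions.

```python
from io import BytesIO

def tenbitread_signed(f):
    """Hypothetical variant: generate signed (two's complement) 10-bit integers.

    Same little-endian layout as the little-endian tenbitread; each unsigned
    field u in 0..1023 is mapped to (u ^ 512) - 512, i.e. -512..511.
    """
    while True:
        b = f.read(5)
        if not b:
            break
        n = int.from_bytes(b, 'little')
        for _ in range(4):
            yield ((n & 0x3ff) ^ 512) - 512
            n >>= 10

# 0x3ff (all ten bits set) is -1 in 10-bit two's complement, 0x200 is -512
packed = (0x3ff | 0x200 << 10 | 0x1ff << 20 | 5 << 30).to_bytes(5, 'little')
print(list(tenbitread_signed(BytesIO(packed))))  # [-1, -512, 511, 5]
```

Flipping bit 9 with ^ 512 and then subtracting 512 leaves values below 512 unchanged and turns 512..1023 into -512..-1, which is exactly sign extension from 10 bits.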

Adding a NumPy-based solution suitable for unpacking large 10-bit packed byte buffers, like the ones you might receive from AVT and FLIR cameras.

This is a 10-bit version of @cyrilgaudefroy's answer to a similar question; there you can also find a Numba alternative capable of yielding an additional speed increase.

    import numpy as np

    def read_uint10(byte_buf):
        data = np.frombuffer(byte_buf, dtype=np.uint8)
        # 5 bytes contain 4 10-bit pixels (5x8 == 4x10)
        b1, b2, b3, b4, b5 = np.reshape(data, (data.shape[0]//5, 5)).astype(np.uint16).T
        o1 = (b1 << 2) + (b2 >> 6)
        o2 = ((b2 % 64) << 4) + (b3 >> 4)
        o3 = ((b3 % 16) << 6) + (b4 >> 2)
        o4 = ((b4 % 4) << 8) + b5
        unpacked = np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), 4*o1.shape[0])
        return unpacked
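To try read_uint10 without a camera, you can build a test buffer with the inverse operation. The sketch below restates read_uint10 with the same bit arithmetic, using np.stack(...).ravel() in place of concatenate plus reshape (an equivalent formulation, not the answer's exact code), and adds a hypothetical pack_uint10 helper that packs values MSB-first in the same layout, so a round trip should return the original values.

```python
import numpy as np

def read_uint10(byte_buf):
    """Unpack MSB-first packed 10-bit values (same arithmetic as the answer)."""
    data = np.frombuffer(byte_buf, dtype=np.uint8)
    b1, b2, b3, b4, b5 = data.reshape(-1, 5).astype(np.uint16).T
    o1 = (b1 << 2) + (b2 >> 6)
    o2 = ((b2 % 64) << 4) + (b3 >> 4)
    o3 = ((b3 % 16) << 6) + (b4 >> 2)
    o4 = ((b4 % 4) << 8) + b5
    # Interleave so the output order is o1, o2, o3, o4 of group 0, then group 1, ...
    return np.stack((o1, o2, o3, o4), axis=1).ravel()

def pack_uint10(values):
    """Hypothetical inverse: pack 10-bit values (count a multiple of 4) MSB-first."""
    v = np.asarray(values, dtype=np.uint16).reshape(-1, 4)
    o1, o2, o3, o4 = v.T
    b1 = o1 >> 2                               # top 8 bits of o1
    b2 = ((o1 & 0x3) << 6) | (o2 >> 4)         # low 2 bits of o1, top 6 of o2
    b3 = ((o2 & 0xf) << 4) | (o3 >> 6)         # low 4 bits of o2, top 4 of o3
    b4 = ((o3 & 0x3f) << 2) | (o4 >> 8)        # low 6 bits of o3, top 2 of o4
    b5 = o4 & 0xff                             # low 8 bits of o4
    return np.stack((b1, b2, b3, b4, b5), axis=1).astype(np.uint8).tobytes()

values = np.arange(1024, dtype=np.uint16)
buf = pack_uint10(values)
print(len(buf))                                  # 1280
print(np.array_equal(read_uint10(buf), values))  # True
```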

The final reshape can be omitted if you return a bytes buffer instead of a NumPy array:

unpacked = np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1).tobytes()

Or if image dimensions are known it can be reshaped directly, e.g.:

unpacked = np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), (1024, 1024))

If the use of the modulus operator appears confusing, try playing around with:

np.unpackbits(np.array([255%64], dtype=np.uint8))
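The reason the modulus works here: for an unsigned value, x % 2**k keeps only the low k bits, which is the same as masking with 2**k - 1. A small check of that identity over every byte value, for the moduli 4, 16 and 64 used in read_uint10 (the variable names are illustrative):

```python
import numpy as np

b = np.arange(256, dtype=np.uint8)   # every possible byte value

# For unsigned values, b % 2**k keeps only the low k bits, which is exactly
# what masking with 2**k - 1 does. k = 2, 4, 6 gives the moduli 4, 16, 64.
for k in (2, 4, 6):
    assert np.array_equal(b % 2**k, b & (2**k - 1))

# Bit-level view of one case: 255 % 64 == 63 == 0b00111111
print(np.unpackbits(np.array([255 % 64], dtype=np.uint8)))  # [0 0 1 1 1 1 1 1]
```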

Edit: It turns out that the Allied Vision Mako-U cameras employ a different ordering than the one I originally suggested above:

o1 = ((b2 % 4) << 8) + b1
o2 = ((b3 % 16) << 6) + (b2 >> 2)
o3 = ((b4 % 64) << 4) + (b3 >> 4)
o4 = (b5 << 2) + (b4 >> 6)

So you might have to test different orders if images come out looking wonky initially for your specific setup.
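The Mako-U ordering is the same little-endian layout that the little-endian tenbitread generator decodes: each 5-byte group is a little-endian 40-bit integer with the first pixel in its lowest 10 bits. A sketch of a complete function with that ordering, cross-checked against plain Python integer arithmetic (read_uint10_le and read_uint10_le_py are hypothetical names):

```python
import numpy as np

def read_uint10_le(byte_buf):
    """Hypothetical little-endian variant of read_uint10 (Mako-U ordering):
    the first pixel sits in the low bits of b1."""
    data = np.frombuffer(byte_buf, dtype=np.uint8)
    b1, b2, b3, b4, b5 = data.reshape(-1, 5).astype(np.uint16).T
    o1 = ((b2 % 4) << 8) + b1
    o2 = ((b3 % 16) << 6) + (b2 >> 2)
    o3 = ((b4 % 64) << 4) + (b3 >> 4)
    o4 = (b5 << 2) + (b4 >> 6)
    return np.stack((o1, o2, o3, o4), axis=1).ravel()

def read_uint10_le_py(byte_buf):
    """Plain-Python reference: peel 10 bits at a time off each 40-bit group."""
    out = []
    for i in range(0, len(byte_buf), 5):
        n = int.from_bytes(byte_buf[i:i + 5], 'little')
        out.extend((n >> s) & 0x3ff for s in (0, 10, 20, 30))
    return out

buf = bytes(range(200))   # 200 bytes -> 160 pixels of arbitrary test data
print(read_uint10_le(buf).tolist() == read_uint10_le_py(buf))  # True
```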


As there is no direct way to read a file x bits at a time in Python, we have to read it byte by byte. Following MisterMiyagi's and PM 2Ring's suggestions, I modified my code to read the file in 5-byte chunks (i.e. 40 bits) and then split the resulting string into four 10-bit numbers, instead of looping over the bits individually. It turned out to be twice as fast as my previous code.

    file = open("my_file", "rb")
    pixels = []
    exit_loop = False
    try:
        while not exit_loop:
            # Read 5 consecutive bytes into fiveBytesString
            fiveBytesString = ""
            for i in range(5):
                byte = file.read(1)
                if not byte:
                    exit_loop = True
                    break
                byteString = format(ord(byte), '08b')
                fiveBytesString += byteString[::-1]
            # Split fiveBytesString into 4 10-bit numbers, and add them to pixels
            pixels.extend([int(fiveBytesString[i:i+10][::-1], 2)
                           for i in range(0, 40, 10)
                           if len(fiveBytesString[i:i+10]) > 0])
    finally:
        file.close()
answered Jul 11, 2016 at 11:47

6 Comments

1). I'm not sure why you are doing those reversals with [::-1]. 2). You need to check that fiveBytesString isn't empty before attempting to convert it to integer. 3). exit isn't a great variable name because it shadows the exit() function. It's not an error to use it as a flag like that, just a little confusing for others reading your code. :)
1) It is because I already know what my output is supposed to be (I'm trying to do the conversion myself but I already have the output file). For example, the 5 first bytes are 01001011, 01010100, 11100001, 10000101, 00011000. I know that the first output numbers should be 20, 23, 21, 37. To find the right output I had to reverse the bytes, concatenate them, split them and reverse the result again. I don't know how the input file was created, I just guessed that I had to do these reverses to get my output... 2) and 3) Edited, thanks!
Ah, ok. I've added a new version. It now gives the same values as your code. However, I don't see how you get [20, 23, 21, 37] from [0b01001011, 0b01010100, 0b11100001, 0b10000101, 0b00011000].
I just realized I gave you the wrong bytes, I'm really sorry! But your updated generator is working fine with my file, and now I think I have a better understanding of how binary file manipulation works. Thank you for your help!
My pleasure! And thanks for the accept. If you'd posted the right bytes I would have been a bit faster with my answer. :) BTW, you may like to look at this answer I wrote last year that takes a slightly different approach to bit fiddling.
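Why the reversals in the string-based code work: reversing each byte's bits and concatenating them puts bit j of the little-endian 40-bit integer at string position j, so reversing each 10-character slice back and parsing it gives (n >> 10*k) & 0x3ff. That is exactly what int.from_bytes(..., 'little') plus masking does in the little-endian generator. A quick randomized check in Python 3 (via_strings and via_int are hypothetical names):

```python
import os

def via_strings(chunk):
    """The string procedure: reverse each byte's bits, concatenate,
    split into 10-bit slices, reverse each slice, parse as binary."""
    s = "".join(format(byte, '08b')[::-1] for byte in chunk)
    return [int(s[i:i + 10][::-1], 2) for i in range(0, 40, 10)]

def via_int(chunk):
    """Integer arithmetic: little-endian 40-bit value, 10 bits at a time."""
    n = int.from_bytes(chunk, 'little')
    return [(n >> shift) & 0x3ff for shift in (0, 10, 20, 30)]

# Compare both approaches on a few thousand random 5-byte chunks
for _ in range(5000):
    chunk = os.urandom(5)
    assert via_strings(chunk) == via_int(chunk)
print("equivalent")
```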