Reading bits from a byte with Python

Question 1

I have instructions describing the structure of a binary file, and I'm trying to build a parser to extract information from it. I was doing all right until I came across the following:

Start with a DWORD Size = 0. You're going to reconstruct the size by getting packs of 7 bits:

1. Get a byte.

2. Add the first 7 bits of this byte to Size.

3. Check bit 7 (the last bit) of this byte. If it's on, go back to 1. to process the next byte.

To resume, if Size < 128 then it will occupy only 1 byte, else if Size < 16384 it will occupy only 2 bytes and so on...

What I'm confused about is what it means to "get bits from a byte", and to "check the last bit of the byte". This is the way I've been reading bytes from the file:


 from struct import unpack
 #..... some other blocks of code
 self.standard = {"DWORD": 4, "WORD": 2, "BYTE": 1, "TEXT11": 1, "TEXT12": 2}
 st = self.standard
 size = 0
 # "B" reads an unsigned byte (0..255); unpack returns a one-element tuple
 data = unpack("B", f.read(st["BYTE"]))
 #how to get bits???
 if size < 128:
     pass  #use st["TEXT11"]
 elif size < 16384:
     pass  #use st["TEXT12"]

Question 2

It doesn't actually say "get bits from a byte". By "last bit" the author apparently means the most significant bit of the value.

Question 3

@Konrad Rudolph I thought so at first too, but I think they are numbering from 0 in step 3, i.e. 0, 1, 2, 3, 4, 5, 6, 7, so bit 7 would essentially be the 8th bit.

Question 4

@KonradRudolph When discussing data formats, the bits of a value are normally numbered from 0 upward, so the most significant bit of a byte-sized value is bit 7, not (nonexistent) bit 8.

Question 5

What I'm confused about is what it means to "get bits from a byte"

You do that using bit operations. For example, to get the first (lower) 7 bits of a byte, use

byte & 127

Or, equivalently,

byte & 0x7f

Or

byte & 0b1111111

In your case, byte would be the first and only member of the tuple data.
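As a minimal sketch of that step, assuming the byte is unpacked as an unsigned value with the struct format "B", the three masks give the same result. The sample byte 0xAB is made up for illustration and stands in for what f.read(1) would return.

```python
from struct import unpack

# Hypothetical sample: one byte as it might come back from f.read(1).
raw = b"\xab"                 # 0b10101011

# unpack always returns a tuple, even for a single value.
data = unpack("B", raw)       # "B" = unsigned byte, range 0..255
byte = data[0]

# Three spellings of the same mask for the lower 7 bits.
low7_dec = byte & 127
low7_hex = byte & 0x7f
low7_bin = byte & 0b1111111

print(byte, low7_dec, low7_hex, low7_bin)  # 171 43 43 43
```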

To get the last bit, you need to both mask the bit (using &) and bit-shift it into position (using >>) — although in your case, since you only need to check whether it’s set, the shifting isn’t absolutely necessary.
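The size-decoding loop from the question could be sketched as follows. This is only an illustration under an assumption that the quoted instructions do not settle: that the first byte carries the lowest 7 bits of Size and each following byte carries the next-higher 7 bits. If the format stores the groups in the opposite order, the shift logic has to change. The name read_size is hypothetical, and io.BytesIO stands in for the real file.

```python
import io

def read_size(f):
    """Read a variable-length size: 7 payload bits per byte, bit 7 = 'more follows'.

    Assumption: the first byte holds the least significant 7 bits.
    """
    size = 0
    shift = 0
    while True:
        chunk = f.read(1)
        if not chunk:
            raise EOFError("file ended in the middle of a size field")
        byte = chunk[0]                 # indexing a bytes object gives an int 0..255
        size |= (byte & 0x7f) << shift  # add the lower 7 bits at the current position
        if not byte & 0x80:             # bit 7 clear: this was the last byte
            return size
        shift += 7

# In-memory stand-ins for the binary file.
print(read_size(io.BytesIO(b"\x05")))       # 5
print(read_size(io.BytesIO(b"\xac\x02")))   # 300
```

Note that testing bit 7 only needs the mask (byte & 0x80 is either 0 or 128), which is why no shift appears in the condition.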

Question 6

Hey, thanks for the reply. But in what scenario would I actually have to do the bit shifting (>>)?

Question 7

@TochiBedford For instance, you could store two four-bit numbers next to each other in a single byte. To read the first number you’d do b & 0b1111. To read the second number, you could do b & 0b11110000 but now that number is shifted by four bits, and you need to shift it back down to get its value. That’s for example how the width and height of Pokémon sprite images used to be stored.
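A small sketch of that two-numbers-in-one-byte case, with a made-up byte value (the variable names are only for illustration):

```python
b = 0b10010110  # hypothetical packed byte: upper four bits 1001, lower four bits 0110

low = b & 0b1111                  # lower four bits, already in place
high_unshifted = b & 0b11110000   # upper four bits, still in their original position
high = high_unshifted >> 4        # shifted down to get the actual value

print(low, high_unshifted, high)  # 6 144 9
```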

Question 8

After playing around with these examples, I had this thought: is it possible to just check whether the last bit is set this way? bin(data[0])[-1] == "0". Or is it bad practice?

Question 9

@TochiBedford Yes, it’s very bad practice. First off, it’s more code and harder to read. And especially once you’re familiar with bit operations, it’s a lot more complicated than the straightforward logic of bit operations, because it has to perform a lot more work. And lastly it’s also a lot less efficient. For individual operations you won’t notice the difference but if you did this a lot inside a loop you will probably notice that it’s around a hundred times (!) less efficient (probably not in Python but likely in C).
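One more practical point, sketched here with a made-up value: the last character of the string returned by bin() is the least significant bit (bit 0), so indexing with [-1] does not look at bit 7 at all. A string-based test of bit 7 would need zero padding to a fixed width first, which is part of why the bitwise test is simpler.

```python
value = 0b10000000             # only bit 7 is set (128)

print(bin(value))              # 0b10000000
lowest_char = bin(value)[-1]   # last character = bit 0, here "0"
bit7_set = bool(value & 0x80)  # bitwise test of bit 7

# String-based equivalent: pad to 8 digits, then look at the first one.
bit7_via_string = format(value, "08b")[0] == "1"

print(lowest_char, bit7_set, bit7_via_string)  # 0 True True
```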

Question 10

Maybe the confusion is related to the binary representation of integers. For example, the number 171 corresponds to this bit pattern (1 byte):

val = 0b10101011 # (bit configuration)
print(val) # -> 171 (integer value)

Now you can use a bit mask to let only one of those bits through (the literals are written with the most significant bit on the left):

print(val & 0b00000001) # -> only the first (lowest) bit passes; prints 1
print(val & 0b10000000) # -> only the last (highest) bit passes; prints 128
print(val & 0b00000100) # -> prints 0 because val does not have a 1 in the third position

Then, to check whether the seventh bit is 1, you can do the following:

print((val & 0b01000000) >> 6)
# val    = 0b10101011
#             ^
# mask   = 0b01000000
# result = 0b00000000 -> 0 (integer)
# shift  = 0b00000000 >> 6 -> 0b0 -> 0 (integer)

The bit shift (>> operator) moves the masked bit down to position 0, so the final result is 0 or 1 instead of 0 or the value of the mask.

For example, if you want the second bit:

print((val & 0b00000010) >> 1)
# val    = 0b10101011
#                  ^
# mask   = 0b00000010
# result = 0b00000010 -> 2 (integer)
# shift  = 0b00000010 >> 1 -> 0b1 -> 1 (integer)
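The mask-then-shift pattern can also be written the other way round (shift first, then mask with 1), which works for any bit position. A minimal sketch, where get_bit is a hypothetical helper name:

```python
def get_bit(value, n):
    """Return bit n of value (0 = least significant) as 0 or 1."""
    return (value >> n) & 1

val = 0b10101011
# Bits listed from position 0 (lowest) up to position 7 (highest).
bits = [get_bit(val, n) for n in range(8)]
print(bits)  # [1, 1, 0, 1, 0, 1, 0, 1]
```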

Konrad Rudolph · Accepted Answer · 2020-07-15 08:08:53Z


