Byte file conversion via Python 3

Question

Is there any simple way for me to read the contents of a binary file as a binary string, turn it into a normal (utf-8) string, do some operations with it, turn it back into a binary string and write it into a binary file? I tried doing something as simple as:

a_file = open('image1.png', 'rb')
text = b''
for a_line in a_file:
    text += a_line
a_file.close()
text2 = text.decode('utf-8')
text3 = text2.encode()
a_file = open('image2.png', 'wb')
a_file.write(text3)
a_file.close()

but I get 'Unicode can not decode bytes in position...'

What am I doing terribly wrong?

Comment 1 (on the question)

Why do you think a PNG file would contain text?

Comment 2 (on the question)

Not sure what you're trying to accomplish, but this answer to another question may help.

Answer

UTF-8 has enough structure that arbitrary byte sequences are usually not valid UTF-8. The best approach would be to simply work with the bytes read from the file (which you can get in one step with text = a_file.read()). Binary strings (type bytes) have most of the string methods you'll want, even text-oriented ones like isupper() or swapcase(). And then there's bytearray, a mutable counterpart to the bytes type.
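As a rough illustration of both points, the sketch below uses the 8-byte PNG signature as stand-in data (an assumption made so the example runs without the question's image1.png). It shows the UTF-8 decode failing and then does a couple of operations directly on bytes and bytearray.

```python
# The first 8 bytes of every PNG file (the PNG signature), used here as
# stand-in data instead of reading the question's image1.png.
data = b"\x89PNG\r\n\x1a\n"

# 0x89 is a UTF-8 continuation byte and cannot start a character,
# so decoding fails on the very first byte.
try:
    data.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# bytes objects support most str-style methods directly.
lowered = data.lower()      # only ASCII letters are affected
has_png = b"PNG" in data    # substring search works on bytes too

# bytearray is the mutable counterpart: edit in place, then convert back.
buf = bytearray(data)
buf[1:4] = b"png"
modified = bytes(buf)
```

With a real file, data would come from a_file.read() on a file opened with 'rb', and the result could be written back to a file opened with 'wb' with no decoding step in between.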

If for some reason you really want to turn your bytes into a str object, use a pure 8-bit encoding like Latin-1, which maps each byte to exactly one character, so decoding can never fail. You'll get a Unicode string (str), which is what you are really after. (UTF-8 is just one encoding of Unicode, which is a very different thing.)
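A minimal sketch of the Latin-1 round trip, assuming the data is only modified in ways that keep every character within U+0000 to U+00FF. The variable names are illustrative and not taken from the question.

```python
# Every possible byte value, as a worst-case stand-in for binary file content.
data = bytes(range(256))

# Latin-1 maps byte N to code point N, so decoding never fails.
text = data.decode("latin-1")

# Encoding the unmodified string gives back exactly the original bytes.
round_tripped = text.encode("latin-1")

# The reverse direction can fail: characters above U+00FF (here the euro
# sign) have no Latin-1 byte, so they must not be introduced while editing.
try:
    "\u20ac".encode("latin-1")
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False
```

The resulting str is a sequence of placeholder characters for the bytes, not meaningful text, so this is mainly useful when some str-only tool has to be applied to the data.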

Comment 1 (on the answer)

And note: if you settle on a working encoding (e.g. latin-1), you don't need to handle the encode/decode yourself in Python 3. Just change open('image1.png', 'rb') to open('image1.png', 'r', encoding='latin-1', newline=''), and for the output use open('image2.png', 'w', encoding='latin-1', newline=''). You can then read and write without manually encoding or decoding: the data is decoded to str for you on read and encoded for you on write. (The newline='' argument turns off newline translation, which would otherwise alter \r and \r\n bytes in binary data.)
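One caveat with text mode on binary data is newline translation: by default, reading converts \r\n and \r to \n, which silently changes the data. The sketch below passes newline='' to open() to switch that off and checks that a copy made through str is byte-for-byte identical. The file names in.bin and out.bin and the payload are made up for the example.

```python
import os
import tempfile

# Stand-in binary content: the PNG signature (which contains \r\n)
# followed by every possible byte value.
payload = b"\x89PNG\r\n\x1a\n" + bytes(range(256))

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.bin")    # hypothetical file names
    dst = os.path.join(tmp, "out.bin")
    with open(src, "wb") as f:
        f.write(payload)

    # Text mode with latin-1: open() decodes on read and encodes on write.
    # newline="" disables newline translation in both directions.
    with open(src, "r", encoding="latin-1", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="latin-1", newline="") as f:
        f.write(text)

    with open(dst, "rb") as f:
        copied = f.read()

    # For comparison: the same read without newline="" rewrites
    # every \r\n and lone \r as \n.
    with open(src, "r", encoding="latin-1") as f:
        translated = f.read()
```

Here copied equals payload, while translated no longer contains any "\r" characters, which is the kind of corruption the PNG signature is designed to expose.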

Comment 2 (on the answer)

Good point; though opening the files in binary mode makes the code a little more transparent... I'm not sure the OP should be converting to str at all.

Answered by alexis (accepted answer), 2015-10-17 00:38:55Z

