Is there any simple way for me to read the contents of a binary file as a binary string, turn it into a normal (utf-8) string, do some operations with it, turn it back into a binary string and write it into a binary file? I tried doing something as simple as:
a_file = open('image1.png', 'rb')
text = b''
for a_line in a_file:
    text += a_line
a_file.close()
text2 = text.decode('utf-8')
text3 = text2.encode()
a_file = open('image2.png', 'wb')
a_file.write(text3)
a_file.close()
but I get 'Unicode can not decode bytes in position...'
What am I doing terribly wrong?
Why do you think a PNG file would contain text? – Ignacio Vazquez-Abrams, Oct 17, 2015 at 0:05
Not sure what you're trying to accomplish, but this answer to another question may help. – martineau, Oct 17, 2015 at 0:11
1 Answer
UTF-8 has enough structure that arbitrary sequences of bytes are usually not valid UTF-8, which is why the decode step fails on a PNG. The best approach is to work directly with the bytes read from the file (which you can get in one step with text = a_file.read()). Binary strings (type bytes) have most of the string methods you'll want, even text-oriented ones like isupper() or swapcase(), although these only act on ASCII characters. And then there's bytearray, a mutable counterpart to the bytes type.
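As a rough sketch of the bytes-only approach described above: the file names mirror the question, but the file contents here are made-up stand-in data (the 8-byte PNG signature plus some ASCII filler) written to a temporary directory, so the snippet runs on its own. The variable names are illustrative only.

```python
import os
import tempfile

# Hypothetical stand-in for a real image: PNG signature plus ASCII filler.
data = b'\x89PNG\r\n\x1a\n' + b'Hello, bytes'

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'image1.png')
    dst = os.path.join(tmp, 'image2.png')
    with open(src, 'wb') as f:
        f.write(data)

    # Read the whole file in one step; no line-by-line loop is needed.
    with open(src, 'rb') as f:
        raw = f.read()

    # bytes supports str-like methods; they only affect ASCII letters.
    swapped = raw.swapcase()

    # bytearray is mutable, so it can be edited in place.
    buf = bytearray(raw)
    buf[1:4] = b'png'

    # Write the untouched bytes back out; no decode/encode step is involved.
    with open(dst, 'wb') as f:
        f.write(raw)
    with open(dst, 'rb') as f:
        copied = f.read()
```

Because nothing is ever decoded, the copy is byte-for-byte identical to the input, and no UnicodeDecodeError can occur.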
If for some reason you really want to turn your bytes into a str object, use a pure 8-bit encoding like Latin-1, which maps every byte value to exactly one character. You'll get a Unicode string, which is what you are really after. (UTF-8 is just an encoding for Unicode, which is a very different thing.)
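A small sketch of why Latin-1 round-trips where UTF-8 does not. The input is simply every possible byte value, chosen for illustration rather than taken from a real file.

```python
# Every possible byte value, as a stand-in for arbitrary binary data.
raw = bytes(range(256))

# UTF-8 rejects this input, because bytes such as 0x80 cannot start a character.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps byte N to code point N, so decoding cannot fail.
text = raw.decode('latin-1')

# Ordinary str operations work on the decoded text.
edited = text.replace('A', 'a')

# Encoding with the same codec restores the original bytes exactly.
back = text.encode('latin-1')
```

Note that the result must be encoded with Latin-1 again; if the string operations introduce characters above U+00FF, encoding will raise UnicodeEncodeError.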
2 Comments
latin-1), you don't need to handle the encode/decode yourself in Python 3. Just change open('image1.png', 'rb') to open('image1.png', 'r', encoding='latin-1'), and for the output, open('image2.png', 'w', encoding='latin-1') and you can read and write without bothering to manually encode/decode; it will have been decoded to str for you on read, and will encode the str for you on write.
str at all.
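A hedged sketch of the text-mode variant from the comment above. One addition that is not in the comment: newline='' is passed to open(), because text mode otherwise translates line endings (for example '\r\n' to '\n' on read), which would alter binary data such as the PNG signature. The file contents are made-up stand-in data in a temporary directory.

```python
import os
import tempfile

# Hypothetical stand-in data: PNG signature (contains \r\n) plus all byte values.
data = b'\x89PNG\r\n\x1a\n' + bytes(range(256))

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'image1.png')
    dst = os.path.join(tmp, 'image2.png')
    with open(src, 'wb') as f:
        f.write(data)

    # Text mode decodes to str on read; newline='' disables newline translation.
    with open(src, 'r', encoding='latin-1', newline='') as f:
        text = f.read()

    # Text mode encodes the str on write; newline='' again leaves '\n' untouched.
    with open(dst, 'w', encoding='latin-1', newline='') as f:
        f.write(text)

    # Re-read in binary mode to compare against the original bytes.
    with open(dst, 'rb') as f:
        result = f.read()
```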