I have read in an XML email attachment with
bytes_string=part.get_payload(decode=False)
The payload comes in as a byte string, as my variable name suggests.
I am trying to use the recommended Python 3 approach to turn these bytes into a regular string that I can manipulate.
The example shows:
str(b'abc','utf-8')
How can I apply the b (bytes) prefix to my variable bytes_string and use the recommended approach?
The way I tried doesn't work:
str(bbytes_string, 'utf-8')
Does this answer your question? Convert bytes to a string (Josh Correia, Oct 21, 2020 at 22:59)
4 Answers
You had it nearly right in the last line. You want
str(bytes_string, 'utf-8')
because the type of bytes_string is bytes, the same as the type of b'abc'.
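For context, here is a minimal sketch of the attachment case from the question. It assumes the bytes come from get_payload(decode=True), which in the standard email module is the call that undoes the Content-Transfer-Encoding (e.g. base64) and returns bytes. The message, the filename note.xml and its XML content are made up for illustration.

```python
import email
from email.message import EmailMessage

# Build a small message with a hypothetical UTF-8 XML attachment.
msg = EmailMessage()
msg.set_content("See the attached XML.")
msg.add_attachment("<note>héllo</note>".encode("utf-8"),
                   maintype="application", subtype="xml",
                   filename="note.xml")

# Parse it back from raw bytes, as you would with a received message.
received = email.message_from_bytes(msg.as_bytes())

for part in received.walk():
    if part.get_filename() == "note.xml":
        bytes_string = part.get_payload(decode=True)  # bytes, base64 already undone
        text = str(bytes_string, "utf-8")             # same as bytes_string.decode("utf-8")
        print(type(bytes_string).__name__, "->", type(text).__name__)
        break
```

Since bytes_string is already a bytes object, it is passed to str() directly; no b prefix is involved.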
Comments:
- str(bytes_string, 'utf-8', 'ignore'): errors can be ignored by passing the third parameter.
- … bytes_string, why would you want to ignore errors?
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 0: invalid start byte, for the following byte string: b'\xbf\x8cd\xba\x7f\xe0\xf0\xb8t\xfe.TaFJ\xad\x100\x07p\xa0\x1f90\xb7P\x8eP\x90\x06)0' @TobySpeight

Call decode() on a bytes instance to get the text that it encodes.
text = bytes_string.decode()  # UTF-8 by default; avoid naming the result str, which shadows the built-in
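A quick sketch of how decode() relates to the str(bytes, encoding) form in the question: with no argument, bytes.decode() uses UTF-8, and any other encoding has to be named explicitly. The sample word and the Latin-1 encoding are only illustrative choices.

```python
data = "Grüße".encode("utf-8")           # b'Gr\xc3\xbc\xc3\x9fe'

text = data.decode()                     # bytes.decode() defaults to UTF-8
same = str(data, "utf-8")                # the str(bytes, encoding) spelling
print(text == same)                      # True

# Other encodings must be named; decoding with the wrong one fails.
latin = "Grüße".encode("latin-1")        # b'Gr\xfc\xdfe'
print(latin.decode("latin-1") == "Grüße")  # True

failure = None
try:
    latin.decode()                       # UTF-8 cannot decode byte 0xfc
except UnicodeDecodeError as exc:
    failure = exc.reason
print("UTF-8 failed:", failure)          # UTF-8 failed: invalid start byte
```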
Comments:
- str = bytes.decode("utf-8"): to use a different encoding, replace utf-8 with the encoding you want.
- How do I filter (skip) non-UTF-8 characters from an array?
To address this comment on @uname01's post and the OP's question, ignore the errors:
Code
>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'
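Applied to a list of byte strings (the "array" from the comment), the same call can go in a list comprehension. The list below is hypothetical sample data; note that errors="ignore" drops undecodable bytes silently, so you cannot tell afterwards which items were damaged.

```python
# Hypothetical list of byte strings, some of which are not valid UTF-8.
raw_items = [b"abc", b"\x80abc", b"caf\xc3\xa9", b"\xbf\x8cdef"]

# errors="ignore" skips every byte that cannot be decoded as UTF-8.
cleaned = [item.decode("utf-8", errors="ignore") for item in raw_items]
print(cleaned)  # ['abc', 'abc', 'café', 'def']
```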
Details
From the docs, here are more examples using the same errors parameter:
>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
invalid start byte
The errors argument specifies the response when the input string can’t be converted according to the encoding’s rules. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (use U+FFFD, REPLACEMENT CHARACTER), 'ignore' (just leave the character out of the Unicode result), or 'backslashreplace' (insert a \xNN escape sequence).
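To see each handler side by side, here is a sketch that runs them on the non-UTF-8 byte string quoted in a comment under the first answer. The variable names are illustrative; ascii() is used only so the printed output stays ASCII-safe (U+FFFD shows up as \ufffd).

```python
# Byte string quoted in a comment above; it is not valid UTF-8.
sample = b'\xbf\x8cd\xba\x7f\xe0\xf0\xb8t\xfe.TaFJ\xad\x100\x07p\xa0\x1f90\xb7P\x8eP\x90\x06)0'

results = {}
for handler in ("strict", "replace", "ignore", "backslashreplace"):
    try:
        results[handler] = sample.decode("utf-8", errors=handler)
    except UnicodeDecodeError as exc:
        # 'strict' raises on the first bad byte; record where and why.
        results[handler] = f"UnicodeDecodeError: {exc.reason} at byte {exc.start}"
    print(f"{handler:>16}: {ascii(results[handler])}")
```

If bytes like these fail under every encoding you try, they may not be text at all, and no error handler will turn them into meaningful text.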
UPDATED: the output no longer has the b prefix or the quotes at the start and end.
How to convert bytes to strings exactly as they are displayed, even in weird situations:
Because your data may contain bytes that the 'utf-8' codec cannot decode, it's better to call str without an encoding argument and slice off the b'...' wrapper:
some_bad_bytes = b'\x02-\xdfI#)'
text = str(some_bad_bytes)[2:-1]  # str() gives "b'...'"; drop the leading b' and trailing '
print(text)
Output: \x02-\xdfI#)
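If you pass 'utf-8' as the encoding for these particular bytes, you get a UnicodeDecodeError.
In Python 3, text is now an ordinary (Unicode) str; note, however, that it contains the escape sequences from the bytes' repr (for example the four characters \x02), not decoded characters.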
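To make the difference from real decoding visible, this sketch compares the str(...)[2:-1] trick with decode(..., errors="backslashreplace"), which is a swapped-in alternative and not part of this answer. With valid UTF-8 such as "café", the slicing trick returns escape text rather than the decoded character; ascii() keeps the printed output ASCII-safe.

```python
good = "café".encode("utf-8")   # valid UTF-8: b'caf\xc3\xa9'
bad = b'\x02-\xdfI#)'           # the bytes used in the answer above

rows = []
for data in (good, bad):
    via_repr = str(data)[2:-1]  # the answer's approach: repr text without b'...'
    via_decode = data.decode("utf-8", errors="backslashreplace")  # actual decoding
    rows.append((via_repr, via_decode))
    print(ascii(via_repr), ascii(via_decode))
```

Only the undecodable byte 0xdf is escaped by backslashreplace; valid bytes come back as real characters.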
Comments:
- … b at the start of it, so I updated the answer.