How do I convert a Python 3 byte-string variable into a regular string? [duplicate]

Question 1

I have read in an XML email attachment with

bytes_string=part.get_payload(decode=False)

The payload comes in as a byte string, as my variable name suggests.

I am trying to use the recommended Python 3 approach to turn this string into a usable string that I can manipulate.

The example shows:

str(b'abc','utf-8')

How can I apply the b (bytes) keyword argument to my variable bytes_string and use the recommended approach?

The way I tried doesn't work:

str(bbytes_string, 'utf-8')

Question 2

Does this answer your question? Convert bytes to a string

Question 3

You had it nearly right in the last line. You want

str(bytes_string, 'utf-8')

because the type of bytes_string is bytes, the same as the type of b'abc'.
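As a minimal sketch of the point above: the b prefix is only literal syntax, so a variable that already holds bytes is passed to str() directly. The sample value here is a made-up stand-in for the real payload, and UTF-8 is assumed to be the correct encoding.

```python
# Hypothetical stand-in for the payload read from the email part.
bytes_string = "<note>héllo</note>".encode("utf-8")

# A variable holding bytes has the same type as a b'...' literal.
assert type(bytes_string) is type(b"abc")

# Both spellings decode the bytes into a regular str.
text_a = str(bytes_string, "utf-8")
text_b = bytes_string.decode("utf-8")

print(text_a)
print(text_a == text_b)
```

str(x, 'utf-8') and x.decode('utf-8') give the same result; which one to use is a matter of taste.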

Question 4

str(bytes_string, 'utf-8', 'ignore') Errors can be ignored by passing the third parameter.

Question 5

That looks like it should be a comment to pylang's answer (which addresses handling invalid input). If (you believe that) there's nothing wrong with bytes_string, why would you want to ignore errors?

Question 6

I am getting the following error with your approach: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 0: invalid start byte for the following bytes string b'\xbf\x8cd\xba\x7f\xe0\xf0\xb8t\xfe.TaFJ\xad\x100\x07p\xa0\x1f90\xb7P\x8eP\x90\x06)0' @TobySpeight

Question 7

Well @alper, that's not a valid UTF-8 string, so what did you expect?

Question 8

Call decode() on a bytes instance to get the text which it encodes.

text = bytes_string.decode()

Question 9

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 230: invalid start byte

Question 10

@JuhaUntinen your encoding is probably not utf-8.
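A small sketch of what a wrong-encoding error of this kind can look like. The sample bytes are invented, and Latin-1 is only an assumption about where a lone 0xf6 byte might come from; the real encoding has to be taken from the data's source (for example a charset header).

```python
# Hypothetical sample: "Schön" encoded as Latin-1, not UTF-8.
raw = b"Sch\xf6n"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xf6 cannot start a UTF-8 sequence, so decoding fails here.
    failed_at = exc.start
    print("utf-8 failed at position", failed_at)

# Decoding with the encoding the data was actually written in works.
text = raw.decode("latin-1")
print(text)
```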

Question 11

How to filter (skip) non-UTF8 characters from array?

Question 12

Use text = bytes_string.decode("utf-8") to name the encoding explicitly. Replace utf-8 with the encoding you want.

Question 13

How to filter (skip) non-UTF8 characters from array?

To address this comment in @uname01's post and the OP, ignore the errors:

Code

>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'

Details

From the docs, here are more examples using the same errors parameter:

>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict") 
Traceback (most recent call last):
 ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
 invalid start byte

The errors argument specifies the response when the input string can’t be converted according to the encoding’s rules. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (use U+FFFD, REPLACEMENT CHARACTER), or 'ignore' (just leave the character out of the Unicode result).
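Since the comment asks about an array, here is a hedged sketch that applies the same errors parameter to each element of a list of bytes objects. The list contents are made up for illustration.

```python
# Hypothetical input: a list of byte strings, one with an invalid byte.
chunks = [b"abc", b"\x80abc", b"caf\xc3\xa9"]

# Drop undecodable bytes from every element.
cleaned = [chunk.decode("utf-8", errors="ignore") for chunk in chunks]

# Or keep a visible marker (U+FFFD) where a byte was invalid.
marked = [chunk.decode("utf-8", errors="replace") for chunk in chunks]

print(cleaned)
print(marked)
```

Note that 'ignore' silently discards data, so 'replace' may be the safer choice when you need to notice that something was wrong.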

Question 14

UPDATED:

To get the text without the leading b and the surrounding quotes

How to convert bytes to a string exactly as they are displayed, even in unusual situations.

Because your data may contain bytes that are not valid in the 'utf-8' encoding, you can call str without any additional parameters and slice off the b'...' wrapper:

some_bad_bytes = b'\x02-\xdfI#)'
text = str( some_bad_bytes )[2:-1]
print(text)

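Output: \x02-\xdfI#)

If you pass the 'utf-8' parameter with these specific bytes, you should get a UnicodeDecodeError.

The result is an ordinary Python 3 str, so it can be handled as text from here on. Note that it contains the escape sequences as literal characters (a backslash, x, 0, 2, and so on), not the decoded bytes.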
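To make the behaviour of this approach concrete, the sketch below compares the sliced repr with a Latin-1 decode. The Latin-1 variant is not part of the answer above; it is shown only as one possible alternative when every byte must survive a round trip.

```python
some_bad_bytes = b'\x02-\xdfI#)'

# str() without an encoding returns the repr, including b and quotes.
as_repr = str(some_bad_bytes)

# Slicing removes the leading b' and the trailing quote.
escaped = as_repr[2:-1]
print(escaped)
print(len(some_bad_bytes), len(escaped))  # escape sequences make it longer

# Assumed alternative: Latin-1 maps each byte to exactly one character,
# so encoding the result again gives back the original bytes.
lossless = some_bad_bytes.decode('latin-1')
print(lossless.encode('latin-1') == some_bad_bytes)
```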

Question 15

result is "b'\\x02-\\xdfI#)'" which probably isn't what he wants

Question 16

@GlenThompson It is just an example of unwanted conditions that may happen; I used this specific text intentionally. If you mean that the text starts with a b, I have updated the answer.

Question 17

Thank you very much. I was searching for a way to remove the b'' from a string that has ANSI characters, without encoding and without losing the characters. I'm new to Python and didn't know that I could trim a sequence at the start and end using indexes :O

Accepted Answer (the str(bytes_string, 'utf-8') answer above) · Toby Speight · 2015-06-25 21:09:13Z


