How to convert 'binary string' to normal string in Python3? [duplicate]

Question

For example, I have a string like this (the return value of subprocess.check_output):

>>> b'a string'
b'a string'

Whatever I do to it, it is always printed with the annoying b' before the string:

>>> print(b'a string')
b'a string'
>>> print(str(b'a string'))
b'a string'

Does anyone have any ideas about how to use it as a normal string or convert it into a normal string?

Comment

@HanfeiSun What you call a "binary string" is a bytes object (see the documentation on bytes objects in the standard library).
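To make the distinction concrete, here is a small sketch (the variable name `raw` is only an illustration) showing that the value is a `bytes` object, that `str()` without an encoding just returns its repr (which is where the `b'...'` prefix comes from), and that passing an encoding to `str()` actually decodes it:

```python
# raw stands in for the value returned by subprocess.check_output (illustrative)
raw = b'a string'

# It is a bytes object, not a str
print(type(raw))        # <class 'bytes'>

# str() without an encoding returns the repr, b'' prefix included
as_repr = str(raw)
print(as_repr)          # b'a string'

# str() with an encoding decodes the bytes, the same as raw.decode('ascii')
as_text = str(raw, 'ascii')
print(as_text)          # a string
```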

Answer

If the answer from falsetru doesn't work, you could also try:

>>> b'a string'.decode('utf-8')
'a string'
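The choice between 'ascii' and 'utf-8' only matters once the bytes contain non-ASCII characters. A short sketch, using the word "café" as an arbitrary sample, showing that UTF-8 handles it, that calling decode() with no argument uses UTF-8 in Python 3, and that ASCII rejects it:

```python
text = 'caf\u00e9'                 # 'café', written with an escape for portability
data = text.encode('utf-8')        # U+00E9 becomes the two bytes 0xc3 0xa9
print(data)                        # b'caf\xc3\xa9'

# Explicit 'utf-8' and the no-argument default give the same result in Python 3
print(data.decode('utf-8') == text)   # True
print(data.decode() == text)          # True

# ASCII has no mapping for byte 0xc3, so decoding with it raises an error
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii decode failed:', exc.reason)
```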

Comment

See the official encode() and decode() documentation in the codecs library. UTF-8 is the default encoding for these functions, but Python 3 provides several standard encodings, such as latin_1 or utf_32.
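As a rough illustration of those codec names (latin_1 and utf_32 are standard Python codec names; the sample text is arbitrary), the same str produces different bytes depending on the encoding, and each result round-trips when decoded with the codec that produced it. codecs.encode() and codecs.decode() are the function forms of the same conversions:

```python
import codecs

text = 'a string'

# Encode the same text with a few standard codecs and record the byte length
sizes = {}
for name in ('utf_8', 'latin_1', 'utf_32'):
    encoded = text.encode(name)
    # Decoding with the codec that produced the bytes restores the text
    assert encoded.decode(name) == text
    sizes[name] = len(encoded)

print(sizes)   # {'utf_8': 8, 'latin_1': 8, 'utf_32': 36}  (UTF-32 adds a 4-byte BOM)

# The codecs module offers the same conversions as functions
assert codecs.encode(text, 'latin_1') == text.encode('latin_1')
assert codecs.decode(b'a string', 'latin_1') == text
```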

Comment

This post is helpful because it remedies error messages such as "'utf-8' codec can't decode byte..."
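When that error appears, the bytes are usually not UTF-8 at all. A minimal sketch, assuming the input happens to be Latin-1 encoded (the sample bytes are made up for illustration): decode with the correct codec if you know it, or use the errors argument of bytes.decode() to replace or escape the offending bytes instead of raising:

```python
# 0xe9 is 'é' in Latin-1, but in UTF-8 it starts a multi-byte sequence
raw = b'caf\xe9'

try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc)   # 'utf-8' codec can't decode byte 0xe9 in position 3: ...

# Best fix: decode with the encoding the data was actually written in
fixed = raw.decode('latin_1')
print(fixed == 'caf\u00e9')        # True

# Fallbacks when the encoding is unknown: replace or escape the bad bytes
replaced = raw.decode('utf-8', errors='replace')            # 'caf\ufffd'
escaped = raw.decode('utf-8', errors='backslashreplace')
print(escaped)                     # caf\xe9
```

Replacing or escaping loses or obscures information, so it is a fallback rather than a substitute for knowing the real encoding.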

Answer (accepted), by falsetru, 2013-07-12 12:55:43Z

Decode it.

>>> b'a string'.decode('ascii')
'a string'

To get bytes from a string, encode it.

>>> 'a string'.encode('ascii')
b'a string'
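Applied to the case in the question, a sketch might look like this. It runs a child Python interpreter (sys.executable) as a stand-in for whatever command is actually being called, which is an assumption made only so the example is portable:

```python
import subprocess
import sys

# A child Python interpreter stands in for "some command" so the example is portable
output = subprocess.check_output([sys.executable, '-c', 'print("a string")'])
print(type(output))                    # <class 'bytes'>

# Decode to str, then drop the trailing newline (\n, or \r\n on Windows)
text = output.decode('ascii').rstrip()
print(text)                            # a string

# And back again: encode a str to get bytes
assert text.encode('ascii') == b'a string'
```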

Comments

Comment by falsetru

@lyomi, I used ascii because the given string consisted of ASCII letters. You don't need to specify an encoding if it is UTF-8 (the default in Python 3.x, according to the str.encode and bytes.decode docstrings).

2016-03-30T08:28:08.68Z

Comment by Jmons

@lyomi In 2016 (and it's nearly the end), people still use ASCII. There are many 'legacy' products and systems (including specifications), but there are also plenty of reasons why you might create a 'binary string' where you don't want Unicode, or anything else, to try to 'merge' multiple bytes into a single character. We often use 'strings' to hold binary data, for instance when making DNS requests.

2016-09-23T11:55:42.38Z
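To illustrate the kind of binary data meant here, a sketch that packs a 12-byte DNS message header with struct. This is not a DNS client, and the ID and flag values are arbitrary examples; the point is that such bytes are not text and should stay as bytes:

```python
import struct

# Illustrative DNS header: ID, flags (0x0100 = recursion desired), then the
# question, answer, authority and additional record counts, all big-endian.
header = struct.pack('!HHHHHH', 0x1234, 0x0100, 1, 0, 0, 0)

print(len(header))      # 12
print(header.hex())     # 123401000001000000000000

# Decoding these bytes as ASCII or UTF-8 would only produce meaningless
# control characters; use hex() / bytes.fromhex() for a readable form instead.
assert bytes.fromhex(header.hex()) == header
```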

Comment by aturegano

I suggest adding the following to complete the answer. Most of the time we need to decode bytes coming from the operating system, such as console output. The most Pythonic way I have found to do this is to import locale and then set os_encoding = locale.getpreferredencoding(). We can then decode with my_b_string.decode(os_encoding).

2017-07-27T15:10:13.013Z
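A sketch of that suggestion, again using a child Python interpreter as a hypothetical stand-in command. It assumes the child actually writes its output in the locale's preferred encoding, which is not guaranteed for every program. The second option uses the encoding parameter that subprocess.check_output accepts since Python 3.6:

```python
import locale
import subprocess
import sys

# Encoding preferred by the current locale; False means "don't call setlocale()"
os_encoding = locale.getpreferredencoding(False)

# A child Python interpreter stands in for the real command (illustrative)
cmd = [sys.executable, '-c', 'print("a string")']

# Option 1: take the bytes and decode them yourself, as suggested in the comment
raw = subprocess.check_output(cmd)
text1 = raw.decode(os_encoding)

# Option 2 (Python 3.6+): have subprocess decode the output via encoding=
text2 = subprocess.check_output(cmd, encoding=os_encoding)

print(os_encoding, repr(text1.strip()), repr(text2.strip()))
```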

Comment by falsetru

@aturegano, it's not the only option: there are also sys.getfilesystemencoding(), sys.stdin.encoding, and sys.stdout.encoding. IMHO, relying on automatic encoding detection like this may not always solve the problem, because the sub-program (the OP is using subprocess) could determine its encoding some other way (or even hard-code it). Thanks for the feedback, anyway.

2017-07-28T09:02:57.567Z

Comment by aturegano

@falsetru Note that sys.getfilesystemencoding() returns the name of the encoding used to convert between Unicode filenames and bytes filenames, and it depends strongly on the operating system you are using. AFAIK, this function is used to convert to the system’s preferred representation. That means it will not infer the encoding used by the console, which can be obtained with the aforementioned locale.getpreferredencoding() function.

2017-07-28T11:42:37.013Z
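To compare the encodings discussed in this exchange on a particular machine, a quick sketch that prints each candidate. The values vary by platform and configuration, and none of them reveals which encoding a child process actually used; if that is known (documented or hard-coded), decoding with it explicitly is the safer choice:

```python
import locale
import sys

# Candidate encodings mentioned in this thread; on many systems some of them differ
candidates = {
    'locale.getpreferredencoding(False)': locale.getpreferredencoding(False),
    'sys.getfilesystemencoding()': sys.getfilesystemencoding(),
    # stdin/stdout can be replaced or missing (e.g. under some runners or GUIs),
    # so fall back to None instead of raising AttributeError
    'sys.stdin.encoding': getattr(sys.stdin, 'encoding', None),
    'sys.stdout.encoding': getattr(sys.stdout, 'encoding', None),
}

for name, value in candidates.items():
    print(f'{name:36} {value}')
```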
