How to convert between bytes and strings in Python 3?

This is a Python 101 type question, but it had me baffled for a while when I tried to use a package that seemed to convert my string input into bytes.

As you will see below, I found the answer myself, but I felt it was worth recording here because of the time it took me to work out what was going on. The behaviour seems to be generic to Python 3, so I have not named the original package I was playing with. It does not seem to be an error; the package simply had a .tostring() method that was clearly not producing what I understood as a string.

My test program goes like this:

import mangler # spoof package
stringThing = """
<Doc>
 <Greeting>Hello World</Greeting>
 <Greeting>你好</Greeting>
</Doc>
"""
# print out the input
print('This is the string input:')
print(stringThing)
# now make the string into bytes
bytesThing = mangler.tostring(stringThing) # pseudo-code again
# now print it out
print('\nThis is the bytes output:')
print(bytesThing)

The output from this code gives this:

This is the string input:
<Doc>
 <Greeting>Hello World</Greeting>
 <Greeting>你好</Greeting>
</Doc>
This is the bytes output:
b'\n<Doc>\n <Greeting>Hello World</Greeting>\n <Greeting>\xe4\xbd\xa0\xe5\xa5\xbd</Greeting>\n</Doc>\n'

So there is a need to convert between bytes and strings, so that non-ASCII characters do not end up as gobbledegook when the result is printed.
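For reference, here is a minimal sketch of the round trip using only the built-in str.encode() and bytes.decode() methods. It assumes the bytes are UTF-8, which the \xe4\xbd\xa0\xe5\xa5\xbd sequence in the output above suggests; the variable names are just illustrative and the spoof mangler package is not involved.

```python
# The same document as in the question, held as a str (Unicode text).
string_thing = """
<Doc>
 <Greeting>Hello World</Greeting>
 <Greeting>你好</Greeting>
</Doc>
"""

# str -> bytes: encode with an explicit encoding (assumed to be UTF-8 here).
bytes_thing = string_thing.encode("utf-8")

# bytes -> str: decode with the same encoding to get the text back.
round_tripped = bytes_thing.decode("utf-8")

print(bytes_thing)                    # shows the b'...' form with \x escapes
print(round_tripped == string_thing)  # True
print(round_tripped)                  # readable text, including 你好
```

Printing a bytes object shows its repr, which is why the non-ASCII characters appear as \x escapes; decoding first gives back readable text.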

  • If you look at the actual method implementations you'll see that UTF-8 is the default encoding, so you can omit it when you know the encoding is indeed UTF-8, i.e. stringThing.encode() and bytesThing.decode() will do just fine. Commented Jul 17, 2016 at 15:29
  • @ccpizza Making the encoding explicit in the above examples makes it much clearer what is going on, and IMHO is good practice. Not all unicode is UTF-8. It also avoids the silent failure referred to in the last paragraph. Commented Jul 18, 2016 at 18:06
  • Totally agree; explicit is better than implicit, but IMO it is good to know what the implicit behaviour is. Whether to use it or not is another question. Just because you can doesn't mean you should :) Commented Jul 18, 2016 at 21:17
  • In Python 3 it's safer to use decode('utf-8', 'backslashreplace') to avoid an exception if the encoding is unknown. One shouldn't always assume UTF-8! Commented Feb 12, 2018 at 17:19
  • bytesThing.decode(encoding=locale.getpreferredencoding()) is more accurate than ignorantly assuming UTF-8. Commented Mar 11, 2025 at 21:28
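The options raised in these comments can be compared side by side. This is only a sketch: the sample strings are made up for illustration, the 'backslashreplace' handler works for decoding only on Python 3.5 or later, and the value returned by locale.getpreferredencoding() depends on the platform and locale settings.

```python
import locale

text = "你好"

# With no argument, str.encode() and bytes.decode() default to UTF-8.
data = text.encode()
assert data.decode() == text

# Bytes that are not valid UTF-8: "café" encoded as Latin-1 (illustrative sample).
latin1_data = "café".encode("latin-1")

# The default error handler is 'strict', so a wrong guess raises an exception.
try:
    latin1_data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# 'backslashreplace' keeps undecodable bytes as escape sequences instead of raising.
lenient = latin1_data.decode("utf-8", "backslashreplace")
print(lenient)

# The locale's preferred encoding; the result varies between systems.
preferred = locale.getpreferredencoding()
print(preferred)
```

Note that 'backslashreplace' avoids the exception but does not recover the original characters; it only makes the mismatch visible in the resulting string.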
