edited Aug 4, 2013 at 11:18

Tim Pietzcker


You're seeing the UTF-8-encoded version of your string (which you shouldn't name str, by the way, because that shadows the built-in type). The # -*- coding: utf-8 -*- line at the start of your script tells Python that the script file is saved in that encoding. Are you sure that it actually is?

If it isn't (check your editor!), or if the terminal window where you print the string uses a different encoding, you'll get gibberish, or an error if the encoded bytes can't be interpreted in that encoding.
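To make the mismatch concrete, here is a minimal sketch of the same bytes being read under two encodings. It assumes the consumer (terminal or editor) expects GBK, which appears to be how the 娴嬭瘯 in the example further down arises; your actual code page may differ. The snippet is written to run under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: identical bytes, interpreted under two different encodings.
text = u"测试"

# What a UTF-8 source file or a UTF-8 terminal actually holds.
utf8_bytes = text.encode("utf-8")

# Assumption: the consumer believes the bytes are GBK, not UTF-8.
garbled = utf8_bytes.decode("gbk")

print(repr(utf8_bytes))
print(repr(garbled))

# Nothing was lost, only misread: reversing the wrong step restores the text.
recovered = garbled.encode("gbk").decode("utf-8")
print(recovered == text)
```

The bytes never change here; only the encoding used to interpret them does, which is why the fix is to agree on the encoding rather than to alter the data.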

You only get a Unicode object once you decode your (byte) string.

So first you need to know your terminal's character encoding. Then convert all strings to Unicode as early as possible and manipulate only Unicode objects in your program until it's time to output them, at which point you encode them to the correct encoding.

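For example:

# -*- coding: utf-8 -*-
s = u"测试"                      # a Unicode object, not a byte string
s = s + u"娴嬭瘯"                # concatenating Unicode objects stays Unicode
print s.encode("somecodepage")   # encode only on output; "somecodepage" stands for your terminal's encoding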
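The same decode-early, encode-late pattern can be spelled out with the terminal encoding looked up at run time instead of hard-coded. This is only a sketch: INPUT_ENCODING and OUTPUT_ENCODING are illustrative names, the input is assumed to be UTF-8, and the UTF-8 fallback is a guess for the case where Python cannot report a terminal encoding. It runs under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys

# Assumption: the incoming bytes are UTF-8; adjust to match the real source.
INPUT_ENCODING = "utf-8"

# sys.stdout.encoding is what Python believes the terminal uses. It can be
# None (for example when output is piped under Python 2), hence the fallback.
OUTPUT_ENCODING = getattr(sys.stdout, "encoding", None) or "utf-8"

raw = b"\xe6\xb5\x8b\xe8\xaf\x95"   # the UTF-8 bytes for u"测试"

# 1. Decode once, at the input boundary.
text = raw.decode(INPUT_ENCODING)

# 2. Inside the program, work with Unicode objects only.
text = text + u"!"

# 3. Encode once, at the output boundary. "replace" substitutes characters
#    the target encoding cannot represent instead of raising an error.
encoded = text.encode(OUTPUT_ENCODING, "replace")

print(repr(text))
print(repr(encoded))
```

Whether "replace" is the right error policy depends on the program; the default, "strict", raises UnicodeEncodeError instead, which is often preferable while debugging.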


answered Aug 4, 2013 at 11:02
