I want to print the Unicode version of a string in Python 2.7. It works fine in Python 3, but in Python 2.7 I get the error shown below:
x="strings are now utf-8 \u03BCnico\u0394é!"
Python 3:
from platform import python_version
print('Python', python_version())
print(x)
Python 3.4.1
strings are now utf-8 μnicoΔé!
Python 2.7:
>>> x='strings are now utf-8 \u03BCnico\u0394é!'
>>> x.encode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 38: ordinal not in range(128)
EDIT: I tried the following:
>>> x = u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'
>>> x
u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'
>>> x.encode("utf-8")
'strings are now utf-8 \\u03BCnico\\u0394\xc3\x83\xc2\xa9!'
>>> x
u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'
I don't see the encoding happening.
EDIT 2:
>>> x=u'strings are now utf-8 \u03BCnico\u0394é!'
>>> x.encode("utf-8")
'strings are now utf-8 \xce\xbcnico\xce\x94\xc3\xa9!'
>>> b=x.encode("utf-8")
>>> b
'strings are now utf-8 \xce\xbcnico\xce\x94\xc3\xa9!'
>>>
1 Answer
In Python 2.x, you'll need to use the unicode literal:
x=u"strings are now utf-8 \u03BCnico\u0394é!"
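Without the u prefix, x is a byte string (str), and the \u escapes in it are not interpreted. In Python 2, calling encode() on a byte string first decodes it implicitly with the default ASCII codec and only then encodes the result as UTF-8. That implicit decode fails at the first byte outside the ASCII range, here 0xc3 (the first byte of é), which is why the traceback reports a UnicodeDecodeError even though you called encode().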
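To make the implicit decode step concrete, here is a minimal sketch written for Python 3, where the step has to be spelled out explicitly. It assumes the Python 2 source was read as UTF-8, so é is stored as the two bytes 0xc3 0xa9; the helper name implicit_ascii_decode is hypothetical and only used for illustration.

```python
# Sketch: reproduce the ASCII decode that Python 2 runs implicitly when
# encode() is called on a byte string. Run this on Python 3.

# Bytes Python 2 would hold for the non-unicode literal (assuming UTF-8 input):
# the \u escapes stay as literal backslash sequences, and é is 0xc3 0xa9.
raw = b'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'


def implicit_ascii_decode(data):
    """Hypothetical helper: try to decode data as ASCII.

    Returns (True, text) on success, or (False, (position, offending_byte))
    when a byte outside the ASCII range is found.
    """
    try:
        return True, data.decode('ascii')
    except UnicodeDecodeError as exc:
        return False, (exc.start, data[exc.start:exc.start + 1])


ok, detail = implicit_ascii_decode(raw)
print(ok, detail)
```

The failure position and byte reported here should line up with the ones in the traceback from the question.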
Note also that Python 3.3 and above accept the u prefix as well. It is effectively a no-op there because every str is already Unicode, but it lets developers write code that is compatible with both 2.x and 3.3+.
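As a rough illustration of that compatibility, the sketch below should parse on both Python 2.7 and Python 3.3+. It is an assumption of this example, not part of the original code, that é is written as the escape \u00e9 so the result does not depend on the source file's encoding.

```python
# Sketch: the u prefix is accepted by Python 2.7 and by Python 3.3+.
# \u00e9 stands in for a literal é to avoid relying on the file encoding.
text = u"strings are now utf-8 \u03BCnico\u0394\u00e9!"

encoded = text.encode("utf-8")     # byte string, e.g. for files or sockets
decoded = encoded.decode("utf-8")  # back to a Unicode text string

print(repr(encoded))     # escaped view of the UTF-8 bytes
print(decoded == text)   # the round trip preserves the text
```

Keeping text as Unicode inside the program and encoding only at the boundary (file, socket, terminal) is the usual way to avoid the implicit ASCII conversion described above.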
Comments:
[...] utf-8 for example).
[...] print x without the .encode().
[...] \u...) in a byte string - they only work in unicode literals, as demonstrated in @LyndsySimon's answer.
str.encode() doesn't operate in place like you seem to assume in your edited part. You'll need to look at the result of encode() to see the encoding taking place; the original string won't change.
[...] printing it - print that last string, and you'll see. What Python shows you if you just enter a variable in the interpreter is the representation of a string.
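The two points made in the comments, that encode() returns a new value and that the interpreter prompt shows a representation instead of the printed text, can be sketched as follows. This is a small illustration written to run on Python 3 (the u prefix also parses on 2.7); the variable names are made up for the example.

```python
# Sketch: encode() does not modify the string it is called on, and repr()
# (what the interactive prompt displays) differs from the text itself.
text = u"strings are now utf-8 \u03BCnico\u0394\u00e9!"
before = text

encoded = text.encode("utf-8")   # a new byte string; text is left untouched
unchanged = (text == before)

prompt_view = repr(encoded)      # escaped form, as echoed at the >>> prompt
round_trip = encoded.decode("utf-8")

print(unchanged)
print(prompt_view)
```

On a terminal that can display UTF-8, calling print on the text string itself would be expected to show the actual characters, while the prompt view keeps the \x escapes.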