I want to print a Unicode string in Python 2.7. This works fine in Python 3, but in Python 2.7 I get the following error:

x="strings are now utf-8 \u03BCnico\u0394é!"

Python 3:

from platform import python_version  # needed for python_version()
print('Python', python_version())
print(x)
Python 3.4.1
strings are now utf-8 μnicoΔé!

Python 2.7:

>>> x='strings are now utf-8 \u03BCnico\u0394é!'
>>> x.encode('utf-8')
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 38: ordinal not in range(128)

EDIT: I tried the following:

>>> x = u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'
>>> x
u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'
>>> x.encode("utf-8")
'strings are now utf-8 \\u03BCnico\\u0394\xc3\x83\xc2\xa9!'
>>> x
u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'

I don't see the encoding happening.

EDIT 2:

>>> x=u'strings are now utf-8 \u03BCnico\u0394é!'
>>> x.encode("utf-8")
'strings are now utf-8 \xce\xbcnico\xce\x94\xc3\xa9!'
>>> b=x.encode("utf-8")
>>> b
'strings are now utf-8 \xce\xbcnico\xce\x94\xc3\xa9!'
>>> 
asked Jul 14, 2014 at 18:02
  • Your first problem is that you're trying to encode a byte string. You decode byte strings into unicode, and you encode unicode into byte strings in a particular encoding (utf-8 for example). Commented Jul 14, 2014 at 18:32
  • Just try printing the unicode literal with print x, without the .encode(). Commented Jul 14, 2014 at 18:34
  • Your second problem is that you're trying to use unicode escape sequences (\u...) in a byte string - they only work in unicode literals, as demonstrated in @LyndsySimon's answer. Commented Jul 14, 2014 at 18:35
  • Also: str.encode() doesn't operate in place like you seem to assume in your edited part. You'll need to look at the result of encode() to see the encoding taking place, the original string won't change. Commented Jul 14, 2014 at 18:40
  • @eagertoLearn because you're not printing it - print that last string, and you'll see. What Python shows you if you just enter a variable in the interpreter is the representation of a string. Commented Jul 14, 2014 at 18:46
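
The last two comments can be illustrated with a short sketch. It assumes the same string as in the question (with é written as the escape \u00e9 so the source stays ASCII) and runs on both 2.7 and 3.3+; the variable names are only illustrative.

```python
# The text from the question, as a unicode literal.
x = u"strings are now utf-8 \u03BCnico\u0394\u00e9!"

# encode() returns a NEW byte string; it does not modify x.
b = x.encode("utf-8")

# repr() is what the interactive prompt shows when you type a bare
# variable name: non-ASCII bytes appear as \x.. escapes.
print(repr(b))

# x is still text; only b holds the UTF-8 bytes.
print(len(x))   # number of code points
print(len(b))   # number of bytes after encoding
```

Printing x itself (print x in 2.7, print(x) in 3) shows the actual characters, provided the terminal encoding can represent them.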

1 Answer

In Python 2.x, you'll need to use the unicode literal:

x=u"strings are now utf-8 \u03BCnico\u0394é!"

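Without the u prefix, the literal is a byte string: the \u escapes are not interpreted, and the é is stored as whatever bytes your terminal or source encoding produced (here UTF-8, 0xc3 0xa9). Calling encode on a byte string makes Python 2 first decode it to unicode with the default ASCII codec, which is why the error is a UnicodeDecodeError: the decode fails at the first byte outside the ASCII range (0xc3).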
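
Here is a minimal sketch of that failure, written so it also runs on Python 3, where bytes and text are separate types. The explicit decode("ascii") call stands in for the implicit decode that Python 2 performs; the shortened string and the variable names are assumptions made for illustration.

```python
# Bytes as a Python 2 str would hold them when typed into a UTF-8 terminal.
raw = b"unicode \xc3\xa9!"   # \xc3\xa9 is the UTF-8 encoding of e-acute

# Python 2's str.encode("utf-8") first decodes the bytes with the default
# ASCII codec. Doing that step explicitly reproduces the error.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The right direction: decode bytes to text, then encode text to bytes.
text = raw.decode("utf-8")
round_trip = text.encode("utf-8")

print(error)
print(text == u"unicode \u00e9!")
print(round_trip == raw)
```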

Note also that Python 3.3 and above accept the u prefix as well. It is a no-op there because every str is already Unicode, but it lets developers write code that is compatible with both 2.x and 3.3+.
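
As a rough illustration of that compatibility, the following sketch should behave the same on 2.7 and on 3.3+; only the name of the text type differs. The names s, text_type and encoded are illustrative.

```python
import sys

# The u prefix is required on 2.x and accepted (as a no-op) on 3.3+.
s = u"\u03BCnico\u0394"

text_type = type(s)            # unicode on 2.x, str on 3.x
encoded = s.encode("utf-8")    # a byte string on both versions

print(sys.version_info[0])
print(text_type.__name__)
print(repr(encoded))
```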

answered Jul 14, 2014 at 18:06

1 Comment

I reviewed the comments on your OP - do you still need me to review your changes?
