Error encoding string as unicode in Python 2.7?

Question 1

I want to print the unicode version of a string in Python 2.7. It works fine in Python 3, but with Python 2.7 I get the following error:

x="strings are now utf-8 \u03BCnico\u0394é!"

Python 3:

from platform import python_version
print('Python', python_version())
print(x)
Python 3.4.1
strings are now utf-8 μnicoΔé!

Python 2.7:

>>> x='strings are now utf-8 \u03BCnico\u0394é!'
>>> x.encode('utf-8')
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 38: ordinal not in range(128)

EDIT: I tried the following:

>>> x = u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'
>>> x
u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'
>>> x.encode("utf-8")
'strings are now utf-8 \\u03BCnico\\u0394\xc3\x83\xc2\xa9!'
>>> x
u'strings are now utf-8 \\u03BCnico\\u0394\xc3\xa9!'

I don't see the encoding happening.

EDIT 2:

>>> x=u'strings are now utf-8 \u03BCnico\u0394é!'
>>> x.encode("utf-8")
'strings are now utf-8 \xce\xbcnico\xce\x94\xc3\xa9!'
>>> b=x.encode("utf-8")
>>> b
'strings are now utf-8 \xce\xbcnico\xce\x94\xc3\xa9!'
>>>

Question 2

Your first problem is that you're trying to encode a byte string. You decode byte strings into unicode, and you encode unicode into byte strings in a particular encoding (utf-8 for example).
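To make the two directions concrete, here is a minimal sketch. The variable names are illustrative, and the u and b prefixes are used so that the same lines should behave the same way on 2.7 and 3.3+.

```python
# Text (unicode) -> bytes is "encode"; bytes -> text is "decode".
text = u"\u03BCnico\u0394\u00e9"      # unicode text, 7 code points

data = text.encode("utf-8")           # unicode -> UTF-8 byte string (10 bytes)
assert data == b"\xce\xbcnico\xce\x94\xc3\xa9"

roundtrip = data.decode("utf-8")      # byte string -> unicode
assert roundtrip == text
```

The three non-ASCII characters take two bytes each in UTF-8, which is why the byte string is longer than the text.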

Question 3

Just try printing the unicode literal directly with print x, without the .encode() call.

Question 4

Your second problem is that you're trying to use unicode escape sequences (\u...) in a byte string - they only work in unicode literals, as demonstrated in @LyndsySimon's answer.
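The difference shows up in the length of the resulting string. This is a small sketch with illustrative names. The byte string spells the backslash out explicitly (\\u) so the example means the same thing on 2.7 and 3.x. The unicode_escape codec at the end is one possible way to recover text from such bytes, and it is only safe when the input is pure ASCII.

```python
as_text = u"\u03BCnico\u0394"       # escapes are interpreted: 6 characters
as_bytes = b"\\u03BCnico\\u0394"    # backslash, 'u' and hex digits stay literal: 16 bytes

assert len(as_text) == 6
assert len(as_bytes) == 16

# Interpret the literal \uXXXX sequences after the fact (ASCII-only input assumed).
recovered = as_bytes.decode("unicode_escape")
assert recovered == as_text
```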

Question 5

Also: str.encode() doesn't operate in place, as you seem to assume in your edited part. You'll need to look at the result of encode() to see the encoding taking place; the original string won't change.
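A short sketch of that point, with illustrative variable names: the return value of encode() is a new object, and the original is left as it was.

```python
x = u"strings are now utf-8 \u03BCnico\u0394\u00e9!"
before = x

encoded = x.encode("utf-8")    # returns a new byte string

assert x is before             # x still refers to the same, unchanged object
assert len(x) == 30            # 30 code points
assert len(encoded) == 33      # three characters need two bytes each
assert type(encoded) is not type(x)
```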

Question 6

@eagertoLearn because you're not printing it - print that last string, and you'll see. What Python shows you if you just enter a variable in the interpreter is the repr() of the string, in which non-ASCII bytes appear as \x escapes.
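The echo at the prompt can be reproduced with repr(). A minimal sketch with illustrative names; the print call is left commented out because its result depends on the terminal's encoding.

```python
data = u"\u03BCnico\u0394\u00e9".encode("utf-8")

# What the interactive prompt echoes when you type just the variable name.
shown_at_prompt = repr(data)

# Each escape is spelled out as four printable ASCII characters (\, x, c, e),
# so the echoed form is longer than the 10 bytes it describes.
assert "\\xce\\xbc" in shown_at_prompt
assert len(shown_at_prompt) > len(data)

# Printing the decoded text shows the characters themselves on a terminal
# that can display them:
# print(data.decode("utf-8"))
```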

Question 7

In Python 2.x, you'll need to use the unicode literal:

x=u"strings are now utf-8 \u03BCnico\u0394é!"

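Without the u prefix, x is a byte string, and the \u escapes in it are not interpreted. Calling encode() on a byte string makes Python 2 first decode it implicitly with the default ASCII codec, and that step fails at the first non-ASCII byte (0xc3, the start of é). That is why the traceback shows a UnicodeDecodeError even though you called encode().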

Note also that Python 3.3 and above support this notation. There it is basically a no-op, because all strings are already unicode, but it lets developers write code that is compatible with both 2.x and 3.3+.
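As a sketch of what that compatibility looks like in practice, the snippet below is meant to run unchanged on 2.7 and 3.3+. The version check and the expected_name variable are only there for illustration.

```python
import sys

x = u"strings are now utf-8 \u03BCnico\u0394\u00e9!"

# On 2.x the literal is a `unicode` object; on 3.3+ the prefix is accepted
# and ignored, so the literal is an ordinary `str`.
expected_name = "unicode" if sys.version_info[0] == 2 else "str"
assert type(x).__name__ == expected_name

# Either way, encoding gives the same UTF-8 bytes as in EDIT 2 of the question.
assert x.encode("utf-8") == b"strings are now utf-8 \xce\xbcnico\xce\x94\xc3\xa9!"
```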

Question 8

I reviewed the comments on your OP - do you still need me to review your changes?

Lyndsy Simon · Accepted Answer · 2014-07-14 18:06:41Z
