I have the following code snippet in Python 2.7.8 on Windows:
text1 = 'áéíóú'
text2 = text1.encode("utf-8")
and I get the following exception:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
Any ideas?
2 Answers
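You forgot to specify that you are dealing with a unicode string. In Python 2, a plain '...' literal is a byte string (str), and calling .encode() on a str makes Python first decode it implicitly with the default ascii codec, which fails on the first non-ASCII byte: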
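# -*- coding: utf-8 -*-
# (in a .py file, this declaration must match the encoding the file is saved in)
text1 = u'áéíóú'  # prefix the literal with "u" to make it a unicode string
text2 = text1.encode("utf-8")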
In Python 3 this behavior has changed: every string literal is a Unicode string (str), so you don't need the prefix.
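The byte 0xa0 in the error message hints at where the str came from. As an assumption not confirmed in the question: if the Windows console used code page 850 (or 437, which maps these letters identically), 'á' is stored there as 0xa0, so the literal held cp850 bytes rather than UTF-8. If you already have such a byte string, a minimal sketch of the fix is to decode it with the codec it was actually written in and only then encode to UTF-8. The bytes below are a hypothetical stand-in for the console input; the b'' and u'' literals let it run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3: b'' and u'' literals are valid in both.

# Hypothetical input: the bytes a cp850 console would produce for 'áéíóú'
# (assumption: the console code page is cp850; cp437 gives the same bytes here).
raw = b'\xa0\x82\xa1\xa2\xa3'

# Step 1: decode explicitly with the codec the bytes were written in,
# instead of letting Python 2 try the implicit ascii decode.
text = raw.decode('cp850')
assert text == u'\xe1\xe9\xed\xf3\xfa'  # same as u'áéíóú'

# Step 2: encode the unicode string to UTF-8 bytes.
utf8 = text.encode('utf-8')
print(repr(utf8))
```

Decoding and encoding are kept as two explicit steps so that the source encoding is always stated rather than guessed.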
I have tried the following code on Linux with Python 2.7:
>>> text1 = 'áéíóú'
>>> text1
'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'
>>> type(text1)
<type 'str'>
>>> text1.decode("utf-8")
u'\xe1\xe9\xed\xf3\xfa'
>>> print '\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'
áéíóú
>>> print u'\xe1\xe9\xed\xf3\xfa'
áéíóú
>>> u'\xe1\xe9\xed\xf3\xfa'.encode('utf-8')
'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'
'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba' is the UTF-8 encoding of áéíóú (two bytes per character), and u'\xe1\xe9\xed\xf3\xfa' is áéíóú as a unicode string, i.e. the code points U+00E1, U+00E9, U+00ED, U+00F3, U+00FA.
Since text1 is a UTF-8 encoded byte string, it must be decoded to unicode with:
text1.decode("utf-8")
A unicode string can be encoded to a UTF-8 byte string with:
u'\xe1\xe9\xed\xf3\xfa'.encode('utf-8')
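A minimal round-trip sketch of the two directions described above, written with b'' and u'' literals so it runs on Python 2.7 and Python 3. The helper to_utf8 is a hypothetical name for illustration, not a standard library function.

```python
# Runs on Python 2.7 and Python 3 (b'' and u'' literals are valid in both).

# UTF-8 bytes for a-acute, e-acute, i-acute, o-acute, u-acute (two bytes each).
utf8_bytes = b'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'

# bytes -> unicode: decode with the encoding the bytes are actually in.
text = utf8_bytes.decode('utf-8')
assert text == u'\xe1\xe9\xed\xf3\xfa'
assert len(utf8_bytes) == 10 and len(text) == 5

# unicode -> bytes: encoding back to UTF-8 gives the original bytes.
assert text.encode('utf-8') == utf8_bytes


def to_utf8(value, source_encoding='utf-8'):
    """Hypothetical helper: return UTF-8 bytes for a byte string or a unicode string.

    Byte strings are decoded explicitly with source_encoding instead of relying
    on Python 2's implicit ascii decode, which is what raises the error.
    """
    if isinstance(value, bytes):  # str on Python 2, bytes on Python 3
        value = value.decode(source_encoding)
    return value.encode('utf-8')


print(repr(to_utf8(utf8_bytes)))
print(repr(to_utf8(text)))
```

Taking source_encoding as a parameter keeps the assumption about the input visible at the call site, e.g. to_utf8(raw, 'cp850') for bytes coming from a Windows console.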
1 Comment
\xe1\xe9\xed\xf3\xfa is actually latin-1
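The comment's point can be checked directly: for code points U+0000 to U+00FF, Latin-1 (ISO-8859-1) maps each character to the single byte with the same value, so encoding this unicode string as latin-1 yields exactly the bytes \xe1\xe9\xed\xf3\xfa, while UTF-8 needs two bytes per character. A small sketch, runnable on Python 2.7 and 3:

```python
# The unicode string for a-acute, e-acute, i-acute, o-acute, u-acute.
text = u'\xe1\xe9\xed\xf3\xfa'

# Its code points, as hex.
code_points = [hex(ord(c)) for c in text]
print(code_points)

# Latin-1 stores each of these code points as one byte of the same value.
latin1 = text.encode('latin-1')

# UTF-8 uses two bytes for each of these characters instead.
utf8 = text.encode('utf-8')
print(repr(latin1))
print(repr(utf8))
```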