Python and encoding, again

Question 1

I have the following code snippet in Python 2.7.8 on Windows:

text1 = 'áéíóú'
text2 = text1.encode("utf-8")

and I get the following exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

Any ideas?

Accepted answer

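You forgot to mark the literal as a unicode string. In Python 2, 'áéíóú' is a byte string (str), so calling .encode() on it first triggers an implicit decode with the ascii codec, and that decode is what raises the UnicodeDecodeError. Prefix the literal with "u":

text1 = u'áéíóú'  # the "u" prefix makes this a unicode literal
text2 = text1.encode("utf-8")

In Python 3 this behavior has changed: every string literal is Unicode text (str), so you don't need the prefix.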
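To make the difference concrete, here is a minimal sketch meant to be run under Python 3. The first half shows that a plain literal encodes without any prefix. The second half only imitates what Python 2 does implicitly (an ASCII decode before the encode); the byte 0xa0 is taken from the error message in the question, and the names raw and error_message are made up for this illustration.

```python
# Python 3: a plain string literal is already Unicode text (type str).
text1 = 'áéíóú'
text2 = text1.encode("utf-8")   # str -> bytes, no "u" prefix needed
print(type(text1).__name__, type(text2).__name__)
print(text2)

# Imitation of Python 2's implicit step in str.encode():
# the bytes are decoded as ASCII first, which fails for any byte >= 0x80.
raw = b'\xa0'                   # the byte named in the question's error
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The message printed by the except branch should look very much like the one in the question, which suggests the original failure came from the hidden decode step rather than from the encode itself.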

Comment

@SuperBiasedMan - You should try it once. (karthikr, 2015-07-22 15:14:38Z)

Comment

@J19 You're welcome :) I've noticed you don't usually accept answers. On SO, it's customary to mark an answer as accepted if it solves your issue, so other people can quickly find the solution. To mark an answer as accepted, click the checkmark to the left of the answer text. (loopbackbee, 2015-07-22 15:21:06Z)

Comment

@SuperBiasedMan No problem. I've added some info to the answer. (loopbackbee, 2015-07-22 15:26:38Z)

Answer

I have tried the following code in Linux with Python 2.7:

>>> text1 = 'áéíóú'
>>> text1
'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'
>>> type(text1)
<type 'str'>
>>> text1.decode("utf-8")
u'\xe1\xe9\xed\xf3\xfa'
>>> print '\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'
áéíóú
>>> print u'\xe1\xe9\xed\xf3\xfa'
áéíóú
>>> u'\xe1\xe9\xed\xf3\xfa'.encode('utf-8')
'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'

'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba' is the UTF-8 encoding of áéíóú, and u'\xe1\xe9\xed\xf3\xfa' is the unicode coding of áéíóú.

text1 is encoded in UTF-8, so it can only be decoded to unicode with:

text1.decode("utf-8")

A unicode string can be encoded to a UTF-8 byte string with:

u'\xe1\xe9\xed\xf3\xfa'.encode('utf-8')
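The same round trip can be sketched in Python 3, where bytes plays the role of Python 2's str and str plays the role of Python 2's unicode. The variable names are only for illustration; the byte values are the ones shown in the session above.

```python
# Python 3 sketch of the decode/encode round trip.
utf8_bytes = b'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'

decoded = utf8_bytes.decode("utf-8")   # bytes -> text
encoded = decoded.encode("utf-8")      # text -> bytes

print(decoded)
print(len(utf8_bytes), len(decoded))   # byte count vs. character count
print(encoded == utf8_bytes)
```

Each of the five characters takes two bytes in UTF-8, which is why the byte string is twice as long as the decoded text.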

Comment

This is a good explanation of the process, but please note that there's no such thing as a "unicode coding". Unicode is just a table of numbers (code points), not an encoding. \xe1\xe9\xed\xf3\xfa is actually latin-1.
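A small Python 3 sketch may help show what this comment is getting at: the escapes in the unicode literal are code points, and for values below 256 they happen to coincide with the Latin-1 byte values, while the UTF-8 bytes are different. The variable names are invented for the example.

```python
text = '\xe1\xe9\xed\xf3\xfa'            # same escapes as in the answer

code_points = [ord(ch) for ch in text]   # numbers from the Unicode table
latin1_bytes = text.encode("latin-1")    # one byte per character
utf8_bytes = text.encode("utf-8")        # two bytes per character here

print([hex(cp) for cp in code_points])
print(latin1_bytes)
print(utf8_bytes)

# For code points below 256, the Latin-1 byte equals the code point.
print(list(latin1_bytes) == code_points)
```

So the text itself is a sequence of code points; only the byte strings produced by encode() are in a particular encoding.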

loopbackbee · Accepted Answer · 2015-07-22 15:08:18Z

edited Jul 22, 2015 at 15:26
