Confusion about Python Decode method

Asked 11 years, 1 month ago

Viewed 698 times

I'm trying to run the command u'\xe1'.decode("utf-8") in Python 2.7 and I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)

Why does the error mention the ascii codec when I'm passing utf-8 as the first argument? Also, is there any way I can get the character á from u'\xe1' and save it in a string?


asked Nov 20, 2014 at 22:56


BrockLee


What exactly are you trying to do?

– Padraic Cunningham
Commented Nov 20, 2014 at 23:07

The Python script I'm running takes text, processes it, and prints a JSON string containing a categorized version of the original text. What I've noticed is that characters like this sometimes show up as their Unicode escape values in the printed JSON string.

– BrockLee
Commented Nov 20, 2014 at 23:11

When you print your string, you will see á.

– Padraic Cunningham
Commented Nov 20, 2014 at 23:15

So I was able to solve the problem. Thank you for the help. But I'm still confused about why the error says it's an ascii encoding problem when I'm using utf-8.

– BrockLee
Commented Nov 21, 2014 at 22:15
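
The comment above about characters ending up as their Unicode values in printed JSON is consistent with the default behaviour of the standard json module, which escapes non-ASCII characters unless told otherwise. A minimal Python 3 sketch of that behaviour follows; the payload dictionary is a hypothetical stand-in, since the thread does not show the asker's script.

```python
import json

# Hypothetical payload standing in for the script's categorized text.
data = {"text": "\xe1"}  # "\xe1" is the single character U+00E1 (a with acute)

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(data)
print(escaped)  # {"text": "\u00e1"}

# ensure_ascii=False keeps the character itself in the output string.
readable = json.dumps(data, ensure_ascii=False)
print(readable == '{"text": "\xe1"}')  # True

# Both forms are valid JSON and parse back to the same data.
print(json.loads(escaped) == json.loads(readable) == data)  # True
```

Either output is valid JSON, so the escaped form is only a display difference, not data loss.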


1 Answer

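decode takes a byte string and converts it to unicode (e.g. "\xc2\xb0".decode("utf8") ==> u"\xb0")

encode takes unicode and converts it to a byte string (e.g. u"\xb0".encode("utf8") ==> "\xc2\xb0")

Calling decode on a value that is already unicode makes Python 2 first encode it to a byte string with the default ascii codec, and that implicit step is what raises the UnicodeEncodeError for u'\xe1'.

Neither has much to do with how a string is rendered; they only change its internal representation.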

Try:

print u"\xe1"

(Your terminal needs to support Unicode; IDLE will work, a DOS terminal not so much.)

>>> print u"\xe1"
á
>>> print repr(u"\xe1".encode("utf8"))
'\xc3\xa1'
>>> print repr("\xc3\xa1".decode("utf8"))
u'\xe1'
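
For comparison, here is a rough Python 3 version of the same round trip. It is a sketch rather than part of the original answer: in Python 3, str holds text and bytes holds encoded data, and str has no decode method, so the mistake from the question cannot be written the same way. The last step encodes to ASCII explicitly to reproduce the kind of error shown in the question.

```python
# Python 3 sketch: str is text (code points), bytes is encoded data.
text = "\xe1"                       # one code point, U+00E1

encoded = text.encode("utf-8")      # text -> bytes
decoded = encoded.decode("utf-8")   # bytes -> text

print(repr(encoded))    # b'\xc3\xa1'
print(decoded == text)  # True

# str has no decode method in Python 3, so text cannot be "decoded" again.
has_decode = hasattr(text, "decode")
print(has_decode)  # False

# Encoding to ASCII explicitly fails the same way the implicit step did.
try:
    text.encode("ascii")
    implicit_error = ""
except UnicodeEncodeError as exc:
    implicit_error = str(exc)
print(implicit_error)
```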


answered Nov 20, 2014 at 23:05


Joran Beasley


3 Comments

Hey, can we do this too? >>> chr(ord("\xe1")) 'á'

– Hackaholic
Commented Nov 20, 2014 at 23:14

The rule given in the answer is mostly true for 2.x, perhaps always for 3.x. The example output is for 2.x, slightly different in 3.x.

– Terry Jan Reedy
Commented Nov 20, 2014 at 23:24

@Hackaholic In Python 2 it is unichr(ord("\xe1")), which prints as á.

– Joran Beasley
Commented Nov 21, 2014 at 00:10
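
The chr/unichr exchange in these comments can be sketched for Python 3, where chr covers the whole Unicode range and unichr no longer exists. This is an illustrative snippet, not code from the thread; the lookup through unicodedata is added only to confirm which character the code point refers to.

```python
import unicodedata

ch = "\xe1"               # the character from the question, U+00E1

code_point = ord(ch)      # character -> integer code point
same = chr(code_point)    # integer code point -> character

print(code_point, hex(code_point))  # 225 0xe1
print(same == ch)                   # True

# Look up the official Unicode name of the character.
name = unicodedata.name(ch)
print(name)  # LATIN SMALL LETTER A WITH ACUTE
```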
