I am using Python 2.6.6

item = {u'snippet': {u'title': u'How to Pronounce Canap\xe9'}}
title = item['snippet']['title']
print title

Result:

How to Pronounce CanapÃ©

Desired result:

How to Pronounce Canapé

This looks like a Unicode issue. I tried encoding and decoding with UTF-8, but the result is still the same. Any ideas?

asked Mar 19, 2014 at 4:18
  • That code sample works fine in my terminal. I have to assume this is an issue with your OS or terminal. What OS/Terminal software are you using? Commented Mar 19, 2014 at 4:21
  • How are you running this code? Commented Mar 19, 2014 at 4:29
  • @BenEchols, OS is CentOS 6.4, Terminal is SecureCRT 4.0 Commented Mar 19, 2014 at 4:31
  • @BurhanKhalid, on the command line I type python, which puts me into the Python shell. Commented Mar 19, 2014 at 4:32
  • Check the encoding of your SecureCRT session and make sure it's UTF-8, not latin-1 or similar. Commented Mar 19, 2014 at 4:35

5 Answers

Your terminal expects UTF-8:

$ locale charmap
UTF-8 

Python prints using UTF-8:

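>>> import sys
>>> sys.stdout.encoding
'UTF-8'

Change the character encoding in the SecureCRT session settings to UTF-8 so the terminal decodes Python's output the same way.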
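
To compare the two values from inside Python, a minimal sketch like the one below could be run on the CentOS host (encoding_report is a hypothetical helper). Note that it can only report what the server side believes: the locale derived from LANG/LC_ALL/LC_CTYPE and the encoding Python chose for stdout. The character set configured in the SecureCRT session lives on the client and is not visible to the remote process, so it still has to be checked in SecureCRT itself.

```python
import locale
import sys


def encoding_report():
    """Return (locale encoding, stdout encoding) as seen by this Python process."""
    # Encoding implied by LANG / LC_ALL / LC_CTYPE; this is what
    # `locale charmap` reports in the shell.
    locale_enc = locale.getpreferredencoding()
    # Encoding Python uses for text printed to stdout; may be None (e.g. on
    # Python 2 when stdout is a pipe) or missing if stdout was replaced.
    stdout_enc = getattr(sys.stdout, "encoding", None)
    return locale_enc, stdout_enc


if __name__ == "__main__":
    locale_enc, stdout_enc = encoding_report()
    print("locale: %s" % locale_enc)
    print("stdout: %s" % stdout_enc)
```

If both lines say UTF-8 but the output is still garbled, the remaining suspect is the terminal emulator's own setting, as the answer says.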

answered Mar 19, 2014 at 5:29

This is quite possibly due to a mismatch between the encoding Python uses for its output and the console's encoding. It looks like Python is encoding the output as UTF-8, but the console is interpreting those bytes as latin-1.
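
The mismatch described here can be reproduced without a terminal: encode the text the way Python does (UTF-8), then decode the bytes the way a latin-1 terminal would. The é becomes the two characters Ã and ©, giving "CanapÃ©". The helper name as_seen_by_terminal is hypothetical; both encodings are parameters so other combinations can be tried.

```python
def as_seen_by_terminal(text, python_encoding="utf-8", terminal_encoding="latin-1"):
    """Simulate a terminal that decodes Python's output with its own codec."""
    # Python turns the Unicode string into bytes using python_encoding ...
    raw = text.encode(python_encoding)
    # ... and the terminal turns those bytes back into characters using its own
    # setting; 'replace' yields U+FFFD for byte sequences it cannot decode.
    return raw.decode(terminal_encoding, "replace")


title = u"How to Pronounce Canap\xe9"

# UTF-8 sends e-acute as the two bytes 0xC3 0xA9, which a latin-1 terminal
# shows as two characters: u'How to Pronounce Canap\xc3\xa9'.
garbled = as_seen_by_terminal(title)

# When both sides agree, the text survives the round trip unchanged.
same = as_seen_by_terminal(title, "utf-8", "utf-8")
```

The reverse mismatch (latin-1 output on a UTF-8 terminal) shows up differently: the lone byte 0xE9 is not valid UTF-8, so the terminal typically displays a replacement character instead of é.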

answered Mar 19, 2014 at 4:21

Instead of \xe9, use \u00e9 if possible. Then pick an appropriate encoding when outputting the unicode string:

print title.encode('latin1')

What encoding is sensible depends on where you are outputting to. Generally, you have to infer it from the environment variables, or maybe let your users make a choice in a configuration file.
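
A minimal sketch of the explicit-encoding approach, assuming the target encoding is taken from the locale (which reflects LANG/LC_ALL/LC_CTYPE) unless one is passed in. encode_for_output and write_encoded are hypothetical helpers. The "replace" error handler turns characters the target encoding cannot represent into ? instead of raising UnicodeEncodeError.

```python
import locale
import sys


def encode_for_output(text, encoding=None):
    """Encode a Unicode string as bytes for a destination with a known encoding."""
    if encoding is None:
        # Assumption: the locale (LANG / LC_ALL / LC_CTYPE) describes the destination.
        encoding = locale.getpreferredencoding() or "ascii"
    # Characters the target encoding cannot represent become '?' instead of
    # raising UnicodeEncodeError.
    return text.encode(encoding, "replace")


def write_encoded(data, stream=None):
    """Write already-encoded bytes to a text stream such as sys.stdout (Python 2 or 3)."""
    if stream is None:
        stream = sys.stdout
    # Python 3 text streams expose their underlying byte stream as .buffer;
    # Python 2's sys.stdout accepts byte strings directly.
    target = getattr(stream, "buffer", stream)
    stream.flush()  # push out any pending text first so output stays in order
    target.write(data)
    target.flush()
```

For example, write_encoded(encode_for_output(title, "latin1") + b"\n") corresponds to the answer's print title.encode('latin1'), and dropping the second argument lets the locale decide.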

PS: If you deal with Unicode strings a lot, I'd recommend switching to Python 3 (e.g. 3.3) if at all possible. Unicode handling there is a lot clearer, more explicit, and saner.

answered Mar 19, 2014 at 4:23

5 Comments

I am not able to change \xe9 to \u00e9; the \xe9 is raw data from the YouTube API.
OK, that shouldn't matter for Python 2.7. From the output you've shown, I think 'latin1' might be the correct encoding in your case.
@ChristianAichinger: u'\xe9' == u'\u00e9', therefore changing it won't help. Instead of .encode('latin1'), change SecureCRT to match the terminal settings on CentOS. If sys.stdout.encoding is correct (it matches $LC_CTYPE, $LANG), then using Python 3 won't help.
@J.F.Sebastian, I am getting the same error when I write the values to a file on the file system, would that indicate the problem is not SecureCRT?
@davidjhp: Writing to a file is different from writing to a terminal. If the output is redirected to a file, you could control the stdout encoding using PYTHONIOENCODING. Could you update your question with the output of print(repr(open("your_output_file", "rb").read()))?
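
The PYTHONIOENCODING suggestion can be tried with a child interpreter whose stdout is a pipe, as it would be when output is redirected to a file. The sketch below (run_with_io_encoding is a hypothetical helper) prints the same title under two encodings and returns the raw bytes, i.e. the same kind of repr output the comment asks for. PYTHONIOENCODING itself is a real interpreter environment variable, available since Python 2.6.

```python
import os
import subprocess
import sys

# The child prints the title; the \xe9 escape is interpreted by the child.
CHILD_CODE = "print(u'How to Pronounce Canap\\xe9')"


def run_with_io_encoding(encoding, code=CHILD_CODE):
    """Run code in a new interpreter with PYTHONIOENCODING set; return raw stdout bytes."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = encoding  # overrides the encoding chosen for the pipe
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, env=env)
    out, _ = proc.communicate()
    return out


if __name__ == "__main__":
    for enc in ("utf-8", "latin-1"):
        print("%-8s %r" % (enc, run_with_io_encoding(enc)))
```

Bytes ending in \xc3\xa9 mean the file holds UTF-8; a single \xe9 means latin-1. Whatever program displays the file then has to be set to the same encoding.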

I am getting your expected output in my terminal (using Python 2.7.7). The format you are expecting depends on the encoding set in the terminal. For me, it is set to 'cp437':

>>> import sys
>>> sys.stdin.encoding
'cp437'
>>> sys.stdout.encoding
'cp437'

You can verify that you are getting the correct output by running:

print title.encode('cp437')
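
To see why the expected bytes depend on the terminal's setting, the sketch below compares how U+00E9 is encoded by the three codecs that come up in this thread (UTF-8, latin-1, and this answer's cp437). encodings_of is a hypothetical helper. Only the byte values differ; each terminal has to be told which table to use to turn them back into é.

```python
# The three encodings discussed in this thread.
ENCODINGS = ("utf-8", "latin-1", "cp437")


def encodings_of(char, encodings=ENCODINGS):
    """Map each encoding name to the bytes it produces for char."""
    return dict((enc, char.encode(enc)) for enc in encodings)


e_acute = u"\xe9"  # the last character of the title

if __name__ == "__main__":
    for enc, data in sorted(encodings_of(e_acute).items()):
        print("%-8s %r" % (enc, data))
```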

answered Mar 19, 2014 at 4:35

Set your default encoding to iso-8859-1 in your sitecustomize.py file in ${pythondir}/lib/site-packages/:

import sys
sys.setdefaultencoding('iso-8859-1')

For me, it worked with \xe9.

answered Mar 19, 2014 at 4:59

2 Comments

AttributeError: 'module' object has no attribute 'setdefaultencodi
@davidjhp: don't do it. Changing sys.getdefaultencoding() from 'ascii' might break other Python scripts on your system in a subtle way.
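
The AttributeError above is expected when sys.setdefaultencoding is called outside sitecustomize.py: Python 2's site module runs sitecustomize and then deletes sys.setdefaultencoding during startup, and Python 3 does not provide the function at all. The minimal check below (default_encoding_status is a hypothetical helper) only reads the current default, the codec used for implicit str/unicode conversions, rather than changing it.

```python
import sys


def default_encoding_status():
    """Return (default encoding, whether sys.setdefaultencoding is still callable)."""
    # Normally 'ascii' on Python 2 and 'utf-8' on Python 3.
    default = sys.getdefaultencoding()
    # False in a normally started interpreter: site.py removes the function on
    # Python 2, and Python 3 never provides it.
    can_change = hasattr(sys, "setdefaultencoding")
    return default, can_change


if __name__ == "__main__":
    print(default_encoding_status())
```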
