Python Decode UTF-8 Not working

Asked 10 years, 3 months ago

Viewed 503 times

I am using Scrapy for scraping a Persian website.

title = response.xpath('//*[@id="news"]/div/div[2]/div[2]/div[2]/div[2]/div[2]/h1/a/text()').extract()

When I extract the title from the site, it gives me an encoded string like this:

[u' \t\t\u0628\u06cc\u0645\u0647 10 \u0633\u0627\u0644\u0647\u200c \u062f\u0631 \u062e\u0637 \u062d\u0645\u0644\u0647\u200c\u06cc \u062a\u06cc\u0645 \u0645\u0644\u06cc \t']

After searching for how to decode a string in Python, I found this approach:

title = response.xpath('//*[@id="news"]/div/div[2]/div[2]/div[2]/div[2]/div[2]/h1/a/text()').extract()
print(title[0].decode('utf-8'))

When I run this code it shows me this:

 print(title[0].decode('utf-8'))
 File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
 return codecs.utf_8_decode(input, errors, True)

What is the problem?

python


edited Sep 28, 2015 at 9:24


Chris Martin


asked Sep 28, 2015 at 9:15


user1086010


1 Answer


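Your string is already fine. It is already a unicode object (note the u prefix), so there is nothing to decode; it is only displayed with Unicode escapes rather than the actual glyphs so that it can be shown in ASCII-only consoles as well. Try printing the element itself: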

>>> x = [u' \t\t\u0628\u06cc\u0645\u0647 10 \u0633\u0627\u0644\u0647\u200c \u062f\u0631 \u062e\u0637 \u062d\u0645\u0644\u0647\u200c\u06cc \u062a\u06cc\u0645 \u0645\u0644\u06cc \t']
>>> print x[0]
 بیمه 10 ساله‌ در خط حمله‌ی تیم ملی
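
The same idea can be sketched in Python 3 (an assumption for illustration; the question runs on Python 2.7, where calling .decode() on a unicode object first tries an implicit ASCII encode, which is the likely source of the error). The variable names below are hypothetical, and the list literal is copied from the question. The sketch shows that the extracted value is already text, that the escapes are only one way of displaying it, and that decoding applies to bytes rather than to text.

```python
# Python 3 sketch; the question itself runs on Python 2.7.
# The list below is the value shown in the question.
title = [' \t\t\u0628\u06cc\u0645\u0647 10 \u0633\u0627\u0644\u0647\u200c \u062f\u0631 \u062e\u0637 \u062d\u0645\u0644\u0647\u200c\u06cc \u062a\u06cc\u0645 \u0645\u0644\u06cc \t']

# The element is already text; only the surrounding tabs and spaces need trimming.
text = title[0].strip()

# ascii() gives an escape-only view, much like the list output in the question.
escaped = ascii(text)

# Encoding turns text into bytes; decoding turns those bytes back into text.
raw = text.encode('utf-8')
roundtrip = raw.decode('utf-8')

# Persian letters have no ASCII encoding, so an ASCII encode fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(escaped)
print(roundtrip == text, ascii_ok)
```

On a UTF-8 terminal, print(text) would show the Persian glyphs directly, as in the answer's output above.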


answered Sep 28, 2015 at 9:30


Stefano Sanfilippo
