How to handle unicode language in python

Asked 10 years, 6 months ago

Viewed 131 times

I am writing a Python + Selenium script to scrape the LinkedIn site.
I read the profile summary using this statement, which works properly:

profileDescription = profile.find_element_by_xpath("div/div[1]").text

My problem is with the non-English data coming from the site.
I write the scraped data to a CSV file, which I then open in Excel, using this code:

with open('search.csv', 'ab') as csvfile:
    self.liSearchOutWriter = csv.writer(csvfile, delimiter=',')
    self.liSearchOutWriter.writerow([profileDescription])

Whenever the description contains non-English data, it does not display properly in Excel. I read through Unicode and UTF-8 resources, but could not get a grip on it.

Can someone help me understand how I should modify my code so that non-English data displays properly?


asked Jul 10, 2015 at 16:55


cppcoder


Which version of python are you using? (And, uh, if you're using python2, can you switch to using python3?)

– NightShadeQueen
Commented Jul 10, 2015 at 16:57

I am using Python 2.7 and I cannot use Python 3.

– cppcoder
Commented Jul 10, 2015 at 17:05

Consider opening the file with codecs.open.

– Davide R.
Commented Jul 10, 2015 at 17:08


1 Answer


In Python 3.X this is supported out of the box:

import csv
with open('search.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
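
The snippet above reads a file, while the question is about writing one. Here is a minimal sketch of the writing side, assuming Python 3. The function name `write_descriptions`, the file name `search_demo.csv`, and the sample strings are made up for illustration. The `utf-8-sig` codec writes a byte order mark at the start of the file, which Excel generally uses to recognise a CSV as UTF-8.

```python
import csv

def write_descriptions(path, descriptions):
    """Write one description per row as UTF-8 with a leading BOM."""
    # 'utf-8-sig' prepends a byte order mark so Excel can detect UTF-8.
    # newline='' lets the csv module control the line endings itself.
    with open(path, 'w', newline='', encoding='utf-8-sig') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        for description in descriptions:
            writer.writerow([description])

# Hypothetical sample data standing in for scraped profile summaries.
write_descriptions('search_demo.csv',
                   ['Ingénieur logiciel', 'Разработчик', '软件工程师'])
```

Plain `utf-8` also produces a valid file, but without the BOM Excel may guess a legacy code page when the file is opened by double-clicking.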

If you're on Python 2.X, there is a drop-in replacement for the csv module that supports Unicode: unicodecsv

import unicodecsv
# Python 2: open in binary mode and let unicodecsv do the decoding
with open('search.csv', 'rb') as csvfile:
    reader = unicodecsv.reader(csvfile, encoding='utf-8')
    for row in reader:
        print(row)


answered Jul 10, 2015 at 17:04


amza


1 Comment

I am getting this error after using unicodecsv: UnicodeDecodeError: 'utf8' codec can't decode byte 0xd6 in position 0: invalid continuation byte

– cppcoder
Commented Jul 11, 2015 at 17:22
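
An error like this usually means the bytes being decoded were not written as UTF-8 in the first place. The sketch below reproduces the message with a made-up byte string. It assumes, purely for illustration, that the data was encoded as cp1252; the thread does not say what encoding the file actually had.

```python
# Hypothetical bytes: the text "Öl" encoded as cp1252 rather than UTF-8.
raw = b'\xd6l'

try:
    raw.decode('utf-8')
    reason = None
except UnicodeDecodeError as exc:
    # 0xd6 starts a two-byte UTF-8 sequence, but the next byte does not continue it.
    reason = exc.reason

# Decode with the encoding the bytes were written in (an assumption here),
# then re-encode as UTF-8 for anything that expects UTF-8 input.
text = raw.decode('cp1252')
utf8_bytes = text.encode('utf-8')
print(reason)
```

If the source encoding is known, passing that name as the `encoding` argument instead of 'utf-8' may be enough to avoid the error.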
