I am writing a Python + Selenium script to scrape the LinkedIn site.
I read the profile summary using this statement, which works properly:
profileDescription = profile.find_element_by_xpath("div/div[1]").text
My problem is with the non-English data coming from the site.
I am writing the data scraped from the site to a CSV file, which I open in Excel, using this code:
with open('search.csv', 'ab') as csvfile:
    self.liSearchOutWriter = csv.writer(csvfile, delimiter=',')
    self.liSearchOutWriter.writerow([profileDescription])
Whenever the description contains non-English data, it does not display properly in Excel. I read through Unicode and UTF-8 resources, but could not get a grip on them.
Can someone help me understand how I should modify my code in order to display non-English data properly?
1 Answer
In Python 3.X this is supported out of the box:
import csv
with open('search.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
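Since the question is about writing rather than reading, here is a minimal sketch of the write side in Python 3, assuming the scraped text is already a `str`. The helper name `write_rows` and the sample rows are made up for illustration; only the file name `search.csv` comes from the question. The sketch uses the `utf-8-sig` codec, which puts a byte order mark (BOM) at the start of the file. Excel typically relies on that mark to recognize a CSV as UTF-8, so this may be what is missing when the text looks garbled there.

```python
import csv


def write_rows(path, rows):
    """Write rows of text to a CSV file, starting the file with a UTF-8 BOM."""
    # newline='' lets the csv module control line endings itself.
    # 'utf-8-sig' is UTF-8 with a byte order mark written at the start.
    with open(path, "w", newline="", encoding="utf-8-sig") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        writer.writerows(rows)


# Hypothetical sample data standing in for scraped profile descriptions.
sample_rows = [["Ingénieur logiciel à Zürich"], ["ソフトウェアエンジニア"]]
```

Calling `write_rows("search.csv", sample_rows)` would produce a file in the same shape as the one in the question, with one description per row. Note that this sketch overwrites the file (`"w"`), whereas the question appends to it.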
If you're on Python 2.X, there is a drop-in replacement for the csv module that supports Unicode: unicodecsv
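import unicodecsv
# Python 2's built-in open() has no newline/encoding arguments; open in binary mode.
with open('search.csv', 'rb') as csvfile:
    reader = unicodecsv.reader(csvfile, encoding='utf-8')
    for row in reader:
        print(row)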
1 Comment
unicodecsv. UnicodeDecodeError: 'utf8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
codecs.open.
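The `UnicodeDecodeError` quoted in the comment generally means the bytes being read are not valid UTF-8. One plausible cause, which the thread does not confirm, is that the file was saved in a legacy single-byte encoding such as Windows-1252, where the byte 0xd6 is the letter "Ö". A minimal sketch of trying a short list of encodings in order follows. The function name `read_text_with_fallback` and the default encoding list are assumptions for illustration, not part of any library.

```python
def read_text_with_fallback(path, encodings=("utf-8-sig", "cp1252")):
    """Return (text, encoding) for the first encoding that decodes the file.

    The candidate list is an assumption; adjust it to the encodings
    the file could realistically have been saved in.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode " + path)
```

Once the encoding that succeeds is known, it can be passed as the `encoding` argument when opening the file for the `csv` reader.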