This is my first program in Python, and I need some help writing UTF-8 data to a file.

The intention is to read data from an Excel file and write it as comma-separated data to a text file. Below is the code I am running, followed by the error it gives.

import xlrd
import csv
import codecs
wb = xlrd.open_workbook('/etl/dev/input/CustList.xls')
sh = wb.sheet_by_index(1)
file_output = codecs.open('/etl/dev/input/CustList.csv', 'w', 'utf-8')
for rownum in xrange(sh.nrows):
 file_output.write(sh.row_values(rownum))
file_output.close()

And here is the error:

Traceback (most recent call last):
 File "TestXls2Csv.py", line 20, in <module>
 file_output.write(sh.row_values(rownum))
 File "/fstools/gptools/ext/python/lib/python2.6/codecs.py", line 686, in write
 return self.writer.write(data)
 File "/fstools/gptools/ext/python/lib/python2.6/codecs.py", line 351, in write
 data, consumed = self.encode(object, self.errors)
TypeError: coercing to Unicode: need string or buffer, list found

Any help is highly appreciated.

Thanks Zulfi

I tried the following:

 row_values = [str(val) for val in sh.row_values(rownum)]
 file_output.write(",".join(row_values) + "\n")

It seems to work fine for one sheet of the Excel file but gives the error below for the other sheet:

Traceback (most recent call last):
 File "TestXls2Csv.py", line 12, in <module>
 file_output.write(",".join(sh.row_values(rownum)) + "\n")
TypeError: sequence item 8: expected string or Unicode, float found

I had initially tried using csv.writer, but there is a \xa0 character in one of the cells which was causing a lot of trouble, so I turned to codecs and have been battling to get it to work.

Below is some info on the Excel document, in case it gives any insight:

=== File: CustList.xls ===
Open took 3.03 seconds

BIFF version: 8; datemode: 0
codepage: 1200 (encoding: utf_16_le); countries: (1, 1)
Last saved by: u'Rajesh, Vatha'
Number of data sheets: 2
Use mmap: 1; Formatting: 0; On demand: 0
Ragged rows: 0
Load time: 0.01 seconds (stage 1) 1.86 seconds (stage 2)

sheet 0: name = u'MEMBER'; nrows = 29966; ncols = 11

sheet 1: name = u'PHYSICANS'; nrows = 1619; ncols = 19

command took 0.20 seconds

Please suggest.

Thanks Zulfi

asked Jun 4, 2014 at 7:27
  • I think the problem is that sh.row_values(rownum) is a list (row_values returns a slice of the values of the cells in the given row) and not a string, so you can't pass it to write(). If you want to write the whole row, you should iterate over each cell. Commented Jun 4, 2014 at 7:33
  • What should the output file look like? Give an example. Commented Jun 4, 2014 at 8:22
  • The output should contain the data in each Excel row as comma-separated columns in a text file. Commented Jun 4, 2014 at 18:23
  • You mixed up your question update. Did you try my new answer? Commented Jun 5, 2014 at 8:13

2 Answers

dciriello was right: file_output.write takes a string as its argument, but sh.row_values(rownum) returns a list. That is the main reason for the error.

Here is what to do if you want to copy a sheet from xls to csv.

import xlrd
import csv
import codecs
wb = xlrd.open_workbook('/etl/dev/input/CustList.xls')
table = wb.sheet_by_index(1)
nrows = table.nrows
with codecs.open('/etl/dev/input/CustList.csv', 'w', 'utf-8') as file_output:
    spamwriter = csv.writer(file_output)
    for i in range(nrows):
        spamwriter.writerow(table.row_values(i))
answered Jun 4, 2014 at 8:24

1 Comment

Many thanks for your reply. I was trying to avoid csv.writer because it was giving me the error below: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 10: ordinal not in range(128)
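
For context on that error: the Python 2 csv module does not handle Unicode text, so on Python 2 every cell would have to be encoded to UTF-8 bytes before being handed to the writer. On Python 3 the csv module works with text directly. The following is only a minimal sketch, assuming Python 3 is available; `write_rows` and the sample rows are hypothetical and stand in for the lists that row_values() returns.

```python
import csv
import io


def write_rows(rows, stream):
    """Write an iterable of rows (lists of cell values) as CSV to a text stream."""
    # lineterminator is set explicitly; the csv default is "\r\n".
    writer = csv.writer(stream, lineterminator="\n")
    for row in rows:
        writer.writerow(row)


# Hypothetical sample data standing in for sh.row_values(rownum).
# u"\xa0" is the no-break space that triggered the error in this thread.
rows = [[u"Smith\xa0John", 42.0, u"NY"], [u"Doe, Jane", 7.5, u""]]

buffer = io.StringIO()
write_rows(rows, buffer)
output = buffer.getvalue()
```

For a real file, the stream could be opened with open(path, "w", encoding="utf-8", newline="") instead of the in-memory buffer. Note that the csv writer also quotes cells that contain commas, which a plain ",".join() does not.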

If you want the values to be comma-separated in your output file, you could simply change your write command to join the list of values into a comma-separated string.

But first you have to convert every value in the list to a string, because row_values() returns a list of string and floating-point values.

...
row_values = [str(val) for val in sh.row_values(rownum)]
file_output.write(",".join(row_values) + "\n")
...
answered Jun 4, 2014 at 8:32

2 Comments

Thanks for the reply. I tried the above, and this time I guess the issue pops up while converting the data to a string. This u'\xa0' is bugging me a lot :( I wish there were a simple way to dump whatever character this is from Excel to the text file.

Traceback (most recent call last):
 File "TestXls2Csv.py", line 12, in <module>
 row_values = [str(val) for val in sh.row_values(rownum)]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 10: ordinal not in range(128)

I changed str(val) to repr(val) without having any idea what the difference is, and it worked, except that the strings were single-quoted and prefixed with 'u'.
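
On the str() versus repr() point: in Python 2, str() on a unicode value tries to encode it as ASCII, which is what fails on u'\xa0', while repr() returns the escaped literal form, hence the quotes and the u prefix. One possible alternative, sketched here with the hypothetical helper names `to_text` and `format_row`, is to convert only the non-text cells and leave text cells untouched:

```python
# type(u"") is unicode on Python 2 and str on Python 3.
text_type = type(u"")


def to_text(value):
    """Return the cell as Unicode text without an ASCII round trip."""
    if isinstance(value, text_type):
        return value
    return text_type(value)


def format_row(values):
    """Join one row of cell values into a comma-separated line."""
    return u",".join(to_text(v) for v in values) + u"\n"


# Hypothetical row standing in for sh.row_values(rownum).
line = format_row([u"Caf\xe9\xa0Ltd", 12.0, u"TX"])
```

The resulting line is Unicode text, so it can be passed to the write() of a file opened with codecs.open(..., 'w', 'utf-8') as in the question. This sketch does not quote cells that themselves contain commas.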
