Issue 23178: csv.reader does not handle BOM

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67367

classification

Title:	csv.reader does not handle BOM
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.5

process

Status:	closed	Resolution:	duplicate
Dependencies:	Superseder:	Python3: guess text file charset using the BOM (issue 7651)
Assigned To:	Nosy List:	jdufresne, r.david.murray
Priority:	normal	Keywords:

Created on 2015-01-06 19:05 by jdufresne, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg233549	Author: Jon Dufresne (jdufresne)	Date: 2015-01-06 19:05
The following test script demonstrates that Python's csv library does not handle a BOM. I would expect the returned row to be equal to expected and the script to print 'True' to stdout.

In the wild, it is common for other CSV writers to add a BOM. MS Excel is especially picky about the BOM when reading a UTF-8 encoded file, so many writers add a BOM for interoperability with MS Excel. If a Python program accepts a CSV file as input (often the case in web apps), these files will not be handled correctly without preprocessing. In my opinion, this should "just work" when reading the file.

---
import codecs
import csv

f = open('foo.csv', 'wb')
f.write(codecs.BOM_UTF8 + b'a,b,c')
f.close()

expected = ['a', 'b', 'c']
f = open('foo.csv')
r = csv.reader(f)
row = next(r)

print(row)
print(row == expected)
---

Output

---
$ ./python ~/test.py
['\ufeffa', 'b', 'c']
False
---
msg233550	Author: R. David Murray (r.david.murray) (Python committer)	Date: 2015-01-06 19:52
This is not a problem with the csv module in particular. See issue 7651.
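
Since the BOM comes from how the text file is decoded rather than from csv itself, one workaround is to open the file with the standard 'utf-8-sig' codec, which drops a leading BOM if one is present and otherwise decodes as plain UTF-8. The sketch below is a minimal illustration, assuming the input is UTF-8 encoded; read_first_row is a hypothetical helper name and the temporary file only recreates the file from the report.

```python
import codecs
import csv
import os
import tempfile


def read_first_row(path):
    """Return the first CSV row of a UTF-8 file, ignoring a leading BOM.

    Hypothetical helper for illustration; assumes the file is UTF-8.
    """
    # 'utf-8-sig' strips a leading BOM when decoding, if there is one.
    # newline='' is what the csv documentation recommends for file objects.
    with open(path, encoding='utf-8-sig', newline='') as f:
        return next(csv.reader(f))


# Recreate the file from the report inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'foo.csv')
    with open(path, 'wb') as f:
        f.write(codecs.BOM_UTF8 + b'a,b,c')
    row = read_first_row(path)

print(row)
print(row == ['a', 'b', 'c'])
```

With this encoding the first field comes back as 'a' rather than '\ufeffa'. The same call also reads files that have no BOM, so it may be a reasonable default when the input is known to be UTF-8 but its origin is not.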

History
Date	User	Action	Args
2022-04-11 14:58:11	admin	set	github: 67367
2015-01-06 19:52:05	r.david.murray	set	status: open -> closed; superseder: Python3: guess text file charset using the BOM; nosy: + r.david.murray; messages: + msg233550; resolution: duplicate; stage: resolved
2015-01-06 19:05:05	jdufresne	create