
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: CSVReader ignores dialect.lineterminator
Type: enhancement
Stage: resolved
Components: 2to3 (2.x to 3.x conversion tool), Documentation, email, Library (Lib)
Versions: Python 3.7

process
Status: closed
Resolution: wont fix
Dependencies:
Superseder: Close 2to3 issues and list them here (View: 45544)
Assigned To: docs@python
Nosy List: Benjamin Schollnick, barry, docs@python, r.david.murray, skip.montanaro, xtreak
Priority: normal
Keywords:

Created on 2019-07-29 19:32 by Benjamin Schollnick, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name       Uploaded                                 Description
CSV_SAMPLE.CSV  Benjamin Schollnick, 2019-07-30 13:05    Sample data
bell.csv        skip.montanaro, 2019-07-31 04:46         example CSV file with \x07 as the line terminator
lfmapper.py     skip.montanaro, 2019-07-31 04:46
Messages (5)
msg348681 - Author: Benjamin Schollnick (Benjamin Schollnick) Date: 2019-07-29 19:32
I've run into a situation where the CSV input file is very unusual. The delimiter is "\x06" and the lineterminator is "\x07".
While I've written code to work around this, it would be significantly nicer if the CSV reader code actually paid attention to the dialect's lineterminator value.
msg348683 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-07-29 20:19
Seems related: https://bugs.python.org/issue1072404
There is a note in the docs that the reader ignores lineterminator and that this may change in the future:
https://docs.python.org/3/library/csv.html
msg348711 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2019-07-30 02:18
I imagine this is a corner case which will continue to cause problems. At the time the csv module was originally written, I believe the authors' intent was to read and write CSV files which were compatible with Excel. In Python 3, you have to open input files in text mode (that provides the underlying line splitting behavior). Consequently, you're not going to see proper line splitting with unadorned files.
Have you only tried this with Python 3? If you have tried Python 2, were you able to get it to work without your workaround?
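The point about text mode doing the line splitting can be illustrated with a small in-memory sketch. The byte string `raw` is made-up sample data, not the attached file, and `io.TextIOWrapper` stands in for what `open()` returns in text mode.

```python
import io

# Made-up sample: "\x06" separates fields, "\x07" ends each record.
raw = b"a\x06b\x07c\x06d\x07"

# Text-mode iteration only splits on "\n", "\r", or "\r\n", so the whole
# payload comes back as a single "line".
text_lines = list(io.TextIOWrapper(io.BytesIO(raw), encoding="ascii"))

# Asking the text layer for a custom newline string is rejected.
try:
    io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="\x07")
    custom_newline_ok = True
except ValueError:
    custom_newline_ok = False

print(text_lines)
print(custom_newline_ok)
```

Because the csv reader consumes whatever "lines" the file object yields, it never gets a chance to see a record boundary at "\x07" here.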
msg348738 - Author: Benjamin Schollnick (Benjamin Schollnick) Date: 2019-07-30 13:05
This is tested under Python 3:
import csv
filename = "csv_Sample.csv"
datafile = open(filename, 'r')
data = csv.DictReader(datafile, lineterminator='\x07', delimiter='\x06')
print(next(data))
# Output (note the trailing \x07 on the last field name and value):
# OrderedDict([('Field1', 'A'), ('Field2', 'B'), ('Field3', 'C'), ('Field4', 'D'), ('Field5', 'E'), ('Field6', 'F'), ('Field7', 'G'), ('Field8', 'H'), ('Field9', 'I'), ('Field10\x07', 'J\x07')])
print(ord(data.reader.dialect.lineterminator))
So it's untested under Python 2, since I've stopped developing under Py2.
I noticed the note in the CSV reader documentation *AFTER* I diagnosed the issue with the CSV reader, which is why I opened the bug / feature enhancement request, since this is a very odd edge case.
I agree 90+% of all CSVs are going to be \n line terminated, but if we offer it for writing, we should offer it for reading.
The main emphasis here is that this code will not work in the real world, e.g.
import csv
filename = "csvFile.csv"
# Write five rows, using ";" instead of a newline as the row terminator.
with open(filename, mode='w') as output_file:
    outcsv = csv.writer(output_file, delimiter=',', lineterminator=";")
    outcsv.writerow(['John Cleese', 'CEO', 'March'])
    outcsv.writerow(['Graham Chapman', 'CFO', 'November'])
    outcsv.writerow(['Terry Jones', 'Animation', 'March'])
    outcsv.writerow(['Eric Idle', 'Laugh Track', 'November'])
    outcsv.writerow(['Michael Palin', 'Snake Wrangler', 'March'])
# Read the file back with the same dialect arguments.
with open(filename, mode='r') as input_file:
    csv_reader = csv.reader(input_file, delimiter=',', lineterminator=";")
    for row in csv_reader:
        print(row)
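A shortened in-memory version of the same round trip makes the asymmetry visible without touching the filesystem. This is only a sketch: it uses `io.StringIO` in place of a real file and two of the rows above.

```python
import csv
import io

# In-memory stand-in for the file used in the example above.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=",", lineterminator=";")
writer.writerow(["John Cleese", "CEO", "March"])
writer.writerow(["Graham Chapman", "CFO", "November"])

# The writer honours lineterminator: rows are joined by ";" with no newline.
written = buffer.getvalue()

# The reader does not: ";" is treated as ordinary data inside a field.
buffer.seek(0)
rows = list(csv.reader(buffer, delimiter=",", lineterminator=";"))

print(repr(written))
print(rows)
```

Both rows come back as one row of five fields, with the ";" characters embedded in the field values.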
msg348778 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2019-07-31 04:46
Looking at your sample file, it seems stranger than you first indicated. Your line terminator actually appears to be '\x07\r\n', not just '\x07'. Opening your file in text mode will leave you with '\x07' as the last character of the last cell in each row. I've attached two files, bell.csv, which has just '\x07' as the line terminator, and lfmapper.py, which provides a class (suboptimally named LFMapper) which takes a file object opened in binary mode and optional line_terminator and encoding args, and performs the necessary slicing of the input bytes, decoding them and returning strings.
Unless Python grows a way for you to tell the open() function what string to use as the line terminator in text mode, I don't think your example is ever going to work without some sort of shim class.
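For readers without access to the attachment, the kind of shim described here might look roughly like the following. This is a hypothetical sketch, not the contents of lfmapper.py: the function name `split_records` and the `chunk_size` parameter are assumptions, and only the `line_terminator` and `encoding` arguments mirror the description above. It relies on the fact that `csv.reader` accepts any iterable of strings.

```python
import csv
import io


def split_records(binary_file, line_terminator="\x07", encoding="utf-8",
                  chunk_size=8192):
    """Yield decoded records from a binary file, split on line_terminator.

    Hypothetical helper, not the attached LFMapper class.
    """
    terminator = line_terminator.encode(encoding)
    pending = b""
    while True:
        chunk = binary_file.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last terminator is complete; keep the tail,
        # which may be a partial record or a partial terminator.
        *complete, pending = pending.split(terminator)
        for record in complete:
            yield record.decode(encoding)
    if pending:
        # Final record without a trailing terminator.
        yield pending.decode(encoding)


# Made-up sample data in the shape described in the report.
sample = io.BytesIO(b"Field1\x06Field2\x07A\x06B\x07")
rows = list(csv.reader(split_records(sample), delimiter="\x06"))
print(rows)
```

For a file like the uploaded sample, where each row appears to end in '\x07\r\n', the same helper could be called with `line_terminator="\x07\r\n"`. One limitation of splitting before parsing is that a terminator appearing inside a quoted field would also be treated as a record boundary.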
History
Date                 User                 Action  Args
2022-04-11 14:59:18  admin                set     github: 81890
2021-10-20 23:11:45  iritkatriel          set     status: open -> closed
                                                  superseder: Close 2to3 issues and list them here
                                                  resolution: wont fix
                                                  stage: resolved
2020-07-22 05:33:12  Daniel Smejkal       set     assignee: docs@python
                                                  components: + Documentation, 2to3 (2.x to 3.x conversion tool), email
                                                  nosy: + barry, r.david.murray, docs@python
2019-07-31 04:46:34  skip.montanaro       set     files: + lfmapper.py
2019-07-31 04:46:21  skip.montanaro       set     files: + bell.csv
                                                  messages: + msg348778
2019-07-30 13:05:39  Benjamin Schollnick  set     files: + CSV_SAMPLE.CSV
                                                  messages: + msg348738
2019-07-30 02:18:15  skip.montanaro       set     messages: + msg348711
2019-07-29 20:19:16  xtreak               set     nosy: + xtreak, skip.montanaro
                                                  messages: + msg348683
2019-07-29 19:32:08  Benjamin Schollnick  create
