Created on 2012-09-12 04:49 by kalaxy, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| test_csv.py | kalaxy, 2012-09-12 04:49 | Script exhibiting bug. |
| test_csv_py3k.py | maciej.szulik, 2012-10-09 20:41 | Script exhibiting bug for py3k. |
| csv.patch | mjohnson, 2013-03-20 01:43 | |
| Messages (10) | |||
|---|---|---|---|
| msg170352 - (view) | Author: Kalon Mills (kalaxy) | Date: 2012-09-12 04:49 | |
csv.reader prematurely ends row parsing when it reads a row containing an escaped newline with quoting turned off. csv.reader properly handles quoted newlines, and csv.writer properly writes escaped unquoted newlines, so only the reader has an issue. Given a dialect with escapechar='\\', quoting=csv.QUOTE_NONE, lineterminator='\n': writer.writerow(['one\nelement']) will correctly write 'one\\\nelement\n'. However, pass that back into a reader and it will produce two rows: ['one\n'] and ['element']. I would expect the reader to parse it correctly and return the original value of ['one\nelement']. I've attached a test script that exhibits the improper behavior. It uses a dialect to set an escapechar and disable quoting. |
|||
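For readers who want to try the report without the attachment, here is a minimal sketch of the round trip on Python 3. It is not the attached test_csv.py; the helper name `roundtrip` and the use of `io.StringIO` are assumptions made for illustration, while the dialect parameters are the ones quoted in the report.

```python
import csv
import io

# Dialect parameters taken from the report: no quoting, backslash as escapechar.
FMT = dict(escapechar="\\", quoting=csv.QUOTE_NONE, lineterminator="\n")


def roundtrip(row):
    """Write one row with FMT, read it back, and return (raw text, parsed rows).

    Hypothetical helper for illustration; not part of the attached script.
    """
    buf = io.StringIO(newline="")  # newline="" leaves line endings untouched
    csv.writer(buf, **FMT).writerow(row)
    raw = buf.getvalue()
    buf.seek(0)
    return raw, list(csv.reader(buf, **FMT))


raw, rows = roundtrip(["one\nelement"])
print(repr(raw))   # the text the writer produced
print(rows)        # the records the reader recovered
```

On an interpreter that includes the fix discussed later in this thread (3.4 or newer), the reader should return a single row equal to the input. On an unfixed interpreter, the report says it returns two rows instead.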
| msg172521 - (view) | Author: Maciej Szulik (maciej.szulik) * (Python triager) | Date: 2012-10-09 20:41 | |
I've confirmed that the bug still exists in the latest repo version. I'm attaching a patch for py3k. I'll try to have a look at it in the current version; as soon as it is fixed, I'll port it to 2.7. |
|||
| msg175900 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012-11-18 18:33 | |
CSV is not a well-defined format. What do you expect to read from csv.reader(['one', 'two'])? If two rows, ['one'] and ['two'], then the reader is in its own right and there is no bug that can be fixed. |
|||
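As a point of reference for the question above, this small sketch shows how csv.reader treats an iterable of strings with the default dialect. The variable names are illustrative only. Each item counts as one physical line, and a record spans more than one item only when a quoted field is still open at the end of the line.

```python
import csv

# Each item of the iterable is one physical line of input.
plain_rows = list(csv.reader(["one", "two"]))
print(plain_rows)

# A quoted field that is still open at the end of a line continues
# on the next item, so two physical lines become one record.
quoted_rows = list(csv.reader(['"one\n', 'two"\n']))
print(quoted_rows)
```

The first call yields two single-field rows. The second yields one row whose field contains the embedded newline, which is the behavior the reporter wants for escaped newlines as well.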
| msg184415 - (view) | Author: Kalon Mills (kalaxy) | Date: 2013-03-18 02:49 | |
Serhiy, sorry, I'm not sure I understand your question. But if you take a look at the script that exhibits the problem, I think the bug that I'm reporting becomes clearer. Namely, using the dialect configuration shown in the script, the round-trip conversion from string through the writer, then through the reader back to string, is inconsistent. The reader should return as output the same input that was given to the corresponding writer, and this is not the case. So even if CSV is not well defined, I believe the writer and reader should at least be consistent. |
|||
| msg184720 - (view) | Author: Michael Johnson (mjohnson) * | Date: 2013-03-20 01:43 | |
On input, the reader sees a line like ['one\\\n','element'] from the file iterator and successfully escapes the newline character, but still interprets the end of the string as the end of a record. I've attached a patch that modifies this behavior, so that encountering the end of a string immediately after an escaped \r or \n does not begin a new record. |
|||
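The situation described above can be reproduced without a file by handing the reader the two strings directly. This is a sketch that assumes an interpreter with the patch applied (3.4 or newer); the names `lines` and `records` are illustrative.

```python
import csv

# Two physical lines as a file iterator would yield them; the first one
# ends with an escaped newline (a backslash followed by "\n").
lines = ["one\\\n", "element\n"]

reader = csv.reader(lines, escapechar="\\", quoting=csv.QUOTE_NONE)
records = list(reader)

print(records)          # records recovered by the reader
print(reader.line_num)  # number of physical lines consumed
```

With the patched behavior, the end of the first string no longer terminates the record, so both physical lines contribute to a single record and `line_num` counts two lines read.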
| msg184723 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-03-20 02:42 | |
New changeset 940748853712 by R David Murray in branch 'default': #15927: Fix cvs.reader parsing of escaped \r\n with quoting off. http://hg.python.org/cpython/rev/940748853712 |
|||
| msg184724 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2013-03-20 02:44 | |
Although this is clearly a bug fix, it also represents a behavior change that could cause a working program to fail. I have therefore only applied it to 3.4, but I'm open to arguments that it should be backported. Thanks for the patch, Michael. |
|||
| msg303387 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2017-09-30 03:05 | |
This issue was 'reopened' by #31590. I can understand inconsistency as a 'design bug', but design bugs are not code bugs, and fixing a design bug is an enhancement issue, not a behavior issue. It is not clear to me why, with the specified dialect, "writer.writerow(['one\nelement'])" is correct in writing 'one\\\nelement\n'. The doc for Dialect.escapechar says "A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False." Yes, quoting is set to QUOTE_NONE, but \n is the lineterminator, not the delimiter (or the quotechar). It looks to me that escaping the lineterminator might be a bug. In any case, 'one\nelement' and 'one\\\nelement' are each 2 physical lines. I don't see anything in the doc about csv.reader joining physical lines into 'logical' lines the way that compile() does. |
|||
| msg309811 - (view) | Author: Sebastian Bank (xflr6) | Date: 2018-01-11 14:57 | |
I am not sure about the design vs. code bug distinction, but what makes me think this should be fixed is primarily the broken round-trip (already mentioned above):
>>> import io, csv
>>> def roundtrip(value, **fmtparams):
...     with io.BytesIO() as f:
...         csv.writer(f, **fmtparams).writerow([value])
...         f.seek(0)
...         return next(csv.reader(f, **fmtparams))
>>> roundtrip('spam\neggs', quoting=csv.QUOTE_NONE, escapechar='\\')
['spam\n']
Furthermore, there is the inconsistency between Python 2 and 3, now that this has been fixed in 3.4.
I agree that the documentation of Dialect.escapechar is not in line with the code (in both Python 2 and Python 3): How about changing it to something along the following lines (TODO: reformulate according to how exactly Dialect.lineterminator affects this)?
"to escape the delimiter, \r, \n, and the quotechar if quoting is set to QUOTE_NONE
and the quotechar for all other quoting styles if doublequote is False":
>>> def write_csv(value, **fmtparams):
...     with io.BytesIO() as f:
...         csv.writer(f, **fmtparams).writerow([value])
...         return f.getvalue()
>>> write_csv('spam\reggs', quoting=csv.QUOTE_NONE, escapechar='\\')
'spam\\\reggs\r\n'
>>> write_csv('spam\neggs', quoting=csv.QUOTE_NONE, escapechar='\\')
'spam\\\neggs\r\n'
>>> write_csv('spam"eggs', quoting=csv.QUOTE_NONE, escapechar='\\')
'spam\\"eggs\r\n'
>>> write_csv('spam"eggs', quoting=csv.QUOTE_NONE, quotechar=None, escapechar='\\')
'spam"eggs\r\n'
>>> write_csv('spam"eggs', escapechar='\\', doublequote=False)
'spam\\"eggs\r\n'
> In any case, 'one\nelement' and 'one\\\nelement' are each 2 physical lines.
> I don't see anything in the doc about csv.reader joining physical lines
> into 'logical' lines the way that compile() does.
How about the following?
"csvreader.line_num
The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines."
"On reading, the escapechar removes any special meaning from the following character."
>>> write_csv('spam\neggs', quoting=csv.QUOTE_NONE) # with delimiter, \r, \n, and quotechar
Traceback (most recent call last):
...
Error: need to escape, but no escapechar set
>>> roundtrip('spam\neggs')
['spam\neggs']
>>> write_csv('spam\neggs')
'"spam\neggs"\r\n'
|
|||
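The interactive examples in the message above use io.BytesIO, which works with the csv module on Python 2 only. A possible Python 3 adaptation of the `write_csv` helper is sketched below. The swap to `io.StringIO(newline="")` is an assumption made for this sketch and is not from the original message; the variable names are illustrative.

```python
import csv
import io


def write_csv(value, **fmtparams):
    """Return the text csv.writer produces for a single-field row.

    Python 3 adaptation of the helper from the message above (assumption:
    io.StringIO replaces io.BytesIO).
    """
    with io.StringIO(newline="") as f:
        csv.writer(f, **fmtparams).writerow([value])
        return f.getvalue()


# With an escapechar and QUOTE_NONE, the embedded newline is escaped.
escaped = write_csv("spam\neggs", quoting=csv.QUOTE_NONE, escapechar="\\")

# Without an escapechar, the writer cannot represent the newline and raises.
try:
    write_csv("spam\neggs", quoting=csv.QUOTE_NONE)
    error = None
except csv.Error as exc:
    error = exc

# With default quoting, the field is wrapped in quotes instead.
quoted = write_csv("spam\neggs")

print(repr(escaped))
print(repr(error))
print(repr(quoted))
```

The three cases mirror the message's examples: escaping under QUOTE_NONE, the error raised when no escapechar is set, and quoting under the default dialect.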
| msg309846 - (view) | Author: Sebastian Bank (xflr6) | Date: 2018-01-12 10:17 | |
To be complete, the docs of Dialect.escapechar should probably also say that it is used to escape itself. However, note that csv.writer currently only does this with csv.QUOTE_NONE (breaking round-trip otherwise: #12178). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:35 | admin | set | github: 60131 |
| 2018-01-12 10:17:18 | xflr6 | set | messages: + msg309846 |
| 2018-01-11 14:57:58 | xflr6 | set | nosy: + xflr6 messages: + msg309811 |
| 2017-09-30 03:06:00 | terry.reedy | set | nosy: + terry.reedy messages: + msg303387 |
| 2013-03-20 02:44:33 | r.david.murray | set | status: open -> closed assignee: lukasz.langa -> versions: - Python 2.7, Python 3.2, Python 3.3 nosy: + r.david.murray messages: + msg184724 resolution: fixed stage: resolved |
| 2013-03-20 02:42:03 | python-dev | set | nosy: + python-dev messages: + msg184723 |
| 2013-03-20 01:43:57 | mjohnson | set | files: + csv.patch nosy: + mjohnson messages: + msg184720 keywords: + patch |
| 2013-03-18 02:49:09 | kalaxy | set | messages: + msg184415 |
| 2012-11-18 18:33:13 | serhiy.storchaka | set | messages: + msg175900 |
| 2012-10-09 22:36:47 | serhiy.storchaka | set | nosy: + serhiy.storchaka |
| 2012-10-09 20:45:31 | lukasz.langa | set | assignee: lukasz.langa nosy: + lukasz.langa versions: + Python 3.2, Python 3.3 |
| 2012-10-09 20:41:52 | maciej.szulik | set | files: + test_csv_py3k.py versions: + Python 3.4 nosy: + maciej.szulik messages: + msg172521 |
| 2012-09-13 03:32:08 | chris.jerdonek | set | components: + Library (Lib), - None title: cvs.reader does not support escaped newline when quoting=cvs.QUOTE_NONE -> csv.reader() does not support escaped newline when quoting=csv.QUOTE_NONE |
| 2012-09-12 04:49:29 | kalaxy | create | |