Created on 2017-09-26 09:28 by mallyvai, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| csv_test.tar | mallyvai, 2017-09-30 08:40 | |
| Messages (9) | |||
|---|---|---|---|
| msg303025 | Author: Vaibhav Mallya (mallyvai) | Date: 2017-09-26 09:28 | |
I'm writing Python `csv`-based parsers as part of a data processing pipeline that includes Redshift and other data stores upstream and downstream. In all of these data stores it is easy, and expected, to generate CSV-style data with ESCAPE'd newlines, with or without quotes on the columns (see http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html). The problem: the 2.x `csv` module has a bug where ESCAPE'd newlines in unquoted CSVs are not treated as escaped newlines, but as the start of an entirely new record. This is at odds with the expected behavior in most common data warehouses (see the Redshift docs linked above, for example) and is a subtle source of bugs in data processing pipelines. After some debugging, we changed our Redshift parameters to ADDQUOTES to get around this bug. Note: this seems to be a continuation of https://bugs.python.org/issue15927, which was closed as WONTFIX for 2.x. I think this is a legitimate bug and should be fixed in 2.x. If someone is relying on the old, bad behavior, that might mean something else is wrong. In my view, the current behavior effectively adds an implicit, undocumented dialect to the CSV module. |
|||
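The behavior under discussion can be sketched as follows. This is a minimal illustration, assuming Python 3.4 or later (where, per the discussion in this thread, the fix from issue 15927 was applied). The sample string and the helper name `read_unquoted` are hypothetical and are not taken from the reporter's pipeline.

```python
import csv
import io

# Hypothetical sample: one logical record whose first field contains a
# backslash-escaped newline, with no quoting around the fields.
raw = "a\\\nb,c\n"


def read_unquoted(text):
    """Parse unquoted CSV text, treating backslash as the escape character."""
    reader = csv.reader(
        io.StringIO(text),
        quoting=csv.QUOTE_NONE,  # fields are never wrapped in quotes
        escapechar="\\",         # a backslash escapes the following character
    )
    return list(reader)


rows = read_unquoted(raw)
# On Python 3.4+ the escaped newline stays inside the first field,
# so this is expected to print one record: [['a\nb', 'c']]
print(rows)
```

According to the report, the same input on Python 2.7 is instead split into two records at the escaped newline.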
| msg303376 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2017-09-29 23:54 | |
In closing #15927, R. David Murray said "Although this is clearly a bug fix, it also represents a behavior change that could cause a working program to fail. I have therefore only applied it to 3.4, but I'm open to arguments that it should be backported." David, I'll leave you to evaluate the argument presented. Vaibhav: in the meantime, consider moving your pipeline to 3.x or patching your copy of the csv module. You can put it in site-packages as csv27. Or, if you are distributing code anyway, include your patched copy with it. |
|||
| msg303381 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2017-09-30 00:44 | |
I'm pretty hesitant to make this kind of change in python2. I'm going to punt, and let someone else make the decision. Which means if no one does, the status quo will win. Sorry about that. |
|||
| msg303382 | Author: Vaibhav Mallya (mallyvai) (Vaibhav Mallya (mallyvai)) | Date: 2017-09-30 01:00 | |
If there's any way this can be documented that would be a big help, at least. There have been other folks who run into this, and the current behavior is implicit. |
|||
| msg303388 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2017-09-30 03:10 | |
I explained on #15927 why I currently see it as an enhancement issue, and therefore not appropriate to be backported. In fact, based on the doc, I am puzzled why the line terminator was being escaped. |
|||
| msg303401 | Author: Vaibhav Mallya (mallyvai) | Date: 2017-09-30 08:40 | |
Hello R. David & Terry! I appreciate your prompt responses. While experimenting with different test cases, I realized that escaped slashes and newlines are intrinsically annoying to reason about as one-line strings, so I threw together a small tarball test case (attached) to make sure we're on the same page. To be clear, I was referring *solely* to reading with csv.DictReader (we're not using the writing part). The assertion for multi_line_csv_unquoted fails, and I believe it should succeed. I hadn't considered the design-bug vs. code-bug angle. I also think that documenting this explicitly would help others, since there's no mention of this interaction, in what should be a fairly common use case. It might even make sense to make a "strong recommendation" that everything be quoted and escaped (much as Redshift makes a strong recommendation to escape). Our data pipeline is doing fine now that the right parameters are set on both sides; this is more about improving Python for the rest of the community. Thanks for your help. I will of course respect any decision you make. |
|||
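The quoted-versus-unquoted comparison described in this message can be sketched with `csv.DictReader`. The inputs below are hypothetical stand-ins and do not reproduce the contents of the attached `csv_test.tar`. The column names `id` and `note` and the helper `parse` are also assumptions made for illustration, and the sketch targets Python 3.4 or later.

```python
import csv
import io

# Hypothetical inputs: the same logical row, once with quoted fields and
# once without. In both, the newline inside "note" is backslash-escaped.
QUOTED = '"id","note"\n"1","first\\\nsecond"\n'
UNQUOTED = 'id,note\n1,first\\\nsecond\n'


def parse(text, quoting):
    """Read CSV text into a list of plain dicts using a backslash escape."""
    reader = csv.DictReader(io.StringIO(text), quoting=quoting, escapechar="\\")
    return [dict(row) for row in reader]


quoted_rows = parse(QUOTED, csv.QUOTE_MINIMAL)
unquoted_rows = parse(UNQUOTED, csv.QUOTE_NONE)

# On Python 3.4+ both variants are expected to yield the same single row.
expected = [{"id": "1", "note": "first\nsecond"}]
print(quoted_rows == expected, unquoted_rows == expected)
```

Per the report, the unquoted comparison is the one that fails on Python 2.7, which is why switching the export to quoted fields worked around the problem.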
| msg309814 | Author: Sebastian Bank (xflr6) | Date: 2018-01-11 15:16 | |
https://bugs.python.org/issue15927#msg309811 gives some code examples illustrating why I think this should be backported (and also why the documentation should be changed for both Python 2 and 3). |
|||
| msg372479 | Author: Zackery Spytz (ZackerySpytz) * (Python triager) | Date: 2020-06-27 21:08 | |
Python 2 is EOL, so I think this issue should be closed. |
|||
| msg372492 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2020-06-28 01:58 | |
Yes, the status quo won ;-). Sebastian, if you think a doc fix is still needed for current versions, please open a new issue with a specific suggestion and explanation for changing the 3.9 doc. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:52 | admin | set | github: 75771 |
| 2020-06-28 01:58:30 | terry.reedy | set | status: open -> closed; resolution: wont fix; messages: + msg372492; stage: resolved |
| 2020-06-27 21:08:44 | ZackerySpytz | set | nosy: + ZackerySpytz; messages: + msg372479 |
| 2018-01-11 15:16:59 | xflr6 | set | nosy: + xflr6; messages: + msg309814 |
| 2017-09-30 08:40:51 | mallyvai | set | files: + csv_test.tar; messages: + msg303401 |
| 2017-09-30 03:10:58 | terry.reedy | set | messages: + msg303388 |
| 2017-09-30 01:00:37 | Vaibhav Mallya (mallyvai) | set | nosy: + Vaibhav Mallya (mallyvai); messages: + msg303382 |
| 2017-09-30 00:44:00 | r.david.murray | set | messages: + msg303381 |
| 2017-09-29 23:54:46 | terry.reedy | set | nosy: + terry.reedy, r.david.murray; messages: + msg303376 |
| 2017-09-26 09:28:19 | mallyvai | create | |