This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2017-07-01 21:23 by vmax, last changed 2022-04-11 14:58 by admin.
Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 2529 | closed | vmax, 2017-07-01 22:01 |
| PR 18336 | open | vmax, 2020-02-03 18:45 |

| Messages (7) | | | |
|---|---|---|---|
| msg297497 | Author: Max Vorobev (vmax) | Date: 2017-07-01 21:23 | |
The line terminator defaults to '\r\n' when csv.Sniffer detects a dialect. |
|||
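For readers who want to reproduce the report, here is a minimal sketch. The sample string is made up for illustration, and the result reflects the behavior described in this thread (PR 18336 is still open), so a future release could differ.

```python
import csv

# Hypothetical sample: semicolon-delimited, terminated by bare '\n'.
sample = "a;b;c\n1;2;3\n4;5;6\n"

dialect = csv.Sniffer().sniff(sample)

# The delimiter is inferred from the sample...
print(repr(dialect.delimiter))
# ...but the line terminator is not: per this report it comes back as
# '\r\n' even though the sample contains only '\n'.
print(repr(dialect.lineterminator))
```

If the sniffed dialect is later used with csv.writer, the output will therefore be written with '\r\n' regardless of what the input used.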
| msg311804 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2018-02-07 21:40 | |
The csv expert listed in https://devguide.python.org/experts/ is marked as inactive, and I have never used the module. So you might need to ask for help on the core-mentorship list. The csv doc for Sniffer.sniff says "Analyze the given sample and return a Dialect subclass reflecting the parameters found." It is not clear to me whether 'the parameters found' is meant to be all possible parameters or just those found. So, to be conservative, I will initially treat this as a feature addition for the next version, rather than a bug to also be fixed in current versions. It does seem like a reasonable request. |
|||
| msg311805 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2018-02-07 21:45 | |
Looking at the code and docstring, lineterminator was intentionally (knowingly) not sniffed, making this a feature addition. |
|||
| msg311806 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2018-02-07 21:55 | |
While Sniffer *returns* a dialect with lineterminator = '\r\n', it *uses* '\n' for splitting. This is slightly odd, as it leaves lines terminated by '\r' while detecting within-line parameters, but it does not affect such detection. Are there csv files in the wild that use \r as the line terminator? If so, they will not currently get split. |
|||
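The splitting behavior described in the message above can be illustrated with plain str.split. This is only a sketch of the effect, using invented sample strings; it does not call into Sniffer itself.

```python
# Hypothetical samples: one terminated by '\r\n', one by bare '\r'.
sample_crlf = "a;b;c\r\n1;2;3\r\n"
sample_cr = "a;b;c\r1;2;3\r4;5;6\r"

# Splitting on '\n' leaves a trailing '\r' on every '\r\n' line...
lines_crlf = sample_crlf.split("\n")
# ...and does not split '\r'-only data at all: it stays one long "line".
lines_cr = sample_cr.split("\n")

print(lines_crlf)
print(len(lines_cr))
```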
| msg327094 | Author: Neil Schemenauer (nascheme) (Python committer) | Date: 2018-10-04 22:33 | |
There is another issue related to this. If you use codecs to get a reader, it uses str.splitlines() internally, which treats a bunch of different characters as line terminators. See issue #18291 and: https://docs.python.org/3.8/library/stdtypes.html#str.splitlines

I was thinking about different ways to fix this. First, the csv module suggests you pass newline='' to the file object. I suspect most people don't know to do that. So, I thought maybe the csv module should inspect the file object that gets passed in and then warn if newline='' has not been used or if the file is a codecs reader object. However, that seems fairly complicated.

Would it be better if we changed the 'csv' module to do its own line splitting? I think that would be better, although I'm not sure about backwards compatibility. Currently, the reader expects to call iter() on the input file. Would it be okay if it used the 'read' method of it in preference to using iter()? It could still fall back to iter() if there was no read method. |
|||
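To make the str.splitlines() concern concrete, here is a small sketch. The record contents are invented, and io.StringIO(..., newline="") is used as a stand-in for a file opened with open(path, newline="").

```python
import csv
import io

# Hypothetical record whose quoted field contains a vertical tab (\x0b).
field = "line one\x0bstill the same field"
text = f'id,note\r\n1,"{field}"\r\n'

# str.splitlines() treats \x0b as a line boundary, so the two-record
# text is cut into three pieces and the vertical tab is dropped.
pieces = text.splitlines()
print(pieces)

# With newline="", only \n, \r and \r\n end a line, and they are passed
# through untranslated, so the csv reader sees the field intact.
rows = list(csv.reader(io.StringIO(text, newline="")))
print(rows)
```

The same reasoning is why the csv documentation recommends newline='' when opening files for the reader or writer.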
| msg329482 | Author: Gertjan van den Burg (Gertjan van den Burg) | Date: 2018-11-08 17:27 | |
Note that the current CSV parser in _csv.c doesn't require the line terminator; it eats up \r and \n where necessary. See: https://github.com/python/cpython/blob/fd512d76456b65c529a5bc58d8cfe73e4a10de7a/Modules/_csv.c#L752 This is why the line terminator isn't detected and doesn't need to be detected. Also, files that use the \r line terminator exist and are parsed correctly at the moment. See for example: https://raw.githubusercontent.com/hadley/data-fuel-economy/master/1998-2008/2008.csv |
|||
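A quick way to check the claim about '\r'-terminated files, sketched with an in-memory string instead of the linked file. The data values are invented, and io.StringIO(..., newline="") again stands in for open(path, newline="").

```python
import csv
import io

# Hypothetical data using bare '\r' as the line terminator.
data = "make,model\rHonda,Civic\rFord,Focus\r"

# No dialect or lineterminator is specified; the reader handles the
# '\r' terminators on its own.
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)
```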
| msg329487 | Author: Skip Montanaro (skip.montanaro) (Python triager) | Date: 2018-11-08 21:16 | |
A couple comments.

1. Terry Reedy wrote:

> The csv expert listed in https://devguide.python.org/experts/ is marked as inactive

That would be me. I am indeed inactive w.r.t. fixing broken stuff, and don't want to feel obligated to jump in with both feet when a CSV ticket is raised. Still, I keep half an eye on things. If people are actually interested in my opinion on such stuff, drop me a line.

2. Regarding the csv.Sniffer class... I've personally never found it useful, and would be happy to see it deprecated. I occasionally define a delimiter other than comma, and never specify the quotechar. (I've never seen anything other than quotation marks used anyway.) As others have indicated, the line terminator is kind of unnecessary with Python 3 (unless you need something really weird). If you actually need to specify a delimiter, I think giving a set of candidate delimiters would be sufficient. The first one encountered wins.

Maybe I'm just getting old and cranky, but deprecation is the fork in the road I'd take, given the choice. Second choice would be to simplify the delimiter sniffing logic and get rid of anything to do with line terminators.

Skip |
|||
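The "first candidate encountered wins" idea from the last message could look roughly like the sketch below. The function name first_delimiter and its default candidate set are hypothetical and not part of the csv module; the sketch also ignores quoting, which a real implementation would have to consider.

```python
import csv
import io


def first_delimiter(sample, candidates=",;\t|"):
    """Return the first character of `sample` that is a candidate delimiter.

    Hypothetical helper for illustration only; returns None if no
    candidate occurs in the sample.
    """
    for ch in sample:
        if ch in candidates:
            return ch
    return None


# Invented sample: the semicolon appears before the comma, so it wins.
sample = "name;comment\nAda;likes tea, not coffee\n"
delimiter = first_delimiter(sample)

rows = list(csv.reader(io.StringIO(sample, newline=""), delimiter=delimiter))
print(repr(delimiter))
print(rows)
```

Note that the existing Sniffer.sniff method already accepts a delimiters argument restricting the candidates, although its selection logic is frequency-based rather than first-match.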
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:48 | admin | set | github: 75008 |
| 2020-02-03 18:45:19 | vmax | set | keywords: + patch; stage: test needed -> patch review; pull_requests: + pull_request17708 |
| 2018-11-08 21:16:12 | skip.montanaro | set | messages: + msg329487 |
| 2018-11-08 17:27:53 | Gertjan van den Burg | set | nosy: + Gertjan van den Burg; messages: + msg329482 |
| 2018-10-04 22:33:40 | nascheme | set | nosy: + nascheme; messages: + msg327094 |
| 2018-02-07 21:55:37 | terry.reedy | set | messages: + msg311806 |
| 2018-02-07 21:45:02 | terry.reedy | set | messages: + msg311805 |
| 2018-02-07 21:40:43 | terry.reedy | set | versions: + Python 3.8, - Python 3.6; nosy: + terry.reedy, skip.montanaro; messages: + msg311804; type: behavior -> enhancement |
| 2017-07-08 02:26:13 | terry.reedy | set | stage: test needed |
| 2017-07-01 22:01:12 | vmax | set | pull_requests: + pull_request2595 |
| 2017-07-01 21:23:44 | vmax | create | |