This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in Python's Developer Guide.
Created on 2015-05-09 01:05 by MiK, last changed 2022-04-11 14:58 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| sans_headers.csv | MiK, 2015-05-09 16:29 | |
| quotebug.py | skip.montanaro, 2015-05-17 12:54 | |
| csv_dialect_doc_clarify.patch | jbmilam, 2015-05-29 19:28 | Document clarification |
| csv.html | jbmilam, 2015-05-29 19:29 | html file holding the changes |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 25989 | open | uniocto, 2021-05-08 12:23 |
| Messages (14) | |||
|---|---|---|---|
| msg242787 | Author: Mik (MiK) | Date: 2015-05-09 01:05 | |
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> class Mon(csv.Dialect):
...     delimiter = ','
...     quotechar = '"'
...     quoting = 0
...     lineterminator = '\n'
...
>>> f = open('sans_headers.csv','r')
>>> reader = csv.DictReader(f, fieldnames=('code', 'nom', 'texte'), dialect=Mon)
>>> for l in reader:
...     print l
...
{'nom': 'line_1', 'code': '3', 'texte': 'one line\ntwo lines'}
{'nom': 'line_2', 'code': '5', 'texte': 'one line\nand a quote "iop"";newline'}
{'nom': None, 'code': 'I\'m not a cat"', 'texte': None}
>>> f.seek(0)
>>> reader = csv.DictReader(f, fieldnames=('code', 'nom', 'texte'), delimiter=',', quotechar='"', quoting=0, lineterminator='\n')
>>> for l in reader:
...     print l
...
{'nom': 'line_1', 'code': '3', 'texte': 'one line\ntwo lines'}
{'nom': 'line_2', 'code': '5', 'texte': 'one line\nand a quote "iop";newline\nI\'m not a cat'}
>>>
If I use a subclass of csv.Dialect with the same attributes that I would otherwise pass as keyword arguments to csv.DictReader, I don't get the same behaviour.
|
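The difference can be reproduced without the attached file. The following is a minimal sketch for Python 3 (the session above is Python 2.7). The in-memory `SAMPLE` data and the `read_rows` helper are hypothetical stand-ins for illustration, not the actual contents of sans_headers.csv.

```python
import csv
import io

# Hypothetical stand-in for sans_headers.csv: the last field of the
# second row contains doubled quotes ("") inside a quoted field.
SAMPLE = '3,line_1,"one line\ntwo lines"\n5,line_2,"a quote ""iop"" here"\n'
FIELDS = ("code", "nom", "texte")


class Mon(csv.Dialect):
    # Same attributes as in the report; doublequote is left unset.
    delimiter = ","
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = "\n"


def read_rows(**reader_kwargs):
    """Parse SAMPLE with csv.DictReader and return the rows as a list."""
    source = io.StringIO(SAMPLE)
    return list(csv.DictReader(source, fieldnames=FIELDS, **reader_kwargs))


# Dialect subclass: only the attributes found on Mon (defined there or
# inherited from csv.Dialect) are used.
rows_from_class = read_rows(dialect=Mon)

# Keyword arguments: the reader starts from csv.excel and overrides
# only the named attributes.
rows_from_kwargs = read_rows(
    delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
)

print(repr(rows_from_class[1]["texte"]))
print(repr(rows_from_kwargs[1]["texte"]))
```

The first row parses identically either way; only the row containing the doubled quote differs between the two calls.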
|||
| msg242807 | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2015-05-09 12:00 | |
Can you attach your cab file so we don't need to reconstruct it (and possibly make a mistake) by reading your program's output? |
|||
| msg242811 | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2015-05-09 13:19 | |
Sorry, failed to override my phone's spell correction. "cab" should be "csv". |
|||
| msg242818 | Author: Mik (MiK) | Date: 2015-05-09 16:29 | |
Hi, this is the file used for my test. Thank you, regards, Mik |
|||
| msg243396 | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2015-05-17 12:54 | |
In your Mon class, you've left the doublequote parameter unset (it defaults to None). When you pass a Dialect class, it completely overrides the default dialect. When no Dialect class is given, the default is csv.excel. Note that doublequote is set to True in csv.excel. In your second example, the reader starts with csv.excel, then selectively overrides the named attributes. |
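One way to get the "start from csv.excel, then override" behaviour with a class is to subclass csv.excel rather than csv.Dialect. This is a minimal Python 3 sketch of that idea; the class name `MonExcel` and the sample line are hypothetical.

```python
import csv
import io


class MonExcel(csv.excel):
    # Inherits doublequote=True (and every other attribute) from
    # csv.excel; only the line terminator is overridden here.
    lineterminator = "\n"


# A single record whose last field contains doubled quotes.
data = io.StringIO('5,line_2,"a quote ""iop"" here"\n')
row = next(csv.reader(data, dialect=MonExcel))
print(row)
```

Because every attribute not named in the subclass comes from csv.excel, none of them falls back to the None placeholders on csv.Dialect.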
|||
| msg243397 | Author: Mik (MiK) | Date: 2015-05-17 13:06 | |
OK, thanks. But perhaps the documentation of csv.Dialect could be updated with the default parameters. If all attributes must be specified, this should be indicated in the docs. |
|||
| msg243399 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2015-05-17 13:18 | |
Yes, I think the documentation should be improved. |
|||
| msg243402 | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2015-05-17 14:50 | |
The defaults for the Dialect class are documented: https://docs.python.org/2/library/csv.html#dialects-and-formatting-parameters

I think the problem is mostly that csv.Dialect must be subclassed. You can't use it directly, and if you subclass it as MiK did, you have to supply all the missing parameters. The default dialect is actually csv.excel, which does provide a suitable set of values for all attributes.

There actually might be a bug lurking in the code as well. The value of csv.Dialect.doublequote is None, which will evaluate to False in a boolean context. The module docstring has this to say about that attribute:

* doublequote - controls the handling of quotes inside fields. When True, two consecutive quotes are interpreted as one during read, and when writing, each quote character embedded in the data is written as two quotes

Since the valid values of that attribute are actually only True and False, using None as a default value is an invitation to problems. It appears in this case that's what happened.

csv.Dialect.__init__ doesn't seem to check that the overriding class properly sets all the required parameters. It checks to see if the class is Dialect. If not, and if the validate() call passes, all is assumed to be well. But digging a bit under the surface, it appears the validate step drops into C, where the doublequote attribute of Dialect_Type is 0.

I'm not sure the bug should be fixed in 2.7, but it's worth taking a look at the 3.5 code to see if that validation step can be improved. |
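The gap between the class-level placeholder and the value the parser ends up using can be inspected directly. This is a small Python 3 sketch, assuming the behaviour described above still holds in the interpreter being used; `Mon` repeats the class from the original report.

```python
import csv
import io


class Mon(csv.Dialect):
    delimiter = ","
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = "\n"


# Class-level values, as seen from Python.
class_level = {
    "csv.Dialect": csv.Dialect.doublequote,
    "csv.excel": csv.excel.doublequote,
    "Mon": Mon.doublequote,
}

# Value the reader actually uses once the C implementation has
# processed the dialect (reader.dialect is a read-only description).
effective = csv.reader(io.StringIO(""), dialect=Mon).dialect.doublequote

print(class_level)
print(effective)
```

The point of the comparison is that an unset Boolean attribute is not rejected; it is silently coerced when the reader is built.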
|||
| msg244344 | Author: Brandon Milam (jbmilam) * | Date: 2015-05-28 20:29 | |
Hi all, I've been looking at this bug and am ready to start putting in some work on it, but I have some questions about what needs to be done. From what I can tell, these are the possible tasks for this issue.
- Add to the docs under the dialect section the excel attributes vs. the dialect class attributes, and explain that the excel dialect is the default and that this is the functionality you'd be changing by creating a new dialect.
- Add code to make sure that a certain number of attributes are set before the dialect can be accessed. (Though this might be C code, and I am not really a C programmer, nor do I know where _Dialect is in the repository.)
- Change the defaults in the dialect class, because currently the documentation for "double quote" and "skip initial space" says that the default is False when in the code it is None. Also, I did not find the "strict" dialect in the module at all (maybe it's part of that C code that I don't know how to find).
- Add an example to the documentation on subclassing dialect under the example on registering a new dialect.

If someone could clarify which of these is the desired direction for this issue, it would be much appreciated. |
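The second task (checking that attributes are set) does not necessarily need C code. As a rough illustration only, and not what the csv module does, a hypothetical intermediate base class could reject subclasses that leave the Boolean attributes as None. The names `CheckedDialect`, `Complete`, `Incomplete` and `define_incomplete` are invented for this Python 3 sketch. The check runs at class definition time because csv.reader reads attributes from the class without instantiating it.

```python
import csv
import io


class CheckedDialect(csv.Dialect):
    """Hypothetical base class: subclasses must set the Boolean attributes."""

    _required_booleans = ("doublequote", "skipinitialspace")

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # csv.Dialect defines these as None; treat None as "not set".
        missing = [
            name for name in cls._required_booleans
            if getattr(cls, name, None) is None
        ]
        if missing:
            raise TypeError(f"{cls.__name__} must set: {', '.join(missing)}")


def define_incomplete():
    """Try to define a dialect without the Booleans; return the error text."""
    try:
        class Incomplete(CheckedDialect):
            delimiter = ","
            quotechar = '"'
            quoting = csv.QUOTE_MINIMAL
            lineterminator = "\n"
    except TypeError as exc:
        return str(exc)
    return None


class Complete(CheckedDialect):
    delimiter = ","
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    quoting = csv.QUOTE_MINIMAL
    lineterminator = "\n"


error_message = define_incomplete()
row = next(csv.reader(io.StringIO('1,"say ""hi"""\n'), dialect=Complete))
print(error_message)
print(row)
```

This only covers the two attributes that are silently coerced to False; other attributes are still validated by the csv module when the reader is created.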
|||
| msg244347 | Author: Mik (MiK) | Date: 2015-05-28 21:24 | |
Hi, I have just read the documentation once again. The problem is that it specifies that the default value for Dialect.doublequote is True: <quote>Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.</quote> So it is easy to assume that the class csv.Dialect implements this default value. However, the default dialect used when calling csv.reader is "excel", so implicitly it is csv.excel whose attributes are described in the paragraph above. It would be great in this case to describe the attributes of the base class Dialect, or to specify that all attributes must be set when subclassing it. Optionally, it would be good for the code of csv.Dialect to be changed to use real Boolean values. But the clarification of the documentation is more important, I think. |
|||
| msg244400 | Author: Brandon Milam (jbmilam) * | Date: 2015-05-29 19:28 | |
Here I added on to the Dialects and Formatting Parameters paragraph, explaining that the defaults listed are for the excel dialect and that all the attributes need to be specified if the user wants to create custom dialects through subclassing. I will also include the html file this produces for those who do not want to look at the .rst file. I can also go in and change the defaults of the Dialect class for the parameters that expect Boolean values if desired, but I would open a separate issue for it. Let me know if there are any errors or desired changes in the documentation change. |
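For readers who do want to subclass csv.Dialect directly, a custom dialect with every attribute spelled out might look like the following Python 3 sketch. The class name `PipeDialect`, the registry name "pipes" and the sample data are hypothetical and are not taken from the patch.

```python
import csv
import io


class PipeDialect(csv.Dialect):
    """Hypothetical pipe-delimited dialect with every attribute set explicitly."""

    delimiter = "|"
    quotechar = '"'
    escapechar = None
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


# Register under a name so it can be selected with dialect="pipes",
# then remove it again so the global registry is left unchanged.
csv.register_dialect("pipes", PipeDialect)
try:
    rows = list(csv.reader(io.StringIO('a|"b ""c"""\n'), dialect="pipes"))
finally:
    csv.unregister_dialect("pipes")

print(rows)
```

Setting doublequote and skipinitialspace explicitly avoids relying on how None is interpreted.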
|||
| msg244407 | Author: Mik (MiK) | Date: 2015-05-29 20:03 | |
I think it's clearer that way. Thank you. |
|||
| msg389327 | Author: Irit Katriel (iritkatriel) * (Python committer) | Date: 2021-03-22 16:00 | |
Brandon's patch has not been applied; it needs to be converted into a git PR. |
|||
| msg393254 | Author: So Ukiyama (uniocto) * | Date: 2021-05-08 12:32 | |
I created a PR that applies Brandon Milam's patch. If I have offended anyone by doing so, I hope you will forgive me, and I will take it down. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:16 | admin | set | github: 68335 |
| 2021-05-08 12:32:59 | uniocto | set | messages: + msg393254 |
| 2021-05-08 12:23:54 | uniocto | set | keywords: + patch nosy: + uniocto pull_requests: + pull_request24641 stage: needs patch -> patch review |
| 2021-03-22 16:00:51 | iritkatriel | set | versions: + Python 3.10, - Python 3.4, Python 3.5 nosy: + iritkatriel messages: + msg389327 keywords: + easy, - patch |
| 2016-04-27 02:36:58 | berker.peksag | set | nosy: + berker.peksag |
| 2015-05-29 20:03:06 | MiK | set | messages: + msg244407 |
| 2015-05-29 19:29:23 | jbmilam | set | files: + csv.html |
| 2015-05-29 19:28:39 | jbmilam | set | files: + csv_dialect_doc_clarify.patch keywords: + patch messages: + msg244400 |
| 2015-05-28 21:24:51 | MiK | set | messages: + msg244347 |
| 2015-05-28 20:29:03 | jbmilam | set | nosy: + jbmilam messages: + msg244344 |
| 2015-05-17 14:50:32 | skip.montanaro | set | messages: + msg243402 |
| 2015-05-17 13:18:29 | r.david.murray | set | status: closed -> open assignee: docs@python stage: needs patch title: doublequote are not well recognized with Dialect class -> Dialect class defaults are not documented. nosy: + r.david.murray, docs@python versions: + Python 3.4, Python 3.5, - Python 2.7 messages: + msg243399 components: + Documentation, - Library (Lib) resolution: not a bug -> |
| 2015-05-17 13:06:13 | MiK | set | messages: + msg243397 |
| 2015-05-17 12:54:17 | skip.montanaro | set | status: open -> closed files: + quotebug.py resolution: not a bug messages: + msg243396 |
| 2015-05-09 16:29:21 | MiK | set | files: + sans_headers.csv messages: + msg242818 |
| 2015-05-09 13:19:10 | skip.montanaro | set | messages: + msg242811 |
| 2015-05-09 12:00:09 | skip.montanaro | set | nosy: + skip.montanaro messages: + msg242807 |
| 2015-05-09 01:05:19 | MiK | create | |