Issue 24147: Dialect class defaults are not documented.

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68335

classification

Title:	Dialect class defaults are not documented.
Type:	behavior	Stage:	patch review
Components:	Documentation	Versions:	Python 3.10

process

Dependencies:	Superseder:
Status:	open	Resolution:
Assigned To:	docs@python	Nosy List:	MiK, berker.peksag, docs@python, iritkatriel, jbmilam, r.david.murray, skip.montanaro, uniocto
Priority:	normal	Keywords:	easy, patch

Created on 2015-05-09 01:05 by MiK, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description
sans_headers.csv	MiK, 2015-05-09 16:29
quotebug.py	skip.montanaro, 2015-05-17 12:54
csv_dialect_doc_clarify.patch	jbmilam, 2015-05-29 19:28	Document clarification
csv.html	jbmilam, 2015-05-29 19:29	html file holding the changes

Pull Requests
URL	Status	Linked
PR 25989	open	uniocto, 2021-05-08 12:23

Messages (14)
msg242787	Author: Mik (MiK)	Date: 2015-05-09 01:05
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> class Mon(csv.Dialect):
...     delimiter = ','
...     quotechar = '"'
...     quoting = 0
...     lineterminator = '\n'
...
>>> f = open('sans_headers.csv','r')
>>> reader = csv.DictReader(f, fieldnames=('code', 'nom', 'texte'), dialect=Mon)
>>> for l in reader:
...     print l
...
{'nom': 'line_1', 'code': '3', 'texte': 'one line\ntwo lines'}
{'nom': 'line_2', 'code': '5', 'texte': 'one line\nand a quote "iop"";newline'}
{'nom': None, 'code': 'I\'m not a cat"', 'texte': None}
>>> f.seek(0)
>>> reader = csv.DictReader(f, fieldnames=('code', 'nom', 'texte'), delimiter=',', quotechar='"', quoting=0, lineterminator='\n')
>>> for l in reader:
...     print l
...
{'nom': 'line_1', 'code': '3', 'texte': 'one line\ntwo lines'}
{'nom': 'line_2', 'code': '5', 'texte': 'one line\nand a quote "iop";newline\nI\'m not a cat'}
>>>

If I use a subclass of csv.Dialect with the same attributes that I would otherwise pass as keywords when calling csv.DictReader, I don't get the same behaviour.
msg242807	Author: Skip Montanaro (skip.montanaro) * (Python triager)	Date: 2015-05-09 12:00
Can you attach your cab file so we don't need to reconstruct it (and possibly make a mistake) by reading your program's output?
msg242811	Author: Skip Montanaro (skip.montanaro) * (Python triager)	Date: 2015-05-09 13:19
Sorry, failed to override my phone's spell correction. "cab" should be "csv".
msg242818	Author: Mik (MiK)	Date: 2015-05-09 16:29
Hi, this is the file used for my test. Thank you, regards, Mik
msg243396	Author: Skip Montanaro (skip.montanaro) * (Python triager)	Date: 2015-05-17 12:54
In your Mon class, you've left the doublequote parameter unset (it defaults to None). Your class completely overrides the default dialect. When no Dialect class is given, the default is csv.excel. Note that doublequote is set to True in csv.excel. In your second example, the reader starts with csv.excel, then selectively overrides the named attributes.
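The difference described above can be reproduced with a small sketch. It assumes Python 3 and an in-memory sample row rather than the attached sans_headers.csv; the name SAMPLE and the helper make_reader are illustrative only.

```python
import csv
import io

# One row whose last field contains doubled quotes ("" stands for one ").
SAMPLE = '5,line_2,"a quote ""iop"" here"\r\n'


class Mon(csv.Dialect):
    # Same attributes as in the report. doublequote and skipinitialspace
    # are NOT set, so they keep the csv.Dialect placeholder value None.
    delimiter = ','
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL  # numeric value 0
    lineterminator = '\n'


def make_reader(**kwargs):
    """Illustrative helper: build a reader over a fresh copy of SAMPLE."""
    return csv.reader(io.StringIO(SAMPLE, newline=''), **kwargs)


# Case 1: the subclass replaces the default dialect entirely.
subclass_reader = make_reader(dialect=Mon)
# Case 2: keywords selectively override the default (excel) settings.
keyword_reader = make_reader(delimiter=',', quotechar='"',
                             quoting=csv.QUOTE_MINIMAL, lineterminator='\n')

print(bool(subclass_reader.dialect.doublequote))  # False
print(bool(keyword_reader.dialect.doublequote))   # True

rows_subclass = list(subclass_reader)
rows_keywords = list(keyword_reader)
print(rows_subclass)
print(rows_keywords)
```

With the keyword form the doubled quotes collapse to single quotes; with the subclass they do not, which matches the stray quote characters in the first output of the report.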
msg243397	Author: Mik (MiK)	Date: 2015-05-17 13:06
OK, thanks. But perhaps the documentation of csv.Dialect could be updated with the default parameters. If all attributes need to be specified, this should be indicated in the docs.
msg243399	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2015-05-17 13:18
Yes, I think the documentation should be improved.
msg243402	Author: Skip Montanaro (skip.montanaro) * (Python triager)	Date: 2015-05-17 14:50
The defaults for the Dialect class are documented:

https://docs.python.org/2/library/csv.html#dialects-and-formatting-parameters

I think the problem is mostly that csv.Dialect must be subclassed. You can't use it directly, and if you subclass it as MiK did, you have to supply all the missing parameters. The default dialect is actually csv.excel, which does provide a suitable set of values for all attributes.

There actually might be a bug lurking in the code as well. The value of csv.Dialect.doublequote is None, which will evaluate to False in a boolean context. The module docstring has this to say about that attribute:

    * doublequote - controls the handling of quotes inside fields. When True, two consecutive quotes are interpreted as one during read, and when writing, each quote character embedded in the data is written as two quotes

Since the valid values of that attribute are actually only True and False, using None as a default value is an invitation to problems. It appears in this case that's what happened.

csv.Dialect.__init__ doesn't seem to check that the overriding class properly sets all the required parameters. It checks to see if the class is Dialect. If not, and if the validate() call passes, all is assumed to be well. But digging a bit under the surface, it appears the validate step drops into C, where the doublequote attribute of Dialect_Type is 0.

I'm not sure the bug should be fixed in 2.7, but it's worth taking a look at the 3.5 code to see if that validation step can be improved.
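The placeholder values mentioned above can be inspected directly. This is a minimal sketch assuming Python 3; FIELDS is just the list of class attributes that Lib/csv.py defines on csv.Dialect, and the variable names are illustrative.

```python
import csv

# Class-level formatting attributes defined on csv.Dialect.
FIELDS = ("delimiter", "quotechar", "escapechar", "doublequote",
          "skipinitialspace", "lineterminator", "quoting")

# The base class only holds None placeholders...
base_values = {name: getattr(csv.Dialect, name) for name in FIELDS}
# ...while csv.excel supplies concrete values (escapechar stays None).
excel_values = {name: getattr(csv.excel, name) for name in FIELDS}

for name in FIELDS:
    print(f"{name:17} Dialect={base_values[name]!r:6} excel={excel_values[name]!r}")

# Instantiating the base class directly fails validation, because the
# placeholder delimiter is not a usable value.
try:
    csv.Dialect()
except csv.Error as exc:
    base_error = exc
else:
    base_error = None
print("csv.Dialect() raised:", base_error)
```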
msg244344	Author: Brandon Milam (jbmilam) *	Date: 2015-05-28 20:29
Hi all, I've been looking at this bug and am ready to start putting in some work on it, but I have some questions about what needs to be done. From what I can tell, these are the possible tasks for this issue:

- Add to the docs, under the dialect section, the excel attributes vs. the Dialect class attributes, and explain that the excel dialect is the default and that this is the functionality you'd be changing by creating a new dialect.
- Add code to make sure that a certain number of attributes are set before the dialect can be accessed. (Though this might be C code, and I'm not really a C programmer, nor do I know where _Dialect is in the repository.)
- Change the defaults in the Dialect class, because currently the documentation for "doublequote" and "skipinitialspace" says that the default is False when in the code it is None. Also, I did not find the "strict" dialect attribute in the module at all (maybe it's part of that C code that I don't know how to find).
- Add an example to the documentation on subclassing Dialect, under the example on registering a new dialect.

If someone could clarify which of these is the desired direction for this issue, it would be much appreciated.
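The second task in the list above (checking that attributes are set) could be prototyped in pure Python before touching any C code. The sketch below is only an illustration: unset_dialect_attributes and REQUIRED are hypothetical names, not part of the csv module, and treating None as "unset" is a simplifying assumption (escapechar is left out because None is a legitimate value for it).

```python
import csv

# Attributes for which a None placeholder is assumed to mean "not set".
REQUIRED = ("delimiter", "quotechar", "doublequote",
            "skipinitialspace", "lineterminator", "quoting")


def unset_dialect_attributes(dialect_cls):
    """Hypothetical helper: names of required attributes still set to None."""
    return [name for name in REQUIRED
            if getattr(dialect_cls, name, None) is None]


class Mon(csv.Dialect):
    # The dialect from the original report.
    delimiter = ','
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = '\n'


missing = unset_dialect_attributes(Mon)
print(missing)                                # ['doublequote', 'skipinitialspace']
print(unset_dialect_attributes(csv.excel))    # []
```

A real check would also have to allow quotechar to be None when quoting is csv.QUOTE_NONE, so this is a starting point rather than a drop-in validation step.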
msg244347	Author: Mik (MiK)	Date: 2015-05-28 21:24
Hi, I have just read the documentation once again. The problem is that it specifies that the default value for Dialect.doublequote is True:

<quote>Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.</quote>

So it is easy to assume that the class csv.Dialect implements this default value. However, the default dialect when calling csv.reader is "excel", so implicitly it is csv.excel, not csv.Dialect, whose attributes are described in the paragraph above. It would be great in this case to describe the attributes of the base class Dialect, or to specify that all attributes must be set when subclassing it. Optionally, it would be good for the code of csv.Dialect to be changed to use real Boolean values. But the clarification of the documentation is more important, I think.
msg244400	Author: Brandon Milam (jbmilam) *	Date: 2015-05-29 19:28
Here I added on to the Dialects and Formatting Parameters paragraph, explaining that the defaults listed are for the excel dialect and that all the attributes need to be specified if the user wants to create custom dialects through subclassing. I will also include the HTML file this produces for those who do not want to look at the .rst file. Also, I can go in and change the defaults of the Dialect class for the parameters that expect Boolean values if desired, but I would open a separate issue for it. Let me know if there are any errors or desired changes in the documentation change.
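As an illustration of the clarified guidance, a custom dialect can either spell out every attribute or inherit from csv.excel and override only what differs. This is a sketch assuming Python 3; the class names FullMon and ExcelLF and the helper round_trip are made up for this example and do not come from the patch.

```python
import csv
import io


class FullMon(csv.Dialect):
    """Subclass of csv.Dialect with every formatting attribute set."""
    delimiter = ','
    quotechar = '"'
    escapechar = None
    doublequote = True
    skipinitialspace = False
    lineterminator = '\n'
    quoting = csv.QUOTE_MINIMAL


class ExcelLF(csv.excel):
    """Inherit csv.excel's values and override only the line terminator."""
    lineterminator = '\n'


def round_trip(row, dialect):
    """Write one row with the dialect, then read it back with the same dialect."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer, dialect=dialect).writerow(row)
    buffer.seek(0)
    return next(csv.reader(buffer, dialect=dialect))


# A row with an embedded newline and embedded quote characters.
ROW = ['5', 'line_2', 'one line\nand a quote "iop";newline']

print(round_trip(ROW, FullMon) == ROW)
print(round_trip(ROW, ExcelLF) == ROW)
```

Inheriting from csv.excel is the shorter option when only one or two settings differ from the default dialect.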
msg244407	Author: Mik (MiK)	Date: 2015-05-29 20:03
I think it's clearer that way. Thank you.
msg389327	Author: Irit Katriel (iritkatriel) * (Python committer)	Date: 2021-03-22 16:00
Brandon's patch has not been applied; it needs to be converted into a git PR.
msg393254	Author: So Ukiyama (uniocto) *	Date: 2021-05-08 12:32
I created a PR which applies Brandon Milam's patch. If I have offended you with my rudeness, I hope you will forgive me for taking this down.

History
Date	User	Action	Args
2022-04-11 14:58:16	admin	set	github: 68335
2021-05-08 12:32:59	uniocto	set	messages: + msg393254
2021-05-08 12:23:54	uniocto	set	keywords: + patch nosy: + uniocto pull_requests: + pull_request24641 stage: needs patch -> patch review
2021-03-22 16:00:51	iritkatriel	set	versions: + Python 3.10, - Python 3.4, Python 3.5 nosy: + iritkatriel messages: + msg389327 keywords: + easy, - patch
2016-04-27 02:36:58	berker.peksag	set	nosy: + berker.peksag
2015-05-29 20:03:06	MiK	set	messages: + msg244407
2015-05-29 19:29:23	jbmilam	set	files: + csv.html
2015-05-29 19:28:39	jbmilam	set	files: + csv_dialect_doc_clarify.patch keywords: + patch messages: + msg244400
2015-05-28 21:24:51	MiK	set	messages: + msg244347
2015-05-28 20:29:03	jbmilam	set	nosy: + jbmilam messages: + msg244344
2015-05-17 14:50:32	skip.montanaro	set	messages: + msg243402
2015-05-17 13:18:29	r.david.murray	set	status: closed -> open assignee: docs@python stage: needs patch title: doublequote are not well recognized with Dialect class -> Dialect class defaults are not documented. nosy: + r.david.murray, docs@python versions: + Python 3.4, Python 3.5, - Python 2.7 messages: + msg243399 components: + Documentation, - Library (Lib) resolution: not a bug ->
2015-05-17 13:06:13	MiK	set	messages: + msg243397
2015-05-17 12:54:17	skip.montanaro	set	status: open -> closed files: + quotebug.py resolution: not a bug messages: + msg243396
2015-05-09 16:29:21	MiK	set	files: + sans_headers.csv messages: + msg242818
2015-05-09 13:19:10	skip.montanaro	set	messages: + msg242811
2015-05-09 12:00:09	skip.montanaro	set	nosy: + skip.montanaro messages: + msg242807
2015-05-09 01:05:19	MiK	create
