[Python-checkins] python/nondist/peps pep-0305.txt,NONE,1.1 pep-0000.txt,1.224,1.225

28 Jan 2003 20:20:22 -0800

Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1:/tmp/cvs-serv25170
Modified Files:
	pep-0000.txt 
Added Files:
	pep-0305.txt 
Log Message:
Added PEP 305, CSV file API
--- NEW FILE: pep-0305.txt ---
PEP: 305
Title: CSV file API
Version: $Revision: 1.1 $
Last-Modified: $Date: 2003/01/29 04:20:19 $
Author: Skip Montanaro <skip@pobox.com>,
 Kevin Altis <altis@semi-retired.com>,
 Cliff Wells <LogiplexSoftware@earthlink.net>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 26-Jan-2003
Post-History: 
Abstract
========
The Comma Separated Values (CSV) file format is the most common import
and export format for spreadsheets and databases. Although many CSV
files are simple to parse, the format is not formally defined by a
stable specification and is subtle enough that parsing lines of a CSV
file with something like ``line.split(",")`` is bound to fail. This
PEP defines an API for reading and writing CSV files which should make
it possible for programmers to select a CSV module which meets their
requirements.
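As a small illustration of why naive splitting breaks, consider a field
that contains a quoted comma. The sketch below uses the ``csv`` module
found in current Python standard libraries, which postdates this draft;
the sample line is made up for illustration.

```python
import csv
import io

# A made-up CSV line whose second field contains a comma inside quotes.
line = 'Smith,"Boston, MA",42'

# Naive splitting breaks the quoted field into two pieces.
naive = line.split(",")

# A CSV-aware reader keeps the quoted field intact and strips the quotes.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The naive split yields four fragments, two of which still carry stray
quote characters, while the CSV-aware reader yields the three intended
fields.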
Existing Modules
================
Three widely available modules enable programmers to read and write
CSV files:
- Object Craft's CSV module [1]_
- Cliff Wells's Python-DSV module [2]_
- Laurence Tratt's ASV module [3]_
Each has a different API, making it somewhat difficult for programmers
to switch between them. More of a problem may be that they interpret
some of the CSV corner cases differently, so even after surmounting
the differences in the module APIs, the programmer must still deal
with semantic differences between the packages.
Rationale
=========
By defining common APIs for reading and writing CSV files, we make it
easier for programmers to choose an appropriate module to suit their
needs, and make it easier to switch between modules if their needs
change. This PEP also forms a set of requirements for creation of a
module which will hopefully be incorporated into the Python
distribution.
Module Interface
================
The module supports two basic APIs, one for reading and one for
writing. The reading interface is::
    reader(fileobj [, dialect='excel2000']
                   [, quotechar='"']
                   [, delimiter=',']
                   [, skipinitialspace=False])
A reader object is an iterable which takes a file-like object opened
for reading as the sole required parameter. It also accepts four
optional parameters (discussed below). Readers are typically used as
follows::
    csvreader = csv.reader(file("some.csv"))
    for row in csvreader:
        process(row)
The writing interface is similar::
    writer(fileobj [, dialect='excel2000']
                   [, quotechar='"']
                   [, delimiter=',']
                   [, skipinitialspace=False])
A writer object is a wrapper around a file-like object opened for
writing. It accepts the same four optional parameters as the reader
constructor. Writers are typically used as follows::
    csvwriter = csv.writer(file("some.csv", "w"))
    for row in someiterable:
        csvwriter.write(row)
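For comparison, a round trip with the ``csv`` module as it exists in
today's standard library might look like the sketch below. It differs
from this draft: the writer method there is ``writerow`` rather than
``write``. The in-memory buffer and the sample rows are assumptions made
to keep the example self-contained.

```python
import csv
import io

# Made-up sample data; the second row contains both quotes and a comma.
rows = [["name", "comment"], ["Ann", 'said "hi", then left']]

# Write the rows to an in-memory buffer standing in for a file object.
# newline="" keeps the writer's line terminators untranslated.
buffer = io.StringIO(newline="")
csvwriter = csv.writer(buffer)
for row in rows:
    csvwriter.writerow(row)

# Rewind and read the rows back; the reader is iterated row by row.
buffer.seek(0)
result = [row for row in csv.reader(buffer)]
```

Reading back what was written returns the original rows, including the
field with embedded quotes and a comma.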
Optional Parameters
-------------------
Both the reader and writer constructors take four optional keyword
parameters:
- dialect is an easy way of specifying a complete set of format
 constraints for a reader or writer. Most people will know what
 application generated a CSV file or what application will process
 the CSV file they are generating, but not the precise settings
 necessary. The only dialect defined initially is "excel2000". The
 dialect parameter is interpreted in a case-insensitive manner.
- quotechar specifies a one-character string to use as the quoting
 character. It defaults to '"'.
- delimiter specifies a one-character string to use as the field
 separator. It defaults to ','.
- skipinitialspace specifies how to interpret whitespace which
 immediately follows a delimiter. It defaults to False, which means
 that whitespace immediately following a delimiter is part of the
 following field.
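The effect of skipinitialspace can be seen in a short sketch. It relies
on the ``csv`` module in current Python standard libraries, which
accepts a parameter of the same name; the input line is invented for
the example.

```python
import csv
import io

# A made-up line with a space after each delimiter.
line = 'a, b, c'

# Default: the space after each delimiter is kept as part of the field.
kept = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True drops whitespace immediately after a delimiter.
skipped = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(kept)
print(skipped)
```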
When processing a dialect setting and one or more of the other
optional parameters, the dialect parameter is processed first, then
the others are processed. This makes it easy to choose a dialect,
then override one or more of the settings. For example, if a CSV file
was generated by Excel 2000 using single quotes as the quote
character and TAB as the delimiter, you could create a reader like::
    csvreader = csv.reader(file("some.csv"), dialect="excel2000",
                           quotechar="'", delimiter='\t')
Other details of how Excel generates CSV files would be handled
automatically.
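One way the "dialect first, then overrides" order could be implemented
is sketched below. The ``DIALECTS`` table and the ``resolve_settings``
helper are hypothetical names, not part of the proposed API, and the
settings listed for "excel2000" simply mirror the defaults given above.

```python
# Hypothetical dialect table; values mirror the defaults in this section.
DIALECTS = {
    "excel2000": {
        "quotechar": '"',
        "delimiter": ",",
        "skipinitialspace": False,
    },
}


def resolve_settings(dialect="excel2000", **overrides):
    """Start from the dialect's settings, then apply explicit overrides."""
    # Dialect names are matched case-insensitively.
    settings = dict(DIALECTS[dialect.lower()])
    unknown = set(overrides) - set(settings)
    if unknown:
        raise TypeError("unknown parameters: %s" % ", ".join(sorted(unknown)))
    # Explicit keyword arguments win over the dialect's values.
    settings.update(overrides)
    return settings


settings = resolve_settings("Excel2000", quotechar="'", delimiter="\t")
```

Copying the dialect's dictionary before updating it keeps the shared
table unchanged between calls.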
Testing
=======
TBD.
Issues
======
- Should a parameter control how consecutive delimiters are
 interpreted? Our thought is "no". Consecutive delimiters should
 always denote an empty field.
- What about Unicode? Is it sufficient to pass a file object gotten
 from codecs.open()? For example::
    csvreader = csv.reader(codecs.open("some.csv", "r", "cp1252"))
    csvwriter = csv.writer(codecs.open("some.csv", "w", "utf-8"))
 In the first example, text would be assumed to be encoded as cp1252.
 Should the system be aggressive in converting to Unicode or should
 Unicode strings only be returned if necessary?
 In the second example, the file will take care of automatically
 encoding Unicode strings as utf-8 before writing to disk.
- What about alternate escape conventions? When Excel exports a file,
 it appears only the field delimiter needs to be escaped. It
 accomplishes this by quoting the entire field, then doubling any
 quote characters which appear in the field. It also quotes a field
 if the first character is a quote character. It would seem we need
 to support two modes: escape-by-quoting and escape-by-prefix. In
 addition, for the second mode, we'd have to specify the escape
 character (presumably defaulting to a backslash character).
- Should there be a "fully quoted" mode for writing? What about
 "fully quoted except for numeric values"?
- What about end-of-line? If I generate a CSV file on a Unix system,
 will Excel properly recognize the LF-only line terminators?
- What about conversion to other file formats? Is the list-of-lists
 output from the csvreader sufficient to feed into other writers?
- What about an option to generate list-of-dict output from the reader
 and accept list-of-dicts by the writer? This makes manipulating
 individual rows easier since each one is independent, but you lose
 field order when writing and have to tell the writer object the
 order the fields should appear in the file.
- Are quote character and delimiters limited to single characters? I
 had a client not that long ago who wrote their own flat file format
 with a delimiter of ":::".
- How should rows of different lengths be handled? The options seem
 to be:
 * raise an exception when a row is encountered whose length differs
 from the previous row
 * silently return short rows
 * allow the caller to specify the desired row length and what to do
 when rows of a different length are encountered: ignore, truncate,
 pad, raise exception, etc.
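The third option for rows of different lengths, a caller-specified
length plus a policy, could look roughly like the sketch below. The
``normalize_row`` function, its parameters, and the policy names are
all hypothetical; nothing here is part of the proposed API.

```python
def normalize_row(row, length, policy="pad", fill=""):
    """Apply a row-length policy; all names here are hypothetical."""
    if len(row) == length:
        return list(row)
    if policy == "raise":
        raise ValueError("expected %d fields, got %d" % (length, len(row)))
    if policy == "ignore":
        # Signal to the caller that the row should be skipped.
        return None
    if policy == "truncate":
        # Shorten long rows; short rows pass through unchanged.
        return list(row[:length])
    if policy == "pad":
        # Extend short rows; long rows pass through unchanged.
        return list(row) + [fill] * (length - len(row))
    raise ValueError("unknown policy: %r" % (policy,))


padded = normalize_row(["a", "b"], 3)
truncated = normalize_row(["a", "b", "c", "d"], 3, policy="truncate")
```

A caller wanting both behaviours could apply "truncate" and then "pad"
in sequence.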
References
==========
.. [1] csv module, Object Craft
 (http://www.object-craft.com.au/projects/csv) 
.. [2] Python-DSV module, Wells
 (http://sourceforge.net/projects/python-dsv/) 
.. [3] ASV module, Tratt
 (http://tratt.net/laurie/python/asv/)
There are many references to other CSV-related projects on the Web. A
few are included here.
Copyright
=========
This document has been placed in the public domain.

..
 Local Variables:
 mode: indented-text
 indent-tabs-mode: nil
 sentence-end-double-space: t
 fill-column: 70
 End:
Index: pep-0000.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0000.txt,v
retrieving revision 1.224
retrieving revision 1.225
diff -C2 -d -r1.224 -r1.225
*** pep-0000.txt	23 Jan 2003 17:25:59 -0000	1.224
--- pep-0000.txt	29 Jan 2003 04:20:19 -0000	1.225
***************
*** 106,110 ****
 S 302 New Import Hooks JvR
 S 303 Extend divmod() for Multiple Divisors Bellman
! S 304 Controlling generation of bytecode files Montanaro

 Finished PEPs (done, implemented in CVS)
--- 106,111 ----
 S 302 New Import Hooks JvR
 S 303 Extend divmod() for Multiple Divisors Bellman
! S 304 Controlling Generation of Bytecode Files Montanaro
! I 305 CSV File API Montanaro, Altis, Wells

 Finished PEPs (done, implemented in CVS)
***************
*** 300,304 ****
 S 302 New Import Hooks JvR
 S 303 Extend divmod() for Multiple Divisors Bellman
! S 304 Controlling generation of bytecode files Montanaro
 SR 666 Reject Foolish Indentation Creighton

--- 301,306 ----
 S 302 New Import Hooks JvR
 S 303 Extend divmod() for Multiple Divisors Bellman
! S 304 Controlling Generation of Bytecode Files Montanaro
! I 305 CSV File API Montanaro, Altis, Wells
 SR 666 Reject Foolish Indentation Creighton

***************
*** 321,324 ****
--- 323,327 ----
 Ahlstrom, James C. jim@interet.com
 Althoff, Jim james_althoff@i2.com
+ Altis, Kevin altis@semi-retired.com
 Ascher, David davida@activestate.com
 Barrett, Paul barrett@stsci.edu
***************
*** 373,376 ****
--- 376,380 ----
 Tirosh, Oren oren at hishome.net
 Warsaw, Barry barry@zope.com
+ Wells, Cliff LogiplexSoftware@earthlink.net
 Wilson, Greg gvwilson@ddj.com
 Wouters, Thomas thomas@xs4all.net