Programmer's Python Data - Text Files & CSV

Written by Mike James

Tuesday, 10 June 2025

CSV Dialects

To determine the format that is used for a CSV file you have to select or configure a dialect. The csv.list_dialects() function lets you discover which predefined dialects are registered. Currently there are three standard classes:

csv.excel standard Excel format CSV, short name 'excel'
csv.excel_tab Excel format but using TAB separators, short name 'excel-tab'
csv.unix_dialect usual format generated by UNIX systems using '\n' as line terminator and quoting all fields, short name 'unix'
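
As a quick check, the registered names can be listed and each one looked up with csv.get_dialect(). This is only a minimal sketch: the exact list you see depends on the Python version and on any dialects your own program has registered.

```python
import csv

# Names of every dialect currently registered with the csv module
names = csv.list_dialects()
print(names)

# Look up each dialect by its short name and show how it separates
# fields and terminates lines
for name in names:
    d = csv.get_dialect(name)
    print(name, repr(d.delimiter), repr(d.lineterminator))
```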

You can create your own dialect by defining a custom class that inherits from csv.Dialect.

The dialect class has the following attributes:

delimiter single character used to separate fields.
doublequote if True, a quote character inside a quoted field is written as two quote characters, otherwise it is prefixed by the specified escape character.
escapechar single character used to prefix special characters when included in data.
lineterminator string used by the writer to terminate lines. The reader ignores this setting and always recognizes either '\r' or '\n' as the end of a line.
quotechar single character used to quote fields.
skipinitialspace whitespace following the delimiter is ignored if True, the default is False.
quoting determines which fields are quoted:
QUOTE_ALL all fields
QUOTE_MINIMAL only fields containing a special character, this is the default
QUOTE_NONNUMERIC all non-numeric fields; when reading, unquoted fields are converted to floats
QUOTE_NONE no fields
strict raises a csv.Error exception on bad CSV input if True, the default is False.
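
To see what these attributes do in practice, the following sketch writes the same row using each quoting mode and then tries skipinitialspace and strict on the reader. The helper name render and the sample data are assumptions made purely for illustration, and the format parameters are passed as keyword arguments rather than via a dialect class.

```python
import csv
import io


def render(row, **fmt):
    """Write one row to a string using the given format parameters."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()


row = ["mike", 42, "a,b"]  # the last field contains the delimiter

all_quoted = render(row, quoting=csv.QUOTE_ALL)
minimal = render(row, quoting=csv.QUOTE_MINIMAL)
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
# QUOTE_NONE needs an escapechar to cope with the embedded comma
unquoted = render(row, quoting=csv.QUOTE_NONE, escapechar="\\")

for text in (all_quoted, minimal, nonnumeric, unquoted):
    print(text, end="")

# skipinitialspace drops the spaces that follow each delimiter
plain = next(csv.reader(["a, b, c"]))
trimmed = next(csv.reader(["a, b, c"], skipinitialspace=True))
print(plain, trimmed)

# strict=True turns malformed input into a csv.Error
try:
    next(csv.reader(['a,"b"c'], strict=True))
    strict_failed = False
except csv.Error:
    strict_failed = True
print(strict_failed)
```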

For example, to create and read a CSV file that quotes only non-numeric data you would define a new dialect subclass:

class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'

Once you have this defined you can use it as a dialect class in the reader and writer objects:

import csv
import dataclasses
import pathlib

@dataclasses.dataclass
class person:
    name: str = ""
    id: int = 0
    score: float = 0.0

    def values(self):
        return [self.name, str(self.id), str(self.score)]

class MyCSV(csv.Dialect):
    # Quote every non-numeric field and leave numbers unquoted
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'

me = person("mike", 42, 3.145)
path = pathlib.Path("myTextFile.csv")

# newline="" lets the csv module control the line endings itself
with path.open(mode="wt", newline="") as f:
    peopleWriter = csv.writer(f, dialect=MyCSV)
    for i in range(43):
        peopleWriter.writerow([me.name, i, me.score])

with path.open(mode="rt", newline="") as f:
    peopleReader = csv.reader(f, dialect=MyCSV)
    for row in peopleReader:
        print(row)

If you run this you will see that a side effect of selecting csv.QUOTE_NONNUMERIC is that the reader converts every unquoted field to a float. So the last record printed is:

['mike', 42.0, 3.145]

Notice that the integer 42 has been converted to a float.
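
If you need the integer back you have to convert it yourself. The sketch below reads one line from an in-memory string instead of a file, which is an assumption made to keep it self-contained, and shows that a quoted number is left as a string while an unquoted one becomes a float.

```python
import csv
import io

# One line in the form the example's writer produces
text = '"mike",42,3.145\r\n'

reader = csv.reader(io.StringIO(text, newline=""),
                    quoting=csv.QUOTE_NONNUMERIC)
row = next(reader)
print(row)

# Restore the integer by converting it explicitly
name, id_value, score = row
restored = [name, int(id_value), score]
print(restored)

# A quoted number stays a string, an unquoted one becomes a float
mixed = next(csv.reader(['"42",42'], quoting=csv.QUOTE_NONNUMERIC))
print(mixed)
```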

You can also register a dialect under a name, using either a subclass of Dialect or a set of parameters. For example, after:

csv.register_dialect("myDialect",MyCSV)

you can use the dialect via the short name "myDialect":

peopleReader=csv.reader(f,dialect="myDialect")
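
Put together, registering the class and then using it by name might look like the following sketch. It writes to and reads from an in-memory io.StringIO rather than a file, which is an assumption made only to keep the example self-contained.

```python
import csv
import io


class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'


# Make the dialect available under a short name
csv.register_dialect("myDialect", MyCSV)
registered = "myDialect" in csv.list_dialects()

# Write one row using the registered name
buf = io.StringIO(newline="")
csv.writer(buf, dialect="myDialect").writerow(["mike", 42, 3.145])
written = buf.getvalue()

# Read it back using the same name
source = io.StringIO(written, newline="")
row = next(csv.reader(source, dialect="myDialect"))
print(registered, repr(written), row)
```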

Alternatively you can register the dialect directly without having to create a subclass. For example:

csv.register_dialect("myDialect",quoting=csv.QUOTE_NONNUMERIC)

defines the quoting attribute and accepts the defaults for the others.
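
You can confirm which values were filled in by retrieving the registered dialect with csv.get_dialect() and inspecting its attributes. This is a small sketch of that check:

```python
import csv

csv.register_dialect("myDialect", quoting=csv.QUOTE_NONNUMERIC)

d = csv.get_dialect("myDialect")

# The one attribute that was specified explicitly
print(d.quoting == csv.QUOTE_NONNUMERIC)

# The attributes that were left to their defaults
print(repr(d.delimiter), repr(d.quotechar), repr(d.lineterminator))
print(d.doublequote, d.skipinitialspace, d.strict)
```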

CSV is a very common format for encoding data, but there are more modern alternatives that are better if you are free to choose the format.

In chapter but not in this extract

JSON
Multiple Records
JSON and Dataclasses
XML
Python XML
ElementTree
More XML
Pickle

Advanced Pickling

Summary

Text files are simply binary files where the conversion to a string with suitable decoding is automatic.
As well as reading a fixed number of characters, you can also use the readline method to read in a single line of text.
The print function can be used with files and has the advantage of performing the conversion to a string automatically.
To make text files able to be read in and decoded you need to use a standard format like CSV, JSON or XML.
CSV, Comma Separated Values, is simple but it has a number of disadvantages in that converting from a string to the appropriate data type isn’t generally automatic and there are different dialects of CSV.
JSON is a good match to Python’s objects. It is easy to use and is cross-platform.
XML is more complicated and probably not a good choice if you can avoid it, but it is a widespread standard and very suitable for representing complex data.
XML is not well supported if you are looking for standard processing options such as DOM or SAX. The ElementTree module, however, provides good Python-oriented processing of XML.
Pickle is Python’s own object serialization format. It uses a binary format, but it makes it very easy to save and load almost any Python object. Pickle is a good choice if the data is being produced and consumed by Python programs.
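
As a reminder of the two text file points in the summary, here is a minimal sketch that uses print to write to a file and readline to read it back. The temporary folder and the file name demo.txt are assumptions chosen so that the example cleans up after itself.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = pathlib.Path(folder) / "demo.txt"  # hypothetical file name

    # print converts each value to a string before writing it
    with path.open(mode="wt") as f:
        print("mike", 42, 3.145, file=f)
        print("second line", file=f)

    # readline returns one line at a time, including its newline
    with path.open(mode="rt") as f:
        first = f.readline()
        second = f.readline()
        at_end = f.readline()  # an empty string signals end of file

print(repr(first), repr(second), repr(at_end))
```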

Programmer's Python
Everything is Data

Is now available as a print book: Amazon



Last Updated ( Tuesday, 10 June 2025 )