Programmer's Python Data - Text Files & CSV
Written by Mike James
Tuesday, 10 June 2025

CSV Dialects

To determine the format used for a CSV file you have to select or configure a dialect. The csv.list_dialects() function returns the names of the dialects that are currently registered. Currently there are three standard dialect classes:

  • csv.excel standard Excel format CSV, short name 'excel'

  • csv.excel_tab Excel format but using TAB separators, short name 'excel-tab'

  • csv.unix_dialect usual format generated by UNIX systems using '\n' as line terminator and quoting all fields, short name 'unix'
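As a quick check, the following sketch lists the registered dialects and prints a few of their settings. It uses only csv.list_dialects() and csv.get_dialect(); the exact list you see may include extra names if your program has registered its own dialects.

```python
import csv

# Names of every dialect currently registered with the csv module.
names = csv.list_dialects()
print(names)

# Look up each registered dialect and show a few of its settings.
for name in names:
    d = csv.get_dialect(name)
    print(name, repr(d.delimiter), repr(d.lineterminator), d.quoting)
```

The quoting value is printed as an integer because the QUOTE_ constants are plain integers.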

You can create your own dialect by defining a custom class that inherits from csv.Dialect.

The Dialect class has the following attributes:

  • delimiter single character used to separate fields

  • doublequote if True, a quote character inside a quoted field is written as two quote characters; if False, it is prefixed by the character specified by escapechar.

  • escapechar single character used to prefix special characters when included in data.

  • lineterminator string the writer uses to terminate lines; the reader ignores this setting and always recognizes either '\r' or '\n' as the end of a line.

  • quotechar single character used to quote fields.

  • skipinitialspace whitespace following the delimiter is ignored if True, the default is False.

  • quoting determines which fields are quoted:
    QUOTE_ALL all fields
    QUOTE_MINIMAL only fields containing a special character, this is the default
    QUOTE_NONNUMERIC all non-numeric fields; the reader converts unquoted fields to floats
    QUOTE_NONE no fields.

  • strict raises a csv.Error exception on bad CSV input if True, the default is False.
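A small sketch of how some of these attributes behave in practice. It writes to an in-memory io.StringIO rather than a file, and the sample row is made up for illustration. The attributes are passed as keyword arguments here, which the reader and writer accept as overrides of the chosen dialect.

```python
import csv
import io

row = ['say "hi"', "a,b"]

# doublequote=True (the default): an embedded quote is written twice.
buf = io.StringIO()
csv.writer(buf).writerow(row)
doubled = buf.getvalue()
print(repr(doubled))

# doublequote=False: the writer prefixes the quote with escapechar instead.
buf = io.StringIO()
csv.writer(buf, doublequote=False, escapechar="\\").writerow(row)
escaped = buf.getvalue()
print(repr(escaped))

# Reading back with the same settings recovers the original row.
back = next(csv.reader(io.StringIO(escaped, newline=""),
                       doublequote=False, escapechar="\\"))
print(back == row)

# skipinitialspace drops the space that follows each delimiter.
print(next(csv.reader(["a, b"])))
print(next(csv.reader(["a, b"], skipinitialspace=True)))
```

Whichever escaping style you choose, the reader has to be given the same settings as the writer, otherwise the quotes will not be decoded correctly.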

For example, to create and read a CSV file that quotes only non-numeric data you would define a new dialect subclass:

class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    doublequote = True   # set explicitly; csv.Dialect defines it as None
    lineterminator = '\r\n'

Once you have this defined you can use it as a dialect class in the reader and writer objects:

import pathlib
import dataclasses
import csv

@dataclasses.dataclass
class person:
    name: str = ""
    id: int = 0
    score: float = 0.0

    def values(self):
        return [self.name, str(self.id), str(self.score)]

class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    doublequote = True   # set explicitly; csv.Dialect defines it as None
    lineterminator = '\r\n'

me = person("mike", 42, 3.145)
path = pathlib.Path("myTextFile.csv")

# newline="" lets the csv module control the line endings itself
with path.open(mode="wt", newline="") as f:
    peopleWriter = csv.writer(f, dialect=MyCSV)
    for i in range(43):
        peopleWriter.writerow([me.name, i, me.score])

with path.open(mode="rt", newline="") as f:
    peopleReader = csv.reader(f, dialect=MyCSV)
    for row in peopleReader:
        print(row)

If you run this you will see that a side effect of selecting csv.QUOTE_NONNUMERIC is that the reader converts every unquoted field to a float. So the last record printed is:

['mike', 42.0, 3.145]

Notice that the integer 42 has been converted to a float.
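The conversion applies to every unquoted field, so a file that contains unquoted text cannot be read with this setting. The sketch below uses in-memory lists of lines rather than a file, and the sample data is made up to match the record above:

```python
import csv

# Quoted fields stay strings; unquoted fields are converted with float().
good = ['"mike",42,3.145']
rows = list(csv.reader(good, quoting=csv.QUOTE_NONNUMERIC))
print(rows)

# If an integer is needed it has to be converted back explicitly.
name, ident, score = rows[0]
ident = int(ident)
print(name, ident, score)

# An unquoted field that is not a number cannot be converted.
bad = ['mike,42,3.145']
rejected = False
try:
    list(csv.reader(bad, quoting=csv.QUOTE_NONNUMERIC))
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

This means csv.QUOTE_NONNUMERIC is only safe for reading files that were written with the same quoting rule.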

You can also register a dialect under a name, using either a subclass of Dialect or a set of parameters. For example, after:

csv.register_dialect("myDialect",MyCSV)

you can use the dialect via the short name "myDialect":

peopleReader=csv.reader(f,dialect="myDialect")

Alternatively you can register the dialect directly without having to create a subclass. For example:

csv.register_dialect("myDialect",quoting=csv.QUOTE_NONNUMERIC)

defines the quoting attribute and accepts the defaults, which match the excel dialect, for the others.
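To confirm what a parameter-only registration produces, you can look the dialect up again with csv.get_dialect(). In this sketch the name "demoDialect" is just an illustrative choice, and the output goes to an in-memory io.StringIO rather than a file:

```python
import csv
import io

# "demoDialect" is a made-up name used only for this example.
csv.register_dialect("demoDialect", quoting=csv.QUOTE_NONNUMERIC)

# Attributes not given in the call keep their default values.
d = csv.get_dialect("demoDialect")
print(repr(d.delimiter), repr(d.quotechar), repr(d.lineterminator))

# Use the registered name exactly like a built-in dialect name.
buf = io.StringIO()
csv.writer(buf, dialect="demoDialect").writerow(["mike", 42, 3.145])
line = buf.getvalue()
print(repr(line))

# Remove the name again once it is no longer needed.
csv.unregister_dialect("demoDialect")
```

Registration is global to the csv module, so unregistering a name you no longer need avoids clashes with other code that might want to use it.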

CSV is a very common format for encoding data, but there are more modern alternatives that are better if you are free to choose the format.

In chapter but not in this extract

  • JSON
  • Multiple Records
  • JSON and Dataclasses
  • XML
  • Python XML
  • ElementTree
  • More XML
  • Pickle
  • Advanced Pickling

Summary

  • Text files are simply binary files where the conversion to a string with suitable decoding is automatic.

  • As well as reading a fixed number of characters, you can also use the readline method to read in a single line of text.

  • The print function can be used with files and has the advantage of performing the conversion to a string automatically.

  • To create text files that can be read back in and decoded reliably you need to use a standard format such as CSV, JSON or XML.

  • CSV, Comma Separated Values, is simple, but it has disadvantages: converting from a string to the appropriate data type isn’t generally automatic, and there are different dialects of CSV.

  • JSON is a good match to Python’s objects. It is easy to use and is cross-platform.

  • XML is more complicated and probably not a good choice if you can avoid it, but it is a widespread standard and very suitable for representing complex data.

  • XML is not well supported if you are looking for standard processing options such as DOM or SAX. The ElementTree module, however, provides good Python-oriented processing of XML.

  • Pickle is Python’s own object serialization format. It uses a binary file but it is very easy to use to save and load most Python objects. Pickle is a good choice if the data is being produced and consumed by Python programs.

Programmer's Python: Everything is Data is now available as a print book from Amazon.
