CSV parsing program that creates distinct header rows with transaction rows underneath

Question 1

My code reads in the data using DictReader, then creates a header row that contains my composite key (PEOPLE_ID, DON_DATE), and then adds various values that are distinct to each section. The output looks like this:

-01- PEOPLE_ID, DON_DATE, etc...
-02- dataline
-02- dataline
-01- ...
etc...

I'm looking to possibly simplify or streamline this code, and then could use advice on how to implement robust error-handling throughout. Here is my program:

#!/usr/bin/python
# pre_process.py
import csv
import sys
def main():
 infile = sys.argv[1]
 outfile = sys.argv[2]
 with open(infile, 'rbU') as in_obj:
 reader, fieldnames = open_reader(in_obj)
 reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'], 
 key['DON_DATE']))
 header_list = create_header_list(reader)
 master_dict = mapData(header_list, reader)
 writeData(master_dict, outfile, fieldnames)
def open_reader(file_obj):
 reader = csv.DictReader(file_obj, delimiter=',')
 return reader, reader.fieldnames
def create_header_list(dict_obj):
 p_id_list = []
 for row in dict_obj:
 if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
 p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
 return p_id_list
def mapData(header_list, dict_obj):
 master_dict = {}
 client_section_list = []
 for element in header_list:
 for row in dict_obj:
 if (row['PEOPLE_ID'], row['DON_DATE']) == element:
 client_section_list.append(row)
 element = list(element)
 element_list = [client_section_list[0]['DEDUCT_AMT'],
 client_section_list[0]['ND_AMT'],
 client_section_list[0]['DEDUCT_YTD'],
 client_section_list[0]['NONDEDUCT_YTD']
 ]
 try:
 element_list.append((float(client_section_list[0]['DEDUCT_YTD']) +
 float(client_section_list[0]['NONDEDUCT_YTD'])
 ))
 except ValueError:
 pass
 element.extend(element_list)
 element = tuple(element)
 master_dict[element] = client_section_list
 client_section_list = []
 return master_dict
def writeData(in_obj, outfile, in_fieldnames):
 with open(outfile, 'wb') as writer_outfile:
 writer = csv.writer(writer_outfile, delimiter=',')
 dict_writer = csv.DictWriter(writer_outfile,
 fieldnames=in_fieldnames,
 extrasaction='ignore')
 for k, v in in_obj.iteritems():
 writer_outfile.write(' -01- ')
 writer.writerow(k)
 for i, e in enumerate(v):
 writer_outfile.write(' -02- ')
 dict_writer.writerow(e)
def getReconTotals(infile):
 pass
if __name__ == '__main__':
 main()

Question 2

Don't reuse names for multiple purposes

Before this line, reader is a DictReader, after this line it's a list:

 reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'], 
 key['DON_DATE']))

This can be confusing. It would be better to name the result something else. And it gets worse: this new reader reader is passed to create_header_list and mapData as parameter named "dict_obj", which further adds to the confusion.

Simplify set creation

This function essentially creates a set:

def create_header_list(dict_obj):
 p_id_list = []
 for row in dict_obj:
 if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
 p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
 return p_id_list

The not in check is inefficient, because it's an \$O(n)\$ operation.

It would be simpler and more efficient to use a set:

def create_header_list(dict_obj):
 p_id_set = set()
 for row in dict_obj:
 p_id_set.add((row['PEOPLE_ID'], row['DON_DATE']))
 return p_id_set

Or even:

def create_header_list(dict_obj):
 return set([(row['PEOPLE_ID'], row['DON_DATE']) for row in dict_obj])

If the ordering of the elements is important, then instead of a set, you can use an OrderedDict, as suggested by this post.

Running Python scripts

Not all systems have Python at /use/bin/python. The recommended shebang for Python scripts:

#!/usr/bin/env python

Follow PEP8

PEP8 is the coding style guide for Python. Among other things, it recommends using snake_case for variable and function names. Several functions violate that.

Even if you disagree with a specific naming convention, it's a universal violation of good naming practices to mix two kinds of naming styles in the same program, such as create_header_list and mapData.

Question 3

Thanks for the feedback! What are O(n) operations, and why are they bad?

janos janosjanos 113k15 gold badges154 silver badges396 bronze badges · Answer 1 · 2015-08-18 22:19:44Z

Don't reuse names for multiple purposes

Before this line, reader is a DictReader, after this line it's a list:

 reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'], 
 key['DON_DATE']))

This can be confusing. It would be better to name the result something else. And it gets worse: this new reader reader is passed to create_header_list and mapData as parameter named "dict_obj", which further adds to the confusion.

Simplify set creation

This function essentially creates a set:

def create_header_list(dict_obj):
 p_id_list = []
 for row in dict_obj:
 if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
 p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
 return p_id_list

The not in check is inefficient, because it's an \$O(n)\$ operation.

It would be simpler and more efficient to use a set:

def create_header_list(dict_obj):
 p_id_set = set()
 for row in dict_obj:
 p_id_set.add((row['PEOPLE_ID'], row['DON_DATE']))
 return p_id_set

Or even:

def create_header_list(dict_obj):
 return set([(row['PEOPLE_ID'], row['DON_DATE']) for row in dict_obj])

If the ordering of the elements is important, then instead of a set, you can use an OrderedDict, as suggested by this post.

Running Python scripts

Not all systems have Python at /use/bin/python. The recommended shebang for Python scripts:

#!/usr/bin/env python

Follow PEP8

PEP8 is the coding style guide for Python. Among other things, it recommends using snake_case for variable and function names. Several functions violate that.

Even if you disagree with a specific naming convention, it's a universal violation of good naming practices to mix two kinds of naming styles in the same program, such as create_header_list and mapData.

Thanks for the feedback! What are O(n) operations, and why are they bad?

Stack Exchange Network

CSV parsing program that creates distinct header rows with transaction rows underneath

1 Answer 1

Don't reuse names for multiple purposes

Simplify set creation

Running Python scripts

Follow PEP8

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CSV parsing program that creates distinct header rows with transaction rows underneath

1 Answer 1

Don't reuse names for multiple purposes

Simplify set creation

Running Python scripts

Follow PEP8

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions