3
\$\begingroup\$

My code reads in the data using DictReader, then creates a header row that contains my composite key (PEOPLE_ID, DON_DATE), and then adds various values that are distinct to each section. The output looks like this:

-01- PEOPLE_ID, DON_DATE, etc...
-02- dataline
-02- dataline
-01- ...
etc...

I'm looking to possibly simplify or streamline this code, and then could use advice on how to implement robust error-handling throughout. Here is my program:

#!/usr/bin/python
# pre_process.py
import csv
import sys
def main():
 infile = sys.argv[1]
 outfile = sys.argv[2]
 with open(infile, 'rbU') as in_obj:
 reader, fieldnames = open_reader(in_obj)
 reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'], 
 key['DON_DATE']))
 header_list = create_header_list(reader)
 master_dict = mapData(header_list, reader)
 writeData(master_dict, outfile, fieldnames)
def open_reader(file_obj):
 reader = csv.DictReader(file_obj, delimiter=',')
 return reader, reader.fieldnames
def create_header_list(dict_obj):
 p_id_list = []
 for row in dict_obj:
 if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
 p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
 return p_id_list
def mapData(header_list, dict_obj):
 master_dict = {}
 client_section_list = []
 for element in header_list:
 for row in dict_obj:
 if (row['PEOPLE_ID'], row['DON_DATE']) == element:
 client_section_list.append(row)
 element = list(element)
 element_list = [client_section_list[0]['DEDUCT_AMT'],
 client_section_list[0]['ND_AMT'],
 client_section_list[0]['DEDUCT_YTD'],
 client_section_list[0]['NONDEDUCT_YTD']
 ]
 try:
 element_list.append((float(client_section_list[0]['DEDUCT_YTD']) +
 float(client_section_list[0]['NONDEDUCT_YTD'])
 ))
 except ValueError:
 pass
 element.extend(element_list)
 element = tuple(element)
 master_dict[element] = client_section_list
 client_section_list = []
 return master_dict
def writeData(in_obj, outfile, in_fieldnames):
 with open(outfile, 'wb') as writer_outfile:
 writer = csv.writer(writer_outfile, delimiter=',')
 dict_writer = csv.DictWriter(writer_outfile,
 fieldnames=in_fieldnames,
 extrasaction='ignore')
 for k, v in in_obj.iteritems():
 writer_outfile.write(' -01- ')
 writer.writerow(k)
 for i, e in enumerate(v):
 writer_outfile.write(' -02- ')
 dict_writer.writerow(e)
def getReconTotals(infile):
 pass
if __name__ == '__main__':
 main()
asked Aug 18, 2015 at 22:02
\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

Don't reuse names for multiple purposes

Before this line, reader is a DictReader, after this line it's a list:

 reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'], 
 key['DON_DATE']))

This can be confusing. It would be better to name the result something else. And it gets worse: this new reader reader is passed to create_header_list and mapData as parameter named "dict_obj", which further adds to the confusion.

Simplify set creation

This function essentially creates a set:

def create_header_list(dict_obj):
 p_id_list = []
 for row in dict_obj:
 if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
 p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
 return p_id_list

The not in check is inefficient, because it's an \$O(n)\$ operation.

It would be simpler and more efficient to use a set:

def create_header_list(dict_obj):
 p_id_set = set()
 for row in dict_obj:
 p_id_set.add((row['PEOPLE_ID'], row['DON_DATE']))
 return p_id_set

Or even:

def create_header_list(dict_obj):
 return set([(row['PEOPLE_ID'], row['DON_DATE']) for row in dict_obj])

If the ordering of the elements is important, then instead of a set, you can use an OrderedDict, as suggested by this post.

Running Python scripts

Not all systems have Python at /use/bin/python. The recommended shebang for Python scripts:

#!/usr/bin/env python

Follow PEP8

PEP8 is the coding style guide for Python. Among other things, it recommends using snake_case for variable and function names. Several functions violate that.

Even if you disagree with a specific naming convention, it's a universal violation of good naming practices to mix two kinds of naming styles in the same program, such as create_header_list and mapData.

answered Aug 18, 2015 at 22:19
\$\endgroup\$
1
  • \$\begingroup\$ Thanks for the feedback! What are O(n) operations, and why are they bad? \$\endgroup\$ Commented Aug 18, 2015 at 22:38

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.