My code reads in the data using DictReader
, then creates a header row that contains my composite key (PEOPLE_ID, DON_DATE), and then adds various values that are distinct to each section. The output looks like this:
-01- PEOPLE_ID, DON_DATE, etc...
-02- dataline
-02- dataline
-01- ...
etc...
I'm looking to possibly simplify or streamline this code, and then could use advice on how to implement robust error-handling throughout. Here is my program:
#!/usr/bin/python
# pre_process.py
import csv
import sys
def main():
infile = sys.argv[1]
outfile = sys.argv[2]
with open(infile, 'rbU') as in_obj:
reader, fieldnames = open_reader(in_obj)
reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'],
key['DON_DATE']))
header_list = create_header_list(reader)
master_dict = mapData(header_list, reader)
writeData(master_dict, outfile, fieldnames)
def open_reader(file_obj):
reader = csv.DictReader(file_obj, delimiter=',')
return reader, reader.fieldnames
def create_header_list(dict_obj):
p_id_list = []
for row in dict_obj:
if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list:
p_id_list.append((row['PEOPLE_ID'], row['DON_DATE']))
return p_id_list
def mapData(header_list, dict_obj):
master_dict = {}
client_section_list = []
for element in header_list:
for row in dict_obj:
if (row['PEOPLE_ID'], row['DON_DATE']) == element:
client_section_list.append(row)
element = list(element)
element_list = [client_section_list[0]['DEDUCT_AMT'],
client_section_list[0]['ND_AMT'],
client_section_list[0]['DEDUCT_YTD'],
client_section_list[0]['NONDEDUCT_YTD']
]
try:
element_list.append((float(client_section_list[0]['DEDUCT_YTD']) +
float(client_section_list[0]['NONDEDUCT_YTD'])
))
except ValueError:
pass
element.extend(element_list)
element = tuple(element)
master_dict[element] = client_section_list
client_section_list = []
return master_dict
def writeData(in_obj, outfile, in_fieldnames):
with open(outfile, 'wb') as writer_outfile:
writer = csv.writer(writer_outfile, delimiter=',')
dict_writer = csv.DictWriter(writer_outfile,
fieldnames=in_fieldnames,
extrasaction='ignore')
for k, v in in_obj.iteritems():
writer_outfile.write(' -01- ')
writer.writerow(k)
for i, e in enumerate(v):
writer_outfile.write(' -02- ')
dict_writer.writerow(e)
def getReconTotals(infile):
pass
if __name__ == '__main__':
main()
1 Answer 1
Don't reuse names for multiple purposes
Before this line, reader
is a DictReader
,
after this line it's a list
:
reader = sorted(reader, key=lambda key: (key['PEOPLE_ID'], key['DON_DATE']))
This can be confusing. It would be better to name the result something else.
And it gets worse: this new reader reader
is passed to create_header_list
and mapData
as parameter named "dict_obj",
which further adds to the confusion.
Simplify set creation
This function essentially creates a set:
def create_header_list(dict_obj): p_id_list = [] for row in dict_obj: if (row['PEOPLE_ID'], row['DON_DATE']) not in p_id_list: p_id_list.append((row['PEOPLE_ID'], row['DON_DATE'])) return p_id_list
The not in
check is inefficient, because it's an \$O(n)\$ operation.
It would be simpler and more efficient to use a set
:
def create_header_list(dict_obj):
p_id_set = set()
for row in dict_obj:
p_id_set.add((row['PEOPLE_ID'], row['DON_DATE']))
return p_id_set
Or even:
def create_header_list(dict_obj):
return set([(row['PEOPLE_ID'], row['DON_DATE']) for row in dict_obj])
If the ordering of the elements is important,
then instead of a set
, you can use an OrderedDict
,
as suggested by this post.
Running Python scripts
Not all systems have Python at /use/bin/python
. The recommended shebang for Python scripts:
#!/usr/bin/env python
Follow PEP8
PEP8 is the coding style guide for Python.
Among other things,
it recommends using snake_case
for variable and function names.
Several functions violate that.
Even if you disagree with a specific naming convention,
it's a universal violation of good naming practices to mix two kinds of naming styles in the same program, such as create_header_list
and mapData
.
-
\$\begingroup\$ Thanks for the feedback! What are O(n) operations, and why are they bad? \$\endgroup\$flybonzai– flybonzai2015年08月18日 22:38:36 +00:00Commented Aug 18, 2015 at 22:38