Aggregate loans report without using Python standard aggregate or group functions

Question 1

I've written the following as a code assignment (Python 2.7). The task is to aggregate the loans csv file from the accounts department by (Network, Product, Month) with the total currency amounts, and counts, without using standard Python aggregate or group functions. I would like advice and suggestions on my current solution in order to learn from it going forward as my code was reviewed already.

I've used Python dict OrderedDict instead of normal Python dictionary to keep the current order of the records. I do realise that using OrderedDict comes at a cost, how else could I keep the order of the records that will be written to the Output Summary CSV.

Input Loans CSV

Output Loans Summary CSV

"""
Created on 20 August 2017
Loans Summary
Validation
@author: Peter Wilson
"""
import re
import csv
import argparse
import collections
from datetime import datetime
def csv_to_dict(loans_csv):
 """Convert loans csv to
 Python dictionary"""
 loans_dict = csv.DictReader(open(loans_csv))
 return loans_dict
def dict_to_agg_dict(loans_dict):
 """Aggregate loans by
 (Network, Product, Month)"""
 agg_dict = collections.OrderedDict()
 for row in loans_dict:
 network = re.sub(r"'", '', row['Network'])
 product = re.sub(r"'", '', row['Product'])
 # assumed month per year to be aggregated
 month = re.findall(r'([A-z]+?-\d{4})', row['Date'])[0]
 # assumed currency rounded to closet rounded number
 currency = float(row['Amount'])
 dict_key = (network, product, month)
 agg_dict.setdefault(dict_key, []).append(currency)
 return agg_dict
def agg_dict_to_lists(agg_dict):
 """Convert aggregated dictionary
 to Python list of lists"""
 summary_list = []
 for key, values in agg_dict.iteritems():
 values_list = list(key)
 currency_sum = sum(values)
 counts = len(values)
 values_list.insert(len(values_list), currency_sum)
 values_list.insert(len(values_list), counts)
 summary_list.append(values_list)
 return summary_list
def output_csv_summary(summary_list, output_csv_folder):
 """Write aggregated results
 into output csv"""
 current_date = datetime.today().strftime("{0}{1}{2}".format("%y", "%m", "%d"))
 csv_name = "{0}_{1}.csv".format("Output", current_date)
 output_csv = "{0}\\{1}".format(output_csv_folder, csv_name)
 print("Writing {0} to {1}".format(csv_name, output_csv_folder))
 csv_header = ["Network", "Product", "Month\Year", "Currency", "Count"]
 with open(output_csv, 'wb') as csvfile:
 csvwriter = csv.writer(csvfile)
 csvwriter.writerow(csv_header)
 for row in summary_list:
 try:
 csvwriter.writerow(row)
 except UnicodeError as e:
 print(e)
def validate_loans(loans_csv, output_csv_folder):
 """Validate loans summary from
 accounts with system"""
 loans_dict = csv_to_dict(loans_csv)
 agg_dict = dict_to_agg_dict(loans_dict)
 summary_list = agg_dict_to_lists(agg_dict)
 output_csv_summary(summary_list, output_csv_folder)
if __name__ == '__main__':
 parser = argparse.ArgumentParser(description='Validate accounting loans summary against system')
 parser.add_argument('--loans_csv', metavar='path', required=True,
 help='path to input loans csv')
 parser.add_argument('--output_csv_folder', metavar='path', required=True,
 help='path to output csv folder')
 args = parser.parse_args()
 validate_loans(loans_csv=args.loans_csv, output_csv_folder=args.output_csv_folder)

Question 2

Try not to use name of data-structure in your variables. Check: Do you prefix variable names with an abbreviation of the variable types?. Python 2 doesn't support annotations but you can use comments to specify the types. On Python 3+ you can use function annotations and in Python 3.6+ you can use variable annotations as well.
Use with-statement while handling files: open(loans_csv).

It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point.
Use simple str.split instead of regex r'([A-z]+?-\d{4})': row['Date'].split('-').
You can pass qotechar="'" to csv.DictReader and then you won't have to strip them manually.
To append an item at the end use list.append instead of vlist.insert(len(list), item).

The for-loop in agg_dict_to_lists can be reduced to:

for key, values in agg_dict.iteritems():
 values_list = list(key) + [sum(values), len(values)]
 summary_list.append(values_list)

If you're expecting currency to be decimal then use decimal module instead of plain floats. Decimals objects provides better floating-point arithmetic and alterable precision.
The format() is not required here: datetime.today().strftime("{0}{1}{2}".format("%y", "%m", "%d")). You could simply do: datetime.today().strftime('%y%m%d').
If you're opening the file in binary mode open(output_csv, 'wb') the make sure you're converting the data being written to bytes first by encoding individual items of the rows. This would otherwise not even work in Python 3 where you cannot mix bytes and str anymore.
print is not a function in Python 2 unless your import it using from __future__ import print_function. Hence, don't use it with parenthesis.

Question 3

Thanks for the suggestions. I should of noted that I'm using Python 2.7

Question 4

@PeterWilson Yeah, the print(e) call without __future__ import led me to believe that its Python 2, but then I saw the iteritems() call.

Ashwini Chaudhary Ashwini Chaudhary 3,15516 silver badges18 bronze badges · Accepted Answer · 2017-08-22 18:24:17Z

Try not to use name of data-structure in your variables. Check: Do you prefix variable names with an abbreviation of the variable types?. Python 2 doesn't support annotations but you can use comments to specify the types. On Python 3+ you can use function annotations and in Python 3.6+ you can use variable annotations as well.
Use with-statement while handling files: open(loans_csv).

It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point.
Use simple str.split instead of regex r'([A-z]+?-\d{4})': row['Date'].split('-').
You can pass qotechar="'" to csv.DictReader and then you won't have to strip them manually.
To append an item at the end use list.append instead of vlist.insert(len(list), item).

The for-loop in agg_dict_to_lists can be reduced to:

for key, values in agg_dict.iteritems():
 values_list = list(key) + [sum(values), len(values)]
 summary_list.append(values_list)

If you're expecting currency to be decimal then use decimal module instead of plain floats. Decimals objects provides better floating-point arithmetic and alterable precision.
The format() is not required here: datetime.today().strftime("{0}{1}{2}".format("%y", "%m", "%d")). You could simply do: datetime.today().strftime('%y%m%d').
If you're opening the file in binary mode open(output_csv, 'wb') the make sure you're converting the data being written to bytes first by encoding individual items of the rows. This would otherwise not even work in Python 3 where you cannot mix bytes and str anymore.
print is not a function in Python 2 unless your import it using from __future__ import print_function. Hence, don't use it with parenthesis.

Thanks for the suggestions. I should of noted that I'm using Python 2.7
@PeterWilson Yeah, the print(e) call without __future__ import led me to believe that its Python 2, but then I saw the iteritems() call.

Stack Exchange Network

Aggregate loans report without using Python standard aggregate or group functions

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Aggregate loans report without using Python standard aggregate or group functions

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions