
This was inspired by the Aggregate loans report without using Python standard aggregate or group functions question, but I've decided to approach it using pandas.

To recap, sample input:

MSISDN,Network,Date,Product,Amount
1,Network 1,12-Mar-2016,Loan Product 1,1000
2,Network 2,16-Mar-2016,Loan Product 1,1122
3,Network 3,17-Mar-2016,Loan Product 2,2084
4,Network 3,18-Mar-2016,Loan Product 2,3098
5,Network 2,01-Apr-2016,Loan Product 1,5671

Desired output:

Network,Product,Month\Year,Currency,Count
Network 1,Loan Product 1,03-16,1000,1
Network 2,Loan Product 1,03-16,1122,1
Network 2,Loan Product 1,04-16,5671,1
Network 3,Loan Product 2,03-16,5182,2

In other words, the task is to group the data from the input.csv file by Network, Product, and the month and year of the Date column, then sum the Amount column into a Currency column while keeping a Count of the rows in each group.

I've solved it by creating a separate Month\Year column first (parsing the Date values into datetime objects and formatting them as month-year strings), then grouping by the desired columns with .groupby(), aggregating with sum and count, and finally renaming the columns to the desired names:

from datetime import datetime
import pandas as pd

df = pd.read_csv('input.csv')
# Parse each Date string and re-format it as "MM-YY" for grouping.
df['Month\\Year'] = df['Date'].apply(lambda s: datetime.strptime(s, '%d-%b-%Y').strftime('%m-%y'))
# Group by network, product and month, then aggregate the Amount column.
grouped = df.groupby(['Network', 'Product', 'Month\\Year'])['Amount']
df = grouped.agg(['sum', 'count']).rename(columns={'sum': 'Currency', 'count': 'Count'}).reset_index()
df.to_csv('output.csv', index=False)

Is this an optimal and readable pandas-based solution? Can it be improved further?

I am particularly not happy with renaming the columns after aggregation; there should be a more straightforward way to aggregate directly into custom-named columns.
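
For reference, newer pandas versions (0.25 and later) support named aggregation, where the output column names are passed as keyword arguments to .agg(); a minimal sketch reusing the grouped object from the code above:

grouped = df.groupby(['Network', 'Product', 'Month\\Year'])['Amount']
df = grouped.agg(Currency='sum', Count='count').reset_index()  # no rename step needed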

asked Aug 22, 2017 at 19:50
  • If the Date column is already a datetime, you could skip adding the new column and use a Grouper instead, but that leaves you with datetime objects rather than strings. I don't really know a way to avoid the renaming at the end. Commented Aug 23, 2017 at 13:40
  • If the Date is not already a datetime, you can use pd.to_datetime() instead of the lambda and strptime; see the sketch after these comments. Commented Aug 23, 2017 at 13:41
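
A minimal sketch of that second suggestion, assuming the same input.csv and date format as in the question:

import pandas as pd

df = pd.read_csv('input.csv')
# Vectorised parse of the whole Date column instead of a per-row strptime.
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%Y')
# The month-year strings can then be derived with the .dt accessor.
df['Month\\Year'] = df['Date'].dt.strftime('%m-%y')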

1 Answer


Your approach is already vectorised. That's good! But it could be better.

I've solved it by creating a separate Month\Year column first

Don't do that; there are more efficient options.

Is this an optimal and readable pandas-based solution?

Not really.

I am particularly not happy with renaming the columns after aggregation

That can indeed be solved with keyword-style (named) agg calls, but it isn't the biggest issue. The bigger issue is that you should do a single groupby where one of the group levels is a time grouper (essentially the same effect as a resample):

import io
import pandas as pd

with io.StringIO(
    '''MSISDN,Network,Date,Product,Amount
1,Network 1,12-Mar-2016,Loan Product 1,1000
2,Network 2,16-Mar-2016,Loan Product 1,1122
3,Network 3,17-Mar-2016,Loan Product 2,2084
4,Network 3,18-Mar-2016,Loan Product 2,3098
5,Network 2,01-Apr-2016,Loan Product 1,5671
''') as f:
    df = pd.read_csv(f, parse_dates=['Date'], date_format='%d-%b-%Y')

out = (
    df.groupby([
        'Network', 'Product',
        pd.Grouper(key='Date', freq='MS'),  # month-start resampling
    ])['Amount']
    .agg(Currency='sum', Count='count')     # named aggregation: no rename needed
)
print(out)
                                     Currency  Count
Network   Product        Date
Network 1 Loan Product 1 2016-03-01      1000      1
Network 2 Loan Product 1 2016-03-01      1122      1
                         2016-04-01      5671      1
Network 3 Loan Product 2 2016-03-01      5182      2
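
If the flat month-year layout of the desired output is still needed, a possible follow-up (a sketch, assuming the out frame built above) is to reset the index, reformat the dates, and write the CSV:

out = out.reset_index()
# Render the month-start timestamps back into the "MM-YY" strings of the desired output.
out['Month\\Year'] = out['Date'].dt.strftime('%m-%y')
out[['Network', 'Product', 'Month\\Year', 'Currency', 'Count']].to_csv('output.csv', index=False)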
answered Jan 27 at 23:25
