I have the following function that allows me to calculate the marketing spend on my advertising campaigns.
The problem is that it's now taking too long (over 15 minutes) to apply the function to said dataframe, as it now has more than 8 months of historical data.
I have no idea if there's a way to speed up the process. Any help would be appreciated if you know a faster way to iter through rows.
def actual_spend_atlas(row):
filterdatemediaplanid_day = (atlas_df.Date == row.Date) & (atlas_df.MediaPlanIndex == row.MediaPlanIndex)
filterdatemediaplanid_total = (atlas_df.MediaPlanIndex == row.MediaPlanIndex)
total_impressions_day = atlas_df.ix[filterdatemediaplanid_day, 'Impressions'].sum()
total_impressions = atlas_df.ix[filterdatemediaplanid_total, 'Impressions'].sum()
if row.Amnet == "Amnet":
spend = atlas_df.AmnetCost * (row.Impressions / filterdatemediaplanid_day)
if row.CostMethod == 'FLF' or row.CostMethod == 'CPH':
if total_impressions < 500:
spend = 0
else:
rate = row.BookedRate
spend = rate * (row.Impressions/ total_impressions)
elif row.CostMethod == 'CPC':
spend = row.BookedRate * row.Clicks
elif row.CostMethod == 'CPM':
cpm_impressions = row.Impressions / 1000.0
rate = row.BookedRate
spend = rate * cpm_impressions
else:
spend = 0
return spend
atlas_df['Spend'] = atlas_df.apply(actual_spend_atlas, axis=1)
1 Answer 1
The easiest way to get a small speed-up is swap these two lines:
filterdatemediaplanid_day = (atlas_df.Date == row.Date) & (atlas_df.MediaPlanIndex == row.MediaPlanIndex)
filterdatemediaplanid_total = (atlas_df.MediaPlanIndex == row.MediaPlanIndex)
to:
filterdatemediaplanid_total = (atlas_df.MediaPlanIndex == row.MediaPlanIndex)
filterdatemediaplanid_day = (atlas_df.Date == row.Date) & filterdatemediaplanid_total
This part of the code does not serve any purpose:
if row.Amnet == "Amnet":
spend = atlas_df.AmnetCost * (row.Impressions / filterdatemediaplanid_day)
Since you don't return right away and afterwards comes an if/elif/else
block that is guaranteed to modify spend
, anything you calculate here will be overwritten.
While some people adhere to the single exit policy, I prefer early returns:
if row.CostMethod == 'FLF' or row.CostMethod == 'CPH':
total_impressions = atlas_df.ix[filterdatemediaplanid_total, 'Impressions'].sum()
if total_impressions >= 500:
return row.BookedRate * (row.Impressions/ total_impressions)
elif row.CostMethod == 'CPC':
return row.BookedRate * row.Clicks
elif row.CostMethod == 'CPM':
return row.BookedRate * row.Impressions / 1000.0
return 0
total_impressions_day
is not used anywhere and total_impressions
should only be calculated if needed.
filterdatemediaplanid_day
andfilterdatemediaplanid_total
arrays. \$\endgroup\$