Applying a dataframe function to a pandas groupby object

Question 1

I am trying to apply a function to each group in a pandas dataframe where the function requires access to the entire group (as opposed to just one row). For this I am iterating over each group in the groupby object. Is this the best way to achieve this?

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,1,2,2,2],
                   'value': [70,10,20,100,50,5,33],
                   'other_value': [2.3, 3.3, 7.4, 1.1, 5, 10.3, 12]})

def clean_df(df, v_col, other_col):
    '''This function is just a made up example and might
    get more complex in real life. ;)
    '''
    prev_points = df[v_col].shift(1)
    next_points = df[v_col].shift(-1)
    return df[(prev_points > 50) | (next_points < 20)]

grouped = df.groupby('id')
pd.concat([clean_df(group, 'value', 'other_value') for _, group in grouped])

The original dataframe is

   id  other_value  value
0   1          2.3     70
1   1          3.3     10
2   1          7.4     20
3   1          1.1    100
4   2          5.0     50
5   2         10.3      5
6   2         12.0     33

The code will reduce it to

   id  other_value  value
0   1          2.3     70
1   1          3.3     10
4   2          5.0     50

Answer

You can directly use apply on the grouped dataframe and it will be passed the whole group:

def clean_df(df, v_col='value', other_col='other_value'):
    '''This function is just a made up example and might
    get more complex in real life. ;)
    '''
    prev_points = df[v_col].shift(1)
    next_points = df[v_col].shift(-1)
    return df[(prev_points > 50) | (next_points < 20)]

df.groupby('id').apply(clean_df).reset_index(level=0, drop=True)
#    id  other_value  value
# 0   1          2.3     70
# 1   1          3.3     10
# 4   2          5.0     50
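
To make "passed the whole group" concrete, here is a minimal sketch that reuses the example data and returns the number of rows each call receives. The helper name describe_group is made up for illustration. Depending on the pandas version, the grouping column may or may not be part of the group that is passed in, and some versions emit a deprecation warning about that.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2],
                   'value': [70, 10, 20, 100, 50, 5, 33],
                   'other_value': [2.3, 3.3, 7.4, 1.1, 5, 10.3, 12]})

def describe_group(group):
    # Hypothetical helper: each call receives a DataFrame
    # holding all rows of one group, not a single row.
    assert isinstance(group, pd.DataFrame)
    return len(group)

# One value per group, indexed by the group key 'id'.
group_sizes = df.groupby('id').apply(describe_group)
print(group_sizes)
```

Because the function sees every row of the group at once, operations such as shift inside clean_df stay within a group and never leak across the boundary between ids.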

Note that I had to give the other arguments default values, since apply is called here with only the function, so each group is passed in as its single argument. Another way around this is to make a function that returns the function:

def clean_df(v_col, other_col):
    '''This function is just a made up example and might
    get more complex in real life. ;)
    '''
    def wrapper(df):
        prev_points = df[v_col].shift(1)
        next_points = df[v_col].shift(-1)
        return df[(prev_points > 50) | (next_points < 20)]
    return wrapper

Which you can use like this:

df.groupby('id').apply(clean_df('value', 'other_value')).reset_index(level=0, drop=True)

Or you can use functools.partial with your original clean_df (the version that takes the dataframe as its first argument):

from functools import partial

df.groupby('id') \
  .apply(partial(clean_df, v_col='value', other_col='other_value')) \
  .reset_index(level=0, drop=True)
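
A further option, not part of the answer as written: GroupBy.apply forwards extra positional and keyword arguments to the function, so the three-argument clean_df from the question can likely be used without defaults, a closure, or partial. The sketch below also passes group_keys=False to groupby as an assumed alternative to the reset_index call. Treat it as an illustration to check against your pandas version, since newer releases change whether the grouping column is included in each group.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2],
                   'value': [70, 10, 20, 100, 50, 5, 33],
                   'other_value': [2.3, 3.3, 7.4, 1.1, 5, 10.3, 12]})

def clean_df(df, v_col, other_col):
    # Same made-up filter as in the question, no default arguments.
    prev_points = df[v_col].shift(1)
    next_points = df[v_col].shift(-1)
    return df[(prev_points > 50) | (next_points < 20)]

# apply hands v_col and other_col on to clean_df for every group;
# group_keys=False keeps the original row labels instead of
# adding 'id' as an extra index level.
result = (df.groupby('id', group_keys=False)
            .apply(clean_df, v_col='value', other_col='other_value'))
print(result)
```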

Graipher · Accepted Answer · 2019-04-09 11:54:36Z
