find maximum corresponding column for maximum value after group by

Question

For my haves:

I want to find the key values for the maximum value whilst grouping by g1 and g2 to get these wants:

This is my current possibly (?) clumsy code:

import pandas as pd

df = pd.DataFrame({
    'value': [0, 1, 1, 3],
    'g1': ["b", "b", "c", "c"],
    'g2': ["a", "a", "a", "a"],
    'key': ["1", "2", "3", "4"]
})
print(df)
# Maximum value per (g1, g2) group
stats = df.groupby(["g1", "g2"])["value"].max().reset_index()
# Join back on the group columns and the value to recover the matching rows
pd.merge(df, stats, on=["g1", "g2", "value"], how="inner")
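A quick check of what the merge above actually returns on this toy data, assuming the wants are the per-group maximum rows. Passing `on=` is shorthand for identical `left_on`/`right_on` lists; note that the inner merge also discards the original row labels in favour of a fresh 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({
    'value': [0, 1, 1, 3],
    'g1': ["b", "b", "c", "c"],
    'g2': ["a", "a", "a", "a"],
    'key': ["1", "2", "3", "4"]
})

# Per-group maxima, then an inner join on the group columns plus the value.
stats = df.groupby(["g1", "g2"])["value"].max().reset_index()
merged = pd.merge(df, stats, on=["g1", "g2", "value"], how="inner")

# The matching rows come back with a new 0..n-1 index, not the original labels.
print(merged)
print(merged["key"].tolist())  # ['2', '4']
```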

Any improvement suggestions would be very much welcome please. Thanks.

Comment

This isn't so narrow as to be off-topic, but you will benefit from showing more of the contextual code around this snippet. Otherwise reviews will be more constrained than they probably should be.

Answer

Recognise that your first pass should not be to find the group maxima: it should be to find the indices of the group maxima. From there it is easy to index into the original dataframe to retrieve the desired maximum rows:

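import pandas as pd

df = pd.DataFrame({
    'value': (0, 1, 1, 3),
    'g1': ('b', 'b', 'c', 'c'),
    'g2': ('a', 'a', 'a', 'a'),
    'key': ('1', '2', '3', '4')
})
# Row label of the (first) maximum value within each (g1, g2) group
imax = df.groupby(['g1', 'g2']).value.idxmax()
# Select those rows from the original dataframe
result = df.loc[imax]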
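One behavioural difference is worth checking before swapping approaches: `idxmax` returns only the first label of each group's maximum, whereas the question's inner merge keeps every tied row. The sketch below uses hypothetical data with an extra row to create a tie, and contrasts `idxmax` with a `groupby(...).transform('max')` mask, a technique not from the original post that keeps ties the way the merge does.

```python
import pandas as pd

# Hypothetical data: keys '2' and '5' tie for the maximum of group ('b', 'a').
df = pd.DataFrame({
    'value': (0, 1, 1, 3, 1),
    'g1': ('b', 'b', 'c', 'c', 'b'),
    'g2': ('a', 'a', 'a', 'a', 'a'),
    'key': ('1', '2', '3', '4', '5'),
})

# idxmax keeps one row per group: the first occurrence of the maximum.
first_only = df.loc[df.groupby(['g1', 'g2']).value.idxmax()]

# Alternative mask: compare each row with its group's maximum, broadcast back
# onto every row via transform('max'); all tied rows survive, as with the merge.
group_max = df.groupby(['g1', 'g2'])['value'].transform('max')
all_ties = df[df['value'] == group_max]

print(first_only['key'].tolist())  # ['2', '4']
print(all_ties['key'].tolist())    # ['2', '4', '5']
```

If ties cannot occur in the real data, both give the same rows; otherwise the choice depends on whether one row per group or all maximal rows is wanted.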

However, your original dataframe construction has an issue. If key is truly a key, you should be using it as your index instead of the implicit numeric-range index:

import pandas as pd

df = pd.DataFrame({
    'value': (0, 1, 1, 3),
    'g1': ('b', 'b', 'c', 'c'),
    'g2': ('a', 'a', 'a', 'a'),
    'key': ('1', '2', '3', '4')
}).set_index('key')
# Same lookup as before; the returned labels are now key values
imax = df.groupby(['g1', 'g2']).value.idxmax()
result = df.loc[imax]
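Since the question asks for the key values themselves, it may help to see that once `key` is the index, `imax` already holds them: it is a Series indexed by `(g1, g2)` whose values are the winning keys. The `.loc` lookup is then only needed when the other columns are wanted as well. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'value': (0, 1, 1, 3),
    'g1': ('b', 'b', 'c', 'c'),
    'g2': ('a', 'a', 'a', 'a'),
    'key': ('1', '2', '3', '4')
}).set_index('key')

# With 'key' as the index, idxmax yields one key label per (g1, g2) group.
imax = df.groupby(['g1', 'g2']).value.idxmax()
print(imax.to_dict())  # {('b', 'a'): '2', ('c', 'a'): '4'}

# Full rows for those keys, still indexed by key.
result = df.loc[imax]
print(result)
```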

Reinderien · Accepted Answer · 2022-02-17 15:11:16Z
