For my haves:
I want to find the key values of the rows that hold the maximum value within each (g1, g2) group, to get these wants:
This is my current, possibly clumsy, code:
import pandas as pd
df = pd.DataFrame({
'value': [0, 1, 1, 3],
'g1': ["b", "b", "c", "c"],
'g2': ["a", "a", "a", "a"],
'key': ["1", "2", "3", "4"]
})
print(df)
stats = df.groupby(["g1", "g2"])["value"].max().reset_index()
pd.merge(df, stats, left_on=["g1", "g2", "value"], right_on=["g1", "g2", "value"], how="inner")
Any suggestions for improvement would be very welcome. Thanks.
This isn't so narrow as to be off-topic, but you will benefit from showing more of the contextual code around this snippet. Otherwise reviews will be more constrained than they probably should be. – Reinderien, Feb 17, 2022 at 14:02
1 Answer
Recognise that your first pass should not be to find the group maxima: it should be to find the indices of the group maxima. From there it is easy to index into the original dataframe to retrieve the desired maximum rows:
import pandas as pd
df = pd.DataFrame({
'value': (0, 1, 1, 3),
'g1': ('b', 'b', 'c', 'c'),
'g2': ('a', 'a', 'a', 'a'),
'key': ('1', '2', '3', '4')
})
imax = df.groupby(['g1', 'g2']).value.idxmax()
result = df.loc[imax]
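As a quick sanity check, the two approaches can be compared on the sample data. This is only a sketch: the names merged and by_idxmax are illustrative and do not come from the question, and the comparison relies on the sample having no tied maxima within a group.

```python
import pandas as pd

df = pd.DataFrame({
    'value': (0, 1, 1, 3),
    'g1': ('b', 'b', 'c', 'c'),
    'g2': ('a', 'a', 'a', 'a'),
    'key': ('1', '2', '3', '4'),
})

# Approach from the question: compute the group maxima, then merge back.
stats = df.groupby(['g1', 'g2'])['value'].max().reset_index()
merged = pd.merge(df, stats, on=['g1', 'g2', 'value'], how='inner')

# Approach from this answer: look up the row label of each group maximum.
imax = df.groupby(['g1', 'g2'])['value'].idxmax()
by_idxmax = df.loc[imax]

# Both select the same keys on this data (which has no tied maxima).
print(merged['key'].tolist())
print(by_idxmax['key'].tolist())
```

Both lists come out as ['2', '4']. One visible difference is that by_idxmax keeps the original row labels (1 and 3), whereas the merge produces a fresh 0-based index.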
However, your original dataframe construction has an issue. If key is truly a key, you should be using it as your index instead of the implicit numeric-range index:
import pandas as pd
df = pd.DataFrame({
'value': (0, 1, 1, 3),
'g1': ('b', 'b', 'c', 'c'),
'g2': ('a', 'a', 'a', 'a'),
'key': ('1', '2', '3', '4')
}).set_index('key')
imax = df.groupby(['g1', 'g2']).value.idxmax()
result = df.loc[imax]
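One caveat worth checking against your real data: idxmax returns only the first label at which the maximum occurs, while the merge in the question keeps every row tied for the maximum. If ties matter, a boolean mask built with transform('max') is one possible alternative. The sketch below uses hypothetical data (a tie added in group b) and illustrative names (first_only, all_ties) that are not part of the question.

```python
import pandas as pd

# Hypothetical variant of the sample data: keys '1' and '2' tie in group (b, a).
df = pd.DataFrame({
    'value': (1, 1, 1, 3),
    'g1': ('b', 'b', 'c', 'c'),
    'g2': ('a', 'a', 'a', 'a'),
    'key': ('1', '2', '3', '4'),
}).set_index('key')

grouped = df.groupby(['g1', 'g2'])['value']

# idxmax keeps a single row per group: the first one holding the maximum.
first_only = df.loc[grouped.idxmax()]

# transform('max') broadcasts each group's maximum back onto its rows,
# so the comparison keeps every row tied for the maximum.
all_ties = df[df['value'] == grouped.transform('max')]

print(first_only.index.tolist())
print(all_ties.index.tolist())
```

Because key is the index here, the selected keys can be read straight from the result's index; calling reset_index() on either result turns key back into an ordinary column if that is the shape you need.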