I have a dataframe `df` with the following structure:
store_id | city_id | sales_A | sales_B | sales_C |
---|---|---|---|---|
STORE01 | CITY99 | 100 Item | None | None |
STORE01 | CITY99 | None | 200 Order | None |
STORE01 | CITY99 | None | None | 300 Client |
STORE01 | CITY99 | 150 Order | None | 300 Client |
... |
All rows follow the same pattern: each combination of store ID and city ID has one or more rows:
- row 1: sales_A has a value, the others are None
- row 2: sales_B has a value, the others are None
- row 3: sales_C has a value, the others are None
- row 4: sales_A has a value (different from row 1), the others are None
Note that the values are not numbers; they are strings and must be kept as strings.
The order of the rows may vary, but each combination always has one or more rows, depending on its sales.
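To make the question reproducible, the sample table above can be built like this. This is a sketch that assumes the missing cells are real `None` values (not the text `'None'`); the values are copied from the table and kept as strings.

```python
import pandas as pd

# Sample data from the question; missing cells are assumed to be real None.
# All sales values stay strings, nothing is converted to numbers.
df = pd.DataFrame({
    'store_id': ['STORE01'] * 4,
    'city_id': ['CITY99'] * 4,
    'sales_A': ['100 Item', None, None, '150 Order'],
    'sales_B': [None, '200 Order', None, None],
    'sales_C': [None, None, '300 Client', '300 Client'],
})

print(df)
print(df.dtypes)  # string columns (object dtype in pandas 2.x)
```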
In pandas, how can I merge them into one row, so that the resulting dataset looks like this:
store_id | city_id | sales_A | sales_B | sales_C |
---|---|---|---|---|
STORE01 | CITY99 | 100 Item, 150 Order | 200 Order | 300 Client |
Thanks
asked Aug 16, 2021 at 6:46
Use a custom lambda function in `GroupBy.agg` that removes `None` values and duplicates, then joins the remaining values with `, `:
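To see what the lambda does on its own, here is a small sketch applying it to single Series that stand in for one column of one group (the Series values are taken from the question's table). `dropna()` removes the missing cells, `unique()` keeps each distinct string once in order of first appearance, and `', '.join` glues them together.

```python
import pandas as pd

# The aggregation function from the answer, applied to one column of one group.
f = lambda x: ', '.join(x.dropna().unique())

sales_a = pd.Series(['100 Item', None, None, '150 Order'])
print(f(sales_a))  # 100 Item, 150 Order

# The repeated '300 Client' appears only once thanks to unique().
sales_c = pd.Series([None, None, '300 Client', '300 Client'])
print(f(sales_c))  # 300 Client
```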
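import pandas as pd

#if the None values are strings ('None'), convert them to real None first
#df = df.mask(df == 'None', None)

# per group and column: drop missing values, drop duplicates, join with ', '
f = lambda x: ', '.join(x.dropna().unique())
df = df.groupby(['store_id','city_id'], as_index=False).agg(f)
print(df)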
store_id city_id sales_A sales_B sales_C
0 STORE01 CITY99 100 Item, 150 Order 200 Order 300 Client
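A self-contained sketch of the whole pipeline, assuming the missing cells were read in as the literal string `'None'` (for example from a CSV file), which is the case the commented-out `mask` line is for. Here `mask` is called without `other`, so the `'None'` strings become NaN rather than `None`; `dropna()` removes either kind of missing value, so the result is the same.

```python
import pandas as pd

# Assumption: missing cells are the text 'None', not real missing values.
df = pd.DataFrame({
    'store_id': ['STORE01'] * 4,
    'city_id': ['CITY99'] * 4,
    'sales_A': ['100 Item', 'None', 'None', '150 Order'],
    'sales_B': ['None', '200 Order', 'None', 'None'],
    'sales_C': ['None', 'None', '300 Client', '300 Client'],
})

# Replace the 'None' strings with missing values so dropna() can drop them.
df = df.mask(df == 'None')

# Collapse each (store_id, city_id) group into one row of joined strings.
f = lambda x: ', '.join(x.dropna().unique())
out = df.groupby(['store_id', 'city_id'], as_index=False).agg(f)
print(out)
```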
answered Aug 16, 2021 at 6:55
Timothy
Sorry, I updated the question. One combination can have more than one row for sales_A, sales_B, or sales_C. Your approach works if I only have one NaN value per group.
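To check the concern in this comment, here is a sketch with hypothetical data (the STORE02/CITY10 rows and the extra values are invented for illustration): two groups, rows in mixed order, and several non-missing values in sales_B and sales_C within a group. The same `agg` call joins every distinct value per column, so multiple rows per column are handled.

```python
import pandas as pd

# Hypothetical data: two groups, mixed row order, several values per column.
df = pd.DataFrame({
    'store_id': ['STORE01', 'STORE02', 'STORE01', 'STORE02', 'STORE01', 'STORE02'],
    'city_id':  ['CITY99', 'CITY10', 'CITY99', 'CITY10', 'CITY99', 'CITY10'],
    'sales_A':  ['100 Item', None, None, '5 Item', None, None],
    'sales_B':  [None, '7 Order', '200 Order', None, '250 Order', '7 Order'],
    'sales_C':  ['300 Client', None, None, '9 Client', '310 Client', None],
})

f = lambda x: ', '.join(x.dropna().unique())
out = df.groupby(['store_id', 'city_id'], as_index=False).agg(f)
print(out)
```

If repeated values should be kept instead of deduplicated, dropping `.unique()` from the lambda (`', '.join(x.dropna())`) joins every non-missing value.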