I have a dataframe like this:
| product1 | product2 | product3 | product4 | product5 | time |
|---|---|---|---|---|---|
| straws | orange | melon | chair | bread | 1 |
| melon | milk | book | coffee | cake | 2 |
| bread | bananas | juice | chair | book | 3 |
| straws | coffee | cake | milk | orange | 4 |
I need the time step for each item, i.e. the time elapsed since the same product last appeared (0 if it has not appeared before).
Example:
| TimesProduct1 | TimesProduct2 | TimesProduct3 | TimesProduct4 | TimesProduct5 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 2 | 1 |
| 3 | 2 | 2 | 2 | 3 |
Unfortunately, df.diff() doesn't work that way.
Thanks for your help.
- It is really unclear what you are wanting to do. – Riley, Oct 11, 2022 at 0:47
- OK, so for example I have a first client at time 1 who bought straws, then at time 4 another client bought the same product. The time step is 3: it is the difference in time between consecutive appearances of the product. – Bry Sab, Oct 11, 2022 at 12:24
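To make the definition in the comment above concrete, here is a minimal sketch for a single product. The name `straws_times` is only illustrative; the times 1 and 4 come from the table in the question.

```python
import pandas as pd

# Times at which "straws" was bought, taken from the question's table.
straws_times = pd.Series([1, 4], name="time")

# diff() yields NaN for the first appearance (there is no earlier one);
# fill it with 0 to match the expected output in the question.
straws_steps = straws_times.diff().fillna(0)

print(straws_steps.tolist())  # [0.0, 3.0]
```

The second value, 3, is the time step described in the comment: 4 minus 1.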
1 Answer
setup
import pandas as pd

df = pd.DataFrame(
    {
        "product1": ["straws", "melon", "bread", "straws"],
        "product2": ["orange", "milk", "bananas", "coffee"],
        "product3": ["melon", "book", "juice", "cake"],
        "product4": ["chair", "coffee", "chair", "milk"],
        "product5": ["bread", "cake", "book", "orange"],
        "time": [1, 2, 3, 4],
    }
)
solution
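result = (
    df.melt(id_vars="time")  # long format: one row per (time, column, product)
    .groupby("value")  # one group per product name
    # within each product, time gap between consecutive appearances
    .apply(lambda d: d.sort_values("time").eval("diff = time.diff()"))
    .pivot(index="time", columns="variable", values="diff")  # back to wide
    .fillna(0)  # a first appearance has no previous time
    .reset_index(drop=True)
)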
solution 2 (for pandas bug workaround)
def make_diff_column(df):
    # sort_values returns a copy, so the caller's frame is not modified
    df = df.sort_values("time")
    # time gap between consecutive appearances of this product
    df["diff"] = df["time"].diff()
    return df


result = (
    df.melt(id_vars="time")
    .groupby("value")
    .apply(make_diff_column)
    .pivot(index="time", columns="variable", values="diff")
    .fillna(0)
    .reset_index(drop=True)
)
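As a possible alternative, the same table can be built without groupby(...).apply, by calling diff() directly on the grouped time column. This is only a sketch under the assumption that each product appears at most once per time value (true for the sample data); the name `long` is illustrative and not part of the solutions above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "product1": ["straws", "melon", "bread", "straws"],
        "product2": ["orange", "milk", "bananas", "coffee"],
        "product3": ["melon", "book", "juice", "cake"],
        "product4": ["chair", "coffee", "chair", "milk"],
        "product5": ["bread", "cake", "book", "orange"],
        "time": [1, 2, 3, 4],
    }
)

# Long format, ordered by time so diff() compares consecutive appearances.
long = df.melt(id_vars="time").sort_values("time")

# Per-product gap to the previous appearance; NaN for a first appearance.
long["diff"] = long.groupby("value")["time"].diff()

result = (
    long.pivot(index="time", columns="variable", values="diff")
    .fillna(0)  # first appearances become 0
    .reset_index(drop=True)
)
print(result)
```

Because this avoids the function passed to apply, it may sidestep version-specific behaviour of groupby(...).apply, though that has not been checked against every pandas release.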
answered Oct 11, 2022 at 21:48
Riley
7 Comments
Bry Sab
I get an error message, "unhashable type: 'Series'", on this line: .apply(lambda d: d.sort_values("time").eval("diff = time.diff()"))
Bry Sab
I've tried hard to get this working, and it is a very important task. I have another problem very similar to this one. Imagine I have a camera in a forest, and each day more than 10 different kinds of animals cross in front of it. All I need is the overlap between days, or more exactly the time difference (dt). With over 100 different species, the overlap is a very important aspect.
Riley
What version of pandas are you using?
Bry Sab
The version that comes with Colab, which is 1.3.5.
Riley
We're using different versions of pandas, which is no doubt why you can't reproduce my result. See if Solution 2 works for you.