I have a dataframe like this:

product1 product2 product3 product4 product5 time
straws orange melon chair bread 1
melon milk book coffee cake 2
bread bananas juice chair book 3
straws coffee cake milk orange 4

I need the time step for each item, i.e. the time elapsed since that item last appeared in any column (0 if it has not appeared before).

Example:

TimesProduct1 TimesProduct2 TimesProduct3 TimesProduct4 TimesProduct5
0 0 0 0 0
1 0 0 0 0
2 0 0 2 1
3 2 2 2 3

Unfortunately, df.diff() doesn't work that way.

Thanks for your help.

asked Oct 10, 2022 at 22:18
  • It is really unclear what you want to do. Commented Oct 11, 2022 at 0:47
  • OK, so for example a first client bought straws at time 1, then another client bought the same product at time 4. The time step is 3: it is the difference between the times at which a product appears. Commented Oct 11, 2022 at 12:24
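
To make the definition in the comment above concrete, here is a minimal sketch for a single product. The variable names are illustrative, and filling the first appearance with 0 is an assumption taken from the expected output in the question.

```python
import pandas as pd

# Times at which "straws" appears in the question's sample data.
straws_times = pd.Series([1, 4], name="time")

# The first appearance has no predecessor, so its step is filled with 0
# (an assumption that matches the expected output in the question).
steps = straws_times.diff().fillna(0).astype(int)
print(steps.tolist())
```

The second value is the gap between time 4 and time 1, which is the "time step" the question asks for.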

1 Answer

setup

import pandas as pd

# Sample data from the question: one row per time step, five products per row.
df = pd.DataFrame(
    {
        "product1": ["straws", "melon", "bread", "straws"],
        "product2": ["orange", "milk", "bananas", "coffee"],
        "product3": ["melon", "book", "juice", "cake"],
        "product4": ["chair", "coffee", "chair", "milk"],
        "product5": ["bread", "cake", "book", "orange"],
        "time": [1, 2, 3, 4],
    }
)

solution

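result = (
    # Long format: one row per (time, column name, product).
    df.melt(id_vars="time")
    # Handle each product separately.
    .groupby("value")
    # Within a product, sort by time and take the gap to its previous appearance.
    .apply(lambda d: d.sort_values("time").eval("diff = time.diff()"))
    # Back to wide format: one row per time, one column per original column.
    .pivot(index="time", columns="variable", values="diff")
    # A first appearance has no previous time, so its gap is set to 0.
    .fillna(0)
    .reset_index(drop=True)
)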

solution 2 (for pandas bug workaround)

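def make_diff_column(df):
    # Sort one product's rows by time, then take the gap to its previous appearance.
    df = df.sort_values("time")
    df["diff"] = df["time"].diff()
    return df


result = (
    # Long format: one row per (time, column name, product).
    df.melt(id_vars="time")
    .groupby("value")
    # Same logic as above, but with a plain function instead of eval.
    .apply(make_diff_column)
    # Back to wide format: one row per time, one column per original column.
    .pivot(index="time", columns="variable", values="diff")
    # A first appearance has no previous time, so its gap is set to 0.
    .fillna(0)
    .reset_index(drop=True)
)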
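
A possible alternative, offered only as a sketch: the same per-product gap can probably be computed without `groupby(...).apply` by calling `diff()` directly on the grouped `time` column. The name `long` and the final `astype(int)` are illustrative choices, and the behaviour has not been checked against every pandas version mentioned in the comments.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "product1": ["straws", "melon", "bread", "straws"],
        "product2": ["orange", "milk", "bananas", "coffee"],
        "product3": ["melon", "book", "juice", "cake"],
        "product4": ["chair", "coffee", "chair", "milk"],
        "product5": ["bread", "cake", "book", "orange"],
        "time": [1, 2, 3, 4],
    }
)

# Long format, ordered by time so that diff() compares consecutive appearances.
long = df.melt(id_vars="time").sort_values("time", kind="stable")

# Gap between each appearance of a product and its previous appearance.
long["diff"] = long.groupby("value")["time"].diff()

result = (
    long.pivot(index="time", columns="variable", values="diff")
    .fillna(0)       # first appearances have no previous time
    .astype(int)     # illustrative: integer steps instead of floats
    .reset_index(drop=True)
)
print(result)
```

Because no user-defined function is passed to `apply`, this variant sidesteps the `eval` call that raised the error reported in the comments.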
answered Oct 11, 2022 at 21:48

7 Comments

I get an error message, "unhashable type: 'Series'", on this line: ".apply(lambda d: d.sort_values("time").eval("diff = time.diff()"))"
I've tried hard to make this work, because it's a very important task. I have another problem that is very similar to this one. Imagine I have a camera in a forest, and each day more than 10 different kinds of animals cross in front of it. All I need is the overlap between days, or more exactly the dt. With over 100 different species, overlap is a very important aspect.
What version of pandas are you using?
The Colab version, which is 1.3.5.
We're using different versions of pandas, which is no doubt why the result isn't reproducible. Can you check whether Solution 2 works for you?