Calculating time deltas between rows in a Pandas dataframe

Question 1

I am trying to compute the difference in timestamps and make a delta time column in a Pandas dataframe. This is the code I am currently using:

# Make x sequential in time
x.sort_values('timestamp',ascending=False)
x.reset_index(drop=True)
# Initialize a list to store the delta values
time_delta = [pd._libs.tslib.Timedelta('NaT')]
# Loop through the table and compute deltas
for i in range(1, len(x)):
    time_delta.append(x.loc[i, 'timestamp'] - x.loc[i-1, 'timestamp'])
# Compute a Pandas Series from the list 
time_delta = pd.Series(time_delta)
# Attach the Series back to the original df
x['time_delta'] = time_delta

It seems like there should be a more efficient / vectorized way of doing this simple operation, but I can't seem to figure it out.

Suggestions on improving this code would be greatly appreciated.

Question 2

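You are probably missing two things:

The shift() method. With it you don't need to write the loop by hand.
The inplace parameter of methods such as x.sort_values(). Without inplace=True (or assigning the result back), x is left unchanged.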

Example code


from datetime import datetime, timedelta
from random import randint

import pandas as pd

if __name__ == "__main__":
    # Prepare table x with an unsorted timestamp column
    date_today = datetime.now()
    timestamps = [date_today + timedelta(seconds=randint(1, 1000)) for _ in range(5)]
    x = pd.DataFrame(data={'timestamp': timestamps})
    # Make x sequential in time
    x.sort_values('timestamp', ascending=True, inplace=True)
    # Compute time_delta: each row minus the previous row (first row is NaT)
    x['time_delta'] = x['timestamp'] - x['timestamp'].shift()
    print(x)
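
To illustrate the second point, here is a minimal sketch with made-up timestamps (the values are assumptions chosen for illustration). By default sort_values returns a new DataFrame, so the call in the question has no effect unless the result is assigned back or inplace=True is passed:

```python
import pandas as pd

# Hypothetical, deliberately unsorted timestamps (illustrative values only)
x = pd.DataFrame({'timestamp': pd.to_datetime([
    '2018-12-20 10:00:30',
    '2018-12-20 10:00:00',
    '2018-12-20 10:00:10',
])})

# Without assignment (or inplace=True) the original frame is left untouched
x.sort_values('timestamp')
unchanged = x['timestamp'].is_monotonic_increasing  # still unsorted here

# Assigning the result back sorts x; reset_index gives a clean 0..n-1 index
x = x.sort_values('timestamp').reset_index(drop=True)
x['time_delta'] = x['timestamp'] - x['timestamp'].shift()
print(x)
```

Resetting the index only matters if later code looks rows up by label, as the loop in the question does with x.loc[i, ...].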

Question 3

Using x.timestamp.diff() (possibly with -1 as the argument) might be even simpler.

Question 4

Yes, x['time_delta'] = x.timestamp.diff() is simpler.
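
As a quick check of the two comments above, this sketch (with made-up, already sorted timestamps) compares diff() against the explicit shift() subtraction and shows what passing -1 does:

```python
import pandas as pd

# Hypothetical sorted timestamps (illustrative values only)
x = pd.DataFrame({'timestamp': pd.to_datetime([
    '2018-12-20 10:00:00',
    '2018-12-20 10:00:10',
    '2018-12-20 10:00:30',
])})

# diff() subtracts the previous row, same as timestamp - timestamp.shift()
backward = x['timestamp'].diff()
via_shift = x['timestamp'] - x['timestamp'].shift()
same = backward.equals(via_shift)

# diff(-1) subtracts the next row instead, so the last entry is NaT
forward = x['timestamp'].diff(-1)
print(same)
print(backward)
print(forward)
```

Note that on ascending data diff(-1) yields negative deltas, since each row is compared with the later row that follows it.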

Question 5

Feel free to include it in your answer if you want.

Question 6

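Use diff(). The first row has no predecessor, so diff() returns NaT there; fillna can replace it, for example with a zero delta:

 x['time_delta'] = x.timestamp.diff().fillna(pd.Timedelta(0))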

The same idea in a simpler example, using the diff() Series method with fillna to replace the first value in the series:

s = pd.Series([11, 13, 56, 60, 65])
s.diff().fillna(s)
0    11.0
1     2.0
2    43.0
3     4.0
4     5.0
dtype: float64
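
For a timestamp column there is no natural value to copy into the first row, so one option (an assumption here, not something this answer specifies) is to fill it with a zero Timedelta. The sketch below does that on made-up data and also converts the deltas to seconds via the .dt accessor; the delta_seconds column name is hypothetical:

```python
import pandas as pd

# Hypothetical sorted timestamps (illustrative values only)
x = pd.DataFrame({'timestamp': pd.to_datetime([
    '2018-12-20 10:00:00',
    '2018-12-20 10:00:10',
    '2018-12-20 10:00:30',
])})

# Row-to-row difference; the first row (NaT) is replaced by a zero delta
x['time_delta'] = x['timestamp'].diff().fillna(pd.Timedelta(0))

# Optional: express the deltas as plain floating-point seconds
x['delta_seconds'] = x['time_delta'].dt.total_seconds()
print(x)
```

Whether a zero delta is the right filler depends on the analysis; leaving the first value as NaT is equally valid if downstream code handles missing values.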

This was compiled from the comments below the current best answer (which I did not notice at first, so I kept searching) and from the Stack Overflow link that explained the fillna approach. I hope it can be moved toward the top for future readers. Happy data processing!

Question 7

An alternative to fillna() is to drop the first element altogether:

s = pd.Series([11, 13, 56, 60, 65])
s.diff()[1:]
1     2.0
2    43.0
3     4.0
4     5.0
dtype: float64
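
A small sketch of two ways to drop that first element, using the same series as above. The variable names are hypothetical, and dropna() is offered as a possible variant rather than something the comment proposes:

```python
import pandas as pd

s = pd.Series([11, 13, 56, 60, 65])

# Positional slice: always discards exactly the first row
by_slice = s.diff().iloc[1:]

# dropna(): discards every NaN, which here is only the first row
by_dropna = s.diff().dropna()

print(by_slice.equals(by_dropna))
```

The two agree only when the input has no missing values of its own; with gaps in the data, dropna() would remove those rows as well.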

Accepted answer (the shift() approach) by vaeta, posted 2018-12-20 22:36:15Z.
