I am trying to compute the difference in timestamps and make a delta time column in a Pandas dataframe. This is the code I am currently using:
# Make x sequential in time
x.sort_values('timeseries',ascending=False)
x.reset_index(drop=True)
# Initialize a list to store the delta values
time_delta = [pd._libs.tslib.Timedelta('NaT')]
# Loop through the table and compute deltas
for i in range(1,len(x)):
    time_delta.append(x.loc[i,'timestamp'] - x.loc[i-1,'timestamp'])
# Compute a Pandas Series from the list
time_delta = pd.Series(time_delta)
# Attach the Series back to the original df
x['time_delta'] = time_delta
It seems like there should be a more efficient / vectorized way of doing this simple operation, but I can't seem to figure it out.
Suggestions on improving this code would be greatly appreciated.
2 Answers
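You are probably missing two things:
- shift(): with it you don't need to loop by hand.
- The inplace parameter in methods such as x.sort_values(): without inplace=True (or assigning the result back to x), the sorted frame is discarded and x stays unchanged.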
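The point about inplace can be checked with a small sketch. The three-row frame below is made up for illustration and is not the asker's data; it only shows that calling sort_values() without keeping the result leaves x as it was.

```python
import pandas as pd

# Hypothetical three-row frame, deliberately out of order
x = pd.DataFrame({"timestamp": pd.to_datetime(
    ["2018-12-21 10:00:05", "2018-12-21 10:00:00", "2018-12-21 10:00:02"])})

# The return value is thrown away here, so x keeps its original order
x.sort_values("timestamp")
sorted_before = x["timestamp"].is_monotonic_increasing

# Assigning the result back (or passing inplace=True) keeps the sorted frame
x = x.sort_values("timestamp").reset_index(drop=True)
sorted_after = x["timestamp"].is_monotonic_increasing

print(sorted_before, sorted_after)
```

The same applies to reset_index(drop=True), which also returns a new frame unless inplace=True is passed.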
Example code
from datetime import datetime, timedelta
import pandas as pd
from random import randint
if __name__ == "__main__":
    # Prepare table x with unsorted timestamp column
    date_today = datetime.now()
    timestamps = [date_today + timedelta(seconds=randint(1, 1000)) for _ in range(5)]
    x = pd.DataFrame(data={'timestamp': timestamps})
    # Make x sequential in time
    x.sort_values('timestamp', ascending=True, inplace=True)
    # Compute time_delta
    x['time_delta'] = x['timestamp'] - x['timestamp'].shift()
    print(x)
Using x.time_delta.diff() (possibly with -1 as argument) might be even simpler. – Graipher, Dec 21, 2018 at 16:56
Yes, x['time_delta'] = x.timestamp.diff() is simpler. – vaeta, Dec 21, 2018 at 22:46
Feel free to include it in your answer if you want. – Graipher, Dec 21, 2018 at 23:52
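To illustrate the comments above, here is a minimal sketch comparing the shift() subtraction with diff() on a timestamp column, and showing what the -1 argument does. The three timestamps are invented for the example.

```python
import pandas as pd

# Hypothetical, already sorted timestamps
x = pd.DataFrame({"timestamp": pd.to_datetime(
    ["2018-12-21 10:00:00", "2018-12-21 10:00:02", "2018-12-21 10:00:05"])})

# Both expressions subtract the previous row from the current one
by_shift = x["timestamp"] - x["timestamp"].shift()
by_diff = x["timestamp"].diff()
same = by_shift.equals(by_diff)

# diff(-1) subtracts the *next* row instead, so the deltas are negative
# and the missing value moves from the first row to the last one
backward = x["timestamp"].diff(-1)

print(same)
print(by_diff)
print(backward)
```

Which direction is wanted depends on whether the frame is sorted ascending or descending.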
Use diff():
# The first row has no predecessor, so its NaT is filled with a zero delta
x['time_delta'] = x.timestamp.diff().fillna(pd.Timedelta(0))
This works as below, in a simpler example. You could use the diff() Series method (with fillna to replace the first value in the series):
s = pd.Series([11, 13, 56, 60, 65])
s.diff().fillna(s)
0    11.0
1     2.0
2    43.0
3     4.0
4     5.0
dtype: float64
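For a timestamp column the differences are Timedelta values and the first one is NaT rather than NaN. The sketch below assumes a zero delta is an acceptable filler for that first row, which is a choice for illustration and not something stated in the question; the timestamps are invented.

```python
import pandas as pd

# Hypothetical, already sorted timestamps
ts = pd.Series(pd.to_datetime(
    ["2018-12-21 10:00:00", "2018-12-21 10:00:02", "2018-12-21 10:00:05"]))

# diff() yields a timedelta64 series whose first entry is NaT;
# here it is replaced with a zero-length Timedelta (an assumption)
delta = ts.diff().fillna(pd.Timedelta(0))

# Convert to plain seconds if a numeric column is more convenient
seconds = delta.dt.total_seconds()

print(delta)
print(seconds.tolist())
```

If the first row should stay marked as missing, leaving out the fillna call keeps the NaT.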
This was compiled from the comments below the current best answer (which I failed to see and kept searching), and the Stack Overflow link that explained it with fillna, so I am hoping this can be lifted up to the top for future seekers. Happy data processing!
An alternative to fillna() is to drop the first element altogether:
s = pd.Series([11, 13, 56, 60, 65])
s.diff()[1:]
1     2.0
2    43.0
3     4.0
4     5.0
dtype: float64
– Andrew Irwin, Mar 16, 2021 at 20:40