10
\$\begingroup\$

I am trying to compute the difference in timestamps and make a delta time column in a Pandas dataframe. This is the code I am currently using:

# Make x sequential in time
x.sort_values('timeseries',ascending=False)
x.reset_index(drop=True)
# Initialize a list to store the delta values
time_delta = [pd._libs.tslib.Timedelta('NaT')]
# Loop through the table and compute deltas
for i in range(1,len(x)):
 time_delta.append(x.loc[i,'timestamp'] - x.loc[i-1,'timestamp'])
# Compute a Pandas Series from the list 
time_delta = pd.Series(time_delta)
# Attach the Series back to the original df
x['time_delta'] = time_delta

It seems like there should be a more efficient / vectorized way of doing this simple operation, but I can't seem to figure it out.

Suggestions on improving this code would be greatly appreciated.

200_success
asked Dec 20, 2018 at 19:34
\$\endgroup\$

2 Answers

5
\$\begingroup\$

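You are probably missing two things: sort_values returns a new DataFrame unless you pass inplace=True (or assign the result back), and Series.shift() lets you subtract each timestamp from the previous one without a loop.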

Example code


from datetime import datetime, timedelta
import pandas as pd
from random import randint
if __name__ == "__main__":
    # Prepare table x with unsorted timestamp column
    date_today = datetime.now()
    timestamps = [date_today + timedelta(seconds=randint(1, 1000)) for _ in range(5)]
    x = pd.DataFrame(data={'timestamp': timestamps})
    # Make x sequential in time
    x.sort_values('timestamp', ascending=True, inplace=True)
    # Compute time_delta as the difference from the previous row
    x['time_delta'] = x['timestamp'] - x['timestamp'].shift()
    print(x)
answered Dec 20, 2018 at 22:36
\$\endgroup\$
3
  • 3
    \$\begingroup\$ Using x.timestamp.diff() (possibly with -1 as argument) might be even simpler. \$\endgroup\$ Commented Dec 21, 2018 at 16:56
  • \$\begingroup\$ Yes, x['time_delta'] = x.timestamp.diff() is simpler. \$\endgroup\$ Commented Dec 21, 2018 at 22:46
  • \$\begingroup\$ Feel free to include it in your answer if you want. \$\endgroup\$ Commented Dec 21, 2018 at 23:52
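
To make the comments above concrete, here is a minimal sketch comparing the shift-based subtraction with diff(), and showing what the -1 argument does. The three timestamps are made up for illustration; only the column name timestamp comes from the question.

```python
import pandas as pd

# Hypothetical, fixed timestamps so the result is reproducible
x = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-12-20 19:34:10",
        "2018-12-20 19:34:00",
        "2018-12-20 19:35:00",
    ])
})

# sort_values returns a new frame, so assign it back
x = x.sort_values("timestamp").reset_index(drop=True)

# Difference from the previous row: the first entry is NaT
backward = x["timestamp"].diff()

# The explicit shift-based form gives the same Series
via_shift = x["timestamp"] - x["timestamp"].shift()

# periods=-1 subtracts the *next* row instead: the last entry is NaT
# and, on ascending data, the deltas are negative
forward = x["timestamp"].diff(periods=-1)

print(backward)
print(forward)
```

With ascending timestamps, diff() is usually the form you want; diff(periods=-1) is mainly useful if the frame is sorted in descending order, as in the question.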
9
\$\begingroup\$

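Use the diff() method. The first row has no predecessor, so its delta is NaT; fillna can replace it, for example with a zero Timedelta:

 x['time_delta'] = x.timestamp.diff().fillna(pd.Timedelta(0))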

This works as below, in a simpler example.

You could use the diff() Series method (with fillna to replace the first value in the series):

import pandas as pd
s = pd.Series([11, 13, 56, 60, 65])
s.diff().fillna(s)
0    11.0
1     2.0
2    43.0
3     4.0
4     5.0
dtype: float64
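
The same pattern carries over to a timestamp column. The sketch below uses made-up timestamps, and both the choice to fill the first row with a zero Timedelta and the extra delta_seconds column are assumptions for illustration, not something the question asked for.

```python
import pandas as pd

# Hypothetical timestamps, already in ascending order
x = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-05-05 22:47:00",
        "2020-05-05 22:47:30",
        "2020-05-05 22:49:00",
    ])
})

# diff() yields NaT for the first row; replace it with a zero Timedelta
x["time_delta"] = x["timestamp"].diff().fillna(pd.Timedelta(0))

# Optional: a plain float column in seconds, which is easier to plot or aggregate
x["delta_seconds"] = x["time_delta"].dt.total_seconds()

print(x)
```

Whether to fill the first row or leave it as NaT depends on what the column is used for; leaving NaT in place keeps "no previous sample" distinguishable from "zero seconds elapsed".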

I compiled this from the comments below the current top answer (which I missed at first and kept searching) and from a Stack Overflow post that explained the fillna step. I hope it can be moved up for future readers. Happy data processing!

answered May 5, 2020 at 22:47
\$\endgroup\$
1
  • \$\begingroup\$ An alternative to fillna() is to drop the first element altogether:
    s = pd.Series([11, 13, 56, 60, 65])
    s.diff()[1:]
    1     2.0
    2    43.0
    3     4.0
    4     5.0
    dtype: float64
    \$\endgroup\$ Commented Mar 16, 2021 at 20:40
