I started using resample_apply to resample a 1-minute dataframe to a 5-minute one. I think the data from the resampled timeframe is added one row too late on the base timeframe, though I am not sure. Here is a minimal example that just resamples the close:
import pandas as pd
from pandas import Timestamp
from backtesting.lib import resample_apply
df = pd.DataFrame(data={'Close': [18120.0, 18120.25, 18118.75, 18119.5, 18119.25, 18119.25, 18119.0, 18117.25, 18117.25, 18117.0, 18117.0, 18118.5, 18118.5, 18120.5, 18121.75]}, index=[Timestamp('2024-03-01 00:00:00'), Timestamp('2024-03-01 00:01:00'), Timestamp('2024-03-01 00:02:00'), Timestamp('2024-03-01 00:03:00'), Timestamp('2024-03-01 00:04:00'), Timestamp('2024-03-01 00:05:00'), Timestamp('2024-03-01 00:06:00'), Timestamp('2024-03-01 00:07:00'), Timestamp('2024-03-01 00:08:00'), Timestamp('2024-03-01 00:09:00'), Timestamp('2024-03-01 00:10:00'), Timestamp('2024-03-01 00:11:00'), Timestamp('2024-03-01 00:12:00'), Timestamp('2024-03-01 00:13:00'), Timestamp('2024-03-01 00:14:00')])
resample_apply('5T', lambda x: x, df['Close'])
The result is:
2024-03-01 00:00:00         NaN
2024-03-01 00:01:00         NaN
2024-03-01 00:02:00         NaN
2024-03-01 00:03:00         NaN
2024-03-01 00:04:00         NaN
2024-03-01 00:05:00    18119.25
2024-03-01 00:06:00    18119.25
2024-03-01 00:07:00    18119.25
2024-03-01 00:08:00    18119.25
2024-03-01 00:09:00    18119.25
2024-03-01 00:10:00    18117.00
2024-03-01 00:11:00    18117.00
2024-03-01 00:12:00    18117.00
2024-03-01 00:13:00    18117.00
2024-03-01 00:14:00    18117.00
Name: C[5T], dtype: float64
From my understanding, we always operate on the close of the 1m candle. So at 2024-03-01 00:04:00 we are at the end of the 1m candle as well as the end of the 5m candle.
Therefore, shouldn't the value 18119.25 (the close of the 5m candle that started at 2024-03-01 00:00:00) first appear one row earlier, at 2024-03-01 00:04:00?
Thank you for any clarifications you can provide.
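To see which 1-minute rows feed each 5-minute value, the binning can be reproduced with plain pandas. This is only a sketch: it assumes resample_apply bins with label='right' and pandas' default closed='left' (which is what the replication in the reply below uses), and it uses the '5min' alias instead of '5T'.

```python
import pandas as pd

# Same 15 one-minute closes as in the example above.
closes = [18120.0, 18120.25, 18118.75, 18119.5, 18119.25,
          18119.25, 18119.0, 18117.25, 18117.25, 18117.0,
          18117.0, 18118.5, 18118.5, 18120.5, 18121.75]
index = pd.date_range('2024-03-01 00:00', periods=len(closes), freq='min')
s = pd.Series(closes, index=index, name='Close')

# Assumption: right-labelled bins with the default closed='left',
# i.e. each bin covers [start, start + 5min) and is stamped start + 5min.
bins = s.resample('5min', label='right').agg(['first', 'last', 'count'])
print(bins)
```

Under that assumption, the bin holding rows 00:00 through 00:04 is stamped 00:05, so its last value (18119.25) can only show up from the 00:05 row onward after forward-filling.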
Replies: 1 comment
The resampled time series looks odd to me as well, though I am not sure it is the same issue as this topic. The output of resample_apply doesn't keep the original timestamps, which is also mentioned in #970. Below is code to reproduce the situation.
import pandas as pd
from pandas import Timestamp
from backtesting.lib import resample_apply, OHLCV_AGG

df = pd.DataFrame(data={'Close': [18120.0, 18120.25, 18118.75, 18119.5, 18119.25, 18119.25, 18119.0, 18117.25, 18117.25, 18117.0, 18117.0, 18118.5, 18118.5, 18120.5, 18121.75]}, index=[Timestamp('2024-03-01 00:00:00'), Timestamp('2024-03-01 00:01:00'), Timestamp('2024-03-01 00:02:00'), Timestamp('2024-03-01 00:03:00'), Timestamp('2024-03-01 00:04:00'), Timestamp('2024-03-01 00:05:00'), Timestamp('2024-03-01 00:06:00'), Timestamp('2024-03-01 00:07:00'), Timestamp('2024-03-01 00:08:00'), Timestamp('2024-03-01 00:09:00'), Timestamp('2024-03-01 00:10:00'), Timestamp('2024-03-01 00:11:00'), Timestamp('2024-03-01 00:12:00'), Timestamp('2024-03-01 00:13:00'), Timestamp('2024-03-01 00:14:00')])
df.Close = range(len(df))  # use simpler data

# replication of current resample: resample_apply('5T', lambda x: x, df['Close'])
resampled_1 = df.resample('5T', label='right').agg(OHLCV_AGG['Close']).reindex(index=df.index, method='ffill')

# passing closed='right'; if this is not passed, closed='left' will be used
resampled_2 = df.resample('5T', label='right', closed='right').agg(OHLCV_AGG['Close']).reindex(index=df.index, method='ffill')

pd.concat([df, resampled_1, resampled_2], axis=1, keys=['PreResample', 'CurrentResample', 'ExpectedResample'])
Output:
                    PreResample CurrentResample ExpectedResample
                          Close           Close            Close
2024-03-01 00:00:00           0             NaN                0
2024-03-01 00:01:00           1             NaN                0
2024-03-01 00:02:00           2             NaN                0
2024-03-01 00:03:00           3             NaN                0
2024-03-01 00:04:00           4             NaN                0
2024-03-01 00:05:00           5             4.0                5
2024-03-01 00:06:00           6             4.0                5
2024-03-01 00:07:00           7             4.0                5
2024-03-01 00:08:00           8             4.0                5
2024-03-01 00:09:00           9             4.0                5
2024-03-01 00:10:00          10             9.0               10
2024-03-01 00:11:00          11             9.0               10
2024-03-01 00:12:00          12             9.0               10
2024-03-01 00:13:00          13             9.0               10
2024-03-01 00:14:00          14             9.0               10
I expected the data to be updated every 5 minutes, with the rows in between filled with the previous value, as in ExpectedResample, while keeping the same timestamps as the original time series. That is, if PreResample has 5 at 2024-03-01 00:05:00, the resampled series should have the same value at that timestamp on every 5-minute boundary, but the current implementation gives 4 instead.
As a result, any indicator calculated through resample_apply would be shifted and would aggregate values from the previous bar.
Is the current behavior actually the intended one? Or does backtesting assume that timestamps mark the open time of a bar rather than its close time?
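One way to quantify the shift is to measure, for every base row, how many rows behind the forward-filled value is. The sketch below uses plain pandas only; resample_ffill is a hypothetical helper written for this illustration (it is not part of backtesting.py), and it assumes the same label='right' binning as the replication above.

```python
import pandas as pd

index = pd.date_range('2024-03-01 00:00', periods=15, freq='min')
# Each value equals its row position, so a resampled value reveals its source row.
s = pd.Series(range(15), index=index, name='Close')


def resample_ffill(series, rule, closed):
    """Hypothetical helper: last value per right-labelled bin,
    forward-filled back onto the original index."""
    binned = series.resample(rule, label='right', closed=closed).last()
    return binned.reindex(series.index, method='ffill')


rows = pd.Series(range(len(s)), index=s.index)
lags = {}
for closed in ('left', 'right'):
    out = resample_ffill(s, '5min', closed)
    # lag = row position minus source row; a negative lag would mean look-ahead.
    lags[closed] = (rows - out).dropna()
    print(closed, 'min lag:', lags[closed].min(), 'max lag:', lags[closed].max())
```

In this sketch neither variant reads a later row. With closed='left' every value is at least one row old, while with closed='right' the value on each 5-minute boundary comes from that same row. Which of the two is appropriate depends on whether a timestamp marks the open or the close of a bar, which is the question asked above.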