resample_apply - skewed result? #1122

Unanswered
mmarihart asked this question in Q&A

I started using resample_apply to resample a 1-minute DataFrame to 5 minutes. I think the resampled values land one row too late on the base timeframe, though I am not sure. Here is a minimal example that just resamples the close:

import pandas as pd
from pandas import Timestamp
from backtesting.lib import resample_apply
df = pd.DataFrame(data={'Close': [18120.0, 18120.25, 18118.75, 18119.5, 18119.25, 18119.25, 18119.0, 18117.25, 18117.25, 18117.0, 18117.0, 18118.5, 18118.5, 18120.5, 18121.75]}, index=[Timestamp('2024-03-01 00:00:00'), Timestamp('2024-03-01 00:01:00'), Timestamp('2024-03-01 00:02:00'), Timestamp('2024-03-01 00:03:00'), Timestamp('2024-03-01 00:04:00'), Timestamp('2024-03-01 00:05:00'), Timestamp('2024-03-01 00:06:00'), Timestamp('2024-03-01 00:07:00'), Timestamp('2024-03-01 00:08:00'), Timestamp('2024-03-01 00:09:00'), Timestamp('2024-03-01 00:10:00'), Timestamp('2024-03-01 00:11:00'), Timestamp('2024-03-01 00:12:00'), Timestamp('2024-03-01 00:13:00'), Timestamp('2024-03-01 00:14:00')])
resample_apply('5T', lambda x: x, df['Close'])

The result is:

2024-03-01 00:00:00         NaN
2024-03-01 00:01:00         NaN
2024-03-01 00:02:00         NaN
2024-03-01 00:03:00         NaN
2024-03-01 00:04:00         NaN
2024-03-01 00:05:00    18119.25
2024-03-01 00:06:00    18119.25
2024-03-01 00:07:00    18119.25
2024-03-01 00:08:00    18119.25
2024-03-01 00:09:00    18119.25
2024-03-01 00:10:00    18117.00
2024-03-01 00:11:00    18117.00
2024-03-01 00:12:00    18117.00
2024-03-01 00:13:00    18117.00
2024-03-01 00:14:00    18117.00
Name: C[5T], dtype: float64

From my understanding, we always operate on the close of the 1m candle. So at 2024-03-01 00:04:00 we are at the end of the 1m candle as well as the end of the 5m candle.

Therefore, shouldn't the value 18119.25 (the close of the 5m candle that started at 2024-03-01 00:00:00) start one row earlier, at 2024-03-01 00:04:00?

Thank you for any clarifications you can provide.
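
To make the alignment question concrete, here is a minimal pandas-only sketch (it does not call backtesting itself). It assumes the replication given further down in this thread, `resample(..., label='right')` with the default `closed='left'`, matches what resample_apply does, and it uses `'5min'` in place of `'5T'` because recent pandas versions deprecate the `T` alias. The names `bins`, `shifted`, and `aligned` are made up for illustration. The last step moves each bin label back by one base bar, which is one possible way to get the alignment asked about above; whether that introduces look-ahead depends on when each 1m close is actually known.

```python
import pandas as pd

# 15 one-minute rows; values 0..14 stand in for the closes.
idx = pd.date_range("2024-03-01 00:00", periods=15, freq="min")
close = pd.Series(range(15), index=idx, name="Close")

# closed='left' is the pandas default for minute rules, so each bin is
# [start, start + 5min); label='right' stamps the bin with its right edge.
bins = close.resample("5min", label="right").agg(["first", "last", "count"])
print(bins)

# Hypothetical realignment: move each label back by one base bar (1 minute),
# so a finished 5m value first appears on the last 1m row inside its bin.
shifted = close.resample("5min", label="right").last()
shifted.index = shifted.index - pd.Timedelta(minutes=1)
aligned = shifted.reindex(close.index, method="ffill")
print(aligned)
```

Under these assumptions the bin labelled 00:05:00 holds the rows stamped 00:00:00 through 00:04:00, which is why the forward-filled value first shows up on the 00:05:00 row.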

Replies: 1 comment

The resampled time series looks odd to me too, though I am not sure it is the same issue as this topic. The output of resample_apply does not keep the original timestamps, which is also mentioned in #970. The code below reproduces the situation.

import pandas as pd
from pandas import Timestamp
from backtesting.lib import resample_apply, OHLCV_AGG
df = pd.DataFrame(data={'Close': [18120.0, 18120.25, 18118.75, 18119.5, 18119.25, 18119.25, 18119.0, 18117.25, 18117.25, 18117.0, 18117.0, 18118.5, 18118.5, 18120.5, 18121.75]}, index=[Timestamp('2024-03-01 00:00:00'), Timestamp('2024-03-01 00:01:00'), Timestamp('2024-03-01 00:02:00'), Timestamp('2024-03-01 00:03:00'), Timestamp('2024-03-01 00:04:00'), Timestamp('2024-03-01 00:05:00'), Timestamp('2024-03-01 00:06:00'), Timestamp('2024-03-01 00:07:00'), Timestamp('2024-03-01 00:08:00'), Timestamp('2024-03-01 00:09:00'), Timestamp('2024-03-01 00:10:00'), Timestamp('2024-03-01 00:11:00'), Timestamp('2024-03-01 00:12:00'), Timestamp('2024-03-01 00:13:00'), Timestamp('2024-03-01 00:14:00')])
df.Close = range(len(df))  # use simpler data
# replication of current resample: resample_apply('5T', lambda x: x, df['Close'])
resampled_1 = df.resample('5T', label='right').agg(OHLCV_AGG['Close']).reindex(index=df.index, method='ffill')
# pass closed='right'; if it is omitted, closed='left' is used
resampled_2 = df.resample('5T', label='right', closed='right').agg(OHLCV_AGG['Close']).reindex(index=df.index, method='ffill')
pd.concat([df, resampled_1, resampled_2], axis=1, keys=['PreResample', 'CurrentResample', 'ExpectedResample'])

output

                    PreResample CurrentResample ExpectedResample
                          Close           Close            Close
2024-03-01 00:00:00           0             NaN                0
2024-03-01 00:01:00           1             NaN                0
2024-03-01 00:02:00           2             NaN                0
2024-03-01 00:03:00           3             NaN                0
2024-03-01 00:04:00           4             NaN                0
2024-03-01 00:05:00           5             4.0                5
2024-03-01 00:06:00           6             4.0                5
2024-03-01 00:07:00           7             4.0                5
2024-03-01 00:08:00           8             4.0                5
2024-03-01 00:09:00           9             4.0                5
2024-03-01 00:10:00          10             9.0               10
2024-03-01 00:11:00          11             9.0               10
2024-03-01 00:12:00          12             9.0               10
2024-03-01 00:13:00          13             9.0               10
2024-03-01 00:14:00          14             9.0               10

I expected the data to update every 5 minutes, with the rows in between filled with the previous value, as in ExpectedResample, keeping the same timestamps as the original time series. That is, if PreResample has 5 at 2024-03-01 00:05:00, the resampled series should have that same value at that timestamp, and likewise at every 5-minute mark, but the current implementation picks 4 instead.

Any indicator computed through resample_apply would therefore be shifted and would aggregate values from the previous bar.
Is the current behavior intended? Or does backtesting assume that timestamps mark the bar's open time rather than its close time?
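
One way to look at the open-time versus close-time question is the pandas-only sketch below. It does not inspect backtesting's internals. It only checks that, on this toy data, right-closed bins over close-time stamps give the same series as the default left-closed bins once every bar is relabelled with its open time, taken here to be one minute earlier. The variable names are hypothetical, and `'5min'` is used in place of `'5T'`.

```python
import pandas as pd

idx = pd.date_range("2024-03-01 00:00", periods=15, freq="min")
close = pd.Series(range(15), index=idx, name="Close")

# Convention A: stamps are bar CLOSE times, so bins are right-closed.
close_stamped = (
    close.resample("5min", label="right", closed="right")
    .last()
    .reindex(close.index, method="ffill")
)

# Convention B: relabel each bar with its OPEN time (one minute earlier),
# bin with the default closed='left', then map back onto the original index.
opened = close.copy()
opened.index = opened.index - pd.Timedelta(minutes=1)
open_stamped = (
    opened.resample("5min", label="right")
    .last()
    .reindex(close.index, method="ffill")
)

# Both conventions should agree row by row on this data.
print(close_stamped.tolist() == open_stamped.tolist())
```

If the two agree, the difference between CurrentResample and ExpectedResample comes down to which timestamp convention the input data follows, not to a different aggregation.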
