Resampling data · kernc/backtesting.py · Discussion #210

jamhot1
Dec 27, 2020

I have been playing around with resampling and decided to create a resampled hourly close column with pandas from 5-minute Yahoo data.
Because the resampled data only has a value every 12 rows, I want to store the last known value in a variable and use it in a conditional.
The data looks something like:
nan
nan
nan
nan
nan ....
109.66653
nan
nan...

I created a condition that checks for not-nan and stores that value in a variable to check against later in my trade logic.

When I print the variable to see what is being stored, it seems to have stored all values, including the nans, despite the conditional inside the next() method that should ignore all nan values.

I previously tried to store specific row values from within next() in a different way and ran into a similar issue. Am I missing something?

Example:
if not self.data.H_Close == 'nan' : self.Hourly = self.data.H_Close

When self.Hourly is printed on each iteration, I get all values, including nan.

Confused!


kernc
Dec 28, 2020
Maintainer

Checking for nan should be done with numpy.isnan() as nans are a particularly curious bunch:

>>> float('nan') == float('nan')
False
>>> import numpy as np
>>> np.nan == np.nan
False

But more importantly, why not simply use Series.ffill()?

>>> import pandas as pd
>>> pd.Series([1, np.nan, np.nan, 2, np.nan]).ffill().tolist()
[1.0, 1.0, 1.0, 2.0, 2.0]
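
As a rough sketch of how both suggestions could apply to the setup in the question: the data below is made up, and the H_Close column is built by hand to mimic an hourly close that appears only on every 12th 5-minute row. It is not the poster's actual resampling code.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-minute closes spanning two hours (24 bars).
index = pd.date_range("2020-12-24 09:00", periods=24, freq="5min")
df = pd.DataFrame({"Close": np.linspace(100.0, 111.5, 24)}, index=index)

# Mimic the question's column: the hourly close sits only on the last
# 5-minute bar of each hour; every other row is nan.
is_hour_end = df.index.minute == 55
df["H_Close"] = df["Close"].where(is_hour_end)

first = float(df["H_Close"].iloc[0])
string_check = first == "nan"    # a float nan never equals the string 'nan'
isnan_check = np.isnan(first)    # np.isnan detects it

# Forward-fill so every row carries the last known hourly close.
df["H_Close"] = df["H_Close"].ffill()
```

Rows before the first hourly value stay nan after ffill(), since there is nothing earlier to carry forward, so trade logic still needs to tolerate nan at the start of the data.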

jamhot1 Dec 28, 2020
Author

Thanks for the pointers, kernc!
I also tried:
df['D_Close'] = df['D_Close'].fillna('a')

and checked for 'a' instead:

def next(self):
    if not self.data.D_Close == 'a':
        self.Hourly = self.data.D_Close
    print(self.Hourly)

It still prints the 'a' values along with the numbers.

I will read up on Series.ffill().

jamhot1 Dec 28, 2020
Author

Just an update on where I think I was tripped up.

Where I had:
self.Hourly = self.data.D_Close

I found that this works instead:
self.Hourly = self.data.D_Close[-1]

This gave me printed output that no longer included the entire array above it (nans included).

What am I seeing here? I can't quite understand why I need to look back a row to see what I thought was my current row.

kernc Dec 29, 2020
Maintainer

This is pretty much how Python sequences work.
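
To illustrate with plain Python, here is a sketch that assumes the data visible inside next() grows by one bar per call, as the printed output in this thread suggests. The list and loop are hypothetical stand-ins, not the library's code. Index -1 is then the most recent (current) bar, not the previous one:

```python
import math

# Hypothetical hourly column: mostly nan, with an occasional real value.
h_close = [float("nan"), float("nan"), 109.66653,
           float("nan"), 110.2, float("nan")]

last_known = None
seen = []
for i in range(1, len(h_close) + 1):
    visible = h_close[:i]    # what a strategy would see at bar i
    current = visible[-1]    # -1 is the last element, i.e. the current bar
    if not math.isnan(current):
        last_known = current  # store a scalar, not the whole sequence
    seen.append(last_known)
```

Assigning visible itself instead of visible[-1] would store the whole sequence, which is why printing it shows every value including the nans.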

The reason a simple:

self.data.Close

(i.e. without explicit indexing) can be used in some (that is, boolean and numeric-scalar) contexts:

# e.g.
if self.data.Close > self.data.Low + 2:

is the following piece of code which does the indexing for you:

backtesting.py/backtesting/_util.py

Lines 59 to 69 in 1ee5670

def __bool__(self):
    try:
        return bool(self[-1])
    except IndexError:
        return super().__bool__()

def __float__(self):
    try:
        return float(self[-1])
    except IndexError:
        return super().__float__()

jamhot1 Dec 29, 2020
Author

Ahhh, yes. Of course. I do now recall this. What has thrown me is that you can call self.data.Close without [-1] within next(). So which line does it call in this case?

kernc Dec 29, 2020
Maintainer

Depends on the context:

if self.data.some_signal:

reduces to the result of the __bool__ method above, and:

self.buy(limit=self.data.Low)

to the __float__ one.

But:

if self.data.Close > sma:

actually applies the comparison to the whole array first, producing a new boolean array, and only afterwards reduces to its last value (via __bool__ again). In this way, it's way less efficient than the explicit case (if self.data.Close[-1] > sma), but it's easy to write, easy to read, and it works well enough.
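
The behaviour described above can be reproduced outside the library with a small NumPy subclass. This is only a sketch: LastValueArray is a hypothetical name, and the class copies just the two methods quoted earlier, not the library's actual array type.

```python
import numpy as np


class LastValueArray(np.ndarray):
    """Hypothetical stand-in: reduces to its last element in
    boolean and float contexts."""

    def __new__(cls, values):
        return np.asarray(values, dtype=float).view(cls)

    def __bool__(self):
        try:
            return bool(self[-1])
        except IndexError:
            return super().__bool__()

    def __float__(self):
        try:
            return float(self[-1])
        except IndexError:
            return super().__float__()


close = LastValueArray([9.0, 10.5, 11.0])

# The comparison runs over the whole array and yields a boolean array
# of the same subclass ...
mask = close > 10
# ... which only then reduces to its last element via __bool__.
implicit = bool(mask)
# The explicit form compares a single scalar.
explicit = bool(close[-1] > 10)
# float() takes the __float__ path and also yields the last element.
last = float(close)
```

Both forms give the same answer here; the implicit one just does the extra work of comparing every element first.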

Answer selected by jamhot1
