I have been playing around with resampling and decided to use pandas to create an hourly resampled close column from 5-minute Yahoo data.
Because the resampled data only has a value every 12 rows, I wanted to build a conditional on a variable that stores the last known value.
The data looks something like:
nan
nan
nan
nan
nan ....
109.66653
nan
nan...
I created a condition to check that the value is not nan and to store that value in a variable, so I can check against it later in my trade logic.
When I print the variable to see what is being stored, it seems to have stored all values, including the nans, despite the conditional inside the next() method that should ignore all nan values.
I previously tried to store specific row values from within the next() method in a different way and seemed to run into a similar issue. Am I missing something?
example:
if not self.data.H_Close == 'nan' : self.Hourly = self.data.H_Close
When self.Hourly is printed on each iteration, I get all values, including nan.
Confused!
Checking for nan should be done with numpy.isnan(), as nans are a particularly curious bunch:
>>> float('nan') == float('nan')
False
>>> import numpy as np
>>> np.nan == np.nan
False
But more importantly, why not simply use Series.ffill()?
>>> import pandas as pd
>>> pd.Series([1, np.nan, np.nan, 2, np.nan]).ffill().tolist()
[1.0, 1.0, 1.0, 2.0, 2.0]
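To make both suggestions concrete, here is a minimal sketch. The column layout (a value on every 12th row, nan elsewhere) follows the question, but the prices, the timestamps, and the names `h_close`, `last_known` and `carried` are made up for illustration. It shows that comparing against the string 'nan' never matches, that numpy.isnan() does, and that a manual "remember the last known value" loop gives the same result as Series.ffill().

```python
import numpy as np
import pandas as pd

# Hypothetical 5-minute closes covering two hours (24 bars); values are made up.
index = pd.date_range("2021-01-04 09:00", periods=24, freq="5min")
close = pd.Series(np.arange(100.0, 124.0), index=index)

# Sparse hourly column as described in the question:
# a value on every 12th row, NaN everywhere else.
h_close = pd.Series(np.nan, index=index)
h_close.iloc[11::12] = close.iloc[11::12].to_numpy()

first = float(h_close.iloc[0])         # one of the NaN rows
string_check = first == "nan"          # a float never equals the string 'nan'
isnan_check = bool(np.isnan(first))    # the reliable NaN test

# Manual version of "store the last known value", using np.isnan().
last_known = np.nan
carried = []
for value in h_close:
    if not np.isnan(value):
        last_known = value
    carried.append(last_known)

# The same result in one call.
filled = h_close.ffill()
```

Rows before the first hourly value stay nan in both versions, so any trade logic still needs to handle the first hour.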
Thanks for the pointers Kernc!!
I also tried:
df['D_Close'] = df['D_Close'].fillna('a')
and checked for 'a' instead:
def next(self):
    if not self.data.D_Close == 'a':
        self.Hourly = self.data.D_Close
        print(self.Hourly)
And it prints the 'a' values along with the numbers.
I will read up on Series.ffill().
Just an update on where I think I was tripped up.
Where I had
self.Hourly = self.data.D_Close
I found that changing it to
self.Hourly = self.data.D_Close[-1]
gave me printed output that no longer included the entire array (nans included) above the current value.
What am I seeing here? I can't quite understand why I need to look back a row to see what I thought was my current row.
This is pretty much how Python sequences work.
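As a rough illustration in plain NumPy (not backtesting.py itself), assume that on each bar the column is an array of all rows up to and including the current one. The values and the names `d_close`, `visible`, `stored_arrays` and `stored_values` are hypothetical. Index -1 is then the last element of that array, i.e. the current row, not the previous one.

```python
import numpy as np

# Hypothetical column with a value on every third row and NaN elsewhere.
d_close = np.array([np.nan, np.nan, 109.67, np.nan, np.nan, 110.25])

stored_arrays = []
stored_values = []
for bar in range(1, len(d_close) + 1):
    # Assumption: on each bar, only rows up to the current one are visible.
    visible = d_close[:bar]
    # Comparable to `self.Hourly = self.data.D_Close`: keeps the whole array.
    stored_arrays.append(visible)
    # Comparable to `self.Hourly = self.data.D_Close[-1]`: keeps one number,
    # the last element, which is the current row.
    stored_values.append(visible[-1])
```

Printing `stored_arrays[-1]` shows every row seen so far, nans included, which matches what the question describes; `stored_values[-1]` is a single float.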
The reason a simple:
self.data.Close
(i.e. without explicit indexing) can be used in some (that is, boolean and numeric-scalar) contexts:
# e.g. if self.data.Close > self.data.Low + 2:
is the following piece of code which does the indexing for you:
backtesting.py/backtesting/_util.py
Lines 59 to 69 in 1ee5670
Ahhh, yes. Of course. I do now recall this. What threw me is that you can call self.data.Close without [-1] within next(). So which line does it call in this case?
Depends on the context:
if self.data.some_signal:
reduces to the __bool__ method result above, and:
self.buy(limit=self.data.Low)
to the __float__ one.
But:
if self.data.Close > sma:
actually applies the comparison to the whole array first, producing a new boolean array, and only afterwards reduces to its last value (via __bool__ again). In this way, it's way less efficient than the explicit case (if self.data.Close[-1] > sma), but it's easy to write, easy to read, and it works well enough.
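The three cases can be reproduced with a toy ndarray subclass. This is only a sketch of the idea: `LastValueArray` is a hypothetical name and this is not the code from backtesting/_util.py. It assumes that `__bool__` and `__float__` simply return the last element.

```python
import numpy as np


class LastValueArray(np.ndarray):
    """Toy ndarray subclass (hypothetical): scalar conversions use the last element."""

    def __bool__(self):
        return bool(self[-1])

    def __float__(self):
        return float(self[-1])


# Made-up values standing in for data columns.
close = np.array([101.0, 99.0, 107.0]).view(LastValueArray)
signal = np.array([1.0, 0.0, 0.0]).view(LastValueArray)

# Boolean context: reduces to the last element via __bool__.
signal_is_set = bool(signal)

# Numeric-scalar context: reduces to the last element via __float__.
limit_price = float(close)

# Comparison: evaluated over the whole array first, giving a boolean array
# of the same subclass, which is then reduced to its last value by __bool__.
mask = close > 105
implicit = bool(mask)

# Explicit indexing compares a single number and skips the intermediate array.
explicit = bool(close[-1] > 105)
```

Both `implicit` and `explicit` give the same answer; the difference is that the implicit form allocates a full boolean array on every call.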