What is the structure of self.data ? · kernc/backtesting.py · Discussion #526

wolfcuring
Nov 11, 2021

I defined a function to prepare data for an indicator that uses the Open, High, Low and Close columns as input. In my experience, the package first turns all pandas DataFrames into np.arrays, so I added a line to turn the data back into a DataFrame so that I can use pandas functions to prepare it. But it throws an error indicating that the DataFrame constructor failed.

```python
def slump(df):
    df = pd.DataFrame(df)
    big = ((df.High - df.Low) > 10).astype(int)
    up = ((df.Open - df.Close) < 0).astype(int)
    return big*up

...

class Big10(Strategy):
    n = 52

    def init(self):
        self.big = self.I(slump, self.data, plot=True)
```

And the error is:

```
Indicator "slump" errored with exception: DataFrame constructor not properly called!
```

I guess the problem lies in passing `self.data` as the input. In the tutorials, `self.data.Close` works fine. So I guess `self.data` has all 5 columns?

Answered by kernc

Nov 11, 2021

The docs explicitly say `Strategy.data` is not a DataFrame, so you can't just use it as if it were one.
If you need the data as a DataFrame, use its `.df` accessor (`self.data.df`), or the `.s` accessor for individual series (`self.data.Close.s`).
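A minimal sketch of the question's indicator adjusted along those lines, assuming the function receives the DataFrame from `self.data.df`; the `pd.DataFrame(df)` line is no longer needed. The bars in the example are made-up values for illustration only, and the strategy side is shown in comments because it needs backtesting.py itself.

```python
import pandas as pd


def slump(df: pd.DataFrame) -> pd.Series:
    """Flag wide-range bullish bars: 1 if High - Low > 10 and Close > Open, else 0.

    `df` is assumed to be a real DataFrame with Open/High/Low/Close
    columns, such as the one returned by `self.data.df`.
    """
    big = ((df.High - df.Low) > 10).astype(int)  # bar range wider than 10
    up = ((df.Open - df.Close) < 0).astype(int)  # bar closed above its open
    return big * up


# In the strategy, pass the DataFrame view instead of the data accessor:
#
#     class Big10(Strategy):
#         def init(self):
#             self.big = self.I(slump, self.data.df, plot=True)

# Made-up bars for illustration only
bars = pd.DataFrame({
    "Open":  [100.0, 105.0, 110.0],
    "High":  [115.0, 108.0, 125.0],
    "Low":   [99.0, 101.0, 109.0],
    "Close": [112.0, 103.0, 109.5],
})
print(slump(bars).tolist())  # [1, 0, 0]
```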


kernc
Nov 11, 2021
Maintainer

Finally, when in doubt, just look in the source:

backtesting.py/backtesting/_util.py

Lines 103 to 195 in 267d99f

```python
class _Data:
    """
    A data array accessor. Provides access to OHLCV "columns"
    as a standard `pd.DataFrame` would, except it's not a DataFrame
    and the returned "series" are _not_ `pd.Series` but `np.ndarray`
    for performance reasons.
    """
    def __init__(self, df: pd.DataFrame):
        self.__df = df
        self.__i = len(df)
        self.__pip: Optional[float] = None
        self.__cache: Dict[str, _Array] = {}
        self.__arrays: Dict[str, _Array] = {}
        self._update()

    def __getitem__(self, item):
        return self.__get_array(item)

    def __getattr__(self, item):
        try:
            return self.__get_array(item)
        except KeyError:
            raise AttributeError(f"Column '{item}' not in data") from None

    def _set_length(self, i):
        self.__i = i
        self.__cache.clear()

    def _update(self):
        index = self.__df.index.copy()
        self.__arrays = {col: _Array(arr, index=index)
                         for col, arr in self.__df.items()}
        # Leave index as Series because pd.Timestamp nicer API to work with
        self.__arrays['__index'] = index

    def __repr__(self):
        i = min(self.__i, len(self.__df) - 1)
        index = self.__arrays['__index'][i]
        items = ', '.join(f'{k}={v}' for k, v in self.__df.iloc[i].items())
        return f'<Data i={i} ({index}) {items}>'

    def __len__(self):
        return self.__i

    @property
    def df(self) -> pd.DataFrame:
        return (self.__df.iloc[:self.__i]
                if self.__i < len(self.__df)
                else self.__df)

    @property
    def pip(self) -> float:
        if self.__pip is None:
            self.__pip = 10**-np.median([len(s.partition('.')[-1])
                                         for s in self.__arrays['Close'].astype(str)])
        return self.__pip

    def __get_array(self, key) -> _Array:
        arr = self.__cache.get(key)
        if arr is None:
            arr = self.__cache[key] = self.__arrays[key][:self.__i]
        return arr

    @property
    def Open(self) -> _Array:
        return self.__get_array('Open')

    @property
    def High(self) -> _Array:
        return self.__get_array('High')

    @property
    def Low(self) -> _Array:
        return self.__get_array('Low')

    @property
    def Close(self) -> _Array:
        return self.__get_array('Close')

    @property
    def Volume(self) -> _Array:
        return self.__get_array('Volume')

    @property
    def index(self) -> pd.DatetimeIndex:
        return self.__get_array('__index')

    # Make pickling in Backtest.optimize() work with our catch-all __getattr__
    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, state):
        self.__dict__ = state
```

😁
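To make that structure concrete, here is a hypothetical, heavily simplified stand-in (`MiniData` is not part of backtesting.py) that mimics three behaviours visible in the source above: columns come back as NumPy arrays truncated to the current bar, `.df` returns the matching DataFrame slice, and because the object is neither a dict, an array nor iterable, `pd.DataFrame()` cannot interpret it and raises the same error as in the question.

```python
import numpy as np
import pandas as pd


class MiniData:
    """Hypothetical, stripped-down stand-in for backtesting's `_Data`."""

    def __init__(self, df: pd.DataFrame):
        self._df = df
        self._i = len(df)  # number of bars "visible" so far
        self._arrays = {col: df[col].to_numpy() for col in df.columns}

    def __getattr__(self, item):
        # Only called when normal lookup fails, e.g. data.Close
        arrays = self.__dict__.get("_arrays", {})
        if item in arrays:
            return arrays[item][: self._i]  # plain ndarray, not a Series
        raise AttributeError(f"Column '{item}' not in data")

    def __len__(self):
        return self._i

    @property
    def df(self) -> pd.DataFrame:
        return self._df.iloc[: self._i]  # real DataFrame slice


bars = pd.DataFrame(
    {"Open": [1.0, 2.0, 3.0], "Close": [1.5, 2.5, 2.0]},
    index=pd.date_range("2021-11-01", periods=3, freq="D"),
)
data = MiniData(bars)
data._i = 2  # pretend the backtest has only reached the 2nd bar

print(type(data.Close).__name__)  # ndarray
print(data.Close)                 # [1.5 2.5]
print(type(data.df).__name__)     # DataFrame

try:
    pd.DataFrame(data)  # the call from the original question
except ValueError as exc:
    print(exc)  # DataFrame constructor not properly called!
```

The key point is that `data.Close` and `data.df` look similar but return different types: use the former for fast NumPy math and the latter when you need pandas methods.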


wolfcuring Nov 12, 2021
Author

Cool! It works. So I guess it also works if I compute all the conditions into one column and add it to the OHLC data, then access it in the strategy, for example buy when `self.data.Signal == 1` and close the position when `self.data.Signal == -1`?
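The `_update()` and `__getattr__` methods in the source above build an array for every column of the DataFrame and look them up by name, so an extra column such as `Signal` should be reachable as `self.data.Signal`. A minimal sketch of the precomputation step, assuming a hypothetical `add_signal` helper and an illustrative ±1 rule derived from the question's range condition. Inside `next()`, the array is truncated to the current bar, so `self.data.Signal[-1]` is the current value; comparing the whole array with `== 1` would give an array, not a single True/False. The strategy side is shown in comments because it needs backtesting.py itself.

```python
import numpy as np
import pandas as pd


def add_signal(df: pd.DataFrame, min_range: float = 10.0) -> pd.DataFrame:
    """Return a copy of `df` with a hypothetical `Signal` column.

    +1: wide-range bar that closed above its open (buy)
    -1: wide-range bar that closed below its open (close position)
     0: anything else
    """
    out = df.copy()
    wide = (out.High - out.Low) > min_range
    out["Signal"] = np.select(
        [wide & (out.Close > out.Open), wide & (out.Close < out.Open)],
        [1, -1],
        default=0,
    )
    return out


# Strategy side (requires backtesting.py):
#
#     class SignalStrategy(Strategy):
#         def init(self):
#             pass
#
#         def next(self):
#             if self.data.Signal[-1] == 1:
#                 self.buy()
#             elif self.data.Signal[-1] == -1:
#                 self.position.close()
#
#     bt = Backtest(add_signal(ohlc), SignalStrategy)

# Made-up bars for illustration only
bars = pd.DataFrame({
    "Open":  [100.0, 105.0, 110.0, 120.0],
    "High":  [115.0, 108.0, 125.0, 121.0],
    "Low":   [99.0, 101.0, 109.0, 118.0],
    "Close": [112.0, 103.0, 109.5, 119.0],
})
print(add_signal(bars)["Signal"].tolist())  # [1, 0, -1, 0]
```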

Answer selected by wolfcuring
