I want to perform some calculations over rolling periods of a pandas DataFrame and to abstract all of this in a dedicated class. Given backward- and forward-looking window sizes, the class below slices a column, performs a calculation on each slice, and appends the results as new columns of the original DataFrame.
I implemented the following:
import pandas as pd
import numpy as np
import random

random.seed(0)
np.random.seed(0)


class RollingEvaluator:
    def __init__(self, backward_window_size, forward_window_size):
        self.backward_window_size_ = backward_window_size
        self.forward_window_size_ = forward_window_size

    def run(self, data, column_name, func):
        n_rows = data.shape[0]
        output = []
        for i in range(self.backward_window_size_, n_rows-self.forward_window_size_):
            data_slice = data.iloc[(
                i-self.backward_window_size_):(i+1+self.forward_window_size_)]
            res = func(data_slice[column_name].values)
            current_row = {**res,
                           **data.iloc[i]}
            output.append(current_row)
        return pd.DataFrame(output)


if __name__ == "__main__":
    def custom_function(l):
        return {"Percentage (custom)": l[-1] / l[0] - 1}

    data = pd.read_csv("AAPL.csv", nrows=10)
    re = RollingEvaluator(1, 0)
    print(data)
    res = re.run(data, "Adj Close", custom_function)
    res["Percentage (default)"] = res["Adj Close"].pct_change()
    print(res)
Is there a more Pythonic / pandas-oriented way to do this?
The data looks like:
Date,High,Low,Open,Close,Volume,Adj Close
2000-01-03,1.0044642686843872,0.9079241156578064,0.9363839030265808,0.9994419813156128,535796800.0,0.8594232797622681
2000-01-04,0.9877232313156128,0.9034598469734192,0.9665178656578064,0.9151785969734192,512377600.0,0.786965012550354
2000-01-05,0.9871651530265808,0.9196428656578064,0.9263392686843872,0.9285714030265808,778321600.0,0.7984815835952759
2000-01-06,0.9553571343421936,0.8482142686843872,0.9475446343421936,0.8482142686843872,767972800.0,0.7293821573257446
2000-01-07,0.9017857313156128,0.8526785969734192,0.8616071343421936,0.8883928656578064,460734400.0,0.7639315724372864
2000-01-10,0.9129464030265808,0.8459821343421936,0.9107142686843872,0.8727678656578064,505064000.0,0.7504956722259521
2000-01-11,0.8872767686843872,0.8080357313156128,0.8565848469734192,0.828125,441548800.0,0.7121071815490723
2000-01-12,0.8526785969734192,0.7723214030265808,0.8482142686843872,0.7784598469734192,976068800.0,0.6694000959396362
2000-01-13,0.8816964030265808,0.8258928656578064,0.8436104655265808,0.8638392686843872,1032684800.0,0.7428181767463684
2000-01-14,0.9129464030265808,0.8872767686843872,0.8928571343421936,0.8967633843421936,390376000.0,0.7711297273635864
1 Answer
I don't think you should do this.
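For your demonstrated AAPL case, just do something like
y = data['Adj Close']
# .to_numpy() on the denominator keeps the division positional;
# two pandas slices would otherwise be aligned by index label.
y.iloc[1:] / y.iloc[:-1].to_numpy() - 1
which produces the same values, and is much faster and simpler.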
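As a quick check of that claim, the sketch below compares the shifted-slice ratio with pct_change on the ten Adj Close values from the question. The variable names (by_slicing, by_shift, reference) are illustrative only, and the shift() form is shown as an assumed alternative, not something taken from the answer.

```python
import numpy as np
import pandas as pd

# Adj Close values copied from the sample data in the question.
y = pd.Series(
    [0.8594232797622681, 0.786965012550354, 0.7984815835952759,
     0.7293821573257446, 0.7639315724372864, 0.7504956722259521,
     0.7121071815490723, 0.6694000959396362, 0.7428181767463684,
     0.7711297273635864],
    name="Adj Close",
)

# Positional ratio of each value to its predecessor. The denominator is
# converted to a NumPy array so pandas does not align the slices by label.
by_slicing = y.iloc[1:] / y.iloc[:-1].to_numpy() - 1

# Label-aligned alternative: shift the whole series down by one row.
by_shift = y / y.shift(1) - 1

# Built-in reference; its first row is NaN because there is no predecessor.
reference = y.pct_change()

print(pd.DataFrame({"slicing": by_slicing,
                    "shift": by_shift,
                    "pct_change": reference}))
```

Dividing the two slices without converting one of them would pair each label with itself, because pandas aligns Series on their index before doing arithmetic.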
RollingEvaluator is an over-abstraction that will stand in the way of proper pandas and prevents use of the right methods that perform well. Calculations other than the incremental-growth expression above may be able to use the same shifted-slice style, or may require rolling, or something else; but none of those are possible with RollingEvaluator. Unfortunately it's difficult to get more specific than that: numerical code really is case-by-case.
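For a purely backward-looking window, one option is pandas' own rolling machinery with a custom function per window. This is a minimal sketch, assuming a backward window of 1 (two rows per window) and a hypothetical helper named growth; neither the helper nor the variable names come from the original post.

```python
import numpy as np
import pandas as pd

# First five Adj Close values from the sample data in the question.
y = pd.Series(
    [0.8594232797622681, 0.786965012550354, 0.7984815835952759,
     0.7293821573257446, 0.7639315724372864],
    name="Adj Close",
)


def growth(window: np.ndarray) -> float:
    """Hypothetical helper: relative change from the first to the last value."""
    return window[-1] / window[0] - 1


# Same meaning as the first argument of RollingEvaluator(1, 0) in the question.
backward_window_size = 1

# Each window holds the current row plus `backward_window_size` earlier rows.
# raw=True hands `growth` a plain NumPy array instead of a Series.
rolled = y.rolling(window=backward_window_size + 1).apply(growth, raw=True)

print(rolled)
```

Note that rolling(...).apply still calls the Python function once per window and expects one scalar back, so a vectorised built-in is preferable whenever one exists for the calculation.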
Can you expand a little on "depending on the calculations done"? Is this something to do with floating-point precision? (A link would be fine, if it's too involved to fit in the answer.) – Toby Speight, Nov 22, 2024 at 8:39
@TobySpeight I tried. I don't know if I hit the mark. The motto is unfortunately "specific things are specific". But no, it doesn't have to do with precision; it's about performance and good API use. – Reinderien, Nov 22, 2024 at 23:32
[...] custom_function at all when pct_change does the exact same thing, only vectorised?
custom_function was just an illustration of what I can do; the point of the percentage was to make sure that I can reproduce the pct_change function. I actually use this class to fit models on slices of the data.
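For the use case mentioned in that last comment (fitting something on each slice, with both backward and forward rows), one possible approach is NumPy's sliding_window_view. The sketch below is an assumption about what such a fit might look like: it fits a straight line to every window, and the window sizes, the choice of a linear fit, and the name trend are all illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# First six Adj Close values from the sample data in the question.
y = pd.Series(
    [0.8594232797622681, 0.786965012550354, 0.7984815835952759,
     0.7293821573257446, 0.7639315724372864, 0.7504956722259521],
    name="Adj Close",
)

# Illustrative window sizes, playing the role of RollingEvaluator's arguments.
backward, forward = 2, 1
width = backward + 1 + forward

# Shape (n_windows, width); this is a read-only view, so no data is copied.
windows = sliding_window_view(y.to_numpy(), width)

# Fit a degree-1 polynomial to every window at once: np.polyfit accepts a
# 2-D y whose columns are independent data sets sharing the same x.
x = np.arange(width)
slope, intercept = np.polyfit(x, windows.T, deg=1)

# Label each result with the row at offset `backward` inside its window,
# matching the loop bounds range(backward, n_rows - forward) in the question.
trend = pd.Series(slope, index=y.index[backward:len(y) - forward],
                  name="Slope (illustrative)")

print(trend)
```

Whether this helps depends on the model: it only pays off when the fit itself can be expressed over a 2-D array of windows, otherwise a plain loop over the rows of windows is still needed.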