\$\begingroup\$

I want to perform some calculations over rolling periods of a pandas DataFrame and abstract all of this in a dedicated class. Given backward-looking and forward-looking window sizes, the class below slices a column around each row, performs a calculation on the slice, and appends the results as new columns of the original DataFrame.

I implemented the following:

import pandas as pd
import numpy as np
import random

random.seed(0)
np.random.seed(0)


class RollingEvaluator:
    def __init__(self, backward_window_size, forward_window_size):
        self.backward_window_size_ = backward_window_size
        self.forward_window_size_ = forward_window_size

    def run(self, data, column_name, func):
        n_rows = data.shape[0]
        output = []
        for i in range(self.backward_window_size_, n_rows-self.forward_window_size_):
            data_slice = data.iloc[(
                i-self.backward_window_size_):(i+1+self.forward_window_size_)]
            res = func(data_slice[column_name].values)
            current_row = {**res,
                           **data.iloc[i]}
            output.append(current_row)
        return pd.DataFrame(output)


if __name__ == "__main__":
    def custom_function(l):
        return {"Percentage (custom)": l[-1] / l[0] - 1}

    data = pd.read_csv("AAPL.csv", nrows=10)
    re = RollingEvaluator(1, 0)
    print(data)
    res = re.run(data, "Adj Close", custom_function)
    res["Percentage (default)"] = res["Adj Close"].pct_change()
    print(res)

Is there a more Pythonic, pandas-oriented way to do this?

The data looks like:

Date,High,Low,Open,Close,Volume,Adj Close
2000-01-03,1.0044642686843872,0.9079241156578064,0.9363839030265808,0.9994419813156128,535796800.0,0.8594232797622681
2000-01-04,0.9877232313156128,0.9034598469734192,0.9665178656578064,0.9151785969734192,512377600.0,0.786965012550354
2000-01-05,0.9871651530265808,0.9196428656578064,0.9263392686843872,0.9285714030265808,778321600.0,0.7984815835952759
2000-01-06,0.9553571343421936,0.8482142686843872,0.9475446343421936,0.8482142686843872,767972800.0,0.7293821573257446
2000-01-07,0.9017857313156128,0.8526785969734192,0.8616071343421936,0.8883928656578064,460734400.0,0.7639315724372864
2000-01-10,0.9129464030265808,0.8459821343421936,0.9107142686843872,0.8727678656578064,505064000.0,0.7504956722259521
2000-01-11,0.8872767686843872,0.8080357313156128,0.8565848469734192,0.828125,441548800.0,0.7121071815490723
2000-01-12,0.8526785969734192,0.7723214030265808,0.8482142686843872,0.7784598469734192,976068800.0,0.6694000959396362
2000-01-13,0.8816964030265808,0.8258928656578064,0.8436104655265808,0.8638392686843872,1032684800.0,0.7428181767463684
2000-01-14,0.9129464030265808,0.8872767686843872,0.8928571343421936,0.8967633843421936,390376000.0,0.7711297273635864
asked Jan 13, 2022 at 18:28
\$\endgroup\$
4
  • \$\begingroup\$ Why would you have custom_function at all when pct_change does the exact same thing, only vectorised? \$\endgroup\$ Commented Jan 14, 2022 at 3:13
  • \$\begingroup\$ Well, the custom_function was just an illustration of what I can do, the point of percentage was to make sure that I can reproduce the pct_change function. I actually use this class to fit models on slices of the data \$\endgroup\$ Commented Jan 14, 2022 at 9:11
  • \$\begingroup\$ Ok, but Code Review doesn't operate on illustrations - please show your actual use case \$\endgroup\$ Commented Jan 14, 2022 at 13:20
  • \$\begingroup\$ The question is very vague - "some calculations" doesn't really help us understand the purpose of the code. Are you able to guide us a bit more with expected use-cases, so this doesn't get closed as lacking sufficient clarity? \$\endgroup\$ Commented Nov 22, 2024 at 8:41

1 Answer

3
\$\begingroup\$

I don't think you should do this.

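For your demonstrated AAPL data, just do something like

y = data['Adj Close']
# Strip the index with to_numpy(): Series arithmetic aligns on labels, so
# y.iloc[1:] / y.iloc[:-1] would divide each value by itself.
y.iloc[1:].to_numpy() / y.iloc[:-1].to_numpy() - 1

which produces the same values, and is much faster and simpler.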
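
As a rough check, here is a minimal sketch using the first five Adj Close values from the question. It compares a shift-based version of the growth expression with pct_change. Note that arithmetic between two Series aligns on index labels, so the last line, which divides two plain slices, ends up dividing each value by itself; shift() or to_numpy() avoids that. The variable names (growth, builtin, aligned) are only illustrative.

```python
import numpy as np
import pandas as pd

# First five "Adj Close" values from the sample data in the question.
y = pd.Series(
    [0.8594232797622681, 0.786965012550354, 0.7984815835952759,
     0.7293821573257446, 0.7639315724372864],
    name="Adj Close",
)

# Shift by one row so that each value is divided by its predecessor.
growth = y / y.shift(1) - 1

# Built-in equivalent; the first entry is NaN in both versions.
builtin = y.pct_change()

# Pitfall: both slices keep their labels, so pandas pairs equal labels
# and divides every value by itself. Rows 1..3 become 0, rows 0 and 4 NaN.
aligned = y.iloc[1:] / y.iloc[:-1] - 1

print(pd.DataFrame({"growth": growth, "builtin": builtin, "aligned": aligned}))
```

The first two columns should agree row for row, while the third shows why the index has to be dropped or shifted before dividing.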

RollingEvaluator is an over-abstraction that stands in the way of idiomatic pandas and prevents use of the vectorised methods that perform well. Calculations other than the incremental growth expression above may be able to use the same shifted-slice style, or may require rolling, or something else; but none of those are possible with RollingEvaluator. Unfortunately it's difficult to get more specific than that: numerical code really is case-by-case.
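
To make the alternatives a little more concrete, here is a minimal sketch under some assumptions: the window sizes (backward = 2, forward = 1) and the rounded sample prices are made up for illustration, and the "last over first" calculation stands in for whatever the real per-window computation is. It uses Series.rolling for a purely backward-looking window, and NumPy's sliding_window_view (available from NumPy 1.20) for a window that also looks forward.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Rounded sample prices, for illustration only.
y = pd.Series([0.8594, 0.7870, 0.7985, 0.7294, 0.7639, 0.7505], name="Adj Close")

backward, forward = 2, 1  # hypothetical window sizes

# Backward-looking only: each window is the current row plus `backward` earlier rows.
trailing_mean = y.rolling(window=backward + 1).mean()

# Backward and forward: every window becomes one row of a read-only 2-D view.
windows = sliding_window_view(y.to_numpy(), backward + 1 + forward)

# Vectorised per-window calculation: last element over first element, minus one.
window_growth = windows[:, -1] / windows[:, 0] - 1

# Label each result with its centre row, i.e. rows backward .. len(y) - forward - 1,
# which is the same range of rows the loop in the question iterates over.
result = pd.Series(
    window_growth,
    index=y.index[backward:len(y) - forward],
    name="window growth",
)
print(trailing_mean)
print(result)
```

Whether something like this applies to model fitting on slices depends entirely on the model; for calculations that cannot be vectorised, rolling(...).apply(func, raw=True) is another option to consider.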

answered Nov 22, 2024 at 4:31
\$\endgroup\$
2
  • \$\begingroup\$ Can you expand a little on "depending on the calculations done"? Is this something to do with floating-point precision? (A link would be fine, if it's too involved to fit in the answer). \$\endgroup\$ Commented Nov 22, 2024 at 8:39
  • \$\begingroup\$ @TobySpeight I tried. I don't know if I hit the mark. The motto is unfortunately "specific things are specific". But no, it doesn't have to do with precision; it's about performance and good API use. \$\endgroup\$ Commented Nov 22, 2024 at 23:32
