Pandas rolling evaluation

Question 1

I want to perform calculations over rolling periods of a pandas DataFrame and to abstract this in a dedicated class. Given backward-looking and forward-looking window sizes, the class below slices a column around each row, performs a calculation on the slice, and appends the results as new columns of the original DataFrame.

I implemented the following:

import pandas as pd
import numpy as np
import random

random.seed(0)
np.random.seed(0)


class RollingEvaluator:
    def __init__(self, backward_window_size, forward_window_size):
        self.backward_window_size_ = backward_window_size
        self.forward_window_size_ = forward_window_size

    def run(self, data, column_name, func):
        n_rows = data.shape[0]
        output = []
        # Visit every row that has a full window on both sides.
        for i in range(self.backward_window_size_, n_rows - self.forward_window_size_):
            data_slice = data.iloc[
                (i - self.backward_window_size_):(i + 1 + self.forward_window_size_)
            ]
            # func receives the window as a NumPy array and returns a dict of new columns.
            res = func(data_slice[column_name].values)
            # Merge the computed columns with the columns of the current row.
            current_row = {**res, **data.iloc[i]}
            output.append(current_row)
        return pd.DataFrame(output)


if __name__ == "__main__":
    def custom_function(l):
        # Relative change between the first and last element of the window.
        return {"Percentage (custom)": l[-1] / l[0] - 1}

    data = pd.read_csv("AAPL.csv", nrows=10)
    re = RollingEvaluator(1, 0)
    print(data)
    res = re.run(data, "Adj Close", custom_function)
    res["Percentage (default)"] = res["Adj Close"].pct_change()
    print(res)

Is there a more Pythonic or pandas-oriented way to do this?

The data looks like:

Date,High,Low,Open,Close,Volume,Adj Close
2000-01-03,1.0044642686843872,0.9079241156578064,0.9363839030265808,0.9994419813156128,535796800.0,0.8594232797622681
2000-01-04,0.9877232313156128,0.9034598469734192,0.9665178656578064,0.9151785969734192,512377600.0,0.786965012550354
2000-01-05,0.9871651530265808,0.9196428656578064,0.9263392686843872,0.9285714030265808,778321600.0,0.7984815835952759
2000-01-06,0.9553571343421936,0.8482142686843872,0.9475446343421936,0.8482142686843872,767972800.0,0.7293821573257446
2000-01-07,0.9017857313156128,0.8526785969734192,0.8616071343421936,0.8883928656578064,460734400.0,0.7639315724372864
2000-01-10,0.9129464030265808,0.8459821343421936,0.9107142686843872,0.8727678656578064,505064000.0,0.7504956722259521
2000-01-11,0.8872767686843872,0.8080357313156128,0.8565848469734192,0.828125,441548800.0,0.7121071815490723
2000-01-12,0.8526785969734192,0.7723214030265808,0.8482142686843872,0.7784598469734192,976068800.0,0.6694000959396362
2000-01-13,0.8816964030265808,0.8258928656578064,0.8436104655265808,0.8638392686843872,1032684800.0,0.7428181767463684
2000-01-14,0.9129464030265808,0.8872767686843872,0.8928571343421936,0.8967633843421936,390376000.0,0.7711297273635864

Question 2

Why would you have custom_function at all when pct_change does the exact same thing, only vectorised?

Question 3

Well, custom_function was just an illustration of what I can do; the point of the percentage calculation was to make sure that I can reproduce pct_change. I actually use this class to fit models on slices of the data.

Question 4

Ok, but Code Review doesn't operate on illustrations - please show your actual use case

Question 5

The question is very vague - "some calculations" doesn't really help us understand the purpose of the code. Are you able to guide us a bit more with expected use-cases, so this doesn't get closed as lacking sufficient clarity?

Answer 1 · Reinderien · 2024-11-22 04:31:35Z

I don't think you should do this.

For your demonstrated AAPL just do something like

# Work on the NumPy array: dividing two pandas Series would align them by index label
y = data['Adj Close'].to_numpy()
y[1:] / y[:-1] - 1

which produces the same values, and is much faster and simpler.
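As a rough check of that claim, here is a minimal sketch comparing the slice-ratio expression with pct_change. The five values are the first "Adj Close" entries from the question, rounded to four decimals for brevity; the variable names are only illustrative. pandas aligns two Series on their index labels before dividing, so the positional version below works on the underlying NumPy array, and the Series version uses shift instead.

```python
import numpy as np
import pandas as pd

# First five "Adj Close" values from the question's sample, rounded (illustrative only).
y = pd.Series([0.8594, 0.7870, 0.7985, 0.7294, 0.7639], name="Adj Close")

# Positional ratio on the NumPy array: element i+1 divided by element i.
values = y.to_numpy()
growth_numpy = values[1:] / values[:-1] - 1

# Same calculation kept as a Series; shift(1) moves each value down one row,
# so the division pairs every row with its predecessor.
growth_shift = y / y.shift(1) - 1

# Built-in reference implementation.
growth_builtin = y.pct_change()
```

The NumPy result has one element fewer than the input, while the shift and pct_change results keep the original index and hold NaN in the first row.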

RollingEvaluator is an over-abstraction that will stand in the way of proper Pandas, and prevents use of the right methods that perform well. Calculations other than the stock incremental growth expression above may be able to use the same shifted-vector style, or may require rolling, or something else; but none of those are possible with RollingEvaluator. Unfortunately it's difficult to get more specific than that: numerical code really is case-by-case.
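For the cases that do need rolling, a minimal sketch of what that might look like follows. It assumes a window made of `backward` rows before and `forward` rows after the current row (these two names are illustrative, mirroring the question's constructor arguments), and again uses rounded "Adj Close" values from the question. Since rolling builds trailing windows by default, the result is shifted back by `forward` rows so that it lands on the row the window is centred on.

```python
import numpy as np
import pandas as pd

# First five "Adj Close" values from the question's sample, rounded (illustrative only).
y = pd.Series([0.8594, 0.7870, 0.7985, 0.7294, 0.7639], name="Adj Close")

backward, forward = 1, 1          # assumed window shape, as in RollingEvaluator(1, 1)
window = backward + forward + 1

# Built-in aggregations are vectorised; shift(-forward) re-centres the trailing window.
window_mean = y.rolling(window).mean().shift(-forward)

# A custom reduction goes through apply; raw=True hands each window over as an ndarray.
# The function must return a single number per window.
pct = y.rolling(2).apply(lambda w: w[-1] / w[0] - 1, raw=True)
```

Rows without a full window come out as NaN rather than being dropped, so the result can be assigned straight back as a new column. Whether this fits a given calculation still depends on the calculation; fitting a model per window, for example, may call for something else entirely.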

Can you expand a little on "depending on the calculations done"? Is this something to do with floating-point precision? (A link would be fine, if it's too involved to fit in the answer.)

@TobySpeight I tried. I don't know if I hit the mark. The motto is unfortunately "specific things are specific". But no, it doesn't have to do with precision; it's about performance and good API use.
