I am just starting to learn Python and decided to start my own little coding project to improve my skills. The goal of my project is to simulate the past growth of a stock portfolio based on historical stock prices. My code seems to work. However, I am pretty sure that its basic structure is not very clever and probably makes things more complicated than they need to be. Can someone tell me the best way to approach a problem like mine?
I started with a dataframe containing historical stock prices for a number of stocks (here: 2). Here is a simple example with only 3 time periods:
import pandas as pd
import numpy as np

price_data = pd.DataFrame({'Stock_A': [5, 6, 10],
                           'Stock_B': [5, 7, 2]})
Then I defined a start capital (here: $1000). Furthermore, I decided how much of my money I want to invest in Stock_A (here: 50%) and Stock_B (here: also 50%).
capital = 1000
weighting = {'Stock_A': 0.5, 'Stock_B': 0.5}
Now I can calculate how many shares of Stock_A and Stock_B I can buy in the beginning:
quantities = {key: weighting[key]*capital/price_data.get(key,0)[0] for key in weighting}
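As a quick sanity check of the line above, the same arithmetic can be spelled out without pandas. This is only an illustrative sketch: `first_prices` is a hypothetical name holding the first-period prices copied from the example data.

```python
capital = 1000
weighting = {'Stock_A': 0.5, 'Stock_B': 0.5}
# Assumption: first-period prices copied by hand from the example price_data.
first_prices = {'Stock_A': 5, 'Stock_B': 5}

# shares = (capital allocated to the asset) / (price per share)
quantities = {
    asset: weighting[asset] * capital / first_prices[asset]
    for asset in weighting
}
print(quantities)  # {'Stock_A': 100.0, 'Stock_B': 100.0}
```

Each stock receives 500 of the 1000 starting capital, which buys 100 shares at a price of 5.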
As time goes by, the weights of the portfolio components will of course change, because the prices of Stock A and B move in opposite directions. At some point the portfolio will consist mainly of Stock A, while the proportion of Stock B (value-wise) gets pretty small. To correct for this, I want to restore the initial 50:50 weighting as soon as the portfolio weights deviate too much from it (so-called rebalancing). I defined a function to decide whether rebalancing is needed:
def need_to_rebalance(row):
    rebalance = False
    for asset in assets:
        if not 0.4 < row[asset] * quantities[asset] / portfolio_value < 0.6:
            rebalance = True
            break
    return rebalance
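For illustration, here is a sketch of the same check written as a pure function, so it can be tried without the surrounding globals. The name `weights_out_of_band` and its `lower`/`upper` parameters are hypothetical and not part of the code above.

```python
def weights_out_of_band(prices, quantities, lower=0.4, upper=0.6):
    """Return True if any asset's share of portfolio value leaves (lower, upper)."""
    values = {asset: prices[asset] * quantities[asset] for asset in quantities}
    portfolio_value = sum(values.values())
    return any(
        not lower < value / portfolio_value < upper
        for value in values.values()
    )


quantities = {'Stock_A': 100.0, 'Stock_B': 100.0}
# Period 1: values 600 and 700, shares of about 46% and 54%.
drift_period_1 = weights_out_of_band({'Stock_A': 6, 'Stock_B': 7}, quantities)
# Period 2: values 1000 and 200, shares of about 83% and 17%.
drift_period_2 = weights_out_of_band({'Stock_A': 10, 'Stock_B': 2}, quantities)
print(drift_period_1, drift_period_2)  # False True
```

With the example prices, only the third period pushes the weights outside the 40% to 60% band.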
If we perform a rebalancing, the following formula returns the updated number of shares for Stock A and Stock B:
def rebalance(row):
    for asset in assets:
        quantities[asset] = weighting[asset]*portfolio_value/row[asset]
    return quantities
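To make the formula concrete, the sketch below applies it to the third period of the example, where 100 shares of each stock are worth 1200 in total. `rebalanced_quantities` is a hypothetical helper that returns a new dictionary instead of mutating a global.

```python
def rebalanced_quantities(prices, weighting, portfolio_value):
    """Share counts that restore the target weighting at the given prices."""
    return {
        asset: weighting[asset] * portfolio_value / prices[asset]
        for asset in weighting
    }


weighting = {'Stock_A': 0.5, 'Stock_B': 0.5}
prices = {'Stock_A': 10, 'Stock_B': 2}
portfolio_value = 100 * 10 + 100 * 2  # 1200 before rebalancing

new_quantities = rebalanced_quantities(prices, weighting, portfolio_value)
print(new_quantities)  # {'Stock_A': 60.0, 'Stock_B': 300.0}
```

After rebalancing, both positions are worth 600 again: 60 shares at 10 and 300 shares at 2.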
Finally, I defined a third function that I can use to loop over the dataframe containing the stock prices in order to calculate the value of the portfolio based on the current number of stocks we own. It looks like this:
def run_backtest(row):
    global portfolio_value, quantities
    portfolio_value = sum(np.array(row[assets]) * np.array(list(quantities.values())))
    if need_to_rebalance(row):
        quantities = rebalance(row)
    for asset in assets:
        historical_quantities[asset].append(quantities[asset])
    return portfolio_value
Then I put it all to work using .apply:
assets = list(price_data.columns)
historical_quantities = {}
for asset in assets:
    historical_quantities[asset] = []
output = price_data.copy()
output['portfolio_value'] = price_data.apply(run_backtest, axis = 1)
output.join(pd.DataFrame(historical_quantities), rsuffix='_weight')
The result looks reasonable to me, and it is basically what I wanted to achieve. However, I was wondering whether there is a more efficient way to solve the problem. Doing the calculation row by row and storing all the values in the variable historical_quantities, just to add them to the dataframe at the end, doesn't look very clever to me. Furthermore, I have to use a lot of global variables. Storing many values from inside the functions as global variables makes the code pretty messy (in particular if I extend my code to include tax effects, which would require storing all purchase prices in yet another global variable). Is there a better way to tackle the problem?
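One possible way to avoid the globals, sketched here under assumptions rather than as a definitive design, is to keep all state in the local variables of a single function and collect one record per row. The signature of `run_backtest` below, its `band` parameter, and the `_quantity` column suffix are hypothetical choices. The `>= band` test mirrors the strict 40% to 60% bounds, apart from floating-point rounding exactly at the boundary.

```python
import pandas as pd


def run_backtest(prices, weighting, capital, band=0.1):
    """Return `prices` plus the portfolio value and share counts per period."""
    weights = pd.Series(weighting)
    # Initial purchase at first-period prices.
    quantities = weights * capital / prices.iloc[0]
    records = []
    for _, row in prices.iterrows():
        values = row * quantities
        portfolio_value = values.sum()
        # Rebalance when any weight drifts `band` or more from its target.
        if (values / portfolio_value - weights).abs().max() >= band:
            quantities = weights * portfolio_value / row
        records.append({
            'portfolio_value': portfolio_value,
            **quantities.add_suffix('_quantity').to_dict(),
        })
    return prices.join(pd.DataFrame(records, index=prices.index))


price_data = pd.DataFrame({'Stock_A': [5, 6, 10], 'Stock_B': [5, 7, 2]})
result = run_backtest(price_data, {'Stock_A': 0.5, 'Stock_B': 0.5}, capital=1000)
print(result)
```

Extra per-period state, such as purchase prices for tax calculations, could be added as further local variables and further keys in each record.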
All the best
Can you show all of your code in one block at the end? Does it have test data for us to try? – Reinderien, Feb 23, 2022 at 20:14
1 Answer
About your existing code:
I don't think it's useful to represent your price_data with Stock_A and Stock_B as columns. Instead consider making a multi-index, with one level as "time" and the second as "stock". Your columns will become Price, Quantity and Value.
Don't represent your weighting as a dictionary: for it to be more compatible with native Pandas, represent it as a Pandas Series. Among many other reasons, you will no longer have to write a dictionary comprehension for quantities and this can be direct Pandas calculation instead.
Unfortunately, since your quantities have a recursive dependency it's probably impossible to fully vectorise this solution. However, you can get it done with no explicit loops and only one apply, using a stateful iteration object.
At the end, it seems you were interested in the output of the join but never actually saved it to a variable, so it was discarded.
Consider replacing your 40%/60% bounds check with an "absolute deviation from 50%" bounds check which will be slightly more convenient to implement.
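As a minimal sketch of three of the points above (the multi-index layout, the Series weights, and the deviation check), assuming the question's wide `price_data` as a starting point: the variable names `wide`, `long`, and `needs_rebalance` are hypothetical, and the deviation is measured against each target weight rather than a fixed 0.5.

```python
import pandas as pd

wide = pd.DataFrame({'Stock_A': [5, 6, 10], 'Stock_B': [5, 7, 2]})

# Wide to long: one row per (Time, Stock) pair and a single Price column.
long = (
    wide.rename_axis(index='Time', columns='Stock')
    .stack()
    .rename('Price')
    .to_frame()
)

# Weights as a Series, so arithmetic aligns on the stock labels automatically.
weights = pd.Series({'Stock_A': 0.5, 'Stock_B': 0.5})
capital = 1000

first_prices = long['Price'].xs(0, level='Time')
quantities = weights * capital / first_prices  # 100 shares of each stock

# Deviation check at Time 2 (assumed band of 0.1 around each target weight).
values = long['Price'].xs(2, level='Time') * quantities
deviation = (values / values.sum() - weights).abs().max()
needs_rebalance = deviation >= 0.1
print(quantities.to_dict(), bool(needs_rebalance))
```

A single `abs().max()` comparison replaces the two-sided bounds check, which is the convenience the last point refers to.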
Suggested
The following suggested code is roughly equivalent to yours, deviating only in the shape of the data - stocks are represented in an index level, and portfolio values are left unaggregated.
from typing import Optional

import pandas as pd

CAPITAL = 1000
WEIGHTS = pd.Series(
    index=('A', 'B'),
    data=(0.50, 0.50),
)


class WeightState:
    def __init__(self) -> None:
        self.quantity: Optional[pd.Series] = None

    def next(self, row: pd.DataFrame) -> pd.Series:
        row = row.droplevel(0)

        if self.quantity is None:
            self.quantity = WEIGHTS / row.Price * CAPITAL
        else:
            values = row.Price * self.quantity
            portfolio_value = values.sum()
            deviation = (values / portfolio_value - 0.5).abs().max()
            if deviation >= 0.1:
                self.quantity = WEIGHTS / row.Price * portfolio_value

        return self.quantity


def balance(price_data: pd.DataFrame) -> None:
    price_data['Quantity'] = (
        price_data.groupby(level=0)
        .apply(WeightState().next)
        .stack()
    )
    price_data['Value'] = price_data.Price * price_data.Quantity


def main() -> None:
    price_data = pd.DataFrame({
        'Time': (0, 0, 1, 1, 2, 2),
        'Stock': ('A', 'B', 'A', 'B', 'A', 'B'),
        'Price': (5, 5, 6, 7, 10, 2),
    })
    price_data.set_index(['Time', 'Stock'], inplace=True)
    balance(price_data)
    print(price_data)


if __name__ == '__main__':
    main()
Output
            Price  Quantity  Value
Time Stock
0    A          5     100.0  500.0
     B          5     100.0  500.0
1    A          6     100.0  600.0
     B          7     100.0  700.0
2    A         10      60.0  600.0
     B          2     300.0  600.0
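To see how the `groupby(level=0).apply(...).stack()` chain produces one `Quantity` per (Time, Stock) pair, here is a stripped-down sketch. `shares_for_budget` is a hypothetical, stateless stand-in for `WeightState.next` that simply spends 500 per stock in every period; the intermediate shapes described in the comments are what the pandas versions I am aware of produce and may differ in others.

```python
import pandas as pd

prices = pd.DataFrame({
    'Time': (0, 0, 1, 1, 2, 2),
    'Stock': ('A', 'B', 'A', 'B', 'A', 'B'),
    'Price': (5, 5, 6, 7, 10, 2),
}).set_index(['Time', 'Stock'])


def shares_for_budget(group: pd.DataFrame) -> pd.Series:
    # `group` holds the rows of one Time value. Dropping the Time level leaves
    # an index of stock names, so the result is a Series keyed by stock.
    return 500 / group.droplevel(0)['Price']


# Every call returns a Series with the same index (A, B), so pandas assembles
# the results into a frame with one row per Time and one column per Stock.
per_time = prices.groupby(level=0).apply(shares_for_budget)

# stack() moves the Stock columns back into the index, giving a Series with a
# (Time, Stock) MultiIndex that aligns with `prices` on assignment.
prices['Quantity'] = per_time.stack()
print(prices)
```

In the suggested code, `droplevel(0)` plays the same role inside `next`, and the stateful object only changes which quantities are returned for each group, not how the results are reassembled.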
Thanks for your help! I really appreciate it. I have some problems understanding how your suggested code works. Can you tell me how Quantity is calculated exactly? Do I understand it correctly that you are running the function next on the grouped price_data? Why does this give us the Quantities for Stock A as well as for Stock B? Why do you use stack and droprow? – LuckyLuke, Feb 25, 2022 at 23:42
The comments section isn't the right place for extended conversation; instead see chat.stackexchange.com/rooms/134436/… – Reinderien, Feb 26, 2022 at 0:25