Calculating the value of a stock portfolio over time using Python Pandas

Question 1

I am just starting to learn Python and decided to begin a small coding project of my own to improve my skills. The goal of the project is to simulate the past growth of a stock portfolio based on historical stock prices. My code seems to work. However, I am pretty sure that its basic structure is not very clever and probably makes things more complicated than they need to be. Could someone tell me the best way to approach a problem like mine?

I started with a dataframe containing historical stock prices for a number of stocks (here: 2). Here is a simple example with only 3 time periods:

import pandas as pd
import numpy as np
price_data = pd.DataFrame({'Stock_A': [5, 6, 10],
                           'Stock_B': [5, 7, 2]})

Then I defined a starting capital (here: $1000). Furthermore, I decided how much of my money I want to invest in Stock_A (here: 50%) and Stock_B (here: also 50%).

capital = 1000
weighting = {'Stock_A': 0.5, 'Stock_B': 0.5}

Now I can calculate how many shares of Stock_A and Stock_B I can buy at the beginning:

quantities = {key: weighting[key]*capital/price_data.get(key,0)[0] for key in weighting}

As time goes by, the weights of the portfolio components will of course change, as the prices of Stock A and B move in opposite directions. At some point the portfolio will consist mainly of Stock A, while the (value-wise) proportion of Stock B becomes quite small. To correct for this, I want to restore the initial 50:50 weighting as soon as the portfolio weights deviate too much from it (so-called rebalancing), here whenever a weight falls outside the range 40% to 60%. I defined a function that decides whether rebalancing is needed:

def need_to_rebalance(row):
    rebalance = False
    for asset in assets:
        if not 0.4 < row[asset] * quantities[asset] / portfolio_value < 0.6:
            rebalance = True
            break
    return rebalance
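The same check can be written without global variables by passing the current prices and share counts in as arguments. A minimal sketch, assuming prices and quantities are pandas Series indexed by asset name; the names `current_weights` and `needs_rebalance` are hypothetical, not from the original post:

```python
import pandas as pd


def current_weights(prices: pd.Series, quantities: pd.Series) -> pd.Series:
    """Fraction of the total portfolio value held in each asset."""
    values = prices * quantities
    return values / values.sum()


def needs_rebalance(prices: pd.Series, quantities: pd.Series,
                    lower: float = 0.4, upper: float = 0.6) -> bool:
    """True if any asset's weight lies outside the open interval (lower, upper)."""
    weights = current_weights(prices, quantities)
    return bool(((weights <= lower) | (weights >= upper)).any())


quantities = pd.Series({'Stock_A': 100.0, 'Stock_B': 100.0})
print(needs_rebalance(pd.Series({'Stock_A': 6, 'Stock_B': 7}), quantities))   # False
print(needs_rebalance(pd.Series({'Stock_A': 10, 'Stock_B': 2}), quantities))  # True
```

Because every input arrives as an argument, the function can be tested on its own and reused without touching any module-level state.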

If we perform a rebalancing, the following function returns the updated number of shares for Stock A and Stock B:

def rebalance(row):
    for asset in assets:
        quantities[asset] = weighting[asset]*portfolio_value/row[asset]
    return quantities
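Similarly, the rebalancing step can return a fresh set of share counts instead of mutating a global dictionary. A sketch under the same assumption (pandas Series indexed by asset name); `rebalanced_quantities` is a hypothetical name:

```python
import pandas as pd


def rebalanced_quantities(weights: pd.Series, prices: pd.Series,
                          portfolio_value: float) -> pd.Series:
    """Share counts that restore the target weights at the given prices."""
    # Same formula as rebalance(): weight * portfolio_value / price, per asset,
    # with pandas aligning the three inputs on their asset labels.
    return weights * portfolio_value / prices


weights = pd.Series({'Stock_A': 0.5, 'Stock_B': 0.5})
prices = pd.Series({'Stock_A': 10, 'Stock_B': 2})
# Holding 100 shares of each, the portfolio is worth 10*100 + 2*100 = 1200.
print(rebalanced_quantities(weights, prices, 1200).to_dict())
# {'Stock_A': 60.0, 'Stock_B': 300.0}
```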

Finally, I defined a third function that I apply to each row of the dataframe containing the stock prices, in order to calculate the value of the portfolio based on the current number of shares we own. It looks like this:

def run_backtest(row):
    global portfolio_value, quantities
    portfolio_value = sum(np.array(row[assets]) * np.array(list(quantities.values())))
    if need_to_rebalance(row):
        quantities = rebalance(row)
    for asset in assets:
        historical_quantities[asset].append(quantities[asset])
    return portfolio_value

Then I put it all to work using .apply:

assets = list(price_data.columns)
historical_quantities = {}
for asset in assets:
    historical_quantities[asset] = []

output = price_data.copy()
output['portfolio_value'] = price_data.apply(run_backtest, axis=1)
output.join(pd.DataFrame(historical_quantities), rsuffix='_weight')

The result looks reasonable to me, and it is basically what I wanted to achieve. However, I was wondering whether there is a more efficient way to solve the problem. Doing the calculation row by row and storing all the values in the variable historical_quantities just to add them to the dataframe at the end doesn't look very clever to me. Furthermore, I have to use a lot of global variables. Storing many values from inside the functions in global variables makes the code pretty messy, particularly if I extend it to include tax effects, which would require storing all purchase prices in yet another global variable. Is there a better way to tackle the problem?

All the best

Comment

Can you show all of your code in one block at the end? Does it have test data for us to try?

Answer

About your existing code:

I don't think it's useful to represent your price_data with Stock_A and Stock_B as columns. Instead, consider using a MultiIndex, with one level as "time" and the second as "stock". Your columns will then become Price, Quantity and Value.
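One way to get from the question's wide layout to this long layout is `DataFrame.stack`, which moves the column labels into a new inner index level. A sketch assuming the question's price data; the level names `Time`/`Stock` and the column name `Price` follow the suggested code below, and stripping the `Stock_` prefix is only there to match its stock labels:

```python
import pandas as pd

# Wide layout from the question: one column per stock, one row per period.
wide = pd.DataFrame({'Stock_A': [5, 6, 10],
                     'Stock_B': [5, 7, 2]})

long_prices = (
    wide.rename(columns=lambda name: name.replace('Stock_', ''))  # 'Stock_A' -> 'A'
    .rename_axis(index='Time', columns='Stock')  # name the two future index levels
    .stack()                                     # Series indexed by (Time, Stock)
    .to_frame('Price')                           # DataFrame with one Price column
)
print(long_prices)
```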

Don't represent your weighting as a dictionary: to make it more compatible with native Pandas, represent it as a Pandas Series. Among many other benefits, you will no longer have to write a dictionary comprehension for quantities; it can be a direct Pandas calculation instead.
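With the weights held in a Series whose index matches the price columns, the initial share counts become one aligned arithmetic expression. A sketch using the question's wide `price_data` and column names; turning `weighting` into a Series is the suggestion above, applied to the question's data:

```python
import pandas as pd

price_data = pd.DataFrame({'Stock_A': [5, 6, 10],
                           'Stock_B': [5, 7, 2]})
capital = 1000
weighting = pd.Series({'Stock_A': 0.5, 'Stock_B': 0.5})

# Series arithmetic aligns on index labels, so each weight is matched with
# the first-period price of the same stock; no dictionary comprehension needed.
quantities = weighting * capital / price_data.iloc[0]
print(quantities.to_dict())  # {'Stock_A': 100.0, 'Stock_B': 100.0}
```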

Unfortunately, since each period's quantities depend on the previous period's quantities (a recursive dependency), it's probably impossible to fully vectorise this solution. However, you can get it done with no explicit loops and only one apply, using a stateful iteration object.
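For comparison, the same recursion can also be written as an explicit loop in which all state (the current share counts and the per-period record) lives in local variables, which removes the globals the question complains about. This is not the approach used in the suggested code, only a sketch assuming the question's wide layout and its 40%/60% band; the function name `backtest` is hypothetical:

```python
import pandas as pd


def backtest(price_data: pd.DataFrame, weighting: pd.Series, capital: float,
             lower: float = 0.4, upper: float = 0.6) -> pd.DataFrame:
    """Portfolio value and share counts per period, rebalancing outside the band."""
    quantities = weighting * capital / price_data.iloc[0]
    records = []
    for _, prices in price_data.iterrows():
        values = prices * quantities
        portfolio_value = values.sum()
        weights = values / portfolio_value
        if ((weights <= lower) | (weights >= upper)).any():
            # Restore the target weights at this period's prices.
            quantities = weighting * portfolio_value / prices
        records.append({'portfolio_value': portfolio_value,
                        **quantities.add_suffix('_quantity').to_dict()})
    return pd.DataFrame(records, index=price_data.index)


price_data = pd.DataFrame({'Stock_A': [5, 6, 10], 'Stock_B': [5, 7, 2]})
weighting = pd.Series({'Stock_A': 0.5, 'Stock_B': 0.5})
result = backtest(price_data, weighting, capital=1000)
print(price_data.join(result))
```

The loop is still row by row, but everything the function needs comes in as parameters and everything it produces is returned, so extra state such as purchase prices could be added as further local variables.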

At the end, it seems you were interested in the output of the join but never actually saved it to a variable, so it was discarded.
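A minimal illustration of this point: `DataFrame.join` returns a new DataFrame and leaves the caller unchanged, so the result has to be assigned. The quantity values below are the ones shown in the output further down; the `_weight` suffix is kept from the question even though these columns hold share counts:

```python
import pandas as pd

output = pd.DataFrame({'Stock_A': [5, 6, 10],
                       'Stock_B': [5, 7, 2]})
historical_quantities = {'Stock_A': [100.0, 100.0, 60.0],
                         'Stock_B': [100.0, 100.0, 300.0]}

# join() does not modify `output` in place; without the assignment the
# combined frame would simply be thrown away.
output = output.join(pd.DataFrame(historical_quantities), rsuffix='_weight')
print(output.columns.tolist())
# ['Stock_A', 'Stock_B', 'Stock_A_weight', 'Stock_B_weight']
```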

Consider replacing your 40%/60% bounds check with an "absolute deviation from 50%" bounds check which will be slightly more convenient to implement.
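For two assets with a 50/50 target the two checks flag exactly the same cases: `not 0.4 < w < 0.6` holds when `w <= 0.4` or `w >= 0.6`, which is the same as `abs(w - 0.5) >= 0.1`, and because the two weights sum to 1 both assets always have the same deviation. The sketch below compares the checks using exact fractions; with ordinary floats they can disagree at exactly 40% or 60%, since for example `abs(0.4 - 0.5)` evaluates to slightly less than `0.1`:

```python
from fractions import Fraction


def band_check(w: Fraction) -> bool:
    """The question's test: rebalance unless 40% < w < 60%."""
    return not Fraction(2, 5) < w < Fraction(3, 5)


def deviation_check(w: Fraction) -> bool:
    """The answer's test: rebalance if w is at least 10 points away from 50%."""
    return abs(w - Fraction(1, 2)) >= Fraction(1, 10)


# Compare on every weight 0/1000, 1/1000, ..., 1000/1000 (boundaries included).
grid = [Fraction(k, 1000) for k in range(1001)]
print(all(band_check(w) == deviation_check(w) for w in grid))  # True
```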

Suggested

The following suggested code is roughly equivalent to yours, deviating only in the shape of the data: stocks are represented in an index level, and portfolio values are left unaggregated (a Value per stock rather than a single total).

from typing import Optional

import pandas as pd

CAPITAL = 1000
WEIGHTS = pd.Series(
    index=('A', 'B'),
    data=(0.50, 0.50),
)


class WeightState:
    def __init__(self) -> None:
        self.quantity: Optional[pd.Series] = None

    def next(self, row: pd.DataFrame) -> pd.Series:
        row = row.droplevel(0)
        if self.quantity is None:
            self.quantity = WEIGHTS / row.Price * CAPITAL
        else:
            values = row.Price * self.quantity
            portfolio_value = values.sum()
            deviation = (values / portfolio_value - 0.5).abs().max()
            if deviation >= 0.1:
                self.quantity = WEIGHTS / row.Price * portfolio_value
        return self.quantity


def balance(price_data: pd.DataFrame) -> None:
    price_data['Quantity'] = (
        price_data.groupby(level=0)
        .apply(WeightState().next)
        .stack()
    )
    price_data['Value'] = price_data.Price * price_data.Quantity


def main() -> None:
    price_data = pd.DataFrame({
        'Time': (0, 0, 1, 1, 2, 2),
        'Stock': ('A', 'B', 'A', 'B', 'A', 'B'),
        'Price': (5, 5, 6, 7, 10, 2),
    })
    price_data.set_index(['Time', 'Stock'], inplace=True)
    balance(price_data)
    print(price_data)


if __name__ == '__main__':
    main()

Output

            Price  Quantity  Value
Time Stock
0    A          5     100.0  500.0
     B          5     100.0  500.0
1    A          6     100.0  600.0
     B          7     100.0  700.0
2    A         10      60.0  600.0
     B          2     300.0  600.0
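To see the reshaping inside `balance` step by step (which is what the asker's follow-up question is about), one can print the intermediate result before `.stack()`. A sketch assuming the same three-period data; the stateless `lambda` here is a stand-in for `WeightState().next` that just returns each period's prices, so only the groupby/droplevel/stack mechanics are shown:

```python
import pandas as pd

price_data = pd.DataFrame({
    'Time': (0, 0, 1, 1, 2, 2),
    'Stock': ('A', 'B', 'A', 'B', 'A', 'B'),
    'Price': (5, 5, 6, 7, 10, 2),
}).set_index(['Time', 'Stock'])

# groupby(level=0) hands the function one small DataFrame per Time value,
# still carrying the (Time, Stock) MultiIndex. droplevel(0) removes Time so
# the returned Series is indexed by Stock only.
per_time = price_data.groupby(level=0).apply(lambda g: g.droplevel(0).Price)

# Because every call returns a Series with the same Stock index, apply()
# assembles them into a wide frame: one row per Time, one column per Stock.
print(per_time)

# stack() moves the Stock columns back into the row index, giving a Series
# indexed by (Time, Stock) that lines up with price_data row for row.
print(per_time.stack())
```

This is why a single call per period yields quantities for both Stock A and Stock B: each call returns a whole Series covering all stocks at that time.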

Comment

Thanks for your help! I really appreciate it. I have some trouble understanding how your suggested code works. Can you tell me how Quantity is calculated exactly? Do I understand correctly that you are running the function next on the grouped price_data? Why does this give us the quantities for Stock A as well as for Stock B? Why do you use stack and droplevel?

Comment

The comments section isn't the right place for extended conversation; instead see chat.stackexchange.com/rooms/134436/…

Answer 1 by Reinderien (70.9k reputation), posted 2022-02-23 22:52:12Z
