I have a Python script that calculates the historical S&P 500 returns from a starting balance and annual contribution. The script also outputs interesting statistics (mean/min/max/stddev/confidence intervals) related to the historical returns. I'm seeking feedback on how the code could be refactored more efficiently.
sp500_time_machine.py
import os, sys, time
import datetime
import argparse
import statistics
import math
from sp500_data import growth

MONTHS_PER_YEAR = 12
FIRST_YEAR = 1928 # This is the first year of data from the dataset

# Given an initial investment, an annual contribution and a span in years, show how the investment would mature
# based on historical trends of the S&P 500
def sp500_time_machine(starting_balance, span, annual_contribution):
    realized_gain_list = []
    average_gain = 0
    current_year = datetime.date.today().year # Grab this from the OS
    total_spans = (current_year - FIRST_YEAR - span) + 1

    # Adjust the starting year for each span
    for base_year in range(total_spans):
        realized_gains = starting_balance
        # Loop through each span, month by month
        for month in range(span * MONTHS_PER_YEAR):
            realized_gains = (realized_gains + (annual_contribution / MONTHS_PER_YEAR)) * (1 + growth[month + base_year] / 100)
        # Store each realized gain over the requested span in a list for later processing
        realized_gain_list.append(realized_gains)
        print("S&P realized gains plus principle from %s to %s for %s starting balance = %s" % ((FIRST_YEAR + base_year), (FIRST_YEAR + base_year + span), f'{starting_balance:,}', f'{int(realized_gains):,}'))
        average_gain = average_gain + realized_gains

    # Display the average, minimum and maximum gain over the requested time span
    mean = int(average_gain / total_spans)
    print("Average %s year realized gains plus principle over %d years is %s" % (span, total_spans, f'{mean:,}'))

    # Calculate the standard deviation
    std_dev = statistics.stdev(realized_gain_list)
    print("Standard Deviation = %s" % f'{int(std_dev):,}')

    # Determine the 99% confidence interval
    #
    # Stock market returns are not normally distributed, so this is a simplification of actual real-world data
    # https://klementoninvesting.substack.com/p/the-distribution-of-stock-market
    #
    # z-score values are based on normal distributions
    # The value of 1.96 is based on the fact that 95% of the area of a normal distribution is within 1.96 standard deviations of the mean
    # Likewise, 2.58 standard deviations contain 99% of the area of a normal distribution
    # 90% confidence z-value = 1.65
    # 95% confidence z-value = 1.96
    # 99% confidence z-value = 2.58
    upper_interval = mean + 2.58 * (std_dev / math.sqrt(total_spans))
    print("99%% Confidence Interval (Upper) = %s" % f'{int(upper_interval):,}')
    lower_interval = mean - 2.58 * (std_dev / math.sqrt(total_spans))
    print("99%% Confidence Interval (Lower) = %s" % f'{int(lower_interval):,}')

    # Find the min/max values
    min_gain = min(realized_gain_list)
    min_gain_index = realized_gain_list.index(min_gain)
    print("Minimum realized gain plus principle over %d years occurred from %s to %s with a final balance of %s" %
          (span, min_gain_index + FIRST_YEAR, min_gain_index + FIRST_YEAR + span, f'{int(min_gain):,}'))
    max_gain = max(realized_gain_list)
    man_gain_index = realized_gain_list.index(max_gain)
    print("Maximum realized gain plus principle over %d years occurred from %s to %s with a final balance of %s" %
          (span, man_gain_index + FIRST_YEAR, man_gain_index + FIRST_YEAR + span, f'{int(max_gain):,}'))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--span", help="The number of consecutive years (span) to iterate over")
    parser.add_argument("-p", "--principle", help="The initial investment amount")
    parser.add_argument("-a", "--annual", help="The annual contribution amount")
    args = parser.parse_args()
    sp500_time_machine(int(args.principle), int(args.span), int(args.annual))
sp500_data.py
# Monthly growth data taken from https://www.officialdata.org/us/stocks/s-p-500/ and based upon http://www.econ.yale.edu/~shiller/data.htm
# This data contains reinvested dividends
# This data does NOT adjust for inflation!
growth = [-0.83, 5.75, 6.66, 3.44, -4.57, 1.09, 3.59, 7.37, 2.36, 7.08, 0.70, 7.69, 0.81, 2.05, -0.30, 1.80, 2.20, 9.20, 5.96, 4.24, -10.32, -26.19, 4.37, 1.83, 6.64, 4.12, 6.69, -5.65, -9.77, -1.76, -0.90, 0.34, -13.37, -6.80, -6.19, 3.56, 8.14, 2.38, -9.08, -9.16, -2.68, 3.86, -2.49, -14.37, -12.75, 2.05, -18.10, -0.85, -0.05, 1.14, -23.22, -11.31, -12.39, 6.18, 51.35, 10.37, -13.22, -0.34, -2.64, 4.57, -11.27, 0.33, 11.24, 29.32, 17.58, 8.46, -4.64, -0.48, -9.38, 2.80, 2.32, 6.08, 7.75, -4.80, 2.02, -9.83, 1.70, -4.36, -3.51, -2.01, 1.21, 3.21, 1.06, 0.40, -2.62, -5.93, 7.94, 8.27, 4.17, 5.60, 7.10, 2.43, 2.99, 9.71, 0.29, 5.82, 6.03, 2.41, 0.41, -5.02, 4.57, 6.23, 2.30, 1.44, 5.55, 3.10, -1.40, 3.46, 3.30, 0.23, -5.62, -4.09, -3.34, 6.39, 1.44, -13.76, -14.10, -8.27, -1.02, 3.24, -1.80, -6.02, -3.44, 1.56, 2.93, 20.49, 1.06, -4.08, 11.62, 0.47, -2.55, -1.16, -0.46, 0.27, -12.24, 4.10, 2.17, 2.84, -1.07, 11.06, 1.38, -1.41, -1.97, -0.15, -0.23, -0.15, 1.42, -13.34, -8.09, 3.87, 2.65, 4.76, 1.47, 2.85, -3.59, 0.72, -5.72, 1.18, -2.55, -1.59, 4.11, 5.71, 0.08, 0.86, -3.43, -4.08, -5.88, 2.62, -2.48, -4.76, -3.45, 1.87, 5.75, 4.38, 0.05, 1.66, 7.97, 2.15, 1.06, 6.50, 6.43, 4.01, 3.79, 4.36, 2.18, 2.47, -4.54, 2.55, -0.50, -4.21, 1.77, 3.67, -0.24, 3.24, -1.31, 2.20, 5.14, 3.02, -1.06, -1.23, 2.88, -0.28, 2.60, 3.38, 3.73, 0.31, 2.90, 4.16, 2.19, -1.70, 0.71, 7.18, 4.51, 3.61, 2.02, 4.30, 0.59, -2.68, 6.77, 0.52, -0.34, -2.55, -1.62, -14.42, -1.87, -0.01, 3.39, 0.92, 4.27, -3.67, -3.30, -1.36, 3.92, 6.69, -1.56, -2.17, 3.03, -0.73, -1.12, -0.86, -4.45, 1.92, 8.19, 5.33, 4.59, -1.96, -2.49, -0.68, 3.19, -5.10, -0.16, 1.63, -3.33, 1.49, 0.41, -0.18, -4.91, 6.26, 4.17, 1.87, 3.14, 1.95, 3.24, 2.63, 2.52, 1.38, 3.39, 3.91, 2.16, -6.72, 6.64, 4.11, 4.72, 0.38, 0.19, 8.01, 4.31, -1.11, 1.93, 0.63, -1.15, 2.37, 4.97, 3.14, 0.03, -2.25, 3.61, 3.83, -1.33, 0.75, 0.20, 0.46, 3.24, 3.37, 0.88, -1.11, -1.61, 3.67, 4.51, 0.99, -0.77, 0.96, -4.47, 1.00, -3.11, 1.91, 0.90, -4.11, 
3.52, 2.71, 1.84, 3.02, 2.68, 2.58, 4.45, 4.42, 1.22, 4.46, 2.39, 2.74, 2.71, 4.30, 4.95, 2.17, 3.70, -0.44, 3.81, -0.08, 6.15, 7.64, -0.30, 4.82, -4.72, 7.07, 1.24, -2.39, 0.95, 7.21, 1.48, -2.84, -0.26, 5.75, -0.28, -3.09, -0.95, -0.71, 1.81, -1.86, -4.00, 1.62, 2.64, 4.16, 1.95, 2.32, -5.21, -3.74, -5.90, -1.80, 0.32, 2.33, 0.70, 2.42, 0.90, 3.56, 2.74, 3.07, 4.05, 2.94, 4.36, 3.33, 2.16, 4.25, -1.27, 2.81, 1.94, 1.77, -0.61, 4.23, -0.32, -3.70, 0.18, 0.67, 3.46, -1.49, -3.61, -1.08, 1.58, -0.62, 3.99, -2.20, 1.49, -2.72, -1.67, 3.54, 2.69, 5.43, 4.37, 3.40, 2.92, 1.26, -1.08, -0.03, 3.84, -0.54, 1.34, 4.77, 1.16, -3.49, 1.91, 0.34, -2.94, -7.19, -11.41, 2.72, 3.02, -0.59, -2.86, 7.20, 4.62, 4.15, 1.60, -0.11, 4.98, 2.27, 0.22, -1.22, 3.03, 2.89, 0.50, -0.31, 2.39, 3.33, 1.48, 2.07, 1.69, 1.22, -0.35, 3.96, -1.23, 1.97, 1.97, 0.94, -1.49, 2.82, 0.98, 0.34, 1.56, 1.73, -4.51, 0.10, 2.12, 3.60, 2.50, 1.08, -0.21, 1.98, -0.43, -3.86, 3.32, -5.01, -0.56, 0.02, -5.77, -3.22, -0.56, 5.32, 0.72, 4.13, 3.73, 2.63, 1.99, 2.06, -0.99, 1.99, 1.85, 1.65, 0.10, -2.88, 3.11, -0.02, -4.26, -1.56, 7.66, 2.56, 2.94, 0.05, -1.93, 3.51, 2.72, 1.79, 1.29, -3.99, -0.24, -1.91, 2.27, 3.51, -4.97, -4.21, -0.28, 0.63, 1.35, 1.00, -5.03, -0.59, -3.20, 2.01, -2.75, -11.20, -0.27, 0.52, 3.26, 6.32, 2.49, 0.21, 7.16, 4.11, 4.15, 2.83, 3.67, -1.11, -1.60, -0.46, -1.52, 2.49, -1.86, -4.37, 7.16, 4.42, 2.09, 2.62, 1.26, -0.78, 0.52, -0.50, 3.78, -1.21, 0.42, 5.25, 2.31, 0.99, -3.33, -1.35, -1.63, -2.57, -1.99, 1.21, -1.64, 2.00, 4.24, -6.85, -6.81, 1.70, -2.47, 4.57, -4.82, -2.71, 0.46, -11.35, -3.76, -10.01, 2.38, 3.74, -6.09, 8.63, 10.81, 4.97, 1.49, 6.71, 2.89, 0.43, -7.00, -0.85, 4.97, 2.04, -1.18, 9.55, 4.18, 0.80, 1.10, -0.38, 0.90, 2.67, -0.56, 2.44, -3.11, -0.37, 3.79, -0.54, -2.37, -0.05, -1.19, 0.06, 0.90, 1.28, -2.08, -1.18, -2.20, 0.98, -0.08, -3.39, -0.97, 0.27, 4.83, 5.50, 0.67, -0.06, 7.33, 0.40, -2.77, -5.44, 1.92, 4.19, -1.06, 2.34, 2.43, -1.89, 2.42, 1.42, 5.01, 1.54, -3.35, 
-0.32, 4.40, 3.31, 4.40, -8.78, -1.16, 5.04, 6.86, 4.97, 3.50, 2.84, 3.32, 4.61, -1.24, 0.01, -3.07, 4.14, 1.29, -1.62, 0.86, -2.02, 0.80, -8.30, 1.73, 3.04, 1.18, -4.80, -1.91, -2.74, 5.47, 0.57, -5.27, 0.24, 0.79, 12.10, 8.88, 4.50, 1.36, 3.93, 2.13, 3.87, 4.20, 4.42, 1.75, 0.71, -2.41, 3.31, 0.65, -1.14, -0.13, 1.58, -5.11, 0.44, 0.51, -0.25, -1.85, -0.91, 9.21, 1.41, -0.41, 1.29, -0.71, 4.70, 5.79, -0.48, 1.02, 2.74, 2.51, 2.25, -1.85, -1.88, 1.50, 6.42, 5.29, 0.75, 5.70, 6.18, 2.74, 0.49, 3.13, -1.80, 2.28, -2.46, -0.09, 3.53, 1.71, 6.67, 6.46, 4.38, -0.86, 0.17, 4.50, 3.12, 6.45, -3.03, -11.85, -12.30, -1.33, 4.25, 3.33, 3.23, -0.89, -2.19, 6.00, -0.31, -1.72, 1.93, 3.80, -2.02, 2.33, 3.51, 3.30, -0.16, 3.56, 4.12, 3.39, 2.80, 4.69, 0.46, 0.29, -1.81, 2.74, -2.21, -2.53, 2.71, 0.20, 3.85, 3.17, 0.17, -7.86, -4.34, -2.32, 2.98, 4.59, -0.69, 11.61, 3.04, 2.26, -0.18, 0.35, 0.78, 2.68, -0.30, 0.18, 0.02, 0.94, 7.36, -0.60, -1.01, 0.26, 2.07, -1.33, 1.91, 0.94, 0.38, -1.18, 2.76, 3.27, 0.14, 1.72, 2.15, -1.34, 0.72, 0.87, 0.06, 1.76, 1.35, 1.24, 0.01, 0.89, 1.74, -0.08, -1.42, -3.35, 1.06, 1.11, -0.52, 3.08, 0.82, -0.44, -0.37, -1.03, 2.45, 3.82, 2.56, 3.22, 3.35, 3.18, 3.55, 0.51, 3.72, 0.91, 2.36, 3.39, 0.16, 5.90, -0.20, 0.20, 2.35, 1.28, -3.48, 3.08, 2.02, 4.12, 5.05, 1.20, 3.26, 4.36, -0.62, -3.41, 9.22, 5.34, 5.74, 0.35, 1.19, 1.65, -1.15, 2.63, 0.24, 6.40, 5.31, 3.41, -0.22, 0.12, 4.47, -6.97, -4.90, 1.29, 10.97, 4.10, 5.05, -0.07, 2.92, 4.25, -0.10, -0.61, 4.52, -3.77, -0.60, -1.27, 7.11, 2.81, -0.12, -2.48, 3.94, 1.42, -2.84, 3.16, 0.85, 0.94, -1.08, -5.21, -0.77, -3.32, 0.46, -2.14, -9.08, 0.45, 6.88, -2.39, -2.66, -2.05, -11.25, 3.18, 5.05, 1.47, -0.30, -3.35, 4.95, -3.51, -2.82, -5.92, -10.76, 1.14, -4.76, -1.37, 6.63, -1.04, -0.22, -6.41, 1.31, 5.29, 5.31, 5.70, 0.60, -0.17, 3.16, 2.03, 1.21, 3.06, 4.93, 1.09, -1.57, 0.97, -2.56, 2.86, -2.24, -1.39, 2.78, 0.10, 4.77, 2.73, -1.35, 1.68, -0.26, -2.41, 1.34, 2.18, 1.81, 0.31, 0.28, -2.62, 3.96, 2.14, 
1.47, -0.02, 1.49, 0.80, -0.79, -2.71, 0.72, 2.29, 2.53, 3.62, 2.00, 2.15, 0.69, 1.60, -2.47, 4.18, 3.39, 0.34, 0.57, -4.20, 3.07, 2.99, -4.81, 1.24, -6.64, -1.56, -2.63, 4.24, 2.56, -4.25, -6.08, 2.11, -4.85, -20.19, -8.61, -0.35, -1.10, -6.70, -5.69, 12.32, 6.66, 2.87, 1.28, 8.12, 3.65, 2.40, 2.09, 2.23, 1.36, -2.90, 5.94, 4.09, -5.88, -3.54, -0.16, 0.86, 3.37, 4.58, 2.49, 3.71, 3.46, 3.15, -1.11, 2.22, 0.66, -3.66, 3.10, -10.40, -0.79, 3.02, 1.77, 1.55, 4.78, 4.16, 2.88, -0.04, -3.09, -1.15, 2.92, 3.39, 3.02, -0.22, -2.84, 2.18, 4.27, 2.33, 2.72, 1.45, 4.57, -1.12, 3.25, 0.25, 1.19, 2.12, 3.86, 1.52, 0.97, -0.13, 2.72, 0.20, 1.53, 3.20, 1.50, -0.43, 1.78, -2.65, 5.71, 0.63, -1.11, 2.83, 0.06, 0.88, 0.98, -0.44, -0.08, -2.42, -4.51, 4.32, 2.93, -1.10, -6.42, -0.55, 6.36, 2.83, -0.30, 1.07, 3.30, 1.20, -0.44, -0.51, 1.20, 3.95, 1.44, 2.58, 1.75, -0.15, 1.69, 1.78, 0.99, 0.25, 1.65, 2.73, 1.59, 2.88, 4.86, -2.89, 0.06, -1.66, 1.96, 2.11, 1.58, 2.45, 1.68, -3.85, -2.08, -5.56, 1.74, 5.83, 1.95, 3.72, -1.53, 1.40, 3.83, -3.13, 3.09, 0.01, 4.43, 2.47, 3.35, 0.12, -18.92, 4.32, 5.89, 6.51, 3.48, 5.89, -0.63, 1.73, 3.95, 4.26, 2.80, 2.49, 0.82, 6.02, 0.76, 1.81, 3.07, 2.19, -0.08, 0.45, 4.74, 0.27, -2.05, -2.90, -0.89, 0.12, -7.87, -3.37, 0.46, 6.45, -7.28, -3.09, 5.29, 0.01, 1.38, 3.15, -2.59, 4.00, 0.60, 4.80, 2.54]
Example usage:
python3 sp500_time_machine.py -s20 -p100000 -a0
1 Answer
lint
Minor PEP 8 nit: it would be useful to run isort on this.
docstrings where appropriate
# Given an initial investment ...
This is a helpful comment, and I thank you for it.
It would be more helpful as a """docstring""".
returning numeric results
def sp500_time_machine(starting_balance, span, annual_contribution):
This is an OK signature.
Consider adding float, int, float type hints to it.
span has units of years, but that only becomes apparent upon reading the code.
We could introduce it a little more clearly.
If we did add type hinting, the signature would end with
def ... ) -> None:
which makes me sad. It fits with the verb "show" in the introductory comment.
At fifty-ish lines this function is not too long. But it does do more than one thing.
Consider breaking out the initial loop as a helper function, which returns a list of results.
More generally, consider making the computation of figures separate from the display of figures. This aids composition, and allows a test suite to verify specific calculations.
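A minimal sketch of that split, keeping the arithmetic exactly as posted. The names final_balances and span_years, and the choice to pass growth and total_spans in as arguments, are illustrative assumptions rather than part of the original script.

```python
from typing import List, Sequence

MONTHS_PER_YEAR = 12


def final_balances(
    starting_balance: float,
    span_years: int,
    annual_contribution: float,
    growth: Sequence[float],
    total_spans: int,
) -> List[float]:
    """Return one final balance per starting offset, without printing.

    Mirrors the posted loop, including its growth[month + base_year] index.
    """
    monthly_contribution = annual_contribution / MONTHS_PER_YEAR
    results = []
    for base_year in range(total_spans):
        balance = starting_balance
        for month in range(span_years * MONTHS_PER_YEAR):
            balance = (balance + monthly_contribution) * (1 + growth[month + base_year] / 100)
        results.append(balance)
    return results
```

A caller can then print, plot, or assert on the returned list as it sees fit.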
globals and testability
current_year = datetime.date.today().year
This is nice enough, but it relies on a global variable (clock).
It would be much nicer if we saw def ... , current_year=None): in the signature, which defaults to the current year:
current_year = current_year or datetime.date.today().year
That way a unit test could specify e.g. 2022 to "freeze" an historic result in time.
And the test would continue to pass in 2024, 2025, and following years.
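As a small sketch of that idea, here is the span count pulled into its own function with an injectable year. The function name count_spans is hypothetical; the formula is the one from the posted script.

```python
import datetime
from typing import Optional

FIRST_YEAR = 1928  # first year of data, as in the question


def count_spans(span_years: int, current_year: Optional[int] = None) -> int:
    """Number of starting years that fit, using the question's total_spans formula."""
    # Fall back to the system clock only when the caller did not pin a year.
    current_year = current_year or datetime.date.today().year
    return (current_year - FIRST_YEAR - span_years) + 1
```

A test can call count_spans(20, current_year=2022) and get the same answer regardless of when it runs.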
units, & parallel structure to names
The meaning of this manifest constant is very clear:
FIRST_YEAR = 1928
This seems to be a similar quantity, but it is zero-origin:
for base_year in range(total_spans):
Ultimately it combines with
for month in range(span * MONTHS_PER_YEAR)
to form an anonymous month + base_year index.
Going back and forth on the units is a bit jarring.
Consider incrementing a datetime.date (or datetime.datetime) in the loop.
Consider breaking out that growth[] de-reference so a helper function "knows" how to turn a point-in-time into the correct array index.
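Such a helper might look like the sketch below. It rests on an assumption the source does not confirm: that growth[0] is January of FIRST_YEAR and that the entries are consecutive calendar months. The name growth_index is hypothetical, and the layout should be checked against the dataset before relying on it.

```python
import datetime

FIRST_YEAR = 1928
MONTHS_PER_YEAR = 12


def growth_index(when: datetime.date) -> int:
    """Map a calendar month to a position in the growth list.

    Assumption (not confirmed by the source): growth[0] is January of
    FIRST_YEAR and entries are consecutive calendar months.
    """
    if when.year < FIRST_YEAR:
        raise ValueError(f"no data before {FIRST_YEAR}")
    return (when.year - FIRST_YEAR) * MONTHS_PER_YEAR + (when.month - 1)
```

Under that assumption, moving the start date forward by one year moves the index forward by twelve positions.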
# Store each realized gain over the requested span in a list for later processing
Elide the obvious comment -- the code eloquently said that already.
realized_gain_list.append(realized_gains)
Consider switching to a dict so the year is apparent:
realized_gain[FIRST_YEAR + base_year] = realized_gains
The idea is to produce a results data structure which a maintenance engineer could not possibly misinterpret.
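To illustrate, with made-up balances standing in for real results, a dict keyed by starting year also removes the need for the .index() lookups when reporting the minimum and maximum:

```python
FIRST_YEAR = 1928

# Made-up balances for three consecutive start years, for illustration only.
realized_gain_list = [150_000.0, 98_000.0, 210_000.0]

# Key each result by its starting year instead of by list position.
realized_gain = {
    FIRST_YEAR + base_year: balance
    for base_year, balance in enumerate(realized_gain_list)
}

# min()/max() over the keys, ranked by balance, yield the year directly.
worst_year = min(realized_gain, key=realized_gain.get)
best_year = max(realized_gain, key=realized_gain.get)
print(worst_year, best_year)
```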
... starting balance = %s" % ...
Outputting an explicit $ dollar sign wouldn't hurt.
Consider using an f-string rather than the % percent operator.
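A rough sketch of both suggestions together; the variable values are placeholders, not real output from the script:

```python
first_year = 1928
span = 20
starting_balance = 100_000
realized_gains = 345_678.9  # placeholder value, not a real result

# One f-string replaces the % operator and the nested f'{...}' arguments.
message = (
    f"S&P realized gains plus principal from {first_year} to {first_year + span} "
    f"for ${starting_balance:,} starting balance = ${int(realized_gains):,}"
)
print(message)
```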
extract helper
# Display the average, ...
To keep computation and display separate, consider breaking out this section into a helper function.
Thank you for the Klement citation and the reminder of Gaussian facts, so the magic numbers are well-explained.
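One possible shape for that helper, as a sketch: it computes the same figures the script prints, using the same normal-approximation z-value, and returns them instead of printing. The names Summary and summarize are hypothetical, and unlike the posted code this version does not truncate the mean to an int.

```python
import math
import statistics
from typing import NamedTuple, Sequence

Z_99 = 2.58  # z-value for 99% confidence under a normal approximation


class Summary(NamedTuple):
    mean: float
    std_dev: float
    ci_lower: float
    ci_upper: float


def summarize(balances: Sequence[float], z: float = Z_99) -> Summary:
    """Compute the summary figures without printing them."""
    mean = statistics.mean(balances)
    std_dev = statistics.stdev(balances)  # sample standard deviation
    margin = z * std_dev / math.sqrt(len(balances))
    return Summary(mean, std_dev, mean - margin, mean + margin)
```

A separate display function can then format a Summary however it likes.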
wrangling inputs
Nice arg parser.
Aha! "--span", help="The number of consecutive years (span) ... -- that's what I'd been looking for, perfect.
... int(args.principle), int(args.span), int(args.annual)
Consider asking argparse to call int() for you, by specifying type=int.
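For example, the posted parser with type=int added to each argument. Wrapping it in a build_parser function is my own choice here, made so the parser can be exercised without touching sys.argv.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # type=int makes argparse convert each value and report bad input itself.
    parser.add_argument("-s", "--span", type=int, help="The number of consecutive years (span) to iterate over")
    parser.add_argument("-p", "--principle", type=int, help="The initial investment amount")
    parser.add_argument("-a", "--annual", type=int, help="The annual contribution amount")
    return parser


# Same arguments as the example usage in the question.
args = build_parser().parse_args(["-s20", "-p100000", "-a0"])
print(args.span, args.principle, args.annual)
```

The call site then shrinks to sp500_time_machine(args.principle, args.span, args.annual).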
docstrings on data
In sp500_data.py, thank you for the pair of URL citations, and for the "not in constant dollars!" warning.
These comments would work nicely as a """docstring""" on the growth vector.
Then a maintenance engineer could import it and use help(growth) to better understand how to correctly interpret those figures.
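One caveat: Python attaches docstrings to modules, classes and functions, not to list objects, so help(growth) would show the generic list documentation. A module-level docstring at the top of sp500_data.py gets close to the same effect, since help(sp500_data) displays it. A sketch, with the data truncated to its first three values:

```python
"""Monthly S&P 500 growth figures, in percent, starting in 1928.

Taken from https://www.officialdata.org/us/stocks/s-p-500/ and based upon
http://www.econ.yale.edu/~shiller/data.htm

The data contains reinvested dividends.
The data does NOT adjust for inflation!
"""

# Truncated to the first three values of the original list for brevity.
growth = [-0.83, 5.75, 6.66]
```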
The list of pasted numbers is straightforward, and it's fine that we wind up with a very long line. It's too bad that we don't see code or comments explaining how that list was downloaded / cleaned / computed, in case we want to reproduce results or incorporate recent data a few years from now.
This function achieves its design goals.
It would benefit from unit tests, and from pushing print() statements into separate helpers.
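As a starting point, a unit test might look like the sketch below. It assumes the monthly update has been extracted into a small pure function; the name grow and the test cases are hypothetical, and the inputs are synthetic rather than historical.

```python
import unittest

MONTHS_PER_YEAR = 12


def grow(balance, monthly_growth_pct, annual_contribution):
    """Apply the question's monthly update over a sequence of growth rates."""
    for pct in monthly_growth_pct:
        balance = (balance + annual_contribution / MONTHS_PER_YEAR) * (1 + pct / 100)
    return balance


class GrowTest(unittest.TestCase):
    def test_flat_market_adds_contributions_only(self):
        # Twelve months at 0% growth: only the contributions accumulate.
        self.assertAlmostEqual(grow(1000, [0.0] * 12, 1200), 2200)

    def test_single_month_growth(self):
        self.assertAlmostEqual(grow(100, [10.0], 0), 110)
```

Saved in a test module, this runs with python3 -m unittest.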
I would be willing to delegate or accept maintenance tasks on this codebase.
- Great feedback - thanks! If you'd like to contribute, the GitHub repo is github.com/ssantner/sp500_time_machine. Let me know your username and I'll add you as a collaborator. (b1tflpr, Jul 31, 2023 at 19:10)
- I was just answering the mandatory pair of questions for any code review: (1) Is it correct? (2) Is it maintainable? (that is, is it ready to be merged down to main where others will interact with it?) My answers were "yes" and "yes". (J_H, Jul 31, 2023 at 19:13)