[ENH] Remove yfinance as a dependency and implement data_loader #721
Shuvam586 wants to merge 5 commits into
Shuvam586
commented
Mar 3, 2026
instead of editing tickers mentioned in the cookbook notebooks, i added 2 csv files to pypfopt/data: stock_prices.csv and market_caps.csv, with only data from 2023 onwards. the stock_prices.csv is around 450kb.
also removed yfinance related statements and functions from all notebooks
fkiraly
left a comment
Nice, thanks!
May I request to not use data downloaded via yfinance from Yahoo services at all? This is due to terms of use: we should not distribute data from Yahoo services at all in the repository or package.
Could you instead use similar data? Either completely randomly generated (Brownian motion random walk or similar, with same column names and time index), or taking some inspiration from the actual data in how you randomize - but it cannot be the exact values.
Shuvam586
commented
Mar 5, 2026
market_caps.csv and stock_prices.csv now contain data produced synthetically with Geometric Brownian Motion.
fkiraly
commented
Mar 5, 2026
Thanks! Could you kindly post here the plots in the notebooks before/after, just to check if they look similar?
fkiraly
commented
Mar 5, 2026
also, code formatting tests are failing, please look at pre-commit
Shuvam586
commented
Mar 7, 2026
i have run pre-commit and pushed the changes.
Shuvam586
commented
Mar 7, 2026
fkiraly
commented
Mar 7, 2026
Thanks!
I suspect the actual data would be closer to exponential Brownian motion - that should be achieved by simply taking np.exp of the Brownian motion.
please add docstrings (numpydoc format)
added docstrings
is this not known in advance? Instead of loading the csv, you could simply load the header, or return the known list
okay, i will rewrite it to return only the list of the tickers in the csv file.
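A minimal sketch of the "load only the header" option mentioned above, assuming the prices file has a date column followed by one column per ticker. The helper name `tickers_from_header`, the `date` column name, and the stand-in file contents are all hypothetical, not taken from the PR:

```python
import os
import tempfile

import pandas as pd


def tickers_from_header(csv_path, index_col="date"):
    """Return ticker column names without loading any data rows."""
    # nrows=0 makes pandas parse only the header line.
    header = pd.read_csv(csv_path, nrows=0)
    return [col for col in header.columns if col != index_col]


# Tiny stand-in file; the real layout of stock_prices.csv is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stock_prices.csv")
    with open(path, "w") as f:
        f.write("date,AAA,BBB\n2023-01-02,100.0,50.0\n2023-01-03,101.0,49.5\n")
    tickers = tickers_from_header(path)
```

Reading zero rows keeps the call cheap even for a file of a few hundred kilobytes, while still following the file if its columns change.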
fkiraly
left a comment
Thanks, looks good!
- can you re-execute the notebooks after a clean reset?
- I think the simulated data should be exponential brownian motion to resemble the actual data
- please add numpydoc docstrings to the data loaders
- further comments above
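For the numpydoc request, a docstring might look roughly like the sketch below. The function name `load_prices`, its parameters, and the `date` column are hypothetical and only illustrate the section layout (Parameters, Returns, Raises); they are not the loaders from the PR:

```python
import io

import pandas as pd


def load_prices(source, tickers=None):
    """Load a table of daily prices from a CSV source.

    Parameters
    ----------
    source : str or file-like
        Path or buffer containing a CSV with a ``date`` column and one
        column per ticker.
    tickers : list of str, optional
        Subset of ticker columns to return. If None, all columns are
        returned.

    Returns
    -------
    pandas.DataFrame
        Prices indexed by date, one column per ticker.

    Raises
    ------
    KeyError
        If a requested ticker is not present in the data.
    """
    prices = pd.read_csv(source, index_col="date", parse_dates=True)
    if tickers is not None:
        missing = [t for t in tickers if t not in prices.columns]
        if missing:
            raise KeyError(f"Unknown tickers: {missing}")
        prices = prices[list(tickers)]
    return prices


# Usage with an in-memory stand-in for the bundled file.
demo = io.StringIO("date,AAA,BBB\n2023-01-02,100.0,50.0\n2023-01-03,101.0,49.5\n")
subset = load_prices(demo, tickers=["BBB"])
```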
fkiraly
commented
Mar 8, 2026
non-blocking - would it be possible to include the simulation code somewhere in the utils module?
Shuvam586
commented
Mar 12, 2026
I suspect the actual data would be closer to exponential Brownian motion - that should be achieved by simply taking np.exp of the Brownian motion.
i am confused. the code i used to generate the synthetic data uses exponential brownian motion.
    for t in range(1, n_days):
        z = np.random.normal(size=len(tickers))
        prices[t] = prices[t-1] * np.exp(
            (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
        )
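For reference, the recursive update above should give the same path as taking np.exp of a cumulative sum of drift-adjusted Gaussian increments, which is why the two descriptions in this thread agree. A self-contained sketch that checks this numerically follows; the ticker names, `mu`, `sigma`, starting prices, and date range are placeholder assumptions, not the values used to build the bundled CSV files:

```python
import numpy as np
import pandas as pd

# All parameters below are illustrative assumptions, not the PR's values.
tickers = ["AAA", "BBB", "CCC"]          # hypothetical ticker names
n_days = 252
dt = 1 / 252                             # one trading day, in years
mu = np.array([0.05, 0.08, 0.10])        # annualised drift per ticker
sigma = np.array([0.15, 0.20, 0.30])     # annualised volatility per ticker
start = np.array([100.0, 50.0, 20.0])    # initial prices

rng = np.random.default_rng(0)
z = rng.normal(size=(n_days - 1, len(tickers)))

# Log-return increments of geometric Brownian motion.
increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z

# Recursive form, mirroring the loop above.
prices_loop = np.empty((n_days, len(tickers)))
prices_loop[0] = start
for t in range(1, n_days):
    prices_loop[t] = prices_loop[t - 1] * np.exp(increments[t - 1])

# Closed form: np.exp of the cumulative (drifted) Brownian motion.
log_path = np.vstack([np.zeros(len(tickers)), np.cumsum(increments, axis=0)])
prices_exp = start * np.exp(log_path)

# Same column names and a business-day time index, as requested earlier.
index = pd.bdate_range("2023-01-02", periods=n_days)
prices = pd.DataFrame(prices_exp, index=index, columns=tickers)
```

The two arrays agree up to floating-point error, and every simulated price stays strictly positive, which plain (non-exponentiated) Brownian motion would not guarantee.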
Shuvam586
commented
Mar 12, 2026
can you re-execute the notebooks after a clean reset?
yes. i re-executed the cookbook notebooks before making the first commit.
fkiraly
commented
Mar 14, 2026
i am confused. the code i used to generate the synthetic data uses exponential brownian motion.
oh, ok - you are right. Then we do not need to change that.
Can you check the pyproject.toml and resolve the merge conflict?
Shuvam586
commented
Mar 21, 2026
non-blocking - would it be possible to include the simulation code somewhere in the utils module?
i would be happy to, but can't find a utils folder anywhere
Shuvam586
commented
Mar 21, 2026
i have made the docstring changes and resolved the conflict. i had exams going on, sorry for the hold-up.
Shuvam586
commented
Mar 29, 2026
@fkiraly not to bother you, but can this be merged?
amishverma05
commented
Apr 23, 2026
Hi @Shuvam586,
I was testing out your code and noticed a couple of things that need to be addressed:
1. Notebooks Crash on Import (ModuleNotFoundError)
In the updated cookbook/*.ipynb notebooks, the first cell still has !pip install PyPortfolioOpt. Running this forces Jupyter to download v1.6.0 from PyPI. Because the notebook is inside cookbook/ and pypfopt is outside in the parent directory, Python ignores your new local code, defaults to the PyPI version, and crashes because it can't find pypfopt.data.
I feel the !pip install command can be removed, and instead add the parent directory to sys.path so Python checks the local files first:
    import sys
    import os
    sys.path.insert(0, os.path.abspath('..'))
    import matplotlib.pyplot as plt
    from pypfopt.data.data_loader import load_stockdata
2. Make available_tickers() Dynamic
In data_loader.py, available_tickers() returns a hardcoded list of the 33 companies. If the CSV files ever get updated with new datasets, this function will easily fall out of sync.
To keep this robust, you should dynamically read the column straight from the local CSV:
    def available_tickers():
        """
        Return the list of available ticker symbols dynamically from the bundled dataset.
        """
        df = _load_raw_data("market_caps.csv")
        return sorted(df["ticker"].tolist())
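If the hardcoded list is kept, a small consistency check could catch the drift described in this point. The sketch below is only an illustration: `KNOWN_TICKERS`, `tickers_in_csv`, `out_of_sync`, and the `ticker`/`market_cap` column names are hypothetical stand-ins, not code from the PR:

```python
import io

import pandas as pd

# Hypothetical hardcoded list, standing in for the one in data_loader.py.
KNOWN_TICKERS = ["AAA", "BBB", "CCC"]


def tickers_in_csv(source, column="ticker"):
    """Read only the ticker column and return its values sorted."""
    return sorted(pd.read_csv(source, usecols=[column])[column].tolist())


def out_of_sync(known, source):
    """Return (missing_from_list, missing_from_csv) as sorted lists."""
    in_csv = set(tickers_in_csv(source))
    known = set(known)
    return sorted(in_csv - known), sorted(known - in_csv)


# Stand-in for market_caps.csv; the column names are assumptions.
demo = io.StringIO("ticker,market_cap\nBBB,2.0e9\nAAA,1.0e9\nDDD,3.0e9\n")
missing_from_list, missing_from_csv = out_of_sync(KNOWN_TICKERS, demo)
```

A unit test asserting that both returned lists are empty would fail as soon as the file and the list disagree, without reading the CSV on every call to the loader.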
- contributors should run pip install -e . before working on the repo. this makes the notebooks use the local version of pypfopt instead of the pypi one. using sys.path is unnecessary.
- @fkiraly suggested in a previous change, to return the list of tickers. it is unlikely the csv file will change. the data is simulated and only acts as an example source.
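One way to confirm which copy of the package a notebook would pick up is to ask the import system where the module resolves, without importing it. A small sketch using only the standard library; the helper name `module_origin` is hypothetical:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# With an editable install, this path should point into the repository
# checkout rather than site-packages. Prints None if pypfopt is not installed.
print(module_origin("pypfopt"))
```

Running this in the first notebook cell, before any `!pip install` line, shows whether the local checkout or the PyPI release would be imported.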
Shuvam586
commented
Jun 18, 2026
hi @fkiraly, would it be possible for you to review this PR? it has been open for a long time.
Closes #716
pypfopt.data loaders for stock prices and market caps.