A Python script to scrape opensourcealternative.to and search the data locally
This is a scraper that may eventually be used to combine data from many different "open source alternatives to proprietary software" lists into one open dataset.
Currently supports pulling in data from:
- opensourcealternative.to
Running
Both scripts use argparse and document their options through the --help flag.
Running the scraper
pipenv install
pipenv run python3 ./scrape.py
This might take some time to run: fairly substantial delays are added when downloading individual project data, so as not to annoy the webmasters of the services being scraped.
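The idea of spacing out requests can be illustrated with a minimal sketch. This is not the code in scrape.py: the function name `fetch_politely`, the `fetch` callback, and the 5-second default are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, Iterable


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 5.0,  # hypothetical default, not taken from scrape.py
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages: Dict[str, str] = {}
    for position, url in enumerate(urls):
        if position > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

Passing the download function and the sleep function in as parameters keeps the sketch independent of any particular HTTP library and makes the delay easy to check without waiting.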
Running the search tool
Requires the index to be generated by the above step first.
pipenv install
pipenv run python3 ./cli.py --index <path to index.json> --search "<search term>"
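As a rough sketch of how a search over a local index might work, the example below wires the `--index` and `--search` flags to a simple substring match. The index layout is an assumption: it is treated here as a JSON list of objects with `name` and `description` fields, which may not match what scrape.py actually writes. The function names are hypothetical as well.

```python
import argparse
import json
from typing import Any, Dict, List, Optional


def search_index(entries: List[Dict[str, Any]], term: str) -> List[Dict[str, Any]]:
    """Return entries whose name or description contains the term, ignoring case."""
    needle = term.lower()
    return [
        entry
        for entry in entries
        # "name" and "description" are assumed field names, not confirmed ones.
        if needle in str(entry.get("name", "")).lower()
        or needle in str(entry.get("description", "")).lower()
    ]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Search a locally stored index.")
    parser.add_argument("--index", required=True, help="path to index.json")
    parser.add_argument("--search", required=True, help="search term")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    """Load the index file and print every matching entry."""
    args = build_parser().parse_args(argv)
    with open(args.index, encoding="utf-8") as handle:
        entries = json.load(handle)
    for entry in search_index(entries, args.search):
        print(entry.get("name"), "-", entry.get("description"))
```

Calling `main()` from a script entry point would read the flags from the command line. Because the parser is built with argparse, `--help` output comes for free.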