Aristotle is a highly customizable tool that collects links from sites.
With the properties in the config files, it scans all the defined sites and saves the metadata [title, description, imageLink, publishDate] of the site in the database.
These settings are basically:
- database: Currently, databases in this list ((https://docs.sqlalchemy.org/en/13/dialects/)) are supported. The settings of the DB where the links will be stored are entered here. For the
nameproperty, a database must be created in the DB and its name must be entered in this parameter. - locale: According to the language of the sites to be fetched, the feature to be localized must be entered here. For example, in English, en_EN should be entered.
- request: General features of the request.
- parser: In the parsing phase, if desired, title and description strings can be trimmed as much as the parameter given
database: dialect: mysql+pymysql url: localhost port: 3306 name: aristotle userName: root password: root locale: en_EN request: timeout: 3 userAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML) Chrome/23.0.1271.97 Safari/537.11 parser: titleCharLimit: 100 descriptionCharLimit: 300
article: - domain: cnn.com active: true link: https://edition.cnn.com/ filterForLink: mandatoryWords: ["/politics/"] permissibleWords: [] impermissibleWords: [] tagForMetadata: title: description: image: publishDate: publishDateFormat: "%Y-%m-%d" technology: - domain: mashable.com active: true link: https://mashable.com filterForLink: mandatoryWords: ["-"] permissibleWords: ['/article/'] impermissibleWords: [] tagForMetadata: title: description: image: publishDate: datetime publishDateFormat: "%d.%m.%Y"
If you'd like to contribute the project, feel free to clone a development version of this repository locally:
git clone https://github.com/egcodes/aristotle.git
Once you have a copy of the source, you can embed it in your Python package, or install it into your site-packages easily:
$ pip3 install -r requirements.txt
$ python3 setup.py install
- Python 3.x
- beautifulsoup4>=4.9.1
- requests>=2.24.0
- PyYAML>=5.3.1
- SQLAlchemy>=1.3.18
For database dialect, you must install the special dialect package for the database you use.
For example, if you are using MySQL, the PyMySQL package must be installed.