Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

egcodes/aristotle

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

167 Commits

Repository files navigation

Aristotle is a highly customizable tool that collects links from sites.

Aristotle

With the properties in the config files, it scans all the defined sites and saves the metadata [title, description, imageLink, publishDate] of the site in the database.

Usage

config/properties.yaml

These settings are basically:

  1. database: Currently, databases in this list ((https://docs.sqlalchemy.org/en/13/dialects/)) are supported. The settings of the DB where the links will be stored are entered here. For the name property, a database must be created in the DB and its name must be entered in this parameter.
  2. locale: According to the language of the sites to be fetched, the feature to be localized must be entered here. For example, in English, en_EN should be entered.
  3. request: General features of the request.
  4. parser: In the parsing phase, if desired, title and description strings can be trimmed as much as the parameter given
database:
 dialect: mysql+pymysql
 url: localhost
 port: 3306
 name: aristotle
 userName: root
 password: root
locale: en_EN
request:
 timeout: 3
 userAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML) Chrome/23.0.1271.97 Safari/537.11
parser:
 titleCharLimit: 100
 descriptionCharLimit: 300

config/sources-{locale}.yaml

article:
 - domain: cnn.com
 active: true
 link: https://edition.cnn.com/
 filterForLink:
 mandatoryWords: ["/politics/"]
 permissibleWords: []
 impermissibleWords: []
 tagForMetadata:
 title:
 description:
 image:
 publishDate:
 publishDateFormat: "%Y-%m-%d"
technology:
 - domain: mashable.com
 active: true
 link: https://mashable.com
 filterForLink:
 mandatoryWords: ["-"]
 permissibleWords: ['/article/']
 impermissibleWords: []
 tagForMetadata:
 title:
 description:
 image:
 publishDate: datetime
 publishDateFormat: "%d.%m.%Y"

Development

If you'd like to contribute the project, feel free to clone a development version of this repository locally:

git clone https://github.com/egcodes/aristotle.git

Once you have a copy of the source, you can embed it in your Python package, or install it into your site-packages easily:

$ pip3 install -r requirements.txt
$ python3 setup.py install

Requirements

  • Python 3.x
    • beautifulsoup4>=4.9.1
    • requests>=2.24.0
    • PyYAML>=5.3.1
    • SQLAlchemy>=1.3.18

For database dialect, you must install the special dialect package for the database you use. For example, if you are using MySQL, the PyMySQL package must be installed.

About

highly customizable news collector

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

AltStyle によって変換されたページ (->オリジナル) /