
Ariadne

Ariadne is the web crawler for the Clew search engine.

User Agent and robots.txt

Ariadne crawls using the following User Agent, where "{node_id}" is replaced with the ID of the crawler node making the request:

Ariadne (web crawler for Clew; https://clew.se/about/; crawler node {node_id})

To block it, use the user agent name Ariadne in your robots.txt.
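As a quick way to check a robots.txt rule before deploying it, the sketch below uses Python's standard urllib.robotparser to confirm that a rule naming Ariadne blocks the crawler while leaving other agents alone. The helper names (build_user_agent, is_allowed), the example URL, and the use of str.format to fill in {node_id} are assumptions made for illustration; they are not taken from Ariadne's source, and Ariadne's own robots.txt handling may differ.

```python
from urllib.robotparser import RobotFileParser

# User agent template quoted from this README. Filling it in with str.format
# is an assumption about how the substitution is done.
UA_TEMPLATE = "Ariadne (web crawler for Clew; https://clew.se/about/; crawler node {node_id})"

# A robots.txt that blocks Ariadne from the whole site.
ROBOTS_TXT = """\
User-agent: Ariadne
Disallow: /
"""


def build_user_agent(node_id):
    """Return the full User Agent string for a crawler node (hypothetical helper)."""
    return UA_TEMPLATE.format(node_id=node_id)


def is_allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(build_user_agent(1))
    print(is_allowed(ROBOTS_TXT, "Ariadne", "https://example.org/page"))
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.org/page"))
```

Only the short name Ariadne is needed in the User-agent line; the node-specific part of the full string does not have to be matched.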

Installation

The binaries for all three parts of the Ariadne architecture (ariadne, daedalus, and icarus) can be installed with a single pipx command: pipx install git+https://codeberg.org/Clew/ariadne.

If running ariadne, you will need the external package dependencies listed in the external-dependencies file in this repository to be able to parse webpages. If not running ariadne, you should only need to install libicu-dev.

When running the ariadne and daedalus services, you will need to configure a PostgreSQL database (as shown below) and run ariadne setup-database. You can create icarus nodes using daedalus create-node <name> <email>.

To run an icarus node, you do not need a database; you can simply create the configuration file as shown below and run the icarus command. To get a TOTP code independently for use with the Daedalus dashboard, run icarus totp.

Configuration

By default, Ariadne looks at ~/.config/ariadne/config.toml for its setup. This one file is used by all three parts of the architecture. You can change the location of the configuration file by setting the ARIADNE_CONFIG_PATH environment variable.
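The lookup order described above can be sketched in a few lines of Python. This is an illustration only, not Ariadne's actual implementation: the function name resolve_config_path is hypothetical, and treating an empty ARIADNE_CONFIG_PATH as unset is an assumption.

```python
import os
from pathlib import Path

# Default location stated in this README.
DEFAULT_CONFIG_PATH = Path("~/.config/ariadne/config.toml")


def resolve_config_path(environ=None):
    """Return the config file path, preferring ARIADNE_CONFIG_PATH when set.

    `environ` defaults to os.environ; passing a plain dict makes this testable.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("ARIADNE_CONFIG_PATH")
    # Assumption: an empty value is treated the same as an unset variable.
    if override:
        return Path(override).expanduser()
    return DEFAULT_CONFIG_PATH.expanduser()


if __name__ == "__main__":
    print(resolve_config_path())
```

For example, running a service with ARIADNE_CONFIG_PATH=/etc/ariadne/config.toml in its environment would make this sketch return that path instead of the default.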

An example configuration file:

[logging]
verbosity="INFO"
[database]
name="clew_index"
host="localhost"
port="5432"
user="ariadne"
password="<password goes here>"
timeout=500
[ariadne]
task_processors=3
discovery_cap=100000
[daedalus]
max_parcel_size=80
host="127.0.0.1"
port=5400
user_agent="Ariadne (web crawler for Clew; https://clew.se/about/; crawler node {node_id})"
user_agent_short="Ariadne"
[icarus]
daedalus_instance="https://daedalus.clew.se"
simultaneous_requests=5
name="<node name goes here>"
secret="<node secret goes here>"

If you are hosting an Icarus node, you do not need the database, ariadne, or daedalus sections of the configuration.

License

Ariadne is licensed under the AGPLv3.