Knowledge is a web application that automatically transforms your digital footprint into a personal search engine. It fetches content you interact with from various platforms—GitHub, HackerNews, Zotero, HuggingFace likes, X/Twitter—and organizes it into a navigable knowledge graph.
- 🤖 Automatic Aggregation: Daily, automated extraction of GitHub stars, HackerNews upvotes, and your Zotero library.
- 🔍 Powerful Search: A built-in search engine to instantly find any item you've saved or interacted with.
- 🕸️ Knowledge Graph: Navigate bookmarks through a graph of automatically extracted topics and their connections.
My Personal Knowledge Base is available at raphaelsty.github.io/knowledge.
A GitHub Actions workflow runs once a day to perform the following tasks:
- Extracts Content from specified accounts:
- GitHub Stars
- HackerNews Upvotes
- Zotero Records
- HuggingFace Likes
- X/Twitter Bookmarks & Likes
- Processes and Stores Data in the `database/` directory:
  - `database.json`: Contains all the raw records.
  - `triples.json`: Stores the knowledge graph data (topics and relationships).
  - `pipeline.pkl`: The serialized search engine and knowledge graph pipeline.
- Deploys Updates:
- The backend API is automatically updated and pushed to the Fly.io instance.
- The frontend on GitHub Pages is refreshed with the latest data.
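The daily workflow above can be sketched as a small Python driver. The fetcher interface and output layout here are illustrative assumptions, not the project's actual module structure:

```python
import json
from pathlib import Path


def run_daily_pipeline(fetchers, out_dir="database"):
    """Fetch records from each source, then persist the combined results.

    `fetchers` maps a source name (e.g. "github") to a zero-argument
    callable returning a list of record dicts -- a hypothetical interface.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)

    records = []
    for source, fetch in fetchers.items():
        for record in fetch():
            record["source"] = source  # tag each record with its origin
            records.append(record)

    # Raw records, mirroring the role of database.json.
    (out / "database.json").write_text(json.dumps(records, indent=2))
    return records


# Usage with a stub fetcher standing in for the real API calls:
records = run_daily_pipeline(
    {"github": lambda: [{"title": "cherche", "url": "https://github.com/raphaelsty/cherche"}]},
    out_dir="/tmp/knowledge-demo",
)
```

The real workflow additionally builds `triples.json` and `pipeline.pkl` from these records before deploying.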
The backend is built with FastAPI and deployed on Fly.io, which offers a free tier suitable for this project. The frontend is a static site hosted on GitHub Pages. The search engine is powered by multiple cherche lexical models and features a final pylate-rs model, which is compiled from Rust to WebAssembly (WASM) to run directly in the client's browser.
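To make the search stack concrete, here is a toy lexical retriever in plain Python. This is not the cherche or pylate-rs API — it only illustrates the term-overlap scoring that a lexical model performs before a neural model re-ranks the results:

```python
from collections import Counter


def lexical_search(query, documents, k=3):
    """Rank documents by how many query terms they contain (toy TF scoring)."""
    terms = query.lower().split()
    scored = []
    for doc in documents:
        counts = Counter(doc["title"].lower().split())
        score = sum(counts[t] for t in terms)
        if score:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


docs = [
    {"title": "neural search with ColBERT"},
    {"title": "lexical search engines"},
    {"title": "cooking pasta"},
]
results = lexical_search("lexical search", docs)
```

In the deployed app, the equivalent of this scoring runs server-side, while the final pylate-rs re-ranking step runs in the browser via WASM.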
Follow these steps to deploy your own instance of Knowledge.
First, fork this repository to your own GitHub account and then clone it to your local machine.
The application requires API keys and credentials to function. These must be set as Repository secrets in your forked repository's settings (Settings > Secrets and variables > Actions).
| Secret | Service | Required | Description |
|---|---|---|---|
| `FLY_API_TOKEN` | Fly.io | Yes | Allows the GitHub Action to deploy your application. See the Fly.io section for instructions. |
| `ZOTERO_API_KEY` | Zotero | Optional | An API key to access your Zotero library. |
| `ZOTERO_LIBRARY_ID` | Zotero | Optional | The ID of the Zotero group library you want to index. |
| `HACKERNEWS_USERNAME` | Hacker News | Optional | HackerNews username to fetch upvoted posts. |
| `HACKERNEWS_PASSWORD` | Hacker News | Optional | HackerNews password. |
| `HUGGINGFACE_TOKEN` | HuggingFace | Optional | Token to fetch your HuggingFace liked models and datasets. |
| `TWITTER_AUTH_TOKEN` | X/Twitter | Optional | Browser `auth_token` cookie for X. See the X/Twitter section below. |
| `TWITTER_CT0` | X/Twitter | Optional | Browser `ct0` cookie (CSRF token) for X. See the X/Twitter section below. |
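Inside the GitHub Actions run, these secrets reach the pipeline as environment variables. A minimal sketch of how an optional source can be toggled on the presence of its credentials — the variable names follow the table above, but the helper itself is illustrative:

```python
import os


def twitter_credentials():
    """Return the X/Twitter cookie pair, or None if the source is disabled."""
    auth_token = os.environ.get("TWITTER_AUTH_TOKEN")
    ct0 = os.environ.get("TWITTER_CT0")
    if auth_token and ct0:
        return {"auth_token": auth_token, "ct0": ct0}
    return None  # optional source: skip it rather than fail the run


# Simulate the secrets being injected by the workflow:
os.environ["TWITTER_AUTH_TOKEN"] = "demo-auth"
os.environ["TWITTER_CT0"] = "demo-ct0"
creds = twitter_credentials()
```

Only `FLY_API_TOKEN` is mandatory; every other source can degrade gracefully this way when its secret is absent.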
Next, edit the sources.yml file at the root of the repository to configure your data sources.
```yaml
github:
  - "raphaelsty"
  - "gbolmier"
  - "MaxHalford"
twitter:
  username: "raphaelsrty"
  min_likes: 10
  max_pages: 2
semanlink: False
huggingface: True
```
- `github`: List of GitHub usernames whose starred repositories you want to track.
- `twitter`: X/Twitter configuration. Set `username` to your handle, `min_likes` to filter bookmarks, and `max_pages` to control how many pages of recent likes to fetch per run (~100 likes per page). Remove this block entirely to skip X/Twitter.
- `semanlink`: Set to `True` to enable Semanlink RDF data extraction.
- `huggingface`: Set to `True` to fetch your HuggingFace liked models and datasets (requires the `HUGGINGFACE_TOKEN` secret).
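Once `sources.yml` is loaded (for example with `yaml.safe_load`), it becomes a plain dict. A sketch of validating that dict and filling in defaults before running the pipeline — the checks and defaults are illustrative, not the project's actual code:

```python
def validate_sources(config):
    """Check the parsed sources.yml dict and fill in defaults."""
    if not isinstance(config.get("github", []), list):
        raise ValueError("github must be a list of usernames")

    twitter = config.get("twitter")
    if twitter is not None:
        twitter.setdefault("min_likes", 0)  # no like filtering by default
        twitter.setdefault("max_pages", 2)  # keep daily runs fast

    config.setdefault("semanlink", False)
    config.setdefault("huggingface", False)
    return config


# The dict below is what yaml.safe_load would produce for a minimal config:
config = validate_sources(
    {"github": ["raphaelsty"], "twitter": {"username": "raphaelsrty", "min_likes": 10}}
)
```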
- Install `flyctl`, the Fly.io command-line tool. Instructions can be found here.
- Sign up and log in to Fly.io via the command line:

  ```sh
  flyctl auth signup
  flyctl auth login
  ```

- Get an API token and add it to your GitHub repository secrets as `FLY_API_TOKEN`:

  ```sh
  flyctl auth token
  ```

- Launch the app. Follow the Fly.io launch documentation. This will generate a `fly.toml` file. You won't need a database.
⚠️ Update API URLs: After deploying, you must replace all instances of `https://knowledge.fly.dev` in the `docs/index.html` file with your own Fly.io app URL (e.g., `https://app_name.fly.dev`).
- Go to your forked repository's settings (`Settings` > `Pages`).
- Under `Build and deployment`, select the Source as `Deploy from a branch` and choose the `main` branch with the `/docs` folder.
⚠️ Update CORS Origins: After your GitHub Pages site is live, you must add its URL to the `origins` list in the `api/api.py` file to allow cross-origin requests.

```python
origins = [
    "https://your-github-username.github.io",  # Add your GitHub Pages URL here
]
```
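For context, the check that CORS enforcement boils down to: the browser sends an `Origin` header, and the server only echoes it back in `Access-Control-Allow-Origin` when it appears in the allow-list. A plain-Python sketch of that rule (in the app itself, FastAPI's `CORSMiddleware` does this for you):

```python
def allow_origin_header(request_origin, origins):
    """Return the CORS response header for an allowed origin, else None."""
    if request_origin in origins:
        return {"Access-Control-Allow-Origin": request_origin}
    return None  # browser will block the cross-origin response


origins = ["https://your-github-username.github.io"]
ok = allow_origin_header("https://your-github-username.github.io", origins)
blocked = allow_origin_header("https://evil.example", origins)
```

This is why a missing GitHub Pages URL in `origins` shows up as requests failing in the browser console, not on the server.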
This project is designed to be affordable, but you are responsible for the costs incurred on Fly.io. Here is how to keep them in check:
⚠️ Bound Fly.io Concurrency: To prevent costs from scaling unexpectedly, define connection limits in the `fly.toml` file.

```toml
[services.concurrency]
hard_limit = 6
soft_limit = 3
type = "connections"
```
⚠️ Select a Modest Fly.io VM: A small virtual machine is sufficient. A `shared-cpu-1x@1024MB` instance is a good starting point.
To run the API on your local machine for development, run the following command from the root of the repository:

```sh
make launch
```
The Zotero integration allows you to save academic papers, articles, and other documents, which will then be automatically indexed by your search engine.
- Browser Extension: Use the Zotero Connector extension for your browser to easily save documents from the web.
- Mobile App: The Zotero mobile app lets you add documents on the go. Any uploads will be indexed within a few hours.
This source is entirely optional. If you don't need it, simply remove the corresponding entry from `sources.yml` and skip this section.
The X/Twitter integration fetches your bookmarked tweets (filtered by a minimum like count) and your liked tweets. It uses Twikit, which connects to X's internal API — no paid API key required.
The setup is a bit of a trick: since X blocks automated logins from servers (Cloudflare protection), authentication relies on browser cookies rather than username/password.
- Log into x.com in your browser.
- Open DevTools (F12 or Cmd+Option+I).
- Go to Application > Cookies > `https://x.com`.
- Copy the values for `auth_token` and `ct0`.
- Add them as GitHub repository secrets: `TWITTER_AUTH_TOKEN` and `TWITTER_CT0`.
On macOS with Safari, you can skip the manual steps above: the pipeline automatically extracts cookies from Safari when running locally (this requires granting Full Disk Access to your terminal). This means `uv run python run.py` works out of the box on your Mac with no environment variables needed.
The `auth_token` cookie typically lasts about a year. The `ct0` token may expire sooner. When the CI starts failing on the Twitter step, simply grab fresh cookies from your browser and update the GitHub secrets.
In `sources.yml`:

```yaml
twitter:
  username: "your_handle"  # Your X screen name
  min_likes: 10            # Minimum likes for bookmarked tweets to be included
  max_pages: 2             # Pages of recent likes to fetch per run (~100/page)
```
- Bookmarks: All pages are always fetched (typically small). Filtered by `min_likes`.
- Likes: Only the `max_pages` most recent pages are fetched. This keeps daily CI runs fast while still catching new activity. For an initial backfill of your full like history, temporarily increase `max_pages` to `200` and run locally.
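The bookmarks/likes split above amounts to two fetch loops: one that exhausts every page, and one capped at `max_pages`. A sketch with a generic page-fetching callable standing in for the Twikit calls (the `fetch_page(cursor)` interface is an assumption for illustration):

```python
def fetch_likes(fetch_page, max_pages):
    """Fetch at most `max_pages` pages of likes. `fetch_page(cursor)` returns
    (tweets, next_cursor), with next_cursor None on the last page."""
    tweets, cursor = [], None
    for _ in range(max_pages):
        page, cursor = fetch_page(cursor)
        tweets.extend(page)
        if cursor is None:  # fewer pages than the cap: stop early
            break
    return tweets


def fetch_bookmarks(fetch_page, min_likes):
    """Fetch every page of bookmarks, keeping tweets with enough likes."""
    tweets, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        tweets.extend(t for t in page if t["likes"] >= min_likes)
        if cursor is None:
            break
    return tweets


# Fake paginated API: three pages of one tweet each.
pages = {
    None: ([{"likes": 5}], "p2"),
    "p2": ([{"likes": 20}], "p3"),
    "p3": ([{"likes": 50}], None),
}
likes = fetch_likes(lambda c: pages[c], max_pages=2)
bookmarks = fetch_bookmarks(lambda c: pages[c], min_likes=10)
```

With `max_pages=2`, the likes loop stops after two pages even though a third exists, which is exactly the behavior that keeps daily CI runs fast.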
My personal Knowledge Base is inspired by and extracts resources from the Knowledge Base of François-Paul Servant, namely Semanlink.
This project is licensed under the GNU General Public License v3.0.
Knowledge Copyright (C) 2023-2025 Raphaël Sourty