raphaelsty/knowledge

Knowledge

Personal Knowledge Base

Knowledge is a web application that automatically transforms your digital footprint into a personal search engine. It fetches the content you interact with on various platforms (GitHub, HackerNews, Zotero, HuggingFace, X/Twitter) and organizes it into a navigable knowledge graph.


🌟 Features

  • 🤖 Automatic Aggregation: Daily, automated extraction of your GitHub stars, HackerNews upvotes, Zotero records, HuggingFace likes, and X/Twitter bookmarks and likes.

  • 🔍 Powerful Search: A built-in search engine to instantly find any item you've saved or interacted with.

  • 🕸️ Knowledge Graph: Navigate bookmarks through a graph of automatically extracted topics and their connections.

My Personal Knowledge Base is available at raphaelsty.github.io/knowledge.


🛠️ How It Works

A GitHub Actions workflow runs once a day to perform the following tasks:

  1. Extracts Content from specified accounts:
    • GitHub Stars
    • HackerNews Upvotes
    • Zotero Records
    • HuggingFace Likes
    • X/Twitter Bookmarks & Likes
  2. Processes and Stores Data in the database/ directory:
    • database.json: Contains all the raw records.
    • triples.json: Stores the knowledge graph data (topics and relationships).
    • pipeline.pkl: The serialized search engine and knowledge graph pipeline.
  3. Deploys Updates:
    • The backend API is automatically updated and pushed to the Fly.io instance.
    • The frontend on GitHub Pages is refreshed with the latest data.
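
As an illustration of the processing step, here is a minimal sketch of how topic-to-topic triples could be derived from raw records by counting topics that appear together. The record shape (a `tags` list per URL) and the `head`/`tail`/`weight` keys are assumptions made for this example. The actual schemas of database.json and triples.json may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical record shape: the real database.json schema may differ.
records = {
    "https://example.com/a": {"title": "A", "tags": ["search", "rust"]},
    "https://example.com/b": {"title": "B", "tags": ["search", "python"]},
    "https://example.com/c": {"title": "C", "tags": ["rust", "search"]},
}


def build_triples(records):
    """Count how often two topics appear on the same record."""
    counts = Counter()
    for record in records.values():
        # Sort and deduplicate so (rust, search) and (search, rust) share one key.
        topics = sorted(set(record.get("tags", [])))
        for head, tail in combinations(topics, 2):
            counts[(head, tail)] += 1
    return [
        {"head": head, "tail": tail, "weight": weight}
        for (head, tail), weight in sorted(counts.items())
    ]


triples = build_triples(records)
```

Each resulting entry links two topics, and the weight records how many items share both.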

The backend is built with FastAPI and deployed on Fly.io, which offers a free tier suitable for this project. The frontend is a static site hosted on GitHub Pages. The search engine is powered by multiple cherche lexical models and features a final pylate-rs model, which is compiled from Rust to WebAssembly (WASM) to run directly in the client's browser.
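
To make the lexical stage concrete, here is a small TF-IDF ranking sketch written with the standard library only. It is a stand-in for illustration, not the cherche API, and it does not cover the pylate-rs re-ranking that runs in the browser. The class name `TinyLexicalIndex` and the sample documents are invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyLexicalIndex:
    """Hypothetical stand-in for a lexical retriever, using TF-IDF scores."""

    def __init__(self, documents):
        # documents maps a document id to its text.
        self.term_counts = {
            doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()
        }
        doc_freq = Counter()
        for counts in self.term_counts.values():
            doc_freq.update(counts.keys())
        n = len(documents)
        # Smoothed inverse document frequency: rarer terms weigh more.
        self.idf = {
            term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()
        }

    def search(self, query, k=3):
        """Return up to k document ids, best match first."""
        terms = tokenize(query)
        scores = {}
        for doc_id, counts in self.term_counts.items():
            score = sum(counts[t] * self.idf.get(t, 0.0) for t in terms)
            if score > 0:
                scores[doc_id] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


index = TinyLexicalIndex({
    "a": "neural search pipeline for python",
    "b": "web framework for building APIs with python",
    "c": "deploy app servers close to users",
})
results = index.search("python search")
```

Document `a` matches both query terms and ranks first, `b` matches only one, and `c` matches none and is left out.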

🚀 Getting Started: Installation & Deployment

Follow these steps to deploy your own instance of Knowledge.

1. Fork & Clone

First, fork this repository to your own GitHub account and then clone it to your local machine.

2. Configuration

A. Configure Secrets

The application requires API keys and credentials to function. These must be set as Repository secrets in your forked repository's settings (Settings > Secrets and variables > Actions).


| Secret | Service | Required | Description |
| --- | --- | --- | --- |
| FLY_API_TOKEN | Fly.io | Yes | Allows the GitHub Action to deploy your application. See the Fly.io section for instructions. |
| ZOTERO_API_KEY | Zotero Settings | Optional | An API key to access your Zotero library. |
| ZOTERO_LIBRARY_ID | Zotero | Optional | The ID of the Zotero group library you want to index. |
| HACKERNEWS_USERNAME | Hacker News | Optional | HackerNews username to fetch upvoted posts. |
| HACKERNEWS_PASSWORD | Hacker News | Optional | HackerNews password. |
| HUGGINGFACE_TOKEN | HuggingFace | Optional | Token to fetch your HuggingFace liked models and datasets. |
| TWITTER_AUTH_TOKEN | X/Twitter | Optional | Browser auth_token cookie for X. See the X/Twitter section below. |
| TWITTER_CT0 | X/Twitter | Optional | Browser ct0 cookie (CSRF token) for X. See the X/Twitter section below. |

B. Specify Sources

Next, edit the sources.yml file at the root of the repository to configure your data sources.

github:
 - "raphaelsty"
 - "gbolmier"
 - "MaxHalford"
twitter:
 username: "raphaelsrty"
 min_likes: 10
 max_pages: 2
semanlink: False
huggingface: True
  • github: List of GitHub usernames whose starred repositories you want to track.
  • twitter: X/Twitter configuration. Set username to your handle, min_likes to filter bookmarks, and max_pages to control how many pages of recent likes to fetch per run (~100 likes per page). Remove this block entirely to skip X/Twitter.
  • semanlink: Set to True to enable Semanlink RDF data extraction.
  • huggingface: Set to True to fetch your HuggingFace liked models and datasets (requires HUGGINGFACE_TOKEN secret).
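
Before committing a change to sources.yml, it can help to sanity-check the values. The sketch below validates a configuration that has already been parsed into a Python dictionary (for example with a YAML library). The `validate_sources` helper is hypothetical and not part of the repository. It only checks the keys described above.

```python
def validate_sources(sources):
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []

    github = sources.get("github", [])
    if not isinstance(github, list) or not all(isinstance(u, str) for u in github):
        problems.append("github must be a list of usernames")

    # The twitter block is optional: a missing block disables the source.
    twitter = sources.get("twitter")
    if twitter is not None:
        if not isinstance(twitter, dict):
            problems.append("twitter must be a mapping")
        else:
            if not isinstance(twitter.get("username"), str):
                problems.append("twitter.username must be a string")
            for key in ("min_likes", "max_pages"):
                value = twitter.get(key)
                # bool is a subclass of int in Python, so reject it explicitly.
                if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                    problems.append(f"twitter.{key} must be a non-negative integer")

    for key in ("semanlink", "huggingface"):
        if not isinstance(sources.get(key, False), bool):
            problems.append(f"{key} must be True or False")

    return problems


# The example configuration from above, as a parsed dictionary.
sources = {
    "github": ["raphaelsty", "gbolmier", "MaxHalford"],
    "twitter": {"username": "raphaelsrty", "min_likes": 10, "max_pages": 2},
    "semanlink": False,
    "huggingface": True,
}
problems = validate_sources(sources)
```

An empty `problems` list means every key has the expected type.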

3. Deployment

A. Deploy the API to Fly.io

  1. Install flyctl, the Fly.io command-line tool, by following the installation instructions in the Fly.io documentation.
  2. Sign up and log in to Fly.io via the command line:
    flyctl auth signup
    flyctl auth login
  3. Get your API token and add it to your GitHub repository secrets as FLY_API_TOKEN:
    flyctl auth token
  4. Launch the app. Follow the Fly.io launch documentation. This will generate a fly.toml file. You won't need a database.

⚠️ Update API URLs: After deploying, you must replace all instances of https://knowledge.fly.dev in the docs/index.html file with your own Fly.io app URL (e.g., https://app_name.fly.dev).
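
The replacement can be done by hand in an editor, or scripted. Below is a minimal sketch of a hypothetical helper, `point_frontend_at`, that rewrites every occurrence of the default URL in a file and reports how many it changed. It is not part of the repository.

```python
from pathlib import Path

DEFAULT_API_URL = "https://knowledge.fly.dev"


def point_frontend_at(index_path, new_api_url, old_api_url=DEFAULT_API_URL):
    """Replace every occurrence of old_api_url in the file; return the count."""
    path = Path(index_path)
    html = path.read_text(encoding="utf-8")
    count = html.count(old_api_url)
    if count:
        # Drop any trailing slash so paths appended to the URL stay well-formed.
        updated = html.replace(old_api_url, new_api_url.rstrip("/"))
        path.write_text(updated, encoding="utf-8")
    return count


# Example call, run from the root of the repository:
# point_frontend_at("docs/index.html", "https://app_name.fly.dev")
```

A return value of 0 means the file no longer contains the default URL, which is a quick way to confirm the change was already applied.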

B. Set up GitHub Pages

  1. Go to your forked repository's settings (Settings > Pages).
  2. Under Build and deployment, select the Source as Deploy from a branch and choose the main branch with the /docs folder.

⚠️ Update CORS Origins: After your GitHub Pages site is live, you must add its URL to the origins list in the api/api.py file to allow cross-origin requests.

origins = [
 "https://your-github-username.github.io", # Add your GitHub Pages URL here
]
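
Note that an origin is only the scheme and host (plus a port, if any). Browsers send it without a path, so a project site served at https://your-github-username.github.io/knowledge/ still has the origin https://your-github-username.github.io. The sketch below uses a hypothetical helper, `to_origin`, to reduce a page URL to the value that belongs in the list. It assumes the API compares origins as exact strings.

```python
from urllib.parse import urlsplit


def to_origin(url):
    """Reduce a URL to its origin: scheme and host, plus the port if present."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


origins = [
    to_origin("https://your-github-username.github.io/knowledge/"),
]
```

Here `origins` holds the site URL with the path and trailing slash removed.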

💸 Cost Management

This project is designed to be affordable, but you are responsible for the costs incurred on Fly.io. Here is how to keep them in check:

⚠️ Bound Fly.io Concurrency: To prevent costs from scaling unexpectedly, define connection limits in the fly.toml file.

[services.concurrency]
 hard_limit = 6
 soft_limit = 3
 type = "connections"

⚠️ Select a modest Fly.io VM: A small virtual machine is sufficient. A shared-cpu-1x@1024MB is a good starting point.


💻 Local Development

To run the API on your local machine for development, run the following command from the root of the repository:

make launch

🔌 Zotero Integration

The Zotero integration allows you to save academic papers, articles, and other documents, which will then be automatically indexed by your search engine.

  • Browser Extension: Use the Zotero Connector extension for your browser to easily save documents from the web.

  • Mobile App: The Zotero mobile app lets you add documents on the go. New items are indexed on the next run of the daily workflow.


🐦 X/Twitter Integration

This source is entirely optional. If you don't need it, simply remove the twitter block from sources.yml and skip this section.

The X/Twitter integration fetches your bookmarked tweets (filtered by a minimum like count) and your liked tweets. It uses Twikit, which connects to X's internal API — no paid API key required.

The setup is a bit of a trick: since X blocks automated logins from servers (Cloudflare protection), authentication relies on browser cookies rather than username/password.

How to get your cookies

  1. Log into x.com in your browser.
  2. Open DevTools (F12 or Cmd+Option+I).
  3. Go to Application > Cookies > https://x.com.
  4. Copy the values for auth_token and ct0.
  5. Add them as GitHub repository secrets: TWITTER_AUTH_TOKEN and TWITTER_CT0.

On macOS with Safari, you can skip the manual step — the pipeline automatically extracts cookies from Safari when running locally (requires Full Disk Access for your terminal). This means uv run python run.py works out of the box on your Mac with no environment variables needed.

Cookie expiration

The auth_token cookie typically lasts about a year. The ct0 token may expire sooner. When the CI starts failing on the Twitter step, simply grab fresh cookies from your browser and update the GitHub secrets.
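
As a rough illustration, a script could read the two cookies from the environment and skip the Twitter step cleanly when either one is missing. The `load_twitter_cookies` helper below is hypothetical. How the pipeline actually hands the cookies to Twikit is not shown here.

```python
import os


def load_twitter_cookies(env=os.environ):
    """Return the two X cookies as a dict, or None if either one is missing."""
    auth_token = env.get("TWITTER_AUTH_TOKEN", "").strip()
    ct0 = env.get("TWITTER_CT0", "").strip()
    if not auth_token or not ct0:
        # Treat an empty or absent secret as "Twitter source disabled".
        return None
    return {"auth_token": auth_token, "ct0": ct0}


cookies = load_twitter_cookies({"TWITTER_AUTH_TOKEN": "abc", "TWITTER_CT0": "def"})
```

Passing a plain dictionary instead of `os.environ` makes the function easy to check without touching real secrets.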

Configuration

In sources.yml:

twitter:
 username: "your_handle" # Your X screen name
 min_likes: 10 # Minimum likes for bookmarked tweets to be included
 max_pages: 2 # Pages of recent likes to fetch per run (~100/page)
  • Bookmarks: All pages are always fetched (typically small). Filtered by min_likes.
  • Likes: Only the max_pages most recent pages are fetched. This keeps daily CI runs fast while still catching new activity. For an initial backfill of your full like history, temporarily increase max_pages to 200 and run locally.
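
The two behaviours can be sketched as follows. The tweet shape (a `likes` count per tweet) and the `fetch_page` callable, which takes a cursor and returns a page of tweets with the next cursor, are assumptions made for this example and not the Twikit API.

```python
def filter_bookmarks(bookmarks, min_likes):
    """Keep only the bookmarked tweets with at least min_likes likes."""
    return [tweet for tweet in bookmarks if tweet.get("likes", 0) >= min_likes]


def fetch_recent_likes(fetch_page, max_pages):
    """Collect liked tweets from at most max_pages pages, newest first.

    fetch_page(cursor) is a hypothetical callable returning (tweets, next_cursor),
    where next_cursor is None once there are no more pages.
    """
    likes, cursor = [], None
    for _ in range(max_pages):
        tweets, cursor = fetch_page(cursor)
        likes.extend(tweets)
        if cursor is None:
            break
    return likes


# A fake three-page history standing in for the real service.
pages = {None: (["t1", "t2"], "c1"), "c1": (["t3"], "c2"), "c2": (["t4"], None)}
recent = fetch_recent_likes(lambda cursor: pages[cursor], max_pages=2)
kept = filter_bookmarks([{"id": 1, "likes": 3}, {"id": 2, "likes": 25}], min_likes=10)
```

With `max_pages=2` only the first two pages are read, which is why a one-off backfill needs a larger value.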

💡 Acknowledgements

My personal Knowledge Base is inspired by and extracts resources from the Knowledge Base of François-Paul Servant, namely Semanlink.

📜 License

This project is licensed under the GNU General Public License v3.0.

Knowledge Copyright (C) 2023-2025 Raphaël Sourty
