Knowledge is a web application that automatically transforms your digital footprint into a personal search engine. It fetches content you interact with from various platforms—GitHub, HackerNews, Zotero, HuggingFace likes, X/Twitter—and organizes it into a navigable knowledge graph.
- 🤖 Automatic Aggregation: Daily, automated extraction of GitHub stars, HackerNews upvotes, and your Zotero library.
- 🔍 Powerful Search: A built-in search engine to instantly find any item you've saved or interacted with.
- 🕸️ Knowledge Graph: Navigate bookmarks through a graph of automatically extracted topics and their connections.
My Personal Knowledge Base is available at raphaelsty.github.io/knowledge.
A GitHub Actions workflow runs once a day to perform the following tasks:
- Extracts Content from specified accounts:
- GitHub Stars
- HackerNews Upvotes
- Zotero Records
- HuggingFace Likes
- X/Twitter Bookmarks & Likes
- Processes and Stores Data in the `database/` directory:
  - `database.json`: Contains all the raw records.
  - `triples.json`: Stores the knowledge graph data (topics and relationships).
  - `pipeline.pkl`: The serialized search engine and knowledge graph pipeline.
- Deploys Updates:
- The backend API is automatically updated and pushed to the Fly.io instance.
- The frontend on GitHub Pages is refreshed with the latest data.
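The daily workflow above can be sketched as a small Python driver. The fetcher interface and output layout here are illustrative assumptions, not the project's actual module structure:

```python
import json
from pathlib import Path


def run_daily_pipeline(fetchers, out_dir="database"):
    """Fetch records from each source, then persist the combined results.

    `fetchers` maps a source name (e.g. "github") to a zero-argument
    callable returning a list of record dicts -- a hypothetical interface.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)

    records = []
    for source, fetch in fetchers.items():
        for record in fetch():
            record["source"] = source  # tag each record with its origin
            records.append(record)

    # Raw records, mirroring the role of database.json.
    (out / "database.json").write_text(json.dumps(records, indent=2))
    return records


# Usage with a stub fetcher standing in for the real API calls:
records = run_daily_pipeline(
    {"github": lambda: [{"title": "cherche", "url": "https://github.com/raphaelsty/cherche"}]},
    out_dir="/tmp/knowledge-demo",
)
```

The real workflow additionally builds `triples.json` and `pipeline.pkl` from these records before deploying.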
The backend is built with FastAPI and deployed on Fly.io, which offers a free tier suitable for this project. The frontend is a static site hosted on GitHub Pages. The search engine is powered by multiple cherche lexical models and features a final pylate-rs model, which is compiled from Rust to WebAssembly (WASM) to run directly in the client's browser.
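To make the search stack concrete, here is a toy lexical retriever in plain Python. This is not the cherche or pylate-rs API — it only illustrates the term-overlap scoring that a lexical model performs before a neural model re-ranks the results:

```python
from collections import Counter


def lexical_search(query, documents, k=3):
    """Rank documents by how many query terms they contain (toy TF scoring)."""
    terms = query.lower().split()
    scored = []
    for doc in documents:
        counts = Counter(doc["title"].lower().split())
        score = sum(counts[t] for t in terms)
        if score:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


docs = [
    {"title": "neural search with ColBERT"},
    {"title": "lexical search engines"},
    {"title": "cooking pasta"},
]
results = lexical_search("lexical search", docs)
```

In the deployed app, the equivalent of this scoring runs server-side, while the final pylate-rs re-ranking step runs in the browser via WASM.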
Follow these steps to deploy your own instance of Knowledge.
First, fork this repository to your own GitHub account and then clone it to your local machine.
The application requires API keys and credentials to function. These must be set as Repository secrets in your forked repository's settings (Settings > Secrets and variables > Actions).
| Secret | Service | Required | Description |
|---|---|---|---|
| `FLY_API_TOKEN` | Fly.io | Yes | Allows the GitHub Action to deploy your application. See the Fly.io section for instructions. |
| `ZOTERO_API_KEY` | Zotero | Optional | An API key to access your Zotero library. |
| `ZOTERO_LIBRARY_ID` | Zotero | Optional | The ID of the Zotero group library you want to index. |
| `HACKERNEWS_USERNAME` | Hacker News | Optional | HackerNews username to fetch upvoted posts. |
| `HACKERNEWS_PASSWORD` | Hacker News | Optional | HackerNews password. |
| `HUGGINGFACE_TOKEN` | HuggingFace | Optional | Token to fetch your HuggingFace liked models and datasets. |
| `TWITTER_AUTH_TOKEN` | X/Twitter | Optional | Browser `auth_token` cookie for X. See the X/Twitter section below. |
| `TWITTER_CT0` | X/Twitter | Optional | Browser `ct0` cookie (CSRF token) for X. See the X/Twitter section below. |
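Inside the GitHub Actions run, these secrets reach the pipeline as environment variables. A minimal sketch of how an optional source can be toggled on the presence of its credentials — the variable names follow the table above, but the helper itself is illustrative:

```python
import os


def twitter_credentials():
    """Return the X/Twitter cookie pair, or None if the source is disabled."""
    auth_token = os.environ.get("TWITTER_AUTH_TOKEN")
    ct0 = os.environ.get("TWITTER_CT0")
    if auth_token and ct0:
        return {"auth_token": auth_token, "ct0": ct0}
    return None  # optional source: skip it rather than fail the run


# Simulate the secrets being injected by the workflow:
os.environ["TWITTER_AUTH_TOKEN"] = "demo-auth"
os.environ["TWITTER_CT0"] = "demo-ct0"
creds = twitter_credentials()
```

Only `FLY_API_TOKEN` is mandatory; every other source can degrade gracefully this way when its secret is absent.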
Next, edit the sources.yml file at the root of the repository to configure your data sources.
```yaml
github:
  - "raphaelsty"
  - "gbolmier"
  - "MaxHalford"
twitter:
  username: "raphaelsrty"
  min_likes: 10
  max_pages: 2
semanlink: False
huggingface: True
```
- `github`: List of GitHub usernames whose starred repositories you want to track.
- `twitter`: X/Twitter configuration. Set `username` to your handle, `min_likes` to filter bookmarks, and `max_pages` to control how many pages of recent likes to fetch per run (~100 likes per page). Remove this block entirely to skip X/Twitter.
- `semanlink`: Set to `True` to enable Semanlink RDF data extraction.
- `huggingface`: Set to `True` to fetch your HuggingFace liked models and datasets (requires the `HUGGINGFACE_TOKEN` secret).
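Once `sources.yml` is loaded (for example with `yaml.safe_load`), it becomes a plain dict. A sketch of validating that dict and filling in defaults before running the pipeline — the checks and defaults are illustrative, not the project's actual code:

```python
def validate_sources(config):
    """Check the parsed sources.yml dict and fill in defaults."""
    if not isinstance(config.get("github", []), list):
        raise ValueError("github must be a list of usernames")

    twitter = config.get("twitter")
    if twitter is not None:
        twitter.setdefault("min_likes", 0)  # no like filtering by default
        twitter.setdefault("max_pages", 2)  # keep daily runs fast

    config.setdefault("semanlink", False)
    config.setdefault("huggingface", False)
    return config


# The dict below is what yaml.safe_load would produce for a minimal config:
config = validate_sources(
    {"github": ["raphaelsty"], "twitter": {"username": "raphaelsrty", "min_likes": 10}}
)
```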
- Install `flyctl`, the Fly.io command-line tool. Instructions can be found here.
- Sign up and log in to Fly.io via the command line:

  ```sh
  flyctl auth signup
  flyctl auth login
  ```

- Get an API token and add it to your GitHub repository secrets as `FLY_API_TOKEN`:

  ```sh
  flyctl auth token
  ```

- Launch the app. Follow the Fly.io launch documentation. This will generate a `fly.toml` file. You won't need a database.
⚠️ Update API URLs: After deploying, you must replace all instances of `https://knowledge.fly.dev` in the `docs/index.html` file with your own Fly.io app URL (e.g., `https://app_name.fly.dev`).
- Go to your forked repository's settings (`Settings` > `Pages`).
- Under `Build and deployment`, select the Source as `Deploy from a branch` and choose the `main` branch with the `/docs` folder.
⚠️ Update CORS Origins: After your GitHub Pages site is live, you must add its URL to the `origins` list in the `api/api.py` file to allow cross-origin requests.

```python
origins = [
    "https://your-github-username.github.io",  # Add your GitHub Pages URL here
]
```
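For context, the check that CORS enforcement boils down to: the browser sends an `Origin` header, and the server only echoes it back in `Access-Control-Allow-Origin` when it appears in the allow-list. A plain-Python sketch of that rule (in the app itself, FastAPI's `CORSMiddleware` does this for you):

```python
def allow_origin_header(request_origin, origins):
    """Return the CORS response header for an allowed origin, else None."""
    if request_origin in origins:
        return {"Access-Control-Allow-Origin": request_origin}
    return None  # browser will block the cross-origin response


origins = ["https://your-github-username.github.io"]
ok = allow_origin_header("https://your-github-username.github.io", origins)
blocked = allow_origin_header("https://evil.example", origins)
```

This is why a missing GitHub Pages URL in `origins` shows up as requests failing in the browser console, not on the server.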
This project is designed to be affordable, but you are responsible for the costs incurred on Fly.io. Here is how to keep them in check:
⚠️ Bound Fly.io Concurrency: To prevent costs from scaling unexpectedly, define connection limits in the `fly.toml` file.

```toml
[services.concurrency]
hard_limit = 6
soft_limit = 3
type = "connections"
```
⚠️ Select a Modest Fly.io VM: A small virtual machine is sufficient. A `shared-cpu-1x@1024MB` instance is a good starting point.
To run the API on your local machine for development, run the following command from the root of the repository:

```sh
make launch
```
The Zotero integration allows you to save academic papers, articles, and other documents, which will then be automatically indexed by your search engine.
- Browser Extension: Use the Zotero Connector extension for your browser to easily save documents from the web.
- Mobile App: The Zotero mobile app lets you add documents on the go. Any uploads will be indexed within a few hours.
This source is entirely optional. If you don't need it, simply remove the corresponding entry from `sources.yml` and skip this section.
The X/Twitter integration fetches your bookmarked tweets (filtered by a minimum like count) and your liked tweets. It uses Twikit, which connects to X's internal API — no paid API key required.
The setup is a bit of a trick: since X blocks automated logins from servers (Cloudflare protection), authentication relies on browser cookies rather than username/password.
- Log into x.com in your browser.
- Open DevTools (F12 or Cmd+Option+I).
- Go to Application > Cookies > `https://x.com`.
- Copy the values for `auth_token` and `ct0`.
- Add them as GitHub repository secrets: `TWITTER_AUTH_TOKEN` and `TWITTER_CT0`.
On macOS with Safari, you can skip the manual steps above: the pipeline automatically extracts cookies from Safari when running locally (this requires granting Full Disk Access to your terminal). This means `uv run python run.py` works out of the box on your Mac with no environment variables needed.
The `auth_token` cookie typically lasts about a year. The `ct0` token may expire sooner. When the CI starts failing on the Twitter step, simply grab fresh cookies from your browser and update the GitHub secrets.
In `sources.yml`:

```yaml
twitter:
  username: "your_handle"  # Your X screen name
  min_likes: 10            # Minimum likes for bookmarked tweets to be included
  max_pages: 2             # Pages of recent likes to fetch per run (~100/page)
```
- Bookmarks: All pages are always fetched (typically small). Filtered by `min_likes`.
- Likes: Only the `max_pages` most recent pages are fetched. This keeps daily CI runs fast while still catching new activity. For an initial backfill of your full like history, temporarily increase `max_pages` to `200` and run locally.
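The bookmarks/likes split above amounts to two fetch loops: one that exhausts every page, and one capped at `max_pages`. A sketch with a generic page-fetching callable standing in for the Twikit calls (the `fetch_page(cursor)` interface is an assumption for illustration):

```python
def fetch_likes(fetch_page, max_pages):
    """Fetch at most `max_pages` pages of likes. `fetch_page(cursor)` returns
    (tweets, next_cursor), with next_cursor None on the last page."""
    tweets, cursor = [], None
    for _ in range(max_pages):
        page, cursor = fetch_page(cursor)
        tweets.extend(page)
        if cursor is None:  # fewer pages than the cap: stop early
            break
    return tweets


def fetch_bookmarks(fetch_page, min_likes):
    """Fetch every page of bookmarks, keeping tweets with enough likes."""
    tweets, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        tweets.extend(t for t in page if t["likes"] >= min_likes)
        if cursor is None:
            break
    return tweets


# Fake paginated API: three pages of one tweet each.
pages = {
    None: ([{"likes": 5}], "p2"),
    "p2": ([{"likes": 20}], "p3"),
    "p3": ([{"likes": 50}], None),
}
likes = fetch_likes(lambda c: pages[c], max_pages=2)
bookmarks = fetch_bookmarks(lambda c: pages[c], min_likes=10)
```

With `max_pages=2`, the likes loop stops after two pages even though a third exists, which is exactly the behavior that keeps daily CI runs fast.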
My personal Knowledge Base is inspired by and extracts resources from the Knowledge Base of François-Paul Servant, namely Semanlink.
This project is licensed under the GNU General Public License v3.0.
Knowledge Copyright (C) 2023-2025 Raphaël Sourty