Turn websites into reusable structured data sources — through conversation, not code.
websource is a local-first CLI tool that analyzes a URL, detects extractable fields (title, price, image, date...), and generates a reusable extraction config that runs on demand or on a schedule. All data stays on your machine in SQLite.
git clone https://github.com/2fe2000/websource.git cd websource npm install npx playwright install chromium # Interactive setup wizard npx tsx bin/websource.ts init https://books.toscrape.com
| Command | Description |
|---|---|
init [url] |
Guided setup for a new data source |
scan <url> |
Analyze a page without saving |
sources list |
List all sources |
sources show <id> |
Show source details |
preview <id> |
Dry-run extraction (no DB write) |
extract <id> |
Run extraction and save |
diff <id> |
Show changes since last run |
schedule <id> <expr> |
Set a cron refresh schedule |
serve |
Start local REST API + scheduler |
export <id> |
Export to JSON/CSV |
doctor |
Run health checks |
If you use Claude Code, you can
run the interactive wizard from any chat session with /websource or natural
language like "scrape this URL".
bash scripts/install-skill.sh
All config is optional. Copy .env.example to .env to override defaults:
| Variable | Default | Description |
|---|---|---|
WEBSOURCE_DATA_DIR |
~/.local/share/websource |
Database and log location |
WEBSOURCE_CONFIG_DIR |
~/.config/websource |
Config file location |
LOG_LEVEL |
warn |
trace / debug / info / warn / error |
All extracted data is stored locally in a single SQLite database:
~/.local/share/websource/
├── websource.db ← all data
└── logs/ ← log files (production mode only)
| Table | Contents |
|---|---|
sources |
Source list (name, URL, status) |
extraction_configs |
Field selectors, fetchMode, and other settings |
runs |
Extraction run history (time, record counts, status) |
snapshots |
The actual extracted records |
diffs |
Added / changed / removed records between runs |
schedules |
Cron schedule settings |
Export extracted data:
# JSON npx tsx bin/websource.ts export <sourceId> --format json # CSV npx tsx bin/websource.ts export <sourceId> --format csv # REST API npx tsx bin/websource.ts serve # GET http://localhost:3847/sources/:id/data
Change the storage location — add to .env:
WEBSOURCE_DATA_DIR=/your/custom/path
- Node.js + TypeScript (ESM, strict)
- Cheerio for static HTML parsing, Playwright for JS-rendered pages
- SQLite (better-sqlite3) for all local persistence
- Fastify for the local REST API
- node-cron for scheduling
See docs/ARCHITECTURE.md for details.
See CONTRIBUTING.md.
MIT — see LICENSE.