LessUp/bookmarks-cleaner

CleanBook

PyPI · Python 3.10+ · MIT · CI · Docs

Rules-first · ML-assisted · LLM-optional · Offline-first

Simplified Chinese · Documentation · Releases

CleanBook is a command-line tool for cleaning, deduplicating, and classifying browser bookmark exports. It is designed for people who want a practical offline workflow: take an exported HTML bookmark file, run one command, and get a cleaner categorized result back.
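
To make the "clean and deduplicate" step concrete, here is a minimal sketch of how links could be pulled from an exported bookmark HTML file and deduplicated by normalized URL. It is not CleanBook's actual implementation: `BookmarkLinkParser`, `normalize_url`, and `deduplicate` are hypothetical names, and the normalization rules (lowercase host, drop fragment and trailing slash) are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class BookmarkLinkParser(HTMLParser):
    """Hypothetical helper: collects <a href="..."> entries from a bookmark export."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._current = None  # bookmark being read, if we are inside an <a> tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> arrives as "a"/"href".
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = {"url": href, "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.bookmarks.append(self._current)
            self._current = None


def normalize_url(url: str) -> str:
    """Assumed normalization: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def deduplicate(bookmarks: list[dict]) -> list[dict]:
    """Keep the first bookmark seen for each normalized URL."""
    seen = set()
    unique = []
    for bookmark in bookmarks:
        key = normalize_url(bookmark["url"])
        if key not in seen:
            seen.add(key)
            unique.append(bookmark)
    return unique
```

Feeding the parser the text of an export and passing `parser.bookmarks` to `deduplicate` yields one entry per distinct page under these assumed rules; a real tool would likely also handle tracking parameters and folder structure.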

Why use it

  • Offline by default: bookmark processing stays on your machine
  • Rules first: stable category matches are driven by config, not opaque prompts
  • ML where it helps: optional ML and LLM layers improve recall instead of owning the whole pipeline
  • Export-friendly: generate cleaned bookmark HTML, JSON data, and report-style outputs
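
The "rules first, ML where it helps" idea can be sketched as a rule lookup with an optional fallback. This is an illustration only: the `RULES` table, `classify_by_rules`, and `classify` are hypothetical and do not reflect CleanBook's real config schema or classifier API.

```python
from typing import Callable, Optional

# Hypothetical rule table: category name -> URL substrings that select it.
RULES = {
    "dev": ["github.com", "stackoverflow.com"],
    "news": ["news.", "bbc.co.uk"],
}


def classify_by_rules(url: str, rules: dict[str, list[str]]) -> Optional[str]:
    """Return the first category whose substrings match the URL, else None."""
    lowered = url.lower()
    for category, needles in rules.items():
        if any(needle in lowered for needle in needles):
            return category
    return None


def classify(
    url: str,
    rules: dict[str, list[str]],
    fallback: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Rules decide first; the optional fallback (e.g. an ML model) only sees misses."""
    category = classify_by_rules(url, rules)
    if category is None and fallback is not None:
        category = fallback(url)
    return category or "uncategorized"
```

Because the fallback runs only when no rule matches, leaving it out (as `--no-ml` presumably does for the ML layer) cannot change the result for any bookmark a rule already covers.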

Quick start

pipx install cleanbook
cleanbook -i bookmarks.html -o output/

Stable rules-only mode:

cleanbook -i bookmarks.html -o output/ --no-ml

From source:

git clone https://github.com/LessUp/bookmarks-cleaner.git
cd bookmarks-cleaner
pip install -e ".[dev]"
cleanbook -i examples/demo_bookmarks.html -o output/

Optional local extras:

pip install -e ".[dev,semantic]" # sentence-transformers + hnswlib
pip install -e ".[dev,audit]" # cleanlab-backed feedback data audit

Offline feedback loop:

cleanbook -i bookmarks.html -o output/ --export-review-queue output/review-queue.json
cleanbook --apply-feedback reviewed-feedback.json
cleanbook --train-feedback reviewed-feedback.json
cleanbook --audit-feedback reviewed-feedback.json --audit-output output/feedback-audit.json
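
If you want to script the feedback loop, the commands above can be assembled in Python. The sketch below uses only the flags shown in this README; the helper names are hypothetical, and the split into an export step and follow-up steps assumes that a human reviews the queue in between and saves the result as the reviewed feedback file.

```python
import subprocess
from pathlib import Path


def export_command(bookmarks: str, out_dir: str) -> list[str]:
    """Build the command for the run that also writes a review queue into out_dir."""
    queue = (Path(out_dir) / "review-queue.json").as_posix()
    return ["cleanbook", "-i", bookmarks, "-o", f"{out_dir}/",
            "--export-review-queue", queue]


def feedback_commands(reviewed: str, out_dir: str) -> list[list[str]]:
    """Build the apply, train, and audit commands for an already reviewed file."""
    audit = (Path(out_dir) / "feedback-audit.json").as_posix()
    return [
        ["cleanbook", "--apply-feedback", reviewed],
        ["cleanbook", "--train-feedback", reviewed],
        ["cleanbook", "--audit-feedback", reviewed, "--audit-output", audit],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_all([export_command("bookmarks.html", "output")])` followed, after review, by `run_all(feedback_commands("reviewed-feedback.json", "output"))`.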

What it ships

  • cleanbook — the maintained CLI entry point
  • cleanbook-wizard — interactive wizard entry point
  • config.json + taxonomy YAML files — the default classification surface

Project shape

main.py / cleanbook
 -> BookmarkProcessor
 -> classifier orchestration
 -> plugin pipeline
 -> services (feature store, taxonomy, performance, etc.)
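
One way to read the diagram above is as a processor that classifies each bookmark and then hands the results to a chain of plugins. The sketch below is a toy model of that reading, not the real `BookmarkProcessor`: `SketchProcessor`, its fields, and the classify-then-plugins ordering are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

Bookmark = dict
Plugin = Callable[[list[Bookmark]], list[Bookmark]]


@dataclass
class SketchProcessor:
    """Hypothetical stand-in for the processor at the top of the diagram."""

    classify: Callable[[Bookmark], str]
    plugins: list[Plugin] = field(default_factory=list)

    def run(self, bookmarks: list[Bookmark]) -> list[Bookmark]:
        # Classifier step: attach a category to a copy of every bookmark.
        result = [{**b, "category": self.classify(b)} for b in bookmarks]
        # Plugin pipeline: each plugin receives the previous plugin's output.
        for plugin in self.plugins:
            result = plugin(result)
        return result


def drop_uncategorized(bookmarks: list[Bookmark]) -> list[Bookmark]:
    """Example plugin: discard bookmarks the classifier could not place."""
    return [b for b in bookmarks if b["category"] != "uncategorized"]
```

Keeping plugins as plain functions from list to list makes each stage easy to test in isolation, which is the main reason for this shape in the sketch.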

Development

This repository uses OpenSpec as the only active change workflow:

  1. /opsx:explore
  2. /opsx:propose
  3. /opsx:apply
  4. /opsx:archive

Maintained verification baseline:

python3 -m pytest -q tests/test_runtime_paths.py
python3 -m pytest -q
