Web page change tracking with structured diffs. markgrab + snapgrab integration, MCP native.
from diffgrab import DiffTracker tracker = DiffTracker() await tracker.track("https://example.com") changes = await tracker.check() for c in changes: if c.changed: print(c.summary) # "3 lines added, 1 lines removed in sections: Introduction." print(c.unified_diff) # Standard unified diff output await tracker.close()
- Change detection — track any URL, detect content changes via content hashing
- Structured diffs — unified diff + section-level analysis (which headings changed)
- Human-readable summaries — "5 lines added, 2 removed in sections: Intro, Methods"
- Snapshot history — SQLite storage, browse past versions of any page
- markgrab powered — HTML/YouTube/PDF/DOCX extraction via markgrab
- Visual diff — optional screenshot comparison via snapgrab
- MCP server — 5 tools for Claude Code / MCP clients
- CLI included —
diffgrab track,check,diff,history,untrack
flowchart TD
A["diffgrab track URL"] --> B["Fetch initial snapshot\n(markgrab + snapgrab)"]
B --> C["Store baseline"]
C --> D["diffgrab check"]
D --> E["Fetch current page"]
E --> F{"Content\nhash match?"}
F -->|"changed"| G["Compute structured diff\n+ section analysis"]
F -->|"unchanged"| H["No changes"]
G --> I["📊 DiffResult\nadded / removed / modified"]
pip install diffgrab
Optional extras:
pip install 'diffgrab[cli]' # CLI with click + rich pip install 'diffgrab[visual]' # Visual diff with snapgrab pip install 'diffgrab[mcp]' # MCP server with fastmcp pip install 'diffgrab[all]' # Everything
import asyncio from diffgrab import DiffTracker async def main(): tracker = DiffTracker() # Track a URL (takes initial snapshot) await tracker.track("https://example.com", interval_hours=12) # Check for changes changes = await tracker.check() for change in changes: if change.changed: print(change.summary) print(change.unified_diff) # Get diff between specific snapshots result = await tracker.diff("https://example.com", before_id=1, after_id=2) # Browse snapshot history history = await tracker.history("https://example.com", count=20) # Stop tracking await tracker.untrack("https://example.com") await tracker.close() asyncio.run(main())
from diffgrab import track, check, diff, history, untrack await track("https://example.com") changes = await check() result = await diff("https://example.com") snaps = await history("https://example.com") await untrack("https://example.com")
# Track a URL diffgrab track https://example.com --interval 12 # Check all tracked URLs for changes diffgrab check # Check a specific URL diffgrab check https://example.com # Show diff between snapshots diffgrab diff https://example.com diffgrab diff https://example.com --before 1 --after 3 # View snapshot history diffgrab history https://example.com --count 20 # Stop tracking diffgrab untrack https://example.com
Add to your Claude Code MCP config:
{
"mcpServers": {
"diffgrab": {
"command": "diffgrab-mcp",
"args": []
}
}
}Or with uvx:
{
"mcpServers": {
"diffgrab": {
"command": "uvx",
"args": ["--from", "diffgrab[mcp]", "diffgrab-mcp"]
}
}
}MCP Tools:
| Tool | Description |
|---|---|
track_url |
Register a URL for change tracking |
check_changes |
Check tracked URLs for changes |
get_diff |
Get structured diff between snapshots |
get_history |
Browse snapshot history |
untrack_url |
Stop tracking a URL |
Every diff operation returns a DiffResult:
@dataclass class DiffResult: url: str # The tracked URL changed: bool # Whether content changed added_lines: int # Lines added removed_lines: int # Lines removed changed_sections: list[str] # Markdown headings with changes unified_diff: str # Standard unified diff summary: str # Human-readable summary before_snapshot_id: int | None # DB ID of older snapshot after_snapshot_id: int | None # DB ID of newer snapshot before_timestamp: str # When older snapshot was taken after_timestamp: str # When newer snapshot was taken
Snapshots are stored in SQLite at ~/.local/share/diffgrab/diffgrab.db (auto-created). Custom path:
tracker = DiffTracker(db_path="/path/to/custom.db")
| Package | Role | PyPI |
|---|---|---|
| markgrab | HTML/YouTube/PDF/DOCX to markdown | pip install markgrab |
| snapgrab | URL to screenshot + metadata | pip install snapgrab |
| docpick | OCR + LLM document extraction | pip install docpick |
| feedkit | RSS feed collection | pip install feedkit |
| diffgrab | Web page change tracking | pip install diffgrab |
| browsegrab | Browser agent for LLMs | Coming soon |
- newswatch — RSS news monitoring pipeline (feedkit → markgrab → embgrep → diffgrab)
- watchdeck — Web page monitoring with visual diffs and safety guards
Part of the QuartzUnit ecosystem — composable Python libraries for data collection, extraction, search, and AI agent safety.