ArkNill/watchdeck


watchdeck


Korean documentation · llms.txt

Web monitoring pipeline — track page changes, capture visual diffs, and guard against monitoring pitfalls. Built from QuartzUnit libraries.

flowchart LR
 A["🔗 diffgrab\nchange detection"] --> B["📄 markgrab\ncontent extraction"]
 A --> C["📸 snapgrab\nvisual capture"]
 B --> D["🛡️ llm-degen-guard\noutput quality"]
 B --> E["🔄 agent-loop-guard\nloop detection"]
 B --> F["📋 agent-action-policy\naction safety"]

Quick Start

pip install watchdeck
# Add URLs to monitor
watchdeck add https://example.com
watchdeck add https://news.ycombinator.com --interval 12
# Check for changes
watchdeck check
# View history
watchdeck history https://example.com
# See diff between snapshots
watchdeck diff https://example.com

What It Does

  1. Detect — Tracks page changes via diffgrab (content hashing + structured diffs)
  2. Extract — Pulls full content via markgrab for quality validation
  3. Screenshot — Captures visual snapshots via snapgrab on change (optional)
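  4. Guard — Three safety layers:
       • URL policy (agent-action-policy): rejects localhost, private-network, and file:// URLs
       • Loop detection (agent-loop-guard): flags URLs that stay unchanged across consecutive checks
       • Content quality (llm-degen-guard): flags degenerate pages such as CAPTCHA or anti-bot responses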

No cloud services, no API keys. Everything runs locally.
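The Detect step is described as content hashing plus structured diffs. As a rough illustration of the hashing half only, here is a minimal sketch using the standard library. The function names `content_hash` and `has_changed` and the whitespace normalization are assumptions made for this example; they are not diffgrab's actual API or algorithm.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of whitespace-normalized text (hypothetical helper)."""
    # Collapse runs of whitespace so cosmetic reflows do not register as changes.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_text: str, current_text: str) -> bool:
    """Compare two snapshots by digest rather than by full content."""
    return content_hash(previous_text) != content_hash(current_text)
```

Storing only the digest of the previous snapshot makes the "did anything change?" question cheap to answer; the full text is needed only when a diff has to be computed.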

Install

pip install watchdeck

Requirements: Python 3.11+. Screenshots additionally require Playwright (run playwright install chromium).

CLI Reference

watchdeck add <URL>

Add a URL to monitor. Blocked URLs (localhost, private IPs, file://) are automatically rejected.

watchdeck add https://example.com # default: check every 24h
watchdeck add https://news.ycombinator.com -i 12 # check every 12h

| Option | Short | Default | Description |
|---|---|---|---|
| --interval | -i | 24 | Check interval in hours |

watchdeck check

Check all monitored URLs for changes.

watchdeck check # check all
watchdeck check -u https://example.com # specific URL
watchdeck check --screenshots # capture screenshots on change

Output:

 Monitor Check (3 URLs, 1240ms)
┌──────────────────────────┬───────────┬─────────┬──────────┐
│ URL │ Status │ Changes │ Warnings │
├──────────────────────────┼───────────┼─────────┼──────────┤
│ https://example.com │ CHANGED │ +5/-2 │ │
│ https://news.ycombinator │ unchanged │ │ │
│ https://old-page.com │ unchanged │ │ stale │
└──────────────────────────┴───────────┴─────────┴──────────┘
1 changes detected
1 stale URLs (consider reducing frequency)
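The Changes column reads as added and removed line counts (for example +5/-2). One way such a summary could be derived from two text snapshots is sketched below with the standard library's `difflib`. The helper names `count_line_changes` and `format_changes` are hypothetical, and diffgrab may compute its counts differently.

```python
import difflib


def count_line_changes(before: str, after: str) -> tuple[int, int]:
    """Return (added, removed) line counts between two snapshots (hypothetical helper)."""
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" contributes to both counts; "equal" contributes to neither.
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def format_changes(before: str, after: str) -> str:
    """Render the counts in the +N/-M style used by the table above."""
    added, removed = count_line_changes(before, after)
    return f"+{added}/-{removed}"
```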

watchdeck remove <URL>

Stop monitoring a URL.

watchdeck history <URL>

Show snapshot history.

watchdeck history https://example.com -n 10

watchdeck diff <URL>

Show diff between snapshots.

watchdeck diff https://example.com
watchdeck diff https://example.com --before 1 --after 3

Python API

import asyncio

from watchdeck import WatchDeck


async def main():
    deck = WatchDeck()

    # Add URLs (safety policy auto-applied)
    await deck.add("https://example.com", interval_hours=12)
    await deck.add("http://localhost:8080")  # → blocked by policy

    # Check for changes
    report = await deck.check()
    for result in report.results:
        if result.changed:
            print(f"{result.url}: {result.summary}")
        if result.stale_warning:
            print(f"  ⚠ {result.stale_warning}")
        if result.content_warning:
            print(f"  ⚠ {result.content_warning}")

    # History and diffs
    snapshots = await deck.history("https://example.com")
    diff = await deck.diff("https://example.com")

    await deck.close()


asyncio.run(main())

Safety Guards

watchdeck integrates three QuartzUnit guard libraries to prevent common monitoring pitfalls:

URL Policy (agent-action-policy)

Automatically blocks monitoring of:

  • localhost, 127.0.0.1
  • Private networks (192.168.*, 10.*, 172.16-31.*)
  • file:// URLs

# Inside an async function:
deck = WatchDeck()
success, msg = await deck.add("http://192.168.1.1/admin")
# success=False, msg="Cannot monitor internal/private network URLs"
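To make the blocking rules above concrete, the following is a minimal sketch of a comparable check built on `urllib.parse` and `ipaddress` from the standard library. The function `is_blocked_url` is hypothetical and is not the agent-action-policy API. Rejecting every non-HTTP(S) scheme is an assumption that goes beyond the file:// rule listed above, and hostnames are not resolved through DNS here.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_url(url: str) -> bool:
    """Return True for URLs that point at local or private targets (hypothetical helper)."""
    parsed = urlparse(url)
    # Assumption: only http/https are allowed, which also rejects file:// URLs.
    if parsed.scheme not in ("http", "https"):
        return True
    host = parsed.hostname
    if not host:
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A regular hostname; this sketch does not resolve it to an IP address.
        return False
    # Covers 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, and ::1.
    return address.is_private or address.is_loopback
```

A production policy would likely also resolve hostnames before deciding, since a public-looking name can point at a private address.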

Loop Detection (agent-loop-guard)

Detects when a URL hasn't changed for N consecutive checks and suggests reducing frequency:

⚠ URL unchanged for 5 consecutive checks — consider reducing frequency
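The stale warning amounts to a per-URL counter of consecutive unchanged checks. A minimal sketch of that idea follows; the `StaleTracker` class, its `record` method, and the default threshold of 5 (taken from the example message above) are assumptions for illustration, not the agent-loop-guard API.

```python
from collections import defaultdict
from typing import Optional


class StaleTracker:
    """Counts consecutive unchanged checks per URL (hypothetical helper)."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self._unchanged = defaultdict(int)

    def record(self, url: str, changed: bool) -> Optional[str]:
        """Record one check result and return a warning once the threshold is reached."""
        if changed:
            # Any change resets the streak for this URL.
            self._unchanged[url] = 0
            return None
        self._unchanged[url] += 1
        count = self._unchanged[url]
        if count >= self.threshold:
            return (
                f"URL unchanged for {count} consecutive checks; "
                "consider reducing frequency"
            )
        return None
```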

Content Quality (llm-degen-guard)

Flags pages that return garbage content (CAPTCHA pages, bot detection, repetitive filler):

⚠ Content appears degenerate (score=0.78) — possible CAPTCHA or anti-bot page
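How llm-degen-guard computes its score is not documented here. As one plausible illustration of scoring repetitive filler, the sketch below measures the fraction of repeated word trigrams in a page's text. The function names and the 0.5 threshold are assumptions chosen for the example and should not be read as the library's method.

```python
def repetition_score(text: str, n: int = 3) -> float:
    """Return the fraction of word n-grams that repeat an earlier one, from 0.0 to 1.0."""
    words = text.lower().split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    # All-unique text scores 0.0; text that loops over a short phrase approaches 1.0.
    return 1.0 - len(set(ngrams)) / len(ngrams)


def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Flag text whose repetition score exceeds a hypothetical threshold."""
    return repetition_score(text) > threshold
```

A repetition measure like this catches looping filler but not a short, non-repetitive CAPTCHA page, so a real guard would presumably combine several signals.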

Configuration

Data is stored in ~/.watchdeck/ by default:

~/.watchdeck/
├── tracker.db # diffgrab snapshots + change history

Custom location:

deck = WatchDeck(db_dir="/path/to/data")

How It Works

flowchart TD
 A["watchdeck add URL"] --> B["Initial snapshot\n(diffgrab + markgrab + snapgrab)"]
 B --> C["watchdeck check"]
 C --> D{"Content\nchanged?"}
 D -->|"yes"| E["Compute diff\n+ screenshot\n+ guard checks"]
 D -->|"no"| F["Check stale threshold"]
 E --> G["📊 Report"]
 F --> G

QuartzUnit Libraries Used

| Library | Role in watchdeck | PyPI |
|---|---|---|
| diffgrab | Page change detection + structured diffs | pip install diffgrab |
| markgrab | Content extraction for quality checks | pip install markgrab |
| snapgrab | Visual screenshot capture on change | pip install snapgrab |
| agent-action-policy | URL safety policy (block internal IPs) | pip install agent-action-policy |
| agent-loop-guard | Stale monitoring detection | pip install agent-loop-guard |
| llm-degen-guard | Garbage content detection | pip install llm-degen-guard |

See also: newswatch — news monitoring pipeline (feedkit + markgrab + embgrep + diffgrab)

License

MIT


Part of the QuartzUnit ecosystem — composable Python libraries for data collection, extraction, search, and AI agent safety.
