Releases: Agents365-ai/paper-fetch
v0.5.0 — Agent-native CLI (real)
The agent-native release
paper-fetch is now a reference-quality agent-native CLI. Scored against the agent-native CLI rubric, the tool moves from 23 / 28 (partially agent-native) to 28 / 28, clearing every one of the seven principles: structured output, delegated auth, self-description, graduated safety, boundary validation, schema-as-truth, and three-audience design.
The resolution chain, host allowlist, and download hardening are unchanged. The contract, self-description, and failure routing are what changed.
Highlights
Self-description
- New `schema` subcommand. `python scripts/fetch.py schema` emits a full machine-readable description of the CLI: parameters, types, exit codes, error codes, envelope shapes, environment variables. Runs offline. Agents should cache it against `schema_version` and re-read on drift.
- Every envelope carries a `meta` slot: `request_id`, `latency_ms`, `schema_version`, `cli_version`, `sources_tried`. Orchestrators can correlate logs, track SLOs, and detect schema drift without parsing prose.
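A minimal sketch of the cache-and-re-read pattern described above. It assumes the `schema` subcommand prints JSON on stdout and that `meta.schema_version` is a plain string; `SchemaCache` and `load_schema_from_cli` are hypothetical helper names, not part of paper-fetch.

```python
import json
import subprocess
from typing import Callable, Optional


def load_schema_from_cli() -> dict:
    """Run the schema subcommand and parse stdout (assumed to be JSON)."""
    out = subprocess.run(
        ["python", "scripts/fetch.py", "schema"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


class SchemaCache:
    """Hypothetical helper: holds one schema, keyed by the last schema_version seen."""

    def __init__(self, loader: Callable[[], dict] = load_schema_from_cli) -> None:
        self._loader = loader
        self._version: Optional[str] = None
        self._schema: Optional[dict] = None

    def for_envelope(self, envelope: dict) -> dict:
        """Return the cached schema, reloading when meta.schema_version has drifted."""
        seen = envelope.get("meta", {}).get("schema_version")
        if self._schema is None or seen != self._version:
            self._schema = self._loader()
            self._version = seen
        return self._schema
```

Passing the loader as a parameter keeps the drift check testable without invoking the CLI.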
Output contract
- `ok: "partial"` on mixed batches, with a `next` slot suggesting the exact command that retries only the failed subset.
- TTY-aware `--format` default: `json` when stdout is piped or captured, `text` in a terminal. Explicit `--format` overrides. Agents no longer have to remember the flag.
- `--stream` NDJSON mode: one result per line as each DOI resolves, then a final summary line. Useful for long batches.
- `--pretty` for indented JSON.
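A rough sketch of consuming `--stream` output, relying only on the stated shape: one JSON object per line, with the last line being the summary. The helper name `iter_stream` is hypothetical, and the field names inside each record are not assumed here.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_stream(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield ("result", obj) for every record except the last, then ("summary", obj).

    Uses a one-line lookahead, so each result is emitted as soon as the
    following line arrives and the final line is recognised when the stream ends.
    """
    previous = None
    for raw in lines:
        if not raw.strip():
            continue  # tolerate blank lines
        current = json.loads(raw)
        if previous is not None:
            yield "result", previous
        previous = current
    if previous is not None:
        yield "summary", previous
```

In practice `lines` could be the stdout pipe of a `subprocess.Popen` running the CLI with `--stream`.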
Progress and observability
- NDJSON progress events on stderr in JSON mode: `start`, `source_try`, `source_hit`, `source_miss`, `source_skip`, `source_enrich`, `source_enrich_failed`, `download_ok`, `download_error`, `download_skip`, `dry_run`, `not_found`, `update_check_spawned`. Every event carries `request_id` and `elapsed_ms` for correlation with the final stdout envelope.
- Auto-update now emits `update_check_spawned` so the silent background `git pull` is observable.
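One way an orchestrator might correlate stderr events with the stdout envelope, sketched under the assumption that each event is a JSON object containing `request_id`. The function names are hypothetical, and the key that holds the event name is not assumed.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_events(stderr_lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group NDJSON progress events by request_id, skipping non-JSON lines."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for raw in stderr_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate stray human-readable lines such as warnings
        if isinstance(event, dict) and "request_id" in event:
            grouped[event["request_id"]].append(event)
    return dict(grouped)


def events_for(envelope: dict, grouped: Dict[str, List[dict]]) -> List[dict]:
    """Return the progress events that share the envelope's meta.request_id."""
    return grouped.get(envelope.get("meta", {}).get("request_id"), [])
```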
Retries and idempotency
- `--idempotency-key KEY`: re-running with the same key replays the original envelope from `<out>/.paper-fetch-idem/`, re-stamps `meta.request_id` and `meta.latency_ms`, and sets `meta.replayed_from_idempotency_key`. No network I/O.
- Skip-if-exists by default; `--overwrite` to force re-download.
- `not_found` is now `retryable: true` with `retry_after_hours: 168`, since OA availability changes over time (embargoes lift, preprints appear).
- Metadata enrichment from Semantic Scholar when Unpaywall returns a PDF URL but omits author/title: filename goes from `unknown_2021_Highly_accurate_protein_structure_predic.pdf` to `Jumper_2021_Highly_accurate_protein_structure_predic.pdf`.
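A small sketch of how a caller might turn the `retryable` and `retry_after_hours` hints into a schedule. Where these fields sit inside the envelope is an assumption; the function takes the error object directly, and `next_attempt` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_attempt(error: dict, now: Optional[datetime] = None) -> Optional[datetime]:
    """Return when to retry, or None if the error is not marked retryable."""
    if not error.get("retryable"):
        return None
    now = now or datetime.now(timezone.utc)
    # A missing retry_after_hours is treated as "retry immediately" (assumption).
    return now + timedelta(hours=error.get("retry_after_hours", 0))
```

With `retry_after_hours: 168`, a `not_found` seen today would be rescheduled one week out.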
Exit code taxonomy
`0` success, `1` unresolved (no OA copy; not retryable now), `2` reserved for auth, `3` validation, `4` transport (network/IO, retryable). Orchestrators can now route failures deterministically without parsing JSON.
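The taxonomy lends itself to a lookup table. The sketch below mirrors the five codes listed above; the action names (`done`, `defer`, and so on) are hypothetical orchestrator labels, not anything paper-fetch emits.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    UNRESOLVED = 1      # no OA copy; not retryable now
    AUTH_RESERVED = 2   # reserved for auth
    VALIDATION = 3
    TRANSPORT = 4       # network/IO, retryable


# Hypothetical mapping from exit code to an orchestrator action.
_ACTIONS = {
    ExitCode.SUCCESS: "done",
    ExitCode.UNRESOLVED: "defer",
    ExitCode.AUTH_RESERVED: "escalate",
    ExitCode.VALIDATION: "fix_input",
    ExitCode.TRANSPORT: "retry",
}


def route(exit_code: int) -> str:
    """Pick an action from the process exit code alone; unknown codes escalate."""
    try:
        return _ACTIONS[ExitCode(exit_code)]
    except ValueError:
        return "escalate"
```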
New flags and environment
- `--timeout SECONDS`: override the default 30s HTTP timeout.
- `paper-fetch -` / `--batch -`: read DOIs line-by-line from stdin.
- `PAPER_FETCH_ALLOWED_HOSTS`: comma-separated hosts that extend the hard-coded download allowlist.
- `--version` reports `paper-fetch <cli> (schema <schema>)`.
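As an illustration, the `--version` line can be split into its two version numbers with a regular expression. This is a sketch that assumes the output matches the template above exactly, with whitespace-free version strings; `parse_version` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class Versions(NamedTuple):
    cli: str
    schema: str


# Matches the documented template: paper-fetch <cli> (schema <schema>)
_VERSION_RE = re.compile(r"^paper-fetch\s+(\S+)\s+\(schema\s+(\S+)\)\s*$")


def parse_version(line: str) -> Versions:
    """Extract the CLI and schema versions from a --version line."""
    match = _VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised --version output: {line!r}")
    return Versions(cli=match.group(1), schema=match.group(2))
```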
Backwards compatibility
Existing invocations still work:
- `python scripts/fetch.py <doi>`
- `python scripts/fetch.py --batch dois.txt --out ./papers`
- `python scripts/fetch.py <doi> --dry-run`
- `python scripts/fetch.py <doi> --format text`
Things an older consumer might trip on:
- `response.ok` can now be the string `"partial"` on mixed batches. Truthy in every language I care about, but `response.ok === true` would miss partial.
- Exit code `1` previously meant "any runtime failure"; it now means specifically "unresolved" (no OA copy found). Transport failures are now exit `4`. An orchestrator that hardcoded `exit == 1` to mean "some DOIs failed due to anything" will miss transport failures and should switch to `exit != 0` or check the stdout envelope.
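A consumer-side sketch that handles both pitfalls: it treats `ok` as a three-way value rather than a boolean, and it detects failure with `exit != 0` or the envelope rather than `exit == 1`. The helper names are hypothetical.

```python
def batch_status(envelope: dict) -> str:
    """Classify the envelope's ok field as success, partial, or failure."""
    ok = envelope.get("ok")
    if ok is True:
        return "success"
    if ok == "partial":
        return "partial"
    return "failure"


def any_failure(exit_code: int, envelope: dict) -> bool:
    """True if anything went wrong, whatever the specific exit code."""
    return exit_code != 0 or batch_status(envelope) != "success"
```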
Versioning
- `CLI_VERSION` 0.3.0 → 0.5.0 (skipping 0.4.x because an unrelated `v0.4.0` tag already existed on origin pointing at a different commit)
- `SCHEMA_VERSION` 1.1.0: stable. The new stderr events and `meta` fields are additive. Agents caching schema against `schema_version` do not need to re-discover.
Auto-update
Users who installed via git clone will pick this up automatically on the second invocation after their 24h cooldown elapses (the first invocation spawns the background pull; the second sees the new code). Force immediately with rm <skill_dir>/.git/.paper-fetch-last-update. Disable with PAPER_FETCH_NO_AUTO_UPDATE=1.
v0.4.0 — Silent background self-update
Commit f7c9359
Highlights
Silent background self-update (#3)
When installed via git clone, paper-fetch now keeps itself in sync with upstream automatically — no cron, no launchd, no hooks, no manual git pull. Each invocation spawns a detached background git pull --ff-only in the skill directory.
- Non-blocking: current call is not delayed; `start_new_session=True` fully detaches
- Silent: all output to `/dev/null`; JSON contract on stdout is never polluted
- Throttled: at most once per 24h via `.git/.paper-fetch-last-update`
- Safe: `--ff-only` refuses to merge if you have local edits
- Convergence: updates apply on the next invocation
Kill switches:
export PAPER_FETCH_NO_AUTO_UPDATE=1      # disable entirely
export PAPER_FETCH_UPDATE_INTERVAL=3600  # 1h cooldown instead of 24h
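The behaviour described above could be sketched roughly as follows. This is not the actual implementation: it assumes the throttle compares the stamp file's modification time against the interval and that the kill switch is checked as the literal value `1`. The function names are hypothetical.

```python
import os
import subprocess
import time
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_INTERVAL = 24 * 3600  # 24h cooldown, in seconds


def update_due(stamp: Path, env: Mapping[str, str], now: Optional[float] = None) -> bool:
    """Decide whether a background pull should be spawned."""
    if env.get("PAPER_FETCH_NO_AUTO_UPDATE") == "1":
        return False
    interval = int(env.get("PAPER_FETCH_UPDATE_INTERVAL", DEFAULT_INTERVAL))
    now = time.time() if now is None else now
    try:
        last = stamp.stat().st_mtime  # assumption: the stamp's mtime marks the last check
    except FileNotFoundError:
        return True
    return now - last >= interval


def spawn_update(skill_dir: Path) -> None:
    """Fire-and-forget `git pull --ff-only`, detached from the current call."""
    stamp = skill_dir / ".git" / ".paper-fetch-last-update"
    if not update_due(stamp, os.environ):
        return
    stamp.touch()
    subprocess.Popen(
        ["git", "pull", "--ff-only"],
        cwd=skill_dir,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # keep the stdout JSON contract clean
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # detach so the caller never waits
    )
```

Touching the stamp before spawning means a failed pull is not retried until the cooldown elapses, which is one possible trade-off rather than a confirmed design.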
Online landing page (#2)
Live at https://agents365-ai.github.io/paper-fetch/ (English) and zh.html (中文). Includes hero, features, 5-source resolution walkthrough, discipline coverage table, comparison vs native agent, and install tabs for all 6 platforms.
Discipline-agnostic clarification (#1)
New "Discipline Coverage" / "学科覆盖" section in both READMEs. The skill works for any field — humanities, social sciences, chemistry, economics, psychology, materials — not just life sciences or CS. Unpaywall + Semantic Scholar cover every Crossref DOI; arXiv/PMC/bioRxiv are additional fallbacks for their specific domains.
Upgrading
Existing installs: nothing to do. On your next paper-fetch call (or within 24h for already-updated installs), the background self-updater will pull this release. From v0.4.0 onward, all future updates are fully automatic.
New installs: unchanged.
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch
Full changelog
- feat: add silent background self-update on each invocation (#3)
- docs: add online landing page (GitHub Pages) (#2)
- docs: clarify that paper-fetch is discipline-agnostic (#1)
Full diff: v0.3.0...v0.4.0
v0.3.0 — Agent-native CLI
What's Changed
Agent-native CLI (scripts/fetch.py)
- Structured JSON output: stable `{"ok": bool, "data": ...}` envelope on stdout
- `--format json|text`: JSON for agents (default), text for humans
- `--dry-run`: resolve sources without downloading
- Distinct exit codes: 0=success, 1=runtime, 3=validation
- Granular error codes: `not_found`, `download_network_error`, `download_not_a_pdf`, `download_host_not_allowed`, `download_size_exceeded`, `download_io_error`, `internal_error`
- `UNPAYWALL_EMAIL` now optional: warns on stderr, skips Unpaywall, remaining 4 sources still work
- Download safety: host allowlist (27 known OA domains) + 50 MB per-PDF size limit
- Top-level exception wrapper: no tracebacks on stdout
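To illustrate the download-safety checks, here is a sketch that maps a URL and a reported size onto two of the error codes listed above. The subdomain-matching rule, the binary interpretation of "50 MB", and the function names are all assumptions; the real allowlist entries are not reproduced here, so callers pass their own set.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

MAX_PDF_BYTES = 50 * 1024 * 1024  # 50 MB; treating MB as MiB is an assumption


def host_allowed(url: str, hosts: Iterable[str]) -> bool:
    """True if the URL's host equals an allowed host or is a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in hosts)


def check_download(url: str, content_length: Optional[int], hosts: Iterable[str]) -> Optional[str]:
    """Return an error code from the release notes, or None if the download may proceed."""
    hosts = list(hosts)
    if not host_allowed(url, hosts):
        return "download_host_not_allowed"
    if content_length is not None and content_length > MAX_PDF_BYTES:
        return "download_size_exceeded"
    return None
```

A server may omit or misreport its content length, so a real downloader would also need to count bytes while streaming; that part is left out of the sketch.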
Platform support
- Added pi-mono (`metadata.pimo` namespace)
Documentation
- Partial-failure batch example in SKILL.md
- `--dry-run` and `--format text` examples in both READMEs
- Support and Author sections
Full Changelog: v0.1.0...v0.3.0
v0.1.0 — Initial release
paper-fetch v0.1.0
Legal open-access PDF downloader for academic papers.
Features
- 5-source fallback chain: Unpaywall → Semantic Scholar `openAccessPdf` → arXiv → PubMed Central OA → bioRxiv/medRxiv
- Zero dependencies: pure Python standard library
- Batch mode: `--batch dois.txt`
- Auto-named output: `{author}_{year}_{title}.pdf`
- Never uses Sci-Hub or any paywall-bypass service
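The fallback chain and the naming template can be sketched with the standard library alone. Both helpers are hypothetical: the resolver callables stand in for the five real sources, and the 40-character title cutoff is inferred from the example filename in the v0.5.0 notes rather than confirmed.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# A resolver maps a DOI to a PDF URL, or returns None on a miss.
Resolver = Callable[[str], Optional[str]]


def resolve(doi: str, chain: Sequence[Tuple[str, Resolver]]) -> Tuple[Optional[str], List[str]]:
    """Try each source in order; stop at the first hit and report which were tried."""
    tried: List[str] = []
    for name, resolver in chain:
        tried.append(name)
        url = resolver(doi)
        if url:
            return url, tried
    return None, tried


def make_filename(author: str, year: int, title: str, max_title: int = 40) -> str:
    """Build {author}_{year}_{title}.pdf with a sanitised, truncated title."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")[:max_title]
    return f"{author}_{year}_{slug}.pdf"
```

Returning the list of sources tried alongside the URL keeps the resolution path visible to the caller.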
Multi-Platform Support
Claude Code · OpenClaw / ClawHub · Hermes Agent · OpenAI Codex · SkillsMP
Setup
export UNPAYWALL_EMAIL=you@example.com
python scripts/fetch.py 10.1038/s41586-021-03819-2