Releases: snapsynapse/substack2md

substack2md v2.1.2

31 May 20:00
@snapsynapse snapsynapse

Release date: 2026-05-31
Commit: 9d15ae2
Tag: v2.1.2

Changes

  • Align assistant-guide documentation and metadata with the current GuideCheck Level 3 posture.
  • Serve the GuideCheck assistant guide from the project GitHub Pages .well-known URL so verifiers can discover the standard location.
  • Keep repository, root, and Pages guide copies byte-identical and mirror the manifest into the Pages source.
  • Move the GuideCheck canonical URL to https://substack2md.space/.well-known/assistant-guide.txt.
  • Use the GuideCheck text manifest format and Assistant Guide: title prefix expected by the hosted verifier.
  • Align public release surfaces with v2.1.2 after tagging: landing page version/date metadata, current frontmatter examples, batch command example, release compare links, and bare-domain URL conventions.
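
The byte-identical requirement for the guide copies can be checked mechanically. The following is only a minimal sketch of one way to do it, not the project's actual test (the notes mention tests/test_assistant_guide.py, whose contents are not shown here). The helper names `sha256_of` and `copies_are_identical` are hypothetical, and the caller supplies whichever file paths hold the copies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copies_are_identical(paths: list[Path]) -> bool:
    """True when every listed file has exactly the same bytes.

    Hypothetical helper: comparing digests of the raw bytes means that
    even a line-ending or trailing-newline difference counts as a mismatch.
    """
    digests = {sha256_of(p) for p in paths}
    return len(digests) == 1
```

Hashing raw bytes rather than decoded text is the point of the design: a copy that differs only in line endings would still fail the comparison.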

Verification

  • .venv/bin/python -m pytest: 75 passed, 1 skipped.
  • .venv/bin/python -m ruff check .: all checks passed.
  • .venv/bin/python -m substack2md --version: substack2md 2.1.2.
  • Local assistant-guide checks passed as part of tests/test_assistant_guide.py.
  • Live Substack smoke test was skipped because SUBSTACK2MD_LIVE was not enabled.
  • No checker behavior changed in this release-surface cleanup.

Residual Risk

  • GitHub Pages may not honor docs/_headers; hosted GuideCheck can still warn about response headers if https://substack2md.space/ is served directly by GitHub Pages.
  • GuideCheck conformance verifies guide structure, not publisher identity, dependency safety, or runtime behavior.

substack2md 2.1.1

25 May 04:23
@snapsynapse snapsynapse

Fixed

  • Move the GuideCheck canonical guide path to .well-known/assistant-guide.txt while keeping the root copy byte-identical for repository discovery.
  • Point package metadata and README instructions at the raw .well-known guide URL to avoid verifier size failures from rendered GitHub HTML.
  • Pin the assistant-guide manifest immutable URL to the commit containing the canonical .well-known guide.

Verification

  • pytest: 74 passed, 1 skipped
  • ruff check: passed
  • ruff format --check: passed
  • CLI version: substack2md 2.1.1

substack2md 2.1.0

25 May 04:17
@snapsynapse snapsynapse

Added

  • GuideCheck assistant-guide.txt AI-assisted install path with sidecar manifest and package metadata link.
  • Documentation for verifying and maintaining the AI-assisted install guide.

Fixed

  • Confine publication mapping output directories to base_dir; absolute paths and .. components are rejected.
  • Apply the configured CDP timeout to the initial /json/version discovery request.
  • Raise ImportError for missing runtime dependencies instead of terminating the importing process with sys.exit.
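
The path-confinement fix in the first bullet can be illustrated as follows. This is a sketch under stated assumptions, not the code that shipped: the function name `confine_to_base` and the choice of `ValueError` are hypothetical, and only the two rejection rules (absolute paths and `..` components) come from the notes above.

```python
from pathlib import Path


def confine_to_base(base_dir: Path, mapped: str) -> Path:
    """Join a mapped output directory onto base_dir, refusing escapes.

    Hypothetical helper: rejects absolute paths and any '..' component,
    the two cases the release notes say are no longer accepted.
    """
    candidate = Path(mapped)
    if candidate.is_absolute():
        raise ValueError(f"absolute path not allowed: {mapped!r}")
    if ".." in candidate.parts:
        raise ValueError(f"'..' component not allowed: {mapped!r}")
    return base_dir / candidate
```

Rejecting `..` outright, instead of resolving the path and comparing prefixes, keeps the rule easy to explain in an error message; whether the project does it this way is an assumption.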

Tests

  • Regression coverage for CDP timeout/cleanup behavior, output path confinement, missing-dependency import behavior, and GuideCheck byte-profile/manifest checks.

substack2md v2.0.0

16 Apr 16:58
@snapsynapse snapsynapse

[2.0.0] - 2026-04-16

Major release. Restructures the project from a single-file script into an installable package with a console entry point, ships a much larger feature set, and tightens every user-facing surface. All Python-level library imports (import substack2md; substack2md.fetch_paywall_status(...)) remain backward compatible.

Breaking changes

  • Removed the flat substack2md.py file. Invocation has moved from python substack2md.py URL to the installed console script substack2md URL (or python -m substack2md URL). After pip install . the CLI is on your PATH.
  • Removed requirements.txt and tests/requirements-dev.txt. pyproject.toml is now the single source of truth for dependencies. Use pip install . or pip install -e ".[dev]" instead.

Migration

v1.x                                         | v2.0.0
---------------------------------------------+-----------------------------------
git clone && pip install -r requirements.txt | git clone && pip install .
python substack2md.py URL                    | substack2md URL
python substack2md.py --urls-file urls.txt   | substack2md --urls-file urls.txt
pip install -r tests/requirements-dev.txt    | pip install -e ".[dev]"

Added

  • --detect-paywall flag (originally #1 from @drewid74): queries Substack's public /api/v1/posts/{slug} API to classify posts as free or subscriber-only. Writes is_paid and audience fields to YAML frontmatter. Opt-in, graceful fallback to null on API errors, no additional authentication required.
  • --concurrency N flag: opt-in parallel processing. Defaults to 1 (sequential). Posts from the same publication are still serialized via per-host locks to avoid bot heuristics; parallelism is across different publications only.
  • Resume-from-interrupt: every successfully written URL is appended to <base-dir>/.substack2md-state. Subsequent runs skip already-completed URLs before any network call. Pass --no-resume to disable. Clean KeyboardInterrupt handling reports progress before exit.
  • --log-level {DEBUG,INFO,WARNING,ERROR}, --quiet/-q, --version flags. Diagnostics now flow through the logging module; [ok]/[skip] progress becomes INFO-level so --quiet can suppress them cleanly.
  • Teaser-warning detection: when --detect-paywall reports a paid post and the extracted body is under 300 words, substack2md logs a warning that you may have only captured the teaser and need to authenticate in the CDP-connected browser.
  • Custom-domain Substack support: publications with custom domains (e.g. stratechery.com) now route paywall API calls to their canonical <pub>.substack.com subdomain via the new resolve_substack_canonical() helper.
  • Richer tag extraction: merges <meta name="keywords">, ld+json keywords, and ld+json articleSection before normalization, so posts get the author's real taxonomy instead of just ["substack"].
  • --from-md + --detect-paywall: backfill paywall metadata on existing markdown archives without re-fetching HTML.
  • launch-browser.sh: macOS helper that detects Brave or Chrome, isolates a dedicated CDP profile at $HOME/.*-cdp-profile, opens port 9222 on loopback, and verifies the endpoint before exiting.
  • CI via GitHub Actions: pytest and ruff run on push and PR across Python 3.10, 3.11, 3.12, 3.13.
  • Test suite: 58 tests covering audience decoding, HTTP failure modes, request shape, frontmatter serialization, CLI wiring, publication-slug edges, canonical resolution, tag extraction, teaser warning, resume state, and concurrency.
  • Docs: CONTRIBUTING.md, SECURITY.md, CHANGELOG.md, and tests/EVALS.md.
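
The per-host serialization described for --concurrency can be sketched roughly like this. It is an illustration of the general technique only: `lock_for`, `process_all`, and the `fetch` callback are hypothetical names, and the real pipeline in cli.py may be organized quite differently.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# One lock per host, created on first use (hypothetical registry).
_host_locks: dict[str, threading.Lock] = defaultdict(threading.Lock)
_registry_lock = threading.Lock()


def lock_for(url: str) -> threading.Lock:
    """Return the lock shared by every URL on the same host."""
    host = urlparse(url).netloc.lower()
    with _registry_lock:  # guard the registry itself against races
        return _host_locks[host]


def process_all(urls, fetch, concurrency=1):
    """Run fetch(url) for each URL; same-host URLs never overlap."""

    def worker(url):
        with lock_for(url):  # serialize requests to one publication
            return fetch(url)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in the order of the input URLs
        return list(pool.map(worker, urls))
```

With this shape, raising the worker count only helps when the URL list spans several hosts, which matches the note that parallelism applies across different publications only.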

Changed

  • Package layout: substack2md.py → substack2md/ package with _core.py (library), cli.py (pipeline + main), _version.py (single-source version), __main__.py (enables python -m substack2md).
  • pyproject.toml: registers substack2md as a console script, declares runtime and dev dependencies, sets Python ≥ 3.10, wires the ruff lint/format config and the pytest config.
  • README: rewritten installation and Quick Start for the package flow; full and current CLI reference; paywall section documents all four audience enum values; links to launch-browser.sh and CONTRIBUTING.md.
  • Logging: print(..., file=sys.stderr) replaced with the substack2md logger. Formatter includes level + logger name so downstream aggregators can filter.
  • Code style: full ruff format pass applied. 100-char soft limit, isort-sorted imports, modern type-hint syntax under UP.
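
To show what a formatter that includes level and logger name might look like, here is a small sketch using the standard logging module. The logger name `substack2md` comes from the notes; the function `configure_logging`, the exact format string, and the mapping of a quiet flag to WARNING are assumptions made for illustration.

```python
import logging


def configure_logging(level: str = "INFO", quiet: bool = False) -> logging.Logger:
    """Attach a stderr handler whose format carries level and logger name.

    Hypothetical helper: the format string below is an assumption, not
    the project's actual one.
    """
    logger = logging.getLogger("substack2md")
    handler = logging.StreamHandler()  # StreamHandler writes to stderr by default
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.handlers[:] = [handler]  # replace handlers so repeated calls do not duplicate output
    # Assumed behavior: quiet raises the threshold so INFO progress lines are dropped.
    logger.setLevel(logging.WARNING if quiet else getattr(logging, level))
    return logger
```

Because the [ok]/[skip] progress lines are INFO-level, raising the threshold to WARNING is one plausible way for a quiet mode to hide them while still showing warnings and errors.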

Fixed

  • Founding-tier posts (audience: founding) are now correctly classified as is_paid: true. Previously the check matched only the exact value only_paid, so founding-tier posts were silently recorded as free.
  • Missing audience field in a 200 response now returns (is_paid=None, audience=None) instead of defaulting to "everyone"/False. Matches the documented null-on-uncertainty contract.
  • Unknown audience values (future Substack tiers) are preserved verbatim as audience but is_paid is left as null so downstream treats the post as "status unknown" rather than silently free.
  • datetime.utcnow() deprecation: swapped for datetime.now(timezone.utc) so Python 3.12+ runs without DeprecationWarning and the code keeps working once utcnow is removed in a future Python release.
  • CDP target leak: CDPClient.fetch_html now wraps navigate/eval in try/finally so Target.closeTarget always runs, preventing tab-pool exhaustion during long batches.
  • --timeout not threaded through to paywall API: the CLI --timeout value now reaches fetch_paywall_status instead of being hardcoded to 10s.
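
The three audience fixes above amount to a small decision table, sketched here. This is an assumption-laden illustration, not the body of fetch_paywall_status: the function `classify_audience` is hypothetical, and it covers only the three audience values named in these notes (everyone, only_paid, founding). Any other value, including the fourth documented enum value that is not named here, would fall into the "unknown" branch of this sketch.

```python
from typing import Optional

# Only the values named in the release notes; the full enum is not shown there.
PAID_AUDIENCES = {"only_paid", "founding"}
FREE_AUDIENCES = {"everyone"}


def classify_audience(payload: dict) -> tuple[Optional[bool], Optional[str]]:
    """Map an API payload to (is_paid, audience) with null on uncertainty."""
    audience = payload.get("audience")
    if not isinstance(audience, str) or not audience:
        return None, None  # missing field: no guess either way
    if audience in PAID_AUDIENCES:
        return True, audience
    if audience in FREE_AUDIENCES:
        return False, audience
    return None, audience  # unknown tier: keep the value, leave is_paid null
```

The key design choice is that uncertainty never collapses to "free": a missing or unrecognized audience yields is_paid of None, so downstream consumers can treat the post as status unknown.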

Contributors

drewid74

v1.1.0 — Initial Release

27 Feb 23:51
@snapsynapse snapsynapse

What is substack2md?

Convert any Substack post to clean, Obsidian-friendly Markdown — using your own authenticated browser session via Chrome DevTools Protocol (CDP). No API keys, no passwords stored, no scraping hacks.

Features

  • Authenticated via CDP: uses your already-logged-in browser, so paywalled posts just work
  • Batch processing: pass a single URL or a text file with hundreds of URLs
  • Obsidian wikilinks: internal cross-references are automatically rewritten as [[slug]] links
  • Rich frontmatter: title, author, publication, dates, tags, canonical URL, and more
  • Publication folders: output is organized by publication with configurable name mappings
  • Transcript cleaning: strips timestamps and speaker labels from podcast transcripts
  • Polite by default: configurable sleep between requests
Supported Browsers

  • Brave (recommended)
  • Chrome (Intel & Apple Silicon)

Requirements

  • Python 3.8+
  • Dependencies: websocket-client, beautifulsoup4, readability-lxml, markdownify

Quick Start

git clone https://github.com/snapsynapse/substack2md.git
cd substack2md
pip install -r requirements.txt
python substack2md.py https://example.substack.com/p/some-post

See the README for full setup instructions including how to launch your browser with remote debugging enabled.
