Releases: ArkNill/markgrab

v0.2.0

24 Apr 01:28
@github-actions



v0.1.3

13 Apr 12:38
@ArkNill


Bump version to 0.1.3


v0.1.2

17 Mar 01:48
@ArkNill


Changes

  • Add MCP server module (markgrab-mcp) with 2 tools: extract_url, extract_multiple
  • Add [mcp] optional dependency (pip install "markgrab[mcp]")
  • Register on MCP official registry (io.github.ArkNill/markgrab)
  • 114 tests, all passing
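
For orientation, an MCP client invokes a server tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the `extract_url` tool. The helper name `build_tool_call` and the `url` argument name are assumptions for illustration; they are not taken from markgrab's published tool schema, so check the server's tool listing for the real parameter names.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as JSON-RPC 2.0 (hypothetical helper)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The "url" argument name is an assumption, not confirmed by the release notes.
request = build_tool_call(1, "extract_url", {"url": "https://example.com"})
print(request)
```

In practice an MCP client library would build and send this message for you; the sketch only shows the shape of the call.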

v0.1.1

16 Mar 23:57
@ArkNill


Initial Public Release

Universal web content extraction — any URL to LLM-ready markdown.

Highlights

  • HTML: BeautifulSoup + content density filtering (removes nav, sidebar, ads)
  • YouTube: Transcript extraction with timestamps and multi-language support
  • PDF: Text extraction with page structure (pdfplumber)
  • DOCX: Paragraph and heading extraction (python-docx)
  • Auto-fallback: httpx first, Playwright for JS-heavy pages
  • Async-first: Built on httpx and Playwright async APIs
  • CLI: markgrab <url> with markdown/text/JSON output
  • Anti-bot stealth: Opt-in Playwright stealth scripts
  • 114 unit tests, all passing
  • MIT licensed
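
The auto-fallback idea in the list above (a cheap HTTP fetch first, a headless browser only when needed) can be sketched as follows. This is a minimal illustration and not markgrab's actual implementation: the function `fetch_with_fallback`, the `MIN_TEXT_CHARS` threshold, and the stand-in fetchers are all assumptions, and the fetchers are stubs so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[str], Awaitable[str]]

MIN_TEXT_CHARS = 200  # assumed threshold, not markgrab's actual value


async def fetch_with_fallback(url: str, fast: Fetcher, browser: Fetcher) -> tuple[str, str]:
    """Try the cheap fetcher first; use the browser fetcher if the fast
    path fails or returns too little content."""
    try:
        html = await fast(url)
        if len(html.strip()) >= MIN_TEXT_CHARS:
            return "fast", html
    except Exception:
        pass  # any failure on the fast path triggers the fallback
    return "browser", await browser(url)


# Stand-in fetchers (hypothetical) so the sketch is self-contained.
async def fake_http(url: str) -> str:
    return "<div id='root'></div>"  # empty shell typical of a JS-rendered page


async def fake_browser(url: str) -> str:
    return "<article>" + "rendered text " * 50 + "</article>"


source, html = asyncio.run(
    fetch_with_fallback("https://example.com", fake_http, fake_browser)
)
print(source)
```

In real use, `fast` would wrap an httpx request and `browser` a Playwright page load. Passing them in as callables keeps the fallback decision separate from the fetching and easy to test.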

Install

pip install markgrab

Python 3.11+ required. See README for details.
