Releases: ArkNill/markgrab
v0.2.0
Full Changelog: v0.1.3...v0.2.0
v0.1.3
Bump version to 0.1.3
v0.1.2
Changes
- Add MCP server module (markgrab-mcp) with 2 tools: extract_url, extract_multiple
- Add [mcp] optional dependency (pip install "markgrab[mcp]")
- Register on MCP official registry (io.github.ArkNill/markgrab)
- 114 tests, all passing
v0.1.1
Initial Public Release
Universal web content extraction — any URL to LLM-ready markdown.
Highlights
- HTML: BeautifulSoup + content density filtering (removes nav, sidebar, ads)
- YouTube: Transcript extraction with timestamps and multi-language support
- PDF: Text extraction with page structure (pdfplumber)
- DOCX: Paragraph and heading extraction (python-docx)
- Auto-fallback: httpx first, Playwright for JS-heavy pages
- Async-first: Built on httpx and Playwright async APIs
- CLI: markgrab <url> with markdown/text/JSON output
- Anti-bot stealth: Opt-in Playwright stealth scripts
- 114 unit tests, all passing
- MIT licensed
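The auto-fallback highlight above can be illustrated with a minimal sketch. This is not markgrab's actual code or API: the names fetch_with_fallback, looks_js_rendered, light_fetch, browser_fetch, and the 200-character threshold are all hypothetical, and the fetchers are passed in as plain async callables so the example runs without httpx or Playwright installed. It assumes a "JS-heavy" page can be detected by how little visible text the plain HTTP response contains.

```python
import asyncio
import re
from typing import Awaitable, Callable

# Hypothetical type: any async function that maps a URL to raw HTML.
Fetcher = Callable[[str], Awaitable[str]]


def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Crude heuristic (an assumption): little visible text suggests
    the page is rendered client-side."""
    # Drop script/style bodies, then strip the remaining tags.
    text = re.sub(r"<script.*?</script>|<style.*?</style>", "", html,
                  flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return len(" ".join(text.split())) < min_text_chars


async def fetch_with_fallback(
    url: str,
    light_fetch: Fetcher,
    browser_fetch: Fetcher,
    min_text_chars: int = 200,
) -> tuple[str, str]:
    """Try the cheap fetcher first; fall back to the browser fetcher
    if it fails or returns too little text. Returns (source, html)."""
    try:
        html = await light_fetch(url)
    except Exception:
        html = ""  # treat any light-fetch failure as "needs fallback"
    if html and not looks_js_rendered(html, min_text_chars):
        return "light", html
    return "browser", await browser_fetch(url)
```

In a real setup, light_fetch would typically wrap an httpx.AsyncClient GET request and browser_fetch would wrap a Playwright page load; injecting them keeps the fallback decision easy to test in isolation.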
Install
pip install markgrab
Python 3.11+ required. See README for details.