Releases: ArkNill/markgrab

Add MCP server module (markgrab-mcp) with 2 tools: extract_url, extract_multiple
Add [mcp] optional dependency (pip install "markgrab[mcp]")
Register on MCP official registry (io.github.ArkNill/markgrab)
114 tests, all passing

v0.1.1

16 Mar 23:57

@ArkNill ArkNill

e3c1f94

Initial Public Release

Universal web content extraction — any URL to LLM-ready markdown.

Highlights

HTML: BeautifulSoup + content density filtering (removes nav, sidebar, ads)
YouTube: Transcript extraction with timestamps and multi-language support
PDF: Text extraction with page structure (pdfplumber)
DOCX: Paragraph and heading extraction (python-docx)
Auto-fallback: httpx first, Playwright for JS-heavy pages
Async-first: Built on httpx and Playwright async APIs
CLI: markgrab <url> with markdown/text/JSON output
Anti-bot stealth: Opt-in Playwright stealth scripts
114 unit tests, all passing
MIT licensed
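
The auto-fallback highlight above (a lightweight HTTP fetch first, a browser only for JS-heavy pages) can be illustrated with a minimal sketch. This is not markgrab's actual code: `fetch_with_fallback`, `visible_text_length`, and the `MIN_TEXT_CHARS` threshold are hypothetical names and values chosen for illustration, and the two fetchers are passed in as plain async callables so the sketch runs without httpx or Playwright installed.

```python
import asyncio
import re
from typing import Awaitable, Callable, Tuple

# Any async callable that takes a URL and returns HTML.
# In practice one could wrap httpx, the other Playwright (assumption).
Fetcher = Callable[[str], Awaitable[str]]

# Hypothetical threshold; not taken from markgrab.
MIN_TEXT_CHARS = 200


def visible_text_length(html: str) -> int:
    """Rough count of visible characters: drop script/style blocks, then tags."""
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    without_tags = re.sub(r"(?s)<[^>]+>", " ", without_code)
    return len(" ".join(without_tags.split()))


async def fetch_with_fallback(
    url: str,
    light_fetch: Fetcher,
    browser_fetch: Fetcher,
    min_chars: int = MIN_TEXT_CHARS,
) -> Tuple[str, str]:
    """Return (html, source), where source is "light" or "browser".

    The cheap fetcher is tried first. The browser fetcher is used only if
    the cheap request fails or the page has too little visible text,
    which is a common sign of a JavaScript-rendered shell.
    """
    try:
        html = await light_fetch(url)
    except Exception:
        return await browser_fetch(url), "browser"
    if visible_text_length(html) >= min_chars:
        return html, "light"
    return await browser_fetch(url), "browser"
```

Passing the fetchers as parameters keeps the decision logic separate from the network libraries, so the fallback rule can be exercised with stub coroutines and called via `asyncio.run(fetch_with_fallback(url, light, browser))`.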

Install

pip install markgrab

Python 3.11+ required. See README for details.
