html-extractor

A simple extractor based on BeautifulSoup. You can use it to iterate through all the HTML files in a website's root directory and extract the page text, placeholder text, and other text content.

extractor beautifulsoup html-extractor

Updated Dec 16, 2019
Python
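The BeautifulSoup-based extractor described above walks a site's root directory and pulls out page text and placeholder text. The sketch below illustrates that idea under stated assumptions; it is not the zezhix/html-extractor code. To stay dependency-free it swaps BeautifulSoup for Python's standard-library `html.parser.HTMLParser`. The helper name `extract_site`, the set of attributes treated as "other text" (`placeholder`, `alt`, `title`), and the output layout are all hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path

# Assumption: attributes whose values count as "other text" (not taken from
# the zezhix/html-extractor source).
TEXT_ATTRIBUTES = {"placeholder", "alt", "title"}
# Elements whose content is code, not readable text.
SKIP_TAGS = {"script", "style"}


class TextCollector(HTMLParser):
    """Collect visible text nodes and text-bearing attribute values from one page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.texts = []
        self.attributes = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        for name, value in attrs:
            # Boolean attributes arrive with value None, so guard before strip().
            if name in TEXT_ATTRIBUTES and value and value.strip():
                self.attributes.append((name, value.strip()))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.texts.append(data.strip())


def extract_site(root):
    """Hypothetical helper: map each .html file under root to its extracted strings."""
    root_path = Path(root)
    results = {}
    for path in sorted(root_path.rglob("*.html")):
        parser = TextCollector()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        results[path.relative_to(root_path).as_posix()] = {
            "text": parser.texts,
            "attributes": parser.attributes,
        }
    return results
```

Calling `extract_site("path/to/site")` returns a dict keyed by each page's path relative to the root. Contents of `<script>` and `<style>` are skipped so that only human-readable strings remain; a BeautifulSoup version would typically do the same by removing those elements before collecting text.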

MorrisGlr / HEART


HTML‐to‐Anki Enhanced Human Explanation & Reasoning Tool (HEART). A Python CLI that leverages the OpenAI API to transform full UWorld vignettes into AI-enhanced Anki flashcards.

python html education medical-education active-learning anki-flashcards learning-resources html-extractor openai-api

Updated Jun 3, 2025
Python

RayenMalouche / MCP-PDF-Extractor-server


A Java-based server leveraging Apache Tika to extract content and metadata from files (PDF, DOCX, TXT, etc.) in a local files-to-extract directory. Supports HTML (with CSS styling) and text extraction, file listing, and metadata retrieval via MCP-compliant tools and REST APIs. Built with Spring Boot, Jetty, and MCP SDK.

java html pdf parser mcp extractor pdf-extractor html-extraction html-extractor pdf-extraction mcp-server modelcontextprotocol extractor-to-html

Updated Aug 30, 2025
Java

html-extractor

Here are 13 public repositories matching this topic:

miso-belica / sumy

bookieio / breadability

cdimascio / essence

zezhix / html-extractor

kwaziidev / textractor

JanDC / css-from-html-extractor

Whomrx666 / Xtract-htmlV2

Whomrx666 / Xtract-html

importcjj / go-readability

davidmillerpak / Media-Graper

the-real-yey / Simple-HTML-Extractor-

MorrisGlr / HEART

RayenMalouche / MCP-PDF-Extractor-server