article-extraction

Here are 6 public repositories matching this topic...

ieg-dhr / NLP-Course4Humanities_2024

This repository is part of an NLP course for humanities and cultural studies. The course uses historical newspapers as its source material and applies NLP methods to them. Tasks covered: tokenization, lemmatization, TF-IDF, part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, NER, and text classification.

nlp webpage text-classification teaching ner semantic-search nlp-machine-learning university-course historical-newspapers transformers-models llms article-extraction

Updated Jun 5, 2025
Jupyter Notebook

dstark5 / gnews-scraper

Star 13

GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, which makes the scraped information easy to access and use.

typescript web-scraping json-parsing web-crawling google-news data-scraping google-news-scraper web-data-extraction web-automation keyword-search gnews news-scraping gnews-api article-extraction gnews-scraper

Updated Aug 19, 2023
TypeScript

levindixon / WebMD

Star 2

📋 WebMD is a Chrome extension that transforms web pages into Markdown documents with surgical precision.

javascript chrome-extension markdown gfm github-flavored-markdown html-to-markdown web-scraping readability browser-extension markdown-converter content-extraction web-tools turndown manifest-v3 article-extraction

Updated Jul 3, 2025
JavaScript

UtrechtUniversity / dataQuest

Star 2

A configurable pipeline for extracting and filtering articles from large corpora, tailored for the Delpher Kranten corpus, with support for features such as keyword filtering and TF-IDF-based relevance scoring.

information-retrieval corpus-processing article-extraction keyword-filtering delpher-kranten

Updated Apr 18, 2025
Python
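
As a rough illustration of what keyword filtering with TF-IDF-based relevance scoring can look like, here is a minimal standard-library sketch. It is not taken from dataQuest; the function names (`tokenize`, `tfidf_relevance`, `filter_articles`), the whitespace tokenizer, the smoothed IDF formula, and the threshold are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_relevance(documents, keywords):
    """Score each document by the summed TF-IDF weight of the keywords."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for kw in keywords:
            kw = kw.lower()
            if counts[kw] == 0:
                continue  # keyword absent (also covers empty documents)
            tf = counts[kw] / len(tokens)
            # Smoothed IDF (an assumed variant; other formulas are common).
            idf = math.log((1 + n_docs) / (1 + doc_freq[kw])) + 1
            score += tf * idf
        scores.append(score)
    return scores


def filter_articles(documents, keywords, threshold=0.0):
    # Keep only documents whose relevance score exceeds the threshold.
    scores = tfidf_relevance(documents, keywords)
    return [doc for doc, s in zip(documents, scores) if s > threshold]
```

Calling `filter_articles(articles, ["strike"])` on a list of article strings would keep only the articles that mention the keyword; raising `threshold` keeps only those where the keyword carries more weight.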

jvcByte / text-to-speech

Star 0

A web-based article extractor and text-to-speech converter. Extract content from any URL and listen to articles with natural voice synthesis. Supports multiple extraction methods.

audio text-to-speech web-scraping content-extraction voice-synthesis reading-assistant article-extraction text-to-speech-converter

Updated Aug 9, 2025
Python

xsukax / xsukax-ReadClean-PDF

Star 0

A privacy-focused, client-side web application that extracts clean, readable content from any webpage and converts it to PDF format. Built with pure HTML, CSS, and JavaScript—no backend required, no tracking, complete privacy.

lightweight bookmarklet pdf-converter text-extraction client-side cors-proxy distraction-free content-extraction reader-mode print-to-pdf web-to-pdf rtl-support article-extraction url-fetcher ad-removal readability-tool clean-reader content-cleaning webpage-cleaner news-archival

Updated Oct 5, 2025
HTML
