article-extraction

Here are 6 public repositories matching this topic...

This repository is part of an NLP course for humanities and cultural studies. The course uses historical newspapers as its source material and applies NLP methods to them. Tasks covered: tokenization, lemmatization, TF-IDF, part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, NER, and text classification.

  • Updated Jun 5, 2025
  • Jupyter Notebook

GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, which makes the scraped information easy to access and use.

  • Updated Aug 19, 2023
  • TypeScript

xsukax-ReadClean-PDF

A privacy-focused, client-side web application that extracts clean, readable content from any webpage and converts it to PDF format. Built with pure HTML, CSS, and JavaScript—no backend required, no tracking, complete privacy.

  • Updated Oct 5, 2025
  • HTML
