sumy

Module for automatic summarization of text documents and HTML pages.

Popularity: 7.3 (declining)
Activity: 8.3
Stars: 3,659
Watchers: 111
Forks: 541
Last commit: 23 days ago

Description

Code Quality Rank: L5

Programming language: Python

License: Apache License 2.0

Tags: Text Processing, Web Content Extracting, HTML, Scientific, Engineering, Information Analysis, Internet, Markup, Linguistic, Filters, Education

Latest version: v0.10.0

sumy alternatives and similar packages

Based on the "Web Content Extracting" category.

TWINT

9.4 0.0 sumy VS TWINT

DISCONTINUED. An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

newspaper

9.3 5.6 L3 sumy VS newspaper

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

python-goose

7.8 0.0 sumy VS python-goose

HTML content / article extractor, web scraping lib in Python

textract

7.7 1.5 sumy VS textract

extract text from any document. no muss. no fuss.

trafilatura

7.7 6.8 sumy VS trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

toapi

7.0 0.0 sumy VS toapi

Every web site provides APIs.

python-readability

6.7 7.2 sumy VS python-readability

fast python port of arc90's readability tool, updated to match latest readability.js!

html2text

6.0 5.6 L1 sumy VS html2text

Convert HTML to Markdown-formatted text.

Goose3

4.4 6.3 sumy VS Goose3

A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html

micawber

4.0 3.8 L5 sumy VS micawber

a small library for extracting rich content from urls

lassie

3.7 0.0 L4 sumy VS lassie

Web Content Retrieval for Humans™

inscriptis -- HTML to text conversion library, command line client and Web service

3.0 8.1 sumy VS inscriptis -- HTML to text conversion library, command line client and Web service

A python based HTML to text conversion library, command line client and Web service.

opengraph

3.0 0.0 L5 sumy VS opengraph

A python module to parse the Open Graph Protocol

Haul

2.5 0.0 L5 sumy VS Haul

An Extensible Image Crawler

htmldate

2.3 3.6 sumy VS htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

sanitize

1.6 0.0 L4 sumy VS sanitize

Bringing sanity to world of messed-up data

JSONPATH

1.2 7.5 sumy VS JSONPATH

A query expression for extracting data from JSON.

Data Extractor

1.1 7.2 sumy VS Data Extractor

Combine XPath, CSS Selectors and JSONPath for Web data extracting.


* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

Automatic text summarizer

A simple library and command line utility for extracting summaries from HTML pages or plain text. The package also contains a simple evaluation framework for text summaries. The implemented summarization methods are described in the [documentation](docs/summarizators.md). I also maintain a list of [alternative implementations](docs/alternatives.md) of the summarizers in various programming languages.

Is my natural language supported?

There is a [good chance](docs/index.md#Tokenizer) it is. If not, it is [not too hard to add](docs/how-to-add-new-language.md).

Installation

Make sure you have Python 3.6+ and pip (Windows, Linux) installed, then run (the preferred way):

$ [sudo] pip install sumy
$ [sudo] pip install git+https://github.com/miso-belica/sumy.git # for the latest development version

Usage

Sumy contains a command line utility for quick summarization of documents.

$ sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization # what's summarization?
$ sumy lex-rank --language=uk --length=30 --url=https://uk.wikipedia.org/wiki/Україна
$ sumy luhn --language=czech --url=https://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy edmundson --language=czech --length=3% --url=https://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy --help # for more info
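
The same commands can be driven from a script. The sketch below builds the argument list for the `sumy` command shown above and runs it with `subprocess`. `build_sumy_command` and `run_sumy` are hypothetical helper names, not part of sumy, and only the flags that appear in the examples above (`--length`, `--language`, `--url`) are assumed.

```python
import subprocess


def build_sumy_command(method, url, length=None, language=None):
    """Build an argv list for the sumy CLI (hypothetical helper, not part of sumy)."""
    command = ["sumy", method]
    if language is not None:
        command.append("--language={}".format(language))
    if length is not None:
        # length may be an absolute sentence count (10) or a percentage ("3%")
        command.append("--length={}".format(length))
    command.append("--url={}".format(url))
    return command


def run_sumy(method, url, **options):
    """Run the sumy CLI and return its standard output as text.

    Assumes the `sumy` executable is on PATH (e.g. after `pip install sumy`).
    """
    result = subprocess.run(
        build_sumy_command(method, url, **options),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_sumy("lex-rank", "https://en.wikipedia.org/wiki/Automatic_summarization", length=10)` corresponds to the first command above.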

Summarization methods can be evaluated against a reference summary with the commands below:

$ sumy_eval lex-rank reference_summary.txt --url=https://en.wikipedia.org/wiki/Automatic_summarization
$ sumy_eval lsa reference_summary.txt --language=czech --url=https://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy_eval edmundson reference_summary.txt --language=czech --url=https://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy_eval --help # for more info
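
To give a feel for what such an evaluation measures, here is a minimal sketch of sentence-level precision, recall and F-score between a generated summary and a reference summary. It is an illustration using only the standard library, not sumy's own implementation; the function name `precision_recall_f1` and the exact-match comparison of sentences are assumptions made for this example.

```python
def precision_recall_f1(evaluated_sentences, reference_sentences):
    """Compare two summaries as sets of sentences (illustrative, not sumy's code)."""
    evaluated = set(evaluated_sentences)
    reference = set(reference_sentences)
    if not evaluated or not reference:
        raise ValueError("Both summaries must contain at least one sentence.")

    common = len(evaluated & reference)
    precision = common / len(evaluated)  # share of chosen sentences found in the reference
    recall = common / len(reference)     # share of reference sentences that were chosen
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A summary that picks four sentences, two of which appear in a three-sentence reference, gets a precision of 0.5 and a recall of 2/3.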

If you don't want to bother with the installation, you can try it as a container.

$ docker run --rm misobelica/sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization

Python API

You can also use sumy as a library in your project. Create a file sumy_example.py (don't name it sumy.py) with the code below to test it.

# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

LANGUAGE = "english"
SENTENCES_COUNT = 10

if __name__ == "__main__":
    url = "https://en.wikipedia.org/wiki/Automatic_summarization"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    # or for plain text files
    # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
    # parser = PlaintextParser.from_string("Check this out.", Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    # print the SENTENCES_COUNT sentences selected by the summarizer
    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)
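
For intuition about what an extractive summarizer does with a document, the following is a toy frequency-based sketch written with the standard library only. It is not one of sumy's algorithms and not its API: `summarize`, the regular-expression sentence splitter and the tiny stop-word set are all assumptions for illustration. It scores each sentence by the average frequency of its non-stop words and returns the top sentences in their original order.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; real summarizers use per-language lists.
STOP_WORDS = {"a", "an", "and", "are", "is", "it", "of", "the", "to", "in"}


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def significant_words(sentence):
    """Lower-cased words of a sentence with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, sentences_count):
    """Return the `sentences_count` highest-scoring sentences in document order."""
    sentences = split_sentences(text)
    frequencies = Counter(w for s in sentences for w in significant_words(s))

    def score(sentence):
        words = significant_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sentences_count])  # restore document order
    return [sentences[i] for i in chosen]
```

Calling `summarize(text, 2)` on a short text returns its two highest-scoring sentences. The summarizers shipped with sumy replace this scoring step with their own methods, which are described in the documentation.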

Interesting projects using sumy

I found some interesting projects while browsing the internet, and sometimes people e-mailed me with questions and I was curious how they use sumy :)

Learning to generate questions from text - https://github.com/adityasarvaiya/Automatic_Question_Generation
Summarize your video to any duration - https://github.com/aswanthkoleri/VideoMash and similar https://github.com/OpenGenus/vidsum
Tool for collectively summarizing large discussions - https://github.com/amyxzhang/wikum
