importcjj/go-readability

Go-Readability

Go-Readability is a Go package that strips clutter such as buttons, ads, and background images from an HTML page, and changes the page's text size, contrast, and layout for better readability.

This package is a fork of readability and go-readability, which were inspired by readability for node.js and readability for python.

Why fork?

There are several reasons why I created a new fork instead of sending a PR to the original repository. I needed the following:

  • Image extraction
  • Readable text mixed with HTML tags
  • Custom line breaks
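
To picture what a custom line break means for text output, here is a minimal sketch that joins paragraphs with a separator chosen by the caller. It uses only the standard library. The helper name joinParagraphs is hypothetical and is not part of this package, and the sketch does not claim to show how the package implements its own line-break option.

```go
package main

import (
	"fmt"
	"strings"
)

// joinParagraphs is a hypothetical helper (not part of go-readability).
// It trims each paragraph, drops empty ones, and joins the rest with the
// caller-supplied line break string.
func joinParagraphs(paragraphs []string, lineBreak string) string {
	kept := make([]string, 0, len(paragraphs))
	for _, p := range paragraphs {
		if p = strings.TrimSpace(p); p != "" {
			kept = append(kept, p)
		}
	}
	return strings.Join(kept, lineBreak)
}

func main() {
	paragraphs := []string{"First paragraph.", "  ", "Second paragraph."}
	// Prints: First paragraph.<br/><br/>Second paragraph.
	fmt.Println(joinParagraphs(paragraphs, "<br/><br/>"))
}
```

Passing "\n\n" instead of "<br/><br/>" would give plain-text output with blank lines between paragraphs.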

Example

package main

import (
	"fmt"
	nurl "net/url"
	"time"

	"github.com/importcjj/go-readability"
)

func main() {
	// Parse the article URL and stop on a malformed address
	url := "https://www.nytimes.com/2018/01/21/technology/inside-amazon-go-a-store-of-the-future.html"
	parsedURL, err := nurl.Parse(url)
	if err != nil {
		panic(err)
	}

	// Configure the extractor (line break string and image tag option)
	extractor := &readability.Extractor{
		TextLineBreak:  "<br/><br/>",
		TextWithImgTag: true,
	}

	// Fetch readable content, with a 5-second timeout
	article, err := extractor.FromURL(parsedURL, 5*time.Second)
	if err != nil {
		panic(err)
	}

	// Show metadata
	fmt.Println(article.Meta.Title)
	fmt.Println(article.Meta.Excerpt)
	fmt.Println(article.Meta.Author)

	// Readable content
	fmt.Println(article.Text)

	// Tidy HTML
	fmt.Println(article.HTML)

	// Images
	fmt.Println(article.Images)
}
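
Note that net/url's Parse accepts many strings that are not fetchable web addresses, such as a bare "example.com/article.html" with no scheme. The sketch below shows one way to validate input before handing it to the extractor. The helper parseArticleURL and the example.com addresses are assumptions made for illustration and are not part of this package.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseArticleURL is a hypothetical helper (not part of go-readability).
// It parses raw and rejects anything that is not an absolute http(s) URL.
func parseArticleURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, errors.New("URL scheme must be http or https")
	}
	if u.Host == "" {
		return nil, errors.New("URL has no host")
	}
	return u, nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/article.html",
		"example.com/article.html", // no scheme, so it is rejected
	} {
		u, err := parseArticleURL(raw)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", u.Host)
	}
}
```

The returned *url.URL can then be passed on in place of the result of a direct nurl.Parse call.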
