mlhorizon/extractor


extractor

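Main-content extraction for Chinese web pages, implemented from the paper "A General Web Page Main-Text Extraction Algorithm Based on the Line-Block Distribution Function" (《基于行块分布函数的通用网页正文抽取算法》).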
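To give a feel for the line-block idea behind the paper, here is a minimal sketch. It is not the code in `extractor.go`: the function name `ExtractMainText`, the block size of 3, and the rule "keep the run of lines with the most text" are assumptions made for illustration, and the paper's threshold-based start and end detection is replaced by that simpler rule.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

var (
	// Script/style bodies and comments are dropped entirely, then every remaining tag.
	reNoise = regexp.MustCompile(`(?is)<script.*?</script>|<style.*?</style>|<!--.*?-->`)
	reTag   = regexp.MustCompile(`(?s)<[^>]*>`)
	reSpace = regexp.MustCompile(`\s+`)
)

// blockSize is the number of consecutive lines summed into one block
// (an assumed value for this sketch).
const blockSize = 3

// ExtractMainText (hypothetical name) strips tags, measures how much text each
// block of lines holds, and returns the run of lines carrying the most text.
func ExtractMainText(html string) string {
	text := reTag.ReplaceAllString(reNoise.ReplaceAllString(html, ""), "")
	lines := strings.Split(text, "\n")
	lens := make([]int, len(lines))
	for i, l := range lines {
		lines[i] = strings.TrimSpace(l)
		// Line length is counted in runes with all whitespace removed.
		lens[i] = utf8.RuneCountInString(reSpace.ReplaceAllString(l, ""))
	}

	bestStart, bestEnd, bestTotal := 0, 0, 0 // best run is lines[bestStart:bestEnd]
	start, total := -1, 0
	for i := 0; i <= len(lines); i++ {
		// block is the text length of lines i .. i+blockSize-1.
		block := 0
		for j := i; j < i+blockSize && j < len(lines); j++ {
			block += lens[j]
		}
		if block > 0 {
			if start < 0 {
				start, total = i, 0
			}
			total += lens[i]
			continue
		}
		// An empty block closes the current run, if there is one.
		if start >= 0 && total > bestTotal {
			bestStart, bestEnd, bestTotal = start, i, total
		}
		start = -1
	}

	var out []string
	for _, l := range lines[bestStart:bestEnd] {
		if l != "" {
			out = append(out, l)
		}
	}
	return strings.Join(out, "\n")
}

func main() {
	page := "<html><head><title>t</title><style>p{color:red}</style></head>\n" +
		"<body>\n<div>首页</div>\n\n\n\n" +
		"<p>这是正文的第一段，包含比较多的文字内容。</p>\n" +
		"<p>这是正文的第二段，同样包含比较多的文字。</p>\n\n\n\n" +
		"<div>版权所有</div>\n</body></html>"
	fmt.Println(ExtractMainText(page))
}
```

With this sample page the two `<p>` paragraphs form the densest run, so the navigation text and the footer are left out. Runs are split wherever `blockSize` or more consecutive lines are empty, which is why short gaps inside an article do not break it apart.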

Installation

	go get github.com/yqingp/extractor

Usage

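	import (
		"fmt"

		"github.com/yqingp/extractor"
	)
	....
	// Build an extractor for the target page, then fetch and extract its main text.
	extractWorker := extractor.NewExtractor(url)
	content, err := extractWorker.Extract()

	// Report the failure and stop; print the content only on success.
	if err != nil {
		fmt.Println("extract failed:", err)
		return
	}
	fmt.Println(content)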

Running as a server

	go run example/server.go

	# Example client in Ruby (rest-client gem): POST the page URL to the running server.
	require 'rest_client'
	RestClient.post("http://localhost:8000/work", {:url => "http://www.baidu.com"})
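For callers who would rather stay in Go, the same request can be sketched with `net/http`. The endpoint and the form field name `url` are taken from the Ruby call above; the helper names `newExtractRequest` and `requestExtraction`, the 30-second timeout, and the handling of the response as an opaque string are assumptions, since the response format of `example/server.go` is not described here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// newExtractRequest (hypothetical helper) builds a form-encoded POST that
// carries pageURL in the "url" field, like the Ruby RestClient.post call.
func newExtractRequest(endpoint, pageURL string) (*http.Request, error) {
	form := url.Values{"url": {pageURL}}
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

// requestExtraction (hypothetical helper) sends the request and returns the
// raw response body. The body format is an assumption: it is passed through as text.
func requestExtraction(endpoint, pageURL string) (string, error) {
	req, err := newExtractRequest(endpoint, pageURL)
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: 30 * time.Second} // assumed timeout
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("server returned %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	// Requires the example server to be running on localhost:8000.
	body, err := requestExtraction("http://localhost:8000/work", "http://www.baidu.com")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(body)
}
```

Building the request in its own function keeps the wire format (method, content type, field name) easy to check without a running server.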

About

Web page main-content extraction


Languages

Go 100.0%
