boilerpipeR: Interface to the Boilerpipe Java Library

Generic extraction of main text content from HTML files; removal of ads, sidebars, and headers using the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. The extraction heuristics from boilerpipe show robust performance across a wide range of website templates.

Version: 1.3.2
Imports: rJava
Suggests: RCurl
Published: 2021-05-19
Author: See AUTHORS file.
boilerpipeR author details
Maintainer: Mario Annau <mario.annau at gmail.com>
NeedsCompilation: no
Materials: NEWS
CRAN checks: boilerpipeR results

Documentation:

Reference manual: boilerpipeR.html, boilerpipeR.pdf

Downloads:

Windows binaries: r-devel: boilerpipeR_1.3.2.zip, r-release: boilerpipeR_1.3.2.zip, r-oldrel: boilerpipeR_1.3.2.zip
macOS binaries: r-release (arm64): boilerpipeR_1.3.2.tgz, r-oldrel (arm64): boilerpipeR_1.3.2.tgz, r-release (x86_64): boilerpipeR_1.3.2.tgz, r-oldrel (x86_64): boilerpipeR_1.3.2.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=boilerpipeR to link to this page.
