goscrape - create offline browsable copies of websites


A web scraper written in Go. It downloads the content of a website so it can be archived and read offline.

Features

Features and advantages over existing tools like wget, HTTrack, and Teleport Pro:

  • Free and open source
  • Available for all platforms that Go supports
  • JPEG and PNG images can be converted down in quality to save disk space (see the example after this list)
  • Excluded URLs will not be fetched (unlike wget)
  • No incomplete temporary files are left on disk
  • Asset files that were already downloaded are skipped on subsequent scraper runs
  • Assets from external domains are downloaded automatically
  • Sane default values
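
For instance, image re-encoding and URL exclusion are controlled by the --imagequality and --exclude options documented below; a minimal sketch of such an invocation (the quality value and exclusion pattern are only illustrative):

goscrape --imagequality 70 --exclude "/ads/" http://website.com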

Limitations

  • No GUI version, console only

Installation

There are two ways to install goscrape:

  1. Download and unpack a binary release from Releases, or
  2. Compile the latest release from source:

     go install github.com/cornelk/goscrape@latest

Compiling the tool from source requires a recent version of Go to be installed.
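
With either installation method, running the binary with the documented --help flag should print the usage text shown in the Options section below:

goscrape --help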

Usage

goscrape http://website.com

Options

Scrape a website and create an offline browsable version on the disk.
Usage: goscrape [--include INCLUDE] [--exclude EXCLUDE] [--output OUTPUT] [--depth DEPTH] [--imagequality IMAGEQUALITY] [--timeout TIMEOUT] [--proxy PROXY] [--user USER] [--useragent USERAGENT] [--verbose] URLS [URLS ...]
Positional arguments:
  URLS

Options:
  --include INCLUDE, -n INCLUDE
                         only include URLs that match the regular expression (Perl syntax supported)
  --exclude EXCLUDE, -x EXCLUDE
                         exclude URLs that match the regular expression (Perl syntax supported)
  --output OUTPUT, -o OUTPUT
                         output directory to write the files to
  --depth DEPTH, -d DEPTH
                         download depth, 0 for unlimited [default: 10]
  --imagequality IMAGEQUALITY, -i IMAGEQUALITY
                         image quality, 0 to disable re-encoding
  --timeout TIMEOUT, -t TIMEOUT
                         time limit in seconds for each HTTP request to connect and read the response body
  --proxy PROXY, -p PROXY
                         HTTP proxy to use for scraping
  --user USER, -u USER   user[:password] to use for authentication
  --useragent USERAGENT, -a USERAGENT
                         user agent to use for scraping
  --verbose, -v          verbose output
  --help, -h             display this help and exit
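
As a fuller example, the following hypothetical invocation combines several of the options above (the output directory, depth, include pattern, and timeout values are arbitrary illustrations, not defaults):

goscrape --output ./website-mirror --depth 3 --include "blog" --timeout 30 http://website.com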
