sangaline/advanced-web-scraping-tutorial

Folders and files

Name
zipru_scraper
.gitignore
README.md
requirements.txt
scrapy.cfg

Advanced Web Scraping Tutorial Project

This repository is a companion to the article Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more. Please refer to the article for further details.

This is a Scrapy web scraper for the fictional Zipru torrent site. It is designed to bypass four distinct anti-scraping mechanisms:

User agent filtering.
Obfuscated JavaScript redirects.
Captchas.
Header consistency checks.
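
The first and last mechanisms above both come down to which headers a request carries. As a minimal sketch using only the Python standard library (not the repository's code), the snippet below attaches one fixed, browser-like header set to every request so that the user agent and the remaining headers stay consistent with each other. The `BROWSER_HEADERS` values, the `build_request` helper, and the example URL are hypothetical and chosen for illustration only. In a Scrapy project the same idea would typically be expressed through the `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` settings.

```python
from urllib.request import Request

# Hypothetical browser-like headers. These values are illustrative
# assumptions, not the ones used in the repository.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, extra_headers=None):
    """Return a Request that carries the same base header set on every call."""
    headers = dict(BROWSER_HEADERS)        # copy so the shared dict is never mutated
    headers.update(extra_headers or {})    # allow per-request additions
    return Request(url, headers=headers)


# No network traffic happens here; the Request object is only constructed.
request = build_request("https://example.com/torrents.php")
```

Building every request through one helper keeps the header set identical across the crawl, which is the property a header consistency check would look for. Note that `urllib` stores header names in capitalized form, so the user agent is read back with `request.get_header("User-agent")`.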

The scraper is not actually functional because Zipru is not a real site. The code, however, is otherwise complete and can easily be adapted to work on other sites.

About

The Zipru scraper developed in the Advanced Web Scraping Tutorial.

Languages

Python 100.0%
