pyspider

A Powerful Spider (Web Crawler) System in Python.

Website: docs.pyspider.org

DISCONTINUED. You can find some alternatives below.


Popularity: 9.5 (Stable)

Activity: 0.0 (Stable)

Stars: 16,382

Watchers: 897

Forks: 3,689

Last Commit: over 1 year ago

Code Quality Rank: L3

Programming language: Python

License: Apache License 2.0

Tags: HTTP, Web Crawling, Application Frameworks, Internet, WWW

Latest version: v0.3.10

pyspider alternatives and similar packages

Based on the "Web Crawling" category.

Scrapy

Popularity: 9.9, Activity: 9.4, Code Quality: L4

Scrapy, a fast high-level web crawling & scraping framework for Python.

requests-html

Popularity: 9.1, Activity: 0.0

Pythonic HTML Parsing for Humans™

portia

Popularity: 8.8, Activity: 0.0, Code Quality: L2

Visual scraping for Scrapy

MechanicalSoup

Popularity: 7.7, Activity: 5.6, Code Quality: L4

A Python library for automating interaction with websites.

RoboBrowser

Popularity: 7.2, Activity: 0.0, Code Quality: L4

A simple, Pythonic library for browsing the web without a standalone web browser.

PSpider

Popularity: 6.4, Activity: 0.0

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

Grab

Popularity: 6.4, Activity: 9.2, Code Quality: L3

Web Scraping Framework

cola

Popularity: 6.3, Activity: 0.0, Code Quality: L3

DISCONTINUED. A high-level distributed crawling framework.

feedparser

Popularity: 6.3, Activity: 7.5, Code Quality: L3

Parse feeds in Python

Scrapely

Popularity: 6.1, Activity: 0.0

A pure-python HTML screen-scraping library

gain

Popularity: 6.0, Activity: 0.0

Web crawling framework based on asyncio.

Google Search Results in Python

Popularity: 4.2, Activity: 0.0

Google Search Results via SERP API pip Python Package

Sukhoi

Popularity: 4.2, Activity: 0.0

Minimalist and powerful Web Crawler.

MSpider

Popularity: 4.0, Activity: 0.0

Spider

reader

Popularity: 3.5, Activity: 9.4

A Python feed reader library.

spidy Web Crawler

Popularity: 3.3, Activity: 0.0

The simple, easy to use command line web crawler.

Crawley

Popularity: 2.7, Activity: 0.0

Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

brownant

Popularity: 2.6, Activity: 0.0

Brownant is a web data extracting framework.

Demiurge

Popularity: 2.2, Activity: 0.0, Code Quality: L5

PyQuery-based scraping micro-framework.

Pomp

Popularity: 1.7, Activity: 0.0, Code Quality: L5

Screen scraping and web crawling framework

FastImage

Popularity: 1.1, Activity: 0.0, Code Quality: L4

Python library that finds the size / type of an image given its URI by fetching as little as needed

Mariner

Popularity: 0.5, Activity: 0.0

This is a mirror of the GitLab repository. Open your issues and pull requests there.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They range from L1 to L5, with L5 being the highest.

README

pyspider

A Powerful Spider (Web Crawler) System in Python.

Write scripts in Python
Powerful WebUI with script editor, task monitor, project manager, and result viewer
MySQL, MongoDB, Redis, SQLite, Elasticsearch, and PostgreSQL (via SQLAlchemy) as database backends
RabbitMQ, Redis, and Kombu as message queues
Task priority, retries, periodic tasks, recrawl by age, and more
Distributed architecture, crawling of JavaScript pages, support for Python 2.6/2.7 and 3.3 to 3.6, and more
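The scheduling features listed above (task priority, retries, recrawl by age) can be illustrated with a small standalone sketch. This is not pyspider's scheduler and uses none of its APIs: the class ToyScheduler, its methods, and the tuple layout are hypothetical, chosen only to show how a priority queue, a retry budget, and a freshness window interact.

```python
import heapq
import itertools


class ToyScheduler:
    """Hypothetical scheduler (not pyspider's) showing priority, retries and recrawl-by-age."""

    def __init__(self, age_seconds):
        self.age = age_seconds          # pages fetched more recently than this are skipped
        self.last_crawled = {}          # url -> time of the last successful fetch
        self.queue = []                 # min-heap of (-priority, seq, url, retries_left)
        self.seq = itertools.count()    # tie-breaker: FIFO order among equal priorities

    def submit(self, url, priority=0, retries=3, now=0.0):
        """Queue url unless it was fetched within the age window. Returns True if queued."""
        last = self.last_crawled.get(url)
        if last is not None and now - last < self.age:
            return False
        heapq.heappush(self.queue, (-priority, next(self.seq), url, retries))
        return True

    def run(self, fetch, now=0.0):
        """Drain the queue, calling fetch(url) -> bool; return URLs in the order attempted."""
        attempted = []
        while self.queue:
            neg_priority, _, url, retries = heapq.heappop(self.queue)
            attempted.append(url)
            if fetch(url):
                self.last_crawled[url] = now
            elif retries > 0:
                # Re-queue at the same priority; a real scheduler would also delay the retry.
                heapq.heappush(self.queue, (neg_priority, next(self.seq), url, retries - 1))
        return attempted


if __name__ == "__main__":
    sched = ToyScheduler(age_seconds=60)
    sched.submit("http://example.com/low", priority=1)
    sched.submit("http://example.com/high", priority=9)
    print(sched.run(lambda url: True))
    # → ['http://example.com/high', 'http://example.com/low']
```

Negating the priority turns Python's min-heap into a highest-priority-first queue, and the sequence counter keeps equal-priority tasks in submission order.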

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample Code

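from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    # Default parameters applied to every self.crawl() call in this project.
    crawl_config = {
    }

    # Run on_start once every 24 hours.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Requests handled by this callback use age = 10 days: a page fetched
    # within that window is not fetched again.
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # response.doc is a PyQuery object; follow every absolute http(s) link.
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict becomes this page's result (saved to the result database by default).
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }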
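The two callbacks above select absolute links with the CSS selector a[href^="http"] and read the page title, using PyQuery through response.doc. As a rough, dependency-free illustration of that extraction step, the sketch below reproduces it with the standard library's html.parser. The names LinkAndTitleParser and extract are hypothetical, the sketch fetches nothing, and PyQuery's whitespace handling in .text() differs in details.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collects hrefs starting with "http" and the <title> text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Same rule as the selector a[href^="http"]: keep hrefs that start with "http".
            if href and href.startswith("http"):
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract(html, url):
    """Return (links to follow, detail-page record) for one page."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    record = {"url": url, "title": "".join(parser.title_parts).strip()}
    return parser.links, record
```

For one page, extract returns both the list of links that index_page would pass to self.crawl and a dict shaped like detail_page's result; in pyspider those two steps run on different pages.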

Installation

pip install pyspider
Run the pyspider command, then visit http://localhost:5000/

WARNING: The WebUI is open to the public by default and can be used to execute arbitrary commands that may harm your system. Use it only on an internal network, or enable need-auth for the WebUI.
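One way to enable authentication is pyspider's JSON config file, passed with the -c/--config option. The sketch below assumes that the "webui" section accepts the keys username, password and need-auth, as in pyspider's deployment documentation; check them against the version you run. The helper name build_webui_auth_config is hypothetical.

```python
import json


def build_webui_auth_config(username, password):
    """Return JSON text for a pyspider config file that enables WebUI authentication.

    Assumption: pyspider reads these keys from the "webui" section of the file
    passed via -c/--config; verify against your installed version.
    """
    config = {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Print the config; save it as config.json yourself.
    print(build_webui_auth_config("admin", "change-me"))
```

Save the printed JSON as config.json and start pyspider with pyspider -c config.json. The password is stored in plain text, so keep the file private.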

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/