A Powerful Spider (Web Crawler) System in Python.
Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    # Run on_start once every 24 hours.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched page as valid for 10 days before re-crawling it.
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Follow every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is the result recorded for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
Installation: run pip install pyspider, then run the command pyspider and visit http://localhost:5000/
WARNING: The WebUI is open to the public by default, and it can be used to execute any command, which may harm your system. Use it only on an internal network, or enable need-auth for the WebUI.
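The warning above mentions need-auth without showing what enabling it looks like. The following is a minimal sketch, assuming the JSON config layout described in the pyspider command-line documentation, where a "webui" section carries "username", "password" and "need-auth" keys. The helper names build_webui_config and write_config, the file name config.json, and the credentials are hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile


def build_webui_config(username, password):
    """Build a config dict that asks the WebUI to require a login.

    The key names follow the layout assumed from the pyspider
    command-line docs; check them against your installed version.
    """
    return {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }


def write_config(path, config):
    """Write the config dict to `path` as JSON and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    return path


# Example: write a config file into a temporary directory.
# The credentials here are placeholders, not defaults of pyspider.
example_path = write_config(
    os.path.join(tempfile.mkdtemp(), "config.json"),
    build_webui_config("some_name", "some_passwd"),
)
```

The resulting file would then be passed at startup, for example with pyspider -c config.json, assuming the -c/--config option behaves as documented for your version.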
Quickstart: http://docs.pyspider.org/en/latest/Quickstart/
Licensed under the Apache License, Version 2.0