MSpider


Programming language: Python
License: GNU General Public License v3.0 only



Talk

The information security department of the 360 company has long been recruiting; interested candidates can contact zhangxin1[at]360.cn.

Installation

On Ubuntu, you need to install the following dependencies. You can use pip, easy_install, or apt-get to do this.

  • lxml
  • chardet
  • splinter
  • gevent
  • phantomjs
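The steps above can be sketched as follows (package names assumed to be available from PyPI and Ubuntu's repositories):

```shell
# Python libraries via pip (easy_install works similarly)
pip install lxml chardet splinter gevent

# PhantomJS is a headless browser binary rather than a Python package;
# on Ubuntu it can be installed from the system repositories
sudo apt-get install phantomjs
```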

Example

  1. Use MSpider to collect vulnerability information from wooyun.org.

    python mspider.py -u "http://www.wooyun.org/bugs/" --focus-domain "wooyun.org" --filter-keyword "xxx" --focus-keyword "bugs" -t 15 --random-agent true
    
  2. Use MSpider to collect news information from news.sina.com.cn.

    python mspider.py -u "http://news.sina.com.cn/c/2015-12-20/doc-ifxmszek7395594.shtml" --focus-domain "news.sina.com.cn" -t 15 --random-agent true
    

ToDo

  1. Crawling and storage of collected information.
  2. Distributed crawling.

MSpider's help

    Usage:
      __  __  _____       _     _
     |  \/  |/ ____|     (_)   | |
     | \  / | (___  _ __  _  __| | ___ _ __
     | |\/| |\___ \| '_ \| |/ _` |/ _ \ '__|
     | |  | |____) | |_) | | (_| |  __/ |
     |_|  |_|_____/| .__/|_|\__,_|\___|_|
                   | |
                   |_|
     Author: Manning23

    Options:
      -h, --help            show this help message and exit
      -u MSPIDER_URL, --url=MSPIDER_URL
                            Target URL (e.g. "http://www.site.com/")
      -t MSPIDER_THREADS_NUM, --threads=MSPIDER_THREADS_NUM
                            Max number of concurrent HTTP(s) requests (default 10)
      --depth=MSPIDER_DEPTH
                            Crawling depth
      --count=MSPIDER_COUNT
                            Crawling number
      --time=MSPIDER_TIME   Crawl time
      --referer=MSPIDER_REFERER
                            HTTP Referer header value
      --cookies=MSPIDER_COOKIES
                            HTTP Cookie header value
      --spider-model=MSPIDER_MODEL
                            Crawling mode: Static_Spider: 0 Dynamic_Spider: 1
                            Mixed_Spider: 2
      --spider-policy=MSPIDER_POLICY
                            Crawling strategy: Breadth-first 0 Depth-first 1
                            Random-first 2
      --focus-keyword=MSPIDER_FOCUS_KEYWORD
                            Focus keyword in URL
      --filter-keyword=MSPIDER_FILTER_KEYWORD
                            Filter keyword in URL
      --filter-domain=MSPIDER_FILTER_DOMAIN
                            Filter domain
      --focus-domain=MSPIDER_FOCUS_DOMAIN
                            Focus domain
      --random-agent=MSPIDER_AGENT
                            Use randomly selected HTTP User-Agent header value
      --print-all=MSPIDER_PRINT_ALL
                            Will show more information
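The focus/filter options above suggest simple substring matching against each candidate URL and its host. A minimal sketch of how such filtering might work (the function name and logic are illustrative assumptions, not MSpider's actual implementation):

```python
from urllib.parse import urlparse

def should_crawl(url, focus_domain=None, filter_domain=None,
                 focus_keyword=None, filter_keyword=None):
    """Decide whether a candidate URL passes the focus/filter rules.

    Hypothetical logic: every given focus rule must match, and no
    given filter rule may match. Not MSpider's actual code.
    """
    host = urlparse(url).netloc
    if focus_domain and focus_domain not in host:
        return False          # outside the focused domain
    if filter_domain and filter_domain in host:
        return False          # explicitly filtered domain
    if focus_keyword and focus_keyword not in url:
        return False          # required keyword missing from URL
    if filter_keyword and filter_keyword in url:
        return False          # forbidden keyword present in URL
    return True

# Mirrors the wooyun example: keep "bugs" pages, drop URLs containing "xxx"
print(should_crawl("http://www.wooyun.org/bugs/wooyun-2015-0100",
                   focus_domain="wooyun.org", focus_keyword="bugs",
                   filter_keyword="xxx"))  # True
```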
