gain

Web crawling framework based on asyncio.


Programming language: Python
License: GNU General Public License v3.0 only


README


Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp.


Requirements

  • Python 3.5+

Installation

```shell
pip install gain
pip install uvloop  # Linux only
```

Usage

  1. Write spider.py:

```python
from gain import Css, Item, Parser, Spider
import aiofiles


class Post(Item):
    title = Css('.entry-title')
    content = Css('.entry-content')

    async def save(self):
        async with aiofiles.open('scrapinghub.txt', 'a+') as f:
            await f.write(self.results['title'])


class MySpider(Spider):
    concurrency = 5
    headers = {'User-Agent': 'Google Spider'}
    start_url = 'https://blog.scrapinghub.com/'
    parsers = [Parser(r'https://blog.scrapinghub.com/page/\d+/'),
               Parser(r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]


MySpider.run()
```
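The two `Parser` patterns above are plain regular expressions matched against discovered URLs to decide which links to follow. A stdlib-only illustration of what each pattern accepts (the sample URLs below are invented for the example):

```python
import re

# The same patterns the Parser objects above are configured with.
page_pattern = re.compile(r'https://blog.scrapinghub.com/page/\d+/')
post_pattern = re.compile(r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/')

# A pagination link matches only the first pattern; a dated post slug
# matches only the second.
print(bool(page_pattern.match('https://blog.scrapinghub.com/page/2/')))                   # True
print(bool(post_pattern.match('https://blog.scrapinghub.com/2017/06/19/gain-release/')))  # True
print(bool(post_pattern.match('https://blog.scrapinghub.com/page/2/')))                   # False
```

Pages matched by a `Parser` with no `Item` class are crawled only for more links; pages matched by the `Parser` carrying `Post` are also parsed and saved.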

Or use XPathParser:

```python
from gain import Css, Item, Parser, XPathParser, Spider


class Post(Item):
    title = Css('.breadcrumb_last')

    async def save(self):
        print(self.title)


class MySpider(Spider):
    start_url = 'https://mydramatime.com/europe-and-us-drama/'
    concurrency = 5
    headers = {'User-Agent': 'Google Spider'}
    parsers = [
        XPathParser('//span[@class="category-name"]/a/@href'),
        XPathParser('//div[contains(@class, "pagination")]/ul/li/a[contains(@href, "page")]/@href'),
        XPathParser('//div[@class="mini-left"]//div[contains(@class, "mini-title")]/a/@href', Post)
    ]
    proxy = 'https://localhost:1234'


MySpider.run()
```

You can add a proxy setting to the spider, as shown above.
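Each `XPathParser` expression above selects `href` attributes from matching elements, which become the next URLs to crawl. As a rough stdlib-only sketch of that idea (the HTML snippet is invented, and gain itself presumably uses a full XPath engine rather than `xml.etree`, which supports only a limited XPath subset):

```python
import xml.etree.ElementTree as ET

# Toy markup standing in for a real category page.
html = """
<div>
  <span class='category-name'><a href='/europe-and-us-drama/'>Drama</a></span>
  <span class='category-name'><a href='/asian-drama/'>Asian</a></span>
</div>
"""
root = ET.fromstring(html)

# ElementTree cannot select attributes directly (no trailing /@href),
# so select the <a> elements and read their href attributes instead.
hrefs = [a.get('href') for a in root.findall(".//span[@class='category-name']/a")]
print(hrefs)  # ['/europe-and-us-drama/', '/asian-drama/']
```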

  2. Run `python spider.py`.

  3. Result:

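Both spiders above set `concurrency = 5`, which caps how many requests are in flight at once. A minimal sketch of that pattern with plain asyncio (the URLs and `fetch` function are invented for the example; gain's internal scheduling may differ):

```python
import asyncio


async def fetch(url, sem):
    # At most 5 coroutines may hold the semaphore at any moment.
    async with sem:
        await asyncio.sleep(0.01)  # stands in for an aiohttp request
        return f'fetched {url}'


async def main():
    sem = asyncio.Semaphore(5)
    urls = [f'https://example.com/page/{i}/' for i in range(10)]
    # gather preserves input order even though completion is concurrent.
    return await asyncio.gather(*(fetch(u, sem) for u in urls))


results = asyncio.run(main())
print(len(results))  # 10
```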

Example

The examples are in the /example/ directory.

Contribution

  • Submit a pull request.
  • Open an issue.


