
Search Engine Parser

"If it is a search engine, then it can be parsed" - some random guy

Supports Python 3.6, 3.7, 3.8 and 3.9. License: MIT.

search-engine-parser is a package that lets you query popular search engines and scrape result titles, links, descriptions and more. It aims to support the widest possible range of search engines. View all supported engines here.

Popular Supported Engines

Popular search engines supported include:

Google
DuckDuckGo
GitHub
StackOverflow
Baidu
YouTube

View all supported engines here.

Installation

Install from PyPI:

 # install the package with its core dependencies only
 pip install search-engine-parser
 # also install the `pysearch` CLI tool
 pip install "search-engine-parser[cli]"

Or install the latest code from the master branch:

 pip install git+https://github.com/bisoncorps/search-engine-parser

Development

Clone the repository:

 git clone git@github.com:bisoncorps/search-engine-parser.git

Then create a virtual environment (mkvirtualenv is provided by virtualenvwrapper; python -m venv works as well) and install the development requirements:

 mkvirtualenv search_engine_parser
 pip install -r requirements/dev.txt

Code Documentation

Code docs can be found on Read the Docs.

Running the tests

 pytest

Usage

Code

Query results can be scraped from popular search engines, as shown in the example snippet below.

 import pprint

 from search_engine_parser.core.engines.bing import Search as BingSearch
 from search_engine_parser.core.engines.google import Search as GoogleSearch
 from search_engine_parser.core.engines.yahoo import Search as YahooSearch

 # (query, page number)
 search_args = ('preaching to the choir', 1)
 gsearch = GoogleSearch()
 ysearch = YahooSearch()
 bsearch = BingSearch()
 gresults = gsearch.search(*search_args)
 yresults = ysearch.search(*search_args)
 bresults = bsearch.search(*search_args)
 a = {
     "Google": gresults,
     "Yahoo": yresults,
     "Bing": bresults
 }

 # pretty print the result from each engine
 for k, v in a.items():
     print(f"-------------{k}------------")
     for result in v:
         pprint.pprint(result)

 # print first title from google search
 print(gresults["titles"][0])
 # print 10th link from yahoo search
 print(yresults["links"][9])
 # print 6th description from bing search
 print(bresults["descriptions"][5])
 # print first result containing links, descriptions and title
 print(gresults[0])
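When the same query goes to several engines, the same page often shows up more than once. The sketch below merges result lists and keeps only the first item seen for each link, recording which engine found it. merge_unique is a hypothetical helper, not part of search-engine-parser; it assumes only that iterating a result set yields items supporting item["link"], as described in the Results section.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def merge_unique(
    results_by_engine: Mapping[str, Iterable[Any]],
) -> List[Tuple[str, Any]]:
    """Merge result items from several engines, dropping repeated links.

    Hypothetical helper: each item only needs to support item["link"].
    Returns (engine name, item) pairs in the order they were first seen.
    """
    seen = set()
    merged = []
    for engine, items in results_by_engine.items():
        for item in items:
            link = item["link"]
            if link in seen:
                continue  # already collected from an earlier engine
            seen.add(link)
            merged.append((engine, item))
    return merged
```

With the objects from the example above, merge_unique({"Google": gresults, "Bing": bresults}) would return Google's items first, followed by any Bing items whose links Google did not already return.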

For localization, pass the url keyword argument with a localized domain. The query is sent to that domain and the response is parsed with the same engine's parser:

 # Use google.de instead of google.com
 results = gsearch.search(*search_args, url="google.de")

If you need results in a specific language, pass the hl keyword argument with a two-letter language code (here's a handy list):

 # Use 'it' to receive Italian results
 results = gsearch.search(*search_args, hl="it")

Cache

Search results are cached automatically. To skip the cache, pass cache=False to search or async_search; alternatively, clear the engine's cache before searching:

 from search_engine_parser.core.engines.github import Search as GitHub
 github = GitHub()
 # bypass the cache
 github.search("search-engine-parser", cache=False)
 # OR
 # clear cache before search
 github.clear_cache()
 github.search("search-engine-parser")
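To make the two options concrete, here is a toy in-memory cache with the same surface: a cache flag on search and a clear_cache method. It only illustrates the behaviour described above and is not how search-engine-parser actually stores its cache; CachedEngine and the fetch callable are hypothetical stand-ins for an engine and its network request.

```python
from typing import Callable, Dict, List, Tuple


class CachedEngine:
    """Toy engine that memoizes results per (query, page).

    Illustration only: fetch stands in for the real network request.
    """

    def __init__(self, fetch: Callable[[str, int], List[str]]) -> None:
        self._fetch = fetch
        self._cache: Dict[Tuple[str, int], List[str]] = {}

    def search(self, query: str, page: int = 1, cache: bool = True) -> List[str]:
        key = (query, page)
        if cache and key in self._cache:
            return self._cache[key]  # served without a new request
        results = self._fetch(query, page)
        # in this sketch, a bypassed search still refreshes the stored entry
        self._cache[key] = results
        return results

    def clear_cache(self) -> None:
        self._cache.clear()
```

Keying on (query, page) means different pages of the same query are cached separately, which matches how search takes both as arguments.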

Proxy

To route requests through a proxy, pass the proxy details to the search method:

 from search_engine_parser.core.engines.github import Search as GitHub
 github = GitHub()
 github.search("search-engine-parser",
               # only HTTP proxies are supported
               proxy='http://123.12.1.0',
               proxy_auth=('username', 'password'))
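Rather than hard-coding credentials as in the snippet above, you might read them from environment variables. The variable names SEP_PROXY, SEP_PROXY_USER and SEP_PROXY_PASSWORD and the proxy_kwargs helper are hypothetical; only the proxy and proxy_auth keyword arguments come from the example above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def proxy_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build proxy/proxy_auth keyword arguments from environment variables.

    Hypothetical variable names. Returns {} when no proxy is configured,
    so the result can always be splatted into search(...).
    """
    env = os.environ if env is None else env
    proxy = env.get("SEP_PROXY")
    if not proxy:
        return {}
    kwargs: Dict[str, Any] = {"proxy": proxy}
    user = env.get("SEP_PROXY_USER")
    password = env.get("SEP_PROXY_PASSWORD")
    if user and password:
        # same (username, password) tuple shape as in the example above
        kwargs["proxy_auth"] = (user, password)
    return kwargs
```

Usage would then look like github.search("search-engine-parser", **proxy_kwargs()), which falls back to a direct connection when SEP_PROXY is unset.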

Async

search-engine-parser also supports asynchronous searches through async_search:

 # await must be used inside a coroutine (an async def function)
 results = await gsearch.async_search(*search_args)
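The main benefit of async_search is querying several engines at once. The sketch below runs them concurrently with asyncio.gather; it assumes each engine object exposes async_search(query, page) as in the line above. search_all is a hypothetical helper, not part of the package.

```python
import asyncio
from typing import Any, Dict


async def search_all(engines: Dict[str, Any], query: str, page: int = 1) -> Dict[str, Any]:
    """Query every engine concurrently and map engine name -> results."""
    names = list(engines)
    # schedule all searches together instead of awaiting them one by one
    results = await asyncio.gather(
        *(engines[name].async_search(query, page) for name in names)
    )
    # gather preserves argument order, so names and results line up
    return dict(zip(names, results))
```

With engines from the Code section, asyncio.run(search_all({"Google": gsearch, "Bing": bsearch}, "preaching to the choir")) would return both result sets (asyncio.run needs Python 3.7 or later). Note that if any search raises, gather propagates that exception.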

Results

Searching returns a SearchResult object:

 >>> results = gsearch.search("preaching to the choir", 1)
 >>> results
 <search_engine_parser.core.base.SearchResult object at 0x7f907426a280>
 # the object supports retrieving individual results by index, or all values of one type (links, descriptions, titles) by name
 >>> results[0] # returns the first <SearchItem>
 >>> results[0]["description"] # gets the description of the first item
 >>> results[0]["link"] # gets the link of the first item
 >>> results["descriptions"] # returns a list of all descriptions from all results

A SearchResult can be iterated like a normal list, yielding individual SearchItem objects.
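To illustrate how one object can answer both results[0] and results["descriptions"], here is a minimal container that dispatches on the key type. It is a simplified sketch, not the library's SearchResult class; the class name and the plural-to-singular rule (drop the trailing "s") are assumptions based on the keys shown above.

```python
from typing import Any, Dict, Iterator, List, Union


class ResultsSketch:
    """List of result dicts that also supports lookup by plural field name."""

    def __init__(self, items: List[Dict[str, Any]]) -> None:
        self._items = items

    def __getitem__(self, key: Union[int, str]) -> Any:
        if isinstance(key, int):
            return self._items[key]  # a single result item
        if key.endswith("s"):
            field = key[:-1]  # e.g. "descriptions" -> "description"
            return [item[field] for item in self._items]
        raise KeyError(key)

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return iter(self._items)  # iterate like a normal list

    def __len__(self) -> int:
        return len(self._items)
```

Dispatching on isinstance(key, int) keeps integer indexing identical to a list, while string keys return a column across all items.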

Command line

search-engine-parser comes with a CLI tool called pysearch (installed with the [cli] extra). Use it like this:

 pysearch --engine bing --type descriptions "Preaching to the choir"

Result:

'Preaching to the choir' originated in the USA in the 1970s. It is a variant of the earlier 'preaching to the converted', which dates from England in the late 1800s and has the same meaning. Origin - the full story 'Preaching to the choir' (also sometimes spelled quire) is of US origin.

Demo

 usage: pysearch [-h] [-V] [-e ENGINE] [--show-summary] [-u URL] [-p PAGE]
                 [-t TYPE] [-cc] [-r RANK] [--proxy PROXY]
                 [--proxy-user PROXY_USER] [--proxy-password PROXY_PASSWORD]
                 query

 SearchEngineParser

 positional arguments:
   query                 Query string to search engine for

 optional arguments:
   -h, --help            show this help message and exit
   -V, --version         show program's version number and exit
   -e ENGINE, --engine ENGINE
                         Engine to use for parsing the query e.g google, yahoo,
                         bing,duckduckgo (default: google)
   --show-summary        Shows the summary of an engine
   -u URL, --url URL     A custom link to use as base url for search e.g
                         google.de
   -p PAGE, --page PAGE  Page of the result to return details for (default: 1)
   -t TYPE, --type TYPE  Type of detail to return i.e full, links, descriptions
                         or titles (default: full)
   -cc, --clear-cache    Clear cache of engine before searching
   -r RANK, --rank RANK  ID of Detail to return e.g 5 (default: 0)
   --proxy PROXY         Proxy address to make use of
   --proxy-user PROXY_USER
                         Proxy user to make use of
   --proxy-password PROXY_PASSWORD
                         Proxy password to make use of
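To call pysearch from a script, you can assemble its arguments from the options listed above and hand them to subprocess. pysearch_args and run_pysearch are hypothetical helpers that use only flags shown in the help text; actually running the command requires the [cli] extra to be installed, and capture_output/text need Python 3.7 or later.

```python
import subprocess
from typing import Any, List, Optional


def pysearch_args(query: str, engine: str = "google", type_: str = "full",
                  page: int = 1, rank: int = 0, url: Optional[str] = None,
                  clear_cache: bool = False) -> List[str]:
    """Build a pysearch command line using only documented flags."""
    args = ["pysearch", "--engine", engine, "--type", type_,
            "--page", str(page), "--rank", str(rank)]
    if url:
        args += ["--url", url]
    if clear_cache:
        args.append("--clear-cache")
    args.append(query)  # the positional query goes last
    return args


def run_pysearch(query: str, **options: Any) -> str:
    """Run pysearch and return its standard output (raises if it exits non-zero)."""
    completed = subprocess.run(pysearch_args(query, **options),
                               capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, run_pysearch("Preaching to the choir", engine="bing", type_="descriptions") corresponds to the command shown under Command line, with the default page and rank spelled out.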