gazpacho is a web scraping library. It replaces requests and BeautifulSoup for most projects. gazpacho is small, simple, fast, and consistent. You should use it!
gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installs with zero dependencies.
Install with pip at the command line:
pip install -U gazpacho
Give this a try:
from gazpacho import get, Soup
url = 'https://scrape.world/books'
html = get(url)
soup = Soup(html)
books = soup.find('div', {'class': 'book-'}, partial=True)
def parse(book):
    name = book.find('h4').text
    price = float(book.find('p').text[1:].split(' ')[0])
    return name, price
[parse(book) for book in books]
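The price expression in parse drops the first character of the text and keeps everything up to the first space. A minimal sketch of that step in isolation, assuming the price text looks like '$12.99 CAD' (a made-up sample, not taken from the page); parse_price is a hypothetical helper name:

```python
def parse_price(text):
    """Convert text such as '$12.99 CAD' to a float.

    The sample format is an assumption made for illustration only.
    """
    # Drop the leading currency symbol, then keep the first space-delimited token.
    return float(text[1:].split(' ')[0])


print(parse_price('$12.99 CAD'))
# 12.99
```

If the real text has no leading symbol, the [1:] slice would cut off the first digit, so check a sample of the page before relying on this.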
Import gazpacho following the convention:
from gazpacho import get, Soup
Use the get function to download raw HTML:
url = 'https://scrape.world/soup'
html = get(url)
print(html[:50])
# '<!DOCTYPE html>\n<html lang="en">\n <head>\n <met'
Adjust get requests with optional params and headers:
get(
    url='https://httpbin.org/anything',
    params={'foo': 'bar', 'bar': 'baz'},
    headers={'User-Agent': 'gazpacho'}
)
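To see what params and headers amount to in the outgoing request, here is a rough standard-library sketch that builds the request without sending it. build_request is a hypothetical helper, not part of gazpacho, and the library's own internals may differ:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, params=None, headers=None):
    """Build (but do not send) a GET request with a query string and headers."""
    if params:
        # Encode the params dict as ?key=value&key=value
        url = f"{url}?{urlencode(params)}"
    return Request(url, headers=headers or {})


req = build_request(
    'https://httpbin.org/anything',
    params={'foo': 'bar', 'bar': 'baz'},
    headers={'User-Agent': 'gazpacho'},
)
print(req.full_url)
# https://httpbin.org/anything?foo=bar&bar=baz
print(req.get_header('User-agent'))
# gazpacho
```

Note that urllib stores header names capitalized as 'User-agent', which is why the lookup uses that spelling.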
Use the Soup wrapper on raw HTML to enable parsing:
soup = Soup(html)
Soup objects can alternatively be initialized with the .get classmethod:
soup = Soup.get(url)
Use the .find method to target and extract HTML tags:
h1 = soup.find('h1')
print(h1)
# <h1 id="firstHeading" class="firstHeading" lang="en">Soup</h1>
Use the attrs argument to isolate tags that contain specific HTML element attributes:
soup.find('div', attrs={'class': 'section-'})
Element attributes are partially matched by default. Turn this off by setting partial to False:
soup.find('div', {'class': 'soup'}, partial=False)
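The difference between partial and exact matching can be illustrated in plain Python. This sketch assumes partial matching means the requested value appears as a substring of the tag's attribute value; attrs_match is a hypothetical function written for illustration and is not gazpacho's implementation:

```python
def attrs_match(wanted, actual, partial=True):
    """Return True if every wanted attribute is satisfied by actual."""
    for key, value in wanted.items():
        found = actual.get(key)
        if found is None:
            return False
        if partial:
            # Partial: the wanted value only needs to appear inside the actual value.
            if value not in found:
                return False
        elif value != found:
            # Exact: the two values must be identical.
            return False
    return True


tag_attrs = {'class': 'section-soup'}
print(attrs_match({'class': 'section-'}, tag_attrs))
# True
print(attrs_match({'class': 'section-'}, tag_attrs, partial=False))
# False
```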
Set the mode argument to one of 'auto', 'first', or 'all' to guarantee the return type:
print(soup.find('span', mode='first'))
# <span class="navbar-toggler-icon"></span>
len(soup.find('span', mode='all'))
# 8
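One way to picture the three modes is as a final step applied to the list of matches. The sketch below assumes 'auto' returns None for no matches, a single object for one match, and a list for several; triage is a hypothetical name and this is not the library's code:

```python
def triage(matches, mode='auto'):
    """Pick a return value from a list of matches according to mode."""
    if mode == 'all':
        # Always a list, possibly empty.
        return matches
    if mode == 'first':
        # Always a single item, or None when nothing matched.
        return matches[0] if matches else None
    if mode == 'auto':
        # Shape depends on how many matches there are.
        if not matches:
            return None
        if len(matches) == 1:
            return matches[0]
        return matches
    raise ValueError(f"unknown mode: {mode!r}")


print(triage(['a', 'b'], mode='first'))
# a
print(triage(['a'], mode='all'))
# ['a']
print(triage(['a']))
# a
```

Under this assumption, 'first' and 'all' are the safer choices in scripts, because the caller never has to check whether it received one object or a list.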
Soup objects have html, tag, attrs, and text attributes:
dir(h1)
# ['attrs', 'find', 'get', 'html', 'strip', 'tag', 'text']
Use them accordingly:
print(h1.html)
# '<h1 id="firstHeading" class="firstHeading" lang="en">Soup</h1>'
print(h1.tag)
# h1
print(h1.attrs)
# {'id': 'firstHeading', 'class': 'firstHeading', 'lang': 'en'}
print(h1.text)
# Soup
If you use gazpacho, consider adding the "scraper: gazpacho" badge to your project README.md:
[](https://github.com/maxhumber/gazpacho)
For feature requests or bug reports, please use GitHub Issues.
For PRs, please read the CONTRIBUTING.md document.