Scraping data from a table in Python

Question 1

I'm new to Python, and after doing a few tutorials, some of them about scraping, I've been trying some simple scraping on my own. Using BeautifulSoup I can get data from web pages where everything has labels, but without them I'm doing a poor job.

I'm trying to get the dollar exchange rate from: http://www.bancochile.cl/cgi-bin/cgi_mone?pagina=inversiones/mon_tasa/cgi_mone

[Screenshot of the bank's page]

The value I'm after is highlighted in yellow.

After a lot of trial and error, I managed to get the dollar exchange rate, but I think there has to be a better way.

import requests
from bs4 import BeautifulSoup

# Fetch the page (the URL must not contain a space)
page = requests.get("http://www.bancochile.cl/cgi-bin/cgi_mone?pagina=inversiones/mon_tasa/cgi_mone")
soup = BeautifulSoup(page.content, 'html.parser')

# Index-based lookup: 6th cell of the 5th table on the page
tables = soup.find_all("table")
dollar = tables[4].find_all("td")
print(dollar[5].string)

Is there a better or more correct way to do this? Also, I'm not sure whether the problem is in how I wrote the code or in not understanding the HTML structure well enough to navigate to the information more efficiently.

Answer

The markup is definitely not easy to parse, because the nested table elements have no meaningful attributes. But you are right that relying on the index of a table, and on the desired cell being the 6th in that table, is quite a fragile strategy.

Instead, let's use the row title as our "anchor" and then get the following cell via .find_next_sibling():

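DESIRED_MONEDAS = "DOLAR USA"

# Find the <td> whose text is exactly the row title (soup comes from the question's code)
label = soup.find(lambda tag: tag.name == "td" and tag.get_text(strip=True) == DESIRED_MONEDAS)
# The value is in the next <td> sibling of that cell
value = label.find_next_sibling("td").get_text(strip=True)
print(value)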
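
To try the anchor approach without hitting the live site, here is a minimal, self-contained sketch. The HTML snippet, the rate values in it, and the helper name value_next_to are all made up for illustration (the bank's real markup is not reproduced here); only the find and find_next_sibling calls come from the code above. It also guards against the label being absent, in which case soup.find returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating a nested table without useful attributes.
# The numbers are placeholders, not real exchange rates.
SAMPLE_HTML = """
<table><tr><td>
  <table>
    <tr><td>MONEDA</td><td>COMPRA</td><td>VENTA</td></tr>
    <tr><td> DOLAR USA </td><td>655,50</td><td>665,50</td></tr>
    <tr><td>EURO</td><td>700,10</td><td>715,90</td></tr>
  </table>
</td></tr></table>
"""


def value_next_to(soup, label_text):
    """Return the text of the <td> right after the <td> whose text is label_text."""
    label = soup.find(
        lambda tag: tag.name == "td" and tag.get_text(strip=True) == label_text
    )
    if label is None:
        return None  # label not found (wording or layout may have changed)
    value = label.find_next_sibling("td")
    return value.get_text(strip=True) if value is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_next_to(soup, "DOLAR USA"))  # first cell after the label
print(value_next_to(soup, "YEN"))        # None, since there is no such row
```

Note that the outer td is skipped: its combined text is the concatenation of every inner cell, so it never equals the label exactly.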

Comment 1

Thank you @alecxe, this is exactly what I was looking for. I'm still working out the details of how your code works, but it is great study material to improve my understanding of scraping.

Comment 2

@Pablo sure, glad to help. The important thing to understand is that you don't have to "traverse" the HTML tree top to bottom; you can go sideways too. BeautifulSoup's API lets you explore the tree and find elements in many different ways. It's a really great library.
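
As a small illustration of going sideways and upwards, the sketch below starts from a cell in the middle of a row and moves left, right, and up from there. The one-row table and its values are hypothetical; the methods used (find_previous_sibling, find_next_sibling, find_parent) are part of BeautifulSoup's navigation API.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table, made up for this example.
SAMPLE_HTML = (
    "<table><tr>"
    "<td>DOLAR USA</td><td>655,50</td><td>665,50</td>"
    "</tr></table>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Start from a cell in the middle of the row.
middle = soup.find("td", string="655,50")

label = middle.find_previous_sibling("td")  # sideways, to the left
right = middle.find_next_sibling("td")      # sideways, to the right
row = middle.find_parent("tr")              # upwards, to the enclosing row

# From the row, go back down to collect every cell's text.
cells = [td.get_text(strip=True) for td in row.find_all("td")]

print(label.get_text(), right.get_text(), cells)
```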

alecxe (17.5k reputation; 8 gold, 52 silver, 93 bronze badges) · Accepted Answer · 2017-03-30 19:24:54Z

