\$\begingroup\$

I'm new to Python, and after working through a few tutorials, some of them about scraping, I've been trying some simple scraping on my own. Using BeautifulSoup I can get data from web pages where everything has labels, but without them I'm doing a poor job.

I'm trying to get the dollar exchange rate from: http://www.bancochile.cl/cgi-bin/cgi_mone?pagina=inversiones/mon_tasa/cgi_mone

[Image: Bank]

The value I'm after is highlighted in yellow.

After a lot of trial and error, I managed to get the dollar exchange rate, but I think there has to be a better way.

import requests
from bs4 import BeautifulSoup
page = requests.get("http://www.bancochile.cl/cgi-bin/cgi_mone?pagina=inversiones/mon_tasa/cgi_mone")
soup = BeautifulSoup(page.content, 'html.parser')
tables = soup.find_all("table")
dollar = tables[4].find_all("td")
print(dollar[5].string)

Is there a better or more correct way to do this? Also, I'm not sure whether the problem is in the way I coded it, or in my not understanding the HTML structure well enough to navigate to the information more efficiently.

Reinderien
asked Mar 30, 2017 at 19:15
\$\endgroup\$

1 Answer

\$\begingroup\$

The markup is definitely not easy to parse because of the nested table elements with no meaningful attributes. But you are right that relying on the relative index of a table, and on the desired cell being the 6th in that table, is quite a fragile strategy.

Instead, let's use the row title as our "anchor". Then, we'll get the following cell via .find_next_sibling():

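# `soup` is the BeautifulSoup object built in the question's code
DESIRED_MONEDAS = "DOLAR USA"
# find the <td> whose stripped text is exactly the row title
label = soup.find(lambda tag: tag.name == "td" and tag.get_text(strip=True) == DESIRED_MONEDAS)
# the value is in the following <td> of the same row
value = label.find_next_sibling("td").get_text(strip=True)
print(value)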
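
To try the idea without hitting the bank's site, here is a minimal, self-contained sketch. The HTML snippet, its numbers, and the helper name find_value_after_label are all made up for illustration; the real page's markup may well differ. It also returns None instead of raising when the label or the following cell is missing, which is an addition on my part rather than something the snippet above does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the bank's page: nested tables with
# no ids or classes. The rates below are invented sample values.
SAMPLE_HTML = """
<table><tr><td>
  <table>
    <tr><td>MONEDA</td><td>COMPRA</td><td>VENTA</td></tr>
    <tr><td> DOLAR USA </td><td>650,10</td><td>670,20</td></tr>
    <tr><td>EURO</td><td>700,00</td><td>730,50</td></tr>
  </table>
</td></tr></table>
"""


def find_value_after_label(soup, label_text):
    """Return the text of the <td> that follows the <td> whose stripped
    text equals label_text, or None if either cell is missing."""
    label = soup.find(
        lambda tag: tag.name == "td" and tag.get_text(strip=True) == label_text
    )
    if label is None:
        return None
    # Move sideways within the row instead of counting cells from the top.
    value_cell = label.find_next_sibling("td")
    if value_cell is None:
        return None
    return value_cell.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(find_value_after_label(soup, "DOLAR USA"))  # 650,10
print(find_value_after_label(soup, "YEN"))        # None
```

Because the lookup is keyed on the row title, adding or removing tables or rows around it should not change the result, as long as the title text itself stays the same.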
answered Mar 30, 2017 at 19:24
\$\endgroup\$
  • \$\begingroup\$ Thank you @alecxe, this is exactly what I was looking for. I'm still working out the details of how your code works, but it is great study material to improve my understanding of scraping. \$\endgroup\$ Commented Mar 31, 2017 at 17:15
  • \$\begingroup\$ @Pablo sure, glad to help. The important thing to understand is that you don't have to "traverse" the HTML tree top to bottom; you can go sideways too. BeautifulSoup's API lets you explore the tree and find elements in many different ways. Really great library. \$\endgroup\$ Commented Mar 31, 2017 at 17:17
