I am trying to parse an HTML page, fetch the currency values, and write them to a CSV file. I have the following code:
#!/usr/bin/env python
import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())
table = soup.find('div', attrs={'class': 'content'})
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print
The problem is that I do not know how to retrieve only the currency values. I tried a regular expression such as '^[0-9]{3}' (starts with three digits), but it doesn't work.
Any reason you are using BeautifulSoup 3 instead of 4? Not that it matters much for your problem, but bs4 offers much better functionality in places. – Martijn Pieters, Mar 6, 2013 at 14:52
Are you trying to get just the values of the "official exchange rates" column? – jurgenreza, Mar 6, 2013 at 15:02
1 Answer
You'd be much better off picking out specific cells in the table. The td cells with the cell_c class contain the data you are interested in, and the last one in each row is always the currency exchange rate:
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if 'cell_c' in cols[0]['class']:
        # currency row
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate
With the data in separate variables, you can now turn the text to decimal numbers, store them in a database, whatever.
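As a rough sketch of that last step, the extracted fields could be converted with decimal.Decimal and written out with the csv module, using ';' as the delimiter as in the question. This version assumes Python 3 and the standard library only; the helper names parse_rate and rows_to_csv are hypothetical, and the sample row holds made-up placeholder values, not real rates.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def parse_rate(text):
    """Return the scraped rate as a Decimal, or None if it is not numeric."""
    try:
        return Decimal(text.strip())
    except InvalidOperation:
        return None


def rows_to_csv(rows):
    """Build ';'-separated CSV text from 5-field currency rows.

    Each row is (digital_code, letter_code, units, name, rate) as strings,
    i.e. the values unpacked in the loop above. Rows whose rate cannot be
    parsed are skipped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=';', lineterminator='\n')
    writer.writerow(['digital_code', 'letter_code', 'units', 'name', 'rate'])
    for digital_code, letter_code, units, name, rate in rows:
        value = parse_rate(rate)
        if value is None:
            continue  # not a usable currency row
        writer.writerow([digital_code, letter_code, units, name, value])
    return buffer.getvalue()


# Placeholder data for illustration only (not real exchange rates).
sample_rows = [('840', 'USD', '100', 'US Dollar', '799.3000')]
csv_text = rows_to_csv(sample_rows)
```

To save the result, csv_text can be written to a file opened in text mode; collecting the rows in a list inside the scraping loop and passing that list to rows_to_csv would replace the print statement.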