When I view the page source in my browser, the HTML I am after appears there. However, when I make a request using Python's requests library, that HTML doesn't appear in the response.
The URL I'm trying to scrape is http://dota2lounge.com/match?m=13362, and the specific HTML I am after in the page is:
<div class="full">
<a class="button" onclick="ChoseEvent(13362,'Whole Match',false)">Match</a>
<a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>
<a class="button" onclick="ChoseEvent(13424,'2nd Game','1462327200')">2nd Game</a>
<br><div id="toma" class="full" style="background: #444;line-height: 2.5rem;border: 1px solid #333;text-align: center;">Whole Match</div>
</div>
I'd like to get the 'onclick' values of the buttons. So far I've tried:
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://dota2lounge.com/match?m=13268')
soup = bs(r.content, 'lxml')
buttons = soup.find_all('a', class_='button')
That doesn't find the buttons, and inspecting r.content doesn't show the HTML either.
2 Answers
It looks like the elements you want are added by JavaScript, which isn't run when you make the request from Python: requests only returns the raw response the server sends and does not execute scripts. Check out this question.
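One way to check this is to look for a known string from the target markup in the raw response before parsing anything. Below is a minimal sketch, assuming "ChoseEvent" (taken from the snippet in the question) is a good marker; the helper names contains_marker and fetch_html are made up for illustration, and fetch_html needs the third-party requests package.

```python
def contains_marker(html, marker="ChoseEvent"):
    """Return True if the marker string occurs in the raw HTML text."""
    return marker in html


def fetch_html(url):
    """Download a page and return its body as text (no JavaScript is run)."""
    import requests  # third-party; imported here so the check above has no dependencies

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


# Example usage (performs a network request, so it is left commented out):
# html = fetch_html("http://dota2lounge.com/match?m=13362")
# print(contains_marker(html))
```

If this prints False, the buttons are not in the HTML that requests received, so no parser setting will find them. If it prints True, the problem is more likely in the parsing step.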
If you're just scraping this once (i.e. you just want the data and you're not trying to build a bot to play the game for you), the quickest option is often to create a .htm file containing only links to every page you want to scrape (put each link in an <a> tag; you don't even need link text). Then you can use a tool like DownThemAll in Firefox to save a local copy of every page with the proper formatting.
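The link file itself can be generated with a few lines of Python. This is only a sketch: the list of match ids is a placeholder (the two ids are the ones that appear in the question), and build_link_page and write_link_page are hypothetical helper names.

```python
from html import escape


def build_link_page(urls):
    """Return a minimal HTML page containing one empty <a> tag per URL."""
    anchors = "\n".join(
        '<a href="{}"></a>'.format(escape(url, quote=True)) for url in urls
    )
    return "<html><body>\n{}\n</body></html>\n".format(anchors)


def write_link_page(path, urls):
    """Write the link page to disk so a download manager can open it."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_link_page(urls))


# Placeholder ids; replace with the matches you actually want to save.
MATCH_IDS = [13362, 13268]
urls = ["http://dota2lounge.com/match?m={}".format(m) for m in MATCH_IDS]
page = build_link_page(urls)
```

Calling write_link_page("links.htm", urls) would then produce the file to open in the browser.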
Try this:
from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, "html.parser")
for link in soup.find_all('a'):
    print(link.get('onclick'))
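To try the extraction step without hitting the site, the snippet from the question can be parsed as a string. The sketch below swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs; OnclickCollector and collect_onclicks are names made up for this example.

```python
from html.parser import HTMLParser

# The markup quoted in the question, used here as offline test input.
SAMPLE_HTML = """
<div class="full">
<a class="button" onclick="ChoseEvent(13362,'Whole Match',false)">Match</a>
<a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>
<a class="button" onclick="ChoseEvent(13424,'2nd Game','1462327200')">2nd Game</a>
<br><div id="toma" class="full" style="background: #444;line-height: 2.5rem;border: 1px solid #333;text-align: center;">Whole Match</div>
</div>
"""


class OnclickCollector(HTMLParser):
    """Collect onclick attributes from tags with a given name and CSS class."""

    def __init__(self, tag="a", css_class="button"):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes and attributes.get("onclick"):
            self.onclicks.append(attributes["onclick"])


def collect_onclicks(html):
    """Return the onclick values of all <a class="button"> tags in html."""
    parser = OnclickCollector()
    parser.feed(html)
    parser.close()
    return parser.onclicks


for value in collect_onclicks(SAMPLE_HTML):
    print(value)
```

If this works on the sample but not on r.text, the parsing is fine and the response simply doesn't contain the buttons.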
2 Comments
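You can also write soup.find_all('a', 'button'). Btw, make sure the keyword argument is spelled class_ with a trailing underscore, as in soup.find_all('a', class_='button'); class is a reserved word in Python, so class='button' is a syntax error.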