
When I view the page source in my browser, the HTML I'm after appears there. However, when I make a request using Python requests, the HTML doesn't appear.

The URL I'm trying to scrape is http://dota2lounge.com/match?m=13362, and the specific HTML I'm after on the page is:

<div class="full">
 <a class="button" onclick="ChoseEvent(13362,'Whole Match',false)">Match</a>
 <a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>
 <a class="button" onclick="ChoseEvent(13424,'2nd Game','1462327200')">2nd Game</a>
 <br><div id="toma" class="full" style="background: #444;line-height: 2.5rem;border: 1px solid #333;text-align: center;">Whole Match</div>
</div>

I'd like to get the 'onclick' values of the buttons. So far I've tried:

import requests
from bs4 import BeautifulSoup as bs
r = requests.get('http://dota2lounge.com/match?m=13362')  # same match page as above
soup = bs(r.content, 'lxml')
buttons = soup.find_all('a', class_='button')

This doesn't work, and inspecting r.content doesn't appear to show the HTML either.
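As a quick sanity check, the selector from the question can be run against the markup copied from the browser. This is a minimal sketch, assuming bs4 is installed; it uses Python's built-in "html.parser" instead of lxml so no extra parser is needed, and BUTTON_HTML is a hypothetical inline copy of the snippet above (the styled "toma" div is left out). If this finds the buttons while the live request does not, the markup is missing from the response, not the selector.

```python
from bs4 import BeautifulSoup

# Hypothetical inline copy of the button markup shown in the question.
BUTTON_HTML = """
<div class="full">
 <a class="button" onclick="ChoseEvent(13362,'Whole Match',false)">Match</a>
 <a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>
 <a class="button" onclick="ChoseEvent(13424,'2nd Game','1462327200')">2nd Game</a>
</div>
"""


def button_onclicks(html: str) -> list[str]:
    """Return the onclick attribute of every <a class="button"> in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("onclick") for a in soup.find_all("a", class_="button")]


# Prints the three ChoseEvent(...) handlers, one per line.
for handler in button_onclicks(BUTTON_HTML):
    print(handler)
```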

asked May 4, 2016 at 7:57
  • Try soup.find_all('a', 'button'). Btw, it sounds like you have a typo in the class param: soup.find_all('a', class='button') Commented May 4, 2016 at 8:02
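For reference, class_ (with the trailing underscore) is how BeautifulSoup accepts a CSS-class filter as a keyword argument, because class is a reserved word in Python and class='button' is a syntax error. Passing the class name as the second positional argument, as the comment suggests, is an equivalent shorthand. A small sketch comparing the two on a made-up snippet:

```python
from bs4 import BeautifulSoup

# Made-up snippet: two buttons and one plain link.
html = """
<a class="button" onclick="a()">A</a>
<a class="button wide" onclick="b()">B</a>
<a href="/elsewhere">C</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword form: class_ because `class` is a reserved word in Python.
by_keyword = soup.find_all("a", class_="button")
# Positional form: a string second argument is treated as a CSS class filter.
by_position = soup.find_all("a", "button")

# Both match any tag whose class list contains "button",
# so the multi-class <a class="button wide"> is included too.
print([a.get_text() for a in by_keyword])
print([a.get_text() for a in by_position])
```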

2 Answers


It looks like the elements you want are added by JavaScript, which isn't run when you make the request from Python. Check out this question.
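If the buttons are indeed inserted by JavaScript, one common workaround (a different technique from the one this answer goes on to suggest) is to let a real browser run the page and parse the resulting DOM. This is a minimal sketch, assuming Selenium 4 and a local Chrome are installed; the "ChoseEvent(" marker and the a.button selector come from the question's snippet, while both helper names are hypothetical. The first helper checks whether the raw requests response contains the markup at all, a cheap way to confirm the diagnosis before reaching for a browser.

```python
def missing_from_raw_html(raw_html: str, marker: str = "ChoseEvent(") -> bool:
    """True when the marker text is absent from the server's raw HTML,
    which suggests it is added client-side by JavaScript."""
    return marker not in raw_html


def rendered_html(url: str, timeout: float = 10.0) -> str:
    """Load url in headless Chrome and return the DOM after scripts ran.

    Assumes Selenium 4 and a local Chrome; imported lazily so the
    helper above works without Selenium installed.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one <a class="button"> exists in the live DOM.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, "a.button")
        )
        return driver.page_source
    finally:
        driver.quit()


# Usage (needs network access, Selenium and Chrome):
# import requests
# from bs4 import BeautifulSoup
# url = "http://dota2lounge.com/match?m=13362"
# if missing_from_raw_html(requests.get(url).text):
#     soup = BeautifulSoup(rendered_html(url), "html.parser")
#     print([a.get("onclick") for a in soup.find_all("a", class_="button")])
```

If the wait times out, Selenium raises a TimeoutException, which is a reasonable signal that the buttons never appeared even after scripts ran.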

If you're just scraping this once (i.e., you just want the data and you're not trying to build a bot to play the game for you), the quickest option is often to create a .htm file containing only links to every page you want to scrape (put each link in an <a> tag; you don't even need link text). Then you can use a tool like DownThemAll in Firefox to save a local copy of every page with the proper formatting.
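The link-page step, and parsing the saved copies afterwards, could look roughly like this. It is a minimal sketch under assumptions: the file names, the directory layout and both helper names are hypothetical, and whether the saved copies actually contain the JavaScript-generated buttons depends on how the download tool saves each page.

```python
import html
from pathlib import Path

from bs4 import BeautifulSoup


def write_link_page(urls: list[str], path: Path) -> None:
    """Write a bare .htm file with one empty <a> tag per URL for a download manager."""
    links = "\n".join(f'<a href="{html.escape(u)}"></a>' for u in urls)
    path.write_text(f"<html><body>\n{links}\n</body></html>\n", encoding="utf-8")


def onclicks_from_saved_pages(directory: Path) -> dict[str, list[str]]:
    """Map each saved .htm/.html file name to the onclick values of its buttons."""
    results = {}
    for page in sorted(directory.glob("*.htm*")):
        text = page.read_text(encoding="utf-8", errors="replace")
        soup = BeautifulSoup(text, "html.parser")
        # onclick=True keeps only buttons that actually carry the attribute.
        results[page.name] = [
            a["onclick"] for a in soup.find_all("a", class_="button", onclick=True)
        ]
    return results


# Usage (hypothetical paths):
# write_link_page(["http://dota2lounge.com/match?m=13362"], Path("links.htm"))
# ...open links.htm in Firefox and save every linked page into pages/...
# print(onclicks_from_saved_pages(Path("pages")))
```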

answered May 6, 2016 at 1:47

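Try this:

import requests
from bs4 import BeautifulSoup
r = requests.get('http://dota2lounge.com/match?m=13362')
soup = BeautifulSoup(r.text, "html.parser")
for link in soup.find_all('a'):  # every <a> tag, with or without onclick
    print(link.get('onclick'))  # None when the attribute is absent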
answered May 4, 2016 at 9:10

2 Comments

Thanks, but I tried your suggested parser and it didn't work. If I look at the text of the Requests response, I still can't see the HTML there. Is there any reason it would be rendered in my browser but not in the Python request?
I didn't find your HTML section in the page source. I tried this code on the dota2lounge.com/match?m=13362 URL and it found 2 onclick selectTeam($(this), 'a') functions there.
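Once the onclick strings are in hand (from whichever HTML actually contains them), their arguments can be pulled out with a regular expression. This is a minimal sketch: the pattern assumes the exact ChoseEvent(id,'label',false-or-'timestamp') shape shown in the question, and Event and parse_chose_event are hypothetical names. Passing onclick=True to find_all matches only tags that have that attribute, so anchors like the selectTeam ones are simply skipped by the parser instead of producing None values.

```python
import re
from typing import NamedTuple, Optional

from bs4 import BeautifulSoup

# Matches e.g. ChoseEvent(13392,'1st Game','1462327200') or ChoseEvent(13362,'Whole Match',false)
CHOSE_EVENT = re.compile(r"ChoseEvent\((\d+),'([^']*)',(?:'(\d+)'|false)\)")


class Event(NamedTuple):
    match_id: int
    label: str
    timestamp: Optional[int]  # None when the third argument is `false`


def parse_chose_event(onclick: str) -> Optional[Event]:
    """Parse one onclick string; return None if it isn't a ChoseEvent call."""
    m = CHOSE_EVENT.search(onclick)
    if m is None:
        return None
    match_id, label, ts = m.groups()
    return Event(int(match_id), label, int(ts) if ts else None)


def events_from_html(html: str) -> list[Event]:
    """Collect every parseable ChoseEvent call from <a onclick=...> tags."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for link in soup.find_all("a", onclick=True):  # only <a> tags that have onclick
        event = parse_chose_event(link["onclick"])
        if event is not None:
            events.append(event)
    return events


sample = """<a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>"""
print(events_from_html(sample))
```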
