I'm crawling the web with Selenium, and I want to get an element (such as a link) that JavaScript writes into the page after Selenium simulates a click on a fake link.
I tried get_html_source(), but it doesn't include the content written by JavaScript.
Code I've written:
def test_comment_url_fetch(self):
    sel = self.selenium
    sel.open("/rmrb")
    url = sel.get_location()
    #print url
    if url.startswith('http://login'):
        sel.open("/rmrb")
    i = 1
    while True:
        try:
            if i == 1:
                sel.click("//div[@class='WB_feed_type SW_fun S_line2']/div/div/div[3]/div/a[4]")
                print "click"
            else:
                XPath = "//div[@class='WB_feed_type SW_fun S_line2'][%d]/div/div/div[3]/div/a[4]"%i
                sel.click(XPath)
                print "click"
        except Exception, e:
            print e
            break
        i += 1
    html = sel.get_html_source()
    html_file = open("tmp\\foo.html", 'w')
    html_file.write(html.encode('utf-8'))
    html_file.close()
I use a while loop to click a series of fake links that trigger JavaScript actions to show extra content, and that content is what I want. But sel.get_html_source() doesn't return it.
Can anybody help? Thanks a lot.
Please include the code you've already written, and indicate which part of it is causing a problem for you. – Simon MᶜKenzie, Apr 18, 2013 at 2:58
3 Answers
Since I usually post-process the fetched nodes, I run JavaScript directly in the browser with execute_script. For example, to get all a tags:
js_code = "return document.getElementsByTagName('a')"
your_elements = sel.execute_script(js_code)
Edit: execute_script and get_eval are equivalent, except that get_eval performs an implicit return, whereas in execute_script the return has to be stated explicitly.
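As a rough illustration of the explicit return, here is a minimal sketch written for Python 3 (the question's code is Python 2). It assumes `driver` is a Selenium WebDriver instance; the helper names `as_script` and `count_links` are made up for this example and are not part of Selenium.

```python
def as_script(expression):
    """Turn a bare JavaScript expression into a script body with an explicit return.

    Hypothetical helper for illustration: execute_script runs the string as a
    function body, so without "return" the Python side receives None.
    """
    return "return " + expression + ";"


def count_links(driver):
    """Ask the browser how many <a> elements the live DOM currently holds.

    `driver` is assumed to be a Selenium WebDriver instance (or any object
    exposing execute_script with the same meaning).
    """
    return driver.execute_script(
        as_script("document.getElementsByTagName('a').length")
    )
```

Because the script runs inside the browser, the count reflects the DOM at that moment, including any nodes that earlier JavaScript has added.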
1 Comment
sel.get_eval(js_code). And I found this Question
Can't you just call the browser object inside your Selenium environment? For example:
self.browser.find_elements_by_tag_name("div")
This should return a list of div elements. You can also find elements by class, id, and so on.
Edit: Below is the code to create your 'browser' object.
from selenium import webdriver
self.browser = webdriver.Firefox()  # The browser object. I use Firefox, but Chrome, IE, and Safari work too, I believe.
Then you should be able to call find_elements_by_tag_name as shown above.
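Content written by JavaScript may appear a moment after the click, so it can help to wait for it before querying. The following is a minimal Python 3 sketch, assuming `driver` is a WebDriver instance such as the one created above. It uses the generic `find_elements(By.TAG_NAME, ...)` form because newer Selenium releases dropped the `find_elements_by_*` helpers. The function names `collect_hrefs` and `links_after_javascript` and the 10-second timeout are assumptions for illustration.

```python
def collect_hrefs(elements):
    """Return the non-empty href attribute of each element-like object."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs


def links_after_javascript(driver, timeout=10):
    """Wait until at least one <a> is present in the live DOM, then list hrefs.

    The Selenium imports are deferred to call time so that collect_hrefs can
    be used on its own without Selenium installed.
    """
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.TAG_NAME, "a")
    # Raises TimeoutException if no matching element shows up in time.
    WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )
    return collect_hrefs(driver.find_elements(*locator))
```

In a real crawl the locator would probably be narrower than every a tag, for example an XPath that targets the block revealed by the click.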
You need a browser engine that can execute JavaScript, such as PhantomJS. Changes made by JavaScript are only visible to clients that can execute JavaScript and provide a DOM and runtime in which events can fire.
Also closely related: Executing Javascript from Python
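To illustrate why a client without a JavaScript engine misses such content, here is a small Python 3 sketch that uses only the standard library. The page markup in `PAGE` and the names `LinkCollector` and `static_links` are invented for this example. The parser reads the script as plain text and never runs it, so the link the script would add does not show up.

```python
from html.parser import HTMLParser

# Made-up page: one link is in the markup, a second one would only
# exist after a browser runs the script.
PAGE = """<html><body>
<a href="/static">static link</a>
<script>
var a = document.createElement('a');
a.href = '/dynamic';
document.body.appendChild(a);
</script>
</body></html>"""


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags found in the raw markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def static_links(html):
    """Return the links visible to a client that does not execute JavaScript."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    print(static_links(PAGE))  # ['/static']
```

A browser-backed driver, by contrast, would expose both links once the script has run.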