Scraping a JavaScript rendered page

Question 1

I want to extract some data from a JavaScript-rendered page using Selenium WebDriver in Python 3. I have tried several drivers (Firefox, ChromeDriver, and PhantomJS), but I always get the same result: instead of the rendered DOM elements, I only get the scripts.

Here is the snippet of my code

from selenium import webdriver

url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018年01月09日'
driver = webdriver.Chrome("/var/chromedriver/chromedriver")
driver.implicitly_wait(20)
driver.get(url)
print(driver.page_source)

Am I missing something here?

Question 2

Do you get an error message? If so, please add the traceback to your post.

Question 3

There is no error message when I run that code. It just gives me an unexpected result.

Question 4

I don't see any such issue in your code block. I tried your own script as follows:

from selenium import webdriver
url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018年01月09日'
driver = webdriver.Chrome()
driver.get(url)
print(driver.page_source)

I get the following console output:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
 <meta name="deals::gwt:property" content="baseUrl=/flights/explore//static/" />
 <title>Explore flights</title>
 <meta name="description" content="Explore flights" />
 <script src="https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.yoTdpQipo6s.O/m=gapi_iframes,googleapis_client,plusone/rt=j/sv=1/d=1/ed=1/am=AAE/rs=AHpOoo9_VhuRoUovwpPPf5LqLZd-dmCnxw/cb=gapi.loaded_0" async=""></script>
 <script language="javascript" type="text/javascript">
 var __JS_ILT__ = new Date();
 .
 .
 .
 </div></div><div aria-hidden="true" style="display: none;"><div class="CTPFVNB-l-j CTPFVNB-l-h">Displayed currencies may differ from the currencies used to purchase flights. – <a href="https://www.google.com/intl/en/googlefinance/disclaimer/" class="CTPFVNB-l-k">Disclaimer</a></div></div>
 <div aria-hidden="true" style="display: none;"><div class="CTPFVNB-l-j CTPFVNB-l-h">Showing licensed rail data. – <a href="https://www.google.com/intl/en/help/legalnotices_maps.html" class="CTPFVNB-l-k">Legal Notice</a></div></div>
 <div class="CTPFVNB-l-i"><a class="CTPFVNB-l-k CTPFVNB-l-j" href="https://www.google.com/intl/en/policies/">Privacy &amp; Terms</a><a class="CTPFVNB-l-k CTPFVNB-l-j" href="https://support.google.com/flights/?hl=en">Help Center</a></div></div></div><iframe id="deals" tabindex="-1" style="position: absolute; width: 0px; height: 0px; border: none; left: -1000px; top: -1000px;">
</iframe><input type="text" id="_bgInput" style="display:none;" /></body></html>

Now, as you can see at the very end of the page_source, there is an iframe. Until you switch to that iframe, you won't be able to find the DOM element you are looking for.
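To check for iframes programmatically instead of reading the dump by eye, something like the following sketch could work. It uses only the standard library's html.parser; the names IframeLister and list_iframes are made up for this illustration and are not part of Selenium. It only reports which iframe tags appear in a page_source string; it does not tell you whether the data you want actually lives inside them.

```python
from html.parser import HTMLParser


class IframeLister(HTMLParser):
    """Collect the attributes of every <iframe> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            # attrs is a list of (name, value) pairs; store it as a dict.
            self.iframes.append(dict(attrs))


def list_iframes(page_source):
    """Return one attribute dict per <iframe> found in the HTML string."""
    parser = IframeLister()
    parser.feed(page_source)
    parser.close()
    return parser.iframes
```

Calling list_iframes(driver.page_source) on the output above should report a single frame with id "deals". Note that this frame has no src attribute and a size of 0px by 0px, so it may well be a helper frame rather than the place where the results are rendered; that is a guess from the markup, not something the thread confirms.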

Question 5

Thanks for your explanation. The problem, however, is that the output of page_source differs from what I see when I inspect the page. For example, I want to extract all the available prices. When I try to parse them from page_source, I get nothing, because the prices are not included there. In the inspector, the prices appear outside the iframe tag.
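One possible reason for this mismatch is timing: implicitly_wait only affects element lookups such as find_element, not page_source, so the source can be read before the scripts have finished rendering. A minimal sketch of waiting for the elements themselves follows, assuming a CSS selector that matches the price elements (the selector is hypothetical and has to be taken from the inspector). The helper wait_for_texts is made up for this example; it is a hand-rolled loop in the spirit of Selenium's WebDriverWait, written so that it depends only on the driver's find_elements method.

```python
import time


def wait_for_texts(driver, css_selector, timeout=20.0, poll=0.5):
    """Poll until elements matching css_selector exist, then return their text.

    Raises TimeoutError if nothing matches within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the locator strategy string behind By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", css_selector)
        if elements:
            return [element.text for element in elements]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css_selector!r}")
        time.sleep(poll)
```

With a real driver this would be called after driver.get(url), for example wait_for_texts(driver, ".price"), where ".price" stands in for whatever selector the page really uses. In production code, WebDriverWait together with expected_conditions is the more usual way to express the same wait.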

Question 6

Yes, you are right. Switch to the iframe and take page_source there; you will find it all. I didn't see any mention of prices in the question. Feel free to raise a new question for your new requirement. If my answer has addressed your question, please accept it.
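For reference, switching into a frame and reading its source might look like the sketch below. The frame id "deals" is taken from the console output above; the function names page_source_in_frame and dump_deals_frame are invented for this example. Whether that frame really contains the prices is not confirmed anywhere in this thread, so treat this as a way to check rather than a guaranteed fix.

```python
def page_source_in_frame(driver, frame_reference):
    """Return the HTML of the document inside the given frame.

    `driver` is a Selenium WebDriver; `frame_reference` may be a frame
    id or name, an index, or a WebElement, as accepted by switch_to.frame().
    """
    driver.switch_to.frame(frame_reference)
    try:
        # While a frame is selected, page_source is that frame's document.
        return driver.page_source
    finally:
        # Always go back to the top-level document, even on error.
        driver.switch_to.default_content()


def dump_deals_frame(url):
    """Hypothetical driver script: open `url` and print the 'deals' frame."""
    # Imported here so the helper above can be tried without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        print(page_source_in_frame(driver, "deals"))
    finally:
        driver.quit()
```

Calling dump_deals_frame(url) with the URL from the question would print the frame's document. The try/finally around the switch keeps later lookups from silently running inside the frame.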

Question 7

Use helium, a Selenium wrapper:

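# pip install helium
import time

import helium

url_one = "https://www.vbiz.in/nseoptionchain.html"
# start_chrome returns the underlying Selenium WebDriver
browser_one = helium.start_chrome(url_one, headless=True)
seconds = 5
time.sleep(seconds)  # crude fixed wait to give the page's JavaScript time to run
html = browser_one.page_source
browser_one.quit()  # quit() also ends the driver process; close() only closes the current window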

Accepted Answer (the reply that suggests switching to the iframe) · undetected Selenium · 2018-01-02 11:37:13Z
