I want to extract some data from a JavaScript-rendered page using Selenium WebDriver in Python 3. I have tried several drivers, such as Firefox, ChromeDriver, and PhantomJS, but always get the same result: instead of the rendered DOM elements, I only get the scripts.

Here is a snippet of my code:

from selenium import webdriver

url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018年01月09日'
driver = webdriver.Chrome("/var/chromedriver/chromedriver")
# The implicit wait applies to element lookups, not to page_source.
driver.implicitly_wait(20)
driver.get(url)
print(driver.page_source)

Am I missing something here?

Cerbrus
asked Jan 2, 2018 at 11:17
  • Do you get an error message? If so, add the traceback to your post. Commented Jan 2, 2018 at 11:33
  • There is no error message when I run the code. It just gives me an unexpected result. Commented Jan 3, 2018 at 10:40

2 Answers

I don't see any issues in your code block. I tried your script as follows:

from selenium import webdriver
url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018年01月09日'
driver = webdriver.Chrome()
driver.get(url)
print(driver.page_source)

I get the following console output:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
 <meta name="deals::gwt:property" content="baseUrl=/flights/explore//static/" />
 <title>Explore flights</title>
 <meta name="description" content="Explore flights" />
 <script src="https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.yoTdpQipo6s.O/m=gapi_iframes,googleapis_client,plusone/rt=j/sv=1/d=1/ed=1/am=AAE/rs=AHpOoo9_VhuRoUovwpPPf5LqLZd-dmCnxw/cb=gapi.loaded_0" async=""></script>
 <script language="javascript" type="text/javascript">
 var __JS_ILT__ = new Date();
 .
 .
 .
 </div></div><div aria-hidden="true" style="display: none;"><div class="CTPFVNB-l-j CTPFVNB-l-h">Displayed currencies may differ from the currencies used to purchase flights.– <a href="https://www.google.com/intl/en/googlefinance/disclaimer/" class="CTPFVNB-l-k">Disclaimer</a></div></div>
 <div aria-hidden="true" style="display: none;"><div class="CTPFVNB-l-j CTPFVNB-l-h">Showing licensed rail data. – <a href="https://www.google.com/intl/en/help/legalnotices_maps.html" class="CTPFVNB-l-k">Legal Notice</a></div></div>
 <div class="CTPFVNB-l-i"><a class="CTPFVNB-l-k CTPFVNB-l-j" href="https://www.google.com/intl/en/policies/">Privacy &amp; Terms</a><a class="CTPFVNB-l-k CTPFVNB-l-j" href="https://support.google.com/flights/?hl=en">Help Center</a></div></div></div><iframe id="deals" tabindex="-1" style="position: absolute; width: 0px; height: 0px; border: none; left: -1000px; top: -1000px;">
</iframe><input type="text" id="_bgInput" style="display:none;" /></body></html>

As you can see, there is an iframe at the very end of the page_source. Until you switch to that iframe, you won't be able to find the DOM elements you are looking for.
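Switching to the iframe could look roughly like the sketch below. It assumes the frame can be addressed by the id "deals" seen in the output above; the helper name frame_page_source and the demo function are made up for illustration, and whether this particular frame actually holds the data you want is not something this sketch verifies.

```python
def frame_page_source(driver, frame_reference):
    """Return the HTML of one frame, then switch back to the top-level page."""
    # switch_to.frame accepts a frame's id or name, its index, or a WebElement.
    driver.switch_to.frame(frame_reference)
    try:
        # While a frame is selected, page_source refers to that frame.
        return driver.page_source
    finally:
        # Always restore the top-level context so that later lookups are
        # not silently scoped to the frame.
        driver.switch_to.default_content()


def demo(url):
    """Hypothetical usage against a real browser; not called automatically."""
    from selenium import webdriver  # needs selenium and a Chrome driver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        print(frame_page_source(driver, "deals"))
    finally:
        driver.quit()
```

Calling demo(url) with the URL from the question would print the frame's source instead of the top-level document's.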

answered Jan 2, 2018 at 11:37

2 Comments

Thanks for your explanation. The problem is that the output of page_source differs from what I see when I inspect the page. For example, I want to collect all the available prices. When I try to parse them from page_source, I get nothing because the prices are not included there. In the inspector, the prices exist outside the iframe tag.
Yes, you are right. Switch to the iframe and take the page_source, and you will find it all. I didn't see any mention of prices in the question. Feel free to raise a new question for your new requirement. If my answer has addressed your question, please accept it.
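On the mismatch raised in these comments between page_source and the inspector: implicitly_wait only delays element lookups, so page_source can be read before the page's scripts have finished rendering. One possible workaround, sketched below, is to poll the live DOM until some expected content appears. This is a hand-rolled stand-in for Selenium's own WebDriverWait, not the approach from the answer, and the helper names and the is_ready predicate are hypothetical.

```python
import time


def rendered_html(driver):
    """Serialize the DOM as it exists right now, after scripts have run."""
    return driver.execute_script("return document.documentElement.outerHTML")


def wait_for_html(driver, is_ready, timeout=20.0, poll=0.5):
    """Poll the live DOM until is_ready(html) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        html = rendered_html(driver)
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout} seconds")
        # Sleep briefly so the loop does not hammer the browser.
        time.sleep(poll)
```

With a real driver this might be called as wait_for_html(driver, lambda html: "some-price-marker" in html), where the marker string is a placeholder for whatever text or class name identifies the prices on the rendered page.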
Use helium, a Selenium wrapper:

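# pip install helium
import time

import helium

url_one = "https://www.vbiz.in/nseoptionchain.html"
# Start a headless Chrome session and load the page.
browser_one = helium.start_chrome(url_one, headless=True)
# Give the page's JavaScript a fixed five seconds to render.
seconds = 5
time.sleep(seconds)
html = browser_one.page_source
browser_one.close()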
Thlbaut
answered Oct 5, 2021 at 13:50
