I want to open an XML web page and, once it has loaded,
save the XML content displayed in the page as a file named
file1.xml.
What I tried:
    from selenium import webdriver
    import os
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys

    chromedriver = "/home/dipankar/Desktop/chromedriver"
    os.environ["webdriver.chrome.driver"] = chromedriver
    driver = webdriver.Chrome(chromedriver)
    #driver = webdriver.Firefox()
    driver.get("http://www.example.com")

    saveas = ActionChains(driver).key_down(Keys.CONTROL).send_keys('S').key_up(Keys.CONTROL)
    saveas.perform()
I also tried the following code, but it downloads the HTML
tags as well. I want to download only the content displayed in the web page, not the page source.
    content = driver.page_source
    print(content)
I have attached a screenshot of the sample web page.
1 Answer
Don't use a browser
Based on what you've said, the page is already in XML format, so download the content directly.
This is an assumption, but if the XML is designed to be shown on the web, there is likely to be an XSLT attached to it which will insert extra markup code to make it browser-friendly (which would explain why you're seeing HTML tags also). It is also possible that your browser has an extension doing a similar modification.
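One way to test this assumption is to look for an `xml-stylesheet` processing instruction at the top of the raw XML. The following is only a sketch in Python 3 using the standard library's `xml.dom.minidom`; the function name `find_stylesheets` and the sample document are made up for illustration and do not come from the question.

```python
from xml.dom import Node, minidom


def find_stylesheets(xml_text):
    """Return the data of every top-level <?xml-stylesheet ...?> instruction."""
    doc = minidom.parseString(xml_text)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


if __name__ == "__main__":
    # Hypothetical sample; substitute the XML you actually downloaded.
    sample = (
        '<?xml version="1.0"?>'
        '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
        "<root/>"
    )
    print(find_stylesheets(sample))
```

An empty list would suggest that no stylesheet is attached, in which case the extra HTML is more likely to come from the browser itself or from an extension.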
So the question comes down to which is more important: the simplicity of downloading the XML directly, or the requirement to open the page in a browser, with the added complexity of loading and filtering the results?
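If the direct download is acceptable, a minimal sketch in Python 3 could look like the following. It uses only the standard library's `urllib.request`; the helper name `save_xml` is invented here, the URL is the placeholder from the question, and `file1.xml` is the file name the question asks for. It assumes the server returns the XML without requiring a login or cookies.

```python
import shutil
import urllib.request


def save_xml(url, path):
    """Fetch url and write the response body to path unchanged."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # Copy the raw bytes so the XML is saved exactly as served.
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Placeholder URL from the question; replace it with the real XML address.
    save_xml("http://www.example.com", "file1.xml")
```

Because the bytes are written as received, no browser-generated markup can end up in the saved file.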
Without using a browser, I downloaded the XML file from the terminal with the Linux command wget, but my task is to open the file in a browser and then download it. – Dipankar Nalui, May 3, 2018 at 2:55
wget is available in Python. Thanks.
driver.page_source gives the correct XML content in Firefox, but the same code gives HTML tags in Chrome.