How to save a webpage as an XML file using Python and Selenium?

Question 1

I want to open an XML webpage and, once it has loaded, save the XML content displayed on the page as file1.xml.

What I tried:

import os

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

# Path to the local chromedriver binary
chromedriver = "/home/dipankar/Desktop/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
# driver = webdriver.Firefox()
driver.get("http://www.example.com")

# Send Ctrl+S to the page to trigger the browser's "Save as" action
saveas = ActionChains(driver).key_down(Keys.CONTROL).send_keys('S').key_up(Keys.CONTROL)
saveas.perform()

I also tried the following code, but it downloads the HTML tags as well. I want to download only the content displayed on the web page, not the page source.

# page_source returns the document as the browser has rendered it
content = driver.page_source
print(content)

Here is a screenshot of the sample webpage:

[Screenshot: XML webpage]

Question 2

crummy.com/software/BeautifulSoup/bs4/doc

Question 3

If the page you want is XML, and you want to save it as XML, why are you bothering with Selenium? Use curl, wget, wdownload, or any number of other tools made for downloading content, and just save it directly.

Question 4

I did not know wget was available in Python. Thanks.

Question 5

But I want to open the page in a browser. wget will not open it in a browser.

Question 6

driver.page_source gives the correct XML content in Firefox, but the same code gives HTML tags in Chrome.

Answer 1 · MivaScott · 2018-05-01 16:32:34Z

Don't use a browser

Based on what you've said, the page is already in XML format, so download the content directly.
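A minimal sketch of downloading the content directly from Python, using only the standard library. The function name `download_xml` is made up for illustration, and the sketch assumes the URL can be fetched without a login or browser session:

```python
import shutil
from urllib.request import urlopen


def download_xml(url, path="file1.xml"):
    """Fetch `url` and write the raw response bytes to `path`.

    Hypothetical helper: assumes the server returns the XML document
    itself and does not require cookies or authentication.
    """
    with urlopen(url) as response, open(path, "wb") as out:
        # Stream the body to disk unchanged, so no markup is added or removed.
        shutil.copyfileobj(response, out)
    return path


# Example call (replace the URL with the real XML page):
# download_xml("http://www.example.com", "file1.xml")
```

Because the bytes are written as received, the saved file is whatever the server sent, not the browser's rendered version of it.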

This is an assumption, but if the XML is designed to be shown on the web, there is likely to be an XSLT attached to it which will insert extra markup code to make it browser-friendly (which would explain why you're seeing HTML tags also). It is also possible that your browser has an extension doing a similar modification.
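One way to check the XSLT guess is to look for an `xml-stylesheet` processing instruction in the raw XML. The following is only a sketch: `find_stylesheets` and the sample document are invented for illustration, and it inspects only the top level of the document, where such instructions normally appear:

```python
from xml.dom import minidom


def find_stylesheets(xml_text):
    """Return the data of each top-level xml-stylesheet processing instruction."""
    doc = minidom.parseString(xml_text)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


# Made-up sample document that references a stylesheet.
sample = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<root><item>1</item></root>"
)
print(find_stylesheets(sample))
```

An empty list suggests the extra HTML comes from somewhere else, such as the browser's own XML viewer or an extension.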

So the question comes down to which is more important: the simplicity of downloading the XML directly, or "but i want to open the page in browser" and the complexity of loading and filtering the results?

Without using a browser, I downloaded the XML file in the terminal with the Linux command wget. But my task is to open the file in a browser and then download it.
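If the page has to be opened in a browser first, one possible compromise is to let Selenium open it and then fetch the same URL separately, so the saved file is the raw XML rather than the browser's rendering. This is a sketch under assumptions: `save_current_page` is a hypothetical helper, and it only works if the URL can be fetched again without the browser's cookies or session.

```python
from urllib.request import urlopen


def save_current_page(driver, path="file1.xml"):
    """Save the raw bytes behind the page the driver is currently showing.

    `driver` is any object with a `current_url` attribute, such as a
    Selenium WebDriver. The URL is requested a second time outside the
    browser, so the browser's session state is not reused.
    """
    with urlopen(driver.current_url) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)
    return len(data)


# Intended use with Selenium (not run here):
#   driver.get("http://www.example.com")
#   save_current_page(driver, "file1.xml")
```

The browser still opens and displays the page, while the file on disk avoids the differences between what Firefox and Chrome return from `page_source`.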
