I've written a script in Python, in combination with Selenium, to harvest all the data from a table on a webpage. It takes a while to parse everything. There are seven steps to get through before reaching the target page. The search term for the table is "pump". When the table shows up, there is a dropdown at the very bottom of the page with an "All" option; after selecting "All", the site displays the full table. This script automates the whole procedure. I tried to make my code faster by using explicit waits, following Selenium's guidelines. It is doing its job perfectly now. Here is the working code:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)  # explicit wait: poll for up to 10 seconds
driver.get('http://apps.tga.gov.au/Prod/devices/daen-entry.aspx')

# Accept the disclaimer, then search for "pump"
driver.find_element(By.ID, 'disclaimer-accept').click()
wait.until(EC.visibility_of_element_located((By.ID, "medicine-name")))
driver.find_element(By.ID, 'medicine-name').send_keys('pump')
wait.until(EC.visibility_of_element_located((By.ID, "medicines-header-text")))
driver.find_element(By.ID, 'medicines-header-text').click()
driver.find_element(By.ID, 'submit-button').click()

# Switch the results dropdown to "All" so the whole table is rendered
wait.until(EC.visibility_of_element_located((By.ID, "ctl00_body_MedicineSummaryControl_cmbPageSelection")))
driver.find_element(By.ID, "ctl00_body_MedicineSummaryControl_cmbPageSelection").click()
driver.find_element(By.XPATH, '//option[@value="all"]').click()

# Wait for the full table, then collect the text of every cell, row by row
wait.until(EC.visibility_of_element_located((By.ID, "ctl00_body_MedicineSummaryControl_grdSummary")))
tab_data = driver.find_element(By.ID, "ctl00_body_MedicineSummaryControl_grdSummary")
list_rows = []
for items in tab_data.find_elements(By.XPATH, './/tr'):
    list_cells = []
    for item in items.find_elements(By.XPATH, './/td[@class="row-odd"]|.//td'):
        list_cells.append(item.text)
    list_rows.append(list_cells)

for data in list_rows:
    print(data)
driver.quit()
```
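To show what the explicit waits above are doing, here is a simplified, stdlib-only model of an explicit wait: it calls a condition repeatedly until it returns something truthy, then returns that value, or gives up after a timeout. This is a sketch for illustration, not Selenium's source; for instance, it does not model Selenium's list of ignored exceptions. The names `wait_until` and `find_table` are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value, then return that value.

    Simplified model of an explicit wait (not Selenium's implementation):
    raises TimeoutError if nothing truthy shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand the found value back to the caller
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # back off before checking again


# Hypothetical stand-in "page": the element only appears on the third check.
attempts = {"n": 0}


def find_table():
    attempts["n"] += 1
    return "<table element>" if attempts["n"] >= 3 else None


table = wait_until(find_table, timeout=5, poll=0.01)
print(table, "after", attempts["n"], "checks")  # <table element> after 3 checks
```

The useful consequence is that the wait hands back what it found. In Selenium, `EC.visibility_of_element_located` returns the element once it is visible, so a step such as `wait.until(EC.visibility_of_element_located((By.ID, "medicine-name"))).send_keys('pump')` can reuse that element instead of looking it up a second time with `find_element`.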
Your code is getting better from script to script, and there are fewer things to point out. I would improve only a couple of things:
- The way you get the data: I think you can use the "by tag name" locators with a nested list comprehension:

  ```python
  list_rows = [[cell.text for cell in row.find_elements(By.TAG_NAME, 'td')]
               for row in tab_data.find_elements(By.TAG_NAME, 'tr')]
  ```

- You can use the `Select` class to pick an option from a `<select>` dropdown:

  ```python
  from selenium.webdriver.support.select import Select

  results_count = Select(driver.find_element(By.ID, "ctl00_body_MedicineSummaryControl_cmbPageSelection"))
  results_count.select_by_visible_text("All")
  ```
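The nested list comprehension only relies on `find_elements` and `.text`, so it can be wrapped in a function and checked without a browser. The sketch below assumes that; `FakeElement`, `extract_rows`, and the sample rows are hypothetical stand-ins, not data from the TGA site. The string `"tag name"` is the value of Selenium's `By.TAG_NAME` constant, so a real `WebElement` would accept the same call. The optional `skip_empty` flag drops rows with no `td` cells (for example, a header row built from `th` cells), which would otherwise show up as empty lists.

```python
# Same value as selenium's By.TAG_NAME, so real WebElements accept it too.
TAG_NAME = "tag name"


class FakeElement:
    """Hypothetical stand-in for a WebElement: has .text and find_elements()."""

    def __init__(self, tag, text="", children=()):
        self.tag = tag
        self.text = text
        self.children = list(children)

    def find_elements(self, by, value):
        # Only the tag-name strategy is modelled; descendants in document order.
        if by != TAG_NAME:
            raise ValueError(f"unsupported locator strategy: {by!r}")
        found = []
        for child in self.children:
            if child.tag == value:
                found.append(child)
            found.extend(child.find_elements(by, value))
        return found


def extract_rows(table, skip_empty=True):
    """The answer's nested comprehension, optionally dropping rows with no <td>."""
    rows = [[cell.text for cell in row.find_elements(TAG_NAME, "td")]
            for row in table.find_elements(TAG_NAME, "tr")]
    return [row for row in rows if row] if skip_empty else rows


# Hypothetical sample table: a <th> header row followed by two data rows.
table = FakeElement("table", children=[
    FakeElement("tr", children=[FakeElement("th", "Name"), FakeElement("th", "Count")]),
    FakeElement("tr", children=[FakeElement("td", "Pump A"), FakeElement("td", "3")]),
    FakeElement("tr", children=[FakeElement("td", "Pump B"), FakeElement("td", "5")]),
])

for data in extract_rows(table):
    print(data)
# ['Pump A', '3']
# ['Pump B', '5']
```

With a live driver, the same function would be called as `extract_rows(tab_data)`.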
Thanks, sir alecxe, for your invaluable comments and suggestions. I'll try to keep pace with the guidelines you have given me through your detailed review. Thanks a zillion. (SIM, Jul 10, 2017 at 13:30)
@SMth80 I see what you mean, but nope, this is not possible in Python. There are context managers, but they serve a different purpose. Thanks. (alecxe, Jul 10, 2017 at 19:32)