I've written some code in Python in combination with Selenium to collect photo shoot spaces in different locations in Paris.
My scraper is harvesting the names successfully at the moment. However, is there a better way to do this? The code looks repetitive.
Any input to improve this script will be highly appreciated. Here is what I've written:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.peerspace.com/")
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.ID, "activity-input"))).click()
wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'ellipsis')]/a"))).click()
wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'col-xs-12')]/li/a[@data-name='Photo Shoot']"))).click()
wait.until(EC.visibility_of_element_located((By.ID, "searchbar-input"))).send_keys("Paris")
wait.until(EC.visibility_of_element_located((By.ID,"searchbar-submit-button"))).click()
for items in wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='col-xs-12 ']/h6[contains(@class,'title')]"))):
    print(items.text)
driver.quit()
The input boxes to fill in before pressing the search button are:
First one: Photo Shoot
Second one: Paris
Overall, it looks clean, but here are some potential improvements:
- XPath, generally, doesn't handle multi-valued class attributes well: to match a single class reliably you would have to work around it with concat(). A better way is a CSS selector; here are all 3 XPaths replaced with corresponding CSS selectors:

    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".ellipsis > a"))).click()
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "li > a[data-name='Photo Shoot']"))).click()

    # items
    wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h6.title")))
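- avoid using layout-oriented classes like col-xs-12 in your locators; they describe the grid layout rather than the content, so they can change whenever the layout does
- I would rename items to title, since each loop variable holds a single title element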
- since the expected condition names are lengthy, you could import them under more concise aliases, e.g.:

    from selenium.webdriver.support.expected_conditions import visibility_of_element_located as is_visible
    from selenium.webdriver.support.expected_conditions import presence_of_all_elements_located as all_present
- it might be a good idea to use try/finally so that the browser is quit even if the script fails (for example, when a wait times out)
- you can also submit the search by appending a newline character (\n) to the search input text; this removes the need to locate the submit button and click it:

    wait.until(is_visible((By.ID, "searchbar-input"))).send_keys("Paris\n")
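For reference, this is roughly what the concat() workaround mentioned in the first point looks like. has_class_xpath is a hypothetical helper, not part of Selenium: it pads the normalized class attribute with spaces on both sides so that only whole class tokens match, which is what a CSS class selector such as h6.title does for free.

```python
def has_class_xpath(tag: str, class_name: str) -> str:
    """Build an XPath 1.0 expression matching <tag> elements that carry class_name as a whole class token.

    Assumes class_name contains no quote characters.
    """
    # normalize-space() collapses runs of whitespace (e.g. the trailing space in "col-xs-12 "),
    # and padding both sides with ' ' lets ' title ' match "title" or "card title" but not "subtitle".
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {class_name} ')]"
    )


# Usage with Selenium (By.XPATH instead of By.CSS_SELECTOR):
# wait.until(EC.presence_of_all_elements_located((By.XPATH, has_class_xpath("h6", "title"))))
```

The plain contains(@class, 'title') from the question would also match a class like "subtitle", which is why the padding is needed when sticking with XPath.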
With all of the above changes applied:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import visibility_of_element_located as is_visible
from selenium.webdriver.support.expected_conditions import presence_of_all_elements_located as all_present
driver = webdriver.Chrome()
try:
    driver.get("https://www.peerspace.com/")
    driver.maximize_window()
    wait = WebDriverWait(driver, 10)
    wait.until(is_visible((By.ID, "activity-input"))).click()
    wait.until(is_visible((By.CSS_SELECTOR, ".ellipsis > a"))).click()
    wait.until(is_visible((By.CSS_SELECTOR, "li > a[data-name='Photo Shoot']"))).click()
    wait.until(is_visible((By.ID, "searchbar-input"))).send_keys("Paris\n")
    titles = wait.until(all_present((By.CSS_SELECTOR, ".title")))
    for title in titles:
        print(title.text)
finally:
    driver.quit()
We could follow the DRY principle even further and extract the repeated wait.until(is_visible(...)) call into a separate function.
- You are just awesome, sir alecxe. Thanks a zillion. (SIM, Aug 19, 2017 at 15:01)
- One question on this, sir. How did you know about the trailing space after "Paris"? Thanks. (SIM, Aug 19, 2017 at 15:20)
- @Shahin ah, you mean the newline character, right? This is just a common way to submit a form if it is submittable by "enter" in a search field. Thanks. (alecxe, Aug 19, 2017 at 16:05)