I've written a Python script that uses Selenium to harvest coffee shop names from Yellow Pages. Although the page is not JavaScript-rendered, I used Selenium as an experiment, particularly to learn how to handle multiple pages without clicking the next button.

When I run it, the script works flawlessly and parses the names from each page, but it does so very slowly. Is there any way to make it faster while staying within Selenium?

Here is the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

for page_num in range(1, 3):
    driver.get('https://www.yellowpages.com/search?search_terms=pizza&geo_location_terms=San%20Francisco%2C%20CA&page={0}'.format(page_num))
    for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.info"))):
        try:
            name = item.find_element_by_css_selector('a.business-name span[itemprop=name]').text
        except:
            name = ''
        print(name)

driver.quit()
alecxe
asked Aug 21, 2017 at 14:39

1 Answer


The code is pretty much straightforward and understandable, but I would still work on the following stylistic and readability issues:

  • extract constants. It might be a good idea to pull the URL template and the maximum page number out into separate constants, or to pass them in as arguments of a function
  • put your main execution logic into an if __name__ == '__main__': block so that it is not executed when the module is imported
  • avoid bare except clauses. Be specific about the exceptions you handle; in this case, NoSuchElementException is the right exception to catch when an item's name is not found
  • use try/finally to make sure the driver is closed even if scraping fails partway through; this eliminates "ghost" browser windows left over from failed scraping runs

All things applied:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def scrape(url, max_page_number):
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)

    try:
        for page_number in range(1, max_page_number + 1):
            driver.get(url.format(page_number))
            for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".info"))):
                try:
                    name = item.find_element_by_css_selector('a.business-name span[itemprop=name]').text
                except NoSuchElementException:
                    name = ''
                print(name)
    finally:
        driver.quit()


if __name__ == '__main__':
    url_template = 'https://www.yellowpages.com/search?search_terms=pizza&geo_location_terms=San%20Francisco%2C%20CA&page={0}'
    max_page_number = 2

    scrape(url_template, max_page_number)
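
As a side note on the speed concern from the question: since the page is not JavaScript-rendered, you could skip the browser entirely, fetch the HTML with requests and parse it with BeautifulSoup. The sketch below is only an illustration under that assumption; the scrape_without_browser helper is a name I've made up, and the selectors mirror the ones used above, so they may need adjusting if the page markup changes.

import requests
from bs4 import BeautifulSoup


def scrape_without_browser(url, max_page_number):
    with requests.Session() as session:
        # Some sites serve different markup to (or block) the default requests User-Agent.
        session.headers['User-Agent'] = 'Mozilla/5.0'
        for page_number in range(1, max_page_number + 1):
            response = session.get(url.format(page_number))
            soup = BeautifulSoup(response.text, 'html.parser')
            # Same structure as the Selenium version: one "div.info" block per listing.
            for item in soup.select('div.info'):
                name_element = item.select_one('a.business-name span[itemprop=name]')
                print(name_element.get_text(strip=True) if name_element else '')


if __name__ == '__main__':
    url_template = 'https://www.yellowpages.com/search?search_terms=pizza&geo_location_terms=San%20Francisco%2C%20CA&page={0}'
    scrape_without_browser(url_template, 2)

Each plain HTTP request avoids starting and driving a full browser, which is typically where most of the time goes in a Selenium-based scraper.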
answered Aug 21, 2017 at 14:50