My issue is the following:
I am trying to retrieve information from this website: https://www.cetelem.es/.
I want to do several things:
Click the two slider buttons to change the values.
Retrieve the information after the sliders have changed.
Add a condition: only retrieve the information when the TIN and TAE change.
I tried the following code on Google Colab:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--start-maximized')
webdriver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
url = "https://www.cetelem.es/"
webdriver.get(url)
webdriver.find_element_by_class_name("bar-slider importe").send_keys("20.000")
webdriver.find_element_by_class_name("bar-slider messes").send_keys("30")
webdriver.save_screenshot('sreenshot.png')
print(webdriver.find_element_by_tag_name('body').text)
If you have the solution, could you explain my mistake? I'm a real beginner at scraping.
Answer:
This is probably not ideal, but you could use the +/- buttons to adjust a slider until the target value is reached. This is an example for the top slider. You should also restrict the target to the slider's bounds: a lower limit of 4.000 € and an upper limit of 60.000 €.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_options = webdriver.ChromeOptions()
# chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--start-maximized')
# Selenium 4 style: options passed by keyword; chromedriver must be on PATH (or found by Selenium Manager).
# Note: this rebinds the name `webdriver` from the module to the browser instance.
webdriver = webdriver.Chrome(options=chrome_options)
url = "https://www.cetelem.es/"
webdriver.get(url)
# Wait up to 10 s for the amount slider's value element before reading it.
WebDriverWait(webdriver, 10).until(EC.presence_of_element_located((By.ID, 'slider-step-value')))

def read_amount():
    # The page shows e.g. "22.700 €" ('.' is the Spanish thousands separator), so this returns 22.7
    return float(webdriver.find_element(By.ID, 'slider-step-value').text.replace(' €', ''))

targetSliderStep = 22.700  # i.e. 22.700 €, in the same units as read_amount()
targetSliderStep = min(max(targetSliderStep, 4.0), 60.0)  # keep within 4.000 € to 60.000 € so the loops terminate
targetSliderStep = round(targetSliderStep * 2) / 2  # round to a multiple of 0.5 (500 €) so clicking can land on it
print('target: ' + "{0:.3f}".format(targetSliderStep))
actualSliderStep = read_amount()
if actualSliderStep < targetSliderStep:
    while read_amount() < targetSliderStep:
        webdriver.find_element(By.CSS_SELECTOR, "#slider-step .up-button").click()
elif actualSliderStep > targetSliderStep:
    while read_amount() > targetSliderStep:
        webdriver.find_element(By.CSS_SELECTOR, "#slider-step .down-button").click()
print('actual: ' + webdriver.find_element(By.ID, 'slider-step-value').text.replace(' €', ''))
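The up and down loops above differ only in which button they click, and the same pattern is reused for the other slider. As a sketch, the loop can be factored into one hypothetical helper, `step_to_target`, that takes three callables (read the value, click +, click -) instead of hard-coding selectors. The `max_clicks` cap is an added assumption: a safety net in case the target lies outside the slider's range and the value stops changing at a bound. `FakeSlider` is an in-memory stand-in so the logic can be tried without a browser; the commented lambdas show how it might be wired to the page using the ids from the code above.

```python
from typing import Callable


def step_to_target(read: Callable[[], float],
                   click_up: Callable[[], None],
                   click_down: Callable[[], None],
                   target: float,
                   max_clicks: int = 500) -> float:
    """Move a +/- slider toward `target` and return the value it ends on.

    Clicks "up" while the value is below the target, then "down" while it is
    above. `max_clicks` caps the total so an out-of-range target (the value
    sticks at a bound) cannot loop forever.
    """
    clicks = 0
    while read() < target and clicks < max_clicks:
        click_up()
        clicks += 1
    while read() > target and clicks < max_clicks:
        click_down()
        clicks += 1
    return read()


class FakeSlider:
    """In-memory stand-in for one slider, so the helper can be tried offline."""

    def __init__(self, value: float, step: float, low: float, high: float) -> None:
        self.value, self.step, self.low, self.high = value, step, low, high

    def read(self) -> float:
        return self.value

    def up(self) -> None:
        self.value = min(self.value + self.step, self.high)

    def down(self) -> None:
        self.value = max(self.value - self.step, self.low)


if __name__ == "__main__":
    amount = FakeSlider(value=10.0, step=0.5, low=4.0, high=60.0)
    print(step_to_target(amount.read, amount.up, amount.down, 22.5))
    # With Selenium, the same helper could be called with lambdas such as:
    #   read=lambda: float(webdriver.find_element(By.ID, 'slider-step-value').text.replace(' €', ''))
    #   click_up=lambda: webdriver.find_element(By.CSS_SELECTOR, '#slider-step .up-button').click()
    #   click_down=lambda: webdriver.find_element(By.CSS_SELECTOR, '#slider-step .down-button').click()
```

Passing callables rather than a driver keeps the stepping logic independent of the page, so the same function covers both sliders and can be checked without opening a browser.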
To answer your other questions:
Other slider:
Use the same logic for the other slider (and perhaps add bounds of 12 to 96):
from selenium.webdriver.common.by import By

targetTimeStep = 22  # whole number of months
targetTimeStep = min(max(targetTimeStep, 12), 96)  # keep within the 12 to 96 bounds so the loops terminate
print('target: ' + str(targetTimeStep))
actualTimeStep = int(webdriver.find_element(By.ID, 'slider-time-step-value').text)
if actualTimeStep < targetTimeStep:
    while int(webdriver.find_element(By.ID, 'slider-time-step-value').text) < targetTimeStep:
        webdriver.find_element(By.CSS_SELECTOR, "#slider-time-step .up-button").click()
elif actualTimeStep > targetTimeStep:
    while int(webdriver.find_element(By.ID, 'slider-time-step-value').text) > targetTimeStep:
        webdriver.find_element(By.CSS_SELECTOR, "#slider-time-step .down-button").click()
print('actual: ' + webdriver.find_element(By.ID, 'slider-time-step-value').text)
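The question's third point (only retrieve information when TIN and TAE change) can be handled once both sliders are set. One possible approach, sketched here under stated assumptions, is to record each reading as (slider settings, (TIN, TAE)) after moving the sliders and keep a reading only when the (TIN, TAE) pair differs from the last one kept. This treats a change in either value as a change; if you need both to change, compare the two fields separately. `keep_when_rates_change` is a hypothetical name, the sample values are made up, and the locators for the TIN and TAE elements on the page are not known here.

```python
from typing import Any, Iterable, List, Tuple

# One reading: (slider settings, (TIN, TAE)); what goes in each part is up to you.
Reading = Tuple[Any, Any]

_NOTHING_KEPT = object()  # sentinel: no reading has been kept yet


def keep_when_rates_change(readings: Iterable[Reading]) -> List[Reading]:
    """Keep a reading only if its (TIN, TAE) pair differs from the last kept one."""
    kept: List[Reading] = []
    last_rates: Any = _NOTHING_KEPT
    for settings, rates in readings:
        if rates != last_rates:
            kept.append((settings, rates))
            last_rates = rates
    return kept


if __name__ == "__main__":
    # Made-up values for illustration only, not real Cetelem rates:
    # ((amount in thousands of euros, months), (TIN %, TAE %))
    sample = [
        ((10.0, 36), (5.0, 5.1)),
        ((10.5, 36), (5.0, 5.1)),  # same rates as before: skipped
        ((11.0, 36), (4.5, 4.6)),  # rates changed: kept
    ]
    for settings, rates in keep_when_rates_change(sample):
        print(settings, rates)

    # With Selenium, each reading could be built after moving the sliders, e.g.
    #   rates = (tin_element.text, tae_element.text)
    # where tin_element / tae_element are hypothetical: locate them with the
    # selectors you find via right-click > Inspect on the TIN and TAE values.
```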
Select projects:
You can get the longer list of projects by clicking the menu bars on the left side, then selecting the project links by a substring of their href attribute value.
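from selenium.webdriver.common.by import By

webdriver.find_element(By.ID, 'showLeft').click()
webdriver.find_element(By.ID, 'layout_6').click()
# [href*='...'] matches links whose href contains this substring
projects = webdriver.find_elements(By.CSS_SELECTOR, "[href*='prestamos/prestamo-']")
print(len(projects))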
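`len(projects)` only counts the matches. To get the link targets themselves, each Selenium element's `href` can be read with `get_attribute("href")`. The substring selector might match the same URL more than once (for instance if a project is linked from both the menu and the page body; that is an assumption worth checking against the output), so this sketch keeps each href once, in page order. `unique_hrefs` and `FakeLink` are hypothetical names, and the URLs are placeholders; with Selenium you would pass `projects` directly.

```python
from typing import Any, Iterable, List, Optional


def unique_hrefs(elements: Iterable[Any]) -> List[str]:
    """Return each element's href once, in page order, skipping elements without one.

    `elements` can be the list returned by find_elements(...); only
    get_attribute("href") is called on each item.
    """
    seen = set()
    hrefs: List[str] = []
    for element in elements:
        href = element.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


class FakeLink:
    """Minimal stand-in for a Selenium WebElement, for trying the function offline."""

    def __init__(self, href: Optional[str]) -> None:
        self._href = href

    def get_attribute(self, name: str) -> Optional[str]:
        return self._href if name == "href" else None


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    links = [
        FakeLink("https://example.com/prestamos/prestamo-a"),
        FakeLink("https://example.com/prestamos/prestamo-b"),
        FakeLink("https://example.com/prestamos/prestamo-a"),
    ]
    for href in unique_hrefs(links):
        print(href)
```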
CSS selectors:
I use CSS selectors throughout, as modern browsers are optimized for CSS.
The following selector, for example,
#slider-step .up-button
matches a parent element with id (#) slider-step in descendant combination with an element or elements with class (.) up-button. The # is an id selector, the . a class selector, and the " " (space) is a descendant combinator, i.e. it selects elements with this class that have an ancestor with that id.
Where possible, use id selectors, then class selectors, as these are the fastest.
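To make the syntax concrete, here is a small hypothetical helper (`descendant_selector` is not part of Selenium) that assembles such selectors. It also shows how the question's `"bar-slider importe"` would need to be written, assuming those are the element's classes: that value is two class names, while `find_element_by_class_name` takes a single class name, so it does not locate the element as intended. As a CSS selector, an element carrying both classes is `.bar-slider.importe`, with no space; adding a space would turn it into a descendant selector instead.

```python
from typing import Optional, Sequence


def descendant_selector(parent_id: Optional[str], classes: Sequence[str]) -> str:
    """Build a CSS selector like '#parent .a.b'.

    '.a.b' (no space) matches a single element that has every listed class;
    the space before it is the descendant combinator, so the element must sit
    somewhere inside the element whose id is `parent_id`.
    """
    compound = "".join("." + name for name in classes)
    if parent_id is None:
        return compound
    return "#" + parent_id + " " + compound


if __name__ == "__main__":
    print(descendant_selector("slider-step", ["up-button"]))
    print(descendant_selector(None, ["bar-slider", "importe"]))
    # With Selenium: webdriver.find_element(By.CSS_SELECTOR, descendant_selector(...))
```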
If you right-click the + button of the top slider, for example, and choose Inspect, you will see the following HTML:
You can clearly see the parent id and the child class for the +.
If you want to practice CSS selectors, the following link is fun, and you can read up on selectors here.