Python script to get the minimum price of the product with the most reviews

Question

Technical task.

user visits amazon.com website
user fills out a search field with the product name and activates search ==>
a page with search results is displayed.
user looks for the product having maximum reviews count
user extracts minimum product price (with applied discount - if any) from the page
user assigns amazon_price = product price
user visits bestbuy.com website
user chooses United States country
user fills out a search field with the product name and activates search ==>
a page with search results is displayed.
user looks for the product having maximum reviews count
user extracts minimum product price (with applied discount - if any) from the page
user assigns bestbuy_price = product price

import pytest
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from conftest import driver

first_dict = {}
second_dict = {0: 0}


@pytest.mark.parametrize("url, search_locator, products_locator, price_locator, review_locator", [
    ("https://www.amazon.com/", "//input[@name = 'field-keywords']", "//div[@class = 'a-section']",
     "span.a-price-whole", "span.s-underline-text"),
    ("https://www.bestbuy.com/", "//input[@class = 'search-input']", "//div[@class = 'embedded-badge']",
     "div.priceView-hero-price span:first-child", "span.c-reviews")])
def test_shopping(driver, url, search_locator, products_locator, price_locator, review_locator):
    product = 'samsung galaxy s22'
    driver.get(url)
    if url == "https://www.bestbuy.com/":
        driver.find_element(By.XPATH, '//a[@class = "us-link"]').click()
    search_nav = driver.find_element(By.XPATH, search_locator)
    search_nav.send_keys(product)
    search_nav.send_keys(Keys.RETURN)  # for windows need to be changed to Keys.ENTER
    names_product = driver.find_elements(By.XPATH, products_locator)
    assert len(names_product) != 0, "Page with product isn't displayed"
    # find_elements_by_css_selector() was removed in Selenium 4.3; use find_elements(By.CSS_SELECTOR, ...)
    review_counts = driver.find_elements(By.CSS_SELECTOR, review_locator)
    products_price = driver.find_elements(By.CSS_SELECTOR, price_locator)
    for i in range(len(products_price)):
        if i >= len(review_counts):
            break
        review_count_text = review_counts[i].text.strip('()').replace(',', '.')
        price_count_text = products_price[i].text.strip('$').replace(',', '.')
        if review_count_text == '' or price_count_text == '' or review_count_text == 'Not Yet Reviewed':
            continue
        review_count = int(review_count_text.replace('.', ''))
        price_count = float(price_count_text.replace('.99', ''))
        if "amazon" in url:
            first_dict[review_count] = price_count
        if "bestbuy" in url:
            second_dict[review_count] = price_count
    max_first_value = max(first_dict)
    max_second_value = max(second_dict)
    bestbuy_price = second_dict[max_second_value]
    amazon_price = first_dict[max_first_value]
    # once script completed the line below should be uncommented.
    assert amazon_price > bestbuy_price

Answer

 search_nav.send_keys(Keys.RETURN) # for windows need to be changed to Keys.ENTER

I don't understand why there's a comment here. Add an if sys.platform == ... check and be done with it. Explain the details in the code, not in a comment.
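A minimal sketch of that platform check, assuming the original comment is right that Windows needs Keys.ENTER (that claim is not verified here). submit_key is a hypothetical helper; it takes the Keys class as a parameter so it can be exercised without a browser.

```python
import sys
from types import SimpleNamespace


def submit_key(keys, platform=sys.platform):
    """Return the key that submits the search form on this platform.

    Assumption carried over from the original comment (not verified):
    Windows needs keys.ENTER, every other platform uses keys.RETURN.
    `keys` is meant to be selenium's Keys class; any object with ENTER and
    RETURN attributes works, which keeps the helper easy to unit test.
    """
    if platform.startswith("win"):  # sys.platform is "win32" on Windows
        return keys.ENTER
    return keys.RETURN


if __name__ == "__main__":
    # Stand-in for selenium.webdriver.common.keys.Keys so the sketch runs offline.
    fake_keys = SimpleNamespace(ENTER="enter", RETURN="return")
    print(submit_key(fake_keys, platform="win32"))  # enter
    print(submit_key(fake_keys, platform="linux"))  # return
```

In the test itself this would read search_nav.send_keys(submit_key(Keys)).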

 review_counts = ...
 products_price = ...

These (and the *_locators) are helpful identifiers, thank you. Consider revisiting the singular vs. plural distinction for consistency: review_counts is plural, so products_price would read better as product_prices.

 for i in range(len(products_price)):
     if i >= len(review_counts):
         break

We might have computed min( ... ) across the two lengths, fine, whatever. More importantly: it isn't clear why a length mismatch (more prices than review counts) means the remaining data is invalid. At a minimum we need a # comment explaining what sort of bad data has been observed on particular web pages. We need this to understand what the code is doing, and also to identify whether next year's web pages still manifest that behavior, or whether the logic can be pruned.
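zip() stops at the shorter of its inputs, which is what the index-and-break loop does by hand. A sketch, with paired_texts and FakeElement as hypothetical names (FakeElement stands in for a selenium WebElement, of which only .text is used), and with plural names used consistently. The comment spells out the positional assumption the original loop makes silently.

```python
from collections import namedtuple

# Minimal stand-in for a selenium WebElement: only the .text attribute is used.
FakeElement = namedtuple("FakeElement", "text")


def paired_texts(review_elements, price_elements):
    """Pair each review-count element with the price element at the same position."""
    # ASSUMPTION (made silently by the original loop as well): the i-th review
    # element and the i-th price element belong to the same product listing.
    # If one listing has a price but no review element, the pairs drift out of step.
    # zip() stops at the shorter sequence, replacing the manual length check.
    return [(review.text, price.text)
            for review, price in zip(review_elements, price_elements)]


reviews = [FakeElement("(1,234)"), FakeElement("(56)")]
prices = [FakeElement("$799.99"), FakeElement("$699.00"), FakeElement("$649.50")]
print(paired_texts(reviews, prices))
# [('(1,234)', '$799.99'), ('(56)', '$699.00')]
```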

 review_count_text = ...
 price_count_text = ...

Those identifiers should probably not have "count" in the middle of them. They came from counts, but they do not contain counts.

I recommend the Extract Helper refactoring: pass in review_counts / products_price and get back review_count / price_count. For one thing, it will let you write a unit test that shows helpful example text strings to the maintenance engineers you hire next year.
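One way the extracted helpers could look, as a sketch: parse_review_count and parse_price are hypothetical names, and the sample strings in the tests are illustrative guesses rather than text captured from either site. Tests like these make the parsing rules visible. For example, the original chain .strip('$').replace(',', '.') followed by .replace('.99', '') turns "$1,299.99" into 1.299, which a one-line assertion would have caught.

```python
from typing import Optional


def parse_review_count(text: str) -> Optional[int]:
    """Turn review-count text such as "(1,234)" into 1234.

    Returns None for text that carries no count, e.g. "" or
    "Not Yet Reviewed" (a string the original code already special-cases).
    """
    digits = text.strip().strip("()").replace(",", "")
    # isdecimal() rather than isdigit(): int() rejects some isdigit() characters, e.g. "²".
    return int(digits) if digits.isdecimal() else None


def parse_price(text: str) -> Optional[float]:
    """Turn price text such as "$1,299.99" into 1299.99; None if unparsable."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:  # includes the empty string
        return None


# Unit tests (run with pytest). The sample strings are illustrative guesses,
# not captured from amazon.com or bestbuy.com; replace them with real ones.
def test_parse_review_count():
    assert parse_review_count("(1,234)") == 1234
    assert parse_review_count("87") == 87
    assert parse_review_count("Not Yet Reviewed") is None
    assert parse_review_count("") is None


def test_parse_price():
    assert parse_price("$1,299.99") == 1299.99
    assert parse_price("699") == 699.0
    assert parse_price("") is None
```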

 if "amazon" in url:
     first_dict[review_count] = price_count
 if "bestbuy" in url:
     second_dict[review_count] = price_count

These are ill-chosen identifiers. They are unimaginative and unhelpful. An obvious name would be amazon_dict. But better: use another level of indirection. Could be defaultdict(dict) where we talk about review_counts[vendor][review_count]. Or it could be review_counts[f"{vendor}_{review_count}"].
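A sketch of the defaultdict(dict) approach, keyed first by vendor and then by review count. The names price_by_review_count, vendor_from_url and record are hypothetical; vendor_from_url assumes hosts shaped like www.<vendor>.com, and keeping the minimum price per review count is an assumption based on the task statement (the original code keeps the last price seen). The table is passed in explicitly rather than living in module-level globals.

```python
from collections import defaultdict
from urllib.parse import urlparse


def vendor_from_url(url: str) -> str:
    """'https://www.bestbuy.com/' -> 'bestbuy'.

    Assumes a host shaped like www.<vendor>.com; other shapes need more care.
    """
    return urlparse(url).hostname.split(".")[-2]


def record(table, url: str, review_count: int, price: float) -> None:
    """Store price under table[vendor][review_count].

    Keeps the lowest price seen for a given review count; that follows the
    task's "minimum product price" wording and is an assumption, since the
    original code simply keeps the last price it saw.
    """
    prices = table[vendor_from_url(url)]
    prices[review_count] = min(price, prices.get(review_count, price))


# Hypothetical name; replaces first_dict / second_dict.
price_by_review_count = defaultdict(dict)
record(price_by_review_count, "https://www.amazon.com/", 4500, 699.0)
record(price_by_review_count, "https://www.amazon.com/", 4500, 649.5)
record(price_by_review_count, "https://www.bestbuy.com/", 120, 799.99)

amazon = price_by_review_count["amazon"]
print(amazon[max(amazon)])  # 649.5
```

Adding a third store then needs no new if branch and no new global dictionary.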

 assert amazon_price > bestbuy_price

This may well hold at the moment, but pricing strategies will change as the months go by, and then the test will fail even though nothing in the code is wrong.

The parsing code above this seems to be nicely motivated, turning possibly chaotic web text into well-defined variables (or signalling fatal error if page format changed). In contrast, this particular assertion seems like it belongs one level up in the call stack. Push the parsing down into a helper function, and make an assertion on what comes back from it.
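A sketch of pushing the selection logic into a helper whose result the test can assert on. price_of_most_reviewed is a hypothetical name; it takes (review count, price) pairs, with None for anything that failed to parse, and raises ValueError when nothing usable came back, which serves as the "page format changed" signal.

```python
from typing import Iterable, Optional, Tuple

# (review count, price); None marks a value that could not be parsed.
Listing = Tuple[Optional[int], Optional[float]]


def price_of_most_reviewed(listings: Iterable[Listing]) -> float:
    """Lowest price among the listings that share the highest review count."""
    valid = [(reviews, price) for reviews, price in listings
             if reviews is not None and price is not None]
    if not valid:
        # Nothing parsed at all: most likely the page layout changed.
        raise ValueError("no parsable listings; has the page layout changed?")
    top = max(reviews for reviews, _ in valid)
    return min(price for reviews, price in valid if reviews == top)


def test_price_of_most_reviewed():
    listings = [(120, 799.99), (4500, 699.0), (4500, 649.5), (None, 10.0)]
    assert price_of_most_reviewed(listings) == 649.5
```

The live Selenium test could then assert only properties that should not drift with pricing strategy, such as amazon_price > 0, and leave the cross-vendor comparison to a separate, explicitly time-sensitive check.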

Overall?

I would not be willing to assign or delegate maintenance tasks on this code base as written. Adding unit tests with example HTML text snippets would go a long way toward explaining the underlying assumptions. The current code does not yet appear ready to push into production.
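A sketch of a unit test driven by an HTML snippet, using only the standard library's html.parser so it runs without a browser. ClassTextCollector, texts_for_class and the snippet are hypothetical; the snippet mimics the span.c-reviews locator from the question but is not copied from bestbuy.com.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class attribute includes target_class.

    Deliberately tiny: a void element such as <br> (no end tag) inside a
    matching element breaks the depth count, which is acceptable for short,
    hand-written snippets.
    """

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def texts_for_class(html: str, target_class: str) -> list:
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Hypothetical snippet modelled on the "span.c-reviews" locator; NOT copied
# from bestbuy.com. Real saved snippets would document which page shapes
# the parsing code was written against.
SEARCH_RESULTS_SNIPPET = """
<div class="sku-item">
  <span class="c-reviews">(1,234)</span>
  <span class="c-reviews">Not Yet Reviewed</span>
</div>
"""


def test_review_texts_from_snippet():
    expected = ["(1,234)", "Not Yet Reviewed"]
    assert texts_for_class(SEARCH_RESULTS_SNIPPET, "c-reviews") == expected
```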

Answer by J_H (43.9k reputation; 3 gold, 38 silver and 162 bronze badges), posted 2023-02-26 21:52:30Z.

Comment from the asker:

Thank you very much. This is a test assignment I recently wrote for a trainee position. I will fix it.
