\$\begingroup\$

Use case - motivation & challenge

Hi all! I have been working with Python for the last two years, but never learned proper object-oriented programming and design patterns. I've decided for this year to close this gap by reading some books and applying the knowledge to a real-world problem. I am looking forward to learning a lot from all the suggestions :)

To kick off my learning, I've decided to automate a recurring weekly task of filling some timesheets located in Microsoft Teams, using a bot to do the heavy lifting for me. The bot should perform the following steps:

  • Navigate to the login page
  • Fill in username and password
  • Sign in
  • Navigate to the Excel page with the timesheet
  • Fill in my weekly hours

Currently, the bot does almost all steps, except the last two, which I haven't implemented yet.

Code breakdown

The code is quite simple. I rely heavily on Selenium to perform all actions, so I create a Chrome instance in which the agent performs them.

Naturally, I first import the libraries I am going to use:

import os
import time
import random
from selenium import webdriver
from dataclasses import dataclass
from abc import ABC, abstractmethod
from webdriver_manager.chrome import ChromeDriverManager

Next up, I define immutable classes whose only purpose is to containerize information that is static, so that code duplication can be avoided.

@dataclass(frozen=True)
class XPathsContainer:
    teams_login_button: str = '//*[@id="mectrl_main_trigger"]/div/div[1]'
    teams_login_user_button: str = '//*[@id="i0116"]'
    teams_login_next_button: str = '//*[@id="idSIButton9"]'
    teams_login_pwd_button: str = '//*[@id="i0118"]'
    teams_sign_in_button: str = '//*[@id="idSIButton9"]'
    teams_sign_in_keep_logged_in: str = '//*[@id="KmsiCheckboxField"]'


@dataclass(frozen=True)
class UrlsContainer:
    teams_login_page: str = 'https://www.microsoft.com/en-in/microsoft-365/microsoft-teams/group-chat-software'
 teams_login_page: str = 'https://www.microsoft.com/en-in/microsoft-365/microsoft-teams/group-chat-software'

Next, I implement a base class called Driver. This class initializes the Chrome object and lays the foundation that other agents inherit from. Each Agent child class might (in the future) have different actions, but every one must have a sleep method (to avoid restrictions on bot usage) and must be able to click, write information, and navigate to pages.

class Driver(ABC):
    def __init__(self, action, instruction, driver=None):
        if driver:
            self.driver = driver
        else:
            self.driver = webdriver.Chrome(ChromeDriverManager().install())
        self.actions = {
            'navigate': self.navigate,
            'click': self.click,
            'write': self.write
        }
        self.parameters = {
            'action': None,
            'instruction': None
        }

    @abstractmethod
    def sleep(self, current_tick=1):
        pass

    @abstractmethod
    def navigate(self, *args):
        pass

    @abstractmethod
    def click(self, *args):
        pass

    @abstractmethod
    def write(self, **kwargs):
        pass

    @abstractmethod
    def main(self, **kwargs):
        pass

Now I implement a basic Agent child class, which implements the logic of required functions of the base class Driver.

class Agent(Driver):
    def __init__(self, action, instruction, driver):
        super().__init__(action, instruction, driver)
        self.action = action
        self.instruction = instruction

    def sleep(self, current_tick=1):
        seconds = random.randint(3, 7)
        timeout = time.time() + seconds
        while time.time() <= timeout:
            time.sleep(1)
            print(f"Sleeping to replicate user.... tick {current_tick}/{seconds}")
            current_tick += 1

    def navigate(self, url):
        print(f"Agent navigating to {url}...")
        return self.driver.get(url)

    def click(self, xpath):
        print(f"Agent clicking in '{xpath}'...")
        return self.driver.find_element_by_xpath(xpath).click()

    def write(self, args):
        xpath = args[0]
        phrase = args[1]
        print(f"Agent writing in '{xpath}' the phrase '{phrase}'...")
        return self.driver.find_element_by_xpath(xpath).send_keys(phrase)

    def main(self, **kwargs):
        self.action = kwargs.get('action', self.action)
        self.instruction = kwargs.get('instruction', self.instruction)
        self.actions[self.action](self.instruction)
        self.sleep()

Finally, I've created a function that updates the parameters of the class whenever there is a set of actions and instructions that need to be executed under the same chrome driver. And I've created a function that takes a script of actions and executes them.

def update_driver_parameters(driver, values):
    params = driver.parameters
    params['action'] = values[0]
    params['instruction'] = values[1]
    return params


def run_script(script):
    for script_line, script_values in SCRIPT.items():
        chrome = Agent(None, None, None)
        for instructions in script_values:
            params = update_driver_parameters(chrome, instructions)
            chrome.main(**params)
            chrome.sleep()


USER = os.environ["USERNAME"]
SECRET = os.environ["SECRET"]

SCRIPT = {
    'login': [
        ('navigate', UrlsContainer.teams_login_page),
        ('click', XPathsContainer.teams_login_button),
        ('write', (XPathsContainer.teams_login_user_button, USER)),
        ('click', XPathsContainer.teams_login_next_button),
        ('write', (XPathsContainer.teams_login_pwd_button, SECRET)),
        ('click', XPathsContainer.teams_sign_in_button),
        ('click', XPathsContainer.teams_sign_in_keep_logged_in),
        ('click', XPathsContainer.teams_sign_in_button),
    ]
}

run_script(SCRIPT)

Concerns

Right now, I think the code has several major concerns, mostly related to being inexperienced in design patterns:

  • I rely too much on XPaths to drive the bot, which will result in an enormous data class if there are many steps;
  • Also, relying on XPaths could be bad, because if the page is updated I will have to retrace my steps, but this is probably a necessary evil;
  • I am not sure whether my implementation of an immutable class is the correct one. I've used dataclass for this;
  • I have the feeling that the inheritance I've implemented is quite clunky. I want to be able to share the same driver across multiple classes. I don't want to create a new driver per action; I always want to reuse the driver's latest context. If a new agent is created, however, a new driver must be assigned to that agent;
  • Maybe kwargs arguments could be implemented differently, I am never sure of the correct way to parse them without using kwargs.get;
  • Inconsistent use of args and kwargs, could this be implemented differently?
asked Feb 17, 2021 at 10:55
\$\endgroup\$

1 Answer

\$\begingroup\$

Bug: on the first line of run_script, SCRIPT.items() should be script.items(). As written, it executes the global SCRIPT and not the argument to the function.
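
A minimal sketch of why this matters. The names DEFAULT_SCRIPT, run_buggy and run_fixed are hypothetical stand-ins, not part of the original code, and the URL is a placeholder. The buggy version ignores its argument and always reads the module-level dictionary:

```python
# Hypothetical stand-in for the module-level SCRIPT dictionary.
DEFAULT_SCRIPT = {"login": [("navigate", "https://example.com/login")]}


def run_buggy(script):
    # Bug: iterates the global, so the argument is silently ignored.
    return [name for name, _steps in DEFAULT_SCRIPT.items()]


def run_fixed(script):
    # Fix: iterate the dictionary that was actually passed in.
    return [name for name, _steps in script.items()]


other = {"fill_timesheet": [("click", "some-button")]}
print(run_buggy(other))  # ['login']
print(run_fixed(other))  # ['fill_timesheet']
```

The bug stays hidden as long as the only script ever passed in is the global one, which is exactly the situation in the posted code.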

It doesn't seem like Agent should inherit from Driver
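
One way to read this, sketched under assumptions: an agent uses a browser driver rather than being a kind of driver, so the driver can be passed in and shared. FakeDriver below is a hypothetical stand-in for a Selenium WebDriver (it only implements get) so the sketch runs without a browser, and the URLs are placeholders.

```python
class FakeDriver:
    """Stand-in for a Selenium WebDriver so the sketch runs without a browser."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


class Agent:
    """An agent has a driver (composition) instead of being one (inheritance)."""

    def __init__(self, driver):
        self.driver = driver

    def navigate(self, url):
        self.driver.get(url)
        return self


shared = FakeDriver()
login_agent = Agent(shared)
timesheet_agent = Agent(shared)  # same browser session, same context

login_agent.navigate("https://example.com/login")
timesheet_agent.navigate("https://example.com/timesheet")
print(shared.visited)
```

Both agents see the same session because they hold a reference to the same driver object. An agent that needs its own browser simply receives a different driver.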

If you research Selenium best practices, you will find a few that make sense for your use case (most are geared toward testing). Two of them are Page Objects and preferred selector order.

The idea behind Page Objects is to create a class for each page of the web application (or at least the pages you are using). The class encapsulates the data and methods needed to interact with that page. Your automation script then calls the methods on the Page Objects to automate a task. For example, a class for a login page might have methods for getting the login page, for entering a username, entering a password, clicking a remember me checkbox, and clicking a login button. A login method then calls these methods in the right order to do a login.

This lets you isolate page specifics in one place. For example, the current design seems to suggest that if you automate another task, you would need to duplicate the login portion of SCRIPT. Then, if the login process changes, every script needs to be updated. Using a Page Object, only the login page class needs to be changed.

In practice, the most reliable and robust way to select an element is by ID, then by name, then by CSS selector; XPath is the least robust. It looks like most of your targets have IDs, so use those.
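
As a rough illustration, many of the XPaths in the question are plain id lookups that can be rewritten as ID locators. The helper to_locator below is hypothetical (it is not a Selenium function). It relies only on the fact that Selenium's By.ID and By.XPATH are the strings "id" and "xpath", so the sketch runs without Selenium installed.

```python
import re

# Selenium's By.ID and By.XPATH are the plain strings "id" and "xpath",
# so this sketch uses the strings directly and needs no browser.
BY_ID = "id"
BY_XPATH = "xpath"

# Matches only XPaths of the exact form //*[@id="..."].
_ID_ONLY_XPATH = re.compile(r'^//\*\[@id="([^"]+)"\]$')


def to_locator(xpath):
    """Return an ID locator for a plain id lookup, else keep the XPath."""
    match = _ID_ONLY_XPATH.match(xpath)
    if match:
        return (BY_ID, match.group(1))
    return (BY_XPATH, xpath)


print(to_locator('//*[@id="i0116"]'))
print(to_locator('//*[@id="mectrl_main_trigger"]/div/div[1]'))
```

The first call yields an ID locator. The second keeps the XPath because it walks into child div elements, which an ID alone cannot express.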

Structure the project something like this:

project
    pages
        __init__.py      # can be empty
        base.py          # shared BasePage class
        home.py          # one module for each page
        login.py
        time.py
        ...etc...        # add whatever other pages you use
    entertime.py         # the script

Then

base.py
import random
import time

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager


class BasePage:
    URL = None  # subclasses set the address of their page

    def __init__(self, driver=None):
        # Create a driver only when the caller does not share one.
        if driver is None:
            driver = webdriver.Chrome(ChromeDriverManager().install())
        self.driver = driver

    def click(self, locator, mu=1.5, sigma=0.3):
        """Simulate human speed and click a page element."""
        self.dally(mu, sigma)
        self.driver.find_element(*locator).click()
        return self

    def dally(self, mu=1, sigma=0.2):
        """Pause for a normally distributed number of seconds."""
        pause = random.gauss(mu, sigma)
        while pause > 0:
            delta = min(1, pause)
            pause -= delta
            time.sleep(delta)
        return self

    def navigate(self):
        if self.URL:
            self.driver.get(self.URL)
            return self
        raise ValueError("Nowhere to go. No URL")

    def send_keys(self, locator, keys):
        self.driver.find_element(*locator).send_keys(keys)
        return self

login.py
from selenium.webdriver.common.by import By

from .base import BasePage
from .home import HomePage  # assumes pages/home.py defines HomePage


class LoginPage(BasePage):
    URL = 'https://www.microsoft.com/en-in/microsoft-365/microsoft-teams/group-chat-software'

    # locators for elements of the page
    LOGIN_BUTTON = (By.XPATH, '//*[@id="mectrl_main_trigger"]/div/div[1]')
    USERNAME_FIELD = (By.ID, "i0116")
    NEXT_BUTTON = (By.ID, "idSIButton9")
    PASSWORD_FIELD = (By.ID, "i0118")
    STAY_LOGGED_IN = (By.ID, "KmsiCheckboxField")

    def click_next(self):
        # Pass the locator tuple whole; BasePage.click unpacks it.
        self.click(self.NEXT_BUTTON)
        return self

    def start_login(self):
        self.click(self.LOGIN_BUTTON)
        return self

    def enter_username(self, username):
        self.send_keys(self.USERNAME_FIELD, username)
        self.click_next()
        return self

    def enter_password(self, password):
        self.send_keys(self.PASSWORD_FIELD, password)
        self.click_next()
        return self

    def toggle_stay_logged_in(self):
        self.driver.find_element(*self.STAY_LOGGED_IN).click()
        return self

    def login(self, username, password):
        self.navigate()
        self.start_login()
        self.enter_username(username)
        self.enter_password(password)
        self.toggle_stay_logged_in()
        self.click_next()

        # Hand the shared driver to the next page object.
        return HomePage(self.driver)  # or whatever page comes after a login

entertime.py
import os

# Import from the module directly, since pages/__init__.py may be empty.
from pages.login import LoginPage

USER = os.environ["USERNAME"]
SECRET = os.environ["SECRET"]

homepage = LoginPage().login(USER, SECRET)
timepage = homepage.navigate_to_time_entry()  # <== whatever method you define
timepage.entertime()  # <== whatever method you define

I don't have MS Teams to test this on, so this hasn't been tested. It is merely a suggestion on how to structure your project to make it easier to update, expand, etc.
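
One way to check the order of steps without access to the real site is to dry-run a page object against a fake driver. This is a sketch under assumptions: RecordingDriver and RecordingElement are hypothetical stand-ins that implement only get, find_element, click and send_keys, and the LoginPage here is a trimmed version with placeholder URL and locators.

```python
class RecordingElement:
    """Fake page element that logs what is done to it."""

    def __init__(self, log, locator):
        self.log = log
        self.locator = locator

    def click(self):
        self.log.append(("click", self.locator))

    def send_keys(self, keys):
        # Record the target only, so secrets never end up in the log.
        self.log.append(("send_keys", self.locator))


class RecordingDriver:
    """Hypothetical fake exposing only get() and find_element()."""

    def __init__(self):
        self.log = []

    def get(self, url):
        self.log.append(("get", url))

    def find_element(self, by, value):
        return RecordingElement(self.log, (by, value))


class LoginPage:
    """Trimmed page object; URL and locators are placeholders."""

    URL = "https://example.com/login"
    USERNAME_FIELD = ("id", "user")
    PASSWORD_FIELD = ("id", "pwd")
    NEXT_BUTTON = ("id", "next")

    def __init__(self, driver):
        self.driver = driver

    def login(self, username, password):
        self.driver.get(self.URL)
        self.driver.find_element(*self.USERNAME_FIELD).send_keys(username)
        self.driver.find_element(*self.NEXT_BUTTON).click()
        self.driver.find_element(*self.PASSWORD_FIELD).send_keys(password)
        self.driver.find_element(*self.NEXT_BUTTON).click()
        return self


driver = RecordingDriver()
LoginPage(driver).login("alice", "not-a-real-password")
for step in driver.log:
    print(step)
```

This works because the page object receives its driver as a constructor argument, so anything with the same two methods can stand in for the browser.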

answered Feb 23, 2021 at 0:00
\$\endgroup\$
  • \$\begingroup\$ Nice answer. The import pattern in entertime.py is one I used to use; I found it good until I was more comfortable with correctly setting up a __main__.py. I'd personally rename entertime.py to pages/__main__.py (with some import changes, from . import LoginPage, ...) and run the package with python -m pages rather than python entertime.py. Note that using a __main__.py can be quite finicky at times, so you (anyone) may prefer this much easier approach. \$\endgroup\$ Commented Feb 23, 2021 at 2:15
  • \$\begingroup\$ @Peilonrayz, I'm presuming that there will be multiple scripts like entertime.py to do different tasks. So a __main__.py wouldn't work, unless it took arguments to tell it what to do, e.g., something like python -m teams entertime would cause __main__.py to execute entertime.py. \$\endgroup\$ Commented Feb 23, 2021 at 3:59
  • \$\begingroup\$ Oh good point. Yeah using .pys would be simpler in that regard, hadn't thought of that. \$\endgroup\$ Commented Feb 23, 2021 at 4:30
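
A minimal sketch of the argument-taking __main__.py discussed in these comments, assuming a package named teams. The task functions entertime and approve and the TASKS table are hypothetical placeholders for real scripts.

```python
import argparse


def entertime():
    # Placeholder for the real time-entry script.
    return "entering time"


def approve():
    # Placeholder for a second, hypothetical task.
    return "approving"


# Maps the command-line task name to the function that runs it.
TASKS = {"entertime": entertime, "approve": approve}


def main(argv=None):
    parser = argparse.ArgumentParser(prog="teams")
    parser.add_argument("task", choices=sorted(TASKS))
    args = parser.parse_args(argv)
    return TASKS[args.task]()


# A real __main__.py would call main() with no arguments so that argparse
# reads sys.argv; the explicit list here keeps the sketch runnable anywhere.
print(main(["entertime"]))
```

With this in teams/__main__.py, python -m teams entertime would select the entertime task, at the cost of keeping the task table up to date.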
