Use case - motivation & challenge
Hi all! I have been working with Python for the last two years, but never learned proper object-oriented programming and design patterns. I've decided for this year to close this gap by reading some books and applying the knowledge to a real-world problem. I am looking forward to learning a lot from all the suggestions :)
To kick off my learning, I've decided to automate a recurring weekly task of filling some timesheets located in Microsoft Teams, using a bot to do the heavy lifting for me. The bot should perform the following steps:
- Navigate to the login page
- Fill in username and password
- Sign in
- Navigate to the Excel page with the timesheet
- Fill in my weekly hours
Currently, the bot does almost all steps, except the last two, which I haven't implemented yet.
Code breakdown
The code is quite simple. I rely heavily on Selenium to perform all actions, so I want to create a Chrome instance where the agent will perform its actions.
Naturally, I first import the libraries I am going to use:
import os
import time
import random
from selenium import webdriver
from dataclasses import dataclass
from abc import ABC, abstractmethod
from webdriver_manager.chrome import ChromeDriverManager
Next up, I define immutable classes whose only purpose is to containerize information that is static, so that code duplication can be avoided.
@dataclass(frozen=True)
class XPathsContainer:
    teams_login_button: str = '//*[@id="mectrl_main_trigger"]/div/div[1]'
    teams_login_user_button: str = '//*[@id="i0116"]'
    teams_login_next_button: str = '//*[@id="idSIButton9"]'
    teams_login_pwd_button: str = '//*[@id="i0118"]'
    teams_sign_in_button: str = '//*[@id="idSIButton9"]'
    teams_sign_in_keep_logged_in: str = '//*[@id="KmsiCheckboxField"]'


@dataclass(frozen=True)
class UrlsContainer:
    teams_login_page: str = 'https://www.microsoft.com/en-in/microsoft-365/microsoft-teams/group-chat-software'
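To illustrate what `frozen=True` provides, here is a minimal sketch using a hypothetical `Locators` container (the name and its single field are illustrative, not part of the code above). Reassigning a field on an instance raises `dataclasses.FrozenInstanceError`. The defaults can also be read straight off the class, which is how the script further down uses `XPathsContainer` and `UrlsContainer`.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Locators:
    """Hypothetical container, mirroring XPathsContainer in miniature."""
    login_button: str = '//*[@id="idSIButton9"]'


locators = Locators()

try:
    locators.login_button = "something else"  # frozen instances reject assignment
except FrozenInstanceError:
    reassignment_blocked = True
else:
    reassignment_blocked = False

# Simple defaults are stored as class attributes, so no instance is needed.
class_level_value = Locators.login_button
```

Note that `frozen=True` only guards instances; it does not stop someone from rebinding the attribute on the class itself.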
Now, I try to implement a base class called Driver. This class contains the initialization of the Chrome object and lays the foundation for other agents to inherit from. Each Agent child class might have (in the future) different actions, but they must have a sleep method (to avoid restrictions on using bots), and they must be able to click, write information and navigate to pages.
class Driver(ABC):
    def __init__(self, action, instruction, driver=None):
        if driver:
            self.driver = driver
        else:
            self.driver = webdriver.Chrome(ChromeDriverManager().install())
        self.actions = {
            'navigate': self.navigate,
            'click': self.click,
            'write': self.write
        }
        self.parameters = {
            'action': None,
            'instruction': None
        }

    @abstractmethod
    def sleep(self, current_tick=1):
        pass

    @abstractmethod
    def navigate(self, *args):
        pass

    @abstractmethod
    def click(self, *args):
        pass

    @abstractmethod
    def write(self, **kwargs):
        pass

    @abstractmethod
    def main(self, **kwargs):
        pass
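As a small illustration of what the abstract methods enforce, the sketch below uses hypothetical `Base`, `Incomplete` and `Complete` classes (none of these names come from the code above). A subclass that leaves any abstract method unimplemented cannot be instantiated; Python raises `TypeError` at construction time.

```python
from abc import ABC, abstractmethod


class Base(ABC):
    """Hypothetical stand-in for Driver, with two required methods."""

    @abstractmethod
    def click(self, target):
        ...

    @abstractmethod
    def navigate(self, url):
        ...


class Incomplete(Base):
    # Implements click only; navigate is still abstract.
    def click(self, target):
        return f"click {target}"


class Complete(Incomplete):
    # Supplies the missing method, so instances can now be created.
    def navigate(self, url):
        return f"goto {url}"


try:
    Incomplete()
except TypeError:
    incomplete_rejected = True
else:
    incomplete_rejected = False

agent = Complete()
```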
Now I implement a basic Agent child class, which implements the logic of the required functions of the base class Driver.
class Agent(Driver):
    def __init__(self, action, instruction, driver):
        super().__init__(action, instruction, driver)
        self.action = action
        self.instruction = instruction

    def sleep(self, current_tick=1):
        seconds = random.randint(3, 7)
        timeout = time.time() + seconds
        while time.time() <= timeout:
            time.sleep(1)
            print(f"Sleeping to replicate user.... tick {current_tick}/{seconds}")
            current_tick += 1

    def navigate(self, url):
        print(f"Agent navigating to {url}...")
        return self.driver.get(url)

    def click(self, xpath):
        print(f"Agent clicking in '{xpath}'...")
        return self.driver.find_element_by_xpath(xpath).click()

    def write(self, args):
        xpath = args[0]
        phrase = args[1]
        print(f"Agent writing in '{xpath}' the phrase '{phrase}'...")
        return self.driver.find_element_by_xpath(xpath).send_keys(phrase)

    def main(self, **kwargs):
        self.action = kwargs.get('action', self.action)
        self.instruction = kwargs.get('instruction', self.instruction)
        self.actions[self.action](self.instruction)
        self.sleep()
Finally, I've created a function that updates the parameters of the class whenever there is a set of actions and instructions that need to be executed under the same chrome driver. And I've created a function that takes a script of actions and executes them.
def update_driver_parameters(driver, values):
    params = driver.parameters
    params['action'] = values[0]
    params['instruction'] = values[1]
    return params


def run_script(script):
    for script_line, script_values in SCRIPT.items():
        chrome = Agent(None, None, None)
        for instructions in script_values:
            params = update_driver_parameters(chrome, instructions)
            chrome.main(**params)
            chrome.sleep()


USER = os.environ["USERNAME"]
SECRET = os.environ["SECRET"]
SCRIPT = {
    'login': [
        ('navigate', UrlsContainer.teams_login_page),
        ('click', XPathsContainer.teams_login_button),
        ('write', (XPathsContainer.teams_login_user_button, USER)),
        ('click', XPathsContainer.teams_login_next_button),
        ('write', (XPathsContainer.teams_login_pwd_button, SECRET)),
        ('click', XPathsContainer.teams_sign_in_button),
        ('click', XPathsContainer.teams_sign_in_keep_logged_in),
        ('click', XPathsContainer.teams_sign_in_button),
    ]
}
run_script(SCRIPT)
Concerns
Right now, I think the code has several major concerns, mostly related to being inexperienced in design patterns:
- I rely too much on Xpaths to make the bot do something which will result in an enormous data class if there are many steps to do;
- Also, relying on Xpaths could be bad, because if the page is updated, I will have to retrace steps, but this is probably necessary evil;
- I am not sure whether the implementation of an immutable class is the correct one. I've used dataclass for this;
- I have the feeling that the inheritance that I've implemented is quite clunky. I want to be able to share the same driver along with multiple classes. I don't want to create a new driver per action, I always want to fetch the latest context the driver did, but if a new agent is created then a new driver must be assigned to that agent;
- Maybe kwargs arguments could be implemented differently, I am never sure of the correct way to parse them without using kwargs.get;
- Inconsistent use of args and kwargs, could this be implemented differently?
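On the kwargs concern, one possible alternative (a sketch, not the only answer) is to declare the expected names as keyword-only parameters, so Python itself rejects missing or misspelled arguments and no `kwargs.get` is needed. The function `run_step` and the `registry` dictionary below are hypothetical names introduced for illustration.

```python
def run_step(*, action, instruction, registry):
    """Look up `action` in `registry` and call the handler with `instruction`."""
    try:
        handler = registry[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None
    return handler(instruction)


# Handlers just record what they were asked to do, standing in for browser calls.
log = []
registry = {
    "navigate": lambda url: log.append(("navigate", url)),
    "write": lambda pair: log.append(("write", *pair)),
}

run_step(action="navigate", instruction="https://example.com", registry=registry)
run_step(action="write", instruction=("field", "hello"), registry=registry)
```

Because of the bare `*` in the signature, calling `run_step("navigate", ...)` positionally raises `TypeError`, which keeps call sites explicit about which value is the action and which is the instruction.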
1 Answer
Bug: on the first line of run_script, SCRIPT.items() should be script.items(). As written, it executes the global SCRIPT and not the argument to the function.
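The effect of that bug can be shown with a stripped-down sketch. The two functions below are hypothetical simplifications that only collect the steps instead of driving a browser; the buggy one returns the global script's steps no matter what it is given.

```python
SCRIPT = {"login": ["step-1", "step-2"]}


def run_script_buggy(script):
    # Reads the module-level SCRIPT, so the argument is silently ignored.
    return [step for steps in SCRIPT.values() for step in steps]


def run_script_fixed(script):
    # Reads the parameter, so the caller controls what runs.
    return [step for steps in script.values() for step in steps]


other = {"logout": ["step-9"]}
buggy_result = run_script_buggy(other)
fixed_result = run_script_fixed(other)
```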
It doesn't seem like Agent should inherit from Driver.
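One way to read this is as a suggestion to prefer composition: an agent holds a driver rather than being one, which also makes sharing a single browser between agents straightforward. The sketch below uses a hypothetical `FakeDriver` in place of a real Selenium driver so it runs without a browser; only the `get` method is modelled.

```python
class FakeDriver:
    """Stand-in for a Selenium WebDriver; records the URLs it is asked to load."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


class Agent:
    """Holds a driver (composition) instead of inheriting from one."""

    def __init__(self, driver):
        self.driver = driver

    def navigate(self, url):
        self.driver.get(url)
        return self


# Two agents share one driver, so both act in the same browser context.
shared = FakeDriver()
first = Agent(shared)
second = Agent(shared)
first.navigate("https://example.com/a")
second.navigate("https://example.com/b")
```

An agent that needs its own browser would simply be constructed with a fresh driver instance.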
If you research Selenium best practices, you will find a few that make sense for your use case (most are geared toward testing). Two of them are Page Objects and preferred selector order.
The idea behind Page Objects is to create a class for each page of the web application (or at least the pages you are using). The class encapsulates the data and methods needed to interact with that page. Your automation script then calls the methods on the Page Objects to automate a task. For example, a class for a login page might have methods for getting the login page, for entering a username, entering a password, clicking a remember me checkbox, and clicking a login button. A login method then calls these methods in the right order to do a login.
This lets you isolate page specifics in one place. For example, the current design seems to suggest that if you automate another task, you would need to duplicate the login portion of SCRIPT. Then, if the login process changes, every script needs to be updated. Using a Page Object, only the login page class needs to be changed.
In practice, the most reliable and robust way to select an element is by ID, followed by name, then CSS selector; XPath is the least robust. It looks like most of your targets have IDs, so use those.
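Several of the XPaths in the question do nothing more than match on an id, so they can be rewritten mechanically as ID locators. The helper below is a hypothetical sketch of that conversion; `to_locator` is not a Selenium function. It uses the literal strings "id" and "xpath", which are the values of Selenium's `By.ID` and `By.XPATH`, to stay dependency-free.

```python
import re

# Same values as selenium.webdriver.common.by.By.ID and By.XPATH.
BY_ID = "id"
BY_XPATH = "xpath"

# Matches XPaths of the exact form //*[@id="something"] and nothing longer.
_ID_ONLY_XPATH = re.compile(r'^//\*\[@id="([^"]+)"\]$')


def to_locator(xpath):
    """Return an ID locator when the XPath only matches on id, else keep the XPath."""
    match = _ID_ONLY_XPATH.match(xpath)
    if match:
        return (BY_ID, match.group(1))
    return (BY_XPATH, xpath)


username_locator = to_locator('//*[@id="i0116"]')
login_locator = to_locator('//*[@id="mectrl_main_trigger"]/div/div[1]')
```

The resulting tuples have the shape expected by `driver.find_element(*locator)`.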
Structure the project something like this:
project
    pages
        __init__.py     # can be empty
        base.py         # one for each page
        home.py
        login.py
        time.py
        ...etc...       # add whatever other pages you use
    entertime.py        # the script
Then

base.py

import random
import time

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager


class BasePage:
    URL = None

    def __init__(self, driver=None):
        if driver is None:
            # Selenium 3 call style; Selenium 4 expects the path wrapped in a Service object.
            driver = webdriver.Chrome(ChromeDriverManager().install())
        self.driver = driver

    def click(self, locator, mu=1.5, sigma=0.3):
        """Simulate human speed and click a page element."""
        self.dally(mu, sigma)
        self.driver.find_element(*locator).click()
        return self

    def dally(self, mu=1, sigma=0.2):
        """Pause for a normally distributed time, sleeping at most 1 s per step."""
        pause = random.gauss(mu, sigma)
        while pause > 0:
            delta = min(1, pause)
            pause -= delta
            time.sleep(delta)
        return self

    def navigate(self):
        if self.URL:
            self.driver.get(self.URL)
            return self
        raise ValueError("Nowhere to go: no URL")

    def send_keys(self, locator, keys):
        self.driver.find_element(*locator).send_keys(keys)
        return self
login.py

from selenium import webdriver
from selenium.webdriver.common.by import By

from .base import BasePage
from .home import HomePage  # assumes home.py defines HomePage


class LoginPage(BasePage):
    URL = 'https://www.microsoft.com/en-in/microsoft-365/microsoft-teams/group-chat-software'

    # locators for elements of the page
    LOGIN_BUTTON = (By.XPATH, '//*[@id="mectrl_main_trigger"]/div/div[1]')
    USERNAME_FIELD = (By.ID, "i0116")
    NEXT_BUTTON = (By.ID, "idSIButton9")
    PASSWORD_FIELD = (By.ID, "i0118")
    STAY_LOGGED_IN = (By.ID, "KmsiCheckboxField")

    # Locators are passed as whole tuples; BasePage unpacks them for find_element.
    def click_next(self):
        self.click(self.NEXT_BUTTON)
        return self

    def start_login(self):
        self.click(self.LOGIN_BUTTON)
        return self

    def enter_username(self, username):
        self.send_keys(self.USERNAME_FIELD, username)
        self.click_next()
        return self

    def enter_password(self, password):
        self.send_keys(self.PASSWORD_FIELD, password)
        self.click_next()
        return self

    def toggle_stay_logged_in(self):
        self.driver.find_element(*self.STAY_LOGGED_IN).click()
        return self

    def login(self, username, password):
        self.navigate()
        self.start_login()
        self.enter_username(username)
        self.enter_password(password)
        self.toggle_stay_logged_in()
        self.click_next()
        return HomePage(self.driver)  # or whatever page comes after a login
entertime.py

import os

# Importing from the package root assumes pages/__init__.py re-exports these classes;
# with an empty __init__.py, import from pages.login and pages.home instead.
from pages import LoginPage, HomePage  # whatever pages you need for the script
from selenium import webdriver
from selenium.webdriver.common.by import By

USER = os.environ["USERNAME"]
SECRET = os.environ["SECRET"]

homepage = LoginPage().login(USER, SECRET)
timepage = homepage.navigate_to_time_entry()  # <== whatever method you define
timepage.entertime()                          # <== whatever method you define
I don't have MS Teams to test this on, so this hasn't been tested. It is merely a suggestion on how to structure your project to make it easier to update, expand, etc.
- Nice answer. The import pattern entertime.py is a pattern I used to use, I found it good until I was more comfortable with correctly setting up a __main__.py. I'd personally rename entertime.py to pages/__main__.py (with some import changes, from . import LoginPage, ...) and run the package with python -m pages rather than python entertime.py. Note using a __main__.py can be quite finicky at times so you (anyone) may prefer this much easier approach. (Commented Feb 23, 2021 at 2:15)
- @Peilonrayz, I'm presuming that there will be multiple scripts like entertime.py to do different tasks. So a __main__.py wouldn't work, unless it took arguments to tell it what to do, e.g., something like python -m teams entertime would cause __main__.py to execute entertime.py. (RootTwo, commented Feb 23, 2021 at 3:59)
- Oh good point. Yeah using .pys would be simpler in that regard, hadn't thought of that. (Commented Feb 23, 2021 at 4:30)
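The argument-taking `__main__.py` mentioned in the comments could look roughly like the sketch below. Everything here is an assumption made for illustration: the task names, the placeholder functions `enter_time` and `export_report`, and the `teams` package name taken from the comment's example command. The standard-library `argparse` module does the dispatching.

```python
import argparse


def enter_time():
    """Placeholder for what entertime.py does today."""
    return "entered time"


def export_report():
    """Second, purely illustrative task."""
    return "exported report"


# Maps the command-line task name to the function that performs it.
TASKS = {"entertime": enter_time, "report": export_report}


def main(argv=None):
    """Parse the task name from argv (sys.argv[1:] when None) and run that task."""
    parser = argparse.ArgumentParser(prog="python -m teams")
    parser.add_argument("task", choices=sorted(TASKS), help="which script to run")
    args = parser.parse_args(argv)
    return TASKS[args.task]()
```

In a real `__main__.py`, the file would end by calling `main()`, so that `python -m teams entertime` runs the matching task and an unknown task name exits with argparse's usage message.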