\$\begingroup\$

We have a requirement to check the size of some of our website home pages, in MB. This is to ensure that they are not growing too large and that no oversized images have been uploaded. I couldn't find much information on this, but I have come up with the following, which works as required.

The code is designed to run either locally on my machine or on our Selenium grid. It simply loads the home page, pulls the Network.dataReceived (bytes) entries out of the browser performance log, and sums them.

Finally, we compare the total against set thresholds that the site should stay within.

#!/usr/bin/python
""" This script will simply check the download page size (bytes) of a Home page."""
import argparse
import re
import sys
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

parser = argparse.ArgumentParser(description="This script will measure size of a home page.")
parser.add_argument('--site', default='somewebsite.com', required=True)
parser.add_argument('--local', action='store_true', default=False)
parser.add_argument('--datacentre', choices=['dc1', 'dc2'])
args = parser.parse_args()

logging_prefs = {'performance' : 'INFO'}

if args.local:
    caps = DesiredCapabilities.CHROME.copy()
    caps['loggingPrefs'] = logging_prefs
    driver = webdriver.Chrome(desired_capabilities=caps)
else:
    profile = webdriver.FirefoxProfile()
    profile.set_preference('plugin.state.flash', 0)
    profile.set_preference("webdriver_enable_native_events", False)
    profile.update_preferences()
    caps = DesiredCapabilities.FIREFOX.copy()
    caps['loggingPrefs'], caps['acceptSslCerts'] = logging_prefs, False
    if args.datacentre == 'dc1':
        driver = webdriver.Remote(
            command_executor='http://selenium/hub',
            desired_capabilities=caps,
            browser_profile=profile)
    elif args.datacentre == 'dc2':
        driver = webdriver.Remote(
            command_executor='http://selenium/hub',
            desired_capabilities=caps,
            browser_profile=profile)

driver.implicitly_wait(30)
driver.set_page_load_timeout(30)
url = "http://" + args.site + "/"
driver.get(url)

total_bytes = []
try:
    for entry in driver.get_log('performance'):
        if "Network.dataReceived" in str(entry):
            r = re.search(r'encodedDataLength\":(.*?),', str(entry))
            total_bytes.append(int(r.group(1)))
except Exception:
    print 'error'
    driver.close()
    sys.exit(3)

if total_bytes is not None:
    mb = round((float(sum(total_bytes) / 1000) / 1000), 2)

if args.local:
    from datetime import datetime
    d = (datetime.today()).strftime("%d-%m-%y-%H-%M")
    filename = 'results_{}.txt'.format(str(d))
    with open(filename, 'a') as f:
        f.write("{}, {}\n".format(args.site, mb))

try:
    if mb < 2.0:
        print "OK. Total Network Data Received size for {}: {}MB".format(args.site, str(mb))
        sys.exit(0)
    elif mb >= 2.0 and mb < 4.0:
        print "Warning. Total Network Data Received size for {}: {}MB".format(args.site, str(mb))
        sys.exit(1)
    elif mb > 4.0:
        print "CRITICAL. Total Network Data Received size for {}: {}MB".format(args.site, str(mb))
        sys.exit(1)
except Exception:
    print "UNKNOWN. Something went wrong."
    sys.exit(3)
finally:
    driver.close()
Daniel
asked Jul 6, 2018 at 15:30
\$\endgroup\$

1 Answer

\$\begingroup\$
  1. You're using the wrong shebang. According to this StackOverflow answer:

    Correct usage for Python 3 scripts is:

    #!/usr/bin/env python3
    

    This defaults to version 3.latest. For Python 2.7.latest use python2 in place of python3.

    Don't use #!/usr/bin/python. From the Tutor mailing list:

    Consider the possibilities that in a different machine, python may be installed at /usr/bin/python or /bin/python; in those cases, the above #! will fail. For those cases, we get to call the env executable with an argument, which will determine the argument's path by searching in the $PATH and use it correctly.

  2. I'd move command line argument handling to the bottom, possibly in a main(). The rest of the code could be separated into function 'units'.

  3. Certain variables and literals could be replaced by constants. From what I can tell:

    • logging_prefs could be a constant;
    • Waiting times could be constants;
    • Exit codes could be constants.
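  4. Catching Exception is bad practice: it also swallows unrelated errors, such as a NameError caused by a typo, and so hides bugs. Catch only the specific exceptions you expect.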

  5. These parentheses:

    d = (datetime.today()).strftime("%d-%m-%y-%H-%M")
    

    are superfluous.

  6. This cast:

    filename = 'results_{}.txt'.format(str(d))
    

    is pointless.

  7. Don't mix single and double quotes.

  8. Don't assign multiple variables on a single line.

  9. The same driver will be instantiated, regardless of the 'datacentre' argument. Therefore, the if-statement can be left out altogether.

  10. Error messages should be written to stderr, not stdout.

  11. Imports should be placed at the top of a file, and rarely in a function.
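Point 3 can be sketched on its own for the exit codes and thresholds. This is only an illustration: `classify`, `WARN_MB` and `CRIT_MB` are made-up names, and the exit codes assume a Nagios-style monitor (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which the question does not confirm. Note that the original exits with 1 for both Warning and CRITICAL, and matches no branch at all when the size is exactly 4.0 MB.

```python
# Assumed Nagios-style exit codes; adjust to whatever consumes the script.
EXIT_OK = 0
EXIT_WARNING = 1
EXIT_CRITICAL = 2
EXIT_UNKNOWN = 3

# Thresholds taken from the question, in megabytes.
WARN_MB = 2.0
CRIT_MB = 4.0


def classify(size_in_mb):
    """Return a (label, exit_code) pair for a page size in MB."""
    if size_in_mb < WARN_MB:
        return "OK", EXIT_OK
    if size_in_mb < CRIT_MB:
        return "Warning", EXIT_WARNING
    # Everything from CRIT_MB upwards, including exactly 4.0.
    return "CRITICAL", EXIT_CRITICAL
```

Because `classify` only returns values, the caller decides where to print and when to call `sys.exit`, which also makes the thresholds easy to test without a browser.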

Here's my take on it (untested):

#!/usr/bin/env python3
""" This script will simply check the download page size (bytes) of a Home page."""
from __future__ import print_function
import argparse
from datetime import datetime
import re
import sys
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

LOGGING_PREFS = {"performance": "INFO"}
WAITING_TIME = 30
TIMEOUT = 30
DATA_LENGTH_REGEX = r"encodedDataLength\":(.*?),"
EXIT_OK = 0
EXIT_WARNING = 1
EXIT_FAILURE = 2


def get_driver(local):
    if local:
        caps = DesiredCapabilities.CHROME.copy()
        caps["loggingPrefs"] = LOGGING_PREFS
        return webdriver.Chrome(desired_capabilities=caps)
    else:
        profile = webdriver.FirefoxProfile()
        profile.set_preference("plugin.state.flash", 0)
        profile.set_preference("webdriver_enable_native_events", False)
        profile.update_preferences()
        caps = DesiredCapabilities.FIREFOX.copy()
        caps["loggingPrefs"] = LOGGING_PREFS
        caps["acceptSslCerts"] = False
        return webdriver.Remote(
            command_executor="http://selenium/hub",
            desired_capabilities=caps,
            browser_profile=profile
        )


def get_homepage_size(driver, website):
    driver.implicitly_wait(WAITING_TIME)
    driver.set_page_load_timeout(TIMEOUT)
    url = "http://" + website + "/"
    driver.get(url)
    total_bytes = []
    try:
        for entry in driver.get_log("performance"):
            entry = str(entry)
            if "Network.dataReceived" in entry:
                r = re.search(DATA_LENGTH_REGEX, entry)
                total_bytes.append(int(r.group(1)))
    except Exception:
        # TODO Find a more specific exception.
        # What could fail?
        driver.close()
        print("Failed to get data size for {}".format(website), file=sys.stderr)
        sys.exit(EXIT_FAILURE)
    if total_bytes is not None:
        return round((float(sum(total_bytes) / 1000) / 1000), 2)


def write_results_to_file(website, size_in_mb):
    date = datetime.today().strftime("%d-%m-%y-%H-%M")
    filename = "results_{}.txt".format(date)
    with open(filename, "a") as f:
        f.write("{}, {}\n".format(website, size_in_mb))


def display_results(website, size_in_mb):
    if size_in_mb < 2.0:
        print("OK. Total Network Data Received size for {}: {}MB".format(
            website, size_in_mb)
        )
        sys.exit(EXIT_OK)
    elif size_in_mb >= 2.0 and size_in_mb < 4.0:
        # The original snippet was missing a closing parenthesis here.
        print("Warning. Total Network Data Received size for {}: {}MB".format(
            website, size_in_mb)
        )
        sys.exit(EXIT_WARNING)
    elif size_in_mb >= 4.0:
        print("CRITICAL. Total Network Data Received size for {}: {}MB".format(
            website, size_in_mb)
        )
        sys.exit(EXIT_WARNING)


def main():
    parser = argparse.ArgumentParser(description="This script will measure size of a home page.")
    parser.add_argument("--site", default="somewebsite.com", required=True)
    parser.add_argument("--local", action="store_true", default=False)
    parser.add_argument("--datacentre", choices=["dc1", "dc2"])
    args = parser.parse_args()
    driver = get_driver(args.local)
    size_in_mb = get_homepage_size(driver=driver, website=args.site)
    if args.local:
        write_results_to_file(args.site, size_in_mb=size_in_mb)
    display_results(args.site, size_in_mb=size_in_mb)


if __name__ == "__main__":
    main()
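
Running a regex over `str(entry)` is fragile. As an alternative sketch, assuming each performance-log entry is a dict whose `message` value is a JSON string wrapping a DevTools event (the shape ChromeDriver produces; what the Firefox grid returns is not confirmed here), the byte counts can be read by parsing that JSON. `sum_encoded_bytes` and `bytes_to_mb` are hypothetical helper names. The narrow `except` clause is one possible answer to the TODO in `get_homepage_size`.

```python
import json


def sum_encoded_bytes(entries):
    """Sum encodedDataLength over all Network.dataReceived events.

    `entries` is the list returned by driver.get_log("performance");
    each entry is assumed to look like {"message": "<JSON string>", ...}.
    Entries that cannot be parsed are skipped rather than aborting the run.
    """
    total = 0
    for entry in entries:
        try:
            event = json.loads(entry["message"])["message"]
            if event["method"] != "Network.dataReceived":
                continue
            total += int(event["params"]["encodedDataLength"])
        except (KeyError, TypeError, ValueError):
            # KeyError/TypeError: the entry does not have the assumed shape.
            # ValueError: invalid JSON or a non-numeric length.
            continue
    return total


def bytes_to_mb(num_bytes):
    """Convert bytes to decimal megabytes, rounded to two places."""
    return round(num_bytes / 1000.0 / 1000.0, 2)
```

With a live driver this would be called as `bytes_to_mb(sum_encoded_bytes(driver.get_log("performance")))`. Dividing by `1000.0` also avoids the integer division that `float(sum(total_bytes) / 1000)` performs under Python 2.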
answered Jul 8, 2018 at 11:06
\$\endgroup\$
  • (as an aside, I redacted my code, and the Driver Datacentre GB1 / GB2 logic does make sense in reality - we have a grid in each :) Commented Jul 9, 2018 at 8:29
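
If each datacentre does have its own grid, the two identical branches could become a lookup table instead. A minimal sketch, assuming one hub per datacentre: the URLs below are placeholders (the real ones were redacted) and `hub_url_for` is a hypothetical helper.

```python
# Placeholder hub addresses; the real ones were redacted from the question.
HUB_URLS = {
    "dc1": "http://selenium-dc1.example/hub",
    "dc2": "http://selenium-dc2.example/hub",
}


def hub_url_for(datacentre):
    """Return the Selenium hub URL for a datacentre name."""
    try:
        return HUB_URLS[datacentre]
    except KeyError:
        raise ValueError("unknown datacentre: {!r}".format(datacentre))
```

The remote driver would then take `command_executor=hub_url_for(args.datacentre)`. This also turns a missing `--datacentre` into a clear error; in the question's code, omitting it on the non-local path leaves `driver` unassigned.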
