
Version 1 - Beginner web scraper for Nagios

Current version changes:

  • Moved NAGIOS_DATA dictionary to separate file (and added to .gitignore)
  • Used functions with DOCSTRINGS
  • Removed the multiple redundant print() statements
  • Actually read the PEP8 standards, and renamed variables to match the requirements

Again, beginner Python programmer. I appreciate the feedback!

import requests
from scraper import NAGIOS_DATA
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth, HTTPDigestAuth


def get_url_response(url, user, password, auth_type):
    """Get the response from a URL.

    Args:
        url (str): Nagios base URL
        user (str): Nagios username
        password (str): Nagios password
        auth_type (str): Nagios auth_type - Basic or Digest

    Returns: Response object
    """
    if auth_type == "Basic":
        return requests.get(url, auth=HTTPBasicAuth(user, password))
    return requests.get(url, auth=HTTPDigestAuth(user, password))


def main():
    """
    Main entry to the program
    """
    # for nagios_entry in ALL_NAGIOS_INFO:
    for url, auth_data in NAGIOS_DATA.items():
        user, password, auth_type = auth_data["user"], auth_data["password"], \
            auth_data["auth_type"]
        full_url = "{}/cgi-bin/status.cgi?host=all".format(url)
        response = get_url_response(full_url, user, password, auth_type)
        if response.status_code == 200:
            html = BeautifulSoup(response.text, "html.parser")
            for i, items in enumerate(html.select('td')):
                if i == 3:
                    hostsAll = items.text.split('\n')
                    hosts_up = hostsAll[12]
                    hosts_down = hostsAll[13]
                    hosts_unreachable = hostsAll[14]
                    hosts_pending = hostsAll[15]
                    hosts_problems = hostsAll[24]
                    hosts_types = hostsAll[25]
                if i == 12:
                    serviceAll = items.text.split('\n')
                    service_ok = serviceAll[13]
                    service_warning = serviceAll[14]
                    service_unknown = serviceAll[15]
                    service_critical = serviceAll[16]
                    service_problems = serviceAll[26]
                    service_types = serviceAll[27]
                # print(i, items.text)  # To get the index and text
            print_stats(
                user, url, hosts_up, hosts_down, hosts_unreachable,
                hosts_pending, hosts_problems, hosts_types, service_ok,
                service_warning, service_unknown, service_critical,
                service_problems, service_types)
            # print("Request returned:\n\n{}".format(html.text))
            # To get the full request


def print_stats(
        user, url, hosts_up, hosts_down, hosts_unreachable, hosts_pending,
        hosts_problems, hosts_types, service_ok, service_warning,
        service_unknown, service_critical, service_problems, service_types):
    print("""{}@{}:
    Hosts
    Up\tDown\tUnreachable\tPending\tProblems\tTypes
    {}\t{}\t{}\t\t{}\t{}\t\t{}
    Services
    OK\tWarning\tUnknown\tCritical\tProblems\tTypes
    {}\t{}\t{}\t{}\t\t{}\t\t{}""".format(
        user, url, hosts_up, hosts_down, hosts_unreachable, hosts_pending,
        hosts_problems, hosts_types, service_ok, service_warning,
        service_unknown, service_critical, service_problems, service_types))


if __name__ == '__main__':
    main()

scraper.py source:

NAGIOS_DATA = {
    'http://192.168.0.5/nagios': {
        'user': 'nagiosadmin',
        'password': 'PasswordHere1',
        'auth_type': 'Basic'
    },
    'https://www.example.com/nagios': {
        'user': 'exampleuser',
        'password': 'P@ssw0rd2',
        'auth_type': 'Digest'
    },
}
asked Jan 24, 2020 at 20:02

1 Answer


There are still a couple of rogue non-PEP8-compliant variable names: serviceAll and hostsAll.

This is a minor detail, but to avoid excessive nesting I would suggest inverting the condition if response.status_code == 200:. Then you can write it like this:

if response.status_code != 200:
    continue  # or raise an exception
html = BeautifulSoup(response.text, "html.parser")

IMO, such code is much easier to read. These kinds of checks are also called guards (https://en.wikipedia.org/wiki/Guard_(computer_science)).
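If you go the raise-an-exception route instead of continue, note that requests has a built-in helper: Response.raise_for_status() raises requests.HTTPError for any 4xx/5xx status code. A minimal sketch — the helper function below is my own illustration, not part of the original code:

```python
import requests


def body_or_none(response):
    """Return the response body, or None when the request failed."""
    try:
        # raise_for_status() raises requests.HTTPError for 4xx/5xx codes
        response.raise_for_status()
    except requests.HTTPError as err:
        print("Skipping Nagios instance: {}".format(err))
        return None
    return response.text
```

main() could then do `if body is None: continue` and keep the happy path at the top indentation level, exactly like the guard above.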

Instead of iterating through all the td tags, I would store them in a list and then extract the necessary elements with an index:

td_elements = list(html.select('td'))
hosts_all = td_elements[3].text.split('\n')
service_all = td_elements[12].text.split('\n')
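Two small caveats I would add here (my own observations, not tested against a real status page): select() already returns a list-like ResultSet, so the list() call is optional, and the magic indexes 3 and 12 deserve names. A sketch with made-up HTML standing in for the Nagios page:

```python
from bs4 import BeautifulSoup

HOST_SUMMARY_CELL = 3      # <td> holding the host totals on status.cgi
SERVICE_SUMMARY_CELL = 12  # <td> holding the service totals

# Tiny made-up table; a real status page has far more cells.
page = "<table><tr>{}</tr></table>".format("<td>n</td>" * 13)
td_elements = BeautifulSoup(page, "html.parser").select('td')  # already list-like
print(td_elements[HOST_SUMMARY_CELL].text, td_elements[SERVICE_SUMMARY_CELL].text)
```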

Next, I would like to focus on the print_stats function. It takes way too many parameters and has become tough to work with. I suggest storing all the variables you extract from the HTML in a dictionary, which you can then pass to the print_stats function.

extracted_information = {
    'hosts_up': hosts_all[12],
    'hosts_down': hosts_all[13],
    'hosts_unreachable': hosts_all[14],
    'hosts_pending': hosts_all[15],
    'hosts_problems': hosts_all[24],
    'hosts_types': hosts_all[25],
    'service_ok': service_all[13],
    'service_warning': service_all[14],
    'service_unknown': service_all[15],
    'service_critical': service_all[16],
    'service_problems': service_all[26],
    'service_types': service_all[27],
}

Then you would call the print_stats function like this: print_stats(user, url, extracted_information).

Of course, we now have to rewrite the print_stats function itself. The Python format method can also take named parameters. For example, "{param1} and {param2}".format(param1="a", param2="b") returns the string "a and b". Using this, we can rewrite the template string and pass the "unpacked" extracted_information dictionary to format.

def print_stats(user, url, extracted_information):
    template = """{user}@{url}:
    Hosts
    Up\tDown\tUnreachable\tPending\tProblems\tTypes
    {hosts_up}\t{hosts_down}\t{hosts_unreachable}\t\t{hosts_pending}\t{hosts_problems}\t\t{hosts_types}
    Services
    OK\tWarning\tUnknown\tCritical\tProblems\tTypes
    {service_ok}\t{service_warning}\t{service_unknown}\t{service_critical}\t\t{service_problems}\t\t{service_types}"""
    print(template.format(user=user, url=url, **extracted_information))
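One last aside that goes beyond the answer above: on Python 3.6+ an f-string can express the same interpolation without a separate format call, indexing the dictionary inline instead of unpacking it with **. The values below are placeholders, not real Nagios output:

```python
extracted_information = {'hosts_up': '5', 'hosts_down': '0'}
user, url = 'nagiosadmin', 'http://192.168.0.5/nagios'

# f-strings evaluate the expressions inside the braces directly,
# so dictionary entries can be indexed right in the template.
summary = f"{user}@{url}: Up={extracted_information['hosts_up']} Down={extracted_information['hosts_down']}"
print(summary)
```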
answered Jan 25, 2020 at 9:33
