Version 1 - Beginner web scraper for Nagios

Current version changes:

- Moved `NAGIOS_DATA` dictionary to a separate file (and added it to `.gitignore`)
- Used functions with docstrings
- Removed the multiple redundant `print()` statements
- Actually read the PEP 8 standards, and renamed variables to match the requirements

Again, beginner Python programmer. I appreciate the feedback!
```python
import requests
from scraper import NAGIOS_DATA
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth, HTTPDigestAuth


def get_url_response(url, user, password, auth_type):
    """Get the response from a URL.

    Args:
        url (str): Nagios base URL
        user (str): Nagios username
        password (str): Nagios password
        auth_type (str): Nagios auth_type - Basic or Digest

    Returns: Response object
    """
    if auth_type == "Basic":
        return requests.get(url, auth=HTTPBasicAuth(user, password))
    return requests.get(url, auth=HTTPDigestAuth(user, password))


def main():
    """
    Main entry to the program
    """
    # for nagios_entry in ALL_NAGIOS_INFO:
    for url, auth_data in NAGIOS_DATA.items():
        user, password, auth_type = auth_data["user"], auth_data["password"], \
            auth_data["auth_type"]
        full_url = "{}/cgi-bin/status.cgi?host=all".format(url)
        response = get_url_response(full_url, user, password, auth_type)
        if response.status_code == 200:
            html = BeautifulSoup(response.text, "html.parser")
            for i, items in enumerate(html.select('td')):
                if i == 3:
                    hostsAll = items.text.split('\n')
                    hosts_up = hostsAll[12]
                    hosts_down = hostsAll[13]
                    hosts_unreachable = hostsAll[14]
                    hosts_pending = hostsAll[15]
                    hosts_problems = hostsAll[24]
                    hosts_types = hostsAll[25]
                if i == 12:
                    serviceAll = items.text.split('\n')
                    service_ok = serviceAll[13]
                    service_warning = serviceAll[14]
                    service_unknown = serviceAll[15]
                    service_critical = serviceAll[16]
                    service_problems = serviceAll[26]
                    service_types = serviceAll[27]
                # print(i, items.text)  # To get the index and text
            print_stats(
                user, url, hosts_up, hosts_down, hosts_unreachable,
                hosts_pending, hosts_problems, hosts_types, service_ok,
                service_warning, service_unknown, service_critical,
                service_problems, service_types)
            # print("Request returned:\n\n{}".format(html.text))
            # To get the full request


def print_stats(
        user, url, hosts_up, hosts_down, hosts_unreachable, hosts_pending,
        hosts_problems, hosts_types, service_ok, service_warning,
        service_unknown, service_critical, service_problems, service_types):
    print("""{}@{}:
Hosts
Up\tDown\tUnreachable\tPending\tProblems\tTypes
{}\t{}\t{}\t\t{}\t{}\t\t{}
Services
OK\tWarning\tUnknown\tCritical\tProblems\tTypes
{}\t{}\t{}\t{}\t\t{}\t\t{}""".format(
        user, url, hosts_up, hosts_down, hosts_unreachable, hosts_pending,
        hosts_problems, hosts_types, service_ok, service_warning,
        service_unknown, service_critical, service_problems, service_types))


if __name__ == '__main__':
    main()
```
`scraper.py` source:

```python
NAGIOS_DATA = {
    'http://192.168.0.5/nagios': {
        'user': 'nagiosadmin',
        'password': 'PasswordHere1',
        'auth_type': 'Basic'
    },
    'https://www.example.com/nagios': {
        'user': 'exampleuser',
        'password': 'P@ssw0rd2',
        'auth_type': 'Digest'
    },
}
```
1 Answer
There are still a couple of rogue non-PEP 8-compliant variable names: `serviceAll` and `hostsAll`.
This is a minor detail, but to avoid too much nesting I would suggest inverting this condition: `if response.status_code == 200:`. Then you can write it like this:

```python
if response.status_code != 200:
    continue  # or raise an exception
html = BeautifulSoup(response.text, "html.parser")
```

IMO, such code is much easier to read. These kinds of checks are also called guards (https://en.wikipedia.org/wiki/Guard_(computer_science)).
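To see the difference outside the scraping context, here is a minimal self-contained sketch (dummy dict responses standing in for real `requests` objects) comparing the nested and guarded versions of the same check:

```python
def process_nested(response):
    # Nested style: the happy path sits one indentation level deeper.
    if response["status"] == 200:
        return response["body"].upper()
    return None


def process_guarded(response):
    # Guard style: reject the bad case early and keep the happy path flat.
    if response["status"] != 200:
        return None
    return response["body"].upper()


ok = {"status": 200, "body": "services up"}
bad = {"status": 404, "body": ""}
print(process_nested(ok), process_guarded(ok))    # both return "SERVICES UP"
print(process_nested(bad), process_guarded(bad))  # both return None
```

Both functions behave identically; the guard version just reads top-to-bottom without indentation for the common case.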
Instead of iterating through all the `td` tags, I would store them in a list and then extract the necessary elements with an index:

```python
td_elements = list(html.select('td'))
hosts_all = td_elements[3].text.split('\n')
service_all = td_elements[12].text.split('\n')
```
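The effect on the extraction logic can be seen with a plain list standing in for the `<td>` texts (hypothetical values here; in the real script the indices come from the Nagios page layout):

```python
cells = ['nav', 'header', 'summary', 'host table', 'footer']  # stand-ins for td texts

# Original approach: loop over everything just to reach one index.
for i, cell in enumerate(cells):
    if i == 3:
        by_loop = cell

# Suggested approach: index the list directly.
by_index = cells[3]

print(by_loop == by_index)  # True: same element, no loop needed
```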
Next, I would like to focus on the `print_stats` function. It takes way too many parameters and has become tough to work with. I suggest storing all the variables you extract from the HTML in a dictionary, which you can then pass to the `print_stats` function.
```python
extracted_information = {
    'hosts_up': hosts_all[12],
    'hosts_down': hosts_all[13],
    'hosts_unreachable': hosts_all[14],
    'hosts_pending': hosts_all[15],
    'hosts_problems': hosts_all[24],
    'hosts_types': hosts_all[25],
    'service_ok': service_all[13],
    'service_warning': service_all[14],
    'service_unknown': service_all[15],
    'service_critical': service_all[16],
    'service_problems': service_all[26],
    'service_types': service_all[27],
}
```
Then you would call the `print_stats` function like this: `print_stats(user, url, extracted_information)`.
Of course, we now have to rewrite the `print_stats` function itself. The Python `format` method can also take named parameters. For example, `"{param1} and {param2}".format(param1="a", param2="b")` would return the string `"a and b"`. Using this, we can rewrite the template string and pass the "unpacked" `extracted_information` dictionary to the `format` method.
```python
def print_stats(user, url, extracted_information):
    template = """{user}@{url}:
Hosts
Up\tDown\tUnreachable\tPending\tProblems\tTypes
{hosts_up}\t{hosts_down}\t{hosts_unreachable}\t\t{hosts_pending}\t{hosts_problems}\t\t{hosts_types}
Services
OK\tWarning\tUnknown\tCritical\tProblems\tTypes
{service_ok}\t{service_warning}\t{service_unknown}\t{service_critical}\t\t{service_problems}\t\t{service_types}"""
    print(template.format(user=user, url=url, **extracted_information))
```
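As a quick self-contained check of the unpacking behaviour (placeholder values and a shortened template, not the full one above):

```python
extracted_information = {'hosts_up': '5', 'hosts_down': '1'}
template = "{user}@{url}: up={hosts_up} down={hosts_down}"

# ** unpacks the dictionary into keyword arguments for format().
line = template.format(user='nagiosadmin', url='http://example.com/nagios',
                       **extracted_information)
print(line)  # nagiosadmin@http://example.com/nagios: up=5 down=1
```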