I built this scraper for work. It takes a CSV list of firewalls from our network management system and scans a given list of HTTPS ports to see whether the firewalls are accepting web requests on the management ports. I originally built it in PowerShell, but decided to rebuild it in Python for the learning experience.
I was able to cut down the scan time substantially using multiprocessing, but I'm wondering if I can further optimize my code to make it faster.
Also, I'm very new to Python, so any input on better, more efficient ways to accomplish these steps would be much appreciated.
```python
import urllib.request
import re
import os
import ssl
import multiprocessing
#imports a csv list of firewalls with both private and public IP addresses
f = open(r'\h.csv',"r")
if f.mode =="r":
    cont = f.read()
#regex to remove private ip addresses and then put the remaining public ip addresses in a list
c = re.sub(r"(172)\.(1[6-9]|2[0-9]|3[0-1])(\.(2[0-4][0-9]|25[0-5]|[1][0-9][0-9]|[1-9][0-9]|[0-9])){2}|(192)\.(168)(\.(2[0-4][0-9]|25[0-5]|[1][0-9][0-9]|[1-9][0-9]|[0-9])){2}|(10)(\.(2[0-4][0-4]|25[0-5]|[1][0-9][0-9]|[1-9][0-9]|[0-9])){3}","",cont)
d = re.findall(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}",c)
#uses HTTP requests to check if any of the 8 management ports on the addresses in the list are accepting web requests
def httpScan(list):
    iplen = len(list)
    ports = [443, 4433, 444, 433, 4343, 4444, 4443, 4434]
    portlen = len(ports)
    for k in range(iplen):
        for i in range(portlen):
            context = ssl._create_unverified_context()
            try:
                fp = urllib.request.urlopen("https://" + str(list[k]) + ":" + str(ports[i]), context=context)
                mybytes = fp.read()
                mystr = mybytes.decode("utf8")
                fp.close()
            except:
                continue
            if "SSLVPN" in mystr:
                print(list[k] + " SSLVPN" + ": " + str(ports[i]) + " " + str(k) + " " + str(os.getpid()))
            elif "auth1.html" in mystr:
                print(list[k] + " MGMT" + ": " + str(ports[i]) + " " + str(k))
#splits the list of IP addresses up based on how many CPU there are and adds each segment to a dictionary
cpu = int(multiprocessing.cpu_count())
sliced = int(len(d)/cpu)
mod = int(len(d))%int(cpu)
num = 1
lists = dict()
for i in range(cpu):
    if i != (cpu - 1):
        lists[i] = d[(num*sliced) - sliced:num*sliced]
        num += 1
    else:
        lists[i] = d[(num*sliced) - sliced:(num*sliced) + mod]
#starts a process for each unique segment created
t = dict()
if __name__ == "__main__":
    for i in range(cpu):
        t[i] = multiprocessing.Process(target=httpScan, args=(lists[i],))
        t[i].start()
```
1 Answer
Reading File
Here is a tip: when opening a file for reading, the "r" mode argument is optional because it is the default.
f = open(r'\h.csv',"r")
can be written as
f = open(r'\h.csv')
Your whole reading block can use context managers (blocks using the with keyword).
with open(r'\h.csv', encoding='utf8') as f:
    cont = f.read()
If you are dealing with a huge text file, you might do:
with open(r'\h.csv', encoding='utf8') as f:
    for ip in f:
        ip = ip.rstrip('\n')
        # ... verify ip here
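As a rough illustration of what the "verify" step could look like, here is a minimal sketch that filters out private addresses with the standard-library ipaddress module instead of a long regex. The function name public_addresses, the constant names, and the sample rows are hypothetical, and it assumes that "private" means only the three RFC 1918 ranges the original regex targets.

```python
import ipaddress
import re

# RFC 1918 private ranges (assumption: these are the only ranges to exclude).
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Loose pattern for anything shaped like a dotted quad; validated below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def public_addresses(lines):
    """Return the non-private IPv4 addresses found in an iterable of text lines."""
    found = []
    for line in lines:
        for candidate in IPV4_PATTERN.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # shaped like an address but not valid, e.g. 999.1.1.1
            if not any(address in network for network in PRIVATE_NETWORKS):
                found.append(candidate)
    return found


# Hypothetical CSV rows; a file object opened with "with open(...)" works the same way.
sample = ["fw1,10.0.0.5,8.8.8.8", "fw2,192.168.1.1,1.1.1.1", "fw3,172.16.4.2,999.1.1.1"]
print(public_addresses(sample))
```

Because the function accepts any iterable of lines, the open file can be passed in directly, so the whole file never has to be held in memory at once.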
String
Using string formatting, i.e. .format(), can give a better idea of what's going on. It also eliminates the use of str() each time.
We can change this
print(list[k] + " MGMT" + ": " + str(ports[i]) + " " + str(k))
to this:
print("{} MGMT: {} {}".format(list[k], ports[i], k))
and, from Python 3.6 onward, to an f-string by adding an f prefix:
print(f"{list[k]} MGMT: {ports[i]} {k}")
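To check that the three styles really are interchangeable, here is a small sketch that builds the same message each way. The values of address, port, and index are hypothetical samples, not data from the question.

```python
# Hypothetical sample values standing in for list[k], ports[i] and k.
address, port, index = "192.0.2.10", 4433, 7

# Original style: manual concatenation with explicit str() calls.
concatenated = address + " MGMT" + ": " + str(port) + " " + str(index)

# str.format() converts the arguments to text for you.
with_format = "{} MGMT: {} {}".format(address, port, index)

# f-string (Python 3.6+): the expressions sit inline in the template.
with_fstring = f"{address} MGMT: {port} {index}"

print(with_fstring)
```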
Loop Iteration
In many other languages, you need the index while looping to get the element at that index. Python provides a nice and intuitive way to loop over the elements directly.
The current implementation:
ports = [443, 4433, 444, 433, 4343, 4444, 4443, 4434]
portlen = len(ports)
for i in range(portlen):
    print(ports[i])
But the Pythonic way is:
ports = [443, 4433, 444, 433, 4343, 4444, 4443, 4434]
for port in ports:
    print(port)
Here, port gives you the element directly.
If you still want the index, use enumerate:
for i, port in enumerate(ports):
    print(i, port)
where i is the index.
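Applied to the nested loops in httpScan, the same idea might look like the sketch below, which builds every address and port combination without index variables. The helper name build_urls and the sample addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
def build_urls(addresses, ports):
    """Return one HTTPS URL per (address, port) pair, iterating over elements directly."""
    return [f"https://{address}:{port}" for address in addresses for port in ports]


# Hypothetical inputs; the real script would pass its list of public addresses.
urls = build_urls(["192.0.2.10", "192.0.2.11"], [443, 4433])

# enumerate() supplies a counter only where one is actually wanted.
for number, url in enumerate(urls, start=1):
    print(number, url)
```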
Miscellaneous
Here:
cpu = int(multiprocessing.cpu_count())
There is no need to cast to int, as multiprocessing.cpu_count() already returns an integer. You can verify this with type(multiprocessing.cpu_count()).
Normally, after calling .start(), you should also call .join(), as this makes the parent wait for all child processes to terminate before it exits.
for ...:
    ... .start()
for ...:
    ... .join()
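Spelled out, the start-then-join pattern could look like the following sketch. The names split_evenly, scan_chunk, and run are hypothetical, scan_chunk is only a placeholder for the real scan, and the stride-based split is one possible alternative to the manual slicing in the question rather than the original method.

```python
import multiprocessing


def split_evenly(items, parts):
    """Deal items into `parts` lists of near-equal length using slice strides."""
    return [items[i::parts] for i in range(parts)]


def scan_chunk(addresses):
    # Placeholder for the real scan: only reports how many addresses it received.
    name = multiprocessing.current_process().name
    print(f"{name} received {len(addresses)} addresses")


def run(addresses, workers=None):
    workers = workers or multiprocessing.cpu_count()
    processes = [
        multiprocessing.Process(target=scan_chunk, args=(chunk,))
        for chunk in split_evenly(addresses, workers)
    ]
    # Start every child first so they all run concurrently...
    for process in processes:
        process.start()
    # ...then wait for each one to finish before the parent continues.
    for process in processes:
        process.join()


if __name__ == "__main__":
    # Hypothetical addresses from the 192.0.2.0/24 documentation range.
    run([f"192.0.2.{host}" for host in range(1, 8)], workers=3)
```

Starting all the processes in one loop and joining them in a second loop matters: calling .join() right after each .start() would run the chunks one at a time.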
Thank you very much, this is exactly what I was looking for! – gdesigner1, Aug 6, 2019 at 19:12
You are welcome! – Abdur-Rahmaan Janhangeer, Aug 6, 2019 at 19:14