
Intro

This simple script lets me check for a specific open port on a list of domains that I own. Instead of doing this check manually, I thought Python would be a good fit for the task.

After profiling my code, I found that check_for_open_ports() is really slow: it takes about 0:01:16.799242 (roughly 77 seconds) for 4 domains.

I wondered if there's a good / recommended way of improving this (maybe multithreading / multiprocessing). While asking for an answer which implements one of the above two methods is forbidden here, I wouldn't mind seeing one. I know that multiprocessing is meant for CPU-bound tasks, whereas this one is I/O-bound, which makes me believe I should go with a multithreading solution.


The code

from socket import gethostbyname, gaierror, error, socket, AF_INET, SOCK_STREAM
from sys import argv, exit
import re

DOMAINS_FILE = argv[1]
PORT = argv[2]
OUTPUT_FILE = argv[3]


def get_domains():
    """
    Return a list of domains from domains.txt
    """
    domains = []
    if len(argv) != 4:
        exit("Wrong number of arguments\n")
    try:
        with open(DOMAINS_FILE) as domains_file:
            for line in domains_file:
                domains.append(line.rstrip())
    except IOError:
        exit("First argument should be a file containing domains")
    return domains


def check_domain_format(domain):
    """
    This function removes the beginning of a domain if it starts with:
        www.
        http://
        http://www.
        https://
        https://www.
    """
    clear_domain = re.match(r"(https?://(?:www\.)?|www\.)(.*)", domain)
    if clear_domain:
        return clear_domain.group(2)
    return domain


def transform_domains_to_ips():
    """
    Return a list of ips specific to the domains in domains.txt
    """
    domains = get_domains()
    domains_ip = []
    for each_domain in domains:
        each_domain = check_domain_format(each_domain)
        try:
            domains_ip.append(gethostbyname(each_domain))
        except gaierror:
            print("Domain {} not ok. Skipping...\n".format(each_domain))
    return domains_ip


def check_for_open_ports():
    """
    Check for a specific opened PORT on all the domains from domains.txt
    """
    ips = transform_domains_to_ips()
    try:
        with open(OUTPUT_FILE, 'a') as output_file:
            for each_ip in ips:
                try:
                    sock = socket(AF_INET, SOCK_STREAM)
                    result = sock.connect_ex((each_ip, int(PORT)))
                    if result == 0:
                        output_file.write(each_ip + '\n')
                    sock.close()
                except error:
                    print("Couldn't connect to server")
    except KeyboardInterrupt:
        exit("You pressed CTRL + C. Will exit now...\n")


if __name__ == '__main__':
    check_for_open_ports()

A step further

After some checks, I realised that what was mainly slowing down the program can be improved by reducing the default timeout from the socket module using setdefaulttimeout(2).

Even though this solved part of the problem, I still don't find it to be the cleanest solution. Any advice related to performance is really welcome!
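
One alternative to the module-wide setdefaulttimeout(2) is to set the timeout on each socket, so that other sockets in the same process are unaffected. The following is only a minimal sketch under that assumption: the helper name is_port_open and its timeout parameter are hypothetical and not part of the script above.

```python
import socket


def is_port_open(ip, port, timeout=2.0):
    """Return True if a TCP connection to (ip, port) succeeds within `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # This timeout applies to this socket only, unlike socket.setdefaulttimeout().
    sock.settimeout(timeout)
    try:
        # connect_ex() returns 0 on success and an errno value on most failures.
        return sock.connect_ex((ip, port)) == 0
    except socket.error:
        # Assumed policy for this sketch: any error that is raised counts as "not open".
        return False
    finally:
        sock.close()
```

Because the timeout is a parameter, it can be tuned per call without touching global state.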


Extra info:

  • I'll probably use this only on Linux OSs
  • I've used Python 2.7.13

PS: I'd like you to ignore the fact that I didn't use optparse or argparse for parsing CLI arguments.

asked Feb 1, 2017 at 18:22
  • My question was first asked on SO, but I had second thoughts and, as discussed in chat, I came back home to CR. Commented Feb 1, 2017 at 19:13

1 Answer

  • First, a slight style note (IMHO, of course). You called your function check_domain_format, but it actually returns a modified string, and you use that result rather than just checking the input. I'd go for a name like validate_domain_format.

About it being slow:

  • Yes, multi-threading would help by checking multiple domains at once, but if that were the only problem, you could just write a separate Bash script that launches your Python script with different parameters.
  • You said that you own the domains, so I'm assuming you have raw socket capabilities. If that's the case, you can speed up your check by using a SYN check. You can have a look here; even though the question has been down-voted, it should give you the general idea. Here you can find that same check.

  • If you're doing this for educational purposes, that's OK; otherwise nmap will most likely do a better job, give you more options, and be faster (because the SYN check is already implemented, and you can also scan UDP ports, for example).
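
The multi-threading option from the first bullet could look roughly like the sketch below. It is an illustration, not the original script: it assumes Python 3 (on Python 2.7 the `futures` backport offers the same concurrent.futures API), and the names probe and scan, as well as the workers and timeout parameters, are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def probe(ip, port, timeout=2.0):
    """Return True if a TCP connection to (ip, port) succeeds within `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex() returns 0 on success and an errno value on most failures.
        return sock.connect_ex((ip, port)) == 0
    except socket.error:
        # Assumed policy for this sketch: any raised error counts as "not open".
        return False
    finally:
        sock.close()


def scan(ips, port, workers=20):
    """Check `port` on every IP in the list `ips` concurrently; return the IPs where it is open."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map() yields results in the same order as `ips`.
        results = list(executor.map(lambda ip: probe(ip, port), ips))
    return [ip for ip, is_open in zip(ips, results) if is_open]
```

Since each thread spends most of its time waiting on the network, the waits overlap: with four unresponsive hosts and a 2-second timeout, the total is roughly 2 seconds rather than roughly 8.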

answered Feb 1, 2017 at 22:09