3
\$\begingroup\$

I need to iterate over a list of IP addresses, to check if the elements inside are subnets/supernets between them, for this I use the IP-address library.

Currently I have the following problem: my second iterator (which I use to iterate over the rest of the list) always starts at 0 and therefore compares each element of the list twice (which I think can be avoided).

Here is part of the txt file from which I create the list:

Name 
10.104.181.0/23
10.104.180.0/24
10.132.83.112/32
10.104.183.0/24
10.104.185.0/24
10.104.185.32/24
174.28.164.30/24
178.38.233.0/24
10.104.186.0/24
10.132.208.0/24
10.104.185.42/24
10.130.217.88/32
10.104.24.108/24
10.134.213.84/32
10.134.205.18/32
10.104.182.145/32
10.123.232.95/32
10.130.236.0/24
10.105.26.245/32
10.134.65.250/32
10.134.222.120/32
10.104.184.62/32
10.104.184.61/32
10.105.121.0/24

And here is my python code:

import ipaddress
import sys
fileobj=open("test.txt")
for line in fileobj:
 lines.append(line.strip())#create the list from the txt file
lines.pop(0)#remove line "name"
for line in lines:
 for j in range(1,len(lines)):
 if(line==lines[j]):#problem here: j start always at 1
 print('comparing same line.... skipping')
 pass
 elif(ipaddress.ip_network(line,False).subnet_of(ipaddress.ip_network(lines[j],False))):#check if subnet
 print(line+' is subnet of network'+lines[j])
 elif(ipaddress.ip_network(line,False).supernet_of(ipaddress.ip_network(lines[j],False))):#check if supernet
 print(line+' is super of network'+lines[j])
 

What I'm looking for:

  1. How to solve the problem of my second iterator.
  2. Any other improvement (whatever it is, formatting, algorithm,...) is appreciated.
asked May 11, 2021 at 9:04
\$\endgroup\$

2 Answers 2

4
\$\begingroup\$

This section:

fileobj=open("test.txt")
for line in fileobj:
 lines.append(line.strip())#create the list from the txt file
lines.pop(0)#remove line "name"

has an alternate representation as an iterator-adjustment to skip one line:

with open('test.txt') as fileobj:
 next(fileobj)
 lines = [line.strip() for line in fileobj]

Also note use of a list comprehension and a context manager.

Your separate supernet/subnet checks are redundant. If A is a subnet of B, B is a supernet of A; so it's not even worth doing the second.

Algorithmically: if your list is very, very long you may want to consider moving away from your current nested loop. Your current loop is O(n^2) in time and I don't immediately see a way to avoid this . Consider doing something like - roughly -

  1. Populate a list of tuples of start and end IP address integers and IP objects; the packed integer representation can be obtained by casting an ipaddress object to int
  2. Sort this list; a naive .sort() should work
  3. On your outer loop, iterate over all entries for subnet selection
  4. Prune away any potential supernets whose end is so low that they will never match again
  5. On your inner loop, iterate from the beginning of the list up to the current entry and then quit.

This sortation and early-bail should improve your performance a little.

Example

Edit: "a little" seems to be a (vast) understatement; comparing the two:

from collections import OrderedDict
from functools import partial
from ipaddress import IPv4Network
from random import randrange, randbytes, seed
from timeit import timeit
from typing import Iterable, Tuple, Sequence
NetRow = Tuple[int, int, str]
IndexRow = Tuple[int, int]
def parse_addresses(filename: str) -> Iterable[NetRow]:
 with open(filename) as f:
 next(f)
 for line in f:
 net_str = line.strip()
 net = IPv4Network(net_str, strict=False)
 yield int(net.network_address), int(net.broadcast_address), net_str
def end_indices(nets: Sequence[NetRow]) -> Iterable[IndexRow]:
 for i, (start, end, net) in enumerate(nets):
 yield end, i
def new(filename: str):
 # Start and end integers and network objects ordered by start
 subnets = sorted(parse_addresses(filename))
 # Supernet dictionary by index from subnets; in same order as subnets
 supernets = OrderedDict(enumerate(subnets))
 # List of tuples of subnet end to subnet index, used for pruning
 ends = sorted(end_indices(subnets))
 for sub_start, sub_end, subnet in subnets:
 # If there are any supernets whose end occurs before the current start,
 # they need to be pruned because they will never match
 for n_to_drop, (end, index) in enumerate(ends):
 if end >= sub_start:
 break
 supernets.pop(index)
 del ends[:n_to_drop]
 for super_start, super_end, supernet in supernets.values():
 # Skip comparison to self
 if subnet is supernet:
 continue
 # If the current supernet start is after the current subnet start,
 # there will not be any more supernets that encompass this subnet
 if super_start > sub_start:
 break
 # The supernet start occurs at or before the current subnet start.
 # If the supernet end occurs at or after the current subnet end,
 # then the supernet encompasses the subnet.
 if super_end >= sub_end:
 # assert subnet.subnet_of(supernet)
 print(f'{subnet} is subnet of {supernet}')
def old(filename: str):
 lines = []
 fileobj = open(filename)
 for line in fileobj:
 lines.append(line.strip()) # create the list from the txt file
 lines.pop(0) # remove line "name"
 for line in lines:
 for j in range(1, len(lines)):
 if (line == lines[j]): # problem here: j start always at 1
 # print('comparing same line.... skipping')
 pass
 elif (IPv4Network(line, False).subnet_of(
 IPv4Network(lines[j], False))): # check if subnet
 print(line
 + ' is subnet of network ' + lines[j])
 #elif (IPv4Network(line, False).supernet_of(
 # IPv4Network(lines[j], False))): # check if supernet
 # print(line
 # + ' is super of network ' + lines[j])
def generate_test(filename: str, min_mask: int, rows: int):
 seed(0) # for repeatability
 with open(filename, 'w') as f:
 f.write('Name\n')
 for _ in range(rows):
 addr = '.'.join(str(b) for b in randbytes(4))
 mask = randrange(min_mask, 31)
 f.write(f'{addr}/{mask}\n')
if __name__ == '__main__':
 generate_test('bigtest.txt', min_mask=14, rows=1_000)
 for method, title in (
 (new, 'New'),
 (old, 'Old'),
 ):
 print(f'{title} method:')
 t = timeit(partial(method, 'bigtest.txt'), number=1)
 print(f'{t:.3f}s\n')

Output

New method:
50.100.190.198/30 is subnet of 50.100.216.175/16
68.143.197.241/26 is subnet of 68.142.21.93/15
87.88.133.166/17 is subnet of 87.88.222.120/17
87.88.222.120/17 is subnet of 87.88.133.166/17
101.186.112.235/19 is subnet of 101.186.155.104/14
106.183.253.213/22 is subnet of 106.180.121.90/14
110.142.177.23/29 is subnet of 110.140.246.97/14
110.206.109.247/20 is subnet of 110.205.222.149/14
125.43.157.205/28 is subnet of 125.42.59.132/15
138.157.204.243/29 is subnet of 138.157.172.230/14
158.221.173.230/30 is subnet of 158.221.69.71/14
239.186.245.174/18 is subnet of 239.185.216.72/14
243.3.157.156/28 is subnet of 243.3.56.96/16
0.029s
Old method:
87.88.222.120/17 is subnet of network 87.88.133.166/17
125.43.157.205/28 is subnet of network 125.42.59.132/15
106.183.253.213/22 is subnet of network 106.180.121.90/14
101.186.112.235/19 is subnet of network 101.186.155.104/14
158.221.173.230/30 is subnet of network 158.221.69.71/14
243.3.157.156/28 is subnet of network 243.3.56.96/16
87.88.133.166/17 is subnet of network 87.88.222.120/17
50.100.190.198/30 is subnet of network 50.100.216.175/16
110.206.109.247/20 is subnet of network 110.205.222.149/14
138.157.204.243/29 is subnet of network 138.157.172.230/14
110.142.177.23/29 is subnet of network 110.140.246.97/14
68.143.197.241/26 is subnet of network 68.142.21.93/15
239.186.245.174/18 is subnet of network 239.185.216.72/14
33.487s

A more "fair" comparison uses your own algorithm but with a pre-parse step:

def old(filename: str):
 lines = []
 with open(filename) as fileobj:
 next(fileobj)
 for line in fileobj:
 net_str = line.strip()
 lines.append((net_str, IPv4Network(net_str, False)))
 for subnet_str, subnet in lines:
 for supernet_str, supernet in lines:
 if subnet_str is supernet_str:
 continue
 if subnet.subnet_of(supernet):
 print(f'{subnet_str} is subnet of {supernet_str}')

This still took 1.87s on my machine, which is ~58x the time of the 32ms of the new method.

Comparing the new and fair-old methods for min_mask=20, rows=10_000 is more dramatic: 0.362s vs. 249.763s, ~690x faster.

answered May 11, 2021 at 14:56
\$\endgroup\$
0
3
\$\begingroup\$

To answer your question, use enumerate() in the first loop to get the index of each line. Then start the second loop at the next index:

for index, line in enumerate(lines):
 for j in range(index + 1, len(lines)):
 ...
answered May 12, 2021 at 18:09
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.