\$\begingroup\$

This is a simple Python 3 program that checks whether a string is a valid IPv6 address, parses an IPv6 address into its hexadecimal and decimal values, and converts an integer to IPv6 format.

I am writing a web scraper. To reduce the time spent waiting on network I/O, I need to look up the DNS records of the target sites programmatically and update the hosts file accordingly, because of DNS poisoning by the GFW.

I use this API: 'https://www.robtex.com/dns-lookup/{website}' and extract the addresses with XPath. The scraped results contain both IPv4 and IPv6 addresses, and I would like to tell the two apart and validate them separately.

For IPv4 addresses I have written this simple regex:

'^((25[0-5]|2[0-4]\d|1?[1-9]\d?|0)\.){3}(25[0-5]|2[0-4]\d|1?[1-9]\d?|0)$'
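A quick exhaustive check over single octets is one way to sanity-test a pattern like this. The sketch below is an aside, not part of the original program: `OCTET_ORIGINAL` copies the alternation used above, while `OCTET_ALT`, `rejected_octets` and `ipv4_regex` are hypothetical names introduced here purely for illustration.

```python
import re

# Octet alternation copied from the regex above.
OCTET_ORIGINAL = r'25[0-5]|2[0-4]\d|1?[1-9]\d?|0'
# One possible alternative (an assumption, not the author's pattern):
# 250-255, 200-249, 100-199, then 0-99 without leading zeros.
OCTET_ALT = r'25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d'


def rejected_octets(octet_pattern: str) -> list[int]:
    """Return every value in 0..255 that the octet pattern fails to match."""
    compiled = re.compile(octet_pattern, re.ASCII)
    return [n for n in range(256) if compiled.fullmatch(str(n)) is None]


def ipv4_regex(octet_pattern: str) -> re.Pattern:
    """Build a dotted-quad pattern from a single-octet alternation."""
    return re.compile(
        rf'(?:(?:{octet_pattern})\.){{3}}(?:{octet_pattern})', re.ASCII
    )


print(rejected_octets(OCTET_ORIGINAL))  # the ten values 100 through 109
print(rejected_octets(OCTET_ALT))       # []
print(bool(ipv4_regex(OCTET_ALT).fullmatch('100.64.0.1')))  # True
```

With the original alternation, `1?[1-9]\d?` needs a non-zero digit right after the optional leading `1`, so octets 100 to 109 fall through every branch.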

It does the validation in one step, but IPv6 is another matter. After trying and failing many times to solve this with a regex, I gave up. I then searched for a regex that validates IPv6, and when I realized that the ones that work perfectly are way too long, I changed my approach.

An IPv6 address is an integer between 0 and 2 ^ 128 - 1 (340282366920938463463374607431768211455), represented as 32 hexadecimal digits and formatted as 8 fields of 4 hex digits separated by 7 colons.
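To make the integer/hex/field relationship concrete, here is a minimal sketch. It uses the standard library's `IPv6Address` only to obtain an integer to work with; the variable names are illustrative.

```python
from ipaddress import IPv6Address

n = int(IPv6Address('fe80::1'))

# 128 bits are exactly 32 hexadecimal digits.
hex_digits = f'{n:032x}'

# Cut the 32 digits into 8 fields of 4 digits each.
fields = [hex_digits[i:i + 4] for i in range(0, 32, 4)]

print(len(hex_digits))               # 32
print(':'.join(fields))              # fe80:0000:0000:0000:0000:0000:0000:0001
print(n == ((0xfe80 << 112) | 1))    # True
print(f'{2**128 - 1:x}' == 'f' * 32)  # True, the largest address is 32 f's
```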

Were it not for the shortening rules, IPv6 addresses could easily be validated with a regex.

There are two shortening rules that are used together. The first rule trims leading zeros in each field.

With only the first rule applied, an IPv6 address can still be verified using this regex:

'^([\da-fA-F]{1,4}\:){7}([\da-fA-F]{1,4})$'

But there is a second rule, which replaces one run of consecutive zero fields with '::' (this can be done only once per address). For example:

0:0:0:0:0:0:0:0 -> ::
0:0:0:0:0:0:0:1 -> ::1
fe80:0:0:0:0:0:0:1 -> fe80::1
fe80:0:0:0:0:1:0:1 -> fe80::1:0:1
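For reference, the standard library applies both rules through `IPv6Address.compressed` and reverses them through `IPv6Address.exploded`. The snippet below only replays the four examples above; it is not the approach taken in this post.

```python
from ipaddress import IPv6Address

examples = [
    '0:0:0:0:0:0:0:0',
    '0:0:0:0:0:0:0:1',
    'fe80:0:0:0:0:0:0:1',
    'fe80:0:0:0:0:1:0:1',
]

for text in examples:
    address = IPv6Address(text)
    # .compressed applies both shortening rules, .exploded undoes them.
    print(f'{text} -> {address.compressed} -> {address.exploded}')
```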

Because of the second rule, each of the eight positions where the omitted fields can occur requires at least one regex, so that is at least 8 regexes in total...
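One way around the regex explosion, sketched here as an alternative and not as the code that follows, is to expand '::' back into zero fields first and then apply the single rule-1 regex. `expand` and `is_ipv6_by_expansion` are hypothetical names, and the sketch deliberately ignores IPv4-style suffixes and scope IDs.

```python
import re
from typing import Optional

# Rule-1-only pattern: 8 fields of 1 to 4 hex digits, 7 colons.
FULL_FORM = re.compile(r'(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}')


def expand(ip: str) -> Optional[str]:
    """Replace a single '::' with explicit zero fields, or return None."""
    if ip.count('::') > 1:
        return None
    if '::' not in ip:
        return ip
    left, right = ip.split('::')
    left_fields = left.split(':') if left else []
    right_fields = right.split(':') if right else []
    missing = 8 - len(left_fields) - len(right_fields)
    if missing < 1:  # '::' must stand for at least one field
        return None
    return ':'.join(left_fields + ['0'] * missing + right_fields)


def is_ipv6_by_expansion(ip: str) -> bool:
    expanded = expand(ip)
    return expanded is not None and FULL_FORM.fullmatch(expanded) is not None


print(expand('fe80::1'))              # fe80:0:0:0:0:0:0:1
print(is_ipv6_by_expansion('f:::f'))  # False
```

Stray colons survive the expansion as empty fields, so the final regex still rejects inputs such as 'f:::f'.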

So I have written my own code, and here it is:

import re
from ipaddress import ip_address, IPv6Address
from typing import Iterable

COLONS = {':', '::'}
DIGITS = frozenset('0123456789ABCDEFabcdef')
MAX_IPV6 = 2**128 - 1
ALL_ZEROS = '0:'*7 + '0'
LEADING = re.compile('^(0:)+')
TRAILING = re.compile('(:0)+$')
MIDDLE = re.compile('(:0)+')


def is_ipv6(ip: str) -> bool:
    if not isinstance(ip, str):
        raise TypeError('Argument must be an instance of str')
    if not ip or len(ip) > 39:
        return False
    first = True
    digits = 0
    colons = 0
    fields = 0
    compressed = False
    for i in ip:
        if i == ':':
            digits = 0
            first = False
            colons += 1
            if colons == 2:
                if not compressed:
                    compressed = True
                else:
                    return False
            elif colons > 2:
                return False
        else:
            if i not in DIGITS:
                return False
            digits += 1
            if digits > 4:
                return False
            if colons or first:
                first = False
                colons = 0
                fields += 1
                if fields > 8 - compressed:
                    return False
    if (fields == 8 and colons != 1) or compressed:
        return True
    return False


def split_ipv6(ip: str) -> Iterable[str]:
    if not isinstance(ip, str):
        raise TypeError('Argument must be an instance of str')
    buffer = ''
    chunks = []
    digits = 0
    colons = 0
    fields = 0
    compressed = False
    for i in ip:
        tail = True
        if i == ':':
            if digits:
                chunks.append(buffer)
            digits = 0
            colons += 1
            if colons == 2:
                if not compressed:
                    compressed = True
                    tail = False
                else:
                    return False
            if colons > 2:
                return False
        else:
            if i not in DIGITS:
                return False
            digits += 1
            if digits > 4:
                return False
            if colons or not buffer:
                if colons:
                    chunks.append(':' * colons)
                colons = 0
                fields += 1
                buffer = i
                if fields > 8 - compressed:
                    return False
            else:
                buffer += i
    if tail:
        chunks.append(buffer)
    else:
        chunks.append('::')
    if (fields == 8 and colons != 1) or compressed:
        return chunks
    return False


def parse_ipv6(ip: str) -> dict:
    segments = split_ipv6(ip)
    if not segments:
        raise ValueError('Argument is not a valid IPv6 address')
    compressed = False
    empty_fields = None
    if '::' in segments:
        fields = ['0'] * 8
        compressed = True
        cut = segments.index('::')
        left = [i for i in segments[:cut] if i not in COLONS]
        right = [i for i in segments[cut+1:] if i not in COLONS]

        fields[8-len(right):] = right
        fields[:len(left)] = left
        empty_fields = [i for i, f in enumerate(fields) if f == '0']
    else:
        fields = [c for c in segments if c not in COLONS]
    digits = [int(f, 16) for f in fields]
    # decimal = sum(d * 65536 ** i for i, d in enumerate(digits[::-1]))
    hexadecimal = '0x' + ''.join(i.zfill(4) for i in fields)
    decimal = int(hexadecimal, 16)
    parsed = {
        'segments': segments,
        'fields': fields,
        'digits': digits,
        'hexadecimal': hexadecimal,
        'decimal': decimal,
        'compressed': compressed,
        'empty fields': empty_fields
    }
    return parsed


def trim_left(s: str) -> str:
    return s[:-1].lstrip('0') + s[-1]


def to_ipv6(n: int, compress=True) -> str:
    if not isinstance(n, int):
        raise TypeError('Argument should be an instance of int')
    if not 0 <= n <= MAX_IPV6:
        raise ValueError('Argument is not in the valid range that IPv6 can represent')
    hexa = hex(n).removeprefix('0x').zfill(32)
    ipv6 = ':'.join(hexa[i:i+4] for i in range(0, 32, 4))
    if compress:
        ipv6 = ':'.join(trim_left(i) for i in ipv6.split(':'))
        if ipv6 == ALL_ZEROS:
            return '::'
        elif LEADING.match(ipv6):
            return LEADING.sub('::', ipv6)
        elif TRAILING.search(ipv6):
            return TRAILING.sub('::', ipv6)
        ipv6 = MIDDLE.sub(':', ipv6, 1)
    return ipv6


if __name__ == '__main__':
    test_cases = [
        (42540766411282592856904265327123268393, '2001:db8::ff00:42:8329'),
        (42540766411282592875278671431329809193, '2001:db8::ff00:0:42:8329'),
        (0, '::'),
        (1, '::1'),
        (5192296858534827628530496329220096, '1::'),
        (338288524927261089654018896841347694593, 'fe80::1'),
        (160289081533862935099527363545323831451, '7896:8ddf:4b26:f07f:a4cd:65de:ee90:809b'),
        (264029623924138153874706093713361856950, 'c6a2:4182:24b2:20f3:2d00:d2bb:3619:e9b6'),
        (155302777326544552126794348175886719955, '74d6:3a18:151d:948f:d13e:4d87:4fed:1bd3'),
        (152846031713612901234538066636429037612, '72fd:132e:fe1d:d05c:27d0:6001:a05f:902c'),
        (21824427460045008308753734783456952407, '106b:3b59:a20b:25dc:61b9:698e:d1e:c057'),
        (267115622348742355941753354636068900005, 'c8f4:98fa:50b3:e935:2bc9:25b0:593b:cca5'),
        (16777215, '::ff:ffff'),
        (3232235777, '::c0a8:101'),
        (4294967295, '::ffff:ffff'),
        (2155905152, '::8080:8080'),
        (18446744073709551615, '::ffff:ffff:ffff:ffff'),
        (18446744073709551616, '::1:0:0:0:0')
    ]
    for number, ip in test_cases:
        assert to_ipv6(number) == ip
        assert parse_ipv6(ip)['decimal'] == number

As you can see, split_ipv6 validates the input and splits it into chunks in the same loop. I could have used a regex to split the input, but that isn't much faster than the manual approach, and I figure that with a regex I would need at least one more for loop to validate the result. With this approach I save the cost of one loop and can return early if the input is invalid.

As for the other two functions: after I had written the first, I just couldn't help myself.

All functions work properly on every input I have tested, and everything returns what I assume to be correct. In particular, addresses like '1::' are treated as valid; I don't know for certain that such syntax exists, but I assume it does.
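One way to probe behaviour on edge cases such as '1::' is to compare a validator against the standard library on a handful of awkward inputs. This is only a sketch: `reference_is_ipv6`, `EDGE_CASES` and `disagreements` are hypothetical names, and the list of cases is a hand-picked assumption, not an exhaustive test suite.

```python
from ipaddress import AddressValueError, IPv6Address


def reference_is_ipv6(text: str) -> bool:
    """Standard-library verdict, used here as the reference."""
    try:
        IPv6Address(text)
    except AddressValueError:
        return False
    return True


# Hand-picked edge cases (an assumption about what is worth probing).
EDGE_CASES = [
    '1::', '::',
    ':1:2:3:4:5:6:7:8',   # single leading colon
    '1:2:3:4:5:6:7:8:',   # single trailing colon
    ':1::2', '1::2:',     # stray colon next to a valid '::'
    '1:::2', '1::2::3',
    '12345::', 'g::',
]


def disagreements(validator, cases=EDGE_CASES):
    """Return the inputs on which validator and the reference differ."""
    return [t for t in cases if bool(validator(t)) != reference_is_ipv6(t)]


print([t for t in EDGE_CASES if reference_is_ipv6(t)])  # ['1::', '::']
```

Calling `disagreements(is_ipv6)` with the function above lists any inputs where the two verdicts differ. From reading the loop, the entries with a single leading or trailing colon look like the ones to watch, since nothing in it obviously rejects them.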

How can my code be improved?


Update

I have written a new function that does only the validation and therefore is much faster.

And why did I write these functions in the first place? Why don't I just use the ipaddress library?

Well, I am reinventing the wheel and I have good reasons to do it.

The library code is not perfect. For one thing, I need to validate IPv6 addresses, which means the strings might be invalid, and if a string is not a valid IP address, ipaddress.ip_address raises ValueError, so I have to use try/catch clauses...

In [690]: ipaddress.ip_address('100')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-690-10b17acc39c6> in <module>
----> 1 ipaddress.ip_address('100')
C:\Program Files\Python39\lib\ipaddress.py in ip_address(address)
 51 pass
 52
---> 53 raise ValueError('%r does not appear to be an IPv4 or IPv6 address' %
 54 address)
 55
ValueError: '100' does not appear to be an IPv4 or IPv6 address

And it validates both IPv4 and IPv6 addresses so I have to use isinstance checks...

My custom function only validates IPv6 addresses and doesn't raise exceptions, so there is not any extra step needed.

Secondly it is much slower than my functions:

Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.28.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
 ...:     if compress:
 ...:         ipv6 = ':'.join(trim_left(i) for i in ipv6.split(':'))
 ...:         if ipv6 == ALL_ZEROS:
 ...:             return '::'
 ...:
 ...:         elif LEADING.match(ipv6):
 ...:             return LEADING.sub('::', ipv6)
 ...:
 ...:         elif TRAILING.search(ipv6):
 ...:             return TRAILING.sub('::', ipv6)
 ...:
 ...:         ipv6 = MIDDLE.sub(':', ipv6, 1)
 ...:
 ...:     return ipv6
 ...:
 ...:
 ...: def ipaddress_test(s):
 ...:     try:
 ...:         return isinstance(ip_address(s), IPv6Address)
 ...:     except ValueError:
 ...:         return False
 ...:
 ...: if __name__ == '__main__':
 ...:     test_cases = [
 ...:         (42540766411282592856904265327123268393, '2001:db8::ff00:42:8329'),
 ...:         (42540766411282592875278671431329809193, '2001:db8::ff00:0:42:8329'),
 ...:         (0, '::'),
 ...:         (1, '::1'),
 ...:         (5192296858534827628530496329220096, '1::'),
 ...:         (338288524927261089654018896841347694593, 'fe80::1'),
 ...:         (160289081533862935099527363545323831451, '7896:8ddf:4b26:f07f:a4cd:65de:ee90:809b'),
 ...:         (264029623924138153874706093713361856950, 'c6a2:4182:24b2:20f3:2d00:d2bb:3619:e9b6'),
 ...:         (155302777326544552126794348175886719955, '74d6:3a18:151d:948f:d13e:4d87:4fed:1bd3'),
 ...:         (152846031713612901234538066636429037612, '72fd:132e:fe1d:d05c:27d0:6001:a05f:902c'),
 ...:         (21824427460045008308753734783456952407, '106b:3b59:a20b:25dc:61b9:698e:d1e:c057'),
 ...:         (267115622348742355941753354636068900005, 'c8f4:98fa:50b3:e935:2bc9:25b0:593b:cca5'),
 ...:         (16777215, '::ff:ffff'),
 ...:         (3232235777, '::c0a8:101'),
 ...:         (4294967295, '::ffff:ffff'),
 ...:         (2155905152, '::8080:8080'),
 ...:         (18446744073709551615, '::ffff:ffff:ffff:ffff'),
 ...:         (18446744073709551616, '::1:0:0:0:0')
 ...:     ]
 ...:     for number, ip in test_cases:
 ...:         assert to_ipv6(number) == ip
 ...:         assert parse_ipv6(ip)['decimal'] == number
In [2]: ip_address('2001:0db8:0000:0000:ff00:0000:0042:8329')
Out[2]: IPv6Address('2001:db8::ff00:0:42:8329')
In [3]: type(ip_address('2001:0db8:0000:0000:ff00:0000:0042:8329')) == IPv6Address
Out[3]: True
In [4]: %timeit ipaddress_test('2001:0db8:0000:0000:ff00:0000:0042:8329')
12.2 μs ± 685 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: %timeit is_ipv6('2001:0db8:0000:0000:ff00:0000:0042:8329')
6.01 μs ± 557 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: %timeit ipaddress_test('2001:0db8:0000:0000:ff00:0000:0042:8329')
12.3 μs ± 657 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [7]: %timeit parse_ipv6('2001:0db8:0000:0000:ff00:0000:0042:8329')
15.2 μs ± 302 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit parse_ipv6('2001:db8::ff00:0:42:8329')
14.6 μs ± 629 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: %timeit is_ipv6('2001:db8::ff00:0:42:8329')
4.02 μs ± 561 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [10]: %timeit ipaddress_test('2001:db8::ff00:0:42:8329')
10.4 μs ± 204 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [11]: %%timeit
 ...: for number, ip in test_cases:
 ...:     assert is_ipv6(ip) == True
62.5 μs ± 5.24 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [12]: %%timeit
 ...: for number, ip in test_cases:
 ...:     assert ipaddress_test(ip) == True
169 μs ± 4.58 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [13]: is_ipv6('f:'*7)
Out[13]: False
In [14]: is_ipv6('f:'*8)
Out[14]: False
In [15]: is_ipv6('f:'*9)
Out[15]: False
In [16]: is_ipv6('f:'*7+':')
Out[16]: True
In [17]: is_ipv6('f:'*7+'f')
Out[17]: True
In [18]: %timeit is_ipv6('f:'*7)
2.57 μs ± 392 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [19]: %timeit is_ipv6('f:'*8)
2.9 μs ± 468 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [20]: %timeit is_ipv6('f:'*9)
3.06 μs ± 368 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [21]: %timeit is_ipv6('f:'*7+':')
2.76 μs ± 468 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [22]: ipaddress_test('f:'*7)
Out[22]: False
In [23]: ipaddress_test('f:'*8)
Out[23]: False
In [24]: ipaddress_test('f:'*9)
Out[24]: False
In [25]: ipaddress_test('f:'*7+'f')
Out[25]: True
In [26]: %timeit ipaddress_test('f:'*7)
5.88 μs ± 684 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [27]: %timeit ipaddress_test('f:'*8)
5.87 μs ± 753 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [28]: %timeit ipaddress_test('f:'*9)
5.14 μs ± 770 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [29]: %timeit ipaddress_test('f:'*7+'f')
11.1 μs ± 400 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [30]: ipaddress_test('f::')
Out[30]: True
In [31]: ipaddress_test('::f')
Out[31]: True
In [32]: ipaddress_test('f::f')
Out[32]: True
In [33]: ipaddress_test('f:::f')
Out[33]: False
In [34]: %timeit ipaddress_test('f::')
5.53 μs ± 639 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [35]: %timeit ipaddress_test('::f')
5.54 μs ± 552 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [36]: %timeit ipaddress_test('f:::f')
5.25 μs ± 581 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [37]: %timeit ipaddress_test('100')
4.52 μs ± 591 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [38]: is_ipv6('100')
Out[38]: False
In [39]: %timeit is_ipv6('100')
747 ns ± 34.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [40]: %timeit is_ipv6('f::')
741 ns ± 60.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [41]: %timeit is_ipv6('::f')
735 ns ± 43 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [42]: %timeit is_ipv6('f:::f')
864 ns ± 49.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [43]: %timeit is_ipv6('windows')
278 ns ± 38.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [44]: %timeit ipaddress_test('windows')
4.62 μs ± 749 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [45]:

As you can see, the library code is nowhere near the speed of my functions, and in particular my custom functions spot invalid inputs much, much faster than the library code...

Have I made myself clear?


Again, IPv6Address is not fast enough:

In [107]: from ipaddress import AddressValueError
In [108]: IPv6Address('100')
---------------------------------------------------------------------------
AddressValueError Traceback (most recent call last)
<ipython-input-108-46a502d0274c> in <module>
----> 1 IPv6Address('100')
C:\Program Files\Python39\lib\ipaddress.py in __init__(self, address)
 1916 addr_str, self._scope_id = self._split_scope_id(addr_str)
 1917
-> 1918 self._ip = self._ip_int_from_string(addr_str)
 1919
 1920 def __str__(self):
C:\Program Files\Python39\lib\ipaddress.py in _ip_int_from_string(cls, ip_str)
 1629 if len(parts) < _min_parts:
 1630 msg = "At least %d parts expected in %r" % (_min_parts, ip_str)
-> 1631 raise AddressValueError(msg)
 1632
 1633 # If the address has an IPv4-style suffix, convert it to hexadecimal.
AddressValueError: At least 3 parts expected in '100'
In [109]: def IPv6Address_test(s):
 ...:     try:
 ...:         IPv6Address(s)
 ...:         return True
 ...:     except AddressValueError:
 ...:         return False
In [110]: %timeit IPv6Address_test('100')
2.13 μs ± 33.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [111]: IPv6Address_test('100::')
Out[111]: True
In [112]: IPv6Address_test('255.255.255.255')
Out[112]: False
In [113]: valid_long = [
 ...: '2001:0db8:0000:0000:ff00:0000:0042:8329',
 ...: '2001:db8::ff00:0:42:8329',
 ...: 'f:'*7+':',
 ...: 'f:'*7+'f',
 ...: '7896:8ddf:4b26:f07f:a4cd:65de:ee90:809b',
 ...: 'c6a2:4182:24b2:20f3:2d00:d2bb:3619:e9b6',
 ...: '74d6:3a18:151d:948f:d13e:4d87:4fed:1bd3',
 ...: '72fd:132e:fe1d:d05c:27d0:6001:a05f:902c',
 ...: '106b:3b59:a20b:25dc:61b9:698e:d1e:c057',
 ...: 'c8f4:98fa:50b3:e935:2bc9:25b0:593b:cca5',
 ...: '2001:db8::ff00:42:8329',
 ...: '2001:db8::ff00:0:42:8329'
 ...: ]
In [114]: valid_short = [
 ...: '::',
 ...: '::1',
 ...: '1::',
 ...: 'fe80::1',
 ...: '::ff:ffff',
 ...: '::c0a8:101',
 ...: '::ffff:ffff',
 ...: '::8080:8080',
 ...: '::ffff:ffff:ffff:ffff',
 ...: '::1:0:0:0:0',
 ...: 'f::',
 ...: '::f',
 ...: 'f::f'
 ...: ]
In [115]: invalid = [
 ...: '100',
 ...: 'windows',
 ...: 'intelligence',
 ...: 'this is not an IPv6 address',
 ...: 'esperanza',
 ...: 'hispana',
 ...: 'esperanta',
 ...: '255.255.255.255',
 ...: '192.168.1.1',
 ...: '127.0.0.1',
 ...: '151.101.129.69',
 ...: 'f:::f',
 ...: 'f:'*7,
 ...: 'f:'*8,
 ...: 'f:'*9
 ...: ]
In [116]: %timeit for i in valid_long: assert is_ipv6(i) == True
65 μs ± 7.37 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [117]: %timeit for i in valid_long: assert IPv6Address_test(i) == True
122 μs ± 5.68 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [118]: %timeit for i in valid_short: assert is_ipv6(i) == True
19.8 μs ± 485 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [119]: %timeit for i in valid_short: assert IPv6Address_test(i) == True
59.4 μs ± 6.72 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [120]: %timeit for i in invalid: assert is_ipv6(i) == False
16.6 μs ± 650 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [121]: %timeit for i in invalid: assert IPv6Address_test(i) == False
38.3 μs ± 7.13 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [122]: %timeit is_ipv6('::')
526 ns ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [123]: is_ipv6('::')
Out[123]: True
In [124]: %timeit is_ipv6('::1')
739 ns ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [125]: %timeit is_ipv6('1::')
741 ns ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [126]: %timeit is_ipv6('100')
761 ns ± 43.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [127]: %timeit IPv6Address_test('::')
2.65 μs ± 468 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [128]: %timeit IPv6Address_test('::1')
3.38 μs ± 687 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [129]: %timeit IPv6Address_test('1::')
3.4 μs ± 684 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [130]: %timeit IPv6Address_test('100')
2.13 μs ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [131]: %timeit IPv6Address_test('g')
2.11 μs ± 29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [132]: %timeit is_ipv6('g')
289 ns ± 54.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [133]:
asked Nov 22, 2021 at 14:12
\$\endgroup\$
  • \$\begingroup\$ Have you considered using ipaddress? In particular, the conversion between integer and string. \$\endgroup\$ Commented Nov 22, 2021 at 16:45
  • \$\begingroup\$ I agree with Marc that using the ipaddress library is the way to go. If you want to have a look at a working regex parser, it looks like this: regex101.com/r/cT0hV4/5 (shudders). \$\endgroup\$ Commented Nov 22, 2021 at 17:17

1 Answer

\$\begingroup\$

"I have written this simple regex"

Ish. You haven't written it in a very simple way. Write it on multiple lines, add comments.
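As a rough illustration of what "multiple lines, with comments" could look like, here is a sketch using `re.VERBOSE`. The name `IPV4` is made up for this example, and the octet alternation is my own layout, not a copy of yours: it has an explicit `1\d\d` branch for 100-199.

```python
import re

IPV4 = re.compile(
    r'''
    (?:                 # three octets, each followed by a dot
        (?: 25[0-5]     #   250-255
          | 2[0-4]\d    #   200-249
          | 1\d\d       #   100-199
          | [1-9]?\d    #   0-99, no leading zero
        )
        \.
    ){3}
    (?: 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d )   # final octet
    ''',
    re.VERBOSE | re.ASCII,
)

print(bool(IPV4.fullmatch('192.168.1.1')))  # True
print(bool(IPV4.fullmatch('256.1.1.1')))    # False
```

Using `fullmatch` removes the need for `^` and `$`, and `re.ASCII` keeps `\d` from matching non-ASCII digits.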

65536 ** i seems like a less obvious way of accomplishing 1 bit-shifted left by 16*i.

This:

 hexadecimal = '0x' + ''.join(i.zfill(4) for i in fields)
 decimal = int(hexadecimal, 16)

joins the fields in the string domain, but I think it would make more sense to join them in the integer domain - i.e. using bit-shift operations. Even if you did want to join them in the string domain, there's no need to prepend 0x.
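A minimal sketch of what joining in the integer domain could look like, assuming `fields` holds the eight hex strings that parse_ipv6 produces. The `IPv6Address` call is there only to cross-check the result.

```python
from functools import reduce
from ipaddress import IPv6Address

fields = ['2001', 'db8', '0', '0', 'ff00', '0', '42', '8329']
digits = [int(field, 16) for field in fields]

# Shift the accumulated value left by one 16-bit field, then OR in the next.
value = reduce(lambda acc, digit: (acc << 16) | digit, digits, 0)

# Equivalent to the commented-out sum, with 65536 ** i written as a shift.
same = sum(d << (16 * i) for i, d in enumerate(reversed(digits)))

print(value == same)                                          # True
print(value == int(IPv6Address('2001:db8::ff00:0:42:8329')))  # True
```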

parse_ipv6 currently returns a weak type - a dict - and should prefer something like a named tuple instead.
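Something along these lines, as a sketch: `ParsedIPv6` is a hypothetical name, and the members simply mirror the keys of the current dict (with 'empty fields' renamed to a valid identifier).

```python
from typing import NamedTuple, Optional


class ParsedIPv6(NamedTuple):
    segments: list[str]
    fields: list[str]
    digits: list[int]
    hexadecimal: str
    decimal: int
    compressed: bool
    empty_fields: Optional[list[int]]


# Hand-built value for '::1', standing in for what parse_ipv6 would return.
example = ParsedIPv6(
    segments=['::', '1'],
    fields=['0'] * 7 + ['1'],
    digits=[0] * 7 + [1],
    hexadecimal='0x' + '0' * 31 + '1',
    decimal=1,
    compressed=True,
    empty_fields=list(range(7)),
)

print(example.decimal, example.compressed)  # 1 True
```

Callers then get attribute access and a fixed, documented set of members, where a dict would accept any misspelled key silently.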

It's an odd choice to have your accepted test case values in decimal. Hexadecimal literals like 0xFFFF_FFFF will be more obviously correct than 4294967295.
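For instance, the shorter test cases could be written as below. This is a sketch that checks only the string-to-integer direction, and it uses `IPv6Address` as the parser so that it runs on its own.

```python
from ipaddress import IPv6Address

test_cases = [
    (0x0, '::'),
    (0x1, '::1'),
    (0xFF_FFFF, '::ff:ffff'),
    (0xC0A8_0101, '::c0a8:101'),
    (0xFFFF_FFFF, '::ffff:ffff'),
    (0x8080_8080, '::8080:8080'),
    (0xFFFF_FFFF_FFFF_FFFF, '::ffff:ffff:ffff:ffff'),
    (0x1_0000_0000_0000_0000, '::1:0:0:0:0'),
]

for number, text in test_cases:
    # Each 4-digit group of the literal lines up with one field of the address.
    assert int(IPv6Address(text)) == number
```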

"I am reinventing the wheel and I have good reasons to do it."

I beg to differ, but let's dig into it:

"if the string is not a valid IP address, ipaddress.ip_address raises ValueError so I have to use try/catch clauses"

That's a feature, not a bug. Thinking about the typical consumers of an IP parsing routine, well-written code would make better use of exceptions than a boolean value, and doing an except is trivial if needed.
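As a sketch of how a consumer might use the exception, assuming the goal is to sort scraped strings into IPv4, IPv6 and rejects (`sort_scraped` is a hypothetical helper, not something in your code):

```python
from ipaddress import ip_address


def sort_scraped(candidates):
    """Split scraped strings into IPv4 addresses, IPv6 addresses and rejects."""
    v4, v6, rejected = [], [], []
    for text in candidates:
        try:
            address = ip_address(text)
        except ValueError:
            rejected.append(text)
            continue
        # .version is 4 or 6, so no isinstance check is needed.
        (v4 if address.version == 4 else v6).append(address)
    return v4, v6, rejected


v4, v6, rejected = sort_scraped(['151.101.129.69', 'fe80::1', 'windows', '100'])
print(v4, v6, rejected)
```

The caller ends up with parsed address objects rather than a bare True/False, which is usually what the next step needs anyway.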

"And it validates both IPv4 and IPv6 addresses so I have to use isinstance checks..."

That's because you're using it wrong. You should not be calling ip_address, and instead should directly construct an IPv6Address.

"it is much slower than my functions"

First, a fair comparison would only use IPv6Address() instead of forcing ip_address to try parsing an IPv4 address with a guaranteed failure.
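A sketch of such a comparison, with the two wrappers side by side. The function names and the sample list are made up for illustration, and the timings will vary by machine, so none are quoted here.

```python
import timeit
from ipaddress import AddressValueError, IPv6Address, ip_address


def via_ip_address(text: str) -> bool:
    """Tries IPv4 first, then IPv6, then checks the resulting type."""
    try:
        return isinstance(ip_address(text), IPv6Address)
    except ValueError:
        return False


def via_ipv6address(text: str) -> bool:
    """Constructs an IPv6Address directly, skipping the IPv4 attempt."""
    try:
        IPv6Address(text)
    except AddressValueError:
        return False
    return True


samples = ['2001:db8::ff00:0:42:8329', '::1', '100', 'f:::f', '127.0.0.1']

for validator in (via_ip_address, via_ipv6address):
    seconds = timeit.timeit(lambda: [validator(s) for s in samples], number=2_000)
    print(f'{validator.__name__}: {seconds:.3f} s for 2,000 passes')
```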

Beyond that: whether or not fixing the above brings the routines into being performance-comparable, it's relatively rare that an application needs to validate thousands of addresses, and nearly always, correctness and maintainability matter more than performance. Your code is non-trivial, and will be a true pain to maintain as compared to using a built-in. How confident are you that your code is correct? 80%? 90%? Do you think that you can beat the stability and test coverage of the Python community? There are times where reinventing the wheel is called for, but this isn't one of them.

answered Nov 24, 2021 at 2:00
\$\endgroup\$
