I'm new to programming, I'd like you to check my work and criticize the hell out of me. What would you do differently?
FQDN: https://en.m.wikipedia.org/wiki/Fully_qualified_domain_name
- 253 characters not including trailing dot
- Label1.Label2.Label3.adomain.com
- All labels must be between 1-63 characters
- Cannot start or end with hyphen (-)
- May contain only a-z, 0-9 and hyphens
This just checks to make sure the hostname passed as an argument meets all standards.
import re
def is_fqdn(hostname):
"""
:param hostname: string
:return: bool
"""
# Remove trailing dot
try: # Is this necessary?
if hostname[-1] == '.':
hostname = hostname[0:-1]
except IndexError:
return False
# Check total length of hostname < 253
if len(hostname) > 253:
return False
# Split hostname into list of DNS labels
hostname = hostname.split('.')
# Define pattern of DNS label
# Can begin and end with a number or letter only
# Can contain hyphens, a-z, A-Z, 0-9
# 1 - 63 chars allowed
fqdn = re.compile(r'^[a-z0-9]([a-z-0-9-]{0,61}[a-z0-9])?$', re.IGNORECASE)
# Check if length of each DNS label < 63
# Match DNS label to pattern
for label in hostname:
if len(label) > 63:
return False
if not fqdn.match(label):
return False
# Found no errors, returning True
return True
I declared the variable for the regex pattern after the 2 conditionals with return statements. My thought was, why store a variable that would be unused if the prior conditions were met?
Could the regex be written any better?
2 Answers 2
Comments:
Use type declarations! These are (IMO) easier to read than docstring comments and they also make your code
mypy
-able.Your
try/catch
block is just an indirect way of requiring that the parameter is at least 1 character long. It's more clear IMO to just check the length explicitly, especially since you're already doing that as the next step.Reassigning a different type to an existing variable is something you see a lot in quick-and-dirty Python scripts, but it's bad practice IMO (and mypy will treat it as an error unless you forward-declare it with a tricky
Union
type). Just use a new variable name when you generate a new object with a new type.Your regex already enforces the 63-character requirement. DRY (Don't Repeat Yourself)!
Using Python's built-in
all
function is better than rolling your ownfor
loop.
import re
def is_fqdn(hostname: str) -> bool:
"""
https://en.m.wikipedia.org/wiki/Fully_qualified_domain_name
"""
if not 1 < len(hostname) < 253:
return False
# Remove trailing dot
if hostname[-1] == '.':
hostname = hostname[0:-1]
# Split hostname into list of DNS labels
labels = hostname.split('.')
# Define pattern of DNS label
# Can begin and end with a number or letter only
# Can contain hyphens, a-z, A-Z, 0-9
# 1 - 63 chars allowed
fqdn = re.compile(r'^[a-z0-9]([a-z-0-9-]{0,61}[a-z0-9])?$', re.IGNORECASE)
# Check that all labels match that pattern.
return all(fqdn.match(label) for label in labels)
I echo Roland's suggestion about writing a unit test. A function like this is really easy to write tests for; you'd do it like:
def test_is_fqdn() -> None:
# Things that are FQDNs
assert is_fqdn("homestarrunner.net")
assert is_fqdn("zombo.com")
# Things that are not FQDNs
assert not is_fqdn("")
assert not is_fqdn("a*")
assert not is_fqdn("foo") # no TLD means it's not a FQDN!
Note that the last assert in that test will fail...
-
\$\begingroup\$ This is exactly what I needed Sam. Thank you for helping me learn! I thought by creating a new variable for a list when I could just reassign would be a waste of resources. \$\endgroup\$vital– vital2020年01月11日 19:41:34 +00:00Commented Jan 11, 2020 at 19:41
-
1\$\begingroup\$ You're creating the new object in memory regardless. :) Since you only use the list once, you could also just not name it, e.g.
return all(fqdn.match(label) for label in hostname.split('.'))
\$\endgroup\$Samwise– Samwise2020年01月11日 19:44:08 +00:00Commented Jan 11, 2020 at 19:44 -
1\$\begingroup\$ Note that
foo
is a perfectly valid FQDN: it resolves to the host namedfoo
in the root domain, i.e. it is equivalent tofoo.
. There is an example of a small island country offering exactly that: Tonga sold an A record forto
to a URI shortener service for some time. See serverfault.com/a/90753/1499 and superuser.com/q/78408/2571 \$\endgroup\$Jörg W Mittag– Jörg W Mittag2020年01月12日 12:20:30 +00:00Commented Jan 12, 2020 at 12:20
Your requirements could be summed up in a single regex.
import re
def is_fqdn(hostname):
return re.match(r'^(?!.{255}|.{253}[^.])([a-z0-9](?:[-a-z-0-9]{0,61}[a-z0-9])?\.)*[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?[.]?$', hostname, re.IGNORECASE)
I don't particularly condone this very condensed formulation; but this does everything in your requirements.
Here's a rundown.
^
beginning of line / expression(?!.{255}|.{253}[^.])
negative lookahead: don't permit 255 or more characters, or 254 where the last is not a dot.([a-z0-9](?:[-a-z-0-9]{0,61}[a-z0-9])?\.)*
zero or more labels where the first and last characters are not hyphen, and a max of 61 characters between them can also be a hyphen; all followed by a dot. The last 62 are optional so that we also permit a single-character label where the first character is also the last.[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?
the final label does not have to be followed by a dot[.]?
but it can be$
end of line / expression
When you only use a regex once, there is no acute reason to compile
it. Python will do this under the hood anyway (and in fact keep a cache of recently used compiled regexes and often reuse them more quickly than if you explicitly recompile).
-
\$\begingroup\$ So what you’re saying is, just to clarify, that single regex could replace all of my code. However, it isn’t a good practice. Correct me if I’m wrong please. \$\endgroup\$vital– vital2020年01月11日 19:47:24 +00:00Commented Jan 11, 2020 at 19:47
-
1\$\begingroup\$ I'm sort of divided on "good practice". In a complex program where you use a lot of regex for other stuff, this would not at all be out of place. \$\endgroup\$tripleee– tripleee2020年01月11日 19:49:09 +00:00Commented Jan 11, 2020 at 19:49
-
\$\begingroup\$ re.error: missing ), unterminated subpattern at position 64 \$\endgroup\$shikida– shikida2022年01月24日 21:10:08 +00:00Commented Jan 24, 2022 at 21:10
-
\$\begingroup\$ @shikida Thanks for the feedback! Belatedly fixed and tested. (Good thing I wrote notes, or it would have been kind of hard to second-guess what this was supposed to do. That's one of the problems with it.) \$\endgroup\$tripleee– tripleee2022年01月25日 14:59:31 +00:00Commented Jan 25, 2022 at 14:59
try
block is really necessary. If you have unit tests, you should add them to your question. And if you don't have any tests, just provide a list of names and whether each of them is valid or not. \$\endgroup\$