I decided to resume my (limited) Python skills after not coding for a long time.
Here's a very simple self-contained program to convert the characters of an email address into their HTML entities (the purpose is to make the email invisible to spamrobots).
Any comment for improvement? (e.g. args parsing, code style, etc.)
Apart from the
-h
option inargparse
, what is the standard way to add some documentation for it, such as a manpage or some embedded help?
#!/usr/bin/python
#
# mung - Convert a string into its HTML entities
#
import argparse
parser = argparse.ArgumentParser(description = 'Convert a string into its HTML entities.')
parser.add_argument('string_to_mung', help = 'String to convert')
parser.add_argument('-l', '--link', action = 'store_true', help = 'Embed the munged string into a mailto: link')
args = parser.parse_args()
def mung(plain):
munged = ''
for c in plain:
munged = munged + '&#' + str(ord(c)) + ';'
return munged
string_munged = mung(args.string_to_mung)
if (args.link):
print('<a href="mailto:')
print(string_munged + '">')
print(string_munged + '</a>')
else:
print(string_munged)
-
1\$\begingroup\$ Spam bots deal rather well with X(HT)ML escape sequences since they often use appropriate parser libraries. \$\endgroup\$David Foerster– David Foerster2018年10月08日 08:53:02 +00:00Commented Oct 8, 2018 at 8:53
-
\$\begingroup\$ Actually my experience suggests otherwise, as an obfuscated email address gets ten times less spam; see bit.ly/2UOuAOA \$\endgroup\$dr_– dr_2018年12月15日 15:23:20 +00:00Commented Dec 15, 2018 at 15:23
1 Answer 1
The code is straightforward and reads well, but:
- string concatenation is not very efficient, especially in a loop. You'd be better off using
str.join
on an iterable; - encoding the
mailto:
part yourself impairs readability and maintenance, if only you had a function to do it for you. Oh wait... - The comment at the beginning would be better as a module docstrings, you would then be able to use it as the
argparse
description using__doc__
; - You should avoid interleaving code and function definition, and protect top level code using an
if __name__ == '__main__':
guard.
Proposed improvements:
#!/usr/bin/python
"""Convert a string into its HTML entities"""
import argparse
def command_line_parser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('string_to_mung', help='String to convert')
parser.add_argument('-l', '--link', action='store_true',
help='Embed the munged string into a mailto: link')
return parser
def mung(plain):
return ''.join('&#{};'.format(ord(c)) for c in plain)
if __name__ == '__main__':
args = command_line_parser().parse_args()
string_munged = mung(args.string_to_mung)
if (args.link):
string_munged = '<a href="{0}{1}">{1}</a>'.format(mung('mailto:'), string_munged)
print(string_munged)
-
\$\begingroup\$ Python 3.6+:
'&#{};'.format(ord(c))
→f'&#{ord(c)};'
\$\endgroup\$Roman Odaisky– Roman Odaisky2018年10月07日 22:55:30 +00:00Commented Oct 7, 2018 at 22:55 -
2\$\begingroup\$ @RomanOdaisky Given that there is no version tag, I prefer to keep
format
. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2018年10月07日 23:20:38 +00:00Commented Oct 7, 2018 at 23:20