Scope is to throw any mail, HTML or plain or multipart into this and receive a readable formatted text that deals with special characters and HTML issues. Field of usage: System that imports mails into a mysql db, or prints them to PDF, etc.
What can I improve here?
import imapclient, pyzmail, html2text
"""
Connect to IMAP. Login and select folder.
retreive UID of Mails and fetch the latest.
return the body of the latest message
"""
def latestMail():
imapObj = imapclient.IMAPClient('imap.server.com', ssl=False)
imapObj.login('username', 'password')
imapObj.select_folder('INFO', readonly=True)
UIDs = imapObj.search(criteria='ALL', charset=None)
rawMessages = imapObj.fetch(UIDs[0], ['BODY[]', 'FLAGS'])
message = pyzmail.PyzMessage.factory(rawMessages[UIDs[0]]['BODY[]'])
return message
"""
Open message object, check if content is html or plain or both.
If both, only plain will be taken. for Plain, decodes the
message if charset is given, if not we will use system default
for html, additionally format html so the information
is exportable to any db.
"""
def parser(message):
if message.text_part is not None and message.html_part is not None:
multipart = True
else:
multipart = False
if message.text_part is not None:
try:
body = message.text_part.get_payload().decode(message.text_part.charset)
except TypeError:
body = message.text_part.get_payload()
if message.html_part is not None and multipart is False:
try:
body = html2text.html2text(message.html_part.get_payload().decode(message.html_part.charset))
except Exception:
raise Systemexit
return body
print parser(latestMail())
2 Answers 2
docstrings
This is a lovely module-level docstring.
"""
Connect to IMAP. Login and select folder.
retreive UID of Mails and fetch the latest.
return the body of the latest message
"""
Alas, it appears you intended for it to apply to this function:
def latestMail():
imapObj = ...
To accomplish that, you would need to write a function-level docstring:
def latestMail():
"""
your text goes here
"""
imapObj = ...
Then e.g. help(latestMail)
would display appropriate help.
Similarly for the parser() docstring. As it stands you computed a constant (a string), and then discarded it.
Also, PEP 8
asks that you name it latest_mail()
, imap_obj
, etc.
Though a name like client
would be better, yet.
credentials don't belong in source code
This looks like trouble:
imapObj.login('username', 'password')
I assume your git repo has 'Secret123'
or some other
credential checked in.
Avoid doing that.
Prefer os.environ['IMAP_PASSWORD']
,
or dotenv.
expensive query
UIDs = imapObj.search(criteria='ALL', charset=None)
Given that you literally only look at the most recent unique ID,
you'd be better off with criterion='(RECENT)'
,
so both the server and the network perform less work.
Also, my reading of ancient python2 docs suggests you'd want a tuple unpack there:
typ, msgnums = M.search(None, '(FROM "LDJ")')
assign a bool
if message.text_part is not None and message.html_part is not None:
multipart = True
else:
multipart = False
Consider rephrasing as
multipart = (message.text_part is not None
and message.html_part is not None)
or even
multipart = bool(message.text_part and message.html_part)
fatal exception
This seems unhelpful:
except Exception:
raise Systemexit
Better to not try
at all,
so when we bail out
it's at least apparent what went wrong.
Also, I found the except TypeError:
"handler" a bit frightening.
I suppose that if .decode()
failed, instead of a string
we return raw bytes?!?
I am soooo glad python3 made incompatible changes
which have spared a generation of programmers from
worrying about such details.
It looks good. I only have a few coding style suggestions.
Naming
The PEP 8 style guide recommends snake_case for function and variable names.
latestMail
would be latest_mail
.
rawMessages
would be raw_messages
.
The name parser
is a bit generic. email_parser
is more specific.
Documentation
The docstrings are very helpful. In the following, I would make some minor adjustments to capitalize "HTML" and explicitly mention "plain text":
"""
Open message object, check if content is HTML or plain text or both.
If both, only plain will be taken. For plain, decodes the
message if charset is given, if not we will use system default
for HTML, additionally format HTML so the information
is exportable to any db.
"""
Portability
I realize this question was posted many years ago when Python version 2.x was prevalent, but now that it is deprecated, consider porting to 3.x.
Simpler
These 2 lines :
message = pyzmail.PyzMessage.factory(rawMessages[UIDs[0]]['BODY[]'])
return message
can be simplified as:
return pyzmail.PyzMessage.factory(rawMessages[UIDs[0]]['BODY[]'])
This eliminates the intermediate message
variable.
When I see double negatives (not None
) as in this code:
if message.text_part is not None and message.html_part is not None:
I think it is easier to read by inverting the logic as:
if message.text_part is None or message.html_part is None:
multipart = False
else:
multipart = True