3
\$\begingroup\$

Scope is to throw any mail, HTML or plain or multipart into this and receive a readable formatted text that deals with special characters and HTML issues. Field of usage: System that imports mails into a mysql db, or prints them to PDF, etc.

What can I improve here?

import imapclient, pyzmail, html2text
"""
Connect to IMAP. Login and select folder.
retreive UID of Mails and fetch the latest.
return the body of the latest message
"""
def latestMail():
 imapObj = imapclient.IMAPClient('imap.server.com', ssl=False)
 imapObj.login('username', 'password')
 imapObj.select_folder('INFO', readonly=True)
 UIDs = imapObj.search(criteria='ALL', charset=None)
 rawMessages = imapObj.fetch(UIDs[0], ['BODY[]', 'FLAGS'])
 message = pyzmail.PyzMessage.factory(rawMessages[UIDs[0]]['BODY[]'])
 return message
"""
Open message object, check if content is html or plain or both.
If both, only plain will be taken. for Plain, decodes the
message if charset is given, if not we will use system default
for html, additionally format html so the information
is exportable to any db.
"""
def parser(message):
 if message.text_part is not None and message.html_part is not None:
 multipart = True
 else:
 multipart = False
 if message.text_part is not None:
 try:
 body = message.text_part.get_payload().decode(message.text_part.charset)
 except TypeError:
 body = message.text_part.get_payload()
 if message.html_part is not None and multipart is False:
 try:
 body = html2text.html2text(message.html_part.get_payload().decode(message.html_part.charset))
 except Exception:
 raise Systemexit
 return body
print parser(latestMail())
toolic
15.2k5 gold badges29 silver badges213 bronze badges
asked Nov 3, 2016 at 17:02
\$\endgroup\$

2 Answers 2

2
\$\begingroup\$

docstrings

This is a lovely module-level docstring.

"""
Connect to IMAP. Login and select folder.
retreive UID of Mails and fetch the latest.
return the body of the latest message
"""

Alas, it appears you intended for it to apply to this function:

def latestMail():
 imapObj = ...

To accomplish that, you would need to write a function-level docstring:

def latestMail():
 """
 your text goes here
 """
 imapObj = ...

Then e.g. help(latestMail) would display appropriate help.

Similarly for the parser() docstring. As it stands you computed a constant (a string), and then discarded it.

Also, PEP 8 asks that you name it latest_mail(), imap_obj, etc. Though a name like client would be better, yet.

credentials don't belong in source code

This looks like trouble:

 imapObj.login('username', 'password')

I assume your git repo has 'Secret123' or some other credential checked in. Avoid doing that. Prefer os.environ['IMAP_PASSWORD'], or dotenv.

expensive query

 UIDs = imapObj.search(criteria='ALL', charset=None)

Given that you literally only look at the most recent unique ID, you'd be better off with criterion='(RECENT)', so both the server and the network perform less work.

Also, my reading of ancient python2 docs suggests you'd want a tuple unpack there:

typ, msgnums = M.search(None, '(FROM "LDJ")')

assign a bool

 if message.text_part is not None and message.html_part is not None:
 multipart = True
 else:
 multipart = False

Consider rephrasing as

 multipart = (message.text_part is not None
 and message.html_part is not None)

or even

 multipart = bool(message.text_part and message.html_part)

fatal exception

This seems unhelpful:

 except Exception:
 raise Systemexit

Better to not try at all, so when we bail out it's at least apparent what went wrong.

Also, I found the except TypeError: "handler" a bit frightening. I suppose that if .decode() failed, instead of a string we return raw bytes?!? I am soooo glad python3 made incompatible changes which have spared a generation of programmers from worrying about such details.

answered Mar 14 at 23:11
\$\endgroup\$
1
\$\begingroup\$

It looks good. I only have a few coding style suggestions.

Naming

The PEP 8 style guide recommends snake_case for function and variable names.

latestMail would be latest_mail.

rawMessages would be raw_messages.

The name parser is a bit generic. email_parser is more specific.

Documentation

The docstrings are very helpful. In the following, I would make some minor adjustments to capitalize "HTML" and explicitly mention "plain text":

"""
Open message object, check if content is HTML or plain text or both.
If both, only plain will be taken. For plain, decodes the
message if charset is given, if not we will use system default
for HTML, additionally format HTML so the information
is exportable to any db.
"""

Portability

I realize this question was posted many years ago when Python version 2.x was prevalent, but now that it is deprecated, consider porting to 3.x.

Simpler

These 2 lines :

message = pyzmail.PyzMessage.factory(rawMessages[UIDs[0]]['BODY[]'])
return message

can be simplified as:

return pyzmail.PyzMessage.factory(rawMessages[UIDs[0]]['BODY[]'])

This eliminates the intermediate message variable.

When I see double negatives (not None) as in this code:

if message.text_part is not None and message.html_part is not None:

I think it is easier to read by inverting the logic as:

if message.text_part is None or message.html_part is None:
 multipart = False
else:
 multipart = True
answered Mar 13 at 17:56
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.