Revision 9a55f2fb-53fc-4b5c-b4ed-bea44f5ade1c - Code Review Stack Exchange
## Possible bug
Note that the minimum and the average only take into account days when there _was_ mail. Any days with no mail at all are ignored. That may or may not be the intended behaviour.
## Overview
I'll start by presenting a vision of how `main` should look like:
import argparse
from collections import Counter
from datetime import datetime
from getpass import getpass
import imaplib
from itertools import islice
import email
import email.parser
import sys
…
def main():
args = get_arguments()
imap_server = args.server or raw_input('Server: ')
imap_user = args.username or raw_input('Username: ')
imap_password = args.password or getpass('Password: ')
try:
conn = imaplib.IMAP4_SSL(imap_server, 993)
conn.login(imap_user, imap_password)
msgs = imap_messages(conn, fetch='(BODY[HEADER.FIELDS (DATE)])')
msgs_by_date = Counter(filter(None, (
header_date(msg) for msg in msgs
)))
if not msgs_by_date:
print "Empty mailbox"
return
min_emails_per_day = msgs_by_date.most_common()[-1][1]
max_emails_per_day = msgs_by_date.most_common(1)[0][1]
avg_emails_per_day = float(sum(msgs_by_date.values())) / len(msgs_by_date)
print 'Min Mails Per Day (So Far):', min_emails_per_day
print 'Max Mails Per Day (So Far):', max_emails_per_day
print '(Rough) Daily Mail Average:', avg_emails_per_day
except Exception as error:
print "An error has occurred, and the program has crashed; details:\n"
print error
return 10
if __name__ == "__main__":
sys.exit(main())
Most of the suggestions I would like to make are apparent in that excerpt:
Parameters:
- The leading underscore in `_get_arguments` should be dropped: it's not a private method in a class.
- Argument defaulting can be done using the `or` idiom: `a or b` will evaluate to `b` if `a` is `None` or if it is an empty string.
- Passwords should not be echoed on screen, so you should use [`getpass.getpass()`](https://docs.python.org/2/library/getpass.html#getpass.getpass) instead of `raw_input()`.
IMAP:
- This is the most complex part of the logic, so you should move some of the code into functions.
- **If you only want the Date header, then ask for just the Date header.** (See [RFC 3501 Sec 6.4.5](https://tools.ietf.org/html/rfc3501#section-6.4.5).)
Depending on the IMAP server implementation, the Date might be stored in a more readily accessible index, and thus faster to retrieve. Also try using the `INTERNALDATE` rather than `BODY[HEADER.FIELDS (DATE)]` to see if it's faster. (The internal date is the time at which the message was introduced into the mailbox, rather than the time at which the sender claims to have sent the message.)
Statistics:
- **A good data structure to use would be a [`collections.Counter`](https://docs.python.org/2/library/collections.html#collections.Counter).**
- A good way to populate the `Counter` is with a [generator expression](https://docs.python.org/2/tutorial/classes.html#generator-expressions).
- The average can be more elegantly computed using [sum()](https://docs.python.org/2/library/functions.html#sum) and [len()](https://docs.python.org/2/library/functions.html#len). For consistency, rename `rough_daily_average` to `avg_emails_per_day`.
- For consistency, use either `print "…: %s" % …` or `print "…", …`.
Error handling:
- `print str(error)` can just be `print error`, since `print` implicitly causes conversion to a string.
- I prefer not to hide `sys.exit()` inside a function.
## IMAP
My `main()` uses two functions: `imap_messages` and `header_date`.
class IMAPException(Exception): pass
def imap_messages(conn, mbox='INBOX', search='ALL', fetch='ALL', batch_size=1000):
# http://stackoverflow.com/a/8991553
def batch(iterable, batch_size):
it = iter(iterable)
while True:
chunk = list(islice(it, batch_size))
if not chunk:
return
yield chunk
ok, (result,) = conn.select(mbox, readonly=True)
if ok != 'OK': raise IMAPException(result)
ok, (result,) = conn.search(None, search)
if ok != 'OK': raise IMAPException(result)
for chunk in batch(result.split(), batch_size):
ok, results = conn.fetch(','.join(chunk), fetch)
if ok != 'OK': raise IMAPException('Failed to fetch some messages')
for msg in results:
yield msg
header_parser = email.parser.HeaderParser()
def header_date(msg):
for info in msg:
parsed_header = header_parser.parsestr(info)
if 'Date' in parsed_header:
date_tuple = email.utils.parsedate_tz(parsed_header['Date'])
local_date = datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
return local_date.strftime('%Y-%m-%d')
Remarks:
- Extracting some of the hard-coded strings as default parameters makes them not so hard-coded and can improve readability.
- The most significant performance improvement would be to issue `FETCH` commands for **batches of messages**. As noted in the RFC cited above, the IMAP client can say
A654 FETCH 2:4 (FLAGS BODY[HEADER.FIELDS (DATE FROM)])
to get the flags and the Date and From headers for messages 2, 3, and 4.
The caveat is that you can't get too greedy. If the response is too large, you'll get an error. The trick is to [limit the batch size][1].
- To make it clear what `.select("INBOX", True)` means, explicitly name the second parameter: `.select("INBOX", readonly=True)`.
- To help avoid some of the ugliness of the indexes in `msgdata[0][1]`, you can use destructuring assignment.
- `.search(…)` on an empty mailbox should return `(OK, [''])`. If the response code is not `OK`, then something went wrong. Your `print "No messages!"; sys.exit(0);` handling suggests that you are treating that as the expected behaviour for an empty mailbox, rather than as an error.
- Signal errors using exceptions rather than printing a message.
[1]: //stackoverflow.com/a/8991553/1157100