Faster way to group email-date pairs by date without many nested for loops?

Question 1

Given a list of pairs (email-date):

[['[email protected]', '2 august 1976'],
 ['[email protected]', '3 august 2001'],
 ['[email protected]', '2 october 2001'],
 ['[email protected]', '2 october 2001'],
 ['[email protected]', '2 august 2001']]

I need to turn it into a dictionary grouped by date (one date → many emails).

I can do it with for loops:

def emails_for_one_date():
    for each_entry in email_date_table:
        for each_date in each_entry:
            current_date_as_key = each_date
            for each_entry2 in email_date_table:
                list_emails = []
                if each_entry[1] == current_date_as_key:
                    list_emails.append( each_entry[0])
                date_for_emails_dict[current_date_as_key] = list_emails
    print("------------------- dict here")
    print(date_for_emails_dict)

But Python is such a powerful language, I want to know the 'pythonic' way of doing it! I suspect it can be one line.

Question 2

Your code actually does more than advertised: the e-mail addresses are all added as keys to date_for_emails_dict as well.

Question 3

A good function should accept the data as a parameter and return the resulting data structure. Call print(emails_by_date(email_date_table)) to put it all together.

Two Pythonic techniques you want to use are:

dict.setdefault(key[, default]), which gets rid of the need for if, as well as the requirement for the input to be grouped chronologically
Multiple assignment in the for-loop, which removes the ugly [0] and [1] indexing:

If the target list is a comma-separated list of targets: The object must be an iterable with the same number of items as there are targets in the target list, and the items are assigned, from left to right, to the corresponding targets.
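As a small illustration of the multiple assignment described in the quote, the sketch below contrasts positional indexing with unpacking in the loop header. The addresses are hypothetical placeholders, not the ones from the question.

```python
# Hypothetical sample pairs; the addresses are placeholders.
pairs = [
    ['alice@example.com', '2 october 2001'],
    ['bob@example.com', '2 august 2001'],
]

# Indexing version: each pair is addressed by position.
indexed = []
for pair in pairs:
    indexed.append((pair[1], pair[0]))

# Unpacking version: each two-item pair is split into two names
# directly in the for statement.
unpacked = []
for email, date in pairs:
    unpacked.append((date, email))
```

Both loops build the same list; the second simply names the two fields instead of indexing them.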

The function could be written as:

def emails_by_date(pairs):
    ret = {}
    for email, date in pairs:
        ret.setdefault(date, []).append(email)
    return ret
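A usage sketch for the function above, assuming a table shaped like the one in the question. The addresses are hypothetical placeholders (the originals are not legible here), and the function is repeated so the snippet runs on its own.

```python
def emails_by_date(pairs):
    """Group e-mail addresses by date; pairs yields (email, date) items."""
    ret = {}
    for email, date in pairs:
        # setdefault returns the list already stored for this date, or
        # stores and returns a new empty list the first time it is seen.
        ret.setdefault(date, []).append(email)
    return ret


# Hypothetical sample data; only the dates come from the question.
email_date_table = [
    ['a@example.com', '2 august 1976'],
    ['b@example.com', '3 august 2001'],
    ['c@example.com', '2 october 2001'],
    ['d@example.com', '2 october 2001'],
    ['e@example.com', '2 august 2001'],
]

grouped = emails_by_date(email_date_table)
print(grouped['2 october 2001'])  # ['c@example.com', 'd@example.com']
```

Because each pair is handled independently, the rows do not need to be sorted or grouped by date beforehand.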

Question 4

As you suspect, there is an easier way. The first thing we'll need is defaultdict from the collections module.

from collections import defaultdict

A defaultdict acts like a dictionary, with the difference that the callable given at construction (the default factory, often a type such as int or list) is called to produce a default value whenever a looked-up key does not exist in the dictionary. For example:

>>> d = defaultdict(int)
>>> d[1]
0
>>> d
defaultdict(<class 'int'>, {1: 0})

With a normal dictionary, this would give you a KeyError (as 1 did not exist). With a defaultdict, it sees that 1 does not exist, and inserts an int() (which is 0).

In our case, we're going to want a list to be the default:

d = defaultdict(list)

Given that we have a list of pairs, we can unpack each pair into name and date as we iterate:

for name, date in email_date_table:

Now we just look up the date key, and append the current name:

d[date].append(name)

If the key date is not yet in d, the defaultdict creates an empty list, stores it under that key, and name is appended to it. If the key already exists, d[date] simply gives us back the existing list of names mapped to by that key, and name is appended to that list.

The whole thing then becomes:

d = defaultdict(list)
for name, date in email_date_table:
    d[date].append(name)
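Put together as a runnable sketch, the loop above might be wrapped like this. The function name group_by_date and the sample addresses are assumptions made for illustration; only the dates come from the question.

```python
from collections import defaultdict


def group_by_date(email_date_table):
    """Return a defaultdict mapping each date to the list of its names."""
    d = defaultdict(list)
    for name, date in email_date_table:
        # A missing date is created as [] on first access, then appended to.
        d[date].append(name)
    return d


# Hypothetical sample data shaped like the table in the question.
table = [
    ['a@example.com', '2 august 1976'],
    ['b@example.com', '3 august 2001'],
    ['c@example.com', '2 october 2001'],
    ['d@example.com', '2 october 2001'],
    ['e@example.com', '2 august 2001'],
]

d = group_by_date(table)
print(d['2 october 2001'])  # ['c@example.com', 'd@example.com']
```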

Question 5

With defaultdict, be aware that simply querying the resulting data structure for non-existent dates will cause it to be littered with empty list entries.
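The side effect described above can be seen in a short sketch, along with a few ways to avoid it. The dates and address used here are made-up sample values.

```python
from collections import defaultdict

d = defaultdict(list)
d['2 october 2001'].append('c@example.com')

# A plain lookup of a missing date inserts an empty list as a side effect.
_ = d['1 january 1900']
keys_after_lookup = sorted(d)

# Membership tests and .get() do not insert anything.
has_date = '5 may 1955' in d
maybe = d.get('5 may 1955', [])

# Converting to a plain dict restores the usual KeyError on missing keys.
plain = dict(d)
```

After this runs, d contains the stray '1 january 1900' key but not '5 may 1955'. Returning dict(d) from the grouping function is one way to keep callers from adding empty entries by accident.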

Question 6

Both answers are great. I did not know that just by creating a dictionary with the date as key, it puts all corresponding emails in a list!

Question 7

Here's the one-liner, not that it's recommended:

email_date_dict = {date: [name for name, date2 in email_date_table if date2 == date] for _, date in email_date_table}

It's inefficient, because it rescans the whole of email_date_table once per row (quadratic time overall), and not very readable, but it illustrates how powerful Python's list and dictionary comprehensions can be.
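To check that the one-liner really produces the same grouping as a single-pass loop, a sketch along these lines can be used. The addresses are hypothetical placeholders, and the comparison loop uses dict.setdefault rather than anything specific to this answer.

```python
# Hypothetical sample data shaped like the table in the question.
email_date_table = [
    ['a@example.com', '2 august 1976'],
    ['b@example.com', '3 august 2001'],
    ['c@example.com', '2 october 2001'],
    ['d@example.com', '2 october 2001'],
    ['e@example.com', '2 august 2001'],
]

# The one-liner: for every row, the inner comprehension rescans the table.
one_liner = {
    date: [name for name, date2 in email_date_table if date2 == date]
    for _, date in email_date_table
}

# Single-pass loop for comparison: each row is visited exactly once.
single_pass = {}
for name, date in email_date_table:
    single_pass.setdefault(date, []).append(name)

same = one_liner == single_pass
```

With n rows, the comprehension performs roughly n × n comparisons, while the loop does n dictionary operations, so the difference only matters for large tables.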

200_success · Accepted Answer · 2014-07-31 08:07:07Z
