Sorting email address according to occurrences of each domain

Question 1

My code:

import re
mbox = open('mailbox.txt')
ndict = {}
for line in mbox:
 domain = re.findall('From [^ ].*@([^ ]*)', line)
 if domain:
 if domain[0] in ndict:
 ndict[domain[0]] += 1
 else:
 ndict[domain[0]] = 1
print_list = [print(i) for i in sorted([(i, ndict[i]) for i in 
 list(ndict.keys())], key=lambda x: x[1], reverse=True)]

I don't like how I create a dictionary and then immediately convert it into a list (which I'm doing in order to sort it), as it seems un-Pythonic to me. Is there a more Pythonic way to do this without using dictionaries?

i.e. Can you do this by making a list and appending tuples of the domains and number of occurrences?

Question 2

I suggest you get a linter such as Prospector or flake8, this can tell if your code is un-Pythonic. Some people prefer hinters like black.

Your code doesn't conform to PEP 8 which is the standard when it comes to Python. Your comprehension is hard to read because it doesn't conform to best practices.
I'd recommend moving your code into a main function and use an if __name__ == '__main__': guard. This reduces global pollution and prevents your code from running accidentally.

When you see something like:

if key in my_dict:
 my_dict[key] += value
else:
 my_dict[key] = default + value

Then you should probably use dict.get which can get the following code:

my_dict[key] = my_dict.get(key, default) + value

In this case you can add more sugar by using collections.defaultdict, as it will default missing keys to 0.

import collections
my_dict = collections.defaultdict(int)
my_dict[key] += 1

Don't use comprehensions with side effects, they're hard to understand and are better expressed as standard for loops. This is as the list you're making is absolutely pointless.
You can use dict.items() rather than your comprehension with dict.keys().

import re
import collections
def main():
 with open('mailbox.txt') as mbox:
 ndict = collections.defaultdict(int)
 for line in mbox:
 domain = re.findall('From [^ ].*@([^ ]*)', line)
 if domain:
 ndict[domain[0]] += 1
 for item in sorted(ndict.items(), key=lambda x: x[1], reverse=True):
 print(item)
if __name__ == '__main__':
 main()

You can replace the majority of your code with collections.Counter.

import re
import collections
def main():
 with open('mailbox.txt') as mbox:
 counts = collections.Counter(
 domain[0]
 for domain in re.findall('From [^ ].*@([^ ]*)', line)
 if domain
 )
 for item in counts.most_common():
 print(item)
if __name__ == '__main__':
 main()

Question 3

Thank you for the answer, and for telling me about PEP 8 (I didn't even know about it before lol; apologies!) and all these new methods and libraries! dict.get I feel would be an especially useful one for me. However, I have to ask why you would use dict.items over dict.keys in that instance. Also I think you might have made a mistake in your last code block, you wrote count instead of counts when iterating. Thank you for your time btw!

Question 4

@rivershen No problem, we all have to start somewhere :) I'm unsure what you mean by your dict.items question, but I'll give it a stab anyway. Since [(i, ndict[i]) for i in ndict.keys()] is pretty much the same as ndict.items() then there are no benefits to using the former - it's longer to type and to read. Yes I did make a mistake, I have fixed that now thank you.

Peilonrayz ♦Peilonrayz 44.4k7 gold badges80 silver badges157 bronze badges · Accepted Answer · 2019-08-23 00:14:45Z

I suggest you get a linter such as Prospector or flake8, this can tell if your code is un-Pythonic. Some people prefer hinters like black.

Your code doesn't conform to PEP 8 which is the standard when it comes to Python. Your comprehension is hard to read because it doesn't conform to best practices.
I'd recommend moving your code into a main function and use an if __name__ == '__main__': guard. This reduces global pollution and prevents your code from running accidentally.

When you see something like:

if key in my_dict:
 my_dict[key] += value
else:
 my_dict[key] = default + value

Then you should probably use dict.get which can get the following code:

my_dict[key] = my_dict.get(key, default) + value

In this case you can add more sugar by using collections.defaultdict, as it will default missing keys to 0.

import collections
my_dict = collections.defaultdict(int)
my_dict[key] += 1

Don't use comprehensions with side effects, they're hard to understand and are better expressed as standard for loops. This is as the list you're making is absolutely pointless.
You can use dict.items() rather than your comprehension with dict.keys().

import re
import collections
def main():
 with open('mailbox.txt') as mbox:
 ndict = collections.defaultdict(int)
 for line in mbox:
 domain = re.findall('From [^ ].*@([^ ]*)', line)
 if domain:
 ndict[domain[0]] += 1
 for item in sorted(ndict.items(), key=lambda x: x[1], reverse=True):
 print(item)
if __name__ == '__main__':
 main()

You can replace the majority of your code with collections.Counter.

import re
import collections
def main():
 with open('mailbox.txt') as mbox:
 counts = collections.Counter(
 domain[0]
 for domain in re.findall('From [^ ].*@([^ ]*)', line)
 if domain
 )
 for item in counts.most_common():
 print(item)
if __name__ == '__main__':
 main()

Thank you for the answer, and for telling me about PEP 8 (I didn't even know about it before lol; apologies!) and all these new methods and libraries! dict.get I feel would be an especially useful one for me. However, I have to ask why you would use dict.items over dict.keys in that instance. Also I think you might have made a mistake in your last code block, you wrote count instead of counts when iterating. Thank you for your time btw!
@rivershen No problem, we all have to start somewhere :) I'm unsure what you mean by your dict.items question, but I'll give it a stab anyway. Since [(i, ndict[i]) for i in ndict.keys()] is pretty much the same as ndict.items() then there are no benefits to using the former - it's longer to type and to read. Yes I did make a mistake, I have fixed that now thank you.

Stack Exchange Network

Sorting email address according to occurrences of each domain

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Sorting email address according to occurrences of each domain

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions