My code:
import re
mbox = open('mailbox.txt')
ndict = {}
for line in mbox:
domain = re.findall('From [^ ].*@([^ ]*)', line)
if domain:
if domain[0] in ndict:
ndict[domain[0]] += 1
else:
ndict[domain[0]] = 1
print_list = [print(i) for i in sorted([(i, ndict[i]) for i in
list(ndict.keys())], key=lambda x: x[1], reverse=True)]
I don't like how I create a dictionary and then immediately convert it into a list (which I'm doing in order to sort it), as it seems un-Pythonic to me. Is there a more Pythonic way to do this without using dictionaries?
i.e. Can you do this by making a list and appending tuples of the domains and number of occurrences?
1 Answer 1
I suggest you get a linter such as Prospector or flake8, this can tell if your code is un-Pythonic. Some people prefer hinters like black.
Your code doesn't conform to PEP 8 which is the standard when it comes to Python. Your comprehension is hard to read because it doesn't conform to best practices.
I'd recommend moving your code into a
main
function and use anif __name__ == '__main__':
guard. This reduces global pollution and prevents your code from running accidentally.When you see something like:
if key in my_dict: my_dict[key] += value else: my_dict[key] = default + value
Then you should probably use
dict.get
which can get the following code:my_dict[key] = my_dict.get(key, default) + value
In this case you can add more sugar by using
collections.defaultdict
, as it will default missing keys to 0.import collections my_dict = collections.defaultdict(int) my_dict[key] += 1
Don't use comprehensions with side effects, they're hard to understand and are better expressed as standard
for
loops. This is as the list you're making is absolutely pointless.- You can use
dict.items()
rather than your comprehension withdict.keys()
.
import re
import collections
def main():
with open('mailbox.txt') as mbox:
ndict = collections.defaultdict(int)
for line in mbox:
domain = re.findall('From [^ ].*@([^ ]*)', line)
if domain:
ndict[domain[0]] += 1
for item in sorted(ndict.items(), key=lambda x: x[1], reverse=True):
print(item)
if __name__ == '__main__':
main()
You can replace the majority of your code with collections.Counter
.
import re
import collections
def main():
with open('mailbox.txt') as mbox:
counts = collections.Counter(
domain[0]
for domain in re.findall('From [^ ].*@([^ ]*)', line)
if domain
)
for item in counts.most_common():
print(item)
if __name__ == '__main__':
main()
-
\$\begingroup\$ Thank you for the answer, and for telling me about PEP 8 (I didn't even know about it before lol; apologies!) and all these new methods and libraries!
dict.get
I feel would be an especially useful one for me. However, I have to ask why you would usedict.items
overdict.keys
in that instance. Also I think you might have made a mistake in your last code block, you wrote count instead of counts when iterating. Thank you for your time btw! \$\endgroup\$naisuu42– naisuu422019年08月23日 00:25:39 +00:00Commented Aug 23, 2019 at 0:25 -
1\$\begingroup\$ @rivershen No problem, we all have to start somewhere :) I'm unsure what you mean by your
dict.items
question, but I'll give it a stab anyway. Since[(i, ndict[i]) for i in ndict.keys()]
is pretty much the same asndict.items()
then there are no benefits to using the former - it's longer to type and to read. Yes I did make a mistake, I have fixed that now thank you. \$\endgroup\$2019年08月23日 00:29:06 +00:00Commented Aug 23, 2019 at 0:29