Printing a table of letter frequencies using Python

Question 1

What is the recommended (standard, cleaner, more elegant, etc.) method in Python to get a string with all the alphabet letters?

Here are three versions of code that generate a dictionary with the letters as keys and 0 as every value. The aim is to parse a text and count how often each letter appears.

from string import ascii_lowercase
frequency = {letter: 0 for letter in ascii_lowercase}

A downside of this method is that the constant shows up in the pydoc output for the module in which it is imported.

frequency = {chr(letter): 0 for letter in range(ord('a'), ord('z')+1)}

frequency = {letter: 0 for letter in 'abcdefghijklmnopqrstuvwxyz'}
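
As a quick check, the three versions can be compared side by side. This is only a sketch: the variable names are made up for illustration, and dict.fromkeys is shown as a fourth possibility that is not one of the versions above.

```python
from string import ascii_lowercase

# Version 1: the constant from the string module.
by_constant = {letter: 0 for letter in ascii_lowercase}

# Version 2: code points from ord('a') through ord('z').
by_ordinals = {chr(code): 0 for code in range(ord('a'), ord('z') + 1)}

# Version 3: a hard-coded literal.
by_literal = {letter: 0 for letter in 'abcdefghijklmnopqrstuvwxyz'}

# Not one of the versions above: dict.fromkeys builds the same mapping.
by_fromkeys = dict.fromkeys(ascii_lowercase, 0)

all_equal = by_constant == by_ordinals == by_literal == by_fromkeys
print(all_equal)
```

All four build the same 26-key dictionary, so the choice is mostly about readability.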

This is the snippet of code where it is used:

from collections import OrderedDict

frequency = {chr(letter): 0 for letter in range(ord('a'), ord('z')+1)}
# The 'text' variable is taken from somewhere else, we define it here for example's sake
text = 'StackExchange Code Review'
total = 0  # number of letters counted so far
for c in text.lower():
    if c in frequency.keys():
        frequency[c] += 1
        total += 1
output = OrderedDict(sorted(frequency.items(), key = lambda x: x[0]))
for l in output.keys():
    print(l + ':' + '{:>8}'.format(output[l]) + '{:9.2f}'.format(100 * output[l] / (total if total > 0 else 1)) + ' %')

Question 2

What are you really trying to accomplish with this code? Please provide the context. See How to Ask.

Question 3

I have added some context. However, even if a dictionary is not the best way to do this, I'm still curious to know the answer to my original question ("how to generate the alphabet in Python"). Thanks in advance.

Question 4

"The aim is to parse a text and count how often a letter appears." — Then post that code too, so that we can review your code properly.

Question 5

The printing portion should be simplified:

The use of OrderedDict is superfluous. A list of pairs would suffice.
I'd prefer to stick to the same iteration variable, when it is used to refer to the same thing. Choose either c or l.
You are not using str.format() effectively.
(total if total > 0 else 1) could just be (total or 1).

for c, n in sorted(frequency.items()):
    print('{}:{:>8}{:9.2f}%'.format(c, n, 100 * n / (total or 1)))
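
To see that a single format string and (total or 1) behave like the original concatenation, here is a small sketch. The function names row_original and row_single_format are hypothetical, and the sketch keeps the space before the percent sign so that both rows match exactly, whereas the loop above drops it.

```python
def row_original(letter, count, total):
    # Formatting as in the question: concatenated pieces and an explicit guard.
    return (letter + ':' + '{:>8}'.format(count)
            + '{:9.2f}'.format(100 * count / (total if total > 0 else 1)) + ' %')


def row_single_format(letter, count, total):
    # One format string; (total or 1) falls back to 1 when total is 0.
    return '{}:{:>8}{:9.2f} %'.format(letter, count, 100 * count / (total or 1))


# Compare both helpers on an empty text (total 0) and a non-empty one.
samples = [('a', 0, 0), ('a', 2, 23), ('e', 5, 23)]
rows_match = all(row_original(*s) == row_single_format(*s) for s in samples)
print(rows_match)
```

The shortcut relies on total never being negative, which holds for a count of letters.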

As for the counting process itself, you could write c in frequency instead of c in frequency.keys(). But, really, the task is so naturally suited to a collections.Counter that it would be silly not to use it.

from collections import Counter as _Counter
from string import ascii_lowercase as _ascii_lowercase
text = 'StackExchange Code Review'
frequency = _Counter({c: 0 for c in _ascii_lowercase})
frequency.update(c for c in text.lower() if c in frequency)
total = sum(frequency.values())
print('\n'.join(
    '{}:{:>8}{:9.2f}%'.format(c, n, 100 * n / (total or 1))
    for c, n in sorted(frequency.items())
))

I think that the call to sorted() should be optional as of Python 3.6.

Question 6

Thanks. Counter is an unordered collection, so sorted() should be necessary if we want to get the list of frequency by alphabetical order; is this correct?

Question 7

The release notes for Python 3.7 say: "The insertion-order preservation nature of dict objects has been declared an official part of the Python language spec." It's ambiguous whether that statement also applies to Counter, which is dict-like, but when I tried it on Python 3.7, it worked without sorted(), and I doubt that the order is coincidental.
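
The observation can be reproduced with a short sketch, assuming Python 3.7 or later. Counter is a subclass of dict, and its documentation notes that since 3.7 it inherits the ability to remember insertion order. The variable names below are illustrative.

```python
from collections import Counter
from string import ascii_lowercase

text = 'StackExchange Code Review'

# Keys are inserted in alphabetical order, all starting at zero.
frequency = Counter({c: 0 for c in ascii_lowercase})

# Updating existing keys changes their counts, not their position.
frequency.update(c for c in text.lower() if c in frequency)

is_dict_subclass = issubclass(Counter, dict)
already_alphabetical = list(frequency) == sorted(frequency)
print(is_dict_subclass, already_alphabetical)
```

The order is alphabetical only because the keys were inserted that way. A Counter built directly from the text would follow the order in which letters first appear.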

Question 8

I would probably still wrap it with sorted. There is major disagreement among the main Python devs as to whether insertion-order preservation is actually guaranteed or just likely to remain true. Given that sorted is O(n) for already sorted collections (because of Timsort), I wouldn't risk the code breaking in 5 years if some dev changes things.
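
The cost claim can be checked informally by counting comparisons. This is a rough sketch for CPython, not a benchmark, and the CountingKey wrapper and count_comparisons helper are hypothetical names introduced only for this illustration.

```python
from string import ascii_lowercase


class CountingKey:
    """Wraps a value and counts how often sorted() compares two keys."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        CountingKey.comparisons += 1
        return self.value < other.value


def count_comparisons(items):
    # Reset the counter, sort with the counting wrapper, report the tally.
    CountingKey.comparisons = 0
    sorted(items, key=CountingKey)
    return CountingKey.comparisons


letters = list(ascii_lowercase)          # already in alphabetical order
comparisons_when_sorted = count_comparisons(letters)
print(len(letters), comparisons_when_sorted)
```

For input that is already in order, the number of comparisons grows roughly in line with the number of items, so the extra sorted() call is cheap here.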

Question 9

In the first part of your snippet, you can compute each frequency directly, instead of starting from 0, by calling 'text.count' for each letter. This avoids the 'if' statement inside the 'for' loop and keeps your code short and clean.

Example:

text = 'StackExchange Code Review'
frequency = {chr(ltr) : text.lower().count(chr(ltr)) for ltr in range(ord('a'), ord('z') + 1)}
total = sum(frequency.values())

Note: with the 'text' in your snippet, the 'if' statement also has to test characters such as the space, which is not a key in 'frequency'; those tests are extra steps of execution.
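
As a variation on the example above, the lower-cased text can be computed once rather than once per letter. The sketch below also compares the result with collections.Counter, which is not part of this suggestion, to confirm that both give the same counts. The names lowered, by_count and by_counter are illustrative.

```python
from collections import Counter
from string import ascii_lowercase

text = 'StackExchange Code Review'
lowered = text.lower()  # lower-case once instead of once per letter

# str.count scans the whole text once for each of the 26 letters.
by_count = {letter: lowered.count(letter) for letter in ascii_lowercase}

# Counter scans the text a single time; missing letters read as 0.
tally = Counter(lowered)
by_counter = {letter: tally[letter] for letter in ascii_lowercase}

total = sum(by_count.values())
print(by_count == by_counter, total)
```

For a short text the 26 scans do not matter, but for long inputs a single pass is likely to be the better trade-off.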

Question 10

If your real problem is...

A downside of this method is that the constant shows up in the pydoc output for the module in which it is imported.

... then change the name of the import:

from string import ascii_lowercase as _lowercase_letters
frequency = {letter: 0 for letter in _lowercase_letters}

The leading underscore will prevent pydoc from automatically adding a reference to it.
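
One way to check this without generating the documentation is to ask pydoc which names it would list. The sketch below uses pydoc.visiblename, which exists in CPython's pydoc module but is an undocumented helper, so treat it as an assumption about current behaviour rather than a stable API.

```python
import pydoc
from string import ascii_lowercase as _lowercase_letters

frequency = {letter: 0 for letter in _lowercase_letters}

# visiblename() is the rule pydoc applies when deciding which names to list
# (undocumented helper; its behaviour could change between versions).
public_name_listed = bool(pydoc.visiblename('ascii_lowercase'))
private_name_listed = bool(pydoc.visiblename('_lowercase_letters'))
print(public_name_listed, private_name_listed)
```

Defining __all__ in the module is another common way to control what pydoc lists.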

The accepted answer, the one beginning "The printing portion should be simplified", was posted by 200_success on 2018-10-12 21:18:47Z.
