What is the recommended (standard, cleaner, more elegant, etc.) method in Python to get a string with all the alphabet letters?
Here are three versions of code that generate a dictionary with the letters as keys and 0 as every value. The aim is to parse a text and count how often each letter appears.
from string import ascii_lowercase
frequency = {letter: 0 for letter in ascii_lowercase}
A downside of this method is that the constant shows up in the pydoc
output for the module in which it is imported.
frequency = {chr(letter): 0 for letter in range(ord('a'), ord('z')+1)}
frequency = {letter: 0 for letter in 'abcdefghijklmnopqrstuvwxyz'}
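As a quick check, the three versions above build the same dictionary. The following is a minimal sketch: the variable names are made up for illustration, and `dict.fromkeys` is an additional spelling that does not appear in the versions above.

```python
from string import ascii_lowercase

# The three versions from above, each bound to its own (illustrative) name.
by_constant = {letter: 0 for letter in ascii_lowercase}
by_ordinals = {chr(code): 0 for code in range(ord('a'), ord('z') + 1)}
by_literal = {letter: 0 for letter in 'abcdefghijklmnopqrstuvwxyz'}

# An extra spelling, not in the original post: safe here because 0 is immutable.
by_fromkeys = dict.fromkeys(ascii_lowercase, 0)

print(by_constant == by_ordinals == by_literal == by_fromkeys)
```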
This is the snippet of code where it is used:
from collections import OrderedDict

frequency = {chr(letter): 0 for letter in range(ord('a'), ord('z')+1)}
# The 'text' variable is taken from somewhere else, we define it here for example's sake
text = 'StackExchange Code Review'
# The running total must start at zero before counting
total = 0
for c in text.lower():
    if c in frequency.keys():
        frequency[c] += 1
        total += 1
output = OrderedDict(sorted(frequency.items(), key = lambda x: x[0]))
for l in output.keys():
    print(l + ':' + '{:>8}'.format(output[l]) + '{:9.2f}'.format(100 * output[l] / (total if total > 0 else 1)) + ' %')
- What are you really trying to accomplish with this code? Please provide the context. See How to Ask. (200_success, Oct 12, 2018 at 18:32)
- I have added some context. However, should a dictionary not be the best way to do so, I'm still curious to know the answer to my original question ("how to generate the alphabet in Python"). Thanks in advance. (dr_, Oct 12, 2018 at 18:36)
- "The aim is to parse a text and count how often a letter appears." Then post that code too, so that we can review your code properly. (200_success, Oct 12, 2018 at 18:38)
3 Answers
The printing portion should be simplified:
- The use of `OrderedDict` is superfluous. A list of pairs would suffice.
- I'd prefer to stick to the same iteration variable when it is used to refer to the same thing. Choose either `c` or `l`.
- You are not using `str.format()` effectively.
- `(total if total > 0 else 1)` could just be `(total or 1)`.
for c, n in sorted(frequency.items()):
    print('{}:{:>8}{:9.2f}%'.format(c, n, 100 * n / (total or 1)))
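On Python 3.6 or later, the same format specification could also be written as an f-string. This is only a sketch and not part of the answer's code; `format_row` is a hypothetical helper name introduced for illustration.

```python
def format_row(letter, count, total):
    """Return one report line: letter, right-aligned count, percentage."""
    # (total or 1) avoids dividing by zero when no letters were counted.
    percent = 100 * count / (total or 1)
    # Same field widths as the str.format() call: 8 for the count, 9.2f for the percentage.
    return f'{letter}:{count:>8}{percent:9.2f}%'


print(format_row('a', 2, 23))
```

Keeping the formatting in one place makes it easy to reuse for every `(c, n)` pair from `sorted(frequency.items())`.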
As for the counting process itself, you could write `c in frequency` instead of `c in frequency.keys()`. But, really, the task is so naturally suited to a `collections.Counter` that it would be silly not to use it.
from collections import Counter as _Counter
from string import ascii_lowercase as _ascii_lowercase
text = 'StackExchange Code Review'
frequency = _Counter({c: 0 for c in _ascii_lowercase})
frequency.update(c for c in text.lower() if c in frequency)
total = sum(frequency.values())
print('\n'.join(
    '{}:{:>8}{:9.2f}%'.format(c, n, 100 * n / (total or 1))
    for c, n in sorted(frequency.items())
))
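I think that the call to `sorted()` should be optional as of Python 3.6, since the keys are inserted in alphabetical order and dictionaries preserve insertion order (an implementation detail of CPython 3.6, guaranteed by the language from Python 3.7).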
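A small check of that ordering claim, assuming Python 3.7 or later, where `Counter` (a `dict` subclass) keeps keys in insertion order. It is a sketch for illustration; the variable `keys_in_insertion_order` is not part of the code above.

```python
from collections import Counter
from string import ascii_lowercase

text = 'StackExchange Code Review'

# Seed the counter with every letter, in alphabetical order, at zero.
frequency = Counter({c: 0 for c in ascii_lowercase})
# Only letters already present as keys are counted, so no new keys are added.
frequency.update(c for c in text.lower() if c in frequency)

keys_in_insertion_order = list(frequency)
print(keys_in_insertion_order == sorted(frequency))
```

Note that this only holds for plain iteration; `Counter.most_common()` orders by count instead.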
- Thanks. `Counter` is an unordered collection, so `sorted()` should be necessary if we want to get the list of frequencies in alphabetical order; is this correct? (dr_, Oct 13, 2018 at 18:35)
- The release notes for Python 3.7 say: "The insertion-order preservation nature of dict objects has been declared an official part of the Python language spec." It's ambiguous whether that statement also applies to `Counter`, which is dict-like, but when I tried it on Python 3.7, it worked without `sorted()`, and I doubt that the order is coincidental. (200_success, Oct 13, 2018 at 19:06)
- I would probably still wrap with `sorted`. There is major disagreement among the main Python devs as to whether or not insertion order preservation is actually guaranteed or just probably going to remain true. Given that `sorted` is O(n) for already sorted collections (because of Timsort), I wouldn't risk the code breaking in 5 years if some dev changes things. (Oscar Smith, Oct 13, 2018 at 22:17)
In the first part of your snippet, instead of initializing every value to 0, you can compute the frequency directly by calling `text.count` for each letter. This lets you avoid the `if` statement inside the `for` loop and keeps your code clean and short.
Example:
text = 'StackExchange Code Review'
frequency = {chr(ltr) : text.lower().count(chr(ltr)) for ltr in range(ord('a'), ord('z') + 1)}
total = sum(frequency.values())
Note: with the `text` in your snippet, the `if` statement is also evaluated for the space character, which is never a key in `frequency`, so it costs extra steps of execution.
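One trade-off worth keeping in mind: `str.count` scans the whole text once per letter, so the comprehension makes 26 passes, whereas a `collections.Counter` tallies everything in a single pass. The sketch below compares the two under that assumption; the names `lowered`, `by_count`, `seen` and `by_counter` are illustrative only.

```python
from collections import Counter
from string import ascii_lowercase

text = 'StackExchange Code Review'
lowered = text.lower()  # lowercase once instead of once per letter

# One str.count() scan per letter: 26 passes over the text.
by_count = {c: lowered.count(c) for c in ascii_lowercase}

# A single pass over the text, then a lookup per letter (missing keys give 0).
seen = Counter(lowered)
by_counter = {c: seen[c] for c in ascii_lowercase}

print(by_count == by_counter, sum(by_count.values()))
```

For a short string the difference is negligible; it only starts to matter for long texts.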
If your real problem is...
A downside of this method is that the constant shows up in the pydoc output for the module in which it is imported.
... then change the name of the import:
from string import ascii_lowercase as _lowercase_letters
frequency = {letter: 0 for letter in _lowercase_letters}
The leading underscore will prevent pydoc from automatically adding a reference to it.
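To see the effect without writing files, the sketch below builds two throwaway modules in memory and renders their pydoc text. The module names `public_demo` and `private_demo` and the helper `render` are hypothetical, and it assumes that `pydoc.render_doc` with the `pydoc.plaintext` renderer accepts a module object created with `types.ModuleType`.

```python
import pydoc
import types
from string import ascii_lowercase


def render(module):
    """Render a module's pydoc text without terminal bold sequences."""
    return pydoc.render_doc(module, renderer=pydoc.plaintext)


# Stand-in for a module that does: from string import ascii_lowercase
public = types.ModuleType('public_demo')
public.ascii_lowercase = ascii_lowercase

# Stand-in for a module that does: from string import ascii_lowercase as _lowercase_letters
private = types.ModuleType('private_demo')
private._lowercase_letters = ascii_lowercase

public_doc = render(public)
private_doc = render(private)

print('ascii_lowercase' in public_doc)
print('_lowercase_letters' in private_doc)
```

Defining `__all__` in the module is another way to control which names pydoc lists.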