2
\$\begingroup\$

What is the recommended (standard, cleaner, more elegant, etc.) method in Python to get a string with all the alphabet letters?

Here are three versions of code that generate a dictionary with the letters as keys and 0 as every value. The aim is to parse a text and count how often each letter appears.


from string import ascii_lowercase
frequency = {letter: 0 for letter in ascii_lowercase}

A downside of this method is that the constant shows up in the pydoc output for the module in which it is imported.


frequency = {chr(letter): 0 for letter in range(ord('a'), ord('z')+1)}

frequency = {letter: 0 for letter in 'abcdefghijklmnopqrstuvwxyz'}
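
As a quick sanity check, the three versions above can be compared directly. This is only a sketch: the variable names are made up for illustration, and `dict.fromkeys` is a fourth spelling that does not appear in the original question.

```python
from string import ascii_lowercase

# Three ways to build the same zero-initialised letter table.
from_constant = {letter: 0 for letter in ascii_lowercase}
from_code_points = {chr(code): 0 for code in range(ord('a'), ord('z') + 1)}
from_literal = {letter: 0 for letter in 'abcdefghijklmnopqrstuvwxyz'}

# Not in the question: dict.fromkeys gives the same mapping without a comprehension.
from_fromkeys = dict.fromkeys(ascii_lowercase, 0)

all_equal = from_constant == from_code_points == from_literal == from_fromkeys
print(all_equal, len(from_constant))  # True 26
```

All four produce an identical dictionary, so the choice between them is a matter of readability rather than behaviour.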

This is the snippet of code where it is used:

from collections import OrderedDict

frequency = {chr(letter): 0 for letter in range(ord('a'), ord('z')+1)}
total = 0
# The 'text' variable is taken from somewhere else, we define it here for example's sake
text = 'StackExchange Code Review'
for c in text.lower():
    if c in frequency.keys():
        frequency[c] += 1
        total += 1
output = OrderedDict(sorted(frequency.items(), key = lambda x: x[0]))
for l in output.keys():
    print(l + ':' + '{:>8}'.format(output[l]) + '{:9.2f}'.format(100 * output[l] / (total if total > 0 else 1)) + ' %')
200_success
asked Oct 12, 2018 at 18:31
\$\endgroup\$
  • \$\begingroup\$ What are you really trying to accomplish with this code? Please provide the context. See How to Ask. \$\endgroup\$ Commented Oct 12, 2018 at 18:32
  • \$\begingroup\$ I have added some context. However, even if a dictionary is not the best way to do this, I'm still curious to know the answer to my original question ("how to generate the alphabet in Python"). Thanks in advance. \$\endgroup\$ Commented Oct 12, 2018 at 18:36
  • \$\begingroup\$ "The aim is to parse a text and count how often a letter appears." Then post that code too, so that we can review your code properly. \$\endgroup\$ Commented Oct 12, 2018 at 18:38

3 Answers

4
\$\begingroup\$

The printing portion should be simplified:

  • The use of OrderedDict is superfluous. A list of pairs would suffice.
  • I'd prefer to stick to the same iteration variable, when it is used to refer to the same thing. Choose either c or l.
  • You are not using str.format() effectively.
  • (total if total > 0 else 1) could just be (total or 1).

for c, n in sorted(frequency.items()):
    print('{}:{:>8}{:9.2f}%'.format(c, n, 100 * n / (total or 1)))

As for the counting process itself, you could write c in frequency instead of c in frequency.keys(). But, really, the task is so naturally suited to a collections.Counter that it would be silly not to use it.

from collections import Counter as _Counter
from string import ascii_lowercase as _ascii_lowercase
text = 'StackExchange Code Review'
frequency = _Counter({c: 0 for c in _ascii_lowercase})
frequency.update(c for c in text.lower() if c in frequency)
total = sum(frequency.values())
print('\n'.join(
    '{}:{:>8}{:9.2f}%'.format(c, n, 100 * n / (total or 1))
    for c, n in sorted(frequency.items())
))

I think that the call to sorted() should be optional as of Python 3.6.
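
To check the ordering claim, here is a small sketch that assumes Python 3.7 or later, where `Counter` (a `dict` subclass) iterates in insertion order. The names `keys_in_order` and `already_sorted` are illustrative only.

```python
from collections import Counter
from string import ascii_lowercase

text = 'StackExchange Code Review'

# Keys are inserted in alphabetical order; updating existing keys does not reorder them.
frequency = Counter({c: 0 for c in ascii_lowercase})
frequency.update(c for c in text.lower() if c in frequency)

# Assumption: Python 3.7+, where iteration order equals insertion order.
keys_in_order = list(frequency)
already_sorted = keys_in_order == sorted(keys_in_order)
print(already_sorted)
```

Note that this only concerns plain iteration. `Counter.most_common()` orders by count, so it would not give an alphabetical listing.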

answered Oct 12, 2018 at 21:18
\$\endgroup\$
  • \$\begingroup\$ Thanks. Counter is an unordered collection, so sorted() should be necessary if we want to get the list of frequency by alphabetical order; is this correct? \$\endgroup\$ Commented Oct 13, 2018 at 18:35
  • \$\begingroup\$ The release notes for Python 3.7 say: "The insertion-order preservation nature of dict objects has been declared an official part of the Python language spec." It's ambiguous whether that statement also applies to Counter, which is dict-like, but when I tried it on Python 3.7, it worked without sorted(), and I doubt that the order is coincidental. \$\endgroup\$ Commented Oct 13, 2018 at 19:06
  • \$\begingroup\$ I would probably still wrap it with sorted. There is major disagreement among the main Python devs as to whether insertion order preservation is actually guaranteed or just probably going to remain true. Given that sorted is O(n) for already sorted collections (because of Timsort), I wouldn't risk the code breaking in 5 years if some dev changes things. \$\endgroup\$ Commented Oct 13, 2018 at 22:17
3
\$\begingroup\$

In the first part of your snippet, instead of initializing every value to 0, you can compute each frequency directly by calling 'text.count' for each letter. This avoids the if statement inside the for loop and keeps your code clean and short.

Example:

text = 'StackExchange Code Review'
frequency = {chr(ltr) : text.lower().count(chr(ltr)) for ltr in range(ord('a'), ord('z') + 1)}
total = sum(frequency.values())

Note: with the 'text' in your snippet, the if statement also tests characters such as the space, which is never a key in 'frequency', so it costs extra steps of execution.
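
One trade-off worth noting: the comprehension scans the text once per letter (26 calls to `str.count`), and the example above also lower-cases the text on every iteration. A hedged sketch of the same idea with the lower-casing hoisted out, compared against a single-pass tally; `lowered`, `by_count`, `tally` and `by_single_pass` are illustrative names, not part of the original snippet.

```python
from collections import Counter
from string import ascii_lowercase

text = 'StackExchange Code Review'
lowered = text.lower()  # lower-case once instead of once per letter

# One str.count call per letter: 26 scans of the text, no explicit if.
by_count = {letter: lowered.count(letter) for letter in ascii_lowercase}

# Single pass for comparison: tally every character, then keep only the letters.
tally = Counter(lowered)
by_single_pass = {letter: tally[letter] for letter in ascii_lowercase}

total = sum(by_count.values())
print(by_count == by_single_pass, total)  # True 23
```

For short texts the difference is negligible; for long texts the single pass may be preferable.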

answered Oct 14, 2018 at 14:43
\$\endgroup\$
2
\$\begingroup\$

If your real problem is...

A downside of this method is that the constant shows up in the pydoc output for the module in which it is imported.

... then change the name of the import:

from string import ascii_lowercase as _lowercase_letters
frequency = {letter: 0 for letter in _lowercase_letters}

The leading underscore will prevent pydoc from automatically adding a reference to it.
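
This can be checked without writing files. The sketch below builds two throwaway modules in memory and renders their pydoc text; the helper `render` and the module names `demo_public` and `demo_private` are hypothetical, chosen only for this illustration.

```python
import pydoc
import types


def render(source, name):
    """Build a throwaway module from source and return its pydoc text."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    # The plaintext renderer avoids the overstrike "bold" of the default one.
    return pydoc.render_doc(module, renderer=pydoc.plaintext)


public_doc = render("from string import ascii_lowercase\n", "demo_public")
private_doc = render(
    "from string import ascii_lowercase as _lowercase_letters\n",
    "demo_private",
)

# Is the imported name listed in each module's documentation?
print('ascii_lowercase' in public_doc)
print('_lowercase_letters' in private_doc)
```

This relies on pydoc skipping names that start with an underscore when the module does not define `__all__`. Defining `__all__` explicitly should be another way to control what is listed.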

answered Oct 12, 2018 at 18:55
\$\endgroup\$
