As a beginner Python programmer, I wrote a simple program that counts how many times each letter appears in a text file. It works fine, but I'd like to know if it's possible to improve it.
def count_letters(filename, case_sensitive=False):
alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
with open(filename, 'r') as f:
text = f.read()
if not case_sensitive:
alphabet = alphabet[:26]
text = text.lower()
letter_count = {ltr: 0 for ltr in alphabet}
for char in text:
if char in alphabet:
letter_count[char] += 1
for key in sorted(letter_count):
print(key, letter_count[key])
print("total:", sum(letter_count.values()))
2 Answers 2
@flornquake points at the good direction (use string
and collections.Counter
) but I'd still modify some details:
alphabet = alphabet[:26]
andtext = text.lower()
: My advice is not to override existing variables with new values, it makes code harder to understand. Use different names.if char in alphabet
: Make sure you perform inclusion predicates with hashes, sets or similar data structures, not lists/arrays. O(1) vs O(n).Functions should return values (hopefully related with their name). Here it makes sense to return the counter.
I'd write:
import collections
import string
def count_letters(filename, case_sensitive=False):
with open(filename, 'r') as f:
original_text = f.read()
if case_sensitive:
alphabet = string.ascii_letters
text = original_text
else:
alphabet = string.ascii_lowercase
text = original_text.lower()
alphabet_set = set(alphabet)
counts = collections.Counter(c for c in text if c in alphabet_set)
for letter in alphabet:
print(letter, counts[letter])
print("total:", sum(counts.values()))
return counts
-
\$\begingroup\$ You didn't explicitly mention it, but one of the best improvements is the initialization of the
Counter
from a generator expression. Nice. \$\endgroup\$Adam– Adam2013年07月27日 09:03:02 +00:00Commented Jul 27, 2013 at 9:03 -
\$\begingroup\$ @codesparkle: I was to, but I realized that would be reviewing flownquake's answer, not OP :-) I'l add a comment to his question. \$\endgroup\$tokland– tokland2013年07月27日 12:09:20 +00:00Commented Jul 27, 2013 at 12:09
-
\$\begingroup\$ Wouldn't using
original_text = original_text.lower()
be better? Avoids unneeded variable. Also what about non-existent files? That would create an unrecognizable error for user as saying that file doesn't exist in error is quite different from the error thatoriginal_text
undeclared. \$\endgroup\$Aseem Bansal– Aseem Bansal2013年07月29日 16:33:59 +00:00Commented Jul 29, 2013 at 16:33 -
\$\begingroup\$ @AseemBansal 1) reuse variables: Are you familiar with the functional paradigm? in FP you never ever use the same variable name (or symbol) for two different values (in the same scope, of course), because it's harder to reason about the algorithm. It's like in maths, you don't have x = 1, and just a step later x = 2, that makes no sense; the same when programming. More on this: en.wikipedia.org/wiki/Functional_programming 2) "what about non-existent files" The OP didn't stated what was to be done, so I did the same he did: nothing. Let the exception blow or be caught by the caller. \$\endgroup\$tokland– tokland2013年07月29日 16:39:10 +00:00Commented Jul 29, 2013 at 16:39
-
1\$\begingroup\$ @AseemBansal: In C is difficult (and usually unrealistic) to apply FP principles, but in Python (like in Ruby, Perl, Lua and other imperative languages), you definitely can (of course an inpure approach to FP, with side-effects). If you give it a try, you'll see that code improves noticiably, it's more much more declarative (describing not how but what you are doing). Perhaps the starting point would be: docs.python.org/3/howto/functional.html. If you happen to know Ruby, I wrote this: code.google.com/p/tokland/wiki/RubyFunctionalProgramming \$\endgroup\$tokland– tokland2013年07月29日 18:08:33 +00:00Commented Jul 29, 2013 at 18:08
Your coding style is good, especially if you're only starting to program in Python. But to improve your code, you should learn your way around the standard library. In this case, the string
and collections
modules can make life easier for you.
import collections
import string
def count_letters(filename, case_sensitive=False):
with open(filename, 'r') as f:
text = f.read()
if case_sensitive:
alphabet = string.ascii_letters
else:
alphabet = string.ascii_lowercase
text = text.lower()
letter_count = collections.Counter()
for char in text:
if char in alphabet:
letter_count[char] += 1
for letter in alphabet:
print(letter, letter_count[letter])
print("total:", sum(letter_count.values()))
collections.Counter
is, in essence, a dictionary with all values defaulting to 0
.
-
1\$\begingroup\$ Note that
collection.Counter
may take an iterable, so there is no need to add values with a side-effect (letter_count[char] += 1
). \$\endgroup\$tokland– tokland2013年07月27日 12:11:11 +00:00Commented Jul 27, 2013 at 12:11 -
\$\begingroup\$ @tokland Good point. \$\endgroup\$flornquake– flornquake2013年07月27日 18:27:12 +00:00Commented Jul 27, 2013 at 18:27