Counting letters in a text file

Question 1

As a beginner Python programmer, I wrote a simple program that counts how many times each letter appears in a text file. It works fine, but I'd like to know if it's possible to improve it.

def count_letters(filename, case_sensitive=False):
 alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
 with open(filename, 'r') as f:
 text = f.read()
 if not case_sensitive:
 alphabet = alphabet[:26]
 text = text.lower()
 letter_count = {ltr: 0 for ltr in alphabet}
 for char in text:
 if char in alphabet:
 letter_count[char] += 1
 for key in sorted(letter_count):
 print(key, letter_count[key])
 print("total:", sum(letter_count.values()))

Question 2

@flornquake points at the good direction (use string and collections.Counter) but I'd still modify some details:

alphabet = alphabet[:26] and text = text.lower(): My advice is not to override existing variables with new values, it makes code harder to understand. Use different names.
if char in alphabet: Make sure you perform inclusion predicates with hashes, sets or similar data structures, not lists/arrays. O(1) vs O(n).
Functions should return values (hopefully related with their name). Here it makes sense to return the counter.

I'd write:

import collections
import string
def count_letters(filename, case_sensitive=False):
 with open(filename, 'r') as f:
 original_text = f.read()
 if case_sensitive:
 alphabet = string.ascii_letters
 text = original_text
 else:
 alphabet = string.ascii_lowercase
 text = original_text.lower()
 alphabet_set = set(alphabet)
 counts = collections.Counter(c for c in text if c in alphabet_set)
 for letter in alphabet:
 print(letter, counts[letter])
 print("total:", sum(counts.values()))
 return counts

Question 3

You didn't explicitly mention it, but one of the best improvements is the initialization of the Counter from a generator expression. Nice.

Question 4

@codesparkle: I was to, but I realized that would be reviewing flownquake's answer, not OP :-) I'l add a comment to his question.

Question 5

Wouldn't using original_text = original_text.lower() be better? Avoids unneeded variable. Also what about non-existent files? That would create an unrecognizable error for user as saying that file doesn't exist in error is quite different from the error that original_text undeclared.

Question 6

@AseemBansal 1) reuse variables: Are you familiar with the functional paradigm? in FP you never ever use the same variable name (or symbol) for two different values (in the same scope, of course), because it's harder to reason about the algorithm. It's like in maths, you don't have x = 1, and just a step later x = 2, that makes no sense; the same when programming. More on this: en.wikipedia.org/wiki/Functional_programming 2) "what about non-existent files" The OP didn't stated what was to be done, so I did the same he did: nothing. Let the exception blow or be caught by the caller.

Question 7

@AseemBansal: In C is difficult (and usually unrealistic) to apply FP principles, but in Python (like in Ruby, Perl, Lua and other imperative languages), you definitely can (of course an inpure approach to FP, with side-effects). If you give it a try, you'll see that code improves noticiably, it's more much more declarative (describing not how but what you are doing). Perhaps the starting point would be: docs.python.org/3/howto/functional.html. If you happen to know Ruby, I wrote this: code.google.com/p/tokland/wiki/RubyFunctionalProgramming

Question 8

Your coding style is good, especially if you're only starting to program in Python. But to improve your code, you should learn your way around the standard library. In this case, the string and collections modules can make life easier for you.

import collections
import string
def count_letters(filename, case_sensitive=False):
 with open(filename, 'r') as f:
 text = f.read()
 if case_sensitive:
 alphabet = string.ascii_letters
 else:
 alphabet = string.ascii_lowercase
 text = text.lower()
 letter_count = collections.Counter()
 for char in text:
 if char in alphabet:
 letter_count[char] += 1
 for letter in alphabet:
 print(letter, letter_count[letter])
 print("total:", sum(letter_count.values()))

collections.Counter is, in essence, a dictionary with all values defaulting to 0.

Question 9

Note that collection.Counter may take an iterable, so there is no need to add values with a side-effect (letter_count[char] += 1).

Question 10

@tokland Good point.

tokland tokland 11.2k1 gold badge21 silver badges26 bronze badges · Accepted Answer · 2013-07-26 20:46:11Z

6

\$\begingroup\$

@flornquake points at the good direction (use string and collections.Counter) but I'd still modify some details:

alphabet = alphabet[:26] and text = text.lower(): My advice is not to override existing variables with new values, it makes code harder to understand. Use different names.
if char in alphabet: Make sure you perform inclusion predicates with hashes, sets or similar data structures, not lists/arrays. O(1) vs O(n).
Functions should return values (hopefully related with their name). Here it makes sense to return the counter.

I'd write:

import collections
import string
def count_letters(filename, case_sensitive=False):
 with open(filename, 'r') as f:
 original_text = f.read()
 if case_sensitive:
 alphabet = string.ascii_letters
 text = original_text
 else:
 alphabet = string.ascii_lowercase
 text = original_text.lower()
 alphabet_set = set(alphabet)
 counts = collections.Counter(c for c in text if c in alphabet_set)
 for letter in alphabet:
 print(letter, counts[letter])
 print("total:", sum(counts.values()))
 return counts

Share

edited Jan 18, 2019 at 11:41

answered Jul 26, 2013 at 20:46

tokland's user avatar

tokland tokland

11.2k1 gold badge21 silver badges26 bronze badges

\$\endgroup\$

7

\$\begingroup\$ You didn't explicitly mention it, but one of the best improvements is the initialization of the Counter from a generator expression. Nice. \$\endgroup\$

Adam
– Adam

2013年07月27日 09:03:02 +00:00
Commented Jul 27, 2013 at 9:03
\$\begingroup\$ @codesparkle: I was to, but I realized that would be reviewing flownquake's answer, not OP :-) I'l add a comment to his question. \$\endgroup\$

tokland
– tokland

2013年07月27日 12:09:20 +00:00
Commented Jul 27, 2013 at 12:09
\$\begingroup\$ Wouldn't using original_text = original_text.lower() be better? Avoids unneeded variable. Also what about non-existent files? That would create an unrecognizable error for user as saying that file doesn't exist in error is quite different from the error that original_text undeclared. \$\endgroup\$

Aseem Bansal
– Aseem Bansal

2013年07月29日 16:33:59 +00:00
Commented Jul 29, 2013 at 16:33
\$\begingroup\$ @AseemBansal 1) reuse variables: Are you familiar with the functional paradigm? in FP you never ever use the same variable name (or symbol) for two different values (in the same scope, of course), because it's harder to reason about the algorithm. It's like in maths, you don't have x = 1, and just a step later x = 2, that makes no sense; the same when programming. More on this: en.wikipedia.org/wiki/Functional_programming 2) "what about non-existent files" The OP didn't stated what was to be done, so I did the same he did: nothing. Let the exception blow or be caught by the caller. \$\endgroup\$

tokland
– tokland

2013年07月29日 16:39:10 +00:00
Commented Jul 29, 2013 at 16:39
1

\$\begingroup\$ @AseemBansal: In C is difficult (and usually unrealistic) to apply FP principles, but in Python (like in Ruby, Perl, Lua and other imperative languages), you definitely can (of course an inpure approach to FP, with side-effects). If you give it a try, you'll see that code improves noticiably, it's more much more declarative (describing not how but what you are doing). Perhaps the starting point would be: docs.python.org/3/howto/functional.html. If you happen to know Ruby, I wrote this: code.google.com/p/tokland/wiki/RubyFunctionalProgramming \$\endgroup\$

tokland
– tokland

2013年07月29日 18:08:33 +00:00
Commented Jul 29, 2013 at 18:08

| Show 2 more comments

Stack Exchange Network

Counting letters in a text file

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Counting letters in a text file

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions