6
\$\begingroup\$

Can someone review my code which I put on exercism.io for the word-count exercise (found here)?

class Phrase:
 def __init__(self, phrase):
 self.phrase = phrase.strip().lower().split()
 def word_count(self):
 word_dict = {}
 for word in self.phrase:
 word_input = ''.join(e for e in word if e.isalnum()).strip()
 if word_input:
 word_dict[word_input] = word_dict.get(word_input, 0) + 1
 return word_dict
SylvainD
29.7k1 gold badge49 silver badges93 bronze badges
asked Feb 18, 2014 at 8:12
\$\endgroup\$
1
  • \$\begingroup\$ I understand this is a Python exercise, but note that in real text word counting is not that easy. 'fish/meat' should be two words but 'A/C' only one. 'Los Angeles' should possibly be one word, while 'The Angels' are two words. It gets worse when you want to split sentences. nltk is a Python library that can help you with this if you're interested in natural language processing. \$\endgroup\$ Commented Feb 18, 2014 at 9:57

2 Answers 2

5
\$\begingroup\$

My first advice is stop writing classes if you don't need to : your class has only two methods an init and a proper method. This could probably be written as a simple function.

def word_count(phrase):
 word_dict = {}
 for word in phrase.strip().lower().split():
 word_input = ''.join(e for e in word if e.isalnum()).strip()
 if word_input:
 word_dict[word_input] = word_dict.get(word_input, 0) + 1
 return word_dict

Also, a lot of string processing you are doing is not useful, complicated and/or potentially wrong :

  • On phrase, as noticed by Gareth Rees, there's not point calling strip() since you call split().

As I wasn't aware that the result would be the same, here a proof of concept :

a=' this is a little test'
a.split() == a.strip().split()
-> True

and here's a link to the documentation for strip and split.

  • On individual words : the way you get word_input from word looks a bit convoluted to me. Among other things, there's no point in calling strip() on a string that only contains alphanumerical characters. Also, just removing "special" characters does not sound like the best option : it's and its will be considered the same way, its annoying :-). Maybe some characters like ' should be taken into account during the splitting. Maybe, depending on the language you are to handle, other characters like - should be kept (for instance in French, "basketball" is "basket-ball" but neither "basket" nor "ball" are French words so splitting would be quite awkward and so would be removing the dash).

Except for that, your code looks go to me!

However, if you wanted to make things even more pythonic, you could use defaultdict. This example will probably look familar to you.

answered Feb 18, 2014 at 8:50
\$\endgroup\$
1
  • 2
    \$\begingroup\$ There's no point calling .strip() at all, since you call .split() just afterwards. \$\endgroup\$ Commented Feb 18, 2014 at 11:06
5
\$\begingroup\$

Some notes:

  • As already pointed out, an object-oriented approach seems overkill, a simple function will do.
  • self.phrase. That's very confusing, phrase is a string but self.phrase is a list of strings. A better name would be words.

I'd write it with a functional approach using collections.Counter:

from collections import Counter
def word_count(phrase):
 words = phrase.lower().split()
 return Counter("".join(c for c in word if c.isalnum()) for word in words)

Or even simpler:

from collections import Counter
import re
def word_count(phrase):
 return Counter(re.sub(r"[^\w ]+", '', phrase.lower()).split())
Mast
13.8k12 gold badges55 silver badges127 bronze badges
answered Feb 18, 2014 at 13:35
\$\endgroup\$
1
  • \$\begingroup\$ This doesn't work on none ASCII characters such as: досвидания! \$\endgroup\$ Commented Aug 29, 2016 at 19:25

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.