Exercism assignment word-count in Python

Question 1

Can someone review my code which I put on exercism.io for the word-count exercise (found here)?

class Phrase:
 def __init__(self, phrase):
 self.phrase = phrase.strip().lower().split()
 def word_count(self):
 word_dict = {}
 for word in self.phrase:
 word_input = ''.join(e for e in word if e.isalnum()).strip()
 if word_input:
 word_dict[word_input] = word_dict.get(word_input, 0) + 1
 return word_dict

Question 2

I understand this is a Python exercise, but note that in real text word counting is not that easy. 'fish/meat' should be two words but 'A/C' only one. 'Los Angeles' should possibly be one word, while 'The Angels' are two words. It gets worse when you want to split sentences. nltk is a Python library that can help you with this if you're interested in natural language processing.

Question 3

My first advice is stop writing classes if you don't need to : your class has only two methods an init and a proper method. This could probably be written as a simple function.

def word_count(phrase):
 word_dict = {}
 for word in phrase.strip().lower().split():
 word_input = ''.join(e for e in word if e.isalnum()).strip()
 if word_input:
 word_dict[word_input] = word_dict.get(word_input, 0) + 1
 return word_dict

Also, a lot of string processing you are doing is not useful, complicated and/or potentially wrong :

On phrase, as noticed by Gareth Rees, there's not point calling strip() since you call split().

As I wasn't aware that the result would be the same, here a proof of concept :

a=' this is a little test'
a.split() == a.strip().split()
-> True

and here's a link to the documentation for strip and split.

On individual words : the way you get word_input from word looks a bit convoluted to me. Among other things, there's no point in calling strip() on a string that only contains alphanumerical characters. Also, just removing "special" characters does not sound like the best option : it's and its will be considered the same way, its annoying :-). Maybe some characters like ' should be taken into account during the splitting. Maybe, depending on the language you are to handle, other characters like - should be kept (for instance in French, "basketball" is "basket-ball" but neither "basket" nor "ball" are French words so splitting would be quite awkward and so would be removing the dash).

Except for that, your code looks go to me!

However, if you wanted to make things even more pythonic, you could use defaultdict. This example will probably look familar to you.

Question 4

There's no point calling .strip() at all, since you call .split() just afterwards.

Question 5

Some notes:

As already pointed out, an object-oriented approach seems overkill, a simple function will do.
self.phrase. That's very confusing, phrase is a string but self.phrase is a list of strings. A better name would be words.

I'd write it with a functional approach using collections.Counter:

from collections import Counter
def word_count(phrase):
 words = phrase.lower().split()
 return Counter("".join(c for c in word if c.isalnum()) for word in words)

Or even simpler:

from collections import Counter
import re
def word_count(phrase):
 return Counter(re.sub(r"[^\w ]+", '', phrase.lower()).split())

Question 6

This doesn't work on none ASCII characters such as: досвидания!

SylvainD SylvainDSylvainD 29.7k1 gold badge49 silver badges93 bronze badges · Answer 1 · 2014-02-18 08:50:05Z

My first advice is stop writing classes if you don't need to : your class has only two methods an init and a proper method. This could probably be written as a simple function.

def word_count(phrase):
 word_dict = {}
 for word in phrase.strip().lower().split():
 word_input = ''.join(e for e in word if e.isalnum()).strip()
 if word_input:
 word_dict[word_input] = word_dict.get(word_input, 0) + 1
 return word_dict

Also, a lot of string processing you are doing is not useful, complicated and/or potentially wrong :

On phrase, as noticed by Gareth Rees, there's not point calling strip() since you call split().

As I wasn't aware that the result would be the same, here a proof of concept :

a=' this is a little test'
a.split() == a.strip().split()
-> True

and here's a link to the documentation for strip and split.

On individual words : the way you get word_input from word looks a bit convoluted to me. Among other things, there's no point in calling strip() on a string that only contains alphanumerical characters. Also, just removing "special" characters does not sound like the best option : it's and its will be considered the same way, its annoying :-). Maybe some characters like ' should be taken into account during the splitting. Maybe, depending on the language you are to handle, other characters like - should be kept (for instance in French, "basketball" is "basket-ball" but neither "basket" nor "ball" are French words so splitting would be quite awkward and so would be removing the dash).

Except for that, your code looks go to me!

However, if you wanted to make things even more pythonic, you could use defaultdict. This example will probably look familar to you.

There's no point calling .strip() at all, since you call .split() just afterwards.

tokland toklandtokland 11.2k1 gold badge21 silver badges26 bronze badges · Answer 2 · 2014-02-18 13:35:47Z

Some notes:

As already pointed out, an object-oriented approach seems overkill, a simple function will do.
self.phrase. That's very confusing, phrase is a string but self.phrase is a list of strings. A better name would be words.

I'd write it with a functional approach using collections.Counter:

from collections import Counter
def word_count(phrase):
 words = phrase.lower().split()
 return Counter("".join(c for c in word if c.isalnum()) for word in words)

Or even simpler:

from collections import Counter
import re
def word_count(phrase):
 return Counter(re.sub(r"[^\w ]+", '', phrase.lower()).split())

This doesn't work on none ASCII characters such as: досвидания!

Stack Exchange Network

Exercism assignment word-count in Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Exercism assignment word-count in Python

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions