I've got an Algorithm class whose responsibility is to find whether a given word is in a list of words.
As part of doing that, the algorithm first has to lowercase the words, remove punctuation, and remove any stopwords. Then, finally, the algorithm tries to find the given word in the cleaned list via the find_word method.
As you can see in the code, I have split everything into small methods to avoid having one big method. However, I am not sure what the proper way to call those methods is. I am calling them all inside the find_word method one after the other, but that smells like bad design to me. How do I do this the right way?
import string
from nltk.corpus import stopwords

stopwords = stopwords.words('english')


class Algorithm:
    def __init__(self, words: list):
        self.words = words

    def _lower(self):
        # Lowercase all words of the list
        self.words = [word.lower() for word in self.words]

    def _remove_punctuation(self):
        # Remove punctuation from all words of the list
        punct_dict = dict((ord(punct), None) for punct in string.punctuation)  # creates dict of {33:None, 34:None, etc}
        self.words = [words.translate(punct_dict) for words in self.words]

    def _remove_stopwords(self):
        # Remove stopwords from list such as in, at, who, etc.
        self.words = [word for word in self.words if word not in stopwords]

    def find_word(self, user_word):
        # Find if the user given word is in the cleaned list
        self._lower()
        self._remove_punctuation()
        self._remove_stopwords()
        if user_word in self.words:
            return user_word


class Chatbot:
    def answer(self, user_word):
        algorithm = Algorithm(['Rabbit', 'Horse', 'turtle'])
        return algorithm.find_word(user_word)


chatbot = Chatbot()
print(chatbot.answer('horse'))
1 Answer
The only obviously bad thing in your design is that if you call find_word multiple times on the same Algorithm instance, you are duplicating the preparation work on the word list.
That can be remedied by calling _lower, _remove_punctuation and _remove_stopwords from __init__.
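A minimal sketch of that change, not a definitive implementation. It assumes a small hard-coded STOPWORDS set as a stand-in for the NLTK list, so that the snippet runs without the corpus data; STOPWORDS and PUNCT_TABLE are names introduced here for illustration only.

```python
import string

# Assumption: a tiny stand-in for nltk's stopwords.words('english'),
# used only so this sketch runs without the NLTK corpus installed.
STOPWORDS = {"in", "at", "who", "the", "a"}

# Translation table that deletes every punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


class Algorithm:
    def __init__(self, words: list):
        self.words = words
        # Clean the word list once, at construction time.
        self._lower()
        self._remove_punctuation()
        self._remove_stopwords()

    def _lower(self):
        self.words = [word.lower() for word in self.words]

    def _remove_punctuation(self):
        self.words = [word.translate(PUNCT_TABLE) for word in self.words]

    def _remove_stopwords(self):
        self.words = [word for word in self.words if word not in STOPWORDS]

    def find_word(self, user_word):
        # No preparation work here any more, only the lookup.
        if user_word in self.words:
            return user_word
        return None


algorithm = Algorithm(['Rabbit', 'Horse!', 'turtle', 'At'])
print(algorithm.find_word('horse'))  # horse
print(algorithm.find_word('at'))     # None
```

With this layout, repeated calls to find_word only perform the membership test; the cleaning cost is paid once per instance.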
Apart from that, you might want to do some preprocessing on the user_word as well, so that find_word('hORSE') would also work, and maybe even find_word('!horse!').
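One way this could look, as a rough sketch: a hypothetical normalize helper (not part of the original code) is applied both to the word list and to the user's word, so the two are compared in the same cleaned form. Stopword removal is left out to keep the example short, and returning the cleaned word rather than the raw input is an assumption about what the caller wants.

```python
import string

# Translation table that deletes every punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(word: str) -> str:
    # Hypothetical helper: lowercase the word and strip punctuation.
    return word.lower().translate(PUNCT_TABLE)


class Algorithm:
    def __init__(self, words: list):
        # Stopword removal is omitted here to keep the sketch short.
        self.words = [normalize(word) for word in words]

    def find_word(self, user_word):
        # Clean the user's word the same way as the word list.
        cleaned = normalize(user_word)
        if cleaned in self.words:
            return cleaned
        return None


algorithm = Algorithm(['Rabbit', 'Horse', 'turtle'])
print(algorithm.find_word('hORSE'))    # horse
print(algorithm.find_word('!horse!'))  # horse
```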