I need to write code for the given problem:
I'm provided with a word and I need to find all the possible combination of it that matches with a given list of words in a file.
Here's my code. Can I make it much better? I'm sure it can be done. Please offer suggestions.
dict = {}
file = open("/usr/share/dict/words", "r")
for word in file: #Iterate through every word in the dictionary
word = word.strip().lower() #Strip newlines and send to lowercase
sorted_word = ''.join(sorted(word)) #Alphabetically sort the word
if sorted_word in dict: #Check if sorted_word is already a key
if word not in dict[sorted_word]: #Make sure word is not already in the list under the key sorted_word
dict[sorted_word].append(word) #Add to list under the key sorted_word
else: #If not in dictionary
dict[sorted_word] = [word] #Create new list with one entry
while(True): #Loop until quit is typed
jumble = raw_input("Enter a jumble to decode or 'quit': ") #Get input
jumble = jumble.lower() #Send jumble to lower
if(jumble == "quit"): #Quit if quit is typed
break
jumble = ''.join(sorted(jumble)) #Sort jumble alphabetical
if jumble in dict: #If sorted jumble exists in dictionary
results = dict[jumble] #Get list of words that match jumble
for result in results: #Loop through list printing results
print result, #Trailing , designates that there should be a space between entries
print "" #Print newlines
else: #Jumble not found in dictionary print message
print "Results for jumble not found"
3 Answers 3
Hum, first of all, there are way too many comments. People simply won't read them, you should only keep the meaningful ones. For example, you could remove:
if sorted_word in dict: #Check if sorted_word is already a key
Anybody who knows how Python dictionaries work will know what this lign is doing, the code is explicit by itself. Instead of always repeating what your dict
is doing, you should just explain once what it does when you declare it; the code will tell the rest. You should only explain once at the beginning how your algorithm work and let your code tell the rest. Honestly, you could remove almost all of your comments. Python is designed to be readable and it is.
Second point, avoid to name a variable dict
: it is already the name of a built-in type. It's bad pratice and can be confusing for people who will try to read your code.
Also, you open your file once, but never close it. You should close it once you have filled your dict
. You could use the with open('...', 'r') as f:
construct to have your file closed automatically at the end of the with
block, therefore, you won't even have to remember to close it. Also, file
, like dict
is also the name of a built-in type in Python, find another name :)
-
\$\begingroup\$ 100% agreed about too many comments. My general advice here is similar to yours: the lines of code themselves tell the tiny details, so instead find blocks of code and explain what they accomplish together. E.g. "Create a mapping from the sorted characters of a word to all the words made from those characters" for the first block; and "Look up and print all words created using all the same characters entered" for the second block. \$\endgroup\$Michael Urman– Michael Urman2013年12月10日 14:31:36 +00:00Commented Dec 10, 2013 at 14:31
On top of the comments from Morwenn : setdefault
and join
will do exactly what you want to do and make your code more expressive.
Here what your code can become :
#!/usr/bin/python
sorted_words = {}
with open("/usr/share/dict/words", "r") as f:
for word in f:
word = word.strip().lower()
sorted_word = ''.join(sorted(word))
sorted_words.setdefault(sorted_word,[]).append(word)
while True:
jumble = raw_input("Enter a jumble to decode or 'quit': ").lower()
if jumble == "quit":
break
sorted_jumble = ''.join(sorted(jumble))
if sorted_jumble in sorted_words:
print ", ".join(sorted_words[sorted_jumble])
else:
print "Results for jumble not found"
-
2\$\begingroup\$ Big +1; I was halfway through writing a response with the same advice. However there is one important behavior change with the above: it doesn't filter duplicate words (say from a bad dictionary file). I was going to suggest
set
instead oflist
for this, but that changes the order printed later. Note as well thatcollections.defaultdict(list)
lets you just callsorted_words[sorted_word].append(word)
. There's also one typo change: use", ".join
to keep the spacing between resulting words. \$\endgroup\$Michael Urman– Michael Urman2013年12月10日 14:25:40 +00:00Commented Dec 10, 2013 at 14:25 -
\$\begingroup\$ Thanks for pointing this out. I've edited my answer. As for the set, it completely went out of my mind. Given the source for words, I don't know if it is a big deal. \$\endgroup\$SylvainD– SylvainD2013年12月10日 14:28:48 +00:00Commented Dec 10, 2013 at 14:28
-
\$\begingroup\$ I agree with Michael Urman:
collections.defaultdict
is clearly indicated here. \$\endgroup\$Gareth Rees– Gareth Rees2013年12月10日 15:25:53 +00:00Commented Dec 10, 2013 at 15:25
One thing to consider: by using quit
as a magic word, you cannot look up anagrams of it. One approach you could use to enable looking up anagrams of quit
is to instead loop until EOF (typically input by pressing ctrl+d or ctrl+z enter) by using something like this:
while True:
try:
jumble = raw_input("Enter a jumble to decode: ")
except EOFError:
break
...
Alternately you could exit on an empty string, which of course has no anagrams.
sit
find anagrams including bothits
andit's
, and your code will not do so. \$\endgroup\$