Jumble solver algorithm

Question 1

I need to write code for the given problem:

I'm provided with a word and I need to find all the possible combination of it that matches with a given list of words in a file.

Here's my code. Can I make it much better? I'm sure it can be done. Please offer suggestions.

dict = {} 
file = open("/usr/share/dict/words", "r") 
for word in file: #Iterate through every word in the dictionary
 word = word.strip().lower() #Strip newlines and send to lowercase
 sorted_word = ''.join(sorted(word)) #Alphabetically sort the word
 if sorted_word in dict: #Check if sorted_word is already a key
 if word not in dict[sorted_word]: #Make sure word is not already in the list under the key sorted_word
 dict[sorted_word].append(word) #Add to list under the key sorted_word
 else: #If not in dictionary
 dict[sorted_word] = [word] #Create new list with one entry
while(True): #Loop until quit is typed
 jumble = raw_input("Enter a jumble to decode or 'quit': ") #Get input
 jumble = jumble.lower() #Send jumble to lower
 if(jumble == "quit"): #Quit if quit is typed
 break
 jumble = ''.join(sorted(jumble)) #Sort jumble alphabetical
 if jumble in dict: #If sorted jumble exists in dictionary
 results = dict[jumble] #Get list of words that match jumble
 for result in results: #Loop through list printing results
 print result, #Trailing , designates that there should be a space between entries
 print "" #Print newlines
 else: #Jumble not found in dictionary print message
 print "Results for jumble not found"

Question 2

You may want to handle some special characters, like apostrophe, more directly. It's quite possible that you'd want to let sit find anagrams including both its and it's, and your code will not do so.

Question 3

Hum, first of all, there are way too many comments. People simply won't read them, you should only keep the meaningful ones. For example, you could remove:

if sorted_word in dict: #Check if sorted_word is already a key

Anybody who knows how Python dictionaries work will know what this lign is doing, the code is explicit by itself. Instead of always repeating what your dict is doing, you should just explain once what it does when you declare it; the code will tell the rest. You should only explain once at the beginning how your algorithm work and let your code tell the rest. Honestly, you could remove almost all of your comments. Python is designed to be readable and it is.

Second point, avoid to name a variable dict: it is already the name of a built-in type. It's bad pratice and can be confusing for people who will try to read your code.

Also, you open your file once, but never close it. You should close it once you have filled your dict. You could use the with open('...', 'r') as f: construct to have your file closed automatically at the end of the with block, therefore, you won't even have to remember to close it. Also, file, like dict is also the name of a built-in type in Python, find another name :)

Question 4

100% agreed about too many comments. My general advice here is similar to yours: the lines of code themselves tell the tiny details, so instead find blocks of code and explain what they accomplish together. E.g. "Create a mapping from the sorted characters of a word to all the words made from those characters" for the first block; and "Look up and print all words created using all the same characters entered" for the second block.

Question 5

On top of the comments from Morwenn : setdefault and join will do exactly what you want to do and make your code more expressive.

Here what your code can become :

#!/usr/bin/python
sorted_words = {}
with open("/usr/share/dict/words", "r") as f:
 for word in f:
 word = word.strip().lower()
 sorted_word = ''.join(sorted(word))
 sorted_words.setdefault(sorted_word,[]).append(word)
while True:
 jumble = raw_input("Enter a jumble to decode or 'quit': ").lower()
 if jumble == "quit":
 break
 sorted_jumble = ''.join(sorted(jumble))
 if sorted_jumble in sorted_words:
 print ", ".join(sorted_words[sorted_jumble])
 else:
 print "Results for jumble not found"

Question 6

Big +1; I was halfway through writing a response with the same advice. However there is one important behavior change with the above: it doesn't filter duplicate words (say from a bad dictionary file). I was going to suggest set instead of list for this, but that changes the order printed later. Note as well that collections.defaultdict(list) lets you just call sorted_words[sorted_word].append(word). There's also one typo change: use ", ".join to keep the spacing between resulting words.

Question 7

Thanks for pointing this out. I've edited my answer. As for the set, it completely went out of my mind. Given the source for words, I don't know if it is a big deal.

Question 8

I agree with Michael Urman: collections.defaultdict is clearly indicated here.

Question 9

One thing to consider: by using quit as a magic word, you cannot look up anagrams of it. One approach you could use to enable looking up anagrams of quit is to instead loop until EOF (typically input by pressing ctrl+d or ctrl+z enter) by using something like this:

while True:
 try:
 jumble = raw_input("Enter a jumble to decode: ")
 except EOFError:
 break
 ...

Alternately you could exit on an empty string, which of course has no anagrams.

Morwenn Morwenn 20.2k3 gold badges69 silver badges132 bronze badges · Answer 1 · 2013-12-10 14:03:12Z

Hum, first of all, there are way too many comments. People simply won't read them, you should only keep the meaningful ones. For example, you could remove:

if sorted_word in dict: #Check if sorted_word is already a key

Anybody who knows how Python dictionaries work will know what this lign is doing, the code is explicit by itself. Instead of always repeating what your dict is doing, you should just explain once what it does when you declare it; the code will tell the rest. You should only explain once at the beginning how your algorithm work and let your code tell the rest. Honestly, you could remove almost all of your comments. Python is designed to be readable and it is.

Second point, avoid to name a variable dict: it is already the name of a built-in type. It's bad pratice and can be confusing for people who will try to read your code.

Also, you open your file once, but never close it. You should close it once you have filled your dict. You could use the with open('...', 'r') as f: construct to have your file closed automatically at the end of the with block, therefore, you won't even have to remember to close it. Also, file, like dict is also the name of a built-in type in Python, find another name :)

100% agreed about too many comments. My general advice here is similar to yours: the lines of code themselves tell the tiny details, so instead find blocks of code and explain what they accomplish together. E.g. "Create a mapping from the sorted characters of a word to all the words made from those characters" for the first block; and "Look up and print all words created using all the same characters entered" for the second block.

SylvainD SylvainD 29.7k1 gold badge49 silver badges93 bronze badges · Answer 2 · 2013-12-10 14:21:34Z

6

\$\begingroup\$

On top of the comments from Morwenn : setdefault and join will do exactly what you want to do and make your code more expressive.

Here what your code can become :

#!/usr/bin/python
sorted_words = {}
with open("/usr/share/dict/words", "r") as f:
 for word in f:
 word = word.strip().lower()
 sorted_word = ''.join(sorted(word))
 sorted_words.setdefault(sorted_word,[]).append(word)
while True:
 jumble = raw_input("Enter a jumble to decode or 'quit': ").lower()
 if jumble == "quit":
 break
 sorted_jumble = ''.join(sorted(jumble))
 if sorted_jumble in sorted_words:
 print ", ".join(sorted_words[sorted_jumble])
 else:
 print "Results for jumble not found"

Share

edited Dec 10, 2013 at 14:27

answered Dec 10, 2013 at 14:21

SylvainD's user avatar

SylvainD SylvainD

29.7k1 gold badge49 silver badges93 bronze badges

\$\endgroup\$

3

2

\$\begingroup\$ Big +1; I was halfway through writing a response with the same advice. However there is one important behavior change with the above: it doesn't filter duplicate words (say from a bad dictionary file). I was going to suggest set instead of list for this, but that changes the order printed later. Note as well that collections.defaultdict(list) lets you just call sorted_words[sorted_word].append(word). There's also one typo change: use ", ".join to keep the spacing between resulting words. \$\endgroup\$

Michael Urman
– Michael Urman

2013年12月10日 14:25:40 +00:00
Commented Dec 10, 2013 at 14:25
\$\begingroup\$ Thanks for pointing this out. I've edited my answer. As for the set, it completely went out of my mind. Given the source for words, I don't know if it is a big deal. \$\endgroup\$

SylvainD
– SylvainD

2013年12月10日 14:28:48 +00:00
Commented Dec 10, 2013 at 14:28
\$\begingroup\$ I agree with Michael Urman: collections.defaultdict is clearly indicated here. \$\endgroup\$

Gareth Rees
– Gareth Rees

2013年12月10日 15:25:53 +00:00
Commented Dec 10, 2013 at 15:25

Add a comment |

Michael Urman Michael Urman 3,8701 gold badge12 silver badges23 bronze badges · Answer 3 · 2013-12-10 14:39:53Z

One thing to consider: by using quit as a magic word, you cannot look up anagrams of it. One approach you could use to enable looking up anagrams of quit is to instead loop until EOF (typically input by pressing ctrl+d or ctrl+z enter) by using something like this:

while True:
 try:
 jumble = raw_input("Enter a jumble to decode: ")
 except EOFError:
 break
 ...

Alternately you could exit on an empty string, which of course has no anagrams.

Stack Exchange Network

Jumble solver algorithm

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Jumble solver algorithm

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions