I like to play word games that give you a set of letters and award points for the words you can make out of those letters. I have some code that works, but it's pretty slow. I was wondering if anyone had any ideas on how to speed it up, or if this is even the right place for this.
Here's the code
import itertools
import os
import urllib.request
import sys

# PREWORK
DICTIONARY = os.path.join(sys.path[0], 'words.txt')
dictionary = []
with open(DICTIONARY) as f:
    [dictionary.append(word.strip().lower()) for word in f.read().split()]


def get_possible_dict_words(draw):
    """Get all possible words from a draw (list of letters) which are
    valid dictionary words. Use _get_permutations_draw and provided
    dictionary"""
    combinations = _get_permutations_draw(draw)
    listOfPossibleWords = []
    try:
        while True:
            combination = next(combinations)
            combination = ''.join(combination)
            if combination in listOfPossibleWords:
                continue
            if combination in dictionary:
                listOfPossibleWords.append(combination)
    except StopIteration:
        if listOfPossibleWords:
            return listOfPossibleWords
        else:
            print("There are no combinations made with those letters.")


def _get_permutations_draw(draw):
    """Helper to get all permutations of a draw (list of letters), hint:
    use itertools.permutations (order of letters matters)"""
    for _ in range(1, len(draw) + 1):
        for subset in itertools.permutations(draw, _):
            yield(subset)


def main():
    draw = input("Please enter each letter seperated by spaces: ").split(" ")
    print(get_possible_dict_words(draw))


if __name__ == '__main__':
    main()
3 Answers
@alecxe has the right idea, but there is more to improve
Review
- sys.path[0] is unnecessary: the default of open(filename) will look in the current directory, so there is no need to convert it to a full path beforehand.
- dictionary should also be a CONSTANT: since dictionary is a global constant it should be written in ALL_CAPS. Secondly, this variable is oddly named; a dictionary (dict) is a datatype (hashmap) in Python.
- Change the data type of dictionary: @alecxe mentioned this as well, but a set will give O(1) lookup instead of O(n), so we can make the dictionary of words a set for faster lookup times (a quick timing sketch follows this list).
- _ variables are used for variables you don't use later on. You do use the variable, so rename it.
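To make the list-versus-set point concrete, here is a minimal timing sketch (my illustration, not part of the original review); it assumes a words.txt file with one word per line, and the probe word is arbitrary:

import timeit

with open('words.txt') as f:
    words_list = [word.strip().lower() for word in f]
words_set = set(words_list)

# Membership in a list scans elements one by one: O(n) per lookup.
print(timeit.timeit('"zyzzyva" in words_list', globals=globals(), number=1000))

# Membership in a set is a hash lookup: O(1) on average.
print(timeit.timeit('"zyzzyva" in words_set', globals=globals(), number=1000))

On a typical dictionary file the set version should be orders of magnitude faster, which is exactly the gap this review points at.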
Code
from itertools import permutations

WORD_FILE = 'words.txt'


def read_words(filename):
    with open(filename) as f:
        return {word.strip().lower() for word in f.readlines()}


def get_possible_words(target, word_set):
    used_words = set()
    for comb in get_permutations(target):
        if comb in word_set:
            used_words.add(comb)
    return used_words or "There are no combinations made with those letters."


def get_permutations(target):
    for i in range(1, len(target) + 1):
        for subset in permutations(target, i):
            yield ''.join(subset)


def main():
    target = input("Please enter each letter seperated by spaces: ").split()
    print(get_possible_words(target, read_words(WORD_FILE)))


if __name__ == '__main__':
    main()
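As a quick sanity check of the refactor, here is a hypothetical usage example with a tiny hand-made word set (the three words are made up for illustration, not read from words.txt):

small_word_set = {'cat', 'act', 'at'}
print(get_possible_words(['c', 'a', 't'], small_word_set))
# Prints something like {'at', 'act', 'cat'} (sets are unordered).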
- You may want to clarify your sentence about _: it's by convention that it's used for temporary variables, but there's nothing inherently special about an underscore. It's a variable like any other. This may be a useful link to add to your answer. – Graham, Dec 21, 2018 at 22:32
- @Graham I should've made it clearer that it is convention to use _ for throwaway variables. Thanks for the link, will edit when I have a little more time. – Ludisposed, Dec 22, 2018 at 12:39
My idea when it comes to optimizing code is to check the runtime with a profiler. Python has a useful module called cProfile with which you can easily track how long the individual function calls take. With the main call wrapped in
import cProfile
...
cProfile.run('main()')
I get the following result
Please enter each letter seperated by spaces: t e s t d a r r e
['a', 'at', 'as', 'ad', 'art', 'test', 'dear', 'date', 'dare', 'desert', 'arrest']
4932040 function calls in 20.747 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 20.747 20.747 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 codecs.py:318(decode)
1 0.000 0.000 0.000 0.000 codecs.py:330(getstate)
986409 0.234 0.000 0.234 0.000 word_game.py:14(_in_list_of_possibles)
986379 14.626 0.000 14.626 0.000 word_game.py:21(_in_dictionary)
1 0.800 0.800 16.245 16.245 word_game.py:28(get_possible_dict_words)
986410 0.205 0.000 0.205 0.000 word_game.py:49(_get_permutations_draw)
1 0.000 0.000 20.747 20.747 word_game.py:57(main)
1 0.000 0.000 0.000 0.000 {built-in method _codecs.utf_8_decode}
1 0.000 0.000 20.747 20.747 {built-in method builtins.exec}
1 4.501 4.501 4.501 4.501 {built-in method builtins.input}
1 0.000 0.000 0.000 0.000 {built-in method builtins.len}
986410 0.205 0.000 0.411 0.000 {built-in method builtins.next}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
11 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
986409 0.175 0.000 0.175 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
I added two functions
def _in_list_of_possibles(combi, list_of_possibles):
    if combi in list_of_possibles:
        return True
    else:
        return False


def _in_dictionary(combi, dictionary):
    if combi in dictionary:
        return True
    else:
        return False
in order to get a better understanding of how long these steps take. With this we can see that the lookup of the combination in the dictionary is the culprit, whereas the lookup in the newly composed list of possibles doesn't really affect performance.
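As a side note (not part of the original profiling run), the same profile can be sorted by cumulative time with the standard pstats module, which surfaces the expensive dictionary lookup without reading the whole table; a rough sketch:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
main()                     # run the word game as usual
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)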
(deleted: My idea of a performance increase would go in the direction of multithreading the lookup in the dictionary.)
Following the idea of @ludisposed: Converting the dictionary into a set increases performance quite significantly.
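In the original code that boils down to a one-line change at load time; a minimal sketch of that change (the rest of the program stays as posted):

# Build the word collection as a set, so "combination in dictionary"
# becomes an average O(1) hash lookup instead of an O(n) list scan.
with open(DICTIONARY) as f:
    dictionary = {word.strip().lower() for word in f.read().split()}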
- While I agree with parts of your answer (use cProfile to check where the bottlenecks are), I don't agree with your last point. Why use multithreading, if you could make the dictionary variable a set, essentially giving O(1) lookup? – Ludisposed, Dec 20, 2018 at 14:39
- You are absolutely correct. According to the Zen of Python: simple is better than complex. – Uisdean Chapped, Dec 20, 2018 at 14:45
- I apologize for that, I fixed it in the answer. – Uisdean Chapped, Dec 20, 2018 at 14:56
A couple of straightforward things that should provide performance and memory usage improvements:
- The check "if combination in listOfPossibleWords" on a list is a slow full-scan operation with O(n) complexity. By switching to a set, you would get constant-time lookups:

      possible_words = set([])
      try:
          while True:
              combination = next(combinations)
              combination = ''.join(combination)
              if combination in possible_words:
                  continue
              if combination in dictionary:
                  possible_words.add(combination)
          # ...
  FYI, here is a handy complexity table for Python data structures: TimeComplexity
- When you are reading the file, f.read() would read the whole contents of the file into memory. You could iterate over the file line by line in a "lazy" manner instead:

      with open(DICTIONARY) as f:
          dictionary = [word.strip().lower() for word in f]
Note that I've also updated it to be a proper pythonic list comprehension.
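For a self-contained illustration of the lazy read (not from the original answer; the tiny temp file below is hypothetical), this shows that iterating the file object yields one line at a time:

import os
import tempfile

# Write a tiny stand-in for words.txt so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), 'words.txt')
with open(path, 'w') as f:
    f.write('Cat\nDog\nBird\n')

# Lazy, line-by-line iteration combined with the list comprehension.
with open(path) as f:
    dictionary = [word.strip().lower() for word in f]

print(dictionary)  # ['cat', 'dog', 'bird']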