I like to play word games that give you a set of letters and award points for the words you can make out of those letters. I have some code that works, but it's pretty slow. I was wondering if anyone had any ideas on how to speed it up, or if this is even the right place for this.
Here's the code
import itertools
import os
import urllib.request
import sys

# PREWORK
DICTIONARY = os.path.join(sys.path[0], 'words.txt')
dictionary = []
with open(DICTIONARY) as f:
    [dictionary.append(word.strip().lower()) for word in f.read().split()]


def get_possible_dict_words(draw):
    """Get all possible words from a draw (list of letters) which are
    valid dictionary words. Use _get_permutations_draw and provided
    dictionary"""
    combinations = _get_permutations_draw(draw)
    listOfPossibleWords = []
    try:
        while True:
            combination = next(combinations)
            combination = ''.join(combination)
            if combination in listOfPossibleWords:
                continue
            if combination in dictionary:
                listOfPossibleWords.append(combination)
    except StopIteration:
        if listOfPossibleWords:
            return listOfPossibleWords
        else:
            print("There are no combinations made with those letters.")


def _get_permutations_draw(draw):
    """Helper to get all permutations of a draw (list of letters), hint:
    use itertools.permutations (order of letters matters)"""
    for _ in range(1, len(draw) + 1):
        for subset in itertools.permutations(draw, _):
            yield(subset)


def main():
    draw = input("Please enter each letter seperated by spaces: ").split(" ")
    print(get_possible_dict_words(draw))


if __name__ == '__main__':
    main()
3 Answers
@alecxe has the right idea, but there is more to improve
Review
- sys.path[0] is unnecessary: the default of open(filename) will look in the current directory, so there is no need to convert it to a full path beforehand.
- dictionary should also be a CONSTANT: since dictionary is a global constant it should be written in ALL_CAPS. Secondly, this variable is oddly named; a dictionary (dict) is a datatype (hashmap) in Python.
- Change the data type of dictionary: @alecxe mentioned this as well, but a set will give O(1) lookup instead of O(n), so we can make the dictionary of words a set for faster lookup times (a quick timing sketch follows this list).
- _ variables are used for variables you don't use later on. You do use the variable, so rename it.
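To make the list-versus-set point concrete, here is a minimal timing sketch (my illustration, not part of the original review); it assumes a words.txt file with one word per line, and the probe word is arbitrary:

import timeit

with open('words.txt') as f:
    words_list = [word.strip().lower() for word in f]
words_set = set(words_list)

# Membership in a list scans elements one by one: O(n) per lookup.
print(timeit.timeit('"zyzzyva" in words_list', globals=globals(), number=1000))

# Membership in a set is a hash lookup: O(1) on average.
print(timeit.timeit('"zyzzyva" in words_set', globals=globals(), number=1000))

On a typical dictionary file the set version should be orders of magnitude faster, which is exactly the gap this review points at.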
Code
from itertools import permutations

WORD_FILE = 'words.txt'


def read_words(filename):
    with open(filename) as f:
        return {word.strip().lower() for word in f.readlines()}


def get_possible_words(target, word_set):
    used_words = set()
    for comb in get_permutations(target):
        if comb in word_set:
            used_words.add(comb)
    return used_words or "There are no combinations made with those letters."


def get_permutations(target):
    for i in range(1, len(target) + 1):
        for subset in permutations(target, i):
            yield ''.join(subset)


def main():
    target = input("Please enter each letter seperated by spaces: ").split()
    print(get_possible_words(target, read_words(WORD_FILE)))


if __name__ == '__main__':
    main()
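As a quick sanity check of the refactor, here is a hypothetical usage example with a tiny hand-made word set (the three words are made up for illustration, not read from words.txt):

small_word_set = {'cat', 'act', 'at'}
print(get_possible_words(['c', 'a', 't'], small_word_set))
# Prints something like {'at', 'act', 'cat'} (sets are unordered).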
- You may want to clarify your sentence about _: it's by convention that it's used for temporary variables, but there's nothing inherently special about an underscore. It's a variable like any other. This may be a useful link to add to your answer. – Graham, Dec 21, 2018 at 22:32
- @Graham I should've made it clearer that it is convention to use _ for throwaway variables. Thanks for the link, will edit when I have a little more time. – Ludisposed, Dec 22, 2018 at 12:39
My idea when it comes to optimizing code is to check the runtime with a profiler. Python has a useful module called cProfile with which you can easily track how long the individual function calls take. With the main call wrapped in
import cProfile
...
cProfile.run('main()')
I get the following result
Please enter each letter seperated by spaces: t e s t d a r r e
['a', 'at', 'as', 'ad', 'art', 'test', 'dear', 'date', 'dare', 'desert', 'arrest']
4932040 function calls in 20.747 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 20.747 20.747 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 codecs.py:318(decode)
1 0.000 0.000 0.000 0.000 codecs.py:330(getstate)
986409 0.234 0.000 0.234 0.000 word_game.py:14(_in_list_of_possibles)
986379 14.626 0.000 14.626 0.000 word_game.py:21(_in_dictionary)
1 0.800 0.800 16.245 16.245 word_game.py:28(get_possible_dict_words)
986410 0.205 0.000 0.205 0.000 word_game.py:49(_get_permutations_draw)
1 0.000 0.000 20.747 20.747 word_game.py:57(main)
1 0.000 0.000 0.000 0.000 {built-in method _codecs.utf_8_decode}
1 0.000 0.000 20.747 20.747 {built-in method builtins.exec}
1 4.501 4.501 4.501 4.501 {built-in method builtins.input}
1 0.000 0.000 0.000 0.000 {built-in method builtins.len}
986410 0.205 0.000 0.411 0.000 {built-in method builtins.next}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
11 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
986409 0.175 0.000 0.175 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
I added two functions
def _in_list_of_possibles(combi, list_of_possibles):
    if combi in list_of_possibles:
        return True
    else:
        return False


def _in_dictionary(combi, dictionary):
    if combi in dictionary:
        return True
    else:
        return False
in order to get a better understanding of how long these steps take. With this we can see that the lookup of the combination in the dictionary is the culprit, whereas the lookup in the newly composed list of possibles doesn't really affect performance.
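As a side note (not part of the original profiling run), the same profile can be sorted by cumulative time with the standard pstats module, which surfaces the expensive dictionary lookup without reading the whole table; a rough sketch:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
main()                     # run the word game as usual
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)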
(deleted: My idea of a performance increase would go in the direction of multithreading the lookup in the dictionary.)
Following the idea of @ludisposed: Converting the dictionary into a set increases performance quite significantly.
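In the original code that boils down to a one-line change at load time; a minimal sketch of that change (the rest of the program stays as posted):

# Build the word collection as a set, so "combination in dictionary"
# becomes an average O(1) hash lookup instead of an O(n) list scan.
with open(DICTIONARY) as f:
    dictionary = {word.strip().lower() for word in f.read().split()}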
- While I agree with parts of your answer (use cProfile to check where the bottlenecks are), I don't agree with your last point. Why use multithreading, if you could make the dictionary variable a set, essentially giving O(1) lookup? – Ludisposed, Dec 20, 2018 at 14:39
- You are absolutely correct. According to the Zen of Python: simple is better than complex. – Uisdean Chapped, Dec 20, 2018 at 14:45
- I apologize for that, I fixed it in the answer. – Uisdean Chapped, Dec 20, 2018 at 14:56
A couple of straightforward things that should provide performance and memory usage improvements:
- The check "if combination in listOfPossibleWords" on a list is a slow full-scan operation with O(n) complexity. By switching to a set, you would get constant-time lookups:

      possible_words = set([])
      try:
          while True:
              combination = next(combinations)
              combination = ''.join(combination)
              if combination in possible_words:
                  continue
              if combination in dictionary:
                  possible_words.add(combination)
          # ...
  FYI, here is a handy complexity table for Python data structures: TimeComplexity
- When you are reading the file, f.read() would read the whole contents of the file into memory. You could iterate over the file line by line in a "lazy" manner instead:

      with open(DICTIONARY) as f:
          dictionary = [word.strip().lower() for word in f]
Note that I've also updated it to be a proper pythonic list comprehension.
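For a self-contained illustration of the lazy read (not from the original answer; the tiny temp file below is hypothetical), this shows that iterating the file object yields one line at a time:

import os
import tempfile

# Write a tiny stand-in for words.txt so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), 'words.txt')
with open(path, 'w') as f:
    f.write('Cat\nDog\nBird\n')

# Lazy, line-by-line iteration combined with the list comprehension.
with open(path) as f:
    dictionary = [word.strip().lower() for word in f]

print(dictionary)  # ['cat', 'dog', 'bird']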