This program should generate NxN boards where there are words going from left to right and words going from top to bottom. Words are valid if they appear in the scrabble dictionary.
For example, a valid 3x3 board would be:
ACT
TOO
EGO
as it contains ACT, TOO and EGO going across, and ATE, COG and TOO going down.
import itertools
N = 3
f = open("scrabble.txt")
words = [line.strip().upper() for line in f if len(line.strip()) == N]
words = sorted(words)
f.close()
def tuple_to_grid(t):
for i in t:
print(i)
print()
found = []
for i in itertools.product(words, repeat = N):
for j in range(N):
down = "".join([k[j] for k in i])
if down not in words:
break
else:
tuple_to_grid(i)
N
can be changed to change the size of the board. It can generate fine for 2x2 and 3x3 boards, but when it comes to 4x4 it takes hours to generate single boards.
Please comment on how to make this faster.
3 Answers 3
Your problem is here:
for i in itertools.product(words, repeat = N):
This brute-force approach is why it's taking so long. Let \$k\$ be the number of \$N\$-letter words. For two or three letters, the time is proportional to \$k^2\$ or \$k^3\$. Bad enough, but for four letters, it's \$k^4\,ドル which is horrendous. \$k\$ is probably significantly larger for four letters as well, depending on your word list.
So, we need to drastically prune the number of combinations to try. There are many ways this might be done. One idea would be to prune things a row at a time. Suppose the first row is "ACHE". Look at all words beginning with "A". None begin with "AA" or "AQ". So the next row can eliminate all words beginning with "A" or "Q". Similarly, second-row second letter can't be "B" or "C" or several other things, because no words begin with "CB" or "CC", etc. This approach should cut way down on the combinations.
Independently of the performance issue, you can improve your code by:
- using better, more descriptive, variable names
- using functions instead of putting all your code at top-level
- using
with
to handle the file for you - using
set
instead oflist
to check that a word is contained in the file - using
zip
to "rotate" the board - using generators to produce values one at a time
The code would look like:
import itertools
def words_of_length(length, dictionnary_file='scrabble.txt'):
with open(dictionnary_file) as words:
for line in words:
line = line.strip()
if len(line) == length:
yield line.upper()
def generate_grids(words):
pick = words.pop()
words.add(pick)
for candidates in itertools.product(words, repeat=len(pick)):
for down in zip(*candidates):
if ''.join(down) not in words:
break
else:
yield candidates
if __name__ == '__main__':
words = set(words_of_length(3))
for grid in generate_grids(words):
print('\n'.join(grid))
According to 4 Letter Words list at wordfinder.yourdictionary.com there are 4002 4-letter words. So there exist \4002ドル^4 \approx 256\cdot {10}^{12}\$ four-letter words fours (inluding all duplicates, like MEAL+SOUP+MEAT+MEAT or NONE+NONE+NONE+NONE).
Assuming your computer generates and tests one million combinations per second, it would need about 195 years to process the whole set of possibilities.
You definitely need to use some faster method, basically relying on cutting non-promising paths of search. The answer by Tom Zych contains useful hint.
-
\$\begingroup\$ I think it's actually 4002 * 4001 * 4000 * 3999, not 4002^4, but 2.56 * 10^4 is about right either way. \$\endgroup\$cwallenpoole– cwallenpoole2017年10月12日 18:50:50 +00:00Commented Oct 12, 2017 at 18:50
-
\$\begingroup\$ @cwallenpoole 10^14, not 10^4. Anyway, that would be the answer in the case of combinations without duplications, but the problem does not exclude repetitions (see examples I added to my answer). \$\endgroup\$CiaPan– CiaPan2017年10月13日 05:42:28 +00:00Commented Oct 13, 2017 at 5:42
-
\$\begingroup\$ ACK! Yes. That was a typo. it's 10^14. I thought the math behind a limited, unordered set was
(Possiblities!)/((max valid possibilities)!)
. In this case, that means4002*4001*4000*3999
. \$\endgroup\$cwallenpoole– cwallenpoole2017年10月13日 12:12:50 +00:00Commented Oct 13, 2017 at 12:12 -
\$\begingroup\$ @cwallenpoole If you count ordered subsets i.e. sequences with no repetitions from N–word dictionary, then yes: you can choose any of N words as the first one, then any of the remaining N–1 as the second one and so on. However repetitions are not forbidden here, so at each choice you have all N words in the dictionary to choose from. \$\endgroup\$CiaPan– CiaPan2017年10月16日 08:12:29 +00:00Commented Oct 16, 2017 at 8:12
-
\$\begingroup\$ That is a fair point \$\endgroup\$cwallenpoole– cwallenpoole2017年10月16日 14:35:26 +00:00Commented Oct 16, 2017 at 14:35