4x4 word grid generator in Python

Question 1

This program should generate NxN boards where there are words going from left to right and words going from top to bottom. Words are valid if they appear in the scrabble dictionary.

For example, a valid 3x3 board would be:

ACT
TOO
EGO

as it contains ACT, TOO and EGO going across, and ATE, COG and TOO going down.

import itertools
N = 3
f = open("scrabble.txt")
words = [line.strip().upper() for line in f if len(line.strip()) == N]
words = sorted(words)
f.close()
def tuple_to_grid(t):
 for i in t:
 print(i)
 print()
found = []
for i in itertools.product(words, repeat = N):
 for j in range(N):
 down = "".join([k[j] for k in i])
 if down not in words:
 break
 else:
 tuple_to_grid(i)

N can be changed to change the size of the board. It can generate fine for 2x2 and 3x3 boards, but when it comes to 4x4 it takes hours to generate single boards.

Please comment on how to make this faster.

Question 2

Your problem is here:

for i in itertools.product(words, repeat = N):

This brute-force approach is why it's taking so long. Let \$k\$ be the number of \$N\$-letter words. For two or three letters, the time is proportional to \$k^2\$ or \$k^3\$. Bad enough, but for four letters, it's \$k^4\,ドル which is horrendous. \$k\$ is probably significantly larger for four letters as well, depending on your word list.

So, we need to drastically prune the number of combinations to try. There are many ways this might be done. One idea would be to prune things a row at a time. Suppose the first row is "ACHE". Look at all words beginning with "A". None begin with "AA" or "AQ". So the next row can eliminate all words beginning with "A" or "Q". Similarly, second-row second letter can't be "B" or "C" or several other things, because no words begin with "CB" or "CC", etc. This approach should cut way down on the combinations.

Question 3

Independently of the performance issue, you can improve your code by:

using better, more descriptive, variable names
using functions instead of putting all your code at top-level
using with to handle the file for you
using set instead of list to check that a word is contained in the file
using zip to "rotate" the board
using generators to produce values one at a time

The code would look like:

import itertools
def words_of_length(length, dictionnary_file='scrabble.txt'):
 with open(dictionnary_file) as words:
 for line in words:
 line = line.strip()
 if len(line) == length:
 yield line.upper()
def generate_grids(words):
 pick = words.pop()
 words.add(pick)
 for candidates in itertools.product(words, repeat=len(pick)):
 for down in zip(*candidates):
 if ''.join(down) not in words:
 break
 else:
 yield candidates
if __name__ == '__main__':
 words = set(words_of_length(3))
 for grid in generate_grids(words):
 print('\n'.join(grid))

Question 4

According to 4 Letter Words list at wordfinder.yourdictionary.com there are 4002 4-letter words. So there exist \4002ドル^4 \approx 256\cdot {10}^{12}\$ four-letter words fours (inluding all duplicates, like MEAL+SOUP+MEAT+MEAT or NONE+NONE+NONE+NONE).

Assuming your computer generates and tests one million combinations per second, it would need about 195 years to process the whole set of possibilities.

You definitely need to use some faster method, basically relying on cutting non-promising paths of search. The answer by Tom Zych contains useful hint.

Question 5

I think it's actually 4002 * 4001 * 4000 * 3999, not 4002^4, but 2.56 * 10^4 is about right either way.

Question 6

@cwallenpoole 10^14, not 10^4. Anyway, that would be the answer in the case of combinations without duplications, but the problem does not exclude repetitions (see examples I added to my answer).

Question 7

ACK! Yes. That was a typo. it's 10^14. I thought the math behind a limited, unordered set was (Possiblities!)/((max valid possibilities)!). In this case, that means 4002*4001*4000*3999.

Question 8

@cwallenpoole If you count ordered subsets i.e. sequences with no repetitions from N–word dictionary, then yes: you can choose any of N words as the first one, then any of the remaining N–1 as the second one and so on. However repetitions are not forbidden here, so at each choice you have all N words in the dictionary to choose from.

Question 9

That is a fair point

Tom Zych Tom Zych 4214 silver badges14 bronze badges · Accepted Answer · 2017-10-12 09:44:57Z

Your problem is here:

for i in itertools.product(words, repeat = N):

This brute-force approach is why it's taking so long. Let \$k\$ be the number of \$N\$-letter words. For two or three letters, the time is proportional to \$k^2\$ or \$k^3\$. Bad enough, but for four letters, it's \$k^4\,ドル which is horrendous. \$k\$ is probably significantly larger for four letters as well, depending on your word list.

So, we need to drastically prune the number of combinations to try. There are many ways this might be done. One idea would be to prune things a row at a time. Suppose the first row is "ACHE". Look at all words beginning with "A". None begin with "AA" or "AQ". So the next row can eliminate all words beginning with "A" or "Q". Similarly, second-row second letter can't be "B" or "C" or several other things, because no words begin with "CB" or "CC", etc. This approach should cut way down on the combinations.

Stack Exchange Network

4x4 word grid generator in Python

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

4x4 word grid generator in Python

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions