My current code implementing a depth-first search works well for puzzle 1. But my code is too slow; I think I use too many loops and I am not sure I am using the right data structure. So I have a problem in puzzle 3,4,5. Please help me to make my code run faster.
Description: The crossword file begins with a grid showing where the words go (represented by the * character) and the spaces around the words (represented by dots). Then there is a list of words, one per line, all in caps, that are to be placed into the crossword grid. The words should be placed in the grid so that they fit the number of asterisks and so that anywhere that words intersect each other, they share a letter. ( you can run puzzle1.txt to see the result)
puzzle1.txt
..........
.******...
...*......
...*......
...*****..
...*..*...
......*...
......*...
......*...
..........
POLAND
LHASA
SPAIN
INDIA
puzzle3.txt
.*.*.*.
.*.*.*.
.*.*.*.
*******
*.*.*.*
*.*.*.*
*.*.*.*
ABCDEFG
AAAA
BBBB
CCCC
DDDD
EEEE
FFFF
GGGG
puzzle4.txt
...........**********......
...*.......*......*........
...*...******.....*........
...*.......*......*........
...*****...*......*........
...*..*....*......*........
......*******.....*........
......*..*........*...*....
......*..*........*********
.........*............*....
......******..........*....
......*...........*****....
......*...............*....
......*...............*....
......*....................
......*....................
......*....................
......*....................
AUSTRALIA
GUATEMALA
PANAMA
PORTUGAL
IRELAND
MYANMAR
LHASA
MONTENEGRO
MALTA
CHINA
SPAIN
BHUTAN
DENMARK
INDIA
Source code
from collections import defaultdict
import os
import copy
import time
class search:
# Constructor
def __init__(self, file_name):
self.filename = file_name
self.board = [] ## list of board *..**
self.rows = 0
self.cols = 0
self.blank_tiles=[] ##pair row and col that have valid answer
self.answerlist =[] ## list of answer
self.count = 0
self.possibleboard = []
self.checkanswerlist = []
def addanswer(self,word):
self.answerlist.append(word)
def printverhorlist(self):
print(self.blank_tiles)
def verhorlist(self):
for j in range(self.rows):
col_lis = []
for i in range(self.cols):
# if current node is * append to row list , if rowlist >1 ( which mean it is a word) append row col to blank_tiles
if self.board[j][i] == "*":
col_lis.append((j, i))
elif len(col_lis) == 1:
col_lis = []
elif len(col_lis) > 1:
self.blank_tiles.append(col_lis)
col_lis = []
if j == self.cols-1 and len(col_lis) > 1:
self.blank_tiles.append(col_lis)
col_lis = []
for j in range(self.cols):
row_lis = []
for i in range(self.rows):
# if current node is * append to row list , if rowlist >1 ( which mean it is a word) append row col to blank_tiles
if self.board[i][j] == "*":
row_lis.append((i, j))
elif len(row_lis) == 1:
row_lis = []
elif len(row_lis) > 1:
self.blank_tiles.append(row_lis)
row_lis = []
if i == self.rows-1 and len(row_lis) > 1: # end of a column
self.blank_tiles.append(row_lis)
row_lis = []
def printboard(self,board):
print("#"*self.cols)
print("\n")
for row in board:
print("".join(row))
def printanswer(self):
for a in self.answerlist:
print(a, end = " ")
print("\n")
def openfile(self):
file = open(self.filename, "r")
a = file.readlines() # read in whole file as 1D list of strings
try:
for i in range(len(a)):
if a[i][0] == "*" or a[i][0] == ".":
self.board.append(a[i].strip())
else:
self.answerlist.append(a[i].strip())
except:
print("Cannot open file")
self.cols = len(self.board[0])
self.rows = len(self.board)
print(self.rows,self.cols)
def checkfill(self,board):
for m in range(self.rows):
for n in range(self.cols):
if board[m][n] == "*":
return 0
return 1
def startfind(self):
start = time.time()
self.find_match(self.board, self.blank_tiles, self.answerlist)
stop = time.time()
times = (stop - start) * 1000
print("Run time takes %d miliseconds" % times)
if not self.checkfill(self.board):
print("Cannot solve this puzzle")
else:
print("It takes %d step to solve this puzzle" % self.count)
print("There are %d possible board"%len(self.possibleboard))
def find_match(self,board, blank_tiles, lis):
if (self.checkfill(board) and len(blank_tiles) == 0): ## check no ** in board, and all the anwer in check answerlist
if board not in self.possibleboard:
self.board = board
self.possibleboard.append(board)
self.printboard(board)
# if no tiles left, print the board
if len(blank_tiles) == 0:
self.board = board
# p***
else:
for tile in blank_tiles: ##pair row and col that have valid answer
for word in lis: ## answerword
# check all word can place to the space or the space is all ***
#board[tile[i][0]][tile[i][1]] is current node
lengthanswer = len(tile)
if len(word) == len(tile) and all( board[tile[i][0]][tile[i][1]] == "*" or board[tile[i][0]][tile[i][1]] == word[i] for i in range(lengthanswer)):
new_board = copy.deepcopy(board) #create newboard
# if word not in self.checkanswerlist:
# self.checkanswerlist.append(word)
for i in range(len(tile)): #fill the word in a row and array
new_lis1 = list(new_board[tile[i][0]])
new_lis1[tile[i][1]] = word[i]
new_board[tile[i][0]] = "".join(new_lis1)
new_blank_tiles = blank_tiles[:] # remove an row and col already place in list
new_blank_tiles.remove(tile)
new_lis2 = lis[:] # remove word already place in
new_lis2.remove(word)
# self.printboard(new_board)
self.count = self.count +1
self.find_match(new_board, new_blank_tiles, new_lis2)
if __name__ == "__main__":
g = search("puzzle1.txt")
g.openfile()
g.printboard(g.board)
g.verhorlist()
g.printverhorlist()
g.printanswer()
g.startfind()
#g.printanswer()
-
2\$\begingroup\$ Can you add some description indicating what this code is supposed to do and what is a correct output and what this puzzle is about? \$\endgroup\$watch-this– watch-this2019年10月04日 03:12:11 +00:00Commented Oct 4, 2019 at 3:12
-
\$\begingroup\$ The crossword file begins with a grid showing where the words go (represented by the * character) and the spaces around the words (represented by dots). Then there is a list of words, one per line, all in caps, that are to be placed into the crossword grid. The words should be placed in the grid so that they fit the number of asterisks and so that anywhere that words intersect each other, they share a letter. ( you can run puzzle1.txt to se the result). \$\endgroup\$Hoang Hai– Hoang Hai2019年10月04日 05:05:41 +00:00Commented Oct 4, 2019 at 5:05
-
\$\begingroup\$ I want my code to be run faster, like you can see the runtime guides, because for now, my code run puzzle 3 -4 -5 is takes forever \$\endgroup\$Hoang Hai– Hoang Hai2019年10月04日 05:06:54 +00:00Commented Oct 4, 2019 at 5:06
-
1\$\begingroup\$ Thank you for accepting my review however, I suggest you undo the accept and just tick the upvote ^ sign for now to give other reviewers a chance to give some reviews and if my review or anyone's turns out to be the best, you might accept it at the end. I'm editing your code to add the description you indicated in the comment above. \$\endgroup\$watch-this– watch-this2019年10月04日 05:18:20 +00:00Commented Oct 4, 2019 at 5:18
2 Answers 2
Style
I suggest you check PEP0008, https://www.python.org/dev/peps/pep-0008/ the official Python style guide which will be your guide to write a Pythonic code and Flake8 http://flake8.pycqa.org/en/latest/ as a tool for style enforcement.
Clean up:
from collections import defaultdict
andimport os
are unused import statements and should be cleaned up as well as# g.printanswer()
at the last line of the code.Class Names Class names should normally use the CapWords convention. Example:
ClassNameExample
andclass search:
should be:class Search:
Function/method names Function names should be lowercase, with words separated by underscores as necessary to improve readability. Example:
def first_method(self, param):
def addanswer(self,word):
should be:def add_answer(self, word):
Note the comma
,
is separated by space on the right side for readability and same applies to all other methodsdef printverhorlist(self):
def verhorlist(self):
...Docstrings Python documentation strings (or docstrings) provide a convenient way of associating documentation with Python modules, functions, classes, and methods. You should indicate docstrings to your classes/methods/functions indicating what they do, what they return and type hints when necessary. Example:
class MyClass: """Description goes here.""" def first_method(self, param1: int, param2: str): """Do things and return stuff."""
And this applies to all your methods defined under your
search
class as long as they are not meant for internal use(private methods) which are written with a leading underscore:def _private_method(self, param):
Blank lines Surround top-level function and class definitions with two blank lines. Method definitions inside a class are surrounded by a single blank line. Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).
Comments Comments should be complete sentences. The first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!) and start with a single # and 2 spaces are left for inline comments example:
This line:
self.blank_tiles=[] ##pair row and col that have valid answer
should be written:
self.blank_tiles = [] # Pair row and col that have valid answer.
And same applies to all other comments
- Typos
def printverhorlist(self):
anddef verhorlist(self):
names that do not have any sense 'verhor' as well as spelling errors should be replaced with corrects words that have meanings. - Long lines Lines should not exceed 79 characters (lines 29, 44 in your code)
Code
A general comment is most of your code is very hard to read/understand due to the lack of documentation and description and it is not possible for anyone to tell what's a wrong/right output for not so obvious parts of the code. As per rules of this website the code should be running properly as of to the author best of knowledge, I'm going to assume that it's working correctly and I cannot tell again due to the absence of documentation and bad structure however here are some things to note:
As the title states, this should be a depth-first search(DFS) with pseudocode found here: https://en.wikipedia.org/wiki/Depth-first_search, I can't seem to find a proper implementation of a DFS search which is generally implemented using a stack (which is absent in your code). This might be an improper DFS implementation maybe leading to correct output somehow and maybe not(I cannot tell without a description or proper documentation) so I suggest you at least provide a proper description with a proper expected output if you need a decent feedback and here are a few points:
def __init__(self, file_name):
a class with a single parameter is not a good class and you may define functions that pass results to each other instead of the over-engineered code. And since most of the instance attributes are lists, you can define these lists inside of their corresponding functions that use them without using a class in the first place.Methods do return
def printverhorlist(self): print(self.blank_tiles)
This is the same as
print(self.blank_tiles)
so there is no need for defining a method for printing and for a specific/custom print/display of the user-defined objects, you might use__str__
method that returns a string in case someone is using your code and wants to print your defined object.With as
context managers:file = open(self.filename, "r")
inopenfile()
method, it is generally better to usewith
andas
context managers that automatically close the file without usingfile_object.close()
method you forgot to use after manually opening the file. Some Python files will close files automatically when they are no longer referenced, while others will not and it's up to the OS. to close files when the Python interpreter exits. So it's generally a bad practice not to close your files after opening them, to forget about this hassle, use with and as in the following way:with open(my_file) as example: # do stuff
Improper use of try except:
except: print("Cannot open file")
what kind of exception are you catching here? if the exception is the failure to open the file, you're opening the file outside the
try
body so the whatever exception you're trying to catch is never caught in the first place.
-
1\$\begingroup\$ this is not a full review - Sure it is! CR policy is that your answer have at least one insightful comment, and you have many. I'd omit that couching phrase altogether. \$\endgroup\$Reinderien– Reinderien2019年10月04日 12:32:28 +00:00Commented Oct 4, 2019 at 12:32
-
\$\begingroup\$ Looks like OP is using recursion instead of an explicit stack, so it looks like a DFS to me. \$\endgroup\$ggorlen– ggorlen2019年10月04日 16:53:14 +00:00Commented Oct 4, 2019 at 16:53
A bug: the first set of nested loops in verhorlist
, which is said to find columns but actually finds rows, tests j == self.cols-1
but should test i == self.cols-1
.
This is why for puzzle 3, the horizontal tile in the middle row is not found. As a secondary effect, a ton of broken solutions is printed for puzzle 3, because as we lack a tile, len(blank_tiles) == 0
even though not all words have been assigned.
The algorithm:
The algorithm can be switched from:
- For each tile
- Try each fitting word
To:
- For each tile
- Try one fitting word
This is also complete: every (fitting) combination of tile and word is tried, but by backtracking and then trying to put the same word somewhere else, not by every node in the recursion tree trying all combinations. The effect is that the branching factor of the recursion tree is dramatically reduced.
The time to solve a puzzle still depends dramatically on the search order, puzzle 3 can be solved on my PC in 2 ms or up to 240 ms (perhaps my PC is faster than the guidelines anticipate?), depending on the order of the word list. If "ABCDEFG" is put down first, everything just falls into place. If anything else is placed first, some trial-and-error likely results - but not an impossibly large number of tries.
Further improvement is possible, by intentionally using a good search order. "The order the words are in the file" is accidentally a good order for puzzle 3, but in general it may not be. The word that should go first is the word that has the fewest places it can go, this is an example of the Most Constraining Variable (MCV) heuristic for constraint satisfaction problem solvers. The other famous heuristic, LCV (least constraining value), is harder to give meaning in this modelling of the problem. Perhaps it is good to try to assign a word to a tile where it already crosses some other word, trying to eagerly cross words with each other when possible, so that the other place where the word also fits stays "unpolluted" for longer.
Explore related questions
See similar questions with these tags.