5
\$\begingroup\$

My current code implementing a depth-first search works well for puzzle 1. But my code is too slow; I think I use too many loops and I am not sure I am using the right data structure. So I have a problem in puzzle 3,4,5. Please help me to make my code run faster.

Description: The crossword file begins with a grid showing where the words go (represented by the * character) and the spaces around the words (represented by dots). Then there is a list of words, one per line, all in caps, that are to be placed into the crossword grid. The words should be placed in the grid so that they fit the number of asterisks and so that anywhere that words intersect each other, they share a letter. ( you can run puzzle1.txt to see the result)

guild

puzzle1.txt

 ..........
 .******...
 ...*......
 ...*......
 ...*****..
 ...*..*...
 ......*...
 ......*...
 ......*...
 ..........
 POLAND
 LHASA
 SPAIN
 INDIA
 puzzle3.txt
 .*.*.*.
 .*.*.*.
 .*.*.*.
 *******
 *.*.*.*
 *.*.*.*
 *.*.*.*
 ABCDEFG
 AAAA
 BBBB
 CCCC
 DDDD
 EEEE
 FFFF
 GGGG
 puzzle4.txt
 ...........**********......
 ...*.......*......*........
 ...*...******.....*........
 ...*.......*......*........
 ...*****...*......*........
 ...*..*....*......*........
 ......*******.....*........
 ......*..*........*...*....
 ......*..*........*********
 .........*............*....
 ......******..........*....
 ......*...........*****....
 ......*...............*....
 ......*...............*....
 ......*....................
 ......*....................
 ......*....................
 ......*....................
 AUSTRALIA
 GUATEMALA
 PANAMA
 PORTUGAL
 IRELAND
 MYANMAR
 LHASA
 MONTENEGRO
 MALTA
 CHINA
 SPAIN
 BHUTAN
 DENMARK
 INDIA

Source code

from collections import defaultdict
import os
import copy
import time
class search:
 # Constructor
 def __init__(self, file_name):
 self.filename = file_name
 self.board = [] ## list of board *..**
 self.rows = 0
 self.cols = 0
 self.blank_tiles=[] ##pair row and col that have valid answer
 self.answerlist =[] ## list of answer
 self.count = 0
 self.possibleboard = []
 self.checkanswerlist = []
 def addanswer(self,word):
 self.answerlist.append(word)
 def printverhorlist(self):
 print(self.blank_tiles)
 def verhorlist(self):
 for j in range(self.rows):
 col_lis = []
 for i in range(self.cols):
 # if current node is * append to row list , if rowlist >1 ( which mean it is a word) append row col to blank_tiles
 if self.board[j][i] == "*":
 col_lis.append((j, i))
 elif len(col_lis) == 1:
 col_lis = []
 elif len(col_lis) > 1:
 self.blank_tiles.append(col_lis)
 col_lis = []
 if j == self.cols-1 and len(col_lis) > 1:
 self.blank_tiles.append(col_lis)
 col_lis = []
 for j in range(self.cols):
 row_lis = []
 for i in range(self.rows):
 # if current node is * append to row list , if rowlist >1 ( which mean it is a word) append row col to blank_tiles
 if self.board[i][j] == "*":
 row_lis.append((i, j))
 elif len(row_lis) == 1:
 row_lis = []
 elif len(row_lis) > 1:
 self.blank_tiles.append(row_lis)
 row_lis = []
 if i == self.rows-1 and len(row_lis) > 1: # end of a column
 self.blank_tiles.append(row_lis)
 row_lis = []
 def printboard(self,board):
 print("#"*self.cols)
 print("\n")
 for row in board:
 print("".join(row))
 def printanswer(self):
 for a in self.answerlist:
 print(a, end = " ")
 print("\n")
 def openfile(self):
 file = open(self.filename, "r")
 a = file.readlines() # read in whole file as 1D list of strings
 try:
 for i in range(len(a)):
 if a[i][0] == "*" or a[i][0] == ".":
 self.board.append(a[i].strip())
 else:
 self.answerlist.append(a[i].strip())
 except:
 print("Cannot open file")
 self.cols = len(self.board[0])
 self.rows = len(self.board)
 print(self.rows,self.cols)
 def checkfill(self,board):
 for m in range(self.rows):
 for n in range(self.cols):
 if board[m][n] == "*":
 return 0
 return 1
 def startfind(self):
 start = time.time()
 self.find_match(self.board, self.blank_tiles, self.answerlist)
 stop = time.time()
 times = (stop - start) * 1000
 print("Run time takes %d miliseconds" % times)
 if not self.checkfill(self.board):
 print("Cannot solve this puzzle")
 else:
 print("It takes %d step to solve this puzzle" % self.count)
 print("There are %d possible board"%len(self.possibleboard))
 def find_match(self,board, blank_tiles, lis):
 if (self.checkfill(board) and len(blank_tiles) == 0): ## check no ** in board, and all the anwer in check answerlist
 if board not in self.possibleboard:
 self.board = board
 self.possibleboard.append(board)
 self.printboard(board)
 # if no tiles left, print the board
 if len(blank_tiles) == 0:
 self.board = board
# p***
 else:
 for tile in blank_tiles: ##pair row and col that have valid answer
 for word in lis: ## answerword
 # check all word can place to the space or the space is all ***
 #board[tile[i][0]][tile[i][1]] is current node
 lengthanswer = len(tile)
 if len(word) == len(tile) and all( board[tile[i][0]][tile[i][1]] == "*" or board[tile[i][0]][tile[i][1]] == word[i] for i in range(lengthanswer)):
 new_board = copy.deepcopy(board) #create newboard
 # if word not in self.checkanswerlist:
 # self.checkanswerlist.append(word)
 for i in range(len(tile)): #fill the word in a row and array
 new_lis1 = list(new_board[tile[i][0]])
 new_lis1[tile[i][1]] = word[i]
 new_board[tile[i][0]] = "".join(new_lis1)
 new_blank_tiles = blank_tiles[:] # remove an row and col already place in list
 new_blank_tiles.remove(tile)
 new_lis2 = lis[:] # remove word already place in
 new_lis2.remove(word)
 # self.printboard(new_board)
 self.count = self.count +1
 self.find_match(new_board, new_blank_tiles, new_lis2)
if __name__ == "__main__":
 g = search("puzzle1.txt")
 g.openfile()
 g.printboard(g.board)
 g.verhorlist()
 g.printverhorlist()
 g.printanswer()
 g.startfind()
 #g.printanswer()
asked Oct 4, 2019 at 1:52
\$\endgroup\$
4
  • 2
    \$\begingroup\$ Can you add some description indicating what this code is supposed to do and what is a correct output and what this puzzle is about? \$\endgroup\$ Commented Oct 4, 2019 at 3:12
  • \$\begingroup\$ The crossword file begins with a grid showing where the words go (represented by the * character) and the spaces around the words (represented by dots). Then there is a list of words, one per line, all in caps, that are to be placed into the crossword grid. The words should be placed in the grid so that they fit the number of asterisks and so that anywhere that words intersect each other, they share a letter. ( you can run puzzle1.txt to se the result). \$\endgroup\$ Commented Oct 4, 2019 at 5:05
  • \$\begingroup\$ I want my code to be run faster, like you can see the runtime guides, because for now, my code run puzzle 3 -4 -5 is takes forever \$\endgroup\$ Commented Oct 4, 2019 at 5:06
  • 1
    \$\begingroup\$ Thank you for accepting my review however, I suggest you undo the accept and just tick the upvote ^ sign for now to give other reviewers a chance to give some reviews and if my review or anyone's turns out to be the best, you might accept it at the end. I'm editing your code to add the description you indicated in the comment above. \$\endgroup\$ Commented Oct 4, 2019 at 5:18

2 Answers 2

5
\$\begingroup\$

Style

I suggest you check PEP0008, https://www.python.org/dev/peps/pep-0008/ the official Python style guide which will be your guide to write a Pythonic code and Flake8 http://flake8.pycqa.org/en/latest/ as a tool for style enforcement.

  • Clean up: from collections import defaultdict and import os are unused import statements and should be cleaned up as well as # g.printanswer() at the last line of the code.

  • Class Names Class names should normally use the CapWords convention. Example: ClassNameExample and class search: should be: class Search:

  • Function/method names Function names should be lowercase, with words separated by underscores as necessary to improve readability. Example: def first_method(self, param): def addanswer(self,word): should be: def add_answer(self, word):

    Note the comma , is separated by space on the right side for readability and same applies to all other methods def printverhorlist(self): def verhorlist(self): ...

  • Docstrings Python documentation strings (or docstrings) provide a convenient way of associating documentation with Python modules, functions, classes, and methods. You should indicate docstrings to your classes/methods/functions indicating what they do, what they return and type hints when necessary. Example:

    class MyClass:
     """Description goes here."""
     def first_method(self, param1: int, param2: str):
     """Do things and return stuff."""
    

    And this applies to all your methods defined under your search class as long as they are not meant for internal use(private methods) which are written with a leading underscore: def _private_method(self, param):

  • Blank lines Surround top-level function and class definitions with two blank lines. Method definitions inside a class are surrounded by a single blank line. Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).

  • Comments Comments should be complete sentences. The first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!) and start with a single # and 2 spaces are left for inline comments example:

    This line:

     self.blank_tiles=[] ##pair row and col that have valid answer
    

    should be written:

     self.blank_tiles = [] # Pair row and col that have valid answer.
    

    And same applies to all other comments

  • Typos def printverhorlist(self): and def verhorlist(self): names that do not have any sense 'verhor' as well as spelling errors should be replaced with corrects words that have meanings.
  • Long lines Lines should not exceed 79 characters (lines 29, 44 in your code)

Code

A general comment is most of your code is very hard to read/understand due to the lack of documentation and description and it is not possible for anyone to tell what's a wrong/right output for not so obvious parts of the code. As per rules of this website the code should be running properly as of to the author best of knowledge, I'm going to assume that it's working correctly and I cannot tell again due to the absence of documentation and bad structure however here are some things to note:

  • As the title states, this should be a depth-first search(DFS) with pseudocode found here: https://en.wikipedia.org/wiki/Depth-first_search, I can't seem to find a proper implementation of a DFS search which is generally implemented using a stack (which is absent in your code). This might be an improper DFS implementation maybe leading to correct output somehow and maybe not(I cannot tell without a description or proper documentation) so I suggest you at least provide a proper description with a proper expected output if you need a decent feedback and here are a few points:

  • def __init__(self, file_name): a class with a single parameter is not a good class and you may define functions that pass results to each other instead of the over-engineered code. And since most of the instance attributes are lists, you can define these lists inside of their corresponding functions that use them without using a class in the first place.

  • Methods do return

    def printverhorlist(self):
     print(self.blank_tiles)
    

    This is the same as print(self.blank_tiles) so there is no need for defining a method for printing and for a specific/custom print/display of the user-defined objects, you might use __str__ method that returns a string in case someone is using your code and wants to print your defined object.

  • With as context managers: file = open(self.filename, "r") in openfile() method, it is generally better to use with and as context managers that automatically close the file without using file_object.close() method you forgot to use after manually opening the file. Some Python files will close files automatically when they are no longer referenced, while others will not and it's up to the OS. to close files when the Python interpreter exits. So it's generally a bad practice not to close your files after opening them, to forget about this hassle, use with and as in the following way:

    with open(my_file) as example:
     # do stuff
    
  • Improper use of try except:

    except:
     print("Cannot open file")
    

    what kind of exception are you catching here? if the exception is the failure to open the file, you're opening the file outside the try body so the whatever exception you're trying to catch is never caught in the first place.

answered Oct 4, 2019 at 5:06
\$\endgroup\$
2
  • 1
    \$\begingroup\$ this is not a full review - Sure it is! CR policy is that your answer have at least one insightful comment, and you have many. I'd omit that couching phrase altogether. \$\endgroup\$ Commented Oct 4, 2019 at 12:32
  • \$\begingroup\$ Looks like OP is using recursion instead of an explicit stack, so it looks like a DFS to me. \$\endgroup\$ Commented Oct 4, 2019 at 16:53
3
\$\begingroup\$

A bug: the first set of nested loops in verhorlist, which is said to find columns but actually finds rows, tests j == self.cols-1 but should test i == self.cols-1.

This is why for puzzle 3, the horizontal tile in the middle row is not found. As a secondary effect, a ton of broken solutions is printed for puzzle 3, because as we lack a tile, len(blank_tiles) == 0 even though not all words have been assigned.

The algorithm:

The algorithm can be switched from:

  • For each tile
    • Try each fitting word

To:

  • For each tile
    • Try one fitting word

This is also complete: every (fitting) combination of tile and word is tried, but by backtracking and then trying to put the same word somewhere else, not by every node in the recursion tree trying all combinations. The effect is that the branching factor of the recursion tree is dramatically reduced.

The time to solve a puzzle still depends dramatically on the search order, puzzle 3 can be solved on my PC in 2 ms or up to 240 ms (perhaps my PC is faster than the guidelines anticipate?), depending on the order of the word list. If "ABCDEFG" is put down first, everything just falls into place. If anything else is placed first, some trial-and-error likely results - but not an impossibly large number of tries.

Further improvement is possible, by intentionally using a good search order. "The order the words are in the file" is accidentally a good order for puzzle 3, but in general it may not be. The word that should go first is the word that has the fewest places it can go, this is an example of the Most Constraining Variable (MCV) heuristic for constraint satisfaction problem solvers. The other famous heuristic, LCV (least constraining value), is harder to give meaning in this modelling of the problem. Perhaps it is good to try to assign a word to a tile where it already crosses some other word, trying to eagerly cross words with each other when possible, so that the other place where the word also fits stays "unpolluted" for longer.

answered Oct 4, 2019 at 15:24
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.