Follow-up to Python program to find a word ladder transforming "four" to "five"
I'm trying to write a program to solve a word ladder puzzle to brush up on my Python. For those who don't know:
A word ladder puzzle begins with two words, and to solve the puzzle one must find a chain of other words to link the two, in which two adjacent words (that is, words in successive steps) differ by one letter. - Wikipedia
I've implemented breadth first search as opposed to depth first search for a big performance boost since my last question, as suggested in the answers.
I've also added a new mode, that can add and remove letters as well as just swapping them.
Additionally, the program now takes user input, and can take more than two words to link together (given a b c
, it runs a b
, and then runs b c
).
The program now takes ~6 seconds on my machine to run with input of five four
and 1
. I'm still interested in making this faster, and also would like to know how to make it more Pythonic.
# -------- Data Structures --------
class Queue():
""" FIFO Queue """
def __init__(self, parent = None):
self.list = []
self.parent = parent
def append(self, value):
self.list.append(value)
def pop(self):
return self.list.pop(0)
class Node():
""" Node of a Tree """
def __init__(self, value, parent = None):
self.value = value
self.parent = parent
# -------- Distance Functions --------
def hamming(s1, s2):
return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
def levenshtein(s1, s2):
if len(s1) < len(s2):
return levenshtein(s2, s1)
if len(s2) == 0:
return len(s1)
previous_row = range(len(s2) + 1)
for i, c1 in enumerate(s1):
current_row = [i + 1]
for j, c2 in enumerate(s2):
insertions = previous_row[j + 1] + 1
deletions = current_row[j] + 1
substitutions = previous_row[j] + (c1 != c2)
current_row.append(min(insertions, deletions, substitutions))
previous_row = current_row
return previous_row[-1]
# -------- IO --------
# Print sequence from root to node
def print_words(node):
values = []
while isinstance(node, Node):
values.append(node.value)
node = node.parent
print(list(reversed(values)))
# Get all input
all_words = open("/usr/share/dict/words", "r").read().lower().splitlines()
input_words = input("Enter list of words, seperated by spaces: ").split()
input_mode = int(input("Enter 1 for swap-only mode, or 2 for swap-add-rem mode: "))
# Validate user input
if not 1 <= input_mode <= 2:
raise ValueError("Invalid mode: " + input_mode)
for word in input_words:
if word not in all_words:
raise ValueError("Invalid word: " + word)
# Adapt to mode
distance = [hamming, levenshtein][input_mode - 1]
if input_mode == 1:
all_words = [word for word in all_words if len(word) == len(input_words[0])]
# -------- Breadth First Search --------
def fill(node, to_check, checked):
checked.add(node.value)
for word in all_words:
if 1 >= len(word) - len(node.value) >= -1 and distance(word, node.value) == 1:
to_check.append(Node(word, node))
for i in range(len(input_words) - 1):
root = Node(input_words[i])
checked = set()
to_check = Queue()
fill(root, to_check, checked)
while to_check.list:
node = to_check.pop()
if node.value == input_words[i + 1]:
print_words(node)
break
if node.value not in checked:
fill(node, to_check, checked)
-
\$\begingroup\$ Side answer: A node that knows both the parent and children is double-linked: en.wikipedia.org/wiki/Doubly_linked_list. So a node that knows only one is just 'linked'. Since a tree has a parent-child cardinality of 1..*, it would always be the child that knows the parent when 'linked'. \$\endgroup\$dfhwze– dfhwze2019年06月14日 05:32:31 +00:00Commented Jun 14, 2019 at 5:32
-
\$\begingroup\$ Never mind - I just found the answer to my confusingly worded question. Apologies. \$\endgroup\$Alex F– Alex F2019年06月14日 05:39:38 +00:00Commented Jun 14, 2019 at 5:39
-
\$\begingroup\$ Feel free to post a self-answer to enlighten us :) \$\endgroup\$dfhwze– dfhwze2019年06月14日 05:50:47 +00:00Commented Jun 14, 2019 at 5:50
-
\$\begingroup\$ I mean, the answer to the side question was just "no"... I just didn't have a good enough understanding of the term "tree" so the question didn't make much sense. \$\endgroup\$Alex F– Alex F2019年06月14日 06:08:04 +00:00Commented Jun 14, 2019 at 6:08
2 Answers 2
Your Queue
class should be replaced by the builtin collections.deque
which offers better performances (list
s .pop(0)
are \$\mathcal{O}(n)\$ since the remainder of the list have to be shifted, but deque.popleft()
is \$\mathcal{O}(1)\$).
You should also take the habit of opening files using the with
statement to avoid keeping opened file descriptors around:
def read_file(filename='/usr/share/dict/words'):
with open(filename) as f:
return set(map(str.lower, map(str.strip, f)))
Note that I return a set
to accelerate the search if word not in all_words
. You could also bring back the isalpha
filter from your previous question:
def read_file(filename='/usr/share/dict/words'):
with open(filename) as f:
return set(map(str.lower, filter(str.isalpha, map(str.strip, f))))
Your code would also largely gain from using argparse
instead of your various input
s.
And print_words
could easily be converted to an __iter__
method on the Node
class.
Example improvements:
"""Word-Ladder Solver.
blah blah blah
and blah
"""
import sys
import enum
import argparse
import itertools
from collections import deque
class Mode(enum.Enum):
SWAP = 'swap-only'
ADD_REM_SWAP = 'add-remove-swap'
class Node:
"""Node of a Tree"""
def __init__(self, value, parent=None):
self.value = value
self.parent = parent
def __iter__(self):
if self.parent is not None:
yield from self.parent
yield self.value
def __reversed__(self):
node = self
while node is not None:
yield node.value
node = node.parent
def command_line_parser():
parser = argparse.ArgumentParser(
description=__doc__,
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('word', nargs='+')
parser.add_argument('final_word')
parser.add_argument(
'-m', '--mode', type=Mode,
choices=[m.value for m in Mode], default=Mode.SWAP,
help='mode of operation to use')
parser.add_argument(
'-d', '--dictionnary', '--words-file',
metavar='PATH', default='/usr/share/dict/words',
help='path to the list of words to search from')
return parser
def pairwise(iterable):
a, b = itertools.tee(iterable)
next(b, None)
yield from zip(a, b)
def hamming(s1, s2):
return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
def levenshtein(s1, s2):
if len(s1) < len(s2):
return levenshtein(s2, s1)
if len(s2) == 0:
return len(s1)
previous_row = range(len(s2) + 1)
for i, c1 in enumerate(s1):
current_row = [i + 1]
for j, c2 in enumerate(s2):
insertions = previous_row[j + 1] + 1
deletions = current_row[j] + 1
substitutions = previous_row[j] + (c1 != c2)
current_row.append(min(insertions, deletions, substitutions))
previous_row = current_row
return previous_row[-1]
def read_words(filename):
with open(filename) as f:
return set(map(str.lower, filter(str.isalpha, map(str.strip, f))))
def find_word_ladder(source, target, words, distance):
checked = set()
candidates = deque([Node(source)])
while candidates:
node = candidates.popleft()
candidate = node.value
if candidate == target:
return node
if candidate not in checked:
checked.add(candidate)
candidates.extend(
Node(word, node)
for word in words
if distance(word, candidate) == 1)
def main(targets, words, mode):
if mode is Mode.SWAP:
distance = hamming
elif mode is Mode.ADD_REM_SWAP:
distance = levenshtein
else:
return
for source, target in pairwise(targets):
if source not in words:
sys.exit('unknown word in dictionnary: {}'.format(source))
if target not in words:
sys.exit('unknown word in dictionnary: {}'.format(target))
chain = find_word_ladder(source, target, words, distance)
print(list(chain))
if __name__ == '__main__':
parser = command_line_parser()
args = parser.parse_args()
try:
words = read_words(args.dictionnary)
except OSError as e:
parser.error('unable to read words file: {}'.format(e))
if args.mode is Mode.SWAP:
length = len(args.final_word)
words = {w for w in words if len(w) == length}
targets = args.word
targets.append(args.final_word)
main(targets, words, args.mode)
Example usage:
$ python words_ladder.py -d /usr/share/dict/cracklib-small five four dice
['five', 'fire', 'fore', 'tore', 'torr', 'tour', 'four']
['four', 'tour', 'torr', 'tore', 'tire', 'dire', 'dice']
-
\$\begingroup\$ Thanks for your time! One part of the code that I don't understand is
reversed(list(reversed(self)))
. Why would you reverse something twice? \$\endgroup\$Alex F– Alex F2019年06月15日 00:44:54 +00:00Commented Jun 15, 2019 at 0:44 -
\$\begingroup\$ @AlexF The first is here to call
self.__reversed__()
, the second is the same than in your function and serves to traverse the tree from the root to theself
node. All in all, this is just hiding yourprint_words
asNode
's internals. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2019年06月15日 07:15:16 +00:00Commented Jun 15, 2019 at 7:15 -
\$\begingroup\$ I contemplated using
deque.extendleft
instead but found it a bit obscure. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2019年06月15日 07:18:54 +00:00Commented Jun 15, 2019 at 7:18 -
1\$\begingroup\$ @AlexF Changed
__iter__
to provide an alternate implementation and avoid the doublereversed
. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2019年06月15日 10:08:34 +00:00Commented Jun 15, 2019 at 10:08
I'm a little bit short on time at the moment, but I'ld like to share a few minor observations with you. Maybe I will also find time to look at performance related optimizations and expand it later.
You said you would like your code to be Pythonic. With that in mind you could replace print(list(reversed(values)))
by print(values[::-1])
which uses slicing (nice explanation on Stack Overflow) to revert the list. Since you need the complete list nevertheless, there is no real advantage in using a reverse iterator instead of a list directly.
Also since "Explicit is better than implicit." (from Zen of Python) I would prefer to see
distance = hamming if input_mode == 1 else levenshtein
or something similar instead of
distance = [hamming, levenshtein][input_mode - 1]
The proposed version would also allow you to drop the string to int conversion. It would even work if the user entered something else than the two values presented by the prompt. levenshtein
would be assumed to be the default then. Change the if
to your liking if you prefer hamming
as default. Although this might not be something to think about, since you have an input validation in place (good). However, the way it is written could be improved since you have a really small range (read: two) values that are valid. That makes it possible to list them explicitly:
# Validate user input
if input_mode not in (1, 2):
raise ValueError("Invalid mode: " + input_mode)
This almost reads like normal language.
Explore related questions
See similar questions with these tags.