Word-Ladder solver in Python 3

Question 1

Follow-up to Python program to find a word ladder transforming "four" to "five"

I'm trying to write a program to solve a word ladder puzzle to brush up on my Python. For those who don't know:

A word ladder puzzle begins with two words, and to solve the puzzle one must find a chain of other words to link the two, in which two adjacent words (that is, words in successive steps) differ by one letter. - Wikipedia

I've implemented breadth first search as opposed to depth first search for a big performance boost since my last question, as suggested in the answers. I've also added a new mode, that can add and remove letters as well as just swapping them. Additionally, the program now takes user input, and can take more than two words to link together (given a b c, it runs a b, and then runs b c).

The program now takes ~6 seconds on my machine to run with input of five four and 1. I'm still interested in making this faster, and also would like to know how to make it more Pythonic.

# -------- Data Structures --------
class Queue():
 """ FIFO Queue """
 def __init__(self, parent = None):
 self.list = []
 self.parent = parent
 def append(self, value):
 self.list.append(value)
 def pop(self):
 return self.list.pop(0)
class Node():
 """ Node of a Tree """
 def __init__(self, value, parent = None):
 self.value = value
 self.parent = parent
# -------- Distance Functions --------
def hamming(s1, s2):
 return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
def levenshtein(s1, s2):
 if len(s1) < len(s2):
 return levenshtein(s2, s1)
 if len(s2) == 0:
 return len(s1)
 previous_row = range(len(s2) + 1)
 for i, c1 in enumerate(s1):
 current_row = [i + 1]
 for j, c2 in enumerate(s2):
 insertions = previous_row[j + 1] + 1 
 deletions = current_row[j] + 1
 substitutions = previous_row[j] + (c1 != c2)
 current_row.append(min(insertions, deletions, substitutions))
 previous_row = current_row
 return previous_row[-1]
# -------- IO --------
# Print sequence from root to node
def print_words(node):
 values = []
 while isinstance(node, Node):
 values.append(node.value)
 node = node.parent
 print(list(reversed(values)))
# Get all input
all_words = open("/usr/share/dict/words", "r").read().lower().splitlines()
input_words = input("Enter list of words, seperated by spaces: ").split()
input_mode = int(input("Enter 1 for swap-only mode, or 2 for swap-add-rem mode: "))
# Validate user input
if not 1 <= input_mode <= 2:
 raise ValueError("Invalid mode: " + input_mode)
for word in input_words:
 if word not in all_words:
 raise ValueError("Invalid word: " + word)
# Adapt to mode
distance = [hamming, levenshtein][input_mode - 1]
if input_mode == 1:
 all_words = [word for word in all_words if len(word) == len(input_words[0])]
# -------- Breadth First Search --------
def fill(node, to_check, checked):
 checked.add(node.value)
 for word in all_words:
 if 1 >= len(word) - len(node.value) >= -1 and distance(word, node.value) == 1:
 to_check.append(Node(word, node))
for i in range(len(input_words) - 1):
 root = Node(input_words[i])
 checked = set()
 to_check = Queue()
 fill(root, to_check, checked)
 while to_check.list:
 node = to_check.pop()
 if node.value == input_words[i + 1]:
 print_words(node)
 break
 if node.value not in checked:
 fill(node, to_check, checked)

Question 2

Side answer: A node that knows both the parent and children is double-linked: en.wikipedia.org/wiki/Doubly_linked_list. So a node that knows only one is just 'linked'. Since a tree has a parent-child cardinality of 1..*, it would always be the child that knows the parent when 'linked'.

Question 3

Never mind - I just found the answer to my confusingly worded question. Apologies.

Question 4

Feel free to post a self-answer to enlighten us :)

Question 5

I mean, the answer to the side question was just "no"... I just didn't have a good enough understanding of the term "tree" so the question didn't make much sense.

Question 6

Your Queue class should be replaced by the builtin collections.deque which offers better performances (lists .pop(0) are \$\mathcal{O}(n)\$ since the remainder of the list have to be shifted, but deque.popleft() is \$\mathcal{O}(1)\$).

You should also take the habit of opening files using the with statement to avoid keeping opened file descriptors around:

def read_file(filename='/usr/share/dict/words'):
 with open(filename) as f:
 return set(map(str.lower, map(str.strip, f)))

Note that I return a set to accelerate the search if word not in all_words. You could also bring back the isalpha filter from your previous question:

def read_file(filename='/usr/share/dict/words'):
 with open(filename) as f:
 return set(map(str.lower, filter(str.isalpha, map(str.strip, f))))

Your code would also largely gain from using argparse instead of your various inputs.

And print_words could easily be converted to an __iter__ method on the Node class.

Example improvements:

"""Word-Ladder Solver.
blah blah blah
and blah
"""
import sys
import enum
import argparse
import itertools
from collections import deque
class Mode(enum.Enum):
 SWAP = 'swap-only'
 ADD_REM_SWAP = 'add-remove-swap'
class Node:
 """Node of a Tree"""
 def __init__(self, value, parent=None):
 self.value = value
 self.parent = parent
 def __iter__(self):
 if self.parent is not None:
 yield from self.parent
 yield self.value
 def __reversed__(self):
 node = self
 while node is not None:
 yield node.value
 node = node.parent
def command_line_parser():
 parser = argparse.ArgumentParser(
 description=__doc__,
 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 parser.add_argument('word', nargs='+')
 parser.add_argument('final_word')
 parser.add_argument(
 '-m', '--mode', type=Mode,
 choices=[m.value for m in Mode], default=Mode.SWAP,
 help='mode of operation to use')
 parser.add_argument(
 '-d', '--dictionnary', '--words-file',
 metavar='PATH', default='/usr/share/dict/words',
 help='path to the list of words to search from')
 return parser
def pairwise(iterable):
 a, b = itertools.tee(iterable)
 next(b, None)
 yield from zip(a, b)
def hamming(s1, s2):
 return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
def levenshtein(s1, s2):
 if len(s1) < len(s2):
 return levenshtein(s2, s1)
 if len(s2) == 0:
 return len(s1)
 previous_row = range(len(s2) + 1)
 for i, c1 in enumerate(s1):
 current_row = [i + 1]
 for j, c2 in enumerate(s2):
 insertions = previous_row[j + 1] + 1
 deletions = current_row[j] + 1
 substitutions = previous_row[j] + (c1 != c2)
 current_row.append(min(insertions, deletions, substitutions))
 previous_row = current_row
 return previous_row[-1]
def read_words(filename):
 with open(filename) as f:
 return set(map(str.lower, filter(str.isalpha, map(str.strip, f))))
def find_word_ladder(source, target, words, distance):
 checked = set()
 candidates = deque([Node(source)])
 while candidates:
 node = candidates.popleft()
 candidate = node.value
 if candidate == target:
 return node
 if candidate not in checked:
 checked.add(candidate)
 candidates.extend(
 Node(word, node)
 for word in words
 if distance(word, candidate) == 1)
def main(targets, words, mode):
 if mode is Mode.SWAP:
 distance = hamming
 elif mode is Mode.ADD_REM_SWAP:
 distance = levenshtein
 else:
 return
 for source, target in pairwise(targets):
 if source not in words:
 sys.exit('unknown word in dictionnary: {}'.format(source))
 if target not in words:
 sys.exit('unknown word in dictionnary: {}'.format(target))
 chain = find_word_ladder(source, target, words, distance)
 print(list(chain))
if __name__ == '__main__':
 parser = command_line_parser()
 args = parser.parse_args()
 try:
 words = read_words(args.dictionnary)
 except OSError as e:
 parser.error('unable to read words file: {}'.format(e))
 if args.mode is Mode.SWAP:
 length = len(args.final_word)
 words = {w for w in words if len(w) == length}
 targets = args.word
 targets.append(args.final_word)
 main(targets, words, args.mode)

Example usage:

$ python words_ladder.py -d /usr/share/dict/cracklib-small five four dice
['five', 'fire', 'fore', 'tore', 'torr', 'tour', 'four']
['four', 'tour', 'torr', 'tore', 'tire', 'dire', 'dice']

Question 7

Thanks for your time! One part of the code that I don't understand is reversed(list(reversed(self))). Why would you reverse something twice?

Question 8

@AlexF The first is here to call self.__reversed__(), the second is the same than in your function and serves to traverse the tree from the root to the self node. All in all, this is just hiding your print_words as Node's internals.

Question 9

I contemplated using deque.extendleft instead but found it a bit obscure.

Question 10

@AlexF Changed __iter__ to provide an alternate implementation and avoid the double reversed.

Question 11

I'm a little bit short on time at the moment, but I'ld like to share a few minor observations with you. Maybe I will also find time to look at performance related optimizations and expand it later.

You said you would like your code to be Pythonic. With that in mind you could replace print(list(reversed(values))) by print(values[::-1]) which uses slicing (nice explanation on Stack Overflow) to revert the list. Since you need the complete list nevertheless, there is no real advantage in using a reverse iterator instead of a list directly.

Also since "Explicit is better than implicit." (from Zen of Python) I would prefer to see

distance = hamming if input_mode == 1 else levenshtein

or something similar instead of

distance = [hamming, levenshtein][input_mode - 1]

The proposed version would also allow you to drop the string to int conversion. It would even work if the user entered something else than the two values presented by the prompt. levenshtein would be assumed to be the default then. Change the if to your liking if you prefer hamming as default. Although this might not be something to think about, since you have an input validation in place (good). However, the way it is written could be improved since you have a really small range (read: two) values that are valid. That makes it possible to list them explicitly:

# Validate user input
if input_mode not in (1, 2):
 raise ValueError("Invalid mode: " + input_mode)

This almost reads like normal language.

score 5 · Accepted Answer · 2019-06-14 12:11:21Z

Your Queue class should be replaced by the builtin collections.deque which offers better performances (lists .pop(0) are \$\mathcal{O}(n)\$ since the remainder of the list have to be shifted, but deque.popleft() is \$\mathcal{O}(1)\$).

You should also take the habit of opening files using the with statement to avoid keeping opened file descriptors around:

def read_file(filename='/usr/share/dict/words'):
 with open(filename) as f:
 return set(map(str.lower, map(str.strip, f)))

Note that I return a set to accelerate the search if word not in all_words. You could also bring back the isalpha filter from your previous question:

def read_file(filename='/usr/share/dict/words'):
 with open(filename) as f:
 return set(map(str.lower, filter(str.isalpha, map(str.strip, f))))

Your code would also largely gain from using argparse instead of your various inputs.

And print_words could easily be converted to an __iter__ method on the Node class.

Example improvements:

"""Word-Ladder Solver.
blah blah blah
and blah
"""
import sys
import enum
import argparse
import itertools
from collections import deque
class Mode(enum.Enum):
 SWAP = 'swap-only'
 ADD_REM_SWAP = 'add-remove-swap'
class Node:
 """Node of a Tree"""
 def __init__(self, value, parent=None):
 self.value = value
 self.parent = parent
 def __iter__(self):
 if self.parent is not None:
 yield from self.parent
 yield self.value
 def __reversed__(self):
 node = self
 while node is not None:
 yield node.value
 node = node.parent
def command_line_parser():
 parser = argparse.ArgumentParser(
 description=__doc__,
 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 parser.add_argument('word', nargs='+')
 parser.add_argument('final_word')
 parser.add_argument(
 '-m', '--mode', type=Mode,
 choices=[m.value for m in Mode], default=Mode.SWAP,
 help='mode of operation to use')
 parser.add_argument(
 '-d', '--dictionnary', '--words-file',
 metavar='PATH', default='/usr/share/dict/words',
 help='path to the list of words to search from')
 return parser
def pairwise(iterable):
 a, b = itertools.tee(iterable)
 next(b, None)
 yield from zip(a, b)
def hamming(s1, s2):
 return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
def levenshtein(s1, s2):
 if len(s1) < len(s2):
 return levenshtein(s2, s1)
 if len(s2) == 0:
 return len(s1)
 previous_row = range(len(s2) + 1)
 for i, c1 in enumerate(s1):
 current_row = [i + 1]
 for j, c2 in enumerate(s2):
 insertions = previous_row[j + 1] + 1
 deletions = current_row[j] + 1
 substitutions = previous_row[j] + (c1 != c2)
 current_row.append(min(insertions, deletions, substitutions))
 previous_row = current_row
 return previous_row[-1]
def read_words(filename):
 with open(filename) as f:
 return set(map(str.lower, filter(str.isalpha, map(str.strip, f))))
def find_word_ladder(source, target, words, distance):
 checked = set()
 candidates = deque([Node(source)])
 while candidates:
 node = candidates.popleft()
 candidate = node.value
 if candidate == target:
 return node
 if candidate not in checked:
 checked.add(candidate)
 candidates.extend(
 Node(word, node)
 for word in words
 if distance(word, candidate) == 1)
def main(targets, words, mode):
 if mode is Mode.SWAP:
 distance = hamming
 elif mode is Mode.ADD_REM_SWAP:
 distance = levenshtein
 else:
 return
 for source, target in pairwise(targets):
 if source not in words:
 sys.exit('unknown word in dictionnary: {}'.format(source))
 if target not in words:
 sys.exit('unknown word in dictionnary: {}'.format(target))
 chain = find_word_ladder(source, target, words, distance)
 print(list(chain))
if __name__ == '__main__':
 parser = command_line_parser()
 args = parser.parse_args()
 try:
 words = read_words(args.dictionnary)
 except OSError as e:
 parser.error('unable to read words file: {}'.format(e))
 if args.mode is Mode.SWAP:
 length = len(args.final_word)
 words = {w for w in words if len(w) == length}
 targets = args.word
 targets.append(args.final_word)
 main(targets, words, args.mode)

Example usage:

$ python words_ladder.py -d /usr/share/dict/cracklib-small five four dice
['five', 'fire', 'fore', 'tore', 'torr', 'tour', 'four']
['four', 'tour', 'torr', 'tore', 'tire', 'dire', 'dice']

Thanks for your time! One part of the code that I don't understand is reversed(list(reversed(self))). Why would you reverse something twice?
@AlexF The first is here to call self.__reversed__(), the second is the same than in your function and serves to traverse the tree from the root to the self node. All in all, this is just hiding your print_words as Node's internals.
I contemplated using deque.extendleft instead but found it a bit obscure.
@AlexF Changed __iter__ to provide an alternate implementation and avoid the double reversed.

Stack Exchange Network

Word-Ladder solver in Python 3

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Word-Ladder solver in Python 3

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions