I am working on Word Ladder - LeetCode
Given two words (beginWord and endWord), and a dictionary's word list, find the length of the shortest transformation sequence from beginWord to endWord, such that:
- Only one letter can be changed at a time.
- Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Note:
- Return 0 if there is no such transformation sequence.
- All words have the same length.
- All words contain only lowercase alphabetic characters.
- You may assume no duplicates in the word list.
- You may assume beginWord and endWord are non-empty and are not the same.
Example 1:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: One shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog", so its length is 5.
Example 2:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, so no transformation is possible.
My Approach:
- Create a graph with word distances
- Calculate the distance to the endWord using BFS
Code:
from typing import List

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        if endWord not in wordList:
            return 0
        graph = {}
        wordList = [beginWord] + wordList if beginWord not in wordList else wordList
        for i in range(len(wordList)):
            wordU = wordList[i]
            # Compare against every later word and link pairs at distance one.
            for j in range(i + 1, len(wordList)):
                wordV = wordList[j]
                if wordU not in graph:
                    graph[wordU] = []
                if wordV not in graph:
                    graph[wordV] = []
                if self.word_difference(wordU, wordV) == 1:
                    graph[wordU].append(wordV)
                    graph[wordV].append(wordU)
        visited = set()
        queue = [beginWord]
        distances = {w: 1 for w in queue}  # the start word counts as step 1
        return self.BFS(graph, endWord, visited, queue, distances)

    def word_difference(self, word1, word2):
        # Count the positions where the two words differ.
        distance = 0
        for c1, c2 in zip(word1, word2):
            if c1 != c2:
                distance += 1
        return distance

    def BFS(self, graph, endNode, visited, queue, distances):
        while queue:
            node = queue.pop(0)
            if node not in visited:
                visited.add(node)
                if node == endNode:
                    return distances[node]
                for neighbour in graph[node]:
                    if neighbour not in visited:
                        visited.add(neighbour)
                        distances[neighbour] = distances[node] + 1
                        queue.append(neighbour)
        return 0
The problem is that the above code is failing on a test case with a large word list.
The discussions forum had a solution which took a similar approach but is clearing all the test cases. How can I optimize my solution?
1 Answer
Measure first. If you have a performance problem, make sure you know where it is. A simple, low-tech way to do this is to start throwing time markers into the code and printing the elapsed time for different phases of the work. In your case, the graph-building is the costly part of the process, so there's no need to waste time thinking about how to optimize the BFS code.
from time import time
t1 = time()
... # Do some work.
t2 = time()
... # Do some other work
t3 = time()
print('Phase 1', t2 - t1)
print('Phase 2', t3 - t2)
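To make that pattern concrete, here is a self-contained, runnable sketch. The two phase bodies are placeholder work invented for illustration (a quadratic loop and a linear key-building loop), and it uses time.perf_counter, which is generally preferred over time.time for measuring elapsed intervals:

```python
from time import perf_counter

t1 = perf_counter()
# Placeholder phase 1: quadratic pairwise work, like the O(N*N) graph build.
pairs = sum(1 for i in range(2000) for j in range(i + 1, 2000))
t2 = perf_counter()
# Placeholder phase 2: linear key-building work over a few words.
keys = [w[:i] + '_' + w[i + 1:] for w in ["hot", "dot", "dog"] for i in range(3)]
t3 = perf_counter()

print('Phase 1', t2 - t1)
print('Phase 2', t3 - t2)
```

Even on toy inputs like these, the printout makes it obvious which phase dominates, which tells you where optimization effort should go.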
Your algorithm and the one from the discussion forum are quite different. Yes, they both build a graph and then use BFS on it. But, as noted, the performance problem lies in the graph building. Your algorithm is O(N*N): nested loops over wordList. The other solution is effectively O(N), because the inner loop is over the positions in the current word, not the entire word list.
# Your approach: O(N*N).
for i in range(len(wordList)):
    ...
    for j in range(i + 1, len(wordList)):
        ...

# Discussion forum: O(N*m), where m is the word length -- tiny, effectively a constant.
for word in word_list:
    for i in range(len(word)):
        ...
Too much computation: word difference is not relevant. A clue that your algorithm is inefficient comes from realizing that we don't actually care about word distances. Our goal is to build a graph linking immediate neighbors (those having a difference of exactly 1), but your algorithm does the work of actually computing all pairwise differences.
Invert the strategy: create linking keys. Suppose you realize that you don't need the word differences. The problem becomes: how do we find immediate neighbors (those differing in exactly one position) without computing any differences? That's not an easy question to answer, of course, and to some extent it requires a leap of insight. Nonetheless, there is a general lesson here. The wildcard-based strategy in the discussion forum achieves that goal by mapping each word to a set of linking keys, such that immediate neighbors always share a key. To do that, convert each word into new character sequences, each with one position replaced by a generic wildcard.
# The word
dog
# Its linking keys
_og
d_g
do_
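Putting the pieces together, here is a minimal runnable sketch of the wildcard/bucket strategy. The function name ladder_length, the bucket-building code, and the use of collections.deque are my own choices for illustration, not taken verbatim from the forum post:

```python
from collections import defaultdict, deque
from typing import List

def ladder_length(beginWord: str, endWord: str, wordList: List[str]) -> int:
    if endWord not in wordList:
        return 0
    # Map each wildcard key (e.g. "d_g") to all words that produce it.
    # Words sharing a bucket differ in exactly one position (or are equal).
    buckets = defaultdict(list)
    for word in wordList + [beginWord]:
        for i in range(len(word)):
            buckets[word[:i] + '_' + word[i + 1:]].append(word)
    # Standard BFS over the implicit graph defined by the buckets.
    queue = deque([(beginWord, 1)])
    visited = {beginWord}
    while queue:
        word, dist = queue.popleft()
        if word == endWord:
            return dist
        for i in range(len(word)):
            for neighbour in buckets[word[:i] + '_' + word[i + 1:]]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append((neighbour, dist + 1))
    return 0
```

Building the buckets touches each word m times (m = word length), so the graph construction is O(N*m) instead of O(N*N), and the BFS looks up neighbors through the same buckets rather than a precomputed adjacency list.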
- Amazing! Thank you for your insightful comments. I guess I have to practice more. Is there anything (book/leetcode list/course) that you would suggest? – abybaddi009, Mar 30, 2022

- @abybaddi009 You're welcome. Yes, practice is crucial. I don't have any ready-made suggestions, other than my regular theme here: data-centric thinking as a strategy/mindset for reducing/simplifying algorithms. The linking-key strategy is a data-centric approach in the sense that it envisions an easily built data structure to achieve the linking without requiring exhaustive pairwise comparisons. You can see a different example of the theme here and in a couple of other links I provide in its comments. – FMc, Mar 30, 2022