I am working on Word Ladder - LeetCode
Given two words (beginWord and endWord), and a dictionary's word list, find the length of the shortest transformation sequence from beginWord to endWord, such that:
- Only one letter can be changed at a time.
- Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Note:
- Return 0 if there is no such transformation sequence.
- All words have the same length.
- All words contain only lowercase alphabetic characters.
- You may assume no duplicates in the word list.
- You may assume beginWord and endWord are non-empty and are not the same.
Example 1:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: One shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog", so its length is 5.
Example 2:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, so no transformation is possible.
My Approach:
- Create a graph with word distances
- Calculate the distance to the endWord using BFS
Code:
from typing import List

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        if endWord not in wordList:
            return 0
        graph = {}
        wordList = [beginWord] + wordList if beginWord not in wordList else wordList
        for i in range(len(wordList)):
            wordU = wordList[i]
            # Compare against every later word and link pairs at distance one.
            for j in range(i + 1, len(wordList)):
                wordV = wordList[j]
                if wordU not in graph:
                    graph[wordU] = []
                if wordV not in graph:
                    graph[wordV] = []
                if self.word_difference(wordU, wordV) == 1:
                    graph[wordU].append(wordV)
                    graph[wordV].append(wordU)
        visited = set()
        queue = [beginWord]
        distances = {w: 1 for w in queue}  # the start word counts as step 1
        return self.BFS(graph, endWord, visited, queue, distances)

    def word_difference(self, word1, word2):
        # Count the positions where the two words differ.
        distance = 0
        for c1, c2 in zip(word1, word2):
            if c1 != c2:
                distance += 1
        return distance

    def BFS(self, graph, endNode, visited, queue, distances):
        while queue:
            node = queue.pop(0)
            if node not in visited:
                visited.add(node)
                if node == endNode:
                    return distances[node]
                for neighbour in graph[node]:
                    if neighbour not in visited:
                        visited.add(neighbour)
                        distances[neighbour] = distances[node] + 1
                        queue.append(neighbour)
        return 0
The problem is that the above code is failing on a test case with a large word list.
The discussions forum had a solution which took a similar approach but is clearing all the test cases. How can I optimize my solution?
1 Answer
Measure first. If you have a performance problem, make sure you know where it is. A simple, low-tech way to do this is to start throwing time markers into the code and printing the elapsed time for different phases of the work. In your case, the graph-building is the costly part of the process, so there's no need to waste time thinking about how to optimize the BFS code.
from time import time
t1 = time()
... # Do some work.
t2 = time()
... # Do some other work
t3 = time()
print('Phase 1', t2 - t1)
print('Phase 2', t3 - t2)
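To make that pattern concrete, here is a self-contained, runnable sketch. The two phase bodies are placeholder work invented for illustration (a quadratic loop and a linear key-building loop), and it uses time.perf_counter, which is generally preferred over time.time for measuring elapsed intervals:

```python
from time import perf_counter

t1 = perf_counter()
# Placeholder phase 1: quadratic pairwise work, like the O(N*N) graph build.
pairs = sum(1 for i in range(2000) for j in range(i + 1, 2000))
t2 = perf_counter()
# Placeholder phase 2: linear key-building work over a few words.
keys = [w[:i] + '_' + w[i + 1:] for w in ["hot", "dot", "dog"] for i in range(3)]
t3 = perf_counter()

print('Phase 1', t2 - t1)
print('Phase 2', t3 - t2)
```

Even on toy inputs like these, the printout makes it obvious which phase dominates, which tells you where optimization effort should go.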
Your algorithm and the one from the discussion forum are quite different. Yes, they both build a graph and then use BFS on it. But, as noted, the performance problem lies in the graph building. Your algorithm is O(N*N): nested loops over wordList. The other solution is effectively O(N), because the inner loop is over the positions in the current word, not the entire word list.
# Your approach: O(N*N).
for i in range(len(wordList)):
    ...
    for j in range(i + 1, len(wordList)):
        ...

# Discussion forum: O(N*m), where m is the word length -- tiny, effectively a constant.
for word in word_list:
    for i in range(len(word)):
        ...
Too much computation: word difference is not relevant. A clue that your algorithm is inefficient comes from realizing that we don't actually care about word distances. Our goal is to build a graph linking immediate neighbors (those having a difference of exactly 1), but your algorithm does the work of actually computing all pairwise differences.
Invert the strategy: create linking keys. Suppose you realize that you don't need the word differences. The problem becomes: how do we find immediate neighbors (those differing in exactly one position) without computing any differences? That's not an easy question to answer, of course, and to some extent it requires a leap of insight. Nonetheless, there is a general lesson here. The wildcard-based strategy in the discussion forum achieves that goal by mapping each word to a set of linking keys, such that immediate neighbors always share a key. To do that, convert each word into new character sequences, each with one position replaced by a generic wildcard.
# The word
dog
# Its linking keys
_og
d_g
do_
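Putting the pieces together, here is a minimal runnable sketch of the wildcard/bucket strategy. The function name ladder_length, the bucket-building code, and the use of collections.deque are my own choices for illustration, not taken verbatim from the forum post:

```python
from collections import defaultdict, deque
from typing import List

def ladder_length(beginWord: str, endWord: str, wordList: List[str]) -> int:
    if endWord not in wordList:
        return 0
    # Map each wildcard key (e.g. "d_g") to all words that produce it.
    # Words sharing a bucket differ in exactly one position (or are equal).
    buckets = defaultdict(list)
    for word in wordList + [beginWord]:
        for i in range(len(word)):
            buckets[word[:i] + '_' + word[i + 1:]].append(word)
    # Standard BFS over the implicit graph defined by the buckets.
    queue = deque([(beginWord, 1)])
    visited = {beginWord}
    while queue:
        word, dist = queue.popleft()
        if word == endWord:
            return dist
        for i in range(len(word)):
            for neighbour in buckets[word[:i] + '_' + word[i + 1:]]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append((neighbour, dist + 1))
    return 0
```

Building the buckets touches each word m times (m = word length), so the graph construction is O(N*m) instead of O(N*N), and the BFS looks up neighbors through the same buckets rather than a precomputed adjacency list.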
- Amazing! Thank you for your insightful comments. I guess I have to practice more. Is there anything (book/leetcode list/course) that you would suggest? – abybaddi009, Mar 30, 2022

- @abybaddi009 You're welcome. Yes, practice is crucial. I don't have any ready-made suggestions, other than my regular theme here: data-centric thinking as a strategy/mindset for reducing/simplifying algorithms. The linking-key strategy is a data-centric approach in the sense that it envisions an easily built data structure to achieve the linking without requiring exhaustive pairwise comparisons. You can see a different example of the theme here and in a couple of other links I provide in its comments. – FMc, Mar 30, 2022