I was attempting to solve LeetCode 139 - Word Break
Given a string
s
and a dictionary of stringswordDict
, returntrue
ifs
can be segmented into a space-separated sequence of one or more dictionary words.Note that the same word in the dictionary may be reused multiple times in the segmentation.
Example 1:
Input: s = "leetcode", wordDict = ["leet","code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".Example 2:
Input: s = "applepenapple", wordDict = ["apple","pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple". Note that you are allowed to reuse a dictionary word.Example 3:
Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output: falseConstraints:
1 <= s.length <= 300
1 <= wordDict.length <= 1000
1 <= wordDict[i].length <= 20
s
andwordDict[i]
consist of only lowercase English letters.- All the strings of wordDict are unique.
I watched some videos saying implementing a trie would be an efficient approach to solve this so I implemented a Trie as follows:
fileprivate class TrieNode {
var data: Character?
var isWordEnd = false
var next = [Character: TrieNode]()
}
fileprivate struct Trie {
var root = TrieNode()
func insert(_ word: String) {
var rootNode = root
for char in word {
rootNode = insertInternal(char, at: rootNode)
}
rootNode.isWordEnd = true
}
func insertInternal(_ char: Character, at node: TrieNode) -> TrieNode {
var nextNode = TrieNode()
if node.next[char] != nil {
nextNode = node.next[char]!
}
nextNode.data = char
node.next[char] = nextNode
return nextNode
}
func doesWordExist(_ word: String) -> Bool {
var rootNode = root
for char in word {
let nextNode = rootNode.next
if nextNode[char] == nil {
return false
}
rootNode = nextNode[char]!
}
return rootNode.isWordEnd
}
}
Then in order to solve the problem, I took the recursion / backtracking approach with the help of the Trie created above in the following way:
func wordBreak(_ s: String, _ wordDict: [String]) -> Bool {
let trie = Trie()
for word in wordDict {
trie.insert(word)
}
return wordBreakInternal(s, wordDict: trie)
}
fileprivate func wordBreakInternal(_ s: String, wordDict: Trie) -> Bool {
if s.isEmpty { return true }
let leftIndex = s.startIndex
var rightIndex = s.startIndex
while rightIndex < s.endIndex {
let word = String(s[leftIndex ... rightIndex])
if wordDict.doesWordExist(word) {
var nextStartIndex = rightIndex
s.formIndex(after: &nextStartIndex)
let nextWord = String(s[nextStartIndex ..< s.endIndex])
if wordBreakInternal(nextWord, wordDict: wordDict) {
return true
}
}
s.formIndex(after: &rightIndex)
}
return false
}
I feel the solution works in terms of giving the desired output for these test cases:
print(wordBreak("leetcode", ["leet","code"])) // true
print(wordBreak("applepenapple", ["apple","pen"])) // true
print(wordBreak("catsandog", ["cats","dog","sand","and","cat"])) // false
print(wordBreak("aaaaaaa", ["aaaa","aaa"])) // true
However, when a large string is the input:
s = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
wordDict = ["a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa","aaaaaaaaa","aaaaaaaaaa"]
I get Time Limit Exceeded in my LeetCode submission.
Could someone advice if the optimization needs to be done in my trie implementation or in my backtracking / recursion with some ideas on how to make things more efficient ?
2 Answers 2
Summary: Your code is written clearly, and works correctly as far as I can see. It can be improved at a few points, and string slices can be used for a small performance improvement. In order to make it work with really large strings within the given time constraint, additional techniques like memoization must be used.
Simplify and improve the code
The var data: Character?
of class TrieNode
is assigned, but never used. It can be removed.
In func insertInternal()
a
var nextNode = TrieNode()
is created but not used if node.next[char]
already exists. Also the forced unwrapping can be avoided:
func insertInternal(_ char: Character, at node: TrieNode) -> TrieNode {
if let nextNode = node.next[char] {
return nextNode
} else {
let nextNode = TrieNode()
node.next[char] = nextNode
return nextNode
}
}
In func doesWordExist
the variable name of
let nextNode = rootNode.next
is misleading: it is not a node but a dictionary. Again the forced unwrapping (and the comparison against nil
) can be avoided:
func doesWordExist(_ word: Substring) -> Bool {
var rootNode = root
for char in word {
if let nextNode = rootNode.next[char] {
rootNode = nextNode
} else {
return false
}
}
return rootNode.isWordEnd
}
Use Substring
It is more efficient to pass Substring
s to the recursively called wordBreakInternal()
functions instead of String
s. Substring
is a "string slice" and shares the storage with the original string. The helper function is then declared as
func wordBreakInternal(_ s: Substring, wordDict: Trie) -> Bool
and called as
return wordBreakInternal(s[...], wordDict: trie)
This makes the function a bit faster, but not fast enough to solve the programming challenge within the time limit.
Memoization
The function is too slow for long strings because of the heavy recursion. One approach to reduce the amount of recursive calls is memoization: For each substring which has been determined to be word-breakable or not, remember the result:
func wordBreak(_ s: String, _ wordDict: [String]) -> Bool {
let trie = Trie()
var memo = [Substring: Bool]()
for word in wordDict {
trie.insert(word)
}
func wordBreakInternal(_ s: Substring) -> Bool {
if let memoized = memo[s] {
return memoized
}
if s.isEmpty { return true }
let leftIndex = s.startIndex
var rightIndex = s.startIndex
while rightIndex < s.endIndex {
let word = (s[leftIndex ... rightIndex])
if trie.doesWordExist(word) {
var nextStartIndex = rightIndex
s.formIndex(after: &nextStartIndex)
let nextWord = (s[nextStartIndex ..< s.endIndex])
if wordBreakInternal(nextWord) {
memo[s] = true
return true
}
}
s.formIndex(after: &rightIndex)
}
memo[s] = false
return false
}
return wordBreakInternal(s[...])
}
I have also made the helper function a nested function in the main function, so that the memoization dictionary (and the trie) need not be passed around as function parameters.
This function computes
let s = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
let wordDict = ["a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa","aaaaaaaaa","aaaaaaaaaa"]
let result = wordBreak(s, wordDict)
in about 7 milliseconds (compiled in Release mode, running on a Macbook Air M2).
To trie or not to trie
You have used a Trie in order to determine quickly if the strings starts with one of the given words. As it turns out, using the built-in hasPrefix()
method does this even better. Using string slices and memoization as before we get
func wordBreak(_ s: String, _ wordDict: [String]) -> Bool {
var memo = [Substring: Bool]()
func wordBreakInternal(_ s: Substring) -> Bool {
if let memoized = memo[s] {
return memoized
}
if s.isEmpty {
return true
}
for word in wordDict {
if s.hasPrefix(word) {
if wordBreakInternal(s.dropFirst(word.count)) {
memo[s] = true
return true
}
}
}
memo[s] = false
return false
}
return wordBreakInternal(s[...])
}
which computes the above test case in less than a millisecond.
-
1\$\begingroup\$ Thank you very much Martin. This answer was so helpful. It helped me optimize my code, improved code readability but it introduced me to memoization which was a key concept in optimizing the solution for submission. Your second solution was also amazing and was something I would never has thought of myself as I just learnt about the swift prefix functions after reading your answer - thank you ! \$\endgroup\$Shawn Frank– Shawn Frank2022年09月03日 17:16:41 +00:00Commented Sep 3, 2022 at 17:16
Avoid repeated computations
For every prefix that will not lead to a solution, the computation is repeated for all permutations of the words that could produce that prefix. For example given the words ["hot", "dog", "hotdog"], and the target string is "hotdogbreadandamillionmorechars", there are two ways to make the prefix hotdog, and the program will compute twice that it's not possible to break the input to words. In this example that's ok, but for the example where you get time limit exceeded it's easy to see that there are many prefixes for which there is a large number of permutations.
To avoid repeated computations, you could use a set of indexes for which the result has already been computed, and skip.
Another way is to use memoization as the other answer suggested.
Take advantage of the trie
The code doesn't take advantage of the trie to its full potential.
The main loop in wordBreakInternal
moves the rightIndex
to the right one by one,
and in every iteration checking if the substring defined by leftIndex
and rightIndex
is in the trie or not.
That is, each iteration starts searching in the trie from the top.
For example if the tree contains "apple" and you run wordBreakInternal
for "apple",
the loop will search for "a", then "ap", then "app", and so on,
and in each iteration the trie also explores the same sequence of characters.
A more effective use of the trie would be to add a findPrefixes
method,
which would take target string to match,
and returned all the valid prefixes that exist in the trie.
For example if the tree contains ["ap", "apple", "pie"] and the input is "applepie",
it would return "ap" and "apple".
The caller would then trie to see if it's possible to make the remaining of the input from "plepie" and "pie", respectively.
Use simpler and more precise names
In insert
, the variable rootNode
is repeatedly replaced with an inserted child node. Then it's not the root anymore, so simply node
would be a better name, less confusing.
doesWordExist
is a bit lengthy,
and "word" is already implied by the trie concept.
How about simply exists
, or more precisely hasPrefix
.
-
2\$\begingroup\$ Thank you very much Janos. Your tips helped me improve my code readability and avoid redundant computations. The most useful part for me was your tip on using the trie to find prefixes - I would never have thought of this on my own. Thank you very much for taking the time to answer me ... +1 \$\endgroup\$Shawn Frank– Shawn Frank2022年09月03日 17:18:25 +00:00Commented Sep 3, 2022 at 17:18
Explore related questions
See similar questions with these tags.