Determine if a string is a sequence of dictionary words

Question 1

This is Leetcode problem 139: "Wordbreak"

Problem: Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

The same word in the dictionary may be reused multiple times in the segmentation. You may assume the dictionary does not contain duplicate words.

Example:

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true

Explanation: Return true because "applepenapple" can be segmented as "apple pen apple". Note that you are allowed to reuse a dictionary word. my solution

class Solution:
 def wordBreak(self, s: str, wordDict: List[str]) -> bool:
 """
 """
 def prune(i):
 if i == len(s):
 return True
 elif i >len(s):
 return False
 else:
 for word in wordDict:
 m = len(word)
 if s[i:i+m] == word and prune(i+m):
 return True
 return False
 wordDict.sort(key = lambda x:-len(x))
 return prune(0)

I believe the time complexity for my solution to be O(m^n), where m is the amount of words in wordDict and n is the length of the string s. Space wise is just the recursion stack so O(n). I'm having a hard time figuring out how to optimize my solution by either using bfs/dfs or dynamic programming. What would I memoize?

Question 2

Include the problem description in your post. Don’t make reviewers search for what "Leetcode 139" is.

Question 3

As AJNeufeld indicated, There are always people lazier than you are (no offense meant) and I'm one of these people who will not make that effort to try to figure out what this thing above is about. I'm not an expert and I might not be writing some review on this code (and I might) and certainly I won't given the absence of the description. And just a side note: the use of classes is not meant for a personal style (no offense once more) but if there is no need for using a class, use a regular function and that would be enough in this case.

Question 4

@AJNeufeld hey, thanks for pointing that out! Sorry I am still new to this community i did not know.

Question 5

@EmadBoctor on LeetCode the template they have you use is a class by default, this is not a choice by OP

Question 6

@ C.Nivs yeah, recently this template started to become more common than usual I was wondering why. Thanks for letting me know anyway.

Question 7

`"""DocStrings"""`

+1 for adding a doc string to the wordBreak() method, but -1 for not having any content in it.

PEP-008

Your code diverges from the PEP-008 guidelines in several areas:

Use snake_case for functions and variables, not mixedCase.
Use a single space around operators (elif i > len(s):), not 2 spaces, and then no spaces.
Add a blank line between the inner prune() function and the outer function, as well as between the class Solution and its first method.

Use pylint or pyflakes or similar to ensure your code follows proper conventions.

Variable Names

Variable names like s are very terse, but since it is given that name in the question, is forgivable.

Variable names like wordDict are misleading, and should be avoided! wordDict is actually a list, not a dict. Although that name was given in the question, you should correct it to word_list, or simply words to avoid confusion. (+1 for the type hint; it helps avoid some confusion. But rename the parameter anyway.)

i is acceptable as a loop index, but there is no obvious loop. idx or pos may improve clarity.

m is unacceptable. It in no way suggests it is the length of a word. Use a more descriptive variable name.

Dead Code

elif i > len(s):
 return False

At what point will this branch every be taken? The only way is if prune(i+m) is called when i+m is greater than len(s). But if m is the length of word, and s[i:i+m] == word is true, it is impossible for i+m to exceed len(s). This branch is dead code, and can be removed.

Algorithmic Improvements

You are sorting your "dictionary" by length, so that you check the longest words first. However, this doesn't actually guarantee any speed improvements. At any step, if the next possible word is the shortest one, you've guaranteed it will be the last one checked. Ditch the sort.

What you want to do is reduce the number of words from the "dictionary" you are testing at each step. One obvious way would be to turn the "dictionary" into an actual dict, say, storing the list of words which start with a given letter:

word_dict = {
 "a": ["apple"],
 "p": ["pen"],
 }

Or more programmatically:

word_dict = defaultdict(list, ((word[:1], word) for word in wordDict))

Then at each step, instead of comparing s[i:i+len(word)] to each and every word in wordDict, you would compare to the list of words in word_dict[s[i:i+1]], which should be significantly smaller.

This method will degenerate if all the dictionary words start with the same letter (["aaaa", "aaab", "aaac", "aaad"]), but you could handle this by storing the dictionary as a tree structure instead.

Another alternative would be to separate wordDict into sets of words of different lengths:

words_by_length = { 3: { "pen" }, 5: { "apple" } }

Then, at each stage, you could do a single set containment test for each word length:

for word_length, word_list in words_by_length.items():
 if s[i:i+word_length] in words_list:
 # ...

Question 8

This task can be accomplished using a regular expression.

As an example, for the input wordDict = ["apple", "pen"], one can build a regular expression
r"(apple|pen)+", and then match it against the entire input string "applepenapple" (use re.fullmatch, not re.match). The underlying algorithm for regular expression matching is based on a finite automata and is very efficient.

Question 9

You have presented an alternate solution, but have not provided any review of the OP's code. From the Help Center/Answering "Every answer must make at least one insightful observation about the code in the question. Answers that merely provide an alternate solution with no explanation or justification do not constitute valid Code Review answers and may be deleted."

Question 10

The OP explicitly asked for an improved solution but not a full review. So I did not attempt to provide it.

AJNeufeld AJNeufeld 35.2k5 gold badges41 silver badges103 bronze badges · Answer 1 · 2019-09-17 05:38:41Z

`"""DocStrings"""`

+1 for adding a doc string to the wordBreak() method, but -1 for not having any content in it.

PEP-008

Your code diverges from the PEP-008 guidelines in several areas:

Use snake_case for functions and variables, not mixedCase.
Use a single space around operators (elif i > len(s):), not 2 spaces, and then no spaces.
Add a blank line between the inner prune() function and the outer function, as well as between the class Solution and its first method.

Use pylint or pyflakes or similar to ensure your code follows proper conventions.

Variable Names

Variable names like s are very terse, but since it is given that name in the question, is forgivable.

Variable names like wordDict are misleading, and should be avoided! wordDict is actually a list, not a dict. Although that name was given in the question, you should correct it to word_list, or simply words to avoid confusion. (+1 for the type hint; it helps avoid some confusion. But rename the parameter anyway.)

i is acceptable as a loop index, but there is no obvious loop. idx or pos may improve clarity.

m is unacceptable. It in no way suggests it is the length of a word. Use a more descriptive variable name.

Dead Code

elif i > len(s):
 return False

At what point will this branch every be taken? The only way is if prune(i+m) is called when i+m is greater than len(s). But if m is the length of word, and s[i:i+m] == word is true, it is impossible for i+m to exceed len(s). This branch is dead code, and can be removed.

Algorithmic Improvements

You are sorting your "dictionary" by length, so that you check the longest words first. However, this doesn't actually guarantee any speed improvements. At any step, if the next possible word is the shortest one, you've guaranteed it will be the last one checked. Ditch the sort.

What you want to do is reduce the number of words from the "dictionary" you are testing at each step. One obvious way would be to turn the "dictionary" into an actual dict, say, storing the list of words which start with a given letter:

word_dict = {
 "a": ["apple"],
 "p": ["pen"],
 }

Or more programmatically:

word_dict = defaultdict(list, ((word[:1], word) for word in wordDict))

Then at each step, instead of comparing s[i:i+len(word)] to each and every word in wordDict, you would compare to the list of words in word_dict[s[i:i+1]], which should be significantly smaller.

This method will degenerate if all the dictionary words start with the same letter (["aaaa", "aaab", "aaac", "aaad"]), but you could handle this by storing the dictionary as a tree structure instead.

Another alternative would be to separate wordDict into sets of words of different lengths:

words_by_length = { 3: { "pen" }, 5: { "apple" } }

Then, at each stage, you could do a single set containment test for each word length:

for word_length, word_list in words_by_length.items():
 if s[i:i+word_length] in words_list:
 # ...

GZ0 GZ0 2,3318 silver badges19 bronze badges · Answer 2 · 2019-09-17 05:16:03Z

1

\$\begingroup\$

This task can be accomplished using a regular expression.

As an example, for the input wordDict = ["apple", "pen"], one can build a regular expression
r"(apple|pen)+", and then match it against the entire input string "applepenapple" (use re.fullmatch, not re.match). The underlying algorithm for regular expression matching is based on a finite automata and is very efficient.

Share

answered Sep 17, 2019 at 5:16

GZ0's user avatar

GZ0 GZ0

2,3318 silver badges19 bronze badges

\$\endgroup\$

2

\$\begingroup\$ You have presented an alternate solution, but have not provided any review of the OP's code. From the Help Center/Answering "Every answer must make at least one insightful observation about the code in the question. Answers that merely provide an alternate solution with no explanation or justification do not constitute valid Code Review answers and may be deleted." \$\endgroup\$

AJNeufeld
– AJNeufeld

2019年09月17日 05:51:21 +00:00
Commented Sep 17, 2019 at 5:51
\$\begingroup\$ The OP explicitly asked for an improved solution but not a full review. So I did not attempt to provide it. \$\endgroup\$

GZ0
– GZ0

2019年09月17日 05:53:00 +00:00
Commented Sep 17, 2019 at 5:53

Add a comment |

Stack Exchange Network

Determine if a string is a sequence of dictionary words

2 Answers 2

`"""DocStrings"""`

PEP-008

Variable Names

Dead Code

Algorithmic Improvements

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Determine if a string is a sequence of dictionary words

2 Answers 2

"""DocStrings"""

PEP-008

Variable Names

Dead Code

Algorithmic Improvements

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`"""DocStrings"""`