I'm trying to solve Wildcard Matching on LeetCode. I found a great solution using the Finite-state machine. The author builds an FSM and searches with BFS.
I rewrite the solution with the same idea (FSM) but use depth-first search (DFS):
class Solution:
def isMatch(self, s: str, p: str) -> bool:
# build the fsm
fsm = {}
state = 0
for i in range(len(p)):
if p[i] == "*":
# connect to state itself
fsm[state, p[i]] = state
else:
fsm[state, p[i]] = state + 1
state += 1
self.accept = state
self.fsm = fsm
return self.dfs(s, idx=0, state=0)
def dfs(self, s: str, idx: int, state: int) -> bool:
"""deep first search through the finite state machine"""
if idx == len(s):
# processing string finished
if state == self.accept:
return True
else:
return False
for value in [s[idx], "*", "?"]:
if (state, value) not in self.fsm:
# invaild state
continue
next_state = self.fsm[state, value]
if self.dfs(s, idx + 1, next_state):
return True
return False
Since we need to iterate the whole string s
, I think the time complexity of DFS and BFS are the same. However, I had the Time Limit Exceeded on the testing case:
"babbbbaabababaabbababaababaabbaabababbaaababbababaaaaaabbabaaaabababbabbababbbaaaababbbabbbbbbbbbbaabbb"
"b**bb**a**bba*b**a*bbb**aba***babbb*aa****aabb*bbb***a"
- Are the time complexity of DFS and BFS the same?
- Is there something wrong with my code?
2 Answers 2
Documentation
It is great that you used type hints for the functions.
The PEP 8 style guide recommends
adding docstrings for functions. I realize the code is submitted to
the LeetCode site as a programming challenge, but for general code
review, it would be good to add a docstring summarizing the purpose
of the isMatch
function. A description of the state machine would
also be helpful.
The dfs
function docstring is good:
"""deep first search through the finite state machine"""
but, it would be better to use the common name for the algorithm:
"""Depth-first search through the finite state machine"""
It would also be helpful to mention that the function is recursive.
Naming
PEP-8 recommends snake_case for function and variable names.
isMatch
would be is_match
(outside of LeetCode restrictions).
dfs
could be depth_first_search
.
s
and p
are a bit terse in this context. input_string
and pattern
would be better.
TLE
- Is there something wrong with my code?
Since the code does not hit the time goal, this is likely not the best approach. The link you posted for the FSM solution has a long discussion; perhaps there are some clues buried in there.
time complexity
I had the Time Limit Exceeded on the testing case: ...
"b**bb**a**bba*b**a*bbb**aba***babbb*aa****aabb*bbb***a"
(Using more standard regex notation from outside the problem
statement, we might phrase that as "^b.*.*bb.*.*a ..."
and so on.)
You seem to be touching on some classic "why does this egrep
take forever?" gotchas.
It is easy enough to write a short regex which hides
some ugly deeply nested loops.
I won't comment on your dfs()
.
I will observe that there's an opportunity to
simplify the input you're sending it.
There's no difference between the following patterns:
- a*b
- a**b
- a***b
- a****b
You can simplify all of them down to "a*b"
before
invoking dfs().
Explore related questions
See similar questions with these tags.