I implemented a Python version of KMP followed by Princeton Prof Robert Sedgewick's discussion.
However, I got a Time Limit Exceeds error. Per leetcode's explanation, TLE indicates that the solution is correct but it is extremely inefficient when test case with huge input size.
If I understand TLE correctly, this means that my solution works, but some part of my solution should be inefficient and need to be enhanced? Would this be due to an extremely large space cost when precomputing the DFA table?
class Solution(object):
def dfa(self, haystack, needle):
R = 256 # R : the alphabet size . We set 256 for ASCII
M = len(needle) # M : the patten length
# create a dictionary to encode every unique char from needle with a unique int value for later dfa retrieval.
global needle_dict
#needle_dict = dict(zip(list(set(needle)), range(len(list(set(needle))))))
needle_dict = dict(zip(list(set(needle).union(set(haystack))), range(len(list(set(needle).union(set(haystack)))))))
# create dfa table, where the # of dfa columns is M, the # of dfa rows is R.
dfa = [[0 for i in range(M)] for j in range(R)]
# At state 0 find the first char in the needle and set this first char's corresponding state as 1
dfa[needle_dict[needle[0]]][0] = 1
X = 0 # X to record previous state
for state in range(1, M): # dfa column
for row in range(0, R): # dfa row
dfa[row][state] = dfa[row][X] # copy mismatch cases
dfa[needle_dict[needle[state]]][state] = state + 1 # update the match case for present state
X = dfa[needle_dict[needle[state]]][X] # update X
return dfa
def strStr(self, haystack, needle):
if len(needle) == 0:
return 0
if (haystack == None or needle == None or len(haystack) < len(needle) ):
return -1
# precompute dfa[][]from pattern
dfa_table = self.dfa(haystack, needle)
i, state = 0, 0
while (i < len(haystack) and state < len(needle)):
state = dfa_table[needle_dict[haystack[i]]][state]
i += 1
if state == len(needle):
return i - len(needle)
else:
return -1
1 Answer 1
A few things pop out:
this:
needle_dict = dict(zip(list(set(needle).union(set(haystack))), range(len(list(set(needle).union(set(haystack)))))))
...creates two identical sets. More space(and time, perhaps) efficient to do:
straws = list(set(needle).union(set(haystack)))
needle_dict = dict(zip(straws), range(len(straws)))
this:
dfa = [[0 for i in range(M)] for j in range(R)]
...does needless iteration. Instead do:
dfa = [ list(range(M)) for j in range(R) ]
There's a small bug in:
if len(needle) == 0: return 0 if (haystack == None or needle == None or len(haystack) < len(needle) ): return -1
... in that you're calling len(needle)
before testing if it's None
.
It's unclear why you're doing:
global needle_dict
... it may have performance implications (though I'm not positive).
Explore related questions
See similar questions with these tags.
if foo is None
, notif foo == None
. e.g. appneta.com/blog/python-none-comparison \$\endgroup\$