Princeton KMP implementation

Question 1

I implemented a Python version of KMP followed by Princeton Prof Robert Sedgewick's discussion.

However, I got a Time Limit Exceeds error. Per leetcode's explanation, TLE indicates that the solution is correct but it is extremely inefficient when test case with huge input size.

If I understand TLE correctly, this means that my solution works, but some part of my solution should be inefficient and need to be enhanced? Would this be due to an extremely large space cost when precomputing the DFA table?

class Solution(object):
 def dfa(self, haystack, needle):
 R = 256 # R : the alphabet size . We set 256 for ASCII
 M = len(needle) # M : the patten length
 # create a dictionary to encode every unique char from needle with a unique int value for later dfa retrieval.
 global needle_dict
 #needle_dict = dict(zip(list(set(needle)), range(len(list(set(needle))))))
 needle_dict = dict(zip(list(set(needle).union(set(haystack))), range(len(list(set(needle).union(set(haystack)))))))
 # create dfa table, where the # of dfa columns is M, the # of dfa rows is R.
 dfa = [[0 for i in range(M)] for j in range(R)]
 # At state 0 find the first char in the needle and set this first char's corresponding state as 1
 dfa[needle_dict[needle[0]]][0] = 1
 X = 0 # X to record previous state
 for state in range(1, M): # dfa column
 for row in range(0, R): # dfa row
 dfa[row][state] = dfa[row][X] # copy mismatch cases
 dfa[needle_dict[needle[state]]][state] = state + 1 # update the match case for present state
 X = dfa[needle_dict[needle[state]]][X] # update X
 return dfa
 def strStr(self, haystack, needle):
 if len(needle) == 0:
 return 0
 if (haystack == None or needle == None or len(haystack) < len(needle) ):
 return -1
 # precompute dfa[][]from pattern
 dfa_table = self.dfa(haystack, needle)
 i, state = 0, 0
 while (i < len(haystack) and state < len(needle)):
 state = dfa_table[needle_dict[haystack[i]]][state]
 i += 1
 if state == len(needle):
 return i - len(needle) 
 else:
 return -1

Question 2

Unlikely to make much of a perf difference, but comparisons to None should be if foo is None, not if foo == None. e.g. appneta.com/blog/python-none-comparison

Question 3

Thanks! This is Code Review. I appreciated any suggestion for any inappropriate part! :)

Question 4

A few things pop out:

this:

needle_dict = dict(zip(list(set(needle).union(set(haystack))), range(len(list(set(needle).union(set(haystack)))))))

...creates two identical sets. More space(and time, perhaps) efficient to do:

 straws = list(set(needle).union(set(haystack)))
 needle_dict = dict(zip(straws), range(len(straws)))

this:

dfa = [[0 for i in range(M)] for j in range(R)]

...does needless iteration. Instead do:

 dfa = [ list(range(M)) for j in range(R) ]

There's a small bug in:

if len(needle) == 0:
 return 0
if (haystack == None or needle == None or len(haystack) < len(needle) ):
 return -1

... in that you're calling len(needle) before testing if it's None.

It's unclear why you're doing:
```
global needle_dict
```

... it may have performance implications (though I'm not positive).

pjz pjz 2,40613 silver badges15 bronze badges · Answer 1 · 2016-08-12 17:45:34Z

A few things pop out:

this:

needle_dict = dict(zip(list(set(needle).union(set(haystack))), range(len(list(set(needle).union(set(haystack)))))))

...creates two identical sets. More space(and time, perhaps) efficient to do:

 straws = list(set(needle).union(set(haystack)))
 needle_dict = dict(zip(straws), range(len(straws)))

this:

dfa = [[0 for i in range(M)] for j in range(R)]

...does needless iteration. Instead do:

 dfa = [ list(range(M)) for j in range(R) ]

There's a small bug in:

if len(needle) == 0:
 return 0
if (haystack == None or needle == None or len(haystack) < len(needle) ):
 return -1

... in that you're calling len(needle) before testing if it's None.

It's unclear why you're doing:
```
global needle_dict
```

... it may have performance implications (though I'm not positive).

Stack Exchange Network

Princeton KMP implementation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Princeton KMP implementation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions