5
\$\begingroup\$

I implemented a Python version of KMP followed by Princeton Prof Robert Sedgewick's discussion.

However, I got a Time Limit Exceeds error. Per leetcode's explanation, TLE indicates that the solution is correct but it is extremely inefficient when test case with huge input size.

If I understand TLE correctly, this means that my solution works, but some part of my solution should be inefficient and need to be enhanced? Would this be due to an extremely large space cost when precomputing the DFA table?

class Solution(object):
 def dfa(self, haystack, needle):
 R = 256 # R : the alphabet size . We set 256 for ASCII
 M = len(needle) # M : the patten length
 # create a dictionary to encode every unique char from needle with a unique int value for later dfa retrieval.
 global needle_dict
 #needle_dict = dict(zip(list(set(needle)), range(len(list(set(needle))))))
 needle_dict = dict(zip(list(set(needle).union(set(haystack))), range(len(list(set(needle).union(set(haystack)))))))
 # create dfa table, where the # of dfa columns is M, the # of dfa rows is R.
 dfa = [[0 for i in range(M)] for j in range(R)]
 # At state 0 find the first char in the needle and set this first char's corresponding state as 1
 dfa[needle_dict[needle[0]]][0] = 1
 X = 0 # X to record previous state
 for state in range(1, M): # dfa column
 for row in range(0, R): # dfa row
 dfa[row][state] = dfa[row][X] # copy mismatch cases
 dfa[needle_dict[needle[state]]][state] = state + 1 # update the match case for present state
 X = dfa[needle_dict[needle[state]]][X] # update X
 return dfa
 def strStr(self, haystack, needle):
 if len(needle) == 0:
 return 0
 if (haystack == None or needle == None or len(haystack) < len(needle) ):
 return -1
 # precompute dfa[][]from pattern
 dfa_table = self.dfa(haystack, needle)
 i, state = 0, 0
 while (i < len(haystack) and state < len(needle)):
 state = dfa_table[needle_dict[haystack[i]]][state]
 i += 1
 if state == len(needle):
 return i - len(needle) 
 else:
 return -1
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Jul 16, 2016 at 18:57
\$\endgroup\$
2
  • \$\begingroup\$ Unlikely to make much of a perf difference, but comparisons to None should be if foo is None, not if foo == None. e.g. appneta.com/blog/python-none-comparison \$\endgroup\$ Commented Jul 17, 2016 at 23:08
  • \$\begingroup\$ Thanks! This is Code Review. I appreciated any suggestion for any inappropriate part! :) \$\endgroup\$ Commented Jul 19, 2016 at 1:01

1 Answer 1

1
\$\begingroup\$

A few things pop out:

  • this:

    needle_dict = dict(zip(list(set(needle).union(set(haystack))), range(len(list(set(needle).union(set(haystack)))))))
    

...creates two identical sets. More space(and time, perhaps) efficient to do:

 straws = list(set(needle).union(set(haystack)))
 needle_dict = dict(zip(straws), range(len(straws)))
  • this:

    dfa = [[0 for i in range(M)] for j in range(R)]
    

...does needless iteration. Instead do:

 dfa = [ list(range(M)) for j in range(R) ]
  • There's a small bug in:

    if len(needle) == 0:
     return 0
    if (haystack == None or needle == None or len(haystack) < len(needle) ):
     return -1
    

... in that you're calling len(needle) before testing if it's None.

  • It's unclear why you're doing:

    global needle_dict
    

... it may have performance implications (though I'm not positive).

answered Aug 12, 2016 at 17:45
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.