I was doing a standard problem of DP (dynamic programming) on SPOJ Edit Distance using Python.
t = raw_input()
for i in range(int(t)):
a,b = raw_input(),raw_input()
r = len(a)
c = len(b)
x = [[0]*(c+1) for j in range(r+1)]
for j in range(c+1):
x[0][j] = j
for j in range(r+1):
x[j][0] = j
for j in range(1,r+1):
for k in range(1,c+1):
if(b[k-1]!=a[j-1]):
x[j][k] = min(x[j-1][k-1]+1,x[j-1][k]+1,x[j][k-1]+1)
else:
x[j][k] = min(x[j-1][k-1],x[j-1][k]+1,x[j][k-1]+1)
print x[r][c]
The solution I have proposed is giving T.L.E (Time Limit Exceeded) even though I am using D.P. Is there any way to optimize it further in terms of time complexity or with respect to any feature of Python 2.7 such as input and output?
2 Answers 2
I've solved that problem in java and c++ (2nd and 3th places in best solutions category :) so I can compare the local and the remote execution time in order to see - how much faster should be your solution in order to pass.
So, the local execution time of my java solution is 78 ms for the such testcase (10 pairs x 2000 chars), the robot's time is 500 ms, so my PC is ~6.5 times faster. Then the following python DP solution takes 3.6 seconds on my PC so, it would take ~23.5 seconds on the remote PC. So if the remote time limit 15 seconds, the following solution must be minimum ~ 1.56 times faster in order to pass. Ufff....
import time
try:
# just to see the Python 2.5 + psyco speed - 17 times faster than Python 2.7 !!!
import psyco
psyco.full()
except:
pass
def editDistance(s1, s2):
if s1 == s2: return 0
if not len(s1):
return len(s2)
if not len(s2):
return len(s1)
if len(s1) > len(s2):
s1, s2 = s2, s1
r1 = range(len(s2) + 1)
r2 = [0] * len(r1)
i = 0
for c1 in s1:
r2[0] = i + 1
j = 0
for c2 in s2:
if c1 == c2:
r2[j+1] = r1[j]
else:
a1 = r2[j]
a2 = r1[j]
a3 = r1[j+1]
if a1 > a2:
if a2 > a3:
r2[j+1] = 1 + a3
else:
r2[j+1] = 1 + a2
else:
if a1 > a3:
r2[j+1] = 1 + a3
else:
r2[j+1] = 1 + a1
j += 1
aux = r1; r1 = r2; r2 = aux
i += 1
return r1[-1]
if __name__ == "__main__":
st = time.time()
t = raw_input()
for i in range(int(t)):
a, b = raw_input(), raw_input()
print editDistance(a, b)
#print "Time (s): ", time.time()-st
What I can say - It's very very hard or may be impossible to pass that puzzle using Python and DP approach (using java or c++ - peace of cake). Wait, wait - ask you - what about that two guys that passed using python? Ok, the answer is easy - they use something different. What's exactly? Something that I've used for java solution I think. That stuff just blows away the competitors....
-
\$\begingroup\$ Thanks @cat_baxter i am not much familiar with java but i will try and look into the algo part of that java solution and try to implement it in python if possible .... \$\endgroup\$RYO– RYO2013年07月08日 14:36:27 +00:00Commented Jul 8, 2013 at 14:36
I don't think there is much to improve in terms of performance. However, you could make the code a whole lot more self-descriptive, e.g. using better variable names, and moving the actual calculation into a function that can be called with different inputs. Here's my try:
def min_edit_distance(word1, word2, subst=1):
len1, len2 = len(word1), len(word2)
med = [[0] * (len2 + 1) for j in range(len1 + 1)]
for j in xrange(len1 + 1):
for k in xrange(len2 + 1):
if min(j, k) == 0:
med[j][k] = max(j, k) # initialization
else:
diag = 0 if word1[j-1] == word2[k-1] else subst
med[j][k] = min(med[j-1][k-1] + diag, # substite or keep
med[j-1][k ] + 1, # insert
med[j ][k-1] + 1) # delete
return med[len1][len2]
Main points:
- move the actual calculation into a function, for reusability
- use more descriptive names instead of one-letter variables
- different calculations of minimum edit distance use different costs for substitutions -- sometimes
1
, sometimes2
-- so this could be a parameter - unless I'm mistaken the
min
in yourelse
is not necessary;x[j-1][k-1]
will always be the best - the two initialization loops can be incorporated into the main double-loop. (Clearly this is a question of taste. Initialization loops are more typical for DP, while this variant is closer to the definition.)
Explore related questions
See similar questions with these tags.
collections.deque
- I doubt the input/output methods will make much difference. But I see no one has solved this problem with python yet. \$\endgroup\$