I am currently trying to teach myself some programming. I have started to work with Python by doing this challenge.
For each test case, I need to find the sum of the self-similarities of a string with each of its suffixes. For example, given the string ababaa
, the self-similarity scores are
- 6 (because
ababaa
=ababaa
) - 0 (because
ababaa
≠babaa
) - 3 (because
ababaa
andabaa
share three initial characters) - 0 (because
ababaa
≠baa
) - 1 (because
ababaa
andaa
share one initial character) - 1 (because
ababaa
anda
share one initial character)
... for a total score of 11.
When I test run it works out fine, however when I run this code with longer strings it takes too long time to run, so the site shuts down the program. Each input string consists of up to 100000 lowercase characters.
Is this because this code make unnecessary loops? Or what else might the problem be here?
# Solution.py for https://www.hackerrank.com/challenges/string-similarity
import sys
for line in sys.stdin:
if line.islower():
line = line[:-1] # line will be compared to suffix
line_length = len(line)
points = 0
for s in xrange(0,line_length):
suffix = line[s:]
suffix_length = len(suffix)
count = 1
while count < suffix_length+1:
if line.startswith(suffix[:1]): # if line and first character of suffix mach
if line.startswith(suffix[:suffix_length]):
points += len(suffix[:suffix_length])
break
else:
if line.startswith(suffix[:count+1]):
count += 1
elif line.startswith(suffix[:count]):
points += len(suffix[:count])
break
else:
break
print points
-
2\$\begingroup\$ Try studying z-algorithm. This question is a very very simple modification of the z-array you get as part of the z-algorithm. See this <codeforces.com/blog/entry/3107> for the tutorial or youtube video tutorials <youtube.com/watch?v=MFK0WYeVEag> \$\endgroup\$Shiv– Shiv2013年09月07日 15:05:08 +00:00Commented Sep 7, 2013 at 15:05
3 Answers 3
It is slow because you use slow algorithm ("too many loops"). This problem might be a bit hard for beginner. If you want to solve it anyway, be sure to search for tutorials on string algorithms (try to look at http://en.wikipedia.org/wiki/Aho-Corasick and change it a bit). Maybe dynamic programming will help you with this or other problems.
PS I hope you understand nobody here can not spoil the solution. Have fun with programming.
You could always try another problems:
Timus one of the best: http://acm.timus.ru/problemset.aspx
Project Euler if you like mathematics: http://projecteuler.net/
(Please feel free to edit and add some more.)
You have far too many if line.startswith(something)
checks inside the while
loop -- you are checking the same characters many times. The while
loop counts characters, so you only need to compare one pair of characters on each iteration.
As @Janne-Karila has said there are too many startswith
s. Your suffix[:1]
could be replaced by suffix[0]
and suffix[:suffix_length]
is simply the same as suffix
.
If you design the loops right you only need to compare one character at a time, and there is no call to use startswith
at all. This also greatly simplifies the code.
def string_sim():
n = int(sys.stdin.readline())
for _ in range(n):
line = sys.stdin.readline().strip()
points = len(line)
for s in xrange(1, len(line)):
for count in xrange(len(line) - s):
if line[count] != line[s + count]:
break
points += 1
print points
string_sim()
This will give a slight speed boost, but as others have pointed out, you need a wholly better algorithm to pass all the tests.