Hackers Ranking String Similarity

Question 1

I am currently trying to teach myself some programming. I have started to work with Python by doing this challenge.

For each test case, I need to find the sum of the self-similarities of a string with each of its suffixes. For example, given the string ababaa, the self-similarity scores are

6 (because ababaa = ababaa)
0 (because ababaa ≠ babaa)
3 (because ababaa and abaa share three initial characters)
0 (because ababaa ≠ baa)
1 (because ababaa and aa share one initial character)
1 (because ababaa and a share one initial character)

... for a total score of 11.
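The scoring rule can be restated as a short Python 3 sketch. It is a direct, unoptimised reading of the definition, and the names `common_prefix_length` and `similarity_scores` are made up here for illustration:

```python
def common_prefix_length(a, b):
    """Count how many leading characters a and b share."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def similarity_scores(s):
    """Score every suffix of s (starting with s itself) against s."""
    return [common_prefix_length(s, s[i:]) for i in range(len(s))]


scores = similarity_scores("ababaa")
print(scores, sum(scores))  # [6, 0, 3, 0, 1, 1] 11
```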

My test runs work fine, but when I run this code with longer strings it takes too long, so the site kills the program. Each input string consists of up to 100000 lowercase characters.

Is this because the code makes unnecessary loops? Or what else might the problem be?

# Solution.py for https://www.hackerrank.com/challenges/string-similarity
import sys

for line in sys.stdin:
    if line.islower():
        line = line[:-1]  # line will be compared to suffix
        line_length = len(line)
        points = 0
        for s in xrange(0, line_length):
            suffix = line[s:]
            suffix_length = len(suffix)
            count = 1
            while count < suffix_length + 1:
                if line.startswith(suffix[:1]):  # if line and first character of suffix match
                    if line.startswith(suffix[:suffix_length]):
                        points += len(suffix[:suffix_length])
                        break
                    else:
                        if line.startswith(suffix[:count + 1]):
                            count += 1
                        elif line.startswith(suffix[:count]):
                            points += len(suffix[:count])
                            break
                else:
                    break
        print points

Question 2

Try studying the Z-algorithm. This problem is a very simple modification of the Z-array that you get as part of the Z-algorithm. See <codeforces.com/blog/entry/3107> for a tutorial, or the YouTube video tutorial at <youtube.com/watch?v=MFK0WYeVEag>.
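As a rough illustration of that suggestion, here is a minimal Python 3 sketch of the usual Z-array construction. It assumes the common definition in which `z[i]` is the length of the longest common prefix of the string and its suffix starting at `i`; the names `z_array` and `string_similarity` are chosen here for illustration and are not taken from the linked tutorials. Setting `z[0]` to the full length is a convention that makes the answer a plain sum:

```python
def z_array(s):
    """Return z where z[i] is the length of the longest common prefix
    of s and s[i:]. By convention here, z[0] is len(s)."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    # s[left:right] is the rightmost stretch found so far that matches a prefix of s
    left, right = 0, 0
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed for the mirrored position
            z[i] = min(right - i, z[i - left])
        # Extend the match by comparing one pair of characters at a time
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def string_similarity(s):
    """Sum of the similarities of s with each of its suffixes."""
    return sum(z_array(s))


print(z_array("ababaa"), string_similarity("ababaa"))  # [6, 0, 3, 0, 1, 1] 11
```

Because `right` never moves backwards, the inner `while` loop does a linear amount of work in total, which is what makes this approach plausible for strings of 100000 characters.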

kyticka · Answer 1 · 2013-03-22 13:07:48Z

It is slow because you use a slow algorithm ("too many loops"). This problem might be a bit hard for a beginner. If you want to solve it anyway, be sure to search for tutorials on string algorithms (try looking at http://en.wikipedia.org/wiki/Aho-Corasick and adapting it a bit). Dynamic programming may also help you with this or other problems.

PS: I hope you understand that nobody here can spoil the solution for you. Have fun with programming.

You could always try other problems:

Timus, one of the best: http://acm.timus.ru/problemset.aspx

Project Euler if you like mathematics: http://projecteuler.net/

(Please feel free to edit and add some more.)

Janne Karila · Answer 2 · 2013-03-27 13:06:52Z

You have far too many if line.startswith(something) checks inside the while loop -- you are checking the same characters many times. The while loop counts characters, so you only need to compare one pair of characters on each iteration.

Stuart · Answer 3 · 2013-09-07 20:14:59Z

As @Janne-Karila has said, there are too many startswith calls. Your suffix[:1] could be replaced by suffix[0], and suffix[:suffix_length] is simply the same as suffix.

If you design the loops right you only need to compare one character at a time, and there is no call to use startswith at all. This also greatly simplifies the code.

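import sys

def string_sim():
    # Python 2, like the code in the question
    n = int(sys.stdin.readline())  # first input line: number of test cases
    for _ in range(n):
        line = sys.stdin.readline().strip()
        points = len(line)  # the whole string always matches itself
        for s in xrange(1, len(line)):  # s is the start index of each proper suffix
            for count in xrange(len(line) - s):
                # compare one pair of characters per iteration
                if line[count] != line[s + count]:
                    break
                points += 1
        print points

string_sim()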

This will give a slight speed boost, but as others have pointed out, you need a wholly better algorithm to pass all the tests.
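To see why the simplified loops are still too slow, one can count character comparisons on a worst-case input. The Python 3 sketch below uses a hypothetical helper, `count_comparisons`, that reruns the same double loop on a string of identical letters, where no comparison ever fails:

```python
def count_comparisons(line):
    """Count the character comparisons made by the character-by-character double loop."""
    comparisons = 0
    for s in range(1, len(line)):
        for count in range(len(line) - s):
            comparisons += 1
            if line[count] != line[s + count]:
                break
    return comparisons


for n in (10, 100, 1000):
    # For "a" * n every comparison succeeds, so the total is n * (n - 1) / 2
    print(n, count_comparisons("a" * n))
```

The count grows quadratically: 45, 4950 and 499500 for the three lengths above. At the stated limit of 100000 characters the same formula gives roughly 5 × 10^9 comparisons for a single string.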
