Return to Answer

Added a third way to print the results

edited Jun 19, 2016 at 14:01

29.4k
3
49
98

Once again, this can be optimizeoptimized by using zip instead of retrieving letters with their indices in the string:

 for _ in range(T):
 patient, virus = input().split()
 matches = 0
 for index in analyze(patient, virus):
 print(index, end=' ')
 matches += 1
 if not matches:
 print('No match!')
 else:
 print()

Or you can convert the matches on the fly:

 for _ in range(T):
 patient, virus = input().split()
 matches = ' '.join(map(str, analyze(patient, virus)))
 if not matches:
 matches = 'No match!'
 print(matches)

Once again, this can be optimize by using zip instead of retrieving letters with their indices in the string:

 for _ in range(T):
 patient, virus = input().split()
 matches = 0
 for index in analyze(patient, virus):
 print(index, end=' ')
 matches += 1
 if not matches:
 print('No match!')
 else:
 print()

Once again, this can be optimized by using zip instead of retrieving letters with their indices in the string:

 for _ in range(T):
 patient, virus = input().split()
 matches = 0
 for index in analyze(patient, virus):
 print(index, end=' ')
 matches += 1
 if not matches:
 print('No match!')
 else:
 print()

Or you can convert the matches on the fly:

 for _ in range(T):
 patient, virus = input().split()
 matches = ' '.join(map(str, analyze(patient, virus)))
 if not matches:
 matches = 'No match!'
 print(matches)

Source Link

answered Jun 19, 2016 at 11:54

301_Moved_Permanently

answered Jun 19, 2016 at 11:54

301_Moved_Permanently

29.4k
3
49
98

Various nitpicks

range(0, x) is better written as range(x)
print(str(i) + " ", end='') is equivalent to print(i, end=' ') but slower
if matches == 0 reads better as if not matches
You compute len(V) several times, just store its value in a variable.
Given that you don't use sep, and that h should already be a string, P,sep,V = str(h).partition(' ') is better written as P, V = h.split()

#Special cases aren't special enough to break the rules

Checking for equality between the patient sub-string and the virus is just a sub-case of counting the differences between them (and finding it is 0). Same for comparing the entire patient string and the virus. All are just sub-cases of:

 for i in range(0,upperlimit):
 Psub = P[i:i+len(V)]
 for j in range(0,len(V)): # If we hit this code then, we're about to see how many characters mismatch in the substring
 if Psub[j] != V[j]:
 mismatches += 1
 if mismatches>=2: # No point checking the substrings alphabet by alphabet any further, more than 1 mismatch present here.
 break

Besides, it is to be expected that there will be much more mismatch or fuzzy-matching when checking the substrings than there will be exact matches. So the exact comparison is just slowing you down.

Moreover, you can abuse the fact that booleans are integers and use the sum builtins to compute the number of differences faster:

 for i in range(upperlimit):
 Psub = P[i:i+len(V)]
 mismatches = sum(Psub[j] != V[j] for j in range(len(V)))

Once again, this can be optimize by using zip instead of retrieving letters with their indices in the string:

 for i in range(upperlimit):
 Psub = P[i:i+len(V)]
 mismatches = sum(p != v for p, v in zip(Psub, V))

The advantage of using zip being that the iteration will stop when reaching the end of the shortest string. We can use that to simplify the writting to:

 for i in range(upperlimit):
 mismatches = sum(p != v for p, v in zip(P[i:], V))

#Separation of concerns

As it stand, your code does 3 things at once:

iterate over the input data
compute the index of "matching" substrings
print such indexes

You should make functions to separate concerns and make it easier to re-use/test. A basic layout could look like:

def analyze(patient, virus):
 # build indexes where there is a match
def main():
 # iterate over lines in input
 for line in input_string:
 patient, virus = line.split()
 matches = analyze(patient, virus)
 # print output based on `matches`

It makes it more clear to follow what is going on. You then just have to figure out how to return meaningful values from analyze. First thoughts should be to build a list and return it:

def analyze(patient, virus):
 matches = []
 for i in range(len(patient) - len(virus) + 1):
 if sum(p != v for p, v in zip(patient[i:], virus)) < 2:
 matches.append(i)
 return matches
def main():
 # iterate over lines in input
 for line in input_string:
 patient, virus = line.split()
 matches = analyze(patient, virus)
 if not matches:
 print('No match!')
 else:
 print(' '.join(map(str, matches)))

Or you could turn analyze into a generator and convert its computation into a list when calling it. It let you avoid explicitly calling append:

Proposed improvements

def analyze(patient, virus):
 for begin in range(len(patient) - len(virus) + 1):
 if sum(p != v for p, v in zip(patient[begin:], virus)) < 2:
 yield begin
if __name__ == '__main__':
 T = int(input())
 for _ in range(T):
 patient, virus = input().split()
 matches = list(analyze(patient, virus))
 if not matches:
 print('No Match!')
 else:
 print(' '.join(map(str, matches)))

The advantage of using a generator is that you don't even have to convert it to a list, your original prints work as well:

 for _ in range(T):
 patient, virus = input().split()
 matches = 0
 for index in analyze(patient, virus):
 print(index, end=' ')
 matches += 1
 if not matches:
 print('No match!')
 else:
 print()

lang-py