Morgan and a String HackerRank challenge

Question 1

I am solving a HackerRank problem called 'Morgan and a String'. Given two strings, each consisting of up to 10⁵ uppercase characters, the task is to form the lexicographically smallest string from those letters by taking the first remaining character from either of the strings at each step. Example, for input strings "ACA" and "BCF", the answer should be "ABCACF":

Jack Daniel Result
ACA BCF
CA BCF A
CA CF AB
A CF ABC
A CF ABCA
 F ABCAC
 ABCACF

Sample input, with one test case:

1
ACEG
BDFH

Output:

ABCDEFGH

I've solved it by writing my own code which works perfectly on smaller test cases. But when they run it against test cases which are approx 6000 characters long I get a runtime error. How can I increase the efficiency?

n=int(input())#number of pairs to input
for mn in range(n):
 a=input()# String1 in the pair
 b=input()#String2 in the pair
 la=[]
 lb=[]
 for i in range(max(len(a),len(b))):#creating lists with all possible elements by slicing off the characters
 la.append(a[i:])
 lb.append(b[i:])
 if len(a)>len(b):#removes empty elements
 lb=[x for x in lb if x!='']
 else:
 la=[x for x in la if x!='']
 while True:#Create empty list for sorting the 0th elements of 'la' nd 'lb'
 temp=[]
 temp.append(la[0])
 temp.append(lb[0])
 temp=sorted(temp)
 print(temp[0][0],end='')#print the 1st character
 if(temp[0] in la):#removing the element after printing the first character
 la.pop(0)
 else:
 lb.pop(0)
 if len(la)==0:#breaks the while loop if a list gets empty
 print(temp[1],end='')
 break
 elif len(lb)==0:
 print(temp[1],end='')
 break

Question 2

Your creating lists with all possible elements part takes too long on long inputs, code won´t even reach the rest. here is a c++ solution that you can translate to python.

Question 3

@juvian - the first sentence of your comment sounds like a worthwhile review; consider posting that as an answer.

Question 4

This is an improvement on @Toby's answer

Instead of str += char which allocates a new string you can yield the result and "".join() it later on
Here's the problem: consider "BAB", "BAC". One would think to compare the two strings, see that BA->B is more minimal than BA->C. or you see that BA->B(end of string) is shorter than BA->(jump to other string)BA. Both of these are wrong, because we end up with BABBAC, instead of BABABC, which is more minimal (BABA < BABB).

The z fixes things because it fixes the issue where we incorrectly choose BA->B0円 over BA(first stack)->(other stack)BA. BA->BA < BA->Bz. Realistically what we should do during tiebreakers is jump to the other stack when one ends so we are comparing against BABB instead of BABz, but the z solution is faster.
Normally with the "z" fix you need to remove them from the end result, but because any uppercase char is lower then the lowercase "z" we know after len(a) + len(b) - 2 iterations only the 2 "z" strings remain in the resulting string. thus we only need to loop len(a) + len(b) - 2 times

Improved code

def morgan(a, b):
 a += "z"
 b += "z"
 for _ in range(len(a) + len(b) - 2):
 if a < b:
 yield a[0]
 a = a[1:]
 else:
 yield b[0]
 b = b[1:]
def morgan_and_string(a, b):
 """
 Return the lexicographically smallest string formed from the input strings, keeping each string
 in order in the result.
 Same length
 >>> morgan_and_string("ACEG", "BDFH")
 'ABCDEFGH'
 >>> morgan_and_string("ABCD", "ABCD")
 'AABBCCDD'
 Empty input
 >>> morgan_and_string("ABCD", "")
 'ABCD'
 >>> morgan_and_string("", "ABCD")
 'ABCD'
 Different length
 >>> morgan_and_string("Z", "ABCD")
 'ABCDZ'
 >>> morgan_and_string("ABCD", "Z")
 'ABCDZ'
 Descending strings
 >>> morgan_and_string("BA", "BA")
 'BABA'
 >>> morgan_and_string("ZAX", "ZAY")
 'ZAXZAY'
 >>> morgan_and_string("ZAY", "ZAX")
 'ZAXZAY'
 Proper prefix
 >>> morgan_and_string("BABC", "BA")
 'BABABC'
 >>> morgan_and_string("CABC", "CA")
 'CABCAC'
 >>> morgan_and_string("BAAC", "BA")
 'BAABAC'
 """
 return "".join(morgan(a, b))
if __name__ == '__main__':
 import doctest
 doctest.testmod()

Question 5

Does this work with ("ZAX", "ZAY") and ("ZAY", "ZAX"), which should both result in "ZAXZAY"?

Question 6

I read a comment in the discussion saying that we should add 'z' at end of both strings, but I didn't understand it. Can you please explain the purpose of adding it?

Question 7

and please explain the reason that you took the range as len(a)+len(b)-2

Question 8

@Sandesh.Patil I have edited a bit

Question 9

a==b was already handled

Question 10

Code organisation

At the moment, the logic handling the output/input and the logic computing the actual result are all mixed up.

This makes things hard to understand, hard to modify and hard to test.

It would be much easier to add a function with 2 parameters returning the string you are interested in.

With minimal changes, you'd get something like:

def get_smallest_string_combination(a, b):
 """Return the lexicographically smallest string ..."""
 la=[]
 lb=[]
 for i in range(max(len(a),len(b))):#creating lists with all possible elements by slicing off the characters
 la.append(a[i:])
 lb.append(b[i:])
 if len(a)>len(b):#removes empty elements
 lb=[x for x in lb if x!='']
 else:
 la=[x for x in la if x!='']
 output = []
 while True:#Create empty list for sorting the 0th elements of 'la' nd 'lb'
 temp=[]
 temp.append(la[0])
 temp.append(lb[0])
 temp=sorted(temp)
 output.append(temp[0][0])#add the 1st character
 if(temp[0] in la):#removing the element after printing the first character
 la.pop(0)
 else:
 lb.pop(0)
 if len(la)==0:#breaks the while loop if a list gets empty
 output.append(temp[1])
 break
 elif len(lb)==0:
 output.append(temp[1])
 break
 return "".join(output)
def automatic_test():
 assert get_smallest_string_combination("ACEG", "BDFH") == "ABCDEFGH"
def interactive_test():
 n=int(input())#number of pairs to input
 for mn in range(n):
 a=input()# String1 in the pair
 b=input()#String2 in the pair
 print(get_smallest_string_combination(a, b))
if __name__ == '__main__':
 automatic_test()

I took this chance to add a function testing the example you've provided.

Also, I've added the beginning of a docstring explaining the point of the function, I'll leave you finish it as an exercice.

More tests and first bug

Now that we have a simple way to write automatic tests, we could add test cases corresponding to edge-cases: empty string, string with one element, etc.

We'd get something like:

def automatic_test():
 # Same length
 assert get_smallest_string_combination("ACEG", "BDFH") == "ABCDEFGH"
 assert get_smallest_string_combination("ABCD", "ABCD") == "AABBCCDD"
 # Empty input
 assert get_smallest_string_combination("ABCD", "") == "ABCD"
 assert get_smallest_string_combination("", "ABCD") == "ABCD"
 # Different length
 assert get_smallest_string_combination("Z", "ABCD") == "ABCDZ"
 assert get_smallest_string_combination("ABCD", "Z") == "ABCDZ"

Which shows that empty inputs are not handled properly.

This can easily be fixed:

def get_smallest_string_combination(a, b):
 """Return the lexicographically smallest string ..."""
 la=[]
 lb=[]
 for i in range(max(len(a),len(b))):#creating lists with all possible elements by slicing off the characters
 la.append(a[i:])
 lb.append(b[i:])
 if len(a)>len(b):#removes empty elements
 lb=[x for x in lb if x!='']
 else:
 la=[x for x in la if x!='']
 output = []
 while la and lb:#Create empty list for sorting the 0th elements of 'la' nd 'lb'
 temp=[]
 temp.append(la[0])
 temp.append(lb[0])
 temp=sorted(temp)
 output.append(temp[0][0])#add the 1st character
 if(temp[0] in la):#removing the element after printing the first character
 la.pop(0)
 else:
 lb.pop(0)
 # Add remaining elements 
 if la:
 output.append(la[0])
 if lb:
 output.append(lb[0])
 return "".join(output)

Logic and complexity

The implementation is very complicated and inefficient but it would be much more simple.

At each step, you just need to compare the first character of each remaining string and add the smallest one (or both in case of equality).

As discussed on the comments, this is very wrong. To be updated in the future...

You'd get something like:

def get_smallest_string_combination(a, b):
 """Return the lexicographically smallest string ..."""
 la = list(a)
 lb = list(b)
 output = []
 while la and lb:
 first_a = la[0]
 first_b = lb[0]
 if first_a < first_b:
 output.append(first_a)
 la.pop(0)
 elif first_a > first_b:
 output.append(first_b)
 lb.pop(0)
 else: # Equal
 output.append(first_a)
 output.append(first_b)
 la.pop(0)
 lb.pop(0)
 # Add remaining elements 
 output.extend(la)
 output.extend(lb)
 return "".join(output)

Question 11

Is there any need to convert the strings to lists? Python's not my strong language, but I thought we could do all the (final) version with string slicing. Would that be better? Why / why not?

Question 12

Oh, and we need to compare more of a and b if first_a is equal to first_b, as per the example in the question (that's unlike the merge step of a merge-sort, where we'd know the inputs were sorted). So I'd suggest if a < b: output.append(a[0]); a = a[1:]; else: output.append(b[0]); b = b[1:] in the while loop.

Question 13

Oh I might have missed something. I'd be most interested if you have an example...

Question 14

Example: "BA" and "BA". Current code will produce "BBAA", but correct implementation should produce "BABA".

Question 15

@cariehl, that's not my reading - we always need to read from the one that will give us the lexicographically-smallest result. Order of inputs is irrelevant, so both of those should result in BABC.

Question 16

This is a follow-on to Josay's answer, and uses the same test suite.

We don't need to convert the two strings to lists; instead, we just need to pick the front character from the earlier of the two at each iteration (making sure that a proper substring of the other is correctly handled). Then the function looks like this:

def get_smallest_string_combination(a, b):
 """Return the lexicographically smallest string ..."""
 output = ""
 while a and b:
 if a + b < b + a:
 output += a[0]
 a = a[1:]
 else:
 output += b[0]
 b = b[1:]
 return output + a + b

We can use the doctest module to incorporate our test cases into the documentation:

def get_smallest_string_combination(a, b):
 """
 Return the lexicographically smallest string formed from the input strings, keeping each string
 in order in the result.
 Same length
 >>> get_smallest_string_combination("ACEG", "BDFH")
 'ABCDEFGH'
 >>> get_smallest_string_combination("ABCD", "ABCD")
 'AABBCCDD'
 Empty input
 >>> get_smallest_string_combination("ABCD", "")
 'ABCD'
 >>> get_smallest_string_combination("", "ABCD")
 'ABCD'
 Different length
 >>> get_smallest_string_combination("Z", "ABCD")
 'ABCDZ'
 >>> get_smallest_string_combination("ABCD", "Z")
 'ABCDZ'
 Descending strings
 >>> get_smallest_string_combination("BA", "BA")
 'BABA'
 Look-ahead
 >>> get_smallest_string_combination("ZAX", "ZAY")
 'ZAXZAY'
 >>> get_smallest_string_combination("ZAY", "ZAX")
 'ZAXZAY'
 Proper prefix
 >>> get_smallest_string_combination("BABC", "BA")
 'BABABC'
 >>> get_smallest_string_combination("CABC", "CA")
 'CABCAC'
 >>> get_smallest_string_combination("BAAC", "BA")
 'BAABAC'
 """
 output = ''
 while a and b:
 if a + b < b + a:
 output += a[0]
 a = a[1:]
 else:
 output += b[0]
 b = b[1:]
 return output + a + b
if __name__ == '__main__':
 import doctest
 doctest.testmod()

Note that performance may be a concern, especially for longer inputs, as we create 4 new strings on every iteration (2 + operations before comparison, one string slice, and the append to output). It may be worth implementing custom classes to provide views onto substrings and concatenations if the impacts are significant enough.

Question 17

While this is better I think it can be improved some more... 1. instead of += string operation which allocates new memory and is inefficient, you could yield and "".join the result. 2. Instead of assigning a new list O(n) with slicing a = a[1:] you could iterate over the index instead

Question 18

That's string slicing, not list slicing; does that still involve allocation? I wasn't sure how to use indexing and still compare the remainder of the string (other than tediously implementing char-by-char compare, and I'm too lazy for that!)

Question 19

any slices are done slice-by-copy, meaning every time you slice it copies all of the data into a new string object.

Question 20

"I wasn't sure how to use indexing and still compare the remainder of the string" I don't quite get what you mean here, I may have missed something... Does yours pass on hacker rank?

Question 21

I've no idea about hacker rank (I'm not a member), but I've added a couple of extra tests to show the lookahead that's needed.

score 4 · Accepted Answer · 2018-09-11 09:16:36Z

This is an improvement on @Toby's answer

Instead of str += char which allocates a new string you can yield the result and "".join() it later on
Here's the problem: consider "BAB", "BAC". One would think to compare the two strings, see that BA->B is more minimal than BA->C. or you see that BA->B(end of string) is shorter than BA->(jump to other string)BA. Both of these are wrong, because we end up with BABBAC, instead of BABABC, which is more minimal (BABA < BABB).

The z fixes things because it fixes the issue where we incorrectly choose BA->B0円 over BA(first stack)->(other stack)BA. BA->BA < BA->Bz. Realistically what we should do during tiebreakers is jump to the other stack when one ends so we are comparing against BABB instead of BABz, but the z solution is faster.
Normally with the "z" fix you need to remove them from the end result, but because any uppercase char is lower then the lowercase "z" we know after len(a) + len(b) - 2 iterations only the 2 "z" strings remain in the resulting string. thus we only need to loop len(a) + len(b) - 2 times

Improved code

def morgan(a, b):
 a += "z"
 b += "z"
 for _ in range(len(a) + len(b) - 2):
 if a < b:
 yield a[0]
 a = a[1:]
 else:
 yield b[0]
 b = b[1:]
def morgan_and_string(a, b):
 """
 Return the lexicographically smallest string formed from the input strings, keeping each string
 in order in the result.
 Same length
 >>> morgan_and_string("ACEG", "BDFH")
 'ABCDEFGH'
 >>> morgan_and_string("ABCD", "ABCD")
 'AABBCCDD'
 Empty input
 >>> morgan_and_string("ABCD", "")
 'ABCD'
 >>> morgan_and_string("", "ABCD")
 'ABCD'
 Different length
 >>> morgan_and_string("Z", "ABCD")
 'ABCDZ'
 >>> morgan_and_string("ABCD", "Z")
 'ABCDZ'
 Descending strings
 >>> morgan_and_string("BA", "BA")
 'BABA'
 >>> morgan_and_string("ZAX", "ZAY")
 'ZAXZAY'
 >>> morgan_and_string("ZAY", "ZAX")
 'ZAXZAY'
 Proper prefix
 >>> morgan_and_string("BABC", "BA")
 'BABABC'
 >>> morgan_and_string("CABC", "CA")
 'CABCAC'
 >>> morgan_and_string("BAAC", "BA")
 'BAABAC'
 """
 return "".join(morgan(a, b))
if __name__ == '__main__':
 import doctest
 doctest.testmod()

Does this work with ("ZAX", "ZAY") and ("ZAY", "ZAX"), which should both result in "ZAXZAY"?
I read a comment in the discussion saying that we should add 'z' at end of both strings, but I didn't understand it. Can you please explain the purpose of adding it?
and please explain the reason that you took the range as len(a)+len(b)-2

Stack Exchange Network

Morgan and a String HackerRank challenge

3 Answers 3

Improved code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Morgan and a String HackerRank challenge

3 Answers 3

Improved code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions