I am solving a HackerRank problem called 'Morgan and a String'. Given two strings, each consisting of up to 105 uppercase characters, the task is to form the lexicographically smallest string from those letters by taking the first remaining character from either of the strings at each step. Example, for input strings "ACA" and "BCF", the answer should be "ABCACF":
Jack Daniel Result
ACA BCF
CA BCF A
CA CF AB
A CF ABC
A CF ABCA
F ABCAC
ABCACF
Sample input, with one test case:
1
ACEG
BDFH
Output:
ABCDEFGH
I've solved it by writing my own code which works perfectly on smaller test cases. But when they run it against test cases which are approx 6000 characters long I get a runtime error. How can I increase the efficiency?
n=int(input())#number of pairs to input
for mn in range(n):
a=input()# String1 in the pair
b=input()#String2 in the pair
la=[]
lb=[]
for i in range(max(len(a),len(b))):#creating lists with all possible elements by slicing off the characters
la.append(a[i:])
lb.append(b[i:])
if len(a)>len(b):#removes empty elements
lb=[x for x in lb if x!='']
else:
la=[x for x in la if x!='']
while True:#Create empty list for sorting the 0th elements of 'la' nd 'lb'
temp=[]
temp.append(la[0])
temp.append(lb[0])
temp=sorted(temp)
print(temp[0][0],end='')#print the 1st character
if(temp[0] in la):#removing the element after printing the first character
la.pop(0)
else:
lb.pop(0)
if len(la)==0:#breaks the while loop if a list gets empty
print(temp[1],end='')
break
elif len(lb)==0:
print(temp[1],end='')
break
-
\$\begingroup\$ Your creating lists with all possible elements part takes too long on long inputs, code won´t even reach the rest. here is a c++ solution that you can translate to python. \$\endgroup\$juvian– juvian2018年09月10日 16:00:43 +00:00Commented Sep 10, 2018 at 16:00
-
\$\begingroup\$ @juvian - the first sentence of your comment sounds like a worthwhile review; consider posting that as an answer. \$\endgroup\$Toby Speight– Toby Speight2018年09月10日 16:32:11 +00:00Commented Sep 10, 2018 at 16:32
3 Answers 3
This is an improvement on @Toby's answer
Instead of
str += char
which allocates a new string you can yield the result and"".join()
it later onHere's the problem: consider
"BAB"
,"BAC"
. One would think to compare the two strings, see thatBA->B
is more minimal thanBA->C
. or you see thatBA->B
(end of string) is shorter thanBA->(jump to other string)BA
. Both of these are wrong, because we end up withBABBAC
, instead ofBABABC
, which is more minimal (BABA < BABB
).The z fixes things because it fixes the issue where we incorrectly choose
BA->B0円
overBA(first stack)->(other stack)BA
.BA->BA < BA->Bz
. Realistically what we should do during tiebreakers is jump to the other stack when one ends so we are comparing againstBABB
instead ofBABz
, but the z solution is faster.Normally with the
"z"
fix you need to remove them from the end result, but because any uppercase char is lower then the lowercase"z"
we know afterlen(a) + len(b) - 2
iterations only the 2"z"
strings remain in the resulting string. thus we only need to looplen(a) + len(b) - 2
times
Improved code
def morgan(a, b):
a += "z"
b += "z"
for _ in range(len(a) + len(b) - 2):
if a < b:
yield a[0]
a = a[1:]
else:
yield b[0]
b = b[1:]
def morgan_and_string(a, b):
"""
Return the lexicographically smallest string formed from the input strings, keeping each string
in order in the result.
Same length
>>> morgan_and_string("ACEG", "BDFH")
'ABCDEFGH'
>>> morgan_and_string("ABCD", "ABCD")
'AABBCCDD'
Empty input
>>> morgan_and_string("ABCD", "")
'ABCD'
>>> morgan_and_string("", "ABCD")
'ABCD'
Different length
>>> morgan_and_string("Z", "ABCD")
'ABCDZ'
>>> morgan_and_string("ABCD", "Z")
'ABCDZ'
Descending strings
>>> morgan_and_string("BA", "BA")
'BABA'
>>> morgan_and_string("ZAX", "ZAY")
'ZAXZAY'
>>> morgan_and_string("ZAY", "ZAX")
'ZAXZAY'
Proper prefix
>>> morgan_and_string("BABC", "BA")
'BABABC'
>>> morgan_and_string("CABC", "CA")
'CABCAC'
>>> morgan_and_string("BAAC", "BA")
'BAABAC'
"""
return "".join(morgan(a, b))
if __name__ == '__main__':
import doctest
doctest.testmod()
-
\$\begingroup\$ Does this work with
("ZAX", "ZAY")
and("ZAY", "ZAX")
, which should both result in"ZAXZAY"
? \$\endgroup\$Toby Speight– Toby Speight2018年09月11日 10:01:34 +00:00Commented Sep 11, 2018 at 10:01 -
\$\begingroup\$ I read a comment in the discussion saying that we should add 'z' at end of both strings, but I didn't understand it. Can you please explain the purpose of adding it? \$\endgroup\$Sandesh34– Sandesh342018年09月11日 11:10:41 +00:00Commented Sep 11, 2018 at 11:10
-
\$\begingroup\$ and please explain the reason that you took the range as
len(a)+len(b)-2
\$\endgroup\$Sandesh34– Sandesh342018年09月11日 11:16:51 +00:00Commented Sep 11, 2018 at 11:16 -
\$\begingroup\$ @Sandesh.Patil I have edited a bit \$\endgroup\$Ludisposed– Ludisposed2018年09月11日 12:20:50 +00:00Commented Sep 11, 2018 at 12:20
-
1\$\begingroup\$
a==b
was already handled \$\endgroup\$Ludisposed– Ludisposed2018年09月11日 13:10:17 +00:00Commented Sep 11, 2018 at 13:10
Code organisation
At the moment, the logic handling the output/input and the logic computing the actual result are all mixed up.
This makes things hard to understand, hard to modify and hard to test.
It would be much easier to add a function with 2 parameters returning the string you are interested in.
With minimal changes, you'd get something like:
def get_smallest_string_combination(a, b):
"""Return the lexicographically smallest string ..."""
la=[]
lb=[]
for i in range(max(len(a),len(b))):#creating lists with all possible elements by slicing off the characters
la.append(a[i:])
lb.append(b[i:])
if len(a)>len(b):#removes empty elements
lb=[x for x in lb if x!='']
else:
la=[x for x in la if x!='']
output = []
while True:#Create empty list for sorting the 0th elements of 'la' nd 'lb'
temp=[]
temp.append(la[0])
temp.append(lb[0])
temp=sorted(temp)
output.append(temp[0][0])#add the 1st character
if(temp[0] in la):#removing the element after printing the first character
la.pop(0)
else:
lb.pop(0)
if len(la)==0:#breaks the while loop if a list gets empty
output.append(temp[1])
break
elif len(lb)==0:
output.append(temp[1])
break
return "".join(output)
def automatic_test():
assert get_smallest_string_combination("ACEG", "BDFH") == "ABCDEFGH"
def interactive_test():
n=int(input())#number of pairs to input
for mn in range(n):
a=input()# String1 in the pair
b=input()#String2 in the pair
print(get_smallest_string_combination(a, b))
if __name__ == '__main__':
automatic_test()
I took this chance to add a function testing the example you've provided.
Also, I've added the beginning of a docstring explaining the point of the function, I'll leave you finish it as an exercice.
More tests and first bug
Now that we have a simple way to write automatic tests, we could add test cases corresponding to edge-cases: empty string, string with one element, etc.
We'd get something like:
def automatic_test():
# Same length
assert get_smallest_string_combination("ACEG", "BDFH") == "ABCDEFGH"
assert get_smallest_string_combination("ABCD", "ABCD") == "AABBCCDD"
# Empty input
assert get_smallest_string_combination("ABCD", "") == "ABCD"
assert get_smallest_string_combination("", "ABCD") == "ABCD"
# Different length
assert get_smallest_string_combination("Z", "ABCD") == "ABCDZ"
assert get_smallest_string_combination("ABCD", "Z") == "ABCDZ"
Which shows that empty inputs are not handled properly.
This can easily be fixed:
def get_smallest_string_combination(a, b):
"""Return the lexicographically smallest string ..."""
la=[]
lb=[]
for i in range(max(len(a),len(b))):#creating lists with all possible elements by slicing off the characters
la.append(a[i:])
lb.append(b[i:])
if len(a)>len(b):#removes empty elements
lb=[x for x in lb if x!='']
else:
la=[x for x in la if x!='']
output = []
while la and lb:#Create empty list for sorting the 0th elements of 'la' nd 'lb'
temp=[]
temp.append(la[0])
temp.append(lb[0])
temp=sorted(temp)
output.append(temp[0][0])#add the 1st character
if(temp[0] in la):#removing the element after printing the first character
la.pop(0)
else:
lb.pop(0)
# Add remaining elements
if la:
output.append(la[0])
if lb:
output.append(lb[0])
return "".join(output)
Logic and complexity
The implementation is very complicated and inefficient but it would be much more simple.
At each step, you just need to compare the first character of each remaining string and add the smallest one (or both in case of equality).
As discussed on the comments, this is very wrong. To be updated in the future...
You'd get something like:
def get_smallest_string_combination(a, b):
"""Return the lexicographically smallest string ..."""
la = list(a)
lb = list(b)
output = []
while la and lb:
first_a = la[0]
first_b = lb[0]
if first_a < first_b:
output.append(first_a)
la.pop(0)
elif first_a > first_b:
output.append(first_b)
lb.pop(0)
else: # Equal
output.append(first_a)
output.append(first_b)
la.pop(0)
lb.pop(0)
# Add remaining elements
output.extend(la)
output.extend(lb)
return "".join(output)
-
\$\begingroup\$ Is there any need to convert the strings to lists? Python's not my strong language, but I thought we could do all the (final) version with string slicing. Would that be better? Why / why not? \$\endgroup\$Toby Speight– Toby Speight2018年09月10日 17:04:02 +00:00Commented Sep 10, 2018 at 17:04
-
2\$\begingroup\$ Oh, and we need to compare more of
a
andb
iffirst_a
is equal tofirst_b
, as per the example in the question (that's unlike the merge step of a merge-sort, where we'd know the inputs were sorted). So I'd suggestif a < b: output.append(a[0]); a = a[1:]; else: output.append(b[0]); b = b[1:]
in thewhile
loop. \$\endgroup\$Toby Speight– Toby Speight2018年09月10日 17:05:40 +00:00Commented Sep 10, 2018 at 17:05 -
\$\begingroup\$ Oh I might have missed something. I'd be most interested if you have an example... \$\endgroup\$SylvainD– SylvainD2018年09月10日 17:21:55 +00:00Commented Sep 10, 2018 at 17:21
-
2\$\begingroup\$ Example: "BA" and "BA". Current code will produce "BBAA", but correct implementation should produce "BABA". \$\endgroup\$cariehl– cariehl2018年09月10日 17:24:58 +00:00Commented Sep 10, 2018 at 17:24
-
1\$\begingroup\$ @cariehl, that's not my reading - we always need to read from the one that will give us the lexicographically-smallest result. Order of inputs is irrelevant, so both of those should result in BABC. \$\endgroup\$Toby Speight– Toby Speight2018年09月10日 17:31:28 +00:00Commented Sep 10, 2018 at 17:31
This is a follow-on to Josay's answer, and uses the same test suite.
We don't need to convert the two strings to lists; instead, we just need to pick the front character from the earlier of the two at each iteration (making sure that a proper substring of the other is correctly handled). Then the function looks like this:
def get_smallest_string_combination(a, b):
"""Return the lexicographically smallest string ..."""
output = ""
while a and b:
if a + b < b + a:
output += a[0]
a = a[1:]
else:
output += b[0]
b = b[1:]
return output + a + b
We can use the doctest
module to incorporate our test cases into the documentation:
def get_smallest_string_combination(a, b):
"""
Return the lexicographically smallest string formed from the input strings, keeping each string
in order in the result.
Same length
>>> get_smallest_string_combination("ACEG", "BDFH")
'ABCDEFGH'
>>> get_smallest_string_combination("ABCD", "ABCD")
'AABBCCDD'
Empty input
>>> get_smallest_string_combination("ABCD", "")
'ABCD'
>>> get_smallest_string_combination("", "ABCD")
'ABCD'
Different length
>>> get_smallest_string_combination("Z", "ABCD")
'ABCDZ'
>>> get_smallest_string_combination("ABCD", "Z")
'ABCDZ'
Descending strings
>>> get_smallest_string_combination("BA", "BA")
'BABA'
Look-ahead
>>> get_smallest_string_combination("ZAX", "ZAY")
'ZAXZAY'
>>> get_smallest_string_combination("ZAY", "ZAX")
'ZAXZAY'
Proper prefix
>>> get_smallest_string_combination("BABC", "BA")
'BABABC'
>>> get_smallest_string_combination("CABC", "CA")
'CABCAC'
>>> get_smallest_string_combination("BAAC", "BA")
'BAABAC'
"""
output = ''
while a and b:
if a + b < b + a:
output += a[0]
a = a[1:]
else:
output += b[0]
b = b[1:]
return output + a + b
if __name__ == '__main__':
import doctest
doctest.testmod()
Note that performance may be a concern, especially for longer inputs, as we create 4 new strings on every iteration (2 +
operations before comparison, one string slice, and the append to output). It may be worth implementing custom classes to provide views onto substrings and concatenations if the impacts are significant enough.
-
\$\begingroup\$ While this is better I think it can be improved some more... 1. instead of
+=
string operation which allocates new memory and is inefficient, you couldyield
and"".join
the result. 2. Instead of assigning a new listO(n)
with slicinga = a[1:]
you could iterate over the index instead \$\endgroup\$Ludisposed– Ludisposed2018年09月11日 09:16:15 +00:00Commented Sep 11, 2018 at 9:16 -
\$\begingroup\$ That's string slicing, not list slicing; does that still involve allocation? I wasn't sure how to use indexing and still compare the remainder of the string (other than tediously implementing char-by-char compare, and I'm too lazy for that!) \$\endgroup\$Toby Speight– Toby Speight2018年09月11日 09:54:50 +00:00Commented Sep 11, 2018 at 9:54
-
\$\begingroup\$ any slices are done slice-by-copy, meaning every time you slice it copies all of the data into a new string object. \$\endgroup\$Ludisposed– Ludisposed2018年09月11日 09:58:42 +00:00Commented Sep 11, 2018 at 9:58
-
\$\begingroup\$ "I wasn't sure how to use indexing and still compare the remainder of the string" I don't quite get what you mean here, I may have missed something... Does yours pass on hacker rank? \$\endgroup\$Ludisposed– Ludisposed2018年09月11日 10:02:19 +00:00Commented Sep 11, 2018 at 10:02
-
1\$\begingroup\$ I've no idea about hacker rank (I'm not a member), but I've added a couple of extra tests to show the lookahead that's needed. \$\endgroup\$Toby Speight– Toby Speight2018年09月11日 10:03:45 +00:00Commented Sep 11, 2018 at 10:03
Explore related questions
See similar questions with these tags.