Below is my function to solve the Minion Game problem on HackerRank along with the problem description.
It works wonderfully with regular-length strings, like 'banana', but when it comes to really long ones (ones that produce output counter in the vicinity of 7501500, result in a runtime error (or memory error, depending on where you run it).
There must be a way to optimize the code. Can anybody advise?
"""Kevin and Stuart want to play the 'The Minion Game'.
Game Rules
Both players are given the same string, S.
Both players have to make substrings using the letters of the string S.
Stuart has to make words starting with consonants.
Kevin has to make words starting with vowels.
The game ends when both players have made all possible substrings.
Scoring
A player gets +1 point for each occurrence of the substring in the string S.
For Example:
String
S = BANANA
Kevin's vowel beginning word = ANA
Here, ANA occurs twice in BANANA. Hence, Kevin will get 2 Points.
"""
def minions(s):
vowels = ['A','E','I','O','U']
s = s.upper()
n = len(s)
kevin = dict()
stuart = dict()
# list all string variations using list comprehension
lst = [s[i:j+1] for i in range(n) for j in range(i,n)]
# populate dictionaries
for i in lst:
if i[0] in vowels:
if i in kevin:
kevin[i] += 1
else:
kevin[i] = 1
else:
if i in stuart:
stuart[i] += 1
else:
stuart[i] = 1
kevin_sm = 0
for x,y in kevin.items():
kevin_sm += y
stuart_sm = 0
for x,y in stuart.items():
stuart_sm += y
if kevin_sm > stuart_sm:
print("Kevin {}".format(kevin_sm))
elif kevin_sm == stuart_sm:
print("Draw")
else:
print("Stuart {}".format(stuart_sm))
Example of a problematic string:
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
-
2\$\begingroup\$ For which exact example strings does the code run out of support? Are you sure your code is ready for review if the code doesn't complete all testcases? Please take a look at the help center. \$\endgroup\$Mast– Mast ♦2020年05月08日 08:11:04 +00:00Commented May 8, 2020 at 8:11
-
2\$\begingroup\$ added an example string to the problem body. The code doesn't complete all testcases precisely because of the problem I mentioned. \$\endgroup\$Denis Shvetsov– Denis Shvetsov2020年05月08日 08:44:15 +00:00Commented May 8, 2020 at 8:44
-
\$\begingroup\$ Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review. Feel free to come back once the code passes the testcases. \$\endgroup\$Mast– Mast ♦2020年05月08日 09:31:58 +00:00Commented May 8, 2020 at 9:31
-
2\$\begingroup\$ The question very much is about performance and scalability. Their code works but is not performant enough and does not scale well. They could have just not mentioned that a test case (one that is about performance/scalability, not core functionality) is failing and the question would have been on topic, just by virtue of them asking "Can this be made more performant?". The code even works for the provided test case that OP said failed (on the HackerRank website); it just happens to take 2 minutes (recent i7) and 8GBs of RAM... \$\endgroup\$Alex Povel– Alex Povel2020年05月08日 09:45:50 +00:00Commented May 8, 2020 at 9:45
-
\$\begingroup\$ Thanks, Alex. Noted for the future. \$\endgroup\$Denis Shvetsov– Denis Shvetsov2020年05月08日 11:20:31 +00:00Commented May 8, 2020 at 11:20
1 Answer 1
By storing all possible substrings in a dictionary, you require memory for all the duplicates your are creating. This is probably the cause for the memory errors.
You also iterate over two dictionaries and the (long) list, which will be slower than what is described below.
Creating the lst
in the first place requires two nested for
loops; they don't have quadratic complexity, but are expensive still.
The problem can be tackled very differently if seen like in the following.
Take the example of BANANA
.
Its length is word_length = 6
.
You look at the first letter, B
; it's a consonant.
As such, Stuart would score for this one.
Of course, the substring BANANA
occurs only once in BANANA
.
However, you can count all truncations of it as their own substrings.
One by one, you chop off the last letter, leading you to:
BANANA
BANAN
BANA
BAN
BA
B
All of these are substrings found in BANANA
.
Remember, we are still looking at the first letter, B
, which is indexed as 0
.
We found word_length - i = 6 - 0 = 6
valid substrings, all of which count towards Stuart's score.
Let's see if the above pattern holds going forward.
The next letter is A
, i
becomes 1
.
word_length
is a constant.
ANANA
ANAN
ANA
AN
A
This makes for 6 - 1 = 5
valid substrings, counted towards Kevin's score this time.
The third step looks like:
NANA
NAN
NA
N
with a score of word_length - i = 6 - 2 = 4
for Stuart.
Note that substrings which occur more than once are counted correctly.
For example, the substring NA
will be counted twice in the overall score.
Until now, it has been accounted for once, in the third step.
The next time it is counted will be in the fifth step, where the substrings looks like:
NA
N
As such, N
will also be counted correctly (twice).
Let the input string length be \$ n \$.
In this case, the below approach only iterates over it once, leading to \$ \mathcal{O}(n) \$, linear, complexity.
The rest is just incrementing integers, which is very cheap.
word_length
has to be computed once, which is \$ \mathcal{O}(1) \$ (constant/cheap).
def minion_game(word):
# word = input()
vowels = "AEIOU"
vowel_score = 0
consonant_score = 0
word_length = len(word)
for idx, letter in enumerate(word):
delta = word_length - idx
if letter in vowels:
vowel_score += delta
else:
consonant_score += delta
if vowel_score > consonant_score:
print("Kevin", vowel_score)
elif vowel_score < consonant_score:
print("Stuart", consonant_score)
else:
print("Draw")
Other points on your code:
for x,y in kevin.items():
kevin_sm += y
can just be
for y in kevin.values():
kevin_sm += y
vowels = ['A','E','I','O','U']
can, in this context, just be
vowels = "AEIOU"
It will behave the same for the in
lookup.
You will want a list if you required a mutable object.
for i in lst:
is very misleading variable naming.
lst
is a list of string slices, and not indices (i
).
It should read something like
for substring in substrings:
(aka, also rename the too-generic lst
variable).
-
1\$\begingroup\$
vowels = "AEIOU"
is really neat. you could omit the delta calculation altogether if you were to e.g.enumerate( reversed( word ), 1 )
orzip( word, range( word_length, 0, -1 ) )
switching indices. \$\endgroup\$two_pc_turkey_breast_mince– two_pc_turkey_breast_mince2020年05月08日 10:40:41 +00:00Commented May 8, 2020 at 10:40 -
1\$\begingroup\$ Good point. Usually I'm not a fan of densifying code too much, but this is useful and readable. \$\endgroup\$Alex Povel– Alex Povel2020年05月08日 10:48:47 +00:00Commented May 8, 2020 at 10:48
-
1\$\begingroup\$ @ALexPovel, thank you very much for your elaborate answer. This really helps a lot. \$\endgroup\$Denis Shvetsov– Denis Shvetsov2020年05月08日 11:07:12 +00:00Commented May 8, 2020 at 11:07