2
\$\begingroup\$

I'm solving yet another problem in HackerRank (https://www.hackerrank.com/challenges/determining-dna-health/problem). In short: you are given 2 arrays (genes and health), one of which have a 'gene' name, and the other - 'gene' weight (aka health). You then given a bunch of strings, each containing values m and n, which denote the start and end of the slice to be applied to the genes and health arrays, and the 'gene'-string, for which we need to determine healthiness. Then we need to return health-values for the most and the least healthy strings.

My solution is below, and it works, but it's not scalable, i.e. it fails testcases with a lot of values.

import re
if __name__ == '__main__':
 n = int(input())
 genes = input().rstrip().split()
 health = list(map(int, input().rstrip().split()))
 s = int(input())
 weights = []
 for s_itr in range(s):
 m,n,gn = input().split()
 weight = 0
 for i in range(int(m),int(n)+1):
 if genes[i] in gn:
 compilt = "r'(?=("+genes[i]+"))'"
 matches = len(re.findall(eval(compilt), gn))
 weight += health[i]*matches
 weights.append(weight)
 print(min(weights), max(weights))

Can you advise on how to apply generators here? I suspect that the solution fails because of the very big list that's being assembled. Is there a way to get min and max values here without collecting them all?

Example values:

genes = ['a', 'b', 'c', 'aa', 'd', 'b']
health = [1, 2, 3, 4, 5, 6]
gene1 = "1 5 caaab" (result = 19 = max)
gene2 = "0 4 xyz" (result = 0 = min)
gene3 = "2 4 bcdybc" (result = 11)

This case returns 0 19

UPD: Follow-up question on optimized (but still not working) solution here.

asked Jun 1, 2020 at 11:31
\$\endgroup\$
1
  • 6
    \$\begingroup\$ I have rolled back your most recent edit. Code Review does not permit the question to be substantially changed after answers have been posted (accepted or not) as it becomes ambiguous as to which version of the question the answer post is answering. If you have an improved version of the code, and want feedback on it, you must post a new question. See "What should I do when someone answers my question" in the help center. Note the recommendation about adding a link from this question to the new question, as well as a back link from the new question to this one. \$\endgroup\$ Commented Jun 1, 2020 at 23:08

1 Answer 1

4
\$\begingroup\$

eval(...)

What is the purpose of:

compilt = "r'(?=(" + genes[i] + "))'"
... eval(compilt), ...

It takes a string like "aa", and forms a new string "r'(?=(aa))'", which is source code for the raw string r'(?=(aa))', which when evaluated yields the string "(?=(aa))".

There is no escaping being done, no obvious reason to do the raw string formation and subsequent evaluation, and no prevention of a syntax error due to a stray ' character in the genes[i] array. So ... why not simply:

regex = "(?=(" + gene[i] + "))"

and no call to eval(...) at all?

Regex Capturing

The regex subexpression (...) is a "capturing group", which copies the matching characters into an internal buffer, for returning in the match group.

>>> re.findall('(?=(aa))', "caaab")
['aa', 'aa']

Without the capturing group, the matching characters do not have to be copied to the internal buffer, to be returned.

>>> re.findall('(?=aa)', "caaab")
['', '']

Given that you are only interested in the len(...) of the list returned from re.findall(), the capturing group seems like unnecessary overhead, which can be eliminated for faster execution.

Compiled Regex

As Python uses regular expressions, it maintains a cache of the most recently used regular expressions. This cache has a limited size, to prevent an excessive memory load.

In this exercise, you are repeatedly using the same gene regular expressions for each "healthy string" test. If the number of genes exceeds the cache size, Python will be tossing out compiled regular expressions, only to compile them again moments later.

You can preempt this by compiling and storing all the gene regular expressions ahead of time. Leveraging Python 3.6's f-strings, and list comprehension:

 genes = input().rstrip().split()
 genes_rx = [re.compile(f"(?={gene})") for gene in genes]

Used as:

 matches = len(re.findall(genes_rx[i], gn))

Now the gene to regular expression string, to compiled regular expression is done once per gene, instead of once per "healthy string" test.

Computing min/max weight without creating a list

How about:

 min_weight = math.inf
 max_weight = -math.inf
 for ...:
 weight = ...
 if weight < min_weight:
 min_weight = weight
 if weight > max_weight:
 max_weight = weight
 print(min_weight, max_weight)
answered Jun 1, 2020 at 20:00
\$\endgroup\$
7
  • \$\begingroup\$ AJNeufeld, I would need to take some time digesting your answer: some of the concepts you mentioned are not yet familiar to me, but right now I would like to thank you: your effort is really appreciated! \$\endgroup\$ Commented Jun 1, 2020 at 20:13
  • \$\begingroup\$ As for the 'what's the purpose' questions - I'm fairly new to the language, I learn stuff from various sources, and that is the best I could come up with. You are definitely right about it not being clean or optimal \$\endgroup\$ Commented Jun 1, 2020 at 20:14
  • \$\begingroup\$ I tried the things you mentioned, but the code still fails due to timeout. See full solution in the question body. \$\endgroup\$ Commented Jun 1, 2020 at 22:22
  • 1
    \$\begingroup\$ That too bad. It should be faster. Does it pass more test cases, or is it just one super large case that fails with a timeout? \$\endgroup\$ Commented Jun 1, 2020 at 23:12
  • \$\begingroup\$ @DenisShvetsov Can you test and profile your code locally to see the difference in execution time? \$\endgroup\$ Commented Jun 2, 2020 at 6:11

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.