This is a Hackerrank problem (https://www.hackerrank.com/challenges/sparse-arrays/problem).
We're given an array of strings and an array of queries, and have to return an array of the number of occurrences of each query in the array of strings. For example: s = ['ab', 'abc', 'aba', 'ab'] and q = ['ab', 'atr', 'abc']. We need to return [2, 0, 1].
The constraints are 1 <= n <= 1000 and 1 <= q <= 1000, where n and q are the number of strings and queries respectively, and 1 <= |strings[i]|, |queries[i]| <= 20.
Here is my code:
def matchingStrings(strings, queries):
    results = []
    dic = {q:0 for q in queries}
    for s in strings:
        if s in dic: dic[s] += 1
    for q in queries:
        if q in dic: results.append(dic[q])
    return results
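A quick check against the example above (the call below is just an illustration; HackerRank supplies the actual input):

print(matchingStrings(['ab', 'abc', 'aba', 'ab'], ['ab', 'atr', 'abc']))  # [2, 0, 1]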
Is there anything I can optimize? Speed, memory usage? Anything else?
2 Answers
- if q in dic is pointless. You initialized dic so that it does have all queries.
- dic = dict.fromkeys(queries, 0) should be a bit faster.
- dic is not a particularly meaningful name, counter or ctr would be better.
- Creating results at the start of the function gives the false impression that it's needed there already. I'd create it right before the loop that builds it. Then you'd have the complete counting part first, and the complete querying part afterwards. Or better yet, ...
- ... use a list comprehension, which is faster and nicer (a combined sketch of these points follows the Counter version below).
- Python already has a Counter class that does a lot for you and is probably faster.
from collections import Counter

def matchingStrings(strings, queries):
    ctr = Counter(strings)
    return [ctr[q] for q in queries]
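For comparison, a rough sketch of the earlier points (dict.fromkeys plus a list comprehension) applied to the original code, without reaching for Counter:

def matchingStrings(strings, queries):
    # Pre-seed every query with 0, so the final lookup never misses.
    counts = dict.fromkeys(queries, 0)
    for s in strings:
        if s in counts:
            counts[s] += 1
    # No membership check needed here: every query is already a key.
    return [counts[q] for q in queries]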
For fun: 1000*1000 isn't super big, and HackerRank's calling code really only needs an iterable, so... this also gets accepted:
def matchingStrings(strings, queries):
    return map(strings.count, queries)
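If an actual list were needed instead of the lazy iterator that map returns, the same idea fits on one line (just an illustration of the same approach):

def matchingStrings(strings, queries):
    # list.count scans all of strings for each query: O(n*q), fine for n, q <= 1000.
    return [strings.count(q) for q in queries]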
- I'll buy that a list comp is nicer, but faster? That's more difficult to claim. I'd enjoy looking at a profile. (Reinderien, Oct 19, 2020 at 14:05)
- @Reinderien It's common knowledge and I've done that comparison too many times. Since you apparently never have, I encourage you to do it now :-) (superb rain, Oct 19, 2020 at 14:08)
- @Reinderien Btw, after seeing some people claim that list comprehension is always faster, a while ago I even tried to find a case where it isn't. I thought maybe I could relatively slow down the list comp by using outside variables so it would be a more expensive lookup or so. I failed. Whatever I tried, list comp was always significantly faster. (superb rain, Oct 19, 2020 at 15:51)
- Without diving into a profiling session myself, there is ample evidence that some comprehensions are slower; take stackoverflow.com/a/27906182/313768 for instance. (Reinderien, Oct 22, 2020 at 15:11)
- @Reinderien Sure, if you artificially slow them down like that with something you're not doing in the loop version. But can you show a proper comparison where they're slower? (superb rain, Oct 22, 2020 at 15:56)
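For anyone who wants to run the comparison the comments debate, a minimal timeit sketch (the data sizes are made up to roughly match the problem's limits, and results will vary by machine and Python version):

import timeit
from collections import Counter

# Synthetic data near the maximum problem size (n, q <= 1000).
strings = ['ab', 'abc', 'aba', 'ab'] * 250
queries = ['ab', 'atr', 'abc'] * 333
ctr = Counter(strings)

def with_loop():
    results = []
    for q in queries:
        results.append(ctr[q])
    return results

def with_comprehension():
    return [ctr[q] for q in queries]

print(timeit.timeit(with_loop, number=1000))
print(timeit.timeit(with_comprehension, number=1000))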
Any time you are using a dict and you need to do something special the first time a key is used, take a look at collections.defaultdict().
from collections import defaultdict

def matchingStrings(strings, queries):
    counts = defaultdict(int)
    for s in strings:
        counts[s] += 1
    results = [counts[q] for q in queries]
    return results
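The key property being used: indexing a defaultdict(int) with a missing key creates it with value 0 instead of raising KeyError, so the counting loop needs no membership check. A tiny illustration:

from collections import defaultdict

counts = defaultdict(int)
counts['ab'] += 1         # 'ab' is created as 0, then incremented to 1
print(counts['ab'])       # 1
print(counts['missing'])  # 0, and 'missing' is now stored as a key

Note that counts[q] in the list comprehension also inserts any unseen query with value 0; that's harmless here, but counts.get(q, 0) would avoid growing the dict.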
Any time you need to count things, take a look at collections.Counter().
from collections import Counter

def matchingStrings(strings, queries):
    counts = Counter(strings)
    return [counts[q] for q in queries]
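Unlike a plain dict, a Counter returns 0 for keys it has never seen (without storing them), which is exactly why the lookup works even for queries that match no string. A small illustration:

from collections import Counter

counts = Counter(['ab', 'abc', 'aba', 'ab'])
print(counts['ab'])   # 2
print(counts['atr'])  # 0; unlike defaultdict, the missing key is not stored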