This is a Hackerrank problem (https://www.hackerrank.com/challenges/sparse-arrays/problem).
We're given an array of strings and an array of queries, and have to return an array of the number of occurrences of each query in the array of strings. For example: s = ['ab', 'abc', 'aba', 'ab'] and q = ['ab', 'atr', 'abc']. We need to return [2, 0, 1].
The constraints are 1 <= n <= 1000 and 1 <= q <= 1000, where n and q are the number of strings and queries respectively, and 1 <= |strings[i]|, |queries[i]| <= 20.
Here is my code:
def matchingStrings(strings, queries):
    results = []
    dic = {q:0 for q in queries}
    for s in strings:
        if s in dic: dic[s] += 1
    for q in queries:
        if q in dic: results.append(dic[q])
    return results
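A quick check against the example above (the call below is just an illustration; HackerRank supplies the actual input):

print(matchingStrings(['ab', 'abc', 'aba', 'ab'], ['ab', 'atr', 'abc']))  # [2, 0, 1]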
Is there anything I can optimize? Speed, memory usage? Anything else?
2 Answers
- if q in dic is pointless. You initialized dic so that it does have all queries.
- dic = dict.fromkeys(queries, 0) should be a bit faster.
- dic is not a particularly meaningful name, counter or ctr would be better.
- Creating results at the start of the function gives the false impression that it's needed there already. I'd create it right before the loop that builds it. Then you'd have the complete counting part first, and the complete querying part afterwards. Or better yet, ...
- ... use a list comprehension, which is faster and nicer (a combined sketch of these points follows the Counter version below).
- Python already has a Counter class that does a lot for you and is probably faster.
from collections import Counter

def matchingStrings(strings, queries):
    ctr = Counter(strings)
    return [ctr[q] for q in queries]
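For comparison, a rough sketch of the earlier points (dict.fromkeys plus a list comprehension) applied to the original code, without reaching for Counter:

def matchingStrings(strings, queries):
    # Pre-seed every query with 0, so the final lookup never misses.
    counts = dict.fromkeys(queries, 0)
    for s in strings:
        if s in counts:
            counts[s] += 1
    # No membership check needed here: every query is already a key.
    return [counts[q] for q in queries]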
For fun: 1000*1000 isn't super big, and HackerRank's calling code really only needs an iterable, so... this also gets accepted:
def matchingStrings(strings, queries):
    return map(strings.count, queries)
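If an actual list were needed instead of the lazy iterator that map returns, the same idea fits on one line (just an illustration of the same approach):

def matchingStrings(strings, queries):
    # list.count scans all of strings for each query: O(n*q), fine for n, q <= 1000.
    return [strings.count(q) for q in queries]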
- I'll buy that a list comp is nicer, but faster? That's more difficult to claim. I'd enjoy looking at a profile. (Reinderien, Oct 19, 2020 at 14:05)
- @Reinderien It's common knowledge and I've done that comparison too many times. Since you apparently never have, I encourage you to do it now :-) (superb rain, Oct 19, 2020 at 14:08)
- @Reinderien Btw, after seeing some people claim that list comprehension is always faster, a while ago I even tried to find a case where it isn't. I thought maybe I could relatively slow down the list comp by using outside variables so it would be a more expensive lookup or so. I failed. Whatever I tried, list comp was always significantly faster. (superb rain, Oct 19, 2020 at 15:51)
- Without diving into a profiling session myself, there is ample evidence that some comprehensions are slower; take stackoverflow.com/a/27906182/313768 for instance. (Reinderien, Oct 22, 2020 at 15:11)
- @Reinderien Sure, if you artificially slow them down like that with something you're not doing in the loop version. But can you show a proper comparison where they're slower? (superb rain, Oct 22, 2020 at 15:56)
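For anyone who wants to run the comparison the comments debate, a minimal timeit sketch (the data sizes are made up to roughly match the problem's limits, and results will vary by machine and Python version):

import timeit
from collections import Counter

# Synthetic data near the maximum problem size (n, q <= 1000).
strings = ['ab', 'abc', 'aba', 'ab'] * 250
queries = ['ab', 'atr', 'abc'] * 333
ctr = Counter(strings)

def with_loop():
    results = []
    for q in queries:
        results.append(ctr[q])
    return results

def with_comprehension():
    return [ctr[q] for q in queries]

print(timeit.timeit(with_loop, number=1000))
print(timeit.timeit(with_comprehension, number=1000))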
Any time you are using a dict and you need to do something special the first time a key is used, take a look at collections.defaultdict().
from collections import defaultdict

def matchingStrings(strings, queries):
    counts = defaultdict(int)
    for s in strings:
        counts[s] += 1
    results = [counts[q] for q in queries]
    return results
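The key property being used: indexing a defaultdict(int) with a missing key creates it with value 0 instead of raising KeyError, so the counting loop needs no membership check. A tiny illustration:

from collections import defaultdict

counts = defaultdict(int)
counts['ab'] += 1         # 'ab' is created as 0, then incremented to 1
print(counts['ab'])       # 1
print(counts['missing'])  # 0, and 'missing' is now stored as a key

Note that counts[q] in the list comprehension also inserts any unseen query with value 0; that's harmless here, but counts.get(q, 0) would avoid growing the dict.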
Any time you need to count things, take a look at collections.Counter().
from collections import Counter

def matchingStrings(strings, queries):
    counts = Counter(strings)
    return [counts[q] for q in queries]
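Unlike a plain dict, a Counter returns 0 for keys it has never seen (without storing them), which is exactly why the lookup works even for queries that match no string. A small illustration:

from collections import Counter

counts = Counter(['ab', 'abc', 'aba', 'ab'])
print(counts['ab'])   # 2
print(counts['atr'])  # 0; unlike defaultdict, the missing key is not stored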