I have been trying to solve this HackerRank problem:
We're going to make our own Contacts application! The application must perform two types of operations:
add name, where name is a string denoting a contact name. This must store name as a new contact in the application.
find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the count on a new line.
Given sequential add and find operations, perform each operation in order.
What I came up with is this:
class TrieNode():
    def __init__(self):
        self.children = {}
        self.word_count = 0

class Trie():
    def __init__(self):
        self.root = self.get_node()

    def get_node(self):
        return TrieNode()

    def add(self, key):
        crawl = self.root
        for char in key:
            if char not in crawl.children:
                crawl.children[char] = self.get_node()
            crawl = crawl.children[char]
            crawl.word_count += 1

    def get_prefix(self, key):
        crawl = self.root
        for char in key:
            if char not in crawl.children:
                return None
            crawl = crawl.children[char]
        return crawl

    def find(self, key):
        node = self.get_prefix(key)
        return node.word_count if node else 0
The code above is a modification of this from GeeksForGeeks. The modifications I have made are keeping a word_count for every trie node, which denotes how many words start with the prefix ending in that node's letter, and using a hashmap to keep the children rather than lists.
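To show the intended behaviour, here is a minimal usage sketch (the class definition is repeated from above so the snippet runs on its own; the sample names are made up):

```python
class TrieNode():
    def __init__(self):
        self.children = {}
        self.word_count = 0

class Trie():
    def __init__(self):
        self.root = self.get_node()

    def get_node(self):
        return TrieNode()

    def add(self, key):
        crawl = self.root
        for char in key:
            if char not in crawl.children:
                crawl.children[char] = self.get_node()
            crawl = crawl.children[char]
            # every word passing through this node shares this prefix
            crawl.word_count += 1

    def get_prefix(self, key):
        crawl = self.root
        for char in key:
            if char not in crawl.children:
                return None
            crawl = crawl.children[char]
        return crawl

    def find(self, key):
        node = self.get_prefix(key)
        return node.word_count if node else 0

trie = Trie()
trie.add("hack")
trie.add("hackerrank")
print(trie.find("hac"))  # 2: both names start with "hac"
print(trie.find("hak"))  # 0: no contact starts with "hak"
```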
On HackerRank, 9/11 test cases pass, and the other 2 time out. I can't figure out where I can optimize this further.
I know of the radix tree, which is an optimized version of a normal trie, but that saves on space rather than time, so I haven't explored that option.
Can anyone help me figure out where I can optimize for time in my code?
Thanks!
- \$\begingroup\$ You should always include the programming language tag on the question. \$\endgroup\$ – dfhwze, Aug 25, 2019 at 8:59
1 Answer
I think the important thing to notice here is that you don't actually need a Trie. The partial queries are not just substrings of the names, they are prefixes. It should therefore be faster to just build a dictionary which directly counts how often each prefix occurs.
Something as simple as this is sufficient (just tested it, it passes all testcases):
from collections import defaultdict

def contacts(queries):
    prefixes = defaultdict(int)
    for op, name in queries:
        if op == "add":
            # maybe not the best/nicest way to iterate over slices,
            # but probably the easiest
            for i in range(1, len(name) + 1):
                prefixes[name[:i]] += 1
        else:
            # gives 0 if name is not in prefixes, no need for `get`
            yield prefixes[name]
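To make the behaviour concrete, here is a small self-contained driver (the function is repeated from above so the snippet runs on its own; the query list is made up):

```python
from collections import defaultdict

def contacts(queries):
    prefixes = defaultdict(int)
    for op, name in queries:
        if op == "add":
            # count every prefix of the added name
            for i in range(1, len(name) + 1):
                prefixes[name[:i]] += 1
        else:
            # defaultdict gives 0 for unseen prefixes
            yield prefixes[name]

# "hac" is a prefix of both names, "hak" of neither
queries = [("add", "hack"), ("add", "hackerrank"),
           ("find", "hac"), ("find", "hak")]
print(list(contacts(queries)))  # [2, 0]
```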
This is \$\mathcal{O}(len(name))\$ for each add and \$\mathcal{O}(1)\$ for each find. And since the length of all names is less than 22, the for loop will be very fast. It does not even seem to matter that slicing the string might create additional copies.
In contrast, your code is also \$\mathcal{O}(len(name))\$ for each add, but it is also \$\mathcal{O}(len(name))\$ for each find, as far as I can tell. You need to actually traverse the Trie to find the count. Apparently that, maybe together with the overhead of looking up class methods, is enough to reach the time limit for one of the larger testcases.
Stylistically your code looks very good. The only thing to improve there is adding some docstrings to your methods which explain how to use them.
If you were worried about Python 2 backward compatibility, you should inherit from object, but nowadays that is probably less of a concern (especially with one-off code for a coding challenge).
- \$\begingroup\$ Why defaultdict(int) instead of Counter? \$\endgroup\$ – Peter Taylor, Aug 28, 2019 at 15:54
-
\$\begingroup\$ Thank you! I understand now. I have one follow-up question though. Aren't tries' biggest use case finding prefixes? I'm asking because you said not just substrings of the names, they are prefixes, which makes me believe a hashmap is a better implementation for prefix finding rather than a trie. Or is it only true in this case because of the max limit on the length of the input strings? \$\endgroup\$ – Sidharth Samant, Sep 1, 2019 at 5:19
\$\begingroup\$ @SidharthSamant I think you might be right. In that case it is probably because the input size is fixed. With your trie you iterate through the nodes to get to the final node, and this iteration happens in Python. With the dictionary you look up a value in a hashmap, which happens in C. And since only the counts are needed, there is no extra memory needed (if each value contained all words seen so far with that prefix, it would take a lot more space than a trie). \$\endgroup\$ – Graipher, Sep 1, 2019 at 5:38
-
\$\begingroup\$ @PeterTaylor Because I did not need any of the other features of Counter and therefore did not think of it. However, now that you mention it, counter.update(name[:i+1] for i in range(len(name))) could be slightly more elegant than my for loop. \$\endgroup\$ – Graipher, Sep 1, 2019 at 5:41
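The Counter variant mentioned in that last comment would look roughly like this (a sketch only; it has not been run against the HackerRank test cases):

```python
from collections import Counter

# Counter can absorb all prefixes of a name in one update() call,
# and, like defaultdict(int), returns 0 for unseen keys
prefixes = Counter()
name = "hack"
prefixes.update(name[:i + 1] for i in range(len(name)))
print(prefixes["h"], prefixes["hac"], prefixes["xyz"])  # 1 1 0
```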