I've implemented a basic Trie that is supposed to work only with lowercase English letters.
class TrieNode {
public:
array<shared_ptr<TrieNode>, 26> children;
bool isWord;
TrieNode(bool _isWord = false) : isWord{_isWord} {
}
};
class Trie {
public:
Trie() : root{make_shared<TrieNode>(TrieNode())} {
}
void insert(const string& word) {
auto p = root;
for (const auto& ch : word) {
const auto idx = ch - 'a';
if (p->children[idx] == nullptr) {
p->children[idx] = make_shared<TrieNode>(TrieNode());
}
p = p->children[idx];
}
p->isWord = true;
}
bool search(const string& word, bool prefix = false) {
auto p = root;
for (const auto& ch : word) {
const auto idx = ch - 'a';
if (p->children[idx] == nullptr) {
return false;
}
p = p->children[idx];
}
if (prefix) {
return true;
}
return p->isWord;
}
bool startsWith(const string& prefix) {
return search(prefix, true);
}
private:
shared_ptr<TrieNode> root;
};
Is the use of shared_ptr
appropriate here? Could I get more efficiency from this code?
1 Answer 1
Use std::unique_ptr
instead of std::shared_ptr
You are never sharing your std::shared_ptr
s, so it is much better to use std::unique_ptr
instead. std::unique_ptr
uses less memory and is more efficient.
Call std::make_unique()
without arguments
When you call
p->children[idx] = make_shared<TrieNode>(TrieNode());
The expression TrieNode()
is creating a temporary TrieNode
object, and then that is copied by std::make_shared()
. This is unnecessary though, just call make_shared<TrieNode>()
. Or with std::unqiue_ptr
:
p->children[idx] = make_unique<TrieNode>();
root
does not need to be a pointer
You can just make root
be a TrieNode
directly. Then you can also remove the default constructor.
Alternative ways to store children
You need some way to allocate memory for children. Your approach of doing something like std::array<std::unique_ptr<TrieNode>, 26>
is one possible way to do that. However, there are others. The problem with your approach is that every node, even if it is a leaf node, stores 26 pointers. That's 208 bytes on a 64-bit system. It would be nice if you could avoid that, at least for the leaf nodes. So consider this instead:
std::unique_ptr<std::array<TrieNode, 26>> children;
Now you only have one pointer (8 bytes on a 64-bit system) in a TrieNode
, which reduces the memory usage of leaf nodes quite a bit.
There are many other ways to store children; you could use a std::vector<TrieNode>
, a std::unordered_map<char, TrieNode>
, or do something completely different. There are tradeoffs between memory usage, performance and easy of programming. Which approach is better also depends on how deep and dense your tries are.
How compact you can store your trie into memory is going to have a big influence on how efficient it will be; the smaller it is the less memory bandwidth will be necessary and the more cache memory will be effective.