I'm learning C++ and whilst trying to write in modern C++, I've made an attempt to rewrite a trie implementation written in C found here: http://www.geeksforgeeks.org/trie-insert-and-search/
It uses arrays to hold the branches in each node and because it's C it was done using malloc() and no freeing of memory was done in the example.
Is this an efficient approach of representing a trie in C++11? what other ways can I store children in trie nodes?
#include <iostream>
#include <map>
#include <algorithm>
#include <memory>
class Trie {
struct Node;
typedef std::unique_ptr<Node> spNode;
struct Node {
std::map<char, spNode> children;
bool isLeaf;
Node() : isLeaf{false} {}
};
spNode root;
public:
Trie();
void insert(const std::string& str);
bool search(const std::string& str);
};
Trie::Trie():root{nullptr}{}
void Trie::insert(const std::string& str) {
if (root == nullptr) {
std::unique_ptr<Node> node(new Node());
root = std::move(node);
}
Node *temp = root.get();
for (const char& c : str) {
if (temp->children.find(c) == temp->children.end()) {//if char not in map
std::unique_ptr<Node> node(new Node());
temp->children[c] = std::move(node);
}
temp = temp->children[c].get();
}
temp->isLeaf = true;
}
bool Trie::search(const std::string &str) {
if (root == nullptr) return false;
Node *temp = root.get();
for (const char& c : str) {
if (temp->children.find(c) == temp->children.end())
return false;
temp = temp->children[c].get();
}
return (temp->isLeaf);
}
int main (void) {
std::string words[] = { "Hello", "hi", "hey", "howdy", "ho"};
Trie test;
for (const auto& str : words) {
test.insert(str);
}
if (test.search("Hello"))
std::cout << " 'Hello' is found in the trie\n";
else
std::cout <<" 'Hello' is not found in the trie\n";
if (test.search("yo"))
std::cout << " 'yo' is found in the trie\n";
else
std::cout << " 'yo' is not found in the trie\n";
}
2 Answers 2
I don't see the need for unique_ptr
for the root node.
spNode root;
I would just make this a node.
Node root;
The use of std::map
is fine. But it does have O(log(n))
lookup. If you switch back to an array its O(1)
. Its a time for space thing. Pick the one you want.
I don't like the two line creation of nodes.
std::unique_ptr<Node> node(new Node());
temp->children[c] = std::move(node);
Just use reset:
temp->children[c].reset(new Node());
Or if you have C++14 use std::make_unique()
temp->children[c] = std::make_unique<Node>();
Personally search()
does not seem quite the correct verb.
bool search(const std::string& str);
What is wrong with find()
?
-
2\$\begingroup\$
std::make_unique()
is available in C++14 as well, unless I am missing something obvious (I often do)! \$\endgroup\$user2296177– user22961772017年05月13日 05:16:33 +00:00Commented May 13, 2017 at 5:16 -
\$\begingroup\$ @user2296177, make unique is introduced in C++14, if you meant that C++11 has it as well. \$\endgroup\$Incomputable– Incomputable2017年05月13日 06:52:01 +00:00Commented May 13, 2017 at 6:52
@LokiAstari hit pretty much all the points I'd have made, but I'd like to expand on one point and then make one philosophical point:
The use of
std::map
is fine. But it does haveO(log(n))
lookup.
std::map
is the C++ equivalent of Java's TreeMap. It's big (it stores about three pointers per node) and slow (lookups require following log(n)
of those pointers on average, and inserts can require rebalancing of the red-black tree, thus touching even more pointers) and therefore cache-unfriendly (a single tree lookup can dirty up to log(n)
cache lines, none of which will be helpful for the next tree lookup) and therefore even slower.
And you're not just using one std::map
— you're using one std::map
per node in your trie, i.e., you're making a big pointer-based tree of big pointer-based trees. This is just absolutely awful for performance.
So yes, an array spNode *children[256]
would probably be preferable. Or it might be worth the effort to abstract out a data type like
template<typename Key, typename Value>
struct FastMapForTries {
static_assert(std::is_same_v<Key, char>, "Key type must be char");
Value favorite_children['z' - 'A' + 1];
std::map<Key, Value> other_children;
Value& operator[](const Key& k) const { ... }
};
just to reduce the size of a Trie::Node
from 257*8
back down to 59*8 + sizeof(std::map)
.
If you're sticking with the std::map
, then at least you should avoid looking things up in the map repeatedly.
if (temp->children.find(c) == temp->children.end()) {//if char not in map
std::unique_ptr<Node> node(new Node());
temp->children[c] = std::move(node);
}
temp = temp->children[c].get();
I count three tree lookups in this short snippet. Try this instead:
auto it = temp->children.find(c);
if (it == temp->children.end()) { // if char not in map
it = temp->children.emplace_hint(it, c, std::make_unique<Node>());
}
temp = it;
Now there's only two lookups. — You might say "only one lookup", but look again and notice (as I missed the first time I wrote this code) that emplace_hint
is only ever called with end()
as the hint! So using emplace_hint
here is exactly as inefficient as using unhinted emplace
. A really good static analysis tool might have caught this bug of mine.
So let's do this instead:
auto pair = temp->children.insert(c, nullptr);
temp = pair.first;
if (pair.second) { // if we just inserted nullptr
*temp = std::make_unique<Node>();
}
Now there's only one tree lookup! If you have benchmarks for your trie code, you should see a ~3x speedup from just this one little refactor.
@Loki mentioned your awkwardly named trie.search(word)
method; he suggested trie.find(word)
. But I'd say that find
isn't the right word either. find
in the STL denotes "look for and return an iterator to", and (right now) your trie doesn't have iterators. So right now, the most STL-ish name for your method is trie.count(word)
, and it should return 0
or 1
.
Of course if you're not going for absolute STL-ish-ness, bool trie.contains(word)
would be the most user-friendly name.
But here's the philosophical point: When designing a data structure, you should treat observations such as "find
is the wrong name because find
returns an iterator" as questions to be seriously considered and responded to!
- I'd like to use the name
find
... - But
find
needs to return an iterator... - Is my trie iterable? No. Can one iterate a trie?...
- Yes, I suppose so (if we introduce parent pointers). An iterator would naturally look like a pointer to a leaf of the trie. Should I make my trie iterable?...
- Hmm. What else can one do with a pointer to a leaf of the trie? What operations are afforded by a pointer-to-leaf?...
- How about fast insertion of a suffix? For example, if I had a pointer to the last character of
"dog"
, I could quickly append a child-leaf to make"dogs"
. - Huh! That sounds a lot like the
emplace_hint
method we were just using! - What about a
lower_bound
method to go along with it?
...and suddenly we're inventing a rich and useful data structure that wasn't even on our radar at the beginning of the conversation. So, always take API-design questions seriously! :)
-
\$\begingroup\$ I love your workaround with the insert \$\endgroup\$miscco– miscco2017年05月14日 05:17:13 +00:00Commented May 14, 2017 at 5:17
-
\$\begingroup\$ "you're not just using one std::map — you're using one std::map per node in your trie" - I'd imagine using the
std::unordered_map
would be more appropriate? \$\endgroup\$Cyniscus– Cyniscus2017年05月21日 01:41:44 +00:00Commented May 21, 2017 at 1:41 -
\$\begingroup\$ @Cyniscus: Nope. That would just make me say "you're using one std::unordered_map per node in your trie" — which is still a ridiculous amount of overhead. A big pointer-based tree of pointer-based hashmaps is only slightly less silly than a big pointer-based tree of pointer-based trees. \$\endgroup\$Quuxplusone– Quuxplusone2017年05月21日 06:57:29 +00:00Commented May 21, 2017 at 6:57
Explore related questions
See similar questions with these tags.
hash-map
here.std::map
has the same characteristics as a tree.std::unordered_map
is closer to ahash-map
but I believe has a much larger space requirements. \$\endgroup\$