I would like to know if my implementation of a Trie data structure which supports addition, search and deletion of words is considered standard/optimal. More specifically, I have used a Hashmap in order to store the children as this gives constant look up as opposed to a vector. I would like to know if this is a good idea. Furthermore, I have no pointer to the parent node as I didn't seem to need it: is this acceptable?
I found that the correct way to implement delete was a bottom-up approach. As an aside, is it possible to do it top-down? As in start at the root then count how many complete words are encountered until the end of word_to_be_deleted is reached. Then delete from last word encountered to this word in a top-down manner.
#include <unordered_map>
#include <iostream>
#include <string>
#include <vector>
class Node {
private:
std::unordered_map<char, Node*> mchildren;
bool is_complete_word;
char mcontent;
public:
std::unordered_map<char, Node*> children() { return mchildren; }
Node *find_child(char c);
void complete() { is_complete_word = true; }
void not_complete() { is_complete_word = false; }
bool is_complete() { return is_complete_word; }
void set_content(char c) { mcontent = c; }
char content() { return mcontent; }
void add_child(Node *n) { mchildren[n->content()] = n; }
bool has_children() { return !mchildren.empty(); }
Node() {
is_complete_word = false;
mcontent = ' ';
}
};
class Trie {
private:
Node *root;
public:
void add_word(std::string s);
bool search_word(std::string s);
void delete_word(std::string s);
Trie();
};
Trie::Trie() {
root = new Node();
}
bool delete_until(Node *node, std::string s, size_t level, size_t len) {
if(node != NULL) {
if( level == len) {
if(node->is_complete()) {
node->not_complete();
if(!node->has_children()) {
return true;
}
return false;
}
} else {
if(delete_until(node->find_child(s.at(level)), s, level + 1, len))
{
delete node->find_child(s.at(level));
return ( !node->is_complete() && !node->has_children());
}
}
}
return false;
}
void Trie::delete_word(std::string s) {
Node *rt = root;
size_t len = s.size();
if(len > 0) {
delete_until(rt, s, 0, len);
}
}
Node *Node::find_child(char c) {
if(mchildren.find(c) != mchildren.end())
return mchildren[c];
return NULL;
}
void Trie::add_word(std::string s) {
Node *current = root;
if(s.length() == 0) {
current->complete();
return;
}
for(size_t i = 0; i < s.length(); ++i) {
Node *child = current->find_child(s[i]);
if(child != NULL) {
std::cout << "child != NULL" << " word = " << s << std::endl;
current = child; }
else if (child == NULL) {
std::cout << "child = NULL" << " word = " << s << std::endl;
Node *tmp = new Node();
tmp->set_content(s[i]);
current->add_child(tmp);
current = tmp;
}
if(i == s.length() - 1)
current->complete();
}
}
bool Trie::search_word(std::string s)
{
Node* current = root;
while ( current != NULL )
{
for ( size_t i = 0; i < s.length(); i++ )
{
Node* tmp = current->find_child(s[i]);
if ( tmp == NULL )
return false;
current = tmp;
}
if ( current->is_complete() )
return true;
else
return false;
}
return false;
}
int main() {
Trie *t = new Trie();
t->add_word("tes");
t->add_word("test");
t->add_word("testy");
t->add_word("Hello");
if(t->search_word("Hello")) { std::cout << "found Hello" <<std::endl;}
t->delete_word("Hello");
if(!t->search_word("Hello")) { std::cout << "Hello gone" <<std::endl;}
if(t->search_word("test")) { std::cout << "found test" <<std::endl;}
t->delete_word("test");
if(!t->search_word("test")) { std::cout << "test gone" <<std::endl;}
if(t->search_word("testy")) { std::cout << "found testy" <<std::endl;}
return 0;
}
3 Answers 3
All of @Djrcaus comments, i'll look at little bit more at the functions themselves, and issues with regard to modern c++
Tests
The empty string is a special case, test adding, searching and removal. Test double add, and double remove. All of these are very common edge cases with most container structures you will want to test that. assert()
in the tests, this way you won't miss a test failing.
Modern C++
Use nullptr
instead of NULL
it's much clearer. In general I would probably prefer using std::unique_ptr<Node>
rather than raw Node
pointers. This does take care of dangling Node pointers, in most cases you could use references to Node
as return or intermediate values without having to go to the raw Node
pointers.
Node::delete_until
The function signature takes 'len' as a parameter but len is never changed, removing it will make the function clearer by using string.size()
.
Trie::delete_word
See JUrcau's comments about function names, the Node*
variable rt
is superflous, just use root. I'm pretty sure there is a bug with the empty string not being removed, see testing ...
Node::find_child
find
returns an iterator to the found object, by doing another access to mchildren
you are forcing a recalculation of the hash. Use store the return value of find
and use it to return the node.
Trie::add_word
child
can either be NULL
or not, there is no need for the else if
just use if
, the last if
check to mark complete can be removed by moving the line current->complete();
to the outside of the for loop, this would also obviate the need for the check on size() == 0
.
Trie::search_word
if ( current->is_complete() )
towards the bottom of the function, if you just use return current->is_complete()
you can get rid of the if
statement there.
Node
class
children()
returns the whole map by value, definitely return by reference probably const &
. With regard to readibility it's hard to distinguish the setters in not_complete()
, complete()
and is_complete()
, especially since you have, set_content()
and add_child()
. set
is a good default prefix for a setter, and passing a boolean for bool setters rather than using a two version will come in handy further down, so use set_complete(bool val)
General
In your code you are mixing the implementations of Node
and Trie
that is not very common, and makes things harder to read.
Memory Management
You're using new
to allocate memory in quite a few places, but I don't see any matching delete
call to also free it.
Don't pass complex objects by value
You're accepting std::string
parameters by value. This will create a copy of them that simply wastes CPU cycles.
Using a std::
(C++17)/boost::
(pre C++17) string_view
as parameter (can be passed by value as it's made up of just a non-unknowing pointer and a size) will allow you to easily pass in either a std::string
or "sample text" or whatever part of some buffer.
Use const
If you have a constant instance of your class, you can't use it as all your methods are declared to only operate on mutable instances, even those that don't change anything like has_children()
.
Declare methods that don't change the state as const
.
Naming
All the Trie
methods (add_word
, search_word
, delete_word
) contain word
. This is redundant and I would simply call them add
, search
, delete
.
search
is not the ideal verb as I would expect it to return a reference to something (i.e. an iterator). If the return is bool
, contains
would be a better name.
Private classes
Your Node
doesn't seem to be used from the outside, consider hiding it inside the Trie
class.
std::unordered_map
with char
as key
Yes, a hashtable has O(1)
amortized search time, BUT your key is a char. For only 256 possible values for a char, using a constant-sized array is much simpler and with less overhead than an std::unordered_map
.
-
1\$\begingroup\$ I believe memory management could be extended/reworded to use rule of 0/3/5 instead, since it fits it better, in my opinion. \$\endgroup\$Incomputable– Incomputable2017年08月20日 09:45:30 +00:00Commented Aug 20, 2017 at 9:45
-
\$\begingroup\$ Thank you for the feedback. I have some questions regarding your comments. 1) memory management - I am freeing the memory in the method delete_word() 2) Can you clarify on the use of const? Do you propose that I use this on objects that should not be changed? Or do you propose I change the code so that it can be used with constant classes? 3) The node class should be contained within trie? \$\endgroup\$laker93– laker932017年08月20日 20:55:38 +00:00Commented Aug 20, 2017 at 20:55
-
\$\begingroup\$ Another question I have: I changed the add, search, delete methods so that string was passed by reference (e.g. add(std::string &s) ). Now I can't add words by calling: t->add("test"); Instead I have to declare test in a separate variable and then pass that variable to the method. Why is that? \$\endgroup\$laker93– laker932017年08月21日 09:10:32 +00:00Commented Aug 21, 2017 at 9:10
-
\$\begingroup\$ I've updated the answer with details \$\endgroup\$D. Jurcau– D. Jurcau2017年08月21日 16:23:37 +00:00Commented Aug 21, 2017 at 16:23
Unless you intend to expand the code futher, you can compactify structures like if - return - else return
.
In delete_until
if(!node->has_children()) {
return true;
}
return false;
can be replaced with
return !node->has_children();
Similary in Trie::search_word
you can shorten
if ( current->is_complete() )
return true;
else
return false;
to
return current->is_complete();
if
, closing braces on a line of their own, indentation of the lineNode *tmp
inadd_word
). This makes the code much easier to read and review. \$\endgroup\$