This is a simple Trie data structure for strings, except it puts the strings into the structure backwards.
The insert method simply iterates over chars from the string-to-be-inserted backwards, and starting at the root of the graph, looks for an edge that is labeled with the current char. If an edge is found, we move to the next vertex and next char. If no edge is found, a new edge and target vertex is created, with the new edge labeled with the current char.
The containsWord
method is similar. It follows labeled edges until it comes to the a word terminator '0円'
or returns false as soon as it cannot find the next edge.
I'm fairly new to C++. I'd like to know about any style errors or anything outside of accepted convention. Or if this is just not a good way of creating this structure.
My unit tests are working ok with this, but performance seems to be lagging a boost::unordered_set
for lookups. Tries are supposed to have \$O(length of key)\$ lookup time, which I would think would be about the same as the cost of calculating the hash code of a string to look up in a set.
#include <boost/unordered_map.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
class TrieVertex;
typedef boost::shared_ptr<TrieVertex> VertexPtr;
class TrieEdge
{
public:
char fC;
VertexPtr fTarget;
TrieEdge(char c, VertexPtr target) : fC(c), fTarget(target) {}
};
typedef boost::shared_ptr<TrieEdge> EdgePtr;
class TrieVertex {
public:
boost::unordered_map<char, EdgePtr> fOutEdges;
TrieVertex() : fOutEdges() {}
};
class ReverseTrie {
private:
VertexPtr fRoot;
size_t fSize;
public:
ReverseTrie( ): fRoot(boost::make_shared<TrieVertex>()), fSize(0) {}
~ReverseTrie() {
fRoot.reset();
}
void clear() {
boost::make_shared<TrieVertex>().swap(fRoot);
fSize = 0;
}
size_t size() {
return fSize;
}
bool insertStringR(const std::string &word) {
VertexPtr currentVertex = fRoot;
for (std::string::const_reverse_iterator i = word.rbegin(); i != word.rend(); ++i) {
if (*i == '0円') {
continue;
}
if (currentVertex->fOutEdges.size() == 0 || currentVertex->fOutEdges.count(*i) == 0) {
VertexPtr targetVertex = boost::make_shared<TrieVertex>();
EdgePtr e = boost::make_shared<TrieEdge>(*i, targetVertex);
currentVertex->fOutEdges[*i] = e;
currentVertex = e->fTarget;//*currentVertex.fOutEdges[*i]->fTarget;
} else {
EdgePtr e = currentVertex->fOutEdges[*i];
currentVertex = e->fTarget;
}
}
bool exists = true;
//end of word.
if (currentVertex->fOutEdges.count('0円') == 0) {
VertexPtr targetVertex = boost::make_shared<TrieVertex>();
EdgePtr e = boost::make_shared<TrieEdge>('0円', targetVertex);
currentVertex->fOutEdges['0円'] = e;
exists = false;
}
fSize++;
return exists;
}
bool containsString(const std::string &word) {
VertexPtr currentVertex = fRoot;
for (std::string::const_reverse_iterator i = word.rbegin(); i != word.rend(); ++i) {
if (*i == '0円') {
continue;
}
if (currentVertex->fOutEdges.size() == 0 || currentVertex->fOutEdges.count(*i) == 0) {
return false;
} else {
currentVertex = currentVertex->fOutEdges[*i]->fTarget;
}
}
if (currentVertex->fOutEdges.count('0円') == 0) {
return false;
}
return true;
}
void logEdges(VertexPtr vp) {
for ( boost::unordered_map<char, EdgePtr>::iterator it = vp->fOutEdges.begin(); it != vp->fOutEdges.end(); it++) {
char c = it->first;
EdgePtr ep = it->second;
std::string output;
if (c == '0円')
output.append("EOW");
else
output.push_back(c);
logWarning("---", "edge: %s", output.c_str());
logEdges(ep->fTarget);
}
}
void logEdges() {
logEdges(fRoot);
}
};
2 Answers 2
I don't have the time to do a completely thorough review at the moment, but here are some things that I noticed that may help you improve your code.
Supply the missing logWarning
function
This is more a matter of helping reviewers rather than necessarily a flaw in your code, but without the implementation of the logWarning
code that's called from within the logEdges
routine, it's not entirely clear what that function might do, although it's possible to make a guess.
Prefer std
to boost
functions
It's in no way a slight to the excellent boost
library, but all of the boost
elements this code uses are all now standardized (as of C++11). By using the std
forms rather than the boost
versions, your code is likely to be more portable.
Prefer private
to public
where practical
The TrieVertex
class has its only data member fOutEdges
as a public member. Rather than do it that way, it might be better to keep it private and declare ReverseTrie
to be a friend. That way ReverseTrie
would continue to have access to the data member, but no other class would. This seems like a safer design to me. The same is true for TrieEdge
.
Eliminate unused variables
The fC
member of the TrieEdge
class is never used. It can be omitted.
Consider adding utility functions
By adding some simple utility functions to the TrieVertex
class, you can make many portions of the code much easier to understand. Here are the functions I propose:
bool noEdges() const { return fOutEdges.size() == 0; }
bool noEdges(char ch) const { return fOutEdges.count(ch) == 0; }
VertexPtr addEdge(char ch) {
VertexPtr targetVertex = std::make_shared<TrieVertex>();
EdgePtr e = std::make_shared<TrieEdge>(targetVertex);
fOutEdges[ch] = e;
return e->fTarget;
}
Here's a some existing code without them:
if (currentVertex->fOutEdges.count('0円') == 0) {
VertexPtr targetVertex = boost::make_shared<TrieVertex>();
EdgePtr e = boost::make_shared<TrieEdge>('0円', targetVertex);
currentVertex->fOutEdges['0円'] = e;
exists = false;
}
Here's how it looks with them:
if (currentVertex->noEdges('0円')) {
currentVertex->addEdge('0円');
exists = false;
}
This
typedef
:typedef boost::shared_ptr<TrieVertex> VertexPtr;
should either be contained in a
class
or in anamespace
. Otherwise, it'll be directly exposed to anyone using this header.TrieEdge
andTrieVertex
should either have its members asprivate
, or the whole thing should be astruct
. Class data members should never bepublic
.If a member function isn't modifying any data members, it should be
const
:size_t size() const { return fSize; }
Since you compare
size()
to0
several times, you can instead have anempty()
function to make the checks a bit shorter:bool empty() const { return fSize == 0; }