I have been lately working in my C++, and I finished up this trie, and I want to know how my C++ is looking so far.
Node.hpp
#pragma once
#include <memory>
static const int ALPHABETSIZE = 26;
struct Node {
bool isEnd;
int prefixCount;
std::shared_ptr<Node> child[ALPHABETSIZE];
};
Trie.hpp
#pragma once
#include "Node.hpp"
#include <string>
#include <vector>
class Trie {
std::shared_ptr<Node> head;
public:
Trie();
void insert(const std::string& word) const;
bool search(const std::string& word) const;
unsigned int wordsWithPrefix(const std::string& prefix) const;
};
Trie.cpp
#include "Trie.hpp"
Trie::Trie() {
head = std::make_unique<Node>();
head->isEnd = false;
head->prefixCount = 0;
for (int i = 0; i < ALPHABETSIZE; ++i) {
head->child[i] = nullptr;
}
}
void Trie::insert(const std::string& word) const{
auto current = head;
for (int i = 0; i < word.length(); ++i) {
int letter = word[i] - 'a';
if (!current->child[letter])
current->child[letter] = std::make_shared<Node>();
current->child[letter]->prefixCount++;
current = current->child[letter];
}
current->isEnd = true;
}
bool Trie::search(const std::string& word) const{
auto current = head;
for (int i = 0; i < word.length(); ++i) {
auto letter = word[i] - 'a';
if (!current->child[letter])
return false;
current = current->child[letter];
}
return current->isEnd;
}
unsigned int Trie::wordsWithPrefix(const std::string& prefix) const{
auto current = head;
for (int i = 0; i < prefix.length(); ++i) {
auto letter = prefix[i] - 'a';
if (!current->child[letter])
return 0;
current = current->child[letter];
}
return current->prefixCount;
}
Source.cpp
#include <iostream>
#include <fstream>
#include <algorithm>
#include "Trie.hpp"
int main() {
Trie trie;
std::ifstream file;
std::string word = " ", input;
file.open("dictionary.txt");
//Ask for input
std::cin >> input;
std::transform(input.begin(), input.end(), input.begin(), ::tolower);
auto beenHere = false;
//Add every word to the trie.
while (std::getline(file, word)) {
std::transform(word.begin(), word.end(), word.begin(), ::tolower);
//if word's first character equals to input's, go on, else, dont add it onto the trie.
if (word[0] == input[0]){
beenHere = true;
trie.insert(word);
}
//If you've already been on the word's first character
//, and it isn't the same as the input's you are done.
if (beenHere && word[0] != input[0]) break;
}
std::string x = "";
for (int i = 0, length = input.length(); i < length; ++i) {
x += input.at(i);
if (trie.wordsWithPrefix(x) == 0) {
std::cout << x << "<" << input.substr(i+1, length - 1);
break;
}
}
file.close();
}
1 Answer 1
Node
Raw arrays:
std::shared_ptr<Node> child[ALPHABETSIZE];
are really awkward. There's little reason to prefer to use one here. Use a std::array
instead:
std::array<std::shared_ptr<Node>, ALPHABETSIZE> children;
Also I pluralized your array name. Seems to make more sense. You also want to make sure that a default-constructed Node
is valid, so I find it clearer to just add data member initializers:
struct Node {
bool isEnd = false;
int prefixCount = 0;
std::array<std::shared_ptr<Node>, ALPHABETSIZE> children;
};
Now you don't have to worry about default/value initialization.
You current construct your head
via:
head = std::make_unique<Node>();
But head
is a shared_ptr
so you should do:
head = std::make_shared<Node>();
And with the added data member initializers, that's all we need to do in our constructor. Done!
insertion
Prefer a range-based for loop anytime you want to loop over everything in a container without needing the iterator. It expresses intent better:
for (char c : word) {
int letter = c - 'a';
...
}
Also, what if word
contains anything other than a lower-case letter? What if it's a capital letter or a space or a hyphen? This does no error checking here, so leaves open the possibility of undefined behavior due to invalid memory access.
Choice of data structure
Does having 26 children for each node make sense? That could be very memory intensive if our data set is sparse! I'd expect something more like:
std::unordered_map<char, std::shared_ptr<Node>>
with which this would just look like:
auto& next = current->children[letter];
if (!next) next = std::make_shared<Node>();
next->prefixCount++;
current = next;
searching
I would rename search()
to contains()
, and same points apply about sanity checking your inputs and using a range-based for loop.
wordsWithPrefix
Note that your search
and wordsWithPrefix
do the same thing: we walk the length of the string, break early if we run out of children, and do something with the result. That suggests adding a helper function:
std::shared_ptr<Node> atPrefix(const std::string& prefix) const;
which does that work for us. With that function, we can implement:
bool contains(const std::string& word) const
{
auto node = atPrefix(word);
return node && node->isEnd;
}
unsigned int wordsWithPrefix(const std::string& prefix) const
{
auto node = atPrefix(prefix);
return node ? node->prefixCount : 0;
}
Trie construction
You may want to add a constructor that will iterate over and insert a bunch of words. Maybe take two iterators, same as the other standard containers.
cin and getline
Be careful about mixing those two. It doesn't always work.
Also, don't do this:
std::string word = " ", input;
Space isn't at a premium, you can declare your variables on separate lines.
main()
What is input
? I don't understand your usage here. Are you mandating a sorted dictionary.txt
for this to work? Some comments indicating this would be helpful.
final loop
at()
vs operator[]
just does range-checking. You don't need that here, so simply use operator[]
. Also if what you want to do is divide input
, you don't need a second argument to substr()
- just the one will automatically print the remainder of the string from there on out. Besides, your length argument is incorrect anyway.
Prefer:
std::string x;
for (char c : input) {
x += c;
if (trie.wordsWithPrefix(x) == 0) {
std::cout << x << '<' << input.substr(x.size());
break;
}
}
file.close()
Unnecessary. The ifstream
destructor will take care of itself. RAII for the win.
-
1\$\begingroup\$ Wow, that is an amazing review. If @Barry were my colleague reviewing my code, I would be the happiest programmer around. \$\endgroup\$Koby Becker– Koby Becker2015年10月11日 04:58:35 +00:00Commented Oct 11, 2015 at 4:58
-
\$\begingroup\$ Thanks for answering! All of what you said worked flawlessly, again, thanks. After changing my code a little bit more, it now works with a capital letters and hyphens. Here's my code if you want to check it out I do believe there's some memory leaking somewhere, but I can't find where... Anyways, thanks! \$\endgroup\$Levon– Levon2015年10月11日 14:36:19 +00:00Commented Oct 11, 2015 at 14:36