Checking for spelling errors with a trie

Question 1

I have been lately working in my C++, and I finished up this trie, and I want to know how my C++ is looking so far.

Node.hpp

#pragma once
#include <memory>
static const int ALPHABETSIZE = 26;
struct Node {
 bool isEnd;
 int prefixCount;
 std::shared_ptr<Node> child[ALPHABETSIZE];
};

Trie.hpp

#pragma once
#include "Node.hpp"
#include <string>
#include <vector>
class Trie {
 std::shared_ptr<Node> head;
public:
 Trie();
 void insert(const std::string& word) const;
 bool search(const std::string& word) const;
 unsigned int wordsWithPrefix(const std::string& prefix) const;
};

Trie.cpp

#include "Trie.hpp"
Trie::Trie() {
 head = std::make_unique<Node>();
 head->isEnd = false;
 head->prefixCount = 0;
 for (int i = 0; i < ALPHABETSIZE; ++i) {
 head->child[i] = nullptr;
 }
}
void Trie::insert(const std::string& word) const{
 auto current = head;
 for (int i = 0; i < word.length(); ++i) {
 int letter = word[i] - 'a';
 if (!current->child[letter])
 current->child[letter] = std::make_shared<Node>();
 current->child[letter]->prefixCount++;
 current = current->child[letter];
 }
 current->isEnd = true;
}
bool Trie::search(const std::string& word) const{
 auto current = head;
 for (int i = 0; i < word.length(); ++i) {
 auto letter = word[i] - 'a';
 if (!current->child[letter])
 return false;
 current = current->child[letter];
 }
 return current->isEnd;
}
unsigned int Trie::wordsWithPrefix(const std::string& prefix) const{
 auto current = head;
 for (int i = 0; i < prefix.length(); ++i) {
 auto letter = prefix[i] - 'a';
 if (!current->child[letter])
 return 0;
 current = current->child[letter];
 }
 return current->prefixCount;
}

Source.cpp

#include <iostream>
#include <fstream>
#include <algorithm>
#include "Trie.hpp"
int main() {
 Trie trie;
 std::ifstream file;
 std::string word = " ", input;
 file.open("dictionary.txt");
 //Ask for input
 std::cin >> input;
 std::transform(input.begin(), input.end(), input.begin(), ::tolower);
 auto beenHere = false;
 //Add every word to the trie.
 while (std::getline(file, word)) {
 std::transform(word.begin(), word.end(), word.begin(), ::tolower);
 //if word's first character equals to input's, go on, else, dont add it onto the trie.
 if (word[0] == input[0]){
 beenHere = true;
 trie.insert(word); 
 }
 //If you've already been on the word's first character
 //, and it isn't the same as the input's you are done.
 if (beenHere && word[0] != input[0]) break;
 }
 std::string x = "";
 for (int i = 0, length = input.length(); i < length; ++i) {
 x += input.at(i);
 if (trie.wordsWithPrefix(x) == 0) {
 std::cout << x << "<" << input.substr(i+1, length - 1);
 break;
 }
 }
 file.close();
}

Question 2

Node

Raw arrays:

std::shared_ptr<Node> child[ALPHABETSIZE];

are really awkward. There's little reason to prefer to use one here. Use a std::array instead:

std::array<std::shared_ptr<Node>, ALPHABETSIZE> children;

Also I pluralized your array name. Seems to make more sense. You also want to make sure that a default-constructed Node is valid, so I find it clearer to just add data member initializers:

struct Node {
 bool isEnd = false;
 int prefixCount = 0;
 std::array<std::shared_ptr<Node>, ALPHABETSIZE> children;
};

Now you don't have to worry about default/value initialization.

You current construct your head via:

head = std::make_unique<Node>();

But head is a shared_ptr so you should do:

head = std::make_shared<Node>();

And with the added data member initializers, that's all we need to do in our constructor. Done!

insertion

Prefer a range-based for loop anytime you want to loop over everything in a container without needing the iterator. It expresses intent better:

for (char c : word) {
 int letter = c - 'a';
 ...
}

Also, what if word contains anything other than a lower-case letter? What if it's a capital letter or a space or a hyphen? This does no error checking here, so leaves open the possibility of undefined behavior due to invalid memory access.

Choice of data structure

Does having 26 children for each node make sense? That could be very memory intensive if our data set is sparse! I'd expect something more like:

std::unordered_map<char, std::shared_ptr<Node>>

with which this would just look like:

auto& next = current->children[letter];
if (!next) next = std::make_shared<Node>();
next->prefixCount++;
current = next;

searching

I would rename search() to contains(), and same points apply about sanity checking your inputs and using a range-based for loop.

wordsWithPrefix

Note that your search and wordsWithPrefix do the same thing: we walk the length of the string, break early if we run out of children, and do something with the result. That suggests adding a helper function:

std::shared_ptr<Node> atPrefix(const std::string& prefix) const;

which does that work for us. With that function, we can implement:

bool contains(const std::string& word) const
{
 auto node = atPrefix(word);
 return node && node->isEnd;
}
unsigned int wordsWithPrefix(const std::string& prefix) const
{
 auto node = atPrefix(prefix);
 return node ? node->prefixCount : 0;
}

Trie construction

You may want to add a constructor that will iterate over and insert a bunch of words. Maybe take two iterators, same as the other standard containers.

cin and getline

Be careful about mixing those two. It doesn't always work.

Also, don't do this:

std::string word = " ", input;

Space isn't at a premium, you can declare your variables on separate lines.

main()

What is input? I don't understand your usage here. Are you mandating a sorted dictionary.txt for this to work? Some comments indicating this would be helpful.

final loop

at() vs operator[] just does range-checking. You don't need that here, so simply use operator[]. Also if what you want to do is divide input, you don't need a second argument to substr() - just the one will automatically print the remainder of the string from there on out. Besides, your length argument is incorrect anyway.

Prefer:

std::string x;
for (char c : input) {
 x += c;
 if (trie.wordsWithPrefix(x) == 0) {
 std::cout << x << '<' << input.substr(x.size());
 break;
 }
}

file.close()

Unnecessary. The ifstream destructor will take care of itself. RAII for the win.

Question 3

Wow, that is an amazing review. If @Barry were my colleague reviewing my code, I would be the happiest programmer around.

Question 4

Thanks for answering! All of what you said worked flawlessly, again, thanks. After changing my code a little bit more, it now works with a capital letters and hyphens. Here's my code if you want to check it out I do believe there's some memory leaking somewhere, but I can't find where... Anyways, thanks!

Barry Barry 18.5k1 gold badge40 silver badges92 bronze badges · Accepted Answer · 2015-10-11 00:43:32Z

Node

Raw arrays:

std::shared_ptr<Node> child[ALPHABETSIZE];

are really awkward. There's little reason to prefer to use one here. Use a std::array instead:

std::array<std::shared_ptr<Node>, ALPHABETSIZE> children;

Also I pluralized your array name. Seems to make more sense. You also want to make sure that a default-constructed Node is valid, so I find it clearer to just add data member initializers:

struct Node {
 bool isEnd = false;
 int prefixCount = 0;
 std::array<std::shared_ptr<Node>, ALPHABETSIZE> children;
};

Now you don't have to worry about default/value initialization.

You current construct your head via:

head = std::make_unique<Node>();

But head is a shared_ptr so you should do:

head = std::make_shared<Node>();

And with the added data member initializers, that's all we need to do in our constructor. Done!

insertion

Prefer a range-based for loop anytime you want to loop over everything in a container without needing the iterator. It expresses intent better:

for (char c : word) {
 int letter = c - 'a';
 ...
}

Also, what if word contains anything other than a lower-case letter? What if it's a capital letter or a space or a hyphen? This does no error checking here, so leaves open the possibility of undefined behavior due to invalid memory access.

Choice of data structure

Does having 26 children for each node make sense? That could be very memory intensive if our data set is sparse! I'd expect something more like:

std::unordered_map<char, std::shared_ptr<Node>>

with which this would just look like:

auto& next = current->children[letter];
if (!next) next = std::make_shared<Node>();
next->prefixCount++;
current = next;

searching

I would rename search() to contains(), and same points apply about sanity checking your inputs and using a range-based for loop.

wordsWithPrefix

Note that your search and wordsWithPrefix do the same thing: we walk the length of the string, break early if we run out of children, and do something with the result. That suggests adding a helper function:

std::shared_ptr<Node> atPrefix(const std::string& prefix) const;

which does that work for us. With that function, we can implement:

bool contains(const std::string& word) const
{
 auto node = atPrefix(word);
 return node && node->isEnd;
}
unsigned int wordsWithPrefix(const std::string& prefix) const
{
 auto node = atPrefix(prefix);
 return node ? node->prefixCount : 0;
}

Trie construction

You may want to add a constructor that will iterate over and insert a bunch of words. Maybe take two iterators, same as the other standard containers.

cin and getline

Be careful about mixing those two. It doesn't always work.

Also, don't do this:

std::string word = " ", input;

Space isn't at a premium, you can declare your variables on separate lines.

main()

What is input? I don't understand your usage here. Are you mandating a sorted dictionary.txt for this to work? Some comments indicating this would be helpful.

final loop

at() vs operator[] just does range-checking. You don't need that here, so simply use operator[]. Also if what you want to do is divide input, you don't need a second argument to substr() - just the one will automatically print the remainder of the string from there on out. Besides, your length argument is incorrect anyway.

Prefer:

std::string x;
for (char c : input) {
 x += c;
 if (trie.wordsWithPrefix(x) == 0) {
 std::cout << x << '<' << input.substr(x.size());
 break;
 }
}

file.close()

Unnecessary. The ifstream destructor will take care of itself. RAII for the win.

Wow, that is an amazing review. If @Barry were my colleague reviewing my code, I would be the happiest programmer around.
Thanks for answering! All of what you said worked flawlessly, again, thanks. After changing my code a little bit more, it now works with a capital letters and hyphens. Here's my code if you want to check it out I do believe there's some memory leaking somewhere, but I can't find where... Anyways, thanks!

Stack Exchange Network

Checking for spelling errors with a trie

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Checking for spelling errors with a trie

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions