This is my current coursework for university where I am to write a spell checking application and compile using C++11.
I have a dictionary file that is read into the program automatically and stored into a binary search tree in order. The user provides a text file that they would like to check which is split into tokens and stored in a string vector.
The program then loops through the string vector and checks if that word exists in the BST. If the word is not in the BST, it will print the word to the console as incorrectly spelled.
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <cstring>
#include "bst.h"
using namespace std;
string read_file(string &filename)
{
ifstream file(filename);
// If there is a file
if (file)
{
string data;
// Find total number of bytes of file
file.seekg(0, file.end);
data.resize(file.tellg());
file.seekg(0, file.beg);
// Read text into data string
file.read(&data[0], data.size());
// Close the file
file.close();
// Return the string
return data;
}
else return string("");
}
void split_words(vector<string> &words, string &data)
{
char *token, *delim = " .,?!:;/\"\'\n\t";
// Set token to first word
token = strtok(&data[0], delim);
// Split rest of words
while (token != nullptr)
{
// Convert each word from dictionary to lower case
for (int i = 0; i < strlen(token); ++i)
{
char word = tolower(token[i]);
token[i] = word;
}
// Push word to end of vector
words.push_back(token);
// Get the next word
token = strtok(nullptr, delim);
}
// Free the memory
token = nullptr;
delim = nullptr;
delete token;
delete delim;
}
int main(int argc, char **argv)
{
string file_to_check, file_data, word_dictionary = "dictionary.txt";
int spell_count = 0;
BinarySearchTree *tree = new BinarySearchTree();
vector<string> words;
// Loop through arguments
for (int i = 0; i < argc; ++i)
{
// Set file name if provided as argument
if (string(argv[i]) == "-i" && argv[i+1] != nullptr)
file_to_check = argv[i+1];
}
// If there was no file name as argument, prompt user
if (file_to_check.empty())
{
cout << "File name: ";
getline(cin, file_to_check);
cout << endl;
}
// If file name is not empty, run spell checking methods
if (!file_to_check.empty())
{
// Read words from dictionary.txt into file_data string
file_data = read_file(word_dictionary);
// Split the words and store into vector
split_words(words, file_data);
// Insert words into Binary Search Tree
for (int i = 0; i < words.size(); ++i)
stringstream(words[i]) >> *tree;
// Store the data read from specified file
file_data = read_file(file_to_check);
// Split sentences and store each word in words vector
split_words(words, file_data);
cout << endl;
// Loop through words vector and check if it exists in dictionary
for (int i = 0; i < words.size(); ++i)
{
// Print out non-occurring words
if (!tree->exists(words[i]))
{
spell_count++;
cout << words[i] << endl;
}
}
cout << endl;
// Print the total number of spelling mistakes
cout << spell_count << " spelling mistakes" << endl;
}
else
{
// If still no file specified, print message and exit
cout << "No file specified!" << endl;
return 0;
}
// Free the memory
delete tree;
return 0;
}
-
2\$\begingroup\$ Can you provide the "bst.h" header so people can compile this? Maybe also provide an input sample. \$\endgroup\$yuri– yuri2018年04月01日 17:22:40 +00:00Commented Apr 1, 2018 at 17:22
4 Answers 4
using namespace std;
Never do thisPrefer using
\n
overstd::endl
as the latter flushes the buffer which is rarely needed and can be detrimental to performance.const
your arguments whenever possible. More about const correctness*
and&
are considered to be part of the type in C++ so prefer writingint& foo
instead ofint &foo
. Likewise for*
.Never omit optional braces as doing so will lead to bugs eventually.
Reaching the end of main will automatically return 0 so you can remove the trailing
return 0
Don't initialize more than one variable per line.
I like that you use the prefix operator. Your naming is pretty good too so why are you doing this?
// Close the file file.close();
or this
// Loop through arguments for (int i = 0; i < argc; ++i)
Your comments are completely superfluous and can all be removed.
What is this?
// Free the memory token = nullptr; delim = nullptr; delete token; delete delim;
delete
is only used after a matching new
. It feels like you misunderstood something in regard to memory managment. Read up on this again.
new
delete
memory managment
BinarySearchTree *tree = new BinarySearchTree();
I don't know how this is implemented but a line like that looks like something from Java. Do you really need to put this on the heap?
- Personally I think the word splitting could be done rather elegantly in regex. However many people are quite opposed to using regex. I've also read that in the past regex performance in C++ was rather poor but it might be worth measuring to see if it's within acceptable limits for your use-case.
Regex example:
std::regex rx{R"(([^ .,?!:;/\"\'\n\t]+))"};
auto data_begin = std::sregex_iterator(data.begin(), data.end(), rx);
auto data_end = std::sregex_iterator();
for (std::sregex_iterator it = data_begin; it != data_end; ++it) {
// do something with words here
}
-
\$\begingroup\$ Thanks for the detailed feedback! We were given the binary search tree as a class hence the use of new. I will make changes based on your suggestions :) \$\endgroup\$Toby Cook– Toby Cook2018年04月02日 22:54:21 +00:00Commented Apr 2, 2018 at 22:54
-
2\$\begingroup\$ @TobyCook: you don't need to use
new
just because it's a class. You can writeBinarySearchTree tree;
, and then you don't need thedelete
at the end either. Less typing, fewer bugs! :) \$\endgroup\$Cris Luengo– Cris Luengo2018年04月03日 13:40:28 +00:00Commented Apr 3, 2018 at 13:40
I have two more comments to add to Yuri's answer:
Prefer to always initialize variables the moment you declare them:
char *token, *delim = " .,?!:;/\"\'\n\t"; token = strtok(&data[0], delim);
should be
char const* delim = " .,?!:;/\"\'\n\t"; char* token = strtok(&data[0], delim);
And don't declare variables until you need them. In C you had to declare all variables at the beginning of a function (I think the newest standard doesn't require this any more?) but in C++ you don't. There are advantages to declaring the variables where you need them: you can initialize them right away to a meaningful value, and you reduce their scope, reducing the probability of mistakes.
For example
spell_count
can be declared just before the loop that uses it, inside theif
statement. Likewise for all the strings except the file name.
Oh, one more thing. Three comments. Three comments to add to Yuri's answer:
I'd suggest using
constexpr
for constants, or at leastconst
:constexpr char const* word_dictionary = "dictionary.txt";
Note also the
const
I added to thedelim
variable under point 1. I think you should get at least a compiler warning if you don't, because you're pointing at unmutable data.I prefer constants to be declared at the top of the file, where they're easy to spot. At least the constants like
word_dictionary
that are configurable in the source code and you might want to adjust at some point.
-
\$\begingroup\$ Thanks Cris, I've taken on board your first two points. However, I noticed you changed my declaration of
word_dictionary
fromstd::string
tochar const*
. Are there any advantages of using the latter? \$\endgroup\$Toby Cook– Toby Cook2018年04月03日 15:49:24 +00:00Commented Apr 3, 2018 at 15:49 -
\$\begingroup\$ @TobyCook: None that I know of. I wasn't sure if
std::string
could beconstexpr
.char*
implicitly converts tostd::string
, and you only use it as input to a function, so there's really no difference in this case. \$\endgroup\$Cris Luengo– Cris Luengo2018年04月03日 17:27:09 +00:00Commented Apr 3, 2018 at 17:27
Everything important has been already said. Just a few minor notes:
you should reduce nesting in
read_file
function.Instead of:
if (file) { // Load the file return data; } else return string("");
You can do:
if (!file) { return string(""); } // Load the file return data;
This makes code much more readable.
If the file loading fails, you just simply return empty string and then try to parse it. This silent fail is very dangerous, you should definitely consider throwing an exception in case the file open fails:
if (!file) { throw std::runtime_error("Dictionary failed to load"); }
This change obviously requires
try
-catch
block within functionmain
.
Errors should be printed to the standard error stream, and cause the program to terminate with a failure (non-zero) status:
std::cerr << "No file specified!\n";
return 1;