I want to write my own program to search for a word in a sorted text file. In the text file each word is in one line. I want to write this myself because I did not find any simple program (without libraries such as boost) which can provide suggestion word which means a word with most similarity. Here is my code but I don't know if I am in the right or wrong track. Appreciate any suggestion.
#include<cstdlib>
#include<fstream>
#include<iostream>
using namespace std;
int main(){
ifstream fin;
fin.open("/usr/dict/words");
if(!fin.good()){cout<<"No dictionary file!\n"; exit(1);}
fin.seekg(0,ios::end);
int last = fin.tellg();
fin.seekg(0,ios::beg);
int first = fin.tellg();
string word = "book";
int middle = 0;
string search;
bool found = false;
while(!fin.eof())
{
middle=(first+last)/2;
cout<<first<<" "<<last<<" "<<" "<<middle; //**
fin.seekg(middle);
fin>>search;
cout<<search<<endl; //**
if(search<word)
{
first=middle+1;
cout<<"a"; //**
}
else if(search>word)
{
last=middle-1;
cout<<"b"; //**
}
else
{
found=true;
break;
}
cout<<first<<" "<<last<<" "<<" "<<middle<<endl; //**
}
return 0;
}
2 Answers 2
Your biggest problem is the code does not take into account not finding the word. If it fails your code will loop forever. You need a not found exit case.
This is always a bad sign
while(!fin.eof())
That line is nearly always wrong (in every programming language I know). Especially in your case that should never be true (since you know where the end is).
Remember that the last successful read reads up-to but not past the EOF. The eof-flag is only set when you read past EOF.
So on the last good read you will read up-to the end of file processes the data. There is no more data left in the file but the eof-flag will not be set because you have not read past the EOF yet so you will re-enter the loop. The next attempt to read data will fail (setting the eof-flag), but the read has failed and thus left the variables you just read into in a questionable state.
Luckily you don't get into that state because you move after each search. But it is a sign of the wrong type of test. This should be:
while(searching) // Set to false when you finish searching.
// Which may be when you find the word
// or there are no more word to search.
You are assuming that words start on the exact boundary of middle!
middle=(first+last)/2;
fin.seekg(middle);
fin>>search;
Not sure I believe that will hold. Middle is a good estimate. But you need to find the start of the word. So seek forward for the next word.
middle=(first+last)/2;
fin.seekg(middle);
fin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
if (!fin.eof()) // That may have read past EOF
{
}
Correct typing in C++ is everything:
fin.seekg(0,ios::end);
int last = fin.tellg(); // This does not return an int.
// With todays big files this can be rather a large
// number.
// That line should be.
std::streamsize last = fin.tellg();
If you don't care about the actual type (you just want the correct one). Then let the compiler work it out for you.
// C++11 let the compiler deduct the correct type of last.
auto last = fin.tellg()
Why take two lines to open the file:
ifstream fin;
fin.open("/usr/dict/words");
// Much simpler to read and write:
ifstream fin("/usr/dict/words");
Sure you can test for good(). But its better to let the stream decide if it is a usable state.
if(!fin.good()){cout<<"No dictionary file!\n"; exit(1);}
// Prefer to let the stream decide:
if (fin) // Stream used in a boolean context will convert itself into
// something useful that is convertible to true if you can
// use the stream or false if the stream is not usable.
{
cout<<"No dictionary file!\n";
exit(1);
}
If the size of the dictionary is not huge (couple of Meg). I personally would just load it into memory and let the standard data structures handle it. Alternatively You could map the file to memory and do a more standard search.
bool isWordInDict(std::string const& word);
{
struct MyDict: std::set<std::string>
{
typedef std::set<std::string>::const_iterator const_iterator;
MyDict()
{
std::ifstream fin("/usr/dict/words");
std::copy(std::istream_iterator<std::string>(fin), std::istream_iterator<std::string>(),
std::inserter(*this, end()));
}
};
static const MyDict dict;
MyDict::const_iterator find = dict.find(word);
return find != dict.end();
}
It looks like what you are trying to do is a binary search on your dict, but you have a number of flaws:
Your code has compile warnings:
found
is set, but never usedYou are doing a binary search on the file, but not identifying the word, but a word fragment. In my use case, for example, the code ends up oscillating between two points, neither of which is related to 'book' (in other words, your code does not work):
87578 87614 87596ly b87578 87595 87596 87578 87595 87586d b87578 87585 87586 87578 87585 87581ulated b87578 87580 87581 87578 87580 87579ngulated b87578 87578 87579 87578 87578 87578angulated a87579 87578 87578 87579 87578 87578angulated a87579 87578 87578 87579 87578 87578angulated a87579 87578 87578 87579 87578 87578angulated a87579 87578 87578 87579 87578 87578angulated a87579 87578 87578 87579 87578 87578angulated a87579 87578 87578 87579 87578 87578angulated
You need to identify the complete word, not just the word fragment.
You may be dealing with large files, and your mid-point is at risk of overflow errors. The code:
middle=(first+last)/2;
would be better as:
middle = first + (last - first) / 2;
but even then, you may still be at risk of errors with large files.
std::string s; /* input to string */ s.find("whatever");
? \$\endgroup\$