I'm trying to learn some coding to broaden my scope of knowledge, and I seemed to have run into a bit of a conundrum.
I'm trying to create a program to output the number of characters, digits, punctuation, spaces, words and lines that are being read in from a file.
Here is the text file I am reading in:
See Jack run. Jack can run fast. Jack runs after the cat. The cat's fur is black. See Jack catch the cat.
Jack says, "I caught the cat."
The cat says, "Meow!"
Jack has caught 1 meowing cat. Jack wants 5 cats, but can't find any more.
Here is my code:
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
ifstream lab3;
string word;
lab3.open("lab3.txt");
int countletters=0,countnum=0,countpunc=0,countspace=0,words=0,line=0;
char character,prevchar = 0;
if(!lab3)
{
cout << "Could not open file" << endl;
return 1;
}
while(lab3.get(character) && !lab3.eof())
{
if(isalpha(character))
{
countletters++;
}
if (isdigit(character))
{
countnum++;
}
if (ispunct(character))
{
countpunc++;
if (isalpha(prevchar))
{
words++;
}
}
if (isspace(character))
{
countspace++;
if (isalpha(prevchar))
{
words++;
}
}
if(character=='\n')
{
line++;
}
prevchar = character;
}
cout << "There are " << countletters << " letters." << endl;
cout << "There are " << countnum << " numbers." << endl;
cout << "There are " << countpunc << " punctuations." << endl;
cout << "There are " << countspace << " spaces." << endl;
cout << "There are " << words << " words." << endl;
cout << "There are " << line << " sentences." << endl;
lab3.close();
return 0;
}
Output:
There are 167 letters.
There are 2 numbers.
There are 18 punctuations.
There are 52 spaces.
There are 47 words.
There are 4 sentences.
Some things I am hoping to learn:
- Advice for improvements on my code for learning purposes/efficiency.
- Explanation for reading information in from a text file: whether it is letters, numbers, punctuation - whatever you may run across doing this type of data-processing.
Some things I am aware of:
using namespace std;
is not good practice - what is the best practice for real world applications?- I am a beginner and this may not be (definitely is not) the cream-of-the-crop coding.
3 Answers 3
I see some things that may help you improve your code.
Don't abuse using namespace std
Putting using namespace std
at the top of every program is a bad habit that you'd do well to avoid. In my own production code, I usually simply type std::
where needed. It's a little bit more typing but it has two very nice benefits: it absolutely avoids any possibility of name collisions and it makes it absolutely clear which particular namespace is being used for various items.
Make sure you have all required #include
s
The code uses std::isalpha
but doesn't #include <cctype>
or <locale>
. It's not clear which one you want here. The functions in <cctype>
match the functions you're using, but you should be aware that they are only defined for the C
locale. Read this for details.
Omit unused variables
Because word
is never used, it can and should be omitted from the program.
Use better naming
I would say that words
is a good variable name, but line
(singular) is not. Also, we have countletters
(plural) but countnum
(singular). Some consistency in naming would improve this program.
Initialize and open files in one step
Instead of separately invoking lab3.open()
I'd suggest instead writing it this way:
std::ifstream lab3{"lab3.txt"};
Or if you wanted to make the program much more flexible and allow the user to specify a file name, this could be:
std::ifstream lab3{argv[1]};
Decompose the program into smaller parts
Right now, all of the code is in main
which isn't necessarily wrong, it means that it's not only hard to reuse but also hard to troubleshoot. Better is to separate the code into small chunks. It makes it both easier to understand and easier to fix or improve. In this case, I'd suggest creating an object that looks like this:
class WordCounter
{
public:
WordCounter();
void count(std::istream &in);
friend std::ostream& operator<<(std::ostream &out, const WordCounter &w);
private:
int letters;
int nums;
int puncts;
int spaces;
int words;
int lines;
};
When you define those functions, then, the main
routine can look like this:
int main()
{
std::ifstream lab3{"lab3.txt"};
if (!lab3)
{
std::cout << "Could not open file\n";
return 1;
}
WordCounter counter;
counter.count(lab3);
std::cout << counter;
}
Use consistent formatting
Using consistent formatting helps readers of your code understand it without distraction. This code is mostly well formatted, but the indenting within the while
loop is a bit inconsistent.
Use precise terminology
The code claims to be counting sentences, but is actually counting lines. Either number will work if no lines contain more than one sentence, but I did not see any guarantee that this will always be the case.
Don't use std::endl
if you don't really need it
The difference betweeen std::endl
and '\n'
is that '\n'
just emits a newline character, while std::endl
actually flushes the stream. This can be time-consuming in a program with a lot of I/O and is rarely actually needed. It's best to only use std::endl
when you have some good reason to flush the stream and it's not very often needed for simple programs such as this one. Avoiding the habit of using std::endl
when '\n'
will do will pay dividends in the future as you write more complex programs with more I/O and where performance needs to be maximized.
Omit return 0
When a C or C++ program reaches the end of main
the compiler will automatically generate code to return 0, so there is no need to put return 0;
explicitly at the end of main
.
Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the
main
function is equivalent to calling theexit
function with the value returned by themain
function as its argument; reaching the}
that terminates themain
function returns a value of 0.
For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C99 and C++98) have maintained the same idea. We rely on automatically generated member functions in C++, and few people write explicit return;
statements at the end of a void
function. Reasons against omitting seem to boil down to "it looks weird". If, like me, you're curious about the rationale for the change to the C standard read this question. Also note that in the early 1990s this was considered "sloppy practice" because it was undefined behavior (although widely supported) at the time.
So I advocate omitting it; others disagree (often vehemently!) In any case, if you encounter code that omits it, you'll know that it's explicitly supported by the standard and you'll know what it means.
About using namespace std;
. It's best to just do, e.g., std::ifstream
etc each time. There's a lot of stuff here, and even in this example it would have made it easier for me to see that std::ispunc
was built-in, and not something you wrote for this.
Variable names: If your name is two words put together, you should use either snake_case
or camelCase
consistently. E.g. countletters
-> countLetters
. I'd find numLetters
even more readable. Some people would even say you really ought to make it numberOfLetters
. lab3
also ought to be more descriptive.
Arrange your code logically. First you declare lab3
. Then you declare word
. Then you declare a lot of other stuff. And finally, you make sure lab3
opened correctly. Doing all the ifstream
stuff in one place makes it so much easier to read/fix later on. E.g.
std::ifstream input("lab3.txt");
if (!input)
{
std::cerr << "Could not open file." << std::endl;
return 1;
}
For reading the file, personally I would use std::getline, then split each line into a vector of "words." First of all, this would it easier to check whether to count something as a word (I think yours breaks on contractions.) And then you can just loop over each character of each word of each line for the small stuff.
Misc. std::ifstream
's close()
gets called by the destructor when it goes out of scope. No need to do it manually here. Your output says you have 4 sentences. You have 4 lines. Several more sentences. There's probably not much to do about it here, but whenever I see a whole bunch of variable definitions grouped together, my first thought is to move them closer to where they're actually used. But again they're already about as close as they can get here. You never use word
.
-
\$\begingroup\$ "For reading the file, personally I would use
std::getline
..." This is a far bigger issue than you make it out to be. Reading character-by-character will be extremely slow. You absolutely need to do buffered input if you want to be able to handle anything but the most trivial of inputs. \$\endgroup\$Cody Gray– Cody Gray2017年01月25日 14:33:17 +00:00Commented Jan 25, 2017 at 14:33 -
\$\begingroup\$ @CodyGray I think that if you test, you will find that char-by-char input already is buffered in most implementations. \$\endgroup\$Edward– Edward2017年01月26日 01:21:41 +00:00Commented Jan 26, 2017 at 1:21
Provide a meaningful exit value
Some others advised not to use a return
statement at the end of the program, and not to close()
the std::ifstream
. However, this means that your program will always exit with code 0, even though some error might have happened while reading the input, even if the initial opening of the input succeeded. It is better to check the status of the input at the end of the program, and return a meaningful exit code.
Similarly, if it is the job of your program to provide a certain output,
you also want to return an error code if your program could not actually write that output. So you should also close()
the output if applicable and check it for problems. In the case of std::cout
, there is no close()
function, but you should call flush()
instead.
I suggest you add the following at the end of your program:
input.close();
std::cout.flush();
if (input.fail() || std::cout.fail())
return 1;
There are 0 words
really? \$\endgroup\$