Working on a C programming question where I have to write a function which will determine how many words are in a given string. Assume that one or more consecutive white spaces is a delimiter between words, and that the string you pass to your function is null terminated.
Have to use pointers and only the #include(stdio.h) library. Wondering how this can be improved upon or if there are any possible errors. Here it is:
#include <stdio.h>
int word_counter(char string[])
{
//We start with first word unless we have a empty string then we have no words
int count;
if(*string!='0円'){
count=1;
}
else{
count=0;
return 0;
}
//while we dont reach the end of the string
while(*string!='0円'){
//if we detect a whitespace
if(*string==' '){
//get previous character
string--;
// If previous character is not a space we increase the count
// Otherwise we dont since we already counted a word
if(*string!=' '){
count++;
}
//return pointer to current character
string++;
}
// set pointer to next character
string++;
}
return count;
}
//just to test if it works
int main(void)
{
char str[] = "Hello World!";
printf("How many words? = %i\n", word_counter(str));
return 0;
}
-
\$\begingroup\$ off topic, and sorry for that. But once again, I am glad to have migrated to Python: print(len("Hello World!".split())) \$\endgroup\$steffen– steffen2017年03月31日 10:24:29 +00:00Commented Mar 31, 2017 at 10:24
4 Answers 4
isspace
There are other characters besides for spaces. What happens if I do:
Hello<tab><tab>world!
Your code will report that there is one word. I would rewrite these:
if(*string==' '){
//get previous character
string--;
// If previous character is not a space we increase the count
// Otherwise we dont since we already counted a word
if(*string!=' '){
count++;
}
You should instead use isspace
for these kind of things. ' '
is for explicitly a space.
Indentation
Fix your indentation. Your main
uses 4 spaces, your word_counter
uses (maybe?) 2. Be sure that it is consistent. Choose one or the other.
Empty case rework
(Actually, as @200_success points out you don't need this corner case. I'm going to leave this up here though, because sometimes you will get cases like this and you should consider reworking them if they appear awkward)
Your empty corner case can be reworked:
int count;
if(*string!='0円'){
count=1;
}
else{
count=0;
return 0;
}
First you don't need to set count = 0
if you just return immediately. I would restructure you if
statement to be:
if (*string == '0円') {
return 0;
}
And from there continue with:
int count = 1;
This means we don't leave count
uninitialized either.
-
2\$\begingroup\$ I don't believe that there should be a special case for
if (*string == '0円') { return 0; }
. Rather, it should just initializeint count = 0;
. A zero-length string could then be handled using the general case. \$\endgroup\$200_success– 200_success2017年03月30日 20:33:49 +00:00Commented Mar 30, 2017 at 20:33 -
2\$\begingroup\$ Technically
isspace()
cannot be a solution because it requiresctype.h
, a header other thanstdio.h
, but any company or homework that disallows#include <ctype.h>
is just plain dumb. That being said, it's theoretically possible to do it with juststdio.h
functions; You'd dofor(i=0; sscanf(str, " %*s%n", &bytesRead) == 1; i++, str+=bytesRead){}
(obviously with proper error checking). \$\endgroup\$Iwillnotexist Idonotexist– Iwillnotexist Idonotexist2017年03月30日 20:35:12 +00:00Commented Mar 30, 2017 at 20:35 -
\$\begingroup\$ @200_success: Ok, I am going to leave it up there though as it should be a lesson for reworking awkward base cases. \$\endgroup\$Dair– Dair2017年03月30日 20:36:54 +00:00Commented Mar 30, 2017 at 20:36
-
\$\begingroup\$ Additionally don't use different brace positioning. The if statements and while loops use a brace on the same line but your functions use a brace on the next line. Arguments can be made for and against either convention but pick one and stick with it for everything with the exception of single line statements. \$\endgroup\$DavidTheWin– DavidTheWin2017年03月31日 11:58:04 +00:00Commented Mar 31, 2017 at 11:58
This section is verbose and buggy:
if(*string==' '){ //get previous character string--; // If previous character is not a space we increase the count // Otherwise we dont since we already counted a word if(*string!=' '){ count++; } //return pointer to current character string++; }
Fist of all, it could be simplified to this (and written with more conventional whitespace):
if (*string == ' ') {
// If previous character is not a space we increase the count
// Otherwise we don't since we already counted a word
if (*(string - 1) != ' ') {
count++;
}
}
But what if the input to the function begins with a space? You would try to look at the preceding character. There isn't any, so your function would have undefined behaviour (likely crashing).
A good remedy for this bug is to use the logic in @Edward's solution, using a variable to keep track of the class of the previous character.
I see a number of things that may help you improve your code.
Use const
where possible
The word_counter
function does not (and should not) alter the passed string and so the parameter should therefore be declared const
.
int word_counter(const char string[])
Evaluate each character only once
There's no need to back up and test the previous character to see if it's whitespace or not -- it was already evaluated the last time through the loop! What's needed then, is just to remember that result. One way to do that would be to use a boolean variable to keep track of whether we're in a word or not in a word. Here's a rework showing how that might look:
#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>
int word_counter(const char string[])
{
int count = 0;
for (bool inword = false; *string; ++string) {
if (isspace(*string)) {
if (inword) {
inword = false;
}
} else { // not whitespace
if (!inword) {
inword = true;
++count;
}
}
}
return count;
}
As noted in the other review, you should use isspace()
because tab, spaces, form-feeds, vertical tabs and newlines are all things that might separate words.
Understand the concept of a locale
It's often ignored or overlooked, but the isspace
function and its siblings in ctype.h
may change behavior under POSIX or POSIX-like environments, depending on the locale
that's currently in use. It's not a reason to avoid using these functions, but it's good to be aware of the subtle details.
Omit return 0
When a C or C++ program reaches the end of main
the compiler will automatically generate code to return 0, so there is no need to put return 0;
explicitly at the end of main
.
Note: when I make this suggestion, it's almost invariably followed by one of two kinds of comments: "I didn't know that." or "That's bad advice!" My rationale is that it's safe and useful to rely on compiler behavior explicitly supported by the standard. For C, since C99; see ISO/IEC 9899:1999 section 5.1.2.2.3:
[...] a return from the initial call to the
main
function is equivalent to calling theexit
function with the value returned by themain
function as its argument; reaching the}
that terminates themain
function returns a value of 0.
For C++, since the first standard in 1998; see ISO/IEC 14882:1998 section 3.6.1:
If control reaches the end of main without encountering a return statement, the effect is that of executing return 0;
All versions of both standards since then (C99 and C++98) have maintained the same idea. We rely on automatically generated member functions in C++, and few people write explicit return;
statements at the end of a void
function. Reasons against omitting seem to boil down to "it looks weird". If, like me, you're curious about the rationale for the change to the C standard read this question. Also note that in the early 1990s this was considered "sloppy practice" because it was undefined behavior (although widely supported) at the time.
So I advocate omitting it; others disagree (often vehemently!) In any case, if you encounter code that omits it, you'll know that it's explicitly supported by the standard and you'll know what it means.
One thing that sticks out to me is your backstepping to look at the previous character in the event of a space character, specifically:
if(*string==' '){
string--;
if(*string!=' '){
//whatever you did here
}
}
What happens if the first character is a space? You're going to go into string[-1], or segmentation fault. As someone previously mentioned you shouldn't decrement your index here, you should just do:
if(*string - 1 != ' ')
and I would add that you should do a check here such as:
if(*string != string && *(string - 1) != ' ')
to keep yourself from going out of bounds.
Also, modularity is your friend, it makes your code more easily digested, I'd make a function which checks to see if a character is a specific character like:
int isExpectedChar(char current, char expected){
return current == expected;
}
I know it's simple, but it's an important part of your logic that can be reused and me as a reader can quickly look at the expression and see that you're checking to see if it's a specific character by giving it a good function name.
Also, this is nit-picky and opinionated, and perhaps your university (or whomever gave you this assignment) has a style guide, but a bit of spacing really improves readability of code. Give it some space!
if(*string==' '){
if(*string == ' ') {
and also with arithmetic
if(*(string-1)==' '){
if(*(string - 1) == ' ') {
Just seems to me that things are a lot more legible when they're not crammed together.
Edit: fixed syntax error.
-
1\$\begingroup\$
*string - 1 != ' '
isn't quite right — it is equivalent to checking*string != '!'
. \$\endgroup\$200_success– 200_success2017年03月31日 18:21:37 +00:00Commented Mar 31, 2017 at 18:21 -
\$\begingroup\$ Ah gotcha. My pointer arithmetic is a bit rusty since I rarely work in C anymore \$\endgroup\$user138741– user1387412017年03月31日 18:33:29 +00:00Commented Mar 31, 2017 at 18:33
-
\$\begingroup\$ So are you going to update the answer? \$\endgroup\$vol7ron– vol7ron2017年03月31日 22:04:19 +00:00Commented Mar 31, 2017 at 22:04
-
\$\begingroup\$ I did. See edit message at bottom of post. \$\endgroup\$user138741– user1387412017年03月31日 22:07:45 +00:00Commented Mar 31, 2017 at 22:07
-
\$\begingroup\$ There are still errors. Try writing a complete solution using your advice and you might see them. \$\endgroup\$200_success– 200_success2017年04月01日 05:07:33 +00:00Commented Apr 1, 2017 at 5:07