I'm self-learning C now, and I want to find out if my code is good and clean or kinda messy.
For this exercise, the program tells if the text which was inserted has the keyword "asd", without using functions from string.h
. If so, it returns how many times it was found.
it's not a big program, but I want to make sure it's clean and good so I don't pick up any bad practices.
#include "stdafx.h"
#include <stdio.h>
char findInStr(char *str, char *find) {
unsigned char howMany = 0, strCounter = 0, findCounter = 0;
do {
if (findCounter == 3)
howMany++;
if (str[strCounter] == find[findCounter])
findCounter++;
else
findCounter = 0;
strCounter++;
} while (str[strCounter-1] != '0円');
return howMany;
}
int main() {
char str[60];
char find[5] = "asd";
char found;
printf("enter text: ");
gets_s(str, 60);
found = findInStr(str, find);
if (found)
printf("found!!! %d\n times", found);
else
printf("not found\n");
return 0;
}
3 Answers 3
The problem statement
There's an ambiguity in the requirements, because you don't specify how to handle overlapping substrings. For example, should findInStr("ababa", "aba")
return 1, or 2? Either of those may be useful, but be clear which interpretation is needed. This code appears to count non-overlapping substrings.
Non-standard header
Despite its name, stdafx.h
is not a standard header, so you can't use it in portable code. Thankfully this program can easily become standard C, if you replace gets_s()
with the equivalent fgets()
call:
if (!fgets(str, sizeof str, stdin)) {
perror("gets");
return 1;
}
Note: this is not quite equivalent (thanks to Martin R for the comment), as get_s()
discards the final newline; that won't make a difference unless we specifically look for a string ending in newline (which could now match once, rather than never).
Naming
I like the variable names you've used - I found it easy to understand your algorithm with the clear names. (They do need to be a more suitable type - see Martin's answer for a very clear explanation).
I think they would be even clearer if you give them a line each for their declarations.
Bug (or undocumented constraint)
This test assumes that find
will always be three characters long:
if (findCounter == 3)
howMany++;
We can make more versatile by counting until we see the NUL character at the end of find
, like this:
if (find[findCounter] == '0円') {
howMany++;
findCounter = 0;
}
Bug
If a character is repeated in find
, we don't start looking again in the right place. We can test this with findInStr("ababaca", "abac")
- this doesn't find the string because when we reach the second b
, that doesn't match c
and we start again from there (instead of going back to the start of the potential match).
The fix for this to subtract findCounter
from strCounter
before we reset it to zero, so we resume searching from the right place:
if (str[strCounter] == find[findCounter]) {
findCounter++;
} else {
strCounter -= findCounter;
findCounter = 0;
}
Modified version
Keeping your algorithm, but incorporating the fixes from this and other answers, I get:
#include <stdio.h>
/* Count the number of non-overlapping occurrences of find in str.
* Returns zero if either string is empty.
*/
size_t findInStr(const char *str, const char *find)
{
size_t howMany = 0;
size_t strCounter = 0;
size_t findCounter = 0;
do {
if (find[findCounter] == '0円') {
++howMany;
findCounter = 0;
}
if (str[strCounter] == find[findCounter]) {
++findCounter;
} else {
strCounter -= findCounter;
findCounter = 0;
}
} while (str[strCounter++] != '0円');
return howMany;
}
I changed the test program to accept command-line arguments; this makes for easier testing. I hope that this is educational reading (note that I check the arguments before attempting to use them).
int main(int argc, char **argv)
{
if (argc != 3) {
fprintf(stderr, "Usage: %s haystack needle\n", argv[0]);
return 1;
}
const char *const haystack = argv[1];
const char *const needle = argv[2];
const size_t found = findInStr(haystack, needle);
switch (found) {
case 0:
printf("'%s' was not found in '%s'.\n", needle, haystack);
break;
case 1:
printf("'%s' was found once in '%s'.\n", needle, haystack);
break;
default:
printf("'%s' was found %zd times in '%s'.\n", needle, found, haystack);
break;
}
return 0;
}
Pointer version
This is a bit more advanced, and harder to get right, but I'll show you what this looks like when converted to use pointers rather than indexes. A good optimizing compiler will likely generate the same code for both, but you'll want to be able to study and work with pointer code when you come across it.
size_t findInStr(const char *str, const char *const find)
{
size_t howMany = 0;
for (const char *find_pos = find; *str; ++str) {
if (*str == *find_pos) {
/* matched this char */
if (!*++find_pos) {
/* reached the end of find; count it and reset */
++howMany;
find_pos = find;
}
} else {
/* not matched - back up and continue */
str -= find_pos - find;
find_pos = find;
}
}
return howMany;
}
Here, I've also used simply *str
rather than the longer, but equivalent *str != '0円'
.
Self-test program
An alternative main()
can run a test suite, which can be useful in to avoid introducing bugs as you modify the code. Here's what I accumulated when writing the above:
/* return failure count (0 or 1) */
static int test(const char *haystack, const char *const needle, size_t expected)
{
const size_t actual = findInStr(haystack, needle);
if (actual == expected) {
return 0;
}
/* incorrect result */
fprintf(stderr,
"FAIL: findInStr(\"%s\", \"%s\") returned %zd instead of %zd\n",
haystack, needle, actual, expected);
return 1;
}
int main(void)
{
return
+ test("abc", "", 0)
+ test("abc", "d", 0)
+ test("abc", "cd", 0)
+ test("abc", "ac", 0)
+ test("abc", "ab", 1)
+ test("ab", "abc", 0)
+ test("abab", "ab", 2)
+ test("ababa", "aba", 1)
+ test("ababac", "bac", 1)
;
}
You might choose to use a preprocessor #ifdef
/#else
to be able to compile the test or interactive version from one source, or you might write two programs sharing a single findInStr()
implementation.
-
\$\begingroup\$ Goode catch about not finding strings with repeated characters! – Minor remark:
gets_s
(from the Microsoft C runtime) replaces the newline character with a0円
character, so it is not completely equivalent tofgets()
\$\endgroup\$Martin R– Martin R2018年03月09日 17:03:33 +00:00Commented Mar 9, 2018 at 17:03 -
\$\begingroup\$ Thanks @Martin - I've edited to incorporate your observation. (Not having a Microsoft platform, I don't have the man pages for their functions, so just guessed based on the name). \$\endgroup\$Toby Speight– Toby Speight2018年03月12日 08:43:09 +00:00Commented Mar 12, 2018 at 8:43
-
\$\begingroup\$ I simply looked it up in the MSDN online documentation: msdn.microsoft.com/en-us/library/5b5x9wc7.aspx :) \$\endgroup\$Martin R– Martin R2018年03月12日 08:46:14 +00:00Commented Mar 12, 2018 at 8:46
The return type
Don't use char
as the return type in
char findInStr(char *str, char *find);
On many (and on all POSIX compliant) platforms, char
is a 8-bit integer
which may be unsigned or signed, so the maximal return value would be 255
or 127
.
Another problem is that you count the number of occurrences
in an unsigned char
, which means that – for example – a count of
200
would be returned as -56
if the character type is (8 bit) signed.
A better choice would be int
(which is guaranteed to have at least
16 bits) or long
(at least 32 bits). To be completely on the safe
side, use size_t
, which is a type than can hold the size of
any object.
Const parameters
Since your function does not modify the passed strings, it is a good habit to declare them as constant:
size_t findInStr(const char *str, const char *find);
The compiler can then check that you don't (inadvertently) modify the pointed-to memory, and may be able to do further optimizations on the calling side.
It also documents that the function does not modify the strings.
Array sizes
In
char find[5] = "asd";
the array is one element too large. That does no harm, but there is a risk of forgetting to change the array size if the string on the right-hand side is modified. Better let the compiler determine the size automatically:
char find[] = "asd";
And here
char str[60];
// ...
gets_s(str, 60);
the array size is specified twice, which bears the risk of changing it at one place later but not at the other place. That is avoided with
char str[60];
// ...
gets_s(str, sizeof(str));
Bugs
You don't reset findCounter
properly, resulting in this miscount:
enter text: asdasd
found!!! 1
times
Your function tests for findCounter == 3
, but accepts a find
parameter of any length.
Style
Never omit the "optional" braces for your if
statements, especially if the statement spans more than one line. You will eventually contribute to a coding accident, and it will be your fault.
Your loop is awkward, especially the -1
in ... while (str[strCounter-1] != '0円')
. A for
loop would be more idiomatic and readable:
for (int strCounter = 0; str[strCounter] != '0円'; strCounter++) {
if (str[strCounter] == find[findCounter]) {
findCounter++;
if (findCounter == findLen) {
findCounter = 0;
howMany++;
}
} else {
findCounter = 0;
}
}
Avoid using unsigned
integers; they cause more trouble than they are worth.
-
\$\begingroup\$ thanks, agree with everything except the -1 problem. if you omit it, the function won't recognize the asd if it's written at the end. unless you have another idea? \$\endgroup\$Ariel Ariel– Ariel Ariel2018年03月08日 22:39:06 +00:00Commented Mar 8, 2018 at 22:39
-
1\$\begingroup\$ I've been wondering about
signed
VSunsigned
for a while now and it seems to be a very heated debate. \$\endgroup\$yuri– yuri2018年03月08日 22:44:53 +00:00Commented Mar 8, 2018 at 22:44 -
\$\begingroup\$ You need to do
-1
only because you incrementedstrCounter
prematurely. If you had used thefor
loop as I suggested, or if you had used... while (str[strCounter++] != '0円')
, then it would have also worked, but less awkwardly. \$\endgroup\$200_success– 200_success2018年03月08日 22:46:59 +00:00Commented Mar 8, 2018 at 22:46 -
1\$\begingroup\$ Disagreed on the for loop brackets: it's a basic language feature that everyone should get used to. Also disagreed on the unsigned int advice. Using unsigned ints will make you think properly about what should and shouldn't be unsigned, and put asserts and static casts in the correct places, which means you catch errors as soon as possible, not as late as possible, like that article suggests (while messing around with memset of all things). \$\endgroup\$user673679– user6736792018年03月09日 10:03:40 +00:00Commented Mar 9, 2018 at 10:03
-
1\$\begingroup\$ @user673679, I'm with you on the unsigned types, but with 200_success on the use of braces (but only since the notorious Apple
goto fail
bug), despite mitigations such asgcc -Wmisleading-indentation
). Just shows that these two observations are somewhat opinionated rather than absolute. \$\endgroup\$Toby Speight– Toby Speight2018年03月09日 10:38:33 +00:00Commented Mar 9, 2018 at 10:38
Explore related questions
See similar questions with these tags.