This is a simple version from the function strstr. It returns the address of the first occurrence of the substring s2 in s1. I want to know possible problems in the code, and how to improve it, in general.
#include <stdio.h>
/* Version of strstr, which returns the adress of
the first occurrence of s2 in s1, else, NULL
*/
char *strs(char *s1, char* s2)
{
char *ptr = NULL;
int i, n;
for (i = 0, n = 0; s1[i] != '0円'; i++)
{
if (s1[i] == s2[n])
{
if (n == 0)
ptr = &(s1[i]);
if (s2[n + 1] == '0円')
return ptr;
++n;
}
else
n = 0;
}
return NULL;
}
int main(void)
{
char s1[] = "I'm waiting in my cold cell\nWhen the bell begins to chime\n";
char s2[] = "waiting";
char *ptr = strs(s1, s2);
if (ptr == NULL)
{
printf("There is no occurrence of \"%s\" in the first string!\n",
s2);
}
else
{
printf("%s\n", ptr);
}
}
1 Answer 1
Comments about your code:
I believe that you would rename it to something like strstr
if you could.
You should accept pointers to const char
, since you don't modify them.
You should initialize i
and n
right inside the loop. It will narrow their scope and make the code more maintainable since it won't pollute the scope of the function. On top of that, they will be deallocated whenever the loop ends.
using int
for index is a very bad idea. The biggest issues with it that it is signed and not guaranteed to be large enough. You should use size_t
instead, because it is guaranteed that size_t
will have enough capacity to index array of maximum possible size.
You're increasing overhead by using subscript ([]
) operator.
Loop could be rewritten in other, more readable and straightforward form, one of which is shown in suggested implementation. I know that you would like to have high performance, but I don't think that it is that important in this situation. May be with more context of the usage of the algorithm I could suggest particular algorithm (there are quite a few of them). The best in my opinion is Boyer–Moore string search algorithm.
Suggested implementation:
I think that strncmp
should be used. I know that including string.h
header defeats the purpose, but you could write your own. It increases code reuse in C!
To make it work we just need to precalculate length of the substring.
const char *strs(const char *text, const char *substr)
{
size_t substrlen = strlen(substr);
while (*text != 0)
{
if (strncmp(text, substr, substrlen) == 0)
{
return text;
}
++text;
}
return NULL;
}
The performance gain from the data being const depends on physical constness and the compiler. Further reading.
-
1\$\begingroup\$ Thanks for the answer! I really considered using strncmp, but it just felt like cheating. By the way, should I use const char in the pointer because makes the code more readable, or because makes it faster? (or both) \$\endgroup\$Lúcio Cardoso– Lúcio Cardoso2016年08月07日 18:37:22 +00:00Commented Aug 7, 2016 at 18:37
-
1\$\begingroup\$ First reason + it prevents from occasional modifying string. The semantic of the function is that the function doesn't modify the string, so it will make it easier for people to understand that it for sure (not considering casts) won't modify the string. About the performance, it depends on the data being physically const. Unfortunately I don't have enough knowledge about it. Probably another review will be able to answer the question. I will upvote it so people could see it. \$\endgroup\$Incomputable– Incomputable2016年08月07日 18:39:33 +00:00Commented Aug 7, 2016 at 18:39
-
\$\begingroup\$ @LúcioCardoso, updated the answer with relevant SO posts. Have a nice read. \$\endgroup\$Incomputable– Incomputable2016年08月07日 18:47:29 +00:00Commented Aug 7, 2016 at 18:47
-
\$\begingroup\$ "The algorithm can be written to use less memory and possibly incur less overhead." can you provide an explanation to make this actually meaningful? \$\endgroup\$Jeroen Vannevel– Jeroen Vannevel2016年08月07日 23:17:51 +00:00Commented Aug 7, 2016 at 23:17
-
\$\begingroup\$ @JeroenVannevel, all the performance, both space and time, optimizations listed below that sentence are provided to justify it. I wanted to stress on possibly because I don't want to make overstatement since I didn't make any measurements. Suggested implementation probably uses less memory, since size_t is usually equal or greater than int. There are more ways to perform optimizations, so I think making too qualified statement would be inaccurate \$\endgroup\$Incomputable– Incomputable2016年08月07日 23:21:10 +00:00Commented Aug 7, 2016 at 23:21