4
\$\begingroup\$

This is a simple version from the function strstr. It returns the address of the first occurrence of the substring s2 in s1. I want to know possible problems in the code, and how to improve it, in general.

#include <stdio.h>
/* Version of strstr, which returns the adress of
 the first occurrence of s2 in s1, else, NULL
*/
char *strs(char *s1, char* s2)
{
 char *ptr = NULL;
 int i, n;
 for (i = 0, n = 0; s1[i] != '0円'; i++)
 {
 if (s1[i] == s2[n])
 {
 if (n == 0)
 ptr = &(s1[i]);
 if (s2[n + 1] == '0円')
 return ptr;
 ++n;
 }
 else
 n = 0;
 }
 return NULL;
}
int main(void)
{
 char s1[] = "I'm waiting in my cold cell\nWhen the bell begins to chime\n";
 char s2[] = "waiting";
 char *ptr = strs(s1, s2);
 if (ptr == NULL)
 {
 printf("There is no occurrence of \"%s\" in the first string!\n",
 s2);
 }
 else
 {
 printf("%s\n", ptr);
 }
}
asked Aug 7, 2016 at 17:16
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

Comments about your code:

I believe that you would rename it to something like strstr if you could.

You should accept pointers to const char, since you don't modify them.

You should initialize i and n right inside the loop. It will narrow their scope and make the code more maintainable since it won't pollute the scope of the function. On top of that, they will be deallocated whenever the loop ends.

using int for index is a very bad idea. The biggest issues with it that it is signed and not guaranteed to be large enough. You should use size_t instead, because it is guaranteed that size_t will have enough capacity to index array of maximum possible size.

You're increasing overhead by using subscript ([]) operator.

Loop could be rewritten in other, more readable and straightforward form, one of which is shown in suggested implementation. I know that you would like to have high performance, but I don't think that it is that important in this situation. May be with more context of the usage of the algorithm I could suggest particular algorithm (there are quite a few of them). The best in my opinion is Boyer–Moore string search algorithm.

Suggested implementation:

I think that strncmp should be used. I know that including string.h header defeats the purpose, but you could write your own. It increases code reuse in C!

To make it work we just need to precalculate length of the substring.

const char *strs(const char *text, const char *substr)
{
 size_t substrlen = strlen(substr);
 while (*text != 0)
 {
 if (strncmp(text, substr, substrlen) == 0)
 {
 return text;
 }
 ++text;
 }
 return NULL;
}

The performance gain from the data being const depends on physical constness and the compiler. Further reading.

About performance gain..

answered Aug 7, 2016 at 17:56
\$\endgroup\$
8
  • 1
    \$\begingroup\$ Thanks for the answer! I really considered using strncmp, but it just felt like cheating. By the way, should I use const char in the pointer because makes the code more readable, or because makes it faster? (or both) \$\endgroup\$ Commented Aug 7, 2016 at 18:37
  • 1
    \$\begingroup\$ First reason + it prevents from occasional modifying string. The semantic of the function is that the function doesn't modify the string, so it will make it easier for people to understand that it for sure (not considering casts) won't modify the string. About the performance, it depends on the data being physically const. Unfortunately I don't have enough knowledge about it. Probably another review will be able to answer the question. I will upvote it so people could see it. \$\endgroup\$ Commented Aug 7, 2016 at 18:39
  • \$\begingroup\$ @LúcioCardoso, updated the answer with relevant SO posts. Have a nice read. \$\endgroup\$ Commented Aug 7, 2016 at 18:47
  • \$\begingroup\$ "The algorithm can be written to use less memory and possibly incur less overhead." can you provide an explanation to make this actually meaningful? \$\endgroup\$ Commented Aug 7, 2016 at 23:17
  • \$\begingroup\$ @JeroenVannevel, all the performance, both space and time, optimizations listed below that sentence are provided to justify it. I wanted to stress on possibly because I don't want to make overstatement since I didn't make any measurements. Suggested implementation probably uses less memory, since size_t is usually equal or greater than int. There are more ways to perform optimizations, so I think making too qualified statement would be inaccurate \$\endgroup\$ Commented Aug 7, 2016 at 23:21

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.