Recreating strstr in C

Question 1

I'm trying to recreate strstr in c, my code works but I think it is too long, and it needs some improvement.

#include <stdlib.h>
static int find_str_indx(char *str, int index, char *str_to_find)
{
 int i;
 int temp;
 int found_index;
 while (str[index] != '0円')
 {
 if (str[index] == *str_to_find)
 {
 temp = index;
 i = 0;
 found_index = index;
 while (str_to_find[i] != '0円' && str[index] != '0円')
 {
 if(str_to_find[i] != str[index])
 found_index = -1;
 i++;
 index++;
 }
 if (found_index == -1)
 index = temp;
 }
 index++;
 }
 return (found_index);
}
char *my_strstr(char *str, char *to_find)
{
 char *found_str;
 int found_index;
 int i;
 found_str = (char*)malloc(sizeof(found_str));
 found_index = find_str_indx(str, 0, to_find);
 if (found_index != -1)
 {
 i = 0;
 while (str[found_index] != '0円')
 {
 found_str[i] = str[found_index];
 found_index++;
 i++;
 }
 }
 else
 found_str = NULL;
 return (found_str);
}

UPDATE

I have rewritten the code to get rid of allocating memory inside strstr, as advised by ratchet freak and Jerry Coffin.

char *my_strstr(const char *haystack, const char *needle)
{
 size_t needle_len;
 needle_len = strlen(needle);
 while (*haystack)
 {
 if (*haystack == *needle)
 {
 if (!strncmp(haystack, needle, needle_len))
 return ((char *)haystack);
 }
 haystack++;
 }
 return (NULL);
}

Question 2

fatal flaw in there, sizeof(found_str) is always the size of a pointer (4 or 8 depending on 32 or 64 bit) so it's very likely that you will overflow that buffer. Not to mention strstr shouldn't allocate anything in the first place. You also leak the found_str if you didn't find the substring.

Question 3

Suggest my_strstr() document/test its intended/actual functionality with corner cases of my_strstr("", "needle"), my_strstr("haystack", ""), my_strstr("", "").

Question 4

This strikes me as somewhat...clumsy code that makes the job more difficult than truly necessary.

I'd break the task up into pieces, something like:

for every candidate position in the haystack
 if the needle matches at this point
 return success at this point
return failure

So that leaves us with two subordinate tasks:

Figure out the candidate positions
Check for a match starting at a specified point

The first is pretty simple: the candidate positions are the beginning of the haystack through the length of the haystack minus the length of the needle (and if the needle is larger than the haystack, it can't possibly match).

The second is even simpler: step through both strings, and if they don't match at a given point, return indicating failure. If (and only if) you reach the end of the second without a mismatch, return indicating success.

So let's write some code for that:

size_t length(char const *s) {
 size_t i;
 for (i = 0; s[i]; i++)
 ;
 return i;
}
bool match(char const *a, char const *b) {
 while (*b)
 if (*a++ != *b++)
 return false;
 return true;
}

Then we can do the function itself:

char const *find_substr(char const *haystack, char const *needle) {
 size_t len1 = length(haystack);
 size_t len2 = length(needle);
 for (size_t i = 0; i+len2<=len1; i++)
 if (match(haystack + i, needle))
 return haystack + i;
 return NULL;
}

Then we probably want a bit of test code to verify that it works to at least some degree under some circumstances:

char const *check(char const *s) {
 if (s)
 return s;
 return "[NULL POINTER]";
}
#define elements(arr) (sizeof(arr)/sizeof(arr[0]))
int main() {
 char input1[] = "This is some input for the function";
 // tests: match middle, non-match, match begin, match end, empty string
 char const *tests[] = { "for", "was", "This", "ion", "" };
 for (size_t = 0; i < elements(tests); i++)
 printf("%s\n", check(find_substr(input1, tests[i])));
 printf("%s\n", check(find_substr("the", "this is a longer string")));
}

There are certainly other ways the job could be done, but that seems to me like it's at least a reasonable way to implement what's probably the simplest (and most naive) one.

Question 5

You might want to make i unsigned or size_t instead of int within main.

Question 6

@Edward: Good point (though I can't quite imagine having more than 32767 test cases to step through, so I'm pretty sure int remains adequate indefinitely).

Question 7

bool match(char const *a, char const *b) has asymmetric functionality between a,b with while (*b). Maybe use bool match(char const *haystack, char const *needle)?

Question 8

A good reason for using size_t over int, even with short list, it that it avoids the pedantic warnings of mixing signed/unsigned compare with i < elements(tests).

Question 9

@JerryCoffin Thanks, I have rewritten my code and followed your pseudo-code.

Question 10

Below are some of my rules. Yours may differ.

Embrace the pointer, and pointer auto-increment. Indexes are ok, but so are pointers.
Avoid the trap that all if statements must test a boolean value. In C if statements can evaluate or test char, int, or even float.
The test if (*a == '0円') is ok, but if (*a) is shorter. Shorter is better.
Dont feel guilty for putting a return in the middle of your code. Not all subroutines must return or exit out of the bottom.
Some prefer while (1){ }, but for (;;){ } is also pretty. Maybe even prettier.
It's not really an infinite loop if it has a break or return in it.
Avoid using break; where return; would be better.
Avoid nesting loops and if statements three or more deep.
Small is simple. Simple is correct. Go for correct first. Then speed it up if you must.
Once I write some code, I first ask myself, is it correct and can I make it smaller?

My favorite C implementation of strstr. It's not boyer-moore fast, but it is small.

 char * strstr (const char *haystack, const char *needle) {
 const char * a = haystack, *b = needle;
 for (;;) {
 if ( !*b ) return NULL;
 if ( !*a ) return haystack;
 if ( *a++ == *b++ ) continue;
 a = ++haystack;
 b = s2;
 }
 }

Question 11

"The test if (*a == '0円') is ok, but if (*a) is shorter. Shorter is better.": Agree, but typo: should be if (!*a)

Question 12

"Shorter is better." Go to codegolf.stackexchange.com and tell me if the C code you find there would get any good reviews on this site :)

Question 13

I agree with Jerry Coffin on the overdone part. I recommend another algorithm however, which roughly is:

char* haystack;
char* needle;
int currentMatch = 0;
size_t hay_len = strlen(haystack);
size_t nee_len = strlen(needle);
for(int i = 0;i<hay_len;i++) {
 if(haystack[i] == needle[currentMatch]) {
 if(++currentMatch == nee_len) return i - currentMatch; // may or may not be off by one
 }else currentMatch = 0;
}

This is as optimized as I ever got for search for a pattern in another (ie a string in a string).

Jerry Coffin Jerry Coffin 34.1k4 gold badges77 silver badges144 bronze badges · Accepted Answer · 2016-05-11 16:04:16Z

This strikes me as somewhat...clumsy code that makes the job more difficult than truly necessary.

I'd break the task up into pieces, something like:

for every candidate position in the haystack
 if the needle matches at this point
 return success at this point
return failure

So that leaves us with two subordinate tasks:

Figure out the candidate positions
Check for a match starting at a specified point

The first is pretty simple: the candidate positions are the beginning of the haystack through the length of the haystack minus the length of the needle (and if the needle is larger than the haystack, it can't possibly match).

The second is even simpler: step through both strings, and if they don't match at a given point, return indicating failure. If (and only if) you reach the end of the second without a mismatch, return indicating success.

So let's write some code for that:

size_t length(char const *s) {
 size_t i;
 for (i = 0; s[i]; i++)
 ;
 return i;
}
bool match(char const *a, char const *b) {
 while (*b)
 if (*a++ != *b++)
 return false;
 return true;
}

Then we can do the function itself:

char const *find_substr(char const *haystack, char const *needle) {
 size_t len1 = length(haystack);
 size_t len2 = length(needle);
 for (size_t i = 0; i+len2<=len1; i++)
 if (match(haystack + i, needle))
 return haystack + i;
 return NULL;
}

Then we probably want a bit of test code to verify that it works to at least some degree under some circumstances:

char const *check(char const *s) {
 if (s)
 return s;
 return "[NULL POINTER]";
}
#define elements(arr) (sizeof(arr)/sizeof(arr[0]))
int main() {
 char input1[] = "This is some input for the function";
 // tests: match middle, non-match, match begin, match end, empty string
 char const *tests[] = { "for", "was", "This", "ion", "" };
 for (size_t = 0; i < elements(tests); i++)
 printf("%s\n", check(find_substr(input1, tests[i])));
 printf("%s\n", check(find_substr("the", "this is a longer string")));
}

There are certainly other ways the job could be done, but that seems to me like it's at least a reasonable way to implement what's probably the simplest (and most naive) one.

You might want to make i unsigned or size_t instead of int within main.
@Edward: Good point (though I can't quite imagine having more than 32767 test cases to step through, so I'm pretty sure int remains adequate indefinitely).
bool match(char const *a, char const *b) has asymmetric functionality between a,b with while (*b). Maybe use bool match(char const *haystack, char const *needle)?
A good reason for using size_t over int, even with short list, it that it avoids the pedantic warnings of mixing signed/unsigned compare with i < elements(tests).
@JerryCoffin Thanks, I have rewritten my code and followed your pseudo-code.

Stack Exchange Network

Recreating strstr in C

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Recreating strstr in C

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions