I'm trying to recreate strstr
in c, my code works but I think it is too long, and it needs some improvement.
#include <stdlib.h>
static int find_str_indx(char *str, int index, char *str_to_find)
{
int i;
int temp;
int found_index;
while (str[index] != '0円')
{
if (str[index] == *str_to_find)
{
temp = index;
i = 0;
found_index = index;
while (str_to_find[i] != '0円' && str[index] != '0円')
{
if(str_to_find[i] != str[index])
found_index = -1;
i++;
index++;
}
if (found_index == -1)
index = temp;
}
index++;
}
return (found_index);
}
char *my_strstr(char *str, char *to_find)
{
char *found_str;
int found_index;
int i;
found_str = (char*)malloc(sizeof(found_str));
found_index = find_str_indx(str, 0, to_find);
if (found_index != -1)
{
i = 0;
while (str[found_index] != '0円')
{
found_str[i] = str[found_index];
found_index++;
i++;
}
}
else
found_str = NULL;
return (found_str);
}
UPDATE
I have rewritten the code to get rid of allocating memory inside strstr
, as advised by ratchet freak
and Jerry Coffin
.
char *my_strstr(const char *haystack, const char *needle)
{
size_t needle_len;
needle_len = strlen(needle);
while (*haystack)
{
if (*haystack == *needle)
{
if (!strncmp(haystack, needle, needle_len))
return ((char *)haystack);
}
haystack++;
}
return (NULL);
}
3 Answers 3
This strikes me as somewhat...clumsy code that makes the job more difficult than truly necessary.
I'd break the task up into pieces, something like:
for every candidate position in the haystack
if the needle matches at this point
return success at this point
return failure
So that leaves us with two subordinate tasks:
- Figure out the candidate positions
- Check for a match starting at a specified point
The first is pretty simple: the candidate positions are the beginning of the haystack through the length of the haystack minus the length of the needle (and if the needle is larger than the haystack, it can't possibly match).
The second is even simpler: step through both strings, and if they don't match at a given point, return indicating failure. If (and only if) you reach the end of the second without a mismatch, return indicating success.
So let's write some code for that:
size_t length(char const *s) {
size_t i;
for (i = 0; s[i]; i++)
;
return i;
}
bool match(char const *a, char const *b) {
while (*b)
if (*a++ != *b++)
return false;
return true;
}
Then we can do the function itself:
char const *find_substr(char const *haystack, char const *needle) {
size_t len1 = length(haystack);
size_t len2 = length(needle);
for (size_t i = 0; i+len2<=len1; i++)
if (match(haystack + i, needle))
return haystack + i;
return NULL;
}
Then we probably want a bit of test code to verify that it works to at least some degree under some circumstances:
char const *check(char const *s) {
if (s)
return s;
return "[NULL POINTER]";
}
#define elements(arr) (sizeof(arr)/sizeof(arr[0]))
int main() {
char input1[] = "This is some input for the function";
// tests: match middle, non-match, match begin, match end, empty string
char const *tests[] = { "for", "was", "This", "ion", "" };
for (size_t = 0; i < elements(tests); i++)
printf("%s\n", check(find_substr(input1, tests[i])));
printf("%s\n", check(find_substr("the", "this is a longer string")));
}
There are certainly other ways the job could be done, but that seems to me like it's at least a reasonable way to implement what's probably the simplest (and most naive) one.
-
1\$\begingroup\$ You might want to make
i
unsigned
orsize_t
instead ofint
withinmain
. \$\endgroup\$Edward– Edward2016年05月11日 16:54:09 +00:00Commented May 11, 2016 at 16:54 -
\$\begingroup\$ @Edward: Good point (though I can't quite imagine having more than 32767 test cases to step through, so I'm pretty sure
int
remains adequate indefinitely). \$\endgroup\$Jerry Coffin– Jerry Coffin2016年05月11日 16:55:44 +00:00Commented May 11, 2016 at 16:55 -
\$\begingroup\$
bool match(char const *a, char const *b)
has asymmetric functionality betweena,b
withwhile (*b)
. Maybe usebool match(char const *haystack, char const *needle)
? \$\endgroup\$chux– chux2016年05月12日 02:56:34 +00:00Commented May 12, 2016 at 2:56 -
1\$\begingroup\$ A good reason for using
size_t
overint
, even with short list, it that it avoids the pedantic warnings of mixing signed/unsigned compare withi < elements(tests)
. \$\endgroup\$chux– chux2016年05月12日 02:58:17 +00:00Commented May 12, 2016 at 2:58 -
\$\begingroup\$ @JerryCoffin Thanks, I have rewritten my code and followed your pseudo-code. \$\endgroup\$Junius L– Junius L2016年05月13日 08:21:35 +00:00Commented May 13, 2016 at 8:21
Below are some of my rules. Yours may differ.
- Embrace the pointer, and pointer auto-increment. Indexes are ok, but so are pointers.
- Avoid the trap that all if statements must test a boolean value. In C if statements can evaluate or test char, int, or even float.
- The test
if (*a == '0円')
is ok, butif (*a)
is shorter. Shorter is better. - Dont feel guilty for putting a return in the middle of your code. Not all subroutines must return or exit out of the bottom.
- Some prefer
while (1){ }
, butfor (;;){ }
is also pretty. Maybe even prettier. - It's not really an infinite loop if it has a
break
orreturn
in it. - Avoid using
break;
wherereturn;
would be better. - Avoid nesting loops and if statements three or more deep.
- Small is simple. Simple is correct. Go for correct first. Then speed it up if you must.
- Once I write some code, I first ask myself, is it correct and can I make it smaller?
My favorite C implementation of strstr. It's not boyer-moore fast, but it is small.
char * strstr (const char *haystack, const char *needle) {
const char * a = haystack, *b = needle;
for (;;) {
if ( !*b ) return NULL;
if ( !*a ) return haystack;
if ( *a++ == *b++ ) continue;
a = ++haystack;
b = s2;
}
}
-
\$\begingroup\$ "The test
if (*a == '0円')
is ok, butif (*a)
is shorter. Shorter is better.": Agree, but typo: should beif (!*a)
\$\endgroup\$alx - recommends codidact– alx - recommends codidact2019年12月19日 14:23:03 +00:00Commented Dec 19, 2019 at 14:23 -
\$\begingroup\$ "Shorter is better." Go to codegolf.stackexchange.com and tell me if the C code you find there would get any good reviews on this site :) \$\endgroup\$G. Sliepen– G. Sliepen2019年12月19日 14:45:16 +00:00Commented Dec 19, 2019 at 14:45
I agree with Jerry Coffin on the overdone part. I recommend another algorithm however, which roughly is:
char* haystack;
char* needle;
int currentMatch = 0;
size_t hay_len = strlen(haystack);
size_t nee_len = strlen(needle);
for(int i = 0;i<hay_len;i++) {
if(haystack[i] == needle[currentMatch]) {
if(++currentMatch == nee_len) return i - currentMatch; // may or may not be off by one
}else currentMatch = 0;
}
This is as optimized as I ever got for search for a pattern in another (ie a string in a string).
sizeof(found_str)
is always the size of a pointer (4 or 8 depending on 32 or 64 bit) so it's very likely that you will overflow that buffer. Not to mention strstr shouldn't allocate anything in the first place. You also leak thefound_str
if you didn't find the substring. \$\endgroup\$my_strstr()
document/test its intended/actual functionality with corner cases ofmy_strstr("", "needle")
,my_strstr("haystack", "")
,my_strstr("", "")
. \$\endgroup\$