In this strstr
implementation, I am basically skipping the already matched and checked characters with an if else
condition. Is this a right way of implementing it?
char *strstrImp(char *input, char *find) {
if(*find=='0円')
return input;
if(*input=='0円')
return NULL;
char *start=input;
char *p;
while(*start!='0円')
{
p=start;
char *q=find;
while(*p!='0円' && *q!='0円' && *p==*q)
{
p++;
q++;
}
if(*q=='0円')
{
return start;
}
//update the starting position based on whether any character matched
if(p == start) //no match
start++;
else
start = p; //skip characters and start from the previously left place
}
return NULL;
}
int main()
{
char * res;
res = strstrImp("helloworld","low");
printf("%s",res);
}
4 Answers 4
Const-correctness
Since you tagged this question as c++, you must take care of const-correctness:
cr54045.cpp:36:18: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
res = strstrImp("helloworld","low");
^
cr54045.cpp:36:31: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
res = strstrImp("helloworld","low");
^
There should be a
const char *strstrImp(const char *input, const char *find)
{
return strstrImp(const_cast<char *>(input), find);
}
Incorrect optimization
Try this test case, which compares your implementation against the standard strstr()
.
int main()
{
const char *input = "alfalfalfa romeo", *find = "alfalfa ";
printf("%s\n", strstrImp(input, find)); // (null)
printf("%s\n", strstr(input, find)); // alfalfa romeo
}
Logic simplification
Since the start = p
optimization is bogus, we will only end the loop with start++
. The function body would then look like:
if (*find == '0円')
return input;
if (*input == '0円')
return NULL;
char *start = input;
while (*start != '0円')
{
...
start++;
}
return NULL;
Next, notice that the if (*input == '0円') return NULL
special case is redundant. Also notice that you never refer to input
again, so you might as well think of start
and input
as the same thing. The outline simplifies to:
if (*find == '0円')
return input;
while (*input != '0円')
{
...
input++;
}
return NULL;
Next, let's turn our attention inside the outer loop.
Since p
is only ever used inside the loop, you should pull its declaration inside the loop. p
and q
should both be const char *
.
while (*input != '0円')
{
const char *p = input, *q = find;
while (*p != 0 && *q != '0円' && *p == *q)
{
p++;
q++;
}
if (*q=='0円')
{
return input;
}
input++;
}
It turns out that the inner-loop condition can be simplified, because the *p == '0円'
test is redundant:
$$\begin{array}{ccc|c|c} \text{*p == *q} & \text{*q != '0円'} & \text{*p != '0円'} & {\begin{array}{l}\text{*p!='0円' &&}\\\text{*q!='0円' &&}\\\text{*p==*q}\end{array}} & \begin{array}{l}\text{*q != '0円' &&}\\\text{*p == *q}\end{array} \\ \hline \\ \begin{array}{c}T\\T\end{array} & \begin{array}{c}T\\T\end{array} & \begin{array}{c}T\\F\end{array} & \begin{array}{c}T\\F\ (impossible)\end{array} & T \\ \hline \\ \begin{array}{c}T\\T\end{array} & \begin{array}{c}F\\F\end{array} & \begin{array}{c}T\\F\end{array} & \begin{array}{c}F\ (impossible)\\F\end{array} & F \\ \hline \\ \begin{array}{c}F\\F\end{array} & \begin{array}{c}T\\T\end{array} & \begin{array}{c}T\\F\end{array} & \begin{array}{c}F\\F\end{array} & F \\ \hline \\ \begin{array}{c}F\\F\end{array} & \begin{array}{c}F\\F\end{array} & \begin{array}{c}T\\F\end{array} & \begin{array}{c}F\\F\end{array} & F \\ \end{array}$$
The function can now look like:
if (*find=='0円')
return input;
while (*input != '0円')
{
const char *p, *q;
for (p = input, q = find; *q != '0円' && *p == *q; p++, q++) {}
if (*q == '0円')
{
return input;
}
input++;
}
return NULL;
A well designed function shouldn't have surprising behaviour, and therefore shouldn't have special cases. Can we get rid of if (*find == '0円') return input;
?
The behaviour of strstr()
is, if find
is ""
, then the function returns input
. In order to reach the return input;
statement when find == ""
and input == ""
, we have to get rid of both the if (*find == '0円')
special case and the while (*input != '0円')
guard condition.
char *strstrImp(char *input, const char *find)
{
do
{
const char *p, *q;
for (p = input, q = find; *q != '0円' && *p == *q; p++, q++) {}
if (*q == '0円')
{
return input;
}
} while (*(input++) != '0円');
return NULL;
}
const char *strstrImp(const char *input, const char *find)
{
return strstrImp(const_cast<char *>(input), find);
}
Suppose we make that change — then what about the case where input == ""
and find
is non-empty? It turns out, that works just fine: it will flow through all of the conditions and return NULL
! There you go — a compact implementation of strstr()
with reasonably simple logic and without special cases.
-
\$\begingroup\$ I would argue that also in C const correctness is useful \$\endgroup\$PlasmaHH– PlasmaHH2014年06月12日 12:47:11 +00:00Commented Jun 12, 2014 at 12:47
-
1\$\begingroup\$ Get rid of the fugly
const_cast
and you have my vote. \$\endgroup\$Emily L.– Emily L.2014年06月12日 13:20:55 +00:00Commented Jun 12, 2014 at 13:20 -
\$\begingroup\$ @Emily How would you do it instead? \$\endgroup\$200_success– 200_success2014年06月12日 13:25:58 +00:00Commented Jun 12, 2014 at 13:25
-
1\$\begingroup\$ What obvious thing am I missing? Why can you just have the single
const char *strstrImp(const char *input, const char *find)
function? Is it because you want the ability to return a non const char*? Personally I don't mind adding const-ness but removing it has to be well justified. So I would swap the two bodies around and remove the constness on the return value of the first version. \$\endgroup\$Loki Astari– Loki Astari2014年06月12日 14:55:25 +00:00Commented Jun 12, 2014 at 14:55 -
1\$\begingroup\$ @DeadMG There is a very good reason, and it's called "standards conformance". The question was about re-implementing strstr as per the standard en.cppreference.com/w/cpp/string/byte/strstr. \$\endgroup\$Emily L.– Emily L.2014年06月16日 13:22:56 +00:00Commented Jun 16, 2014 at 13:22
It seems to me the code can be quite a bit simpler than what you've shown above. I think I'd prefer to implement roughly the following algorithm:
for each position in haystack
if needle matches haystack starting at this position
return current position
return no position
In C++, that might look something like this:
namespace {
bool match(char const *input1, char const *input2) {
for (; *input2 && *input1; ++input1, ++input2)
if (*input1 != *input2)
return false;
return true;
}
}
char *find_sub(char *haystack, char const *needle) {
for (; *haystack; ++haystack)
if (match(haystack, needle))
return haystack;
return NULL;
}
char const *find_sub(char const *haystack, char const *needle) {
return find_sub(const_cast<char *>(haystack), needle);
}
Note that I've renamed the functions to find_str
. At least according to the C standard, any and all names starting with str
are reserved for the implementation. At least by my reading, strstrImp
is probably allowed in C++, but I'd rather use a name that's clearly allowed in both C and C++ if possible.
For what it's worth, a few test cases:
int main() {
char *input1 = "asddf";
assert(find_sub(input1, "dd") == input1 + 2);
char const *input = "asddf";
assert(find_sub(input, "f") == input + 4);
assert(find_sub(input, "x") == NULL);
assert(find_sub(input, "") == input);
}
Much like the version @200_success posted, this handles searching for an empty string without any special-case code. It also handles the overloading pretty much the same way he did--while the const_cast
isn't exactly beautiful, it beats the alternative of duplicating the code for find_sub
, once for char *
and again for char const *
input. I did initially implement the overloading as @Loki Astari suggested in a comment, with the non-const version calling the const version instead of vice versa. After some though, I think this is preferable though. Since we're certain that the other overload of find_sub
won't attempt to modify either of the subject strings, it's safe to cast away the const
ness in this case. If we instead add const-ness to the non-const overload, then we have to cast it away again on the result, so we have to write two explicit casts. By going the other direction, the cast on the return value from char *
to char const *
can happen implicitly, so the code is marginally less ugly.
Also note that I've used NULL
to return a null pointer. This allows that function definition to be compatible with both older C++ compilers and C compilers. If you're sure it'll only be used with (reasonably) current C++ compilers, it's probably better to replace that with nullptr
instead.
The way I learned C strings was to check for end of string with something like
while (*input) {...};
rather than the more verbose
while (*input != '0円') {...};
Both are correct. It is just that one is more verbose, or long winded. Verbosity is not necessarily a virtue. Brevity is not a vice.
I also prefer
for (;;) {...}
over the similar
while (1) {...}
When I learned C, many decades ago, (for many>= 3) some compilers would actually build a constant 1 for while (1) and test that the constant 1 was not 0. Your preferences may differ.
The above implementation with those additional ... preferences would become
char *strstrImp(const char *input, const char *find) {
for (;*input;input++) {
const char *p = input, *q = find;
while (*p && *q && *p++ == *q++) {}
if (!*q) return (char *)input;
}
return NULL;
}
My simple implementation with the above variable names is
char *strstrImp(const char *input, const char *find) {
const char *p = input, *q = find;
for (;;) {
if (!*q) return (char *)input;
if (!*p) return NULL;
if (*p++ == *q++) continue;
p = ++input;
q = find;
}
}
The types are all appropriate for the C standard strstr. I like it because it has a single loop rather than the typical two loop approach. The continue test in the middle of the loop allows us to decide whether to progress on the inner loop, or reset the variables for the outer loop. It's not crazy fast with large input or find strings. For small and medium input and find strings, it is about as fast as any other implementation.
Already there are great answers for C++. But in case of C, I would like to add one more point: instead of using two loops we can divide the tasks into functions to make it more readable. In response to the comments the whole point or observation is made for readability purpose.
strstrImp -> find/locate substring in string and returns a pointer points to the first character of the found substr in str otherwise a null pointer.
char const *strstrImp (const char *str, const char *substr)
{
size_t len=strlen(substr);
for (int i = 0; str[i] != '0円'; i++)
{
if (strncmp((str + i), substr,len) == 0)
{
return (char *)(str + i);
}
}
return NULL;
}
Explore related questions
See similar questions with these tags.
input
in O(n) (or one loop). Unfortunately building the state transition table is non trivial code. Thus this is a perfectly acceptable solution. \$\endgroup\$