Since memmem()
isn't cross-platform, I'm trying to rewrite it from scratch. How does this look?
#define MemContainsStr(Str, StrLen, Substr) MemContainsMem(Str, StrLen, SubStr, sizeof(SubStr))
void *MemContainsMem(const void *StrStart, register unsigned long StrLen, const char *Substr, register const unsigned char SubstrLen) {
if (StrLen < SubstrLen) {
return 0;
}
register const unsigned char *Str = memchr(StrStart, *Substr, StrLen);
do {
if (!Str || StrLen < SubstrLen) {
return 0;
}
if (!memcmp(Str, Substr, SubstrLen)) {
return Str;
}
register void *NewStr = memchr(Str+1, *Substr, StrLen);
StrLen -= NewStr-Str;
Str = NewStr;
} while (Str);
return 0;
}
Are there register
variables that shouldn't be, how can I further optimize this code?
1 Answer 1
I'm trying to rewrite it from scratch. How does this look?
Mis-matched function signature
OP code uses
void *MemContainsMem(const void *StrStart, register unsigned long StrLen,
const char *Substr, register const unsigned char SubstrLen);
The referenced code uses
void *memmem(const void *haystack, size_t haystacklen,
const void *needle, size_t needlelen);
Recommend to use the same function signature. Unclear why OP changed from size_t
to unsigned long
and changed pointer type.
Use of Str...
hints at a string, yet this function does not operate on strings. Maybe Mem...
? Hard to beat the names needle/haystack
.
Invalid access in corner case
The first *Substr
in memchr(StrStart, *Substr, StrLen);
is UB when SubstrLen
is 0.
Code should gracefully and correctly handle SubstrLen == 0
and SubLen == 0
cases.
register
register
is a good keyword to use when the rare programmer better understands the compilation situation better than the compiler. For the other 99.99% of us, drop register
and let the compiler do its job. Do not tie the hands of the compiler.
0
vs. NULL
Minor: Since the return type is a pointer, consider returning NULL
rather than 0
as being more idiomatic. Either works.
Unneeded code
The first if (StrLen < SubstrLen) { return 0;
appears unnecessary as that case is detected in later code.
Redundant test
} while (Str);
can be replaced with } while (1);
. Alternatively make loop a while()
//register const unsigned char *Str = memchr(StrStart, *Substr, StrLen);
//do {
// if (!Str || StrLen < SubstrLen) {
// return 0;
// }
const unsigned char *Str;
while ((Str = memchr(StrStart, *Substr, StrLen)) != NULL && (StrLen >= SubstrLen)) {
Invalid pointer subtraction
Not portable to subtract void *
in NewStr-Str
. Use unsigned char *NewStr
instead.
Array or string
The define treats Substr
as an array and not a string.
Good macro practice uses a ()
arround each parmeter.
// #define MemContainsStr(Str, StrLen, Substr) \
// MemContainsMem(Str, StrLen, SubStr, sizeof(SubStr))
#define MemContainsStr(Str, StrLen, Substr) \
MemContainsMem((Str), (StrLen), (SubStr), strlen(SubStr))
Be wary now of double evaluation of SubStr
Are there register variables that shouldn't be, how can I further optimize this code?
I recommend to drop all register
usages and simple enable greater compiler optimizations levels.
A fundamental weakness to this code is that it is O(m*n) where m, n
are the lengths of the needle and haystack.
Better algorithms are O(m+n)
and oblige significant code changes.
ToDo
Bug??
Str+1
is suspicious in NewStr = memchr(Str+1, *Substr, StrLen);
as it is not clear that StrLen > 0
always at this point. e.g. StrLen, SubstrLen
both are 0.
Hmmm
-
1\$\begingroup\$ If removing
register
and renaming the variable toneedle
, does that mean thatneedle
is now allocated in the stack? :) \$\endgroup\$Lundin– Lundin2020年07月10日 10:12:33 +00:00Commented Jul 10, 2020 at 10:12 -
\$\begingroup\$ @Lundin Yes, needle is in the stack somewhere - just cannot find it. ;-) \$\endgroup\$chux– chux2020年07月10日 12:31:27 +00:00Commented Jul 10, 2020 at 12:31
-
\$\begingroup\$ Feedback for Array or string: I made it treat it like an array, because for this specific implementation, the second argument will always be a array literal. \$\endgroup\$user227432– user2274322020年07月10日 18:04:41 +00:00Commented Jul 10, 2020 at 18:04
-
\$\begingroup\$ @user227432 the name
MemContainsStr()
then mis-leads. Sounds like to are making aMemContainsMem()
orMemContainsArray()
. \$\endgroup\$chux– chux2020年07月10日 18:34:39 +00:00Commented Jul 10, 2020 at 18:34
memmem()
is not a standard C library function, post a reference to the code's functionality. \$\endgroup\$man memmem()
\$\endgroup\$register
variables shouldn't be - don't use this identifier at all. It is an obsolete remain from the past, when compilers were very bad at optimizing code. Nowadays, they do a much better job than the programmer when it comes to determining what should be placed inside a register. In particular, the compiler knows everything about the function calling convention and may utilize the fact that some parameters are already in registers when the function is entered. \$\endgroup\$two_way_short/long_needle()
. Where is#include "str-two-way.h"
\$\endgroup\$