3
\$\begingroup\$

Since memmem() isn't cross-platform, I'm trying to rewrite it from scratch. How does this look?

#define MemContainsStr(Str, StrLen, Substr) MemContainsMem(Str, StrLen, SubStr, sizeof(SubStr))
void *MemContainsMem(const void *StrStart, register unsigned long StrLen, const char *Substr, register const unsigned char SubstrLen) {
 if (StrLen < SubstrLen) {
 return 0;
 }
 register const unsigned char *Str = memchr(StrStart, *Substr, StrLen);
 do {
 if (!Str || StrLen < SubstrLen) {
 return 0;
 }
 if (!memcmp(Str, Substr, SubstrLen)) {
 return Str;
 }
 register void *NewStr = memchr(Str+1, *Substr, StrLen);
 StrLen -= NewStr-Str;
 Str = NewStr;
 } while (Str);
 return 0;
}

Are there register variables that shouldn't be, how can I further optimize this code?

asked Jul 10, 2020 at 2:42
\$\endgroup\$
5
  • 2
    \$\begingroup\$ As memmem() is not a standard C library function, post a reference to the code's functionality. \$\endgroup\$ Commented Jul 10, 2020 at 2:53
  • \$\begingroup\$ @chux-ReinstateMonica man memmem() \$\endgroup\$ Commented Jul 10, 2020 at 2:55
  • 3
    \$\begingroup\$ To actually write this efficiently/library-quality is not trivial. It boils down to using the systems maximum data width efficiently, something the memchr and memcmp supposedly does internally. Your function might need to wiggle around stray bytes that aren't aligned though. You can peek at the memmem implementation inside libc, it's quite complicated. \$\endgroup\$ Commented Jul 10, 2020 at 10:07
  • 3
    \$\begingroup\$ "Are there register variables that shouldn't be?" Yes, register variables shouldn't be - don't use this identifier at all. It is an obsolete remain from the past, when compilers were very bad at optimizing code. Nowadays, they do a much better job than the programmer when it comes to determining what should be placed inside a register. In particular, the compiler knows everything about the function calling convention and may utilize the fact that some parameters are already in registers when the function is entered. \$\endgroup\$ Commented Jul 10, 2020 at 10:10
  • \$\begingroup\$ @Lundin There isn't a way to know how complicated it is, without two_way_short/long_needle(). Where is #include "str-two-way.h" \$\endgroup\$ Commented Jul 10, 2020 at 18:19

1 Answer 1

2
\$\begingroup\$

I'm trying to rewrite it from scratch. How does this look?

Mis-matched function signature

OP code uses

void *MemContainsMem(const void *StrStart, register unsigned long StrLen, 
 const char *Substr, register const unsigned char SubstrLen);

The referenced code uses

void *memmem(const void *haystack, size_t haystacklen, 
 const void *needle, size_t needlelen);

Recommend to use the same function signature. Unclear why OP changed from size_t to unsigned long and changed pointer type.

Use of Str... hints at a string, yet this function does not operate on strings. Maybe Mem...? Hard to beat the names needle/haystack.

Invalid access in corner case

The first *Substr in memchr(StrStart, *Substr, StrLen); is UB when SubstrLen is 0.

Code should gracefully and correctly handle SubstrLen == 0 and SubLen == 0 cases.

register

register is a good keyword to use when the rare programmer better understands the compilation situation better than the compiler. For the other 99.99% of us, drop register and let the compiler do its job. Do not tie the hands of the compiler.

0 vs. NULL

Minor: Since the return type is a pointer, consider returning NULL rather than 0 as being more idiomatic. Either works.

Unneeded code

The first if (StrLen < SubstrLen) { return 0; appears unnecessary as that case is detected in later code.

Redundant test

} while (Str); can be replaced with } while (1);. Alternatively make loop a while()

//register const unsigned char *Str = memchr(StrStart, *Substr, StrLen);
//do {
// if (!Str || StrLen < SubstrLen) {
// return 0;
// }
const unsigned char *Str;
while ((Str = memchr(StrStart, *Substr, StrLen)) != NULL && (StrLen >= SubstrLen)) {

Invalid pointer subtraction

Not portable to subtract void * in NewStr-Str. Use unsigned char *NewStr instead.

Array or string

The define treats Substr as an array and not a string.
Good macro practice uses a () arround each parmeter.

// #define MemContainsStr(Str, StrLen, Substr) \
// MemContainsMem(Str, StrLen, SubStr, sizeof(SubStr))
#define MemContainsStr(Str, StrLen, Substr) \
 MemContainsMem((Str), (StrLen), (SubStr), strlen(SubStr))

Be wary now of double evaluation of SubStr


Are there register variables that shouldn't be, how can I further optimize this code?

I recommend to drop all register usages and simple enable greater compiler optimizations levels.

A fundamental weakness to this code is that it is O(m*n) where m, n are the lengths of the needle and haystack.

Better algorithms are O(m+n) and oblige significant code changes.


ToDo

Bug??

Str+1 is suspicious in NewStr = memchr(Str+1, *Substr, StrLen); as it is not clear that StrLen > 0 always at this point. e.g. StrLen, SubstrLen both are 0.

Hmmm

answered Jul 10, 2020 at 3:23
\$\endgroup\$
4
  • 1
    \$\begingroup\$ If removing register and renaming the variable to needle, does that mean that needle is now allocated in the stack? :) \$\endgroup\$ Commented Jul 10, 2020 at 10:12
  • \$\begingroup\$ @Lundin Yes, needle is in the stack somewhere - just cannot find it. ;-) \$\endgroup\$ Commented Jul 10, 2020 at 12:31
  • \$\begingroup\$ Feedback for Array or string: I made it treat it like an array, because for this specific implementation, the second argument will always be a array literal. \$\endgroup\$ Commented Jul 10, 2020 at 18:04
  • \$\begingroup\$ @user227432 the name MemContainsStr() then mis-leads. Sounds like to are making a MemContainsMem() or MemContainsArray(). \$\endgroup\$ Commented Jul 10, 2020 at 18:34

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.