I wrote this code as an answer to a question. But I'd like you to have a look at it. This post is basically a copy of the answer I posted.
The code does no error checking. It assumes that the output buffer exists and is big enough and that src
, orig
and new
are valid strings. The headers needed:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>
The main replace function
// Replace the first occurrence of orig in src with new and write result to dest
// Return pointer to first character after the end of first match in src and
// NULL if no match
const char *replace(char *dest, const char *src,
const char *orig, const char *new) {
char *ptr = strstr(src, orig); // First match
// If no match, we want dest to contain a copy of src
if(!ptr) {
strcpy(dest, src);
return NULL;
}
const ptrdiff_t offset = ptr - src; // Calculate offset
const int origlen = strlen(orig);
strncpy(dest, src, offset); // Copy everything before match
strcpy(&dest[offset], new); // Copy replacement
strcpy(&dest[offset + strlen(new)], &src[offset + origlen]); // Copy rest
return src + offset + origlen;
}
Then we just add a function to handle N occurrences
// Replace maximum n occurrences. Stops when no more matches.
// Returns number of replacements
size_t replaceN(char *dest, const char *src, const char *orig,
const char *new, size_t n) {
size_t ret = 0;
// Maybe an unnecessary optimization to avoid multiple calls in
// loop, but it also adds clarity
const int newlen = strlen(new);
const int origlen = strlen(orig);
do {
const char *ptr = replace(dest, src, orig, new); // Replace
if(!ptr) return ret; // Quit if no more matches
// Length of the part of src before first match
const ptrdiff_t offset = ptr - src;
src = ptr; // Move src past what we have already copied.
ret++;
dest += offset - origlen + newlen; // Advance pointer to dest to the end
} while(n > ret);
return ret;
}
And a simple wrapper for all occurrences. Note that it's safe to use SIZE_MAX
here, because there will never be more occurrences than that.
// Replace all. Returns the number of replacements, because why not?
size_t replaceAll(char *dest, const char *src, const char *orig, const char *new) {
return replaceN(dest, src, orig, new, SIZE_MAX);
}
It's easy to write a wrapper for the allocation. We borrow some code from https://stackoverflow.com/q/9052490/6699433
size_t countOccurrences(const char *str, const char *substr) {
size_t count = 0;
while((str = strstr(str, substr)) != NULL) {
count++;
str++; // We're standing at the match, so we need to advance
}
return count;
}
Then some code to calculate size and allocate a buffer
// Allocate a buffer big enough to hold src with n replacements
// of orig to new
char *allocateBuffer(const char *src, const char *orig,
const char *new, size_t n) {
return malloc(strlen(src) +
n * (strlen(new) - strlen(orig)) +
1 // Remember the zero terminator
);
}
And the two final functions
// Allocates a buffer and replaces max n occurrences of orig with new and
// writes it to the allocated buffer.
// Returns the buffer and NULL if allocation failed
char *replaceNAndAllocate(const char *src, const char *orig,
const char *new, size_t n) {
const size_t count = countOccurrences(src, orig);
n = n < count ? n : count; // Min of n and count
char *buf = allocateBuffer(src, orig, new, n);
if(!buf) return NULL;
replaceN(buf, src, orig, new, n);
return buf;
}
// Allocates a buffer and replaces all occurrences of orig with new and
// writes it to the allocated buffer.
// Returns the buffer and NULL if allocation failed
char *replaceAllAndAllocate(const char *src, const char *orig, const char *new) {
return replaceNAndAllocate(src, orig, new, SIZE_MAX);
}
And finally, a simple main
with a test with multiple occurrences and with original string being a substring of replacement string:
int main(void) {
char src[] = "!!!asdf!!!asdf!!!asdf!!!asdf!!!";
char orig[] = "asdf";
char new[] = "asdfaaaaaaaaasdf";
puts(replaceAllAndAllocate(src, orig, new));
}
No warnings with -Wall -Wextra -pedantic
and the output is:
$ ./a.out
!!!asdfaaaaaaaaasdf!!!asdfaaaaaaaaasdf!!!asdfaaaaaaaaasdf!!!asdfaaaaaaaaasdf!!!
2 Answers 2
The interface
There are a number of useful things to return. Let's list them in order of which is least surprising:
- The pointer to the start of the destination. This follows the standard library, but is least useful.
- The pointer to the end of the destination. This follows an alternative convention embodies in the
stpcpy()
function and similar added to POSIX in 2008. - Number of characters produced / to produce. Generally only used if the sink can accept any number of characters, or the destinations size is also passed.
countOccurrences()
seems your flawed attempt to begin implementing the alternative approach to sizing the buffer. - Return a count of replacements done. Degenerates to boolean if maximum number of replacements is 1.
- Try to combine bool (not count) with one of the other options.
strstr()
does it, arguable returning a pointer to the terminator might have been better.
None of those options reflects your choice.
Provide some way to properly size the destination buffer. As a bonus,
replaceNAndAllocate()
becomes fairly trivial.
The algorithm
replaceN()
's use ofreplace()
results in a standard Shlemiel The Painter-algorithm. It should be linear, but ends up quadratic.countOccurrences()
can fail if some proper suffix (neither empty nor the full string) is a prefix of the needle, specifically if the resultant string is found. Skip over the full match found, not the first character.
Implementation
Yes,
strncpy()
(which is not actually a string-function) will behave identically tomemcpy()
if the firstn - 1
bytes are non-null. It is still inefficient and false advertisement though.&pointer[offset_calculation]
might not be longer thanpointer + offset_calculation
, but it is a bit less idiomatic. You decide whether it should be changed.replaceN()
should be the one to implement directly,replace()
being a convenience-function. The overhead is trivial for the latter, and fixes the current problem with its runtime.Most of the time, you use
pointer
directly, but sometimes you usepointer != NULL
. Fix the inconsistency, preferably in favor of the former, which is more idiomatic and shorter.countOccurrences()
andallocateBuffer()
seem to be internal helpers. Mark themstatic
so they are not exported.Your test-program fails to free the buffer
replaceAllAndAllocate()
allocates.
-
\$\begingroup\$ I'm pretty sure we're not using Shlemiel's algorithm, because
src
gets advanced within the loop (that being the reason the worker function returns the end of the first match). However, if you missed that, it says something about the clarity of that part of the code! \$\endgroup\$Toby Speight– Toby Speight2021年07月06日 06:46:08 +00:00Commented Jul 6, 2021 at 6:46 -
\$\begingroup\$ Oh, actually I see that while the next search starts from the right place, we repeatedly copy the string tail, and that's the bit that's quadratic. Good catch! \$\endgroup\$Toby Speight– Toby Speight2021年07月06日 06:47:28 +00:00Commented Jul 6, 2021 at 6:47
-
\$\begingroup\$ Very good points. I'm going to write a more efficient variant. This was mostly intended to be clear code with most focus on reusing functions. I wrote
replace
with the intention of using it inreplaceN
and that's why I chose the return value that makes more sense in that regard. And it can still be used as a boolean. But I'm not sure what you mean by "Provide some way to properly size the destination buffer." I have a function for that, which isallocateBuffer
\$\endgroup\$klutt– klutt2021年07月06日 07:45:15 +00:00Commented Jul 6, 2021 at 7:45 -
\$\begingroup\$ Algorithm 1) Yes, I agree. I'll write another more efficient variant. 2) Thanks for bug report. \$\endgroup\$klutt– klutt2021年07月06日 07:47:41 +00:00Commented Jul 6, 2021 at 7:47
-
\$\begingroup\$ Implementation 1) Will fix. Makes more sense. 2) I'll think about it. I often mix them, it it depends on if I focus on the array or the pointer. 3) You certainly have a point here.
replace
is not needed. 4) I needed to put the exp in parenthesis to suppress warnings. That's why I added!= NULL
. Because then it does not look like you have too many parenthesis. Yes, I know it does not make sense. 5) Maybe 6) Did not care about that. It's not really a part of the whole. Basically just added it so others can have a quick test case. \$\endgroup\$klutt– klutt2021年07月06日 07:56:41 +00:00Commented Jul 6, 2021 at 7:56
First impression is pretty good - care taken with exception conditions and thought to the interface.
Headers: I find it helpful to have a consistent order for headers, to quickly see if the header I need is already included. For this, I group the Standard Library headers in alphabetical order. You might find that helpful too.
char *ptr = strstr(src, orig); // First match
I'd write const *ptr
there, to avoid accidents.
strncpy(dest, src, offset); // Copy everything before match
Probably memcpy
is more appropriate here, given we know that src
doesn't end before src + offset
.
strcpy(&dest[offset], new); // Copy replacement strcpy(&dest[offset + strlen(new)], &src[offset + origlen]); // Copy rest
It's strange to index an array, only to extract a pointer. I'd write those as:
strcpy(dest + offset, new); // Copy replacement strcpy(dest + offset + strlen(new), src + offset + origlen); // Copy rest
Arguably, we could measure strlen(new)
first, then use memcpy
for the first of those.
If countOccurrences()
and allocateBuffer()
aren't intended to be part of the public interface, declare them with internal linkage (static
).
The test program is an optimist. If you've taken the care to safely return a null pointer from the function, the caller ought to have the decency to set a good example by testing the value rather than blindly passing it on!
-
\$\begingroup\$ I group the Standard Library headers like I do my own headers: in increasing order of specificity. Although that is, admittedly, a bit subjective. But I find this easier to understand than alphabetical order. If I want to know whether a specific header is already included, I'm almost certainly going to just search for it to ensure an exact match. \$\endgroup\$Cody Gray– Cody Gray2021年07月06日 08:09:02 +00:00Commented Jul 6, 2021 at 8:09
-
\$\begingroup\$ I updated the original post now stackoverflow.com/a/68237099/6699433 \$\endgroup\$klutt– klutt2021年07月06日 20:32:13 +00:00Commented Jul 6, 2021 at 20:32