Function to remove set of trailing characters from string

Question 1

I needed the ability to truncate the trailing end of path strings in a routine that builds new search paths as it recursively searches directories. After not finding what I was looking for, I created the function below.

The expected behavior is that remove_trailing_chars() updates in by removing every character contained in rem, if and only if those characters occur contiguously at the very end of the original in.

As soon as the trailing character of in is one that is not in rem, the function updates in with the latest version and returns.

It has been tested with several variations of the input char arrays in and rem, including these:

 char in[] = "this is a string with \\ *\\*";//edit this string as needed to test
 char rem[] = "\\* ";//edit this string as needed to test

results in "this is a string with" (no trailing space)

 char in[] = "this is a string with *\\*";//edit this string as needed to test
 char rem[] = "\\*";//edit this string as needed to test

results in "this is a string with " (trailing space kept)

I am interested in suggestions for efficiency improvements in speed, and readability improvements. (suggestions on more idiomatic methods are welcome.) I do not believe memory should be an issue with this for my usage, but if there are thoughts on any pitfalls in that area, please include them as well.

Here is the code, including one usage case... (Compiler command line and its disassembly are included further down as well.)

#include <stdbool.h>//bool
#include <string.h>//strlen, strcpy
#include <stdlib.h>
//prototypes
void remove_trailing_chars(char *in, const char *rem);
/// demonstrate removing all chars in 'rem' if trailing in 'in'.
int main(void)
{
    char in[] = "this is a string with \\ *\\*";//edit this string as needed to test
    char rem[] = "\\* ";//edit this string as needed to test

    remove_trailing_chars(in, rem);

    return 0;
}
/// remove all occurrences of chars in 'rem' from end of 'in'
void remove_trailing_chars(char *in, const char *rem)
{
    bool found = true;//when false, last char of 'in' found no matches in 'rem'
    int len = strlen(in);
    char in_dup[len+1];
    strcpy(in_dup, in);

    while(found)
    {
        found = false;//for this element of rem
        len = strlen(in_dup);
        int i = 0;
        while(rem[i])
        {
            if(in_dup[len-1] == rem[i])
            {
                in_dup[len - 1] = 0;
                found = true;
                break;
            }
            else
            {
                i++;
            }
        }
    }
    strcpy(in, in_dup);
}

Using GCC, build was done with:

Release target:

mingw32-gcc.exe -Wall -O2 -Wall -std=c99 -g -c C:\tempExtract\remove_trainling_chars\main.c -o obj\Release\main.o

Debug target: (to allow viewing disassembly)

gcc.exe -Wall -g -Wall -std=c99 -g -c C:\tempExtract\remove_trainling_chars\main.c -o obj\Debug\main.o

Question 2

Accessing array out of bounds

The code here may try to access in_dup[-1] in some cases:

len = strlen(in_dup);
int i = 0;
while(rem[i])
{
    if(in_dup[len-1] == rem[i])
       ^^^^^^^^^^^^^

That is, when the input string is empty, or when the entire input string is made of characters in rem, in_dup becomes empty, len becomes 0, and in_dup[len - 1] is an out-of-bounds access.

In short, the code is missing a check on reaching the beginning of the input.
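
As a rough sketch of that missing check, here is the question's loop with one extra test after the length is recomputed. The name remove_trailing_chars_guarded is hypothetical, and the NULL assertion is my own assumption about how callers behave; everything else follows the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical name. The body follows the question's code; the new parts
   are the NULL assertion and the 'len == 0' check. */
void remove_trailing_chars_guarded(char *in, const char *rem)
{
    assert(in != NULL && rem != NULL);   /* assumption: NULL is a caller bug */
    bool found = true;
    size_t len = strlen(in);
    char in_dup[len + 1];                /* C99 variable-length array, as in the question */
    strcpy(in_dup, in);

    while (found)
    {
        found = false;
        len = strlen(in_dup);
        if (len == 0)
            break;                       /* nothing left: in_dup[len - 1] must not be read */
        for (int i = 0; rem[i]; i++)
        {
            if (in_dup[len - 1] == rem[i])
            {
                in_dup[len - 1] = 0;     /* drop the last character */
                found = true;
                break;
            }
        }
    }
    strcpy(in, in_dup);
}
```

With this check, an empty input and an input made up entirely of characters from rem both end up as the empty string without reading before the start of the buffer.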

Avoid unnecessary copying

The code copies in to in_dup, works with in_dup, then copies the result back to in. This is unnecessary; you could work directly with in.

Avoid unnecessary computations

len = strlen(in_dup) is executed every time after some characters are removed from the end. This is inefficient, because strlen needs to loop over the entire string. Instead, you could count the number of characters removed, and then you'll know exactly the end of the input string.

Simplify algorithm

Consider this simpler algorithm:

- Loop from the end of the input, going backwards, until the beginning
  - Loop over the characters in rem, checking whether each one matches the last character of the input
    - If there is a match, delete the last character and break out of this inner loop
  - If there is no match, then we're done, break out of the outer loop

Implementation, including the other tips above applied as well:

void remove_trailing_chars(char *in, const char *rem)
{
    int remLength = strlen(rem);
    for (int i = strlen(in) - 1; i >= 0; i--) {
        int j = 0;
        while (j < remLength) {
            if (in[i] == rem[j]) {
                in[i] = '\0';
                break;
            }
            j++;
        }
        if (j == remLength) break;
    }
}
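
To try the implementation against the two inputs from the question and the two edge cases described at the top of this answer, a small check harness along these lines may help. The helpers check_case and run_checks are hypothetical names, the 64-byte buffer is an assumption about the size of the test inputs, and the function is repeated (with explicit int casts) only so that the block compiles on its own.

```c
#include <stdio.h>
#include <string.h>

/* The implementation from this answer, with the lengths cast to int explicitly. */
void remove_trailing_chars(char *in, const char *rem)
{
    int remLength = (int)strlen(rem);
    for (int i = (int)strlen(in) - 1; i >= 0; i--) {
        int j = 0;
        while (j < remLength) {
            if (in[i] == rem[j]) {
                in[i] = '\0';
                break;
            }
            j++;
        }
        if (j == remLength) break;
    }
}

/* Hypothetical helper: runs one case on a writable copy, returns 1 on mismatch. */
static int check_case(const char *input, const char *rem, const char *expected)
{
    char buf[64];   /* assumption: test inputs are shorter than 64 bytes */
    snprintf(buf, sizeof buf, "%s", input);
    remove_trailing_chars(buf, rem);
    if (strcmp(buf, expected) != 0) {
        printf("FAIL: \"%s\" -> \"%s\", expected \"%s\"\n", input, buf, expected);
        return 1;
    }
    return 0;
}

/* Returns the number of failed cases; 0 means every case passed. */
int run_checks(void)
{
    int failures = 0;
    failures += check_case("this is a string with \\ *\\*", "\\* ", "this is a string with");
    failures += check_case("this is a string with *\\*", "\\*", "this is a string with ");
    failures += check_case("", "\\* ", "");        /* empty input */
    failures += check_case("*\\ *", "\\* ", "");   /* every character is in rem */
    failures += check_case("path", "", "path");    /* empty rem removes nothing */
    return failures;
}
```

Calling run_checks() from main and returning its result as the exit status gives a quick pass/fail signal after each change to the function.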

Question 3

I appreciate the detail in your explanations, and follow most of them. I am unclear on your use of strlen in the for expression, though: you had made the point earlier about not calling strlen repeatedly in a loop.

Question 4

@ryyker the initializer of the loop is only executed once ;-)
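
A small sketch that makes this visible by counting calls. The wrapper counted_strlen and both counting functions are hypothetical names used only for illustration; the first loop calls the wrapper in the for initializer, the second calls it in the loop condition.

```c
#include <stddef.h>
#include <string.h>

static int strlen_calls = 0;   /* hypothetical counter, for illustration only */

/* Wrapper around strlen that records how often it is called. */
static size_t counted_strlen(const char *s)
{
    strlen_calls++;
    return strlen(s);
}

/* strlen in the initializer: evaluated once, before the first iteration. */
int count_initializer_calls(const char *s)
{
    strlen_calls = 0;
    for (int i = (int)counted_strlen(s) - 1; i >= 0; i--)
    {
        /* the body runs once per character; the initializer is not re-evaluated */
    }
    return strlen_calls;
}

/* strlen in the condition: evaluated before every iteration, plus the final failing test. */
int count_condition_calls(const char *s)
{
    strlen_calls = 0;
    for (size_t i = 0; i < counted_strlen(s); i++)
    {
        /* empty on purpose: only the number of calls matters here */
    }
    return strlen_calls;
}
```

For the five-character string "hello", count_initializer_calls returns 1 and count_condition_calls returns 6.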

Question 5

- Recomputing strlen(in_dup) at each iteration of the while loop drives the time complexity to quadratic. Better to compute it once and subtract 1 per iteration.
- I see no reason to copy in to in_dup, and then back. The second copy assumes that in is writable. Better to operate directly on in.
- Use the standard library. strchr does precisely the same job as the inner loop, and likely does it better.
- A variable like bool found is usually a red flag.

All that said, consider

char * end = in + strlen(in);
while (end > in) {
    char ch = *--end;
    if (strchr(rem, ch)) {
        *end = 0;
    } else {
        break;
    }
}
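
One way to package the fragment above, offered as a sketch and not as the only reasonable design: a read-only helper finds where the string should end, and the in-place function writes a single terminator there. The helper name trimmed_length and the NULL assertion are my own assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of 'in' once trailing characters from 'rem'
   are ignored. It only reads 'in', so a string literal is acceptable here. */
size_t trimmed_length(const char *in, const char *rem)
{
    assert(in != NULL && rem != NULL);   /* assumption: NULL is a caller bug */
    size_t len = strlen(in);
    /* in[len - 1] is never '\0' here, so strchr cannot match rem's terminator */
    while (len > 0 && strchr(rem, in[len - 1]) != NULL)
        len--;
    return len;
}

/* In-place version built on the helper; 'in' must be writable. */
void remove_trailing_chars(char *in, const char *rem)
{
    in[trimmed_length(in, rem)] = '\0';
}
```

The split keeps the writability requirement confined to remove_trailing_chars. A caller that builds a new search path from a read-only string can call trimmed_length and copy only that many bytes.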

Question 6

Great list of suggestions. All good except one question: Why does bool found usually indicate a red flag? btw, suggested code segment has one small typo: (strchr(rem, ch) -> (strchr(rem, ch)). Other than that, very clean. Thank you for reviewing.

janos · Accepted Answer · 2021-11-02 20:30:50Z

