Remove specified number of characters from a string

Question 1

From an exercise in Kochan's Programming in C, following a chapter on C's null-terminated strings:

Write a function called removeString to remove a specified number of characters from a character string. The function should take three arguments: the source string, the starting index number in the source string, and the number of characters to remove. So, if the character array text contains the string "the wrong son", the call

removeString (text, 4, 6);

has the effect of removing the characters "wrong " (the word "wrong" plus the space that follows) from the array text. The resulting string inside text is then "the son".

I found the exercise interesting to implement. I know I could have used strlen to get the length of the string, but I like the idea of handling everything in a single pass (I imagine strlen has to traverse the characters looking for a null-byte).

Any comments/criticisms are welcome. Here's my solution:

void removeString (char text[], int index, int rm_length)
{
    int i;
    for ( i = 0; i < index; ++i )
        if ( text[i] == '\0' )
            return;
    for ( ; i < index + rm_length; ++i )
        if ( text[i] == '\0' ) {
            text[index] = '\0';
            return;
        }
    do {
        text[i - rm_length] = text[i];
    } while ( text[i++] != '\0' );
}

And a test drive

#include <stdio.h>

int main (void)
{
    char string1[] = "the wrong son";
    char string2[] = "the wrong son";
    char string3[] = "the wrong son";
    printf ("string1: %s\n", string1);
    printf ("string2: %s\n", string2);
    printf ("string3: %s\n\n", string3);
    printf ("removeString (string1, 13, 6)\n");
    removeString (string1, 13, 6);
    printf ("string1: %s\n\n", string1);
    printf ("removeString (string2, 11, 6)\n");
    removeString (string2, 11, 6);
    printf ("string2: %s\n\n", string2);
    printf ("removeString (string3, 4, 6)\n");
    removeString (string3, 4, 6);
    printf ("string3: %s\n\n", string3);
    return 0;
}

Output:

string1: the wrong son
string2: the wrong son
string3: the wrong son

removeString (string1, 13, 6)
string1: the wrong son

removeString (string2, 11, 6)
string2: the wrong s

removeString (string3, 4, 6)
string3: the son
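The same three cases can also be checked automatically instead of by reading the printed output. The following is a minimal sketch, not part of the original post: `check_remove` and `run_checks` are hypothetical helper names, and the last two cases (removing nothing, removing the whole string) are additions beyond the test drive above.

```c
#include <assert.h>
#include <string.h>

/* The function under review, reproduced so the sketch compiles on its own. */
void removeString(char text[], int index, int rm_length)
{
    int i;

    for (i = 0; i < index; ++i) {
        if (text[i] == '\0') {
            return;                 /* index lies past the end: nothing to do */
        }
    }
    for (; i < index + rm_length; ++i) {
        if (text[i] == '\0') {
            text[index] = '\0';     /* range runs off the end: truncate */
            return;
        }
    }
    do {
        text[i - rm_length] = text[i];
    } while (text[i++] != '\0');
}

/* Hypothetical helper: applies removeString to a fresh copy of
   "the wrong son" and reports whether the result equals `expected`. */
int check_remove(int index, int rm_length, const char *expected)
{
    char buf[] = "the wrong son";

    removeString(buf, index, rm_length);
    return strcmp(buf, expected) == 0;
}

/* Hypothetical driver: aborts via assert on the first failing case. */
void run_checks(void)
{
    assert(check_remove(13, 6, "the wrong son")); /* index at the terminator */
    assert(check_remove(11, 6, "the wrong s"));   /* range runs past the end */
    assert(check_remove(4, 6, "the son"));        /* the book's example      */
    assert(check_remove(0, 0, "the wrong son"));  /* nothing removed         */
    assert(check_remove(0, 13, ""));              /* whole string removed    */
}
```

Calling `run_checks()` from `main` returns silently when every case passes.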

Question 2

Looking at this, I'm curious whether int is always a large enough index type for every possible char[]. cprogramming.com/tutorial/secure.html may be of interest.

Question 3

Hm, I hadn't considered that. I guess a long or long long would be better?

Bizkit · Answer 1 · 2016-01-06 14:51:24Z

One quick stylistic comment.

Braces

for ( ; i < index + rm_length; ++i )
    if ( text[i] == '\0' ) {
        text[index] = '\0';
        return;
    }

Constructs like this are begging to have a subtle hidden bug in them. Always use braces if you have more than a single statement inside a for/if statement.

for (; i < index + rm_length; ++i) {
    if ( text[i] == '\0' ) {
        text[index] = '\0';
        return;
    }
}

chux · Answer 2 · 2016-01-07 22:39:22Z

The first two loops look for the same thing, a null character within the first n characters; they only react differently when they find it. I am confident that on pipelined architectures a single loop will be faster. So instead of two loops, use:
```
rm_end = index + rm_length;
for ( i = 0; i < rm_end; ++i ) {
    if ( text[i] == '\0' ) {
        if (i > index) {
            text[index] = '\0';
        }
        return;
    }
}
```
In C, strings are arrays, and arrays are best indexed with size_t rather than int, long or long long. size_t is the type of the result of sizeof, so it is neither too narrow nor excessively wide to represent every possible string size; int may be too narrow. Note that strlen() returns size_t, and that size_t is an unsigned type.

C string functions typically return something. Returning the destination string could be useful at minimal cost.

Putting this all together:

char *removeString(char *text, size_t index, size_t rm_length) {
    size_t rm_end = index + rm_length;
    size_t i;
    for (i = 0; i < rm_end; i++) {
        if (text[i] == '\0') {
            if (i > index) {
                text[index] = '\0';
            }
            return text;
        }
    }
    do {
        text[i - rm_length] = text[i];
    } while (text[i++] != '\0');
    return text;
}

Pedantic code ensures no addition overflow as in index + rm_length:

#include <stdint.h>  /* SIZE_MAX */

char *removeString(char *text, size_t index, size_t rm_length) {
    if (index >= SIZE_MAX - rm_length) {
        rm_length = SIZE_MAX - 1 - index;
    }
    size_t rm_end = index + rm_length;
    size_t i;
    ...
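When the string length is already known, the addition can be avoided altogether by comparing against a difference instead of a sum. The sketch below illustrates that idea only; `range_in_string` is a hypothetical helper that is not part of this answer, and it assumes a call to strlen is acceptable, which gives up the single-pass property the question asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the range [index, index + rm_length)
   lies entirely inside the string s, and 0 otherwise. It never computes
   index + rm_length, so the check itself cannot overflow. */
int range_in_string(const char *s, size_t index, size_t rm_length)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (index > len) {
        return 0;                   /* start lies past the terminator */
    }
    /* index <= len here, so len - index cannot wrap around. */
    return rm_length <= len - index;
}
```

Because size_t is unsigned, a negative argument such as -2 converts to a very large value, so a test like `index < 0` would never fire; comparing against the length as above rejects such values as well.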

Why will it be faster to run n + m iterations in a single loop than to run a loop of n iterations followed by a loop of m iterations? Is there that much overhead in setting up each loop?

@ivan Certainly not a huge difference; it is the same O(). The benefit is platform dependent. But, in general, one short loop is faster than (or as fast as) two short loops.

Interesting, I'm seeing the opposite on my system, except with input that truncates a very short string to an even shorter one. Not a significant difference either way, though. Anyway, good to know! Thanks.
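Since the comments report different results on different systems, a rough timing harness is one way to compare the two scanning strategies locally. This is only a sketch under stated assumptions: `remove_two_loops`, `remove_one_loop`, `remove_fn` and `time_variant` are hypothetical names, the input is assumed to be shorter than 256 characters, and `clock()` measures processor time coarsely, so the numbers are indicative at best (an optimizing compiler may also change the picture).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef char *(*remove_fn)(char *, size_t, size_t);

/* Variant with two separate scanning loops, as in the question. */
char *remove_two_loops(char *text, size_t index, size_t rm_length)
{
    size_t i;

    for (i = 0; i < index; ++i) {
        if (text[i] == '\0') {
            return text;
        }
    }
    for (; i < index + rm_length; ++i) {
        if (text[i] == '\0') {
            text[index] = '\0';
            return text;
        }
    }
    do {
        text[i - rm_length] = text[i];
    } while (text[i++] != '\0');
    return text;
}

/* Variant with a single scanning loop, as in this answer. */
char *remove_one_loop(char *text, size_t index, size_t rm_length)
{
    size_t rm_end = index + rm_length;
    size_t i;

    for (i = 0; i < rm_end; ++i) {
        if (text[i] == '\0') {
            if (i > index) {
                text[index] = '\0';
            }
            return text;
        }
    }
    do {
        text[i - rm_length] = text[i];
    } while (text[i++] != '\0');
    return text;
}

/* Runs fn on a fresh copy of input `reps` times and returns the elapsed
   processor time in seconds. The strcpy cost is included for both
   variants alike. */
double time_variant(remove_fn fn, const char *input,
                    size_t index, size_t rm_length, long reps)
{
    char buf[256];
    clock_t start;
    long r;

    assert(strlen(input) < sizeof buf);
    start = clock();
    for (r = 0; r < reps; ++r) {
        strcpy(buf, input);
        fn(buf, index, rm_length);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A comparison would call `time_variant` once per variant with the same input and a large `reps`, ideally several times and with inputs of different lengths.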

William Morris · Answer 3 · 2016-01-08 20:27:03Z

Your code is reasonable but I would prefer it if you used standard library functions instead of rolling your own loops. And as others have said, use of size_t is normal, as is returning a value - the start of the string makes sense in this case.

#include <string.h>

char* removeString (char s[], size_t offset, size_t length)
{
    /* The string ends before offset: nothing to remove. */
    if (memchr(s, '\0', offset)) {
        return s;
    }
    char *dest = s + offset;
    /* The string ends inside the range: truncate at offset. */
    if (memchr(dest, '\0', length)) {
        *dest = '\0';
        return s;
    }
    /* Fixed error pointed out by JS1 */
    const char *src = dest + length;
    while ((*dest++ = *src++) != '\0') {
        /* copy the tail, terminator included */
    }
    return s;
}

You could use memmove (not memcpy, which doesn't handle overlapping areas) in place of the final loop, but that would mean computing the length first.
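For comparison, here is a minimal sketch of that memmove variant. It is an illustration rather than the answer's own code: `remove_with_memmove` is a hypothetical name, and the function pays for a full strlen pass up front in exchange for a single library call doing the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical memmove-based variant: removes up to `length` characters
   starting at `offset` and returns the start of the string. */
char *remove_with_memmove(char *s, size_t offset, size_t length)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);                /* the extra pass mentioned above */
    if (offset >= len) {
        return s;                   /* nothing at or after offset */
    }
    if (length >= len - offset) {
        s[offset] = '\0';           /* range reaches the end: truncate */
        return s;
    }
    /* Source and destination overlap, hence memmove rather than memcpy.
       The + 1 copies the terminator along with the tail. */
    memmove(s + offset, s + offset + length, len - offset - length + 1);
    return s;
}
```

With the text "the wrong son", a call with offset 4 and length 6 moves the four bytes "son" plus the terminator down to index 4.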

Nice, novel use of memchr(), +1. It's tempting to profile this, mine, and one that combines the first two loops (using one memchr() loop).

Your copy loop is wrong because it is checking the wrong dest character (it has already advanced by one). It happens to work because it will stop after reaching the next null character, but by that time src will have read past the end of the string.

Alexei · Answer 4 · 2016-01-06 20:53:48Z

I think your code can be more concise if you use memcpy. Based on already provided feedback, I have used long instead of int and done some checks:

#include <cstdio>
#include <cstring>

void removeString (char text[], long index, long rm_length)
{
    long len = (long) strlen(text);
    /* Reject negative arguments and ranges that run past the end. */
    if (index < 0 || rm_length < 0 || index + rm_length > len)
        return;
    memmove(&text[index], &text[index + rm_length], len - index - rm_length);
    text[len - rm_length] = '\0';
}

int main()
{
    char text1[] = "the wrong son";
    removeString (text1, 4, 6);
    printf("%s\n", text1);

    char text2[] = "text for incorrect indexes";
    removeString (text2, 100, 2);
    printf("%s\n", text2);

    char text3[] = "text for negative length";
    removeString (text3, 4, -2);
    printf("%s\n\n", text3);
}

This was written on cpp.sh and should run without any additions. Performance-wise, memcpy is advertised as the fastest memory copy function, but I also use strlen, which has to traverse the whole string (more details are provided here).

[later edit]

As correctly pointed out in a comment, memmove should be used, so I have edited my code to use it instead of memcpy.

memcpy doesn't handle overlapping areas.

All the tests reveal the opposite, and how memcpy works is also specified here.

As your link says, memcpy doesn't handle overlapping areas. You may find that it 'works' in your example with your compiler on your target machine, but it is not the correct function to use.

Yes, that is correct. I have updated my code.
