From an exercise in Kochan's Programming in C, following a chapter on C's null-terminated strings:
Write a function called removeString to remove a specified number of characters from a character string. The function should take three arguments: the source string, the starting index number in the source string, and the number of characters to remove. So, if the character array text contains the string "the wrong son", the call
removeString (text, 4, 6);
has the effect of removing the characters "wrong " (the word "wrong" plus the space that follows) from the array text. The resulting string inside text is then "the son".
I found the exercise interesting to implement. I know I could have used strlen
to get the length of the string, but I like the idea of handling everything in a single pass (I imagine strlen
has to traverse the characters looking for a null-byte).
Any comments/criticisms are welcome. Here's my solution:
void removeString (char text[], int index, int rm_length)
{
int i;
for ( i = 0; i < index; ++i )
if ( text[i] == '0円' )
return;
for ( ; i < index + rm_length; ++i )
if ( text[i] == '0円' ) {
text[index] = '0円';
return;
}
do {
text[i - rm_length] = text[i];
} while ( text[i++] != '0円' );
}
And a test drive
int main (void)
{
char string1[] = "the wrong son";
char string2[] = "the wrong son";
char string3[] = "the wrong son";
printf ("string1: %s\n", string1);
printf ("string2: %s\n", string2);
printf ("string3: %s\n\n", string3);
printf ("removeString (string1, 13, 6)\n");
removeString (string1, 13, 6);
printf ("string1: %s\n\n", string1);
printf ("removeString (string2, 11, 6)\n");
removeString (string2, 11, 6);
printf ("string2: %s\n\n", string2);
printf ("removeString (string3, 4, 6)\n");
removeString (string3, 4, 6);
printf ("string3: %s\n\n", string3);
return 0;
}
Output:
string1: the wrong son
string2: the wrong son
string3: the wrong son
removeString (string1, 13, 6)
string1: the wrong son
removeString (string2, 11, 6)
string2: the wrong s
removeString (string3, 4, 6)
string3: the son
4 Answers 4
One quick stylistic comment.
Braces
for ( ; i < index + rm_length; ++i )
if ( text[i] == '0円' ) {
text[index] = '0円';
return;
}
Constructs like this are begging to have a subtle hidden bug in them. Always use braces if you have more than a single statement inside a for
/if
statement.
for (; i < index + rm_length; ++i) {
if ( text[i] == '0円' ) {
text[index] = '0円';
return;
}
}
Two loops looking for the same thing: the null character within
n
characters. They just react differently. Confident with pipelines architectures, this will be faster. So instead of 2 loops, userm_end = index + rm_length; for ( i = 0; i < rm_end; ++i ) { if ( text[i] == '0円' ) { if (i > index) { text[index] = '0円'; } return; } }
In C, strings are arrays and array sizes are best indexed with
size_t
, rather thanint
,long
orlong long
.size_t
is the return type ofsizeof
, sosize_t
is neither too small nor excessively wide to represent all possible string sizes.int
may be too narrow. Note:strlen()
returns typesize_t
.size_t
is some unsigned type.C string functions typically return something. Returning the destination string could be useful with minimal cost.
Putting this all together:
char *removeString(char *text, size_t index, size_t rm_length) {
size_t rm_end = index + rm_length;
size_t i;
for (i = 0; i < rm_end; i++) {
if (text[i] == '0円') {
if (i > index) {
text[index] = '0円';
}
return text;
}
}
do {
text[i - rm_length] = text[i];
} while (text[i++] != '0円');
return text;
}
Pedantic code ensures no addition overflow as in index + rm_length
:
char *removeString(char *text, size_t index, size_t rm_length) {
if (index >= SIZE_MAX - rm_length) {
rm_length = SIZE_MAX - 1 - index;
}
size_t rm_end = index + rm_length;
size_t i;
...
-
\$\begingroup\$ Why will it be faster to run n + m iterations in a single loop than to run a loop of n iterations followed by a loop of m iterations? Is there that much overhead in setting up each loop? \$\endgroup\$ivan– ivan2016年01月09日 02:45:00 +00:00Commented Jan 9, 2016 at 2:45
-
\$\begingroup\$ @ivan Certainly not a huge difference - it is the same O(). The benefit is platform dependent. But, in general, one short loop, is faster (or as fast) as 2 short loops. \$\endgroup\$chux– chux2016年01月09日 02:55:39 +00:00Commented Jan 9, 2016 at 2:55
-
\$\begingroup\$ Interesting, I'm seeing the opposite on my system, except with input that truncates a very short string to an even shorter one. Not a significant difference either way though. Anyway, good to know! Thanks \$\endgroup\$ivan– ivan2016年01月09日 03:35:01 +00:00Commented Jan 9, 2016 at 3:35
Your code is reasonable but I would prefer it if you used standard library functions instead of rolling your own loops. And as others have said, use of size_t
is normal, as is returning a value - the start of the string makes sense in this case.
char* removeString (char s[], size_t offset, size_t length)
{
if (memchr(s, '0円', offset)) {
return s;
}
char *dest = s + offset;
if (memchr(dest, '0円', length)) {
*dest = '0円';
return dest;
}
/* Fixed error pointed out by JS1 */
for (const char *src = dest + length; *dest != '0円'; ++dest, ++src) {
*dest = *src;
}
return s;
}
You could use memmove
(not memcpy
, which doesn't handle overlapping areas) in place of the final loop, but that would mean computing the length first.
-
-
1\$\begingroup\$ Your copy loop is wrong because is it checking the wrong
dest
character (because it has already advanced by one). It happens to work because it will stop after reaching the next null character, but by that time,src
will have read past the end of the string. \$\endgroup\$JS1– JS12016年01月09日 04:25:11 +00:00Commented Jan 9, 2016 at 4:25
I think your code can be more concise if you use memcpy. Based on already provided feedback, I have used long
instead of int
and done some checks:
#include <cstdio>
#include <iostream>
#include <string>
#include <cstring>
void removeString (char text[], long index, long rm_length)
{
long len = strlen(text);
if (index < 0 || rm_length < 0 || index + rm_length >= len)
return;
memmove(&text[index], &text[index + rm_length], len - index - rm_length);
text[len - rm_length] = '0円';
}
int main()
{
char text1[] = "the wrong son";
removeString (text1, 4, 6);
printf(text1); printf("\n");
char text2[] = "text for incorrect indexes";
removeString (text1, 100, 2);
printf(text2); printf("\n");
char text3[] = "text for negative length";
removeString (text3, 4, -2);
printf(text3); printf("\n"); printf("\n");
}
This is done in Cpp.sh and should run without any additions. Performance-wise, memcpy is advertised as the fastest memory copy function, but I also use strlen
which has to go through all the string (more details are provided here).
[later edit]
As correctly pointed out in a comment, memmove should be used, so I have edited my code to use it instead of memcpy
.
-
\$\begingroup\$
memcpy
doesn't handle overlapping areas \$\endgroup\$William Morris– William Morris2016年01月08日 20:27:28 +00:00Commented Jan 8, 2016 at 20:27 -
-
\$\begingroup\$ As your link says,
memcpy
doesn't handle overlapping areas. You may find that it 'works' in your example with your compiler on your target machine, but it is not the correct function to use. \$\endgroup\$William Morris– William Morris2016年01月08日 23:26:54 +00:00Commented Jan 8, 2016 at 23:26 -
\$\begingroup\$ Yes, that is correct. I have updated my code. \$\endgroup\$Alexei– Alexei2016年01月09日 02:34:14 +00:00Commented Jan 9, 2016 at 2:34
int
is always large enough index for every possiblechar[]
. cprogramming.com/tutorial/secure.html may be of interest. \$\endgroup\$long
orlong long
would be better? \$\endgroup\$