Low level string manipulation functions in C

Question 1

I'm working with a legacy program that does a bunch of markup manipulation, some of it higher-level and specific to the proprietary markup, some of it lower level string stuff.

I rewrote and documented the customized low level stuff (and used string.h stuff where applicable). I'm not not usually a C-coder, but I'm trying to learn.

Am I duplicating core functionality of a standard library?
Am I following conventions/best practices?
Is the code efficient?

string_plus.h

/**
 * @file
 *
 * A small library of additional low level string manipulation functions. If any
 * functionality is duplicated from any of the normal libraries, it is due to
 * ignorance.
 */
#ifndef STRING_PLUS_H
#define STRING_PLUS_H
#include <string.h>
/**
 * Returns 1 if the front of @p text matches @p search_key, else 0.
 */
int begins_with(char const *text, char const *search_key);
/**
 * Condenses a string buffer by sliding all contents at @p p2 down to @p p1. In
 * other words, anything in between @p p1 and @p p2 is deleted. Returns @p p1.
 */
char *condense_buffer(char *p1, char *p2);
/**
 * Modifies @p text, removing all occurrences of @p c. Returns @p text.
 */
char *remove_char_occurrences(char *text, char const c);
/**
 * Inserts @p content into a buffer at @p p by shifting the buffer's data by @p
 * content's size. Returns a pointer at the end of the newly modified buffer.
 *
 * @warning Caller is responsible for guarding against a buffer overrun.
 */
char *insert_content(char *p, char const *content);
#endif /* Header load-once guard clause */

string_plus.c

#include "string_plus.h"
int begins_with(char const *text, char const *search_key) {
 int result = 1;
 while (*search_key != '0円' && (result = *search_key++ == *text++));
 return result;
}
char *condense_buffer(char *p1, char *p2) {
 while (*(p2-1) != '0円') *p1++ =*p2++;
 return p1;
}
char *remove_char_occurrences(char *text, char const c) {
 char *lead = text;
 char *tail = text;
 while (*lead != '0円') {
 while (*lead == c) ++lead;
 *tail++ = *lead++;
 }
 *tail = *lead;
 return text;
}
char *insert_content(char *p, char const *content) {
 int content_len = strlen(content);
 for (char *end = p + strlen(p); end >= p; end--) *(end + content_len) = *end;
 while (*content != '0円') *p++ = *content++;
 return p;
}

test functions

void test_begins_with() {
 Assert(begins_with("Theory", "The") == 1);
 Assert(begins_with("theory", "The") == 0);
 Assert(begins_with("T", "The") == 0);
 Assert(begins_with("Theory", "") == 1);
}
void test_condense_buffer() {
 char buff[] = "hello world";
 condense_buffer(&buff[7], &buff[10]);
 Assert(strcmp("hello wd", buff) == 0);
}
void test_remove_char_occurrences() {
 char s1[] = "hello world";
 Assert(strcmp("hello world", remove_char_occurrences(s1, 'x')) == 0);
 char s2[] = "hello world";
 Assert(strcmp("heo word", remove_char_occurrences(s2, 'l')) == 0);
}
void test_insert_content() {
 char buff[50] = "hello world";
 char *p = insert_content(&buff[5], "BRAVENEW");
 Assert(strcmp("helloBRAVENEW world", buff) == 0);
 Assert(strcmp(" world", p) == 0);
}

Question 2

condense_buffer is equivalent to strcpy. If you still wish to implement it, use an idiomatic C construct:
```
 while ((*p1++ = *p2++) != 0)
 {}
```
begins_with is equivalent to strncmp(text, key, strlen(key)). It might be argued though that strncmp approach requires a call to strlen and thus incurs performance penalty.

Do put the loop body on a separate line. Do not omit curly braces even for one-liners:

 for (char *end = p + strlen(p); end >= p; end--) {
 *(end + content_len) = *end;
 }

Question 3

1. They are only equivalent because p2 is a pointer somewhere into the middle of a c-string. That's for an unreliable and unintuitive precondition. 2. Not sure how the need for strlen in strncmp could be argued. Any hint? 3. Forcing braces for single statements has no advantages. Giving the body it's own line now, sure.

Question 4

@Deduplicator 1. I ain't get it. The code is identical. 2. Because strncmp needs length. 3. It has so many advantages that here is not a place to discuss them, and no disadvantages whatsoever.

Question 5

1. Start condense_buffer at the beginning of a string. 2. Oops, I dropped a thought there. 3. Yep, writing more is always so much better. There should be at least one post on programmers where the camps get into it.

Question 6

Yes! condense_buffer is equivalent to strcpy, I had not realized it. After reanalyzing my codebase, there's no reason why I should keep condense_buffer around. Thanks! That was the sanity check I needed.

Question 7

Thanks for pointing out strncmp too. I'm keeping begins_with, more for semantic reasons than performance, but I did like your idea.

Question 8

Three things strike me when reading through your code:

Loop bodies not indented on separate lines – Quite a few times in this short code segments I wondered why you don't indent stuff when using while. Then I saw that you had loop bodies after the while condition. Do please shift all of these down a line, and always include braces. Your current style will give you issues (read: bugs) sooner or later!
Very limited error handling – You put a lot of trust on the code calling your code as you always expect strings to be 0円-terminated, and have allocated enough memory for your operations to complete. Kind of typical for C, but still not good practice
Missing name convention – If you look at the documentation they are consistent in naming of arguments. I.e. in string.h you see loads of src, dst, c, n which due to their consistency are easily understood. Your names are lacking a little in this regard, and this makes it harder to understand what is happening

Question 9

1. Braceophilia is a recognized disease some people suffer from. Heal it, don't spread it. 2. Well, adding an assert at the start to weed out programmer-error in debug-mode might be a good idea. Anything more is just playing to C's weaknesses instead of strengths.

Question 10

@deduplicator, in this case he does need to add braces, or seriously work with moving code away from after while/if. If not Murphy will come visit him very soon

Question 11

Thank you for your feedback. Indeed, I agree that for (char *end = p + strlen(p); end >= p; end--) *(end + content_len) = *end; needs a line break (and of course, braces!). But I'm comfortable with shorter one-lines: while (*lead == c) ++lead;. For these low level functions, I think the unit-test approach helps keep Murphy at bay more than extra line breaks.

vnp vnp 58.6k4 gold badges55 silver badges144 bronze badges · Answer 1 · 2015-10-14 19:01:50Z

2

\$\begingroup\$

condense_buffer is equivalent to strcpy. If you still wish to implement it, use an idiomatic C construct:
```
 while ((*p1++ = *p2++) != 0)
 {}
```
begins_with is equivalent to strncmp(text, key, strlen(key)). It might be argued though that strncmp approach requires a call to strlen and thus incurs performance penalty.

Do put the loop body on a separate line. Do not omit curly braces even for one-liners:

 for (char *end = p + strlen(p); end >= p; end--) {
 *(end + content_len) = *end;
 }

Share

answered Oct 14, 2015 at 19:01

vnp's user avatar

vnp vnp

58.6k4 gold badges55 silver badges144 bronze badges

\$\endgroup\$

5

\$\begingroup\$ 1. They are only equivalent because p2 is a pointer somewhere into the middle of a c-string. That's for an unreliable and unintuitive precondition. 2. Not sure how the need for strlen in strncmp could be argued. Any hint? 3. Forcing braces for single statements has no advantages. Giving the body it's own line now, sure. \$\endgroup\$

Deduplicator
– Deduplicator

2015年10月14日 20:05:13 +00:00
Commented Oct 14, 2015 at 20:05
\$\begingroup\$ @Deduplicator 1. I ain't get it. The code is identical. 2. Because strncmp needs length. 3. It has so many advantages that here is not a place to discuss them, and no disadvantages whatsoever. \$\endgroup\$

vnp
– vnp

2015年10月14日 20:50:09 +00:00
Commented Oct 14, 2015 at 20:50
\$\begingroup\$ 1. Start condense_buffer at the beginning of a string. 2. Oops, I dropped a thought there. 3. Yep, writing more is always so much better. There should be at least one post on programmers where the camps get into it. \$\endgroup\$

Deduplicator
– Deduplicator

2015年10月14日 20:56:03 +00:00
Commented Oct 14, 2015 at 20:56
\$\begingroup\$ Yes! condense_buffer is equivalent to strcpy, I had not realized it. After reanalyzing my codebase, there's no reason why I should keep condense_buffer around. Thanks! That was the sanity check I needed. \$\endgroup\$

brian_o
– brian_o

2015年10月15日 14:39:40 +00:00
Commented Oct 15, 2015 at 14:39
\$\begingroup\$ Thanks for pointing out strncmp too. I'm keeping begins_with, more for semantic reasons than performance, but I did like your idea. \$\endgroup\$

brian_o
– brian_o

2015年10月15日 14:41:12 +00:00
Commented Oct 15, 2015 at 14:41

Add a comment |

holroy holroy 11.7k1 gold badge27 silver badges59 bronze badges · Answer 2 · 2015-10-14 19:28:48Z

Three things strike me when reading through your code:

Loop bodies not indented on separate lines – Quite a few times in this short code segments I wondered why you don't indent stuff when using while. Then I saw that you had loop bodies after the while condition. Do please shift all of these down a line, and always include braces. Your current style will give you issues (read: bugs) sooner or later!
Very limited error handling – You put a lot of trust on the code calling your code as you always expect strings to be 0円-terminated, and have allocated enough memory for your operations to complete. Kind of typical for C, but still not good practice
Missing name convention – If you look at the documentation they are consistent in naming of arguments. I.e. in string.h you see loads of src, dst, c, n which due to their consistency are easily understood. Your names are lacking a little in this regard, and this makes it harder to understand what is happening

1. Braceophilia is a recognized disease some people suffer from. Heal it, don't spread it. 2. Well, adding an assert at the start to weed out programmer-error in debug-mode might be a good idea. Anything more is just playing to C's weaknesses instead of strengths.
@deduplicator, in this case he does need to add braces, or seriously work with moving code away from after while/if. If not Murphy will come visit him very soon
Thank you for your feedback. Indeed, I agree that for (char *end = p + strlen(p); end >= p; end--) *(end + content_len) = *end; needs a line break (and of course, braces!). But I'm comfortable with shorter one-lines: while (*lead == c) ++lead;. For these low level functions, I think the unit-test approach helps keep Murphy at bay more than extra line breaks.

Stack Exchange Network

Low level string manipulation functions in C

string_plus.h

string_plus.c

test functions

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Low level string manipulation functions in C

string_plus.h

string_plus.c

test functions

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions