I'm working with a legacy program that does a bunch of markup manipulation, some of it higher-level and specific to the proprietary markup, some of it lower level string stuff.
I rewrote and documented the customized low level stuff (and used string.h stuff where applicable). I'm not not usually a C-coder, but I'm trying to learn.
- Am I duplicating core functionality of a standard library?
- Am I following conventions/best practices?
- Is the code efficient?
string_plus.h
/**
* @file
*
* A small library of additional low level string manipulation functions. If any
* functionality is duplicated from any of the normal libraries, it is due to
* ignorance.
*/
#ifndef STRING_PLUS_H
#define STRING_PLUS_H
#include <string.h>
/**
* Returns 1 if the front of @p text matches @p search_key, else 0.
*/
int begins_with(char const *text, char const *search_key);
/**
* Condenses a string buffer by sliding all contents at @p p2 down to @p p1. In
* other words, anything in between @p p1 and @p p2 is deleted. Returns @p p1.
*/
char *condense_buffer(char *p1, char *p2);
/**
* Modifies @p text, removing all occurrences of @p c. Returns @p text.
*/
char *remove_char_occurrences(char *text, char const c);
/**
* Inserts @p content into a buffer at @p p by shifting the buffer's data by @p
* content's size. Returns a pointer at the end of the newly modified buffer.
*
* @warning Caller is responsible for guarding against a buffer overrun.
*/
char *insert_content(char *p, char const *content);
#endif /* Header load-once guard clause */
string_plus.c
#include "string_plus.h"
int begins_with(char const *text, char const *search_key) {
int result = 1;
while (*search_key != '0円' && (result = *search_key++ == *text++));
return result;
}
char *condense_buffer(char *p1, char *p2) {
while (*(p2-1) != '0円') *p1++ =*p2++;
return p1;
}
char *remove_char_occurrences(char *text, char const c) {
char *lead = text;
char *tail = text;
while (*lead != '0円') {
while (*lead == c) ++lead;
*tail++ = *lead++;
}
*tail = *lead;
return text;
}
char *insert_content(char *p, char const *content) {
int content_len = strlen(content);
for (char *end = p + strlen(p); end >= p; end--) *(end + content_len) = *end;
while (*content != '0円') *p++ = *content++;
return p;
}
test functions
void test_begins_with() {
Assert(begins_with("Theory", "The") == 1);
Assert(begins_with("theory", "The") == 0);
Assert(begins_with("T", "The") == 0);
Assert(begins_with("Theory", "") == 1);
}
void test_condense_buffer() {
char buff[] = "hello world";
condense_buffer(&buff[7], &buff[10]);
Assert(strcmp("hello wd", buff) == 0);
}
void test_remove_char_occurrences() {
char s1[] = "hello world";
Assert(strcmp("hello world", remove_char_occurrences(s1, 'x')) == 0);
char s2[] = "hello world";
Assert(strcmp("heo word", remove_char_occurrences(s2, 'l')) == 0);
}
void test_insert_content() {
char buff[50] = "hello world";
char *p = insert_content(&buff[5], "BRAVENEW");
Assert(strcmp("helloBRAVENEW world", buff) == 0);
Assert(strcmp(" world", p) == 0);
}
2 Answers 2
condense_buffer
is equivalent tostrcpy
. If you still wish to implement it, use an idiomaticC
construct:while ((*p1++ = *p2++) != 0) {}
begins_with
is equivalent tostrncmp(text, key, strlen(key))
. It might be argued though thatstrncmp
approach requires a call tostrlen
and thus incurs performance penalty.Do put the loop body on a separate line. Do not omit curly braces even for one-liners:
for (char *end = p + strlen(p); end >= p; end--) { *(end + content_len) = *end; }
-
\$\begingroup\$ 1. They are only equivalent because
p2
is a pointer somewhere into the middle of a c-string. That's for an unreliable and unintuitive precondition. 2. Not sure how the need forstrlen
instrncmp
could be argued. Any hint? 3. Forcing braces for single statements has no advantages. Giving the body it's own line now, sure. \$\endgroup\$Deduplicator– Deduplicator2015年10月14日 20:05:13 +00:00Commented Oct 14, 2015 at 20:05 -
\$\begingroup\$ @Deduplicator 1. I ain't get it. The code is identical. 2. Because
strncmp
needs length. 3. It has so many advantages that here is not a place to discuss them, and no disadvantages whatsoever. \$\endgroup\$vnp– vnp2015年10月14日 20:50:09 +00:00Commented Oct 14, 2015 at 20:50 -
\$\begingroup\$ 1. Start
condense_buffer
at the beginning of a string. 2. Oops, I dropped a thought there. 3. Yep, writing more is always so much better. There should be at least one post on programmers where the camps get into it. \$\endgroup\$Deduplicator– Deduplicator2015年10月14日 20:56:03 +00:00Commented Oct 14, 2015 at 20:56 -
\$\begingroup\$ Yes!
condense_buffer
is equivalent tostrcpy
, I had not realized it. After reanalyzing my codebase, there's no reason why I should keepcondense_buffer
around. Thanks! That was the sanity check I needed. \$\endgroup\$brian_o– brian_o2015年10月15日 14:39:40 +00:00Commented Oct 15, 2015 at 14:39 -
\$\begingroup\$ Thanks for pointing out
strncmp
too. I'm keepingbegins_with
, more for semantic reasons than performance, but I did like your idea. \$\endgroup\$brian_o– brian_o2015年10月15日 14:41:12 +00:00Commented Oct 15, 2015 at 14:41
Three things strike me when reading through your code:
- Loop bodies not indented on separate lines – Quite a few times in this short code segments I wondered why you don't indent stuff when using
while
. Then I saw that you had loop bodies after thewhile
condition. Do please shift all of these down a line, and always include braces. Your current style will give you issues (read: bugs) sooner or later! - Very limited error handling – You put a lot of trust on the code calling your code as you always expect strings to be
0円
-terminated, and have allocated enough memory for your operations to complete. Kind of typical for C, but still not good practice - Missing name convention – If you look at the documentation they are consistent in naming of arguments. I.e. in
string.h
you see loads ofsrc
,dst
,c
,n
which due to their consistency are easily understood. Your names are lacking a little in this regard, and this makes it harder to understand what is happening
-
1\$\begingroup\$ 1. Braceophilia is a recognized disease some people suffer from. Heal it, don't spread it. 2. Well, adding an assert at the start to weed out programmer-error in debug-mode might be a good idea. Anything more is just playing to C's weaknesses instead of strengths. \$\endgroup\$Deduplicator– Deduplicator2015年10月14日 19:55:57 +00:00Commented Oct 14, 2015 at 19:55
-
\$\begingroup\$ @deduplicator, in this case he does need to add braces, or seriously work with moving code away from after while/if. If not Murphy will come visit him very soon \$\endgroup\$holroy– holroy2015年10月15日 07:04:40 +00:00Commented Oct 15, 2015 at 7:04
-
\$\begingroup\$ Thank you for your feedback. Indeed, I agree that
for (char *end = p + strlen(p); end >= p; end--) *(end + content_len) = *end;
needs a line break (and of course, braces!). But I'm comfortable with shorter one-lines:while (*lead == c) ++lead;
. For these low level functions, I think the unit-test approach helps keep Murphy at bay more than extra line breaks. \$\endgroup\$brian_o– brian_o2015年10月15日 14:35:59 +00:00Commented Oct 15, 2015 at 14:35