After years of criticizing others, I've finally found the time and worked up the courage to polish up one of my bits of code and solicit criticisms of my own.
This is a simple dynamic-string library that I wrote for one of my old C projects (don't judge too harshly if you look into it -- it's a few years old, and I've just begun revising it).
Some notes on the design:
- Its interface is fairly heavily influenced by C++'s
std::string
- Internally, it manages a dynamically allocated character buffer
- It is encoding agnostic -- it thinks in bytes, not true characters
- For academic interest, I went ahead and did a naive small string optimization
Please note that this is not intended to be a fully functioning, Swiss army knife of a string library. The functionality is very limited, and that's the way I intended it. All it is used for is corralling an HTTP request response into something a bit safer and more flexible than a bare manually managed c-string. There is a possibility that it may grow more complex though, so anything in the interface that could prohibit that would be of great interest to me.
To my knowledge, it is fully compliant to the C99 standard.
string_buffer.h
#ifndef UPMON_STRING_BUFFER_H
#define UPMON_STRING_BUFFER_H
#ifdef __cplusplus
extern "C" {
#endif
#include <stddef.h>
#include <stdbool.h>
typedef struct StringBuffer {
char* str;
size_t len;
size_t cap;
char small_str[64];
} StringBuffer;
void string_buffer_init(StringBuffer* s);
void string_buffer_cleanup(StringBuffer* s);
const char* string_buffer_cstr(const StringBuffer* s);
size_t string_buffer_length(const StringBuffer* s);
bool string_buffer_set_bytes(StringBuffer* s, const char* str, size_t len);
bool string_buffer_set_cstr(StringBuffer* s, const char* str);
bool string_buffer_set_string_buffer(StringBuffer* dst, const StringBuffer* src);
bool string_buffer_append_bytes(StringBuffer* s, const char* str, size_t len);
bool string_buffer_append_cstr(StringBuffer* s, const char* str);
bool string_buffer_append_string_buffer(StringBuffer* dst, const StringBuffer* src);
void string_buffer_clear(StringBuffer* s);
#ifdef __cplusplus
}
#endif
#endif
string_buffer.c
#include "string_buffer.h"
#include <stdlib.h>
#include <string.h>
void string_buffer_init(StringBuffer* s) {
s->small_str[0] = '0円';
s->str = NULL;
s->len = 0;
s->cap = sizeof(s->small_str);
}
void string_buffer_cleanup(StringBuffer* s) {
free(s->str);
s->len = 0;
s->cap = sizeof(s->small_str);
}
const char* string_buffer_cstr(const StringBuffer* s) {
return (s->cap <= sizeof(s->small_str)) ? s->small_str : s->str;
}
static char* string_buffer_buf(StringBuffer* s) {
return (char*) string_buffer_cstr(s);
}
size_t string_buffer_length(const StringBuffer* s) {
return s->len;
}
// Currently this is a hidden function, but if ever necessitated, it can be
// exposed in the interface.
static bool string_buffer_reserve(StringBuffer* s, size_t min_cap) {
if (s->cap >= min_cap) {
return true;
}
char* new_buf = realloc(s->str, min_cap);
if (new_buf == NULL) {
return false;
}
// If we're moving from small_str to a buffer, we need to copy over the small_str.
if (s->str == NULL) {
memcpy(new_buf, s->small_str, sizeof(s->small_str));
}
s->str = new_buf;
s->cap = min_cap;
return true;
}
bool string_buffer_set_bytes(StringBuffer* s, const char* str, size_t len) {
if (!string_buffer_reserve(s, len + 1)) {
return false;
}
char* buf = string_buffer_buf(s);
memcpy(buf, str, len);
buf[len] = '0円';
s->len = len;
return true;
}
bool string_buffer_set_cstr(StringBuffer* s, const char* str) {
return string_buffer_set_bytes(s, str, strlen(str));
}
bool string_buffer_set_string_buffer(StringBuffer* dst, const StringBuffer* src) {
return string_buffer_set_bytes(dst, string_buffer_cstr(src), string_buffer_length(src));
}
bool string_buffer_append_bytes(StringBuffer* s, const char* str, size_t len) {
if (!string_buffer_reserve(s, s->len + len + 1)) {
return false;
}
char* dst = string_buffer_buf(s) + s->len;
memcpy(dst, str, len);
dst[len] = '0円';
s->len += len;
return true;
}
bool string_buffer_append_cstr(StringBuffer* s, const char* str) {
return string_buffer_append_bytes(s, str, strlen(str));
}
bool string_buffer_append_string_buffer(StringBuffer* dst, const StringBuffer* src) {
return string_buffer_append_bytes(dst, string_buffer_cstr(src), string_buffer_length(src));
}
void string_buffer_clear(StringBuffer* s) {
s->len = 0;
string_buffer_buf(s)[0] = '0円';
}
4 Answers 4
Overall, I think this is very well done. There is one major thing I find wrong with it though:
Where is the documentation?!?!?!
Sure, as a developer using this library I could read through your short source file and pick it apart to figure out how everything should work. But my time is better spent elsewhere, reading the documentation of other larger projects and programming my own code. That could actually even be a breaking point between using this library, and using a similar (even inferior) library that had good documentation for me to read.
-
\$\begingroup\$ Hrmm, while I certainly agree with you in theory (+1 :p), I can't quite find myself agreeing in this particular circumstance. I guess I misspoke calling it a library as it will likely never escape the light of day outside of the larger project its part of. That pretty quickly eliminates 2/3 of the types of documentation. \$\endgroup\$Corbin– Corbin2014年06月09日 00:37:44 +00:00Commented Jun 9, 2014 at 0:37
-
\$\begingroup\$ As far as the actual consumption of the interface, is there anything confusing? It follows standard C conventions, and I've tried to give everything descriptive names. It's easy to be blinded though since I wrote it. I often have trouble deciding when to document interfaces inside of headers and when it's just noise. I actually originally had this thoroughly documented, but it just felt like clutter. All the docblocks were basically proper sentences formed of the words in the function and argument names. I guess one could easily argue that it's better to have too much documentation though. \$\endgroup\$Corbin– Corbin2014年06月09日 00:39:49 +00:00Commented Jun 9, 2014 at 0:39
-
\$\begingroup\$ I've thought about it a lot since earlier, and I think you're right even for small projects. I've since gone through and documented my code. It was good both for in the future when I inevitably abandon this project and return to it again and for collecting my thoughts. Anyway, I'll likely accept this answer, but I'm going to give it a day or two more just in case :). \$\endgroup\$Corbin– Corbin2014年06月09日 10:32:07 +00:00Commented Jun 9, 2014 at 10:32
-
2\$\begingroup\$ @Corbin Sorry for the late reply. Even though you wrote the project and may never release it, I still think that it would be a good idea to go ahead and document it. What if you leave the project for a few months or years and forget that your library thinks in bytes, not true characters? Anyways, it looks like you have also come to this conclusion, and I know you will be thanking yourself later for doing so (take it from someone that made this mistake before). \$\endgroup\$syb0rg– syb0rg2014年06月09日 14:15:13 +00:00Commented Jun 9, 2014 at 14:15
Calling code is not limited to never calling
string_buffer_cleanup()
after it has been called. A defensive routine would allow repeated calls.void string_buffer_cleanup(StringBuffer* s) { free(s->str); string_buffer_init(s); }
string_buffer_reserve()
does not appear to shift from allocated memory to using the small buffer if able. Code does so going from small buf to allocated memory. This asymmetry, IMO, should be eliminated.Since code is using
#ifdef __cplusplus extern "C" { #endif
, suggest tagging post as C++ also.Pedantic code would insure
s->len + len + 1
(and others) do not overflow.As this code allows
string_buffer_set_bytes()
which may have embedded'0円'
, the results ofstring_buffer_length(s)
may not equalstrlen(string_buffer_cstr(s))
.
-
1\$\begingroup\$ With respect to #1, I just now realized that I forgot to null-out the
str
pointer. I don't think calling cleanup twice should ever be considered valid, but you're right that it might as well make things fully safe rather than the half-ass (and wrong) safety that the current code does. As for #5, I should have been a bit clearer about that in the original post -- that's actually by design. As for #3... I don't think a C++ tag is appropriate for this. Making it accessible from C++ doesn't make it C++. The main C++ criticism anyway would be "use std::string." :p \$\endgroup\$Corbin– Corbin2014年06月09日 21:41:13 +00:00Commented Jun 9, 2014 at 21:41
I'd declare
struct StringBuffer_s {
....
};
straight in StringBuffer.c
, so that StringBuffer.h
would have just
typedef struct StringBuffer_s StringBuffer;
to isolate client code from implementation changes.
-
\$\begingroup\$ I considered going that route, but then it could not be allocated on the stack, and I considered that an unacceptable limitation and overhead for cases of small strings. \$\endgroup\$Corbin– Corbin2014年06月08日 21:22:37 +00:00Commented Jun 8, 2014 at 21:22
All it is used for is corralling an HTTP request response into something a bit safer and more flexible than a bare manually managed C-string.
Therefore I will be considering safety concerns, if you don't mind.
string_buffer.h
__cplusplus: I don't know whether this string library is used in C++ too or not. Either way, I would personally avoid that: C++ has
std::string
. And if, for some reason, you are using this library to handle strings...just don't again.std::string
is likely to outperform your library.StringBuffer declaration: You can omit the struct name when using
typedef
. I changedlen
andcap
tolength
andcapacity
, as they look clearer.typedef struct { char* str; size_t length; size_t capacity; char small_str[64]; } StringBuffer;
string_buffer.cpp
I do like the
small_str
which cuts down on small allocations. When yourealloc
the pointer, though, you just ask formin_cap
. After thememcpy
,length == capacity
.Thus, I would recommend you to allocate more bytes e.g.
min_cap
+sizeof(small_str)
*. You can find this useful if the library often handles big strings.
* I would turn sizeof(small_str)
into a constant, kind of like default_size
, since it appears frequently.
Overall design
- There is no check for
NULL
pointers. That's gonna lead to lots ofSIGSEGV
s if you accidentally forget to callstring_buffer_init
, for example. It's important to point out that you're actually using a function which belongs to an external library.
Nevertheless, I'd go for:
stringbuffer
stringBuffer
sb
rather thanstring_builder
.It should be a compact name; consider e.g.
glPushMatrix()
oralcOpenDevice
.
-
\$\begingroup\$ With regards to Design #1: what is the other option? In my opinion, fast, hard failure is an acceptable outcome of misuse. Anything else could become backwards compatibility baggage that could never be escaped. \$\endgroup\$Corbin– Corbin2014年06月09日 21:36:45 +00:00Commented Jun 9, 2014 at 21:36
-
\$\begingroup\$ Therefore you should also remove the checks for
realloc
andmalloc
. In general, a library always pays attention to error codes and lets the user handle them properly. A library should never either take over (e.g. exit) or handle errors. \$\endgroup\$edmz– edmz2014年06月10日 09:44:49 +00:00Commented Jun 10, 2014 at 9:44 -
\$\begingroup\$ Moreover, we are talking of just a few basic instructions. We've got modern CPUs, which come along with multiple cache levels, branch predictors and so forth. \$\endgroup\$edmz– edmz2014年06月10日 09:52:42 +00:00Commented Jun 10, 2014 at 9:52
-
\$\begingroup\$ Whoops, got a bit side tracked on my first comment now that I reread it (have since deleted it). Anyway, my point is that the user is required by implicit contract to initialize the struct. Comparing it to realloc isn't the same comparison. Checking the return of realloc is analogous to checking the return of
string_buffer_init
(which doesn't return since it can't fail). Not callingstring_buffer_init
is more like not callingrealloc
at all than not checking the return value. Would you be surprised thatchar* a; a[3] = 'c';
seg faults? Of course not. It's user error. \$\endgroup\$Corbin– Corbin2014年06月10日 11:12:00 +00:00Commented Jun 10, 2014 at 11:12
Explore related questions
See similar questions with these tags.
sizeof(StringBuffer)-1
bytes of the structure itself (via a union). I think I got this trick from libc++: The first bit indicates whether the SSO is active, the rest of the first byte is used as the size of the SSO string. This of course requires that the size is the first field. \$\endgroup\$