Bare-bones string library

Question 1

After years of criticizing others, I've finally found the time and worked up the courage to polish up one of my bits of code and solicit criticisms of my own.

This is a simple dynamic-string library that I wrote for one of my old C projects (don't judge too harshly if you look into it -- it's a few years old, and I've just begun revising it).

Some notes on the design:

- Its interface is fairly heavily influenced by C++'s std::string
- Internally, it manages a dynamically allocated character buffer
- It is encoding agnostic -- it thinks in bytes, not true characters
- For academic interest, I went ahead and did a naive small string optimization

Please note that this is not intended to be a fully functioning, Swiss army knife of a string library. The functionality is very limited, and that's the way I intended it. All it is used for is corralling an HTTP request response into something a bit safer and more flexible than a bare manually managed C-string. There is a possibility that it may grow more complex though, so anything in the interface that could prohibit that would be of great interest to me.

To my knowledge, it is fully compliant with the C99 standard.

string_buffer.h

#ifndef UPMON_STRING_BUFFER_H
#define UPMON_STRING_BUFFER_H
#ifdef __cplusplus
extern "C" {
#endif
#include <stddef.h>
#include <stdbool.h>
typedef struct StringBuffer {
    char* str;
    size_t len;
    size_t cap;
    char small_str[64];
} StringBuffer;
void string_buffer_init(StringBuffer* s);
void string_buffer_cleanup(StringBuffer* s);
const char* string_buffer_cstr(const StringBuffer* s);
size_t string_buffer_length(const StringBuffer* s);
bool string_buffer_set_bytes(StringBuffer* s, const char* str, size_t len);
bool string_buffer_set_cstr(StringBuffer* s, const char* str);
bool string_buffer_set_string_buffer(StringBuffer* dst, const StringBuffer* src);
bool string_buffer_append_bytes(StringBuffer* s, const char* str, size_t len);
bool string_buffer_append_cstr(StringBuffer* s, const char* str);
bool string_buffer_append_string_buffer(StringBuffer* dst, const StringBuffer* src);
void string_buffer_clear(StringBuffer* s);
#ifdef __cplusplus
}
#endif
#endif

string_buffer.c

#include "string_buffer.h"
#include <stdlib.h>
#include <string.h>
void string_buffer_init(StringBuffer* s) {
    s->small_str[0] = '\0';
    s->str = NULL;
    s->len = 0;
    s->cap = sizeof(s->small_str);
}

void string_buffer_cleanup(StringBuffer* s) {
    free(s->str);
    s->len = 0;
    s->cap = sizeof(s->small_str);
}

const char* string_buffer_cstr(const StringBuffer* s) {
    return (s->cap <= sizeof(s->small_str)) ? s->small_str : s->str;
}

static char* string_buffer_buf(StringBuffer* s) {
    return (char*) string_buffer_cstr(s);
}

size_t string_buffer_length(const StringBuffer* s) {
    return s->len;
}

// Currently this is a hidden function, but if ever necessitated, it can be
// exposed in the interface.
static bool string_buffer_reserve(StringBuffer* s, size_t min_cap) {
    if (s->cap >= min_cap) {
        return true;
    }
    char* new_buf = realloc(s->str, min_cap);
    if (new_buf == NULL) {
        return false;
    }
    // If we're moving from small_str to a buffer, we need to copy over the small_str.
    if (s->str == NULL) {
        memcpy(new_buf, s->small_str, sizeof(s->small_str));
    }
    s->str = new_buf;
    s->cap = min_cap;
    return true;
}

bool string_buffer_set_bytes(StringBuffer* s, const char* str, size_t len) {
    if (!string_buffer_reserve(s, len + 1)) {
        return false;
    }
    char* buf = string_buffer_buf(s);
    memcpy(buf, str, len);
    buf[len] = '\0';
    s->len = len;
    return true;
}

bool string_buffer_set_cstr(StringBuffer* s, const char* str) {
    return string_buffer_set_bytes(s, str, strlen(str));
}

bool string_buffer_set_string_buffer(StringBuffer* dst, const StringBuffer* src) {
    return string_buffer_set_bytes(dst, string_buffer_cstr(src), string_buffer_length(src));
}

bool string_buffer_append_bytes(StringBuffer* s, const char* str, size_t len) {
    if (!string_buffer_reserve(s, s->len + len + 1)) {
        return false;
    }
    char* dst = string_buffer_buf(s) + s->len;
    memcpy(dst, str, len);
    dst[len] = '\0';
    s->len += len;
    return true;
}

bool string_buffer_append_cstr(StringBuffer* s, const char* str) {
    return string_buffer_append_bytes(s, str, strlen(str));
}

bool string_buffer_append_string_buffer(StringBuffer* dst, const StringBuffer* src) {
    return string_buffer_append_bytes(dst, string_buffer_cstr(src), string_buffer_length(src));
}

void string_buffer_clear(StringBuffer* s) {
    s->len = 0;
    string_buffer_buf(s)[0] = '\0';
}

Question 2

A small string optimization can use up to sizeof(StringBuffer)-1 bytes of the structure itself (via a union). I think I got this trick from libc++: The first bit indicates whether the SSO is active, the rest of the first byte is used as the size of the SSO string. This of course requires that the size is the first field.
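To make the tag-bit idea concrete, here is a minimal sketch of a simplified variant. It is an assumption-laden illustration, not libc++'s actual layout: instead of overlapping the size field, it spends a dedicated first byte as the tag, and every name in it (SsoString, sso_init, sso_set_small, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: both views begin with the same tag byte, so the
   tag may be read through either member (common initial sequence). */
typedef union SsoString {
    struct {
        unsigned char tag;   /* bit 0 set => small; bits 1..7 hold the length */
        char data[31];       /* inline bytes, always NUL-terminated */
    } small;
    struct {
        unsigned char tag;   /* bit 0 clear => the heap view is active */
        char* ptr;
        size_t len;
        size_t cap;
    } large;
} SsoString;

enum { SSO_MAX_LEN = 30 };   /* sizeof data minus the terminator */

void sso_init(SsoString* s) {
    assert(s != NULL);
    s->small.tag = 1u;       /* small mode, length 0 */
    s->small.data[0] = '\0';
}

bool sso_is_small(const SsoString* s) {
    return (s->small.tag & 1u) != 0;
}

size_t sso_length(const SsoString* s) {
    return sso_is_small(s) ? (size_t)(s->small.tag >> 1) : s->large.len;
}

/* Stores len bytes inline; returns false (and changes nothing) if they
   do not fit in the inline storage. */
bool sso_set_small(SsoString* s, const char* bytes, size_t len) {
    if (len > SSO_MAX_LEN) {
        return false;
    }
    memcpy(s->small.data, bytes, len);
    s->small.data[len] = '\0';
    s->small.tag = (unsigned char)((len << 1) | 1u);
    return true;
}
```

The heap path (allocating and switching to the large view) is left out on purpose; the point is only that one byte can carry both the mode flag and the short length, so no separate small-string length field is needed.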

Question 3

@dyp now you've made me want to toss my approach and do that :). The only downside is that you limit yourself to very short strings that way. I guess a hybrid is always possible though. I read an article about it by Scott Meyers (I think?) a while back that I need to dig up. I think one of the standard implementations does actually have a bit of extra padding for that purpose. Might be worth replicating :p.

Question 4

I'm not very good at this, but couldn't you put str and small_str in a union?

Question 5

@user1737909 hmmm, yes I believe I could. With an anonymous union, I don't think I'll even need to change any other code :). Good catch!
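A rough sketch of what the union version might look like follows. Two caveats are assumptions of this sketch rather than anything stated in the thread: anonymous unions are a C11 feature (so strict C99 would need a named union), and once str and small_str share storage, the code can no longer test str against NULL to decide which one is active; cap has to be the discriminator. The UnionBuffer names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { SMALL_CAP = 64 };

typedef struct UnionBuffer {
    size_t len;
    size_t cap;                  /* cap <= SMALL_CAP => small_str is active */
    union {                      /* anonymous union: C11, not C99 */
        char* str;
        char small_str[SMALL_CAP];
    };
} UnionBuffer;

void union_buffer_init(UnionBuffer* s) {
    assert(s != NULL);
    s->len = 0;
    s->cap = SMALL_CAP;
    s->small_str[0] = '\0';
}

const char* union_buffer_cstr(const UnionBuffer* s) {
    return (s->cap <= SMALL_CAP) ? s->small_str : s->str;
}

/* Grows to at least min_cap bytes. The active member is chosen by cap,
   never by comparing str with NULL. */
bool union_buffer_reserve(UnionBuffer* s, size_t min_cap) {
    if (s->cap >= min_cap) {
        return true;
    }
    if (s->cap <= SMALL_CAP) {
        /* small -> heap: copy the bytes out before the pointer
           overwrites the storage they live in */
        char* heap = malloc(min_cap);
        if (heap == NULL) {
            return false;
        }
        memcpy(heap, s->small_str, s->len + 1);
        s->str = heap;
    } else {
        char* heap = realloc(s->str, min_cap);
        if (heap == NULL) {
            return false;
        }
        s->str = heap;
    }
    s->cap = min_cap;
    return true;
}

void union_buffer_cleanup(UnionBuffer* s) {
    if (s->cap > SMALL_CAP) {
        free(s->str);
    }
    union_buffer_init(s);
}
```

The struct shrinks by one pointer, at the cost of reserve and cleanup having to branch on cap before touching str.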

Question 6

Overall, I think this is very well done. There is one major thing I find wrong with it though:

Where is the documentation?!?!?!

Sure, as a developer using this library I could read through your short source file and pick it apart to figure out how everything should work. But my time is better spent elsewhere, reading the documentation of other larger projects and programming my own code. That could even be the deciding factor between using this library and using a similar (even inferior) library that had good documentation for me to read.

Question 7

Hrmm, while I certainly agree with you in theory (+1 :p), I can't quite find myself agreeing in this particular circumstance. I guess I misspoke calling it a library, as it will likely never see the light of day outside of the larger project it's part of. That pretty quickly eliminates 2/3 of the types of documentation.

Question 8

As far as the actual consumption of the interface, is there anything confusing? It follows standard C conventions, and I've tried to give everything descriptive names. It's easy to be blinded though since I wrote it. I often have trouble deciding when to document interfaces inside of headers and when it's just noise. I actually originally had this thoroughly documented, but it just felt like clutter. All the docblocks were basically proper sentences formed of the words in the function and argument names. I guess one could easily argue that it's better to have too much documentation though.

Question 9

I've thought about it a lot since earlier, and I think you're right even for small projects. I've since gone through and documented my code. It was good both for the future, when I inevitably abandon this project and return to it again, and for collecting my thoughts. Anyway, I'll likely accept this answer, but I'm going to give it a day or two more just in case :).

Question 10

@Corbin Sorry for the late reply. Even though you wrote the project and may never release it, I still think that it would be a good idea to go ahead and document it. What if you leave the project for a few months or years and forget that your library thinks in bytes, not true characters? Anyways, it looks like you have also come to this conclusion, and I know you will be thanking yourself later for doing so (take it from someone that made this mistake before).

Question 11

1. Nothing stops calling code from calling string_buffer_cleanup() again after it has already been called. A defensive routine would allow repeated calls.
```
void string_buffer_cleanup(StringBuffer* s) {
    free(s->str);
    string_buffer_init(s);
}
```
2. string_buffer_reserve() does not appear to shift from allocated memory back to the small buffer when the contents would fit. Code does shift from the small buffer to allocated memory. This asymmetry, IMO, should be eliminated.
3. Since code is using #ifdef __cplusplus extern "C" { #endif, suggest tagging post as C++ also.
4. Pedantic code would ensure s->len + len + 1 (and others) do not overflow.
5. As this code allows string_buffer_set_bytes(), whose input may have embedded '\0' bytes, the result of string_buffer_length(s) may not equal strlen(string_buffer_cstr(s)).
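For the overflow point, one possible shape of the check is sketched below. The helper name append_capacity is hypothetical; the idea is simply to compare against SIZE_MAX before adding, so the sum can never wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Computes len + extra + 1 (room for the terminator) into *out.
   Returns false instead of wrapping around if the sum does not fit
   in a size_t. */
bool append_capacity(size_t len, size_t extra, size_t* out) {
    assert(out != NULL);
    /* len + extra + 1 <= SIZE_MAX  <=>  extra < SIZE_MAX - len,
       and SIZE_MAX - len itself can never underflow. */
    if (extra >= SIZE_MAX - len) {
        return false;
    }
    *out = len + extra + 1;
    return true;
}
```

string_buffer_append_bytes() could call something like this first and return false on failure, the same way it already does when the reservation fails.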

Question 12

With respect to #1, I just now realized that I forgot to null-out the str pointer. I don't think calling cleanup twice should ever be considered valid, but you're right that it might as well make things fully safe rather than the half-ass (and wrong) safety that the current code does. As for #5, I should have been a bit clearer about that in the original post -- that's actually by design. As for #3... I don't think a C++ tag is appropriate for this. Making it accessible from C++ doesn't make it C++. The main C++ criticism anyway would be "use std::string." :p
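Since embedded NUL bytes are allowed by design, a caller that needs to hand the contents to strlen-style code may want a way to detect them. A minimal sketch, using a hypothetical helper name and only the standard memchr:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if any of the first len bytes is '\0', i.e. if strlen()
   on the same data would report fewer than len bytes. */
bool has_embedded_nul(const char* bytes, size_t len) {
    assert(bytes != NULL);
    return memchr(bytes, '\0', len) != NULL;
}
```

It would be called as has_embedded_nul(string_buffer_cstr(s), string_buffer_length(s)) before passing the buffer to anything that treats it as a plain C-string.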

Question 13

I'd declare

struct StringBuffer_s {
 ....
};

straight in StringBuffer.c, so that StringBuffer.h would have just

typedef struct StringBuffer_s StringBuffer;

to isolate client code from implementation changes.
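A sketch of how that split might look, with hypothetical names (OpaqueBuffer, opaque_buffer_create, opaque_buffer_destroy) and both halves shown in one listing for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* --- what the header would expose --- */
typedef struct OpaqueBuffer_s OpaqueBuffer;
OpaqueBuffer* opaque_buffer_create(void);
void opaque_buffer_destroy(OpaqueBuffer* s);
size_t opaque_buffer_length(const OpaqueBuffer* s);

/* --- what only the .c file would see --- */
struct OpaqueBuffer_s {
    char* str;
    size_t len;
    size_t cap;
};

OpaqueBuffer* opaque_buffer_create(void) {
    OpaqueBuffer* s = malloc(sizeof *s);
    if (s == NULL) {
        return NULL;
    }
    s->str = NULL;
    s->len = 0;
    s->cap = 0;
    return s;
}

void opaque_buffer_destroy(OpaqueBuffer* s) {
    if (s == NULL) {
        return;              /* mirrors free(NULL) being a no-op */
    }
    free(s->str);
    free(s);
}

size_t opaque_buffer_length(const OpaqueBuffer* s) {
    assert(s != NULL);
    return s->len;
}
```

Because clients only ever see the incomplete type, they cannot declare an OpaqueBuffer directly; every instance has to come from opaque_buffer_create(), which means one heap allocation per string.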

Question 14

I considered going that route, but then it could not be allocated on the stack, and I considered that an unacceptable limitation and overhead for cases of small strings.

Question 15

All it is used for is corralling an HTTP request response into something a bit safer and more flexible than a bare manually managed C-string.

Therefore I will be considering safety concerns, if you don't mind.

string_buffer.h

__cplusplus: I don't know whether this string library is used in C++ too or not. Either way, I would personally avoid that: C++ has std::string. And if, for some reason, you are using this library to handle strings in C++, just don't. std::string is likely to outperform your library.
StringBuffer declaration: You can omit the struct name when using typedef. I changed len and cap to length and capacity, as they look clearer.
```
typedef struct {
    char* str;
    size_t length;
    size_t capacity;
    char small_str[64];
} StringBuffer;
```

string_buffer.c

I do like the small_str, which cuts down on small allocations. When you realloc the pointer, though, you ask for exactly min_cap. After the memcpy, the buffer is completely full (capacity == length + 1), so the next append has to reallocate again.

Thus, I would recommend allocating more bytes than requested, e.g. min_cap + sizeof(small_str) ^*. This is useful if the library often handles big strings.

^* I would turn sizeof(small_str) into a named constant, something like default_size, since it appears frequently.
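As an illustration of over-allocating, the sketch below uses doubling rather than the fixed padding suggested above. That is a swapped-in, commonly used alternative, not what the original code or this answer prescribes, and grow_capacity is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns a capacity >= min_cap, doubling the current capacity when
   that is enough, and saturating at SIZE_MAX instead of overflowing. */
size_t grow_capacity(size_t current, size_t min_cap) {
    size_t doubled = (current > SIZE_MAX / 2) ? SIZE_MAX : current * 2;
    size_t result = (doubled > min_cap) ? doubled : min_cap;
    assert(result >= min_cap);
    return result;
}
```

string_buffer_reserve() could pass grow_capacity(s->cap, min_cap) to realloc instead of min_cap, so that a run of small appends triggers only a logarithmic number of reallocations.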

Overall design

There is no check for NULL pointers. That's gonna lead to lots of SIGSEGVs if you accidentally forget to call string_buffer_init, for example.
It's important that the function names make clear that a call belongs to an external library, and the string_buffer_ prefix does that.

Nevertheless, I'd go for a shorter prefix:
- stringbuffer
- stringBuffer
- sb

rather than string_buffer. It should be a compact name; consider e.g. glPushMatrix() or alcOpenDevice().

Question 16

With regards to Design #1: what is the other option? In my opinion, fast, hard failure is an acceptable outcome of misuse. Anything else could become backwards compatibility baggage that could never be escaped.
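For comparison, the two options being weighed here might look like the sketch below. The struct repeats the fields of the one under review so the listing stands alone; the two function names are hypothetical and not part of the posted interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct StringBuffer {    /* same fields as the struct under review */
    char* str;
    size_t len;
    size_t cap;
    char small_str[64];
} StringBuffer;

/* Option A: report misuse through the return value and let the
   caller decide what to do about it. */
bool string_buffer_try_length(const StringBuffer* s, size_t* out) {
    if (s == NULL || out == NULL) {
        return false;
    }
    *out = s->len;
    return true;
}

/* Option B: fast, hard failure. The check exists in debug builds and
   compiles away entirely when NDEBUG is defined. */
size_t string_buffer_length_checked(const StringBuffer* s) {
    assert(s != NULL);
    return s->len;
}
```

Neither option can detect a struct that was never passed to string_buffer_init(): an uninitialized StringBuffer has a perfectly valid, non-NULL address, and only its contents are garbage.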

Question 17

By that logic you should also remove the checks for realloc and malloc. In general, a library always pays attention to error codes and lets the user handle them properly. A library should neither take over (e.g. exit) nor handle errors itself.

Question 18

Moreover, we are talking of just a few basic instructions. We've got modern CPUs, which come along with multiple cache levels, branch predictors and so forth.

Question 19

Whoops, got a bit side tracked on my first comment now that I reread it (have since deleted it). Anyway, my point is that the user is required by implicit contract to initialize the struct. Comparing it to realloc isn't the same comparison. Checking the return of realloc is analogous to checking the return of string_buffer_init (which doesn't return since it can't fail). Not calling string_buffer_init is more like not calling realloc at all than not checking the return value. Would you be surprised that char* a; a[3] = 'c'; seg faults? Of course not. It's user error.

Accepted answer: syb0rg, 2014-06-08 18:28:34Z
