I'm working on my own generic-type dynamic array in C99 and I would like to hear your input, suggestions, ideas, anything!
What I most care about is: 1) correctness and 2) performance.
So far, I've tested it with both standard malloc
and mimalloc, although with the standard implementation it seems to run about 50% faster.
I've also run benchmarks against libsrt, vec and Nim's implementation of dynamic sequences, and my code seems to beat them all.
Also, it's important to note that this code is not meant to be used as a library, but as part of a bytecode VM/interpreter that I've been working on - which means I am the one who is going to use it.
Here's the source:
#ifndef __VARRAY_H__
#define __VARRAY_H__

/**************************************
 Type definitions
 **************************************/

#define Array(TYPE) \
    struct { \
        size_t size; \
        size_t cap; \
        size_t typeSize; \
        TYPE *data; \
    }

/**************************************
 Macros
 **************************************/

#define aInit(DEST,TYPE,CAP) { \
    int k=0; \
    while (powerOfTwo[++k]<CAP); \
    \
    DEST = malloc(sizeof(Array(TYPE))); \
    DEST->size = 0; \
    DEST->typeSize = sizeof(TYPE); \
    DEST->cap = powerOfTwo[k]; \
    DEST->data = calloc(DEST->cap, DEST->typeSize); \
}

#define aNew(NAME,TYPE,CAP) \
    Array(TYPE)* NAME; \
    aInit(NAME,TYPE,CAP);

#define aResize(DEST) { \
    DEST->cap *= 2; \
    DEST->data = realloc(DEST->data, DEST->cap * DEST->typeSize); \
}

#define aAdd(DEST,X) \
    if (++(DEST->size) >= DEST->cap) aResize(DEST); \
    DEST->data[DEST->size-1] = X

#define aAppend(DEST,X) \
    DEST->cap += X->cap; \
    DEST->data = realloc(DEST->data, DEST->cap * DEST->typeSize); \
    memcpy(DEST->data + DEST->size, X->data, X->size * X->typeSize); \
    DEST->size += X->size

#define aEach(DEST,INDEX) \
    for (int INDEX=0; INDEX<DEST->size; INDEX++)

#define aFree(DEST) \
    free(DEST->data); \
    free(DEST)

#endif
The powerOfTwo lookup constant:
/**************************************
Constants
**************************************/
static size_t powerOfTwo[] = {
0,
1, 2, 4, 8, 16, 32, 64, 128,
256, 512, 1024, 2048, 4096, 8192, 16384, 32768,
65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608,
16777216, 33554432, 67108864, 134217728, 268435456, 536870912, 1073741824, 2147483648,
4294967296, 8589934592, 17179869184, 34359738368, 68719476736, 137438953472, 274877906944, 549755813888
};
And a little usage example:

Array(int)* arr;
aInit(arr,int,0);

aAdd(arr,0);
aAdd(arr,1);

aEach(arr,i) {
    printf("item @ %d = %d\n",i,arr->data[i]);
}

aFree(arr);
2 Answers
See this Stack Overflow answer for a discussion of the optimal growth factor for dynamic arrays. The gist of it is that it depends, but growth factors around the golden ratio are easier for the memory allocator to handle. I'd recommend 1.5 because it is easy to implement:

DEST->cap += DEST->cap / 2

It likely fares worse on artificial benchmarks but better on real workloads. This requires that the minimum capacity of the dynamic array is 2, which is reasonable. But a more realistic minimum capacity should probably be 8 or 16, since the minimum malloc allocation is often around 32 bytes.
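A minimal sketch of that policy as a helper function (the name `grow_cap` and the floor of 16 are illustrative assumptions, not from the answer); note that a floor also keeps `cap + cap / 2` from stalling when `cap` is 0 or 1:

```c
#include <stddef.h>

/* Next capacity under 1.5x growth, with an assumed floor of 16 elements
 * so tiny allocations are avoided and cap + cap/2 can never stall. */
static size_t grow_cap(size_t cap) {
    size_t next = cap + cap / 2;
    return next < 16 ? 16 : next;
}
```

With this, successive capacities go 16, 24, 36, 54, ... instead of doubling.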
In aAppend you are always realloc-ing, which will cause performance to suffer if many small dynamic arrays are collected into one big one in a loop. Instead, you want something like:
size_t req = DEST->size + X->size;
while (req > DEST->cap) {
    aResize(DEST);
}
memcpy(DEST->data + DEST->size, X->data, X->size * X->typeSize);
DEST->size = req;
The last thing I'd change is the rounding of the capacity up to a power of two in aInit. On realistic workloads, a good chunk of all dynamic arrays never grow, so the rounding just wastes space. But if you insist on it, this function is better than the loop:
static inline int
next_pow2(int v) {
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v++;
    return v;
}
I also wonder why you are using macros when C99 has inline functions? Inline functions have many advantages over code macros.
Comments:

- DEST->cap += DEST->cap / 2 won't change anything if the current value of DEST->cap is 1 (or 0, but 0 would be a problem in the original code as well). — 1201ProgramAlarm, Jan 8, 2020
- The minimum malloc size is platform dependent. It might be 32 in your environment, but is likely different for others. The point is good though: allocating very small sizes will hurt performance; but choosing a good minimum size isn't necessarily trivial, especially given that we're measuring in units of TYPE but multiplying up to talk to malloc() in units of char. — Toby Speight, Jan 8, 2020
- Yep! 32 bytes is just an "educated guess" that has been true for some 64-bit systems but might very well change in the future. It all depends on the VM OP is building. — Gaslight Deceive Subvert, Jan 8, 2020
- The expansion by 1.5 was used by standard libraries for a while. But in modern ones they gave up on 1.5 and started using 2. They did not see the theoretical gains in real life enough to make it worth it. But I tried a maths proof of why 1.5 is better :-) lokiastari.com/blog/2016/03/25/resizemaths/index.html — Loki Astari, Jan 9, 2020
- Cool! Can you link to their discussion? — Gaslight Deceive Subvert, Jan 9, 2020
We're missing an include of <stddef.h> to define size_t (or any of the other headers which define it).

We're missing an include of <stdlib.h>, needed for malloc() and friends (and this would define size_t for us, too). aAppend also needs <string.h> for memcpy().

With those fixed, and sufficient minor changes to the test program, I managed to compile with only a few warnings.
DEST = malloc(sizeof(Array(TYPE))); \
DEST->size = 0; \
Oops! If malloc()
returns a null pointer, we have undefined behaviour. Replace with
DEST = malloc(sizeof(Array(TYPE))); \
if (DEST) { \
    DEST->size = 0; \
    /* ...rest of the initialisation... */ \
}
Similarly, when we delete, let's accept a null pointer:
#define aFree(DEST) \
    if (DEST) { \
        free(DEST->data); \
    } \
    free(DEST)
We have a dangerous realloc():

#define aResize(DEST) { \
    DEST->cap *= 2; \
    DEST->data = realloc(DEST->data, DEST->cap * DEST->typeSize); \
}
If the realloc() fails, then we have a null pointer in data and no way to access the old contents. The standard idiom is to check the return value before assigning it to data. We're going to need some way to report failure, too.
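A sketch of that idiom (the function name and the int return convention are illustrative, not part of the original API): the old pointer is kept until the new one is known to be valid, so on failure the caller still has usable data.

```c
#include <stdlib.h>

/* Hypothetical failure-reporting resize: double the capacity, but only
 * commit the new pointer and capacity if realloc() succeeded. Returns 0
 * on success, -1 on failure (old block left intact). */
static int array_resize(void **data, size_t *cap, size_t type_size) {
    size_t new_cap = *cap * 2;
    void *p = realloc(*data, new_cap * type_size);
    if (!p)
        return -1;      /* old allocation is still valid */
    *data = p;
    *cap = new_cap;
    return 0;
}
```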
In aAppend(), we should be using the aResize() we've defined instead of resizing to fit - as it is, we're going to reallocate on every single call.
Overall, I think that writing everything as macros is a poor choice. It's probably better to use macro expansion (or equivalent technique, such is multiple includes with varying definitions) to create a set of (small, inlinable) functions for a given type.
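For instance, a hypothetical DEFINE_ARRAY macro (all names here are illustrative, not from the post) could stamp out a typed struct plus a small inline function per element type, giving real functions with type checking and error reporting:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Sketch of the "macro expansion to create typed functions" idea:
 * one invocation generates a struct and an inlinable push function
 * for that element type. */
#define DEFINE_ARRAY(NAME, T) \
    typedef struct { size_t size, cap; T *data; } NAME; \
    static inline bool NAME##_push(NAME *a, T x) { \
        if (a->size == a->cap) { \
            size_t ncap = a->cap ? a->cap * 2 : 8; \
            T *p = realloc(a->data, ncap * sizeof(T)); \
            if (!p) return false;   /* old buffer stays valid */ \
            a->data = p; \
            a->cap = ncap; \
        } \
        a->data[a->size++] = x; \
        return true; \
    }

DEFINE_ARRAY(IntArray, int)
```

Usage is then ordinary C: `IntArray a = {0, 0, NULL}; IntArray_push(&a, 42);` - and the compiler can check argument types and warn about an ignored return value.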
The macros we have here look like functions, but can't be used like functions (in particular, aNew expands to a declaration and a statement, and aAdd and aFree both expand to multiple statements, making them dangerous and confusing near conditionals).
It's frustrating that the array control block must be allocated from dynamic memory - C++ programmers expect to be able to create std::vector
objects on the stack, with only the object storage itself on the heap.
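A sketch of such a layout (names are illustrative): the control block is a plain struct the caller can put on the stack, and only the element storage is heap-allocated, so there is no second malloc/free pair for the header.

```c
#include <stdlib.h>

typedef struct {
    size_t size, cap, typeSize;
    void *data;
} VArray;

/* Initialise an array whose control block the caller owns (e.g. on the
 * stack); only data comes from the heap. The default capacity of 8 is
 * an assumption. Returns 0 on success, -1 if calloc() fails. */
static int varray_init(VArray *a, size_t type_size, size_t cap) {
    a->size = 0;
    a->cap = cap ? cap : 8;
    a->typeSize = type_size;
    a->data = calloc(a->cap, a->typeSize);
    return a->data ? 0 : -1;
}

static void varray_destroy(VArray *a) {
    free(a->data);          /* free(NULL) is a no-op */
    a->data = NULL;
}
```

Usage mirrors the C++ expectation: `VArray a; varray_init(&a, sizeof(int), 16); ... varray_destroy(&a);`.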
Comments:

- See Malloc Never Fails. Imo, it's not practical to check malloc's return value. Even if @Dr.Kameleon adds NULL checks to his mallocs, clients of his library also need to check every invocation of the aAdd macro to ensure that it didn't cause memory to run out. — Gaslight Deceive Subvert, Jan 8, 2020
- Ouch, that's a scary attitude. Personally, I find that allowing overcommit does more harm than good, so it's turned off on my non-embedded systems (and certain problem processes get a specific ulimit value). And yes, I did say that we need functions that report errors properly; that's another reason to eschew the macros. — Toby Speight, Jan 8, 2020
- Oh, having read further, I see, "To clarify, the surprising behaviour malloc has does not mean we should ignore its return value". I feel better now. — Toby Speight, Jan 8, 2020
- It's perfectly reasonable. There's lots of discussion on whether to check the return value of malloc. I usually don't, because it's impractical or impossible to test all OOM errors; even if you check them, you can't handle them gracefully, and it gives a false sense of security. For really robust software, I'd eschew dynamic memory entirely. — Gaslight Deceive Subvert, Jan 8, 2020
- @BjörnLindqvist It's reasonable under some conditions, IF you're aware of what those conditions are, and IF you've explicitly acknowledged the tradeoffs. In your specific applications, you believe it will take too long to make your software fail gracefully, for a situation that rarely happens, and it's fine that you make that call. I don't think it's good practice to generalise that to a blanket statement of "it's not practical" though. I see far too many cases of newbies ignoring return values; their code works fine on a sunny day, but it dies horribly when anything goes wrong. — Graham, Jan 9, 2020