I am practicing writing C library code and wrote a priority queue for some graph work.
I am fairly new to writing C libraries so would appreciate any feedback on this implementation.
Some items to start.
The user of the priority queue is expected to define their data type in the data.h header. This is to make the code flexible and not use void*. I would be interested in any feedback on this idea.
The user has to provide a sort function. I can't really see any way round that. I wondered about how to have a default sort but couldn't think of anything. Could that be improved?
The code then.
Firstly the data.h file.
/*
use data.h to define the data to store in the priority queue
*/
#ifndef DATA_H_
#define DATA_H_
struct element {
int weight;
char* data;
};
#endif // DATA_H_
The priority queue header file:
/*
Heap ordered priority queue storing in a resizable array
*/
#ifndef PRIORITY_QUEUE_
#define PRIORITY_QUEUE_
#include "data.h"
struct priority_queue_t;
struct priority_queue_t* priority_queue_init(int(*compare)(const struct element* element1, const struct element* element2));
void priority_queue_free(struct priority_queue_t* pq);
int priority_queue_empty(struct priority_queue_t* pq);
void priority_queue_insert(struct priority_queue_t* pq, struct element* el);
struct element* priority_queue_top(struct priority_queue_t* pq);
#endif // PRIORITY_QUEUE_
the implementation file - priority_queue.c
#include "priority_queue.h"
#include <stdlib.h>
typedef int(*compare)(const struct element* element1, const struct element* element2);
struct priority_queue_t {
int capacity;
int n;
struct element** array;
compare cmp;
};
static const int initial_size = 16;
static void swap(struct priority_queue_t* pq, int index1, int index2) {
// shallow copy of pointers only
struct element* tmp = pq->array[index1];
pq->array[index1] = pq->array[index2];
pq->array[index2] = tmp;
}
static void rise(struct priority_queue_t* pq, int k) {
while (k > 1 && pq->cmp(pq->array[k / 2], pq->array[k]) < 0) {
swap(pq, k, k / 2);
k = k / 2;
}
}
static void fall(struct priority_queue_t* pq, int k) {
while (2 * k <= pq->n) {
int j = 2 * k;
if (j < pq->n && pq->cmp(pq->array[j], pq->array[j + 1]) < 0) {
j++;
}
if (pq->cmp(pq->array[k], pq->array[j]) < 0) {
swap(pq, k, j);
}
k = j;
}
}
static struct element** array_resize(struct element** array, int newlength) {
// reallocate array to new size
return realloc(array, newlength * sizeof(struct element*));
}
struct priority_queue_t* priority_queue_init(int(*compare)(const struct element* element1, const struct element* element2)) {
struct priority_queue_t* pq = malloc(sizeof(struct priority_queue_t));
pq->array = malloc(sizeof(struct element*) * (initial_size + 1));
pq->capacity = initial_size;
pq->n = 0;
pq->cmp = compare;
return pq;
}
void priority_queue_free(struct priority_queue_t* pq) {
free(pq);
}
int priority_queue_empty(struct priority_queue_t* pq) {
return pq->n == 0;
}
void priority_queue_insert(struct priority_queue_t* pq, struct element* el) {
if (pq->n == pq->capacity) {
pq->capacity *= 2;
// we need to resize the array
pq->array = array_resize(pq->array, pq->capacity);
}
// we always insert at end of array
pq->array[++pq->n] = el;
rise(pq, pq->n);
}
struct element* priority_queue_top(struct priority_queue_t* pq) {
// reduce array memory use if appropriate
if (pq->capacity > initial_size && pq->n < pq->capacity / 4) {
pq->array = array_resize(pq->array, pq->capacity / 2);
pq->capacity /= 2;
}
struct element* el = pq->array[1];
swap(pq, 1, pq->n--);
pq->array[pq->n + 1] = NULL; // looks tidier when stepping through code - not really necessary
fall(pq, 1);
return el;
}
some code to exercise
#include "priority_queue.h"
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
int compare(const struct element* element1, const struct element* element2) {
return element1->weight - element2->weight;
}
int main() {
struct priority_queue_t* pq = priority_queue_init(compare);
int weights[] = { 14,8,15,16,11,1,12,13,4,10,9,3,5,7,2,6 };
int size = sizeof(weights) / sizeof(weights[0]);
// insert each one into priority queue
for (int i = 0; i < size; ++i) {
struct element* el = malloc(sizeof(struct element));
// generate string
char buffer[20];
sprintf(buffer, "added no: %d", i + 1);
el->data = malloc(strlen(buffer) + 1);
strcpy(el->data, buffer);
el->weight = weights[i];
priority_queue_insert(pq, el);
}
struct element* el = malloc(sizeof(struct element));
el->weight = 22;
el->data = "hi guys";
priority_queue_insert(pq, el);
while (!priority_queue_empty(pq)) {
struct element* top = priority_queue_top(pq);
printf("top is: %d %s\n", top->weight, top->data);
free(top);
}
priority_queue_free(pq);
}
4 Answers 4
"The user of the priority queue is expected to define their data type in the data.h header. This is to make the code flexible and not use void*."
Code limits the types available to pointers to some
struct
. The definition mechanism requires adata.h
file - this is awkward. Strongly recommend a simpler direct approach and reconsider usingvoid *
."The user has to provide a sort function." This is reasonable although I would call it a compare function.
If one still wants to retain the
struct element
approach, at least use a like name for the .h file, perhapselement.h
.The 16 in
initial_size = 16;
is arbitrary. IMO, an empty "bag" like this priority queue should use minimal space. Perhaps start with 0 and useinitial_size
as the initial allocation once some something is added.Consider using
const
for function calls that do not change the state. It conveys codes intent better and possible allows some optimizations.priority_queue_empty(const struct priority_queue_t* pq);
Good use of
static
functions.In general, good naming convention of functions.
I'd expect a function that returns the count of queue members.
By naming, I was surprised that
priority_queue_top()
removed the top priority element and not just report it. Consider 2 functions, one to report and another to pop it off."priority_queue.h" should document the interface more as this is what users see. In particular, detail the meaning of the return value of
int(*compare)(element1, element2)
and how it affects the queue. Negative return bubble upelement1
orelement2
. What happens when return value is 0?Advanced concept: state that the
int(*compare)()
must be consistent, like the compare function forqsort()
. It must report the same sign of the result for the same argument pair each time it is called.qsort
specifies it this way
When the same objects (..., irrespective of their current positions in the array) are passed more than once to the comparison function, the results shall be consistent with one another. That is, for
qsort
they shall define a total ordering on the array,
A definite memory leak:
void priority_queue_free(struct priority_queue_t* pq) { free(pq); }
doesn't
free(pq->array)
.A possible memory leak: a
realloc
inarray_resize
may fail, and afterpq->array = array_resize(....)
the array is lost.
The
pq->array[0]
is never used. I am not sure that it makes the code simpler. Idiomaticallyn
should point the first unused entry.
To address your immediate concerns,
the user is forced to wrap her data type in a struct, even though it could be a primitive type. You may want to omit
struct
from your code, and require just atypedef whatever element
from the user.If I need to use this queue with different data types in the same program, I need to compile it multiple times - once for each type - with different files posing as
data.h
, and work around multiple definitions. It is possible, but I wouldn't want to maintain the build system.
I don't think the default comparator is desirable and/or possible.
-
\$\begingroup\$ Any suggestion for your second point in concerns above? \$\endgroup\$arcomber– arcomber2018年02月01日 18:02:20 +00:00Commented Feb 1, 2018 at 18:02
-
\$\begingroup\$ One way to make a container usable with different types is to mangle the name of the element type into the name of the container. In C++ this happens automagically; in C you can simulate it using macros. See github.com/mgrosvenor/libchaste/tree/master/data_structs/… for an example of the macro technique. (I just googled "generic data structures in C"; I don't know anything about this library besides that it made a decent example of the technique for me to link to.) \$\endgroup\$Quuxplusone– Quuxplusone2018年02月01日 18:26:37 +00:00Commented Feb 1, 2018 at 18:26
The other answers make some good points (especially @vnp who caught the two memory leaks — that's a huge deal!). I'll try not to repeat too much.
int compare(const struct element* element1, const struct element* element2) {
return element1->weight - element2->weight;
}
You should be aware that this comparator can have undefined behavior if element1->weight == -2
and element2->weight == INT_MAX
(for example). A better idiom for comparing "things" (not necessarily even integral things) is
int compare(const struct element* a, const struct element* b) {
return (a->weight < b->weight) ? -1 : (a->weight > b->weight);
}
Your code ends up repeating the keyword struct
a lot. You might consider defining
struct priority_queue; // forward-declared
typedef struct priority_queue priority_queue_t;
struct element; // forward-declared
typedef struct element element_t;
so that you could use slimmed-down parameter lists like
void priority_queue_insert(priority_queue_t *pq, element_t *el);
I was as surprised as @chux that priority_queue_top
actually popped an element from the queue. I would expect the pop function to be named, well, priority_queue_pop
!
Also, it would be nice to document explicitly in the API whether the "pop" operation (currently misnamed "top") pops the smallest element or the biggest element. You could do this in either of two ways:
element_t *priority_queue_min(const priority_queue_t *);
element_t *priority_queue_pop_min(priority_queue_t *);
(and personally I would say _least_element
instead of _min
for absolute clarity) or
element_t *minheap_top(const minheap_t *);
element_t *minheap_pop(minheap_t *);
In the latter case, we use the convention that a "min-heap" has quick access to the minimum element and a "max-heap" has quick access to the maximum element. The data structure you have here is specifically a "min-heap."
Now, looking at priority_queue_top
, I realized that of course I should have known that it modified the heap — because it took a parameter of non-const pointer type! If it wasn't going to modify the heap, it would have taken a pointer-to-const. ...But then I looked and saw this:
int priority_queue_empty(struct priority_queue_t* pq);
Bad! Since empty
is definitely a non-modifying operation, we should instead write
int priority_queue_empty(const priority_queue_t *pq);
// ^^^^^
Or, if you know that struct priority_queue
is very small (say, 16 bytes or less), then you might consider getting clever and writing
int priority_queue_empty(priority_queue_t pq);
This makes a shallow copy of the heap and passes it in by value; used consistently, this can absolutely enforce that no weird mutations are going on, and even (theoretically) allow the compiler to optimize more. But it's definitely "for advanced users only" and maybe "not a good idea anyway." ;)
The last thing I want to talk about is the "genericity" aspect. Right now you have at least three things that the user might want to customize, and you handle each of them in a different way:
(A) The data that belongs to an "element". Customizable exactly once per program, by defining struct element
.
(B) The "less-than" predicate for elements. Customizable per-heap, by passing in a comparator (as a function pointer) to the heap constructor priority_queue_init
.
(C) The "swapping" and "popping" behaviors for an element. Not customizable; instead, we hold, swap, and pop pointers to elements.
(D) The source of dynamic memory allocation. Not customizable; we use malloc
.
If I were writing this code, I would try to mimic the libc function qsort
as much as possible. In particular, I would give the comparator this signature:
typedef int (*priority_queue_comparator_t)(const void *, const void *);
Done right, this could ensure that priority_queue
, qsort
, and bsearch
all play together in a natural way. (You might even look at how the C++ STL defines push_heap
and pop_heap
to work on arbitrary arrays.)
So we could have an API something like this:
typedef int (*comparator_t)(const void *, const void *);
typedef int (*comparator_r_t)(void *cookie, const void *, const void *);
typedef void (*swapper_t)(void *, void *);
typedef void (*swapper_r_t)(void *cookie, void *, void *);
void push_heap_r(void *base, size_t nel, size_t width,
void *cookie, comparator_r_t cmp, swapper_r_t swap)
{
char *a = base;
int k = nel - 1;
while (k > 1 && cmp(cookie, a + width * (k / 2), a + width * k) < 0) {
swap(cookie, a + width * (k / 2), a + width * k);
k /= 2;
}
}
typedef struct minheap {
int capacity;
int n;
void **array;
comparator_t cmp;
} minheap_t;
static int pq_cmp(void *cookie, const void *a, const void *b) {
minheap_t *pq = cookie;
return pq->cmp(*(void **)a, *(void **)b);
}
static void pq_swap(void *, void *a, void *b) {
void *tmp = *(void**)a;
*(void**)a = *(void**)b;
*(void**)b = tmp;
}
void minheap_insert(minheap_t *pq, void *el)
{
if (pq->n == pq->capacity) {
pq->capacity *= 2;
// we need to resize the array
pq->array = array_resize(pq->array, pq->capacity);
}
// we always insert at end of array
pq->array[++pq->n] = el;
push_heap_r(pq->array, pq->n, sizeof(void*), pq, pq_cmp, pq_swap);
}
The "cookie" parameter can be used to pass data from the caller, through the generic algorithm push_heap_r
, into the comparison function. In this instance we're using the cookie to pass the user's original function pointer pq->cmp
through to be used by pq_cmp
.
But the user could just as well use push_heap_r
on their data array directly, with no indirection through void*
s.
static int generic_memcmp(void *cookie, const void *a, const void *b) {
return memcmp(a, b, *(size_t*)cookie);
}
static void generic_swap(void *cookie, void *a, void *b) {
size_t n = *(size_t*)cookie;
char buffer[n]; // VLA alert!
memcpy(buffer, a, n);
memcpy(a, b, n);
memcpy(b, buffer, n);
}
void test() {
struct foo arr[100] = ...;
size_t sz = sizeof (struct foo);
// Build a heap.
for (int i=1; i <= 100; ++i) {
push_heap_r(arr, i, sz, &sz, generic_memcmp, generic_swap);
}
// Now all 100 `foo`s in the array have been arranged heapwise,
// according to the comparator defined by memcmp.
}
Now, this is probably getting a little crazy and going off in directions where you might not need so much genericity. But this is the direction I'd think about going. It's pretty unrecognizable compared to our starting point of "require the user to define struct element
," and yet the usage of it has maybe even gotten simpler. No more malloc
, no more element
, no more arrays of pointers... just the rawest possible thing that could be called a "heap." And then we built the malloc'ing minheap_t
on top of that.
Anyway, it's food for thought, even if YAGNI in practice. :)
-
\$\begingroup\$ I really liked your discussion about the issues. Concerning mimicking
qsort()
's compare function (a good idea): that function takes the address of the elements to compare. OP's compare takes the values of the elements themselves (2struct
pointers). I think to well mimicqsort()
, this priority queue needs asize_t element_size
parameter and not assume astruct
pointer. \$\endgroup\$chux– chux2018年02月01日 15:46:46 +00:00Commented Feb 1, 2018 at 15:46 -
\$\begingroup\$ @chux: How I proposed solving that was that the low-level
push_heap_r
takes awidth
parameter and can sort anything; the higher-levelminheap_t
continues to store an arraypq->array
ofvoid*
s. So whenminheap_t
callspush_heap_r
, it invariably setswidth = sizeof(void*)
; but it calls into the user'scmp
comparator for... oho, I see, I wrotereturn pq->cmp(a, b)
where I had meantreturn pq->cmp(*(void**)a, *(void**)b)
. Will fix! Does that clear up the problem you were seeing? \$\endgroup\$Quuxplusone– Quuxplusone2018年02月01日 18:17:00 +00:00Commented Feb 1, 2018 at 18:17
Off by one bug
When you resize your array, you need to make the new array size capacity + 1
and not just capacity
, because your array is 1-based. You did it correctly in your init function, but not in the two places where you resize the array.
Better local variable names
Instead of j
and k
, it would have been helpful to have names such as parent
and child
. I had to keep searching back in the code to figure out which variable was holding which index.
Top function doesn't check edge case
If you call the top
function on an empty priority queue, it will cause n
to go negative, which could eventually lead to a segfault later on.