For learning purposes, I've decided to implement several classic data structures, starting with binary search trees. The entire code can be seen here, in a frozen branch review-btree-21.12.2017
. So far I've implemented basic functionality - search/insert/delete and the two simplest traversals. Advanced functionality - rebalancing, reordering, conversion from and to lists and such - will be the stuff for another question, which will also incorporate whatever advice I'll get for this one.
This is based on a tutorial by Eternally Confuzzled (there are differences, for instance, he made his trees only partially threaded). The most important thing is that instead of having two variables for left and right link of the tree, an array is used, where 0 means left, and 1 means right. I find it convenient and elegant.
My main concerns are:
- Efficiency - do my procedures perform as well as binary trees allow it?
- Threading - is it broken? The tests I've devised do not reveal any errors (they did, but I fixed those), but that's hardly an indication of their absense.
In the test
directory of the repo with the source there's the test driver I've used to check all of this. It uses check
library, so make sure you have it installed if you want to run the tests. I won't post testing code here unless asked (feel free to do so), since there's enough of code clutter going on below already.
Now for the code. Here's (most of) the header file:
#ifndef BTREE_H
#define BTREE_H
/** A binary tree module.
*
* Based on a tutorial by Eternally Confuzzled.
*
*/
/* ---------- base stuff ---------- */
enum btt_type
{
BTT_INORDER,
BTT_INORDER_REV,
};
struct btree
{
void *data;
/* 0 for left, 1 for right. */
int thread[2];
struct btree *link[2]; /* A threaded link or a child. */
};
/* btt - binary tree traversal */
struct btt
{
enum btt_type type;
struct btree *tree, *cur;
};
/* These should return a negative value if (left < right), 0 if (left == right)
* and a positive value if (left > right).
* */
typedef int (*btree_cmp_fn)(void *left, void *right);
typedef int (*btree_cmp_ex_fn)(void *left, void *right, void *external_arg);
/* ---------- creation ---------- */
extern struct btree *
btree_create(void *data);
/* ---------- destruction ---------- */
/* Note that you can freely destroy subtrees, the parent tree (if there's one)
* will be updated to exclude the destroyed subtree. */
extern void
btree_destroy(struct btree *);
extern void
btree_destroy_ex(struct btree *, void (*destroyer)(void *data));
extern void
btree_destroy_exx(struct btree *, void (*destroyer)(void *data, void *arg), void *arg);
/* ---------- traversal ---------- */
extern struct btt *
btt_create(struct btree *tree, enum btt_type type);
/* A non-allocating version of the above. */
extern void
btt_init(struct btt *btt, struct btree *tree, enum btt_type type);
extern int
btt_done(struct btt *btt);
extern void *
btt_this(struct btt *btt);
extern struct btree *
btt_this_node(struct btt *btt);
extern void *
btt_next(struct btt *);
extern struct btree *
btt_next_node(struct btt *);
extern void
btt_rewind(struct btt *);
extern void
btt_destroy(struct btt *);
/* Finalize a traversal without freeing its memory (just its state's memory, if
* any, will be freed) */
extern void
btt_fin(struct btt *);
/* ---------- accessing struct members and information retrieval ---------- */
extern void *
btree_data(struct btree *tree);
extern struct btree *
btree_left(struct btree *tree);
extern struct btree *
btree_right(struct btree *tree);
extern int
btree_thread(struct btree *tree, int dir);
extern struct btree *
btree_link(struct btree *tree, int dir);
extern int
btree_has_children(struct btree *);
extern int
btree_has_child(struct btree *, int dir);
extern int
btree_num_children(struct btree *);
extern int
btree_is_a_leaf(struct btree *);
/* Return the index of the link in the links array of the given tree, or -1 if
* the link is not there. */
extern int
btree_link_dir(struct btree *tree, struct btree *link);
extern size_t
btree_size(struct btree *tree);
/* ---------- searching ---------- */
/* Return 1 if the element was found (and fill res with it), return 0 and do
* nothing with 'res' otherwise. Pass NULL as 'res' to avoid filling it with
* anything. */
extern int
btree_find(struct btree *tree, void *data, btree_cmp_fn cmp, void **res);
/* Return 1 if the element was found (and fill res with it), return 0 and do
* nothing with 'res' otherwise. Pass NULL as 'res' to avoid filling it with
* anything. */
extern int
btree_find_ex(struct btree *tree, void *data, btree_cmp_ex_fn cmp, void *cmp_arg, void **res);
extern struct btree *
btree_find_node(struct btree *tree, void *data, btree_cmp_fn cmp);
extern struct btree *
btree_find_node_ex(struct btree *tree, void *data, btree_cmp_ex_fn cmp, void *cmp_arg);
extern struct btree *
btree_parent(struct btree *tree);
/* Note that this searches for a successor of the entire subtree. */
extern struct btree *
btree_successor(struct btree *tree);
/* Note that this searches for a predecessor of the entire subtree. */
extern struct btree *
btree_predecessor(struct btree *tree);
/* Return either left or right outermost subnode of the subtree. */
extern struct btree *
btree_outermost(struct btree *tree, int dir);
/* Return either predecessor of the leftmost subnode or a successor of the
* rightmost subnode. */
extern struct btree *
btree_after_outermost(struct btree *tree, int dir);
/* ---------- insertion ---------- */
/* Return the freshly created node, or NULL on an OOM condition. */
extern struct btree *
btree_insert(struct btree *tree, void *data, btree_cmp_fn cmp);
/* Return the freshly created node, or NULL on an OOM condition. */
extern struct btree *
btree_insert_ex(struct btree *tree, void *data, btree_cmp_ex_fn cmp, void *arg);
/* ---------- deletion ---------- */
/* There's no function to delete subtrees - this is done either with
* btree_destroy family or btree_unlink. */
/* + Return 1 if the data was found in the tree, 0 otherwise.
* + Fill 'tree_after' * with the tree after the deletion of the data (might be
* NULL if the last node in the tree was deleted). Pass NULL to avoid filling.
* + Fill 'deleted' with the data from the tree that was deleted (not from the
* 'data' argument! they might be different depending on what 'cmp' function
* does). Pass NULL to avoid filling. */
extern int
btree_delete(struct btree *tree, void *data, btree_cmp_fn cmp,
struct btree **tree_after, void **deleted);
/* Ditto, but cmp takes an extra argument. */
extern int
btree_delete_ex(struct btree *tree, void *data, btree_cmp_ex_fn cmp, void *arg,
struct btree **tree_after, void **deleted);
/* ---------- other ---------- */
/* Unlink the given subtree from its tree. */
extern void
btree_unlink(struct btree *tree);
#endif /* BTREE_H */
I will omit _ex
versions of most functions here to save space, they only differ from ordinary versions in that the function they accept as an argument takes an extra void *
argument, so one doesn't have to use globals.
Here's the creation function, mostly boilerplate:
struct btree *
btree_create(void *data)
{
struct btree *res = malloc(sizeof(struct btree));
if (res == NULL) return NULL;
res->data = data;
res->thread[0] = 0;
res->thread[1] = 0;
res->link[0] = NULL;
res->link[1] = NULL;
return res;
}
Destructor function and its helper procedure, btree_unlink
(which removes threading and child connections between a tree and its parent tree). The reason for the DESTROY_BODY
macro is that there are two more functions for tree destructions which only differ in a single line, where freeing tree data takes place.
#define DESTROY_BODY(tree, free_data) \
{ \
if (tree == NULL) return; \
btree_unlink(tree); \
\
tree = btree_outermost(tree, 0); \
while (tree != NULL) { \
struct btree *link = tree->link[1]; \
int thread = tree->thread[1]; \
free_data; \
free(tree); \
tree = link; \
if (!thread && link != NULL) { \
while (!tree->thread[0] && tree->link[0] != NULL) \
tree = tree->link[0]; \
} \
} \
} \
void
btree_destroy(struct btree *tree)
{
DESTROY_BODY(tree, (void) 0);
}
void
btree_unlink(struct btree *tree)
{
struct btree *left = btree_outermost(tree, 0);
struct btree *right = btree_outermost(tree, 1);
/* Make the outer tree forget about the inner. */
struct btree *before_left = left->link[0];
struct btree *after_right = right->link[1];
if (before_left != NULL && (before_left->thread[1] || before_left->link[1] == tree)) {
before_left->thread[1] = after_right != NULL;
before_left->link[1] = after_right;
}
if (after_right != NULL && (after_right->thread[0] || after_right->link[0] == tree)) {
after_right->thread[0] = before_left != NULL;
after_right->link[0] = before_left;
}
/* Make the inner tree a standalone tree. */
left->thread[0] = 0;
left->link[0] = NULL;
right->thread[1] = 0;
right->link[1] = NULL;
}
Searching is done pretty much as one would expect it from binary trees:
struct btree *
btree_find_node(struct btree *tree, void *data, btree_cmp_fn cmp)
{
if (tree == NULL) return NULL;
while(1) {
int cmp_res = cmp(data, tree->data);
if (cmp_res == 0) return tree;
int dir = cmp_res > 0;
if (tree->thread[dir] || tree->link[dir] == NULL) return NULL;
tree = tree->link[dir];
}
}
Insertion is slightly trickier because of threading (again, the reason for INSERT_BODY
macro is that the only difference between btree_insert
and btree_insert_ex
is a single line where comparison takes place).
#define INSERT_BODY(tree, data, cmp, cmp_line) \
{ \
struct btree *res = btree_create(data); \
if (res == NULL) return NULL; \
\
while (1) { \
int cmp_res = cmp_line; \
int dir = cmp_res > 0; \
if (tree->link[dir] == NULL || tree->thread[dir]) { \
struct btree *old_link = tree->link[dir]; \
/* If a node is inserted to the right, it gets its \
* parent's right thread and the parent as its left \
* thread. If a node is inserted to the left, the \
* situation is mirrored. */ \
res->thread[dir] = old_link != NULL; \
res->link[dir] = old_link; \
res->thread[!dir] = 1; \
res->link[!dir] = tree; \
/* The parent gets the created node as its child. */ \
tree->thread[dir] = 0; \
tree->link[dir] = res; \
/* The threaded node, if it exists, gets threaded to \
* the inserted node if it doesn't have a child in this \
* direction. */ \
if (old_link == NULL) return res; \
if (!old_link->thread[!dir]) return res; \
old_link->thread[!dir] = 1; \
old_link->link[!dir] = res; \
return res; \
} else { \
tree = tree->link[dir]; \
} /* if found insertion location */ \
} /* while 1 */ \
}
struct btree *
btree_insert(struct btree *tree, void *data, btree_cmp_fn cmp)
{
INSERT_BODY(tree, data, cmp, cmp(data, tree->data));
}
Now for deletion. Again DELETE_BODY
helps to avoid code duplication between btree_delete
and btree_delete_ex
.
#define DELETE_BODY(tree, data, cmp, tree_after, deleted, find) \
{ \
if (tree_after != NULL) *tree_after = tree; \
if (deleted != NULL) *deleted = NULL; \
if (tree == NULL) return 0; \
\
struct btree *to_delete = find; \
if (to_delete == NULL) return 0; \
void *old_data = to_delete->data; \
\
/* The simplest case - the node is a leaf. */ \
if (btree_is_a_leaf(to_delete)) { \
btree_unlink(to_delete); \
if (tree_after != NULL) *tree_after = tree == to_delete ? NULL : tree; \
if (deleted != NULL) *deleted = old_data; \
free(to_delete); \
return 1; \
} \
\
/* A bit harder - the node has a single child. */ \
if (btree_num_children(to_delete) == 1) { \
int child_dir = btree_has_child(to_delete, 1); \
/* Simply replace the node with its child. */ \
struct btree *child = to_delete->link[child_dir]; \
struct btree *parent = btree_parent(to_delete); \
if (parent == NULL) { \
child->thread[!child_dir] = 0; \
child->link[!child_dir] = 0; \
if (tree_after != NULL) *tree_after = child; \
if (deleted != NULL) *deleted = old_data; \
free(to_delete); \
return 1; \
} else { \
int dir_to_here = btree_link_dir(parent, to_delete); \
parent->thread[dir_to_here] = 0; \
parent->link[dir_to_here] = child; \
child->thread[!child_dir] = 1; \
child->link[!child_dir] = parent; \
free(to_delete); \
if (tree_after != NULL) *tree_after = parent; \
if (deleted != NULL) *deleted = old_data; \
return 1; \
} /* if parent is null */ \
} /* if the node has one child */ \
\
/* The last case - both children are present. Pretty easy to do, \
* actually - find inorder predecessor and replace the node with it. */ \
struct btree *left_child = to_delete->link[0]; \
struct btree *predecessor = btree_outermost(left_child, 1); \
btree_unlink(predecessor); \
to_delete->data = predecessor->data; \
if (tree_after != NULL) *tree_after = tree; \
if (deleted != NULL) *deleted = old_data; \
free(predecessor); \
return 1; \
} \
extern int
btree_delete(struct btree *tree, void *data, btree_cmp_fn cmp,
struct btree **tree_after, void **deleted)
{
DELETE_BODY(tree, data, cmp, tree_after, deleted,
btree_find_node(tree, data, cmp));
}
Trees being fully threaded has a benefit of making inorder and reverse inorder traversals very easy:
/* Helper functions. */
struct btree *
btt_next_inorder(struct btt *btt)
{
struct btree *cur = btt->cur;
if (cur->thread[1]) {
btt->cur = cur->link[1];
} else {
cur = cur->link[1];
while (cur != NULL && !cur->thread[0] && cur->link[0] != NULL)
cur = cur->link[0];
btt->cur = cur;
}
return btt->cur;
}
struct btree *
btt_next_inorder_rev(struct btt *btt)
{
struct btree *cur = btt->cur;
if (cur->thread[0]) {
btt->cur = cur->link[0];
} else {
cur = cur->link[0];
while (cur != NULL && !cur->thread[1] && cur->link[1] != NULL)
cur = cur->link[1];
btt->cur = cur;
}
return btt->cur;
}
/* Exported functions. */
struct btt *
btt_create(struct btree *tree, enum btt_type type)
{
struct btt *res = malloc(sizeof(struct btt));
if (res == NULL) return NULL;
btt_init(res, tree, type);
return res;
}
void
btt_init(struct btt *btt, struct btree *tree, enum btt_type type)
{
btt->tree = tree;
btt->type = type;
switch (type) {
case BTT_INORDER:
btt->cur = btree_outermost(tree, 0);
break;
case BTT_INORDER_REV:
btt->cur = btree_outermost(tree, 1);
break;
} /* switch type */
}
int
btt_done(struct btt *btt)
{
return btt->cur == NULL;
}
void *
btt_this(struct btt *btt)
{
return btt->cur->data;
}
struct btree *
btt_this_node(struct btt *btt)
{
return btt->cur;
}
void *
btt_next(struct btt *btt)
{
return btt_next_node(btt)->data;
}
struct btree *
btt_next_node(struct btt *btt)
{
switch (btt->type) {
case BTT_INORDER: return btt_next_inorder(btt);
case BTT_INORDER_REV: return btt_next_inorder_rev(btt);
}
}
void
btt_rewind(struct btt *btt)
{
btt_fin(btt);
btt_init(btt, btt->tree, btt->type);
}
void
btt_destroy(struct btt *btt)
{
btt_fin(btt);
free(btt);
}
void
btt_fin(struct btt *btt)
{
/* TODO: btt_fin: clear up any state. */
}
For an example of usage of all of the above, as well as some helper functions that have been used here and there, please see the source in the repo and test
directory. I'm afraid there's too much code in the question already (if there isn't, just say so, I'll add whatever is needed).
-
\$\begingroup\$ Hi! I´m no senior at Code Review but fell that the code posted is kind of intimidating because of its shear size. You could reduce lines-of-code significantly in many places, skipping "boilerplate indentation" for simple functions in the last code snippet for example. Code comments are good but should ADD infromation, not repeat the obvious. No help to you question but maybe someone else will find reviewing it more pleasant... Anyway, good luck! \$\endgroup\$Andreas– Andreas2017年12月23日 12:32:56 +00:00Commented Dec 23, 2017 at 12:32
1 Answer 1
This antique question has gone unanswered for years. And, the OP seems to have wandered off, too.
Still, it provides lots of fodder for appraisal and commentary.
Something for a quiet afternoon...
Attitude needs fixing
Asking the readers to go off-site to ferret out and download missing material for evaluation reflects badly on the OP. Perhaps this is why this question has remained unaddressed for so long.
There's a large header file provided full of prototypes for unseen code (functions) whose names suggest more than basic BST operations.
A skill to develop would be to pare-down the code to a minimum representation that compiles and runs, then present that code for review.
No "opacity"
The header file full of function prototypes (the interface) reveals the guessable inner workings of the data model (to any coder with some experience). Application coders should not know the names of structure members, lest they start using those names directly, or write app code that winds up breaking the expected data model.
Be consistent
In the interface header file, we are presented with:
enum btt_type
{
BTT_INORDER,
BTT_INORDER_REV,
};
Skimming enough of the code, we discover these two tokens used while traversing the tree.
Suggestion: enum btt_type { BTT_INORDER_FWD, BTT_INORDER_REV };
Instead of assuming forward, but specifying reverse, code would be clearer if the distinction were explicit.
Then the reader discovers code peppered with mysterious constants 0
and 1
!
What do these two constants signify?
Why are there no text tokens that explain their purpose or use?
Is it possible some lurking 0
or 1
should be a published token?
The reader is both uncertain and somewhat flummoxed.
Mates...
Consider:
struct btree
{
void *data;
/* 0 for left, 1 for right. */
int thread[2];
struct btree *link[2]; /* A threaded link or a child. */
};
Are thread
and link
conjoined; always handled together?
Suggest:
struct btree {
void *data;
struct {
int thread;
struct btree *link; /* A threaded link or a child. */
} foo[2]; // 0 for left, 1 for right.
};
(Using the name foo
. The OP is better placed to provide a purposeful name for this pair of pairs.)
Controversial...
Consider these two function prototypes in the header file:
extern struct btree *
btt_this_node(struct btt *btt);
extern void *
btt_next(struct btt *);
The first one uses a sample variable name that is the same as the name of the datatype. This is unhelpful.
The second one shows only the datatype of the parameter.
More controversial is my recommendation (an opinion) that the verbosity of this code could be beneficially reduced by employing typedef
instead of sprinkling the token struct
so liberally throughout the code. (It appears that there are only two separate struct
datatypes declared.)
<opinion>
Further, any coder messing with BST code is more likely to expect to see nodes identified as node
rather than tree
. And, there's an ingrained expectation to see one particular node identified as root
.
"When in Rome ..."
</opinion>
Finally, the multiple instances of the qualifier extern
may be viewed by coders as unnecessary verbiage in the 'shared' header file. The OP should study and contemplate the purpose of the extern
qualifier to better understand its application in code written in multiple source files.
Stay away from the Dark Side
I will omit "ex" versions of most functions here to save space, they only differ from ordinary versions in that the function they accept as an argument takes an extra void * argument, so one doesn't have to use globals.
Wrong choice (IMO).
The code you present for review should have been the "_ex" versions. All processing should be done with transient data that is received by functions as parameters. No function should be written "expecting" global variables to be named appropriate for its use. Related is the next item, too.
Don't abuse preprocessor macros
It appears you've written some lengthy blocks of code as preprocessor macros, presumably to be pasted into different functions with different calling signatures.
Left as an exercise:
Re-write these operations to process data 'chunks' (nodes and such) as one capable function. If you want application code (the calling functions) to use fewer parameters, then let that application code provide "wrapper functions" that populate the parameters to your BST functions with the names of its "global variables".
You want to isolate regions of code as much as possible. Capabilities of "primitive functions" can be tested and proven in isolation from "application functions". This aids isolation and determination of the location of inevitable bugs.
Try to find ways to eliminate code
Code that is not present means few places for bugs to hide...
t_btree *btree_create( void *data ) // note use of typedef'd datatype
{
// using '*p' with sizeof is less to maintain (and fewer parentheses)
t_btree *p = calloc( 1, sizeof *p ); // calloc zeros entire block
if( p )
p->data = data; // only one non-zero member to store away
return p; // may be NULL, may be just fine!!
}
That function was 14 lines long in the OP's code, and risked being overlooked if the structure sprouted additional members.
Compact the code; don't inflate it
Look closely at the code of the pair
btt_next_inorder(struct btt *btt)
and
btt_next_inorder_rev(struct btt *btt)
...
Each of these is 14 lines long in the OP's code. The only difference between these two is the use of mysterious constants 0
and 1
(that are swapped).
This screams for a static
helper function that is 14 lines long, and this pair merely provide the helper with either 0
,1
or 1
,0
as two additional parameters.
Well engineered code re-uses as much as it can (as long as correctness and clarity are not sacrificed.)
<opinion>
If it's important to distinguish "reverse", then it's important to distinguish "forward" (from default).
Suggest renaming first function to btt_next_inorder_fwd
OR, recognise the common use of prev
as the opposite of next
.
btt_inorder_next(struct btt *btt)
and
btt_inorder_prev(struct btt *btt)
...
</opinion>
Or, provide two more usable tokens (eg: inorderNxt
and inoderPrv
) and pass the appropriate parameter as in:
btt_inorder( struct btt *ptr, unsigned nxtPrv) {
Then, the coder can write something cryptic like:
// assign src = 1 if next, then dst becomes 0
// or else src = 0 and dst becomes 1
int src = (nxtPrv == inorderNxt ), dst = !src;
Then one of two instances of the functions can be removed.
The mysterious 0
and 1
as array indices is being exploited, but they still should be replaced by informative tokens of those values.
The congested code buried in the two example blocks of preprocessor code are, to this reviewer, so densely packed as to make the code repugnant. I spot:
if (parent == NULL) { \
...
return 1; \
} else { \
...
return 1; \
}
and wonder why each path has its own return 1;
.
Good code inspires confidence; whereas this entire submission merely leads to shaking one's head.
Suggest focussing on writing correct, clear, clean, crisp code while leaving functional embellishments to later versions (driven by need rather than anticipation).