I'm playing around with a toy implementation of binary trees in C.
I have a typedef
'd struct BTree
like so —
typedef struct BTree {
struct BTree* left;
void* value;
struct BTree* right;
} BTree;
I have three functions implemented so far —
BTree* genRandomTree(int depth)
— Creates a balanced binary tree of depth
height and fills it with random integers
int height(BTree* tree)
- Calculates the height of the binary tree
bool isAVLBalanced(BTree* tree)
- Checks if a given tree is AVL balanced
I have run some basic test cases and everything seems to be in working order.
I am specifically looking for advice about performance and correctness regarding any edge-cases I've failed to take into consideration. I am especially interested in any method of converting the three functions into tail-recursive form.
I am specifically not looking for advice telling me not to use gcc's nested function extension (exploring this extension was part of my motivation for writing this program in the first place) or advice about not-having free'd memory. I am also aware that the recursion-heavy style I have written this in is very unidiomatic and recommended against, however, having fun with recursion was also one of my motivations for writing this program, so ̄\_(ツ)_/ ̄.
Compiling with gcc -Wall -Wextra -Werror -Wc++-compat
gives no errors or warnings. Adding -Wpedantic
to the compiler options produces errors about nested functions not being ISO C compliant but nothing else.
Here is the full code —
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
typedef struct BTree {
struct BTree* left;
void* value;
struct BTree* right;
}BTree;
// generates balanced binary tree of maxDepth depth and fills it with integers
BTree* genRandomTree(int depth) {
BTree* _genRandomTree(int curDepth, int maxDepth) {
if (curDepth == maxDepth) {return NULL;}
BTree* node = (BTree*)malloc(sizeof(BTree));
node->left = _genRandomTree(curDepth+1, maxDepth);
node->right = _genRandomTree(curDepth+1, maxDepth);
node->value = malloc(sizeof(int));
*((int *) node->value) = rand()%100;
return node;
}
return _genRandomTree(0, depth);
}
// Calculates height of tree
int height(BTree* tree) {
int _height(BTree* tree, int curHeight) {
if (!tree) { return curHeight; }
int leftHeight = _height(tree->left, curHeight+1);
int rightHeight = _height(tree->right, curHeight+1);
return leftHeight > rightHeight ? leftHeight : rightHeight;
}
return _height(tree, 0);
}
// Checks if tree is AVL balanced
bool isAVLBalanced(BTree* tree) {
if (!tree) { return true; } // Null tree is trivially balanced
int heightDiff = height(tree->left) - height(tree->right);
int heightDiffAbs = heightDiff < 0 ? heightDiff * -1 : heightDiff;
return
heightDiffAbs <= 1
&&
isAVLBalanced(tree->left)
&&
isAVLBalanced(tree->right);
}
int main() {
srand(6);
BTree* root = genRandomTree(3);
if (isAVLBalanced(root)) { printf("Balanced\n"); }
else { printf("Unbalanced\n"); }
return 0;
}
/*
Tree created by genRandomTree() after srand(6)
86
/ \
/ \
12 85
/\ /\
/ \ / \
41 85 65 8
*/
-
\$\begingroup\$ (B-tree is why BST is a somewhat common acronym where BT isn't.) \$\endgroup\$greybeard– greybeard2023年04月26日 06:00:05 +00:00Commented Apr 26, 2023 at 6:00
3 Answers 3
Nested functions are not in the C standard. It is a GCC extension, of a dubious value. Don't do it.
Casting
malloc
is a bad practice. First, it is not necessary, and second, it may lead to a hard-to-find bugs.Along the same line, prefer
sizeof value
rather thansizeof (type)
, e.g.BTree* node = malloc(sizeof *node);
Among other benefits, this avoids double maintenance.
(削除)isAVLBalanced
seems buggy. It only tests the heights of immediate subtrees of the root. You also need the subtrees themselves to be AVL balanced - i.e. it also needs to be recursive. (削除ここまで)Stylistically,
*
shall gravitate to a variable, not to the type. Preferstruct BTree *left;
Disclaimer: this is a matter of an (heretical) opinion. You don't want to say that
left
is a pointer toBTree
. You want to say that*left
in an expression yields aBTree
object.
-
\$\begingroup\$ I'm curious how
isAVLBalanced
only tests the height of immediate subtrees, won't the twoisAVLBalanced
calls in thereturn
expression recursively test heights of all subtrees? \$\endgroup\$ZarakshR– ZarakshR2023年04月26日 13:15:36 +00:00Commented Apr 26, 2023 at 13:15 -
2\$\begingroup\$ I agree with the position of
*
's that you mention, specifically because of a confusion I had in the past. Once I wrote something likeint* x, y, z
, thinking that they would all be pointers, because of the way I wrote my*
's, but if you write that instead asint *x, y, z
then it makes it a lot clearer what's going on. \$\endgroup\$Jacob Garby– Jacob Garby2023年04月26日 17:52:43 +00:00Commented Apr 26, 2023 at 17:52 -
\$\begingroup\$ @ZarakshR I was blibd. I didn't notice the recursive calls. Edited. As a side note, you'd be in better shape to return both a boolean state, and a height, cutting a run time by a factor of 2. \$\endgroup\$vnp– vnp2023年04月27日 03:01:00 +00:00Commented Apr 27, 2023 at 3:01
-
\$\begingroup\$ @vnp Not sure what you mean by returning a height.... \$\endgroup\$ZarakshR– ZarakshR2023年04月27日 08:23:39 +00:00Commented Apr 27, 2023 at 8:23
I would be a little more careful and consistent with my styling. You have
if (curDepth == maxDepth) {return NULL;}
but you later have
if (!tree) { return true; }
Similarly, I would recommend placing spaces between arithmetic operators, a + b
rather than a+b
. If you don't want to do this, at least be consistent with it.
For values which refer to the size of something, you should use size_t
instead of int
, considering <stdlib.h>
is already included. If you really can't use a size_t
, you should keep everything unsigned (does a negative size make sense?).
if (!tree)
doesn't read to me as "check for a null tree". I would recommend replacing it with the negligibly more verbose, but clearer if (tree == NULL)
.
As was said before, the result of malloc
should not be casted. I also agree with the stylistic choice of the *
in pointers being put to the left of the variable name.
heightDiff * -1
can be replaced by -heightDiff
to make it a bit clearer.
Inefficient
isAVLBalanced()
makes far too many excursions through the tree to report a boolean answer. It looks O(n*n).
Note: AVL balanced tree
Instead walk the tree, at most, once, looking for imbalances along the way: O(n). Something like below. Since the tree does not change, use const
.
// Return the height of the tree if sub-trees still nearly balanced.
// Otherwise return SIZE_MAX.
size_t Btree_HeightWithBFTest(const BTree *tree) {
if (tree == NULL) {
return 0;
}
size_t height_left = Btree_HeightWithBFTest(tree->left);
if (height_left == SIZE_MAX) {
return SIZE_MAX;
}
size_t height_right = Btree_HeightWithBFTest(tree->right);
if (height_right == SIZE_MAX) {
return SIZE_MAX;
}
if (height_left > height_right) {
if (height_left - height_right > 1) {
return SIZE_MAX;
}
return height_left + 1;
}
if (height_right - height_left > 1) {
return SIZE_MAX;
}
return height_right + 1;
}
bool isAVLBalanced(const BTree *tree) {
size_t height = Btree_HeightWithBFTest(tree);
return height != SIZE_MAX;
}
Simplify
// int heightDiffAbs = heightDiff < 0 ? heightDiff * -1 : heightDiff;
int heightDiffAbs = abs(heightDiff);
Robust code would check for allocation success
Embedded functions not allowed
As partly mentioned by others, "warning: ISO C forbids nested functions"
Make _genRandomTree()
and int _height()
static
helper functions. Avoid the leading _
.
Improve style
Use an auto formatter on the code. Saves times and improves uniformity of code's appearance.
Why -Wc++-compat
?
Unfortunately this tends to make C code less C like. Consider dropping it unless you have a compelling reason for it.