Unit testing to prove balanced tree

Question 1

I've just built a self-balancing tree (red-black) in Java (language should be irrelevant for this question though), and I'm trying to come up with a good means of testing that it's properly balanced.

I've tested all the basic tree operations, but I can't think of a way to test that it is indeed well and truly balanced. I've tried inserting a large dictionary of words, both pre-sorted and un-sorted. With a balanced tree, those should take roughly the same amount of time, but an unbalanced tree would take significantly longer on the already-sorted list. But I don't know how to go about testing for that in any reasonable, reproducible way. (I've tried doing millisecond tests on these, but there's no noticeable difference - probably because my source data is too small.)

Is there a better way to be sure that the tree is really balanced? Say, by looking at the tree after it's created and seeing how deep it goes? (That is, without modifying the tree itself by adding a depth field to each node, which is just wasteful if you don't need it for anything other than testing.)

Question 2

Your source data is too small? Try inserting all 32-bit integers, sorted and unsorted.

Question 3

Linux rbtree test might be of help. github.com/torvalds/linux/blob/master/lib/rbtree_test.c

Question 4

One way to do this would be to create a method on your tree that measures the depth of the tree at a given node. You don't have to store the value, and if you use such a getDepth() method only for testing, then there's no extra overhead for normal tree operations. The getDepth() method would recursively traverse its child nodes and return the maximum depth found.

Once you have that, you can then check that your whole tree is balanced by recursing over each node of the tree, and verifying a condition something like:

Math.abs(getDepth(left) - getDepth(right)) <= 1

Question 5

As an addition to this answer: if you prevent direct access to the implementation (eg, by using a factory and an interface), you should feel free to add any useful method to the implementation class. Including methods designed solely for testing.

Question 6

Is the depth-difference of no greater than 1 at every node guaranteed for red-black? I know this would work for AVL trees, but I remember reading somewhere that other balanced trees don't necessarily have this property despite being relatively balanced. I'd think something like comparing the maximum depth to the log(n) would a more general solution, but I think there's some wiggle-room there as well...

Question 7

Well if you have a different criteria for what you consider "balanced", then use that. My answer is showing another way to test that your tree meets your requirements, other than measuring performance like you suggest (measuring performance only determines the "balanced-ness" indirectly and is not definitive).

Question 8

Yeah, this probably is the way to go. I just wasn't sure if it would work for red-black. (I've done AVL trees a couple times in various languages, but this is my first time experimenting with other types of balanced trees.)

Question 9

The appropriate invariant of red-black tree is "Every path from a given node to any of its descendant leaves contains the same number of black nodes." And only that many red nodes, because red node may only have black children.

Question 10

If you want to measure time complexity for reads one idea is that you could have the tree issuing callbacks for traversals. This way you could measure the number of operations for a read, which would be a more reliable measurement of time complexity.

Question 11

And also add complexity to every step of the operation. If it were just being used to test (and compare to something unbalanced, say), that might be okay, but you'd have to then disable that code before putting into production.

Question 12

Yes, it would be costly. A compromise is to use an internal counter, that should be cheaper. Direct statistics are as I said IMO the only really reliable way of measuring time complexity. But if you want to get fancy you can try ThreadMXBean.getThreadCPUTime() to measure it.

Question 13

Well, as long as you can prove the tree is balanced at any given point, you know that any basic operations will operate in log time. So a solution which analyzes the current state of the tree and proves it's balanced should be good enough, and doesn't needlessly add time or space complexity when it's not being used for testing. (And is in fact what I've already done - this was a month ago.)

Question 14

Yes well I just wanted to add some pointers as to how you could go ahead with your original approach if you wanted to. That's why I started with "If you want to measure time complexity". For reference.

Greg Hewgill Greg Hewgill 10.2k1 gold badge49 silver badges46 bronze badges · Accepted Answer · 2013-11-11 00:27:32Z

10

One way to do this would be to create a method on your tree that measures the depth of the tree at a given node. You don't have to store the value, and if you use such a getDepth() method only for testing, then there's no extra overhead for normal tree operations. The getDepth() method would recursively traverse its child nodes and return the maximum depth found.

Once you have that, you can then check that your whole tree is balanced by recursing over each node of the tree, and verifying a condition something like:

Math.abs(getDepth(left) - getDepth(right)) <= 1

Share

Improve this answer

answered Nov 11, 2013 at 0:27

Greg Hewgill's user avatar

Greg Hewgill Greg Hewgill

10.2k1 gold badge49 silver badges46 bronze badges

6

1

As an addition to this answer: if you prevent direct access to the implementation (eg, by using a factory and an interface), you should feel free to add any useful method to the implementation class. Including methods designed solely for testing.

kdgregory
– kdgregory

2013年11月11日 01:36:08 +00:00
Commented Nov 11, 2013 at 1:36
Is the depth-difference of no greater than 1 at every node guaranteed for red-black? I know this would work for AVL trees, but I remember reading somewhere that other balanced trees don't necessarily have this property despite being relatively balanced. I'd think something like comparing the maximum depth to the log(n) would a more general solution, but I think there's some wiggle-room there as well...

Darrel Hoffman
– Darrel Hoffman

2013年11月11日 02:38:07 +00:00
Commented Nov 11, 2013 at 2:38
Well if you have a different criteria for what you consider "balanced", then use that. My answer is showing another way to test that your tree meets your requirements, other than measuring performance like you suggest (measuring performance only determines the "balanced-ness" indirectly and is not definitive).

Greg Hewgill
– Greg Hewgill

2013年11月11日 02:39:49 +00:00
Commented Nov 11, 2013 at 2:39
Yeah, this probably is the way to go. I just wasn't sure if it would work for red-black. (I've done AVL trees a couple times in various languages, but this is my first time experimenting with other types of balanced trees.)

Darrel Hoffman
– Darrel Hoffman

2013年11月11日 02:45:40 +00:00
Commented Nov 11, 2013 at 2:45
1

The appropriate invariant of red-black tree is "Every path from a given node to any of its descendant leaves contains the same number of black nodes." And only that many red nodes, because red node may only have black children.

Jan Hudec
– Jan Hudec

2013年11月11日 08:36:46 +00:00
Commented Nov 11, 2013 at 8:36

| Show 1 more comment

Stack Exchange Network

Unit testing to prove balanced tree

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Unit testing to prove balanced tree

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions