The following excerpt is taken from this paper, in page 3.
Our new union-find-delete data structure, like most other union-find data structures, maintains the elements of each set in a rooted tree. As elements can now be deleted, not all the nodes in these trees will contain active elements. Nodes that contain elements are said to be occupied, while nodes that do not contain elements are said to be vacant. When an element is deleted, the node containing it becomes vacant.
In page 6 they define a tidy tree:
Definition 1. A tree is said to be tidy if it satisfies the following properties:
1. Every vacant non root node has at least two children
2. Every leaf is occupied and has rank 0
Then they explain how to tidy a tree and claim the following lemma:
Lemma 1. At most half of the nodes in a tidy tree may be vacant.
I don't understand why this lemma is correct
1 Answer 1
Let's prove your Lemma by induction on the depth of the tree $h$:
- $h=2$: in this case we have a root, its children and its grandchildren which are leaves. The root can be both occupied or vacant. For each child of the root, it is either occupied (and so are its eventual children since they are leaves), or it is vacant and so, by rule 1), it has at least two children which are occupied (by rule 2) because depth is 2 so they are leaves), so for each vacant node we have at least two occupied nodes. So, if $v$ is the number of vacant nodes, the number of occupied nodes is at least 2ドル(v-1)$ (we subtract 1ドル$ because of the root) and $v\leq 2(v-1) \iff v \geq 1$, which means that the number of vacant nodes is at most the number of occupied nodes if we have at least 1 vacant node (otherwise it's trivially true), which means that the vacant nodes are at most half of the total nodes.
- $P(h-1) \implies P(h)$: Let $h$ be the height of the tree. The nodes at level $h-1$ are either occupied or they have at least two occupied children (leaves). So, we can "delete" from the tree the vacant nodes with height $h-1$ and one of their children and put the remaining children (at least another one for each vacant node because of rule 1)+2)) as children of the parent of the deleted vacant node. In this way, we obtained a new tidy tree with height $h-1$ which has a number of vacant nodes $\leq$ than the number of occupied nodes by inductive hypothesis. Furthermore, we removed only pairs of (vacant, occupied) nodes, so the number of vacant nodes is $\leq$ the number of occupied nodes also for the removed nodes, thus the proof is done.
-
$\begingroup$ In the induction step, what about occupied nodes at level $h-1$? They may have children and if you don't address that you'll still have a tree of height $h$ $\endgroup$Shlomiz– Shlomiz2024年10月21日 23:43:19 +00:00Commented Oct 21, 2024 at 23:43
-
1$\begingroup$ @shaggy we like occupied nodes in this case. You can just "remove" their children or put them in the level above together with their parent node; they just make our thesis stronger. The point of the proof is to show that we have in the last two levels at least as many occupied nodes as vacant ones, so that we can "remove" these pairs and we are left with a tidy tree of height $h-1$ plus some eventually additional occupied node at level $h$ that we can just ignore and "remove" $\endgroup$SilvioM– SilvioM2024年10月22日 08:21:41 +00:00Commented Oct 22, 2024 at 8:21