Given a self-balancing binary search tree of size $n$, I want to perform the following operations:
InsertInOrderSequentialBatchan ordered sequence of $k$ values (specialized $k \in \{2, 3, 4\}$ or generalized $k \in N $) which are guaranteed to be sequential in an in-order tree traversal immediately after insertion.For example, insert $[310, 320, 330, 340]$ into a balanced tree containing $[100, 200, 500]$.
Future insertions might still be between the inserted nodes' values.
DeleteRangeall of $k$ nodes (likewise specialized or generalized) between 2 values in the BST.
For both operations, I want the tree to remain balanced.
With a Red-Black or AVL tree, I can achieve both operations through a sequence of $k$ insertions/deletions in $\mathcal{O}(k \log n)$, but wonder if a data structure (e.g. AVL/RB tree variant) could achieve $\mathcal{O}(\log n + k \log k)$ time for a real-world performance gain (e.g. traverse tree for insertion point, insert balanced tree into insertion point, perform one auto-balance pass)?
My needs are less theoretical and more in terms of wall-time & memory pressure for an implementation of Fortune's Algorithm - an algorithm with high constant-time multipliers is unfortunately not useful. My insertion sequence is biased & frequently multimodal (modes not known prior) with long streaks of insertions/deletions within modes. My tree size is anywhere from 100 to 3000 nodes.
-
$\begingroup$ I've had difficulty finding a name for these operations, let alone papers/slides addressing them. Such pointers would be appreciated! $\endgroup$Warty– Warty2021年04月03日 18:04:00 +00:00Commented Apr 3, 2021 at 18:04
-
$\begingroup$ (I can see those operations as (search&) splitting and joining BSTs.) (Dang. I should make it a habit to have a peek at existing answers.) $\endgroup$greybeard– greybeard2021年04月03日 18:33:07 +00:00Commented Apr 3, 2021 at 18:33
2 Answers 2
This can be solved with split and join operations, both achievable in $\mathcal{O}(\log n)$ for Red-Black and AVL trees.
For Red-Black Trees this is doable either leveraging finger trees or via extending a regular Red-Black Tree as in Ron Wein's "Efficient Implementation of Red-Black Trees with Split and Catenate Operations" implemented in CGAL with criticisms mentioned in https://stackoverflow.com/questions/29029894/red-black-tree-split-concatenate-in-logn-time (Tarjan might have a better paper).
For AVL trees this is doable according to Ramzi Fadel and Kim Vagn Jakobsen's "Data structures and algorithms in a two-level memory".
InsertInOrderSequentialBatch is expressable as a split followed by 2 joins. DeleteRange is 2 splits followed by 1 join.
-
$\begingroup$ For k=2,3,4, I suspect it is possible to achieve significantly faster performance through a specialized by-cases solution and would greatly appreciate pointers in that direction. $\endgroup$Warty– Warty2021年04月03日 18:08:53 +00:00Commented Apr 3, 2021 at 18:08
-
$\begingroup$ See en.wikipedia.org/wiki/Join-based_tree_algorithms $\endgroup$Warty– Warty2021年04月03日 18:28:03 +00:00Commented Apr 3, 2021 at 18:28
-
$\begingroup$ See en.wikipedia.org/wiki/… en.wikipedia.org/wiki/… $\endgroup$Warty– Warty2021年04月03日 18:35:02 +00:00Commented Apr 3, 2021 at 18:35
Red-Black tree insertion requires an amortized O(1) recolorings and rotations; if you know where to insert, doing so and rebalancing is O(1). Therefore multi-insertion can be achieved via a single O(logN) search followed by k amortized O(1) insert-successor-and-rebalance operations, since it is possible to implement insert-successor-and-rebalance such that the new node's Black Height is always <= 2, which bounds the next search to a constant time.
Alternatively, but more complex, if blackHeight(treeN) > blackHeight(treeK), insertion can be done in log(n/k)+k time: treeify k values, then insert the tree k into tree n at a leaf and "pull" it upward to maintain black height invariants. This is very similar to implementing a split-join.
For my personal code, I'm doing the O(logN + k) approach first described if k's tree is smaller in black height than N's. Otherwise, I suspect that past some threshold the split-join approach is going to be faster.
Explore related questions
See similar questions with these tags.