2
$\begingroup$

I have a guess about the problem above. Suppose I have a binary search tree $T$ initially empty. Suppose I drawn $x_1,\ldots,x_k$ (from some real interval $[a,b]$) keys and I want to insert the keys in such order in $T,ドル what's the expected depth/height of this tree?

My attempt is the following, suppose $x_1$ has been assigned as root, I define a random variable $L$ as

$$ L = L(x_2,\ldots,x_k | x_1) = \# \left\{x_i : x_i < x_1 ,2 \leq i \leq k\right\} = \sum_{j=2}^k u(x_1 - x_j) $$

This function $L$ models how many elements would go in the left subtree rooted at $x_1,ドル here $u(x)$ is defined as

$$ u(x) = \begin{cases} 1 & x > 0 \\ 0 & otherwise \end{cases}, $$

we also have

$$ R = k - 1 -L $$

the number of elements at the right of $x_1$. Now the expected value for $L$ is

$$ \bar{L} = \int_{[a,b]^{k-1}} L(x_2,\ldots,x_k | x_1) p(x_2,\ldots, x_k) dx_2 \ldots dx_k = \\ \int_{[a,b]^{k-1}} \sum_{j=2}^k u(x_1 - x_j) p(x_2,\ldots, x_k) dx_2 \ldots dx_k = \\ \int_{[a,b]^{k-1}} \sum_{j=2}^k u(x_1 - x_j) p(x_j) \prod_{l=2,l\neq j}^{k}p(x_l) dx_2 \ldots dx_k = \sum_{j=2}^k \int_{[a,b]} u(x_1 - x_j) p(x_j) dx_j = \\ (k-1) \frac{x_1-a}{b-a} $$

We then have

$$\bar{L} = (k-1) \frac{x_1-a}{b-a} = f(x_1)$$

If we now compute the expected value for $f$ we will get the on average how many elements will go on the left subtree rooted at $x_1,ドル regardless of $x_1$. This is easier to compute and we get a value I would expect indeed

$$ \bar{f} = \frac{k-1}{2} $$

We know now that given random values they will be equally distributed, we can write down the recurrence

$$ h(k) = 1 + h \left(\frac{k-1}{2} \right) $$

which models the average height for $k$ random elements, clearly $h$ is monotonic and we can write the inequality

$$ h(k) \leq 1 + h(k/2), $$

and therefore we end up with the solution

$$ h(k) = O(\log k) $$

Assuming the analysis is correct the questions are:

  1. I'm only analyzing with random inputs, what about if now I'll do deletion, how would the probabilistic model change? By deletion mean pick $y_1,\ldots, y_m \in \left\{x_1,\ldots x_k \right\}$ random and delete those, I expect a similar result, but I don't have a clue on the probabilistic model.
  2. Is there an alternative proof maybe more combinatorial?

Thank you

asked Jul 22, 2018 at 19:59
$\endgroup$
8
  • 1
    $\begingroup$ Unfortunately, it is generally not the case that $\mathbb{E}[\phi(X)] = \phi(\mathbb{E}[X])$. Therefore your recursive formula for $h$ doesn't quite hold. Perhaps you can get a similar inequality by convexity, if you can show that $h$ is convex. $\endgroup$ Commented Jul 22, 2018 at 21:12
  • 1
    $\begingroup$ The height of a random binary search tree is a classical topic, and it is easy to find material on it, in various levels of detail. $\endgroup$ Commented Jul 22, 2018 at 21:12
  • $\begingroup$ Which formula doesn't hold? $\endgroup$ Commented Jul 22, 2018 at 21:23
  • $\begingroup$ $h(k) = 1 + h(\frac{k-1}{2})$. $\endgroup$ Commented Jul 22, 2018 at 21:55
  • 3
    $\begingroup$ Possible duplicate of Proof that a randomly built binary search tree has logarithmic height $\endgroup$ Commented Jul 24, 2018 at 6:58

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.