Expected Height of a Randomly-Built Binary Search Tree
Based on the Cormen et al., Section 12.4, pp 299-303
To randomly build a binary search tree, all permutations of inputs (keys) are equally likely. In other words, given n keys, each of the n! inputs are equally likely. Note that this is different from stating that all binary search trees on n keys are equally likely. Why?
We will first introduce two random variables.
1. Xn is the height of a randomly-built binary search tree on n keys. Note that X1 = 0.
2. Yn = 2Xn is the exponential height of a randomly-built binary search tree on n keys where:
(a) Y0 = 0 (by definition), (b) Y1 = 1, and
(c) Yn = 2 · max(Yi−1, Yn−i).
The probability that a randomly-built binary search has a left subtree with i − 1 keys and a right subtree with n−i keys, 1 ≤ i ≤ n, is 1/n since each of the n keys is equally likely to be the root. Hence, the expected value of E[Yn] is expressed as:
Inductive Step
E[Yn]
= n 1 · E[2 · max(Yi−1, Yn−i)]
n
i=1
2 n
E [max(Yi−1 , Yn−i )] (E[Yi−1] + E[Yn−i])
≤
We will now prove by induction that E[Yn] ≤ cn3 where c ≥ 1.
Basis of Induction
E[Y0]=Y0 =0≤c·03 wherec≥1
E[Y1]=Y1 =1≤c·13 wherec≥1 Induction Hypothesis Assume that E[Yi] ≤ c · i3, 0 ≤ i < n.
= n
2 n
≤ n
4 n
≤ n
E[Yn] ≤ ≤ ≤
4 n−1
E[Yi]
i=0 4 n−1
c·i3
i=0 4 n−1
c·i3
i=1
i=1
E [Yi−1 ] E[Yi]
i=1 4 n−1
n
i=0
n
n
n
i=1
4c(n−1)2 ·n2
≤
≤ cn(n − 1)2
n4 ≤ cn3
1
To determine the expected value of Xn, we can use Jensen’s inequality which states that: f(E[X]) ≤ E[f(X)]
provided f is a convex function. In our case, f(Xn) = Yn = 2Xn is a convex function. Hence,
f(E[Xn]) = 2E[Xn] ≤ E[2Xn]
≤ E[Yn] ≤ cn3
Therefore, taking the log2 on both sides yields our result.
E[Xn] ≤ log2 c + 3 log2 n ∈ O(log n)
2