4
26
1357
35
12467
CS61B, 2020
Lecture 17: B-Trees
¡ñ BST Performance
¡ñ Big O ¡ Worst Case
¡ñ 2-3-4 and 2-3 Tree Operations (a.k.a. B-Trees)
¡ñ B-Tree Bushiness Invariants
¡ñ B-Tree Performance
datastructur.es
BST Tree Height
datastructur.es
BST Tree Height
Let¡¯s start today by carefully discussing the height of binary search trees. Trees range from best-case ¡°bushy¡± to worst-case ¡°spindly¡±.
¡ñ
Difference is dramatic!
H=3
k
ev
bgpy
adfjmrxz
k
H=3
v
y
z
datastructur.es
Tree Height: http://yellkey.com/?
Height varies dramatically between ¡°bushy¡± and ¡°spindly¡± trees.
H=3 k
ev
bgpy
adfjmrxz
k
v
y
H=3
z
Let H(N) be the height of a tree with N nodes. Give H(N) in Big-Theta notation for ¡°bushy¡± and ¡°spindly¡± trees, respectively:
A. ¦¨(log(N)), ¦¨(log(N)) B. ¦¨(log(N)), ¦¨(N)
C. ¦¨(N), ¦¨(log(N)) D. ¦¨(N), ¦¨(N)
datastructur.es
Tree Height
Height varies dramatically between ¡°bushy¡± and ¡°spindly¡± trees.
H=3 k
ev
bgpy
adfjmrxz
H = ¦¨(log(N))
k
H=3
v
y
z
H = ¦¨(N)
Performance of operations on spindly trees can be just as bad as a linked list! ¡ñ Example: contains(¡°z¡±) would take linear time.
datastructur.es
Statements about Tree Height: http://yellkey.com/? Which of these statements are true?
A. Worst case BST height is ¦¨(N).
B. BST height is O(N).
C. BST height is O(N2).
datastructur.es
Statements about Tree Height
Which of these statements are true?
A. Worst case BST height is ¦¨(N).
B. BST height is O(N).
C. BST height is O(N2).
All are true!
¡ñ A worst case (spindly tree) has a height that grows exactly linearly – ¦¨(N).
¡ñ All BSTs have a height that grows linearly or better – O(N).
¡ñ All BSTs have a height that grows quadratically or better – O(N2).
datastructur.es
Statements about Tree Height: http://yellkey.com/? Which of these statements is more informative?
A. Worst case BST height is ¦¨(N).
B. BST height is O(N).
C. They are equally informative.
datastructur.es
Statements about Tree Height
Which of these statements is more informative?
A. Worst case BST height is ¦¨(N).
B. BST height is O(N).
C. They are equally informative.
Saying that the worst case has order of growth N is more informative than saying the height is O(N).
Let¡¯s see an analogy.
datastructur.es
Question: http://yellkey.com/?
Which statement gives you more information about a hotel?
A. The most expensive room in the hotel is $639 per night.
B. Every room in the hotel is less than or equal to $639 per night.
Question: http://yellkey.com/?
Which statement gives you more information about a hotel?
A. The most expensive room in the hotel is $639 per night.
B. Every room in the hotel is less than or equal to $639 per night.
Most expensive room: $639/nt
All rooms <= $639/nt
(A nice place to stay!)
BST Height
BST height is all four of these:
¡ñ O(N).
¡ñ ¦¨(log N) in the best case (¡°bushy¡±).
¡ñ ¦¨(N) in the worst case (¡°spindly¡±).
¡ñ O(N2).
The middle two statements are more informative.
¡ñ Big O is NOT mathematically the same thing as ¡°worst case¡±.
¡ð e.g. BST heights are O(N2), but are not quadratic in the worst case.
¡ð ... but Big O often used as shorthand for ¡°worst case¡±.
The Usefulness of Big O
Big O is still a useful idea:
¡ñ Allows us to make simple blanket statements, e.g. can just say ¡°binary search is O(log N)¡± instead of ¡°binary search is ¦¨(log N) in the worst case¡±.
¡ñ Sometimes don¡¯t know the exact runtime, so use O to give an upper bound.
¡ð Example: Runtime for finding shortest route that goes to all world cities
is O(2N)*. There might be a faster way, but nobody knows one yet. ¡ñ Easier to write proofs for Big O than Big Theta, e.g. finding runtime of
mergesort, you can round up the number of items to the next power of 2 (see A level study guide problems for Asymptotics2 lecture). A little beyond the scope of our course.
*: Under certain assumptions and constraints not listed.
BST Height
BST height is both of these:
¡ñ ¦¨(log N) in the best case (¡°bushy¡±).
¡ñ ¦¨(N) in the worst case (¡°spindly¡±).
Let¡¯s now turn to understanding the performance of BST operations.
Height, Depth, and Performance
datastructur.es
Height and Depth
Height and average depth are important properties of BSTs.
¡ñ The ¡°depth¡± of a node is how far it is from the root, e.g. depth(g) = 2.
¡ñ The ¡°height¡± of a tree is the depth of its deepest leaf, e.g. height(T) = 4.
¡ñ The ¡°average depth¡± of a tree is the average depth of a tree¡¯s nodes.
¡ð
(0x1 + 1x2 + 2x4 + 3x6 + 4x1)/(1+2+4+6+1) = 2.35
height 4
depth 0 depth 1 depth 2 depth 3 depth 4
k
ev
bgpy
a d f j r z
s
datastructur.es
Height, Depth and Runtime
Height and average depth determine runtimes for BST operations.
¡ñ The ¡°height¡± of a tree determines the worst case runtime to find a node.
¡ð Example: Worst case is contains(s), requires 5 comparisons (height + 1).
¡ñ The ¡°average depth¡± determines the average case runtime to find a node.
¡ð
Example: Average case is 3.35 comparisons (average depth + 1).
depth 0 depth 1 depth 2
k
e
v
height 4
b
g
p
y
depth 3 a d f j r z
depth 4 s
datastructur.es
BSTs in Practice
Suppose we want to build a BST out of the numbers 1, 2, 3, 4, 5, 6, 7. Give an example of a sequence of add operations that results in:
¡ñ A spindly tree.
¡ñ A bushy tree.
1
2
3
4
5
6
4
26
1357
7
datastructur.es
BSTs in Practice
Give an example of a sequence of add operations that results in:
¡ñ A spindly tree.
¡ð add(1), add(2), add(3), add(4), add(5), add(6), add(7)
¡ñ A bushy tree.
¡ð
add(4), add(2), add(1), add(3), add(6), add(5), add(7)
1
2
3
4
5
6
4
Height: 6
Average Depth: 3 2
6
Height: 2
Average Depth: 1.43
1357
7
datastructur.es
Important Question: What about Real World BSTs?
BSTs have:
¡ñ Worst case ¦¨(N) height.
¡ñ Best case ¦¨(log N) height.
... but what about trees that you¡¯d build during real world applications? One way to approximate this is to consider randomized BSTs.
datastructur.es
Simulation: Trees Built from Random Inserts
max: height + 1
avg: average depth + 1
Nice Property. Random trees have ¦¨(log N) average depth and height.
¡ñ In other words: Random trees are bushy, not spindly. Video courtesy of Kevin Wayne (Princeton University)
datastructur.es
Randomized Trees: Mathematical Analysis
Average Depth. If N distinct keys are inserted into a BST, the expected average depth is ~ 2 ln N = ¦¨(log N).
¡ñ Thus, average runtime for contains operation is ¦¨(log N) on a tree built with random inserts.
¡ñ Will discuss this proof briefly closer to the end of this course.
Tree Height. If N distinct keys are inserted in random order, expected tree
height is ~ 4.311 ln N (see Reed, 2003).
¡ñ Thus, worst case runtime for contains operation is ¦¨(log N) on a tree built with random inserts.
¡ñ Proof is well beyond the scope of the course (and is 27 pages long!). Note: ~ is the same thing as Big Theta, but you don¡¯t drop the multiplicative constants.
datastructur.es
Important Question: What about Real World BSTs?
BSTs have:
¡ñ Worst case ¦¨(N) height.
¡ñ Best case ¦¨(log N) height.
¡ñ ¦¨(log N) height if constructed via random inserts.
In real world applications we expect both insertion and deletion.
¡ñ See extra slides for more on simulations of trees including deletion.
¡ñ Can show that random trees including deletion are still ¦¨(log N) height.
datastructur.es
Good News and Bad News
Good news: BSTs have great performance if we insert items randomly. Performance is ¦¨(log N) per operation.
Bad News: We can¡¯t always insert our items in a random order. Why?
m
eo
bgnp
q
r
s
datastructur.es
Good News and Bad News
Good news: BSTs have great performance if we insert items randomly. Performance is ¦¨(log N) per operation.
Bad News: We can¡¯t always insert our items in a random order. Why?
¡ñ Data comes in over time, don¡¯t have all at once. ¡ð Example: Storing dates of events.
¡ö add(¡°01-Jan-2019, 10:31:00¡±) e
¡ö add(¡°01-Jan-2019, 18:51:00¡±)
m
o
¡ö add(¡°02-Jan-2019, 00:05:00¡±)
¡ö add(¡°02-Jan-2019, 23:10:00¡±)
bgnp
In this lecture, we¡¯ll do something totally different.
q
r
s
datastructur.es
B-trees / 2-3 trees / 2-3-4 trees
datastructur.es
Avoiding Imbalance
The problem is adding new leaves at the bottom. Crazy idea: Never add new leaves at the bottom.
¡ñ Tree can never get imbalanced.
13
5 15
2 7 14 16
Q: What do we do with incoming keys?
?
? 18 ?
17
?
19
datastructur.es
Avoiding Imbalance through Overstuffing
The problem is adding new leaves at the bottom. Avoid new leaves by ¡°overstuffing¡± the leaf nodes.
13
5 15
¡ñ
¡°Overstuffed tree¡± always has balanced height,
because leaf depths never change. ¡ð Height is just max(depth).
2 7 14 16
13 13
5 15 5 15
2 7 14 1617 2 7 14 161718
datastructur.es
Avoiding Imbalance through Overstuffing
Overstuffed trees are a logically consistent but very weird data structure. ¡ñ contains(18):
¡ð ¡ð ¡ð ¡ð ¡ð
Q: What
Is 18 > 13? Yes, go right. Is 18 > 15? Yes, go right. Is 16 = 18? No.
Is 17 = 18? No.
Is 18 = 18? Yes! Found it.
is the problem with this idea?
13
5 15
13
5 15
2 7 14
16 17 18 19
2 7 14
16 17 18 19 20 21 22 23 24
datastructur.es
Revising Our Overstuffed Tree Approach
Height is balanced, but we have a new problem: ¡ñ Leaf nodes can get too juicy.
Solution?
13
5 15
7 14 16 17 18 19
2
datastructur.es
Revising Our Overstuffed Tree Approach: Moving Items Up
Height is balanced, but we have a new problem: ¡ñ Leaf nodes can get too juicy.
13
5 15
7 14
Solution?
2
16 17 18 19
move up
¡ñ Set a limit L on the number of items, say L=3.
¡ñ If any node has more than L items, give an item to parent.
¡ð Which one? Let¡¯s say (arbitrarily) the left-middle.
13
Q: What¡¯s the problem now?
¡ñ 16 is to the right of 17. 2
5
15 17
16 18 19
7
14
datastructur.es
Revising Our Overstuffed Tree Approach
Height is balanced, but we have a new problem: ¡ñ Leaf nodes can get too juicy.
13
5 15
Solution?
2 7 14
16 17 18 19
¡ñ Set a limit L on the number of items, say L=3.
¡ñ If any node has more than L items, give an item to parent.
¡ð Which one? Let¡¯s say (arbitrarily) the left-middle.
13
Challenge for you:
¡ñ How can we tweak this idea to make it work?
5
15 17
16 18 19
2
7
14
datastructur.es
Revising Our Overstuffed Tree Approach: Node Splitting
Height is balanced, but we have a new problem: ¡ñ Leaf nodes can get too juicy.
13
5 15
7 14
Solution?
2
16 17 18 19
¡ñ Set a limit L on the number of items, say L=3.
¡ñ If any node has more than L items, give an item to parent.
¡ð Pulling item out of full node splits it into left and right.
¡ð Parent node now has three children!
13
5
15 17
2 7 14161819
datastructur.es
Revising Our Overstuffed Tree Approach: Node Splitting
This is a logically consistent and not so weird data structure.
¡ñ contains(18):
¡ð 18 > 13, so go right
¡ð 18 > 15, so compare vs. 17 5
¡ð 18 > 17, so go right
13
15 17
2 7 14161819
Examining a node costs us O(L) compares, but that¡¯s OK since L is constant.
What if a non-leaf node gets too full? Can we split that? ¡ñ Sure, we¡¯ll do this in a few slides, but first…
datastructur.es
add Understanding Check
¡ñ
Suppose we add 20, 21:
13
13
5
15 17
5
15 17
14
16
2 7 14 16 1819
2
7 18 19 20 21
¡ñ Q: If our cap is at most L=3 items per node, draw post-split tree:
datastructur.es
add Understanding Check
¡ñ
Suppose we add 20, 21:
13
13
5
15 17
5
15 17
14
16
2 7 14 16 1819
2
7 18 19 20 21
¡ñ Q: If our cap is at most L=3 items per node, draw post-split tree: 13
5
15 17 19
27
datastructur.es
14
16
18
20 21
add: Chain Reaction
¡ñ Suppose we add 25, 26: 13
13
15 17 19
5
2 7 14
15 17 19
5
2021 2 7
16
18
13
13 17
15 19 21
5 15 17 19 21
5
14
16
18
14
16
18
20
2 7 20 2526 2 7
14
16
18
20 21 25 26
25 26
datastructur.e
s
What Happens If The Root Is Too Full?
13 17 13 17 13 17 21
19
22 23
5151921 51519212223515
13 17 21
13 17 21 23
5 22
15
19
5
15
19
22 23 24 25
24 25
Challenge: Draw the tree after the root is split.
datastructur.es
What Happens If The Root Is Too Full?
13 17 21
13 17 21 23
5 22
15
19
5
15
19
22 23 24 25
24 25
Challenge: Draw the tree after the root is split.
17
13
21 23
5 15 19 22 2425
datastructur.es
Perfect Balance
Observation: Splitting-trees have perfect balance.
¡ñ If we split the root, every node gets pushed down by exactly one level.
¡ñ If we split a leaf node or internal node, the height doesn¡¯t change.
We will soon prove: All operations have guaranteed O(log N) time.
¡ñ More details soon.
17
13
21 23
5 15 19 22 2425
datastructur.es
The Real Name for Splitting Trees is ¡°B Trees¡±
Splitting tree is a better name, but I didn¡¯t invent them, so we¡¯re stuck with their real name: B-trees.
¡ñ
B-trees of order L=3 (like we used today) are also called a 2-3-4 tree or a 2-4 tree.
¡ð ¡°2-3-4¡± refers to the number of children that a node can have, e.g. a 2-3-4 tree node may have 2, 3, or 4 children.
B-trees of order L=2 are also called a 2-3 tree.
The origin of “B-tree” has never been explained by the authors. As we shall see, “balanced,” “broad,” or “bushy” might apply. Others suggest that the “B” stands for Boeing. Because of his contributions, however, it seems appropriate to think of B-trees as “Bayer”-trees.
¡ñ
– Douglas Corner (The Ubiquitous B-Tree)
datastructur.es
A note on Terminology
B-Trees are most popular in two specific contexts:
¡ñ Small L (L=2 or L=3):
¡ð Used as a conceptually simple balanced search tree (as today).
¡ñ L is very large (say thousands).
¡ð Used in practice for databases and filesystems (i.e. systems with very
large records).
mq m q
e o s uw e o s u
bgn t bgnprtv
p
r
v
yz
2-3-4 a.k.a. 2-4 Tree (L=3):
Max 3 items per node.
Max 4 non-null children per node.
2-3 Tree (L=2):
Max 2 items per node.
Max 3 non-null children per node.
datastructur.es
B-Tree Bushiness Invariants
datastructur.es
Exercise
add the numbers 1, 2, 3, 4, 5, 6, then 7 (in that order) into a regular BST.
Then try adding 1, 2, 3, 4, 5, 6, then 7 (in that order) into a 2-3 tree (L=2). ¡ñ For L=2, pass the middle item up.
datastructur.es
Exercise
add the numbers 1, 2, 3, 4, 5, 6, then 7 (in that order) into a regular BST.
1
2
3
4
5
Then try adding 1, 2, 3, 4, 5, 6, then 7 (in that order) into a 2-3 tree.
¡ñ Interactive demo: https://tinyurl.com/balanceYD or this link.
¡ñ In this demo ¡°max-degree¡± means the maximum number of children, i.e. 3. 4
6
7
26
All leaves are at depth 2.
1357
datastructur.es
Exercise 2
Find an order such that if you add the items 1, 2, 3, 4, 5, 6, and 7 in that order, the resulting 2-3 tree has height 1.
datastructur.es
Exercise 2
Find an order such that if you add the items 1, 2, 3, 4, 5, 6, and 7 in that order, the resulting 2-3 tree has height 1.
¡ñ One possible answer: 2, 3, 4, 5, 6, 1, 7 35
12467
All leaves are at depth 1.
Not sure why? Make sure to see https://tinyurl.com/balanceYD.
No matter the insertion order you choose, resulting B-Tree is always bushy! ¡ñ May vary in height a little bit, but overall guaranteed to be bushy.
datastructur.es
B-Tree Invariants
Because of the way B-Trees are constructed, we get two nice invariants:
¡ñ All leaves must be the same distance from the source.
¡ñ A non-leaf node with k items must have exactly k+1 children.
¡ñ Example: The tree given below is impossible.
¡ð Leaves ([1] and [5 6 7]) are a different distance from the source.
¡ð Non-leaf node [2 3] has two items but only only one child. Should have
three children.
4
23
5 6 7
1
We have not proven these invariants rigorously, but try thinking them through.
datastructur.es
B-Tree Invariants
Because of the way B-Trees are constructed, we get two nice invariants:
¡ñ All leaves must be the same distance from the source.
¡ñ A non-leaf node with k items must have exactly k+1 children. These invariants guarantee that our trees will be bushy.
datastructur.es
B-Tree Runtime Analysis
datastructur.es
Height of a B-Tree with Limit L
L: Max number of items per node.
Height: Between ~log (N) and ~log (N) L+1 2
¡ñ Largest possible height is all non-leaf nodes have 1 item.
¡ñ Smallest possible height is all nodes have L items.
¡ñ Overall height is therefore ¦¨(log N).
**
** best ** ** case
*
near
* worst *
case
** * * *
N: 8 items
L: 2 max per node H: 2
Height grows with log2(N)
**
**
**
**
**
**
**
**
**
N: 26 items
L: 2 max per node H: 2
Height grows with log3(N)
datastructur.es
Runtime for contains Runtime for contains:
¡ñ Worst case number of nodes to inspect: H + 1
¡ñ Worst case number of items to inspect per node: L
¡ñ Overall runtime: O(HL)
Since H = ¦¨(log N), overall runtime is O(L log N).
¡ñ Since L is a constant, runtime is therefore O(log N).
datastructur.es
Runtime for add Runtime for add:
¡ñ Worst case number of nodes to inspect: H + 1
¡ñ Worst case number of items to inspect per node: L
¡ñ Worst case number of split operations: H + 1
¡ñ Overall runtime: O(HL) = O(L)
Since H = ¦¨(log N), overall runtime is O(L log N).
¡ñ Since L is a constant, runtime is therefore O(log N).
Bottom line: contains and add are both O(log N).
datastructur.es
Summary
datastructur.es
Summary
BSTs have best case height ¦¨(log N), and worst case height ¦¨(N). ¡ñ Big O is not the same thing as worst case!
B-Trees are a modification of the binary search tree that avoids ¦¨(N) worst case.
¡ñ Nodes may contain between 1 and L items.
¡ñ contains works almost exactly like a normal BST.
¡ñ add works by adding items to existing leaf nodes.
¡ð If nodes are too full, they split.
¡ñ Resulting tree has perfect balance. Runtime for operations is O(log N).
¡ñ Have not discussed deletion. See extra slides if you¡¯re curious.
¡ñ Have not discussed how splitting works if L > 3 (see some other class).
¡ñ B-trees are more complex, but they can efficiently handle ANY insertion
order.
datastructur.es
Binary Search Tree Deletion A Quick History (Extra)
This will not be covered in any homework, lab, or exam.
datastructur.es
Randomized Search Trees Including Deletion
Earlier, we saw that BSTs have:
¡ñ Worst case ¦¨(N) height.
¡ñ Best case ¦¨(log N) height.
¡ñ ¦¨(log N) height if constructed via random inserts.
Can also simulate a sequence of random insertions and deletions.
datastructur.es
Randomized Search Trees Including Deletion
In real world applications, items are added and removed from sets (or maps) all the time! Let¡¯s try out a simulation that includes deletion.
max: height + 1
avg: average depth + 1
Video courtesy of Kevin Wayne (Princeton University)
datastructur.es
Important Question: What about Real World BSTs?
BSTs have:
¡ñ Worst case ¦¨(N) height.
¡ñ Best case ¦¨(log N) height.
¡ñ ¦¨(log N) height if constructed via random inserts.
¡ñ ¦¨(sqrt N) height after random insertion and deletion.
¡ð Assumes you always pick the successor when deleting!
¡ð There¡¯s an interesting story how this ¦¨(sqrt N) bound was found…
datastructur.es
History of Asymmetric Hibbard Deletion (AHD) Analysis
In 1975, Gary Knott wrote his Ph.D. thesis ¡°Deletion in Binary Storage Trees.”
¡ñ In this thesis, Knott ran simulations and conjectured that random insertion and deletion using AHD improved the average depth of trees.
In 1983, Jeffrey Eppinger of Carnegie Mellon wrote an article overturning this conjecture.
¡ñ Showed that average depth did improve with random insertions and deletions, but only for a while. Eventually, it got worse!
Here: Average IPL means average depth. The values are normalized to the height of a tree built only by random insertion.
datastructur.es
History of Asymmetric Hibbard Deletion (AHD) Analysis
In 1983, Jeffrey Eppinger of Carnegie Mellon an article overturning this conjecture.
¡ñ Showed that average depth did improve with random insertions and deletions, but only for a while. Eventually, it got worse!
¡ñ Conjectured that average depth is ¦¨(log2 N) rather than ¦¨(log N).
¡ð log(1000000) = 6, log2(1000000) = 36
In 1987, Culberson and Munro overturned Eppinger¡¯s conjecture.
¡ñ Did a more rigorous empirical study, and found that average depth appeared to be ¦¨(sqrt N), not ¦¨(log2 N).
¡ñ This ¦¨(sqrt N) bound has not been mathematically proven!
datastructur.es
Symmetric Hibbard Deletion
If you randomly pick between successor and predecessor, you get ¦¨(log N) height.
¡ñ See Culberson and Munro, 1989 or Knuth¡¯s TAOCP Volume 3 – 6.2 for more if you¡¯re curious (the history is pretty interesting).
datastructur.es
2-3 Tree Deletion (Extra)
This will not be covered in any homework, lab, or exam.
datastructur.es
Deletion Operations
As with regular Binary Search Trees, deletion is a more complicated operation.
¡ñ Many possible deletion algorithms.
¡ñ We¡¯ll develop a deletion algorithm together.
5
17
13 21 23
15 18 19 22 24 25
datastructur.es
Deletion from a Regular BST (Review)
In a regular BST, when we delete a value ¦Á with 2 children, we: ¡ñ Copy the value of the successor into ¦Á.
¡ñ Then we delete the successor. delete(4)
4
26
135
55
2626
1 3 X
1 3 5.5
successor
5.5
5.5
datastructur.es
Deletion from a 2-3 Tree
In a 2-3 Tree, when we delete ¦Á from a node with 2 or more children, we: ¡ñ Swap the value of the successor with ¦Á.
¡ñ Then we delete the successor value.
Example: delete(17):
¡ñ Swap 17 with its successor 18, then delete 17.
Note: Successor will always be in a leaf node!
17
18
13
21 23 13
21 23
5 15 1819 22 2425 5 15 1719 22 2425 datastructur.es
Multi-Key Leaves vs. Single-Key Leaves
If deleting from a leaf with multiple keys, the deletion is trivial. We simply remove the item from the leaf, and we are done.
delete(17):
13
17
18
21 23 13
21 23
5 15 1819 22 2425 5 15 1719 22 2425 18
13
21 23
5 15 19 22 2425
datastructur.es
Multi-Key Leaves vs. Single-Key Leaves
If our leaf has multiple keys, the deletion is trivial. We simply remove the item. If our leaf has a single key, we cannot simply remove the node entirely.
¡ñ Why? delete(22):
13
17 17
21 23 13
21 23
5 15 1819 22 2425 5 15 19
2425
datastructur.es
Multi-Key Leaves vs. Single-Key Leaves
If our leaf has multiple keys, the deletion is trivial. We simply remove the item. If our leaf has a single key, we cannot simply remove the node entirely.
¡ñ Any node with k items must have k + 1 children!
¡ñ Instead, we¡¯ll leave an empty node, which must be filled.
¡ñ Filling the empty node is complex and has many cases (coming soon).
delete(22):
17
17
13
21 23 13
21 23
5 15 1819 22 2425 5 15 19 X 2425 datastructur.es
Filling in Empty Nodes (FIEN)
The hard part about deletion is filling empty nodes.
¡ñ There are three interesting cases to consider.
¡ñ For reasons that will become clear later, we will talk about how to fill empty
boxes ANYWHERE in the tree, not just in the leaves.
17
X 21 23
5 19 22 24 264
X48X6
Since empty nodes contain k=0 items, non-leaf empty nodes have 1 child!
5
7
139
157
datastructur.es
FIEN Case 1A: Multi-Key Sibling
In case 1A, the empty node¡¯s adjacent sibling has multiple keys.
¡ñ
Very hard optional challenge: Try to fill in X in the diagram below so that the result is a valid 2-3 tree.
17
X 21 23
5 19 22 24
datastructur.es
FIEN Case 1A: Multi-Key Sibling
In case 1A, the empty node¡¯s adjacent sibling has multiple keys.
¡ñ ¡ñ
X steals parent¡¯s item. Parent steals sibling¡¯s item.
If X was not a leaf node, X steals one of sibling¡¯s subtrees.
17 21
X 2123 17 23
5 19 22 24
5 19 22 24
21
17
23
5 19 22 24
datastructur.es
FIEN Case 1A: Multi-Key Sibling
In case 1A, the empty node¡¯s adjacent sibling has multiple keys.
¡ñ X steals parent¡¯s item. Parent steals sibling¡¯s item.
¡ñ If X was not a leaf node, X steals one of sibling¡¯s subtrees. Try to delete 17 from the tree below.
¡ñ Hint: You¡¯ll end up in FIEN Case 1. 17
13
21 24
5 15192223 25
datastructur.es
FIEN Case 1A: Multi-Key Sibling
In case 1A, the empty node¡¯s adjacent sibling has multiple keys.
¡ñ X steals parent¡¯s item. Parent steals sibling¡¯s item.
¡ñ If X was not a leaf node, X steals one of sibling¡¯s subtrees. Try to delete 17 from the tree below.
Wasn¡¯t necessary since X was not a leaf node.
¡ñ Swap 17 with 19, then delete 17. 19
19
13
21 24 13
22 24
5 15 X 2223 25 5 15 21 23 25
datastructur.es
FIEN Case 2A: Multi-Key Parent
In case 2A, siblings on the right all have one key, but parent has two.
¡ñ Very hard optional challenge: Try to fill in X in the diagram below so that the result is a valid 2-3 tree.
26
X48
139
5
7
datastructur.es
FIEN Case 2A: Multi-Key Parent
In case 2A, siblings on the right all have one key, but parent has two.
¡ñ X and right sibling steal parent¡¯s keys. Middle sibling moves into the parent.
¡ñ Subtrees are passed along so that every node has the correct children.
264
X48 2X68 13913579
4
2X68
5
7
5
7
139
datastructur.es
FIEN Case 2A Exercise
In case 2A, siblings on the right all have one key, but parent has two.
¡ñ X and right sibling steal parent¡¯s keys. Middle sibling moves into the parent.
¡ñ Subtrees are passed along so that every node has the correct children. Try to delete 3 from the tree below.
3
158
069
2
4
datastructur.es
FIEN Case 2A Exercise
In case 2A, siblings on the right all have one key, but parent has two.
¡ñ X and right sibling steal parent¡¯s keys. Middle sibling moves into the parent.
¡ñ Subtrees are passed along so that every node has the correct children. delete(3)
344
15815816
0 690 69025X89 4
16
2
4
2
X
02589
datastructur.es
FIEN Case 3: Single-Key Parent and Sibling
In FIEN Case 3: The parent and all siblings have only one item.
¡ñ ¡ñ
Combine 1 sibling and parent into a single node that replaces X. Send the blank X up one level.
If blank ends up as the new root, just delete the blank and we are done.
4
X6
157
X
46
157
46
157
datastructur.es
FIEN Case 3 Exercise
In FIEN Case 3: The parent and all siblings have only one item.
¡ñ Combine 1 sibling and parent into a single node that replaces X. Send the blank X up one level.
¡ñ If blank ends up as the new root, just delete the blank and we are done.
6
248
13579
delete(6)
datastructur.es
FIEN Case 3 Exercise
In FIEN Case 3: The parent and all siblings have only one item.
¡ñ Combine 1 sibling and parent into a single node that replaces X. Send the blank X up one level.
¡ñ If blank ends up as the new root, just delete the blank and we are done.
delete(6)
677
24824824X
13579135X913589 4
27
This is just FIEN Case 1A.
1 3 5 8 9
datastructur.es
FIEN Case 3 Exercise #2
In FIEN Case 3: The parent and all siblings have only one item.
¡ñ Combine 1 sibling and parent into a single node that replaces X. Send the blank X up one level.
¡ñ If blank ends up as the new root, just delete the blank and we are done. delete(4)
4
26
1357
datastructur.es
FIEN Case 3 Exercise #2
In FIEN Case 3: The parent and all siblings have only one item.
¡ñ Combine 1 sibling and parent into a single node that replaces X. Send the blank X up one level.
¡ñ If blank ends up as the new root, just delete the blank and we are done. delete(4)
455
X
26262X25
135713X713671367
25
1367
datastructur.es
Citations
Bee-tree from https://beelore.files.wordpress.com/2010/01/thai_beetree.jpg Some isometry figures from Algorithms textbook.
Fantasy code for 2-3 Tree courtesy of Kevin Wayne
datastructur.es