COMP2521
Data Structures & Algorithms
Week 4.3
2-3-4 Trees
(B-Tree of order 4)
1
In this lecture
Why?
Storing one item per node results in a lot of traversals
happening, which can lead to slower lookups in trees
What?
2-3-4 Trees
Data Structure
Insertion
Pseudocode
2
Search Cost
When analysing search costs in BSTs:
Worst case: Length of longest path
Average case: Roughly average path length
In both cases, path length (tree depth) is a critical factor.
Perfectly balanced tree has maximum path length =
The 2 in the path length is what we call “branching factor”. Branching factor is
the same as the number of children.
Branching factor = 2: Binary Search Tree
Branching factor = 4: 2-3-4 tree
Branching factor > 2: B-tree
log (n)2
3 . 1
Branching Factor
While we usually generalise logs with different bases as being the same time
complexity, in reality it will have an impact on the real time (this is true for most
coefficients).
If we compare branching factors for a balanced tree with 4096 nodes:
log (4096) =2 12 steps
log (4096) =4 6 steps
log (4096) =8 4 steps
3 . 2
2-3-4 Tree Structure
2-3-4 trees have one of 3 kinds of nodes:
2-nodes, which have 1 item and 2 children (BST)
3-nodes, which have 2 items and 3 children
4-nodes, which have 3 items and 4 children
4 . 1
2-3-4 Tree Structure
2-3-4 tree nodes are ordered similarly to a BST. Smaller on the left, bigger on
the right. The items in the nodes are also in sorted order.
A balanced 2-3-4 tree has all nodes the same distance from the root.
4 . 2
2-3-4 Insertion
General approach:
Firstly, find the leaf node where item belongs (normal leaf-insert), then either:
Case 1: If the node of insert is not full (i.e. less than 3 items):
Insert item at this node in the correct location
Case 2: If the node is full (i.e. has 3 items):
Node split
Apply checks to parents recursively until no more splitting
5
2-3-4 Insertion – Case 1
Insertion into a 2-node or 3-node
6
2-3-4 Insertion – Case 2 (Node Split)
Insertion into a 4-node (requires a node split).
Middle value (or one just to the left) is propagated to parent node
Values in original node split across original node and new node
Recursively apply to parent until no more splitting is required
7 . 1
2-3-4 Insertion – Case 2 (Node Split)
Simple View
Showing
Intermediate Step
7 . 2
2-3-4 Insertion – Case 2 (Node Split)
This example shows the recursive node splitting, include the splitting of the
root node.
7 . 3
2-3-4 Insertion – Full Example
8
2-3-4 Insertion – Pseudocode
insert(tree,item):
if tree is empty:
return new node containing item
node = Search(tree,item)
parent = parent of node
if node.order < 4:
insert item into node
increment node.order
else:
promote = node.data[1] // middle value
nodeL = new node containing data[0]
nodeR = new node containing data[2]
delete node
if item < promote:
insert(nodeL, item)
else:
insert(nodeR, item)
insert(parent, promote)
while parent.order = 4:
continue promote/split upwards
if parent is root ∧ parent.order = 4:
split root, making new root
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
search(tree,item):
if tree is empty:
return NULL
scan tree.data to find i such:
tree.data[i-1] < item ≤ tree.data[i]
if item=tree.data[i]: // item found
return address of tree.data[i]
else: // keep looking in relevant subtree
return search(tree.child[i],item)
1
2
3
4
5
6
7
8
9
9
2-3-4 Tree Data Structure
typedef struct node {
int order; // 2, 3 or 4
int data[3]; // items in node
struct node *child[4]; // links to subtrees
} node;
1
2
3
4
5
10 . 1
Finding which branch to follow
// n is a pointer to a (struct node)
int i;
for (i = 0; i < n->order-1; i++) {
if (item <= n->data[i]) break;
}
// go to the ith subtree, unless item == n->data[i]
1
2
3
4
5
6
10 . 2
Performance analysis
Searching a 2-3-4 tree still has a complexity of O(log(n)). However, the base on
the logarithm is bigger (i.e. higher coefficient) which results in an effectively
faster search.
For Insertion, we need to break it down a bit more:
Search for leaf node to insert = O(log(n))
If not not full, insert into node = O(1)
If not full, node split, create two new nodes = O(1)
For worst case propagation on node split (root) = O(log(n))
Overall insertion cost = O(log(n))
11
Variations on 2-3-4 trees
2-3-4 trees are actually a special case of a B-tree. A B-tree has a particular
“order” that denotes the number of max children each node can have.
B-tree, order 2 = Binary Search Tree
B-tree, order 3 = ….
B-tree, order 4 = 2-3-4 tree
B-tree, order 5 = ….
etc
12