Multiway Trees
Data Structures and Abstractions
Multiway Trees
Lecture 30
*
Multiway Trees
Multiway trees are trees that store more than one piece of data in a node and more than two links.
A 3-way tree stores up to 2 items of data per node.
A 4-way tree stores up to 3 items of data per node,
etc.
Insertion is done in the same way as with a simple BST.
Of course this means that, like a simple BST, it is possible to end up with a linked list.
Reminder: in the animations we just show storage of a single integer, but in reality trees are used to store larger amounts of information using a key.
*
3-way Tree Insert Animation
*
50
root
Insert(50)
Insert(20)
20
Insert(55)
55
Insert(47)
47
Insert(46)
46
Insert(33)
33
Insert(25)
25
Insert(87)
87
Insert(88)
88
Insert(53)
53
Insert(62)
62
Insert(69)
69
END
Insert(28)
28
B Trees
B Trees are balanced multi-way trees in which a node can have up to k subtrees.
Suitable for data storage on disks when collections are too large for internal memory.
As for most data stores, the elements are usually records, which have a key and a value.
The key is used to locate the node where the record is to be stored.
*
B Tree Definition
Formally, a B Tree of order m is a multi-way tree in which:
the root is either a leaf or has at least two subtrees;
each leaf node holds at least m/2 keys;
each non-leaf node holds k-1 keys and k pointers to subtrees where m/2 <= k <= m;
all leaves are on the same level.
m is normally large (50-500) so that all the information stored in one block on disk can fit into one node.
*
Insertion into a B Tree
Insertion of a key (and its record) is always done at a leaf node.
This may cause changes higher up the tree.
The method is:
Locate: Do a search to locate the leaf in which the new record should be inserted.
Insert:
If the leaf has room, insert the record, in order of key.
If the node if full, ‘split’ it and move the record with the median key upwards.
Repeat (b) until either a non-full node is found, or root is reached.
If the root is full, split it and create a new root node containing one key.
*
5-way B Tree Insert Animation
*
Insert(50)
50
root
Insert(20)
5-way B Tree Insert Animation
*
Insert(50)
20
root
Insert(20)
50
Insert(55)
Insert(47)
55
5-way B Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
55
Insert(55)
Insert(47)
Insert(46)
The ’46’ should go here
As it cannot fit, the node is ‘split’, and the middle value moved up
5-way B Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
55
Insert(55)
Insert(47)
Insert(46)
As it cannot fit, the node is ‘split’, and the middle value moved up
46
Insert(33)
5-way B Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
55
Insert(55)
Insert(47)
Insert(46)
33
46
Insert(33)
Insert(25)
5-way B Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
55
Insert(55)
Insert(47)
Insert(46)
25
33
46
87
88
Insert(33)
Insert(25)
Insert(87)
Insert(88)
Insert(53)
The ’53’ should go here
As it cannot fit, the node is ‘split’, and the middle value moved up
5-way B Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
53
Insert(55)
Insert(47)
Insert(46)
25
33
46
87
88
55
Insert(33)
Insert(25)
Insert(87)
Insert(88)
Insert(53)
Insert(62)
5-way B Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
53
Insert(55)
Insert(47)
Insert(46)
25
33
46
62
87
55
Insert(33)
Insert(25)
Insert(87)
Insert(88)
Insert(53)
88
Insert(62)
Insert(69)
5-way B Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
53
Insert(55)
Insert(47)
Insert(46)
25
33
46
62
69
55
Insert(33)
Insert(25)
Insert(87)
Insert(88)
Insert(53)
As it cannot fit, the node is ‘split’, and the middle value moved up
87
88
Insert(62)
Insert(69)
Insert(28)
The ’28’ should go here
5-way B Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
28
53
Insert(55)
Insert(47)
Insert(46)
25
33
46
62
69
47
55
Insert(33)
Insert(25)
Insert(87)
Insert(88)
Insert(53)
87
88
Insert(62)
Insert(69)
Insert(28)
END
Multiway Tree vs B Tree
The B Tree is clearly more efficient in terms of space.
The B Tree is balanced and therefore has a lower search time.
The B Tree, however, is more complicated to code.
If search time is important (as with a database or list of objects in the scene of a game) the use of B Trees is essential.
*
B+ Trees
Trees are good for searching, but have poor sequential access.
Some databases require both types of processing, for these one uses a B+ tree.
A B+ tree is a B Tree where only the keys are stored in the tree, all the data actually resides in the leaves.
And the leaves are all connected with a list.
This kind of tree is particularly useful for databases that reside entirely on disk.
*
5-way B+ Tree Insert Animation
*
Insert(50)
50
root
Insert(20)
first
Linked list allows sequential access
Tree allows fast searching
5-way B+ Tree Insert Animation
*
Insert(50)
20
root
Insert(20)
50
Insert(55)
Insert(47)
55
first
5-way B+ Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
55
Insert(55)
Insert(47)
Insert(46)
The ’46’ should go here
As it cannot fit, the node is ‘split’,
the middle value goes on the left and the key value (only) of the middle value is moved up
first
5-way B+ Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
55
Insert(55)
Insert(47)
Insert(46)
46
47
Insert(33)
As it cannot fit, the node is ‘split’,
the middle value goes on the left and the key value (only) of the middle value is moved up
first
Key only
Whole record
5-way B+ Tree Insert Animation
*
Insert(50)
20
root
50
Insert(20)
47
55
Insert(55)
Insert(47)
Insert(46)
33
46
47
Insert(33)
first
Insert(25)
The ’25’ should go here
As it cannot fit, the node is ‘split’,
the middle value goes on the left and the key value (only) of the middle value is moved up
5-way B+ Tree Insert Animation
*
Insert(50)
root
50
Insert(20)
33
55
Insert(55)
Insert(47)
Insert(46)
47
Insert(33)
first
Insert(25)
As it cannot fit, the node is ‘split’,
the middle value goes on the left and the key value (only) of the middle value is moved up
20
25
46
47
33
Keys only
Whole records
END
Comparison of B and B+ Trees
They are both balanced so that operations such as Insert and Delete can be done in O(h) time where h is the height of the tree.
B+ trees also allow for fast sequential processing.
B+ trees store the key only in RAM, not the whole record, therefore they use less RAM.
Both can be tuned to have node sizes that allow fast disk reads.
As B+ trees use less RAM, they can have larger nodes which improves the speed of operations based on the height.
Both B Trees and B+ Trees have one major disadvantage in common: since any node can be up to half empty, they waste space.
*
Handling the Wasted Space Problem
A way to get around this is to really treat them as ADSs, i.e. they are conceptually B Trees but are actually stored in some other way.
For example a vector, linked list, dynamic array etc.
The operations (Insert, Delete, Search etc) in the interface do not change, but the internal representation and code do change.
However, it is worth noting that although there are many different ways of solving the space problem, there will always be a space/time/simplicity trade off.
*
Further exploration
Reference book, Introduction to Algorithms. For further study, there are many tree and tree algorithms described in the reference book. For this unit, the lecture material is sufficient.
An earlier textbook used in this unit (some years ago) is a better reference to some of the more interesting Tree (and graph) data structures. The book is available in the library. It is “Algorithms, Data Structures, and Problem solving using C++” by Mark Weiss.