Structure and Insertion
Multiple level indexes can be very useful in speeding up queries
▪ B and B+ trees are used in commercial DBMSs
B trees have two desirable properties
▪ They are multiple level indexes
▪ Maintain as many levels as are
required for the file being indexed
▪ The number of levels is usually small, three or four is common
▪ Tree nodes are managed so that each node is at least 1⁄2 full
All paths from root to leaf are the same length Therefore balanced
Generalization of a binary search tree Nodes have > 2 children, greater fan-out Each tree node is mapped to a disk block
B+ tree nodes contain data entries: keys and pointers
▪ Up to n search keys and n+1 pointers to nodes
What is the value of n? In interior nodes, pointers point to next level nodes
▪ Label the search keys K0 to Kn-1, and pointers p0 to pn
▪ Pointer p0 points to nodes whose search key values are less than K0
▪ Other pointers, pi, point to nodes with search keys greater than or equal to Ki-1 and less than Ki
▪ An interior node must use at least (n + 1)/2 pointers p0 K0 p1 K1 p2 K2 p3
X < 12 12 ≤ X < 24 24 ≤ X < 29 X ≥ 29
pi points to subtrees with search key values, X 4n + 8(n + 1) ≤ 4,096 – n = 340
n should be as large as possible while still allowing n search keys and n+1 pointers to fit on a disk block
12
24
29
e.g. block size = 4,096, search keys 4 byte integers and pointers 8 bytes
Search keys in leaf nodes of a dense index match records attribute values in the data file
▪ The leaf nodes contain the search keys in order
▪ The left most n pointers point to records in the data file ▪ A leaf node must use at least (n + 1)/2 of these pointers
▪ i.e. contain at least (n + 1)/2 key values
▪ The right most pointer points to the next leaf p0 K0 p1 K1 p2 K2 p3
next leaf in the tree
The value of n may be the same as interior nodes
12
24
29
But may be smaller depending on what leaf nodes represent
record with key 12
record with key 24
record with key 29
In a sparse B+ tree index leaf search keys refer to the search key attribute value of the first record of a block
▪ In the underlying sorted file ▪ Such an index must be primary
Leaf nodes contain data entries
▪ Otherwise the index is as described for a dense index
In a file organized as a B+ tree, the leaf nodes contain
records
▪ So contain less than n values
▪ Since records are larger than data entries
Note that these two organizations are very similar
Initially a B+ tree consists of just one node
▪ The root
▪ Which is also a leaf
When the tree consists of at least two levels
the root is no longer a leaf
▪ But it does not follow the occupancy rules for leaves or interior nodes
▪ And may have as few as one search key and two pointers Special case for both insertion and removal
17
n =3
internal nodes must contain at least 2 pointers
10
27
87
from (n+1)/2 = 2
2
5
10
13
15
17
22
27
35
87
91
underlying DB file containing records
start search for key K with nd = root repeat
if nd is leaf find key[i] == K
if found return record at p[i]
else return record not found if nd is interior
if K < key[0] set nd = p[0]
else if K > key[last] set nd = p[last+1] else find key[i] | key[i] <= K < p[i+1]
set nd = p[i+1]
find 22 find 16
search requires one page read per level plus one to retrieve the record
17
27
87
10
2
5
10
13
15
17
22
27
35
87
91
B+ trees are useful for processing range searches find values between 13 and 27
Search the tree for the leaf that should contain the lower bound of the range
17
Follow the leaf pointers until a key greater than the upper bound is reached
A similar process can be used if there is no lower bound or no upper bound
27
87
10
17
22
2
5
10
13
15
27
35
87
91
Insert the record in the data file
▪ Create a data entry with the RID and the search key
value, K
Insert the data entry in a leaf node of the index
▪ Use the search algorithm to find the leaf node
▪ If there is space in the leaf the process is complete If the target leaf node is full then split it
▪ The first (n + 1) / 2 entries stay in the original node
▪ Create a new node to the right of the original node
with the remaining (n + 1) / 2 data entries
▪ Insert the first search key value from the new leaf and a pointer to that leaf in its parent node
1. full
2. split into two nodes
3. insert value and pointer to new node in parent interior node
Adding a data entry may cause a split
▪ If so, after inserting a new entry there will be n + 1
keys and n + 2 pointers
▪ The first (n + 2) / 2 pointers stay in the original
node
▪ Create a new node with the remaining (n + 2) / 2 pointers to the right of the original node
▪ Leave the first n / 2 keys in the original node and move the last n / 2 keys to the new node
▪ The remaining key 's value falls between the values in the original and new node
Insert the left over key and a pointer to the new node in the parent of the node
Repeat until no split or a new root is created
1. full
2. split into two nodes
2
1
3
4
5
1
2
4
5
3
3. insert value and pointer to new node in parent interior node
insert 2, 21 and 11
n = 3
2
11
21
data file
the values are maintained in order in the index pages
insert 8 n = 3
11
create new root with the first value of the new leaf node
2
181
21
11
21
create a new node with the last 1⁄2 of the values
chain the new node to the original node
insert 64, then 5
insert 23 ...
n =3
11
2
85
8
11
21
64
both leaf nodes are now full ...
... inserting 23 ... n = 3
11
23
2
5
8
11
21
64
23
64
insert 6 n = 3
11
23
2
5
8
11
21
23
64
insert 6 n = 3
161
2113
23
2
5
8
6
8
11
21
23
64
insert 6 n = 3
6
11
23
2
5
8
6
8
11
21
23
64
insert 19 and 9 n = 3
6
11
23
2
5
8
6
8
9
11
1291
21
23
64
the same tree ... n = 3
6
11
23
2
5
6
8
9
11
19
21
23
64
insert 7 n = 3
6
11
23
and now insert 8 in the parent
which will require that it splits
2
5
6
87
9
8
9
11
19
21
23
64
insert 7 – inserting 8 and a pointer in the root node
n =3
11
and make the middle value the new root
6
181
23
23
keep 1⁄2 the pointers and the first n/2 values
move 1⁄2 the pointers and the last n/2 values
2
5
6
7
8
9
11
19
21
23
64
tree after inserting 7 n = 3
11
6
8
23
2
5
6
7
8
9
11
19
21
23
64
insert more values ... n = 3
11
6
8
23
45
60
2
5
6
7
8
9
11
19
21
23
31
39
45
51
60
64
97
and insert 77 n = 3
11
6
8
23
45
60
77
97
2
5
6
7
8
9
11
19
21
23
31
39
45
51
60
64
93
... and insert 77 ...
n = 3
11
now insert 60 in the root
6
8
23
45
60
77
77
97
2
5
6
7
8
9
11
19
21
23
31
39
45
51
60
64
... and insert 77 ...
n = 3
11
60
now insert 60 in the root
6
8
23
45
77
77
97
2
5
6
7
8
9
11
19
21
23
31
39
45
51
60
64