程序代写代做代考 B tree algorithm chain Structure and Insertion

Structure and Insertion

 Multiple level indexes can be very useful in speeding up queries
▪ B and B+ trees are used in commercial DBMSs
 B trees have two desirable properties
▪ They are multiple level indexes
▪ Maintain as many levels as are
required for the file being indexed
▪ The number of levels is usually small, three or four is common
▪ Tree nodes are managed so that each node is at least 1⁄2 full
All paths from root to leaf are the same length Therefore balanced
Generalization of a binary search tree Nodes have > 2 children, greater fan-out Each tree node is mapped to a disk block

 B+ tree nodes contain data entries: keys and pointers
▪ Up to n search keys and n+1 pointers to nodes
What is the value of n?  In interior nodes, pointers point to next level nodes
▪ Label the search keys K0 to Kn-1, and pointers p0 to pn
▪ Pointer p0 points to nodes whose search key values are less than K0
▪ Other pointers, pi, point to nodes with search keys greater than or equal to Ki-1 and less than Ki
▪ An interior node must use at least (n + 1)/2 pointers p0 K0 p1 K1 p2 K2 p3
X < 12 12 ≤ X < 24 24 ≤ X < 29 X ≥ 29 pi points to subtrees with search key values, X 4n + 8(n + 1) ≤ 4,096 – n = 340 n should be as large as possible while still allowing n search keys and n+1 pointers to fit on a disk block 12 24 29 e.g. block size = 4,096, search keys 4 byte integers and pointers 8 bytes  Search keys in leaf nodes of a dense index match records attribute values in the data file ▪ The leaf nodes contain the search keys in order ▪ The left most n pointers point to records in the data file ▪ A leaf node must use at least (n + 1)/2 of these pointers ▪ i.e. contain at least (n + 1)/2 key values ▪ The right most pointer points to the next leaf p0 K0 p1 K1 p2 K2 p3 next leaf in the tree The value of n may be the same as interior nodes 12 24 29 But may be smaller depending on what leaf nodes represent record with key 12 record with key 24 record with key 29  In a sparse B+ tree index leaf search keys refer to the search key attribute value of the first record of a block ▪ In the underlying sorted file ▪ Such an index must be primary Leaf nodes contain data entries ▪ Otherwise the index is as described for a dense index  In a file organized as a B+ tree, the leaf nodes contain records ▪ So contain less than n values ▪ Since records are larger than data entries  Note that these two organizations are very similar  Initially a B+ tree consists of just one node ▪ The root ▪ Which is also a leaf  When the tree consists of at least two levels the root is no longer a leaf ▪ But it does not follow the occupancy rules for leaves or interior nodes ▪ And may have as few as one search key and two pointers  Special case for both insertion and removal 17 n =3 internal nodes must contain at least 2 pointers 10 27 87 from (n+1)/2 = 2 2 5 10 13 15 17 22 27 35 87 91 underlying DB file containing records start search for key K with nd = root repeat if nd is leaf find key[i] == K if found return record at p[i] else return record not found if nd is interior if K < key[0] set nd = p[0] else if K > key[last] set nd = p[last+1] else find key[i] | key[i] <= K < p[i+1] set nd = p[i+1] find 22 find 16 search requires one page read per level plus one to retrieve the record 17 27 87 10 2 5 10 13 15 17 22 27 35 87 91 B+ trees are useful for processing range searches find values between 13 and 27 Search the tree for the leaf that should contain the lower bound of the range 17 Follow the leaf pointers until a key greater than the upper bound is reached A similar process can be used if there is no lower bound or no upper bound 27 87 10 17 22 2 5 10 13 15 27 35 87 91  Insert the record in the data file ▪ Create a data entry with the RID and the search key value, K  Insert the data entry in a leaf node of the index ▪ Use the search algorithm to find the leaf node ▪ If there is space in the leaf the process is complete  If the target leaf node is full then split it ▪ The first (n + 1) / 2 entries stay in the original node ▪ Create a new node to the right of the original node with the remaining (n + 1) / 2 data entries ▪ Insert the first search key value from the new leaf and a pointer to that leaf in its parent node 1. full 2. split into two nodes 3. insert value and pointer to new node in parent interior node  Adding a data entry may cause a split ▪ If so, after inserting a new entry there will be n + 1 keys and n + 2 pointers ▪ The first (n + 2) / 2 pointers stay in the original node ▪ Create a new node with the remaining (n + 2) / 2 pointers to the right of the original node ▪ Leave the first n / 2 keys in the original node and move the last n / 2 keys to the new node ▪ The remaining key 's value falls between the values in the original and new node  Insert the left over key and a pointer to the new node in the parent of the node  Repeat until no split or a new root is created 1. full 2. split into two nodes 2 1 3 4 5 1 2 4 5 3 3. insert value and pointer to new node in parent interior node insert 2, 21 and 11 n = 3 2 11 21 data file the values are maintained in order in the index pages insert 8 n = 3 11 create new root with the first value of the new leaf node 2 181 21 11 21 create a new node with the last 1⁄2 of the values chain the new node to the original node insert 64, then 5 insert 23 ... n =3 11 2 85 8 11 21 64 both leaf nodes are now full ... ... inserting 23 ... n = 3 11 23 2 5 8 11 21 64 23 64 insert 6 n = 3 11 23 2 5 8 11 21 23 64 insert 6 n = 3 161 2113 23 2 5 8 6 8 11 21 23 64 insert 6 n = 3 6 11 23 2 5 8 6 8 11 21 23 64 insert 19 and 9 n = 3 6 11 23 2 5 8 6 8 9 11 1291 21 23 64 the same tree ... n = 3 6 11 23 2 5 6 8 9 11 19 21 23 64 insert 7 n = 3 6 11 23 and now insert 8 in the parent which will require that it splits 2 5 6 87 9 8 9 11 19 21 23 64 insert 7 – inserting 8 and a pointer in the root node n =3 11 and make the middle value the new root 6 181 23 23 keep 1⁄2 the pointers and the first n/2 values move 1⁄2 the pointers and the last n/2 values 2 5 6 7 8 9 11 19 21 23 64 tree after inserting 7 n = 3 11 6 8 23 2 5 6 7 8 9 11 19 21 23 64 insert more values ... n = 3 11 6 8 23 45 60 2 5 6 7 8 9 11 19 21 23 31 39 45 51 60 64 97 and insert 77 n = 3 11 6 8 23 45 60 77 97 2 5 6 7 8 9 11 19 21 23 31 39 45 51 60 64 93 ... and insert 77 ... n = 3 11 now insert 60 in the root 6 8 23 45 60 77 77 97 2 5 6 7 8 9 11 19 21 23 31 39 45 51 60 64 ... and insert 77 ... n = 3 11 60 now insert 60 in the root 6 8 23 45 77 77 97 2 5 6 7 8 9 11 19 21 23 31 39 45 51 60 64