Chapter 10:
Tree-Structured Indexes (with a Focus on B+ Trees)
Ed Knorr, Slides from Fall 2020. Based on:
• Sections 10.1, 10.3-10.8.4 of Ramakrishnan & Gehrke (textbook)—i.e., basically all of Chapter 10, but skip the parts about ISAM indexes.
• Sections 8.1, 8.2, and 8.2.1 – Read or re-read this part of the textbook if you want more background information about indexing. Highly recommended! The rest of Chapter 8 may be useful to you when we reach other areas of the course.
• Some of this chapter may be a review from CPSC 221 (although they sometimes teach AVL trees, instead). Regardless, B+ trees are very important external data structures used by RDBMSs!
1
Some Learning Goals
List the 3 ways or “alternatives” of representing data entries k* having key k, in a general index.
Justify the use of indexes in database systems.
Explain how a tree-structured index supports both range and point queries.
Build a B+ tree for a given set of data. Show how to insert and delete keys in a B+ tree.
Analyze the complexity of: (a) searching, (b) inserting into, and (c) deleting from, a B+ tree.
Explain why a B+ tree needs sibling pointers for its leaf pages.
Explain why B+ trees tend to be very shallow, even after storing many millions of data entries. Equivalently, provide arguments for why B+ trees can store large numbers of data entries in their pages.
Explain the pros and cons of allowing prefix key compression in an index.
Given a set of data, build a B+ tree using bulk loading.
2
Some Learning Goals (cont.)
Provide several advantages of bulk loading compared to using a large number of SQL INSERT statements.
Estimate the number of I/Os required to traverse the records of a large file (whose records have key k) using B+ trees for: (a) the clustered case, and (b) the unclustered case. Justify any assumptions that are needed.
Using a diagram, show the principal difference between a clustered and an unclustered index.
Provide examples of when clustering doesn’t help performance, and may actually hinder the performance of certain kinds of queries.
3
Motivation
On the one hand, we want to keep as much data as possible about an entity (e.g., shipping record, telephone transaction, customer record, student record), but on the other hand, we want good performance.
We can have both, via indexing!
• e.g., Searching 100 million records in real time
Good performance is a major DBA goal.
Note that SQL queries perform “exact” matches in indexes; so, spelling and typing matter (unlike Google’s forgiving search engine).
• You can do point and range queries in SQL.
Two major categories of indexes:
Tree-structured indexes
Hash index structures
4
Index Entries and Data Entries
Let us denote an index entry as an internal-node pair ⟨search key value, pointer to a child page⟩.
Let us denote a data entry as k*, a leaf-level entry with search key value k.
(From Chapter 8) For any index, there are 3 alternatives for storing data entries k*, whose search key value is k (illustrated in the sketch after this list):
Alt. 1: The whole data record with search key k
Alt. 2: ⟨k, rid of a data record with search key value k⟩
Alt. 3: ⟨k, list of rids of data records with search key value k⟩
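To make the three alternatives concrete, here is a minimal Python sketch (not from the textbook; the class names and the (page, slot) rid format are illustrative assumptions):

```python
# A minimal illustration of the three ways a data entry k* can be
# represented for search key k.  Names and rid format are made up.
from dataclasses import dataclass
from typing import List, Tuple

Rid = Tuple[int, int]          # assumed rid format: (page id, slot number)

@dataclass
class Alt1Entry:               # Alternative 1: the data entry IS the record
    key: int
    record: dict               # all remaining fields of the row

@dataclass
class Alt2Entry:               # Alternative 2: <k, rid of a matching record>
    key: int
    rid: Rid

@dataclass
class Alt3Entry:               # Alternative 3: <k, list of rids of all matches>
    key: int
    rids: List[Rid]

# Three representations of the data entries for search key 42:
print(Alt1Entry(42, {"name": "Ann", "gpa": 3.7}))
print(Alt2Entry(42, (17, 3)))
print(Alt3Entry(42, [(17, 3), (90, 1)]))
```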
5
Introduction
Tree-structured indexing techniques support both range searches and equality searches.
B+ tree: A dynamic, balanced tree structure
Adjusts gracefully for insertions and deletions
Skew is generally not an issue.
6
Range Searches in a Sorted File
“Find all students with GPA > 3.0”
Consider a flat file (sometimes called a heap file, but “heap” is
not the same as a CPSC 221 data structures heap).
If the data is sorted on GPA in this file, then: (1) we can use binary search to find one student with that GPA, and then (2) we can sequentially scan to find the others (if there are duplicates).
The “cost” of binary search can be high on a large file.
• Do you know where the start and end of the range of pages is? Keep in mind that they may be non-contiguous.
• Do you know where the middle of a range of pages is?
• Is there an even distribution of the number of records on each page?
• How is data skew handled?
• How many seeks do we do?
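As a rough illustration of these questions, here is a small Python sketch of binary search at page granularity over a toy sorted file; the page contents and read_page() are made-up stand-ins for real buffer-manager I/O:

```python
import math

# Toy "sorted file": each inner list is one page of GPAs in sorted order.
PAGES = [[2.1, 2.4], [2.8, 3.0], [3.1, 3.3], [3.6, 3.9]]

def read_page(pageno):
    return PAGES[pageno]                     # one simulated page I/O

def first_page_with_gpa_above(threshold):
    """Binary search for the first page that could hold GPA > threshold."""
    lo, hi, ios, answer = 0, len(PAGES) - 1, 0, None
    while lo <= hi:                          # ~log2(N) page reads, likely with seeks
        mid = (lo + hi) // 2
        page = read_page(mid)
        ios += 1
        if page[-1] > threshold:             # largest value on the page qualifies
            answer, hi = mid, mid - 1
        else:
            lo = mid + 1
    return answer, ios

start, ios = first_page_with_gpa_above(3.0)
print("start page:", start, "binary-search I/Os:", ios,
      "expected ~", math.ceil(math.log2(len(PAGES))))
# The matching records are then read with a sequential scan from 'start' onward.
```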
7
Range Searches in a Sorted File (cont.)
“Find all students with GPA > 3.0”
SELECT STUDENT_NAME
FROM GRADE G, STUDENT S
WHERE G.STUDENT_ID = S.STUDENT_ID AND G.GPA > 3.0;
Simple idea: Create an ‘index’ file:
(Figure: an index file with entries k1, k2, …, kM, each pointing into a data file of pages 1, 2, 3, …, N.)
Now, we can do a faster search using a smaller (index) file!
8
B+ Tree:
Most Widely-Used Index in RDBMSs
(Unique) Search, Insert, or Delete is O(log_F N), where F = fanout and N = # of leaf pages (or roughly the total # of leaf + internal pages). N ≈ # recs / (avg. # recs per page)
We want to keep the tree height-balanced. Why?
Minimum 50% occupancy (except for root). Each node contains d ≤ m ≤ 2d entries. d is called the order of the tree, and m is the # of keys in a particular node.
Supports equality and range searches efficiently.
(Figure: index entries in the non-leaf pages; data entries in the leaf pages, the “sequence set”.)
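As a worked example of this cost formula (the numbers below are assumptions chosen to be consistent with the typical values on the next slide, not figures taken from the slides themselves):

\[
N \approx \frac{\#\text{recs}}{\text{avg. \#recs per page}}, \qquad
\text{height} \approx \lceil \log_F N \rceil
\]
\[
\text{e.g., } N \approx \frac{10{,}000{,}000}{100} = 100{,}000 \text{ leaf pages}, \qquad
\lceil \log_{133} 100{,}000 \rceil = 3 \text{ index levels above the leaves.}
\]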
9
Example: B+ Tree
Search always begins at the root, and key comparisons direct it to a leaf.
Search for: 5*, 15*, all data entries ≥ 24*, etc.
Tree: root [13 17 24 30]; leaves [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
Based on the search for 15*, we know it is not in the tree!
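Here is a minimal Python sketch of this search over the example tree above; representing nodes as in-memory lists and dicts is purely an illustrative assumption (in a real B+ tree, each node is a disk page):

```python
# Leaves and root of the example tree; leaf values stand for data entries k*.
leaves = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
root = {"keys": [13, 17, 24, 30], "children": leaves}

def search(node, k):
    """Descend from an index node to the leaf that could contain k."""
    if isinstance(node, list):                 # reached a leaf page
        return node
    keys, children = node["keys"], node["children"]
    i = 0
    while i < len(keys) and k >= keys[i]:      # pick the subtree for k
        i += 1
    return search(children[i], k)

print(15 in search(root, 15))   # False: 15* is not in the tree
print(5 in search(root, 5))     # True: 5* is found

# Range query "all data entries >= 24": find the starting leaf, then follow
# the leaf level (here, simply the next lists; a real tree uses sibling pointers).
start = leaves.index(search(root, 24))
print([k for leaf in leaves[start:] for k in leaf if k >= 24])
```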
10
B+ Tree Statistics, in Practice
Some stats:
Typical order = 100
Typical fill-factor = 67%
Average fanout = 133
Typical maximum capacities:
Height 4: 133^4 = 312,900,721 leaf pages (not records)
Height 3: 133^3 = 2,352,637 leaf pages
For a busy index, we can often keep the top levels in the buffer pool. Why?
Level 0 = 1 page = 4 KB
Level 1 = 133 pages ≈ 0.5 MB
Level 2 = 17,689 pages ≈ 69 MB
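A few lines of Python to check these numbers under the same assumptions (average fanout 133, 4 KB pages):

```python
F, page_kb = 133, 4
for height in (3, 4):
    print(f"height {height}: up to {F**height:,} leaf pages")
for level in range(3):
    pages = F**level
    print(f"level {level}: {pages:,} page(s) = {pages * page_kb:,} KB")
```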
11
DB2 Metadata for B+ Tree Indexes
For a clustering index: NEAROFFPOSF and FAROFFPOSF indicate how many rows are away from their ideal data page (recall our study of relocated rows, in the previous unit).
LEAFDIST indicates 100x the average number of physical pages between successive leaf pages when doing a sequential index scan.
CLUSTERRATIOF: 100% means the index and data ordering match perfectly. Less than 100% means that some of the data pages do not follow the index order.
As CLUSTERRATIOF drops, think about reorganizing the data.
For more information, and many other kinds of index metadata, see the SYSIBM.SYSINDEX* DB2 catalog tables, such as SYSIBM.SYSINDEXES and SYSIBM.SYSINDEXPART.
12
Indexes Used by Commercial RDBMSs (Other indexes may exist for extended features.)
DBMS                                  Primary Index Structures        Secondary Index Structures (dense)
IBM DB2                               B+ Tree (dense)                 B+ Tree
Oracle                                B+ Tree (dense), Hash (dense)   B+ Tree, Bitmap
SQL Server                            B+ Tree (sparse)                B+ Tree
Sybase (purchased by SAP in 2010)     B+ Tree (sparse)                B+ Tree
13
Inserting a Data Entry into a B+ Tree
Find correct leaf L
Put data entry onto L
If L has enough space, done!
Else, must split L (into L and a new node L2):
• Redistribute entries evenly, copy up the middle key
• Insert an index entry pointing to L2 into the parent of L
This can happen recursively:
• To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
Splits “grow” the tree; only a root split increases the height.
Tree growth: the tree gets wider, or one level taller at the top. (See the sketch below.)
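Here is a minimal Python sketch of the two kinds of splits, following the running example with order d = 2; treating a node as a sorted list of keys is an illustrative assumption, not how a real page is laid out:

```python
def split_leaf(keys):
    """Leaf split: create a new right sibling; the middle key is COPIED UP
    and continues to appear in the right leaf."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]              # right[0] is the copied-up key

def split_index(keys):
    """Index-node split: the middle key is PUSHED UP and appears only once,
    in the parent; it is removed from both halves."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

# Inserting 8* overfills the leaf [2, 3, 5, 7, 8] of the running example:
print(split_leaf([2, 3, 5, 7, 8]))       # ([2, 3], [5, 7, 8], 5)  -> copy up 5
# The root [13, 17, 24, 30] receives 5, overflows, and splits as an index node:
print(split_index([5, 13, 17, 24, 30]))  # ([5, 13], 17, [24, 30]) -> push up 17
```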
14
Inserting 8* into the Example B+ Tree from Slide 10
Observe how the minimum occupancy is guaranteed in both leaf and index page splits.
Note the difference between copy-up and move-up. Be sure you understand the reasons for this.
Leaf split: [2* 3*] [5* 7* 8*]. Entry to be inserted in the parent node: 5. (Note that 5 is copied up and continues to appear in the leaf.)
Index split: [5 13] [24 30]. Entry to be inserted in the parent node: 17. (Note that 17 is moved up and only appears once in the index. Contrast this with a leaf split.)
15
Example B+ Tree After Inserting 8*
Tree: root [17]; index nodes [5 13] [24 30]; leaves [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
Note that the root was split, leading to an increase in height.
In this example, we could avoid splitting by re-distributing entries; however, this is usually not done in practice.
16
Deleting a Data Entry from a B+ Tree
Start at root, find leaf L where entry belongs
Remove the entry
If L is at least half-full, done! If L has only d-1 entries:
• Try to re-distribute, borrowing from a sibling (an adjacent node with the same parent as L)
• If re-distribution fails, merge L and the sibling
If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L
A merge could propagate to the root, decreasing the height. (See the sketch below.)
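Here is a minimal Python sketch of the re-distribute-versus-merge decision for an under-full leaf, again with order d = 2 and nodes as plain lists (an illustrative assumption); updating the parent's separating key is only described in the comments:

```python
d = 2   # order of the tree: non-root nodes must hold at least d entries

def handle_underflow(leaf, sibling):
    """Return the new (leaf, sibling) contents and the action taken."""
    if len(sibling) > d:
        # Re-distribute: borrow one entry; the parent's separating key must
        # then be updated to the new boundary (the "copy up" on slide 18).
        if sibling[0] > leaf[-1]:              # right sibling: take its smallest
            leaf, sibling = leaf + sibling[:1], sibling[1:]
        else:                                  # left sibling: take its largest
            leaf, sibling = sibling[-1:] + leaf, sibling[:-1]
        return leaf, sibling, "redistribute"
    # Otherwise merge: the sibling has exactly d entries, so both fit in one
    # page; the separating entry is deleted from the parent.
    return sorted(leaf + sibling), None, "merge"

# Deleting 20* (slide 18) leaves the leaf [22] under-full; its right sibling
# [24, 27, 29] can spare an entry, so we re-distribute:
print(handle_underflow([22], [24, 27, 29]))   # ([22, 24], [27, 29], 'redistribute')
# Deleting 24* (slide 19) leaves [22] under-full again, but the sibling
# [27, 29] has only d entries, so the two leaves merge:
print(handle_underflow([22], [27, 29]))       # ([22, 27, 29], None, 'merge')
```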
17
Tree after Inserting 8* (see Slide 16) and then Deleting 19* and 20* …
Deleting 19* is easy.
Deleting 20* is done with re-distribution. Notice how the middle key is copied up.
Tree: root [17]; index nodes [5 13] [27 30]; leaves [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
18
… and then Deleting 24*
The affected leaf underflows and its sibling cannot spare an entry, so we must merge the leaves [22*] and [27* 29*]; their parent is then left with only the index entry 30, so the index nodes must merge as well.
Observe the ‘toss’ of the index entry (on the right), and the ‘pull down’ of the index entry 17 from the root (below).
Tree: root [5 13 17 30]; leaves [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
19
Example of Non-Leaf Re-Distribution
Tree is shown below during the deletion of 24*. Where was “24” prior to the deletion process?
In contrast to the previous example, we can re-distribute an entry from the left child of the root to the right child.
Tree: root [22]; left child [5 13 17 20]; right child [30]; leaves [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
20
After Re-Distribution
Intuitively, entries are re-distributed by ‘pushing through’ the splitting entry in the parent node.
It suffices to re-distribute the index entry with key 20; we’ve re-distributed 17 as well, for illustration.
Tree: root [17]; left child [5 13]; right child [20 22 30]; leaves [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
21
An Optimization:
Prefix Key Compression
Increase fan-out for greater efficiency.
Key values in index entries only ‘direct traffic’; but often, we can compress them.
e.g., If we have adjacent index entries with search key values Danielle Miller, David Smith and Deepak Murthy, we can abbreviate David Smith to Dav
• Is this correct? What if there is a data entry Davey Jones? (Can only compress David Smith to Davi)
• In general, while compressing, must leave each index entry greater than every key value (in any subtree) to its left
Insert/delete must be suitably modified. (A small sketch of choosing a compressed separator follows.)
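Here is a minimal Python sketch of choosing such a compressed separator; it assumes, per the rule above, that the only constraint checked is the largest key value in the left subtree:

```python
def shortest_separator(largest_on_left, right_key):
    """Shortest prefix of right_key that is still strictly greater than
    every key value to its left (represented by the largest such value)."""
    for i in range(1, len(right_key) + 1):
        prefix = right_key[:i]
        if prefix > largest_on_left:
            return prefix
    return right_key

print(shortest_separator("Danielle Miller", "David Smith"))  # 'Dav'
print(shortest_separator("Davey Jones", "David Smith"))      # 'Davi'
```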
22
Another Optimization: Bulk Loading
If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow. Why?
Bulk Loading can be done much more efficiently
Initialization: Sort all data entries, and insert a pointer to the first (leaf) page into a new (root) page.
(Figure: a new root page pointing to the first leaf page. Sorted pages of data entries, not yet in the B+ tree: [3* 4*] [6* 9*] [10* 11*] [12* 13*] [20* 22*] [23* 31*] [35* 36*] [38* 41*] [44*])
23
Bulk Loading Example
Index page so far: root [6]
Sorted data entry pages: [3* 4*] [6* 9*] [10* 11*] [12* 13*] [20* 22*] [23* 31*] [35* 36*] [38* 41*] [44*]
24
Bulk Loading Example (cont.)
Index page so far: root [6 10]
Sorted data entry pages: [3* 4*] [6* 9*] [10* 11*] [12* 13*] [20* 22*] [23* 31*] [35* 36*] [38* 41*] [44*]
25
Bulk Loading Example (cont.)
Index pages so far: root [10]; children [6] [12]
Sorted data entry pages: [3* 4*] [6* 9*] [10* 11*] [12* 13*] [20* 22*] [23* 31*] [35* 36*] [38* 41*] [44*]
26
Bulk Loading Example (cont.)
Index pages so far: root [10]; children [6] [12 20]
Sorted data entry pages: [3* 4*] [6* 9*] [10* 11*] [12* 13*] [20* 22*] [23* 31*] [35* 36*] [38* 41*] [44*]
27
Bulk Loading Example (cont.)
Index pages so far: root [10 20]; children [6] [12] [23]
Sorted data entry pages: [3* 4*] [6* 9*] [10* 11*] [12* 13*] [20* 22*] [23* 31*] [35* 36*] [38* 41*] [44*]
28
Bulk Loading (cont.)
Index entries for leaf pages are always entered into the right-most index page just above the leaf level. When a node fills up, it splits. (The split may go up the right-most path to the root.)
This is much faster than repeated inserts, especially when one considers locking!
(Figure: two snapshots. First: root [10 20]; children [6] [12] [23 35]; the remaining sorted data entry pages are not yet in the B+ tree. Second: the next index entry fills the right-most index node, the split propagates up the right-most path, and 35 is pushed up, giving index nodes [6] [12] [23] [38] over the leaf pages [3* 4*] [6* 9*] [10* 11*] [12* 13*] [20* 22*] [23* 31*] [35* 36*] [38* 41*] [44*].)
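Here is a minimal Python sketch of the bottom-up idea. It builds each index level in full over the level below it, which is a simplification of the slides' fill-the-right-most-path algorithm; the tiny page capacities are illustrative assumptions:

```python
LEAF_CAP, FANOUT = 2, 3   # toy capacities: entries per leaf, children per index node

def smallest(node):
    """Smallest key reachable from a node (list = leaf, dict = index node)."""
    return node[0] if isinstance(node, list) else smallest(node["children"][0])

def bulk_load(sorted_keys):
    # 1. Pack the sorted data entries into leaf pages, left to right.
    leaves = [sorted_keys[i:i + LEAF_CAP]
              for i in range(0, len(sorted_keys), LEAF_CAP)]
    level = leaves
    # 2. Build index levels over the previous level until one root remains.
    #    The separator for each child (after the first) is its smallest key.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), FANOUT):
            children = level[i:i + FANOUT]
            parents.append({"keys": [smallest(c) for c in children[1:]],
                            "children": children})
        level = parents
    return level[0], leaves

entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root, leaves = bulk_load(entries)
print(leaves)           # [[3, 4], [6, 9], [10, 11], ..., [44]]
print(root["keys"])     # separator keys in the root of this toy tree
```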
29
Summary of Bulk Loading
Non-bulk approach (slow): multiple SQL INSERT statements
Does not give sequential storage of leaves, unless we sort the keys before issuing the INSERT statements, and pre-allocate space for the leaf pages
Lots of locking overhead
• May wish to use the LOCK TABLE statement in SQL
• Indexes require locking, too!
• Locking part of an index locks the corresponding data rows (and that may be a lot of rows) because rows cannot be inserted, updated, or deleted during this time.
Lots of log records can be generated when loading a table via INSERT statements!
30
Summary of Bulk Loading (cont.)
Bulk Loading
Has advantages for concurrency control and logging
Fewer I/Os during building
Leaves will be stored sequentially (and will be linked, of course)
Can control “fill factor” on pages, and even leave blank pages periodically
• In DB2, PCTFREE and FREEPAGE are options on the CREATE TABLESPACE statement and the CREATE INDEX statement.
31
Summary of Bulk Loading (cont.)
Bulk Loading Steps during a Reorg:
Back up the old table (i.e., do a full “image copy”).
Unload (export) the records from the table.
Sort keys/records by clustering order (for the clustering index and the initial data load).
Delete the old table and its indexes.
Allocate the new table and the new index files, perhaps with
more space and different PCTFREE and FREEPAGE parameters.
Create the indexes (via CREATE INDEX) before loading.
Turn off logging for the new table.
Bulk load (populate) the table, and bulk-load/build the indexes.
Back up the new table (e.g., DB2 will not let users use the table without running a full image copy beforehand … why not?)
Run RUNSTATS and rebind the existing query plans.
Turn logging back on.
Make the table available to users.
32
Summary of ’Order’
Different authors may have slightly different definitions of “order”.
A useful rule/phrase to remember is: “non-root nodes are at least half full”.
Variable-sized records and search keys mean different nodes may contain different numbers of entries.
** To simplify some of our calculations:
We’ll assume that the maximum number of entries in an internal page and a leaf page are the same (for Alternative 2 indexes).
We’ll accept either an odd or an even number of maximum entries on a leaf/internal B+ tree page for our calculations (unless told otherwise).
We’ll ignore “overhead” bytes on each page (e.g., sibling pointers and housekeeping bytes, which differ for different kinds of pages).
Think back to Chapter 9c on page and record layouts, or to the prefix key compression slide, or even to Alt. 2 vs. Alt. 3, to gain an appreciation for these simplifications to our calculations (otherwise, we’d be overwhelmed). A small worked example follows.
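As a quick worked example under these simplifications (the sizes below are assumptions, not values given in the slides): with 4 KB pages, 8-byte search keys, and 8-byte pointers/rids,

```python
page_bytes, key_bytes, ptr_bytes = 4096, 8, 8     # assumed sizes
entries_per_page = page_bytes // (key_bytes + ptr_bytes)
print(entries_per_page)        # 256 entries per page (Alt. 2 data entry or index entry)
print(entries_per_page // 2)   # at least 128 in a non-root node ("at least half full")
```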
33
Summary of ’Order’ (cont.)
Some indexes may have a lot of duplicates, and can be handled via:
Overflow pages
Alternative 3 indexes
Regular Alt. 2 situation, with a tweak to the B+ tree algorithm so that records in the left child are ≤ instead of just < (and records in the right child remain as ≥).
Concatenating the key and the rid, and then treating this combination as a unique key (Alternative 2); see the sketch below.
Even with fixed-length fields, multiple records with the same search key value (duplicates) can lead to variable- sized data entries in Alt. 3.
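Here is a tiny Python sketch of the key-plus-rid idea; the (page, slot) rid format is an illustrative assumption:

```python
# Treat (key, rid) as the search key, so duplicates of k become distinct,
# totally ordered entries.
entries = [(30, (5, 2)), (30, (7, 1)), (30, (5, 1)), (20, (9, 4))]
entries.sort()                  # orders by key, then by rid
print(entries)
# All rids for key 30 are now contiguous, so "find all records with k = 30"
# becomes a small range scan over the composite key (30, *).
print([rid for (k, rid) in entries if k == 30])
```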
34
Summary of the Rest of the Chapter
Tree-structured indexes are very efficient for range searches and for equality searches
We didn’t cover ISAM indexes; however, they are similar in spirit to B+ tree indexes for searching.
There is a nice coverage of ISAM indexes on pages 341-344 of the textbook. You won’t be tested on them, but it’ll strengthen your understanding of indexing.
ISAM: Only leaf pages can be modified; the internal nodes do not change (they simply direct traffic).
ISAM indexes are mostly static, and are well-suited to dictionaries or data that doesn’t change frequently.
Overflow pages are used when the target leaf page is full.
• Leads to potentially long overflow chains
• Overflow chains can degrade performance
35
Summary (cont.)
A B+ tree is a dynamic index structure.
Inserts/deletes leave tree height-balanced
High fanout (F) means depth is rarely more than 3 or 4
O(log_F N) cost for:
• Unique searches
• Insertions
• Deletions
Almost always better than maintaining a sorted file
In practice, nodes have about 67% occupancy, on average.
Usually preferred over ISAM; little reason to use ISAM today
Adjusts to growth gracefully
Key compression increases fanout, reduces height
36
Summary (cont.)
Bulk loading can be much faster than repeated INSERTs for a table, especially a large one.
B+ trees are the most widely-used index in DBMSs because of their versatility.
• One of the most optimized components of a DBMS!
Indexing principles are very useful to know... even if you don’t plan to work with databases in the future!
37