PowerPoint Presentation
Tree Indexes
R & G – Chapter 10
1
Reminder on Heap Files
Two access APIs:
fetch by recordId (pageId, slotId)
scan (starting from some page)
Page 1
Page 2
Page 3
Page 4
Page 5
Page 6
Header Page
DIRECTORY
…
2
Wouldn’t it be nice…
…if we could look things up by value?
Toward a Declarative access API
But … efficiency?
“If you don’t find it in the index, look very carefully through the entire catalog. ”
—Sears, Roebuck, and Co., Consumers’ Guide, 1897
If you don’t find it in the index, look very carefully through the entire catalog.
–Sears, Roebuck, and Co., Consumers’ Guide, 1897
3
We’ve seen this before
Data structures … in RAM:
Search trees (Binary, AVL, Red-Black, …)
Hash tables
Needed: disk-based data structures
“paginated”: made up of disk pages!
Index
An index is data structure that enables fast lookup and modification of
data entries by search key
Lookup: may support many different operations
Equality, 1-d range, 2-d region, …
Search Key: any subset of columns in the relation
Do not need to be unique
—e.g. (firstname) or (firstname, lastname)
5
Index Part 2
An index is data structure that enables fast lookup and modification of
data entries by search key
Data Entries: items stored in the index
Assume for today: a pair (k, recordId) …
Pointers to records in Heap Files!
Easy to generalize later
Modification: want to support fast insert and delete
Many Types of indexes exist: B+-Tree, Hash, R-Tree, GiST, …
Simple Idea?
Step 1: Sort heap file & leave some space
Pages physically stored in logical order (sequential access)
Do we need “next” pointers to link pages?
No. Pages are physically sorted in logical order
Step 2: Build the index data structure over this…
Why not just use binary search in this heap file?
Fan-out of 2 deep tree lots of I/Os
Examine entire records just to read key during search
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
3, 4, 5
1, 2, 7
8, 6, 9
10, _, _
Input
Heap File
8
Build a high fan-out search tree
Start simple: Sorted (key, page id) file
No record data
Binary search in the key file. Better!
Forgot: Need to break across pages!
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
1
3
5
7
9
9
Build a high fan-out search tree
Start simple: Sorted (key, page id) file
No record data
Binary search in the key file. Better!
Forgot: Need to break across pages!
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
1
3
5
7
9
10
Build a high fan-out search tree Part 2
Start simple: Sorted (key, page id) file
No record data
Binary search in the key file. Better!
Complexity?
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
1
3
5
7
9
Build a high fan-out search tree Part 3
Start simple: Sorted (key, page id) file
No record data
Binary search in the key file. Better!
Complexity: Still binary search, just a constant factor smaller input
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
1
3
5
7
9
Build a high fan-out search tree Part 4
Recursively “index” key file
Key Invariant:
Node […, (KL, PL), (KR, PR), … ] All tuples in range KL <= K < KR are in tree PL
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
1
3
5
7
9
1
7
Search a high fan-out search tree
Searching for 5?
Binary Search each node (page) starting at root
Follow pointers to next level of search tree
Complexity? O(logF(#Pages))
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
1
3
5
7
9
1
7
14
Left Key Optimization?
Optimization
Do we need the left most key?
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
1
3
5
7
9
1
7
1
7
1
Build a high fan-out search tree
Disk Layout? All in a single file, Data Pages first.
1, 2, _
3, 4, _
9,10,_
Data
Pages
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
3
5
9
7
Index
Pages
16
Status Check
Some design goals:
Fast sequential scan?
High Fan-out?
Support insertion? …
1, 2, _
3, 4, _
9,10,_
Data
Pages
Indexed File
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
3
5
9
7
Index
Pages
ISAM
Indexed Sequential
Access Method
(Early IBM Indexing Technology)
17
Insert 11, Before
3, 4, _
9,10,_
Data
Pages
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, _
3
5
9
7
1, 2, _
Index
Pages
Insert 11, After
3, 4, _
9,10,11
Data
Pages
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, 11
3
5
9
7
1, 2, _
Index
Pages
Find location
Place in data page
Re-sort page …
Insert 12?
Find location
Place in data page
Add overflow page if necessary …
Overflow
Pages
12, _, _
1, 2, _
3, 4, _
5, 6, _
7, 8, _
9, 10, 11
3
5
9
7
12, _, _
1, 2, _
3, 4, _
9,10,11
Data
Pages
Index
Pages
Recap: ISAM
Data entries in sorted heap file
High fan-out static tree index
Fast search + good locality
Assuming nothing changes
Insert into overflow pages
A Note of Caution
ISAM is an old-fashioned idea
Introduced by IBM in 1960s
B+ trees are usually better, as we’ll see
Though not always ( we’ll come back to this)
But, it’s a good place to start
Simpler than B+ tree, many of the same ideas
Upshot
Don’t brag about ISAM on your resume
Do understand ISAM, and tradeoffs with B+ trees
B+-Tree
Enter the B+ Tree
Similar to ISAM
Same interior node structure
Same search routine as before
Dynamic Tree Index
Always Balanced
Support efficient insertion & deletion
Grows at root not leaves!
“+”? B-tree that stores data entries in leaves only
Example of a B+ Tree
Occupancy Invariant
Each interior node is at least partially full:
d <= #entries <= 2d
d: order of the tree (max fan-out = 2d + 1)
Data pages at bottom need not be stored in logical order
Next and prev pointers
17
5
13
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
22*
24*
27*
29*
33*
34*
38*
39*
Root Node
Page 1
Page 4
Page 6
Page 2
Page 3
Page 5
Page 7
Page 8
Page 9
26
Sanity Check
What is the value of d?
2
What about the root?
The root is special
Why not in sequential order?
Data pages allocated dynamically
17
5
13
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
22*
24*
27*
29*
33*
34*
38*
39*
Root Node
Page 1
Page 4
Page 6
Page 2
Page 3
Page 5
Page 7
Page 8
Page 9
27
B+ Trees and Scale
How big is a height 1 B+ tree
d = 2 Fan-out?
Fan-out = 2d + 1 = 5
Height 1: 5 x 4 = 20 Records
Root Node
B+ Trees and Scale Part 2
Root Node
How big is a height 3 B+ tree
d = 2 Fan-out?
Fan-out = 2d + 1 = 5
Height 3: 53 x 4= 500 Records
B+ Trees in Practice
Typical order: 1600. Typical fill-factor: 67%.
average fan-out = 2144
(assuming 128 Kbytes pages at 40Bytes per record)
At typical capacities
Height 1: 21442 = 4,596,736 records
Height 2: 21443 = 9,855,401,984 records
Roughly 40 bytes per record
2144 * 128 kilobytes = 274.43 MB
2144 * 2144 * 128 kilobytes = 588.38 GB of pages
30
Searching the B+ Tree
17
5
13
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
22*
24*
27*
29*
33*
34*
38*
39*
Root Node
Page 1
Page 4
Page 6
Page 2
Page 3
Page 5
Page 7
Page 8
Page 9
Same as ISAM
Find key = 27
Find split on each node (Binary Search)
Follow pointer to next node
Roughly 40 bytes per record
2144 * 128 kilobytes = 274.43 MB
2144 * 2144 * 128 kilobytes = 588.38 GB of pages
32
Searching the B+ Tree: Find 27
17
5
13
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
22*
24*
27*
29*
33*
34*
38*
39*
Root Node
Page 1
Page 4
Page 6
Page 2
Page 3
Page 5
Page 7
Page 8
Page 9
Same as ISAM
Find key = 27
Find split on each node (Binary Search)
Follow pointer to next node
27*
Roughly 40 bytes per record
2144 * 128 kilobytes = 274.43 MB
2144 * 2144 * 128 kilobytes = 588.38 GB of pages
33
Searching the B+ Tree: Fetch Data
17
5
13
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
22*
24*
27*
29*
33*
34*
38*
39*
Root Node
Page 1
Page 4
Page 6
Page 2
Page 3
Page 5
Page 7
Page 8
Page 9
27*
(20, Tim)
(7, Dan)
(5, Kay)
(3, Jim)
(27, Joe)
(34, Kit)
(1, Kim)
(42, Hal)
Page 1
Page 2
Page 3
Page 4
PageId,
Record Id
Roughly 40 bytes per record
2144 * 128 kilobytes = 274.43 MB
2144 * 2144 * 128 kilobytes = 588.38 GB of pages
34
Inserting 25* into a B+ Tree Part 1
13
17
24
30
2*
3*
5*
7*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
25*
Find the correct leaf
Inserting 25* into a B+ Tree Part 2
13
17
24
30
2*
3*
5*
7*
14*
16*
19*
20*
24*
27*
29*
25*
33*
34*
38*
39*
Root Node
Find the correct leaf
If there is room in the leaf just add the entry
Inserting 25* into a B+ Tree Part 3
13
17
24
30
2*
3*
5*
7*
14*
16*
19*
20*
24*
27*
25*
29*
33*
34*
38*
39*
Root Node
Find the correct leaf
If there is room in the leaf just add the entry
Sort the leaf page by key
Inserting 8* into a B+ Tree: Find Leaf
13
17
24
30
2*
3*
5*
7*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
8*
Find the correct leaf
39
Inserting 8* into a B+ Tree: Insert
13
17
24
30
2*
3*
5*
7*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
8*
Find the correct leaf
Split leaf if there is not enough room
Inserting 8* into a B+ Tree: Split Leaf
Find the correct leaf
Split leaf if there is not enough room
Redistribute entries evenly
13
17
24
30
2*
3*
5*
7*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
8*
41
5*
7*
8*
Inserting 8* into a B+ Tree: Split Leaf, cont
13
17
24
30
5*
7*
8*
2*
3*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
2*
3*
Find the correct leaf
Split leaf if there is not enough room
Redistribute entries evenly
Fix next/prev pointers
42
Inserting 8* into a B+ Tree: Fix Pointers
13
17
24
30
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
2*
3*
5*
7*
8*
Find the correct leaf
Split leaf if there is not enough room
Redistribute entries evenly
Fix next/prev pointers
43
Inserting 8* into a B+ Tree: Mid-Flight
13
17
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
Something is still wrong!
I am an orphan!
44
Inserting 8* into a B+ Tree: Copy Middle Key
13
17
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
Copy up from leaf the middle key
No room in parent? Recursively split index nodes
5
13
Inserting 8* into a B+ Tree: Split Parent, Part 1
17
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
5
Copy up from leaf the middle key
No room in parent? Recursively split index nodes
Redistribute the rightmost d keys
30
24
17
Inserting 8* into a B+ Tree: Split Parent, Part 2
13
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Copy up from leaf the middle key
No room in parent? Recursively split index nodes
Redistribute the rightmost d keys
5
30
24
17
47
Inserting 8* into a B+ Tree: Root Grows Up
13
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
Push up from interior node the middle key
Now the last key on left
No room in parent? Recursively split index nodes
Redistribute the rightmost d keys
5
17
24
30
13
Inserting 8* into a B+ Tree: Root Grows Up, Pt 2
Recursively split index nodes
Redistribute right d keys
Push up middle key
17
24
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
17
17
17
5
30
49
Inserting 8* into a B+ Tree: Root Grows Up, Pt 3
Recursively split index nodes
Redistribute right d keys
Push up middle key
17
13
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
5
Copy up vs Push up!
Notice:
The leaf entry (5) was copied up
The index entry (17) was pushed up
17
13
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
5
Inserting 8* into a B+ Tree: Final
Check invariants
Key Invariant:
Node[…, (KL, PL), …]
KL<= K for all K in PL Sub-tree
Occupancy Invariant:
d <= # entries <= 2d
17
5
13
24
30
2*
3*
5*
7*
8*
14*
16*
19*
20*
24*
27*
29*
33*
34*
38*
39*
Root Node
B+ Tree Insert: Algorithm Sketch
Find the correct leaf L.
Put data entry onto L.
If L has enough space, done!
Else, must split L (into L and a new node L2)
Redistribute entries evenly, copy up middle key
Insert index entry pointing to L2 into parent of L.
B+ Tree Insert: Algorithm Sketch Part 2
Step 2 can happen recursively
To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits)
Splits “grow” tree; root split increases height.
Tree growth: gets wider or one level taller at top.
Before and After Observations
Notice that the root was split to increase the height
Grow from the root not the leaves
All paths from root to leaves are equal lengths
Does the occupancy invariant hold?
Yes! All nodes (except root) are at least half full
Proof?
After
Before
56
Splitting a Leaf
Start with full leaf (2d) entries (let d = 2)
Add a 2d + 1 entry (8*)
Split into leaves with (d, d+1) entries
Copy key up to parent
Why copy key and not push key up to parent?
d entries
d+1 entries
2*
3*
4*
5*
8*
2*
3*
5*
7*
8*
5
57
Splitting an Inner Node
Start with full interior node (2d) entries: (let d = 2)
Add a 2d + 1 entry
Split into nodes with (d, d+1) entries
Push key up to parent
d entries
d+1 entries
13
17
24
30
5
13
24
30
5
17
Splitting an Inner Node Pt 2
Start with full interior node (2d) entries: (let d = 2)
Add a 2d + 1 entry
Split into nodes with (d, d) entries
Push key up to parent
13
17
24
30
5
d entries
d entries
5
13
24
30
17
Splitting an Inner Node Pt 3
Start with full interior node (2d) entries: (let d = 2)
Add a 2d + 1 entry
Split into nodes with (d, d) entries
Push key up to parent
13
17
24
30
5
d entries
d entries
5
13
24
30
17
Why push not copy?
Routing key not needed in child
Occupancy invariant holds after split
Nice Animation Online
Great animation online of B+ Trees
One small difference to note
Upon deletion of leftmost value in a node, it updates the parent index entry
Incurs unnecessary extra writes
B+-Tree Deletion
We will skip deletion
In practice, occupancy invariant often not enforced
Just delete leaf entries and leave space
If new inserts come, great
This is common
If page becomes completely empty, can delete
Parent may become underfull
That’s OK too
Guarantees still attractive: logF(max size of tree)
BULK LOADING B+-TREES
Bulk Loading of B+ Tree Part 1
Suppose we want to build an index on a large table
Would it be efficient to just call insert repeatedly
No … Why not?
Random Order: CLZARNDXEKFWIUB. Order 2.
Try it: Interactive demo
Bulk Loading of B+ Tree Part 2
L
D
F
R
W
A*
B*
C*
D*
E*
F*
I*
K*
L*
N*
R*
U*
W*
X*
Z*
Constantly need to search from root
Leaves and internal nodes mostly half-empty
Modifying random pages:
poor cache efficiency
Bulk Loading of B+ Tree Part 2
Constantly need to search from leaf
Leaves and nodes are mostly half full
Modifying random pages -> poor cache efficiency
L
D
E
A
B
C
F
I
K
L
N
R
U
W
X
Z
D
F
R
W
16
Smarter Bulk Loading a B+ Tree
4
7
10
13
4
7
10
13
1*
2*
3*
4*
5*
6*
7*
8*
9*
10*
11*
12*
13*
14*
15*
16*
17*
18*
Sort the input records by key:
1*, 2*, 3*, 4*, …
We’ll learn a good disk-based sort algorithm soon!
Fill leaf pages to some fill factor (e.g. ¾)
Updating parent until full
68
Smarter Bulk Loading a B+ Tree Part 2
Sort the input records by key:
1*, 2*, 3*, 4*, …
Fill leaf pages to some fill factor (e.g. ¾)
Update parent until full
Then split parent (50/50) and copy to sibling
13
4
7
10
1*
2*
3*
4*
5*
6*
7*
8*
9*
10*
11*
12*
13*
14*
15*
16*
17*
18*
16
69
Smarter Bulk Loading a B+ Tree Part 3
Lower left part of the tree is never touched again
Occupancy invariant maintained
13
4
7
10
1*
2*
3*
4*
5*
6*
7*
8*
9*
10*
11*
12*
13*
14*
15*
16*
17*
18*
16
Never Touched Again
70
19
22
19*
20*
21*
22*
23*
24*
13
4
7
10
1*
2*
3*
4*
5*
6*
7*
8*
9*
10*
11*
12*
13*
14*
15*
16*
17*
18*
16
Smarter Bulk Loading a B+ Tree Part 4
Sort the input records by key:
1*, 2*, 3*, 4*, …
Fill leaf pages to some fill factor (e.g. ¾)
Update parent until full
Then split parent
Summary of Bulk Loading
Option 1: Multiple inserts
Slow
Does not give sequential storage of leaves
Option 2: Bulk Loading
Fewer I/Os during build. (Why?)
Leaves will be stored sequentially (and linked, of course)
Can control “fill factor” on pages.
Summary
ISAM is a static structure
Only leaf pages modified; overflow pages needed
Overflow chains can degrade performance unless size of data set and data distribution stay constant
B+ Tree is a dynamic structure
Inserts/deletes leave tree height-balanced; logFN cost
High fanout (F) means depth rarely more than 3 or 4.
Almost always better than maintaining a sorted file.
Typically, 67% occupancy on average
Usually preferable to ISAM; adjusts to growth gracefully.
Summary Cont.
Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large data set.
B+ tree widely used because of its versatility
One of the most optimized components of a DBMS.
Concurrent Updates
In-memory efficiency
Graphic Components
5
13
17
33*
34*
38*
39*
Root Node
13
33*
5
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node