COMP20007 Design of Algorithms
B-trees
Daniel Beck
Lecture 15
Semester 1, 2021
Primary vs. Secondary Memories
• Primary memory (RAM)
  • Key comparisons are the most significant operation.
  • The paradigm we’ve seen so far.
• Secondary memory (hard drive)
  • Disk accesses are the most significant operation.
  • Motivates the use of data structures that minimise disk accesses.
BSTs and Disk accesses
2–3 trees
• Traversing between nodes requires a new disk access: expensive.
• Traversing within a node requires search in a contiguous array: cheap.
• 2–3 trees are ideal because their height is reduced.
• Can we extend them to reduce their height even more?
• Just allow even more keys per node!
B-trees
• A generalisation of 2–3 trees.
• Defined in terms of an order m, which dictates minimum and maximum number of children (and therefore keys per node).
• There is not a single accepted definition of order… We will adopt the convention in Levitin.
• A 2–3 tree is a B-tree of order 3.
B-tree – Example
A B-tree of order 6 (and also orders 5 and 4).
(Figure taken from the Cormen book “Introduction to Algorithms”)
B-tree – Definitions
Given an order m:
• The root is either a leaf or has between 2 and m children.
• Each node which is not the root or a leaf has between ⌈m/2⌉ and m children.
  • And therefore between ⌈m/2⌉ − 1 and m − 1 keys.
• The tree is perfectly balanced: all leaves are at the same level.
Remember: all operations are still O(log n).
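The key-count bounds above can be checked mechanically. A minimal sketch, assuming a node is a plain object holding a sorted key list (the class and function names are illustrative, not from the slides):

```python
import math

class BTreeNode:
    """Illustrative B-tree node: sorted keys plus a children list
    (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def satisfies_key_bounds(node, m, is_root=False):
    """Key-count invariant for one node of an order-m B-tree:
    non-root nodes hold between ceil(m/2) - 1 and m - 1 keys;
    the root may hold as few as 1 key."""
    lo = 1 if is_root else math.ceil(m / 2) - 1
    return lo <= len(node.keys) <= m - 1

# Order 6: non-root nodes need between 2 and 5 keys.
print(satisfies_key_bounds(BTreeNode(["C", "G"]), m=6))           # True
print(satisfies_key_bounds(BTreeNode(["C"]), m=6))                # False
print(satisfies_key_bounds(BTreeNode(["C"]), m=6, is_root=True))  # True
```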
B-tree – Search
• Search is done in a similar way to a standard BST.
  • With an additional search at each node.
  • Since keys are kept sorted, binary search can be used.
• Practical property: B-trees are very efficient when searching for a range of values.
  • Disk reads typically fetch full sectors in a single operation, storing all the data in the disk cache.
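The two-level search above (descend between nodes, binary-search within a node) can be sketched as follows, assuming an illustrative node layout where internal nodes satisfy len(children) == len(keys) + 1:

```python
from bisect import bisect_left

class Node:
    """Illustrative node: sorted keys; children[i] holds keys below keys[i]."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def btree_search(node, key):
    while node is not None:
        i = bisect_left(node.keys, key)   # cheap: binary search in one node
        if i < len(node.keys) and node.keys[i] == key:
            return node                   # key found in this node
        if not node.children:
            return None                   # leaf reached: key absent
        node = node.children[i]           # expensive: one more disk access
    return None

root = Node(["M"], [Node(["C", "G"]), Node(["R", "X"])])
print(btree_search(root, "G") is not None)  # True
print(btree_search(root, "Z") is None)      # True
```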
B-tree – Insertion
• The main risk when inserting a key is having a node with more keys than allowed.
• Similar to 2–3 trees, we split a node when this happens.
  • The median key is promoted to the parent node.
  • This in turn can trigger further splits all the way up.
  • Requires more disk accesses → expensive.
• This can be mitigated by preemptively splitting full nodes during an insertion.
• The height of a tree only increases when the root is split.
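The split step can be sketched in isolation. A hedged example (the helper name is made up): an overfull sorted key list is cut at the median, which is promoted to the parent, while the two halves become sibling nodes:

```python
def split_keys(keys):
    """Split an overfull, sorted key list: return (left, median, right).
    The caller inserts `median` into the parent node and replaces the
    old node with two nodes holding `left` and `right`."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

left, median, right = split_keys(["A", "B", "C", "D", "E"])
print(median, left, right)  # C ['A', 'B'] ['D', 'E']
```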
B-tree – Insertion
• Insert B.
• Insert Q: this will trigger a split.
• Insert L.
• Insert F: here the root is split preemptively, allowing insertion in a single pass through the tree and reducing disk accesses.
B-tree – Deletion
• In deletion, the main risk is having a node with fewer than the minimum number of keys.
• Two operations to solve that:
  • If a sibling has “spare” keys, perform a rotation.
  • If all siblings have the minimum number of keys, perform a merge.
• Merges can also be done preemptively to reduce disk accesses.
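The rotation case can be sketched on bare key lists (names hypothetical): the separator key in the parent moves down into the underfull node, and the donating sibling’s largest key moves up to replace it:

```python
def rotate_from_left(parent_keys, sep_index, left_keys, right_keys):
    """Fix an underfull right node by borrowing through the parent:
    the parent's separator key comes down, and the left sibling's
    largest key goes up to take its place."""
    right_keys.insert(0, parent_keys[sep_index])  # separator comes down
    parent_keys[sep_index] = left_keys.pop()      # sibling's max goes up
    return parent_keys, left_keys, right_keys

parent, left, right = rotate_from_left(["K"], 0, ["C", "G"], ["N"])
print(parent, left, right)  # ['G'] ['C'] ['K', 'N']
```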
B-tree – Deletion
• Delete F.
• Delete M: this will trigger a rotation (the left child has spare keys).
• Delete G: this will trigger a merge (both children have the minimum number of keys).
• Delete D: this will also trigger a merge, but because an internal node has the minimum number of keys (a preemptive merge); the tree shrinks.
• Delete B: this will trigger a rotation.
B-tree – Analysis
• The bounds for operations in a B-tree are all O(log n).
• However, it is the number of disk accesses that’s important.
• This is proportional to the height of the B-tree.
• How many nodes can you have in a B-tree of order m and height h?
B-tree – Analysis
For a B-tree of order m and height h:
\[
\begin{aligned}
n &\ge 1 + \sum_{i=1}^{h-1} 2\lceil m/2\rceil^{i-1}\bigl(\lceil m/2\rceil - 1\bigr) + 2\lceil m/2\rceil^{h-1} \\
n &\ge 1 + 2\sum_{i=1}^{h-1} \bigl(\lceil m/2\rceil^{i-1}\lceil m/2\rceil - \lceil m/2\rceil^{i-1}\bigr) + 2\lceil m/2\rceil^{h-1} \\
n &\ge 1 + 2\sum_{i=1}^{h-1} \bigl(\lceil m/2\rceil^{i} - \lceil m/2\rceil^{i-1}\bigr) + 2\lceil m/2\rceil^{h-1} \\
n &\ge 1 + 2\bigl(\lceil m/2\rceil^{h-1} - \lceil m/2\rceil^{0}\bigr) + 2\lceil m/2\rceil^{h-1} \\
n &\ge 4\lceil m/2\rceil^{h-1} - 1
\end{aligned}
\]
B-tree – Analysis
\[
\begin{aligned}
4\lceil m/2\rceil^{h-1} - 1 &\le n \\
\lceil m/2\rceil^{h-1} &\le \frac{n+1}{4} \\
h - 1 &\le \left\lfloor \log_{\lceil m/2\rceil} \frac{n+1}{4} \right\rfloor \\
h &\le \left\lfloor \log_{\lceil m/2\rceil} \frac{n+1}{4} \right\rfloor + 1
\end{aligned}
\]
B-tree – Analysis
Assume 100 million records:

order m          |  50 | 100 | 250
upper bound on h |   6 |   5 |   4

In practice: very high order, usual height upper bound of 3.
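The table can be reproduced directly from the bound h ≤ ⌊log_⌈m/2⌉((n + 1)/4)⌋ + 1 derived above (function name is illustrative):

```python
import math

def height_upper_bound(n, m):
    """Upper bound on the height of an order-m B-tree holding n keys:
    h <= floor(log_{ceil(m/2)}((n + 1) / 4)) + 1."""
    return math.floor(math.log((n + 1) / 4, math.ceil(m / 2))) + 1

n = 100_000_000  # 100 million records
for m in (50, 100, 250):
    print(m, height_upper_bound(n, m))
# 50 6
# 100 5
# 250 4
```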
B-tree – Extensions
• B*-tree: reduces splits at insertion time by using “spills”.
  • When a node has too many keys, try to rotate with the left or right sibling instead of splitting (as long as there is a sibling with fewer than the maximum m − 1 keys).
• B+-tree: internal nodes only contain keys; records are stored at the leaves.
  • Keys appear a second time at the leaves.
  • This is the version explained in the Levitin book.
  • Sometimes leaves are connected as a linked list: useful when reading large blocks of data.
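The linked-leaf point can be illustrated with a sketch (class and function names are made up): once the first relevant leaf has been located, a range query just follows next-pointers instead of descending from the root again:

```python
class Leaf:
    """Illustrative B+-tree leaf: sorted keys plus a next-leaf pointer."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def range_scan(first_leaf, lo, hi):
    """Collect all keys in [lo, hi] by walking the leaf chain."""
    out, leaf = [], first_leaf
    while leaf is not None and leaf.keys and leaf.keys[0] <= hi:
        out.extend(k for k in leaf.keys if lo <= k <= hi)
        leaf = leaf.next  # sequential read, no descent from the root
    return out

a, b, c = Leaf([1, 3]), Leaf([5, 7]), Leaf([9, 11])
a.next, b.next = b, c
print(range_scan(a, 3, 9))  # [3, 5, 7, 9]
```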
B+-tree
(from Levitin)
BSTs – Summary
• Dictionaries store (key, value) pairs.
• BSTs provide Θ(log n) search, insert and delete operations.
• Standard BSTs can degrade to linked lists and Θ(n) worst-case performance.
• Self-balancing: AVL trees, Red-Black trees.
• Change of representation: 2–3 trees, B-trees.
BSTs – Summary
• Self-balancing trees tend to be better when the dictionary is fully in memory.
  • C++ maps: implemented via Red-Black trees.
• Multiple elements per node are a better choice when secondary memory is involved.
  • B-trees (and their variants) are widely used in SQL databases (PostgreSQL, SQLite) and filesystems (Ext4, Reiser4).
Next week: C++ maps use BSTs. What about Python dicts? Do they also use BSTs? (spoiler: no)