Chapter 9b: Buffer Pool Management
CPSC 404, Fall 2020
Based on:
Sections 9.4, 9.4.1, and 16.7.1 to 16.7.4 of our textbook: Ramakrishnan, Raghu and Gehrke, Johannes. Database Management
Systems. McGraw-Hill, 2003.
Silberschatz, Abraham; Galvin, Peter Baer; and Gagne, Greg. Operating System Concepts, Wiley, 2002.
1
Learning Goals
Explain the purpose of a DBMS buffer pool, and justify why a DBMS should manage its own buffer pool, rather than let the OS handle it.
Provide an example of sequential flooding in a DBMS buffer pool.
Explain the tradeoffs between force/no force and steal/no steal page management in a buffer pool. Justify the use of the ARIES algorithm to exploit these properties.
For a given reference string, compute the behaviour of these page replacement algorithms: FIFO, LRU, MRU, Clock (reference bit), and Extended Clock (reference bit + dirty bit).
Create a reference string that produces worst-case performance for a given page replacement algorithm.
[Later] Explain how the page replacement policy and the access plan can have a big impact on the number of I/Os required by a given application.
[Later] Predict which buffer pool page replacement strategy works best with a given workload (e.g., table scan, index scan, index lookup, logging, returning many rows, running the RUNSTATS utility).
2
Introduction to Buffer Pool Management: Some Definitions
A DBMS buffer pool is part of RAM, and is managed independently of the operating system.
We assume that a page is the smallest unit of transfer between disk and main memory.
Logical memory has “pages”.
Physical memory (RAM) has “frames”.
The page/frame size is determined by the hardware.
The Translation Lookaside Buffer (TLB) is a very fast hardware cache of recent page-table entries. It lets the hardware quickly determine whether or not a particular page is currently in memory, and where.
3
Buffer Management in a DBMS
[Figure: page requests from higher levels of the DBMS arrive at the BUFFER POOL in MAIN MEMORY; pages are read from the DB on DISK into free frames, with the choice of frame dictated by the page replacement policy.]
Data must be in RAM for the DBMS to operate on it.
A table of <frame#, page id> pairs is maintained.
4
When a Page is Requested …
If the requested page is not in the buffer pool:
Choose a frame for replacement
If that frame is dirty, write it to disk
Read requested page into chosen frame
Pin the new page and return its address
“Pin” = hold in memory & don’t allow another page to overwrite it; add 1 to the in-use count
Note: If requests can be predicted (e.g., table scans, index scans, range searches), then pages can be prefetched several at a time (i.e., read as a bigger block of pages).
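The request-handling steps above can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: the `Frame`/`BufferPool` names, the dict standing in for the disk, and the placeholder `choose_victim` policy are all my assumptions.

```python
class Frame:
    def __init__(self):
        self.page_id = None    # which disk page this frame holds (None = empty)
        self.data = None
        self.pin_count = 0     # in-use count; > 0 means the page is pinned
        self.dirty = False     # True iff the page must be written back

class BufferPool:
    def __init__(self, num_frames, disk):
        self.frames = [Frame() for _ in range(num_frames)]
        self.page_table = {}   # page_id -> frame (the <frame#, page id> table)
        self.disk = disk       # dict standing in for the disk: page_id -> data

    def choose_victim(self):
        # Placeholder policy: any empty frame, else any unpinned frame.
        # A real DBMS applies LRU, MRU, Clock, etc. here.
        for f in self.frames:
            if f.page_id is None:
                return f
        for f in self.frames:
            if f.pin_count == 0:
                return f
        raise RuntimeError("all frames are pinned")

    def request_page(self, page_id):
        # Hit: page already buffered -- just pin it and return it.
        if page_id in self.page_table:
            frame = self.page_table[page_id]
            frame.pin_count += 1
            return frame
        # Miss (page fault): pick a frame, flushing it first if dirty.
        frame = self.choose_victim()
        if frame.dirty:
            self.disk[frame.page_id] = frame.data  # write dirty page back
        if frame.page_id is not None:
            del self.page_table[frame.page_id]
        frame.page_id = page_id
        frame.data = self.disk[page_id]            # read page from "disk"
        frame.pin_count = 1                        # pin the new page
        frame.dirty = False
        self.page_table[page_id] = frame
        return frame
```

With a one-frame pool, requesting a second page forces the first (dirty) page to be written back before its frame is reused.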
5
When a Page is Requested … (cont.)
Requestor of page must unpin it, and indicate whether the page has been modified:
Maintain a dirty bit for the page
Page may be requested many times
Maintain a pin count
A page is a candidate for replacement iff:
pin count = 0
Concurrency control and recovery mechanisms may entail additional I/O when a frame is chosen for replacement
e.g., Write-Ahead Log protocol
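The unpin protocol above can be captured in a couple of hypothetical helpers (the frame object is assumed to carry `pin_count` and `dirty` attributes; these function names are mine, not the textbook's):

```python
def unpin_page(frame, modified):
    """Requestor releases the page and reports whether it changed it."""
    assert frame.pin_count > 0, "unpin without a matching pin"
    frame.pin_count -= 1
    if modified:
        frame.dirty = True   # remember to write the page back before replacement

def is_replacement_candidate(frame):
    # A frame may be chosen as a victim iff no one has it pinned.
    return frame.pin_count == 0
```

Note that the dirty bit is sticky: unpinning without modification does not clear it, since an earlier requestor's update still needs to reach disk.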
6
Committing a Transaction
A transaction is a series of one or more SQL statements.
After the transaction is executed, the user or program
implicitly or explicitly requests a COMMIT operation.
If modifications were performed by the SQL statements (e.g., INSERT, UPDATE, or DELETE), then we want to make these changes “permanent”. In other words, we want to write the changes to disk, so that other users will get the correct, updated values.
A transaction is said to be committed when its log records reach disk. This is for recovery purposes.
Locks held by the transaction are released at COMMIT time.
A transaction that is in progress is said to be in-flight. It hasn’t been committed. (It might be rolled back (aborted) for various reasons.)
7
Related to crash recovery …
Force: At transaction commit time, we force (i.e., write) the transaction’s updated pages to disk (after writing the log records to disk).
Steal: When the BP desperately needs a free frame, we can write a dirty page for an uncommitted transaction to disk (i.e., we steal a frame from an in-flight transaction).
Should we force every logical write to disk (= physical write)?
Force and Steal: the four combinations:

            No Steal    Steal
  Force     Easiest
  No Force              Desired

(See Sections 16.7.1 and 18.4 in the textbook for details.)
The ARIES crash recovery algorithm (see Chapter 18) lets us undo the uncommitted transaction later, if necessary.
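The two policy knobs can be summarized in a small decision sketch. The function names and the boolean encoding are illustrative assumptions, not part of ARIES itself:

```python
def can_evict(dirty, txn_committed, steal):
    """May the buffer manager evict this frame right now?"""
    # Clean pages can always be evicted.
    if not dirty:
        return True
    # A dirty page of a committed transaction can always go to disk.
    if txn_committed:
        return True
    # Dirty page of an in-flight transaction: only a "steal" buffer
    # manager may write it out (ARIES can undo the change later via the log).
    return steal

def pages_to_force_at_commit(dirty_pages, force):
    # Under "force", all of the transaction's dirty pages are written at
    # commit time (after the log records); under "no force", none must be.
    return list(dirty_pages) if force else []
```

The "desired" no-force/steal combination is exactly the one where both helpers give the buffer manager maximum freedom, at the cost of needing redo and undo logging.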
8
Page Faults
A newly requested disk page that is not currently in the buffer pool causes a page fault.
In other words, a page fault means we have to read the page from disk into RAM.
The page to be replaced is called the victim page.
A page is chosen for replacement using a page replacement algorithm (PRA).
9
Some Page Replacement Algorithms
FIFO
Victim = oldest page
Least Recently Used (LRU)
Victim = page that hasn’t been referenced for the longest time
MRU (used by DB2's RUNSTATS utility)
Victim = page that has been most recently used
Clock (a version of this is used by DB2)
Victim? Each page is given a “second chance”. We’ll cycle through the pages, as necessary, looking for a victim.
Key Idea: If a page is referenced often enough, its reference bit (RB) will stay set, and it won’t be a victim.
Several variants exist:
• e.g., RB can be > 1 (multiple requestors); but, in the following example, RB ≤ 1
10
Clock Algorithm (One Version)
For the reference string, assume that an existing page in the BP that is reused has its RB set to 1 upon reuse (optional: the RB can reflect the pin count). For the algorithm presented here, assume that the timestamp is updated only when the page fault algorithm says so.
Clock Algorithm (for when a page fault occurs)
If an empty frame exists in the BP:
• Use it to store the new page’s data.
• Set its RB to 1.
• Set its timestamp to the current time.
Else, until we find the victim page, do:
• Find the oldest page (i.e., with the oldest timestamp) in the BP.
• If that page’s RB is set to 0, then:
• This is the victim page; so, replace it with the new page.
• Set the new page’s RB to 1, and its timestamp to the current time.
• Else:
• // Give that page a second chance. (We haven’t found the victim, yet.)
• Decrement that page’s RB to 0.
• Update that page’s timestamp to the current time.
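The version above can be implemented directly. This is a Python sketch assuming RB ≤ 1; the `ClockPool` class name and the dict-based bookkeeping are mine:

```python
import itertools

class ClockPool:
    """Clock PRA as described above: each buffered page carries a
    reference bit (RB) and a timestamp; the victim search always
    starts from the page with the oldest timestamp."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = {}                 # page_id -> [rb, timestamp]
        self.clock = itertools.count()   # monotonically increasing "time"

    def reference(self, page_id):
        """Handle one page reference; returns True iff it caused a page fault."""
        if page_id in self.frames:
            self.frames[page_id][0] = 1  # reuse sets RB to 1 (timestamp unchanged)
            return False
        # Page fault: use an empty frame if one exists.
        if len(self.frames) < self.num_frames:
            self.frames[page_id] = [1, next(self.clock)]
            return True
        # Otherwise cycle until we find the victim.
        while True:
            oldest = min(self.frames, key=lambda p: self.frames[p][1])
            rb, _ = self.frames[oldest]
            if rb == 0:
                del self.frames[oldest]                      # victim found
                self.frames[page_id] = [1, next(self.clock)]
                return True
            # Second chance: clear RB and refresh the timestamp.
            self.frames[oldest] = [0, next(self.clock)]
```

With two frames, referencing 1, 2, 1, 3 makes page 1 the victim: both pages first lose their second chance, and page 1's refreshed timestamp is then the oldest with RB = 0.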
11
Clock Algorithm: See PDF Example
See the accompanying diagram, on Canvas.
12
Extended Clock Algorithm (Overview)
The Clock Algorithm can be extended to include both a reference bit and a dirty bit (RB/DB).
DB is set if page needs to be written to disk eventually
Similar to Clock PRA
Again, the search for a victim begins with the oldest page. The first 0/0 (not used recently; not modified) or 0/0* (not used recently; modified, second chance) found is the victim. As we cycle (states are RB/DB; * means the dirty page got a second chance and must be written out when it becomes the victim):
• 0/0 (not used recently; not modified) or 0/0* (not used recently; modified, second chance)
  • Becomes: Victim (for 0/0*, write out the dirty page first)
• 0/1 (not used recently; modified)
  • Becomes: 0/0* (not used recently; modified, second chance)
• 1/0 or 1/0* (used recently; …)
  • Becomes: 0/0 or 0/0*, respectively (not used recently; …)
• 1/1 (used recently; modified)
  • Becomes: 0/1 (not used recently; modified)
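The transitions above can be written as one small state function. This is a sketch; encoding the `*` flag as a third boolean in `(rb, db, second_chance)` is my assumption:

```python
def step(rb, db, second_chance):
    """One visit of the Extended Clock hand to the oldest frame.
    Returns ("victim", must_write_back) when the page is evicted,
    or ("keep", (rb, db, second_chance)) with the frame's new state."""
    if rb == 0 and (db == 0 or second_chance):
        # 0/0 or 0/0*: this is the victim; 0/0* is written out first.
        return ("victim", second_chance)
    if rb == 0 and db == 1:
        return ("keep", (0, 0, True))           # 0/1  -> 0/0*
    if rb == 1 and db == 0:
        return ("keep", (0, 0, second_chance))  # 1/0 -> 0/0, 1/0* -> 0/0*
    return ("keep", (0, 1, False))              # 1/1  -> 0/1
```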
13
Extended Clock Algorithm: See PDF Example
See the accompanying diagram, on Canvas.
14
Sequential Flooding
The page replacement policy can have a big impact on the number of I/Os.
It also depends on the data access pattern.
Sequential flooding: A nasty situation caused by LRU + repeated sequential scans:
If # buffer frames < # pages in file, then every page request causes an I/O operation (page fault).
• See example on next page
• How well does MRU perform in this particular situation/example?
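The effect can be checked with a tiny simulation (a sketch written for illustration; `simulate` is not from the textbook). With 8 frames and three sequential scans of pages 1-9, LRU faults on every single reference, while MRU faults only once per scan after the first:

```python
def simulate(policy, refs, num_frames):
    """Count page faults for an LRU or MRU buffer pool over a reference string."""
    frames, last_used, faults = set(), {}, 0
    for t, page in enumerate(refs):
        if page in frames:
            last_used[page] = t          # hit: just update recency
            continue
        faults += 1                      # page fault
        if len(frames) >= num_frames:
            pick = min if policy == "LRU" else max
            victim = pick(frames, key=lambda p: last_used[p])
            frames.discard(victim)
            del last_used[victim]
        frames.add(page)
        last_used[page] = t
    return faults

refs = list(range(1, 10)) * 3           # three sequential scans of pages 1..9
lru_faults = simulate("LRU", refs, 8)   # 27: every reference faults
mru_faults = simulate("MRU", refs, 8)   # 11: 9 on the first scan, then 1 per scan
```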
15
Sequential Flooding Example: LRU with 8 Available Buffer Frames
Note: We assume that the user’s program, including the user’s customer records, exists in a separate buffer pool.
Algorithm: For each customer record, do:
Read and process pages 1-9 from Table X
[Figure: the 8 buffer frames, shown holding pages 1 2 3 4 / 5 6 7 8]
Suggest alternative algorithms that would perform better.
16
LRU Variants (FYI Only)
LRU-k
Remember the times of the last k references for each page. Then, apply LRU to the k-th-to-last reference.
• Why? We want popularity over relatively recent time, rather than just the most recent page bursts.
We can use a window during which we don’t evict pages and don’t count re-references.
LRU-k is too nice to new pages.
2Q
2Q kicks out pages that are not hot (i.e., that don’t get re-referenced soon enough in the incoming A1 queue); re-referenced A1 pages move from A1 to Am (the latter queue uses LRU).
2Q improves upon LRU-2’s overhead.
LRU can use a priority queue (heap) to keep track of which page currently has the lowest priority.
Note: There are many different kinds of PRAs. See the operating systems literature for more examples.
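As a concrete illustration of LRU bookkeeping, here is a small sketch using Python's `OrderedDict` in place of the heap mentioned above (the `LRUPool` class name is mine): the first entry is always the least recently used page, giving O(1) victim lookup.

```python
from collections import OrderedDict

class LRUPool:
    """LRU page bookkeeping: the ordered map keeps pages oldest-first."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.pages = OrderedDict()   # page_id -> None, least recently used first

    def reference(self, page_id):
        """Handle one page reference; returns True iff it caused a page fault."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # now the most recently used page
            return False
        if len(self.pages) >= self.num_frames:
            self.pages.popitem(last=False)    # evict the LRU page
        self.pages[page_id] = None
        return True
```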
18
DBMS vs. OS File System
The OS does disk space and buffer management; so, why don’t we let the OS manage these tasks?
Differences in OS support: portability issues
Buffer management in a DBMS requires ability to:
• Pin a page in the buffer pool, or force a page to disk (both are important for implementing concurrency control and crash recovery). Some OS/disk subsystems want to exercise temporal control over when pages are written (this is not good for a DBMS, which needs to control write ordering itself).
• Adjust the replacement policy, and prefetch pages, based on access patterns in typical DB operations (this is good).
19
Summary
The Buffer Manager brings pages into RAM.
A page stays in RAM until it is released.
A dirty page is written to disk when such a frame is chosen as the victim (which may be some time after the requestor writes—and releases—the page).
The choice of frame to be replaced depends on the replacement policy.
• There are many different page replacement algorithms.
The Buffer Manager tries to prefetch several pages at a time.
DBMS vs. OS file support
OS’s manage their own buffer pools.
A DBMS needs to have control over events that most OS’s don’t need to worry about for their own paging activities. Such events include forcing a page to disk, controlling the order of page writes to disk, working with files that span disks, having the ability to control prefetching and page replacement policies based on predictable access patterns, etc.
20