And SSDs
Buffer management Solid State Drives
John Edgar 2
5.1
Main memory is partitioned into a collection of frames called the buffer pool
▪ The buffer manager is responsible for bringing pages from disk to main memory as required
▪ Pages are mapped to frames
▪ Processes inform the buffer manager if a page is no longer required
▪ And if it has been modified
A DB may be many times larger than the
buffer pool
▪ Accessing an entire table can easily fill up the buffer pool
▪ Thebuffermanagerreplacespagesby following a replacement policy
occupied frame
free frame
Main Memory
replacement policy
database
In addition to the buffer pool a data structure maps pages to frames
John Edgar
4
e.g. a hash table
If the page is already in the buffer pool
▪ Increment the pin-count (pin the page) If there are vacant frames
set to 1 if page modified dirty bit
▪ Read requested page into chosen frame and set pin-count to 1
If there are no vacant frames
▪ Select an frame with pin-count = 0 using the replacement policy
▪ If the chosen frame is dirty write to disk
▪ If there are no frames with pin-count = 0
▪ The transaction must wait or abort
▪ Read requested page into chosen frame and set pin-count to 1
pin count
pin-count is decremented when a page is released
data page
main memory frame
John Edgar
5
Assumption: buffer size much smaller than working set
The policy used to replace frames can affect the efficiency
of database operations
▪ Ideally a frame should not be replaced if it will be needed again in the near future
Buffer replacement policies ▪ Random
▪ FIFO
▪ Least Recently Used (LRU)
▪ Clock Replacement
▪ Most Recently Used (MRU)
Queue whose entries are frames with pin-count = 0
Replaces frame at front of queue
Requires memory for queue
Assumes frames not recently used are no longer required
John Edgar 6
A variant of the LRU policy with less overhead
▪ Instead of a queue the system requires one bit per frame, and a single
variable, called current
▪ Assume that frames are numbered from 0 to B-1 ▪ Where B is the number of frames
Each frame has an associated referenced bit
▪ Which is initially set to 1 when the frame is read or accessed
The current variable is initially 0, and is used to show the next
frame to be considered for replacement
current
…
ref-bit for frame 0 wraps around ref-bit for frame B-1 John Edgar
7
Consider the current frame for replacement
▪ If pin-count 0, increment current
▪ If pin-count = 0 and referenced bit is 1
▪ Switch referenced to 0 and increment current ▪ If pin-count = 0 and referenced is 0
▪ Replace the frame 2nd time around replace frame ▪ If current equals B-1 set it to 0
consider next frame make frame a candidate
Only replaces frames with pin-counts of zero
▪ Frames with a pin-count of zero are only replaced after all
older candidates are replaced
wrap around
John Edgar 8
LRU and clock replacement are fair schemes
They are not always the best strategies for a DB system
▪ It is common for some DB operations to require repeated sequential scans of data (e.g. Cartesian products, joins)
▪ With LRU such operations may result in sequential flooding An alternative is the Most Recently Used policy
▪ This prevents sequential flooding but is a generally poor replacement policy
Most systems use some variant of LRU
▪ Some systems will identify certain operations, and apply MRU
for those operations
John Edgar 9
Assume that a process requests sequential scans of a file The file, shown below, has nine pages
Assume that the buffer pool has ten frames Buffer Pool
p1
p2
p3
p4
p5
p6
p7
p8
p9
p1
p2
p3
p4
p5
p6
p7
p8
p9
Read page 1 first,
then page 2,
… then page 9
All the pages are in the buffer, when the next scan of the file is requested, no further disk access is required!
John Edgar 10
Assume that a process requests sequential scans of a file This file, shown below, has eleven pages
Assume that the buffer pool still has ten frames Buffer Pool
Read pages 1 to 10 first, page 11 is still to be read
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
p11
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
John Edgar 11
Assume that a process requests sequential scans of a file This file, shown below, has eleven pages
Assume that the buffer pool still has ten frames Buffer Pool
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
p11
p 11
p2
p3
p4
p5
p6
p7
p8
p9
p10
Read pages 1 to 10 first, page 11 is still to be read
Using LRU, replace the appropriate frame, which contains p1, with p11
John Edgar 12
Assume that a process requests sequential scans of a file This file, shown below, has eleven pages
Assume that the buffer pool still has ten frames Buffer Pool
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
p11
p11
p21
p3
p4
p5
p6
p7
p8
p9
p10
Read pages 1 to 10 first, page 11 is still to be read
Using LRU, replace the appropriate frame, which contains p1, with p11
The first scan is complete, start the second scan by reading p1 from the file
Replace the LRU frame (containing p2) with p1
John Edgar 13
Assume that a process requests sequential scans of a file This file, shown below, has eleven pages
Assume that the buffer pool still has ten frames Buffer Pool
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
p11
p11
p1
p32
p4
p5
p6
p7
p8
p9
p10
Read pages 1 to 10 first, page 11 is still to be read
Using LRU, replace the appropriate frame, which contains p1, with p11
The first scan is complete, start the second scan by reading p1 from the file
Replace the LRU frame (containing p2) with p1
Continue the scan by reading p2, …
John Edgar 14
Assume that a process requests sequential scans of a file This file, shown below, has eleven pages
Assume that the buffer pool still has ten frames Buffer Pool
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
p11
p1010
p 111
p2
p43
p45
p65
p67
p87
p89898
p 190
Each scan of the file requires that every page is read from the disk!
In this case LRU is the
worst
possible replacement policy!
John Edgar 15
A DBMS can often predict patterns in the way in which pages are referenced
▪ Most page references are generated by processes such as query processing with known patterns of page accesses
▪ Knowledge of these patterns allows for a better choice of pages to replace and
▪ Allows prefetching of pages, where the page requests can be anticipated and performed before they are requested
A DBMS requires the ability to force a page to disk
▪ To ensure that the page is updated on a disk
▪ This is necessary to implement crash recovery protocols where the order in which pages are written is critical
John Edgar 16
Some DBMS buffer managers predict page requests ▪ And fetch pages into the buffer before they are requested
▪ Knownasprefetching
▪ Pages are available in the buffer when they are requested, and
▪ If the pages to be prefetched are contiguous, the retrieval will be faster than if they had been retrieved individually
▪ If the pages are not contiguous, retrieval may still be faster as access to them can be efficiently scheduled
Prefetching does require additional main memory buffers
John Edgar 17
Prefetching can be combined with buffering
▪ Where two processes are interleaved to improve performance Data from one process is read into main memory as the
CPU acts on the second
▪ Disk I/O processors are separate from the CPU so these tasks can be performed in parallel
Double buffering is an important memory management technique
Read pages for operation
A0
B0
A1
B1
A2
…
Process page for operation
–
A0
B0
A1
B1
…
John Edgar
18
time
Organize data by cylinders
▪ Related data should be stored “close to” each other
Use a RAID system to improve efficiency or reliability
▪ Multiple disks and striping improves efficiency
▪ Mirroring or redundancy improves reliability
Schedule requests using the elevator algorithm
▪ Reduces disk access time for random reads and writes ▪ Most effective when there are many requests waiting
Prefetch data in large chunks and use double buffering
▪ Speeds up access when needed blocks can be predicted but
requires more main memory buffers
John Edgar 19
5.2
Most Solid State Drives (SSDs) use NAND flash memory and do not contain moving parts like an HDD
▪ Accessing an SSD does not require seek time or rotational latency and they are therefore considerably faster
▪ Flash memory is non-volatile memory that is used by smart-phones, mp3 players and thumb (or USBO) drives
Like HDDs SSDs contain a controller whose functions include ▪ Read and write caching
▪ Encryption
▪ Error detection and correction
▪ Wear leveling
▪ Evenly distributing reads and writes across the disk
John Edgar 21
NAND flash architecture is similar to a NAND (negated and) logic gate hence the name
▪ Only able to read and write data one page at a time
▪ NAND flash memory is non-volatile
▪ It does not require power to retain memory ▪ Earlier SSDs used DRAM which
necessitated an internal power supply
There are different types of NAND SSD
▪ SSD with multiple charge levels
▪ Multi-level cell (MLC), Triple Level Cell (TLC), Quad Level Cell (QLC)
▪ Single-level cell (SLC)
John Edgar 22
An SSD page is a few kB in size
Re-writing SSD data requires first erasing the entire block – i.e. multiple pages
MLC , TLC and QLC cells can store multiple different charge levels
▪ And contain more than one bit
▪ With four charge levels a cell can store 2 bits
▪ With eight, 3 bits, etc.
▪ Reading is more complex but more data can be stored per cell
MLC, TLC and QLC SSDs are cheaper per byte than SLC SSDs
▪ However write performance is worse ▪ And their lifetimes are shorter
John Edgar 23
SLC cells can only store a single charge level
▪ They are therefore on or off, and can contain only one
bit
SLC drives are less complex
▪ More reliable with a lower error rate ▪ Faster since it is easier to read
or write a single charge value
But SLC drives are more expensive
▪ And were typically used for enterprise rather than home use
The trend appears to be towards using TLCs for enterprise solutions rather than SLC
Larger storage size and use of cache or other memory types to mitigate their disadvantages
John Edgar 24
An EFD is an Enterprise Flash Drive
▪ The term was introduced by EMS Corporation
EFDs are designed for applications requiring high
performance
▪ Reliability
▪ Energy efficiency
▪ Consistent performance
There is no standard for what defines an EFD
▪ So SSD manufacturers can claim that their high performing products are EFDs
▪ Withoutmeetinganyparticularrequirements
John Edgar 25
SSD technology continues to evolve
▪ Consumer drives are decreasing in price per byte
▪ There are multiple types of SSD and multiple form factors 3D XPoint Intel Optane
▪ New technology
▪ Developed by Intel and Micron
▪ Multi-purpose ▪ SSDs
▪ Cache for other drives ▪ Persistentmemory
▪ Faster than NAND flash but more expensive per byte
John Edgar 26
SSD access is entirely electronic and so no seek time or rotational delay is incurred
Both reads and writes are faster than HDDs
▪ However flash memory must be erased before it is written, and entire blocks must be erased
▪ Referred to as write amplification
The performance increase over HDDs is greatest for
random reads
SSDs are considerably more durable than HDDs
▪ Primarily due to the lack of moving parts
John Edgar 27
Type
Capacity
Access Time
Max Bandwidth
Cost
RAM
4GB – 1TB
30ns
35GB/s
$100 – $20,000
SSD
64GB – 1TB
50s
750MB/s
$50 – $600
USB Stick
4GB – 512GB
100s
50MB/s
$2 – $200
HDD
600GB – 8TB
10ms
200MB/s
$70 – $500
Optical
50GB – 100GB
180ms
72MBs
$100
Tape
2.5TB – 8.5TB
10s – 80s
40 – 250MB/s
$2,500 – $30,000
Tape Jukebox
25TB – 2.1m TB
10s – 80s
250MB/s – 1.2PB/s
$3,000 – $1m
2014 figures – per text (Elmasri, Navathe)
Note that the capacity figures are already quite out of date
In particular RAM and SSD capacity have increased while cost decreased
John Edgar 29