CPSC 404
Chapter 9a: Disk Storage Fall 2020
Based on:
• Ramakrishnan, Raghu and Gehrke, Johannes. Database Management Systems. McGraw-Hill, 2003. (i.e., our textbook, pp. 304-309)
• Garcia-Molina, Hector; Ullman, Jeffrey; and Widom, Jennifer. Database Systems: The Complete Book. Pearson, 2009.
• Mullins, Craig. Database Administration. Addison-Wesley, 2013.
• Bryant, Randall and O’Hallaron, David. Computer Systems. Pearson, 2011.
Learning Goals
Explain the impact that disk activity has on DBMS query performance.
Draw the memory hierarchy. Show where database bottlenecks are most likely
to occur and where extensive caching takes place.
Compare and contrast the cost, capacity, and speed of access of the levels of the memory hierarchy.
Identify the components of a disk drive.
Given disk geometry figures, calculate the amount of time that it takes to read or write a number of bytes, blocks/pages, tracks, or cylinders of data to/from a disk.
Given disk geometry figures, compute the minimum, average, and maximum seek, rotation, and transfer times to/from a disk.
Compare and contrast the relative speeds of seek, rotation, and transfer times when accessing a given size of data on disk.
Explain how a large file, broken up into pages, can be optimally placed on a disk to improve performance.
Learning Goals (cont.)
Compare and contrast hard disk drives (HDDs) to solid state disks (SSDs). Discuss performance implications.
Defend the ongoing role of hard disk drives in the DBMS world (e.g., explain why we can’t eliminate spinning, hard disk drives anytime soon).
Explain the difference between a random read and a sequential read, and argue why one is preferable over the other.
Disks and DBMS Files
A DBMS stores information on hard disk drives, and uses the disks extensively.
This has major implications for DBMS design.
READ: transfer data from disk to main memory (RAM)
WRITE: transfer data from RAM to disk
Both are high-cost operations (relative to in-memory operations);
so, I/Os must be planned carefully!
Data is stored and retrieved in units called blocks or pages.
Unlike with RAM, the time to retrieve a disk page varies depending upon its location on disk.
Therefore, the relative placement of pages on disk has a major impact on DBMS performance!
Relationships among Files, Disks, RAM, Buffer Pool(s), DBMS, OS
Compare Disk to RAM
Cost: Compare the cost of a hard disk drive (HDD): $______ for 1 TB … to the cost of RAM: $______ for __ GB.
We may have many GB of data, but not all of it needs to be in memory. It’s very time-consuming to load it all.
Volatility: RAM is volatile, but we want data to be saved after it is updated.
Storage Hierarchy:
Main storage = RAM for currently used data
• SRAM is faster than DRAM, but more expensive
Secondary storage = Disk for most data
• Some systems are using Solid-State Drives (SSDs) instead of HDDs. HDDs are still very common in large IT installations.
Tertiary storage (for archives and off-site) = Disk, Tape
Typical Memory Hierarchy (source: B&O ‘11)
Levels toward the top (L0) are smaller, faster, and costlier per byte; levels toward the bottom (L5) are larger, slower, and cheaper per byte.
L0: registers. CPU registers hold words retrieved from L1 cache.
L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from L2 cache.
L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (tapes, distributed file systems, Web servers)
Storage Device Hierarchy
Components of a Disk Drive
The platters spin (e.g., 15000 rpm).
The whole arm assembly is moved in, or out, to position a head on a desired track, k.
All tracks with the same track # (e.g., track k) make up cylinder k.
One head per surface. Only one head reads or writes at a time.
Block size is a multiple of sector size (which is fixed).
Figure labels: Spindle, Platters, Tracks, Sector, Disk Head, Arm assembly, Arm Movement
Accessing a Disk Page
Access time (i.e., the time to read or write a disk page, sometimes called a block, especially if it’s a contiguous group of pages) is typically made up of 3 components:
1. Seek time (i.e., move the arm to position the disk head just above a particular cylinder (or track, same thing))
2. Rotational delay (wait for the start of the desired block to come around and be positioned under the head)
3. Transfer time (the time it takes to transmit the data between disk and RAM)
Seek time and rotational delay dominate.
Seek time varies from about 1-20 ms
Rotational delay varies from about 0-10 ms
The transfer time is determined by the disk’s rotation rate, and it’s often much less than 1 ms per 4K page. We’ll assume that the extra rotation time (i.e., the time during which all the data for the requested page passes under the head) equals the transfer time.
Simple (Toy) Example of Disk Geometry Including Performance Parameters
“Megatron 747” disk drive:
rpm = 3840
block size (page size) = 4096 bytes (4K)
4 platters of 2 surfaces each
2^13 = 8192 cylinders
average # sectors/track = 2^8 = 256
# bytes/sector = 2^9 = 512
Moving the head assembly between cylinders c1 & c2
= 1 ms setup time + 1 ms per 500 cylinders moved
(Modern disks are, of course, much faster and have higher capacity, but the same calculations apply.)
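The geometry figures above are enough to work out how much data each track, cylinder, and the whole drive can hold. The following is a minimal sketch of that arithmetic; the variable names are illustrative choices, not from the slides, and the calculation assumes every track holds exactly the average number of sectors.

```python
# Megatron 747 geometry, taken from the figures listed above.
PLATTERS = 4
SURFACES_PER_PLATTER = 2
CYLINDERS = 2**13           # 8192 cylinders (one track per surface per cylinder)
SECTORS_PER_TRACK = 2**8    # 256, assumed identical for every track
BYTES_PER_SECTOR = 2**9     # 512
BLOCK_SIZE = 4096           # bytes per block (page)

surfaces = PLATTERS * SURFACES_PER_PLATTER
bytes_per_track = SECTORS_PER_TRACK * BYTES_PER_SECTOR
bytes_per_cylinder = surfaces * bytes_per_track
capacity_bytes = CYLINDERS * bytes_per_cylinder

sectors_per_block = BLOCK_SIZE // BYTES_PER_SECTOR
blocks_per_track = bytes_per_track // BLOCK_SIZE

print("surfaces:", surfaces)
print("bytes per track:", bytes_per_track)
print("bytes per cylinder:", bytes_per_cylinder)
print("capacity (GiB):", capacity_bytes / 2**30)
print("sectors per block:", sectors_per_block)
print("blocks per track:", blocks_per_track)
```

Under these assumptions a block spans 8 sectors, a track holds 32 blocks, and the drive holds 2^33 bytes.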
Example (cont.)
What is the maximum seek time?
What is the average seek time?
What is the maximum rotational latency time?
What is the average rotational latency time?
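One way to check answers to these questions is to code the seek formula from the Megatron 747 slide directly. This is a sketch under two labeled assumptions that the slides do not state: a seek of zero cylinders costs nothing (no setup time), and the average seek distance between two random cylinders is about one-third of the total number of cylinders. The function and constant names are hypothetical.

```python
RPM = 3840
CYLINDERS = 2**13          # 8192
SETUP_MS = 1.0             # setup time per seek
CYLINDERS_PER_MS = 500     # 1 ms per 500 cylinders moved


def seek_ms(cylinders_moved):
    """Seek time in ms for a move across the given number of cylinders."""
    if cylinders_moved == 0:
        return 0.0  # assumption: staying on the same cylinder costs no seek
    return SETUP_MS + cylinders_moved / CYLINDERS_PER_MS


# Maximum seek: from the innermost cylinder to the outermost one.
max_seek_ms = seek_ms(CYLINDERS - 1)

# Average seek: assumes the mean distance between two uniformly random
# cylinders is roughly one-third of the cylinder count.
avg_seek_ms = seek_ms(CYLINDERS / 3)

# Rotational latency: worst case is one full rotation, average is half.
rotation_ms = 60_000 / RPM
max_rotational_ms = rotation_ms
avg_rotational_ms = rotation_ms / 2

print(f"max seek: {max_seek_ms:.2f} ms, avg seek: {avg_seek_ms:.2f} ms")
print(f"max rotational latency: {max_rotational_ms:.3f} ms")
print(f"avg rotational latency: {avg_rotational_ms:.4f} ms")
```

The minimum values are 0 ms in both cases under this model: the head is already on the right cylinder, or the block happens to arrive under the head immediately.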
Example (cont.)
What is the transfer time for a page (i.e., time to read/write a page once it’s at the head)?
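The transfer time follows from the fraction of a track that one page occupies, since the data passes under the head as the platter rotates. The sketch below assumes there are no gaps between sectors and that switching between heads within a cylinder takes no time; both are simplifications not stated on the slides, and the names are illustrative.

```python
RPM = 3840
SURFACES = 4 * 2            # 4 platters of 2 surfaces each
SECTORS_PER_TRACK = 2**8
BYTES_PER_SECTOR = 2**9
PAGE_SIZE = 4096

rotation_ms = 60_000 / RPM
bytes_per_track = SECTORS_PER_TRACK * BYTES_PER_SECTOR


def transfer_ms(n_bytes):
    """Time for n_bytes to pass under the head (no gaps assumed)."""
    return rotation_ms * n_bytes / bytes_per_track


page_transfer_ms = transfer_ms(PAGE_SIZE)
track_transfer_ms = transfer_ms(bytes_per_track)
# Assumption: heads are switched instantly, so a cylinder is just
# one full rotation per surface.
cylinder_transfer_ms = SURFACES * track_transfer_ms

print(f"one page: {page_transfer_ms:.4f} ms")
print(f"one track: {track_transfer_ms:.3f} ms")
print(f"one cylinder: {cylinder_transfer_ms:.1f} ms")
```

With these figures a page takes well under 1 ms to transfer, which is small next to typical seek and rotational delays.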
Arranging Pages on Disk
For a sequential scan of data pages, prefetching additional pages helps performance.
e.g., read 32 or more pages at a time
Decision to turn on prefetching can be made at bind time (see later), at run time, or a hybrid of the two
For some queries, a user may want to see only the first few rows of a table; but maybe after seeing the first few rows …
DBMS may be able to predict what the user might want next; hence, the term “anticipatory prefetch” or “sequential prefetch”
Source: Jim Unger, Herman cartoon
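The idea of reading 32 or more pages at a time during a sequential scan can be sketched at the application level as follows. This is only an illustration, not how any particular DBMS implements prefetching: the function name `scan_pages`, the constants, and the temporary demo file are all hypothetical, and the operating system may perform its own read-ahead underneath.

```python
import os
import tempfile

PAGE_SIZE = 4096        # bytes per page
PREFETCH_PAGES = 32     # pages fetched per I/O request


def scan_pages(path, prefetch_pages=PREFETCH_PAGES, page_size=PAGE_SIZE):
    """Yield pages one at a time, fetching them from the file in batches."""
    with open(path, "rb") as f:
        while True:
            # One large request replaces many single-page requests.
            batch = f.read(prefetch_pages * page_size)
            if not batch:
                break
            for offset in range(0, len(batch), page_size):
                yield batch[offset:offset + page_size]


# Demo: build a 100-page file, scan it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(100 * PAGE_SIZE))
    demo_path = tmp.name

pages = list(scan_pages(demo_path))
os.remove(demo_path)
print("pages scanned:", len(pages))
```

The consumer still sees one page at a time, but the number of separate read requests drops by roughly a factor of `PREFETCH_PAGES`.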
Arranging Pages on Disk (cont.)
Key to lowering I/O costs?
Reduce seek and rotation time!
The “next block” concept. We prefer to access:
Blocks on same track, followed by …
Blocks on same cylinder, followed by …
Blocks on adjacent cylinders
In other words, blocks in a file should be arranged sequentially on disk (by “next” order), so as to minimize seek and rotational delays.
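To see why sequential placement matters, the toy Megatron 747 figures from the earlier example (3840 rpm, 8192 cylinders, 256 sectors of 512 bytes per track, 4K pages) can be used to compare the two access patterns. This is a rough sketch: it assumes each random read pays an average seek and an average rotational delay, that a sequential read pays them once, and that all the sequential pages fit on one track. The function names are illustrative.

```python
RPM = 3840
CYLINDERS = 2**13
BYTES_PER_TRACK = 2**8 * 2**9   # 256 sectors x 512 bytes
PAGE_SIZE = 4096

ROTATION_MS = 60_000 / RPM
# Assumption: average seek distance is about one-third of the cylinders.
AVG_SEEK_MS = 1.0 + (CYLINDERS / 3) / 500
AVG_ROTATIONAL_MS = ROTATION_MS / 2
PAGE_TRANSFER_MS = ROTATION_MS * PAGE_SIZE / BYTES_PER_TRACK


def random_read_ms(n_pages):
    """Every page needs its own seek, rotational delay, and transfer."""
    return n_pages * (AVG_SEEK_MS + AVG_ROTATIONAL_MS + PAGE_TRANSFER_MS)


def sequential_read_ms(n_pages):
    """One seek and one rotational delay, then consecutive pages stream.

    Only valid while the pages sit next to each other on a single track.
    """
    return AVG_SEEK_MS + AVG_ROTATIONAL_MS + n_pages * PAGE_TRANSFER_MS


n = 32  # one full track of 4K pages
print(f"random:     {random_read_ms(n):.1f} ms")
print(f"sequential: {sequential_read_ms(n):.1f} ms")
print(f"speed-up:   {random_read_ms(n) / sequential_read_ms(n):.1f}x")
```

For a single page the two costs are identical; the gap opens up as more pages are read, because the seek and rotational delay are paid only once in the sequential case.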
Question: Why do we sometimes put indexes on different disk drives than tables?
Storage Area Networks (SANs)
A SAN is a group of networked disk devices. They can be shared by a group of machines.
High performance
Very reliable (i.e., high availability)
Great for a DBMS
UBC uses SANs for its storage.
There’s a RAID configuration, which means the storage system has
redundancy: there are at least two ways that data can be read.
A disk drive can be swapped quickly, if it fails.
If one disk fails, applications can continue, with only a small slow-down. The failed disk gets rebuilt (on a spare disk drive) in the background, using the redundant information.
Snapshots of directories are taken (incrementally) at one-hour intervals, so we can recover files to the way they were 1, 2, …, k hours ago.
Storage Area Networks (cont.)
Faculty, staff, and students can self-recover their files to the state that those files were in, at every hour in the past 48 hours, because hourly incremental backups are made in our file system.
Solid State Disks (SSDs)
Flash memory is based on EEPROMs
e.g., USB memory sticks, MP3 players, cameras, cell phones, PDAs
SSD is a storage technology based on flash memory.
Advantages
No spinning disks or moving parts
• Less likely to be damaged
Lower power requirements
• e.g., one-third to one-half of the power
• Less heat; therefore, less cooling needed
Quieter
Faster reads (e.g., 0.1 ms vs. 5-10 ms)
Solid State Disks (cont.)
Disadvantages:
Cost (e.g., much more expensive per byte)
• e.g., 256 GB SSD = $100; 1 TB HDD = $50 (Sept. 2018)
Smaller Capacity (typically less, due to cost)
Random writes are slower than reads because you can’t just write a single, changed page p to disk; you have to write a whole bank (chunk) X of data.
• e.g., block X is 0.5 MB; you have to copy X and identify changes, then erase the whole original block (e.g., 1 ms) before re-writing
Limited number of write cycles (1-5 million?), and then the block of memory can’t be used anymore
SSDs probably only get us 1 order of magnitude closer to the speed of RAM.
• Recall: 5 orders of magnitude difference between RAM and HDD
• A big improvement, but there’s still a big gap.
Summary
Be aware of trade-offs in the capacity, access time, and expense of various components of the storage hierarchy.
RAM vs. Disk: About 5 orders of magnitude difference in speeds (nanoseconds vs. milliseconds)
Caching can take place at various places in the storage hierarchy.
Disk storage keeps getting faster and cheaper; so does memory.
The amount of data to manage grows at a very fast rate.
SSDs
Solve some problems with HDDs, but have some of their own
HDDs will be with us for a long time yet.
Expect to see more computers/devices with SSDs in years to come.