Storage Systems
Storage Systems
DSCI 551
Wensheng Wu
1
Storage hierarchy
2
Primary
(Cache, Memory)
Secondary
(Hard disks, SSD)
Tertiary
(Tape, Optical Disk)
S
p
e
e
d
C
o
st, E
n
e
rg
y
C
o
n
su
m
p
tio
n
Access times
LEVEL ACCESS TIME TYPICAL SIZE
Registers “instantaneous” under 1KB
Level 1 Cache 1-3 ns 64KB per core
Level 2 Cache 3-10 ns 256KB per core
Level 3 Cache 10-20 ns 2-20 MB per chip
Main Memory 30-60 ns 4-32 GB per system
Hard Disk 3,000,000-10,000,000 ns over 1TB
3
Resource: https://arstechnica.com/information-
technology/2012/06/inside-the-ssd-revolution-how-
solid-state-disks-really-work/
SSD: 25,000 ns
Time taken before drive is ready to transfer data
https://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/
CRUD
• Basic functions of a storage device
• CRUD:
– (C)reate/write
– (R)ead
– (U)pdate/overwrite
– (D)elete
4
Characterizing a storage device
• Capacity (bytes)
– How much data it can hold
• Cost ($$$)
– Price per byte of storage
• Bandwidth (bytes/sec)
– Number of bytes that can be transferred per second
– Note that read and write bandwidth may be different
• Latency (seconds)
– Time elapsed, waiting for response/delivery of data
5
Time to complete an operation
• Time to complete an operation depends on
both bandwidth and latency
– CompletionTime = Latency + Size/Bandwidth
• The time for a workload may depend on
– Technology, e.g., hard drive/SSD
– Operation type, e.g., read/write
– Number of operations in the workload
– Access pattern (random vs. sequential)
6
Access pattern
• Sequential
– Data to be accessed are located next to each other
or sequentially on the device
• Random
– Access data located randomly on storage device
7
Magnetic recording
• Write head
– Applies electrical current to write head
– Changes direction of magnetic field under head
8
Reading
• Read head senses direction of magnetic field
9
Road map
• Tapes
• Hard disk
• Solid state drive
10
Linear tape
• Data recorded on parallel tracks that span the
length of the tape
11
Tapes
• Current technology is LTO
– Linear Tape-Open (an open standard)
• Characteristics
– Capacity up to 6.25TB per tape (LTO-7)
– Drive cost ~ $2000
– Tape cost ~ $60 for 6TB tape
• Tape access time (~ minute)
– Time to mount the tape
– Time to wind the tape to correct position
• Data transmission rates ~ 250MB/sec
12
$60
300MB/s
6TB
LTO-7
Performance characteristics
• High latency/low cost makes tape most
appropriate for “archival” storage
– Low frequency of (mostly sequential) reads
– Very large data objects
• Random access will be slow due to latency
– Sequential reads will be fast
13
Linear tape file system
• Two partitions on tape
– First contains metadata and directories. Tape
reader can find and load this very quickly
– Second contains blocks for data
• Directory structure coded in XML
– Self describing file format…
14
15
Tape Cartridge
16
17
12.8TB
6.25TB
A tape library
18
Inside a robotic tape library
• https://www.youtube.com/watch?v=nYfTtvpQ
778
19
Road map
• Tapes
• Hard disk
• Solid state drive
20
Hard disk drives
• Perhaps the most pervasive form of storage
• Basic Idea:
– One or more spinning magnetic platters
• Typically two surfaces per platter
– Disk arm positions over the radial position
(tracks) where data are stored
• It swings across tracks (but do not extend/shrink)
– Data is read/written by a read/write head as
platter spins
21
22
Internal of hard disk
23
Actuator Spindle
Disk head
Platter
https://youtu.be/4iaxOUYalJU
Disk arm and platter
24
Disk arm & head close-up
25
Disk head close-ups
26
27
Disk head movement
• Hard disk head movement while copying files
between two folders (e.g., partition c to d)
– https://www.youtube.com/watch?v=BlB49F6ExkQ
28
2GB Storage in 1980s ($250,000!)
29
Physical characteristics
• 3.5″ (diameter, common in desktops)
• 2.5″ (common in laptops)
• Rotational speed
– 4,800 RPM
– 5,400 RPM
– 7,200 RPM
– 10,000 RPM
• Between 5-7 platters
• Current capacity up to 10TB (Western Digital)
30
Disk organization
• Each platter consists of a number of tracks
• Each track is divided into N fixed size sectors
– Typical sector size is 512 bytes (old) or 4KB (new)
– Sectors can be numbered from 0 to N-1
– Entire sector is written “atomically”
• All or nothing
31
CHS (cylinder-head-sector)
• Early way to address a sector
– Now LBA (Logical Block Addressing) more
common
32
https://en.wikipedia.org/wiki/Cylinder-head-sector
https://en.wikipedia.org/wiki/Logical_block_addressing
https://en.wikipedia.org/wiki/Cylinder-head-sector
CHS (Wikipedia)
33
Example
• # cylinders: 256
• # heads: 16 (i.e., 8 platters, 2 heads/platter)
• # sectors/track: 64
• Sector size = 4KB
34
What is the capacity of the drive?
A simple disk drive (one track only)
35
track
sector
Rotational latency
• Waiting for the right sector to rotate under
the head
– On average: about ½ of time of a full rotation
– Worst case?
– Best case?
36
Rotation time
• Assume 10,000 RPM (rotations per minute)
60000 𝑚𝑠
1000 𝑟𝑜𝑡𝑎𝑡𝑖𝑜𝑛𝑠
=
6 𝑚𝑠
𝑟𝑜𝑡𝑎𝑡𝑖𝑜𝑛
37
Multiple tracks: add seek times
38
Average seek time is about 1/3 max seek time
(see reading: Chapter 37, page 9 for more details)
Transfer time
• Assume that we transfer 512KB
• Assume 128 MB/sec transmission bandwidth
• Transfer time:
512KB/128MB * 1000ms = 4ms
39
Completion time
▪ T = Tseek + Trotation + Ttransfer
▪ Tseek : Time to get the disk head on right track
▪ Trotation :Time to wait for the right sector to rotate
under the head
▪ Ttransfer: Time to actually transfer the data
40
Example
• Capacity 4TB
• # platters: 4
• # heads: 8
• Bytes per sector: 4096
• Transmission bandwidth: 100MB/sec
• Maximum seek time: 12ms
• RPM: 10,000
41
Time to transfer a file
• The file occupies 100 sectors (sequentially)
• Avg. seek time =?
• Avg. rotational latency =?
• Transfer time = ?
42
Sector vs. block
• Block has 1 or more sectors
• Disk typically transfers one block at a time
• We will assume one block = one sector
– Unless stated otherwise
43
Sequential operations
• May assume all sectors involved are on same
track
– We may need to seek to the right track or rotate
to the first sector
• But no rotation/seeking needed afterward
44
Actual bandwidth
• Consider a workload w
– E.g., w = sequential access of 100 blocks of data
– Denote size (# of bytes) of data in w as |w|
– E.g., w = 400KB (100 blocks, 4KB/block)
• Suppose completion time for w = t
• Actual bandwidth (with respect to w) = |w|/t
45
Sequential vs. random
• Consider disk with 7ms avg seek, 10,000 RPM
platter speed and 50 MB/sec transfer rate,
4KB/block
• Sequential access of 10 MB
– Completion time = 7 + 3 + 10/50*1000 = 210ms
– Actual bandwidth = 10MB/210ms = 47.62 MB/s
• Random access of 10 MB (2,500 blocks)
– Completion time = 2500 * (7 + 3 + 4/50) = 25.2s
– Actual bandwidth = 10MB / 25.2s = .397 MB/s
46
Road map
• Tapes
• Hard disk
• Solid state drive
47
Solid State Drive
48
Memory chips
49
Solid State Drives
• All electronic, made from flash memory
• Lower energy consumption than hard drive
• More expensive, less capacity
– 3 times or more expensive
• Limited lifetime, can only write a limited
number of times.
– E.g., 100K program/write cycles for SLC (single-
level cell) memory
50
SSD vs. Hard Drive (price)
51
Solid State Drives
• Same form-factor and control interface as
magnetic disks
• Significantly better latency
– No seek or rotational delay
• Much better performance on random
workload:
– Benefits from improved latency
– However, write has much higher latency (but see
next slide)
52
Speed comparison (YouTube)
53
Due to buffered writes
Read vs write: significant difference in SSD vs marginal in HDD
Writing to SSD is complicated
• Can not overwrite a page
– Need to erase its block (at a certain point) instead
• SSD controllers take care of all these details
54
SSD
• Contains a number of flash memory chips
– Chip -> dies -> planes -> blocks -> pages (rows) ->
cells
– Cells are made of floating-gate transistors
• Page is the smallest unit of data transfer
between SSD and main memory
– Much like a block in hard disk
55
Die Layout
56Understanding Flash: Blocks, Pages and Program / Erases
http://flashdba.com/2014/06/20/understanding-flash-blocks-pages-and-program-erases/http:/flashdba.com/2014/06/20/understanding-flash-blocks-pages-and-program-erases/
Dies, planes, block, and pages
• Typically, a chip may have 1, 2, or 4 dies
• A die may have 1 or 2 planes
• A plane has a number of blocks
– Block is the smallest unit that can be erased
• A block has a number of pages
– Page is the smallest unit that can be read,
programmed/written
57
Typical page and block sizes
• Common page sizes: 2K, 4K, 8K, and 16K
• A block typically has 128 to 256 pages
=> Block size: 256KB to 4MB
58
Normal transistor (MOSFET)
• When current applied to gate terminal
– semi-conducting region (purple) becomes
conductive
59
Floating gate transistor
• Contain an additional gate: floating gate
– Floating, since isolated by oxide layer
– (thus not connected to other components)
60
Floating gate transistor
• By applying high positive/negative voltage to
control gate, electrons can be attracted to or
repelled from floating gate
61
Floating gate transistor
• State = 1, if no electrons in the floating gate
• State = 0, if there are electrons (negative
charges)
– Electrons stuck there even when power is off
– So state is retained
62
Read operations
• Electrons on the floating gate affect the
threshold voltage for the floating gate
transistor to conduct
• Higher voltage needed when gate has
electrons
63
Read operations
64
See here for more details
Floating gate is charged
i.e., containing electrons
Typically, .7v
1.5v
1.8v
Understanding Flash: Floating Gates and Wear
Read operations
• Apply Vint (intermediate voltage)
• If the current is detected, gate has no
electrons
=> bit = 1
• If no current, gate must have electrons
=> bit = 0
65
NAND flash layout
• Transistors are strung together in a series
– Similar to the transistors in an NAND gate
66
Output F = 0 only when both a & b = 1
I.e., F will be grounded only when both a and b conduct
NAND reading
67
Apply Von to all others so that
they all conduct, no mater
they are charged or not.
Bit line
NAND flash
68
Write and erase
• Write: 1 => 0
– Apply high POSITVE voltage (>> voltage for read)
to the control gate
– Attract electrons from channel to floating gate
(through quantum tunneling)
• Erase: 0 => 1
– Need to apply much higher NEGATIVE voltage
– Get rid of electrons from floating gate
– May stress surrounding cells
– So dangerous to do on individual pages
69
Read/write units
• Page is the smallest unit for read and write
(write is also called program, 1->0)
• Block is the smallest unit for erase (0->1)
– i.e., make cells “empty” (i.e., no electrons)
70
Example
71
Latencies: read, write, and erase
72
P/E cycle
• P: program/write; E: erase
• Every write & erase damages oxide layer
surrounding the floating-gate to some extent
• P/E cycle:
– Data are written to cells (P): cell value from 1 -> 0
– Then erased (E): 0 -> 1
73
Multi-level cell (MLC)
74
http://www.cactus-tech.com/resources/blog/details/solid-
state-drive-primer-2-slc-mlc-and-tlc-nand-flash
Note different
levels of electrons
MLC reading
• 2 bits, 3 intermediate voltages
75
0010 0111
V2 V3V1
Current vs voltage
SLC compared with MLC
• SLC:
– Less complex
– Faster
– More reliable
– Less storage
– More costly
76
Read more
• Solid-state revolution: in-depth on how SSDs
really work
• How do SSDs work?
– http://www.extremetech.com/extreme/210492-
extremetech-explains-how-do-ssds-work
77
http://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/3/
http://www.extremetech.com/extreme/210492-extremetech-explains-how-do-ssds-work
http://www.extremetech.com/extreme/210492-extremetech-explains-how-do-ssds-work
References
• How Flash Memory Works
– https://www.youtube.com/watch?v=msi5GDz9JIw
• Floating Gate Basics
– http://www.cse.scu.edu/~tschwarz/coen180/LN/fl
ash.html
• Friend of Flash
– http://www.nnc3.com/mags/LM10/Magazine/Arc
hive/2008/86/040-041_logfs/article.html
78
https://www.youtube.com/watch?v=msi5GDz9JIw
http://www.cse.scu.edu/~tschwarz/coen180/LN/flash.html
http://www.nnc3.com/mags/LM10/Magazine/Archive/2008/86/040-041_logfs/article.html
References
• Understanding Flash: Floating Gates and Wear
– https://flashdba.com/2015/01/09/understanding-
flash-floating-gates-and-wear/
• From Transistors to Functions
– http://www.cs.bu.edu/~best/courses/modules/Tr
ansistors2Gates/
79
Understanding Flash: Floating Gates and Wear
http://www.cs.bu.edu/~best/courses/modules/Transistors2Gates/
References
• Solid State Drive Primer
– https://www.cactus-
tech.com/resources/blog/details/solid-state-drive-
primer-1-the-basic-nand-flash-cell
• How Does a Transistor Work?
– https://www.youtube.com/watch?v=IcrBqCFLHIY
&feature=youtu.be
80
References
• How Flash Memory Works
– https://www.youtube.com/watch?v=s7JLXs5es7I
81
https://www.youtube.com/watch?v=s7JLXs5es7I