Chapter 7
Storage Systems
2
• Magnetic Storage is a popular storage format
for storing large amounts of data.
• Common formats
– Hard drives
• Spinning magnetizable discs store data
• Can be read randomly
– Tape drives
• Magnetizable tapes store data
• Can only be read sequentially
7.6 Magnetic Disk Technology
3
• Hard disk platters are
mounted on spindles.
• Read/write heads are
mounted on arms
• The rotating disk forms
a logical cylinder
beneath the read/write
heads.
• Data blocks are
addressed by their
cylinder, surface, and
sector.
7.6 Magnetic Disk Technology
4
Disk tracks are
numbered
from the
outside edge,
starting with
zero.
7.6 Magnetic Disk Technology
5
• Moving parts take time to get to where the data is.
• Seek time is the time that it takes for a disk arm to
move into position over the desired track or
cylinder.
• Rotational delay is the time that it takes for the
desired sector to move into position beneath the
read/write head.
– Average rotational delay is half the time it takes the disk
make one full rotation.
• Seek time + rotational delay = access latency.
7.6 Magnetic Disk Technology
6
• Low cost is the major advantage of hard disks.
• But their limitations include:
– Very slow compared to main memory
– Fragility
– Moving parts wear out
• Reductions in memory cost enable the
widespread adoption of solid state drives, SSDs.
– SSDs use flash memory instead of mechanical
components
– Can operate on the same bus as mechanical drives
– No moving parts:
• (almost) no access latency
• Lower power consumption
• Much Less Fragile
7.6 Magnetic Disk Technology
7
• Optical media is another form of persistent
storage.
• Common formats are CDs, DVDs, Blu-ray
• Writable variants exist
– CD-R, CD-RW, DVD-R, DVD+RW, DVD+R DL, BD-R…
– Shorter life-span than read-only variants
• Accessed similarly to hard drives except the read
head is replaced with a laser. Bits are stored as
reflective or non reflective parts of the disc.
• Reed-Solomon error-correction is used to
transparently recover from damage to the surface.
7.7 Optical Disks
8
• First-generation magnetic tape was not much more
than wide analog recording tape, having capacities
under 11MB.
• Data was usually written in nine vertical tracks:
7.8 Magnetic Tape
9
• Today’s tapes are digital, and provide multiple
gigabytes of data storage.
• Two dominant recording methods are serpentine
and helical scan
7.8 Magnetic Tape
10
• Linear Tape Open, LTO, as the name implies, is a
linear digital tape format with an openly available
specification.
• Generation 8 was released in 2017.
– Without compression, the tapes support a transfer rate of
360MB per second and each tape can hold up to 12TB.
• LTO supports several levels of error correction,
providing superb reliability.
– Tape has a reputation for being an error-prone medium.
7.8 Magnetic Tape
11
• RAID, an acronym for Redundant Array of
Independent Disks was invented to address
problems of disk reliability, cost, and performance.
• In RAID, data is stored across many disks, with
extra disks added to the array to provide error
correction (redundancy).
• The inventors of RAID, David Patterson, Garth
Gibson, and Randy Katz, provided a RAID
taxonomy that has persisted for a quarter of a
century, despite many efforts to redefine it.
7.9 RAID
12
• RAID Level 0, also known as drive spanning,
provides improved performance, but no redundancy.
– Data is written in blocks across the entire array
– The disadvantage of RAID 0 is in its low reliability. If any disk
fails, the whole array fails.
7.9 RAID
13
• RAID Level 1, also known as disk mirroring,
provides 100% redundancy, and good performance.
– All disks contain the same data.
– All disks need to fail for the array to fail
– The disadvantage of RAID 1 is cost.
7.9 RAID
14
• RAID Level 5 is RAID 4 with distributed parity.
– With distributed parity, some accesses can be serviced
concurrently, giving good performance and high reliability.
– RAID 5 is used in many commercial systems.
– Two disks need to fail for the array to fail
7.9 RAID
15
• RAID Level 6 carries two levels of error protection
over striped data: Reed-Soloman and parity.
– It can tolerate the loss of two disks.
– RAID 6 is write-intensive, but highly fault-tolerant.
– Three disks need to fail for the array to fail.
7.9 RAID
16
• RAID levels can stack, RAID 10 and RAID 0+1
combine RAID 0 and 1 together.
• RAID 5+0, 6+0, 10+0 similiarly combine RAID 0 with
RAID 5, 6, and 10 respectivly with RAID 10 already
being RAID 0 and 1 combined.
• Combined RAID is used to minimize downtime. RAID
5 maintains data when 1 drive fails but in order to
operate again, the missing drive needs to be rebuilt.
7.9 RAID
17
• Data Storage needs be reduced by using Data
Compression.
– Compression ratio is the ratio between compressed size
and uncompressed size.
• There are two types of compression
– Lossless: original data can be restored
– Lossy: some data is lost for better compression ratio
• If we have a 200KB file that we compress to 50KB,
we have a compression ratio of 4:1
7A.1 Data Compression
18
• Compression is achieved by removing data
redundancy while preserving information content.
• The information content of a group of bytes (a
message) is its entropy.
– Entropy is the measure of data required to store a symbol
based on frequency
• Entropy = Σ -log₂(P(xi))P(xi)
• The string aaaaaaaaaaaaaaaa has low entropy
• The string abcdefghijklmnop has high entropy
– Data with low entropy permit a larger compression ratio than
data with high entropy.
7A.1 Data Compression
19
• Entropy as a metric is the basis for statistical data
compression.
• Two widely-used statistical coding algorithms are
Huffman coding and arithmetic coding.
• Huffman coding builds a binary tree from the letter
frequencies in the message.
– The binary symbols for each character are read directly
from the tree.
• Symbols with the highest frequencies end up at the
top of the tree, and result in the shortest codes.
7A.2 Statistical Coding
20
• The process of building the tree begins by counting
the occurrences of each symbol in the text to be
encoded.
HIGGLETY PIGGLTY POP
THE DOG HAS EATEN THE MOP
THE PIGS IN A HURRY
THE CATS IN A FLURRY
HIGGLETY PIGGLTY POP
7A.2 Statistical Coding
21
7A.2 Statistical Coding
22
• A Ziv-Lempel (LZ) dictionary keeps a record of data that
has been compressed and references it if needs to
compress it again
• LZ77 and LZ78 are dictionary compression systems
• ZIP archives, GZIP stream compression, GIF images,
PNG images all implement variations of LZ dictionary
compression systems.
• All lossless compression systems.
7A.3 LZ Dictionary Systems
23
• Multimedia consumes an extremely large quantity of
data uncompressed.
– If a single pixel is one word, a 2000 pixel by 1000 pixel image
takes up 2 million words uncompressed.
• Most media contains data picked up by the imaging or
recording sensors that humans don’t readily notice.
– Some data can be removed with being noticed
• A lossy compression systems takes advantage of this
by removing data to get better compression ratios.
– JPEG images, MP3 audio, MP4 video are examples of
systems that do this.
7A.5 JPEG Compression
24
7A.5 JPEG Compression
25
• I/O: PIO, Memory Mapped, Interrupts, DMA.
• Buses require control lines, a clock, and data
lines. Timing diagrams specify operational details.
• Magnetic disks are a common storage mediu,
– Disk performance metrics: seek time, rotational delay
• Magnetic tape is also an archival medium.
• RAID gives disk systems improved performance
and reliability.
• Data Compression is an important part of storage
efficiency
– Lossless is good for data, Lossy is good for media.
Chapter 7 Conclusion