CS计算机代考程序代写 arm algorithm Hive Chapter 7

Chapter 7

Storage Systems

2

• Magnetic Storage is a popular storage format

for storing large amounts of data.

• Common formats

– Hard drives

• Spinning magnetizable discs store data

• Can be read randomly

– Tape drives

• Magnetizable tapes store data

• Can only be read sequentially

7.6 Magnetic Disk Technology

3

• Hard disk platters are

mounted on spindles.

• Read/write heads are

mounted on arms

• The rotating disk forms

a logical cylinder

beneath the read/write

heads.

• Data blocks are

addressed by their

cylinder, surface, and

sector.

7.6 Magnetic Disk Technology

4

Disk tracks are

numbered

from the

outside edge,

starting with

zero.

7.6 Magnetic Disk Technology

5

• Moving parts take time to get to where the data is.

• Seek time is the time that it takes for a disk arm to
move into position over the desired track or

cylinder.

• Rotational delay is the time that it takes for the

desired sector to move into position beneath the
read/write head.

– Average rotational delay is half the time it takes the disk

make one full rotation.

• Seek time + rotational delay = access latency.

7.6 Magnetic Disk Technology

6

• Low cost is the major advantage of hard disks.
• But their limitations include:

– Very slow compared to main memory

– Fragility

– Moving parts wear out

• Reductions in memory cost enable the
widespread adoption of solid state drives, SSDs.
– SSDs use flash memory instead of mechanical

components

– Can operate on the same bus as mechanical drives

– No moving parts:
• (almost) no access latency

• Lower power consumption

• Much Less Fragile

7.6 Magnetic Disk Technology

7

• Optical media is another form of persistent
storage.

• Common formats are CDs, DVDs, Blu-ray

• Writable variants exist

– CD-R, CD-RW, DVD-R, DVD+RW, DVD+R DL, BD-R…

– Shorter life-span than read-only variants

• Accessed similarly to hard drives except the read

head is replaced with a laser. Bits are stored as
reflective or non reflective parts of the disc.

• Reed-Solomon error-correction is used to

transparently recover from damage to the surface.

7.7 Optical Disks

8

• First-generation magnetic tape was not much more
than wide analog recording tape, having capacities

under 11MB.

• Data was usually written in nine vertical tracks:

7.8 Magnetic Tape

9

• Today’s tapes are digital, and provide multiple
gigabytes of data storage.

• Two dominant recording methods are serpentine

and helical scan

7.8 Magnetic Tape

10

• Linear Tape Open, LTO, as the name implies, is a
linear digital tape format with an openly available

specification.

• Generation 8 was released in 2017.

– Without compression, the tapes support a transfer rate of

360MB per second and each tape can hold up to 12TB.

• LTO supports several levels of error correction,

providing superb reliability.

– Tape has a reputation for being an error-prone medium.

7.8 Magnetic Tape

11

• RAID, an acronym for Redundant Array of
Independent Disks was invented to address

problems of disk reliability, cost, and performance.

• In RAID, data is stored across many disks, with

extra disks added to the array to provide error
correction (redundancy).

• The inventors of RAID, David Patterson, Garth

Gibson, and Randy Katz, provided a RAID
taxonomy that has persisted for a quarter of a

century, despite many efforts to redefine it.

7.9 RAID

12

• RAID Level 0, also known as drive spanning,
provides improved performance, but no redundancy.

– Data is written in blocks across the entire array

– The disadvantage of RAID 0 is in its low reliability. If any disk

fails, the whole array fails.

7.9 RAID

13

• RAID Level 1, also known as disk mirroring,
provides 100% redundancy, and good performance.

– All disks contain the same data.

– All disks need to fail for the array to fail

– The disadvantage of RAID 1 is cost.

7.9 RAID

14

• RAID Level 5 is RAID 4 with distributed parity.

– With distributed parity, some accesses can be serviced

concurrently, giving good performance and high reliability.

– RAID 5 is used in many commercial systems.

– Two disks need to fail for the array to fail

7.9 RAID

15

• RAID Level 6 carries two levels of error protection
over striped data: Reed-Soloman and parity.

– It can tolerate the loss of two disks.

– RAID 6 is write-intensive, but highly fault-tolerant.

– Three disks need to fail for the array to fail.

7.9 RAID

16

• RAID levels can stack, RAID 10 and RAID 0+1

combine RAID 0 and 1 together.

• RAID 5+0, 6+0, 10+0 similiarly combine RAID 0 with

RAID 5, 6, and 10 respectivly with RAID 10 already

being RAID 0 and 1 combined.

• Combined RAID is used to minimize downtime. RAID

5 maintains data when 1 drive fails but in order to

operate again, the missing drive needs to be rebuilt.

7.9 RAID

17

• Data Storage needs be reduced by using Data

Compression.

– Compression ratio is the ratio between compressed size

and uncompressed size.

• There are two types of compression

– Lossless: original data can be restored

– Lossy: some data is lost for better compression ratio

• If we have a 200KB file that we compress to 50KB,
we have a compression ratio of 4:1

7A.1 Data Compression

18

• Compression is achieved by removing data
redundancy while preserving information content.

• The information content of a group of bytes (a

message) is its entropy.

– Entropy is the measure of data required to store a symbol

based on frequency

• Entropy = Σ -log₂(P(xi))P(xi)

• The string aaaaaaaaaaaaaaaa has low entropy

• The string abcdefghijklmnop has high entropy

– Data with low entropy permit a larger compression ratio than

data with high entropy.

7A.1 Data Compression

19

• Entropy as a metric is the basis for statistical data

compression.

• Two widely-used statistical coding algorithms are

Huffman coding and arithmetic coding.

• Huffman coding builds a binary tree from the letter
frequencies in the message.

– The binary symbols for each character are read directly

from the tree.

• Symbols with the highest frequencies end up at the
top of the tree, and result in the shortest codes.

7A.2 Statistical Coding

20

• The process of building the tree begins by counting

the occurrences of each symbol in the text to be

encoded.

HIGGLETY PIGGLTY POP

THE DOG HAS EATEN THE MOP

THE PIGS IN A HURRY

THE CATS IN A FLURRY

HIGGLETY PIGGLTY POP

7A.2 Statistical Coding

21

7A.2 Statistical Coding

22

• A Ziv-Lempel (LZ) dictionary keeps a record of data that

has been compressed and references it if needs to

compress it again

• LZ77 and LZ78 are dictionary compression systems

• ZIP archives, GZIP stream compression, GIF images,

PNG images all implement variations of LZ dictionary

compression systems.

• All lossless compression systems.

7A.3 LZ Dictionary Systems

23

• Multimedia consumes an extremely large quantity of

data uncompressed.

– If a single pixel is one word, a 2000 pixel by 1000 pixel image

takes up 2 million words uncompressed.

• Most media contains data picked up by the imaging or

recording sensors that humans don’t readily notice.

– Some data can be removed with being noticed

• A lossy compression systems takes advantage of this

by removing data to get better compression ratios.

– JPEG images, MP3 audio, MP4 video are examples of

systems that do this.

7A.5 JPEG Compression

24

7A.5 JPEG Compression

25

• I/O: PIO, Memory Mapped, Interrupts, DMA.

• Buses require control lines, a clock, and data
lines. Timing diagrams specify operational details.

• Magnetic disks are a common storage mediu,

– Disk performance metrics: seek time, rotational delay

• Magnetic tape is also an archival medium.

• RAID gives disk systems improved performance

and reliability.

• Data Compression is an important part of storage

efficiency

– Lossless is good for data, Lossy is good for media.

Chapter 7 Conclusion