
Improving HDD Efficiency and Reliability

Hard disks act as bottlenecks for processing
▪ DB data is stored on disks, and must be fetched into main memory to be processed
▪ Disk access is considerably slower than main memory processing
Hard disks are also relatively unreliable
▪ Disks contain mechanical components that are more prone to failure than electronic components
One solution is to use multiple disks
John Edgar 2

Multiple HDDs
▪ Each disk contains multiple platters
▪ Disks can be read in parallel, and
▪ Different disks can read from different cylinders
▪ e.g. disk 1 can read data from cylinder 6,000, while disk 2 reads data from cylinder 11,000
Single HDD
▪ Multiple platters
▪ Disk heads are always over the same cylinder

A disk array gives the user the abstraction of a single large disk
▪ When an I/O request is issued the physical disk blocks to be retrieved have to be identified
▪ How the data is distributed over the disks in the array affects access time
▪ And how many disks are involved in an I/O request
Data is divided into partitions called striping units
▪ The striping unit is usually either a block or a bit
▪ Striping units are distributed over the disks using a round-robin algorithm
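The round-robin placement can be sketched in a few lines of Python. This is an illustrative sketch, not from the slides; it assumes striping units are numbered from 1 and that disks are numbered 1..D.

```python
# Sketch: round-robin placement of striping units over D disks.
# Units and disks are both numbered from 1 (an assumption for illustration).

def place_unit(unit: int, num_disks: int) -> tuple[int, int]:
    """Return (disk, position) for a 1-based striping unit number."""
    disk = (unit - 1) % num_disks + 1    # round robin over disks 1..D
    position = (unit - 1) // num_disks   # 0-based slot on that disk
    return disk, position

# Unit 6 in a 4-disk array lands on disk 2, in that disk's second slot.
print(place_unit(6, 4))  # -> (2, 1)
```

Finding which physical block holds a logical striping unit is exactly this calculation, which is why the array can present itself as one large disk.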

[Figure: a notional file divided into numbered striping units 1, 2, 3, …; the units are distributed across a four-disk RAID system in round-robin fashion, so disk 1 holds units 1, 5, 9, …, disk 2 holds units 2, 6, 10, …, disk 3 holds units 3, 7, 11, …, and disk 4 holds units 4, 8, 12, …]
The size of the striping unit has an impact on the behaviour of the system

Assumptions
▪ Multiple records fit on a block
▪ Four in the example
▪ Records are composed of multiple bytes
Bit striping
▪ Bit 1 of the first record on the first block (b1r1) on D1, b2r1 on D2, b3r1 on D3, etc.
▪ All disks must be read to recreate a single record
Block striping
▪ Block 1 of the file stored on D1, block 2 on D2, block 3 on D3, etc.
▪ Each record is stored on just one disk
▪ Remaining disks are available for use
[Figure: under bit striping one record is spread across all four disks D1–D4; under block striping each block, and hence each record, resides on a single disk]

Assume that a disk array consists of D disks
▪ Data is distributed across the disks using data striping
▪ How does it perform compared to a single disk?
Consider four types of requests
▪ Random read – reading multiple, unrelated records
▪ i.e. records that are not stored close to each other
▪ Random write
▪ Sequential read – reading a number of records
▪ Such as an entire file or table
▪ Stored on more than D blocks in close proximity
▪ Sequential write

Use all D disks to improve efficiency, and distribute data using block striping
Random reads and writes
▪ Up to D different records can be read from or written to at once
▪ Depending on which disks the records reside on
▪ Records might reside on different disks, or – if unlucky – all on the same disk
Sequential reads and writes
▪ Related data are distributed over all D disks, so performance is D times faster than a single disk
▪ Writes generally entail reading the data first
But what about reliability …

The standard hard drive failure measure is MTBF, expressed in hours
▪ Annualized Failure Rate (AFR) expresses the probability per year that a drive will fail
▪ AFR ≈ 8766 / MTBF
▪ 8766? The average number of hours in a year
Modern enterprise hard drives have MTBF values of around 1,000,000 hours
▪ Consumer quality drives are around half these values
▪ Hard drive reliability has increased substantially over time
MTBF values are usually published by manufacturers
▪ And should be assumed to give a flattering picture of a drive's reliability

Hard disks contain mechanical components and are less reliable than other, purely electronic, components
▪ Increasing the number of hard disks decreases reliability, reducing the mean-time-between-failures (MTBF)
▪ Let's say the MTBF of a hard disk is ≈ 500,000 hours, or 57 years
In a disk array the overall MTBF decreases
▪ Because the number of disks is greater
▪ MTBF of a 100 disk array is around 208 days: (500,000 ÷ 100) ÷ 24
▪ This assumes that failures occur independently and
▪ The failure probability does not change over time
Reliability can be improved by storing redundant data
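The figures above follow from two one-line calculations; a quick check in Python (assuming, as the slides do, independent failures and a constant failure rate):

```python
# Rough check of the MTBF/AFR figures above.
HOURS_PER_YEAR = 8766  # average number of hours in a year

def afr(mtbf_hours: float) -> float:
    """Approximate annualized failure rate from an MTBF in hours."""
    return HOURS_PER_YEAR / mtbf_hours

def array_mtbf_days(disk_mtbf_hours: float, num_disks: int) -> float:
    """MTBF of an array of independent identical disks, in days."""
    return disk_mtbf_hours / num_disks / 24

print(round(afr(500_000) * 100, 2))          # ~1.75% per year per drive
print(round(array_mtbf_days(500_000, 100)))  # ~208 days for 100 disks
```

The same 500,000-hour drive that fails once in 57 years on its own becomes, in a 100-disk array, an expected failure roughly every seven months.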

Reliability of a disk array can be improved by storing redundant data
If a disk fails the redundant data can be used to reconstruct the data lost on the failed disk
▪ The data can either be stored on a separate check disk or
▪ Distributed uniformly over all the disks
Redundant data is typically stored using one of two methods
▪ Mirroring, where each disk is duplicated
▪ A parity scheme, where sufficient redundant data is maintained to recreate the data on any one disk
▪ Other redundancy schemes provide greater reliability

For each bit on the data disks there is a parity bit on a check disk
▪ If the sum of the data disk bits is even the parity bit is set to zero
▪ If the sum of the bits is odd the parity bit is set to one
The data on any one failed disk can be recreated bit by bit
[Figure: the first three bytes of the data disks in a four (data) disk RAID system, alongside the corresponding bytes of the system's check disk. Reading is not affected by the parity data. Writing either recalculates the new value of each parity bit from all the data disks, or flips the parity bit if the corresponding data bit has changed]

A RAID system consists of several disks organized to increase performance and improve reliability
▪ Performance is improved through data striping
▪ Reliability is improved through redundancy
▪ RAID stands for Redundant Arrays of Independent Disks
There are several RAID schemes or levels
▪ The levels differ in terms of their
▪ Read and write performance,
▪ Reliability, and
▪ Cost

 All D disks are used to improve efficiency, and data is distributed using block striping
 No redundant information is kept
 Read and write performance is very good
 Reliability is poor
▪ Unless data is regularly backed up a RAID 0 system should only be used when the data is not important
 A RAID 0 system is the cheapest of all RAID levels
▪ Because there are no disks used for storing redundant data
▪ With D data disks there are 0 check disks

An identical copy is kept of each disk in the system, hence the term mirroring
Read performance is similar to a single disk
▪ No data striping, but parallel reads of the duplicate disks can be made, improving random read performance
Write performance is worse than a single disk, as the duplicate disk has to be written to
▪ Writes to the original and mirror should not be performed simultaneously in case of a global system failure
▪ But write performance is superior to most other RAID levels
Very reliable but costly
▪ With D data disks, a level 1 RAID system has 2D disks

RAID level 1+0, sometimes referred to as RAID level 10, combines both striping and mirroring
Very good read performance
▪ Similar to RAID level 0
▪ 2D times the speed of a single disk for sequential reads
▪ Up to 2D times the speed of a single disk for random reads
▪ Allows parallel reads of blocks that, conceptually, reside on the same disk
Poor write performance, similar to RAID level 1
Very reliable but the most expensive RAID level
▪ Requires 2D disks
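The layout can be sketched with a hypothetical numbering (not from the slides): disks 0..D-1 hold the stripes and disk i is mirrored by disk i + D.

```python
# Sketch of a RAID 1+0 layout with a made-up numbering: D striped primary
# disks 0..D-1, each mirrored by disk i + D. A read can be served by either
# copy (enabling the parallel reads mentioned above); a write must hit both.

def raid10_disks(block: int, d: int) -> tuple[int, int]:
    """Return (primary, mirror) disk numbers for a logical block."""
    primary = block % d           # round-robin block striping over D disks
    return primary, primary + d   # the mirror of disk i is disk i + d

print(raid10_disks(6, 4))  # -> (2, 6): block 6 stripes to disk 2, mirrored on disk 6
```

Serving two random reads of blocks 2 and 6 in parallel is possible here even though both blocks conceptually live on disk 2, because the second read can go to the mirror.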

Writing is the Achilles heel of RAID
▪ Data and check disks should not be written to simultaneously
▪ Parity information may have to be read before check disks are written
Bit striping random and sequential writes, block striping sequential writes
▪ Write all data disks using a read-modify-write (r-m-w) cycle
▪ Recalculate parity data from the data disks
▪ Write the new parity data to the check disk
Block striping random writes
▪ Write to a single disk using a r-m-w cycle
▪ Read the check disk and calculate the parity data
▪ Write the check disk
In many RAID systems writing is less efficient than on a single disk!
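The block-striping random-write cycle above works because the new parity can be derived from just the old data, the new data, and the old parity, so only two disks are touched. A sketch of that update (illustrative values):

```python
# Sketch of the r-m-w parity update for a block-striped random write:
# new_parity = old_parity XOR old_data XOR new_data.
# Only the target data disk and the check disk are read and written.

def updated_parity(old_data: int, new_data: int, old_parity: int) -> int:
    """Parity byte after overwriting one data block (XOR/even-parity scheme)."""
    return old_parity ^ old_data ^ new_data

# Consistency check against recomputing the parity from scratch.
disks = [0x0F, 0x33, 0x55]                  # made-up data bytes
parity = disks[0] ^ disks[1] ^ disks[2]     # initial check-disk byte
parity = updated_parity(disks[1], 0xA7, parity)
disks[1] = 0xA7
assert parity == disks[0] ^ disks[1] ^ disks[2]
```

This is why a random write costs a read and a write on each of two disks, rather than touching every disk in the array.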

 A RAID system with D disks can read data up to D times faster than a single disk system
 For sequential reads there is no performance difference between bit striping and block striping
 Block striping is more efficient for random reads
▪ With bit striping all D disks have to be read to recreate a single record (and block) of the data file
▪ With block striping a complete record is stored on one disk therefore only that one disk is required to satisfy a random read
 Write performance is similar except that it is affected by the parity scheme
▪ Random writes are typically inefficient

Level 2 – Memory-Style Error Correcting Code
▪ The striping unit is a single bit
▪ Uses a scheme that allows the failed disk to be identified
▪ Which increases the number of disks required
▪ Modern disk controllers can detect a failed disk, so this is unnecessary
▪ Can only tolerate the loss of a single disk
Level 3 – Byte-Interleaved Parity
▪ The striping unit is a byte
▪ Random read and write performance is poor as all disks have to be accessed for each request
▪ Can tolerate the loss of a single disk
▪ Requires D + 1 disks

Uses block striping to distribute data over the disks
Uses one redundant disk containing parity data
▪ The ith block on the redundant disk contains parity checks for the ith blocks of all data disks
Good sequential read performance
▪ D times single disk speed
Very good random read performance
▪ Disks can be read independently, therefore up to D times single disk speed

Cost is moderate
▪ Only one check disk is required
The system can tolerate the loss of one drive
Write performance is poor for random writes
▪ In which different data disks are written independently
▪ For each such write a write to the redundant disk is also required
Performance can be improved by distributing the redundant data across all disks – RAID level 5

The dedicated check disk in RAID level 4 tends to act as a bottleneck for random writes
RAID level 5 does not have a dedicated check disk but distributes the parity data across all disks
▪ Increasing the performance of random writes
▪ Read performance is marginally improved
▪ Sequential read and write performance is similar to level 4
Cost is moderate, with the same effective space utilization as level 4
The system can tolerate the loss of one drive

RAID levels 4 and 5 can only cope with single disk crashes
▪ If multiple disks crash at the same time data will be lost
It is unlikely that two disks will fail at the same time
▪ At least, by coincidence
Replacing a disk is time consuming
▪ Therefore it is possible that a second disk may fail during the rebuilding process
▪ RAID level 6 allows systems to deal with multiple disk crashes

 RAID level 6 records two sets of parity data for each set of bits across all disks
▪ Using two different parity schemes
▪ The system requires at least 4 disks
▪ In a 4 disk system half of the data is parity data
▪ As the number of disks increases the percentage of redundant data decreases
 The redundancy data allows a system to recover from two simultaneous disk crashes

The parity data is distributed across all disks
▪ Read performance is similar to level 5
Write performance is worse than level 5
▪ Sequential writes are slower since two sets of parity data have to be calculated
▪ Random writes are considerably slower since the two sets of parity data are on separate disks
▪ Which requires a read-modify-write cycle for each disk
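The random-write penalty can be made concrete by counting disk I/Os. A sketch under the usual read-modify-write accounting (the helper is illustrative, not from the slides):

```python
# Sketch: disk I/Os for one random (small) write under a read-modify-write
# scheme, with one parity block (RAID 5) or two (RAID 6).

def small_write_ios(parity_blocks: int) -> int:
    reads = 1 + parity_blocks    # read old data block and old parity block(s)
    writes = 1 + parity_blocks   # write new data block and new parity block(s)
    return reads + writes

print(small_write_ios(1))  # RAID 5: 4 I/Os per random write
print(small_write_ios(2))  # RAID 6: 6 I/Os per random write
```

Each extra parity scheme adds a read and a write per random write, which is exactly why level 6 random writes are considerably slower than level 5's.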

 In real-life RAID systems the disk array is partitioned into reliability groups
▪ A reliability group consists of a set of data disks and a set of check disks
▪ The number of check disks depends on the reliability level that is selected
 Using a RAID system and reliability groups greatly increases reliability over a single disk
▪ Making the possibility of failure relatively remote

RAID level summary
▪ Level 0 is cheap, and improves performance but not reliability
▪ Level 1+0 is better than level 1, and has the best write performance of the redundant levels
▪ Levels 2 and 4 are always inferior to 3 and 5
▪ Level 3 is worse than 4 for random requests
▪ Level 5 is a good general-purpose solution
▪ Level 6 is appropriate if higher reliability is required
The choice is usually between 0, 1+0, 5 and 6
▪ There are a number of other RAID levels also in use, derived from the levels discussed here
▪ RAID 7, RAID-DP, RAID-K, RAID-Z, …