A Case for Redundant Arrays of Inexpensive Disks (RAID) 
David A. Patterson, Garth Gibson, and Randy H. Katz
Computer Science Division
Department of Electrical Engineering and Computer Sciences
571 Evans Hall
University of California
Berkeley, CA 94720
Abstract: Increasing performance of CPUs and memories will be squandered if not matched by a similar performance increase in I/O. While the capacity of Single Large Expensive Disks (SLED) has grown rapidly, the performance improvement of SLED has been modest. Redundant Arrays of Inexpensive Disks (RAID), based on the magnetic disk technology developed for personal computers, offers an attractive alternative to SLED, promising improvements of an order of magnitude in performance, reliability, power consumption, and scalability. This paper introduces five levels of RAIDs, giving their relative cost/performance, and compares RAID to an IBM 3380 and a Fujitsu Super Eagle.
1. Background: Rising CPU and Memory Performance
The users of computers are currently enjoying unprecedented growth in the speed of computers. Gordon Bell said that between 1974 and 1984, single chip computers improved in performance by 40% per year, about twice the rate of minicomputers [Bell 84]. In the following year Bill Joy predicted an even faster growth [Joy 85]. Mainframe and supercomputer manufacturers, having difficulty keeping pace with the rapid growth predicted by "Joy's Law," cope by offering multiprocessors as their top-of-the-line product.
But a fast CPU does not a fast system make. Gene Amdahl related CPU speed to main memory size using this rule [Siewiorek 82]: each CPU instruction per second requires one byte of main memory.

If computer system costs are not to be dominated by the cost of memory, then Amdahl's constant suggests that memory chip capacity should grow at the same rate. Gordon Moore predicted that growth rate over 20 years ago:

    transistors/chip = 2^(Year - 1964)

As predicted by Moore's Law, RAMs have quadrupled in capacity every two [Moore 75] to three years [Myers 86].

Recently the ratio of megabytes of main memory to MIPS has been defined as alpha [Garcia 84], with Amdahl's constant meaning alpha = 1. In part because of the rapid drop of memory prices, main memory sizes have grown faster than CPU speeds and many machines are shipped today with alphas of 3 or higher.
To maintain the balance of costs in computer systems, secondary storage must match the advances in other parts of the system. A key measure
of magnetic disk technology is the growth in the maximum number of bits that can be stored per square inch, or the bits per inch in a track times the number of tracks per inch. Called M.A.D., for maximal areal density, the "First Law in Disk Density" predicts [Frank 87]:

    MAD = 10^((Year - 1971)/10)
Magnetic disk technology has doubled capacity and halved price every three years, in line with the growth rate of semiconductor memory, and in practice between 1967 and 1979 the disk capacity of the average IBM data processing system more than kept up with its main memory [Stevens 81].

Capacity is not the only memory characteristic that must grow rapidly to maintain system balance, since the speed with which instructions and data are delivered to a CPU also determines its ultimate performance. The speed of main memory has kept pace for two reasons:
(1) the invention of caches, showing that a small buffer can be managed automatically to contain a substantial fraction of memory references;
(2) the SRAM technology, used to build caches, whose speed has improved at the rate of 40% to 100% per year.
In contrast to primary memory technologies, the performance of single large expensive magnetic disks (SLED) has improved at a modest rate. These mechanical devices are dominated by the seek and the rotation delays: from 1971 to 1981, the raw seek time for a high-end IBM disk improved by only a factor of two while the rotation time did not change [Harker 81]. Greater density means a higher transfer rate when the information is found, and extra heads can reduce the average seek time, but the raw seek time only improved at a rate of 7% per year. There is no reason to expect a faster rate in the near future.
To maintain balance, computer systems have been using even larger main memories or solid state disks to buffer some of the I/O activity. This may be a fine solution for applications whose I/O activity has locality of reference and for which volatility is not an issue, but applications dominated by a high rate of random requests for small pieces of data (such as transaction-processing) or by a low number of requests for massive amounts of data (such as large simulations running on supercomputers) are facing a serious performance limitation.
2. The Pending I/O Crisis
What is the impact of improving the performance of some pieces of a problem while leaving others the same? Amdahl's answer is now known as Amdahl's Law [Amdahl 67]:

    S = 1 / ((1 - f) + f/k)

where:
S = the effective speedup;
f = fraction of work in faster mode; and
k = speedup while in faster mode.
Suppose that some current applications spend 10% of their time in I/O. Then when computers are 10X faster--according to Bill Joy, in just over three years--Amdahl's Law predicts the effective speedup will be only 5X. When we have computers 100X faster--via evolution of uniprocessors or by multiprocessors--this application will be less than 10X faster, wasting 90% of the potential speedup.
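To make that arithmetic concrete, here is a minimal sketch (ours, not part of the paper) that evaluates Amdahl's Law for this example; the fraction sped up is f = 0.9, since 10% of the time is spent in I/O:

    # Amdahl's Law: S = 1 / ((1 - f) + f/k)
    def effective_speedup(f, k):
        # f: fraction of work in the faster mode; k: speedup while in that mode.
        return 1.0 / ((1.0 - f) + f / k)

    print(effective_speedup(f=0.9, k=10))    # ~5.3, i.e., "only 5X"
    print(effective_speedup(f=0.9, k=100))   # ~9.2, i.e., "less than 10X faster"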
While we can imagine improvements in software file systems via buffering for near term I/O demands, we need innovation to avoid an I/O crisis [Boral 83].
3. A Solution: Arrays of Inexpensive Disks
Rapid improvements in capacity of large disks have not been the only target of disk designers, since personal computers have created a market for inexpensive magnetic disks. These lower cost disks have lower performance as well as less capacity. Table I below compares the top-of-the-line IBM 3380 model AK4 mainframe disk, the Fujitsu M2361A "Super Eagle" minicomputer disk, and the Conner Peripherals CP 3100 personal computer disk.
Characteristic                    IBM 3380   Fujitsu M2361A   Conner CP3100   3380 v. 3100   2361 v. 3100
                                                                              (>1 means 3100 is better)
Disk diameter (inches)                14           10.5             3.5             4              3
Formatted Data Capacity (MB)        7500            600             100           .01             .2
Price/MB (controller incl.)      $18-$10        $20-$17          $10-$7         1-2.5          1.7-3
MTTF Rated (hours)               30,000         20,000          30,000             1            1.5
MTTF in practice (hours)        100,000              ?               ?             ?              ?
No. Actuators                         4              1               1            .2              1
Maximum I/O's/second/Actuator        50             40              30            .6             .8
Typical I/O's/second/Actuator        30             24              20            .7             .8
Maximum I/O's/second/box            200             40              30            .2             .8
Typical I/O's/second/box            120             24              20            .2             .8
Transfer Rate (MB/sec)                3            2.5               1            .3             .4
Power/box (W)                     6,600            640              10           660             64
Volume (cu. ft.)                     24            3.4             .03           800            110

Table I. Comparison of the IBM 3380 disk model AK4 for mainframe computers, the Fujitsu M2361A "Super Eagle" disk for minicomputers, and the Conner Peripherals CP 3100 disk for personal computers. By "Maximum I/O's/second" we mean the maximum number of average seeks and average rotates for a single sector access. Cost and reliability information on the 3380 comes from widespread experience [IBM 87] [Gawlick 87] and the information on the Fujitsu from the manual [Fujitsu 87], while some numbers on the new CP3100 are based on speculation. The price per megabyte is given as a range to allow for different prices for volume discount and different mark-up practices of the vendors. (The 8 watt maximum power of the CP3100 was increased to 10 watts to allow for the inefficiency of an external power supply, since the other drives contain their own power supplies.)
One surprising fact is that the number of I/Os per second per actuator in an inexpensive disk is within a factor of two of the large disks. In several of the remaining metrics, including price per megabyte, the inexpensive disk is superior or equal to the large disks.
The small size and low power are even more impressive since disks such as the CP3100 contain full track buffers and most functions of the traditional mainframe controller. Small disk manufacturers can provide such functions in high volume disks because of the efforts of standards committees in defining higher level peripheral interfaces, such as the ANSI X3.131-1986 Small Computer System Interface (SCSI). Such standards have encouraged companies like Adaptec to offer SCSI interfaces as single chips, in turn allowing disk companies to embed mainframe controller functions at low cost. Figure 1 compares the traditional mainframe disk approach and the small computer disk approach. Thus, the same SCSI interface chip embedded as a controller in every disk can also be used as the direct memory access (DMA) device at the other end of the SCSI bus.
Such characteristics lead to our proposal for building I/O systems as arrays of inexpensive disks, either interleaved for the large transfers of supercomputers [Kim 86] [Livny 87] [Salem 86] or independent for the many small transfers of transaction processing. Using the information in Table I, 75 inexpensive disks potentially have 12 times the I/O bandwidth of the IBM 3380 and the same capacity, with lower power consumption and cost.
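As a rough check of that claim, here is a back-of-the-envelope sketch (ours, using the Table I numbers, not a calculation from the paper):

    # 75 CP 3100s vs. one IBM 3380, using the "Formatted Data Capacity",
    # "Maximum I/O's/second/box", and "Power/box" rows of Table I.
    n = 75
    print(n * 100)              # 7,500 MB -> same formatted capacity as the 3380
    print(n * 30 / 200)         # ~11x     -> roughly the claimed I/O bandwidth gain
    print(n * 10, "vs 6600")    # 750 W vs 6,600 W -> much lower power

Using the maximum I/O rates from Table I, 75 disks give roughly 11 to 12 times the I/O rate of the 3380, consistent with the paper's estimate.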
4. Caveats
We cannot explore all issues associated with such arrays in the space available for this paper, so we concentrate on fundamental estimates of price-performance and reliability. Our reasoning is that if there are no advantages in price-performance or terrible disadvantages in reliability, then there is no need to explore further. We characterize a transaction-processing workload to evaluate the performance of a collection of inexpensive disks, but remember that such a collection is just one hardware component of a complete transaction-processing system. While designing a complete TPS based on these ideas is enticing, we will resist that temptation in this paper. Cabling and packaging, certainly an issue in the cost and reliability of an array of many inexpensive disks, is also beyond this paper's scope.
[Figure 1 graphic omitted: it contrasts the mainframe organization (CPU, memory, channel, disk controllers, disks) with the small computer organization (CPU, single-chip SCSI/DMA interface, disks with embedded controllers).]

Figure 1. Comparison of organizations for typical mainframe and small computer disk interfaces. Single chip SCSI interfaces such as the Adaptec AIC-6250 allow the small computer to use a single chip to be the DMA interface as well as provide an embedded controller for each disk [Adaptec 87]. (The price per megabyte in Table I includes everything in the shaded boxes above.)
5. And Now The Bad News: Reliability
The unreliability of disks forces computer systems managers to make backup versions of information quite frequently in case of failure. What would be the impact on reliability of having a hundredfold increase in disks? Assuming a constant failure rate--that is, an exponentially distributed time to failure--and that failures are independent--both assumptions made by disk manufacturers when calculating the Mean Time To Failure (MTTF)--the reliability of an array of disks is:

    MTTF of a Disk Array = (MTTF of a Single Disk) / (Number of Disks in the Array)
Using the information in Table I, the MTTF of 100 CP 3100 disks is 30,000/100 = 300 hours, or less than 2 weeks. Compared to the 30,000 hour (> 3 years) MTTF of the IBM 3380, this is dismal. If we consider scaling the array to 1000 disks, then the MTTF is 30 hours, or about one day, requiring an adjective worse than dismal.
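Stated as a one-liner (our restatement of the arithmetic above, not code from the paper):

    def mttf_array(mttf_disk_hours, n_disks):
        # Independent, exponentially distributed failures: the array "fails" when any disk fails.
        return mttf_disk_hours / n_disks

    print(mttf_array(30_000, 100))     # 300 hours, under two weeks
    print(mttf_array(30_000, 1000))    # 30 hours, about a day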
Without fault tolerance, large arrays of inexpensive disks are too unreliable to be useful.
6. A Better Solution: RAID
To overcome the reliability challenge, we must make use of extra disks containing redundant information to recover the original information when a disk fails. Our acronym for these Redundant Arrays of Inexpensive Disks is RAID. To simplify the explanation of our final proposal and to avoid confusion with previous work, we give a taxonomy of five different organizations of disk arrays, beginning with mirrored disks and progressing through a variety of alternatives with differing performance and reliability. We refer to each organization as a RAID level.
The reader should be forewarned that we describe all levels as if implemented in hardware solely to simplify the presentation, for RAID ideas are applicable to software implementations as well as hardware.
Reliability. Our basic approach will be to break the arrays into reliability groups, with each group having extra "check" disks containing redundant information. When a disk fails, we assume that within a short time the failed disk will be replaced and the information will be
reconstructed onto the new disk using the redundant information. This time is called the mean time to repair (MTTR). The MTTR can be reduced if the system includes extra disks to act as "hot" standby spares; when a disk fails, a replacement disk is switched in electronically. Periodically a human operator replaces all failed disks. Here are other terms that we use:
D = total number of disks with data (not including extra check disks);
G = number of data disks in a group (not including extra check disks);
C = number of check disks in a group;
nG = D/G = number of groups.
As mentioned above, we make the same assumptions that disk manufacturers make--that failures are exponential and independent. (An earthquake or power surge is a situation where an array of disks might not fail independently.) Since these reliability predictions will be very high, we want to emphasize that the reliability is only of the disk-head assemblies with this failure model, and not the whole software and electronic system. In addition, in our view the pace of technology means extremely high MTTFs are "overkill"--for, independent of expected lifetime, users will replace obsolete disks. After all, how many people are still using 20 year old disks?
The general MTTF calculation for single-error repairing RAID is given in two steps. First, the group MTTF is:

    MTTF_Group = (MTTF_Disk / (G + C)) * 1 / (Probability of another failure in a group before repairing the dead disk)

As more formally derived in the appendix, the probability of a second failure before the first has been repaired is:

    Probability of Another Failure = MTTR / (MTTF_Disk / (No. of Disks - 1)) = MTTR / (MTTF_Disk / (G + C - 1))
The intuition behind the formal calculation in the appendix comes from trying to calculate the average number of second disk failures during the repair time for X single disk failures. Since we assume that disk failures occur at a uniform rate, this average number of second failures during the repair time for X first failures is:

    X * MTTR / (MTTF of remaining disks in the group)

The average number of second failures for a single disk is then:

    MTTR / (MTTF_Disk / No. of remaining disks in the group)

The MTTF of the remaining disks is just the MTTF of a single disk divided by the number of good disks in the group, giving the result above.
The second step is the reliability of the whole system, which is approximately (since MTTF_Group is not quite distributed exponentially):

    MTTF_RAID = MTTF_Group / nG

Plugging it all together, we get:

    MTTF_RAID = (MTTF_Disk / (G + C)) * (MTTF_Disk / ((G + C - 1) * MTTR)) * (1 / nG)
              = (MTTF_Disk)^2 / ((G + C) * nG * (G + C - 1) * MTTR)
Since the formula is the same for each level, we make the abstract numbers concrete using these parameters as appropriate: D = 100 total data disks, G = 10 data disks per group, MTTF_Disk = 30,000 hours, MTTR = 1 hour, with the check disks per group C determined by the RAID level.
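A minimal sketch (ours, not from the paper) that evaluates this formula with the example parameters; the RAID level only enters through G and C:

    def mttf_raid(G, C, D=100, mttf_disk=30_000, mttr=1.0):
        # Hours until data loss for a single-error-repairing RAID.
        n_groups = D / G    # nG = D/G
        return mttf_disk ** 2 / ((G + C) * n_groups * (G + C - 1) * mttr)

    print(mttf_raid(G=1, C=1))     # mirroring (Level 1): ~4,500,000 hours (Table II)
    print(mttf_raid(G=10, C=4))    # Hamming code (Level 2), G=10: ~494,500 hours (Table III)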
Reliability Overhead Cost. This is simply the extra check disks, expressed as a percentage of the number of data disks D. As we shall see below, the cost varies with RAID level from 100% down to 4%.
Useable Storage Capacity Percentage. Another way to express this reliability overhead is in terms of the percentage of the total capacity of data disks and check disks that can be used to store data. Depending on the organization, this varies from a low of 50% to a high of 96%.
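Both quantities follow directly from G and C; a small sketch (ours) for the group sizes used in the tables below:

    def overhead_cost(G, C):
        return C / G            # extra check disks as a fraction of data disks

    def useable_capacity(G, C):
        return G / (G + C)      # fraction of all disks that hold user data

    print(overhead_cost(1, 1), useable_capacity(1, 1))      # Level 1: 100%, 50%
    print(overhead_cost(10, 4), useable_capacity(10, 4))    # Level 2, G=10: 40%, ~71%
    print(overhead_cost(25, 5), useable_capacity(25, 5))    # Level 2, G=25: 20%, ~83%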
Performance. Since supercomputer applications and transaction-processing systems have different access patterns and rates, we need different metrics to evaluate both. For supercomputers we count the number of reads and writes per second for large blocks of data, with large defined as getting at least one sector from each data disk in a group. During large transfers all the disks in a group act as a single unit, each reading or writing a portion of the large data block in parallel.
A better measure for transaction-processing systems is the number of individual reads or writes per second. Since transaction-processing systems (e.g., debits/credits) use a read-modify-write sequence of disk accesses, we include that metric as well. Ideally during small transfers each disk in a group can act independently, either reading or writing independent information. In summary, supercomputer applications need a high data rate while transaction-processing needs a high I/O rate.
For both the large and small transfer calculations we assume the minimum user request is a sector, that a sector is small relative to a track, and that there is enough work to keep every device busy. Thus sector size affects both disk storage efficiency and transfer size. Figure 2 shows the ideal operation of large and small disk accesses in a RAID.
[Figure 2 graphic omitted.]
(a) Single Large or "Grouped" Read (1 read spread over G disks)
(b) Several Small or Individual Reads and Writes (G reads and/or writes spread over G disks)
Figure 2. Large transfer vs. small transfers in a group of G disks.
The six performance metrics are then the number of reads, writes, and read-modify-writes per second for both large (grouped) or small (individual) transfers. Rather than give absolute numbers for each metric, we calculate efficiency: the number of events per second for a RAID relative to the corresponding events per second for a single disk. (This is Boral's I/O bandwidth per gigabyte [Boral 83] scaled to gigabytes per disk.) In this paper we are after fundamental differences so we use simple, deterministic throughput measures for our performance metric rather than latency.
Effective Performance Per Disk. The cost of disks can be a large portion of the cost of a database system, so the I/O performance per disk--factoring in the overhead of the check disks--suggests the cost/performance of a system. This is the bottom line for a RAID.
7. First Level RAID: Mirrored Disks
Mirrored disks are a traditional approach for improving reliability of magnetic disks. This is the most expensive option we consider since all disks are duplicated (G=1 and C=1), and every write to a data disk is also a write to a check disk. Tandem doubles the number of controllers for fault tolerance, allowing an optimized version of mirrored disks that lets reads occur in parallel. Table II shows the metrics for a Level 1 RAID assuming this optimization.
MTTF                              Exceeds Useful Product Lifetime
                                  (4,500,000 hrs or > 500 years)
Total Number of Disks             2D
Overhead Cost                     100%
Useable Storage Capacity          50%

Events/Sec vs. Single Disk        Full RAID      Efficiency Per Disk
Large (or Grouped) Reads          2D/S           1.00/S
Large (or Grouped) Writes         D/S             .50/S
Large (or Grouped) R-M-W          4D/3S           .67/S
Small (or Individual) Reads       2D             1.00
Small (or Individual) Writes      D               .50
Small (or Individual) R-M-W       4D/3            .67

Table II. Characteristics of Level 1 RAID. Here we assume that writes are not slowed by waiting for the second write to complete because the slowdown for writing 2 disks is minor compared to the slowdown S for writing a whole group of 10 to 25 disks. Unlike a "pure" mirrored scheme with extra disks that are invisible to the software, we assume an optimized scheme with twice as many controllers allowing parallel reads to all disks, giving full disk bandwidth for large reads and allowing the reads of read-modify-writes to occur in parallel.
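The per-disk figures in Table II are just the full-RAID rates divided by the 2D disks in the array; a minimal sketch (ours), with S treated as an assumed slowdown factor rather than a value given by the paper:

    def level1_metrics(D, S):
        # Full-RAID events/sec for a mirrored array of 2D disks (Table II).
        full = {
            "large_reads":  2 * D / S,        # both copies can be read in parallel
            "large_writes": D / S,            # every write goes to two disks
            "large_rmw":    4 * D / (3 * S),
            "small_reads":  2 * D,
            "small_writes": D,
            "small_rmw":    4 * D / 3,
        }
        per_disk = {name: rate / (2 * D) for name, rate in full.items()}
        return full, per_disk

    full, per_disk = level1_metrics(D=100, S=1.3)   # S=1.3 is only an example value
    print(per_disk["large_reads"])    # 1.00/S
    print(per_disk["small_writes"])   # .50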
When individual accesses are distributed across multiple disks, average queueing, seek, and rotate delays may differ from the single disk case. Although bandwidth may be unchanged, it is distributed more evenly, reducing variance in queueing delay and, if the disk load is not too high, also reducing the expected queueing delay through parallelism [Livny 87]. When many arms seek to the same track then rotate to the described sector, the average seek and rotate time will be larger than the average for a single disk, tending toward the worst case times. This effect should not generally more than double the average access time to a single sector while still getting many sectors in parallel. In the special case of mirrored disks with sufficient controllers, the choice between arms that can read any data sector will reduce the time for the average read seek by up to 45% [Bitton 88].

To allow for these factors but to retain our fundamental emphasis, we apply a slowdown factor, S, when there are more than two disks in a group. In general, 1 ≤ S < 2 whenever groups of disks work in parallel. With synchronous disks the spindles of all disks in the group are synchronized so that the corresponding sectors of a group of disks pass under the heads simultaneously [Kurzweil 88], so for synchronous disks there is no slowdown and S = 1. Since a Level 1 RAID has only one data disk in its group, we assume that the large transfer requires the same number of disks acting in concert as found in groups of the higher level RAIDs: 10 to 25 disks.
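The bound 1 ≤ S < 2 can be made plausible with a small simulation (our own sketch, not from the paper; modeling seek-plus-rotate positioning time as uniformly distributed between zero and a fixed worst case is an assumption, not a measurement). A grouped access must wait for the slowest of G arms, and the expected maximum of G such delays approaches, but never exceeds, twice the single-disk average:

    import random

    def grouped_slowdown(group_size, trials=100_000):
        # Expected slowest positioning time in a group of disks, relative to
        # the expected positioning time of a single disk (uniform(0, 1) model).
        single_mean = 0.5
        worst = sum(max(random.random() for _ in range(group_size))
                    for _ in range(trials)) / trials
        return worst / single_mean

    for g in (2, 10, 25):
        print(g, round(grouped_slowdown(g), 2))
    # Prints roughly 1.33, 1.82, and 1.92: between 1 and 2, as assumed for S.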
Duplicating all disks can mean doubling the cost of the database system or using only 50% of the disk storage capacity. Such largess inspires the next levels of RAID.
8. Second Level RAID: Hamming Code for ECC
The history of main memory organizations suggests a way to reduce the cost of reliability. With the introduction of 4K and 16K DRAMs, computer designers discovered that these new devices were subject to losing information due to alpha particles. Since there were many single bit DRAMs in a system and since they were usually accessed in groups of 16 to 64 chips at a time, system designers added redundant chips to correct single errors and to detect double errors in each group. This increased the number of memory chips by 12% to 38%--depending on the size of the group--but it significantly improved reliability.

As long as all the data bits in a group are read or written together, there is no impact on performance. However, reads of less than the group size require reading the whole group to be sure the information is correct, and writes to a portion of the group mean three steps:
1) a read step to get all the rest of the data;
2) a modify step to merge the new and old information;
3) a write step to write the full group, including check information.
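Whether the group is a row of memory chips or a set of interleaved storage units, those three steps can be sketched in a few lines (our own illustration with invented names; a single XOR parity block stands in for the full check information):

    from functools import reduce

    def write_partial(group, new_bytes, offset):
        # 1) read step: fetch the rest of the data in the group
        whole = bytearray(b"".join(group["data"]))
        # 2) modify step: merge the new and the old information
        whole[offset:offset + len(new_bytes)] = new_bytes
        # 3) write step: write the full group back, including check information
        unit = len(group["data"][0])
        group["data"] = [bytes(whole[i:i + unit]) for i in range(0, len(whole), unit)]
        group["check"] = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                                group["data"])

    group = {"data": [b"abcd", b"efgh", b"ijkl"], "check": b""}
    write_partial(group, b"XY", offset=5)
    print(group["data"])   # [b'abcd', b'eXYh', b'ijkl'] -- only part of the group
                           # changed, yet the whole group and its check info were rewritten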
Since we have scores of disks in a RAID and since some accesses are to groups of disks, we can mimic the DRAM solution by bit-interleaving the data across the disks of a group and then adding enough check disks to detect and correct a single error. A single parity disk can detect a single error, but to correct an error we need enough check disks to identify the disk with the error. For a group size of 10 data disks (G) we need 4 check disks (C) in total, and if G = 25 then C = 5 [Hamming 50]. To keep down the cost of redundancy, we assume the group size will vary from 10 to 25.

Since our individual data transfer unit is just a sector, bit-interleaved disks mean that a large transfer for this RAID must be at least G sectors. Like DRAMs, reads of a smaller amount imply reading a full sector from each of the bit-interleaved disks in a group, and writes of a single unit involve the read-modify-write cycle to all the disks. Table III shows the metrics of this Level 2 RAID.
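Before turning to those metrics, note that the check-disk counts just quoted follow from the single-error-correcting Hamming bound: C check disks must distinguish "no failure" from a failure on any one of the G + C disks, so we need the smallest C with 2^C >= G + C + 1. A short check (ours, not the paper's) reproduces C = 4 for G = 10 and C = 5 for G = 25:

    def check_disks(g):
        # Smallest C such that 2**C >= G + C + 1 (single-error-correcting Hamming code).
        c = 1
        while 2 ** c < g + c + 1:
            c += 1
        return c

    for g in (10, 25):
        print(g, check_disks(g))   # prints 10 4 and 25 5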
                               G=10                        G=25
MTTF                           Exceeds Useful Lifetime
                               (494,500 hrs                (103,500 hrs
                                or >50 years)               or 12 years)
Total Number of Disks          1.40D                       1.20D
Overhead Cost                  40%                         20%
Useable Storage Capacity       71%                         83%

Events/Sec        Full RAID    Efficiency Per Disk         Efficiency Per Disk
(vs Single Disk)               L2        L2/L1             L2        L2/L1
Large Reads       D/S          .71/S      71%              .86/S      86%
Large Writes      D/S          .71/S     143%              .86/S     172%
Large R-M-W       D/S          .71/S     107%              .86/S     129%
Small Reads       D/SG         .07/S       6%              .03/S       3%
Small Writes      D/2SG        .04/S       6%              .02/S       3%
Small R-M-W       D/SG         .07/S       9%              .03/S       4%

Table III. Characteristics of a Level 2 RAID. The L2/L1 column gives the % performance of level 2 in terms of level 1 (>100% means L2 is faster). As long as the transfer unit is large enough to spread over all the data disks of a group, the large I/Os get the full bandwidth of each disk, divided by S to allow all disks in a group to complete. Level 1 large reads are faster because data is duplicated and so the redundancy disks can also do independent accesses. Small I/Os still require accessing all the disks in a group, so only D/G small I/Os can happen at a time, again divided by S to allow a group of disks to finish. Small Level 2 writes are like small R-M-W because full sectors must be read before new data can be written onto part of each sector.
For large writes, the level 2 system has the same performance as level 1 even though it uses fewer check disks, and so on a per disk basis it outperforms level 1. For small data transfers the performance is dismal either for the whole system or per disk; all the disks of a group must be accessed for a small transfer, limiting the maximum number of simultaneous accesses to D/G. We also include the slowdown factor S since the access must wait for all the disks to complete.

Thus level 2 RAID is desirable for supercomputers but inappropriate for transaction processing systems, with increasing group size increasing the disparity in performance per disk for the two applications. In recognition of this fact, Thinking Machines Incorporated announced a Level 2 RAID this year for its Connection Machine supercomputer called the "Data Vault," with G = 32 and C = 8, including one hot standby spare [Hillis 87].

Before improving small data transfers, we concentrate once more on lowering the cost.
9. Third Level RAID: Single Check Disk Per Group
Most check disks in the level 2 RAID are used to determine which disk failed, for only one redundant parity disk is needed to detect an error. These extra disks are truly "redundant" since most disk controllers can already detect if a disk failed, either through special signals provided in the disk interface or the extra checking information at the end of a sector used to detect and correct soft errors. So information on the failed disk can be reconstructed by calculating the parity of the remaining good disks and then comparing bit-by-bit to the parity calculated for the original full group.
When these two parities agree, the failed bit was a 0; otherwise it was a 1. If the check disk is the failure, just read all the data disks and store the group parity in the replacement disk.
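A minimal sketch of that reconstruction (our own illustration; the helper and the sample contents are invented): the lost contents are recovered by XORing the surviving data disks with the check disk, which is exactly the bit-by-bit comparison just described -- wherever the recomputed parity disagrees with the stored parity, the failed bit was a 1.

    def xor_blocks(blocks):
        # Bitwise XOR of equal-length byte strings.
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    data_disks = [b"\x0f\xf0", b"\x33\x33", b"\xaa\x55"]
    check_disk = xor_blocks(data_disks)            # group parity maintained by the array

    # Disk 1 fails: the parity of the remaining good disks, compared (XORed)
    # with the stored group parity, yields the failed disk's contents.
    survivors = [data_disks[0], data_disks[2]]
    recovered = xor_blocks(survivors + [check_disk])
    assert recovered == data_disks[1]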
Reducing the check disks to one per group (C=1) reduces the overhead cost to between 4% and 10% for the group sizes considered here. The performance for the third level RAID system is the same as the Level 2 RAID, but the effective performance per disk increases since it needs fewer check disks. This reduction in total disks also increases reliability, but since the MTTF is already larger than the useful lifetime of disks, this is a minor point. One advantage of a level 2 system over level 3 is that the extra check information associated with each sector to correct soft errors is not needed, increasing the capacity per disk by perhaps 10%. Level 2 also allows all soft errors to be corrected "on the fly" without having to reread a sector. Table IV summarizes the third level RAID characteristics and Figure 3 compares the sector layout and check disks for levels 2 and 3.
                               G=10                        G=25
MTTF                           Exceeds Useful Lifetime
                               (820,000 hrs                (346,000 hrs
                                or >90 years)               or 40 years)
Total Number of Disks          1.10D                       1.04D
Overhead Cost                  10%                         4%
Useable Storage Capacity       91%                         96%

Events/Sec        Full RAID    Efficiency Per Disk         Efficiency Per Disk
(vs Single Disk)               L3     L3/L2    L3/L1       L3     L3/L2    L3/L1
Large Reads       D/S          .91/S   127%     91%        .96/S   112%     96%
Large Writes      D/S          .91/S   127%    182%        .96/S   112%    192%
Large R-M-W       D/S          .91/S   127%    136%        .96/S   112%    142%
Small Reads       D/SG         .09/S   127%      8%        .04/S   112%      3%
Small Writes      D/2SG        .05/S   127%      8%        .02/S   112%      3%
Small R-M-W       D/SG         .09/S   127%     11%        .04/S   112%      5%

Table IV. Characteristics of a Level 3 RAID. The L3/L2 column gives the % performance of L3 in terms of L2 and the L3/L1 column gives it in terms of L1 (>100% means L3 is faster). The performance for the full systems is the same in RAID levels 2 and 3, but since there are fewer check disks the performance per disk improves.
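Plugging the standard parameters into the MTTF formula given earlier in the paper (D = 100 data disks, MTTF of a disk = 30,000 hours, MTTR = 1 hour, and C = 1 check disk per group) reproduces the MTTF entries of Table IV; the short check below is our own, not part of the paper:

    def mttf_raid(mttf_disk, mttr, d, g, c):
        # Approximate MTTF of a single-error-repairing RAID:
        # MTTF_disk**2 / ((G+C) * nG * (G+C-1) * MTTR), with nG = D/G groups.
        n_groups = d / g
        return mttf_disk ** 2 / ((g + c) * n_groups * (g + c - 1) * mttr)

    for g in (10, 25):
        print(g, round(mttf_raid(30_000, 1, 100, g, 1)))
    # G=10 -> about 818,000 hours and G=25 -> about 346,000 hours,
    # matching the roughly 820,000 and 346,000 hours shown in Table IV.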
Park and Balasubramanian proposed a third level RAID system without suggesting a particular application [Park 86]. Our calculations suggest it is a much better match to supercomputer applications than to transaction processing systems. This year two disk manufacturers have announced level 3 RAIDs for such applications using synchronized 5.25 inch disks with G=4 and C=1: one from Maxtor and one from Micropolis [Maginnis 87].

This third level has brought the reliability overhead cost to its lowest level, so in the last two levels we improve performance of small accesses without changing cost or reliability.
10. Fourth Level RAID: Independent Reads/Writes
Spreading a transfer across all disks within the group has the following advantage:
- Large or grouped transfer time is reduced because the transfer bandwidth of the entire array can be exploited.
But it has the following disadvantages as well:
- Reading/writing to a disk in a group requires reading/writing to all the disks in a group; levels 2 and 3 RAIDs can perform only one I/O at a time per group.
- If the disks are not synchronized, you do not see average seek and rotational delays; the observed delays should move towards the worst case, hence the S factor in the equations above.
This fourth level RAID improves performance of small transfers through parallelism--the ability to do more than one I/O per group at a time. We no longer spread the individual transfer information across several disks, but keep each individual unit in a single disk.

The virtue of bit-interleaving is the easy calculation of the Hamming code needed to detect or correct errors in level 2. But recall that in the third level RAID we rely on the disk controller to detect errors within a single disk sector. Hence, if we store an individual transfer unit in a single sector, we can detect errors on an individual read without accessing any other disk. Figure 3 shows the different ways the information is stored in a sector for RAID levels 2, 3, and 4. By storing a whole transfer unit in a sector, reads can be independent and operate at the maximum rate of a disk yet still detect errors. Thus the primary change between level 3 and 4 is that we interleave data at the sector level rather than at the bit level.
Figure 3. Comparison of location of data and check information in sectors for RAID levels 2, 3, and 4 for G=4. (The figure spreads four transfer units a, b, c, and d over sector 0 of data disks 1-4; level 2 uses check disks 5-7 with entries aECC0 through dECC2; only one check disk is needed in level 3, with check info calculated over each transfer unit; in level 4 each transfer unit is placed into a single sector, and the check info is now calculated over a piece of each transfer unit.) Not shown is the small amount of check information per sector added by the disk controller to detect and correct soft errors within a sector. Remember that we use physical sector numbers and hardware control to explain these ideas, but RAID can be implemented by software using logical sectors and disks.
At first thought you might expect that an individual write to a single sector still involves all the disks in a group since (1) the check disk must be rewritten with the new parity data, and (2) the rest of the data disks must be read to be able to calculate the new parity data. Recall that each parity bit is just a single exclusive OR of all the corresponding data bits in a group. In level 4 RAID, unlike level 3, the parity calculation is much simpler since, if we know the old data value and the old parity value as well as the new data value, we can calculate the new parity information as follows:

new parity = (old data xor new data) xor old parity

In level 4 a small write then uses 2 disks to perform 4 accesses--2 reads and 2 writes--while a small read involves only one read on one disk. Table V summarizes the fourth level RAID characteristics. Note that all small accesses improve--dramatically for the reads--but the small read-modify-write is still so slow relative to a level 1 RAID that its applicability to transaction processing is doubtful. Recently Salem and Garcia-Molina proposed a Level 4 system [Salem 86].
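That update rule is easy to sketch (our own illustration; the data layout and names are invented for the example). Only the data disk being written and the check disk are touched, with two reads followed by two writes:

    def small_write(data_disks, parity_disk, disk_index, sector, new_value):
        # Level 4 style small write: 2 reads and 2 writes on just 2 disks.
        old_data = data_disks[disk_index][sector]          # read 1: old data
        old_parity = parity_disk[sector]                   # read 2: old parity
        new_parity = (old_data ^ new_value) ^ old_parity   # (old xor new) xor old parity
        data_disks[disk_index][sector] = new_value         # write 1: new data
        parity_disk[sector] = new_parity                   # write 2: new parity

    # Tiny example with 4 one-byte "sectors" per disk.
    data_disks = [bytearray(b"\x01\x02\x03\x04"), bytearray(b"\x10\x20\x30\x40")]
    parity_disk = bytearray(a ^ b for a, b in zip(*data_disks))
    small_write(data_disks, parity_disk, disk_index=0, sector=2, new_value=0x7F)
    assert parity_disk == bytearray(a ^ b for a, b in zip(*data_disks))  # parity still consistent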
Before proceeding to the next level we need to explain the performance of small writes in Table V (and hence small read-modify-writes, since they entail the same operations in this RAID). The formula for the small writes divides D by 2 instead of 4 because 2