CA2058734C - Storage device array architecture with copyback cache - Google Patents
Storage device array architecture with copyback cache
- Publication number
- CA2058734C (application CA002058734A)
- Authority
- CA
- Canada
- Prior art keywords
- storage unit
- storage units
- block
- data block
- error
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1666—Error detection or correction of the data by redundancy in hardware where the redundant component is memory or memory area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1435—Saving, restoring, recovering or retrying at system level using file system or storage system metadata
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2056—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
- G06F11/2087—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring with a common controller
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2094—Redundant storage or storage space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0804—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/18—Error detection or correction; Testing, e.g. of drop-outs
- G11B20/1833—Error detection or correction; Testing, e.g. of drop-outs by adding special lists or symbols to the coded information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1441—Resetting or repowering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2089—Redundant storage control functionality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/1019—Fast writes, i.e. signaling the host that a write is done before data is written to disk
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/1059—Parity-single bit-RAID5, i.e. RAID 5 implementations
Abstract
A fault-tolerant storage device array using a copyback cache storage unit for temporary storage. When a Write occurs to the RAID system, the data is immediately written to the first available location in the copyback cache storage unit. Upon completion of the Write to the copyback cache storage unit, the host CPU is immediately informed that the Write was successful. Thereafter, further storage unit accesses by the CPU can continue without waiting for an error-correction block update for the data just written. In a first embodiment of the invention, during idle time for relevant storage units of the storage system, an error-correction block is computed for each "pending" data block on the copyback cache storage unit, and the data block and corresponding error-correction block are copied to their proper location in the RAID system. The copyback cache storage unit in effect stores "peak load" Write data and then completes the actual Write operations to the RAID system during relatively quiescent periods of I/O accesses by the CPU. In a second embodiment of the invention, after Write data is logged to the copyback cache storage unit, normal Read-Modify-Write operation by the RAID system controller continues in overlapped fashion with other CPU I/O accesses using Write data in the controller's buffer memory.
Performance is enhanced because the CPU can continue processing as soon as the simple Write operation to the copyback cache storage unit completes, thus eliminating the delay caused by a normal Read-Modify-Write RAID system. In this embodiment, the copyback cache storage unit acts more as a running "log" of Write data.
Description
STORAGE DEVICE ARRAY ARCHITECTURE WITH COPYBACK CACHE

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to computer system data storage, and more particularly to a fault-tolerant storage device array using a copyback cache storage unit for temporary storage.
2. Description of Related Art

A typical data processing system generally involves one or more storage units which are connected to a Central Processor Unit (CPU) either directly or through a control unit and a channel. The function of the storage units is to store data and programs which the CPU uses in performing particular data processing tasks.

Various types of storage units are used in current data processing systems. A typical system may include one or more large capacity tape units and/or disk drives (magnetic, optical, or semiconductor) connected to the system through respective control units for storing data.

However, a problem exists if one of the large capacity storage units fails such that information contained in that unit is no longer available to the system. Generally, such a failure will shut down the entire computer system.

The prior art has suggested several ways of solving the problem of providing reliable data storage. In systems where records are relatively small, it is possible to use error correcting codes which generate ECC syndrome bits that are appended to each data record within a storage unit. With such codes, it is possible to correct a small amount of data that may be read erroneously. However, such codes are generally not suitable for correcting or recreating long records which are in error, and provide no remedy at all if a complete storage unit fails. Therefore, a need exists for providing data reliability external to individual storage units.
Other approaches to such "external" reliability have been described in the art. A research group at the University of California, Berkeley, in a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Patterson, et al., Proc. ACM SIGMOD, June 1988, has catalogued a number of different approaches for providing such reliability when using disk drive storage units. Arrays of disk drives are characterized in one of five architectures, under the acronym "RAID" (for Redundant Arrays of Inexpensive Disks).

A RAID 1 architecture involves providing a duplicate set of "mirror" storage units and keeping a duplicate copy of all data on each pair of storage units. While such a solution solves the reliability problem, it doubles the cost of storage. A number of implementations of RAID 1 architectures have been made, in particular by Tandem Corporation.
A RAID 2 architecture stores each bit of each word of data, plus Error Detection and Correction (EDC) bits for each word, on separate disk drives (this is also known as "bit striping"). For example, U.S. Patent No. 4,722,085 to Flora et al. discloses a disk drive memory using a plurality of relatively small, independently operating disk subsystems to function as a large, high capacity disk drive having an unusually high fault tolerance and a very high data transfer bandwidth. A data organizer adds 7 EDC bits (determined using the well-known Hamming code) to each 32-bit data word to provide error detection and error correction capability. The resultant 39-bit word is written, one bit per disk drive, on to 39 disk drives. If one of the 39 disk drives fails, the remaining 38 bits of each stored 39-bit word can be used to reconstruct each 32-bit data word on a word-by-word basis as each data word is read from the disk drives, thereby obtaining fault tolerance.

An obvious drawback of such a system is the large number of disk drives required for a minimum system (since most large computers use a 32-bit word), and the relatively high ratio of drives required to store the EDC bits (7 drives out of 39).
relatively high ratio of drives rewired to store the EDC bits (7 drives out of 39), A further limtt$tlon of a fiAio 2 disk drive memory system is that the individual disk actuators are operated in unison to write each deta block, the bits of which are distributed over all of the disk drives. 'this arrangement has a high data transfer ?~~r~ci?rArici?~, :~ifac~s ~~ac~s in~tJvi~Jt~al ~JI:pJt 2rar3sierrs p~a~rt ofi a blocft of data, rite net atef~~°? boiftg that vac ~~rrtlr~a taJc,~cJc is ~vsitabl~ to tP~~
~~arrtputor ~~:~?urn muc:Pt fa~tar L'taft if a sin~ts~ cJrfvo uvsro ~~"~ing ~t~a bioclC. ~'t~i~ is ~a~r~frL~~~us for large cJabloctcs. hio~~v~ar, this <arrancisrrjent also ~a'rl~ascr~r~ty pro~rt~tonly a singl~
r~aCi~e~rf~t~2 howl acauator for the ofx~f~ ~torago uP~ft, 't'h9~ ~as~r~srwaty ~stfathe ra~sd~frs ~~~ca~~ po~forrree~c~a o? ~~ drive array/ ,~Pt~rt ~t~2,~~ ~it~s~ ~r~
small, slnc~
only one ~dat~, file at a ilms~ r1 bs ~r,~aa~d by t~~ °~ingt~" aerator, 't'tttus, FiAflS
~ systems afe c~an~ratty not consldared to be ~u#ts~tp for cofnp~rt~r systems d~Jgnad for Can-tune TraottQn Prcr~..ing (sr7~'fP), such in t~nklng, 1 D tlnarsraal, and r~sarvatd~an syystems, where a large number of random accass~s to marry small data tiles compris~s tMg bulk of data storage sand transfer operations.
A RAID 3 architecture is based on the concept that each disk drive storage unit has internal means for detecting a fault or data error. Therefore, it is not necessary to store extra information to detect the location of an error; a simpler form of parity-based error correction can thus be used. In this approach, the contents of all storage units subject to failure are "Exclusive OR'd" (XOR'd) to generate parity information. The resulting parity information is stored in a single redundant storage unit. If a storage unit fails, the data on that unit can be reconstructed onto a replacement storage unit by XOR'ing the data from the remaining storage units with the parity information. Such an arrangement has the advantage over the mirrored disk RAID 1 architecture in that only one additional storage unit is required for "N" storage units. A further aspect of the RAID 3
architecture is that the disk drives are operated in a coupled manner, similar to a RAID 2 system, and a single disk drive is designated as the parity unit.

One implementation of a RAID 3 architecture is the Micropolis Corporation Parallel Drive Array, Model 1804 SCSI, that uses four parallel, synchronized disk drives and one redundant parity drive. The failure of one of the four data disk drives can be remedied by the use of the parity bits stored on the parity disk drive. Another example of a RAID 3 system is described in U.S. Patent No. 4,092,732 to Ouchi.
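The parity generation and reconstruction described above can be illustrated with a short sketch (hypothetical Python for illustration only, not part of the patent text): the redundant unit holds the byte-wise XOR of the data units, and a failed unit is rebuilt by XOR'ing the surviving units with the parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Three data units plus one redundant parity unit ("N + 1" storage units).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data)

# Simulate the failure of unit 1: reconstruct its contents by XOR'ing the
# remaining data units with the parity information.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Because XOR is its own inverse, the same routine serves both to generate the parity and to regenerate any single lost unit.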
A RAID 3 disk drive memory system has a much lower ratio of redundancy units to data units than a RAID 2 system. However, a RAID 3 system has the same performance limitation as a RAID 2 system, in that the individual disk actuators are coupled, operating in unison. This adversely affects the random access performance of the drive array when data files are small, since only one data file at a time can be accessed by the "single" actuator. Thus, RAID 3 systems are generally not considered to be suitable for computer systems designed for OLTP purposes.
A RAID 4 architecture uses the same parity error correction concept of the RAID 3 architecture, but improves on the performance of a RAID 3 system with respect to random reading of small files by "uncoupling" the operation of the individual disk drive actuators, and reading and writing a larger minimum amount of data (typically, a disk sector) to each disk (this is also known as block striping). A further aspect of the RAID 4 architecture is that a single storage unit is designated as the parity unit.

A limitation of a RAID 4 system is that Writing a data block on any of the independently operating data storage units also requires writing a new parity block on the parity unit. The parity information stored on the parity unit must be read and XOR'd with the old data (to "remove" the information content of the old data), and the resulting sum must then be XOR'd with the new data (to provide new parity information). Both the data and the parity records then must be rewritten to the disk drives. This process is commonly referred to as a "Read-Modify-Write" sequence.
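The Read-Modify-Write sequence can be sketched as follows (a minimal Python illustration; the dictionaries standing in for storage units are assumptions for the example, not the patent's design):

```python
def read_modify_write(data_unit, parity_unit, idx, new_data):
    """Parity update via Read-Modify-Write: two Reads, an XOR, two Writes."""
    old_data = data_unit[idx]        # Read 1: old data block
    old_parity = parity_unit[idx]    # Read 2: old parity block
    # XOR "removes" the old data from the parity, then "adds" the new data.
    new_parity = bytes(p ^ o ^ n
                       for p, o, n in zip(old_parity, old_data, new_data))
    data_unit[idx] = new_data        # Write 1: new data block
    parity_unit[idx] = new_parity    # Write 2: new parity block

# Two data units; the parity block is their XOR (0x0f ^ 0x33 == 0x3c).
unit0, unit1, parity = {0: b"\x0f"}, {0: b"\x33"}, {0: b"\x3c"}
read_modify_write(unit0, parity, 0, b"\xff")
assert parity[0] == bytes([0xff ^ 0x33])  # parity remains consistent
```

Note that the new parity is derived without reading the other data units in the stripe, which is what makes the two preliminary Reads unavoidable.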
Thus, a Read and a Write on the single parity unit occurs each time a record is changed on any of the data storage units covered by the parity record on the parity unit. The parity unit becomes a bottleneck to data writing operations since the number of changes to records which can be made per unit of time is a function of the access rate of the parity unit, as opposed to the faster access rate provided by parallel operation of the multiple data storage units. Because of this limitation, a RAID 4 system is generally not considered to be suitable for computer systems designed for OLTP purposes. Indeed, it appears that a RAID 4 system has not been implemented for any commercial purpose.
A RAID 5 architecture uses the same parity error correction concept of the RAID 4 architecture and independent actuators, but improves on the writing performance of a RAID 4 system by distributing the data and parity information across all of the available disk drives. Typically, "N + 1" storage units in a set (also known as a "redundancy group") are divided into a plurality of equally sized address areas referred to as blocks. Each storage unit generally contains the same number of blocks. Blocks from each storage unit in a redundancy group having the same unit address ranges are referred to as "stripes". Each stripe has N blocks of data, plus one parity block on one storage unit containing parity for the remainder of the stripe. Further stripes each have a parity block, the parity blocks being distributed on different storage units. Parity updating activity associated with every modification of data in a redundancy group is therefore distributed over the different storage units. No single unit is burdened with all of the parity update activity.

For example, in a RAID 5 system comprising 5 disk drives, the parity information for the first stripe of blocks may be written to the fifth drive; the parity information for the second stripe of blocks may be written to the fourth drive; the parity information for the third stripe of blocks may be written to the third drive; etc. The parity block for succeeding stripes typically "precesses" around the disk drives in a helical pattern (although other patterns may be used).
Thus, no single disk drive is used for storing the parity information, and the bottleneck of the RAID 4 architecture is eliminated. An example of a RAID 5 system is described in U.S. Patent No. 4,761,785 to Clark et al.
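The rotating placement in the five-drive example above can be sketched as a simple mapping (hypothetical Python; actual RAID 5 implementations use several different parity layouts):

```python
def parity_drive(stripe_index, num_drives=5):
    """Return the drive (numbered 1..num_drives) holding the parity block
    for a given stripe: drive 5 for stripe 0, drive 4 for stripe 1, and so
    on, wrapping around so no single drive holds all of the parity."""
    return num_drives - (stripe_index % num_drives)

# Parity "precesses" across the drives: 5, 4, 3, 2, 1, then 5 again.
assert [parity_drive(i) for i in range(6)] == [5, 4, 3, 2, 1, 5]
```

Spreading the parity this way distributes the Read-Modify-Write load that a dedicated RAID 4 parity drive would otherwise absorb alone.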
As in a RAID 4 system, a limitation of a RAID 5 system is that a change in a data block requires a Read-Modify-Write sequence comprising two Read and two Write operations: the old parity block and old data block must be read and XOR'd, and the resulting sum must then be XOR'd with the new data. Both the data and the parity blocks then must be rewritten to the disk drives. While the two Read operations may be done in parallel, as can the two Write operations, modification of a block of data in a RAID 4 or a RAID 5 system still takes substantially longer than the same operation on a conventional disk. A conventional disk does not require the preliminary Read operation, and thus does not have to wait for the disk drive to rotate back to the previous position in order to perform the Write operation. The rotational latency time alone can amount to about 50% of the time required for a typical data modification operation. Further, two disk storage units are involved for the duration of each data modification operation, limiting the throughput of the system as a whole.
Despite the Write performance penalty, RAID 5 type systems have become increasingly popular, since they provide high data reliability with a low overhead cost for redundancy, good Read performance, and fair Write performance.

However, it would be desirable to have the benefits of a RAID 5 system without the Write performance penalty resulting from the rotational latency time imposed by the parity update operation. The present invention provides such a system.
SUMMARY OF THE INVENTION
The present invention solves the error-correction block bottleneck inherent in a RAID 5 architecture by recognition that storage unit accesses are intermittent. That is, at various times one or more of the storage units in a RAID 5 system are idle in terms of access requests by the CPU. This characteristic can be exploited by providing a "copyback cache" storage unit as an adjunct to a standard RAID system. The present invention provides two alternative methods of operating such a system.
In both embodiments, when a Write occurs to the RAID system, the data is immediately written to the first available location in the copyback cache storage unit. Upon completion of the Write to the copyback cache storage unit, the host CPU is immediately informed that the Write was successful. Thereafter, further storage unit accesses by the CPU can continue without waiting for an error-correction block update for the data just written.
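The fast-Write path common to both embodiments can be sketched as follows (the class and method names are illustrative assumptions, not taken from the patent):

```python
class CopybackCacheController:
    """Sketch of the Write path: log the data to the first available
    cache location, acknowledge the host immediately, and leave the
    error-correction block update for later processing."""
    def __init__(self):
        self.cache_log = []   # copyback cache: first-available Write slots
        self.pending = []     # blocks awaiting parity update / copyback

    def host_write(self, address, data):
        self.cache_log.append((address, data))  # fast, simple Write
        self.pending.append(address)            # defer the RAID update
        return "WRITE OK"                       # host continues at once

ctrl = CopybackCacheController()
assert ctrl.host_write(42, b"payload") == "WRITE OK"
assert ctrl.pending == [42]
```

The key point is that the host sees only the latency of the simple cache Write, never the Read-Modify-Write sequence that follows.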
In the first embodiment of the invention, during idle time for relevant storage units of the storage system, an error-correction block (e.g., XOR parity) is computed for each "pending" data block on the copyback cache storage unit, and the data block and corresponding error-correction block are copied to their proper location in the RAID system. Optionally, if a number of pending data blocks are to be written to the same stripe, an error-correction block can be calculated from all data blocks in the stripe at one time, thus achieving some economy of time.

In this embodiment, the copyback cache storage unit in effect stores "peak load" Write data and then completes the actual Write operations to the RAID system during relatively quiescent periods of I/O accesses by the CPU.
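The stripe-at-once economy mentioned above can be sketched as follows (hypothetical Python; a real controller would operate on disk sectors rather than byte strings): when the pending blocks for a stripe are all on hand, the parity is recomputed in one XOR pass over the full stripe rather than via one Read-Modify-Write per block.

```python
from functools import reduce

def flush_stripe(pending, data_blocks, parity):
    """Idle-time copyback: move pending blocks from the cache into their
    home positions, then compute the stripe's parity once from all of
    its data blocks."""
    data_blocks.update(pending)  # copy cached data blocks into the stripe
    columns = zip(*(data_blocks[i] for i in sorted(data_blocks)))
    parity[:] = [reduce(lambda a, b: a ^ b, col) for col in columns]

blocks = {0: b"\x00", 1: b"\x00", 2: b"\x00"}   # one stripe, N = 3
parity = bytearray(1)                            # starts as all zeros
flush_stripe({0: b"\x01", 1: b"\x02"}, blocks, parity)
assert bytes(parity) == b"\x03"                  # 0x01 ^ 0x02 ^ 0x00
```

With two pending blocks in the same stripe, this costs one parity computation and Write instead of two separate Read-Modify-Write sequences.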
In the second embodiment of the invention, after Write data is logged to the copyback cache storage unit, normal Read-Modify-Write operation by the RAID system controller continues in overlapped fashion with other CPU I/O accesses, using Write data in the controller's buffer memory. Performance is enhanced because the CPU can continue processing as soon as the simple Write operation to the copyback cache storage unit completes, thus eliminating the delay caused by a normal Read-Modify-Write RAID system. In this embodiment, the copyback cache storage unit acts more as a running "log" of Write data. Data integrity is preserved since the Write data is saved to the copyback cache storage unit and thus accessible even if the Read-Modify-Write operation to the RAID system never completes.
The copyback cache storage unit is preferably non-volatile, so that data will not be lost on a power failure. If the copyback cache storage unit is a disk drive, it preferably is paired with a "mirror" storage unit for fault tolerance. Optionally, the copyback cache storage unit may be a solid-state storage unit, which can achieve substantially faster Write and error-correction block update times than a disk drive.
The details of the preferred embodiments of ute present tnverttton are set forth in the ar~,.compartying drawings and the descxiption below. Once the details vt the invention ar~ known, numerous addidonal innovsftions and changes will become obvious to one skilled in zhe art.
BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1 is a block diagram of a copyback cache RAID system in accordance with the present invention.

FIGURE 2 is a flow chart of Read and Write operation in accordance with a first embodiment of the present invention.

FIGURE 3 is a flow chart of Read and Write operation in accordance with a second embodiment of the present invention.

Like reference numbers and designations in the drawings refer to like elements.

DETAILED DESCRIPTION OF THE INVENTION

Throughout this description, the preferred embodiments and examples shown should be considered as exemplars, rather than limitations on the present invention.
FIGURE 1 is a block diagram of a copyback cache RAID system in accordance with the present invention. Shown are a CPU 1 coupled by a bus 2 to an array controller 3, which in the preferred embodiment is a fault-tolerant controller. The array controller 3 is coupled to each of the plurality of storage units S1-S5 (five being shown by way of example only) by an I/O bus (e.g., a SCSI bus). The storage units S1-S5 are failure independent, meaning that the failure of one unit does not affect the physical operation of other units. The array controller 3 preferably includes a separately programmable processor (for example, a MIPS RISC processor, made by MIPS of Sunnyvale, California) which can act independently of the CPU 1 to control the storage units.
Also attached to the controller 3 is a copyback cache storage unit CC, which in the preferred embodiment is coupled to the common I/O bus (e.g., a SCSI bus) so that data can be transferred between the copyback cache storage unit CC and the storage units S1-S5. The copyback cache storage unit CC is preferably non-volatile, so that data will not be lost on a power failure. If the copyback cache storage unit CC is a disk drive, it preferably is paired with a "mirror" storage unit CC' for fault tolerance. The mirror storage unit CC' is coupled to the controller 3 such that all data written to the copyback cache storage unit CC is also written essentially simultaneously to the mirror storage unit CC', in known fashion.

Optionally, the copyback cache storage unit CC may be a solid-state storage unit, which can achieve substantially faster Write and error-correction block update times than a disk drive. In such a case, the solid-state storage unit preferably includes error-detection and correction circuitry, and is either non-volatile or has a battery backup on the power supply.
The storage units S1-S5 can be grouped into one or more redundancy groups. In the illustrated examples described below, the redundancy group comprises all of the storage units S1-S5, for simplicity of explanation.

The present invention is preferably implemented as a computer program executed by the controller 3. FIGURE 2 is a high-level flowchart representing the steps of the Read and Write processes for a first embodiment of the invention. FIGURE 3 is a high-level flowchart representing the steps of the Read and Write processes for a second embodiment of the invention. The steps shown in FIGURES 2 and 3 are referenced below.
The Peak Load Embodiment

The controller 3 monitors input/output requests from the CPU 1 on essentially a continuous basis (Step 20). If a Write request is pending (Step 21), the data block is immediately written to the first available location in the copyback cache storage unit CC (Step 22) (the data block is also stored on the mirror storage unit CC', if present). Preferably, writing begins at the first logical block on the copyback cache storage unit CC, and continues sequentially to the end of the logical blocks. Thereafter, writing commences again at the first block (so long as no blocks are overwritten that have not been stored in the array). This preferred method minimizes time-consuming SEEK operations (i.e., physical movements of a Read/Write head in a storage unit) in the copyback cache storage unit CC.
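The circular, sequential write policy described above can be sketched as follows. This is an illustrative model only; the class and method names are my own, not from the patent, and the actual data transfer to the cache unit is omitted:

```python
class CopybackCacheLog:
    """Sketch of the sequential-write policy for the copyback cache unit:
    writes land at the next logical block, wrapping to block 0, and a slot
    may not be reused while its block is still pending (not yet in the array)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.next_block = 0
        self.pending = set()   # cache slots holding blocks not yet flushed

    def append(self, data_block):
        """Return the cache slot the data block was written to."""
        if self.next_block in self.pending:
            raise RuntimeError("cache full: oldest block not yet flushed")
        slot = self.next_block
        self.pending.add(slot)                        # block is now "pending"
        self.next_block = (slot + 1) % self.num_blocks  # sequential, then wrap
        return slot

    def flush(self, slot):
        """Mark a slot as copied to the array; it may now be overwritten."""
        self.pending.discard(slot)
```

Because `append` always advances to the adjacent logical block, a disk-based cache unit rarely needs a SEEK between consecutive Writes, which is the point the text makes.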
Each data block stored on the copyback cache storage unit CC is also flagged with the location in the array where the data block is ultimately to be stored, and a pointer is set to indicate that the data block is in the copyback cache storage unit CC (Step 23). This location and pointer information is preferably kept in a separate table in memory or on the copyback cache storage unit CC. The table preferably comprises a directory table having entries that include standard information regarding the size, attributes, and status of each data block. In addition, each entry has one or more fields indicating whether the data block is stored on the copyback cache storage unit CC or in the array (S1-S5), and the "normal" location in the array for the data block. Creation of such directory tables is well-known in the art.
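A directory table of the kind described might be modeled as below. The field and function names are hypothetical, chosen only to mirror the description above:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class DirectoryEntry:
    """One directory-table entry for a data block (illustrative layout)."""
    size: int                        # standard information about the block
    attributes: int
    status: int
    in_cache: bool                   # True while the block is on the cache unit CC
    array_location: Tuple[int, int]  # "normal" (storage unit, block number) in S1-S5
    cache_location: int              # block number on CC (meaningful when in_cache)

def lookup(directory, array_location):
    """Return where the current version of a block lives: ("cache", loc) or ("array", loc)."""
    entry = directory[array_location]
    if entry.in_cache:
        return ("cache", entry.cache_location)
    return ("array", entry.array_location)
```

The Read path of Steps 27-31 amounts to exactly this lookup: consult the directory, then fetch from either the cache unit or the array.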
If a data block is written to the copyback cache storage unit CC while a data block to be stored at the same location in the array is still a "pending" block (a data block that has been written to the copyback cache storage unit CC but not transferred to the array S1-S5), the directory location pointer for that data block is changed to point to the "new" version rather than to the "old" version. The old version is thereafter ignored, and may be written over in subsequent operations.

After a Write request is processed in this fashion, the controller 3 immediately sends an acknowledgement to the CPU 1 indicating that the Write operation was successful (Step 24). The monitoring process then repeats (Step 25). Further storage unit accesses by the CPU 1 can continue without waiting for an error-correction block update for the data block just written. Thus, the Write "through-put" time of the array appears to be the same as a non-redundant system, since storage of the Write data on the copyback cache storage unit CC does not require the Read-Modify-Write sequence of a standard RAID system with respect to operation of the CPU 1.
If a Write request is not pending (Step 21), the controller 3 tests whether a Read request is pending (Step 26). If a Read request is pending, the controller 3 reads the directory table to determine the location of each requested data block (Step 27). If a requested data block is not in the array (Step 28), the controller 3 reads the block from the copyback cache storage unit CC and transfers it to the CPU 1 (Step 29). The monitoring process then repeats (Step 30). If the requested data block is in the array (Step 28), the controller 3 reads the block from the array (S1-S5) in normal fashion and transfers it to the CPU 1 (Step 31). The monitoring process then repeats (Step 32).

Some embodiments of the invention may include disk cache memory in the controller 3. Read requests may of course be "transparently" satisfied from such a cache in known fashion.
If no Write or Read operation is pending for particular storage units in the array, indicating that those storage units are "idle" with respect to CPU 1 I/O accesses, the controller 3 checks to see if any data blocks are "pending" blocks flagged to locations on the idle storage units. If no pending blocks exist (Step 33), the controller 3 begins the monitoring cycle again (Step 34).
If a pending block does exist (Step 33), the controller 3 reads a pending block from the copyback cache storage unit CC (Step 35). The controller 3 then writes the pending block to the proper location in the array, and computes and stores a new error-correction block based upon the pending block.

In the preferred embodiment of the invention, the error-correction blocks contain parity information. Thus, update of the error-correction block for the pending block can be accomplished by reading the old data block and old error-correction block corresponding to the array location indicated by the location information for the pending block stored in the directory (Step 36). The controller 3 then XOR's the old data block, the pending data block, and the old error-correction block to generate a new error-correction block (Step 37). The new error-correction block and the pending block are then written to the array S1-S5 at their proper locations (Step 38).
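Assuming XOR parity as in the preferred embodiment, the update of Steps 36-38 can be sketched as below; the function names are illustrative, not from the patent:

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def read_modify_write(old_data, old_parity, pending_data):
    """Steps 36-38 sketch: new parity = old data XOR pending data XOR old parity.
    Returns the (data, parity) pair to be written back to the array."""
    new_parity = xor_blocks(xor_blocks(old_data, pending_data), old_parity)
    return pending_data, new_parity
```

The key property is that XOR-ing in the old data cancels its contribution to the parity, so only the one changed data block and the parity block need to be read and rewritten, not the whole stripe.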
Optionally, if a number of pending blocks are to be written to the same stripe, error-correction can be calculated for all data blocks in the stripe at one time by reading all data blocks in the stripe that are not being updated, XOR'ing those data blocks with the pending blocks to generate a new error-correction block, and writing the pending blocks and the new error-correction block to the array. This may achieve some economy of time.
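The optional stripe-wide calculation can likewise be sketched. Again this is an illustrative model, not the controller's actual implementation: the new parity is simply the XOR of every data block in the stripe (unchanged blocks plus pending blocks):

```python
from functools import reduce

def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def stripe_parity(data_blocks):
    """XOR together every data block of a stripe to form its single parity block.
    Pass the unchanged blocks together with the pending (new) blocks."""
    return reduce(xor_blocks, data_blocks)
```

For a stripe receiving several pending blocks, one pass of this calculation replaces several separate Read-Modify-Write parity updates, which is the "economy of time" the text refers to.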
After the pending block is transferred from the copyback cache storage unit CC to the array, the directory entry for that block is modified to indicate that the data block is in the array rather than in the copyback cache storage unit CC (Step 39). Thereafter, the controller 3 begins the monitoring cycle again (Step 40).

Although the invention has been described in terms of a sequential branching process, the invention may also be implemented in a multi-tasking system as separate tasks executing concurrently. Thus, the Read and Write processes described above, as well as the transfer of pending data blocks, may be implemented as separate tasks executed concurrently. Accordingly, the tests indicated by Steps 21, 26, and 33 in FIGURE 2 may be implicitly performed in the calling of the associated tasks for Writing and Reading data blocks, and transfer of pending blocks. Thus, for example, the transfer of a pending block from the copyback cache storage unit CC to a storage unit in the array may be performed concurrently with a Read operation to a different storage unit in the array. Further, if the array is of the type that permits the controller 3 to "stack" a number of I/O requests for each storage unit of the array (as is the case with many SCSI-based RAID systems), the operations described above may be performed "concurrently" with respect to accesses to the same storage unit.
The Data Log Embodiment

As in the embodiment described above, the controller 3 monitors input/output requests from the CPU 1 on essentially a continuous basis (Step 50). In this embodiment, the controller 3 is provided with a relatively large (for example, one megabyte) data buffer to temporarily store data to be written to the array. If a Write request is pending (Step 51), the data block is immediately written by the controller 3 to the first available location in the copyback cache storage unit CC (Step 52) (the data block is also stored on the mirror storage unit CC', if present). Preferably, writing begins at the first logical block on the copyback cache storage unit CC, and continues sequentially to the end of the logical blocks. Thereafter, writing commences again at the first block (so long as no blocks are overwritten that have not been stored in the array). This preferred method minimizes SEEK operations in the copyback cache storage unit CC.
In the first embodiment, SEEK operations are required to retrieve pending blocks during idle times to transfer to the array. In this embodiment, the copyback cache storage unit CC acts as a running "log" of Write data. In contrast with the first embodiment, SEEK operations normally are necessary only to change to a next data-storing area (e.g., a next cylinder in a disk drive) when the current area is full, or to reset the Read/Write head back to the logical beginning of the storage unit after reaching the end, or to retrieve data blocks after a failure.
Each data block stored on the copyback cache storage unit CC is also flagged with the location in the array where the data block is ultimately to be stored and the location of the data block in the copyback cache storage unit CC, and a pointer is set to indicate that the data block is in the controller buffer (Step 53). As before, such location and pointer information is preferably kept in a directory table.

Because of the buffer in the controller 3, the definition of a "pending block" in the second embodiment differs somewhat from the definition in the first embodiment described above. A "pending block" is a data block that has been written to the copyback cache storage unit CC but not transferred from the controller buffer to the array S1-S5.
If a data block is written to the copyback cache storage unit CC while a data block to be stored at the same location in the array is still a "pending block" in the controller buffer, the directory location pointers for the data block are changed to point to the "new" version rather than to the "old" version, both in the copyback cache storage unit CC and in the buffer. The old version is thereafter ignored, and may be written over in subsequent operations.

After a Write request is processed in this fashion, the controller 3 immediately sends an acknowledgement to the CPU 1 indicating that the Write operation was successful (Step 54). The monitoring process then repeats (Step 55). Further storage unit accesses by the CPU 1 can continue without waiting for an error-correction block update for the data block just written. Thus, the Write response time of the array appears to be the same as a non-redundant system, since storage of the Write data on the copyback cache storage unit CC does not require the Read-Modify-Write sequence of a standard RAID system with respect to operation of the CPU 1.
If a Write request is not pending (Step 51), the controller 3 tests whether a Read request is pending (Step 56). If a Read request is pending, the controller 3 reads the directory table to determine the location of each requested data block (Step 57). If a requested data block is in the array (Step 58), the controller 3 reads the block from the array (S1-S5) in normal fashion and transfers it to the CPU 1 (Step 59). The monitoring process then repeats (Step 60).

If a requested data block is not in the array (Step 58), it is in the buffer of the controller 3. The controller 3 transfers the data block from its buffer to the CPU 1 (Step 61). This operation is extremely fast compared to the first embodiment, since the buffer operates at electronic speeds with no mechanically-imposed latency period. The monitoring process then repeats (Step 62).
If no Write or Read operation is pending for particular storage units in the array, indicating that those storage units are "idle" with respect to CPU 1 I/O accesses, the controller 3 checks to see if any data blocks in its buffer are "pending blocks" flagged to locations on the idle storage units. If no pending blocks exist (Step 63), the controller 3 begins the monitoring cycle again (Step 64).
If a pending block does exist (Step 63), the controller 3 accesses the pending block (Step 65), and then computes and stores a new error-correction block based upon the pending block. As before, in the preferred embodiment of the invention, the error-correction blocks contain parity information. Thus, update of the error-correction block for the pending block can be accomplished by reading the old data block and old error-correction block corresponding to the array location indicated by the location information for the pending block stored in the directory (Step 66). The controller 3 then XOR's the old data block, the pending data block, and the old error-correction block to generate a new error-correction block (Step 67). The new error-correction block and the pending block are then written to the array S1-S5 (Step 68).
Optionally, if a number of pending blocks are to be written to the same stripe, error-correction can be calculated for all data blocks in the stripe at one time by reading all data blocks in the stripe that are not being updated, XOR'ing those data blocks with the pending blocks to generate a new error-correction block, and writing the pending blocks and the new error-correction block to the array. This may achieve some economy of time.

After the pending block is transferred from the buffer of the controller 3 to the array, the directory is modified to indicate that the pending block is no longer valid in the copyback cache storage unit CC or in the buffer (Step 69). The old pending block is thereafter ignored, and may be written over in subsequent operations. The controller 3 then restarts the monitoring cycle (Step 70).

If a failure of the system occurs before all pending blocks are written from the buffer to the array, the controller 3 can read the pending blocks from the copyback cache storage unit CC that were not written to the array. The controller 3 then writes the selected pending blocks to the array.
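The recovery path just described might be sketched as follows; the directory layout and the `cache_read`/`array_write` callback names are assumptions for illustration, not part of the patent:

```python
def recover_pending(directory, cache_read, array_write):
    """After a failure, replay Writes that never reached the array.
    Any block still marked pending is re-read from the copyback cache log
    and written to its intended array location. Returns the replay count."""
    replayed = 0
    for entry in directory:
        if entry["pending"]:                          # never reached the array
            data = cache_read(entry["cache_location"])
            array_write(entry["array_location"], data)
            entry["pending"] = False
            replayed += 1
    return replayed
```

This is why the cache unit must be non-volatile: the buffer contents are lost on failure, but the log on the cache unit survives and suffices to finish the interrupted Writes.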
Again, although the invention has been described in terms of a sequential branching process, the invention may also be implemented in a multi-tasking system as separate tasks executing concurrently. Accordingly, the tests indicated by Steps 51, 56, and 63 in FIGURE 3 may be implicitly performed in the calling of the associated tasks for Writing and Reading data blocks, and transfer of pending blocks.

The present invention therefore provides the benefits of a RAID system without the Write performance penalty resulting from the rotational latency time imposed by the standard error-correction update operation, so long as a non-loaded condition exists with respect to I/O accesses by the CPU 1. Idle time for any of the array storage units is productively used to allow data stored on the copyback cache storage unit CC to be written to the array (either from the cache itself, or from the controller buffer) during moments of relative inactivity by the CPU 1, thus improving overall performance.
A number of embodiments of the present invention have been described.
Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, the present invention can be used with RAID 3, RAID 4, or RAID 5 systems. Furthermore, an error-correction method in addition to or in lieu of XOR-generated parity may be used for the necessary redundancy information. One such method using Reed-Solomon codes is disclosed in U.S. Patent No. 5,148,432.
As another example, in many RAID systems, a "hot spare" storage unit is provided to immediately substitute for any active storage unit that fails. The present invention may be implemented by using such a "hot spare" as the copyback cache storage unit CC, thus eliminating the need for a storage unit dedicated to the copyback cache function. If the "hot spare" is needed for its primary purpose, the RAID system can fall back to a non-copyback caching mode of operation until a replacement disk is provided.
As yet another example, the copyback cache storage unit CC may be attached to the controller 3 through a dedicated bus, rather than through the preferred common I/O bus (e.g., a SCSI bus).
Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiment, but only by the scope of the appended claims.
One implementation of a RAID 3 architecture is the Micropolis Corporation Parallel Drive Array, Model 1804 SCSI, that uses four parallel, synchronized disk drives and one redundant parity drive. The failure of one of the four data disk drives can be remedied by the use of the parity bits stored on the parity disk drive. Another example of a RAID 3 system is described in U.S. Patent No. 4,092,732 to Ouchi.
A RAID 3 disk drive memory system has a much lower ratio of redundancy units to data units than a RAID 2 system. However, a RAID 3 system has the same performance limitation as a RAID 2 system, in that the individual disk actuators are coupled, operating in unison. This adversely affects the random access performance of the drive array when data files are small, since only one data file at a time can be accessed by the "single" actuator. Thus, RAID 3 systems are generally not considered to be suitable for computer systems designed for OLTP purposes.
A RAID 4 architecture uses the same parity error correction concept of the RAID 3 architecture, but improves on the performance of a RAID 3 system with respect to random reading of small files by "uncoupling" the operation of the individual disk drive actuators, and reading and writing a larger minimum amount of data (typically, a disk sector) to each disk (this is also known as block striping). A further aspect of the RAID 4 architecture is that a single storage unit is designated as the parity unit.

A limitation of a RAID 4 system is that Writing a data block on any of the independently operating data storage units also requires writing a new parity block on the parity unit. The parity information stored on the parity unit must be read and XOR'd with the old data (to "remove" the information content of the old data), and the resulting sum must then be XOR'd with the new data (to provide new parity information). Both the data and the parity records then must be rewritten to the disk drives. This process is commonly referred to as a "Read-Modify-Write" sequence.
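The "remove old, add new" parity arithmetic just described can be checked with a small numeric example (the values are arbitrary, chosen only for illustration):

```python
# XOR-ing the old data into the parity "removes" its contribution;
# XOR-ing in the new data "adds" the new contribution.
old_data   = 0b1011
other_data = 0b0110                  # the other data covered by the same parity
parity     = old_data ^ other_data   # initial parity for the two data values

new_data   = 0b1101
new_parity = (parity ^ old_data) ^ new_data   # the Read-Modify-Write update

# The updated parity again equals the XOR of all current data values.
assert new_parity == new_data ^ other_data
```

Note that `other_data` never has to be read: this is exactly why only two Reads (old data, old parity) and two Writes (new data, new parity) are needed, regardless of how many drives share the parity record.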
Thus, a Read and a Write on the single parity unit occurs each time a record is changed on any of the data storage units covered by the parity record on the parity unit. The parity unit becomes a bottleneck to data writing operations, since the number of changes to records which can be made per unit of time is a function of the access rate of the parity unit, as opposed to the faster access rate provided by parallel operation of the multiple data storage units. Because of this limitation, a RAID 4 system is generally not considered to be suitable for computer systems designed for OLTP purposes. Indeed, it appears that a RAID 4 system has not been implemented for any commercial purpose.
A RAID 5 architecture uses the same parity error correction concept of the RAID 4 architecture and independent actuators, but improves on the Write performance of a RAID 4 system by distributing the data and parity information across all of the available disk drives. Typically, "N + 1" storage units in a set (also known as a "redundancy group") are divided into a plurality of equally sized address areas referred to as blocks. Each storage unit generally contains the same number of blocks. Blocks from each storage unit in a redundancy group having the same unit address ranges are referred to as "stripes". Each stripe has N blocks of data, plus one parity block on one storage unit containing parity for the remainder of the stripe. Further stripes each have a parity block, the parity blocks being distributed on different storage units. Parity updating activity associated with every modification of data in a redundancy group is therefore distributed over the different storage units. No single unit is burdened with all of the parity update activity.
For example, in a RAID 5 system comprising 5 disk drives, the parity information for the first stripe of blocks may be written to the fifth drive; the parity information for the second stripe of blocks may be written to the fourth drive; the parity information for the third stripe of blocks may be written to the third drive; etc. The parity block for succeeding stripes typically "precesses" around the disk drives in a helical pattern (although other patterns may be used).
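One possible placement rule matching this example (parity rotating helically backward from the last drive) can be sketched as follows; the function is a hypothetical illustration, and real RAID 5 implementations may use other patterns, as the text notes:

```python
def parity_drive(stripe_index, num_drives=5):
    """0-based drive number holding the parity block for a given stripe.
    Stripe 0 -> last drive, stripe 1 -> next-to-last, and so on, wrapping."""
    return (num_drives - 1 - stripe_index) % num_drives
```

For the 5-drive example above, stripes 0, 1, 2, ... place parity on drives 4, 3, 2, ... (i.e., the fifth, fourth, and third drives), wrapping back to the last drive after all five have taken a turn.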
Thus, no single disk drive is used for storing the parity information, and the bottleneck of the RAID 4 architecture is eliminated. An example of a RAID 5 system is described in U.S. Patent No. 4,761,785 to Clark et al.
As in a RAID 4 system, a limitation of a RAID 5 system is that a change in a data block requires a Read-Modify-Write sequence comprising two Read and two Write operations: the old parity block and old data block must be read and XOR'd, and the resulting sum must then be XOR'd with the new data. Both the data and the parity blocks then must be rewritten to the disk drives. While the two Read operations may be done in parallel, as can the two Write operations, modification of a block of data in a RAID 4 or a RAID 5 system still takes substantially longer than the same operation on a conventional disk. A conventional disk does not require the preliminary Read operation, and thus does not have to wait for the disk drive to rotate back to the previous position in order to perform the Write operation. The rotational latency time alone can amount to about 50% of the time required for a typical data modification operation. Further, two disk storage units are involved for the duration of each data modification operation, limiting the throughput of the system as a whole.

Despite the Write performance penalty, RAID 5 type systems have become increasingly popular, since they provide high data reliability with a low overhead cost for redundancy, good Read performance, and fair Write performance. However, it would be desirable to have the benefits of a RAID 5 system without the Write performance penalty resulting from the rotational latency time imposed by the parity update operation.

The present invention provides such a system.
SUMMARY OF THE INVENTION
The present invention solves the error-correction block bottleneck inherent in a RAID 5 architecture by recognition that storage unit accesses are intermittent. That is, at various times one or more of the storage units in a RAID 5 system are idle in terms of access requests by the CPU. This characteristic can be exploited by providing a "copyback cache" storage unit as an adjunct to a standard RAID system. The present invention provides two alternative methods of operating such a system.

In both embodiments, when a Write occurs to the RAID system, the data is immediately written to the first available location in the copyback cache storage unit. Upon completion of the Write to the copyback cache storage unit, the host CPU is immediately informed that the Write was successful. Thereafter, further storage unit accesses by the CPU can continue without waiting for an error-correction block update for the data just written.
15 In the fjrst embodiment at the inverl"tton, during idle lima for relevant storage units of the storage system. an error~corractian bloGc (e.g., XOR parity? is computed for each "pending" data block on the copybook cache storage unit, and the data block and corresponding error-correction black are copied to their proper location in the FIAIC system. Optionally, if a number of pending data blocks are to be 2D written to the same sbfpe, an ~rrQr-correction block can be calculated from all data blocks in the stripe at one time, thus achieving some econarny of time.
In this embodiment, me copybook cache storage unit in eftact stores "peak load"
Write data and then completes the actual Write aperatjons to the FtAlO system during relatively quiescent periods of IIO accesses by the GPU.
In the second embodlmertt of the inver>tion, after Wrtte data is logged to the copybook cache storage unit, normal Read-Modify~Write operation by the f~AID
system controller continues in overlapped tashlon with other CPU IIO
accessras, using Write data in the Controller's buffer memory. Performanc~ is enhanced because the CPU cgn continue processing as soon as the simple carne operation -8..
to the copybaGk cache strarage unit completes, thin eliminating the delay caused by a normal Read-Madlfy-Write RAID system. In this embodimsrrt, the copybook cache storage unit sots more as a running "log° of Wr'ttg data. Data integrity is proservsd since the Write data is saved to the colayback cache storage unit end thus aooesstble even it the Raad-Modify-Write operation to the RAID system never con9(3let~DS.
'1°h~ copybook cache storage unit is preferably non-voisdile, so that data will net be lost on a power tallure. ft the c~opybadt cache storage unit is a dlak drhre, it preferably Is paired wfth a 'mirror" storage unit for fault tolerance.
Opttonaliy, the t t) copyback cache storage unit may ba a solid~te storage unit, which cart aohleve substaraially faster Write and error-corractiQn bk~dc update times titan a disk drive.
The details of the preferred embodiments of ute present tnverttton are set forth in the ar~,.compartying drawings and the descxiption below. Once the details vt the invention ar~ known, numerous addidonal innovsftions and changes will become obvious to one skilled in zhe art.
.g.
~i~i~EE~C~I~'Y'i~~ ~p "Ci'iE A'~ill~9C~
F1C~U~4E 9 is black dlagrarn of a copybac#c cache FIAIt~ system 1n asxordawra with the preserr2 irrverrblon.
~IGIIRE 2 Is a flo~nr-chart of Reaud and t~lriie op~r~don in ~ccorcianwith a ttrst embodlmant of the preaanrc invr~.
FiGIJt~E 3 is a flow-chart of Head anc! Write oper~tlon in accordance with s~
second embodiment at the present tnvertdon.
like reference numbers and designations in the drawings refer to Ilke elements.
p~°f'P~d' Pit39SC
CP~I~~J C~SCRI°iO~i CAF 1"~ti~'~'°'PIC~~i "itarouphout this dsscription> the pee?erred ~mbc~dimarits and examples shown should be considered as exemplars, rather than lirv~itrstiona on the present irlon.
Fi(~tJt~~ 1 is blocac diagram of ~9 copyback cache i~Ait~ system in accordance with ttte present invention. Shown are a CPU 1 coupled by a~ bus ~ to an array controller 3, whioit in the prv~ferred embodiment is a fautt~toi~rant cnritroller. 'fh~
array controller 3 is cqupted to each of the plurality of storag~ units s1-S5 (flue being shown by way of exempla only) by an I/O bus (e.g., a SCS1 bus). The stqrage units S1-SS ate failure independent, meaning that the failure of one unit saes not affect the physical operakton of other units. The array controller 3 is preferably includes a separately programmable processor (tar ~xample, the MIPS
RRtSC processor, made by MIPS of Sunnyvale, California) which can act independently of the Ci'U 1 to cortteol the storage units.
Atso attached to the controller 3 Is a copyback cache storage unit CC, which in the preferred embodiment is coupled to flue common I/O bus (8.g., a 8CSI bus) so that data cart be transferred berivaen the copyback cache storage unit CC
and the storage units S1-SS. The copybaCk cache storage unit CC is pr~ferat~ty non-volatila, so that data will not be lost on a power failure. If the copyback cache storage unit CC is a disk drive, it preferably is paired with a "mirror"
storage unit CC' for fault tolerance. The mirror storage unk CC' is coupled to the cortCrotter 3 such that all data written to the copyback cache storage unit CC is also written essentially simuttaneousty to the mirror storage unit CC', in known fashion.
Optionally, the copybadc cache storage unit CC may ba a solld~state storage unit, 5 which can achiev6 substantially faster Write and error-correc~tlon block update times than a disk drive. tn sucn a case, the solid-stat~ storage unit preferably includes error-datecHon and correction circuitry, and is either non-voiatil4 or has a battery backup on the power supply.
Tho storar~~ unitt~ S1-;~ cba groupaet into orm or mare r~urldancy groups.
1r3 the iiiustrated a~aralpt~s ~dascribad bolaw, ttto redundancy group comprises all of the s'tatage units S1-SS, for slmpllotty of oxplanatir~n.
The present invention is preferably implemented as a computer program executed by the controller 3. FIGURE 2 is a high-level flowchart representing the steps of the Read and Write processes for a first embodiment of the invention. FIGURE 3 is a high-level flowchart representing the steps of the Read and Write processes for a second embodiment of the invention. The steps shown in FIGURES 2 and 3 are referenced below.
The Peak Load Embodiment

The controller 3 monitors input/output requests from the CPU 1 on essentially a continuous basis (Step 20). If a Write request is pending (Step 21), the data block is immediately written to the first available location in the copyback cache storage unit CC (Step 22) (the data block is also stored on the mirror storage unit CC', if present). Preferably, writing begins at the first logical block on the copyback cache storage unit CC, and continues sequentially to the end of the logical blocks. Thereafter, writing commences again at the first block (so long as no blocks are overwritten that have not been stored in the array). This preferred method minimizes time-consuming SEEK operations (i.e., physical movements of a Read/Write head in a storage unit) in the copyback cache storage unit CC.
Each data block stored on the copyback cache storage unit CC is also flagged with the location in the array where the data block is ultimately to be stored, and a pointer is set to indicate that the data block is in the copyback cache storage unit CC (Step 23). This location and pointer information is preferably kept in a separate table in memory or on the copyback cache storage unit CC. The table preferably comprises a directory table having entries that include standard information regarding the size, attributes, and status of each data block. In addition, each entry has one or more fields indicating whether the data block is stored on the copyback cache storage unit CC or in the array (S1-S5), and the "normal" location in the array for the data block. Creation of such directory tables is well-known in the art.
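The directory table described above can be sketched as a simple record type. The field names below are illustrative assumptions for explanation only, not identifiers from the patent:

```python
from dataclasses import dataclass

CACHE, ARRAY = "CC", "ARRAY"   # where the current copy of a data block resides

@dataclass
class DirectoryEntry:
    """One directory-table entry; field names are illustrative only."""
    size: int              # size of the data block
    attributes: int        # standard attribute flags
    status: str            # e.g., "valid" or "pending"
    location: str          # CACHE if the block is on unit CC, ARRAY otherwise
    array_addr: tuple      # "normal" (storage unit, block number) in the array
    cache_addr: int = -1   # block number on unit CC while the block is pending
```

An entry whose `location` is `CACHE` tells the controller that the newest copy of the block still resides on the copyback cache storage unit CC rather than at its normal array location.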
If a data block is written to the copyback cache storage unit CC while a data block to be stored at the same location in the array is still a "pending block" (a data block that has been written to the copyback cache storage unit CC but not transferred to the array S1-S5), the directory location pointer for the data block is changed to point to the "new" version rather than to the "old" version. The old version is thereafter ignored, and may be written over in subsequent operations.

After a Write request is processed in this fashion, the controller 3 immediately sends an acknowledgement to the CPU 1 indicating that the Write operation was successful (Step 24). The monitoring process then repeats (Step 25). Further storage unit accesses by the CPU 1 can continue without waiting for an error-correction block update for the data block just written. Thus, the Write "through-put" time of the array appears to be the same as a non-redundant system, since storage of the Write data on the copyback cache storage unit CC does not require the Read-Modify-Write sequence of a standard RAID system with respect to operation of the CPU 1.
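A minimal sketch of this Write path, assuming an in-memory dictionary stands in for the directory table and a Python list stands in for the copyback cache storage unit CC (all names are illustrative):

```python
class CopybackCacheWriter:
    """Sketch of the Write path of the first embodiment; names illustrative."""

    def __init__(self, cache_blocks: int):
        self.cache = [None] * cache_blocks   # copyback cache storage unit CC
        self.next_slot = 0                   # next sequential location on CC
        self.directory = {}                  # array location -> slot on CC

    def write(self, array_addr, data) -> str:
        # Write to the first available location on CC, advancing
        # sequentially and wrapping to the first block after the last one.
        slot = self.next_slot
        self.cache[slot] = data
        self.next_slot = (slot + 1) % len(self.cache)
        # Flag the block with its ultimate array location; a newer version
        # simply supersedes any older pending version's directory pointer.
        self.directory[array_addr] = slot
        # Acknowledge immediately -- no Read-Modify-Write sequence yet.
        return "ack"
```

Note the design point: the acknowledgement is returned as soon as the block lands on CC, which is what hides the error-correction update latency from the CPU.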
If a Write request is not pending (Step 21), the controller 3 tests whether a Read request is pending (Step 26). If a Read request is pending, the controller 3 reads the directory table to determine the location of each requested data block (Step 27). If a requested data block is not in the array (Step 28), the controller 3 reads the block from the copyback cache storage unit CC and transfers it to the CPU 1 (Step 29). The monitoring process then repeats (Step 30). If the requested data block is in the array (Step 28), the controller 3 reads the block from the array (S1-S5) in normal fashion and transfers it to the CPU 1 (Step 31). The monitoring process then repeats (Step 32).
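The Read dispatch amounts to a directory lookup followed by a read from whichever device holds the current copy. A sketch, with hypothetical dictionary stand-ins for the directory, the cache unit, and the array:

```python
def read_block(array_addr, directory, cache, array):
    """Read dispatch sketch: serve the block from the copyback cache
    storage unit CC if a pending copy exists there, else from the array."""
    slot = directory.get(array_addr)
    if slot is not None:        # block has not yet been written to the array
        return cache[slot]      # read from unit CC
    return array[array_addr]    # normal array read
```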
Some embodiments of the invention may include disk cache memory in the controller 3. Read requests may of course be "transparently" satisfied from such a cache in known fashion.
If no Write or Read operation is pending for particular storage units in the array, indicating that those storage units are "idle" with respect to CPU 1 I/O accesses, the controller 3 checks to see if any data blocks are "pending blocks" flagged to locations on the idle storage units. If no pending blocks exist (Step 33), the controller 3 begins the monitoring cycle again (Step 34).
If a pending block does exist (Step 33), the controller 3 reads a pending block from the copyback cache storage unit CC (Step 35). The controller 3 then writes the pending block to the proper location in the array, and computes and stores a new error-correction block based upon the pending block.

In the preferred embodiment of the invention, the error-correction blocks contain parity information. Thus, update of the error-correction block for the pending block can be accomplished by reading the old data block and old error-correction block corresponding to the array location indicated by the location information for the pending block stored in the directory (Step 36). The controller 3 then XOR's the old data block, the pending data block, and the old error-correction block to generate a new error-correction block (Step 37). The new error-correction block and the pending block are then written to the array S1-S5 at their proper locations (Step 38).
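This parity update relies on the identity new parity = old data XOR old parity XOR pending data. A sketch, treating blocks as equal-length byte strings:

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def new_parity(old_data: bytes, old_parity: bytes, pending: bytes) -> bytes:
    """Read-Modify-Write parity update:
    XOR the old data block, the old parity block, and the pending block."""
    return xor_blocks(xor_blocks(old_data, old_parity), pending)
```

Because XOR'ing the old data block into the old parity cancels its contribution, the result equals the parity that would be obtained by recomputing the XOR of the whole stripe with the pending block in place.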
Optionally, if a number of pending blocks are to be written to the same stripe, error-correction can be calculated for all data blocks in the stripe at one time by reading all data blocks in the stripe that are not being updated, XOR'ing those data blocks with the pending blocks to generate a new error-correction block, and writing the pending blocks and the new error-correction block to the array. This may achieve some economy of time.
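This full-stripe alternative can be sketched as a single XOR reduction over all data blocks of the stripe, i.e., the unchanged blocks read from the array plus the pending blocks:

```python
from functools import reduce

def stripe_parity(blocks):
    """XOR together all data blocks of a stripe -- the blocks that are not
    being updated plus the pending blocks -- producing the stripe's new
    error-correction (parity) block in one pass."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)
```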
After the pending block is transferred from the copyback cache storage unit CC to the array, the directory entry for that block is modified to indicate that the data block is in the array rather than in the copyback cache storage unit CC (Step 39). Thereafter, the controller 3 begins the monitoring cycle again (Step 40).

Although the invention has been described in terms of a sequential branching process, the invention may also be implemented in a multi-tasking system as separate tasks executing concurrently. Thus, the Read and Write processes described above, as well as the transfer of pending data blocks, may be implemented as separate tasks executed concurrently. Accordingly, the tests indicated by Steps 21, 26, and 33 in FIGURE 2 may be implicitly performed in the calling of the associated tasks for Writing and Reading data blocks, and
transfer of pending blocks. Thus, for example, the transfer of a pending block from the copyback cache storage unit CC to a storage unit in the array may be performed concurrently with a Read operation to a different storage unit in the array. Further, if the array is of the type that permits the controller 3 to "stack" a number of I/O requests for each storage unit of the array (as is the case with many SCSI-based RAID systems), the operations described above may be performed "concurrently" with respect to accesses to the same storage unit.
The Data Log Embodiment

As in the embodiment described above, the controller 3 monitors input/output requests from the CPU 1 on essentially a continuous basis (Step 50). In this embodiment, the controller 3 is provided with a relatively large (for example, one megabyte) data buffer to temporarily store data to be written to the array. If a Write request is pending (Step 51), the data block is immediately written by the controller 3 to the first available location in the copyback cache storage unit CC
(Step 52) (the data block is also stored on the mirror storage unit CC', if present).
Preferably, writing begins at the first logical block on the copyback cache storage unit CC, and continues sequentially to the end of the logical blocks. Thereafter, writing commences again at the first block (so long as no blocks are overwritten that have not been stored in the array). This preferred method minimizes SEEK operations in the copyback cache storage unit CC.
In the first embodiment, SEEK operations are required to retrieve pending blocks during idle times to transfer to the array. In this embodiment, the copyback cache storage unit CC acts as a running "log" of Write data. In contrast with the first embodiment, SEEK operations normally are necessary only to change to a next data-receiving area (e.g., a next cylinder in a disk drive) when the current area is full, or to reset the Read/Write head back to the logical beginning of the storage unit after reaching the end, or to retrieve data blocks after a failure.
Each data block stored on the copyback cache storage unit CC is also flagged with the location in the array where the data block is ultimately to be stored and the location of the data block in the copyback cache storage unit CC, and a pointer is set to indicate that the data block is in the controller buffer (Step 53). As before, such location and pointer information is preferably kept in a directory table.
Because of the buffer in the controller 3, the definition of a "pending block" in the second embodiment differs somewhat from the definition in the first embodiment described above. A "pending block" is a data block that has been written to the copyback cache storage unit CC but not transferred from the controller buffer to the array S1-S5.
If a data block is written to the copyback cache storage unit CC while a data block to be stored at the same location in the array is still a "pending block" in the controller buffer, the directory location pointers for the data block are changed to point to the "new" version rather than to the "old" version both in the copyback cache storage unit CC and in the buffer. The old version is thereafter ignored, and may be written over in subsequent operations.
After a Write request is processed in this fashion, the controller 3 immediately sends an acknowledgement to the CPU 1 indicating that the Write operation was successful (Step 54). The monitoring process then repeats (Step 55). Further storage unit accesses by the CPU 1 can continue without waiting for an error-correction block update for the data block just written. Thus, the Write response time of the array appears to be the same as a non-redundant system, since storage of the Write data on the copyback cache storage unit CC does not require the Read-Modify-Write sequence of a standard RAID system with respect to operation of the CPU 1.
If a Write request is not pending (Step 51), the controller 3 tests whether a Read request is pending (Step 56). If a Read request is pending, the controller 3 reads the directory table to determine the location of each requested data block (Step 57). If a requested data block is in the array (Step 58), the controller 3 reads the block from the array (S1-S5) in normal fashion and transfers it to the CPU 1 (Step 59). The monitoring process then repeats (Step 60).
If a requested data block is not in the array (Step 58), it is in the buffer of the controller 3. The controller 3 transfers the data block from its buffer to the CPU 1 (Step 61). This operation is extremely fast compared to the first embodiment, since the buffer operates at electronic speeds with no mechanically-imposed latency period. The monitoring process then repeats (Step 62).
If no Write or Read operation is pending for particular storage units in the array, indicating that those storage units are "idle" with respect to CPU 1 I/O accesses, the controller 3 checks to see if any data blocks in its buffer are "pending blocks" flagged to locations on the idle storage units. If no pending blocks exist (Step 63), the controller 3 begins the monitoring cycle again (Step 64).
If a pending block does exist (Step 63), the controller 3 accesses the pending block (Step 65), and then computes and stores a new error-correction block based upon the pending block. As before, in the preferred embodiment of the invention, the error-correction blocks contain parity information. Thus, update of the error-correction block for the pending block can be accomplished by reading the old data block and old error-correction block corresponding to the array location indicated by the location information for the pending block stored in the directory (Step 66). The controller 3 then XOR's the old data block, the pending data block, and the old error-correction block to generate a new error-correction block (Step 67). The new error-correction block and the pending block are then written to the array S1-S5 (Step 68).
Optionally, if a number of pending blocks are to be written to the same stripe, error-correction can be calculated for all data blocks in the stripe at one time by reading all data blocks in the stripe that are not being updated, XOR'ing those data blocks with the pending blocks to generate a new error-correction block, and writing the pending blocks and the new error-correction block to the array. This may achieve some economy of time.
After the pending block is transferred from the buffer of the controller 3 to the array, the directory is modified to indicate that the pending block is no longer valid in the copyback cache storage unit CC or in the buffer (Step 69). The old pending block is thereafter ignored, and may be written over in subsequent operations. The controller 3 then restarts the monitoring cycle (Step 70).
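The idle-time transfer can be sketched end to end as follows. The dictionary-based model below is an illustrative simplification: real parity is one block per stripe on a separate storage unit, not one block per data address.

```python
def flush_pending_block(addr, buffer, directory, array_data, array_parity):
    """Transfer one pending block during idle time (illustrative sketch).
    Parity is modeled per-address here for brevity; in the array it is a
    per-stripe block on a separate storage unit."""
    pending = buffer[addr]
    old_data = array_data[addr]          # read old data block
    old_parity = array_parity[addr]      # read old error-correction block
    array_parity[addr] = bytes(          # XOR old data, old parity, and
        a ^ b ^ c for a, b, c in         # pending data to form new parity
        zip(old_data, old_parity, pending))
    array_data[addr] = pending           # write pending block to the array
    del buffer[addr]                     # pending copy no longer valid
    del directory[addr]
```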
If a failure of the system occurs before all pending blocks are written from the buffer to the array, the controller 3 can read the pending blocks from the copyback cache storage unit CC that were not written to the array. The controller 3 then writes the selected pending blocks to the array.
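A sketch of this recovery pass, assuming the copyback cache log preserves write order so that a later entry for the same array address supersedes an earlier one:

```python
def recover_pending(cache_log, array, already_written):
    """Replay the copyback cache log after a failure: every logged write
    whose array address was never transferred is applied to the array.
    Entries are replayed in write order, so a later write to the same
    address supersedes an earlier one."""
    for addr, data in cache_log:
        if addr not in already_written:
            array[addr] = data
    return array
```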
Again, although the invention has been described in terms of a sequential branching process, the invention may also be implemented in a multi-tasking system as separate tasks executing concurrently. Accordingly, the tests indicated by Steps 51, 56, and 63 in FIGURE 3 may be implicitly performed in the calling of the associated tasks for Writing and Reading data blocks, and transfer of pending blocks.
The present invention therefore provides the benefits of a RAID system without the Write performance penalty resulting from the rotational latency time imposed by the standard error-correction update operation, so long as a non-peak-load condition exists with respect to I/O accesses by the CPU 1. Idle time for any of the array storage units is productively used to allow data stored on the copyback cache storage unit CC to be written to the array (either from the cache itself, or from the controller buffer) during moments of relative inactivity by the CPU 1, thus improving overall performance.
A number of embodiments of the present invention have been described.
Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, the present invention can be used with RAID 3, RAID 4, or RAID 5 systems. Furthermore, an error-correction method in addition to or in lieu of XOR-generated parity may be used for the necessary redundancy information. One such method using Reed-Solomon codes is disclosed in U.S.
Patent No. 5,148,432.
As another example, in many RAID systems, a "hot spare" storage unit is provided to immediately substitute for any active storage unit that fails. The present invention may be implemented by using such a "hot spare" as the copyback cache storage unit CC, thus eliminating the need for a storage unit dedicated to the copyback cache function. If the "hot spare" is needed for its primary purpose, the RAID system can fall back to a non-copyback caching mode of operation until a replacement disk is provided.
As yet another example, the copyback cache storage unit CC may be attached to the controller 3 through a dedicated bus, rather than through the preferred common I/O bus (e.g. a SCSI bus).
Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiment, but only by the scope of the appended claims.
Claims (41)
1. A fault-tolerant storage device array including:
a. a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks;
b. at least one copyback cache storage unit for temporarily storing data blocks;
c. a storage unit controller, coupled to the plurality of storage units and to the at least one copyback cache storage unit, including control means for:
(1) writing received data blocks initially onto the at least one copyback cache storage unit as pending data blocks;
(2) during idle time of at least some of the plurality of storage units:
(a) reading at least one pending data block from at least one copyback cache storage unit;
(b) accessing the storage units and reading information corresponding to each read pending data block;
(c) generating an associated error-correction block from the read information and each read pending data block;
(d) writing each such read pending data block and associated error-correction block to a corresponding stripe of the idle storage units;
(3) reading requested data blocks from at least one copyback cache storage unit when such requested data blocks have not been written to the plurality of storage units, otherwise from the plurality of storage units.
2. The storage device array of claim 1, wherein the control means acknowledges completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
3. The storage device array of claim 1, wherein the information corresponding to each read pending data block includes a corresponding old error-correction block and corresponding old data block read from the corresponding stripe of idle storage units.
4. The storage device array of claim 2, wherein generating a new error-correction block further includes a. reading a corresponding old data block from the corresponding stripe of the idle storage units;
b. reading a corresponding old error-correction block from the corresponding stripe of the idle storage units;
c. exclusively-OR'ing the old data block, the old error correction block, and the read pending data block, thereby generating a new error-correction block.
5. The fault-tolerant storage device array of claim 1, wherein at least one copyback cache storage unit is non-volatile.
6. A method for storing data in a fault-tolerant storage device array comprising a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks, including the steps of:
a. providing at least one copyback cache storage unit for temporarily writing received data blocks;
b. writing received data blocks initially onto the at least one copyback cache storage unit as pending data blocks;
c. during idle time of at least one of the plurality of storage units:
(1) reading at least one pending data block from at least one copyback cache storage unit;
(2) accessing the storage units and reading information corresponding to each read pending data block;
(3) generating an associated error-correction block from the read information and each such read pending data block;
(4) writing each such read pending data block and associated error-correction block to a corresponding stripe of the idle storage units;
d. reading requested data blocks from at least one copyback cache storage unit when such requested data blocks have not been written to the plurality of storage units, otherwise from the plurality of storage units.
7. The method of claim 6, further including the step of acknowledging completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
8. The method of claim 6, wherein the step of generating an associated error-correction block from the read information and each read pending data block comprises the steps of:
a. generating a new error-correction block as a function of at least the read pending data block, and a corresponding old error-correction block and corresponding old data block read from the corresponding stripe of the idle storage units.
9. The method of claim 8, wherein the step of generating a new error-correction block comprises the steps of:
a. reading a corresponding old data block from the corresponding stripe of the idle storage units;
b. reading a corresponding old error-correction block from the corresponding stripe of the idle storage units;
c. exclusively-OR'ing the old data block, the old error-correction block, and the read pending data block, thereby generating a new error-correction block.
10. The method of claim 6, wherein at least one copyback cache storage unit is non-volatile.
11. A fault-tolerant storage device array including:
a. a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks;
b. at least one copyback cache storage unit for temporarily storing data blocks;
c. a storage unit controller, coupled to the plurality of storage units and to the at least one copyback cache storage unit, having a buffer memory and including control means for:
(1) writing received data blocks initially onto the at least one copyback cache storage unit;
(2) temporarily storing received data blocks in the buffer memory as pending data blocks;
(3) during idle time of at least some of the plurality of storage units:
(a) accessing at least one pending data block from the buffer memory;
(b) accessing the storage units and reading information corresponding to each accessed pending data block;
(c) generating an associated error-correction block from the read information and each accessed pending data block;
(d) writing each such accessed pending data block and associated error-correction block to a corresponding stripe of the idle storage units;
(4) reading requested data blocks from the buffer memory when such requested data blocks have not been written to the plurality of storage units, otherwise from the plurality of storage units.
12. The storage device array of claim 11, wherein the control means acknowledges completion of writing a received record to the at least one copyback cache storage unit before writing the received record to one of the storage units.
13. The storage device array of claim 11, wherein the read information includes a corresponding old error-correction block and corresponding old data block read from the corresponding stripe of the idle storage units.
14. The storage device array of claim 13, wherein generating a new error-correction block further includes a. reading a corresponding old data block from the corresponding stripe of the idle storage units;
b. reading a corresponding old error-correction block from the corresponding stripe of the idle storage units;
c. exclusively-OR'ing the old data block, the old error-correction block, and the accessed pending data block, thereby generating a new error-correction block.
15. The storage device array of claim 11, further including means for reading selected data blocks from the at least one copyback cache storage unit and writing such selected data blocks to the plurality of storage units upon a failure of the storage unit controller to write all corresponding data blocks from the buffer memory to the plurality of storage units.
16. The fault-tolerant storage device array of claim 11, wherein at least one copyback cache storage unit is non-volatile.
17. A method for storing data in a fault-tolerant storage device array comprising a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks, including the steps of:
a. providing buffer memory and at least one copyback cache storage unit for temporarily storing data blocks;
b. writing received data blocks initially onto the at least one copyback cache storage unit;
c. temporarily storing received data blocks in the buffer memory as pending data blocks;
d. during idle time of at least one of the plurality of storage units:
(1) accessing at least one pending data block from the buffer memory;
(2) accessing the storage units and reading the information corresponding to each accessed pending data blocks;
(3) generating an associated error-correction block from the read information and each such accessed pending data block;
(4) writing each such accessed pending data block and associated error-correction block to a corresponding stripe of the idle storage units;
e. reading requested data blocks from the buffer memory when such requested data blocks have not been written to the plurality of storage units, otherwise from the plurality of storage units.
18. The method of claim 17, further including the step of acknowledging completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
19. The method of claim 17, wherein the step of generating an associated error-correction block for each accessed pending data block comprises the steps of:
a. generating a new error-correction block as a function of at least the accessed pending data block, and a corresponding old error-correction block and corresponding old data block read from the corresponding stripe of the idle storage units.
20. The method of claim 19, wherein the step of generating a new error-correction block comprises the steps of:
a. reading a corresponding old data block from the corresponding stripe of the idle storage units;
b. reading a corresponding old error-correction block from the corresponding stripe of the idle storage units;
c. exclusively-OR'ing the old data block, the old error-correction block, and the accessed pending data block, thereby generating a new error-correction block.
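The read-modify-write of steps a through c reduces to a byte-wise XOR of three blocks. A minimal sketch (block contents and the `xor_blocks` helper are illustrative assumptions, not part of the claims):

```python
def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# A two-data-block stripe plus its error-correction block (4-byte blocks for brevity).
d0 = bytes([0x0F, 0x33, 0x55, 0xFF])
d1 = bytes([0xA0, 0x14, 0x22, 0x0C])
parity = xor_blocks(d0, d1)                   # initial error-correction block

# A pending data block replaces d0; steps a-c compute the new parity as
# old data XOR old error-correction block XOR pending data.
pending = bytes([0x1E, 0x66, 0xAA, 0x00])
new_parity = xor_blocks(d0, parity, pending)

# The stripe invariant holds: XOR of all data blocks and parity is zero,
# so any one block remains reconstructible from the others.
assert xor_blocks(pending, d1, new_parity) == bytes(4)
```

Only the changed data block and the old parity need to be read, regardless of how many storage units the stripe spans; that is what makes the idle-time update cheap.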
21. The method of claim 17, further including the steps of reading selected data blocks from the at least one copyback cache storage unit and writing such selected data blocks to the plurality of storage units upon a failure of the storage unit controller to write all corresponding data blocks from the buffer memory to the plurality of storage units.
22. The method of claim 17, wherein at least one copyback cache storage unit is non-volatile.
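Claims 21 and 22 together describe the recovery path: if the controller fails before flushing its volatile buffer memory, the pending blocks survive on the non-volatile copyback cache and can be replayed to the array. A schematic sketch (the dictionaries stand in for the physical units; names are illustrative):

```python
# Volatile buffer memory is lost on a controller failure; the non-volatile
# copyback cache still holds every block not yet written to the array.
nonvolatile_cache = {3: b"PEND"}   # stripe -> pending data block
buffer_memory = {}                 # wiped by the failure
array = {3: b"OLD!"}               # data blocks on the storage units

# Recovery per claim 21: read each selected pending block back from the
# copyback cache and write it to the corresponding stripe of the array.
for stripe, block in nonvolatile_cache.items():
    array[stripe] = block
nonvolatile_cache.clear()
```

No acknowledged write is lost, because the acknowledgment of claim 18 is only issued after the cache copy is durable.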
23. A fault-tolerant storage device array including:
a. a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks;
b. at least one copyback cache storage unit for temporarily storing data blocks;
and c. a storage unit controller, coupled to the plurality of storage units and to the at least one copyback cache storage unit, including control means for:
(1) writing received data blocks initially onto the at least one copyback cache storage unit as pending data blocks;
(2) during idle time of at least some of the plurality of storage units:
(a) reading at least one pending data block from at least one copyback cache storage unit;
(b) accessing the storage units and reading information corresponding to each read pending data block;
(c) generating an associated error-correction block from the read information and each read pending data block; and
(d) writing each such read pending data block and associated error-correction block to a corresponding stripe of the idle storage units; and
(3) acknowledging completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
24. The fault-tolerant storage device array of claim 23, wherein at least one copyback cache storage unit is non-volatile.
25. A fault-tolerant storage device array including:
a. a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks;
b. at least one copyback cache storage unit for temporarily storing data blocks, at least one of said at least one copyback cache storage unit being non-volatile;
and c. a storage unit controller, coupled to the plurality of storage units and to the at least one copyback cache storage unit, including control means for:
(1) writing received data blocks initially onto the at least one copyback cache storage unit as pending data blocks;
(2) during idle time of at least some of the plurality of storage units:
(a) reading at least one pending data block from at least one copyback cache storage unit;
(b) accessing the storage units and reading information corresponding to each read pending data block;
(c) generating an associated error-correction block from the read information and each read pending data block; and
(d) writing each such read pending data block and associated error-correction block to a corresponding stripe of the idle storage units.
26. The storage device array of claim 25, wherein the control means acknowledges completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
27. A fault-tolerant storage device array including:
a. a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks;
b. at least one copyback cache storage unit for temporarily storing data blocks, at least one of said at least one copyback cache storage unit being non-volatile;
and c. a storage unit controller, coupled to the plurality of storage units and to the at least one copyback cache storage unit, including control means for:
(1) writing received data blocks initially onto the at least one copyback cache storage unit as pending data blocks;
(2) during idle time of at least some of the plurality of storage units:
(a) reading at least one pending data block from at least one copyback cache storage unit;
(b) accessing the storage units and reading information corresponding to each read pending data block;
(c) generating an associated error-correction block from the read information and each read pending data block; and
(d) writing each such read pending data block and associated error-correction block to a corresponding stripe of the idle storage units;
and (3) acknowledging completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
28. A fault-tolerant storage device array including:
a. a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks;
b. at least one copyback cache storage unit for temporarily storing data blocks;
and c. a storage unit controller, coupled to the plurality of storage units and to the at least one copyback cache storage unit, including control means for:
(1) writing received data blocks initially onto the at least one copyback cache storage unit as pending data blocks;
(2) reading at least one pending data block from at least one copyback cache storage unit;
(3) accessing the storage units and reading information corresponding to each read pending data block;
(4) generating an associated error-correction block from the read information and each read pending data block;
(5) writing each such read pending data block and associated error-correction block to a corresponding stripe of the storage units; and
(6) acknowledging completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
29. The storage device array of claim 28, wherein the information corresponding to each read pending data block includes a corresponding old error-correction block and corresponding old data block read from the corresponding stripe of the storage units.
30. The storage device array of claim 28, wherein generating an error-correction block further includes:
a. reading a corresponding old data block from the corresponding stripe of the storage units;
b. reading a corresponding old error-correction block from the corresponding stripe of the storage units; and
c. exclusively-ORing the old data block, the old error-correction block and the read pending data block, thereby generating a new error-correction block.
31. The storage device array of claim 28, wherein at least one copyback cache storage unit is non-volatile.
32. The storage device array of claim 28, wherein said storage unit controller further includes control means for reading requested data blocks from at least one copyback cache storage unit when such requested data blocks have not been written to the plurality of storage units, otherwise from the plurality of storage units.
33. A fault-tolerant storage device array including:
a. a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks;
b. at least one copyback cache storage unit for temporarily storing data blocks, at least one of said at least one copyback cache storage unit being non-volatile;
and c. a storage unit controller, coupled to the plurality of storage units and to the at least one copyback cache storage unit, including control means for:
(1) writing received data blocks initially onto the at least one copyback cache storage unit as pending data blocks;
(2) reading at least one pending data block from at least one copyback cache storage unit;
(3) accessing the storage units and reading information corresponding to each read pending data block;
(4) generating an associated error-correction block from the read information and each read pending data block; and
(5) writing each such read pending data block and associated error-correction block to a corresponding stripe of the storage units.
34. The storage device array of claim 33, wherein the control means acknowledges completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
35. The storage device array of claim 34, wherein the information corresponding to each read pending data block includes a corresponding old error-correction block and corresponding old data block read from the corresponding stripe of the storage units.
36. The storage device array of claim 35, wherein generating an error-correction block further includes:
a. reading a corresponding old data block from the corresponding stripe of the storage units;
b. reading a corresponding old error-correction block from the corresponding stripe of the storage units; and
c. exclusively-ORing the old data block, the old error-correction block and the read pending data block, thereby generating a new error-correction block.
37. The storage device array of claim 33, wherein said storage unit controller further includes control means for reading requested data blocks from at least one copyback cache storage unit when such requested data blocks have not been written to the plurality of storage units, otherwise from the plurality of storage units.
38. A fault-tolerant storage device array including:
a. a plurality of failure independent storage units for storing information as stripes of blocks, including at least data blocks and associated error-correction blocks;
b. at least one copyback cache storage unit for temporarily storing data blocks, at least one of said at least one copyback cache storage unit being non-volatile;
and c. a storage unit controller, coupled to the plurality of storage units and to the at least one copyback cache storage unit, including control means for:
(1) writing received data blocks initially onto the at least one copyback cache storage unit as pending data blocks;
(2) reading at least one pending data block from at least one copyback cache storage unit;
(3) accessing the storage units and reading information corresponding to each read pending data block;
(4) generating an associated error-correction block from the read information and each read pending data block;
(5) writing each such read pending data block and associated error-correction block to a corresponding stripe of the storage units; and
(6) acknowledging completion of writing each received data block to the at least one copyback cache storage unit before writing such data block to one of the storage units.
39. The storage device array of claim 38, wherein the information corresponding to each read pending data block includes a corresponding old error-correction block and corresponding old data block read from the corresponding stripe of the storage units.
40. The storage device array of claim 38, wherein generating an error-correction block further includes:
a. reading a corresponding old data block from the corresponding stripe of the storage units;
b. reading a corresponding old error-correction block from the corresponding stripe of the storage units; and
c. exclusively-ORing the old data block, the old error-correction block and the read pending data block, thereby generating a new error-correction block.
41. The storage device array of claim 38, wherein said storage unit controller further includes control means for reading requested data blocks from at least one copyback cache storage unit when such requested data blocks have not been written to the plurality of storage units, otherwise from the plurality of storage units.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US07/638,167 US5274799A (en) | 1991-01-04 | 1991-01-04 | Storage device array architecture with copyback cache |
US07/638,167 | 1991-01-04 |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2058734A1 CA2058734A1 (en) | 1992-07-05 |
CA2058734C true CA2058734C (en) | 2002-06-25 |
Family
ID=24558917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002058734A Expired - Lifetime CA2058734C (en) | 1991-01-04 | 1992-01-03 | Storage device array architecture with copyback cache |
Country Status (7)
Country | Link |
---|---|
US (4) | US5274799A (en) |
EP (1) | EP0493984B1 (en) |
JP (1) | JP3129732B2 (en) |
AU (1) | AU1001492A (en) |
CA (1) | CA2058734C (en) |
DE (1) | DE69126416T2 (en) |
WO (1) | WO1992012482A1 (en) |
US6023584A (en) * | 1997-01-03 | 2000-02-08 | Ncr Corporation | Installation of computer programs using disk mirroring |
JP3204143B2 (en) * | 1997-01-06 | 2001-09-04 | 日本電気株式会社 | Disk cache control method |
US6678462B1 (en) * | 1997-03-25 | 2004-01-13 | Sony Corporation | Electronic device, method and apparatus for controlling an electronic device, and electronic device control system |
US6154853A (en) * | 1997-03-26 | 2000-11-28 | Emc Corporation | Method and apparatus for dynamic sparing in a RAID storage system |
US5944838A (en) * | 1997-03-31 | 1999-08-31 | Lsi Logic Corporation | Method for fast queue restart after redundant I/O path failover |
US5974503A (en) * | 1997-04-25 | 1999-10-26 | Emc Corporation | Storage and access of continuous media files indexed as lists of raid stripe sets associated with file names |
US6202160B1 (en) | 1997-05-13 | 2001-03-13 | Micron Electronics, Inc. | System for independent powering of a computer system |
US6249828B1 (en) | 1997-05-13 | 2001-06-19 | Micron Electronics, Inc. | Method for the hot swap of a mass storage adapter on a system including a statically loaded adapter driver |
US6249885B1 (en) | 1997-05-13 | 2001-06-19 | Karl S. Johnson | Method for managing environmental conditions of a distributed processor system |
US6338150B1 (en) | 1997-05-13 | 2002-01-08 | Micron Technology, Inc. | Diagnostic and managing distributed processor system |
US6363497B1 (en) | 1997-05-13 | 2002-03-26 | Micron Technology, Inc. | System for clustering software applications |
US6292905B1 (en) | 1997-05-13 | 2001-09-18 | Micron Technology, Inc. | Method for providing a fault tolerant network using distributed server processes to remap clustered network resources to other servers during server failure |
US6247080B1 (en) | 1997-05-13 | 2001-06-12 | Micron Electronics, Inc. | Method for the hot add of devices |
US6418492B1 (en) | 1997-05-13 | 2002-07-09 | Micron Electronics | Method for computer implemented hot-swap and hot-add |
US6269417B1 (en) | 1997-05-13 | 2001-07-31 | Micron Technology, Inc. | Method for determining and displaying the physical slot number of an expansion bus device |
US6163853A (en) | 1997-05-13 | 2000-12-19 | Micron Electronics, Inc. | Method for communicating a software-generated pulse waveform between two servers in a network |
US6173346B1 (en) | 1997-05-13 | 2001-01-09 | Micron Electronics, Inc. | Method for hot swapping a programmable storage adapter using a programmable processor for selectively enabling or disabling power to adapter slot in response to respective request signals |
US6243773B1 (en) | 1997-05-13 | 2001-06-05 | Micron Electronics, Inc. | Configuration management system for hot adding and hot replacing devices |
US6253334B1 (en) * | 1997-05-13 | 2001-06-26 | Micron Electronics, Inc. | Three bus server architecture with a legacy PCI bus and mirrored I/O PCI buses |
US6324608B1 (en) | 1997-05-13 | 2001-11-27 | Micron Electronics | Method for hot swapping of network components |
US6134668A (en) | 1997-05-13 | 2000-10-17 | Micron Electronics, Inc. | Method of selective independent powering of portion of computer system through remote interface from remote interface power supply |
US5892928A (en) | 1997-05-13 | 1999-04-06 | Micron Electronics, Inc. | Method for the hot add of a network adapter on a system including a dynamically loaded adapter driver |
US6195717B1 (en) | 1997-05-13 | 2001-02-27 | Micron Electronics, Inc. | Method of expanding bus loading capacity |
US6304929B1 (en) | 1997-05-13 | 2001-10-16 | Micron Electronics, Inc. | Method for hot swapping a programmable adapter by using a programmable processor to selectively disabling and enabling power thereto upon receiving respective control signals |
US6247079B1 (en) | 1997-05-13 | 2001-06-12 | Micron Electronics, Inc | Apparatus for computer implemented hot-swap and hot-add |
US6170067B1 (en) | 1997-05-13 | 2001-01-02 | Micron Technology, Inc. | System for automatically reporting a system failure in a server |
US6179486B1 (en) | 1997-05-13 | 2001-01-30 | Micron Electronics, Inc. | Method for hot add of a mass storage adapter on a system including a dynamically loaded adapter driver |
US6330690B1 (en) | 1997-05-13 | 2001-12-11 | Micron Electronics, Inc. | Method of resetting a server |
US6249834B1 (en) | 1997-05-13 | 2001-06-19 | Micron Technology, Inc. | System for expanding PCI bus loading capacity |
US6170028B1 (en) | 1997-05-13 | 2001-01-02 | Micron Electronics, Inc. | Method for hot swapping a programmable network adapter by using a programmable processor to selectively disabling and enabling power thereto upon receiving respective control signals |
US6266721B1 (en) | 1997-05-13 | 2001-07-24 | Micron Electronics, Inc. | System architecture for remote access and control of environmental management |
US6243838B1 (en) | 1997-05-13 | 2001-06-05 | Micron Electronics, Inc. | Method for automatically reporting a system failure in a server |
US6145098A (en) | 1997-05-13 | 2000-11-07 | Micron Electronics, Inc. | System for displaying system status |
US6499073B1 (en) | 1997-05-13 | 2002-12-24 | Micron Electronics, Inc. | System using programmable processor for selectively enabling or disabling power to adapter in response to respective request signals |
US6282673B1 (en) | 1997-05-13 | 2001-08-28 | Micron Technology, Inc. | Method of recording information system events |
US5987554A (en) | 1997-05-13 | 1999-11-16 | Micron Electronics, Inc. | Method of controlling the transfer of information across an interface between two buses |
US6192434B1 (en) | 1997-05-13 | 2001-02-20 | Micron Electronics, Inc | System for hot swapping a programmable adapter by using a programmable processor to selectively disabling and enabling power thereto upon receiving respective control signals |
US6202111B1 (en) | 1997-05-13 | 2001-03-13 | Micron Electronics, Inc. | Method for the hot add of a network adapter on a system including a statically loaded adapter driver |
US6014759A (en) * | 1997-06-13 | 2000-01-11 | Micron Technology, Inc. | Method and apparatus for transferring test data from a memory array |
US6044429A (en) | 1997-07-10 | 2000-03-28 | Micron Technology, Inc. | Method and apparatus for collision-free data transfers in a memory device with selectable data or address paths |
JPH1153235A (en) * | 1997-08-08 | 1999-02-26 | Toshiba Corp | Data updating method of disk storage device and disk storage control system |
US6553404B2 (en) | 1997-08-08 | 2003-04-22 | Prn Corporation | Digital system |
WO1999008203A1 (en) | 1997-08-08 | 1999-02-18 | Pics Previews, Inc. | An audiovisual content distribution system |
US6381674B2 (en) * | 1997-09-30 | 2002-04-30 | Lsi Logic Corporation | Method and apparatus for providing centralized intelligent cache between multiple data controlling elements |
US6065096A (en) * | 1997-09-30 | 2000-05-16 | Lsi Logic Corporation | Integrated single chip dual mode raid controller |
US5975738A (en) * | 1997-09-30 | 1999-11-02 | Lsi Logic Corporation | Method for detecting failure in redundant controllers using a private LUN |
US6154835A (en) | 1997-10-01 | 2000-11-28 | Micron Electronics, Inc. | Method for automatically configuring and formatting a computer system and installing software |
US6199173B1 (en) | 1997-10-01 | 2001-03-06 | Micron Electronics, Inc. | Method for mapping environmental resources to memory for program access |
US6263387B1 (en) | 1997-10-01 | 2001-07-17 | Micron Electronics, Inc. | System for automatically configuring a server after hot add of a device |
US6175490B1 (en) | 1997-10-01 | 2001-01-16 | Micron Electronics, Inc. | Fault tolerant computer system |
US6212585B1 (en) | 1997-10-01 | 2001-04-03 | Micron Electronics, Inc. | Method of automatically configuring a server after hot add of a device |
US6085303A (en) * | 1997-11-17 | 2000-07-04 | Cray Research, Inc. | Serialized race-free virtual barrier network |
US5970232A (en) * | 1997-11-17 | 1999-10-19 | Cray Research, Inc. | Router table lookup mechanism |
US6035347A (en) * | 1997-12-19 | 2000-03-07 | International Business Machines Corporation | Secure store implementation on common platform storage subsystem (CPSS) by storing write data in non-volatile buffer |
JPH11203056A (en) | 1998-01-19 | 1999-07-30 | Fujitsu Ltd | Input/output controller and array disk device |
US6061750A (en) * | 1998-02-20 | 2000-05-09 | International Business Machines Corporation | Failover system for a DASD storage controller reconfiguring a first processor, a bridge, a second host adaptor, and a second device adaptor upon a second processor failure |
US6112311A (en) * | 1998-02-20 | 2000-08-29 | International Business Machines Corporation | Bridge failover system |
US6457130B2 (en) | 1998-03-03 | 2002-09-24 | Network Appliance, Inc. | File access control in a multi-protocol file server |
US6317844B1 (en) | 1998-03-10 | 2001-11-13 | Network Appliance, Inc. | File server storage arrangement |
US6421746B1 (en) | 1998-03-26 | 2002-07-16 | Micron Electronics, Inc. | Method of data and interrupt posting for computer devices |
DE19819531C1 (en) * | 1998-04-30 | 1999-12-02 | Siemens Ag | RISC processor with a debug interface unit |
US6865642B2 (en) * | 1998-06-24 | 2005-03-08 | International Business Machines Corporation | Method and apparatus for disk caching for an intermediary controller |
US6243827B1 (en) | 1998-06-30 | 2001-06-05 | Digi-Data Corporation | Multiple-channel failure detection in raid systems |
US6505305B1 (en) * | 1998-07-16 | 2003-01-07 | Compaq Information Technologies Group, L.P. | Fail-over of multiple memory blocks in multiple memory modules in computer system |
US6205503B1 (en) | 1998-07-17 | 2001-03-20 | Mallikarjunan Mahalingam | Method for the hot swap and add of input/output platforms and devices |
US6223234B1 (en) | 1998-07-17 | 2001-04-24 | Micron Electronics, Inc. | Apparatus for the hot swap and add of input/output platforms and devices |
US6343343B1 (en) | 1998-07-31 | 2002-01-29 | International Business Machines Corporation | Disk arrays using non-standard sector sizes |
US6243795B1 (en) | 1998-08-04 | 2001-06-05 | The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations | Redundant, asymmetrically parallel disk cache for a data storage system |
US6446237B1 (en) | 1998-08-04 | 2002-09-03 | International Business Machines Corporation | Updating and reading data and parity blocks in a shared disk system |
US6332197B1 (en) | 1998-08-04 | 2001-12-18 | International Business Machines Corp. | System for updating data in a multi-adaptor environment |
US6272662B1 (en) | 1998-08-04 | 2001-08-07 | International Business Machines Corporation | Distributed storage system using front-end and back-end locking |
US6128762A (en) * | 1998-08-04 | 2000-10-03 | International Business Machines Corporation | Updating and reading data and parity blocks in a shared disk system with request forwarding |
US6446220B1 (en) | 1998-08-04 | 2002-09-03 | International Business Machines Corporation | Updating data and parity data with and without read caches |
US6279138B1 (en) | 1998-08-04 | 2001-08-21 | International Business Machines Corporation | System for changing the parity structure of a raid array |
US6119244A (en) | 1998-08-25 | 2000-09-12 | Network Appliance, Inc. | Coordinating persistent status information with multiple file servers |
US6216174B1 (en) | 1998-09-29 | 2001-04-10 | Silicon Graphics, Inc. | System and method for fast barrier synchronization |
JP3511576B2 (en) * | 1998-10-02 | 2004-03-29 | 松下電器産業株式会社 | Disc recording / reproducing method and apparatus |
US6330687B1 (en) | 1998-11-13 | 2001-12-11 | Digi-Data Corporation | System and method to maintain performance among N single raid systems during non-fault conditions while sharing multiple storage devices during conditions of a faulty host computer or faulty storage array controller |
US6343984B1 (en) | 1998-11-30 | 2002-02-05 | Network Appliance, Inc. | Laminar flow duct cooling system |
US6769088B1 (en) * | 1999-06-30 | 2004-07-27 | Maxtor Corporation | Sector-coding technique for reduced read-after-write operations |
US6629199B1 (en) * | 1999-08-20 | 2003-09-30 | Emc Corporation | Digital data storage system including directory for efficiently providing formatting information for stored records and utilization of a check value for verifying that a record is from a particular storage location |
US6674720B1 (en) | 1999-09-29 | 2004-01-06 | Silicon Graphics, Inc. | Age-based network arbitration system and method |
US6467048B1 (en) * | 1999-10-07 | 2002-10-15 | Compaq Information Technologies Group, L.P. | Apparatus, method and system for using cache memory as fail-over memory |
US6553458B1 (en) * | 1999-12-14 | 2003-04-22 | Ncr Corporation | Integrated redundant storage device |
US6542960B1 (en) * | 1999-12-16 | 2003-04-01 | Adaptec, Inc. | System and method for parity caching based on stripe locking in raid data storage |
EP1128267A1 (en) * | 2000-02-25 | 2001-08-29 | Hewlett-Packard Company, A Delaware Corporation | Disk storage system having redundant solid state data storage devices |
US6701449B1 (en) | 2000-04-20 | 2004-03-02 | Ciprico, Inc. | Method and apparatus for monitoring and analyzing network appliance status information |
AU2001255523A1 (en) * | 2000-04-20 | 2001-11-07 | Ciprico Inc. | Method and apparatus for maintaining the integrity of configuration data in redundant, fault tolerant network appliances |
US6330642B1 (en) * | 2000-06-29 | 2001-12-11 | Bull Hn Information Systems Inc. | Three interconnected raid disk controller data processing system architecture |
US6802039B1 (en) * | 2000-06-30 | 2004-10-05 | Intel Corporation | Using hardware or firmware for cache tag and data ECC soft error correction |
US6728922B1 (en) | 2000-08-18 | 2004-04-27 | Network Appliance, Inc. | Dynamic data space |
US6636879B1 (en) | 2000-08-18 | 2003-10-21 | Network Appliance, Inc. | Space allocation in a write anywhere file system |
US7072916B1 (en) | 2000-08-18 | 2006-07-04 | Network Appliance, Inc. | Instant snapshot |
US6804819B1 (en) | 2000-09-18 | 2004-10-12 | Hewlett-Packard Development Company, L.P. | Method, system, and computer program product for a data propagation platform and applications of same |
US6725342B1 (en) * | 2000-09-26 | 2004-04-20 | Intel Corporation | Non-volatile mass storage cache coherency apparatus |
US6434682B1 (en) | 2000-09-28 | 2002-08-13 | International Business Machines Corporation | Data management system with shortcut migration via efficient automatic reconnection to previously migrated copy |
US6446160B1 (en) | 2000-09-28 | 2002-09-03 | International Business Machines Corporation | Multi-drive data storage system with analysis and selected demounting of idle data storage media |
US6604160B1 (en) | 2000-09-28 | 2003-08-05 | International Business Machines Corporation | Computing system arbitrating and selectively providing resource-seeking tasks with takeaway of non-shareable resources |
DE60114879T2 (en) * | 2000-12-20 | 2006-07-20 | Koninklijke Philips Electronics N.V. | Power failure backup device for recording/playback apparatus |
US6785767B2 (en) | 2000-12-26 | 2004-08-31 | Intel Corporation | Hybrid mass storage system and method with two different types of storage medium |
US6606690B2 (en) | 2001-02-20 | 2003-08-12 | Hewlett-Packard Development Company, L.P. | System and method for accessing a storage area network as network attached storage |
US6799284B1 (en) | 2001-02-28 | 2004-09-28 | Network Appliance, Inc. | Reparity bitmap RAID failure recovery |
US6865717B2 (en) * | 2001-05-30 | 2005-03-08 | International Business Machines Corporation | Method, system, and program for generating a progress indicator |
US6996668B2 (en) * | 2001-08-06 | 2006-02-07 | Seagate Technology Llc | Synchronized mirrored data in a data storage device |
US7275135B2 (en) * | 2001-08-31 | 2007-09-25 | Intel Corporation | Hardware updated metadata for non-volatile mass storage cache |
EP2017901A1 (en) * | 2001-09-03 | 2009-01-21 | Panasonic Corporation | Semiconductor light emitting device, light emitting apparatus and production method for semiconductor light emitting device |
US20030074524A1 (en) * | 2001-10-16 | 2003-04-17 | Intel Corporation | Mass storage caching processes for power reduction |
US6973537B1 (en) * | 2001-10-23 | 2005-12-06 | Emc Corporation | Disk cache interfacing system and method |
US7174422B1 (en) * | 2001-10-23 | 2007-02-06 | Emc Corporation | Data storage device with two-tier raid control circuitry |
US7502886B1 (en) | 2001-10-23 | 2009-03-10 | Emc Corporation | Data storage device with two-tier raid control circuitry |
US6851082B1 (en) | 2001-11-13 | 2005-02-01 | Network Appliance, Inc. | Concentrated parity technique for handling double failures and enabling storage of more than one parity block per stripe on a storage device of a storage array |
US7346831B1 (en) | 2001-11-13 | 2008-03-18 | Network Appliance, Inc. | Parity assignment technique for parity declustering in a parity array of a storage system |
US7640484B2 (en) * | 2001-12-28 | 2009-12-29 | Netapp, Inc. | Triple parity technique for enabling efficient recovery from triple failures in a storage array |
US7613984B2 (en) | 2001-12-28 | 2009-11-03 | Netapp, Inc. | System and method for symmetric triple parity for failing storage devices |
US6993701B2 (en) * | 2001-12-28 | 2006-01-31 | Network Appliance, Inc. | Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array |
US7073115B2 (en) * | 2001-12-28 | 2006-07-04 | Network Appliance, Inc. | Correcting multiple block data loss in a storage array using a combination of a single diagonal parity group and multiple row parity groups |
US8402346B2 (en) | 2001-12-28 | 2013-03-19 | Netapp, Inc. | N-way parity technique for enabling recovery from up to N storage device failures |
US7080278B1 (en) | 2002-03-08 | 2006-07-18 | Network Appliance, Inc. | Technique for correcting multiple storage device failures in a storage array |
US7539991B2 (en) * | 2002-03-21 | 2009-05-26 | Netapp, Inc. | Method and apparatus for decomposing I/O tasks in a raid system |
US7200715B2 (en) * | 2002-03-21 | 2007-04-03 | Network Appliance, Inc. | Method for writing contiguous arrays of stripes in a RAID storage system using mapped block writes |
US7437727B2 (en) * | 2002-03-21 | 2008-10-14 | Network Appliance, Inc. | Method and apparatus for runtime resource deadlock avoidance in a raid system |
US7254813B2 (en) * | 2002-03-21 | 2007-08-07 | Network Appliance, Inc. | Method and apparatus for resource allocation in a raid system |
US6976146B1 (en) | 2002-05-21 | 2005-12-13 | Network Appliance, Inc. | System and method for emulating block appended checksums on storage devices by sector stealing |
US7028154B2 (en) | 2002-06-18 | 2006-04-11 | Hewlett-Packard Development Company, L.P. | Procedure to reduce copy time for data backup from short-term to long-term memory |
US6952758B2 (en) * | 2002-07-31 | 2005-10-04 | International Business Machines Corporation | Method and system for providing consistent data modification information to clients in a storage system |
US7454529B2 (en) * | 2002-08-02 | 2008-11-18 | Netapp, Inc. | Protectable data storage system and a method of protecting and/or managing a data storage system |
US7069466B2 (en) * | 2002-08-14 | 2006-06-27 | Alacritus, Inc. | Method and system for copying backup data |
US6922752B2 (en) * | 2002-08-23 | 2005-07-26 | Hewlett-Packard Development Company, L.P. | Storage system using fast storage devices for storing redundant data |
US7437387B2 (en) * | 2002-08-30 | 2008-10-14 | Netapp, Inc. | Method and system for providing a file system overlay |
US7882081B2 (en) * | 2002-08-30 | 2011-02-01 | Netapp, Inc. | Optimized disk repository for the storage and retrieval of mostly sequential data |
US6928515B2 (en) * | 2002-11-09 | 2005-08-09 | International Business Machines Corporation | Integrated sector format-error correction code system and method for efficient writing in a disk array system |
US7567993B2 (en) * | 2002-12-09 | 2009-07-28 | Netapp, Inc. | Method and system for creating and using removable disk based copies of backup data |
US8024172B2 (en) * | 2002-12-09 | 2011-09-20 | Netapp, Inc. | Method and system for emulating tape libraries |
US7437053B2 (en) * | 2003-01-15 | 2008-10-14 | Matsushita Electric Industrial Co., Ltd. | Digital video recorder, method of driving the video recorder and program |
JP3811127B2 (en) * | 2003-01-30 | 2006-08-16 | 株式会社東芝 | Information recording apparatus and information recording method |
US6973369B2 (en) * | 2003-03-12 | 2005-12-06 | Alacritus, Inc. | System and method for virtual vaulting |
US7185144B2 (en) * | 2003-11-24 | 2007-02-27 | Network Appliance, Inc. | Semi-static distribution technique |
US7111147B1 (en) * | 2003-03-21 | 2006-09-19 | Network Appliance, Inc. | Location-independent RAID group virtual block management |
US7664913B2 (en) * | 2003-03-21 | 2010-02-16 | Netapp, Inc. | Query-based spares management technique |
US7424637B1 (en) | 2003-03-21 | 2008-09-09 | Network Appliance, Inc. | Technique for managing addition of disks to a volume of a storage system |
US7143235B1 (en) | 2003-03-21 | 2006-11-28 | Network Appliance, Inc. | Proposed configuration management behaviors in a raid subsystem |
US7328364B1 (en) | 2003-03-21 | 2008-02-05 | Network Appliance, Inc. | Technique for coherent suspension of I/O operations in a RAID subsystem |
US7210061B2 (en) * | 2003-04-17 | 2007-04-24 | Hewlett-Packard Development, L.P. | Data redundancy for writes using remote storage system cache memory |
US7275179B1 (en) | 2003-04-24 | 2007-09-25 | Network Appliance, Inc. | System and method for reducing unrecoverable media errors in a disk subsystem |
US7392347B2 (en) * | 2003-05-10 | 2008-06-24 | Hewlett-Packard Development Company, L.P. | Systems and methods for buffering data between a coherency cache controller and memory |
US7437492B2 (en) * | 2003-05-14 | 2008-10-14 | Netapp, Inc | Method and system for data compression and compression estimation in a virtual tape library environment |
US7380059B2 (en) * | 2003-05-16 | 2008-05-27 | Pillar Data Systems, Inc. | Methods and systems of cache memory management and snapshot operations |
US7047379B2 (en) | 2003-07-11 | 2006-05-16 | International Business Machines Corporation | Autonomic link optimization through elimination of unnecessary transfers |
US7546418B2 (en) * | 2003-08-20 | 2009-06-09 | Dell Products L.P. | System and method for managing power consumption and data integrity in a computer system |
JP2005122453A (en) * | 2003-10-16 | 2005-05-12 | Hitachi Ltd | Disk controller control system for storage device, and the storage device |
US7475186B2 (en) * | 2003-10-31 | 2009-01-06 | Superspeed Software | System and method for persistent RAM disk |
US7328305B2 (en) | 2003-11-03 | 2008-02-05 | Network Appliance, Inc. | Dynamic parity distribution technique |
US7263629B2 (en) * | 2003-11-24 | 2007-08-28 | Network Appliance, Inc. | Uniform and symmetric double failure correcting technique for protecting against two disk failures in a disk array |
US7366837B2 (en) * | 2003-11-24 | 2008-04-29 | Network Appliance, Inc. | Data placement technique for striping data containers across volumes of a storage system cluster |
US7647451B1 (en) | 2003-11-24 | 2010-01-12 | Netapp, Inc. | Data placement technique for striping data containers across volumes of a storage system cluster |
US7197599B2 (en) * | 2003-12-29 | 2007-03-27 | Intel Corporation | Method, system, and program for managing data updates |
US7315965B2 (en) * | 2004-02-04 | 2008-01-01 | Network Appliance, Inc. | Method and system for storing data using a continuous data protection system |
US7559088B2 (en) * | 2004-02-04 | 2009-07-07 | Netapp, Inc. | Method and apparatus for deleting data upon expiration |
US7490103B2 (en) * | 2004-02-04 | 2009-02-10 | Netapp, Inc. | Method and system for backing up data |
US7426617B2 (en) * | 2004-02-04 | 2008-09-16 | Network Appliance, Inc. | Method and system for synchronizing volumes in a continuous data protection system |
US7406488B2 (en) * | 2004-02-04 | 2008-07-29 | Netapp | Method and system for maintaining data in a continuous data protection system |
US7720817B2 (en) * | 2004-02-04 | 2010-05-18 | Netapp, Inc. | Method and system for browsing objects on a protected volume in a continuous data protection system |
US20050182910A1 (en) * | 2004-02-04 | 2005-08-18 | Alacritus, Inc. | Method and system for adding redundancy to a continuous data protection system |
US7783606B2 (en) * | 2004-02-04 | 2010-08-24 | Netapp, Inc. | Method and system for remote data recovery |
US7904679B2 (en) * | 2004-02-04 | 2011-03-08 | Netapp, Inc. | Method and apparatus for managing backup data |
US7325159B2 (en) * | 2004-02-04 | 2008-01-29 | Network Appliance, Inc. | Method and system for data recovery in a continuous data protection system |
US7165141B2 (en) * | 2004-02-27 | 2007-01-16 | Hewlett-Packard Development Company, L.P. | Daisy-chained device-mirroring architecture |
US8028135B1 (en) | 2004-09-01 | 2011-09-27 | Netapp, Inc. | Method and apparatus for maintaining compliant storage |
US7321905B2 (en) * | 2004-09-30 | 2008-01-22 | International Business Machines Corporation | System and method for efficient data recovery in a storage array utilizing multiple parity slopes |
US7318119B2 (en) * | 2004-10-29 | 2008-01-08 | International Business Machines Corporation | System and method for fault tolerant controller for network RAID |
US7290199B2 (en) * | 2004-11-19 | 2007-10-30 | International Business Machines Corporation | Method and system for improved buffer utilization for disk array parity updates |
US7392428B2 (en) * | 2004-11-19 | 2008-06-24 | International Business Machines Corporation | Method and system for recovering from abnormal interruption of a parity update operation in a disk array system |
US7392458B2 (en) * | 2004-11-19 | 2008-06-24 | International Business Machines Corporation | Method and system for enhanced error identification with disk array parity checking |
US20060123312A1 (en) * | 2004-11-19 | 2006-06-08 | International Business Machines Corporation | Method and system for increasing parallelism of disk accesses when restoring data in a disk array system |
US20060123271A1 (en) * | 2004-11-19 | 2006-06-08 | International Business Machines Corporation | RAID environment incorporating hardware-based finite field multiplier for on-the-fly XOR |
US8205058B2 (en) * | 2004-12-10 | 2012-06-19 | International Business Machines Corporation | Resource management for data storage services |
US7581118B2 (en) * | 2004-12-14 | 2009-08-25 | Netapp, Inc. | Disk sanitization using encryption |
US7558839B1 (en) | 2004-12-14 | 2009-07-07 | Netapp, Inc. | Read-after-write verification for improved write-once-read-many data storage |
US7774610B2 (en) * | 2004-12-14 | 2010-08-10 | Netapp, Inc. | Method and apparatus for verifiably migrating WORM data |
US7526620B1 (en) | 2004-12-14 | 2009-04-28 | Netapp, Inc. | Disk sanitization in an active file system |
US7730257B2 (en) * | 2004-12-16 | 2010-06-01 | Broadcom Corporation | Method and computer program product to increase I/O write performance in a redundant array |
US20060143412A1 (en) * | 2004-12-28 | 2006-06-29 | Philippe Armangau | Snapshot copy facility maintaining read performance and write performance |
US7441081B2 (en) * | 2004-12-29 | 2008-10-21 | Lsi Corporation | Write-back caching for disk drives |
US7143308B2 (en) * | 2005-01-14 | 2006-11-28 | Charlie Tseng | Apparatus, system, and method for differential rebuilding of a reactivated offline RAID member disk |
US7398460B1 (en) | 2005-01-31 | 2008-07-08 | Network Appliance, Inc. | Technique for efficiently organizing and distributing parity blocks among storage devices of a storage array |
US8296162B1 (en) | 2005-02-01 | 2012-10-23 | Webmd Llc. | Systems, devices, and methods for providing healthcare information |
US7360112B2 (en) | 2005-02-07 | 2008-04-15 | International Business Machines Corporation | Detection and recovery of dropped writes in storage devices |
US7779294B2 (en) * | 2005-04-15 | 2010-08-17 | Intel Corporation | Power-safe disk storage apparatus, systems, and methods |
GB0507912D0 (en) | 2005-04-20 | 2005-05-25 | Ibm | Disk drive and method for protecting data writes in a disk drive |
EP1877903B1 (en) | 2005-04-29 | 2008-09-03 | Network Appliance, Inc. | System and method for generating consistent images of a set of data objects |
US7437490B2 (en) * | 2005-05-27 | 2008-10-14 | International Business Machines Corporation | Channel communication array queues in hardware system area |
US7441146B2 (en) * | 2005-06-10 | 2008-10-21 | Intel Corporation | RAID write completion apparatus, systems, and methods |
US7562188B2 (en) * | 2005-06-17 | 2009-07-14 | Intel Corporation | RAID power safe apparatus, systems, and methods |
JP4817783B2 (en) * | 2005-09-30 | 2011-11-16 | 富士通株式会社 | RAID system and rebuild / copyback processing method thereof |
US7401198B2 (en) * | 2005-10-06 | 2008-07-15 | Netapp | Maximizing storage system throughput by measuring system performance metrics |
CN101900956A (en) * | 2005-11-23 | 2010-12-01 | Fsi国际公司 | Method of removing material from a substrate |
US7752401B2 (en) | 2006-01-25 | 2010-07-06 | Netapp, Inc. | Method and apparatus to automatically commit files to WORM status |
US8560503B1 (en) | 2006-01-26 | 2013-10-15 | Netapp, Inc. | Content addressable storage system |
US7650533B1 (en) | 2006-04-20 | 2010-01-19 | Netapp, Inc. | Method and system for performing a restoration in a continuous data protection system |
US20080070688A1 (en) * | 2006-09-20 | 2008-03-20 | John Loehrer | Real-time gaming system having scalable database |
US7822921B2 (en) | 2006-10-31 | 2010-10-26 | Netapp, Inc. | System and method for optimizing write operations in storage systems |
WO2008056612A1 (en) * | 2006-11-06 | 2008-05-15 | Panasonic Corporation | Information security apparatus |
US7613947B1 (en) | 2006-11-30 | 2009-11-03 | Netapp, Inc. | System and method for storage takeover |
US7647526B1 (en) | 2006-12-06 | 2010-01-12 | Netapp, Inc. | Reducing reconstruct input/output operations in storage systems |
CN101622594B (en) * | 2006-12-06 | 2013-03-13 | 弗森-艾奥公司 | Apparatus, system, and method for managing data in a request device with an empty data token directive |
US8239706B1 (en) | 2007-01-03 | 2012-08-07 | Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations | Data retrieval system and method that provides retrieval of data to any point in time |
US7730347B1 (en) | 2007-01-03 | 2010-06-01 | Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations | Data recovery system and method including a disk array architecture that provides recovery of data to any point of time |
US7870356B1 (en) | 2007-02-22 | 2011-01-11 | Emc Corporation | Creation of snapshot copies using a sparse file for keeping a record of changed blocks |
US8312214B1 (en) | 2007-03-28 | 2012-11-13 | Netapp, Inc. | System and method for pausing disk drives in an aggregate |
US7653612B1 (en) | 2007-03-28 | 2010-01-26 | Emc Corporation | Data protection services offload using shallow files |
US8209587B1 (en) | 2007-04-12 | 2012-06-26 | Netapp, Inc. | System and method for eliminating zeroing of disk drives in RAID arrays |
US8898536B2 (en) | 2007-04-27 | 2014-11-25 | Netapp, Inc. | Multi-core engine for detecting bit errors |
US7840837B2 (en) | 2007-04-27 | 2010-11-23 | Netapp, Inc. | System and method for protecting memory during system initialization |
US7836331B1 (en) | 2007-05-15 | 2010-11-16 | Netapp, Inc. | System and method for protecting the contents of memory during error conditions |
JP4963088B2 (en) * | 2007-07-13 | 2012-06-27 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Data caching technology |
US7975102B1 (en) | 2007-08-06 | 2011-07-05 | Netapp, Inc. | Technique to avoid cascaded hot spotting |
US20090049050A1 (en) * | 2007-08-15 | 2009-02-19 | Jeff Whitehead | File-based horizontal storage system |
TWI362044B (en) * | 2007-11-09 | 2012-04-11 | Transcend Information Inc | Storage apparatus and method for accessing data and for managing memory block |
US7836226B2 (en) | 2007-12-06 | 2010-11-16 | Fusion-Io, Inc. | Apparatus, system, and method for coordinating storage requests in a multi-processor/multi-thread environment |
US7984259B1 (en) | 2007-12-17 | 2011-07-19 | Netapp, Inc. | Reducing load imbalance in a storage system |
US8127182B2 (en) * | 2008-09-16 | 2012-02-28 | Lsi Corporation | Storage utilization to improve reliability using impending failure triggers |
US9158579B1 (en) | 2008-11-10 | 2015-10-13 | Netapp, Inc. | System having operation queues corresponding to operation execution time |
US8495417B2 (en) * | 2009-01-09 | 2013-07-23 | Netapp, Inc. | System and method for redundancy-protected aggregates |
US8307258B2 (en) * | 2009-05-18 | 2012-11-06 | Fusion-Io, Inc. | Apparatus, system, and method for reconfiguring an array to operate with less storage elements |
US8281227B2 (en) | 2009-05-18 | 2012-10-02 | Fusion-Io, Inc. | Apparatus, system, and method to increase data integrity in a redundant storage system |
US8413132B2 (en) * | 2010-09-13 | 2013-04-02 | Samsung Electronics Co., Ltd. | Techniques for resolving read-after-write (RAW) conflicts using backup area |
US8775731B2 (en) | 2011-03-25 | 2014-07-08 | Dell Products, L.P. | Write spike performance enhancement in hybrid storage systems |
ES2824782T3 (en) * | 2011-04-12 | 2021-05-13 | Amadeus Sas | Cache structure and method |
US8688635B2 (en) * | 2011-07-01 | 2014-04-01 | International Business Machines Corporation | Data set connection manager having a plurality of data sets to represent one data set |
US10965742B2 (en) * | 2012-02-13 | 2021-03-30 | SkyKick, Inc. | Migration project automation, e.g., automated selling, planning, migration and configuration of email systems |
WO2014041638A1 (en) * | 2012-09-12 | 2014-03-20 | 株式会社 東芝 | Storage apparatus, storage controller, and method for managing location of error correction code block in array |
US10643668B1 (en) | 2013-08-27 | 2020-05-05 | Seagate Technology Llc | Power loss data block marking |
US9323630B2 (en) | 2013-09-16 | 2016-04-26 | HGST Netherlands B.V. | Enhanced data recovery from data storage devices |
US9798620B2 (en) | 2014-02-06 | 2017-10-24 | Sandisk Technologies Llc | Systems and methods for non-blocking solid-state memory |
US10176039B2 (en) | 2014-09-19 | 2019-01-08 | Micron Technology, Inc. | Self-accumulating exclusive OR program |
EP3201778A4 (en) * | 2014-10-03 | 2018-04-25 | Agency for Science, Technology and Research | Method for optimizing reconstruction of data for a hybrid object storage device |
US9766977B2 (en) | 2014-11-10 | 2017-09-19 | Dell Products, Lp | System and method for improving read performance of a distributed parity RAID solution |
US9672106B2 (en) | 2014-12-30 | 2017-06-06 | Nutanix, Inc. | Architecture for implementing erasure coding |
US9779023B1 (en) * | 2015-06-30 | 2017-10-03 | EMC IP Holding Company LLC | Storing inline-compressed data in segments of contiguous physical blocks |
CN105205017A (en) * | 2015-08-31 | 2015-12-30 | 浪潮(北京)电子信息产业有限公司 | Storage controller based on PCIE SSD |
CN108701005B (en) | 2016-02-18 | 2021-02-23 | 华为技术有限公司 | Data update technique |
US10567009B2 (en) | 2016-12-06 | 2020-02-18 | Nutanix, Inc. | Dynamic erasure coding |
Family Cites Families (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3893178A (en) * | 1973-12-19 | 1975-07-01 | Information Storage Systems | Synchronization of multiple disc drives |
US4092732A (en) * | 1977-05-31 | 1978-05-30 | International Business Machines Corporation | System for recovering data stored in failed memory unit |
US4467421A (en) * | 1979-10-18 | 1984-08-21 | Storage Technology Corporation | Virtual storage system and method |
US4430701A (en) * | 1981-08-03 | 1984-02-07 | International Business Machines Corporation | Method and apparatus for a hierarchical paging storage system |
US4562576A (en) * | 1982-08-14 | 1985-12-31 | International Computers Limited | Data storage apparatus |
JPS59153251A (en) * | 1983-02-18 | 1984-09-01 | Toshiba Corp | Disc cache system |
US4604750A (en) * | 1983-11-07 | 1986-08-05 | Digital Equipment Corporation | Pipeline error correction |
FR2561428B1 (en) * | 1984-03-16 | 1986-09-12 | Bull Sa | DISC MEMORY RECORDING METHOD AND DISC MEMORY SYSTEM |
US4667326A (en) * | 1984-12-20 | 1987-05-19 | Advanced Micro Devices, Inc. | Method and apparatus for error detection and correction in systems comprising floppy and/or hard disk drives |
US4754397A (en) * | 1985-02-15 | 1988-06-28 | Tandem Computers Incorporated | Fault tolerant modular subsystems for computers |
JPS61264599A (en) * | 1985-05-16 | 1986-11-22 | Fujitsu Ltd | Semiconductor memory device |
JPS62110902A (en) * | 1985-11-09 | 1987-05-22 | マルチ技研株式会社 | Production of women's underwear having cup parts |
JPS62132270A (en) * | 1985-12-05 | 1987-06-15 | Toshiba Corp | Magnetic disk device |
US4958351A (en) * | 1986-02-03 | 1990-09-18 | Unisys Corp. | High capacity multiple-disk storage method and apparatus having unusually high fault tolerance level and high bandpass |
US4722085A (en) * | 1986-02-03 | 1988-01-26 | Unisys Corp. | High capacity disk storage system having unusually high fault tolerance level and bandpass |
US4761785B1 (en) * | 1986-06-12 | 1996-03-12 | Ibm | Parity spreading to enhance storage access |
US4791642A (en) * | 1986-10-17 | 1988-12-13 | Amdahl Corporation | Buffer error retry |
US4775978A (en) * | 1987-01-12 | 1988-10-04 | Magnetic Peripherals Inc. | Data error correction system |
US4870643A (en) * | 1987-11-06 | 1989-09-26 | Micropolis Corporation | Parallel drive array storage system |
US4899342A (en) * | 1988-02-01 | 1990-02-06 | Thinking Machines Corporation | Method and apparatus for operating multi-unit array of memories |
US4993030A (en) * | 1988-04-22 | 1991-02-12 | Amdahl Corporation | File system for a plurality of storage classes |
US4920539A (en) * | 1988-06-20 | 1990-04-24 | Prime Computer, Inc. | Memory error correction system |
US4914656A (en) * | 1988-06-28 | 1990-04-03 | Storage Technology Corporation | Disk drive memory |
US4995041A (en) * | 1989-02-03 | 1991-02-19 | Digital Equipment Corporation | Write back buffer with error correcting capabilities |
JP3057498B2 (en) * | 1989-08-02 | 2000-06-26 | 富士通株式会社 | Array disk device and data reading method thereof |
US5058116A (en) * | 1989-09-19 | 1991-10-15 | International Business Machines Corporation | Pipelined error checking and correction for cache memories |
US5402428A (en) * | 1989-12-25 | 1995-03-28 | Hitachi, Ltd. | Array disk subsystem |
US5185876A (en) * | 1990-03-14 | 1993-02-09 | Micro Technology, Inc. | Buffering system for dynamically providing data to multiple storage elements |
US5274799A (en) * | 1991-01-04 | 1993-12-28 | Array Technology Corporation | Storage device array architecture with copyback cache |
US5499337A (en) * | 1991-09-27 | 1996-03-12 | Emc Corporation | Storage device array architecture with solid-state redundancy unit |
US5341381A (en) * | 1992-01-21 | 1994-08-23 | Tandem Computers, Incorporated | Redundant array parity caching system |
US5398253A (en) * | 1992-03-11 | 1995-03-14 | Emc Corporation | Storage unit generation of redundancy information in a redundant storage array system |
US5463765A (en) * | 1993-03-18 | 1995-10-31 | Hitachi, Ltd. | Disk array system, data writing method thereof, and fault recovering method |
US5548711A (en) * | 1993-08-26 | 1996-08-20 | Emc Corporation | Method and apparatus for fault tolerant fast writes through buffer dumping |
- 1991
- 1991-01-04 US US07/638,167 patent/US5274799A/en not_active Expired - Lifetime
- 1991-12-31 DE DE69126416T patent/DE69126416T2/en not_active Expired - Lifetime
- 1991-12-31 EP EP91312104A patent/EP0493984B1/en not_active Expired - Lifetime
- 1992
- 1992-01-02 AU AU10014/92A patent/AU1001492A/en not_active Abandoned
- 1992-01-03 JP JP04504448A patent/JP3129732B2/en not_active Expired - Lifetime
- 1992-01-03 WO PCT/US1992/000059 patent/WO1992012482A1/en unknown
- 1992-01-03 CA CA002058734A patent/CA2058734C/en not_active Expired - Lifetime
- 1993
- 1993-08-26 US US08/112,791 patent/US5526482A/en not_active Expired - Lifetime
- 1995
- 1995-12-27 US US08/579,545 patent/US5617530A/en not_active Expired - Lifetime
- 1997
- 1997-03-31 US US08/825,625 patent/US5911779A/en not_active Expired - Lifetime
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9875065B2 (en) * | 2015-02-27 | 2018-01-23 | Kyocera Document Solutions Inc. | Information processing device that extends service life of non-volatile semiconductor memory and recording medium |
Also Published As
Publication number | Publication date |
---|---|
DE69126416D1 (en) | 1997-07-10 |
DE69126416T2 (en) | 1998-02-05 |
JPH06504863A (en) | 1994-06-02 |
EP0493984A2 (en) | 1992-07-08 |
WO1992012482A1 (en) | 1992-07-23 |
US5274799A (en) | 1993-12-28 |
EP0493984A3 (en) | 1993-05-12 |
CA2058734A1 (en) | 1992-07-05 |
US5911779A (en) | 1999-06-15 |
US5617530A (en) | 1997-04-01 |
EP0493984B1 (en) | 1997-06-04 |
JP3129732B2 (en) | 2001-01-31 |
AU1001492A (en) | 1992-07-09 |
US5526482A (en) | 1996-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2058734C (en) | Storage device array architecture with copyback cache | |
US5572660A (en) | System and method for selective write-back caching within a disk array subsystem | |
US5548711A (en) | Method and apparatus for fault tolerant fast writes through buffer dumping | |
KR100255847B1 (en) | Method for performing a raid stripe write operation using a drive xor command set | |
US5689678A (en) | Distributed storage array system having a plurality of modular control units | |
US6523087B2 (en) | Utilizing parity caching and parity logging while closing the RAID5 write hole | |
US5819109A (en) | System for storing pending parity update log entries, calculating new parity, updating the parity block, and removing each entry from the log when update is complete | |
US5787460A (en) | Disk array apparatus that only calculates new parity after a predetermined number of write requests | |
US5375128A (en) | Fast updating of DASD arrays using selective shadow writing of parity and data blocks, tracks, or cylinders | |
US5546535A (en) | Multiple controller sharing in a redundant storage array | |
US5504858A (en) | Method and apparatus for preserving data integrity in a multiple disk raid organized storage system | |
US5226150A (en) | Apparatus for suppressing an error report from an address for which an error has already been reported | |
US5390187A (en) | On-line reconstruction of a failed redundant array system | |
US6912669B2 (en) | Method and apparatus for maintaining cache coherency in a storage system | |
US5586291A (en) | Disk controller with volatile and non-volatile cache memories | |
US6968425B2 (en) | Computer systems, disk systems, and method for controlling disk cache | |
US5951691A (en) | Method and system for detection and reconstruction of corrupted data in a data storage subsystem | |
EP0833248B1 (en) | Computer system with memory update history storage | |
US20030070041A1 (en) | Method and system for caching data in a storage system | |
JP2000056931A (en) | Storage management system | |
JPH08263227A (en) | Holding method of consistency of parity data in disk array | |
JP2001500654A (en) | High-performance data path that performs Xor operation during operation | |
JP4884721B2 (en) | Storage system and storage control method that do not require formatting of the storage device |
US6810489B1 (en) | Checkpoint computer system utilizing a FIFO buffer to re-synchronize and recover the system on the detection of an error | |
JP2540008B2 (en) | Data processing system and method for validating data record image in memory means |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
MKEX | Expiry |