WO1991001524A1

WO1991001524A1 - An error recovery method and apparatus for high performance disk drives

Info

Publication number: WO1991001524A1
Application number: PCT/US1990/004070
Authority: WO
Inventors: Robert J. Halford
Original assignee: Cray Research, Inc.
Priority date: 1989-07-19
Filing date: 1990-07-19
Publication date: 1991-02-07

Abstract

The present invention discloses an error recovery method for parallel architecture data storage devices. The present invention provides means for simultaneously arranging data on a plurality of recording surfaces so that intermittent and/or solid failures do not prevent access to the data stored thereon. A first error correcting code comprising a parity bit is generated for each dataword. The dataword and the parity bit are stored simultaneously and bitwise to a plurality of recording surfaces. A second error correcting code is generated for a plurality of bits transmitted to a specific recording surface. The second error correcting code is written onto the same recording surface as the bits from which it was generated. The second error correcting code is used to detect and correct intermittent errors in the data read from a particular recording surface. The first error correcting code is used to correct data read from a particular surface when the second error correcting code indicates that a solid failure has occurred, which the second error correcting code cannot correct. The result is a data storage device combining large capacity and fast transfer rates with improved fault tolerance.

Description

AN ERROR RECOVERY METHOD AND APPARATUS FOR HIGH PERFORMANCE DISK DRIVES

BACKGROUND OF THE INVENTION

1. Field Of The Invention.

This invention relates generally to data storage devices for computer systems. In particular, the present invention provides an error recovery method for parallel architecture data storage devices.

2. Description Of Related Art.

Disk drives have long been popular mass storage devices. They provide a low cost solution to the problem of non-volatile data storage. Virtually all computer system manufacturers, therefore, provide for disk drives as system peripherals.

The major advantage of disk drives is low cost. For some applications, however, this advantage is outweighed by insufficient data transfer speed, particularly in supercomputer environments of the type provided by Cray Research, Inc., the Assignee of the present invention. The problems facing a computer system user wishing to increase the data transfer rates of disk drives are not trivial. The basic structure of the disk drive consists of a metal disk coated with magnetic material rotating under one or more read/write heads. Most disk drives are multi-platter systems in which a number of the metal disks are arranged in a stack. All data transfers to disk drives are sequential: data moves in or out one word at a time. The access time to a selected word is partially dependent on its location. Data is recorded on the disk in concentric circles called "tracks". The disk drive has detection means for indicating when the magnetic head is positioned at the outermost track. A motor controls the head position, causing the head to step from track to track. This head positioning function is called a "seek". The period required to position the Read/Write heads, from the time the command is received until the time the drive becomes ready, is known as the seek time.

Once a track is selected, it is necessary to wait for the desired location to rotate into position under the head. The average waiting time, known as latency time, is the time for half a revolution.
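Ignoring controller and command overhead, the average time before a transfer can begin is roughly the seek time plus half a revolution. The sketch below computes that latency from spindle speed; the function names and the 3600 RPM example value are illustrative assumptions, not figures from this application.

```python
def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in milliseconds."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


def average_access_ms(average_seek_ms: float, rpm: float) -> float:
    """Simplified time before data starts to flow: seek plus average latency.
    Controller and command overhead are ignored (an assumption of this sketch)."""
    return average_seek_ms + average_latency_ms(rpm)


if __name__ == "__main__":
    # 3600 RPM is a hypothetical example speed, not a figure from the patent.
    print(round(average_latency_ms(3600), 2))  # 8.33
```

Doubling the spindle speed halves the latency term, but it does nothing for the seek term, which is one reason the text favors long sequential transfers.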

Within each track, information is organized into segments called "sectors". A sector can consist of any number of bytes, limited only by the storage capacity of the track. The addressing of sectors is typically a software function. So that the sectors can be identified by the software, each sector is preceded by an identifier block. The format of this identifier block is system dependent.

Usually each track is single-bit serial, so that each byte is stored as eight consecutive bits on a track. Because track selection and latency increase access times, it is preferable to transfer large blocks of data that will be stored in sequential locations.

Once the disk heads are positioned at a particular track and no further head movement is required, data will be transferred at a fixed rate. This fixed rate is determined by the speed of the disk drive and is independent of the computer system itself.

Parallel architectures increase disk capacity and data transfer rates, but such architectures are more vulnerable to errors and the resultant corruption of data. If there are errors in these parallel architecture devices, then greater amounts of data may become inaccessible. Thus, the usefulness of parallel architecture data storage devices is limited by the fault tolerance of the device.

SUMMARY OF THE INVENTION

To overcome the limitations in the prior art discussed above, and to overcome other limitations readily recognizable to those skilled in the art, the present invention discloses an error recovery method for parallel architecture data storage devices. The present invention provides means for simultaneously arranging data on a plurality of recording surfaces so that errors cannot prevent access thereto. The result is data storage devices capable of high capacity, fast transfer rates, and improved fault tolerance.

In the present invention, a plurality of recording surfaces are provided for recording data. A first error correction code is generated for each dataword. The dataword is divided into a plurality of portions. Each of the portions and the first error correction code are stored on separate recording surfaces. A second error correction code is generated for a plurality of portions stored on a particular recording surface. The second error correction code is stored on the same recording surface as the portions. Errors can be identified and corrected using the first and second error correction codes in burst or recovery/re-read mode error correction.

Thus, a high performance method of storing and retrieving data is disclosed, which method can detect and correct errors due to defects in recording surfaces, Read/Write heads, circuitry, controllers, cables, and other faults.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, where like numerals refer to like elements throughout the several views:

Figure 1 is a block diagram describing the components of the data storage device in the first preferred embodiment of the present invention;

Figure 2 is a block diagram describing the operation of the data storage device in the first preferred embodiment;

Figure 3 is a block diagram describing the format of data on the data storage device in the first preferred embodiment;

Figure 4 is a block diagram illustrating an array of data storage devices used in the second preferred embodiment of the present invention; and

Figure 5 shows the logical grouping of data on the disk array and its error detection and correction means in the second preferred embodiment.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In the following Detailed Description of both Preferred Embodiments, reference is made to the accompanying Drawings which form a part hereof, and in which is shown by way of illustration two specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.

FIRST PREFERRED EMBODIMENT

Figure 1 describes the components of a computer system in the first preferred embodiment of the present invention. A computer 16 stores data on a secondary storage device, for example, disk drive 12. A controller 10 connected to both the computer 16 and the disk drive 12 directs the operations of the disk drive 12. The controller 10 and the disk drive 12 communicate across an interface 14.

In the first preferred embodiment, the interface 14 is a modification of the Intelligent Peripheral Interface (IPI) standard promulgated by the American National Standards Institute (ANSI). Alternative embodiments could use different interfaces between the controller 10 and the disk drive 12. The IPI standard uses 8-bit data paths wherein a ninth bit position transmits a parity code for the 8 bits of data. Any number of 8-bit paths, and their associated parity codes, may be combined to create the interface 14. The parity code is generated at the transmitting end of the interface 14 and checked at the receiving end of the interface 14. Thus, the parity code provides limited error detection for data transmitted across the interface 14. The disk drive 12, however, does not store the parity code.

In the first preferred embodiment, the interface 14 also uses a ninth bit position to transmit a parity code for each 8-bit byte of data. The controller 10 generates the parity code and the disk drive 12 stores it. Unlike the IPI standard, however, the interface 14 does not re-generate the parity code when reading data from the disk drive 12. Thus, the parity code provides detection means for any errors introduced by the disk drive 12 or the interface 14.
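The following sketch illustrates the write and read paths of this modified interface for one byte. The application does not say whether the parity is odd or even; the sketch assumes odd parity (the convention stated for the vertical parity bits of the second embodiment), and the names `odd_parity_bit`, `make_frame` and `frame_ok` are hypothetical.

```python
def odd_parity_bit(byte: int) -> int:
    """Parity bit P that gives the nine bits (byte plus P) an odd number of 1s."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return (bin(byte).count("1") + 1) % 2


def make_frame(byte: int) -> int:
    """Write path (controller 10): 8 data bits plus P in bit position 8 form a 9-bit frame.
    The drive stores all nine bits as received."""
    return byte | (odd_parity_bit(byte) << 8)


def frame_ok(frame: int) -> bool:
    """Read path (controller 10): check the stored P as read back from the drive.
    Because the interface does not regenerate P, any fault between the write and
    this check that flips an odd number of the nine bits is detected."""
    return bin(frame & 0x1FF).count("1") % 2 == 1
```

Flipping any single bit of a frame makes `frame_ok` return False, while flipping two bits does not; a parity bit alone can neither locate nor correct the error, which is why the embodiment pairs it with a second, per-surface code.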

Preferably, the interface 14 of the preferred embodiment is switchable between the IPI standard and the implementation associated with the present invention.

The interface 14 provides a high performance, expandable I/O channel. For example, in the first preferred embodiment, the interface 14 is comprised of two 8-bit data paths providing a 16-bit wide interface 14. In accordance with the IPI standard, the width of the interface 14 can be expanded in increments of 8 data bits to achieve higher parallel transfer rates. The interface 14 can also operate in a data streaming mode of operation, wherein all paths operate unidirectionally to achieve the fastest possible data transfer rates across the interface 14.

Figure 2 describes the operation of the first preferred embodiment. The computer 16 preferably has a 64-bit word size. Associated with each 64-bit word 26 is an 8-bit SECDED (Single Error Correction, Double Error Detection) code 28. Preferably, the SECDED code 28 is not written to the disk drive 12 with the 64-bit word 26. Data is transferred to the disk drive 12 across interface 14, which is comprised of two data paths 18 and 20. Both paths 18 and 20 operate simultaneously, each transferring 8 bits of data. The controller 10 identifies four 16-bit parcels, 30/32, 34/36, 38/40 and 42/44, within each 64-bit word 26. These parcels are transferred sequentially to the disk drive 12 across the interface 14. The two bytes of a parcel are transmitted simultaneously, one on each of the paths 18 and 20. Simultaneously with the transfer of data, each path 18 and 20 transfers a first error correcting code. In the first preferred embodiment, the first error correcting code consists of a single parity bit. The controller 10 generates the parity bit. In Figure 2, the data bits of each path 18 and 20 are labeled "0-7" and the parity bit is labeled "P".
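A minimal sketch of how one 64-bit word 26 could be broken into the four parcel transfers described above, assuming odd parity per byte. The parcel order (most significant first) and the assignment of each parcel's high byte to path 18 are assumptions; the text does not fix them. The function names are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[int, int]  # (8-bit data byte, parity bit) sent on one path


def odd_parity_bit(byte: int) -> int:
    """Parity bit giving byte plus P an odd number of 1s (odd parity is assumed)."""
    return (bin(byte).count("1") + 1) % 2


def word_to_transfers(word: int) -> List[Tuple[Frame, Frame]]:
    """Split a 64-bit word into four 16-bit parcels and return, for each parcel,
    the simultaneous (path 18, path 20) transfer.
    The word's SECDED code 28 is not part of the transfer."""
    if not 0 <= word < 1 << 64:
        raise ValueError("expected a 64-bit word")
    transfers = []
    for shift in (48, 32, 16, 0):  # parcels sent one after another
        parcel = (word >> shift) & 0xFFFF
        high, low = parcel >> 8, parcel & 0xFF
        transfers.append(((high, odd_parity_bit(high)), (low, odd_parity_bit(low))))
    return transfers


def transfers_to_word(transfers: List[Tuple[Frame, Frame]]) -> int:
    """Reassemble the 64-bit word from the four parcels (parity bits are dropped here)."""
    word = 0
    for (high, _), (low, _) in transfers:
        word = (word << 16) | (high << 8) | low
    return word
```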

The controller 10 also generates a second error correcting code for each 2048 bytes of data transferred in a specific bit position of the paths 18 and 20. In the first preferred embodiment, the second error correcting code is a 32-bit ECC or checksum code, which is placed in the last four bytes of each 2,052-byte sector stored on the disk drive 12.

When the disk drive 12 receives the data from the interface 14, it selects each path 18 or 20 in turn. Each of the nine bits from the selected path 18 or 20 is written by circuits 46-62 simultaneously and bit-wise onto one of nine different recording surfaces 64-80. Thus, the bits transferred in a specific bit position of each path 18 and 20 are stored on a specific one of the recording surfaces 64-80.
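The bit-wise mapping from transferred frames to recording surfaces can be sketched as follows. Here a frame is a 9-bit integer whose bits 0-7 are data bits and bit 8 is the parity bit, and surface index k stands for the k-th of the nine surfaces 64-80; both conventions, and the names `stripe` and `unstripe`, are assumptions made for illustration.

```python
from typing import List

NUM_SURFACES = 9  # 8 data bit positions plus 1 parity position


def stripe(frames: List[int]) -> List[List[int]]:
    """Write 9-bit frames bit-wise: bit k of every frame goes to surface k.
    `frames` is the sequence the drive forms by selecting path 18, then path 20, in turn."""
    surfaces: List[List[int]] = [[] for _ in range(NUM_SURFACES)]
    for frame in frames:
        for k in range(NUM_SURFACES):
            surfaces[k].append((frame >> k) & 1)
    return surfaces


def unstripe(surfaces: List[List[int]]) -> List[int]:
    """Read all nine surfaces in lockstep and reassemble the 9-bit frames."""
    count = len(surfaces[0])
    return [sum(surfaces[k][i] << k for k in range(NUM_SURFACES)) for i in range(count)]
```

Because every surface receives exactly one bit of each frame, a fault confined to one surface, head, or read/write circuit corrupts at most one bit per frame, which is the property the parity bit relies on.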

When the controller 10 makes a read request, the parity bit and eight data bits are read by circuits 46-62 simultaneously and bit-wise from the nine recording surfaces 64-80. When two sets of nine bits have been read, one for each path 18 and 20, they are transmitted simultaneously through the interface 14 to the controller 10. A small amount of FIFO buffering is required at the disk drive 12 to hold the first set until the second set is formed.

Figure 3 illustrates the format of data stored on the disk drive 12 in the first preferred embodiment of the present invention. Each row in Figure 3 represents one of the nine bits recorded by circuits 46-62 on surfaces 64-80 of the disk drive 12. Bits 82-96 are data bits; bit 98 is a parity bit. Each column in Figure 3 represents an 8-bit byte of data stored within a sector. The first 2048 columns represent the data or the parity code. The last 4 columns represent the second error correcting code, the 32-bit ECC. (Note that, for clarity, not all 2,052 columns are shown in Figure 3.) In the first preferred embodiment, the ECC is used to correct intermittent errors in the data, and the parity bits are used to correct solid failures in the hardware.
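One reading of this format is that each recording surface holds, per sector, 2,048 bytes (16,384 bits) taken from a single bit position, followed by the 4-byte second code. The sketch below builds such a sector for one surface. The application does not disclose the 32-bit ECC itself, so this sketch substitutes CRC-32 from Python's `zlib` module; CRC-32 only detects errors and cannot perform the eight-bit correction described for the real code. The bit packing order (first bit in the most significant position) and the function names are also assumptions.

```python
import zlib
from typing import List

DATA_BYTES = 2048
CODE_BYTES = 4
SECTOR_BYTES = DATA_BYTES + CODE_BYTES  # 2,052


def pack_bits(bits: List[int]) -> bytes:
    """Pack one surface's bit stream into bytes, first bit in the MSB (assumed order)."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def build_sector(surface_bits: List[int]) -> bytes:
    """Data bytes for one surface followed by a 32-bit code over those bytes."""
    if len(surface_bits) != DATA_BYTES * 8:
        raise ValueError("a sector holds 16,384 bits per surface")
    data = pack_bits(surface_bits)
    # Stand-in for the undisclosed 32-bit ECC: CRC-32 (detection only).
    code = zlib.crc32(data).to_bytes(CODE_BYTES, "big")
    return data + code


def sector_clean(sector: bytes) -> bool:
    """True if the stored code still matches the stored data."""
    data, code = sector[:DATA_BYTES], sector[DATA_BYTES:]
    return zlib.crc32(data).to_bytes(CODE_BYTES, "big") == code
```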

For intermittent errors, the first preferred embodiment performs what is termed "burst mode error correction." Using the second error correcting code (the ECC or checksum code), the controller 10 detects and corrects up to eight bits in error among the bits transferred in a particular bit position of the paths 18 and 20. Preferably, the controller 10 has a buffer capable of storing data so that the error detection and correction process using the second error correcting code can take place in the buffer. Alternatively, the error detection and correction process using the second error correcting code can take place in the computer 16.

For solid failures, the first preferred embodiment performs what is termed "recovery/re-read mode error correction." A solid failure occurs if more than eight bits are in error. Normally, the parity bits 98 are not read. However, in recovery/re-read mode, the recording surface in error is re-read along with the parity bits 98. The recording surface in error is then corrected using the parity bits 98.
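Because the second code has already identified which surface is in error, recovery/re-read mode amounts to erasure correction with the parity surface: each missing bit is whatever value restores the parity of its nine-bit frame. A minimal sketch, assuming odd parity and surfaces represented as lists of bits (index 8 being the parity surface); `rebuild_surface` is a hypothetical name.

```python
from typing import List


def rebuild_surface(surfaces: List[List[int]], failed: int) -> List[int]:
    """Recreate every bit of surface `failed` from the other eight surfaces.

    With odd parity, the XOR of all nine bits of a frame is 1, so the missing
    bit equals 1 XOR (XOR of the eight bits that were read correctly).
    The failing surface is known in advance (erasure correction)."""
    count = len(surfaces[0])
    rebuilt = []
    for i in range(count):
        acc = 1
        for k, surface in enumerate(surfaces):
            if k != failed:
                acc ^= surface[i]
        rebuilt.append(acc)
    return rebuilt
```

The same function also regenerates the parity surface itself (index 8), although in that case the data surfaces are intact and no repair of user data is needed.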

Thus, if data errors occur because of defects in a recording surface, Read/Write head, Read/Write circuit, disk transceiver, controller transceiver, cable or other fault, the controller 10 can still recreate the data stored on the disk drive 12.

If multiple recording surfaces have intermittent errors, each recording surface can be corrected. However, if solid failures occur on more than one recording surface, the parity bits cannot correct the errors. Note that if two recording surfaces are in error but only the first is a solid failure, i.e., the second has no more than eight bits in error, the data from the first recording surface can be corrected using the parity bits 98 and the data from the second recording surface can be corrected using the ECC.
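The decision rules in the preceding paragraphs can be summarized in a short sketch. It assumes a hypothetical per-surface verdict from the second code (clean, intermittent, or solid) and returns which surfaces are repaired in burst mode and which by re-reading with parity. Applying the ECC repairs before the parity rebuild is an ordering this sketch assumes, so that the parity calculation sees corrected data on the other surfaces.

```python
from enum import Enum
from typing import Dict, List


class SurfaceStatus(Enum):
    CLEAN = "clean"
    INTERMITTENT = "intermittent"  # at most eight bits in error: the ECC corrects it
    SOLID = "solid"                # more than eight bits in error: parity is needed


def plan_recovery(status: Dict[int, SurfaceStatus]) -> Dict[str, List[int]]:
    """Choose the repair for each surface from a hypothetical per-surface verdict.

    Burst mode ECC repairs are listed first and are meant to run first, so that the
    later parity rebuild of a solid failure uses already-corrected bits."""
    intermittent = sorted(s for s, st in status.items() if st is SurfaceStatus.INTERMITTENT)
    solid = sorted(s for s, st in status.items() if st is SurfaceStatus.SOLID)
    if len(solid) > 1:
        raise RuntimeError(f"solid failures on surfaces {solid}: parity can rebuild only one")
    return {"burst_mode_ecc": intermittent, "reread_with_parity": solid}
```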

SECOND PREFERRED EMBODIMENT

Figure 4 describes the components of a computer system in the second preferred embodiment of the present invention. A computer 104 communicates with a data storage subsystem 108 via an input/output channel 106. This communication includes both control information and data to be stored on the data storage subsystem 108. The data is transmitted from the computer 104 in, for example, 16-bit-wide parcels. Each bit of the 16-bit-wide parcel, plus a parity bit, is stored in a simultaneous, parallel operation on one of 17 recording surfaces (i.e., disk drives) 112a-112g in array 112. This parallel operation results in an aggregate bit rate 17 times that of a single drive in the standard architecture (16 times for the data bits alone, since the seventeenth drive holds parity). Those skilled in the art will readily recognize that the recording surfaces could be separate disk drives, separate platters, etc.

The disk controller 110 broadcasts control signals to the array of data storage devices 112 simultaneously. This controller 110 provides an interface that appears to the computer 104 as a single data storage device, thereby providing transparent operation and compatibility with existing computer systems. The data storage devices in the array 112 perform the same operations simultaneously.

Additional information on the architecture of this second preferred embodiment is available in the co-pending and commonly assigned patent application S/N 07/227,367 entitled "SINGLE DISK EMULATION INTERFACE FOR AN ARRAY OF SYNCHRONOUS SPINDLE DISK DRIVES", which application is incorporated herein by reference.

Figure 5 describes how data is stored on the parallel data storage devices in the array 112. Each row (horizontal) represents bits stored on a single sector on a single track on a single data storage device. Each column (vertical) represents a 16-bit word transferred by the computer 104. Each bit is stored on a different data storage device. The parity bit on the seventeenth data storage device is generated by the disk controller 110 for error detection purposes. The data is logically grouped in 15-word segments. Each 15-word segment includes error detection and correction means labeled in Figure 5 as bits E₀ through E₁₅. Each sector includes an additional word for redundant error correction and detection labeled in Figure 5 as bits C₀ through C₁₅.
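The column order within one sector can be sketched as below. The sector length in words, the bit-to-drive numbering, and the choice to give the E and C columns a P bit as well are assumptions. Because the application does not define the E and C codes, they are passed in as functions; `xor_placeholder` is a hypothetical stand-in used only to make the example run.

```python
from functools import reduce
from typing import Callable, List, Tuple

SEGMENT_WORDS = 15  # data words per segment; an E word follows each segment

# (kind, 16-bit word, P bit): bit i of the word goes to drive i, P to the seventeenth drive
Column = Tuple[str, int, int]


def odd_parity16(word: int) -> int:
    """Vertical parity bit P: gives the 17-bit column an odd number of 1s."""
    return (bin(word & 0xFFFF).count("1") + 1) % 2


def layout_sector(words: List[int],
                  segment_ecc: Callable[[List[int]], int],
                  sector_ecc: Callable[[List[int]], int]) -> List[Column]:
    """Order one sector's columns: each 15-word segment, then its E word,
    then a final C word computed over the whole sector."""
    if not words or len(words) % SEGMENT_WORDS:
        raise ValueError("this sketch assumes a whole number of 15-word segments")
    columns: List[Column] = []
    for start in range(0, len(words), SEGMENT_WORDS):
        segment = words[start:start + SEGMENT_WORDS]
        columns.extend(("data", w, odd_parity16(w)) for w in segment)
        e = segment_ecc(segment) & 0xFFFF
        columns.append(("E", e, odd_parity16(e)))
    c = sector_ecc(words) & 0xFFFF
    columns.append(("C", c, odd_parity16(c)))
    return columns


def xor_placeholder(words: List[int]) -> int:
    """Hypothetical stand-in for the unspecified E and C codes (plain XOR)."""
    return reduce(lambda a, b: a ^ b, words, 0)
```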

Using this storage method, together with error detection and correction circuits in the disk controller 110 that manipulate the Error Correction Code (ECC) bits E₀-E₁₅, a data path in error to any one of the data storage devices can be corrected within any 15-word segment. In Figure 5, bits P₀ through P₁₅ are "vertical" parity bits; each contains the odd parity value for its column of bits. The bits labeled E₀ through E₁₅ in Figure 5 are members of the ECC values for the block. The combination of ECC and parity check bits enables the identification and correction of all failing bits on any single disk drive within a 15-word segment. For randomly detected, unflagged media defects, the row (that is, the data storage device) in error can change every sixteen bits. As an additional check, an ECC, labeled in Figure 5 as C₀ through C₁₅, is generated over an entire sector and stored as a vertical "word" immediately following the last group in the sector. This ECC verifies that the sector was repaired correctly. Thus, the second preferred embodiment of the present invention provides a high degree of fault tolerance for a plurality of data storage devices that are synchronized and controlled to emulate the operation of a single data storage device. Through the use of parity bits and ECC parcels, one data storage device within the array 112 can fail without interrupting data storage and retrieval.
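The parity half of this repair can be sketched for a single segment. The sketch assumes the segment ECC (the E bits) has already named the failing drive, because the application does not describe how the E bits locate it; given that drive, each of its bits is set to the value that restores odd parity across the 17-bit column. The function name, the drive numbering (0-15 for data, 16 for parity) and the argument layout are hypothetical.

```python
from typing import List, Optional

PARITY_DRIVE = 16  # drives 0-15 hold data bits, drive 16 holds the P bits


def repair_segment(columns: List[int], parity: List[int],
                   bad_drive: Optional[int]) -> List[int]:
    """Rebuild every bit that drive `bad_drive` contributed to one segment.

    columns:   the 16-bit column words as read (bit i came from drive i).
    parity:    the odd vertical parity bits P read from the seventeenth drive.
    bad_drive: the drive flagged by the segment ECC, or None if the segment is clean."""
    if bad_drive is None or bad_drive == PARITY_DRIVE:
        return list(columns)  # the data bits are intact
    mask = 1 << bad_drive
    repaired = []
    for word, p in zip(columns, parity):
        kept = word & ~mask & 0xFFFF
        others = bin(kept).count("1") + p
        bit = 1 if others % 2 == 0 else 0  # choose the bit that restores odd parity
        repaired.append(kept | (bit << bad_drive))
    return repaired
```

Since the failing drive is supplied per segment, a different drive can be repaired in each segment, consistent with the note that the row in error may change every sixteen bits.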

CONCLUSION

Although two specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement calculated to achieve the same purpose may be substituted for the embodiments shown. For example, an alternative to the first preferred embodiment need not be restricted to eight data bits, one parity bit, and nine parallel recording surfaces. Also, a larger number of data bits would enhance throughput, while a larger number of error detecting bits would permit the use of different error correcting techniques.

As another example, an alternative to the second preferred embodiment need not be restricted to sixteen data bits, one parity bit, and seventeen parallel data storage paths. Also, a larger number of data bits would enhance throughput, while a larger number of error detecting bits would permit the use of different error correcting techniques. In addition, the sector size used with the second preferred embodiment could also be readily changed.

This application is intended to cover any adaptations or variations of the present invention.

Therefore, it is manifestly intended that this invention be limited only by the claims and the equivalents thereof.

Claims

WHAT IS CLAIMED IS:

1. An error recovery apparatus for a computer storing data on a storage device, comprising:

(a) means for generating a parity code for each 8-bit byte of data stored on the storage device;

(b) means for writing the data and the parity code simultaneously and bit-wise onto a plurality of recording surfaces of the storage device, so that each bit transmitted in a different bit position to the storage device is stored on a different recording surface of the storage device;

(c) means for generating an error correcting code for the bits stored on each recording surface and storing the error correcting code therewith;

(d) means for reading the bits simultaneously and bit-wise from the recording surfaces of the storage device, reconstructing the data therefrom, and transmitting the data to the computer;

(e) intermittent error means for detecting and correcting a finite number of the bits in error that are read from a specific recording surface using the error correcting code; and

(f) solid failure means for correcting more than the finite number of bits in error that are read from the specific recording surface using the parity code.

2. A method for correcting errors in data stored on data storage devices, comprising:

(a) generating a parity code for each 8-bit byte of data stored on the storage device;

(b) writing the data and the parity code simultaneously and bit-wise onto a plurality of recording surfaces of the storage device, so that each bit transmitted in a different bit position to the storage device is stored on a different recording surface of the storage device;

(c) generating an error correcting code for the bits stored on each recording surface and storing the error correcting code therewith;

(d) reading the bits simultaneously and bit-wise from the recording surfaces of the storage device, reconstructing the data therefrom, and transmitting the data to the computer;

(e) detecting and correcting a finite number of the bits in error that are read from a specific recording surface using the error correcting code; and

(f) correcting more than the finite number of bits in error that are read from the specific recording surface using the parity code.