[Flowchart of the decoding process: read block of data containing ...; develop a set of word syndromes for each word in each subblock; test each set of word syndromes for the all-zero condition; if a set is not all zeros, process the syndrome set and correct the word; store error data for the corrected subblock for potential use in step 8; test the block syndromes for the all-zero condition; if not all zero, remove the previous correction from the word and modify the block syndromes accordingly; process the modified block syndromes and correct multiple errors in the originally miscorrected word.]
MULTIBYTE ERROR CORRECTING SYSTEM INVOLVING A TWO-LEVEL CODE STRUCTURE
BACKGROUND OF THE INVENTION
1. Field of Invention
This invention relates in general to a system and method for correcting multiple byte errors in a codeword and, in particular, to a method and system for correcting multibyte errors in a relatively long block of data read from a disk file.
2. Description of the Prior Art
The prior art discloses various systems and methods for correcting errors. The following references disclose many of the basic ECC theories and systems.
1. I. S. Reed and G. Solomon, "Polynomial Codes Over Certain Finite Fields", J. Soc. Indust. Appl. Math. 8, pp. 300-304, 1960.
2. W. W. Peterson and E. J. Weldon, Error-Correcting Codes, M.I.T. Press, 1972.
3. D. C. Bossen, "b-Adjacent Error Correction", IBM J. Res. Devel. 14, pp. 402-408, 1970.
4. A. M. Patel and S. J. Hong, "Optimal Rectangular Code for High-Density Magnetic Tapes", IBM J. Res. Devel. 18, pp. 579-588, 1974.
5. A. M. Patel, "Error-recovery Scheme for the IBM 3850 Mass Storage System", IBM J. Res. Devel. 24, pp. 32-42, 1980.
6. G. D. Forney, Concatenated Codes, M.I.T. Press, 1966.
7. P. Elias, "Error-free Coding", IEEE Trans. Inf. Theory, Vol. IT4, pp. 29-37, 1954.
8. R. C. Bose and D. K. Ray-Chaudhuri, "On a Class of Error-correcting Binary Group Codes", Inf. Control 3, pp. 68-79, 1960.
9. J. K. Wolf, "Adding Two Information Symbols to Certain Non-binary BCH Codes, and Some Applications", Bell Systems Tech. J. 48, pp. 2408-2424, 1969.
10. R. T. Chien, "Cyclic Decoding Procedures for Bose-Chaudhuri-Hocquenghem Codes", IEEE Trans. Inf. Theory, Vol. IT10, pp. 357-363, 1964.
11. E. R. Berlekamp, Algebraic Coding Theory, McGraw
It has long been recognized by the art that the data stored on a magnetic medium, such as a disk file, will be subject to errors during the read back process for a number of valid technical reasons. Present day disk files include a number of different approaches to minimizing the number of errors that may occur during the read back process. For example, most disks undergo a thorough surface analysis test to identify defective areas before the disk is incorporated into the drive. Those disks having errors above a certain predetermined criterion are rejected, which does have an adverse effect on the manufacturing cost of the disk drive.
In addition, systems are provided in many disk drives which, based on defect data stored on a disk, cause the drive to avoid a bad track, a bad sector, or a defective area of a disk track. These latter systems involve skipping bad areas during the storage of data on the track. Other systems are included in the file which operate to reread the data when an error is detected. The rereading operation occurs under slightly different conditions each time, such as offsetting the transducer from the center of the track or increasing the gain of the read amplifier until, hopefully, the error is corrected during the rereading process.
The addition of such error recovery systems is motivated primarily by the realization that it is important to minimize the number of errors that have to be corrected by associated error correcting circuitry since use of the ECC system may adversely impact overall system performance. In addition, usable storage capacity is decreased since considerably more redundancy is required if more errors must be corrected.
Systems which correct only single errors are used exclusively in current disk files. A single error, by definition, may include a burst type error involving a group of contiguous bit positions. However, two separate burst errors or even widely spaced single bit errors cannot be corrected by these single error correcting systems. Consideration must, therefore, be given to the length of the data block that will correspond to the codeword in order to minimize or prevent the occurrence of more than one error in that data block. That consideration is generally based on statistical data in terms of the number of errors that can be expected on a probability basis.
While prior art systems and methods for correcting single errors operate successfully, it is recognized that their use does impact system performance so that considerable effort and expense are taken in the design of disk files to minimize their use, as explained above.
The art has further recognized that all the different error patterns which occur in one byte of a multibyte codeword are correctable using a reasonable amount of redundancy. It has also been recognized that by interleaving codewords, a burst which extends longer than one byte may be corrected, provided the length of the burst is less than "m" bytes.
It is also known that a multibyte (i.e., more than one byte) error correcting system may be provided in accordance with the teaching of applicant's copending application Ser. No. 454,393, filed Dec. 29, 1982, entitled "On-the-Fly Multibyte Error Correcting System", and assigned to the assignee of the present invention.
One of the main reasons why multibyte error correcting systems have not been readily adopted for disk files is the constraint imposed by those codes on the block size or codeword. It is recognized that the codeword is limited to $2^b$ where b is the number of bit positions in the byte employed in the system. Where the byte consists of eight bits, which is substantially a standard in the data processing industry, the codeword therefore generally cannot exceed 255 bytes. It is further recognized that two check bytes must be associated with the codeword for each error to be corrected in that codeword of 255 bytes. For example, if the code is designed to correct five errors in each codeword, then ten check byte positions must be provided out of the 255 byte positions.
It can be seen that in such arrangements the redundancy becomes quite high and the overall capacity of the disk file is severely restricted.
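The arithmetic behind these figures can be checked with a short sketch. The function name `rs_overhead` is hypothetical, and the codeword bound is taken here as $2^b - 1$ bytes, an assumption that matches the 255-byte figure quoted above for 8-bit bytes.

```python
def rs_overhead(b: int, t: int) -> tuple[int, int, float]:
    """Return (max codeword bytes, check bytes, fraction of the codeword spent on checks)."""
    n_max = 2**b - 1   # longest codeword for b-bit bytes (assumed bound)
    check = 2 * t      # two check bytes per correctable byte error
    return n_max, check, check / n_max


n_max, check, frac = rs_overhead(b=8, t=5)
print(n_max, check)  # 255 10
```

Because the codeword cannot grow beyond this bound, a long record has to be split into many such codewords, each carrying its own full set of check bytes.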
Besides adversely affecting useful storage capacity, the relatively small block size also imposes many undesirable constraints in the design of the data format that is used on the track.
In future disk files, it is desirable to provide better reliability and availability in spite of higher data storage density and data rates.
Conventional coding techniques, such as multiple error correcting Reed-Solomon or BCH codes discussed in references 1-3, while very efficient in terms of mathematical redundancy, impose algebraic constraints
on the size of the codeword for a given choice of byte size. Thus, in a practical application of 8-bit bytes and with high error rates, the redundancy is often still unacceptable. These considerations present major hurdles in the application of these conventional coding techniques to future disk files.
A system for correcting multiple errors which does not present these major hurdles is desired. The present invention provides such a system.
SUMMARY OF THE INVENTION
In accordance with the present invention, a multibyte error correcting system is provided which employs a two-level code structure consisting of subblocks within a block. The structure provides two major advantages. First, the improved method and system eliminate the problem of the constraint on the size of the codeword and, second, a decoding strategy is established that permits "on-the-fly" correction of multibyte errors at the subblock level and additional reserve error correction capability at the block level.
The two-level coding structure of the present invention employs a data format on a disk track involving subblocks within a block. As described, each subblock includes two or more interleaved primary codewords. At the first code level, the coding structure is designed to correct $t_1$ symbols or errors per primary codeword so that each primary codeword in a subblock carries $2 \times t_1$ check bytes, i.e., two check bytes for each error in the primary codeword. The system is arranged to correct $t_1$ errors in each primary codeword in the "on-the-fly" manner suggested by the above-mentioned application Ser. No. 454,393. The code structure is extended to $t_2$ symbol correction at the block level by providing additional block level check bytes which, on reading stored data, reflect corrections inserted at the first level. The block level syndromes developed at the second level, therefore, provide an indication (an all zero syndrome) of whether the corrections to the primary word at the subblock level were valid or whether a miscorrection had been applied (a pattern of not all zeros). The miscorrection occurs because the primary word had more than $t_1$ errors, e.g., $t_1 + x$ errors. The system corrects these $t_1 + x$ errors in the primary word by using the block syndromes after a modification to reflect the miscorrection, and the syndromes developed from the $2 \times t_1$ check bytes associated with the primary word. The block syndrome bytes and the syndromes of the primary word are sufficient to correct up to $t_2$ errors ($t_2 \geq t_1 + x$) in one of the subblocks.
Since the $t_2$-symbol error correction capability is shared over several subblocks and is required for only one subblock in a block, any processing at the block level may also be completed for the block in an on-the-fly manner.
It is, therefore, an object of the present invention to provide an improved multibyte error correcting system and method for use in a disk file.
A further object of the present invention is to provide an ECC system for correcting multiple errors in a relatively long block of data stored on a disk file in a manner which minimizes impact on system performance.
Another object of the present invention is to provide an ECC system for correcting multiple errors in a relatively long block of data stored on a disk file in which there is little or no constraint on the manner in which the data is formatted on the track.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of a preferred embodiment of the invention as illustrated in the accompanying drawing.
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 is a diagrammatic illustration showing the data format of the two-level coding structure for a disk track;
FIG. 2 is a schematic representation of the feedback shift register employed in the subblock check byte encoding operation for the error correcting system embodying the present invention;
FIG. 3 is a block diagram of the specific logic for the feedback shift registers shown schematically in FIG. 2;
FIG. 4a is an illustration of the logic of the matrix multiplier illustrated in block form in FIGS. 3 and 6, while FIG. 4b is the matrix $T^3$ which determines the logic operations of FIG. 4a;
FIG. 5 is a block diagram of the specific logic employed in the ECC system for generating one block check byte;
FIG. 6 is a block diagram of the logic employed for generating the second block check byte;
FIG. 7 is a block diagram of the first and second level portions of the ECC system for correcting single errors in the subblocks by processing subblock syndromes and for correcting two errors in a subblock by processing subblock and block syndromes; and
FIG. 8 is a block diagram showing the general decoding process.
DESCRIPTION OF THE PREFERRED EMBODIMENT
FIG. 1 illustrates the data format of a disk track that embodies the two-level code structure of the present invention. It is assumed for purposes of describing a specific embodiment of the present invention that the multibyte ECC system, as shown and described, is designed to correct up to two errors in each block, an error being defined as any pattern of eight bits in one byte position of the block other than the correct pattern. It should be understood however that the invention is applicable to systems for correcting any number of errors in the block, and later on in the specification, a mathematical proof establishing the general case is provided.
As shown in FIG. 1, a track 11 is formatted into a plurality of equal length blocks 12, each of which is divided into a predetermined plurality of subblocks 14. A block check byte area 15 is associated with each block which, as shown, includes four check byte positions 16. Each subblock 14, as shown, comprises two interleaved codewords 18 and 19 which are of equal length. Two pairs of check byte positions $B_1$ and $B_0$ are associated with each subblock so that a different pair of check byte positions $B_1$ and $B_0$ is associated with each subblock codeword 18 or 19.
The details of the two-level code will be discussed in connection with the following main design parameters where:
b = number of bits in a byte (symbol)
m = number of data bytes in a primary word
n = number of subblocks in a block
g = amount of interleaving (number of interleaved words)
$t_1$ = number of errors corrected at the subblock level
$t_2$ = number of errors corrected at the block level
The parameters b, m, n and g determine many of the important capabilities of the code. For example, one-symbol correction at the subblock level with g interleaved words protects against a burst error of a length up to $(bg - b + 1)$ bits. Two-symbol correction at the block level with g interleaved words at the subblock level protects against two different burst errors, each of which may be $(bg - b + 1)$ bits in length, or one long burst of up to $(2bg - b + 1)$ bits.
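For the embodiment's values of 8-bit bytes and two interleaved words, these burst limits work out as in the sketch below. The function names are hypothetical and the code simply evaluates the two expressions given above.

```python
def single_burst_bits(b: int, g: int) -> int:
    """Longest burst covered by one-symbol correction in each of g interleaved words."""
    return b * g - b + 1


def long_burst_bits(b: int, g: int) -> int:
    """Longest single burst covered by two-symbol correction with g interleaved words."""
    return 2 * b * g - b + 1


b, g = 8, 2
print(single_burst_bits(b, g))  # 9
print(long_burst_bits(b, g))    # 25
```

The reasoning behind the first figure: a burst of 9 bits can touch at most two adjacent bytes, which belong to different interleaved words, whereas a 10-bit burst could touch three bytes and so put two bytes in error in the same word.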
The above identified parameters of the ECC code allow the capabilities of the code to be adjusted to match the actual measured error conditions of a product without a change in the ECC system hardware even though the actual conditions, as measured, are substantially different from those conditions for which the system was initially designed.
The word length parameter m in bytes and the number n of subblocks in a block determine the capability of the code. The word length must satisfy the following equation:

$$m + 2 \leq 2^b - 1$$

where b represents the number of bit positions in a byte of the word.
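As a quick check of this constraint, the sketch below tests a word length against the bound. The helper names are hypothetical; the value m = 64 is the one used in the embodiment of FIG. 1.

```python
def word_length_ok(m: int, b: int) -> bool:
    """True if m data bytes plus two check bytes fit in one word of b-bit bytes."""
    return m + 2 <= 2**b - 1


def max_data_bytes(b: int) -> int:
    """Largest m that satisfies the word length constraint."""
    return (2**b - 1) - 2


print(word_length_ok(64, 8))  # True
print(max_data_bytes(8))      # 253
```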
The block length in bytes is equal to the number g of interleaved words times the number of bytes in each word times the number n of subblocks in the block.
In the system shown in FIG. 1, it is assumed a word comprises 64 data byte positions (m) and two check byte positions $B_1$ and $B_0$ and a subblock has two (g) interleaved words. A block, therefore, comprises four subblocks of 128 byte positions or 512 byte positions and two pairs 15 and 16 of block check bytes $C_1$ and $C_2$, one pair being associated with even columns and the other pair being associated with odd columns.
In general, while the two-level multibyte ECC system operates at the block level, the following description is directed to processing only one of the interleaved codewords since both are processed in the same manner. The operation involves first processing the two syndrome bytes corresponding to the two check bytes associated with one word of the interleaved words of the subblock. In the specific embodiment disclosed, an error in any one byte of the word will first be corrected, regardless of the number of bit positions in a byte that are in error. Thus, any of the possible 255 error patterns in an 8-bit byte of the word will be correctable by suitably processing the two syndrome bytes provided there are no other errors in that word.
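How two syndrome bytes can locate and correct a single byte in error may be sketched in a few lines. This is an illustration under stated assumptions rather than the hardware of the embodiment: the syndrome definitions $S_1 = \sum_i \alpha^i B_i$ and $S_2 = \sum_i \alpha^{2i} B_i$, the function names, and the primitive polynomial 0x1A9 used to build the field tables are all assumptions.

```python
PRIM = 0x1A9  # assumed primitive polynomial x^8 + x^7 + x^5 + x^3 + 1

# Antilog (EXP) and log (LOG) tables for GF(2^8), with alpha = 0x02.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = EXP[power + 255] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def mul(a, b):
    """Product of two field elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def div(a, b):
    """a / b in GF(2^8); b must be non-zero."""
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def syndromes(word):
    """Return (S1, S2) for a word [B_0, B_1, ..., B_(m+1)]; sums are XORs."""
    s1 = s2 = 0
    for i, byte in enumerate(word):
        s1 ^= mul(EXP[i % 255], byte)
        s2 ^= mul(EXP[(2 * i) % 255], byte)
    return s1, s2


def make_word(data):
    """Choose check bytes B_0 and B_1 so that both syndromes are zero."""
    d1, d2 = syndromes([0, 0] + list(data))
    b1 = div(d1 ^ d2, EXP[1] ^ EXP[2])
    b0 = d1 ^ mul(EXP[1], b1)
    return [b0, b1] + list(data)


def correct_single(word):
    """Return a copy of the word with at most one byte corrected."""
    s1, s2 = syndromes(word)
    if s1 == 0 and s2 == 0:
        return list(word)                      # no error detected
    if s1 == 0 or s2 == 0:
        raise ValueError("uncorrectable at the word level")
    position = LOG[div(s2, s1)]                # alpha^position = S2 / S1
    if position >= len(word):
        raise ValueError("uncorrectable at the word level")
    fixed = list(word)
    fixed[position] ^= div(s1, EXP[position])  # error pattern = S1 / alpha^position
    return fixed


word = make_word(range(1, 65))   # m = 64 data bytes, as in FIG. 1
received = list(word)
received[10] ^= 0xA5             # corrupt one byte with an arbitrary 8-bit pattern
print(correct_single(received) == word)  # True
```

With two or more bytes in error in the same word, a routine of this kind can return a wrong but plausible-looking correction; that miscorrection is the case the block-level syndromes are intended to detect.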
The block syndromes corresponding to one of the two pairs 15 and 16 of check bytes $C_1$ and $C_2$ associated with the block are only processed when the corresponding codeword in a subblock is identified as containing more than one byte in error. Since the multibyte (2) error correction capability is shared over several relatively small subblocks and is required for only one subblock (or none) in a block, the error processing may be easily completed on-the-fly at the block level. The relationship of the block and subblock provides a unique structural advantage in the coding equations for the code of the present invention which is not available in or suggested by other prior art two-level coding schemes such as the concatenated codes or product codes of References (6) and (7).
The preferred embodiment of the present invention as illustrated in the drawing is based on codes for symbols in the Galois field GF($2^8$). The primary codeword consists of two check bytes designated $B_0$ and $B_1$, and m data bytes designated $B_2, B_3, \ldots, B_{m+1}$, which satisfy the following modulo 2 matrix equations:
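A form of these two equations that is consistent with the rest of the description would be the following, assuming the check bytes are chosen so that $\alpha$ and $\alpha^2$ are roots of the word polynomial and that byte $B_i$ is weighted by $T^i$ in the first equation and by $T^{2i}$ in the second. The indexing is an assumption, not a transcription.

$$B_0 \oplus T B_1 \oplus T^2 B_2 \oplus \cdots \oplus T^{m+1} B_{m+1} = 0$$

$$B_0 \oplus T^2 B_1 \oplus T^4 B_2 \oplus \cdots \oplus T^{2(m+1)} B_{m+1} = 0$$

Here $\oplus$ denotes bitwise modulo 2 addition of 8-bit column vectors.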
Equations 8 and 9 per se correspond to prior art single-symbol-correcting Reed-Solomon or BCH codes in which the 8-bit column vectors correspond to elements of GF($2^8$). In the notation of Equations 8 and 9, the multiplication by matrix $T^i$ corresponds to the multiplication by the Galois field element $\alpha^i$, where $\alpha$ is a primitive element represented by the first column of the matrix T.
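The correspondence between multiplying by $T^i$ and multiplying by $\alpha^i$ can be checked numerically. The sketch below is a minimal illustration, not the circuit of the embodiment: the helper names are hypothetical, bit 0 of a byte is taken as the constant term, and the primitive polynomial $x^8 + x^7 + x^5 + x^3 + 1$ (0x1A9) is an assumption, chosen because it reproduces the input bit positions quoted below for the top and bottom rows of $T^3$ in FIG. 4b.

```python
PRIM = 0x1A9  # assumed primitive polynomial x^8 + x^7 + x^5 + x^3 + 1


def times_alpha(v):
    """Multiply a field element by alpha (0x02) modulo the primitive polynomial."""
    v <<= 1
    return v ^ PRIM if v & 0x100 else v


def alpha_pow(i):
    """Return alpha^i as an 8-bit integer."""
    v = 1
    for _ in range(i):
        v = times_alpha(v)
    return v


def gf_mul(a, b):
    """Shift-and-add product of two field elements."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = times_alpha(a)
        b >>= 1
    return r


def t_power_columns(i):
    """Columns of T^i: column k holds the field element alpha^(i + k)."""
    return [alpha_pow(i + k) for k in range(8)]


def matvec(columns, vec):
    """Multiply an 8x8 binary matrix (stored by columns) by an 8-bit vector, modulo 2."""
    out = 0
    for k in range(8):
        if (vec >> k) & 1:
            out ^= columns[k]
    return out


def row_inputs(columns, row):
    """Input bit positions that are XORed together to form output bit `row`."""
    return [k for k in range(8) if (columns[k] >> row) & 1]


T3 = t_power_columns(3)
# Matrix multiplication by T^3 agrees with field multiplication by alpha^3.
print(all(matvec(T3, v) == gf_mul(alpha_pow(3), v) for v in range(256)))  # True
print(row_inputs(T3, 0))  # [5, 6, 7]
print(row_inputs(T3, 7))  # [4, 5, 6]
```

Each row of the matrix therefore describes one modulo 2 adder: the output bit is the XOR of the input bits selected by the ones in that row.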
FIG. 2 is a schematic diagram of the encoder for generating the check bytes $B_0$ and $B_1$ for each word in a subblock, while FIG. 3 illustrates the encoder in FIG. 2 in more conventional functional logic blocks. The encoder functions to perform modulo $g(x)$ operations where $g(x)$ is a polynomial with roots $\alpha$ and $\alpha^2$. The specific generator polynomial is
$$g(x) = T^3 \oplus (T \oplus T^2)\,x \oplus x^2$$
The check bytes $B_0$ and $B_1$ for one word are developed by supplying the data bytes $B_{m+1}$ through $B_2$ to input 20 of FIG. 2. In FIG. 2, block 21 and block 22 function to store an 8-bit field element. Blocks 23 and 24 function to add two 8-bit field elements modulo 2 while blocks 25 and 26 function as matrix multipliers to multiply an 8-bit field element by a specific matrix. Initially, blocks 21 and 22 are set to zero and the data bytes are clocked into the encoder at input 20. At the end of the operation, the encoder contains check bytes $B_1$ and $B_0$ in its 8-bit blocks 22 and 21, respectively.
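The shift-register encoding just described amounts to dividing the data polynomial by $g(x)$ and keeping the remainder. A minimal software sketch follows; it is a standard polynomial division circuit written under assumptions, not a transcription of FIG. 2. In particular, the primitive polynomial 0x1A9, the names `encode_check_bytes` and `poly_eval`, and the exact placement of the multipliers by $\alpha^3$ and $\alpha \oplus \alpha^2$ are assumptions.

```python
PRIM = 0x1A9  # assumed primitive polynomial x^8 + x^7 + x^5 + x^3 + 1


def gf_mul(a, b):
    """Shift-and-add product of two elements of GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


ALPHA = 0x02
ALPHA2 = gf_mul(ALPHA, ALPHA)
G0 = gf_mul(ALPHA, ALPHA2)   # alpha^3, constant term of g(x)
G1 = ALPHA ^ ALPHA2          # alpha + alpha^2, coefficient of x in g(x)


def encode_check_bytes(data_high_to_low):
    """Clock data bytes B_(m+1) .. B_2 through a two-stage register; return (B_0, B_1)."""
    r0 = r1 = 0                      # both stages start at zero
    for d in data_high_to_low:
        feedback = d ^ r1
        r1 = r0 ^ gf_mul(G1, feedback)
        r0 = gf_mul(G0, feedback)
    return r0, r1


def poly_eval(word, x):
    """Evaluate B_0 + B_1 x + B_2 x^2 + ... by Horner's rule."""
    acc = 0
    for byte in reversed(word):
        acc = gf_mul(acc, x) ^ byte
    return acc


data = list(range(1, 65))                    # B_2 .. B_65, i.e. m = 64
b0, b1 = encode_check_bytes(reversed(data))  # highest-order byte goes in first
word = [b0, b1] + data

# A valid word has both alpha and alpha^2 as roots of its polynomial.
print(poly_eval(word, ALPHA) == 0 and poly_eval(word, ALPHA2) == 0)  # True
```

Feeding the highest-order byte first matches the order stated above, and after the last byte the two register stages hold the remainder, which serves directly as the pair of check bytes.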
The details of the matrix multiplier for $T^3$ represented by block 25 in FIGS. 2 and 3 are shown in FIG. 4a in which B represents an 8-bit input vector, selected bit positions of which are combined modulo 2 in blocks 41 through 48. The bit positions selected for inputs to blocks 41 through 48 are determined from matrix $T^3$. As shown in FIG. 4b, the top row determines the input for block 41, while the bottom row determines the input for block 48, a binary 1 digit in a column signifying an input from the corresponding bit position of the input vector B. Hence, block 41 receives input from bit positions 5, 6 and 7, while block 48 receives input from bit positions 4, 5 and 6 corresponding respectively to the