WO2008046078A2 - Determining message residue using a set of polynomials

Publication number
WO2008046078A2
WO2008046078A2 PCT/US2007/081312 US2007081312W WO2008046078A2 WO 2008046078 A2 WO2008046078 A2 WO 2008046078A2 US 2007081312 W US2007081312 W US 2007081312W WO 2008046078 A2 WO2008046078 A2 WO 2008046078A2
Authority
WO
WIPO (PCT)
Prior art keywords
polynomials
polynomial
bits
stages
stage
Prior art date
Application number
PCT/US2007/081312
Other languages
French (fr)
Other versions
WO2008046078A3 (en
Inventor
William C. Hasenplaugh
Brad A. Burres
Gunnar Gaubatz
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to EP07854019A (EP2089973A4)
Priority to JP2009532617A (JP5164277B2)
Publication of WO2008046078A2
Publication of WO2008046078A3

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/09 Error detection only, e.g. using cyclic redundancy check [CRC] codes or single parity bit
    • H03M13/091 Parallel or block-wise CRC computation
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65 Purpose and implementation aspects
    • H03M13/6508 Flexibility, adaptability, parametrability and configurability of the implementation
    • H03M13/6516 Support of multiple code parameters, e.g. generalized Reed-Solomon decoder for a variety of generator polynomials or Galois fields

Abstract

A method is described for use in determining a residue of a message. The method includes loading at least a portion of each of a set of polynomials derived from a first polynomial, g(x), and determining the residue using a set of stages. Individual ones of the stages apply a respective one of the derived set of polynomials to data output by a preceding one of the set of stages.

Description

DETERMINING MESSAGE RESIDUE USING A SET OF POLYNOMIALS
BACKGROUND
[0001] Data transmitted over network connections or retrieved from a storage device, for example, may be corrupted for a variety of reasons. For instance, a noisy transmission line may change a "1" signal to a "0", or vice versa. To detect corruption, data is often accompanied by some value derived from the data such as a checksum. A receiver of the data can recompute the checksum and compare with the original checksum to confirm that the data was likely transmitted without error.
[0002] A common technique to identify data corruption is known as a Cyclic Redundancy Check (CRC). Though not literally a checksum, a CRC value can be used much in the same way. That is, a comparison of an originally computed CRC and a recomputed CRC can identify data corruption with a very high likelihood. CRC computation is based on interpreting message bits as a polynomial, where each bit of the message represents a polynomial coefficient. For example, a message of "1110" corresponds to a polynomial of x^3 + x^2 + x + 0. The message is divided by another polynomial known as the key. For example, the other polynomial may be "11" or x + 1. A CRC is the remainder of a division of the message by the key. CRC polynomial division, however, is somewhat different than ordinary division in that it is computed over the finite field GF(2) (i.e., the set of integers modulo 2). More simply put: even number coefficients become zeroes and odd number coefficients become ones.
[0003] A wide variety of techniques have been developed to perform CRC calculations. A first technique uses a dedicated CRC circuit to implement a specific polynomial key. This approach can produce very fast circuitry with a very small footprint. The speed and size, however, often come at the cost of inflexibility with respect to the polynomial key used. Additionally, supporting multiple keys may increase the circuitry footprint nearly linearly for each key supported.
[0004] A second commonly used technique features a CRC lookup table where, for a given polynomial and set of data inputs and remainders, all possible CRC results are calculated and stored. Determining a CRC becomes a simple matter of performing table lookups. This approach, however, generally has a comparatively large circuit footprint and may require an entire repopulation of the lookup table to change the polynomial key being used.
[0005] A third technique is a programmable CRC circuit. This allows nearly any polynomial to be supported in a reasonably efficient amount of die area. Unfortunately, this method can suffer from much slower performance than the previously described methods.
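As an illustration of the GF(2) division described in paragraph [0002], the following minimal Python sketch computes the remainder for the sample message "1110" and key "11". The function name gf2_mod and the integer encoding (bit j of an integer holds the coefficient of x^j) are assumptions made for this example and do not appear in the original text.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2).

    Polynomials are encoded as integers: bit j is the coefficient of x^j.
    """
    if divisor == 0:
        raise ZeroDivisionError("polynomial key must be non-zero")
    divisor_degree = divisor.bit_length() - 1
    # Cancel the leading term of the dividend until its degree drops
    # below the degree of the divisor.
    while dividend.bit_length() - 1 >= divisor_degree:
        shift = dividend.bit_length() - 1 - divisor_degree
        dividend ^= divisor << shift  # subtraction over GF(2) is XOR
    return dividend


message = 0b1110  # x^3 + x^2 + x + 0
key = 0b11        # x + 1
remainder = gf2_mod(message, key)
print(f"remainder = {remainder:b}")
```

In this sketch the remainder is 1, since x^3 + x^2 + x = (x^2 + 1)(x + 1) + 1 over GF(2).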
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] FIG. 1 is a diagram illustrating a set of stages that apply a set of pre-computed polynomials to determine a polynomial division residue.
[0007] FIG. 2 is a diagram of a set of pre-computed polynomials.
[0008] FIG. 3 is a diagram illustrating stages that perform parallel operations on a pre-computed polynomial and input data.
[0009] FIGs. 4A and 4B are diagrams of sample stages' digital logic gates.
[0010] FIG. 5 is a diagram of a system to compute a polynomial division residue.
DETAILED DESCRIPTION
[0011] FIG. 1 illustrates a sample implementation of a programmable Cyclic Redundancy Check (CRC) circuit 100. The circuit 100 can achieve, roughly, the same performance as a lookup table CRC implementation and may be only modestly slower than a dedicated CRC circuit implementation operating on a typical polynomial. From a die-area perspective, the circuit 100 can be orders of magnitude smaller than a lookup table approach and within an order of magnitude of a dedicated circuit implementation.
[0012] The circuit 100 uses a series of pre-computed polynomials 100a-100d derived from a polynomial key. Bits of the pre-computed polynomials 100a-100d are loaded into storage elements (e.g., registers or memory locations) and fed into a series of stages 106a-106d that successively reduce an initial message into smaller intermediate values en route to a final CRC result output by stage 106d. For example, as shown, the width of data, r_b through r_d, output by stages 106a-106d decreases with each successive stage. The pre-computed polynomials 100a-100d and stages 106a-106d are constructed such that the initial input, r_a, and the stage outputs, r_b through r_d, are congruent to each other with respect to the final residue (i.e., r_a ≡ r_b ≡ r_c ≡ r_d). In addition, the pre-computed polynomials 100a-100d permit the stages 106a-106d to perform many of the calculations in parallel, reducing the number of gate delays needed to determine a CRC residue. Reprogramming the circuitry 100 for a different key can simply be a matter of loading the appropriate set of pre-computed polynomials into the storage elements 100a-100d.
[0013] FIG. 2 illustrates a sample set of pre-computed polynomials, g_i(x), 100a-100d (e.g., g_4, g_2, g_1, and g_0). By examination, these polynomials 100a-100d have the property that each successive polynomial 100a-100d in the set features a leading one bit (the (i+k)-th bit) followed by i zeroes 102 (shaded) and concluding with k bits of data 104 of the order of the generating (CRC) polynomial (e.g., g_0). The form of these polynomials 100a-100d enables each stage 106a-106d to reduce input data to a smaller, but CRC equivalent, value. For example, after deriving a set of polynomials {g_4(x), g_2(x), g_1(x)} from some 9-bit polynomial g_0(x), a CRC could be determined for input data of 16 bits. During operation, applying g_4(x) would reduce the input data from 16 bits to 12 bits, g_2(x) would next reduce the data from 12 bits to 10 bits, and so forth until an 8-bit residue was output by a final stage 106d. Additionally, as described in greater detail below, a given stage 106a-106d may use a polynomial 100a-100d to process mutually exclusive regions of the input data in parallel.
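The bit layout described for FIG. 2 can be illustrated with a short Python sketch. It assumes the definition given in paragraph [0015], g_i(x) = x^(k+i) + [x^(k+i) mod g(x)], and uses a hypothetical 9-bit key, 0x107 (x^8 + x^2 + x + 1), chosen only for illustration. The helper names gf2_mod and derived_poly are likewise assumptions and are not taken from the original.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2); bit j is the coefficient of x^j."""
    divisor_degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= divisor_degree:
        dividend ^= divisor << (dividend.bit_length() - 1 - divisor_degree)
    return dividend


KEY = 0x107                # hypothetical 9-bit key g_0(x), illustration only
K = KEY.bit_length() - 1   # k = 8, the width of the residue


def derived_poly(i: int) -> int:
    """g_i(x) = x^(k+i) + [x^(k+i) mod g(x)]."""
    leading = 1 << (K + i)
    return leading | gf2_mod(leading, KEY)


for i in (4, 2, 1, 0):
    width = K + i + 1
    # Each line shows a leading 1, then i zeroes, then k bits of data.
    print(f"g_{i}(x) = {derived_poly(i):0{width}b}")
```

Shifting any g_i(x) right by k bits leaves only the leading one followed by i zeroes, which is why a stage needs to store just the k low bits.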
[0014] More rigorously, let g(x) be a k-th degree CRC polynomial of k+1 bits, where the leading bit is always set in order that the residue may span k bits. The polynomial g(x) is defined as:
$$g(x) = x^{k} + \sum_{j=0}^{k-1} g_j x^{j}, \qquad g_j \in GF(2)$$
[0015] The polynomial g_i(x) is then defined as:
$$g_i(x) = x^{k+i} + [x^{k+i} \bmod g(x)]$$
[0016] In accordance with this definition of g_i(x), a sequence of polynomials can be computed as a function of selected values of i and the original polynomial g(x).
[0017] The CRC polynomial, g(x), divides g_i(x), i.e., g(x) | g_i(x).
Proof:
$$\begin{aligned} g_i(x) &= x^{k+i} + [x^{k+i} \bmod g(x)] \\ &= x^{k+i} + [x^{k+i} - a_i(x)\,g(x)] \quad \text{for some } a_i(x) \\ &= a_i(x)\,g(x) \end{aligned}$$
From this, a recurrence can be defined, where at each stage a message, m(x), is partially reduced by one of the pre-computed polynomials.
[0018] Let m(x) be a 2^L-bit message and r(x) be the k-bit result:
$$m(x) = \sum_{j=0}^{2^{L}-1} m_j x^{j}, \qquad m_j \in GF(2)$$
$$r(x) = [m(x) \cdot x^{k} \bmod g(x)]$$
where m(x) is shifted by x^k, creating room to append the resulting CRC residue to the message, m(x). Thus:
$$r_0(x) = m(x) \cdot x^{k}$$
$$r_i(x) = [r_{i-1}(x) \bmod g_{2^{L-i}}(x)]$$
for i ≥ 1. Thus, r_i(x) ≡ r_0(x) mod g(x), which is proved by induction on i:
[Equation image imgf000005_0001 in the original publication]
Proof:
$$\begin{aligned} r_1(x) &= r_0(x) \bmod g_{2^{L-1}}(x) \\ &= r_0(x) \bmod [a_{2^{L-1}}(x)\,g(x)] \end{aligned}$$
[Equation image imgf000005_0002 in the original publication]
Proof:
[Equation image imgf000005_0003 in the original publication]
$$= r_{i-1}(x) \bmod [a_{2^{L-i}}(x)\,g(x)]$$
[0019] Finally, r_L(x) = r(x), which follows from the observations made above:
$$\begin{aligned} r_L(x) &= [r_{L-1}(x) \bmod g_0(x)] \\ &= [m(x) \cdot x^{k} - b(x)\,g(x)] \bmod g_0(x) \quad \text{for some } b(x) \\ &= m(x) \cdot x^{k} \bmod g(x) \end{aligned}$$
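The divisibility property and the recurrence above can be checked numerically with a small Python sketch. It assumes a hypothetical 9-bit key, 0x107, and L = 3 (an 8-bit message), so that the derived polynomials applied are g_4, g_2, g_1, and finally g_0. All helper names are illustrative assumptions, and each stage is modeled here as a full polynomial remainder rather than as gate-level logic.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2); bit j is the coefficient of x^j."""
    divisor_degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= divisor_degree:
        dividend ^= divisor << (dividend.bit_length() - 1 - divisor_degree)
    return dividend


KEY = 0x107                # hypothetical key g(x) = g_0(x), illustration only
K = KEY.bit_length() - 1   # k = 8
L = 3                      # message length is 2^L = 8 bits


def derived_poly(i: int) -> int:
    """g_i(x) = x^(k+i) + [x^(k+i) mod g(x)]."""
    leading = 1 << (K + i)
    return leading | gf2_mod(leading, KEY)


# Polynomial indices used by the recurrence: 2^(L-i) for i = 1..L, then 0.
STAGE_INDICES = [2 ** (L - i) for i in range(1, L + 1)] + [0]  # [4, 2, 1, 0]


def reduce_in_stages(message: int) -> int:
    """Apply r_0 = m * x^k, then reduce by each derived polynomial in turn."""
    r = message << K
    for i in STAGE_INDICES:
        r = gf2_mod(r, derived_poly(i))
    return r


# g(x) divides every g_i(x), so each reduction preserves the residue mod g(x).
divisible = all(gf2_mod(derived_poly(i), KEY) == 0 for i in STAGE_INDICES)

# The staged result matches a direct computation of m(x) * x^k mod g(x).
matches = all(reduce_in_stages(m) == gf2_mod(m << K, KEY) for m in range(1 << 2 ** L))
print(divisible, matches)
```

Because every g_i(x) is a multiple of g(x), each intermediate value stays congruent to r_0(x) modulo g(x) while its width shrinks from 16 to 12, 10, 9, and 8 bits.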
[0020] These equations provide an approach to CRC computation that can be implemented in a wide variety of circuitry. For example, FIG. 3 illustrates a high-level architecture of a circuit implementing the approach described above. As shown, a given stage 106a-106d can reduce input data, r, by subtracting a multiple of the k least significant bits 104 of the pre-computed polynomial g_i(x) from the stage input. Again, the resulting stage output is congruent to the stage input with respect to a CRC calculation though of a smaller width.
[0021] The sample implementation shown features stages 106a-106d that AND 110a-110d (e.g., multiply) the k least significant bits 104 of g_i(x) by respective bits of input data. The i zeroes 102 and initial "1" of g_i(x) are not needed by the stage since they do not affect the results of stage computation. Thus, only the k least significant bits of g_i(x) need to be stored by the circuitry.
[0022] To illustrate operation, assuming r_0 had a value starting "1010..." and the k least significant bits of g_4(x) had a value of "001010010", the first 110a and third 110c AND gates would output "001010010" while the second 110b and fourth 110d AND gates would output zeros. As indicated by the shaded nodes in FIG. 3, the output of the AND gates 110a-110d can be aligned to shift (i.e., multiply) the gate 110a-110d output in accordance with the respective bit positions of the input data. That is, the output of the gate 110a operating on the most significant bit of input data is shifted by i-1 bits, and each succeeding gate 110b-110d decrements this shift by 1. For example, the output of gate 110a, corresponding to the most significant bit of r_0, is shifted by 3 bits with respect to the input data, the output of gate 110b corresponding to the next most significant bit of r_0 is shifted by 2 bits, etc. The input data can then be subtracted (e.g., XOR-ed) by the shifted alignment of the output of gates 110a-110d. The subtraction result reduces the input data by a number of bits equal to the number of zeroes 102 in the polynomial for i > 0. In essence, the i most significant bits of input data, r_0, act as selectors, either causing subtraction of the input data by some multiple of the k least significant bits of g_i(x) or outputting zeroes that do not alter the input data.
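The AND, shift, and XOR steps of a single stage can be modeled in software as follows. This is only a sketch under stated assumptions: the function stage, its parameters, and the hypothetical key 0x107 are not from the original, and the loop iterations stand in for gate arrays that the hardware would evaluate in parallel.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2); bit j is the coefficient of x^j."""
    divisor_degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= divisor_degree:
        dividend ^= divisor << (dividend.bit_length() - 1 - divisor_degree)
    return dividend


def stage(r: int, width: int, i: int, low_bits: int) -> int:
    """Reduce a `width`-bit input by i bits using the k low bits of g_i(x)."""
    # Start from the input data less its i most significant bits.
    out = r & ((1 << (width - i)) - 1)
    for n in range(i):
        # Selector bit, most significant first (gates 110a, 110b, ...).
        selector = (r >> (width - 1 - n)) & 1
        # AND: the k low bits pass through only when the selector is 1.
        selected = low_bits if selector else 0
        # Align by i-1 bits for the top bit, one fewer for each later bit,
        # then subtract, which over GF(2) is XOR.
        out ^= selected << (i - 1 - n)
    return out


KEY = 0x107                # hypothetical 9-bit key, illustration only
K = KEY.bit_length() - 1   # k = 8
low4 = gf2_mod(1 << (K + 4), KEY)   # k low bits of g_4(x)
low2 = gf2_mod(1 << (K + 2), KEY)   # k low bits of g_2(x)

r0 = 0b1010_0110_1100_0011          # sample 16-bit input starting "1010"
r1 = stage(r0, 16, 4, low4)         # 16 bits reduced to 12 bits
r2 = stage(r1, 12, 2, low2)         # 12 bits reduced to 10 bits
print(f"{r1:012b} {r2:010b}")
```

Each selector bit touches only its own shifted copy of the k low bits, so no iteration depends on the result of another; that independence is what allows the gates to operate simultaneously.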
[0023] As shown, the AND gates 110a-110d of a stage 106a may operate in parallel since they work on mutually exclusive portions of the input data. That is, AND gates 110a-110d can each simultaneously process a different bit of r_0. This parallel processing can significantly speed CRC calculation. Additionally, different stages may also process data in parallel. For example, gate 110e of stage 106b can perform its selection near the very outset of operation since the most significant bit of r_0 passes through unaltered to stage 106b.
[0024] FIG. 4A depicts digital logic gates of a sample stage 106a implementation conforming to the architecture shown in FIG. 3. In this example, the stage 106a receives a 16-bit input value (e.g., r_0 = input data [15:0]) and the k least significant bits of g_4(x). The stage 106a processes the i most significant bits of the input value with i sets of AND gates 110a-110d where each input data bit is ANDed 110a-110d with each of the k least significant bits of g_4(x). Each set of k AND gates 110a-110d in FIG. 4A corresponds to the conceptual depiction of a single AND gate in FIG. 3. The output of the AND gate arrays 110a-110d is aligned based on the input data bit position and fed into a tree of XOR gates 112a-112d that subtract the shifted AND gate 110a-110d output from the remaining bits of input data (i.e., the input data less the i most significant bits).
[0025] FIG. 4B depicts digital logic gates of a succeeding stage 106b that receives output_1 data [11:0] and generates output_2 data [9:0]. The stage 106b receives the 12-bit value output by stage 106a and uses g_2(x) to reduce the 12-bit value to a CRC congruent 10-bit value. Stages 106a, 106b share the same basic architecture of i arrays of AND gates that operate on the k least significant bits of g_i(x) and an XOR tree that subtracts the shifted AND gate output from the stage input to generate the stage output value. Other stages for different values of i can be similarly constructed.
[0026] The architectures shown in FIGs. 3, 4A, and 4B are merely examples and a wide variety of other implementations may be used. For example, in the sample FIGs., each stage 106a-106d processed the i most significant bits of input data in parallel. In other implementations, a number of bits greater or less than i could be used in parallel; however, this may not succeed in reducing the size of the output data for a given stage.
[0027] The architecture shown above may be used in deriving the pre-computed polynomials. For example, derivation can be performed by zeroing the storage elements associated with g_i(x) and loading g_0 with the k least significant bits of the polynomial key. The bits associated with successive g_i's can be determined by applying x^(k+i) as the data input to the circuit and storing the resulting k least significant bits output by the g_0 stage as the value associated with g_i. For example, to derive the polynomial for g_2, x^(k+2) can be applied as the data input to the circuit, and the resulting k-bit output of the g_0 stage can be loaded as the value of the g_2 polynomial.
[0028] FIG. 5 depicts a sample CRC implementation using techniques described above. The implementation works on successive portions of a larger message in 32-bit segments 120. As shown, the sample implementation shifts 124, 126 and XORs 128 a given portion 120 of a message by any pre-existing residue 122 and computes the CRC residue using stages 106a-106f and the k bits of the respective pre-computed polynomials g_i(x). Again, successive stages 106a-106e reduce input data by i bits until a residue value is output by stage 106f. The circuit then feeds the residue back 122 for use in processing the next message portion 124. The residue remaining after the final message portion 120 is applied is the CRC value determined for the message as a whole. This can either be appended to the message or compared with a received CRC value to determine whether data corruption likely occurred.
[0029] The system shown in FIG. 5 featured (L+1) stages 106a-106f where the polynomials were of the form i = {0, 2^(n-1) for n = 1 to L}. However, this strict geometric progression of i is not necessary and other values of i may be used in reducing a message. Additionally, at the lower polynomial levels (e.g., i < 4) it may be more efficient to abandon the stage architecture depicted in FIGs. 3, 4A, and 4B and process an input value using g_0 in a traditional bit-serial or other fashion.
[0030] Techniques described above can be used to improve CRC calculation speed, power efficiency, and circuit footprint. As such, techniques described above may be used in a variety of environments such as network processors, security processors, chipsets, ASICs (Application Specific Integrated Circuits), and as a functional unit within a processor or processor core where the ability to handle high clock speeds, while supporting arbitrary polynomials, is of particular value. As an example, CRC circuitry as described above may be integrated into a device having one or more media access controllers (e.g., Ethernet MACs) coupled to one or more processors/processor cores. Such circuitry may be integrated into the processor itself, in a network interface card (NIC), chipset, as a co-processor, and so forth. The CRC circuitry may operate on data included within a network packet (e.g., the packet header and/or payload). Additionally, while described in conjunction with a CRC calculation, this technique may be applied in a variety of calculations such as other residue calculations over GF(2) (e.g., Elliptic Curve Cryptography).
[0031] The term circuitry as used herein includes implementations of hardwired circuitry, digital circuitry, analog circuitry, programmable circuitry, and so forth. The programmable circuitry may operate on computer instructions disposed on a storage medium.
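The segment-wise operation described for FIG. 5 may be sketched in Python as follows. Several details here are assumptions rather than statements of the original: the residue is shifted left by 32 bits and the segment by k bits before the two are XOR-ed, the key is a hypothetical 0x107, the stage indices are 16, 8, 4, 2, 1, and 0, and each stage is modeled as a full GF(2) remainder instead of gate-level logic.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2); bit j is the coefficient of x^j."""
    divisor_degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= divisor_degree:
        dividend ^= divisor << (dividend.bit_length() - 1 - divisor_degree)
    return dividend


SEGMENT_BITS = 32
STAGE_INDICES = (16, 8, 4, 2, 1, 0)   # assumed six stages, 106a-106f


def crc_by_segments(segments: list[int], key: int) -> int:
    """Residue of (message * x^k) mod key, processing 32-bit segments in order."""
    k = key.bit_length() - 1
    # Pre-computed polynomials g_i(x) = x^(k+i) + [x^(k+i) mod g(x)].
    polys = [(1 << (k + i)) | gf2_mod(1 << (k + i), key) for i in STAGE_INDICES]
    residue = 0
    for segment in segments:
        # Combine the fed-back residue with the new segment (assumed alignment).
        value = (residue << SEGMENT_BITS) ^ (segment << k)
        for g_i in polys:
            value = gf2_mod(value, g_i)   # each stage narrows the value
        residue = value                   # feed back for the next segment
    return residue


KEY = 0x107                               # hypothetical key, illustration only
segments = [0x12345678, 0x9ABCDEF0]
whole_message = (segments[0] << SEGMENT_BITS) | segments[1]
staged = crc_by_segments(segments, KEY)
direct = gf2_mod(whole_message << 8, KEY)
print(staged == direct)
```

The alignment follows from the identity (M·x^32 + S)·x^k ≡ r·x^32 + S·x^k (mod g), where r is the residue of the message M processed so far and S is the next segment.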
[0032] Other embodiments are within the scope of the following claims.
[0033] What is claimed is:

Claims

CLAIMS:
1. A method for use in determining a residue of a message, m(x), the method comprising: loading at least a portion of each of a set of polynomials derived from a first polynomial, g(x); and determining the residue for the message corresponding to m(x) mod g(x) using a set of stages, individual ones of the stages applying at least a portion of a respective one of the set of polynomials to data output by a preceding one of the set of stages.
2. The method of claim 1, wherein the set of polynomials comprises polynomials having a prefix and a k- bit remainder where k is positive integer, wherein the prefix of a polynomial in the set of polynomials consists of a most significant bit equal to 1 followed by a set of zero or more consecutive zeroes, and wherein zeros in the set of one or more consecutive zeroes increases for successive polynomials in the set.
3. The method of claim 2, wherein the set of polynomials conform to: g_i(x) = x^{k+i} + [x^{k+i} mod g(x)] for multiple values of i, where i is an integer.
4. The method of claim 1, wherein respective ones of the set of stages receives bits of T_{i-1}(x) and outputs bits of T_i(x) such that T_i(x) ≡ T_{i-1}(x).
5. The method of claim 1, further comprising, at least one of: (1) appending the residue to the message for transmission across a network, and (2) comparing the residue to a previously computed residue.
6. The method of claim 2, wherein, in individual ones of the stages, at least a portion of a one of the set of polynomials associated with a respective stage undergoes polynomial multiplication by respective bits of input data received by the respective stage.
7. The method of claim 6, wherein the respective bits of input data consist of a number of bits equal to a number of zeroes in the set of one or more consecutive zeroes in the respective polynomial prefix.
8. The method of claim 6, where the polynomial multiplication by respective bits of input data occurs in parallel for the respective bits of input data.
9. An apparatus for use in determining a residue of a message, m, with respect to a first polynomial, g(x), over a finite field, GF(2), the apparatus comprising: a set of storage elements to store at least a portion of each of a set of polynomials derived from the first polynomial, g(x); and a set of stages coupled to respective ones of the set of storage elements, respective stages in the set of stages comprising digital logic gates to apply a value stored in a respective one of the set of storage elements to a respective input of the stage.
10. The apparatus of claim 9, wherein the set of polynomials comprises polynomials having a prefix and a k-bit remainder where k is a positive integer, wherein the prefix of a polynomial in the set consists of a most significant bit equal to 1 followed by a set of zero or more consecutive zeroes, and wherein a number of consecutive zeroes in the set of zero or more zeroes increases for successive polynomials in the set.
11. The apparatus of claim 10, wherein the set of polynomials comprise polynomials conforming to: g_i(x) = x^{k+i} + [x^{k+i} mod g(x)] for multiple values of i, where i is an integer.
12. The apparatus of claim 9, wherein respective ones of the set of stages receives bits of T_{i-1}(x) and outputs bits of T_i(x), such that T_i(x) ≡ T_{i-1}(x).
13. The apparatus of claim 10, wherein in individual ones of the set of stages, the respective polynomial k-bit remainder associated with a respective stage is fed into AND gates with input data bits of the respective stage.
14. The apparatus of claim 13, wherein the respective input data bits fed into the AND gates consist of a number of bits equal to a number of consecutive zeroes in the respective polynomial prefix set of zero or more consecutive zeroes.
15. The apparatus of claim 13, wherein the digital logic gates comprise a tree of exclusive-or (XOR) gates coupled to the output of the AND gates and the least significant bits of the stage input data.
16. The apparatus of claim 9, further comprising circuitry to load new values of the set of polynomials into the storage elements.
17. A device, comprising: at least one media access controller (MAC) to receive a message from a network; at least one processor communicatively coupled to the at least one media access controller; the device including circuitry to determine a residue of the message with respect to a first polynomial, g(x), over a finite field, GF(2), the circuitry comprising: a set of storage elements to store a set of polynomials derived from the first polynomial, g(x); and a set of stages coupled to respective ones of the set of storage elements, respective stages in the set of stages comprising digital logic gates to apply a value stored in a respective one of the set of storage elements to a respective input of the respective stage.
18. The device of claim 17, wherein the set of polynomials comprises polynomials having a prefix and a k-bit remainder where k is a positive integer, wherein the prefix of a polynomial in the set consists of a most significant bit equal to 1 followed by a set of zero or more consecutive zeroes, and wherein a number of consecutive zeroes in the set of zero or more consecutive zeroes increases for successive polynomials in the set.
19. The device of claim 18, wherein the set of polynomials comprise polynomials conforming to: g_i(x) = x^{k+i} + [x^{k+i} mod g(x)] for multiple values of i, where i is an integer.
20. The device of claim 17, wherein in individual ones of the set of stages, the respective polynomial k-bit remainder associated with a respective stage is fed into AND gates with input data bits of the respective stage; and wherein the respective input data bits fed into the AND gates consist of a number of bits equal to the number of successive zeroes in the respective polynomial prefix set of zero or more consecutive zeroes.
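The polynomial set and stage logic recited in the claims above can be sketched in software. The Python model below is a minimal illustration under stated assumptions: integers stand in for bit vectors over GF(2), the default stage order follows the i = 16, 8, 4, 2, 1, 0 progression described for FIG. 5, and the helper names (poly_mod, derive_polys, stage, reduce_staged) are hypothetical rather than taken from the patent. The loop over j stands in for the AND gates and XOR tree; hardware would evaluate those terms in parallel.

```python
def poly_mod(a: int, g: int) -> int:
    """Bit-serial remainder of a(x) mod g(x) over GF(2), used as a reference."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


def derive_polys(g: int, steps):
    """Build g_i(x) = x^(k+i) + [x^(k+i) mod g(x)] for each i in steps.

    Each g_i is a leading 1, then i zeroes, then a k-bit remainder.
    """
    k = g.bit_length() - 1
    return {i: (1 << (k + i)) | poly_mod(1 << (k + i), g) for i in steps}


def stage(value: int, g_i: int, i: int, k: int) -> int:
    """Reduce a stage input to k + i bits using g_i(x).

    For i >= 1 the input is k + 2i bits wide and its top i bits are
    cleared. The g_0 stage takes k + 1 bits and clears the single bit k.
    """
    remainder = g_i & ((1 << k) - 1)        # k-bit remainder of g_i
    low = value & ((1 << (k + i)) - 1)      # bits passed to the XOR tree
    top = value >> (k + i)                  # bits that select (AND) the remainder
    for j in range(max(i, 1)):
        if (top >> j) & 1:                  # AND gate: input bit j enables the term
            low ^= remainder << j           # XOR tree: fold in the shifted remainder
    return low


def reduce_staged(value: int, g: int, steps=(16, 8, 4, 2, 1, 0)) -> int:
    """Run the stages in order; value must fit in k + 2 * steps[0] bits."""
    k = g.bit_length() - 1
    polys = derive_polys(g, steps)
    for i in steps:
        value = stage(value, polys[i], i, k)
    return value
```

Because every g_i(x) in this sketch is a multiple of g(x), each stage output stays congruent to its input modulo g(x), so the final k-bit value matches poly_mod on the same input. Note that derive_polys yields g_0(x) equal to g(x) itself.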
PCT/US2007/081312 2006-10-12 2007-10-12 Determining message residue using a set of polynomials WO2008046078A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP07854019A EP2089973A4 (en) 2006-10-12 2007-10-12 Determining message residue using a set of polynomials
JP2009532617A JP5164277B2 (en) 2006-10-12 2007-10-12 Message remainder determination using a set of polynomials.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/581,055 US7827471B2 (en) 2006-10-12 2006-10-12 Determining message residue using a set of polynomials
US11/581,055 2006-10-12

Publications (2)

Publication Number Publication Date
WO2008046078A2 true WO2008046078A2 (en) 2008-04-17
WO2008046078A3 WO2008046078A3 (en) 2008-05-22

Family

ID=39283673

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/081312 WO2008046078A2 (en) 2006-10-12 2007-10-12 Determining message residue using a set of polynomials

Country Status (6)

Country Link
US (1) US7827471B2 (en)
EP (1) EP2089973A4 (en)
JP (1) JP5164277B2 (en)
CN (1) CN101162964B (en)
TW (1) TW200832935A (en)
WO (1) WO2008046078A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7827471B2 (en) 2006-10-12 2010-11-02 Intel Corporation Determining message residue using a set of polynomials

Also Published As

Publication number Publication date
JP2010507290A (en) 2010-03-04
US7827471B2 (en) 2010-11-02
US20080092020A1 (en) 2008-04-17
EP2089973A2 (en) 2009-08-19
CN101162964B (en) 2011-01-26
JP5164277B2 (en) 2013-03-21
CN101162964A (en) 2008-04-16
WO2008046078A3 (en) 2008-05-22
TW200832935A (en) 2008-08-01
EP2089973A4 (en) 2010-12-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07854019

Country of ref document: EP

Kind code of ref document: A2

ENP Entry into the national phase

Ref document number: 2009532617

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2007854019

Country of ref document: EP