WO2002103518A1 - Efficient high performance data operation element for use in a reconfigurable logic environment - Google Patents

Efficient high performance data operation element for use in a reconfigurable logic environment

Info

Publication number
WO2002103518A1
WO2002103518A1 PCT/US2002/011870 US0211870W WO02103518A1 WO 2002103518 A1 WO2002103518 A1 WO 2002103518A1 US 0211870 W US0211870 W US 0211870W WO 02103518 A1 WO02103518 A1 WO 02103518A1
Authority
WO
WIPO (PCT)
Prior art keywords
reconfigurable
unit
chip
shifter
instruction
Prior art date
Application number
PCT/US2002/011870
Other languages
French (fr)
Inventor
Joshua Lindner
Gary Lai
Peter Lam
Mark Edward Rollins
Vladimir Dinkevich
Craig Bradley Greenberg
Christopher E. Phillips
Hsin Wang
Bradley L. Taylor
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to DE10296742T (patent DE10296742T5)
Priority to JP2003505770A (patent JP2004531149A)
Priority to GB0327399A (patent GB2398653A)
Priority to KR1020037014350A (patent KR100628448B1)
Publication of WO2002103518A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885: Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3893: Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
    • G06F9/3895: Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
    • G06F9/3897: Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181: Instruction operation extension or modification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885: Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units

Definitions

  • the present invention relates to reconfigurable logic chips, especially reconfigurable logic chips used for reconfigurable computing.
  • Field programmable gate arrays are programmable chips that can implement different configurations. Typically a design is created using design tools and a FPGA is configured for a specific design. Although designs can be changed, typically the FPGA uses a single configuration due to the relatively long time required to change a configuration compared to the operation time of the chip.
  • the present invention concerns a reconfigurable chip, including multiple reconfigurable functional units (such as a data path unit) adapted to implement different functions.
  • the reconfigurable function units preferably include multiplexers, at least one shifting unit and at least one arithmetic logic unit (ALU).
  • the reconfigurable functional units are configured by reconfigurable functional unit instructions. The instructions control the configuration of the multiplexer and shifting unit and the ALU.
  • the reconfigurable chip also includes interconnect adapted to connect together the reconfigurable functional units. In this way data can be passed between the reconfigurable functional units.
  • the reconfigurable functional unit instruction preferably includes a number of fields for the multiplexers, the shifter unit and the arithmetic logic unit. These fields configure these elements in the reconfigurable functional unit in a desired way.
  • Each reconfigurable functional unit has an associated instruction memory.
  • the instruction memory stores multiple instructions for the reconfigurable functional unit.
  • the state machine addresses the instruction memory to determine the next instruction to be loaded into the reconfigurable functional unit.
  • the reconfigurable functional units provide feedback to the state machine indicating when a function is finished and thus when the next function can be loaded into the reconfigurable functional unit.
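The control loop described above (an instruction memory per functional unit, a state machine that addresses it, and a "finished" flag fed back from the unit) can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the class names, the field names in `Instruction`, and the wrap-around at the end of the memory are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """One functional-unit instruction; the field names are hypothetical."""
    mux_select: int
    shift_mode: str
    alu_op: str


class FunctionalUnitController:
    """Toy state machine that addresses a per-unit instruction memory."""

    def __init__(self, instruction_memory):
        self.memory = list(instruction_memory)
        self.address = 0

    def current(self):
        """Instruction currently loaded into the functional unit."""
        return self.memory[self.address]

    def step(self, done_flag):
        """Advance to the next instruction only when the unit reports done.

        Wrapping back to address 0 is an assumption made for this sketch.
        """
        if done_flag:
            self.address = (self.address + 1) % len(self.memory)
        return self.current()
```

For example, a controller built from two instructions keeps returning the first one while `done_flag` is false and switches to the second on the first call with `done_flag=True`.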
  • the shifter unit is configurable with a number of different modes. These modes are preferably selectable by a field of the reconfigurable functional unit instruction.
  • interconnect elements are adapted to selectively connect some of the reconfigurable functional units to transfer word length data.
  • the transferred data preferably has a fixed data length of 32 bits or greater.
  • The fixed length data transfer allows the interconnect system to be simplified, at the cost of some flexibility in the data transfer.
  • the shifting unit in the reconfigurable functional unit allows the arithmetic logic unit to operate on different bits in the word length input data of the reconfigurable functional unit compensating for the fixed structure of the interconnect elements. Thus, if needed data is in a certain location within a word, the shifter can move that bit location to the proper position for manipulation by the arithmetic logic unit.
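A minimal Python sketch of why a shifter in front of the ALU compensates for fixed word-length interconnect: a field packed in the interior of a 32-bit word is moved down to bit 0 before it is masked and used. The helper names, the field position, and the field width are hypothetical choices for illustration.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def shift_right_logical(word, amount):
    """Logical right shift of a 32-bit word (zeros shifted in at the top)."""
    return (word & WORD_MASK) >> amount


def extract_field(word, lsb, width):
    """Align the field that starts at bit `lsb` to bit 0, then mask it."""
    aligned = shift_right_logical(word, lsb)
    return aligned & ((1 << width) - 1)


# A 12-bit value packed at bits 12..23 of a 32-bit word.
packed = 0x00ABC000
field = extract_field(packed, lsb=12, width=12)  # 0xABC
```

The interconnect still moves the whole 32-bit word; only the unit that consumes the field needs to know where it sits inside the word.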
  • Another embodiment of the present invention comprises using a multiplexer with a delay unit input and an input that bypasses the delay unit. In this manner the reconfigurable functional unit can implement a variable delay increasing the flexibility of the system.
  • Fig. 1 is an overview of the reconfigurable chip of one embodiment of the present invention
  • Fig. 2 is a simplified diagram of a reconfigurable functional unit of one embodiment of the present invention
  • FIG. 3 is a diagram of a reconfigurable functional unit of one embodiment of the present invention.
  • Fig. 4 is a diagram of a multiplier unit which can be used with the embodiment of the present invention.
  • Fig. 5 is a diagram of one slice of the reconfigurable functional unit shown in Fig. 1 illustrating the interconnection between the data path units;
  • Fig. 6 is a diagram illustrating the connections between the data path unit and the horizontal and vertical bus lines;
  • Fig. 7 is a diagram illustrating the interconnection of a data path unit in one tile to a data path unit in another tile;
  • Fig. 8 is a diagram illustrating the interconnection of the data path units and a local system memory of one embodiment of the present invention;
  • Fig. 9 is a diagram illustrating a state machine and functional block configuration memory producing the instruction or configuration information for the functional block data unit
  • Fig. 10A is a diagram illustrating the interconnection of a state machine, configuration state memory and data path unit of the present invention, showing the instruction and instruction fields for the data path unit;
  • Fig. 10B is a diagram illustrating a data path unit using a decoder for at least part of the instruction
  • Fig. 11 is a diagram illustrating the control system, configuration memory and data path unit of one embodiment of the present invention.
  • Fig. 12 is a diagram of an interconnection logic unit for use in one embodiment of the present invention.
  • Figs. 13A and 13B are charts illustrating the portions of the instructions for the ALU;
  • Fig. 14 is a diagram illustrating the flags for the system of one embodiment of the present invention.
  • Fig. 15 is a diagram illustrating the shifting mode for shifter;
  • Fig. 16 is a diagram of the instruction of one embodiment of the shifter;
  • Fig. 17 is a diagram illustrating the operation of the shifter of Fig. 16;
  • Fig. 18 is a diagram of a logic system using multiple master latches of one embodiment of the present invention.
  • Fig. 19 is a diagram illustrating the background and foreground plane latches of one embodiment of the present invention.
  • Fig. 20 is a diagram of one embodiment of a reconfigurable functional unit for a data path in one embodiment of the present invention.
  • Fig. 21 is a diagram of the input multiplexers for the system of Fig. 20;
  • Fig. 22 is a diagram of the shifting mode for the shifter of one embodiment of the present invention.
  • Fig. 23 is a diagram illustrating some shifting modes for the shifter of one embodiment of the present invention.
  • Fig. 24 is a diagram illustrating the implementation of a turbo look up table of one embodiment of the present invention.
  • Fig. 1 illustrates a reconfigurable chip 20.
  • the reconfigurable chip 20 includes a central processing unit (CPU) 22, preferably a reduced instruction set (RISC) CPU. Data from the external memory (not shown) is transferred using memory controller 24. Bus 26, called the roadrunner bus, is used to transfer data from the memory controller to the reconfigurable fabric 28.
  • the reconfigurable fabric 28 is divided into a number of slices. Each slice is broken down into a number of tiles.
  • Each tile includes a data path unit (reconfigurable functional unit), control units and local system memory units. The local system memory units interact with the data path units as described below. In a preferred embodiment, each tile also has a number of multiplier units.
  • The reconfigurable functional unit includes input multiplexers 30 and 32.
  • The input multiplexers allow the data path unit to receive inputs from a number of different locations, including nearby data path units as well as data buses.
  • The selected outputs of the input multiplexers are sent on to registers 36 and 38.
  • The output of the multiplexer 32 goes to shifter unit 34.
  • the shifter unit 34 allows for the selection of different bits to be operated on by the ALU 40. Since the interconnections between the data path units use fixed word length connections to simplify the interconnection system, the use of a shifter unit in the data path unit allows access to bits packed within the interior of a word.
  • the shifter unit 34 preferably has a number of modes which implement more than just logical and arithmetic shifts left and right. These different modes allow the system to operate in a more efficient manner.
  • the arithmetic logic unit 40 described below preferably uses a field of the instruction for the data path unit to implement a function.
  • the output of the ALU 40 preferably goes to an output register 42.
  • The output can also be sent to an optional bit shifter 44 to produce a shifted value.
  • a bypassing ALU feedback output on line 46 is also used. This allows portions of the data path unit to operate while the output register 42 controls what outputs are sent from the data path unit.
  • the bit shifter 44 is used to implement the linear feedback shift register as described in the patent application, "Modifications to Reconfigurable Functional Unit in a Reconfigurable Chip to Perform Linear Feedback Shift Register Function, " by Peter Lam, Attorney Docket No. 032001-060.
  • the multiplexers, shifter unit 34 and ALU 40 are preferably controlled by an instruction for the data path unit. This instruction is preferably divided into a number of different fields, including multiplexer instruction fields for the multiplexers, shifter unit fields for the shifter 34 and ALU instruction field for the ALU 40.
  • a decoder is used for at least part of the instruction.
  • Fig. 3 is a detailed diagram of one embodiment of the present invention.
  • The input multiplexers 50 and 52 receive as inputs data from nearby units. In one example, data words from 16 units, including data path units and multiplier units, are used as inputs. Global vertical and horizontal interconnections are also used. One embodiment further includes a connection for the linear feedback shift register feedback, a logical zero constant input and an input for a local system memory unit. Another input is the carry input from the prior data path unit, which is provided to the ALU 54 directly.
  • The multiplexer 50 is connected to the shifter 56, which has a number of different modes of operation.
  • the shifter 56 is connected to another multiplexer 58 so that the output of the multiplexer 50 can either avoid or use shifter unit 56.
  • the shifter unit 56 can also use the A input from the input multiplexer 52 for some of the modes.
  • the output of the multiplexer 58 and the output of multiplexer 52 can be sent to registers 60 and 62, respectively.
  • the registers 60 and 62 can also be loaded from off the chip.
  • This logic 64 and 66 allows for the register values to act as a mask register for the system.
  • The multiplexers 68 and 70 select the inputs to the ALU 54. Outputs of the ALU are sent to a number of different possible paths. Note that the data path output out of multiplexer 72 can be the value from the output register 74, or the value from the multiplexer 76 (which can be the ALU value or the local system memory read data on line 78).
  • The flag values from the ALU are sent to multiplexers 80 and 82, which select the desired flag value.
  • This flag value can be stored in registers 88 and 90. Either the value of register 88 or 90 is sent on through the multiplexers 92 and 94, or the selected value from multiplexer 80 or 82 is used directly.
  • the CONF value is a field in the instruction that indicates which flag to select.
  • the registers 60, 62 and 74 can be implemented by using multiple master slave latches, shown in Fig. 18, to allow the loading of background configuration data into the register. In one embodiment, the operation of these registers can be controlled by the field of the reconfigurable functional unit instruction.
  • Fig. 4 is a diagram of a multiplier unit.
  • The multiplier unit is organized somewhat similarly to the reconfigurable functional unit shown in Fig. 3. However, the multiplier unit has a dedicated multiplier rather than an ALU.
  • As shown in Fig. 5, in one embodiment there are two multiplier units for every seven data path units (reconfigurable functional units) in a tile.
  • Fig. 6 illustrates the connections of adjacent data path units and multipliers into the data path unit inputs. Note, looking at Fig. 5, that the data path unit 100 can receive as inputs the outputs from the eight previous data path units (and multipliers) above and the seven next data path units (and multipliers) below. The output of the data path unit 100 is also fed back to itself.
  • Fig. 6 is a diagram that illustrates the connection of the one tile reconfigurable functional units (data path units) to horizontal and vertical connection lines. By using multiplexers, the outputs, and inputs, of the data path units can be interconnected to both the vertical routing lines and the horizontal routing lines.
  • Fig. 7 illustrates an example of interconnecting a data path unit in one tile to a data path unit in another tile using vertical interconnection lines. Note that the system of the present invention preferably uses word-based interconnections. In one embodiment, the interconnection lines allow the connection of 32-bit-wide data.
  • Fig. 8 illustrates the connection between the data path units and the local system memory.
  • alternate data path units are used to implement the Writes and Reads of the local system memory.
  • data path unit 102 provides Read Addresses to and receives read data from the local system memory 104.
  • Data path unit 106 provides the Write Address and Write data for the local system memory 104.
  • a data path unit can both read from and write to a local system memory.
  • One of the uses of a data path unit is to provide an address to a local system memory to obtain data from a local system memory, which can then be put upon the horizontal and vertical interconnection buses.
  • the connections shown in Fig. 8 are the direct connections to read and write data in and out of the local system memory.
  • the local system memory is globally read from and written to using the memory control system.
  • This general memory control system is used for configuration of the system and for obtaining the data operated on by the data path units.
  • the data path units include structures that allow addresses and data to be provided to the local system memory while the data path unit does some other function.
  • Fig. 9 is a disclosure of a control fabric unit 132 for the reconfigurable functional unit 130.
  • the control fabric unit 132 produces a control or instruction line for the reconfigurable functional unit 130.
  • the control fabric unit 132 is preferably composed of a state machine unit 134 and a functional block configuration memory unit 136.
  • the state machine 134 produces the addresses into the instruction memory 136.
  • One implementation of the state machine 134 uses a reconfigurable programmable sum-of-products unit 136.
  • Fig. 10A illustrates a system with the state machine configuration unit 136, the configuration state memory 138' and the data path unit 130'.
  • The configuration from the configuration state memory 138' can be considered to be an instruction for the data path unit 130'.
  • the instruction preferably includes fields such as an ALU configuration field, shift register configuration field, and a multiplexer configuration field.
  • some of the flags from the data path unit 130' are sent to the state machine 136' in order to switch configurations for the data path unit after the data path unit is done operating on a set of data.
  • The configuration state memory 138' can also be loaded with an external configuration from external memory or from the processor.
  • Fig. 10B is a diagram illustrating a data path unit using a decoder to decode at least part of an instruction.
  • Fig. 11 shows the control system, including the state machines for the different configuration state memories. The data path unit flags are sent to the control system as described above.
  • Fig. 12 is a diagram that illustrates one example of an arithmetic logic unit.
  • This arithmetic logic unit includes an arithmetic unit 142, a parallel logic unit 140 and a flag unit 144. Also shown is a carry selection unit 146.
  • the ALU instruction field from the instruction is sent to select the operations of the ALU.
  • the arithmetic unit 142 uses a carry input. In a preferred embodiment, this carry value is either the carry from the previous data path unit or the control signal or a carry which is part of the instruction.
  • Figs. 13A and 13B illustrate a list of some of the Opcodes used in one embodiment of an ALU of the reconfigurable functional unit of the present invention. Details of these Opcodes are described in Appendix I, incorporated herein by reference.
  • Fig. 14 is a diagram of the flag system for the present invention.
  • the flag unit is inside the data path unit and used for producing the flags which go to the control unit as well as to the next data path unit.
  • Fig. 15 illustrates the shift mode and the operation of some of the modes of the shifter unit of one embodiment of the present invention. Since the shifter unit has a number of different modes, the flexibility of the system of the present invention is increased.
  • Figs. 16 and 17 illustrate one implementation of the shifter unit using multiple rows of multiplexers. Additional logic is also used to produce a special output. Fig. 17 illustrates the operation of some of the implementations of the shifter.
  • This shifter used in the Datapath unit performs more than right/left shift operations.
  • the shifter includes an array of multiplexers which are controlled via mux select signals.
  • a 32-bit operand which is divided into four groups of 8 signals is coupled to a first row of four multiplexers.
  • the outputs of the multiplexers in a previous row are coupled to the inputs of the next row of multiplexers.
  • Each multiplexer in the array is controlled independently.
  • the control signals determine how the signals are routed in the array and hence the type of operation performed on the operand.
  • Examples of operations include: 32-bit logical right/left shift, 32-bit arithmetic right/left shift, lower 16-bit sign extend to 32-bit, constant generation, duplicate lower 16-bit to upper 16-bit, duplicate upper 16-bit to lower 16-bit, swap lower and upper 16-bit, 16-bit arithmetic right shift, and byte swap.
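Several of the shifter operations listed above can be modelled on 32-bit integers in Python. This is a behavioural sketch only, not the multiplexer-array implementation of Figs. 16 and 17; the function names are hypothetical, and "byte swap" is assumed here to mean a full reversal of the four bytes, which the source does not specify.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16_to_32(x):
    """Lower 16-bit sign extend to 32-bit."""
    low = x & 0xFFFF
    return (low | 0xFFFF0000) if (low & 0x8000) else low


def swap_halves(x):
    """Swap lower and upper 16-bit halves."""
    x &= MASK32
    return ((x << 16) | (x >> 16)) & MASK32


def duplicate_lower_to_upper(x):
    """Copy the lower 16 bits into the upper 16 bits."""
    low = x & 0xFFFF
    return (low << 16) | low


def duplicate_upper_to_lower(x):
    """Copy the upper 16 bits into the lower 16 bits."""
    high = (x >> 16) & 0xFFFF
    return (high << 16) | high


def byte_swap(x):
    """Reverse the four bytes of the word (assumed meaning of 'byte swap')."""
    return int.from_bytes((x & MASK32).to_bytes(4, "big"), "little")


def arithmetic_shift_right_32(x, amount):
    """32-bit arithmetic right shift: the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret as a negative two's-complement value
    return (x >> amount) & MASK32
```

Each helper is a pure function of the input word, which mirrors the idea that the mode field of the instruction only changes how bits are routed.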
  • Fig. 18 illustrates a multiple master latch system used in one embodiment of the system of the present invention.
  • two master latches are used.
  • One of the master latches is used for the background configuration of the system.
  • the other master latch receives data from the pipeline in the data path unit or from the processor.
  • the inputs to the latch 150 are provided through the multiplexer 152.
  • the latch 154 is connected to the configuration bus to receive data from background configuration.
  • the multiplexer 156 can be used to select the input to the slave latch 158.
  • the use of a background configuration memory to the system allows the quick operation of the system in the present invention.
  • the storage element of Fig. 18 has multiple master latches which share a single slave latch via a multiplexer which provides a multi-function storage element.
  • a significant space savings is realized (approximately 25%). This is particularly true in a system utilizing numerous storage elements.
  • the storage element design relies on the fact that configuration bits are infrequently loaded into storage elements. So instead of having a separate slave latch for each master latch coupled to a configuration bitstream signal, according to the invention the master latch coupled to the configuration bitstream signal shares its slave latch with another master latch. Hence, two or more master latches share a single slave latch.
  • a multiplexer is coupled between the master latches and the single slave latch for selecting which master latch is coupled to the slave latch.
  • one master latch's input is coupled to a signal that frequently requires the storage element functionality and the other master latch's input is coupled to a signal that requires the storage element functionality on an infrequent basis.
  • the first master latch is coupled to the data path signal and the second master latch is coupled to the configuration bit signal.
  • the storage element functions to divide the data path pipeline into stages.
  • the configuration bitstream signal is passed to the slave latch, the storage element functions to store the configuration bits.
  • one master latch is coupled to the data path signal and more than one master latch is coupled to a configuration bit signal and all of the master latch outputs are coupled to the multiplexer which is used to select and pass one of the signals from the master latches to the shared slave latch.
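The shared-slave storage element described above can be pictured with a small behavioural model in Python: several master latches capture values independently, and a multiplexer select decides which master is passed to the single slave latch. The class and method names are hypothetical, and clocking and transparency are deliberately not modelled.

```python
class SharedSlaveStorage:
    """Behavioural model of N master latches sharing one slave latch."""

    def __init__(self, master_count=2):
        self.masters = [0] * master_count
        self.slave = 0

    def load_master(self, index, value):
        """Capture a value in one master latch.

        In this sketch, index 0 stands for the data path signal and the
        remaining indices stand for configuration bit signals.
        """
        self.masters[index] = value

    def transfer(self, select):
        """Multiplexer passes the selected master's value to the slave."""
        self.slave = self.masters[select]
        return self.slave


storage = SharedSlaveStorage()
storage.load_master(0, 0x5)   # data path value
storage.load_master(1, 0x9)   # background configuration value
```

Selecting master 0 makes the element behave as a pipeline register; selecting master 1 makes the same slave latch hold the configuration bits.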
  • Master latches are reset upon 'RESET' or 'INIT'.
  • Slave latches are reset upon 'RESET' only.
  • Mux B selects the arc bus when arc is writing (further qualified by decoding the corresponding arc address; refer to the ARC ext spec for the address map).
  • Slave latches are transparent only during clock high.
  • Master latch 0 is transparent when latpipe 0 is enabled or an arc write to that register is happening.
  • Master latch 1 is transparent when config loading is active and the corresponding config address is decoded.
  • Slave latch is transparent when 1. config activate to this slice or
  • Variable delay units comprise a multiplexer which receives a first input that passes through a register and a second input which bypasses the register. In this way a variable delay can be implemented.
  • The register 60 connected to the multiplexer 68, the register 62 connected to the multiplexer 70, the register 88 connected to the multiplexer 92, the register 90 connected to the multiplexer 94 and the register 74 connected to the multiplexer 72 can each implement such a variable delay.
  • a multiplexer can select a delayed or a bypass signal; the delay signal going through a delay element like a flip-flop.
  • the flexible adaptive delay element includes a storage device (e.g., flip-flop, latch) having its input coupled to an input signal and its output coupled to a first input of a multiplexer.
  • the other input of the multiplexer is coupled to the input signal.
  • The first input of the multiplexer is coupled to the input signal and a second input of the multiplexer is coupled to the input signal delayed by the amount provided by the storage device.
  • the select signal can then be used to select either the delayed signal or the undelayed signal.
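The delay element described above (a storage device on one multiplexer input and a bypass on the other) can be sketched as a cycle-level Python model. The class name, the `bypass` argument, and the reset value of 0 are assumptions made for illustration.

```python
class DelayElement:
    """One-cycle register with a bypass path selected by a multiplexer."""

    def __init__(self, initial=0):
        self._stored = initial  # contents of the storage device

    def clock(self, value, bypass):
        """Advance one clock cycle and return the multiplexer output.

        bypass=True  -> undelayed input is selected
        bypass=False -> value captured on the previous cycle is selected
        """
        delayed = self._stored
        self._stored = value  # the register always captures the input
        return value if bypass else delayed


delayed_path = DelayElement()
delayed_out = [delayed_path.clock(v, bypass=False) for v in (1, 2, 3)]

bypass_path = DelayElement()
bypass_out = [bypass_path.clock(v, bypass=True) for v in (1, 2, 3)]
```

With the register selected the sequence comes out one cycle late; with the bypass selected it passes through unchanged.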
  • Fig. 19 shows an alternate embodiment of a background foreground plane arrangement
  • The present invention incorporates by reference the prior patent applications, including "A HIGH PERFORMANCE DATA PATH UNIT FOR BEHAVIORAL DATA TRANSMISSION AND RECEPTION," Inventor Hsinshih Wang, Serial No. 09/307,072 filed May 7, 1999 (Attorney Docket No. 032001-014), "CONTROL FABRIC FOR ENABLING DATA PATH FLOW," Inventors Shaila Hanrahan, et al., Serial No. 09/401,194 filed September 23, 1999 (Attorney Docket No.
  • Fig. 20 illustrates an alternate embodiment of the reconfigurable functional unit or data path unit. In this embodiment an additional register and multiplexer are added to the B input path before the shifter. Additionally, the input multiplexer is slightly modified. The input multiplexer is shown in Fig. 21.
  • Fig. 22 illustrates the shifter mode table for the new embodiment of Fig. 19.
  • Fig. 23 illustrates the implementation of the new modes of Fig. 22.
  • Fig. 24 illustrates a turbo look up table for use with the system of the present invention.
  • the turbo look up table is useful in the addition of data stored in a logarithmic format. This is useful for many communication systems.
  • In one prior embodiment, in order to add data stored in logarithmic format, the data must first be converted to the normal format by doing an exponential expansion of the data. The exponentially expanded data is added together and then the combined information is converted back to the logarithmic format.
  • The turbo look up table is used to produce an estimate of the addition by means of a correction factor.
  • This estimate uses the greater of A and B as a first estimate of the value of A plus B.
  • The absolute value of the difference A minus B is used as an input to a look up table to provide a correction factor to add to the greater of A and B. By adding this correction factor to the greater of A and B, a relatively accurate estimate is produced.
  • The look up table need not have the same number of input bits as A. In a preferred embodiment, only a few bits of precision are used. If the magnitude of A minus B is relatively large, the combined value does not significantly differ from the greater of A and B. For example, the addition of 1,000,000 to 0.1 is approximately 1,000,000. The addition of 1,000,000 to 1,000,000 is effectively doubling the maximum value.
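The estimate described above (the greater operand plus a correction looked up from the absolute difference) can be sketched in Python. The source does not give the logarithm base, the table size, or the table contents; this sketch assumes natural logarithms and uses the standard identity ln(e^A + e^B) = max(A, B) + ln(1 + e^(-|A-B|)) to fill a small, coarse table. `STEP`, `TABLE_SIZE`, and the function names are hypothetical.

```python
import math

STEP = 0.5       # assumed quantisation of |A - B|
TABLE_SIZE = 8   # assumed number of entries (only a few bits of precision)

# Correction factor for each quantised difference.
CORRECTION_TABLE = [math.log1p(math.exp(-i * STEP)) for i in range(TABLE_SIZE)]


def log_add_estimate(a, b):
    """Estimate ln(e^a + e^b) as max(a, b) plus a table-driven correction."""
    index = int(abs(a - b) / STEP)
    # Beyond the table the correction is negligible and treated as zero.
    correction = CORRECTION_TABLE[index] if index < TABLE_SIZE else 0.0
    return max(a, b) + correction


def log_add_exact(a, b):
    """Reference: expand, add, and convert back to the log domain."""
    return math.log(math.exp(a) + math.exp(b))
```

When the two operands are equal the correction is ln 2, which corresponds to doubling the value; when they are far apart the result is simply the greater operand, matching the two cases in the text.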
  • Cin is the CO of the lower 16bits.
  • Pseudocode: result A+B+Ctrl. Description: A plus B with control carry in. Flags Affected: CO, OV, EQ, SN (similar to ADD).
  • Pseudocode: result A-B. Description: A minus B. Flags Affected: CO, carry out of the operation (A+-B+1).
  • Pseudocode: result * A. Description: Pass A. Flags Affected: EQ, SN (same as AND).
  • Name: PASSB
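The expression (A+-B+1) in the subtract entry is read here as two's-complement subtraction, that is, A plus the bitwise complement of B plus 1. That reading is an assumption, as are the 32-bit width and the function name in this Python sketch of the result and the CO flag.

```python
MASK32 = 0xFFFFFFFF


def subtract_with_carry_out(a, b):
    """Return (result, CO) for A - B computed as A + ~B + 1 on 32 bits.

    Under this convention CO is 1 when no borrow occurs (A >= B unsigned).
    """
    total = (a & MASK32) + (~b & MASK32) + 1
    return total & MASK32, total >> 32
```

For instance, 5 minus 3 gives a result of 2 with CO set, while 3 minus 5 gives 0xFFFFFFFE with CO clear.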
  • The output mux for the Multiplier will be changing for the CS2212 in order to allow the A or B operand to be latched at the O register. This effectively bypasses the multiplication operation. This will require the addition of one bit to the 'muxmultlsmsel' field in the MULT CSM.
  • the signal 'muxmultlsmsel' will select the input to the O register in the following fashion:
  • This functionality allows the user to employ the multiplier as a dynamically configurable routing resource when it is not being used for its primary function.
  • The CS2112 has registers that can be used as either mask registers or pipeline registers. In order to allow the user to use pipeline registers in the A and B operand paths and also use the mask registers, the CS2212 will include additional registers. These registers are inserted after the A and B inputs and will be referred to as 'apipe' and 'bpipe'. These registers can be bypassed by the signals 'muxapipe' and 'muxbpipe' respectively. Refer to the CS2212 DPU Block Diagram for the placement of these registers and muxes. The muxes are selected in the following fashion:
  • the CS2212 will be able to write to the LSM with the shifter output, as well as with the ALU output.
  • A mux will be added to the LSM write data path. This mux will be referred to as 'muxlsmwd', and will be selected by 'muxlsmwdsel' in the following fashion:
  • the fabric is reconfigurable and it is controlled by config bits.
  • the config bits are loaded into the fabric by issuing arc instructions (through load store) and then the config controller will transfer the config bits into the fabric's configuration plane.
  • The following table tells software which address each config signal corresponds to in the address space. The base address of the configuration has not yet been determined, so the addresses below start from 0.
  • the top bit (bit[127]) is the parity bit.
  • the hardware will check for even parity in each line of 128 bit. For behaviour of parity, please check ARC extension spec by Dani.
  • the hardware can store at most 112 bit line each cycle
  • Input_mux Slice 0 PLA0 (each bit has individual enables): PLAin[0] is the multiplexer input associated with DPU0; PLAin[1] is the multiplexer input associated with DPU0; STATE0[n] is the primary state output of slice 0; LSTATE0[n] is the lower state output of slice 0; FLAG[n] is the primary flag output of slice 0; LSTATE0[n] is the lower flag output of slice 0; HBIT0[n] is the Horizontal State bus. HBIT0[n] needs an optional register before the input mux.
  • IO[7:0] are 8 I/O bits coming from the pins associated with each slice.
  • Bit 0 input, DATA is the hortnet mux output from the datapath. Each tile has 8 hortnet muxes. The control will pick up the lower 16 bits of hortnet mux 7 from each tile. Notice that it is the value at the output of mux 7, that is, the input of the tristate.
  • Tile D is taken out, so the data it provides (STATE*[31:24], FLAG[27:21]...) are connected to ground.
  • the PLA_in_sel are 64-to-1 mux selects. There are 16 of them for each PLA (each bit is individually controlled).
  • Horz_mux Slice 0 PLA0: each bit has individual enables
  • HMUX[0] is one of 16 Horizontal State Lines

Abstract

A reconfigurable chip (20) is taught having reconfigurable functional units including a shift register, arithmetic logic, and multiplexers. The data paths are interconnected to other data path units. Interconnection is provided by transferring word length data. The shifter allows for the word length data to be adjusted for use in the arithmetic logic unit. Reconfigurable functional units are controlled by reconfigurable functional unit instructions. The reconfigurable unit instructions are stored in a reconfigurable functional unit instruction memory, which is addressed by a state machine on the chip.

Description

EFFICIENT HIGH PERFORMANCE DATA OPERATION ELEMENT FOR USE IN A RECONFIGURABLE LOGIC ENVIRONMENT
Related Application/Priority
[0001] The present application claims the priority of Provisional Application No. 60/288,298, filed May 2, 2001.
Background of the Invention
[0002] The present invention relates to reconfigurable logic chips, especially reconfigurable logic chips used for reconfigurable computing.
[0003] Field programmable gate arrays (FPGAs) are programmable chips that can implement different configurations. Typically a design is created using design tools and an FPGA is configured for a specific design. Although designs can be changed, typically the FPGA uses a single configuration due to the relatively long time required to change a configuration compared to the operation time of the chip.
[0004] Recently, reconfigurable chips designed to quickly switch portions of an algorithm onto a reconfigurable chip have been created. These reconfigurable chips are designed to use the reconfigurable elements of the chip so as to provide resources for the implementation of portions of an algorithm.
[0005] It is desired to have a data operation element or reconfigurable functional unit for use in a reconfigurable chip that implements an improved design to more effectively implement algorithms on a reconfigurable chip.
Summary of the Invention
[0006] The present invention concerns a reconfigurable chip, including multiple reconfigurable functional units (such as a data path unit) adapted to implement different functions. The reconfigurable function units preferably include multiplexers, at least one shifting unit and at least one arithmetic logic unit (ALU). The reconfigurable functional units are configured by reconfigurable functional unit instructions. The instructions control the configuration of the multiplexer and shifting unit and the ALU. The reconfigurable chip also includes interconnect adapted to connect together the reconfigurable functional units. In this way data can be passed between the reconfigurable functional units.
[0007] The reconfigurable functional unit instruction preferably includes a number of fields for the multiplexers, the shifter unit and the arithmetic logic unit. These fields configure these elements in the reconfigurable functional unit in a desired way.
[0008] In a preferred embodiment, each reconfigurable functional unit has an associated instruction memory. The instruction memory stores multiple instructions for the reconfigurable functional unit. In a preferred embodiment, the state machine addresses the instruction memory to determine the next instruction to be loaded into the reconfigurable functional unit. In one preferred embodiment, the reconfigurable functional units provide feedback to the state machine indicating when a function is finished and thus when the next function can be loaded into the reconfigurable functional unit.
[0009] In one embodiment, the shifter unit is configurable with a number of different modes. These modes are preferably selectable by a field of the reconfigurable functional unit instruction.
[0010] In one embodiment, interconnect elements are adapted to selectively connect some of the reconfigurable functional units to transfer word length data. The transferred data preferably has a fixed data length of 32 bits or greater. The fixed length data transfer allows the interconnect system to be simplified at the cost of some flexibility in the data transfer. The shifting unit in the reconfigurable functional unit allows the arithmetic logic unit to operate on different bits in the word length input data of the reconfigurable functional unit, compensating for the fixed structure of the interconnect elements. Thus, if needed data is in a certain location within a word, the shifter can move that bit location to the proper position for manipulation by the arithmetic logic unit.
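The alignment role of the shifter described in paragraph [0010] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 32-bit word length comes from the text, while the function name `extract_field`, the field position and the mask step are hypothetical and are not taken from the specification.

```python
WORD_MASK = 0xFFFFFFFF  # 32-bit word length, as stated in paragraph [0010]

def extract_field(word: int, lsb: int, width: int) -> int:
    """Move a field packed inside a 32-bit word down to bit 0 (hypothetical helper)."""
    if not (0 <= lsb < 32 and 0 < width <= 32 - lsb):
        raise ValueError("field must lie inside the 32-bit word")
    shifted = (word & WORD_MASK) >> lsb   # the part played by the shifter unit
    return shifted & ((1 << width) - 1)   # masking, assumed to happen downstream

# A 12-bit field sitting at bits [23:12] of the incoming word.
field = extract_field(0x00ABC000, lsb=12, width=12)
```

Because the interconnect always delivers whole words, the position of the field is handled inside the receiving unit rather than by the routing.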
[0011] Another embodiment of the present invention comprises using a multiplexer with a delay unit input and an input that bypasses the delay unit. In this manner the reconfigurable functional unit can implement a variable delay increasing the flexibility of the system.
Brief Description of the Drawing Figures
[0012] Fig. 1 is an overview of the reconfigurable chip of one embodiment of the present invention;
[0013] Fig. 2 is a simplified diagram of a reconfigurable functional unit of one embodiment of the present invention;
[0014] Fig. 3 is a diagram of a reconfigurable functional unit of one embodiment of the present invention;
[0015] Fig. 4 is a diagram of a multiplier unit which can be used with the embodiment of the present invention;
[0016] Fig. 5 is a diagram of one slice of the reconfigurable functional unit shown in Fig. 1 illustrating the interconnection between the data path units;
[0017] Fig. 6 is a diagram illustrating the connections between the data path unit and the horizontal and vertical bus lines;
[0018] Fig. 7 is a diagram illustrating the interconnection of a data path unit in one tile to a data path unit in another tile;
[0019] Fig. 8 is a diagram illustrating the interconnection of the data path units and a local system memory of one embodiment of the present invention;
[0020] Fig. 9 is a diagram illustrating a state machine and functional block configuration memory producing the instruction of configuration information for the functional block data unit;
[0021] Fig. 10A is a diagram illustrating the interconnection of a state machine, configuration state memory and data path unit of the present invention, showing the instruction and instruction fields for the data path unit;
[0022] Fig. 10B is a diagram illustrating a data path unit using a decoder for at least part of the instruction;
[0023] Fig. 11 is a diagram illustrating the control system configuration memory and data path unit of one embodiment of the present invention;
[0024] Fig. 12 is a diagram of an interconnection logic unit for use in one embodiment of the present invention;
[0025] Figs. 13A and 13B are charts illustrating the portions of the instructions for the ALU;
[0026] Fig. 14 is a diagram illustrating the flags for the system of one embodiment of the present invention;
[0027] Fig. 15 is a diagram illustrating the shifting mode for the shifter;
[0028] Fig. 16 is a diagram of the instruction of one embodiment of the shifter;
[0029] Fig. 17 is a diagram illustrating the operation of the shifter of Fig. 16;
[0030] Fig. 18 is a diagram of a logic system using multiple master latches of one embodiment of the present invention;
[0031] Fig. 19 is a diagram illustrating the background and foreground plane latches of one embodiment of the present invention;
[0032] Fig. 20 is a diagram of one embodiment of a reconfigurable functional unit for a data path in one embodiment of the present invention;
[0033] Fig. 21 is a diagram of the input multiplexers for the system of Fig. 20;
[0034] Fig. 22 is a diagram of the shifting mode for the shifter of one embodiment of the present invention;
[0035] Fig. 23 is a diagram illustrating some shifting modes for the shifter of one embodiment of the present invention; and
[0036] Fig. 24 is a diagram illustrating the implementation of a turbo look up table of one embodiment of the present invention.
Detailed Description of the Invention
[0037] Fig. 1 illustrates a reconfigurable chip 20. The reconfigurable chip 20 includes a central processing unit (CPU) 22, preferably a reduced instruction set (RISC) CPU. Data from the external memory (not shown) is transferred using memory controller 24. Bus 26, called the roadrunner bus, is used to transfer data from the memory controller to the reconfigurable fabric 28. The reconfigurable fabric 28 is divided into a number of slices. Each slice is broken down into a number of tiles. Each tile includes a data path unit (reconfigurable functional unit), control units and local system memory units. The local system memory units interact with the data path units as described below. In a preferred embodiment, each tile also has a number of multiplier units.
[0038] Fig. 2 illustrates a simplified diagram of a reconfigurable functional unit of one embodiment of the present invention. The reconfigurable functional unit includes input multiplexers 30 and 32. As will be described below, the input multiplexers allow the data path unit to receive inputs from a number of different locations, including nearby data path units as well as data buses. The selected outputs of the input multiplexers are sent on to registers 36 and 38. Additionally, the output of the multiplexer 32 goes to shifter unit 34. As described below, the shifter unit 34 allows for the selection of different bits to be operated on by the ALU 40. Since the interconnections between the data path units use fixed word length connections to simplify the interconnection system, the use of a shifter unit in the data path unit allows access to bits packed within the interior of a word.
[0039] As will be described below, the shifter unit 34 preferably has a number of modes which implement more than just logical and arithmetic shifts left and right. These different modes allow the system to operate in a more efficient manner.
The arithmetic logic unit 40, described below, preferably uses a field of the instruction for the data path unit to implement a function. The output of the ALU 40 preferably goes to an output register 42. The output can also be sent to an optional bit shifter 44 to produce a shifted value.
[0040] In one embodiment, a bypassing ALU feedback output on line 46 is also used. This allows portions of the data path unit to operate while the output register 42 controls what outputs are sent from the data path unit. This is useful when the output register 42 is used to address a local system memory unit.
[0041] The bit shifter 44 is used to implement the linear feedback shift register as described in the patent application, "Modifications to Reconfigurable Functional Unit in a Reconfigurable Chip to Perform Linear Feedback Shift Register Function," by Peter Lam, Attorney Docket No. 032001-060.
[0042] Note that the multiplexers, shifter unit 34 and ALU 40 are preferably controlled by an instruction for the data path unit. This instruction is preferably divided into a number of different fields, including multiplexer instruction fields for the multiplexers, shifter unit fields for the shifter 34 and an ALU instruction field for the ALU 40. In one embodiment, a decoder is used for at least part of the instruction.
[0043] Fig. 3 is a detailed diagram of one embodiment of the present invention. The input multiplexers 50 and 52 receive as inputs data from nearby units. In one example, data words from 16 units, including data path units and multiplier units, are used as inputs. Global vertical and horizontal interconnections are used. In one embodiment, the inputs also include a connection for the linear feedback shift register feedback, a logical zero constant input and an input for a local system memory unit. Another input is the carry input from the prior data path unit, which is provided to the ALU 54 directly.
The multiplexer 50 is connected to the shifter 56, which has a number of different modes of operation. The shifter 56 is connected to another multiplexer 58 so that the output of the multiplexer 50 can either avoid or use shifter unit 56. The shifter unit 56 can also use the A input from the input multiplexer 52 for some of the modes. The output of the multiplexer 58 and the output of multiplexer 52 can be sent to registers 60 and 62, respectively. The registers 60 and 62 can also be loaded from off the chip. The logic 64 and 66 allows the register values to act as a mask register for the system. The multiplexers 68 and 70 select the inputs to the ALU 54. Outputs of the ALU are sent to a number of different possible paths. Note that the data path output out of multiplexer 72 can be the value from the output register 74, or the value from the multiplexer 76 (which can be the ALU value or the local system memory read data on line 78). The flag values from the ALU are sent to multiplexers 80 and 82, which select the desired flag value. This flag value can be stored in registers 88 and 90; the multiplexers 92 and 94 then pass either the value of the register 88 or 90 or the selected value from multiplexer 80 or 82. The CONF value is a field in the instruction that indicates which flag to select.
[0044] In one embodiment, the registers 60, 62 and 74 can be implemented by using multiple master slave latches, shown in Fig. 18, to allow the loading of background configuration data into the register. In one embodiment, the operation of these registers can be controlled by the field of the reconfigurable functional unit instruction.
[0045] Fig. 4 is a diagram of a multiplier unit. The multiplier unit is oriented somewhat similarly to the reconfigurable functional unit shown in Fig. 3. However, the multiplier unit has a dedicated multiplier rather than an ALU.
[0046] As shown in Fig. 5, in one embodiment, for every seven data path units or reconfigurable functional units in a tile, there are two multiplier units.
[0047] Fig. 5 illustrates the connections of adjacent data path units and multipliers into the data path unit inputs. As shown in Fig. 5, the data path unit 100 can receive as inputs the outputs from eight previous data path units (and multipliers) above and seven of the next data path units (and multipliers) below. The output of the data path unit 100 is also fed back to itself. The outputs of any of these units can be selected using the input multiplexers of the system for either the A or B inputs.
[0048] Fig. 6 is a diagram that illustrates the connection of the reconfigurable functional units (data path units) of one tile to horizontal and vertical connection lines. By using multiplexers, the outputs and inputs of the data path units can be interconnected to both the vertical routing lines and the horizontal routing lines.
[0049] Fig. 7 illustrates an example of interconnecting a data path unit in one tile to a data path unit in another tile using vertical interconnection lines. Note that the system of the present invention for the interconnections preferably uses word-based interconnections. In one embodiment, the interconnection lines allow the connection of 32 bit wide data. The shifter unit in the data path unit allows for the alignment of the data once it is received into a data path unit from the interconnection system. Because the system sends data in a 32 bit word, the interconnect system is simplified, at the cost of some flexibility of the interconnect.
[0050] Fig. 8 illustrates the connection between the data path units and the local system memory. In a preferred environment, alternate data path units are used to implement the Writes and Reads of the local system memory. For example, data path unit 102 provides Read Addresses to and receives read data from the local system memory 104.
Data path unit 106 provides the Write Address and Write data for the local system memory 104. Note that by using the pass gates such as pass gates 106, 108, 110 and 112, the data path units 102 and 106 can connect to other local system memories, such as local system memory 114 and the data path units 116 and 118 can connect to the local system memory 104. In another embodiment, a data path unit can both read from and write to a local system memory. One of the uses of a data path unit is to provide an address to a local system memory to obtain data from a local system memory, which can then be put upon the horizontal and vertical interconnection buses. The connections shown in Fig. 8 are the direct connections to read and write data in and out of the local system memory. In a preferred environment, the local system memory is globally read from and written to using the memory control system. This general memory control system is used for configuration of the system and for obtaining the data operated on by the data path units. Note that as described above, in a preferred embodiment, the data path units include structures that allow addresses and data to be provided to the local system memory while the data path unit does some other function.
[0051] Fig. 9 illustrates a control fabric unit 132 for the reconfigurable functional unit 130. In this embodiment, the control fabric unit 132 produces a control or instruction line for the reconfigurable functional unit 130. In this embodiment, the control fabric unit 132 is preferably composed of a state machine unit 134 and a functional block configuration memory unit 136. The state machine 134 produces the addresses into the instruction memory 136. One implementation of the state machine 134 uses a reconfigurable programmable sum-of-products unit 136.
[0052] Fig. 10A illustrates a system with the state machine unit 136', the configuration state memory 138' and the data path unit 130'. Note that the configuration from the configuration state memory 138' can be considered to be an instruction for the data path unit 130'. The instruction preferably includes fields such as an ALU configuration field, a shift register configuration field, and a multiplexer configuration field. In one embodiment, some of the flags from the data path unit 130' are sent to the state machine 136' in order to switch configurations for the data path unit after the data path unit is done operating on a set of data. The configuration state memory 138' can also be loaded with an external configuration from external memory or from the processor.
[0053] Fig. 10B is a diagram illustrating a data path unit using a decoder to decode at least part of an instruction.
[0054] Fig. 11 shows the control system, including the state machines for the different configuration state memories. The data path unit flags are sent to the control system as described above.
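The control arrangement of paragraphs [0051] and [0052] (a state machine that addresses an instruction memory and advances when the data path unit signals completion) can be modelled roughly as follows. This is a minimal sketch under stated assumptions: the class names, the field names `mux_sel`, `shift_mode` and `alu_op`, and the simple next-address table are all hypothetical, since the specification does not give field widths or the state machine encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instruction:
    mux_sel: int      # hypothetical multiplexer configuration field
    shift_mode: int   # hypothetical shifter configuration field
    alu_op: str       # hypothetical ALU configuration field

class ControlFabric:
    """State machine that addresses a configuration (instruction) memory."""

    def __init__(self, instruction_memory, next_address):
        self.memory = list(instruction_memory)
        self.next_address = dict(next_address)  # current address -> next address
        self.address = 0

    def step(self, done_flag: bool) -> Instruction:
        # The flag fed back from the data path unit decides whether to switch.
        if done_flag:
            self.address = self.next_address[self.address]
        return self.memory[self.address]

memory = [Instruction(0, 0, "ADD"), Instruction(1, 4, "SUB")]
fabric = ControlFabric(memory, {0: 1, 1: 0})
```

Calling `fabric.step(False)` keeps the current instruction, while `fabric.step(True)` moves to the next stored configuration, mirroring the flag feedback described in the text.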
[0055] Fig. 12 is a diagram that illustrates one example of an arithmetic logic unit. This arithmetic logic unit includes an arithmetic unit 142, a parallel logic unit 140 and a flag unit 144. Also shown is a carry selection unit 146. The ALU instruction field from the instruction is sent to select the operations of the ALU. The arithmetic unit 142 uses a carry input. In a preferred embodiment, this carry value is either the carry from the previous data path unit or the control signal or a carry which is part of the instruction.
[0056] Figs. 13A and 13B illustrate a list of some of the Opcodes used in one embodiment of an ALU of the reconfigurable functional unit of the present invention. Details of these Opcodes are described in Appendix I, incorporated herein by reference.
[0057] Fig. 14 is a diagram of the flag system for the present invention. The flag unit is inside the data path unit and is used for producing the flags which go to the control unit as well as to the next data path unit. In a preferred embodiment, the selection of the flags that are used is under the control of a field of the reconfigurable functional unit instruction. A description of some of the flags is given below.
[0058] ROXR is driven every cycle. It is selected by conf == 1.
The operation is:
case opcode[7] == 0
  flag[1] = ^(B[31:0])
  flag[0] = ^(B[15:0])
case opcode[7] == 1
  flag[1] = ^(B[31:16])
  flag[0] = ^(B[15:0])
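The ROXR flag operation can be sketched in Python as follows, assuming that the operator applied to B in the source is a bitwise XOR reduction (as the name ROXR suggests). The function names are hypothetical and only illustrate the two cases selected by opcode[7].

```python
def xor_reduce(value: int, width: int) -> int:
    """XOR together the low `width` bits of `value` (1 if an odd number are set)."""
    value &= (1 << width) - 1
    return bin(value).count("1") & 1

def roxr_flags(b: int, opcode_bit7: int):
    """Return (flag[1], flag[0]) for the ROXR operation on a 32-bit B operand."""
    b &= 0xFFFFFFFF
    flag0 = xor_reduce(b, 16)              # always the lower 16 bits
    if opcode_bit7 == 0:
        flag1 = xor_reduce(b, 32)          # whole 32-bit word
    else:
        flag1 = xor_reduce(b >> 16, 16)    # upper 16 bits only
    return flag1, flag0
```

With opcode[7] set, the two flags give independent results for the upper and lower halfwords, which matches the parallel 16-bit style of the other opcodes.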
Abbreviations:
CO - Carry Out (of the addition/subtraction operation)
OV - Overflow (of the addition/subtraction operation)
EQ - Equal (A == B)
GT - Greater Than
LT - Less Than
SN - Sign (sign bit of result)
Previous Flag
Cin - Carry in from previous row
Ctrl - Carry in from control
Max - 0x7fff[ffff] (for 16/32 bits)
Min - 0x8000[0000] (for 16/32 bits)
[0059] Fig. 15 illustrates the shift mode and the operation of some of the modes of the shifter unit of one embodiment of the present invention. Since the shifter unit has a number of different modes, the flexibility of the system of the present invention is increased.
[0060] Figs. 16 and 17 illustrate one implementation of the shifter unit using multiple rows of multiplexers. Additional logic is also used to produce a special output. Fig. 17 illustrates the operation of some of the modes of the shifter.
[0061] The shifter used in the data path unit performs more than right/left shift operations. The shifter includes an array of multiplexers which are controlled via mux select signals. In one 4x6 multiplexer array shifter embodiment, a 32-bit operand which is divided into four groups of 8 signals is coupled to a first row of four multiplexers. Other than the last row, the outputs of the multiplexers in a previous row are coupled to the inputs of the next row of multiplexers. Each multiplexer in the array is controlled independently. The control signals determine how the signals are routed in the array and hence the type of operation performed on the operand. In one embodiment, examples of operations include: 32-bit logical right/left shift, 32-bit arithmetic right/left shift, lower 16-bit sign extend to 32-bit, constant generation, duplicate lower 16-bit to upper 16-bit, duplicate upper 16-bit to lower 16-bit, swap lower and upper 16-bit, 16-bit arithmetic right shift, and byte swap.
[0062] Fig. 18 illustrates a multiple master latch system used in one embodiment of the system of the present invention. In this example, two master latches are used. One of the master latches is used for the background configuration of the system. The other master latch receives data from the pipeline in the data path unit or from the processor. The inputs to the latch 150 are provided through the multiplexer 152. The latch 154 is connected to the configuration bus to receive data from background configuration. The multiplexer 156 can be used to select the input to the slave latch 158. The use of a background configuration memory in the system allows the quick operation of the system in the present invention.
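Several of the shifter operations listed in paragraph [0061] can be written as small reference functions. The sketch below is behavioural only and does not model the 4x6 multiplexer array; the function names are hypothetical, shift amounts are assumed to lie in 0..31, and "byte swap" is assumed here to mean reversing the order of the four bytes, which the text does not specify.

```python
MASK32 = 0xFFFFFFFF

def lsr32(x: int, n: int) -> int:
    return (x & MASK32) >> n                     # 32-bit logical right shift

def lsl32(x: int, n: int) -> int:
    return (x << n) & MASK32                     # 32-bit logical left shift

def asr32(x: int, n: int) -> int:
    x &= MASK32                                  # 32-bit arithmetic right shift
    fill = (MASK32 << (32 - n)) & MASK32 if (x >> 31) and n else 0
    return (x >> n) | fill

def sign_extend16(x: int) -> int:
    low = x & 0xFFFF                             # lower 16-bit sign extend to 32-bit
    return low | 0xFFFF0000 if low & 0x8000 else low

def dup_low_to_high(x: int) -> int:
    low = x & 0xFFFF
    return (low << 16) | low

def dup_high_to_low(x: int) -> int:
    high = (x >> 16) & 0xFFFF
    return (high << 16) | high

def swap_halves(x: int) -> int:
    return ((x & 0xFFFF) << 16) | ((x >> 16) & 0xFFFF)

def byte_swap(x: int) -> int:
    # Assumption: full reversal of the four bytes.
    return int.from_bytes((x & MASK32).to_bytes(4, "big"), "little")
```

In hardware each of these corresponds to a particular setting of the multiplexer select signals, so switching between them is a matter of changing the shifter field of the instruction.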
[0063] The storage element of Fig. 18 has multiple master latches which share a single slave latch via a multiplexer which provides a multi-function storage element. In addition, by sharing a slave latch a significant space savings is realized (approximately 25%). This is particularly true in a system utilizing numerous storage elements. The storage element design relies on the fact that configuration bits are infrequently loaded into storage elements. So instead of having a separate slave latch for each master latch coupled to a configuration bitstream signal, according to the invention the master latch coupled to the configuration bitstream signal shares its slave latch with another master latch. Hence, two or more master latches share a single slave latch. A multiplexer is coupled between the master latches and the single slave latch for selecting which master latch is coupled to the slave latch.
[0064] In one embodiment, one master latch's input is coupled to a signal that frequently requires the storage element functionality and the other master latch's input is coupled to a signal that requires the storage element functionality on an infrequent basis. In this embodiment, the first master latch is coupled to the data path signal and the second master latch is coupled to the configuration bit signal. When the data path signal is passed to the slave latch, the storage element functions to divide the data path pipeline into stages. When the configuration bitstream signal is passed to the slave latch, the storage element functions to store the configuration bits. In another embodiment, one master latch is coupled to the data path signal and more than one master latch is coupled to a configuration bit signal, and all of the master latch outputs are coupled to the multiplexer which is used to select and pass one of the signals from the master latches to the shared slave latch.
[0065] For Fig. 18:
• master latches are reset upon 'RESET' or 'INIT'
• slave latches are reset upon 'RESET' only
• mux A selects the config path whenever configuration is activating (further qualified by the particular slice being selected)
• mux B selects the arc bus when arc is writing (further qualified by decoding the corresponding arc address; please refer to the ARC ext spec for the address map)
• master latches are transparent only during clock low
• slave latches are transparent only during clock high
• master latch 0 is transparent when latpipe 0 is enabled or an arc write to that register is happening
• master latch 1 is transparent when config loading is active and the corresponding config address is decoded
• slave latch is transparent when
1. config activates to this slice, or
2. arc writes to this register, or
3. latpipe signal from control is high
• This setup assumes that configuration and arc write will not happen at the same time. If they do happen at the same time, the configuration has higher priority.
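The latch rules listed for Fig. 18 can be approximated by a small behavioural model. This is a simplified sketch, not the circuit: reset is omitted, the arc write and latpipe conditions are merged into a single hypothetical `data_enable` argument, and the class and argument names are assumptions made for illustration.

```python
class SharedSlaveStorage:
    """Two master latches sharing one slave latch through a multiplexer."""

    def __init__(self):
        self.master_data = 0     # master latch 0: data path pipeline or arc write
        self.master_config = 0   # master latch 1: configuration bitstream
        self.slave = 0

    def clock_low(self, data_in=None, config_in=None):
        # Masters are transparent only while the clock is low (when enabled).
        if data_in is not None:
            self.master_data = data_in
        if config_in is not None:
            self.master_config = config_in

    def clock_high(self, config_active=False, data_enable=False):
        # The slave is transparent only while the clock is high.
        # Configuration has priority if both conditions are asserted.
        if config_active:
            self.slave = self.master_config
        elif data_enable:
            self.slave = self.master_data
        return self.slave
```

A background configuration value can therefore sit in the second master latch while the first keeps serving the pipeline, and it reaches the slave only when configuration is activated.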
[0066] Another embodiment of the present invention concerns the variable delay units of the present invention. Each variable delay unit comprises a multiplexer which receives a first input that passes through a register and a second input which bypasses the register. In this way a variable delay can be implemented. In the reconfigurable functional unit of Fig. 3, the register 60 connected to the multiplexer 68, the register 62 connected to the multiplexer 70, the register 88 connected to the multiplexer 92, the register 90 connected to the multiplexer 94 and the register 74 connected to the multiplexer 72 can each implement such a variable delay. A multiplexer can select a delayed or a bypass signal, the delayed signal going through a delay element such as a flip-flop.
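The register-plus-bypass arrangement of paragraph [0066] amounts to a one-cycle delay that can be switched in or out. A minimal Python sketch, with a hypothetical class name and an assumed reset value of 0 for the register, looks like this:

```python
class VariableDelay:
    """A register followed by a multiplexer that can bypass it."""

    def __init__(self, bypass: bool):
        self.bypass = bypass   # multiplexer select (a configuration bit)
        self.register = 0      # assumed reset value

    def clock(self, value: int) -> int:
        delayed = self.register   # value captured on the previous clock
        self.register = value     # capture the new input
        return value if self.bypass else delayed

registered = VariableDelay(bypass=False)
direct = VariableDelay(bypass=True)
```

Feeding the sequence 1, 2, 3 into `registered` yields 0, 1, 2, while `direct` passes 1, 2, 3 straight through.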
[0067] The flexible adaptive delay element includes a storage device (e.g., flip-flop, latch) having its input coupled to an input signal and its output coupled to a first input of a multiplexer. The other input of the multiplexer is coupled to the input signal. As a result, one input of the multiplexer is coupled to the input signal delayed by the amount provided by the storage device, and the other input of the multiplexer is coupled to the undelayed input signal. The select signal can then be used to select either the delayed signal or the undelayed signal.
[0068] Fig. 19 shows an alternate embodiment of a background foreground plane arrangement.
[0069] The present invention incorporates by reference the prior patent applications, including "A HIGH PERFORMANCE DATA PATH UNIT FOR BEHAVIORAL DATA TRANSMISSION AND RECEPTION," Inventor Hsinshih Wang, Serial No. 09/307,072 filed May 7, 1999 (Attorney Docket No. 032001-014), "CONTROL FABRIC FOR ENABLING DATA PATH FLOW," Inventors Shaila Hanrahan, et al., Serial No. 09/401,194 filed September 23, 1999 (Attorney Docket No. 032001-016), as well as "CONFIGURATION STATE MEMORY FOR FUNCTIONAL BLOCKS ON A RECONFIGURABLE CHIP," Inventors Shaila Hanrahan and Christopher E. Phillips, Serial No. 09/401,312, filed on September 23, 1999 (Attorney Docket No. 032001-035).
[0070] Vermont Embodiments.
[0071] Fig. 20 illustrates an alternate embodiment of the reconfigurable functional unit or data path unit. In this embodiment an additional register and multiplexer are added to the B input path before the shifter. Additionally, the input multiplexer is slightly modified. The input multiplexer is shown in Fig. 21.
[0072] Fig. 22 illustrates the shifter mode table for the new embodiment of Fig. 20.
[0073] Fig. 23 illustrates the implementation of the new modes of Fig. 22.
[0074] Fig. 24 illustrates a turbo look up table for use with the system of the present invention. The turbo look up table is useful in the addition of data stored in a logarithmic format. This is useful for many communication systems. In one prior approach, in order to add data stored in logarithmic format, the data must be converted to the normal format by doing an exponential expansion of the data. The exponentially expanded data is added together and then the combined information is converted back to the logarithmic format. In the preferred embodiment, the turbo look up table is used to produce an estimate of the sum by adding a correction factor. This estimate uses the greater of A and B as a first estimate of the sum of A and B. The absolute value of the difference A minus B is used as an input to a look up table, which provides a correction factor to add to the greater of A and B. By adding this correction factor to the greater of A and B, a relatively accurate estimate is produced. Note that the look up table need not have the same number of input bits as A. In a preferred embodiment, only a few bits of precision are used. If the magnitude of A minus B is relatively large, the combined value does not significantly differ from the greater of A and B. For example, the addition of 1,000,000 to 0.1 is approximately 1,000,000. The addition of 1,000,000 to 1,000,000 is effectively doubling the maximum value.
[0075] Appendixes II and III further illustrate the Vermont embodiment of the reconfigurable functional unit.
[0076] It will be appreciated by those of ordinary skill in the art that the invention can be implemented in other specific forms without departing from the spirit or character thereof.
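The look up table approach of paragraph [0074] corresponds to the identity ln(e^A + e^B) = max(A, B) + ln(1 + e^-|A-B|), where the second term is the correction factor. The Python sketch below illustrates it under stated assumptions: natural logarithms are assumed (the text does not give the base), and the table size `LUT_BITS`, the quantization step `LUT_STEP` and the function names are hypothetical.

```python
import math

LUT_BITS = 3     # hypothetical: only a few bits of |A - B| index the table
LUT_STEP = 0.5   # hypothetical quantization step for |A - B|

# Correction factor ln(1 + e^-d), sampled at d = 0, 0.5, 1.0, ...
CORRECTION = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(1 << LUT_BITS)]

def log_add(a: float, b: float) -> float:
    """Estimate ln(e**a + e**b) as max(a, b) plus a table correction."""
    index = int(abs(a - b) / LUT_STEP)
    # Beyond the table, the correction is treated as negligible.
    correction = CORRECTION[index] if index < len(CORRECTION) else 0.0
    return max(a, b) + correction

def log_add_exact(a: float, b: float) -> float:
    """Reference: convert out of log format, add, and convert back."""
    return math.log(math.exp(a) + math.exp(b))
```

When A equals B the correction is ln 2, which corresponds to the doubling case mentioned in the text; when the two values are far apart the estimate reduces to the greater value.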
The presently disclosed embodiments are therefore considered in all respects to be illustrative and not restrictive. The scope of the invention is illustrated by the appended claims rather than the foregoing description, and all changes that come within the meaning and range of equivalents thereof are intended to be embraced herein.
Appendix I
1.9 Opcode Details
Name ADD
Pseudocode result = A+B
Description A plus B
Flags Affected CO carry out of the operation
OV if the operation overflows
EQ if A==B
SN result[31]
Name ADD16
Pseudocode result = {(AH+BH),(AL+BL)}
Description Parallel A plus B
Flags Affected CO, OV, EQ, SN similar to ADD, except that flag[0] is also valid
Name ADDC
Pseudocode result = A+B+Cin
Description A plus B with carry in for 32 bit operation.
Cin is the CO of the lower 16 bits.
Flags Affected CO, OV, EQ, SN similar to ADD
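The ADD and ADD16 entries above can be modelled in software as follows. This is a minimal sketch, not the hardware definition: operands are treated as unsigned 32 bit patterns, OV is assumed to mean two's complement signed overflow, and the function names `add32` and `add16` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF


def add32(a: int, b: int) -> tuple[int, dict[str, bool]]:
    """32 bit ADD on unsigned bit patterns; returns (result, flags)."""
    total = a + b
    result = total & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "CO": total > MASK32,                          # carry out of bit 31
        "OV": sign_a == sign_b and sign_r != sign_a,   # assumed: signed overflow
        "EQ": a == b,
        "SN": bool(sign_r),                            # result[31]
    }
    return result, flags


def add16(a: int, b: int) -> int:
    """ADD16: the high and low halves are added independently."""
    high = ((a >> 16) + (b >> 16)) & MASK16
    low = ((a & MASK16) + (b & MASK16)) & MASK16       # carry out of bit 15 is dropped
    return (high << 16) | low
```

The difference between the two shows up when the low halves overflow: `add32(0x0001FFFF, 0x00010001)` carries into the high half, whereas `add16` on the same operands does not.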
Name ADDCNT
Pseudocode result = A+B+Ctrl
Description A plus B with control carry in
Flags Affected CO, OV, EQ, SN similar to ADD
Name SUB
Pseudocode result = A-B
Description A minus B
Flags Affected CO carry out of the operation (A+~B+1)
OV if the operation overflows
GT if A>B
LT if A<B
EQ if A==B
SN sign of result
Name SUB16
Pseudocode result = {(AH-BH),(AL-BL)}
Description Parallel A minus B
Flags Affected CO, OV, GT, LT, EQ, SN similar to SUB
Name SUBC
Pseudocode result = A+~B+Cin
Description A minus B with carry in for 32 bit operation
Flags Affected CO, OV, GT, LT, EQ, SN similar to SUB
Name SUBCNT
Pseudocode result = A+~B+Ctrl
Description A minus B with control carry in
Flags Affected CO, OV, GT, LT, EQ, SN similar to SUB
Name SADD
Pseudocode if (overflow) result = max
else if (underflow) result = min
else result = A+B
Description A plus B with saturation
Flags Affected CO carry out of A+B
OV overflow of A+B
EQ A==B
SN sign of result
Name SADD16
Pseudocode Parallel 16 bit version of SADD
Description A plus B with saturation
Flags Affected CO, OV, EQ, SN similar to SADD, flag[0] is valid
Name SADDCNT
Pseudocode if (overflow) result = max
else if (underflow) result = min
else result = A+B+Ctrl
Description A plus B with control carry in and saturation
Flags Affected CO, OV, EQ, SN similar to SADD
Name SSUB
Pseudocode if (overflow) result = max
else if (underflow) result = min
else result = A-B
Description A minus B with saturation
Flags Affected CO carry out of A+~B+1
OV overflow of A-B
GT A>B
LT A<B
EQ A==B
SN sign of result
Name SSUB16
Pseudocode Parallel 16 bit version of SSUB
Description A minus B with saturation
Flags Affected CO, OV, GT, LT, EQ, SN similar to SSUB, flag[0] valid
Name SSUBCNT
Pseudocode if (overflow) result = max
else if (underflow) result = min
else result = A+~B+Ctrl
Description A minus B with control carry in and saturation
Flags Affected CO, OV, GT, LT, EQ, SN similar to SSUB
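The saturating opcodes above clamp the result instead of letting it wrap. The sketch below models the 32 bit versions on Python integers. It assumes that "max" and "min" are the signed 32 bit limits, which the source does not state explicitly, and all function names are hypothetical.

```python
INT32_MAX = 2**31 - 1   # assumed value of "max"
INT32_MIN = -2**31      # assumed value of "min"


def saturate(value: int) -> int:
    """Clamp an exact integer result to the signed 32 bit range."""
    if value > INT32_MAX:      # overflow
        return INT32_MAX
    if value < INT32_MIN:      # underflow
        return INT32_MIN
    return value


def sadd(a: int, b: int, ctrl: int = 0) -> int:
    """SADD (ctrl = 0) or SADDCNT (ctrl = control carry in)."""
    return saturate(a + b + ctrl)


def ssub(a: int, b: int) -> int:
    """SSUB: A minus B with saturation."""
    return saturate(a - b)


def ssubcnt(a: int, b: int, ctrl: int) -> int:
    """SSUBCNT: A + ~B + Ctrl with saturation (~B equals -B - 1)."""
    return saturate(a + ~b + ctrl)
```

Under these assumptions `ssubcnt(a, b, 1)` gives the same result as `ssub(a, b)`, because A + ~B + 1 is the two's complement form of A - B.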
Name INC
Pseudocode result = B+1
Description Increment B
Flags Affected CO carry out of B+1
OV overflow of B+1
SN sign of B+1
Name DEC
Pseudocode result = B-1
Description Decrement B
Flags Affected CO carry out of B+0xffffffff
OV overflow of B-1
SN sign of B-1
Name NEG
Pseudocode result = ~B+1
Description Negate B
Flags Affected SN sign of ~B+1
Name ABS
Pseudocode if (B is negative) result = ~B+1
else result = B
Description Absolute value of B
Flags Affected
Name ABS16
Pseudocode Parallel 16 bit version of ABS
Description Absolute value of B, for 32 bit operation
Flags Affected
Name CSUB
Pseudocode if (A-B >= 0) result = A-B
else result = A
Description Conditional subtraction
Flags Affected CO carry out of A+~B+1
OV overflow of A-B
GT A>B
LT A<B
EQ A==B
SN sign of result
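A conditional subtraction of this kind is the basic step of shift-and-subtract (restoring) division. The source does not say what CSUB is intended for, so the `remainder` routine below is only an illustration of one possible use; both function names are hypothetical and the operands are assumed to be non-negative.

```python
def csub(a: int, b: int) -> int:
    """CSUB: return A-B when the difference is non-negative, otherwise A."""
    diff = a - b
    return diff if diff >= 0 else a


def remainder(dividend: int, divisor: int, bits: int = 32) -> int:
    """Unsigned remainder computed with one CSUB step per dividend bit."""
    rem = 0
    for i in range(bits - 1, -1, -1):
        rem = (rem << 1) | ((dividend >> i) & 1)   # bring down the next bit
        rem = csub(rem, divisor)                   # subtract only if it fits
    return rem
```

Each iteration keeps the partial remainder below the divisor, so a single CSUB per bit is enough.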
Name AND
Pseudocode result = A&B
Description Bitwise AND
Flags Affected EQ A==B
SN bit[31] of result
Name OR
Pseudocode result = A|B
Description Bitwise OR
Flags Affected EQ, SN same as AND
Name NAND
Pseudocode result = ~(A&B)
Description Bitwise NAND
Flags Affected EQ, SN same as AND
Name NOR
Pseudocode result = ~(A|B)
Description Bitwise NOR
Flags Affected EQ, SN same as AND
Name XOR
Pseudocode result = A^B
Description Bitwise XOR
Flags Affected EQ, SN same as AND
Name XNOR
Pseudocode result = ~(A^B)
Description Bitwise XNOR
Flags Affected EQ, SN same as AND
Name PASSA
Pseudocode result = A
Description Pass A
Flags Affected EQ, SN same as AND
Name PASSB
Pseudocode result = B
Description Pass B
Flags Affected EQ, SN same as AND
Name NOTA
Pseudocode result = ~A
Description Invert A
Flags Affected EQ, SN same as AND
Name NOTB
Pseudocode result = ~B
Description Invert B
Flags Affected EQ, SN same as AND
Name MIN
Pseudocode if (A<B) result = A
else result = B
Description Return smaller of A and B
Flags Affected GT A>B
LT A<B
EQ A==B
SN bit[31] of result
CO carry out of A+~B+1
OV overflow of A-B
Name MIN16
Pseudocode Parallel 16 bit version of MIN
Description Return smaller of A and B, for 32 bit operation
Flags Affected same as MIN, valid for flag[0] as well
Name MAX
Pseudocode if (A>B) result = A
else result = B
Description Return larger of A and B
Flags Affected GT A>B
LT A<B
EQ A==B
SN bit[31] of result
CO carry out of A+~B+1
OV overflow of A-B
Name MAX16
Pseudocode Parallel 16 bit version of MAX
Description Return larger of A and B, for 32 bit operation
Flags Affected same as MIN, valid for flag[0] as well
Name PENC
Pseudocode result = 0;
for (i = 31; i >= 0; i--) {
  if (B[i] == 1) { result = i+1; break; }
}
Description Leading One Detection
Flags Affected none
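The PENC pseudocode above translates directly into a runnable form. The sketch below assumes B is a non-negative integer that fits in 32 bits; the function name `penc` is hypothetical.

```python
def penc(b: int) -> int:
    """Return the index of the highest set bit of b plus one, or 0 when b is zero."""
    for i in range(31, -1, -1):
        if (b >> i) & 1:
            return i + 1
    return 0
```

For inputs in the 32 bit range this gives the same value as Python's built-in `int.bit_length()`.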
Name MUXBBA
Pseudocode result = in[A[4:0]]
Description Use the 4 LSBs of input A to multiplex the 16 inputs of B
Flags Affected SN bit[31] of result
Name SHIFTBBA
Pseudocode if (A[5]) result = B << A[4:0];
else result = B << A[4:0];
Description Shift B by A. The 16/32 bit is decided by con32 bit in config memory. In either case, the shifted out bit will be passed to the flag.
Flags Affected bit[31] of result
Figure imgf000028_0001
Figure imgf000029_0001
1.11 DPU Functions - After the shifter/Mask
                16 bit            32 bit
Arithmetic      Add               Add
                Sub               Sub
                Saturation Add    Saturation Add
                Saturation Sub    Saturation Sub
                Inc               Inc
                Dec               Dec
                Neg               Neg
Logic           AND               AND
                OR                OR
                XOR               XOR
                NAND              NAND
                NOR               NOR
                XNOR              XNOR
                NOT               NOT
                PASS(NOP)         PASS(NOP)
Special Func.   ABS               ABS
                MIN               MIN
                MAX               MAX
                Rxor              Rxor
                N/A               DIY
                N/A               LFSR
                N/A               PENC
                N/A               MUXB by A (1)
                N/A               SHFTB by A (2)
NOTE: We can build an 896 bit (28*32) operator per slice.
(1) To implement N:1 multiplexers where 2<=N<=8
(2) A[4:0] is the shift amount, A[5] is the direction of the shift
CS2212 ALU Opcode Additions
The following opcodes will be added to the CS2212:
ADD8 SUB8
ADDSUB16 SUBADD16
Operation ADD8: 8 bit addition
Out[7:0] = A[7:0] + B[7:0]
Out[15:8] = A[15:8] + B[15:8]
Out[23:16] = A[23:16] + B[23:16]
Out[31:24] = A[31:24] + B[31:24]
Opcode
Bit Granularity 8 bit operation
Flags Affected No flags are available
Operation SUB8: 8 bit subtraction
Out[7:0] = A[7:0] + ~B[7:0] + 1
Out[15:8] = A[15:8] + ~B[15:8] + 1
Out[23:16] = A[23:16] + ~B[23:16] + 1
Out[31:24] = A[31:24] + ~B[31:24] + 1
Opcode
Bit Granularity 8 bit operation
Flags Affected No flags are available
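The ADD8 and SUB8 equations above describe four independent byte lanes packed into one 32 bit word. The following sketch models them on unsigned 32 bit patterns; the function names `add8` and `sub8` are hypothetical.

```python
def add8(a: int, b: int) -> int:
    """ADD8: four independent 8 bit additions; carries do not cross lanes."""
    out = 0
    for shift in (0, 8, 16, 24):
        lane = ((a >> shift) + (b >> shift)) & 0xFF
        out |= lane << shift
    return out


def sub8(a: int, b: int) -> int:
    """SUB8: per-lane A + ~B + 1, that is, 8 bit two's complement subtraction."""
    out = 0
    for shift in (0, 8, 16, 24):
        lane_a = (a >> shift) & 0xFF
        lane_not_b = ~(b >> shift) & 0xFF
        out |= ((lane_a + lane_not_b + 1) & 0xFF) << shift
    return out
```

Because each lane is masked to 8 bits, a lane that overflows or borrows wraps around without disturbing its neighbours, which matches the statement that no flags are available.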
Operation ADDSUB16: 16 bit addition and subtraction
Out[31:16] = A[31:16] + B[31:16]
Out[15:0] = A[15:0] + ~B[15:0] + 1
Opcode
Bit Granularity 16 bit operation
Flags Affected CO, OV, EQ, SN
Operation SUBADD16: 16 bit addition and subtraction
Out[31:16] = A[31:16] + ~B[31:16] + 1
Out[15:0] = A[15:0] + B[15:0]
Opcode
Bit Granularity 16 bit operation
Flags Affected CO, OV, EQ, SN
CS2212 Multiplier Output Mux Specification
The output mux for the Multiplier will be changing for the CS2212 in order to allow the A or B operand to be latched at the O register. This effectively bypasses the multiplication operation. This will require the addition of one bit to the 'muxmultlsmsel' field in the MULT CSM. The signal 'muxmultlsmsel' will select the input to the O register in the following fashion:
Figure imgf000032_0001
This functionality allows the user to employ the multiplier as a dynamically configurable routing resource when it is not being used for its primary function.
CS2212 Pipeline Register Specification
The CS2112 has registers that can be used as either mask registers or pipeline registers. In order to allow the user to use pipeline registers in the A and B operand paths and still use the mask registers, the CS2212 will include additional registers. These registers are inserted after the A and B inputs and will be referred to as 'apipe' and 'bpipe'. These registers can be bypassed by the signals 'muxapipe' and 'muxbpipe' respectively. Refer to the CS2212 DPU Block Diagram for the placement of these registers and muxes. The muxes are selected in the following fashion:
Figure imgf000033_0001
CS2212 LSM Write Data Mux Specification
The CS2212 will be able to write to the LSM with the shifter output, as well as with the ALU output. In order to facilitate this added functionality, a mux will be added to the LSM write data path. This mux will be referred to as 'muxlsmwd', and will be selected by 'muxlsmwdsel' in the following fashion:
Figure imgf000034_0001
Refer to the CS2212 Block Diagram for the placement of 'muxlsmwd'.
2.1 General Description
The fabric is reconfigurable and is controlled by config bits. The config bits are loaded into the fabric by issuing ARC instructions (through load/store); the config controller then transfers the config bits into the fabric's configuration plane.
The following table tells software which address each config signal corresponds to in the address space. The base address of the configuration has not yet been determined, so the addresses below start from 0.
2.2 Details
• The top 16 bits will be the embedded address.
• The top bit (bit[127]) is the parity bit. The hardware will check for even parity in each 128 bit line. For the behaviour of parity, please check the ARC extension spec by Dani.
• The addresses below are relative to some base_address.
• Each 128 bit line will carry 112 bits of config data.
• During config loading, the hardware can store at most one 112 bit line each cycle.
• Lines that are not needed for a certain configuration can be skipped.
• The slice address will not be embedded in the current address map and is erased from the current mapping for now. Users should refer to the ARC extension spec to see how to configure multiple slices in one operation.
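The line layout listed above can be illustrated with a small packing routine. This is a sketch under explicit assumptions, not the hardware definition: the embedded address is assumed to occupy the 15 bits below the parity bit (bits 126 to 112), the config data bits 111 to 0, and "even parity" is taken to mean that the full 128 bit line contains an even number of ones. The ARC extension spec referenced in the text governs the actual behaviour, and the function names are hypothetical.

```python
DATA_BITS = 112
ADDR_BITS = 15   # assumed: the top 16 bits minus the parity bit


def pack_config_line(address: int, data: int) -> int:
    """Build a 128 bit config line: [parity | address | 112 data bits]."""
    if not 0 <= address < (1 << ADDR_BITS):
        raise ValueError("address does not fit in the assumed 15 bit field")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data does not fit in 112 bits")
    line = (address << DATA_BITS) | data
    parity = bin(line).count("1") & 1   # set when the lower 127 bits hold an odd number of ones
    return (parity << 127) | line


def has_even_parity(line: int) -> bool:
    """Check that the whole line contains an even number of ones."""
    return bin(line).count("1") % 2 == 0
```

A loader built this way would call `pack_config_line` once per line it needs to write and could simply omit the lines that a given configuration does not use.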
2.3 Address Map
Figure imgf000035_0001
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
Figure imgf000039_0001
Figure imgf000041_0001
Input_mux Slice 0 PLA0 (each bit has individual enables)
• PLAin[0] is the multiplexer input associated with DPU0
• PLAin[1] is the multiplexer input associated with DPU0
• STATE0[n] is the primary state output of slice 0
• LSTATE0[n] is the lower state output of slice 0
• FLAG[n] is the primary flag output of slice 0
• LSTATE0[n] is the lower flag output of slice 0
• HBIT0[n] is the Horizontal State bus. HBIT0[n] needs an optional register before the input mux.
• IO[7:0] are 8 I/O bits coming from the pins associated with each slice.
• Bit 0 input, DATA, is the hortnet mux output from the datapath. Each tile has 8 hortnet muxes. The control will pick up the lower 16 bits of hortnet mux 7 from each tile. Notice that it is the value at the output of mux 7, which is the input of the tristate.
• Tile D is taken out, so the data it provides (STATE*[31:24], FLAG[27:21]...) are connected to ground.
The PLA_in_sel signals are 64-to-1 mux selects. There are 16 of them for each PLA (each bit is individually controlled).
Figure imgf000041_0002
• Horz_mux Slice 0 PLA0 (each bit has individual enables)
• HMUX[0] is one of 16 Horizontal State Lines.
• Each mux also needs a tristate enable.
Figure imgf000041_0003
Figure imgf000042_0001

Claims

WHAT IS CLAIMED IS:
1. A reconfigurable chip including:
Multiple reconfigurable functional units adapted to implement different functions, the reconfigurable functional units including multiplexers, at least one shifter unit and at least one arithmetic logic unit, the reconfigurable functional units being configured by a reconfigurable functional unit instruction, the instruction controlling the configuration of the multiplexers, shifter unit and arithmetic logic unit; and
Interconnect elements adapted to selectively connect together some of the reconfigurable functional units.
2. The reconfigurable chip of Claim 1 wherein the reconfigurable functional unit instruction is divided into a number of fields, including a multiplexer field, a shifter unit field and an arithmetic logic unit field.
3. The reconfigurable chip of Claim 1 wherein the reconfigurable functional unit comprised of the data path unit.
4. The reconfigurable chip of Claim 1 wherein the interconnect element are adapted to transfer word length data.
5. The reconfigurable chip of Claim 4 wherein the word length data are 32 bits long or greater.
6. The reconfigurable chip of Claim 1 further comprising an instruction memory storing multiple instructions for the reconfigurable functional units.
7. The reconfigurable chip of Claim 1 wherein the shifter unit is configurable with a number of modes.
8. The reconfigurable chip of Claim 7 wherein the reconfigurable functional unit instruction includes a shifter unit field which controls the mode of the shifter unit.
9. The reconfigurable chip of Claim 1 wherein at least one of the multiplexers is associated with a delay unit input and any input that bypasses the delay unit to implement variable delay system.
10. The reconfigurable chip of Claim 1 wherein the reconfigurable functional unit include registers for temporarily storing values within the reconfigurable functional unit.
11. A reconfigurable chip including:
Multiple reconfigurable functional units, the reconfigurable functional units including multiplexers, at least one shifter unit and at least one arithmetic logic unit, the shifter unit adapted to allow the arithmetic logic units to operate on different bits within the word-length input data of the reconfigurable functional unit; and
Interconnect elements adapted to selectively connect together some of the reconfigurable functional units, the interconnect elements adapted to transfer word-length data.
12. The reconfigurable chip of Claim 11 wherein the word length data are 32 bits or greater.
13. The reconfigurable chip of Claim 12 wherein the word length data are 32 bits long.
14. The reconfigurable chip of Claim 11 wherein the reconfigurable functional units are configured by reconfigurable functional unit instruction. The instruction controlling the configuration of the multiplexers, shifter unit and arithmetic logic unit.
15. The reconfigurable chip of Claim 11 wherein the reconfigurable chip further comprises an instruction memory storing multiple instructions for the reconfigurable functional unit.
16. The reconfigurable chip of Claim 11 wherein the shifter unit is configurable with a number of different modes.
17. The reconfigurable chip of Claim 11 wherein some of the multiplexers are associated with a delay unit input and an input that bypasses the delay unit.
18. A reconfigurable chip including:
Multiple reconfigurable functional units, the reconfigurable functional units including multiplexers, at least one shifter unit and at least one arithmetic logic unit, the reconfigurable functional units being configured by a reconfigurable functional unit instruction, the instruction controlling the configuration of the multiplexers, shifter unit and arithmetic control unit; and
An instruction memory storing multiple instructions for the reconfigurable functional units.
19. The reconfigurable chip of Claim 18 wherein an instruction memory is associated with each reconfigurable functional unit.
20. The reconfigurable chip of Claim 18 wherein the instruction memory is associated with a state machine producing an address for the instruction memory.
21. The reconfigurable chip of Claim 18 wherein the reconfigurable functional unit instruction includes fields for configuring the multiplexer, a shifter unit control field and an arithmetic logic unit control field.
22. The reconfigurable chip of Claim 18 further comprising an interconnect elements adapted to selectively connect together some of the reconfigurable functional units.
23. The reconfigurable chip of Claim 22 wherein the interconnect units adapted to transfer word length data.
24. The reconfigurable chip of Claim 18 wherein the shifter unit is configurable with a number of modes.
25. The reconfigurable chip of Claim 24 wherein the shifter unit is controlled by a shifter unit field, the reconfigurable unit instruction.
26. The reconfigurable chip of Claim 18 wherein at least one of the multiplexers is associated with a delay unit input and an input that bypasses the delay unit so that a variable delay can be implemented.
27. A reconfigurable chip including:
Multiple reconfigurable functional units, the reconfigurable functional units including multiplexers, at least one shifter unit and at least one arithmetic logic unit, the shifter unit being configurable with a number of modes; and Interconnect elements adapted to selectively connect together some of the reconfigurable functional units.
28. The reconfigurable of Claim 27 wherein the shifter modes include modes other than logical and arithmetic left and right shifts.
29. The reconfigurable chip of Claim 27 wherein at least one mode rearranges blocks of the input word.
30. The reconfigurable chip of Claim 27 wherein one of the modes comprises a constant generation.
31. The reconfigurable chip of Claim 27 wherein one of the modes comprises the duplication of one set of bits to another set of bits.
32. The reconfigurable chip of Claim 27 wherein one of the modes comprises swapping some of the groups of bits with other groups of bits.
33. The reconfigurable chip of Claim 27 wherein the reconfigurable functional units are configured by reconfigurable functional unit instructions. The reconfigurable functional unit instruction configuring the arithmetic logic unit, shifter unit and multiplexers.
34. The reconfigurable chip of Claim 33 wherein the reconfigurable functional unit instruction includes a field for controlling the shifter unit which controls the mode of the shifter unit.
35. The reconfigurable chip of Claim 27 wherein the interconnect elements are adapted to transfer word length data.
36. The reconfigurable chip of Claim 27 wherein the further comprising instruction memory storing instructions for the reconfigurable functional unit.
37. The reconfigurable chip of Claim 27 wherein at least one of the multiplexers is associated with the delay input unit and an input that bypasses the delay unit so as to implement a variable delay.
38. A reconfigurable chip including:
Multiple reconfigurable functional units, the reconfigurable functional units including multiplexers, at least one shifter unit and at least one arithmetic logic unit, wherein at least one of the multiplexers are associated with a delay unit input and an input that bypasses the delay unit ; and
Interconnect elements adapted to selectively connect together some of the reconfigurable functional units.
39. The reconfigurable chip of Claim 38 wherein the reconfigurable functional units are reconfigured by a reconfigurable functional unit instruction, the instruction controlling the configuration of the multiplexer, shift unit and arithmetic logic unit.
40. The reconfigurable chip of Claim 39 wherein the reconfigurable functional unit instruction includes a number of different fields for controlling the configuration of the multiplexers, shifter unit and arithmetic logic unit.
41. The reconfigurable chip of Claim 39 wherein a field of an instruction for the reconfigurable functional unit indicates the mode of the abstract.
42. The reconfigurable chip of Claim 38 wherein the interconnect elements are adapted to transfer word length data.
43. The reconfigurable chip of Claim 38 wherein further comprising instruction memory storing multiple instructions for the reconfigurable functional units.
44. The reconfigurable chip of Claim 38 wherein the reconfigurable functional units include a shifter unit configurable with a number of different modes.
PCT/US2002/011870 2001-05-02 2002-05-02 Efficient high performance data operation element for use in a reconfigurable logic environment WO2002103518A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
DE10296742T DE10296742T5 (en) 2001-05-02 2002-05-02 Efficient high performance data operational element for use in a reconfigurable logic environment
JP2003505770A JP2004531149A (en) 2001-05-02 2002-05-02 Efficient performance data operation element for use in repositionable logical environment
GB0327399A GB2398653A (en) 2001-05-02 2002-05-02 Efficient high performance data operation element for use in a reconfigurable logic environment
KR1020037014350A KR100628448B1 (en) 2001-05-02 2002-05-02 Efficient high performance data operation element for use in a reconfigurable logic environment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US28829801P 2001-05-02 2001-05-02
US60/288,298 2001-05-02

Publications (1)

Publication Number Publication Date
WO2002103518A1 true WO2002103518A1 (en) 2002-12-27

Family

ID=23106530

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/011870 WO2002103518A1 (en) 2001-05-02 2002-05-02 Efficient high performance data operation element for use in a reconfigurable logic environment

Country Status (7)

Country Link
US (1) US20030088757A1 (en)
JP (1) JP2004531149A (en)
KR (1) KR100628448B1 (en)
CN (1) CN1860441A (en)
DE (1) DE10296742T5 (en)
GB (1) GB2398653A (en)
WO (1) WO2002103518A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006042736A1 (en) * 2004-10-18 2006-04-27 Nuyens Hildegarde Francisca Fe Reconfigurable, modular and hierarchical parallel processor system
WO2006092556A2 (en) * 2005-03-03 2006-09-08 Clearspeed Technology Plc Reconfigurable logic in processors
WO2006092556A3 (en) * 2005-03-03 2006-12-21 Clearspeed Technology Plc Reconfigurable logic in processors

Also Published As

Publication number Publication date
JP2004531149A (en) 2004-10-07
DE10296742T5 (en) 2004-04-29
GB2398653A (en) 2004-08-25
GB0327399D0 (en) 2003-12-31
KR100628448B1 (en) 2006-09-26
CN1860441A (en) 2006-11-08
US20030088757A1 (en) 2003-05-08
KR20040005944A (en) 2004-01-16

Similar Documents

Publication Publication Date Title
KR100628448B1 (en) Efficient high performance data operation element for use in a reconfigurable logic environment
US6591357B2 (en) Method and apparatus for configuring arbitrary sized data paths comprising multiple context processing elements
US6023742A (en) Reconfigurable computing architecture for providing pipelined data paths
US7266672B2 (en) Method and apparatus for retiming in a network of multiple context processing elements
US6745317B1 (en) Three level direct communication connections between neighboring multiple context processing elements
US5915123A (en) Method and apparatus for controlling configuration memory contexts of processing elements in a network of multiple context processing elements
US6108760A (en) Method and apparatus for position independent reconfiguration in a network of multiple context processing elements
US5659785A (en) Array processor communication architecture with broadcast processor instructions
US5805486A (en) Moderately coupled floating point and integer units
Enzler et al. Virtualizing hardware with multi-context reconfigurable arrays
US6052773A (en) DPGA-coupled microprocessors
US6668316B1 (en) Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file
US9564902B2 (en) Dynamically configurable and re-configurable data path
US8046564B2 (en) Reconfigurable paired processing element array configured with context generated each cycle by FSM controller for multi-cycle floating point operation
JP2008537268A (en) An array of data processing elements with variable precision interconnection
US5481736A (en) Computer processing element having first and second functional units accessing shared memory output port on prioritized basis
JP2005539293A (en) Programmable pipeline fabric using a partially global configuration bus
US5848284A (en) Method of transferring data between moderately coupled integer and floating point units
JP4451733B2 (en) Semiconductor device
US6282558B1 (en) Data processing system and register file
US20090113083A1 (en) Means of control for reconfigurable computers
Srini et al. Parallel DSP with memory and I/O processors
US20080263322A1 (en) Mac architecture for pipelined accumulations
US20080155138A1 (en) Datapipe cpu register array
US7685332B2 (en) Datapipe CPU register array and methods of use

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

ENP Entry into the national phase

Ref document number: 0327399

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20020502

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 2003505770

Country of ref document: JP

WWE WIPO information: entry into national phase

Ref document number: 1020037014350

Country of ref document: KR

WWE WIPO information: entry into national phase

Ref document number: 028133811

Country of ref document: CN

RET DE translation (DE OG part 6b)

Ref document number: 10296742

Country of ref document: DE

Date of ref document: 20040429

Kind code of ref document: P

WWE WIPO information: entry into national phase

Ref document number: 10296742

Country of ref document: DE

122 EP: PCT application non-entry in European phase
REG Reference to national code

Ref country code: DE

Ref legal event code: 8607