US20010042193A1 - Data processing unit with interface for sharing registers by a processor and a coprocessor

Info

Publication number
US20010042193A1
Authority
US
United States
Prior art keywords
coprocessor
instruction
data processing
processing unit
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/189,111
Other versions
US6434689B2 (en)
Inventor
Rod G. Fleck
Roger D. Arnold
Bruce Holmer
Danielle G. Lemay
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Infineon Technologies AG
Original Assignee
Infineon Technologies North America Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Infineon Technologies North America Corp
Assigned to SIEMENS MICROELECTRONICS, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HOLMER, BRUCE K., FLECK, ROD G., ARNOLD, ROGER D., LEMAY, DANIELLE G.
Priority to US09/189,111 (patent US6434689B2)
Priority to EP99116655A (patent EP1001335B1)
Priority to DE69903704T (patent DE69903704D1)
Publication of US20010042193A1
Publication of US6434689B2
Application granted
Assigned to INFINEON TECHNOLOGIES CORPORATION: CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: SIEMENS DRAM SEMICONDUCTOR CORPORATION
Assigned to SIEMENS DRAM SEMICONDUCTOR CORPORATION: TRANSFER OF ASSETS. Assignors: SMI HOLDING LLC
Assigned to SMI HOLDING LLC: MERGER (SEE DOCUMENT FOR DETAILS). Assignors: SIEMENS MICROELECTRONICS, INC.
Assigned to INFINEON TECHNOLOGIES NORTH AMERICA CORP.: CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: INFINEON TECHNOLOGIES CORPORATION
Assigned to INFINEON TECHNOLOGIES AG: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INFINEON TECHNOLOGIES NORTH AMERICA CORP.
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3893 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
    • G06F9/3895 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
    • G06F9/3897 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3877 Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units

Abstract

A data processing unit is described comprising a register file, a memory, a plurality of execution units, a pipeline configuration for processing instructions having a fetch stage for fetching an instruction from said memory, a decode stage for decoding an operational code from said instruction, an execution stage for activating one of said execution units, and a write-back stage for writing back from said execution unit, a coprocessor interface for coupling at least one coprocessor. The data processing unit has read- and write-lines coupling said register file with said coprocessor for exchanging operands, at least one control line indicating that said coprocessor is busy, a plurality of control lines from said decode stage for controlling said coprocessor which are operated upon detection of a coprocessor instruction, whereby said coprocessor is using said registers from said register file during execution of a coprocessor instruction.

Description

    BACKGROUND OF THE INVENTION
  • The present invention relates to a data processing unit with a coprocessor interface. A coprocessor is used in a data processing system to perform special tasks, such as floating point operations, digital signal processing, etc. Many data processors are capable of working in combination with a coprocessor. Usually, a main processor addresses a coprocessor through the system bus. If the main processor decodes a coprocessor instruction, it transfers, for example by means of an exception routine, the coprocessor instruction and respective data to a coprocessor which performs the instruction and transfers back a result to the main processor. During execution of the coprocessor, the main processor usually is set in a wait state. [0001]
  • U.S. Pat. No. 5,603,047 describes such a system. FIG. 7 of U.S. Pat. No. 5,603,047 shows a block diagram of such a coprocessor having 24 registers. A coprocessor instruction has a specific format which is detected during the decode stage of the pipeline shown in FIG. 2 of U.S. Pat. No. 5,603,047. The respective coprocessor instructions are described in column 20 of U.S. Pat. No. 5,603,047. They include instructions for loading and storing data and control from or to the coprocessor. The coprocessor may be able to perform a variety of functions, each implemented by a program that is selected through a respective address transferred to the coprocessor. The coprocessor executes these programs and, when finished, the respective results can be transferred to the main processor through respective transfer instructions. [0002]
    SUMMARY OF THE INVENTION
  • In many applications high speed processing of data is necessary. Therefore, there exists a high demand for performing certain tasks within a single cycle of the system clock. Most instructions of known microprocessors or microcontrollers can be executed within a single cycle due to superscalar and superpipeline techniques. Nevertheless, many special instructions are either not available on, for example, reduced instruction set computers, or need a plurality of execution cycles. Even with the addition of coprocessors these tasks cannot be executed in the required time due to cumbersome transfer protocols between the main processor and a coprocessor. [0003]
  • Therefore, it is an object of the present invention to provide a data processing unit with a coprocessor interface to overcome the above mentioned problems. [0004]
  • This object is achieved according to the present invention by a data processing unit comprising a register file, a memory, a plurality of execution units, a pipeline configuration for processing instructions having a fetch stage for fetching an instruction from said memory, a decode stage for decoding an operational code from the instruction, an execution stage for activating one of the execution units, and a write-back stage for writing back from the execution unit, and a coprocessor interface for coupling at least one coprocessor. The data processing unit has read- and write-lines coupling the register file with the coprocessor for exchanging operands, at least one control line indicating that the coprocessor is busy, and a plurality of control lines from the decode stage for controlling said coprocessor, which are operated upon detection of a coprocessor instruction. The coprocessor uses the registers from the register file during execution of a coprocessor instruction. [0005]
  • Because the coprocessor uses the register file of the main processor, it can execute instructions as fast as any execution unit, such as the arithmetic logic unit, a shifter, a load/store unit, etc. A coprocessor instruction is decoded and executed in the same manner as any other instruction. [0006]
  • In a further embodiment a field programmable gate array (FPGA) is used as a coprocessor. Thus, a wide variety of additional instructions can be executed, whereby the instruction variety can be expanded dynamically by means of reprogramming the FPGA. [0007]
    BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows a block diagram of the relevant parts of a data processing unit including a coprocessor interface according to the present invention, [0008]
  • FIG. 2 shows the format of a coprocessor instruction, [0009]
  • FIG. 3 shows a block diagram of an embodiment of a single coprocessor, and [0010]
  • FIG. 4 shows a block diagram of an embodiment of four coprocessors. [0011]
    DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • FIG. 1 shows a memory cache subsystem 1 coupled through a bus with a register file 2. Register file 2 contains an align unit 201, address buffer 202 and data buffer 207, address registers 203 and data registers 208, address forwarding unit 204 and data forwarding unit 209, address write-back buffer 205 and data write-back buffer 210, and a control unit 206. In the preferred embodiment only the data registers are interfaced with the coprocessor. Therefore, only the most relevant connecting lines are shown in FIG. 1 for the sake of a better overview. Nevertheless, any kind of register from a register file can be used to interface with the coprocessor interface. The data registers 208 are coupled through data buffer 207 and align unit 201 with the cache memory subsystem 1. [0012]
  • To interface with the different execution units 3a, ... 3n, three different read busses are provided. The first read bus 211 comprises 64 bit lines, the second read bus 212 has 32 bit lines, and the third read bus 213 also provides 32 bit lines. Of course, the number of bit lines per read port is freely selectable and depends on the instruction set. Furthermore, a write bus 214 having 64 bit lines is provided. These four busses 211, 212, 213, and 214 allow read and write access to the respective data registers 208 of the register file 2. An instruction fetch unit 5 provides instructions to a following instruction decoder 6. The instruction decoder 6 provides all execution units with respective operational codes and selects the respective registers 203, 208 in the register file 2. A coprocessor interface 7 is provided which is coupled with the four busses 211, 212, 213, and 214. Furthermore, coprocessor interface 7 is coupled through busses 61 and 62 with instruction decoder 6. Bus 61 can have n instruction lines for providing operational code and other information. In addition, bus 62 has m control lines to provide the pipeline with status information from the coprocessors. [0013]
  • The control bus 61, 62 can have the following functionality: One line can indicate a valid instruction, which would be asserted when the integer pipeline is valid. Another line or set of lines could be provided for an instruction sequencer. Depending on the number of instruction cycles needed, a 2-bit, 3-bit, 4-bit, etc. wide bus would be provided. A further line can indicate a multi-cycle start, which would be activated by the coprocessor to indicate that the instruction in the coprocessor decoder is a multi-cycle instruction. Yet another line would be activated by the coprocessor to indicate the end of a multi-cycle instruction, signaling the last re-inject of the instruction. Also, a multi-cycle continue control line can be provided, which would be activated by the coprocessor to re-inject an instruction during the multi-cycle start and end phases. To indicate an invalid opcode, a further control line may be provided. Further control lines indicate which coprocessor has to be enabled; for example, two lines can address four different coprocessors. Other control signals may be provided depending on the structure of the coprocessor unit. [0014]
  • The embodiment according to FIG. 1 shows three coprocessors. The number of coprocessors which can be added to the system internally or externally depends on the instruction size of the data processing unit, as will be explained later. The first coprocessor 4a in this embodiment is a floating point coprocessor. The second coprocessor 4b is a fuzzy logic coprocessor, and the third coprocessor is a re-programmable coprocessor in the form of an FPGA. All coprocessors are coupled with the six busses 211, 212, 213, 214, 61, and 62 through interface 7. [0015]
  • FIG. 2 shows two possible formats A and B of a coprocessor instruction. In this embodiment an instruction is 32 bits long, and the bit fields indicating a coprocessor instruction can be one or both of the opcode fields OPCODE1, OPCODE2 and OP1, OP2, respectively. The bit field D indicates the destination in the form of a register number to which the result of the respective instruction will be written. The bit field # indicates the number of the coprocessor that executes the instruction defined in the opcode bit field. Bit fields S1, S2, S3 contain either a data register number or immediate data for the respective instruction. In this embodiment each of the bit fields S1, S2, S3, and D is 4 bits wide, and the OPCODE field comprises 12 or 16 bits. The # field has 2 bits, and 2 further bits, indicated as “--”, are unused in both instruction formats A and B. [0016]
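The field widths given above can be illustrated with a small Python sketch. FIG. 2 is not reproduced in this text, so the bit positions used here are purely hypothetical; only the widths (a 12-bit opcode, four 4-bit fields, a 2-bit # field, and 2 unused bits, 32 bits in total) come from the description. The names `CoprocInstr`, `bits`, `decode`, and `encode` are illustrative assumptions.

```python
from typing import NamedTuple


class CoprocInstr(NamedTuple):
    opcode: int  # 12-bit operation code
    copro: int   # "#" field: number of the coprocessor that executes the instruction
    d: int       # destination register number
    s1: int      # source register number or immediate data
    s2: int
    s3: int


def bits(word, hi, lo):
    """Return bits hi..lo (inclusive, bit 0 = least significant) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word):
    # Hypothetical field positions; only the field widths are taken from the text.
    return CoprocInstr(
        opcode=bits(word, 31, 20),
        copro=bits(word, 17, 16),  # bits 19..18 stand in for the unused "--" bits
        d=bits(word, 15, 12),
        s3=bits(word, 11, 8),
        s2=bits(word, 7, 4),
        s1=bits(word, 3, 0),
    )


def encode(instr):
    """Inverse of decode() for the same assumed layout."""
    return (
        (instr.opcode << 20)
        | (instr.copro << 16)
        | (instr.d << 12)
        | (instr.s3 << 8)
        | (instr.s2 << 4)
        | instr.s1
    )
```

With 4-bit register fields, each of S1, S2, S3, and D can name one of 16 registers, and the 2-bit # field can select one of four coprocessors.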
  • Instruction fetch unit 5 provides instruction decoder 6 with an instruction from an instruction stream. Instruction decoder 6 determines whether an instruction is designated to a coprocessor by means of the bit fields OPCODE1, OPCODE2 and OP1, OP2, respectively. After decoding of an instruction, the coprocessor indicated in the bit field # receives the respective instruction stored in the opcode bit fields and, where applicable, immediate data from one or more of the bit fields S1, S2, S3 through bus 61, and the contents of the data registers selected in bit fields S1, S2, and S3 through the three data read busses 211, 212, and 213. In the following execution cycle the coprocessor executes the instruction decoded by the instruction decoder, and during the write-back cycle it writes the respective result back to the data register designated in bit field D. Thus, execution of a coprocessor instruction can be as fast as execution by any of the execution units. No transfers from or to registers delay the execution of a special instruction, because the respective coprocessor does not need its own registers. Nevertheless, a coprocessor may have additional registers which contain data that need not be accessible by the data processing unit. [0017]
  • A conventional coprocessor, on the other hand, usually needs to be initialized by transferring data to the coprocessor, configuring the coprocessor, and transferring the respective instruction to the coprocessor. This creates an overhead affecting the overall speed of the system. Thus, a known coprocessor will stall the respective pipelines for a plurality of cycles. The coprocessor according to the present invention does not need these steps. It can operate directly with the register file of the main CPU. Transfer of data is similar to the transfer of data to regular execution units. Thus, every instruction which can be executed in a single cycle can be executed in parallel with another pipeline or multiple pipelines. In the embodiment of FIG. 1 this would be the load/store pipeline coupled with the address register file 203 and the units 202, 204, 205. The pipelines only get stalled by a multi-cycle instruction, in a similar manner as would occur with any execution unit of the central processing unit. For this purpose, the control lines described above, indicating a multi-cycle start, a multi-cycle end, and a multi-cycle continuation, are used. [0018]
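The decode, execute, and write-back flow described above can be sketched in Python as follows. This is a simplified behavioral model under stated assumptions, not the patented hardware: it assumes 16 data registers of 32 bits each (suggested by the 4-bit register fields, but not stated in the text), models each coprocessor as a plain callable, and uses the hypothetical names `DataProcessingUnit`, `issue`, and `add3`.

```python
REG_MASK = 0xFFFFFFFF  # assumed 32-bit data registers


def add3(opcode, a, b, c):
    """Hypothetical coprocessor function: adds its three operands."""
    return a + b + c


class DataProcessingUnit:
    def __init__(self, coprocessors):
        self.data_regs = [0] * 16         # shared register file (assumed size)
        self.coprocessors = coprocessors  # indexed by the 2-bit "#" field

    def issue(self, copro, opcode, d, s1, s2, s3):
        # Decode stage: select the source registers; their contents travel
        # over the three read busses to the selected coprocessor.
        operands = (self.data_regs[s1], self.data_regs[s2], self.data_regs[s3])
        # Execute stage: the coprocessor works directly on the operand values,
        # with no separate transfer into coprocessor-owned registers.
        result = self.coprocessors[copro](opcode, *operands)
        # Write-back stage: the result goes to the register named by field D.
        self.data_regs[d] = result & REG_MASK


dpu = DataProcessingUnit({0: add3})
dpu.data_regs[1:4] = [10, 20, 30]
dpu.issue(copro=0, opcode=0, d=4, s1=1, s2=2, s3=3)
```

After the call, register 4 holds the sum of registers 1 to 3. The coprocessor never keeps its own copy of the operands, which is the property the description relies on for single-cycle execution.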
  • Using an FPGA as a coprocessor provides additional benefits. Depending on the specific task, a microcontroller system using a data processing unit according to the present invention is programmed initially. The FPGA may then be re-programmed dynamically and adapted to each specific task of a complex program. For example, an instruction for performing a convolution operation is not available in standard instruction sets of either a RISC or a CISC processor. Such an instruction forms, for example, a 32 bit long word out of two 16 bit words by alternately concatenating the bits of each input word. For example, if the first input word contains only “1111 . . . 111” and the second input word contains only “0”, the result would be a 32 bit word with alternating “0” and “1”. In other words, the resulting word consists of bit 16 of the first word, followed by bit 16 of the second word, followed by bit 15 of the first word, and so on. To perform such an operation, a plurality of instructions has to be executed in a conventional microprocessor system. An FPGA can easily be programmed to couple a multiplexer or respective logic with the input and output lines to perform this task in a single cycle. Because such an instruction can be performed with the registers of the data processing unit, no additional transfers are necessary. [0019]
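The bit interleaving described above can be written out in software as a minimal Python sketch, which also shows why a conventional processor needs many instructions for it. The function name `interleave16` is an illustrative assumption; the text numbers bits from 1 to 16, while the code uses the usual 0 to 15 indexing.

```python
def interleave16(first, second):
    """Interleave two 16-bit words into one 32-bit word, most significant bit first."""
    result = 0
    for i in range(15, -1, -1):  # index 15 corresponds to "bit 16" in the text
        result = (result << 1) | ((first >> i) & 1)   # next bit of the first word
        result = (result << 1) | ((second >> i) & 1)  # next bit of the second word
    return result


# First word all ones, second word all zeros: alternating bit pattern.
pattern = interleave16(0xFFFF, 0x0000)  # 0xAAAAAAAA
```

The loop performs 32 shift-and-mask steps, whereas in hardware the same function is fixed wiring from input lines to output lines, which is why an FPGA can complete it in a single cycle.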
  • [0020] The embodiment of a coprocessor interface according to the present invention provides three data read busses 211, 212, and 213 and one write-back bus 214. Thus, digital signal processing functionality can be provided by the coprocessors. For example, a single instruction can multiply two operands and add the result to a third operand. The final result is written into a designated register. All three operands can be transferred to the respective coprocessor during the decode cycle, and the result can be written back to the destination register during the write-back cycle.
  • [0021] FIG. 3 shows the main blocks of a coprocessor 4 coupled with a coprocessor interface according to the invention. Each coprocessor may have a decode unit 41 which receives the respective coprocessor instruction from the CPU. Decode unit 41 decodes the instruction, for example, bits 16 to 23 of an instruction as shown in FIG. 2. Decode unit 41 then supplies the respective control signals to an execute unit 42 coupled with it. Execute unit 42 may contain multiplexers, adders, shifters, etc. connected in a way to perform the respective functions. The control signals provided by decode unit 41 activate the respective units to operate in a predetermined way. The result is passed to the coprocessor interface, which couples the result bus to the write-back bus of the integer pipeline. Thus, the coprocessor behaves in a similar way to an additional execution unit as shown in FIG. 1.
  • [0022] FIG. 4 shows a solution where multiple execution units 43, 44, 45, and 46 share the same decode unit 41. Decode unit 41 decodes the respective coprocessor instruction and selects one of the execution units 43, 44, 45, or 46, which performs the respective function. The result is again written back through interface 7 into the register file.
  • [0023] If a coprocessor needs a longer execution time, the pipeline of the data processing unit needs to be stalled. Thus, additional control lines 62 are provided which supply information from the coprocessors to the pipeline as described above. For example, a coprocessor executing an instruction which needs a plurality of system cycles sends a busy signal through bus 62 to the instruction decode unit 6 to stall the pipeline.
  • [0024] The coprocessor interface includes all necessary buffers and logic to feed the necessary signals from or to the coprocessors. Thus, the coprocessors according to the present invention can be coupled with the coprocessor interface 7 either on-chip or externally. In the preferred embodiment the coprocessors are coupled with the integer pipeline. In different embodiments with different pipeline structures the coprocessor interface can also be coupled with a different type of pipeline or with more than one pipeline. Thus, two or more coprocessors could operate in parallel.
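
The register-sharing behaviour described in the paragraphs above can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the patented hardware. The opcode position (bits 16 to 23) is taken from the description; the register-index fields, the opcode values, the three-cycle latency of the third operation, and all function names are hypothetical and chosen purely for illustration.

```python
MASK32 = 0xFFFFFFFF


def interleave16(a, b):
    """Alternate the bits of two 16-bit words, most significant bit of `a` first."""
    result = 0
    for i in range(15, -1, -1):
        result = (result << 1) | ((a >> i) & 1)
        result = (result << 1) | ((b >> i) & 1)
    return result


def multiply_add(a, b, c):
    """Multiply two operands and add a third, truncated to 32 bits."""
    return (a * b + c) & MASK32


# Hypothetical opcode table: opcode -> (operation, latency in cycles).
OPS = {
    0x01: (lambda a, b, c: interleave16(a & 0xFFFF, b & 0xFFFF), 1),
    0x02: (multiply_add, 1),
    0x03: (multiply_add, 3),  # same function, modelled as a multi-cycle instruction
}


def encode(opcode, rd, ra, rb, rc):
    """Pack an instruction word; only the opcode position follows the description."""
    return (opcode << 16) | (rd << 12) | (rc << 8) | (rb << 4) | ra


def decode(word):
    """Return (opcode, rd, ra, rb, rc); the opcode is taken from bits 16 to 23."""
    return ((word >> 16) & 0xFF, (word >> 12) & 0xF,
            word & 0xF, (word >> 4) & 0xF, (word >> 8) & 0xF)


def run(program, regs):
    """Execute instruction words against the shared register file.

    Returns the total cycle count, including stall cycles caused by
    multi-cycle instructions (the modelled 'busy' signal).
    """
    cycles = 0
    for word in program:
        opcode, rd, ra, rb, rc = decode(word)
        operation, latency = OPS[opcode]          # decode unit selects the unit
        a, b, c = regs[ra], regs[rb], regs[rc]    # three read busses
        cycles += 1 + (latency - 1)               # extra cycles model the stall
        regs[rd] = operation(a, b, c)             # single write-back bus
    return cycles


regs = [0] * 16
regs[1], regs[2] = 0xFFFF, 0x0000
regs[3], regs[4], regs[5] = 3, 4, 5
program = [
    encode(0x01, rd=0, ra=1, rb=2, rc=0),  # bit interleave
    encode(0x02, rd=6, ra=3, rb=4, rc=5),  # single-cycle multiply-add
    encode(0x03, rd=7, ra=3, rb=4, rc=5),  # multi-cycle multiply-add
]
cycles = run(program, regs)
print(hex(regs[0]), regs[6], regs[7], cycles)  # 0xaaaaaaaa 17 17 5
```

In this model no operand is copied into coprocessor-private storage: each operation reads up to three registers and writes one result back, and only the multi-cycle opcode adds stall cycles to the count.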

Claims (7)

claims:
1. Data processing unit comprising:
a register file,
a memory,
a plurality of execution units,
a pipeline configuration for processing instructions having a fetch stage for fetching an instruction from said memory, a decode stage for decoding an operational code from said instruction, an execution stage for activating one of said execution units, and a write-back stage for writing back from said execution unit,
a coprocessor interface for coupling at least one coprocessor with said data processing unit having:
read- and write-lines coupling said register file with said coprocessor for exchanging operands,
at least one control line indicating that said coprocessor is busy, and
a plurality of control lines from said decode stage for controlling said coprocessor which are operated upon detection of a coprocessor instruction,
whereby said coprocessor is using said registers from said register file during execution of a coprocessor instruction.
2. Data processing unit according to claim 1, wherein said read- and write-lines include a plurality of read lines to read at least two operands from said register file and a plurality of write lines to write-back at least one operand.
3. Data processing unit according to claim 1, wherein each coprocessor instruction contains a bit field indicating the respective coprocessor and a bit field indicating the operational code for said coprocessor.
4. Data processing unit according to claim 1, wherein said pipeline execution is stalled upon a busy signal from said coprocessor.
5. Data processing unit according to claim 1, further comprising programming means for programming a programmable gate array and wherein said coprocessor is formed by a programmable gate array.
6. Data processing unit according to claim 1, wherein the coprocessor comprises a decode unit for decoding said coprocessor instruction and at least one execution unit for executing said coprocessor instruction.
7. Data processing unit according to claim 6, wherein the coprocessor comprises a plurality of execution units and said decode unit selects one of the execution units upon said coprocessor instruction.
US09/189,111 1998-11-09 1998-11-09 Data processing unit with interface for sharing registers by a processor and a coprocessor Expired - Lifetime US6434689B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US09/189,111 US6434689B2 (en) 1998-11-09 1998-11-09 Data processing unit with interface for sharing registers by a processor and a coprocessor
EP99116655A EP1001335B1 (en) 1998-11-09 1999-08-26 Data processing unit with coprocessor interface
DE69903704T DE69903704D1 (en) 1998-11-09 1999-08-26 Data processing unit with a coprocessor interface

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/189,111 US6434689B2 (en) 1998-11-09 1998-11-09 Data processing unit with interface for sharing registers by a processor and a coprocessor

Publications (2)

Publication Number Publication Date
US20010042193A1 (en) 2001-11-15
US6434689B2 (en) 2002-08-13

Family

ID=22695981

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/189,111 Expired - Lifetime US6434689B2 (en) 1998-11-09 1998-11-09 Data processing unit with interface for sharing registers by a processor and a coprocessor

Country Status (3)

Country Link
US (1) US6434689B2 (en)
EP (1) EP1001335B1 (en)
DE (1) DE69903704D1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6505290B1 (en) * 1997-09-05 2003-01-07 Motorola, Inc. Method and apparatus for interfacing a processor to a coprocessor
JP2001092662A (en) * 1999-09-22 2001-04-06 Toshiba Corp Processor core and processor using the same
JP2002041489A (en) * 2000-07-25 2002-02-08 Mitsubishi Electric Corp Synchronizing signal generation circuit, processor system using the same and synchronizing signal generating method
JP4125475B2 (en) * 2000-12-12 2008-07-30 株式会社東芝 RTL generation system, RTL generation method, RTL generation program, and semiconductor device manufacturing method
US6886092B1 (en) * 2001-11-19 2005-04-26 Xilinx, Inc. Custom code processing in PGA by providing instructions from fixed logic processor portion to programmable dedicated processor portion
US7600096B2 (en) * 2002-11-19 2009-10-06 Stmicroelectronics, Inc. Coprocessor extension architecture built using a novel split-instruction transaction model
EP1573571A2 (en) * 2002-12-12 2005-09-14 Koninklijke Philips Electronics N.V. Modular integration of an array processor within a system on chip
US20050071830A1 (en) * 2003-09-30 2005-03-31 Starcore, Llc Method and system for processing a sequence of instructions
US7441106B2 (en) * 2004-07-02 2008-10-21 Seagate Technology Llc Distributed processing in a multiple processing unit environment
JP2006048661A (en) * 2004-07-06 2006-02-16 Matsushita Electric Ind Co Ltd Processor system for controlling data transfer between processor and coprocessor
US7395410B2 (en) * 2004-07-06 2008-07-01 Matsushita Electric Industrial Co., Ltd. Processor system with an improved instruction decode control unit that controls data transfer between processor and coprocessor
US7590822B1 (en) 2004-08-06 2009-09-15 Xilinx, Inc. Tracking an instruction through a processor pipeline
US7546441B1 (en) * 2004-08-06 2009-06-09 Xilinx, Inc. Coprocessor interface controller
US7590823B1 (en) 2004-08-06 2009-09-15 Xilinx, Inc. Method and system for handling an instruction not supported in a coprocessor formed using configurable logic
US7346759B1 (en) 2004-08-06 2008-03-18 Xilinx, Inc. Decoder interface
US7587579B2 (en) * 2004-12-28 2009-09-08 Ceva D.S.P. Ltd. Processor core interface for providing external hardware modules with access to registers of the core and methods thereof
JP3867804B2 (en) * 2005-03-22 2007-01-17 セイコーエプソン株式会社 Integrated circuit device
US20060230213A1 (en) * 2005-03-29 2006-10-12 Via Technologies, Inc. Digital signal system with accelerators and method for operating the same
US20070168646A1 (en) * 2006-01-17 2007-07-19 Jean-Francois Collard Data exchange between cooperating processors
JP2007200180A (en) * 2006-01-30 2007-08-09 Nec Electronics Corp Processor system
JP2008310693A (en) * 2007-06-15 2008-12-25 Panasonic Corp Information processor
US7996656B2 (en) * 2007-09-25 2011-08-09 Intel Corporation Attaching and virtualizing reconfigurable logic units to a processor
US20090183161A1 (en) * 2008-01-16 2009-07-16 Pasi Kolinummi Co-processor for stream data processing
EP2525286A1 (en) 2011-05-17 2012-11-21 Nxp B.V. Co-processor interface
CN102750127B (en) * 2012-06-12 2015-06-24 清华大学 Coprocessor
US11263014B2 (en) * 2019-08-05 2022-03-01 Arm Limited Sharing instruction encoding space between a coprocessor and auxiliary execution circuitry

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4547849A (en) 1981-12-09 1985-10-15 Glenn Louie Interface between a microprocessor and a coprocessor
US5021991A (en) * 1983-04-18 1991-06-04 Motorola, Inc. Coprocessor instruction format
JPS62214464A (en) * 1986-03-17 1987-09-21 Hitachi Ltd Coprocessor coupling system
JPS63261449A (en) * 1987-04-20 1988-10-28 Hitachi Ltd Data processor
JPH01147656A (en) * 1987-12-03 1989-06-09 Nec Corp Microprocessor
JP2741867B2 (en) * 1988-05-27 1998-04-22 株式会社日立製作所 Information processing system and processor
JPH0343827A (en) * 1989-07-12 1991-02-25 Omron Corp Fuzzy microcomputer
US5185872A (en) * 1990-02-28 1993-02-09 Intel Corporation System for executing different cycle instructions by selectively bypassing scoreboard register and canceling the execution of conditionally issued instruction if needed resources are busy
US5347181A (en) 1992-04-29 1994-09-13 Motorola, Inc. Interface control logic for embedding a microprocessor in a gate array
EP0651321B1 (en) 1993-10-29 2001-11-14 Advanced Micro Devices, Inc. Superscalar microprocessors
FR2719926B1 (en) * 1994-05-10 1996-06-07 Sgs Thomson Microelectronics Electronic circuit and method of using a coprocessor.
US5507000A (en) 1994-09-26 1996-04-09 Bull Hn Information Systems Inc. Sharing of register stack by two execution units in a central processor
JP2987308B2 (en) * 1995-04-28 1999-12-06 松下電器産業株式会社 Information processing device
US5752071A (en) * 1995-07-17 1998-05-12 Intel Corporation Function coprocessor
US5603047A (en) * 1995-10-06 1997-02-11 Lsi Logic Corporation Superscalar microprocessor architecture
US5713039A (en) * 1995-12-05 1998-01-27 Advanced Micro Devices, Inc. Register file having multiple register storages for storing data from multiple data streams
US6061711A (en) * 1996-08-19 2000-05-09 Samsung Electronics, Inc. Efficient context saving and restoring in a multi-tasking computing system environment
US5983338A (en) * 1997-09-05 1999-11-09 Motorola, Inc. Method and apparatus for interfacing a processor to a coprocessor for communicating register write information
US5923893A (en) * 1997-09-05 1999-07-13 Motorola, Inc. Method and apparatus for interfacing a processor to a coprocessor

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030088826A1 (en) * 2001-11-06 2003-05-08 Govind Kizhepat Method and apparatus for performing computations and operations on data using data steering
US7376811B2 (en) * 2001-11-06 2008-05-20 Netxen, Inc. Method and apparatus for performing computations and operations on data using data steering
US6996785B1 (en) 2003-04-25 2006-02-07 Universal Network Machines, Inc. On-chip packet-based interconnections using repeaters/routers
US20100017453A1 (en) * 2004-12-14 2010-01-21 Koninklijke Philips Electronics, N.V. Programmable signal processing circuit and method of demodulating
US9184953B2 (en) * 2004-12-14 2015-11-10 Intel Corporation Programmable signal processing circuit and method of demodulating via a demapping instruction
US8832464B2 (en) 2009-03-31 2014-09-09 Oracle America, Inc. Processor and method for implementing instruction support for hash algorithms
US20100250966A1 (en) * 2009-03-31 2010-09-30 Olson Christopher H Processor and method for implementing instruction support for hash algorithms
US20100246815A1 (en) * 2009-03-31 2010-09-30 Olson Christopher H Apparatus and method for implementing instruction support for the kasumi cipher algorithm
US20100250965A1 (en) * 2009-03-31 2010-09-30 Olson Christopher H Apparatus and method for implementing instruction support for the advanced encryption standard (aes) algorithm
US20100250964A1 (en) * 2009-03-31 2010-09-30 Olson Christopher H Apparatus and method for implementing instruction support for the camellia cipher algorithm
US9317286B2 (en) * 2009-03-31 2016-04-19 Oracle America, Inc. Apparatus and method for implementing instruction support for the camellia cipher algorithm
US20160048396A1 (en) * 2014-08-14 2016-02-18 Texas Instruments Deutschland Gmbh Central processor-coprocessor synchronization
US11132203B2 (en) * 2014-08-14 2021-09-28 Texas Instruments Incorporated System and method for synchronizing instruction execution between a central processor and a coprocessor
US20210382721A1 (en) * 2014-08-14 2021-12-09 Texas Instruments Incorporated Central processor-coprocessor synchronization
US11868780B2 (en) * 2014-08-14 2024-01-09 Texas Instruments Incorporated Central processor-coprocessor synchronization
US11467836B2 (en) * 2020-02-07 2022-10-11 Alibaba Group Holding Limited Executing cross-core copy instructions in an accelerator to temporarily store an operand that cannot be accommodated by on-chip memory of a primary core into a secondary core

Also Published As

Publication number Publication date
EP1001335B1 (en) 2002-10-30
EP1001335A1 (en) 2000-05-17
US6434689B2 (en) 2002-08-13
DE69903704D1 (en) 2002-12-05

Legal Events

Date Code Title Description
AS Assignment

Owner name: SIEMENS MICROELECTRONICS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FLECK, ROD G.;ARNOLD, ROGER D.;HOLMER, BRUCE K.;AND OTHERS;REEL/FRAME:009585/0498;SIGNING DATES FROM 19981001 TO 19981030

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: SMI HOLDING LLC, CALIFORNIA

Free format text: MERGER;ASSIGNOR:SIEMENS MICROELECTRONICS, INC.;REEL/FRAME:013364/0826

Effective date: 19990330

Owner name: INFINEON TECHNOLOGIES NORTH AMERICA CORP., CALIFOR

Free format text: CHANGE OF NAME;ASSIGNOR:INFINEON TECHNOLOGIES CORPORATION;REEL/FRAME:013364/0844

Effective date: 19990929

Owner name: INFINEON TECHNOLOGIES CORPORATION, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:SIEMENS DRAM SEMICONDUCTOR CORPORATION;REEL/FRAME:013364/0886

Effective date: 19990401

Owner name: SIEMENS DRAM SEMICONDUCTOR CORPORATION, CALIFORNIA

Free format text: TRANSFER OF ASSETS;ASSIGNOR:SMI HOLDING LLC;REEL/FRAME:013364/0832

Effective date: 19990330

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: INFINEON TECHNOLOGIES AG, GERMANY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INFINEON TECHNOLOGIES NORTH AMERICA CORP.;REEL/FRAME:029034/0824

Effective date: 20120918

FPAY Fee payment

Year of fee payment: 12