WO2000008555A1 - Data processing device

Info

Publication number: WO2000008555A1
Application number: PCT/EP1999/005520
Authority: WO
Inventors: Fransiscus W. Sijstermans
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 1998-08-06
Filing date: 1999-07-29
Publication date: 2000-02-17

Abstract

The data processing device has an instruction execution pipeline containing at least a first and second processing stage directly or indirectly in series. The stages execute a first and second stage of instruction execution, a mutually different first and second number of processing cycles after the instruction enters the pipeline. The first and second stage are both coupled to a register file, for writing a result of processing by the first and/or the second stage to the register file upon completion of the first and second number of processing cycles respectively.

Description

Data processing device.

The invention relates to a data processing device with an instruction execution pipeline.

PCT patent application No. WO 98/11483 teaches a data processing device with an instruction pipeline. The pipeline contains a series of processing stages from a front end to a back end, for performing successive operations during the execution of an instruction. The final stage of the back end writes back a processing result to a register file.

The pipeline can process several instructions in parallel, because the front end processing stages can start executing an instruction before the back end processing stages have produced and written back the result of an earlier instruction.

Amongst others, it is an object of the invention to reduce the average time needed for processing instructions using the pipeline.

The data processing device according to the invention is described in Claim 1. The invention provides for the possibility to write back results from different processing stages in the pipeline directly after such a processing stage completes processing of an instruction, that is, without passing the entire pipeline and before the entire pipeline has had the opportunity to process the instruction.

For example, a first processing stage might perform an arithmetic operation and a second processing stage might perform a clipping operation on the result of the arithmetic operation. In this case, one may include two types of instruction in the instruction set of the data processing device, one type for arithmetic operations with clipping and one type for arithmetic operations without clipping. In case of an operation with clipping the result would be written back from the second processing stage (after completion of the clipping operation) and in case of an operation without clipping the result would be written back from the first processing stage (before completion of the clipping operation).

The data processor may even write the result of both the first and the second stage (e.g. with and without clipping) in response to some instructions. This means that the result is written back to the register file directly after the processing stage produces its result, that is, earlier than if the processor has to wait for a time period corresponding to the time needed by the second processing stage. Writing to the register file is normally followed by writing to a register after a predetermined delay, but without deviating from the invention, some types of register file may introduce a variable delay until writing is complete, for example in order to resolve access conflicts.

When instruction execution is pipelined, this may mean that the result of a first instruction is written back to the register file before or at the same time as the result of a second instruction that has entered the pipeline before the first instruction. Preferably, the register file is provided with more than one write port, so that results from different stages of the pipeline can be written back in parallel. Also preferably, different write ports of the register file are assigned to different processing stages, so that the pipeline is connected to more write ports than needed for writing the result of individual instructions, in order to be able to write results of different instructions in the pipeline from different processing stages in parallel.

These and other advantageous aspects of the invention will be described using the following figures.

Figure 1 shows an architecture of a data processing device.

Figure 2 shows a functional unit.

Figure 1 shows the architecture of a data processor. By way of example a VLIW processor has been shown, although the invention is not limited to VLIW processors. The processor contains a register file 10, a number of functional units 12a-f and an instruction issue unit 14. The instruction issue unit 14 has instruction issue connections to the functional units 12a-f. The functional units 12a-f are connected to the register file 10 via read and write ports. A first one of the functional units 12a has two read ports and two write ports connected to the register file 10.

Figure 2 shows the first one of the functional units 70, with a cascade of a first and second sub-unit 72, 74. An output of the first sub-unit is coupled to an input of the second sub-unit and to a write port of the register file 10. An output of the second sub-unit is coupled to another port of the register file 10. The functional unit 70 contains two control units 76, 78 coupled to a control input of the first and second sub-unit 72, 74 respectively. An input of the first control unit 76 is coupled to an output of the instruction issue unit for receiving an opcode. An output of the first control unit 76 is coupled to an input of the second control unit 78.

In operation, the instruction issue unit 14 fetches successive instruction words from an instruction memory (not shown explicitly). Each instruction word may contain several instructions for different ones of the functional units 12a-f. Normally, each instruction contains fields specifying an opcode, one or more source registers and one result register. When the instruction issue unit 14 has fetched an instruction word from instruction memory, the fields specifying the source registers in a particular instruction are decoded and used to address the register file 10. In response, the register file 10 supplies the content of the source registers to the functional unit 12a-f that will execute the particular instruction.

In case the particular instruction is intended for the first functional unit 12a, 70, the field specifying the opcode and the content of the source registers are supplied to the functional unit 70. The functional unit 70 operates in successive processing cycles. In a first processing cycle a control signal for the first sub-unit 72 is generated by the first control unit 76, dependent on the opcode. Under control of this control signal the first sub-unit 72 generates a result which the first sub-unit may write to the register file via the write port (writing depends on the control signal). The result (and possible additional information) is passed to the second sub-unit 74. Also a further control signal dependent on the opcode is passed from the first control unit 76 to the second control unit 78. In a second processing cycle, which is later than the first processing cycle, the second sub-unit 74 processes the result generated by the first sub-unit 72 under control of the control signal passed by the second control unit 78. A second result, generated by the second sub-unit 74, may be written to the register file via a write port (writing depends on the control signal from the second control unit 78). In the cycle in which the second sub-unit 74 operates under control of the second control unit 78, the first control unit 76 may already cause the first sub-unit 72 to process a subsequent instruction.
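The cycle-by-cycle behaviour described above can be sketched as a small simulation. This is only an illustrative model under stated assumptions, not the circuit of the functional unit 70: the opcode names "ADD" and "ADD_CLIP", the CONTROL decode table, the port labels and the clipping range are all hypothetical, and cycles are counted as processing cycles of the functional unit only (instruction fetch and the register write itself are not modelled).

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: str  # hypothetical opcode name, "ADD" or "ADD_CLIP"
    src1: int    # content of the first source register
    src2: int    # content of the second source register
    dest: int    # number of the result register


# Hypothetical decode table standing in for control units 76 and 78:
# opcode -> (first sub-unit writes its result?, second sub-unit writes its result?)
CONTROL = {"ADD": (True, False), "ADD_CLIP": (False, True)}


def clip(x, lo=-128, hi=127):
    """Assumed behaviour of the second sub-unit: clip to a signed 8-bit range."""
    return max(lo, min(hi, x))


def run_unit(program):
    """Issue one instruction per cycle; return (cycle, port, dest, value) writes."""
    writes = []
    latch = None  # result and instruction passed from sub-unit 72 to sub-unit 74
    queue = list(program)
    cycle = 0
    while queue or latch:
        cycle += 1
        # Second sub-unit works on what the first sub-unit produced one cycle earlier.
        if latch:
            instr, value = latch
            if CONTROL[instr.opcode][1]:
                writes.append((cycle, "port2", instr.dest, clip(value)))
            latch = None
        # First sub-unit may already process the next instruction in the same cycle.
        if queue:
            instr = queue.pop(0)
            value = instr.src1 + instr.src2
            if CONTROL[instr.opcode][0]:
                writes.append((cycle, "port1", instr.dest, value))
            latch = (instr, value)
    return writes


program = [Instr("ADD_CLIP", 100, 100, dest=1), Instr("ADD", 3, 4, dest=2)]
writes = run_unit(program)
print(writes)  # [(2, 'port2', 1, 127), (2, 'port1', 2, 7)]
```

In this sketch both results become available for writing in processing cycle 2, each through its own write port, even though the second instruction entered the functional unit one cycle after the first.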

This can be illustrated with a pipeline table:

      C1   C2   C3   C4
I1    IF   S1   S2   WB
I2         IF   S1   WB

This table shows the execution of two instructions I1, I2 in successive clock cycles C1, C2, C3, C4. These instructions involve a number of processing steps: IF, S1, S2, WB. "IF" refers to an instruction fetch and operand fetch step, "S1", "S2" refer to execution in the first and second sub-unit 72, 74 respectively. "WB" refers to a step of writing a result back to the register file 10. It is seen in the table that the result of the second instruction is written back without an "S2" step. As a consequence the WB step for I2 occurs in the same cycle as for I1, even though the two instructions I1, I2 started after one another on the same functional unit.

This is especially advantageous for processors that have two or more functional units that can start processing different instructions in parallel, such as VLIW processors. These processors can execute further instructions I3 and I4 that use the results of I1 and I2 respectively. Due to the invention such a processor can start I3 and I4 in the same cycle, which makes processing faster.
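The timing of the two-instruction pipeline table can be reproduced with a few lines of code. The following is a minimal sketch, assuming one instruction is fetched per clock cycle and every step takes exactly one cycle; the function names pipeline_table and wb_cycle are hypothetical helpers introduced for illustration.

```python
def pipeline_table(instructions):
    """Build {name: {cycle: step}} for instructions issued in successive cycles.

    instructions: list of (name, uses_s2) pairs; the i-th entry is fetched
    in cycle i + 1 (assumption: one issue per cycle on one functional unit).
    """
    table = {}
    for i, (name, uses_s2) in enumerate(instructions):
        steps = ["IF", "S1"] + (["S2"] if uses_s2 else []) + ["WB"]
        table[name] = {i + 1 + k: step for k, step in enumerate(steps)}
    return table


def wb_cycle(row):
    """Return the cycle in which the write-back step of one table row occurs."""
    return next(cycle for cycle, step in row.items() if step == "WB")


table = pipeline_table([("I1", True), ("I2", False)])
for name, row in table.items():
    print(name, [row.get(cycle, "--") for cycle in range(1, 5)])
# I1 ['IF', 'S1', 'S2', 'WB']
# I2 ['--', 'IF', 'S1', 'WB']
```

Both rows reach WB in cycle 4, so instructions that consume the two results can start in the same cycle.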

The first sub-unit 72 may be for example an ALU and the second sub-unit 74 may be a clipping unit or a rounding unit. In this case, the instruction may be for example an "ADD" instruction. In response to a first type of ADD instruction the first sub-unit 72 adds the source operands and writes the sum to the register file via its write port, i.e. without involvement of the second sub-unit 74; the second sub-unit 74 refrains from writing to its write port if it receives this first type of ADD instruction.

In response to a second type of ADD instruction the first sub-unit 72 adds the source operands, but it refrains from writing the sum to the register file via its write port; the second sub-unit 74 responds to the second type of ADD instruction e.g. by rounding or clipping the sum, which the second sub-unit 74 receives from the first sub-unit 72. Also in response to the second type of ADD instruction the second sub-unit 74 writes the result of its operation on the sum to the write port of the second sub-unit 74. Of course, adding and rounding or clipping are used here merely by way of example; many other types of operations which produce meaningful intermediate results may be used, e.g. instead of ADD other arithmetic or logic operations, or vector operations, and instead of rounding or clipping further arithmetic or logic operations on the result of the first sub-unit 72.

According to the invention, the functional unit may respond to some instructions by writing back from both of the sub-units. This leads to the following pipeline table.

      C1   C2   C3          C4
I1    IF   S1   S2 and WB   WB

Of course, each sub-unit 72, 74 itself may contain one or more further sub-units, or pipeline stages which process the instruction in successive processing stages. Alternatively, more sub-units for implementing different pipeline stages may be placed in series with the first and second sub-unit 72, 74. According to the invention, more than two of such further sub-units may be connected to their own write ports to the register file 10 for writing a result produced at an intermediate stage in the pipeline. In this case the pipeline table may be

      C1   C2     Ck..   Cm..   Cn
I1    IF   S1..   S2..   S2..   WB
I2         IF     S1..   WB

(Here Ck, Cm, Cn refer to clock cycles later than C3, C4, C5 respectively).

Also, forks in the pipeline may be included, where one sub-unit feeds two or more further subunits in parallel, one or more of these sub-units having their own write ports for writing results back to the register file 10.

Furthermore, one may include one or more sub-units (not shown) in parallel to the first sub-unit 72, each having its own instruction and operand inputs and its own write port for writing to the register file 10. In this case these one or more sub-units and the first sub-unit 72 may feed a single second sub-unit 74 in parallel via a multiplexer (not shown), the pipelined instructions determining from which of the sub-units the multiplexer passes results to the second sub-unit 74. In this case, several instructions may be executed in parallel and a selected one of them may be followed by postprocessing in the second sub-unit 74.

A compiler for the processor will have to schedule operations in such a way that results are produced timely, without conflicts about the use of functional units 12a-f or registers. The compiler can treat the functional unit 70 more or less as two or more conceptually different functional units, one for processing instructions without processing by the second sub-unit 74 and one for processing instructions including processing by the second sub-unit 74. These conceptually different functional units have different latencies. The compiler will avoid scheduling instructions simultaneously at the functional unit, but the compiler may schedule the start of a further instruction at a time when the second sub-unit 74 is still working on the previous instruction. Owing to the invention the compiler can schedule instructions that use the result of the further instruction earlier, for example as early as an instruction that uses a result of the previous instruction.
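The scheduling consequence can be illustrated with a short sketch. It is not a compiler for the described processor, only a model under assumptions: the latencies of 3 and 4 cycles (IF, S1, WB and IF, S1, S2, WB), the kind names "with_s2" and "without_s2", and the function consumer_starts are hypothetical, and the producers are assumed to start in successive cycles on the functional unit 70 while the consumers run on other functional units.

```python
# Assumed latencies in cycles from instruction fetch to a usable result:
# IF, S1, WB = 3 without the second sub-unit; IF, S1, S2, WB = 4 with it.
LATENCY = {"without_s2": 3, "with_s2": 4}


def consumer_starts(producers, consumers, latency=LATENCY):
    """Return the earliest start cycle of each consuming instruction.

    producers: list of (name, kind), started in successive cycles 1, 2, ...
               on the same functional unit.
    consumers: {name: [names of producers whose results it reads]}.
    """
    ready = {}
    for cycle, (name, kind) in enumerate(producers, start=1):
        ready[name] = cycle + latency[kind]
    return {name: max(ready[d] for d in deps) for name, deps in consumers.items()}


producers = [("I1", "with_s2"), ("I2", "without_s2")]
consumers = {"I3": ["I1"], "I4": ["I2"]}

# Two conceptual functional units with different latencies.
early = consumer_starts(producers, consumers)
# For comparison: every result leaves the pipeline only after the final stage.
uniform = consumer_starts(producers, consumers, {"without_s2": 4, "with_s2": 4})

print(early)    # {'I3': 5, 'I4': 5}
print(uniform)  # {'I3': 5, 'I4': 6}
```

With the two latencies, the consumers of both results can be scheduled in the same cycle; with a single uniform latency, the consumer of the second result would have to start one cycle later.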

Claims

CLAIMS:

1. A data processing device comprising

- an instruction execution pipeline containing at least a first and second processing stage directly or indirectly in series, for executing a first and second stage of instruction execution, a mutually different first and second number of processing cycles after the instruction enters the pipeline respectively;

- a register file for receiving a result of the instruction from the pipeline in a register addressed by the instruction, characterized in that the first and second stage are both coupled to the register file, for making a result of processing by the first and/or the second stage available for writing to the register file from completion of the first and second number of processing cycles respectively.

2. A data processing device according to Claim 1, the data processing device having an instruction set that contains a first and a second type of instruction, the pipeline being arranged to make only the result of processing by the first stage or only the result of processing by the second stage available to the register file, in response to the first and second type of instruction respectively.

3. A data processing device according to Claim 2, the instruction set comprising a third type of instruction, the pipeline being arranged to make both the result of processing by the first stage and the result of processing by the second stage available in response to the third type of instruction.

4. A data processing device according to Claim 1, the first and second stage being arranged to process mutually different first and second pipelined instructions in parallel and to make results of processing the first and second pipelined instruction available for writing to the register file both in the same processing cycle.

5. A data processing device according to Claim 1, the register file being a multiport register file, having at least a first and a second write port, for writing results to different registers in parallel, the first and second processing stage being coupled to the first and the second write port respectively.

6. A data processing device according to Claim 1 containing two or more functional units for starting execution of different instructions in parallel to one another, the instruction execution pipeline being comprised in a first one of the functional units, the device being programmed for executing a first instruction followed by a second instruction in the first one of the functional units, and executing a third and fourth instruction, which use a result of the first and second instruction written to the register file from the second and first stage respectively, the third and fourth instruction being started in parallel on different ones of the functional units.

7. A method of compiling a program for a processing device according to Claim 1, the method comprising scheduling a first instruction followed by a second instruction in the instruction pipeline, the first and second instruction specifying writing to the register file after the second stage and after the first stage of the instruction execution pipeline respectively, and scheduling a third instruction, which uses a result of the second instruction, before or at the same time as a result of the first instruction becomes available to the register file.