EP0942359A1 - An apparatus for and a method of executing instructions of a program - Google Patents

An apparatus for and a method of executing instructions of a program

Info

Publication number
EP0942359A1
Authority
EP
European Patent Office
Prior art keywords
instruction
type
instructions
data
referential
Prior art date
Legal status
Granted
Application number
EP98102925A
Other languages
German (de)
French (fr)
Other versions
EP0942359B1 (en)
Inventor
Dr. Robert Knuth
Amnon Rom, c/o I.C. COM Ltd.
Rivka Blum
Haim Granot
Anat Hershko
Yoav Lavi
Meny Yanni
Georgy Shenderovitch
Elliot Cohen
Eran Weingarten
Current Assignee
Intel Germany Holding GmbH
Original Assignee
Siemens AG
Lantiq Deutschland GmbH
Priority date
Filing date
Publication date
Application filed by Siemens AG and Lantiq Deutschland GmbH
Priority to EP98102925A (EP0942359B1)
Priority to JP2000532793A (JP2003525476A)
Priority to CN99803152.6A (CN1114857C)
Priority to PCT/EP1999/000849 (WO1999042922A1)
Priority to IL13624699A (IL136246A0)
Publication of EP0942359A1
Application granted
Publication of EP0942359B1
Anticipated expiration
Current legal status: Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30149 Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30196 Instruction operation extension or modification using decoder, e.g. decoder per instruction set, adaptable or programmable decoders
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3818 Decoding for concurrent execution
    • G06F9/3822 Parallel decoding, e.g. parallel decode units
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units

Abstract

The invention relates to a data processing apparatus for executing instructions of a program, the apparatus having a first instruction decoder, an address decoder, a plurality of computational units, and an execution logic unit. The data processing apparatus is characterized in that said first instruction decoder discriminates whether said apparatus is to execute a referential instruction, which initiates execution of an instruction of a different type. The invention further relates to a method of executing instructions for a data processing apparatus, which method is characterized in that, upon decoding a referential instruction, the steps of fetching an instruction of a different type according to information included in the referential instruction and decoding said instruction of said different type for determining the operations to be executed in parallel are carried out.

Description

  • The invention relates to a data processing apparatus for executing instructions of a program according to the pre-characterizing portion of Claim 1. The invention further relates to a method of executing instructions for a data processing apparatus according to the pre-characterizing portion of Claim 8.
  • Currently, there exist two main architectures for DSP processors. Both architectures trade off processing speed against program memory size, each favouring one at the expense of the other. The first main architecture may also be called a regular machine, which means that one single instruction is executed per machine-cycle. The second architecture is generally called a VLIW (very long instruction word) architecture. With the VLIW architecture, several instructions are executed within one single machine-cycle.
  • A regular machine executing a single instruction per machine-cycle features a relatively small program data bus. Typically, such a program data bus is 32 bits wide. In a DSP processor environment, the number of computational units in the execution unit of the processor is typically smaller than in the second architecture mentioned above. The program data bus width and the number of computational units are directly proportional to the power consumption of the processor. Thus, a regular processor architecture typically consumes less power than other advanced architectures. However, the major disadvantage of the regular architecture is that the number of MIPS (million instructions executed per second) is smaller than with the above-mentioned VLIW architecture.
  • A regular machine is for example described in US patent 5,163,139 entitled "Instruction Preprocessor for Conditionally Combining Short Memory Instructions into Virtual Long Instructions". This regular machine comprises two computational units and a main program memory of regular program data width. The machine proposed in this patent further comprises an instruction preprocessor unit which checks whether or not two subsequent instructions in the program memory can be validly combined so as to form a new instruction word in its own right. This new instruction word is then interpreted and executed by the two computational units of the machine. The machine of US patent 5,163,139 is limited in that it can only combine pairs of instructions which meet predefined criteria. Thus, the machine largely constrains a programmer in developing program code.
  • The second architecture (VLIW) as mentioned above is based on an instruction set philosophy in which the compiler packs a number of simple, non-interdependent operations into the same instruction word. This type of architecture was originally proposed by J.A. Fisher in "Very Long Instruction Word Architectures and the ELI-512", Proceedings of the 10th Annual Symposium on Computer Architecture, June 1983. The VLIW architecture assumes multiple computational units in the processor and several decoding units which analyse the instructions fetched from the program memory. A VLIW architecture has the advantage that several operations are executed in parallel, thus increasing the MIPS performance of the processor. However, a VLIW processor requires a program memory of a larger bit width. This is a burden both for the chip area required to implement the processor architecture and for its power consumption. Also, the programming skills required from a programmer are inherently higher when writing code for a VLIW processor, since the parallelism of the processor has to be taken into account.
  • A particular VLIW processor has been proposed in US patent 5,450,556 entitled "VLIW Processor Which Uses Path Information Generated by a Branch Control Unit to Inhibit Operations Which Are Not on a Correct Path". This patent addresses the problem of efficiently dealing with jump instructions in a VLIW program. In order to overcome this problem, it is proposed to add a path expression field to the VLIW instruction. This path expression field is read by a branch control unit in the processor, which operates so as to speed up conditional branch operations. As with all previous VLIW processor architectures, the structure proposed in US patent 5,450,556 suffers from the relatively large program memory required to store VLIW instructions, particularly in the case of execution steps which only allow for a small degree of parallelism.
  • The invention is based on the problem that highly parallel computer architectures demand a large program memory space. The invention thus seeks to lower the program memory demand while maintaining the processor's capability of executing instructions in a highly parallel manner.
  • The problem is solved with a data processing apparatus having the features of claim 1. The problem is also solved by a method of executing instructions for a data processing apparatus having the features of claim 8. Advantageous embodiments of the inventive apparatus and the inventive method are described in the respective dependent claims.
  • A preferred data processing apparatus for executing instructions of a program comprises a first instruction decoder, an address decoder, a plurality of computational units and an execution logic unit. The first instruction decoder sequentially fetches program instructions of a first type from a first program memory and decodes instructions of said first type. The address decoder determines the address of data to be loaded from or returned to a data memory. Each of said plurality of computational units executes operations upon data according to the interpretation of said first instruction decoder and provides the results of these operations. The execution logic provides said plurality of computational units with data and controls the operation of said plurality of computational units according to an instruction of said first type. The data processing apparatus is characterized in that said first instruction decoder discriminates whether said apparatus is to execute a referential instruction. The referential instruction then initiates the execution of an instruction of a second type.
  • Thus, the data processing apparatus of the invention is capable of executing two types of program instructions. Preferably, the two types of instructions are of a significantly different bit width wherein instructions of said first type have the shorter bit width. Depending on the actual instruction to be executed, the processing apparatus either executes an instruction word of a relatively short bit width or executes an instruction of a relatively large bit width. This allows for flexible program memory organisation and thus a reduction of total memory demand of a particular program.
  • A preferred embodiment of the inventive apparatus further comprises a second instruction decoder which fetches an instruction of said second type. In an even more preferred embodiment, instructions of said second type are stored in a second program memory. Thus, said second instruction decoder fetches instructions of said second type from said second program memory and subsequently decodes instructions of said second type.
  • By providing a separate memory unit for each of the instructions of said first type and said second type, it is possible to store frequently used instructions of said second type and to access these instructions easily from said data processing apparatus. Preferably, the bit width of each of said first and second program memories is set at a fixed length. Thus, the architecture of a preferred data processing apparatus is configurable to handle instructions of said first type and said second type in an efficient manner.
  • A preferred embodiment of the inventive apparatus is further characterized in that an instruction of said second type comprises a plurality of operators including data assignment information of operands and data assignment information of results. It is further preferred, that said execution logic comprises means for interpreting said instruction of said second type.
  • In a particularly preferred embodiment of the invention, said referential instruction includes address information. The address information relates to the data upon which the instruction of said second type, which is referred to in said referential instruction, is to be executed. This preferred configuration of the inventive apparatus allows data to be fetched while the instruction of said second type is decoded. This can significantly increase the performance of the inventive apparatus.
  • In a preferred embodiment, the apparatus of the invention is configured to allow for a pipe-lined execution of instructions of either the first or the second type. This configuration particularly eases the simultaneous execution of operations.
  • A preferred method of executing instructions for a data processing apparatus comprises the steps of fetching an instruction of a first type from a first program memory, decoding said instruction of said first type for determining the operation to be executed, reading operands from a data memory or from data registers according to operand address information included in said instruction of said first type, executing an operation upon said operands, and writing the results of said operation into said data memory or into said data registers according to result address information included in said instruction of said first type. The inventive method is characterized in that, upon decoding of a referential instruction, which includes predetermined information so as to be decoded as such, the steps of fetching an instruction of a second type according to information included in said referential instruction and decoding said instruction of said second type for determining the operations to be executed in parallel are carried out.
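  • To make the flow of the claimed method concrete, the following is a minimal sketch in C. It assumes hypothetical encodings (a dedicated referential op code, a pointer P into a small second program memory, and four operand addresses per referential instruction); all names, field widths and example op codes are illustrative assumptions, not definitions taken from the patent.

```c
#include <stdint.h>
#include <stdio.h>

#define OPC_REFERENTIAL 0x3F          /* assumed special op code                  */
#define CLIW_SLOTS      16            /* assumed size of the second (CLIW) memory */

typedef struct {                      /* first-type (regular/referential) word */
    uint8_t  opcode;
    uint8_t  cliw_ptr;                /* pointer P, used only when referential */
    uint16_t addr[4];                 /* operand/result memory addresses       */
} regular_insn_t;

typedef struct {                      /* second-type (CLIW) word               */
    uint8_t op[4];                    /* one operation per computational unit  */
    uint8_t src_sel[4];               /* operand assignments                   */
    uint8_t dst_sel[4];               /* result assignments                    */
} cliw_insn_t;

static cliw_insn_t cliw_memory[CLIW_SLOTS];   /* second program memory */

/* One iteration of the method: fetch a first-type word, discriminate it, and
 * either execute a single operation or expand the referenced CLIW word into
 * parallel operations on the addresses carried by the referential word.     */
static void step(const regular_insn_t *ri)
{
    if (ri->opcode == OPC_REFERENTIAL) {
        const cliw_insn_t *ci = &cliw_memory[ri->cliw_ptr];      /* CLIW fetch  */
        for (int u = 0; u < 4; u++)                              /* CLIW decode */
            printf("unit %d: op %d on mem[0x%04X]\n",
                   u, ci->op[u], (unsigned)ri->addr[u]);
    } else {
        printf("single op %d on mem[0x%04X]\n", ri->opcode, (unsigned)ri->addr[0]);
    }
}

int main(void)
{
    cliw_memory[2] = (cliw_insn_t){ .op = { 1, 1, 2, 2 } };      /* CLIW slot P = 2 */

    regular_insn_t add  = { .opcode = 0x01, .addr = { 0x0100 } };
    regular_insn_t refi = { .opcode = OPC_REFERENTIAL, .cliw_ptr = 2,
                            .addr = { 0x0200, 0x0202, 0x0204, 0x0206 } };
    step(&add);   /* regular instruction: one operation          */
    step(&refi);  /* referential instruction: four parallel ops  */
    return 0;
}
```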
  • As has already been described with regard to the above mentioned preferred data processing apparatus of the invention, the preferred method allows for a flexible usage of memory space because of the provision of two types of instructions. The additional information needed for carrying out a particular parallel operation is obtained by referring to further instruction information (instruction of said second type) in an instruction of said first type.
  • A further preferred embodiment of the inventive method is characterized in that said referential instruction includes address information which is decoded substantially at the time of decoding said referential instruction. This feature allows for a significant increase in processing speed because the data which is needed for the instruction, which is referred to in the referential instruction, is loaded at the time of decoding the referential instruction.
  • In an even more preferred embodiment of the method of the invention the steps of decoding a referential instruction and the step of fetching an instruction of said second type are executed substantially simultaneously wherein said referential instruction and said instruction of said second type are associated with each other. This allows for an even further increase in processing speed because the information required for carrying out an instruction of said second type is provided already at the time of decoding said referential instruction.
  • In a still further preferred embodiment of the method of the invention, said step of reading operands from a data memory and said step of decoding an instruction of said second type are executed substantially simultaneously, wherein the operands read are associated with the instruction decoded. This preferred feature allows for an even greater increase in processing speed, as all information is then available to the computational units of the data processing apparatus for carrying out the operations according to the instruction of said second type.
  • Further advantages, features and possibilities of using the invention are explained in the following description of a preferred embodiment of the invention, which is to be read in conjunction with the attached drawings. In the drawings:
  • Figure 1 depicts a circuit diagram of a preferred data processing apparatus according to the invention;
  • Figure 2a shows an example of the structure of a very long instruction word as is used in the prior art;
  • Figure 2b shows the structures of instructions of two different types as used in the preferred embodiment of the invention;
  • Figure 3a is a table showing the sequence of pipe-lined instructions in a data processing apparatus of the prior art; and
  • Figure 3b is a table showing the sequence of instructions according to a preferred embodiment of the invention.
  • Figure 1 shows the basic architecture of the preferred embodiment of a data processing apparatus according to the invention. The data processing apparatus, which is particularly suited for digital signal processing, is configured for the parallel execution of several operations and thus comprises a plurality of computational units. In the preferred embodiment, there are provided four computational units which are assigned reference numerals 61 to 64. Each of the computational units 61 to 64 is provided with operand data from an execution logic unit 7. Each of the computational units on the other hand delivers the result of a computation to one or more registers of a bank 5 of multiport registers and/or to a data memory 3 through a data bus line connecting said computational units 61 to 64 to said data memory 3, said data bus having a bit width of r bits. In the preferred embodiment, two results, each having a data bit width of 16 bits, may be written directly into said data memory. Thus, the bit width r equals 2 x 16 = 32 bits.
  • The contents of each of said multiport registers 5 is fed back through a bus line of bit width n to said execution logic unit 7. The contents of said multiport registers 5 is also provided to an address decoder 4 for selectively writing data from said multiport registers 5 into said data memory 3. The multiport registers 5 are therefore connected to said address decoder 4 through a bus line also having a bit width of n bits. In the preferred embodiment, each register has a data bit width of 16 bits. Further, the bank 5 of multiport registers comprises a total of 16 registers. Thus, n is set to 16 x 16 bits = 256 bits in the preferred embodiment of the data processing apparatus.
  • With this kind of configuration of the preferred embodiment, the data processing apparatus of the invention can be operated either as a register-memory architecture machine or a memory-memory architecture machine. On the one hand, the execution logic unit 7 not only receives data from said multiport registers 5 but also directly from said data memory 3. On the other hand, the computational units 61 to 64 not only write to said multiport registers 5 but also directly to said data memory. It is clear to a person skilled in the art that the invention can similarly be embodied in a load-store architecture (or alternatively called register-register architecture) machine without deviating from the scope of this invention.
  • As already mentioned above, the execution logic unit 7 not only receives operand data from said multiport registers 5 but also from said data memory 3 through a bus line having a bit width of o bits. The bit width o of the data bus between said data memory 3 and said execution logic unit 7 is proportional to the number of operands to be loaded from said data memory 3 and the bit width of each operand. In the preferred embodiment, a maximum of four operands, each having a bit width of 16 bits, are loaded from said data memory 3 to said execution logic unit 7, resulting in a bus width o of 4 x 16 = 64 bits.
  • The execution logic unit 7 receives decoded instruction information from a regular instruction decoder 1. The execution logic unit 7 thus receives the operands for carrying out a particular instruction from said multiport registers 5 and/or said data memory 3 and delivers them to said computational units 61 to 64 as indicated by the decoded regular instruction. The execution logic unit 7 further comprises means 8 for receiving a decoded instruction from a CLIW instruction decoder 9 (Configurable Length Instruction Word). Once a decoded CLIW instruction is received, said receiving means 8 in said execution logic unit 7 makes sure that the execution is not carried out according to information received from said regular instruction decoder 1 but exclusively according to the decoded instruction as received from said CLIW instruction decoder 9. Thus, said receiving means 8 replaces all the information from said regular instruction decoder 1 with the information received from said CLIW instruction decoder 9.
  • Said regular instruction decoder 1 receives a line of code from a regular program memory 2 for decoding the instruction encoded therein. For sequential operation of the data processing apparatus, the regular program memory 2 is addressed by the output of a program counter 15. The regular instruction decoder 1 delivers decoded instruction information to said execution logic unit 7 and delivers an address encoded in a particular instruction to said address decoder 4. The regular instruction decoder 1 is further connected to said CLIW instruction decoder 9 for indicating the fact that a CLIW instruction is to be decoded next.
  • The address decoder 4 receives address information from said regular instruction decoder 1 for decoding the address encoded in a particular instruction. The decoded address is delivered through a bus line having a bit width of m bits to said data memory 3. The bit width m is proportional to the number of addresses and the number of bits per address to be addressed at a time. In the preferred embodiment, the address decoder 4 decodes four addresses, each having a bit width of 16 bits, thus resulting in a bit width m of 64 bits for the bus line connecting said address decoder 4 and said data memory 3. Said data memory 3 is further connected to said regular instruction decoder 1 through lines R/W which indicate to said data memory 3 whether data at the specified addresses is to be read or written.
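  • As a compact check of these figures, the bus widths of the preferred embodiment follow directly from the 16-bit data word width and the number of results, registers, operands and addresses handled per cycle; the short C sketch below merely reproduces that arithmetic.

```c
#include <stdio.h>

int main(void)
{
    const int word = 16;          /* data word width in bits                   */
    const int r = 2  * word;      /* two results written to the data memory    */
    const int n = 16 * word;      /* sixteen 16-bit multiport registers        */
    const int o = 4  * word;      /* four operands loaded from the data memory */
    const int m = 4  * word;      /* four 16-bit addresses decoded per cycle   */
    printf("r = %d, n = %d, o = %d, m = %d bits\n", r, n, o, m);  /* 32, 256, 64, 64 */
    return 0;
}
```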
  • Said CLIW instruction decoder 9 is connected to a CLIW memory 10 having stored therein lines of code representing CLIW instructions. The particular instruction to be read from said CLIW memory 10 is indicated by said regular instruction decoder 1 through a line P connecting said regular instruction decoder 1 to said CLIW memory 10. Thus, the regular instruction decoder 1 points to a particular storage location of said CLIW memory 10, and the CLIW instruction stored therein is delivered to said CLIW instruction decoder 9.
  • The general operation of the preferred embodiment of the invention can be described as follows. The execution logic unit 7 operates according to instructions sequentially read from said regular program memory 2. As long as said regular instruction decoder 1 does not decode a special instruction, the operation of said CLIW instruction decoder 9 and said CLIW memory 10 is practically inhibited. However, once said regular instruction decoder 1 decodes a special instruction (which can also be called a referential instruction), the function of said CLIW instruction decoder 9 and said CLIW memory 10 is activated. In effect, the execution logic unit 7 then exclusively operates according to information received from said CLIW instruction decoder 9 instead of information received from said regular instruction decoder 1.
  • In the preferred embodiment of the invention the mentioned special instruction from said regular program memory 2 contains address information which the regular instruction decoder 1 delivers to said address decoder 4. In order for the data processing machine to execute such a special instruction, instruction information from said special instruction and instruction information from an associated CLIW instruction are combined.
  • Figure 2a shows the typical structure of a very long instruction word according to the prior art. The instruction word 14 of figure 2a basically consists of four segments. In a first segment, a plurality of operations are defined. In a second segment, operands are assigned to each of these operations. In a third segment, results are assigned to each of these operations. Finally, in a fourth segment, memory addresses are defined for the operands and results assigned in said second and said third segments, respectively.
  • Figure 2b shows the structure of instruction words which are used in conjunction with the invention. There is shown a regular (short) instruction 11 having a length of k bits. A regular instruction 11 includes an instruction header containing an operation code (op code) which defines the type of instruction. Figure 2b further shows the structure of a referential instruction 12 which also has a length of k bits. A special op code is stored in the op code header of the referential instruction 12 which op code distinguishes the referential instruction 12 from other regular instructions 11. The referential instruction 12 also includes a plurality of memory addresses upon which a particular referential instruction is to be executed. Finally the referential instruction includes a pointer P which points to a CLIW instruction.
  • Figure 2b also shows the structure of a CLIW instruction 13. The structure is basically identical to that of a VLIW instruction 14 according to figure 2a, except that a CLIW instruction 13 does not include any memory addresses. In fact, the addresses for a particular CLIW instruction are included in a referential instruction 12 which points through its pointer P to a particular CLIW instruction 13. A CLIW instruction is shown to have a bit length of l bits.
  • Whereas regular instructions 11 and referential instructions 12 are stored in the regular program memory 2, CLIW instructions are stored in the CLIW memory 10. Thus, the regular program memory 2 and the CLIW memory 10 are configured with the respective bit lengths of the instruction words stored therein. In the preferred embodiment, regular instructions 11 and referential instructions 12 have a bit length of 48 bits. On the other hand, CLIW instructions 13 have a bit length of 96 bits. While the regular instruction decoder 1 is sequentially and continuously decoding instructions from said regular program memory 2, additional instruction information from said CLIW memory 10 will only be supplied to said execution logic unit 7 when said regular instruction decoder 1 decodes a referential instruction. At this point, the decoded instruction from the CLIW instruction decoder 9 is fed to the receiving means 8 of the execution logic unit 7, replacing all information which would normally be supplied by said regular instruction decoder 1.
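  • One plausible way to account for these word lengths is sketched below. The patent fixes only the overall lengths (k = 48 bits, l = 96 bits) and the kinds of fields each word carries; the individual field widths in this sketch are assumptions chosen so that the totals add up.

```c
/* Illustrative bit budget; the field widths are assumptions, not patent data. */
enum {
    /* referential instruction, k = 48 bits */
    REF_OPCODE_BITS = 8,          /* special op code                              */
    REF_PTR_BITS    = 8,          /* pointer P into the CLIW memory               */
    REF_ADDR_BITS   = 4 * 8,      /* e.g. four 8-bit encoded operand addresses,
                                     expanded to full addresses by the address
                                     decoder (an assumption for illustration)     */
    /* CLIW instruction, l = 96 bits, carrying no memory addresses */
    CLIW_OP_BITS    = 4 * 8,      /* one operation code per computational unit    */
    CLIW_SRC_BITS   = 4 * 8,      /* operand assignments                          */
    CLIW_DST_BITS   = 4 * 8       /* result assignments                           */
};

_Static_assert(REF_OPCODE_BITS + REF_PTR_BITS + REF_ADDR_BITS == 48,
               "referential word fills k = 48 bits");
_Static_assert(CLIW_OP_BITS + CLIW_SRC_BITS + CLIW_DST_BITS == 96,
               "CLIW word fills l = 96 bits");

int main(void) { return 0; }
```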
  • Figure 3a is a table showing the execution of normal VLIW instructions in a processor with a 5-stage pipeline according to the prior art. The table of figure 3a shows the steps of instruction fetch, instruction decode, operand read, execution and operand write.
  • Figure 3b is a table showing the pipelined execution of a program according to the invention. In terms of processing regular program instructions, the sequence of operations is identical to the one shown in the table of figure 3a. If a referential instruction is encountered, however, two additional steps are inserted. At the time a regular instruction is decoded and recognized as a referential instruction, the CLIW instruction referred to in the referential instruction is fetched; see for example the line with the line header "instruction decode and CLIW fetch" between machine cycle 2 and machine cycle 6. Also, at the time operands are read from memory, the CLIW instruction fetched in the previous machine cycle is decoded. This is possible because the referential instruction 12 contains all address information needed to read the required operands. A referential instruction 12 contains a pointer to a particular CLIW instruction which is to be fetched and decoded so as to be executed with the data to be read; see the line in the table of figure 3b with the line header "operand read and CLIW decode" between machine cycles 3 and 7. The sequence of operations carried out in the pipeline for a particular instruction follows a diagonal line in the table, as indicated by an arrow.
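  • The overlap described above can be reproduced schematically. The following C sketch assumes the five stages of figure 3b and back-to-back referential instructions issued one per machine cycle, and simply prints which stage each instruction occupies in each cycle; the cycle numbering and the number of instructions are illustrative only.

```c
#include <stdio.h>

/* The five stages of figure 3b; for a referential instruction the decode stage
 * also fetches the referenced CLIW word and the operand-read stage decodes it. */
static const char *stage[5] = {
    "instruction fetch",
    "instruction decode + CLIW fetch",
    "operand read + CLIW decode",
    "execute (computational units in parallel)",
    "operand write",
};

int main(void)
{
    /* three back-to-back referential instructions, one issued per machine cycle */
    for (int insn = 0; insn < 3; insn++)
        for (int s = 0; s < 5; s++)
            printf("cycle %d: instruction %d in stage \"%s\"\n",
                   insn + s + 1, insn, stage[s]);
    return 0;
}
```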
  • A prior art processor which controls multiple execution units in parallel by one VLIW instruction usually requires a large program memory space for the optimum usage of the parallel execution capability of the data processing apparatus. The invention restricts the usage of long instructions to the very time-consuming parts of an algorithm, the so-called inner loops. Thus, frequently executed instructions are executed in a highly parallel fashion, while the memory space required for the program code of instructions which cannot be carried out in parallel is significantly decreased. The code of a VLIW instruction of the prior art determines for each execution step the operation codes, the operand assignments, the output assignments and the memory addresses. The great variety of such configurations results in a high bit width of each of the VLIW instructions. Although VLIW instructions offer full coding flexibility for each execution step and thus always support maximum parallelism, the program code consumes a large amount of program memory, particularly for those execution steps which do not allow full parallel operation.
  • Typical programs for digital signal processors generally consist of inner loops in which a few instructions are repeated very often. The instructions in an inner loop should be supported by the maximum parallelism of the digital signal processor, because this reduces the required run time to a large extent.
  • The invention solves this problem by using short instructions combined with configurable length instruction words (CLIW). Thus, the invention offers the advantage of maximizing the execution efficiency of inner loops while limiting the program space required for code outside these inner loops.
  • The regular instructions outside the inner loops are executed sequentially. A regular instruction specifies only certain frequently used connections and operations of the execution units together with the necessary operands. All regular instructions are fetched directly from the regular program memory 2. Additionally, CLIW instructions are stored in a dedicated CLIW memory 10. A special referential instruction is used for initiating the execution of CLIW instructions. The referential instruction loads a CLIW instruction from the CLIW memory 10. The address P of the CLIW instruction to be fetched is defined by the referential instruction.
  • A CLIW instruction 13 defines all possible types of operation, operand connections and output connections. The referential instruction includes all required memory addresses for the operations defined in the CLIW instruction associated therewith. Thus, the referential instruction together with its associated CLIW instruction has all the information that a VLIW instruction according to the prior art requires.
  • Since the bit width of regular program instructions (and thus also of referential instructions) is preferably configured to be significantly smaller than the bit width of CLIW instructions, the program code can be written much more compactly than with VLIW instructions alone (a numerical example is given at the end of this description).
  • Each further execution of the same CLIW instruction adds only one more regular (short) referential instruction 12 to the program code. Since the type of parallel operations and connections typically does not change within a set of CLIW instructions (for example during the execution of matrix operations), program space for CLIW instructions is saved by simply changing the memory address in the referential instruction.
  • It is thus preferred to specify the memory address of operands within a referential instruction 12 independently of the reference to a particular CLIW instruction 13. This not only allows different memory operands to be used with the same CLIW instruction, but also speeds up the instruction flow within the processor when execution is pipelined (an encoding sketch illustrating this reuse is given at the end of this description).
  • The number of CLIW instructions required in the inner loops depends on the actual program. The fixed number of CLIW instructions available in a CLIW memory can be extended: after initialization, the CLIW memory can be dynamically reconfigured by recalling referential instructions. Different packets of CLIW instructions can be used in different parts of an algorithm. This feature is enabled by reloading CLIW memory packets at run time (a reload sketch is given at the end of this description).
  • The size of the CLIW memory is user definable. Usually the size of the CLIW memory will be much smaller than that of the program memory. Those parts of the CLIW memory which always contain a constant set of CLIW instructions can be implemented as read-only memory (ROM). CLIW instructions which are encoded in a ROM can still be called together with data at different memory addresses, because the address information is included in the referential instruction.
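  • The dispatch between the two decoders described above can be modelled in software. The following C sketch is purely illustrative: the 48-bit and 96-bit word sizes are taken from the preferred embodiment, but the field layout (a hypothetical opcode value marking referential instructions and an 8-bit CLIW pointer P) and the helper functions issue_regular and issue_cliw are assumptions, not part of the patent.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative word sizes from the preferred embodiment:
     * 48-bit regular/referential words, 96-bit CLIW words.              */
    typedef struct { uint64_t bits48; } regular_word_t;  /* low 48 bits used    */
    typedef struct { uint64_t lo, hi; } cliw_word_t;     /* 96 of 128 bits used */

    /* Hypothetical field layout (not specified by the patent):
     * bits [47:42] opcode, bits [41:34] CLIW pointer P.                 */
    #define REF_OPCODE  0x3Fu
    #define OPCODE(w)   ((unsigned)((w).bits48 >> 42) & 0x3Fu)
    #define CLIW_PTR(w) ((unsigned)((w).bits48 >> 34) & 0xFFu)

    static regular_word_t program_mem[4096];  /* regular program memory 2 */
    static cliw_word_t    cliw_mem[256];      /* dedicated CLIW memory 10 */

    static void issue_regular(regular_word_t w)
    {
        printf("regular instruction, opcode %u\n", OPCODE(w));
    }

    static void issue_cliw(cliw_word_t c, regular_word_t ref)
    {
        /* The CLIW word supplies the operations and connections; the
         * operand/result addresses come from the referential word itself. */
        printf("CLIW word %016llx:%016llx, addresses taken from 0x%012llx\n",
               (unsigned long long)c.hi, (unsigned long long)c.lo,
               (unsigned long long)ref.bits48);
    }

    /* One decode step: regular instructions are issued directly; a referential
     * instruction causes the CLIW word it points to to be fetched, and the
     * decoded CLIW information replaces the regular decoder's output.          */
    static void decode_step(unsigned pc)
    {
        regular_word_t insn = program_mem[pc];
        if (OPCODE(insn) == REF_OPCODE)
            issue_cliw(cliw_mem[CLIW_PTR(insn)], insn);
        else
            issue_regular(insn);
    }

    int main(void)
    {
        cliw_mem[5] = (cliw_word_t){ .lo = 0x1234, .hi = 0x5678 };
        program_mem[0].bits48 = ((uint64_t)REF_OPCODE << 42) | (5ull << 34); /* referential, P = 5   */
        program_mem[1].bits48 = (uint64_t)0x01 << 42;                        /* a regular instruction */
        decode_step(0);
        decode_step(1);
        return 0;
    }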
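  • The diagonal overlap shown in figure 3b can be visualised with a short simulation. The chart printed by the following C sketch is a schematic aid only; the stage names stand for instruction fetch (IF), instruction decode plus CLIW fetch (ID+CF), operand read plus CLIW decode (OR+CD), execute (E) and operand write (OW), and the choice of five back-to-back referential instructions is an assumption made for illustration.

    #include <stdio.h>

    /* Prints a pipeline occupancy chart for five consecutive referential
     * instructions: each instruction starts one machine cycle after its
     * predecessor, and the CLIW fetch/decode steps share cycles with the
     * regular decode and operand-read stages, so no extra cycles are added. */
    int main(void)
    {
        const char *stage[5] = { "IF", "ID+CF", "OR+CD", "E", "OW" };
        const int n_insn = 5, n_stage = 5;

        printf("insn |");
        for (int c = 1; c <= n_insn + n_stage - 1; ++c)
            printf(" cyc%-2d |", c);
        printf("\n");

        for (int i = 0; i < n_insn; ++i) {
            printf(" I%-3d|", i + 1);
            for (int c = 0; c < n_insn + n_stage - 1; ++c) {
                int s = c - i;  /* stage occupied by instruction i+1 in cycle c+1 */
                printf(" %-5s |", (s >= 0 && s < n_stage) ? stage[s] : "");
            }
            printf("\n");
        }
        return 0;
    }

In the printed chart the ID+CF entries of the five instructions fall into machine cycles 2 to 6 and the OR+CD entries into cycles 3 to 7, matching the rows described for figure 3b.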
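  • The saving in program memory can be made concrete with a small calculation. The figures in the following C sketch are assumptions chosen only for illustration (1000 instruction slots of which the inner loops reuse 16 distinct wide patterns, together with the 48-bit and 96-bit widths of the preferred embodiment); they are not taken from the patent.

    #include <stdio.h>

    /* Back-of-the-envelope comparison of program memory usage. */
    int main(void)
    {
        const long slots = 1000;          /* instruction slots in the program       */
        const long distinct_wide = 16;    /* distinct wide patterns in inner loops  */
        const long wide_bits = 96, regular_bits = 48;

        long vliw_only  = slots * wide_bits;              /* every slot is a 96-bit VLIW word   */
        long cliw_style = slots * regular_bits            /* 48-bit regular/referential words   */
                        + distinct_wide * wide_bits;      /* each wide pattern stored only once */

        printf("VLIW-only program memory : %ld bits\n", vliw_only);   /* 96000 bits */
        printf("CLIW-style total memory  : %ld bits\n", cliw_style);  /* 49536 bits */
        return 0;
    }

Under these assumptions the combined regular program memory and CLIW memory need roughly half the bits of a VLIW-only program store.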
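  • Reusing one CLIW instruction with different operand addresses can be illustrated by generating the referential instructions of an inner loop. The packing of the 48-bit word in the following C sketch (opcode, CLIW index, source address, result address) is a hypothetical layout chosen for illustration, as is the example of one multiply-accumulate row per matrix row.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical encoder for 48-bit referential instructions:
     * bits [47:42] opcode, [41:34] CLIW index, [33:17] source address,
     * [16:0] result address.                                            */
    #define REF_OPCODE   0x3Full
    #define MAC_ROW_CLIW 5ull   /* index of the single CLIW word describing one
                                   fully parallel multiply-accumulate row        */

    static uint64_t make_referential(uint64_t cliw_idx, uint64_t src, uint64_t dst)
    {
        return (REF_OPCODE << 42) | (cliw_idx << 34) | (src << 17) | dst;
    }

    int main(void)
    {
        uint64_t program[16];
        int pc = 0;

        /* Inner loop: one 48-bit referential word per matrix row instead of
         * one 96-bit wide word per row -- the wide word is stored only once
         * in the CLIW memory and only the addresses change per iteration.   */
        for (uint64_t row = 0; row < 8; ++row)
            program[pc++] = make_referential(MAC_ROW_CLIW,
                                             0x100 + 8 * row,  /* operand row base */
                                             0x400 + row);     /* result address   */

        for (int i = 0; i < pc; ++i)
            printf("insn %2d: 0x%012llx\n", i, (unsigned long long)program[i]);
        return 0;
    }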
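  • Reloading CLIW memory packets at run time can likewise be sketched in software. The packet names and sizes in the following C code are assumptions made for illustration; a region of the CLIW memory that always holds the same set of CLIW instructions could instead be left in ROM, as noted above.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    typedef struct { uint64_t lo, hi; } cliw_word_t;  /* 96 of 128 bits used */

    static cliw_word_t cliw_mem[64];   /* user-definable size of the CLIW memory 10 */

    /* Hypothetical run-time reload: a packet of CLIW words prepared for one
     * phase of the algorithm overwrites part of the CLIW memory, so several
     * phases can share the same small memory via the same CLIW indices.    */
    static void load_cliw_packet(unsigned base, const cliw_word_t *packet, size_t n)
    {
        memcpy(&cliw_mem[base], packet, n * sizeof packet[0]);
    }

    int main(void)
    {
        static const cliw_word_t fir_packet[4] = { {1, 0}, {2, 0}, {3, 0}, {4, 0} };
        static const cliw_word_t fft_packet[4] = { {5, 0}, {6, 0}, {7, 0}, {8, 0} };

        load_cliw_packet(0, fir_packet, 4);  /* before the first inner loop                  */
        /* ... run the first inner loop via referential instructions 0..3 ...                */
        load_cliw_packet(0, fft_packet, 4);  /* before a different inner loop                */
        /* ... the same CLIW indices now select the newly loaded operations ...              */

        printf("cliw_mem[0].lo = %llu\n", (unsigned long long)cliw_mem[0].lo);
        return 0;
    }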

Claims (12)

  1. A data processing apparatus for executing instructions of a program comprising a plurality of instructions, said apparatus having:
    a first instruction decoder (1) for sequentially fetching program instructions (11) of a first type from a first program memory (2) and for decoding instructions of said first type;
    an address decoder (4) for determining the address of data to be loaded from or written to a data memory (3);
    a plurality of computational units (61, 62, 63, 64) for executing operations upon data according to the interpretation of said first instruction decoder (1) and for providing the results of these operations;
    an execution logic unit (7) for providing said plurality of computational units (61, 62, 63, 64) with data and for controlling the operation of said plurality of computational units (61, 62, 63, 64) according to an instruction (11) of said first type;
    characterized in that
    said first instruction decoder (1) discriminates whether said apparatus is to execute a referential instruction (12) which initiates execution of an instruction (13) of a second type.
  2. The apparatus according to claim 1,
    characterized by
    a second instruction decoder (9) for fetching an instruction (13) of said second type and for decoding an instruction (13) of said second type.
  3. The apparatus according to claim 1 or 2,
    characterized in that
    said instruction (13) of said second type comprises a plurality of operators including data assignment information of operands and data assignment information of results.
  4. The apparatus according to any of the previous claims,
    characterized in that
    said execution logic unit (7) comprises means (8) for interpreting instructions (13) of said second type.
  5. The apparatus according to any of the previous claims,
    characterized in that
    said referential instruction (12) includes address information of data upon which said instruction (13) of said second type is to be executed.
  6. The apparatus according to any of the previous claims,
    characterized in that
    said apparatus is configured to allow for a pipe-lined execution of instructions (11, 12; 13) of any of the first or second type.
  7. The apparatus of any of claims 2 to 6,
    characterized in that
    instructions (13) of said second type are stored in a second program memory (10).
  8. A method of executing instructions for a data processing apparatus comprising a plurality of computational units (61, 62, 63, ..., 6n) which can be operated in parallel, and data registers (5), the method comprising the steps of:
    fetching (IF1, IF2, ..., IF5) an instruction (11) of a first type from a first program memory (2);
    decoding (ID1, ID2, ..., ID5) said instruction (11) of said first type for determining the operation to be executed;
    reading (OR1, OR2, ..., OR5) operands from a data memory (3) or from said data registers (5);
    executing (E1, E2, ..., E5) an operation upon said operands; and
    writing (OW1, OW2, ..., OW5) the results of said operation into said data memory (3) or into said data registers (5);
    characterized in that
    upon decoding of a referential instruction (12), which includes predetermined information so as to be decoded as such, the following steps are performed:
    fetching (CF1,CF2,...,CF5) an instruction (13) of a second type according to information included in said referential instruction (12); and
    decoding said instruction (13) of said second type for determining the operations to be executed in parallel.
  9. The method according to claim 8,
    characterized in that
    said referential instruction (12) includes address information, including operand addresses and result addresses, which information is decoded substantially at the time of decoding said referential instruction (12).
  10. The method according to any of claims 8 or 9,
    characterized in that
    said step of decoding a referential instruction (12) and said step of fetching an instruction (13) of said second type, which is associated with a particular referential instruction (12), are executed substantially simultaneously.
  11. The method according to any of claims 8 to 10,
    characterized in that
    said step of reading operands from a data memory (3) and said step of decoding an instruction (13) of said second type, which is associated with said operands, are executed substantially simultaneously.
  12. The method according to any of claims 8 to 11,
    characterized in that
    said method is carried out by a data processing apparatus in a pipe-lined manner.
EP98102925A 1998-02-19 1998-02-19 An apparatus for executing instructions of a program Expired - Lifetime EP0942359B1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP98102925A EP0942359B1 (en) 1998-02-19 1998-02-19 An apparatus for executing instructions of a program
JP2000532793A JP2003525476A (en) 1998-02-19 1999-02-04 Apparatus and method for executing program instructions
CN99803152.6A CN1114857C (en) 1998-02-19 1999-02-04 Device and method of executing instructions of a program
PCT/EP1999/000849 WO1999042922A1 (en) 1998-02-19 1999-02-04 An apparatus for and a method of executing instructions of a program
IL13624699A IL136246A0 (en) 1998-02-19 1999-02-04 An apparatus for, and a method of, executing instructions of a program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP98102925A EP0942359B1 (en) 1998-02-19 1998-02-19 An apparatus for executing instructions of a program

Publications (2)

Publication Number Publication Date
EP0942359A1 true EP0942359A1 (en) 1999-09-15
EP0942359B1 EP0942359B1 (en) 2012-07-04

Family

ID=8231450

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98102925A Expired - Lifetime EP0942359B1 (en) 1998-02-19 1998-02-19 An apparatus for executing instructions of a program

Country Status (5)

Country Link
EP (1) EP0942359B1 (en)
JP (1) JP2003525476A (en)
CN (1) CN1114857C (en)
IL (1) IL136246A0 (en)
WO (1) WO1999042922A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1050798A1 (en) * 1999-05-03 2000-11-08 STMicroelectronics SA Decoding instructions
WO2002042907A2 (en) * 2000-11-27 2002-05-30 Koninklijke Philips Electronics N.V. Data processing apparatus with multi-operand instructions
EP1335279A2 (en) * 2002-02-12 2003-08-13 IP-First LLC Apparatus and method for extending a microprocessor instruction set
EP1351133A2 (en) * 2002-04-02 2003-10-08 IP-First LLC Suppression of store into instruction stream detection
EP1351130A2 (en) * 2002-04-02 2003-10-08 IP-First LLC Apparatus and method for conditional instruction execution
EP1351132A2 (en) * 2002-04-02 2003-10-08 IP-First LLC Apparatus and method for selective control of results write back
EP1351131A2 (en) * 2002-04-02 2003-10-08 IP-First LLC Mechanism for extending the number of registers in a microprocessor
EP1343075A3 (en) * 2002-03-08 2005-08-03 IP-First LLC Apparatus and method for instruction set extension using prefixes
US7185180B2 (en) 2002-04-02 2007-02-27 Ip-First, Llc Apparatus and method for selective control of condition code write back
US7315921B2 (en) 2002-02-19 2008-01-01 Ip-First, Llc Apparatus and method for selective memory attribute control
US7328328B2 (en) 2002-02-19 2008-02-05 Ip-First, Llc Non-temporal memory reference control mechanism
US7380109B2 (en) 2002-04-15 2008-05-27 Ip-First, Llc Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor
US7529912B2 (en) 2002-02-12 2009-05-05 Via Technologies, Inc. Apparatus and method for instruction-level specification of floating point format
US7546446B2 (en) 2002-03-08 2009-06-09 Ip-First, Llc Selective interrupt suppression

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7043625B2 (en) 2000-03-27 2006-05-09 Infineon Technologies Ag Method and apparatus for adding user-defined execution units to a processor using configurable long instruction word (CLIW)
ATE498158T1 (en) 2000-11-06 2011-02-15 Broadcom Corp RECONFIGURABLE PROCESSING SYSTEM AND METHOD
CN100378653C (en) * 2005-01-20 2008-04-02 西安电子科技大学 8-bit RISC microcontroller with double arithmetic logic units
US11204768B2 (en) 2019-11-06 2021-12-21 Onnivation Llc Instruction length based parallel instruction demarcator

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2199896A5 (en) * 1972-09-15 1974-04-12 Ibm
EP0723220A2 (en) * 1995-01-17 1996-07-24 International Business Machines Corporation Parallel processing system and method using surrogate instructions

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"SELECTING PREDECODED INSTRUCTIONS WITH A SURROGATE", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 36, no. 6A, 1 June 1993 (1993-06-01), pages 35 - 38, XP000370750 *
J. A. BARBER ET AL.: "MLID Addressing", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 27, no. 3, October 1984 (1984-10-01), NEW YORK US, pages 1740 - 1745, XP002069681 *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6678818B1 (en) 1999-05-03 2004-01-13 Stmicroelectronics S.A. Decoding next instruction of different length without length mode indicator change upon length change instruction detection
EP1050798A1 (en) * 1999-05-03 2000-11-08 STMicroelectronics SA Decoding instructions
WO2002042907A2 (en) * 2000-11-27 2002-05-30 Koninklijke Philips Electronics N.V. Data processing apparatus with multi-operand instructions
WO2002042907A3 (en) * 2000-11-27 2002-08-15 Koninkl Philips Electronics Nv Data processing apparatus with multi-operand instructions
US7543134B2 (en) 2002-02-12 2009-06-02 Ip-First, Llc Apparatus and method for extending a microprocessor instruction set
US7181596B2 (en) 2002-02-12 2007-02-20 Ip-First, Llc Apparatus and method for extending a microprocessor instruction set
US7529912B2 (en) 2002-02-12 2009-05-05 Via Technologies, Inc. Apparatus and method for instruction-level specification of floating point format
EP1335279A3 (en) * 2002-02-12 2005-08-03 IP-First LLC Apparatus and method for extending a microprocessor instruction set
EP1335279A2 (en) * 2002-02-12 2003-08-13 IP-First LLC Apparatus and method for extending a microprocessor instruction set
US7647479B2 (en) 2002-02-19 2010-01-12 Ip First, Llc Non-temporal memory reference control mechanism
US7328328B2 (en) 2002-02-19 2008-02-05 Ip-First, Llc Non-temporal memory reference control mechanism
US7315921B2 (en) 2002-02-19 2008-01-01 Ip-First, Llc Apparatus and method for selective memory attribute control
US7546446B2 (en) 2002-03-08 2009-06-09 Ip-First, Llc Selective interrupt suppression
EP1343075A3 (en) * 2002-03-08 2005-08-03 IP-First LLC Apparatus and method for instruction set extension using prefixes
US7395412B2 (en) 2002-03-08 2008-07-01 Ip-First, Llc Apparatus and method for extending data modes in a microprocessor
EP1351133A3 (en) * 2002-04-02 2005-10-19 IP-First LLC Suppression of store into instruction stream detection
US7185180B2 (en) 2002-04-02 2007-02-27 Ip-First, Llc Apparatus and method for selective control of condition code write back
US7302551B2 (en) 2002-04-02 2007-11-27 Ip-First, Llc Suppression of store checking
EP1351130A3 (en) * 2002-04-02 2005-09-07 IP-First LLC Apparatus and method for conditional instruction execution
EP1351132A3 (en) * 2002-04-02 2005-09-07 IP-First LLC Apparatus and method for selective control of results write back
US7373483B2 (en) 2002-04-02 2008-05-13 Ip-First, Llc Mechanism for extending the number of registers in a microprocessor
US7380103B2 (en) 2002-04-02 2008-05-27 Ip-First, Llc Apparatus and method for selective control of results write back
EP1351131A3 (en) * 2002-04-02 2005-08-17 IP-First LLC Mechanism for extending the number of registers in a microprocessor
EP1351131A2 (en) * 2002-04-02 2003-10-08 IP-First LLC Mechanism for extending the number of registers in a microprocessor
EP1351132A2 (en) * 2002-04-02 2003-10-08 IP-First LLC Apparatus and method for selective control of results write back
EP1351130A2 (en) * 2002-04-02 2003-10-08 IP-First LLC Apparatus and method for conditional instruction execution
EP1351133A2 (en) * 2002-04-02 2003-10-08 IP-First LLC Suppression of store into instruction stream detection
US7380109B2 (en) 2002-04-15 2008-05-27 Ip-First, Llc Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor

Also Published As

Publication number Publication date
CN1291306A (en) 2001-04-11
WO1999042922A1 (en) 1999-08-26
CN1114857C (en) 2003-07-16
JP2003525476A (en) 2003-08-26
IL136246A0 (en) 2001-05-20
EP0942359B1 (en) 2012-07-04

Similar Documents

Publication Publication Date Title
EP0942359A1 (en) An apparatus for and a method of executing instructions of a program
US6356994B1 (en) Methods and apparatus for instruction addressing in indirect VLIW processors
US7028170B2 (en) Processing architecture having a compare capability
US5233694A (en) Pipelined data processor capable of performing instruction fetch stages of a plurality of instructions simultaneously
JP2550213B2 (en) Parallel processing device and parallel processing method
US6101592A (en) Methods and apparatus for scalable instruction set architecture with dynamic compact instructions
EP0652510B1 (en) Software scheduled superscalar computer architecture
US7039791B2 (en) Instruction cache association crossbar switch
EP1323036B1 (en) Storing stack operands in registers
US20040073773A1 (en) Vector processor architecture and methods performed therein
EP0772821B1 (en) Tagged prefetch and instruction decoder for variable length instruction set and method of operation
KR100266424B1 (en) Data processor having a microprogram rom
US7574583B2 (en) Processing apparatus including dedicated issue slot for loading immediate value, and processing method therefor
KR100316078B1 (en) Processor with pipelining-structure
JP3100721B2 (en) Apparatus and method for issuing multiple instructions
US6341348B1 (en) Software branch prediction filtering for a microprocessor
EP0982655A2 (en) Data processing unit and method for executing instructions of variable lengths
US20010016899A1 (en) Data-processing device
US7356673B2 (en) System and method including distributed instruction buffers for storing frequently executed instructions in predecoded form
US20020120830A1 (en) Data processor assigning the same operation code to multiple operations
US5815697A (en) Circuits, systems, and methods for reducing microprogram memory power for multiway branching
US6704857B2 (en) Methods and apparatus for loading a very long instruction word memory
KR101147190B1 (en) Run-time selection of feed-back connections in a multiple-instruction word processor
JP4828409B2 (en) Support for conditional actions in time stationery processors
JP5122277B2 (en) Data processing method, processing device, multiple instruction word set generation method, compiler program

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB IT

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17P Request for examination filed

Effective date: 19991019

17Q First examination report despatched

Effective date: 20000323

AKX Designation fees paid

Free format text: DE FR GB IT

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: INFINEON TECHNOLOGIES AG

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: INFINEON TECHNOLOGIES AG

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: LANTIQ DEUTSCHLAND GMBH

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 69842785

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G06F0009318000

Ipc: G06F0009300000

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 9/38 20060101ALI20111222BHEP

Ipc: G06F 9/318 20060101ALI20111222BHEP

Ipc: G06F 9/30 20060101AFI20111222BHEP

RTI1 Title (correction)

Free format text: AN APPARATUS FOR EXECUTING INSTRUCTIONS OF A PROGRAM

RIN1 Information on inventor provided before grant (corrected)

Inventor name: WEINGARTEN, ERAN

Inventor name: COHEN, ELLIOT

Inventor name: SHENDEROVITCH, GEORGY

Inventor name: YANNI, MENY

Inventor name: LAVI, YOAV

Inventor name: HERSHKO, ANAT

Inventor name: GRANOT, HAIM

Inventor name: BLUM, RIVKA

Inventor name: ROM, AMNON, C/O I.C. COM

Inventor name: KNUTH, ROBERT, DR.

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB IT

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 69842785

Country of ref document: DE

Effective date: 20120830

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20120704

26N No opposition filed

Effective date: 20130405

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 69842785

Country of ref document: DE

Effective date: 20130405

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20150219

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20150219

Year of fee payment: 18

Ref country code: GB

Payment date: 20150218

Year of fee payment: 18

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 69842785

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20160219

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20161028

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160901

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160219

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160229