WO1993001547A1 - Risc microprocessor architecture implementing fast trap and exception state - Google Patents

Risc microprocessor architecture implementing fast trap and exception state

Info

Publication number
WO1993001547A1
Authority
WO
WIPO (PCT)
Prior art keywords
said
instruction
unit
execution
Prior art date
Application number
PCT/JP1992/000872
Other languages
French (fr)
Inventor
Le Trong Nguyen
Derek J. Lentz
Yoshiyuki Miyayama
Sanjiv Garg
Yasuaki Hagiwara
Johannes Wang
Quang H. Trang
Original Assignee
Seiko Epson Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seiko Epson Corporation
Priority to JP50215493A (patent JP3333196B2)
Priority to AT92914386T (patent ATE188786T1)
Priority to KR1019930700689A (patent KR100294276B1)
Priority to DE69230554T (patent DE69230554T2)
Priority to EP92914386A (patent EP0547240B1)
Publication of WO1993001547A1
Priority to HK98116066A (patent HK1014783A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3861Recovery, e.g. branch miss-prediction, exception handling
    • G06F9/3865Recovery, e.g. branch miss-prediction, exception handling using deferred exception handling, e.g. exception flags
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7842Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017Runtime instruction translation, e.g. macros
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32Address formation of the next instruction, e.g. by incrementing the instruction counter
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/322Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802Instruction prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3804Instruction prefetching for branches, e.g. hedging, branch folding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3814Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding
    • G06F9/384Register renaming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3854Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3858Result writeback, i.e. updating the architectural state or memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3861Recovery, e.g. branch miss-prediction, exception handling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/461Saving or restoring of program or task context
    • G06F9/462Saving or restoring of program or task context with multiple register sets
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4812Task transfer initiation or dispatching by interrupt, e.g. masked

Definitions

  • the present invention relates to microprocessor architectures, and more particularly, to interrupt and exception handling in microprocessors.
  • In a typical microprocessor, instructions are generally executed in sequence unless a control flow varying instruction is encountered or an exception occurs. With respect to exceptions, facilities are included for changing the control flow upon the occurrence of particular events which may or may not be related to particular instructions in the instruction stream.
  • a microprocessor may include an interrupt request (IRQ) lead which, when activated by an external device, causes the microprocessor to save certain information relating to the current state of the machine, including an indication of the address of the next instruction to be executed, and then immediately transfer control to an interrupt handler which begins at some predetermined address.
  • IRQ interrupt request
  • microprocessor may also save information related to the current state of the machine and transfer control to an exception handler.
  • some microprocessors include a "software trap" instruction in their instruction set, which also causes the microprocessor to save information concerning the state of the machine and transfer control to an exception handler.
  • interrupt, trap, fault and exception are used interchangeably.
  • an externally generated interrupt always causes the microprocessor to transfer control to the same interrupt handler entry point. If several external devices are present and able to activate the interrupt request lead, the interrupt handler must first determine which device caused the interrupt and then transfer control to a portion of code to handle that particular device.
  • the Intel 8048 microcontroller includes an INT input which, when activated, causes the microcontroller to transfer control to absolute memory location 3.
  • the 8048 also includes a RESET input which, when activated, causes the microcontroller to transfer control to absolute memory location 0. It also includes an internal timer/counter which can generate interrupts which cause a transfer of control to absolute memory location 7.
  • Other microprocessors include "interrupt level" leads in addition to the interrupt request lead.
  • In these microprocessors, when an external device activates the interrupt request lead, it also places a trap number, unique to that particular device, on the interrupt level lines. The internal hardware of the microprocessor then transfers control, or "vectors", to any of several interrupt handlers, each corresponding to a different trap number. Similarly, some microprocessors have only a single predetermined entry point for all routines written to handle internally generated exceptions, and others have facilities for vectoring automatically to a routine dependent upon a trap number defined for each particular type of internal exception that might occur.
  • a number of different techniques were used to determine the entry point of the appropriate handler.
  • a table of addresses was created, beginning at a particular table base address which was either fixed or definable by the user. Each entry in the table was the same length as the length of an address, for example two or four bytes long, and contained the entry point for a corresponding trap number.
  • the microprocessor first determined the base address of the table, then added m times the trap number (where m is the number of bytes in each entry), and then loaded the information stored at the resulting address into the program counter (PC) to thereby transfer control to the routine beginning at the address specified in the table entry.
  • PC program counter
  • an entire branch instruction was stored in each entry in the table, instead of merely the address of a handler.
  • the number of bytes in each entry was equal to the number of bytes in a branch instruction.
  • the microprocessor would first determine the table base address, add m times the trap number, and simply load the result into the program counter. The first instruction then executed would be the branch instruction in the table, and control would finally transfer to the appropriate exception handler.
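The two prior-art lookups described above can be sketched as follows. This is an illustrative model only: the dictionary standing in for memory, the table base address 0x1000, the entry size m = 4 and the function names are assumptions made for the example, not values taken from the source.

```python
# Illustrative model of the two prior-art dispatch schemes (all values hypothetical).

M = 4                # assumed number of bytes per table entry
TABLE_BASE = 0x1000  # assumed table base address


def entry_address(trap_number, base=TABLE_BASE, m=M):
    """Address of the table entry for a trap: base + m * trap number."""
    return base + m * trap_number


def dispatch_address_table(memory, trap_number):
    """Scheme 1: the entry holds a handler address, which is loaded into the PC.

    The handler cannot start until this extra memory read has completed.
    """
    return memory[entry_address(trap_number)]


def dispatch_branch_table(memory, trap_number):
    """Scheme 2: the PC is set to the entry itself, which holds a branch.

    The handler cannot start until this extra branch has been fetched and executed.
    """
    pc = entry_address(trap_number)
    opcode, target = memory[pc]
    if opcode != "branch":
        raise ValueError("table entry does not hold a branch instruction")
    return target
```

In both schemes one additional fetch stands between the trap and the first useful handler instruction, which is the delay the source goes on to criticize.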
  • the hardware automatically stores the contents of the registers on a stack before transferring control to the handler.
  • This technique is also inadequate since it increases hardware complexity, and also can delay transfer to the handler significantly.
  • the delays caused by existing techniques for protecting the contents of registers when a trap handler is invoked can be unacceptable in a high performance microprocessor.
  • a microprocessor architecture is employed which alleviates many of the above deficiencies in prior art systems.
  • a "fast trap" exception dispatching technique is employed by which an entire handler can be stored in a single vector address table entry.
  • Each table entry has enough space for at least two instructions, and preferably significantly more, so that when a fast trap occurs, the microprocessor need only branch to an address determined by concatenating m times the trap number to a base address. The delay required to fetch an entry point address from the table, or to fetch and execute a preliminary branch instruction is eliminated.
  • the microprocessor may also include other, less time efficient, vectoring techniques for less critical types of traps.
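A minimal sketch of the fast trap address formation, assuming the table entry size m is a power of two and the base address is aligned to the whole table, so that "concatenating" reduces to a bitwise OR. The eight-instruction entry size and the four-bit trap number are hypothetical; the source only requires room for at least two instructions per entry.

```python
INSTRUCTION_BYTES = 4    # instructions are 32 bits wide per the source
ENTRY_INSTRUCTIONS = 8   # assumed; the source only requires at least two
ENTRY_BYTES = INSTRUCTION_BYTES * ENTRY_INSTRUCTIONS  # m = 32, a power of two
TRAP_NUMBER_BITS = 4     # assumed: 16 fast trap numbers


def fast_trap_entry(base, trap_number):
    """Return the address of the first handler instruction for a fast trap.

    No table read and no preliminary branch: the handler code itself
    starts at the computed address.
    """
    shift = ENTRY_BYTES.bit_length() - 1          # log2(m)
    table_span = ENTRY_BYTES << TRAP_NUMBER_BITS  # bytes covered by the table
    if base % table_span != 0:
        raise ValueError("base must be aligned to the table size")
    if not 0 <= trap_number < (1 << TRAP_NUMBER_BITS):
        raise ValueError("trap number out of range")
    # With an aligned base, OR-ing is the same as adding m * trap_number.
    return base | (trap_number << shift)
```

Under these assumptions, `fast_trap_entry(0x4000, 3)` yields 0x4060, that is, the base plus 3 entries of 32 bytes each.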
  • when a trap is encountered, the processor enters an interrupted state which automatically shifts a number of shadow registers to the foreground and shifts a corresponding set of foreground registers into the background. Register contents are not transferred; rather, the shadow registers are simply made available in place of the normal registers.
  • the handler has a set of registers immediately available for use without any need to be concerned about destroying data needed for the main instruction stream.
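The shadow register swap can be illustrated with a small model in which entering the interrupted state changes only a bank selector and copies no data. The class name, the bank size of eight registers and the method names are assumptions for illustration; the source does not state how many registers are shadowed.

```python
class ShadowedRegisterFile:
    """Two register banks; a selector decides which one is in the foreground."""

    def __init__(self, size=8):  # size is an assumed value
        self._banks = [[0] * size, [0] * size]  # bank 0: normal, bank 1: shadow
        self._active = 0

    def read(self, index):
        return self._banks[self._active][index]

    def write(self, index, value):
        self._banks[self._active][index] = value

    def enter_trap(self):
        # No register contents are moved; only the selector changes.
        self._active = 1

    def return_from_trap(self):
        self._active = 0
```

A handler that writes register 2 while the shadow bank is selected leaves the main stream's register 2 untouched, so nothing has to be saved to or restored from a stack.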
  • the above-mentioned HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE application describes an advanced microprocessor which prefetches instructions prior to the time they are executed, can handle out-of-order return of instruction prefetch requests, can execute more than one instruction during the same execution time, and can also execute instructions out of order relative to their sequence in the instruction stream.
  • Another aspect of the present invention includes a mechanism to maintain the preciseness of synchronous exceptions which occur relative to instructions prior to and during the time they are executed.
  • the microprocessor architecture described in that application further includes facilities for handling a separate procedural instruction flow called via a procedural, or emulation, instruction in the main instruction flow.
  • the transfer of control to a procedural instruction flow is accomplished without flushing any instructions already prefetched in the main instruction flow, by having a separate emulation instruction prefetch queue.
  • the interrupted state remains available whether the processor is executing from the main instruction stream or a procedural instruction stream, and the processor maintains an indication of which instruction stream to return to upon a return from trap.
  • FIG. 1 is a simplified block diagram of the preferred microprocessor architecture implementing the present invention
  • FIG. 2 is a detailed block diagram of the instruction fetch unit constructed in accordance with the present invention
  • Figure 3 is a block diagram of the program counter logic unit constructed in accordance with the present invention
  • Figure 4 is a further detailed block diagram of the program counter data and control path logic
  • Figure 5 is a simplified block diagram of the instruction execution unit of the present invention
  • Figure 6a is a simplified block diagram of the register file architecture utilized in a preferred embodiment of the present invention.
  • Figure 6b is a graphic illustration of the storage register format of the temporary buffer register file as utilized in a preferred embodiment of the present invention;
  • Figure 6c is a graphic illustration of the primary and secondary instruction sets as present in the last two stages of the instruction FIFO unit of the present invention.
  • Figures 7a-c provide a graphic illustration of the reconfigurable states of the primary integer register set as provided in accordance with a preferred embodiment of the present invention
  • Figure 8 is a graphic illustration of a reconfigurable floating point and secondary integer register set as provided in accordance with the preferred embodiment of the present invention.
  • Figure 9 is a graphic illustration of a tertiary boolean register set as provided in a preferred embodiment of the present invention
  • Figure 10 is a detailed block diagram of the primary integer processing data path portion of the instruction execution unit constructed in accordance with the preferred embodiment of the present invention
  • Figure 11 is a detailed block diagram of the primary floating point data path portion of the instruction execution unit constructed in accordance with a preferred embodiment of the present invention.
  • Figure 12 is a detailed block diagram of the boolean operation data path portion of the instruction execution unit as constructed in accordance with the preferred embodiment of the present invention.
  • Figure 13 is a detailed block diagram of a load/store unit constructed in accordance with the preferred embodiment of the present invention
  • Figure 14 is a timing diagram illustrating the preferred sequence of operation of a preferred embodiment of the present invention in executing multiple instructions in accordance with the present invention
  • Figure 15 is a simplified block diagram of the virtual memory control unit as constructed in accordance with the preferred embodiment of the present invention
  • Figure 16 is a graphic representation of the virtual memory control algorithm as utilized in a preferred embodiment of the present invention.
  • Figure 17 is a simplified block diagram of the cache control unit as utilized in a preferred embodiment of the present invention.
  • the architecture 100 of the present invention is generally shown in Figure 1.
  • An Instruction Fetch Unit (IFU) 102 and an Instruction Execution Unit (IEU) 104 are the principal operative elements of the architecture 100.
  • a Virtual Memory Unit (VMU) 108, Cache Control Unit (CCU) 106, and Memory Control Unit (MCU) 110 are provided to directly support the function of the IFU 102 and IEU 104.
  • a Memory Array Unit (MAU) 112 is also provided as a generally essential element for the operation of the architecture 100, though the MAU 112 does not directly exist as an integral component of the architecture 100.
  • the IFU 102, IEU 104, VMU 108, CCU 106, and MCU 110 are fabricated on a single silicon die utilizing a conventional 0.8 micron design rule low-power CMOS process and comprising some 1,200,000 transistors.
  • the standard processor or system clock speed of the architecture 100 is 40 MHz.
  • the internal processor clock speed is 160 MHz.
  • the IFU 102 is primarily responsible for the fetching of instructions, the buffering of instructions pending execution by the IEU 104, and, generally, the calculation of the next virtual address to be used for the fetching of the next instructions.
  • instructions are each fixed at a length of 32 bits.
  • Instruction sets, or "buckets" of four instructions, are fetched by the IFU 102 simultaneously from an instruction cache 132 within the CCU 106 via a 128 bit wide instruction bus 114.
  • the transfer of instruction sets is coordinated between the IFU 102 and CCU 106 by control signals provided via a control bus 116.
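As a sketch of the arithmetic above, four fixed-length 32 bit instructions fill exactly one 128 bit bus transfer. The function name and the choice to treat the least significant word as the first instruction are assumptions; the source does not specify the ordering within a set.

```python
INSTRUCTION_BITS = 32      # fixed instruction length per the source
INSTRUCTIONS_PER_SET = 4   # one "bucket" per the source
SET_BITS = INSTRUCTION_BITS * INSTRUCTIONS_PER_SET  # 128, the bus width


def split_instruction_set(bucket):
    """Split one 128 bit instruction set into four 32 bit instruction words.

    Assumption: the least significant word is the first instruction.
    """
    if not 0 <= bucket < (1 << SET_BITS):
        raise ValueError("instruction set must fit in 128 bits")
    mask = (1 << INSTRUCTION_BITS) - 1
    return [(bucket >> (INSTRUCTION_BITS * i)) & mask
            for i in range(INSTRUCTIONS_PER_SET)]
```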
  • the virtual address of an instruction set to be fetched is provided by the IFU 102 via an IFU combined arbitration, control and address bus 118 onto a shared arbitration, control and address bus 120 further coupled between the IEU 104 and VMU 108.
  • Arbitration for access to the VMU 108 arises from the fact that both the IFU 102 and IEU 104 utilize the VMU 108 as a common, shared resource.
  • the low order bits defining an address within a physical page of the virtual address are transferred directly by the IFU 102 to the Cache Control Unit 106 via the control lines 116.
  • the virtualizing, high order bits of the virtual address supplied by the IFU 102 are provided by the address portion of the buses 118, 120 to the VMU 108 for translation into a corresponding physical page address.
  • this physical page address is transferred directly from the VMU 108 to the Cache Control Unit 106 via the address control lines 122 one-half internal processor cycle after the translation request is placed with the VMU 108.
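The address split described above can be sketched as follows: the low-order bits bypass translation and go straight to the cache control unit, while the high-order bits are translated into a physical page address. The 12 bit page offset (4 KiB pages) and the dictionary used as a page table are assumptions; the source does not give the page size here.

```python
PAGE_OFFSET_BITS = 12  # assumed page size of 4 KiB; not stated in the source


def split_virtual_address(virtual_address):
    """Return (virtual page number, page offset).

    The offset is the part sent directly to the CCU; the page number is
    the "virtualizing" part sent to the VMU for translation.
    """
    offset_mask = (1 << PAGE_OFFSET_BITS) - 1
    return virtual_address >> PAGE_OFFSET_BITS, virtual_address & offset_mask


def physical_address(virtual_address, page_table):
    """Combine the translated page address with the untranslated offset."""
    page_number, offset = split_virtual_address(virtual_address)
    physical_page = page_table[page_number]  # stands in for the VMU lookup
    return (physical_page << PAGE_OFFSET_BITS) | offset
```

Because the offset needs no translation, the cache lookup can begin on the low-order bits while the VMU is still producing the physical page address.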
  • the instruction stream fetched by the IFU 102 is, in turn, provided via an instruction stream bus 124 to the IEU 104. Control signals are exchanged between the IFU 102 and the IEU 104 via control lines 126. In addition, certain instruction fetch addresses, typically those requiring access to the register file present within the IEU 104, are provided back to the IFU via a target address return bus within the control lines 126.
  • the IEU 104 stores and retrieves data with respect to a data cache 134 provided within the CCU 106 via an
  • the entire physical address for IEU data accesses is provided via an address portion of the control bus 128 to the CCU 106.
  • the control bus 128 also provides for the exchange of control signals between the IEU 104 and CCU 106 for managing data transfers.
  • the IEU 104 utilizes the VMU 108 as a resource for converting virtual data addresses into physical data addresses suitable for submission to the CCU 106.
  • the virtualizing portion of the data address is provided via the arbitration, control and address bus 120 to the VMU 108.
  • the VMU 108 returns the corresponding physical address via the bus 120 to the IEU 104.
  • the IEU 104 requires the physical address for use in ensuring that load/store operations occur in proper program stream order.
  • the CCU 106 performs the generally conventional high-level function of determining whether physical address defined requests for data can be satisfied from the instruction and data caches 132, 134, as appropriate. Where the access request can be properly fulfilled by access to the instruction or data caches 132, 134, the CCU 106 coordinates and performs the data transfer via the data buses 114, 128.
  • the CCU 106 provides the corresponding physical address to the MCU 110 along with sufficient control information to identify whether a read or write access of the MAU 112 is desired, the source or destination cache 132, 134 of the CCU 106 for each request, and additional identifying information to allow the request operation to be correlated with the ultimate data request as issued by the IFU 102 or IEU 104.
  • the MCU 110 preferably includes a port switch unit 142 that is coupled by a uni-directional data bus 136 with the instruction cache 132 of the CCU 106 and a bi-directional data bus 138 to the data cache 134.
  • the port switch 142 is, in essence, a large multiplexer allowing a physical address obtained from the control bus 140 to be routed to any one of a number of ports P0 - PN 146 and the bi-directional transfer of data from the ports to the data buses 136, 138.
  • Each memory access request processed by the MCU 110 is associated with one of the ports 146 for purposes of arbitrating for access to the main system memory bus 162 as required for an access of the MAU 112.
  • the MCU provides control information via the control bus 140 to the CCU 106 to initiate the transfer of data between either the instruction or data cache 132, 134 and MAU 112 via the port switch 142 and the corresponding one of the ports 146.
  • the MCU 110 does not actually store or latch data in transit between the CCU 106 and MAU 112. This is done to minimize latency in the transfer and to obviate the need for tracking or managing data that may be uniquely present in the MCU 110.
  • the IFU data path begins with the instruction bus 114 that receives instruction sets for temporary storage in a prefetch buffer 260.
  • An instruction set from the prefetch buffer 260 is passed through an IDecode unit 262 and then to an IFIFO unit 264. Instruction sets stored in the last two stages of the instruction FIFO 264 are continuously available, via the data buses 278, 280, to the IEU 104.
  • the prefetch buffer unit 260 receives a single instruction set at a time from the instruction bus 114.
  • the full 128 bit wide instruction set is generally written in parallel to one of four 128 bit wide prefetch buffer locations in a Main Buffer (MBUF) 188 portion of the prefetch buffer 260.
  • MBUF Main Buffer
  • Up to four additional instruction sets may be similarly written into two 128 bit wide Target Buffer (TBUF) 190 prefetch buffer locations or to two 128 bit wide Procedural Buffer (EBUF) 192 prefetch buffer locations.
  • TBUF Target Buffer
  • EBUF Procedural Buffer
  • an instruction set in any one of the prefetch buffer locations within the MBUF 188, TBUF 190 or EBUF 192 may be transferred to the prefetch buffer output bus 196.
  • a direct fall through instruction set bus 194 is provided to connect the instruction bus 114 directly with the prefetch buffer output bus 196, thereby bypassing the MBUF, TBUF and EBUF 188, 190, 192.
  • the MBUF 188 is utilized to buffer instruction sets in the nominal or main instruction stream.
  • the TBUF 190 is utilized to buffer instruction sets fetched from a tentative target branch instruction stream. Consequently, the prefetch buffer unit 260 allows both possible instruction streams following a conditional branch instruction to be prefetched. This facility obviates the latency for further accesses to at least the CCU 106, if not the substantially greater latency of a MAU 112, for obtaining the correct next instruction set for execution following a conditional branch instruction regardless of the particular instruction stream eventually selected upon resolution of the conditional branch instruction.
  • the provision of the MBUF 188 and TBUF 190 allows the instruction fetch unit 102 to prefetch both potential instruction streams and, as will be discussed below in relationship to the instruction execution unit 104, to further allow execution of the presumed correct instruction stream.
  • any instruction sets in the TBUF 190 may be simply invalidated.
  • the instruction prefetch buffer unit 260 provides for the direct, lateral transfer of those instruction sets from the TBUF 190 to respective buffer locations in the MBUF 188.
  • the prior MBUF 188 stored instruction sets are effectively invalidated by being overwritten by the TBUF 190 transferred instruction sets. Where there is no TBUF instruction set transferred to an MBUF location, that location is simply marked invalid.
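The two outcomes of resolving a conditional branch can be modeled with lists standing in for the buffer locations, where `None` marks an invalid location. The function name and the mapping of TBUF location i onto MBUF location i are assumptions; the source says only that sets move to "respective" MBUF locations.

```python
def resolve_branch(mbuf, tbuf, taken):
    """Return the (MBUF, TBUF) contents after a conditional branch resolves.

    mbuf: four locations, tbuf: two locations (sizes per the source).
    """
    if taken:
        # Lateral transfer: TBUF sets overwrite MBUF locations; MBUF
        # locations that receive no TBUF set are marked invalid.
        new_mbuf = [tbuf[i] if i < len(tbuf) else None
                    for i in range(len(mbuf))]
    else:
        # Main stream continues; the MBUF is left as it was.
        new_mbuf = list(mbuf)
    # Either way the TBUF holds nothing valid afterwards.
    return new_mbuf, [None] * len(tbuf)
```

In neither case is a new cache access needed to obtain the next instruction set, which is the latency saving the source attributes to prefetching both streams.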
  • the EBUF 192 is provided as another, alternate prefetch path through the prefetch buffer 260.
  • the EBUF 192 is preferably utilized in the prefetching of an alternate instruction stream that is used to implement an operation specified by a single instruction, a "procedural" instruction, encountered in the MBUF 188 instruction stream.
  • complex or extended instructions can be implemented through software routines, or procedures, and processed through the prefetch buffer unit 260 without disturbing the instruction streams already prefetched into the MBUF 188.
  • While the present invention generally permits handling of procedural instructions that are first encountered in the TBUF 190, prefetching of the procedural instruction stream is held until all prior pending conditional branch instructions are resolved.
  • This allows conditional branch instructions occurring in the procedural instruction stream to be consistently handled through the use of the TBUF 190.
  • the target instruction sets will have been prefetched into the TBUF 190 and can be simply laterally transferred to the EBUF 192.
  • each of the MBUF 188, TBUF 190 and EBUF 192 is coupled to the prefetch buffer output bus 196 so as to provide any instruction set stored by the prefetch unit onto the output bus 196.
  • a flow through bus 194 is provided to directly transfer an instruction set from the instruction bus 114 directly to the output bus 196.
  • the prefetch buffers within the MBUF 188, TBUF 190, EBUF 192 do not directly form a FIFO structure. Instead, connecting every buffer location to the output bus 196 allows substantial freedom in the prefetch ordering of instruction sets retrieved from the instruction cache 132. That is, the instruction fetch unit 102 generally determines and requests instruction sets in the appropriate instruction stream order of instructions. However, instruction sets are allowed to return to the IFU 102 out of order, as appropriate to match the circumstances where some requested instruction sets are available and accessible from the CCU 106 alone and others require an access of the MAU 112.
  • Although instruction sets may not be returned in order to the prefetch buffer unit 260, the sequence of instruction sets output on the output bus 196 must generally conform to the order of instruction set requests issued by the IFU 102, that is, the in-order instruction stream sequence, subject to, for example, tentative execution of a target branch stream.
  • the IDecode unit 262 receives the instruction sets, generally one per cycle, IFIFO unit 264 space permitting, from the prefetch buffer output bus 196. Each set of four instructions that make up a single instruction set is decoded in parallel by the IDecode unit 262. While relevant control flow information is extracted via lines 318 for the benefit of the control path portion of the IFU 102, the contents of the instruction set are not altered by the IDecode unit 262.
  • Instruction sets from the IDecode unit 262 are provided onto a 128 bit wide input bus 198 of the IFIFO unit 264.
  • the IFIFO unit 264 consists of a sequence of master/slave registers 200, 204, 208, 212, 216, 220, 224. Each register is coupled to its successor to allow the contents of the master registers 200, 208, 216 to be transferred during a first half internal processor cycle of FIFO operation to the slave registers 204, 212, 220 and then to the next successive master register 208, 216, 224 during the succeeding half-cycle of operation.
  • the input bus 198 is connected to the input of each of the master registers 200, 208, 216, 224 to allow loading of an instruction set from the IDecode unit 262 directly into a master register during the second half-cycle of FIFO operation.
  • loading of a master register from the input bus 198 need not occur simultaneously with a FIFO shift of data within the IFIFO unit 264. Consequently, the IFIFO unit 264 can be continuously filled from the input bus 198 regardless of the current depth of instruction sets stored within the instruction FIFO unit 264 and, further, independent of the FIFO shifting of data through the IFIFO unit 264.
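The loading and shifting behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, the slave registers and half-cycle timing are not modelled, and a shift is assumed to retire the deepest entry.

```python
class IFifo:
    """Toy model of the four master registers (200, 208, 216, 224 in the text)."""

    DEPTH = 4

    def __init__(self):
        # Index 0 is the shallowest master register, index 3 the deepest.
        self.slots = [None] * self.DEPTH

    def load(self, instruction_set):
        """Write into the deepest available (empty) master register."""
        for index in range(self.DEPTH - 1, -1, -1):
            if self.slots[index] is None:
                self.slots[index] = instruction_set
                return index
        return None  # FIFO full: the instruction set must be held upstream

    def shift(self):
        """Retire the deepest entry and move every other entry one step deeper."""
        retired = self.slots[-1]
        self.slots = [None] + self.slots[:-1]
        return retired

    def outputs(self):
        """The two deepest registers drive the two instruction set output buses."""
        return self.slots[-1], self.slots[-2]
```

Because `load` and `shift` are separate operations in this model, a load can be issued whether or not a shift happens in the same step, which mirrors the independence of filling and shifting noted in the text.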
  • Each of the master/slave registers 200, 204, 208, 212, 216, 220, 224 in addition to providing for the full parallel storage of a 128 bit wide instruction set, also provides for the storage of several bits of control information in the respective control registers 202, 206, 210, 214, 218, 222, 226.
  • the preferred set of control bits includes exception miss and exception modify (VMU), no memory (MCU), and branch bias, stream, and offset (IFU).
  • the output of instruction sets from the IFIFO unit 264 is obtained simultaneously from the last two master registers 216, 224 on the I_Bucket_0 and I_Bucket_1 instruction set output buses 278, 280.
  • the corresponding control register information is provided on the IBASV0 and IBASV1 control field buses 282, 284. These output buses 278, 282, 280, 284 are all provided as the instruction stream bus 124 to the IEU 104.
  • the control path for the IFU 102 directly supports the operation of the prefetch buffer unit 260, IDecode unit 262 and IFIFO unit 264.
  • a prefetch control logic unit 266 primarily manages the operation of the prefetch buffer unit 260.
  • the prefetch control logic unit 266, and the IFU 102 in general, receives the system clock signal via the clock line 290 for synchronizing IFU operations with those of the IEU 104, CCU 106 and VMU 108. Control signals appropriate for the selection and writing of instruction sets into the MBUF 188, TBUF
  • a number of control signals are provided on the control lines 316 to the prefetch control logic unit 266. Specifically, a fetch request control signal is provided to initiate a prefetch operation. Other control signals provided on the control line 316 identify the intended destination of the requested prefetch operation as being the MBUF 188, TBUF 190 or EBUF 192.
  • in response to a prefetch request, the prefetch control logic unit 266 generates an ID value and determines whether the prefetch request can be posted to the CCU 106. Generation of the ID value is accomplished through the use of a circular four-bit counter.
  • the use of a four-bit counter is significant in three regards. The first is that, typically a maximum of nine instruction sets may be active at one time in the prefetch buffer unit 260; four instruction sets in the MBUF 188, two in the TBUF 190, two in the EBUF 192 and one provided directly to the IDecode unit 262 via the flow through bus 194. Secondly, instruction sets include four instructions of four bytes each. Consequently, the least significant four bits of any address selecting an instruction set for fetching are superfluous. Finally, the prefetch request ID value can be easily associated with a prefetch request by insertion as the least significant four bits of the prefetch request address; thereby reducing the total number of address lines required to interface with the CCU 106.
  • the architecture 100 provides for the return of the ID request value with the return of instruction sets from the CCU 106.
  • the out-of-order instruction set return capability may result in exhaustion of the sixteen unique IDs.
  • a combination of conditional instructions executed out-of-order, resulting in additional prefetches and instruction sets requested but not yet returned can lead to potential re-use of an ID value. Therefore, the four-bit counter is preferably held, and no further instruction set prefetch requests issued, where the next ID value would be the same as that associated with an as yet outstanding fetch request or another instruction set then pending in the prefetch buffer 260.
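The ID scheme described above (a circular four-bit counter that is held when the next value is still in use, with the ID carried in the otherwise unused low four address bits) can be sketched as follows. The class, method and function names are hypothetical, and tracking in-use IDs with a set is an assumption made for illustration rather than the mechanism of the described hardware.

```python
class PrefetchIdCounter:
    """Circular four-bit ID generator that holds rather than reuse a live ID."""

    def __init__(self):
        self.value = 0
        self.in_use = set()  # IDs of outstanding or still-buffered instruction sets

    def next_id(self):
        candidate = (self.value + 1) & 0xF  # wrap around after 15
        if candidate in self.in_use:
            return None  # hold the counter; no prefetch request is issued
        self.value = candidate
        self.in_use.add(candidate)
        return candidate

    def release(self, ident):
        """Free an ID once its instruction set is no longer pending."""
        self.in_use.discard(ident)


def tag_address(set_address, ident):
    """Place the ID in the low four bits of a 16-byte instruction set address."""
    return (set_address & ~0xF) | (ident & 0xF)
```

With sixteen IDs and typically at most nine instruction sets active, the hold path in `next_id` should be rare; it only triggers when out-of-order returns leave an old ID live long enough for the counter to wrap around to it.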
  • the prefetch control logic unit 266 directly manages a prefetch status array 268 which contains status storage locations logically corresponding to each instruction set prefetch buffer location within the MBUF 188, TBUF 190 and EBUF 192.
  • the prefetch control logic unit 266, via selection and data lines 306, can scan, read and write data to the status register array 268.
  • a main buffer register 308 provides for storage of four, four-bit ID values (MB ID), four single-bit reserved flags (MB RES) and four single-bit valid flags (MB VAL), each corresponding by logical bit-position to the respective instruction set storage locations within the MBUF 188.
  • a target buffer register 310 and extended buffer register 312 each provide for the storage of two four-bit ID values (TB ID, EB ID), two single-bit reserved flags (TB RES, EB RES), and two single-bit valid flags (TB VAL, EB VAL).
  • a flow through status register 314 provides for the storage of a single four-bit ID value (FT ID), a single reserved flag bit (FT RES), and a single valid flag bit (FT VAL).
  • the status register array 268 is first scanned and, as appropriate, updated by the prefetch control logic unit 266 each time a prefetch request is placed with the CCU 106 and subsequently scanned and updated each time an instruction set is returned. Specifically, upon receipt of the prefetch request signal via the control lines 316, the prefetch control logic unit 266 increments the current circular counter generated ID value, scans the status register array 268 to determine whether the ID value is available for use and whether a prefetch buffer location of the type specified by the prefetch request signal is available, examines the state of the CCU IBUSY control line 300 to determine whether the CCU 106 can accept a prefetch request and, if so, asserts a CCU IREAD control signal on the control line 298, and places the incremented ID value on the CCU ID out bus 294 to the CCU 106.
  • a prefetch storage location is available for use where both of the corresponding reserved and valid status flags are false.
  • the prefetch request ID is written into the ID storage location within the status register array 268 corresponding to the intended storage location within the MBUF 188, TBUF 190, or EBUF 192 concurrent with the placement of the request with the CCU 106.
  • the corresponding reserved status flag is set true.
  • the CCU IREADY signal is asserted on control line 302 and the corresponding instruction set ID is provided on the CCU
  • the valid status flag in the corresponding status register array is set true.
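The status bookkeeping described above can be modelled in a few lines. This is a minimal sketch with hypothetical names; the flow through status register is omitted, and the reserved flag is left unchanged when an instruction set returns because the text here does not say otherwise.

```python
from dataclasses import dataclass


@dataclass
class StatusEntry:
    ident: int = 0
    reserved: bool = False
    valid: bool = False


def make_status_array():
    """Four MBUF locations and two each for the TBUF and EBUF, as in the text."""
    return {
        "MBUF": [StatusEntry() for _ in range(4)],
        "TBUF": [StatusEntry() for _ in range(2)],
        "EBUF": [StatusEntry() for _ in range(2)],
    }


def post_request(array, buffer_name, ident):
    """Reserve the first location whose reserved and valid flags are both false."""
    for index, entry in enumerate(array[buffer_name]):
        if not entry.reserved and not entry.valid:
            entry.ident = ident
            entry.reserved = True
            return index
    return None  # no location of the requested type is available


def accept_return(array, ident):
    """Match a returned ID to its reserved location and mark that location valid."""
    for name, entries in array.items():
        for index, entry in enumerate(entries):
            if entry.reserved and not entry.valid and entry.ident == ident:
                entry.valid = True
                return name, index
    return None
```

Matching on the returned ID rather than on arrival order is what lets instruction sets come back out of order and still land in the location that was reserved for them.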
  • the PC logic unit 270 tracks the virtual address of the MBUF 188, TBUF 190 and EBUF 192 instruction streams through the entirety of the IFU 102. In performing this function, the PC logic block 270 both controls and operates from the IDecode unit 262. Specifically, portions of the instructions decoded by the IDecode unit 262 potentially relevant to a change in the program instruction stream flow are provided on the bus 318 to a control flow detection unit 274 and directly to the PC logic block 270.
  • the control flow detection unit 274 identifies each instruction in the decoded instruction set that constitutes a control flow instruction, including conditional and unconditional branch instructions, call type instructions, software traps, procedural instructions and various return instructions.
  • the control flow detection unit 274 provides a control signal, via lines 322, to the PC logic unit 270 to identify the location and specific nature of the control flow instructions within the instruction set present in the IDecode unit 262.
  • the PC logic unit 270 determines the target address of the control flow instruction, typically from data provided within the instruction and transferred to the PC logic unit via lines 318. Where, for example, a branch logic bias has been selected to execute ahead for conditional branch instructions, the PC logic unit 270 will begin to direct and separately track the prefetching of instruction sets from the conditional branch instruction target address.
  • the PC logic unit 270 will further assert a control signal, via lines 316, selecting the destination of the prefetch to be the TBUF 190, assuming that prior prefetch instruction sets were directed to the MBUF 188 or EBUF 192.
  • when the prefetch control logic unit 266 determines that a prefetch request can be supplied to the CCU 106, the prefetch control logic unit 266 provides an enabling signal, again via lines 316, to the PC logic unit 270 to enable the provision of a page offset portion of the target address (CCU PADDR [13:4]) via the address lines 324 directly to the CCU 106.
  • the PC logic unit 270, where a new virtual to physical page translation is required, further provides a VMU request signal via control line 328 and the virtualizing portion of the target address (VMU VADDR [31:14]) via the address lines 326 to the VMU 108 for translation into a physical address.
  • VMU VADDR [31:14] virtualizing portion of the target address
  • the previous translation result is maintained in an output latch coupled to the bus 122 for immediate use by the CCU 106.
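The address split described above follows directly from the stated bit ranges: bits [31:14] go to the VMU for translation and bits [13:4] go to the CCU as the page offset of a 16-byte instruction set. A short sketch, with hypothetical function and variable names:

```python
def split_prefetch_address(vaddr):
    """Split a 32-bit virtual address into its VMU and CCU portions."""
    vaddr &= 0xFFFFFFFF
    virtual_page = vaddr >> 14         # bits [31:14], sent for translation
    set_offset = (vaddr >> 4) & 0x3FF  # bits [13:4], ten bits sent to the cache
    return virtual_page, set_offset
```

Bits [3:0] are dropped because they address bytes inside a single instruction set, which is the same observation that frees them to carry the prefetch request ID.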
  • operational errors in the VMU 108 in performing the virtual to physical translation requested by the PC logic unit 270 are reported via the VMU exception and VMU miss control lines 332, 334.
  • the VMU miss control line 334 reports a translation lookaside buffer (TLB) miss.
  • TLB translation lookaside buffer
  • the VMU exception control signal, on VMU exception line 332, is raised for all other exceptions.
  • the PC logic unit handles the error condition by storing the current execution point in the instruction stream and then prefetching, as if in response to an unconditional branch, a dedicated exception handling routine instruction stream for diagnosing and handling the error condition.
  • the VMU exception and miss control signals identify the general nature of the exception encountered, thereby allowing the PC logic unit 270 to identify the prefetch address of a corresponding exception handling routine.
  • the IFIFO control logic unit 272 is provided to directly support the IFIFO unit 264. Specifically, the PC logic unit 270 provides a control signal via the control lines 336 to signal the IFIFO control logic unit 272 that an instruction set is available on the input bus 198 from the IDecode unit 262. The IFIFO control unit 272 is responsible for selecting the deepest available master register 200, 208, 216, 224 for receipt of the instruction set. The output of each of the master control registers 202, 210, 218, 226 is provided to the IFIFO control unit 272 via the control bus 338.
  • the control bits stored by each master control register include a two-bit buffer address (IF_Bx_ADR), a single stream indicator bit (IF_Bx_STRM), and a single valid bit (IF_Bx_VLD).
  • the two bit buffer address identifies the first valid instruction within the corresponding instruction set. That is, instruction sets returned by the CCU 106 may not be aligned such that the target instruction of a branch operation, for example, is located in the initial instruction location within the instruction set. Thus, the buffer address value is provided to uniquely identify the initial instruction within an instruction set that is to be considered for execution.
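As an illustration of the two-bit buffer address, the sketch below derives the first valid slot from a branch target address. It assumes that instruction sets are aligned on 16-byte boundaries and that the slot index is simply address bits [3:2]; that mapping is an assumption for illustration and is not stated in the text, and the function names are hypothetical.

```python
INSTRUCTION_BYTES = 4      # each instruction is four bytes
INSTRUCTIONS_PER_SET = 4   # each instruction set holds four instructions


def first_valid_slot(target_vaddr):
    """Two-bit index of the first instruction to consider within its set."""
    return (target_vaddr // INSTRUCTION_BYTES) % INSTRUCTIONS_PER_SET


def instructions_to_execute(instruction_set, target_vaddr):
    """Drop the instructions that precede the branch target in the fetched set."""
    return instruction_set[first_valid_slot(target_vaddr):]
```

For a target that falls on the third instruction of a set, the first two instructions are fetched along with the rest of the set but are never considered for execution.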
  • the stream bit is used essentially as a marker to identify the location of instruction sets containing conditional control flow instructions, and giving rise to potential control flow changes, in the stream of instructions through the IFIFO unit 264.
  • the main instruction stream is processed through the MBUF 188 generally with a stream bit value of 0.
  • the corresponding instruction set is marked with a stream bit value of 1.
  • the conditional branch instruction is detected by the IDecode unit 262. Up to four conditional control flow instructions may be present in the instruction set.
  • the instruction set is then stored in the deepest available master register of the IFIFO unit 264.
  • the current IEU 104 execution point address (DPC), the relative location of the conditional instruction containing instruction set as identified by the stream bit, and the conditional instruction location offset in the instruction set, as provided by the control flow detector 274, are combined with the relative branch offset value as obtained from a corresponding branch instruction field via control lines 318.
  • the result is a branch target virtual address that is stored by the PC logic unit 270.
  • the initial instruction sets of the target instruction stream may then be prefetched into the TBUF 190 utilizing this address.
  • the IFIFO unit 264 will continue to be loaded from either the MBUF 188 or TBUF 190.
  • the instruction set is marked with a stream bit value of 0. Since a second target stream cannot be fetched, the target address is calculated and stored by the PC logic unit 270, but no prefetch is performed. In addition, no further instruction sets can be processed through the IDecode unit 262, or at least none that are found to contain a conditional flow control instruction.
  • the PC logic unit 270, in the preferred embodiments of the present invention, can manage up to eight conditional flow instructions occurring in up to two instruction sets.
  • the target addresses for each of the two instruction sets marked by stream bit changes are stored in an array of four address registers with each target address positioned logically with respect to the location of the corresponding conditional flow instruction in the instruction set.
  • the PC logic unit 270 will direct the prefetch control unit 260, via control signals on lines 316, to transfer the contents of the TBUF 190 to the MBUF 188, if the branch is taken, and to mark invalid the contents of the TBUF 190. Any instruction sets in the IFIFO unit 264 from the incorrect instruction stream, target stream if the branch is not taken and main stream if the branch is taken, are cleared from the IFIFO unit 264.
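The resolution step described above can be sketched as a small function. All names and the data layout are hypothetical: the buffers are plain lists, each IFIFO entry is a dictionary carrying a stream label and a valid flag, and `branch_index` is an assumed way of marking which entry holds the resolved branch.

```python
def resolve_branch(taken, mbuf, tbuf, ififo, branch_index):
    """Apply the outcome of a conditional branch to the buffers and the IFIFO."""
    if taken:
        mbuf = list(tbuf)  # the target stream becomes the main stream
    tbuf = []              # TBUF contents are marked invalid in either case
    wrong_stream = "main" if taken else "target"
    # Only instruction sets that follow the branch can be from the wrong stream.
    for entry in ififo[branch_index + 1:]:
        if entry["stream"] == wrong_stream:
            entry["valid"] = False  # clearing is done by resetting the valid flag
    return mbuf, tbuf
```

Clearing by resetting a valid flag, rather than by moving data, follows the text's description of how the IFIFO control logic unit 272 performs the clear operation.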
  • the target addresses of the second stream bit marked instruction set are promoted to the first array of address registers.
  • a next instruction set containing conditional flow instructions can then be evaluated through the IDecode unit 262.
  • the toggle usage of the stream bit allows potential control flow changes to be marked and tracked through the IFIFO unit 264 for purposes of calculating branch target addresses and for marking the instruction set location above which to clear where the branch bias is subsequently determined to have been incorrect for a particular conditional flow control instruction.
  • the IFIFO control logic unit 272 simply resets the valid bit flag in the control registers of the corresponding master registers of the IFIFO unit 264.
  • the clear operation is instigated by the PC logic unit 270 in a control signal provided on lines 336.
  • the inputs of each of the master control registers 202, 210, 218, 226 are directly accessible by the IFIFO control logic unit 272 via the status bus 230.
  • the bits within these master control registers 202, 210, 218, 226 may be set by the IFIFO control unit 272 concurrent with or independent of a data shift operation by the IFIFO unit 264.
  • This capability allows an instruction set to be written into any of the master registers 200, 208, 216, 224, and the corresponding status information to be written into the master control registers 202, 210, 218, 226 asynchronously with respect to the operation of the IEU 104.
  • an additional control line on the control and status bus 230 enables and directs the FIFO operation of the IFIFO unit 264.
  • An IFIFO shift is performed by the IFIFO control logic unit 272 in response to the shift request control signal provided by the PC logic unit 270 via the control lines 336.
  • IFIFO control unit 272 based on the availability of a master register 200, 208, 216, 224 to receive an instruction set provides a control signal, via lines
  • the control interface between the IFU 102 and IEU 104 is provided by the control bus 126.
  • This control bus 126 is coupled to the PC logic unit 270 and consists of a number of control, address and specialized data lines.
  • Interrupt request and acknowledge control signals allow the IFU 102 to signal and synchronize interrupt operations with the IEU 104.
  • An externally generated interrupt signal is provided on a line 292 to the logic unit 270.
  • an interrupt request control signal, provided on lines 340, causes the IEU 104 to cancel tentatively executed instructions.
  • Information regarding the nature of an interrupt is exchanged via interrupt information lines 341.
  • When the IEU 104 is ready to begin receiving instruction sets prefetched from the interrupt service routine address determined by the PC logic unit 270, the IEU 104 asserts an interrupt acknowledge control signal on the lines 340. Execution of the interrupt service routine, as prefetched by the IFU 102, will then commence.
  • An IFIFO read (IFIFO RD) control signal is provided by the IEU 104 to signal that the instruction set present in the deepest master register 224 has been completely executed and that a next instruction set is desired.
  • the PC logic unit 270 directs the IFIFO control logic unit 272 to perform an IFIFO shift operation on the IFIFO unit 264.
  • PC INC/SIZE PC increment request and size value
  • DPC point of execution program counter
  • a target address (TARGET ADDR) is returned on the address lines 346 to the PC logic unit 270.
  • the target address is the virtual target address of a branch instruction that depends on data stored within the register file of the IEU 104. Operation of the IEU 104 is therefore required to calculate the target address.
  • Control flow result (CF RESULT) control signals are provided on the control lines 348 to the PC logic unit 270 to identify whether any currently pending conditional branch instruction has been resolved and whether the result is either a branch taken or not taken. Based on these control signals, the PC logic unit 270 can determine which of the instruction sets in the prefetch buffer 260 and IFIFO unit 264 must be cancelled, if at all, as a consequence of the execution of the conditional flow instruction.
  • IEU Return IEU instruction return type control signals
  • These instructions include a return from procedural instruction, return from trap, and return from subroutine call.
  • the return from trap instruction is used equally in hardware interrupt and software trap handling routines.
  • the subroutine call return is also used in conjunction with jump-and-link type calls.
  • the return control signals are provided to alert the IFU 102 to resume its instruction fetching operation with respect to the previously interrupted instruction stream. Origination of the signals from the IEU 104 allows the precise operation of the system 100 to be maintained; the resumption of an "interrupted" instruction stream is performed at the point of execution of the return instruction.
  • a current instruction execution PC address (Current IFPC) is provided on an address bus 352 to the IEU 104.
  • This address value, the DPC, identifies the precise instruction being executed by the IEU 104. That is, while the IEU 104 may tentatively execute ahead instructions past the current IFPC address, this address must be maintained for purposes of precise control of the architecture 100 with respect to the occurrence of interrupts, exceptions, and any other events that would require knowing the precise state-of-the-machine.
  • when the IEU 104 determines that the precise state-of-the-machine in the currently executing instruction stream can be advanced, the PC Inc/Size signal is provided to the IFU 102 and immediately reflected back in the current IFPC address value.
  • an address and bi-directional data bus 354 is provided for the transfer of special register data. This data may be programmed into or read from special registers within the IFU 102 by the IEU 104. Special register data is generally loaded or calculated by the IEU 104 for use by the IFU 102.
  • PC Logic Unit Detail: a detailed diagram of the PC logic unit 270, including a PC control unit 362, interrupt control unit 363, prefetch PC control unit 364 and execution PC control unit 366, is shown in Figure 3.
  • the PC control unit 362 provides timing control over the prefetch and execution PC control units 364, 366 in response to control signals from the prefetch control logic unit 266, IFIFO control logic unit 272, and the IEU 104, via the interface bus 126.
  • the Interrupt Control Unit 363 is responsible for managing the precise processing of interrupts and exceptions, including the determination of a prefetch trap address offset that selects an appropriate handling routine to process a respective type of trap.
  • the prefetch PC control unit 364 is, in particular, responsible for managing program counters necessary to support the prefetch buffers 188, 190, 192, including storing return addresses for trap handling and procedural routine instruction flows. In support of this operation, the prefetch PC control unit 364 is responsible for generating the prefetch virtual address including the CCU PADDR address on the physical address bus lines 324 and the VMU VMADDR address on the address lines 326. Consequently, the prefetch PC control unit 364 is responsible for maintaining the current prefetch PC virtual address value.
  • the prefetch operation is generally initiated by the IFIFO control logic unit 272 via a control signal provided on the control lines 316.
  • the PC control unit 362 generates a number of control signals provided on the control lines 372 to operate the prefetch PC control unit 364 to generate the PADDR and, as needed, the VMADDR addresses on the address lines 324, 326.
  • An increment signal, having a value of zero to four, may also be provided on the control lines 374 depending on whether the PC control unit 362 is re-executing an instruction set fetch at the present prefetch address, aligning for the second in a series of prefetch requests, or selecting the next full sequential instruction set for prefetch.
  • the current prefetch address PF_PC is provided on the bus 370 to the execution PC control unit 366.
  • New prefetch addresses originate from a number of sources.
  • a primary source of addresses is the current IF_PC address provided from the execution PC control unit 366 via bus 352.
  • the IF_PC address provides a return address for subsequent use by the prefetch PC control unit 364 when an initial call, trap or procedural instruction occurs.
  • the IF_PC address is stored in registers in the prefetch PC control unit 364 upon each occurrence of these instructions. In this manner, the PC control unit 362, on receipt of an IEU return signal, via control lines 350, need merely select the corresponding return address register within the prefetch PC control unit 364 to source a new prefetch virtual address, thereby resuming the original program instruction stream.
  • Another source of prefetch addresses is the target address value provided on the relative target address bus 382 from the execution PC control unit 366 or on the absolute target address bus 346 provided from the IEU 104.
  • Relative target addresses are those that can be calculated by the execution PC control unit 366 directly. Absolute target addresses must be generated by the IEU 104, since such target addresses are dependent on data contained in the IEU register file.
  • the target address is routed over the target address bus 384 to the prefetch PC control unit 364 for use as a prefetch virtual address.
  • an operand portion of the corresponding branch instruction is also provided on the operand displacement portion of the bus 318 from the IDecode unit 262.
  • a return address bus 352' is provided to transfer the current IF_PC value (DPC) to the prefetch PC control unit 364. This address is utilized as a return address where an interrupt, trap or other control flow instruction such as a call has occurred within the instruction stream.
  • the prefetch PC control unit 364 is then free to prefetch a new instruction stream.
  • the PC control unit 362 receives an IEU return signal, via lines 350, from the IEU 104 once the corresponding interrupt or trap handling routine or subroutine has been executed.
  • the PC control unit 362 selects, via one of the PFPC control signals on line 372 and based on an identification of the return instruction executed as provided via lines 350, a register containing the current return virtual address. This address is then used to continue the prefetch operation by the PC logic unit 270.
  • Another source of prefetch virtual addresses is from the special register address and data bus 354.
  • An address value, or at least a base address value, calculated or loaded by the IEU 104 is transferred as data via the bus 354 to the prefetch PC control unit 364.
  • the base addresses include the base addresses for the trap address table, a fast trap table, and a base procedural instruction dispatch table.
  • the bus 354 also allows many of the registers in the prefetch and execution PC control units 364, 366 to be read to allow corresponding aspects of the state-of-the-machine to be manipulated through the IEU 104.
  • the execution PC control unit 366, subject to the control of the PC control unit 362, is primarily responsible for calculating the current IF_PC address value.
  • the execution PC control unit 366 responds to control signals provided by the PC control unit 362 on the ExPC control lines 378 and increment/size control signals provided on the control lines 380 to adjust the IF_PC address. These control signals are generated primarily in response to the IFIFO read control signal provided on line 342 and the PC increment/size value provided on the control lines 344 from the IEU 104.
  • Figure 4 provides a detailed block diagram of the prefetch and execution PC control units 364, 366. These units primarily consist of registers, incrementors and the like, selectors and adder blocks. Control for managing the transfer of data between these blocks is provided by the PC Control Unit 362 via the PFPC control lines 372, the ExPC Control lines 378 and the Increment Control lines 374, 380. For purposes of
  • prefetch selector (PF_PC SEL) 390 that operates as a central selector of the current prefetch virtual address. This current prefetch address is provided on the output bus 392 from the prefetch selector to an incrementor unit 394 to generate a next prefetch
  • This next prefetch address is provided on the incrementor output bus 396 to a parallel array of registers MBUF PFnPC 398, TBUF PFnPC 400, and EBUF PFnPC 402. These registers 398, 400, 402 effectively store the next instruction prefetch address. However, in
  • prefetch addresses are held for the MBUF 188, TBUF 190, and EBUF 192.
  • the prefetch addresses, as stored by the MBUF, TBUF and EBUF PFnPC registers 398, 400, 402 are respectively provided by the
  • the PC control unit 362 can direct an immediate switch of the prefetch instruction stream merely by directing the selection, by the prefetch selector 390, of another one of the prefetch registers 398, 400, 402. Once that address value has been incremented by the incrementor 394, if a next instruction set in the stream is to be prefetched, the value is returned to the appropriate one of the prefetch registers 398, 400, 402.
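The select, increment and write-back loop described above can be sketched as follows. The names are hypothetical, and the sketch assumes every prefetch advances by one full 16-byte instruction set; the variable increment values mentioned in the text are not modelled.

```python
SET_BYTES = 16  # four instructions of four bytes each


class PrefetchPc:
    """Toy model of the per-buffer next-prefetch-address registers."""

    def __init__(self, mbuf_start):
        self.next_pc = {"MBUF": mbuf_start, "TBUF": None, "EBUF": None}
        self.selected = "MBUF"

    def start_stream(self, name, address):
        """Seed a stream's register (e.g. with a branch target) and select it."""
        self.next_pc[name] = address
        self.selected = name

    def select(self, name):
        """Switch streams by choosing another register; nothing is recomputed."""
        self.selected = name

    def prefetch(self):
        """Emit the selected address and write the incremented value back."""
        address = self.next_pc[self.selected]
        self.next_pc[self.selected] = (address + SET_BYTES) & 0xFFFFFFFF
        return address
```

Keeping one register per stream is what makes a stream switch immediate: a call to `select` changes which register feeds the next prefetch, and each stream resumes exactly where it left off.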
  • Another parallel array of registers, for simplicity shown as the single special register block 412, is provided to store a number of special addresses.
  • the register block 412 includes a trap return address register, a procedural instruction return address register, a procedural instruction dispatch table base address register, a trap routine dispatch table base address register, and a fast trap routine table base address register. Under the control of the PC control unit 362, these return address registers may receive the current IFPC execution address via the bus 352'. The address values stored by the return and base address registers within the register block 412 may be both read and written independently by the IEU 104. The registers are selected and values transferred via the special register address and data bus 354.
  • a selector within the special register block 412 allows the addresses stored by the registers of the register block 412 to be put on the special register output bus 416 to the prefetch selector 390. Return addresses are provided directly to the prefetch selector 390. Base address values are combined with the offset value provided on the interrupt offset bus 373 from the interrupt control unit 363. Once sourced to the prefetch selector 390 via the bus 373', a special address can be used as the initial address for a new prefetch instruction stream by thereafter continuing the incremental loop of the address through the incrementor 394 and one of the prefetch registers 398, 400, 402.
  • Another source of addresses to the prefetch selector 390 is an array of registers within the target address register block 414.
  • the target registers within the block 414 provide for storage of, in the preferred embodiment, eight potential branch target addresses. These eight storage locations logically correspond to the eight potentially executable instructions held in the lowest two master registers 216, 224 of the IFIFO unit 264. Since any, and potentially all, of those instructions could be conditional branch instructions, the target register block 414 allows for their precalculated target addresses to be stored awaiting use for fetching of a target instruction stream through the TBUF 190.
  • a conditional branch bias is set such that the PC Control Unit 362 immediately begins prefetching of a target instruction stream
  • the target address is immediately fed through the target register block 414 via the address bus 418 to the prefetch selector 390. Once incremented by the incrementor 394, the address is stored back to the TBUF PFnPC 400 for use in subsequent prefetch operations of the target instruction stream. If additional branch instructions occur within the target instruction stream, the target addresses of such secondary branches are calculated and stored in the target register array 414 pending use upon resolution of the first conditional branch instruction.
  • a calculated target address as stored by the target register block 414 is transferred from a target address calculation unit within the execution PC control unit 366 via the address lines 382 or from the IEU 104 via the absolute target address bus 346.
  • the address value transferred through the prefetch PF_PC selector 390 is a full thirty-two bit virtual address value.
  • the page size, in the preferred embodiment of the present invention, is fixed at 16 KBytes, corresponding to the maximum page offset address value [13:0]. Therefore, a VMU page translation is not required unless there is a change in the current prefetch virtual page address [27:14]. A comparator in the prefetch selector 390 detects this circumstance.
  • a VMU translation request signal (VMXLAT) is provided via line 372' to the PC control unit 362 when there is a change in the virtual page address, either due to incrementing across a page boundary or a control flow branch to another page address.
  • the PC control unit 362 directs the placement of the VM VADDR address on lines 326, in addition to the CCU PADDR on lines 324, both via a buffer unit 420, and the appropriate control signals on the VMU control lines 326, 328, 330 to obtain a VMU virtual to physical page translation.
  • the current physical page address [31:14] is maintained by a latch at the output of the VMU unit 108 on the bus 122.
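The page-change test that triggers a translation request can be sketched as below. This is only an illustration, assuming a 32-bit address and the 16 KByte page stated in the text; the names `virtual_page` and `needs_translation` are hypothetical, and the sketch compares all bits above the page offset rather than a specific bit range.

```python
PAGE_OFFSET_BITS = 14        # 16 KByte page: offset bits [13:0]
ADDRESS_MASK = 0xFFFFFFFF    # 32-bit virtual address (assumption of this sketch)


def virtual_page(addr):
    """Return the bits above the page offset, i.e. the virtual page number."""
    return (addr & ADDRESS_MASK) >> PAGE_OFFSET_BITS


def needs_translation(current_addr, next_addr):
    """True when the next prefetch address lies on a different virtual page,
    which is the condition under which a translation request would be raised."""
    return virtual_page(current_addr) != virtual_page(next_addr)
```

Sequential prefetching within a page never changes the page number, so in this model a translation is requested only on a page-boundary crossing or a branch to another page.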
  • the virtual address provided onto the bus 370 is incremented by the incrementor 394 in response to a signal provided on the increment control line 374.
  • the incrementor 394 increments by a value representing an instruction set (four instructions or sixteen bytes) in order to select a next instruction set.
  • the low-order four bits of a prefetch address as provided to the CCU unit 106 are zero. Therefore the actual target address instruction in a first branch target instruction set may not be located in the first instruction location. However, the low-order four bits of the address are provided to the PC control unit 362 to allow the proper first branch instruction location to be known by the IFU 102.
  • the detection and handling, by returning the low-order bits [3:2] of a target address as the two-bit buffer address to select the proper first instruction for execution in a non-aligned target instruction set, is performed only for the first prefetch of a new instruction stream, i.e., any first non-sequential instruction set address in an instruction stream.
  • the non-aligned relationship between the address of the first instruction in an instruction set and the prefetch address used in prefetching the instruction set can be, and is, thereafter ignored for the duration of the current sequential instruction stream.
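The relationship between a branch target, its aligned prefetch address, and the two-bit buffer address can be sketched as follows. This is a minimal model, assuming 4-byte instructions and 16-byte instruction sets as described; the function names are hypothetical.

```python
from typing import Tuple

INSTRUCTION_SET_BYTES = 16   # four instructions of four bytes each


def split_target(target_addr: int) -> Tuple[int, int]:
    """Split a target address into (aligned prefetch address, slot index).

    The prefetch address has its low-order four bits cleared; bits [3:2]
    select the first instruction to execute within the fetched set.
    """
    prefetch_addr = target_addr & ~0xF
    slot = (target_addr >> 2) & 0x3
    return prefetch_addr, slot


def next_prefetch(prefetch_addr: int) -> int:
    """Advance to the next sequential instruction set (32-bit wraparound)."""
    return (prefetch_addr + INSTRUCTION_SET_BYTES) & 0xFFFFFFFF
```

Only the first set of a new stream needs the slot index; every later set in the same sequential stream starts at slot 0.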
  • the remainder of the functional blocks shown in Figure 4 comprise the execution PC control unit 366.
  • the execution PC control unit 366 incorporates its own independently functioning program counter incrementor.
  • Central to this function is an execution selector (DPC SEL) 430.
  • the address output by the execution selector 430, on the address bus 352', is the present execution address (DPC) of the architecture 100.
  • This execution address is provided to an adder unit 434.
  • the increment/size control signals provided on the lines 380 specify an instruction increment value of from one to four that the adder unit 434 adds to the address obtained from the selector 430.
  • as the adder 432 additionally performs an output latch function, the incremented next execution address is provided on the address lines 436 directly back to the execution selector 430 for use in the next execution increment cycle.
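The execution increment cycle can be sketched as below. This is an illustrative model only: it assumes the increment count of one to four instructions is scaled to a byte address using 4-byte instructions, and the name `next_dpc` is hypothetical.

```python
INSTRUCTION_BYTES = 4   # assumption: a 16-byte instruction set holds four instructions


def next_dpc(dpc, count):
    """Return the next execution address after `count` instructions.

    `count` models the increment value of one to four carried by the
    increment/size control signals.
    """
    if not 1 <= count <= 4:
        raise ValueError("increment must be between one and four instructions")
    return (dpc + count * INSTRUCTION_BYTES) & 0xFFFFFFFF
```

Feeding the result back in as `dpc` on the next call mirrors the loop from the adder output back to the execution selector.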
  • the initial execution address and all subsequent new stream addresses are obtained through a new stream register unit 438 via the address lines 440.
  • the new stream register unit 438 allows the new current prefetch address, as provided on the PFPC address bus 370 from the prefetch selector 390 to be passed on to the address bus 440 directly or stored for subsequent use. That is, where the prefetch PC control unit 364 determines to begin prefetching at a new virtual address, the new stream address is temporarily stored by the new stream register unit 438.
  • the PC control unit 362, by its participation in both the prefetch and execution increment cycles, holds the new stream address in the new stream register unit 438 until the execution address has reached the program execution point corresponding to the control flow instruction that instigated the new instruction stream.
  • the new stream address is then output from the new stream register unit 438 to the execution selector 430 to initiate the independent generation of execution addresses in the new instruction stream.
  • the new stream register unit 438 provides for the buffering of two control flow instruction target addresses.
  • an IFPC selector (IF_PC SEL) 442 is provided to ultimately issue the current IFPC address on the address bus 352 to the IEU 104.
  • the inputs to the IFPC selector 442 are the output addresses obtained from either the execution selector 430 or new stream register unit 438.
  • the IFPC selector 442 is directed by the PC control unit 362 to select the execution address output by the execution selector 430.
  • the selected address provided from the new stream register unit 438 can be bypassed via bus 440 directly to the IFPC selector 442 for provision as the current IFPC execution address.
  • the execution PC control unit 366 is capable of calculating all relative branch target addresses.
  • the current execution point address and the new stream register unit 438 provided address are received by a control flow selector (CF_PC) 446 via the address buses 352', 440. Consequently, the PC control unit 362 has substantial flexibility in selecting the exact initial address from which to calculate a target address.
  • This initial, or base, address is provided via address bus 454 to a target address ALU 450.
  • a second input value to the target ALU 450 is provided from a control flow displacement calculation unit 452 via bus 458.
  • Relative branch instructions, in accordance with the preferred architecture 100, incorporate a displacement value in the form of an immediate mode constant that specifies a relative new target address.
  • the control flow displacement calculation unit 452 receives the operand displacement value initially obtained via the IDecode unit operand output bus 318. Finally, an offset register value is provided to the target address ALU 450 via the lines 456. The offset register 448 receives an offset value via the control lines 378' from the PC control unit 362. The magnitude of the offset value is determined by the PC control unit 362 based on the address offset between the base address provided on the address lines 454 and the address of the current branch instruction for which the relative target address is being calculated.
  • the PC control unit 362 through its control of the IFIFO control logic unit 272 tracks the number of instructions separating the instruction at the current execution point address (requested by CP_PC) and the instruction that is currently being processed by the IDecode unit 262 and, therefore, being processed by the PC logic unit 270 to determine the target address for that instruction.
  • the target address is written into a corresponding one of the target registers 414 via the address bus 382.
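The three inputs to the target address ALU (base address, offset register value, and displacement) can be sketched as follows. This is a simplified model under stated assumptions: all values are byte addresses, the displacement is a signed byte displacement relative to the branch instruction itself, and the function name is hypothetical.

```python
def relative_target(base_addr, branch_addr, displacement):
    """Compute a relative branch target from the three ALU inputs.

    base_addr    -- address chosen by the control flow selector
    branch_addr  -- address of the branch instruction being decoded
    displacement -- signed displacement from the instruction's operand field
    """
    # The offset register holds the distance between the base address
    # and the branch instruction whose target is being calculated.
    offset = branch_addr - base_addr
    return (base_addr + offset + displacement) & 0xFFFFFFFF
```

The offset term lets the calculation start from whichever base address is convenient while still producing a target relative to the branch instruction.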
  • a 32-bit incrementor adjusts the address value in the MBUF PFnPC by sixteen bytes (x16) with each prefetch cycle.
  • the target address of a relative unconditional control flow is calculated by the IFU from register data maintained by the IFU and from operand data following the control flow instruction.
  • the target address of an absolute unconditional control flow instruction is eventually calculated by the IEU from a register reference, a base register value, and an index register value.
  • Procedural Instruction Stream Processing: EBUF PFnPC 2.1 a procedural instruction may be prefetched in the main or branch target instruction stream. If fetched in a target stream, stall prefetching of the procedural stream until the conditional control flow instruction resolves and the procedural instruction is transferred to the MBUF. This allows the TBUF to be used in handling of conditional control flows that occur in the procedural instruction stream.
  • a procedural instruction stream that, in turn, includes first and second instruction sets containing conditional control flow instructions will stall prefetching with respect to the second such instruction set until any conditional control flow instructions in the first such instruction set are resolved and the second conditional control flow instruction set has been transferred to the MBUF.
  • procedural instructions provide a relative offset, included as an immediate mode operand field of the instruction, to identify the procedural routine starting address:
  • the offset value provided by the procedural instruction is combined with a value contained in a procedural base address (PBR) register maintained in the IFU.
  • the starting address of the procedural stream is simultaneously provided to the new stream register unit and to the incrementor for incrementing (x16); the incremented address is then stored in the EBUF PFnPC.
  • a 32-bit incrementor adjusts the address value (x16) in the EBUF PFnPC with each procedural instruction prefetch cycle.
  • the target address of a relative unconditional control flow instruction is calculated by the IFU from IFU maintained register data and from the operand data provided within an immediate mode operand field of the control flow instruction.
  • the target address of an absolute unconditional branch is calculated by the IEU from a register reference, a base register value, and an index register value.
  • Branch Instruction Stream Processing: TBUF PFnPC 3.1 when a conditional control flow instruction, occurring in a first instruction set in the MBUF instruction stream, is IDecoded, the target address is determined by the IFU if the target address is relative to the current address or by the IEU for absolute addresses. 3.2 for "branch taken bias":
  • if the conditional control flow instruction in the first conditional instruction set is relative, calculate the target address and store in the target registers.
  • if the conditional control flow instruction in the first conditional instruction set is absolute, wait for the IEU to calculate the target address and return the address to the target registers.
  • Target instruction sets are not loaded into the IFIFO (the branch target instructions are thus on hand when each conditional control flow instruction in the first instruction set resolves).
  • MBUF PFnPC or EBUF as determined from the state of the procedure-in-progress bit.
  • 3.3.3.3 transfer the prefetched TBUF instructions to the MBUF or EBUF, as determined from the state of the procedure-in-progress bit.
  • 3.3.3.4 continue MBUF or EBUF prefetching operations, as determined from the state of the procedure-in-progress bit. 3.3.4 if a conditional control flow instruction in the first set resolves to "not taken": 3.3.4.1 flush the TBUF of instruction sets from the target instruction stream.
  • 4.1.1.2 can occur at any time and persist.
  • 4.1.1.3 serviced in priority order between atomic (ordinary) instructions and may suspend procedural instructions.
  • the starting address of an interrupt handler is determined as the vector number offset into a predefined table of trap handler entry points.
  • the starting address of the trap handler is determined from the trap number offset combined with a base address value stored in the TBR or FTB register.
  • the starting address of the exception handler is determined from the trap number offset into a predefined table of trap handler entry points.
  • Traps may nest, provided the trap handling routine saves the xPC address prior to a next allowed trap; failure to do so will corrupt the state of the machine if a trap occurs prior to completion of the current trap operation.
  • TBR or FTB register depending on the type of trap as determined by the trap number, which are provided in the set of special registers.
  • the trap handling routine may provide for the xPC address to be saved to a predefined location and interrupts re-enabled; the xPC register is read/write via a special register move instruction and the special register address and data bus.
  • Interrupts and exceptions will be processed, as long as they are enabled, regardless of whether the processor is executing from the main instruction stream or a procedural instruction stream. Interrupts and exceptions are serviced in priority order, and persist until cleared.
  • the starting address of a trap handler is determined as the vector number offset into a predefined table of trap handler addresses as described below.
  • Interrupts and exceptions are of two basic types in the present embodiment, those which occur synchronously with particular instructions in the instruction stream, and those which occur asynchronously with particular instructions in the instruction stream.
  • the terms interrupt, exception, trap and fault are used interchangeably herein.
  • Asynchronous interrupts are generated by hardware, either on-chip or off-chip, which does not operate synchronously with the instruction stream.
  • interrupts generated by an on-chip timer/counter are asynchronous, as are hardware interrupts and non-maskable interrupts (NMI) provided from off-chip.
  • When an asynchronous interrupt occurs, the processor context is frozen, all traps are disabled, certain processor status information is stored, and the processor vectors to an interrupt handler corresponding to the particular interrupt received. After the interrupt handler completes its processing, program execution continues with the instruction following the last completed instruction in the stream which was executing when the interrupt occurred. Synchronous exceptions are those that occur synchronously with instructions in the instruction stream. These exceptions occur in relation to particular instructions, and are held until the relevant instruction is to be executed. In the preferred embodiments, synchronous exceptions arise during prefetch, during instruction decode, or during instruction execution. Prefetch exceptions include, for example, TLB miss or other VMU exceptions.
  • Decode exceptions arise, for example, if the instruction being decoded is an illegal instruction or does not match the current privilege level of the processor. Execution exceptions arise due to arithmetic errors, such as divide by zero. Whenever these exceptions occur, the preferred embodiments maintain them in correspondence with the particular instruction which caused the exception, until the time at which that instruction is to be retired. At that time, all prior completed instructions are retired, any tentative results from the instruction which caused the exception are flushed, as are the tentative results of any following tentatively executed instructions. Control is then transferred to an exception handler corresponding to the highest priority exception which occurred for that instruction.
  • Asynchronous interrupts are signaled to the PC logic unit 270 over interrupt lines 292.
  • these lines are provided to the interrupt logic unit 363 in the PC logic unit 270, and comprise an NMI line, an IRQ line and a set of interrupt level lines (LVL).
  • the NMI line signals a nonmaskable interrupt, and derives from an external source. It is the highest priority interrupt except for hardware reset.
  • the IRQ line also derives from an external source, and indicates when an external device is requesting a hardware interrupt.
  • the preferred embodiments permit up to 32 user-defined externally supplied hardware interrupts and the particular external device requesting the interrupt provides the number of the interrupt (0-31) on the interrupt level lines (LVL).
  • the memory error line is activated by the MCU 110 to signal various kinds of memory errors.
  • Other asynchronous interrupt lines are also provided to the interrupt logic unit 363, including lines for requesting a timer/counter interrupt, a memory I/O error interrupt, a machine check interrupt and a performance monitor interrupt.
  • Each of the asynchronous interrupts, as well as the synchronous exceptions described below, has a corresponding predetermined trap number associated with it, 32 of these trap numbers being associated with the 32 available hardware interrupt levels.
  • a table of these trap numbers is maintained in the interrupt logic unit 363. The higher the trap number, in general, the higher the priority of the trap.
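The priority rule can be sketched as a simple selection over the pending trap numbers. This is a deliberate simplification: the text says only that a higher trap number has higher priority "in general", so a real priority table may contain exceptions that this hypothetical helper does not model.

```python
from typing import Iterable, Optional


def highest_priority_trap(pending: Iterable[int]) -> Optional[int]:
    """Pick the pending trap to service, treating a higher trap number
    as higher priority. Returns None when nothing is pending."""
    pending = list(pending)
    return max(pending) if pending else None
```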
  • When one of the asynchronous interrupts is signaled to the interrupt logic unit 363, the interrupt control unit 363 sends out an interrupt request to the IEU 104 over INT REQ/ACK lines 340. Interrupt control unit 363 also sends a suspend prefetch signal to PC control unit 362 over lines 343, causing the PC control unit 362 to stop prefetching instructions.
  • the IEU 104 either cancels all then-executing instructions, flushing all tentative results, or it may allow some or all instructions to complete. In the preferred embodiments, any then-executing instructions are canceled, thereby permitting the fastest response to asynchronous interrupts.
  • the DPC in the execution PC control unit 366 is updated to correspond to the last instruction which has been completed and retired, before the IEU 104 acknowledges the interrupt. All other prefetched instructions in MBUF, EBUF, TBUF and IFIFO 264 are also cancelled.
  • For synchronous exceptions, the interrupt control unit 363 maintains a set of four internal exception bits (not shown) for each instruction set, one bit corresponding to each instruction in the set. The interrupt control unit 363 also maintains an indication of the particular trap numbers, if any, detected for each instruction.
  • If the VMU signals a TLB miss or another VMU exception while a particular instruction set is being prefetched, this information is transmitted to the PC logic unit 270, and in particular to the interrupt control unit 363, over the VMU control lines 332 and 334.
  • When the interrupt control unit 363 receives such a signal, it signals the PC control unit 362 over line 343 to suspend further prefetches.
  • the interrupt control unit 363 sets the VM_Miss or VM_Excp bit, as appropriate, associated with the prefetch buffer to which the instruction set was destined.
  • the interrupt control unit 363 then sets all four internal exception indicator bits corresponding to that instruction set, since none of the instructions in the set are valid, and stores the trap number for the particular exception received in correspondence with each of the four instructions in the faulty instruction set.
  • the shifting and executing of instructions prior to the faulty instruction set then continues as usual until the faulty set reaches the lowest level in the IFIFO 264.
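The per-instruction-set bookkeeping described above (four exception bits plus a trap number per instruction) can be sketched as follows. The class and method names are hypothetical, and the bit ordering of the 4-bit vector (bit 0 for the first instruction in the set) is an assumption of this sketch.

```python
class InstructionSetExceptions:
    """Exception indicator bits and trap numbers for one 4-instruction set."""

    def __init__(self):
        self.bits = [False] * 4
        self.trap_numbers = [None] * 4

    def mark_instruction(self, slot, trap_number):
        """Record an exception raised by a single instruction in the set."""
        self.bits[slot] = True
        self.trap_numbers[slot] = trap_number

    def mark_prefetch_fault(self, trap_number):
        """Record a prefetch-time fault: no instruction in the set is valid,
        so all four bits are set with the same trap number."""
        for slot in range(4):
            self.mark_instruction(slot, trap_number)

    def vector(self):
        """Pack the indicator bits into a 4-bit integer (bit 0 = slot 0)."""
        return sum(1 << slot for slot, flagged in enumerate(self.bits) if flagged)
```

A TLB miss during prefetch would call `mark_prefetch_fault`, while a decode-time software trap would call `mark_instruction` for its own slot only.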
  • this information is also transmitted to the interrupt control unit 363 which sets the internal exception indicator bit corresponding to the instruction generating the exception and stores the trap number in correspondence with that exception.
  • the shifting and executing of instructions prior to the faulty instruction then continues as usual until the faulty set reaches the lowest level in the IFIFO 264.
  • the only type of exception which is detected during the shifting of an instruction through the prefetch buffers 260, the IDecode unit 262 or the IFIFO 264 is a software trap instruction.
  • Software trap instructions are detected at the IDecode stage by CF_DET unit 274. While in some embodiments other forms of synchronous exceptions may be detected in the IDecode unit 262, it is preferred that the detection of any other synchronous exceptions wait until the instruction reaches the execution unit 104.
  • This avoids the possibility that certain exceptions, such as those arising from the handling of privileged instructions, might be signaled on the basis of a processor state which could change before the effective in-order execution of the instruction.
  • software trap instructions are detected at the IDecode stage by the CF_DET unit 274.
  • the internal exception indicator bit corresponding to that instruction in the interrupt logic unit 363 is set and the software trap number, which can be any number from 0 to 127 and which is specified in an immediate mode operand field of the software trap instruction, is stored in correspondence with the trap instruction.
  • the interrupt control unit 363 does not signal PC control unit 362 to suspend prefetches when a software trap instruction is detected.
  • the IFU 102 prefetches the trap handler into the MBUF instruction stream buffer.
  • the interrupt logic unit 363 transmits the exception indicator bits for that instruction set as a 4-bit vector to the IEU 104 over the SYNCH_INT_INFO lines 341 to indicate which, if any, of the instructions in the instruction set have already been determined to be the source of a synchronous exception.
  • the IEU 104 does not respond immediately, but rather permits all the instructions in the instruction set to be scheduled in the normal course.
  • exceptions such as integer arithmetic exceptions
  • Exceptions which depend on the current state of the machine, such as those due to the execution of a privileged instruction, are also detected at this time, and in order to ensure that the state of the machine is current with respect to all previous instructions in the instruction stream, all instructions which have a possibility of affecting the PSR (such as special move and returns from trap instructions) are forced to execute in order. Only when an instruction that is the source of a synchronous exception of any sort is about to be retired is the occurrence of the exception signaled to the interrupt logic unit 363.
  • the IEU 104 retires all instructions which have been tentatively executed and which occur in the instruction stream prior to the first instruction which has a synchronous exception, and flushes the tentative results from any tentatively executed instructions which occur subsequently in the instruction stream.
  • the particular instruction that caused the exception is also flushed since that instruction will typically be re-executed upon return from trap.
  • the IF_PC in the execution PC control unit 366 is then updated to correspond to the last instruction actually retired, before any exception is signaled to the interrupt control unit 363.
  • When the instruction that is the source of an exception is retired, the IEU 104 returns to the interrupt logic unit 363, over the SYNCH_INT_INFO lines 341, both a new 4-bit vector indicating which, if any, instructions in the retiring instruction set (register 224) had a synchronous exception, as well as information indicating the source of the first exception in the instruction set.
  • the information in the 4-bit exception vector returned by IEU 104 is an accumulation of the 4-bit exception vectors provided to the IEU 104 by the interrupt logic unit 363, as well as exceptions generated in the IEU 104.
  • interrupt control unit 363 determines the nature of the highest priority synchronous exception and its trap number.
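The accumulation of exception vectors and the retire/flush split at the first faulting instruction can be sketched as below. This is an illustrative model; the function names are hypothetical, and it assumes bit 0 of the vector corresponds to the earliest instruction in the set.

```python
def accumulate(ifu_vector, ieu_vector):
    """Merge the exceptions flagged before execution with those
    detected during execution into one 4-bit vector."""
    return (ifu_vector | ieu_vector) & 0xF


def split_at_first_exception(vector):
    """Return (retired_slots, flushed_slots) for a 4-instruction set.

    Instructions before the first flagged slot retire; the faulting
    instruction and everything after it are flushed.
    """
    for slot in range(4):
        if vector & (1 << slot):
            return list(range(slot)), list(range(slot, 4))
    return [0, 1, 2, 3], []
```

For example, a decode-time flag on slot 2 and an execution-time flag on slot 3 accumulate to a vector whose first set bit is slot 2, so slots 0 and 1 retire and slots 2 and 3 are flushed.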
  • the current DPC is temporarily stored as a return address in an xPC register, which is one of the special registers 412 (Figure 4).
  • the address of a trap handler is calculated as a trap base register address plus an offset.
  • the PC logic unit 270 maintains two base registers for traps, both of which are part of the special registers 412 (Figure 4), and both of which are initialized by special move instructions executed previously.
  • the base register used to calculate the address of the handler is a trap base register TBR.
  • the interrupt control unit 363 determines the highest priority interrupt or exception currently pending and, through a look-up table, determines the trap number associated therewith. This is provided over a set of INT_OFFSET lines 373 to the prefetch PC control unit 364 as an offset to the selected base register.
  • the vector address is calculated by merely concatenating the offset bits as low-order bits to the higher order bits obtained from the TBR register. This avoids any need for the delays of an adder.
  • the handler address may be calculated by concatenating the 8 bit trap number to the end of a 22-bit TBR stored value. Two low-order zero bits may be appended to the trap number to ensure that the trap handler address always occurs on a word boundary.
  • the concatenated handler address thus constructed is provided as one of the inputs, 373', to the prefetch selector PF_PC Sel 390 (Figure 4), and is selected as the next address from which instructions are to be prefetched.
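The concatenation that forms the handler address can be sketched as follows: a 22-bit base, an 8-bit trap number, and two zero bits make up a 32-bit word-aligned address. The function name is hypothetical, and the sketch assumes the TBR is held as a 32-bit word whose upper 22 bits are significant.

```python
def trap_handler_address(tbr, trap_number):
    """Form a 32-bit handler address by concatenation, with no adder:
    bits [31:10] from the TBR, bits [9:2] from the trap number,
    bits [1:0] zero for word alignment."""
    if not 0 <= trap_number <= 0xFF:
        raise ValueError("trap number must fit in 8 bits")
    return (tbr & 0xFFFFFC00) | (trap_number << 2)
```

Consecutive trap numbers yield addresses four bytes (one word) apart, which is why each entry can hold only a single branch instruction.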
  • the vector handler addresses for traps using the TBR register are all only one word apart. Thus, the instruction at the trap handler address must be a preliminary branch instruction to a longer trap handling routine. Certain traps require very careful handling, however, to prevent degradation of system performance. TLB traps, for example, must be executed very quickly. For this reason, the preferred embodiments include a fast trap mechanism designed to allow the calling of small trap handlers without the cost of this preliminary branch. In addition, fast trap handlers can be located independently in memory, in on-chip ROM, for example, to eliminate memory system penalties associated with RAM locations.
  • the only traps which result in fast traps are the VMU exceptions mentioned above. Fast traps are numbered separately from other traps, and have a range from 0 to 7. However, they have the same priority as MMU exceptions.
  • When the interrupt control unit 363 recognizes a fast trap as the highest priority trap then pending, it causes a fast trap base register (FTB) to be selected from the special registers 412 and provided on the lines 416 to be combined with the trap offset.
  • each fast trap address is 128 bytes, or 32 words, apart.
  • the processor branches to the starting word and may execute programs within the block or branch out of it. Execution of small programs, such as standard TLB handling routines which may be implemented in 32 instructions or fewer, is faster than ordinary traps because the preliminary branch to the actual exception handling routine is obviated.
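The 128-byte spacing of fast trap entry points can be sketched as below. The text states only that the FTB value is combined with the trap offset; this sketch assumes the combination is a concatenation like the one used with the TBR, with the FTB supplying the upper 22 bits, the 3-bit fast trap number in bits [9:7], and seven zero bits below. The function name is hypothetical.

```python
FAST_TRAP_BLOCK_BYTES = 128   # 32 words of four bytes each


def fast_trap_handler_address(ftb, fast_trap_number):
    """Form a fast trap entry address (assumed layout: FTB bits [31:10],
    fast trap number in bits [9:7], low seven bits zero)."""
    if not 0 <= fast_trap_number <= 7:
        raise ValueError("fast trap numbers range from 0 to 7")
    return (ftb & 0xFFFFFC00) | (fast_trap_number << 7)
```

Because each entry owns a 128-byte block, a short handler can run in place with no preliminary branch.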
  • fast trap mechanism is also useful in microprocessors whose instructions are variable in length.
  • the fast trap vector addresses be separated by enough space to accommodate at least two of the shortest instructions available on the microprocessor, and preferably about 32 average-sized instructions.
  • the vector addresses should be separated by at least enough space to permit that instruction to be preceded by at least one other instruction in the handler.
  • On dispatch to a trap handler, the processor enters both a kernel mode and an interrupted state. Concurrently, a copy of the compare state register (CSR) is placed in the prior carry state register (PCSR) and a copy of the PSR is stored in the prior PSR (PPSR) register.
  • the kernel and interrupted state modes are represented by bits in the processor status register (PSR). Whenever the interrupted_state bit in the current PSR is set, the shadow registers or trap registers RT[24] through RT[31], as described above and as shown in Figure 7b, become visible.
  • the interrupt handler may switch out of kernel mode merely by writing a new mode into the PSR, but the only way to leave the interrupted state is by executing a return from trap (RTT) instruction.
  • PCSR is restored to the CSR register and the PPSR register is restored to the PSR register, thereby automatically clearing the interrupt_state bit in the PSR register.
  • xPC is restored to either the MBUF PFnPC or the EBUF PFnPC as appropriate, via incrementor 394 and bus 396.
  • the decision as to whether to restore xPC into the EBUF or MBUF PFnPC is made according to the "procedure_in_progress" bit of the PSR, once restored.
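The register movements on trap dispatch and on return from trap can be sketched as follows. This is a simplified software model: the PSR bit positions are hypothetical values chosen for the sketch, the class and method names are not from the text, and the incrementor path used when restoring xPC is not modelled.

```python
# Hypothetical PSR bit positions, chosen only for this sketch.
KERNEL_MODE = 0x1
INTERRUPTED_STATE = 0x2
PROCEDURE_IN_PROGRESS = 0x4


class TrapRegisters:
    def __init__(self, psr=0, csr=0):
        self.psr, self.csr = psr, csr
        self.ppsr = self.pcsr = self.xpc = 0

    def dispatch(self, return_address):
        """Trap entry: save CSR and PSR, record the return address,
        then enter kernel mode and the interrupted state."""
        self.pcsr = self.csr
        self.ppsr = self.psr
        self.xpc = return_address
        self.psr |= KERNEL_MODE | INTERRUPTED_STATE

    def return_from_trap(self):
        """RTT: restore CSR and PSR, then choose the prefetch PC that
        receives xPC from the restored procedure-in-progress bit."""
        self.csr = self.pcsr
        self.psr = self.ppsr
        stream = "EBUF" if self.psr & PROCEDURE_IN_PROGRESS else "MBUF"
        return stream, self.xpc
```

Restoring the saved PSR is what clears the interrupted state in this model, since the saved copy was taken before the bit was set.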
  • the processor does not use the same special register xPC to store the return address for both traps and procedural instructions.
  • the interrupted state remains available even while the processor is executing an emulation stream invoked by a procedural instruction.
  • exception handling routines should not include any procedural instructions since there is no special register to store an address for return to the exception handler after the emulation stream is complete.
  • a trap handler should back up any desired registers, clear any interrupt condition, read any information necessary for handling the trap from the system registers and process it as appropriate. Interrupts are automatically disabled upon dispatch to the trap handler. After processing, the handler can then restore the backed up registers, re-enable interrupts and execute the RTT instruction to return from the interrupt. If nested traps are to be allowed, the trap handler should be divided into first and second portions.
  • the xPC should be copied, using a special register move instruction, and pushed onto the stack maintained by the trap handler.
  • the address of the beginning of the second portion of the trap handler should then be moved using the special register move instruction into the xPC, and a return from trap instruction (RTT) executed.
  • the RTT removes the interrupted state (via the restoration of PPSR into PSR) and transfers control to the address in the xPC, which now contains the address of the second portion of the handler.
  • the second portion may enable interrupts at this point and continue to process the exception in an interruptible mode.
  • the handler should preserve any of the "A" register values where these register values are likely to be altered by the handler.
  • When the trap handling procedure is finished, it should restore all backed up registers, pop the original xPC off the trap handler stack and move it back into the xPC special register using a special register move instruction, and execute another RTT. This returns control to the appropriate instruction in the main or emulation instruction stream.
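The two-portion handler protocol for nested traps can be sketched as a small state model. Everything here is illustrative: the class, the function names, and the addresses in the usage note are hypothetical, and PSR restoration is reduced to a single interrupted flag.

```python
class TrapContext:
    """Minimal model of xPC, the interrupted state, and the handler stack."""

    def __init__(self):
        self.xpc = 0
        self.interrupted = False
        self.stack = []

    def enter(self, return_address):
        """Trap dispatch: record the return address and enter the interrupted state."""
        self.xpc = return_address
        self.interrupted = True

    def rtt(self):
        """Return from trap: leave the interrupted state and jump to xPC."""
        self.interrupted = False
        return self.xpc


def first_portion(ctx, second_portion_address):
    """Save xPC on the handler stack, point xPC at the second portion, and RTT."""
    ctx.stack.append(ctx.xpc)
    ctx.xpc = second_portion_address
    return ctx.rtt()


def finish_handler(ctx):
    """Pop the original xPC back into the register and RTT to the interrupted code."""
    ctx.xpc = ctx.stack.pop()
    return ctx.rtt()
```

If a second trap arrives while the second portion runs, its first portion pushes another return address, so the saved values unwind in last-in, first-out order as each handler finishes.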
  • the combined control and data path portions of IEU 104 are shown in Figure 5.
  • the primary data path begins with the instruction/operand data bus 124 from the IFU 102.
  • immediate operands are provided to an operand alignment unit 470 and passed on to a register file (REG ARRAY) 472.
  • Register data is provided from the register file 472 through a bypass unit 474, via a register file output bus 476, to a parallel array of functional computing elements (FUs) 478, via a distribution bus 480.
  • Data generated by the functional units 478 is provided back to the bypass unit 474 or the register array 472, or both, via an output bus 482.
  • a load/store unit 484 completes the data path portion of the IEU 104.
  • the load/store unit 484 is responsible for managing the transfer of data between the IEU 104 and CCU 106. Specifically, load data obtained from the data cache 134 of the CCU 106 i ⁇ transferred by the load/store unit 484 to an input of the regi ⁇ ter array 472 via a load data bu ⁇ 486. Data to be ⁇ tored to the data cache 134 of the CCU 106 i ⁇ received from the functional unit di ⁇ tribution bu ⁇ 480.
  • The control path portion of the IEU 104 is responsible for issuing, managing, and completing the processing of information through the IEU data path.
  • The IEU control path is capable of managing the concurrent execution of multiple instructions and the IEU data path provides for multiple independent data transfers between essentially all data path elements of the IEU 104.
  • The IEU control path operates in response to instructions received via the instruction/operand bus 124. Specifically, instruction sets are received by the EDecode unit 490.
  • The EDecode unit 490 receives and decodes both instruction sets held by the IFIFO master registers 216, 224.
  • The results of the decoding of all eight instructions are variously provided to a carry checker (CRY CHKR) unit 492, dependency checker (DEP CHKR) unit 494, register renaming unit (REG RENAME) 496, instruction issuer (ISSUER) unit 498 and retirement control unit (RETIRE CTL) 500.
  • The carry checker unit 492 receives decoded information about the eight pending instructions from the EDecode unit 490 via control lines 502.
  • The function of the carry checker 492 is to identify those of the pending instructions that either affect the carry bit of the processor status word or are dependent on the state of the carry bit.
  • This control information is provided via control lines 504 to the instruction issuer unit 498. Decoded information identifying the registers of the register file 472 that are used by the eight pending instructions is provided directly to the register renaming unit 496 via control lines 506. This information is also provided to the dependency checker unit 494. The function of the dependency checker unit 494 is to determine which of the pending instructions reference registers as the destination for data and which instructions, if any, are dependent on any of those destination registers. Those instructions that have register dependencies are identified by control signals provided via the control lines 508 to the register rename unit 496.
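The dependency check described above can be sketched in software as a comparison of each pending instruction's source registers against the destination registers of earlier pending instructions. This is a minimal illustration, not the circuit of the dependency checker unit 494: the function name `register_dependencies` and the tuple layout of an instruction are assumptions made for the example.

```python
def register_dependencies(pending):
    """pending: list of (dest_regs, src_regs) pairs in program order.

    Returns one set per instruction holding the indices of the earlier
    pending instructions whose destination registers it reads.
    """
    deps = []
    for i, (_, srcs) in enumerate(pending):
        found = set()
        for j in range(i):
            earlier_dests = pending[j][0]
            if set(earlier_dests) & set(srcs):
                found.add(j)
        deps.append(found)
    return deps


# Instruction 1 reads r1, which instruction 0 writes; instruction 2 is independent.
window = [({"r1"}, {"r2"}), ({"r3"}, {"r1"}), ({"r4"}, {"r5"})]
dependencies = register_dependencies(window)
```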
  • The EDecode unit 490 provides control information identifying the particular nature and function of each of the eight pending instructions to the instruction issuer unit 498 via control lines 510.
  • The issuer unit 498 is responsible for determining the data path resources, particularly the availability of particular functional units, for the execution of pending instructions.
  • The instruction issuer unit 498 allows for the out-of-order execution of any of the eight pending instructions subject to the availability of data path resources and carry and register dependency constraints.
  • The register rename unit 496 provides the instruction issuer unit 498 with a bit map, via control lines 512, of those instructions that are suitably unconstrained to allow execution. Instructions that have already been executed (done) and those with register or carry dependencies are logically removed from the bit map.
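The bit map handed to the instruction issuer can be pictured as an 8-bit mask from which done and dependency-constrained instructions are cleared. The sketch below assumes, purely for illustration, that bit i corresponds to pending instruction i and that the three input masks are already available; `ready_bitmap` is a hypothetical name.

```python
def ready_bitmap(done, register_dependent, carry_dependent):
    """Each argument is an 8-bit mask with bit i set for pending instruction i.

    An instruction stays in the result only if it is not done and has no
    outstanding register or carry dependency.
    """
    blocked = done | register_dependent | carry_dependent
    return ~blocked & 0xFF


# Instruction 0 is done, 2 waits on a register, 4 waits on the carry bit.
issuable = ready_bitmap(0b00000001, 0b00000100, 0b00010000)
```

Here `issuable` equals `0b11101010`, leaving instructions 1, 3, 5, 6 and 7 as candidates for issue.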
  • The instruction issuer unit 498 may initiate the execution of multiple instructions during each system clock cycle.
  • The status of the functional units 478₀₋ₙ is provided via a status bus 514 to the instruction issuer unit 498.
  • Control signals for initiating, and subsequently managing, the execution of instructions are provided by the instruction issuer unit 498 on the control lines 516 to the register rename unit 496 and selectively to the functional units 478₀₋ₙ.
  • The register rename unit 496 provides register selection signals on a register file access control bus 518.
  • A bypass control unit (BYPASS CTL) 520 generally controls the operation of the bypass data routing unit 474 via control signals on control lines 524.
  • The bypass control unit 520 monitors the status of each of the functional units 478₀₋ₙ and, in conjunction with the register references provided from the register rename unit 496 via control lines 522, determines whether data is to be routed from the register file 472 to the functional units 478₀₋ₙ or whether data being produced by the functional units 478₀₋ₙ can be immediately routed via the bypass unit 474 to the functional unit distribution bus 480 for use in the execution of a newly issued instruction selected by the instruction issuer unit 498. In either case, the instruction issuer unit 498 directly controls the routing of data from the distribution bus 480 to the functional units 478₀₋ₙ by selectively enabling specific register data to each of the functional units 478₀₋ₙ.
  • The remaining units of the IEU control path include a retirement control unit 500, a control flow control (CF CTL) unit 528, and a done control (DONE CTL) unit 540.
  • The retirement control unit 500 operates to void or confirm the execution of out-of-order executed instructions. Where an instruction has been executed out-of-order, that instruction can be confirmed or retired once all prior instructions have also been retired.
  • The retirement control unit 500 provides control signals on control lines 534 coupled to the bus 518 to effectively confirm the result data stored by the register array 472 as the result of the prior execution of an out-of-order executed instruction.
  • The retirement control unit 500 provides the PC increment/size control signals on control lines 344 to the IFU 102 as it retires each instruction. Since multiple instructions may be executed out-of-order, and therefore be ready for simultaneous retirement, the retirement control unit 500 determines a size value based on the number of instructions simultaneously retired.
  • The retirement control unit 500 provides the IFIFO read control signal on the control line 342 to the IFU 102 to initiate an IFIFO unit 264 shift operation, thereby providing the EDecode unit 490 with an additional four instructions as instructions pending execution.
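The in-order retirement rule stated above (an instruction retires only once all prior instructions have retired) implies that the number of instructions retired together is the length of the run of completed instructions at the head of the pending window. A minimal sketch follows; `retire_count` and the list-of-flags representation are assumptions made for illustration.

```python
def retire_count(done_flags):
    """done_flags: completion flags of the pending instructions, oldest first.

    Returns how many instructions can be retired together, i.e. the number
    of consecutive completed instructions starting from the oldest.
    """
    count = 0
    for done in done_flags:
        if not done:
            break
        count += 1
    return count


# Instruction 3 finished out-of-order but must wait behind instruction 2.
size_value = retire_count([True, True, False, True])
```

In this example `size_value` is 2, which would correspond to the size value used for the PC increment.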
  • The control flow control unit 528 performs the somewhat more specific function of detecting the logical branch result of each conditional branch instruction.
  • The control flow control unit 528 receives an 8 bit vector identification of the currently pending conditional branch instructions from the EDecode unit 490 via the control lines 510.
  • This done control signal allows the control flow control unit 528 to identify when a conditional branch instruction is done at least to a point sufficient to determine a conditional control flow status.
  • The control flow status results for the pending conditional branch instructions are stored by the control flow control unit 528 as they are executed.
  • The data necessary to determine the conditional control flow instruction outcome is obtained from temporary status registers in the register array 472 via the control lines 530.
  • The control flow control unit provides a new control flow result signal on the control lines 348 to the IFU 102.
  • This control flow result signal preferably includes two 8 bit vectors defining whether the status results, by respective bit position, of the eight potentially pending control flow instructions are known and the corresponding status result states, also given by bit position correspondence.
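The pair of 8 bit vectors can be sketched as a "known" mask and a "state" mask built from per-instruction outcomes. The encoding below is an assumption for illustration only: bit i is taken to correspond to pending instruction i, an unresolved outcome is represented by `None`, and `control_flow_result` is a hypothetical name.

```python
def control_flow_result(outcomes):
    """outcomes: eight entries, each None (not yet known), True or False.

    Returns (known, state): bit i of `known` is set when the result for
    instruction i is known, and bit i of `state` then carries that result.
    """
    known = 0
    state = 0
    for position, outcome in enumerate(outcomes):
        if outcome is not None:
            known |= 1 << position
            if outcome:
                state |= 1 << position
    return known, state


vectors = control_flow_result([None, True, False, None, None, None, None, True])
```

For this input the known vector is `0b10000110` and the state vector is `0b10000010`.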
  • The done control unit 540 is provided to monitor the operational execution state of each of the functional units 478₀₋ₙ. As any of the functional units 478₀₋ₙ signals completion of an instruction execution operation, the done control unit 540 provides a corresponding done control signal on the control lines 542 to alert the register rename unit 496, instruction issuer unit 498, retirement control unit 500 and bypass control unit 520.
  • The parallel array arrangement of the functional units 478₀₋ₙ enhances the control consistency of the IEU 104.
  • The particular nature of the individual functional units 478₀₋ₙ must be known by the instruction issuer unit 498 in order for instructions to be properly recognized and scheduled for execution.
  • The functional units 478₀₋ₙ are responsible for determining and implementing their specific control flow operation necessary to perform their requisite function. Thus, other than the instruction issuer 498, none of the IEU control units need to have independent knowledge of the control flow processing of an instruction.
  • The instruction issuer unit 498 and the functional units 478₀₋ₙ provide the necessary control signal prompting of the functions to be performed by the remaining control flow managing units 496, 500, 520, 528, 540. Thus, alteration in the particular control flow operation of a functional unit 478₀₋ₙ does not impact the control operation of the IEU 104. Further, the functional augmentation of an existing functional unit 478₀₋ₙ and even the addition of one or more new functional units 478₀₋ₙ, such as an extended precision floating point multiplier and extended precision floating point ALU, a fast Fourier computation functional unit, and a trigonometric computational unit, require only minor modification of the instruction issuer unit 498.
  • The required modifications must provide for recognition of the particular instruction, based on the corresponding instruction field isolated by the EDecode unit 490, and a correlation of the instruction to the required functional unit 478₀₋ₙ. Control over the selection of register data, routing of data, instruction completion and retirement remains consistent with the handling of all other instructions executed with respect to all other ones of the functional units 478₀₋ₙ.
  • The present invention provides for a number of parallel data paths optimized generally for specific functions.
  • The two principal data paths are integer and floating point.
  • A portion of the register file 472 is provided to support the data manipulations occurring within that data path.
  • The preferred generic architecture of a data path register file is shown in Figure 6a.
  • The data path register file 550 includes a temporary buffer 552, a register file array 564, an input selector 559, and an output selector 556.
  • Data ultimately destined for the register array 564 is typically first received by the temporary buffer 552 through a combined data input bus 558'. That is, all data directed to the data path register file 550 is multiplexed by the input selector 559 from a number of input buses 558, preferably two, onto the input bus 558'.
  • Register select and enable control signals provided on the control bus 518 select the register location for the received data within the temporary buffer 552.
  • Control signals again provided on the control bus 518 enable the transfer of the data from the temporary buffer 552 to a logically corresponding register within the register file array 564 via the data bus 560.
  • Data stored in the registers of the temporary buffer 552 may be utilized in the execution of subsequent instructions by routing the temporary buffer stored data to the output data selector 556 via a bypass portion of the data bus 560.
  • The selector 556, controlled by a control signal provided via the control bus 518, selects between data provided from the registers of the temporary buffer 552 and of the register file array 564.
  • Also, where an executing instruction will be retired on completion, i.e., the instruction has been executed in-order, the input selector 559 can be directed to route the result data directly to the register array 554 via bypass extension 558".
  • Each data path register file 550 permits two simultaneous register operations to occur.
  • The input bus 558 provides for two full register width data values to be written to the temporary buffer 552.
  • The temporary buffer 552 provides a multiplexer array permitting the simultaneous routing of the input data to any two registers within the temporary buffer 552.
  • Internal multiplexers allow any five registers of the temporary buffer 552 to be selected to output data onto the bus 560.
  • The register file array 564 likewise includes input and output multiplexers allowing two registers to be selected to receive, on bus 560, or five to source, via bus 562, respective data simultaneously.
  • The register file output selector 556 is preferably implemented to allow any five of the ten register data values received via the buses 560, 562 to be simultaneously output on the register file output bus 564.
  • The register set within the temporary buffer is generally shown in Figure 6b.
  • The register set 552' consists of eight single word (32 bit) registers I0RD, I1RD ... I7RD.
  • The register set 552' may also be used as a set of four double word registers I0RD, I0RD+1 (I4RD), I1RD, I1RD+1 (I5RD) ... I3RD, I3RD+1 (I7RD).
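The double word pairing listed above follows a simple rule: double word register x is formed from single word registers x and x+4. A short sketch, with the helper name `double_word_pair` chosen for illustration:

```python
def double_word_pair(x):
    """Return the two single word temporary registers that together form
    double word register IxRD, for x in 0..3 (IxRD paired with I(x+4)RD)."""
    if not 0 <= x <= 3:
        raise ValueError("double word registers exist only for x in 0..3")
    return f"I{x}RD", f"I{x + 4}RD"


all_pairs = [double_word_pair(x) for x in range(4)]
```

This yields the pairs (I0RD, I4RD), (I1RD, I5RD), (I2RD, I6RD) and (I3RD, I7RD).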
  • The registers in the temporary buffer register set 552 are referenced by the register rename unit 496 based on the relative location of the respective instructions within the two IFIFO master registers 216, 224.
  • Each instruction implemented by the architecture 100 may reference for output up to two registers, or one double word register, for the destination of data produced by the execution of the instruction. Typically, an instruction will reference only a single output register.
  • The data destination register I2RD will be selected to receive data produced by the execution of the instruction.
  • Where the data produced by the instruction I₂ is used by a subsequent instruction, for example, I₅, the data stored in the I2RD register will be transferred out via the bus 560 and the resultant data stored back to the temporary buffer 552 into the register identified as I5RD.
  • Instruction I₅ is thus dependent on instruction I₂.
  • Instruction I₅ cannot be executed until the result data from I₂ is available. However, as can be seen, instruction I₅ can execute prior to the retirement of instruction I₂ by obtaining its required input data from the instruction I₂ data location of the temporary buffer 552'.
  • As instruction I₂ is retired, the data from the register I2RD is written to the register location within the register file array 564 as determined by the logical position of the instruction at the point of retirement. That is, the retirement control unit 500 determines the address of the destination registers in the register file array from the register reference field data provided from the EDecode unit 490 on the control lines 510.
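The I₂/I₅ example can be modeled in a few lines: results of pending instructions live in temporary buffer slots, dependent instructions may read them there before retirement, and retirement copies the value into the register file array. This is a behavioral sketch only; the class `DataPathRegisterFile`, its method names and the use of `None` for an empty slot are assumptions for illustration.

```python
class DataPathRegisterFile:
    def __init__(self, array_size=32, pending=8):
        self.array = [0] * array_size  # values of retired instructions only
        self.temp = [None] * pending   # I0RD..I7RD, one slot per pending instruction

    def write_result(self, slot, value):
        self.temp[slot] = value        # result of an executed, un-retired instruction

    def read_operand(self, producer_slot, array_reg):
        """Use the un-retired result of a pending producer when one exists,
        otherwise fall back to the register file array."""
        if producer_slot is not None and self.temp[producer_slot] is not None:
            return self.temp[producer_slot]
        return self.array[array_reg]

    def retire(self, slot, dest_reg):
        self.array[dest_reg] = self.temp[slot]  # confirm the result
        self.temp[slot] = None


rf = DataPathRegisterFile()
rf.write_result(2, 42)               # I2 executes and writes I2RD
early_value = rf.read_operand(2, 7)  # I5 reads the result before I2 retires
rf.retire(2, 7)                      # I2 retires into array register 7
```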
  • Instruction I₂ provides a double word result value.
  • Execution of instructions I₄₋₇ is held where a double word output reference by any of the instructions I₀₋₃ is detected by the register rename unit 496. This allows the entire temporary buffer 552' to be used as a single rank of double word registers. Once instructions I₀₋₃ have been retired, the temporary buffer 552' can again be used as two ranks of single word registers.
  • Any instruction I₄₋₇ is held where a double word output register is required until the instruction has been shifted into a corresponding I₀₋₃ location.
  • The logical organization of the register file array 564 is shown in Figures 7a-b.
  • The register file array 564 for the integer data path consists of 40 32-bit wide registers.
  • This set of registers, constituting a register set "A", is organized as a base register set ra[0..23] 565, a top set of general purpose registers ra[24..31] 566, and a shadow register set of eight general purpose trap registers rt[24..31].
  • The general purpose registers ra[0..31] 565, 566 constitute the active "A" register set of the register file array for the integer data path.
  • The trap registers rt[24..31] 567 may be swapped into the active register set "A" to allow access along with the active base set of registers ra[0..23] 565.
  • This configuration of the "A" register set is selected upon the acknowledgement of an interrupt or the execution of an exception trap handling routine.
  • This state of the register set "A" is maintained until expressly returned to the state shown in Figure 7a by the execution of an enable interrupts instruction or execution of a return from trap instruction.
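The swapping of the shadow trap registers into the "A" set can be sketched as a bank selection applied to register numbers 24 through 31. The class `IntegerRegisterSetA` and the `trap_state` flag below are hypothetical names used only to illustrate the organization described above.

```python
class IntegerRegisterSetA:
    def __init__(self):
        self.ra = [0] * 32       # ra[0..31]: base set plus top set
        self.rt = [0] * 8        # shadow trap registers rt[24..31]
        self.trap_state = False  # True while the trap configuration is selected

    def _locate(self, n):
        # In the trap configuration, numbers 24..31 address the shadow set.
        if self.trap_state and 24 <= n <= 31:
            return self.rt, n - 24
        return self.ra, n

    def read(self, n):
        bank, index = self._locate(n)
        return bank[index]

    def write(self, n, value):
        bank, index = self._locate(n)
        bank[index] = value


regs = IntegerRegisterSetA()
regs.write(5, 11)
regs.write(24, 1)
regs.trap_state = True         # interrupt acknowledged: rt[24..31] swapped in
regs.write(24, 9)              # the handler uses the shadow register
shared_base = regs.read(5)     # ra[0..23] remains visible to the handler
regs.trap_state = False        # return from trap
preserved_top = regs.read(24)  # ra[24] was never disturbed
```

The handler can use registers 24 through 31 freely without saving them, since the interrupted program's ra[24..31] values are untouched.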
  • The floating point data path utilizes an extended precision register file array 572 as generally shown in Figure 8.
  • The register file array 572 consists of 32 registers, rf[0..31], each having a width of 64 bits.
  • The floating point register file 572 may also be logically referenced as a "B" set of integer registers rb[0..31].
  • This "B" set of registers is equivalent to the low-order 32 bits of each of the floating point registers rf[0..31].
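The relationship between the "B" set and the floating point registers can be expressed as a mask of the low-order 32 bits. The sketch below models only reads; how a write through rb[n] treats the upper 32 bits is not stated in the text above and is not modeled. The name `read_rb` is an assumption.

```python
LOW_32_BITS = 0xFFFFFFFF


def read_rb(rf, n):
    """Return integer register rb[n], i.e. the low-order 32 bits of the
    64-bit register rf[n]. rf is a list of 32 integers holding raw bits."""
    return rf[n] & LOW_32_BITS


rf = [0] * 32
rf[3] = 0x1122334455667788
rb3 = read_rb(rf, 3)
```

Here `rb3` is `0x55667788`.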
  • A boolean operator register set 574 is provided, as shown in Figure 9, to store the logical result of boolean combinatorial operations.
  • This "C" register set 574 consists of 32 single bit registers, rc[0..31].
  • The operation of the boolean register set 574 is unique in that the results of boolean operations can be directed to any instruction selected register of the boolean register set 574.
  • This is in contrast to utilizing a single processor status word register that stores single bit flags for conditions such as equal, not equal, greater than and other simple boolean status values.
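The practical difference from a single status word is that several comparison results can be kept at once and combined later. A minimal sketch follows, with the class `BooleanRegisterSet` and its methods as hypothetical names; only an AND combination is shown.

```python
class BooleanRegisterSet:
    def __init__(self):
        self.rc = [False] * 32  # rc[0..31], one bit each

    def store(self, dest, result):
        self.rc[dest] = bool(result)  # any register may be the destination

    def combine_and(self, dest, src1, src2):
        self.rc[dest] = self.rc[src1] and self.rc[src2]


c = BooleanRegisterSet()
c.store(3, 4 == 4)       # first comparison result kept in rc[3]
c.store(9, 7 > 2)        # second result kept in rc[9], rc[3] is not overwritten
c.combine_and(0, 3, 9)   # rc[0] = rc[3] AND rc[9]
```

With a single set of status flags, the second comparison would have replaced the first before the two could be combined.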
  • Both the floating point register set 572 and the boolean register set 574 are complemented by temporary buffers architecturally identical to the integer temporary buffer 552 shown in Figure 6b.
  • A number of additional special registers are at least logically present in the register array 472.
  • The registers that are physically present in the register array 472, as shown in Figure 7c, include a kernel stack pointer 568, processor state register (PSR) 569, previous processor state register (PPSR) 570, and an array of eight temporary processor state registers (tPSR[0..7]) 571.
  • PSR processor state register
  • PPSR previous processor state register
  • tPSR[0..7] temporary processor state registers
  • The special address and data bus 354 is provided to select and transfer data between the special registers and the "A" and "B" sets of registers.
  • A special register move instruction is provided to select a register from either the "A" or "B" register set, the direction of transfer and to specify the address identifier of a special register.
  • The kernel stack pointer register and temporary processor state registers differ from the other special registers.
  • The kernel stack pointer may be accessed through execution of a standard register to register move instruction when in kernel state.
  • The temporary processor state registers are not directly accessible. Rather, this array of registers is used to implement an inheritance mechanism for propagating the value of the processor state register for use by out-of-order executing instructions.
  • The initial propagation value is that of the processor state register: the value provided by the last retired instruction.
  • This initial value is propagated forward through the temporary processor state registers so that any out-of-order executing instruction has access to the value in the positionally corresponding temporary processor state register.
  • The specific nature of an instruction defines the condition code bits, if any, that the instruction is dependent on and may change. Where an instruction is unconstrained by dependencies, register or condition code, as determined by the register dependency checker unit 494 and carry dependency checker 492, the instruction can be executed out-of-order. Any modification of the condition code bits of the processor state register is directed to the logically corresponding temporary processor state register.
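One way to picture the inheritance mechanism is as a forward scan over the eight pending instruction positions: each tPSR entry carries either the value inherited from the preceding position or the value newly produced at its own position. The sketch below is an interpretation under stated assumptions, not the circuit itself: the function name `propagate_psr`, the use of `None` for "does not modify the condition codes", and the exact point at which a position sees its own update are all assumptions.

```python
def propagate_psr(psr, updates):
    """psr: value left by the last retired instruction.
    updates: one entry per pending instruction, oldest first; None when the
    instruction leaves the condition codes unchanged, otherwise the new value.

    Returns the list of tPSR values, one per pending instruction position.
    """
    tpsr = []
    current = psr
    for update in updates:
        if update is not None:
            current = update  # modification directed to this position's tPSR
        tpsr.append(current)  # later positions inherit this value
    return tpsr


tpsr = propagate_psr(0b0000, [None, 0b0001, None, None, 0b0100, None, None, None])
```

In this example positions 1 through 3 see the value produced at position 1, and positions 4 through 7 see the value produced at position 4.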
  • PCs maintain the next address of the currently executing program instruction stream.
  • TBUF and EBUF PFnPCs maintain the next prefetch instruction addresses for the respective prefetch instruction streams.
  • uPC R/W Micro-Program Counter: maintains the address of the instruction following a procedural instruction.
  • This is the address of the first instruction to be executed upon return from a procedural routine.
  • xPC R/W Interrupt/Exception Program Counter: holds the return address of an interrupt or an exception.
  • TBR W Trap Base Register: base address of a vector table used for trap handling routine dispatching. Each entry is one word long.
  • The trap number, provided by Interrupt Logic Unit 363, is used as an index into the table pointed to by this address.
  • FTB W Fast Trap Base Register: base address of an immediate trap handling routine table. Each table entry is 32 words and is used to directly implement a trap handling routine. The trap number, provided by Interrupt Logic Unit 363, times 32 is used as an offset into the table pointed to by this address.
  • PBR W Procedural Base Register: base address of a vector table used for procedural routine dispatching. Each entry is one word long, aligned on four word boundaries. The procedure number, provided as a procedural instruction field, is used as an index into the table pointed to by this address.
  • Status data bits include: carry, overflow, zero, negative, processor mode, current interrupt level, procedural routine being executed, divide by 0, overflow exception, hardware function enables, procedural enable, interrupt enable.
  • PPSR R/W Previous Processor State Register: loaded from the PSR on successful completion of an instruction or when an interrupt or trap is taken.
  • PCSR R/W Previous Compare State Register: loaded from the CSR on successful completion of an instruction or when an interrupt or trap is taken.
  • 7) Integer Data Path Detail:
  • The integer data path of the IEU 104, constructed in accordance with the preferred embodiment of the present invention, is shown in Figure 10.
  • The many control path connections to the integer data path 580 are not shown. Those connections are defined with respect to Figure 5.
  • Input data for the data path 580 is obtained from the alignment units 582, 584 and the integer load/store unit 586.
  • Integer immediate data values, originally provided as an instruction embedded data field, are obtained from the operand unit 470 via a bus 588.
  • The alignment unit 582 operates to isolate the integer data value and provide the resulting value onto the output bus 590 to a multiplexer 592.
  • A second input to the multiplexer 592 is the special register address and data bus 354.
  • Immediate operands obtained from the instruction stream are also obtained from the operand unit 470 via the data bus 594.
  • These values are again right justified by the alignment unit 584 before provision onto an output bus 596.
  • The integer load/store unit 586 communicates bi-directionally via the external data bus 598 with the CCU 106.
  • Data output from the multiplexer 592 and latch 602 are provided on the multiplexer input buses 604, 606 of a multiplexer 608.
  • Data from the functional unit output bus 482' is also received by the multiplexer 608.
  • This multiplexer 608, in the preferred embodiments of the architecture 100, provides for two simultaneous data paths to the output multiplexer buses 610.
  • The transfer of data through the multiplexer 608 can be completed within each half cycle of the system clock. Since most instructions implemented by the architecture 100 utilize a single destination register, a maximum of four instructions can provide data to the temporary buffer 612 during each system clock cycle.
  • Data from the temporary buffer 612 can be transferred to an integer register file array 614, via temporary register output buses 616, or to an output multiplexer 620 via alternate temporary buffer register buses 618.
  • Integer register array output buses 622 permit the transfer of integer register data to the multiplexer 620.
  • The output buses connected to the temporary buffer 612 and integer register file array 614 each permit five register values to be output simultaneously. That is, two instructions referencing a total of up to five source registers can be issued simultaneously.
  • The temporary buffer 612, register file array 614 and multiplexer 620 allow outbound register data transfers to occur every half system clock cycle. Thus, up to four integer and floating point instructions may be issued during each clock cycle.
  • The multiplexer 620 operates to select outbound register data values from the register file array 614 or directly from the temporary buffer 612. This allows out-of-order executed instructions with dependencies on prior out-of-order executed instructions to be executed by the IEU 104. This facilitates the twin goals of maximizing the execution through-put capability of the IEU integer data path by the out-of-order execution of pending instructions while precisely segregating out-of-order data results from data results produced by instructions that have been executed and retired.
  • When an interrupt or other exception condition occurs, the present invention allows the data values present in the temporary buffer 612 to be simply cleared.
  • The register file array 614 is therefore left to contain precisely those data values produced only by the execution of instructions completed and retired prior to the occurrence of the interrupt or other exception condition.
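Because un-retired results are held only in the temporary buffer, restoring a precise state amounts to discarding that buffer. A minimal sketch under stated assumptions: the list representation of both structures, the use of `None` for an empty slot, and the name `flush_temporary_buffer` are illustrative only.

```python
def flush_temporary_buffer(temp_buffer):
    """Discard every un-retired result. The register file array is not
    passed in at all, which mirrors the point that it needs no repair."""
    for slot in range(len(temp_buffer)):
        temp_buffer[slot] = None


register_array = [10, 20, 30, 40]               # results of retired instructions
temp_buffer = [None, 99, None, 7, None, None, None, None]  # out-of-order results
flush_temporary_buffer(temp_buffer)             # exception condition occurs
```

After the flush the register array still holds exactly the retired values, and no speculative result remains visible.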
  • The up to five register data values selected during each half system clock cycle operation of the multiplexer 620 are provided via the multiplexer output buses 624 to an integer bypass unit 626.
  • This bypass unit 626 is, in essence, a parallel array of multiplexers that provide for the routing of data presented at any of its inputs to any of its outputs.
  • The bypass unit 626 inputs include the special register addressed data value or immediate integer value via the output bus 604 from the multiplexer 592, the up to five register data values provided on the buses 624, the load operand data from the integer load/store unit 586 via the double integer bus 600, the immediate operand value obtained from the alignment unit 584 via its output bus 596, and, finally, a bypass data path from the functional unit output bus 482.
  • This bypass data path and the data bus 482 provide for the simultaneous transfer of four register values per system clock cycle.
  • The functional unit distribution bus 480 is implemented through the operation of a router unit 634.
  • The router unit 634 is implemented by a parallel array of multiplexers that permit five register values received at its inputs to be routed to the functional units provided in the integer data path.
  • The router unit 634 receives the five register data values provided via the buses 630 from the bypass unit 626, the current IF_PC address value via the address bus 352 and the control flow offset value determined by the PC control unit 362 and as provided on the lines 378'.
  • The router unit 634 may optionally receive, via the data bus 636, an operand data value sourced from a bypass unit provided within the floating point data path.
  • The register data values received by the router unit 634 may be transferred onto the special register address and data bus 354 and to the functional units 640, 642, 644.
  • The router unit 634 is capable of providing up to three register operand values to each of the functional units 640, 642, 644 via router output buses 646, 648, 650.
  • Consistent with the general architecture of the architecture 100, up to two instructions could be simultaneously issued to the functional units 640, 642, 644.
  • The preferred embodiment of the present invention provides for three dedicated integer functional units, implementing respectively a programmable shift function and two arithmetic logic unit functions.
  • An ALU0 functional unit 644, ALU1 functional unit 642 and shifter functional unit 640 provide respective output register data onto the functional unit bus 482'.
  • The output data produced by the ALU0 and shifter functional units 644, 640 are also provided onto a shared integer functional unit bus 650 that is coupled into the floating point data path.
  • The ALU0 functional unit 644 is used also in the generation of virtual address values in support of both the prefetch operations of the IFU 102 and data operations of the integer load/store unit 586.
  • The virtual address value calculated by the ALU0 functional unit 644 is provided onto an output bus 654 that connects to both the target address bus 346 of the IFU 102 and to the CCU 106 to provide the execution unit physical address (EX PADDR).
  • A latch 656 is provided to store the virtualizing portion of the address produced by the ALU0 functional unit 644.
  • This virtualizing portion of the address is provided onto an output bus 658 to the VMU 108.
  • Initial data is again received from a number of sources including the immediate integer operand bus 588, immediate operand bus 594 and the special register address data bus 354.
  • The final source of external data is a floating point load/store unit 662 that is coupled to the CCU 106 via the external data bus 598.
  • The multiplexer 666 also receives the special register address data bus 354. Immediate operands are provided to a second alignment unit 670 for right justification before being provided on an output bus 672.
  • The multiplexer 678 provides for selectable data paths sufficient to allow two register data values to be written to a temporary buffer 680, via the multiplexer output buses 682, each half cycle of the system clock.
  • The temporary buffer 680 incorporates a register set logically identical to the temporary buffer 552' as shown in Figure 6b.
  • The temporary buffer 680 further provides for up to five register data values to be read from the temporary buffer 680 to a floating point register file array 684, via data buses 686, and to an output multiplexer 688 via output data buses 690.
  • The multiplexer 688 also receives, via data buses 692, up to five register data values from the floating point register file array 684 simultaneously.
  • The multiplexer 688 functions to select up to five register data values for simultaneous transfer to a bypass unit 694 via data buses 696.
  • The bypass unit 694 also receives the immediate operand value provided by the alignment unit 670 via the data bus 672, the output data bus 698 from the multiplexer 666, the load data bus 676 and a data bypass extension of the functional unit data return bus 482".
  • The bypass unit 694 operates to select up to five simultaneous register operand data values for output onto the bypass unit output buses 700, a store data bus 702 connected to the floating point load/store unit 662, and the floating point bypass bus 636 that connects to the router unit 634 of the integer data path 580.
  • A floating point router unit 704 provides for simultaneous selectable data paths between the bypass unit output buses 700 and the integer data path bypass bus 628 and functional unit input buses 706, 708, 710 coupled to the respective functional units 712, 714, 716.
  • Each of the input buses 706, 708, 710, in accordance with the preferred embodiment of the architecture 100, permits the simultaneous transfer of up to three register operand data values to each of the functional units 712, 714, 716.
  • The output buses of these functional units 712, 714, 716 are coupled to the functional unit data return bus 482" for returning data to the register file input multiplexer 678.
  • The integer data path functional unit output bus 650 may also be provided to connect to the functional unit data return bus 482".
  • The architecture 100 does provide for a connection of the functional unit output buses of a multiplier functional unit 712 and a floating point ALU 714 to be coupled via the floating point data path functional unit bus 652 to the functional unit data return bus 482' of the integer data path 580.
  • The boolean operations data path 720 is shown in Figure 12. This data path 720 is utilized in support of the execution of essentially two types of instructions.
  • The first type is an operand comparison instruction where two operands, selected from the integer register sets, floating point register sets or provided as immediate operands, are compared by subtraction in one of the ALU functional units of the integer and floating point data paths.
  • Comparison is performed by a subtraction operation by any of the ALU functional units 642, 644, 714, 716 with the resulting sign and zero status bits being provided to a combined input selector and comparison operator unit 722.
  • This unit 722, in response to instruction identifying control signals received from the EDecode unit 490, selects the output of an ALU functional unit 642, 644, 714, 716 and combines the sign and zero bits to extract a boolean comparison result value.
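
The way a comparison result can be derived from the sign and zero status bits of a subtraction may be sketched as follows. This is only an illustration, assuming 32 bit two's complement operands whose difference does not overflow; the function names and the relation mnemonics are hypothetical and are not taken from Tables III and IV.

```python
MASK32 = 0xFFFFFFFF

def subtract_flags(a, b):
    """Return the (sign, zero) status bits of the 32 bit result a - b."""
    diff = (a - b) & MASK32
    sign = (diff >> 31) & 1          # most significant bit of the difference
    zero = 1 if diff == 0 else 0     # set when the operands are equal
    return sign, zero

def compare(a, b, relation):
    """Combine the sign and zero bits into a single boolean result bit."""
    sign, zero = subtract_flags(a, b)
    table = {
        "eq": zero,
        "ne": zero ^ 1,
        "lt": sign,
        "ge": sign ^ 1,
        "le": sign | zero,
        "gt": (sign | zero) ^ 1,
    }
    return table[relation]
```

Each relation is a small combinational function of the two status bits, which is consistent with the comparison result being a single bit.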
  • An output bus 723 allows the results of the comparison operation to be transferred simultaneously to an input multiplexer 726 and a bypass unit 742.
  • The bypass unit 742 is implemented as a parallel array of multiplexers providing multiple selectable data paths between the inputs of the bypass unit 742 and multiple outputs.
  • The other inputs of the bypass unit 742 include a boolean operation result return data bus 724 and two boolean operands on data buses 744.
  • The bypass unit 742 permits boolean operands representing up to two simultaneously executing boolean instructions to be transferred to a boolean operation functional unit 746, via operand buses 748.
  • The bypass unit 742 also permits up to two single bit boolean operand bits (CF0, CF1) to be simultaneously provided on the control flow result control lines 750, 752.
  • The remainder of the boolean operation data path 720 includes the input multiplexer 726 that receives, as its inputs, the comparison and the boolean operation result values provided on the comparison result bus 723 and a boolean result bus 724.
  • The bus 724 permits up to two simultaneous boolean result bits to be transferred to the multiplexer 726.
  • Up to two comparison result bits may be transferred via the bus 723 to the multiplexer 726.
  • The multiplexer 726 permits any two single bits presented at the multiplexer inputs to be transferred via the multiplexer output buses 730 to a boolean operation temporary buffer 728 during each half cycle of the system clock.
  • The temporary buffer 728 is logically equivalent to the temporary buffer 552', as shown in Figure 6b, though differing in two significant respects.
  • The first respect is that each register entry in the temporary buffer 728 consists of a single bit.
  • The second distinction is that only a single register is provided for each of the eight pending instruction slots, since the result of a boolean operation is, by definition, fully defined by a single result bit.
  • The temporary buffer 728 provides up to four output operand values simultaneously. This allows the simultaneous execution of two boolean instructions, each requiring access to two source registers.
  • The four boolean register values may be transferred during each half cycle of the system clock onto the operand buses 736 to a multiplexer 738 or to a boolean register file array 732 via the boolean operand data buses 734.
  • The boolean register file array 732 is a single 32 bit wide data register that permits any separate combination of up to four single bit locations to be modified with data from the temporary buffer 728 and read from the boolean register file array 732 onto the output buses 740 during each half cycle of the system clock.
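
A single 32 bit word holding 32 one bit boolean registers can be modelled as below. This is a minimal sketch: the class and method names are hypothetical, and the limit of four locations per access simply mirrors the figure stated above rather than any documented interface.

```python
class BooleanRegisterFile:
    """32 single bit boolean registers packed into one 32 bit word."""

    MAX_PORTS = 4  # up to four bit locations per access, per the text

    def __init__(self):
        self.word = 0

    def write_bits(self, updates):
        """updates maps a register index (0..31) to a new bit value."""
        if len(updates) > self.MAX_PORTS:
            raise ValueError("at most four bit locations per access")
        for index, bit in updates.items():
            if not 0 <= index < 32:
                raise IndexError(index)
            if bit:
                self.word |= 1 << index
            else:
                self.word &= ~(1 << index)

    def read_bits(self, indices):
        """Return the bit values at up to four register indices."""
        if len(indices) > self.MAX_PORTS:
            raise ValueError("at most four bit locations per access")
        return [(self.word >> i) & 1 for i in indices]
```

Writing `{0: 1, 5: 1, 31: 1}` sets three independent bit locations in one access, while the remaining bits keep their previous values.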
  • The multiplexer 738 provides for any two pairs of boolean operands received at its inputs via the buses 736, 740 to be transferred onto the operand output buses 744 to the bypass unit 742.
  • The boolean operation functional unit 746 is capable of performing a wide range of boolean operations on two source values.
  • The source values are a pair of operands obtained from any of the integer and floating point register sets and any immediate operand provided to the IEU 104, and, for a boolean instruction, any two of the boolean register operands.
  • Tables III and IV identify the logical comparison operations provided by the preferred embodiment of the architecture 100.
  • Table V identifies the direct boolean operations provided by the preferred implementation of the architecture 100.
  • The instruction condition codes and function codes specified in Tables III-V represent a segment of the corresponding instructions.
  • The instruction also provides an identification of the source pair of operand registers and the destination boolean register for storage of the corresponding boolean operation result.
  • The load/store units 586, 662 are preferably implemented as a single shared load/store unit 760.
  • The address utilized by the load/store unit 760 is a physical address as opposed to the virtual address utilized by the IFU 102 and the remainder of the IEU 104. While the IFU 102 operates on virtual addresses, relying on coordination between the CCU 106 and VMU 108 to produce a physical address, the IEU 104 requires the load/store unit 760 to operate directly in a physical address mode.
  • Load instructions referencing the same physical address as executed but not retired store instructions are delayed until the store instruction is actually retired. At that point the store data may be transferred to the CCU 106 by the load/store unit 760 and then immediately loaded back by the execution of a CCU data load operation. Specifically, full physical addresses are provided from the VMU 108 onto the load/store address bus 762. Load addresses are, in general, stored in load address registers 768₀₋₃. Store addresses are latched into store address registers 770₀₋₃.
  • The load/store control unit 774 provides control signals on control lines 778 for latching load addresses and on control lines 780 for latching store addresses.
  • Store data is latched simultaneously with the latching of store addresses in logically corresponding slots of the store data register set 782₀₋₃.
  • A 4x4x32 bit wide address comparator unit 772 is simultaneously provided with each of the addresses in the load and store address registers 768₀₋₃, 770₀₋₃.
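
The role of the address comparator in delaying loads can be sketched as a 4x4 comparison of load slots against store slots. This is an illustration under stated assumptions: empty slots are represented by `None`, the `store_pending` flags (executed but not yet retired) are a hypothetical input, and program-order filtering between loads and stores is not modelled.

```python
def match_matrix(load_addrs, store_addrs):
    """Entry [i][j] is True when load slot i and store slot j hold
    the same physical address. Empty slots never match."""
    return [
        [la is not None and la == sa for sa in store_addrs]
        for la in load_addrs
    ]

def loads_to_delay(load_addrs, store_addrs, store_pending):
    """A load is held back while any matching store is still pending,
    that is, executed but not yet retired."""
    matrix = match_matrix(load_addrs, store_addrs)
    return [
        any(hit and pending for hit, pending in zip(row, store_pending))
        for row in matrix
    ]
```

With four load slots and four store slots, all sixteen comparisons are independent, which is what allows a hardware comparator to evaluate them at once.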
  • Upon receipt of a control signal from the retirement control unit 500, indicating that the corresponding store data instruction is retiring, the load/store control unit 774 initiates a CCU data transfer operation by arbitrating, via control lines 784, for access to the CCU 106. When the CCU 106 signals ready, the load/store control unit 774 directs the selector 786 to provide a CCU physical address onto the CCU PADDR address bus 788. This address is obtained from the corresponding store register 770₀₋₃ via the address bus 790. Data from the corresponding store data register 782₀₋₃ is provided onto the CCU data bus 792.
  • Upon issuance of a load instruction by the instruction issuer 498, the load/store control unit 774 enables one of the load address latches 768₀₋₃ to latch the requested load address.
  • The specific latch 768₀₋₃ selected logically corresponds to the position of the load instruction in the relevant instruction set.
  • The instruction issuer 498 provides the load/store control unit 774 with a five bit vector identifying the load instruction within either of the two possible pending instruction sets.
  • The load address is routed via an address bus 794 to the selector 786 for output onto the CCU PADDR address bus 788.
  • Provision of the address is performed in concert with CCU request and ready control signals being exchanged between the load/store control unit 774 and CCU 106.
  • An execution ID value (ExID) is also prepared and issued by the load/store control unit 774 to the CCU 106 in order to identify the load request when the CCU 106 subsequently returns the requested data including the ExID value.
  • The ID value is thus the same as the bit vector provided with the load request from the instruction issuer unit 498.
  • On subsequent signal from the CCU 106 to the load/store control unit 774 of the availability of prior requested load data, the load/store control unit 774 enables an alignment unit to receive the data and provide it on the load data bus 764. An alignment unit 798 operates to right justify the load data.
  • The load/store control unit 774 receives the ExID value from the CCU 106.
  • The load/store control unit 774 provides a control signal to the instruction issuer unit 498 identifying that load data is being provided on the load data bus 764 and, further, returns a bit vector identifying the load instruction for which the load data is being returned.
  • The timing diagram of Figure 14 shows a sequence of processor system clock cycles, P₀₋₆.
  • Each processor cycle begins with an internal T Cycle, T₀.
  • There are two T cycles per processor cycle in a preferred embodiment of the present invention as provided for by the architecture 100.
  • The IFU 102 and the VMU 108 operate to generate a physical address.
  • The physical address is provided to the CCU 106 and an instruction cache access operation is initiated.
  • An instruction set is returned to the IFU 102 at about the mid-point of processor cycle one.
  • The IFU 102 then manages the transfer of the instruction set through the prefetch unit 260 and IFIFO 264, whereupon the instruction set is first presented to the IEU 104 for execution.
  • The EDecode unit 490 receives the full instruction set in parallel for decoding prior to the conclusion of processor cycle one.
  • The EDecode unit 490, in the preferred architecture 100, is implemented as a pure combinatorial logic block that provides for the direct parallel decoding of all valid instructions that are received via the bus 124.
  • Each type of instruction recognized by the architecture 100, including the specification of the instruction, register requirements and resource needs, is identified in Table VI.
  • Convert Integer Move: Operation Function Code (specifies type of floating point to integer conversion); Source/Destination Register; Register Set A/B select
  • Boolean Functions: Boolean Function Code (specifies And, Or, etc.); Destination boolean register; Source Register 1; Source Register 2; Register Set A/B select
  • Extended Procedure: Procedure specifier (specifies address offset from procedural base value); Operation (value passed to procedure routine)
  • Atomic Procedure: Procedure specifier (specifies address value)
  • * - instruction includes these fields in addition to a field that decodes to identify the instruction.
  • The EDecode unit 490 decodes each instruction of an instruction set in parallel. The resulting identification of instructions, instruction functions, register references and function requirements are made available on the outputs of the EDecode unit 490. This information is regenerated and latched by the EDecode unit 490 during each half processor cycle until all instructions in the instruction set are retired. Thus, information regarding all eight pending instructions is constantly maintained at the output of the EDecode unit 490. This information is presented in the form of eight element bit vectors where the bits or sub-fields of each vector logically correspond to the physical location of the corresponding instruction within the two pending instruction sets.
  • Eight vectors are provided via the control lines 502 to the carry checker 492, where each vector specifies whether the corresponding instruction affects or is dependent on the carry bit of the processor status word.
  • Eight vectors are provided via the control lines 510 to identify the specific nature of each instruction and the function unit requirements.
  • Eight vectors are provided via the control lines 506 specifying the register references used by each of the eight pending instructions. These vectors are provided prior to the end of processor cycle one.
  • 2) Carry Checker Unit Detail:
  • The carry checker unit 492 operates in parallel with the dependency check unit 494 during the data dependency phase of operation shown in Figure 14.
  • The carry checker unit 492 is implemented in the preferred architecture 100 as pure combinatorial logic. Thus, during each iteration of operation by the carry checker unit 492, all eight instructions are considered with respect to whether they modify the carry flag of the processor state register. This is necessary in order to allow the out-of-order execution of instructions that depend on the state of the carry bit as set by prior instructions. Control signals provided on the control lines 504 allow the carry checker unit 492 to identify the specific instructions that are dependent on the execution of prior instructions with respect to the carry flag.
  • The carry checker unit 492 maintains a temporary copy of the carry bit for each of the eight pending instructions. For those instructions that do not modify the carry bit, the carry checker unit 492 propagates the carry bit to the next instruction forward in the order of the program instruction stream.
  • An out-of-order executed instruction that modifies the carry bit can be executed and, further, a subsequent instruction that is dependent on such an out-of-order executed instruction may also be allowed to execute, though subsequent to the instruction that modifies the carry bit.
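
The forward propagation of per-instruction carry copies can be sketched as follows. This is a simplified model, not the actual logic of the carry checker unit 492: the function name and the use of `None` for a carry value that is not yet known are assumptions made for illustration.

```python
def carry_inputs(psr_carry, modifies, results):
    """Compute the carry value seen by each pending instruction.

    psr_carry: carry bit of the committed processor status word.
    modifies[i]: True if instruction i modifies the carry bit.
    results[i]: carry produced by instruction i once it has executed,
                otherwise None.
    Returns one carry input per instruction; None means the value is
    still unknown because the producing instruction has not executed.
    """
    inputs = []
    current = psr_carry
    for mod, res in zip(modifies, results):
        inputs.append(current)
        if mod:
            current = res  # stays None until the producer has executed
        # instructions that do not modify carry pass it along unchanged
    return inputs
```

In this model an instruction that depends on the carry bit becomes eligible only once its carry input is no longer `None`, which preserves the ordering constraint described above even under out-of-order execution.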
  • The data dependency checker unit 494 receives the eight register reference identification vectors from the EDecode unit 490 via the control lines 506.
  • Each register reference is indicated by a five bit value, suitable for identifying any one of 32 registers at a time, and a two bit value that identifies the register bank as located within the "A", "B" or boolean register sets.
  • The floating point register set is equivalently identified as the "B" register set.
  • Each instruction may have up to three register reference fields: two source register fields and one destination.
  • An instruction bit field recognized by the EDecode unit 490 may signify that no actual output data is to be produced. Rather, execution of the instruction is only for the purpose of determining an alteration of the value of the processor status register.
  • The data dependency checker 494, implemented again as pure combinatorial logic in the preferred architecture 100, operates to simultaneously determine dependencies between source register references of instructions subsequent in the program instruction stream and destination register references of relatively prior instructions.
  • A bit array is produced by the data dependency checker 494 that identifies not only which instructions are dependent on others, but also the registers upon which each dependency arises.
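
The dependency check can be sketched as a comparison of every source reference against the destination references of earlier instructions. This is an illustrative model only: the representation of a register reference as a `(bank, index)` tuple and of an instruction as a `(sources, destination)` pair is an assumption, and sets stand in for the bit array.

```python
def dependency_matrix(instructions):
    """instructions: list of (sources, destination) in program order.

    sources is a tuple of (bank, index) register references and
    destination is a (bank, index) reference or None.
    Returns deps where deps[i][j] is the set of registers through
    which instruction i depends on the earlier instruction j.
    """
    n = len(instructions)
    deps = [[set() for _ in range(n)] for _ in range(n)]
    for i, (sources, _) in enumerate(instructions):
        for j in range(i):  # only relatively prior instructions
            dest = instructions[j][1]
            if dest is not None and dest in sources:
                deps[i][j].add(dest)
    return deps
```

Each entry depends only on one source/destination pairing, so every entry can be evaluated independently of the others, in keeping with a purely combinatorial implementation.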
  • The register rename unit 496 receives the identification of the register references of all eight pending instructions via the control lines 506, and register dependencies via the control lines 508. A matrix of eight elements is also received via the control lines 542 that identifies those instructions within the current set of pending instructions that have been executed (done). From this information, the register rename unit 496 provides an eight element array of control signals to the instruction issuer unit 498 via the control lines 512.
  • The control information so provided reflects the determination made by the register rename unit 496 as to which of the currently pending instructions, that have not already been executed, are now available to be executed given the current set of identified data dependencies.
  • The register rename unit 496 receives a selection control signal via the lines 516 that identifies up to six instructions that are to be simultaneously issued for execution: two integer, two floating point and two boolean.
  • The register rename unit 496 performs the additional function of selecting, via control signals provided on the bus 518 to the register file array 472, the source registers for access in the execution of the identified instructions.
  • Destination registers for out-of-order executed instructions are selected as being in the temporary buffers 612, 680, 728 of the corresponding data path.
  • In-order executed instructions are retired on completion with result data being stored through to the register files 614, 684, 732.
  • The selection of source registers depends on whether the register has been prior selected as a destination and the corresponding prior instruction has not yet been retired. In such an instance, the source register is selected from the corresponding temporary buffer 612, 680, 728. Where the prior instruction has been retired, then the register of the corresponding register file 614, 684, 732 is selected. Consequently, the register rename unit 496 operates to effectively substitute temporary buffer register references for register file register references in the case of out-of-order executed instructions.
  • The temporary buffers 612, 680, 728 are not duplicate register structures of their corresponding register file arrays. Rather, a single destination register slot is provided for each of eight pending instructions.
  • The substitution of a temporary buffer destination register reference is determined by the location of the corresponding instruction within the pending register sets.
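
The source selection rule described above can be sketched as a search over earlier pending slots. This is a minimal model under assumptions: the function name, the `("temp", slot)` and `("file", register)` return values, and the choice of the most recent earlier writer when several slots target the same register are illustrative and not taken from the text.

```python
def select_source(reg, slot, dests, retired):
    """Decide where the instruction in `slot` should read `reg` from.

    dests[k]: destination register of pending slot k, or None.
    retired[k]: True once the instruction in slot k has been retired.
    Returns ("temp", k) to read the temporary buffer entry of slot k,
    or ("file", reg) to read the register file.
    """
    for prior in range(slot - 1, -1, -1):  # nearest earlier writer first
        if dests[prior] == reg and not retired[prior]:
            return ("temp", prior)
    return ("file", reg)
```

Because each pending instruction owns exactly one temporary buffer slot, the slot position alone identifies the substituted destination, so no separate mapping table is needed in this model.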
  • The instruction issuer unit 498 determines the set of instructions that can be issued, based on the output of the register rename unit 496 and the function requirements of the instructions as identified by the EDecode unit 490. The instruction issuer unit 498 makes this determination based on the status of each of the functional units 478₀₋ₙ as reported via control lines 514. Thus, the instruction issuer unit 498 begins operation upon receipt of the available set of instructions to issue from the register rename unit 496.
  • The instruction issuer unit 498 anticipates the availability of functional units 478₀₋ₙ that may be currently executing an instruction. In order to minimize the delay in identifying the instructions to be issued to the register rename unit 496, the instruction issuer unit 498 is implemented in dedicated combinatorial logic. Upon identification of the instructions to issue, the register rename unit 496 initiates a register file access that continues to the end of the third processor cycle, P₂.
  • The instruction issuer unit 498 initiates operation by one or more of the functional units 478₀₋ₙ, such as shown as "Execute 0", to receive and process source data provided from the register file array 472.
  • Some instructions require multiple processor cycles to complete, such as shown as "Execute 1", a simultaneously issued instruction.
  • The Execute 0 and Execute 1 instructions may, for example, be executed by an ALU and a floating point multiplier functional unit, respectively.
  • The ALU functional unit, as shown in Figure 14, produces output data within one processor cycle; by simple provision of output latching, that data is available for use in executing another instruction during the fifth processor cycle, P₄.
  • The floating point multiply functional unit is preferably an internally pipelined functional unit. Therefore, an additional floating point multiply instruction can be issued in the next processor cycle. However, the result of the first instruction will not be available for a data dependent number of processor cycles; the instruction shown in Figure 14 requires three processor cycles to complete processing through the functional unit. During each processor cycle, the function of the instruction issuer unit 498 is repeated. Consequently, the status of the current set of pending instructions as well as the availability state of the full set of functional units 478₀₋ₙ are reevaluated during each processor cycle.
  • The preferred architecture 100 is therefore capable of executing up to six instructions per processor cycle.
  • A typical instruction mix will result in an overall average execution of 1.5 to 2.0 instructions per processor cycle.
  • A final consideration in the function of the instruction issuer 498 is its participation in the handling of trap conditions and the execution of specific instructions.
  • The occurrence of a trap condition requires that the IEU 104 be cleared of all instructions that have not yet been retired. Such a circumstance may arise in response to an externally received interrupt that is relayed to the IEU 104 via the interrupt request/acknowledge control line 340, from any of the functional units 478₀₋ₙ in response to an arithmetic fault, or, for example, the EDecode unit 490 upon the decoding of an illegal instruction.
  • The instruction issuer unit 498 is responsible for halting or voiding all un-retired instructions currently pending in the IEU 104. All instructions that cannot be retired simultaneously will be voided. This result is essential to maintain the preciseness of the occurrence of the interrupt with respect to the conventional in-order execution of a program instruction stream.
  • The instruction issuer 498 is responsible for ensuring that all instructions which can alter the PSR (such as special move and return from trap) are executed strictly in-order.
  • Instructions of this type include subroutine returns, returns from procedural instructions, and returns from traps.
  • The instruction issuer unit 498 provides identifying control signals via the IEU return control lines 350 to the IFU 102.
  • The done control unit 540 monitors the functional units 478₀₋ₙ for the completion status of their current operations.
  • The done control unit 540 anticipates the completion of operations by each functional unit sufficient to provide a completion vector, reflecting the status of the execution of each instruction in the currently pending set of instructions, to the register rename unit 496, bypass control unit 520 and retirement control unit 500 approximately one half processor cycle prior to the execution completion of an instruction by a functional unit 478₀₋ₙ.
  • This allows the instruction issuer unit 498, via the register rename unit 496, to consider the instruction completing functional units as available resources for the next instruction issuing cycle.
  • The bypass control unit 520 is allowed to prepare to bypass data output by the functional unit through the bypass unit 474.
  • The retirement control unit 500 may operate to retire the corresponding instruction simultaneously with the transfer of data from the functional unit 478₀₋ₙ to the register file array 472.
  • The retirement control unit 500 monitors the oldest instruction set output from the EDecode unit 490. As each instruction in instruction stream order is marked done by the done control unit 540, the retirement control unit 500 directs, via control signals provided on control lines 534, the transfer of data from the temporary buffer slot to the corresponding instruction specified register file register location within the register file array 472.
  • The PC Inc/Size control signals are provided on the control lines 344 for each one or more instructions simultaneously retired. Up to four instructions may be retired per processor cycle. Whenever an entire instruction set has been retired, an IFIFO read control signal is provided on the control line 342 to advance the IFIFO 264.
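
In-order retirement with a per-cycle limit can be sketched as below. This is a simplified model: the function name, the list-based `done` and `retired` flags, and the treatment of the eight pending slots as a single program-ordered list are assumptions for illustration.

```python
def retire_cycle(done, retired, max_per_cycle=4):
    """Retire completed instructions in program order for one cycle.

    done[k]: True once the instruction in slot k has finished executing.
    retired[k]: updated in place; True once slot k has been retired.
    Retirement stops at the first unfinished instruction or once
    max_per_cycle instructions have been retired. Returns the count.
    """
    count = 0
    for slot in range(len(done)):
        if retired[slot]:
            continue
        if not done[slot] or count == max_per_cycle:
            break
        retired[slot] = True
        count += 1
    return count
```

An instruction that completes out of order simply waits in its temporary buffer slot until every earlier instruction has been retired, which is what keeps the register file state precise.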
  • The control flow control unit 528 operates to continuously provide the IFU 102 with information specifying whether any control flow instructions within the current set of pending instructions have been resolved and, further, whether the branch result is taken or not taken.
  • The control flow control unit 528 obtains, via control lines 510, an identification of the control flow branch instructions by the EDecode unit 490.
  • The current set of register dependencies is provided via control lines 536 from the data dependency checker unit 494 to the control flow control unit 528 to allow the control flow control unit 528 to determine whether the outcome of a branch instruction is constrained by dependencies or is now known.
  • The register references provided via bus 518 from the register rename unit 496 are monitored by the control flow control unit 528 to identify the boolean register that will define the branch decision. Thus, the branch decision may be determined even prior to the out-of-order execution of the control flow instruction.
  • The bypass unit 474 is directed by the bypass control unit 520 to provide the control flow results onto control lines 530, consisting of the control flow zero and control flow one control lines 750, 752, to the control flow control unit 528.
  • The control flow control unit 528 continuously provides two vectors of eight bits each to the IFU 102 via control lines 348. These vectors define whether a branch instruction at the logical location corresponding to the bits within the vectors has been resolved and whether the branch result is taken or not taken.
  • The control flow control unit 528 is implemented as pure combinatorial logic operating continuously in response to the input control signals to the control unit 528.
  • The instruction issuer unit 498 operates closely in conjunction with the bypass control unit 520 to control the routing of data between the register file array 472 and the functional units 478₀₋ₙ.
  • The bypass control unit 520 operates in conjunction with the register file access, output and store phases of operation shown in Figure 14.
  • The bypass control unit 520 may recognize, via control lines 522, an access of a destination register within the register file array 472 that is in the process of being written during the output phase of execution of an instruction.
  • The bypass control unit 520 directs the selection of data provided on the functional unit output bus 482 to be bypassed back to the functional unit distribution bus 480. Control over the bypass control unit 520 is provided by the instruction issuer unit 498 via control lines 542.
  • An interface definition for the VMU 108 is provided in Figure 15.
  • The VMU 108 consists principally of a VMU control logic unit 800 and a content addressable memory (CAM) 802.
  • The general function of the VMU 108 is shown graphically in Figure 16.
  • A representation of a virtual address is shown partitioned into a space identifier (sID[31:28]), a virtual page number (VADDR[27:14]), page offset (PADDR[13:4]), and a request ID (rID[3:0]).
  • The 34 bit address operates as a content address tag used to identify a corresponding buffer register within the buffer 844.
  • An 18 bit wide register value is provided as the high order 18 bits of a physical address 846.
  • The page offset and request ID are provided as the low order 14 bits of the physical address 846.
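
The field partitioning and the assembly of the physical address can be sketched as follows. This is an illustration under assumptions: the table look aside buffer is modelled as a dictionary keyed by the (space ID, virtual page number) pair, which simplifies the 34 bit tag mentioned above, and the function names are hypothetical. The 18 bit register value and the 14 low order bits together form a 32 bit physical address.

```python
def split_virtual(vaddr):
    """Split a 32 bit virtual address into the fields named in the text."""
    return {
        "sid": (vaddr >> 28) & 0xF,       # sID[31:28]
        "vpn": (vaddr >> 14) & 0x3FFF,    # VADDR[27:14]
        "offset": (vaddr >> 4) & 0x3FF,   # PADDR[13:4]
        "rid": vaddr & 0xF,               # rID[3:0]
    }

def translate(vaddr, tlb):
    """Return the physical address, or None to signal a miss.

    tlb maps (sid, vpn) to an 18 bit physical page value.
    """
    fields = split_virtual(vaddr)
    page = tlb.get((fields["sid"], fields["vpn"]))
    if page is None:
        return None
    low14 = vaddr & 0x3FFF                # page offset and request ID
    return ((page & 0x3FFFF) << 14) | low14
```

The low order 14 bits pass through unchanged, so only the upper fields take part in the lookup.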
  • A VMU miss is signalled. This requires the execution of a VMU fast trap handling routine that implements a conventional hash algorithm 848 that accesses a complete page table data structure maintained in the MAU 112.
  • This page table 850 contains entries for all memory pages currently in use by the architecture 100.
  • The hash algorithm 848 identifies those entries in the page table 850 necessary to satisfy the current virtual page translation operation.
  • Those page table entries are loaded from the MAU 112 to the trap registers of register set "A" and then transferred by special register move instructions to the table look aside buffer 844.
  • The instruction giving rise to the VMU miss exception is re-executed by the IEU 104.
  • The virtual to physical address translation operation should then complete without exception.
  • The VMU control logic 800 provides a dual interface to both the IFU 102 and IEU 104.
  • A ready signal is provided on control lines 822 to the IEU 104 to signify that the VMU 108 is available for an address translation.
  • The VMU 108 is always ready to accept IFU 102 translation requests.
  • Both the IFU and IEU 102, 104 may pose requests via control lines 328, 804.
  • The IFU 102 has priority access to the VMU 108. Consequently, only a single busy control line 820 is provided to the IEU 104.
  • Both the IFU and IEU 102, 104 provide the space ID and virtual page number fields to the VMU control logic 800 via control lines 326, 808, respectively.
  • The IEU 104 provides a read/write control signal via control line 806 to define whether the address is to be used for a load or store operation as necessary to modify memory access protection attributes of the virtual memory referenced.
  • The space ID and virtual page fields of the virtual address are passed to the CAM unit 802 to perform the actual translation operation.
  • The page offset and ExID fields are eventually provided by the IEU 104 directly to the CCU 106.
  • The physical page and request ID fields are provided on the address lines 836 to the CAM unit 802.
  • The occurrence of a table look aside buffer match is signalled via the hit line and control output lines 830 to the VMU control logic unit 800.
  • The VMU control logic unit 800 generates the virtual memory miss and virtual memory exception control signals on lines 334, 332 in response to the hit and control output control signals on lines 830.
  • A virtual memory translation miss is defined as failure to match a page table identifier in the table look aside buffer 844. All other translation errors are reported as virtual memory exceptions.
  • The data tables within the CAM unit 802 may be modified through the execution of special register to register move instructions by the IEU 104.
  • Read/write, register select, reset, load and clear control signals are provided by the IEU 104 via control lines 810, 812, 814, 816, 818.
  • The control on data interface for the CCU 106 is shown in Figure 17.
  • Separate interfaces are provided for the IFU 102 and IEU 104.
  • Logically separate interfaces are provided by the CCU 106 to the MCU 110 with respect to instruction and data transfers.
  • The IFU interface consists of the physical page address provided on address lines 324, the VMU converted page address as provided on the address lines 824, and request IDs as transferred separately on control lines 294, 296.
  • The read/busy and ready control signals are provided to the CCU 106 via control lines 298, 300, 302.
  • The request ExIDs are separately provided from and to the load/store unit of the IEU 104 via control lines 796.
  • An 80 bit wide bidirectional data bus is provided by the CCU 106 to the IEU 104.
  • Only the lower 64 bits are utilized by the IEU 104.
  • The availability and support within the CCU 106 of a full 80 bit data transfer bus is provided to support subsequent implementations of the architecture 100 that support, through modifications of the floating point data path 660, floating point operation in accordance with IEEE standard 754.
  • The IEU control interface established via request, busy, ready, read/write and width control signals 784 is substantially the same as the corresponding control signals utilized by the IFU 102.
  • The exception is the provision of a read/write control signal to differentiate between load and store operations.
  • The width control signals specify the number of bytes being transferred during each CCU 106 access by the IEU 104; in contrast, every access of the instruction cache 132 is a fixed 128 bit wide data fetch operation.
  • The CCU 106 implements a substantially conventional cache controller function with respect to the separate instruction and data caches 132, 134.
  • The instruction cache 132 is a high speed memory providing for the storage of 256 128 bit wide instruction sets.
  • The data cache 134 provides for the storage of 1024 32 bit wide words of data. Instruction and data requests that cannot be immediately satisfied from the contents of the instruction and data caches 132, 134 are passed on to the MCU 110.
  • Instruction cache misses
  • The 28 bit wide physical address is provided to the MCU 110 via the address bus 860.
  • The request ID and additional control signals for coordinating the operation of the CCU 106 and MCU 110 are provided on control lines 862.
  • Once the MCU 110 has coordinated the necessary read access of the MAU 112, two consecutive 64 bit wide data transfers are performed directly from the MAU 112 through to the instruction cache 132. Two transfers are required given that the data bus 136 is, in the preferred architecture 100, a 64 bit wide bus.
  • As the requested data is returned through the MCU 110, the request ID maintained during the pendency of the request operation is also returned to the CCU 106 via the control lines 862.
  • Data transfer operations between the data cache 134 and MCU 110 are substantially the same as instruction cache operations. Since data load and store operations may reference a single byte, a full 32 bit wide physical address is provided to the MCU 110 via the address bus 864. Interface control signals and the request ExID are transferred via control lines 866. Bidirectional 64 bit wide data transfers are provided via the data cache bus 138.
  • A high-performance RISC based microprocessor architecture has been disclosed.
  • The architecture efficiently implements out-of-order execution of instructions, separate main and target instruction stream prefetch instruction transfer paths, and a procedural instruction recognition and dedicated prefetch path.
  • The optimized instruction execution unit provides multiple optimized data processing paths supporting integer, floating point and boolean operations and incorporates respective temporary register files facilitating out-of-order execution and instruction cancellation while maintaining a readily established precise state-of-the-machine status.
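
The translation miss handling summarized in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed hardware: the dictionary-based `tlb` and `page_table`, the `writable` attribute, and the function names are all hypothetical. The sketch separates a miss (no matching entry in the look aside buffer) from an exception (any other translation error, modeled here only as a store to a read-only page) and models the refill followed by re-execution.

```python
# Hypothetical model of the VMU translation outcome; all names are illustrative.

def translate(tlb, space_id, virtual_page, is_write):
    """Return (status, physical_page) for one translation request."""
    entry = tlb.get((space_id, virtual_page))
    if entry is None:
        # No matching entry in the look aside buffer: a translation miss.
        return ("miss", None)
    if is_write and not entry["writable"]:
        # Assumed example of "any other translation error": a protection violation.
        return ("exception", None)
    return ("hit", entry["physical_page"])


def translate_with_refill(tlb, page_table, space_id, virtual_page, is_write):
    """Model the miss handler: copy the entry into the TLB, then retry once."""
    status, page = translate(tlb, space_id, virtual_page, is_write)
    if status == "miss":
        key = (space_id, virtual_page)
        tlb[key] = page_table[key]  # stands in for the special register moves
        status, page = translate(tlb, space_id, virtual_page, is_write)
    return status, page


page_table = {(1, 0x40): {"physical_page": 0x7F, "writable": False}}
tlb = {}
first = translate(tlb, 1, 0x40, is_write=False)
retried = translate_with_refill(tlb, page_table, 1, 0x40, is_write=False)
store = translate(tlb, 1, 0x40, is_write=True)
```

In this sketch the first lookup reports a miss, the retried lookup hits after the refill, and the store to the read-only page is reported as an exception rather than a miss.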

Abstract

Fast trap mechanism for a microprocessor, wherein a vector trap table is maintained which contains space for a plurality of instructions in each table entry. When a fast trap occurs, control is transferred directly into the table entry corresponding to the trap number. The trap handler can be located completely inside the table entry, or it can transfer control to additional handler code.

Description

DESCRIPTION
RISC MICROPROCESSOR ARCHITECTURE IMPLEMENTING FAST TRAP AND EXCEPTION STATE
CROSS-REFERENCE TO RELATED APPLICATIONS
This Application is related to the following applications, all of which are assigned to the assignee of the present application, and all of which are incorporated herein by reference:
1. HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE, invented by Le T. Nguyen et al, SMOS 7984 MCF/GBR, Application Serial Number 07/727,006 filed 08 July 1991;
2. EXTENSIBLE RISC MICROPROCESSOR ARCHITECTURE, invented by Le T. Nguyen et al, SMOS 7985 MCF/GBR, Application Serial Number 07/727,058 filed 08 July 1991;
3. RISC MICROPROCESSOR ARCHITECTURE WITH ISOLATED ARCHITECTURAL DEPENDENCIES, invented by Le T. Nguyen et al, SMOS 7987 MCF/GBR, Application Serial Number 07/726,744 filed 08 July 1991;
4. RISC MICROPROCESSOR ARCHITECTURE IMPLEMENTING MULTIPLE TYPED REGISTER SETS, invented by Sanjiv Garg et al, SMOS 7988 MCF/GBR/RCC, Application Serial Number 07/726,773 filed 08 July 1991;
5. SINGLE CHIP PAGE PRINTER CONTROLLER, invented by Derek J. Lentz et al, SMOS 7991 MCF/GBR, Application Serial Number 07/726,929 filed 08 July 1991;
6. MICROPROCESSOR ARCHITECTURE CAPABLE OF SUPPORTING HETEROGENEOUS PROCESSORS, invented by Derek J. Lentz et al, SMOS 7992 MCF/GBR/ MB, Application Serial Number 07/726,893 filed 08 July 1991.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to microprocessor architectures, and more particularly, to interrupt and exception handling in microprocessors.
2. Description of Related Art
In a typical microprocessor, instructions are generally executed in sequence unless a control flow varying instruction is encountered or an exception occurs. With respect to exceptions, facilities are included for changing the control flow upon the occurrence of particular events which may or may not be related to particular instructions in the instruction stream. For example, a microprocessor may include an interrupt request (IRQ) lead which, when activated by an external device, causes the microprocessor to save certain information relating to the current state of the machine, including an indication of the address of the next instruction to be executed, and then immediately transfer control to an interrupt handler which begins at some predetermined address. As another example, if an execution error such as divide-by-zero occurs during the execution of a particular instruction, the microprocessor may also save information related to the current state of the machine and transfer control to an exception handler. As yet another example, some microprocessors include a "software trap" instruction in their instruction set, which also causes the microprocessor to save information concerning the state of the machine and transfer control to an exception handler. As used herein, the terms interrupt, trap, fault and exception are used interchangeably.
In some microprocessors, an externally generated interrupt always causes the microprocessor to transfer control to the same interrupt handler entry point. If several external devices are present and able to activate the interrupt request lead, the interrupt handler must first determine which device caused the interrupt and then transfer control to a portion of code to handle that particular device. For example, the Intel 8048 microcontroller includes an INT input which, when activated, causes the microcontroller to transfer control to absolute memory location 3. The 8048 also includes a RESET input which, when activated, causes the microcontroller to transfer control to absolute memory location 0. It also includes an internal timer/counter which can generate interrupts which cause a transfer of control to absolute memory location 7. Other microprocessors include "interrupt level" leads in addition to the interrupt request lead. For these microprocessors, when an external device activates the interrupt request lead, it also places a trap number, unique to that particular device, on the interrupt level lines. The internal hardware of the microprocessor then transfers control, or "vectors", to any of several interrupt handlers, each corresponding to a different trap number. Similarly, some microprocessors have only a single predetermined entry point for all routines written to handle internally generated exceptions, and others have facilities for vectoring automatically to a routine dependent upon a trap number defined for each particular type of internal exception that might occur.
In the past, where interrupt and exception handlers were vectored, a number of different techniques were used to determine the entry point of the appropriate handler. In one technique, a table of addresses was created, beginning at a particular table base address which was either fixed or definable by the user. Each entry in the table was the same length as the length of an address, for example two or four bytes long, and contained the entry point for a corresponding trap number. When an interrupt or exception occurred, the microprocessor first determined the base address of the table, then added m times the trap number (where m is the number of bytes in each entry), and then loaded the information stored at the resulting address into the program counter (PC) to thereby transfer control to the routine beginning at the address specified in the table entry.
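The address-table technique just described can be illustrated with a short Python sketch. It is only a model under stated assumptions: the byte-array memory, the four-byte big-endian entry format, and the function name are hypothetical and are not taken from any particular prior art processor. The point it shows is the extra memory read needed before the program counter can be loaded.

```python
# Hypothetical sketch of the address-table technique; names are illustrative.
ENTRY_BYTES = 4  # m: each table entry holds one 4-byte handler address


def dispatch_via_address_table(memory, table_base, trap_number):
    """Return the new program counter for a trap using a table of addresses."""
    entry_address = table_base + ENTRY_BYTES * trap_number
    # Extra memory read: the entry holds the handler's address, not its code.
    raw = bytes(memory[entry_address:entry_address + ENTRY_BYTES])
    return int.from_bytes(raw, "big")


memory = bytearray(0x200)
table_base = 0x100
# Install a handler entry point of 0x1F40 for trap number 3.
memory[table_base + 12:table_base + 16] = (0x1F40).to_bytes(4, "big")
pc = dispatch_via_address_table(memory, table_base, 3)
```

Here the entry for trap number 3 lies at the base address plus 12, and the handler cannot start until the address stored there has been fetched.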
In other microprocessors, an entire branch instruction was stored in each entry in the table, instead of merely the address of a handler. The number of bytes in each entry was equal to the number of bytes in a branch instruction. When an interrupt or exception was received, the microprocessor would first determine the table base address, add m times the trap number, and simply load the result into the program counter. The first instruction then executed would be the branch instruction in the table, and control would finally transfer to the appropriate exception handler.
In both of the above techniques for vectoring to a handler, a delay is encountered because a preliminary operation must be performed before the operational part of the handler can begin execution. In the first above-mentioned technique, the entry point address first had to be retrieved from the table before it could be loaded into the program counter. In the second above-described technique, an entire preliminary branch instruction had to be retrieved and executed before the substantive part of the handler could begin executing. Adder delays could be eliminated in the calculation of the table base address plus m times the trap number, by merely concatenating high-order bits from the base address with the trap number itself as lower-order bits, followed by log2 m zero bits, but the delays caused by the preliminary operations just described remained. Such delays can be detrimental in a system where the response time to handle certain types of interrupts is critical.
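The reason concatenation can replace the adder is that, when the table base address has zeros in all of the bit positions occupied by the trap number and the log2 m zero bits, adding and concatenating give the same result. The following Python sketch checks this under assumed values (an 8-bit trap number and m = 4); the function names and parameters are illustrative only.

```python
# Hypothetical comparison of the two ways to form a table entry address.

def entry_address_by_add(base, trap_number, m):
    """Base address plus m times the trap number (requires an adder)."""
    return base + m * trap_number


def entry_address_by_concat(base, trap_number, m, trap_bits):
    """High-order base bits, then the trap number, then log2(m) zero bits."""
    shift = m.bit_length() - 1          # log2(m)
    assert m == 1 << shift, "m must be a power of two"
    assert 0 <= trap_number < (1 << trap_bits)
    low_width = trap_bits + shift       # bits supplied by trap number and zeros
    high = base >> low_width            # high-order bits kept from the base
    return (high << low_width) | (trap_number << shift)


base = 0x4000                           # low 10 bits are zero, as required
added = entry_address_by_add(base, 5, 4)
joined = entry_address_by_concat(base, 5, 4, trap_bits=8)
```

With these values both functions yield 0x4014. If the base were not aligned in this way, the concatenated form would silently discard its low-order bits, which is why the alignment is a precondition of the technique.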
Another problem related to exception handling in prior art microprocessors concerns the amount of information which must be stored to be able to reinstate the "state of the machine" if and when the trap handler returns to the main instruction flow. A tradeoff exists between the desire to store as much information as possible, and the desire to minimize the delay in dispatching to a trap handler. With respect to on-chip data registers in particular, one technique that has been used is to store none of the on-chip data registers, leaving it up to the handler to temporarily store the data in each register before it can use the register for its own purposes. The handler then had to replace the data in the register before returning. The need to store and restore these registers can slow the operation of the handler significantly. In another technique, the hardware automatically stores the contents of the registers on a stack before transferring control to the handler. This technique is also inadequate since it increases hardware complexity, and also can delay transfer to the handler significantly. Thus, with the vectoring techniques described above, the delays caused by existing techniques for protecting the contents of registers when a trap handler is invoked can be unacceptable in a high performance microprocessor.
SUMMARY OF THE INVENTION
According to the invention, a microprocessor architecture is employed which alleviates many of the above deficiencies in prior art systems. In particular, a "fast trap" exception dispatching technique is employed by which an entire handler can be stored in a single vector address table entry. Each table entry has enough space for at least two instructions, and preferably significantly more, so that when a fast trap occurs, the microprocessor need only branch to an address determined by concatenating m times the trap number to a base address. The delay required to fetch an entry point address from the table, or to fetch and execute a preliminary branch instruction is eliminated. The microprocessor may also include other, less time efficient, vectoring techniques for less critical types of traps.
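A minimal Python sketch of fast trap dispatch follows. The entry size of eight instructions, the table size of eight traps, and the mnemonics (`read_status`, `clear_request`, `rtrap`) are assumptions chosen for illustration; the 32-bit instruction width is taken from the detailed description. The sketch shows that the first instruction fetched after the trap is already part of the handler.

```python
# Hypothetical sketch of fast trap dispatch; sizes and mnemonics are assumptions.
INSTRUCTION_BYTES = 4        # instructions are 32 bits wide
INSTRUCTIONS_PER_ENTRY = 8   # assumed; the text requires only "at least two"
ENTRY_BYTES = INSTRUCTION_BYTES * INSTRUCTIONS_PER_ENTRY   # m = 32
ENTRY_SHIFT = ENTRY_BYTES.bit_length() - 1                 # log2(m) = 5
NUM_TRAPS = 8                # assumed table size
TABLE_BYTES = NUM_TRAPS * ENTRY_BYTES


def fast_trap_entry(table_base, trap_number):
    """Form the entry address by concatenation; no table read is needed."""
    assert table_base % TABLE_BYTES == 0, "base must be aligned to the table size"
    assert 0 <= trap_number < NUM_TRAPS
    return table_base | (trap_number << ENTRY_SHIFT)


def run_handler(program, pc):
    """Execute from pc until the (hypothetical) return-from-trap mnemonic."""
    executed = []
    while True:
        op = program[pc]
        executed.append(op)
        if op == "rtrap":
            return executed
        pc += INSTRUCTION_BYTES


TABLE_BASE = 0x1000
entry = fast_trap_entry(TABLE_BASE, 2)
# The whole handler for trap 2 lives inside its table entry.
program = {entry: "read_status", entry + 4: "clear_request", entry + 8: "rtrap"}
trace = run_handler(program, entry)
```

A handler longer than one entry would, as the abstract notes, end its entry by transferring control to additional handler code; that case is not modeled here.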
In another aspect of the invention, when a trap is encountered, the processor enters an interrupted state which automatically shifts a number of shadow registers to the foreground and shifts a corresponding set of foreground registers into the background. Register contents are not transferred; rather, the shadow registers are simply made available in place of the normal registers. Thus the handler has a set of registers immediately available for use without any need to be concerned about destroying data needed for the main instruction stream. The above-mentioned HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE application describes an advanced microprocessor which prefetches instructions prior to the time they are executed, can handle out-of-order return of instruction prefetch requests, can execute more than one instruction during the same execution time, and can also execute instructions out of order relative to their sequence in the instruction stream. Another aspect of the present invention includes a mechanism to maintain the preciseness of synchronous exceptions which occur relative to instructions prior to and during the time they are executed.
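The shadow register behavior described above can be modeled in a few lines of Python. This is a sketch under assumptions: the class name, the bank size of four registers, and the method names are hypothetical. What it illustrates is that entering the interrupted state changes only which bank a register number selects; no register contents are copied.

```python
# Hypothetical model of shadow registers; names and sizes are illustrative.

class ShadowedRegisterFile:
    def __init__(self, count):
        self.normal = [0] * count   # foreground bank in normal operation
        self.shadow = [0] * count   # foreground bank in the interrupted state
        self.interrupted = False

    def _bank(self):
        # The state bit selects the bank; no data moves between banks.
        return self.shadow if self.interrupted else self.normal

    def read(self, index):
        return self._bank()[index]

    def write(self, index, value):
        self._bank()[index] = value

    def enter_trap(self):
        self.interrupted = True

    def return_from_trap(self):
        self.interrupted = False


regs = ShadowedRegisterFile(count=4)
regs.write(0, 111)            # main instruction stream value
regs.enter_trap()
regs.write(0, 999)            # handler uses the same register number freely
handler_view = regs.read(0)
regs.return_from_trap()
main_view = regs.read(0)
```

The handler sees 999 in register 0 while the main stream value 111 is preserved and reappears on return, without any save or restore instructions.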
The microprocessor architecture described in that application further includes facilities for handling a separate procedural instruction flow called via a procedural, or emulation, instruction in the main instruction flow. The transfer of control to a procedural instruction flow is accomplished without flushing any instructions already prefetched in the main instruction flow, by having a separate emulation instruction prefetch queue. According to another aspect of the invention, the interrupted state remains available whether the processor is executing from the main instruction stream or a procedural instruction stream, and the processor maintains an indication of which instruction stream to return to upon a return from trap. Further, separate prefetch program counters are maintained for the main and emulation instruction streams, and the processor stores only the prefetch PC from the current instruction stream when a trap handler is invoked, and restores it to the proper prefetch program counter when the handler returns.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other advantages and features of the present invention will become better understood upon consideration of the following detailed description of the invention when considered in connection with the accompanying drawings, in which like reference numerals designate like parts throughout the figures thereof, and wherein:
Figure 1 is a simplified block diagram of the preferred microprocessor architecture implementing the present invention;
Figure 2 is a detailed block diagram of the instruction fetch unit constructed in accordance with the present invention;
Figure 3 is a block diagram of the program counter logic unit constructed in accordance with the present invention;
Figure 4 is a further detailed block diagram of the program counter data and control path logic;
Figure 5 is a simplified block diagram of the instruction execution unit of the present invention;
Figure 6a is a simplified block diagram of the register file architecture utilized in a preferred embodiment of the present invention;
Figure 6b is a graphic illustration of the storage register format of the temporary buffer register file as utilized in a preferred embodiment of the present invention;
Figure 6c is a graphic illustration of the primary and secondary instruction sets as present in the last two stages of the instruction FIFO unit of the present invention;
Figures 7a-c provide a graphic illustration of the reconfigurable states of the primary integer register set as provided in accordance with a preferred embodiment of the present invention;
Figure 8 is a graphic illustration of a reconfigurable floating point and secondary integer register set as provided in accordance with the preferred embodiment of the present invention;
Figure 9 is a graphic illustration of a tertiary boolean register set as provided in a preferred embodiment of the present invention;
Figure 10 is a detailed block diagram of the primary integer processing data path portion of the instruction execution unit constructed in accordance with the preferred embodiment of the present invention;
Figure 11 is a detailed block diagram of the primary floating point data path portion of the instruction execution unit constructed in accordance with a preferred embodiment of the present invention;
Figure 12 is a detailed block diagram of the boolean operation data path portion of the instruction execution unit as constructed in accordance with the preferred embodiment of the present invention;
Figure 13 is a detailed block diagram of a load/store unit constructed in accordance with the preferred embodiment of the present invention;
Figure 14 is a timing diagram illustrating the preferred sequence of operation of a preferred embodiment of the present invention in executing multiple instructions in accordance with the present invention;
Figure 15 is a simplified block diagram of the virtual memory control unit as constructed in accordance with the preferred embodiment of the present invention;
Figure 16 is a graphic representation of the virtual memory control algorithm as utilized in a preferred embodiment of the present invention; and
Figure 17 is a simplified block diagram of the cache control unit as utilized in a preferred embodiment of the present invention.
DETAILED DESCRIPTION
I. Microprocessor Architectural Overview
II. Instruction Fetch Unit
A) IFU Data Path
B) IFU Control Path
C) IFU/IEU Control Interface
D) PC Logic Unit Detail
1) PF and ExPC Control/Data Unit Detail
2) PC Control Algorithm Detail
E) Interrupt and Exception Handling
1) Overview
2) Asynchronous Interrupts
3) Synchronous Exceptions
4) Handler Dispatch and Return
5) Nesting
6) List of Traps
III. Instruction Execution Unit
A) IEU Data Path Detail
1) Register File Detail
2) Integer Data Path Detail
3) Floating Point Data Path Detail
4) Boolean Register Data Path Detail
B) Load/Store Control Unit
C) IEU Control Path Detail
1) EDecode Unit Detail
2) Carry Checker Unit Detail
3) Data Dependency Checker Unit Detail
4) Register Rename Unit Detail
5) Instruction Issuer Unit Detail
6) Done Control Unit Detail
7) Retirement Control Unit Detail
8) Control Flow Control Unit Detail
9) Bypass Control Unit Detail
IV. Virtual Memory Control Unit
V. Cache Control Unit
VI. Summary/Conclusion
I. Microprocessor Architectural Overview:
The architecture 100 of the present invention is generally shown in Figure 1. An Instruction Fetch Unit (IFU) 102 and an Instruction Execution Unit (IEU) 104 are the principal operative elements of the architecture 100. A Virtual Memory Unit (VMU) 108, Cache Control Unit (CCU) 106, and Memory Control Unit (MCU) 110 are provided to directly support the function of the IFU 102 and IEU 104. A Memory Array Unit (MAU) 112 is also provided as a generally essential element for the operation of the architecture 100, though the MAU 112 does not directly exist as an integral component of the architecture 100. That is, in the preferred embodiments of the present invention, the IFU 102, IEU 104, VMU 108, CCU 106, and MCU 110 are fabricated on a single silicon die utilizing a conventional 0.8 micron design rule low-power CMOS process and comprising some 1,200,000 transistors. The standard processor or system clock speed of the architecture 100 is 40 MHz. However, in accordance with a preferred embodiment of the present invention, the internal processor clock speed is 160 MHz.
The IFU 102 is primarily responsible for the fetching of instructions, the buffering of instructions pending execution by the IEU 104, and, generally, the calculation of the next virtual address to be used for the fetching of next instructions.
In the preferred embodiments of the present invention, instructions are each fixed at a length of 32 bits. Instruction sets, or "buckets" of four instructions, are fetched by the IFU 102 simultaneously from an instruction cache 132 within the CCU 106 via a 128 bit wide instruction bus 114. The transfer of instruction sets is coordinated between the IFU 102 and CCU 106 by control signals provided via a control bus 116. The virtual address of an instruction set to be fetched is provided by the IFU 102 via an IFU combined arbitration, control and address bus 118 onto a shared arbitration, control and address bus 120 further coupled between the IEU 104 and VMU 108. Arbitration for access to the VMU 108 arises from the fact that both the IFU 102 and IEU 104 utilize the VMU 108 as a common, shared resource. In the preferred embodiment of the architecture 100, the low order bits defining an address within a physical page of the virtual address are transferred directly by the IFU 102 to the Cache Control Unit 106 via the control lines 116. The virtualizing, high order bits of the virtual address supplied by the IFU 102 are provided by the address portion of the buses 118, 120 to the VMU 108 for translation into a corresponding physical page address. For the IFU 102, this physical page address is transferred directly from the VMU 108 to the Cache Control Unit 106 via the address control lines 122 one-half internal processor cycle after the translation request is placed with the VMU 108.
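The split of a virtual address between the two paths can be sketched as follows. The 12-bit page offset is an assumption made only for this example, since the page size is not stated in this passage; the dictionary standing in for the VMU and the function names are likewise hypothetical. The low order bits bypass translation and the high order bits are replaced by the physical page address.

```python
# Hypothetical sketch of the address split; the page size is an assumption.
PAGE_OFFSET_BITS = 12                       # assumed; not stated in this passage
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1


def split_virtual_address(virtual_address):
    """Return (virtualizing high order bits, untranslated low order bits)."""
    return virtual_address >> PAGE_OFFSET_BITS, virtual_address & OFFSET_MASK


def fetch_address(virtual_address, translate):
    """Combine the translated page with the offset sent directly to the cache."""
    virtual_page, offset = split_virtual_address(virtual_address)
    physical_page = translate(virtual_page)  # the only part the VMU handles
    return (physical_page << PAGE_OFFSET_BITS) | offset


toy_vmu = {0x403: 0x7F}                      # virtual page -> physical page
parts = split_virtual_address(0x00403ABC)
physical = fetch_address(0x00403ABC, toy_vmu.__getitem__)
```

Because the offset does not depend on the translation, it can be presented to the cache while the VMU is still producing the physical page address, which is consistent with the half-cycle timing described above.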
The instruction stream fetched by the IFU 102 is, in turn, provided via an instruction stream bus 124 to the IEU 104. Control signals are exchanged between the IFU 102 and the IEU 104 via control lines 126. In addition, certain instruction fetch addresses, typically those requiring access to the register file present within the IEU 104, are provided back to the IFU via a target address return bus within the control lines 126.
The IEU 104 stores and retrieves data with respect to a data cache 134 provided within the CCU 106 via an 80-bit wide bi-directional data bus 130. The entire physical address for IEU data accesses is provided via an address portion of the control bus 128 to the CCU 106. The control bus 128 also provides for the exchange of control signals between the IEU 104 and CCU 106 for managing data transfers. The IEU 104 utilizes the VMU 108 as a resource for converting virtual data addresses into physical data addresses suitable for submission to the CCU 106. The virtualizing portion of the data address is provided via the arbitration, control and address bus 120 to the VMU 108. Unlike operation with respect to the IFU 102, the VMU 108 returns the corresponding physical address via the bus 120 to the IEU 104. In the preferred embodiments of the architecture 100, the IEU 104 requires the physical address for use in ensuring that load/store operations occur in proper program stream order.
The CCU 106 performs the generally conventional high-level function of determining whether physical address defined requests for data can be satisfied from the instruction and data caches 132, 134, as appropriate. Where the access request can be properly fulfilled by access to the instruction or data caches 132, 134, the CCU 106 coordinates and performs the data transfer via the data buses 114, 128.
Where a data access request cannot be satisfied from the instruction or data caches 132, 134, the CCU 106 provides the corresponding physical address to the MCU 110 along with sufficient control information to identify whether a read or write access of the MAU 112 is desired, the source or destination cache 132, 134 of the CCU 106 for each request, and additional identifying information to allow the request operation to be correlated with the ultimate data request as issued by the IFU 102 or IEU 104. The MCU 110 preferably includes a port switch unit 142 that is coupled by a uni-directional data bus 136 with the instruction cache 132 of the CCU 106 and a bi-directional data bus 138 to the data cache 134. The port switch 142 is, in essence, a large multiplexer allowing a physical address obtained from the control bus 140 to be routed to any one of a number of ports P0-PN 146 and the bi-directional transfer of data from the ports to the data buses 136, 138. Each memory access request processed by the MCU 110 is associated with one of the ports 146 for purposes of arbitrating for access to the main system memory bus 162 as required for an access of the MAU 112. Once a data transfer connection has been established, the MCU provides control information via the control bus 140 to the CCU 106 to initiate the transfer of data between either the instruction or data cache 132, 134 and MAU 112 via the port switch 142 and the corresponding one of the ports 146. In accordance with the preferred embodiments of the architecture 100, the MCU 110 does not actually store or latch data in transit between the CCU 106 and MAU 112. This is done to minimize latency in the transfer and to obviate the need for tracking or managing data that may be uniquely present in the MCU 110.
II. Instruction Fetch Unit:
The primary elements of the Instruction Fetch Unit 102 are shown in Figure 2. The operation and interrelationship of these elements can best be understood by considering their participation in the IFU data and control paths.
A) IFU Data Path:
The IFU data path begins with the instruction bus 114 that receives instruction sets for temporary storage in a prefetch buffer 260. An instruction set from the prefetch buffer 260 is passed through an IDecode unit 262 and then to an IFIFO unit 264. Instruction sets stored in the last two stages of the instruction FIFO 264 are continuously available, via the data buses 278, 280, to the IEU 104. The prefetch buffer unit 260 receives a single instruction set at a time from the instruction bus 114. The full 128 bit wide instruction set is generally written in parallel to one of four 128 bit wide prefetch buffer locations in a Main Buffer (MBUF) 188 portion of the prefetch buffer 260. Up to four additional instruction sets may be similarly written into two 128 bit wide Target Buffer (TBUF) 190 prefetch buffer locations or to two 128 bit wide Procedural Buffer (EBUF) 192 prefetch buffer locations. In the preferred architecture 100, an instruction set in any one of the prefetch buffer locations within the MBUF 188, TBUF 190 or EBUF 192 may be transferred to the prefetch buffer output bus 196. In addition, a direct fall through instruction set bus 194 is provided to connect the instruction bus 114 directly with the prefetch buffer output bus 196, thereby bypassing the MBUF, TBUF and EBUF 188, 190, 192.
In the preferred architecture 100, the MBUF 188 is utilized to buffer instruction sets in the nominal or main instruction stream. The TBUF 190 is utilized to buffer instruction sets fetched from a tentative target branch instruction stream. Consequently, the prefetch buffer unit 260 allows both possible instruction streams following a conditional branch instruction to be prefetched. This facility obviates the latency for further accesses to at least the CCU 106, if not the substantially greater latency of the MAU 112, for obtaining the correct next instruction set for execution following a conditional branch instruction regardless of the particular instruction stream eventually selected upon resolution of the conditional branch instruction. In the preferred architecture 100, the provision of the MBUF 188 and TBUF 190 allows the instruction fetch unit 102 to prefetch both potential instruction streams and, as will be discussed below in relationship to the instruction execution unit 104, to further allow execution of the presumed correct instruction stream. Where, upon resolution of the conditional branch instruction, the correct instruction stream has been prefetched into the MBUF 188, any instruction sets in the TBUF 190 may be simply invalidated. Alternately, where instruction sets of the correct instruction stream are present in the TBUF 190, the instruction prefetch buffer unit 260 provides for the direct, lateral transfer of those instruction sets from the TBUF 190 to respective buffer locations in the MBUF 188. The prior MBUF 188 stored instruction sets are effectively invalidated by being overwritten by the TBUF 190 transferred instruction sets. Where there is no TBUF instruction set transferred to an MBUF location, that location is simply marked invalid.
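The two outcomes of branch resolution can be sketched in Python. The model is hypothetical: buffers are plain lists with `None` marking an invalid location, and the pairing of TBUF location i with MBUF location i is an assumption, since the text says only that sets move to "respective" locations.

```python
# Hypothetical model of MBUF/TBUF handling at branch resolution.
# None marks an invalid buffer location; the index pairing is an assumption.

def resolve_branch(mbuf, tbuf, taken):
    """Return the new (mbuf, tbuf) contents after a conditional branch resolves."""
    empty_tbuf = [None] * len(tbuf)
    if not taken:
        # Main stream was correct: keep the MBUF, invalidate the TBUF.
        return list(mbuf), empty_tbuf
    # Target stream was correct: lateral transfer overwrites MBUF locations;
    # MBUF locations that receive nothing are marked invalid.
    new_mbuf = [tbuf[i] if i < len(tbuf) else None for i in range(len(mbuf))]
    return new_mbuf, empty_tbuf


mbuf = ["m0", "m1", "m2", "m3"]     # four main stream locations
tbuf = ["t0", None]                 # two target stream locations, one filled
after_taken = resolve_branch(mbuf, tbuf, taken=True)
after_not_taken = resolve_branch(mbuf, tbuf, taken=False)
```

In either case no new request to the CCU 106 is needed to obtain the next instruction set, which is the latency saving the paragraph describes.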
Similarly, the EBUF 192 is provided as another, alternate prefetch path through the prefetch buffer 260. The EBUF 192 is preferably utilized in the prefetching of an alternate instruction stream that is used to implement an operation specified by a single instruction, a "procedural" instruction, encountered in the MBUF 188 instruction stream. In this manner, complex or extended instructions can be implemented through software routines, or procedures, and processed through the prefetch buffer unit 260 without disturbing the instruction streams already prefetched into the MBUF 188. Although the present invention generally permits handling of procedural instructions that are first encountered in the TBUF 190, prefetching of the procedural instruction stream is held until all prior pending conditional branch instructions are resolved. This allows conditional branch instructions occurring in the procedural instruction stream to be consistently handled through the use of the TBUF 190. Thus, where a branch is taken in the procedural stream, the target instruction sets will have been prefetched into the TBUF 190 and can be simply laterally transferred to the EBUF 192.
Finally, each of the MBUF 188, TBUF 190 and EBUF 192 is coupled to the prefetch buffer output bus 196 so as to provide any instruction set stored by the prefetch unit onto the output bus 196. In addition, a flow through bus 194 is provided to transfer an instruction set from the instruction bus 114 directly to the output bus 196.
In the preferred architecture 100, the prefetch buffers within the MBUF 188, TBUF 190 and EBUF 192 do not directly form a FIFO structure. Instead, the connectivity from any buffer location to the output bus 196 allows substantial freedom in the prefetch ordering of instruction sets retrieved from the instruction cache 132. That is, the instruction fetch unit 102 generally determines and requests instruction sets in the appropriate instruction stream order of instructions. However, instruction sets are allowed to be returned to the IFU 102 out-of-order, as appropriate to match the circumstances where some requested instruction sets are available and accessible from the CCU 106 alone and others require an access of the MAU 112. Although instruction sets may not be returned in order to the prefetch buffer unit 260, the sequence of instruction sets output on the output bus 196 must generally conform to the order of instruction set requests issued by the IFU 102; that is, the in-order instruction stream sequence subject to, for example, tentative execution of a target branch stream.
The IDecode unit 262 receives the instruction sets, generally one per cycle, IFIFO unit 264 space permitting, from the prefetch buffer output bus 196. Each set of four instructions that make up a single instruction set is decoded in parallel by the IDecode unit 262. While relevant control flow information is extracted via lines 318 for the benefit of the control path portion of the IFU 102, the contents of the instruction set are not altered by the IDecode unit 262.
Instruction sets from the IDecode unit 262 are provided onto a 128 bit wide input bus 198 of the IFIFO unit 264. Internally, the IFIFO unit 264 consists of a sequence of master/slave registers 200, 204, 208, 212, 216, 220, 224. Each register is coupled to its successor to allow the contents of the master registers 200, 208, 216 to be transferred during a first half internal processor cycle of FIFO operation to the slave registers 204, 212, 220 and then to the next successive master register 208, 216, 224 during the succeeding half-cycle of operation. The input bus 198 is connected to the input of each of the master registers 200, 208, 216, 224 to allow loading of an instruction set from the IDecode unit 262 directly into a master register during the second half-cycle of FIFO operation. However, loading of a master register from the input bus 198 need not occur simultaneously with a FIFO shift of data within the IFIFO unit 264. Consequently, the IFIFO unit 264 can be continuously filled from the input bus 198 regardless of the current depth of instruction sets stored within the instruction FIFO unit 264 and, further, independent of the FIFO shifting of data through the IFIFO unit 264. Each of the master/slave registers 200, 204, 208, 212, 216, 220, 224, in addition to providing for the full parallel storage of a 128 bit wide instruction set, also provides for the storage of several bits of control information in the respective control registers 202, 206, 210, 214, 218, 222, 226. The preferred set of control bits includes exception miss and exception modify (VMU), no memory (MCU), and branch bias, stream, and offset (IFU). This control information originates from the control path portion of the IFU 102 simultaneous with the loading of an IFIFO master register with a new instruction set from the input bus 198. Thereafter, the control register information is shifted in parallel concurrently with the instruction sets through the IFIFO unit 264.
Finally, in the preferred architecture 100, the output of instruction sets from the IFIFO unit 264 is obtained simultaneously from the last two master registers 216, 224 on the I_Bucket_0 and I_Bucket_1 instruction set output buses 278, 280. In addition, the corresponding control register information is provided on the IBASV0 and IBASV1 control field buses 282, 284. These output buses 278, 282, 280, 284 are all provided as the instruction stream bus 124 to the IEU 104.
B. IFU Control Path:
The control path for the IFU 102 directly supports the operation of the prefetch buffer unit 260, IDecode unit 262 and IFIFO unit 264. A prefetch control logic unit 266 primarily manages the operation of the prefetch buffer unit 260. The prefetch control logic unit 266, and the IFU 102 in general, receive the system clock signal via the clock line 290 for synchronizing IFU operations with those of the IEU 104, CCU 106 and VMU 108. Control signals appropriate for the selection and writing of instruction sets into the MBUF 188, TBUF 190 and EBUF 192 are provided on the control lines 304.
A number of control signals are provided on the control lines 316 to the prefetch control logic unit 266. Specifically, a fetch request control signal is provided to initiate a prefetch operation. Other control signals provided on the control line 316 identify the intended destination of the requested prefetch operation as being the MBUF 188, TBUF 190 or EBUF 192. In response to a prefetch request, the prefetch control logic unit 266 generates an ID value and determines whether the prefetch request can be posted to the CCU 106. Generation of the ID value is accomplished through the use of a circular four-bit counter.
The use of a four-bit counter is significant in three regards. The first is that, typically a maximum of nine instruction sets may be active at one time in the prefetch buffer unit 260; four instruction sets in the MBUF 188, two in the TBUF 190, two in the EBUF 192 and one provided directly to the IDecode unit 262 via the flow through bus 194. Secondly, instruction sets include four instructions of four bytes each. Consequently, the least significant four bits of any address selecting an instruction set for fetching are superfluous. Finally, the prefetch request ID value can be easily associated with a prefetch request by insertion as the least significant four bits of the prefetch request address; thereby reducing the total number of address lines required to interface with the CCU 106.
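The relationship between the circular four-bit counter and the prefetch request address may be sketched as follows. This is an illustrative model only; the function names `next_id` and `tag_address` are hypothetical and do not appear in the described architecture.

```python
ID_MASK = 0xF  # four-bit ID; also the four address bits that are superfluous
               # because an instruction set is 4 instructions x 4 bytes = 16 bytes


def next_id(current_id: int) -> int:
    """Advance the circular four-bit counter, wrapping from 15 back to 0."""
    return (current_id + 1) & ID_MASK


def tag_address(set_address: int, request_id: int) -> int:
    """Place the request ID in the least significant four bits of the address.

    The low four bits of an instruction set address carry no information,
    so they can be reused to carry the ID without widening the interface.
    """
    return (set_address & ~ID_MASK) | (request_id & ID_MASK)
```

For example, under these assumptions `tag_address(0x1230, 0xA)` yields `0x123A`: the upper bits still select the 16-byte instruction set and the low nibble identifies the request.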
To allow instruction sets to be returned by the CCU 106 out-of-order with respect to the sequence of prefetch requests issued by the IFU 102, the architecture 100 provides for the return of the ID request value with the return of instruction sets from the CCU 106. However, the out-of-order instruction set return capability may result in exhaustion of the sixteen unique IDs. A combination of conditional instructions executed out-of-order, resulting in additional prefetches, and instruction sets requested but not yet returned can lead to potential re-use of an ID value. Therefore, the four-bit counter is preferably held, and no further instruction set prefetch requests issued, where the next ID value would be the same as that associated with an as yet outstanding fetch request or another instruction set then pending in the prefetch buffer 260.
The prefetch control logic unit 266 directly manages a prefetch status array 268 which contains status storage locations logically corresponding to each instruction set prefetch buffer location within the MBUF 188, TBUF 190 and EBUF 192. The prefetch control logic unit 266, via selection and data lines 306, can scan, read and write data to the status register array 268. Within the array 268, a main buffer register 308 provides for storage of four four-bit ID values (MB ID), four single-bit reserved flags (MB RES) and four single-bit valid flags (MB VAL), each corresponding by logical bit-position to the respective instruction set storage locations within the MBUF 188. Similarly, a target buffer register 310 and extended buffer register 312 each provide for the storage of two four-bit ID values (TB ID, EB ID), two single-bit reserved flags (TB RES, EB RES), and two single-bit valid flags (TB VAL, EB VAL). Finally, a flow through status register 314 provides for the storage of a single four-bit ID value (FT ID), a single reserved flag bit (FT RES), and a single valid flag bit (FT VAL).
The status register array 268 is first scanned and, as appropriate, updated by the prefetch control logic unit 266 each time a prefetch request is placed with the CCU 106 and subsequently scanned and updated each time an instruction set is returned. Specifically, upon receipt of the prefetch request signal via the control lines 316, the prefetch control logic unit 266 increments the current circular counter generated ID value, scans the status register array 268 to determine whether the ID value is available for use and whether a prefetch buffer location of the type specified by the prefetch request signal is available, examines the state of the CCU IBUSY control line 300 to determine whether the CCU 106 can accept a prefetch request and, if so, asserts a CCU IREAD control signal on the control line 298, and places the incremented ID value on the CCU ID out bus 294 to the CCU 106. A prefetch storage location is available for use where both of the corresponding reserved and valid status flags are false. The prefetch request ID is written into the ID storage location within the status register array 268 corresponding to the intended storage location within the MBUF 188, TBUF 190, or EBUF 192 concurrent with the placement of the request with the CCU 106. In addition, the corresponding reserved status flag is set true.
When the CCU 106 is able to return a previously requested instruction set to the IFU 102, the CCU IREADY signal is asserted on control line 302 and the corresponding instruction set ID is provided on the CCU ID control lines 296. The prefetch control logic unit 266 scans the ID values and reserved flags within the status register array 268 to identify the intended destination of the instruction set within the prefetch buffer unit 260. Only a single match is possible. Once identified, the instruction set is written via the bus 114 into the appropriate location within the prefetch buffer unit 260 or, if identified as a flow through request, provided directly to the IDecode unit 262. In either case, the valid status flag in the corresponding status register array is set true.
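The request and return bookkeeping described above may be modeled, purely for illustration, by the following sketch. The class and method names are hypothetical. Two simplifications are assumed: the counter is only committed once a request can actually be posted, and a returned ID is matched only against locations that are reserved but not yet valid.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StatusEntry:
    id: int = 0
    reserved: bool = False
    valid: bool = False


class PrefetchStatusModel:
    """Software model of the status array: 4 MBUF, 2 TBUF, 2 EBUF, 1 flow through."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[StatusEntry]] = {
            "MBUF": [StatusEntry() for _ in range(4)],
            "TBUF": [StatusEntry() for _ in range(2)],
            "EBUF": [StatusEntry() for _ in range(2)],
            "FT": [StatusEntry()],
        }
        self.counter = 0  # circular four-bit ID counter

    def post_request(self, dest: str, ccu_busy: bool = False) -> Optional[int]:
        """Try to post a prefetch request; return the ID used, or None if held."""
        candidate = (self.counter + 1) & 0xF
        in_use = any(
            (e.reserved or e.valid) and e.id == candidate
            for group in self.entries.values()
            for e in group
        )
        # A location is available only when reserved and valid are both false.
        free = next(
            (e for e in self.entries[dest] if not e.reserved and not e.valid), None
        )
        if in_use or free is None or ccu_busy:
            return None  # counter is held; no request is issued
        self.counter = candidate
        free.id = candidate
        free.reserved = True
        return candidate

    def accept_return(self, returned_id: int) -> Optional[str]:
        """Match a returned ID to its destination and mark that location valid."""
        for dest, group in self.entries.items():
            for index, e in enumerate(group):
                if e.reserved and not e.valid and e.id == returned_id:
                    e.valid = True
                    return f"{dest}[{index}]"
        return None
```

Because a candidate ID is refused while any reserved or valid location still carries it, at most one location can match a returned ID, which corresponds to the statement that only a single match is possible.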
The PC logic unit 270, as will be described below in greater detail, tracks the virtual address of the MBUF 188, TBUF 190 and EBUF 192 instruction streams through the entirety of the IFU 102. In performing this function, the PC logic block 270 both controls and operates from the IDecode unit 262. Specifically, portions of the instructions decoded by the IDecode unit 262 potentially relevant to a change in the program instruction stream flow are provided on the bus 318 to a control flow detection unit 274 and directly to the PC logic block 270. The control flow detection unit 274 identifies each instruction in the decoded instruction set that constitutes a control flow instruction, including conditional and unconditional branch instructions, call type instructions, software traps, procedural instructions and various return instructions.
The control flow detection unit 274 provides a control signal, via lines 322, to the PC logic unit 270 to identify the location and specific nature of the control flow instructions within the instruction set present in the IDecode unit 262. The PC logic unit 270, in turn, determines the target address of the control flow instruction, typically from data provided within the instruction and transferred to the PC logic unit via lines 318. Where, for example, a branch logic bias has been selected to execute ahead for conditional branch instructions, the PC logic unit 270 will begin to direct and separately track the prefetching of instruction sets from the conditional branch instruction target address. Thus, with the next assertion of a prefetch request on the control lines 316, the PC logic unit 270 will further assert a control signal, via lines 316, selecting the destination of the prefetch to be the TBUF 190, assuming that prior prefetch instruction sets were directed to the MBUF 188 or EBUF 192. Once the prefetch control logic unit 266 determines that a prefetch request can be supplied to the CCU 106, the prefetch control logic unit 266 provides an enabling signal, again via lines 316, to the PC logic unit 270 to enable the provision of a page offset portion of the target address (CCU PADDR [13:4]) via the address lines 324 directly to the CCU 106. At the same time, the PC logic unit 270, where a new virtual-to-physical page translation is required, further provides a VMU request signal via control line 328 and the virtualizing portion of the target address (VMU VADDR [31:14]) via the address lines 326 to the VMU 108 for translation into a physical address. Where a page translation is not required, no operation by the VMU 108 is required. Rather, the previous translation result is maintained in an output latch coupled to the bus 122 for immediate use by the CCU 106.
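The division of a 32-bit target address into the two fields named above can be shown with a short sketch. The function name `split_prefetch_address` is hypothetical; only the bit ranges [31:14] and [13:4] are taken from the description.

```python
from typing import Tuple


def split_prefetch_address(vaddr: int) -> Tuple[int, int]:
    """Split a 32-bit virtual address into (VADDR[31:14], PADDR[13:4]).

    Bits [3:0] select a byte within the 16-byte instruction set and are
    not part of either field.
    """
    page_offset = (vaddr >> 4) & 0x3FF       # 10 bits, sent directly to the CCU
    virtual_page = (vaddr >> 14) & 0x3FFFF   # 18 bits, sent to the VMU when needed
    return virtual_page, page_offset
```

The field widths imply that the page offset portion needs no translation, so it can be issued to the CCU while the upper portion is being translated or reused from the output latch.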
Operational errors in the VMU 108 in performing the virtual to physical translation requested by the PC logic unit 270 are reported via the VMU exception and VMU miss control lines 332, 334. The VMU miss control line 334 reports a translation lookaside buffer (TLB) miss. The VMU exception control signal, on VMU exception line 332, is raised for all other exceptions. In both cases, the PC logic unit handles the error condition by storing the current execution point in the instruction stream and then prefetching, as if in response to an unconditional branch, a dedicated exception handling routine instruction stream for diagnosing and handling the error condition. The VMU exception and miss control signals identify the general nature of the exception encountered, thereby allowing the PC logic unit 270 to identify the prefetch address of a corresponding exception handling routine.
The IFIFO control logic unit 272 is provided to directly support the IFIFO unit 264. Specifically, the PC logic unit 270 provides a control signal via the control lines 336 to signal the IFIFO control logic unit 272 that an instruction set is available on the input bus 198 from the IDecode unit 262. The IFIFO control unit 272 is responsible for selecting the deepest available master register 200, 208, 216, 224 for receipt of the instruction set. The output of each of the master control registers 202, 210, 218, 226 is provided to the IFIFO control unit 272 via the control bus 338. The control bits stored by each master control register include a two-bit buffer address (IF_Bx_ADR), a single stream indicator bit (IF_Bx_STRM), and a single valid bit (IF_Bx_VLD). The two-bit buffer address identifies the first valid instruction within the corresponding instruction set. That is, instruction sets returned by the CCU 106 may not be aligned such that the target instruction of a branch operation, for example, is located in the initial instruction location within the instruction set.
Thus, the buffer address value is provided to uniquely identify the initial instruction within an instruction set that is to be considered for execution. The stream bit is used essentially as a marker to identify the location of instruction sets containing conditional control flow instructions, and giving rise to potential control flow changes, in the stream of instructions through the IFIFO unit 264. The main instruction stream is processed through the MBUF 188 generally with a stream bit value of 0. On the occurrence of a relative conditional branch instruction, for example, the corresponding instruction set is marked with a stream bit value of 1. The conditional branch instruction is detected by the IDecode unit 262. Up to four conditional control flow instructions may be present in the instruction set. The instruction set is then stored in the deepest available master register of the IFIFO unit 264. In order to determine the target address of the conditional branch instruction, the current IEU 104 execution point address (DPC), the relative location of the conditional instruction containing instruction set as identified by the stream bit, and the conditional instruction location offset in the instruction set, as provided by the control flow detector 274, are combined with the relative branch offset value as obtained from a corresponding branch instruction field via control lines 318. The result is a branch target virtual address that is stored by the PC logic unit 270. The initial instruction sets of the target instruction stream may then be prefetched into the TBUF 190 utilizing this address. Depending on the branch bias preselected for the PC logic unit 270, the IFIFO unit 264 will continue to be loaded from either the MBUF 188 or TBUF 190. If a second instruction set containing one or more conditional flow instructions is encountered, the instruction set is marked with a stream bit value of 0.
Since a second target stream cannot be fetched, the target address is calculated and stored by the PC logic unit 270, but no prefetch is performed. In addition, no further instruction sets can be processed through the IDecode unit 262, or at least none that are found to contain a conditional flow control instruction.
The PC logic unit 270, in the preferred embodiments of the present invention, can manage up to eight conditional flow instructions occurring in up to two instruction sets. The target addresses for each of the two instruction sets marked by stream bit changes are stored in an array of four address registers with each target address positioned logically with respect to the location of the corresponding conditional flow instruction in the instruction set.
Once the branch result of the first in-order conditional flow instruction is resolved, the PC logic unit 270 will direct the prefetch control unit 266, via control signals on lines 316, to transfer the contents of the TBUF 190 to the MBUF 188, if the branch is taken, and to mark invalid the contents of the TBUF 190. Any instruction sets in the IFIFO unit 264 from the incorrect instruction stream, that is, the target stream if the branch is not taken and the main stream if the branch is taken, are cleared from the IFIFO unit 264. If a second or subsequent conditional flow control instruction exists in the first stream bit marked instruction set, that instruction is handled in a consistent manner: the instruction sets from the target stream are prefetched, instruction sets from the MBUF 188 or TBUF 190 are processed through the IDecode unit 262 depending on the branch bias, and the IFIFO unit 264 is cleared of incorrect stream instruction sets when the conditional flow instruction finally resolves.
If a secondary conditional flow instruction set remains in the IFIFO unit 264 once the IFIFO unit 264 is cleared of incorrect stream instruction sets, and the first conditional flow instruction set contains no further conditional flow instructions, the target addresses of the second stream bit marked instruction set are promoted to the first array of address registers. In any case, a next instruction set containing conditional flow instructions can then be evaluated through the IDecode unit 262. Thus, the toggle usage of the stream bit allows potential control flow changes to be marked and tracked through the IFIFO unit 264 for purposes of calculating branch target addresses and for marking the instruction set location above which to clear where the branch bias is subsequently determined to have been incorrect for a particular conditional flow control instruction.
Rather than actually clearing instruction sets from the master registers, the IFIFO control logic unit 272 simply resets the valid bit flag in the control registers of the corresponding master registers of the IFIFO unit 264. The clear operation is instigated by the PC logic unit 270 in a control signal provided on lines 336. The inputs of each of the master control registers 202, 210, 218, 226 are directly accessible by the IFIFO control logic unit 272 via the status bus 230. In the preferred architecture 100, the bits within these master control registers 202, 210, 218, 226 may be set by the IFIFO control unit 272 concurrent with or independent of a data shift operation by the IFIFO unit 264. This capability allows an instruction set to be written into any of the master registers 200, 208, 216, 224, and the corresponding status information to be written into the master control registers 202, 210, 218, 226 asynchronously with respect to the operation of the IEU 104.
Finally, an additional control line on the control and status bus 230 enables and directs the FIFO operation of the IFIFO unit 264. An IFIFO shift is performed by the IFIFO control logic unit 272 in response to the shift request control signal provided by the PC logic unit 270 via the control lines 336. The IFIFO control unit 272, based on the availability of a master register 200, 208, 216, 224 to receive an instruction set, provides a control signal, via lines 316, to the prefetch control unit 266 to request the transfer of a next appropriate instruction set from the prefetch buffers 260. On transfer of the instruction set, the corresponding valid bit in the array 268 is reset.
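The loading, shifting and clearing operations of the IFIFO unit described above may be approximated by the following sketch. It models only the four master registers and their valid bits. The class and method names are hypothetical, index 3 is assumed to stand for the deepest master register 224, and valid entries are assumed to remain contiguous at the deep end.

```python
from typing import List, Optional


class IFifoModel:
    DEPTH = 4  # master registers 200, 208, 216, 224 (index 3 models 224)

    def __init__(self) -> None:
        self.sets: List[Optional[str]] = [None] * self.DEPTH
        self.valid: List[bool] = [False] * self.DEPTH

    def load(self, instruction_set: str) -> Optional[int]:
        """Write into the deepest master register that holds no valid set."""
        for index in range(self.DEPTH - 1, -1, -1):
            if not self.valid[index]:
                self.sets[index] = instruction_set
                self.valid[index] = True
                return index
        return None  # no master register available

    def shift(self) -> None:
        """Move every entry one position deeper, retiring the deepest entry."""
        self.sets = [None] + self.sets[:-1]
        self.valid = [False] + self.valid[:-1]

    def clear_shallower_than(self, index: int) -> None:
        """Clear by resetting valid bits only; register contents are untouched."""
        for i in range(index):
            self.valid[i] = False
```

Resetting only the valid bits mirrors the statement that instruction sets are not actually cleared from the master registers; an invalidated register is simply treated as available for the next load.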
C. IFU/IEU Control Interface:
The control interface between the IFU 102 and IEU 104 is provided by the control bus 126. This control bus 126 is coupled to the PC logic unit 270 and consists of a number of control, address and specialized data lines. Interrupt request and acknowledge control signals, as passed via the control lines 340, allow the IFU 102 to signal and synchronize interrupt operations with the IEU 104. An externally generated interrupt signal is provided on a line 292 to the PC logic unit 270. In response, an interrupt request control signal, provided on lines 340, causes the IEU 104 to cancel tentatively executed instructions. Information regarding the nature of an interrupt is exchanged via interrupt information lines 341. When the IEU 104 is ready to begin receiving instruction sets prefetched from the interrupt service routine address determined by the PC logic unit 270, the IEU 104 asserts an interrupt acknowledge control signal on the lines 340. Execution of the interrupt service routine, as prefetched by the IFU 102, will then commence. An IFIFO read (IFIFO RD) control signal is provided by the IEU 104 to signal that the instruction set present in the deepest master register 224 has been completely executed and that a next instruction set is desired. Upon receipt of this control signal, the PC logic unit 270 directs the IFIFO control logic unit 272 to perform an IFIFO shift operation on the IFIFO unit 264.
A PC increment request and size value (PC INC/SIZE) is provided on the control lines 344 to direct the PC logic unit 270 to update the current program counter value by a corresponding size number of instructions. This allows the PC logic unit 270 to maintain a point of execution program counter (DPC) that is precise to the location of the first in-order executing instruction in the current program instruction stream.
A target address (TARGET ADDR) is returned on the address lines 346 to the PC logic unit 270. The target address is the virtual target address of a branch instruction that depends on data stored within the register file of the IEU 104. Operation of the IEU 104 is therefore required to calculate the target address.
Control flow result (CF RESULT) control signals are provided on the control lines 348 to the PC logic unit 270 to identify whether any currently pending conditional branch instruction has been resolved and whether the result is either a branch taken or not taken. Based on these control signals, the PC logic unit 270 can determine which of the instruction sets in the prefetch buffer 260 and IFIFO unit 264 must be cancelled, if at all, as a consequence of the execution of the conditional flow instruction.
A number of IEU instruction return type control signals (IEU Return) are provided on the control lines 350 to alert the IFU 102 to the execution of certain instructions by the IEU 104. These instructions include a return from procedural instruction, return from trap, and return from subroutine call. The return from trap instruction is used equally in hardware interrupt and software trap handling routines. The subroutine call return is also used in conjunction with jump-and-link type calls. In each case, the return control signals are provided to alert the IFU 102 to resume its instruction fetching operation with respect to the previously interrupted instruction stream. Origination of the signals from the IEU 104 allows the precise operation of the system 100 to be maintained; the resumption of an "interrupted" instruction stream is performed at the point of execution of the return instruction.
A current instruction execution PC address (Current IFPC) is provided on an address bus 352 to the IEU 104. This address value, the DPC, identifies the precise instruction being executed by the IEU 104. That is, while the IEU 104 may tentatively execute ahead instructions past the current IFPC address, this address must be maintained for purposes of precise control of the architecture 100 with respect to the occurrence of interrupts, exceptions, and any other events that would require knowing the precise state-of-the-machine. When the IEU 104 determines that the precise state-of-the-machine in the currently executing instruction stream can be advanced, the PC Inc/Size signal is provided to the IFU 102 and immediately reflected back in the current IFPC address value.
Finally, an address and bi-directional data bus 354 is provided for the transfer of special register data. This data may be programmed into or read from special registers within the IFU 102 by the IEU 104. Special register data is generally loaded or calculated by the IEU 104 for use by the IFU 102.
D) PC Logic Unit Detail:
A detailed diagram of the PC Logic unit 270 including a PC control unit 362, interrupt control unit 363, prefetch PC control unit 364 and execution PC control unit 366, is shown in Figure 3. The PC control unit 362 provides timing control over the prefetch and execution PC control units 364, 366 in response to control signals from the prefetch control logic unit 266, IFIFO control logic unit 272, and the IEU 104, via the interface bus 126. The Interrupt Control Unit 363 is responsible for managing the precise processing of interrupts and exceptions, including the determination of a prefetch trap address offset that selects an appropriate handling routine to process a respective type of trap. The prefetch PC control unit 364 is, in particular, responsible for managing program counters necessary to support the prefetch buffers 188, 190, 192, including storing return addresses for trap handling and procedural routine instruction flows. In support of this operation, the prefetch PC control unit 364 is responsible for generating the prefetch virtual address including the CCU PADDR address on the physical address bus lines 324 and the VMU VMADDR address on the address lines 326. Consequently, the prefetch PC control unit 364 is responsible for maintaining the current prefetch PC virtual address value.
The prefetch operation is generally initiated by the IFIFO control logic unit 272 via a control signal provided on the control lines 316. In response, the PC control unit 362 generates a number of control signals provided on the control lines 372 to operate the prefetch PC control unit 364 to generate the PADDR and, as needed, the VMADDR addresses on the address lines 324, 326. An increment signal, having a value of zero to four, may also be provided on the control lines 374 depending on whether the PC control unit 362 is re-executing an instruction set fetch at the present prefetch address, aligning for the second in a series of prefetch requests, or selecting the next full sequential instruction set for prefetch. Finally, the current prefetch address PF_PC is provided on the bus 370 to the execution PC control unit 366.
New prefetch addresses originate from a number of sources. A primary source of addresses is the current IF_PC address provided from the execution PC control unit 366 via bus 352. Principally, the IF_PC address provides a return address for subsequent use by the prefetch PC control unit 364 when an initial call, trap or procedural instruction occurs. The IF_PC address is stored in registers in the prefetch PC control unit 364 upon each occurrence of these instructions. In this manner, the PC control unit 362, on receipt of an IEU return signal, via control lines 350, need merely select the corresponding return address register within the prefetch PC control unit 364 to source a new prefetch virtual address, thereby resuming the original program instruction stream.
Another source of prefetch addresses is the target address value provided on the relative target address bus 382 from the execution PC control unit 366 or on the absolute target address bus 346 provided from the IEU 104. Relative target addresses are those that can be calculated by the execution PC control unit 366 directly. Absolute target addresses must be generated by the IEU 104, since such target addresses are dependent on data contained in the IEU register file. The target address is routed over the target address bus 384 to the prefetch PC control unit 364 for use as a prefetch virtual address. In calculating the relative target address, an operand portion of the corresponding branch instruction is also provided on the operand displacement portion of the bus 318 from the IDecode unit 262.
Another source of prefetch virtual addresses is the execution PC control unit 366. A return address bus 352' is provided to transfer the current IF_PC value (DPC) to the prefetch PC control unit 364. This address is utilized as a return address where an interrupt, trap or other control flow instruction such as a call has occurred within the instruction stream. The prefetch PC control unit 364 is then free to prefetch a new instruction stream. The PC control unit 362 receives an IEU return signal, via lines 350, from the IEU 104 once the corresponding interrupt or trap handling routine or subroutine has been executed. In turn, the PC control unit 362 selects, via one of the PFPC control signals on line 372 and based on an identification of the return instruction executed as provided via lines 350, a register containing the current return virtual address. This address is then used to continue the prefetch operation by the PC logic unit 270.
Finally, another source of prefetch virtual addresses is from the special register address and data bus 354. An address value, or at least a base address value, calculated or loaded by the IEU 104 is transferred as data via the bus 354 to the prefetch PC control unit 364. The base addresses include the base addresses for the trap address table, a fast trap table, and a base procedural instruction dispatch table. The bus 354 also allows many of the registers in the prefetch and execution PC control units 364, 366 to be read to allow corresponding aspects of the state-of-the-machine to be manipulated through the IEU 104.
The execution PC control unit 366, subject to the control of the PC control unit 362, is primarily responsible for calculating the current IF_PC address value. In this role, the execution PC control unit 366 responds to control signals provided by the PC control unit 362 on the ExPC control lines 378 and increment/size control signals provided on the control lines 380 to adjust the IF_PC address. These control signals are generated primarily in response to the IFIFO read control signal provided on line 342 and the PC increment/size value provided on the control lines 344 from the IEU 104.
1) PF and ExPC Control/Data Unit Detail:
Figure 4 provides a detailed block diagram of the prefetch and execution PC control units 364, 366. These units primarily consist of registers, incrementors and the like, selectors and adder blocks. Control for managing the transfer of data between these blocks is provided by the PC Control Unit 362 via the PFPC control lines 372, the ExPC Control lines 378 and the Increment Control lines 374, 380. For purposes of clarity, those specific control lines are not shown in the block diagram of Figure 4. However, it should be understood that these control signals are provided to the blocks shown as described herein.
Central to the prefetch PC control unit 364 is a prefetch selector (PF_PC SEL) 390 that operates as a central selector of the current prefetch virtual address. This current prefetch address is provided on the output bus 392 from the prefetch selector to an incrementor unit 394 to generate a next prefetch address. This next prefetch address is provided on the incrementor output bus 396 to a parallel array of registers MBUF PFnPC 398, TBUF PFnPC 400, and EBUF PFnPC 402. These registers 398, 400, 402 effectively store the next instruction prefetch address. However, in accordance with the preferred embodiment of the present invention, separate prefetch addresses are held for the MBUF 188, TBUF 190, and EBUF 192. The prefetch addresses, as stored by the MBUF, TBUF and EBUF PFnPC registers 398, 400, 402, are respectively provided by the address buses 404, 408, 410 to the prefetch selector 390. Thus, the PC control unit 362 can direct an immediate switch of the prefetch instruction stream merely by directing the selection, by the prefetch selector 390, of another one of the prefetch registers 398, 400, 402. Once that address value has been incremented by the incrementor 394, if a next instruction set in the stream is to be prefetched, the value is returned to the appropriate one of the prefetch registers 398, 400, 402.
Another parallel array of registers, for simplicity shown as the single special register block 412, is provided to store a number of special addresses. The register block 412 includes a trap return address register, a procedural instruction return address register, a procedural instruction dispatch table base address register, a trap routine dispatch table base address register, and a fast trap routine table base address register. Under the control of the PC control unit 362, these return address registers may receive the current IFPC execution address via the bus 352'. The address values stored by the return and base address registers within the register block 412 may be both read and written independently by the IEU 104. The registers are selected and values transferred via the special register address and data bus 354.
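By way of illustration only, the selector, incrementor and PFnPC register loop described above may be modeled in software as follows. This is a minimal sketch, not the described hardware: the class `PrefetchPC` and its method names are hypothetical, and each PFnPC register is assumed to hold the address of the next instruction set to be prefetched for its stream.

```python
INSTRUCTION_SET_BYTES = 16  # one instruction set: four 4-byte instructions


class PrefetchPC:
    """Toy model of the PF_PC selector, incrementor and PFnPC registers."""

    def __init__(self, mbuf=0, tbuf=0, ebuf=0):
        # One next-prefetch address per stream (hypothetical dictionary layout).
        self.pfnpc = {"MBUF": mbuf, "TBUF": tbuf, "EBUF": ebuf}
        self.selected = "MBUF"

    def select(self, stream):
        # Switching streams only changes which register feeds the selector.
        if stream not in self.pfnpc:
            raise ValueError(stream)
        self.selected = stream

    def prefetch(self):
        # Emit the selected address and write the incremented value back
        # to the same register for the next prefetch cycle.
        current = self.pfnpc[self.selected]
        self.pfnpc[self.selected] = (current + INSTRUCTION_SET_BYTES) & 0xFFFFFFFF
        return current
```

In this sketch a stream switch costs nothing beyond the selection itself, which mirrors the immediate switch attributed to the prefetch selector 390.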
A selector within the special register block 412, controlled by the PC control unit 362, allows the addresses stored by the registers of the register block 412 to be put on the special register output bus 416 to the prefetch selector 390. Return addresses are provided directly to the prefetch selector 390. Base address values are combined with the offset value provided on the interrupt offset bus 373 from the interrupt control unit 363. Once sourced to the prefetch selector 390 via the bus 373', a special address can be used as the initial address for a new prefetch instruction stream by thereafter continuing the incremental loop of the address through the incrementor 394 and one of the prefetch registers 398, 400, 402.
Another source of addresses to the prefetch selector 390 is an array of registers within the target address register block 414. The target registers within the block 414 provide for storage of, in the preferred embodiment, eight potential branch target addresses. These eight storage locations logically correspond to the eight potentially executable instructions held in the lowest two master registers 216, 224 of the IFIFO unit 264. Since any, and potentially all, of those instructions could be conditional branch instructions, the target register block 414 allows for their precalculated target addresses to be stored awaiting use for fetching of a target instruction stream through the TBUF 190. In particular, if a conditional branch bias is set such that the PC Control Unit 362 immediately begins prefetching of a target instruction stream, the target address is immediately fed through the target register block 414 via the address bus 418 to the prefetch selector 390. Once incremented by the incrementor 394, the address is stored back to the TBUF PFnPC 400 for use in subsequent prefetch operations of the target instruction stream. If additional branch instructions occur within the target instruction stream, the target addresses of such secondary branches are calculated and stored in the target register array 414 pending use upon resolution of the first conditional branch instruction. A calculated target address, as stored by the target register block 414, is transferred from a target address calculation unit within the execution PC control unit 366 via the address lines 382 or from the IEU 104 via the absolute target address bus 346.
The address value transferred through the prefetch PF_PC selector 390 is a full thirty-two bit virtual address value. The page size, in the preferred embodiment of the present invention, is fixed at 16 KBytes, corresponding to the maximum page offset address value [13:0]. Therefore, a VMU page translation is not required unless there is a change in the current prefetch virtual page address [27:14]. A comparator in the prefetch selector 390 detects this circumstance. A VMU translation request signal (VMXLAT) is provided via line 372' to the PC control unit 362 when there is a change in the virtual page address, either due to incrementing across a page boundary or a control flow branch to another page address. In turn, the PC control unit 362 directs the placement of the VM VADDR address on lines 326, in addition to the CCU PADDR on lines 324, both via a buffer unit 420, and the appropriate control signals on the VMU control lines 326, 328, 330 to obtain a VMU virtual to physical page translation. Where a page translation is not required, the current physical page address [31:14] is maintained by a latch at the output of the VMU unit 108 on the bus 122.
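The page-change comparison described above may be sketched as follows, for illustration only. The function names are hypothetical, and the sketch assumes that every address bit above the 14-bit page offset participates in the comparison, which is a simplification of the comparator in the prefetch selector 390.

```python
PAGE_OFFSET_BITS = 14  # 16 KByte pages: the page offset is address bits [13:0]


def virtual_page(addr):
    # Assumption: all bits above the page offset form the page number.
    return (addr & 0xFFFFFFFF) >> PAGE_OFFSET_BITS


def needs_translation(prev_addr, next_addr):
    """True when the prefetch address moves to a different virtual page,
    whether by incrementing across a boundary or by a branch."""
    return virtual_page(prev_addr) != virtual_page(next_addr)
```

A sequential prefetch from 0x3FF0 to 0x4000 crosses a 16 KByte boundary and would request a translation in this model, while one from 0x4000 to 0x4010 would not.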
The virtual address provided onto the bus 370 is incremented by the incrementor 394 in response to a signal provided on the increment control line 374. The incrementor 394 increments by a value representing an instruction set (four instructions or sixteen bytes) in order to select a next instruction set. The low-order four bits of a prefetch address as provided to the CCU unit 106 are zero. Therefore, the actual target address instruction in a first branch target instruction set may not be located in the first instruction location. However, the low-order four bits of the address are provided to the PC control unit 362 to allow the proper first branch instruction location to be known by the IFU 102. The detection and handling, by returning the low order bits [3:2] of a target address as the two-bit buffer address, to select the proper first instruction for execution in a non-aligned target instruction set, is performed only for the first prefetch of a new instruction stream, i.e., any first non-sequential instruction set address in an instruction stream. The non-aligned relationship between the address of the first instruction in an instruction set and the prefetch address used in prefetching the instruction set can be, and is, thereafter ignored for the duration of the current sequential instruction stream.
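As an illustrative sketch only, the split of a non-aligned target address into an aligned prefetch address and a two-bit buffer address may be expressed as follows. The function name `split_target` is hypothetical.

```python
def split_target(addr):
    """Return (aligned prefetch address, index of the first instruction)."""
    aligned = addr & 0xFFFFFFF0   # low-order four bits forced to zero
    slot = (addr >> 2) & 0x3      # bits [3:2] select one of four instructions
    return aligned, slot
```

For a target address of 0x1238, this yields a prefetch address of 0x1230 and a buffer address of 2, so execution would begin at the third instruction of the fetched set.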
The remainder of the functional blocks shown in Figure 4 comprise the execution PC control unit 366. In accordance with the preferred embodiment of the present invention, the execution PC control unit 366 incorporates its own independently functioning program counter incrementor. Central to this function is an execution selector (DPC SEL) 430. The address output by the execution selector 430, on the address bus 352', is the present execution address (DPC) of the architecture 100. This execution address is provided to an adder unit 434. The increment/size control signals provided on the lines 380 specify an instruction increment value of from one to four that the adder unit 434 adds to the address obtained from the selector 430. As the adder 434 additionally performs an output latch function, the incremented next execution address is provided on the address lines 436 directly back to the execution selector 430 for use in the next execution increment cycle.
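The execution address increment may be sketched as follows, for illustration only. The function name is hypothetical, and the sketch assumes byte addressing with fixed four-byte instructions, consistent with an instruction set of four instructions occupying sixteen bytes.

```python
INSTRUCTION_BYTES = 4  # assumption: fixed-size instructions, byte addresses


def next_dpc(dpc, count):
    """Advance the execution address by one to four instructions."""
    if not 1 <= count <= 4:
        raise ValueError("increment must be one to four instructions")
    return (dpc + count * INSTRUCTION_BYTES) & 0xFFFFFFFF
```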
The initial execution address and all subsequent new stream addresses are obtained through a new stream register unit 438 via the address lines 440. The new stream register unit 438 allows the new current prefetch address, as provided on the PFPC address bus 370 from the prefetch selector 390, to be passed on to the address bus 440 directly or stored for subsequent use. That is, where the prefetch PC control unit 364 determines to begin prefetching at a new virtual address, the new stream address is temporarily stored by the new stream register unit 438. The PC control unit 362, by its participation in both the prefetch and execution increment cycles, holds the new stream address in the new stream register unit 438 until the execution address has reached the program execution point corresponding to the control flow instruction that instigated the new instruction stream. The new stream address is then output from the new stream register unit 438 to the execution selector 430 to initiate the independent generation of execution addresses in the new instruction stream. In accordance with the preferred embodiments of the present invention, the new stream register unit 438 provides for the buffering of two control flow instruction target addresses. By the immediate availability of the new stream address, there is essentially no latency in the switching of the execution PC control unit 366 from the generation of a current sequence of execution addresses to a new stream sequence of execution addresses.
Finally, an IFPC selector (IF_PC SEL) 442 is provided to ultimately issue the current IFPC address on the address bus 352 to the IEU 104. The inputs to the IFPC selector 442 are the output addresses obtained from either the execution selector 430 or new stream register unit 438. In most instances, the IFPC selector 442 is directed by the PC control unit 362 to select the execution address output by the execution selector 430. However, in order to further reduce latency in switching to a new virtual address used to initiate execution of a new instruction stream, the selected address provided from the new stream register unit 438 can be bypassed via bus 440 directly to the IFPC selector 442 for provision as the current IFPC execution address.
The execution PC control unit 366 is capable of calculating all relative branch target addresses. The current execution point address and the new stream register unit 438 provided address are received by a control flow selector (CF_PC) 446 via the address buses 352', 440. Consequently, the PC control unit 362 has substantial flexibility in selecting the exact initial address from which to calculate a target address. This initial, or base, address is provided via address bus 454 to a target address ALU 450. A second input value to the target ALU 450 is provided from a control flow displacement calculation unit 452 via bus 458. Relative branch instructions, in accordance with the preferred architecture 100, incorporate a displacement value in the form of an immediate mode constant that specifies a relative new target address. The control flow displacement calculation unit 452 receives the operand displacement value initially obtained via the IDecode unit operand output bus 318. Finally, an offset register value is provided to the target address ALU 450 via the lines 456. The offset register 448 receives an offset value via the control lines 378' from the PC control unit 362. The magnitude of the offset value is determined by the PC control unit 362 based on the address offset between the base address provided on the address lines 454 and the address of the current branch instruction for which the relative target address is being calculated. That is, the PC control unit 362, through its control of the IFIFO control logic unit 272 tracks the number of instructions separating the instruction at the current execution point address (requested by CP_PC) and the instruction that is currently being processed by the IDecode unit 262 and, therefore, being processed by the PC logic unit 270 to determine the target address for that instruction. 
Once the relative target address has been calculated by the target address ALU 450, the target address is written into a corresponding one of the target registers 414 via the address bus 382.
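For illustration only, the three-input relative target calculation may be sketched as follows. The units and signs are assumptions not stated in the text: the offset is taken as a count of four-byte instructions from the base address to the branch instruction, and the displacement as a signed byte value relative to the branch instruction. The function name is hypothetical.

```python
def relative_target(base, instr_offset, displacement):
    """Sum the base address, the offset register value and the displacement.

    base          address selected by the control flow selector
    instr_offset  instructions between base and the branch (assumed unit)
    displacement  signed byte displacement from the branch (assumed unit)
    """
    branch_addr = base + instr_offset * 4
    return (branch_addr + displacement) & 0xFFFFFFFF
```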
2) PC Control Algorithm Detail:
1. Main Instruction Stream Processing: MBUF PFnPC
1.1 the address of the next main flow prefetch instruction is stored in the MBUF PFnPC.
1.2 in the absence of a control flow instruction, a 32 bit incrementor adjusts the address value in the MBUF PFnPC by sixteen bytes (x16) with each prefetch cycle.
1.3 when an unconditional control flow instruction is IDecoded, all prefetched data fetched subsequent to the instruction set will be flushed and the MBUF PFnPC is loaded, through the target register unit, PF_PC selector and incrementor, with the new main instruction stream address. The new address is also stored in the new stream registers.
1.3.1 the target address of a relative unconditional control flow instruction is calculated by the IFU from register data maintained by the IFU and from operand data following the control flow instruction.
1.3.2 the target address of an absolute unconditional control flow instruction is eventually calculated by the IEU from a register reference, a base register value, and an index register value.
1.3.2.1 instruction prefetch cycling stalls until the target address is returned by the IEU for absolute address control flow instructions; instruction execution cycling continues.
1.4 the address of the next main flow prefetch instruction set, resulting from an unconditional control flow instruction, is bypassed through the target address register unit, PF_PC selector and incrementor and routed for eventual storage in the MBUF PFnPC; prefetching continues at 1.2.
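Steps 1.2 through 1.4 may be sketched in software as follows, for illustration only. The class `MainStream`, its attributes and its methods are hypothetical, and the model tracks instruction sets by address alone rather than modeling the buffers themselves.

```python
class MainStream:
    """Toy model of main instruction stream prefetch (steps 1.2 to 1.4)."""

    def __init__(self, start):
        self.mbuf_pfnpc = start   # address of the next set to prefetch
        self.buffered = []        # addresses of prefetched instruction sets
        self.new_stream = None    # stands in for the new stream registers
        self.stalled = False

    def prefetch_cycle(self):
        if self.stalled:
            return None
        addr = self.mbuf_pfnpc
        self.buffered.append(addr)
        self.mbuf_pfnpc = (addr + 16) & 0xFFFFFFFF   # step 1.2
        return addr

    def unconditional_flow(self, set_addr, target=None):
        # Step 1.3: flush every set fetched after the one holding the branch.
        keep = self.buffered.index(set_addr) + 1
        del self.buffered[keep:]
        if target is None:
            self.stalled = True        # step 1.3.2.1: wait for the IEU
        else:
            self.load_target(target)   # step 1.3.1: IFU-calculated target

    def load_target(self, target):
        # Step 1.4: the new stream address becomes the next prefetch address.
        self.new_stream = target
        self.mbuf_pfnpc = target & 0xFFFFFFF0
        self.stalled = False
```

Passing `target=None` stands for an absolute control flow instruction whose address has not yet been returned; a later call to `load_target` represents the IEU returning that address.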
2. Procedural Instruction Stream Processing: EBUF PFnPC
2.1 a procedural instruction may be prefetched in the main or branch target instruction stream. If fetched in a target stream, stall prefetching of the procedural stream until the conditional control flow instruction resolves and the procedural instruction is transferred to the MBUF. This allows the TBUF to be used in handling of conditional control flows that occur in the procedural instruction stream.
2.1.1 a procedural instruction should not appear in a procedural instruction stream, i.e., procedural instructions should not be nested: a return from procedural instruction will return execution to the main instruction flow. In order to allow nesting, an additional, dedicated return from nested procedural instruction would be required. While the architecture can readily support such an instruction, a nested procedural instruction capability is unlikely to improve the performance of the architecture.
2.1.2 in a main instruction stream, a procedural instruction stream that, in turn, includes first and second conditional control flow instruction containing instruction sets will stall prefetching with respect to the second conditional control flow instruction set until any conditional control flow instructions in the first such instruction set are resolved and the second conditional control flow instruction set has been transferred to the MBUF.
2.2 procedural instructions provide a relative offset, included as an immediate mode operand field of the instruction, to identify the procedural routine starting address:
2.2.1 the offset value provided by the procedural instruction is combined with a value contained in a procedural base address (PBR) register maintained in the IFU. This PBR register is readable and writable via the special address and data bus in response to the execution of a special register move instruction.
2.3 when a procedural instruction is encountered, the next main instruction stream IF_PC address is stored in the uPC return address register and the procedure-in-progress bit in the processor status register (PSR) is set.
2.4 the starting address of the procedural stream is routed from the PBR register (plus the procedural instruction operand offset value) to the PF_PC selector.
2.5 the starting address of the procedural stream is simultaneously provided to the new stream register unit and to the incrementor for incrementing (x16); the incremented address is then stored in the EBUF PFnPC.
2.6 in the absence of a control flow instruction, a 32 bit incrementor adjusts the address value (x16) in the EBUF PFnPC with each procedural instruction prefetch cycle.
2.7 when an unconditional control flow instruction is IDecoded, all prefetched data fetched subsequent to the branch instruction will be flushed and the EBUF PFnPC is loaded with the new procedural instruction stream address.
2.7.1 the target address of a relative unconditional control flow instruction is calculated by the IFU from IFU maintained register data and from the operand data provided within an immediate mode operand field of the control flow instruction.
2.7.2 the target address of an absolute unconditional branch is calculated by the IEU from a register reference, a base register value, and an index register value.
2.7.2.1 instruction prefetch cycling stalls until the target address is returned by the IEU for absolute address branches; execution cycling continues.
2.8 the address of the next procedural flow prefetch instruction set is stored in the EBUF PFnPC and prefetching continues at 1.2.
2.9 when a return from procedure instruction is IDecoded, prefetching continues from the address stored in the uPC register, which is then incremented (x16) and returned to the MBUF PFnPC register for subsequent prefetches.
3. Branch Instruction Stream Processing: TBUF PFnPC
3.1 when a conditional control flow instruction, occurring in a first instruction set in the MBUF instruction stream, is IDecoded, the target address is determined by the IFU if the target address is relative to the current address or by the IEU for absolute addresses.
3.2 for "branch taken bias":
3.2.1 if the branch is to an absolute address, stall instruction prefetch cycling until the target address is returned by the IEU; execution cycling continues.
3.2.2 load the TBUF PFnPC with the branch target address by transfer through the PF_PC selector and incrementor.
3.2.3 target instruction stream instructions are prefetched into the TBUF and then routed into the IFIFO for subsequent execution; if the IFIFO and TBUF become full, stall prefetching.
3.2.4 the 32 bit incrementor adjusts (x16) the address value in the TBUF PFnPC with each prefetch cycle.
3.2.5 stall the prefetch operation on IDecode of a conditional control flow instruction occurring in a second instruction set in the target instruction stream until all conditional branch instructions in the first (primary) set are resolved (but go ahead and calculate the relative target address and store it in the target registers).
3.2.6 if a conditional branch in the first instruction set resolves to "taken":
3.2.6.1 flush instruction sets following the first conditional flow instruction set in the MBUF or EBUF, if the source of the branch was the EBUF instruction stream as determined from the procedure-in-progress bit.
3.2.6.2 transfer the TBUF PFnPC value to MBUF PFnPC or EBUF based on the state of the procedure-in-progress bit.
3.2.6.3 transfer the prefetched TBUF instructions to the MBUF or EBUF based on the state of the procedure-in-progress bit.
3.2.6.4 if a second conditional branch instruction set has not been IDecoded, continue MBUF or EBUF prefetching operations based on the state of the procedure-in-progress bit.
3.2.6.5 if a second conditional branch instruction has been IDecoded, begin processing that instruction (go to step 3.3.1).
3.2.7 if the conditional control flow instruction(s) in the first conditional instruction set resolve to "not taken":
3.2.7.1 flush the IFIFO and IEU of instruction sets and instructions from the target instruction stream.
3.2.7.2 continue MBUF or EBUF prefetching operations.
3.3 for "branch not taken bias":
3.3.1 stall prefetch of instructions into the MBUF; execution cycling continues.
3.3.1.1 if the conditional control flow instruction in the first conditional instruction set is relative, calculate the target address and store in the target registers.
3.3.1.2 if the conditional control flow instruction in the first conditional instruction set is absolute, wait for the IEU to calculate the target address and return the address to the target registers.
3.3.1.3 stall the prefetch operation on IDecode of a conditional control flow instruction in a second instruction set until the conditional control flow instruction(s) in the first conditional instruction set are resolved.
3.3.2 once the target address of the first conditional branch is calculated, load it into the TBUF PFnPC and also begin prefetching instructions into the TBUF concurrent with execution of the main instruction stream. Target instruction sets are not loaded into the IFIFO (the branch target instructions are thus on hand when each conditional control flow instruction in the first instruction set resolves).
3.3.3 if a conditional control flow instruction in the first set resolves to "taken":
3.3.3.1 flush the MBUF or EBUF, if the source of the branch was the EBUF instruction stream, as determined from the state of the procedure-in-progress bit, and the IFIFO and IEU of instructions from the main stream following the first conditional branch instruction set.
3.3.3.2 transfer the TBUF PFnPC value to MBUF PFnPC or EBUF, as determined from the state of the procedure-in-progress bit.
3.3.3.3 transfer the prefetched TBUF instructions to the MBUF or EBUF, as determined from the state of the procedure-in-progress bit.
3.3.3.4 continue MBUF or EBUF prefetching operations, as determined from the state of the procedure-in-progress bit.
3.3.4 if a conditional control flow instruction in the first set resolves to "not taken":
3.3.4.1 flush the TBUF of instruction sets from the target instruction stream.
3.3.4.2 if a second conditional branch instruction has not been IDecoded, continue MBUF or EBUF prefetching operations, as determined from the state of the procedure-in-progress bit.
3.3.4.3 if a second conditional branch instruction has been IDecoded, begin processing that instruction (go to step 3.4.1).
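The buffer and address transfers performed when the first conditional branch resolves (steps 3.2.6, 3.3.3 and 3.3.4) may be sketched as follows, for illustration only. The function name and the dictionary layout are hypothetical, and the flushing of the IFIFO and IEU is omitted for brevity.

```python
def resolve_branch(taken, bufs, pfnpc, procedure_in_progress):
    """Apply the outcome of the first conditional branch.

    bufs   dict of lists keyed 'MBUF', 'EBUF', 'TBUF' (prefetched sets)
    pfnpc  dict of next-prefetch addresses with the same keys
    Returns the name of the buffer that continues prefetching.
    """
    # The procedure-in-progress bit identifies the source stream.
    home = "EBUF" if procedure_in_progress else "MBUF"
    if taken:
        # Flush the source stream; the target sets and address take over.
        bufs[home] = bufs["TBUF"]
        pfnpc[home] = pfnpc["TBUF"]
    # Taken or not, the TBUF is emptied for use by a later branch.
    bufs["TBUF"] = []
    return home
```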
4. Interrupts, Exceptions and Trap Instructions.
4.1 Traps generically include:
4.1.1 Hardware Interrupts.
4.1.1.1 asynchronously (external) occurring events, internal or external.
4.1.1.2 can occur at any time and persist.
4.1.1.3 serviced in priority order between atomic (ordinary) instructions and may suspend procedural instructions.
4.1.1.4 the starting address of an interrupt handler is determined as the vector number offset into a predefined table of trap handler entry points.
4.1.2 Software Trap Instructions.
4.1.2.1 synchronously (internal) occurring instructions.
4.1.2.2 a software instruction that executes as an exception.
4.1.2.3 the starting address of the trap handler is determined from the trap number offset combined with a base address value stored in the TBR or FTB register.
4.1.3 Exceptions.
4.1.3.1 events occurring synchronously with an instruction.
4.1.3.2 handled at the time the instruction is executed.
4.1.3.3 due to consequences of the exception, the excepted instruction and all subsequent executed instructions are cancelled.
4.1.3.4 the starting address of the exception handler is determined from the trap number offset into a predefined table of trap handler entry points.
4.2 Trap instruction stream operations occur in-line with the then currently executing instruction stream.
4.3 Traps may nest, provided the trap handling routine saves the xPC address prior to a next allowed trap; failure to do so will corrupt the state of the machine if a trap occurs prior to completion of the current trap operation.
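The nesting condition of item 4.3 may be illustrated by the following minimal sketch, in which a single xPC register holds the trap return address. The class `TrapState` and its members are hypothetical and model only the return address and the interrupt enable state.

```python
class TrapState:
    """Toy model of a single xPC trap return address register."""

    def __init__(self, if_pc=0):
        self.if_pc = if_pc
        self.xpc = None
        self.interrupts_enabled = True

    def take_trap(self, handler):
        # The one xPC register is overwritten on every trap entry.
        self.xpc = self.if_pc
        self.interrupts_enabled = False
        self.if_pc = handler

    def return_from_trap(self):
        self.if_pc = self.xpc
        self.interrupts_enabled = True
```

In this model a handler that copies `xpc` elsewhere before allowing a second trap, and writes it back before its own return, resumes correctly; a handler that omits the save would return to an address inside itself rather than to the interrupted stream.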
5. Trap Instruction Stream Processing: xPC.
5.1 when a trap is encountered:
5.1.1 if an asynchronous interrupt, the execution of the currently executing instruction(s) is suspended.
5.1.2 if a synchronous exception, the trap is processed upon execution of the excepted instruction.
5.2 when a trap is processed:
5.2.1 interrupts are disabled.
5.2.2 the current IF_PC address is stored in the xPC trap state return address register.
5.2.3 the IFIFO and the MBUF prefetch buffers at and subsequent to the IF_PC address are flushed.
5.2.4 executed instructions at and subsequent to the address IF_PC and the results of those instructions are flushed from the IEU.
5.2.5 the MBUF PFnPC is loaded with the address of the trap handler routine.
5.2.5.1 the source of the trap address is either the TBR or FTB register, depending on the type of trap as determined by the trap number; both registers are provided in the set of special registers.
5.2.6 instructions are prefetched and dropped into the IFIFO for execution in a normal manner.
5.2.7 the instructions of the trap routine are then executed.
5.2.7.1 the trap handling routine may provide for the xPC address to be saved to a predefined location and interrupts re-enabled; the xPC register is read/write via a special register move instruction and the special register address and data bus.
5.2.8 the trap state must be exited by the execution of a return from trap instruction.
5.2.8.1 if prior saved, the xPC address must be restored from its predefined location before executing the return from trap instruction.
5.3 when a return from trap is executed:
5.3.1 interrupts are enabled.
5.3.2 the xPC address is returned to the current instruction stream register MBUF or EBUF PFnPC, as determined from the state of the procedure-in-progress bit, and prefetching continues from that address.
5.3.3 the xPC address is restored to the IF_PC register through the new stream register.
Interrupt and Exception Handling:
1) Overview:
Interrupts and exceptions will be processed, as long as they are enabled, regardless of whether the processor is executing from the main instruction stream or a procedural instruction stream. Interrupts and exceptions are serviced in priority order, and persist until cleared. The starting address of a trap handler is determined as the vector number offset into a predefined table of trap handler addresses as described below.
Interrupts and exceptions are of two basic types in the present embodiment, those which occur synchronously with particular instructions in the instruction stream, and those which occur asynchronously with particular instructions in the instruction stream. The terms interrupt, exception, trap and fault are used interchangeably herein. Asynchronous interrupts are generated by hardware, either on-chip or off-chip, which does not operate synchronously with the instruction stream. For example, interrupts generated by an on-chip timer/counter are asynchronous, as are hardware interrupts and non-maskable interrupts (NMI) provided from off-chip. When an asynchronous interrupt occurs, the processor context is frozen, all traps are disabled, certain processor status information is stored, and the processor vectors to an interrupt handler corresponding to the particular interrupt received. After the interrupt handler completes its processing, program execution continues with the instruction following the last completed instruction in the stream which was executing when the interrupt occurred.
Synchronous exceptions are those that occur synchronously with instructions in the instruction stream. These exceptions occur in relation to particular instructions, and are held until the relevant instruction is to be executed. In the preferred embodiments, synchronous exceptions arise during prefetch, during instruction decode, or during instruction execution. Prefetch exceptions include, for example, TLB miss or other VMU exceptions. Decode exceptions arise, for example, if the instruction being decoded is an illegal instruction or does not match the current privilege level of the processor. Execution exceptions arise due to arithmetic errors, for example, such as divide by zero. Whenever these exceptions occur, the preferred embodiments maintain them in correspondence with the particular instruction which caused the exception, until the time at which that instruction is to be retired. At that time, all prior completed instructions are retired, any tentative results from the instruction which caused the exception are flushed, as are the tentative results of any following tentatively executed instructions. Control is then transferred to an exception handler corresponding to the highest priority exception which occurred for that instruction.
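The retirement behavior described above may be sketched as follows, for illustration only. The function name and the tuple layout are hypothetical, and the sketch assumes that the highest trap number denotes the highest priority exception.

```python
def retire(window):
    """Retire instructions in program order until an exception is met.

    window  list of (name, completed, pending_trap_numbers) in program order
    Returns (retired names, flushed names, trap number to dispatch or None).
    """
    retired = []
    for i, (name, completed, traps) in enumerate(window):
        if traps:
            # Flush the faulting instruction and everything after it, then
            # dispatch on the highest priority exception noted against it.
            flushed = [n for n, _, _ in window[i:]]
            return retired, flushed, max(traps)
        if not completed:
            break
        retired.append(name)
    return retired, [], None
```

Note that instructions preceding the faulting one are retired normally in this model, so the exception is taken with all earlier results committed and no later results visible.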
Software trap instructions are detected at the IDecode stage by CF_DET 274 (Fig. 2) and are handled similarly to both unconditional call instructions and other synchronous traps. That is, a target address is calculated and prefetch continues to the then-current prefetch queue (EBUF or MBUF). At the same time, the exception is also noted in correspondence with the instruction and is handled when the instruction is to be retired. All other types of synchronous exceptions are merely noted and accumulated in correspondence with the particular instruction which caused them and are handled at execution time.
2) Asynchronous Interrupts:
Asynchronous interrupts are signaled to the PC logic unit 270 over interrupt lines 292. As shown in Figure 3, these lines are provided to the interrupt logic unit 363 in the PC logic unit 270, and comprise an NMI line, an IRQ line and a set of interrupt level lines (LVL). The NMI line signals a nonmaskable interrupt, and derives from an external source. It is the highest priority interrupt except for hardware reset. The IRQ line also derives from an external source, and indicates when an external device is requesting a hardware interrupt. The preferred embodiments permit up to 32 user-defined externally supplied hardware interrupts and the particular external device requesting the interrupt provides the number of the interrupt (0-31) on the interrupt level lines (LVL).
The memory error line is activated by the MCU 110 to signal various kinds of memory errors. Other asynchronous interrupt lines (not shown) are also provided to the interrupt logic unit 363, including lines for requesting a timer/counter interrupt, a memory I/O error interrupt, a machine check interrupt and a performance monitor interrupt. Each of the asynchronous interrupts, as well as each of the synchronous exceptions described below, has a corresponding predetermined trap number associated with it, 32 of these trap numbers being associated with the 32 available hardware interrupt levels. A table of these trap numbers is maintained in the interrupt logic unit 363. The higher the trap number, in general, the higher the priority of the trap.
When one of the asynchronous interrupts is signaled to the interrupt logic unit 363, the interrupt control unit 363 sends out an interrupt request to the IEU 104 over INT REQ/ACK lines 340. The interrupt control unit 363 also sends a suspend prefetch signal to the PC control unit 362 over lines 343, causing the PC control unit 362 to stop prefetching instructions. The IEU 104 either cancels all then-executing instructions, flushing all tentative results, or allows some or all instructions to complete. In the preferred embodiments, any then-executing instructions are canceled, thereby permitting the fastest response to asynchronous interrupts. In any event, the DPC in the execution PC control unit 366 is updated to correspond to the last instruction which has been completed and retired, before the IEU 104 acknowledges the interrupt. All other prefetched instructions in MBUF, EBUF, TBUF and IFIFO 264 are also cancelled.
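The response to an asynchronous interrupt in the preferred embodiments may be sketched as follows, for illustration only. The class `Core`, its attributes and the sample values are hypothetical stand-ins for the state named above.

```python
class Core:
    """Toy container for the state touched by an asynchronous interrupt."""

    def __init__(self):
        self.prefetching = True
        self.executing = ["i5", "i6"]      # tentatively executing instructions
        self.last_retired_pc = 0x1010
        self.dpc = 0x1018
        self.buffers = {"MBUF": [0x1020], "EBUF": [],
                        "TBUF": [0x2000], "IFIFO": [0x1010]}


def signal_interrupt(core):
    """Apply the interrupt response and return the cancelled instructions."""
    core.prefetching = False               # suspend prefetch
    cancelled = list(core.executing)
    core.executing.clear()                 # cancel then-executing instructions
    core.dpc = core.last_retired_pc        # DPC follows the last retired one
    for buf in core.buffers.values():
        buf.clear()                        # prefetched instructions cancelled
    return cancelled
```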
Only when the IEU 104 is ready to receive instructions from an interrupt handler does it send an interrupt acknowledge signal on INT REQ/ACK lines 340 back to the interrupt control unit 363. The interrupt control unit 363 then dispatches to the appropriate trap handler as described below.
3) Synchronous Exceptions:
For synchronous exceptions, the interrupt control unit 363 maintains a set of four internal exception bits (not shown) for each instruction set, one bit corresponding to each instruction in the set. The interrupt control unit 363 also maintains an indication of the particular trap numbers, if any, detected for each instruction.
If the VMU signals a TLB miss or another VMU exception while a particular instruction set is being prefetched, this information is transmitted to the PC logic unit 270, and in particular to the interrupt control unit 363, over the VMU control lines 332 and 334. When the interrupt control unit 363 receives such a signal, it signals the PC control unit 362 over line 343 to suspend further prefetches. At the same time, the interrupt control unit 363 sets the VM_Miss or VM_Excp bit, as appropriate, associated with the prefetch buffer to which the instruction set was destined. The interrupt control unit 363 then sets all four internal exception indicator bits corresponding to that instruction set, since none of the instructions in the set are valid, and stores the trap number for the particular exception received in correspondence with each of the four instructions in the faulty instruction set. The shifting and executing of instructions prior to the faulty instruction set then continues as usual until the faulty set reaches the lowest level in the IFIFO 264.
Similarly, if other synchronous exceptions are detected during the shifting of an instruction through the prefetch buffers 260, the IDecode unit 262 or the IFIFO 264, this information is also transmitted to the interrupt control unit 363, which sets the internal exception indicator bit corresponding to the instruction generating the exception and stores the trap number in correspondence with that exception. As with prefetch synchronous exceptions, the shifting and executing of instructions prior to the faulty instruction then continues as usual until the faulty set reaches the lowest level in the IFIFO 264.
In the preferred embodiments, the only type of exception which is detected during the shifting of an instruction through the prefetch buffers 260, the IDecode unit 262 or the IFIFO 264 is a software trap instruction. Software trap instructions are detected at the IDecode stage by CF_DET unit 274. While in some embodiments other forms of synchronous exceptions may be detected in the IDecode unit 262, it is preferred that the detection of any other synchronous exceptions wait until the instruction reaches the execution unit 104. This avoids the possibility that certain exceptions, such as those arising from the handling of a privileged instruction, might be signaled on the basis of a processor state which could change before the effective in-order execution of the instruction. Exceptions which do not depend on the processor state, such as illegal instruction, could be detected in the IDecode stage, but hardware is minimized if the same logic detects all pre-execution synchronous exceptions (apart from VMU exceptions). Nor is there any time penalty imposed by waiting until instructions reach the execution unit 104, since the handling of such exceptions is rarely time critical.
As mentioned, software trap instructions are detected at the IDecode stage by the CF_DET unit 274. The internal exception indicator bit corresponding to that instruction in the interrupt logic unit 363 is set, and the software trap number, which can be any number from 0 to 127 and which is specified in an immediate mode operand field of the software trap instruction, is stored in correspondence with the trap instruction. Unlike prefetch synchronous exceptions, however, since software traps are treated as both a control flow instruction and a synchronous exception, the interrupt control unit 363 does not signal PC control unit 362 to suspend prefetches when a software trap instruction is detected. Rather, at the same time the instruction is shifting through the IFIFO 264, the IFU 102 prefetches the trap handler into the MBUF instruction stream buffer.
When an instruction set reaches the lowest level of the IFIFO 264, the interrupt logic unit 363 transmits the exception indicator bits for that instruction set as a 4-bit vector to the IEU 104 over the SYNCH_INT_INFO lines 341 to indicate which, if any, of the instructions in the instruction set have already been determined to be the source of a synchronous exception. The IEU 104 does not respond immediately, but rather permits all the instructions in the instruction set to be scheduled in the normal course. Further exceptions, such as integer arithmetic exceptions, may be generated during execution. Exceptions which depend on the current state of the machine, such as those due to the execution of a privileged instruction, are also detected at this time. In order to ensure that the state of the machine is current with respect to all previous instructions in the instruction stream, all instructions which have a possibility of affecting the PSR (such as special move and return from trap instructions) are forced to execute in order.
Only when an instruction that is the source of a synchronous exception of any sort is about to be retired is the occurrence of the exception signaled to the interrupt logic unit 363.
The IEU 104 retires all instructions which have been tentatively executed and which occur in the instruction stream prior to the first instruction which has a synchronous exception, and flushes the tentative results from any tentatively executed instructions which occur subsequently in the instruction stream. The particular instruction that caused the exception is also flushed, since that instruction will typically be re-executed upon return from trap. The IF_PC in the execution PC control unit 366 is then updated to correspond to the last instruction actually retired, before any exception is signaled to the interrupt control unit 363.
When the instruction that is the source of an exception is retired, the IEU 104 returns to the interrupt logic unit 363, over the SYNCH_INT_INFO lines 341, both a new 4-bit vector indicating which, if any, instructions in the retiring instruction set (register 224) had a synchronous exception, and information indicating the source of the first exception in the instruction set. The information in the 4-bit exception vector returned by IEU 104 is an accumulation of the 4-bit exception vectors provided to the IEU 104 by the interrupt logic unit 363, as well as exceptions generated in the IEU 104. The remainder of the information returned from the IEU 104 to interrupt control unit 363, together with any information already stored in the interrupt control unit 363 due to exceptions detected on prefetch or IDecode, is sufficient for the interrupt control unit 363 to determine the nature of the highest priority synchronous exception and its trap number.
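The accumulation of the 4-bit exception vectors can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that bit 0 of the vector corresponds to the earliest instruction of the set; the function names and the bit ordering are hypothetical and are not taken from the specification.

```python
from typing import Optional

SET_SIZE = 4  # four instructions per instruction set

def accumulate_exception_vector(from_ifu: int, from_ieu: int) -> int:
    """OR the vector supplied by the interrupt logic unit with the
    exceptions raised during execution, keeping only four bits."""
    return (from_ifu | from_ieu) & 0xF

def first_exception_slot(vector: int) -> Optional[int]:
    """Return the position of the first faulting instruction in the set,
    or None if no bit is set (assumed ordering: bit 0 is earliest)."""
    for slot in range(SET_SIZE):
        if vector & (1 << slot):
            return slot
    return None
```

Under these assumptions, a software trap flagged at decode in position 2 and an arithmetic exception raised at execution in position 1 yield a combined vector with both bits set, and position 1 is reported as the first exception.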
4) Handler Dispatch and Return:
After an interrupt acknowledge signal is received over lines 340 from the IEU, or after a non-zero exception vector is received over lines 341, the current DPC is temporarily stored as a return address in an xPC register, which is one of the special registers 412 (Figure 4). The current processor status register (PSR) is also stored in a previous PSR (PPSR) register, and the current compare state register (CSR) is saved in a prior compare state register (PCSR) in the special registers 412.
The address of a trap handler is calculated as a trap base register address plus an offset. The PC logic unit 270 maintains two base registers for traps, both of which are part of the special registers 412 (Figure 4), and both of which are initialized by special move instructions executed previously. For most traps, the base register used to calculate the address of the handler is a trap base register TBR.
The interrupt control unit 363 determines the highest priority interrupt or exception currently pending and, through a look-up table, determines the trap number associated therewith. This is provided over a set of INT_OFFSET lines 373 to the prefetch PC control unit 364 as an offset to the selected base register. Advantageously, the vector address is calculated by merely concatenating the offset bits as low-order bits to the higher order bits obtained from the TBR register. This avoids any need for the delays of an adder. (As used herein, the 2^i bit is referred to as the i'th order bit.) For example, if traps are numbered from 0 through 255, represented as an 8 bit value, the handler address may be calculated by concatenating the 8 bit trap number to the end of a 22-bit TBR stored value. Two low-order zero bits may be appended to the trap number to ensure that the trap handler address always occurs on a word boundary. The concatenated handler address thus constructed is provided as one of the inputs, 373', to the prefetch selector PF_PC Sel 390 (Figure 4), and is selected as the next address from which instructions are to be prefetched.
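The concatenation described above (22 base bits, an 8 bit trap number, and two zero bits) can be sketched in Python as follows. The sketch assumes that the TBR is held as a 32-bit value of which only the high-order 22 bits are used; that layout and the function name are illustrative assumptions.

```python
def trap_vector_address(tbr: int, trap_number: int) -> int:
    """Form a 32-bit handler address by concatenation, with no adder."""
    if not 0 <= trap_number <= 0xFF:
        raise ValueError("trap number must fit in 8 bits")
    base = tbr & 0xFFFFFC00            # keep the high-order 22 bits of TBR
    return base | (trap_number << 2)   # 8-bit trap number, then two zero bits
```

Because the low ten bits of the base are masked off, a bitwise OR is equivalent to concatenation, and consecutive trap numbers map to addresses exactly one word (four bytes) apart.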
The vector handler addresses for traps using the TBR register are all only one word apart. Thus, the instruction at the trap handler address must be a preliminary branch instruction to a longer trap handling routine. Certain traps require very careful handling, however, to prevent degradation of system performance. TLB traps, for example, must be executed very quickly. For this reason, the preferred embodiments include a fast trap mechanism designed to allow the calling of small trap handlers without the cost of this preliminary branch. In addition, fast trap handlers can be located independently in memory, in on-chip ROM, for example, to eliminate memory system penalties associated with RAM locations.
In the preferred embodiments, the only traps which result in fast traps are the VMU exceptions mentioned above. Fast traps are numbered separately from other traps, and have a range from 0 to 7. However, they have the same priority as MMU exceptions. When the interrupt control unit 363 recognizes a fast trap as the highest priority trap then pending, it causes a fast trap base register (FTB) to be selected from the special registers 412 and provided on the lines 416 to be combined with the trap offset. The resulting vector address provided to the prefetch selector PF_PC Sel 390, via lines 373', is then a concatenation of the high-order 22 bits from the FTB register, followed by three bits representing the fast trap number, followed by seven bits of 0's. Thus, the fast trap addresses are 128 bytes, or 32 words, apart. When called, the processor branches to the starting word and may execute programs within the block or branch out of it. Execution of small programs, such as standard TLB handling routines which may be implemented in 32 instructions or fewer, is faster than ordinary traps because the preliminary branch to the actual exception handling routine is obviated.
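The fast trap vector layout (22 bits from the FTB register, three bits of fast trap number, seven zero bits) can be sketched in the same way. As before, treating the FTB as a 32-bit value whose high-order 22 bits are used is an assumption made for illustration, as is the function name.

```python
FAST_TRAP_BLOCK_BYTES = 1 << 7  # seven low-order zero bits: 128 bytes per handler

def fast_trap_vector_address(ftb: int, fast_trap_number: int) -> int:
    """Concatenate FTB[31:10], a 3-bit fast trap number and seven zero bits."""
    if not 0 <= fast_trap_number <= 7:
        raise ValueError("fast trap number must be in the range 0 to 7")
    base = ftb & 0xFFFFFC00               # high-order 22 bits of FTB
    return base | (fast_trap_number << 7)  # number occupies bits 9..7
```

Each handler therefore starts on a 128-byte boundary, which leaves room for 32 four-byte instructions before the next fast trap entry point.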
It should be noted that although all instructions have the same length of 4 bytes (i.e., occupy four address locations) in the preferred embodiments, the fast trap mechanism is also useful in microprocessors whose instructions are variable in length. In this case, it will be appreciated that the fast trap vector addresses should be separated by enough space to accommodate at least two of the shortest instructions available on the microprocessor, and preferably about 32 average-sized instructions. Certainly, if the microprocessor includes a return from trap instruction, the vector addresses should be separated by at least enough space to permit that instruction to be preceded by at least one other instruction in the handler.
Also on dispatch to a trap handler, the processor enters both a kernel mode and an interrupted state. Concurrently, a copy of the compare state register (CSR) is placed in the prior compare state register (PCSR) and a copy of the PSR is stored in the prior PSR (PPSR) register. The kernel mode and the interrupted state are represented by bits in the processor status register (PSR). Whenever the interrupted_state bit in the current PSR is set, the shadow registers or trap registers RT[24] through RT[31], as described above and as shown in Figure 7b, become visible. The interrupt handler may switch out of kernel mode merely by writing a new mode into the PSR, but the only way to leave the interrupted state is by executing a return from trap (RTT) instruction.
When the IEU 104 executes an RTT instruction, the PCSR register is restored to the CSR register and the PPSR register is restored to the PSR register, thereby automatically clearing the interrupted_state bit in the PSR register. The PF_PC SEL selector 390 also selects special register xPC in the special register set 412 as the next address from which to prefetch. xPC is restored to either the MBUF PFnPC or the EBUF PFnPC as appropriate, via incrementor 394 and bus 396. The decision as to whether to restore xPC into the EBUF or MBUF PFnPC is made according to the "procedure_in_progress" bit of the PSR, once restored.
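The state saved on dispatch and restored by RTT can be summarized with a small Python model. This is only a sketch: the bit positions chosen for the kernel and interrupted_state flags, the class name and the function names are hypothetical, and the model ignores the prefetch buffers and the shadow registers.

```python
from dataclasses import dataclass

KERNEL_MODE = 0x1        # hypothetical PSR bit position
INTERRUPTED_STATE = 0x2  # hypothetical PSR bit position

@dataclass
class SpecialRegisters:
    psr: int = 0
    ppsr: int = 0
    csr: int = 0
    pcsr: int = 0
    xpc: int = 0

def dispatch_trap(regs: SpecialRegisters, dpc: int, handler_address: int) -> int:
    """Save DPC, PSR and CSR, enter kernel mode and the interrupted state,
    and return the address from which prefetching continues."""
    regs.xpc = dpc
    regs.ppsr = regs.psr
    regs.pcsr = regs.csr
    regs.psr |= KERNEL_MODE | INTERRUPTED_STATE
    return handler_address

def return_from_trap(regs: SpecialRegisters) -> int:
    """Restore CSR and PSR from their backups and return the xPC value."""
    regs.csr = regs.pcsr
    regs.psr = regs.ppsr   # clears interrupted_state if it was clear before
    return regs.xpc
```

In this model the interrupted state is left only through `return_from_trap`, since that is the only operation that copies PPSR back into PSR.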
It should be noted that the processor does not use the same special register xPC to store the return address for both traps and procedural instructions. The return address for a trap is stored in the special register xPC, as mentioned, but the address to return to after a procedural instruction is stored in a different special register, uPC. Thus, the interrupted state remains available even while the processor is executing an emulation stream invoked by a procedural instruction. On the other hand, exception handling routines should not include any procedural instructions, since there is no special register to store an address for return to the exception handler after the emulation stream is complete.
5) Nesting:
Although certain processor status information is automatically backed up on dispatch to a trap handler, in particular CSR, PSR, the return PC, and in a sense the "A" register set ra[24] through ra[31], other context information is not protected. For example, the contents of a floating point status register (FSR) are not automatically backed up. If a trap handler intends to alter these registers, it must perform its own backup.
Because of the limited backup which is performed automatically on a dispatch to a trap handler, nesting of traps is not automatically permitted. A trap handler should back up any desired registers, clear any interrupt condition, read any information necessary for handling the trap from the system registers and process it as appropriate. Interrupts are automatically disabled upon dispatch to the trap handler. After processing, the handler can then restore the backed up registers, re-enable interrupts and execute the RTT instruction to return from the interrupt.
If nested traps are to be allowed, the trap handler should be divided into first and second portions. In the first portion, while interrupts are disabled, the xPC should be copied, using a special register move instruction, and pushed onto the stack maintained by the trap handler. The address of the beginning of the second portion of the trap handler should then be moved, using the special register move instruction, into the xPC, and a return from trap instruction (RTT) executed. The RTT removes the interrupted state (via the restoration of PPSR into PSR) and transfers control to the address in the xPC, which now contains the address of the second portion of the handler. The second portion may enable interrupts at this point and continue to process the exception in an interruptible mode. It should be noted that the shadow registers RT[24] through RT[31] are visible only in the first portion of this handler, and not in the second portion. Thus, in the second portion, the handler should preserve any of the "A" register values where these register values are likely to be altered by the handler. When the trap handling procedure is finished, it should restore all backed up registers, pop the original xPC off the trap handler stack and move it back into the xPC special register using a special register move instruction, and execute another RTT. This returns control to the appropriate instruction in the main or emulation instruction stream.
6) List of Traps:
The following Table I sets forth the trap numbers, priorities and handling modes of traps which are recognized in the preferred embodiment:
[Table I: image imgf000072_0001, not reproduced in this text]
The combined control and data path portions of IEU 104 are shown in Figure 5. The primary data path begins with the instruction/operand data bus 124 from the IFU 102. As a data bus, immediate operands are provided to an operand alignment unit 470 and passed on to a register file (REG ARRAY) 472. Register data is provided from the register file 472 through a bypass unit 474, via a register file output bus 476, to a parallel array of functional computing elements (FU₀₋ₙ) 478₀₋ₙ, via a distribution bus 480. Data generated by the functional units 478₀₋ₙ is provided back to the bypass unit 474 or the register array 472, or both, via an output bus 482.
A load/store unit 484 completes the data path portion of the IEU 104. The load/store unit 484 is responsible for managing the transfer of data between the IEU 104 and CCU 106. Specifically, load data obtained from the data cache 134 of the CCU 106 is transferred by the load/store unit 484 to an input of the register array 472 via a load data bus 486. Data to be stored to the data cache 134 of the CCU 106 is received from the functional unit distribution bus 480.
The control path portion of the IEU 104 is responsible for issuing, managing, and completing the processing of information through the IEU data path. In the preferred embodiments of the present invention, the IEU control path is capable of managing the concurrent execution of multiple instructions, and the IEU data path provides for multiple independent data transfers between essentially all data path elements of the IEU 104. The IEU control path operates in response to instructions received via the instruction/operand bus 124. Specifically, instruction sets are received by the EDecode unit 490. In the preferred embodiments of the present invention, the EDecode unit 490 receives and decodes both instruction sets held by the IFIFO master registers 216, 224. The results of the decoding of all eight instructions are variously provided to a carry checker (CRY CHKR) unit 492, dependency checker (DEP CHKR) unit 494, register renaming unit (REG RENAME) 496, instruction issuer (ISSUER) unit 498 and retirement control unit (RETIRE CTL) 500.
The carry checker unit 492 receives decoded information about the eight pending instructions from the EDecode unit 490 via control lines 502. The function of the carry checker 492 is to identify those of the pending instructions that either affect the carry bit of the processor status word or are dependent on the state of the carry bit. This control information is provided via control lines 504 to the instruction issuer unit 498.
Decoded information identifying the registers of the register file 472 that are used by the eight pending instructions is provided directly to the register renaming unit 496 via control lines 506. This information is also provided to the dependency checker unit 494. The function of the dependency checker unit 494 is to determine which of the pending instructions reference registers as the destination for data and which instructions, if any, are dependent on any of those destination registers. Those instructions that have register dependencies are identified by control signals provided via the control lines 508 to the register rename unit 496.
Finally, the EDecode unit 490 provides control information identifying the particular nature and function of each of the eight pending instructions to the instruction issuer unit 498 via control lines 510. The issuer unit 498 is responsible for determining the data path resources, particularly the availability of particular functional units, for the execution of pending instructions. In accordance with the preferred embodiments of the architecture 100, instruction issuer unit 498 allows for the out-of-order execution of any of the eight pending instructions subject to the availability of data path resources and carry and register dependency constraints. The register rename unit 496 provides the instruction issuer unit 498 with a bit map, via control lines 512, of those instructions that are suitably unconstrained to allow execution. Instructions that have already been executed (done) and those with register or carry dependencies are logically removed from the bit map.
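The bit map of issuable instructions can be illustrated with a brief Python sketch. It assumes that each of the eight pending instructions is represented by one bit (bit i for instruction i) in three hypothetical input masks; the mask names and the bit ordering are illustrative only.

```python
PENDING_MASK = 0xFF  # eight pending instructions, one bit each

def issuable_bitmap(done: int, register_blocked: int, carry_blocked: int) -> int:
    """Return a mask of instructions free to execute: those that are not
    already done and have no outstanding register or carry dependency."""
    constrained = done | register_blocked | carry_blocked
    return ~constrained & PENDING_MASK
```

For example, with instruction 0 already done, instruction 5 waiting on a register and instruction 2 waiting on the carry bit, the remaining five instructions stay in the bit map and may be issued as functional units become available.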
Depending on the availability of required functional units 478₀₋ₙ, the instruction issuer unit 498 may initiate the execution of multiple instructions during each system clock cycle. The status of the functional units 478₀₋ₙ is provided via a status bus 514 to the instruction issuer unit 498. Control signals for initiating, and subsequently managing, the execution of instructions are provided by the instruction issuer unit 498 on the control lines 516 to the register rename unit 496 and selectively to the functional units 478₀₋ₙ. In response, the register rename unit 496 provides register selection signals on a register file access control bus 518. The specific registers enabled via the control signals provided on the bus 518 are determined by the selection of the instruction being executed and by the determination by the register rename unit 496 of the registers referenced by that particular instruction.
A bypass control unit (BYPASS CTL) 520 generally controls the operation of the bypass data routing unit 474 via control signals on control lines 524. The bypass control unit 520 monitors the status of each of the functional units 478₀₋ₙ and, in conjunction with the register references provided from the register rename unit 496 via control lines 522, determines whether data is to be routed from the register file 472 to the functional units 478₀₋ₙ or whether data being produced by the functional units 478₀₋ₙ can be immediately routed via the bypass unit 474 to the functional unit distribution bus 480 for use in the execution of a newly issued instruction selected by the instruction issuer unit 498. In either case, the instruction issuer unit 498 directly controls the routing of data from the distribution bus 480 to the functional units 478₀₋ₙ by selectively enabling specific register data to each of the functional units 478₀₋ₙ.
The remaining units of the IEU control path include a retirement control unit 500, a control flow control (CF CTL) unit 528, and a done control (DONE CTL) unit 540. The retirement control unit 500 operates to void or confirm the execution of out-of-order executed instructions. Where an instruction has been executed out-of-order, that instruction can be confirmed or retired once all prior instructions have also been retired. Based on an identification, provided on the control lines 532, of which of the current set of eight pending instructions have been executed, the retirement control unit 500 provides control signals on control lines 534 coupled to the bus 518 to effectively confirm the result data stored by the register array 472 as the result of the prior execution of an out-of-order executed instruction. The retirement control unit 500 provides the PC increment/size control signals on control lines 344 to the IFU 102 as it retires each instruction. Since multiple instructions may be executed out-of-order, and therefore be ready for simultaneous retirement, the retirement control unit 500 determines a size value based on the number of instructions simultaneously retired. Finally, where all instructions of the IFIFO master register 224 have been executed and retired, the retirement control unit 500 provides the IFIFO read control signal on the control line 342 to the IFU 102 to initiate an IFIFO unit 264 shift operation, thereby providing the EDecode unit 490 with an additional four instructions as instructions pending execution.
The control flow control unit 528 performs the somewhat more specific function of detecting the logical branch result of each conditional branch instruction. The control flow control unit 528 receives an 8 bit vector identification of the currently pending conditional branch instructions from the EDecode unit 490 via the control lines 510.
An 8 bit vector instruction done control signal is similarly received via the control lines 538 from the done control unit 540. This done control signal allows the control flow control unit 528 to identify when a conditional branch instruction is done at least to a point sufficient to determine a conditional control flow status. The control flow status results for the pending conditional branch instructions are stored by the control flow control unit 528 as they are executed. The data necessary to determine the conditional control flow instruction outcome is obtained from temporary status registers in the register array 472 via the control lines 530. As each conditional control flow instruction is executed, the control flow control unit provides a new control flow result signal on the control lines 348 to the IFU 102. This control flow result signal preferably includes two 8 bit vectors defining whether the status results, by respective bit position, of the eight potentially pending control flow instructions are known and the corresponding status result states, also given by bit position correspondence.
Lastly, the done control unit 540 is provided to monitor the operational execution state of each of the functional units 478₀₋ₙ. As any of the functional units 478₀₋ₙ signal completion of an instruction execution operation, the done control unit 540 provides a corresponding done control signal on the control lines 542 to alert the register rename unit 496, instruction issuer unit 498, retirement control unit 500 and bypass control unit 520.
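The in-order retirement rule applied by the retirement control unit 500, under which an instruction executed out-of-order may be retired only once all prior instructions have been retired, can be sketched in Python as follows. The list representation of the done signals (oldest instruction first) and the function name are assumptions made for illustration.

```python
from typing import Sequence

def retire_count(done_flags: Sequence[bool]) -> int:
    """Count how many of the oldest pending instructions can retire together:
    the run of consecutive done instructions starting from the oldest."""
    count = 0
    for done in done_flags:
        if not done:
            break          # a not-yet-done instruction blocks all younger ones
        count += 1
    return count
```

The returned count corresponds to the size value described above: if the two oldest instructions are done but the third is not, two instructions retire together, even if a younger instruction has already executed out-of-order.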
The parallel array arrangement of the functional units 478₀₋ₙ enhances the control consistency of the IEU 104. The particular nature of the individual functional units 478₀₋ₙ must be known by the instruction issuer unit 498 in order for instructions to be properly recognized and scheduled for execution. The functional units 478₀₋ₙ are responsible for determining and implementing the specific control flow operation necessary to perform their requisite function. Thus, other than the instruction issuer 498, none of the IEU control units need to have independent knowledge of the control flow processing of an instruction. Together, the instruction issuer unit 498 and the functional units 478₀₋ₙ provide the necessary control signal prompting of the functions to be performed by the remaining control flow managing units 496, 500, 520, 528, 540. Thus, alteration in the particular control flow operation of a functional unit 478₀₋ₙ does not impact the control operation of the IEU 104. Further, the functional augmentation of an existing functional unit 478₀₋ₙ, and even the addition of one or more new functional units 478₀₋ₙ, such as an extended precision floating point multiplier and extended precision floating point ALU, a fast Fourier computation functional unit, and a trigonometric computational unit, require only minor modification of the instruction issuer unit 498. The required modifications must provide for recognition of the particular instruction, based on the corresponding instruction field isolated by the EDecode unit 490, and a correlation of the instruction to the required functional unit 478₀₋ₙ. Control over the selection of register data, routing of data, instruction completion and retirement remains consistent with the handling of all other instructions executed with respect to all other ones of the functional units 478₀₋ₙ.
A) IEU Data Path Detail:
The central element of the IEU data path is the register file 472. Within the IEU data path, however, the present invention provides for a number of parallel data paths optimized generally for specific functions. The two principal data paths are integer and floating point. Within each parallel data path, a portion of the register file 472 is provided to support the data manipulations occurring within that data path.
1) Register File Detail:
The preferred generic architecture of a data path register file is shown in Figure 6a. The data path register file 550 includes a temporary buffer 552, a register file array 564, an input selector 559, and an output selector 556. Data ultimately destined for the register array 564 is typically first received by the temporary buffer 552 through a combined data input bus 558'. That is, all data directed to the data path register file 550 is multiplexed by the input selector 559 from a number of input buses 558, preferably two, onto the input bus 558'. Register select and enable control signals provided on the control bus 518 select the register location for the received data within the temporary buffer 552. On retirement of an instruction that produced data stored in the temporary buffer, control signals again provided on the control bus 518 enable the transfer of the data from the temporary buffer 552 to a logically corresponding register within the register file array 564 via the data bus 560. However, prior to retirement of the instruction, data stored in the registers of the temporary buffer 552 may be utilized in the execution of subsequent instructions by routing the temporary buffer stored data to the output data selector 556 via a bypass portion of the data bus 560. The selector 556, controlled by a control signal provided via the control bus 518, selects between data provided from the registers of the temporary buffer 552 and of the register file array 564. The resulting data is provided on the register file output bus 564. Also, where an executing instruction will be retired on completion, i.e., the instruction has been executed in-order, the input selector 559 can be directed to route the result data directly to the register array 554 via bypass extension 558".
In accordance with the preferred embodiments of the present invention, each data path register file 550 permits two simultaneous register operations to occur. Thus, the input bus 558 provides for two full register width data values to be written to the temporary buffer 552. Internally, the temporary buffer 552 provides a multiplexer array permitting the simultaneous routing of the input data to any two registers within the temporary buffer 552. Similarly, internal multiplexers allow any five registers of the temporary buffer 552 to be selected to output data onto the bus 560. The register file array 564 likewise includes input and output multiplexers allowing two registers to be selected to receive, on bus 560, or five to source, via bus 562, respective data simultaneously. Finally, the register file output selector 556 is preferably implemented to allow any five of the ten register data values received via the buses 560, 562 to be simultaneously output on the register file output bus 564.
The register set within the temporary buffer is generally shown in Figure 6b. The register set 552' consists of eight single word (32 bit) registers I0RD, I1RD ... I7RD. The register set 552' may also be used as a set of four double word registers I0RD, I0RD+1 (I4RD); I1RD, I1RD+1 (I5RD); ... I3RD, I3RD+1 (I7RD).
In accordance with the present invention, rather than provide duplicate registers for each of the registers within the register file array 564, the registers in the temporary buffer register set 552 are referenced by the register rename unit 496 based on the relative location of the respective instructions within the two IFIFO master registers 216, 224. Each instruction implemented by the architecture 100 may reference for output up to two registers, or one double word register, for the destination of data produced by the execution of the instruction. Typically, an instruction will reference only a single output register. Thus, for instruction two (I2) of the eight pending instructions, positionally identified as shown in Figure 6c, that references a single output register, the data destination register I2RD will be selected to receive data produced by the execution of the instruction. Where the data produced by the instruction I2 is used by a subsequent instruction, for example I5, the data stored in the I2RD register will be transferred out via the bus 560 and the resultant data stored back to the temporary buffer 552 into the register identified as I5RD. Notably, instruction I5 is dependent on instruction I2. Instruction I5 cannot be executed until the result data from I2 is available. However, as can be seen, instruction I5 can execute prior to the retirement of instruction I2 by obtaining its required input data from the instruction I2 data location of the temporary buffer 552'.
Finally, as instruction I2 is retired, the data from the register I2RD is written to the register location within the register file array 564 as determined by the logical position of the instruction at the point of retirement. That is, the retirement control unit 500 determines the address of the destination registers in the register file array from the register reference field data provided from the EDecode unit 490 on the control lines 510. Once instructions I0-I3 have been retired, the values in I4RD-I7RD are shifted into I0RD-I3RD simultaneous with a shift of the IFIFO unit 264.
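The write, retire and shift sequence described above can be modelled roughly as follows. This is a minimal sketch under stated assumptions: the class `TempBuffer`, its method names, and the use of a dictionary for the register file array are all hypothetical and serve only to illustrate the data movement, not the actual hardware control.

```python
class TempBuffer:
    """Eight result slots I0RD..I7RD (illustrative model only)."""

    def __init__(self):
        self.slots = [None] * 8

    def write(self, slot, value):
        # Result of the instruction in pending position `slot`.
        self.slots[slot] = value

    def retire(self, slot, register_file, dest):
        # On retirement the result is copied to its destination
        # register in the register file array.
        register_file[dest] = self.slots[slot]

    def shift(self):
        # Once the four older instructions have retired, I4RD..I7RD
        # move down into I0RD..I3RD.
        self.slots = self.slots[4:] + [None] * 4
```

A result written for the instruction in position 5 is therefore found in position 1 after the shift, which mirrors the shift of the instructions themselves in the IFIFO.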
A complication arises where instruction I2 provides a double word result value. In accordance with a preferred embodiment of the present invention, a combination of locations I2RD and I6RD is used to store the data resulting from instruction I2 until that instruction is retired or otherwise cancelled. In the preferred embodiment, execution of instructions I4-I7 is held where a double word output reference by any of the instructions I0-I3 is detected by the register rename unit 496. This allows the entire temporary buffer 552' to be used as a single rank of double word registers. Once instructions I0-I3 have been retired, the temporary buffer 552' can again be used as two ranks of single word registers. Further, the execution of any instruction I4-I7 is held where a double word output register is required until the instruction has been shifted into a corresponding I0-I3 location.
The logical organization of the register file array 564 is shown in Figures 7a-b. In accordance with the preferred embodiments of the present invention, the register file array 564 for the integer data path consists of forty 32-bit wide registers. This set of registers, constituting a register set "A", is organized as a base register set ra[0..23] 565, a top set of general purpose registers ra[24..31] 566, and a shadow register set of eight general purpose trap registers rt[24..31]. In normal operation, the general purpose registers ra[0..31] 565, 566 constitute the active "A" register set of the register file array for the integer data path.
As shown in Figure 7b, the trap registers rt[24..31] 567 may be swapped into the active register set "A" to allow access along with the active base set of registers ra[0..23] 565. This configuration of the "A" register set is selected upon the acknowledgement of an interrupt or the execution of an exception trap handling routine. This state of the register set "A" is maintained until expressly returned to the state shown in Figure 7a by the execution of an enable interrupts instruction or execution of a return from trap instruction.
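The swapping of the trap registers into the active "A" register set can be illustrated with a small model. This is a sketch only: the class `RegisterSetA` and its method names are hypothetical, and the model ignores timing and the conditions under which the hardware makes the switch.

```python
class RegisterSetA:
    """Illustrative model of the forty-register "A" set (names are assumptions)."""

    def __init__(self):
        self.ra = [0] * 32        # ra[0..31]: base set plus top set
        self.rt = [0] * 8         # rt[24..31]: shadow trap registers
        self.trap_active = False  # False: Figure 7a state, True: Figure 7b state

    def _locate(self, n):
        if not 0 <= n <= 31:
            raise IndexError("register number must be in 0..31")
        if self.trap_active and n >= 24:
            return self.rt, n - 24  # references to 24..31 hit the trap registers
        return self.ra, n

    def read(self, n):
        bank, i = self._locate(n)
        return bank[i]

    def write(self, n, value):
        bank, i = self._locate(n)
        bank[i] = value & 0xFFFFFFFF  # registers are 32 bits wide

    def enter_trap(self):
        self.trap_active = True

    def return_from_trap(self):
        self.trap_active = False
```

Because only the top eight registers are swapped, a trap handler in this model sees the interrupted program's ra[0..23] unchanged while gaining eight registers of its own, without saving anything to memory.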
In the preferred embodiment of the present invention as implemented by the architecture 100, the floating point data path utilizes an extended precision register file array 572 as generally shown in Figure 8. The register file array 572 consists of 32 registers, rf[0..31], each having a width of 64 bits. The floating point register file 572 may also be logically referenced as a "B" set of integer registers rb[0..31]. In the architecture 100, this "B" set of registers is equivalent to the low-order 32 bits of each of the floating point registers rf[0..31]. Representing a third data path, a boolean operator register set 574 is provided, as shown in Figure 9, to store the logical result of boolean combinatorial operations. This "C" register set 574 consists of 32 single bit registers, rc[0..31]. The operation of the boolean register set 574 is unique in that the results of boolean operations can be directed to any instruction selected register of the boolean register set 574. This is in contrast to utilizing a single processor status word register that stores single bit flags for conditions such as equal, not equal, greater than and other simple boolean status values.
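The relationship between the "B" integer register set and the floating point registers described above amounts to a 32-bit mask. A minimal sketch follows, in which the function name `read_rb` and the list representation of rf[0..31] are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def read_rb(rf, n):
    """Read rb[n] as the low-order 32 bits of the 64-bit register rf[n]."""
    return rf[n] & MASK32
```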
Both the floating point register set 572 and the boolean register set 574 are complemented by temporary buffers architecturally identical to the integer temporary buffer 552 shown in Figure 6b. The essential difference is that the width of the temporary buffer registers is defined to be identical to those of the complementing register file array 572, 574; in the preferred implementation, 64 bits and one bit, respectively.
A number of additional special registers are at least logically present in the register array 472. The registers that are physically present in the register array 472, as shown in Figure 7c, include a kernel stack pointer 568, processor state register (PSR) 569, previous processor state register (PPSR) 570, and an array of eight temporary processor state registers (tPSR[0..7]) 571. The remaining special registers are distributed throughout various parts of the architecture 100. The special address and data bus 354 is provided to select and transfer data between the special registers and the "A" and "B" sets of registers. A special register move instruction is provided to select a register from either the "A" or "B" register set, the direction of transfer and to specify the address identifier of a special register.
The kernel stack pointer register and temporary processor state registers differ from the other special registers. The kernel stack pointer may be accessed through execution of a standard register to register move instruction when in kernel state. The temporary processor state registers are not directly accessible. Rather, this array of registers is used to implement an inheritance mechanism for propagating the value of the processor state register for use by out-of-order executing instructions. The initial propagation value is that of the processor state register: the value provided by the last retired instruction. This initial value is propagated forward through the temporary processor state registers so that any out-of-order executing instruction has access to the value in the positionally corresponding temporary processor state register. The specific nature of an instruction defines the condition code bits, if any, that the instruction is dependent on and may change. Where an instruction is unconstrained by dependencies, register or condition code, as determined by the register dependency checker unit 494 and carry dependency checker 492, the instruction can be executed out-of-order. Any modification of the condition code bits of the processor state register is directed to the logically corresponding temporary processor state register. Specifically, only those bits that may change are applied to the value in the temporary processor state register and propagated to all higher order temporary processor state registers. Consequently, every out-of-order executed instruction executes from a processor state register value modified appropriately by any intervening PSR modifying instructions. Retirement of an instruction only transfers the corresponding temporary processor state register value to the PSR register 569.
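The inheritance mechanism described above can be approximated by the following sketch. It is a simplification under stated assumptions: the class `ProcessorState`, the integer encoding of the state word, and the `mask`/`bits` parameters are hypothetical, and the model does not reproduce the dependency checking that decides when an instruction may actually execute.

```python
class ProcessorState:
    """Illustrative model of the PSR and tPSR[0..7] (names are assumptions)."""

    def __init__(self, psr=0):
        self.psr = psr
        # Initial propagation value: the PSR left by the last retired instruction.
        self.tpsr = [psr] * 8

    def update(self, slot, mask, bits):
        # Apply only the bits that the instruction may change, to the
        # corresponding tPSR and to all higher order tPSRs.
        for i in range(slot, 8):
            self.tpsr[i] = (self.tpsr[i] & ~mask) | (bits & mask)

    def retire(self, slot):
        # Retirement copies the corresponding tPSR value to the PSR.
        self.psr = self.tpsr[slot]
```

In this model an instruction in position 2 that sets one condition bit leaves tPSR[0..1] untouched, so older instructions are unaffected, while every younger position inherits the change.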
The remaining special registers are described in Table II.
TABLE II
Special Registers
Special Move Reg    R/W    Description
PC    R    Program Counters: in general, PCs maintain the next address of the currently executing program instruction stream.
IF_PC    R/W    IFU Program Counter: the IF_PC maintains the precise next execution address.
PFnPCs    R    Prefetch Program Counters: the MBUF, TBUF and EBUF PFnPCs maintain the next prefetch instruction addresses for the respective prefetch instruction streams.
uPC    R/W    Micro-Program Counter: maintains the address of the instruction following a procedural instruction. This is the address of the first instruction to be executed upon return from a procedural routine.
xPC    R/W    Interrupt/Exception Program Counter: holds the return address of an interrupt or an exception. The return address is the address of the IF_PC at the time of the trap.
TBR    W    Trap Base Register: base address of a vector table used for trap handling routine dispatching. Each entry is one word long. The trap number, provided by Interrupt Logic Unit 363, is used as an index into the table pointed to by this address.
FTB    W    Fast Trap Base Register: base address of an immediate trap handling routine table. Each table entry is 32 words and is used to directly implement a trap handling routine. The trap number, provided by Interrupt Logic Unit 363, times 32 is used as an offset into the table pointed to by this address.
PBR    W    Procedural Base Register: base address of a vector table used for procedural routine dispatching. Each entry is one word long, aligned on four word boundaries. The procedure number, provided as a procedural instruction field, is used as an index into the table pointed to by this address.
PSR    R/W    Processor State Register: maintains the processor status word. Status data bits include: carry, overflow, zero, negative, processor mode, current interrupt level, procedural routine being executed, divide by 0, overflow exception, hardware function enables, procedural enable, interrupt enable.
PPSR    R/W    Previous Processor State Register: loaded from the PSR on successful completion of an instruction or when an interrupt or trap is taken.
CSR    R/W    Compare State (Boolean) Register: the boolean register set accessible as a single word.
PCSR    R/W    Previous Compare State Register: loaded from the CSR on successful completion of an instruction or when an interrupt or trap is taken.
2) Integer Data Path Detail:
The integer data path of the IEU 104, constructed in accordance with the preferred embodiment of the present invention, is shown in Figure 10. For purposes of clarity, the many control path connections to the integer data path 580 are not shown. Those connections are defined with respect to Figure 5.
Input data for the data path 580 is obtained from the alignment units 582, 584 and the integer load/store unit 586. Integer immediate data values, originally provided as an instruction embedded data field, are obtained from the operand unit 470 via a bus 588. The alignment unit 582 operates to isolate the integer data value and provide the resulting value onto the output bus 590 to a multiplexer 592. A second input to the multiplexer 592 is the special register address and data bus 354.
Immediate operands obtained from the instruction stream are also obtained from the operand unit 570 via the data bus 594. These values are again right justified by the alignment unit 584 before provision onto an output bus 596.
The integer load/store unit 586 communicates bidirectionally via the external data bus 598 with the CCU 106. Inbound data to the IEU 104 is transferred by the integer load/store unit 586 onto the input data bus 600 to an input latch 602. Data output from the multiplexer 592 and latch 602 are provided on the multiplexer input buses 604, 606 of a multiplexer 608. Data from the functional unit output bus 482' is also received by the multiplexer 608. This multiplexer 608, in the preferred embodiments of the architecture 100, provides for two simultaneous data paths to the output multiplexer buses 610. Further, the transfer of data through the multiplexer 608 can be completed within each half cycle of the system clock. Since most instructions implemented by the architecture 100 utilize a single destination register, a maximum of four instructions can provide data to the temporary buffer 612 during each system clock cycle.
Data from the temporary buffer 612 can be transferred to an integer register file array 614, via temporary register output buses 616, or to an output multiplexer 620 via alternate temporary buffer register buses 618. Integer register array output buses 622 permit the transfer of integer register data to the multiplexer 620. The output buses connected to the temporary buffer 612 and integer register file array 614 each permit five register values to be output simultaneously. That is, two instructions referencing a total of up to five source registers can be issued simultaneously. The temporary buffer 612, register file array 614 and multiplexer 620 allow outbound register data transfers to occur every half system clock cycle. Thus, up to four integer and floating point instructions may be issued during each clock cycle.
The multiplexer 620 operates to select outbound register data values from the register file array 614 or directly from the temporary buffer 612. This allows out-of-order executed instructions with dependencies on prior out-of-order executed instructions to be executed by the IEU 104. This facilitates the twin goals of maximizing the execution through-put capability of the IEU integer data path by the out-of-order execution of pending instructions while precisely segregating out-of-order data results from data results produced by instructions that have been executed and retired. Whenever an interrupt or other exception condition occurs that requires the precise state of the machine to be restored, the present invention allows the data values present in the temporary buffer 612 to be simply cleared. The register file array 614 is therefore left to contain precisely those data values produced only by the execution of instructions completed and retired prior to the occurrence of the interrupt or other exception condition.
The up to five register data values selected during each half system clock cycle operation of the multiplexer 620 are provided via the multiplexer output buses 624 to an integer bypass unit 626. This bypass unit 626 is, in essence, a parallel array of multiplexers that provide for the routing of data presented at any of its inputs to any of its outputs. The bypass unit 626 inputs include the special register addressed data value or immediate integer value via the output bus 604 from the multiplexer 592, the up to five register data values provided on the buses 624, the load operand data from the integer load/store unit 586 via the double integer bus 600, the immediate operand value obtained from the alignment unit 584 via its output bus 596, and, finally, a bypass data path from the functional unit output bus 482. This bypass data path, and the data bus 482, provides for the simultaneous transfer of four register values per system clock cycle.
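The selection between the register file array 614 and the temporary buffer 612, and the clearing of the temporary buffer to restore precise state, can be sketched as follows. This is an illustrative model only: the function names and the `producers` mapping (from a register number to the temporary buffer position of a pending instruction that will write it) are assumptions standing in for the register rename logic.

```python
def read_source(reg, register_file, temp_buffer, producers):
    """Select a source operand the way the output multiplexer would.

    If a pending (not yet retired) instruction has produced a value for
    `reg`, take it from the temporary buffer; otherwise use the retired
    value held in the register file array.
    """
    slot = producers.get(reg)
    if slot is not None and temp_buffer[slot] is not None:
        return temp_buffer[slot]
    return register_file[reg]

def flush(temp_buffer, producers):
    """Discard all out-of-order results on an interrupt or exception."""
    for i in range(len(temp_buffer)):
        temp_buffer[i] = None
    producers.clear()
```

After `flush`, every read falls through to the register file array, which holds only results of retired instructions; nothing has to be undone or copied back.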
Data is output by the bypass unit 626 onto an integer bypass bus 628 that is connected to the floating point data path, to two operand data buses providing for the transfer out of up to five register data values simultaneously, and a store data bus 632 that is used to provide data to the integer load/store unit 586.
The functional unit distribution bus 480 is implemented through the operation of a router unit 634. Again, the router unit 634 is implemented by a parallel array of multiplexers that permit five register values received at its inputs to be routed to the functional units provided in the integer data path. Specifically, the router unit 634 receives the five register data values provided via the buses 630 from the bypass unit 626, the current IF_PC address value via the address bus 352 and the control flow offset value determined by the PC control unit 362 and as provided on the lines 378'. The router unit 634 may optionally receive, via the data bus 636, an operand data value sourced from a bypass unit provided within the floating point data path. The register data values received by the router unit 634 may be transferred onto the special register address and data bus 354 and to the functional units 640, 642, 644. Specifically, the router unit 634 is capable of providing up to three register operand values to each of the functional units 640, 642, 644 via router output buses 646, 648, 650. Consistent with the general architecture of the architecture 100, up to two instructions could be simultaneously issued to the functional units 640, 642, 644. The preferred embodiment of the present invention provides for three dedicated integer functional units, implementing respectively a programmable shift function and two arithmetic logic unit functions.
An ALU0 functional unit 644, ALU1 functional unit 642 and shifter functional unit 640 provide respective output register data onto the functional unit bus 482'. The output data produced by the ALU0 and shifter functional units 644, 640 are also provided onto a shared integer functional unit bus 650 that is coupled into the floating point data path. A similar floating point functional unit output value data bus 652 is provided from the floating point data path to the functional unit output bus 482'. The ALU0 functional unit 644 is used also in the generation of virtual address values in support of both the prefetch operations of the IFU 102 and data operations of the integer load/store unit 586. The virtual address value calculated by the ALU0 functional unit 644 is provided onto an output bus 654 that connects to both the target address bus 346 of the IFU 102 and to the CCU 106 to provide the execution unit physical address (EX PADDR). A latch 656 is provided to store the virtualizing portion of the address produced by the ALU0 functional unit 644. This virtualizing portion of the address is provided onto an output bus 658 to the VMU 108.
3) Floating Point Data Path Detail:
Referring now to Figure 11, the floating point data path 660 is shown. Initial data is again received from a number of sources including the immediate integer operand bus 588, immediate operand bus 594 and the special register address data bus 354. The final source of external data is a floating point load/store unit 662 that is coupled to the CCU 106 via the external data bus 598.
The immediate integer operand is received by an alignment unit 664 that functions to right justify the integer data field before submission to a multiplexer 666 via an alignment output data bus 668. The multiplexer 666 also receives the special register address data bus 354. Immediate operands are provided to a second alignment unit 670 for right justification before being provided on an output bus 672. Inbound data from the floating point load/store unit 662 is received by a latch 674 from a load data bus 676. Data from the multiplexer 666, latch 674 and a functional unit data return bus 482" is received on the inputs of a multiplexer 678. The multiplexer 678 provides for selectable data paths sufficient to allow two register data values to be written to a temporary buffer 680, via the multiplexer output buses 682, each half cycle of the system clock. The temporary buffer 680 incorporates a register set logically identical to the temporary buffer 552' as shown in Figure 6b. The temporary buffer 680 further provides for up to five register data values to be read from the temporary buffer 680 to a floating point register file array 684, via data buses 686, and to an output multiplexer 688 via output data buses 690. The multiplexer 688 also receives, via data buses 692, up to five register data values from the floating point register file array 684 simultaneously. The multiplexer 688 functions to select up to five register data values for simultaneous transfer to a bypass unit 694 via data buses 696. The bypass unit 694 also receives the immediate operand value provided by the alignment unit 670 via the data bus 672, the output data bus 698 from the multiplexer 666, the load data bus 676 and a data bypass extension of the functional unit data return bus 482".
The bypass unit 694 operates to select up to five simultaneous register operand data values for output onto the bypass unit output buses 700, a store data bus 702 connected to the floating point load/store unit 662, and the floating point bypass bus 636 that connects to the router unit 634 of the integer data path 580.
A floating point router unit 704 provides for simultaneous selectable data paths between the bypass unit output buses 700 and the integer data path bypass bus 628 and functional unit input buses 706, 708, 710 coupled to the respective functional units 712, 714, 716. Each of the input buses 706, 708, 710, in accordance with the preferred embodiment of the architecture 100, permits the simultaneous transfer of up to three register operand data values to each of the functional units 712, 714, 716. The output buses of these functional units 712, 714, 716 are coupled to the functional unit data return bus 482" for returning data to the register file input multiplexer 678. The integer data path functional unit output bus 650 may also be provided to connect to the functional unit data return bus 482". The architecture 100 does provide for a connection of the functional unit output buses of a multiplier functional unit 712 and a floating point ALU 714 to be coupled via the floating point data path functional unit bus 652 to the functional unit data return bus 482' of the integer data path 580.
4) Boolean Register Data Path Detail:
The boolean operations data path 720 is shown in Figure 12. This data path 720 is utilized in support of the execution of essentially two types of instructions. The first type is an operand comparison instruction where two operands, selected from the integer register sets, floating point register sets or provided as immediate operands, are compared by subtraction in one of the ALU functional units of the integer and floating point data paths. Comparison is performed by a subtraction operation by any of the ALU functional units 642, 644, 714, 716 with the resulting sign and zero status bits being provided to a combined input selector and comparison operator unit 722. This unit 722, in response to instruction identifying control signals received from the EDecode unit 490, selects the output of an ALU functional unit 642, 644, 714, 716 and combines the sign and zero bits to extract a boolean comparison result value. An output bus 723 allows the results of the comparison operation to be transferred simultaneously to an input multiplexer 726 and a bypass unit 742. As in the integer and floating point data paths, the bypass unit 742 is implemented as a parallel array of multiplexers providing multiple selectable data paths between the inputs of the bypass unit 742 to multiple outputs. The other inputs of the bypass unit 742 include a boolean operation result return data bus 724 and two boolean operands on data buses 744. The bypass unit 742 permits boolean operands representing up to two simultaneously executing boolean instructions to be transferred to a boolean operation functional unit 746, via operand buses 748. The bypass unit 742 also permits transfer of up to two single bit boolean operand bits (CF0, CF1) to be simultaneously provided on the control flow result control lines 750, 752.
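The derivation of a boolean comparison result from the sign and zero bits of a subtraction can be illustrated as follows. This is a sketch under stated assumptions: the condition names (`eq`, `lt` and so on) are hypothetical, since the actual condition codes are those of Tables III and IV, and Python's unbounded integers sidestep the overflow handling that fixed-width hardware would need.

```python
def compare(a, b, condition):
    """Return the boolean result of comparing a with b by subtraction."""
    diff = a - b
    sign = diff < 0    # sign status bit of the subtraction result
    zero = diff == 0   # zero status bit of the subtraction result
    results = {
        "eq": zero,
        "ne": not zero,
        "lt": sign,
        "ge": not sign,
        "gt": (not sign) and (not zero),
        "le": sign or zero,
    }
    return results[condition]
```

Each comparison thus reduces to a single bit, which is what allows the result to be stored in any selected one-bit boolean register rather than in a shared status word.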
The remainder of the boolean operation data path 720 includes the input multiplexer 726 that receives as its inputs, the comparison and the boolean operation result values provided on the comparison result bus 723 and a boolean result bus 724. The bus 724 permits up to two simultaneous boolean result bits to be transferred to the multiplexer 726. In addition, up to two comparison result bits may be transferred via the bus 723 to the multiplexer 726. The multiplexer 726 permits any two single bits presented at the multiplexer inputs to be transferred via the multiplexer output buses 730 to a boolean operation temporary buffer 728 during each half cycle of the system clock. The temporary buffer 728 is logically equivalent to the temporary buffer 552', as shown in Figure 6b, though differing in two significant respects. The first respect is that each register entry in the temporary buffer 728 consists of a single bit. The second distinction is that only a single register is provided for each of the eight pending instruction slots, since the result of a boolean operation is, by definition, fully defined by a single result bit.
The temporary buffer 728 provides up to four output operand values simultaneously. This allows the simultaneous execution of two boolean instructions, each requiring access to two source registers. The four boolean register values may be transferred during each half cycle of the system clock onto the operand buses 736 to a multiplexer 738 or to a boolean register file array 732 via the boolean operand data buses 734. The boolean register file array 732, as logically depicted in Figure 9, is a single 32 bit wide data register that permits any separate combination of up to four single bit locations to be modified with data from the temporary buffer 728 and read from the boolean register file array 732 onto the output buses 740 during each half cycle of the system clock. The multiplexer 738 provides for any two pairs of boolean operands received at its inputs via the buses 736, 740 to be transferred onto the operand output buses 744 to the bypass unit 742.
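Treating the boolean register file as one 32 bit word with individually addressable bits can be sketched as follows. The class `BooleanRegisterFile` and its method names are hypothetical, and the model handles one bit per call rather than the up to four simultaneous accesses the hardware permits.

```python
class BooleanRegisterFile:
    """Illustrative model of rc[0..31] held in a single 32 bit word."""

    def __init__(self):
        self.word = 0

    def read(self, n):
        # Extract the single bit register rc[n].
        return (self.word >> n) & 1

    def write(self, n, bit):
        # Modify only bit n, leaving the other 31 registers unchanged.
        self.word = (self.word & ~(1 << n)) | ((bit & 1) << n)
```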
The boolean operation functional unit 746 is capable of performing a wide range of boolean operations on two source values. In the case of comparison instructions, the source values are a pair of operands obtained from any of the integer and floating point register sets and any immediate operand provided to the IEU 104, and, for a boolean instruction, any two boolean register operands. Tables III and IV identify the logical comparison operations provided by the preferred embodiment of the architecture 100. Table V identifies the direct boolean operations provided by the preferred implementation of the architecture 100. The instruction condition codes and function codes specified in the Tables III-V represent a segment of the corresponding instructions. The instruction also provides an identification of the source pair of operand registers and the destination boolean register for storage of the corresponding boolean operation result.
Figure imgf000097_0001
*rs = register source
Figure imgf000098_0001
*bs = boolean source register
5) Load/Store Control Unit:
An exemplary load/store unit 760 is shown in Figure 13. Although separately shown in the data paths 580, 660, the load/store units 586, 662 are preferably implemented as a single shared load/store unit 760. The interface from a respective data path 580, 660 is via an address bus 762 and load and store data buses 764 (600, 676), 766 (632, 702).
The address utilized by the load/store unit 760 is a physical address as opposed to the virtual address utilized by the IFU 102 and the remainder of the IEU 104. While the IFU 102 operates on virtual addresses, relying on coordination between the CCU 106 and VMU 108 to produce a physical address, the IEU 104 requires the load/store unit 760 to operate directly in a physical address mode. This requirement is necessary to insure data integrity in the presence of out-of-order executed instructions that may involve overlapping physical address data load and store operations and in the presence of out-of-order data returns from the CCU 106 to the load/store unit 760. In order to insure data integrity, the load/store unit 760 buffers data provided by store instructions until the store instruction is retired by the IEU 104. Consequently, store data buffered by the load/store unit 760 may be uniquely present only in the load/store unit 760. Load instructions referencing the same physical address as executed but not retired store instructions are delayed until the store instruction is actually retired. At that point the store data may be transferred to the CCU 106 by the load/store unit 760 and then immediately loaded back by the execution of a CCU data load operation.
Specifically, full physical addresses are provided from the VMU 108 onto the load/store address bus 762. Load addresses are, in general, stored in load address registers 768₀₋₃. Store addresses are latched into store address registers 770₀₋₃. A load/store control unit 774 operates in response to control signals received from the instruction issuer unit 498 in order to coordinate latching of load and store addresses into the registers 768₀₋₃, 770₀₋₃. The load/store control unit 774 provides control signals on control lines 778 for latching load addresses and on control lines 780 for latching store addresses.
Store data is latched simultaneous with the latching of store addresses in logically corresponding slots of the store data register set 782₀₋₃. A 4x4x32 bit wide address comparator unit 772 is simultaneously provided with each of the addresses in the load and store address registers 768₀₋₃, 770₀₋₃. The execution of a full matrix address comparison during each half cycle of the system clock is controlled by the load/store control unit 774 via control lines 776. The existence and logical location of a load address that matches a store address is provided via control signals returned to the load/store control unit 774 via control lines 776. Where a load address is provided from the VMU 108 and there are no pending stores, the load address is bypassed directly from the bus 762 to an address selector 786 concurrent with the initiation of a CCU load operation. However, where store data is pending, the load address will be latched in an available load address latch 768₀₋₃. Upon receipt of a control signal from the retirement control unit 500, indicating that the corresponding store data instruction is retiring, the load/store control unit 774 initiates a CCU data transfer operation by arbitrating, via control lines 784, for access to the CCU 106. When the CCU 106 signals ready, the load/store control unit 774 directs the selector 786 to provide a CCU physical address onto the CCU PADDR address bus 788. This address is obtained from the corresponding store register 770₀₋₃ via the address bus 790. Data from the corresponding store data register 782₀₋₃ is provided onto the CCU data bus 792.
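The full matrix comparison of four load addresses against four store addresses can be modelled as follows. This is a minimal sketch: the function names are hypothetical, empty latches are represented by `None` as an assumption of the model, and the real comparator performs all sixteen comparisons in parallel rather than in nested loops.

```python
def match_matrix(load_addrs, store_addrs):
    """Compare every latched load address with every latched store address.

    Entry [i][j] is True where load latch i and store latch j both hold
    an address and the two physical addresses are equal.
    """
    return [
        [la is not None and sa is not None and la == sa for sa in store_addrs]
        for la in load_addrs
    ]

def load_must_wait(load_slot, matrix):
    """A load is delayed while any pending store matches its address."""
    return any(matrix[load_slot])
```

A load whose row contains no match can proceed to the CCU at once, whereas a load that matches a buffered store waits until that store retires and its data has been written out.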
Upon issuance of a load instruction by the instruction issuer 498, the load/store control unit 774 enables one of the load address latches 768₀₋₃ to latch the requested load address. The specific latch 768₀₋₃ selected logically corresponds to the position of the load instruction in the relevant instruction set. The instruction issuer 498 provides the load/store control unit 774 with a five bit vector identifying the load instruction within either of the two possible pending instruction sets. Where the comparator 772 does not identify a matching store address, the load address is routed via an address bus 794 to the selector 786 for output onto the CCU PADDR address bus 788. Provision of the address is performed in concert with CCU request and ready control signals being exchanged between the load/store control unit 774 and CCU 106. An execution ID value (ExID) is also prepared and issued by the load/store control unit 774 to the CCU 106 in order to identify the load request when the CCU 106 subsequently returns the requested data including the ExID value. This ID value consists of a four bit vector utilizing unique bits to identify the respective load address latch 768₀₋₃ from which the current load request is generated. A fifth bit is utilized to identify the instruction set that contains the load instruction. The ID value is thus the same as the bit vector provided with the load request from the instruction issuer unit 498.
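The five bit ExID described above, with one unique bit per load address latch plus a fifth bit for the instruction set, can be illustrated with a simple encoder and decoder. The function names are hypothetical, and placing the instruction set bit in the most significant position is an assumption; the specification states only that a fifth bit is used.

```python
def encode_exid(latch, instruction_set):
    """Build a five bit ExID: one-hot latch number plus an instruction set bit."""
    if latch not in (0, 1, 2, 3) or instruction_set not in (0, 1):
        raise ValueError("latch must be 0..3 and instruction_set 0 or 1")
    return (instruction_set << 4) | (1 << latch)

def decode_exid(exid):
    """Recover (latch, instruction_set) from a returned ExID."""
    one_hot = exid & 0xF
    if one_hot not in (1, 2, 4, 8):
        raise ValueError("exactly one latch bit must be set")
    return one_hot.bit_length() - 1, (exid >> 4) & 1
```

Because the CCU may return load data out of order, the control unit in this model relies solely on the returned ExID to associate the data with the originating load.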
On subsequent signal from the CCU 106 to the load/store control unit 774 of the availability of prior requested load data, the load/store control unit 774 enables an alignment unit 798 to receive the data and provide it on the load data bus 764. The alignment unit 798 operates to right justify the load data.
Simultaneously with the return of data from the CCU 106, the load/store control unit 774 receives the ExID value from the CCU 106. The load/store control unit 774, in turn, provides a control signal to the instruction issuer unit 498 identifying that load data is being provided on the load data bus 764 and, further, returns a bit vector identifying the load instruction for which the load data is being returned.
C) IEU Control Path Detail:
Referring again to Figure 5, the operation of the IEU control path will now be deεcribed in detail with respect to the timing diagram provided in Figure 14. The timing of the execution of inεtructionε repreεented in Figure 14 iε exemplary of the operation of the preεent invention, and not exhauεtive of execution timing permutationε .
The timing diagram of Figure 14 shows a sequence of processor system clock cycles, P₀₋ₙ. Each processor cycle begins with an internal T Cycle, T0. There are two T cycles per processor cycle in a preferred embodiment of the present invention as provided for by the architecture 100.
In processor cycle zero, the IFU 102 and the VMU 108 operate to generate a physical address. The physical address is provided to the CCU 106 and an instruction cache access operation is initiated. Where the requested instruction set is present in the instruction cache 132, an instruction set is returned to the IFU 102 at about the mid-point of processor cycle one. The IFU 102 then manages the transfer of the instruction set through the prefetch unit 260 and IFIFO 264, whereupon the instruction set is first presented to the IEU 104 for execution.
1) EDecode Unit Detail:
The EDecode unit 490 receives the full instruction set in parallel for decoding prior to the conclusion of processor cycle one. The EDecode unit 490, in the preferred architecture 100, is implemented as a pure combinatorial logic block that provides for the direct parallel decoding of all valid instructions that are received via the bus 124. Each type of instruction recognized by the architecture 100, including the specification of the instruction, register requirements and resource needs, is identified in Table VI.
TABLE VI
Instruction/Specifications

Instruction: Control and Operand Information*

Move Register to Register:
  Logical/Arithmetic Function Code: specifies Add, Subtract, Multiply, Shift, etc.
  Destination Register
  Set PSR only
  Source Register 1
  Source Register 2 or Immediate constant value
  Register Set A/B select

Move Immediate to Register:
  Destination Register
  Immediate Integer or Floating Point constant value
  Register Set A/B select

Load/Store Register:
  Operation Function Code: specifies Load or Store, use immediate value, base and immediate value, or base and offset
  Source/Destination Register
  Base Register
  Index Register or Immediate constant value
  Register Set A/B select

Immediate Call:
  Signed Immediate Displacement

Control Flow:
  Operation Function Code: specifies branch type and triggering condition
  Base Register
  Index Register, Immediate constant displacement value, or Trap Number
  Register Set A/B select

Special Register Move:
  Operation Function Code: specifies move to/from special/integer register
  Special Register Address Identifier
  Source/Destination Register
  Register Set A/B select

Convert Integer Move:
  Operation Function Code: specifies type of floating point to integer conversion
  Source/Destination Register
  Register Set A/B select

Boolean Functions:
  Boolean Function Code: specifies And, Or, etc.
  Destination boolean register
  Source Register 1
  Source Register 2
  Register Set A/B select

Extended Procedure:
  Procedure specifier: specifies address offset from procedural base value
  Operation: value passed to procedure routine

Atomic Procedure:
  Procedure specifier: specifies address value

* - instruction includes these fields in addition to a field that decodes to identify the instruction.
The EDecode unit 490 decodes each instruction of an instruction set in parallel. The resulting identification of instructions, instruction functions, register references and function requirements is made available on the outputs of the EDecode unit 490. This information is regenerated and latched by the EDecode unit 490 during each half processor cycle until all instructions in the instruction set are retired. Thus, information regarding all eight pending instructions is constantly maintained at the output of the EDecode unit 490. This information is presented in the form of eight element bit vectors where the bits or sub-fields of each vector logically correspond to the physical location of the corresponding instruction within the two pending instruction sets. Thus, eight vectors are provided via the control lines 502 to the carry checker 492, where each vector specifies whether the corresponding instruction affects or is dependent on the carry bit of the processor status word. Eight vectors are provided via the control lines 510 to identify the specific nature of each instruction and the function unit requirements. Eight vectors are provided via the control lines 506 specifying the register references used by each of the eight pending instructions. These vectors are provided prior to the end of processor cycle one.
2) Carry Checker Unit Detail:
The carry checker unit 492 operates in parallel with the dependency check unit 494 during the data dependency phase of operation shown in Figure 14. The carry checker unit 492 is implemented in the preferred architecture 100 as pure combinatorial logic. Thus, during each iteration of operation by the carry checker unit 492, all eight instructions are considered with respect to whether they modify the carry flag of the processor state register. This is necessary in order to allow the out-of-order execution of instructions that depend on the state of the carry bit as set by prior instructions. Control signals provided on the control lines 504 allow the carry checker unit 492 to identify the specific instructions that are dependent on the execution of prior instructions with respect to the carry flag.
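As a rough software analogy of the carry dependency check described above (the unit itself is combinatorial logic), the following sketch finds, for each pending instruction that uses the carry flag, the nearest prior pending instruction that modifies it. The function name carry_dependencies and the two boolean input lists are hypothetical stand-ins for the carry vectors supplied by the EDecode unit.

```python
# Illustrative sketch only; names and data layout are assumptions.

def carry_dependencies(sets_carry, uses_carry):
    """Return, per instruction, the index of the nearest prior instruction
    that modifies the carry flag, or None if the instruction does not use
    carry or no pending prior instruction modifies it."""
    deps = []
    last_setter = None  # most recent carry-modifying instruction seen so far
    for sets, uses in zip(sets_carry, uses_carry):
        deps.append(last_setter if uses else None)
        if sets:
            last_setter = len(deps) - 1
    return deps
```

An instruction whose entry is None has no carry dependency on any pending instruction and, as far as the carry flag is concerned, is free to execute out of order.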
In addition, the carry checker unit 492 maintains a temporary copy of the carry bit for each of the eight pending instructions. For those instructions that do not modify the carry bit, the carry checker unit 492 propagates the carry bit to the next instruction forward in the order of the program instruction stream. Thus, an out-of-order executed instruction that modifies the carry bit can be executed and, further, a subsequent instruction that is dependent on such an out-of-order executed instruction may also be allowed to execute, though subsequent to the instruction that modifies the carry bit. Further, maintenance of the carry bit by the carry checker unit 492 facilitates out-of-order execution in that any exception occurring prior to the retirement of those instructions merely requires the carry checker unit 492 to clear the internal temporary carry bit register. Consequently, the processor status register is unaffected by the execution of out-of-order executed instructions. The temporary carry bit register maintained by the carry checker unit 492 is updated upon completion of each out-of-order executed instruction. Upon retirement of out-of-order executed instructions, the carry bit corresponding to the last retired instruction in the program instruction stream is transferred to the carry bit location of the processor status register.
3) Data Dependency Checker Unit Detail:
The data dependency checker unit 494 receives the eight register reference identification vectors from the EDecode unit 490 via the control lines 506. Each register reference is indicated by a five bit value, suitable for identifying any one of 32 registers at a time, and a two bit value that identifies the register bank as located within the "A", "B" or boolean register sets. The floating point register set is equivalently identified as the "B" register set. Each instruction may have up to three register reference fields: two source register fields and one destination. Although some instructions, most notably the move register to register instructions, may specify a destination register, an instruction bit field recognized by the EDecode unit 490 may signify that no actual output data is to be produced. Rather, execution of the instruction is only for the purpose of determining an alteration of the value of the processor status register. The data dependency checker 494, implemented again as pure combinatorial logic in the preferred architecture 100, operates to simultaneously determine dependencies between source register references of instructions subsequent in the program instruction stream and destination register references of relatively prior instructions. A bit array is produced by the data dependency checker 494 that identifies not only which instructions are dependent on others, but also the registers upon which each dependency arises.
The carry and register data dependencies are identified shortly after the beginning of the second processor cycle.
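The register dependency determination can be sketched in software as follows. This is a sequential approximation of what the text describes as combinatorial logic, and it assumes that each source reference depends on the nearest prior pending instruction writing the same register. The Instr type, the (bank, register) tuples and the function name dependency_array are hypothetical.

```python
# Illustrative sketch only; the data model is an assumption.
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[Tuple[str, int]]    # (bank, register number) or None
    srcs: Tuple[Tuple[str, int], ...]  # up to two (bank, register number) sources


def dependency_array(pending):
    """For each pending instruction, map the index of each prior instruction
    it depends on to the register through which the dependency arises."""
    deps = []
    for j, consumer in enumerate(pending):
        row = {}
        for src in consumer.srcs:
            # Scan backwards for the nearest prior producer of this register.
            for i in range(j - 1, -1, -1):
                if pending[i].dest == src:
                    row[i] = src
                    break
        deps.append(row)
    return deps
```

Each row records both which prior instructions a given instruction waits on and the register responsible, mirroring the two kinds of information the bit array is said to carry.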
4) Register Rename Unit Detail:
The register rename unit 496 receives the identification of the register references of all eight pending instructions via the control lines 506, and register dependencies via the control lines 508. A matrix of eight elements is also received via the control lines 542 that identifies those instructions within the current set of pending instructions that have been executed (done). From this information, the register rename unit 496 provides an eight element array of control signals to the instruction issuer unit 498 via the control lines 512. The control information so provided reflects the determination made by the register rename unit 496 as to which of the currently pending instructions, that have not already been executed, are now available to be executed given the current set of identified data dependencies. The register rename unit 496 receives a selection control signal via the lines 516 that identifies up to six instructions that are to be simultaneously issued for execution: two integer, two floating point and two boolean. The register rename unit 496 performs the additional function of selecting, via control signals provided on the bus 518 to the register file array 472, the source registers for access in the execution of the identified instructions. Destination registers for out-of-order executed instructions are selected as being in the temporary buffers 612, 680, 728 of the corresponding data path. In-order executed instructions are retired on completion with result data being stored through to the register files 614, 684, 732. The selection of source registers depends on whether the register has been prior selected as a destination and the corresponding prior instruction has not yet been retired. In such an instance, the source register is selected from the corresponding temporary buffer 612, 680, 728.
Where the prior instruction has been retired, then the register of the corresponding register file 614, 684, 732 is selected. Consequently, the register rename unit 496 operates to effectively substitute temporary buffer register references for register file register references in the case of out-of-order executed instructions.
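The source selection rule can be summarized by a minimal sketch. It assumes each pending instruction is represented by a dictionary with hypothetical "dest" and "retired" keys, and that the temporary buffer slot is identified by the position of the producing instruction among the pending instructions.

```python
# Illustrative sketch only; the dictionary layout and names are assumptions.

def select_source(src_reg, pending, consumer_index):
    """Decide where a source operand is read from.

    Returns ("temp", slot) when the nearest prior pending instruction that
    writes src_reg has not yet retired, otherwise ("file", src_reg)."""
    for i in range(consumer_index - 1, -1, -1):
        if pending[i]["dest"] == src_reg:
            if not pending[i]["retired"]:
                return ("temp", i)      # read the producer's temporary buffer slot
            return ("file", src_reg)    # producer retired: register file is current
    return ("file", src_reg)            # no pending producer at all
```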
As implemented in the architecture 100, the temporary buffers 612, 680, 728 are not duplicate register structures of their corresponding register file arrays. Rather, a single destination register slot is provided for each of eight pending instructions. Consequently, the substitution of a temporary buffer destination register reference is determined by the location of the corresponding instruction within the pending register sets. A subsequent source register reference is identified by the data dependency checker 494 with respect to the instruction from which the source dependency occurs. Therefore, a destination slot in the temporary buffer register is readily determinable by the register rename unit 496.
5) Instruction Issuer Unit Detail:
The instruction issuer unit 498 determines the set of instructions that can be issued, based on the output of the register rename unit 496 and the function requirements of the instructions as identified by the EDecode unit 490. The instruction issuer unit 498 makes this determination based on the status of each of the functional units 478₀₋ₙ as reported via control lines 514. Thus, the instruction issuer unit 498 begins operation upon receipt of the available set of instructions to issue from the register rename unit 496. Given that a register file access is required for the execution of each instruction, the instruction issuer unit 498 anticipates the availability of a functional unit 478₀₋ₙ that may be currently executing an instruction. In order to minimize the delay in identifying the instructions to be issued to the register rename unit 496, the instruction issuer unit 498 is implemented in dedicated combinatorial logic. Upon identification of the instructions to issue, the register rename unit 496 initiates a register file access that continues to the end of the third processor cycle, P2. At the beginning of processor cycle P3, the instruction issuer unit 498 initiates operation by one or more of the functional units 478₀₋ₙ, such as shown as "Execute 0", to receive and process source data provided from the register file array 472.
Typically, most instructions processed by the architecture 100 are executed through a functional unit in a single processor cycle. However, some instructions require multiple processor cycles to complete, such as shown as "Execute 1", a simultaneously issued instruction. The Execute 0 and Execute 1 instructions may, for example, be executed by an ALU and floating point multiplier functional units respectively. The ALU functional unit, as shown in Figure 14, produces output data within one processor cycle and, by simple provision of output latching, the data is available for use in executing another instruction during the fifth processor cycle, P4. The floating point multiply functional unit is preferably an internally pipelined functional unit. Therefore, another floating point multiply instruction can be issued in the next processor cycle. However, the result of the first instruction will not be available for a data dependent number of processor cycles; the instruction shown in Figure 14 requires three processor cycles to complete processing through the functional unit. During each processor cycle, the function of the instruction issuer unit 498 is repeated. Consequently, the status of the current set of pending instructions as well as the availability state of the full set of functional units 478₀₋ₙ are reevaluated during each processor cycle. Under optimum conditions, the preferred architecture 100 is therefore capable of executing up to six instructions per processor cycle. However, a typical instruction mix will result in an overall average execution of 1.5 to 2.0 instructions per processor cycle.
A final consideration in the function of the instruction issuer 498 is its participation in the handling of trap conditions and the execution of specific instructions. The occurrence of a trap condition requires that the IEU 104 be cleared of all instructions that have not yet been retired. Such a circumstance may arise in response to an externally received interrupt that is relayed to the IEU 104 via the interrupt request/acknowledge control line 340, from any of the functional units 478₀₋ₙ in response to an arithmetic fault, or, for example, the EDecode unit 490 upon the decoding of an illegal instruction. On the occurrence of the trap condition, the instruction issuer unit 498 is responsible for halting or voiding all unretired instructions currently pending in the IEU 104. All instructions that cannot be retired simultaneously will be voided. This result is essential to maintain the preciseness of the occurrence of the interrupt with respect to the conventional in-order execution of a program instruction stream. Once the IEU 104 is ready to begin execution of the trap handling program routine, the instruction issuer 498 acknowledges the interrupt via a return control signal along the control lines 340. Also, in order to avoid the possibility that an exception condition relative to one instruction may be recognized based on a processor state bit which would have changed before that instruction would have executed in a classical pure in-order routine, the instruction issuer 498 is responsible for ensuring that all instructions which can alter the PSR (such as special move and return from trap) are executed strictly in-order.
Certain instructions that alter program control flow are not identified by the IDecode unit 262. Instructions of this type include subroutine returns, returns from procedural instructions, and returns from traps. The instruction issuer unit 498 provides identifying control signals via the IEU return control lines 350 to the IFU 102. A corresponding one of the special registers 412 is selected to provide the IF_PC execution address that existed at the point in time of the call instruction, occurrence of the trap or encountering of a procedural instruction.
6) Done Control Unit Detail:
The done control unit 540 monitors the functional units 478₀₋ₙ for the completion status of their current operations. In the preferred architecture 100, the done control unit 540 anticipates the completion of operations by each functional unit sufficient to provide a completion vector, reflecting the status of the execution of each instruction in the currently pending set of instructions, to the register rename unit 496, bypass control unit 520 and retirement control unit 500 approximately one half processor cycle prior to the execution completion of an instruction by a functional unit 478₀₋ₙ. This allows the instruction issuer unit 498, via the register rename unit 496, to consider the instruction completing functional units as available resources for the next instruction issuing cycle. The bypass control unit 520 is allowed to prepare to bypass data output by the functional unit through the bypass unit 474. Finally, the retirement control unit 500 may operate to retire the corresponding instruction simultaneous with the transfer of data from the functional unit 478₀₋ₙ to the register file array 472.
7) Retirement Control Unit Detail:
In addition to the instruction done vector provided from the done control unit 540, the retirement control unit 500 monitors the oldest instruction set output from the EDecode unit 490. As each instruction in instruction stream order is marked done by the done control unit 540, the retirement control unit 500 directs, via control signals provided on control lines 534, the transfer of data from the temporary buffer slot to the corresponding instruction specified register file register location within the register file array 472. The PC Inc/Size control signals are provided on the control lines 344 for each set of one or more instructions simultaneously retired. Up to four instructions may be retired per processor cycle. Whenever an entire instruction set has been retired, an IFIFO read control signal is provided on the control line 342 to advance the IFIFO 264.
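The in-order retirement rule can be sketched as follows, assuming the oldest instruction set holds four instructions (two sets of four making up the eight pending instructions). The function name retire_cycle and its arguments are hypothetical; the sketch models only the counting, not the data transfer from the temporary buffer.

```python
# Illustrative sketch only; names and the four-instruction set size are assumptions
# drawn from the eight pending instructions held in two instruction sets.
SET_SIZE = 4


def retire_cycle(done, already_retired):
    """Retire consecutive done instructions of the oldest set in program order.

    done            -- done flags for the oldest instruction set
    already_retired -- how many instructions of that set retired in earlier cycles
    Returns (total_retired, advance_ififo)."""
    retired = already_retired
    # Stop at the first instruction that is not yet done: retirement is in order.
    while retired < SET_SIZE and done[retired]:
        retired += 1
    return retired, retired == SET_SIZE
```

An instruction that is done but sits behind an unfinished older instruction stays in its temporary buffer slot until the older instruction completes.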
8) Control Flow Control Unit Detail:
The control flow control unit 528 operates to continuously provide the IFU 102 with information specifying whether any control flow instructions within the current set of pending instructions have been resolved and, further, whether the branch result is taken or not taken. The control flow control unit 528 obtains, via control lines 510, an identification of the control flow branch instructions by the EDecode 490. The current set of register dependencies is provided via control lines 536 from the data dependency checker unit 494 to the control flow control unit 528 to allow the control flow control unit 528 to determine whether the outcome of a branch instruction is constrained by dependencies or is now known. The register references provided via bus 518 from the register rename unit 496 are monitored by the control flow control unit 528 to identify the boolean register that will define the branch decision. Thus, the branch decision may be determined even prior to the out-of-order execution of the control flow instruction.
Simultaneous with the execution of a control flow instruction, the bypass unit 474 is directed by the bypass control unit 520 to provide the control flow results onto control lines 530, consisting of the control flow zero and control flow one control lines 750, 752, to the control flow control unit 528. Finally, the control flow control unit 528 continuously provides two vectors of eight bits each to the IFU 102 via control lines 348. These vectors define whether a branch instruction at the logical location corresponding to each bit within the vectors has been resolved and whether the branch result is taken or not taken.
In the preferred architecture 100, the control flow control unit 528 is implemented as pure combinatorial logic operating continuously in response to the input control signals to the control unit 528.
9) Bypass Control Unit Detail:
The instruction issuer unit 498 operates closely in conjunction with the bypass control unit 520 to control the routing of data between the register file array 472 and the functional units 478₀₋ₙ. The bypass control unit 520 operates in conjunction with the register file access, output and store phases of operation shown in Figure 14. During a register file access, the bypass control unit 520 may recognize, via control lines 522, an access of a destination register within the register file array 472 that is in the process of being written during the output phase of execution of an instruction. In this case, the bypass control unit 520 directs the selection of data provided on the functional unit output bus 482 to be bypassed back to the functional unit distribution bus 480. Control over the bypass control unit 520 is provided by the instruction issuer unit 498 via control lines 542.
IV. Virtual Memory Control Unit:
An interface definition for the VMU 108 is provided in Figure 15. The VMU 108 consists principally of a VMU control logic unit 800 and a content addressable memory (CAM) 802. The general function of the VMU 108 is shown graphically in Figure 16. There, a representation of a virtual address is shown partitioned into a space identifier (sID[31:28]), a virtual page number (VADDR[27:14]), page offset (PADDR[13:4]), and a request ID (rID[3:0]). The algorithm for generating a physical address is to use the space ID to select one of 16 registers within a space table 842. The contents of the selected space register in combination with a virtual page number is used as an address for accessing a table look aside buffer (TLB) 844. The 34 bit address operates as a content address tag used to identify a corresponding buffer register within the buffer 844. On the occurrence of a tag match, an 18 bit wide register value is provided as the high order 18 bits of a physical address 846. The page offset and request ID are provided as the low order 14 bits of the physical address 846.
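The translation steps above can be traced with a short sketch. The field positions are those given in the text; the 20 bit space register width is inferred from the 34 bit tag less the 14 bit virtual page number, and the concatenation order of the tag, the dictionary used in place of the CAM, and the function names are assumptions made only for illustration.

```python
# Illustrative sketch only; tag layout and names are assumptions.

def split_virtual_address(va):
    """Split a 32 bit virtual address into (sID, virtual page, page offset, rID)."""
    sid = (va >> 28) & 0xF            # sID[31:28]
    vpn = (va >> 14) & 0x3FFF         # VADDR[27:14]
    page_offset = (va >> 4) & 0x3FF   # PADDR[13:4]
    rid = va & 0xF                    # rID[3:0]
    return sid, vpn, page_offset, rid


def translate(va, space_table, tlb):
    """Return the 32 bit physical address, or None on a look aside buffer miss."""
    sid, vpn, _, _ = split_virtual_address(va)
    # 34 bit tag: selected space register contents combined with the virtual page number.
    tag = ((space_table[sid] & 0xFFFFF) << 14) | vpn
    physical_page = tlb.get(tag)
    if physical_page is None:
        return None  # would raise a VMU miss in the described architecture
    # High order 18 bits from the matched entry, low order 14 bits passed through.
    return ((physical_page & 0x3FFFF) << 14) | (va & 0x3FFF)
```

The low order 14 bits never pass through the lookup, which is consistent with the page offset being supplied to the cache directly while translation proceeds.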
Where there is a tag miss in the table look aside buffer 844, a VMU miss is signalled. This requires the execution of a VMU fast trap handling routine that implements a conventional hash algorithm 848 that accesses a complete page table data structure maintained in the MAU 112. This page table 850 contains entries for all memory pages currently in use by the architecture 100. The hash algorithm 848 identifies those entries in the page table 850 necessary to satisfy the current virtual page translation operation. Those page table entries are loaded from the MAU 112 to the trap registers of register set "A" and then transferred by special register move instructions to the table look aside buffer 844. Upon return from the exception handling routine, the instruction giving rise to the VMU miss exception is re-executed by the IEU 104. The virtual to physical address translation operation should then complete without exception.
The VMU control logic 800 provides a dual interface to both the IFU 102 and IEU 104. A ready signal is provided on control lines 822 to the IEU 104 to signify that the VMU 108 is available for an address translation. In the preferred embodiment, the VMU 108 is always ready to accept IFU 102 translation requests. Both the IFU and IEU 102, 104 may pose requests via control lines 328, 804. In the preferred architecture 100, the IFU 102 has priority access to the VMU 108. Consequently, only a single busy control line 820 is provided to the IEU 104.
Both the IFU and IEU 102, 104 provide the space ID and virtual page number fields to the VMU control logic 800 via control lines 326, 808, respectively. In addition, the IEU 104 provides a read/write control signal via control line 806 to define whether the address is to be used for a load or store operation as necessary to modify memory access protection attributes of the virtual memory referenced. The space ID and virtual page fields of the virtual address are passed to the CAM unit 802 to perform the actual translation operation. The page offset and ExID fields are eventually provided by the IEU 104 directly to the CCU 106. The physical page and request ID fields are provided on the address lines 836 to the CAM unit 802. The occurrence of a table look aside buffer match is signalled via the hit line and control output lines 830 to the VMU control logic unit 800. The resulting physical address, 18 bits in length, is provided on the address output lines 824.
The VMU control logic unit 800 generates the virtual memory miss and virtual memory exception control signals on lines 334, 332 in response to the hit and control output control signals on lines 830. A virtual memory translation miss is defined as failure to match a page table identifier in the table look aside buffer 844. All other translation errors are reported as virtual memory exceptions.
Finally, the data tables within the CAM unit 802 may be modified through the execution of special register to register move instructions by the IEU 104. Read/write, register select, reset, load and clear control signals are provided by the IEU 104 via control lines 810, 812, 814, 816, 818. Data to be written to the CAM unit registers is received by the VMU control logic unit 800 via the address bus 808 coupled to the special address data bus 354 from the IEU 104. This data is transferred via bus 836 to the CAM unit 802 simultaneous with control signals 828 that control the initialization, register selection, and read or write control signal. Consequently, the data registers within the CAM unit 802 may be readily written as required during the dynamic operation of the architecture 100 including read out for storage as required for the handling of context switches defined by a higher level operating system.
V. Cache Control Unit:
The control and data interface for the CCU 106 is shown in Figure 17. Again, separate interfaces are provided for the IFU 102 and IEU 104. Further, logically separate interfaces are provided by the CCU 106 to the MCU 110 with respect to instruction and data transfers.
The IFU interface consists of the physical page address provided on address lines 324, the VMU converted page address as provided on the address lines 824, and request IDs as transferred separately on control lines 294, 296. A unidirectional data transfer bus 114 is provided to transfer an entire instruction set in parallel to the IFU 102. Finally, the read/busy and ready control signals are provided to the CCU 106 via control lines 298, 300, 302.
Similarly, a complete physical address is provided by the IEU 104 via the physical address bus 788. The request ExIDs are separately provided from and to the load/store unit of the IEU 104 via control lines 796. An 80 bit wide bidirectional data bus is provided by the CCU 106 to the IEU 104. However, in the present preferred implementation of the architecture 100, only the lower 64 bits are utilized by the IEU 104. The availability and support within the CCU 106 of a full 80 bit data transfer bus is provided to support subsequent implementations of the architecture 100 that support, through modifications of the floating point data path 660, floating point operation in accordance with IEEE standard 754.
The IEU control interface, established via request, busy, ready, read/write and width control signals 784, is substantially the same as the corresponding control signals utilized by the IFU 102, the exception being the provision of a read/write control signal to differentiate between load and store operations. The width control signals specify the number of bytes being transferred during each CCU 106 access by the IEU 104; in contrast, every access of the instruction cache 132 is a fixed 128 bit wide data fetch operation.
The CCU 106 implements a substantially conventional cache controller function with respect to the separate instruction and data caches 132, 134. In the preferred architecture 100, the instruction cache 132 is a high speed memory providing for the storage of 256 128 bit wide instruction sets. The data cache 134 provides for the storage of 1024 32 bit wide words of data. Instruction and data requests that cannot be immediately satisfied from the contents of the instruction and data caches 132, 134 are passed on to the MCU 110. For instruction cache misses, the 28 bit wide physical address is provided to the MCU 110 via the address bus 860. The request ID and additional control signals for coordinating the operation of the CCU 106 and MCU 110 are provided on control lines 862. Once the MCU 110 has coordinated the necessary read access of the MAU 112, two consecutive 64 bit wide data transfers are performed directly from the MAU 112 through to the instruction cache 132. Two transfers are required given that the data bus 136 is, in the preferred architecture 100, a 64 bit wide bus. As the requested data is returned through the MCU 110, the request ID maintained during the pendency of the request operation is also returned to the CCU 106 via the control lines 862.
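The figures quoted above can be checked with a few lines of arithmetic. The constant names are hypothetical, and the reading of the 28 bit instruction address as a 32 bit physical address less the bits that index within a 16 byte instruction set is an inference rather than a statement from the text.

```python
# Illustrative arithmetic only; constant names are assumptions.
INSTRUCTION_SET_BITS = 128   # width of one instruction set
ICACHE_SETS = 256            # instruction sets held by the instruction cache
DATA_WORD_BITS = 32          # width of one data word
DCACHE_WORDS = 1024          # words held by the data cache
MEMORY_BUS_BITS = 64         # width of the data bus 136
PHYSICAL_ADDRESS_BITS = 32   # 18 bit physical page plus 14 low order bits

icache_bytes = ICACHE_SETS * INSTRUCTION_SET_BITS // 8
dcache_bytes = DCACHE_WORDS * DATA_WORD_BITS // 8
transfers_per_instruction_set = INSTRUCTION_SET_BITS // MEMORY_BUS_BITS

# Bits needed to address a byte within one 16 byte instruction set.
offset_bits = (INSTRUCTION_SET_BITS // 8).bit_length() - 1
instruction_address_bits = PHYSICAL_ADDRESS_BITS - offset_bits
```

Under these assumptions both caches hold 4096 bytes, an instruction set fill takes two bus transfers, and the instruction address is 28 bits wide.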
Data transfer operations between the data cache 134 and MCU 110 are substantially the same as instruction cache operations. Since data load and store operations may reference a single byte, a full 32 bit wide physical address is provided to the MCU 110 via the address bus 864. Interface control signals and the request ExID are transferred via control lines 866. Bidirectional 64 bit wide data transfers are provided via the data cache bus 138.
VI. Summary/Conclusion:
Thus, a high-performance RISC based microprocessor architecture has been disclosed. The architecture efficiently implements out-of-order execution of instructions, separate main and target instruction stream prefetch instruction transfer paths, and a procedural instruction recognition and dedicated prefetch path. The optimized instruction execution unit provides multiple optimized data processing paths supporting integer, floating point and boolean operations and incorporates respective temporary register files facilitating out-of-order execution and instruction cancellation while maintaining a readily established precise state-of-the-machine status.
It is therefore to be understood that while the foregoing disclosure describes the preferred embodiment of the present invention, other variations and modifications may be readily made by those of average skill within the scope of the present invention.

Claims

1. A method for handling a trap in a microprocessor for which instructions have a minimum length of ℓ address locations, comprising the steps of: determining the entry point for a trap handling routine as address location nm + b, where b is a base address location, n is a trap number corresponding to said trap, and m is a multiplier greater than or equal to 2ℓ; and transferring control to an instruction at said entry point.
2. A method according to claim 1, wherein all instructions for said microprocessor have the same length ℓ.
3. A method according to claim 1, wherein m is a power of 2 and b is an integer multiple of m, and wherein the step of determining comprises the step of concatenating n as low-order bits to all the bits of b having an order at least as high as log₂ m.
4. A method according to claim 3, wherein ℓ is a power of 2, and wherein the step of determining further comprises the step of concatenating log₂ ℓ zero bits as low-order bits to said concatenation of n and said bits of b having an order higher than log₂ m.
5. A method according to claim 1, wherein said entry point is provided as a logical address, further comprising the step of determining a physical address from said logical address.
6. A method for handling traps in a microprocessor for which instructions have a minimum length of $\ell$ address locations, each trap generating a trap number $n$, comprising the steps of: determining a first vector address location as $nm_1 + b_1$ for traps of a first type, where $b_1$ is a first base address location and $m_1$ is a first multiplier; determining a second vector address location as $nm_2 + b_2$ for traps of a second type, where $b_2$ is a second base address location different from said first base address location and $m_2$ is a second multiplier greater than $m_1$, $m_2 \ge 2\ell$; and transferring control to an instruction at said second vector address location for traps of said second type.
7. A method according to claim 6, wherein $m_1 = \ell$, further comprising the steps of: storing a branch instruction at a plurality of the vector address locations $km_1 + b_1$, $k = 1, 2, 3, \ldots$, prior to the step of determining a first vector address location; and transferring control to the branch instruction at said first vector address location for traps of said first type.
8. A method for handling traps in a microprocessor, comprising the steps of: prefetching instructions in an instruction stream for subsequent execution by said microprocessor; executing prefetched instructions during an execution time; detecting, prior to a given execution time being the execution time for a given one of said prefetched instructions, whether said given one of said instructions has had any of a first class of synchronous exceptions; and invoking an exception handler during said given execution time if said given one of said instructions has had a synchronous exception.
9. A method according to claim 8, wherein said step of prefetching comprises the step of prefetching a plurality of instructions at a time.
10. A method according to claim 8, wherein said first class of exceptions are faults occurring during said step of prefetching.
11. A method according to claim 8, further comprising the step of detecting, during said given execution time, whether said given one of said instructions has had any one of an execution class of synchronous exceptions.
12. A method according to claim 8, wherein said microprocessor is capable of executing instructions out-of-order relative to their order in said stream.
13. A method according to claim 8, wherein said microprocessor is capable of executing a plurality of instructions during each execution time.
14. A method according to claim 8, wherein said microprocessor is capable of executing a plurality of instructions from a sequence of instructions during a single execution time, further comprising the steps of: scheduling a given plurality of instructions for execution during said given execution time; and determining said given one of said instructions as the sequentially first instruction in said given plurality which has had a synchronous exception.
15. A method according to claim 14, further comprising the step of detecting, during said given execution time and prior to said step of determining, whether each instruction in said given plurality has had any of an execution class of exceptions.
16. A method according to claim 15, wherein said execution class of exceptions includes a second exception type which is dependent upon the state of at least one processor status bit during said given execution time, wherein said microprocessor is capable of executing instructions out-of-order relative to their order in said sequence of instructions, further comprising the step of scheduling all instructions which can modify said processor status bit to execute in the same sequence as in said sequence of instructions.
17. A method according to claim 14, wherein said step of executing comprises the steps of: tentatively executing a plurality of instructions scheduled for execution and storing any results of said tentative execution in temporary registers; and copying results from said temporary registers into permanent registers upon retirement of an instruction, further comprising the steps of: retiring all instructions in said given plurality which were sequentially prior to said given instruction; and cancelling all instructions in said given plurality which were sequentially subsequent to said given instruction.
18. A method according to claim 17, further comprising the step of cancelling said given instruction.
19. A method according to claim 8, wherein all instructions for said microprocessor have the same length $\ell$, and wherein said step of invoking comprises the steps of: determining the entry point for an exception handling routine as address location $nm + b$, where $b$ is a base address location, $n$ is an exception trap number corresponding to said synchronous exception, and $m$ is a multiplier greater than or equal to $2\ell$; and transferring control to an instruction at said entry point.
20. A method for handling exceptions in a microprocessor capable of executing a plurality of instructions in a single execution time, comprising the steps of: tentatively executing instructions during execution times determined according to an execution sequence; and upon completion of all instructions tentatively executed during a given execution time, if any of said tentatively executed instructions has had a synchronous exception, (a) retiring all tentatively executed instructions which occur in said stream prior to the first instruction which has had a synchronous exception, (b) cancelling any instructions which occur in said stream subsequent to said first instruction which has had a synchronous exception, and (c) invoking an exception handler.
21. A method according to claim 20, wherein said microprocessor further is capable of executing instructions out-of-order from their sequence in said instruction stream.
22. A method for use in a microprocessor, comprising the steps of: executing instructions from a main instruction stream; in response to a procedural instruction in said main instruction stream, executing instructions from an emulation instruction stream while maintaining an indication of a first return address to said main instruction stream; in response to a synchronous exception occurring relative to an instruction in said main instruction stream, executing instructions from a second handler instruction stream, while maintaining an indication of a second return address to said main instruction stream; and in response to a synchronous exception occurring relative to an instruction in said emulation instruction stream, executing instructions from a third handler instruction stream, while maintaining both an indication of a third return address to said emulation instruction stream and said indication of said first return address to said main instruction stream.
23. A method according to claim 22, further comprising the steps of: resuming execution of instructions from said main instruction stream beginning at said second return address, in response to a return from trap instruction in said second handler instruction stream; and resuming execution of instructions from said emulation instruction stream beginning at said third return address, in response to a return from trap instruction in said third handler instruction stream.
24. A method according to claim 22, further comprising the step of resuming execution of instructions from said main instruction stream beginning at said first return address in response to a return from procedure instruction in said emulation instruction stream.
25. Apparatus for handling a trap in a microprocessor for which instructions have a minimum length of $2^i$ locations, comprising: a first set of conductors for carrying high-order bits of a base address; first means for providing said high-order bits of said base address on said first set of conductors; a second set of conductors for carrying a trap number; second means for providing said trap number on said second set of conductors; and means for generating a first entry point address from a concatenation of said first and second sets of conductors followed by, as lower-order bits, $j_1$ conductors each carrying a respective fixed logic level, $j_1 \ge i + 1$.
26. Apparatus according to claim 25, wherein all instructions for said microprocessor have the same length $2^i$.
27. Apparatus according to claim 25, wherein said first entry point address is provided as a logical address, and wherein said means for generating further includes means for converting said logical address to a physical address.
28. Apparatus according to claim 25, wherein said trap may be of a first type or a second type, wherein said first means for providing comprises: first and second trap base address sources; and means for placing information from said first trap base source on said first set of conductors if said trap is of said first type, and for placing information from said second trap base address source on said first set of conductors if said trap is of said second type, and wherein said means for generating generates said first entry point address if said trap is of said first type, and is further for, if said trap is of said second type, generating a second entry point address from a concatenation of said first and second sets of conductors followed by, as lowest-order bits, exactly $j_2$ conductors each carrying a respective fixed logic level,
29. Apparatus according to claim 28, wherein all instructions for said microprocessor have the same length $2^i$, and wherein $j_2 = i$.
30. Apparatus for handling exceptions in a microprocessor, for use with a source of instructions and an exception handler, comprising: execution means for executing instructions provided thereto; and prefetch means for prefetching instructions from said source of instructions and providing them to said execution means in a particular sequence, wherein said prefetch means includes indicator means for indicating, in correspondence with said instructions provided to said execution means, whether a synchronous prefetch exception has occurred relative to said given instruction, and wherein said execution means includes invoking means, responsive to said indicator means, for invoking an exception handler if a synchronous exception has occurred relative to an instruction provided to said execution means by said prefetch means.
31. Apparatus according to claim 30, wherein said execution means further includes detection means for detecting a synchronous execution exception occurring relative to an instruction being executed by said execution means, said invoking means further being responsive to said detection means.
32. Microprocessor apparatus, for use with a source of instructions and an exception handler, comprising: execution means for executing instructions provided to it; a program counter and first and second storage registers; means for updating said program counter in response to each instruction executed by said execution means; means for, in response to a procedural instruction, storing said program counter in said first storage register and providing instructions to said execution means from an emulation stream beginning at an address responsive to said procedural instruction; and means for, in response to the occurrence of a synchronous exception relative to an instruction provided to said execution means, storing said program counter in said second storage register and providing instructions to said execution means from a handler stream beginning at an address responsive to said synchronous exception.
33. Apparatus according to claim 32, further comprising means for providing further instructions to said execution means beginning at an address responsive to the contents of said first storage register, in response to a procedural return instruction in said emulation stream.
34. Apparatus according to claim 32, further comprising means for providing further instructions to said execution means beginning at an address responsive to the contents of said second storage register, in response to a trap return instruction in said handler stream.
35. Microprocessor apparatus for use with a register select signal, comprising: a functional unit having at least one data input and at least one data output; a processor state bit and means for setting said state bit to a first value to indicate an interrupted state in response to the occurrence of a trap; a set of first registers each having a data output; at least one second register each having a data output and each corresponding to a respective one of said first registers; and means for coupling to said data input of said functional unit: the data output of a selected one of said first registers selected in response to said register select signal if said state bit is not in said first value, or if none of said second registers corresponds to said selected first register; and if one of said second registers corresponds to said selected first register and said state bit is in said first value, the data output of said second register corresponding to said selected first register.
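
The entry-point arithmetic recited in the method claims (entry point equal to the trap number times a multiplier plus a base address, with the multiplier at least twice the minimum instruction length) can be checked with a short Python sketch. This is an illustration only: the function names, the `n_bits` field width and the example values are assumptions, not part of the claims. The concatenation variant further assumes that the base address has no set bits overlapping the trap-number field, so that joining bit fields gives the same result as the multiply and add.

```python
def entry_by_arithmetic(b: int, n: int, m: int, min_len: int) -> int:
    """Entry point n*m + b, requiring m >= 2 * min_len so that each
    table slot has room for at least two instructions."""
    if m < 2 * min_len:
        raise ValueError("multiplier must be at least twice the minimum instruction length")
    return n * m + b


def entry_by_concatenation(b: int, n: int, log2_m: int, n_bits: int) -> int:
    """Same entry point built from bit fields: high bits of b, then the
    trap number, then log2_m zero bits. No adder is needed.

    Assumes (hypothetically) that b is aligned to 2**(log2_m + n_bits).
    """
    field = log2_m + n_bits
    if b & ((1 << field) - 1):
        raise ValueError("base address overlaps the trap-number field")
    if not 0 <= n < (1 << n_bits):
        raise ValueError("trap number does not fit in n_bits")
    high = b >> field
    return ((high << n_bits) | n) << log2_m


# Example values (assumed): 4 address locations per instruction,
# 16 locations per table slot, a 7-bit trap number, base 0x40000.
a = entry_by_arithmetic(b=0x40000, n=5, m=16, min_len=4)
c = entry_by_concatenation(b=0x40000, n=5, log2_m=4, n_bits=7)
print(hex(a), hex(c))  # 0x40050 0x40050
```

With these values each slot holds four instructions, so a short handler can run in place without first branching elsewhere, and the two functions agree on the address.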
PCT/JP1992/000872 1991-07-08 1992-07-07 Risc microprocessor architecture implementing fast trap and exception state WO1993001547A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
JP50215493A JP3333196B2 (en) 1991-07-08 1992-07-07 Trap processing method
AT92914386T ATE188786T1 (en) 1991-07-08 1992-07-07 RISC MICROPROCESSOR ARCHITECTURE WITH FAST INTERRUPTION AND EXCEPTION MODES
KR1019930700689A KR100294276B1 (en) 1991-07-08 1992-07-07 RSC microprocessor structure with high speed trap and exception
DE69230554T DE69230554T2 (en) 1991-07-08 1992-07-07 RISC MICROPROCESSOR ARCHITECTURE WITH FAST INTERRUPT AND EXCEPTION MODE
EP92914386A EP0547240B1 (en) 1991-07-08 1992-07-07 Risc microprocessor architecture implementing fast trap and exception state
HK98116066A HK1014783A1 (en) 1991-07-08 1998-12-28 Risc microprocessor architecture implementing fast trap and exception state

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US72694291A 1991-07-08 1991-07-08
US726,942 1991-07-08

Publications (1)

Publication Number Publication Date
WO1993001547A1 (en) 1993-01-21

Family

ID=24920677

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1992/000872 WO1993001547A1 (en) 1991-07-08 1992-07-07 Risc microprocessor architecture implementing fast trap and exception state

Country Status (8)

Country Link
US (2) US5448705A (en)
EP (2) EP0945787A3 (en)
JP (6) JP3333196B2 (en)
KR (1) KR100294276B1 (en)
AT (1) ATE188786T1 (en)
DE (1) DE69230554T2 (en)
HK (1) HK1014783A1 (en)
WO (1) WO1993001547A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5673427A (en) * 1994-03-01 1997-09-30 Intel Corporation Packing valid micro operations received from a parallel decoder into adjacent locations of an output queue
EP0806723A2 (en) * 1996-05-07 1997-11-12 Lucent Technologies Inc. Method and apparatus for handling multiple precise events in a pipelined digital processor
EP0815507A1 (en) * 1995-02-14 1998-01-07 Fujitsu Limited Structure and method for high-performance speculative execution processor providing special features
WO1999031579A2 (en) * 1997-12-15 1999-06-24 Motorola Inc. Computer instruction which generates multiple data-type results

Families Citing this family (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5493687A (en) 1991-07-08 1996-02-20 Seiko Epson Corporation RISC microprocessor architecture implementing multiple typed register sets
EP0547247B1 (en) 1991-07-08 2001-04-04 Seiko Epson Corporation Extensible risc microprocessor architecture
US5539911A (en) 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
WO1993020505A2 (en) * 1992-03-31 1993-10-14 Seiko Epson Corporation Superscalar risc instruction scheduling
DE69308548T2 (en) 1992-05-01 1997-06-12 Seiko Epson Corp DEVICE AND METHOD FOR COMPLETING THE COMMAND IN A SUPER-SCALAR PROCESSOR.
EP0682789B1 (en) * 1992-12-31 1998-09-09 Seiko Epson Corporation System and method for register renaming
US5628021A (en) 1992-12-31 1997-05-06 Seiko Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
JPH06242948A (en) * 1993-02-16 1994-09-02 Fujitsu Ltd Pipeline processing computer
JP2596712B2 (en) * 1993-07-01 1997-04-02 インターナショナル・ビジネス・マシーンズ・コーポレイション System and method for managing execution of instructions, including adjacent branch instructions
US5548776A (en) * 1993-09-30 1996-08-20 Intel Corporation N-wide bypass for data dependencies within register alias table
DK0661625T3 (en) * 1994-01-03 2000-04-03 Intel Corp Method and apparatus for implementing a four-stage system for determining program branches (Four Stage Bra
GB2286265B (en) * 1994-01-26 1998-02-18 Advanced Risc Mach Ltd selectable processing registers
US5515521A (en) * 1994-02-08 1996-05-07 Meridian Semiconductor, Inc. Circuit and method for reducing delays associated with contention interference between code fetches and operand accesses of a microprocessor
NL9400607A (en) * 1994-04-15 1995-11-01 Arcobel Graphics Bv Data processing circuit, pipeline multiplier, ALU, and shift register unit for use with a data processing circuit.
JP3672634B2 (en) 1994-09-09 2005-07-20 株式会社ルネサステクノロジ Data processing device
US5642516A (en) * 1994-10-14 1997-06-24 Cirrus Logic, Inc. Selective shadowing of registers for interrupt processing
US5673426A (en) * 1995-02-14 1997-09-30 Hal Computer Systems, Inc. Processor structure and method for tracking floating-point exceptions
US6006030A (en) * 1995-02-17 1999-12-21 Vlsi Technology, Inc. Microprocessor with programmable instruction trap for deimplementing instructions
US5887152A (en) * 1995-04-12 1999-03-23 Advanced Micro Devices, Inc. Load/store unit with multiple oldest outstanding instruction pointers for completing store and load/store miss instructions
US5692170A (en) * 1995-04-28 1997-11-25 Metaflow Technologies, Inc. Apparatus for detecting and executing traps in a superscalar processor
US6052801A (en) * 1995-05-10 2000-04-18 Intel Corporation Method and apparatus for providing breakpoints on a selectable address range
US5659679A (en) * 1995-05-30 1997-08-19 Intel Corporation Method and apparatus for providing breakpoints on taken jumps and for providing software profiling in a computer system
US5694589A (en) * 1995-06-13 1997-12-02 Intel Corporation Instruction breakpoint detection apparatus for use in an out-of-order microprocessor
US5740413A (en) * 1995-06-19 1998-04-14 Intel Corporation Method and apparatus for providing address breakpoints, branch breakpoints, and single stepping
US5621886A (en) * 1995-06-19 1997-04-15 Intel Corporation Method and apparatus for providing efficient software debugging
US5774709A (en) * 1995-12-06 1998-06-30 Lsi Logic Corporation Enhanced branch delay slot handling with single exception program counter
US5729729A (en) * 1996-06-17 1998-03-17 Sun Microsystems, Inc. System for fast trap generation by creation of possible trap masks from early trap indicators and selecting one mask using late trap indicators
US5924128A (en) * 1996-06-20 1999-07-13 International Business Machines Corporation Pseudo zero cycle address generator and fast memory access
US5958061A (en) * 1996-07-24 1999-09-28 Transmeta Corporation Host microprocessor with apparatus for temporarily holding target processor state
US6199152B1 (en) 1996-08-22 2001-03-06 Transmeta Corporation Translated memory protection apparatus for an advanced microprocessor
US5937437A (en) * 1996-10-28 1999-08-10 International Business Machines Corporation Method and apparatus for monitoring address translation performance
US5764971A (en) * 1996-12-11 1998-06-09 Industrial Technology Research Institute Method and apparatus for implementing precise interrupts in a pipelined data processing system
US5784606A (en) * 1996-12-16 1998-07-21 International Business Machines Corporation Method and system in a superscalar data processing system for the efficient handling of exceptions
US5850556A (en) * 1996-12-26 1998-12-15 Cypress Semiconductor Corp. Interruptible state machine
US5978900A (en) * 1996-12-30 1999-11-02 Intel Corporation Renaming numeric and segment registers using common general register pool
US6253317B1 (en) * 1997-01-09 2001-06-26 Sun Microsystems, Inc. Method and apparatus for providing and handling traps
US5987601A (en) * 1997-02-14 1999-11-16 Xyron Corporation Zero overhead computer interrupts with task switching
US6122729A (en) 1997-05-13 2000-09-19 Advanced Micro Devices, Inc. Prefetch buffer which stores a pointer indicating an initial predecode position
CN1107909C (en) * 1997-07-11 2003-05-07 全斯美达有限公司 Host microprocessor with apparatus for temporarily holding target processor state
DE69739608D1 (en) * 1997-07-11 2009-11-12 Intellectual Venture Funding L COMPUTER MICROPROCESSOR WITH DEVICE FOR PERIODICALLY STOPPING THE PROCESSOR CONDITION OF A TARGET COMPUTER
US6128728A (en) 1997-08-01 2000-10-03 Micron Technology, Inc. Virtual shadow registers and virtual register windows
US6061787A (en) * 1998-02-02 2000-05-09 Texas Instruments Incorporated Interrupt branch address formed by concatenation of base address and bits corresponding to highest priority interrupt asserted and enabled
US6339752B1 (en) * 1998-12-15 2002-01-15 Bull Hn Information Systems Inc. Processor emulation instruction counter virtual memory address translation
US6253299B1 (en) * 1999-01-04 2001-06-26 International Business Machines Corporation Virtual cache registers with selectable width for accommodating different precision data formats
US6327650B1 (en) * 1999-02-12 2001-12-04 Vsli Technology, Inc. Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor
US6493781B1 (en) * 1999-08-19 2002-12-10 Koninklijke Philips Electronics N.V. Servicing of interrupts with stored and restored flags
US6745321B1 (en) * 1999-11-08 2004-06-01 International Business Machines Corporation Method and apparatus for harvesting problematic code sections aggravating hardware design flaws in a microprocessor
US6523097B1 (en) * 1999-11-12 2003-02-18 International Business Machines Corporation Unvalue-tagged memory without additional bits
US6678817B1 (en) * 2000-02-22 2004-01-13 Hewlett-Packard Development Company, L.P. Method and apparatus for fetching instructions from the memory subsystem of a mixed architecture processor into a hardware emulation engine
US6968469B1 (en) 2000-06-16 2005-11-22 Transmeta Corporation System and method for preserving internal processor context when the processor is powered down and restoring the internal processor context when processor is restored
FR2817460B1 (en) * 2000-12-04 2003-09-05 Ge Med Sys Global Tech Co Llc METHOD AND SYSTEM FOR SIMULATING THE ENLARGEMENT OF DIAMETER OF A BLOOD VESSEL LESION, IN PARTICULAR A STENOSIS, USING AN ENDOVASCULAR PROSTHESIS
US6889279B2 (en) * 2000-12-11 2005-05-03 Cadence Design Systems, Inc. Pre-stored vector interrupt handling system and method
US20030014474A1 (en) * 2001-05-30 2003-01-16 Mckaig Ray S. Alternate zero overhead task change circuit
EP1523347B1 (en) 2002-07-19 2011-05-18 Baxter International Inc. Systems and methods for performing peritoneal dialysis
DE60335724D1 (en) * 2002-09-03 2011-02-24 Silicon Hive Bv DEVICE AND METHOD FOR PROCESSING SMOKED INTERRUPTIONS
US7165018B2 (en) 2002-11-22 2007-01-16 Texas Instruments Incorporated Address range comparator for detection of multi size memory accesses with data matching qualification and full or partial overlap
US7493478B2 (en) * 2002-12-05 2009-02-17 International Business Machines Corporation Enhanced processor virtualization mechanism via saving and restoring soft processor/system states
US7360070B1 (en) * 2003-08-13 2008-04-15 Apple Inc. Specialized processing upon an occurrence of an exceptional situation during the course of a computation
US7014122B2 (en) * 2003-12-24 2006-03-21 International Business Machines Corporation Method and apparatus for performing bit-aligned permute
US7613911B2 (en) * 2004-03-12 2009-11-03 Arm Limited Prefetching exception vectors by early lookup exception vectors within a cache memory
GB2412192B (en) * 2004-03-18 2007-08-29 Advanced Risc Mach Ltd Function calling mechanism
JP2007041837A (en) * 2005-08-03 2007-02-15 Nec Electronics Corp Instruction prefetch apparatus and method
US20080077782A1 (en) * 2006-09-26 2008-03-27 Arm Limited Restoring a register renaming table within a processor following an exception
JP5507317B2 (en) 2010-04-12 2014-05-28 ルネサスエレクトロニクス株式会社 Microcomputer and interrupt control method
KR102025338B1 (en) * 2011-12-28 2019-09-26 삼성전자 주식회사 Signal processing device, display apparatus having the same, and method for signal processing
EP3286640A4 (en) 2015-04-24 2019-07-10 Optimum Semiconductor Technologies, Inc. Computer processor with separate registers for addressing memory
KR102268112B1 (en) * 2019-12-24 2021-06-22 한국항공우주연구원 Method and system for saving satellite commands

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4296470A (en) * 1979-06-21 1981-10-20 International Business Machines Corp. Link register storage and restore system for use in an instruction pre-fetch micro-processor interrupt system
US4410939A (en) * 1979-07-17 1983-10-18 Matsushita Electric Industrial Co. Ltd. System for program interrupt processing with quasi-stack of register-sets
US4434461A (en) * 1980-09-15 1984-02-28 Motorola, Inc. Microprocessor with duplicate registers for processing interrupts
US4459657A (en) * 1980-09-24 1984-07-10 Tokyo Shibaura Denki Kabushiki Kaisha Data processing system having re-entrant function for subroutines
EP0372751A2 (en) * 1988-12-09 1990-06-13 International Computers Limited Pipelined data-processing apparatus
EP0377991A2 (en) * 1989-01-13 1990-07-18 International Business Machines Corporation Data processing systems
EP0402856A2 (en) * 1989-06-13 1990-12-19 Nec Corporation Instruction execution control system
US5003462A (en) * 1988-05-31 1991-03-26 International Business Machines Corporation Apparatus and method for implementing precise interrupts on a pipelined processor with multiple functional units with separate address translation interrupt means

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3346851A (en) * 1964-07-08 1967-10-10 Control Data Corp Simultaneous multiprocessing computer system
GB1301471A (en) * 1968-10-29 1972-12-29
US3789365A (en) * 1971-06-03 1974-01-29 Bunker Ramo Processor interrupt system
US3771138A (en) * 1971-08-31 1973-11-06 Ibm Apparatus and method for serializing instructions from two independent instruction streams
US4034349A (en) * 1976-01-29 1977-07-05 Sperry Rand Corporation Apparatus for processing interrupts in microprocessing systems
AU529675B2 (en) * 1977-12-07 1983-06-16 Honeywell Information Systems Incorp. Cache memory unit
US4315314A (en) * 1977-12-30 1982-02-09 Rca Corporation Priority vectored interrupt having means to supply branch address directly
US4200927A (en) * 1978-01-03 1980-04-29 International Business Machines Corporation Multi-instruction stream branch processing mechanism
US4228495A (en) * 1978-12-19 1980-10-14 Allen-Bradley Company Multiprocessor numerical control system
US4434641A (en) * 1982-03-11 1984-03-06 Ball Corporation Buckle resistance for metal container closures
US4635194A (en) * 1983-05-02 1987-01-06 International Business Machines Corporation Instruction buffer bypass apparatus
US4800486A (en) * 1983-09-29 1989-01-24 Tandem Computers Incorporated Multiple data patch CPU architecture
JPS60225943A (en) * 1984-04-25 1985-11-11 Hitachi Ltd Exceptional interruption processing system
US4766564A (en) * 1984-08-13 1988-08-23 International Business Machines Corporation Dual putaway/bypass busses for multiple arithmetic units
NL193475C (en) * 1984-12-27 1999-11-02 Sony Corp Microprocessor device.
JPH0762823B2 (en) * 1985-05-22 1995-07-05 株式会社日立製作所 Data processing device
JPS63131230A (en) * 1986-11-21 1988-06-03 Hitachi Ltd Information processor
US4926323A (en) * 1988-03-03 1990-05-15 Advanced Micro Devices, Inc. Streamlined instruction processor
US4897810A (en) * 1988-06-13 1990-01-30 Advanced Micro Devices, Inc. Asynchronous interrupt status bit circuit
JPH0673105B2 (en) * 1988-08-11 1994-09-14 株式会社東芝 Instruction pipeline type microprocessor
JP2810068B2 (en) * 1988-11-11 1998-10-15 株式会社日立製作所 Processor system, computer system, and instruction processing method
DE69031257T2 (en) * 1989-09-21 1998-02-12 Texas Instruments Inc Integrated circuit with an embedded digital signal processor
JP2835103B2 (en) * 1989-11-01 1998-12-14 富士通株式会社 Instruction designation method and instruction execution method
EP0479390B1 (en) * 1990-10-05 1999-01-07 Koninklijke Philips Electronics N.V. Processing device including a memory circuit and a group of functional units

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MELEAR C.: "THE DESIGN OF THE 88000 RISC FAMILY.", IEEE MICRO., IEEE SERVICE CENTER, LOS ALAMITOS, CA., US, vol. 09., no. 02., 1 April 1989 (1989-04-01), US, pages 26 - 38., XP000124913, ISSN: 0272-1732, DOI: 10.1109/40.24848 *
PATENT ABSTRACTS OF JAPAN vol. 010, no. 089 (P-444)8 April 1986 & JP,A,60 225 943 ( HITACHI SEISAKUSHO K K ) 11 November 1985 *
SMITH J. E., PLESZKUN A. R.: "IMPLEMENTING PRECISE INTERRUPTS IN PIPELINED PROCESSORS.", IEEE TRANSACTIONS ON COMPUTERS., IEEE SERVICE CENTER, LOS ALAMITOS, CA., US, vol. 37., no. 5., 1 May 1988 (1988-05-01), US, pages 562 - 573., XP000047779, ISSN: 0018-9340, DOI: 10.1109/12.4607 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5673427A (en) * 1994-03-01 1997-09-30 Intel Corporation Packing valid micro operations received from a parallel decoder into adjacent locations of an output queue
EP0815507A1 (en) * 1995-02-14 1998-01-07 Fujitsu Limited Structure and method for high-performance speculative execution processor providing special features
EP0815507A4 (en) * 1995-02-14 2001-05-16 Fujitsu Ltd Structure and method for high-performance speculative execution processor providing special features
EP0806723A2 (en) * 1996-05-07 1997-11-12 Lucent Technologies Inc. Method and apparatus for handling multiple precise events in a pipelined digital processor
EP0806723A3 (en) * 1996-05-07 1999-09-22 Lucent Technologies Inc. Method and apparatus for handling multiple precise events in a pipelined digital processor
WO1999031579A2 (en) * 1997-12-15 1999-06-24 Motorola Inc. Computer instruction which generates multiple data-type results
WO1999031579A3 (en) * 1997-12-15 1999-10-21 Motorola Inc Computer instruction which generates multiple data-type results

Also Published As

Publication number Publication date
KR930702719A (en) 1993-09-09
JP2003330708A (en) 2003-11-21
US5481685A (en) 1996-01-02
JP3750743B2 (en) 2006-03-01
EP0547240B1 (en) 2000-01-12
JP2001022584A (en) 2001-01-26
EP0945787A3 (en) 2008-12-31
US5448705A (en) 1995-09-05
KR100294276B1 (en) 2001-09-17
EP0945787A2 (en) 1999-09-29
JP2001067220A (en) 2001-03-16
HK1014783A1 (en) 1999-09-30
ATE188786T1 (en) 2000-01-15
JPH06502035A (en) 1994-03-03
JP3552995B2 (en) 2004-08-11
DE69230554D1 (en) 2000-02-17
DE69230554T2 (en) 2000-07-06
EP0547240A1 (en) 1993-06-23
JP3333196B2 (en) 2002-10-07
JP3879812B2 (en) 2007-02-14
JP2001022583A (en) 2001-01-26
JP2001027949A (en) 2001-01-30

Similar Documents

Publication Publication Date Title
WO1993001547A1 (en) Risc microprocessor architecture implementing fast trap and exception state
US6272619B1 (en) High-performance, superscalar-based computer system with out-of-order instruction execution
US5961629A (en) High performance, superscalar-based computer system with out-of-order instruction execution
EP0547247B1 (en) Extensible risc microprocessor architecture

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A1; Designated state(s): JP KR
AL Designated countries for regional patents Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IT LU MC NL SE
WWE Wipo information: entry into national phase Ref document number: 1992914386; Country of ref document: EP
WWE Wipo information: entry into national phase Ref document number: 1019930700689; Country of ref document: KR
WWP Wipo information: published in national office Ref document number: 1992914386; Country of ref document: EP
WWG Wipo information: grant in national office Ref document number: 1992914386; Country of ref document: EP