WO1993001547A1

WO1993001547A1 - Risc microprocessor architecture implementing fast trap and exception state

Info

Publication number: WO1993001547A1
Application number: PCT/JP1992/000872
Authority: WO
Inventors: Le Trong Nguyen; Derek J. Lentz; Yoshiyuki Miyayama; Sanjiv Garg; Yasuaki Hagiwara; Johannes Wang; Quang H. Trang
Original assignee: Seiko Epson Corporation
Priority date: 1991-07-08
Filing date: 1992-07-07
Publication date: 1993-01-21
Also published as: KR930702719A; JP2003330708A; US5481685A; JP3750743B2; EP0547240B1; JP2001022584A; EP0945787A3; US5448705A; KR100294276B1; EP0945787A2; JP2001067220A; HK1014783A1; ATE188786T1; JPH06502035A; JP3552995B2; DE69230554D1; DE69230554T2; EP0547240A1; JP3333196B2; JP3879812B2

Abstract

Fast trap mechanism for a microprocessor, wherein a vector trap table is maintained which contains space for a plurality of instructions in each table entry. When a fast trap occurs, control is transferred directly into the table entry corresponding to the trap number. The trap handler can be located completely inside the table entry, or it can transfer control to additional handler code.

Description

RISC MICROPROCESSOR ARCHITECTURE IMPLEMENTING FAST TRAP AND EXCEPTION STATE

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to the following applications, all of which are assigned to the assignee of the present application, and all of which are incorporated herein by reference:

1. HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE, invented by Le T. Nguyen et al, SMOS 7984 MCF/GBR, Application Serial Number 07/727,[illegible], filed 08 July 1991;

2. EXTENSIBLE RISC MICROPROCESSOR ARCHITECTURE, invented by Le T. Nguyen et al, SMOS 7985 MCF/GBR, Application Serial Number [illegible], filed 08 July 1991;

3. RISC MICROPROCESSOR ARCHITECTURE WITH ISOLATED ARCHITECTURAL DEPENDENCIES, invented by Le T. Nguyen et al, SMOS 7987 MCF/GBR, Application Serial Number 07/726,744 filed 08 July 1991;

4. RISC MICROPROCESSOR ARCHITECTURE IMPLEMENTING MULTIPLE TYPED REGISTER SETS, invented by Sanjiv Garg et al, SMOS 7988 MCF/GBR/RCC, Application Serial Number 07/726,773 filed 08 July 1991;

5. SINGLE CHIP PAGE PRINTER CONTROLLER, invented by Derek J. Lentz et al, SMOS 7991 MCF/GBR, Application Serial Number 07/726,929, filed 08 July 1991;

6. MICROPROCESSOR ARCHITECTURE CAPABLE OF SUPPORTING HETEROGENEOUS PROCESSORS, invented by Derek J. Lentz et al, SMOS 7992 MCF/GBR/ MB, Application Serial Number 07/726,893, filed 08 July 1991.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to microprocessor architectures, and more particularly, to interrupt and exception handling in microprocessors.

2. Description of Related Art

In a typical microprocessor, instructions are generally executed in sequence unless a control flow varying instruction is encountered or an exception occurs. With respect to exceptions, facilities are included for changing the control flow upon the occurrence of particular events which may or may not be related to particular instructions in the instruction stream. For example, a microprocessor may include an interrupt request (IRQ) lead which, when activated by an external device, causes the microprocessor to save certain information relating to the current state of the machine, including an indication of the address of the next instruction to be executed, and then immediately transfer control to an interrupt handler which begins at some predetermined address. As another example, if an execution error such as divide-by-zero occurs during the execution of a particular instruction, the microprocessor may also save information related to the current state of the machine and transfer control to an exception handler. As yet another example, some microprocessors include a "software trap" instruction in their instruction set, which also causes the microprocessor to save information concerning the state of the machine and transfer control to an exception handler. As used herein, the terms interrupt, trap, fault and exception are used interchangeably.

In some microprocessors, an externally generated interrupt always causes the microprocessor to transfer control to the same interrupt handler entry point. If several external devices are present and able to activate the interrupt request lead, the interrupt handler must first determine which device caused the interrupt and then transfer control to a portion of code to handle that particular device. For example, the Intel 8048 microcontroller includes an INT input which, when activated, causes the microcontroller to transfer control to absolute memory location 3. The 8048 also includes a RESET input which, when activated, causes the microcontroller to transfer control to absolute memory location 0. It also includes an internal timer/counter which can generate interrupts which cause a transfer of control to absolute memory location 7. Other microprocessors include "interrupt level" leads in addition to the interrupt request lead. For these microprocessors, when an external device activates the interrupt request lead, it also places a trap number, unique to that particular device, on the interrupt level lines. The internal hardware of the microprocessor then transfers control, or "vectors", to any of several interrupt handlers, each corresponding to a different trap number. Similarly, some microprocessors have only a single predetermined entry point for all routines written to handle internally generated exceptions, and others have facilities for vectoring automatically to a routine dependent upon a trap number defined for each particular type of internal exception that might occur.

In the past, where interrupt and exception handlers were vectored, a number of different techniques were used to determine the entry point of the appropriate handler. In one technique, a table of addresses was created, beginning at a particular table base address which was either fixed or definable by the user. Each entry in the table was the same length as the length of an address, for example two or four bytes long, and contained the entry point for a corresponding trap number. When an interrupt or exception occurred, the microprocessor first determined the base address of the table, then added m times the trap number (where m is the number of bytes in each entry), and then loaded the information stored at the resulting address into the program counter (PC) to thereby transfer control to the routine beginning at the address specified in the table entry.
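
The address-table technique just described can be sketched as follows. This is a minimal illustration only, not taken from any particular processor: the table base address, the four-byte entry size, the little-endian byte order, and the names `read_word` and `vector_via_address_table` are all assumptions chosen for the example.

```python
# Hypothetical model of address-table vectoring; every value here is illustrative.
ENTRY_BYTES = 4       # m: bytes per table entry (assumed one 32-bit address)
TABLE_BASE = 0x1000   # assumed table base address


def read_word(memory, addr):
    """Read a 32-bit word (little-endian is an assumption) from a memory image."""
    return int.from_bytes(memory[addr:addr + 4], "little")


def vector_via_address_table(memory, trap_number):
    """Return the new PC: the handler entry point stored in the table entry."""
    entry_addr = TABLE_BASE + ENTRY_BYTES * trap_number
    # This memory read is the preliminary operation that precedes dispatch.
    return read_word(memory, entry_addr)


memory = bytearray(0x2000)
slot = TABLE_BASE + ENTRY_BYTES * 3
memory[slot:slot + 4] = (0x80000040).to_bytes(4, "little")  # handler for trap 3
new_pc = vector_via_address_table(memory, 3)
```

The handler cannot begin until the table entry has been read, which is the delay discussed in the paragraphs that follow.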

In other microprocessors, an entire branch instruction was stored in each entry in the table, instead of merely the address of a handler. The number of bytes in each entry was equal to the number of bytes in a branch instruction. When an interrupt or exception was received, the microprocessor would first determine the table base address, add m times the trap number, and simply load the result into the program counter. The first instruction then executed would be the branch instruction in the table, and control would finally transfer to the appropriate exception handler.

In both of the above techniques for vectoring to a handler, a delay is encountered because a preliminary operation must be performed before the operational part of the handler can begin execution. In the first above-mentioned technique, the entry point address first had to be retrieved from the table before it could be loaded into the program counter. In the second above-described technique, an entire preliminary branch instruction had to be retrieved and executed before the substantive part of the handler could begin executing. Adder delays could be eliminated in the calculation of the table base address plus m times the trap number, by merely concatenating high-order bits from the base address with the trap number itself as lower-order bits, followed by log₂ m zero bits, but the delays caused by the preliminary operations just described remained. Such delays can be detrimental in a system where the response time to handle certain types of interrupts is critical.
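
The equivalence between adding m times the trap number and concatenating bit fields can be checked with a short sketch. It assumes that m is a power of two and that the table base address is aligned so that its low-order bits are zero; the function names and the eight-bit trap number width are illustrative assumptions.

```python
def entry_address_add(base, trap_number, m):
    """Entry address computed with an adder: base + m * trap_number."""
    return base + m * trap_number


def entry_address_concat(base, trap_number, m, trap_bits):
    """Entry address formed by concatenation, with no adder involved.

    High-order bits come from the base, then the trap number,
    then log2(m) zero bits.
    """
    shift = m.bit_length() - 1                  # log2(m)
    assert m == 1 << shift, "m must be a power of two"
    assert 0 <= trap_number < (1 << trap_bits)
    assert base % (1 << (shift + trap_bits)) == 0, "base must be table-aligned"
    return base | (trap_number << shift)


# Assumed example: 4-byte entries, 8-bit trap numbers, base aligned to 1 KiB.
BASE, M, TRAP_BITS = 0x40000, 4, 8
mismatches = [t for t in range(1 << TRAP_BITS)
              if entry_address_add(BASE, t, M) != entry_address_concat(BASE, t, M, TRAP_BITS)]
```

Under these alignment assumptions the two computations agree for every trap number, so the adder can be replaced by simple wiring.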

Another problem related to exception handling in prior art microprocessors concerns the amount of information which must be stored to be able to reinstate the "state of the machine" if and when the trap handler returns to the main instruction flow. A tradeoff exists between the desire to store as much information as possible, and the desire to minimize the delay in dispatching to a trap handler. With respect to on-chip data registers in particular, one technique that has been used is to store none of the on-chip data registers, leaving it up to the handler to temporarily store the data in each register before it can use the register for its own purposes. The handler then had to replace the data in the register before returning. The need to store and restore these registers can slow the operation of the handler significantly. In another technique, the hardware automatically stores the contents of the registers on a stack before transferring control to the handler. This technique is also inadequate since it increases hardware complexity, and also can delay transfer to the handler significantly. Thus, with the vectoring techniques described above, the delays caused by existing techniques for protecting the contents of registers when a trap handler is invoked can be unacceptable in a high performance microprocessor.

SUMMARY OF THE INVENTION

According to the invention, a microprocessor architecture is employed which alleviates many of the above deficiencies in prior art systems. In particular, a "fast trap" exception dispatching technique is employed by which an entire handler can be stored in a single vector address table entry. Each table entry has enough space for at least two instructions, and preferably significantly more, so that when a fast trap occurs, the microprocessor need only branch to an address determined by concatenating m times the trap number to a base address. The delay required to fetch an entry point address from the table, or to fetch and execute a preliminary branch instruction, is eliminated. The microprocessor may also include other, less time efficient, vectoring techniques for less critical types of traps.
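
A minimal sketch of how a fast trap table with multi-instruction entries might be laid out is given below. The entry size of eight instruction slots, the string representation of instructions, and the names `fast_trap_entry` and `place_handler` are hypothetical; the text above requires only that each entry hold at least two instructions.

```python
INSTR_BYTES = 4        # 32-bit instructions
SLOTS_PER_ENTRY = 8    # assumed entry size; the text requires at least two
ENTRY_BYTES = INSTR_BYTES * SLOTS_PER_ENTRY   # plays the role of m


def fast_trap_entry(base, trap_number):
    """Address at which execution begins directly when the fast trap occurs."""
    return base + ENTRY_BYTES * trap_number


def place_handler(table, trap_number, handler, overflow_label="overflow"):
    """Store a handler in its table entry.

    A short handler fits entirely inside the entry. A longer one keeps its
    first instructions in the entry and ends the entry with a branch to the
    remaining code, which is returned to the caller for placement elsewhere.
    """
    if len(handler) <= SLOTS_PER_ENTRY:
        table[trap_number] = list(handler)
        return []
    table[trap_number] = list(handler[:SLOTS_PER_ENTRY - 1]) + [f"branch {overflow_label}"]
    return list(handler[SLOTS_PER_ENTRY - 1:])


table = {}
short_rest = place_handler(table, 1, ["i0", "i1", "return_from_trap"])
long_rest = place_handler(table, 2, [f"i{k}" for k in range(10)])
```

In both cases the first instruction executed after the trap is already useful handler code, rather than a table lookup or a preliminary branch.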

In another aspect of the invention, when a trap is encountered, the processor enters an interrupted state which automatically shifts a number of shadow registers to the foreground and shifts a corresponding set of foreground registers into the background. Register contents are not transferred; rather, the shadow registers are simply made available in place of the normal registers. Thus the handler has a set of registers immediately available for use without any need to be concerned about destroying data needed for the main instruction stream.

The above-mentioned HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE application describes an advanced microprocessor which prefetches instructions prior to the time they are executed, can handle out-of-order return of instruction prefetch requests, can execute more than one instruction during the same execution time, and can also execute instructions out of order relative to their sequence in the instruction stream. Another aspect of the present invention includes a mechanism to maintain the preciseness of synchronous exceptions which occur relative to instructions prior to and during the time they are executed.
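
The shadow register switch described above, in which registers are remapped rather than copied, might be modeled as follows. The register counts (32 registers, the top 8 shadowed) and the class name `RegisterFile` are assumptions made purely for illustration and are not stated in this passage.

```python
class RegisterFile:
    """Bank-switched registers: the interrupted state remaps, it never copies."""

    def __init__(self, num_regs=32, num_shadowed=8):   # counts are assumptions
        self.normal = [0] * num_regs
        self.shadow = [0] * num_shadowed
        self.first_shadowed = num_regs - num_shadowed
        self.interrupted = False

    def _select(self, index):
        # In the interrupted state, shadowed indices resolve to the shadow bank.
        if self.interrupted and index >= self.first_shadowed:
            return self.shadow, index - self.first_shadowed
        return self.normal, index

    def read(self, index):
        bank, slot = self._select(index)
        return bank[slot]

    def write(self, index, value):
        bank, slot = self._select(index)
        bank[slot] = value


rf = RegisterFile()
rf.write(30, 111)          # value belonging to the main instruction stream
rf.interrupted = True      # trap taken: shadow bank moves to the foreground
rf.write(30, 999)          # handler uses the register freely
handler_view = rf.read(30)
rf.interrupted = False     # return from trap: normal bank is visible again
main_view = rf.read(30)
```

Because only the selection changes, entering and leaving the interrupted state costs no register save or restore traffic in this model.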

The microprocessor architecture described in that application further includes facilities for handling a separate procedural instruction flow called via a procedural, or emulation, instruction in the main instruction flow. The transfer of control to a procedural instruction flow is accomplished without flushing any instructions already prefetched in the main instruction flow, by having a separate emulation instruction prefetch queue. According to another aspect of the invention, the interrupted state remains available whether the processor is executing from the main instruction stream or a procedural instruction stream, and the processor maintains an indication of which instruction stream to return to upon a return from trap. Further, separate prefetch program counters are maintained for the main and emulation instruction streams, and the processor stores only the prefetch PC from the current instruction stream when a trap handler is invoked, and restores it to the proper prefetch program counter when the handler returns.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other advantages and features of the present invention will become better understood upon consideration of the following detailed description of the invention when considered in connection with the accompanying drawings, in which like reference numerals designate like parts throughout the figures thereof, and wherein:

Figure 1 is a simplified block diagram of the preferred microprocessor architecture implementing the present invention;

Figure 2 is a detailed block diagram of the instruction fetch unit constructed in accordance with the present invention;

Figure 3 is a block diagram of the program counter logic unit constructed in accordance with the present invention;

Figure 4 is a further detailed block diagram of the program counter data and control path logic;

Figure 5 is a simplified block diagram of the instruction execution unit of the present invention;

Figure 6a is a simplified block diagram of the register file architecture utilized in a preferred embodiment of the present invention;

Figure 6b is a graphic illustration of the storage register format of the temporary buffer register file as utilized in a preferred embodiment of the present invention;

Figure 6c is a graphic illustration of the primary and secondary instruction sets as present in the last two stages of the instruction FIFO unit of the present invention;

Figures 7a-c provide a graphic illustration of the reconfigurable states of the primary integer register set as provided in accordance with a preferred embodiment of the present invention;

Figure 8 is a graphic illustration of a reconfigurable floating point and secondary integer register set as provided in accordance with the preferred embodiment of the present invention;

Figure 9 is a graphic illustration of a tertiary boolean register set as provided in a preferred embodiment of the present invention;

Figure 10 is a detailed block diagram of the primary integer processing data path portion of the instruction execution unit constructed in accordance with the preferred embodiment of the present invention;

Figure 11 is a detailed block diagram of the primary floating point data path portion of the instruction execution unit constructed in accordance with a preferred embodiment of the present invention;

Figure 12 is a detailed block diagram of the boolean operation data path portion of the instruction execution unit as constructed in accordance with the preferred embodiment of the present invention;

Figure 13 is a detailed block diagram of a load/store unit constructed in accordance with the preferred embodiment of the present invention;

Figure 14 is a timing diagram illustrating the preferred sequence of operation of a preferred embodiment of the present invention in executing multiple instructions in accordance with the present invention;

Figure 15 is a simplified block diagram of the virtual memory control unit as constructed in accordance with the preferred embodiment of the present invention;

Figure 16 is a graphic representation of the virtual memory control algorithm as utilized in a preferred embodiment of the present invention; and

Figure 17 is a simplified block diagram of the cache control unit as utilized in a preferred embodiment of the present invention.

DETAILED DESCRIPTION

I. Microprocessor Architectural Overview
II. Instruction Fetch Unit
  A) IFU Data Path
  B) IFU Control Path
  C) IFU/IEU Control Interface
  D) PC Logic Unit Detail
    1) PF and ExPC Control/Data Unit Detail
    2) PC Control Algorithm Detail
  E) Interrupt and Exception Handling
    1) Overview
    2) Asynchronous Interrupts
    3) Synchronous Exceptions
    4) Handler Dispatch and Return
    5) Nesting
    6) List of Traps
III. Instruction Execution Unit
  A) IEU Data Path Detail
    1) Register File Detail
    2) Integer Data Path Detail
    3) Floating Point Data Path Detail
    4) Boolean Register Data Path Detail
  B) Load/Store Control Unit
  C) IEU Control Path Detail
    1) EDecode Unit Detail
    2) Carry Checker Unit Detail
    3) Data Dependency Checker Unit Detail
    4) Register Rename Unit Detail
    5) Instruction Issuer Unit Detail
    6) Done Control Unit Detail
    7) Retirement Control Unit Detail
    8) Control Flow Control Unit Detail
    9) Bypass Control Unit Detail
IV. Virtual Memory Control Unit
V. Cache Control Unit
VI. Summary/Conclusion

I. Microprocessor Architectural Overview:

The architecture 100 of the present invention is generally shown in Figure 1. An Instruction Fetch Unit (IFU) 102 and an Instruction Execution Unit (IEU) 104 are the principal operative elements of the architecture 100. A Virtual Memory Unit (VMU) 108, Cache Control Unit (CCU) 106, and Memory Control Unit (MCU) 110 are provided to directly support the function of the IFU 102 and IEU 104. A Memory Array Unit (MAU) 112 is also provided as a generally essential element for the operation of the architecture 100, though the MAU 112 does not directly exist as an integral component of the architecture 100. That is, in the preferred embodiments of the present invention, the IFU 102, IEU 104, VMU 108, CCU 106, and MCU 110 are fabricated on a single silicon die utilizing a conventional 0.8 micron design rule low-power CMOS process and comprising some 1,200,000 transistors. The standard processor or system clock speed of the architecture 100 is 40 MHz. However, in accordance with a preferred embodiment of the present invention, the internal processor clock speed is 160 MHz.

The IFU 102 is primarily responsible for the fetching of instructions, the buffering of instructions pending execution by the IEU 104, and, generally, the calculation of the next virtual address to be used for the fetching of next instructions.

In the preferred embodiments of the present invention, instructions are each fixed at a length of 32 bits. Instruction sets, or "buckets" of four instructions, are fetched by the IFU 102 simultaneously from an instruction cache 132 within the CCU 106 via a 128 bit wide instruction bus 114. The transfer of instruction sets is coordinated between the IFU 102 and CCU 106 by control signals provided via a control bus 116. The virtual address of an instruction set to be fetched is provided by the IFU 102 via an IFU combined arbitration, control and address bus 118 onto a shared arbitration, control and address bus 120 further coupled between the IEU 104 and VMU 108. Arbitration for access to the VMU 108 arises from the fact that both the IFU 102 and IEU 104 utilize the VMU 108 as a common, shared resource. In the preferred embodiment of the architecture 100, the low order bits defining an address within a physical page of the virtual address are transferred directly by the IFU 102 to the Cache Control Unit 106 via the control lines 116. The virtualizing, high order bits of the virtual address supplied by the IFU 102 are provided by the address portion of the buses 118, 120 to the VMU 108 for translation into a corresponding physical page address. For the IFU 102, this physical page address is transferred directly from the VMU 108 to the Cache Control Unit 106 via the address control lines 122 one-half internal processor cycle after the translation request is placed with the VMU 108.
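
The split of a virtual address between the two units can be illustrated with a short sketch. The page size is not given in this passage, so a 4 KiB page (12 offset bits) is assumed here, and `split_virtual_address`, `physical_address` and the `translate` callback standing in for the VMU are hypothetical names.

```python
PAGE_OFFSET_BITS = 12   # assumed 4 KiB page; the passage does not state the size
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1


def split_virtual_address(vaddr):
    """Return (virtual page number, page offset).

    The page offset models the low order bits sent straight to the cache
    control unit; the virtual page number models the high order bits sent
    to the virtual memory unit for translation.
    """
    return vaddr >> PAGE_OFFSET_BITS, vaddr & OFFSET_MASK


def physical_address(vaddr, translate):
    """Combine the translated page with the untranslated offset."""
    vpage, offset = split_virtual_address(vaddr)
    return (translate(vpage) << PAGE_OFFSET_BITS) | offset


vpage, offset = split_virtual_address(0x12345678)
paddr = physical_address(0x12345678, lambda page: 0xABCDE)  # stand-in translation
```

Because the offset needs no translation, the cache lookup can begin with the low order bits while the page translation is still in progress.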

The instruction stream fetched by the IFU 102 is, in turn, provided via an instruction stream bus 124 to the IEU 104. Control signals are exchanged between the IFU 102 and the IEU 104 via control lines 126. In addition, certain instruction fetch addresses, typically those requiring access to the register file present within the IEU 104, are provided back to the IFU via a target address return bus within the control lines 126.

The IEU 104 stores and retrieves data with respect to a data cache 134 provided within the CCU 106 via an 80-bit wide bi-directional data bus 130. The entire physical address for IEU data accesses is provided via an address portion of the control bus 128 to the CCU 106. The control bus 128 also provides for the exchange of control signals between the IEU 104 and CCU 106 for managing data transfers. The IEU 104 utilizes the VMU 108 as a resource for converting virtual data addresses into physical data addresses suitable for submission to the CCU 106. The virtualizing portion of the data address is provided via the arbitration, control and address bus 120 to the VMU 108. Unlike operation with respect to the IFU 102, the VMU 108 returns the corresponding physical address via the bus 120 to the IEU 104. In the preferred embodiments of the architecture 100, the IEU 104 requires the physical address for use in ensuring that load/store operations occur in proper program stream order.

The CCU 106 performs the generally conventional high-level function of determining whether physical address defined requests for data can be satisfied from the instruction and data caches 132, 134, as appropriate. Where the access request can be properly fulfilled by access to the instruction or data caches 132, 134, the CCU 106 coordinates and performs the data transfer via the data buses 114, 128.

Where a data access request cannot be satisfied from the instruction or data caches 132, 134, the CCU 106 provides the corresponding physical address to the MCU 110 along with sufficient control information to identify whether a read or write access of the MAU 112 is desired, the source or destination cache 132, 134 of the CCU 106 for each request, and additional identifying information to allow the request operation to be correlated with the ultimate data request as issued by the IFU 102 or IEU 104. The MCU 110 preferably includes a port switch unit 142 that is coupled by a uni-directional data bus 136 with the instruction cache 132 of the CCU 106 and a bi-directional data bus 138 to the data cache 134. The port switch 142 is, in essence, a large multiplexer allowing a physical address obtained from the control bus 140 to be routed to any one of a number of ports P₀-P_N 146 and the bi-directional transfer of data from the ports to the data buses 136, 138. Each memory access request processed by the MCU 110 is associated with one of the ports 146 for purposes of arbitrating for access to the main system memory bus 162 as required for an access of the MAU 112. Once a data transfer connection has been established, the MCU provides control information via the control bus 140 to the CCU 106 to initiate the transfer of data between either the instruction or data cache 132, 134 and MAU 112 via the port switch 142 and the corresponding one of the ports 146. In accordance with the preferred embodiments of the architecture 100, the MCU 110 does not actually store or latch data in transit between the CCU 106 and MAU 112. This is done to minimize latency in the transfer and to obviate the need for tracking or managing data that may be uniquely present in the MCU 110.

II. Instruction Fetch Unit:

The primary elements of the Instruction Fetch Unit 102 are shown in Figure 2. The operation and interrelationship of these elements can best be understood by considering their participation in the IFU data and control paths.

A) IFU Data Path:

The IFU data path begins with the instruction bus 114 that receives instruction sets for temporary storage in a prefetch buffer 260. An instruction set from the prefetch buffer 260 is passed through an IDecode unit 262 and then to an IFIFO unit 264. Instruction sets stored in the last two stages of the instruction FIFO 264 are continuously available, via the data buses 278, 280, to the IEU 104. The prefetch buffer unit 260 receives a single instruction set at a time from the instruction bus 114. The full 128 bit wide instruction set is generally written in parallel to one of four 128 bit wide prefetch buffer locations in a Main Buffer (MBUF) 188 portion of the prefetch buffer 260. Up to four additional instruction sets may be similarly written into two 128 bit wide Target Buffer (TBUF) 190 prefetch buffer locations or to two 128 bit wide Procedural Buffer (EBUF) 192 prefetch buffer locations. In the preferred architecture 100, an instruction set in any one of the prefetch buffer locations within the MBUF 188, TBUF 190 or EBUF 192 may be transferred to the prefetch buffer output bus 196. In addition, a direct fall through instruction set bus 194 is provided to connect the instruction bus 114 directly with the prefetch buffer output bus 196, thereby bypassing the MBUF, TBUF and EBUF 188, 190, 192.

In the preferred architecture 100, the MBUF 188 is utilized to buffer instruction sets in the nominal or main instruction stream. The TBUF 190 is utilized to buffer instruction sets fetched from a tentative target branch instruction stream. Consequently, the prefetch buffer unit 260 allows both possible instruction streams following a conditional branch instruction to be prefetched. This facility obviates the latency for further accesses to at least the CCU 106, if not the substantially greater latency of the MAU 112, for obtaining the correct next instruction set for execution following a conditional branch instruction regardless of the particular instruction stream eventually selected upon resolution of the conditional branch instruction. In the preferred architecture 100, the provision of the MBUF 188 and TBUF 190 allows the instruction fetch unit 102 to prefetch both potential instruction streams and, as will be discussed below in relationship to the instruction execution unit 104, to further allow execution of the presumed correct instruction stream. Where, upon resolution of the conditional branch instruction, the correct instruction stream has been prefetched into the MBUF 188, any instruction sets in the TBUF 190 may be simply invalidated. Alternatively, where instruction sets of the correct instruction stream are present in the TBUF 190, the instruction prefetch buffer unit 260 provides for the direct, lateral transfer of those instruction sets from the TBUF 190 to respective buffer locations in the MBUF 188. The prior MBUF 188 stored instruction sets are effectively invalidated by being overwritten by the TBUF 190 transferred instruction sets. Where there is no TBUF instruction set transferred to an MBUF location, that location is simply marked invalid.
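
The two outcomes of branch resolution described above can be sketched as a small state update. This is an illustrative model only: `None` stands for an invalid buffer location, the mapping of TBUF location i onto MBUF location i is an assumption about what "respective" means, and clearing the TBUF after a lateral transfer is likewise assumed.

```python
MBUF_SLOTS = 4   # four MBUF locations, per the data path description
TBUF_SLOTS = 2   # two TBUF locations, per the data path description


def resolve_branch(mbuf, tbuf, taken):
    """Return the (mbuf, tbuf) contents after a conditional branch resolves.

    None marks an invalid location. Index-to-index mapping is an assumption.
    """
    if not taken:
        # Correct stream is already in the MBUF; discard the tentative target sets.
        return list(mbuf), [None] * TBUF_SLOTS
    # Lateral transfer: TBUF sets overwrite the MBUF; uncovered locations go invalid.
    new_mbuf = [tbuf[i] if i < TBUF_SLOTS else None for i in range(MBUF_SLOTS)]
    return new_mbuf, [None] * TBUF_SLOTS


not_taken = resolve_branch(["m0", "m1", "m2", "m3"], ["t0", "t1"], taken=False)
taken = resolve_branch(["m0", "m1", "m2", "m3"], ["t0", None], taken=True)
```

In neither case is a new fetch from the cache needed before the correct stream can continue, which is the latency benefit the paragraph describes.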

Similarly, the EBUF 192 is provided as another, alternate prefetch path through the prefetch buffer 260. The EBUF 192 is preferably utilized in the prefetching of an alternate instruction stream that is used to implement an operation specified by a single instruction, a "procedural" instruction, encountered in the MBUF 188 instruction stream. In this manner, complex or extended instructions can be implemented through software routines, or procedures, and processed through the prefetch buffer unit 260 without disturbing the instruction streams already prefetched into the MBUF 188. Although the present invention generally permits handling of procedural instructions that are first encountered in the TBUF 190, prefetching of the procedural instruction stream is held until all prior pending conditional branch instructions are resolved. This allows conditional branch instructions occurring in the procedural instruction stream to be consistently handled through the use of the TBUF 190. Thus, where a branch is taken in the procedural stream, the target instruction sets will have been prefetched into the TBUF 190 and can be simply laterally transferred to the EBUF 192.

Finally, each of the MBUF 188, TBUF 190 and EBUF 192 is coupled to the prefetch buffer output bus 196 so as to provide any instruction set stored by the prefetch unit onto the output bus 196. In addition, a flow through bus 194 is provided to transfer an instruction set from the instruction bus 114 directly to the output bus 196.

In the preferred architecture 100, the prefetch buffers within the MBUF 188, TBUF 190, EBUF 192 do not directly form a FIFO structure. Instead, the provision of connectivity from any buffer location to the output bus 196 allows substantial freedom in the prefetch ordering of instruction sets retrieved from the instruction cache 132. That is, the instruction fetch unit 102 generally determines and requests instruction sets in the appropriate instruction stream order of instructions. However, instruction sets are allowed to be returned to the IFU 102 out-of-order, as appropriate to match the circumstances where some requested instruction sets are available and accessible from the CCU 106 alone and others require an access of the MAU 112. Although instruction sets may not be returned in order to the prefetch buffer unit 260, the sequence of instruction sets output on the output bus 196 must generally conform to the order of instruction set requests issued by the IFU 102; that is, the in-order instruction stream sequence subject to, for example, tentative execution of a target branch stream.
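
The requirement that sets may arrive out of order but must leave in request order can be sketched as follows. The function name `drain_in_order` and the use of small integers as instruction set identifiers are assumptions for the example; the sketch ignores buffer capacity and branch streams.

```python
def drain_in_order(request_order, arrivals):
    """Return the sequence of sets placed on the output bus.

    request_order: identifiers in the order the fetch unit requested them.
    arrivals: the same identifiers in the order the cache returned them.
    A set is output only once every earlier request has been output.
    """
    arrived = set()
    output = []
    next_index = 0
    for set_id in arrivals:
        arrived.add(set_id)   # any buffer location can hold the late or early set
        while next_index < len(request_order) and request_order[next_index] in arrived:
            output.append(request_order[next_index])
            next_index += 1
    return output


in_order_output = drain_in_order([0, 1, 2, 3], [2, 0, 1, 3])
stalled_output = drain_in_order([0, 1, 2], [2, 1])   # set 0 never arrives
```

The second call shows the stall case: nothing is output while the oldest outstanding request is still missing, even though later sets are already buffered.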

The IDecode unit 262 receives the instruction sets, generally one per cycle, IFIFO unit 264 space permitting, from the prefetch buffer output bus 196. Each set of four instructions that make up a single instruction set is decoded in parallel by the IDecode unit 262. While relevant control flow information is extracted via lines 318 for the benefit of the control path portion of the IFU 102, the contents of the instruction set are not altered by the IDecode unit 262.

Instruction sets from the IDecode unit 262 are provided onto a 128 bit wide input bus 198 of the IFIFO unit 264. Internally, the IFIFO unit 264 consists of a sequence of master/slave registers 200, 204, 208, 212, 216, 220, 224. Each register is coupled to its successor to allow the contents of the master registers 200, 208, 216 to be transferred during a first half internal processor cycle of FIFO operation to the slave registers 204, 212, 220 and then to the next successive master register 208, 216, 224 during the succeeding half-cycle of operation. The input bus 198 is connected to the input of each of the master registers 200, 208, 216, 224 to allow loading of an instruction set from the IDecode unit 262 directly into a master register during the second half-cycle of FIFO operation. However, loading of a master register from the input bus 198 need not occur simultaneously with a FIFO shift of data within the IFIFO unit 264. Consequently, the IFIFO unit 264 can be continuously filled from the input bus 198 regardless of the current depth of instruction sets stored within the instruction FIFO unit 264 and, further, independent of the FIFO shifting of data through the IFIFO unit 264. Each of the master/slave registers 200, 204, 208, 212, 216, 220, 224, in addition to providing for the full parallel storage of a 128 bit wide instruction set, also provides for the storage of several bits of control information in the respective control registers 202, 206, 210, 214, 218, 222, 226. The preferred set of control bits includes exception miss and exception modify (VMU), no memory (MCU), and branch bias, stream, and offset (IFU). This control information originates from the control path portion of the IFU 102 simultaneously with the loading of an IFIFO master register with a new instruction set from the input bus 198. Thereafter, the control register information is shifted in parallel concurrently with the instruction sets through the IFIFO unit 264.
Finally, in the preferred architecture 100, the output of instruction sets from the IFIFO unit 264 is obtained simultaneously from the last two master registers 216, 224 on the I_Bucket_0 and I_Bucket_1 instruction set output buses 278, 280. In addition, the corresponding control register information is provided on the IBASV0 and IBASV1 control field buses 282, 284. These output buses 278, 282, 280, 284 are all provided as the instruction stream bus 124 to the IEU 104.

B) IFU Control Path:

The control path for the IFU 102 directly supports the operation of the prefetch buffer unit 260, IDecode unit 262 and IFIFO unit 264. A prefetch control logic unit 266 primarily manages the operation of the prefetch buffer unit 260. The prefetch control logic unit 266, and the IFU 102 in general, receive the system clock signal via the clock line 290 for synchronizing IFU operations with those of the IEU 104, CCU 106 and VMU 108. Control signals appropriate for the selection and writing of instruction sets into the MBUF 188, TBUF 190 and EBUF 192 are provided on the control lines 304.

A number of control signals are provided on the control lines 316 to the prefetch control logic unit 266. Specifically, a fetch request control signal is provided to initiate a prefetch operation. Other control signals provided on the control line 316 identify the intended destination of the requested prefetch operation as being the MBUF 188, TBUF 190 or EBUF 192. In response to a prefetch request, the prefetch control logic unit 266 generates an ID value and determines whether the prefetch request can be posted to the CCU 106. Generation of the ID value is accomplished through the use of a circular four-bit counter.

The use of a four-bit counter is significant in three regards. The first is that, typically a maximum of nine instruction sets may be active at one time in the prefetch buffer unit 260; four instruction sets in the MBUF 188, two in the TBUF 190, two in the EBUF 192 and one provided directly to the IDecode unit 262 via the flow through bus 194. Secondly, instruction sets include four instructions of four bytes each. Consequently, the least significant four bits of any address selecting an instruction set for fetching are superfluous. Finally, the prefetch request ID value can be easily associated with a prefetch request by insertion as the least significant four bits of the prefetch request address; thereby reducing the total number of address lines required to interface with the CCU 106.
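The ID embedding described above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the specification; the only facts assumed from the text are that an instruction set is sixteen bytes, so the low four address bits are always zero, and that the ID comes from a circular four-bit counter.

```python
ID_BITS = 4
ID_MASK = (1 << ID_BITS) - 1  # 0xF, the four superfluous low-order bits


def next_id(current_id: int) -> int:
    """Advance a circular four-bit counter (15 wraps to 0)."""
    return (current_id + 1) & ID_MASK


def tag_prefetch_address(set_address: int, request_id: int) -> int:
    """Insert the request ID as the low four bits of an aligned address."""
    if set_address & ID_MASK:
        raise ValueError("instruction set address must be 16-byte aligned")
    if not 0 <= request_id <= ID_MASK:
        raise ValueError("request ID must fit in four bits")
    return set_address | request_id


def untag_prefetch_address(tagged: int) -> tuple[int, int]:
    """Recover (aligned instruction set address, request ID)."""
    return tagged & ~ID_MASK, tagged & ID_MASK
```

Because the ID occupies bits that carry no address information, no separate ID lines are needed alongside the address in this sketch.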

To allow instruction sets to be returned by the CCU 106 out-of-order with respect to the sequence of prefetch requests issued by the IFU 102, the architecture 100 provides for the return of the ID request value with the return of instruction sets from the CCU 106. However, the out-of-order instruction set return capability may result in exhaustion of the sixteen unique IDs. A combination of conditional instructions executed out-of-order, resulting in additional prefetches and instruction sets requested but not yet returned, can lead to potential re-use of an ID value. Therefore, the four-bit counter is preferably held, and no further instruction set prefetch requests issued, where the next ID value would be the same as that associated with an as yet outstanding fetch request or another instruction set then pending in the prefetch buffer 260.

The prefetch control logic unit 266 directly manages a prefetch status array 268 which contains status storage locations logically corresponding to each instruction set prefetch buffer location within the MBUF 188, TBUF 190 and EBUF 192. The prefetch control logic unit 266, via selection and data lines 306, can scan, read and write data to the status register array 268. Within the array 268, a main buffer register 308 provides for storage of four four-bit ID values (MB ID), four single-bit reserved flags (MB RES) and four single-bit valid flags (MB VAL), each corresponding by logical bit-position to the respective instruction set storage locations within the MBUF 188. Similarly, a target buffer register 310 and extended buffer register 312 each provide for the storage of two four-bit ID values (TB ID, EB ID), two single-bit reserved flags (TB RES, EB RES), and two single-bit valid flags (TB VAL, EB VAL). Finally, a flow through status register 314 provides for the storage of a single four-bit ID value (FT ID), a single reserved flag bit (FT RES), and a single valid flag bit (FT VAL).

The status register array 268 is first scanned and, as appropriate, updated by the prefetch control logic unit 266 each time a prefetch request is placed with the CCU 106 and subsequently scanned and updated each time an instruction set is returned. Specifically, upon receipt of the prefetch request signal via the control lines 316, the prefetch control logic unit 266 increments the current circular counter generated ID value, scans the status register array 268 to determine whether the ID value is available for use and whether a prefetch buffer location of the type specified by the prefetch request signal is available, examines the state of the CCU IBUSY control line 300 to determine whether the CCU 106 can accept a prefetch request and, if so, asserts a CCU IREAD control signal on the control line 298, and places the incremented ID value on the CCU ID out bus 294 to the CCU 106. A prefetch storage location is available for use where both of the corresponding reserved and valid status flags are false. The prefetch request ID is written into the ID storage location within the status register array 268 corresponding to the intended storage location within the MBUF 188, TBUF 190, or EBUF 192 concurrent with the placement of the request with the CCU 106. In addition, the corresponding reserved status flag is set true.

When the CCU 106 is able to return a previously requested instruction set to the IFU 102, the CCU IREADY signal is asserted on control line 302 and the corresponding instruction set ID is provided on the CCU ID control lines 296. The prefetch control logic unit 266 scans the ID values and reserved flags within the status register array 268 to identify the intended destination of the instruction set within the prefetch buffer unit 260. Only a single match is possible. Once identified, the instruction set is written via the bus 114 into the appropriate location within the prefetch buffer unit 260 or, if identified as a flow through request, provided directly to the IDecode unit 262. In either case, the valid status flag in the corresponding status register array is set true.
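The request and return bookkeeping described above can be modelled, under stated assumptions, as follows. The class and method names are hypothetical. Two behaviours are assumptions rather than statements from the text: the counter is advanced only when a request is actually posted, and the reserved flag is cleared at the moment the returned instruction set is marked valid.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    ident: int = 0
    reserved: bool = False
    valid: bool = False


class PrefetchStatusArray:
    """Toy model of the status array: 4 MBUF, 2 TBUF, 2 EBUF, 1 flow-through."""

    def __init__(self) -> None:
        sizes = {"MBUF": 4, "TBUF": 2, "EBUF": 2, "FT": 1}
        self.slots = {k: [Slot() for _ in range(n)] for k, n in sizes.items()}
        self.counter = 0  # circular four-bit ID counter

    def _id_in_use(self, ident: int) -> bool:
        return any(s.ident == ident and (s.reserved or s.valid)
                   for group in self.slots.values() for s in group)

    def request(self, kind: str, ccu_busy: bool = False):
        """Post a prefetch; return (id, slot index) or None if it must wait."""
        candidate = (self.counter + 1) & 0xF
        if ccu_busy or self._id_in_use(candidate):
            return None  # counter is held; no request issued
        for index, slot in enumerate(self.slots[kind]):
            if not slot.reserved and not slot.valid:  # location is free
                slot.ident, slot.reserved = candidate, True
                self.counter = candidate
                return candidate, index
        return None

    def on_return(self, ident: int):
        """Match a returned ID to its single reserved slot and mark it valid."""
        matches = [(kind, i) for kind, group in self.slots.items()
                   for i, s in enumerate(group)
                   if s.reserved and s.ident == ident]
        if len(matches) != 1:
            raise LookupError("expected exactly one reserved slot for this ID")
        kind, index = matches[0]
        slot = self.slots[kind][index]
        slot.reserved, slot.valid = False, True  # assumption: reserved cleared here
        return kind, index
```

Because an ID is never reissued while a slot holding it is reserved or valid, the scan in `on_return` can find at most one match, which mirrors the single-match property stated in the text.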

The PC logic unit 270, as will be described below in greater detail, tracks the virtual address of the MBUF 188, TBUF 190 and EBUF 192 instruction streams through the entirety of the IFU 102. In performing this function, the PC logic block 270 both controls and operates from the IDecode unit 262. Specifically, portions of the instructions decoded by the IDecode unit 262 potentially relevant to a change in the program instruction stream flow are provided on the bus 318 to a control flow detection unit 274 and directly to the PC logic block 270. The control flow detection unit 274 identifies each instruction in the decoded instruction set that constitutes a control flow instruction, including conditional and unconditional branch instructions, call type instructions, software traps, procedural instructions and various return instructions.

The control flow detection unit 274 provides a control signal, via lines 322, to the PC logic unit 270 to identify the location and specific nature of the control flow instructions within the instruction set present in the IDecode unit 262. The PC logic unit 270, in turn, determines the target address of the control flow instruction, typically from data provided within the instruction and transferred to the PC logic unit via lines 318. Where, for example, a branch logic bias has been selected to execute ahead for conditional branch instructions, the PC logic unit 270 will begin to direct and separately track the prefetching of instruction sets from the conditional branch instruction target address. Thus, with the next assertion of a prefetch request on the control lines 316, the PC logic unit 270 will further assert a control signal, via lines 316, selecting the destination of the prefetch to be the TBUF 190, assuming that prior prefetch instruction sets were directed to the MBUF 188 or EBUF 192. Once the prefetch control logic unit 266 determines that a prefetch request can be supplied to the CCU 106, the prefetch control logic unit 266 provides an enabling signal, again via lines 316, to the PC logic unit 270 to enable the provision of a page offset portion of the target address (CCU PADDR [13:4]) via the address lines 324 directly to the CCU 106. At the same time, the PC logic unit 270, where a new virtual to physical page translation is required further provides a VMU request signal via control line 328 and the virtualizing portion of the target address (VMU VADDR [31:14]) via the address lines 326 to the VMU 108 for translation into a physical address. Where a page translation is not required, no operation by the VMU 108 is required. Rather, the previous translation result is maintained in an output latch coupled to the bus 122 for immediate use by the CCU 106.
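As an illustration of the address split named above, the following sketch extracts the two fields from a 32-bit virtual address. The function name is hypothetical; the bit ranges [13:4] and [31:14] are the ones given in the text.

```python
def split_prefetch_address(vaddr: int) -> tuple[int, int]:
    """Return (virtual page bits [31:14], page offset bits [13:4])."""
    page_offset = (vaddr >> 4) & 0x3FF       # 10 bits sent to the CCU
    virtual_page = (vaddr >> 14) & 0x3FFFF   # 18 bits sent to the VMU
    return virtual_page, page_offset
```

For example, the address 0x00014010 lies in virtual page 5 and selects the second sixteen-byte instruction set within that page, so the function returns `(5, 1)`.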

Operational errors in the VMU 108 in performing the virtual to physical translation requested by the PC logic unit 270 are reported via the VMU exception and VMU miss control lines 332, 334. The VMU miss control line 334 reports a translation lookaside buffer (TLB) miss. The VMU exception control signal, on VMU exception line 332, is raised for all other exceptions. In both cases, the PC logic unit handles the error condition by storing the current execution point in the instruction stream and then prefetching, as if in response to an unconditional branch, a dedicated exception handling routine instruction stream for diagnosing and handling the error condition. The VMU exception and miss control signals identify the general nature of the exception encountered, thereby allowing the PC logic unit 270 to identify the prefetch address of a corresponding exception handling routine.

The IFIFO control logic unit 272 is provided to directly support the IFIFO unit 264. Specifically, the PC logic unit 270 provides a control signal via the control lines 336 to signal the IFIFO control logic unit 272 that an instruction set is available on the input bus 198 from the IDecode unit 262. The IFIFO control unit 272 is responsible for selecting the deepest available master register 200, 208, 216, 224 for receipt of the instruction set. The output of each of the master control registers 202, 210, 218, 226 is provided to the IFIFO control unit 272 via the control bus 338. The control bits stored by each master control register include a two-bit buffer address (IF_Bx_ADR), a single stream indicator bit (IF_Bx_STRM), and a single valid bit (IF_Bx_VLD). The two-bit buffer address identifies the first valid instruction within the corresponding instruction set. That is, instruction sets returned by the CCU 106 may not be aligned such that the target instruction of a branch operation, for example, is located in the initial instruction location within the instruction set.
Thus, the buffer address value is provided to uniquely identify the initial instruction within an instruction set that is to be considered for execution. The stream bit is used essentially as a marker to identify the location of instruction sets containing conditional control flow instructions, and giving rise to potential control flow changes, in the stream of instructions through the IFIFO unit 264. The main instruction stream is processed through the MBUF 188 generally with a stream bit value of 0. On the occurrence of a relative conditional branch instruction, for example, the corresponding instruction set is marked with a stream bit value of 1. The conditional branch instruction is detected by the IDecode unit 262. Up to four conditional control flow instructions may be present in the instruction set. The instruction set is then stored in the deepest available master register of the IFIFO unit 264. In order to determine the target address of the conditional branch instruction, the current IEU 104 execution point address (DPC), the relative location of the conditional instruction containing instruction set as identified by the stream bit, and the conditional instruction location offset in the instruction set, as provided by the control flow detector 274, are combined with the relative branch offset value as obtained from a corresponding branch instruction field via control lines 318. The result is a branch target virtual address that is stored by the PC logic unit 270. The initial instruction sets of the target instruction stream may then be prefetched into the TBUF 190 utilizing this address. Depending on the branch bias preselected for the PC logic unit 270, the IFIFO unit 264 will continue to be loaded from either the MBUF 188 or TBUF 190. If a second instruction set containing one or more conditional flow instructions is encountered, the instruction set is marked with a stream bit value of 0.
Since a second target stream cannot be fetched, the target address is calculated and stored by the PC logic unit 270, but no prefetch is performed. In addition, no further instruction sets can be processed through the IDecode unit 262, or at least none that are found to contain a conditional flow control instruction.

The PC logic unit 270, in the preferred embodiments of the present invention, can manage up to eight conditional flow instructions occurring in up to two instruction sets. The target addresses for each of the two instruction sets marked by stream bit changes are stored in an array of four address registers with each target address positioned logically with respect to the location of the corresponding conditional flow instruction in the instruction set.

Once the branch result of the first in-order conditional flow instruction is resolved, the PC logic unit 270 will direct the prefetch control unit 266, via control signals on lines 316, to transfer the contents of the TBUF 190 to the MBUF 188, if the branch is taken, and to mark invalid the contents of the TBUF 190. Any instruction sets in the IFIFO unit 264 from the incorrect instruction stream (the target stream if the branch is not taken, the main stream if the branch is taken) are cleared from the IFIFO unit 264. If a second or subsequent conditional flow control instruction exists in the first stream bit marked instruction set, that instruction is handled in a consistent manner: the instruction sets from the target stream are prefetched, instruction sets from the MBUF 188 or TBUF 190 are processed through the IDecode unit 262 depending on the branch bias, and the IFIFO unit 264 is cleared of incorrect stream instruction sets when the conditional flow instruction finally resolves.

If a secondary conditional flow instruction set remains in the IFIFO unit 264 once the IFIFO unit 264 is cleared of incorrect stream instruction sets, and the first conditional flow instruction set contains no further conditional flow instructions, the target addresses of the second stream bit marked instruction set are promoted to the first array of address registers. In any case, a next instruction set containing conditional flow instructions can then be evaluated through the IDecode unit 262. Thus, the toggle usage of the stream bit allows potential control flow changes to be marked and tracked through the IFIFO unit 264 for purposes of calculating branch target addresses and for marking the instruction set location above which to clear where the branch bias is subsequently determined to have been incorrect for a particular conditional flow control instruction.

Rather than actually clearing instruction sets from the master registers, the IFIFO control logic unit 272 simply resets the valid bit flag in the control registers of the corresponding master registers of the IFIFO unit 264. The clear operation is instigated by the PC logic unit 270 in a control signal provided on lines 336. The inputs of each of the master control registers 202, 210, 218, 226 are directly accessible by the IFIFO control logic unit 272 via the status bus 230. In the preferred architecture 100, the bits within these master control registers 202, 210, 218, 226 may be set by the IFIFO control unit 272 concurrent with or independent of a data shift operation by the IFIFO unit 264. This capability allows an instruction set to be written into any of the master registers 200, 208, 216, 224, and the corresponding status information to be written into the master control registers 202, 210, 218, 226 asynchronously with respect to the operation of the IEU 104.
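The clear-by-invalidation idea can be sketched as follows. The names are hypothetical, and the list ordering (index 0 is the oldest entry, nearest the IFIFO output) is an assumption made only for this illustration; the point shown is that a clear touches the valid flags and leaves the stored instruction sets in place.

```python
from dataclasses import dataclass


@dataclass
class IFifoEntry:
    instruction_set: bytes   # 16 bytes, four 4-byte instructions
    stream: int              # stream bit marker
    valid: bool = True


def flush_after(entries: list[IFifoEntry], marker_index: int) -> None:
    """Invalidate every entry loaded after entries[marker_index].

    Only the valid flags are reset; instruction data is not erased.
    """
    for entry in entries[marker_index + 1:]:
        entry.valid = False
```

An invalidated entry can simply be overwritten by the next instruction set loaded from the input bus, so no separate erase step is needed in this model.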

Finally, an additional control line on the control and status bus 230 enables and directs the FIFO operation of the IFIFO unit 264. An IFIFO shift is performed by the IFIFO control logic unit 272 in response to the shift request control signal provided by the PC logic unit 270 via the control lines 336. The IFIFO control unit 272, based on the availability of a master register 200, 208, 216, 224 to receive an instruction set, provides a control signal, via lines 316, to the prefetch control unit 266 to request the transfer of a next appropriate instruction set from the prefetch buffers 260. On transfer of the instruction set, the corresponding valid bit in the array 268 is reset.

C) IFU/IEU Control Interface:

The control interface between the IFU 102 and IEU 104 is provided by the control bus 126. This control bus 126 is coupled to the PC logic unit 270 and consists of a number of control, address and specialized data lines. Interrupt request and acknowledge control signals, as passed via the control lines 340, allow the IFU 102 to signal and synchronize interrupt operations with the IEU 104. An externally generated interrupt signal is provided on a line 292 to the PC logic unit 270. In response, an interrupt request control signal, provided on lines 340, causes the IEU 104 to cancel tentatively executed instructions. Information regarding the nature of an interrupt is exchanged via interrupt information lines 341. When the IEU 104 is ready to begin receiving instruction sets prefetched from the interrupt service routine address determined by the PC logic unit 270, the IEU 104 asserts an interrupt acknowledge control signal on the lines 340. Execution of the interrupt service routine, as prefetched by the IFU 102, will then commence.

An IFIFO read (IFIFO RD) control signal is provided by the IEU 104 to signal that the instruction set present in the deepest master register 224 has been completely executed and that a next instruction set is desired. Upon receipt of this control signal, the PC logic unit 270 directs the IFIFO control logic unit 272 to perform an IFIFO shift operation on the IFIFO unit 264.

A PC increment request and size value (PC INC/SIZE) is provided on the control lines 344 to direct the PC logic unit 270 to update the current program counter value by a corresponding size number of instructions. This allows the PC logic unit 270 to maintain a point of execution program counter (DPC) that is precise to the location of the first in-order executing instruction in the current program instruction stream.

A target address (TARGET ADDR) is returned on the address lines 346 to the PC logic unit 270. The target address is the virtual target address of a branch instruction that depends on data stored within the register file of the IEU 104. Operation of the IEU 104 is therefore required to calculate the target address.

Control flow result (CF RESULT) control signals are provided on the control lines 348 to the PC logic unit 270 to identify whether any currently pending conditional branch instruction has been resolved and whether the result is either a branch taken or not taken. Based on these control signals, the PC logic unit 270 can determine which of the instruction sets in the prefetch buffer 260 and IFIFO unit 264 must be cancelled, if at all, as a consequence of the execution of the conditional flow instruction.

A number of IEU instruction return type control signals (IEU Return) are provided on the control lines 350 to alert the IFU 102 to the execution of certain instructions by the IEU 104. These instructions include a return from procedural instruction, return from trap, and return from subroutine call. The return from trap instruction is used equally in hardware interrupt and software trap handling routines. The subroutine call return is also used in conjunction with jump-and-link type calls. In each case, the return control signals are provided to alert the IFU 102 to resume its instruction fetching operation with respect to the previously interrupted instruction stream. Origination of the signals from the IEU 104 allows the precise operation of the system 100 to be maintained; the resumption of an "interrupted" instruction stream is performed at the point of execution of the return instruction.

A current instruction execution PC address (Current IFPC) is provided on an address bus 352 to the IEU 104. This address value, the DPC, identifies the precise instruction being executed by the IEU 104. That is, while the IEU 104 may tentatively execute ahead instructions past the current IFPC address, this address must be maintained for purposes of precise control of the architecture 100 with respect to the occurrence of interrupts, exceptions, and any other events that would require knowing the precise state-of-the-machine. When the IEU 104 determines that the precise state-of-the-machine in the currently executing instruction stream can be advanced, the PC Inc/Size signal is provided to the IFU 102 and immediately reflected back in the current IFPC address value.

Finally, an address and bi-directional data bus 354 is provided for the transfer of special register data. This data may be programmed into or read from special registers within the IFU 102 by the IEU 104. Special register data is generally loaded or calculated by the IEU 104 for use by the IFU 102.

D) PC Logic Unit Detail:

A detailed diagram of the PC logic unit 270, including a PC control unit 362, interrupt control unit 363, prefetch PC control unit 364 and execution PC control unit 366, is shown in Figure 3. The PC control unit 362 provides timing control over the prefetch and execution PC control units 364, 366 in response to control signals from the prefetch control logic unit 266, IFIFO control logic unit 272, and the IEU 104, via the interface bus 126. The interrupt control unit 363 is responsible for managing the precise processing of interrupts and exceptions, including the determination of a prefetch trap address offset that selects an appropriate handling routine to process a respective type of trap. The prefetch PC control unit 364 is, in particular, responsible for managing program counters necessary to support the prefetch buffers 188, 190, 192, including storing return addresses for trap handling and procedural routine instruction flows. In support of this operation, the prefetch PC control unit 364 is responsible for generating the prefetch virtual address including the CCU PADDR address on the physical address bus lines 324 and the VMU VMADDR address on the address lines 326. Consequently, the prefetch PC control unit 364 is responsible for maintaining the current prefetch PC virtual address value.

The prefetch operation is generally initiated by the IFIFO control logic unit 272 via a control signal provided on the control lines 316. In response, the PC control unit 362 generates a number of control signals provided on the control lines 372 to operate the prefetch PC control unit 364 to generate the PADDR and, as needed, the VMADDR addresses on the address lines 324, 326. An increment signal, having a value of zero to four, may also be provided on the control lines 374 depending on whether the PC control unit 362 is re-executing an instruction set fetch at the present prefetch address, aligning for the second in a series of prefetch requests, or selecting the next full sequential instruction set for prefetch. Finally, the current prefetch address PF_PC is provided on the bus 370 to the execution PC control unit 366.

New prefetch addresses originate from a number of sources. A primary source of addresses is the current IF_PC address provided from the execution PC control unit 366 via bus 352. Principally, the IF_PC address provides a return address for subsequent use by the prefetch PC control unit 364 when an initial call, trap or procedural instruction occurs. The IF_PC address is stored in registers in the prefetch PC control unit 364 upon each occurrence of these instructions. In this manner, the PC control unit 362, on receipt of an IEU return signal via control lines 350, need merely select the corresponding return address register within the prefetch PC control unit 364 to source a new prefetch virtual address, thereby resuming the original program instruction stream.

Another source of prefetch addresses is the target address value provided on the relative target address bus 382 from the execution PC control unit 366 or on the absolute target address bus 346 provided from the IEU 104. Relative target addresses are those that can be calculated by the execution PC control unit 366 directly. Absolute target addresses must be generated by the IEU 104, since such target addresses are dependent on data contained in the IEU register file. The target address is routed over the target address bus 384 to the prefetch PC control unit 364 for use as a prefetch virtual address. In calculating the relative target address, an operand portion of the corresponding branch instruction is also provided on the operand displacement portion of the bus 318 from the IDecode unit 262.

Another source of prefetch virtual addresses is the execution PC control unit 366. A return address bus 352' is provided to transfer the current IF_PC value (DPC) to the prefetch PC control unit 364. This address is utilized as a return address where an interrupt, trap or other control flow instruction such as a call has occurred within the instruction stream. The prefetch PC control unit 364 is then free to prefetch a new instruction stream. The PC control unit 362 receives an IEU return signal, via lines 350, from the IEU 104 once the corresponding interrupt or trap handling routine or subroutine has been executed. In turn, the PC control unit 362 selects, via one of the PFPC control signals on line 372 and based on an identification of the return instruction executed as provided via lines 350, a register containing the current return virtual address. This address is then used to continue the prefetch operation by the PC logic unit 270.

Finally, another source of prefetch virtual addresses is from the special register address and data bus 354. An address value, or at least a base address value, calculated or loaded by the IEU 104 is transferred as data via the bus 354 to the prefetch PC control unit 364. The base addresses include the base addresses for the trap address table, a fast trap table, and a base procedural instruction dispatch table. The bus 354 also allows many of the registers in the prefetch and execution PC control units 364, 366 to be read to allow corresponding aspects of the state-of-the-machine to be manipulated through the IEU 104.

The execution PC control unit 366, subject to the control of the PC control unit 362, is primarily responsible for calculating the current IF_PC address value. In this role, the execution PC control unit 366 responds to control signals provided by the PC control unit 362 on the ExPC control lines 378 and increment/size control signals provided on the control lines 380 to adjust the IF_PC address. These control signals are generated primarily in response to the IFIFO read control signal provided on line 342 and the PC increment/size value provided on the control lines 344 from the IEU 104.

1) PF and ExPC Control/Data Unit Detail:

Figure 4 provides a detailed block diagram of the prefetch and execution PC control units 364, 366. These units primarily consist of registers, incrementors and the like, selectors and adder blocks. Control for managing the transfer of data between these blocks is provided by the PC control unit 362 via the PFPC control lines 372, the ExPC control lines 378 and the increment control lines 374, 380. For purposes of clarity, those specific control lines are not shown in the block diagram of Figure 4. However, it should be understood that these control signals are provided to the blocks shown as described herein.

Central to the prefetch PC control unit 364 is a prefetch selector (PF_PC SEL) 390 that operates as a central selector of the current prefetch virtual address. This current prefetch address is provided on the output bus 392 from the prefetch selector to an incrementor unit 394 to generate a next prefetch address. This next prefetch address is provided on the incrementor output bus 396 to a parallel array of registers MBUF PFnPC 398, TBUF PFnPC 400, and EBUF PFnPC 402. These registers 398, 400, 402 effectively store the next instruction prefetch address. However, in accordance with the preferred embodiment of the present invention, separate prefetch addresses are held for the MBUF 188, TBUF 190, and EBUF 192. The prefetch addresses, as stored by the MBUF, TBUF and EBUF PFnPC registers 398, 400, 402, are respectively provided by the address buses 404, 408, 410 to the prefetch selector 390. Thus, the PC control unit 362 can direct an immediate switch of the prefetch instruction stream merely by directing the selection, by the prefetch selector 390, of another one of the prefetch registers 398, 400, 402. Once that address value has been incremented by the incrementor 394, if a next instruction set in the stream is to be prefetched, the value is returned to the appropriate one of the prefetch registers 398, 400, 402.

Another parallel array of registers, for simplicity shown as the single special register block 412, is provided to store a number of special addresses. The register block 412 includes a trap return address register, a procedural instruction return address register, a procedural instruction dispatch table base address register, a trap routine dispatch table base address register, and a fast trap routine table base address register. Under the control of the PC control unit 362, these return address registers may receive the current IFPC execution address via the bus 352'. The address values stored by the return and base address registers within the register block 412 may be both read and written independently by the IEU 104. The registers are selected and values transferred via the special register address and data bus 354.

A selector within the special register block 412, controlled by the PC control unit 362, allows the addresses stored by the registers of the register block 412 to be put on the special register output bus 416 to the prefetch selector 390. Return addresses are provided directly to the prefetch selector 390. Base address values are combined with the offset value provided on the interrupt offset bus 373 from the interrupt control unit 363. Once sourced to the prefetch selector 390 via the bus 373', a special address can be used as the initial address for a new prefetch instruction stream by thereafter continuing the incremental loop of the address through the incrementor 394 and one of the prefetch registers 398, 400, 402.

Another source of addresses to the prefetch selector 390 is an array of registers within the target address register block 414. The target registers within the block 414 provide for storage of, in the preferred embodiment, eight potential branch target addresses. These eight storage locations logically correspond to the eight potentially executable instructions held in the lowest two master registers 216, 224 of the IFIFO unit 264. Since any, and potentially all, of those instructions could be conditional branch instructions, the target register block 414 allows for their precalculated target addresses to be stored awaiting use for fetching of a target instruction stream through the TBUF 190. In particular, if a conditional branch bias is set such that the PC control unit 362 immediately begins prefetching of a target instruction stream, the target address is immediately fed through the target register block 414 via the address bus 418 to the prefetch selector 390. Once incremented by the incrementor 394, the address is stored back to the TBUF PFnPC 400 for use in subsequent prefetch operations of the target instruction stream. If additional branch instructions occur within the target instruction stream, the target addresses of such secondary branches are calculated and stored in the target register array 414 pending use upon resolution of the first conditional branch instruction. A calculated target address, as stored by the target register block 414, is transferred from a target address calculation unit within the execution PC control unit 366 via the address lines 382 or from the IEU 104 via the absolute target address bus 346.

The address value transferred through the prefetch PF_PC selector 390 is a full thirty-two bit virtual address value. The page size, in the preferred embodiment of the present invention, is fixed at 16 KBytes, corresponding to the maximum page offset address value [13:0].
Therefore, a VMU page translation is not required unless there is a change in the current prefetch virtual page address [27:14], A comparitor in the prefetch selector 390 detects this circumstance. A VMU translation request signal (VMXLAT) is provided via line 372' to the PC control unit 362 when there is a change in the virtual page address, either due incrementing accross a page boundary or a control flow branch to another page address. In turn, the PC control unit 362 directs the placement of the VM VADDR address on lines 326, in addition to the CCU PADDR on lines 324, both via a buffer unit 420, and the appropriate control signals on the VMU control lines 326, 328, 330 to obtain a VMU virtual to physical page translation. Where a page translation is not required, the current physical page address [31:14] is maintained by a latch at the output of the VMU unit 108 on the bus 122.
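The page-change check performed by the comparator can be illustrated with a minimal sketch. It is not the patented circuit: the function name is hypothetical, and the sketch assumes that every address bit above the 14-bit page offset takes part in the comparison.

```python
PAGE_OFFSET_BITS = 14  # 16 KByte pages: offset bits [13:0]


def needs_translation(current_vaddr: int, next_vaddr: int) -> bool:
    """Return True when the next prefetch address lies in a different virtual page.

    Assumption: all bits above the page offset form the page address.
    """
    return (current_vaddr >> PAGE_OFFSET_BITS) != (next_vaddr >> PAGE_OFFSET_BITS)
```

Under these assumptions, incrementing from 0x3FF0 to 0x4000 crosses a page boundary and would raise a translation request, whereas incrementing from 0x4000 to 0x4010 would not.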

The virtual address provided onto the bus 370 is incremented by the incrementor 394 in response to a signal provided on the increment control line 374. The incrementor 394 increments by a value representing an instruction set (four instructions or sixteen bytes) in order to select a next instruction set. The low-order four bits of a prefetch address as provided to the CCU unit 106 are zero. Therefore, the actual target address instruction in a first branch target instruction set may not be located in the first instruction location. However, the low-order four bits of the address are provided to the PC control unit 362 to allow the proper first branch instruction location to be known by the IFU 102. The detection and handling, by returning the low-order bits [3:2] of a target address as the two-bit buffer address, to select the proper first instruction for execution in a non-aligned target instruction set, is performed only for the first prefetch of a new instruction stream, i.e., any first non-sequential instruction set address in an instruction stream. The non-aligned relationship between the address of the first instruction in an instruction set and the prefetch address used in prefetching the instruction set can be, and is, thereafter ignored for the duration of the current sequential instruction stream.
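The address arithmetic described above can be sketched as follows. The function names are hypothetical, and the 32-bit wrap-around of the incrementor is an assumption; the text states only that the increment is sixteen bytes and that bits [3:2] select the first instruction.

```python
INSTRUCTION_SET_BYTES = 16   # four instructions of four bytes each
ADDRESS_MASK = 0xFFFFFFFF    # thirty-two bit virtual address


def split_target(target: int) -> tuple[int, int]:
    """Split a branch target into (aligned prefetch address, first instruction slot)."""
    prefetch_addr = target & ~0xF & ADDRESS_MASK  # low-order four bits forced to zero
    first_slot = (target >> 2) & 0x3              # bits [3:2] as the two-bit buffer address
    return prefetch_addr, first_slot


def next_prefetch(addr: int) -> int:
    """Advance to the next instruction set (assumed to wrap at 32 bits)."""
    return (addr + INSTRUCTION_SET_BYTES) & ADDRESS_MASK
```

For a target of 0x1008, the sketch prefetches the instruction set at 0x1000 and begins execution at slot 2; the slot value matters only for this first prefetch of the new stream.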

The remainder of the functional blocks shown in Figure 4 comprise the execution PC control unit 366. In accordance with the preferred embodiment of the present invention, the execution PC control unit 366 incorporates its own independently functioning program counter incrementor. Central to this function is an execution selector (DPC SEL) 430. The address output by the execution selector 430, on the address bus 352', is the present execution address (DPC) of the architecture 100. This execution address is provided to an adder unit 434. The increment/size control signals provided on the lines 380 specify an instruction increment value of from one to four that the adder unit 434 adds to the address obtained from the selector 430. As the adder 432 additionally performs an output latch function, the incremented next execution address is provided on the address lines 436 directly back to the execution selector 430 for use in the next execution increment cycle.

The initial execution address and all subsequent new stream addresses are obtained through a new stream register unit 438 via the address lines 440. The new stream register unit 438 allows the new current prefetch address, as provided on the PFPC address bus 370 from the prefetch selector 390, to be passed on to the address bus 440 directly or stored for subsequent use. That is, where the prefetch PC control unit 364 determines to begin prefetching at a new virtual address, the new stream address is temporarily stored by the new stream register unit 438. The PC control unit 362, by its participation in both the prefetch and execution increment cycles, holds the new stream address in the new stream register unit 438 until the execution address has reached the program execution point corresponding to the control flow instruction that instigated the new instruction stream. The new stream address is then output from the new stream register unit 438 to the execution selector 430 to initiate the independent generation of execution addresses in the new instruction stream. In accordance with the preferred embodiments of the present invention, the new stream register unit 438 provides for the buffering of two control flow instruction target addresses. By the immediate availability of the new stream address, there is essentially no latency in the switching of the execution PC control unit 366 from the generation of a current sequence of execution addresses to a new stream sequence of execution addresses.

Finally, an IFPC selector (IF_PC SEL) 442 is provided to ultimately issue the current IFPC address on the address bus 352 to the IEU 104. The inputs to the IFPC selector 442 are the output addresses obtained from either the execution selector 430 or new stream register unit 438. In most instances, the IFPC selector 442 is directed by the PC control unit 362 to select the execution address output by the execution selector 430. However, in order to further reduce latency in switching to a new virtual address used to initiate execution of a new instruction stream, the selected address provided from the new stream register unit 438 can be bypassed via bus 440 directly to the IFPC selector 442 for provision as the current IFPC execution address.

The execution PC control unit 366 is capable of calculating all relative branch target addresses. The current execution point address and the new stream register unit 438 provided address are received by a control flow selector (CF_PC) 446 via the address buses 352', 440. Consequently, the PC control unit 362 has substantial flexibility in selecting the exact initial address from which to calculate a target address. This initial, or base, address is provided via address bus 454 to a target address ALU 450. A second input value to the target ALU 450 is provided from a control flow displacement calculation unit 452 via bus 458. Relative branch instructions, in accordance with the preferred architecture 100, incorporate a displacement value in the form of an immediate mode constant that specifies a relative new target address. The control flow displacement calculation unit 452 receives the operand displacement value initially obtained via the IDecode unit operand output bus 318. Finally, an offset register value is provided to the target address ALU 450 via the lines 456. The offset register 448 receives an offset value via the control lines 378' from the PC control unit 362. The magnitude of the offset value is determined by the PC control unit 362 based on the address offset between the base address provided on the address lines 454 and the address of the current branch instruction for which the relative target address is being calculated. That is, the PC control unit 362, through its control of the IFIFO control logic unit 272 tracks the number of instructions separating the instruction at the current execution point address (requested by CP_PC) and the instruction that is currently being processed by the IDecode unit 262 and, therefore, being processed by the PC logic unit 270 to determine the target address for that instruction. 
Once the relative target address has been calculated by the target address ALU 450, the target address is written into a corresponding one of the target registers 414 via the address bus 382.
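The three-input calculation performed by the target address ALU 450 can be sketched as below. This is an illustration under stated assumptions only: the text does not give the units of the offset or displacement, so the sketch assumes the offset counts four-byte instructions between the base address and the branch, and that the displacement is a signed byte value. The function name is hypothetical.

```python
INSTRUCTION_BYTES = 4      # four instructions occupy sixteen bytes
ADDRESS_MASK = 0xFFFFFFFF  # thirty-two bit address


def relative_target(base: int, instr_offset: int, displacement: int) -> int:
    """Combine base address, branch offset and displacement into a target address.

    Assumptions: instr_offset is a count of instructions separating the base
    address from the branch instruction; displacement is signed and in bytes.
    """
    branch_addr = base + INSTRUCTION_BYTES * instr_offset
    return (branch_addr + displacement) & ADDRESS_MASK
```

With a base of 0x1000, a branch three instructions past the base, and a displacement of -8, the sketch yields 0x1004.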

2) PC Control Algorithm Detail:

1. Main Instruction Stream Processing: MBUF PFnPC

1.1 the address of the next main flow prefetch instruction is stored in the MBUF PFnPC.

1.2 in the absence of a control flow instruction, a 32 bit incrementor adjusts the address value in the MBUF PFnPC by sixteen bytes (x16) with each prefetch cycle.

1.3 when an unconditional control flow instruction is IDecoded, all prefetched data fetched subsequent to the instruction set will be flushed and the MBUF PFnPC is loaded, through the target register unit, PF_PC selector and incrementor, with the new main instruction stream address. The new address is also stored in the new stream registers.

1.3.1 the target address of a relative unconditional control flow is calculated by the IFU from register data maintained by the IFU and from operand data following the control flow instruction.

1.3.2 the target address of an absolute unconditional control flow instruction is eventually calculated by the IEU from a register reference, a base register value, and an index register value.

1.3.2.1 instruction prefetch cycling stalls until the target address is returned by the IEU for absolute address control flow instruction; instruction execution cycling continues.

1.4 the address of the next main flow prefetch instruction set, resulting from an unconditional control flow instruction, is bypassed through the target address register unit, PF_PC selector and incrementor and routed for eventual storage in the MBUF PFnPC; prefetching continues at 1.2.

2. Procedural Instruction Stream Processing: EBUF PFnPC

2.1 a procedural instruction may be prefetched in the main or branch target instruction stream. If fetched in a target stream, stall prefetching of the procedural stream until the conditional control flow instruction resolves and the procedural instruction is transferred to the MBUF. This allows the TBUF to be used in handling of conditional control flows that occur in the procedural instruction stream.

2.1.1 a procedural instruction should not appear in a procedural instruction stream, i.e., procedural instructions should not be nested: a return from procedural instruction will return execution to the main instruction flow. In order to allow nesting, an additional, dedicated return from nested procedural instruction would be required. While the architecture can readily support such an instruction, a nested procedural instruction capability is not likely to improve the performance of the architecture.

2.1.2 in a main instruction stream, a procedural instruction stream that, in turn, includes first and second conditional control flow instruction containing instruction sets will stall prefetching with respect to the second conditional control flow instruction set until any conditional control flow instructions in the first such instruction set are resolved and the second conditional control flow instruction set has been transferred to the MBUF.

2.2 procedural instructions provide a relative offset, included as an immediate mode operand field of the instruction, to identify the procedural routine starting address:

2.2.1 the offset value provided by the procedural instruction is combined with a value contained in a procedural base address (PBR) register maintained in the IFU. This PBR register is readable and writable via the special address and data bus in response to the execution of a special register move instruction.

2.3 when a procedural instruction is encountered, the next main instruction stream IF_PC address is stored in the uPC return address register and the procedure-in-progress bit in the processor status register (PSR) is set.

2.4 the starting address of the procedural stream is routed from the PBR register (plus the procedural instruction operand offset value) to the PF_PC selector.

2.5 the starting address of the procedural stream is simultaneously provided to the new stream register unit and to the incrementor for incrementing (x16); the incremented address is then stored in the EBUF PFnPC.

2.6 in the absence of a control flow instruction, a 32 bit incrementor adjusts the address value (x16) in the EBUF PFnPC with each procedural instruction prefetch cycle.

2.7 when an unconditional control flow instruction is IDecoded, all prefetched data fetched subsequent to the branch instruction will be flushed and the EBUF PFnPC is loaded with the new procedural instruction stream address.

2.7.1 the target address of a relative unconditional control flow instruction is calculated by the IFU from IFU maintained register data and from the operand data provided within an immediate mode operand field of the control flow instruction.

2.7.2 the target address of an absolute unconditional branch is calculated by the IEU from a register reference, a base register value, and an index register value.

2.7.2.1 instruction prefetch cycling stalls until the target address is returned by the IEU for absolute address branches; execution cycling continues.

2.8 the address of the next procedural flow prefetch instruction set is stored in the EBUF PFnPC and prefetching continues at 1.2.

2.9 when a return from procedure instruction is IDecoded, prefetching continues from the address stored in the uPC register, which is then incremented (x16) and returned to the MBUF PFnPC register for subsequent prefetches.

3. Branch Instruction Stream Processing: TBUF PFnPC

3.1 when a conditional control flow instruction, occurring in a first instruction set in the MBUF instruction stream, is IDecoded, the target address is determined by the IFU if the target address is relative to the current address or by the IEU for absolute addresses.

3.2 for "branch taken bias":

3.2.1 if the branch is to an absolute address, stall instruction prefetch cycling until the target address is returned by the IEU; execution cycling continues.

3.2.2 load the TBUF PFnPC with the branch target address by transfer through the PF_PC selector and incrementor.

3.2.3 target instruction stream instructions are prefetched into the TBUF and then routed into the IFIFO for subsequent execution; if the IFIFO and TBUF become full, stall prefetching.

3.2.4 the 32 bit incrementor adjusts (x16) the address value in the TBUF PFnPC with each prefetch cycle.

3.2.5 stall the prefetch operation on IDecode of a conditional control flow instruction, occurring in a second instruction set in the target instruction stream, until all conditional branch instructions in the first (primary) set are resolved (but go ahead and calculate the relative target address and store it in the target registers).

3.2.6 if a conditional branch in the first instruction set resolves to "taken":

3.2.6.1 flush instruction sets following the first conditional flow instruction set in the MBUF or EBUF, if the source of the branch was the EBUF instruction stream as determined from the procedure-in-progress bit.

3.2.6.2 transfer the TBUF PFnPC value to MBUF PFnPC or EBUF based on the state of the procedure-in-progress bit.

3.2.6.3 transfer the prefetched TBUF instructions to the MBUF or EBUF based on the state of the procedure-in-progress bit.

3.2.6.4 if a second conditional branch instruction set has not been IDecoded, continue MBUF or EBUF prefetching operations based on the state of the procedure-in-progress bit.

3.2.6.5 if a second conditional branch instruction has been IDecoded, begin processing that instruction (go to step 3.3.1).

3.2.7 if the conditional control flow instruction(s) in the first conditional instruction set resolve to "not taken":

3.2.7.1 flush the IFIFO and IEU of instruction sets and instructions from the target instruction stream.

3.2.7.2 continue MBUF or EBUF prefetching operations.

3.3 for "branch not taken bias":

3.3.1 stall prefetch of instructions into the MBUF; execution cycling continues.

3.3.1.1 if the conditional control flow instruction in the first conditional instruction set is relative, calculate the target address and store in the target registers.

3.3.1.2 if the conditional control flow instructions in the first conditional instruction set is absolute, wait for the IEU to calculate the target address and return the address to the target registers.

3.3.1.3 stall the prefetch operation on IDecode of a conditional control flow instruction in a second instruction set until the conditional control flow instruction(s) in the first conditional instruction set are resolved.

3.3.2 once the target address of the first conditional branch is calculated, load it into the TBUF PFnPC and begin prefetching instructions into the TBUF concurrent with execution of the main instruction stream. Target instruction sets are not loaded into the IFIFO (the branch target instructions are thus on hand when each conditional control flow instruction in the first instruction set resolves).

3.3.3 if a conditional control flow instruction in the first set resolves to "taken":

3.3.3.1 flush the MBUF or EBUF, if the source of the branch was the EBUF instruction stream, as determined from the state of the procedure-in-progress bit, and the IFIFO and IEU of instructions from the main stream following the first conditional branch instruction set.

3.3.3.2 transfer the TBUF PFnPC value to MBUF PFnPC or EBUF, as determined from the state of the procedure-in-progress bit.

3.3.3.3 transfer the prefetched TBUF instructions to the MBUF or EBUF, as determined from the state of the procedure-in-progress bit.

3.3.3.4 continue MBUF or EBUF prefetching operations, as determined from the state of the procedure-in-progress bit.

3.3.4 if a conditional control flow instruction in the first set resolves to "not taken":

3.3.4.1 flush the TBUF of instruction sets from the target instruction stream.

3.3.4.2 if a second conditional branch instruction has not been IDecoded, continue MBUF or EBUF, as determined from the state of the procedure-in-progress bit, prefetching operations.

3.3.4.3 if a second conditional branch instruction has been IDecoded, begin processing that instruction (go to step 3.4.1).
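The buffer transfers common to steps 3.2.6 and 3.3.3 ("taken") and the flush of step 3.3.4.1 ("not taken") can be sketched with a small software model. The class and function names are hypothetical, buffers are modelled as plain lists, and the flushing of the IFIFO and IEU is deliberately left out; this is a sketch of the bookkeeping only, not of the hardware.

```python
from dataclasses import dataclass, field


@dataclass
class PrefetchState:
    """Hypothetical model of the three prefetch streams and their PFnPC registers."""
    procedure_in_progress: bool = False
    pfnpc: dict = field(default_factory=lambda: {"MBUF": 0, "EBUF": 0, "TBUF": 0})
    buffers: dict = field(default_factory=lambda: {"MBUF": [], "EBUF": [], "TBUF": []})


def resolve_branch(state: PrefetchState, taken: bool) -> str:
    """Apply the outcome of the first conditional branch; return the active stream."""
    # The procedure-in-progress bit selects the stream that sourced the branch.
    stream = "EBUF" if state.procedure_in_progress else "MBUF"
    if taken:
        # Replace the sequential instruction sets with the prefetched target sets
        # and continue prefetching from the target stream address.
        state.buffers[stream] = state.buffers["TBUF"]
        state.pfnpc[stream] = state.pfnpc["TBUF"]
    # In either case the TBUF no longer holds pending target instruction sets.
    state.buffers["TBUF"] = []
    return stream
```

In this model a taken branch in the main stream leaves the MBUF holding the former TBUF contents, while a not-taken branch simply discards them.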

4. Interrupts, Exceptions and Trap Instructions.

4.1 Traps generically include:

4.1.1 Hardware Interrupts.

4.1.1.1 asynchronously (external) occurring events, internal or external.

4.1.1.2 can occur at any time and persist.

4.1.1.3 serviced in priority order between atomic (ordinary) instructions and may suspend procedural instructions.

4.1.1.4 the starting address of an interrupt handler is determined as the vector number offset into a predefined table of trap handler entry points.

4.1.2 Software Trap Instructions.

4.1.2.1 synchronously (internal) occurring instructions.

4.1.2.2 a software instruction that executes as an exception.

4.1.2.3 the starting address of the trap handler is determined from the trap number offset combined with a base address value stored in the TBR or FTB register.

4.1.3 Exceptions.

4.1.3.1 events occurring synchronously with an instruction.

4.1.3.2 handled at the time the instruction is executed.

4.1.3.3 due to consequences of the exception, the excepted instruction and all subsequent executed instructions are cancelled.

4.1.3.4 the starting address of the exception handler is determined from the trap number offset into a predefined table of trap handler entry points.

4.2 Trap instruction stream operations occur in-line with the then currently executing instruction stream.

4.3 Traps may nest, provided the trap handling routine saves the xPC address prior to a next allowed trap; failure to do so will corrupt the state of the machine if a trap occurs prior to completion of the current trap operation.
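The nesting rule of item 4.3 can be illustrated with a small software model. It assumes, as item 5.2.2 states, that trap entry overwrites xPC with the current IF_PC; the class, method names and addresses are hypothetical and chosen only for illustration.

```python
class TrapState:
    """Hypothetical model of IF_PC, xPC and the interrupt enable state."""

    def __init__(self) -> None:
        self.if_pc = 0
        self.xpc = 0
        self.interrupts_enabled = True

    def enter_trap(self, handler_addr: int) -> None:
        self.interrupts_enabled = False
        self.xpc = self.if_pc        # return address overwrites any earlier xPC value
        self.if_pc = handler_addr

    def return_from_trap(self) -> None:
        self.interrupts_enabled = True
        self.if_pc = self.xpc


def nested_demo(save_xpc: bool) -> int:
    """Run an outer trap interrupted by a nested trap; return the final IF_PC."""
    cpu = TrapState()
    cpu.if_pc = 0x100
    cpu.enter_trap(0x8000)                    # outer trap: xPC = 0x100
    saved = cpu.xpc if save_xpc else None     # handler optionally saves xPC
    cpu.if_pc = 0x8004                        # outer handler makes progress
    cpu.enter_trap(0x9000)                    # nested trap: xPC = 0x8004
    cpu.return_from_trap()                    # back in the outer handler
    if save_xpc:
        cpu.xpc = saved                       # restore before the final return
    cpu.return_from_trap()
    return cpu.if_pc
```

When the handler saves and restores xPC, the model returns to the interrupted address 0x100; when it does not, the final return lands back inside the outer handler at 0x8004, which is the corruption that item 4.3 warns of.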

5. Trap Instruction Stream Processing: xPC.

5.1 when a trap is encountered:

5.1.1 if an asynchronous interrupt, the execution of the currently executing instruction(s) is suspended.

5.1.2 if a synchronous exception, the trap is processed upon execution of the excepted instruction.

5.2 when a trap is processed:

5.2.1 interrupts are disabled.

5.2.2 the current IF_PC address is stored in the xPC trap state return address register.

5.2.3 the IFIFO and the MBUF prefetch buffers at and subsequent to the IF_PC address are flushed.

5.2.4 executed instructions at and subsequent to the address IF_PC and the results of those instructions are flushed from the IEU.

5.2.5 the MBUF PFnPC is loaded with the address of the trap handler routine.

5.2.5.1 the source of a trap address is either the TBR or FTB register, depending on the type of trap as determined by the trap number; both registers are provided in the set of special registers.

5.2.6 instructions are prefetched and dropped into the IFIFO for execution in a normal manner.

5.2.7 the instructions of the trap routine are then executed.

5.2.7.1 the trap handling routine may provide for the xPC address to be saved to a predefined location and interrupts re-enabled; the xPC register is read/write via a special register move instruction and the special register address and data bus.

5.2.8 the trap state must be exited by the execution of a return from trap instruction.

5.2.8.1 if previously saved, the xPC address must be restored from its predefined location before executing the return from trap instruction.

5.3 when a return from trap is executed:

5.3.1 interrupts are enabled.

5.3.2 the xPC address is returned to the current instruction stream register MBUF or EBUF PFnPC, as determined from the state of the procedure-in-progress bit, and prefetching continues from that address.

5.3.3 the xPC address is restored to the IF_PC register through the new stream register.

Interrupt and Exception Handling:

1) Overview:

Interrupts and exceptions will be processed, as long as they are enabled, regardless of whether the processor is executing from the main instruction stream or a procedural instruction stream. Interrupts and exceptions are serviced in priority order, and persist until cleared. The starting address of a trap handler is determined as the vector number offset into a predefined table of trap handler addresses as described below.

Interrupts and exceptions are of two basic types in the present embodiment, those which occur synchronously with particular instructions in the instruction stream, and those which occur asynchronously with particular instructions in the instruction stream. The terms interrupt, exception, trap and fault are used interchangeably herein.

Asynchronous interrupts are generated by hardware, either on-chip or off-chip, which does not operate synchronously with the instruction stream. For example, interrupts generated by an on-chip timer/counter are asynchronous, as are hardware interrupts and non-maskable interrupts (NMI) provided from off-chip. When an asynchronous interrupt occurs, the processor context is frozen, all traps are disabled, certain processor status information is stored, and the processor vectors to an interrupt handler corresponding to the particular interrupt received. After the interrupt handler completes its processing, program execution continues with the instruction following the last completed instruction in the stream which was executing when the interrupt occurred.

Synchronous exceptions are those that occur synchronously with instructions in the instruction stream. These exceptions occur in relation to particular instructions, and are held until the relevant instruction is to be executed. In the preferred embodiments, synchronous exceptions arise during prefetch, during instruction decode, or during instruction execution. Prefetch exceptions include, for example, TLB miss or other VMU exceptions. Decode exceptions arise, for example, if the instruction being decoded is an illegal instruction or does not match the current privilege level of the processor. Execution exceptions arise due to arithmetic errors, for example, such as divide by zero. Whenever these exceptions occur, the preferred embodiments maintain them in correspondence with the particular instruction which caused the exception, until the time at which that instruction is to be retired. At that time, all prior completed instructions are retired, any tentative results from the instruction which caused the exception are flushed, as are the tentative results of any following tentatively executed instructions. Control is then transferred to an exception handler corresponding to the highest priority exception which occurred for that instruction.

Software trap instructions are detected at the IDecode stage by CF_DET 274 (Fig. 2) and are handled similarly to both unconditional call instructions and other synchronous traps. That is, a target address is calculated and prefetch continues to the then-current prefetch queue (EBUF or MBUF). At the same time, the exception is also noted in correspondence with the instruction and is handled when the instruction is to be retired. All other types of synchronous exceptions are merely noted and accumulated in correspondence with the particular instruction which caused them and are handled at execution time.

2) Asynchronous Interrupts:

Asynchronous interrupts are signaled to the PC logic unit 270 over interrupt lines 292. As shown in Figure 3, these lines are provided to the interrupt logic unit 363 in the PC logic unit 270, and comprise an NMI line, an IRQ line and a set of interrupt level lines (LVL). The NMI line signals a nonmaskable interrupt, and derives from an external source. It is the highest priority interrupt except for hardware reset. The IRQ line also derives from an external source, and indicates when an external device is requesting a hardware interrupt. The preferred embodiments permit up to 32 user-defined externally supplied hardware interrupts and the particular external device requesting the interrupt provides the number of the interrupt (0-31) on the interrupt level lines (LVL).

The memory error line is activated by the MCU 110 to signal various kinds of memory errors. Other asynchronous interrupt lines (not shown) are also provided to the interrupt logic unit 363, including lines for requesting a timer/counter interrupt, a memory I/O error interrupt, a machine check interrupt and a performance monitor interrupt. Each of the asynchronous interrupts, as well as the synchronous exceptions described below, has a corresponding predetermined trap number associated with it, 32 of these trap numbers being associated with the 32 available hardware interrupt levels. A table of these trap numbers is maintained in the interrupt logic unit 363. The higher the trap number, in general, the higher the priority of the trap. When one of the asynchronous interrupts is signaled to the interrupt logic unit 363, the interrupt control unit 363 sends out an interrupt request to the IEU 104 over INT REQ/ACK lines 340. Interrupt control unit 363 also sends a suspend prefetch signal to PC control unit 362 over lines 343, causing the PC control unit 362 to stop prefetching instructions. The IEU 104 either cancels all then-executing instructions, flushing all tentative results, or allows some or all instructions to complete. In the preferred embodiments, any then-executing instructions are canceled, thereby permitting the fastest response to asynchronous interrupts. In any event, the DPC in the execution PC control unit 366 is updated to correspond to the last instruction which has been completed and retired, before the IEU 104 acknowledges the interrupt. All other prefetched instructions in MBUF, EBUF, TBUF and IFIFO 264 are also cancelled.

Only when the IEU 104 is ready to receive instructions from an interrupt handler does it send an interrupt acknowledge signal on INT REQ/ACK lines 340 back to the interrupt control unit 363. The interrupt control unit 363 then dispatches to the appropriate trap handler as described below.

3) Synchronous Exceptions:

For synchronous exceptions, the interrupt control unit 363 maintains a set of four internal exception bits (not shown) for each instruction set, one bit corresponding to each instruction in the set. The interrupt control unit 363 also maintains an indication of the particular trap numbers, if any, detected for each instruction.

If the VMU signals a TLB miss or another VMU exception while a particular instruction set is being prefetched, this information is transmitted to the PC logic unit 270, and in particular to the interrupt control unit 363, over the VMU control lines 332 and 334. When the interrupt control unit 363 receives such a signal, it signals the PC control unit 362 over line 343 to suspend further prefetches. At the same time, the interrupt control unit 363 sets the VM_Miss or VM_Excp bit, as appropriate, associated with the prefetch buffer to which the instruction set was destined. The interrupt control unit 363 then sets all four internal exception indicator bits corresponding to that instruction set, since none of the instructions in the set are valid, and stores the trap number for the particular exception received in correspondence with each of the four instructions in the faulty instruction set. The shifting and executing of instructions prior to the faulty instruction set then continues as usual until the faulty set reaches the lowest level in the IFIFO 264. Similarly, if other synchronous exceptions are detected during the shifting of an instruction through the prefetch buffers 260, the IDecode unit 262 or the IFIFO 264, this information is also transmitted to the interrupt control unit 363, which sets the internal exception indicator bit corresponding to the instruction generating the exception and stores the trap number in correspondence with that exception. As with prefetch synchronous exceptions, the shifting and executing of instructions prior to the faulty instruction then continues as usual until the faulty set reaches the lowest level in the IFIFO 264.

In the preferred embodiments, the only type of exception which is detected during the shifting of an instruction through the prefetch buffers 260, the IDecode unit 262 or the IFIFO 264 is a software trap instruction. Software trap instructions are detected at the IDecode stage by CF_DET unit 274. While in some embodiments other forms of synchronous exceptions may be detected in the IDecode unit 262, it is preferred that the detection of any other synchronous exceptions wait until the instruction reaches the execution unit 104. This avoids the possibility that certain exceptions, such as those arising from the handling of a privileged instruction, might be signaled on the basis of a processor state which could change before the effective in-order-execution of the instruction. Exceptions which do not depend on the processor state, such as illegal instruction, could be detected in the IDecode stage, but hardware is minimized if the same logic detects all pre-execution synchronous exceptions (apart from VMU exceptions). Nor is there any time penalty imposed by waiting until instructions reach the execution unit 104, since the handling of such exceptions is rarely time critical.

As mentioned, software trap instructions are detected at the IDecode stage by the CF_DET unit 274. The internal exception indicator bit corresponding to that instruction in the interrupt logic unit 363 is set and the software trap number, which can be any number from 0 to 127 and which is specified in an immediate mode operand field of the software trap instruction, is stored in correspondence with the trap instruction. Unlike prefetch synchronous exceptions, however, since software traps are treated as both a control flow instruction and as a synchronous exception, the interrupt control unit 363 does not signal PC control unit 362 to suspend prefetches when a software trap instruction is detected. Rather, at the same time the instruction is shifting through the IFIFO 264, the IFU 102 prefetches the trap handler into the MBUF instruction stream buffer. When an instruction set reaches the lowest level of the IFIFO 264, the interrupt logic unit 363 transmits the exception indicator bits for that instruction set as a 4-bit vector to the IEU 104 over the SYNCH_INT_INFO lines 341 to indicate which, if any, of the instructions in the instruction set have already been determined to be the source of a synchronous exception. The IEU 104 does not respond immediately, but rather permits all the instructions in the instruction set to be scheduled in the normal course. Further exceptions, such as integer arithmetic exceptions, may be generated during execution. Exceptions which depend on the current state of the machine, such as those due to the execution of a privileged instruction, are also detected at this time, and in order to ensure that the state of the machine is current with respect to all previous instructions in the instruction stream, all instructions which have a possibility of affecting the PSR (such as special move and return from trap instructions) are forced to execute in order. Only when an instruction that is the source of a synchronous exception of any sort is about to be retired is the occurrence of the exception signaled to the interrupt logic unit 363.

The IEU 104 retires all instructions which have been tentatively executed and which occur in the instruction stream prior to the first instruction which has a synchronous exception, and flushes the tentative results from any tentatively executed instructions which occur subsequently in the instruction stream. The particular instruction that caused the exception is also flushed since that instruction will typically be re-executed upon return from trap. The IF_PC in the execution PC control unit 366 is then updated to correspond to the last instruction actually retired, before any exception is signaled to the interrupt control unit 363.

When the instruction that is the source of an exception is retired, the IEU 104 returns to the interrupt logic unit 363, over the SYNCH_INT_INFO lines 341, both a new 4-bit vector indicating which, if any, instructions in the retiring instruction set (register 224) had a synchronous exception, as well as information indicating the source of the first exception in the instruction set. The information in the 4-bit exception vector returned by IEU 104 is an accumulation of the 4-bit exception vectors provided to the IEU 104 by the interrupt logic unit 363, as well as exceptions generated in the IEU 104. The remainder of the information returned from the IEU 104 to interrupt control unit 363, together with any information already stored in the interrupt control unit 363 due to exceptions detected on prefetch or IDecode, is sufficient for the interrupt control unit 363 to determine the nature of the highest priority synchronous exception and its trap number.
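The accumulation of the 4-bit exception vectors described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that bit 0 of each vector corresponds to the earliest instruction in the instruction set; the function and parameter names are hypothetical and do not appear in the specification.

```python
def accumulate_exceptions(prefetch_vector, ieu_vector):
    """Combine two 4-bit exception vectors for one instruction set.

    prefetch_vector: bits flagged before execution (prefetch or IDecode).
    ieu_vector: bits flagged during execution in the IEU.
    Assumption: bit 0 is the earliest instruction in the set.
    Returns (combined_vector, index_of_first_exception_or_None).
    """
    combined = (prefetch_vector | ieu_vector) & 0xF
    first = None
    for position in range(4):
        if (combined >> position) & 1:
            first = position
            break
    return combined, first


if __name__ == "__main__":
    print(accumulate_exceptions(0b0100, 0b0010))
```

The position of the first flagged instruction matters because only instructions preceding it in the instruction stream are retired.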

4) Handler Dispatch and Return:

After an interrupt acknowledge signal is received over lines 340 from the IEU, or after a non-zero exception vector is received over lines 341, the current DPC is temporarily stored as a return address in an xPC register, which is one of the special registers 412 (Figure 4). The current processor status register (PSR) is also stored in a previous PSR (PPSR) register, and the current compare state register (CSR) is saved in a prior compare state register (PCSR) in the special registers 412.

The address of a trap handler is calculated as a trap base register address plus an offset. The PC logic unit 270 maintains two base registers for traps, both of which are part of the special registers 412 (Figure 4), and both of which are initialized by special move instructions executed previously. For most traps, the base register used to calculate the address of the handler is a trap base register TBR.

The interrupt control unit 363 determines the highest priority interrupt or exception currently pending and, through a look-up table, determines the trap number associated therewith. This is provided over a set of INT_OFFSET lines 373 to the prefetch PC control unit 364 as an offset to the selected base register. Advantageously, the vector address is calculated by merely concatenating the offset bits as low-order bits to the higher order bits obtained from the TBR register. This avoids any need for the delays of an adder. (As used herein, the 2^i bit is referred to as the i'th order bit.) For example, if traps are numbered from 0 through 255, represented as an 8-bit value, the handler address may be calculated by concatenating the 8-bit trap number to the end of a 22-bit TBR stored value. Two low-order zero bits may be appended to the trap number to ensure that the trap handler address always occurs on a word boundary. The concatenated handler address thus constructed is provided as one of the inputs, 373', to the prefetch selector PF_PC Sel 390 (Figure 4), and is selected as the next address from which instructions are to be prefetched.
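The concatenation described above can be sketched as follows. The sketch assumes a 32-bit address in which the 22 TBR bits occupy bit positions 31 through 10; the function name and the sample base value are illustrative only.

```python
def trap_vector_address(tbr, trap_number):
    """Form a 32-bit handler address without using an adder.

    Layout (assumed): [22 TBR bits][8-bit trap number][2 zero bits].
    """
    if not 0 <= trap_number <= 255:
        raise ValueError("trap number must fit in 8 bits")
    high_bits = tbr & 0xFFFFFC00   # keep the 22 high-order bits
    offset = trap_number << 2      # two low-order zero bits: word aligned
    return high_bits | offset


if __name__ == "__main__":
    print(hex(trap_vector_address(0x12345400, 3)))
```

Because the low-order ten bits of the base are discarded and replaced, a bitwise OR gives the same result as an addition, which is why no adder delay is incurred.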

The vector handler addresses for traps using the TBR register are all only one word apart. Thus, the instruction at the trap handler address must be a preliminary branch instruction to a longer trap handling routine. Certain traps require very careful handling, however, to prevent degradation of system performance. TLB traps, for example, must be executed very quickly. For this reason, the preferred embodiments include a fast trap mechanism designed to allow the calling of small trap handlers without the cost of this preliminary branch. In addition, fast trap handlers can be located independently in memory, in on-chip ROM, for example, to eliminate memory system penalties associated with RAM locations.

In the preferred embodiments, the only traps which result in fast traps are the VMU exceptions mentioned above. Fast traps are numbered separately from other traps, and have a range from 0 to 7. However, they have the same priority as MMU exceptions. When the interrupt control unit 363 recognizes a fast trap as the highest priority trap then pending, it causes a fast trap base register (FTB) to be selected from the special registers 412 and provided on the lines 416 to be combined with the trap offset. The resulting vector address provided to the prefetch selector PF_PC Sel 390, via lines 373', is then a concatenation of the high-order 22 bits from the FTB register, followed by three bits representing the fast trap number, followed by seven bits of 0's. Thus, each fast trap address is 128 bytes, or 32 words, apart. When called, the processor branches to the starting word and may execute programs within the block or branch out of it. Execution of small programs, such as standard TLB handling routines which may be implemented in 32 instructions or less, is faster than ordinary traps because the preliminary branch to the actual exception handling routine is obviated.
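The fast trap vector layout described above can be illustrated in the same way. The sketch assumes a 32-bit address with the 22 FTB bits in positions 31 through 10; the function name and sample base value are hypothetical.

```python
def fast_trap_vector_address(ftb, fast_trap_number):
    """Form a fast trap entry address.

    Layout (assumed): [22 FTB bits][3-bit fast trap number][7 zero bits].
    """
    if not 0 <= fast_trap_number <= 7:
        raise ValueError("fast trap number must be in the range 0 to 7")
    return (ftb & 0xFFFFFC00) | (fast_trap_number << 7)


if __name__ == "__main__":
    base = 0xFFFF0000
    for number in range(3):
        print(hex(fast_trap_vector_address(base, number)))
```

Seven trailing zero bits give a spacing of 2^7 = 128 bytes between entries, which is room for 32 four-byte instructions per table entry.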

It should be noted that although all instructions have the same length of 4 bytes (i.e., occupy four address locations) in the preferred embodiments, the fast trap mechanism is also useful in microprocessors whose instructions are variable in length. In this case, it will be appreciated that the fast trap vector addresses should be separated by enough space to accommodate at least two of the shortest instructions available on the microprocessor, and preferably about 32 average-sized instructions. Certainly, if the microprocessor includes a return from trap instruction, the vector addresses should be separated by at least enough space to permit that instruction to be preceded by at least one other instruction in the handler.

Also on dispatch to a trap handler, the processor enters both a kernel mode and an interrupted state. Concurrently, a copy of the compare state register (CSR) is placed in the prior compare state register (PCSR) and a copy of the PSR is stored in the prior PSR (PPSR) register. The kernel mode and interrupted state are represented by bits in the processor status register (PSR). Whenever the interrupted_state bit in the current PSR is set, the shadow registers or trap registers RT[24] through RT[31], as described above and as shown in Figure 7b, become visible. The interrupt handler may switch out of kernel mode merely by writing a new mode into the PSR, but the only way to leave the interrupted state is by executing a return from trap (RTT) instruction.

When the IEU 104 executes an RTT instruction, the PCSR is restored to the CSR register and the PPSR register is restored to the PSR register, thereby automatically clearing the interrupted_state bit in the PSR register. The PF_PC SEL selector 390 also selects special register xPC in the special register set 412 as the next address from which to prefetch. xPC is restored to either the MBUF PFnPC or the EBUF PFnPC as appropriate, via incrementor 394 and bus 396. The decision as to whether to restore xPC into the EBUF or MBUF PFnPC is made according to the "procedure_in_progress" bit of the PSR, once restored.

It should be noted that the processor does not use the same special register xPC to store the return address for both traps and procedural instructions. The return address for a trap is stored in the special register xPC, as mentioned, but the address to return to after a procedural instruction is stored in a different special register, uPC. Thus, the interrupted state remains available even while the processor is executing an emulation stream invoked by a procedural instruction. On the other hand, exception handling routines should not include any procedural instructions since there is no special register to store an address for return to the exception handler after the emulation stream is complete.

5) Nesting: Although certain processor status information is automatically backed up on dispatch to a trap handler, in particular CSR, PSR, the return PC, and in a sense the "A" register set ra[24] through ra[31], other context information is not protected. For example, the contents of a floating point status register (FSR) are not automatically backed up. If a trap handler intends to alter these registers, it must perform its own backup.

Because of the limited backup which is performed automatically on a dispatch to a trap handler, nesting of traps is not automatically permitted. A trap handler should back up any desired registers, clear any interrupt condition, read any information necessary for handling the trap from the system registers and process it as appropriate. Interrupts are automatically disabled upon dispatch to the trap handler. After processing, the handler can then restore the backed up registers, re-enable interrupts and execute the RTT instruction to return from the interrupt. If nested traps are to be allowed, the trap handler should be divided into first and second portions. In the first portion, while interrupts are disabled, the xPC should be copied, using a special register move instruction, and pushed onto the stack maintained by the trap handler. The address of the beginning of the second portion of the trap handler should then be moved using the special register move instruction into the xPC, and a return from trap instruction (RTT) executed. The RTT removes the interrupted state (via the restoration of PPSR into PSR) and transfers control to the address in the xPC, which now contains the address of the second portion of the handler. The second portion may enable interrupts at this point and continue to process the exception in an interruptible mode. It should be noted that the shadow registers RT[24] through RT[31] are visible only in the first portion of this handler, and not in the second portion. Thus, in the second portion, the handler should preserve any of the "A" register values where these register values are likely to be altered by the handler. When the trap handling procedure is finished, it should restore all backed up registers, pop the original xPC off the trap handler stack and move it back into the xPC special register using a special register move instruction, and execute another RTT. This returns control to the appropriate instruction in the main or emulation instruction stream.
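The two-portion handler sequence described above can be traced with a toy model. The model tracks only the program counter, the xPC and the interrupted-state bit; it ignores the CSR, the kernel mode bit and interrupt enables. The class, method and variable names, and the sample addresses, are all hypothetical.

```python
class TrapModel:
    """Toy model of xPC and the interrupted-state bit (illustrative only)."""

    def __init__(self, pc):
        self.pc = pc
        self.xpc = 0
        self.interrupted = False        # interrupted-state bit in the PSR
        self.prev_interrupted = False   # the same bit as saved in the PPSR

    def dispatch(self, handler_address):
        # On dispatch: save return address and PSR, enter interrupted state.
        self.xpc = self.pc
        self.prev_interrupted = self.interrupted
        self.interrupted = True
        self.pc = handler_address

    def rtt(self):
        # Return from trap: restore PSR from PPSR and jump to xPC.
        self.interrupted = self.prev_interrupted
        self.pc = self.xpc


def run_nested_handler(model, handler, second_portion):
    """Run the two-portion sequence; return (pc, interrupted) seen in part two."""
    stack = []
    model.dispatch(handler)
    # First portion: push the original xPC, then RTT into the second portion.
    stack.append(model.xpc)
    model.xpc = second_portion
    model.rtt()
    seen_in_second_portion = (model.pc, model.interrupted)
    # End of second portion: pop the original xPC and RTT back to the stream.
    model.xpc = stack.pop()
    model.rtt()
    return seen_in_second_portion


if __name__ == "__main__":
    m = TrapModel(pc=0x1000)
    print(run_nested_handler(m, handler=0x80, second_portion=0x2000), hex(m.pc))
```

In this model the second portion runs with the interrupted-state bit already cleared, which corresponds to the point at which the shadow registers are no longer visible.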

6) List of Traps: The following Table I sets forth the trap numbers, priorities and handling modes of traps which are recognized in the preferred embodiment:

The combined control and data path portions of IEU 104 are shown in Figure 5. The primary data path begins with the instruction/operand data bus 124 from the IFU 102. As a data bus, immediate operands are provided to an operand alignment unit 470 and passed on to a register file (REG ARRAY) 472. Register data is provided from the register file 472 through a bypass unit 474, via a register file output bus 476, to a parallel array of functional computing elements (FU₀₋ₙ) 478₀₋ₙ, via a distribution bus 480. Data generated by the functional units 478₀₋ₙ is provided back to the bypass unit 474 or the register array 472, or both, via an output bus 482.

A load/store unit 484 completes the data path portion of the IEU 104. The load/store unit 484 is responsible for managing the transfer of data between the IEU 104 and CCU 106. Specifically, load data obtained from the data cache 134 of the CCU 106 is transferred by the load/store unit 484 to an input of the register array 472 via a load data bus 486. Data to be stored to the data cache 134 of the CCU 106 is received from the functional unit distribution bus 480.

The control path portion of the IEU 104 is responsible for issuing, managing, and completing the processing of information through the IEU data path. In the preferred embodiments of the present invention the IEU control path is capable of managing the concurrent execution of multiple instructions and the IEU data path provides for multiple independent data transfers between essentially all data path elements of the IEU 104. The IEU control path operates in response to instructions received via the instruction/operand bus 124. Specifically, instruction sets are received by the EDecode unit 490. In the preferred embodiments of the present invention, the EDecode unit 490 receives and decodes both instruction sets held by the IFIFO master registers 216, 224. The results of the decoding of all eight instructions are variously provided to a carry checker (CRY CHKR) unit 492, dependency checker (DEP CHKR) unit 494, register renaming unit (REG RENAME) 496, instruction issuer (ISSUER) unit 498 and retirement control unit (RETIRE CTL) 500. The carry checker unit 492 receives decoded information about the eight pending instructions from the EDecode unit 490 via control lines 502. The function of the carry checker 492 is to identify those of the pending instructions that either affect the carry bit of the processor status word or are dependent on the state of the carry bit. This control information is provided via control lines 504 to the instruction issuer unit 498.

Decoded information identifying the registers of the register file 472 that are used by the eight pending instructions is provided directly to the register renaming unit 496 via control lines 506. This information is also provided to the dependency checker unit 494. The function of the dependency checker unit 494 is to determine which of the pending instructions reference registers as the destination for data and which instructions, if any, are dependent on any of those destination registers. Those instructions that have register dependencies are identified by control signals provided via the control lines 508 to the register rename unit 496.

Finally, the EDecode unit 490 provides control information identifying the particular nature and function of each of the eight pending instructions to the instruction issuer unit 498 via control lines 510. The issuer unit 498 is responsible for determining the data path resources, particularly the availability of particular functional units, for the execution of pending instructions. In accordance with the preferred embodiments of the architecture 100, instruction issuer unit 498 allows for the out-of-order execution of any of the eight pending instructions subject to the availability of data path resources and carry and register dependency constraints. The register rename unit 496 provides the instruction issuing unit 498 with a bit map, via control lines 512, of those instructions that are suitably unconstrained to allow execution. Instructions that have already been executed (done) and those with register or carry dependencies are logically removed from the bit map.
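The bit map of unconstrained instructions described above can be sketched as a simple mask operation. The sketch assumes one bit per pending instruction in an 8-bit value; the function and parameter names are hypothetical, and the hardware signal encoding is not given in the text.

```python
def ready_bit_map(done, register_dependent, carry_dependent):
    """Return an 8-bit map of pending instructions that may be issued.

    Each argument is an 8-bit mask with one bit per pending instruction.
    An instruction is removed from the map if it is already done or has
    an unresolved register or carry dependency.
    """
    blocked = done | register_dependent | carry_dependent
    return ~blocked & 0xFF


if __name__ == "__main__":
    print(format(ready_bit_map(0b00000001, 0b00000100, 0b00100000), "08b"))
```

The issuer would then choose among the set bits subject to functional unit availability, which this sketch does not model.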

Depending on the availability of required functional units 478₀₋ₙ, the instruction issuer unit 498 may initiate the execution of multiple instructions during each system clock cycle. The status of the functional units 478₀₋ₙ is provided via a status bus 514 to the instruction issuer unit 498. Control signals for initiating, and subsequently managing, the execution of instructions are provided by the instruction issuer unit 498 on the control lines 516 to the register rename unit 496 and selectively to the functional units 478₀₋ₙ. In response, the register rename unit 496 provides register selection signals on a register file access control bus 518. The specific registers enabled via the control signals provided on the bus 518 are determined by the selection of the instruction being executed and by the determination by the register rename unit 496 of the registers referenced by that particular instruction.

A bypass control unit (BYPASS CTL) 520 generally controls the operation of the bypass data routing unit 474 via control signals on control lines 524. The bypass control unit 520 monitors the status of each of the functional units 478₀₋ₙ and, in conjunction with the register references provided from the register rename unit 496 via control lines 522, determines whether data is to be routed from the register file 472 to the functional units 478₀₋ₙ or whether data being produced by the functional units 478₀₋ₙ can be immediately routed via the bypass unit 474 to the functional unit distribution bus 480 for use in the execution of a newly issued instruction selected by the instruction issuer unit 498. In either case, the instruction issuer unit 498 directly controls the routing of data from the distribution bus 480 to the functional units 478₀₋ₙ by selectively enabling specific register data to each of the functional units 478₀₋ₙ.

The remaining units of the IEU control path include a retirement control unit 500, a control flow control (CF CTL) unit 528, and a done control (DONE CTL) unit 540. The retirement control unit 500 operates to void or confirm the execution of out-of-order executed instructions. Where an instruction has been executed out-of-order, that instruction can be confirmed or retired once all prior instructions have also been retired. Based on an identification of which of the current set of eight pending instructions have been executed, provided on the control lines 532, the retirement control unit 500 provides control signals on control lines 534 coupled to the bus 518 to effectively confirm the result data stored by the register array 472 as the result of the prior execution of an out-of-order executed instruction. The retirement control unit 500 provides the PC increment/size control signals on control lines 344 to the IFU 102 as it retires each instruction. Since multiple instructions may be executed out-of-order, and therefore be ready for simultaneous retirement, the retirement control unit 500 determines a size value based on the number of instructions simultaneously retired. Finally, where all instructions of the IFIFO master register 224 have been executed and retired, the retirement control unit 500 provides the IFIFO read control signal on the control line 342 to the IFU 102 to initiate an IFIFO unit 264 shift operation, thereby providing the EDecode unit 490 with an additional four instructions as instructions pending execution.

The control flow control unit 528 performs the somewhat more specific function of detecting the logical branch result of each conditional branch instruction. The control flow control unit 528 receives an 8-bit vector identification of the currently pending conditional branch instructions from the EDecode unit 490 via the control lines 510. An 8-bit vector instruction done control signal is similarly received via the control lines 538 from the done control unit 540. This done control signal allows the control flow control unit 528 to identify when a conditional branch instruction is done at least to a point sufficient to determine a conditional control flow status. The control flow status results for the pending conditional branch instructions are stored by the control flow control unit 528 as they are executed. The data necessary to determine the conditional control flow instruction outcome is obtained from temporary status registers in the register array 472 via the control lines 530. As each conditional control flow instruction is executed, the control flow control unit provides a new control flow result signal on the control lines 348 to the IFU 102. This control flow result signal preferably includes two 8-bit vectors defining whether the status results, by respective bit position, of the eight potentially pending control flow instructions are known and the corresponding status result states, also given by bit position correspondence.
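The in-order retirement rule applied by the retirement control unit 500, as described above, can be sketched as a count of the leading run of completed instructions. The list representation and function name are assumptions made for illustration.

```python
def retirable_count(done_flags):
    """Count how many instructions can be retired together.

    done_flags lists the pending instructions in program order, oldest
    first; True means the instruction has finished executing. Only the
    leading run of completed instructions may retire, because every
    prior instruction must retire first.
    """
    count = 0
    for done in done_flags:
        if not done:
            break
        count += 1
    return count


if __name__ == "__main__":
    print(retirable_count([True, True, False, True]))
```

The returned count plays the role of the size value that accompanies the PC increment when several instructions retire at once.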

Lastly, the done control unit 540 is provided to monitor the operational execution state of each of the functional units 478₀₋ₙ. As any of the functional units 478₀₋ₙ signals completion of an instruction execution operation, the done control unit 540 provides a corresponding done control signal on the control lines 542 to alert the register rename unit 496, instruction issuer unit 498, retirement control unit 500 and bypass control unit 520.

The parallel array arrangement of the functional units 478₀₋ₙ enhances the control consistency of the IEU 104. The particular nature of the individual functional units 478₀₋ₙ must be known by the instruction issuer unit 498 in order for instructions to be properly recognized and scheduled for execution. The functional units 478₀₋ₙ are responsible for determining and implementing the specific control flow operation necessary to perform their requisite function. Thus, other than the instruction issuer 498, none of the IEU control units need to have independent knowledge of the control flow processing of an instruction. Together, the instruction issuer unit 498 and the functional units 478₀₋ₙ provide the necessary control signal prompting of the functions to be performed by the remaining control flow managing units 496, 500, 520, 528, 540. Thus, alteration in the particular control flow operation of a functional unit 478₀₋ₙ does not impact the control operation of the IEU 104. Further, the functional augmentation of an existing functional unit 478₀₋ₙ and even the addition of one or more new functional units 478₀₋ₙ, such as an extended precision floating point multiplier and extended precision floating point ALU, a fast Fourier computation functional unit, and a trigonometric computational unit, require only minor modification of the instruction issuer unit 498. The required modifications must provide for recognition of the particular instruction, based on the corresponding instruction field isolated by the EDecode unit 490, and a correlation of the instruction to the required functional unit 478₀₋ₙ. Control over the selection of register data, routing of data, instruction completion and retirement remains consistent with the handling of all other instructions executed with respect to all other ones of the functional units 478₀₋ₙ.

A) IEU Data Path Detail:

The central element of the IEU data path is the register file 472. Within the IEU data path, however, the present invention provides for a number of parallel data paths optimized generally for specific functions. The two principal data paths are integer and floating point. Within each parallel data path, a portion of the register file 472 is provided to support the data manipulations occurring within that data path.

1) Register File Detail: The preferred generic architecture of a data path register file is shown in Figure 6a. The data path register file 550 includes a temporary buffer 552, a register file array 564, an input selector 559, and an output selector 556. Data ultimately destined for the register array 564 is typically first received by the temporary buffer 552 through a combined data input bus 558'. That is, all data directed to the data path register file 550 is multiplexed by the input selector 559 from a number of input buses 558, preferably two, onto the input bus 558'. Register select and enable control signals provided on the control bus 518 select the register location for the received data within the temporary buffer 552. On retirement of an instruction that produced data stored in the temporary buffer, control signals again provided on the control bus 518 enable the transfer of the data from the temporary buffer 552 to a logically corresponding register within the register file array 564 via the data bus 560. However, prior to retirement of the instruction, data stored in the registers of the temporary buffer 552 may be utilized in the execution of subsequent instructions by routing the temporary buffer stored data to the output data selector 556 via a bypass portion of the data bus 560. The selector 556, controlled by a control signal provided via the control bus 518, selects between data provided from the registers of the temporary buffer 552 and of the register file array 564. The resulting data is provided on the register file output bus 564. Also, where an executing instruction will be retired on completion, i.e., the instruction has been executed in-order, the input selector 559 can be directed to route the result data directly to the register array 554 via bypass extension 558".

In accordance with the preferred embodiments of the present invention, each data path register file 550 permits two simultaneous register operations to occur. Thus, the input bus 558 provides for two full register width data values to be written to the temporary buffer 552. Internally, the temporary buffer 552 provides a multiplexer array permitting the simultaneous routing of the input data to any two registers within the temporary buffer 552. Similarly, internal multiplexers allow any five registers of the temporary buffer 552 to be selected to output data onto the bus 560. The register file array 564 likewise includes input and output multiplexers allowing two registers to be selected to receive, on bus 560, or five to source, via bus 562, respective data simultaneously. Finally, the register file output selector 556 is preferably implemented to allow any five of the ten register data values received via the buses 560, 562 to be simultaneously output on the register file output bus 564.

The register set within the temporary buffer is generally shown in Figure 6b. The register set 552' consists of eight single word (32-bit) registers I0RD, I1RD ... I7RD. The register set 552' may also be used as a set of four double word registers I0RD, I0RD+1 (I4RD), I1RD, I1RD+1 (I5RD) ... I3RD, I3RD+1 (I7RD).

In accordance with the present invention, rather than provide duplicate registers for each of the registers within the register file array 564, the registers in the temporary buffer register set 552 are referenced by the register rename unit 496 based on the relative location of the respective instructions within the two IFIFO master registers 216, 224. Each instruction implemented by the architecture 100 may reference for output up to two registers, or one double word register, for the destination of data produced by the execution of the instruction. Typically, an instruction will reference only a single output register. Thus, for instruction two (I₂) of the eight pending instructions, positionally identified as shown in Figure 6c and referencing a single output register, the data destination register I2RD will be selected to receive data produced by the execution of the instruction. Where the data produced by the instruction I₂ is used by a subsequent instruction, for example, I₅, the data stored in the I2RD register will be transferred out via the bus 560 and the resultant data stored back to the temporary buffer 552 into the register identified as I5RD. Notably, instruction I₅ is dependent on instruction I₂. Instruction I₅ cannot be executed until the result data from I₂ is available. However, as can be seen, instruction I₅ can execute prior to the retirement of instruction I₂ by obtaining its required input data from the instruction I₂ data location of the temporary buffer 552'.

Finally, as instruction I₂ is retired, the data from the register I2RD is written to the register location within the register file array 564 as determined by the logical position of the instruction at the point of retirement. That is, the retirement control unit 500 determines the address of the destination registers in the register file array from the register reference field data provided from the EDecode unit 490 on the control lines 510. Once instructions I₀₋₃ have been retired, the values in I4RD-I7RD are shifted into I0RD-I3RD simultaneously with a shift of the IFIFO unit 264.
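The positional use of the temporary buffer registers described above can be sketched with a small model. It covers only single word results and the shift that accompanies an IFIFO shift; the class and method names are hypothetical, and the actual datapath widths and port counts are not modeled.

```python
class TemporaryBuffer:
    """Toy model of the eight positional result registers I0RD..I7RD."""

    def __init__(self):
        self.slots = [None] * 8

    def write_result(self, position, value):
        # The result of pending instruction I<position> goes to I<position>RD.
        self.slots[position] = value

    def read_result(self, position):
        # A later instruction may read an unretired result from the buffer.
        return self.slots[position]

    def shift(self):
        # After I0-I3 retire, I4RD-I7RD move down into I0RD-I3RD.
        self.slots = self.slots[4:] + [None] * 4


if __name__ == "__main__":
    buf = TemporaryBuffer()
    buf.write_result(2, 7)                       # I2 produces a value
    buf.write_result(5, buf.read_result(2) + 1)  # I5 consumes it before I2 retires
    buf.shift()
    print(buf.slots)
```

After the shift, the value produced by the instruction formerly in position 5 is found in position 1, mirroring the movement of the instruction itself.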

A complication arises where instruction I₂ provides a double word result value. In accordance with a preferred embodiment of the present invention, a combination of locations I2RD and I6RD is used to store the data resulting from instruction I₂ until that instruction is retired or otherwise cancelled. In the preferred embodiment, execution of instructions I₄₋₇ is held where a double word output reference by any of the instructions I₀₋₃ is detected by the register rename unit 496. This allows the entire temporary buffer 552' to be used as a single rank of double word registers. Once instructions I₀₋₃ have been retired, the temporary buffer 552' can again be used as two ranks of single word registers. Further, the execution of any instruction I₄₋₇ is held where a double word output register is required until the instruction has been shifted into a corresponding I₀₋₃ location.

The logical organization of the register file array 564 is shown in Figures 7a-b. In accordance with the preferred embodiments of the present invention, the register file array 564 for the integer data path consists of 40 32-bit wide registers. This set of registers, constituting a register set "A", is organized as a base register set ra[0..23] 565, a top set of general purpose registers ra[24..31] 566, and a shadow register set of eight general purpose trap registers rt[24..31]. In normal operation, the general purpose registers ra[0..31] 565, 566 constitute the active "A" register set of the register file array for the integer data path.

As shown in Figure 7b, the trap registers rt[24..31] 567 may be swapped into the active register set "A" to allow access along with the active base set of registers ra[0..23] 565. This configuration of the "A" register set is selected upon the acknowledgement of an interrupt or the execution of an exception trap handling routine. This state of the register set "A" is maintained until expressly returned to the state shown in Figure 7a by the execution of an enable interrupts instruction or execution of a return from trap instruction.
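The swapping of the trap registers into the active "A" register set, as described above, amounts to a selection on the register index. The following sketch uses a hypothetical function name and returns a (bank, index) pair purely for illustration.

```python
def select_a_register(index, interrupted_state):
    """Map an "A" register index to the physical bank it refers to.

    Indices 0..23 always select the base set ra[0..23]. Indices 24..31
    select the trap registers rt[24..31] while the trap configuration
    is active, and ra[24..31] otherwise.
    """
    if not 0 <= index <= 31:
        raise ValueError("register index must be in the range 0 to 31")
    if interrupted_state and index >= 24:
        return ("rt", index)
    return ("ra", index)


if __name__ == "__main__":
    print(select_a_register(25, interrupted_state=True))
    print(select_a_register(25, interrupted_state=False))
```

Because ra[24..31] are merely hidden rather than overwritten, their contents survive the trap without an explicit save.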

In the preferred embodiment of the present invention as implemented by the architecture 100, the floating point data path utilizes an extended precision register file array 572 as generally shown in Figure 8. The register file array 572 consists of 32 registers, rf[0..31], each having a width of 64 bits. The floating point register file 572 may also be logically referenced as a "B" set of integer registers rb[0..31]. In the architecture 100, this "B" set of registers is equivalent to the low-order 32 bits of each of the floating point registers rf[0..31].

Representing a third data path, a boolean operator register set 574 is provided, as shown in Figure 9, to store the logical result of boolean combinatorial operations. This "C" register set 574 consists of 32 single bit registers, rc[0..31]. The operation of the boolean register set 574 is unique in that the results of boolean operations can be directed to any instruction selected register of the boolean register set 574. This is in contrast to utilizing a single processor status word register that stores single bit flags for conditions such as equal, not equal, greater than and other simple boolean status values.
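The relationship between the "B" integer registers and the floating point registers described above can be expressed as a mask on a 64-bit value. The function name and sample value are illustrative assumptions.

```python
def rb_view(rf_value):
    """Return the rb[n] value seen for a 64-bit rf[n] register.

    rb[n] is the low-order 32 bits of rf[n].
    """
    return rf_value & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(rb_view(0x1122334455667788)))
```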

Both the floating point register set 572 and the boolean register set 574 are complemented by temporary buffers architecturally identical to the integer temporary buffer 552 shown in Figure 6b. The essential difference is that the width of the temporary buffer registers is defined to be identical to those of the complementing register file array 572, 574; in the preferred implementation, 64 bits and one bit, respectively.

A number of additional special registers are at least logically present in the register array 472. The registers that are physically present in the register array 472, as shown in Figure 7c, include a kernel stack pointer 568, processor state register (PSR) 569, previous processor state register (PPSR) 570, and an array of eight temporary processor state registers (tPSR[0..7]) 571. The remaining special registers are distributed throughout various parts of the architecture 100. The special address and data bus 354 is provided to select and transfer data between the special registers and the "A" and "B" sets of registers. A special register move instruction is provided to select a register from either the "A" or "B" register set, to select the direction of transfer, and to specify the address identifier of a special register.

The kernel stack pointer register and temporary processor state registers differ from the other special registers. The kernel stack pointer may be accessed through execution of a standard register to register move instruction when in kernel state. The temporary processor state registers are not directly accessible. Rather, this array of registers is used to implement an inheritance mechanism for propagating the value of the processor state register for use by out-of-order executing instructions. The initial propagation value is that of the processor state register: the value provided by the last retired instruction. This initial value is propagated forward through the temporary processor state registers so that any out-of-order executing instruction has access to the value in the positionally corresponding temporary processor state register. The specific nature of an instruction defines the condition code bits, if any, that the instruction is dependent on and may change. Where an instruction is unconstrained by dependencies, whether register or condition code, as determined by the register dependency checker unit 494 and carry dependency checker 492, the instruction can be executed out-of-order. Any modification of the condition code bits of the processor state register is directed to the logically corresponding temporary processor state register. Specifically, only those bits that may change are applied to the value in the temporary processor state register and propagated to all higher order temporary processor state registers. Consequently, every out-of-order executed instruction executes from a processor state register value modified appropriately by any intervening PSR modifying instructions. Retirement of an instruction only transfers the corresponding temporary processor state register's value to the PSR register 569.
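The inheritance mechanism can be sketched in software as follows. This is a minimal illustration rather than the patented circuit: the class, its method names and the two condition code bit positions are hypothetical. It also makes one assumption that the text does not spell out, namely that propagation of a bit stops at a later pending instruction that itself writes that bit, so that the later instruction's own result is not overwritten.

```python
CARRY, ZERO = 0b01, 0b10  # illustrative condition code bit positions (assumed)

class PSRInheritance:
    SLOTS = 8  # one temporary PSR per pending instruction

    def __init__(self, psr, writes):
        self.psr = psr                  # value left by the last retired instruction
        self.writes = list(writes)      # writes[i]: bits instruction i may change
        self.tpsr = [psr] * self.SLOTS  # tPSR[i]: state after instruction i

    def visible(self, slot):
        """PSR value seen by the instruction in the given slot."""
        return self.psr if slot == 0 else self.tpsr[slot - 1]

    def execute(self, slot, new_bits):
        """Apply only the bits this instruction may change, then propagate."""
        mask = self.writes[slot]
        for j in range(slot, self.SLOTS):
            if j > slot:
                mask &= ~self.writes[j]  # assumed: a later writer keeps its own bits
            if not mask:
                break
            self.tpsr[j] = (self.tpsr[j] & ~mask) | (new_bits & mask)

    def retire(self, slot):
        """Retirement copies the corresponding temporary value into the PSR."""
        self.psr = self.tpsr[slot]
```

For example, with `writes = [CARRY, 0, ZERO, CARRY, 0, 0, 0, 0]`, executing slot 2 before slot 0 still leaves the instruction in slot 3 seeing both the carry produced by slot 0 and the zero flag produced by slot 2, while the PSR itself changes only when `retire` is called.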

The remaining special registers are described in Table II.
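Two of the Table II entries, the Trap Base Register (TBR) and the Fast Trap Base Register (FTB), imply simple address arithmetic: a TBR table entry is one word long, while an FTB table entry is 32 words long and holds handler instructions directly. The sketch below works that arithmetic through. The function names are hypothetical, and the 4-byte word size with byte addressing is an assumption that Table II does not state.

```python
WORD_BYTES = 4              # assumption: 32-bit words, byte-addressed memory
FAST_TRAP_ENTRY_WORDS = 32  # from Table II: each FTB table entry is 32 words

def vector_trap_slot(tbr, trap_number):
    """Address of the one-word TBR table entry holding the handler vector."""
    return tbr + trap_number * WORD_BYTES

def fast_trap_entry(ftb, trap_number):
    """Address of the first handler instruction inside the FTB table entry."""
    return ftb + trap_number * FAST_TRAP_ENTRY_WORDS * WORD_BYTES

# Trap number 3 with tables based at 0x2000 (TBR) and 0x1000 (FTB).
print(hex(vector_trap_slot(0x2000, 3)))  # 0x200c
print(hex(fast_trap_entry(0x1000, 3)))   # 0x1180
```

The difference matters for latency: the TBR path must first fetch a vector and then branch to it, whereas the FTB address is already the address of executable handler code.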

TABLE II Special Registers

Reg   Special Move R/W   Description:

PC R Program Counters: in general, PCs maintain the next address of the currently executing program instruction stream.

IF_PC R/W IFU Program Counter: the IF_PC maintains the precise next execution address.

PFnPCs R Prefetch Program Counters: the MBUF, TBUF and EBUF PFnPCs maintain the next prefetch instruction addresses for the respective prefetch instruction streams.

uPC R/W Micro-Program Counter: maintains the address of the instruction following a procedural instruction. This is the address of the first instruction to be executed upon return from a procedural routine.

xPC R/W Interrupt/Exception Program Counter: holds the return address of an interrupt or an exception. The return address is the address of the IF_PC at the time of the trap.

TBR W Trap Base Register: base address of a vector table used for trap handling routine dispatching. Each entry is one word long. The trap number, provided by Interrupt Logic Unit 363, is used as an index into the table pointed to by this address.

FTB W Fast Trap Base Register: base address of an immediate trap handling routine table. Each table entry is 32 words and is used to directly implement a trap handling routine. The trap number, provided by Interrupt Logic Unit 363, times 32 is used as an offset into the table pointed to by this address.

PBR W Procedural Base Register: base address of a vector table used for procedural routine dispatching. Each entry is one word long, aligned on four word boundaries. The procedure number, provided as a procedural instruction field, is used as an index into the table pointed to by this address.

PSR R/W Processor State Register: maintains the processor status word. Status data bits include: carry, overflow, zero, negative, processor mode, current interrupt level, procedural routine being executed, divide by 0, overflow exception, hardware function enables, procedural enable, interrupt enable.

PPSR R/W Previous Processor State Register: loaded from the PSR on successful completion of an instruction or when an interrupt or trap is taken.

CSR R/W Compare State (Boolean) Register: the boolean register set accessible as a single word.

PCSR R/W Previous Compare State Register: loaded from the CSR on successful completion of an instruction or when an interrupt or trap is taken.

2) Integer Data Path Detail:

The integer data path of the IEU 104, constructed in accordance with the preferred embodiment of the present invention, is shown in Figure 10. For purposes of clarity, the many control path connections to the integer data path 580 are not shown. Those connections are defined with respect to Figure 5.

Input data for the data path 580 is obtained from the alignment units 582, 584 and the integer load/store unit 586. Integer immediate data values, originally provided as an instruction embedded data field, are obtained from the operand unit 470 via a bus 588. The alignment unit 582 operates to isolate the integer data value and provide the resulting value onto the output bus 590 to a multiplexer 592. A second input to the multiplexer 592 is the special register address and data bus 354.

Immediate operands obtained from the instruction stream are also obtained from the operand unit 470 via the data bus 594. These values are again right justified by the alignment unit 584 before provision onto an output bus 596.

The integer load/store unit 586 communicates bi-directionally via the external data bus 598 with the CCU 106. Inbound data to the IEU 104 is transferred by the integer load/store unit 586 onto the input data bus 600 to an input latch 602. Data output from the multiplexer 592 and latch 602 are provided on the multiplexer input buses 604, 606 of a multiplexer 608. Data from the functional unit output bus 482' is also received by the multiplexer 608. This multiplexer 608, in the preferred embodiments of the architecture 100, provides for two simultaneous data paths to the output multiplexer buses 610. Further, the transfer of data through the multiplexer 608 can be completed within each half cycle of the system clock. Since most instructions implemented by the architecture 100 utilize a single destination register, a maximum of four instructions can provide data to the temporary buffer 612 during each system clock cycle.

Data from the temporary buffer 612 can be transferred to an integer register file array 614, via temporary register output buses 616, or to an output multiplexer 620 via alternate temporary buffer register buses 618. Integer register array output buses 622 permit the transfer of integer register data to the multiplexer 620. The output buses connected to the temporary buffer 612 and integer register file array 614 each permit five register values to be output simultaneously. That is, two instructions referencing a total of up to five source registers can be issued simultaneously. The temporary buffer 612, register file array 614 and multiplexer 620 allow outbound register data transfers to occur every half system clock cycle. Thus, up to four integer and floating point instructions may be issued during each clock cycle.

The multiplexer 620 operates to select outbound register data values from the register file array 614 or directly from the temporary buffer 612. This allows out-of-order executed instructions with dependencies on prior out-of-order executed instructions to be executed by the IEU 104. This facilitates the twin goals of maximizing the execution through-put capability of the IEU integer data path by the out-of-order execution of pending instructions while precisely segregating out-of-order data results from data results produced by instructions that have been executed and retired. Whenever an interrupt or other exception condition occurs that requires the precise state of the machine to be restored, the present invention allows the data values present in the temporary buffer 612 to be simply cleared. The register file array 614 is therefore left to contain precisely those data values produced only by the execution of instructions completed and retired prior to the occurrence of the interrupt or other exception condition.

The up to five register data values selected during each half system clock cycle operation of the multiplexer 620 are provided via the multiplexer output buses 624 to an integer bypass unit 626. This bypass unit 626 is, in essence, a parallel array of multiplexers that provide for the routing of data presented at any of its inputs to any of its outputs. The bypass unit 626 inputs include the special register addressed data value or immediate integer value via the output bus 604 from the multiplexer 592, the up to five register data values provided on the buses 624, the load operand data from the integer load/store unit 586 via the double integer bus 600, the immediate operand value obtained from the alignment unit 584 via its output bus 596, and, finally, a bypass data path from the functional unit output bus 482. This bypass data path, and the data bus 482, provides for the simultaneous transfer of four register values per system clock cycle.
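The separation between the temporary buffer 612 and the register file array 614, and the way clearing the buffer restores a precise state, can be modelled in a few lines. The sketch below is illustrative only: the class and method names are hypothetical, each pending instruction is assumed to have one destination register, and in-order retirement is assumed to be enforced by the caller.

```python
class IntegerRegisters:
    SLOTS = 8  # one temporary-buffer slot per pending instruction

    def __init__(self, count=32):
        self.file = [0] * count          # retired (precise) state
        self.temp = [None] * self.SLOTS  # each entry: (dest_reg, value) or None

    def write_result(self, slot, dest, value):
        """An executed but not yet retired instruction deposits its result."""
        self.temp[slot] = (dest, value)

    def read_operand(self, slot, reg):
        """Operand for the instruction in `slot`: the newest earlier pending
        result for `reg` if one exists, otherwise the register file value."""
        for s in range(slot - 1, -1, -1):
            entry = self.temp[s]
            if entry is not None and entry[0] == reg:
                return entry[1]
        return self.file[reg]

    def retire(self, slot):
        """Move a completed result into the register file."""
        entry = self.temp[slot]
        if entry is not None:
            dest, value = entry
            self.file[dest] = value
            self.temp[slot] = None

    def flush(self):
        """Interrupt or exception: discard all unretired results."""
        self.temp = [None] * self.SLOTS
```

Because speculative results never reach `file` before `retire`, `flush` needs no undo log: dropping the temporary entries is sufficient to leave only retired values visible.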

Data is output by the bypass unit 626 onto an integer bypass bus 628 that is connected to the floating point data path, to two operand data buses providing for the transfer out of up to five register data values simultaneously, and a store data bus 632 that is used to provide data to the integer load/store unit 586. The functional unit distribution bus 480 is implemented through the operation of a router unit 634. Again, the router unit 634 is implemented by a parallel array of multiplexers that permit five register values received at its inputs to be routed to the functional units provided in the integer data path. Specifically, the router unit 634 receives the five register data values provided via the buses 630 from the bypass unit 626, the current IF_PC address value via the address bus 352 and the control flow offset value determined by the PC control unit 362 and as provided on the lines 378'. The router unit 634 may optionally receive, via the data bus 636, an operand data value sourced from a bypass unit provided within the floating point data path.

The register data values received by the router unit 634 may be transferred onto the special register address and data bus 354 and to the functional units 640, 642, 644. Specifically, the router unit 634 is capable of providing up to three register operand values to each of the functional units 640, 642, 644 via router output buses 646, 648, 650. Consistent with the general architecture of the architecture 100, up to two instructions could be simultaneously issued to the functional units 640, 642, 644. The preferred embodiment of the present invention provides for three dedicated integer functional units, implementing respectively a programmable shift function and two arithmetic logic unit functions.

An ALU0 functional unit 644, ALU1 functional unit 642 and shifter functional unit 640 provide respective output register data onto the functional unit bus 482'. The output data produced by the ALU0 and shifter functional units 644, 640 are also provided onto a shared integer functional unit bus 650 that is coupled into the floating point data path. A similar floating point functional unit output value data bus 652 is provided from the floating point data path to the functional unit output bus 482'. The ALU0 functional unit 644 is used also in the generation of virtual address values in support of both the prefetch operations of the IFU 102 and data operations of the integer load/store unit 586. The virtual address value calculated by the ALU0 functional unit 644 is provided onto an output bus 654 that connects to both the target address bus 346 of the IFU 102 and to the CCU 106 to provide the execution unit physical address (EX PADDR). A latch 656 is provided to store the virtualizing portion of the address produced by the ALU0 functional unit 644. This virtualizing portion of the address is provided onto an output bus 658 to the VMU 108.

3) Floating Point Data Path Detail:

Referring now to Figure 11, the floating point data path 660 is shown. Initial data is again received from a number of sources including the immediate integer operand bus 588, immediate operand bus 594 and the special register address data bus 354. The final source of external data is a floating point load/store unit 662 that is coupled to the CCU 106 via the external data bus 598.

The immediate integer operand is received by an alignment unit 664 that functions to right justify the integer data field before submission to a multiplexer 666 via an alignment output data bus 668. The multiplexer 666 also receives the special register address data bus 354. Immediate operands are provided to a second alignment unit 670 for right justification before being provided on an output bus 672. Inbound data from the floating point load/store unit 662 is received by a latch 674 from a load data bus 676. Data from the multiplexer 666, latch 674 and a functional unit data return bus 482" is received on the inputs of a multiplexer 678. The multiplexer 678 provides for selectable data paths sufficient to allow two register data values to be written to a temporary buffer 680, via the multiplexer output buses 682, each half cycle of the system clock.

The temporary buffer 680 incorporates a register set logically identical to the temporary buffer 552' as shown in Figure 6b. The temporary buffer 680 further provides for up to five register data values to be read from the temporary buffer 680 to a floating point register file array 684, via data buses 686, and to an output multiplexer 688 via output data buses 690. The multiplexer 688 also receives, via data buses 692, up to five register data values from the floating point register file array 684 simultaneously. The multiplexer 688 functions to select up to five register data values for simultaneous transfer to a bypass unit 694 via data buses 696. The bypass unit 694 also receives the immediate operand value provided by the alignment unit 670 via the data bus 672, the output data bus 698 from the multiplexer 666, the load data bus 676 and a data bypass extension of the functional unit data return bus 482".

The bypass unit 694 operates to select up to five simultaneous register operand data values for output onto the bypass unit output buses 700, a store data bus 702 connected to the floating point load/store unit 662, and the floating point bypass bus 636 that connects to the router unit 634 of the integer data path 580.

A floating point router unit 704 provides for simultaneous selectable data paths between the bypass unit output buses 700 and the integer data path bypass bus 628 and functional unit input buses 706, 708, 710 coupled to the respective functional units 712, 714, 716. Each of the input buses 706, 708, 710, in accordance with the preferred embodiment of the architecture 100, permits the simultaneous transfer of up to three register operand data values to each of the functional units 712, 714, 716. The output buses of these functional units 712, 714, 716 are coupled to the functional unit data return bus 482" for returning data to the register file input multiplexer 678. The integer data path functional unit output bus 650 may also be provided to connect to the functional unit data return bus 482". The architecture 100 does provide for a connection of the functional unit output buses of a multiplier functional unit 712 and a floating point ALU 714 to be coupled via the floating point data path functional unit bus 652 to the functional unit data return bus 482' of the integer data path 580.

4) Boolean Register Data Path Detail:

The boolean operations data path 720 is shown in Figure 12. This data path 720 is utilized in support of the execution of essentially two types of instructions. The first type is an operand comparison instruction where two operands, selected from the integer register sets, floating point register sets or provided as immediate operands, are compared by subtraction in one of the ALU functional units of the integer and floating point data paths. Comparison is performed by a subtraction operation by any of the ALU functional units 642, 644, 714, 716 with the resulting sign and zero status bits being provided to a combined input selector and comparison operator unit 722. This unit 722, in response to instruction identifying control signals received from the EDecode unit 490, selects the output of an ALU functional unit 642, 644, 714, 716 and combines the sign and zero bits to extract a boolean comparison result value. An output bus 723 allows the results of the comparison operation to be transferred simultaneously to an input multiplexer 726 and a bypass unit 742. As in the integer and floating point data paths, the bypass unit 742 is implemented as a parallel array of multiplexers providing multiple selectable data paths between the inputs of the bypass unit 742 to multiple outputs. The other inputs of the bypass unit 742 include a boolean operation result return data bus 724 and two boolean operands on data buses 744. The bypass unit 742 permits boolean operands representing up to two simultaneously executing boolean instructions to be transferred to a boolean operation functional unit 746, via operand buses 748. The bypass unit 742 also permits transfer of up to two single bit boolean operand bits (CF0, CF1) to be simultaneously provided on the control flow result control lines 750, 752.

The remainder of the boolean operation data path 720 includes the input multiplexer 726 that receives as its inputs the comparison and the boolean operation result values provided on the comparison result bus 723 and a boolean result bus 724. The bus 724 permits up to two simultaneous boolean result bits to be transferred to the multiplexer 726. In addition, up to two comparison result bits may be transferred via the bus 723 to the multiplexer 726. The multiplexer 726 permits any two single bits presented at the multiplexer inputs to be transferred via the multiplexer output buses 730 to a boolean operation temporary buffer 728 during each half cycle of the system clock. The temporary buffer 728 is logically equivalent to the temporary buffer 552', as shown in Figure 6b, though differing in two significant respects. The first respect is that each register entry in the temporary buffer 728 consists of a single bit. The second distinction is that only a single register is provided for each of the eight pending instruction slots, since the result of a boolean operation is, by definition, fully defined by a single result bit.

The temporary buffer 728 provides up to four output operand values simultaneously. This allows the simultaneous execution of two boolean instructions, each requiring access to two source registers. The four boolean register values may be transferred during each half cycle of the system clock onto the operand buses 736 to a multiplexer 738 or to a boolean register file array 732 via the boolean operand data buses 734. The boolean register file array 732, as logically depicted in Figure 9, is a single 32 bit wide data register that permits any separate combination of up to four single bit locations to be modified with data from the temporary buffer 728 and read from the boolean register file array 732 onto the output buses 740 during each half cycle of the system clock. The multiplexer 738 provides for any two pairs of boolean operands received at its inputs via the buses 736, 740 to be transferred onto the operand output buses 744 to the bypass unit 742.
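The comparison path described in this section (subtract, take the sign and zero bits, combine them into a single boolean, and write that bit into any selected boolean register) can be sketched as follows. The condition names, function names and class are hypothetical, chosen only for illustration; they are not taken from the instruction encoding tables. The sketch also relies on Python integers not overflowing, so the sign of the difference is always exact.

```python
def compare_flags(a, b):
    """Model the ALU subtraction: return the (sign, zero) status bits of a - b."""
    diff = a - b
    return diff < 0, diff == 0

# Each condition is a combination of the sign and zero bits only.
CONDITIONS = {
    "eq": lambda sign, zero: zero,
    "ne": lambda sign, zero: not zero,
    "lt": lambda sign, zero: sign,
    "ge": lambda sign, zero: not sign,
    "gt": lambda sign, zero: not sign and not zero,
    "le": lambda sign, zero: sign or zero,
}

class BooleanRegisters:
    """rc[0..31] held as one 32 bit word with single-bit read and write."""

    def __init__(self):
        self.word = 0

    def write(self, index, bit):
        if not 0 <= index < 32:
            raise IndexError("boolean register index out of range")
        mask = 1 << index
        self.word = (self.word | mask) if bit else (self.word & ~mask)

    def read(self, index):
        return (self.word >> index) & 1

def compare(rc, dest, cond, a, b):
    """Compare a with b and store the boolean result in rc[dest]."""
    sign, zero = compare_flags(a, b)
    rc.write(dest, CONDITIONS[cond](sign, zero))
```

Because the destination is named by the instruction, several comparison results can be kept live at once in different rc registers, which a single status word with one flag per condition cannot do.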

The boolean operation functional unit 746 is capable of performing a wide range of boolean operations on two source values. In the case of comparison instructions, the source values are a pair of operands obtained from any of the integer and floating point register sets and any immediate operand provided to the IEU 104, and, for a boolean instruction, any two boolean register operands. Tables III and IV identify the logical comparison operations provided by the preferred embodiment of the architecture 100. Table V identifies the direct boolean operations provided by the preferred implementation of the architecture 100. The instruction condition codes and function codes specified in the Tables III-V represent a segment of the corresponding instructions. The instruction also provides an identification of the source pair of operand registers and the destination boolean register for storage of the corresponding boolean operation result.

*rs = register source

*bs = boolean source register

B) Load/Store Control Unit:

An exemplary load/store unit 760 is shown in Figure 13. Although separately shown in the data paths 580, 660, the load/store units 586, 662 are preferably implemented as a single shared load/store unit 760. The interface from a respective data path 580, 660 is via an address bus 762 and load and store data buses 764 (600, 676), 766 (632, 702).

The address utilized by the load/store unit 760 is a physical address as opposed to the virtual address utilized by the IFU 102 and the remainder of the IEU 104. While the IFU 102 operates on virtual addresses, relying on coordination between the CCU 106 and VMU 108 to produce a physical address, the IEU 104 requires the load/store unit 760 to operate directly in a physical address mode. This requirement is necessary to insure data integrity in the presence of out-of-order executed instructions that may involve overlapping physical address data load and store operations and in the presence of out-of-order data returns from the CCU 106 to the load/store unit 760. In order to insure data integrity, the load/store unit 760 buffers data provided by store instructions until the store instruction is retired by the IEU 104. Consequently, store data buffered by the load/store unit 760 may be uniquely present only in the load/store unit 760. Load instructions referencing the same physical address as executed but not retired store instructions are delayed until the store instruction is actually retired. At that point the store data may be transferred to the CCU 106 by the load/store unit 760 and then immediately loaded back by the execution of a CCU data load operation.

Specifically, full physical addresses are provided from the VMU 108 onto the load/store address bus 762. Load addresses are, in general, stored in load address registers 768_0-3. Store addresses are latched into store address registers 770_0-3. A load/store control unit 774 operates in response to control signals received from the instruction issuer unit 498 in order to coordinate latching of load and store addresses into the registers 768_0-3, 770_0-3. The load/store control unit 774 provides control signals on control lines 778 for latching load addresses and on control lines 780 for latching store addresses. Store data is latched simultaneously with the latching of store addresses in logically corresponding slots of the store data register set 782_0-3. A 4x4x32 bit wide address comparator unit 772 is simultaneously provided with each of the addresses in the load and store address registers 768_0-3, 770_0-3. The execution of a full matrix address comparison during each half cycle of the system clock is controlled by the load/store control unit 774 via control lines 776. The existence and logical location of a load address that matches a store address is provided via control signals returned to the load/store control unit 774 via control lines 776.

Where a load address is provided from the VMU 108 and there are no pending stores, the load address is bypassed directly from the bus 762 to an address selector 786 concurrent with the initiation of a CCU load operation. However, where store data is pending, the load address will be latched in an available load address latch 768_0-3. Upon receipt of a control signal from the retirement control unit 500, indicating that the corresponding store data instruction is retiring, the load/store control unit 774 initiates a CCU data transfer operation by arbitrating, via control lines 784, for access to the CCU 106. When the CCU 106 signals ready, the load/store control unit 774 directs the selector 786 to provide a CCU physical address onto the CCU PADDR address bus 788. This address is obtained from the corresponding store register 770_0-3 via the address bus 790. Data from the corresponding store data register 782_0-3 is provided onto the CCU data bus 792.
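The store buffering and load delay rules described above can be summarised in a small behavioural model. It is a sketch under simplifying assumptions, not the circuit of Figure 13: the class and method names are hypothetical, a plain dictionary stands in for the CCU, addresses are compared for exact equality, and unwritten locations read as zero.

```python
class LoadStoreUnit:
    SLOTS = 4  # four load address latches and four store address/data latches

    def __init__(self, memory):
        self.memory = memory                   # dict standing in for the CCU
        self.store_addr = [None] * self.SLOTS
        self.store_data = [None] * self.SLOTS
        self.load_addr = [None] * self.SLOTS

    def match_matrix(self):
        """4x4 comparison: entry [l][s] is True when pending load l hits store s."""
        return [[la is not None and la == sa for sa in self.store_addr]
                for la in self.load_addr]

    def issue_store(self, slot, addr, data):
        """Latch store address and data together; memory is not yet written."""
        self.store_addr[slot] = addr
        self.store_data[slot] = data

    def issue_load(self, slot, addr):
        """Return the loaded value, or None if the load must wait for a store."""
        if all(sa is None for sa in self.store_addr):
            return self.memory.get(addr, 0)    # no pending stores: bypass
        self.load_addr[slot] = addr
        if any(self.match_matrix()[slot]):
            return None                        # same address as an unretired store
        self.load_addr[slot] = None
        return self.memory.get(addr, 0)

    def retire_store(self, slot):
        """Commit a store, then release loads that no longer match any store.
        Returns {load_slot: value} for the released loads."""
        self.memory[self.store_addr[slot]] = self.store_data[slot]
        self.store_addr[slot] = self.store_data[slot] = None
        released = {}
        matrix = self.match_matrix()
        for l, la in enumerate(self.load_addr):
            if la is not None and not any(matrix[l]):
                released[l] = self.memory.get(la, 0)
                self.load_addr[l] = None
        return released
```

In this model a load to an address with a pending store never observes the buffered data directly; it waits, and then reads the value back from memory after the store retires, mirroring the "transfer to the CCU and immediately load back" behaviour described in the text.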

Upon issuance of a load instruction by the instruction issuer 498, the load/store control unit 774 enables one of the load address latches 768_0-3 to latch the requested load address. The specific latch 768_0-3 selected logically corresponds to the position of the load instruction in the relevant instruction set. The instruction issuer 498 provides the load/store control unit 774 with a five bit vector identifying the load instruction within either of the two possible pending instruction sets. Where the comparator 772 does not identify a matching store address, the load address is routed via an address bus 794 to the selector 786 for output onto the CCU PADDR address bus 788. Provision of the address is performed in concert with CCU request and ready control signals being exchanged between the load/store control unit 774 and CCU 106. An execution ID value (ExID) is also prepared and issued by the load/store control unit 774 to the CCU 106 in order to identify the load request when the CCU 106 subsequently returns the requested data including the ExID value. This ID value consists of a four bit vector utilizing unique bits to identify the respective load address latch 768_0-3 from which the current load request is generated. A fifth bit is utilized to identify the instruction set that contains the load instruction. The ID value is thus the same as the bit vector provided with the load request from the instruction issuer unit 498.
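The ExID format lends itself to a short worked example. In the sketch below the function names are hypothetical and the bit placement is an assumption: the text says only that four unique bits identify the load address latch and a fifth bit identifies the instruction set, so the one-hot latch field is placed in bits 0..3 and the set bit in bit 4.

```python
def make_exid(latch, instruction_set):
    """Build a five bit ExID: one-hot latch in bits 0..3, set select in bit 4."""
    if latch not in range(4) or instruction_set not in (0, 1):
        raise ValueError("latch must be 0..3 and instruction_set 0 or 1")
    return (instruction_set << 4) | (1 << latch)

def parse_exid(exid):
    """Recover (latch, instruction_set) from a returned ExID."""
    latch_bits = exid & 0xF
    if latch_bits not in (1, 2, 4, 8):
        raise ValueError("latch field must have exactly one bit set")
    return latch_bits.bit_length() - 1, (exid >> 4) & 1

exid = make_exid(latch=2, instruction_set=1)
print(bin(exid))          # 0b10100
print(parse_exid(exid))   # (2, 1)
```

Tagging each request this way is what allows data to return from the CCU out of order: the tag alone tells the load/store control unit which pending load the data belongs to.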

On subsequent signal from the CCU 106 to the load/store control unit 774 of the availability of prior requested load data, the load/store control unit 774 enables an alignment unit 798 to receive the data and provide it on the load data bus 764. The alignment unit 798 operates to right justify the load data.

Simultaneously with the return of data from the CCU 106, the load/store control unit 774 receives the ExID value from the CCU 106. The load/store control unit 774, in turn, provides a control signal to the instruction issuer unit 498 identifying that load data is being provided on the load data bus 764 and, further, returns a bit vector identifying the load instruction for which the load data is being returned.

C) IEU Control Path Detail:

Referring again to Figure 5, the operation of the IEU control path will now be described in detail with respect to the timing diagram provided in Figure 14. The timing of the execution of instructions represented in Figure 14 is exemplary of the operation of the present invention, and not exhaustive of execution timing permutations.

The timing diagram of Figure 14 shows a sequence of processor system clock cycles, P_M. Each processor cycle begins with an internal T Cycle, T₀. There are two T cycles per processor cycle in a preferred embodiment of the present invention as provided for by the architecture 100.

In processor cycle zero, the IFU 102 and the VMU 108 operate to generate a physical address. The physical address is provided to the CCU 106 and an instruction cache access operation is initiated. Where the requested instruction set is present in the instruction cache 132, an instruction set is returned to the IFU 102 at about the mid-point of processor cycle one. The IFU 102 then manages the transfer of the instruction set through the prefetch unit 260 and IFIFO 264, whereupon the instruction set is first presented to the IEU 104 for execution.

1) EDecode Unit Detail:

The EDecode unit 490 receives the full instruction set in parallel for decoding prior to the conclusion of processor cycle one. The EDecode unit 490, in the preferred architecture 100, is implemented as a pure combinatorial logic block that provides for the direct parallel decoding of all valid instructions that are received via the bus 124. Each type of instruction recognized by the architecture 100, including the specification of the instruction, register requirements and resource needs, is identified in Table VI.

TABLE VI

Instruction/Specifications

Instruction: Control and Operand Information*

Move Register to Register:
  Logical/Arithmetic Function Code: specifies Add, Subtract, Multiply, Shift, etc.
  Destination Register
  Set PSR only
  Source Register 1
  Source Register 2 or Immediate constant value
  Register Set A/B select

Move Immediate to Register:
  Destination Register
  Immediate Integer or Floating Point constant value
  Register Set A/B select

Load/Store Register:
  Operation Function Code: specifies Load or Store, use immediate value, base and immediate value, or base and offset
  Source/Destination Register
  Base Register
  Index Register or Immediate constant value
  Register Set A/B select

Immediate Call:
  Signed Immediate Displacement

Control Flow:
  Operation Function Code: specifies branch type and triggering condition
  Base Register
  Index Register, Immediate constant displacement value, or Trap Number
  Register Set A/B select

Special Register Move:
  Operation Function Code: specifies move to/from special/integer register
  Special Register Address Identifier
  Source/Destination Register
  Register Set A/B select

Convert Integer Move:
  Operation Function Code: specifies type of floating point to integer conversion
  Source/Destination Register
  Register Set A/B select

Boolean Functions:
  Boolean Function Code: specifies And, Or, etc.
  Destination boolean register
  Source Register 1
  Source Register 2
  Register Set A/B select

Extended Procedure:
  Procedure specifier: specifies address offset from procedural base value
  Operation: value passed to procedure routine

Atomic Procedure:
  Procedure specifier: specifies address value

* - instruction includes these fields in addition to a field that decodes to identify the instruction.

The EDecode unit 490 decodes each instruction of an instruction set in parallel. The resulting identification of instructions, instruction functions, register references and function requirements are made available on the outputs of the EDecode unit 490. This information is regenerated and latched by the EDecode unit 490 during each half processor cycle until all instructions in the instruction set are retired. Thus, information regarding all eight pending instructions is constantly maintained at the output of the EDecode unit 490. This information is presented in the form of eight element bit vectors where the bits or sub-fields of each vector logically correspond to the physical location of the corresponding instruction within the two pending instruction sets. Thus, eight vectors are provided via the control lines 502 to the carry checker 492, where each vector specifies whether the corresponding instruction affects or is dependent on the carry bit of the processor status word. Eight vectors are provided via the control lines 510 to identify the specific nature of each instruction and the function unit requirements. Eight vectors are provided via the control lines 506 specifying the register references used by each of the eight pending instructions. These vectors are provided prior to the end of processor cycle one.

2) Carry Checker Unit Detail:

The carry checker unit 492 operates in parallel with the dependency check unit 494 during the data dependency phase of operation shown in Figure 14. The carry checker unit 492 is implemented in the preferred architecture 100 as pure combinatorial logic. Thus, during each iteration of operation by the carry checker unit 492, all eight instructions are considered with respect to whether they modify the carry flag of the processor state register. This is necessary in order to allow the out-of-order execution of instructions that depend on the state of the carry bit as set by prior instructions. Control signals provided on the control lines 504 allow the carry checker unit 492 to identify the specific instructions that are dependent on the execution of prior instructions with respect to the carry flag.
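The carry dependency identification described above can be pictured with a short software sketch. This is only an illustration under stated assumptions: the architecture performs the check in combinatorial logic, and the function name `carry_dependencies` and the two boolean input vectors are hypothetical names introduced here, not signals named in the disclosure.

```python
def carry_dependencies(sets_carry, uses_carry):
    """Map each carry-reading instruction to the nearest earlier carry writer.

    sets_carry[i] / uses_carry[i] are hypothetical per-instruction flags for
    the pending instructions, listed in program order. A value of None means
    no pending instruction writes the carry before instruction i, so the
    carry would come from the processor status register.
    """
    deps = {}
    last_writer = None
    for i in range(len(sets_carry)):
        if uses_carry[i]:
            deps[i] = last_writer
        # An instruction may both read and write the carry; it reads the
        # value produced by the previous writer, then becomes the new writer.
        if sets_carry[i]:
            last_writer = i
    return deps


# Eight pending instructions: 0 and 3 write the carry; 1, 4 and 6 read it.
sets = [True, False, False, True, False, False, False, False]
uses = [False, True, False, False, True, False, True, False]
print(carry_dependencies(sets, uses))
```

In this example instruction 1 depends on instruction 0, while instructions 4 and 6 both depend on instruction 3.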

In addition, the carry checker unit 492 maintains a temporary copy of the carry bit for each of the eight pending instructions. For those instructions that do not modify the carry bit, the carry checker unit 492 propagates the carry bit to the next instruction forward in the order of the program instruction stream. Thus, an out-of-order executed instruction that modifies the carry bit can be executed and, further, a subsequent instruction that is dependent on such an out-of-order executed instruction may also be allowed to execute, though subsequent to the instruction that modifies the carry bit. Further, maintenance of the carry bit by the carry checker unit 492 facilitates out-of-order execution in that any exception occurring prior to the retirement of those instructions merely requires the carry checker unit 492 to clear the internal temporary carry bit register. Consequently, the processor status register is unaffected by the execution of out-of-order executed instructions. The temporary carry bit register maintained by the carry checker unit 492 is updated upon completion of each out-of-order executed instruction. Upon retirement of out-of-order executed instructions, the carry bit corresponding to the last retired instruction in the program instruction stream is transferred to the carry bit location of the processor status register.

3) Data Dependency Checker Unit Detail:

The data dependency checker unit 494 receives the eight register reference identification vectors from the EDecode unit 490 via the control lines 506. Each register reference is indicated by a five bit value, suitable for identifying any one of 32 registers at a time, and a two bit value that identifies the register bank as located within the "A", "B" or boolean register sets. The floating point register set is equivalently identified as the "B" register set. Each instruction may have up to three register reference fields: two source register fields and one destination. Although some instructions, most notably the move register to register instructions, may specify a destination register, an instruction bit field recognized by the EDecode unit 490 may signify that no actual output data is to be produced. Rather, execution of the instruction is only for the purpose of determining an alteration of the value of the processor status register. The data dependency checker 494, implemented again as pure combinatorial logic in the preferred architecture 100, operates to simultaneously determine dependencies between source register references of instructions subsequent in the program instruction stream and destination register references of relatively prior instructions. A bit array is produced by the data dependency checker 494 that identifies not only which instructions are dependent on others, but also the registers upon which each dependency arises.
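As a rough software analogue of the dependency array just described, the following sketch compares each source register reference against the destinations of earlier pending instructions. The names `Reg`, `Instr` and `dependency_matrix` are hypothetical, and recording only the nearest earlier writer of each source is an assumption of this sketch; the disclosure describes combinatorial logic and does not specify the array layout.

```python
from typing import NamedTuple, Optional, Tuple


class Reg(NamedTuple):
    bank: str  # "A", "B" or "bool" (the two bit bank select)
    num: int   # 0..31 (the five bit register number)


class Instr(NamedTuple):
    dest: Optional[Reg]     # None when no output data is produced
    srcs: Tuple[Reg, ...]   # up to two source register references


def dependency_matrix(pending):
    """dep[i][j] holds the register through which instruction i depends on
    the earlier instruction j, or None when there is no such dependency."""
    n = len(pending)
    dep = [[None] * n for _ in range(n)]
    for i, ins in enumerate(pending):
        for src in ins.srcs:
            # Walk backwards so that only the nearest earlier writer counts.
            for j in range(i - 1, -1, -1):
                if pending[j].dest == src:
                    dep[i][j] = src
                    break
    return dep


pending = [
    Instr(Reg("A", 1), (Reg("A", 2), Reg("A", 3))),
    Instr(Reg("A", 4), (Reg("A", 1), Reg("A", 5))),
    Instr(Reg("A", 1), (Reg("A", 4),)),
    Instr(Reg("B", 1), (Reg("A", 1), Reg("B", 2))),
]
dep = dependency_matrix(pending)
```

Here the last instruction depends on the third instruction through register A1, not on the first, because the third is the most recent prior writer of A1.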

The carry and register data dependencies are identified shortly after the beginning of the second processor cycle.

4) Register Rename Unit Detail:

The register rename unit 496 receives the identification of the register references of all eight pending instructions via the control lines 506, and register dependencies via the control lines 508. A matrix of eight elements is also received via the control lines 542 that identifies those instructions within the current set of pending instructions that have been executed (done). From this information, the register rename unit 496 provides an eight element array of control signals to the instruction issuer unit 498 via the control lines 512. The control information so provided reflects the determination made by the register rename unit 496 as to which of the currently pending instructions, that have not already been executed, are now available to be executed given the current set of identified data dependencies. The register rename unit 496 receives a selection control signal via the lines 516 that identifies up to six instructions that are to be simultaneously issued for execution: two integer, two floating point and two boolean. The register rename unit 496 performs the additional function of selecting, via control signals provided on the bus 518 to the register file array 472, the source registers for access in the execution of the identified instructions. Destination registers for out-of-order executed instructions are selected as being in the temporary buffers 612, 680, 728 of the corresponding data path. In-order executed instructions are retired on completion with result data being stored through to the register files 614, 684, 732. The selection of source registers depends on whether the register has been prior selected as a destination and the corresponding prior instruction has not yet been retired. In such an instance, the source register is selected from the corresponding temporary buffer 612, 680, 728. Where the prior instruction has been retired, then the register of the corresponding register file 614, 684, 732 is selected. Consequently, the register rename unit 496 operates to effectively substitute temporary buffer register references for register file register references in the case of out-of-order executed instructions.
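The source selection rule can be sketched in a few lines. This is an illustrative model only: `select_source`, the `dests` list and the `retired` flags are hypothetical names, and the temporary buffer slot is assumed here to be indexed by the position of the producing instruction among the pending instructions.

```python
def select_source(reg, consumer_index, dests, retired):
    """Choose where instruction `consumer_index` reads source register `reg`.

    dests[j] is the destination register of pending instruction j (or None);
    retired[j] tells whether instruction j has been retired. Returns
    ("temp", j) to read the temporary buffer slot of instruction j, or
    ("file", reg) to read the register file.
    """
    for j in range(consumer_index - 1, -1, -1):
        if dests[j] == reg:
            if not retired[j]:
                return ("temp", j)
            # The nearest prior writer has retired, so its result is
            # already in the register file.
            break
    return ("file", reg)


dests = ["r1", "r2", "r1", None]
retired = [True, False, False, False]
print(select_source("r1", 3, dests, retired))  # nearest writer is unretired
print(select_source("r1", 1, dests, retired))  # writer already retired
```

The first call selects the temporary buffer slot of instruction 2; the second falls through to the register file because instruction 0 has been retired.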

As implemented in the architecture 100, the temporary buffers 612, 680, 728 are not duplicate register structures of their corresponding register file arrays. Rather, a single destination register slot is provided for each of eight pending instructions. Consequently, the substitution of a temporary buffer destination register reference is determined by the location of the corresponding instruction within the pending register sets. A subsequent source register reference is identified by the data dependency checker 494 with respect to the instruction from which the source dependency occurs. Therefore, a destination slot in the temporary buffer register is readily determinable by the register rename unit 496.

5) Instruction Issuer Unit Detail:

The instruction issuer unit 498 determines the set of instructions that can be issued, based on the output of the register rename unit 496 and the function requirements of the instructions as identified by the EDecode unit 490. The instruction issuer unit 498 makes this determination based on the status of each of the functional units 478₀₋ₙ as reported via control lines 514. Thus, the instruction issuer unit 498 begins operation upon receipt of the available set of instructions to issue from the register rename unit 496. Given that a register file access is required for the execution of each instruction, the instruction issuer unit 498 anticipates the availability of a functional unit 478₀₋ₙ that may be currently executing an instruction. In order to minimize the delay in identifying the instructions to be issued to the register rename unit 496, the instruction issuer unit 498 is implemented in dedicated combinatorial logic. Upon identification of the instructions to issue, the register rename unit 496 initiates a register file access that continues to the end of the third processor cycle, P₂. At the beginning of processor cycle P₃, the instruction issuer unit 498 initiates operation by one or more of the functional units 478₀₋ₙ, such as shown as "Execute 0", to receive and process source data provided from the register file array 472.

Typically, most instructions processed by the architecture 100 are executed through a functional unit in a single processor cycle. However, some instructions require multiple processor cycles to complete, such as shown as "Execute 1", a simultaneously issued instruction. The Execute 0 and Execute 1 instructions may, for example, be executed by an ALU and floating point multiplier functional units respectively. The ALU functional unit, as shown in Figure 14, produces output data within one processor cycle that, by simple provision of output latching, is available for use in executing another instruction during the fifth processor cycle, P₄. The floating point multiply functional unit is preferably an internally pipelined functional unit. Therefore, another floating point multiply instruction can be issued in the next processor cycle. However, the result of the first instruction will not be available for a data dependent number of processor cycles; the instruction shown in Figure 14 requires three processor cycles to complete processing through the functional unit. During each processor cycle, the function of the instruction issuer unit 498 is repeated. Consequently, the status of the current set of pending instructions as well as the availability state of the full set of functional units 478₀₋ₙ are reevaluated during each processor cycle. Under optimum conditions, the preferred architecture 100 is therefore capable of executing up to six instructions per processor cycle. However, a typical instruction mix will result in an overall average execution of 1.5 to 2.0 instructions per processor cycle.

A final consideration in the function of the instruction issuer 498 is its participation in the handling of trap conditions and the execution of specific instructions. The occurrence of a trap condition requires that the IEU 104 be cleared of all instructions that have not yet been retired. Such a circumstance may arise in response to an externally received interrupt that is relayed to the IEU 104 via the interrupt request/acknowledge control line 340, from any of the functional units 478₀₋ₙ in response to an arithmetic fault, or, for example, the EDecode unit 490 upon the decoding of an illegal instruction. On the occurrence of the trap condition, the instruction issuer unit 498 is responsible for halting or voiding all unretired instructions currently pending in the IEU 104. All instructions that cannot be retired simultaneously will be voided. This result is essential to maintain the preciseness of the occurrence of the interrupt with respect to the conventional in-order execution of a program instruction stream. Once the IEU 104 is ready to begin execution of the trap handling program routine, the instruction issuer 498 acknowledges the interrupt via a return control signal along the control lines 340. Also, in order to avoid the possibility that an exception condition relative to one instruction may be recognized based on a processor state bit which would have changed before that instruction would have executed in a classical pure in-order routine, the instruction issuer 498 is responsible for ensuring that all instructions which can alter the PSR (such as special move and return from trap) are executed strictly in-order.

Certain instructions that alter program control flow are not identified by the IDecode unit 262. Instructions of this type include subroutine returns, returns from procedural instructions, and returns from traps. The instruction issuer unit 498 provides identifying control signals via the IEU return control lines 350 to the IFU 102. A corresponding one of the special registers 412 is selected to provide the IF_PC execution address that existed at the point in time of the call instruction, occurrence of the trap or encountering of a procedural instruction.

6) Done Control Unit Detail:

The done control unit 540 monitors the functional units 478₀₋ₙ for the completion status of their current operations. In the preferred architecture 100, the done control unit 540 anticipates the completion of operations by each functional unit sufficient to provide a completion vector, reflecting the status of the execution of each instruction in the currently pending set of instructions, to the register rename unit 496, bypass control unit 520 and retirement control unit 500 approximately one half processor cycle prior to the execution completion of an instruction by a functional unit 478₀₋ₙ. This allows the instruction issuer unit 498, via the register rename unit 496, to consider the instruction completing functional units as available resources for the next instruction issuing cycle. The bypass control unit 520 is allowed to prepare to bypass data output by the functional unit through the bypass unit 474. Finally, the retirement control unit 500 may operate to retire the corresponding instruction simultaneous with the transfer of data from the functional unit 478₀₋ₙ to the register file array 472.

7) Retirement Control Unit Detail:

In addition to the instruction done vector provided from the done control unit 540, the retirement control unit 500 monitors the oldest instruction set output from the EDecode unit 490. As each instruction in instruction stream order is marked done by the done control unit 540, the retirement control unit 500 directs, via control signals provided on control lines 534, the transfer of data from the temporary buffer slot to the corresponding instruction specified register file register location within the register file array 472. The PC Inc/Size control signals are provided on the control lines 344 for each of the one or more instructions simultaneously retired. Up to four instructions may be retired per processor cycle. Whenever an entire instruction set has been retired, an IFIFO read control signal is provided on the control line 342 to advance the IFIFO 264.
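The in-order retirement rule can be modeled with a brief sketch. It is a simplification under stated assumptions: `retire_cycle` and its `done` and `retired` lists are hypothetical names, the model scans all pending instructions rather than only the oldest instruction set, and it omits the PC Inc/Size and IFIFO read signals.

```python
MAX_RETIRE_PER_CYCLE = 4  # "Up to four instructions may be retired per processor cycle."


def retire_cycle(done, retired):
    """Retire, in instruction stream order, the leading run of done
    instructions, up to the per-cycle limit. Mutates `retired` and returns
    the indices retired in this cycle."""
    retired_now = []
    for i in range(len(done)):
        if retired[i]:
            continue
        # Stop at the first instruction that is not done: nothing younger
        # may retire ahead of it.
        if not done[i] or len(retired_now) == MAX_RETIRE_PER_CYCLE:
            break
        retired[i] = True
        retired_now.append(i)
    return retired_now


done = [True, True, False, True, True, True, True, True]
retired = [False] * 8
first = retire_cycle(done, retired)   # instruction 2 blocks younger ones
done[2] = True
second = retire_cycle(done, retired)  # limited to four per cycle
third = retire_cycle(done, retired)
```

Instructions 3 through 7 complete out of order in this example, but none of them retires until instruction 2 is done.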

8) Control Flow Control Unit Detail:

The control flow control unit 528 operates to continuously provide the IFU 102 with information specifying whether any control flow instructions within the current set of pending instructions have been resolved and, further, whether the branch result is taken or not taken. The control flow control unit 528 obtains, via control lines 510, an identification of the control flow branch instructions by the EDecode 490. The current set of register dependencies is provided via control lines 536 from the data dependency checker unit 494 to the control flow control unit 528 to allow the control flow control unit 528 to determine whether the outcome of a branch instruction is constrained by dependencies or is now known. The register references provided via bus 518 from the register rename unit 496 are monitored by the control flow control unit 528 to identify the boolean register that will define the branch decision. Thus, the branch decision may be determined even prior to the out-of-order execution of the control flow instruction.

Simultaneous with the execution of a control flow instruction, the bypass unit 474 is directed by the bypass control unit 520 to provide the control flow results onto control lines 530, consisting of the control flow zero and control flow one control lines 750, 752, to the control flow control unit 528. Finally, the control flow control unit 528 continuously provides two vectors of eight bits each to the IFU 102 via control lines 348. These vectors define whether a branch instruction at the logical location corresponding to the bits within the vectors has been resolved and whether the branch result is taken or not taken.

In the preferred architecture 100, the control flow control unit 528 is implemented as pure combinatorial logic operating continuously in response to the input control signals to the control unit 528.

9) Bypass Control Unit Detail:

The instruction issuer unit 498 operates closely in conjunction with the bypass control unit 520 to control the routing of data between the register file array 472 and the functional units 478₀₋ₙ. The bypass control unit 520 operates in conjunction with the register file access, output and store phases of operation shown in Figure 14. During a register file access, the bypass control unit 520 may recognize, via control lines 522, an access of a destination register within the register file array 472 that is in the process of being written during the output phase of execution of an instruction. In this case, the bypass control unit 520 directs the selection of data provided on the functional unit output bus 482 to be bypassed back to the functional unit distribution bus 480. Control over the bypass control unit 520 is provided by the instruction issuer unit 498 via control lines 542.

IV. Virtual Memory Control Unit:

An interface definition for the VMU 108 is provided in Figure 15. The VMU 108 consists principally of a VMU control logic unit 800 and a content addressable memory (CAM) 802. The general function of the VMU 108 is shown graphically in Figure 16. There, a representation of a virtual address is shown partitioned into a space identifier (sID[31:28]), a virtual page number (VADDR[27:14]), page offset (PADDR[13:4]), and a request ID (rID[3:0]). The algorithm for generating a physical address is to use the space ID to select one of 16 registers within a space table 842. The contents of the selected space register in combination with a virtual page number is used as an address for accessing a table look aside buffer (TLB) 844. The 34 bit address operates as a content address tag used to identify a corresponding buffer register within the buffer 844. On the occurrence of a tag match, an 18 bit wide register value is provided as the high order 18 bits of a physical address 846. The page offset and request ID are provided as the low order 14 bits of the physical address 846.
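The field partition and the final address assembly can be made concrete with a small sketch. The bit positions are those given above; the function names `split_virtual_address` and `physical_address` are hypothetical, and the 18 bit value returned on a tag match is simply supplied as an argument here rather than looked up in a modeled space table and buffer.

```python
def split_virtual_address(va):
    """Partition a 32 bit virtual address into the four fields of Figure 16."""
    return {
        "sID": (va >> 28) & 0xF,       # bits 31:28, selects 1 of 16 space registers
        "vpn": (va >> 14) & 0x3FFF,    # bits 27:14, virtual page number
        "offset": (va >> 4) & 0x3FF,   # bits 13:4, page offset
        "rID": va & 0xF,               # bits 3:0, request ID
    }


def physical_address(tlb_value, va):
    """Place the 18 bit value from a tag match above the low order 14 bits
    (page offset and request ID) of the original address."""
    return ((tlb_value & 0x3FFFF) << 14) | (va & 0x3FFF)


va = 0xA1234567
fields = split_virtual_address(va)
pa = physical_address(0x2ABCD, va)
print({k: hex(v) for k, v in fields.items()}, hex(pa))
```

For the sample address, the space ID is 0xA and the low order 14 bits (0x0567) pass through to the physical address unchanged.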

Where there is a tag miss in the table look aside buffer 844, a VMU miss is signalled. This requires the execution of a VMU fast trap handling routine that implements a conventional hash algorithm 848 that accesses a complete page table data structure maintained in the MAU 112. This page table 850 contains entries for all memory pages currently in use by the architecture 100. The hash algorithm 848 identifies those entries in the page table 850 necessary to satisfy the current virtual page translation operation. Those page table entries are loaded from the MAU 112 to the trap registers of register set "A" and then transferred by special register move instructions to the table look aside buffer 844. Upon return from the exception handling routine, the instruction giving rise to the VMU miss exception is re-executed by the IEU 104. The virtual to physical address translation operation should then complete without exception.

The VMU control logic 800 provides a dual interface to both the IFU 102 and IEU 104. A ready signal is provided on control lines 822 to the IEU 104 to signify that the VMU 108 is available for an address translation. In the preferred embodiment, the VMU 108 is always ready to accept IFU 102 translation requests. Both the IFU and IEU 102, 104 may pose requests via control lines 328, 804. In the preferred architecture 100, the IFU 102 has priority access to the VMU 108. Consequently, only a single busy control line 820 is provided to the IEU 104.

Both the IFU and IEU 102, 104 provide the space ID and virtual page number fields to the VMU control logic 800 via control lines 326, 808, respectively. In addition, the IEU 104 provides a read/write control signal via control line 806 to define whether the address is to be used for a load or store operation as necessary to modify memory access protection attributes of the virtual memory referenced. The space ID and virtual page fields of the virtual address are passed to the CAM unit 802 to perform the actual translation operation. The page offset and ExID fields are eventually provided by the IEU 104 directly to the CCU 106. The physical page and request ID fields are provided on the address lines 836 to the CAM unit 802. The occurrence of a table look aside buffer match is signalled via the hit line and control output lines 830 to the VMU control logic unit 800. The resulting physical address, 18 bits in length, is provided on the address output lines 824.

The VMU control logic unit 800 generates the virtual memory miss and virtual memory exception control signals on lines 334, 332 in response to the hit and control output control signals on lines 830. A virtual memory translation miss is defined as failure to match a page table identifier in the table look aside buffer 844. All other translation errors are reported as virtual memory exceptions.

Finally, the data tables within the CAM unit 802 may be modified through the execution of special register to register move instructions by the IEU 104. Read/write, register select, reset, load and clear control signals are provided by the IEU 104 via control lines 810, 812, 814, 816, 818. Data to be written to the CAM unit registers is received by the VMU control logic unit 800 via the address bus 808 coupled to the special address data bus 354 from the IEU 104. This data is transferred via bus 836 to the CAM unit 802 simultaneous with control signals 828 that control the initialization, register selection, and read or write control signal. Consequently, the data registers within the CAM unit 802 may be readily written as required during the dynamic operation of the architecture 100 including read out for storage as required for the handling of context switches defined by a higher level operating system.

V. Cache Control Unit:

The control and data interface for the CCU 106 is shown in Figure 17. Again, separate interfaces are provided for the IFU 102 and IEU 104. Further, logically separate interfaces are provided by the CCU 106 to the MCU 110 with respect to instruction and data transfers.

The IFU interface consists of the physical page address provided on address lines 324, the VMU converted page address as provided on the address lines 824, and request IDs as transferred separately on control lines 294, 296. A unidirectional data transfer bus 114 is provided to transfer an entire instruction set in parallel to the IFU 102. Finally, the read/busy and ready control signals are provided to the CCU 106 via control lines 298, 300, 302.

Similarly, a complete physical address is provided by the IEU 104 via the physical address bus 788. The request ExIDs are separately provided from and to the load/store unit of the IEU 104 via control lines 796. An 80 bit wide bidirectional data bus is provided by the CCU 106 to the IEU 104. However, in the present preferred implementation of the architecture 100, only the lower 64 bits are utilized by the IEU 104. The availability and support within the CCU 106 of a full 80 bit data transfer bus is provided to support subsequent implementations of the architecture 100 that support, through modifications of the floating point data path 660, floating point operation in accordance with IEEE standard 754.

The IEU control interface, established via request, busy, ready, read/write and width control signals 784, is substantially the same as the corresponding control signals utilized by the IFU 102. The exception is the provision of a read/write control signal to differentiate between load and store operations. The width control signals specify the number of bytes being transferred during each CCU 106 access by the IEU 104; in contrast, every access of the instruction cache 132 is a fixed 128 bit wide data fetch operation.

The CCU 106 implements a substantially conventional cache controller function with respect to the separate instruction and data caches 132, 134. In the preferred architecture 100, the instruction cache 132 is a high speed memory providing for the storage of 256 128 bit wide instruction sets. The data cache 134 provides for the storage of 1024 32 bit wide words of data. Instruction and data requests that cannot be immediately satisfied from the contents of the instruction and data caches 132, 134 are passed on to the MCU 110. For instruction cache misses, the 28 bit wide physical address is provided to the MCU 110 via the address bus 860. The request ID and additional control signals for coordinating the operation of the CCU 106 and MCU 110 are provided on control lines 862. Once the MCU 110 has coordinated the necessary read access of the MAU 112, two consecutive 64 bit wide data transfers are performed directly from the MAU 112 through to the instruction cache 132. Two transfers are required given that the data bus 136 is, in the preferred architecture 100, a 64 bit wide bus. As the requested data is returned through the MCU 110, the request ID maintained during the pendency of the request operation is also returned to the CCU 106 via the control lines 862.
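The sizes quoted above imply a few figures that are easy to check arithmetically. The constant names below are hypothetical, and reading the 28 bit instruction miss address as a 32 bit physical address with the four byte-select bits of a 16 byte instruction set dropped is an inference of this sketch rather than a statement in the text.

```python
INSTR_SETS = 256        # instruction cache entries
INSTR_SET_BITS = 128    # width of one instruction set
DATA_WORDS = 1024       # data cache entries
DATA_WORD_BITS = 32     # width of one data word
BUS_BITS = 64           # width of the data bus 136
PHYS_ADDR_BITS = 32     # full physical address width

icache_bytes = INSTR_SETS * INSTR_SET_BITS // 8
dcache_bytes = DATA_WORDS * DATA_WORD_BITS // 8
transfers_per_fill = INSTR_SET_BITS // BUS_BITS

# Assumption: an instruction set is 16 bytes, so its byte-select bits
# (log2(16) = 4) need not be sent with an instruction miss address.
instr_set_bytes = INSTR_SET_BITS // 8
miss_addr_bits = PHYS_ADDR_BITS - (instr_set_bytes.bit_length() - 1)

print(icache_bytes, dcache_bytes, transfers_per_fill, miss_addr_bits)
```

Under these assumptions both caches hold 4096 bytes, an instruction cache fill takes two bus transfers, and the miss address comes out at 28 bits, which agrees with the widths stated above.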

Data transfer operations between the data cache 134 and MCU 110 are substantially the same as instruction cache operations. Since data load and store operations may reference a single byte, a full 32 bit wide physical address is provided to the MCU 110 via the address bus 864. Interface control signals and the request ExID are transferred via control lines 866. Bidirectional 64 bit wide data transfers are provided via the data cache bus 138.

VI. Summary/Conclusion:

Thus, a high-performance RISC based microprocessor architecture has been disclosed. The architecture efficiently implements out-of-order execution of instructions, separate main and target instruction stream prefetch instruction transfer paths, and a procedural instruction recognition and dedicated prefetch path. The optimized instruction execution unit provides multiple optimized data processing paths supporting integer, floating point and boolean operations and incorporates respective temporary register files facilitating out-of-order execution and instruction cancellation while maintaining a readily established precise state-of-the-machine status.

It is therefore to be understood that while the foregoing disclosure describes the preferred embodiment of the present invention, other variations and modifications may be readily made by those of average skill within the scope of the present invention.

Claims

1. A method for handling a trap in a microprocessor for which instructions have a minimum length of ℓ address locations, comprising the steps of: determining the entry point for a trap handling routine as address location nm + b, where b is a base address location, n is a trap number corresponding to said trap, and m is a multiplier greater than or equal to 2ℓ; and transferring control to an instruction at said entry point.

2. A method according to claim 1, wherein all instructions for said microprocessor have the same length ℓ.

3. A method according to claim 1, wherein m is a power of 2 and b is an integer multiple of m, and wherein the step of determining comprises the step of concatenating n as low-order bits to all the bits of b having an order at least as high as log₂ m.

4. A method according to claim 3, wherein ℓ is a power of 2, and wherein the step of determining further comprises the step of concatenating log₂ ℓ zero bits as low-order bits to said concatenation of n and said bits of b having an order higher than log₂ m.

5. A method according to claim 1, wherein said entry point is provided as a logical address, further comprising the step of determining a physical address from said logical address.

6. A method for handling traps in a microprocessor for which instructions have a minimum length of ℓ address locations, each trap generating a trap number n, comprising the steps of: determining a first vector address location as nm₁ + b₁ for traps of a first type, where b₁ is a first base address location and m₁ is a first multiplier; determining a second vector address location as nm₂ + b₂ for traps of a second type, where b₂ is a second base address location different from said first base address location and m₂ is a second multiplier greater than m₁, m₂ ≥ 2ℓ; and transferring control to an instruction at said second vector address location for traps of said second type.

7. A method according to claim 6, wherein m₁ = ℓ, further comprising the steps of: storing a branch instruction at a plurality of the vector address locations km₁ + b₁, k = 1, 2, 3, ..., prior to the step of determining a first vector address location; and transferring control to the branch instruction at said first vector address location for traps of said first type.

8. A method for handling traps in a microprocessor, comprising the steps of: prefetching instructions in an instruction stream for subsequent execution by said microprocessor; executing prefetched instructions during an execution time; detecting, prior to a given execution time being the execution time for a given one of said prefetched instructions, whether said given one of said instructions has had any of a first class of synchronous exceptions; and invoking an exception handler during said given execution time if said given one of said instructions has had a synchronous exception.

9. A method according to claim 8, wherein said step of prefetching comprises the step of prefetching a plurality of instructions at a time.

10. A method according to claim 8, wherein said first class of exceptions are faults occurring during said step of prefetching.

11. A method according to claim 8, further comprising the step of detecting, during said given execution time, whether said given one of said instructions has had any one of an execution class of synchronous exceptions.

12. A method according to claim 8, wherein said microprocessor is capable of executing instructions out-of-order relative to their order in said stream.

13. A method according to claim 8, wherein said microprocessor is capable of executing a plurality of instructions during each execution time.

14. A method according to claim 8, wherein said microprocessor is capable of executing a plurality of instructions from a sequence of instructions during a single execution time, further comprising the steps of: scheduling a given plurality of instructions for execution during said given execution time; and determining said given one of said instructions as the sequentially first instruction in said given plurality which has had a synchronous exception.

15. A method according to claim 14, further comprising the step of detecting, during said given execution time and prior to said step of determining, whether each instruction in said given plurality has had any of an execution class of exceptions.

16. A method according to claim 15, wherein said execution class of exceptions includes a second exception type which is dependent upon the state of at least one processor status bit during said given execution time, wherein said microprocessor is capable of executing instructions out-of-order relative to their order in said sequence of instructions, further comprising the step of scheduling all instructions which can modify said processor status bit to execute in the same sequence as in said sequence of instructions.

17. A method according to claim 14, wherein said step of executing comprises the steps of: tentatively executing a plurality of instructions scheduled for execution and storing any results of said tentative execution in temporary registers; and copying results from said temporary registers into permanent registers upon retirement of an instruction, further comprising the steps of: retiring all instructions in said given plurality which were sequentially prior to said given instruction; and cancelling all instructions in said given plurality which were sequentially subsequent to said given instruction.

18. A method according to claim 17, further comprising the step of cancelling said given instruction.

19. A method according to claim 8, wherein all instructions for said microprocessor have the same length ℓ, and wherein said step of invoking comprises the steps of: determining the entry point for an exception handling routine as address location nm + b, where b is a base address location, n is an exception trap number corresponding to said synchronous exception, and m is a multiplier greater than or equal to 2ℓ; and transferring control to an instruction at said entry point.
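The entry-point arithmetic of claim 19 can be written out as a short sketch. The function and parameter names are hypothetical, and the numeric values in the usage note are illustrative only. The check on the multiplier encodes the requirement that each table entry be at least two instruction lengths wide.

```python
def trap_entry_point(base, trap_number, instr_len, multiplier):
    """Return n*m + b for trap number n, multiplier m and base address b.

    `instr_len` is the common instruction length; the multiplier must be
    at least twice that length so each entry holds two or more
    instructions. All names here are illustrative assumptions.
    """
    if multiplier < 2 * instr_len:
        raise ValueError("multiplier must be at least twice the instruction length")
    return trap_number * multiplier + base
```

For example, with a base address of 0x1000, 4-location instructions and a multiplier of 32, trap number 3 would enter its handler at 3 × 32 + 0x1000 = 0x1060, leaving room for eight instructions in that table entry.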

20. A method for handling exceptions in a microprocessor capable of executing a plurality of instructions in a single execution time, comprising the steps of: tentatively executing instructions during execution times determined according to an execution sequence; and upon completion of all instructions tentatively executed during a given execution time, if any of said tentatively executed instructions has had a synchronous exception, (a) retiring all tentatively executed instructions which occur in said stream prior to the first instruction which has had a synchronous exception, (b) cancelling any instructions which occur in said stream subsequent to said first instruction which has had a synchronous exception, and (c) invoking an exception handler.

21. A method according to claim 20, wherein said microprocessor further is capable of executing instructions out-of-order from their sequence in said instruction stream.

22. A method for use in a microprocessor, comprising the steps of: executing instructions from a main instruction stream; in response to a procedural instruction in said main instruction stream, executing instructions from an emulation instruction stream while maintaining an indication of a first return address to said main instruction stream; in response to a synchronous exception occurring relative to an instruction in said main instruction stream, executing instructions from a second handler instruction stream, while maintaining an indication of a second return address to said main instruction stream; and in response to a synchronous exception occurring relative to an instruction in said emulation instruction stream, executing instructions from a third handler instruction stream, while maintaining both an indication of a third return address to said emulation instruction stream and said indication of said first return address to said main instruction stream.

23. A method according to claim 22, further comprising the steps of: resuming execution of instructions from said main instruction stream beginning at said second return address, in response to a return from trap instruction in said second handler instruction stream; and resuming execution of instructions from said emulation instruction stream beginning at said third return address, in response to a return from trap instruction in said third handler instruction stream.

24. A method according to claim 22, further comprising the step of resuming execution of instructions from said main instruction stream beginning at said first return address in response to a return from procedure instruction in said emulation instruction stream.
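The bookkeeping in claims 22 to 24 amounts to keeping two independent return indications, so that a trap taken inside the emulation stream does not overwrite the return address to the main stream. The sketch below is a toy model under stated assumptions: the class and attribute names are hypothetical, the procedural return address is assumed to be the location after the procedural instruction, and the trap return address is assumed to be the address of the trapping instruction. The claims do not fix either choice.

```python
class ControlState:
    """Toy model of stream switching with two separate return indications."""

    def __init__(self, pc=0):
        self.pc = pc
        self.stream = "main"
        self.emul_return = None   # first return address (to main stream)
        self.trap_return = None   # (stream, address) to resume after a trap

    def procedural(self, target):
        # Enter the emulation stream, remembering where main resumes.
        # Assumption: resume at the next location after the instruction.
        self.emul_return = self.pc + 1
        self.stream, self.pc = "emulation", target

    def trap(self, handler_addr):
        # Enter a handler stream; emul_return is left untouched, so a trap
        # in the emulation stream keeps both return indications alive.
        self.trap_return = (self.stream, self.pc)
        self.stream, self.pc = "handler", handler_addr

    def return_from_trap(self):
        # Resume whichever stream was interrupted by the trap.
        self.stream, self.pc = self.trap_return
        self.trap_return = None

    def return_from_procedure(self):
        # Leave the emulation stream and resume the main stream.
        self.stream, self.pc = "main", self.emul_return
        self.emul_return = None
```

In this model a trap raised while emulating returns to the emulation stream first, and only the later return from procedure brings control back to the main stream.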

25. Apparatus for handling a trap in a microprocessor for which instructions have a minimum length of 2ⁱ locations, comprising: a first set of conductors for carrying high-order bits of a base address; first means for providing said high-order bits of said base address on said first set of conductors; a second set of conductors for carrying a trap number; second means for providing said trap number on said second set of conductors; and means for generating a first entry point address from a concatenation of said first and second sets of conductors followed by, as lower-order bits, j₁ conductors each carrying a respective fixed logic level, j₁ ≥ i + 1.

26. Apparatus according to claim 25, wherein all instructions for said microprocessor have the same length 2ⁱ.

27. Apparatus according to claim 25, wherein said first entry point address is provided as a logical address, and wherein said means for generating further includes means for converting said logical address to a physical address.

28. Apparatus according to claim 25, wherein said trap may be of a first type or a second type, wherein said first means for providing comprises: first and second trap base address sources; and means for placing information from said first trap base source on said first set of conductors if said trap is of said first type, and for placing information from said second trap base address source on said first set of conductors if said trap is of said second type, and wherein said means for generating generates said first entry point address if said trap is of said first type, and is further for, if said trap is of said second type, generating a second entry point address from a concatenation of said first and second sets of conductors followed by, as lowest-order bits, exactly j₂ conductors each carrying a respective fixed logic level,

29. Apparatus according to claim 28, wherein all instructions for said microprocessor have the same length 2ⁱ, and wherein j₂ = i.
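The address formation in claims 25 to 29 is a pure concatenation of bit fields, which the following sketch models with shifts and ORs. It assumes, for illustration only, that every fixed logic level is zero. The function names and bit widths are hypothetical and are not values from the specification.

```python
def entry_point(base_high_bits, trap_number, trap_bits, fixed_low_bits):
    """Concatenate base high-order bits, trap number, and fixed low bits.

    Assumption for this sketch: all fixed low-order conductors carry 0.
    `trap_bits` is the width of the trap-number field.
    """
    if not 0 <= trap_number < (1 << trap_bits):
        raise ValueError("trap number does not fit in the trap-number field")
    return ((base_high_bits << trap_bits) | trap_number) << fixed_low_bits


def instructions_per_entry(fixed_low_bits, i):
    """Number of 2**i-location instructions that fit in one table entry."""
    return 1 << (fixed_low_bits - i)
```

With i = 2 (instructions of four locations), five fixed low-order bits give each entry room for eight instructions, whereas a table built with exactly i fixed bits, as in claim 29, holds a single instruction per entry, typically a branch to the handler.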

30. Apparatus for handling exceptions in a microprocessor, for use with a source of instructions and an exception handler, comprising: execution means for executing instructions provided thereto; and prefetch means for prefetching instructions from said source of instructions and providing them to said execution means in a particular sequence, wherein said prefetch means includes indicator means for indicating, in correspondence with said instructions provided to said execution means, whether a synchronous prefetch exception has occurred relative to said given instruction, and wherein said execution means includes invoking means, responsive to said indicator means, for invoking an exception handler if a synchronous exception has occurred relative to an instruction provided to said execution means by said prefetch means.

31. Apparatus according to claim 30, wherein said execution means further includes detection means for detecting a synchronous execution exception occurring relative to an instruction being executed by said execution means, said invoking means further being responsive to said detection means.
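One way to picture the indicator of claim 30 is as a flag that travels with each prefetched instruction, so that a prefetch fault is acted on only when the flagged instruction reaches the execution stage. The sketch below is an illustrative software model, not the claimed hardware. The dictionary used as the instruction source, the rule that a missing address counts as a prefetch fault, and both function names are assumptions.

```python
def prefetch(addresses, memory):
    """Yield (address, instruction, prefetch_fault) in program order.

    Assumption: an address absent from `memory` models a synchronous
    prefetch exception (for example, an access fault).
    """
    for addr in addresses:
        if addr in memory:
            yield addr, memory[addr], False
        else:
            yield addr, None, True


def execute(stream, handler):
    """Execute tagged instructions; invoke `handler` at the first fault."""
    executed = []
    for addr, instr, fault in stream:
        if fault:
            handler(addr)  # exception is taken when the tag arrives here
            break
        executed.append(instr)
    return executed
```

Carrying the flag alongside the instruction keeps the exception synchronous with the instruction stream: earlier instructions complete normally before the handler is invoked.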

32. Microprocessor apparatus, for use with a source of instructions and an exception handler, comprising: execution means for executing instructions provided to it; a program counter and first and second storage registers; means for updating said program counter in response to each instruction executed by said execution means; means for, in response to a procedural instruction, storing said program counter in said first storage register and providing instructions to said execution means from an emulation stream beginning at an address responsive to said procedural instruction; and means for, in response to the occurrence of a synchronous exception relative to an instruction provided to said execution means, storing said program counter in said second storage register and providing instructions to said execution means from a handler stream beginning at an address responsive to said synchronous exception.

33. Apparatus according to claim 32, further comprising means for providing further instructions to said execution means beginning at an address responsive to the contents of said first storage register, in response to a procedural return instruction in said emulation stream.

34. Apparatus according to claim 32, further comprising means for providing further instructions to said execution means beginning at an address responsive to the contents of said second storage register, in response to a trap return instruction in said handler stream.

35. Microprocessor apparatus for use with a register select signal, comprising: a functional unit having at least one data input and at least one data output; a processor state bit and means for setting said state bit to a first value to indicate an interrupted state in response to the occurrence of a trap; a set of first registers each having a data output; at least one second register each having a data output and each corresponding to a respective one of said first registers; and means for coupling to said data input of said functional unit: the data output of a selected one of said first registers selected in response to said register select signal if said state bit is not in said first value, or if none of said second registers corresponds to said selected first register; and if one of said second registers corresponds to said selected first register and said state bit is in said first value, the data output of said second register corresponding to said selected first register.
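The operand selection recited in claim 35 reduces to a two-way choice per register read, sketched below. The function name, the use of dictionaries for the two register sets, and the register labels in the usage note are hypothetical and serve only to illustrate the selection rule.

```python
def read_operand(select, first_regs, second_regs, interrupted):
    """Return the value coupled to the functional unit's data input.

    `second_regs` maps only those register names that have a
    corresponding second (shadow) register. `interrupted` models the
    processor state bit being in its first value.
    """
    if interrupted and select in second_regs:
        return second_regs[select]  # shadow register substitutes
    return first_regs[select]       # normal case, or no shadow exists
```

For instance, if only a register labelled "r1" has a second register, a read of "r1" in the interrupted state returns the second register's value, while a read of "r2" returns the first register's value in either state. A trap handler therefore gets scratch registers without first saving the interrupted program's copies.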