US20050268074A1 - Method and apparatus for determining the criticality of a micro-operation - Google Patents
- Publication number
- US20050268074A1 (application US 11/200,677)
- Authority
- US
- United States
- Prior art keywords
- μop
- line
- door
- critical
- stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3854—Instruction completion, e.g. retiring, committing or graduating
- G06F9/3856—Reordering of instructions, e.g. using queues or age tags
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
Definitions
- FIG. 1 shows a schematic diagram of an exemplary embodiment of a computer system including a processor having a whacking element.
- FIG. 2 shows a schematic diagram of the processor illustrated in FIG. 1 , the processor including the whacking element.
- FIG. 3 shows a partial schematic diagram of the processor illustrated in FIGS. 1 and 2 .
- FIG. 4 is a flow chart illustrating an exemplary embodiment of operation of the processor of FIGS. 2 and 3 .
- FIG. 5 is a flow chart illustrating an exemplary embodiment of a method of whacking μOPs, as may be performed by the whacking element of FIG. 2 .
- FIG. 6 is a flow chart illustrating an embodiment of a method of determining the criticality of a μOP, as may be performed by the whacking element.
- FIG. 7 is a flow chart illustrating another embodiment of the method of determining the criticality of a μOP.
- FIG. 8 is a flow chart illustrating a further embodiment of the method of determining the criticality of a μOP.
- the computer system 100 includes a bus 110 having a processor 200 coupled therewith.
- a main memory 120 is coupled—via bus 110 —with the processor 200 , the main memory 120 comprising, for example, random access memory (RAM) or other suitable memory.
- the computer system 100 may further include a read-only memory (ROM) 130 coupled with the bus 110 .
- the processor 200 may also have a data storage device 140 coupled therewith by bus 110 .
- the data storage device 140 comprises any suitable non-volatile memory, such as, for example, a hard disk drive.
- Computer system 100 may further include one or more input devices 150 coupled with the bus 110 . Common input devices 150 include keyboards, pointing devices such as a mouse, and scanners or other data entry devices.
- One or more output devices 160 such as, for example, a video monitor, may also be coupled with the bus 110 .
- the computer system 100 illustrated in FIG. 1 is intended to represent an exemplary embodiment of a computer system and, further, such a computer system may include many additional components, which have been omitted for clarity.
- the computer system 100 may include a removable storage media (e.g., floppy disk drive, CD-ROM drive), a network interface (e.g., a network card), a chip set associated with the processor 200 , as well as additional signal lines and buses.
- the computer system 100 may not include all of the components shown in FIG. 1 .
- the processor 200 includes a scheduler 210 that receives μOPs 205 from, for example, an instruction decoder (not shown in figures). As will be explained below, the scheduler 210 may also receive μOPs 255 from a replay loop 250. The received μOPs 205, as well as those μOPs 255 received via replay loop 250, may be associated with a single thread or, alternatively, associated with two or more threads. Scheduling the received μOPs 205 and replay loop μOPs 255 for execution is performed by the scheduler 210.
- the received μOPs 205 and replay loop μOPs 255 may be scheduled in an out-of-order manner.
- a stream of scheduled μOPs is output from the scheduler 210, this stream of scheduled μOPs being referred to herein as the “front-door stream” 310.
- the front-door stream 310 is provided to a first input 221 of a selector or multiplexer 220 , the multiplexer 220 having a second input 222 as well as an output 228 .
- the multiplexer (or selector) 220 comprises any suitable circuitry and/or logic capable of receiving multiple inputs and, in response to a received signal (e.g., a select signal), selecting one of the multiple inputs for the output of the multiplexer.
- the multiplexer (or selector) 220 will be referred to herein as the “side-door MUX.”
- the second input 222 of the side-door MUX 220 receives a “side-door stream” of μOPs 320 from a page miss handler (PMH) 260.
- the side-door MUX 220 places μOPs received from the front-door and side-door streams 310, 320 into an execution stream 330, the execution stream 330 being coupled with the output 228 of the side-door MUX 220.
- the execution stream 330 passes the μOPs to execution circuitry 230 for execution.
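The one-slot-per-cycle selection behavior of the side-door MUX can be sketched as a small model. This is an illustrative sketch, not circuitry from the patent; the names `side_door_mux` and `select_front` are assumptions made here for clarity.

```python
def side_door_mux(front_door_uop, side_door_uop, select_front):
    """Model of the side-door MUX: one entry slot per cycle.

    Returns a pair: (uop placed in the execution stream,
                     whacked front-door uop, or None if nothing is whacked).
    """
    if select_front:
        # Front-door input (221) selected; any pending side-door uop
        # simply waits for a later cycle, nothing is whacked.
        return front_door_uop, None
    # Side-door input (222) selected; the pending front-door uop is
    # whacked and will be sent into the replay loop.
    return side_door_uop, front_door_uop
```

The select signal (390 in the figures) is what drives the `select_front` argument in this sketch.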
- Another input 224 of the side-door MUX 220 receives the select signal 390 from the PMH 260 .
- the processor 200 may also include a checker 240 coupled with the execution circuitry 230 .
- the checker 240 verifies that each μOP in the execution stream 330 has successfully executed in execution circuitry 230.
- a μOP that has successfully executed is retired (see reference numeral 245). However, if a μOP has not, for any reason, successfully executed, the checker 240 feeds the unexecuted μOP into the replay loop 250.
- the replay loop 250 returns the unexecuted μOP to the scheduler 210, such that the unexecuted μOP can be rescheduled and again provided to the execution stream 330.
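The retire-or-replay behavior of the checker can be summarized in a few lines. This is a minimal sketch under the assumption that retirement and replay can be modeled as two queues; the function name `check_and_route` is illustrative, not from the patent.

```python
def check_and_route(uop, executed_ok, retired, replay_queue):
    """Model of the checker (240): retire on success, replay on failure."""
    if executed_ok:
        retired.append(uop)        # uop leaves the pipeline (reference 245)
    else:
        replay_queue.append(uop)   # returned to the scheduler for rescheduling
```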
- a page miss handler 260 is coupled with the second input 222 of the side-door MUX 220 .
- the PMH 260 may be coupled with a segmentation and address translation unit (SAAT) 270 , and the SAAT 270 may include a translation lookaside buffer (TLB) 275 , which provides a cache for virtual-to-physical address translations.
- the PMH 260 includes circuitry and/or instructions for handling certain events, such as a page miss, a cache miss, a TLB miss, a page split, or a cache split, as well as other events. Generally, such an event occurs in response to execution of one or more μOPs in the front-door stream.
- a page miss, cache miss, or TLB miss may occur in response to a series of μOPs representing a load instruction.
- the PMH 260 may process a single event at a time or, alternatively, the PMH 260 may process multiple events (e.g., a page miss and a page split) in parallel.
- the PMH 260 will generate one or more μOPs to process the event, and these μOPs are inserted into the execution stream 330 via the side-door stream 320 and input 222 at side-door MUX 220.
- the SAAT 270, which interfaces directly with the PMH 260, detects the occurrence of any such event and issues a request to the PMH 260 to process the detected event.
- if the SAAT 270 detects a TLB miss—as previously described, the TLB 275 provides a cache for virtual-to-physical address translations—the SAAT 270 will issue a request to the PMH 260, this request directing the PMH 260 to execute a page walk in order to load the appropriate physical address translation from main memory 120 into the TLB 275.
- the PMH 260 will generate one or more μOPs to handle the page walk.
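The TLB-miss path just described can be sketched as follows. This is an assumption-laden model (the TLB and page table are plain dictionaries, and `pmh_requests` stands in for the SAAT's request to the PMH); none of these names come from the patent.

```python
def translate(virtual_page, tlb, page_table, pmh_requests):
    """Sketch of the SAAT/TLB interaction: on a TLB miss, a page-walk
    request is issued to the PMH, which fills the TLB from memory."""
    if virtual_page in tlb:
        return tlb[virtual_page]                     # TLB hit
    # TLB miss: the SAAT requests a page walk from the PMH.
    pmh_requests.append(("page_walk", virtual_page))
    # The PMH's page-walk uops load the translation into the TLB.
    tlb[virtual_page] = page_table[virtual_page]
    return tlb[virtual_page]
```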
- the processor 200 illustrated in FIGS. 2 and 3 is intended to represent an exemplary embodiment of a processor and, further, such a processor may include many additional components that are not shown in these figures, these components having been omitted for ease of understanding.
- the processor 200 may include an instruction decoding unit (as previously suggested), one or more execution units (e.g., for floating point operations, for integer operations, etc.), a register file unit, a bus interface unit, as well as internal clock circuitry.
- many of the components shown in FIG. 2 may be combined and/or share circuitry.
- the SAAT 270 (including TLB 275 ), PMH 260 (including whacking element 265 ), and the side-door MUX 220 may comprise part of a single, integrated system commonly known as a memory execution unit (MEU).
- the embodiments described herein are not limited to any particular architecture or arrangement—as well as not being limited to any particular terminology used to describe such an architecture or arrangement—and the disclosed embodiments may be practiced on any type of processing device, irrespective of its architecture or the terminology ascribed to it.
- the PMH 260 of processor 200 processes certain types of events—including page misses, cache misses, TLB misses, page splits, and cache splits—and the PMH 260 may process two or more such events in parallel, as set forth above. For example, as illustrated by the method 400 shown in FIG. 4 , the PMH 260 may process a first event 405 a (“EVENT A”) and a second event 405 b (“EVENT B”) in parallel. However, the PMH 260 may, of course, process only one such event or, perhaps, more than two such events.
- To process the events 405 a, 405 b, the PMH must first recognize that these events have occurred, as shown at blocks 410 a, 410 b. Event recognition is provided by a command or request received from the SAAT 270, which detects the occurrence of the aforementioned events. Upon recognition of the events 405 a, 405 b, respectively, the PMH will generate one or more μOPs to process each of the events 405 a, 405 b, which is illustrated by blocks 420 a, 420 b.
- the μOPs for each event 405 a, 405 b are to eventually be forwarded to the execution stream 330 via the side-door stream 320 (look ahead to reference numeral 440); however, when processing multiple events in parallel, not all of the events may be of equal priority. Thus, it may be desirable to assess the priority of each side-door μOP 322 (or each series of side-door μOPs 322 associated with an event)—see blocks 430 a, 430 b—in order to determine the order in which these μOPs should be provided to the side-door stream 320 and, hence, to the execution stream 330. As shown at block 440, the side-door μOPs are then provided to the side-door stream 320 according to their respective priorities.
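The priority ordering of side-door μOPs (blocks 430 a, 430 b, and 440) can be sketched as a sort over parallel events. The representation of an event as a `(priority, uops)` pair is an assumption made for illustration.

```python
def order_side_door_uops(event_uops):
    """Sketch of blocks 430/440: order the uops of events handled in
    parallel so that higher-priority events reach the side-door
    stream (and hence the execution stream) first.

    event_uops: list of (priority, [uops]) pairs, one per event.
    """
    stream = []
    for _priority, uops in sorted(event_uops, key=lambda e: e[0], reverse=True):
        stream.extend(uops)   # uops of one event stay in program order
    return stream
```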
- the side-door MUX 220 receives the μOPs provided to the side-door stream 320—i.e., “side-door μOPs” 322—as well as receiving μOPs from the front-door stream 310—i.e., “front-door μOPs” 312—and these μOPs 312, 322 must be inserted into the execution stream 330.
- during a given clock cycle, only one μOP may be placed in the execution stream 330.
- the entry slot 331 represents an opportunity to place a μOP into the execution stream 330.
- the processor 200 further includes a whacking element 265 coupled with, and/or forming a part of, the PMH 260 (see FIG. 2 ).
- the whacking element 265 comprises any suitable circuitry and/or instructions capable of assessing the criticality of a pending front-door μOP 312′ and, based upon this measure of the criticality of this μOP, determining which of the pending front-door μOP 312′ and a pending side-door μOP 322′ should be awarded the entry slot 331 of execution stream 330.
- the whacking element 265 provides for the efficient sharing of the entry slot 331 into the execution stream 330 . Operation of the whacking element 265 and PMH 260 is explained in detail below.
- the criticality of a μOP may depend upon any one of several factors or a combination thereof.
- When a front-door μOP 312 is contending for the entry slot in the execution stream, that μOP has been processed by the scheduler 210 and has made its way through the front-door stream 310.
- This pending μOP is oftentimes the “oldest” (or one of the “oldest”) μOPs in the front-door stream 310.
- Execution of many subsequent or “younger” μOPs in the front-door stream 310 may hinge on successful execution of such an “old” μOP.
- a multi-threaded processor may, at any instant, favor one thread over other threads, the favored thread having a higher priority.
- the front-door stream 310 may include μOPs from two or more threads, as previously noted, and one of these threads may have a higher priority than the others. Accordingly, performance may suffer if a μOP associated with a high-priority thread is whacked rather than a μOP associated with a thread that has not been given priority.
- the thread priority associated with a μOP may determine that μOP's criticality and, hence, its impact on the successful execution of a piece of code. It should be understood that other criteria may provide a measure of the criticality of a μOP.
- Illustrated in FIG. 5 is an embodiment of a method 500 of intelligently whacking front-door μOPs.
- the method 500 of whacking front-door μOPs may be used to determine which of a next-in-line front-door μOP 312′ and a next-in-line side-door μOP 322′ should be placed in the execution stream 330 during a clock cycle.
- the method 500 of intelligently whacking μOPs—as well as the embodiments (see FIGS. 6, 7, and 8) of a method for determining the criticality of a μOP, as will be described below—are not limited in application to the PMH 260 and side-door MUX 220 of FIGS. 2 and 3. Rather, the apparatus and methods disclosed herein are generally applicable to any architecture wherein multiple inputs are contending for a single point of entry into an execution stream.
- If the next-in-line front-door μOP 312′ is “critical,” that front-door μOP 312′ will not be whacked in favor of a next-in-line side-door μOP 322′. If not “critical,” the pending front-door μOP 312′ will be whacked, or discarded, and sent into the replay loop 250.
- a μOP is deemed “critical” if its criticality—as determined based upon an examination of criteria such as, for example, age and thread priority—is such that a failure to place that μOP into the execution stream 330 will add significant latency to the execution of a piece of code or will otherwise negatively impact performance of the processor 200.
- If the next-in-line front-door μOP 312′ is critical, that μOP is awarded entry into the execution stream 330 and is placed in the entry slot 331, as shown at block 560. If a next-in-line front-door μOP 312′ is awarded the entry slot 331, the next-in-line side-door μOP 322′ is held until the next clock cycle, as illustrated at 570, at which time that side-door μOP may again be considered for entry into the execution stream.
- If the next-in-line front-door μOP 312′ is not critical—refer again to reference numeral 550—that front-door μOP is whacked, as shown at block 580, and passed to the replay loop 250.
- the entry slot 331 of execution stream 330 is then awarded to the next-in-line side-door μOP 322′, as illustrated at block 530.
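The decision flow of method 500 can be sketched with the criticality test left as a pluggable policy, anticipating the policies of FIGS. 6 through 8. This is a sketch under assumed names (`arbitrate_entry_slot`, `is_critical`); it is not the patent's implementation.

```python
def arbitrate_entry_slot(front_uop, side_uop, is_critical, replay_queue):
    """Sketch of method 500: award the single entry slot for one cycle.

    is_critical: policy callable applied to the pending front-door uop
                 (e.g., an age or thread-priority test).
    Returns the uop awarded the entry slot.
    """
    if side_uop is None:
        return front_uop                   # no contention this cycle
    if front_uop is not None and is_critical(front_uop):
        # Block 560: critical front-door uop wins; the side-door uop
        # is held until the next clock cycle (570).
        return front_uop
    if front_uop is not None:
        # Block 580: non-critical front-door uop is whacked and replayed.
        replay_queue.append(front_uop)
    return side_uop                        # block 530
```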
- the method 500 of whacking front-door μOPs is presented—for clarity and ease of understanding—in the context of a single clock cycle in which there is one opportunity to insert a μOP into the execution stream 330.
- the apparatus and methods disclosed herein are not so limited and, further, that the disclosed embodiments are generally applicable to any type of clock architecture and/or method of providing a clock signal.
- it may be possible to have multiple opportunities to insert a μOP into the execution stream 330 during a clock cycle, e.g., as may be achieved by inserting a μOP on both the rising and falling edges of a clock signal.
- it is also possible that no μOP is passed to the execution stream 330 during a given clock cycle.
- the disclosed apparatus and methods may be applied whenever there is an opportunity—i.e., an entry slot—for inserting a μOP into an execution stream.
- a method 600 of determining the criticality of a μOP is illustrated, as may be performed by whacking element 265.
- the next-in-line front-door μOP 312′ is accessed to obtain knowledge of its characteristics.
- the age of the next-in-line front-door μOP 312′ is ascertained.
- a predefined policy is then applied to the next-in-line front-door μOP 312′, which is illustrated by block 620.
- the policy comprises one or more metrics that measure the criticality of a μOP and determine whether the μOP is to be considered critical and protected against whacking.
- the policy comprises comparing the age of the next-in-line front-door μOP 312′ against a specified threshold age, as shown at block 630.
- the threshold age may, by way of example, correspond to the oldest front-door μOP.
- the threshold age may correspond to, for example, the three oldest front-door μOPs or, more generally, to the N oldest μOPs.
- old μOPs are generally more critical than younger μOPs, and a failure to execute such old μOPs can significantly impact performance.
- If the age of the next-in-line front-door μOP 312′ is greater than the threshold age, the next-in-line front-door μOP 312′ is identified as critical, as shown at block 650. Conversely, if the age of the next-in-line front-door μOP 312′ is less than the threshold age, the next-in-line front-door μOP 312′ is not critical, as illustrated at block 660.
- a select signal 390 is issued or generated by the whacking element 265 (or by the PMH 260 ) and is provided to the side-door MUX 220 , as illustrated at block 670 .
- the select signal 390 indicates to the side-door MUX 220 which of the two inputs 221, 222 is to receive a μOP. If the first input 221 is selected, the next-in-line front-door μOP 312′ is received and passed to the execution stream 330.
- If the second input 222 is selected, the next-in-line side-door μOP 322′ is received and passed to the execution stream 330, whereas the next-in-line front-door μOP 312′ is whacked and sent into the replay loop 250.
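The age-based policy of method 600 can be sketched as below. Representing ages as integers and deriving the threshold from the N oldest pending front-door μOPs are assumptions made here; the patent uses a strict greater-than/less-than comparison (blocks 650/660) and leaves the equality case unspecified, so this sketch does too.

```python
def age_threshold(front_door_ages, n_oldest):
    """Assumed helper: a threshold such that any uop strictly older
    than it is among the N oldest pending front-door uops."""
    return sorted(front_door_ages, reverse=True)[n_oldest - 1]

def is_critical_by_age(uop_age, threshold):
    # Block 650: age greater than the threshold -> critical.
    # Block 660: age less than the threshold -> not critical.
    return uop_age > threshold
```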
- Illustrated in FIG. 7 is another embodiment—denoted by reference numeral 700—of the method of determining or examining the criticality of a μOP.
- the next-in-line front-door μOP 312′ is accessed to obtain knowledge of its characteristics.
- the thread associated with the next-in-line front-door μOP 312′ is ascertained.
- a predefined policy is then applied to the next-in-line front-door μOP 312′, which is illustrated by block 720.
- the processor 200 may favor a certain thread (or threads) and grant priority to that thread.
- the policy may comprise determining whether the thread associated with the next-in-line front-door μOP 312′ has been given priority by the processor 200, as shown at block 730.
- If the thread associated with the next-in-line front-door μOP 312′ has been given priority, the next-in-line front-door μOP 312′ is deemed critical, as shown at block 750. Conversely, if the thread associated with the next-in-line front-door μOP 312′ does not have priority, the next-in-line front-door μOP 312′ is not critical, as illustrated at block 760. Once the criticality of the next-in-line front-door μOP 312′ has been determined, the select signal 390 is issued to the side-door MUX 220, as shown at block 770.
- the select signal 390 indicates to the side-door MUX 220 which of the two inputs 221, 222 is to receive a μOP. If the first input 221 is selected, the next-in-line front-door μOP 312′ is received and passed to the execution stream 330. If, however, the second input 222 is selected, the next-in-line side-door μOP 322′ is received and passed to the execution stream 330, whereas the next-in-line front-door μOP 312′ is whacked and sent into the replay loop 250.
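The thread-priority policy of method 700 reduces to a membership test. Modeling the processor's favored thread(s) as a set is an assumption of this sketch.

```python
def is_critical_by_thread(uop_thread, favored_threads):
    """Method 700 policy (block 730): a uop belonging to a thread the
    processor currently favors is deemed critical (block 750) and
    protected against whacking; otherwise it is not (block 760)."""
    return uop_thread in favored_threads
```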
- the embodiments 600, 700 of the method of determining the criticality of a μOP are only exemplary and, further, any suitable metric, or combination of metrics, may be employed to assess the criticality of a μOP. For example, rather than looking to either the age or thread priority associated with a μOP individually, both age and thread priority may be considered in determining whether to protect a front-door μOP. More generally, as illustrated in FIG. 8, a policy may be employed that includes any suitable number of metrics.
- the next-in-line front-door μOP 312′ is accessed to obtain knowledge of its characteristics. Characteristics such as age and thread priority, as well as other properties, are ascertained.
- a predefined policy is then applied to the next-in-line front-door μOP 312′, as shown at block 820.
- the policy comprises any suitable number of metrics or criteria—such as, for example, metrics 830 a, 830 b, . . . , 830 j—that may be used to evaluate the criticality of a μOP and, further, to determine whether the μOP is to be protected against whacking.
- It is then determined whether the next-in-line front-door μOP 312′ satisfies the metric, or metrics (e.g., metrics 830 a-j). If the next-in-line front-door μOP 312′ satisfies the metrics 830 a-j, or a specified portion of these metrics, the next-in-line front-door μOP 312′ is critical, as shown at block 850.
- If the next-in-line front-door μOP 312′ does not satisfy the metrics 830 a-j, or at least a specified number of these metrics, the next-in-line front-door μOP 312′ is not critical, as illustrated at block 860.
- the select signal 390 is provided to the side-door MUX 220 , as illustrated at block 870 .
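The generalized policy of method 800 (any number of metrics, satisfied in whole or in specified part) can be sketched as a count over predicates. Representing a μOP as a dictionary and metrics as callables are assumptions of this sketch.

```python
def is_critical_by_policy(uop, metrics, required):
    """Method 800 policy (blocks 820-860): evaluate any number of
    metrics (predicates) against the uop and deem it critical if at
    least `required` of them are satisfied."""
    satisfied = sum(1 for metric in metrics if metric(uop))
    return satisfied >= required
```

Setting `required` to `len(metrics)` demands that all metrics be satisfied; a smaller value implements the "specified portion of these metrics" case.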
- When two μOPs—e.g., a pending front-door μOP and a pending side-door μOP—are contending for entrance into an execution stream, a decision to whack one of the two pending μOPs is made based upon the criticality of one of the μOPs, thereby avoiding the whacking of a critical (e.g., old) μOP and minimizing latency.
- Any suitable metric or combination of metrics may be used to determine the criticality of a μOP.
Abstract
A method and apparatus for whacking a μOP based upon the criticality of that μOP. Also disclosed are embodiments of a method for determining the criticality of a μOP.
Description
- The invention relates generally to microprocessors and other processing devices. Specifically, the invention relates to a method and apparatus for intelligently inserting micro-operations into an execution stream via a side-door entry.
- Microelectronic manufacturers are continually striving to improve the speed and performance of microprocessors and other processing devices, the performance of such devices being dependent upon many factors. One factor affecting the performance of a processor is the scheduling and execution of instructions associated with a piece of code executing on the processor. Typically, a processor includes an instruction decoder that decodes an instruction to create one or more micro-instructions, or micro-operations, that can be understood and executed by the processor. A micro-operation will also be referred to herein as a “μOP.” Micro-operations ready for execution are provided to a scheduler, which schedules the order of execution of a series of μOPs. Scheduled μOPs are then inserted into an execution stream and subsequently passed to execution circuitry for execution. A processor may also include a checker that determines whether a μOP has been properly executed. If a μOP has been executed, the μOP is retired. If the μOP did not properly execute, the μOP is sent into a replay loop, wherein the μOP is returned to the scheduler and rescheduled for execution.
- Access to the execution stream may be provided via a multiplexer or “MUX.” The scheduler output is passed to the execution stream via an input at the MUX, this input often referred to as the “front-door entry” to the execution stream. The flow of μOPs from the scheduler and into the front-door entry of the execution stream—the output of the scheduler including μOPs received from the instruction decoder, as well as μOPs received from the replay loop—may be referred to as the “front-door stream.” A typical processor can execute multiple threads of execution (e.g., two) and, further, is capable of executing instructions out of order. Accordingly, the front-door stream may include μOPs for two or more threads, the μOPs for each thread being out-of-order and interleaved with μOPs of other threads.
- A processor may also include a page miss handler (PMH). One task of the PMH is to process specific events—such as, for example, page table misses and page splits—that occur during execution of an instruction or piece of code (i.e., a series of front-door μOPs). When such an event occurs, the PMH will generate a series of μOPs to handle the event. These PMH μOPs are provided to the execution stream via a “side-door entry” into the execution stream, the side-door entry comprising a second input to the MUX. The flow of μOPs from the PMH and into the side-door entry of the execution stream may be referred to as the “side-door stream.”
- On each clock cycle of a processor, only one μOP may be passed to the execution stream via the MUX. In other words, during a clock cycle, the execution stream has only one opportunity to receive—or only one “entry slot” for receiving—a μOP, and that entry slot may receive a μOP from only one of the front-door entry and the side-door entry. Therefore, contention for the entry slot of the execution stream will occur whenever a μOP in the front-door stream—i.e., a “front-door μOP”—is “waiting” for entrance into the execution stream and a μOP in the side-door stream (from the PMH)—i.e., a “side-door μOP”—is also seeking entrance to the execution stream. In conventional processors, when a side-door μOP was pending, the entry slot was automatically “awarded” to the side-door μOP and the front-door μOP was discarded, or “whacked.” The whacked front-door μOP was sent into the replay loop and returned to the scheduler for rescheduling. The process of whacking a front-door μOP in favor of a side-door μOP is commonly referred to as “side-door whacking.”
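Conventional side-door whacking, as just described, amounts to an arbiter that always favors the side door. A minimal sketch (function and argument names invented here):

```python
def conventional_arbitrate(front_uop, side_uop, replay_loop):
    """Return the micro-op awarded the single entry slot this clock cycle.
    A pending side-door micro-op always wins; the front-door micro-op is
    'whacked' into the replay loop, regardless of its importance."""
    if side_uop is not None:
        if front_uop is not None:
            replay_loop.append(front_uop)   # whacked: rescheduled many cycles later
        return side_uop
    return front_uop                        # no contention: may also be None

replay = []
assert conventional_arbitrate("load_A", "pmh_walk_0", replay) == "pmh_walk_0"
assert replay == ["load_A"]                 # the front-door micro-op was whacked
assert conventional_arbitrate("load_B", None, replay) == "load_B"
```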
- Whacking the front-door μOP irrespective of that μOP's characteristics can add significant latency to the execution of a piece of code. Certain μOPs in the front-door stream will have a greater impact on the execution of other front-door μOPs and, therefore, are much more critical to the successful execution of an instruction or piece of code. Thus, the process of automatically whacking a front-door μOP in favor of a side-door μOP whenever contention for the entry slot into the execution stream exists, and irrespective of the criticality of the front-door μOP, may increase latency and inhibit performance. Future generations of processors will be expected to perform multiple processes (e.g., handling a page-table miss, handling a cache miss, handling a page split, etc.) in parallel, and a failure to efficiently share the entrance into an execution stream amongst all processes will result in even greater latencies.
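The criticality-aware alternative motivated here can be previewed as an arbiter that consults the front-door μOP's age and thread before whacking. The threshold value, record fields, and favored-thread convention below are assumptions for illustration, not values from the patent:

```python
def arbitrate(front, side, replay, age_threshold=18, favored_thread=0):
    """Award the entry slot; whack the front-door micro-op only if it is
    not critical (here: neither unusually old nor on the favored thread)."""
    if front is None or side is None:
        return front if front is not None else side   # no contention
    critical = front["age"] >= age_threshold or front["thread"] == favored_thread
    if critical:
        return front            # side-door micro-op waits for the next entry slot
    replay.append(front)        # not critical: whack and replay
    return side

replay = []
old_uop   = {"name": "ld",   "age": 25, "thread": 1}
young_uop = {"name": "add",  "age": 2,  "thread": 1}
side      = {"name": "walk", "age": 0,  "thread": -1}
assert arbitrate(old_uop, side, replay) is old_uop    # old micro-op protected
assert arbitrate(young_uop, side, replay) is side     # young micro-op whacked
assert replay == [young_uop]
```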
-
FIG. 1 shows a schematic diagram of an exemplary embodiment of a computer system including a processor having a whacking element. -
FIG. 2 shows a schematic diagram of the processor illustrated in FIG. 1, the processor including the whacking element. -
FIG. 3 shows a partial schematic diagram of the processor illustrated in FIGS. 1 and 2. -
FIG. 4 is a flow chart illustrating an exemplary embodiment of operation of the processor of FIGS. 2 and 3. -
FIG. 5 is a flow chart illustrating an exemplary embodiment of a method of whacking μOPs, as may be performed by the whacking element of FIG. 2. -
FIG. 6 is a flow chart illustrating an embodiment of a method of determining the criticality of a μOP, as may be performed by the whacking element. -
FIG. 7 is a flow chart illustrating another embodiment of the method of determining the criticality of a μOP. -
FIG. 8 is a flow chart illustrating a further embodiment of the method of determining the criticality of a μOP. - Referring to
FIG. 1, an exemplary embodiment of a computer system 100 is illustrated. The computer system 100 includes a bus 110 having a processor 200 coupled therewith. A main memory 120 is coupled—via bus 110—with the processor 200, the main memory 120 comprising, for example, random access memory (RAM) or other suitable memory. The computer system 100 may further include a read-only memory (ROM) 130 coupled with the bus 110. The processor 200 may also have a data storage device 140 coupled therewith by bus 110. The data storage device 140 comprises any suitable non-volatile memory, such as, for example, a hard disk drive. Computer system 100 may further include one or more input devices 150 coupled with the bus 110. Common input devices 150 include keyboards, pointing devices such as a mouse, and scanners or other data entry devices. One or more output devices 160, such as, for example, a video monitor, may also be coupled with the bus 110. - It should be understood that the
computer system 100 illustrated in FIG. 1 is intended to represent an exemplary embodiment of a computer system and, further, that such a computer system may include many additional components, which have been omitted for clarity. By way of example, the computer system 100 may include a removable storage media (e.g., floppy disk drive, CD-ROM drive), a network interface (e.g., a network card), a chip set associated with the processor 200, as well as additional signal lines and buses. Also, it should be understood that the computer system 100 may not include all of the components shown in FIG. 1. - Referring now to
FIG. 2 in conjunction with FIG. 3, the processor 200 is illustrated in greater detail. The processor 200 includes a scheduler 210 that receives μOPs 205 from, for example, an instruction decoder (not shown in figures). As will be explained below, the scheduler 210 may also receive μOPs 255 from a replay loop 250. The received μOPs 205, as well as those μOPs 255 received via replay loop 250, may be associated with a single thread or, alternatively, associated with two or more threads. Scheduling the received μOPs 205 and replay loop μOPs 255 for execution is performed by the scheduler 210. The received μOPs 205 and replay loop μOPs 255 may be scheduled in an out-of-order manner. A stream of scheduled μOPs is output from the scheduler 210, this stream of scheduled μOPs being referred to herein as the “front-door stream” 310. - The front-
door stream 310 is provided to a first input 221 of a selector or multiplexer 220, the multiplexer 220 having a second input 222 as well as an output 228. The multiplexer (or selector) 220 comprises any suitable circuitry and/or logic capable of receiving multiple inputs and, in response to a received signal (e.g., a select signal), selecting one of the multiple inputs for the output of the multiplexer. The multiplexer (or selector) 220 will be referred to herein as the “side-door MUX.” As will be explained in greater detail below, the second input 222 of the side-door MUX 220 receives a “side-door stream” of μOPs 320 from a page miss handler (PMH) 260. The side-door MUX 220 places μOPs received from the front-door and side-door streams into the execution stream 330, the execution stream 330 being coupled with the output 228 of the side-door MUX 220. The execution stream 330 passes the μOPs to execution circuitry 230 for execution. Another input 224 of the side-door MUX 220 receives the select signal 390 from the PMH 260. - The
processor 200 may also include a checker 240 coupled with the execution circuitry 230. The checker 240 verifies that each μOP in the execution stream 330 has successfully executed in execution circuitry 230. A μOP that has successfully executed is retired (see reference numeral 245). However, if a μOP has not, for any reason, successfully executed, the checker 240 feeds the unexecuted μOP into the replay loop 250. The replay loop 250 returns the unexecuted μOP to the scheduler 210, such that the unexecuted μOP can be rescheduled and again provided to the execution stream 330. - As noted above, a
page miss handler 260 is coupled with the second input 222 of the side-door MUX 220. The PMH 260 may be coupled with a segmentation and address translation unit (SAAT) 270, and the SAAT 270 may include a translation lookaside buffer (TLB) 275, which provides a cache for virtual-to-physical address translations. The PMH 260 includes circuitry and/or instructions for handling certain events, such as a page miss, a cache miss, a TLB miss, a page split, or a cache split, as well as other events. Generally, such an event occurs in response to execution of one or more μOPs in the front-door stream. For example, a page miss, cache miss, or TLB miss may occur in response to a series of μOPs representing a load instruction. The PMH 260 may process a single event at a time or, alternatively, the PMH 260 may process multiple events (e.g., a page miss and a page split) in parallel. - In response to one of the aforementioned events, the
PMH 260 will generate one or more μOPs to process the event, and these μOPs are inserted into the execution stream 330 via the side-door stream 320 and input 222 at side-door MUX 220. The SAAT 270, which interfaces directly with the PMH 260, detects the occurrence of any such event and issues a request to the PMH 260 to process the detected event. By way of example, if the SAAT 270 detects a TLB miss—as previously described, the TLB 275 provides a cache for virtual-to-physical address translations—the SAAT 270 will issue a request to the PMH 260, this request directing the PMH 260 to execute a page walk in order to load the appropriate physical address translation from main memory 120 and into the TLB 275. The PMH 260 will generate one or more μOPs to handle the page walk. - It should be understood that the
processor 200 illustrated in FIGS. 2 and 3 is intended to represent an exemplary embodiment of a processor and, further, that such a processor may include many additional components that are not shown in these figures, these components having been omitted for ease of understanding. For example, the processor 200 may include an instruction decoding unit (as previously suggested), one or more execution units (e.g., for floating point operations, for integer operations, etc.), a register file unit, a bus interface unit, as well as internal clock circuitry. Also, it should be understood that many of the components shown in FIG. 2 may be combined and/or share circuitry. By way of example, the SAAT 270 (including TLB 275), PMH 260 (including whacking element 265), and the side-door MUX 220 may comprise part of a single, integrated system commonly known as a memory execution unit (MEU). Most importantly, the embodiments described herein are not limited to any particular architecture or arrangement—as well as not being limited to any particular terminology used to describe such an architecture or arrangement—and the disclosed embodiments may be practiced on any type of processing device, irrespective of its architecture or the terminology ascribed to it. - The
PMH 260 of processor 200 processes certain types of events—including page misses, cache misses, TLB misses, page splits, and cache splits—and the PMH 260 may process two or more such events in parallel, as set forth above. For example, as illustrated by the method 400 shown in FIG. 4, the PMH 260 may process a first event 405a (“EVENT A”) and a second event 405b (“EVENT B”) in parallel. However, the PMH 260 may, of course, process only one such event or, perhaps, more than two such events. - To process the
events, the PMH 260 relies on the SAAT 270, which detects the occurrence of the aforementioned events. Upon recognition of the events, the PMH 260 generates one or more μOPs for each event, and these μOPs are ultimately provided to the execution stream 330 via the side-door stream 320 (look ahead to reference numeral 440); however, when processing multiple events in parallel, not all of the events may be of equal priority. Thus, it may be desirable to assess the priority of each side-door μOP 322 (or each series of side-door μOPs 322 associated with an event) prior to providing that μOP to the side-door stream 320 and, hence, to the execution stream 330. As shown at block 440, the side-door μOPs are then provided to the side-door stream 320 according to their respective priorities. - Referring again to
FIG. 3, the side-door MUX 220 receives the μOPs provided to the side-door stream 320—i.e., “side-door μOPs” 322—as well as receiving μOPs from the front-door stream 310—i.e., “front-door μOPs” 312—and both sets of μOPs contend for placement into the execution stream 330. During each clock cycle of the processor 200, however, only one μOP may be placed in the execution stream 330. Stated another way, there is only one “entry slot” 331 into the execution stream during a clock cycle, and the front-door stream 310 and side-door stream 320 must share this single point of entry. In essence, the entry slot 331 represents an opportunity to place a μOP into the execution stream 330. - For conventional processors, as previously described, the entry slot into the execution stream was always awarded to the side-door stream if a side-door μOP was pending, and any pending front-door μOP was automatically discarded or whacked. No attempt was made to assess the criticality and, hence, the potential for increased latency of the whacked front-door μOP. This failure to efficiently share the execution stream between the front-door and side-door streams resulted in decreased performance. - To intelligently select which of two pending μOPs—i.e., a “next-in-line” front-
door μOP 312′ and a “next-in-line” side-door μOP 322′—should be awarded the entry slot 331 into the execution stream 330 on any given clock cycle, the processor 200 further includes a whacking element 265 coupled with, and/or forming a part of, the PMH 260 (see FIG. 2). The whacking element 265 comprises any suitable circuitry and/or instructions capable of assessing the criticality of a pending front-door μOP 312′ and, based upon this measure of the criticality of this μOP, determining which of the pending front-door μOP 312′ and a pending side-door μOP 322′ should be awarded the entry slot 331 of execution stream 330. Thus, the whacking element 265 provides for the efficient sharing of the entry slot 331 into the execution stream 330. Operation of the whacking element 265 and PMH 260 is explained in detail below. - The criticality of a μOP may depend upon any one of several factors or a combination thereof. When a front-
door μOP 312 is contending for the entry slot in the execution stream, that μOP has been processed by the scheduler 210 and has made its way through the front-door stream 310. This pending μOP is oftentimes the “oldest” (or one of the “oldest”) μOPs in the front-door stream 310. Execution of many subsequent or “younger” μOPs in the front-door stream 310 may hinge on successful execution of such an “old” μOP. Thus, if an older front-door μOP is whacked in favor of a side-door μOP, significant latency may be incurred, as it may take several clock cycles (e.g., 18) for the whacked front-door μOP to make its way back to the execution stream. Further, even after making its way back through the replay loop 250 and the scheduler 210, the whacked front-door μOP may again be whacked if another side-door μOP is pending. The repeated whacking of an old μOP may lead to a “live lock-up,” wherein a piece of code reaches a state of virtual lock-up because younger μOPs cannot successfully execute prior to execution of a repeatedly whacked old μOP. Also, the potential for whacking an old μOP—rather than whacking a younger μOP—is greater during out-of-order execution of instructions, a practice now commonplace in many processors. - Another factor that may be indicative of the criticality of a μOP is thread priority. A multi-threaded processor may, at any instant, favor one thread over other threads, the favored thread having a higher priority. The front-
door stream 310 may include μOPs from two or more threads, as previously noted, and one of these threads may have a higher priority than other threads. Accordingly, performance may suffer if a μOP associated with a high priority thread is whacked rather than a μOP associated with a thread that has not been given priority. Thus, in addition to the “age” of a μOP, the thread priority associated with a μOP may determine that μOP's criticality and, hence, its impact on the successful execution of a piece of code. It should be understood that other criteria may provide a measure of the criticality of a μOP. - Illustrated in
FIG. 5 is an embodiment of a method 500 of intelligently whacking front-door μOPs. The method 500 of whacking front-door μOPs may be used to determine which of a next-in-line front-door μOP 312′ and a next-in-line side-door μOP 322′ should be placed in the execution stream 330 during a clock cycle. It should be understood, however, that the method 500 of intelligently whacking μOPs—as well as the embodiments (see FIGS. 6, 7, and 8) of a method for determining the criticality of a μOP, as will be described below—are not limited in application to the PMH 260 and side-door MUX 220 of FIGS. 2 and 3. Rather, the apparatus and methods disclosed herein are generally applicable to any architecture wherein multiple inputs are contending for a single point of entry into an execution stream. - Referring now to
FIG. 5, during a clock cycle—see reference numerals 510 and 540—it is determined whether there is contention for the entry slot 331 into the execution stream 330. If there is both a front-door μOP 312′ and a side-door μOP 322′ seeking entrance into the execution stream 330, as shown at reference numeral 520, contention for the entry slot 331 will exist. If, upon examination, it is found that no contention for the entry slot 331 exists—i.e., only a front-door μOP 312′ is pending or only a side-door μOP 322′ is pending, but not both—the pending μOP is awarded access to the entry slot 331, as illustrated at block 530. - If there is contention for the
entry slot 331 of execution stream 330, the criticality of the next-in-line front-door μOP 312′—and, hence, whether to whack the next-in-line front-door μOP—is determined, as shown at block 600. If the next-in-line front-door μOP 312′ is “critical,” that front-door μOP 312′ will not be whacked in favor of a next-in-line side-door μOP 322′. If not “critical,” the pending front-door μOP 312′ will be whacked or discarded and sent into the replay loop 250. A μOP is deemed “critical” if its criticality—as determined based upon an examination of criteria such as, for example, age and thread priority—is such that a failure to place that μOP into the execution stream 330 will add significant latency to the execution of a piece of code or will otherwise negatively impact performance of the processor 200. Embodiments of a method of determining the criticality of a μOP are described below. - Referring to reference numeral 550, if the next-in-line front-
door μOP 312′ is critical, that μOP is awarded entry into the execution stream 330 and is placed in the entry slot 331, as shown at block 560. If a next-in-line front-door μOP 312′ is awarded the entry slot 331, the next-in-line side-door μOP 322′ is held until the next clock cycle, as illustrated at 570, at which time that side-door μOP may again be considered for entry into the execution stream. On the other hand, if it is determined that the next-in-line front-door μOP 312′ is not critical—refer again to reference numeral 550—that front-door μOP is whacked, as shown at block 580, and passed to the replay loop 250. With the pending front-door μOP 312′ whacked, the entry slot 331 of execution stream 330 is awarded to the next-in-line side-door μOP 322′, as illustrated at block 530. - The
method 500 of whacking front-door μOPs is presented—for clarity and ease of understanding—in the context of a single clock cycle in which there is one opportunity to insert a μOP into the execution stream 330. However, it should be understood that the apparatus and methods disclosed herein are not so limited and, further, that the disclosed embodiments are generally applicable to any type of clock architecture and/or method of providing a clock signal. For example, it may be possible to have multiple opportunities to insert a μOP into the execution stream 330 during a clock cycle (e.g., as may be achieved by inserting a μOP on both the rising and falling edges of a clock signal). Also, there may be instances where no μOP is passed to the execution stream 330 during a clock cycle or during every clock cycle. In general, the disclosed apparatus and methods may be applied whenever there is an opportunity—i.e., an entry slot—for inserting a μOP into an execution stream. - Embodiments of a method of determining or examining the criticality of a next-in-line front-
door μOP 312′—or, more generally, any μOP—are now described. Referring to FIG. 6, one embodiment of a method 600 of determining the criticality of a μOP is illustrated, as may be performed by whacking element 265. As shown at block 610, the next-in-line front-door μOP 312′ is accessed to obtain knowledge of its characteristics. Specifically, for the method 600 of FIG. 6, the age of the next-in-line front-door μOP 312′ is ascertained. A predefined policy is then applied to the next-in-line front-door μOP 312′, which is illustrated by block 620. The policy comprises one or more metrics that measure the criticality of a μOP and determine whether the μOP is to be considered critical and protected against whacking. For the method 600 of examining criticality, the policy comprises comparing the age of the next-in-line front-door μOP 312′ against a specified threshold age, as shown at block 630. The threshold age may, by way of example, correspond to the oldest front-door μOP. Similarly, the threshold age may correspond to, for example, the three oldest front-door μOPs or, more generally, to the N oldest μOPs. As previously described, old μOPs are generally more critical than younger μOPs and a failure to execute such old μOPs can significantly impact performance. - Referring to reference numeral 640, if the age of the next-in-line front-
door μOP 312′ is greater than the threshold age, the next-in-line front-door μOP 312′ is identified as critical, as shown at block 650. Conversely, if the age of the next-in-line front-door μOP 312′ is less than the threshold age, the next-in-line front-door μOP 312′ is not critical, as illustrated at block 660. Once the criticality of the next-in-line front-door μOP 312′ has been determined—and, hence, whether the pending front-door μOP 312′ should be whacked—a select signal 390 is issued or generated by the whacking element 265 (or by the PMH 260) and is provided to the side-door MUX 220, as illustrated at block 670. The select signal 390 indicates to the side-door MUX 220 which of the two inputs 221, 222 is to be selected. If the first input 221 is selected, the next-in-line front-door μOP 312′ is received and passed to the execution stream 330. If, however, the second input 222 is selected, the next-in-line side-door μOP 322′ is received and passed to the execution stream 330, whereas the next-in-line front-door μOP 312′ is whacked and sent into the replay loop 250. - Referring to
FIG. 7, another embodiment—as denoted by reference numeral 700—of the method of determining or examining the criticality of a μOP is illustrated. As shown at block 710, the next-in-line front-door μOP 312′ is accessed to obtain knowledge of its characteristics. Specifically, for the method 700 of FIG. 7, the thread associated with the next-in-line front-door μOP 312′ is ascertained. A predefined policy is then applied to the next-in-line front-door μOP 312′, which is illustrated by block 720. During a given time period, the processor 200 may favor a certain thread (or threads) and grant priority to that thread. Over time, changing conditions may dictate that a different thread be given priority, and the processor may switch back and forth between threads. Accordingly, the policy may comprise determining whether the thread associated with the next-in-line front-door μOP 312′ has been given priority by the processor 200, as shown at block 730. - Referring to reference numeral 740, if the thread associated with the next-in-line front-
door μOP 312′ has priority, the next-in-line front-door μOP 312′ is deemed critical, as shown at block 750. Conversely, if the thread associated with the next-in-line front-door μOP 312′ does not have priority, the next-in-line front-door μOP 312′ is not critical, as illustrated at block 760. Once the criticality of the next-in-line front-door μOP 312′ has been determined, the select signal 390 is issued to the side-door MUX 220, as shown at block 770. Again, the select signal 390 indicates to the side-door MUX 220 which of the two inputs 221, 222 is to be selected. If the first input 221 is selected, the next-in-line front-door μOP 312′ is received and passed to the execution stream 330. If, however, the second input 222 is selected, the next-in-line side-door μOP 322′ is received and passed to the execution stream 330, whereas the next-in-line front-door μOP 312′ is whacked and sent into the replay loop 250. - It should be understood that the
embodiments of FIGS. 6 and 7 are presented by way of example only and, as illustrated in FIG. 8, a policy may be employed that includes any suitable number of metrics. - Referring to block 810 in
FIG. 8, the next-in-line front-door μOP 312′ is accessed to obtain knowledge of its characteristics. Characteristics such as age and thread priority, as well as other properties, are ascertained. A predefined policy is then applied to the next-in-line front-door μOP 312′, as shown at block 820. The policy comprises any suitable number of metrics or criteria—such as, for example, metrics 830a through 830j—and it is determined whether the next-in-line front-door μOP 312′ satisfies the metric, or metrics (e.g., metrics 830a-j). If the next-in-line front-door μOP 312′ satisfies the metrics 830a-j, or a specified portion of these metrics, the next-in-line front-door μOP 312′ is critical, as shown at block 850. Conversely, if the next-in-line front-door μOP 312′ does not satisfy the metrics 830a-j, or at least a specified number of these metrics, the next-in-line front-door μOP 312′ is not critical, as illustrated at block 860. After determining the criticality of the next-in-line front-door μOP 312′, the select signal 390 is provided to the side-door MUX 220, as illustrated at block 870. - An embodiment of a
method 500 of intelligently whacking μOPs—as well as embodiments of a method of determining the criticality of a μOP—has been described herein. - The foregoing detailed description and accompanying drawings are only illustrative and not restrictive. They have been provided primarily for a clear and comprehensive understanding of the present invention and no unnecessary limitations are to be understood therefrom. Numerous additions, deletions, and modifications to the embodiments described herein, as well as alternative arrangements, may be devised by those skilled in the art without departing from the spirit of the present invention and the scope of the appended claims.
Claims (36)
1-12. (canceled)
13. A method comprising:
accessing a next-in-line μOP of an input stream;
applying a metric to the next-in-line μOP; and
if the next-in-line μOP satisfies the metric, identifying the next-in-line μOP as critical.
14. The method of claim 13 , further comprising identifying the next-in-line μOP as not critical if the next-in-line μOP does not satisfy the metric.
15. The method of claim 13 , wherein the metric comprises comparing an age of the next-in-line μOP with a predefined threshold age.
16. The method of claim 13 , wherein the metric comprises determining whether a thread associated with the next-in-line μOP has been given priority.
17. The method of claim 14 , further comprising issuing a select signal, wherein the select signal indicates:
if the next-in-line μOP is critical, that the next-in-line μOP is selected for output; and
if the next-in-line μOP is not critical, that a next-in-line μOP of another input stream is selected for output.
18. A method comprising:
accessing a next-in-line μOP of a front-door stream;
comparing an age of the next-in-line front-door μOP with a predefined threshold age; and
if the age of the next-in-line front-door μOP exceeds the threshold age, identifying the next-in-line front-door μOP as critical.
19. The method of claim 18 , further comprising identifying the next-in-line front-door μOP as not critical if the age of the next-in-line front-door μOP is less than the threshold age.
20. The method of claim 18 , wherein the threshold age corresponds to an oldest μOP.
21. The method of claim 18 , wherein the next-in-line front-door μOP is associated with a thread, the method further comprising:
determining whether the thread has been given priority; and
if the thread does not have priority, identifying the next-in-line front-door μOP as not critical.
22. The method of claim 19 , further comprising issuing a select signal, wherein the select signal indicates:
if the next-in-line front-door μOP is critical, that the next-in-line front-door μOP is selected for output; and
if the next-in-line front-door μOP is not critical, that a next-in-line μOP of a side-door stream is selected for output.
23. A method comprising:
accessing a next-in-line μOP of a front-door stream, the next-in-line front-door μOP associated with a thread;
determining whether the thread has been given priority; and
if the thread has priority, identifying the next-in-line front-door μOP as critical.
24. The method of claim 23 , further comprising identifying the next-in-line front-door μOP as not critical if the thread does not have priority.
25. The method of claim 23 , further comprising:
comparing an age of the next-in-line front-door μOP with a predefined threshold age; and
if the age of the next-in-line front-door μOP is less than the threshold age, identifying the next-in-line front-door μOP as not critical.
26. The method of claim 24 , further comprising issuing a select signal, wherein the select signal indicates:
if the next-in-line front-door μOP is critical, that the next-in-line front-door μOP is selected for output; and
if the next-in-line front-door μOP is not critical, that a next-in-line μOP of a side-door stream is selected for output.
27-36. (canceled)
37. A device comprising:
a selector to receive an input stream; and
a whacking element coupled with the selector, the whacking element to
access a next-in-line μOP of the input stream,
apply a metric to the next-in-line μOP, and
if the next-in-line μOP satisfies the metric, identify the next-in-line μOP as critical.
38. The device of claim 37 , the whacking element to identify the next-in-line μOP as not critical if the next-in-line μOP does not satisfy the metric.
39. The device of claim 37 , the whacking element, when applying the metric, to compare an age of the next-in-line μOP with a predefined threshold age.
40. The device of claim 37 , the whacking element, when applying the metric, to determine whether a thread associated with the next-in-line μOP has been given priority.
41. The device of claim 38 , the whacking element to provide a select signal to the selector, wherein the select signal indicates:
if the next-in-line μOP is critical, that the next-in-line μOP is selected for output; and
if the next-in-line μOP is not critical, that a next-in-line μOP of another input stream is selected for output.
42. A device comprising:
a multiplexer having a first input, a second input, and an output, the multiplexer to receive a front-door stream at the first input;
a page miss handler coupled with the second input of the multiplexer, the page miss handler to provide a side-door stream to the multiplexer; and
a whacking element coupled with the page miss handler, the whacking element to
access a next-in-line μOP of the front-door stream,
compare an age of the next-in-line front-door μOP with a predefined threshold age, and
if the age of the next-in-line front-door μOP exceeds the threshold age, identify the next-in-line front-door μOP as critical.
43. The device of claim 42 , the whacking element to identify the next-in-line front-door μOP as not critical if the age of the next-in-line front-door μOP is less than the threshold age.
44. The device of claim 42 , wherein the threshold age corresponds to an oldest μOP.
45. The device of claim 42 , wherein the next-in-line front-door μOP is associated with a thread, the whacking element to:
determine whether the thread has been given priority; and
if the thread does not have priority, identify the next-in-line front-door μOP as not critical.
46. The device of claim 43 , the whacking element to provide a select signal to the multiplexer, wherein the select signal indicates:
if the next-in-line front-door μOP is critical, that the next-in-line front-door μOP is selected for output; and
if the next-in-line front-door μOP is not critical, that a next-in-line μOP of a side-door stream is selected for output.
47. A device comprising:
a multiplexer having a first input, a second input, and an output, the multiplexer to receive a front-door stream at the first input;
a page miss handler coupled with the second input of the multiplexer, the page miss handler to provide a side-door stream to the multiplexer; and
a whacking element coupled with the page miss handler, the whacking element to
access a next-in-line μOP of the front-door stream, the next-in-line front-door μOP associated with a thread,
determine whether the thread has been given priority, and
if the thread has priority, identify the next-in-line front-door μOP as critical.
48. The device of claim 47 , the whacking element to identify the next-in-line front-door μOP as not critical if the thread does not have priority.
49. The device of claim 47 , the whacking element to:
compare an age of the next-in-line front-door μOP with a predefined threshold age; and
if the age of the next-in-line front-door μOP is less than the threshold age, identify the next-in-line front-door μOP as not critical.
50. The device of claim 48 , the whacking element to provide a select signal to the multiplexer, wherein the select signal indicates:
if the next-in-line front-door μOP is critical, that the next-in-line front-door μOP is selected for output; and
if the next-in-line front-door μOP is not critical, that a next-in-line μOP of a side-door stream is selected for output.
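The thread-priority variant of claims 47-50 can be sketched in the same style. Again the function names and the set-of-threads representation are assumptions for illustration; the combined form is one reading of dependent claim 49 layered on claim 47.

```python
def is_critical_by_priority(uop_thread_id: int, prioritized_threads: set) -> bool:
    # Claims 47-48: the μOP is critical iff its thread has been given priority.
    return uop_thread_id in prioritized_threads

def is_critical_combined(uop_thread_id: int, age: int,
                         prioritized_threads: set, threshold_age: int) -> bool:
    # Claim 49 adds the age comparison: even a prioritized-thread μOP whose
    # age is less than the threshold age is identified as not critical.
    return uop_thread_id in prioritized_threads and age >= threshold_age
```

A μOP on a prioritized thread with age 120 against a threshold of 100 would be identified as critical; the same μOP at age 50 would not.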
51-54. (canceled)
55. An article of manufacture comprising:
a machine accessible medium providing content that, when accessed by a machine, causes the machine to
access a next-in-line μOP of an input stream;
apply a metric to the next-in-line μOP; and
if the next-in-line μOP satisfies the metric, identify the next-in-line μOP as critical.
56. The article of manufacture of claim 55 , wherein the content, when accessed, further causes the machine to identify the next-in-line μOP as not critical if the next-in-line μOP does not satisfy the metric.
57. The article of manufacture of claim 55 , wherein the content, when accessed, further causes the machine, when applying the metric, to compare an age of the next-in-line μOP with a predefined threshold age.
58. The article of manufacture of claim 55 , wherein the content, when accessed, further causes the machine, when applying the metric, to determine whether a thread associated with the next-in-line μOP has been given priority.
59. The article of manufacture of claim 56 , wherein the content, when accessed, further causes the machine to issue a select signal, the select signal to indicate:
if the next-in-line μOP is critical, that the next-in-line μOP is selected for output; and
if the next-in-line μOP is not critical, that a next-in-line μOP of another input stream is selected for output.
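The generic metric-driven flow of claims 55-59 abstracts over both criteria above. The sketch below is a minimal model, assuming a dict-based μOP representation and an example age metric with an illustrative threshold; neither is specified by the claims.

```python
from typing import Callable

def select_next(primary_uop: dict, other_uop: dict,
                metric: Callable[[dict], bool]) -> dict:
    # Claims 55, 59: apply the metric to the next-in-line μOP of the input
    # stream; if satisfied, that μOP is critical and selected for output,
    # otherwise the next-in-line μOP of another input stream is selected.
    return primary_uop if metric(primary_uop) else other_uop

# Example metric per claim 57: age compared against a threshold (value assumed).
def age_metric(uop: dict) -> bool:
    return uop["age"] > 100
```

The metric is deliberately a parameter: claim 57 (age threshold) and claim 58 (thread priority) are then just two instantiations of the same selection routine.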
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/200,677 US20050268074A1 (en) | 2002-01-02 | 2005-08-09 | Method and apparatus for determining the criticality of a micro-operation |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/037,023 US7069424B2 (en) | 2002-01-02 | 2002-01-02 | Placing front instruction in replay loop to front to place side instruction into execution stream upon determination of criticality |
US11/200,677 US20050268074A1 (en) | 2002-01-02 | 2005-08-09 | Method and apparatus for determining the criticality of a micro-operation |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/037,023 Division US7069424B2 (en) | 2002-01-02 | 2002-01-02 | Placing front instruction in replay loop to front to place side instruction into execution stream upon determination of criticality |
Publications (1)
Publication Number | Publication Date |
---|---|
US20050268074A1 true US20050268074A1 (en) | 2005-12-01 |
Family
ID=21892017
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/037,023 Expired - Fee Related US7069424B2 (en) | 2002-01-02 | 2002-01-02 | Placing front instruction in replay loop to front to place side instruction into execution stream upon determination of criticality |
US11/200,677 Abandoned US20050268074A1 (en) | 2002-01-02 | 2005-08-09 | Method and apparatus for determining the criticality of a micro-operation |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/037,023 Expired - Fee Related US7069424B2 (en) | 2002-01-02 | 2002-01-02 | Placing front instruction in replay loop to front to place side instruction into execution stream upon determination of criticality |
Country Status (1)
Country | Link |
---|---|
US (2) | US7069424B2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9354875B2 (en) * | 2012-12-27 | 2016-05-31 | Intel Corporation | Enhanced loop streaming detector to drive logic optimization |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5471601A (en) * | 1992-06-17 | 1995-11-28 | Intel Corporation | Memory device and method for avoiding live lock of a DRAM with cache |
US5613083A (en) * | 1994-09-30 | 1997-03-18 | Intel Corporation | Translation lookaside buffer that is non-blocking in response to a miss for use within a microprocessor capable of processing speculative instructions |
US5680565A (en) * | 1993-12-30 | 1997-10-21 | Intel Corporation | Method and apparatus for performing page table walks in a microprocessor capable of processing speculative instructions |
US5867701A (en) * | 1995-06-12 | 1999-02-02 | Intel Corporation | System for inserting a supplemental micro-operation flow into a macroinstruction-generated micro-operation flow |
US5966544A (en) * | 1996-11-13 | 1999-10-12 | Intel Corporation | Data speculatable processor having reply architecture |
US6073159A (en) * | 1996-12-31 | 2000-06-06 | Compaq Computer Corporation | Thread properties attribute vector based thread selection in multithreading processor |
US6076153A (en) * | 1997-12-24 | 2000-06-13 | Intel Corporation | Processor pipeline including partial replay |
US6112317A (en) * | 1997-03-10 | 2000-08-29 | Digital Equipment Corporation | Processor performance counter for sampling the execution frequency of individual instructions |
US6243788B1 (en) * | 1998-06-17 | 2001-06-05 | International Business Machines Corporation | Cache architecture to enable accurate cache sensitivity |
US6247121B1 (en) * | 1997-12-16 | 2001-06-12 | Intel Corporation | Multithreading processor with thread predictor |
US6266764B1 (en) * | 1998-03-17 | 2001-07-24 | Matsushita Electric Industrial Co., Ltd. | Program controller for switching between first program and second program |
US6282629B1 (en) * | 1992-11-12 | 2001-08-28 | Compaq Computer Corporation | Pipelined processor for performing parallel instruction recording and register assigning |
US6292882B1 (en) * | 1998-12-10 | 2001-09-18 | Intel Corporation | Method and apparatus for filtering valid information for downstream processing |
US6385715B1 (en) * | 1996-11-13 | 2002-05-07 | Intel Corporation | Multi-threading for a processor utilizing a replay queue |
US6477562B2 (en) * | 1998-12-16 | 2002-11-05 | Clearwater Networks, Inc. | Prioritized instruction scheduling for multi-streaming processors |
US6643767B1 (en) * | 2000-01-27 | 2003-11-04 | Kabushiki Kaisha Toshiba | Instruction scheduling system of a processor |
US6658447B2 (en) * | 1997-07-08 | 2003-12-02 | Intel Corporation | Priority based simultaneous multi-threading |
US6735688B1 (en) * | 1996-11-13 | 2004-05-11 | Intel Corporation | Processor having replay architecture with fast and slow replay paths |
US6785803B1 (en) * | 1996-11-13 | 2004-08-31 | Intel Corporation | Processor including replay queue to break livelocks |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US608567A (en) * | 1898-08-09 | Automatic sack filling and sewing machine | ||
US6141715A (en) | 1997-04-03 | 2000-10-31 | Micron Technology, Inc. | Method and system for avoiding live lock conditions on a computer bus by insuring that the first retired bus master is the first to resubmit its retried transaction |
- 2002-01-02: US US10/037,023 patent/US7069424B2/en not_active Expired - Fee Related
- 2005-08-09: US US11/200,677 patent/US20050268074A1/en not_active Abandoned
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5471601A (en) * | 1992-06-17 | 1995-11-28 | Intel Corporation | Memory device and method for avoiding live lock of a DRAM with cache |
US6282629B1 (en) * | 1992-11-12 | 2001-08-28 | Compaq Computer Corporation | Pipelined processor for performing parallel instruction recording and register assigning |
US5680565A (en) * | 1993-12-30 | 1997-10-21 | Intel Corporation | Method and apparatus for performing page table walks in a microprocessor capable of processing speculative instructions |
US5613083A (en) * | 1994-09-30 | 1997-03-18 | Intel Corporation | Translation lookaside buffer that is non-blocking in response to a miss for use within a microprocessor capable of processing speculative instructions |
US5867701A (en) * | 1995-06-12 | 1999-02-02 | Intel Corporation | System for inserting a supplemental micro-operation flow into a macroinstruction-generated micro-operation flow |
US6785803B1 (en) * | 1996-11-13 | 2004-08-31 | Intel Corporation | Processor including replay queue to break livelocks |
US5966544A (en) * | 1996-11-13 | 1999-10-12 | Intel Corporation | Data speculatable processor having reply architecture |
US6735688B1 (en) * | 1996-11-13 | 2004-05-11 | Intel Corporation | Processor having replay architecture with fast and slow replay paths |
US6385715B1 (en) * | 1996-11-13 | 2002-05-07 | Intel Corporation | Multi-threading for a processor utilizing a replay queue |
US6073159A (en) * | 1996-12-31 | 2000-06-06 | Compaq Computer Corporation | Thread properties attribute vector based thread selection in multithreading processor |
US6112317A (en) * | 1997-03-10 | 2000-08-29 | Digital Equipment Corporation | Processor performance counter for sampling the execution frequency of individual instructions |
US6658447B2 (en) * | 1997-07-08 | 2003-12-02 | Intel Corporation | Priority based simultaneous multi-threading |
US6247121B1 (en) * | 1997-12-16 | 2001-06-12 | Intel Corporation | Multithreading processor with thread predictor |
US6076153A (en) * | 1997-12-24 | 2000-06-13 | Intel Corporation | Processor pipeline including partial replay |
US6266764B1 (en) * | 1998-03-17 | 2001-07-24 | Matsushita Electric Industrial Co., Ltd. | Program controller for switching between first program and second program |
US6243788B1 (en) * | 1998-06-17 | 2001-06-05 | International Business Machines Corporation | Cache architecture to enable accurate cache sensitivity |
US6292882B1 (en) * | 1998-12-10 | 2001-09-18 | Intel Corporation | Method and apparatus for filtering valid information for downstream processing |
US6477562B2 (en) * | 1998-12-16 | 2002-11-05 | Clearwater Networks, Inc. | Prioritized instruction scheduling for multi-streaming processors |
US6643767B1 (en) * | 2000-01-27 | 2003-11-04 | Kabushiki Kaisha Toshiba | Instruction scheduling system of a processor |
Also Published As
Publication number | Publication date |
---|---|
US7069424B2 (en) | 2006-06-27 |
US20030126407A1 (en) | 2003-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7302527B2 (en) | Systems and methods for executing load instructions that avoid order violations | |
US6687809B2 (en) | Maintaining processor ordering by checking load addresses of unretired load instructions against snooping store addresses | |
EP2674856B1 (en) | Zero cycle load instruction | |
US7266648B2 (en) | Cache lock mechanism with speculative allocation | |
US9524164B2 (en) | Specialized memory disambiguation mechanisms for different memory read access types | |
US9342310B2 (en) | MFENCE and LFENCE micro-architectural implementation method and system | |
US6349382B1 (en) | System for store forwarding assigning load and store instructions to groups and reorder queues to keep track of program order | |
US6463511B2 (en) | System and method for high performance execution of locked memory instructions in a system with distributed memory and a restrictive memory model | |
JP4603583B2 (en) | Processor, apparatus, and method | |
US5619662A (en) | Memory reference tagging | |
US6922745B2 (en) | Method and apparatus for handling locks | |
DE112009000117T5 (en) | Processor with hybrid redundancy to protect against logic errors | |
US10628160B2 (en) | Selective poisoning of data during runahead | |
US6622235B1 (en) | Scheduler which retries load/store hit situations | |
US7769985B2 (en) | Load address dependency mechanism system and method in a high frequency, low power processor system | |
US6301654B1 (en) | System and method for permitting out-of-order execution of load and store instructions | |
US7730290B2 (en) | Systems for executing load instructions that achieve sequential load consistency | |
US20040216103A1 (en) | Mechanism for detecting and handling a starvation of a thread in a multithreading processor environment | |
US7725659B2 (en) | Alignment of cache fetch return data relative to a thread | |
US20080141002A1 (en) | Instruction pipeline monitoring device and method thereof | |
JP4965638B2 (en) | System and method for controlling task switching | |
US7370181B2 (en) | Single stepping a virtual machine guest using a reorder buffer | |
US7069424B2 (en) | Placing front instruction in replay loop to front to place side instruction into execution stream upon determination of criticality | |
US6473850B1 (en) | System and method for handling instructions occurring after an ISYNC instruction | |
US6282629B1 (en) | Pipelined processor for performing parallel instruction recording and register assigning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTEL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VENKATRAMAN, KS;BAKTHA, ARAVINDH;REEL/FRAME:016880/0640 Effective date: 20020211 |
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |