US20050097304A1 - Pipeline recirculation for data misprediction in a fast-load data cache - Google Patents
- Publication number
- US20050097304A1 (application US10/697,503)
- Authority
- US
- United States
- Prior art keywords
- data
- load
- speculative
- pipeline
- instruction
- Prior art date
- 2003-10-30
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
- G06F9/3842—Speculative instruction execution
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
Abstract
A system and method in a computer architecture for selectively permitting data, such as instructions, in a pipeline to be executed based upon a speculative data load in a fast-load data cache. Each data load that is dependent upon the load of a specific data load is selectively flagged in a pipeline that selectively loads, executes, and/or flushes each data load, while the fast-load data cache speculatively loads one or more data loads. Upon the determination of a misprediction of a speculative data load, the data loads flagged as dependent on the mispredicted data load are not used in the one or more pipelines, or are alternatively flushed.
Description
- 1. Field of the Invention
- The present invention generally relates to computer architecture. More particularly, the invention relates to a device and method for invalidating pipelined data from an errant guess in a speculative fast-load instruction cache.
- 2. Description of the Related Art
- In computer architecture, the availability of extensive transistor budgets on a microprocessor enables a technique called "pipelining." In a pipelined architecture, the execution of a series of instructions overlaps in a series of stages called the pipeline. Consequently, even though it might take four clock cycles to execute each individual instruction, several instructions can be in various stages of execution simultaneously within the pipeline, ideally completing one instruction every clock cycle. Many modern processors have multiple instruction decoders, each of which can have a dedicated pipeline. Such an architecture provides multiple instruction streams, which can accelerate processor throughput so that more than one instruction completes during each clock cycle. However, an error in an instruction can require the entire instruction stream to be flushed from the pipeline, and non-sequential instruction execution (e.g., one instruction in the pipeline requires 5 clock cycles while another requires 3) causes fewer than one instruction to be completed per clock cycle.
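The throughput claim above can be made concrete with a small back-of-the-envelope calculation; this is only a sketch, with the 4-stage depth taken from the example in the text:

```python
# Cycle counts for executing n instructions on a 4-stage machine, per the
# example above: unpipelined, every instruction takes all 4 cycles serially;
# pipelined, one instruction completes per cycle once the pipeline is full.
def unpipelined_cycles(n, stages=4):
    return stages * n

def pipelined_cycles(n, stages=4):
    return stages + (n - 1)  # fill time, then one completion per cycle

# For 100 instructions: 400 cycles unpipelined vs. 103 pipelined.
```

The gap between the two grows linearly with instruction count, which is why stalls and flushes that break the one-per-cycle rhythm are so costly.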
- "Caching" is a technology based on a memory subsystem of the processor whereby a smaller and faster data store, such as a bank of registers or buffers, can fetch, hold, and provide data to the processor at a much faster rate than other memory. For example, a cache that can provide data access two times faster than main memory access is called a level 2 cache, or L2 cache. A still smaller and faster memory system that exchanges data directly with the processor, and is accessed at the clock rate of the processor rather than at the speed of the memory bus, is called a level 1 cache (L1 cache). For example, on a 233-megahertz (MHz) processor, the L1 cache is 3.5 times faster than the L2 cache, which is in turn two times faster than the processor's access to main memory.
- The cache holds the data that is most likely to be needed by the processor for instruction execution. The cache uses a principle called "locality of reference," which assumes that the data most frequently or most recently accessed is the data most likely to be needed again by the processor, and stores that data in the cache(s). Upon instruction execution, the processor first searches the cache(s) for the required data. If the data is present (a cache "hit"), the data is provided at the faster rate; if the data is not present (a cache "miss"), the processor accesses main memory at a slower rate looking for the data, and in the worst case peripheral memory, which has the slowest data transfer rate.
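The hit/miss search order described above can be sketched as follows; the addresses, contents, and cycle counts are illustrative assumptions, not figures from the patent:

```python
# Toy cache hierarchy: search the fastest store first, falling back to
# slower levels on a miss. Latencies are illustrative.
L1_CACHE = {"0x10": 42}
L2_CACHE = {"0x10": 42, "0x20": 7}
MAIN_MEMORY = {"0x10": 42, "0x20": 7, "0x30": 99}

def load(addr):
    """Return (value, cycles) for a load, modeling a hit or miss at each level."""
    if addr in L1_CACHE:
        return L1_CACHE[addr], 1    # L1 hit: processor-speed access
    if addr in L2_CACHE:
        return L2_CACHE[addr], 4    # L1 miss, L2 hit
    return MAIN_MEMORY[addr], 10    # miss in both caches: main memory
```

Locality of reference makes the fast path the common path: the small upper levels hold the recently and frequently used lines, so most loads never pay the main-memory latency.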
- A number of methods have been proposed to initiate load instructions for sequential execution in a speculative manner such that a correct guess as to the instruction sequence will speed data throughput and execution. To that end, a special fast-load (L0) data cache has been used to access a cache element that is likely, but not certain, to be the correct one for instruction execution, and that cache element is likely to have dependent data loads/instructions. A correct speculative guess can produce a load target 1 to 3 cycles earlier than an L1 cache. However, the use of such a fast-load data cache within a superscalar pipelined computer architecture with a significant queue of pipelined instructions can be problematic due to the likelihood of an errant instruction load that will require the pipeline to flush the instruction stream. Depending upon the size of the pipeline, the recovery time to restart the instruction causing such an event can be as significant as 5-10 clock cycles. In the worst case, a speculative L0 data cache with a high miss rate can actually hinder overall processor performance.
- Additionally, the L0 data cache has to maintain coherency for all data accesses, which is difficult when used with a data bus that has several potential callers of the L0 cache. In such an instance, the L0 data cache must check every potential accessing device's directory to make sure that the devices have access to the common data load at any given point.
- Not all of the instructions held within the pipeline, however, are necessarily adversely affected by an incorrect guess at the L0 data cache; in such a case, otherwise correct instructions are flushed from the pipeline. For example, if half of the load instructions in a 5-cycle pipeline can make use of speculative instruction loading, but the speculative fast access is wrong 20-40% of the time, which is typical for extant L0 caches, the total penalty cycles can equal the speculative gain in cycles, so that no, or only a very small, net performance gain is possible. Therefore, it would be advantageous to provide a system that gives the benefit of correct speculative instruction loading in a fast-load L0 cache but does not incur significant penalties from flushing otherwise correct instructions from the pipeline. Accordingly, the present invention is primarily directed to such a system and method for recirculating pipeline data upon a misprediction in the fast-load data cache.
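The break-even arithmetic above can be made explicit; the one-cycle gain and five-cycle flush penalty are assumptions drawn from the surrounding examples:

```python
# Expected cycles saved per speculative load: a correct guess saves `gain`
# cycles, a misprediction costs `penalty` cycles for the pipeline flush.
def net_gain_per_load(miss_rate, gain=1.0, penalty=5.0):
    return (1.0 - miss_rate) * gain - miss_rate * penalty

# At the 20-40% miss rates cited above, the expected gain is negative;
# break-even occurs at miss_rate = gain / (gain + penalty), here 1/6.
```

With a 5-cycle penalty, any miss rate above roughly 17% makes naive L0 speculation a net loss, which motivates invalidating only the dependent instructions instead of flushing the whole pipeline.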
- The present invention is a system and method in a computer architecture for selectively permitting some pipelined instructions to be executed, or items of data processed, while other pipelined instructions or data that are based upon a misprediction of an instruction or data speculatively loaded in a fast-load data cache are not executed. Each instruction or data load that is dependent upon the load of a specific instruction or data load is selectively flagged in a pipeline that can selectively load, execute, and/or flush each instruction or item of data, while the fast-load data cache speculatively executes a load-cache fetch access. Upon the determination of a misprediction of a speculatively loaded instruction, the data loads flagged as dependent on that specific instruction or data are not executed in the one or more pipelines, which avoids the necessity of flushing the entire pipeline.
- The system for selectively permitting instructions in a pipeline to be executed based upon a misprediction of a speculative instruction loaded in a fast-load data cache includes one or more pipelines, with each pipeline able to selectively start, execute, and flush each instruction, and with each instruction selectively flagged to indicate dependence upon the load of a specific instruction. The system also includes at least one fast-load data cache that speculatively executes one or more loads of instructions or data; upon the determination of the misprediction of a load instruction, the instructions flagged as dependent on that specific instruction are not executed in the one or more pipelines. The speculative instruction can be loaded in the one or more pipelines, or can be loaded in a separate data store with one or more of the instructions in the pipeline(s) dependent thereupon. The flag can be a bit within the instruction, or data attached to the instruction. In one embodiment, the flagged dependent specific instruction can be flushed from the one or more pipelines upon the determination of the misprediction of a loaded instruction.
- In one embodiment, the system generates speculative versions of the instructions or data loads, which can be invalid because the speculative data load that initiated the sequence accessed incorrect data due to a wrong "guess" access; in that case, the speculative (and faster) load and its following instruction sequence are invalidated. The nonspeculative instruction, which is always correct, is marked valid and always completes execution. Thus, in the speculative "bad guess" case of a load, no time is actually lost because the nonspeculative sequence executes on time, assuming sufficient resources are always available. The system can be configured to allow only one speculative instruction load per cycle, with two load/store and four ALU (FX) execution units available, which ensures that adequate resources are present to handle the otherwise valid nonspeculative instruction.
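A minimal sketch of the dual-issue scheme just described, assuming a simple model in which the L0 hit/miss outcome validates exactly one of the two issued copies; the names are illustrative, not the patent's:

```python
# Instruction R is issued speculatively (assuming an L0 hit) and reissued
# later as R' (the nonspeculative copy). Once the L0 hit/miss indication
# arrives, exactly one copy is validated; the other never completes.
def resolve_copies(l0_hit):
    speculative = {"name": "R", "temporarily_valid": True}
    nonspeculative = {"name": "R'", "temporarily_valid": True}
    speculative["valid"] = l0_hit          # good guess: fast copy completes
    nonspeculative["valid"] = not l0_hit   # bad guess: reissue completes
    return [c["name"] for c in (speculative, nonspeculative) if c["valid"]]
```

Either way exactly one copy completes, so neither outcome requires a full pipeline flush.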
- The method for selectively permitting instructions in a pipeline to be executed based upon a misprediction of a speculative instruction or data loaded in a fast-load data cache includes the steps of loading data into a pipeline, selectively flagging one or more of the instructions or data to indicate dependence upon the load of a specific instruction, speculatively loading a speculative instruction in a fast-load data cache, determining if the speculative instruction is a misprediction, and then selectively executing the instructions not flagged as dependent on that specific instruction determined to be a misprediction.
- The present system and method accordingly provide an advantage in that a processor can gain a multicycle advantage from correct speculative instruction loading in a fast-load data cache (L0), but not incur significant penalties from having to flush otherwise correct instructions from the pipeline. The system is simple in implementation as it uses flag bits, either within the instruction or attached to the instruction, to indicate the instructions based upon speculative instruction loads, which does not significantly add to complexity of processor design or consumption of space on the chip.
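The flag-bit option mentioned above might look like the following; the 32-bit word width and bit position are assumptions for illustration only:

```python
# Flag carried as a spare bit within the instruction word itself (one of the
# two encodings the text mentions; the other attaches the flag alongside).
DEPENDENT_BIT = 1 << 31  # assumed spare bit position in a 32-bit word

def mark_dependent(word):
    return word | DEPENDENT_BIT

def is_dependent(word):
    return bool(word & DEPENDENT_BIT)

def clear_dependent(word):
    return word & ~DEPENDENT_BIT
```

Because the flag is a single bit carried with the instruction, invalidation reduces to resetting that bit rather than stalling or flushing pipeline latches.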
- Other objects, advantages, and features of the present invention will become apparent after review of the hereinafter set forth Brief Description of the Drawings, Detailed Description of the Invention, and the Claims.
- FIG. 1 is a prior art block diagram of a pipelined series of instructions that incurs a 5-cycle penalty upon a miss in the fast-load data cache (L0 Dcache).
- FIG. 2 is a block diagram illustrating a prior art implementation of a fast-load data cache.
- FIG. 3 is a block diagram illustrating the load cache physical layout, specifically showing the fast-load data cache laid out at the same rank as the ALU.
- FIG. 4 is a block diagram of speculative and nonspeculative issue instances of a dependent instruction R.
- FIG. 5 is a block diagram of the temporarily valid and speculative bits appended to each pipeline instruction after a load.
- FIG. 6 is a representative diagram of a pipelined cycle of instructions utilizing the fast-load data cache (L0 Dir) to set and reset speculative bits on the pipelined instructions.
- FIG. 7 is a block diagram of a generalized N-issue superscalar pipeline with instruction reissue.
- With reference to the figures, in which like numerals represent like elements throughout,
FIG. 1 illustrates a prior art pipelined series of instructions that incurs a 5-cycle penalty upon a miss in a fast-load data cache (L0 Dcache 12). The five cycles comprise the IC cycle (L1 instruction cache access), the RI cycle (register file access), the ALU (arithmetic logic unit) or AGEN (address generation) cycle, the DC cycle (data cache access), and the WB cycle (register file write-back). As used herein, the term "instruction," in conjunction with data items held in a pipeline, means any command, operator, transitional product, or other data handled in the registers of the pipeline. Here, the pipelined load instructions are reviewed to determine whether a misprediction has been made in an instruction load, i.e., whether the data load in the L0 Dcache 12 was a cache miss. If an incorrect load has occurred, the instruction pipeline is invalidated, rolled back, and restarted, and the instruction can be recycled and the event recorded. Thus, a 5-cycle penalty is incurred. The instruction group must then be serialized and again placed into the pipeline from the last valid instruction. The present invention allows the parallel use of instruction/load sequences where one of the sequences is speculative, based upon the load in the fast-load data cache (L0 Dcache 12). -
FIG. 2 is a block diagram illustrating a prior art implementation of a fast-load data cache 12. A fast-load data cache (L0 Dcache 12) allows all data loads to access the cache, and is implemented as a normal, directory-based, associative, coherent data cache. Here, the LBUF (L0) Directory 16 indicates a miss when the access is not contained in the data cache. Upon a directory hit, the Load-Buffer data select 21 latches the data from the fast-load data cache (L0 Dcache 12). In order to achieve the very fast one-cycle access time, the size is highly restricted, to no larger than 1 KB. And if all load references use the cache, the miss rate becomes unacceptably high and the reload time for the L0 Dcache 12 completely stalls the pipeline. -
FIG. 3 is a block diagram illustrating the fast-load data cache physical layout on a processor core chip area, with the fast-load data caches, shown here as 1 KB LD BUF A 13 and 1 KB LD BUF B 15, placed near the ALUs such that the load data caches can be accessed by the ALUs at the same rank within the pipeline. -
FIG. 4 is a block diagram of speculative and nonspeculative issue instances of a dependent instruction R. A speculative access from an L0 Dcache 46, specially flagged as temporarily valid, typically produces a load target result 1-3 cycles earlier than the L1 cache 39. Dependent instructions after the load can follow. In addition, however, a nonspeculative copy of the load can also be sent down the same pipeline, with any dependent instructions, in the next few cycles afterward (where the number of cycles equals the difference in latency between the L0 and L1 accesses). If it is determined that the fast speculative load instruction produced correct results, the speculative versions of the instructions are kept along with their results, while the now superfluous nonspeculative versions of the instructions earlier in the pipe are invalidated and thus have no lasting effect. - The example in
FIG. 4 shows data initially loaded at instruction buffer 40, with an added instruction reissue buffer 42 that contains the nonspeculative instance of the load, plus the dependent instruction(s) immediately following the fast load of the speculative instruction, which accesses the fast-load data cache 46 (L0 Dcache) and its associated directory 47. In certain cases, the directory 47 can be omitted from the architecture, so long as the L0 Dcache 46 is a one-way associative or direct-mapped cache. In that case, no set-selection function is required of the directory, and because a direct-mapped embodiment of the fast-load data cache (L0 Dcache 46) is a speculative and non-coherent cache, the remaining hit-or-miss functionality of the directory is replaced by the compare equal 10 result. - The instruction (R) is issued first assuming a fast-load data cache (L0 Dcache 46) hit, and then a second time, N cycles later, both marked temporarily valid, where N equals the number of cycles of additional latency incurred in an
L1 Dcache 39 access. A compare 10 with the L1 Dcache 39 and a delay 49 of the L0 Dcache 46 determine whether the speculative load in the L0 Dcache 46 (which was earlier than the load to the L1 Dcache 39) was correct. As long as the fast-load data cache (L0 Dcache 46) hit/miss indication is known at least one cycle before the register file is written (register file write 44), the proper copy of the instruction (R or R′) is validated and the other (R′ or R) is invalidated, depending on a fast-load data cache (L0 Dcache 46) hit or miss. In other words, both possible outcomes, an L0 Dcache 46 hit or an L1 hit, have pipelined execution in order, with only the fastest valid instruction copy actually completing execution. In this manner, the entire pipeline is not required to be flushed upon an incorrect speculative load, and consequently a cycle savings occurs; e.g., instead of the 5-cycle penalty shown in FIG. 1, the worst case is a 4-cycle penalty if an L0 cache 46 miss occurs. - However, no matter how fast the fast-load data cache (L0 Dcache 46) miss signal is made to be during the AGEN cycle of the load, it is never fast enough to allow the pipeline to execute a simple 1-cycle stall so that the data from the L1 data cache can be used once the fast-load data cache (L0 Dcache 46) has missed. In fact, even a 2-cycle stall is difficult. Thus, losing three cycles for every fast-load data cache (L0 Dcache 46) miss, at a 25-30% miss rate, eliminates most of the performance gain on hits (one cycle), so as to render the approach unusable. Conversely, as
FIG. 4 shows, there is ample time to invalidate an instruction before its target(s) is (are) written to the register file, i.e., register file write 44. Here, the hit/miss indication from the directory can have up to two cycles of delay 49 to reach the register file 44 and inhibit a write. However, it is more convenient to simply append a valid bit to each instruction and keep it as a flag bit through all stages of execution, as is shown in FIG. 5. Then only this single valid bit need be reset (to indicate an invalid instruction), which will prohibit any further cycles from producing results based on the corresponding instruction. Alternately, the pipeline can also flush the invalid instruction if desired. Through the use of this system, it is unnecessary to attempt to stall hundreds of latches throughout the pipeline; only a single latch or subset of latches (the valid bit) must be handled, and furthermore, the remainder of the next cycle is available to actually prevent the now-invalid pipe-stage result from being latched (kept). -
FIG. 5 is a block diagram of the temporarily valid and speculative bits appended to each pipeline instruction after a load. An instance of a typical dependent instruction R is issued speculatively as R assuming an L0 Dcache 46 hit, and also issued a second time as R′ assuming an L0 Dcache miss and an L1 Dcache hit. The initial speculative data load is marked with a "speculative" bit, and data loads dependent upon a specific speculative load are marked with "temporarily valid" bits to indicate their dependency. Thus, both R and R′ are marked "temporarily valid" until the L0 miss/hit indication; then the correct sequence is marked valid and the other marked invalid, which ultimately NO-OPs the errant instructions/data loads by preventing their results from being clocked or used. -
FIG. 6 is a representative diagram of a pipelined cycle of instructions utilizing the fast-load data cache (L0 Dcache 46) to set and reset speculative bits on the pipelined instructions. In cycle 1, the load instruction (L) is issued, and in cycle 2 the instruction ADD (R), which is dependent on the load target, is issued, with its instruction valid bit set or reset at this stage by the L0 Directory 47. Then, in cycle 3, the ADD′ (R′), the reissued ADD, is marked as temporarily valid; ADD′ is invalidated if the L0 Dcache 46 hits, and ADD is invalidated if the L0 misses, with ADD′ validated, i.e., its instruction valid bit set to valid. -
FIG. 7 is a block diagram of a generalized N-issue superscalar pipeline with instruction reissue, shown with two exemplary pipelines A and B. In the pipeline, multiple instructions are issued each cycle, and multiple speculative instructions (instructions following the load) exist in the pipelines along with other nonspeculative instructions. Here, selective invalidation of instructions in the pipeline stages must occur; i.e., only instruction pairs marked temporarily valid are affected by fast-load data cache (L0 Dcache 46) miss signals, while other instructions started in order without speculation are allowed to proceed normally. Here, the instruction buffer 50 loads the data loads into the pipelines, and the reissue buffer 52 is enlarged to four instructions, equaling the largest issue group for a four-pipeline superscalar. It should also be noted that the same scheme works to eliminate the stall conditions for value-prediction miscompares, with the assumption that the miscompare must be known at the end of the L1 Dcache 49 and compare equal access in time to invalidate a register file write 54. It should further be noted that not every pipeline needs to be speculatively loaded; i.e., pipeline A can include speculatively loaded data capable of being flushed, whereas pipeline B can include nonspeculative instructions or data that are processed in parallel with pipeline A. - It can thus be seen that the system provides a method for selectively permitting instructions in a pipeline, such as the pipelines in
FIGS. 4 and 7, to be executed based upon a speculative instruction or data loaded in a fast-load data cache 46, comprising the steps of loading one or more instructions into a pipeline, selectively flagging one or more of the instructions to indicate dependence upon the load of a specific instruction, speculatively loading a speculative data load in a fast-load data cache 46, determining if the speculative data load is a misprediction, such as through the compare 10 in FIG. 4, and selectively executing the instructions/data loads not flagged as dependent on the specific instruction determined to be a misprediction, i.e., those not merely temporarily valid. The method can further include the step of loading the speculative instruction into the pipeline, and the step of flushing the flagged dependent specific instruction from the pipeline upon the determination of the misprediction of a loaded instruction. - The step of selectively flagging the one or more instructions does not necessarily flag an instruction that is not dependent on any specific instruction. Further, the step of selectively flagging the instruction can occur through altering a bit within the instruction, or alternately, through attaching a flag of one or more bits to the instruction.
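The method steps summarized above can be condensed into a short sketch; the data structures and names are assumptions for illustration, not the patent's implementation:

```python
# Steps: load instructions, flag those dependent on the speculative load,
# resolve the speculative guess, then execute only the survivors.
def run_pipeline(instructions, speculative_load, mispredicted):
    # selectively flag each instruction dependent on the speculative load
    pipeline = [{"op": op, "dependent": dep == speculative_load}
                for op, dep in instructions]
    # on a misprediction, flagged instructions are not executed (NO-OPed);
    # everything else proceeds, so the whole pipeline is never flushed
    return [entry["op"] for entry in pipeline
            if not (mispredicted and entry["dependent"])]
```

For example, mispredicting a load "ld1" drops only the instructions flagged as dependent on it, while independent instructions in the same pipeline complete normally.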
- While the foregoing disclosure shows illustrative embodiments of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
Claims (16)
1. In a computer architecture, a system for selectively permitting data loads in a pipeline to be executed based upon a speculative data load in a fast-load data cache, comprising:
one or more pipelines, each pipeline able to selectively load, execute, and flush a series of data loads, and each data load selectively flagged to indicate dependence upon the loading of a specific data load; and
at least one fast-load data cache that loads one or more speculative data loads;
wherein upon determination of a misprediction for a specific speculative data load, the data loads flagged as dependent on that specific speculative data load not being executed in the one or more pipelines.
2. The system of claim 1, wherein the speculative data load is loaded in the one or more pipelines.
3. The system of claim 1, wherein one or more of the data loads in the one or more pipelines are not dependent on any specific data load and are not selectively flagged.
4. The system of claim 1, wherein the flag is a bit within the data load.
5. The system of claim 1, wherein the flag is attached to the data load.
6. The system of claim 1, wherein the flagged dependent specific data load is flushed from the one or more pipelines upon the determination of a misprediction for a data load.
7. The system of claim 1, wherein the fast-load data cache includes a directory.
8. The system of claim 1, wherein the fast-load data cache does not include a directory.
9. A method for selectively permitting data loads in a pipeline to be executed based upon a speculative data load in a fast-load data cache, comprising the steps of:
loading one or more data loads into a pipeline;
selectively flagging one or more of the data loads to indicate dependence upon the load of a specific data load;
loading a speculative data load in a fast-load data cache;
determining if the speculative data load is a misprediction; and
selectively executing the data loads not flagged as dependent on that specific data load determined to be a misprediction.
10. The method of claim 9, further comprising the step of loading the speculative data load into the pipeline.
11. The method of claim 9, wherein the step of selectively flagging the one or more data loads does not flag any data load that is not dependent on any specific data load.
12. The method of claim 9, wherein the step of selectively flagging the data load occurs through altering a bit within the data load.
13. The method of claim 9, wherein the step of selectively flagging the data load occurs through attaching a flag to the data load.
14. The method of claim 9, further comprising the step of flushing the flagged dependent specific data load from the pipeline upon the determination of a misprediction of a data load.
15. In a computer architecture, a system for selectively permitting instructions in a pipeline to be executed based upon a speculative data load, comprising:
a means for pipelining one or more data loads, the means able to selectively load, execute, and flush each data load;
a means for selectively flagging one or more data loads to indicate dependence upon the load of a specific data load;
a means for speculatively loading one or more data loads; and
a means for determining a misprediction of a speculative data load,
wherein upon the determination of a misprediction in a speculative data load, the means for pipelining not using data loads flagged as dependent on that specific data load.
16. The system of claim 15, wherein the means for pipelining flushes the flagged dependent data load upon the determination of a misprediction in a speculative data load.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/697,503 US20050097304A1 (en) | 2003-10-30 | 2003-10-30 | Pipeline recirculation for data misprediction in a fast-load data cache |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/697,503 US20050097304A1 (en) | 2003-10-30 | 2003-10-30 | Pipeline recirculation for data misprediction in a fast-load data cache |
Publications (1)
Publication Number | Publication Date |
---|---|
US20050097304A1 true US20050097304A1 (en) | 2005-05-05 |
Family
ID=34550377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/697,503 Abandoned US20050097304A1 (en) | 2003-10-30 | 2003-10-30 | Pipeline recirculation for data misprediction in a fast-load data cache |
Country Status (1)
Country | Link |
---|---|
US (1) | US20050097304A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2448118A (en) * | 2007-04-03 | 2008-10-08 | Advanced Risc Mach Ltd | Error recovery following speculative execution with an instruction processing pipeline |
US20130227251A1 (en) * | 2012-02-24 | 2013-08-29 | Jeffry E. Gonion | Branch mispredication behavior suppression on zero predicate branch mispredict |
WO2013188565A1 (en) * | 2012-06-15 | 2013-12-19 | Soft Machines, Inc. | A semaphore method and system with out of order loads in a memory consistency model that constitutes loads reading from memory in order |
US20170168836A1 (en) * | 2015-12-15 | 2017-06-15 | International Business Machines Corporation | Operation of a multi-slice processor with speculative data loading |
US9904552B2 (en) | 2012-06-15 | 2018-02-27 | Intel Corporation | Virtual load store queue having a dynamic dispatch window with a distributed structure |
US9928121B2 (en) | 2012-06-15 | 2018-03-27 | Intel Corporation | Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization |
US9965277B2 (en) | 2012-06-15 | 2018-05-08 | Intel Corporation | Virtual load store queue having a dynamic dispatch window with a unified structure |
US9990198B2 (en) | 2012-06-15 | 2018-06-05 | Intel Corporation | Instruction definition to implement load store reordering and optimization |
US10019263B2 (en) | 2012-06-15 | 2018-07-10 | Intel Corporation | Reordered speculative instruction sequences with a disambiguation-free out of order load store queue |
US10048964B2 (en) | 2012-06-15 | 2018-08-14 | Intel Corporation | Disambiguation-free out of order load store queue |
WO2020144446A1 (en) * | 2019-01-11 | 2020-07-16 | Arm Limited | Controlling use of data determined by a resolve-pending speculative operation |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4628450A (en) * | 1981-09-29 | 1986-12-09 | Tokyo Shibaura Electric Co | Data processing system having a local memory which does not use a directory device with distributed resident programs and a method therefor |
US4946696A (en) * | 1988-11-14 | 1990-08-07 | Joe Nendl | Process for producing fine patternation in chocolate surfaces |
US5548795A (en) * | 1994-03-28 | 1996-08-20 | Quantum Corporation | Method for determining command execution dependencies within command queue reordering process |
US5721864A (en) * | 1995-09-18 | 1998-02-24 | International Business Machines Corporation | Prefetching instructions between caches |
US6376000B1 (en) * | 2000-01-03 | 2002-04-23 | Peter B Waters | Method of creating painted chocolate |
US6467027B1 (en) * | 1999-12-30 | 2002-10-15 | Intel Corporation | Method and system for an INUSE field resource management scheme |
US20030208665A1 (en) * | 2002-05-01 | 2003-11-06 | Jih-Kwon Peir | Reducing data speculation penalty with early cache hit/miss prediction |
US20040021757A1 (en) * | 2002-08-05 | 2004-02-05 | Mars, Incorporated | Ink-jet printing on surface modified edibles and products made |
US20050061184A1 (en) * | 2001-04-20 | 2005-03-24 | Russell John R. | Printing process with edible inks |
US6893671B2 (en) * | 2000-12-15 | 2005-05-17 | Mars, Incorporated | Chocolate confectionery having high resolution printed images on an edible image-substrate coating |
History
2003-10-30: US application US10/697,503 filed; published as US20050097304A1; status: not active (Abandoned)
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9519538B2 (en) | 2007-04-03 | 2016-12-13 | Arm Limited | Error recovery following speculative execution with an instruction processing pipeline |
US20080250271A1 (en) * | 2007-04-03 | 2008-10-09 | Arm Limited | Error recovery following speculative execution with an instruction processing pipeline |
GB2448118B (en) * | 2007-04-03 | 2011-08-24 | Advanced Risc Mach Ltd | Error recovery following erroneous execution with an instruction processing pipeline |
US8037287B2 (en) | 2007-04-03 | 2011-10-11 | Arm Limited | Error recovery following speculative execution with an instruction processing pipeline |
GB2448118A (en) * | 2007-04-03 | 2008-10-08 | Advanced Risc Mach Ltd | Error recovery following speculative execution with an instruction processing pipeline |
US20130227251A1 (en) * | 2012-02-24 | 2013-08-29 | Jeffry E. Gonion | Branch mispredication behavior suppression on zero predicate branch mispredict |
US9268569B2 (en) * | 2012-02-24 | 2016-02-23 | Apple Inc. | Branch misprediction behavior suppression on zero predicate branch mispredict |
US9904552B2 (en) | 2012-06-15 | 2018-02-27 | Intel Corporation | Virtual load store queue having a dynamic dispatch window with a distributed structure |
WO2013188565A1 (en) * | 2012-06-15 | 2013-12-19 | Soft Machines, Inc. | A semaphore method and system with out of order loads in a memory consistency model that constitutes loads reading from memory in order |
US9928121B2 (en) | 2012-06-15 | 2018-03-27 | Intel Corporation | Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization |
US9965277B2 (en) | 2012-06-15 | 2018-05-08 | Intel Corporation | Virtual load store queue having a dynamic dispatch window with a unified structure |
US9990198B2 (en) | 2012-06-15 | 2018-06-05 | Intel Corporation | Instruction definition to implement load store reordering and optimization |
US10019263B2 (en) | 2012-06-15 | 2018-07-10 | Intel Corporation | Reordered speculative instruction sequences with a disambiguation-free out of order load store queue |
US10048964B2 (en) | 2012-06-15 | 2018-08-14 | Intel Corporation | Disambiguation-free out of order load store queue |
US10592300B2 (en) | 2012-06-15 | 2020-03-17 | Intel Corporation | Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization |
US20170168836A1 (en) * | 2015-12-15 | 2017-06-15 | International Business Machines Corporation | Operation of a multi-slice processor with speculative data loading |
US20170168821A1 (en) * | 2015-12-15 | 2017-06-15 | International Business Machines Corporation | Operation of a multi-slice processor with speculative data loading |
US9921833B2 (en) * | 2015-12-15 | 2018-03-20 | International Business Machines Corporation | Determining of validity of speculative load data after a predetermined period of time in a multi-slice processor |
US9928073B2 (en) * | 2015-12-15 | 2018-03-27 | International Business Machines Corporation | Determining of validity of speculative load data after a predetermined period of time in a multi-slice processor |
WO2020144446A1 (en) * | 2019-01-11 | 2020-07-16 | Arm Limited | Controlling use of data determined by a resolve-pending speculative operation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7730283B2 (en) | Simple load and store disambiguation and scheduling at predecode | |
JP5357017B2 (en) | Fast and inexpensive store-load contention scheduling and transfer mechanism | |
US7350027B2 (en) | Architectural support for thread level speculative execution | |
US8627044B2 (en) | Issuing instructions with unresolved data dependencies | |
US5226130A (en) | Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency | |
US6691220B1 (en) | Multiprocessor speculation mechanism via a barrier speculation flag | |
US6665776B2 (en) | Apparatus and method for speculative prefetching after data cache misses | |
US6279105B1 (en) | Pipelined two-cycle branch target address cache | |
US6625660B1 (en) | Multiprocessor speculation mechanism for efficiently managing multiple barrier operations | |
US7523266B2 (en) | Method and apparatus for enforcing memory reference ordering requirements at the L1 cache level | |
US9021240B2 (en) | System and method for Controlling restarting of instruction fetching using speculative address computations | |
US9875105B2 (en) | Checkpointed buffer for re-entry from runahead | |
US5446850A (en) | Cross-cache-line compounding algorithm for scism processors | |
US20100332806A1 (en) | Dependency matrix for the determination of load dependencies | |
US7155574B2 (en) | Look ahead LRU array update scheme to minimize clobber in sequentially accessed memory | |
US20090006905A1 (en) | In Situ Register State Error Recovery and Restart Mechanism | |
US7631149B2 (en) | Systems and methods for providing fixed-latency data access in a memory system having multi-level caches | |
US7194604B2 (en) | Address generation interlock resolution under runahead execution | |
US10067875B2 (en) | Processor with instruction cache that performs zero clock retires | |
JP3159435B2 (en) | Load / load detection and reordering method and apparatus | |
US5649137A (en) | Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency | |
US20050097304A1 (en) | Pipeline recirculation for data misprediction in a fast-load data cache | |
US6973561B1 (en) | Processor pipeline stall based on data register status | |
Borkenhagen et al. | AS/400 64-bit PowerPC-compatible processor implementation | |
US20090210683A1 (en) | Method and apparatus for recovering from branch misprediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LUICK, DAVID A.;REEL/FRAME:014659/0146 Effective date: 20031028 |
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |