US20080109644A1 - System and method for using a working global history register - Google Patents
- Publication number
- US20080109644A1 (application US11/556,244)
- Authority
- US
- United States
- Prior art keywords
- branch
- instruction
- stage
- history information
- register
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3848—Speculative instruction execution using hybrid branch prediction, e.g. selection between prediction techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
- G06F9/3806—Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
Definitions
- The present invention relates generally to computer systems, and more particularly to a method and a system for using a working global history register.
- At the heart of the computer platform evolution is the processor. Early processors were limited by the technology available at that time. New advances in fabrication technology allow transistor designs to be reduced to 1/1000th of the size of early processors and beyond. These smaller processor designs are faster and more efficient, and they use substantially less power while delivering processing power exceeding prior expectations.
- Pipelining of instructions has been implemented in processor designs since the early 1960s.
- One example of pipelining is the concept of breaking execution pipelines into units or stages, through which instructions flow sequentially in a stream. The stages are arranged so that several stages can simultaneously process the appropriate parts of several instructions.
- One advantage of pipelining is that the execution of the instructions is overlapped because the instructions are evaluated in parallel.
- A processor pipeline is composed of many stages, where each stage performs a function associated with executing an instruction. Each stage is referred to as a pipe stage or pipe segment. The stages are connected together to form the pipeline. Instructions enter at one end of the pipeline and exit at the other end.
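The overlap described above can be illustrated with a small Python sketch. It assumes an idealized linear pipeline with hypothetical stage names and no stalls or hazards, and it only reports which instruction occupies each stage in each cycle; it is not a model of any particular processor.

```python
def pipeline_schedule(instructions, stages):
    """Return, for each cycle, a dict mapping stage name -> instruction (or None).

    Instruction i enters stage s in cycle i + s; no stalls are modeled.
    """
    n_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(n_cycles):
        row = {}
        for s, stage in enumerate(stages):
            i = cycle - s  # index of the instruction sitting in stage s this cycle
            row[stage] = instructions[i] if 0 <= i < len(instructions) else None
        schedule.append(row)
    return schedule


for cycle, row in enumerate(pipeline_schedule(["i0", "i1", "i2"], ["IF", "ID", "EX", "WB"])):
    print(cycle, row)
```

With four stages and three instructions, the last instruction leaves the pipeline after six cycles, rather than the twelve a non-overlapped design would need.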
- Most programs executed by the processor include conditional branch instructions, the actual branching behavior of which is not known until the instruction is evaluated deep in the pipeline. To avoid the stall that would result from waiting for actual evaluation of the branch instruction, modern processors may employ some form of branch prediction, whereby the branching behavior of a conditional branch instruction is predicted early in the pipeline. Based on the predicted branch evaluation, the processor speculatively fetches and executes instructions from a predicted address: either the branch target address (if the branch is predicted to be taken) or the next sequential address after the branch instruction (if the branch is predicted not to be taken). Whether a conditional branch instruction is taken or not taken is referred to as the direction of the branch. The direction of the branch may be determined at prediction time and again at actual branch resolution time.
- One branch prediction technique partitions branch prediction into two predictors: an initial branch target address cache (BTAC) and a branch history table (BHT).
- The BTAC is indexed by an instruction fetch group address and contains the next fetch address, also referred to as the branch target, corresponding to that instruction fetch group address.
- Entries are added to the BTAC after a branch instruction has passed through the processor pipeline and its branch has been taken. If the BTAC becomes full, entries are removed from the BTAC using standard cache replacement algorithms (such as round robin or least-recently used) when the next entry is being added.
- The BTAC may be a highly associative cache design and is accessed early in the instruction execution pipeline. If the fetch group address matches a BTAC entry (a BTAC hit), the corresponding next fetch address, or target address, is fetched in the next cycle. This match and subsequent fetching of the target address is referred to as an implicit taken branch prediction. If there is no match (a BTAC miss), the next sequentially incremented address is fetched in the next cycle. This no-match situation is also referred to as an implicit not-taken prediction.
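A minimal Python sketch of the BTAC behavior described above, assuming a small fully associative table with least-recently-used replacement and 16-byte fetch groups (both assumptions, as are the class and method names): a hit returns the stored target (implicit taken prediction), and a miss returns the next sequential fetch group address (implicit not-taken prediction).

```python
from collections import OrderedDict


class BTAC:
    """Toy fully associative BTAC with least-recently-used (LRU) replacement.

    Assumptions: each entry maps a fetch group address to the target of a
    taken branch in that group, and fetch groups are fetch_bytes apart.
    """

    def __init__(self, capacity: int, fetch_bytes: int = 16) -> None:
        self.capacity = capacity
        self.fetch_bytes = fetch_bytes
        self.entries = OrderedDict()  # fetch group address -> target address

    def next_fetch_address(self, fetch_addr: int) -> int:
        """Hit: implicit taken prediction. Miss: implicit not-taken prediction."""
        if fetch_addr in self.entries:
            self.entries.move_to_end(fetch_addr)  # mark as most recently used
            return self.entries[fetch_addr]
        return fetch_addr + self.fetch_bytes

    def insert_taken_branch(self, fetch_addr: int, target: int) -> None:
        """Add an entry once a branch in this fetch group has been taken."""
        if fetch_addr in self.entries:
            self.entries.move_to_end(fetch_addr)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[fetch_addr] = target


btac = BTAC(capacity=2)
print(hex(btac.next_fetch_address(0x100)))  # 0x110: miss, sequential fetch
btac.insert_taken_branch(0x100, 0x400)
print(hex(btac.next_fetch_address(0x100)))  # 0x400: hit, implicit taken
```

Round-robin replacement, the other policy mentioned above, would replace the `OrderedDict` bookkeeping with a rotating victim pointer.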
- BTACs may be utilized in conjunction with a more accurate individual branch direction predictor such as a branch history table (BHT) also known as a pattern history table (PHT).
- A conventional BHT may contain a set of saturating predicted direction counters to produce a more accurate taken/not-taken decision for individual branch instructions.
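- Each saturating predicted direction counter may comprise a 2-bit counter that assumes one of four states, each assigned a weighted prediction value, such as: 11 (strongly predicted taken), 10 (weakly predicted taken), 01 (weakly predicted not taken) and 00 (strongly predicted not taken).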
- The output of a conventional BHT is a taken or not-taken decision, which results in fetching either the target address of the branch instruction or the next sequential address in the next cycle.
- The BHT is commonly updated with branch outcome information as it becomes known.
- Various other prediction techniques may be implemented which use recent branch history information from other branches as feedback.
- Current branch behavior may be correlated to the history of previously executed branch instructions.
- The history of previously executed branch instructions may influence how a conditional branch instruction is predicted.
- A Global History Register (GHR), also referred to in the art as a global branch history register or a global history shift register, may be used to keep track of the past history of previously executed branch instructions.
- The branch history provides a view of the sequence of branch instructions encountered in the code path leading up to the presently executed branch instruction, which helps achieve improved prediction results.
- Identification of a branch instruction and its associated prediction information may occur only after an instruction decode stage.
- The instruction decode stage may be a later stage in the instruction execution sequence.
- The GHR is loaded with the appropriate branch history information. As the branch history information is identified, it is shifted into the GHR. The output of the GHR is used to identify the prediction value stored in the BHT, which is used to predict the next conditional branch instruction.
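A global history register behaves like a fixed-width shift register. The Python sketch below assumes a nine-bit register and shifts each new outcome (1 = taken, 0 = not taken) in at the most significant bit, so the newest outcome is the MSB and the oldest falls off the least significant end; both the width and the bit ordering are assumptions for illustration.

```python
GHR_BITS = 9  # assumed register width


def shift_history(ghr: int, taken: bool, width: int = GHR_BITS) -> int:
    """Shift one branch outcome into a width-bit history register.

    The newest outcome enters at the MSB; the oldest (LSB) is discarded.
    """
    return (ghr >> 1) | (int(taken) << (width - 1))


ghr = 0
for outcome in (True, False, True):
    ghr = shift_history(ghr, outcome)
print(format(ghr, "09b"))  # 101000000
```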
- The GHR may not reflect the actual branch history information encountered when multiple branch instructions are executed in parallel during a relatively short period of time.
- For example, the GHR may not be updated with the branch history information from a first conditional branch instruction before a second conditional branch instruction is predicted.
- As a result, an inaccurate value of the GHR may be used to identify the entry in the BHT for the second conditional branch instruction.
- Using an inaccurate value to index the BHT may affect the accuracy of the branch prediction. If the processor had been able to keep pace with the branch history information from the first conditional branch instruction, a different value would have been stored in the GHR and a different entry in the BHT would have been identified for the second conditional branch instruction.
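The effect of a late GHR update can be made concrete with a short Python sketch. It assumes a gshare-style index (fetch group address bits 12-4 XORed with nine bits of history), newest-outcome-at-MSB shifting, and made-up addresses and history values that do not come from the patent. The same second branch lands on a different BHT line depending on whether the first branch's predicted outcome has reached the history register.

```python
HISTORY_MASK = 0x1FF  # nine bits of history (assumed)


def bht_line(fetch_addr: int, history: int) -> int:
    """Assumed gshare-style line index: address bits 12-4 XOR history."""
    return ((fetch_addr >> 4) & HISTORY_MASK) ^ (history & HISTORY_MASK)


def shift_history(history: int, taken: bool) -> int:
    """Newest outcome enters at the MSB of the nine-bit history."""
    return (history >> 1) | (int(taken) << 8)


# Hypothetical history before the first branch, and the second branch's fetch group.
history_before_first = 0b010110011
second_branch_addr = 0x2340

# Stale: the first branch's predicted-taken outcome has not been shifted in yet.
stale_line = bht_line(second_branch_addr, history_before_first)
# Current: the first branch's outcome is already part of the history.
current_line = bht_line(second_branch_addr, shift_history(history_before_first, True))

print(stale_line, current_line)  # 135 365
```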
- Accordingly, there is a need for a processor that can store and use branch history information sooner than the GHR in order to achieve more accurate branch predictions.
- The present disclosure recognizes this need and discloses a processor which identifies branch instructions early in the execution stages of the processor. Using the branch instruction information as input, the processor may steer the selection of prediction values for subsequent conditional branch instructions.
- A method of processing branch history information is disclosed.
- The method identifies branch instructions during a first pipeline stage and loads the branch history information into a first register during the first pipeline stage.
- The method confirms the branch instructions in a second pipeline stage and loads the branch history information into a second register during the second pipeline stage.
- A pipeline processor comprising a first register having branch history information and a second register having branch history information is also disclosed.
- The pipeline processor has a plurality of pipeline stages, wherein the first register is loaded with the branch history information in a first pipeline stage when a branch instruction is identified, and the second register is loaded with the branch history information during a second pipeline stage.
- Another method of processing branch history information fetches a branch instruction, identifies the branch instruction during a first pipeline stage and loads the branch history information into a first register during the first pipeline stage. The method then confirms the branch instruction in a second pipeline stage and loads the branch history information into a second register during the second pipeline stage.
- FIG. 1 shows a high level logic hardware block diagram of a processor using an embodiment of the present invention.
- FIG. 2 displays an exemplary branch history table used by the processor of FIG. 1.
- FIG. 3 shows a lower level logic block diagram of the processor of FIG. 1 employing a Working Global History Register.
- FIG. 4 shows a detailed view of the Working Global History Register and the Global History Register.
- FIG. 5 shows an exemplary group of instructions executed by the processor of FIG. 1.
- FIG. 6 shows a timing diagram of the exemplary group of instructions of FIG. 5 as they are executed through various stages of the processor of FIG. 1.
- FIG. 7 shows a flow chart illustrating an instruction process flow performed by the processor of FIG. 1 using a Working Global History Register.
- FIG. 1 shows a high level view of a superscalar processor 100 utilizing an embodiment as hereinafter described.
- The processor 100 has a central processing unit (CPU) 102 that is coupled via a dedicated high speed bus 104 to an instruction cache 106.
- The instruction cache 106 is also coupled via a general purpose bus 116 to memory 114.
- An Instruction Fetch Unit (IFU) 122 controls the loading of instructions from memory 114 into the instruction cache 106. Once the instruction cache 106 is loaded with instructions, the CPU 102 is able to access them via the high speed bus 104.
- The instruction cache 106 may be a separate memory structure, as shown in FIG. 1, or it may be integrated as an internal component of the CPU 102. The integration may hinge on the size of the instruction cache 106 as well as the complexity and power dissipation of the CPU 102.
- The IFU 122 is coupled to a Branch Target Address Cache (BTAC) 130 and a Branch History Table (BHT) 140.
- Two lower pipelines 160 and 170 are also coupled to the IFU 122.
- Instructions may be fetched and decoded from the instruction cache 106 several instructions at a time. Within the instruction cache 106, instructions are grouped into sections known as cache lines. Each cache line may contain multiple instructions as well as associated data. The number of instructions fetched may depend upon the required fetch bandwidth as well as the number of instructions in each cache line. Within the IFU 122, the fetched instructions are analyzed for operation type and data dependencies. After analyzing the instructions, the processor 100 may distribute the instructions from the IFU 122 to lower functional units or lower pipelines 160 or 170 for further execution.
- Lower pipelines 160 and 170 may contain various Execution Units (EU) 118 including arithmetic logic units, floating point units, store units, load units and the like.
- An EU 118, such as an arithmetic logic unit, may execute a wide range of arithmetic functions, such as integer addition, subtraction, simple multiplication, bitwise logic operations (e.g. AND, NOT, OR, XOR), bit shifting and the like.
- The lower pipelines 160 and 170 may have a resolution stage (not shown), during which the actual results of a conditional branch instruction are identified. Once the actual results of the branch instruction are identified, the processor 100 may compare the actual results to the predicted results; if they do not match, a mispredict has occurred.
- The BTAC 130 may be similar to a Branch Target Buffer (BTB) or a Branch Target Instruction Cache (BTIC).
- A BTB or BTIC stores both the address of a branch and the instruction data (or opcodes) of the branch target.
- The BTAC 130 is used in conjunction with the various embodiments of the present invention.
- Other embodiments of the invention may alternatively include a BTB or BTIC instead of the BTAC 130.
- When a branch instruction is taken, the BTAC 130 may subsequently be updated to reflect the target address of that branch instruction as well as a processor mode (e.g. Arm vs. Thumb operation in the advanced RISC processor architecture). Any time thereafter that the branch instruction is fetched again, the information stored in the BTAC 130 will be fetched on the next processor cycle, even without completely decoding the fetched branch instruction.
- A BTAC hit (i.e. when the fetch group address matches an address in the BTAC 130) may occur for either a conditional or an unconditional branch instruction, because the BTAC 130 may store information relating to both conditional and unconditional branch instructions. In the case of a BTAC hit for an unconditional branch instruction, the predicted target address, the predicted processor mode and the fact that the branch instruction is unconditional may be stored. Where an unconditional branch instruction address is stored in an entry in the BTAC 130, the entry will indicate a branch direction of taken.
- FIG. 2 displays a more detailed illustration of an exemplary Branch History Table (BHT) 140 used by the processor 100 .
- The BHT 140 may be organized into $2^m$ lines 202, which are indexed using an address having $m$ address bits. In one embodiment, nine bits of address are used, which results in a BHT 140 having 512 lines. Within each line 202 there are $2^n$ counters 204, where $n$ is the number of bits used to select the appropriate counter. Additionally, 3 bits of address may be used to select the counter 204, resulting in a BHT 140 that has eight counters 204 per line 202. In one exemplary embodiment, fetch group address bits 12 through 4 may be used to select the line 202 in the BHT 140. Bits 3-1 of the fetch group address may be used to select the specific counter 204.
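The organization above can be sketched in Python as a list of 512 lines, each holding eight 2-bit counters, with a helper that splits a fetch group address into a line number (bits 12-4) and a counter number (bits 3-1). This sketch ignores any history hashing, and the constant names and initial counter value are assumptions.

```python
M_BITS = 9  # m: line-select bits, giving 2**9 = 512 lines
N_BITS = 3  # n: counter-select bits, giving 2**3 = 8 counters per line


def line_and_counter(fetch_addr: int) -> tuple[int, int]:
    """Bits 12-4 select the line; bits 3-1 select the counter within it."""
    line = (fetch_addr >> 4) & ((1 << M_BITS) - 1)
    counter = (fetch_addr >> 1) & ((1 << N_BITS) - 1)
    return line, counter


# Every counter starts at 1 (weakly not taken); the starting value is arbitrary here.
bht = [[1] * (1 << N_BITS) for _ in range(1 << M_BITS)]

line, counter = line_and_counter(0x1234)
print(line, counter, bht[line][counter])  # 291 2 1
```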
- The processor 100 may identify branch instructions earlier in the instruction execution process, prior to an instruction decode stage. When branch instructions are identified earlier, branch history information, such as the prediction value (for a conditional branch instruction) or the taken branch direction (for an unconditional branch instruction), may also be identified at the same time.
- A Working Global History Register (WGHR), as will be described in the discussion of FIG. 3, may be used by the processor 100 to receive and process the branch history information earlier in the instruction execution process.
- In one embodiment, the WGHR may store the prediction values of conditional branch instructions as well as the branch directions of unconditional branch instructions.
- In another embodiment, the WGHR may store only the prediction values of conditional branch instructions.
- The output of the WGHR may be employed to index a corresponding entry in the BHT 140 for the next conditional branch instruction.
- FIG. 3 displays a lower level logic block diagram 300 of the processor 100 including a Working Global History Register (WGHR) 316 .
- The upper pipe 350 includes four instruction execution stages: an Instruction Cache 1 Stage (IC 1) 304, an Instruction Cache 2 Stage (IC 2) 306, an Instruction Data Alignment Stage (IDA) 308 and a Decode Stage (DCD) 310.
- The fetch logic circuit 302, as well as the upper pipe 350, the Working Global History Register (WGHR) 316, the Global History Register (GHR) 314, the Branch Correction logic circuit (BCL) 330, the selection mux 322 and the address hashing logic circuit 320, may also be located within the IFU 122.
- The fetch logic circuit 302 determines what instructions are to be fetched during the IC 1 stage 304. To retrieve the instructions, the fetch logic circuit 302 sends the fetch group address to the Instruction Cache 106. If the fetch group address is found within the Instruction Cache 106 (i.e. an instruction cache hit), the instructions are read from the hit cache line in the Instruction Cache 106 during the IC 2 stage 306.
- The processor 100 also sends the fetch group address to the BTAC 130. If the processor 100 encounters a BTAC hit, the information stored within the BTAC 130 for the fetch group address is received during the IC 2 stage 306.
- Information stored within the BTAC 130 may include branch information such as a branch target, a processor mode and a taken branch direction (in the case of an unconditional branch instruction).
- The fetch logic circuit 302 also sends the fetch group address to the address hashing logic circuit 320.
- Within the address hashing logic circuit 320, bits 12-4 of the fetch group address are exclusive-ORed (XORed) with the output of the selection mux 322.
- The output of the address hashing logic circuit 320 (i.e. the XOR function) provides the address index into the BHT 140.
- Bits 3-1 of the fetch group address may provide the selection bits to select the appropriate counter 204.
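The indexing performed by the address hashing logic circuit 320 can be sketched as follows. The Python function assumes a nine-bit history value taken from whichever input the selection mux 322 currently passes through; the function name and example values are hypothetical. Only the line selection is hashed with history, while bits 3-1 pick the counter directly.

```python
def hashed_bht_index(fetch_addr: int, mux_output: int) -> tuple[int, int]:
    """Return (line, counter) for the BHT.

    line    = fetch group address bits 12-4 XOR the nine-bit history from the mux
    counter = fetch group address bits 3-1 (not hashed)
    """
    line = ((fetch_addr >> 4) & 0x1FF) ^ (mux_output & 0x1FF)
    counter = (fetch_addr >> 1) & 0x7
    return line, counter


# The same fetch group maps to different lines under different histories.
print(hashed_bht_index(0x1234, 0b000000000))  # (291, 2)
print(hashed_bht_index(0x1234, 0b111111111))  # (220, 2)
```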
- The processor 100 reads the results of sending the instruction fetch group address to the Instruction Cache 106, the BTAC 130 and the BHT 140. In the IC 2 stage 306, the processor 100 determines whether a BTAC hit has occurred. When a BTAC hit is confirmed during the IC 2 stage 306, the processor 100 also determines whether the branch is a conditional or an unconditional branch instruction. In the IC 2 stage 306, the prediction value from the BHT 140 is also received and stored.
- Because each cache line in the Instruction Cache 106 may contain multiple instructions, the individual instructions may need to be separated from a cache line. In addition, data may be intertwined with the instructions in the cache line. The information from the cache line may need to be formatted and aligned in order to properly analyze and execute the instructions. The alignment and formatting of the instructions into individual executable instructions occurs during the IDA stage 308.
- After the instructions are processed during the IDA stage 308, they pass through the Decode (DCD) stage 310.
- In the DCD stage 310, the instructions are analyzed to determine the type of instruction and what additional information or resources may be required for further processing.
- The processor 100 may hold the instruction in the DCD stage 310 or pass it on to either of the lower pipelines 160 or 170 for further execution.
- For a conditional branch instruction, the processor 100 confirms during the DCD stage 310 that the instruction is a conditional branch instruction and confirms the instruction's prediction value (read from the BHT 140 during the IC 2 stage 306). The accuracy of the prediction value will be verified during a later stage of instruction execution in either of the lower pipelines 160 or 170.
- Until a branch prediction is determined to be incorrect (i.e. a mispredict), the processor 100 assumes that the prediction value is the true value and proceeds to fetch instructions based on this prediction.
- The WGHR 316 allows the processor 100 to store and process branch history information associated with branch instructions that have been identified prior to the DCD stage 310.
- The WGHR 316 may be loaded with the prediction value from the BHT 140 for a conditional branch instruction when a BTAC hit occurs.
- A BTAC hit signifies that the instruction being fetched is a branch instruction and has associated branch history information (e.g. a prediction value for a conditional branch instruction or a taken direction for an unconditional branch instruction).
- As a result, the processor 100 can use the branch history information sooner for subsequent branch predictions (i.e. the branch history information is more current), rather than waiting until the branch instruction is confirmed during the DCD stage 310.
- The output of the WGHR 316 is sent to the address hashing logic circuit 320 to determine the address index for the next entry in the BHT 140.
- In one embodiment, the branch history information and the BTAC hit may be received during the IC 2 stage 306.
- In another embodiment, the branch history information and the BTAC hit may be received during the IDA stage 308.
- More generally, the branch history information and the BTAC hit may be available during any of the stages prior to a decoding stage.
- In one embodiment, the branch history information for conditional branch instructions is shifted into the WGHR 316 during the IC 2 stage 306 (when a BTAC hit occurs).
- In another embodiment, branch history information for both conditional and unconditional branch instructions is shifted into the WGHR 316.
- The WGHR 316 may also be updated during the IDA stage 308 with branch history information. This situation may occur when the prediction value stored in the BHT 140 or the BTAC hit information is not available until the IDA stage 308.
- The selection mux 322 is configured to receive the output of the WGHR 316.
- In one embodiment, the output of the WGHR 316 is a nine-bit value containing the branch history of the last nine branch instructions processed by the processor 100.
- The output of the selection mux 322 is used as input to the address hashing logic circuit 320, which indexes into the BHT 140 for the next conditional branch instruction.
- The GHR 314 operates much like the WGHR 316, except that the GHR 314 may be loaded with the branch history information during the DCD stage 310.
- The contents of the GHR 314 will mirror the contents of the WGHR 316 once the branch instruction passes through the DCD stage 310.
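The relationship between the two registers can be sketched in Python. The model assumes that the WGHR is shifted when a BTAC hit exposes a branch in the IC 2 stage and that the GHR is shifted when the same branch is confirmed in the DCD stage. BTAC misses and mispredict recovery are left out, the nine-bit width is an assumption, and the class and method names are hypothetical.

```python
from dataclasses import dataclass

WIDTH = 9  # assumed history length


def shift_in(reg: int, taken: bool) -> int:
    """Newest outcome enters at the MSB of a WIDTH-bit register."""
    return (reg >> 1) | (int(taken) << (WIDTH - 1))


@dataclass
class HistoryRegisters:
    wghr: int = 0  # working copy, updated early (IC 2)
    ghr: int = 0   # confirmed copy, updated at decode (DCD)

    def ic2(self, btac_hit: bool, predicted_taken: bool) -> None:
        # Only a BTAC hit reveals a branch this early in the upper pipe.
        if btac_hit:
            self.wghr = shift_in(self.wghr, predicted_taken)

    def dcd(self, predicted_taken: bool) -> None:
        # The branch is confirmed at decode, so the GHR catches up.
        self.ghr = shift_in(self.ghr, predicted_taken)


regs = HistoryRegisters()
regs.ic2(btac_hit=True, predicted_taken=True)
print(format(regs.wghr, "09b"), format(regs.ghr, "09b"))  # 100000000 000000000
regs.dcd(predicted_taken=True)
print(regs.wghr == regs.ghr)  # True
```

Between the IC 2 and DCD stages of a branch, the WGHR already carries its predicted outcome while the GHR does not, which is exactly the window in which a following branch benefits from the more current history.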
- Under certain circumstances, the output of the GHR 314 may be used to index the prediction value in the BHT 140.
- The output of the GHR 314 is coupled to the selection mux 322.
- For example, when a branch instruction returns a BTAC miss but is later predicted taken, the selection mux 322 is directed to select the output of the GHR 314 to be used by the address hashing logic circuit 320 for indexing. In this instance, the GHR 314 is used because the WGHR 316 does not yet have the branch history information for the taken branch (due to the BTAC miss).
- The output of the GHR 314 may also be used by the address hashing logic circuit 320 when a BTAC miss occurs, because the WGHR 316 may have been updated by a subsequently fetched branch instruction before the BHT 140 is indexed for the current branch instruction. In this instance, the WGHR 316 may not reflect the proper value for the current branch instruction, and if it were used by the address hashing logic circuit 320, an incorrect entry in the BHT 140 could be indexed.
- The output of the GHR 314 is also coupled to the Branch Correction Logic circuit (BCL) 330.
- The BCL 340 uses the GHR 314 to provide a “true” copy of the branch history information, which is used for recovery purposes should a mispredict occur.
- The BCL 340 restores the branch history information in both the GHR 314 and the WGHR 316.
- A mispredict occurs when a branch instruction reaches a resolution stage and the actual results do not match the predicted results.
- When a mispredict occurs, the BCL 340 sends information to the fetch logic circuit 302 which directs the fetch logic circuit 302 to flush instructions that were fetched based on the mispredicted conditional branch instruction. To be more efficient, the BCL 340 may restore the GHR 314 and the WGHR 316 to the correct branch history information at the same time it provides the correct branch history information to the selection mux 322. When the mispredict occurs, the processor 100 may select the output of the BCL 340 (through the selection mux 322) to be directed to the address hashing logic circuit 320 for use in indexing the appropriate counter 204.
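The choice made by the selection mux 322 can be summarized as a small Python function. The priority order used here (BCL output on a mispredict, then the GHR on a BTAC miss, otherwise the WGHR) is one reading of the description above rather than a stated truth table, and the function and flag names are hypothetical.

```python
def select_history(mispredict: bool, btac_miss: bool,
                   bcl_value: int, ghr: int, wghr: int) -> tuple[str, int]:
    """Hypothetical model of selection mux 322 feeding the address hashing logic."""
    if mispredict:
        # Corrected history from the BCL; younger history came from the wrong path.
        return "BCL", bcl_value
    if btac_miss:
        # The WGHR may lack this branch's history or already hold younger branches.
        return "GHR", ghr
    # Normal case: the WGHR holds the most current (speculative) history.
    return "WGHR", wghr


print(select_history(mispredict=False, btac_miss=False, bcl_value=0, ghr=0b100, wghr=0b110))
```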
- In this way, the BCL 340 restores the GHR 314 and the WGHR 316 to their proper values.
- The BCL 340 may take a snapshot of the GHR 314 after the GHR 314 is loaded with a prediction value for a conditional branch instruction. The BCL 340 may then invert the most recent prediction value (i.e. the MSB) of the GHR 314. By taking the opposite of the prediction value, the BCL 340 prepares a corrected value which should be reflected in the GHR 314 and the WGHR 316 if a mispredict occurs.
- For example, the BCL 340 may flip the MSB corresponding to the conditional branch instruction and store the resulting corrected value (e.g. “001011111”) linked to the conditional branch instruction.
- If the conditional branch instruction is incorrectly predicted, the corrected value is ready to be sent to the GHR 314, the WGHR 316 and the selection mux 322.
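The snapshot-and-flip scheme can be sketched in Python. Branch tags, the nine-bit width and the method names are assumptions, and the snapshot value 101011111 in the usage lines is chosen only because flipping its MSB produces the corrected value 001011111 quoted above.

```python
WIDTH = 9
MSB = 1 << (WIDTH - 1)  # most recent prediction lives in the MSB (assumed ordering)


class BranchCorrection:
    """Sketch of the BCL: precompute a corrected history per conditional branch."""

    def __init__(self) -> None:
        self.corrected: dict[str, int] = {}  # hypothetical branch tag -> corrected history

    def snapshot(self, branch_tag: str, ghr: int) -> None:
        # Invert the newest prediction bit so the opposite-direction history is ready early.
        self.corrected[branch_tag] = ghr ^ MSB

    def recover(self, branch_tag: str) -> int:
        # On a mispredict, this value is loaded into the GHR and WGHR and fed to mux 322.
        return self.corrected.pop(branch_tag)

    def retire(self, branch_tag: str) -> None:
        # Prediction was correct: the precomputed value is no longer needed.
        self.corrected.pop(branch_tag, None)


bcl = BranchCorrection()
bcl.snapshot("br0", 0b101011111)
print(format(bcl.recover("br0"), "09b"))  # 001011111
```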
- FIG. 4 displays a detailed view 400 of the WGHR 316 , the GHR 314 and the BCL 340 .
- A WGHR selection mux 402 receives branch history information from the IC 2 stage 306 and the DCD stage 310, as well as corrected branch history information from the BCL 340.
- A GHR selection mux 404 receives branch history information from the DCD stage 310 and corrected branch history information from the BCL 340.
- The WGHR selection mux 402 selects which input is used to load the WGHR 316 with branch history information.
- The input from the BCL 340 has priority over information being sent from the IC 2 stage 306 or the DCD stage 310.
- The BCL 340 has priority because subsequent branch history information following a mispredict may be associated with conditional branch instructions fetched down the incorrectly predicted branch path. Therefore, the branch history information passed by the IC 2 stage 306 or the DCD stage 310 may also be incorrect.
- The input selection for the WGHR selection mux 402 may be determined according to the following examples, listed from highest to lowest priority:
- If a branch instruction returns a BTAC miss during the IC 2 stage 306 but ends up predicted taken after being decoded during the DCD stage 310, the branch history value confirmed during the DCD stage 310 is shifted into the WGHR 316.
- The DCD stage 310 has priority in this case because instructions fetched after the predicted taken branch instruction need to be flushed. Therefore, any branch history information identified during the IC 2 stage 306 for a subsequent branch instruction, which may be ready to write into the WGHR 316 during the same processor cycle, is discarded.
- If the DCD stage 310 is not processing a branch instruction associated with a BTAC miss, the IC 2 stage 306 has the next highest priority. As long as a BTAC hit occurs for the branch instruction, the branch history information identified during the IC 2 stage 306 is shifted into the WGHR 316.
- the WGHR 316 will be rewritten once more from the DCD stage 310 .
- the WGHR 316 is written with this branch history information. The writing of the WGHR 316 ensures that the GHR 314 and the WGHR 316 will be synchronized after the instruction passes through the decode stage 310 .
- the GHR selection mux 404 selects the appropriate input used to update the GHR 314 . Similar to the WGHR selection logic 402 , the GHR selection mux 404 gives the input from the BCL 340 the highest priority for the same reasons as explained above. Thus if no mispredict occurs, the GHR 314 is updated with branch history information identified during the DCD stage 310 for a particular branch instruction.
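- The priority rules for the two selection muxes can be modeled as "first valid input wins." This is a behavioral sketch only, assuming each mux input is either a new register value or `None` (no write this cycle); the function and parameter names are hypothetical, and the ordering of `dcd_rewrite` after `ic2_hit` simply follows the order of the examples listed above rather than any documented circuit.

```python
from typing import Optional


def select_wghr(current: int,
                bcl: Optional[int] = None,
                dcd_miss_taken: Optional[int] = None,
                ic2_hit: Optional[int] = None,
                dcd_rewrite: Optional[int] = None) -> int:
    """Behavioral model of WGHR selection mux 402: first non-None input wins."""
    # BCL correction first, then the three cases in the order listed above.
    for value in (bcl, dcd_miss_taken, ic2_hit, dcd_rewrite):
        if value is not None:
            return value
    return current  # no write this cycle: the register holds its value


def select_ghr(current: int,
               bcl: Optional[int] = None,
               dcd: Optional[int] = None) -> int:
    """Behavioral model of GHR selection mux 404: BCL first, then the DCD stage."""
    for value in (bcl, dcd):
        if value is not None:
            return value
    return current


# A pending correction overrides an IC2-stage update in the same cycle.
print(select_wghr(0b100, bcl=0b000, ic2_hit=0b110))
```

Testing `is not None` rather than truthiness matters here, because an all-zeros history is a legitimate value to load.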
- FIG. 6 shows a timing diagram 600 of the exemplary group of instructions 500 as they move through the upper pipe 350.
- The exemplary group of instructions 500 includes multiple branch instructions.
- The X-axis 602 of FIG. 6 depicts the processor cycle, and the Y-axis 604 shows the execution stage within the upper pipe 350 that each instruction passes through, as well as the contents of the GHR 314 and WGHR 316.
- The GHR 314 and the WGHR 316 are written during one processor cycle and latched at the beginning of the next processor cycle.
- The latched contents of the GHR 314 and WGHR 316 are displayed. For ease of illustration, only the three most significant bits of the GHR 314 and the WGHR 316 are shown.
- The instructions move down the Y-axis 604.
- During Processor Cycle 1, the fetch logic circuit 302 sends a fetch group address for instruction A to the Instruction Cache 106, the BTAC 130 and the address hashing logic circuit 320. This is shown in the timing diagram 600 as instruction A entering the IC1 stage 304. Also in Processor Cycle 1, the three most significant bits of the GHR 314 and WGHR 316 are all zeros, indicating that the last three branch instructions executed were all not taken.
- In Processor Cycle 2, the results of sending the fetch group address to the instruction cache 106, the BTAC 130 and the BHT 140 are received. This is shown in the timing diagram 600 as instruction A entering the IC2 stage 306. Since the instruction cache 106 stores multiple instructions in each cache line, instruction A+4 is also retrieved along with instruction A in the IC2 stage 306. Logic circuitry within the IC2 stage 306 analyzes the information received from the BTAC 130 and BHT 140. During the IC2 stage 306, the processor 100 determines that instruction A is a conditional branch instruction (based on the information from a BTAC hit) and obtains its prediction value from the BHT 140. In this example, instruction A is predicted taken.
- The actual entry in the BHT 140 for instruction A may be either strongly taken (11) or weakly taken (10).
- The processor 100 loads a “1” into the MSB of the WGHR 316 to reflect the prediction value associated with conditional branch instruction A. Since instruction A is predicted taken, the next sequential instruction (A+4) will not be the next instruction executed, so it is flushed after instruction A passes through the IC2 stage 306. As shown in the timing diagram 600, the value “100” is latched into the WGHR 316 at the start of Processor Cycle 3.
- In Processor Cycle 3, instruction A enters the IDA stage 308. While in the IDA stage 308, instruction A is formatted and aligned, preparing it to enter the DCD stage 310. While instruction A moves through the IDA stage 308, the fetch group address for instruction B is sent to the instruction cache 106, the BTAC 130 and the BHT 140 during the IC1 stage 304.
- In Processor Cycle 4, instruction A enters the DCD stage 310, the results of the fetch request for instructions B and B+4 are received (the IC2 stage 306), and the fetch group address for instruction B+8 is sent to the instruction cache 106, the BTAC 130 and the BHT 140 (the IC1 stage 304).
- The contents of the WGHR 316 (“100”) are selected by the selection mux 322 and used by the address hashing logic circuit 320 to index the entry in the BHT 140 for instruction B+8.
- During the DCD stage 310, the processor 100 confirms that instruction A is a conditional branch instruction, and as a result the prediction value (“1”) is shifted into the GHR 314.
- However, the processor 100 will not see the updated value of the GHR 314 from instruction A until the beginning of Processor Cycle 5, when the processor 100 latches the GHR 314.
- At the end of Processor Cycle 4, instruction A leaves the upper pipe 350 and is directed to the lower pipelines 160 or 170 for further execution.
- In a processor using only a GHR, the predicted value returned from a BHT for instruction B+8 may not be accurate. This is because the address hashing logic circuit would use the value of the GHR in Processor Cycle 4 (e.g., the value “000”) to determine the entry in the BHT for instruction B+8. This value of the GHR does not accurately reflect the actual branch history encountered by the processor, because the branch history information for instruction A is not yet reflected in it. If the same instruction sequence were subsequently executed, but this time the processor experienced a delay when fetching instruction B+8 (i.e.
- a different entry into the BHT may be accessed.
- Thus, a processor using only a GHR to store branch history information could access two different BHT entries for the same conditional branch instruction with the same instruction execution sequence.
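- To make the effect concrete, the sketch below computes the BHT line for instruction B+8 under the stale GHR-only history and under the WGHR history. It is an illustration under stated assumptions: a gshare-style XOR of fetch address bits 12-4 with a 9-bit history (the hashing described for the address hashing logic circuit 320), the six lower history bits taken as zero, and a hypothetical fetch group address of 0x1F40 for B+8; none of these numeric values come from the source.

```python
LINE_BITS = 9                      # 512 BHT lines
LINE_MASK = (1 << LINE_BITS) - 1


def bht_line_index(fetch_addr: int, history: int) -> int:
    """XOR fetch-address bits 12..4 with the history to pick a BHT line."""
    addr_bits = (fetch_addr >> 4) & LINE_MASK
    return addr_bits ^ (history & LINE_MASK)


FETCH_ADDR_B8 = 0x1F40             # hypothetical fetch group address of B+8
stale = bht_line_index(FETCH_ADDR_B8, 0b000_000000)    # GHR-only view in Cycle 4
current = bht_line_index(FETCH_ADDR_B8, 0b100_000000)  # WGHR view in Cycle 4
print(stale, current)  # prints 500 244
```

The same branch at the same address lands on two different BHT lines depending only on whether instruction A's prediction had reached the history register in time.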
- In the processor 100, the WGHR 316 is also rewritten with the prediction value of instruction A at the same time the GHR 314 is loaded.
- As a result, the two registers are synchronized for instruction A. Since it is uncommon for two conditional branch instructions to be predicted taken immediately following one another, there is little chance that synchronizing the two registers will lose any branch history information.
- In Processor Cycle 5, instructions B and B+4 enter the IDA stage 308 while instructions B+8 and B+12 enter the IC2 stage 306.
- Also in Processor Cycle 5, the fetch group address for instructions B+16 and B+20 is sent to the instruction cache 106, the BTAC 130 and the BHT 140.
- During the IC2 stage 306, instruction B+8 returns a BTAC hit. Because of the BTAC hit, the processor 100 also determines that instruction B+8 is a conditional branch instruction, and its prediction value returned from the BHT 140 is shifted into the WGHR 316. In this example, instruction B+8 is also predicted taken.
- The actual entry in the BHT 140 may be either strongly taken (11) or weakly taken (10). Because instruction B+8 is a predicted-taken branch instruction, instructions B+12, B+16 and B+20 will be flushed by the fetch logic circuit 302 after instruction B+8 leaves the IC2 stage 306, and the target address for instruction C (received from the BTAC hit) is directed to the fetch logic circuit 302.
- The contents of the WGHR 316 are updated with the taken prediction value (“1”), and the value is latched at the beginning of Processor Cycle 6, as reflected in the timing diagram 600.
- During Processor Cycle 6, instructions B and B+4 enter the DCD stage 310 while instruction B+8 enters the IDA stage 308. Also during Processor Cycle 6, the fetch group address for instruction C is sent to the Instruction Cache 106, the BTAC 130 and the BHT 140 (IC1 stage 304). At the end of Processor Cycle 6, instructions B and B+4 leave the upper pipe 350 and are directed to the lower pipelines 160 or 170 for further execution.
- In Processor Cycle 7, instruction B+8 is processed during the DCD stage 310.
- Instruction B+8 is confirmed as a conditional branch instruction, and its prediction value is also confirmed.
- The prediction value identified for instruction B+8 is shifted into the GHR 314 and reloaded into the WGHR 316 during Processor Cycle 7.
- Instructions C and C+4 are returned from the Instruction Cache 106 during the IC2 stage 306.
- At the end of Processor Cycle 7, instruction B+8 leaves the upper pipe 350 and is directed to the lower pipelines 160 or 170 for further execution.
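- The register contents in the walkthrough can be reproduced with a small cycle-level model. This sketch assumes the three-bit view of FIG. 6, a write-then-latch rule (values written in one cycle become visible at the start of the next), at most one branch event per cycle, and that a DCD-stage update resynchronizes the WGHR by copying the new GHR value; the event table is distilled from the text above, and all names are hypothetical.

```python
WIDTH = 3  # FIG. 6 shows only the three most significant bits


def shift_in(history: int, taken: bool, width: int = WIDTH) -> int:
    """Shift a prediction into the MSB; the oldest bit drops off the LSB."""
    return (int(taken) << (width - 1)) | (history >> 1)


# Branch events from the walkthrough: cycle -> (stage, predicted taken).
# Assumes at most one branch event per cycle, as in this example.
EVENTS = {
    2: ("IC2", True),  # instruction A: BTAC hit, predicted taken
    4: ("DCD", True),  # instruction A confirmed in decode
    5: ("IC2", True),  # instruction B+8: BTAC hit, predicted taken
    7: ("DCD", True),  # instruction B+8 confirmed in decode
}


def simulate(last_cycle: int = 8):
    """Return (cycle, latched GHR, latched WGHR) for each processor cycle."""
    ghr = wghr = 0
    rows = []
    for cycle in range(1, last_cycle + 1):
        rows.append((cycle, format(ghr, f"0{WIDTH}b"), format(wghr, f"0{WIDTH}b")))
        next_ghr, next_wghr = ghr, wghr
        event = EVENTS.get(cycle)
        if event is not None:
            stage, taken = event
            if stage == "IC2":
                next_wghr = shift_in(wghr, taken)
            else:  # DCD: load the GHR and resynchronize the WGHR to it
                next_ghr = shift_in(ghr, taken)
                next_wghr = next_ghr
        # Values written during this cycle are latched at the start of the next.
        ghr, wghr = next_ghr, next_wghr
    return rows


for cycle, ghr_bits, wghr_bits in simulate():
    print(f"Cycle {cycle}: GHR={ghr_bits} WGHR={wghr_bits}")
```

In Cycle 4, when the BHT entry for B+8 is indexed, the model's latched WGHR already reads 100 while the latched GHR still reads 000, matching the gap the WGHR is intended to close.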
- FIG. 7 is a flow chart displaying an instruction process flow 700 taken by the processor 100 when executing instructions using a Working Global History Register (WGHR) 316.
- The instruction process flow 700 starts at block 702.
- The instruction process flow 700 proceeds to block 704, where the fetch logic circuit 302 sends the fetch group address to the BTAC 130 and the address hashing logic circuit 320 (for indexing into the BHT 140).
- The sending of the fetch group address may occur during the IC1 stage 304 of the processor 100.
- The results of searching the BTAC 130 (to determine whether the instruction being fetched is a branch instruction) are returned during the IC2 stage 306.
- The instruction process flow 700 then proceeds to decision block 706.
- At decision block 706, the processor 100 determines whether a BTAC hit has occurred. This determination may also occur during the IC2 stage 306. As explained previously, a BTAC hit may occur for a conditional branch instruction or a taken unconditional branch instruction. If there is no BTAC hit (i.e., a BTAC miss), the instruction process flow 700 proceeds directly to block 712.
- If there is a BTAC hit, the instruction process flow 700 proceeds to block 710.
- At block 710, the WGHR 316 is updated by shifting the prediction value retrieved from the BHT 140 into the WGHR 316. For example, a “1” is shifted into the WGHR 316 if the branch instruction is predicted taken, or a “0” is shifted in if it is predicted not taken.
- The prediction value may be returned during any processor execution stage prior to a decode stage. In the embodiment previously described, the WGHR 316 is updated during the IC2 stage 306.
- The instruction process flow 700 then proceeds to block 712, where the instruction passes through a decode stage (e.g., the DCD stage 310). During the decode stage, at block 712, the instruction may be confirmed as a branch instruction. After the instruction is processed in the decode stage, the instruction process flow 700 proceeds to decision block 714. If, at decision block 714, the instruction is not a branch instruction, the instruction process flow 700 ends at block 720.
- If the processor 100 confirms that the instruction is a branch instruction, the instruction process flow 700 proceeds to block 716.
- At block 716, the WGHR 316 and GHR 314 are updated with the appropriate branch history information, and the instruction process flow 700 ends at block 720.
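- The decision structure of FIG. 7 can be traced for a single instruction with the sketch below. It is a simplified model, not the patented circuit: it assumes a 9-bit history with the newest bit in the MSB, treats the prediction (or taken direction) as a boolean input, and models block 716 as shifting into the GHR and then copying that value into the WGHR. The function name `process_flow_700` is hypothetical.

```python
from typing import List, Tuple

WIDTH = 9  # assumed history width


def shift_in(history: int, taken: bool, width: int = WIDTH) -> int:
    """Shift a prediction/direction into the MSB of a history value."""
    return (int(taken) << (width - 1)) | (history >> 1)


def process_flow_700(btac_hit: bool, predicted_taken: bool,
                     confirmed_branch: bool, wghr: int,
                     ghr: int) -> Tuple[List[int], int, int]:
    """Walk the FIG. 7 blocks for one instruction.

    Returns the visited block numbers and the updated (WGHR, GHR) values.
    """
    visited = [702, 704, 706]  # start, send fetch group address, BTAC hit?
    if btac_hit:
        visited.append(710)  # early (IC2) update of the WGHR only
        wghr = shift_in(wghr, predicted_taken)
    visited += [712, 714]  # decode stage, then "is it a branch?"
    if confirmed_branch:
        visited.append(716)  # update GHR, resynchronize WGHR to it
        ghr = shift_in(ghr, predicted_taken)
        wghr = ghr
    visited.append(720)  # end
    return visited, wghr, ghr


print(process_flow_700(True, True, True, wghr=0, ghr=0))
```

A non-branch instruction with a BTAC miss passes straight through blocks 706, 712 and 714 and leaves both registers unchanged.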
- DSP: digital signal processor
- ASIC: application specific integrated circuit
- FPGA: field programmable gate array
- A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
- A processor may also be implemented as a combination of computing components, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
Abstract
Description
- 1. Field of Invention
- The present invention relates generally to computer systems, and more particularly to a method and a system for using a working global history register.
- 2. Relevant Background
- At the heart of the computer platform evolution is the processor. Early processors were limited by the technology available at that time. New advances in fabrication technology allow transistor designs to be reduced up to and exceeding 1/1000th of the size of early processors. These smaller processor designs are faster, more efficient and use substantially less power while delivering processing power exceeding prior expectations.
- As the physical design of the processor evolved, innovative ways of processing information and performing functions have also changed. For example, “pipelining” of instructions has been implemented in processor designs since the early 1960s. One example of pipelining is the concept of breaking execution pipelines into units or stages, through which instructions flow sequentially in a stream. The stages are arranged so that several stages can be simultaneously processing the appropriate parts of several instructions. One advantage of pipelining is that the execution of the instructions is overlapped because the instructions are evaluated in parallel.
- A processor pipeline is composed of many stages where each stage performs a function associated with executing an instruction. Each stage is referred to as a pipe stage or pipe segment. The stages are connected together to form the pipeline. Instructions enter at one end of the pipeline and exit at the other end.
- Most programs executed by the processor include conditional branch instructions, the actual branching behavior of which is not known until the instruction is evaluated deep in the pipeline. To avoid a stall that would result from waiting for actual evaluation of the branch instruction, modern processors may employ some form of branch prediction, whereby the branching behavior of a conditional branch instruction is predicted early in the pipeline. Based on the predicted branch evaluation, the processor speculatively fetches and executes instructions from a predicted address: either the branch target address (if the branch is predicted to be taken) or the next sequential address after the branch instruction (if the branch is predicted not to be taken). Determining whether a conditional branch instruction is taken or not taken is referred to as determining the direction of the branch. The direction of the branch may be determined both at prediction time and at actual branch resolution time. When the actual branch behavior is determined, if the branch was mispredicted, the speculatively fetched instructions must be flushed from the pipeline, and new instructions fetched from the correct address. Speculatively fetching instructions in response to an erroneous branch prediction can adversely impact processor performance and power consumption. Consequently, improving the accuracy of branch predictions is an important processor design goal.
- One known form of branch prediction includes partitioning branch prediction into two predictors: an initial branch target address cache (BTAC) and a branch history table (BHT). The BTAC is indexed by an instruction fetch group address and contains the next fetched address, also referred to as the branch target, corresponding to the instruction fetch group address. Entries are added to the BTAC after a branch instruction has passed through the processor pipeline and its branch has been taken. If the BTAC becomes full, entries are removed from the BTAC using standard cache replacement algorithms (such as round robin or least-recently used) when the next entry is being added.
- The BTAC may be a highly associative cache design and is accessed early in the instruction execution pipeline. If the fetch group address matches a BTAC entry (a BTAC hit), the corresponding next fetch address or target address is fetched in the next cycle. This match and subsequent fetching of the target address is referred to as an implicit taken branch prediction. If there is no match (a BTAC miss), the next sequentially incremented address is fetched in the next cycle. This no-match situation is also referred to as an implicit not-taken prediction.
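- The implicit taken/not-taken behavior of a BTAC, together with the add-on-taken and LRU-replacement rules described above, can be sketched as a tiny software model. This is illustrative only: it assumes a fully associative table keyed by fetch group address, uses Python's `collections.OrderedDict` to track recency, and assumes an 8-byte fetch group (two 4-byte instructions, as in the A/A+4 and B/B+4 pairs of the later example). The class and function names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class SimpleBTAC:
    """Toy fully associative BTAC with LRU replacement (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, int]" = OrderedDict()  # fetch addr -> target

    def lookup(self, fetch_addr: int) -> Optional[int]:
        """Return the stored target on a hit, or None on a miss."""
        target = self.entries.get(fetch_addr)
        if target is not None:
            self.entries.move_to_end(fetch_addr)  # mark most recently used
        return target

    def add_taken_branch(self, fetch_addr: int, target: int) -> None:
        """Record a branch after it has been resolved as taken."""
        if fetch_addr in self.entries:
            self.entries.move_to_end(fetch_addr)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[fetch_addr] = target


def next_fetch_address(btac: SimpleBTAC, fetch_addr: int,
                       fetch_group_bytes: int = 8) -> int:
    """BTAC hit -> implicit taken (target); miss -> implicit not taken."""
    target = btac.lookup(fetch_addr)
    return target if target is not None else fetch_addr + fetch_group_bytes


btac = SimpleBTAC(capacity=2)
print(hex(next_fetch_address(btac, 0x100)))  # miss: next sequential group
btac.add_taken_branch(0x100, 0x400)
print(hex(next_fetch_address(btac, 0x100)))  # hit: branch target
```

Round-robin replacement, the other policy mentioned above, would replace the `OrderedDict` recency tracking with a simple rotating victim index.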
- BTACs may be utilized in conjunction with a more accurate individual branch direction predictor such as a branch history table (BHT) also known as a pattern history table (PHT). A conventional BHT may contain a set of saturating predicted direction counters to produce a more accurate taken/not-taken decision for individual branch instructions. For example, each saturating predicted direction counter may comprise a 2-bit counter that assumes one of four states, each assigned a weighted prediction value, such as:
- 11—Strongly predicted taken
- 10—Weakly predicted taken
- 01—Weakly predicted not taken
- 00—Strongly predicted not taken
- The output of a conventional BHT, also referred to as a prediction value, is a taken or not taken decision which results in either fetching the target address of the branch instruction or the next sequential address in the next cycle. The BHT is commonly updated with branch outcome information as it becomes known.
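- A 2-bit saturating direction counter with the four states listed above can be modeled as follows. The increment-on-taken, decrement-on-not-taken update rule is a common policy assumed here for illustration, as is the weakly-not-taken initial state; the class name is hypothetical.

```python
class TwoBitCounter:
    """2-bit saturating counter: 00/01 predict not taken, 10/11 predict taken."""

    def __init__(self, state: int = 0b01) -> None:
        # Initial state is an assumption (weakly predicted not taken).
        self.state = state

    def predict_taken(self) -> bool:
        return self.state >= 0b10  # the counter's upper bit is the prediction

    def update(self, taken: bool) -> None:
        """Move one step toward the actual outcome, saturating at 00 and 11."""
        if taken:
            self.state = min(self.state + 1, 0b11)
        else:
            self.state = max(self.state - 1, 0b00)


counter = TwoBitCounter()
for outcome in (True, True, False):  # hypothetical resolved outcomes
    counter.update(outcome)
print(format(counter.state, "02b"), counter.predict_taken())  # prints 10 True
```

The hysteresis is the point of the design: a single not-taken outcome moves a strongly taken counter only to weakly taken, so the prediction does not flip.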
- In order to increase the accuracy of branch predictions, various other prediction techniques may be implemented which use recent branch history information from other branches as feedback. As those skilled in the art appreciate, current branch behavior may be correlated to the history of previously executed branch instructions. For example, the history of previously executed branch instructions may influence how a conditional branch instruction is predicted.
- A Global History Register (GHR), also referred to in the art as a global branch history register or a global history shift register, may be used to keep track of the past history of previously executed branch instructions. As stored by the GHR, the branch history provides a view of the sequence of branch instructions encountered in the code path leading up to the presently executed branch instruction in order to achieve improved prediction results.
- In some processors, identification of a branch instruction and its associated prediction information may occur only after an instruction decode stage. Commonly, the instruction decode stage may be a later stage in the instruction execution sequence. After an instruction is decoded and confirmed as a branch instruction, the GHR is loaded with appropriate branch history information. As the branch history information is identified it is shifted into the GHR. The output of the GHR is used to identify the prediction value stored in the BHT which is used to predict the next conditional branch instruction.
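- A GHR is, in software terms, a fixed-width shift register of branch outcomes. The sketch below follows the convention used later in this document, with the most recent outcome entering at the MSB; the 9-bit width and the class name are assumptions for illustration.

```python
class GlobalHistoryRegister:
    """Fixed-width shift register; the newest outcome enters at the MSB."""

    def __init__(self, width: int = 9) -> None:
        self.width = width
        self.value = 0

    def shift_in(self, taken: bool) -> None:
        # Push the new bit in at the top; the oldest bit falls off the bottom.
        self.value = (int(taken) << (self.width - 1)) | (self.value >> 1)

    def __str__(self) -> str:
        return format(self.value, f"0{self.width}b")


ghr = GlobalHistoryRegister()
for outcome in (False, True, True):  # hypothetical outcomes, oldest first
    ghr.shift_in(outcome)
print(ghr)  # prints 110000000
```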
- In a conventional processor using a GHR, the GHR may not reflect the actual branch history information encountered when multiple branch instructions are executed in parallel during a relatively short period of time. In this instance, the GHR may not be updated with the branch history information from the first branch instruction before the second branch instruction is predicted. As a result, an inaccurate value of the GHR may be used to identify the entry in the BHT for the second conditional branch instruction. Using an inaccurate value to index the entry in the BHT may affect the accuracy of the branch prediction. If the processor had been able to keep pace with the branch history information from the first conditional branch instruction, a different value would have been stored in the GHR and a different entry in the BHT would have been identified for the second conditional branch instruction.
- Accordingly, there exists a need in the industry to have a processor that may store and use branch history information sooner than the GHR in order to achieve more accurate branch predictions. The present disclosure recognizes this need and discloses a processor which identifies branch instructions early in the execution stages of the processor. Using the branch instruction information as input, the processor may steer the selection of prediction values for subsequent conditional branch instructions.
- A method of processing branch history information is disclosed. The method identifies branch instructions during a first pipeline stage and loads the branch history information into a first register during the first pipeline stage. The method confirms the branch instructions in a second pipeline stage, and the branch history information is loaded into a second register during the second pipeline stage.
- A pipeline processor comprising a first register having branch history information and a second register having branch history information is disclosed. The pipeline processor has a plurality of pipeline stages, wherein the first register is loaded with the branch history information in a first pipeline stage when a branch instruction is identified, and the second register is loaded with branch history information during a second pipeline stage.
- A method of processing branch history information is disclosed. The method fetches a branch instruction, identifies the branch instruction during a first pipeline stage and loads the branch history information into a first register during the first pipeline stage. The method confirms the branch instruction in a second pipeline stage, and the branch history information is loaded into a second register during the second pipeline stage.
- A more complete understanding of the present invention, as well as further features and advantages of the invention, will be apparent from the following detailed description and the accompanying drawings.
- FIG. 1 shows a high level logic hardware block diagram of a processor using an embodiment of the present invention.
- FIG. 2 displays an exemplary branch history table used by the processor of FIG. 1.
- FIG. 3 shows a lower level logic block diagram of the processor of FIG. 1 employing a Working Global History Register.
- FIG. 4 shows a detailed view of the Working Global History Register and the Global History Register.
- FIG. 5 shows an exemplary group of instructions executed by the processor of FIG. 1.
- FIG. 6 shows a timing diagram of the exemplary group of instructions of FIG. 5 as they are executed through various stages of the processor of FIG. 1.
- FIG. 7 shows a flow chart illustrating an instruction process flow performed by the processor of FIG. 1 using a Working Global History Register.
- The detailed description set forth below in connection with the appended drawings is intended as a description of various embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring the concepts of the present invention. Acronyms and other descriptive terminology may be used merely for convenience and clarity and are not intended to limit the scope of the invention.
- FIG. 1 shows a high level view of a superscalar processor 100 utilizing an embodiment as hereinafter described. The processor 100 has a central processing unit (CPU) 102 that is coupled via a dedicated high speed bus 104 to an instruction cache 106. The instruction cache 106 is also coupled via a general purpose bus 116 to memory 114.
- Within the processor 100, an Instruction Fetch Unit (IFU) 122 controls the loading of instructions from memory 114 into the instruction cache 106. Once the instruction cache 106 is loaded with instructions, the CPU 102 is able to access them via the high speed bus 104. The instruction cache 106 may be a separate memory structure as shown in FIG. 1, or it may be integrated as an internal component of the CPU 102. The integration may hinge on the size of the instruction cache 106 as well as the complexity and power dissipation of the CPU 102. Also coupled to the IFU 122 are a Branch Target Address Cache 130 (BTAC), a Branch History Table 140 (BHT) and two lower pipelines 160 and 170.
- Instructions may be fetched and decoded from the instruction cache 106 several instructions at a time. Within the instruction cache 106, instructions are grouped into sections known as cache lines. Each cache line may contain multiple instructions as well as associated data. The number of instructions fetched may depend upon the required fetch bandwidth as well as the number of instructions in each cache line. Within the IFU 122, the fetched instructions are analyzed for operation type and data dependencies. After analyzing the instructions, the processor 100 may distribute the instructions from the IFU 122 to lower functional units or lower pipelines 160 and 170.
- Lower pipelines 160 and 170 may include execution units (EU) 118. An EU 118 such as an arithmetic logic unit may execute a wide range of arithmetic functions, such as integer addition, subtraction, simple multiplication, bitwise logic operations (e.g., AND, NOT, OR, XOR), bit shifting and the like. Additionally, when a branch instruction is resolved in the lower pipelines 160 or 170, the processor 100 may compare the actual results to the predicted results; if they do not match, a mispredict has occurred.
- Those skilled in the art appreciate that the BTAC 130 may be similar to a Branch Target Buffer (BTB) or a Branch Target Instruction Cache (BTIC). A BTB or BTIC stores both the address of a branch and the instruction data (or opcodes) of the target branch. For ease of illustration, the BTAC 130 is used in conjunction with the various embodiments of the present invention. Other embodiments of the invention may alternatively include a BTB or BTIC instead of the BTAC 130.
- The first time a branch instruction is executed, there is no entry in the BTAC 130 and a BTAC miss occurs. After the branch instruction finishes its execution, the BTAC 130 may be updated to reflect the target address of the particular conditional branch instruction as well as a processor mode (e.g., Arm vs. Thumb operation in the advanced RISC processor architecture). Any time thereafter that the branch instruction is fetched again, the information stored in the BTAC 130 will be fetched on the next processor cycle, even without completely decoding the fetched branch instruction.
- A BTAC hit (i.e., when the fetch group address matches an address in the BTAC 130) may occur for either a conditional or an unconditional branch instruction, because the BTAC 130 may store information relating to both conditional and unconditional branch instructions. In the case of a BTAC hit for an unconditional branch instruction, the predicted target address, the predicted mode of the processor, and the fact that the branch instruction is unconditional may be stored. Where an unconditional branch instruction address is stored in an entry in the BTAC 130, the entry will indicate a branch direction of taken.
- FIG. 2 displays a more detailed illustration of an exemplary Branch History Table (BHT) 140 used by the processor 100. The BHT 140 may be organized into 2^m lines 202, which are indexed using an address having m address bits. In one embodiment, nine bits of address are used, which results in a BHT 140 having 512 lines. Within each line 202 there are 2^n counters 204, where n is the number of bits used to select the appropriate counter. Additionally, 3 bits of address may be used to select the counter 204, resulting in a BHT 140 that has eight counters 204 per line 202. In one exemplary embodiment, fetch group address bits 12 through 4 may be used to select the line 202 in the BHT 140. Bits 3-1 of the fetch group address may be used to select the specific counter 204.
- The processor 100 may identify branch instructions earlier in the instruction execution process, prior to an instruction decode stage. When branch instructions are identified earlier, branch history information, such as the prediction value (for a conditional branch instruction) or the taken branch direction (for an unconditional branch instruction), may also be identified at the same time. A Working Global History Register (WGHR), as will be described in the discussion of FIG. 3, may be used by the processor 100 to receive and process the branch history information earlier in the instruction execution process. For example, a WGHR may store the prediction values of conditional branch instructions as well as the branch directions of unconditional branch instructions. Alternatively, a WGHR may store only the prediction values of conditional branch instructions. The output of the WGHR may be employed to index a corresponding entry in the BHT 140 for the next conditional branch instruction.
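- The BHT geometry of FIG. 2 (m = 9 line-select bits, n = 3 counter-select bits) can be expressed as a plain address split. This sketch shows only the unhashed selection described for FIG. 2; the hashing with history performed by the address hashing logic circuit 320 is not included. The table representation, the 0b01 initial counter value and the example address 0x1F46 are assumptions for illustration.

```python
from typing import List, Tuple

LINE_BITS = 9     # m: 2**9 = 512 lines
COUNTER_BITS = 3  # n: 2**3 = 8 counters per line


def bht_slot(fetch_addr: int) -> Tuple[int, int]:
    """Return (line, counter) for a fetch group address, without hashing."""
    line = (fetch_addr >> 4) & ((1 << LINE_BITS) - 1)        # bits 12..4
    counter = (fetch_addr >> 1) & ((1 << COUNTER_BITS) - 1)  # bits 3..1
    return line, counter


# 512 x 8 table of 2-bit counters; the 0b01 initial value is an assumption.
bht: List[List[int]] = [[0b01] * (1 << COUNTER_BITS) for _ in range(1 << LINE_BITS)]

line, counter = bht_slot(0x1F46)  # hypothetical fetch group address
print(line, counter, bht[line][counter])  # prints 500 3 1
```

Address bit 0 and bits 13 and above play no part in the selection in this embodiment, so addresses that differ only in those bits share a counter.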
FIG. 3 displays a lower level logic block diagram 300 of theprocessor 100 including a Working Global History Register (WGHR) 316. In the lower level block diagram 300 is an upper pipe 350. Coupled to the top of the upper pipe is fetch logic circuit 302. The upper pipe 350 includes four instruction execution stages, anInstruction Cache 1 Stage (IC1) 304, an Instruction Cache 2 Stage (IC2) 306, an Instruction Data Alignment Stage (IDA) 308 and a Decode Stage (DCD) 310. It should be noted that pipe stages may be added to or subtracted from upper pipe 350 without limiting the scope of the present disclosure. The fetch logic circuit 302 as well as the upper pipe 350, the Working Global History Register (WGHR) 316, Global History Register (GHR) 314, Branch Correction logic circuit (BCL) 330, selection mux 322, and address hashinglogic circuit 320 may also be located within theIFU 122. - As the
processor 100 begins executing instructions, the fetch logic circuit 302 determines what instructions are to be fetched during the IC1 stage 304. In order to retrieve the instructions, the fetch logic circuit 302 sends the fetch group address to theInstruction Cache 106. If the fetch group address is found within the Instruction Cache 106 (e.g. an instruction cache hit) the instructions are read from the hit cache line in theInstruction Cache 106 during the IC2 stage 304. - In parallel, during the IC1 stage 304, the
processor 100 sends the fetch group address to theBTAC 130. If theprocessor 100 encounters a BTAC hit, the information stored within the BTAC for the fetch group address is received during the IC2 Stage 306. As mentioned previously, information stored within theBTAC 130 may include branch information such as a branch target, processor mode, as well as a taken branch direction (in the case of an unconditional branch instruction). - Also during the IC1 stage 304, the fetch logic sends the fetch group address to the address hashing
logic circuit 320. Within the addressinghashing logic circuit 320, bits 12-4 of the fetch group address are exclusively or'd (XOR'd) with the output of the selection mux 322. The output of the address hashing logic circuit 320 (e.g. the XOR function) provides the address index into theBHT 140. As mentioned previously, bits 3-1 of the fetch group address may provide the selection bits to select theappropriate counter 204. - During the IC2 stage 306, the
processor 100 reads the results from sending the instruction fetch group address to theInstruction Cache 106, theBTAC 130 and theBHT 140. In the IC2 stage 306, theprocessor 100 determines if a BTAC hit has occurred. When a BTAC hit is confirmed during the IC2 stage 306 theprocessor 100 also determines if the branch is a conditional or unconditional branch instruction. In the IC2 stage 306 the prediction value from theBHT 140 is also received and stored. - Since each cache line in the
Instruction Cache 106 may contain multiple instructions, the individual instructions may need to be separated from a cache line. As well, data may be intertwined with the instructions in the cache line. The information from the cache line may need to be formatted and aligned in order to properly analyze and execute the instructions. The alignment and formatting of the instructions into individual executable instructions occurs during the IDA stage 308. - After the instructions are processed during the IDA stage 308, they pass through the Decode (DCD)
stage 310. During the DCD stage 310, the instructions are analyzed to determine the type of instruction and what additional information or resources may be required for further processing. Depending on the type of instruction or the current instruction load, the processor 100 may hold the instruction in the DCD stage 310 or pass it on to either of the lower pipelines. During the DCD stage 310, the processor 100 also confirms whether the instruction is a conditional branch instruction and, if so, confirms its prediction value (read from the BHT 140 during the IC2 stage 306). The accuracy of the prediction value is verified during a later stage of instruction execution in either of the lower pipelines; until then, the processor 100 assumes that the prediction value is correct and continues fetching instructions based on this prediction.
- Coupled to the upper pipe 350 is the Working Global History Register (WGHR) 316. The WGHR 316 allows the
processor 100 to store and process branch history information associated with branch instructions that have been identified before the DCD stage 310. In one embodiment, the WGHR 316 may be loaded with the prediction value from the BHT 140 for a conditional branch instruction when a BTAC hit occurs. As stated previously, a BTAC hit signifies that the instruction being fetched is a branch instruction and has associated branch history information (e.g. a prediction value for a conditional branch instruction or a taken direction for an unconditional branch instruction). Under this condition, the processor 100 can use the branch history information for subsequent branch predictions earlier (i.e., the branch history information is more current) instead of waiting until the branch instruction is confirmed during the DCD stage 310. The output of the WGHR 316 is sent, through the selection mux 322, to the address hashing logic circuit 320 to determine the address index for the next entry in the BHT 140.
- When the branch history information becomes available depends on how quickly the branch history information can be retrieved from the
BHT 140 and how quickly a BTAC hit can be acknowledged. In some processor designs, the branch history information and the BTAC hit may be received during the IC2 stage 306. In other processor designs, they may be received during the IDA stage 308. In yet other processor designs that incorporate stages other than those described above, the branch history information and the BTAC hit may be available during any of those stages that precede a decode stage.
- In one embodiment, the branch history information for conditional branch instructions is shifted into the WGHR 316 during the IC2 stage 306 (when a BTAC hit occurs). In another embodiment, branch history information for both conditional and unconditional branch instructions is shifted into the WGHR 316. In a further embodiment, the WGHR 316 may be updated with branch history information during the IDA stage 308. This situation may occur when the prediction value stored in the
BHT 140 or the BTAC hit information is not available until the IDA stage 308.
- The selection mux 322 is configured to receive the output of the WGHR 316. In one embodiment, the output of the WGHR 316 is a nine-bit value containing the branch history of the last nine branch instructions processed by the
processor 100. The output of the selection mux 322 is used as an input to the address hashing logic circuit 320, which indexes into the BHT 140 for the next conditional branch instruction.
- The GHR 314 operates much like the WGHR 316, except that the GHR 314 may be loaded with the branch history information during the
DCD stage 310. The contents of the GHR 314 mirror the contents of the WGHR 316 once the branch instruction passes through the DCD stage 310. Depending on the circumstances, the output of the GHR 314 may instead be used to index the prediction value.
- The output of the GHR 314 is coupled to the selection mux 322. When a BTAC miss occurs and it is determined during the
DCD stage 310 that the instruction is a taken branch instruction, the selection mux 322 is directed to select the output of the GHR 314 for use by the address hashing logic circuit 320 in indexing. In this instance, the GHR 314 is used because the WGHR 316 does not yet hold the branch history information for the taken branch (due to the BTAC miss). The output of the GHR 314 may also be used by the address hashing logic circuit 320 when a BTAC miss occurs, because the WGHR 316 may have been updated by a subsequently fetched branch instruction before the BHT 140 is indexed for the current branch instruction. In that case, the WGHR 316 may not reflect the proper value for the current branch instruction and, if it were used by the address hashing logic circuit 320, an incorrect entry in the BHT 140 could be indexed.
- The output of the GHR 314 is also coupled to the Branch Correction Logic circuit (BCL) 340. The
BCL 340 uses the GHR 314 to provide a “true” copy of the branch history information, which is used for recovery purposes should a mispredict occur. When a mispredict occurs, the BCL 340 restores the branch history information in both the GHR 314 and the WGHR 316. As mentioned previously, a mispredict occurs when a branch instruction reaches a resolution stage and the actual results do not match the predicted results.
- When a mispredict occurs, the
BCL 340 sends information to the fetch logic circuit 302 directing it to flush the instructions that were fetched based on the mispredicted conditional branch instruction. For efficiency, the BCL 340 may restore the GHR 314 and the WGHR 316 to the correct branch history information at the same time that it provides the correct branch history information to the selection mux 322. When the mispredict occurs, the processor 100 may select the output of the BCL 340 (through the selection mux 322) to be directed to the address hashing logic circuit 320 for use in indexing the appropriate counter 204.
- When the
processor 100 encounters a mispredict, the BCL 340 restores the GHR 314 and the WGHR 316 to their proper values. In one embodiment, the BCL 340 may take a snapshot of the GHR 314 after the GHR 314 is loaded with the prediction value for a conditional branch instruction. The BCL 340 may then invert the most recent prediction value (e.g. the MSB) in its copy of the GHR 314. By taking the opposite of the prediction value, the BCL 340 prepares the corrected value that the GHR 314 and the WGHR 316 should hold if a mispredict occurs. For example, suppose that after a conditional branch instruction and its prediction value are identified during the DCD stage 310, the GHR 314 and the BCL 340 are loaded with the value “101011111” (MSB=>LSB). The BCL 340 may flip the MSB corresponding to that conditional branch instruction and store the corrected value “001011111”, linked to the conditional branch instruction. Thus, if the conditional branch instruction was incorrectly predicted, the corrected value is ready to be sent to the GHR 314, the WGHR 316 and the selection mux 322.
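The snapshot-and-flip correction performed by the BCL 340 can be sketched in Python as follows. This is a minimal model under stated assumptions, not the patented circuit: it assumes a nine-bit history whose MSB holds the most recent prediction (matching the “101011111” example), and the names `shift_in`, `BranchCorrectionLogic`, `snapshot` and `recover` are hypothetical.

```python
HISTORY_BITS = 9                       # nine-bit history, as in the embodiment above
MASK = (1 << HISTORY_BITS) - 1
MSB = 1 << (HISTORY_BITS - 1)          # bit that holds the most recent prediction


def shift_in(history: int, taken: bool) -> int:
    """Shift a prediction into the MSB; the oldest bit (LSB) falls off."""
    return ((history >> 1) | (MSB if taken else 0)) & MASK


class BranchCorrectionLogic:
    """Hypothetical model of the BCL: one pre-corrected history per branch."""

    def __init__(self) -> None:
        self.corrected = {}            # branch tag -> corrected history value

    def snapshot(self, tag: str, ghr: int) -> None:
        # Taken right after the GHR is loaded with this branch's prediction.
        # Inverting the MSB gives the history the registers should hold if
        # that prediction later turns out to be wrong.
        self.corrected[tag] = ghr ^ MSB

    def recover(self, tag: str) -> int:
        # On a mispredict this one value would be driven to the GHR, the
        # WGHR and the selection mux at the same time.
        return self.corrected.pop(tag)


older_history = 0b010111111
ghr = shift_in(older_history, True)    # branch predicted taken -> 0b101011111
bcl = BranchCorrectionLogic()
bcl.snapshot("branch_0", ghr)
restored = bcl.recover("branch_0")     # branch resolves as mispredicted
print(f"{ghr:09b} -> {restored:09b}")  # 101011111 -> 001011111
```

Flipping the MSB of the snapshot yields the same value that shifting in the opposite prediction would have produced (`shift_in(older_history, False)`), which is why the corrected value can be prepared before the branch resolves.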
FIG. 4 displays a detailed view 400 of the WGHR 316, the GHR 314 and the BCL 340. Within the detailed view 400, a WGHR selection mux 402 receives branch history information from the IC2 stage 306 and the DCD stage 310, as well as corrected branch history information from the BCL 340. A GHR selection mux 404 receives branch history information from the DCD stage 310 and corrected branch history information from the BCL 340.
- The
WGHR selection mux 402 selects which input is used to load the WGHR 316 with branch history information. When a mispredict occurs, the input from the BCL 340 has priority over information sent from the IC2 stage 306 or the DCD stage 310. The BCL 340 has priority because branch history information arriving after a mispredict may be associated with conditional branch instructions fetched down the incorrectly predicted branch path, so the branch history information passed by the IC2 stage 306 or the DCD stage 310 may also be incorrect.
- If no mispredict occurs, the input selection for the
WGHR selection mux 402 may be determined according to the following cases, listed from highest to lowest priority:
- a) If a branch instruction returns a BTAC miss during the IC2 stage 306 but is predicted taken after being decoded during the
DCD stage 310, the branch history value confirmed during the DCD stage 310 is shifted into the WGHR 316. The DCD stage 310 has priority in this case because the instructions fetched after the predicted taken branch instruction must be flushed. Therefore, any branch history information identified during the IC2 stage 306 for a subsequent branch instruction that may be ready to be written into the WGHR 316 during the same processor cycle is discarded.
- b) If the
DCD stage 310 is not processing a branch instruction associated with a BTAC miss, the IC2 stage 306 has the next highest priority. As long as a BTAC hit occurs for the branch instruction, the branch history information identified during the IC2 stage 306 is shifted into the WGHR 316.
- c) If a branch instruction was previously identified as a BTAC hit and its associated branch history information was loaded as described in case (b), the WGHR 316 is rewritten once more from the
DCD stage 310. Likewise, if a conditional branch instruction is a BTAC miss and is predicted not taken, the WGHR 316 is written with its branch history information. Writing the WGHR 316 in these cases ensures that the GHR 314 and the WGHR 316 are synchronized after the instruction passes through the DCD stage 310.
- The
GHR selection mux 404 selects the appropriate input used to update the GHR 314. Like the WGHR selection mux 402, the GHR selection mux 404 gives the input from the BCL 340 the highest priority, for the same reasons explained above. If no mispredict occurs, the GHR 314 is updated with the branch history information identified during the DCD stage 310 for a particular branch instruction.
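One way to express the WGHR selection mux 402 and GHR selection mux 404 priorities is the Python sketch below. It is a behavioral model under assumptions, not the actual mux logic: it uses a nine-bit history with the newest prediction in the MSB, reads case (c) as covering both a previously BTAC-hit branch and a not-taken BTAC-miss branch, and assumes that a write from the DCD stage shifts the confirmed value into the GHR contents so that the two registers end up synchronized. `DecodedBranch`, `select_wghr` and `select_ghr` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

HISTORY_BITS = 9
MASK = (1 << HISTORY_BITS) - 1
MSB = 1 << (HISTORY_BITS - 1)


def shift_in(history: int, taken: bool) -> int:
    """Shift a prediction into the MSB (most recent) of the history."""
    return ((history >> 1) | (MSB if taken else 0)) & MASK


@dataclass
class DecodedBranch:
    """A branch confirmed in the DCD stage during this cycle (hypothetical)."""
    taken: bool          # confirmed prediction
    btac_hit: bool       # whether it hit in the BTAC back in IC2


def select_wghr(wghr: int, ghr: int,
                bcl_value: Optional[int] = None,
                dcd: Optional[DecodedBranch] = None,
                ic2_prediction: Optional[bool] = None) -> Tuple[str, int]:
    """Return (selected source, value loaded into the WGHR) for one cycle.

    ic2_prediction is None unless a BTAC hit occurred in IC2 this cycle.
    """
    # Mispredict: the BCL value wins, since IC2/DCD history may come from
    # branches on the wrong path.
    if bcl_value is not None:
        return "BCL", bcl_value
    # (a) BTAC miss that decodes as predicted taken: younger fetches are
    # flushed, so IC2 history arriving in the same cycle is discarded.
    if dcd is not None and not dcd.btac_hit and dcd.taken:
        return "DCD", shift_in(ghr, True)
    # (b) BTAC hit in IC2: shift its prediction in ahead of decode.
    if ic2_prediction is not None:
        return "IC2", shift_in(wghr, ic2_prediction)
    # (c) Otherwise the DCD stage rewrites the WGHR to match the updated GHR.
    if dcd is not None:
        return "DCD", shift_in(ghr, dcd.taken)
    return "hold", wghr


def select_ghr(ghr: int, bcl_value: Optional[int] = None,
               dcd: Optional[DecodedBranch] = None) -> int:
    """GHR selection mux: BCL first, then the DCD stage, else hold."""
    if bcl_value is not None:
        return bcl_value
    if dcd is not None:
        return shift_in(ghr, dcd.taken)
    return ghr


# Case (a) beats a same-cycle IC2 hit; the IC2 history is dropped.
print(select_wghr(0b100000000, 0, dcd=DecodedBranch(True, False), ic2_prediction=True))
```

Checking the mispredict input first mirrors the text's reasoning that anything arriving from IC2 or DCD after a mispredict may belong to the wrong path.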
FIG. 6 shows a timing diagram 600 of the exemplary group of instructions 500 as they move through the upper pipe 350. The exemplary group of instructions 500 contains multiple branch instructions. The X-axis 602 of FIG. 6 depicts the processor cycle, and the Y-axis 604 illustrates the execution stage within the upper pipe 350 that each instruction passes through, as well as the contents of the GHR 314 and the WGHR 316. The GHR 314 and the WGHR 316 are written during one processor cycle and latched at the beginning of the next processor cycle; the timing diagram 600 displays the latched contents. For ease of illustration, only the three most significant bits of the GHR 314 and the WGHR 316 are shown. As the instructions are executed, they move down the Y-axis 604.
- In
Processor Cycle 1, the fetch logic circuit 302 sends a fetch group address for instruction A to the Instruction Cache 106, the BTAC 130 and the address hashing logic circuit 320. This is shown in the timing diagram 600 as instruction A entering the IC1 stage 304. Also in Processor Cycle 1, the three most significant bits of the GHR 314 and the WGHR 316 are all zeros, indicating that the last three branch instructions executed were all not taken.
- In Processor Cycle 2, the results of sending the fetch group address to the
instruction cache 106, the BTAC 130 and the BHT 140 are received. This is displayed in the timing diagram 600 as instruction A entering the IC2 stage 306. Since the instruction cache 106 stores multiple instructions per cache line, instruction A+4 is also shown retrieved along with instruction A in the IC2 stage 306. Logic circuitry within the IC2 stage 306 analyzes the information received from the BTAC 130 and the BHT 140. During the IC2 stage 306, the processor 100 determines that instruction A is a conditional branch instruction (based on the information from a BTAC hit) and obtains its prediction value from the BHT 140. In this example, instruction A is predicted taken; the actual entry in the BHT 140 for instruction A may be either strongly taken (11) or weakly taken (10). At the end of Processor Cycle 2, the processor 100 loads a “1” into the MSB of the WGHR 316 to reflect the prediction value associated with conditional branch instruction A. Since instruction A is predicted taken, the next sequential instruction (A+4) will not be the next instruction executed, so it is flushed after instruction A passes through the IC2 stage 306. As shown in the timing diagram 600, the value “100” is latched into the WGHR 316 at the start of Processor Cycle 3.
- During Processor Cycle 3, instruction A enters the IDA stage 308, where it is formatted and aligned in preparation for entering the
DCD stage 310. While instruction A moves through the IDA stage 308, the fetch group address for instruction B is sent to the instruction cache 106, the BTAC 130 and the BHT 140 during the IC1 stage 304.
- In
Processor Cycle 4, instruction A enters the DCD stage 310, the results of the fetch request for instructions B and B+4 are received (the IC2 stage 306), and the fetch group address for instruction B+8 is sent to the instruction cache 106, the BTAC 130 and the BHT 140 (the IC1 stage 304). The contents of the WGHR 316 (“100”) are selected by the selection mux 322 and used by the address hashing logic circuit 320 to index the entry in the BHT 140 for instruction B+8. When instruction A is in the DCD stage 310, the processor 100 confirms that instruction A is a conditional branch instruction, and as a result the prediction value (“1”) is shifted into the GHR 314.
- The
processor 100 will not see the updated value of the GHR 314 from instruction A until the beginning of Processor Cycle 5, when the processor 100 latches the GHR 314. At the end of Processor Cycle 4, instruction A leaves the upper pipe 350 and is directed to the lower pipelines.
- Consider a conventional processor that does not use a WGHR 316 and relies only on a GHR to store branch history information. If such a processor executed the exemplary group of
instructions 500, the prediction value returned from a BHT for instruction B+8 might not be accurate. This is because the address hashing logic circuit would use the value of the GHR in Processor Cycle 4 (i.e., “000”) to determine the BHT entry for instruction B+8. That value does not reflect the actual branch history encountered by the processor, because the branch history information for instruction A has not yet been shifted into the GHR. If the same instruction sequence were executed again but the processor experienced a delay when fetching instruction B+8 (so that the GHR had already been updated by the time the address hashing logic circuit used it to access the BHT), a different BHT entry could be accessed. In this case, a processor using only a GHR to store branch history information could access two different BHT entries for the same conditional branch instruction under the same instruction execution sequence.
- In one embodiment, when instruction A is in the
DCD stage 310, the WGHR 316 is rewritten with the prediction value at the same time that the GHR 314 is loaded. Writing both registers with the same prediction value at the same time synchronizes the two registers for instruction A. Since it is uncommon for two predicted-taken conditional branch instructions to immediately follow one another, there is little chance that synchronizing the two registers will lose any branch history information.
- In Processor Cycle 5, instructions B and B+4 enter the IDA stage 308 while instructions B+8 and B+12 enter the IC2 stage 306. Also in Processor Cycle 5, the fetch group address for instructions B+16 and B+20 is sent to the
instruction cache 106, the BTAC 130 and the BHT 140. In the IC2 stage 306, instruction B+8 returns a BTAC hit. Since instruction B+8 is a BTAC hit, the processor 100 also determines that instruction B+8 is a conditional branch instruction, and its prediction value returned from the BHT 140 during the IC2 stage 306 is shifted into the WGHR 316. In this example, instruction B+8 is also predicted taken; the actual entry in the BHT 140 may be either strongly taken (11) or weakly taken (10). Because instruction B+8 is a predicted taken branch instruction, instructions B+12, B+16 and B+20 are flushed by the fetch logic circuit 302 after instruction B+8 leaves the IC2 stage 306, and the target address for instruction C (received from the BTAC hit) is directed to the fetch logic circuit 302. The WGHR 316 is updated with the taken prediction value (“1”), and the new value is latched at the beginning of Processor Cycle 6, as reflected in the timing diagram 600.
- In Processor Cycle 6, instructions B and B+4 enter the
DCD stage 310 while instruction B+8 enters the IDA stage 308. Also during Processor Cycle 6, the fetch group address for instruction C is sent to the Instruction Cache 106, the BTAC 130 and the BHT 140 (the IC1 stage 304). At the end of Processor Cycle 6, instructions B and B+4 leave the upper pipe 350 and are directed to the lower pipelines.
- In Processor Cycle 7,
instruction B+8 is processed in the DCD stage 310, where it is confirmed as a conditional branch instruction and its prediction value is also confirmed. The prediction value for instruction B+8 is shifted into the GHR 314 and reloaded into the WGHR 316 during Processor Cycle 7. Instructions C and C+4 are returned from the Instruction Cache 106 during the IC2 stage 306. At the end of Processor Cycle 7, instruction B+8 leaves the upper pipe 350 and is directed to the lower pipelines.
- Thus, in code segments where branch instructions execute in close proximity to one another (relative to the depth of the pipeline), the latest branch history information is used to process branch predictions.
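The index computation performed by the address hashing logic circuit 320, and why a stale GHR can send instruction B+8 to a different BHT entry, can be illustrated with a short Python sketch. It assumes a nine-bit history (MSB = most recent), takes bits 12-4 of the fetch group address as in the embodiment described earlier, and uses bits 3-1 to select one of eight counters 204; the address 0x2468 for instruction B+8 and the all-zero lower history bits are made up for illustration, since the text gives neither.

```python
HISTORY_MASK = 0x1FF                   # nine history bits


def bht_index(fetch_addr: int, history: int) -> int:
    """Row index into the BHT: address bits 12-4 XOR'd with the history."""
    return ((fetch_addr >> 4) & 0x1FF) ^ (history & HISTORY_MASK)


def counter_select(fetch_addr: int) -> int:
    """Bits 3-1 of the fetch group address select one of eight counters."""
    return (fetch_addr >> 1) & 0x7


addr_b8 = 0x2468                       # hypothetical address of instruction B+8
wghr = 0b100000000                     # A's taken prediction already shifted in
stale_ghr = 0b000000000                # GHR in Processor Cycle 4 ("000...")

print(bht_index(addr_b8, wghr))        # 326: index used with the WGHR
print(bht_index(addr_b8, stale_ghr))   # 70: index a GHR-only design might use
print(counter_select(addr_b8))         # 4
```

Because the stale index depends on whether the GHR happened to be updated before B+8 was fetched, a GHR-only design can spread the same branch across two BHT rows (here 70 and 326); indexing from the WGHR removes that timing dependence.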
- During
Processor Cycle 8, the value of the GHR 314 is latched along with the WGHR 316. Instructions C and C+4 are processed during the IDA stage 308, and any sequential instructions following instructions C and C+4 may be fetched and executed.
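The register contents in the walkthrough above can be reproduced with a small cycle-level model. This Python sketch tracks only the three most significant bits shown in the timing diagram 600 and encodes the four history writes described for instructions A and B+8; it treats each write as becoming visible at the start of the following cycle and ignores same-cycle priority conflicts, none of which arise in this sequence. `WRITES`, `shift3` and `simulate` are illustrative names, not part of the patent.

```python
from typing import Dict, List, Tuple


def shift3(history: int, taken: bool) -> int:
    """Shift into a three-bit view of the history (MSB = most recent)."""
    return ((history >> 1) | (0b100 if taken else 0)) & 0b111


# Writes read off the walkthrough above: cycle -> events written that cycle.
WRITES: Dict[int, List[str]] = {
    2: ["ic2_taken"],   # A: BTAC hit in IC2, predicted taken
    4: ["dcd_taken"],   # A: confirmed in DCD; GHR loaded, WGHR rewritten
    5: ["ic2_taken"],   # B+8: BTAC hit in IC2, predicted taken
    7: ["dcd_taken"],   # B+8: confirmed in DCD
}


def simulate(last_cycle: int = 8) -> List[Tuple[int, int, int]]:
    """Return (cycle, latched GHR, latched WGHR) for each processor cycle."""
    ghr = wghr = 0b000                   # last three branches were not taken
    rows = []
    for cycle in range(1, last_cycle + 1):
        rows.append((cycle, ghr, wghr))  # values visible during this cycle
        next_ghr, next_wghr = ghr, wghr
        for event in WRITES.get(cycle, []):
            if event == "ic2_taken":
                next_wghr = shift3(wghr, True)
            elif event == "dcd_taken":
                next_ghr = shift3(ghr, True)
                next_wghr = next_ghr     # synchronize WGHR with GHR
        ghr, wghr = next_ghr, next_wghr  # latched at the start of next cycle
    return rows


for cycle, ghr, wghr in simulate():
    print(f"cycle {cycle}: GHR={ghr:03b} WGHR={wghr:03b}")
```

In cycle 4 the model shows WGHR = 100 while the GHR still reads 000, which is the window in which instruction B+8 is indexed with the more current history.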
FIG. 7 is a flow chart displaying an instruction process flow 700 taken by the processor 100 when executing instructions using the Working Global History Register (WGHR) 316. The instruction process flow 700 starts at block 702 and proceeds to block 704, where the fetch logic circuit 302 sends the fetch group address to the BTAC 130 and to the address hashing logic circuit 320 (for indexing into the BHT 140). As mentioned previously, the fetch group address may be sent during the IC1 stage 304 of the processor 100. Also at block 704, the results of searching the BTAC 130 (to determine whether the instruction being fetched is a branch instruction) are returned during the IC2 stage 306. From block 704, the instruction process flow 700 proceeds to decision block 706, where the processor 100 determines whether a BTAC hit has occurred. This determination may also occur during the IC2 stage 306. As explained previously, a BTAC hit may occur for a conditional branch instruction or a taken unconditional branch instruction. If there is no BTAC hit (i.e., a BTAC miss), the instruction process flow 700 proceeds directly to block 712.
- If there is a BTAC hit, the
instruction process flow 700 proceeds to block 710, where the WGHR 316 is updated by shifting the prediction value retrieved from the BHT 140 into the WGHR 316. For example, a “1” is shifted into the WGHR 316 if the branch instruction is predicted taken, and a “0” is shifted in if it is predicted not taken. Depending on the implementation, the prediction value may be returned during any processor execution stage prior to a decode stage. In the embodiment described previously, the WGHR 316 is updated during the IC2 stage 306.
- The
instruction process flow 700 proceeds to block 712, where the instruction passes through a decode stage (e.g. the DCD stage 310). During the decode stage, at block 712, the instruction may be confirmed as a branch instruction. After the instruction is decoded, the instruction process flow 700 proceeds to decision block 714. If, at decision block 714, the instruction is not a branch instruction, the instruction process flow 700 ends at block 720.
- If at
block 714 the processor 100 confirms that the instruction is a branch instruction, the instruction process flow 700 proceeds to block 716. At block 716, the WGHR 316 and the GHR 314 are updated with the appropriate branch history information, and the instruction process flow 700 ends at block 720.
- The various illustrative logical blocks, modules, circuits, elements, and/or components described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic component, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing components, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
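The decision structure of the instruction process flow 700 in FIG. 7 can be summarized as a Python function that records the blocks visited. It is a sketch under assumptions: a nine-bit history with the newest prediction in the MSB, a single instruction considered in isolation, and block 716 modeled as loading the GHR and then rewriting the WGHR to match it, as in the synchronization embodiment described for the DCD stage. `FetchedInstruction` and `process_flow_700` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

HISTORY_BITS = 9
MASK = (1 << HISTORY_BITS) - 1
MSB = 1 << (HISTORY_BITS - 1)


def shift_in(history: int, taken: bool) -> int:
    """Shift a prediction into the MSB (most recent) of the history."""
    return ((history >> 1) | (MSB if taken else 0)) & MASK


@dataclass
class FetchedInstruction:
    """Hypothetical per-instruction inputs to the flow."""
    btac_hit: bool                 # BTAC lookup result returned in IC2
    is_branch: bool                # what the decode stage confirms
    predicted_taken: bool = False  # BHT prediction for the branch


def process_flow_700(insn: FetchedInstruction, wghr: int,
                     ghr: int) -> Tuple[List[int], int, int]:
    """Return (blocks visited, new WGHR, new GHR)."""
    path = [702, 704]              # start; send fetch group address (IC1/IC2)
    path.append(706)               # decision: BTAC hit?
    if insn.btac_hit:
        path.append(710)           # early WGHR update, before decode
        wghr = shift_in(wghr, insn.predicted_taken)
    path.append(712)               # decode stage (e.g. DCD stage 310)
    path.append(714)               # decision: confirmed branch?
    if insn.is_branch:
        path.append(716)           # update GHR, then rewrite WGHR to match
        ghr = shift_in(ghr, insn.predicted_taken)
        wghr = ghr
    path.append(720)               # end
    return path, wghr, ghr


print(process_flow_700(FetchedInstruction(True, True, True), 0, 0))
# ([702, 704, 706, 710, 712, 714, 716, 720], 256, 256)
```

Because block 716 rewrites the WGHR from the GHR, a branch that already updated the WGHR at block 710 is not shifted in a second time.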
- Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art appreciate that any arrangement, which is calculated to achieve the same purpose, may be substituted for the specific embodiments shown and that the invention has other applications in other environments. This application is intended to cover any adaptations or variations of the present invention. The following claims are in no way intended to limit the scope of the invention to the specific embodiments described herein.
Claims (22)
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/556,244 US7984279B2 (en) | 2006-11-03 | 2006-11-03 | System and method for using a working global history register |
AT07844606T ATE496329T1 (en) | 2006-11-03 | 2007-10-25 | SYSTEM AND METHOD FOR USING A FUNCTIONAL GLOBAL PREHISTORY REGISTER |
EP07844606A EP2084602B1 (en) | 2006-11-03 | 2007-10-25 | A system and method for using a working global history register |
PCT/US2007/082538 WO2008055045A1 (en) | 2006-11-03 | 2007-10-25 | A system and method for using a working global history register |
JP2009536380A JP5209633B2 (en) | 2006-11-03 | 2007-10-25 | System and method with working global history register |
KR1020097011477A KR101081674B1 (en) | 2006-11-03 | 2007-10-25 | A system and method for using a working global history register |
CN2007800398002A CN101529378B (en) | 2006-11-03 | 2007-10-25 | A system and method for using a working global history register |
DE602007012131T DE602007012131D1 (en) | 2006-11-03 | 2007-10-25 | SYSTEM AND METHOD FOR USING A WORKING GLOBAL HISTORICAL REGISTER |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/556,244 US7984279B2 (en) | 2006-11-03 | 2006-11-03 | System and method for using a working global history register |
Publications (2)
Publication Number | Publication Date |
---|---|
US20080109644A1 true US20080109644A1 (en) | 2008-05-08 |
US7984279B2 US7984279B2 (en) | 2011-07-19 |
Family
ID=38926137
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/556,244 Active 2027-05-01 US7984279B2 (en) | 2006-11-03 | 2006-11-03 | System and method for using a working global history register |
Country Status (8)
Country | Link |
---|---|
US (1) | US7984279B2 (en) |
EP (1) | EP2084602B1 (en) |
JP (1) | JP5209633B2 (en) |
KR (1) | KR101081674B1 (en) |
CN (1) | CN101529378B (en) |
AT (1) | ATE496329T1 (en) |
DE (1) | DE602007012131D1 (en) |
WO (1) | WO2008055045A1 (en) |
- 2006
  - 2006-11-03 US: US11/556,244 (US7984279B2), Active
- 2007
  - 2007-10-25 WO: PCT/US2007/082538 (WO2008055045A1), Application Filing
  - 2007-10-25 JP: JP2009536380A (JP5209633B2), Expired - Fee Related
  - 2007-10-25 AT: AT07844606T (ATE496329T1), IP Right Cessation
  - 2007-10-25 DE: DE602007012131T (DE602007012131D1), Active
  - 2007-10-25 CN: CN2007800398002A (CN101529378B), Expired - Fee Related
  - 2007-10-25 KR: KR1020097011477A (KR101081674B1), IP Right Cessation
  - 2007-10-25 EP: EP07844606A (EP2084602B1), Not-in-force
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5155818A (en) * | 1988-09-28 | 1992-10-13 | Data General Corporation | Unconditional wide branch instruction acceleration |
US5604877A (en) * | 1994-01-04 | 1997-02-18 | Intel Corporation | Method and apparatus for resolving return from subroutine instructions in a computer processor |
US5768576A (en) * | 1994-01-04 | 1998-06-16 | Intel Corporation | Method and apparatus for predicting and handling resolving return from subroutine instructions in a computer processor |
US5860017A (en) * | 1996-06-28 | 1999-01-12 | Intel Corporation | Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction |
US6647467B1 (en) * | 1997-08-01 | 2003-11-11 | Micron Technology, Inc. | Method and apparatus for high performance branching in pipelined microsystems |
US20070174599A1 (en) * | 1997-08-01 | 2007-07-26 | Micron Technology, Inc. | Method and apparatus for high performance branching in pipelined microsystems |
US6622240B1 (en) * | 1999-06-18 | 2003-09-16 | Intrinsity, Inc. | Method and apparatus for pre-branch instruction |
US20020038417A1 (en) * | 2000-09-27 | 2002-03-28 | Joachim Strombergsson | Pipelined microprocessor and a method relating thereto |
US20050132175A1 (en) * | 2001-05-04 | 2005-06-16 | Ip-First, Llc. | Speculative hybrid branch direction predictor |
US7107438B2 (en) * | 2003-02-04 | 2006-09-12 | Via Technologies, Inc. | Pipelined microprocessor, apparatus, and method for performing early correction of conditional branch instruction mispredictions |
US20040158697A1 (en) * | 2003-02-04 | 2004-08-12 | Via Technologies, Inc. | Pipelined microprocessor, apparatus, and method for performing early correction of conditional branch instruction mispredictions |
US20040225866A1 (en) * | 2003-05-06 | 2004-11-11 | Williamson David James | Branch prediction in a data processing system |
US20050228977A1 (en) * | 2004-04-09 | 2005-10-13 | Sun Microsystems,Inc. | Branch prediction mechanism using multiple hash functions |
US20060095750A1 (en) * | 2004-08-30 | 2006-05-04 | Nye Jeffrey L | Processes, circuits, devices, and systems for branch prediction and other processor improvements |
US20060277397A1 (en) * | 2005-06-02 | 2006-12-07 | Sartorius Thomas A | Method and apparatus for predicting branch instructions |
US20070239975A1 (en) * | 2006-04-07 | 2007-10-11 | Lei Wang | Programmable backward jump instruction prediction mechanism |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090037709A1 (en) * | 2007-07-31 | 2009-02-05 | Yasuo Ishii | Branch prediction device, hybrid branch prediction device, processor, branch prediction method, and branch prediction control program |
US8892852B2 (en) | 2007-07-31 | 2014-11-18 | NEC Corporation | Branch prediction device and method that breaks accessing a pattern history table into multiple pipeline stages |
US20100031011A1 (en) * | 2008-08-04 | 2010-02-04 | International Business Machines Corporation | Method and apparatus for optimized method of BHT banking and multiple updates |
US20100161951A1 (en) * | 2008-12-18 | 2010-06-24 | Faraday Technology Corp. | Processor and method for recovering global history shift register and return address stack thereof |
US8078851B2 (en) * | 2008-12-18 | 2011-12-13 | Faraday Technology Corp. | Processor and method for recovering global history shift register and return address stack thereof by determining a removal range of a branch recovery table |
US20140195790A1 (en) * | 2011-12-28 | 2014-07-10 | Matthew C. Merten | Processor with second jump execution unit for branch misprediction |
US10372590B2 (en) | 2013-11-22 | 2019-08-06 | International Business Machines Corporation | Determining instruction execution history in a debugger |
US10552297B2 (en) | 2013-11-22 | 2020-02-04 | International Business Machines Corporation | Determining instruction execution history in a debugger |
US10977160B2 (en) | 2013-11-22 | 2021-04-13 | International Business Machines Corporation | Determining instruction execution history in a debugger |
US10120687B2 (en) | 2014-02-26 | 2018-11-06 | Fanuc Corporation | Programmable controller |
US20230315468A1 (en) * | 2022-03-30 | 2023-10-05 | Advanced Micro Devices, Inc. | Enforcing consistency across redundant tagged geometric (TAGE) branch histories |
Also Published As
Publication number | Publication date |
---|---|
ATE496329T1 (en) | 2011-02-15 |
CN101529378B (en) | 2013-04-03 |
WO2008055045A1 (en) | 2008-05-08 |
EP2084602A1 (en) | 2009-08-05 |
DE602007012131D1 (en) | 2011-03-03 |
JP2010509680A (en) | 2010-03-25 |
JP5209633B2 (en) | 2013-06-12 |
EP2084602B1 (en) | 2011-01-19 |
KR101081674B1 (en) | 2011-11-09 |
US7984279B2 (en) | 2011-07-19 |
CN101529378A (en) | 2009-09-09 |
KR20090089358A (en) | 2009-08-21 |
Similar Documents
Publication | Title |
---|---|
US7984279B2 (en) | System and method for using a working global history register |
US7278012B2 (en) | Method and apparatus for efficiently accessing first and second branch history tables to predict branch instructions | |
US5903750A (en) | Dynamic branch prediction for branch instructions with multiple targets | |
US7685410B2 (en) | Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects | |
US6189091B1 (en) | Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection | |
JP5255367B2 (en) | Processor with branch destination address cache and method of processing data | |
US9201654B2 (en) | Processor and data processing method incorporating an instruction pipeline with conditional branch direction prediction for fast access to branch target instructions | |
US6550004B1 (en) | Hybrid branch predictor with improved selector table update mechanism | |
US7877586B2 (en) | Branch target address cache selectively applying a delayed hit | |
JPH0334024A (en) | Method of branch prediction and instrument for the same | |
EP1851620A2 (en) | Suppressing update of a branch history register by loop-ending branches | |
US7827392B2 (en) | Sliding-window, block-based branch target address cache | |
JPH0863356A (en) | Branch estimation device | |
JP5815596B2 (en) | Method and system for accelerating a procedure return sequence | |
US8909908B2 (en) | Microprocessor that refrains from executing a mispredicted branch in the presence of an older unretired cache-missing load instruction | |
US7865705B2 (en) | Branch target address cache including address type tag bit | |
US6421774B1 (en) | Static branch predictor using opcode of instruction preceding conditional branch | |
US20040003213A1 (en) | Method for reducing the latency of a branch target calculation by linking the branch target address cache with the call-return stack | |
US20090070569A1 (en) | Branch prediction device, branch prediction method, and microprocessor |
EP0666538A2 (en) | Data processor with branch target address cache and method of operation |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: QUALCOMM INCORPORATED, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: STEMPEL, BRIAN MICHAEL; DIEFFENDERFER, JAMES NORRIS; SARTORIUS, THOMAS ANDREW; AND OTHERS; SIGNING DATES FROM 20060921 TO 20060922; REEL/FRAME: 018476/0666 |
STCF | Information on status: patent grant | PATENTED CASE |
FPAY | Fee payment | Year of fee payment: 4 |
MAFP | Maintenance fee payment | PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |
MAFP | Maintenance fee payment | PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 12 |