WO2013000400A1 - Branch processing method and system - Google Patents

Branch processing method and system

Info

Publication number
WO2013000400A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
branch
address
track
processor
Application number
PCT/CN2012/077565
Other languages
French (fr)
Chinese (zh)
Inventor
林正浩
Original Assignee
上海芯豪微电子有限公司
Application filed by 上海芯豪微电子有限公司
Publication of WO2013000400A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3844 Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G06F9/3806 Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer

Definitions

  • The present invention relates to the field of electronic computers and microprocessor architecture, and in particular to a branch processing method and system.
  • Control hazards are also known as branch hazards.
  • On a branch instruction, a conventional processor cannot know in advance where to fetch the next instruction to execute; it must wait until the branch instruction completes, so empty slots appear in the pipeline after the branch instruction.
  • Figure 1 shows a conventional pipeline structure, including the pipeline stages that correspond to a branch instruction.
  • In Table 1, the columns represent the clock cycles of the pipeline and the rows represent sequential instructions.
  • When an instruction is fetched, its address is provided to the instruction memory for addressing; the output of the instruction memory is then sent to the decoder, which decodes the fetched instruction.
  • The pipeline stages are instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write back (WB). "Stall" indicates a pipeline pause, i.e., an empty cycle.
  • Table 1 shows a branch instruction labeled 'i', which is fetched in clock cycle 1.
  • 'i+1' denotes the instruction immediately following the branch instruction.
  • 'target' denotes the branch target instruction of the branch point.
  • 'target+1' denotes the sequential instruction immediately following the branch target instruction, 'target+2' the instruction after that, and 'target+3' the one after that.
  • In clock cycle 1, the processor fetches the branch instruction 'i'.
  • In clock cycle 2, the processor fetches instruction 'i+1' and decodes branch instruction 'i'. Assume that the branch target address is calculated, and the branch decision completed, by the end of the decode stage of the branch instruction. If the branch is taken, the branch target address is saved as the address of the next instruction to fetch.
  • The branch target instruction is then fetched, and it is decoded and executed in subsequent cycles. From this point on, the pipeline processes the instructions that follow the branch target instruction.
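The timing walked through for Table 1 can be reproduced with a small schedule generator. This is a minimal sketch under stated assumptions: a five-stage pipeline, branch 'i' fetched in cycle 1, and the branch decision available at the end of its ID stage, so a taken branch wastes the slot of the discarded 'i+1'. The function names (`schedule`, `lost_cycles`, `render`) are hypothetical and only illustrate the conventional behavior; Table 1 itself remains the authority on the exact timing.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def schedule(taken: bool, n_after: int = 3) -> dict:
    """Map each instruction name to its list of (cycle, stage) pairs.

    Assumptions (illustrative only): branch 'i' is fetched in cycle 1 and its
    outcome is known at the end of its ID stage (cycle 2), so the correct
    successor can be fetched no earlier than cycle 3.
    """
    rows = {"i": [(1 + k, s) for k, s in enumerate(STAGES)]}
    if taken:
        # 'i+1' was fetched in cycle 2 before the outcome was known; it is
        # discarded, leaving an empty (stall) slot in the pipeline.
        rows["i+1"] = [(2, "IF"), (3, "stall")]
        start = 3
        names = ["target"] + [f"target+{n}" for n in range(1, n_after + 1)]
    else:
        start = 2
        names = [f"i+{n}" for n in range(1, n_after + 2)]
    for slot, name in enumerate(names):
        rows[name] = [(start + slot + k, s) for k, s in enumerate(STAGES)]
    return rows


def lost_cycles(taken: bool) -> int:
    """Fetch slots wasted between 'i' and the instruction that really follows it."""
    rows = schedule(taken)
    successor = "target" if taken else "i+1"
    return rows[successor][0][0] - rows["i"][0][0] - 1


def render(rows: dict) -> str:
    """Format the schedule like Table 1: rows are instructions, columns are cycles."""
    last = max(cycle for cells in rows.values() for cycle, _ in cells)
    lines = ["instr    " + "".join(f"{c:>6}" for c in range(1, last + 1))]
    for name, cells in rows.items():
        at = dict(cells)
        lines.append(f"{name:<9}" + "".join(f"{at.get(c, ''):>6}" for c in range(1, last + 1)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(schedule(taken=True)))
    print("lost cycles when taken:", lost_cycles(True))
```

Under these assumptions a taken branch costs one fetch slot, which is the pipeline pause the invention aims to remove.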
  • The method and system proposed by the present invention can be used to solve one or more of the above problems, as well as other problems.
  • The present invention provides a method of controlling processor pipeline operation.
  • The processor is coupled to a memory containing executable computer instructions.
  • The method includes determining whether an instruction to be executed by the processor is a branch instruction, and providing the branch target instruction address of the branch instruction and the address of the instruction that follows the branch instruction in program order (the next instruction).
  • The method further includes making a branch decision for the branch instruction based at least on the branch target instruction address and, according to that decision, selecting at least one of the branch target instruction and the next instruction as the instruction the execution unit is about to execute, before the branch instruction reaches its execution stage in the pipeline, so that a branch transfer does not stall the pipeline.
  • The present invention also proposes a pipeline control system for controlling processor pipeline operation.
  • The processor is coupled to a memory containing executable computer instructions.
  • The system includes a review unit, an addressing unit, a branch logic unit, and a selector.
  • The review unit determines whether an instruction to be executed by the processor is a branch instruction.
  • The addressing unit is coupled to the processor and provides the branch target instruction address of the branch instruction and the address of the next instruction in program order.
  • The branch logic unit makes a branch decision for the branch instruction based at least on the branch target instruction address provided by the addressing unit.
  • According to the branch decision provided by the branch logic unit, the selector selects at least one of the branch target instruction and the next instruction as the instruction to be executed by the execution unit before the branch instruction reaches the execution stage of the pipeline, so that the pipeline does not stall whether or not the branch is taken.
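The division of labor among the review unit, addressing unit, branch logic unit, and selector can be mirrored in software. This is a minimal sketch, not the patent's circuit: the `Instr` record, the fixed 4-byte instruction size, and all function names are hypothetical, and the branch decision is passed in as a boolean standing in for whatever the branch logic unit computes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

INSTR_SIZE = 4  # hypothetical fixed instruction size in bytes


@dataclass
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    addr: int
    is_branch: bool = False
    offset: int = 0  # branch offset, relative to the branch's own address


def review_unit(instr: Instr) -> bool:
    """Review unit: is the instruction about to execute a branch?"""
    return instr.is_branch


def addressing_unit(instr: Instr) -> Tuple[int, int]:
    """Addressing unit: (branch target address, next sequential address)."""
    return instr.addr + instr.offset, instr.addr + INSTR_SIZE


def selector(taken: bool, target_addr: int, next_addr: int) -> int:
    """Selector: choose the successor according to the branch decision."""
    return target_addr if taken else next_addr


def successor_address(instr: Instr, taken: Optional[bool] = None) -> int:
    """Address of the instruction the execution unit should run after instr.

    Both candidate addresses are computed up front, so once the branch logic
    unit supplies `taken` the selector only has to pick one of them.
    """
    target, next_addr = addressing_unit(instr)
    if not review_unit(instr):
        return next_addr
    if taken is None:
        raise ValueError("branch decision not available yet")
    return selector(taken, target, next_addr)
```

For example, a branch at 0x100 with offset 0x40 yields 0x140 when taken and 0x104 when not taken; a non-branch at 0x200 simply yields 0x204.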
  • The present invention also provides another method of controlling processor pipeline operation.
  • The processor is coupled to a memory containing executable computer instructions.
  • The method includes determining whether an instruction to be executed by the processor is a branch instruction, and providing the branch target instruction address of the branch instruction and the address of the next instruction in program order.
  • The method further includes fetching the branch target instruction and the next instruction using the branch target instruction address and the next instruction address, respectively.
  • The method further includes decoding the fetched branch target instruction and next instruction and, according to the branch decision provided by the processor, selecting either the decoded branch target instruction or the decoded next instruction to send to the execution unit, so that the pipeline does not stall whether or not the branch is taken.
  • The present invention also proposes a pipeline control system for controlling processor pipeline operation.
  • The processor is coupled to a memory containing executable computer instructions.
  • The pipeline control system includes an addressing unit, coupled to the processor, that provides the branch target instruction address of the branch instruction and the address of the next instruction in program order.
  • The pipeline control system also includes a read buffer, coupled between the memory and the processor, that stores at least one of the branch target instruction and the next instruction of the branch instruction.
  • The read buffer further includes a selector, coupled to the processor, that provides either the branch target instruction or the next instruction to the processor when the branch instruction is executed, so that the pipeline does not stall whether or not the branch is taken.
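A read buffer of this kind can be modeled as a small table that, for each upcoming branch, already holds both candidate instructions, so the selector never has to go back to memory when the branch executes. This is a hypothetical sketch: the `ReadBuffer` class, its methods, and the use of plain strings as stand-ins for decoded instructions are all assumptions for illustration.

```python
class ReadBuffer:
    """Hypothetical model: holds the branch target instruction and the next
    sequential instruction for each branch, filled ahead of execution."""

    def __init__(self, memory: dict):
        self.memory = memory   # address -> instruction (stand-in for the memory)
        self.entries = {}      # branch address -> (target instr, next instr)
        self.memory_reads = 0  # counts accesses to the backing memory

    def prefill(self, branch_addr: int, target_addr: int, next_addr: int) -> None:
        """Fetch both candidates before the branch is executed."""
        self.entries[branch_addr] = (self.memory[target_addr], self.memory[next_addr])
        self.memory_reads += 2

    def select(self, branch_addr: int, taken: bool) -> str:
        """Selector: return one candidate when the branch executes (no memory access)."""
        target_instr, next_instr = self.entries[branch_addr]
        return target_instr if taken else next_instr
```

Because `select` only reads the buffered pair, the number of memory reads is fixed at prefill time regardless of the branch outcome, which is the property the read buffer is meant to provide.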
  • The system and method of the present invention can provide a fundamental solution to branch processing in pipelined processors.
  • The system and method obtain the address of the branch target instruction before the branch point is executed, and use various forms of branch decision logic to eliminate the efficiency loss caused by branch misprediction.
  • Those skilled in the art can also derive other advantages and benefits of the present invention.
  • Figure 1 shows the control structure of a conventional pipeline.
  • Figure 3 shows an embodiment of a processor system according to the present invention.
  • Figure 4 shows an embodiment of the track table of the present invention.
  • Figure 5A shows an embodiment of another pipeline control structure of the present invention.
  • Figure 5B shows an embodiment of another pipeline control structure according to the present invention.
  • Figure 6 shows an embodiment of another processor system according to the present invention.
  • Figure 7 shows an embodiment of another processor system according to the present invention.
  • Figure 8 shows an embodiment of different command values used in the operation of the present invention.
  • Figure 9 shows an embodiment of another pipeline control structure of the present invention.
  • Figure 10 shows an embodiment of a processor environment according to the present invention.
  • Figure 11 is a schematic diagram of a branch prediction method according to the present invention.
  • Figure 12 shows an embodiment of branch prediction according to the present invention.
  • Figure 3 shows a preferred embodiment of the invention.
  • Figure 2 shows an example of pipeline control structure 1 consistent with the disclosed invention.
  • The pipeline stages are instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write back (WB).
  • Other pipeline structures can also be used.
  • Decoder 11 fetches instructions from instruction memory (or instruction cache) 10 via instruction bus 16.
  • Decoder 11 decodes the fetched instructions and prepares the operands for subsequent operations.
  • The decoded instructions and operands are sent to execution unit and program counter 12 (EX/PC), which executes them and calculates the address 21 of the next instruction in the program flow.
  • The next instruction address 21 is one input of selector 20.
  • Branch judging unit 13 provides branch control signal 14, which controls selector 20.
  • Branch control signal 14 can be generated based on the branch type and the branch condition (or condition flags).
  • Branch control signal 14 determines which input selector 20 outputs to register 17 and address bus 19. The address on bus 19 is then used to fetch the next instruction from instruction memory 10.
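The behavior of branch judging unit 13 and selector 20 can be sketched as two small functions. The branch-type names, the flag model (a zero flag and a negative flag from the last operation), and the assumption that selector 20's second input is a branch target address are all illustrative; the description above only states that signal 14 depends on the branch type and condition and that address 21 is one selector input.

```python
from enum import Enum


class BranchType(Enum):
    """Hypothetical branch categories for this sketch."""
    NONE = "none"        # not a branch
    ALWAYS = "always"    # unconditional: condition is always true
    EQ_ZERO = "eq_zero"  # branch if result == 0
    GT_ZERO = "gt_zero"  # branch if result > 0


def branch_control(kind: BranchType, zero: bool, negative: bool) -> bool:
    """Model of branch control signal 14: True selects the branch target."""
    if kind is BranchType.NONE:
        return False
    if kind is BranchType.ALWAYS:
        return True
    if kind is BranchType.EQ_ZERO:
        return zero
    if kind is BranchType.GT_ZERO:
        return not zero and not negative
    raise ValueError(f"unknown branch type: {kind}")


def selector_20(control_14: bool, next_addr_21: int, target_addr: int) -> int:
    """Model of selector 20: the chosen address goes to register 17 and bus 19."""
    return target_addr if control_14 else next_addr_21
```

Treating the unconditional branch as a condition that is always true keeps the selector logic uniform, matching the observation elsewhere in this description that unconditional branches are a special case of conditional ones.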
  • Figure 3 shows a processor environment 300 corresponding to pipeline control structure 1 of the present invention.
  • Processor environment 300 includes low-level memory 122, high-level memory 124, and processor core 125.
  • Processor environment 300 also includes fill/generator 123, active table 121, track table 126, tracker 170, and branch decision logic 210 (corresponding to branch judging unit 13 in Figure 2).
  • The components listed here are for ease of description; other components may be included, and some components may be omitted.
  • The components may be distributed across multiple systems, physically or virtually, and may be implemented in hardware (e.g., integrated circuits), in software, or in a combination of hardware and software.
  • High-level memory 124 and low-level memory 122 may comprise any suitable storage device, such as static memory (SRAM), dynamic memory (DRAM), or flash memory.
  • The level of a memory refers to how close it is to the processor core: the closer to the processor core, the higher the level.
  • A higher-level memory is typically faster than a lower-level memory but has a smaller capacity.
  • High-level memory 124 can serve as the system cache, or as the level-1 cache when other caches are present, and can be partitioned into multiple segments called blocks (e.g., memory blocks) that store fragments of the data accessed by processor core 125 (i.e., the instructions and data in instruction blocks and data blocks).
  • Processor core 125 can be any suitable processor that supports pipelining and can work with the cache system. Processor core 125 may use separate instruction and data caches and may include some instructions for cache operations. When processor core 125 executes an instruction, it first needs to read the instruction and/or data from memory. Active table 121, track table 126, tracker 170, and fill/generator 123 fill the instructions to be executed by processor core 125 into high-level memory 124, so that processor core 125 can read the required instructions from high-level memory 124 with a very low cache miss rate.
  • The term "fill" means moving data or instructions from a lower-level memory to a higher-level memory.
  • The term "memory access" means that processor core 125 reads from or writes to the closest memory (i.e., high-level memory 124 or the level-1 cache).
  • Fill/generator 123 can fetch instructions or instruction blocks at appropriate addresses, examine each instruction fetched from low-level memory 122 and filled into high-level memory 124, and extract information such as the instruction type, the instruction address, and branch target information for branch instructions.
  • The instructions and the extracted information, including branch target information, are used to calculate addresses, which are sent to other modules such as active table 121 and track table 126.
  • A branch instruction, or branch point, refers to any suitable form of instruction that can cause processor core 125 to change the execution flow (e.g., to execute instructions out of sequence). If the instruction block containing the branch target has not yet been filled into high-level memory 124, the corresponding track is established at the same time as that block is filled into high-level memory 124.
  • The tracks in track table 126 correspond one-to-one with the memory blocks in high-level memory 124, and both are pointed to by the same pointer 152. Any instruction that processor core 125 will execute can therefore be filled into high-level memory 124 before it is executed.
  • Fill/generator 123 can determine address information such as the instruction type, branch source address, and branch target address from the instruction and the branch target information.
  • Instruction types may include conditional branch instructions, unconditional branch instructions, and other instructions.
  • Instruction types may also include subtypes of conditional branch instructions, such as branch if equal and branch if greater than.
  • An unconditional branch instruction can be regarded as a special case of a conditional branch instruction whose condition is always true. Instruction types can therefore be divided simply into branch instructions and other instructions.
  • The branch source address refers to the address of the branch instruction itself, and the branch target address refers to the address to which the branch transfers when the branch is taken. Other information can also be included.
  • A track table can be created from this pre-computed information and used to provide the addresses for filling high-level memory 124.
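The scan that the fill/generator performs while filling a block can be sketched as a pass that records, for every branch, its type, source address, and target address. This is an assumption-laden illustration: instructions are modeled as hypothetical `(opcode, operand)` tuples rather than real machine encodings, the opcode names and 4-byte instruction size are invented for the example, and the target is computed as the branch's own address plus its offset, as stated later in this description.

```python
INSTR_SIZE = 4                # assumed fixed instruction width in bytes
CONDITIONAL = {"beq", "bgt"}  # hypothetical conditional-branch opcodes
UNCONDITIONAL = {"jmp"}       # hypothetical unconditional-branch opcode


def classify(opcode: str) -> str:
    """Map an opcode to one of the instruction types named in the text."""
    if opcode in CONDITIONAL:
        return "conditional"
    if opcode in UNCONDITIONAL:
        return "unconditional"
    return "other"


def scan_block(block_addr: int, block: list) -> list:
    """Return (type, branch source address, branch target address) for every
    branch in an instruction block, in program order."""
    found = []
    for index, (opcode, operand) in enumerate(block):
        kind = classify(opcode)
        if kind == "other":
            continue
        source = block_addr + index * INSTR_SIZE
        found.append((kind, source, source + operand))  # operand = branch offset
    return found
```

For a block at 0x100 containing `add`, `beq +16`, `sub`, `jmp -12`, the scan reports a conditional branch at 0x104 targeting 0x114 and an unconditional branch at 0x10C targeting 0x100.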
  • Figure 4 shows an example of track table operation as disclosed herein. As shown in Figure 4, track table 126 interacts with tracker 170 to provide the addresses required for buffering and branch processing.
  • Track table 126 can contain tracks for the instructions executed by processor core 125; tracker 170 provides different addresses based on track table 126 and provides the read pointer for track table 126.
  • A track here means a representation of a series of instructions (such as an instruction block) to be executed. The representation can use any suitable data type, such as an address, a block number, or another number.
  • A new track can be created when a track contains a branch point whose branch target changes the program flow, when the instruction that follows an instruction lies in a different instruction block (such as the next instruction block), or for an exception handler, another program thread, and so on.
  • Track table 126 can contain multiple tracks; each track corresponds to one row of the track table, identified by a row number or block number (BN), and the block number points to the corresponding memory block.
  • A track may include multiple track points, and a track point may correspond to one or more instructions. Since one track corresponds to one row of track table 126, one track point corresponds to one entry (e.g., one storage unit) in that row. The total number of track points in a track can therefore equal the number of entries in a row of track table 126. Other organizations can also be used.
  • A track point (i.e., a table entry) can contain information about an instruction in the track, such as a branch instruction.
  • The content of a track point can include the type of the corresponding instruction and its branch target information.
  • Processor core 125 can read an instruction for execution using an (M+Z)-bit instruction address, where M and Z are integers.
  • The M-bit portion of the address is called the high-order address, and the Z-bit portion is called the offset address.
  • Track table 126 may contain 2^M rows, i.e., 2^M tracks in total, and the high-order address is used to address a row of track table 126.
  • Each row may contain 2^Z track entries, i.e., 2^Z track points, and the offset address is used to locate a particular track point within the corresponding row.
  • Each entry or track point in a row may include a category portion 57, an XADDR portion 58, and a YADDR portion 59. Other portions can also be included.
  • Category portion 57 indicates the category of the instruction corresponding to the track point.
  • Instruction categories may include conditional branch instructions, unconditional branch instructions, and other instructions.
  • Instruction categories may also include subcategories of conditional branch instructions, such as branch if equal and branch if greater than.
  • XADDR portion 58 may contain an M-bit address, also called the first-dimension address or simply the first address.
  • YADDR portion 59 may contain a Z-bit address, also called the second-dimension address or simply the second address.
  • A new track can be built in an available row of track table 126, and a branch track point can be built in an available entry of that row.
  • The row and entry are determined by the source address of the branch point (i.e., the branch source address).
  • The row number or block number is determined by the high-order address of the branch source address, and the entry is determined by its offset address.
  • The content of the new track point corresponds to the branch target instruction.
  • That is, the content of the branch track point stores the branch target address information.
  • The row number or block number of the row in track table 126 that corresponds to the branch target instruction is stored as the first address in the content of the branch track point.
  • The offset address gives the position of the branch target instruction within its track and is stored as the second address in the content of the branch track point.
  • The first address is then used as the row address and the second address as the column address to locate the branch target track point within that row.
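The address split and the track-point layout described above can be sketched as a small Python model. The values of M and Z, the `TrackTable` class, and the tuple layout `(category, XADDR, YADDR)` for an entry are assumptions for illustration; to keep the sketch short, the high-order address is used directly as the row number, whereas in the full system a block number would be assigned by the active table.

```python
M, Z = 4, 3  # hypothetical widths: 2**M rows (tracks), 2**Z entries per row


def split(addr: int) -> tuple:
    """Split an (M+Z)-bit address into (high-order address, offset)."""
    if not 0 <= addr < 1 << (M + Z):
        raise ValueError("address out of range")
    return addr >> Z, addr & ((1 << Z) - 1)


class TrackTable:
    def __init__(self):
        # Each entry: (category, XADDR, YADDR); None means "not a branch point".
        self.rows = [[None] * (1 << Z) for _ in range(1 << M)]

    def add_branch(self, source_addr: int, target_addr: int, category: str) -> None:
        """Write a branch track point at the position given by the source
        address; its content is the target's (first address, second address)."""
        row, col = split(source_addr)
        x_target, y_target = split(target_addr)
        self.rows[row][col] = (category, x_target, y_target)

    def target_of(self, source_addr: int):
        """Read a branch point and return its XADDR/YADDR, i.e. the row and
        column of the branch target track point (None if not a branch)."""
        row, col = split(source_addr)
        entry = self.rows[row][col]
        if entry is None:
            return None
        _, x_target, y_target = entry
        return x_target, y_target
```

With M = 4 and Z = 3, a branch at address 21 (row 2, entry 5) targeting address 58 stores the pair (7, 2), which then addresses row 7, column 2 directly.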
  • Instruction memory 46 may be the part of high-level memory 124 used for instruction access and may consist of any suitable high-performance memory. Instruction memory 46 may contain 2^M memory blocks, each containing 2^Z bytes or words. That is, instruction memory 46 can store all instructions addressed by the M and Z bits (i.e., the instruction address): the M bits address a particular memory block, and the Z bits address a particular byte or word within that block.
  • Tracker 170 can consist of various components, such as registers, selectors, stacks, and/or other storage modules, that determine the next track to be executed by processor core 125.
  • Tracker 170 can determine the next track based on information such as the current track in track table 126, the track point information, and whether a branch has been taken as a result of execution by processor core 125.
  • The (M+Z)-bit instruction address of the branch instruction is carried on bus 55.
  • The M-bit address is sent to track table 126 via bus 56 as the first address, XADDR (or X address).
  • The Z-bit address is sent to track table 126 via bus 53 as the second address, YADDR (or Y address).
  • Track table 126 can then locate the branch instruction's entry and output the branch target address of that branch instruction on bus 51.
  • Selector 49 selects YADDR on bus 53, which is incremented by one byte or word to obtain a new second address 54.
  • The first address remains unchanged, and the new address can be output on bus 52.
  • Register 50 keeps the first address unchanged, and increment-by-one logic 48 increments the second address by one until it points to the next branch instruction in the current row of the track table.
  • When the track changes, register 50 holds the first address of the corresponding new track and provides a new (M+Z)-bit address on bus 55.
  • Thus track table 126 and tracker 170 provide the block address, while processor core 125 provides only an offset.
  • Processor core 125 feeds back the execution state of the branch instruction so that tracker 170 can make its decision.
  • Before a new track is executed, the instruction block corresponding to that track is filled into instruction memory 46. Repeating this process can prevent cache misses for all instructions that processor core 125 will execute.
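The stepping behavior of tracker 170 (hold the first address, increment the second address until the next branch point, then either continue on the same track or jump to the target track) can be sketched as follows. The row representation (a list where `None` marks a non-branch entry and a branch entry is a `(category, XADDR, YADDR)` tuple) and the function names are hypothetical; the branch outcome is supplied by the caller in place of the feedback from processor core 125, and moving on to the next track at the end of a row is not modeled.

```python
def next_branch_point(row: list, y: int):
    """Increment the second address from y until it reaches a branch entry in
    the current row; return its index, or None if no branch remains."""
    while y < len(row):
        if row[y] is not None:
            return y
        y += 1
    return None


def advance(rows: list, x: int, y: int, taken: bool) -> tuple:
    """One tracker step at the branch point (x, y).

    Not taken: keep the first address x and continue just after y.
    Taken: load the target (XADDR, YADDR) stored in the branch entry.
    """
    entry = rows[x][y]
    if entry is None:
        raise ValueError("(x, y) is not a branch point")
    if taken:
        _, x, y = entry
    else:
        y += 1
    return x, y
```

A typical loop alternates the two calls: `next_branch_point` finds where the read pointer stops, and `advance` applies the branch outcome once it is known.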
  • Active table 121 can store the information of all established tracks and maintain a mapping between addresses (or parts of addresses) and block numbers, so that a track can be established in any available row of track table 126. For example, when a track is established, the branch target address information of all branch points in that track is stored in active table 121. Active table 121 can thus hold the mapping information for the tracks of all branch target track points in the program. Other configurations can also be used.
  • Active table 121 can store the block numbers of the instruction blocks in high-level memory 124.
  • These block numbers also correspond to row numbers in track table 126.
  • The block number of a branch target address can be obtained by matching the address against the entries of active table 121.
  • The result of a successful match, i.e., the block number (the first address described above), together with the offset of the instruction within the track (the second address described above), determines the position of the track point.
  • Otherwise, active table 121 assigns a block number, the instruction segment corresponding to the address is filled into the location in high-level memory 124 indexed by that block number, and a new track corresponding to that block number is established in track table 126, so that active table 121 can represent the established track and its associated address. Through the operation of active table 121 and fill/generator 123, the instruction segment containing the branch target instruction of a branch point can therefore be filled into cache 124 (i.e., high-level memory 124) before processor core 125 fetches and executes the branch point.
  • Track table 126 can be organized as a two-dimensional table in which each row is indexed by the first address BNX and corresponds to one memory block or storage row, and each column is indexed by the second address BNY and corresponds to the offset of the corresponding instruction (or data) within the memory block.
  • The write address of the track table corresponds to the source address of the instruction.
  • Active table 121 assigns a BNX based on the high-order address, and BNY equals the offset; together, BNX and BNY form the write address that points to the entry being written.
  • The branch target address of every branch instruction can be obtained as the sum of the branch instruction's address and its branch offset.
  • The branch target address (high-order address, offset) is sent to active table 121, which matches its high-order address portion and can assign a BNX.
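The role of active table 121, matching a high-order address to a block number and assigning a new BNX when there is no match, can be sketched with a dictionary. The allocation policy (take the lowest free BNX, no replacement), the offset width, and the names `ActiveTable`, `lookup`, and `branch_target` are assumptions for illustration; the sketch also computes a branch target as the branch address plus the branch offset, as described above.

```python
Z = 3  # hypothetical offset width in bits (8 bytes per block)


class ActiveTable:
    def __init__(self, n_blocks: int):
        self.bnx_of = {}                   # high-order address -> BNX
        self.free = list(range(n_blocks))  # unassigned block numbers

    def lookup(self, addr: int) -> tuple:
        """Return (BNX, BNY, newly_assigned) for an address.

        On a miss a BNX is assigned; in the full system this is when the
        instruction block would be filled and a new track built.
        """
        high, bny = addr >> Z, addr & ((1 << Z) - 1)
        if high in self.bnx_of:
            return self.bnx_of[high], bny, False
        if not self.free:
            raise RuntimeError("no free block (replacement not modeled)")
        bnx = self.free.pop(0)
        self.bnx_of[high] = bnx
        return bnx, bny, True


def branch_target(branch_addr: int, branch_offset: int) -> int:
    """Branch target address = branch instruction address + branch offset."""
    return branch_addr + branch_offset
```

A second lookup in the same block returns the existing BNX without allocating, which is how every branch target is eventually expressed in BNX/BNY form.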
  • Tracker 170 provides read pointer 151 to track table 126.
  • Read pointer 151 also takes the form of BNX and BNY.
  • The contents of the track entry pointed to by the read pointer are read out together with that entry's own BNX and BNY (the source BNX and source BNY) and examined by tracker 170.
  • Depending on the TAKEN control signal, the read pointer is updated either with the branch target or by incrementing BNY.
  • Tracker 170 thus works with track table 126 and active table 121 to implement track-based operation.
  • In this way, the branch information and the addresses of the branch target instruction and of the instruction following the branch instruction can be determined in advance. Pipeline control structure 1 can use this information to process branches without stalling the pipeline.
  • Tracker 170 receives the branch target address from track table 126 via bus 150.
  • The high-order part of the branch target address (target BNX) is one input of a selector; the other input is the current BNX (BNX 152 of read pointer 151, i.e., the source BNX).
  • The output of this selector is the next BNX.
  • The offset part of the branch target address (target BNY) is one input of another selector; the other input is PC offset 155 from processor core 125.
  • The output of this selector serves as the "Offset 1" address of high-level memory 124 and addresses the instructions in the cache block selected by BNX 152.
  • Read pointer 151 (BNX 152, BNY 153) moves faster than the PC (e.g., because tracker 170 operates at a higher clock frequency).
  • Read pointer 151 moves along the track.
  • Processor core 125 executes the branch point and waits for the 'TAKEN' signal 212 and the 'BRANCH/JUMP' signal 213 from branch decision logic 210.
  • Processor core 125 provides PC offset 155 to address instructions in high-level memory 124, while tracker 170 provides BNY 153 to address branch points in track table 126.
  • Branch decision logic 210 compares the two: if PC offset 155 equals BNY 153, processor core 125 is fetching the branch point. The match between BNY 153 and PC offset 155 can therefore control the timing of branch processing, so that branch decision logic 210 makes the branch decision when PC offset 155 equals BNY 153. Alternatively, branch processing may start when PC offset 155 differs from BNY 153 by a predetermined number of instructions.
  • Branch decision logic 210 determines whether the branch is taken. In some cases, the decision is based on the branch type and the branch condition (or condition flags).
  • Branch type 211 (from track table 126) indicates the particular type of branch instruction, such as branch if the condition equals zero or branch if the condition is greater than zero.
  • Branch conditions can be produced by operations of processor core 125. Depending on the processor architecture, the branch instruction, and/or the pipeline operation, the branch condition for a particular branch instruction may be valid in multiple pipeline stages of processor core 125.
  • Branch decision logic 210 may include any suitable circuitry for making branch decisions. As described above, it can make the branch decision when PC offset 155 equals BNY 153, or when PC offset 155 and BNY 153 satisfy some relation (e.g., greater than), once the condition flags are signaled as ready. The result of branch decision logic 210 is output as 'TAKEN' signal 212 and 'BRANCH/JUMP' signal 213. The 'BRANCH/JUMP' signal tells tracker 170 that processor core 125 has reached the branch instruction and enables read pointer 151 to be updated. The 'TAKEN' signal reflects the actual outcome of the executing program and selects the correct next instruction to execute.
  • When the branch is taken, the next BNX is the target BNX and the next BNY is the target BNY.
  • The correct address that processor core 125 uses to fetch the branch target instruction (target BNX 152, target BNY 150) is therefore already prepared on the "Block Select 1" and "Offset 1" ports of high-level memory 124.
  • Table 2 shows the pipeline stage diagram when the branch is taken.
  • The row labeled "Instruction Address" gives the instruction memory address applied to "Block Select 1" (high-order address) and "Offset 1" (low-order address) of instruction memory 124. The row labeled "Get Instruction" corresponds to the instruction on "Read Port 1" of high-level memory 124.
  • Instruction 'i' is the branch instruction.
  • 'target' is the branch target instruction.
  • 'target +1' is the instruction following the branch target instruction, and so on.
  • Likewise, the correct address that processor core 125 uses to fetch the instruction immediately following the branch instruction (source BNX 152, PC offset 155) is also ready for ports "Block Select 1" and "Offset 1" of high-level memory 124. Processor core 125 can therefore continue pipeline operation without waiting. Further, under control of the control signals, tracker 170 can use the read pointer to reach the next branch point and continue branch processing as described earlier.
  • Table 3 shows the pipeline stage diagram when the branch is not taken. Instruction 'i' is the branch instruction, 'i+1' is the instruction following the branch instruction, and so on.
  • Figure 5A shows another pipeline control structure 2 of the present invention.
  • Decoder 11 decodes the fetched instructions and provides the operands required for execution.
  • The decode results and operands are sent to the execution unit and program counter (EX/PC). The EX/PC executes the instruction and calculates next instruction address 21 in the program stream.
  • Next instruction address 21 and branch target instruction address 18 are sent to instruction memory (or instruction cache) 22 through registers 24 and 23, respectively.
  • Instruction memory 22 may contain multiple ports for read/write operations.
  • For example, instruction memory 22 can include two address ports that receive next instruction address 21 and branch target instruction address 18. It then outputs the corresponding instructions on output ports 28 and 29, respectively.
  • The two instructions on output ports 28 and 29, corresponding to next instruction address 21 and branch target instruction address 18, are input to selector 26. Branch decision logic 13 provides control signal 14 to selector 26, which selects the input from port 28 or port 29 to send to decoder 11.
  • If branch decision logic 13 determines that the branch at the branch point is taken, instruction 29, corresponding to branch target instruction address 18, is output to decoder 11. If branch decision logic 13 determines that the branch is not taken, instruction 28, corresponding to next instruction address 21, is output to decoder 11. Branch decision logic 13 makes this decision before the branch point reaches its execution stage, or before the instruction is decoded. As a result, the pipeline loses no clock cycles waiting for the branch decision.
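  • A minimal sketch of pipeline control structure 2, assuming instructions are modelled as strings in a Python list. `DualPortInstructionMemory`, `read2`, and `fetch_after_branch` are hypothetical names. The sketch shows the essential point: both candidate instructions are read in the same cycle, so the later branch decision only has to drive selector 26.

```python
from typing import List, Tuple


class DualPortInstructionMemory:
    """Hypothetical model of instruction memory 22 with two read ports."""

    def __init__(self, program: List[str]) -> None:
        self.program = program

    def read2(self, addr_a: int, addr_b: int) -> Tuple[str, str]:
        # Port 28 returns the instruction at next instruction address 21,
        # port 29 the instruction at branch target instruction address 18.
        return self.program[addr_a], self.program[addr_b]


def fetch_after_branch(mem: DualPortInstructionMemory, next_addr: int,
                       target_addr: int, taken: bool) -> str:
    """Fetch both candidates in one cycle, then let control signal 14 pick one."""
    seq_instr, target_instr = mem.read2(next_addr, target_addr)
    # Selector 26: the choice is made before decoder 11 needs the instruction.
    return target_instr if taken else seq_instr
```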
  • FIG. 6 shows an embodiment of a processor environment 400 corresponding to the pipeline control structure 2.
  • Processor environment 400 is similar to processor environment 300 in FIG. 3. However, processor environment 400 differs from processor environment 300 in two ways. First, the branch decision logic is included in processor core 125. Second, high-level memory 124 provides two address ports, "Block Select 1, Offset 1" and "Block Select 2, Offset 2", and two read ports, "Read Port 1" 127 and "Read Port 2" 128.
  • When a branch instruction is processed, track table 126 provides the branch target instruction address, target BNX 201 and target BNY 202, to address port "Block Select 2, Offset 2". Meanwhile, read pointer 151 supplies block address BNX 152 of the next instruction to "Block Select 1", and processor core 125 provides the offset address of the next instruction to "Offset 1".
  • High-level memory 124 fetches the branch target instruction and the next instruction. It outputs them as fetched instruction 204 on "Read Port 2" 128 and fetched instruction 203 on "Read Port 1" 127, respectively.
  • Fetched instructions 204 and 203 are also the two inputs of selector 205, which is controlled by control signal 207 (i.e., the TAKEN signal from processor core 125).
  • Based on the TAKEN signal, selector 205 selects the correct fetched instruction as output 206 to processor core 125 before processor core 125 decodes it. If the branch is taken, the fetched branch target instruction is selected. If the branch is not taken, the fetched next instruction is selected.
  • Processor core 125 also provides a BRANCH/JUMP signal to tracker 170 to indicate that it has reached a branch instruction. The TAKEN signal at this point is the actual outcome of program execution and selects the correct next instruction.
  • Tracker 170 then uses the new address as read pointer (BN) 151.
  • Fetched instruction 204, which corresponds to the branch target instruction (target BNX 201, target BNY 202), has already been sent to processor core 125 as output 206. In this way, processor core 125 can continue pipeline operation without interruption.
  • If the branch is unconditional, the unconditional branch instruction can be treated as a special branch point whose condition is always satisfied and which requires no further decision.
  • Table 4 shows the pipeline stages when the branch is taken.
  • The row labeled "Instruction Address" gives the instruction memory address applied to "Block Select 1" (high-order address) and "Offset 1" (low-order address) of memory 124. The row labeled "Get Instruction" corresponds to the instruction on output 206 of selector 205.
  • The branch target instruction ("target") is fetched from high-level memory 124 together with the next instruction ("+1"), and the branch decision is made before the end of the decode stage. Since both instructions have been fetched, the correct one can be selected and used in its decode stage (clock cycle 4) whether or not the branch is taken. The instruction fetched after the branch point is therefore always valid, and the pipeline never needs to stall. Similarly, as shown in Table 4, "Read Port 2" provides the next branch target instruction in advance.
  • When the branch is taken, the branch target instruction from "Read Port 2" is selected in clock cycle 3 as the instruction that enters the decode stage in clock cycle 4.
  • The program counter (PC) of processor core 125 is then forced to the instruction following the branch target instruction (target +1) instead of the branch target instruction (target) itself.
  • From then on, the instruction address is incremented in the normal way until the next branch point address is reached.
  • Processor core 125 thus continues pipeline operation without stalling.
  • Table 5 shows the pipeline stages when the branch is not taken.
  • The instruction "i+1" following the branch instruction, from "Read Port 1", is selected in clock cycle 3 as the instruction that enters the decode stage in clock cycle 4. From then on, the instruction address is incremented in the normal way until the next branch point is reached.
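  • The addresses that enter the decode stage in Tables 4 and 5 can be sketched as follows, assuming instruction addresses advance by one per instruction. `decode_sequence` is a hypothetical helper. When the branch is taken, the target instruction comes from "Read Port 2", so the PC resumes counting at target + 1.

```python
from typing import List


def decode_sequence(branch_addr: int, target_addr: int, taken: bool,
                    count: int) -> List[int]:
    """Addresses entering the decode stage, starting with the branch instruction."""
    seq = [branch_addr]
    if taken:
        seq.append(target_addr)      # selected from "Read Port 2"
        pc = target_addr + 1         # PC forced to target + 1, not to target
    else:
        seq.append(branch_addr + 1)  # "i+1" selected from "Read Port 1"
        pc = branch_addr + 2
    while len(seq) < count:
        seq.append(pc)               # normal increment until the next branch point
        pc += 1
    return seq[:count]
```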
  • FIG. 5B shows a block diagram of pipeline control structure 3.
  • Pipeline control structure 3 is an alternative to pipeline control structure 2 described above.
  • Pipeline control structure 3 differs from pipeline control structure 2 in that it includes an additional memory 40.
  • Memory 40 may contain as many memory blocks as track table 126 has rows, each memory block corresponding to one row of track table 126.
  • Each memory block in memory 40 may contain as many storage units as there are track points (entries) in a row of track table 126.
  • Each branch target instruction is stored in its memory block in instruction memory 22 and also in the corresponding storage unit of memory 40.
  • Branch target address 18 is read from an entry of track table 126.
  • The content of that entry (the branch track point) is the BNX and BNY of the corresponding branch target instruction.
  • These BNX and BNY values can be used as an index to find the corresponding branch target instruction stored in memory 40.
  • The selected branch target instruction is sent to selector 26 via bus 29.
  • The next instruction is fetched from instruction memory 22 using next instruction address 21 and is also sent to selector 26, via bus 28.
  • As a result, instruction memory 22 in Figure 5B can be a single-port memory rather than the dual-port memory shown in Figure 5A.
  • Alternatively, track table 126 can store the branch target instruction itself. That is, the content of a branch track point includes the branch target instruction in addition to the address and offset of the branch target instruction.
  • Track table 126 can then provide branch target instructions directly to selector 26, where control signal 14 from branch decision logic 13 makes the selection. This configuration can be regarded as integrating memory 40 into track table 126.
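  • The following is a sketch of the variant in which memory 40 is folded into track table 126, so that each branch track point carries its own target instruction. Keying the dictionary by the branch's own (BNX, BNY) is an assumption made for illustration, as are all the class and function names. Only the sequential instruction has to come from instruction memory 22, which is why a single read port suffices.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Address = Tuple[int, int]  # (BNX, BNY)


@dataclass
class BranchTrackPoint:
    target: Address     # BNX and BNY of the branch target instruction
    target_instr: str   # the target instruction itself (memory 40 folded in)


class TrackTable:
    """Hypothetical model: branch track points keyed by the branch's own (BNX, BNY)."""

    def __init__(self) -> None:
        self.points: Dict[Address, BranchTrackPoint] = {}

    def record_branch(self, source: Address, target: Address, target_instr: str) -> None:
        self.points[source] = BranchTrackPoint(target, target_instr)


def selector_26(table: TrackTable, source: Address, next_instr: str, taken: bool) -> str:
    """Control signal 14 picks the stored target or the sequentially fetched instruction."""
    return table.points[source].target_instr if taken else next_instr
```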
  • The branch target instruction address can be determined in advance, since the branch target information and the branch type are already prepared. The branch decision can therefore be made as soon as processor core operations have set the branch condition flags.
  • Branch processing mainly consists of computing the branch target address and deciding the branch according to the branch type and the condition flags of the branch instruction. The decision can therefore be made before the branch instruction itself reaches its normal execution stage. In general, the earlier the branch decision is made, the fewer additional hardware resources are needed.
  • Various configurations can therefore be used so that the pipeline continues without stalling while a branch is processed.
  • FIG. 7 shows an embodiment of a processor environment 600 in accordance with the present invention.
  • In this embodiment, a read buffer provides two instructions: the branch target instruction of a branch instruction in the program stream of processor core 125, and the instruction immediately following that branch instruction.
  • Processor environment 600 is similar to processor environment 300 in Figure 3, with some differences.
  • Besides cache 124, processor core 125, track table 126, and tracker 170, processor environment 600 includes a read buffer 229 and a selector 225.
  • Read buffer 229 is coupled between cache 124 and processor core 125 and includes a memory module 216 and a selector 214.
  • Memory module 216 stores certain instructions.
  • Memory module 216 in read buffer 229 stores and provides either the branch target instruction or the subsequent instruction, while cache 124 provides the other one directly. This lets the same cache structure deliver higher bandwidth.
  • Based on the branch decision, selector 214 in read buffer 229 selects either the branch target instruction or the subsequent instruction, so that the instruction provided to processor core 125 after the branch instruction is valid (correct).
  • Specifically, selector 214 selects either the output of memory module 216 or the output of cache 124 as output 219 to processor core 125.
  • Selector 220 selects either the block address from track table 126 or the one from tracker 170 as output 224 (the block address) to cache 124. Selector 225 selects either the offset from track table 126 or the PC (program counter) offset from processor core 125 as output 224 (the offset address) to cache 124.
  • Control signal 215 from tracker 170 controls selectors 220 and 225 and memory module 216, while the 'TAKEN' signal controls selector 214.
  • Tracker 170 provides BNX 152 and BNY 153 so that track table 126 outputs the track point corresponding to BNX 152 and BNY 153.
  • The content read from that track point contains information such as the instruction type and the branch target address.
  • This content (e.g., the instruction type and branch target address) can be sent to tracker 170 via bus 150.
  • The upper portion of the branch target address (BNX) is sent to selector 220 as one input.
  • The BNY of the branch target address, or part of it (e.g., its two most significant bits), may likewise be sent to selector 225 via bus 222.
  • The other input of selector 220 may be the BNX provided by tracker 170. The other input of selector 225 may be the PC offset or part of it (e.g., its two most significant bits).
  • Memory module 216 can include a predetermined number of storage units for storing instructions, depending on the capacity of other components. For example, if a memory block (e.g., an instruction block) contains 16 instructions in total, BNY and the PC offset can each be 4 bits long. Assume four instructions are fetched from instruction memory (cache) 124 in one clock cycle. Memory module 216 can then store four instructions. The two most significant bits of BNY or the PC offset read four instructions from the memory block pointed to by BNX, and the two least significant bits select one of those four.
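  • The address split in this 16-instruction example can be sketched as follows. `split_offset` and `read_instruction` are hypothetical helpers, and a memory block is modelled as a Python list of 16 instructions.

```python
from typing import List, Tuple

INDEX_BITS = 2                # low offset bits: one of the instructions per read
GROUP_SIZE = 1 << INDEX_BITS  # 4 instructions returned by one cache read
BLOCK_SIZE = 16               # instructions per memory block, so a 4-bit offset


def split_offset(offset: int) -> Tuple[int, int]:
    """Split a 4-bit BNY / PC offset into (group, index)."""
    if not 0 <= offset < BLOCK_SIZE:
        raise ValueError("offset outside the memory block")
    return offset >> INDEX_BITS, offset & (GROUP_SIZE - 1)


def read_instruction(block: List[str], offset: int) -> str:
    group, index = split_offset(offset)
    # The two most significant bits choose which 4 instructions are read,
    four = block[group * GROUP_SIZE:(group + 1) * GROUP_SIZE]
    # and the two least significant bits pick one of them.
    return four[index]
```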
  • Although four instructions are fetched per clock cycle in this example, the number of instructions fetched per clock cycle can be any suitable number for a single-issue or multiple-issue processor.
  • The number of instructions fetched in one clock cycle (e.g., 4) may exceed the number of instructions executed by processor core 125 in one clock cycle (e.g., 1).
  • In certain clock cycles, therefore, memory module 216 can be filled from cache 124 by means of track table 126 and other related components.
  • Cache 124 can include a single-port memory module whose bandwidth exceeds the instruction issue rate of processor core 125. This supports both the filling of memory module 216 by tracker 170 and instruction fetching by processor core 125.
  • When tracker 170 detects that an instruction is a branch instruction, tracker 170 stops incrementing BNY.
  • The instruction type information can then serve as the write-enable signal of memory module 216, so that the four instructions currently output by cache 124 are written into memory module 216 via bus 217.
  • Signal 215 then controls selector 220 to select the BNX of the branch target instruction on bus 221 as the instruction block address. It also controls selector 225 to select the upper two bits of the branch target BNY on bus 222, which locate four instructions within that instruction block.
  • These four instructions include the branch target instruction and can be read in the next read cycle (the next clock cycle).
  • After the four instructions including the branch target instruction are stored in memory module 216, the PC offset is used again to read the next instruction.
  • Figure 8 shows an embodiment of reading instructions during operation in accordance with the teachings of the present invention.
  • Column 226 shows the value on output 218 of memory module 216.
  • Column 227 shows the value on output 217 of cache 124.
  • Column 228 shows the current instruction fetched by processor core 125.
  • Instructions I0, I1, I2 and I3 are four consecutive instructions that share the same two most significant PC-offset bits. I2 is a branch instruction.
  • The branch target instruction of branch instruction I2 is T1.
  • Instructions T0, T1, T2 and T3 are likewise four consecutive instructions that share the same two most significant offset bits.
  • Each row represents a successive clock cycle or execution cycle (one execution cycle may contain more than one clock cycle).
  • The four rows correspond to cycles i, i+1, i+2 and i+3, respectively. It is further assumed that the 'TAKEN' signal (i.e., whether the branch of the branch instruction is taken) is generated in the cycle after the branch instruction is fetched.
  • In cycle i, assume the PC offset points to I0 and the read pointer has reached the track point corresponding to branch instruction I2.
  • Selector 214 selects the output of cache 124 as output 219. The two least significant bits of the PC offset select instruction I0, which processor core 125 requires, from the four consecutive instructions.
  • The read pointer stops at the branch track point, and the four instructions output by cache 124 are stored in memory module 216. The branch target address is then used as the instruction address for the next cycle (cycle i+1) to fetch the four instructions that include the branch target instruction.
  • In cycle i+1, memory module 216 holds instructions I0, I1, I2 and I3, while cache 124 outputs instructions T0, T1, T2 and T3.
  • Selector 214 selects the output of memory module 216 as output 219. The two least significant bits select instruction I1, which processor core 125 requires, from the four instructions on bus 219. Also in cycle i+1, the four instructions T0, T1, T2 and T3 are written into memory module 216. The BNX of the track point pointed to by the read pointer, together with the PC offset, forms the address of the next cycle's instruction (I2).
  • In cycle i+2, memory module 216 holds and outputs instructions T0, T1, T2 and T3, while cache 124 outputs instructions I0, I1, I2 and I3.
  • Selector 214 selects the output of cache 124 as output 219. The two least significant bits of the PC offset select instruction I2, which processor core 125 requires, from the four instructions on bus 219.
  • The address of the next instruction (I3) is used as the instruction address for the next cycle.
  • In cycle i+3, memory module 216 still holds and outputs instructions T0, T1, T2 and T3, while cache 124 outputs instructions I0, I1, I2 and I3.
  • Selector 214 then selects either the output of cache 124 or the output of memory module 216 as output 219, depending on whether the branch of the branch instruction is taken.
  • That is, the 'TAKEN' signal selects the output of cache 124 (I3, branch not taken) or the output of memory module 216 (T1, branch taken).
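  • The four cycles of FIG. 8 can be replayed with the following sketch. It hard-codes the example (I2 branches to T1) and assumes, as stated above, that the 'TAKEN' signal is available in cycle i+3. Each row lists column 226 (memory module 216), column 227 (cache 124), and column 228 (the instruction delivered to processor core 125). The function name, the tuple layout, and the "-" placeholder for the not-yet-filled module are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[str, ...]
SEQ: Group = ("I0", "I1", "I2", "I3")  # I2 is the branch instruction
TGT: Group = ("T0", "T1", "T2", "T3")  # T1 is its branch target
EMPTY: Group = ("-", "-", "-", "-")


def fig8_trace(taken: bool) -> List[Tuple[str, Group, Group, str]]:
    """Rows (cycle, column 226, column 227, column 228) for the FIG. 8 example."""
    rows = []
    module = EMPTY
    # Cycle i: cache 124 reads SEQ with the PC offset; selector 214 picks the cache.
    cache = SEQ
    rows.append(("i", module, cache, cache[0]))
    module = cache  # store I0..I3; the branch target address drives the next read
    # Cycle i+1: cache 124 outputs TGT; selector 214 picks memory module 216.
    cache = TGT
    rows.append(("i+1", module, cache, module[1]))
    module = cache  # store T0..T3; BNX plus PC offset drives the next read
    # Cycle i+2: cache 124 outputs SEQ again; selector 214 picks the cache.
    cache = SEQ
    rows.append(("i+2", module, cache, cache[2]))
    # Cycle i+3: TAKEN picks T1 from the module or I3 from the cache.
    rows.append(("i+3", module, cache, module[1] if taken else cache[3]))
    return rows
```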
  • Alternatively, instructions I3 and T1 may be provided to processor core 125 at the same time. Processor core 125 decodes I3 and T1 separately and fetches the operands of both at once.
  • Processor core 125 then selects the decode result of T1 or of I3, together with the corresponding operands. Specifically, suppose the read pointer reaches the track point corresponding to branch instruction I2 while processor core 125 is fetching an instruction close to I2 (for example, I1). In that case, cache 124 may begin outputting I0, I1, I2 and I3 after I2 has been fetched.
  • Processor core 125 can still obtain I3 and T1 from cache 124 and memory module 216, respectively.
  • In this case, an exclusive-OR (XOR) gate can be used to invert the select signal of selector 214. The same TAKEN signal then selects the branch target instruction (or the four instructions that include it), or the next instruction (or the four instructions that include it), from whichever of cache 124 and memory module 216 currently holds it.
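  • The XOR inversion of the select signal can be sketched as follows. `target_in_module` is a hypothetical flag meaning "the target group is in memory module 216 and cache 124 is outputting the sequential group". In the normal case it is false, so a taken branch selects cache 124.

```python
def selector_214(taken: bool, target_in_module: bool,
                 cache_out: str, module_out: str) -> str:
    """Select output 219 using TAKEN XOR target_in_module (a sketch)."""
    # The XOR gate inverts the raw TAKEN signal whenever the roles of the two
    # sources are swapped, so TAKEN always ends up selecting the target.
    select_cache = taken ^ target_in_module
    return cache_out if select_cache else module_out
```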
  • The four instructions T0, T1, T2 and T3 then need not be stored in memory module 216, regardless of whether the branch is taken.
  • Figure 9 shows another pipeline control structure 4 of the present invention.
  • Pipeline control structure 4 is similar to pipeline control structure 2 of FIG. 5A.
  • Pipeline control structure 4 differs from pipeline control structure 2 in that it includes two independent decoders, decoder 25 and decoder 26, instead of the single decoder 11.
  • The two instructions fetched from instruction memory 22 are decoded by decoder 25 and decoder 26, respectively. Decode results 31 and 32 are sent to selector 33, which is controlled by control signal 14 from branch decision logic 13.
  • If branch decision logic 13 determines that the branch at the branch point is taken, decode result 32, corresponding to branch target instruction address 18, is selected and sent to execution unit 12. If branch decision logic 13 determines that the branch is not taken, decode result 31, corresponding to next instruction address 21, is selected and sent to execution unit 12. Branch decision logic 13 can complete the decision by the end of the branch instruction's execution stage, before the execution stage of the next instruction. The pipeline therefore loses no clock cycles waiting for the branch result.
  • Thus, besides deciding the branch before the branch point executes, branch decision logic 13 can also decide it in a normal pipeline stage, such as at the end of the branch instruction's execution stage. All instructions that processor core 125 may execute after the branch point have already been fetched and decoded, and their instruction types are known. Branch decisions therefore cause no pipeline stalls.
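  • A sketch of pipeline control structure 4, assuming a toy text decoder. `toy_decode` and `decode_both_then_select` are hypothetical names. Decoder 25 is assumed to handle the sequential instruction (result 31) and decoder 26 the branch target (result 32), matching the pairing described above.

```python
from typing import Callable, Dict

Decoded = Dict[str, str]


def toy_decode(instr: str) -> Decoded:
    """Hypothetical decoder: splits "op operands" text into two fields."""
    op, _, operands = instr.partition(" ")
    return {"op": op, "operands": operands}


def decode_both_then_select(seq_instr: str, target_instr: str, taken: bool,
                            decode: Callable[[str], Decoded] = toy_decode) -> Decoded:
    result_31 = decode(seq_instr)     # decoder 25: instruction at next address 21
    result_32 = decode(target_instr)  # decoder 26: instruction at target address 18
    # Selector 33 (control signal 14): both results already exist, so the choice
    # can wait until the end of the branch instruction's execute stage.
    return result_32 if taken else result_31
```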
  • Processor core 125 has been described as executing one instruction at a time. It may also execute more than one instruction at a time (i.e., it may be a multiple-issue processor), and the above examples apply equally to that case.
  • Similarly, a 5-stage pipeline has been described, but pipelines with any other number of stages, in various pipeline structures, are also possible.
  • Clock cycles lost to branch processing can also be reduced by preprocessing executable instructions or by using predefined instructions.
  • For example, a branch instruction can be combined with a non-branch instruction into a compound instruction. The branch is then processed while the non-branch instruction is being processed, which reduces the clock-cycle penalty of the branch to zero or to a minimum.
  • A processor instruction set typically contains some reserved or unused instructions, and some non-branch instructions have reserved or unused fields. These non-branch instructions can carry the branch condition and the branch target address or offset of a branch instruction. When such a non-branch instruction is executed, the branch condition can be evaluated and the branch performed during its execution, achieving zero-cost branch processing. Branch instructions account for roughly 20% of the instructions a processor executes. Reducing the number of executed instructions by that 20% can therefore significantly increase processor performance.
  • For example, suppose a type of addition instruction consists of a 5-bit opcode and three operands (two sources and one destination), each given as a 4-bit register number. The addition instruction then uses 17 bits in total, leaving the remaining 15 bits (of a 32-bit instruction word) unused.
  • Suppose also that a type of branch instruction makes its branch decision by comparing the values of two registers.
  • Such a branch instruction can contain a 5-bit opcode, a 5-bit branch offset, and two 4-bit register numbers.
  • The branch instruction therefore uses 18 bits.
  • When the addition instruction and the branch instruction are combined into a compound instruction (e.g., "add-and-branch"), one bit can be added to the 5-bit opcode to identify the compound instruction.
  • This add-and-branch instruction contains a 6-bit opcode, three register numbers for the addition (12 bits in total), two register numbers for the branch (8 bits in total), and a 5-bit branch offset, for a total of 31 bits.
  • The branch can thus be executed while the addition is being executed, making zero-cost branch processing possible.
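  • The 31-bit add-and-branch format can be checked with the following sketch. The field widths come from the text. The field order, the field names, and treating the offset as an unsigned 5-bit value are assumptions.

```python
from typing import Dict, List, Tuple

# Field widths from the text; the order (most significant first) is assumed.
FIELDS: List[Tuple[str, int]] = [
    ("opcode", 6),                      # 5-bit opcode + 1 compound-instruction bit
    ("rd", 4), ("rs1", 4), ("rs2", 4),  # addition: destination and two sources
    ("ra", 4), ("rb", 4),               # branch: the two registers compared
    ("offset", 5),                      # branch offset
]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 6 + 12 + 8 + 5 = 31


def encode_add_branch(values: Dict[str, int]) -> int:
    """Pack the fields into one integer word, most significant field first."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_add_branch(word: int) -> Dict[str, int]:
    """Unpack the fields again, starting from the least significant one."""
    values: Dict[str, int] = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```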
  • As another example, certain execution-type instructions may have a 6-bit opcode and three 5-bit register numbers, 21 bits in total. This leaves 11 bits (of a 32-bit word) for an additional branch operation.
  • This branch operation can be of a fixed type, for example a branch that is taken when the value of a particular register is non-zero.
  • One of these 11 bits can serve as a branch bit, and the other 10 bits as a branch offset.
  • If the branch bit is "0", the instruction is an ordinary executable instruction.
  • If the branch bit is "1", the instruction also acts as a branch instruction, in addition to performing its executable operation (addition, etc.).
  • In that case, if the register contents are not zero, they are decremented by one. Execution then branches to the instruction whose address is the compound instruction's address plus the branch offset.
  • If the register contents equal zero, the branch is not taken, and the next instruction executed is the one immediately following the compound instruction. This type of instruction can save two clock cycles per loop iteration.
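  • The branch-bit variant can be sketched as follows. Several details are assumptions: the branch bit sits directly above the 10-bit offset, the offset is two's complement, and addresses are counted in instructions. `split_low11` and `branch_part` are hypothetical names. With a counter of 3, the compound instruction branches back three times and falls through on its fourth execution.

```python
from typing import Tuple

OFFSET_BITS = 10


def split_low11(field: int) -> Tuple[bool, int]:
    """Split the spare 11 bits into (branch bit, signed 10-bit branch offset)."""
    branch_bit = bool((field >> OFFSET_BITS) & 1)
    raw = field & ((1 << OFFSET_BITS) - 1)
    # Assumed two's complement, so that loops can branch backwards.
    offset = raw - (1 << OFFSET_BITS) if raw >> (OFFSET_BITS - 1) else raw
    return branch_bit, offset


def branch_part(pc: int, counter: int, branch_bit: bool, offset: int) -> Tuple[int, int]:
    """Return (next_pc, new_counter) for the branch half of the compound instruction."""
    if branch_bit and counter != 0:
        return pc + offset, counter - 1  # decrement and branch to pc + offset
    return pc + 1, counter              # fall through to the following instruction
```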
  • FIG. 10 shows an embodiment of a processor environment 1000 in accordance with the present invention.
  • In this embodiment, read buffer 229 provides the branch target instruction of a branch instruction in the program stream of processor core 125, together with the instructions that follow the branch instruction.
  • Processor environment 1000 is similar to processor environment 600 in Figure 7, with some differences.
  • Processor environment 1000 includes a read buffer 229 in addition to cache 124, processor core 125, track table 126, and tracker 170.
  • Read buffer 229 is coupled between cache 124 and processor core 125 and includes a memory module 216 and a selector 214.
  • Memory module 216 stores certain instructions, such as the contents of a memory block in cache 124.
  • Memory module 216 in read buffer 229 stores and provides the instructions that follow the branch instruction (and, after a taken branch, those that follow the branch target instruction). Cache 124 provides the branch target instruction itself directly, so the same cache 124 can provide higher bandwidth.
  • Based on the branch decision, selector 214 in read buffer 229 selects either the branch target instruction (from cache 124) or the instruction following the branch instruction (from memory module 216) as output 219 to processor core 125. This ensures that the instructions provided to processor core 125 after the branch instruction are valid (correct).
  • The branch target address on bus 150, read from track table 126, is sent to cache 124 as a block address and an intra-block offset address. PC offset 155 (an intra-block offset address) from processor core 125 is sent to memory module 216.
  • The 'TAKEN' signal from processor core 125 controls selector 214.
  • During operation, tracker 170 provides BNX 152 and BNY 153 for addressing, so that track table 126 outputs the track point corresponding to BNX 152 and BNY 153.
  • The content read from that track point contains information such as the instruction type and the branch target address. This content can be sent to tracker 170 via bus 150.
  • Branch target block address 221 (target BNX) and branch target offset address 222 (target BNY) on bus 150 are sent to cache 124.
  • The branch target instruction (possibly together with other instructions in the same memory block) is fetched from cache 124 and placed on bus 217. Bus 217 feeds the write port of memory module 216 and one input of selector 214.
  • Branch target block address 221 and branch target offset address 222 can also be latched in a register before being sent to cache 124 for addressing.
  • Memory module 216 can include a specific number of storage units for storing instructions, for example enough to hold all instructions of one memory block (e.g., an instruction block).
  • Processor core 125 provides in-block offset 155 to address memory module 216. From the instructions it stores, memory module 216 selects the single instruction or multiple instructions that processor core 125 is to execute and sends them to an input of selector 214.
  • Processor core 125 also sends the 'TAKEN' signal and the 'BRANCH/JUMP' signal to tracker 170 to convey whether the branch is taken.
  • The 'TAKEN' signal also serves as the select control of selector 214. It is also sent to memory module 216 to decide whether to replace the contents of memory module 216 with the instruction block output by cache 124.
  • The instructions selected from memory module 216 and placed on the input of selector 214 include the one or more instructions following the branch instruction. If the branch is not taken, the 'TAKEN' signal controls selector 214 to select the output of memory module 216 (the instructions following the branch instruction). It also controls memory module 216 to keep its existing contents unchanged. Processor core 125 then executes the instructions following the branch instruction, while tracker 170 moves to the next branch instruction in the same row of the track table and repeats the above operation.
  • If the branch is taken, the 'TAKEN' signal controls selector 214 to select the output of cache 124 (the branch target). It also controls memory module 216 to update its contents with the output of cache 124.
  • Processor core 125 then executes the branch target instruction and the instructions after it.
  • Tracker 170 moves to the track table entry where the branch target instruction is located. Thereafter, PC offset 155 selects instructions in memory module 216 (the instructions following the branch target instruction) for execution by processor core 125. Tracker 170 moves to the next branch instruction in the same row of the track table and repeats the above operations.
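  • The update rule for memory module 216 in FIG. 10 can be sketched as follows. The sketch assumes one memory block is held as a Python list and indexed directly by the in-block offset. The class and method names are hypothetical.

```python
from typing import List


class ReadBuffer229:
    """Hypothetical model of read buffer 229 in FIG. 10."""

    def __init__(self, current_block: List[str]) -> None:
        self.module_216 = list(current_block)  # block holding the fall-through path

    def after_branch(self, pc_offset: int, cache_block: List[str],
                     target_bny: int, taken: bool) -> str:
        """Instruction sent to the core right after the branch.

        cache_block is the block read from cache 124 with the target BNX;
        pc_offset already points at the instruction following the branch.
        """
        if taken:
            # Replace module 216 with the target block so that later PC offsets
            # read the instructions after the branch target from it.
            self.module_216 = list(cache_block)
            return cache_block[target_bny]
        # Not taken: keep module 216 unchanged and continue sequentially.
        return self.module_216[pc_offset]

    def sequential(self, pc_offset: int) -> str:
        return self.module_216[pc_offset]
```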
  • when the processor core 125 executes the branch instruction corresponding to a branch point, both the branch target instruction and the instruction following the branch point can be provided simultaneously, so that the correct instruction is selected according to whether the branch is taken.
  • An unconditional branch point can be appended after the last instruction in a track; its branch target instruction is the instruction that immediately follows that last instruction in the program stream.
  • FIG. 11 shows a schematic 1100 of the branch prediction method of the present invention.
  • the instruction stream 1101 is an instruction stream composed of a series of sequentially executed instructions, and the execution order is from left to right.
  • the instruction 1102 on the instruction stream 1101 is a branch instruction.
  • the instructions 1103, 1104 and 1105 on the instruction stream 1101 all change the branch condition (or condition flag) of the branch instruction 1102, and instruction 1105 is the last of them to do so.
  • the branch condition or condition flag
  • Figure 12 is an embodiment 1200 of the branch prediction of the present invention.
  • the branch prediction system 1200 is composed of three parts: an instruction buffer 1201, a pre-detection control unit 1202, and a time point detection unit 1203.
  • the instruction buffer 1201 stores the instruction 1205 currently being executed and the instructions following it.
  • the time point detection unit 1203 includes one position register for each branch decision condition (or condition flag).
  • depending on the processor's instruction set architecture, a branch decision condition (or condition flag) may be a general-purpose register, a status register, or a flag bit. Whether a branch is taken may be determined by comparing different branch decision conditions (or condition flags) with one another, or by comparing a branch decision condition (or condition flag) with a preset value.
  • the pre-detection control unit 1202 controls the lead pointer 1204 to scan subsequent instructions from the current instruction 1205 along the instruction buffer at a faster rate than the processor program counter (PC) until the first branch instruction 1206 is reached.
  • the instruction pointed to by the leading pointer is read out and sent to the time point detection unit 1203. Because the processor has only a limited number of conditions (or condition flags) used for branch decisions, the decoder 1207 in the time point detection unit 1203 can determine whether the instruction pointed to by the leading pointer 1204 changes the value of one or more of these conditions (or condition flags), and, if so, which ones.
  • if so, the position information of that instruction is written into the position register corresponding to the changed condition.
  • for simplicity, the branch prediction system 1200 in this example uses only two branch decision conditions (COND1 and COND2). When there are more conditions (or condition flags), the branch prediction system 1200 can be implemented in the same way.
  • in this example, three instructions between the current instruction 1205 and the first branch instruction 1206 change a decision condition: instruction 1208, which changes COND1, has position information '3'; instruction 1209, which changes COND2, has position information '4'; and instruction 1210, which also changes COND2, has position information '7'.
  • when the leading pointer 1204 points to instruction 1208, the instruction is read and sent to the decoding unit 1207 via the bus 1211. Decoding shows that it changes the value of COND1, so its position information '3' is written into the position register 1212 corresponding to COND1.
  • as the leading pointer 1204 then points to instruction 1209 and instruction 1210, their position information '4' and '7' is written in turn into the position register 1213 corresponding to COND2, the later value overwriting the earlier one.
  • thus, the position registers 1212 and 1213 hold the position information of the last instruction before the branch instruction 1206 that updates COND1 and COND2, respectively.
  • when the leading pointer 1204 reaches the branch instruction 1206, the instruction is read and sent to the decoding unit 1207 via the bus 1211. Decoding identifies it as a branch instruction, so a stop signal is sent through the control line 1216 to the pre-detection control unit 1202, causing the leading pointer 1204 to stay at the branch instruction 1206.
  • the decoding unit 1207 also uses the control line 1215 to select, from the position registers of all branch conditions, the value of the position register for the condition tested by the branch instruction 1206, and outputs that value to the comparison unit 1218.
  • the other input to the comparison unit 1218 is the current instruction position information 1214, i.e., the position of the current instruction that has completed its condition value update.
  • when the current instruction position information 1214 equals the position of the instruction that last updates the branch decision condition before the branch instruction 1206, the comparison unit 1218 outputs "equal" to the control unit 1219. This indicates that the condition value has been updated and can be used to determine whether the branch condition is satisfied.
  • the control unit 1219 can then issue a "decidable" signal 1220, allowing the processor to evaluate the branch instruction 1206 and thus resolve the branch in advance.
  • the time point detection unit 1203 may also obtain the information needed to generate the signal 1220 from registers, the instruction buffer 1201, or any other suitable source in the processor. Likewise, the time point detection unit 1203 can send the necessary information to the processor so that the processor generates the signal 1220.
  • alternatively, rather than sending the values of all position registers for the required branch decision conditions to the comparison unit 1218, the decoding unit 1207 may issue a control signal that selects the largest value (position value) among those position registers and outputs it to the comparison unit 1218.
  • when the comparison unit 1218 outputs "equal" to the control unit 1219, or when the current instruction position information 1214 is greater than or equal to the selected position register value, all condition values required by the branch instruction have been updated.
  • the value of the program counter may also be used as the current instruction position information 1214.
  • the apparatus and method proposed by the present invention can be used in a variety of processor related applications such as general purpose processors, special purpose processors, system on a chip (SOC) applications, application specific integrated circuit (ASIC) applications, and other computing systems.
  • the apparatus and method proposed by the present invention can be used in high performance processors to increase its pipeline efficiency and overall system performance.
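The time point detection of Figure 12 can be modelled in software as shown here. This is a minimal sketch under stated assumptions and does not reproduce the patent's hardware. The `Instr` record, the functions `scan_to_branch` and `decidable`, and the branch position 9 are hypothetical. Position values are assumed to increase along the instruction stream, as in the '3', '4', '7' example. The sketch follows the "largest position register" variant: the branch is treated as decidable once the last instruction that completed its condition update has reached that largest position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical pre-decoded instruction (not the patent's format)."""
    writes: frozenset = frozenset()   # branch conditions this instruction updates
    is_branch: bool = False
    reads: frozenset = frozenset()    # conditions a branch instruction tests


def scan_to_branch(buffer, current):
    """Move a leading pointer from `current` to the first branch instruction.

    Returns the branch position and a dict acting as one position register
    per condition. A later writer of the same condition overwrites an earlier
    one, so each register ends up holding the position of the last update.
    """
    position_regs = {}
    lead = current
    while lead < len(buffer) and not buffer[lead].is_branch:
        for cond in buffer[lead].writes:
            position_regs[cond] = lead
        lead += 1
    return lead, position_regs


def decidable(branch, position_regs, completed_pos):
    """Return True when every condition the branch tests has been updated.

    Compares the largest relevant position register value with the position
    of the last instruction that completed its condition update. Conditions
    with no writer between the current instruction and the branch are
    assumed to be ready already.
    """
    pending = [position_regs[c] for c in branch.reads if c in position_regs]
    return not pending or completed_pos >= max(pending)


if __name__ == "__main__":
    # Example layout: COND1 written at 3, COND2 at 4 and 7, branch at 9.
    buf = [Instr() for _ in range(10)]
    buf[3] = Instr(writes=frozenset({"COND1"}))
    buf[4] = Instr(writes=frozenset({"COND2"}))
    buf[7] = Instr(writes=frozenset({"COND2"}))
    buf[9] = Instr(is_branch=True, reads=frozenset({"COND1", "COND2"}))

    lead, regs = scan_to_branch(buf, 0)
    print(lead, regs)                    # 9 {'COND1': 3, 'COND2': 7}
    print(decidable(buf[9], regs, 4))    # False: COND2's last writer (7) not done
    print(decidable(buf[9], regs, 7))    # True
```

A dict stands in for the fixed set of position registers here. In hardware, each condition has its own register, and the overwrite happens as the leading pointer passes each writer.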

Abstract

A method is provided for controlling a pipeline operation of a processor. The processor is connected to a memory containing executable computer instructions. The method comprises determining whether the instruction to be executed by the processor is a branch instruction, and providing both an address of a branch target instruction of the branch instruction and an address of a next instruction following the branch instruction in a program sequence. The method also comprises determining a branch decision with respect to the branch instruction based on at least the address of the provided branch target instruction, and selecting at least one of the branch target instruction and the next instruction as a proper instruction to be executed by an execution unit, based on the branch decision and before the branch instruction reaches the execution stage in the pipeline, such that the pipeline operation is not stalled whether or not a branch is taken with respect to the branch instruction.

Description

Branch processing method and system
Technical field
The present invention relates to the field of electronic computer and microprocessor architecture, and in particular to a branch processing method and system.
Background art
Control hazards, also called branch hazards, are a major cause of pipeline performance loss. When processing a branch instruction, a conventional processor cannot know in advance where to fetch the next instruction to be executed after the branch instruction; it must wait until the branch instruction completes, which leaves empty cycles after the branch instruction in the pipeline. Figure 1 shows a conventional pipeline structure, and its pipeline stages for a branch instruction are listed in Table 1.
Table 1. Pipeline stages of a branch instruction (when the branch is taken)

Instruction   Pipeline stages
i             IF  ID     EX     MEM    WB
i+1           IF  stall  stall  stall
target        IF  ID     EX     MEM
target+1      IF  ID     EX
target+2      IF  ID

Clock cycle:          1  2    3       4         5         6         7
Instruction address:  i  i+1  target  target+1  target+2  target+3  target+4
Fetched instruction:  i, target, target+1, target+2, target+3
Referring to Figure 1 together with Table 1, the columns in Table 1 represent clock cycles in the pipeline and the rows represent instructions in program order. The instruction address is provided to the instruction memory to address the instruction being fetched, and the output of the instruction memory is then sent to the decoder to decode the fetched instruction. The pipeline comprises instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write back (WB). "stall" denotes a pipeline stall, i.e., an empty cycle.
Table 1 shows a branch instruction labeled 'i', which is fetched in clock cycle '1'. In addition, 'i+1' denotes the instruction immediately following the branch instruction, "target" denotes the branch target instruction of the branch point, and "target+1", "target+2", "target+3" and "target+4" denote the sequential instructions immediately following the branch target instruction.
As shown in Table 1, in clock cycle '2' the processor obtains the branch instruction 'i'. In clock cycle '3', the processor fetches instruction 'i+1' and decodes branch instruction 'i'. Assume that the branch target address can be calculated, and the branch decision completed, at the end of the branch instruction's decode stage. If the branch is taken, the branch target address is saved as the next address from which to fetch the next instruction. In clock cycle '4', the branch target instruction is fetched, and it is decoded and executed in subsequent cycles. From then on, the pipeline processes the instructions following the branch target instruction. However, instruction 'i+1', which was already fetched immediately after the branch instruction, must not be executed, so the pipeline stalls on instruction 'i+1'. Thus, whenever a branch is taken, the pipeline incurs a one-clock-cycle stall, which significantly reduces pipeline performance.
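The cost of that one-cycle penalty can be made concrete with a simple cycle count. This is a minimal sketch that assumes an in-order pipeline issuing one instruction per cycle with no other hazards. The function name `pipeline_cycles` and the workload numbers are hypothetical, and a fixed penalty per taken branch is assumed.

```python
def pipeline_cycles(n_instructions: int, n_taken: int,
                    stages: int = 5, taken_penalty: int = 1) -> int:
    """Cycles for an in-order pipeline that issues one instruction per cycle.

    (stages - 1) cycles fill the pipeline, then instructions complete one
    cycle apart, and every taken branch adds `taken_penalty` stall cycles
    (one cycle in the Table 1 scenario, where the branch resolves in ID).
    """
    if n_taken > n_instructions:
        raise ValueError("more taken branches than instructions")
    return (stages - 1) + n_instructions + n_taken * taken_penalty


# Hypothetical workload: 1000 instructions, 120 of them taken branches.
print(pipeline_cycles(1000, 120))                   # 1124
print(pipeline_cycles(1000, 120, taken_penalty=0))  # 1004, the stall-free ideal
```

Under these assumptions, a taken branch on roughly one instruction in eight costs about 12% more cycles than the stall-free case.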
Technical problem
To reduce the adverse effect of branch processing on pipeline performance, various static and dynamic branch prediction methods have been proposed, such as delay slots, branch prediction buffers, branch target buffers, and trace caches. However, these methods usually predict based on the processor's previous execution results, so mispredictions still cause performance loss.
Technical solution
The method and system proposed by the present invention can be used to solve one or more of the above problems, as well as other problems.
The present invention provides a method of controlling processor pipeline operation. The processor is coupled to a memory containing executable computer instructions. The method includes determining whether the instruction to be executed by the processor is a branch instruction, and providing the branch target instruction address of the branch instruction and the address of the next instruction following the branch instruction in the program sequence. The method further includes determining a branch decision for the branch instruction based at least on the address of the branch target instruction. According to that branch decision, and before the branch instruction reaches its execute stage in the pipeline, at least one of the branch target instruction and the next instruction is selected as the instruction to be executed by the execution unit, so that the pipeline does not stall whether or not the branch is taken.
The present invention also provides a pipeline control system for controlling processor pipeline operation. The processor is coupled to a memory containing executable computer instructions. The system includes an examination unit, an addressing unit, a branch logic unit, and a selector. The examination unit determines whether the instruction to be executed by the processor is a branch instruction. The addressing unit is coupled to the processor and provides the branch target instruction address of the branch instruction and the address of the next instruction following the branch instruction in the program sequence. The branch logic unit determines a branch decision for the branch instruction based at least on the branch target instruction address provided by the addressing unit. According to the branch decision provided by the branch logic unit, and before the branch instruction reaches its execute stage in the pipeline, the selector selects at least one of the branch target instruction and the next instruction as the instruction to be executed by the execution unit, so that the pipeline does not stall whether or not the branch is taken.
The present invention further provides a method of controlling processor pipeline operation. The processor is coupled to a memory containing executable computer instructions. The method includes determining whether the instruction to be executed by the processor is a branch instruction, and providing the branch target instruction address of the branch instruction and the address of the next instruction following the branch instruction in the program sequence. The method further includes fetching the branch target instruction and the next instruction according to the branch target instruction address and the next instruction address, respectively. The method also includes decoding the fetched branch target instruction and next instruction. Then, according to the branch decision provided by the processor, either the decoded branch target instruction or the decoded next instruction is selected and sent to the execution unit, so that the pipeline does not stall whether or not the branch is taken.
The present invention further provides a pipeline control system for controlling processor pipeline operation. The processor is coupled to a memory containing executable computer instructions. The pipeline control system includes an addressing unit coupled to the processor, which provides the branch target instruction address of a branch instruction and the address of the next instruction following the branch instruction in the program sequence. The pipeline control system further includes a read buffer coupled between the memory and the processor, which stores at least one of the branch target instruction and the next instruction. The read buffer further includes a selector coupled to the processor. When the branch instruction is executed, this selector provides either the branch target instruction or the next instruction to the processor, so that the pipeline does not stall whether or not the branch is taken.
Other aspects of the present invention can be understood and appreciated by those skilled in the art in light of the description, claims, and drawings of the present invention.
Beneficial effects
The system and method of the present invention provide a fundamental solution for branch processing in pipelined processors. They obtain the address of the branch target instruction before the branch point is executed, and they use various branch decision logic to eliminate the efficiency loss caused by branch misprediction. Those skilled in the art can also derive other advantages and benefits of the present invention.
Brief description of the drawings
Figure 1 shows the control structure of a conventional pipeline;
Figure 2 shows an embodiment of a pipeline control structure according to the present invention;
Figure 3 shows an embodiment of a processor system according to the present invention;
Figure 4 shows an embodiment of a track table according to the present invention;
Figure 5A shows an embodiment of another pipeline control structure according to the present invention;
Figure 5B shows an embodiment of another pipeline control structure according to the present invention;
Figure 6 shows an embodiment of another processor system according to the present invention;
Figure 7 shows an embodiment of another processor system according to the present invention;
Figure 8 shows an embodiment of different instruction values during operation according to the present invention;
Figure 9 shows an embodiment of another pipeline control structure according to the present invention;
Figure 10 shows an embodiment of a processor environment according to the present invention;
Figure 11 is a schematic diagram of a branch prediction method according to the present invention; and
Figure 12 shows an embodiment of branch prediction according to the present invention.
Best mode for carrying out the invention
Figure 3 shows a preferred embodiment of the invention.
Embodiments of the invention
Although the invention is susceptible to various modifications and alternative forms, some specific embodiments are shown in the specification and described in detail. It should be understood that the inventor does not intend to limit the invention to the particular embodiments set forth. On the contrary, the intent is to cover all improvements, equivalent transformations, and modifications within the spirit or scope defined by the claims. The same reference numerals may be used throughout the figures to denote the same or similar parts.
Figure 2 shows an example of a pipeline control structure 1 consistent with the disclosed invention. For ease of illustration, the pipeline comprises instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write back (WB); other pipeline structures may also be used. As shown in Figure 2, decoder 11 fetches instructions from instruction memory (or instruction cache) 10 via instruction bus 16. Decoder 11 decodes each fetched instruction and prepares operands for subsequent operations. The decoded instruction and its operands are sent to the execution unit and program counter 12 (EX/PC), which executes the instruction and calculates the address 21 of the next instruction in the program sequence. The next-instruction address 21 serves as one input of the selector 20.
Meanwhile, if a fetched instruction is a branch point, the instruction address of its branch target has already been calculated before the program counter reaches the branch point, as described in detail in subsequent paragraphs. The pre-calculated branch target address serves as the other input 18 of selector 20. In addition, a branch decision unit 13 provides a branch control signal 14 to control selector 20. The branch control signal 14 can be generated based on the branch type and the branch condition (or a condition flag). It determines which input selector 20 outputs to register 17 and address bus 19. The address on bus 19 is then used to fetch the next instruction from instruction memory 10.
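The selection performed by selector 20 can be sketched in a few lines. This is a software model under assumptions, not the circuit itself. The function names, the three-way `branch_type` encoding and the example addresses (byte addresses with 4-byte instructions) are hypothetical. The point it illustrates is that the target address is already available as input 18 when the decision arrives, so no extra cycle is spent computing it.

```python
from typing import Optional


def branch_control(branch_type: str, condition_true: bool) -> bool:
    """Hypothetical model of branch control signal 14 from branch decision unit 13.

    `branch_type` uses an assumed encoding: "none", "conditional" or
    "unconditional".
    """
    if branch_type == "unconditional":
        return True
    if branch_type == "conditional":
        return condition_true
    return False


def next_fetch_address(next_seq: int, branch_target: Optional[int],
                       take_branch: bool) -> int:
    """Selector 20: pick address 21 (next sequential address from EX/PC 12)
    or input 18 (the pre-computed branch target address)."""
    if take_branch:
        if branch_target is None:
            raise ValueError("taken branch has no pre-computed target")
        return branch_target
    return next_seq


# A conditional branch at 0x100 whose target was pre-computed as 0x200.
print(hex(next_fetch_address(0x104, 0x200, branch_control("conditional", True))))   # 0x200
print(hex(next_fetch_address(0x104, 0x200, branch_control("conditional", False))))  # 0x104
```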
Figure 3 shows a processor environment 300 corresponding to the pipeline control structure 1 of the present invention. As shown in Figure 3, processor environment 300 includes a low-level memory 122, a high-level memory 124, and a processor core 125. Processor environment 300 further includes a filler/generator 123, an active table 121, a track table 126, a tracker 170, and branch decision logic 210 (corresponding to the branch decision unit 13 in Figure 2). The components listed here are for ease of description; other components may be included and some may be omitted. The components may be distributed across multiple systems and may be physical or virtual. They may be implemented in hardware (e.g., integrated circuits), in software, or in a combination of hardware and software.
High-level memory 124 and low-level memory 122 may comprise any suitable storage device, such as static memory (SRAM), dynamic memory (DRAM), or flash memory. Here, the level of a memory refers to its proximity to the processor core: the closer to the processor core, the higher the level. A higher-level memory is typically faster but smaller than a lower-level memory. High-level memory 124 may act as the system's cache, or as the level-1 cache when other caches are present. It may be divided into a plurality of storage segments called blocks (e.g., memory blocks) for storing the data that processor core 125 will access (i.e., the instructions and data in instruction blocks and data blocks).
Processor core 125 may be any suitable processor that operates in a pipelined manner in cooperation with the cache system. Processor core 125 may use separate instruction and data caches and may include instructions for cache operations. When processor core 125 executes an instruction, it first needs to read the instruction and/or data from memory. The active table 121, track table 126, tracker 170, and filler/generator 123 fill the instructions that processor core 125 will execute into high-level memory 124. As a result, processor core 125 can read the required instructions from high-level memory 124 with a very low cache miss rate. In this embodiment, the term "fill" means moving data/instructions from a lower-level memory to a higher-level memory. The term "memory access" means processor core 125 reading from or writing to the closest memory (i.e., high-level memory 124 or the level-1 cache).
In addition, the filler/generator 123 can fetch instructions or instruction blocks according to appropriate addresses. It examines each instruction fetched from low-level memory 122 for filling into high-level memory 124 and extracts certain information, such as the instruction type, the instruction address, and the branch target information of branch instructions. The instruction and the extracted information, including the branch target information, are used to calculate addresses and are sent to other modules, such as the active table 121 and the track table 126. In this embodiment, a branch instruction or branch point refers to any suitable instruction form that can cause processor core 125 to change the execution flow (e.g., to execute an instruction non-sequentially). If the instruction block corresponding to the branch target information has not yet been filled into high-level memory 124, a corresponding track is created while that instruction block is filled into high-level memory 124. Tracks in the track table 126 correspond one-to-one with memory blocks in high-level memory 124, and both are pointed to by the same pointer 152. Any instruction that processor core 125 is about to execute can thus be filled into high-level memory 124 before it is executed.
The filler/generator 123 can determine address information, such as the instruction type, the branch source address, and the branch target address, from the instruction and the branch target information. For example, instruction types may include conditional branch instructions, unconditional branch instructions, and other instructions. The instruction type may also include subcategories of conditional branch instructions, such as branch-if-equal and branch-if-greater-than. In some cases, an unconditional branch instruction can be treated as a special case of a conditional branch instruction whose condition is always true, so instruction types can simply be divided into branch instructions and other instructions. The branch source address is the address of the branch instruction itself, and the branch target address is the address to which execution transfers when the branch is taken. Other information may also be included.
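The examination step can be sketched as a scan over a block being filled. This is a minimal illustration under assumptions, not the filler/generator's actual logic. The `RawInstr` pre-decoded format, the opcode sets and the helper names are all hypothetical, and branch offsets are assumed to be PC-relative and counted in instructions. The scan records the instruction type, branch source address and branch target address for each branch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class InstrType(Enum):
    OTHER = 0
    CONDITIONAL_BRANCH = 1
    UNCONDITIONAL_BRANCH = 2


# Hypothetical opcode sets; a real ISA defines its own encodings.
CONDITIONAL_OPS = {"beq", "bne", "bgt"}   # branch if equal / not equal / greater
UNCONDITIONAL_OPS = {"j"}


@dataclass
class RawInstr:
    opcode: str
    offset: int = 0          # assumed PC-relative offset, counted in instructions


@dataclass
class BranchInfo:
    source: int              # address of the branch instruction itself
    kind: InstrType
    target: int              # address reached when the branch is taken


def classify(opcode: str) -> InstrType:
    if opcode in CONDITIONAL_OPS:
        return InstrType.CONDITIONAL_BRANCH
    if opcode in UNCONDITIONAL_OPS:
        return InstrType.UNCONDITIONAL_BRANCH
    return InstrType.OTHER


def scan_block(block: List[RawInstr], base: int) -> List[BranchInfo]:
    """Examine each instruction of a block being filled and extract the type,
    branch source address and branch target address of every branch."""
    found = []
    for i, ins in enumerate(block):
        kind = classify(ins.opcode)
        if kind is not InstrType.OTHER:
            source = base + i
            found.append(BranchInfo(source, kind, source + ins.offset))
    return found


block = [RawInstr("add"), RawInstr("beq", 5), RawInstr("sub"), RawInstr("j", -2)]
for info in scan_block(block, base=16):
    print(info.source, info.kind.name, info.target)
# 17 CONDITIONAL_BRANCH 22
# 19 UNCONDITIONAL_BRANCH 17
```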
In addition, a track table can be built from the pre-computed information to provide addresses for filling high-level memory 124. Figure 4 shows an example of track table operation according to the present invention. As shown in Figure 4, track table 126 interacts with tracker 170 to provide the addresses required for caching and branch processing.
The track table 126 may contain tracks for the instructions executed by processor core 125. The tracker 170 provides various addresses based on the track table 126 and provides a read pointer for the track table 126. A track, as used here, is a representation of a series of instructions to be executed (such as an instruction block). This representation may use any suitable data type, such as an address, a block number, or another number. A new track may be created when a track contains a branch point whose branch target changes the program flow. A new track may also be created when the instruction following an instruction lies elsewhere, for example in the next instruction block, in an exception handler, or in another program thread.
The track table 126 may include a plurality of tracks. Each track corresponds to a row of the track table identified by a row number or block number (BN), and the block number points to a corresponding memory block. A track may include a plurality of track points, and a track point may correspond to one or more instructions. Because a track corresponds to one row of the track table 126, a track point corresponds to one entry (for example, one storage cell) in that row. The total number of track points in a track can therefore equal the number of entries in a row of the track table 126. Other organizations may also be used.
A track point (i.e., one entry of the table) may contain information about an instruction in the track, such as a branch instruction. The content of a track point can thus include the type of the corresponding instruction and information about its branch target. By examining the content of a track point, the branch target point can be determined from the branch target address it holds.
For example, as shown in FIG. 4, processor core 125 may fetch instructions using an (M+Z)-bit instruction address, where M and Z are integers. The M-bit part of the address is called the high-order address, and the Z-bit part is called the offset address. The track table 126 may contain 2^M rows, i.e., a total of 2^M tracks, and the high-order address is used to select a row of the track table 126. Each row may contain 2^Z entries, i.e., 2^Z track points, and the offset address is used to address within that row to locate a particular track point.
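The address split described above can be sketched as two bit operations. This assumes, purely for illustration, Z = 4 (16 track points per row); the function names are hypothetical.

```python
from typing import Tuple


def split_address(addr: int, z_bits: int) -> Tuple[int, int]:
    """Split an (M+Z)-bit instruction address into (high-order M bits, Z-bit offset)."""
    offset = addr & ((1 << z_bits) - 1)  # low Z bits: track point within the row
    high = addr >> z_bits                # remaining high bits: row (track)
    return high, offset


def join_address(high: int, offset: int, z_bits: int) -> int:
    """Inverse of split_address."""
    return (high << z_bits) | offset
```

For example, `split_address(0x1234, 4)` returns `(0x123, 0x4)`: row 0x123, track point 4.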
The content of each entry (track point) in a row may include a type field 57, an XADDR field 58, and a YADDR field 59; other fields may also be included. The type field 57 indicates the type of the instruction to which the track point corresponds. As described above, instruction types may include conditional branches, unconditional branches, and other instructions, and may further include subtypes of conditional branches such as branch-if-equal and branch-if-greater-than. The XADDR field 58 may hold an M-bit address, also called the first-dimension address or simply the first address. The YADDR field 59 may hold a Z-bit address, also called the second-dimension address or simply the second address.
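One way to picture an entry is as a packed word. The sketch below assumes a bit order of [type | XADDR | YADDR], which is an illustrative choice and not stated in the text; the function names are hypothetical.

```python
from typing import Tuple


def pack_entry(kind: int, xaddr: int, yaddr: int, m_bits: int, z_bits: int) -> int:
    """Pack type field 57, XADDR 58 (M bits) and YADDR 59 (Z bits) into one word.
    The field order [type | XADDR | YADDR] is an assumption for illustration."""
    if not (0 <= xaddr < (1 << m_bits) and 0 <= yaddr < (1 << z_bits)):
        raise ValueError("address field out of range")
    return (kind << (m_bits + z_bits)) | (xaddr << z_bits) | yaddr


def unpack_entry(word: int, m_bits: int, z_bits: int) -> Tuple[int, int, int]:
    """Recover (type, XADDR, YADDR) from a packed entry."""
    yaddr = word & ((1 << z_bits) - 1)
    xaddr = (word >> z_bits) & ((1 << m_bits) - 1)
    kind = word >> (m_bits + z_bits)
    return kind, xaddr, yaddr
```

With M = 8 and Z = 4, `pack_entry(2, 5, 3, 8, 4)` gives 8275, and unpacking it returns `(2, 5, 3)`.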
When a new track containing a branch point (a branch track point) is created, the track can be placed in an available row of the track table 126, and the branch track point can be placed in an entry of that row. The row and the entry are determined by the source address of the branch point (the branch source address). For example, the row number or block number may be determined from the high-order address of the branch source address, and the entry from its offset address.
The content of the new track point corresponds to the branch target instruction; in other words, the branch track point stores the branch target address information. For example, the row number or block number of the row in the track table 126 that corresponds to the branch target instruction is stored as the first address in the content of the branch track point. The offset address, which gives the position of the branch target instruction within its track, is stored as the second address. In the content of a branch point, the first address thus serves as the row address and the second address as the column address, which together locate the branch target track point.
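A small model makes the source/target distinction concrete: the branch source address chooses where the entry is written, while the entry's content records where the target is. The `TrackPoint` and `TrackTable` classes and the type labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrackPoint:
    kind: str = "other"   # hypothetical labels: "other", "cond", "jump"
    target_bnx: int = 0   # first address: row (block number) of the target track
    target_bny: int = 0   # second address: column of the target in that row


class TrackTable:
    """Two-dimensional track table: rows are tracks, columns are track points."""

    def __init__(self, rows: int, cols: int):
        self.rows = [[TrackPoint() for _ in range(cols)] for _ in range(rows)]

    def write_branch(self, src_bnx: int, src_bny: int,
                     kind: str, tgt_bnx: int, tgt_bny: int) -> None:
        # Position comes from the branch source; the content records the target.
        self.rows[src_bnx][src_bny] = TrackPoint(kind, tgt_bnx, tgt_bny)

    def read(self, bnx: int, bny: int) -> TrackPoint:
        return self.rows[bnx][bny]
```

Reading the branch entry and then indexing with its `(target_bnx, target_bny)` lands on the branch target track point.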
The instruction memory 46 may be the part of high-level memory 124 used for instruction accesses and may be built from any suitable high-performance memory. It may contain 2^M memory blocks, each containing 2^Z bytes or words. That is, the instruction memory 46 can store every instruction addressable by the M+Z bits of the instruction address: the M bits select a particular memory block, and the Z bits select a particular byte or word within that block.
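As a sketch, assuming word-granular blocks and string placeholders for instructions, the instruction memory can be modelled as a flat array indexed by block and offset. The class name and methods are hypothetical.

```python
from typing import List, Optional


class InstructionMemory:
    """2**M blocks of 2**Z words, addressed by (block, offset)."""

    def __init__(self, m_bits: int, z_bits: int):
        self.z_bits = z_bits
        self.words: List[Optional[str]] = [None] * (1 << (m_bits + z_bits))

    def _index(self, block: int, offset: int) -> int:
        # M bits select the block, Z bits select the word within it.
        return (block << self.z_bits) | offset

    def fill_block(self, block: int, instrs: List[str]) -> None:
        if len(instrs) != 1 << self.z_bits:
            raise ValueError("a block holds exactly 2**Z words")
        base = self._index(block, 0)
        self.words[base:base + len(instrs)] = instrs

    def read(self, block: int, offset: int) -> Optional[str]:
        return self.words[self._index(block, offset)]
```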
The tracker 170 may be built from various components, such as registers, selectors, stacks, and/or other storage modules, and determines the next track to be executed by processor core 125. The tracker 170 may determine the next track from the current track in the track table 126, the track point information, and whether a branch was taken when processor core 125 executed it.
For example, in operation, when processor core 125 executes a branch instruction, the (M+Z)-bit instruction address of the branch instruction is carried on bus 55. The M-bit part is sent to the track table 126 via bus 56 as the first address, XADDR (X address), and the Z-bit part is sent via bus 53 as the second address, YADDR (Y address). Using these two addresses, the track table 126 locates the entry for the branch instruction and outputs its branch target address on bus 51.
If the branch condition does not hold, the branch is not taken. Selector 49 selects the YADDR on bus 53, which increment logic 48 increases by one byte or word to form a new second address 54, while the first address stays unchanged; the new address can be output on bus 52. In response to control signal 60 from processor core 125 (e.g., a branch not taken), register 50 keeps the first address unchanged, and increment logic 48 keeps incrementing the second address until it points to the next branch instruction in the current row of the track table.
If, on the other hand, the branch condition holds, the branch is taken. Selector 49 selects the branch target address stored in the track entry of the branch point, received on bus 51, and outputs it on bus 52. In response to control signal 60 from processor core 125 (e.g., a branch taken), register 50 latches the changed first address, which now identifies the new track, and the new (M+Z)-bit address is provided on bus 55.
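The behaviour of increment logic 48, selector 49, and register 50 can be approximated in software. In this sketch a track row is a list in which `None` marks a non-branch entry and an `(X, Y)` pair marks a branch entry holding its target; this representation and the function names are assumptions, and moving on to the next block at the end of a row is not modelled.

```python
from typing import List, Optional, Tuple

XY = Tuple[int, int]  # (first address, second address)


def next_branch(row: List[Optional[XY]], y: int) -> int:
    """Model of increment logic 48: advance Y until it points at a branch entry.
    Returns len(row) if no branch remains in the track."""
    while y < len(row) and row[y] is None:
        y += 1
    return y


def resolve(x: int, y: int, target: XY, taken: bool) -> XY:
    """Model of selector 49 and register 50 once control signal 60 reports the outcome."""
    if taken:
        return target    # taken: switch to the target track
    return (x, y + 1)    # not taken: same track, continue after the branch
```

After a not-taken branch at Y = 2, calling `next_branch(row, 3)` resumes the scan for the following branch in the same row.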
Thus, for memory addressing, the track table 126 and the tracker 170 supply the block address, while processor core 125 supplies only the offset. Processor core 125 feeds back the execution status of branch instructions so that the tracker 170 can make its decisions.
The instruction block corresponding to a track is filled into the instruction memory 46 before the track is executed. Repeating this process can prevent cache misses for all instructions that processor core 125 will execute.
Returning to FIG. 3, to improve efficiency and reduce the required storage capacity, the active table 121 can store information about every established track and maintain a mapping between addresses (or parts of addresses) and block numbers, so that any available row of the track table 126 can be used to build a track. For example, when a track is built, the branch target address information of all branch points in the track is stored in the active table 121. The active table 121 can thus hold the mapping information for the tracks of all branch target track points in the program. Other configurations may also be used.
The active table 121 can thus store the block numbers of the instruction blocks in high-level memory 124; a block number also corresponds to a row number in the track table 126. While instructions are examined, the block number of a branch target address can be obtained by matching the address against the entries of the active table 121. On a successful match, the resulting block number (the first address described above), together with the instruction's offset within its track (the second address), determines the position of the track point.
If the match fails, the track corresponding to the address has not yet been built. The active table 121 then assigns a block number, the instruction block corresponding to the address is filled into the location in high-level memory 124 indexed by that block number, and a new track for that block number is built in the track table 126, so that the active table 121 now represents the newly built track and its address. Consequently, the operations of the active table 121 and the fill/generator 123 described above can fill the instruction block containing the branch target instruction of a branch point into the cache (high-level memory 124) before processor core 125 fetches and executes that branch point.
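The match-or-allocate behaviour of the active table can be sketched with a dictionary. The class, the free-list allocation, and the `fill` callback (which stands in for filling memory 124 and building the track in table 126) are assumptions; no replacement policy is modelled.

```python
from typing import Callable, Dict, List, Tuple


class ActiveTable:
    """Maps the high-order address of an instruction block to its block number (BN)."""

    def __init__(self, num_blocks: int):
        self.bn_of: Dict[int, int] = {}
        self.free: List[int] = list(range(num_blocks))

    def lookup(self, high: int, fill: Callable[[int, int], None]) -> Tuple[int, bool]:
        """Return (BN, hit). On a miss, assign a free BN and call fill(high, bn)."""
        if high in self.bn_of:
            return self.bn_of[high], True
        bn = self.free.pop(0)  # raises IndexError when full; replacement not modelled
        self.bn_of[high] = bn
        fill(high, bn)
        return bn, False
```

A repeated lookup of the same high-order address hits and does not trigger a second fill.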
The track table 126 can thus be organized as a two-dimensional table. Each row, indexed by the first address BNX, corresponds to a memory block or memory line, and each column, indexed by the second address BNY, corresponds to the offset of the corresponding instruction (or data) within the memory block. In short, the write address of the track table corresponds to the source address of the instruction: for a given branch source address, the active table 121 assigns a BNX based on the high-order address, and BNY equals the offset. BNX and BNY together form the write address of the entry to be written.
Furthermore, when instructions are filled into high-level memory 124, the branch target address of every branch instruction can be computed as the sum of the branch instruction's address and its branch offset. The branch target address (high-order address and offset) is sent to the active table 121, which matches the high-order part and assigns a BNX. The assigned BNX, together with the instruction type from generator 130 and the offset (BNY), forms the content of the track table entry for each branch instruction. This content is stored in the branch point addressed by the corresponding write address.
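The target computation and entry construction can be sketched as follows. This assumes the branch offset is relative to the branch instruction's own address (some instruction sets use the next instruction's address instead) and uses a plain dict as a hypothetical stand-in for the active table 121.

```python
from typing import Dict, Tuple


def build_branch_entry(src_addr: int, branch_offset: int, kind: str,
                       z_bits: int, active: Dict[int, int]) -> Tuple[str, int, int]:
    """Form the track-table content (type, BNX, BNY) for one branch instruction."""
    target = src_addr + branch_offset        # target = branch address + branch offset
    high = target >> z_bits                  # high-order part, matched in the active table
    bny = target & ((1 << z_bits) - 1)       # offset within the block becomes BNY
    if high not in active:
        active[high] = len(active)           # miss: assign the next BNX (no replacement)
    return kind, active[high], bny
```

With Z = 4, a branch at 0x103 with offset 0x10 targets 0x113, giving BNY 3 and a newly assigned BNX 0.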
Additionally, the tracker 170 can provide a read pointer 151 to the track table 126. The read pointer 151 also takes the form of a BNX and a BNY. The content of the track entry pointed to by the read pointer is read out together with that entry's BNX and BNY (the source BNX and source BNY) and examined by the tracker 170, which can update the read pointer in several ways depending on that content. For example, if the entry is not a branch point, the tracker 170 updates the read pointer to new BNX = source BNX, new BNY = source BNY + 1.
If the entry is a conditional branch, the tracker 170 waits for the control signal TAKEN, which processor core 125 generates when it executes the branch instruction at that branch point. If TAKEN indicates that the branch was not taken, the tracker 170 updates the read pointer to new BNX = source BNX, new BNY = source BNY + 1. If the branch was taken, it updates the read pointer to new BNX = target BNX, new BNY = target BNY.
If the entry is an unconditional branch (or jump), the tracker 170 can treat it as a conditional branch whose condition always holds: when the branch instruction is executed, it updates the read pointer to new BNX = target BNX, new BNY = target BNY.
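The three update rules can be collected into one function. The type labels and the `executed`/`taken` flags are hypothetical stand-ins for the type field and the signals from processor core 125.

```python
from typing import Tuple

Pointer = Tuple[int, int]  # (BNX, BNY)


def update_read_pointer(kind: str, src: Pointer, target: Pointer,
                        executed: bool = False, taken: bool = False) -> Pointer:
    """Next value of read pointer 151.
    kind: "other", "cond" or "jump"; executed: the core has executed the branch;
    taken: the TAKEN signal."""
    bnx, bny = src
    if kind == "other":
        return (bnx, bny + 1)   # not a branch point: next track point
    if not executed:
        return src              # wait at the branch point
    if kind == "jump" or taken:
        return target           # a jump behaves like an always-taken branch
    return (bnx, bny + 1)       # conditional branch not taken
```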
The tracker 170, together with the track table 126 and the active table 121, implements track-based operation. The address information of the branch instruction, of the branch target instruction, and of the instruction immediately following the branch instruction can therefore all be determined in advance. Pipeline control structure 1 can use this information to process branches without stalling the pipeline.
Specifically, as shown in FIG. 3, when the read pointer 151 reaches a branch point, the tracker 170 receives the branch target address from the track table 126 via bus 150. The high-order part of the branch target address (target BNX) is one input of a selector whose other input is the current BNX (the high-order part of BN 151, i.e., the source BNX); the output of this selector is the next BNX. The offset part of the branch target address (target BNY) is one input of another selector whose other input is the PC offset 155 from processor core 125. The output of that selector is used as the "offset 1" address of high-level memory 124 to address an instruction in the cache block selected by BNX 152.
The read pointer 151 (BNX 152, BNY 153) moves faster than the PC (e.g., the tracker 170 runs at a higher clock frequency). The read pointer 151 moves along the track. When the content read from an entry of the track table 126 indicates a branch instruction with a branch target address (BNX and BNY), the read pointer 151 stops and waits for processor core 125 to execute that branch point and for the control signals 'TAKEN' 212 and 'BRANCH/JUMP' 213 from branch decision logic 210. Processor core 125 provides a PC offset that addresses instructions in high-level memory 124, while the tracker 170 provides BNY 153, which addresses the branch point in the track table 126. Both values are also sent to branch decision logic 210 for comparison. If PC offset 155 equals BNY 153, processor core 125 is fetching the branch point. In other words, a match between BNY 153 and PC offset 155 can control the timing of branch processing, so that branch decision logic 210 makes the branch decision when PC offset 155 equals BNY 153. Alternatively, branch processing can start when PC offset 155 is still a preset number of instructions away from BNY 153.
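The timing check can be expressed as a comparison between the two offsets. In this sketch `lookahead` models the preset number of instructions; its name and the choice to return False once the PC offset has passed BNY are assumptions.

```python
def at_branch_point(pc_offset: int, bny: int, lookahead: int = 0) -> bool:
    """True when PC offset 155 has reached BNY 153, or is at most `lookahead`
    instructions before it (lookahead = 0 means an exact match is required)."""
    return 0 <= bny - pc_offset <= lookahead
```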
When PC offset 155 equals BNY 153, or is within the preset number of instructions of it, processor core 125 is fetching the branch point, and branch decision logic 210 can decide whether the branch is taken. In some cases the decision is based on the branch type and the branch condition (or condition flags). The branch type 211 (from the track table 126) indicates the particular kind of branch instruction, such as branch if the condition equals zero or branch if it is greater than zero. The branch condition is produced by operations in processor core 125. Depending on the processor architecture, the branch instruction, and/or the pipeline, the branch condition of a given branch instruction may become valid in different pipeline stages of processor core 125.
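A minimal sketch of the decision itself, assuming the core supplies a single condition value and using hypothetical type names for "branch if zero" and "branch if greater than zero":

```python
def branch_taken(kind: str, value: int) -> bool:
    """Decide a branch from branch type 211 and the condition value from the core.
    The type names are hypothetical."""
    if kind == "jump":
        return True            # unconditional: condition always holds
    if kind == "beqz":
        return value == 0      # branch if equal to zero
    if kind == "bgtz":
        return value > 0       # branch if greater than zero
    raise ValueError(f"unknown branch type: {kind}")
```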
Branch decision logic 210 may include any suitable circuitry for making branch decisions. As described above, it can make the decision when PC offset 155 equals BNY 153, or when PC offset 155 and BNY 153 satisfy some other relation (e.g., greater than), once the condition flags are ready. The result of branch decision logic 210 is output as the 'TAKEN' signal 212 and the 'BRANCH/JUMP' signal 213. The 'BRANCH/JUMP' signal informs the tracker 170 that processor core 125 has reached the branch instruction and allows the read pointer 151 to be updated. The 'TAKEN' signal is the actual outcome of the executing program and selects the correct next instruction to execute.
Thus, when the 'BRANCH/JUMP' signal is detected and the branch is not taken, next BNX = source BNX and next BNY = source BNY + 1: the unchanged BNX 152 (source BNX) drives "block select 1", and the address offset of the next instruction from processor core 125 (PC offset 155) drives "offset 1", addressing the instruction after the branch. If the branch is taken, next BNX = target BNX and next BNY = target BNY: the updated BNX 152 (target BNX) drives "block select 1", and the offset of the branch target instruction from the track table 126 (target BNY) drives "offset 1", addressing the branch target instruction. In this way, using the branch type from the track table 126 and the branch condition flags from processor core 125, the track table 126 supplies the address of the branch target instruction in advance, the PC supplies the address of the instruction after the branch, and branch decision logic 210 decides whether the branch is taken.
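The selection of the fetch address after the decision reduces to a two-way multiplexer, sketched below with hypothetical parameter names.

```python
from typing import Tuple


def fetch_address(taken: bool, src_bnx: int, pc_offset: int,
                  tgt_bnx: int, tgt_bny: int) -> Tuple[int, int]:
    """Return the ("block select 1", "offset 1") pair after the branch decision."""
    if taken:
        return tgt_bnx, tgt_bny    # branch target from the track table
    return src_bnx, pc_offset      # next sequential instruction via PC offset 155
```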
Therefore, if the branch is taken, the correct address for fetching the branch target instruction (target BNX 152, target BNY 150) is already available at the "block select 1" and "offset 1" ports of high-level memory 124, so processor core 125 can continue the pipeline without waiting. Table 2 shows the pipeline stages when the branch is taken. In Table 2, the row labeled "Instruction address" gives the instruction address applied to "block select 1" (high-order address) and "offset 1" (low-order address) of instruction memory 124, and the row labeled "Fetched instruction" gives the instruction on "read port 1" of high-level memory 124. One clock cycle of latency is assumed from a valid instruction address to a valid instruction. Instruction i is the branch instruction, Target is the branch target instruction, Target+1 is the instruction after the branch target instruction, and so on.
Table 2 Pipeline stage diagram (branch taken)

Clock cycle           1         2         3         4         5         6         7
i                               IF        ID        EX        MEM       WB
Target                                    IF        ID        EX        MEM       WB
Target+1                                            IF        ID        EX        MEM
Target+2                                                      IF        ID        EX
Target+3                                                                IF        ID
Target+4                                                                          IF
Instruction address   i         Target    Target+1  Target+2  Target+3  Target+4
Fetched instruction             i         Target    Target+1  Target+2  Target+3
On the other hand, if the branch is not taken, the correct address for fetching the instruction immediately following the branch instruction (source BNX 152, PC offset 155) is likewise already available at the "block select 1" and "offset 1" ports of high-level memory 124, so processor core 125 can continue the pipeline without waiting. The tracker 170 can then use the read pointer, under control of the control signals, to find the next branch point and continue branch processing as described above. Table 3 shows the pipeline stages when the branch is not taken. Instruction i is the branch instruction, i+1 is the instruction following it, and so on.
Table 3 Pipeline stage diagram (branch not taken)

Clock cycle           1         2         3         4         5         6         7
i                               IF        ID        EX        MEM       WB
i+1                                       IF        ID        EX        MEM       WB
i+2                                                 IF        ID        EX        MEM
i+3                                                           IF        ID        EX
i+4                                                                     IF        ID
Instruction address   i         i+1       i+2       i+3       i+4
Fetched instruction             i         i+1       i+2       i+3       i+4
FIG. 5A shows another pipeline control structure 2 consistent with the disclosed embodiments. As shown in FIG. 5A, decoder 11 decodes the fetched instruction and provides the operands needed for execution. The decoded instruction and operands are sent to the execution unit and program counter (EX/PC), which executes the instruction and computes the address 21 of the next instruction in the program flow. Unlike pipeline control structure 1 of FIG. 2, however, the next instruction address 21 and the branch target instruction address 18 are sent to instruction memory (or instruction cache) 22 through registers 24 and 23, respectively. Instruction memory 22 may have multiple ports for read/write operations.
Instruction memory 22 can thus have two address ports, which receive the next instruction address 21 and the branch target instruction address 18. On receiving these two addresses, instruction memory 22 provides the corresponding instructions on output ports 28 and 29, respectively. The two instructions on output ports 28 and 29 are the inputs of selector 26, and branch decision logic 13 supplies control signal 14 to selector 26 to select the input from port 28 or port 29 and pass it to decoder 11.
If branch decision logic 13 determines that the branch is taken, instruction 29, corresponding to the branch target instruction address 18, is passed to decoder 11. If it determines that the branch is not taken, instruction 28, corresponding to the next instruction address 21, is passed to decoder 11. Because branch decision logic 13 makes this decision before the branch point reaches its execute stage, or before the following instruction is decoded, no pipeline clock cycles are lost waiting for the branch decision.
FIG. 6 shows an embodiment of a processor environment 400 corresponding to pipeline control structure 2. As shown in FIG. 6, processor environment 400 is similar to processor environment 300 in FIG. 3. It differs, however, in that the branch decision logic is included in processor core 125, and in that high-level memory 124 provides two address ports, "block select 1, offset 1" and "block select 2, offset 2", and two read ports, "read port 1" 127 and "read port 2" 128.
As shown in FIG. 6, when a branch instruction is processed, the track table 126 provides the branch target instruction address (target BNX 201 and target BNY 202) to the address port "block select 2, offset 2". Meanwhile, the read pointer 151 provides the block address BNX 152 of the next instruction to "block select 1", and processor core 125 provides the offset address of the next instruction to "offset 1".
On receiving the branch target instruction address and the next instruction address, high-level memory 124 reads out the branch target instruction and the next instruction and places them, as fetched instructions 204 and 203, on "read port 2" 128 and "read port 1" 127, respectively. Fetched instructions 204 and 203 are the two inputs of selector 205, which is controlled by control signal 207 (the TAKEN signal from processor core 125). Based on the TAKEN signal, selector 205 selects the correct fetched instruction and sends it as output 206 to processor core 125 before the core decodes it: the fetched branch target instruction if the branch is taken, and the fetched next instruction if it is not.
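The dual-port fetch followed by selection can be sketched as two lookups and a conditional. Here the memory is a hypothetical dict keyed by `(block, offset)`; in hardware both reads happen in the same cycle.

```python
from typing import Dict, Tuple

Addr = Tuple[int, int]  # (block, offset)


def dual_port_fetch(mem: Dict[Addr, str], next_addr: Addr,
                    target_addr: Addr, taken: bool) -> str:
    """Read both candidates, then let TAKEN choose one before decode."""
    port1 = mem[next_addr]             # fetched instruction 203 on read port 1 127
    port2 = mem[target_addr]           # fetched instruction 204 on read port 2 128
    return port2 if taken else port1   # selector 205 controlled by TAKEN 207
```

Because both candidates are already fetched, the decision only drives a selector and never a new memory access.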
Processor core 125 also provides a BRANCH/JUMP signal to the tracker 170 to indicate that it has reached a branch instruction; at that point the TAKEN signal is the actual outcome of program execution and selects the correct next instruction to execute. Thus, when the BRANCH/JUMP signal is detected, the tracker 170 uses the new address as BN 151.
If the branch is taken, the fetched branch target instruction 204 (target BNX 201, target BNY 202) has already been sent to processor core 125 as output 206, so the core can continue the pipeline without interruption. Of course, if the branch is unconditional, the unconditional branch instruction can be treated as a special branch point whose condition is always satisfied and needs no further decision. Table 4 shows the pipeline stages when the branch is taken. In Table 4, the row labeled "Instruction address" gives the instruction address applied to "block select 1" (high-order address) and "offset 1" (low-order address) of instruction memory 124, and the row labeled "Fetched instruction" gives the instruction at output 206 of selector 205.
Table 4 Pipeline stage diagram (branch taken; T = branch target instruction)
Clock cycle           1     2     3     4     5     6     7
i (branch)                  IF    ID    EX    MEM   WB
T                                 IF    ID    EX    MEM   WB
T+1                                     IF    ID    EX    MEM
T+2                                           IF    ID    EX
T+3                                                 IF    ID
Instruction address         i     i+1   T+1   T+2   T+3   T+4
Read port 1                 i     i+1   T+1   T+2   T+3   T+4
Read port 2           T, T, T, T, new T, new T, new T, new T
Fetched instruction               T     T+1   T+2   T+3   T+4
During the decode stage of the branch instruction (clock cycle 3), the branch target instruction ("T") is fetched from high-level memory 124 together with the next instruction ("i+1"), and the branch decision is made before that decode stage ends. Because both instructions have been fetched, the correct one can be selected and used in its decode stage (clock cycle 4) whether or not the branch is taken. In other words, the instruction fetched after the branch point is always a valid instruction, and the pipeline never needs to stall. Likewise, as Table 4 shows, "Read port 2" supplies the next branch target instruction in advance.
When the branch is taken, the branch target instruction from "Read port 2" is selected in clock cycle 3 as the instruction that enters the decode stage in clock cycle 4. At the end of clock cycle 3, the program counter (PC) of processor core 125 is forced to the instruction after the branch target (T+1) rather than to the branch target itself (T). Source BNX 152 output by tracker 170 drives "Block select 1" as usual; because tracker 170 transfers the next BN 151, which carries the branch target address, to BN 152 when the branch is taken, source BNX 152 equals the target BNX. This ensures that instruction T+1, not T, is fetched in clock cycle 4, so program flow switches to the branch target without any pipeline stall. From then on, the instruction address is incremented normally until the next branch point is reached.
On the other hand, if the branch is not taken, the fetched instruction 203 corresponding to the next instruction (source BNX 152, PC offset 155) is sent to processor core 125 as output 206, and processor core 125 continues pipelined operation without stalling. Table 5 shows the pipeline stages when the branch is not taken.
Table 5 Pipeline stage diagram (branch not taken; T = branch target instruction)
Clock cycle           1     2     3     4     5     6     7
i (branch)                  IF    ID    EX    MEM   WB
i+1                               IF    ID    EX    MEM   WB
i+2                                     IF    ID    EX    MEM
i+3                                           IF    ID    EX
i+4                                                 IF    ID
Instruction address         i     i+1   i+2   i+3   i+4   i+5
Read port 1                 i     i+1   i+2   i+3   i+4   i+5
Read port 2           T, T, T, T, new T, new T, new T, new T
Fetched instruction         i     i+1   i+2   i+3   i+4   i+5
Thus, when the branch is not taken, instruction i+1, which follows the branch instruction and comes from "Read port 1", is selected in clock cycle 3 as the instruction that enters the decode stage in clock cycle 4. From then on, the instruction address is incremented normally until the next branch point is reached.
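The "Instruction address" and "Fetched instruction" rows of Tables 4 and 5 can be reproduced with a small model. In this hypothetical sketch, addresses are plain integers (branch i at 0 and target T at 100 are assumed values), and the first list element corresponds to clock cycle 2, when the branch is fetched.

```python
def address_sequence(taken, i=0, target=100, cycles=6):
    """Hypothetical model of the address on read port 1, starting at clock cycle 2."""
    addrs = [i, i + 1]                     # cycle 2: branch i; cycle 3: i+1 (T read on port 2)
    nxt = target + 1 if taken else i + 2   # end of cycle 3: PC forced to T+1 when taken
    while len(addrs) < cycles:
        addrs.append(nxt)                  # normal increment afterwards
        nxt += 1
    return addrs


def fetched_sequence(taken, i=0, target=100, cycles=6):
    """Instruction on output 206 in the same cycles."""
    out = address_sequence(taken, i, target, cycles)
    if taken:
        out[1] = target                    # cycle 3: selector takes T from read port 2
    return out


print(address_sequence(taken=True))   # [0, 1, 101, 102, 103, 104]
print(fetched_sequence(taken=True))   # [0, 100, 101, 102, 103, 104]
print(fetched_sequence(taken=False))  # [0, 1, 2, 3, 4, 5]
```

In both cases the fetched stream has no bubble; only the source of the cycle-3 instruction and the forced PC value differ.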
Figure 5B shows a block diagram of pipeline control structure 3, an alternative to pipeline control structure 2 described above. Pipeline control structure 3 differs from pipeline control structure 2 in that it includes an additional memory 40. Memory 40 may contain as many memory blocks as track table 126 has rows, each memory block corresponding to one row of track table 126.
In addition, each memory block in memory 40 may contain as many storage cells as there are track points (entries) in one row of track table 126. When a track point is a branch point, the branch target instruction is stored not only in the memory block of instruction memory 22 that holds it, but also in the corresponding storage cell of memory 40.
Branch target address 18 comes from an entry of track table 126. The content of that entry is the BNX and BNY of the branch target instruction of the corresponding branch track point, so BNX and BNY can be used as an index to find the corresponding branch target instruction stored in memory 40. The selected branch target instruction is sent to selector 26 over bus 29. As described earlier, the next instruction is fetched from instruction memory 22 using next instruction address 21 and is also sent to selector 26, over bus 28. Consequently, instruction memory 22 in Figure 5B can be a single-port storage device rather than the dual-port device shown in Figure 5A.
Alternatively, the track table 126 entry for a branch point may itself store the branch target instruction; that is, besides the address and offset of the branch target instruction, the content of the branch track point also includes the branch target instruction itself. Track table 126 can then supply the branch target instruction directly to selector 26, where it is selected by control signal 14 from branch decision logic 13. This arrangement can be regarded as memory 40 integrated into track table 126.
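Pipeline control structure 3 can be sketched as follows. This is a hypothetical software model, assuming memory 40 is indexed by the target BNX/BNY held in the branch point's track-table entry (as described above) and that instruction memory 22 is a list of blocks; the class and function names are illustrative.

```python
class Memory40:
    """Hypothetical model of memory 40: one block per track-table row,
    one cell per track point, addressed by (BNX, BNY)."""

    def __init__(self, rows, points_per_row):
        self.cells = [[None] * points_per_row for _ in range(rows)]

    def store_target(self, bnx, bny, instruction):
        # Written alongside instruction memory 22 when a branch point is set up.
        self.cells[bnx][bny] = instruction

    def read_target(self, bnx, bny):
        # Driven onto bus 29 toward selector 26.
        return self.cells[bnx][bny]


def fetch_cycle(instr_mem_22, mem_40, next_addr_21, target_addr_18, taken):
    """One fetch in which single-port memory 22 is read only once."""
    bus_28 = instr_mem_22[next_addr_21[0]][next_addr_21[1]]  # next instruction
    bus_29 = mem_40.read_target(*target_addr_18)             # branch target from memory 40
    return bus_29 if taken else bus_28                       # selector 26 (control signal 14)


instr_mem_22 = [["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]]
mem_40 = Memory40(rows=2, points_per_row=4)
mem_40.store_target(1, 2, instr_mem_22[1][2])   # the branch target is instruction b2
print(fetch_cycle(instr_mem_22, mem_40, (0, 2), (1, 2), taken=True))   # b2
print(fetch_cycle(instr_mem_22, mem_40, (0, 2), (1, 2), taken=False))  # a2
```

Duplicating only branch targets trades extra storage for removing the second read port on instruction memory 22.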
Thus, as described above, because the branch target instruction address is determined in advance, in other words because the branch target information and branch type are already available, the branch decision can be made as soon as the processor core's operations have set the branch condition flags. Since the main work of a branch decision is computing the branch target address and evaluating the branch type against the condition flags, the decision can be made before the branch instruction itself reaches its normal execution stage. In general, the earlier the branch decision is completed, the less additional hardware is required. Using the early branch decision from branch decision logic 13, various structures can be configured so that the pipeline keeps running without stalling when a branch is processed.
Figure 7 shows an embodiment of processor environment 600 according to the present invention. In processor environment 600, a read buffer supplies, for a branch instruction in the program flow of processor core 125, both the branch target instruction and the instruction immediately following the branch instruction. Processor environment 600 is similar to processor environment 300 in Figure 3, with some differences. As shown in Figure 7, in addition to cache 124, processor core 125, track table 126, and tracker 170, processor environment 600 includes a read buffer 229 and a selector 225.
Read buffer 229 sits between cache 124 and processor core 125 and includes a storage module 216 and a selector 214. Storage module 216 stores certain instructions. For example, storage module 216 stores and supplies one of the branch target instruction and the subsequent instruction, while the other is supplied directly by cache 124, so that the same cache structure provides higher bandwidth. Selector 214 in read buffer 229 selects one of the branch target instruction and the subsequent instruction based on the branch decision, so that the instruction supplied to processor core 125 after the branch instruction is the valid (correct) one; that is, selector 214 selects either the output of storage module 216 or the output of cache 124 and sends it to processor core 125 as output 219. In addition, selector 220 selects either an address from track table 126 or an address from tracker 170 and sends it to cache 124 as output 224 (the block address), while selector 225 selects either an offset from track table 126 or the PC (program counter) offset from processor core 125 and sends it to cache 124 as output 224 (the offset address). Control signal 215 from tracker 170 controls selectors 220 and 225 and storage module 216, while the TAKEN signal controls selector 214.
During operation, tracker 170 supplies BNX 152 and BNY 153 so that track table 126 outputs the track point corresponding to BNX 152 and BNY 153. The content read from the track point includes information such as the instruction type and branch target address, and it can be sent to tracker 170 over bus 150. In addition, the upper part of the branch target address (BNX) is sent to selector 220 as one input, and the BNY of the branch target address, or part of it (e.g., its two most significant bits), may be sent to selector 225 over bus 222. The other input of selector 220 may be the BNX provided by tracker 170, and the other input of selector 225 may be the PC offset or part of it (e.g., its two most significant bits).
Storage module 216 may contain a number of storage cells, preset according to the capacity of other components, for storing instructions. For example, if a memory block (instruction block) contains 16 instructions in total, BNY and the PC offset can each be 4 bits long. Assuming four instructions are fetched from the instruction memory (cache 124) per clock cycle, storage module 216 can hold four instructions; the two most significant bits of BNY or of the PC offset select four instructions from the memory block pointed to by BNX, and the two least significant bits select one of those four.
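The address selection and bit split just described can be illustrated with a short sketch. It assumes the example sizes from the text (16 instructions per block, 4 fetched per cycle); the function names and argument order are hypothetical.

```python
FETCH_WIDTH = 4   # instructions read from cache 124 per cycle (example in the text)
GROUP_BITS = 2    # log2(FETCH_WIDTH)
OFFSET_BITS = 4   # log2(16 instructions per block)


def cache_address(branch_point, target_bnx, target_bny, tracker_bnx, pc_offset):
    """Hypothetical model of selectors 220 and 225: return (block, group) for cache 124."""
    if branch_point:                      # control signal 215 reports a branch
        block, offset = target_bnx, target_bny
    else:
        block, offset = tracker_bnx, pc_offset
    assert 0 <= offset < (1 << OFFSET_BITS)
    return block, offset >> GROUP_BITS    # upper two bits pick a group of four


def pick(group, offset):
    """Lower two bits of BNY or the PC offset pick one instruction in the group."""
    return group[offset & (FETCH_WIDTH - 1)]


print(cache_address(True, 7, 0b1101, 3, 0b0010))   # (7, 3)
print(cache_address(False, 7, 0b1101, 3, 0b0010))  # (3, 0)
print(pick(["T0", "T1", "T2", "T3"], 0b1101))       # T1
```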
For ease of description, four instructions are fetched per clock cycle here; for single-issue or multi-issue processors, any suitable number of instructions may be fetched per clock cycle. Moreover, the number of instructions fetched per clock cycle (e.g., 4) may exceed the number of instructions processor core 125 executes per clock cycle (e.g., 1), which leaves spare clock cycles in which track table 126 and related components can load storage module 216 or fill cache 124. In some embodiments, cache 124 may include a single-port storage module whose bandwidth exceeds the instruction issue rate of processor core 125, so that it can serve both the filling of storage module 216 under tracker 170 and instruction fetches by processor core 125.
When tracker 170 detects that an instruction is a branch instruction, it stops incrementing BNY. When a fetch time slot becomes available, the instruction type information can be used as control signal 215, acting as a write enable for storage module 216, so that the four instructions currently output by cache 124 are written into storage module 216 over bus 217. At the same time, based on the instruction type information (e.g., that the instruction is a branch), signal 215 makes selector 220 choose the BNX of the branch target instruction on bus 221 as the instruction block address, and makes selector 225 choose the upper two bits of the branch target BNY on bus 222 to locate four instructions in that block. These four instructions include the branch target instruction and can be read in the next read cycle (the next clock cycle). The four instructions containing the branch target are then stored into storage module 216, and the PC offset is used again to read the next instruction. Thus, when processor core 125 executes the branch instruction at a branch point, both the branch target instruction and the instruction following the branch point are available at the same time, and the correct instruction can be taken depending on whether the branch is taken.
Figure 8 shows an embodiment of reading instructions during operation according to the technical solution of the present invention. In Figure 8, column 226 shows the value on output 218 of storage module 216, column 227 shows the value on output 217 of cache 124, and column 228 shows the current instruction fetched by processor core 125. Assume that instructions I0, I1, I2, and I3 are four consecutive instructions sharing the same two most significant PC offset bits, with I2 a branch instruction. Further assume that the branch target of I2 is T1, and that T0, T1, T2, and T3 are four consecutive instructions sharing the same two most significant offset bits. Each row represents a successive clock cycle or execution cycle (an execution cycle may span more than one clock cycle); the four rows correspond to cycles i, i+1, i+2, and i+3. Finally, assume that the TAKEN signal (whether the branch is taken) is produced in the cycle after the branch instruction is fetched.
In cycle i, assume the PC offset points to I0 and the read pointer reaches the track point corresponding to branch instruction I2. In this cycle, selector 214 selects the output of cache 124 as output 219, and the two least significant PC offset bits select instruction I0, needed by processor core 125, from the four consecutive instructions. As described above, the read pointer stops at the branch track point, the four instructions output by cache 124 are stored into storage module 216, and the branch target address is used as the instruction address for the next cycle (cycle i+1) to fetch the four instructions containing the branch target.
In cycle i+1, storage module 216 holds instructions I0, I1, I2, and I3, while cache 124 outputs T0, T1, T2, and T3. Selector 214 selects the output of storage module 216 as output 219, and the two least significant bits select instruction I1, needed by processor core 125, from the four instructions on bus 219. Also in cycle i+1, T0, T1, T2, and T3 are written into storage module 216, and the BNX and PC offset of the track point indicated by the read pointer are used as the instruction address for the next cycle (that is, the address of instruction I2).
In cycle i+2, storage module 216 holds and outputs T0, T1, T2, and T3, while cache 124 outputs I0, I1, I2, and I3. Selector 214 selects the output of cache 124 as output 219, and the two least significant PC offset bits select instruction I2, needed by processor core 125, from the four instructions on bus 219. The address of the next instruction (I3) is used as the instruction address for the next cycle.
In cycle i+3, storage module 216 holds and outputs T0, T1, T2, and T3, while cache 124 outputs I0, I1, I2, and I3. Selector 214 now selects according to the branch outcome: if the branch is taken, it selects the output of storage module 216 and the two least significant bits of the branch target BNY select T1; if the branch is not taken, it selects the output of cache 124 and the two least significant PC offset bits select I3.
Thus, the TAKEN signal (whether the branch is taken) selects between the output of cache 124 and the output of storage module 216. Alternatively, the two least significant bits of the branch target BNY and of the PC offset can be used, respectively, to select one instruction from the four containing the branch target and another from the four containing the next instruction.
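The four cycles of Figure 8 can be replayed in a small model. This is a hypothetical sketch: instruction names follow the example above, the target is assumed to sit at low-bit position 1 (T1), and each returned tuple stands for columns 226, 227, and 228 of one cycle.

```python
def figure8(taken):
    """Hypothetical replay of Figure 8: (storage 216, cache 124, core) per cycle."""
    seq_group = ["I0", "I1", "I2", "I3"]   # group containing branch I2
    tgt_group = ["T0", "T1", "T2", "T3"]   # group containing target T1
    target_low = 1                         # low two bits of target BNY -> T1
    trace = []
    storage = None                         # contents before cycle i are not specified

    # Cycle i: cache outputs the sequential group; the core takes I0 from the cache.
    cache = seq_group
    trace.append((storage, cache, cache[0]))
    storage = cache                        # read pointer stopped at the branch: latch group

    # Cycle i+1: cache is addressed with the branch target; core takes I1 from storage.
    cache = tgt_group
    trace.append((storage, cache, storage[1]))
    storage = cache                        # latch the target group

    # Cycle i+2: cache is addressed by the PC again; core takes I2 from the cache.
    cache = seq_group
    trace.append((storage, cache, cache[2]))

    # Cycle i+3: TAKEN is known; selector 214 picks storage (taken) or cache.
    core = storage[target_low] if taken else cache[3]
    trace.append((storage, cache, core))
    return trace


print([row[2] for row in figure8(taken=True)])   # ['I0', 'I1', 'I2', 'T1']
print([row[2] for row in figure8(taken=False)])  # ['I0', 'I1', 'I2', 'I3']
```

The cache is read every cycle, but only cycle i+1 is spent on the target group, which is why one spare fetch slot per branch is enough.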
Alternatively, instructions I3 and T1 can be supplied to processor core 125 at the same time. Processor core 125 decodes I3 and T1 separately, obtains the operands of both simultaneously, and then, depending on whether the branch is taken, selects the decode result of T1 or of I3 together with the correct operands. Specifically, when the read pointer reaches the track point for branch instruction I2 while the instruction being fetched by processor core 125 is very close to I2 (for example, I1 is being fetched), cache 124 can start outputting the four instructions I0, I1, I2, and I3 once I2 has been fetched, and processor core 125 can still obtain I3 and T1 from cache 124 and storage module 216. For example, an XOR gate can invert the select signal of selector 214, so that the branch target instruction (or the four instructions containing it) is selected from the output of cache 124, and the next instruction (or the four instructions containing it) is selected from the output of storage module 216. In this case, T0, T1, T2, and T3 never need to be stored in storage module 216, whether or not the branch is taken.
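The XOR inversion can be expressed in one line. In this hypothetical sketch, `swapped` is an assumed flag meaning the roles of the two sources are reversed (target group from cache 124, next group from storage module 216); XOR-ing it with TAKEN inverts the select of selector 214.

```python
def selector_214(storage_out, cache_out, taken, swapped):
    """Hypothetical model of selector 214 with an XOR on its select line.

    swapped=False: storage 216 holds the target group, cache 124 the next group.
    swapped=True:  cache 124 supplies the target, storage 216 the next group.
    """
    use_storage = taken ^ swapped   # XOR inverts the select when the roles are swapped
    return storage_out if use_storage else cache_out


print(selector_214("T1", "I3", taken=True, swapped=False))  # T1
print(selector_214("I3", "T1", taken=True, swapped=True))   # T1
```

The same TAKEN signal thus works for both arrangements without rerouting data paths.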
Furthermore, Figure 9 shows another pipeline control structure 4 according to the present invention. Pipeline control structure 4 is similar to pipeline control structure 2 in Figure 5, but differs in that it contains two independent decoders, decoder 25 and decoder 26, instead of the single decoder 11. As shown in Figure 9, the two instructions fetched from instruction memory 22 are decoded by decoder 25 and decoder 26, respectively, and instruction decode results 31 and 32 are sent to selector 33, which is controlled by control signal 14 from branch decision logic 13.
If branch decision logic 13 determines that the branch is taken, decode result 32, corresponding to branch target instruction address 18, is selected and sent to execution unit 12. If it determines that the branch is not taken, decode result 31, corresponding to next instruction address 21, is selected and sent to execution unit 12. Because branch decision logic 13 can complete the decision at the end of the branch instruction's execution stage, before the execution stage of the next instruction, the pipeline loses no clock cycles waiting for the branch outcome.
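Pipeline control structure 4 decodes both candidates before choosing. The sketch below is hypothetical: the toy text format "op operand operand" and the function names are assumptions used only to show that selection happens after decoding.

```python
def decode(instruction):
    """Toy decoder for a hypothetical 'op operand operand' text format."""
    op, *operands = instruction.split()
    return {"op": op, "operands": operands}


def structure4_cycle(next_instruction, target_instruction, taken):
    """Hypothetical model: decoders 25 and 26 run in parallel, selector 33 picks late."""
    result_31 = decode(next_instruction)     # decoder 25: next instruction (address 21)
    result_32 = decode(target_instruction)   # decoder 26: branch target (address 18)
    return result_32 if taken else result_31 # selector 33, control signal 14


print(structure4_cycle("add r1 r2", "sub r3 r4", taken=True))
```

Moving the selection after decoding lets the decision arrive as late as the end of the branch's execution stage.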
Thus, besides deciding the branch before the branch point is executed, branch decision logic 13 can also decide the branch within a normal pipeline stage, for example at the end of the branch instruction's execution stage. Since every instruction that processor core 125 might execute after the branch point has already been fetched and decoded, and the instruction types are known, the branch decision causes no pipeline stall.
Furthermore, although processor core 125 has been described as executing one instruction at a time, the above examples also apply when processor core 125 executes more than one instruction at a time (a multi-issue processor). Similarly, although a five-stage pipeline has been described, pipelines with any other number of stages, in various pipeline structures, are also possible.
In addition, the clock cycles lost to branch processing can also be reduced by preprocessing the executable instructions or by using predefined instructions. For example, a branch instruction can be combined with a non-branch instruction into one compound instruction, so that the branch is processed at the same time as the non-branch instruction and the clock-cycle cost of the branch is reduced to zero or close to it.
For example, a processor instruction set usually contains some reserved or unused instructions, and some non-branch instructions have reserved or unused bits. These non-branch instructions can be used to carry the branch condition and the branch target address or offset of a branch instruction. When such a non-branch instruction is executed, the branch condition can be evaluated and the branch taken during its execution, achieving zero-cost branch processing. Since branch instructions account for roughly 20% of all instructions executed by a processor, folding them into other instructions can reduce the number of executed instructions by about 20% and significantly improve processor performance.
For example, in a 32-bit instruction set, one class of add instruction consists of a 5-bit opcode plus two source operands and one destination operand, each given as a 4-bit register number. Such an add instruction therefore uses 17 bits in total (5 + 3 × 4), leaving the remaining 15 bits unused.
On the other hand, one class of branch instruction makes the branch decision by comparing the values of two registers. As a standalone instruction, such a branch instruction may contain a 5-bit opcode, a 5-bit branch offset, and two 4-bit register numbers, for a total of 18 bits.
However, when the add instruction and the branch instruction are combined into one compound instruction (e.g., "add and branch"), one bit can be added to the 5-bit opcode to identify the compound instruction. The "add and branch" instruction then contains a 6-bit opcode, three register numbers for the addition (12 bits), two register numbers for the branch (8 bits), and a 5-bit branch offset, 31 bits in total. In this example, the branch is executed at the same time as the addition, which makes zero-cost branch processing possible.
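The bit budget of the "add and branch" compound instruction can be checked mechanically. The field widths come from the text; the field order, the field names, and treating the branch offset as unsigned are assumptions made for this hypothetical sketch.

```python
ADD_BITS = 5 + 3 * 4          # standalone add: 17 bits, leaving 15 of 32 unused
BRANCH_BITS = 5 + 5 + 2 * 4   # standalone compare-and-branch: 18 bits

# Hypothetical field layout for "add and branch" (widths from the text, order assumed).
ADD_AND_BRANCH_FIELDS = [
    ("opcode", 6),                       # 5-bit opcode plus 1 bit marking the compound form
    ("rd", 4), ("rs1", 4), ("rs2", 4),   # addition: destination and two sources
    ("ra", 4), ("rb", 4),                # branch: the two registers compared
    ("offset", 5),                       # branch offset (unsigned here for simplicity)
]


def pack_fields(fields, values):
    """Concatenate fields MSB-first; return (word, total width)."""
    word, used = 0, 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
        used += width
    return word, used


def unpack_fields(fields, word):
    """Inverse of pack_fields: peel fields off from the least significant end."""
    values = {}
    for name, width in reversed(fields):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


values = {"opcode": 0b100011, "rd": 1, "rs1": 2, "rs2": 3, "ra": 4, "rb": 5, "offset": 0b10110}
word, used = pack_fields(ADD_AND_BRANCH_FIELDS, values)
print(ADD_BITS, BRANCH_BITS, used)   # 17 18 31
```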
In other 32-bit instruction set examples, certain execution-type instructions (e.g., add, subtract) may have a 6-bit opcode and three 5-bit register numbers, 21 bits in total, leaving 11 bits for an additional branch operation. This branch operation can be of a fixed type, for example branching when the value of a particular register is nonzero. One of the 11 bits can serve as a branch bit and the other 10 bits as the branch offset. When the branch bit is "0", the instruction is an ordinary executable instruction. When the branch bit is "1", the instruction performs its executable operation (addition, etc.) and is also a branch instruction: if the register content is not zero, it is decremented by 1 and execution branches to the instruction whose address is the branch offset plus the address of the compound instruction; if the register content is zero, the branch is not taken, and the next instruction executed is the one immediately following the compound instruction. This type of instruction can save two clock cycles per loop iteration.
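The decrement-and-branch semantics above can be modeled as follows. This is a hypothetical sketch: the field order, the choice of register 31 as the "particular register", the two's-complement interpretation of the 10-bit offset, and PCs counted in instructions are all assumptions not given in the text.

```python
COUNTER_REG = 31   # hypothetical: the "particular register" tested by the branch

# Assumed layout: opcode(6) rd(5) rs1(5) rs2(5) branch_bit(1) offset(10) = 32 bits.
def encode_compound(opcode, rd, rs1, rs2, branch_bit, offset):
    return ((opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)
            | (branch_bit << 10) | (offset & 0x3FF))


def decode_compound(word):
    opcode = (word >> 26) & 0x3F
    rd, rs1, rs2 = (word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F
    branch_bit = (word >> 10) & 0x1
    offset = word & 0x3FF
    if offset & 0x200:          # sign-extend the 10-bit offset (assumption)
        offset -= 0x400
    return opcode, rd, rs1, rs2, branch_bit, offset


def execute(word, pc, regs):
    """Execute one compound 'add' instruction and return the next PC."""
    _, rd, rs1, rs2, branch_bit, offset = decode_compound(word)
    regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF   # the ordinary add
    if branch_bit and regs[COUNTER_REG] != 0:
        regs[COUNTER_REG] -= 1
        return pc + offset                            # compound address + offset
    return pc + 1                                     # fall through


# A one-instruction loop at address 10: r1 += r2, repeated while r31 counts down.
regs = [0] * 32
regs[2], regs[COUNTER_REG] = 5, 3
word = encode_compound(opcode=1, rd=1, rs1=1, rs2=2, branch_bit=1, offset=0)
pc, steps = 10, 0
while pc == 10:
    pc = execute(word, pc, regs)
    steps += 1
print(steps, regs[1], pc)   # 4 20 11
```

The loop body, the counter update, and the back-branch all happen in one instruction, which is where the per-iteration saving comes from.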
Figure 10 shows an embodiment of processor environment 1000 according to the present invention. In processor environment 1000, a read buffer 229 supplies a branch instruction in the program flow of processor core 125 and the instructions following it. Processor environment 1000 is similar to processor environment 600 in Figure 7, with some differences. As shown in Figure 10, in addition to cache 124, processor core 125, track table 126, and tracker 170, processor environment 1000 includes a read buffer 229.
Read buffer 229 sits between cache 124 and processor core 125 and includes a storage module 216 and a selector 214. Storage module 216 stores certain instructions, such as the contents of one memory block of cache 124. For example, storage module 216 stores and supplies the instructions following the branch instruction, while the branch target is supplied directly by cache 124, so that the same cache 124 provides higher bandwidth. Based on the branch decision, selector 214 selects either the branch target instruction (from cache 124) or the instruction following the branch instruction (from storage module 216) and sends it to processor core 125 as output 219, so that the instruction supplied to processor core 125 after the branch instruction is the valid (correct) one. In addition, the branch target address on bus 150, read from track table 126, is sent to cache 124 as the block address and in-block offset address; PC offset 155 (the in-block offset address) from processor core 125 is sent to storage module 216; and the TAKEN signal from processor core 125 controls selector 214.
During operation, tracker 170 supplies BNX 152 and BNY 153 as an address, and track table 126 outputs the track point corresponding to BNX 152 and BNY 153. The content read from this track point includes information such as the instruction type and the branch target address, and is sent to tracker 170 over bus 150. When tracker 170 detects that a track point holds the information of a branch instruction, the branch target block address 221 (target BNX) and the branch target offset address 222 (target BNY) on bus 150 are sent to cache 124. The branch target instruction, and possibly the other instructions in the same memory block, is then read from cache 124 and placed on bus 217, which feeds the write port of storage module 216 and one input of selector 214. Branch target block address 221 and branch target offset address 222 may be latched in a register before they are sent to cache 124 for addressing.
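The lookup described above can be sketched in Python. This is a minimal behavioral model under assumptions of our own, not the hardware design: the track table is a list of rows indexed by BNX, each row a list of track points indexed by BNY, and the cache is a list of instruction blocks indexed by the same BNX. `TrackPoint` and `fetch_branch_target` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TrackPoint:
    # Hypothetical encoding: a branch flag plus, for branches, the target
    # (target BNX = block address 221, target BNY = offset address 222).
    is_branch: bool
    target: Optional[Tuple[int, int]] = None


def fetch_branch_target(
    track_table: List[List[TrackPoint]],
    cache: List[Sequence[str]],
    bnx: int,
    bny: int,
) -> Optional[Tuple[Sequence[str], str]]:
    """Read the track point addressed by (BNX 152, BNY 153). If it describes a
    branch, use its target address to read cache 124 and return the whole
    target block (as placed on bus 217) together with the target instruction."""
    point = track_table[bnx][bny]  # content delivered on bus 150
    if not point.is_branch or point.target is None:
        return None
    target_bnx, target_bny = point.target
    block = cache[target_bnx]  # the block address selects the memory block
    return block, block[target_bny]  # the offset selects the target instruction
```

Because the whole block is returned, the same read can both refill storage module 216 and feed selector 214, which matches the dual use of bus 217 in the text.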
Storage module 216 may contain a specific number of storage units for storing instructions, for example all the instructions of one memory block (that is, one instruction block). Processor core 125 supplies intra-block offset 155 to address storage module 216. Storage module 216 then selects, from the instructions it holds, the one or more instructions that the processor core is about to execute and sends them to the other input of selector 214. Processor core 125 also sends a 'TAKEN' signal and a 'BRANCH/JUMP' signal to tracker 170 to report whether a branch is taken. The 'TAKEN' signal is also sent to selector 214 as its select signal, and to storage module 216 to determine whether its contents are replaced with the instruction block output by cache 124.
When the branch decision time slot arrives, the instructions that storage module 216 has placed on the input of selector 214 are the one or more instructions following the branch instruction. If the decision is not to branch, the 'TAKEN' signal makes selector 214 select the output of storage module 216 (the instructions after the branch instruction) and makes storage module 216 keep its current contents. In this case, processor core 125 executes the instructions following the branch instruction. Tracker 170 then moves to the next branch instruction in the same row of the track table and repeats the above operation.
If, however, the decision is to branch, the 'TAKEN' signal makes selector 214 select the output of cache 124 (the branch target) and makes storage module 216 replace its contents with the output of cache 124. In this case, processor core 125 executes the branch target instruction and the instructions that follow it.
Tracker 170 then moves to the track table entry of the branch target instruction. After that, PC offset 155 selects instructions in storage module 216 (those following the branch target instruction) for execution by processor core 125. Tracker 170 moves to the next branch instruction in the same row of the track table and repeats the above operation.
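The taken and not-taken behavior of read buffer 229 can be summarized in a small behavioral sketch. It assumes instructions are plain strings and blocks are Python lists, and it ignores pipeline timing entirely. The class `ReadBuffer` and its methods `fetch` and `resolve` are hypothetical names that model storage module 216 and selector 214; they are not an interface defined in the source.

```python
from typing import List, Sequence


class ReadBuffer:
    """Behavioral sketch of read buffer 229: storage module 216 plus selector 214."""

    def __init__(self, current_block: Sequence[str]) -> None:
        # Storage module 216 holds the instruction block currently being executed.
        self.storage: List[str] = list(current_block)

    def fetch(self, pc_offset: int) -> str:
        """Sequential fetch: PC offset 155 addresses storage module 216."""
        return self.storage[pc_offset]

    def resolve(
        self,
        taken: bool,
        target_block: Sequence[str],
        target_offset: int,
        fallthrough_offset: int,
    ) -> str:
        """At the branch decision slot, the TAKEN signal drives selector 214."""
        if taken:
            # Select the cache output (bus 217) and refill storage module 216.
            self.storage = list(target_block)
            return self.storage[target_offset]
        # Not taken: keep the current contents and supply the fall-through instruction.
        return self.storage[fallthrough_offset]
```

With a buffer holding `["add", "beq", "sub", "mul"]`, a not-taken `resolve` returns `"sub"` and leaves the buffer unchanged. A taken `resolve` into the block `["t0", "t1", "t2"]` at offset 1 returns `"t1"`, and subsequent `fetch(2)` calls read `"t2"` from the refilled storage.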
Thus, when processor core 125 executes the branch instruction corresponding to a branch point, both the branch target instruction and the instruction immediately following the branch point can be supplied at the same time. The correct instruction is therefore available whether or not the branch is taken.
An unconditional branch flag can be appended after the last instruction in a track. Its branch target is the instruction that immediately follows that last instruction in the program stream. Using the same method as above, once the instructions on one track have been executed, the following instructions can be executed without stalling the pipeline.
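The tracker movement described in the preceding paragraphs can be sketched as follows. The tracker skips to the next branch point in its row. On a taken branch it jumps to the stored target; otherwise it advances by one within the row. The unconditional end-of-track flag always jumps to the next instruction in program order. The string tags `"seq"`, `"br"` and `"end"` and the helpers `next_branch` and `step` are hypothetical choices for this sketch only. The sketch also assumes every row ends with an `"end"` point, so the scan within a row always terminates.

```python
from typing import List, Optional, Tuple

Address = Tuple[int, int]  # (BNX, BNY)
# Hypothetical track-point encoding used only in this sketch:
#   ("seq", None)   ordinary (non-branch) instruction
#   ("br", addr)    conditional branch with its stored target address
#   ("end", addr)   unconditional flag appended after the last instruction,
#                   targeting the next instruction in program order
TrackPoint = Tuple[str, Optional[Address]]


def next_branch(track_table: List[List[TrackPoint]], bnx: int, bny: int) -> Address:
    """Move along row BNX to the next branch or end point (every row ends with "end")."""
    while track_table[bnx][bny][0] == "seq":
        bny += 1
    return bnx, bny


def step(track_table: List[List[TrackPoint]], bnx: int, bny: int, taken: bool) -> Address:
    """Advance the tracker past the branch or end point at (bnx, bny)."""
    kind, target = track_table[bnx][bny]
    if kind == "end" or (kind == "br" and taken):
        if target is None:
            raise ValueError("branch or end point without a target address")
        bnx, bny = target  # jump: use the addresses stored in the entry
    else:
        bny += 1  # not taken: same row, next track point
    return next_branch(track_table, bnx, bny)
```

Because the end-of-track point is handled exactly like a taken branch, crossing from one track to the next needs no special case. This is the property the paragraph relies on to avoid a pipeline stall.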
In addition, the processor can detect the instruction position, or point in time, at which the conditions needed to evaluate a branch instruction's branch condition are finally determined. The branch decision can then be made as soon as those conditions are determined, so the address of the instruction to be executed after the branch instruction is known in advance. This achieves branch prediction with a 100% success rate without using conventional branch prediction methods. FIG. 11 is a schematic diagram 1100 of the branch prediction method according to the present invention.
As shown in FIG. 11, instruction stream 1101 consists of a series of sequentially executed instructions, executed from left to right. Instruction 1102 in instruction stream 1101 is a branch instruction. Instructions 1103, 1104 and 1105 in instruction stream 1101 all change the branch condition (or condition flag) of branch instruction 1102, and instruction 1105 is the last of them to do so. A conventional processor checks whether the branch condition is satisfied only when it executes branch instruction 1102. In this embodiment, the branch condition can be evaluated as soon as the execution of instruction 1105 has determined all the branch conditions (or condition flags) that branch instruction 1102 requires.
FIG. 12 shows an embodiment 1200 of branch prediction according to the present invention. Branch prediction system 1200 consists of three parts: an instruction buffer 1201, a pre-detection control unit 1202 and a time point detection unit 1203. Instruction buffer 1201 stores the currently executing instruction 1205 and the instructions that follow it. Time point detection unit 1203 contains one position register for each branch condition (or condition flag). Depending on the processor's instruction set architecture, a branch condition (or condition flag) may be a general-purpose register, a status register or a flag bit. Whether a branch is taken can be determined by comparing different branch conditions (or condition flags) with one another. It can also be determined by comparing a branch condition (or condition flag) with a preset value.
Pre-detection control unit 1202 drives a leading pointer 1204. The pointer scans the subsequent instructions in the instruction buffer, starting from current instruction 1205, faster than the processor's program counter (PC), and continues until it reaches the first branch instruction 1206. During the scan, the instruction pointed to by the leading pointer is read out and sent to time point detection unit 1203. A processor has only a limited number of conditions (or condition flags) that can be used for branch decisions. Decoder 1207 in time point detection unit 1203 can therefore decode the instruction pointed to by leading pointer 1204 and tell whether it changes one or more of those conditions (or condition flags). If it does, the decoder also determines which ones. Whenever the scan finds that the instruction under leading pointer 1204 changes a branch condition (or condition flag), that instruction's position information is written into the position register or registers in time point detection unit 1203 for the changed condition or conditions.
For ease of description, branch prediction system 1200 is illustrated with branch instructions that use only two conditions (COND1 and COND2). The same method extends to cases with more conditions (or condition flags).
In the example of branch prediction system 1200, scanning the instruction buffer reveals three instructions between current instruction 1205 and the first branch instruction 1206 that change a condition. Instruction 1208, which changes COND1, has position information '3'. Instruction 1209, which changes COND2, has position information '4'. Instruction 1210, which also changes COND2, has position information '7'.
When leading pointer 1204 points to instruction 1208, the instruction is read out and sent over bus 1211 to decoding unit 1207. Decoding shows that it changes the value of COND1, so the position information '3' of instruction 1208 is written into position register 1212, which corresponds to COND1.
Similarly, when leading pointer 1204 points first to instruction 1209 and then to instruction 1210, their position information '4' and '7' is written in turn into position register 1213, which corresponds to COND2, so '7' overwrites '4'. When leading pointer 1204 reaches branch instruction 1206, position registers 1212 and 1213 therefore each hold the position of the last instruction before branch instruction 1206 that updates the corresponding condition value. In addition, when leading pointer 1204 reaches instruction 1206, the instruction is read out and sent over bus 1211 to decoding unit 1207. When decoding identifies it as a branch instruction, a stop signal is sent over control line 1216 to pre-detection control unit 1202, and leading pointer 1204 stays at branch instruction 1206.
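The scan performed by leading pointer 1204 can be sketched in Python. This model makes several assumptions of our own. Instruction positions are indices into the instruction buffer, and the buffer in the test is laid out so that instructions 1208, 1209 and 1210 sit at positions 3, 4 and 7, as in the example. Each position register is one entry of a dictionary keyed by condition name. `Instr` and `scan` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Instr:
    # Hypothetical decoded view of one instruction, as decoder 1207 might see it.
    writes: FrozenSet[str] = frozenset()  # branch conditions this instruction changes
    is_branch: bool = False


def scan(buffer: List[Instr], start: int) -> Tuple[int, Dict[str, int]]:
    """Advance a leading pointer from `start` (current instruction 1205) to the
    first branch, recording for each condition the position of its last writer.
    Assumes the buffer contains a branch at or after `start`."""
    position_regs: Dict[str, int] = {}
    ptr = start
    while not buffer[ptr].is_branch:  # decoding a branch stops the pointer (line 1216)
        for cond in buffer[ptr].writes:
            position_regs[cond] = ptr  # a later writer overwrites an earlier one
        ptr += 1
    return ptr, position_regs
```

Run on the example layout, the pointer stops at the branch with COND1 mapped to 3 and COND2 mapped to 7. The write at position 4 is overwritten, exactly as described for position register 1213.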
At the same time, because leading pointer 1204 points to a branch instruction, decoding unit 1207 decodes it and uses control line 1215 to select, among the position registers of all branch conditions, those for the conditions that branch instruction 1206 must evaluate. It outputs their values to comparison unit 1218. The other comparison input of comparison unit 1218 is current instruction position information 1214, the position of the current instruction whose condition-value update has completed.
Because the position registers hold instruction positions, the comparison works as follows. Once the last instruction before branch instruction 1206 that updates a given branch condition has finished executing, the current instruction position information 1214 sent to comparison unit 1218 equals that instruction's position. Comparison unit 1218 then sends an "equal" result to control unit 1219. This indicates that the condition value has been updated and can be used to decide whether the branch condition is satisfied.
Proceeding in this way, once all condition values required by branch instruction 1206 have been updated, control unit 1219 can issue a "decidable" signal 1220 that allows the processor to make the branch decision for branch instruction 1206. The address of the instruction to be executed after the branch instruction is thereby determined in advance, achieving branch prediction with a 100% success rate.
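Comparison unit 1218 and control unit 1219 can be modeled as a set of pending conditions that are cleared by "equal" matches. This is a minimal sketch under stated assumptions. The position registers are given as a dictionary, such as the one produced by a scan like the one described above. Each completed condition update reports its position. A required condition with no writer between the current instruction and the branch is treated as already final, which is our assumption and is not stated in the source. `DecidableSignal` and `update_done` are hypothetical names.

```python
from typing import Dict, Iterable


class DecidableSignal:
    """Hypothetical model of comparison unit 1218 and control unit 1219."""

    def __init__(self, position_regs: Dict[str, int], branch_reads: Iterable[str]) -> None:
        # Keep only the position registers for conditions the branch tests
        # (the selection made over control line 1215). Assumption: a condition
        # with no writer in the scanned window is already final.
        self.pending: Dict[str, int] = {
            cond: position_regs[cond] for cond in branch_reads if cond in position_regs
        }

    @property
    def decidable(self) -> bool:
        """Signal 1220: every required condition value has been updated."""
        return not self.pending

    def update_done(self, position: int) -> bool:
        """Report that the instruction at `position` (information 1214) has
        finished updating its condition; an "equal" match clears that entry."""
        self.pending = {c: p for c, p in self.pending.items() if p != position}
        return self.decidable
```

With registers `{"COND1": 3, "COND2": 7}`, completing position 4 changes nothing, because 4 is not the last writer of COND2. Completing 7 and then 3 raises the signal only after the second report. Matching by equality rather than ordering lets updates complete in any order.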
Although not explicitly shown in the figure, time point detection unit 1203 may also obtain the necessary information from the processor's registers, from instruction buffer 1201 or from any other suitable source in order to generate signal 1220. Likewise, time point detection unit 1203 may send the necessary information to the processor to generate signal 1220.
In some cases, for example when the processor does not execute instructions out of order, the values of all position registers for the required branch conditions need not be sent to comparison unit 1218. Instead, after decoding, decoding unit 1207 issues a control signal that selects the largest value (position) among those position registers and sends only that value to comparison unit 1218. All condition values required by the branch instruction have then been updated when comparison unit 1218 sends an "equal" result to control unit 1219, or when the current instruction position information 1214 is greater than or equal to that position-register value. In this case, the program counter value may also be used as the current instruction position information 1214.
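For the in-order case, the per-condition bookkeeping collapses to a single maximum and one comparison. The sketch below assumes, as in the in-order scenario, that instructions complete in program order. The test is taken to pass once the completed-instruction position (or the program counter) has reached the stored maximum; that comparison direction is our reading of the text. `latest_writer` and `decidable_in_order` are hypothetical helper names, and the `-1` return value for "no writer found" is an illustrative convention.

```python
from typing import Dict, Iterable


def latest_writer(position_regs: Dict[str, int], branch_reads: Iterable[str]) -> int:
    """Select the largest position among the registers the branch needs.
    Returns -1 if none of them was written in the scanned window (assumption)."""
    return max((position_regs[c] for c in branch_reads if c in position_regs), default=-1)


def decidable_in_order(current_position: int, max_position: int) -> bool:
    """In-order case: all required conditions are final once the instruction at
    max_position, and therefore every earlier writer, has completed."""
    return current_position >= max_position
```

For the running example the maximum is 7. The branch is not yet decidable at position 6 but is decidable at 7 or later. A "greater than or equal" test also covers the case where the exact "equal" moment is missed, for example when the comparison is driven by the program counter.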
Industrial Applicability
The apparatus and method proposed by the present invention can be used in a variety of processor-related applications, such as general-purpose processors, special-purpose processors, system-on-chip (SOC) applications, application-specific integrated circuit (ASIC) applications and other computing systems. For example, they can be used in high-performance processors to improve pipeline efficiency and overall system performance.

Claims (50)

  1. A method of controlling the pipeline operation of a processor, the processor being connected to a memory containing executable computer instructions, characterized in that the method comprises:
    determining whether an instruction about to be executed by the processor is a branch instruction;
    providing the branch target instruction address of the branch instruction and the address of the instruction following the branch instruction in the program sequence;
    determining a branch decision corresponding to the branch instruction; and
    according to the branch decision, and before the branch instruction reaches its execution stage in the pipeline, selecting at least one of the branch target instruction and the following instruction as the instruction to be executed by the execution unit, such that the pipeline does not stall whether or not the branch is taken.
  2. The method according to claim 1, characterized in that:
    the branch decision is determined according to a branch type and a branch status flag.
  3. The method according to claim 1, characterized in that the selecting further comprises:
    selecting, according to the branch decision, one of the branch target instruction address and the following-instruction address; and
    fetching, according to the selected address, one of the branch target instruction and the following instruction and supplying it to the execution unit.
  4. The method according to claim 1, characterized in that the selecting further comprises:
    fetching the branch target instruction and the following instruction from the memory using the branch target instruction address and the following-instruction address, respectively; and
    selecting, according to the branch decision, one of the fetched branch target instruction and the fetched following instruction and supplying it to the execution unit.
  5. The method according to claim 1, characterized in that the selecting further comprises:
    fetching the branch target instruction from a storage device according to the branch target instruction address;
    fetching the following instruction from the memory according to the following-instruction address; and
    selecting, according to the branch decision, one of the fetched branch target instruction and the fetched following instruction and supplying it to the execution unit.
  6. The method according to claim 2, characterized in that the providing further comprises:
    extracting, by examining the executable computer instructions, instruction information that includes at least branch information;
    building a plurality of tracks based on the extracted instruction information; and
    determining the branch target instruction address based on the plurality of tracks.
  7. The method according to claim 6, characterized in that building the plurality of tracks further comprises:
    building a track table, the track table containing a plurality of track table rows corresponding to the plurality of tracks, each row corresponding to one track and containing a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
  8. The method according to claim 7, characterized in that the method further comprises:
    addressing a track point by a track number determined by a first address and an intra-track offset determined by a second address.
  9. The method according to claim 8, characterized in that:
    the branch type is provided by the track table; and
    the branch status flag is provided by the processor.
  10. The method according to claim 8, characterized in that:
    the branch decision is made when the program counter (PC) offset provided by the processor equals the offset of a branch track point in the track table.
  11. The method according to claim 8, characterized in that:
    when the processor executes an instruction corresponding to a track point, the memory block containing the instruction is determined by the first address, and the instruction is located within that memory block by the offset provided by the processor.
  12. The method according to claim 11, characterized in that the method further comprises:
    computing the branch target instruction address as the sum of the block address of the memory block containing the branch instruction, the offset of the branch instruction within that memory block, and the branch offset to the branch target instruction.
  13. The method according to claim 12, characterized in that the method further comprises:
    storing the branch target instruction address as the content of the track table entry corresponding to the branch instruction.
  14. The method according to claim 13, characterized in that the method further comprises:
    when the branch is taken, using the first address and the second address stored in the entry corresponding to the branch instruction as the next first address and the next second address, respectively; and
    when the branch is not taken, keeping the current first address unchanged as the next first address and incrementing the current second address by one to form the next second address, thereby reaching the next track point in the track table.
  15. The method according to claim 13, characterized in that the method further comprises:
    when the branch is taken, forcing the processor's program counter to the address of the instruction following the branch target instruction, so that the processor fetches that following instruction while it executes the branch target instruction.
  16. The method according to claim 1, characterized in that:
    the branch instruction may be combined with a non-branch instruction, such that the branch operation of the branch instruction is performed concurrently with the execution of the non-branch instruction.
  17. A pipeline control system for controlling the pipeline operation of a processor, the processor being connected to a memory containing executable computer instructions, characterized in that the system comprises:
    an examining unit, configured to determine whether an instruction about to be executed by the processor is a branch instruction;
    an addressing unit connected to the processor, configured to provide the branch target instruction address of the branch instruction and the address of the instruction following the branch instruction in the program sequence;
    a branch logic unit, configured to determine a branch decision for the branch instruction based at least on the branch target instruction address provided by the track unit; and
    a selector, configured to select, according to the branch decision provided by the branch logic unit and before the branch instruction reaches its execution stage in the pipeline, at least one of the branch target instruction and the following instruction as the instruction to be executed by the execution unit, such that the pipeline does not stall whether or not the branch is taken.
  18. The system according to claim 17, characterized in that:
    the selector selects, according to the branch decision, one of the branch target instruction address and the following-instruction address, thereby selecting at least one of the branch target instruction and the following instruction; and
    the pipeline control system further comprises:
    a fetch unit, configured to fetch from the memory, according to the selected one of the branch target instruction address and the following-instruction address, one of the branch target instruction and the following instruction, and to supply it to the execution unit.
  19. The system according to claim 17, characterized in that:
    the pipeline control system further comprises:
    a fetch unit, configured to fetch the branch target instruction and the following instruction from the memory using the branch target instruction address and the following-instruction address, respectively; and
    the selector selects, according to the branch decision, one of the fetched branch target instruction and the fetched following instruction and supplies it to the execution unit, thereby selecting at least one of the branch target instruction and the following instruction.
  20. The system according to claim 17, characterized in that the system further comprises:
    a fetch unit and a storage device, wherein:
    the fetch unit is configured to:
    fetch the branch target instruction from the storage device according to the branch target instruction address; and
    fetch the following instruction from the memory according to the following-instruction address; and
    the selector selects, according to the branch decision, one of the fetched branch target instruction and the fetched following instruction and supplies it to the execution unit.
  21. The system according to claim 17, characterized in that:
    the examining unit is further configured to:
    extract, by examining the executable computer instructions, instruction information that includes at least branch information; and
    to provide the branch target instruction address and the following-instruction address of the branch instruction, the track unit is further configured to:
    build a plurality of tracks based on the extracted instruction information; and
    determine the branch target instruction address based on the plurality of tracks.
  22. The system according to claim 21, characterized in that the addressing unit further comprises:
    a track table, the track table containing a plurality of track table rows corresponding to the plurality of tracks, each row corresponding to one track and containing a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
  23. The system according to claim 17, characterized in that:
    the branch instruction may be combined with a non-branch instruction, such that the branch operation of the branch instruction is performed concurrently with the execution of the non-branch instruction.
  24. The system according to claim 17, characterized in that:
    the branch instruction may be part of a composite instruction that contains the branch instruction and a non-branch instruction.
  25. The system according to claim 24, characterized in that:
    the composite instruction includes a branch bit indicating whether the branch instruction contained in the composite instruction is to be executed; and
    the branch decision for the branch instruction in the composite instruction is made based on the contents of a preset register.
  26. A method of controlling the pipeline operation of a processor, the processor being connected to a memory containing executable computer instructions, characterized in that the method comprises:
    determining whether an instruction about to be executed by the processor is a branch instruction;
    providing the branch target instruction address of the branch instruction and the address of the instruction following the branch instruction in the program sequence;
    fetching the branch target instruction and the following instruction according to the branch target instruction address and the following-instruction address, respectively;
    decoding the fetched branch target instruction and the fetched following instruction; and
    selecting, according to a branch decision provided by the processor, the decoded branch target instruction or the decoded following instruction and supplying it to the execution unit, such that the pipeline does not stall whether or not the branch is taken.
  27. The method according to claim 26, characterized in that the providing further comprises:
    extracting, by examining the executable computer instructions, instruction information that includes at least branch information;
    building a plurality of tracks based on the extracted instruction information; and
    determining the branch target instruction address based on the plurality of tracks.
  28. The method of claim 27, wherein building the plurality of tracks further comprises:
    Building a track table containing a plurality of track table rows corresponding to the plurality of tracks, each row corresponding to one track and containing a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
  29. A pipeline control system for controlling pipeline operation of a processor, the processor being coupled to a memory containing executable computer instructions, characterized in that the system comprises:
    An addressing unit, coupled to the processor, configured to provide the branch target instruction address of a branch instruction and the address of the instruction that follows the branch instruction in program order (the next instruction);
    A read buffer, coupled to the memory and the processor, configured to store at least one of the branch target instruction and the next instruction;
    Wherein the read buffer further includes a selector, coupled to the processor, configured to provide either the branch target instruction or the next instruction to the processor when the branch instruction is executed, so that the pipeline does not stall whether or not the branch is taken.
  30. The system of claim 29, wherein:
    The memory can output at least two instructions per cycle; and
    The read buffer can store at least two instructions per cycle.
  31. The system of claim 30, wherein:
    The memory includes a single-port memory module whose bandwidth exceeds the processor's instruction issue rate.
  32. The system of claim 30, wherein:
    One portion of the instruction address is used to read at least two instructions from a memory block in the memory; and
    Another portion of the instruction address is used to select the addressed instruction from those at least two instructions.
  33. The system of claim 30, wherein:
    In a first cycle, the branch target instruction address is sent to the memory to read at least two instructions, including the branch target instruction.
  34. The system of claim 33, wherein:
    In a second cycle, the memory outputs the at least two instructions containing the branch target instruction, and the branch instruction address is sent to the memory to read at least two instructions, including the branch instruction.
  35. The system of claim 34, wherein:
    In a third cycle, the at least two instructions containing the branch target instruction are stored in the read buffer, the memory outputs the at least two instructions containing the branch instruction, and the next instruction address is sent to the memory to read at least two instructions, including the next instruction.
  36. The system of claim 35, wherein:
    In a fourth cycle, the read buffer outputs the at least two instructions containing the branch target instruction, and the memory outputs the at least two instructions containing the next instruction.
  37. The system of claim 35, wherein:
    A control signal from the processor indicating whether the branch is taken is used to choose between the at least two instructions containing the branch target instruction and the at least two instructions containing the next instruction.
  38. The system of claim 37, wherein:
    A portion of the program counter offset is used to select the branch target instruction from the at least two instructions containing it, or to select the next instruction from the at least two instructions containing it.
  39. The system of claim 29, wherein:
    The addressing unit includes a track table containing a plurality of track table rows corresponding to a plurality of tracks, each row corresponding to one track and containing a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
  40. The system of claim 39, wherein:
    The track table stores, as the content of the entry corresponding to a branch instruction, the branch target instruction address of that branch instruction.
  41. A pipeline control system for controlling pipeline operation of a processor, the processor being coupled to a memory containing executable computer instructions, characterized in that the system comprises:
    An addressing unit, coupled to the processor, configured to provide the branch target instruction address of a branch instruction and the address of the instruction that follows the branch instruction in program order;
    A read buffer, coupled to the memory and the processor, configured to store the instruction segment containing the current instruction;
    Wherein the read buffer further includes a selector, coupled to the processor, configured to provide either the branch target instruction or the instruction following the branch instruction to the processor when the branch instruction is executed, so that the pipeline does not stall whether or not the branch is taken.
  42. The system of claim 41, wherein:
    The branch target address from the addressing unit is used to select, from the memory, the instruction following the branch target instruction; and
    The program counter offset from the processor is used to select, from the read buffer, the instruction following the current instruction.
  43. The system of claim 42, wherein:
    A control signal from the processor indicating whether the branch is taken is used to choose between the branch target instruction from the memory and the instruction following the current instruction from the read buffer.
  44. The system of claim 41, wherein:
    The branch target instruction address sent to the memory may be latched depending on the type of the current instruction.
  45. The system of claim 41, wherein:
    The addressing unit includes a track table containing a plurality of track table rows corresponding to a plurality of tracks, each row corresponding to one track and containing a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
  46. The system of claim 45, wherein:
    The track table stores, as the content of the entry corresponding to a branch instruction, the branch target instruction address of that branch instruction.
  47. A pipeline control system for controlling pipeline operation of a processor, the processor being coupled to a memory containing executable computer instructions and to an instruction buffer that is faster than the memory, characterized in that the pipeline control system comprises:
    A pre-detection control unit configured to control a leading pointer that moves along the read buffer faster than the current instruction pointer, which points to the instruction currently being executed by the processor core; the pre-detection control unit may further examine the instructions the leading pointer passes, so as to extract instruction information including at least branch instruction information and information about the instruction that last updates the branch condition or condition flag of the branch instruction, such that the leading pointer stops at at least one branch instruction; and
    A time-point detection unit configured to make the branch decision once the instruction that last updates the branch condition or condition flag of the branch instruction has been executed, so that, before the branch instruction itself executes, it is already known whether the processor should next execute the branch target instruction or the instruction following the branch instruction; thus the pipeline does not stall whether or not the branch is taken.
  48. The system of claim 47, wherein the pipeline control system is further configured to build, from the extracted instruction information, a track corresponding to an instruction segment, the track containing a plurality of track points, each corresponding to one instruction in the instruction segment.
  49. The system of claim 47, wherein the pipeline control system is further configured to:
    Store the location information of every instruction that updates the branch condition or condition flag of a branch instruction in a corresponding location register;
    Compare the current instruction pointer with the location information, stored in the location register, that corresponds to at least one branch instruction; and
    When the current instruction pointer is greater than or equal to the location information stored in that location register, generate a signal to make the branch decision.
  50. The system of claim 47, wherein:
    The branch instruction information includes direct-addressing branch instruction information and indirect-addressing branch instruction information.
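The track table of claims 28, 39 and 40 can be illustrated with a minimal Python sketch. It assumes a toy instruction format (`Instr`, carrying an absolute `target` for branches) and a fixed track length of four instructions (`BLOCK_SIZE`); both are hypothetical and not taken from the patent. Each row is a track, each entry a track point, and a branch track point holds its target as a (track, point) pair, so the target can be read from the table rather than recomputed from the instruction.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

BLOCK_SIZE = 4  # hypothetical: one track per block of 4 instructions


@dataclass(frozen=True)
class Instr:
    """Toy instruction; `target` is an absolute branch target, None if not a branch."""
    op: str
    target: Optional[int] = None


def split(addr: int) -> tuple[int, int]:
    """Map an instruction address to (track number, track point within the track)."""
    return divmod(addr, BLOCK_SIZE)


def build_track_table(program: list[Instr]) -> list[list[Optional[tuple[int, int]]]]:
    """Build one row per track and one entry per track point.

    A branch entry stores its target as (track, point); other entries are None.
    """
    n_tracks = (len(program) + BLOCK_SIZE - 1) // BLOCK_SIZE
    table = [[None] * BLOCK_SIZE for _ in range(n_tracks)]
    for addr, ins in enumerate(program):
        if ins.target is not None:
            row, col = split(addr)
            table[row][col] = split(ins.target)
    return table


def next_branch(table, row: int, col: int) -> Optional[tuple[int, int]]:
    """Walk along one track from `col` and return the first branch track point."""
    for c in range(col, BLOCK_SIZE):
        if table[row][c] is not None:
            return (row, c)
    return None


# Usage: ten instructions, a conditional branch at 1 (to 9) and a jump at 5 (to 0).
program = [Instr("add"), Instr("beq", 9), Instr("sub"), Instr("ld"),
           Instr("st"), Instr("b", 0), Instr("mul"), Instr("or"),
           Instr("xor"), Instr("ret")]
table = build_track_table(program)
print(table[0])                  # [None, (2, 1), None, None]
print(next_branch(table, 0, 0))  # (0, 1)
```

Storing the target in (track, point) form means the same two-part address that selects a row can be fed straight back into the table, which is one plausible reading of how the tracks let the addressing unit "determine the address of the branch target instruction" in claim 27.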
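Claims 32, 37 and 38 split an instruction address into a block part, which reads several instructions at once, and an offset part, which picks one of them; the taken/not-taken signal then chooses between the target block and the fall-through block. The sketch below assumes a four-instruction block (`OFFSET_BITS = 2`) and models memory as a Python list of strings; the function names are illustrative only, not the patent's circuit.

```python
from __future__ import annotations

# Hypothetical geometry: one memory read returns a block of 4 instructions.
OFFSET_BITS = 2                 # low bits: offset inside the block
BLOCK_WORDS = 1 << OFFSET_BITS  # instructions per block
MASK = BLOCK_WORDS - 1


def read_block(memory: list[str], addr: int) -> list[str]:
    """Use the high part of `addr` to read the whole block that contains it."""
    base = addr & ~MASK
    return memory[base:base + BLOCK_WORDS]


def select(taken: bool, target_block: list[str], next_block: list[str],
           target_addr: int, next_addr: int) -> str:
    """Toy read-buffer selector: the taken signal picks a block, then the
    low (offset) bits of the matching address pick one instruction in it."""
    if taken:
        return target_block[target_addr & MASK]
    return next_block[next_addr & MASK]


# Usage: a branch at address 5 whose target is address 13.
memory = [f"i{n}" for n in range(16)]
branch_addr, target_addr = 5, 13
next_addr = branch_addr + 1
target_block = read_block(memory, target_addr)  # ['i12', 'i13', 'i14', 'i15']
next_block = read_block(memory, next_addr)      # ['i4', 'i5', 'i6', 'i7']
print(select(True, target_block, next_block, target_addr, next_addr))   # i13
print(select(False, target_block, next_block, target_addr, next_addr))  # i6
```

In this model both blocks are already available when the taken signal arrives, so resolving the branch is a selection between two buffered blocks rather than a new memory access.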
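The look-ahead scan of claim 47 and the location-register comparison of claim 49 can be sketched as follows, assuming each instruction carries two hypothetical bits (`writes_flags`, `is_branch`) and that pointers are plain instruction indices. The leading pointer stops at the next branch and records the last flag-updating instruction it passed; the decision signal is raised once the current instruction pointer reaches that recorded location. This simplification ignores flag writers that precede the scan start and does not model pipeline latency.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    op: str
    writes_flags: bool = False  # hypothetical: updates the branch condition/flag
    is_branch: bool = False     # hypothetical: instruction is a branch


def scan_ahead(program: list[Instr], start: int) -> tuple[Optional[int], Optional[int]]:
    """Advance a leading pointer from `start` until it stops at a branch.

    Returns (index of that branch, index of the last flag-updating instruction
    passed on the way); the second value is what a location register would hold.
    """
    last_writer: Optional[int] = None
    for i in range(start, len(program)):
        if program[i].is_branch:
            return i, last_writer
        if program[i].writes_flags:
            last_writer = i
    return None, last_writer


def decision_ready(current_pointer: int, location: int) -> bool:
    """Raise the branch-decision signal once the current instruction pointer is
    greater than or equal to the stored location (the comparison in claim 49)."""
    return current_pointer >= location


# Usage: the compare at index 1 sets the flag that the branch at index 3 tests.
program = [Instr("ld"), Instr("cmp", writes_flags=True), Instr("add"),
           Instr("beq", is_branch=True)]
branch, location = scan_ahead(program, 0)
print(branch, location)                                        # 3 1
print([decision_ready(pc, location) for pc in range(branch)])  # [False, True, True]
```

In this toy run the decision signal is already asserted from index 1 onward, before the current pointer reaches the branch at index 3.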

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110184416.XA CN102855121B (en) 2011-06-29 2011-06-29 Branching processing method and system
CN201110184416.X 2011-06-29

Publications (1)

Publication Number Publication Date
WO2013000400A1 (en) 2013-01-03

Family

ID=47401736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/077565 WO2013000400A1 (en) 2011-06-29 2012-06-26 Branch processing method and system

Country Status (2)

Country Link
CN (2) CN102855121B (en)
WO (1) WO2013000400A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113703832A (en) * 2021-09-10 2021-11-26 中国人民解放军国防科技大学 Method, device and medium for executing immediate data transfer instruction

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN104424158A (en) 2013-08-19 2015-03-18 上海芯豪微电子有限公司 General unit-based high-performance processor system and method
CN104793921B (en) * 2015-04-29 2018-07-31 深圳芯邦科技股份有限公司 A kind of instruction branch prediction method and system
WO2017045212A1 (en) * 2015-09-20 2017-03-23 张鹏 Branch prefetching method
CN108845831A (en) * 2017-04-13 2018-11-20 上海芯豪微电子有限公司 A kind of branch processing method and system
CN109101276B (en) 2018-08-14 2020-05-05 阿里巴巴集团控股有限公司 Method for executing instruction in CPU
CN109783143B (en) * 2019-01-25 2021-03-09 贵州华芯通半导体技术有限公司 Control method and control device for pipelined instruction streams
CN111258649B (en) * 2020-01-21 2022-03-01 Oppo广东移动通信有限公司 Processor, chip and electronic equipment
CN111461326B (en) * 2020-03-31 2022-12-20 中科寒武纪科技股份有限公司 Instruction addressing method based on equipment memory and computer readable storage medium
CN111538533B (en) * 2020-04-07 2023-08-08 江南大学 Class adder-based instruction request circuit and out-of-order instruction transmitting architecture
CN111538535B (en) * 2020-04-28 2021-09-21 支付宝(杭州)信息技术有限公司 CPU instruction processing method, controller and central processing unit
CN114528025B (en) * 2022-02-25 2022-11-15 深圳市航顺芯片技术研发有限公司 Instruction processing method and device, microcontroller and readable storage medium
CN115437695B (en) * 2022-07-01 2024-01-23 无锡芯领域微电子有限公司 Branch delay slot processing method and device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN1222985A (en) * 1996-05-03 1999-07-14 艾利森电话股份有限公司 Method relating to handling of conditional jumps in multi-stage pipeline arrangement
US5928357A (en) * 1994-09-15 1999-07-27 Intel Corporation Circuitry and method for performing branching without pipeline delay
CN1349160A (en) * 2001-11-28 2002-05-15 中国人民解放军国防科学技术大学 Correlation delay eliminating method for streamline control
CN1497436A (en) * 2002-10-22 2004-05-19 富士通株式会社 Information processing unit and information processing method
US20040111592A1 (en) * 2002-12-06 2004-06-10 Renesas Technology Corp. Microprocessor performing pipeline processing of a plurality of stages
US7197603B2 (en) * 1997-08-01 2007-03-27 Micron Technology, Inc. Method and apparatus for high performance branching in pipelined microsystems
WO2011079824A1 (en) * 2009-12-31 2011-07-07 Shanghai Xin Hao Micro Electronics Co., Ltd. Branching processing method and system

Also Published As

Publication number Publication date
CN102855121B (en) 2017-04-19
CN106990942A (en) 2017-07-28
CN102855121A (en) 2013-01-02

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 12804013; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 EP: PCT application non-entry in European phase (Ref document number: 12804013; Country of ref document: EP; Kind code of ref document: A1)