WO2013000400A1

WO2013000400A1 - Branch processing method and system

Info

Publication number: WO2013000400A1
Application number: PCT/CN2012/077565
Authority: WO
Inventors: 林正浩
Original assignee: 上海芯豪微电子有限公司
Priority date: 2011-06-29
Filing date: 2012-06-26
Publication date: 2013-01-03
Also published as: CN102855121B; CN106990942A; CN102855121A

Abstract

A method is provided for controlling a pipeline operation of a processor. The processor is connected to a memory containing executable computer instructions. The method comprises determining whether an instruction to be executed by the processor is a branch instruction, and providing both an address of a branch target instruction of the branch instruction and an address of a next instruction following the branch instruction in a program sequence. The method also comprises determining a branch decision with respect to the branch instruction based on at least the address of the provided branch target instruction, and selecting at least one of the branch target instruction and the next instruction as the proper instruction to be executed by an execution unit, based on the branch decision and before the branch instruction reaches the execution stage of the pipeline, such that the pipeline operation is not stalled whether or not the branch is taken.

Description

Branch processing method and system

Technical field

The present invention relates to the field of electronic computer and microprocessor architecture, and in particular to a branch processing method and system.

Background technique

Control hazards (also known as branch hazards) are a major cause of performance loss in a pipeline. When processing a branch instruction, a conventional processor cannot know in advance where to fetch the next instruction to be executed after the branch instruction. It must wait until the branch instruction completes, which leaves empty cycles in the pipeline after the branch instruction. Figure 1 shows a conventional pipeline structure, and Table 1 shows the corresponding pipeline stages for a branch instruction.

Table 1: Pipeline stages of a branch instruction (branch taken)

Sequence	i		IF	ID	EX	MEM	WB
	i+1			IF	stall	stall	stall
	target				IF	ID	EX	MEM
	target+1					IF	ID	EX
	target+2						IF	ID

Instruction address		i	i+1	target	target+1	target+2	target+3	target+4
Instruction fetched			i		target	target+1	target+2	target+3
Clock cycle		1	2	3	4	5	6	7

Referring to Figure 1 in conjunction with Table 1, the columns of Table 1 represent clock cycles in the pipeline and the rows represent sequential instructions. When an instruction is fetched, the instruction address is provided to the instruction memory for addressing, after which the output of the instruction memory is sent to the decoder, which decodes the fetched instruction. The pipeline stages are instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write back (WB). A "stall" indicates a pipeline pause or an empty cycle.

Table 1 shows a branch instruction labeled 'i', whose address is issued in clock cycle '1'. In addition, 'i+1' denotes the instruction immediately following the branch instruction, "target" denotes the branch target instruction of the branch point, and "target+1", "target+2", "target+3" and "target+4" denote the sequential instructions immediately following the branch target instruction.

As shown in Table 1, in clock cycle '2' the processor fetches the branch instruction 'i'. In clock cycle '3', the processor fetches instruction 'i+1' and decodes the branch instruction 'i'. It is assumed that the branch target address is calculated and the branch decision is made by the end of the decode stage of the branch instruction. If the decision is that the branch is taken, the branch target address is saved as the address of the next instruction. In clock cycle '4', the branch target instruction is fetched, and it is decoded and executed in subsequent cycles. From this point on, the pipeline processes the instructions following the branch target instruction. In this case, however, instruction 'i+1', which was fetched immediately after the branch instruction, should not be executed, so the pipeline stalls on instruction 'i+1'. Thus, when the branch is taken, the pipeline suffers a one-clock-cycle stall, which significantly reduces pipeline performance.
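The timing described above can be reproduced with a small sketch. The Python fragment below is illustrative only: the names `schedule` and `wasted_slots` are hypothetical, and it assumes, as the text does, that the branch is resolved at the end of the decode stage so that exactly one wrongly fetched instruction ('i+1') is discarded when the branch is taken.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(taken, n_after=3):
    """Return {instruction: {clock cycle: stage}} for a branch 'i' fetched in cycle 2."""
    # The branch instruction advances one stage per cycle, starting in cycle 2.
    timeline = {"i": {2 + k: s for k, s in enumerate(STAGES)}}
    if taken:
        # 'i+1' was fetched in cycle 3 before the decision was known; it is squashed.
        timeline["i+1"] = {3: "IF", 4: "stall", 5: "stall", 6: "stall"}
        names = ["target"] + [f"target+{k}" for k in range(1, n_after)]
    else:
        # Branch not taken: 'i+1' simply continues through the pipeline.
        timeline["i+1"] = {3 + k: s for k, s in enumerate(STAGES)}
        names = [f"i+{k}" for k in range(2, n_after + 2)]
    # The following instructions are fetched from cycle 4 onward, one per cycle.
    for j, name in enumerate(names):
        timeline[name] = {4 + j + k: s for k, s in enumerate(STAGES)}
    return timeline

def wasted_slots(timeline):
    """Count instructions that were fetched and then squashed."""
    return sum(1 for stages in timeline.values() if "stall" in stages.values())
```

Calling `wasted_slots(schedule(True))` counts the single squashed instruction of Table 1, while `schedule(False)` contains no stall entries.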

Technical problem

To reduce the adverse effect of branch processing on pipeline performance, various static and dynamic branch prediction methods have been proposed, such as delay slots, branch prediction buffers, branch target buffers, and trace caches. However, these methods usually predict based on the previous execution results of the processor, so performance is still lost whenever a prediction is wrong.

Technical solution

The methods and systems proposed by the present invention can be used to solve one or more of the above problems, as well as other problems.

The present invention provides a method of controlling processor pipeline operations. The processor is coupled to a memory containing executable computer instructions. The method includes determining whether an instruction to be executed by the processor is a branch instruction, and providing the address of the branch target instruction of the branch instruction and the address of the next instruction following the branch instruction in the program sequence. The method further includes determining a branch decision regarding the branch instruction based on at least the address of the branch target instruction and, according to the branch decision, selecting at least one of the branch target instruction and the next instruction as the instruction to be executed by the execution unit before the branch instruction reaches its execution stage in the pipeline, so that the pipeline operation is not stalled whether or not the branch is taken.

The present invention also proposes a pipeline control system for controlling processor pipeline operations. The processor is coupled to a memory containing executable computer instructions. The system includes a review unit, an addressing unit, a branch logic unit, and a selector. The review unit is configured to determine whether an instruction to be executed by the processor is a branch instruction. The addressing unit is coupled to the processor and provides the address of the branch target instruction of the branch instruction and the address of the next instruction following the branch instruction in the program sequence. The branch logic unit is configured to determine a branch decision regarding the branch instruction based on at least the branch target instruction address provided by the addressing unit. The selector is configured to select, according to the branch decision provided by the branch logic unit, at least one of the branch target instruction and the next instruction as the instruction to be executed by the execution unit before the branch instruction reaches the execution stage in the pipeline, so that the pipeline operation is not stalled whether or not the branch is taken.

The present invention also provides a method of controlling processor pipeline operations. The processor is coupled to a memory containing executable computer instructions. The method includes determining whether an instruction to be executed by the processor is a branch instruction, and providing the address of the branch target instruction of the branch instruction and the address of the next instruction following the branch instruction in the program sequence. The method further includes fetching the branch target instruction and the next instruction according to the branch target instruction address and the next instruction address, respectively. In addition, the method includes decoding the fetched branch target instruction and next instruction, and selecting, according to the branch decision provided by the processor, between the decoding result of the branch target instruction and the decoding result of the next instruction for sending to the execution unit, so that the pipeline operation is not stalled whether or not the branch is taken.

The present invention also proposes a pipeline control system for controlling processor pipeline operations. The processor is coupled to a memory containing executable computer instructions. The pipeline control system includes an addressing unit coupled to the processor, which provides the address of the branch target instruction of a branch instruction and the address of the next instruction following the branch instruction in the program sequence. The pipeline control system also includes a read buffer coupled between the memory and the processor for storing at least one of the branch target instruction and the next instruction of the branch instruction. In addition, the read buffer includes a selector coupled to the processor, which provides either the branch target instruction or the next instruction to the processor when the branch instruction is executed, so that the pipeline operation is not stalled whether or not the branch is taken.

Other aspects of the present invention can be understood and appreciated by those skilled in the art in light of the description of the invention.

Beneficial effect

The system and method of the present invention can provide a fundamental solution for branch processing in pipelined processors. The system and method obtain the address of the branch target instruction before the branch point is executed, and use various branch decision logic to eliminate the efficiency loss caused by branch misprediction. Other advantages and benefits of the present invention can also be derived by those skilled in the art.

DRAWINGS

Figure 1 is a control structure of a conventional ordinary pipeline;

Figure 2 is an embodiment of a pipeline control structure according to the present invention;

Figure 3 is an embodiment of a processor system in accordance with the present invention;

Figure 4 is an embodiment of the track table of the present invention;

Figure 5A is an embodiment of another pipeline control structure of the present invention;

Figure 5B is an embodiment of another pipeline control structure according to the present invention;

Figure 6 is an embodiment of another processor system in accordance with the present invention;

Figure 7 is an embodiment of another processor system in accordance with the present invention;

Figure 8 is an embodiment of different command values in the operation of the present invention;

Figure 9 is an embodiment of another pipeline control structure of the present invention;

Figure 10 is an embodiment of a processor environment in accordance with the present invention;

Figure 11 is a schematic diagram of a branch prediction method according to the present invention; and

Figure 12 is an embodiment of the branch prediction described in the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Figure 3 shows a preferred embodiment of the invention.

Embodiments of the invention

Although the invention is susceptible to various modifications and substitutions, some specific embodiments are set forth and described in detail in this specification. It should be understood that the inventor's intent is not to limit the invention to the particular embodiments set forth. On the contrary, the intent is to protect all improvements, equivalent transformations, and modifications within the spirit or scope defined by the claims. The same reference numbers may be used throughout the figures to denote the same or similar parts.

Figure 2 shows an example of a pipeline control structure 1 consistent with the disclosed invention. For ease of illustration, the pipeline stages are instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write back (WB). Other pipeline structures can also be used. As shown in FIG. 2, decoder 11 fetches instructions from instruction memory (or instruction cache) 10 via instruction bus 16. The decoder 11 decodes the fetched instructions and prepares operands for subsequent operations. The decoded instructions and operands are sent to the execution and program counter unit 12 (EX/PC), which executes them and calculates the address 21 of the next instruction in the program sequence. The address 21 of the next instruction is one input of the selector 20.

Meanwhile, if a fetched instruction is a branch point, the instruction address of the branch target is pre-calculated before the program counter reaches the branch point, as described in detail in subsequent paragraphs. The pre-calculated branch target instruction address is the other input 18 of the selector 20. In addition, branch decision logic 13 provides a branch control signal 14 for controlling the selector 20. The branch control signal 14 can be generated based on the branch type and the branch condition (or a condition flag). The branch control signal 14 determines which input of the selector 20 is output to the register 17 and the address bus 19. The output on bus 19 is then used to fetch the next instruction from instruction memory 10.
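The role of the selector 20 can be summarized in a few lines of code. The following Python sketch is a behavioral illustration only, not the circuit of FIG. 2: the class name `FetchAddressSelect` and its method are hypothetical, and the branch control signal 14 is reduced to a single boolean.

```python
from dataclasses import dataclass

@dataclass
class FetchAddressSelect:
    """Behavioral model of selector 20 together with register 17."""
    register_17: int = 0  # latched copy of the most recently selected address

    def step(self, next_addr_21: int, target_addr_18: int, control_14: bool) -> int:
        # control_14 True: use the pre-calculated branch target address (input 18).
        # control_14 False: use the next sequential address (input 21).
        selected = target_addr_18 if control_14 else next_addr_21
        self.register_17 = selected
        return selected  # value driven onto address bus 19
```

Because both candidate addresses are already present at the selector inputs, the choice costs only one selection step regardless of the branch outcome.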

FIG. 3 shows a processor environment 300 corresponding to the pipeline control structure 1 of the present invention. As shown in FIG. 3, processor environment 300 includes a low level memory 122, a high level memory 124, and a processor core 125. In addition, processor environment 300 includes a fill/generator 123, an active table 121, a track table 126, a tracker 170, and branch decision logic 210 (corresponding to branch decision logic 13 in FIG. 2). It should be understood that the components are listed here for ease of description. Other components may be included, and some components may be omitted. The components may be distributed across multiple systems, may be physical or virtual, and may be implemented in hardware (e.g., integrated circuits), in software, or in a combination of hardware and software.

High level memory 124 and low level memory 122 may comprise any suitable storage device, such as static memory (SRAM), dynamic memory (DRAM), and flash memory. Here, the level of a memory refers to how close the memory is to the processor core: the closer to the processor core, the higher the level. In addition, a high level memory is typically faster than a low level memory but has a smaller capacity. The high level memory 124 can operate as the cache of the system, or as a level 1 cache when other caches are present, and can be partitioned into a plurality of storage segments called blocks (e.g., memory blocks) for storing the data to be accessed by processor core 125 (i.e., the instructions and data in instruction blocks and data blocks).

Processor core 125 can be any suitable processor that is capable of pipelined operation and of working with a cache system. Processor core 125 may use separate instruction and data caches, and may include some instructions for cache operations. When processor core 125 executes an instruction, it first needs to read the instruction and/or data from memory. Active table 121, track table 126, tracker 170, and fill/generator 123 are used to fill the instructions to be executed by processor core 125 into high level memory 124, so that processor core 125 can read the required instructions from high level memory 124 with a very low cache miss rate. In the present embodiment, the term "fill" means moving data/instructions from a lower level memory to a higher level memory, and the term "memory access" means that processor core 125 reads from or writes to the closest memory (i.e., high level memory 124 or the level 1 cache).

In addition, the fill/generator 123 can fetch instructions or instruction blocks according to appropriate addresses, and can review each instruction that is fetched from the low level memory 122 to be filled into the high level memory 124 and extract certain information from it, such as the instruction type, the instruction address, and the branch target information of branch instructions. The instruction and the extracted information, including the branch target information, are used to calculate addresses, which are sent to other modules such as the active table 121 and the track table 126. In this embodiment, a branch instruction or branch point refers to any suitable form of instruction that causes processor core 125 to change the execution flow (e.g., to execute an instruction that is not next in sequence). If the instruction block corresponding to the branch target information has not yet been filled into the high level memory 124, the corresponding track is established while the instruction block is filled into the high level memory 124. The tracks in the track table 126 correspond one-to-one with the memory blocks in the high level memory 124, and a track and its memory block are both pointed to by the same pointer 152. Any instruction that processor core 125 is to execute can be filled into high level memory 124 before it is executed.

The fill/generator 123 may determine information such as the instruction type, the branch source address, and the branch target address based on the instruction and the branch target information. For example, the instruction types may include conditional branch instructions, unconditional branch instructions, and other instructions. The instruction types may also include subcategories of conditional branch instructions, such as branch when equal, branch when greater than, and so on. In some cases, an unconditional branch instruction can be considered a special case of a conditional branch instruction in which the condition is always true. The instruction types can therefore be divided into branch instructions and other instructions. The branch source address refers to the address of the branch instruction itself, and the branch target address refers to the address to which control is transferred when the branch is taken. Other information can also be included.

Additionally, a track table can be created based on the pre-computed information to provide addresses for filling the high level memory 124. FIG. 4 shows an example of track table operation as disclosed herein. As shown in FIG. 4, track table 126 interacts with tracker 170 to provide the addresses required for cache and branch processing.

The track table 126 can include tracks for the instructions executed by the processor core 125. The tracker 170 provides different addresses based on the track table 126 and provides a read pointer for the track table 126. A track, as used here, is a representation of a series of instructions (such as an instruction block) to be executed. This representation can use any suitable data type, such as an address, a block number, or another number. In addition, a new track may be created when a track contains a branch point whose branch target changes the program flow, or when an instruction is followed by an instruction in a different instruction block, such as an instruction in the next instruction block, an exception handler, or another program thread.

The track table 126 can include a plurality of tracks, where each track corresponds to a row of the track table 126 identified by a row number or block number (BN), and the block number points to the corresponding memory block. A track may include a plurality of track points, and a track point may correspond to one or more instructions. Further, since one track corresponds to one row of the track table 126, one track point corresponds to one entry (for example, one storage unit) of that row. Thus, the total number of track points in a track can equal the total number of entries in a row of track table 126. Other ways of organizing the table can also be used.

A track point (ie, an item in a table entry) can contain information about an instruction in the track, such as a branch instruction. Thus, the content of a track point can contain information about the class of the corresponding instruction and the branch target. By examining the contents of a track point, a branch target point can be determined based on the branch target address therein.

For example, as shown in FIG. 4, processor core 125 can read an instruction using an instruction address of (M+Z) bits, where M and Z are integers. The M-bit portion of the address is referred to as the upper (high-order) address, and the Z-bit portion is referred to as the offset address. The track table 126 may contain 2^M rows, i.e., a total of 2^M tracks, and the upper address may be used to address the rows of the track table 126. Each row may contain 2^Z track entries, i.e., a total of 2^Z track points, and the offset address can be used to address within the corresponding row to determine a particular track point.
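As an illustration of this addressing scheme, the sketch below splits an (M+Z)-bit instruction address into its upper address and offset address and joins them again. The function names `split_address` and `join_address` are hypothetical and the bit widths in the usage note are arbitrary. The sketch only restates the arithmetic implied by the text.

```python
from typing import Tuple

def split_address(addr: int, m_bits: int, z_bits: int) -> Tuple[int, int]:
    """Split an (M+Z)-bit address into (upper address, offset address)."""
    if not 0 <= addr < (1 << (m_bits + z_bits)):
        raise ValueError("address does not fit in M+Z bits")
    upper = addr >> z_bits                # selects one of 2^M rows (tracks)
    offset = addr & ((1 << z_bits) - 1)   # selects one of 2^Z entries in the row
    return upper, offset

def join_address(upper: int, offset: int, z_bits: int) -> int:
    """Inverse of split_address."""
    return (upper << z_bits) | offset
```

With M = 4 and Z = 3, for instance, address 93 (binary 1011101) splits into upper address 11 and offset 5.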

In addition, the content format of each entry or track point in a row may include a category portion 57, an XADDR portion 58, and a YADDR portion 59. Other portions can also be included. The category portion 57 represents the category of the instruction corresponding to the track point. As previously mentioned, the instruction categories may include conditional branch instructions, unconditional branch instructions, and other instructions, and may also include subcategories of conditional branch instructions, such as branch when equal, branch when greater than, and so on. The XADDR portion 58 may contain an M-bit address, also referred to as the first-dimension address or simply the first address. The YADDR portion 59 may contain a Z-bit address, also referred to as the second-dimension address or simply the second address.

When a new track containing a branch point (a branch track point) is created, the new track can be built in an available row of the track table 126, and the branch track point can be established in an available entry of that row. The positions of the row and the entry are determined by the source address of the branch point (i.e., the branch source address). For example, the row number or block number may be determined according to the upper address of the branch source address, and the entry according to the offset address of the branch source address.

In addition, the content of the new track point can correspond to the branch target instruction. In other words, the content of the branch track point stores the branch target address information. For example, the row number or block number of the row in the track table 126 that corresponds to the branch target instruction is stored as the first address in the content of the branch track point. Further, the offset address, which indicates the position of the branch target instruction within its track, is stored as the second address in the content of the branch track point. Thus, in the content of the branch track point, the first address serves as the row address and the second address serves as the column address for addressing the branch target track point within that row.
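A minimal data-structure sketch may make the entry layout concrete. In the Python fragment below, the names `TrackPoint`, `TrackTable`, and `write_branch_point` are hypothetical. As a simplifying assumption, the upper address is used directly as the row number, whereas the text also allows a block number to be assigned to any available row.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TrackPoint:
    category: str  # category portion 57, e.g. "conditional" or "unconditional"
    xaddr: int     # XADDR portion 58: row (block number) of the branch target
    yaddr: int     # YADDR portion 59: offset of the branch target in its track

class TrackTable:
    def __init__(self, m_bits: int, z_bits: int) -> None:
        self.z_bits = z_bits
        # 2^M rows (tracks), each with 2^Z entries (track points).
        self.rows: List[List[Optional[TrackPoint]]] = [
            [None] * (1 << z_bits) for _ in range(1 << m_bits)
        ]

    def write_branch_point(self, source_addr: int, target_addr: int,
                           category: str = "conditional") -> Tuple[int, int]:
        """Store the branch target in the entry located by the branch source address."""
        mask = (1 << self.z_bits) - 1
        row, col = source_addr >> self.z_bits, source_addr & mask
        self.rows[row][col] = TrackPoint(category,
                                         target_addr >> self.z_bits,
                                         target_addr & mask)
        return row, col
```

The position of the entry is derived from the source address, while the stored content is derived from the target address, which is the asymmetry the paragraph above describes.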

Instruction memory 46 may be the part of high level memory 124 used for instruction access and may be composed of any suitable high performance memory. Instruction memory 46 may contain 2^M memory blocks, each of which contains 2^Z bytes or words. That is, the instruction memory 46 can store all instructions addressable by the M and Z bits (i.e., the instruction address), such that the M bits address a particular memory block and the Z bits address a particular byte or word within that memory block.

The tracker 170 can be comprised of various components or devices, such as registers, selectors, stacks, and/or other memory modules for determining the next track to be executed by the processor core 125. The tracker 170 can determine the next track based on information such as the current track in the track table 126, track point information, and whether branching has occurred due to execution of the processor core 125.

For example, during operation, when processor core 125 executes a branch instruction, the (M+Z) bit instruction address of the branch instruction is passed on bus 55. The M-bit address is sent to the track table 126 as a first address or XADDR (or X address) via the bus 56, and the Z-bit address is sent to the track table 126 as a second address or YADDR (or Y address) via the bus 53. Based on the first address and the second address, the track table 126 can find a branch instruction entry and output the branch target address of the branch instruction to the bus 51.

If the branch condition of the branch instruction is not satisfied, the branch is not taken, and the selector 49 selects the YADDR on bus 53 incremented by one (1) byte or word as the new second address 54. The first address remains unchanged, and the new address is output on bus 52. According to the control signal 60 from the processor core 125 (e.g., indicating a branch that is not taken), the register 50 keeps the first address unchanged, and the second address is repeatedly incremented by one (1) by the increment-by-one logic 48 until it points to the next branch instruction in the current track table row.

On the other hand, if the branch condition of the branch instruction is satisfied, the branch is taken, and the selector 49 selects the branch target address on bus 51, which is stored in the content of the track entry corresponding to the branch point, as the output sent to bus 52. Based on control signal 60 from processor core 125 (e.g., indicating a taken branch), register 50 latches the changed first address corresponding to the new track and provides a new (M+Z)-bit address on bus 55.
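The two cases handled by the selector 49 can be sketched as a single function. This is an illustrative model only: the name `next_address` and the row representation (a list whose elements are either `None` for non-branch entries or a `(XADDR, YADDR)` target pair) are assumptions. The behavior at the end of a row is not described in this passage, so the sketch simply clamps to the last entry as a placeholder.

```python
from typing import List, Optional, Tuple

Entry = Optional[Tuple[int, int]]  # branch target (XADDR, YADDR), or None if not a branch

def next_address(row: List[Entry], xaddr: int, yaddr: int, taken: bool) -> Tuple[int, int]:
    """Return the next (first address, second address) after the entry at yaddr."""
    entry = row[yaddr]
    if entry is not None and taken:
        # Branch taken: selector 49 passes the stored target from bus 51.
        return entry
    # Branch not taken: keep the first address and increment the second address
    # until it points to the next branch entry in the same row.
    y = yaddr + 1
    while y < len(row) - 1 and row[y] is None:
        y += 1
    return xaddr, min(y, len(row) - 1)  # clamp: end-of-row handling is an assumption
```

For a row with branch entries at offsets 1 and 4, a not-taken branch at offset 1 moves the second address to 4 while the first address stays the same.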

Thus, for memory addressing, track table 126 and tracker 170 provide a block address, while processor core 125 provides only one offset. The processor core 125 feeds back the branch instruction execution state so that the tracker 170 can perform the decision operation.

The instruction block corresponding to a track is filled into the instruction memory 46 before the new track is executed. By repeating this process, cache misses can be avoided for all instructions that processor core 125 will execute.

Returning to Figure 3, in order to increase efficiency and reduce memory capacity, the active table 121 can be used to store information on any established track and to establish a mapping between an address (or a portion of an address) and a block number, so that a track can be established in any available row of track table 126. For example, when a track is established, the branch target address information of all branch points in the track is stored in the active table 121. Thus, the active table 121 can store the mapping information for the tracks of all branch target track points in the program. Other configurations can also be used.

Thus, the active table 121 can be used to store the block number of the instruction block in the high level memory 124. The block number also corresponds to the line number in the track table 126. During the review process, the block number of the branch target address can be obtained by matching the address with the entry in the active table 121. The result of the successful matching, i.e. the block number (the aforementioned first address), can be used together with the offset of the instruction in the track (the aforementioned second address) to determine the position of the track point.

If the match is unsuccessful, the track corresponding to the address has not yet been established. In that case, the active table 121 assigns a block number, the instruction segment corresponding to the address is filled into the position in the high level memory 124 indexed by that block number, and a new track corresponding to the block number is established in the track table 126, so that the active table 121 represents the established track and its associated address. Therefore, through the operation of the active table 121 and the fill/generator 123, the instruction segment corresponding to the branch target instruction of a branch point can be filled into the cache (i.e., the high level memory 124) before the branch point is fetched and executed by the processor core 125.
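The match-or-allocate behavior of the active table can be sketched as follows. The class name `ActiveTable` and the method `lookup_or_allocate` are hypothetical, block numbers are assigned in increasing order purely for simplicity, and block replacement when the table is full is not modeled because it is not described here.

```python
from typing import Dict, Tuple

class ActiveTable:
    """Maps an upper address to a block number (BNX) in the high level memory."""

    def __init__(self, n_blocks: int) -> None:
        self.n_blocks = n_blocks
        self.block_of: Dict[int, int] = {}

    def lookup_or_allocate(self, upper_addr: int) -> Tuple[int, bool]:
        """Return (block number, hit). On a miss a new block number is assigned."""
        if upper_addr in self.block_of:
            return self.block_of[upper_addr], True   # track already established
        if len(self.block_of) >= self.n_blocks:
            raise RuntimeError("no free block; replacement is not modeled")
        bnx = len(self.block_of)                     # assumed allocation order
        self.block_of[upper_addr] = bnx
        return bnx, False                            # caller fills block, builds track
```

On a miss, the caller would fill the instruction block into the position indexed by the returned block number and create the matching track, so that a later lookup of the same upper address hits.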

Thus, the track table 126 can be configured as a two-dimensional table in which each row is indexed by the first address BNX and corresponds to one memory block or memory row, and each column is indexed by the second address BNY and corresponds to the offset of the corresponding instruction (data) within the memory block. Put simply, the write address of the track table corresponds to the source address of the instruction. In addition, for a particular branch source address, the active table 121 assigns a BNX based on the upper address, and BNY is equal to the offset. BNX and BNY then form a write address that points to the entry to be written.

Furthermore, when instructions are filled into the high level memory 124, the branch target address of every branch instruction can be obtained by adding the branch offset to the address of the branch instruction. The branch target address (upper address, offset) is sent to the active table 121, where the upper address portion is matched, and the active table 121 can assign a BNX. The assigned BNX, together with the instruction type and the offset (BNY) from the generator 130, constitutes the content of the track entry of each branch instruction. The content is stored in the branch point addressed by the corresponding write address.
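The computation of a track entry's content can be illustrated with a short sketch. The function name `branch_entry_content` is hypothetical, and a plain dictionary stands in for the active table as a simplifying assumption. The only arithmetic taken from the text is that the branch target address is the sum of the branch instruction address and the branch offset.

```python
from typing import Dict, Tuple

def branch_entry_content(branch_addr: int, branch_offset: int, z_bits: int,
                         bnx_of: Dict[int, int], instr_type: str) -> Tuple[str, int, int]:
    """Return (instruction type, BNX, BNY) for the track entry of a branch instruction."""
    target = branch_addr + branch_offset          # branch target address
    upper = target >> z_bits                      # portion matched in the active table
    bny = target & ((1 << z_bits) - 1)            # offset within the target block
    # Dictionary stand-in for the active table: reuse the BNX or assign a new one.
    bnx = bnx_of.setdefault(upper, len(bnx_of))
    return instr_type, bnx, bny
```

Two branches whose targets lie in the same block receive the same BNX and differ only in BNY.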

Additionally, tracker 170 can be used to provide a read pointer 151 to track table 126. The read pointer 151 can also be in the form of BNX and BNY. The contents of the track entry pointed to by the read pointer are read along with the BNX and BNY (source BNX and source BNY) of the entry and are checked by the tracker 170. The tracker 170 can perform a plurality of different read pointer update operations based on the content. For example, if the entry is not a branch point, the tracker 170 can update the read pointer with the new BNX=source BNX, new BNY=source BNY+1.

If the entry is a conditional branch, the tracker 170 waits for a control signal (TAKEN) from processor core 125, which is generated when the branch instruction at the branch point is executed. If the control signal indicates that the branch is not taken, the tracker 170 updates the read pointer with new BNX = source BNX and new BNY = source BNY + 1. If the branch is taken, the tracker 170 updates the read pointer with new BNX = target BNX and new BNY = target BNY.

If the entry is an unconditional branch (or jump), the tracker 170 can treat it as a conditional branch whose condition is always satisfied. That is, when the branch instruction is executed, the read pointer is updated with new BNX = target BNX and new BNY = target BNY.
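The three read pointer update rules can be collected into one function. This is a sketch under stated assumptions: the name `update_read_pointer`, the entry kinds `"other"`, `"conditional"`, and `"unconditional"`, and the use of `None` to model a pending TAKEN signal are all illustrative choices, not part of the disclosure.

```python
from typing import Optional, Tuple

def update_read_pointer(source: Tuple[int, int],
                        entry: Tuple[str, int, int],
                        taken: Optional[bool] = None) -> Optional[Tuple[int, int]]:
    """Return the new (BNX, BNY), or None while a conditional branch awaits TAKEN."""
    src_bnx, src_bny = source
    kind, tgt_bnx, tgt_bny = entry
    if kind == "other":
        return src_bnx, src_bny + 1        # not a branch point: advance along the track
    if kind == "unconditional":
        return tgt_bnx, tgt_bny            # treated as a branch that is always taken
    if taken is None:
        return None                        # conditional branch: wait for TAKEN
    return (tgt_bnx, tgt_bny) if taken else (src_bnx, src_bny + 1)
```

For example, a conditional entry at (2, 6) with target (9, 1) yields (9, 1) when taken and (2, 7) otherwise.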

The tracker 170, together with the track table 126 and the active table 121, implements track-based operation. Thus, the branch information, the branch target instruction, and the address information of the instruction following the branch instruction can be determined in advance. This information can be used by the pipeline control structure 1 to perform branch processing without stalling the pipeline.

Specifically, as shown in FIG. 3, when the read pointer 151 reaches a branch point, the tracker 170 receives the branch target address from the track table 126 via bus 150. The upper address portion of the branch target address (target BNX) is one input of a selector, whose other input is the current BNX (the upper address portion of BN 151, i.e., the source BNX). The output of this selector is the next BNX. Similarly, the offset portion of the branch target address (target BNY) is one input of another selector, whose other input is the PC offset 155 from processor core 125. The output of this selector serves as the "offset 1" address of high level memory 124, which addresses the instructions within the cache block determined by BNX 152.

Read pointer 151 (BNX 152, BNY 153) moves faster than the PC (e.g., the tracker 170 operates at a higher clock frequency). The read pointer 151 moves along the track. When the content read from an entry of the track table 126 indicates that the entry is a branch instruction with a branch target address (BNX and BNY), the read pointer 151 stops moving and waits for processor core 125 to execute the branch point, that is, for the 'TAKEN' signal 212 and the 'BRANCH/JUMP' signal 213 from branch decision logic 210. Processor core 125 provides PC offset 155 to address instructions in high level memory 124, while tracker 170 provides BNY 153 to address the branch point in the track table 126. These two values are also sent to the branch decision logic 210 for comparison. If PC offset 155 and BNY 153 are equal, processor core 125 is fetching the branch point. In other words, the match between BNY 153 and PC offset 155 can be used to control the timing of branch processing, such that branch decision logic 210 makes the branch decision when PC offset 155 equals BNY 153. Alternatively, branch processing may be started when PC offset 155 differs from BNY 153 by a predetermined number of instructions.

When PC offset 155 equals BNY 153, or is a preset number of instructions away from it, the processor core 125 fetches the branch point. The branch decision logic 210 can then determine whether the branch is taken. In some cases, the branch decision can be made based on the branch type and the branch condition (or condition flag). The branch type 211 (from the track table 126) may represent a particular type of branch instruction, such as a branch taken when the branch condition equals zero or a branch taken when the branch condition is greater than zero. The branch condition can be generated by the operations of processor core 125. Depending on the processor architecture, the branch instruction, and/or the pipeline operation, the branch condition for a particular branch instruction may be valid in multiple pipeline segments of processor core 125.
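As an illustration of a decision based on branch type and branch condition, the following sketch models the two example branch types named above plus an unconditional branch, together with the timing comparison between PC offset and BNY. The type names, the function names, and the `lead` parameter (the predetermined number of instructions by which processing may start ahead of the branch point) are assumptions introduced here, not terms of the embodiment.

```python
def decision_due(pc_offset: int, bny: int, lead: int = 0) -> bool:
    """True when the PC offset is exactly `lead` instructions before the
    branch point's BNY (lead=0 means the core is fetching the branch point)."""
    return bny - pc_offset == lead


def branch_taken(branch_type: str, condition_value: int) -> bool:
    """Evaluate the branch decision from the branch type and the condition value."""
    if branch_type == "branch_if_zero":
        return condition_value == 0
    if branch_type == "branch_if_positive":
        return condition_value > 0
    if branch_type == "unconditional":
        return True
    raise ValueError("unknown branch type: " + branch_type)
```

With `lead=0` the decision is made when the two offsets match; a positive `lead` models the alternative in which branch processing starts a fixed number of instructions earlier.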

Branch decision logic 210 may include any suitable circuitry for making branch decisions. As previously described, the branch decision logic 210 can make the branch decision when PC offset 155 equals BNY 153, or when PC offset 155 and BNY 153 satisfy a given relationship (e.g., greater than); the branch decision may also take into account a signal indicating that the condition flag is ready. Thereafter, the result of the branch decision logic 210 is output as the 'TAKEN' signal 212 and the 'BRANCH/JUMP' signal 213. The 'BRANCH/JUMP' signal informs the tracker 170 that the processor core 125 has reached the branch instruction and enables the read pointer 151 to be updated. The 'TAKEN' signal reflects the actual outcome of the program being executed and selects the correct next instruction to be executed.

Thus, when the 'BRANCH/JUMP' signal is detected and the branch is not taken, next BNX = source BNX and next BNY = source BNY+1. The unchanged BNX 152 (source BNX) is selected and sent to "Block Select 1", and the address offset of the next instruction from the processor core 125 (PC offset 155) is selected and sent to "Offset 1", so as to address the instruction following the branch instruction. However, if the branch is taken, next BNX = target BNX and next BNY = target BNY. The changed BNX 152 (target BNX) is selected and sent to "Block Select 1", and the offset of the branch target instruction from the track table 126 (target BNY) is selected and sent to "Offset 1", so as to address the branch target instruction of the branch instruction. Thus, based on the branch type information from the track table 126 and the branch condition flag from the processor core 125, the track table 126 provides the address information of the branch target instruction in advance, the PC provides the address information of the instruction following the branch instruction, and the branch decision logic 210 determines whether the branch is taken.
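The two selectors that drive "Block Select 1" and "Offset 1" can be sketched as a pair of two-way multiplexers controlled by the branch outcome. The function name and argument names below are hypothetical; only the selection rule is taken from the description.

```python
from typing import Tuple


def select_fetch_address(taken: bool,
                         source_bnx: int, pc_offset: int,
                         target_bnx: int, target_bny: int) -> Tuple[int, int]:
    """Return the (Block Select 1, Offset 1) pair presented to the memory."""
    # First selector: target BNX if the branch is taken, else the unchanged source BNX.
    block_select_1 = target_bnx if taken else source_bnx
    # Second selector: target BNY if the branch is taken, else the PC offset
    # of the instruction that follows the branch instruction.
    offset_1 = target_bny if taken else pc_offset
    return block_select_1, offset_1
```

Because both candidate addresses are available before the decision, the selection itself is the only step that depends on the branch outcome.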

Therefore, if the branch is taken, the correct address for processor core 125 to fetch the branch target instruction (target BNX 152, target BNY 150) is already available at ports "Block Select 1" and "Offset 1" of the high level memory 124. In this way, processor core 125 can continue pipeline operations without waiting. Table 2 shows the pipeline segment diagram when the branch is taken. In Table 2, the row labeled "Instruction address" is the instruction memory address presented at "Block Select 1" (upper address) and "Offset 1" (lower address) of high level memory 124, and the row labeled "Get instruction" corresponds to the instruction on "Read Port 1" of the high level memory 124. It is assumed here that there is a delay of one clock cycle from the instruction address becoming valid to the instruction becoming valid. Further, instruction 'i' is a branch instruction, 'target' is the branch target instruction, 'target+1' is the instruction following the branch target instruction, and so on.

Table 2 Pipeline segment diagram (when branching occurs)

Instruction	i		IF	ID	EX	MEM	WB
	target			IF	ID	EX	MEM	WB
	target+1				IF	ID	EX	MEM
	target+2					IF	ID	EX
	target+3						IF	ID
	target+4							IF
Instruction address		i	target	target+1	target+2	target+3	target+4
Get instruction			i	target	target+1	target+2	target+3
Clock cycle		1	2	3	4	5	6	7
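The stage occupancy in Table 2 follows a simple rule: each successive instruction enters IF one clock cycle after its predecessor, with no empty slot after the branch instruction. The sketch below regenerates that occupancy for a 5-segment pipeline. The function name and its parameters are hypothetical, and the first IF is placed in clock cycle 2 to match the one-cycle address-to-instruction delay assumed for the table.

```python
from typing import Dict, List

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(instructions: List[str],
                      first_if_cycle: int = 2,
                      last_cycle: int = 7) -> Dict[str, Dict[int, str]]:
    """Map each instruction to {clock cycle: pipeline segment} with no stalls."""
    schedule = {}
    for k, name in enumerate(instructions):
        row = {}
        for s, stage in enumerate(STAGES):
            cycle = first_if_cycle + k + s  # one new instruction per cycle
            if cycle <= last_cycle:
                row[cycle] = stage
        schedule[name] = row
    return schedule


if __name__ == "__main__":
    names = ["i", "target", "target+1", "target+2", "target+3", "target+4"]
    for name, row in pipeline_schedule(names).items():
        cells = [row.get(c, "") for c in range(1, 8)]
        print(name.ljust(10), "\t".join(cells))
```

Passing the list 'i', 'i+1', 'i+2', ... instead gives the not-taken case; the occupancy pattern is identical, which is the point of the scheme.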

On the other hand, if the branch is not taken, the correct address for processor core 125 to fetch the instruction immediately following the branch instruction (source BNX 152, PC offset 155) is likewise already available at ports "Block Select 1" and "Offset 1" of high level memory 124. Thus, the processor core 125 can continue the pipeline operation without waiting. Further, under the control of the control signals, the tracker 170 can use the read pointer to reach the next branch point and continue branch processing as described earlier. Table 3 shows the pipeline segment diagram when the branch is not taken. Instruction 'i' is a branch instruction, 'i+1' is the instruction following the branch instruction, and so on.

Table 3 Pipeline segment diagram (when branching does not occur)

Instruction	i		IF	ID	EX	MEM	WB
	i+1			IF	ID	EX	MEM	WB
	i+2				IF	ID	EX	MEM
	i+3					IF	ID	EX
	i+4						IF	ID

Instruction address		i	i+1	i+2	i+3	i+4
Get instruction			i	i+1	i+2	i+3	i+4
Clock cycle		1	2	3	4	5	6	7

Figure 5A shows another pipeline control structure 2 of the present invention. As shown in FIG. 5A, the decoder 11 decodes the fetched instructions and provides the operands required for execution. The resulting instruction decode result and operands are sent to the execution unit and program counter (EX/PC), which executes the instruction and calculates the next instruction address 21 in the program stream. However, unlike the pipeline control structure 1 described in FIG. 2, the next instruction address 21 and the branch target instruction address 18 are sent to the instruction memory (or instruction cache) 22 through the registers 24 and 23, respectively. Instruction memory 22 may contain multiple ports for read/write operations.

Thus, the instruction memory 22 can include two address ports for receiving the next instruction address 21 and the branch target instruction address 18. Upon receiving the next instruction address 21 and the branch target instruction address 18, the instruction memory 22 can provide the respective instructions on the output ports 28 and 29, respectively. Further, the two instructions on the output ports 28 and 29, corresponding to the next instruction address 21 and the branch target instruction address 18, are input to the selector 26, and the branch decision logic 13 provides a control signal 14 to the selector 26 to select between them. The selected input from ports 28 and 29 is sent to decoder 11.

If the branch decision logic 13 determines that the branch is taken, the instruction 29 corresponding to the branch target instruction address 18 is output to the decoder 11. If the branch decision logic 13 determines that the branch is not taken, the instruction 28 corresponding to the next instruction address 21 is output to the decoder 11. Furthermore, since the branch decision logic 13 makes this determination before the branch point reaches its execution segment, or before the instruction is decoded, no pipeline clock cycles are lost waiting for the branch decision.

FIG. 6 shows an embodiment of a processor environment 400 corresponding to the pipeline control structure 2. As shown in FIG. 6, processor environment 400 is similar to processor environment 300 in FIG. 3. However, processor environment 400 differs from processor environment 300 in that the branch decision logic is included in processor core 125, and high level memory 124 provides two address ports, "Block Select 1, Offset 1" and "Block Select 2, Offset 2", and two read ports, "Read Port 1" 127 and "Read Port 2" 128.

As shown in FIG. 6, when processing a branch instruction, the track table 126 can provide the branch target instruction address, target BNX 201 and target BNY 202, to the address port "Block Select 2, Offset 2". Further, the read pointer 151 supplies the block address BNX 152 of the next instruction to "Block Select 1", and the processor core 125 provides the offset address of the next instruction to "Offset 1".

Upon receiving the branch target instruction address and the next instruction address, the high level memory 124 fetches the branch target instruction and the next instruction, and sends them as fetched instruction 204 and fetched instruction 203 to "Read Port 2" 128 and "Read Port 1" 127, respectively. The fetched instruction 204 and the fetched instruction 203 are also the two inputs of the selector 205, which is controlled by the control signal 207 (i.e., the TAKEN signal from the processor core 125). Based on the TAKEN signal, selector 205 selects the correct one of the fetched instructions as output 206 to processor core 125 before processor core 125 decodes the fetched instruction. If the branch is taken, the fetched branch target instruction is selected; if the branch is not taken, the fetched next instruction is selected.
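The dual-port fetch followed by selection can be sketched as follows. The memory is modeled as a Python dictionary keyed by (block number, offset) pairs; this model, the function name, and the sample instruction strings are assumptions made purely for illustration.

```python
from typing import Dict, Tuple

Address = Tuple[int, int]  # (block number, offset), a hypothetical address form


def fetch_and_select(memory: Dict[Address, str],
                     next_address: Address,
                     target_address: Address,
                     taken: bool) -> str:
    """Fetch both candidates in the same cycle, then pick one with TAKEN."""
    read_port_1 = memory[next_address]    # instruction following the branch
    read_port_2 = memory[target_address]  # branch target instruction
    # The selector forwards exactly one of the two to the processor core
    # before the decode segment, so no fetch slot is wasted.
    return read_port_2 if taken else read_port_1


if __name__ == "__main__":
    mem = {(3, 6): "add r1, r2, r3", (9, 1): "sub r4, r5, r6"}
    print(fetch_and_select(mem, (3, 6), (9, 1), taken=True))
```

Both reads are issued unconditionally; only the final selection depends on the branch outcome.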

The processor core 125 also provides a BRANCH/JUMP signal to the tracker 170 to indicate that the processor core 125 has reached a branch instruction; at this time the TAKEN signal reflects the actual outcome of program execution and selects the correct next instruction to be executed. Thus, when the BRANCH/JUMP signal is detected, the tracker 170 uses the new address as BN 151.

If the branch is taken, the fetched instruction 204 corresponding to the branch target instruction (target BNX 201, target BNY 202) has been sent to the processor core 125 as output 206. In this way, processor core 125 can continue pipeline operations without interruption. Of course, if the branch is unconditional, the unconditional branch instruction can be treated as a special branch point whose condition is always satisfied and which requires no further decision. Table 4 shows the pipeline segment diagram when the branch is taken. In Table 4, the row labeled "Instruction address" is the instruction memory address presented at "Block Select 1" (upper address) and "Offset 1" (lower address) of high level memory 124, while the row labeled "Get instruction" corresponds to the instruction on the output 206 of the selector 205.

Table 4 Pipeline segment diagram (when branching occurs)

Instruction	i		IF	ID	EX	MEM	WB
	target			IF	ID	EX	MEM	WB
	target+1				IF	ID	EX	MEM
	target+2					IF	ID	EX
	target+3						IF	ID

Instruction address		i	i+1	target+1	target+2	target+3	target+4
Read port 1			i	i+1	target+1	target+2	target+3	target+4
Read port 2	target	target	target	target	new target	new target	new target	new target
Get instruction				target	target+1	target+2	target+3	target+4
Clock cycle		1	2	3	4	5	6	7

In the decode segment of the branch instruction (clock cycle 3), the branch target instruction ("target") is fetched from the high level memory 124 along with the next instruction ("i+1"), and the branch decision is made before the end of the decode segment. Since both instructions are fetched, the correct instruction can be selected and used in its decode segment (clock cycle 4), regardless of whether the branch is taken. This means that the instruction fetched after the branch point is always a valid instruction and there is no need to stall the pipeline. In addition, as shown in Table 4, "Read Port 2" provides the next branch target instruction in advance.

When the branch is taken, the branch target instruction from "Read Port 2" is selected in clock cycle 3 as the instruction that enters the decode segment in clock cycle 4. Also, at the end of clock cycle 3, the program counter (PC) of processor core 125 is forced to the instruction following the branch target instruction (target+1) instead of the branch target instruction (target). The source BNX 152 output by tracker 170 drives "Block Select 1" in the normal manner: when the branch is taken, the tracker 170 loads the next BN, containing the branch target address information, into BN 151, so that source BNX 152 = target BNX. This ensures that the "target+1" instruction, rather than the "target" instruction, is fetched during clock cycle 4. In this way, the program flow switches to the branch target without any pipeline stall. Thereafter, the instruction address is incremented in the normal way until the next branch point is reached.

On the other hand, if the branch is not taken, the fetched instruction 203 corresponding to the next instruction (source BNX 152, PC offset 155) is sent to processor core 125 as output 206. Thus, processor core 125 continues the pipeline operation without stalling. Table 5 shows the pipeline segment diagram when the branch is not taken.

Table 5 Pipeline segment diagram (when branching does not occur)

Instruction	i		IF	ID	EX	MEM	WB
	i+1			IF	ID	EX	MEM	WB
	i+2				IF	ID	EX	MEM
	i+3					IF	ID	EX
	i+4						IF	ID

Instruction address		i	i+1	i+2	i+3	i+4	i+5
Read port 1			i	i+1	i+2	i+3	i+4	i+5
Read port 2	target	target	target	target	new target	new target	new target	new target
Get instruction			i	i+1	i+2	i+3	i+4	i+5
Clock cycle		1	2	3	4	5	6	7

Thus, when the branch transfer does not occur, the instruction "i+1" following the branch instruction from "Read Port 1" is selected in clock cycle 3 as the instruction to enter the decode segment at clock cycle 4. From this point on, the instruction address is incremented in the normal way until the next branch point is reached.
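The "Instruction address" rows of Tables 4 and 5 differ only in what follows 'i+1': on a taken branch the address stream continues at target+1 (the target itself having already been supplied through "Read Port 2"), otherwise it continues at i+2. The sketch below reproduces both rows using plain integers for the addresses; the function name and the integer encoding are assumptions made for illustration.

```python
from typing import List


def port1_address_stream(i: int, target: int, taken: bool, cycles: int) -> List[int]:
    """Addresses presented to the first address port, one per clock cycle."""
    stream = [i, i + 1]  # the branch instruction and its sequential successor
    # After the decision, the PC is forced to target+1 on a taken branch,
    # or simply keeps incrementing on a not-taken branch.
    nxt = target + 1 if taken else i + 2
    while len(stream) < cycles:
        stream.append(nxt)
        nxt += 1
    return stream[:cycles]
```

For example, with `i=10` and `target=40`, six cycles give 10, 11, 41, 42, 43, 44 when the branch is taken and 10 through 15 when it is not, matching the shape of the two table rows.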

FIG. 5B shows a block diagram of the pipeline control structure 3. The pipeline control structure 3 is an alternative to the pipeline control structure 2 described above. The pipeline control structure 3 differs from the pipeline control structure 2 in that it includes an additional memory 40. The memory 40 may contain the same number of memory blocks as the number of rows of the track table 126, each memory block corresponding to one row of the track table 126.

Moreover, each memory block in memory 40 may contain the same number of storage units as the number of track points (entries) in a row of track table 126. Thus, when a track point is a branch point, the branch target instruction is stored in the corresponding storage unit of the memory 40, in addition to being stored in the memory block of the instruction memory 22 that contains the branch target instruction.

The branch target address 18 is derived from the entry of the track table 126. The content of the entry is BNX and BNY of the branch target instruction corresponding to the entry or the branch track point. Thus, BNX and BNY can be used as an index to find the corresponding branch target instruction stored in memory 40. The selected branch target instruction can be sent to the selector 26 via the bus 29. Moreover, as previously described, the next instruction can be fetched from the instruction memory 22 based on the next instruction address 21, and the fetched next instruction can also be sent to the selector 26 via the bus 28. Thus, the instruction memory 22 of Figure 5B can be a single port storage device without the need for a dual port storage device as shown in Figure 5A.
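A store with the same geometry as the track table, indexed by a (BNX, BNY) pair, can be sketched as below. The class and method names are hypothetical, and the sketch deliberately leaves open which BNX and BNY pair drives the index in a given embodiment; it only shows that a second, separately addressed store lets the main instruction memory remain single-ported.

```python
from typing import List, Optional


class TargetInstructionStore:
    """Hypothetical model of an auxiliary store shaped like the track table."""

    def __init__(self, rows: int, entries_per_row: int) -> None:
        # One block per track-table row, one cell per track point in that row.
        self.cells: List[List[Optional[str]]] = [
            [None] * entries_per_row for _ in range(rows)
        ]

    def store(self, bnx: int, bny: int, instruction: str) -> None:
        """Record a branch target instruction in the cell at (bnx, bny)."""
        self.cells[bnx][bny] = instruction

    def lookup(self, bnx: int, bny: int) -> Optional[str]:
        """Return the stored instruction, or None if the cell is empty."""
        return self.cells[bnx][bny]


if __name__ == "__main__":
    store = TargetInstructionStore(rows=4, entries_per_row=16)
    store.store(2, 5, "sub r4, r5, r6")
    print(store.lookup(2, 5))
```

A lookup in this store and a normal read of the instruction memory can proceed in the same cycle, which is what removes the need for a dual-port instruction memory.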

Alternatively, the entry of the corresponding branch point in the track table 126 itself may store the branch target instruction. That is to say, the contents of the branch track point include the branch target instruction in addition to the address and offset of the branch target instruction. Thus, track table 126 can provide branch target instructions directly to selector 26 for selection by control signal 14 from branch decision logic 13. This configuration structure can be considered as the memory 40 being integrated in the track table 126.

Thus, as described above, since the branch target instruction address can be determined in advance (in other words, since the branch target information and the branch type are already available), the branch decision can be made as soon as the branch condition flag has been set by the operations of the processor core. Since the main functions of branch processing are to calculate the branch target address and to make the branch decision according to the branch type and the condition flag of the branch instruction, the branch decision can be made earlier than when the branch instruction itself reaches its normal execution segment. In general, the earlier a branch decision is made, the fewer additional hardware resources are needed. Based on the early branch decision from the branch decision logic 13, various configurations can be used such that the pipeline continues without stalling when processing a branch.

FIG. 7 shows an embodiment of a processor environment 600 in accordance with the present invention. In processor environment 600, a read buffer is used to provide a branch target instruction for a branch instruction in the program stream of processor core 125 and an instruction immediately following the branch instruction. Processor environment 600 is similar to processor environment 300 in Figure 3, with some differences. As shown in FIG. 7, processor environment 600 includes a read buffer 229 and a selector 225 in addition to cache 124, processor core 125, track table 126, and tracker 170.

Read buffer 229 is coupled between cache 124 and processor core 125 and includes a storage module 216 and a selector 214. The storage module 216 is used to store certain instructions. For example, storage module 216 in read buffer 229 stores and provides one of the branch target instruction and the subsequent instruction, while the other is provided directly by cache 124, such that the same cache structure can provide higher bandwidth. The selector 214 in the read buffer 229 is used to select one of the branch target instruction and the subsequent instruction based on the branch decision, such that the instruction provided to the processor core 125 after the branch instruction is valid (correct). For example, selector 214 selects either the output of storage module 216 or the output of cache 124 as output 219 to processor core 125. In addition, selector 220 selects either the address from track table 126 or the address from tracker 170 as output 224 to cache 124 (a block address), and selector 225 selects either the offset from track table 126 or the PC (program counter) offset from processor core 125 as output 224 to cache 124 (an offset address). Control signal 215 from tracker 170 is used to control selectors 220 and 225 and storage module 216, while a 'TAKEN' signal is used to control selector 214.

During operation, tracker 170 provides BNX 152 and BNY 153 such that track table 126 can output the track point corresponding to BNX 152 and BNY 153. The content read from the track point contains information such as an instruction type and a branch target address. This content can be sent to the tracker 170 via the bus 150. Further, the upper portion of the branch target address (BNX) is sent to the selector 220 as an input. The BNY of the branch target address, or a portion of the BNY (e.g., the highest two bits), may also be sent to the selector 225 via the bus 222. The other input of selector 220 may be the BNX provided by tracker 170, and the other input of selector 225 may be the PC offset or a portion of the PC offset (e.g., the highest two bits).

The storage module 216 can include a predetermined number of storage units for storing instructions, based on the capacity of other components. For example, if a memory block (e.g., an instruction block) contains a total of 16 instructions, the BNY and the PC offset can each be 4 bits long. Assuming that four instructions are fetched from the instruction memory or cache 124 in one clock cycle, the storage module 216 can store four instructions; the highest two bits of the BNY or PC offset can be used to read four instructions from the memory block pointed to by the BNX, and the lowest two bits of the BNY or PC offset can be used to select one of the four instructions read.
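The split of a 4-bit offset into a two-bit group index and a two-bit lane index can be written out directly. The sketch assumes the 16-instruction block and four-instruction fetch width of the example above; the function names and the list-based block model are hypothetical.

```python
from typing import List, Tuple


def split_offset(offset: int) -> Tuple[int, int]:
    """Split a 4-bit BNY or PC offset into (highest two bits, lowest two bits)."""
    if not 0 <= offset < 16:
        raise ValueError("offset must fit in 4 bits")
    return offset >> 2, offset & 0b11


def read_instruction(block: List[str], offset: int) -> str:
    """Read one instruction from a 16-instruction block in two steps."""
    group, lane = split_offset(offset)
    fetched = block[group * 4:(group + 1) * 4]  # four instructions per fetch
    return fetched[lane]                        # lowest two bits pick one of four


if __name__ == "__main__":
    block = ["instr%d" % n for n in range(16)]
    print(split_offset(0b1001), read_instruction(block, 0b1001))
```

Offset 0b1001 therefore selects group 2 (instructions 8 through 11) and lane 1 within it, which is instruction 9 of the block.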

For ease of description, the total number of instructions fetched in one clock cycle is taken to be four; for a single-issue or multi-issue processor, the total number of instructions fetched per clock cycle can be any suitable number. In addition, the total number of instructions fetched in one clock cycle (e.g., 4) may exceed the total number of instructions executed by processor core 125 in one clock cycle (e.g., 1). Thus, in certain clock cycles, the storage module 216 can be loaded or the cache 124 filled using the track table 126 and other related components. In some embodiments, the cache 124 can include a single-port memory module whose bandwidth is greater than the instruction issue rate of processor core 125, so as to support both the filling of the storage module 216 by the tracker 170 and the instruction fetching of the processor core 125.

When the tracker 170 detects that an instruction is a branch instruction, the tracker 170 suspends the self-increment of BNY. When the fetch time slot arrives, the instruction type information can be used as the write enable signal for the storage module 216, and the four instructions currently output by the cache 124 are written to the storage module 216 via the bus 217. At the same time, according to the instruction type information (e.g., the instruction type is a branch instruction), the signal 215 can control the selector 220 to select the BNX of the branch target instruction on the bus 221 as the instruction block address, and control the selector 225 to select the upper two bits of the BNY of the branch target address on the bus 222 to locate four instructions in the instruction block. These four instructions include the branch target instruction and can be read in the next read cycle or the next clock cycle. In addition, the four instructions including the branch target instruction are stored in the storage module 216, and the PC offset is used again to read the next instruction. Thus, when the processor core 125 executes the branch instruction corresponding to a branch point, the branch target instruction and the instruction following the branch point can be provided simultaneously, so that the correct instruction can be selected according to whether the branch is taken.

Figure 8 shows an embodiment of reading instructions during operation in accordance with the teachings of the present invention. As shown in FIG. 8, column 226 shows the value on output 218 of storage module 216, column 227 shows the value on output 217 of cache 124, and column 228 shows the current instruction fetched by processor core 125. It is assumed that the instructions I0, I1, I2 and I3 are four consecutive instructions sharing the same highest two bits of the PC offset, where I2 is a branch instruction. It is further assumed that the branch target instruction of the branch instruction I2 is T1, and that the instructions T0, T1, T2, and T3 are four consecutive instructions sharing the same highest two bits of the PC offset. The rows represent successive clock cycles or execution cycles (one execution cycle may contain more than one clock cycle). The four rows correspond to cycle i, cycle i+1, cycle i+2, and cycle i+3, respectively. Furthermore, it is assumed that the 'TAKEN' signal (i.e., whether the branch of the branch instruction is taken) is generated in the cycle after the branch instruction is fetched.

In cycle i, assume that the PC offset points to I0 and that the read pointer has reached the track point corresponding to branch instruction I2. During this cycle, selector 214 selects the output of cache 124 as output 219, and the lowest two bits of the PC offset can be used to select instruction I0, required by processor core 125, from the four consecutive instructions. As described earlier, the read pointer stops at the branch track point, the four instructions output from the cache 124 are stored in the storage module 216, and the branch target address is used as the instruction address for the next cycle (i.e., cycle i+1) to fetch the four instructions that include the branch target instruction.

In cycle i+1, storage module 216 stores instructions I0, I1, I2, and I3, while cache 124 outputs instructions T0, T1, T2, and T3. In this cycle, selector 214 selects the output of storage module 216 as output 219, and the lowest two bits of the PC offset can be used to select the instruction I1, required by processor core 125, from the four instructions on bus 219. Further, in cycle i+1, the four instructions T0, T1, T2, and T3 are written to the storage module 216, and the BNX of the track point pointed to by the read pointer, together with the PC offset, is used as the address of the instruction for the next cycle (i.e., instruction I2).

In cycle i+2, storage module 216 stores and outputs instructions T0, T1, T2, and T3, while cache 124 outputs instructions I0, I1, I2, and I3. In this cycle, selector 214 selects the output of cache 124 as output 219, and the lowest two bits of the PC offset can be used to select the instruction I2, required by processor core 125, from the four instructions on bus 219. The address of the next instruction (i.e., I3) is used as the instruction address for the next cycle.

In cycle i+3, storage module 216 stores and outputs instructions T0, T1, T2, and T3, while cache 124 outputs instructions I0, I1, I2, and I3. During this cycle, selector 214 selects either the output of cache 124 or the output of storage module 216 as output 219, depending on whether the branch of the branch instruction is taken. Likewise, depending on whether the branch is taken, the lowest two bits of the BNY of the branch target address or the lowest two bits of the PC offset are used to select the instruction T1 or I3 required by the processor core.

Thus, the 'TAKEN' signal (i.e., whether the branch of the branch instruction is taken) can be used to select the output of cache 124 or the output of storage module 216. In addition, the lowest two bits of the BNY of the branch target address and the lowest two bits of the PC offset can be used, respectively, to select one instruction from the four instructions that include the branch target instruction and another instruction from the four instructions that include the next instruction.
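The four cycles described for Figure 8 can be replayed with a small table-driven sketch. It hard-codes the example's assumptions (I2 is the branch, T1 is its target, four instructions per fetch) and records, for each cycle, which source selector 214 forwards and which lane the lowest two offset bits pick. The function name and the tuple layout are hypothetical.

```python
from typing import List

I_GROUP = ["I0", "I1", "I2", "I3"]  # group containing the branch instruction I2
T_GROUP = ["T0", "T1", "T2", "T3"]  # group containing the branch target T1


def figure8_trace(taken: bool) -> List[str]:
    """Return the instruction delivered to the core in cycles i .. i+3."""
    schedule = [
        # (storage module output, cache output, source chosen, lane)
        (None,    I_GROUP, "cache",   0),  # cycle i:   I0 straight from the cache
        (I_GROUP, T_GROUP, "storage", 1),  # cycle i+1: I1 from the storage module
        (T_GROUP, I_GROUP, "cache",   2),  # cycle i+2: the branch instruction I2
        # cycle i+3: TAKEN picks the source; the lane comes from the target
        # BNY (T1) when taken, or from the PC offset (I3) when not taken.
        (T_GROUP, I_GROUP, "storage" if taken else "cache", 1 if taken else 3),
    ]
    delivered = []
    for storage_out, cache_out, source, lane in schedule:
        group = storage_out if source == "storage" else cache_out
        delivered.append(group[lane])
    return delivered
```

Calling `figure8_trace(True)` gives I0, I1, I2, T1 and `figure8_trace(False)` gives I0, I1, I2, I3; in both cases one valid instruction reaches the core in every cycle.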

Alternatively, instruction I3 and instruction T1 may be provided to processor core 125 at the same time, and processor core 125 may decode instruction I3 and instruction T1 separately and obtain the operands of both at the same time. Depending on whether the branch of the branch instruction is taken, processor core 125 selects the decoding result of instruction T1 or the decoding result of instruction I3, along with the correct operands. Specifically, when the read pointer reaches the track point corresponding to the branch instruction I2, if the instruction that the processor core 125 is fetching is close to the branch instruction I2 (for example, instruction I1 is being fetched), the cache 124 may start outputting the four instructions I0, I1, I2 and I3 after the instruction I2 is fetched. Processor core 125 may still obtain I3 and T1 from cache 124 and storage module 216, respectively. For example, exclusive-OR logic can be used to invert the value of the select signal controlling selector 214, so as to select the branch target instruction (or the four instructions including the branch target instruction) from the output of cache 124, or to select the next instruction (or the four instructions including the next instruction) from the output of storage module 216. In this case, the four instructions T0, T1, T2, and T3 need not be stored in the storage module 216, regardless of whether the branch is taken.

Furthermore, Figure 9 shows another pipeline control structure 4 of the present invention. The pipeline control structure 4 is similar to the pipeline control structure 2 of FIG. 5A. However, the pipeline control structure 4 differs from the pipeline control structure 2 in that it includes two independent decoders, a decoder 25 and a decoder 26, instead of only one decoder 11. As shown in FIG. 9, the two instructions fetched from the instruction memory 22 are decoded by the decoder 25 and the decoder 26, respectively, and the instruction decode result 31 and the instruction decode result 32 are sent to the selector 33, where one of them is selected by the control signal 14 from the branch decision logic 13.

If branch decision logic 13 determines that the branch is taken, then instruction decode result 32, corresponding to branch target instruction address 18, is selected and sent to execution unit 12. If the branch decision logic 13 determines that the branch is not taken, then the instruction decode result 31, corresponding to the next instruction address 21, is selected and sent to the execution unit 12. Furthermore, since the branch decision logic 13 can complete the decision by the end of the execution segment of the branch instruction and before the execution segment of the next instruction, the pipeline does not lose any clock cycles waiting for the branch result.

Thus, in addition to making the branch decision before the branch point is executed, the branch decision logic 13 can also make the branch decision in the normal pipeline segment, for example at the end of the execution segment of the branch instruction. Since all instructions that may be executed by processor core 125 after the branch point have already been fetched and decoded, and the instruction type is known, no pipeline stall is caused by the branch decision.

Moreover, although processor core 125 has been described as executing one instruction at a time, processor core 125 may also execute more than one instruction at a time (i.e., a multi-issue processor), and the above examples apply to that case as well. Similarly, although a 5-segment pipeline is described, the approach also applies to pipelines with any other number of segments in various pipeline structures.

In addition, clock cycle loss due to branch instruction processing can also be reduced by pre-processing executable instructions or by using predefined instructions. For example, a branch instruction can be combined with a non-branch instruction to form a compound instruction, such that the branch instruction is processed while the non-branch instruction is being processed, reducing the clock cycle penalty of the branch instruction to zero or to a minimum.

For example, a processor instruction set typically contains some reserved or unused instructions, or some non-branch instructions have reserved or unused portions. These non-branch instructions can be used to include branch conditions and branch target addresses or offsets of branch instructions. Thus, when these non-branch instructions are executed, the branch condition can be determined and branch transfer can be performed during the execution of the non-branch instruction, thereby achieving zero-cost branch processing. Since the branch instruction roughly accounts for 20% of the total number of instructions executed by the processor, reducing the total number of executable instructions by 20% can significantly increase the performance of the processor.

For example, in a 32-bit instruction set, one type of addition instruction consists of a 5-bit instruction code, two source operands, and one destination operand, each operand being a 4-bit register number. In this case the addition instruction uses a total of 17 bits, and the remaining 15 bits are unused.

On the other hand, one type of branch instruction makes its branch decision by comparing the values of two registers. As a separate instruction, such a branch instruction can contain a 5-bit instruction code, a 5-bit branch offset, and two 4-bit register numbers. Thus, the branch instruction uses 18 bits.

However, when the addition instruction and the branch instruction are combined to form a composite instruction (e.g., "addition and branch"), one bit may be added to the 5-bit instruction code to indicate the composite instruction. Thus, this "addition and branch" instruction contains a 6-bit instruction code, three register numbers for the addition operation occupying 12 bits in total, two register numbers for the branch occupying 8 bits in total, and a 5-bit branch offset, for a total of 31 bits. In this example, the branch instruction can therefore be executed while the addition instruction is being executed, thereby achieving zero-cost branch processing.
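The bit budget of this example can be checked with a short sketch. The field order and the helper names `pack_add_branch` and `unpack_add_branch` are assumptions made for illustration only; the description gives the field widths but not their positions within the instruction word.

```python
# Field widths come from the example in the text; the field ORDER is assumed.
FIELDS = [
    ("opcode", 6),     # 5-bit instruction code plus 1 bit marking the composite form
    ("add_dst", 4),    # destination register of the addition
    ("add_src1", 4),   # first source register of the addition
    ("add_src2", 4),   # second source register of the addition
    ("br_reg1", 4),    # first register compared by the branch
    ("br_reg2", 4),    # second register compared by the branch
    ("br_offset", 5),  # branch offset
]

ADD_BITS = 5 + 3 * 4          # separate addition instruction
BRANCH_BITS = 5 + 5 + 2 * 4   # separate branch instruction
COMPOSITE_BITS = sum(width for _, width in FIELDS)


def pack_add_branch(values):
    """Pack field values into one integer, first field in the highest bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_add_branch(word):
    """Inverse of pack_add_branch: split the word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Under these assumptions the composite word occupies 31 bits, so it fits in a 32-bit instruction with one bit to spare.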

In other 32-bit instruction set examples, certain execution-type instructions (e.g., addition, subtraction, etc.) may have a 6-bit instruction code and three 5-bit register numbers, for a total of 21 bits. This leaves 11 bits for an additional branch operation. This branch operation can be of a fixed type, for example a branch that is taken when the value of a particular register is non-zero. One of these 11 bits can serve as a branch bit, while the other 10 bits can hold a branch offset. When the branch bit is set to "0", the instruction is a normal executable instruction. When the branch bit is set to "1", the instruction is also a branch instruction in addition to performing its executable operation (addition, etc.). In that case, if the register contents are not equal to zero, the contents are decremented by one and the branch is taken to the instruction whose address is the composite instruction address plus the branch offset. On the other hand, if the register contents are equal to zero, the branch is not taken, and the next instruction executed is the one immediately following the composite instruction. This type of instruction can save two clock cycles in each pass through a program loop.
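The behavior of such a composite instruction can be sketched as follows. This is a minimal model under stated assumptions: addresses count whole instructions, the executable operation is performed before the register test, and the names `step_composite`, `add_two`, and `run_loop` are hypothetical.

```python
def step_composite(pc, regs, branch_bit, offset, loop_reg, operation):
    """Execute one composite instruction and return the next instruction address.

    Assumptions: addresses are counted in instructions, and the executable
    operation runs before the loop register is tested.
    """
    operation(regs)                      # the ordinary operation (addition, etc.)
    if branch_bit == 1 and regs[loop_reg] != 0:
        regs[loop_reg] -= 1              # non-zero: decrement and take the branch
        return pc + offset               # target = composite address + offset
    return pc + 1                        # otherwise fall through


def add_two(regs):
    """Example operation: accumulate 2 into register 'acc'."""
    regs["acc"] += 2


def run_loop(count):
    """Run a one-instruction loop at address 0 that branches to itself."""
    regs = {"acc": 0, "count": count}
    pc, executed = 0, 0
    while pc == 0:
        pc = step_composite(pc, regs, 1, 0, "count", add_two)
        executed += 1
    return regs, pc, executed
```

With `count` set to 3, the single composite instruction executes four times (three taken branches and one fall-through) without any separate compare or branch instruction in the loop body.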

FIG. 10 shows an embodiment of a processor environment 1000 in accordance with the present invention. In processor environment 1000, a read buffer 229 is used to provide a branch instruction in the program stream of processor core 125 and subsequent instructions following the branch instruction. The processor environment is similar to processor environment 600 in Figure 7, but with some differences. As shown in FIG. 10, processor environment 1000 includes a read buffer 229 in addition to cache 124, processor core 125, track table 126, and tracker 170.

Read buffer 229 is coupled between cache 124 and processor core 125 and includes a storage module 216 and a selector 214. The storage module 216 is used to store certain instructions, such as the contents of a memory block in the cache 124. For example, storage module 216 in read buffer 229 stores and provides the subsequent instructions, while the branch target is provided directly by cache 124, so that the same cache 124 can provide higher bandwidth. The selector 214 in the read buffer 229 selects, based on the branch decision, either the branch target instruction (from the cache 124) or the instruction following the branch instruction (from the storage module 216) as the output 219 to the processor core 125, so that the instructions provided to processor core 125 after the branch instruction are valid and correct. Further, the branch target address read from the track table 126 onto the bus 150 is sent to the cache 124 as a block address and an intra-block offset address; the PC offset 155 (an intra-block offset address) from the processor core 125 is sent to the storage module 216. The 'TAKEN' signal from processor core 125 is used to control selector 214.

During operation, tracker 170 provides BNX 152 and BNY 153 for addressing, so that track table 126 can output the track point corresponding to BNX 152 and BNY 153. The content read from the track point contains information such as an instruction type and a branch target address. This content can be sent to the tracker 170 via the bus 150. When the tracker 170 detects that a track point contains information of a branch instruction, the branch target block address 221 (target BNX) and the branch target offset address 222 (target BNY) on the bus 150 are sent to the cache 124, so that the branch target instruction is fetched from the cache 124 (possibly together with other instructions in the same memory block as the branch target instruction) and placed on the bus 217, which leads to the write port of the storage module 216 and to an input of selector 214. The branch target block address 221 and the branch target offset address 222 can be latched in a register and then sent to the cache 124 for addressing.
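The lookup described above can be sketched in a few lines. This is only an illustrative model: the `TrackPoint` type, the `fetch_target` function, and the representation of the track table and cache as nested lists are assumptions, not structures defined in this description.

```python
from typing import NamedTuple, Optional, Tuple


class TrackPoint(NamedTuple):
    """Hypothetical track point: an instruction type plus an optional branch target."""
    is_branch: bool
    target: Optional[Tuple[int, int]] = None  # (target BNX, target BNY)


def fetch_target(track_table, cache, bnx, bny):
    """Read the track point at (BNX, BNY); for a branch, fetch the target block.

    Returns None for a non-branch point, otherwise the memory block holding
    the branch target together with the target's offset in that block.
    """
    point = track_table[bnx][bny]
    if not point.is_branch:
        return None
    target_bnx, target_bny = point.target
    return cache[target_bnx], target_bny
```

Here the row index plays the role of BNX and the column index the role of BNY; the returned block corresponds to what would be placed on bus 217.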

The storage module 216 can include a specific number of storage units for storing instructions, for example all the instructions contained in one memory block (e.g., an instruction block). The processor core 125 provides an intra-block offset 155 to address the storage module 216, selecting from the instructions stored in the storage module the one or more instructions to be executed by the processor core and sending them to an input of the selector 214. Processor core 125 also provides the 'TAKEN' signal and the 'BRANCH/JUMP' signal to the tracker 170 to indicate whether or not the branch is taken. The 'TAKEN' signal is also sent to the selector 214 as a control input, and to the storage module 216 to determine whether the contents of the storage module 216 are replaced with the instruction block output by the cache 124.

When the time for the branch decision arrives, the instructions selected from the storage module 216 and placed on the input of the selector 214 are the one or more instructions following the branch instruction. If the decision is that the branch is not taken, the 'TAKEN' signal controls selector 214 to select the output of the storage module 216 (the instruction following the branch instruction) and also controls the storage module 216 to keep its existing content unchanged. In this case, processor core 125 executes the instructions following the branch instruction. At this time, the tracker 170 moves to the next branch instruction in the same row of the track table, and the above operation is repeated.

However, if the result of the determination is a branch, the 'TAKEN' signal controls the selector 214 to select the output of the cache 124 (branch target), and also controls the storage module 216 to update the contents of the storage module 216 with the output of the cache 124. In this case, the processor core 125 executes the branch target instruction and the instruction after the branch target instruction.

At this time, the tracker 170 moves to the entry in the track table where the branch target instruction is located. Thereafter, the PC offset 155 selects the instruction in the storage module 216 (the instruction following the branch target instruction) for execution by the processor core 125, and the tracker 170 moves to the next branch instruction in the same row of the track table, repeating the above operations.

Thus, when the processor core 125 executes a branch instruction corresponding to a branch point, the branch target instruction and the subsequent instruction following the branch point can be simultaneously provided, so that the correct instruction can be fetched according to whether the branch transfer occurs.
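The selection performed by the read buffer can be modelled as a small sketch. The class `ReadBuffer` and its methods are hypothetical names, instruction blocks are represented as Python lists, and timing is ignored; only the roles of storage module 216, selector 214, and the 'TAKEN' signal follow the description.

```python
class ReadBuffer:
    """Toy model of read buffer 229: storage module 216 plus selector 214."""

    def __init__(self, current_block):
        # Storage module 216 holds the block containing the branch instruction.
        self.storage = list(current_block)

    def fetch(self, pc_offset):
        """Ordinary fetch: the PC offset addresses the storage module."""
        return self.storage[pc_offset]

    def select(self, taken, pc_offset, cache_block, target_offset):
        """Model selector 214 at the branch decision.

        taken         -- the 'TAKEN' signal from the processor core
        pc_offset     -- offset of the instruction following the branch
        cache_block   -- block read from the cache that holds the branch target
        target_offset -- offset of the branch target within cache_block
        """
        if taken:
            # Taken: the storage module is updated with the cache output and
            # the branch target instruction is passed to the processor core.
            self.storage = list(cache_block)
            return cache_block[target_offset]
        # Not taken: the storage module keeps its content unchanged.
        return self.storage[pc_offset]
```

Because both candidates are available at the inputs of the selector when the decision arrives, the model returns the correct instruction in either case without an extra fetch.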

An unconditional branch flag can be added after the last instruction in the track. The branch target instruction is the instruction in the program stream immediately following the last instruction. By the same method as described above, the subsequent instructions can be executed without suspending the pipeline operation after the execution of the instructions on each track is completed.

In addition, it is also possible to determine the position or point in time at which the condition required by the branch decision of a branch instruction is finally settled, to make the branch decision as soon as the condition is settled, and thereby to determine in advance the address of the instruction to be executed after the branch instruction. Thus, branch prediction with a 100% success rate is achieved without using existing branch prediction methods. Figure 11 shows a schematic 1100 of the branch prediction method of the present invention.

As shown in FIG. 11, the instruction stream 1101 is an instruction stream composed of a series of sequentially executed instructions, and the execution order is from left to right. The instruction 1102 on the instruction stream 1101 is a branch instruction. The instructions 1103, 1104, and 1105 on the instruction stream 1101 are all instructions that change the branch condition (or condition flag) of the branch instruction 1102, where the instruction 1105 is the last of these instructions to do so. Unlike conventional processor practice (which determines whether the branch condition is satisfied when executing the branch instruction 1102), in the present embodiment, once execution of the instruction 1105 has settled the branch condition (or condition flag) required by the branch instruction 1102, it can be determined whether the branch condition is satisfied.

Figure 12 shows an embodiment 1200 of the branch prediction of the present invention. The branch prediction system 1200 is composed of three parts: an instruction buffer 1201, a pre-detection control unit 1202, and a time point detection unit 1203. The instruction buffer 1201 stores the instruction 1205 currently being executed and the instructions following the instruction 1205. The time point detection unit 1203 includes a position register corresponding to each branch decision condition (or condition flag). The branch decision condition (or condition flag) may be a general purpose register, a status register, or a flag bit, depending on the processor instruction set architecture. Different branch decision conditions (or condition flags) may be compared with each other to determine whether or not the branch is taken. A branch decision condition (or condition flag) may also be compared with a preset value to determine whether or not the branch is taken.

The pre-detection control unit 1202 controls the leading pointer 1204 to scan subsequent instructions along the instruction buffer, starting from the current instruction 1205 and moving faster than the processor program counter (PC), until the first branch instruction 1206 is reached. In the process, the instruction pointed to by the leading pointer is read out and sent to the time point detection unit 1203. Since the number of conditions (or condition flags) available in the processor for branch decisions is limited, decoder 1207 in time point detection unit 1203 can determine by decoding whether the instruction pointed to by leading pointer 1204 changes the value of one or more of these conditions (or condition flags) and, if so, which of them it changes. During the scan, once the instruction pointed to by the leading pointer 1204 is found to change the value of a branch decision condition (or condition flag), the instruction position information of that instruction is written to the position register or registers in the time point detection unit 1203 that correspond to that condition (or condition flag).

For convenience of description, the branch prediction system 1200 uses only two decision conditions (COND1 and COND2) for branch instructions. When there are more decision conditions (or condition flags), the branch prediction system 1200 can be implemented by the same method.

Taking the branch prediction system 1200 as an example, scanning the instruction buffer finds a total of three instructions between the current instruction 1205 and the first branch instruction 1206 that change the decision conditions: instruction 1208, which changes the COND1 value, has instruction position information '3'; instruction 1209, which changes the COND2 value, has instruction position information '4'; and instruction 1210, which also changes the COND2 value, has instruction position information '7'.

When the leading pointer 1204 points to the instruction 1208, the instruction 1208 is read and sent to the decoding unit 1207 via the bus 1211. Upon decoding, it is found that the instruction changes the value of COND1. Therefore, the instruction position information '3' of the instruction 1208 is written in the position register 1212 corresponding to COND1.

Similarly, when the leading pointer 1204 points to the instruction 1209 and then to the instruction 1210, the instruction position information '4' of the instruction 1209 and the instruction position information '7' of the instruction 1210 are written in turn into the position register 1213 corresponding to COND2. Thus, when the leading pointer 1204 reaches the branch instruction 1206, the position registers 1212 and 1213 hold the position information of the instructions that last update the respective condition values before the branch instruction 1206. In addition, when the leading pointer 1204 reaches the instruction 1206, the instruction is read and sent to the decoding unit 1207 via the bus 1211; decoding finds it to be a branch instruction, and a stop signal is sent to the pre-detection control unit 1202 through the control line 1216, causing the leading pointer 1204 to stay at the branch instruction 1206.

At the same time, since the leading pointer 1204 points to the branch instruction, the decoding unit 1207, after decoding, uses the control line 1215 to select, from the position registers corresponding to all branch conditions, the value of the position register associated with the condition tested by the branch instruction 1206, and outputs that value to the comparison unit 1218. The other input to the comparison unit 1218 is the current instruction position information 1214, that is, the position of the instruction that has currently completed its condition value update.

Since the position register stores instruction position information, when the value of the current instruction position information 1214 sent to the comparison unit 1218 equals the position information of the instruction that last updates the branch decision condition value before the branch instruction 1206 is executed, the comparison unit 1218 outputs an "equal" result to the control unit 1219, indicating that the decision condition value has been updated and can be used to determine whether or not the branch condition is satisfied.

According to this method, when all the decision condition values required by the branch instruction 1206 have been updated, the control unit 1219 can issue a "decidable" signal 1220, allowing the processor to make the branch decision for the branch instruction 1206 and thereby to determine in advance the address of the instruction that should be executed after the branch instruction, achieving branch prediction with a 100% success rate.
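The scan and comparison described above can be sketched as follows. The sketch rests on several assumptions that are not stated in this description: instruction position information is modelled as an index into the instruction buffer with the current instruction at index 0, the branch is placed at index 8, instructions complete in order (so a single comparison against the largest recorded position suffices), and the names `scan_to_branch` and `decidable` are hypothetical.

```python
def scan_to_branch(buffer):
    """Move a leading pointer along the buffer until the first branch.

    Each entry is a dict with 'kind' ('other' or 'branch'), an optional
    'writes' set of condition names it changes, and, for a branch, a 'uses'
    set of conditions it tests. Returns the position registers, the branch
    position, and the conditions the branch needs.
    """
    position_regs = {}
    for position, instr in enumerate(buffer):
        if instr["kind"] == "branch":
            return position_regs, position, instr["uses"]
        for cond in instr.get("writes", ()):
            position_regs[cond] = position   # a later writer overwrites an earlier one
    return position_regs, None, set()


def decidable(position_regs, needed, completed_position):
    """Model of the 'decidable' signal under in-order completion.

    The largest recorded position among the needed conditions is compared
    with the position of the instruction that has just completed.
    """
    recorded = [position_regs[c] for c in needed if c in position_regs]
    if not recorded:
        return True                          # no pending writer before the branch
    return completed_position == max(recorded)


# Example mirroring the text: writers at positions 3 (COND1), 4 and 7 (COND2).
example = [{"kind": "other"} for _ in range(8)]
example[3]["writes"] = {"COND1"}
example[4]["writes"] = {"COND2"}
example[7]["writes"] = {"COND2"}
example.append({"kind": "branch", "uses": {"COND1", "COND2"}})
```

In this example the position registers end up holding 3 for COND1 and 7 for COND2, so the branch decision can be made as soon as the instruction at position 7 has completed, before the branch itself is reached.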

Although not explicitly shown in the figures, it should be understood that the point in time detection unit 1203 may also obtain the necessary information from registers, instruction buffers 1201, or any other suitable source in the processor to generate the signal 1220. At the same time, the point in time detection unit 1203 can also send the necessary information to the processor to generate the signal 1220.

In addition, in some cases, if the processor does not perform out-of-order execution, it is not necessary to send the values of all the position registers corresponding to the required branch decision conditions to the comparison unit 1218. Instead, the decoding unit 1207 issues a control signal that selects the largest value (position value) among the position registers corresponding to the required branch decision conditions and outputs it to the comparison unit 1218. Thus, when the comparison unit 1218 sends an "equal" result to the control unit 1219, or when the position register value is greater than or equal to the value of the current instruction position information 1214, all the decision condition values required by the branch instruction have been updated. In this case, the value of the program counter may also be used as the value of the current instruction position information 1214.

Industrial applicability

The apparatus and method proposed by the present invention can be used in a variety of processor related applications such as general purpose processors, special purpose processors, system on a chip (SOC) applications, application specific integrated circuit (ASIC) applications, and other computing systems. For example, the apparatus and method proposed by the present invention can be used in high performance processors to increase its pipeline efficiency and overall system performance.

Sequence table free content

Claims

A method of controlling processor pipeline operations, the processor being coupled to a memory comprising executable computer instructions; characterized in that the method comprises:

Determining whether the instruction to be executed by the processor is a branch instruction;

Providing a branch target instruction address of the branch instruction and a subsequent instruction address of the branch instruction in a program sequence;

Determining a branch decision corresponding to a branch instruction; and

Selecting, according to the branch decision and before the branch instruction reaches its execution stage in the pipeline, at least one of the branch target instruction and the latter instruction as the instruction to be executed by the execution unit, such that the pipeline operation is not stalled whether or not the branch is taken.
The method of claim 1 wherein:

The branch decision is determined according to the branch type and the branch status flag.
The method of claim 1 wherein said selecting further comprises:

Selecting one of a branch target instruction address and a subsequent instruction address according to the branch determination; and

One of the branch target instruction and the latter instruction is supplied to the execution unit according to the selected one of the branch target instruction address and the subsequent instruction address.
The method of claim 1 wherein said selecting further comprises:

Obtaining a branch target instruction and a subsequent instruction from the memory using the branch target instruction address and the subsequent instruction address; and

According to the branch determination, one of the acquired branch target instruction and the acquired subsequent instruction is selected to be supplied to the execution unit.
The method of claim 1 wherein said selecting further comprises:

Obtaining a branch target instruction from a storage device according to an address of the branch target instruction;

Obtaining the next instruction from the memory according to the latter instruction address; and

According to the branch determination, one of the acquired branch target instruction and the acquired subsequent instruction is selected to be supplied to the execution unit.
The method of claim 2 wherein said providing further comprises:

Extracting instruction information including at least branch information by reviewing executable computer instructions;

Establishing a plurality of tracks according to the extracted instruction information; and

Determining an address of the branch target instruction based on the plurality of tracks.
The method of claim 6 wherein said establishing a plurality of tracks further comprises:

Establishing a track table; the track table includes a plurality of track table rows corresponding to the plurality of tracks, each table row corresponding to one track and including a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
The method of claim 7 wherein said method further comprises:

The track point is addressed based on the track number determined by the first address and the intra-track offset determined by the second address.
The method of claim 8 wherein:

The branch type is provided by the track table; and

A branch status flag is provided by the processor.
The method of claim 8 wherein:

The branch decision is made when the program counter (PC) offset provided by the processor is equal to the offset in the track table branch track point.
The method of claim 8 wherein:

When the processor executes an instruction corresponding to the track point, the memory cell block containing the instruction is determined by the first address, and the instruction can be found in the memory cell block according to the offset provided by the processor.
The method of claim 11 wherein said method further comprises:

The address of the branch target instruction is calculated by summing the block address of the storage unit block in which the branch instruction is located, the offset of the branch instruction within the storage unit block, and the branch offset to the branch target instruction.
The method of claim 12 wherein said method further comprises:

Storing the branch target instruction address as entry content in the entry corresponding to the branch instruction in the track table.
The method of claim 13 wherein said method further comprises:

When the transfer succeeds, the first address and the second address stored in the branch instruction corresponding entry are respectively used as the next first address and the next second address; and

When the transfer is unsuccessful, the current first address is kept unchanged as the next first address, and the current second address is incremented by one as the next second address, thereby reaching the next track point in the track table.
The method of claim 13 wherein said method further comprises:

When the transfer is successful, the processor's program counter is forced to the address of the next instruction of the branch target instruction, so that the processor acquires the latter instruction of the branch target instruction while executing the branch target instruction.
The method of claim 1 wherein:

The branch instruction may be combined with the non-branch instruction such that the branch execution process of the branch instruction is performed concurrently with the execution of the non-branch instruction.
A pipeline control system for controlling processor pipeline operations; the processor coupled to a memory containing executable computer instructions; characterized in that the system comprises:

a review unit for determining whether the instruction to be executed by the processor is a branch instruction;

An addressing unit connected to the processor, configured to provide a branch target instruction address of the branch instruction and a subsequent instruction address of the branch instruction in a program sequence;

a branch logic unit for determining a branch decision regarding the branch instruction based on at least a branch target instruction address provided by the track unit; and

a selector for selecting, according to the branch decision provided by the branch logic unit and before the branch instruction reaches its execution stage in the pipeline, at least one of the branch target instruction and the subsequent instruction as the instruction to be executed by the execution unit, such that the pipeline operation is not stalled whether or not the branch is taken.
The system of claim 17 wherein:

Selecting, by the selector, one of a branch target instruction address and a subsequent instruction address according to the branch determination, thereby implementing selecting at least one of a branch target instruction and a subsequent instruction; and

The pipeline control system further includes:

An obtaining unit is configured to obtain one of the branch target instruction and the latter instruction from the memory according to the selected one of the branch target instruction address and the subsequent instruction address, and supply the same to the execution unit.
The system of claim 17 wherein:

The pipeline control system further includes:

An obtaining unit, configured to acquire a branch target instruction and a subsequent instruction from the memory, respectively, using the branch target instruction address and the subsequent instruction address; and

The selector selects one of the acquired branch target instruction and the acquired subsequent instruction according to the branch determination, thereby implementing selection of at least one of the branch target instruction and the latter instruction.
The system of claim 17 wherein said system further comprises:

An acquisition unit and a storage device, wherein:

The obtaining unit is used to:

Obtaining a branch target instruction from the storage device according to a branch target instruction address; and

Obtaining the next instruction from the memory according to the latter instruction address; and

The selector selects one of the acquired branch target instruction and the acquired subsequent instruction to supply the execution unit according to the branch determination.
The system of claim 17 wherein:

The review unit can be further used to:

Extracting instruction information including at least branch information by reviewing the executable computer instructions; and

To implement the branch target instruction address and the subsequent instruction address providing the branch instruction, the track unit may be further used to:

Establishing a plurality of tracks according to the extracted instruction information; and

Determining an address of the branch target instruction based on the plurality of tracks.
The system of claim 21 wherein said addressing unit further comprises:

a track table; the track table includes a plurality of track table rows corresponding to a plurality of tracks, each table row corresponding to one track and including a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
The system of claim 17 wherein:

The branch instruction may be combined with the non-branch instruction such that the branch execution process of the branch instruction is performed concurrently with the execution of the non-branch instruction.
The system of claim 17 wherein:

A branch instruction can be part of a compound instruction that includes the branch instruction and a non-branch instruction.
The system of claim 24 wherein:

The composite instruction includes a branch bit for indicating whether a branch instruction included in the composite instruction is to be executed; and

The branch decision for the branch instruction in the composite instruction is made based on the contents of a preset register.
A method of controlling processor pipeline operations, the processor being coupled to a memory comprising executable computer instructions; characterized in that the method comprises:

Determining whether the instruction to be executed by the processor is a branch instruction;

Providing a branch target instruction address of the branch instruction and a subsequent instruction address of the branch instruction in a program sequence;

Obtaining a branch target instruction and a subsequent instruction according to the branch target instruction address and the subsequent instruction address;

Decoding the acquired branch target instruction and the subsequent instruction; and

The decoded branch target instruction and the decoded next instruction are supplied to the execution unit in accordance with a branch decision provided by the processor such that no stalling of the pipeline operation occurs regardless of whether the branch instruction transition occurs.
The method of claim 26 wherein said providing further comprises:

Extracting instruction information including at least branch information by reviewing the executable computer instructions;

Establishing a plurality of tracks according to the extracted instruction information; and

Determining an address of the branch target instruction based on the plurality of tracks.
The method of claim 27 wherein said establishing a plurality of tracks further comprises:

Establishing a track table; the track table includes a plurality of track table rows corresponding to the plurality of tracks, each table row corresponding to one track and including a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
A pipeline control system for controlling processor pipeline operations; the processor coupled to a memory containing executable computer instructions; characterized in that the system comprises:

An addressing unit connected to the processor, configured to provide a branch target instruction address of the branch instruction and a subsequent instruction address of the branch instruction in a program sequence;

a read buffer coupled to the memory and the processor, configured to store at least one of a branch target instruction and a subsequent instruction of the branch instruction;

The read buffer further includes a selector coupled to the processor for providing one of the branch target instruction or the latter instruction to the processor when the branch instruction is executed, such that the pipeline operation is not stalled whether or not the branch is taken.
The system of claim 29 wherein:

The memory is capable of outputting at least two instructions in one cycle; and

The read buffer is capable of storing at least two instructions in one cycle.
The system of claim 30 wherein:

The memory includes a single port memory module having a bandwidth that is higher than the processor instruction transmission rate.
The system of claim 30 wherein:

A portion of the instruction address is used to read at least two instructions from a memory block in memory; and

Another portion of the instruction address is for selecting the instruction from the at least two instructions.
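
As a non-limiting illustration, the two uses of the instruction address described above can be sketched as follows. The group width of two instructions, the use of the high-order part of the address to pick the group and the low-order part to pick the instruction, and the names `fetch_group` and `select_instruction` are assumptions made for the example.

```python
WORDS_PER_FETCH = 2  # assumption: one memory access returns two instructions


def fetch_group(memory, address):
    """Use the high-order part of the address to read a group of instructions."""
    start = (address // WORDS_PER_FETCH) * WORDS_PER_FETCH
    return memory[start:start + WORDS_PER_FETCH]


def select_instruction(group, address):
    """Use the low-order part of the address to pick one instruction."""
    return group[address % WORDS_PER_FETCH]


memory = ["i0", "i1", "i2", "i3"]
group = fetch_group(memory, 3)
instruction = select_instruction(group, 3)
```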
The system of claim 30 wherein:

In the first cycle, the branch target instruction address is sent to the memory for reading at least two instructions including the branch target instruction.
The system of claim 33 wherein:

In the second cycle, the memory outputs the at least two instructions including the branch target instruction, and the branch instruction address is sent to the memory for reading at least two instructions including the branch instruction.
The system of claim 34 wherein:

In the third cycle, the at least two instructions including the branch target instruction are stored in the read buffer, the memory outputs the at least two instructions including the branch instruction, and the latter instruction address is sent to the memory for reading at least two instructions including the latter instruction.
The system of claim 35 wherein:

In the fourth cycle, the read buffer outputs the at least two instructions including the branch target instruction, and the memory outputs the at least two instructions including the latter instruction.
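
As a non-limiting illustration, the four cycles recited above can be traced with a small simulation. The sketch assumes a memory with one cycle of latency, a group width of two instructions, and a read buffer that latches only the group containing the branch target; the function `simulate_four_cycles` and the dictionary keys are hypothetical names chosen for the example.

```python
def simulate_four_cycles(memory, target_addr, branch_addr, next_addr, width=2):
    """Trace the address sent, memory output and read buffer state per cycle."""

    def group(addr):
        # Aligned group of `width` instructions that contains `addr`.
        start = (addr // width) * width
        return tuple(memory[start:start + width])

    sent = [target_addr, branch_addr, next_addr, None]  # address sent per cycle
    buffer_content = None
    trace = []
    for i, addr in enumerate(sent):
        # Assumed one-cycle latency: output follows the address by one cycle.
        memory_out = group(sent[i - 1]) if i > 0 else None
        # The buffer can only output what it latched in an earlier cycle.
        buffer_out = buffer_content
        # Assumption: the buffer latches only the target group, in cycle three.
        if i == 2:
            buffer_content = group(target_addr)
        trace.append({
            "cycle": i + 1,
            "address_sent": addr,
            "memory_out": memory_out,
            "buffer_stored": buffer_content,
            "buffer_out": buffer_out,
        })
    return trace


memory = [f"i{n}" for n in range(16)]
trace = simulate_four_cycles(memory, target_addr=10, branch_addr=5, next_addr=6)
```

In the fourth entry of `trace`, the read buffer output holds the group containing the branch target while the memory output holds the group containing the latter instruction, so both candidates are available in the same cycle.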
The system of claim 35 wherein:

A control signal from the processor indicating whether the branch is taken is used to select either the at least two instructions including the branch target instruction or the at least two instructions including the latter instruction.
The system of claim 37 wherein:

A part of the program counter offset is used to select the branch target instruction from the at least two instructions including the branch target instruction, or to select the latter instruction from the at least two instructions including the latter instruction.
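
As a non-limiting illustration, the two-stage selection described above (first a group chosen by the branch control signal, then one instruction chosen by part of the program counter offset) can be sketched as follows. Treating the "part of the program counter offset" as the offset modulo the group size is an assumption, as are the function and parameter names.

```python
def select_next_instruction(taken, target_group, next_group,
                            target_offset, next_offset):
    """Pick the group by the branch outcome, then the instruction by offset."""
    if taken:
        group, offset = target_group, target_offset
    else:
        group, offset = next_group, next_offset
    # Assumption: the low-order part of the offset indexes into the group.
    return group[offset % len(group)]


target_group = ("i10", "i11")  # group holding the branch target instruction
next_group = ("i6", "i7")      # group holding the latter instruction

when_taken = select_next_instruction(True, target_group, next_group, 11, 6)
when_not_taken = select_next_instruction(False, target_group, next_group, 11, 6)
```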
The system of claim 29 wherein:

The addressing unit includes a track table including a plurality of track table rows corresponding to a plurality of tracks, each table row corresponding to one track and including a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
The system of claim 39 wherein:

The branch target instruction address of the branch instruction is stored in the track table as the content of the entry corresponding to the branch instruction.
A pipeline control system for controlling processor pipeline operations; the processor coupled to a memory containing executable computer instructions; characterized in that the system comprises:

An addressing unit connected to the processor and configured to provide a branch target instruction address of a branch instruction and the address of the instruction that follows the branch instruction in program sequence;

a read buffer connected to the memory and the processor for storing the instruction segment in which the current instruction is located;

The read buffer further includes a selector coupled to the processor for providing one of the branch target instruction or the instruction following the branch instruction to the processor when the branch instruction is executed, such that the pipeline operation is not stalled whether or not the branch is taken.
The system of claim 41 wherein:

A subsequent instruction of the branch target instruction is selected from the memory using a branch target address from the addressing unit; and

The next instruction of the current instruction is selected from the read buffer using the program counter offset from the processor.
The system of claim 42 wherein:

A control signal from the processor indicating whether the branch is taken is used to select either the branch target instruction from the memory or the instruction following the current instruction from the read buffer.
The system of claim 41 wherein:

The branch target instruction address sent to the memory can be latched according to the type of the current instruction.
The system of claim 41 wherein:

The addressing unit includes a track table including a plurality of track table rows corresponding to a plurality of tracks, each table row corresponding to one track and including a plurality of entries, each entry corresponding to one track point, and each track point corresponding to at least one instruction.
The system of claim 45 wherein:

The branch target instruction address of the branch instruction is stored in the track table as the content of the entry corresponding to the branch instruction.
A pipeline control system for controlling processor pipeline operations; the processor coupled to a memory containing executable computer instructions and to an instruction read buffer that is faster than the memory; characterized in that the pipeline control system comprises:

a pre-detection control unit for controlling a leading pointer to move along the read buffer faster than the current instruction pointer, which points to the instruction currently being executed by the processor core; the pre-detection control unit further examines the instructions passed by the leading pointer, thereby extracting instruction information that includes at least branch instruction information and information on the instruction that last updates the branch condition or condition flag of the branch instruction, such that the leading pointer stops at at least one branch instruction; and

a time point detecting unit configured to make the branch decision after the instruction that last updates the branch condition or condition flag of the branch instruction has been executed, so that it is determined, before the branch instruction is executed, whether the instruction to be executed next by the processor is the branch target instruction or the instruction following the branch instruction, such that the pipeline operation is not stalled whether or not the branch is taken.
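
As a non-limiting illustration, the examination performed as the leading pointer advances can be sketched as a scan that stops at the first branch instruction and remembers the position of the last instruction that updated the branch condition or condition flag. Representing each instruction by a string label ("set_flag", "branch", "other") and the function name `scan_ahead` are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def scan_ahead(kinds: List[str], start: int) -> Tuple[Optional[int], Optional[int]]:
    """Advance a leading pointer from `start` until a branch is found.

    Returns (branch position, position of the last flag-updating instruction
    seen before it); either value is None if no such instruction was passed.
    """
    last_update = None
    for position in range(start, len(kinds)):
        if kinds[position] == "set_flag":
            last_update = position  # remember the most recent flag update
        elif kinds[position] == "branch":
            return position, last_update  # the leading pointer stops here
    return None, last_update


segment = ["other", "set_flag", "other", "set_flag", "other", "branch", "other"]
branch_position, update_position = scan_ahead(segment, start=0)
```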
The system according to claim 47, wherein said pipeline control system is further configured to: establish a track corresponding to the instruction segment based on the extracted instruction information; said track comprising a plurality of track points, each corresponding to an instruction in the instruction segment.
The system of claim 47 wherein said pipeline control system is further configured to:

store, in a corresponding location register, the position information of each instruction that updates the branch condition or condition flag of the branch instruction;

compare the current instruction pointer with the position information stored in the location register for the corresponding at least one branch instruction; and

generate a signal to perform the branch decision if the current instruction pointer is greater than or equal to the position information stored in the location register.
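
As a non-limiting illustration, the comparison against the location register can be sketched as follows. Representing pointers and stored positions as plain integers, and the name `decision_signal`, are assumptions made for the example.

```python
def decision_signal(current_pointer: int, stored_position: int) -> bool:
    """Signal the branch decision once execution has reached the stored position."""
    return current_pointer >= stored_position


stored_position = 3   # position of the last flag-updating instruction
branch_position = 5   # position of the branch instruction itself

# Evaluate the signal as the current instruction pointer advances to the branch.
signals = [decision_signal(pointer, stored_position)
           for pointer in range(branch_position + 1)]
first_ready = signals.index(True)
```

In this example the signal is first raised at position 3, two positions before the branch at position 5, which leaves time to select between the branch target instruction and the following instruction before the branch itself is executed.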
The system of claim 47 wherein:

The branch instruction information includes direct addressing branch instruction information and indirect addressing branch instruction information.