US5467473A - Out of order instruction load and store comparison - Google Patents

Out of order instruction load and store comparison

Info

Publication number
US5467473A
Authority
US
United States
Prior art keywords
instructions
instruction
sequence
load
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US08/001,976
Inventor
James A. Kahle
Chin-Cheng Kau
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US08/001,976 (patent US5467473A)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION (assignment of assignors interest; assignors: KAHLE, JAMES ALLAN; KAU, CHIN-CHENG)
Priority to JP5269218A (patent JP2597811B2)
Priority to EP93120937A (patent EP0605869A1)
Application granted
Publication of US5467473A
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3834 Maintaining memory consistency

Definitions

  • the present invention generally relates to processing instructions in a computer system. More particularly, a system for processing out of order load and store instructions is provided with the capability to check for the correct execution of load operations relative to store operations.
  • U.S. Pat. No. 4,630,195 describes a system for determining data dependency wherein data transfer commands are generated and a register in local storage is assigned to the data being transferred. A tag that identifies the register in which the data is stored and subsequent data transfer is compared to the stored tags to determine any potential dependencies.
  • IBM Technical Disclosure Bulletin Vol. 30, No. 1, June 1987, Pages 191-192 discusses a problem using pipelined architecture wherein during the execution of a store instruction the possibility exists that the data could already be stored in a memory location previously loaded into the pipeline. An expression is derived that will determine if the difference between a program counter and the memory address is equal to one or two, which would be the case if the data is in a memory location already loaded into the pipeline.
  • U.S. Pat. No. 4,965,716 describes a priority queue wherein elements are kept in an unsorted stack. The stack is searched for the next highest priority element after the highest priority element has been read from a holding register. A priority comparison is implemented to determine the highest priority element.
  • In U.S. Pat. No. 4,574,349, a program instruction calls for transfer of data from a particular main storage location to a general purpose register. This system uses pointers to allow subsequent load instructions involving the same particular main storage location to make data stored in a hardware register immediately available to a central processing unit if the data from the previous load instruction is still stored in one of the hardware registers.
  • U.S. Pat. No. 4,697,233 ensures data integrity by including a compare stack in a computer system having a pipelined architecture.
  • the stack structure is partially duplexed such that a predetermined number of bits of each data word are stored in a compare stack.
  • the bits are available for comparison to bits stored in the stack registers to determine that proper decoding has occurred, thereby ensuring data integrity of data in the pipeline.
  • U.S. Pat. No. 4,638,429 discusses a data processing apparatus using pipeline control that includes an Operand Store Conflict (OSC) circuit for detecting if a succeeding instruction uses an operand to be modified by a preceding instruction whose store operation has not been completed.
  • the OSC detects when the execution result of an in order store instruction whose store operation has not been completed is utilized as an operand of a fetch instruction which exceeds the store operation.
  • a control unit aligns the fetched operand to the operand position of the succeeding instruction and merges them accordingly.
  • the prior art does not provide any type of system which compares load and store operations to allow for out of order execution of instructions.
  • Some conventional systems are capable of determining any data dependency conflicts that may be present prior to the execution of the instructions, but do not compare the sequence of data load and store operations subsequent to execution of the instructions. In fact, it is not possible to detect load and store conflicts prior to execution, since the memory address for the store data is not generated until the store instruction is executed. Conflicts between data load and store operations occur fairly infrequently; however, it is essential that these conditions be detected and corrected. Failure to correct these conflicts will result in processing errors, such as incorrect data, and the like. Thus, it can be seen that a need exists for a system that can detect out of order load and store instructions which cause conflicts when executed.
  • the present invention provides a system including an instruction cache unit (ICU) and dispatch unit for providing instructions to a processor bus.
  • the dispatch unit is capable of altering the order of the instructions to obtain greater processing efficiency.
  • a completion unit is also included which maintains the order of the instructions as they were provided from the ICU to the dispatch (prior to any reordering).
  • At least one load/store unit is provided for loading instructions to a processing unit, such as a fixed point unit, floating point unit, or the like.
  • load and store queues are provided that include the addresses of the instructions. During execution of the store instruction, the address is compared to the address of previously executed load instructions, in a load queue, which executed out of order ahead of the store.
  • a program counter compares the program number of the store instruction being executed with the program number of the load instruction in the load queue.
  • the present invention compares the addresses for the load and store instructions, and the program number for these instructions. If the addresses are not the same, then no problem exists. That is, if the addresses are not the same, then no conflict exists between the instructions being compared, because the data is not at the same address in memory. Further, if the address is the same, and the store program number is greater than the load program number then the instructions have been executed in order (the load correctly preceded the store) and no problem exists.
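The comparison rule described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the function name `load_store_conflict` and its parameters are hypothetical, and addresses and program numbers are modeled as plain integers.

```python
def load_store_conflict(store_addr: int, store_pn: int,
                        load_addr: int, load_pn: int) -> bool:
    """Return True when an already-executed load conflicts with this store.

    Hypothetical helper: the names are assumptions made for illustration.
    """
    if store_addr != load_addr:
        # Different memory locations: no conflict is possible.
        return False
    # Same address: the load was valid only if it precedes the store in
    # program order, i.e. the store program number is the greater one.
    return store_pn < load_pn


# A store with program number 1 executing after a load with program
# number 2 to the same address is a conflict; program number 4 is not.
print(load_store_conflict(0xA, 1, 0xA, 2))  # True
print(load_store_conflict(0xA, 4, 0xA, 2))  # False
```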
  • FIG. 1 is a schematic diagram of the components of a system that allows for out of order execution of load and store operations used in conjunction with the present invention
  • FIG. 2 is an example of instructions that may be executed out of order to increase processing efficiency
  • FIG. 3 shows a block diagram including the load and store queues of the present invention connected to the cache used by the load/store units;
  • FIG. 4 is an example of a set of instructions that cannot be executed out of order and will be detected by the present invention
  • FIG. 5 shows the load queue of the present invention and how the store instructions are compared with the load instructions to determine impermissible out of order conditions
  • FIG. 6 is a flow chart showing the sequence of events that the present invention utilizes in order to detect out of order load and store conflicts.
  • Referring to FIG. 1, various elements of a processor system are shown. It should be noted that these elements may be included on multiple integrated circuit devices (chips) or incorporated onto a single device.
  • the processor of the present invention is incorporated on a single chip which is one of the PowerPC processors designed and marketed by the IBM Corporation. (PowerPC is a trademark of the IBM Corp.).
  • any processing system which includes the capability to alter the order of instructions is contemplated by the scope of the present invention.
  • various models of the IBM RISC System/6000 reduced instruction set computer may be capable of executing floating point instructions out of order and as such, would be capable of using the present invention.
  • FIG. 1 includes an instruction cache unit (ICU) 1 which contains instructions received from an operating system, such as the IBM AIX system (AIX is a trademark of the IBM Corp.), an application program, or the like. These instructions are to be executed by the processing system of the present invention.
  • FPU floating point unit
  • FXU fixed point unit
  • a dispatch unit 5 and branch processing unit 7 are also included in the system of FIG. 1.
  • the dispatch unit receives instructions from the ICU 1 and organizes these instructions prior to their execution.
  • Dispatch unit 5 determines if there are instruction efficiencies that can be exploited by altering the order of the instructions.
  • the branch unit 7 operates in conjunction with dispatch unit 5 and is utilized to reduce any pipeline penalty caused by branch instructions.
  • Three types of branch executions are possible: an unconditional branch, a conditional branch that is not taken and a conditional branch that is taken.
  • the unconditional branch and conditional branch which are not taken may not require an apparent machine cycle (zero-cycle branches), while the conditional branch that is taken may have a delay of up to three cycles.
  • the efficiency achieved by the dispatch and branch units, i.e. by exploiting the instruction interdependencies, may cause the branch unit to be transparent to the system. In other words, the increased efficiency (cycles saved) makes up for the cycles used by the dispatch and branch units.
  • the branch unit 7 and dispatch unit 5 are described in the IBM RISC System/6000 Technology Publication.
  • dispatch unit 5 operates with branch unit 7 to allow reordering of instructions, prior to their execution. More particularly, the conditional branch instruction may be loaded into an instruction queue (not shown) within dispatch 5 for execution by the branch unit.
  • the presence of a conditional branch is predicted. If the conditional branch is predicted as "not taken", the sequential instructions in the dispatch unit 5 are simply executed. However, if this prediction is incorrect, the instruction queue in the dispatch unit 5 must be purged of the sequential instructions which follow the conditional branch instructions and target instructions must be fetched from the ICU.
  • if the conditional branch is predicted as "taken", target instructions are fetched and utilized to follow the conditional branch, if the prediction is resolved as correct.
  • however, if the prediction is incorrect, the target instructions must be purged and the sequential instructions which follow the conditional branch instruction in program order must be retrieved.
  • the dispatch status of instructions within an instruction buffer is periodically determined. In response to a status of the instructions at the beginning of the instruction buffer, the remaining instructions are shifted within the instruction buffer and a partial group of instructions are loaded into the instruction buffer from the instruction queue, utilizing a selectively controlled multiplex circuit.
  • the dispatch unit 5 is capable of providing out of order instructions to the processing units shown in FIG. 1, such as the FXU 12 and FPU 11.
  • the previously noted co-pending patent application, "Method and System for Increased Instruction Dispatch Efficiency in a Superscalar Processor System", filed Jan. 8, 1993, having attorney docket number 08/001,867, more fully describes the operation of dispatch unit 5 and is hereby incorporated by reference.
  • An order queue 3 is provided that will allow the completion logic unit 15 to maintain the sequence of instructions as they are provided from the ICU 1 to the dispatch unit 5.
  • the instructions in the order queue 3 are used as a reference by the completion logic unit 15 to allow reordering of instructions back to their initial sequence, if required.
  • Data bus 2 is utilized to provide instructions to at least one load/store (L/S) unit 9, which is utilized to load and store a general purpose register (GPR) file.
  • the L/S unit 9 retrieves data from the GPR to be loaded into the processing units for execution, e.g. FXU 11 and FPU 12, and stores data that had been manipulated by the processing units into the GPR for subsequent placement into a memory location.
  • An additional data bus 4 is provided that allows the L/S units 9 to communicate with the FXU and FPU, as well as dual port cache 13.
  • This cache 13 is also connected to system memory 21 (FIG. 3) and temporarily stores the data being loaded and stored into the processing units and memory, respectively.
  • Cache 13 is capable of loading and storing two data words from the load/store units 9.
  • Rename registers 17 allow for the renaming of data prior to its being placed into system registers 19, and provide the ability to more efficiently place data in the system registers 19.
  • These system registers are a set of architecture registers which are organized by number and used to input data to the processing units, such as the FPU and FXU. Once the data is placed in the system registers they are committed to the architecture. However, the rename registers are a temporary pool of system registers which hold the data prior to input to the system registers. If data is placed in the rename registers, it is not yet committed to the architecture and can be removed prior to being input.
  • FIG. 2 shows some representative instructions that may be executed out of order to improve processing efficiency.
  • Instruction 0 will load data into register R1, e.g. into register corresponding to processing units 11, 12, from memory location (address) A.
  • Instruction 1 then stores new data from register R2 into memory location A, while instruction 2 loads data from memory location B into register R3.
  • Instruction 3 adds the contents of register R1 to the contents of register R3 and places the result in register R4. If the sequence of instructions is executed in this manner, it can be seen that four functional steps are required.
  • one step loads the data from memory A to register R1
  • a second step stores data from register R2 into memory A
  • a third step loads data from memory B into register R3
  • a fourth step adds the contents of registers R3 and R1 and places the sum in register R4.
  • the same functions can be achieved in a smaller number of steps. For example, if instruction 1 (store) is placed after the load instructions (0 and 2), then the load instructions can be executed in parallel during a single step. More particularly, instructions 0 and 2 can be executed simultaneously since the dual port cache 13 can retrieve two data words from memory. In this example, the contents of memory location A will be loaded into register R1 and the contents of memory location B will be loaded into register R3 during the first step.
  • instruction 3 adds the contents of registers R1 and R3 and puts the result in register R4 during the second step.
  • manipulated data is stored into memory location A from register R2 in the third step.
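The equivalence of the two orderings in the FIG. 2 example can be checked with a small sketch. It assumes a toy sequential interpreter (the `run` function, the tuple encoding of instructions and the initial memory and register values are all hypothetical); the parallel issue of the two loads through the dual port cache is not modeled, only the final state.

```python
def run(program, memory, regs):
    """Execute a list of toy instructions and return the final state."""
    memory, regs = dict(memory), dict(regs)  # work on copies
    for op, *args in program:
        if op == "load":
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "store":
            reg, addr = args
            memory[addr] = regs[reg]
        elif op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
    return memory, regs


# Program order of FIG. 2: load, store, load, add.
original = [("load", "R1", "A"), ("store", "R2", "A"),
            ("load", "R3", "B"), ("add", "R4", "R1", "R3")]
# Reordered: both loads first, then the add, then the delayed store.
reordered = [("load", "R1", "A"), ("load", "R3", "B"),
             ("add", "R4", "R1", "R3"), ("store", "R2", "A")]

memory = {"A": 10, "B": 20}   # assumed initial contents
regs = {"R2": 99}             # assumed new data to be stored

print(run(original, memory, regs) == run(reordered, memory, regs))
```

Delaying the store is safe here because the load from A precedes the store in program order, so the load reads the old value in both orderings.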
  • Load queue 31 is provided for storing the address of load instructions which executed prior to the store instruction, along with the program number of the load instruction, as tracked by a program counter. These load addresses and program numbers are received from the load/store unit 9 and are removed from the load queue 31 when they are committed to the architecture, e.g. when no exception exists and no other condition, such as a pending interrupt, is present.
  • a finish store queue 33 is also interconnected to the load store unit and will maintain the address and program number of executed store instructions which have not yet been committed to the architecture.
  • the address and program number of these executed store instructions are compared to the address and program number, in the load queue, of the load instructions in accordance with the present invention.
  • the address and program number of store instructions which have been executed, but are not yet committed to be stored in memory, are placed in finish store queue 33.
  • the results of the store operations (data and address) will be placed in completed store queue 35, when the result is committed to the architecture.
  • the store operations in completed store queue 35 will be placed in memory 21, via cache 13, by the completion logic unit 15.
  • Load queue 31, finish store queue 33 and completed store queue 35 are all a part of the queuing system of the processing system of the present invention.
  • the load queue 31, finish store queue 33 and completed store queue 35 are discrete components of the processing system of the present invention and can be physically located at various locations on the chip. These queues (31, 33, 35) are associated with cache 13 since the addresses stored therein must be provided to the cache so that the data can be retrieved from and stored to the memory 21. In a preferred embodiment of the present invention, the queues 31, 33, 35 are 64-bit registers which are capable of storing approximately 34 sets of addresses and program numbers.
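How entries might move between the three queues can be sketched as follows. This is a software model under stated assumptions, not the hardware: the class and method names are hypothetical, store data is assumed to travel with its finish store queue entry, and the fixed capacity of the queues is ignored.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LoadEntry:
    address: int
    program_number: int


@dataclass
class StoreEntry:
    address: int
    program_number: int
    data: int  # assumption: data is kept alongside the address


class QueuingSystem:
    def __init__(self):
        self.load_queue = []                  # models load queue 31
        self.finish_store_queue = []          # models finish store queue 33
        self.completed_store_queue = deque()  # models completed store queue 35

    def load_executed(self, address, program_number):
        self.load_queue.append(LoadEntry(address, program_number))

    def store_executed(self, address, program_number, data):
        self.finish_store_queue.append(StoreEntry(address, program_number, data))

    def commit(self, program_number):
        """Commit one instruction to the architecture."""
        # Committed loads leave the load queue.
        self.load_queue = [e for e in self.load_queue
                           if e.program_number != program_number]
        # Committed stores move from the finish queue to the completed queue.
        for entry in list(self.finish_store_queue):
            if entry.program_number == program_number:
                self.finish_store_queue.remove(entry)
                self.completed_store_queue.append((entry.address, entry.data))

    def drain_to_memory(self, memory):
        """Write completed stores to memory in the order they were committed."""
        while self.completed_store_queue:
            address, data = self.completed_store_queue.popleft()
            memory[address] = data
```

A store therefore reaches memory only after it has been committed, which is what allows an uncommitted store to be compared against the load queue first.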
  • FIG. 4 will now be described to illustrate the problems that can be encountered when certain instructions are executed out of order and why it is necessary to detect this situation.
  • Instruction 0 adds the contents of certain hardware registers R3 and R4 and places the result into register R1. Instruction 1 then stores the contents of register R1 in memory location A. Next, instruction 2 loads the data from memory A into a register R5, and instruction 3 then adds the contents of register R5 with register R6 and places the result in register R7. Finally, instruction 4 stores the contents of register R7 into memory location A.
  • FIG. 5 is a schematic representation of the load queue 31.
  • the load/store unit 9 will place every load instruction address and program number in the load queue. Then, when a store instruction executes, a comparison is made between the address being generated by the load/store unit 9 during execution of the store instruction and the addresses (corresponding to the load instruction) in the load queue. Additionally, the program number for the store instruction being executed is compared with program numbers 49 of the load instructions in the load queue.
  • the present invention determines if the address for the store operation is the same as one of the addresses 47 for a load instruction in the load queue.
  • FIG. 5 illustrates a load queue 31 having five positions for load instruction addresses and program numbers. Of course, five positions are used for illustrative purposes only and a load queue having additional positions is contemplated by the present invention.
  • the store address from L/S 9 is compared to addresses 47 by a comparator 45 and the program number from a program counter in the load/store 9 is compared with the program number 49 for the loads in load queue 31 by comparators 41 and 43.
  • Comparators 41 will determine if the program number from the store instruction is less than the program number of the load, and comparators 43 will determine if the store program number is greater than the load program number.
  • comparators 45 determine if the address of the store is the same as the address, in the load queue, of the load instruction.
  • comparators 41, 43 and 45 will be implemented in a Boolean logic array including a series of and, or, exclusive or, nand, nor gates, or the like. More particularly, specialized circuitry can be utilized for the address comparator which will compare, by using exclusive or (XOR) logic, the binary value of the address in the load queue and the binary value of the address being generated for the store. If the values are identical then the circuit will output a binary 1 and if the address values are different, then a binary 0 is output.
  • All of the addresses in the load queue are compared with the generated store address to determine if any identical addresses exist.
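The XOR-based address comparison can be mimicked in software as a rough analogy. In this sketch the function names are hypothetical and addresses are ordinary Python integers; a bitwise XOR of two equal values has no bits set, so the comparator outputs 1 exactly when the XOR result is zero.

```python
def address_match(load_addr: int, store_addr: int) -> int:
    """Return 1 if the two addresses are identical, otherwise 0."""
    # XOR sets a bit wherever the two addresses differ.
    return 1 if (load_addr ^ store_addr) == 0 else 0


def match_lines(load_queue_addresses, store_addr):
    """Compare the store address against every address in the load queue."""
    return [address_match(addr, store_addr) for addr in load_queue_addresses]


print(match_lines([0x10, 0x20, 0x10], 0x10))  # [1, 0, 1]
```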
  • specialized circuitry is also used to determine if the program number for the store instruction is less than the load instruction program numbers in the load queue.
  • a subtract circuit may be used which will subtract one program number from the other and determine which is less than the other based on whether the resulting value is positive, or negative. For example, if the store program number is subtracted from the load program number and the result is positive, then the store program number is less than the load number. However, if the result is negative, then the store program number is greater than the load program number.
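The subtract-based ordering test can be sketched as below. The function name is hypothetical, and the sketch assumes that program numbers are distinct, non-wrapping integers; how the hardware would handle a program counter that wraps around is not addressed here.

```python
def store_precedes_load(store_pn: int, load_pn: int) -> bool:
    """Return True if the store comes before the load in program order."""
    # Subtract the store program number from the load program number.
    difference = load_pn - store_pn
    # Positive: the store number is less than the load number.
    # Negative: the store number is greater than the load number.
    return difference > 0


print(store_precedes_load(1, 2))  # True
print(store_precedes_load(4, 2))  # False
```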
  • the present invention is capable of determining whether the store and load instructions are using the same memory, and which of these instructions should execute first.
  • Other embodiments include hardwired implementations and software comparators, particularly microcode, and are all contemplated by the scope of the present invention.
  • At step 1, the instruction set is retrieved from the instruction cache unit 1 and provided to the dispatch unit 5, which may then reorder the instructions to take advantage of any efficiencies, as described above in conjunction with FIG. 2 (step 2). It is then determined at step 3 whether a load or store instruction is being considered. If a load instruction is encountered, then the system proceeds to step 3a where the load instructions are executed. The load/store unit 9 then places the program number and address for the load instructions that have been executed in the load queue 31 at step 4. However, if a store instruction has been encountered at step 3, then the system jumps to step 5 and the store instructions are executed.
  • A comparison is then made between the address generated during execution of the store instruction and the load addresses in the load queue (step 6). If, at step 6, it is determined that the store addresses and load addresses are not equal, then the system proceeds to step 7 and continues execution of instructions. On the other hand, if the comparison of step 6 determines that the store address equals one of the load addresses in the load queue, then another comparison is made, at step 8, to determine if the program number for the store instruction is less than the program number of the load instruction. If the store program number is greater than the load program number, then the load operation properly occurred prior to the store operation for that memory address and no conflict exists. The operation then proceeds to step 7 and execution of instructions continues.
  • However, if the store program number is less than the load program number, the system proceeds to step 9, which marks the load as incorrectly executed and places the load instructions back in their original order. This reordering back to the original sequence is possible since the completion logic unit has maintained a record of the original sequence in which the instructions are provided from ICU 1 to dispatch unit 5, via order queue 3.
  • Step 10 then re-executes the load instructions so that the store will correctly precede the corresponding load instructions, since the store instructions have been allowed to complete.
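The flow of steps 3 through 10 can be approximated by a short loop over instructions in the order they actually executed. This is a simplified sketch: the function name and the `(kind, program_number, address)` tuple encoding are assumptions, removal of committed loads from the load queue is not modeled, and re-execution (step 10) is represented only by returning the program numbers of the marked loads.

```python
def detect_conflicts(executed):
    """Return program numbers of loads that must be re-executed.

    `executed` lists (kind, program_number, address) tuples in execution
    order, where kind is "load" or "store".
    """
    load_queue = []  # (address, program_number) of executed loads
    marked = []      # loads flagged as executed too early
    for kind, program_number, address in executed:      # step 3
        if kind == "load":                              # steps 3a and 4
            load_queue.append((address, program_number))
        elif kind == "store":                           # step 5
            for load_addr, load_pn in load_queue:       # step 6
                same_address = load_addr == address
                store_is_older = program_number < load_pn  # step 8
                if same_address and store_is_older and load_pn not in marked:
                    marked.append(load_pn)              # step 9
        # Otherwise execution simply continues (step 7).
    return marked  # step 10 would re-execute these loads


# Load 2 ran ahead of store 1 to the same address: it must be re-executed.
print(detect_conflicts([("load", 2, "A"), ("store", 1, "A"), ("store", 4, "A")]))
```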
  • assume that dispatch unit 5 has reordered the sequence of the instructions in the left column of FIG. 4 to execute in the sequence shown in the right column of FIG. 4.
  • the add instruction 0 is first executed followed by original load instruction 2, which loads the data in memory A to register R5. Since the load instruction has executed, its address (memory A) is placed in position 47a and program number (2) is placed in position 49a of load queue 31.
  • store instruction 1 is executed and its generated address (A) is compared with the addresses in position 47 of load queue 31. It can be seen that the store address is equal to the address in position 47a.
  • the present invention determines if the program number of the store instruction 1 is less than the program number in position 49a of load queue 31.
  • the store instruction program number 1 is less than the load instruction program number 2 in the load queue and a conflict is present. That is, the store instruction should have been executed prior to the load instruction.
  • These instructions must effectively be reordered to their original sequence and re-executed.
  • This re-execution may be implemented by one of various methods, such as actually putting all of the instructions back into their original sequence and re-executing all of them. Another method is to mark the incorrect out of order load instruction as having been executed too early, and then re-executing only those load instructions marked as incorrect, while letting the store instructions complete and put the results in memory.
  • the store instructions do not need to be re-executed since they have been allowed to execute and will precede the re-executed load instructions.
  • the marking of the load may be implemented by setting a flag bit to a binary 1 or 0. In this manner, marking the load is viewed by the processing system as an interrupt which will cause the incorrect load to be re-fetched from the instruction cache unit and re-executed.
  • the add instruction 3 which places the data from registers R5 and R6 into register R7 is then executed and the data from register R7 is stored in memory address A.
  • the present invention again compares the address of the store instruction with the address of the load instruction in position 47a of load queue 31. These addresses are equal (memory address A) and the program number of the store instruction 4 is then compared with the program number of the load instruction 2 in position 49a of load queue 31. In this case, the store instruction program number is greater than the load instruction program number, so the store properly executed after the load instruction. The processing system then continues normal execution operations.
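The walkthrough above can be replayed with concrete values to see why the early load is a problem and how re-executing it repairs the result. This is a toy model: the instruction encoding, the `run` function and the initial register and memory values are all assumptions, and only the marked load is re-executed, which suffices here because the dependent add (instruction 3) has not yet run when the conflict is detected.

```python
# Instructions of the FIG. 4 example, keyed by program number.
PROGRAM = {
    0: ("add", "R1", "R3", "R4"),
    1: ("store", "R1", "A"),
    2: ("load", "R5", "A"),
    3: ("add", "R7", "R5", "R6"),
    4: ("store", "R7", "A"),
}


def execute(instr, regs, memory):
    op = instr[0]
    if op == "add":
        _, dst, a, b = instr
        regs[dst] = regs[a] + regs[b]
    elif op == "load":
        _, dst, addr = instr
        regs[dst] = memory[addr]
    elif op == "store":
        _, src, addr = instr
        memory[addr] = regs[src]


def run(order, check=True):
    regs = {"R3": 1, "R4": 2, "R6": 10}  # assumed initial register values
    memory = {"A": 100}                  # assumed stale contents of A
    load_queue = []                      # (address, program number) pairs
    for pn in order:
        instr = PROGRAM[pn]
        execute(instr, regs, memory)
        if instr[0] == "load":
            load_queue.append((instr[2], pn))
        elif instr[0] == "store" and check:
            for addr, load_pn in load_queue:
                if addr == instr[2] and pn < load_pn:
                    # The load ran too early: re-execute it after the store.
                    execute(PROGRAM[load_pn], regs, memory)
    return regs, memory


in_order = run([0, 1, 2, 3, 4])
reordered = run([0, 2, 1, 3, 4])
unchecked = run([0, 2, 1, 3, 4], check=False)
print(reordered == in_order)  # the detected conflict is repaired
print(unchecked == in_order)  # without the check the stale value propagates
```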

Abstract

A processing system allows for out of order instruction execution and includes at least one load/store unit for loading instructions to a register for processing by a fixed point unit, floating point unit, or the like, and storing the results to memory. A load queue maintains the addresses and program numbers of the load instructions. During execution the address of the store instruction is compared to the address in the load queue of previously executed load instructions. A program counter compares the program number of the store instruction with the program number of the load instruction in the load queue. If the addresses are different, then no impermissible out of order situation exists between the load and store instructions being compared, because the data is not at the same address. If the address is the same, and the store program number is greater than the load program number, then the instructions have been executed in order (the load correctly preceded the store) and no problem exists. However, if the addresses are the same and the load instruction has been incorrectly reordered to precede the store instruction, then a reordering conflict exists and the load instructions must be re-executed.

Description

CROSS REFERENCE TO RELATED APPLICATIONS
"Data Processing system with Multiple Execution Units Capable of Executing Instructions Out of Sequence", Ser. No. 07/750,132, filed Aug. 26, 1991, assigned to IBM Corporation.
"Method and System for Increased Instruction Synchronization Efficiency in a Superscalar Processor System", filed Jan. 8, 1993, having attorney docket number 08/001,863, assigned to IBM Corporation.
"Method and System for Increased Instruction Dispatch Efficiency in a Superscalar Processor System", filed Jan. 8, 1993, having attorney docket number 08/001,867 assigned to IBM Corporation.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to processing instructions in a computer system. More particularly, a system for processing out of order load and store instructions is provided with the capability to check for the correct execution of load operations relative to store operations.
2. Description of Related Art
Currently, computer systems are generally available that provide for out of order execution of load and store operations. It is known in the art that processing speed is enhanced if store instructions can be executed after the load instructions, i.e. delay executing the store operations as long as possible. However, a problem exists for certain sequences of instructions since data that is to be loaded from a particular memory address may not have had the correct value previously stored therein. In this case, the store instruction must precede the load instruction. The present invention detects this situation and reloads the queue with the required instruction.
U.S. Pat. No. 4,630,195 describes a system for determining data dependency wherein data transfer commands are generated and a register in local storage is assigned to the data being transferred. A tag that identifies the register in which the data is stored and subsequent data transfer is compared to the stored tags to determine any potential dependencies. IBM Technical Disclosure Bulletin Vol. 30, No. 1, June 1987, Pages 191-192, discusses a problem using pipelined architecture wherein during the execution of a store instruction the possibility exists that the data could already be stored in a memory location previously loaded into the pipeline. An expression is derived that will determine if the difference between a program counter and the memory address is equal to one or two, which would be the case if the data is in a memory location already loaded into the pipeline.
U.S. Pat. No. 4,965,716 describes a priority queue wherein elements are kept in an unsorted stack. The stack is searched for the next highest priority element after the highest priority element has been read from a holding register. A priority comparison is implemented to determine the highest priority element. In U.S. Pat. No. 4,574,349, a program instruction calls for transfer of data from a particular main storage location to a general purpose register. This system uses pointers to allow subsequent load instructions involving the same particular main storage location to make data stored in a hardware register immediately available to a central processing unit if the data from the previous load instruction is still stored in one of the hardware registers.
U.S. Pat. No. 4,697,233 ensures data integrity by including a compare stack in a computer system having a pipelined architecture. The stack structure is partially duplexed such that a predetermined number of bits of each data word are stored in a compare stack. At readout, the bits are available for comparison to bits stored in the stack registers to determine that proper decoding has occurred, thereby ensuring data integrity of data in the pipeline.
U.S. Pat. No. 4,638,429 discusses a data processing apparatus using pipeline control that includes an Operand Store Conflict (OSC) circuit for detecting if a succeeding instruction uses an operand to be modified by a preceding instruction whose store operation has not been completed. The OSC detects when the execution result of an in order store instruction whose store operation has not been completed is utilized as an operand of a fetch instruction which succeeds the store operation. When a conflict is detected and a store operation is the preceding instruction, a control unit aligns the fetched operand to the operand position of the succeeding instruction and merges them accordingly.
The prior art does not provide any type of system which compares load and store operations to allow for out of order execution of instructions. Some conventional systems are capable of determining any data dependency conflicts that may be present prior to the execution of the instructions, but do not compare the sequence of data load and store operations subsequent to execution of the instructions. In fact, it is not possible to detect load and store conflicts prior to execution, since the memory address for the store data is not generated until the store instruction is executed. Conflicts between data load and store operations occur fairly infrequently, however it is essential that these conditions be detected and corrected. Failure to correct these conflicts will result in processing errors, such as incorrect data, and the like. Thus, it can be seen that a need exists for a system that can detect out of order load and store instructions which cause conflicts when executed.
SUMMARY OF THE INVENTION
In contrast to the prior art, the present invention provides a system including an instruction cache unit (ICU) and dispatch unit for providing instructions to a processor bus. The dispatch unit is capable of altering the order of the instructions to obtain greater processing efficiency. A completion unit is also included which maintains the order of the instructions as they were provided from the ICU to the dispatch (prior to any reordering). At least one load/store unit is provided for loading instructions to a processing unit, such as a fixed point unit, floating point unit, or the like. Further, load and store queues are provided that include the addresses of the instructions. During execution of the store instruction, the address is compared to the address of previously executed load instructions, in a load queue, which executed out of order ahead of the store. A program counter compares the program number of the store instruction being executed with the program number of the load instruction in the load queue. The present invention then compares the addresses for the load and store instructions, and the program number for these instructions. If the addresses are not the same, then no problem exists. That is, if the addresses are not the same, then no conflict exists between the instructions being compared, because the data is not at the same address in memory. Further, if the address is the same, and the store program number is greater than the load program number then the instructions have been executed in order (the load correctly preceded the store) and no problem exists. However, if the addresses are the same and the load instruction had been incorrectly reordered to precede the store instruction, which is detected since the program number for the store instruction is less than the load instruction, then a problem exists and at least a portion of the instructions must be reordered and re-executed prior to providing them to the system.
Other objects, features and advantages will be apparent to those skilled in the art from the subsequent description taken in conjunction with the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the components of a system that allows for out of order execution of load and store operations used in conjunction with the present invention;
FIG. 2 is an example of instructions that may be executed out of order to increase processing efficiency;
FIG. 3 shows a block diagram including the load and store queues of the present invention connected to the cache used by the load/store units;
FIG. 4 is an example of a set of instructions that cannot be executed out of order and will be detected by the present invention;
FIG. 5 shows the load queue of the present invention and how the store instructions are compared with the load instructions to determine impermissible out of order conditions; and
FIG. 6 is a flow chart showing the sequence of events that the present invention utilizes in order to detect out of order load and store conflicts.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring to FIG. 1, various elements of a processor system are shown. It should be noted that these elements may be included on multiple integrated circuit devices (chip) or incorporated onto a single device. In the preferred embodiment, the processor of the present invention is incorporated on a single chip which is one of the PowerPC processors designed and marketed by the IBM Corporation. (PowerPC is a trademark of the IBM Corp.). However, it will be understood by those skilled in the art that any processing system which includes the capability to alter the order of instructions is contemplated by the scope of the present invention. For example, various models of the IBM RISC System/6000 reduced instruction set computer may be capable of executing floating point instructions out of order and as such, would be capable of using the present invention.
Delaying the execution of store instructions will enhance processor performance since store operations place data, which is the result of manipulation by a processing unit, into a memory location, via a cache, buffer, or the like. On the other hand, load operations put data into the system registers, e.g. a floating point register, to be manipulated by a processing unit, thus performing actual computing operations. Therefore, it can be seen that doing as many load operations as possible, without using machine cycles to store data into memory, will enhance system performance. However, if a load instruction needs to retrieve data from a memory location, and the correct data has not yet been stored to that memory location, then a problem exists. That is, the correct data has not yet been stored to the memory location, due to the out of order instructions. The present invention detects and solves this problem by determining when an impermissible out of order condition exists and then reordering and re-executing the instructions.
FIG. 1 includes an instruction cache unit (ICU) 1 which contains instructions received from an operating system, such as the IBM AIX system (AIX is a trademark of the IBM Corp.), an application program, or the like. These instructions are to be executed by the processing system of the present invention. In particular, at least one floating point unit (FPU) 12 and at least one fixed point unit (FXU) 11 are included and shown in FIG. 1. These processing units are known in the art and described in detail in the IBM RISC System/6000 Technology publication, hereby incorporated by reference. More particularly, the FPU is described at pages 34-42 and the FXU is discussed at pages 24-32.
A dispatch unit 5 and branch processing unit 7 are also included in the system of FIG. 1. The dispatch unit receives instructions from the ICU 1 and organizes these instructions prior to their execution. Dispatch unit 5 determines if there are instruction efficiencies that can be exploited by altering the order of the instructions.
The branch unit 7 operates in conjunction with dispatch unit 5 and is utilized to reduce any pipeline penalty caused by branch instructions. Three types of branch executions are possible, an unconditional branch, a conditional branch that is not taken and a conditional branch that is taken. The unconditional branch and conditional branch which are not taken may not require an apparent machine cycle (zero-cycle branches), while the conditional branch that is taken may have a delay of up to three cycles. It will be understood that the efficiency achieved by the dispatch and branch units, i.e. exploiting the instruction interdependencies, may cause the branch unit to be transparent to the system. In other words, the increased efficiency (cycles saved) makes up for the cycles used by the dispatch and branch units. The branch unit 7 and dispatch unit 5 are described in the IBM RISC System/6000 Technology Publication.
Additionally, dispatch unit 5 operates with branch unit 7 to allow reordering of instructions, prior to their execution. More particularly, the conditional branch instruction may be loaded into an instruction queue (not shown) within dispatch 5 for execution by the branch unit. In an effort to minimize run-time delays in a pipelined processor, such as the superscalar processor of the present invention, the presence of a conditional branch is predicted. If the conditional branch is predicted as "not taken", the sequential instructions in the dispatch unit 5 are simply executed. However, if this prediction is incorrect, the instruction queue in the dispatch unit 5 must be purged of the sequential instructions which follow the conditional branch instructions and target instructions must be fetched from the ICU. Alternately, if the conditional branch is predicted as "taken", then the target instructions are fetched and utilized to follow the conditional branch, if the prediction is resolved as correct. Of course, if the prediction of "taken" is incorrect, the target instructions must be purged and the sequential instructions which follow the conditional branch instruction in program order must be retrieved. In general, the dispatch status of instructions within an instruction buffer, from which the dispatch unit dispatches instructions, is periodically determined. In response to a status of the instructions at the beginning of the instruction buffer, the remaining instructions are shifted within the instruction buffer and a partial group of instructions are loaded into the instruction buffer from the instruction queue, utilizing a selectively controlled multiplex circuit. In this manner, additional instructions may be dispatched to available processing units without requiring a previous group of instructions to be completely dispatched. Thus, the dispatch unit 5 is capable of providing out of order instructions to the processing units shown in FIG. 
1, such as the FXU 11 and FPU 12. Previously noted Co-pending Patent Application, "Method and System for Increased Instruction Dispatch Efficiency in a Superscalar Processor System", filed Jan. 8, 1993, having attorney docket number 08/001,867, more fully describes the operation of dispatch unit 5 and is hereby incorporated by reference.
An order queue 3 is provided that will allow the completion logic unit 15 to maintain the sequence of instructions as they are provided from the ICU 1 to the dispatch unit 5. The instructions in the order queue 3 are used as a reference by the completion logic unit 15 to allow reordering of instructions back to their initial sequence, if required. Data bus 2 is utilized to provide instructions to at least one load/store (L/S) unit 9, which is utilized to load and store a general purpose register (GPR) file. The L/S unit 9 retrieves data from the GPR to be loaded into the processing units for execution, e.g. FXU 11 and FPU 12, and stores data that had been manipulated by the processing units into the GPR for subsequent placement into a memory location. An additional data bus 4 is provided that allows the L/S units 9 to communicate with the FXU and FPU, as well as dual port cache 13. This cache 13 is also connected to system memory 21 (FIG. 3) and temporarily stores the data being loaded and stored into the processing units and memory, respectively. Cache 13 is capable of loading and storing two data words from the load/store units 9.
Rename registers 17 allow for the renaming of data prior to its being placed into system registers 19, and provide the ability to more efficiently place data in the system registers 19. These system registers are a set of architecture registers which are organized by number and used to input data to the processing units, such as the FPU and FXU. Once the data is placed in the system registers they are committed to the architecture. However, the rename registers are a temporary pool of system registers which hold the data prior to input to the system registers. If data is placed in the rename registers, it is not yet committed to the architecture and can be removed prior to being input.
FIG. 2 shows some representative instructions that may be executed out of order to improve processing efficiency. Instruction 0 will load data into register R1, e.g. into a register corresponding to processing units 11, 12, from memory location (address) A. Instruction 1 then stores new data from register R2 into memory location A, while instruction 2 loads data from memory location B into register R3. Instruction 3 adds the contents of register R1 to the contents of register R3 and places the result in register R4. If the sequence of instructions is executed in this manner, it can be seen that four functional steps are required. That is, one step loads the data from memory A to register R1, a second step stores data from register R2 into memory A, a third step loads data from memory B into register R3, and a fourth step adds the contents of registers R3 and R1 and places the sum in register R4. However, if the instruction sequence is altered, the same functions can be achieved in a smaller number of steps. For example, if instruction 1 (store) is placed after the load instructions (0 and 2), then the load instructions can be executed in parallel during a single step. More particularly, instructions 0 and 2 can be executed simultaneously since the dual port cache 13 can retrieve two data words from memory. In this example, the contents of memory location A will be loaded into register R1 and the contents of memory location B will be loaded into register R3 during the first step. Next, instruction 3 adds the contents of registers R1 and R3 and puts the result in register R4 during the second step. Finally, manipulated data is stored into memory location A from register R2 in the third step. Thus, by altering the instruction sequence, the same operations can be performed with less processing time being utilized. 
The previous example is extremely simple, however those skilled in the art will understand how this type of instruction reordering can provide enormous savings of processing resources.
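The FIG. 2 example can be illustrated with a minimal software sketch. The register and memory values below are invented for illustration, and the `run` helper and its tuple encoding of instructions are hypothetical; the sketch only shows that the reordered schedule reaches the same final state in three steps instead of four.

```python
# Sketch of the FIG. 2 example; all data values are made up for illustration.
def run(schedule, regs, mem):
    """Execute a schedule given as a list of steps. Each step is a list of
    instructions that are assumed to issue together in one step."""
    regs, mem = dict(regs), dict(mem)
    for step in schedule:
        for op, dst, src in step:
            if op == "load":       # register <- memory
                regs[dst] = mem[src]
            elif op == "store":    # memory <- register
                mem[dst] = regs[src]
            elif op == "add":      # register <- sum of two registers
                regs[dst] = regs[src[0]] + regs[src[1]]
    return regs, mem, len(schedule)

I0 = ("load", "R1", "A")
I1 = ("store", "A", "R2")
I2 = ("load", "R3", "B")
I3 = ("add", "R4", ("R1", "R3"))

regs0 = {"R1": 0, "R2": 7, "R3": 0, "R4": 0}
mem0 = {"A": 10, "B": 20}

in_order = [[I0], [I1], [I2], [I3]]     # four functional steps
reordered = [[I0, I2], [I3], [I1]]      # both loads share the first step

r1, m1, steps1 = run(in_order, regs0, mem0)
r2, m2, steps2 = run(reordered, regs0, mem0)
print(steps1, steps2, r1 == r2 and m1 == m2)  # prints: 4 3 True
```

The final state is the same here because the load of instruction 0 precedes the store of instruction 1 to address A in both schedules.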
Referring to FIG. 3, specific elements of the present invention are shown in relation to the cache 13, load/store unit 9 and memory 21, previously described. Load queue 31 is provided for storing the address of load instructions which executed prior to the store instruction, along with the program number of the load instruction, as tracked by a program counter. These load addresses and program numbers are received from the load/store unit 9 and are removed from the load queue 31 when they are committed to the architecture, e.g. when no exception exists and no other condition, such as a pending interrupt, is present. A finish store queue 33 is also interconnected to the load/store unit and will maintain the address and program number of executed store instructions which have not yet been committed to the architecture. The address and program number of these executed store instructions are compared to the address and program number, in the load queue, of the load instructions in accordance with the present invention. The address and program number of store instructions which have been executed, but are not yet committed to be stored in memory, are placed in finish store queue 33. Next, the results of the store operations (data and address) will be placed in completed store queue 35, when the result is committed to the architecture. The store operations in completed store queue 35 will be placed in memory 21, via cache 13, by the completion logic unit 15. Load queue 31, finish store queue 33 and completed store queue 35 are all a part of the queuing system of the processing system of the present invention.
The load queue 31, finish store queue 33 and completed store queue 35 are discrete components of the processing system of the present invention and can be physically located at various locations on the chip. These queues (31, 33, 35) are associated with cache 13 since the addresses stored therein must be provided to the cache so that the data can be retrieved from and stored to the memory 21. In a preferred embodiment of the present invention, the queues 31, 33, 35 are 64-bit registers which are capable of storing approximately 34 sets of addresses and program numbers.
FIG. 4 will now be described to illustrate the problems that can be encountered when certain instructions are executed out of order and why it is necessary to detect this situation.
Instruction 0 adds the contents of certain hardware registers R3 and R4 and places the result into register R1. Instruction 1 then stores the contents of register R1 in memory location A. Next, instruction 2 loads the data from memory A into a register R5, and instruction 3 then adds the contents of register R5 with register R6 and places the result in register R7. Finally, instruction 4 stores the contents of register R7 into memory location A.
However, for this set of instructions a conflict will be created if the processing system, through the dispatch unit 5, reorders the sequence of the instructions. For example, if instructions 1 and 2 are reversed in order to place the store subsequent to the load a conflict will occur. Instruction 0 added the contents of registers R3 and R4 and put the result in register R1. Instruction 2 then loads whatever data is currently in memory A into register R5 and instruction 1 will subsequently store the data in register R1 into memory location A. Instruction 3 again adds the contents of registers R5 and R6 and places the sum into register R7. Therefore, the incorrect data has been loaded into register R5. Whatever data was originally in memory A was loaded in register R5, before the desired data (sum of registers R3 and R4) is placed in memory location A. This will cause the incorrect data to then be added to the contents of register R6 causing incorrect data to be placed in register R7. Thus, it can be seen that it is not always possible to execute store operations subsequent to load operations and detecting this type of condition is needed for systems that have the capability to execute instructions out of order.
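The conflict described for FIG. 4 can be reproduced with a short sketch. The initial register and memory values are invented, and the `execute` helper and `PROGRAM` table are hypothetical names used only for illustration; no detection logic is modeled here.

```python
# Sketch of the FIG. 4 instructions; all data values are made up.
PROGRAM = {
    0: ("add", "R1", ("R3", "R4")),   # R1 <- R3 + R4
    1: ("store", "A", "R1"),          # mem[A] <- R1
    2: ("load", "R5", "A"),           # R5 <- mem[A]
    3: ("add", "R7", ("R5", "R6")),   # R7 <- R5 + R6
    4: ("store", "A", "R7"),          # mem[A] <- R7
}

def execute(program, order, regs, mem):
    """Run the instructions one at a time in the given order."""
    regs, mem = dict(regs), dict(mem)
    for number in order:
        op, dst, src = program[number]
        if op == "add":
            regs[dst] = regs[src[0]] + regs[src[1]]
        elif op == "store":
            mem[dst] = regs[src]
        elif op == "load":
            regs[dst] = mem[src]
    return regs, mem

regs0 = {"R1": 0, "R3": 2, "R4": 3, "R5": 0, "R6": 10, "R7": 0}
mem0 = {"A": 99}   # stale value present in memory A before the program runs

good_regs, good_mem = execute(PROGRAM, [0, 1, 2, 3, 4], regs0, mem0)
bad_regs, bad_mem = execute(PROGRAM, [0, 2, 1, 3, 4], regs0, mem0)

print(good_regs["R7"], bad_regs["R7"])  # prints: 15 109
```

With instructions 1 and 2 swapped, register R5 receives the stale value from memory A, so R7 and the final contents of memory A are both wrong.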
FIG. 5 is a schematic representation of the load queue 31. As described previously, the load/store unit 9 will place every load instruction address and program number in the load queue. Then, when a store instruction executes, a comparison is made between the address being generated by the load/store unit 9 during execution of the store instruction and the addresses (corresponding to the load instruction) in the load queue. Additionally, the program number for the store instruction being executed is compared with program numbers 49 of the load instructions in the load queue.
The present invention then determines if the address for the store operation is the same as one of the addresses 47 for a load instruction in the load queue. FIG. 5 illustrates a load queue 31 having five positions for load instruction addresses and program numbers. Of course, five positions are used for illustrative purposes only and a load queue having additional positions is contemplated by the present invention. In any event the store address from L/S 9 is compared to addresses 47 by a comparator 45 and the program number from a program counter in the load/store 9 is compared with the program number 49 for the loads in load queue 31 by comparators 41 and 43. Comparators 41 will determine if the program number from the store instruction is less than the program number of the load, and comparators 43 will determine if the store program number is greater than the load program number. The program numbers cannot be equal since the comparison is between loads and stores, which are different instructions. The comparators 45 determine if the address of the store is the same as the address, in the load queue, of the load instruction. In a preferred embodiment of the present invention, comparators 41, 43 and 45 will be implemented in a Boolean logic array including a series of and, or, exclusive or, nand, nor gates, or the like. More particularly, specialized circuitry can be utilized for the address comparator which will determine, by using exclusive or (XOR) logic, whether the binary value of the address in the load queue is identical to the binary value of the address being generated for the store. If the values are identical then the circuit will output a binary 1 and if the address values are different, then a binary 0 is output. All of the addresses in the load queue are compared with the generated store address to determine if any identical addresses exist. 
Similarly, specialized circuitry is also used to determine if the program number for the store instruction is less than the load instruction program numbers in the load queue. In this case, a subtract circuit may be used which will subtract one program number from the other and determine which is less than the other based on whether the resulting value is positive, or negative. For example, if the store program number is subtracted from the load program number and the result is positive, then the store program number is less than the load number. However, if the result is negative, then the store program number is greater than the load program number. In this manner, the present invention is capable of determining whether the store and load instructions are using the same memory, and which of these instructions should execute first. Other embodiments include hardwired implementations and software comparators, particularly microcode, and are all contemplated by the scope of the present invention.
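A software analogue of the two comparators may help make the logic concrete. This is only a sketch: the function names are hypothetical, and program numbers are assumed to be plain non-wrapping integers, which the description does not specify.

```python
def same_address(load_addr: int, store_addr: int) -> int:
    """Address comparator: the XOR of two values is zero exactly when every
    bit matches. Returns binary 1 for identical addresses, 0 otherwise."""
    return 1 if (load_addr ^ store_addr) == 0 else 0

def store_is_older(store_pn: int, load_pn: int) -> bool:
    """Program number comparator: subtract the store program number from the
    load program number; a positive result means the store number is less."""
    return (load_pn - store_pn) > 0

def conflict(load_addr, load_pn, store_addr, store_pn) -> bool:
    """A conflict exists when the addresses match and the store should have
    executed before the load that is already in the load queue."""
    return same_address(load_addr, store_addr) == 1 and store_is_older(store_pn, load_pn)

print(conflict(0xA0, 2, 0xA0, 1))  # prints: True
print(conflict(0xA0, 2, 0xA0, 4))  # prints: False
print(conflict(0xA0, 2, 0xB0, 1))  # prints: False
```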
The operation of the present invention will now be described with regard to the flow chart of FIG. 6. At step 1, the instruction set is retrieved from the instruction cache unit 1 and provided to the dispatch unit 5 which then may reorder the instructions to take advantage of any efficiencies, as described above in conjunction with FIG. 2 (step 2). It is then determined at step 3 whether a load or store instruction is being considered. If a load instruction is encountered, then the system proceeds to step 3a where the load instructions are executed. The load/store unit 9 then places the program number and address for the load instructions that have been executed in the load queue 31 at step 4. However, if a store instruction has been encountered at step 3, then the system jumps to step 5 and the store instructions are executed. A comparison is then made between the address generated during execution of the store instruction and the load addresses in the load queue (step 6). If, at step 6, it is determined that the store addresses and load addresses are not equal, then the system proceeds to step 7 and continues execution of instructions. On the other hand, if the comparison of step 6 determines that the store address equals one of the load addresses in the load queue, then another comparison is made, at step 8, to determine if the program number for the store instruction is less than the load program number. If the store program number is greater than the load program number, then the load operation properly occurred prior to the store operation for that memory address and no conflict exists. The operation then proceeds to step 7 and execution of instructions continues. However, if the store program number is not greater than the load program number, then it must be less than the load program number (since they cannot be equal). 
In this case, a conflict exists and the system proceeds to step 9 which marks the load as incorrectly executed and places the load instructions back in their original order. This reordering back to the original sequence is possible since the completion logic unit has maintained a record of the original sequence in which the instructions are provided from ICU 1 to dispatch unit 5, via order queue 3. Step 10 then re-executes the load instructions so that the store will correctly precede the corresponding load instructions, since the store instructions have been allowed to complete.
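The load queue bookkeeping of steps 4, 6, 8 and 9 might be sketched in software as follows. The class names, field names and the `must_reexecute` flag are hypothetical; the actual embodiment uses hardware queues and comparators, so this is an illustrative model only.

```python
from dataclasses import dataclass, field

@dataclass
class LoadEntry:
    address: int
    program_number: int
    must_reexecute: bool = False   # assumed flag marking an incorrect load

@dataclass
class LoadQueue:
    entries: list = field(default_factory=list)

    def record_load(self, address, program_number):
        """Step 4: remember every executed load."""
        self.entries.append(LoadEntry(address, program_number))

    def check_store(self, address, program_number):
        """Steps 6, 8 and 9: compare an executing store against all queued
        loads, mark each conflicting load, and return the marked entries."""
        conflicts = []
        for entry in self.entries:
            if entry.address == address and program_number < entry.program_number:
                entry.must_reexecute = True
                conflicts.append(entry)
        return conflicts

queue = LoadQueue()
queue.record_load(address=0xA0, program_number=2)
early = queue.check_store(address=0xA0, program_number=1)   # store precedes load
late = queue.check_store(address=0xA0, program_number=4)    # store follows load
other = queue.check_store(address=0xB0, program_number=1)   # different address
print(len(early), len(late), len(other))  # prints: 1 0 0
```

Removal of entries when loads are committed to the architecture is omitted from this sketch.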
An example of the present invention as shown in FIG. 5 will now be described in conjunction with the instructions of FIG. 4. For the purposes of this example it is assumed that dispatch unit 5 has reordered the sequence of the instructions in the left column of FIG. 5 to execute in the sequence shown in the right column of FIG. 5. The add instruction 0 is first executed followed by original load instruction 2, which loads the data in memory A to register R5. Since the load instruction has executed, its address (memory A) is placed in position 47a and program number (2) is placed in position 49a of load queue 31. Next, store instruction 1 is executed and its generated address (A) is compared with the addresses in position 47 of load queue 31. It can be seen that the store address is equal to the address in position 47a. The present invention (using comparators 41 and 43) then determines if the program number of the store instruction 1 is less than the program number in position 49a of load queue 31. In this example, the store instruction program number 1 is less than the load instruction program number 2 in the load queue and a conflict is present. That is, the store instruction should have been executed prior to the load instruction. These instructions must effectively be reordered to their original sequence and re-executed. This re-execution may be implemented by one of various methods, such as actually putting all of the instructions back into their original sequence and re-executing all of them. Another method is to mark the incorrect out of order load instruction as having been executed too early, and then re-execute only those load instructions marked as incorrect, while letting the store instructions complete and put the results in memory. The store instructions do not need to be re-executed since they have been allowed to execute and will precede the re-executed load instructions. 
The marking of the load may be implemented by setting a flag bit to a binary 1 or 0. In this manner, marking the load is viewed by the processing system as an interrupt which will cause the incorrect load to be re-fetched from the instruction cache unit and re-executed.
Continuing with the above example, the add instruction 3, which places the sum of the data from registers R5 and R6 into register R7, is then executed and the data from register R7 is stored in memory address A. At this point the present invention again compares the address of the store instruction with the address of the load instruction in position 47a of load queue 31. These addresses are equal (memory address A) and the program number of the store instruction 4 is then compared with the program number of the load instruction 2 in position 49a of load queue 31. In this case, the store instruction program number is greater than the load instruction program number and the store properly executed after the load instruction. The processing system then continues normal execution operations.
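The worked example can be traced end to end with a small sketch that combines execution, detection and a simplified recovery. The data values, the `run_with_detection` helper and the immediate re-execution of the marked load are assumptions made for illustration; in particular, the sketch relies on the conflict being found before any instruction consumes the incorrectly loaded value, as is the case in this example.

```python
# Sketch of the FIG. 4 instructions run in the reordered sequence 0, 2, 1, 3, 4.
PROGRAM = {
    0: ("add", "R1", ("R3", "R4")),
    1: ("store", "A", "R1"),
    2: ("load", "R5", "A"),
    3: ("add", "R7", ("R5", "R6")),
    4: ("store", "A", "R7"),
}

def run_with_detection(program, order, regs, mem):
    regs, mem = dict(regs), dict(mem)
    load_queue = []   # (address, program number, destination register)
    log = []          # (store number, load number, outcome)
    for pn in order:
        op, dst, src = program[pn]
        if op == "add":
            regs[dst] = regs[src[0]] + regs[src[1]]
        elif op == "load":
            regs[dst] = mem[src]
            load_queue.append((src, pn, dst))
        elif op == "store":
            mem[dst] = regs[src]               # the store is allowed to complete
            for addr, load_pn, load_dst in load_queue:
                if addr != dst:
                    continue                   # different address: no conflict
                if pn < load_pn:
                    regs[load_dst] = mem[addr] # re-execute the marked load
                    log.append((pn, load_pn, "conflict"))
                else:
                    log.append((pn, load_pn, "ok"))
    return regs, mem, log

regs0 = {"R1": 0, "R3": 2, "R4": 3, "R5": 0, "R6": 10, "R7": 0}
mem0 = {"A": 99}
regs, mem, log = run_with_detection(PROGRAM, [0, 2, 1, 3, 4], regs0, mem0)
print(log)                   # prints: [(1, 2, 'conflict'), (4, 2, 'ok')]
print(regs["R7"], mem["A"])  # prints: 15 15
```

Store instruction 1 triggers the conflict against load 2, while store instruction 4 compares as properly ordered, matching the two comparisons described above.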
It can be seen that none of the conventional systems allow for detection of impermissible out of order instructions subsequent to execution and prior to the point where their results must be provided to the system. Those skilled in the art will understand that detection of an out of order condition prior to the result of the instruction being committed to the architecture (even if after execution of the instruction) will greatly enhance processor performance.
Although certain preferred embodiments have been shown and described, it should be understood that many changes and modifications can be made therein without departing from the scope of the appended claims.

Claims (19)

We claim:
1. A data processing system that executes instructions which access a specific address in memory in a plurality of sequences, comprising:
means for reordering said instructions from a first sequence to a second sequence to enable simultaneous execution of said instructions in said second sequence;
means for storing said instructions in said first sequence, and for storing said instructions in said second sequence simultaneously;
a plurality of execution means for simultaneously executing said instructions in said second sequence; and
means for comparing, subsequent to execution of said instructions in said second sequence, whether the address of the non-executed instructions in said first sequence is the same as the address of the executed instructions in said second sequence, and for comparing an instruction number position in said means for storing of instructions in said first and second sequence which access the same address
wherein, if the address is the same and the instruction number for instructions in said first sequence is less than the instruction number for instructions in said second sequence, the data processing system continues executing instructions in said second sequence.
2. A system according to claim 1 wherein said means for comparing comprises:
means for storing information relating to a first type of instruction, subsequent to execution of said first type of instruction; and
means for providing information relating to a second type of instruction upon execution of said first type of instruction.
3. A system according to claim 2 wherein said first type of instruction and said second type of instruction must be executed in a particular sequence.
4. A system according to claim 3 wherein said information comprises a memory address and program number.
5. A system according to claim 4 wherein said means for comparing further comprises:
means for comparing the memory address of said first and second types of instructions; and
means for comparing the program number of said first and second types of instructions.
6. A system according to claim 5 wherein said means for comparing further comprises means for determining that one of said first type of instructions was incorrectly executed out of sequence when the memory address of said first and second types of instructions is equal and the program number of said second type of instructions is less than the program number for said first type of instruction.
7. A system according to claim 6 further comprising:
means for indicating that one of said first type of instructions has been incorrectly executed out of sequence; and
means for re-executing said incorrectly executed one of said first type of instruction in a sequence relative to said second type of instructions.
8. A system according to claim 7 wherein said first type of instruction is a load instruction, and said second type of instruction is a store instruction.
9. A system according to claim 8 wherein said means for storing information comprises a load queue.
10. A system according to claim 9 wherein said means for comparing a memory address and said means for comparing a program number comprise circuitry having a plurality of logical elements disposed in said processing system.
11. A computer implemented method for processing data in a system that executes instructions which access a specific address in memory in a plurality of sequences, said method comprising the computer implemented steps of:
reordering said instructions from a first sequence to a second sequence to enable simultaneous execution of said instructions in said second sequence;
storing said instructions in said first sequence, and for storing said instructions in said second sequence in a queue simultaneously;
simultaneously executing on a plurality of execution units said instructions in said second sequence; and
comparing, subsequent to execution of instructions in said second sequence, whether the address of the non-executed instructions in said first sequence is the same as the address of the executed instructions in said second sequence, and comparing an instruction number position in said queue of instructions in said first and second sequence which access the same address;
wherein, if the address is the same and the instruction number for instructions in said first sequence is less than the instruction number for instructions in said second sequence, the data processing system continues executing instructions in said second sequence.
12. A method according to claim 11 wherein said step of comparing comprises the steps of:
storing information relating to a first type of instruction, subsequent to execution of said first type of instruction; and
providing information relating to a second type of instruction upon execution of said first type of instruction.
13. A method according to claim 12 wherein said first type of instruction and said second type of instruction must be executed in a particular sequence.
14. A method according to claim 13 wherein said information comprises a memory address and program number.
15. A method according to claim 14 wherein said step of comparing further comprises the steps of:
comparing the memory address of said first and second types of instructions; and
comparing the program number of said first and second types of instructions.
16. A method according to claim 15 wherein said step of comparing further comprises the step of determining that one of said first type of instructions was incorrectly executed out of sequence when the memory address of said first and second types of instructions is equal and the program number of said second type of instructions is less than the program number for said first type of instruction.
17. A method according to claim 16 further comprising the steps of:
indicating that one of said first type of instructions has been incorrectly executed out of sequence; and
re-executing said incorrectly executed one of said first type of instruction in a sequence relative to said second type of instructions.
18. A method according to claim 17 wherein said first type of instruction is a load instruction, and said second type of instruction is a store instruction.
19. A method according to claim 18 wherein said step of storing comprises the step of storing said information in a load queue.
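The comparison recited in claims 5 to 7 and 15 to 17, with a load as the first type of instruction and a store as the second type (claims 8 and 18), can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the claimed circuitry: the names `LoadEntry`, `LoadQueue`, `record_load`, and `check_store` are hypothetical, and the load queue is modeled as a plain Python list rather than as hardware comparators.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LoadEntry:
    address: int         # memory address the load read from
    program_number: int  # position of the load in original program order


@dataclass
class LoadQueue:
    """Hypothetical software model of the load queue of claims 9 and 19."""
    entries: list = field(default_factory=list)

    def record_load(self, address, program_number):
        # Store information about a load after it has executed.
        self.entries.append(LoadEntry(address, program_number))

    def check_store(self, address, program_number):
        # Compare an executing store against every recorded load. A load is
        # flagged when the addresses are equal and the store's program number
        # is less than the load's, i.e. the store comes first in program
        # order but the load has already executed.
        return [
            load for load in self.entries
            if load.address == address and program_number < load.program_number
        ]


queue = LoadQueue()
queue.record_load(address=0x100, program_number=5)  # load executed early
queue.record_load(address=0x200, program_number=2)

# A store to 0x100 with program number 3 precedes load 5 in program order,
# so load 5 is flagged as a candidate for re-execution.
violations = queue.check_store(address=0x100, program_number=3)
print([load.program_number for load in violations])  # prints [5]

# A store with program number 7 follows load 5, so nothing is flagged.
print(queue.check_store(address=0x100, program_number=7))  # prints []
```

In this sketch a flagged load corresponds to the "incorrectly executed out of sequence" condition of claims 6 and 16; how the load is actually re-executed (claims 7 and 17) is left out.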
US08/001,976 1993-01-08 1993-01-08 Out of order instruction load and store comparison Expired - Fee Related US5467473A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US08/001,976 US5467473A (en) 1993-01-08 1993-01-08 Out of order instruction load and store comparison
JP5269218A JP2597811B2 (en) 1993-01-08 1993-10-27 Data processing system
EP93120937A EP0605869A1 (en) 1993-01-08 1993-12-27 Out of order instruction execution using load and store comparison

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08/001,976 US5467473A (en) 1993-01-08 1993-01-08 Out of order instruction load and store comparison

Publications (1)

Publication Number Publication Date
US5467473A (en) 1995-11-14

Family

ID=21698673

Family Applications (1)

Application Number Title Priority Date Filing Date
US08/001,976 Expired - Fee Related US5467473A (en) 1993-01-08 1993-01-08 Out of order instruction load and store comparison

Country Status (3)

Country Link
US (1) US5467473A (en)
EP (1) EP0605869A1 (en)
JP (1) JP2597811B2 (en)

Cited By (109)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996012227A1 (en) * 1994-10-14 1996-04-25 Silicon Graphics, Inc. An address queue capable of tracking memory dependencies
US5574928A (en) * 1993-10-29 1996-11-12 Advanced Micro Devices, Inc. Mixed integer/floating point processor core for a superscalar microprocessor with a plurality of operand buses for transferring operand segments
US5623619A (en) * 1993-10-29 1997-04-22 Advanced Micro Devices, Inc. Linearly addressable microprocessor cache
US5634026A (en) * 1995-05-12 1997-05-27 International Business Machines Corporation Source identifier for result forwarding
US5651125A (en) * 1993-10-29 1997-07-22 Advanced Micro Devices, Inc. High performance superscalar microprocessor including a common reorder buffer and common register file for both integer and floating point operations
US5666550A (en) * 1995-06-07 1997-09-09 International Business Machines Corporation Bus operation circuit using CMOS ratio logic circuits
US5694553A (en) * 1994-01-04 1997-12-02 Intel Corporation Method and apparatus for determining the dispatch readiness of buffered load operations in a processor
US5699538A (en) * 1994-12-09 1997-12-16 International Business Machines Corporation Efficient firm consistency support mechanisms in an out-of-order execution superscaler multiprocessor
US5724536A (en) * 1994-01-04 1998-03-03 Intel Corporation Method and apparatus for blocking execution of and storing load operations during their execution
US5745729A (en) * 1995-02-16 1998-04-28 Sun Microsystems, Inc. Methods and apparatuses for servicing load instructions
US5751946A (en) * 1996-01-18 1998-05-12 International Business Machines Corporation Method and system for detecting bypass error conditions in a load/store unit of a superscalar processor
US5751986A (en) * 1994-03-01 1998-05-12 Intel Corporation Computer system with self-consistent ordering mechanism
US5751983A (en) * 1995-10-03 1998-05-12 Abramson; Jeffrey M. Out-of-order processor with a memory subsystem which handles speculatively dispatched load operations
US5754812A (en) * 1995-10-06 1998-05-19 Advanced Micro Devices, Inc. Out-of-order load/store execution control
US5758051A (en) * 1996-07-30 1998-05-26 International Business Machines Corporation Method and apparatus for reordering memory operations in a processor
US5761680A (en) * 1995-08-23 1998-06-02 Symantec Corporation Coherent film system access during defragmentation operations on a storage medium
US5761713A (en) * 1996-03-01 1998-06-02 Hewlett-Packard Co. Address aggregation system and method for increasing throughput to a multi-banked data cache from a processor by concurrently forwarding an address to each bank
US5765035A (en) * 1995-11-20 1998-06-09 Advanced Micro Devices, Inc. Recorder buffer capable of detecting dependencies between accesses to a pair of caches
US5781790A (en) * 1995-12-29 1998-07-14 Intel Corporation Method and apparatus for performing floating point to integer transfers and vice versa
US5784586A (en) * 1995-02-14 1998-07-21 Fujitsu Limited Addressing method for executing load instructions out of order with respect to store instructions
US5784639A (en) * 1995-12-29 1998-07-21 Intel Corporation Load buffer integrated dynamic decoding logic
US5796975A (en) * 1996-05-24 1998-08-18 Hewlett-Packard Company Operand dependency tracking system and method for a processor that executes instructions out of order
US5799162A (en) * 1994-06-01 1998-08-25 Advanced Micro Devices, Inc. Program counter update mechanism
US5802340A (en) * 1995-08-22 1998-09-01 International Business Machines Corporation Method and system of executing speculative store instructions in a parallel processing computer system
US5809275A (en) * 1996-03-01 1998-09-15 Hewlett-Packard Company Store-to-load hazard resolution system and method for a processor that executes instructions out of order
US5835747A (en) * 1996-01-26 1998-11-10 Advanced Micro Devices, Inc. Hierarchical scan logic for out-of-order load/store execution control
US5838942A (en) * 1996-03-01 1998-11-17 Hewlett-Packard Company Panic trap system and method
US5838941A (en) * 1996-12-30 1998-11-17 Intel Corporation Out-of-order superscalar microprocessor with a renaming device that maps instructions from memory to registers
US5848256A (en) * 1996-09-30 1998-12-08 Institute For The Development Of Emerging Architectures, L.L.C. Method and apparatus for address disambiguation using address component identifiers
US5848287A (en) * 1996-02-20 1998-12-08 Advanced Micro Devices, Inc. Superscalar microprocessor including a reorder buffer which detects dependencies between accesses to a pair of caches
US5850563A (en) * 1995-09-11 1998-12-15 International Business Machines Corporation Processor and method for out-of-order completion of floating-point operations during load/store multiple operations
US5857096A (en) * 1995-12-19 1999-01-05 Intel Corporation Microarchitecture for implementing an instruction to clear the tags of a stack reference register file
US5862398A (en) * 1996-05-15 1999-01-19 Philips Electronics North America Corporation Compiler generating swizzled instructions usable in a simplified cache layout
US5872949A (en) * 1996-11-13 1999-02-16 International Business Machines Corp. Apparatus and method for managing data flow dependencies arising from out-of-order execution, by an execution unit, of an instruction series input from an instruction source
US5878245A (en) * 1993-10-29 1999-03-02 Advanced Micro Devices, Inc. High performance load/store functional unit and data cache
US5878242A (en) * 1997-04-21 1999-03-02 International Business Machines Corporation Method and system for forwarding instructions in a processor with increased forwarding probability
US5903772A (en) * 1993-10-29 1999-05-11 Advanced Micro Devices, Inc. Plural operand buses of intermediate widths coupling to narrower width integer and wider width floating point superscalar processing core
US5931957A (en) * 1997-03-31 1999-08-03 International Business Machines Corporation Support for out-of-order execution of loads and stores in a processor
US5940859A (en) * 1995-12-19 1999-08-17 Intel Corporation Emptying packed data state during execution of packed data instructions
US5949971A (en) * 1995-10-02 1999-09-07 International Business Machines Corporation Method and system for performance monitoring through identification of frequency and length of time of execution of serialization instructions in a processing system
US5995746A (en) * 1990-06-29 1999-11-30 Digital Equipment Corporation Byte-compare operation for high-performance processor
US6014759A (en) * 1997-06-13 2000-01-11 Micron Technology, Inc. Method and apparatus for transferring test data from a memory array
US6038657A (en) * 1995-10-06 2000-03-14 Advanced Micro Devices, Inc. Scan chains for out-of-order load/store execution control
US6044429A (en) * 1997-07-10 2000-03-28 Micron Technology, Inc. Method and apparatus for collision-free data transfers in a memory device with selectable data or address paths
US6065110A (en) * 1998-02-09 2000-05-16 International Business Machines Corporation Method and apparatus for loading an instruction buffer of a processor capable of out-of-order instruction issue
US6070238A (en) * 1997-09-11 2000-05-30 International Business Machines Corporation Method and apparatus for detecting overlap condition between a storage instruction and previously executed storage reference instruction
US6091646A (en) * 1998-02-17 2000-07-18 Micron Technology, Inc. Method and apparatus for coupling data from a memory device using a single ended read data path
US6122217A (en) * 1997-03-11 2000-09-19 Micron Technology, Inc. Multi-bank memory input/output line selection
US6134646A (en) * 1999-07-29 2000-10-17 International Business Machines Corp. System and method for executing and completing store instructions
US6148394A (en) * 1998-02-10 2000-11-14 International Business Machines Corporation Apparatus and method for tracking out of order load instructions to avoid data coherency violations in a processor
US6170997B1 (en) 1995-12-19 2001-01-09 Intel Corporation Method for executing instructions that operate on different data types stored in the same single logical register file
US6212622B1 (en) 1998-08-24 2001-04-03 Advanced Micro Devices, Inc. Mechanism for load block on store address generation
US6212623B1 (en) * 1998-08-24 2001-04-03 Advanced Micro Devices, Inc. Universal dependency vector/queue entry
WO2001053951A1 (en) * 2000-01-19 2001-07-26 Fujitsu Limited Memory control device and memory control method
US6336183B1 (en) * 1999-02-26 2002-01-01 International Business Machines Corporation System and method for executing store instructions
US6405307B1 (en) * 1998-06-02 2002-06-11 Intel Corporation Apparatus and method for detecting and handling self-modifying code conflicts in an instruction fetch pipeline
US6405280B1 (en) 1998-06-05 2002-06-11 Micron Technology, Inc. Packet-oriented synchronous DRAM interface supporting a plurality of orderings for data block transfers within a burst sequence
US6442677B1 (en) * 1999-06-10 2002-08-27 Advanced Micro Devices, Inc. Apparatus and method for superforwarding load operands in a microprocessor
US20020124156A1 (en) * 2000-12-29 2002-09-05 Adi Yoaz Using "silent store" information to advance loads
US20020141426A1 (en) * 2001-03-28 2002-10-03 Hidehiko Tanaka Load store queue applied to processor
CN1095117C (en) * 1997-04-10 2002-11-27 国际商业机器公司 Forwarding of results of store instructions
US20030018854A1 (en) * 2001-07-17 2003-01-23 Fujitsu Limited Microprocessor
US6542984B1 (en) 2000-01-03 2003-04-01 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
US6564315B1 (en) 2000-01-03 2003-05-13 Advanced Micro Devices, Inc. Scheduler which discovers non-speculative nature of an instruction after issuing and reissues the instruction
US6567338B1 (en) 1996-04-19 2003-05-20 Integrated Device Technology, Inc. Fully synchronous pipelined RAM
US6581155B1 (en) * 1999-08-25 2003-06-17 National Semiconductor Corporation Pipelined, superscalar floating point unit having out-of-order execution capability and processor employing the same
US6591354B1 (en) 1998-02-23 2003-07-08 Integrated Device Technology, Inc. Separate byte control on fully synchronous pipelined SRAM
US20030131219A1 (en) * 1994-12-02 2003-07-10 Alexander Peleg Method and apparatus for unpacking packed data
US6622235B1 (en) 2000-01-03 2003-09-16 Advanced Micro Devices, Inc. Scheduler which retries load/store hit situations
US6622237B1 (en) 2000-01-03 2003-09-16 Advanced Micro Devices, Inc. Store to load forward predictor training using delta tag
US6651161B1 (en) 2000-01-03 2003-11-18 Advanced Micro Devices, Inc. Store load forward predictor untraining
CN1130632C (en) * 1998-01-29 2003-12-10 西门子公司 Method and device for preventing store non-actual data into computer memory
US6694424B1 (en) 2000-01-03 2004-02-17 Advanced Micro Devices, Inc. Store load forward predictor training
US6708296B1 (en) 1995-06-30 2004-03-16 International Business Machines Corporation Method and system for selecting and distinguishing an event sequence using an effective address in a processing system
US6728867B1 (en) * 1999-05-21 2004-04-27 Intel Corporation Method for comparing returned first load data at memory address regardless of conflicting with first load and any instruction executed between first load and check-point
US6792523B1 (en) 1995-12-19 2004-09-14 Intel Corporation Processor with instructions that operate on different data types stored in the same single logical register file
US20050154832A1 (en) * 2004-01-13 2005-07-14 Steely Simon C.Jr. Consistency evaluation of program execution across at least one memory barrier
US7069406B2 (en) 1999-07-02 2006-06-27 Integrated Device Technology, Inc. Double data rate synchronous SRAM with 100% bus utilization
US7089404B1 (en) * 1999-06-14 2006-08-08 Transmeta Corporation Method and apparatus for enhancing scheduling in an advanced microprocessor
US20060179346A1 (en) * 2005-02-10 2006-08-10 International Business Machines Corporation Method for checkpointing instruction groups with out-of-order floating point instructions in a multi-threaded processor
US20060179265A1 (en) * 2005-02-08 2006-08-10 Flood Rachel M Systems and methods for executing x-form instructions
US20060179207A1 (en) * 2005-02-10 2006-08-10 International Business Machines Corporation Processor instruction retry recovery
US20060271820A1 (en) * 2005-05-27 2006-11-30 Mack Michael J Method and apparatus for reducing number of cycles required to checkpoint instructions in a multi-threaded processor
US20070038846A1 (en) * 2005-08-10 2007-02-15 P.A. Semi, Inc. Partial load/store forward prediction
US20070035550A1 (en) * 2005-08-12 2007-02-15 Bohuslav Rychlik Advanced load value check enhancement
US20070288727A1 (en) * 2006-06-08 2007-12-13 International Business Machines Corporation A method to reduce the number of times in-flight loads are searched by store instructions in a multi-threaded processor
US7321964B2 (en) 2003-07-08 2008-01-22 Advanced Micro Devices, Inc. Store-to-load forwarding buffer using indexed lookup
US20090013135A1 (en) * 2007-07-05 2009-01-08 Board Of Regents, The University Of Texas System Unordered load/store queue
US20090210675A1 (en) * 2008-02-20 2009-08-20 International Business Machines Corporation Method and system for early instruction text based operand store compare reject avoidance
US7634635B1 (en) 1999-06-14 2009-12-15 Brian Holscher Systems and methods for reordering processor instructions
US7716452B1 (en) 1996-08-22 2010-05-11 Kelly Edmund J Translated memory protection apparatus for an advanced microprocessor
US7730330B1 (en) 2000-06-16 2010-06-01 Marc Fleischmann System and method for saving and restoring a processor state without executing any instructions from a first instruction set
US20110145551A1 (en) * 2009-12-16 2011-06-16 Cheng Wang Two-stage commit (tsc) region for dynamic binary optimization in x86
US20110185158A1 (en) * 2010-01-28 2011-07-28 International Business Machines Corporation History and alignment based cracking for store multiple instructions for optimizing operand store compare penalties
US20110219213A1 (en) * 2010-03-05 2011-09-08 International Business Machines Corporation Instruction cracking based on machine state
US20110276764A1 (en) * 2010-05-05 2011-11-10 International Business Machines Corporation Cracking destructively overlapping operands in variable length instructions
US20120117335A1 (en) * 2010-11-10 2012-05-10 Advanced Micro Devices, Inc. Load ordering queue
US20130061290A1 (en) * 2011-09-06 2013-03-07 Jacob Mendel System for securely performing a transaction
US8650555B1 (en) 1999-10-20 2014-02-11 Richard Johnson Method for increasing the speed of speculative execution
US20140215190A1 (en) * 2013-01-25 2014-07-31 Apple Inc. Completing load and store instructions in a weakly-ordered memory model
US9128725B2 (en) 2012-05-04 2015-09-08 Apple Inc. Load-store dependency predictor content management
US20160266205A1 (en) * 2015-03-10 2016-09-15 Fujitsu Limited Logic verification apparatus, logic verification method and test program
US9600289B2 (en) 2012-05-30 2017-03-21 Apple Inc. Load-store dependency predictor PC hashing
US9626189B2 (en) 2012-06-15 2017-04-18 International Business Machines Corporation Reducing operand store compare penalties
WO2017095515A1 (en) * 2015-11-30 2017-06-08 Intel IP Corporation Instruction and logic for in-order handling in an out-of-order processor
US9710268B2 (en) 2014-04-29 2017-07-18 Apple Inc. Reducing latency for pointer chasing loads
US10437595B1 (en) 2016-03-15 2019-10-08 Apple Inc. Load/store dependency predictor optimization for replayed loads
US10514925B1 (en) 2016-01-28 2019-12-24 Apple Inc. Load speculation recovery
CN112445587A (en) * 2019-08-30 2021-03-05 上海华为技术有限公司 Task processing method and task processing device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005041047A2 (en) * 2003-10-22 2005-05-06 Intel Corporation Method and apparatus for efficient ordered stores over an interconnection network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5731049A (en) * 1980-07-31 1982-02-19 Nec Corp Information processing equipment

Patent Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4408274A (en) * 1979-09-29 1983-10-04 Plessey Overseas Limited Memory protection system using capability registers
US4574349A (en) * 1981-03-30 1986-03-04 International Business Machines Corp. Apparatus for addressing a larger number of instruction addressable central processor registers than can be identified by a program instruction
US4607332A (en) * 1983-01-14 1986-08-19 At&T Bell Laboratories Dynamic alteration of firmware programs in Read-Only Memory based systems
US4638429A (en) * 1983-12-19 1987-01-20 Hitachi, Ltd. Data processing apparatus for processing operand store conflict
US4697233A (en) * 1984-04-02 1987-09-29 Unisys Corporation Partial duplication of pipelined stack with data integrity checking
US4757440A (en) * 1984-04-02 1988-07-12 Unisys Corporation Pipelined data stack with access through-checking
US4630195A (en) * 1984-05-31 1986-12-16 International Business Machines Corporation Data processing system with CPU register to register data transfers overlapped with data transfer to and from main storage
US4827405A (en) * 1985-11-27 1989-05-02 Nec Corporation Data processing apparatus
US4831517A (en) * 1986-10-10 1989-05-16 International Business Machines Corporation Branch and return on address instruction and methods and apparatus for implementing same in a digital data processing system
US4991090A (en) * 1987-05-18 1991-02-05 International Business Machines Corporation Posting out-of-sequence fetches
EP0302999A2 (en) * 1987-05-18 1989-02-15 International Business Machines Corporation Out-of-sequence operand fetches
US5133077A (en) * 1987-10-19 1992-07-21 International Business Machines Corporation Data processor having multiple execution units for processing plural classs of instructions in parallel
US4965716A (en) * 1988-03-11 1990-10-23 International Business Machines Corporation Fast access priority queue for managing multiple messages at a communications node or managing multiple programs in a multiprogrammed data processor
US5241633A (en) * 1988-04-01 1993-08-31 Nec Corporation Instruction handling sequence control system for simultaneous execution of instructions
US5101341A (en) * 1988-08-25 1992-03-31 Edgcore Technology, Inc. Pipelined system for reducing instruction access time by accumulating predecoded instruction bits a FIFO
US5131086A (en) * 1988-08-25 1992-07-14 Edgcore Technology, Inc. Method and system for executing pipelined three operand construct
US4905200A (en) * 1988-08-29 1990-02-27 Ford Motor Company Apparatus and method for correcting microcomputer software errors
US5136697A (en) * 1989-06-06 1992-08-04 Advanced Micro Devices, Inc. System for reducing delay for execution subsequent to correctly predicted branch instruction using fetch information stored with each block of instructions in cache
US5185871A (en) * 1989-12-26 1993-02-09 International Business Machines Corporation Coordination of out-of-sequence fetching between multiple processors using re-execution of instructions
US5051940A (en) * 1990-04-04 1991-09-24 International Business Machines Corporation Data dependency collapsing hardware apparatus
US5202975A (en) * 1990-06-11 1993-04-13 Supercomputer Systems Limited Partnership Method for optimizing instruction scheduling for a processor having multiple functional resources
US5253349A (en) * 1991-01-30 1993-10-12 International Business Machines Corporation Decreasing processing time for type 1 dyadic instructions
US5261071A (en) * 1991-03-21 1993-11-09 Control Data System, Inc. Dual pipe cache memory with out-of-order issue capability
EP0523337A2 (en) * 1991-07-15 1993-01-20 International Business Machines Corporation Self-scheduling parallel computer system and method

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
IBM TDB, "MSIS Combining Serialization-MF-OSC within a Single Control", vol. 36, No. 1, Jan. 1993, pp. 229-235.
IBM TDB, "MSIS MP Version", vol. 36, No. 1, Jan. 1993, pp. 317-322.
IBM TDB, "Pipeline Prefetch Detector", vol. 30, No. 1, Jun. 1987, pp. 191-192.
IBM TDB, "MSIS Combining Serialization-MF-OSC within a Single Control", vol. 36, No. 1, Jan. 1993, pp. 229-235. *
IBM TDB, "MSIS MP Version", vol. 36, No. 1, Jan. 1993, pp. 317-322. *
IBM TDB, "Pipeline Prefetch Detector", vol. 30, No. 1, Jun. 1987, pp. 191-192. *

Cited By (195)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5995746A (en) * 1990-06-29 1999-11-30 Digital Equipment Corporation Byte-compare operation for high-performance processor
US6298423B1 (en) 1993-10-29 2001-10-02 Advanced Micro Devices, Inc. High performance load/store functional unit and data cache
US5655098A (en) * 1993-10-29 1997-08-05 Advanced Micro Devices, Inc. High performance superscalar microprocessor including a circuit for byte-aligning cisc instructions stored in a variable byte-length format
US5867682A (en) * 1993-10-29 1999-02-02 Advanced Micro Devices, Inc. High performance superscalar microprocessor including a circuit for converting CISC instructions to RISC operations
US5751981A (en) * 1993-10-29 1998-05-12 Advanced Micro Devices, Inc. High performance superscalar microprocessor including a speculative instruction queue for byte-aligning CISC instructions stored in a variable byte-length format
US6240484B1 (en) * 1993-10-29 2001-05-29 Advanced Micro Devices, Inc. Linearly addressable microprocessor cache
US5655097A (en) * 1993-10-29 1997-08-05 Advanced Micro Devices, Inc. High performance superscalar microprocessor including an instruction cache circuit for byte-aligning CISC instructions stored in a variable byte-length format
US5664136A (en) * 1993-10-29 1997-09-02 Advanced Micro Devices, Inc. High performance superscalar microprocessor including a dual-pathway circuit for converting cisc instructions to risc operations
US5867683A (en) * 1993-10-29 1999-02-02 Advanced Micro Devices, Inc. Method of operating a high performance superscalar microprocessor including a common reorder buffer and common register file for both integer and floating point operations
US5878245A (en) * 1993-10-29 1999-03-02 Advanced Micro Devices, Inc. High performance load/store functional unit and data cache
US5903772A (en) * 1993-10-29 1999-05-11 Advanced Micro Devices, Inc. Plural operand buses of intermediate widths coupling to narrower width integer and wider width floating point superscalar processing core
US5574928A (en) * 1993-10-29 1996-11-12 Advanced Micro Devices, Inc. Mixed integer/floating point processor core for a superscalar microprocessor with a plurality of operand buses for transferring operand segments
US5623619A (en) * 1993-10-29 1997-04-22 Advanced Micro Devices, Inc. Linearly addressable microprocessor cache
US5651125A (en) * 1993-10-29 1997-07-22 Advanced Micro Devices, Inc. High performance superscalar microprocessor including a common reorder buffer and common register file for both integer and floating point operations
US5694553A (en) * 1994-01-04 1997-12-02 Intel Corporation Method and apparatus for determining the dispatch readiness of buffered load operations in a processor
US5724536A (en) * 1994-01-04 1998-03-03 Intel Corporation Method and apparatus for blocking execution of and storing load operations during their execution
US5751986A (en) * 1994-03-01 1998-05-12 Intel Corporation Computer system with self-consistent ordering mechanism
US6035386A (en) * 1994-06-01 2000-03-07 Advanced Micro Devices, Inc. Program counter update mechanism
US5799162A (en) * 1994-06-01 1998-08-25 Advanced Micro Devices, Inc. Program counter update mechanism
US6351801B1 (en) 1994-06-01 2002-02-26 Advanced Micro Devices, Inc. Program counter update mechanism
US6216200B1 (en) * 1994-10-14 2001-04-10 Mips Technologies, Inc. Address queue
WO1996012227A1 (en) * 1994-10-14 1996-04-25 Silicon Graphics, Inc. An address queue capable of tracking memory dependencies
US8838946B2 (en) 1994-12-02 2014-09-16 Intel Corporation Packing lower half bits of signed data elements in two source registers in a destination register with saturation
US20030131219A1 (en) * 1994-12-02 2003-07-10 Alexander Peleg Method and apparatus for unpacking packed data
US8601246B2 (en) 1994-12-02 2013-12-03 Intel Corporation Execution of instruction with element size control bit to interleavingly store half packed data elements of source registers in same size destination register
US9361100B2 (en) 1994-12-02 2016-06-07 Intel Corporation Packing saturated lower 8-bit elements from two source registers of packed 16-bit elements
US20110093682A1 (en) * 1994-12-02 2011-04-21 Alexander Peleg Method and apparatus for packing data
US9223572B2 (en) 1994-12-02 2015-12-29 Intel Corporation Interleaving half of packed data elements of size specified in instruction and stored in two source registers
US8793475B2 (en) 1994-12-02 2014-07-29 Intel Corporation Method and apparatus for unpacking and moving packed data
US9389858B2 (en) 1994-12-02 2016-07-12 Intel Corporation Orderly storing of corresponding packed bytes from first and second source registers in result register
US9182983B2 (en) 1994-12-02 2015-11-10 Intel Corporation Executing unpack instruction and pack instruction with saturation on packed data elements from two source operand registers
US8521994B2 (en) 1994-12-02 2013-08-27 Intel Corporation Interleaving corresponding data elements from part of two source registers to destination register in processor operable to perform saturation
US9141387B2 (en) 1994-12-02 2015-09-22 Intel Corporation Processor executing unpack and pack instructions specifying two source packed data operands and saturation
US8495346B2 (en) 1994-12-02 2013-07-23 Intel Corporation Processor executing pack and unpack instructions
US7966482B2 (en) 1994-12-02 2011-06-21 Intel Corporation Interleaving saturated lower half of data elements from two source registers of packed data
US9116687B2 (en) 1994-12-02 2015-08-25 Intel Corporation Packing in destination register half of each element with saturation from two source packed data registers
US20060236076A1 (en) * 1994-12-02 2006-10-19 Alexander Peleg Method and apparatus for packing data
US8639914B2 (en) 1994-12-02 2014-01-28 Intel Corporation Packing signed word elements from two source registers to saturated signed byte elements in destination register
US20110219214A1 (en) * 1994-12-02 2011-09-08 Alexander Peleg Microprocessor having novel operations
US9015453B2 (en) 1994-12-02 2015-04-21 Intel Corporation Packing odd bytes from two source registers of packed data
US8190867B2 (en) 1994-12-02 2012-05-29 Intel Corporation Packing two packed signed data in registers with saturation
US5699538A (en) * 1994-12-09 1997-12-16 International Business Machines Corporation Efficient firm consistency support mechanisms in an out-of-order execution superscaler multiprocessor
US5784586A (en) * 1995-02-14 1998-07-21 Fujitsu Limited Addressing method for executing load instructions out of order with respect to store instructions
US5745729A (en) * 1995-02-16 1998-04-28 Sun Microsystems, Inc. Methods and apparatuses for servicing load instructions
US5802575A (en) * 1995-02-16 1998-09-01 Sun Microsystems, Inc. Hit bit for indicating whether load buffer entries will hit a cache when they reach buffer head
US5634026A (en) * 1995-05-12 1997-05-27 International Business Machines Corporation Source identifier for result forwarding
US5758179A (en) * 1995-06-07 1998-05-26 International Business Machines Corporation Bus operation circuit using CMOS ratio logic circuits
US5666550A (en) * 1995-06-07 1997-09-09 International Business Machines Corporation Bus operation circuit using CMOS ratio logic circuits
US6708296B1 (en) 1995-06-30 2004-03-16 International Business Machines Corporation Method and system for selecting and distinguishing an event sequence using an effective address in a processing system
US5802340A (en) * 1995-08-22 1998-09-01 International Business Machines Corporation Method and system of executing speculative store instructions in a parallel processing computer system
US5761680A (en) * 1995-08-23 1998-06-02 Symantec Corporation Coherent file system access during defragmentation operations on a storage medium
US5850563A (en) * 1995-09-11 1998-12-15 International Business Machines Corporation Processor and method for out-of-order completion of floating-point operations during load/store multiple operations
US5949971A (en) * 1995-10-02 1999-09-07 International Business Machines Corporation Method and system for performance monitoring through identification of frequency and length of time of execution of serialization instructions in a processing system
US5751983A (en) * 1995-10-03 1998-05-12 Abramson; Jeffrey M. Out-of-order processor with a memory subsystem which handles speculatively dispatched load operations
US5754812A (en) * 1995-10-06 1998-05-19 Advanced Micro Devices, Inc. Out-of-order load/store execution control
US6038657A (en) * 1995-10-06 2000-03-14 Advanced Micro Devices, Inc. Scan chains for out-of-order load/store execution control
US5765035A (en) * 1995-11-20 1998-06-09 Advanced Micro Devices, Inc. Reorder buffer capable of detecting dependencies between accesses to a pair of caches
US7149882B2 (en) 1995-12-19 2006-12-12 Intel Corporation Processor with instructions that operate on different data types stored in the same single logical register file
US6266686B1 (en) 1995-12-19 2001-07-24 Intel Corporation Emptying packed data state during execution of packed data instructions
US6170997B1 (en) 1995-12-19 2001-01-09 Intel Corporation Method for executing instructions that operate on different data types stored in the same single logical register file
US6792523B1 (en) 1995-12-19 2004-09-14 Intel Corporation Processor with instructions that operate on different data types stored in the same single logical register file
US7373490B2 (en) 1995-12-19 2008-05-13 Intel Corporation Emptying packed data state during execution of packed data instructions
US5940859A (en) * 1995-12-19 1999-08-17 Intel Corporation Emptying packed data state during execution of packed data instructions
US5857096A (en) * 1995-12-19 1999-01-05 Intel Corporation Microarchitecture for implementing an instruction to clear the tags of a stack reference register file
US20050038977A1 (en) * 1995-12-19 2005-02-17 Glew Andrew F. Processor with instructions that operate on different data types stored in the same single logical register file
US6751725B2 (en) 1995-12-19 2004-06-15 Intel Corporation Methods and apparatuses to clear state for operation of a stack
US5781790A (en) * 1995-12-29 1998-07-14 Intel Corporation Method and apparatus for performing floating point to integer transfers and vice versa
US5784639A (en) * 1995-12-29 1998-07-21 Intel Corporation Load buffer integrated dynamic decoding logic
US5751946A (en) * 1996-01-18 1998-05-12 International Business Machines Corporation Method and system for detecting bypass error conditions in a load/store unit of a superscalar processor
US5835747A (en) * 1996-01-26 1998-11-10 Advanced Micro Devices, Inc. Hierarchical scan logic for out-of-order load/store execution control
US5848287A (en) * 1996-02-20 1998-12-08 Advanced Micro Devices, Inc. Superscalar microprocessor including a reorder buffer which detects dependencies between accesses to a pair of caches
US6192462B1 (en) 1996-02-20 2001-02-20 Advanced Micro Devices, Inc. Superscalar microprocessor including a load/store unit, decode units and a reorder buffer to detect dependencies between access to a stack cache and a data cache
US5761713A (en) * 1996-03-01 1998-06-02 Hewlett-Packard Co. Address aggregation system and method for increasing throughput to a multi-banked data cache from a processor by concurrently forwarding an address to each bank
US5838942A (en) * 1996-03-01 1998-11-17 Hewlett-Packard Company Panic trap system and method
US5809275A (en) * 1996-03-01 1998-09-15 Hewlett-Packard Company Store-to-load hazard resolution system and method for a processor that executes instructions out of order
US6785188B2 (en) 1996-04-19 2004-08-31 Integrated Device Technology, Inc. Fully synchronous pipelined RAM
US6567338B1 (en) 1996-04-19 2003-05-20 Integrated Device Technology, Inc. Fully synchronous pipelined RAM
US5862398A (en) * 1996-05-15 1999-01-19 Philips Electronics North America Corporation Compiler generating swizzled instructions usable in a simplified cache layout
US5796975A (en) * 1996-05-24 1998-08-18 Hewlett-Packard Company Operand dependency tracking system and method for a processor that executes instructions out of order
US5758051A (en) * 1996-07-30 1998-05-26 International Business Machines Corporation Method and apparatus for reordering memory operations in a processor
US8495337B2 (en) 1996-08-22 2013-07-23 Edmund Kelly Translated memory protection
US8055877B1 (en) 1996-08-22 2011-11-08 Kelly Edmund J Translated memory protection apparatus for an advanced microprocessor
US7840776B1 (en) 1996-08-22 2010-11-23 Kelly Edmund J Translated memory protection apparatus for an advanced microprocessor
US8719544B2 (en) 1996-08-22 2014-05-06 Edmund J. Kelly Translated memory protection apparatus for an advanced microprocessor
US7716452B1 (en) 1996-08-22 2010-05-11 Kelly Edmund J Translated memory protection apparatus for an advanced microprocessor
US20100205413A1 (en) * 1996-08-22 2010-08-12 Kelly Edmund J Translated memory protection
US5848256A (en) * 1996-09-30 1998-12-08 Institute For The Development Of Emerging Architectures, L.L.C. Method and apparatus for address disambiguation using address component identifiers
US5872949A (en) * 1996-11-13 1999-02-16 International Business Machines Corp. Apparatus and method for managing data flow dependencies arising from out-of-order execution, by an execution unit, of an instruction series input from an instruction source
US5838941A (en) * 1996-12-30 1998-11-17 Intel Corporation Out-of-order superscalar microprocessor with a renaming device that maps instructions from memory to registers
US6122217A (en) * 1997-03-11 2000-09-19 Micron Technology, Inc. Multi-bank memory input/output line selection
US6256255B1 (en) 1997-03-11 2001-07-03 Micron Technology, Inc. Multi-bank memory input/output line selection
US5931957A (en) * 1997-03-31 1999-08-03 International Business Machines Corporation Support for out-of-order execution of loads and stores in a processor
CN1095117C (en) * 1997-04-10 2002-11-27 国际商业机器公司 Forwarding of results of store instructions
US5878242A (en) * 1997-04-21 1999-03-02 International Business Machines Corporation Method and system for forwarding instructions in a processor with increased forwarding probability
US6519719B1 (en) 1997-06-13 2003-02-11 Micron Technology, Inc. Method and apparatus for transferring test data from a memory array
US6014759A (en) * 1997-06-13 2000-01-11 Micron Technology, Inc. Method and apparatus for transferring test data from a memory array
US6789175B2 (en) 1997-07-10 2004-09-07 Micron Technology, Inc. Method and apparatus for synchronous data transfers in a memory device with selectable data or address paths
US6415340B1 (en) 1997-07-10 2002-07-02 Micron Technology, Inc. Method and apparatus for synchronous data transfers in a memory device with selectable data or address paths
US6556483B2 (en) 1997-07-10 2003-04-29 Micron Technology, Inc. Method and apparatus for synchronous data transfers in a memory device with selectable data or address paths
US6560668B2 (en) 1997-07-10 2003-05-06 Micron Technology, Inc. Method and apparatus for reading write-modified read data in memory device providing synchronous data transfers
US6272608B1 (en) 1997-07-10 2001-08-07 Micron Technology, Inc. Method and apparatus for synchronous data transfers in a memory device with lookahead logic for detecting latency intervals
US6611885B2 (en) 1997-07-10 2003-08-26 Micron Technology, Inc. Method and apparatus for synchronous data transfers in a memory device with selectable data or address paths
US6614698B2 (en) 1997-07-10 2003-09-02 Micron Technology, Inc. Method and apparatus for synchronous data transfers in a memory device with selectable data or address paths
US6044429A (en) * 1997-07-10 2000-03-28 Micron Technology, Inc. Method and apparatus for collision-free data transfers in a memory device with selectable data or address paths
US6070238A (en) * 1997-09-11 2000-05-30 International Business Machines Corporation Method and apparatus for detecting overlap condition between a storage instruction and previously executed storage reference instruction
CN1130632C (en) * 1998-01-29 2003-12-10 西门子公司 Method and device for preventing the storing of non-actual data in computer memory
US6065110A (en) * 1998-02-09 2000-05-16 International Business Machines Corporation Method and apparatus for loading an instruction buffer of a processor capable of out-of-order instruction issue
US6148394A (en) * 1998-02-10 2000-11-14 International Business Machines Corporation Apparatus and method for tracking out of order load instructions to avoid data coherency violations in a processor
US6091646A (en) * 1998-02-17 2000-07-18 Micron Technology, Inc. Method and apparatus for coupling data from a memory device using a single ended read data path
US6591354B1 (en) 1998-02-23 2003-07-08 Integrated Device Technology, Inc. Separate byte control on fully synchronous pipelined SRAM
US6405307B1 (en) * 1998-06-02 2002-06-11 Intel Corporation Apparatus and method for detecting and handling self-modifying code conflicts in an instruction fetch pipeline
US6405280B1 (en) 1998-06-05 2002-06-11 Micron Technology, Inc. Packet-oriented synchronous DRAM interface supporting a plurality of orderings for data block transfers within a burst sequence
US6553482B1 (en) 1998-08-24 2003-04-22 Advanced Micro Devices, Inc. Universal dependency vector/queue entry
US6212622B1 (en) 1998-08-24 2001-04-03 Advanced Micro Devices, Inc. Mechanism for load block on store address generation
US6212623B1 (en) * 1998-08-24 2001-04-03 Advanced Micro Devices, Inc. Universal dependency vector/queue entry
US6336183B1 (en) * 1999-02-26 2002-01-01 International Business Machines Corporation System and method for executing store instructions
US6728867B1 (en) * 1999-05-21 2004-04-27 Intel Corporation Method for comparing returned first load data at memory address regardless of conflicting with first load and any instruction executed between first load and check-point
US6442677B1 (en) * 1999-06-10 2002-08-27 Advanced Micro Devices, Inc. Apparatus and method for superforwarding load operands in a microprocessor
US8209517B1 (en) * 1999-06-14 2012-06-26 Rozas Guillermo J Method and apparatus for enhancing scheduling in an advanced microprocessor
US7634635B1 (en) 1999-06-14 2009-12-15 Brian Holscher Systems and methods for reordering processor instructions
US9081563B2 (en) 1999-06-14 2015-07-14 Guillermo J. Rozas Method and apparatus for enhancing scheduling in an advanced microprocessor
US7089404B1 (en) * 1999-06-14 2006-08-08 Transmeta Corporation Method and apparatus for enhancing scheduling in an advanced microprocessor
US7069406B2 (en) 1999-07-02 2006-06-27 Integrated Device Technology, Inc. Double data rate synchronous SRAM with 100% bus utilization
US6134646A (en) * 1999-07-29 2000-10-17 International Business Machines Corp. System and method for executing and completing store instructions
US6581155B1 (en) * 1999-08-25 2003-06-17 National Semiconductor Corporation Pipelined, superscalar floating point unit having out-of-order execution capability and processor employing the same
US6907518B1 (en) 1999-08-25 2005-06-14 National Semiconductor Corporation Pipelined, superscalar floating point unit having out-of-order execution capability and processor employing the same
US10061582B2 (en) 1999-10-20 2018-08-28 Intellectual Ventures Holding 81 Llc Method for increasing the speed of speculative execution
US8650555B1 (en) 1999-10-20 2014-02-11 Richard Johnson Method for increasing the speed of speculative execution
US6651161B1 (en) 2000-01-03 2003-11-18 Advanced Micro Devices, Inc. Store load forward predictor untraining
US6694424B1 (en) 2000-01-03 2004-02-17 Advanced Micro Devices, Inc. Store load forward predictor training
US6564315B1 (en) 2000-01-03 2003-05-13 Advanced Micro Devices, Inc. Scheduler which discovers non-speculative nature of an instruction after issuing and reissues the instruction
US6622237B1 (en) 2000-01-03 2003-09-16 Advanced Micro Devices, Inc. Store to load forward predictor training using delta tag
US6542984B1 (en) 2000-01-03 2003-04-01 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
US6622235B1 (en) 2000-01-03 2003-09-16 Advanced Micro Devices, Inc. Scheduler which retries load/store hit situations
US7093074B2 (en) 2000-01-19 2006-08-15 Fujitsu Limited Storage control device and storage control method
WO2001053951A1 (en) * 2000-01-19 2001-07-26 Fujitsu Limited Memory control device and memory control method
US20030005227A1 (en) * 2000-01-19 2003-01-02 Fujitsu Limited Storage control device and storage control method
US8140872B1 (en) 2000-06-16 2012-03-20 Marc Fleischmann Restoring processor context in response to processor power-up
US7730330B1 (en) 2000-06-16 2010-06-01 Marc Fleischmann System and method for saving and restoring a processor state without executing any instructions from a first instruction set
US7062638B2 (en) * 2000-12-29 2006-06-13 Intel Corporation Prediction of issued silent store operations for allowing subsequently issued loads to bypass unexecuted silent stores and confirming the bypass upon execution of the stores
US20020124156A1 (en) * 2000-12-29 2002-09-05 Adi Yoaz Using "silent store" information to advance loads
US20020141426A1 (en) * 2001-03-28 2002-10-03 Hidehiko Tanaka Load store queue applied to processor
US7058049B2 (en) * 2001-03-28 2006-06-06 Semiconductor Technology Academic Research Center Load store queue applied to processor
US20030018854A1 (en) * 2001-07-17 2003-01-23 Fujitsu Limited Microprocessor
US6823406B2 (en) * 2001-07-17 2004-11-23 Fujitsu Limited Microprocessor executing a program to guarantee an access order
US7321964B2 (en) 2003-07-08 2008-01-22 Advanced Micro Devices, Inc. Store-to-load forwarding buffer using indexed lookup
US8301844B2 (en) * 2004-01-13 2012-10-30 Hewlett-Packard Development Company, L.P. Consistency evaluation of program execution across at least one memory barrier
US20050154832A1 (en) * 2004-01-13 2005-07-14 Steely Simon C.Jr. Consistency evaluation of program execution across at least one memory barrier
US20060179265A1 (en) * 2005-02-08 2006-08-10 Flood Rachel M Systems and methods for executing x-form instructions
US20060179346A1 (en) * 2005-02-10 2006-08-10 International Business Machines Corporation Method for checkpointing instruction groups with out-of-order floating point instructions in a multi-threaded processor
US7478276B2 (en) * 2005-02-10 2009-01-13 International Business Machines Corporation Method for checkpointing instruction groups with out-of-order floating point instructions in a multi-threaded processor
US20060179207A1 (en) * 2005-02-10 2006-08-10 International Business Machines Corporation Processor instruction retry recovery
US7827443B2 (en) 2005-02-10 2010-11-02 International Business Machines Corporation Processor instruction retry recovery
US7467325B2 (en) 2005-02-10 2008-12-16 International Business Machines Corporation Processor instruction retry recovery
US20090063898A1 (en) * 2005-02-10 2009-03-05 International Business Machines Corporation Processor Instruction Retry Recovery
US7409589B2 (en) 2005-05-27 2008-08-05 International Business Machines Corporation Method and apparatus for reducing number of cycles required to checkpoint instructions in a multi-threaded processor
US20060271820A1 (en) * 2005-05-27 2006-11-30 Mack Michael J Method and apparatus for reducing number of cycles required to checkpoint instructions in a multi-threaded processor
US20080177988A1 (en) * 2005-08-10 2008-07-24 Sudarshan Kadambi Partial Load/Store Forward Prediction
US7568087B2 (en) 2005-08-10 2009-07-28 Apple Inc. Partial load/store forward prediction
US20090254734A1 (en) * 2005-08-10 2009-10-08 Sudarshan Kadambi Partial Load/Store Forward Prediction
US7984274B2 (en) 2005-08-10 2011-07-19 Apple Inc. Partial load/store forward prediction
US20070038846A1 (en) * 2005-08-10 2007-02-15 P.A. Semi, Inc. Partial load/store forward prediction
US7376817B2 (en) 2005-08-10 2008-05-20 P.A. Semi, Inc. Partial load/store forward prediction
US7613906B2 (en) * 2005-08-12 2009-11-03 Qualcomm Incorporated Advanced load value check enhancement
US20070035550A1 (en) * 2005-08-12 2007-02-15 Bohuslav Rychlik Advanced load value check enhancement
US20070288727A1 (en) * 2006-06-08 2007-12-13 International Business Machines Corporation A method to reduce the number of times in-flight loads are searched by store instructions in a multi-threaded processor
US7516310B2 (en) * 2006-06-08 2009-04-07 International Business Machines Corporation Method to reduce the number of times in-flight loads are searched by store instructions in a multi-threaded processor
US20090013135A1 (en) * 2007-07-05 2009-01-08 Board Of Regents, The University Of Texas System Unordered load/store queue
US8447911B2 (en) 2007-07-05 2013-05-21 Board Of Regents, University Of Texas System Unordered load/store queue
US7975130B2 (en) * 2008-02-20 2011-07-05 International Business Machines Corporation Method and system for early instruction text based operand store compare reject avoidance
US20090210675A1 (en) * 2008-02-20 2009-08-20 International Business Machines Corporation Method and system for early instruction text based operand store compare reject avoidance
US8195924B2 (en) 2008-02-20 2012-06-05 International Business Machines Corporation Early instruction text based operand store compare reject avoidance
US20110167244A1 (en) * 2008-02-20 2011-07-07 International Business Machines Corporation Early instruction text based operand store compare reject avoidance
US8418156B2 (en) * 2009-12-16 2013-04-09 Intel Corporation Two-stage commit (TSC) region for dynamic binary optimization in X86
US20110145551A1 (en) * 2009-12-16 2011-06-16 Cheng Wang Two-stage commit (tsc) region for dynamic binary optimization in x86
US20110185158A1 (en) * 2010-01-28 2011-07-28 International Business Machines Corporation History and alignment based cracking for store multiple instructions for optimizing operand store compare penalties
US9135005B2 (en) 2010-01-28 2015-09-15 International Business Machines Corporation History and alignment based cracking for store multiple instructions for optimizing operand store compare penalties
US8938605B2 (en) 2010-03-05 2015-01-20 International Business Machines Corporation Instruction cracking based on machine state
US20110219213A1 (en) * 2010-03-05 2011-09-08 International Business Machines Corporation Instruction cracking based on machine state
US8645669B2 (en) * 2010-05-05 2014-02-04 International Business Machines Corporation Cracking destructively overlapping operands in variable length instructions
US20110276764A1 (en) * 2010-05-05 2011-11-10 International Business Machines Corporation Cracking destructively overlapping operands in variable length instructions
US20120117335A1 (en) * 2010-11-10 2012-05-10 Advanced Micro Devices, Inc. Load ordering queue
US20130061290A1 (en) * 2011-09-06 2013-03-07 Jacob Mendel System for securely performing a transaction
US9128725B2 (en) 2012-05-04 2015-09-08 Apple Inc. Load-store dependency predictor content management
US9600289B2 (en) 2012-05-30 2017-03-21 Apple Inc. Load-store dependency predictor PC hashing
US9626189B2 (en) 2012-06-15 2017-04-18 International Business Machines Corporation Reducing operand store compare penalties
US20140215190A1 (en) * 2013-01-25 2014-07-31 Apple Inc. Completing load and store instructions in a weakly-ordered memory model
US9535695B2 (en) * 2013-01-25 2017-01-03 Apple Inc. Completing load and store instructions in a weakly-ordered memory model
US9710268B2 (en) 2014-04-29 2017-07-18 Apple Inc. Reducing latency for pointer chasing loads
US20160266205A1 (en) * 2015-03-10 2016-09-15 Fujitsu Limited Logic verification apparatus, logic verification method and test program
WO2017095515A1 (en) * 2015-11-30 2017-06-08 Intel IP Corporation Instruction and logic for in-order handling in an out-of-order processor
US10191748B2 (en) 2015-11-30 2019-01-29 Intel IP Corporation Instruction and logic for in-order handling in an out-of-order processor
US10514925B1 (en) 2016-01-28 2019-12-24 Apple Inc. Load speculation recovery
US10437595B1 (en) 2016-03-15 2019-10-08 Apple Inc. Load/store dependency predictor optimization for replayed loads
CN112445587A (en) * 2019-08-30 2021-03-05 上海华为技术有限公司 Task processing method and task processing device

Also Published As

Publication number Publication date
JP2597811B2 (en) 1997-04-09
JPH07160501A (en) 1995-06-23
EP0605869A1 (en) 1994-07-13

Similar Documents

Publication Publication Date Title
US5467473A (en) Out of order instruction load and store comparison
US5611063A (en) Method for executing speculative load instructions in high-performance processors
EP0638183B1 (en) A system and method for retiring instructions in a superscalar microprocessor
US5961636A (en) Checkpoint table for selective instruction flushing in a speculative execution unit
US6374347B1 (en) Register file backup queue
US7526583B2 (en) Method and apparatus to launch write queue read data in a microprocessor recovery unit
US5694565A (en) Method and device for early deallocation of resources during load/store multiple operations to allow simultaneous dispatch/execution of subsequent instructions
US5826055A (en) System and method for retiring instructions in a superscalar microprocessor
US5535346A (en) Data processor with future file with parallel update and method of operation
US6393550B1 (en) Method and apparatus for pipeline streamlining where resources are immediate or certainly retired
US5898864A (en) Method and system for executing a context-altering instruction without performing a context-synchronization operation within high-performance processors
EP1256053B1 (en) Secondary reorder buffer microprocessor
US5778248A (en) Fast microprocessor stage bypass logic enable
JP3207124B2 (en) Method and apparatus for supporting speculative execution of a count / link register change instruction
US6209081B1 (en) Method and system for nonsequential instruction dispatch and execution in a superscalar processor system
US6101597A (en) Method and apparatus for maximum throughput scheduling of dependent operations in a pipelined processor
IL155298A (en) Locking source registers in a data processing apparatus
US6898696B1 (en) Method and system for efficiently restoring a processor's execution state following an interrupt caused by an interruptible instruction
JP2742375B2 (en) Method and system for selective serialization of instruction processing in a superscalar processor
US5974535A (en) Method and system in data processing system of permitting concurrent processing of instructions of a particular type
US5850563A (en) Processor and method for out-of-order completion of floating-point operations during load/store multiple operations
US5841999A (en) Information handling system having a register remap structure using a content addressable table
EP0753173B1 (en) Processing system and method of operation
US5784606A (en) Method and system in a superscalar data processing system for the efficient handling of exceptions

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST.;ASSIGNORS:KAHLE, JAMES ALLAN;KAU, CHIN-CHENG;REEL/FRAME:006400/0599

Effective date: 19930108

FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
FP Lapsed due to failure to pay maintenance fee

Effective date: 20031114

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362