US20040064632A1 - Register stack engine having speculative load/store modes - Google Patents
Register stack engine having speculative load/store modes
- Publication number
- US20040064632A1 (application US10/663,247)
- Authority
- US
- United States
- Prior art keywords
- register
- computer system
- data
- registers
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3824—Operand accessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30123—Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
- G06F9/30127—Register windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
- G06F9/384—Register renaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
Definitions
- The present invention relates to microprocessors and, in particular, to mechanisms for managing data in a register file.
- Modern processors include extensive execution resources to support concurrent processing of multiple instructions.
- a processor typically includes one or more integer, floating point, branch, and memory execution units to implement integer, floating point, branch, and load/store instructions, respectively.
- integer and floating point units typically include register files to maintain data relatively close to the processor core.
- a register file is a high speed storage structure that is used to temporarily store information close to the execution resources of the processor.
- the operands on which instructions operate are preferentially stored in the entries (“registers”) of the register file, since they can be accessed more quickly from these locations.
- Data stored in larger, more remote storage structures such as caches or main memory, may take longer to access. The longer access times can reduce the processor's performance.
- Register files thus serve as a primary source of data for the processor's execution resources, and high performance processors provide large register files to take advantage of their low access latency.
- Register files take up relatively large areas on the processor's die. While improvements in semiconductor processing have reduced the size of the individual storage elements in a register, the wires that move data in and out of these storage elements have not benefited to the same degree. These wires are responsible for a significant portion of the register file's die area, particularly in the case of multi-ported register files.
- The die area impact of register files limits the size of the register files (and the number of registers) that can be used effectively on a given processor.
- Although the number of registers employed on succeeding processor generations has increased, so has the amount of data processors handle.
- For example, superscalar processors include multiple instruction execution pipelines, each of which must be provided with data. In addition, these instruction execution pipelines operate at ever greater speeds. The net result is that the register files remain a relatively scarce resource, and processors must manage the movement of data in and out of these register files carefully to operate at their peak efficiencies.
- Typical register management techniques empty registers to and load registers from higher latency storage devices, respectively, to optimize register usage.
- The data transfers are often triggered when control of the processor passes from one software procedure to another. For example, data from the registers used by a first procedure that is currently inactive may be emptied or “spilled” to a backing store if an active procedure requires more registers than are currently available in the register file.
- When control is returned to the first procedure, registers are reallocated to the procedure and loaded or “filled” with the associated data from the backing store.
- the store and load operations that transfer data between the register file and backing store may have relatively long latencies. This is particularly true if the data sought is only available in one of the large caches or main memory or if significant amounts of data must be transferred from anywhere in the memory hierarchy. In these cases, execution of the newly activated procedure is stalled while the data transfers are implemented. Execution stalls halt the progress of instructions through the processor's execution pipeline, degrading the processor's performance.
- the present invention addresses these and other problems related to register file management.
- FIG. 1 is a block diagram of one embodiment of a computer system that implements the present invention.
- FIG. 2 is a block diagram representing one embodiment of a register management system in accordance with the present invention.
- FIG. 3 is a schematic representation of register allocation operations for one embodiment of the register file of FIG. 1.
- FIG. 4 is a schematic representation of the operations implemented by the register stack engine between the backing memory and the register file of FIG. 1.
- FIG. 5 is a flowchart representing one embodiment of the method in accordance with the present invention for speculatively executing register spill and fill operations.
- FIG. 6 is a state machine representing one embodiment of the register stack engine in accordance with the present invention.
- the present invention provides a mechanism for managing the storage of data in a processor's register files.
- the mechanism identifies available execution cycles in a processor and uses the available execution cycles to speculatively spill data from and fill data into the registers of a register file. Registers associated with currently inactive procedures are targeted by the speculative spill and fill operations.
- the speculative spill and fill operations increase the “clean partition” of the register file, using available bandwidth in the processor-memory channel.
- “clean partition” refers to registers that store valid data which is also backed up in the memory hierarchy, e.g. a backing store. These registers may be allocated to a new procedure without first spilling them because the data they store has already been backed up. If the registers are not needed for a new procedure, they are available for the procedure to which they were previously allocated without first filling them from the backing store.
- Speculative spill and fill operations reduce the need for mandatory spill and fill operations, which are triggered in response to procedure calls, returns, returns from interrupts, and the like.
- Mandatory spill and fill operations may cause the processor to stall if the active procedure cannot make forward progress until the mandatory spill/fill operations complete.
- One embodiment of a computer system in accordance with the present invention includes a processor and a memory coupled to the processor through a memory channel.
- the processor includes a stacked register file and a register stack engine.
- the stacked register file stores data for one or more procedures in one or more frames, respectively.
- the register stack engine monitors activity on the processor-memory channel and transfers data between selected frames of the register file and a backing store responsive to the available bandwidth in the memory channel. For example, the register stack engine may monitor a load/store unit of the processor for empty instruction slots and inject speculative load/store operations for the register file when available instruction slots are identified.
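The slot-filling idea in the preceding paragraph can be sketched as follows. This is a minimal illustration in Python, not the patented hardware: the per-cycle list of LSU slots, the function name `inject_speculative_ops` and the string labels are all assumptions made for the example.

```python
from collections import deque

def inject_speculative_ops(lsu_slots, pending_rse_ops):
    """Fill empty LSU instruction slots (None) with queued speculative RSE operations.

    lsu_slots: list where each entry is a program load/store or None (idle slot).
    pending_rse_ops: speculative spills/fills the RSE would like to issue.
    Returns the slot list after injection; program operations are never displaced.
    """
    queue = deque(pending_rse_ops)
    filled = []
    for slot in lsu_slots:
        if slot is None and queue:
            filled.append(queue.popleft())  # use the otherwise idle cycle
        else:
            filled.append(slot)             # keep the program's own operation
    return filled

slots = ["ld r4", None, "st r9", None, None]
result = inject_speculative_ops(slots, ["RSE_Store", "RSE_Load"])
print(result)
```

Because the speculative operations only occupy slots the program left empty, they compete with nothing the active procedure issued, which is the property the text relies on.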
- FIG. 1 is a block diagram of one embodiment of a computer system 100 in accordance with the present invention.
- Computer system 100 includes a processor 110 and a main memory 170 .
- Processor 110 includes an instruction cache 120 , an execution core 130 , one or more register files 140 , a register stack engine (RSE) 150 , and one or more data caches 160 .
- a load/store execution unit (LSU) 134 is shown in execution core 130 .
- Other components of processor 110 such as rename logic, retirement logic, instruction decoders, arithmetic/logic unit(s) and the like are not shown.
- a bus 180 provides a communication channel between main memory 170 and the various components of processor 110 .
- Cache(s) 160 and main memory 170 form a memory hierarchy.
- Data that is not available in register file 140 may be provided by the first structure in the memory hierarchy in which the data is found.
- Data that is evicted from register file 140 to accommodate new procedures may be stored in the memory hierarchy until it is needed again.
- RSE 150 monitors traffic on the memory channel and initiates data transfers between register file(s) 140 and the memory hierarchy when bandwidth is available. For example, RSE 150 may use otherwise idle cycles, i.e. empty instruction slots, on LSU 134 to speculatively execute spill and fill operations. The speculative operations are targeted to increase the portion of data in register file 140 that is backed up in main memory 170.
- register file 140 is logically partitioned to store data associated with different procedures in different frames. Portions of these frames may overlap to facilitate data transfers between different procedures.
- RSE 150 speculatively transfers data for inactive procedures between register file 140 and the memory hierarchy.
- RSE 150 may store data from registers associated with inactive procedures (RSE_Store) to a backing memory.
- an inactive or parent procedure is a procedure that called the current active procedure either directly or through one or more intervening procedures.
- Speculative RSE_Stores increase the probability that copies of data stored in registers are already backed up in the memory hierarchy should the registers be needed for use by an active procedure.
- RSE 150 may load data from the memory hierarchy to registers that do not currently store valid data (RSE_Load). Speculative RSE_Loads increase the probability that the data associated with an inactive (parent) procedure will be available in register file 140 when the procedure is re-activated.
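A register-level view of the two speculative operations can be sketched as below. This is a toy model under stated assumptions: the `Register` and `ToyRSE` classes, the `valid` and `backed_up` flags, and the dictionary used as a backing store are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Register:
    value: int = 0
    valid: bool = False      # holds live data for some procedure
    backed_up: bool = False  # an identical copy exists in the backing store

@dataclass
class ToyRSE:
    regs: dict = field(default_factory=dict)     # register number -> Register
    backing: dict = field(default_factory=dict)  # register number -> saved value

    def rse_store(self, n):
        """Speculatively copy register n (owned by an inactive procedure) to the backing store."""
        reg = self.regs[n]
        if reg.valid and not reg.backed_up:
            self.backing[n] = reg.value
            reg.backed_up = True

    def rse_load(self, n):
        """Speculatively restore register n from the backing store if it holds no valid data."""
        reg = self.regs[n]
        if not reg.valid and n in self.backing:
            reg.value = self.backing[n]
            reg.valid = True
            reg.backed_up = True

rse = ToyRSE(regs={0: Register(value=7, valid=True), 1: Register()}, backing={1: 99})
rse.rse_store(0)  # register 0 can now be reallocated without a mandatory spill
rse.rse_load(1)   # register 1 is ready if its parent procedure is re-activated
```

Neither call changes what the active procedure sees; each only raises the chance that a later call or return needs no mandatory transfer.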
- FIG. 2 is a schematic representation of a register management system 200 that is suitable for use with the present invention.
- Register management system 200 includes register file 140 , RSE 150 , a memory channel 210 and a backing store 220 .
- Backing store 220 may include, for example, memory locations in one or more of cache(s) 160 and main memory 170 .
- Memory channel 210 may include, for example, bus 180 and/or LSU 134 .
- RSE 150 manages data transfers between stacked register file 140 and backing store 220 .
- the disclosed embodiment of RSE 150 includes state registers 280 to track the status of the speculative and mandatory operations it implements.
- State registers 280 may indicate the next registers targeted by speculative load and store operations (“RSE.LoadReg” and “RSE.StoreReg”, respectively), as well as the location in the backing store associated with the currently active procedure (“RSE.BOF”).
- MSB mode status bit
- register file 140 is a stacked register file that is operated as a circular buffer (dashed line) to store data for current and recently active procedures. The embodiment is illustrated for the case in which data for three procedures, ProcA, ProcB and ProcC, is currently being stored. The figure represents the state of register file 140 after ProcA has called ProcB, which has in turn called ProcC. Each process has been allocated a set of registers in stacked register file 140 .
- ProcC is active.
- the current active frame of stacked register file 140 includes registers 250 , which are allocated to ProcC.
- ProcB, which called ProcC, is inactive.
- ProcA, which called ProcB, is inactive.
- ProcB and ProcA are parent procedures.
- Data is transferred between execution core 130 and registers 250 (the active frame) responsive to the instructions of ProcC.
- RSE 150 implements speculative spill and fill operations on registers 230 and 240, which are allocated to inactive procedures, ProcA and ProcB, respectively.
- Unallocated registers 260, 270 appear above and below allocated registers 230, 240, 250 in register file 140.
- the size of the current active frame is indicated by a size of frame parameter for ProcC (SOF c ).
- the active frame includes registers that are available only to ProcC (local registers) as well as registers that may be used to share data with other procedures (output registers).
- the local registers for ProcC are indicated by a size of locals parameter (SOL c ).
- The actual sizes of the corresponding frames, when active, are indicated through frame-tracking registers, which are discussed in greater detail below.
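The circular-buffer organization described for FIG. 2 can be sketched as follows. This is an illustration only: the register count, the frame sizes and the starting position are hypothetical, and the overlap between a caller's output registers and its callee's frame is ignored for simplicity.

```python
N_PHYS = 16  # hypothetical number of stacked physical registers

def physical_index(bof, offset, n_phys=N_PHYS):
    """Map a frame-relative register offset to a physical register, wrapping around."""
    return (bof + offset) % n_phys

# Hypothetical register counts for the two parent frames and the active frame.
frames = [("ProcA", 4), ("ProcB", 5), ("ProcC", 5)]
bof = 6  # physical register at the bottom of ProcA's frame (arbitrary choice)
layout = {}
for name, size in frames:
    layout[name] = [physical_index(bof, i) for i in range(size)]
    bof = (bof + size) % N_PHYS  # the next frame starts where this one ends

print(layout)
```

With these numbers ProcC's frame wraps past the top of the file and continues at physical register 0, while registers 4 and 5 remain unallocated.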
- FIG. 3 represents a series of register allocation/deallocation operations in response to procedure calls and returns for one embodiment of computer system 100 .
- FIG. 3 illustrates the instructions, register allocation, and frame tracking that occur when ProcB passes control of processor 110 to ProcC and when ProcC returns control of processor 110 to ProcB.
- ProcB is active.
- a current frame marker (CFM) tracks SOF and SOL for the active procedure
- a previous frame marker (PFM) tracks SOF and SOL for the procedure that called the current active procedure.
- the SOF and SOL values for ProcB are stored in PFM and the SOF and SOL values of ProcC are stored in CFM.
- ProcC executes an allocate instruction to acquire additional registers and redistribute the registers of its frame among local and output registers.
- the current active frame for ProcC includes 19 registers, 16 of which are local.
- PFM is unchanged by the allocation instruction.
- ProcC executes a return instruction to return control of the processor to ProcB.
- ProcB's frame is restored using the values from PFM.
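The frame tracking described for FIG. 3 can be sketched as below. Two things here are assumptions rather than statements from the text: the caller's frame values (21 registers, 14 local) are chosen only for illustration, and the callee is assumed to start with a frame made up of the caller's output registers. A single PFM is modeled, so only one level of call and return is covered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameMarker:
    sof: int  # size of frame
    sol: int  # size of locals

class FrameTracker:
    def __init__(self, sof, sol):
        self.cfm = FrameMarker(sof, sol)  # current frame marker
        self.pfm = None                   # previous frame marker

    def call(self):
        # Assumption: the callee initially sees only the caller's output registers.
        self.pfm = self.cfm
        self.cfm = FrameMarker(sof=self.pfm.sof - self.pfm.sol, sol=0)

    def alloc(self, sof, sol):
        # Resize the current frame; PFM is left unchanged.
        self.cfm = FrameMarker(sof, sol)

    def ret(self):
        # Restore the caller's frame from PFM.
        self.cfm = self.pfm

tracker = FrameTracker(sof=21, sol=14)  # hypothetical ProcB frame
tracker.call()                          # ProcB calls ProcC
after_call = tracker.cfm
tracker.alloc(sof=19, sol=16)           # ProcC's frame: 19 registers, 16 local
after_alloc = tracker.cfm
tracker.ret()                           # ProcC returns to ProcB
```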
- the above described procedure-switching may trigger the transfer of data between register file 140 and backing store 220 .
- Load and store operations triggered in response to procedure switching are termed “mandatory”.
- Mandatory store (“spill”) operations occur, for example, when a new procedure requires the use of a large number of registers, and some of these registers store data for another procedure that has yet to be copied to backing store 220.
- RSE 150 issues one or more store operations to save the data to backing store 220 before allocating the registers to the newly activated procedure. This prevents the new procedure from overwriting data in the newly allocated registers.
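The cost of a mandatory spill can be estimated with a small calculation. The sketch below assumes that an allocation consumes registers holding no valid data and registers already backed up before it touches registers whose data has not been saved; the function name and the counts are hypothetical.

```python
def mandatory_spills(requested, n_invalid, n_clean, n_dirty):
    """Return how many registers must be stored before `requested` registers can be allocated."""
    if requested > n_invalid + n_clean + n_dirty:
        raise ValueError("request exceeds the registers outside the current frame")
    from_free = min(requested, n_invalid + n_clean)  # no store needed for these
    return requested - from_free                     # each remaining register is not yet saved

print(mandatory_spills(10, n_invalid=4, n_clean=3, n_dirty=8))  # 3 stores
print(mandatory_spills(6, n_invalid=4, n_clean=3, n_dirty=8))   # 0 stores
```

Every store counted here would delay the new procedure, which is why saving registers ahead of time pays off.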
- the present invention provides a mechanism that speculatively saves and restores (spills and fills) data from registers in inactive frames to reduce the number of stalls generated by mandatory RSE operations. Speculative operations allow the active procedure to use more of the registers in register file 140 without concern for overwriting data from inactive procedures that has yet to be backed-up or evicting data for inactive procedures unnecessarily.
- the register file is partitioned according to the state of the data in different registers. These registers are partitioned as follows:
- the Clean Partition includes registers that store data values from parent procedure frames.
- the registers in this partition have been successfully spilled to the backing store by the RSE and their contents have not been modified since they were written to the backing store.
- the clean partition includes the registers between the next register to be stored by the RSE (RSE.StoreReg) and the next register to be loaded by the RSE (RSE.LoadReg).
- the Dirty Partition includes registers that store data values from parent procedure frames.
- the data in this partition has not yet been spilled to the backing store by the RSE.
- The number of registers in the dirty partition (“ndirty”) is equal to the distance between a pointer to the register at the bottom of the current active frame (RSE.BOF) and a pointer to the next register to be stored by the RSE (RSE.StoreReg).
- the Current Frame includes stacked registers allocated for use by the procedure that currently controls the processor.
- the position of the current frame in the physical stacked register file is defined by RSE.BOF, and the number of registers in the current frame is specified by the size of frame parameter in the current frame marker (CFM.sof).
- the Invalid Partition includes registers outside the current frame that do not store values from parent procedures. Registers in this partition are available for immediate allocation into the current frame or for RSE load operations.
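The partition bookkeeping above reduces to modular pointer arithmetic on the circular register file. The sketch below assumes a particular ordering in increasing physical register number (clean, then dirty, then the current frame, with everything else invalid) and a hypothetical file of 96 stacked registers; neither detail is stated in the text.

```python
def partition_sizes(load_reg, store_reg, bof, sof, n_phys):
    """Sizes of the four partitions of a circular stacked register file.

    Assumes registers are laid out in increasing physical order as
    clean [load_reg, store_reg), dirty [store_reg, bof), current [bof, bof + sof),
    with every remaining register invalid.
    """
    n_clean = (store_reg - load_reg) % n_phys
    n_dirty = (bof - store_reg) % n_phys  # the "ndirty" distance
    n_invalid = n_phys - n_clean - n_dirty - sof
    if n_invalid < 0:
        raise ValueError("pointers describe more registers than exist")
    return {"clean": n_clean, "dirty": n_dirty, "current": sof, "invalid": n_invalid}

sizes = partition_sizes(load_reg=90, store_reg=2, bof=10, sof=19, n_phys=96)
print(sizes)
```

The modulo operation handles the case where a partition wraps around the end of the physical file, as the clean partition does in this example.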
- RSE 150 tracks the register file partitions and initiates speculative load and store operations between the register file and the backing store when the processor has available bandwidth.
- Table 1 summarizes the parameters used to track the partitions and the internal state of the RSE. The parameters are named and defined in the first two columns, respectively, and the parameters that are architecturally visible, e.g. available to software, are indicated in the third column of the table.
- AR represents a set of application registers that may be read or modified by software operating on, e.g., computer system 100.
- The exemplary registers and instructions discussed in conjunction with Tables 1-4 are from the IA64™ Instruction Set Architecture (ISA), which is described in Intel® IA64 Architecture Software Developer's Guide, Volumes 1-4, published by Intel® Corporation of Santa Clara, Calif.

TABLE 1

| Name | Description | Architectural Location |
| --- | --- | --- |
| RSE.N_Stacked_Phys | Number of stacked physical registers in the particular implementation of the register file | none |
| RSE.BOF | Number of the physical register at the bottom of the current frame. For the disclosed embodiment, this physical register is mapped to logical register 32. | AR[BSP] |
| RSE.BspLoad | Points to the 64-bit backing store address that is 8 bytes greater than the next address to be loaded by the RSE | |
| RSE.NATBitIndex | 6-bit wide RNAT collection bit index; defines which RNAT collection bit gets updated | AR[BSPSTORE](8:3) |
| RSE.CFLE | Current frame load enable bit; control bit that permits the RSE to load registers in the current frame after a branch return or return from interrupt (rfi) | |
- FIG. 4 is a schematic representation of the operations implemented by RSE 150 to transfer data speculatively between register file 140 and backing store 220.
- Various partitions 410, 420, 430 and 440 of register file 140 are indicated along with the operations of RSE 150 on these partitions.
- Partition 410 comprises the registers of the current (active) frame, which stores data for ProcC.
- Dirty partition 420 comprises registers that store data from a parent procedure which has not yet been copied to backing store 220.
- Dirty partition 420 is delineated by the registers indicated through RSE.StoreReg and RSE.BOF.
- Dirty partition 420 includes some or all local registers allocated to ProcB and, possibly, ProcA, when the contents of these registers have not yet been copied to backing store 220.
- Clean partition 430 includes local registers whose contents have been copied to backing store 220 and have not been modified in the meantime.
- Clean partition 430 may include registers allocated to ProcA and, possibly, ProcB.
- Invalid partition 440 comprises registers that do not currently store valid data for any procedures.
- RSE 150 monitors processor 110 and executes store operations (RSE_Stores) on registers in dirty partition 420 when bandwidth is available in the memory channel.
- RSE.StoreReg indicates the next register to be spilled to backing store 220. It is incremented as RSE 150 copies data from register file 140 to backing store 220.
- RSE_Stores are opportunistic store operations that expand the size of clean partition 430 at the expense of dirty partition 420.
- RSE_Stores increase the fraction of registers in register file 140 that are backed up in backing store 220. These transfers are speculative because the registers may be reaccessed by the procedure to which they were originally allocated before they are allocated to a new procedure.
- RSE 150 also executes load operations (RSE_Loads) to registers in invalid partition 440 when bandwidth is available in the memory channel. These opportunistic load operations increase the size of clean partition 430 at the expense of invalid partition 440.
- RSE.LoadReg indicates the next register in invalid partition 440 to which RSE 150 restores data. By speculatively repopulating registers in invalid partition 440 with data, RSE 150 reduces the probability that mandatory loads will be necessary to transfer data from backing store 220 to register file 140 when a procedure is (re)activated. The transfer is speculative because another procedure may require allocation of the registers before the procedure associated with the restored data is re-activated.
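The effect of the two opportunistic operations on the partition sizes can be sketched with simple counters. This model is a simplification under stated assumptions: it tracks only how many registers each partition holds, not which registers they are, and it assumes that parent data is always available in the backing store for a load. The class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Partitions:
    dirty: int
    clean: int
    invalid: int

    def rse_store(self):
        """One speculative store: a dirty register becomes clean."""
        if self.dirty == 0:
            return False
        self.dirty -= 1
        self.clean += 1
        return True

    def rse_load(self):
        """One speculative load: an invalid register becomes clean."""
        if self.invalid == 0:
            return False
        self.invalid -= 1
        self.clean += 1
        return True

p = Partitions(dirty=3, clean=2, invalid=1)
while p.rse_store():  # drain the dirty partition while bandwidth lasts
    pass
p.rse_load()
print(p)
```

Both operations move registers into the clean partition and never out of it, so the total number of registers outside the current frame is unchanged.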
- RSE 150 may operate in different modes, depending on the nature of the application that is being executed. In all modes, mandatory spill and fill operations are supported. However, some modes may selectively enable speculative spill operations and speculative fill operations. A mode may be selected depending on the anticipated register needs of the application that is to be executed. For example, a register stack configuration (RSC) register may be used to indicate the mode in which RSE 150 operates. Table 2 identifies four RSE modes, the types of RSE loads and RSE stores enabled for each mode, and a bit pattern associated with the mode.
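A two-bit mode field of the kind described can be sketched as below. Only the name "eager" appears in this text; the other mode names and all of the bit assignments are assumptions chosen for illustration.

```python
from enum import IntFlag

class RSEMode(IntFlag):
    """Two enable bits; names other than EAGER and all bit values are hypothetical."""
    LAZY = 0b00         # mandatory operations only
    SPEC_STORES = 0b01  # speculative RSE_Stores enabled
    SPEC_LOADS = 0b10   # speculative RSE_Loads enabled
    EAGER = 0b11        # both speculative loads and stores enabled

def enabled_ops(mode):
    """List the operation types the RSE may perform in the given mode."""
    ops = ["mandatory"]  # mandatory spills and fills are supported in every mode
    if mode & RSEMode.SPEC_STORES:
        ops.append("RSE_Store")
    if mode & RSEMode.SPEC_LOADS:
        ops.append("RSE_Load")
    return ops

print(enabled_ops(RSEMode.EAGER))
print(enabled_ops(RSEMode.LAZY))
```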
- FIG. 5 is a flowchart representing one embodiment of a method for managing data transfers between a backing store and a register file.
- Method 500 checks 510 for mandatory RSE operations. If a mandatory RSE operation is pending, it is executed. If no mandatory RSE operations are pending, method 500 determines 530 whether there is any available bandwidth in the memory channel. If bandwidth is available 530, one or more speculative RSE operations are executed 540 and the RSE internal state is updated 550. If no bandwidth is available 530, method 500 continues monitoring 510, 530 for mandatory RSE operations and available bandwidth.
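The decision order of method 500 can be sketched as a single step function. The function name, the boolean inputs and the returned labels are illustrative assumptions; the only thing taken from the text is the priority of mandatory work over speculative work and the bandwidth check.

```python
def rse_step(mandatory_pending, bandwidth_available, speculative_ops):
    """One pass through the flow: mandatory work first, then speculative work if bandwidth allows.

    Returns a label for the action taken. `speculative_ops` is a list that is
    consumed from the front when a speculative operation is executed.
    """
    if mandatory_pending:
        return "execute mandatory"
    if bandwidth_available and speculative_ops:
        op = speculative_ops.pop(0)
        return f"execute speculative {op}"
    return "keep monitoring"

ops = ["RSE_Store", "RSE_Load"]
trace = [
    rse_step(True, True, ops),    # mandatory work takes priority
    rse_step(False, False, ops),  # no bandwidth, so nothing is issued
    rse_step(False, True, ops),
    rse_step(False, True, ops),
    rse_step(False, True, ops),   # nothing left to do speculatively
]
print(trace)
```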
- FIG. 6 represents one embodiment of a state machine 600 that may be implemented by RSE 150.
- State machine 600 includes a monitor state 610 , an adjust state 620 and a speculative execution state 630 .
- For purposes of illustration, it is assumed that speculative RSE_Loads and RSE_Stores are both enabled for state machine 600, i.e. it is operating in eager mode.
- In monitor state 610, state machine 600 monitors processor 110 for RSE-related instructions (RI) and available bandwidth (BW).
- RIs are instructions that may alter portions of the architectural state of the processor that are relevant to the RSE (“RSE state”). The RSE may have to stall the processor and implement mandatory spill and fill operations if these adjustments indicate that data/registers are not available in the register stack.
- the disclosed embodiment of state machine 600 transitions to adjust state 620 when an RI is detected and implements changes to the RSE state indicated by the RI. If the RSE state indicates that mandatory spill or fill operations (MOPs) are necessary, these are implemented and the RSE state is adjusted accordingly. If no MOPs are indicated by the state change (!MOP), state machine 600 returns to monitor state 610 .
- RIs include load-register-stack instructions (loadrs), flush-register-stack instructions (flushrs), cover instructions, register allocation instructions (alloc), procedure return instructions (ret) and return-from-interrupt instructions (rfi), all of which may alter the architectural state of processor 110 as well as the internal state of the RSE.
- When bandwidth is available (BW), state machine 600 transitions from monitor state 610 to speculative execution state 630.
- In speculative execution state 630, state machine 600 may execute RSE_Store instructions for inactive register frames and adjust its register tracking parameter (StoreReg) accordingly, or it may execute RSE_Load instructions on inactive register frames and adjust its memory pointer (BspLoad) and register tracking parameter (LoadReg) accordingly.
- State machine 600 transitions from speculative execution state 630 back to monitor state 610 if available bandwidth dries up (!BW). Alternatively, detection of an RI may cause a transition from speculative execution state 630 to adjust state 620 .
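The transitions of state machine 600 can be sketched as a pure function of the current state and three conditions. The enum names and the rule that a detected RI takes priority over available bandwidth in the monitor state are assumptions made for this sketch.

```python
from enum import Enum, auto

class State(Enum):
    MONITOR = auto()      # monitor state 610
    ADJUST = auto()       # adjust state 620
    SPECULATIVE = auto()  # speculative execution state 630

def next_state(state, ri=False, bw=False, mop=False):
    """Return the next state given RI detection, available bandwidth, and pending MOPs."""
    if state is State.ADJUST:
        # Stay while mandatory operations remain; otherwise return to monitoring (!MOP).
        return State.ADJUST if mop else State.MONITOR
    if ri:
        # An RSE-related instruction moves either other state to adjust.
        return State.ADJUST
    # No RI: speculate while bandwidth is available, otherwise monitor (!BW).
    return State.SPECULATIVE if bw else State.MONITOR

state = State.MONITOR
path = []
for event in [dict(bw=True), dict(bw=True), dict(ri=True, bw=True), dict(mop=True), dict()]:
    state = next_state(state, **event)
    path.append(state.name)
print(path)
```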
- the present invention thus provides a register management system that supports more efficient use of a processor's registers.
- a register stack engine employs available bandwidth in the processor-memory channel to speculatively spill and fill registers allocated to inactive procedures. The speculative operations increase the size of the register file's clean partition, reducing the need for mandatory spill and fill operations which may stall processor execution.
Abstract
A computer system is provided having a register stack engine to manage data transfers between a backing store and a register stack. The computer system includes a processor and a memory coupled to the processor through a memory channel. The processor includes a register stack to store data from one or more procedures in one or more frames, respectively. The register stack engine monitors activity on the memory channel and transfers data between selected frames of the register stack and a backing store in the memory responsive to the available bandwidth on the memory channel.
Description
- 1. Technical Field
- The present invention relates to microprocessors and, in particular, to mechanisms for managing data in a register file.
- 2. Background Art
- Modern processors include extensive execution resources to support concurrent processing of multiple instructions. A processor typically includes one or more integer, floating point, branch, and memory execution units to implement integer, floating point, branch, and load/store instructions, respectively. In addition, integer and floating point units typically include register files to maintain data relatively close to the processor core.
- A register file is a high speed storage structure that is used to temporarily store information close to the execution resources of the processor. The operands on which instructions operate are preferentially stored in the entries (“registers”) of the register file, since they can be accessed more quickly from these locations. Data stored in larger, more remote storage structures such as caches or main memory, may take longer to access. The longer access times can reduce the processor's performance. Register files thus serve as a primary source of data for the processor's execution resources, and high performance processors provide large register files to take advantage of their low access latency.
- Register files take up relatively large areas on the processor's die. While improvements in semiconductor processing have reduced the size of the individual storage elements in a register, the wires that move data in and out of these storage elements have not benefited to the same degree. These wires are responsible for a significant portion of the register file's die area, particularly in the case of multi-ported register files. The die area impact of register files limits the size of the register files (and the number of registers) that can be used effectively on a given processor. Although the number of registers employed on succeeding processor generations has increased, so has the amount of data processors handle. For example, superscalar processors include multiple instruction execution pipelines, each of which must be provided with data. In addition, these instruction execution pipelines operate at ever greater speeds. The net result is that the register files remain a relatively scarce resource, and processors must manage the movement of data in and out of these register files carefully to operate at their peak efficiencies.
- Typical register management techniques empty registers to, and load registers from, higher latency storage devices to optimize register usage. The data transfers are often triggered when control of the processor passes from one software procedure to another. For example, data from the registers used by a first procedure that is currently inactive may be emptied or “spilled” to a backing store if an active procedure requires more registers than are currently available in the register file. When control is returned to the first procedure, registers are reallocated to the procedure and loaded or “filled” with the associated data from the backing store.
- The store and load operations that transfer data between the register file and backing store may have relatively long latencies. This is particularly true if the data sought is only available in one of the large caches or main memory or if significant amounts of data must be transferred from anywhere in the memory hierarchy. In these cases, execution of the newly activated procedure is stalled while the data transfers are implemented. Execution stalls halt the progress of instructions through the processor's execution pipeline, degrading the processor's performance.
- The present invention addresses these and other problems related to register file management.
- The present invention may be understood with reference to the following drawings, in which like elements are indicated by like numbers. These drawings are provided to illustrate selected embodiments of the present invention and are not intended to limit the scope of the invention.
- FIG. 1 is a block diagram of one embodiment of a computer system that implements the present invention.
- FIG. 2 is a block diagram representing one embodiment of a register management system in accordance with the present invention.
- FIG. 3 is a schematic representation of register allocation operations for one embodiment of the register file of FIG. 1.
- FIG. 4 is a schematic representation of the operations implemented by the register stack engine between the backing memory and the register file of FIG. 1.
- FIG. 5 is a flowchart representing one embodiment of the method in accordance with the present invention for speculatively executing register spill and fill operations.
- FIG. 6 is a state machine representing one embodiment of the register stack engine in accordance with the present invention.
- The following discussion sets forth numerous specific details to provide a thorough understanding of the invention. However, those of ordinary skill in the art, having the benefit of this disclosure, will appreciate that the invention may be practiced without these specific details. In addition, various well-known methods, procedures, components, and circuits have not been described in detail in order to focus attention on the features of the present invention.
- The present invention provides a mechanism for managing the storage of data in a processor's register files. The mechanism identifies available execution cycles in a processor and uses the available execution cycles to speculatively spill data from and fill data into the registers of a register file. Registers associated with currently inactive procedures are targeted by the speculative spill and fill operations.
- For one embodiment of the invention, the speculative spill and fill operations increase the “clean partition” of the register file, using available bandwidth in the processor-memory channel. Here, “clean partition” refers to registers that store valid data which is also backed up in the memory hierarchy, e.g. a backing store. These registers may be allocated to a new procedure without first spilling them because the data they store has already been backed up. If the registers are not needed for a new procedure, they are available for the procedure to which they were previously allocated without first filling them from the backing store. Speculative spill and fill operations reduce the need for mandatory spill and fill operations, which are triggered in response to procedure calls, returns, returns from interrupts, and the like. Mandatory spill and fill operations may cause the processor to stall if the active procedure cannot make forward progress until the mandatory spill/fill operations complete.
- One embodiment of a computer system in accordance with the present invention includes a processor and a memory coupled to the processor through a memory channel. The processor includes a stacked register file and a register stack engine. The stacked register file stores data for one or more procedures in one or more frames, respectively. The register stack engine monitors activity on the processor-memory channel and transfers data between selected frames of the register file and a backing store responsive to the available bandwidth in the memory channel. For example, the register stack engine may monitor a load/store unit of the processor for empty instruction slots and inject speculative load/store operations for the register file when available instruction slots are identified.
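- The slot-filling idea in the preceding paragraph can be illustrated with a small sketch. It is not taken from the disclosure: the function name `inject_speculative_ops`, the representation of load/store slots as a list, and the use of `None` for an empty slot are all assumptions made for illustration.

```python
from collections import deque


def inject_speculative_ops(lsu_slots, pending_rse_ops):
    """Fill empty load/store slots with queued speculative RSE operations.

    lsu_slots: list of slot contents; a string is a program memory operation,
               None marks an empty instruction slot (hypothetical encoding).
    pending_rse_ops: speculative RSE operations waiting for bandwidth.
    Returns (schedule, leftover), where leftover holds operations that found
    no empty slot.
    """
    queue = deque(pending_rse_ops)
    schedule = []
    for slot in lsu_slots:
        if slot is None and queue:
            # Idle slot: use it for a speculative spill or fill.
            schedule.append(queue.popleft())
        else:
            # Program operations always keep their slot.
            schedule.append(slot)
    return schedule, list(queue)


slots = ["ld r4", None, "st r9", None, None, "ld r7"]
ops = ["RSE_Store", "RSE_Store", "RSE_Load", "RSE_Load"]
schedule, leftover = inject_speculative_ops(slots, ops)
print(schedule)
print(leftover)
```

Program operations are never displaced in this sketch; speculative operations only consume slots that would otherwise go unused, which is the property the paragraph relies on.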
- FIG. 1 is a block diagram of one embodiment of a
computer system 100 in accordance with the present invention. Computer system 100 includes a processor 110 and a main memory 170. Processor 110 includes an instruction cache 120, an execution core 130, one or more register files 140, a register stack engine (RSE) 150, and one or more data caches 160. A load/store execution unit (LSU) 134 is shown in execution core 130. Other components of processor 110 such as rename logic, retirement logic, instruction decoders, arithmetic/logic unit(s) and the like are not shown. A bus 180 provides a communication channel between main memory 170 and the various components of processor 110.
- For the disclosed embodiment of
computer system 100, cache(s) 160 and main memory 170 form a memory hierarchy. Data that is not available in register file 140 may be provided by the first structure in the memory hierarchy in which the data is found. In addition, data that is evicted from register file 140 to accommodate new procedures may be stored in the memory hierarchy until it is needed again. RSE 150 monitors traffic on the memory channel and initiates data transfers between register file(s) 140 and the memory hierarchy when bandwidth is available. For example, RSE 150 may use otherwise idle cycles, i.e. empty instruction slots, on LSU 134 to speculatively execute spill and fill operations. The speculative operations are targeted to increase the portion of data in register file 140 that is backed up in memory 170.
- For one embodiment of the invention,
register file 140 is logically partitioned to store data associated with different procedures in different frames. Portions of these frames may overlap to facilitate data transfers between different procedures. To increase the number of registers available for use by the currently executing procedure, RSE 150 speculatively transfers data for inactive procedures between register file 140 and the memory hierarchy. For example, RSE 150 may store data from registers associated with inactive procedures (RSE_Store) to a backing memory. Here, an inactive or parent procedure is a procedure that called the current active procedure either directly or through one or more intervening procedures. Speculative RSE_Stores increase the probability that copies of data stored in registers are already backed up in the memory hierarchy should the registers be needed for use by an active procedure. Similarly, RSE 150 may load data from the memory hierarchy to registers that do not currently store valid data (RSE_Load). Speculative RSE_Loads increase the probability that the data associated with an inactive (parent) procedure will be available in register file 140 when the procedure is re-activated.
- FIG. 2 is a schematic representation of a
register management system 200 that is suitable for use with the present invention. Register management system 200 includes register file 140, RSE 150, a memory channel 210 and a backing store 220. Backing store 220 may include, for example, memory locations in one or more of cache(s) 160 and main memory 170. Memory channel 210 may include, for example, bus 180 and/or LSU 134.
- RSE 150 manages data transfers between stacked register file 140 and backing store 220. The disclosed embodiment of RSE 150 includes state registers 280 to track the status of the speculative and mandatory operations it implements. State registers 280 may indicate the next registers targeted by speculative load and store operations (“RSE.LoadReg” and “RSE.StoreReg”, respectively), as well as the location in the backing store associated with the currently active procedure (“RSE.BOF”). Also shown in FIG. 2 is an optional mode status bit (“MSB”) that indicates which, if any, of the speculative operations RSE 150 should implement. These features of RSE 150 are discussed below in greater detail.
- The disclosed embodiment of
register file 140 is a stacked register file that is operated as a circular buffer (dashed line) to store data for current and recently active procedures. The embodiment is illustrated for the case in which data for three procedures, ProcA, ProcB and ProcC, is currently being stored. The figure represents the state of register file 140 after ProcA has called ProcB, which has in turn called ProcC. Each procedure has been allocated a set of registers in stacked register file 140.
- In the exemplary state, the instructions of ProcC are currently being executed by
processor 110. That is, ProcC is active. The current active frame of stacked register file 140 includes registers 250, which are allocated to ProcC. ProcB, which called ProcC, is inactive, and ProcA, which called ProcB, is inactive. ProcB and ProcA are parent procedures. For the disclosed embodiment of register management system 200, data is transferred between execution core 130 and registers 250 (the active frame) responsive to the instructions of ProcC. RSE 150 implements speculative spill and fill operations on registers Unallocated registers registers register file 140
- For the disclosed embodiment of
register file 140, the size of the current active frame (registers 250) is indicated by a size of frame parameter for ProcC (SOFc). The active frame includes registers that are available only to ProcC (local registers) as well as registers that may be used to share data with other procedures (output registers). The local registers for ProcC are indicated by a size of locals parameter (SOLc). For inactive procedures, ProcA and ProcB, only local registers are reflected in register file 140 (by SOLa and SOLb, respectively). The actual sizes of the corresponding frames, when active, are indicated through frame-tracking registers, which are discussed in greater detail below.
- FIG. 3 represents a series of register allocation/deallocation operations in response to procedure calls and returns for one embodiment of
computer system 100. In particular, FIG. 3 illustrates the instructions, register allocation, and frame tracking that occur when ProcB passes control of processor 110 to ProcC and when ProcC returns control of processor 110 to ProcB.
- At time (I), the instructions of ProcB are executing on the processor, i.e. ProcB is active. ProcB has a frame size of 21 registers (SOFb=21), of which 14 are local to ProcB (SOLb=14) and 7 are available for sharing. A current frame marker (CFM) tracks SOF and SOL for the active procedure, and a previous frame marker (PFM) tracks SOF and SOL for the procedure that called the current active procedure.
- ProcB calls ProcC, which is initialized with the output registers of ProcB and no local registers (SOLc=0 and SOFc=7) at time (II). For the disclosed embodiment, initialization is accomplished by renaming output registers of ProcB to output registers of ProcC. The SOF and SOL values for ProcB are stored in PFM and the SOF and SOL values of ProcC are stored in CFM.
- ProcC executes an allocate instruction to acquire additional registers and redistribute the registers of its frame among local and output registers. At time (III), following the allocation, the current active frame for ProcC includes 19 registers, 16 of which are local. CFM is updated from (SOLc=0 and SOFc=7) to (SOLc=16 and SOFc=19). PFM is unchanged by the allocation instruction. When ProcC completes, it executes a return instruction to return control of the processor to ProcB. At time (IV), following execution of the return instruction, ProcB's frame is restored using the values from PFM.
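- The CFM/PFM bookkeeping at times (I) through (IV) can be traced with a short sketch. The class and method names below are hypothetical, and the model keeps only a single PFM, so it covers the one-level call and return of FIG. 3 rather than nested calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameMarker:
    sof: int  # size of frame (local + output registers)
    sol: int  # size of locals


class FrameTracker:
    """Tracks the current (CFM) and previous (PFM) frame markers."""

    def __init__(self, sof, sol):
        self.cfm = FrameMarker(sof, sol)
        self.pfm = None

    def call(self):
        # The caller's marker is saved in PFM. The callee starts with only
        # the caller's output registers: sof - sol registers, no locals.
        self.pfm = self.cfm
        self.cfm = FrameMarker(sof=self.cfm.sof - self.cfm.sol, sol=0)

    def alloc(self, sof, sol):
        # The allocate instruction resizes the current frame; PFM is untouched.
        if sol > sof:
            raise ValueError("locals cannot exceed the frame size")
        self.cfm = FrameMarker(sof, sol)

    def ret(self):
        # The caller's frame is restored from PFM.
        self.cfm = self.pfm


tracker = FrameTracker(sof=21, sol=14)   # time (I): ProcB active
tracker.call()                           # time (II): ProcB calls ProcC
after_call = tracker.cfm
tracker.alloc(sof=19, sol=16)            # time (III): ProcC allocates
after_alloc = tracker.cfm
tracker.ret()                            # time (IV): return to ProcB
print(after_call, after_alloc, tracker.cfm)
```

With the values from the text, the marker after the call is (SOF=7, SOL=0), after the allocation it is (SOF=19, SOL=16), and after the return it is again (SOF=21, SOL=14).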
- The above described procedure-switching may trigger the transfer of data between
register file 140 and backing store 220. Load and store operations triggered in response to procedure switching are termed “mandatory”. Mandatory store (“spill”) operations occur, for example, when a new procedure requires the use of a large number of registers, and some of these registers store data for another procedure that has yet to be copied to backing store 220. In this case, RSE 150 issues one or more store operations to save the data to backing store 220 before allocating the registers to the newly activated procedure. This prevents the new procedure from overwriting data in the newly allocated registers.
- Mandatory fill operations may occur when the processor returns to a parent procedure if the data associated with the parent procedure has been evicted from the register file to accommodate data for another procedure. In this case,
RSE 150 issues one or more load operations to restore the data to the registers associated with the re-activated parent procedure.
- When forward progress of the newly activated (or re-activated) procedure is blocked by mandatory spill and fill operations, the processor stalls until these operations complete. This reduces the performance of the processor.
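- A rough way to see when an allocation forces mandatory spills is sketched below. The function and its parameters are illustrative assumptions, not part of the disclosure: registers that hold no parent data, or whose data is already backed up, are treated as reusable without a store, and any remaining shortfall has to be spilled first.

```python
def mandatory_spills(requested, n_invalid, n_clean, n_dirty):
    """Count the registers that must be spilled before an allocation.

    requested: additional registers the newly active procedure needs.
    n_invalid: registers outside the current frame holding no parent data.
    n_clean:   parent registers whose data is already in the backing store.
    n_dirty:   parent registers whose data has not been backed up yet.
    """
    shortfall = max(0, requested - (n_invalid + n_clean))
    if shortfall > n_dirty:
        raise ValueError("request exceeds the registers outside the current frame")
    return shortfall


# Same request and the same 20 parent registers, with different amounts
# of that data already backed up.
nothing_backed_up = mandatory_spills(24, n_invalid=8, n_clean=0, n_dirty=20)
partly_backed_up = mandatory_spills(24, n_invalid=8, n_clean=12, n_dirty=8)
fully_backed_up = mandatory_spills(24, n_invalid=8, n_clean=20, n_dirty=0)
print(nothing_backed_up, partly_backed_up, fully_backed_up)
```

In this example the three cases need 16, 4 and 0 mandatory stores: the more parent data is already backed up, the fewer stores stand between the new procedure and its registers.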
- The present invention provides a mechanism that speculatively saves and restores (spills and fills) data from registers in inactive frames to reduce the number of stalls generated by mandatory RSE operations. Speculative operations allow the active procedure to use more of the registers in
register file 140 without concern for overwriting data from inactive procedures that has yet to be backed-up or evicting data for inactive procedures unnecessarily.
- For one embodiment of the invention, the register file is partitioned according to the state of the data in different registers. These registers are partitioned as follows:
- The Clean Partition includes registers that store data values from parent procedure frames. The registers in this partition have been successfully spilled to the backing store by the RSE and their contents have not been modified since they were written to the backing store. For the disclosed embodiment of the register management system, the clean partition includes the registers between the next register to be stored by the RSE (RSE.StoreReg) and the next register to be loaded by the RSE (RSE.LoadReg).
- The Dirty Partition includes registers that store data values from parent procedure frames. The data in this partition has not yet been spilled to the backing store by the RSE. The number of registers in the dirty partition (“ndirty”) is equal to the distance between a pointer to the register at the bottom of the current active frame (RSE.BOF) and a pointer to the next register to be stored by the RSE (RSE.StoreReg).
- The Current Frame includes stacked registers allocated for use by the procedure that currently controls the processor. The position of the current frame in the physical stacked register file is defined by RSE.BOF, and the number of registers in the current frame is specified by the size of frame parameter in the current frame marker (CFM.sof).
- The Invalid Partition includes registers outside the current frame that do not store values from parent procedures. Registers in this partition are available for immediate allocation into the current frame or for RSE load operations.
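- The partition sizes follow from the three pointers and the current frame size by modular arithmetic on the circular register file. The sketch below assumes that physical register numbers increase from RSE.LoadReg through RSE.StoreReg to RSE.BOF (wrapping at the end of the file) and uses a hypothetical file of 96 stacked registers; neither detail is stated in the text above.

```python
def partition_sizes(n_stacked_phys, bof, store_reg, load_reg, sof):
    """Return the size of each partition of a circular stacked register file.

    Assumed ordering (modulo n_stacked_phys):
        load_reg .. store_reg - 1   clean partition
        store_reg .. bof - 1        dirty partition
        bof .. bof + sof - 1        current frame
    Everything else is the invalid partition.
    """
    n = n_stacked_phys
    ndirty = (bof - store_reg) % n       # distance from StoreReg up to BOF
    nclean = (store_reg - load_reg) % n  # distance from LoadReg up to StoreReg
    ninvalid = n - sof - ndirty - nclean
    if ninvalid < 0:
        raise ValueError("pointers and frame size are inconsistent")
    return {"current": sof, "dirty": ndirty, "clean": nclean, "invalid": ninvalid}


# Hypothetical state: the clean partition wraps around the end of the file.
sizes = partition_sizes(n_stacked_phys=96, bof=20, store_reg=4, load_reg=90, sof=19)
print(sizes)
```

Because only distances modulo the file size are used, the same arithmetic works whether or not a partition wraps around the end of the physical register file.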
- For one embodiment of the invention,
RSE 150 tracks the register file partitions and initiates speculative load and store operations between the register file and the backing store when the processor has available bandwidth. Table 1 summarizes the parameters used to track the partitions and the internal state of the RSE. The parameters are named and defined in the first two columns, respectively, and the parameters that are architecturally visible, e.g. available to software, are indicated in the third column of the table. Here, AR represents a set of application registers that may be read or modified by software operating on, e.g., computer system 100. The exemplary registers and instructions discussed in conjunction with Tables 1-4 are from the IA64™ Instruction Set Architecture (ISA), which is described in Intel® IA64 Architecture Software Developer's Guide, Volumes 1-4, published by Intel® Corporation of Santa Clara, Calif.
TABLE 1
Name | Description | Architectural Location |
---|---|---|
RSE.N_Stacked_Phys | Number of stacked physical registers in the particular implementation of the register file | |
RSE.BOF | Number of the physical register at the bottom of the current frame. For the disclosed embodiment, this physical register is mapped to logical register 32. | AR[BSP] |
RSE.StoreReg | Physical register number of the next register to be stored by the RSE | AR[BSPSTORE] |
RSE.LoadReg | Physical register number that is one greater than the next register to be loaded (modulo N_Stacked_Phys). | RSE.BspLoad |
RSE.BspLoad | Points to the 64-bit backing store address that is 8 bytes greater than the next address to be loaded by the RSE | |
RSE.NATBitIndex | 6-bit wide RNAT collection bit index; defines which RNAT collection bit gets updated | AR[BSPSTORE]{8:3} |
RSE.CFLE | Current frame load enable bit; control bit that permits the RSE to load registers in the current frame after a branch return or return from interrupt (rfi) | |
- FIG. 4 is a schematic representation of the operations implemented by
RSE 150 to transfer data speculatively between register file 140 and backing store 220. Various partitions of register file 140 are indicated along with the operations of RSE 150 on these partitions. For the disclosed embodiment, partition 410 comprises the registers of the current (active) frame, which stores data for ProcC.
- Dirty partition 420 comprises registers that store data from a parent procedure which has not yet been copied to backing store 220. For the disclosed embodiment of register management system 200, dirty partition 420 is delineated by the registers indicated through RSE.StoreReg and RSE.BOF. For the example of FIG. 2, dirty partition 420 includes some or all local registers allocated to ProcB and, possibly, ProcA, when the contents of these registers have not yet been copied to backing store 220.
- Clean partition 430 includes local registers whose contents have been copied to backing store 220 and have not been modified in the meantime. For the example of FIG. 2, clean partition 430 may include registers allocated to ProcA and, possibly, ProcB. Invalid partition 440 comprises registers that do not currently store valid data for any procedures.
- RSE 150 monitors processor 110 and executes store operations (RSE_Stores) on registers in dirty partition 420 when bandwidth is available in the memory channel. For the disclosed embodiment of the invention, RSE.StoreReg indicates the next register to be spilled to backing store 220. It is incremented as RSE 150 copies data from register file 140 to backing store 220. RSE_Stores are opportunistic store operations that expand the size of clean partition 430 at the expense of dirty partition 420. RSE_Stores increase the fraction of registers in register file 140 that are backed up in backing store 220. These transfers are speculative because the registers may be reaccessed by the procedure to which they were originally allocated before they are allocated to a new procedure.
- RSE 150 also executes load operations (RSE_Loads) to registers in invalid partition 440, when bandwidth is available in the memory channel. These opportunistic load operations increase the size of clean partition 430 at the expense of invalid partition 440. For the disclosed embodiment, RSE.LoadReg indicates the next register in invalid partition 440 to which RSE 150 restores data. By speculatively repopulating registers in invalid partition 440 with data, RSE 150 reduces the probability that mandatory loads will be necessary to transfer data from backing store 220 to register file 140 when a new procedure is (re)activated. The transfer is speculative because another procedure may require allocation of the registers before the procedure associated with the restored data is re-activated.
- For one embodiment of the invention,
RSE 150 may operate in different modes, depending on the nature of the application that is being executed. In all modes, mandatory spill and fill operations are supported. However, some modes may selectively enable speculative spill operations and speculative fill operations. A mode may be selected depending on the anticipated register needs of the application that is to be executed. For example, a register stack configuration (RSC) register may be used to indicate the mode in which RSE 150 operates. Table 2 identifies four RSE modes, the types of RSE loads and RSE stores enabled for each mode, and a bit pattern associated with the mode.
TABLE 2
RSE Mode | RSE Loads | RSE Stores | RSC.mode |
---|---|---|---|
Enforced Lazy Mode | Mandatory | Mandatory | 00 |
Store Intensive Mode | Mandatory | Mandatory + Speculative | 01 |
Load Intensive Mode | Mandatory + Speculative | Mandatory | 10 |
Eager Mode | Mandatory + Speculative | Mandatory + Speculative | 11 |
- FIG. 5 is a flowchart representing one embodiment of a method for managing data transfers between a backing store and a register file. Method 500
checks 510 for mandatory RSE operations. If a mandatory RSE operation is pending, it is executed. If no mandatory RSE operations are pending, method 500 determines 530 whether there is any available bandwidth in the memory channel. If bandwidth is available 530, one or more speculative RSE operations are executed 540 and the RSE internal state is updated 550. If no bandwidth is available 530, method 500 continues monitoring 510, 530 for mandatory RSE operations and available bandwidth.
- FIG. 6 represents one embodiment of a state machine 600 that may be implemented by RSE 150. State machine 600 includes a
monitor state 610, an adjust state 620 and a speculative execution state 630. For purposes of illustration, it is assumed that speculative RSE_Loads and RSE_Stores are both enabled for state machine 600, i.e. it is operating in eager mode.
- In
monitor state 610, state machine 600 monitors processor 110 for RSE-related instructions (RI) and available bandwidth (BW). RIs are instructions that may alter portions of the architectural state of the processor that are relevant to the RSE (“RSE state”). The RSE may have to stall the processor and implement mandatory spill and fill operations if these adjustments indicate that data/registers are not available in the register stack. The disclosed embodiment of state machine 600 transitions to adjust state 620 when an RI is detected and implements changes to the RSE state indicated by the RI. If the RSE state indicates that mandatory spill or fill operations (MOPs) are necessary, these are implemented and the RSE state is adjusted accordingly. If no MOPs are indicated by the state change (!MOP), state machine 600 returns to monitor state 610.
- For one embodiment of the invention, RIs include load-register-stack instructions (loadrs), flush-register-stack instructions (flushrs), cover instructions, register allocation instructions (alloc), procedure return instructions (ret) and return-from-interrupt instructions (rfi) that may alter the architectural state of
processor 110 as well as the internal state of the RSE. The effects of various RIs on the processor state for one embodiment of register management system 200 are summarized below in Tables 3 and 4.
- If no RIs are detected and bandwidth is available for speculative RSE operations (BW && !RIs), state machine 600 transitions from
monitor state 610 to speculative execution state 630. In state 630, state machine 600 may execute RSE_Store instructions for inactive register frames and adjust its register tracking parameter (StoreReg) accordingly, or it may execute RSE_Load instructions on inactive register frames and adjust its memory pointer (BspLoad) and register tracking parameter (LoadReg) accordingly.
- State machine 600 transitions from
speculative execution state 630 back to monitor state 610 if available bandwidth dries up (!BW). Alternatively, detection of an RI may cause a transition from speculative execution state 630 to adjust state 620.
TABLE 3
Affected State | Alloc (r1 = ar.pfs, i, l, o, r) | Branch-Call | Branch-Return | RFI (CR[IFS].v = 1) |
---|---|---|---|---|
AR[BSP]{63:3} | Unchanged | AR[BSP]{63:3} + CFM.sol + (AR[BSP]{8:3} + CFM.sol)/63 | AR[BSP]{63:3} − AR[PFS].pfm.sol − (62 − AR[BSP]{8:3} + AR[PFS].pfm.sol)/63 | AR[BSP]{63:3} − CR[IFS].ifm.sof − (62 − AR[BSP]{8:3} + CR[IFS].ifm.sof)/63 |
AR[PFS] | Unchanged | AR[PFS].pfm = CFM; AR[PFS].pec = AR[EC]; AR[PFS].ppl = PSR.cpl | Unchanged | Unchanged |
GR[r1] | AR[PFS] | N/A | N/A | N/A |
CFM | CFM.sof = i + l + o; CFM.sol = i + l; CFM.sor = r >> 3 | CFM.sof = CFM.sof − CFM.sol; CFM.sol = 0; CFM.sor = 0; CFM.rrb.gr = 0; CFM.rrb.fr = 0; CFM.rrb.p = 0 | AR[PFS].pfm | CR[IFS].ifm OR CFM.sof = 0; CFM.sol = 0; CFM.sor = 0; CFM.rrb.gr = 0; CFM.rrb.fr = 0; CFM.rrb.p = 0 |
TABLE 4
Affected State | Cover | Flushrs | Loadrs |
---|---|---|---|
AR[BSP]{63:3} | AR[BSP]{63:3} + CFM.sof + (AR[BSP]{8:3} + CFM.sof)/63 | Unchanged | Unchanged |
AR[BSPSTORE]{63:3} | Unchanged | AR[BSP]{63:3} | AR[BSP]{63:3} − AR[RSC].loadrs{14:3} |
RSE.BspLoad{63:3} | Unchanged | Model specific | AR[BSP]{63:3} − AR[RSC].loadrs{14:3} |
AR[RNAT] | Unchanged | Updated | Undefined |
RSE.RNATBitIndex | Unchanged | AR[BSPSTORE]{8:3} | AR[BSPSTORE]{8:3} |
CR[IFS] | If (PSR.ic == 0) {CR[IFS].ifm = CFM; CR[IFS].v = 1} | Unchanged | Unchanged |
CFM | CFM.sof = 0; CFM.sol = 0; CFM.sor = 0; CFM.rrb.gr = 0; CFM.rrb.fr = 0; CFM.rrb.p = 0 | Unchanged | Unchanged |
- The present invention thus provides a register management system that supports more efficient use of a processor's registers. A register stack engine employs available bandwidth in the processor-memory channel to speculatively spill and fill registers allocated to inactive procedures. The speculative operations increase the size of the register file's clean partition, reducing the need for mandatory spill and fill operations which may stall processor execution.
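- The transitions described for state machine 600, together with the mode bits of Table 2, can be summarized in a short sketch. Several points are assumptions rather than statements from the disclosure: the function name `next_state`, the choice to give an RI priority over loss of bandwidth in the speculative state, remaining in the adjust state while mandatory operations are pending, and gating entry to the speculative state on the mode (the description of FIG. 6 assumes eager mode).

```python
from enum import Enum


class State(Enum):
    MONITOR = 610
    ADJUST = 620
    SPECULATIVE = 630


# RSC.mode bit pattern -> (speculative loads enabled, speculative stores enabled)
MODES = {
    0b00: (False, False),  # enforced lazy mode
    0b01: (False, True),   # store intensive mode
    0b10: (True, False),   # load intensive mode
    0b11: (True, True),    # eager mode
}


def next_state(state, ri, bw, mop=False, mode=0b11):
    """Return the next state given the inputs seen in the current state.

    ri:  an RSE-related instruction was detected.
    bw:  bandwidth is available in the memory channel.
    mop: mandatory spill/fill operations are still pending (adjust state).
    """
    speculation_enabled = any(MODES[mode])
    if state is State.MONITOR:
        if ri:
            return State.ADJUST
        if bw and speculation_enabled:
            return State.SPECULATIVE
        return State.MONITOR
    if state is State.ADJUST:
        # Assumed: stay here until mandatory operations are done (!MOP).
        return State.ADJUST if mop else State.MONITOR
    # Speculative execution state.
    if ri:
        return State.ADJUST  # assumed to take priority over !BW
    if not bw:
        return State.MONITOR
    return State.SPECULATIVE


# (ri, bw, mop) inputs for a short hypothetical run starting in the monitor state.
inputs = [(False, True, False), (False, True, False), (True, True, False),
          (False, False, True), (False, False, False)]
state = State.MONITOR
trace = []
for ri, bw, mop in inputs:
    state = next_state(state, ri, bw, mop)
    trace.append(state.name)
print(trace)
```

For the inputs shown, the engine enters the speculative state while bandwidth is free, moves to the adjust state when an RI arrives, waits there for the mandatory operation, and then returns to monitoring.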
- The disclosed embodiments of the present invention are provided solely for purposes of illustration. Persons skilled in the art of computer architecture and having the benefit of this disclosure will recognize variations on the disclosed embodiments that fall within the spirit of the present invention. The scope of the present invention should be limited only by the appended claims.
Claims (21)
1. A computer system comprising:
a memory;
a register file coupled to the memory through a memory channel, the register file to store data for one or more procedures in one or more frames, respectively; and
a register stack engine to monitor activity on the memory channel and to transfer data between selected frames of the register file and the memory responsive to available bandwidth on the memory channel.
2. The computer system of claim 1, wherein the memory includes a backing store and the register stack engine transfers data between the selected frames and the backing store.
3. The computer system of claim 1, wherein a portion of the register file is organized as a register stack.
4. The computer system of claim 3, wherein the register stack engine includes a first pointer to indicate a first location in a current frame of the register stack.
5. The computer system of claim 4, wherein the register stack engine includes a second pointer to indicate an oldest dirty register in the register stack.
6. The computer system of claim 5, wherein the register stack engine includes a third pointer to indicate an oldest clean register in the register stack.
7. The computer system of claim 1, wherein registers of the register file are mapped to a current frame and an inactive frame, and the register stack engine transfers data between registers in the inactive frame and the backing store.
8. The computer system of claim 7, wherein the registers mapped to the inactive frame are designated as clean or dirty, according to whether data stored in the registers has or has not been spilled to the memory.
9. The computer system of claim 8, wherein the memory includes a backing store.
10. The computer system of claim 9, wherein the register stack engine transfers data from a dirty register to a corresponding location in the backing store when bandwidth is available on the memory channel.
11. The computer system of claim 9, wherein the register stack engine transfers data to a clean register from a corresponding location in the backing store when bandwidth is available on the memory channel.
12. A method for managing data in a register stack comprising:
designating registers in the register stack as clean or dirty, according to whether data in the registers has been spilled to a backing store;
monitoring operations on a memory channel; and
spilling data from a current oldest dirty register to the backing store when capacity is available on the memory channel.
13. The method of claim 12, further comprising updating a first pointer to indicate a new oldest dirty register when data is spilled from the current oldest dirty register.
14. The method of claim 12, further comprising filling data from the backing store to a current oldest clean register when capacity is available on the memory channel.
15. The method of claim 14, further comprising updating a second pointer to indicate a new oldest clean register when data is filled to the current oldest clean register.
16. A computer system comprising:
a memory system;
a register file to store data for an active procedure and one or more inactive procedures; and
a register stack engine to transfer data between registers associated with the one or more inactive procedures and the memory system, responsive to available bandwidth to the memory system.
17. The computer system of claim 16 , wherein the computer system further comprises a load/store unit and the register stack engine monitors the load/store unit to determine available bandwidth to the memory system.
18. The computer system of claim 16 , wherein the register stack engine includes a first pointer to track a next inactive register to spill to the memory system and a second pointer to track a next inactive register to fill from the memory system responsive to available bandwidth.
19. The computer system of claim 16 , wherein the register stack engine transfers data for inactive procedures responsive to a mode status indicator.
20. The computer system of claim 19 , wherein the register stack engine operates in a lazy mode, a store intensive mode, a load intensive mode, or an eager mode according to the mode status indicator.
21. The computer system of claim 19 , wherein the mode status indicator is set under software control responsive to a type of application to run on the computer system.
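The mode status indicator of claims 19 to 21 can be pictured as a small lookup from mode to the speculative transfers the register stack engine may perform. The sketch below is a hedged illustration, not the claimed implementation: the names `RseMode` and `speculative_ops` are hypothetical, and the mapping assumes the usual reading of the mode names (lazy performs no speculative transfers, store intensive permits speculative spills, load intensive permits speculative fills, eager permits both), which the claims themselves do not spell out.

```python
from enum import Enum


class RseMode(Enum):
    """The four modes named in claim 20 (identifiers are illustrative)."""
    LAZY = "lazy"
    STORE_INTENSIVE = "store intensive"
    LOAD_INTENSIVE = "load intensive"
    EAGER = "eager"


# Assumed mapping from mode to the speculative transfers it allows.
_SPECULATIVE_OPS = {
    RseMode.LAZY: frozenset(),
    RseMode.STORE_INTENSIVE: frozenset({"spill"}),
    RseMode.LOAD_INTENSIVE: frozenset({"fill"}),
    RseMode.EAGER: frozenset({"spill", "fill"}),
}


def speculative_ops(mode, bandwidth_available):
    """Return the speculative transfers permitted for this mode right now."""
    if not bandwidth_available:
        # Claim 16 ties transfers to available bandwidth to the memory system.
        return frozenset()
    return _SPECULATIVE_OPS[mode]
```

Under these assumptions, software setting the indicator for a given type of application (claim 21) would amount to choosing one `RseMode` value before the application runs.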
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/663,247 US20040064632A1 (en) | 2000-04-28 | 2003-09-15 | Register stack engine having speculative load/store modes |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/561,145 US6631452B1 (en) | 2000-04-28 | 2000-04-28 | Register stack engine having speculative load/store modes |
US10/663,247 US20040064632A1 (en) | 2000-04-28 | 2003-09-15 | Register stack engine having speculative load/store modes |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/561,145 Continuation US6631452B1 (en) | 2000-04-28 | 2000-04-28 | Register stack engine having speculative load/store modes |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040064632A1 (en) | 2004-04-01 |
Family ID: 28675695
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/561,145 Expired - Fee Related US6631452B1 (en) | 2000-04-28 | 2000-04-28 | Register stack engine having speculative load/store modes |
US10/663,247 Abandoned US20040064632A1 (en) | 2000-04-28 | 2003-09-15 | Register stack engine having speculative load/store modes |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/561,145 Expired - Fee Related US6631452B1 (en) | 2000-04-28 | 2000-04-28 | Register stack engine having speculative load/store modes |
Country Status (1)
Country | Link |
---|---|
US (2) | US6631452B1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050149918A1 (en) * | 2003-12-29 | 2005-07-07 | Intel Corporation | Inter-procedural allocation of stacked registers for a processor |
US20070094484A1 (en) * | 2005-10-20 | 2007-04-26 | Bohuslav Rychlik | Backing store buffer for the register save engine of a stacked register file |
US20100106206A1 (en) * | 2008-10-24 | 2010-04-29 | Boston Scientific Neuromodulation Corporation | Method to detect proper lead connection in an implantable stimulation system |
US20120123984A1 (en) * | 2010-11-16 | 2012-05-17 | International Business Machines Corporation | Optimal persistence of a business process |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020083309A1 (en) * | 2000-12-21 | 2002-06-27 | Sun Microsystems, Inc. | Hardware spill/fill engine for register windows |
US20050102494A1 (en) * | 2003-11-12 | 2005-05-12 | Grochowski Edward T. | Method and apparatus for register stack implementation using micro-operations |
US7206923B2 (en) * | 2003-12-12 | 2007-04-17 | International Business Machines Corporation | Method and apparatus for eliminating the need for register assignment, allocation, spilling and re-filling |
US20060277396A1 (en) * | 2005-06-06 | 2006-12-07 | Renno Erik K | Memory operations in microprocessors with multiple execution modes and register files |
US7844804B2 (en) * | 2005-11-10 | 2010-11-30 | Qualcomm Incorporated | Expansion of a stacked register file using shadow registers |
US7805573B1 (en) * | 2005-12-20 | 2010-09-28 | Nvidia Corporation | Multi-threaded stack cache |
US9239735B2 (en) * | 2013-07-17 | 2016-01-19 | Texas Instruments Incorporated | Compiler-control method for load speculation in a statically scheduled microprocessor |
US11321242B2 (en) * | 2020-09-15 | 2022-05-03 | Vmware, Inc. | Early acknowledgement of translation lookaside buffer shootdowns |
Citations (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US556024A (en) * | | 1896-03-10 | | Valve |
US5142635A (en) * | 1989-04-07 | 1992-08-25 | Intel Corporation | Method and circuitry for performing multiple stack operations in succession in a pipelined digital computer |
US5420991A (en) * | 1994-01-04 | 1995-05-30 | Intel Corporation | Apparatus and method for maintaining processing consistency in a computer system having multiple processors |
US5463745A (en) * | 1993-12-22 | 1995-10-31 | Intel Corporation | Methods and apparatus for determining the next instruction pointer in an out-of-order execution computer system |
US5479633A (en) * | 1992-10-30 | 1995-12-26 | Intel Corporation | Method of controlling clean-up of a solid state memory disk storing floating sector data |
US5522051A (en) * | 1992-07-29 | 1996-05-28 | Intel Corporation | Method and apparatus for stack manipulation in a pipelined processor |
US5535397A (en) * | 1993-06-30 | 1996-07-09 | Intel Corporation | Method and apparatus for providing a context switch in response to an interrupt in a computer process |
US5574935A (en) * | 1993-12-29 | 1996-11-12 | Intel Corporation | Superscalar processor with a multi-port reorder buffer |
US5640582A (en) * | 1992-05-21 | 1997-06-17 | Intel Corporation | Register stacking in a computer system |
US5680640A (en) * | 1995-09-01 | 1997-10-21 | Emc Corporation | System for migrating data by selecting a first or second transfer means based on the status of a data element map initialized to a predetermined state |
US5751996A (en) * | 1994-09-30 | 1998-05-12 | Intel Corporation | Method and apparatus for processing memory-type information within a microprocessor |
US5778245A (en) * | 1994-03-01 | 1998-07-07 | Intel Corporation | Method and apparatus for dynamic allocation of multiple buffers in a processor |
US5852726A (en) * | 1995-12-19 | 1998-12-22 | Intel Corporation | Method and apparatus for executing two types of instructions that specify registers of a shared logical register file in a stack and a non-stack referenced manner |
US5867602A (en) * | 1994-09-21 | 1999-02-02 | Ricoh Corporation | Reversible wavelet transform and embedded codestream manipulation |
US5976525A (en) * | 1994-10-05 | 1999-11-02 | Antex Biologics Inc. | Method for producing enhanced antigenic enteric bacteria |
US6006318A (en) * | 1995-08-16 | 1999-12-21 | Microunity Systems Engineering, Inc. | General purpose, dynamic partitioning, programmable media processor |
US6035389A (en) * | 1998-08-11 | 2000-03-07 | Intel Corporation | Scheduling instructions with different latencies |
US6065114A (en) * | 1998-04-21 | 2000-05-16 | Idea Corporation | Cover instruction and asynchronous backing store switch |
US6202204B1 (en) * | 1998-03-11 | 2001-03-13 | Intel Corporation | Comprehensive redundant load elimination for architectures supporting control and data speculation |
US6219783B1 (en) * | 1998-04-21 | 2001-04-17 | Idea Corporation | Method and apparatus for executing a flush RS instruction to synchronize a register stack with instructions executed by a processor |
US6243668B1 (en) * | 1998-08-07 | 2001-06-05 | Hewlett-Packard Company | Instruction set interpreter which uses a register stack to efficiently map an application register state |
US6314513B1 (en) * | 1997-09-30 | 2001-11-06 | Intel Corporation | Method and apparatus for transferring data between a register stack and a memory resource |
US6321328B1 (en) * | 1999-03-22 | 2001-11-20 | Hewlett-Packard Company | Processor having data buffer for speculative loads |
US6453388B1 (en) * | 1992-06-17 | 2002-09-17 | Intel Corporation | Computer system having a bus interface unit for prefetching data from system memory |
US6487630B2 (en) * | 1999-02-26 | 2002-11-26 | Intel Corporation | Processor with register stack engine that dynamically spills/fills physical registers to backing store |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5974525A (en) * | 1997-12-05 | 1999-10-26 | Intel Corporation | System for allowing multiple instructions to use the same logical registers by remapping them to separate physical segment registers when the first is being utilized |
Patent Citations (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US556024A (en) * | | 1896-03-10 | | Valve |
US5142635A (en) * | 1989-04-07 | 1992-08-25 | Intel Corporation | Method and circuitry for performing multiple stack operations in succession in a pipelined digital computer |
US5640582A (en) * | 1992-05-21 | 1997-06-17 | Intel Corporation | Register stacking in a computer system |
US6453388B1 (en) * | 1992-06-17 | 2002-09-17 | Intel Corporation | Computer system having a bus interface unit for prefetching data from system memory |
US5522051A (en) * | 1992-07-29 | 1996-05-28 | Intel Corporation | Method and apparatus for stack manipulation in a pipelined processor |
US5479633A (en) * | 1992-10-30 | 1995-12-26 | Intel Corporation | Method of controlling clean-up of a solid state memory disk storing floating sector data |
US5535397A (en) * | 1993-06-30 | 1996-07-09 | Intel Corporation | Method and apparatus for providing a context switch in response to an interrupt in a computer process |
US5463745A (en) * | 1993-12-22 | 1995-10-31 | Intel Corporation | Methods and apparatus for determining the next instruction pointer in an out-of-order execution computer system |
US5574935A (en) * | 1993-12-29 | 1996-11-12 | Intel Corporation | Superscalar processor with a multi-port reorder buffer |
US5420991A (en) * | 1994-01-04 | 1995-05-30 | Intel Corporation | Apparatus and method for maintaining processing consistency in a computer system having multiple processors |
US5778245A (en) * | 1994-03-01 | 1998-07-07 | Intel Corporation | Method and apparatus for dynamic allocation of multiple buffers in a processor |
US5867602A (en) * | 1994-09-21 | 1999-02-02 | Ricoh Corporation | Reversible wavelet transform and embedded codestream manipulation |
US5751996A (en) * | 1994-09-30 | 1998-05-12 | Intel Corporation | Method and apparatus for processing memory-type information within a microprocessor |
US5976525A (en) * | 1994-10-05 | 1999-11-02 | Antex Biologics Inc. | Method for producing enhanced antigenic enteric bacteria |
US6006318A (en) * | 1995-08-16 | 1999-12-21 | Microunity Systems Engineering, Inc. | General purpose, dynamic partitioning, programmable media processor |
US5680640A (en) * | 1995-09-01 | 1997-10-21 | Emc Corporation | System for migrating data by selecting a first or second transfer means based on the status of a data element map initialized to a predetermined state |
US5852726A (en) * | 1995-12-19 | 1998-12-22 | Intel Corporation | Method and apparatus for executing two types of instructions that specify registers of a shared logical register file in a stack and a non-stack referenced manner |
US6314513B1 (en) * | 1997-09-30 | 2001-11-06 | Intel Corporation | Method and apparatus for transferring data between a register stack and a memory resource |
US6202204B1 (en) * | 1998-03-11 | 2001-03-13 | Intel Corporation | Comprehensive redundant load elimination for architectures supporting control and data speculation |
US6065114A (en) * | 1998-04-21 | 2000-05-16 | Idea Corporation | Cover instruction and asynchronous backing store switch |
US6219783B1 (en) * | 1998-04-21 | 2001-04-17 | Idea Corporation | Method and apparatus for executing a flush RS instruction to synchronize a register stack with instructions executed by a processor |
US6367005B1 (en) * | 1998-04-21 | 2002-04-02 | Idea Corporation Of Delaware | System and method for synchronizing a register stack engine (RSE) and backing memory image with a processor's execution of instructions during a state saving context switch |
US6243668B1 (en) * | 1998-08-07 | 2001-06-05 | Hewlett-Packard Company | Instruction set interpreter which uses a register stack to efficiently map an application register state |
US6035389A (en) * | 1998-08-11 | 2000-03-07 | Intel Corporation | Scheduling instructions with different latencies |
US6487630B2 (en) * | 1999-02-26 | 2002-11-26 | Intel Corporation | Processor with register stack engine that dynamically spills/fills physical registers to backing store |
US6321328B1 (en) * | 1999-03-22 | 2001-11-20 | Hewlett-Packard Company | Processor having data buffer for speculative loads |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050149918A1 (en) * | 2003-12-29 | 2005-07-07 | Intel Corporation | Inter-procedural allocation of stacked registers for a processor |
US7120775B2 (en) * | 2003-12-29 | 2006-10-10 | Intel Corporation | Inter-procedural allocation of stacked registers for a processor |
US20070094484A1 (en) * | 2005-10-20 | 2007-04-26 | Bohuslav Rychlik | Backing store buffer for the register save engine of a stacked register file |
US7962731B2 (en) * | 2005-10-20 | 2011-06-14 | Qualcomm Incorporated | Backing store buffer for the register save engine of a stacked register file |
JP2012234556A (en) * | 2005-10-20 | 2012-11-29 | Qualcomm Inc | Backing store buffer for register save engine of stacked register file |
JP2014130606A (en) * | 2005-10-20 | 2014-07-10 | Qualcomm Incorporated | Backing store buffer for register save engine of stacked register file |
US20100106206A1 (en) * | 2008-10-24 | 2010-04-29 | Boston Scientific Neuromodulation Corporation | Method to detect proper lead connection in an implantable stimulation system |
US20120123984A1 (en) * | 2010-11-16 | 2012-05-17 | International Business Machines Corporation | Optimal persistence of a business process |
US8538963B2 (en) * | 2010-11-16 | 2013-09-17 | International Business Machines Corporation | Optimal persistence of a business process |
US8892557B2 (en) | 2010-11-16 | 2014-11-18 | International Business Machines Corporation | Optimal persistence of a business process |
US9569722B2 (en) | 2010-11-16 | 2017-02-14 | International Business Machines Corporation | Optimal persistence of a business process |
Also Published As
Publication number | Publication date |
---|---|
US6631452B1 (en) | 2003-10-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5869009B2 (en) | Backing storage buffer for stacked register file register save engine | |
US6408325B1 (en) | Context switching technique for processors with large register files | |
Mowry et al. | Automatic compiler-inserted I/O prefetching for out-of-core applications | |
US6871219B2 (en) | Dynamic memory placement policies for NUMA architecture | |
US5251308A (en) | Shared memory multiprocessor with data hiding and post-store | |
Marty et al. | Virtual hierarchies to support server consolidation | |
US6871264B2 (en) | System and method for dynamic processor core and cache partitioning on large-scale multithreaded, multiprocessor integrated circuits | |
US7512745B2 (en) | Method for garbage collection in heterogeneous multiprocessor systems | |
US6658564B1 (en) | Reconfigurable programmable logic device computer system | |
US7827374B2 (en) | Relocating page tables | |
US20080005495A1 (en) | Relocation of active DMA pages | |
US7490214B2 (en) | Relocating data from a source page to a target page by marking transaction table entries valid or invalid based on mappings to virtual pages in kernel virtual memory address space | |
US7844804B2 (en) | Expansion of a stacked register file using shadow registers | |
US20020087815A1 (en) | Microprocessor reservation mechanism for a hashed address system | |
US20060026183A1 (en) | Method and system provide concurrent access to a software object | |
US7661115B2 (en) | Method, apparatus and program storage device for preserving locked pages in memory when in user mode | |
WO2005001693A2 (en) | Multiprocessor system with dynamic cache coherency regions | |
US6631452B1 (en) | Register stack engine having speculative load/store modes | |
Stanley et al. | A performance analysis of automatically managed top of stack buffers | |
CA2019300C (en) | Multiprocessor system with shared memory | |
Wilkinson et al. | Angel: A proposed multiprocessor operating system kernel | |
US6842847B2 (en) | Method, apparatus and system for acquiring a plurality of global promotion facilities through execution of an instruction | |
US7017031B2 (en) | Method, apparatus and system for managing released promotion bits | |
Niehaus et al. | Architecture and OS support for predictable real-time systems | |
Russell et al. | A stack-based register set |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |