US6854051B2 - Cycle count replication in a simultaneous and redundantly threaded processor - Google Patents
- Publication number
- US6854051B2 (application US09/839,459)
- Authority
- US
- United States
- Prior art keywords
- processor
- cycle count
- cycle
- thread
- instructions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1405—Saving, restoring, recovering or retrying at machine instruction level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1497—Details of time redundant execution on a single processing unit
Definitions
- the present invention generally relates to microprocessors. More particularly, the present invention relates to a pipelined, multithreaded processor that can execute a program in at least two separate, redundant threads. More particularly still, the invention relates to a method and apparatus for ensuring valid replication of reads from a cycle counter to each redundant thread.
- Solid state electronics such as microprocessors are susceptible to transient hardware faults.
- cosmic rays or alpha particles can alter the voltage levels that represent data values in microprocessors, which typically include millions of transistors.
- Cosmic radiation can change the state of individual transistors, causing faulty operation.
- the frequency of such transient faults is relatively low—typically less than one fault per year per thousand computers. Because of this relatively low failure rate, making computers fault tolerant currently is attractive more for mission-critical applications, such as online transaction processing and the space program, than computers used by average consumers.
- future microprocessors will be more prone to transient fault due to their smaller anticipated size, reduced voltage levels, higher transistor count, and reduced noise margins. Accordingly, even low-end personal computers may benefit from being able to protect against such faults.
- One way to protect solid state electronics from faults resulting from cosmic radiation is to surround the potentially affected electronics with a sufficient amount of concrete. It has been calculated that the energy flux of the cosmic rays can be reduced to acceptable levels with six feet or more of concrete surrounding the computer containing the chips to be protected. For obvious reasons, protecting electronics from faults caused by cosmic rays with six feet of concrete usually is not feasible. Further, computers usually are placed in buildings that have already been constructed without this amount of concrete.
- Rather than attempting to create an impenetrable barrier through which cosmic rays cannot pierce, it is generally more economically feasible and otherwise more desirable to provide the affected electronics with a way to detect and recover from a fault caused by cosmic radiation. In this manner, a cosmic ray may still impact the device and cause a fault, but the device or system in which the device resides can detect and recover from the fault.
- This disclosure focuses on enabling microprocessors (referred to throughout this disclosure simply as “processors”) to recover from a fault condition.
- Lockstepped processors have their clock cycles synchronized and both processors are provided with identical inputs (i.e., the same instructions to execute, the same data, etc.).
- a checker circuit compares the processors' output data (which may also include memory addresses for store instructions). The output data from the two processors should be identical because the processors are processing the same data using the same instructions, unless of course a fault exists. If an output data mismatch occurs, the checker circuit flags an error and initiates a software or hardware recovery sequence. Thus, if one processor has been affected by a transient fault, its output likely will differ from that of the other synchronized processor.
- lockstepped processors are generally satisfactory for creating a fault tolerant environment, implementing fault tolerance with two processors takes up valuable real estate.
- a “pipelined” processor includes a series of functional units (e.g., fetch unit, decode unit, execution units, etc.), arranged so that several units can be simultaneously processing an appropriate part of several instructions. Thus, while one instruction is being decoded, an earlier fetched instruction can be executed.
- a “simultaneous multithreaded” (“SMT”) processor permits instructions from two or more different program threads (e.g., applications) to be processed through the processor simultaneously.
- An “out-of-order” processor permits instructions to be processed in an order that is different than the order in which the instructions are provided in the program (referred to as “program order”). Out-of-order processing potentially increases the throughput efficiency of the processor. Accordingly, an SMT processor can process two programs simultaneously.
- SMT processor can be modified so that the same program is simultaneously executed in two separate threads to provide fault tolerance within a single processor.
- Such a processor is called a simultaneous and redundantly threaded (“SRT”) processor.
- Executing the same program in two different threads permits the processor to detect faults such as may be caused by cosmic radiation, noted above.
- By comparing the output data from the two threads at appropriate times and locations within the SRT processor it is possible to detect whether a fault has occurred. For example, data written to cache memory or registers that should be identical from corresponding instructions in the two threads can be compared. If the output data matches, there is no fault. Alternatively, if there is a mismatch in the output data, a fault has presumably occurred in one or both of the threads.
- Cache misses occur when an instruction requests data from memory that is not also available in cache memory.
- the processor first checks whether the requested data already resides in the faster access cache memory, which generally is onboard the processor die. If the requested data is not present in cache (a condition referred to as a cache “miss”), then the processor is forced to retrieve the data from main system memory, which takes more time than retrieving it from the faster onboard cache, thereby causing latency. Because the two threads are executing the same instructions, any instruction in one thread that results in a cache miss will also experience the same cache miss when that same instruction is executed in the other thread. That is, the cache latency will be present in both threads.
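- The cache-first lookup described above can be sketched as follows; the latency figures (1 cycle for a hit, 100 for a miss) and function names are illustrative assumptions, not details from the patent.

```python
def load(address, cache, memory):
    # Check the faster on-die cache first; fall back to main system
    # memory on a miss and fill the cache line for later reuse.
    # The latency numbers are illustrative, not from the patent.
    if address in cache:
        return cache[address], 1      # cache hit: fast access
    value = memory[address]           # cache miss: slow main-memory access
    cache[address] = value
    return value, 100

memory = {0x10: 7}
cache = {}
first = load(0x10, cache, memory)    # first access misses
second = load(0x10, cache, memory)   # repeat access hits
```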
- a branch instruction requires program execution either to continue with the instruction immediately following the branch instruction if a certain condition is met, or branch to a different instruction if the particular condition is not met. Accordingly, the outcome of a branch instruction is not known until the instruction is executed.
- a branch instruction (or any instruction for that matter) may not be executed for at least several, and perhaps many, clock cycles after the branch instruction is fetched by the fetch unit in the processor.
- branch prediction logic which predicts the outcome of a branch instruction before it is actually executed (also referred to as “speculating”). Branch prediction logic generally bases its speculation on short or long term history.
- a processor's fetch unit can speculate the outcome of a branch instruction before it is actually executed.
- the speculation may or may not turn out to be accurate. That is, the branch predictor logic may guess wrong regarding the direction of program execution following a branch instruction. If the speculation proves to have been accurate, which is determined when the branch instruction is executed by the processor, then the next instructions to be executed have already been fetched and are working their way through the pipeline.
- a “misspeculation” occurs when the branch speculation turns out to have been the wrong prediction.
- many or all of the instructions filling the pipeline behind the branch instruction may have to be thrown out (i.e., not executed) because they are not the correct instructions to be executed after the branch instruction.
- the result is a substantial performance hit as the fetch unit must fetch the correct instructions to be processed through the pipeline.
- Suitable branch prediction methods result in correct speculations more often than misspeculations, and the overall performance of the processor is better with a suitable branch predictor (even in the face of some misspeculations) than if no speculation were available at all.
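- History-based prediction of the kind described above is commonly implemented with per-branch two-bit saturating counters. The sketch below shows that common scheme; the patent does not mandate any particular algorithm, so this is an illustrative example only.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: states 0-1 predict
    not-taken, states 2-3 predict taken.  A common history-based
    scheme, shown here for illustration."""

    def __init__(self):
        self.counters = {}

    def predict(self, pc):
        # Branches never seen before default to weakly not-taken (1).
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        # Saturate at 0 and 3 so a single misprediction does not flip
        # a strongly established direction.
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)

predictor = TwoBitPredictor()
outcomes = [True, True, True, False, True]   # a mostly-taken branch
hits = 0
for taken in outcomes:
    if predictor.predict(0x400) == taken:
        hits += 1
    predictor.update(0x400, taken)
```

After the single not-taken outcome, the counter drops only to weakly-taken, so the predictor still speculates correctly on the next taken branch.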
- any branch misspeculation is exacerbated because both threads will experience the same misspeculation. Because the branch misspeculation occurs in both threads, the processor's internal resources usable to each thread are wasted while the wrong instructions are replaced with the correct instructions.
- threads may be separated by a predetermined amount of slack to improve performance.
- one thread is processed ahead of the other thread thereby creating a “slack” of instructions between the two threads so that the instructions in one thread are processed through the processor's pipeline ahead of the corresponding instructions from the other thread.
- the thread whose instructions are processed earlier is called the “leading” thread, while the other thread is the “trailing” thread.
- the processor verifies that inputs to the multiple threads are identical to guarantee that both execution copies or threads follow precisely the same path. Thus, corresponding operations that input data from other locations within the system (e.g., memory, cycle counter), must return the same data values to both redundant threads. Otherwise, the threads may follow divergent execution paths, leading to different outputs that will be detected and handled as if a hardware fault occurred.
- a cycle counter is a running counter that advances once for each tick of the processor clock. Thus, for a 1 GHz processor, the counter will advance once every nanosecond.
- a conventional cycle counter may be a 64-bit counter that counts up from zero to the maximum value and wraps around to zero to continue counting.
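- The count-up-and-wrap behavior of such a 64-bit counter can be sketched as follows; the 64-bit width comes from the patent's example, while the class shape and method names are illustrative.

```python
MASK = (1 << 64) - 1  # 64-bit counter, per the patent's example

class CycleCounter:
    def __init__(self):
        self.value = 0

    def tick(self, cycles=1):
        # Advance once per processor clock tick; masking makes the
        # counter wrap around to zero after its maximum value.
        self.value = (self.value + cycles) & MASK

    def read(self):
        return self.value

counter = CycleCounter()
counter.tick((1 << 64) - 2)   # advance to two ticks before the wrap
counter.tick(5)               # crosses the wrap boundary
```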
- a program that is running on the processor may periodically request the current value of the cycle counter using a read or fetch command.
- Compaq Alpha servers execute an “rpcc” command that is included in the instruction set for Alpha processors.
- the processor may calculate how many clock cycles (and therefore, how much time) elapsed during execution of the instructions.
- the “read cycle counter” command provides a means of measuring system performance.
- the problems noted above are solved in large part by a simultaneous and redundantly threaded processor that can simultaneously execute the same program in two separate threads to provide fault tolerance.
- the system can be made fault tolerant by checking the output data pertaining to corresponding instructions in the threads to ensure that the data matches.
- a data mismatch indicates a fault in the processor affecting one or both of the threads.
- the preferred embodiment of the invention provides an increase in performance to such a fault tolerant, simultaneous and redundantly threaded processor.
- the preferred embodiment includes a pipelined, simultaneous and redundantly threaded (“SRT”) processor, comprising a program counter configured to assign program count identifiers to instructions in each thread, a register update unit configured to store a queue of instructions prior to execution by the processor, load/store units configured to perform load and store operations to or from data locations such as a data cache and data registers, and a cycle counter configured to keep a running count of processor clock cycles.
- the processor is configured to detect transient faults during program execution by executing instructions in at least two redundant copies of a program thread. False errors caused by incorrectly replicating cycle count values in the redundant program threads are avoided by using the actual values from cycle count reads in a first program thread for the second program thread.
- the SRT processor is an out-of-order processor capable of executing instructions in the most efficient order, but read cycle count (“RCC”) instructions are executed in the same order in both the first and second program threads.
- the register update unit is capable of managing program order for the RCC instructions by establishing a dependence with instructions before and after the RCC instructions in the register update unit.
- the SRT processor further comprises a cycle count queue for storing the actual values fetched by RCC instructions in the first program thread.
- the load/store units place a duplicate copy of the cycle count value in the cycle count queue after fetching the cycle count value from the cycle counter.
- the load/store units then access the cycle count queue, and not the cycle counter, to fetch cycle count values in response to corresponding RCC instructions in the second program thread.
- the cycle count queue is preferably a FIFO buffer and individual cycle count entries stored in the cycle count queue comprise: a program count assigned to the RCC instruction by the program counter and a cycle count value that was returned by the corresponding RCC instruction in the leading thread. If the cycle count queue becomes full, the first thread is stalled to prevent more cycle count values from entering the cycle count queue. Conversely, if the cycle count queue becomes empty, the second thread may be stalled to allow cycle count values to enter the cycle count queue.
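- The queue discipline described above — enqueue on leading-thread reads, dequeue on trailing-thread reads, stall when full or empty — might be modeled as follows. The capacity, the stall-by-returning-None convention, and all names are illustrative assumptions, not details from the patent.

```python
from collections import deque

class CycleCountQueue:
    """FIFO of (program_count, cycle_count) pairs, mirroring the two
    fields of FIG. 6.  The capacity of 4 is an assumed example; the
    patent does not fix a queue size."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.fifo = deque()

    def leading_rcc(self, program_count, cycle_counter_value):
        # Leading thread T0: read the real cycle counter, then enqueue
        # a duplicate copy for the trailing thread.  A full queue stalls
        # the leading thread (modeled here by returning None).
        if len(self.fifo) >= self.capacity:
            return None
        self.fifo.append((program_count, cycle_counter_value))
        return cycle_counter_value

    def trailing_rcc(self, program_count):
        # Trailing thread T1: take the replicated value from the queue,
        # never from the live counter.  An empty queue stalls the
        # trailing thread (again modeled by returning None).
        if not self.fifo:
            return None
        pc, value = self.fifo.popleft()
        assert pc == program_count, "RCC instructions must match in program order"
        return value

q = CycleCountQueue()
lead = q.leading_rcc(0x1000, 4)   # leading thread reads cycle count 4
trail = q.trailing_rcc(0x1000)    # trailing thread later sees the same 4
```

Because both threads obtain the value 4 for the corresponding RCC instruction, no spurious output mismatch is generated.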
- An alternative embodiment exists for use in systems that do not have access to a cycle count queue.
- the processor executes the redundant threads with some predetermined amount of slack between the threads.
- the leading thread is halted and the trailing thread is executed until the corresponding RCC command is reached in the trailing thread.
- the load/store units fetch the current cycle count value from the cycle counter and distribute this value to both threads.
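- The alternative embodiment's halt-synchronize-distribute behavior can be sketched as below. Representing the threads as lists of mnemonics and the counter as an iterator is purely illustrative; the point is that the counter is read once per RCC pair and the single value is handed to both threads.

```python
def run_with_synchronized_rcc(thread0, thread1, counter):
    """thread0/thread1: lists of instruction mnemonics ('RCC' or other).
    counter: iterator yielding successive cycle-count values.
    Returns the RCC values observed by each thread -- identical lists,
    because each counter read is shared rather than repeated."""
    results0, results1 = [], []
    i = j = 0
    while i < len(thread0):
        if thread0[i] == "RCC":
            # Halt the leading thread; run the trailing thread up to
            # its corresponding RCC before touching the counter.
            while thread1[j] != "RCC":
                j += 1
            value = next(counter)      # single read of the cycle counter
            results0.append(value)     # distributed to both threads
            results1.append(value)
            j += 1
        i += 1
    return results0, results1

t0 = ["ADD", "RCC", "SUB", "RCC"]
vals0, vals1 = run_with_synchronized_rcc(t0, list(t0), iter([7, 12]))
```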
- FIG. 1 is a diagram of a computer system constructed in accordance with the preferred embodiment of the invention and including a simultaneous and redundantly threaded processor;
- FIG. 2 is a graphical depiction of the input replication and output comparison executed by the simultaneous and redundantly threaded processor according to the preferred embodiment;
- FIG. 3 conceptually illustrates the problem encountered by the multithreaded processor of FIGS. 1 and 2 when corresponding cycle count read commands are issued at different cycle count values;
- FIG. 4 is a block diagram of the simultaneous and redundantly threaded processor from FIG. 1 in accordance with the preferred embodiment that includes a single cycle counter and a cycle count queue;
- FIG. 5 is a diagram of a Register Update Unit in accordance with a preferred embodiment.
- FIG. 6 is a diagram of a Cycle Count Queue in accordance with a preferred embodiment.
- FIG. 1 shows a computer system 90 including a pipelined, simultaneous and redundantly threaded (“SRT”) processor 100 constructed in accordance with the preferred embodiment of the invention.
- computer system 90 also includes dynamic random access memory (“DRAM”) 92 , an input/output (“I/O”) controller 93 , and various I/O devices which may include a floppy drive 94 , a hard drive 95 , a keyboard 96 , and the like.
- the I/O controller 93 provides an interface between processor 100 and the various I/O devices 94 - 96 .
- the DRAM 92 can be any suitable type of memory device such as RAMBUS™ memory.
- SRT processor 100 may also be coupled to other SRT processors if desired in a commonly known “Manhattan” grid, or other suitable architecture.
- the preferred embodiment of the invention ensures correct operation and provides a performance enhancement to SRT processors.
- the preferred SRT processor 100 described above is capable of processing instructions from two different threads simultaneously. Such a processor in fact can be made to execute the same program as two different threads. In other words, the two threads contain the same program set. Processing the same program through the processor in two different threads permits the processor to detect faults caused by cosmic radiation or alpha particles as noted above.
- FIG. 2 conceptually shows the simultaneous and redundant execution of threads 250 , 260 in the processor 100 .
- the threads 250 , 260 are referred to as Thread 0 (“T 0 ”) and Thread 1 (“T 1 ”).
- the processor 100 or a significant portion thereof resides in a sphere of replication 200 , which defines the boundary within which all activity and states are replicated either logically or physically. Values that cross the boundary of the sphere of replication are the outputs and inputs that require comparison 210 and replication 220 , respectively.
- a sphere of replication 200 that includes fewer components may require fewer replications but may also require more output comparisons because more information crosses the boundary of the sphere of replication.
- the preferred sphere of replication is described in conjunction with the discussion of FIG. 4 below.
- All inputs to the sphere of replication 200 must be replicated 220 .
- an input resulting from a memory load command must return the same value to each execution thread 250 , 260 . If two distinctly different values are returned, the threads 250 , 260 may follow divergent execution paths.
- the outputs of both threads 250 , 260 must be compared 210 before the values contained therein are shared with the rest of the system 230 . For instance, each thread may need to write data to memory 92 or send a command to the I/O controller 93 . If the outputs from the threads 250 , 260 are identical, then it is assumed that no transient faults have occurred and a single output is forwarded to the appropriate destination and thread execution continues. Conversely, if the outputs do not match, then appropriate error recovery techniques may be implemented to re-execute and re-verify the “faulty” threads.
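- The output comparison at the boundary of the sphere of replication might be modeled as follows; raising an exception merely stands in for whatever hardware or software recovery sequence is actually used, and the tuple shape of an output is an illustrative assumption.

```python
def checked_output(out_t0, out_t1):
    # Compare corresponding outputs of the two redundant threads before
    # any value leaves the sphere of replication.  On a match, a single
    # copy is forwarded to the rest of the system.
    if out_t0 != out_t1:
        raise RuntimeError("output mismatch: transient fault suspected")
    return out_t0

# Matching outputs pass through; a mismatch triggers recovery.
ok = checked_output(("STORE", 0xBEEF, 42), ("STORE", 0xBEEF, 42))
try:
    checked_output(("STORE", 0xBEEF, 42), ("STORE", 0xBEEF, 43))
    faulted = False
except RuntimeError:
    faulted = True
```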
- the rest of the system 230 which may include such components as memory 92 , I/O devices 93 - 96 , and the operating system need not be aware that two threads of each program are executed by the processor 100 .
- the preferred embodiment generally assumes that all input and output values or commands are transmitted as if only a single thread exists. It is only within the sphere of replication 200 that the input or output data is replicated.
- FIG. 3 illustratively shows the problem with running two separate threads with corresponding “read cycle count” (“RCC”) instructions.
- FIG. 3 shows two distinct, but replicated copies of a program thread T 0 & T 1 presumably executed in the same pipeline. Thread T 0 is arbitrarily designated as the “leading” thread while thread T 1 is designated as the “trailing” thread. The threads may be separated in time by a predetermined slack and may also be executed out of program order.
- an RCC command is issued in the leading thread T 0 that returns a cycle count value of “4”.
- processor 100 preferably comprises a pipelined architecture which includes a series of functional units, arranged so that several units can be simultaneously processing appropriate parts of several instructions.
- the exemplary embodiment of processor 100 includes a fetch unit 102 , one or more program counters 106 , an instruction cache 110 , decode logic 114 , register rename logic 118 , floating point and integer registers 122 , 126 , a register update unit 130 , execution units 134 , 138 , and 142 , a data cache 146 , a cycle counter 148 and a cycle count queue 150 .
- Fetch unit 102 uses a program counter 106 for assistance as to which instruction to fetch. Being a multithreaded processor, the fetch unit 102 preferably can simultaneously fetch instructions from multiple threads. A separate program counter 106 is associated with each thread. Each program counter 106 is a register that contains the address of the next instruction to be fetched from the corresponding thread by the fetch unit 102 . FIG. 4 shows two program counters 106 to permit the simultaneous fetching of instructions from two threads. It should be recognized, however, that additional program counters can be provided to fetch instructions from more than two threads simultaneously.
- fetch unit 102 includes branch prediction logic 103 and a “slack” counter 104 .
- Slack counter 104 is used to create a delay of a desired number of instructions between the threads that include the same instruction set. The introduction of slack permits the leading thread T 0 to resolve all or most branch misspeculations and cache misses so that the corresponding instructions in the trailing thread T 1 will not experience the same latency problems.
- the branch prediction logic 103 permits the fetch unit 102 to speculate ahead on branch instructions as noted above. In order to keep the pipeline full (which is desirable for efficient operation), the branch predictor logic 103 speculates the outcome of a branch instruction before the branch instruction is actually executed. Branch predictor 103 generally bases its speculation on previous instructions. Any suitable speculation algorithm can be used in branch predictor 103 .
- instruction cache 110 provides a temporary storage buffer for the instructions to be executed.
- Decode logic 114 retrieves the instructions from instruction cache 110 and determines the instruction type (e.g., add, subtract, load, store, etc.). Decoded instructions are then passed to the register rename logic 118 which maps logical registers onto a pool of physical registers.
- the register update unit (“RUU”) 130 provides an instruction queue for the instructions to be executed.
- the RUU 130 serves as a combination of global reservation station pool, rename register file, and reorder buffer.
- the RUU 130 breaks load and store instructions into an address portion and a memory (i.e., register) reference.
- the address portion is placed in the RUU 130 , while the memory reference portion is placed into a load/store queue (not specifically shown in FIG. 4 ).
- the RUU 130 also handles out-of-order execution management. As instructions are placed in the RUU 130 , any dependence between instructions (e.g., one instruction depends on the output from another or because branch instructions must be executed in program order) is maintained by placing appropriate dependent instruction numbers in a field associated with each entry in the RUU 130 .
- FIG. 5 provides a simplified representation of the various fields that exist for each entry in the RUU 130 .
- Each instruction in the RUU 130 includes an instruction number, the instruction to be performed, and a dependent instruction number (“DIN”) field. As instructions are executed by the execution units 134 , 138 , 142 , dependency between instructions can be maintained by first checking the DIN field for instructions in the RUU 130 . For example, FIG. 5 shows that instruction I 3 includes the value I 1 in the DIN field, which implies that the execution of I 3 depends on the outcome of I 1 .
- execution units 134 , 138 , 142 recognize that instruction number I 1 must be executed before instruction I 3 . In the example shown in FIG. 5 , the same kind of dependency exists between instructions I 4 and I 3 as well as I 8 and I 7 . Meanwhile, independent instructions (i.e., those with no number in the dependent instruction number field) may be executed out of order.
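- The DIN-based ordering can be sketched as a simple scheduler: an entry may execute only once its dependent instruction number has completed, while independent entries run as soon as they are reached. The entry tuple shape and mnemonics below are illustrative assumptions.

```python
def executable_order(ruu):
    """ruu: list of (instr_number, instruction, din) entries modeled on
    FIG. 5, where din is the dependent instruction number or None.
    Returns one legal execution order respecting the DIN fields."""
    done, order = set(), []
    pending = list(ruu)
    while pending:
        for entry in pending:
            number, _, din = entry
            # Ready when independent (din is None) or dependency done.
            if din is None or din in done:
                done.add(number)
                order.append(number)
                pending.remove(entry)
                break
        else:
            raise RuntimeError("circular dependence in RUU")
    return order

# I3 depends on I1, I4 on I3, I8 on I7 -- mirroring the FIG. 5 example.
ruu = [("I3", "ADD", "I1"), ("I1", "LOAD", None),
       ("I4", "RCC", "I3"), ("I7", "MUL", None), ("I8", "RCC", "I7")]
order = executable_order(ruu)
```

Note that I1 executes before I3 even though I3 sits earlier in the queue, exactly the behavior the DIN field is meant to enforce.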
- the floating point register 122 and integer register 126 are used for the execution of instructions that require the use of such registers as is known by those of ordinary skill in the art. These registers 122 , 126 can be loaded with data from the data cache 146 . The registers also provide their contents to the RUU 130 .
- the execution units 134 , 138 , and 142 comprise a floating point execution unit 134 , a load/store execution unit 138 , and an integer execution unit 142 .
- Each execution unit performs the operation specified by the corresponding instruction type.
- the floating point execution units 134 execute floating point instructions such as multiply and divide instructions, while the integer execution units 142 execute integer-based instructions.
- the load/store units 138 perform load operations in which data from memory is loaded into a register 122 or 126 .
- the load/store units 138 also perform store operations in which data from registers 122 , 126 is written to data cache 146 and/or DRAM memory 92 (FIG. 1 ).
- the load/store units 138 also read the cycle counter 148 in response to read cycle count (“RCC”) commands as they are encountered in a program thread.
- the sphere of replication is represented by the dashed box shown in FIG. 4 .
- the majority of the pipelined processor components are included in the sphere of replication 200 with the notable exception of the instruction cache 110 and the data cache 146 .
- the floating point and integer registers 122 , 126 may alternatively reside outside of the sphere of replication 200 , but for purposes of this discussion, they will remain as shown.
- the cycle counter clock 148 also resides outside of the sphere of replication and therefore, any reads from the cycle counter clock 148 must be replicated for the duplicate threads. Note also that the cycle count queue 150 resides outside the sphere of replication as well.
- the preferred embodiment provides an effective means of replicating cycle counter values returned from an RCC command in the leading thread T 0 and delivering a “copy” to the trailing thread T 1 .
- the load/store units 138 load the current cycle count value from the cycle counter 148 as a conventional processor would.
- in addition, the load/store units 138 of the preferred embodiment load the same cycle count value into the cycle count queue 150 .
- the cycle count queue 150 is preferably a FIFO buffer that stores the cycle count values until the corresponding RCC commands are encountered in the trailing thread T 1 .
- the cycle count queue 150 preferably includes, at a minimum, the fields shown in FIG. 6 : a program count field and a cycle count value field. The program count is used to properly identify the RCC instructions in the queue, and the cycle count value is the value that was retrieved by the leading thread T 0 when the RCC command was issued.
- the program count value field is optional because the FIFO buffer guarantees that cycle count values are retrieved by the trailing thread in the correct order.
- the load/store units 138 read the cycle count value from the cycle count queue 150 (and not the cycle counter 148 ). Since the FIFO buffer delivers the oldest cycle count values first, and assuming the RCC commands are encountered in program order in the trailing thread, the same cycle count values are returned to each thread. The cycle count values are, therefore, properly replicated and erroneous faults are not generated.
- the assumed program order is maintained by creating appropriate dependencies in the RUU 130 (as discussed above) between the RCC commands and instructions immediately before or after the RCC command.
- the SRT processor 100 still comprises load/store units 138 and cycle counter 148 , but the cycle count queue 150 is unnecessary.
- when the load/store units 138 encounter an RCC command in the leading thread, the execution of that command and all subsequent commands in the T 0 thread is temporarily halted.
- SRT processor 100 fetches, executes, and retires instructions exclusively in the trailing thread T 1 until the corresponding RCC command is encountered. At this point, the load/store units 138 will then execute the RCC command and return the cycle count value to both threads T 0 and T 1 .
- if the slack fetch feature is implemented, it must be temporarily disabled to permit synchronization of the threads.
- disabling the slack fetch feature will temporarily eliminate some of the advantages mentioned above, but this alternative embodiment permits implementation in older legacy systems that do not include a FIFO buffer that may be used as a cycle count queue 150 . While this alternative embodiment is the less preferred of the two embodiments presented, it does permit implementation of transient fault detection in an existing computer system.
- the preferred embodiment of the invention provides a method of replicating cycle counter values in an SRT processor that can execute the same instruction set in two different threads.
Claims (21)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/839,459 US6854051B2 (en) | 2000-04-19 | 2001-04-19 | Cycle count replication in a simultaneous and redundantly threaded processor |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US19853000P | 2000-04-19 | 2000-04-19 | |
US09/839,459 US6854051B2 (en) | 2000-04-19 | 2001-04-19 | Cycle count replication in a simultaneous and redundantly threaded processor |
Publications (2)
Publication Number | Publication Date |
---|---|
US20010037445A1 US20010037445A1 (en) | 2001-11-01 |
US6854051B2 true US6854051B2 (en) | 2005-02-08 |
Family
ID=26893873
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/839,459 Expired - Lifetime US6854051B2 (en) | 2000-04-19 | 2001-04-19 | Cycle count replication in a simultaneous and redundantly threaded processor |
Country Status (1)
Country | Link |
---|---|
US (1) | US6854051B2 (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070174837A1 (en) * | 2005-12-30 | 2007-07-26 | Wang Cheng C | Apparatus and method for redundant software thread computation |
US20070234016A1 (en) * | 2006-03-28 | 2007-10-04 | Sun Microsystems, Inc. | Method and system for trace generation using memory index hashing |
US20080077778A1 (en) * | 2006-09-25 | 2008-03-27 | Davis Gordon T | Method and Apparatus for Register Renaming in a Microprocessor |
US20080086597A1 (en) * | 2006-10-05 | 2008-04-10 | Davis Gordon T | Apparatus and Method for Using Branch Prediction Heuristics for Determination of Trace Formation Readiness |
US20080086596A1 (en) * | 2006-10-04 | 2008-04-10 | Davis Gordon T | Apparatus and Method for Supporting Simultaneous Storage of Trace and Standard Cache Lines |
US20080086595A1 (en) * | 2006-10-04 | 2008-04-10 | Davis Gordon T | Apparatus and Method for Saving Power in a Trace Cache |
US20080109687A1 (en) * | 2006-10-25 | 2008-05-08 | Christopher Michael Abernathy | Method and apparatus for correcting data errors |
US20080114964A1 (en) * | 2006-11-14 | 2008-05-15 | Davis Gordon T | Apparatus and Method for Cache Maintenance |
US20080120468A1 (en) * | 2006-11-21 | 2008-05-22 | Davis Gordon T | Instruction Cache Trace Formation |
US20080141000A1 (en) * | 2005-02-10 | 2008-06-12 | Michael Stephen Floyd | Intelligent smt thread hang detect taking into account shared resource contention/blocking |
US20080215804A1 (en) * | 2006-09-25 | 2008-09-04 | Davis Gordon T | Structure for register renaming in a microprocessor |
US20080235500A1 (en) * | 2006-11-21 | 2008-09-25 | Davis Gordon T | Structure for instruction cache trace formation |
US20080244186A1 (en) * | 2006-07-14 | 2008-10-02 | International Business Machines Corporation | Write filter cache method and apparatus for protecting the microprocessor core from soft errors |
US20080250205A1 (en) * | 2006-10-04 | 2008-10-09 | Davis Gordon T | Structure for supporting simultaneous storage of trace and standard cache lines |
US20080250207A1 (en) * | 2006-11-14 | 2008-10-09 | Davis Gordon T | Design structure for cache maintenance |
US20080250206A1 (en) * | 2006-10-05 | 2008-10-09 | Davis Gordon T | Structure for using branch prediction heuristics for determination of trace formation readiness |
US8010846B1 (en) * | 2008-04-30 | 2011-08-30 | Honeywell International Inc. | Scalable self-checking processing platform including processors executing both coupled and uncoupled applications within a frame |
CN101681285B (en) * | 2007-06-20 | 2012-07-25 | 富士通株式会社 | Arithmetic device for concurrently processing a plurality of threads |
US20170192790A1 (en) * | 2016-01-06 | 2017-07-06 | Freescale Semiconductor, Inc. | Providing task-triggered determinisitic operational mode for simultaneous multi-threaded superscalar processor |
US10387296B2 (en) * | 2012-06-27 | 2019-08-20 | Intel Corporation | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs using expressions |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3886870B2 (en) | 2002-09-06 | 2007-02-28 | 株式会社ルネサステクノロジ | Data processing device |
US20050108509A1 (en) * | 2003-11-13 | 2005-05-19 | Safford Kevin D. | Error detection method and system for processors that employs lockstepped concurrent threads |
US7353365B2 (en) * | 2004-09-29 | 2008-04-01 | Intel Corporation | Implementing check instructions in each thread within a redundant multithreading environments |
US7472228B2 (en) * | 2004-10-27 | 2008-12-30 | International Business Machines Corporation | Read-copy update method |
US20070234014A1 (en) * | 2006-03-28 | 2007-10-04 | Ryotaro Kobayashi | Processor apparatus for executing instructions with local slack prediction of instructions and processing method therefor |
US20110099439A1 (en) * | 2009-10-23 | 2011-04-28 | Infineon Technologies Ag | Automatic diverse software generation for use in high integrity systems |
US20110208948A1 (en) * | 2010-02-23 | 2011-08-25 | Infineon Technologies Ag | Reading to and writing from peripherals with temporally separated redundant processor execution |
US8793689B2 (en) * | 2010-06-09 | 2014-07-29 | Intel Corporation | Redundant multithreading processor |
US8516356B2 (en) | 2010-07-20 | 2013-08-20 | Infineon Technologies Ag | Real-time error detection by inverse processing |
US9015655B2 (en) | 2012-10-19 | 2015-04-21 | Northrop Grumman Systems Corporation | Generating a diverse program |
US9996354B2 (en) * | 2015-01-09 | 2018-06-12 | International Business Machines Corporation | Instruction stream tracing of multi-threaded processors |
US10146681B2 (en) * | 2015-12-24 | 2018-12-04 | Intel Corporation | Non-uniform memory access latency adaptations to achieve bandwidth quality of service |
US10114795B2 (en) * | 2016-12-30 | 2018-10-30 | Western Digital Technologies, Inc. | Processor in non-volatile storage memory |
FR3107369B1 (en) * | 2020-02-14 | 2022-02-25 | Thales Sa | ELECTRONIC COMPUTER, ELECTRONIC SYSTEM, METHOD FOR MONITORING THE EXECUTION OF AN APPLICATION AND ASSOCIATED COMPUTER PROGRAM |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5758142A (en) | 1994-05-31 | 1998-05-26 | Digital Equipment Corporation | Trainable apparatus for predicting instruction outcomes in pipelined processors |
US5802265A (en) * | 1995-12-01 | 1998-09-01 | Stratus Computer, Inc. | Transparent fault tolerant computer system |
US5933860A (en) | 1995-02-10 | 1999-08-03 | Digital Equipment Corporation | Multiprobe instruction cache with instruction-based probe hint generation and training whereby the cache bank or way to be accessed next is predicted |
US6357016B1 (en) * | 1999-12-09 | 2002-03-12 | Intel Corporation | Method and apparatus for disabling a clock signal within a multithreaded processor |
US6493740B1 (en) * | 1998-06-16 | 2002-12-10 | Oracle Corporation | Methods and apparatus for multi-thread processing utilizing a single-context architecture |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5758142A (en) | 1994-05-31 | 1998-05-26 | Digital Equipment Corporation | Trainable apparatus for predicting instruction outcomes in pipelined processors |
US5933860A (en) | 1995-02-10 | 1999-08-03 | Digital Equipment Corporation | Multiprobe instruction cache with instruction-based probe hint generation and training whereby the cache bank or way to be accessed next is predicted |
US5802265A (en) * | 1995-12-01 | 1998-09-01 | Stratus Computer, Inc. | Transparent fault tolerant computer system |
US6493740B1 (en) * | 1998-06-16 | 2002-12-10 | Oracle Corporation | Methods and apparatus for multi-thread processing utilizing a single-context architecture |
US6357016B1 (en) * | 1999-12-09 | 2002-03-12 | Intel Corporation | Method and apparatus for disabling a clock signal within a multithreaded processor |
Non-Patent Citations (19)
Title |
---|
A. Mahmood et al., "Concurrent Error Detection Using Watchdog Processors-A Survey," IEEE Trans. on Computers, 37(2):160-174, Feb. 1988. |
AR-SMT: Microarchitectural Approach To Fault Tolerance In Microprocessors, Eric Rotenberg, (8 p.). |
D. A. Reynolds et al., "Fault Detection Capabilities Of Alternating Logic," IEEE Trans. on Computers, 27(12):1093-1098, Dec. 1978. |
D. M. Tullsen, et al., "Simultaneous Multithreading: Maximizing On-Chip Parallelism," Proceedings of the 22nd Annual International Symposium on Computer Architecture, Italy, Jun. 1995. |
D. Tullsen et al., "Exploiting Choice: Instruction Fetch And Issue On An Implementable Simultaneous Multithreading Processor," Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA), May, 1996. |
DIVA: A Dynamic Approach To Microprocessor Verification, Todd M. Austin, Journal of Instruction-Level Parallelism 2 (2000) I-6, Submitted Feb. 2000; published May 2000 (26 p.). |
DIVA: A Reliable Substrate For Deep Submicron Microarchitecture Design, Todd M. Austin, May/Jun. 1999 (12 p.). |
E. Rotenberg et al., "Trace Cache: A Low Latency Approach To High Bandwidth Instruction Fetching," Proceedings of the 29th Annual International Symposium on Microarchitecture, pp. 24-34, Dec. 1996. |
E. Rotenberg et al., "Trace Processors," 30th Annual International Symposium on Microarchitecture (MICRO-30), Dec. 1997. |
G. S. Sohi et al., "A Study Of Time-Redundant Fault Tolerance Techniques For High-Performance Pipelined Computers," Digest of Papers, 19th International Symposium on Fault-Tolerant Computing, pp. 436-443, 1989. |
G. S. Sohi et al., "Instruction Issue Logic For High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers," IEEE Transactions on Computers, 39(3):349-359, Mar. 1990. |
J. E. Smith et al., "Implementing Precise Interrupts In Pipelined Processors," IEEE Trans. on Computers, 37(5):562-573, May 1988. |
J. H. Patel et al., "Concurrent Error Detection In ALU's by Recomputing With Shifted Operands," IEEE Trans. on Computers, 31(7):589-595, Jul. 1982. |
K. Sundaramoorthy et al., "Slipstream Processors: Improving Both Performance And Fault Tolerance" (6 p.). |
L. Spainhower et al., "IBM S/390 Parallel Enterprise Server G5 Fault Tolerance: A Historical Perspective," IBM J. Res. Develop. vol. 43, No. 5/6, Sep./Nov. 1999, pp. 863-873. |
M. Franklin, "A Study Of Time Redundant Fault Tolerance Techniques For Superscalar Processors" (5 p.). |
M. Franklin, "Incorporating Fault Tolerance in Superscalar Processors," Proceedings of High Performance Computing, Dec., 1996. |
S. K. Reinhardt et al., "Transient Fault Detection Via Simultaneous Multithreading" (12 p.). |
T. J. Slegel et al., "IBM's S/390 G5 Microprocessor Design," IEEE Micro, pp. 12-23, Mar./Apr. 1999. |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7725685B2 (en) * | 2005-02-10 | 2010-05-25 | International Business Machines Corporation | Intelligent SMT thread hang detect taking into account shared resource contention/blocking |
US20080141000A1 (en) * | 2005-02-10 | 2008-06-12 | Michael Stephen Floyd | Intelligent smt thread hang detect taking into account shared resource contention/blocking |
US20070174837A1 (en) * | 2005-12-30 | 2007-07-26 | Wang Cheng C | Apparatus and method for redundant software thread computation |
US7818744B2 (en) * | 2005-12-30 | 2010-10-19 | Intel Corporation | Apparatus and method for redundant software thread computation |
US20070234016A1 (en) * | 2006-03-28 | 2007-10-04 | Sun Microsystems, Inc. | Method and system for trace generation using memory index hashing |
US7444499B2 (en) * | 2006-03-28 | 2008-10-28 | Sun Microsystems, Inc. | Method and system for trace generation using memory index hashing |
US20080244186A1 (en) * | 2006-07-14 | 2008-10-02 | International Business Machines Corporation | Write filter cache method and apparatus for protecting the microprocessor core from soft errors |
US7921331B2 (en) * | 2006-07-14 | 2011-04-05 | International Business Machines Corporation | Write filter cache method and apparatus for protecting the microprocessor core from soft errors |
US20080215804A1 (en) * | 2006-09-25 | 2008-09-04 | Davis Gordon T | Structure for register renaming in a microprocessor |
US20080077778A1 (en) * | 2006-09-25 | 2008-03-27 | Davis Gordon T | Method and Apparatus for Register Renaming in a Microprocessor |
US20080086595A1 (en) * | 2006-10-04 | 2008-04-10 | Davis Gordon T | Apparatus and Method for Saving Power in a Trace Cache |
US20080086596A1 (en) * | 2006-10-04 | 2008-04-10 | Davis Gordon T | Apparatus and Method for Supporting Simultaneous Storage of Trace and Standard Cache Lines |
US8386712B2 (en) | 2006-10-04 | 2013-02-26 | International Business Machines Corporation | Structure for supporting simultaneous storage of trace and standard cache lines |
US20080250205A1 (en) * | 2006-10-04 | 2008-10-09 | Davis Gordon T | Structure for supporting simultaneous storage of trace and standard cache lines |
US7644233B2 (en) | 2006-10-04 | 2010-01-05 | International Business Machines Corporation | Apparatus and method for supporting simultaneous storage of trace and standard cache lines |
US7610449B2 (en) | 2006-10-04 | 2009-10-27 | International Business Machines Corporation | Apparatus and method for saving power in a trace cache |
US7996618B2 (en) | 2006-10-05 | 2011-08-09 | International Business Machines Corporation | Apparatus and method for using branch prediction heuristics for determination of trace formation readiness |
US7934081B2 (en) | 2006-10-05 | 2011-04-26 | International Business Machines Corporation | Apparatus and method for using branch prediction heuristics for determination of trace formation readiness |
US20080086597A1 (en) * | 2006-10-05 | 2008-04-10 | Davis Gordon T | Apparatus and Method for Using Branch Prediction Heuristics for Determination of Trace Formation Readiness |
US20080250206A1 (en) * | 2006-10-05 | 2008-10-09 | Davis Gordon T | Structure for using branch prediction heuristics for determination of trace formation readiness |
US20110131394A1 (en) * | 2006-10-05 | 2011-06-02 | International Business Machines Corporation | Apparatus and method for using branch prediction heuristics for determination of trace formation readiness |
US8020072B2 (en) | 2006-10-25 | 2011-09-13 | International Business Machines Corporation | Method and apparatus for correcting data errors |
US20080109687A1 (en) * | 2006-10-25 | 2008-05-08 | Christopher Michael Abernathy | Method and apparatus for correcting data errors |
US20080114964A1 (en) * | 2006-11-14 | 2008-05-15 | Davis Gordon T | Apparatus and Method for Cache Maintenance |
US20080250207A1 (en) * | 2006-11-14 | 2008-10-09 | Davis Gordon T | Design structure for cache maintenance |
US20080120468A1 (en) * | 2006-11-21 | 2008-05-22 | Davis Gordon T | Instruction Cache Trace Formation |
US20080235500A1 (en) * | 2006-11-21 | 2008-09-25 | Davis Gordon T | Structure for instruction cache trace formation |
CN101681285B (en) * | 2007-06-20 | 2012-07-25 | 富士通株式会社 | Arithmetic device for concurrently processing a plurality of threads |
US8010846B1 (en) * | 2008-04-30 | 2011-08-30 | Honeywell International Inc. | Scalable self-checking processing platform including processors executing both coupled and uncoupled applications within a frame |
US10387296B2 (en) * | 2012-06-27 | 2019-08-20 | Intel Corporation | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs using expressions |
US20170192790A1 (en) * | 2016-01-06 | 2017-07-06 | Freescale Semiconductor, Inc. | Providing task-triggered determinisitic operational mode for simultaneous multi-threaded superscalar processor |
US10572261B2 (en) * | 2016-01-06 | 2020-02-25 | Nxp Usa, Inc. | Providing task-triggered deterministic operational mode for simultaneous multi-threaded superscalar processor |
Also Published As
Publication number | Publication date |
---|---|
US20010037445A1 (en) | 2001-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6854051B2 (en) | Cycle count replication in a simultaneous and redundantly threaded processor | |
US6823473B2 (en) | Simultaneous and redundantly threaded processor uncached load address comparator and data value replication circuit | |
US6854075B2 (en) | Simultaneous and redundantly threaded processor store instruction comparator | |
US6792525B2 (en) | Input replicator for interrupts in a simultaneous and redundantly threaded processor | |
US6598122B2 (en) | Active load address buffer | |
US6757811B1 (en) | Slack fetch to improve performance in a simultaneous and redundantly threaded processor | |
US20010037447A1 (en) | Simultaneous and redundantly threaded processor branch outcome queue | |
US6615366B1 (en) | Microprocessor with dual execution core operable in high reliability mode | |
US7590826B2 (en) | Speculative data value usage | |
US6772368B2 (en) | Multiprocessor with pair-wise high reliability mode, and method therefore | |
US6665792B1 (en) | Interface to a memory system for a processor having a replay system | |
US7159154B2 (en) | Technique for synchronizing faults in a processor having a replay system | |
US7865770B2 (en) | Processor including efficient signature generation for logic error protection | |
US20020023202A1 (en) | Load value queue input replication in a simultaneous and redundantly threaded processor | |
US6519730B1 (en) | Computer and error recovery method for the same | |
US20090183035A1 (en) | Processor including hybrid redundancy for logic error protection | |
US20050193283A1 (en) | Buffering unchecked stores for fault detection in redundant multithreading systems using speculative memory support | |
US7861228B2 (en) | Variable delay instruction for implementation of temporal redundancy | |
US6883086B2 (en) | Repair of mis-predicted load values | |
CN108694094B (en) | Apparatus and method for handling memory access operations | |
US9594648B2 (en) | Controlling non-redundant execution in a redundant multithreading (RMT) processor | |
US10817369B2 (en) | Apparatus and method for increasing resilience to faults | |
US10185635B2 (en) | Targeted recovery process | |
US6799285B2 (en) | Self-checking multi-threaded processor | |
US10289332B2 (en) | Apparatus and method for increasing resilience to faults |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:COMPAQ COMPUTER CORPORATION;REEL/FRAME:012478/0358 Effective date: 20010620 |
AS | Assignment |
Owner name: COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MUKHERJEE, SHUBHENDU S.;REEL/FRAME:013087/0450 Effective date: 20020626 |
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: CHANGE OF NAME;ASSIGNOR:COMPAQ INFORMATION TECHNOLOGIES GROUP L.P.;REEL/FRAME:014177/0428 Effective date: 20021001 Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.,TEXAS Free format text: CHANGE OF NAME;ASSIGNOR:COMPAQ INFORMATION TECHNOLOGIES GROUP L.P.;REEL/FRAME:014177/0428 Effective date: 20021001 |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
CC | Certificate of correction | ||
FPAY | Fee payment |
Year of fee payment: 4 |
FPAY | Fee payment |
Year of fee payment: 8 |
AS | Assignment |
Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001 Effective date: 20151027 |
REMI | Maintenance fee reminder mailed | ||
FPAY | Fee payment |
Year of fee payment: 12 |
SULP | Surcharge for late payment |
Year of fee payment: 11 |
AS | Assignment |
Owner name: SONRAI MEMORY, LTD., IRELAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP;HEWLETT PACKARD ENTERPRISE COMPANY;REEL/FRAME:052567/0734 Effective date: 20200423 |
AS | Assignment |
Owner name: NERA INNOVATIONS LIMITED, IRELAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SONRAI MEMORY LIMITED;REEL/FRAME:066778/0178 Effective date: 20240305 |