CA2003342A1 - Memory management in high-performance fault-tolerant computer system - Google Patents

Memory management in high-performance fault-tolerant computer system

Info

Publication number
CA2003342A1
CA2003342A1
Authority
CA
Canada
Prior art keywords
memory
cpus
cpu
global
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002003342A
Other languages
French (fr)
Inventor
Kenneth C. Debacker
Nikhil A. Mehta
John D. Allison
Robert W. Horst
Richard W. Cutts, Jr.
Charles E. Peet, Jr.
Douglas E. Jewett
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tandem Computers Inc
Original Assignee
Tandem Computers Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tandem Computers Inc
Publication of CA2003342A1 publication Critical patent/CA2003342A1/en
Abandoned legal-status Critical Current


Classifications

    • G — PHYSICS › G06 — COMPUTING; CALCULATING OR COUNTING › G06F — ELECTRIC DIGITAL DATA PROCESSING (parent classes shared by the entries below unless noted)
    • G06F 9/3824 — Operand accessing
    • G06F 9/3851 — Instruction issuing from multiple instruction streams, e.g. multistreaming
    • G06F 9/3863 — Recovery, e.g. branch miss-prediction, exception handling, using multiple copies of the architectural state, e.g. shadow registers
    • G06F 9/3885 — Concurrent instruction execution using a plurality of independent parallel functional units
    • G06F 11/0724 — Error or fault processing not based on redundancy, within a central processing unit [CPU] in a multiprocessor or multi-core unit
    • G06F 11/0757 — Error or fault detection not based on redundancy, by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G06F 11/1008 — Adding special bits or symbols to the coded information, e.g. parity check, in individual solid state devices
    • G06F 11/1405 — Saving, restoring, recovering or retrying at machine instruction level
    • G06F 11/1415 — Saving, restoring, recovering or retrying at system level
    • G06F 11/1641 — Error detection by comparing the output of redundant processing systems, where the comparison is not performed by the redundant processing components
    • G06F 11/1658 — Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G06F 11/1666 — Error detection or correction of the data by redundancy in hardware, where the redundant component is memory or a memory area
    • G06F 11/1679 — Temporal synchronisation or re-synchronisation of redundant processing components at clock signal level
    • G06F 11/1683 — Temporal synchronisation or re-synchronisation of redundant processing components at instruction level
    • G06F 11/1687 — Temporal synchronisation or re-synchronisation of redundant processing components at event level, e.g. by interrupt or result of polling
    • G06F 11/1691 — Temporal synchronisation or re-synchronisation of redundant processing components using a quantum
    • G06F 11/18 — Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F 11/181 — Eliminating the failing redundant component
    • G06F 11/182 — Passive fault-masking based on mutual exchange of the output between redundant processing components
    • G06F 11/184 — Passive fault-masking by voting, the voting not being performed by the redundant components, where the redundant components implement processing functionality
    • G06F 11/185 — Passive fault-masking by voting, where the voting is itself performed redundantly
    • G06F 11/20 — Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/2015 — Redundant power supplies
    • G06F 11/2017 — Active fault-masking where memory access, memory control or I/O control functionality is redundant
    • G06F 2201/85 — Active fault masking without idle spares
    • G11C 29/74 (G — PHYSICS › G11 — INFORMATION STORAGE › G11C — STATIC STORES) — Masking faults in memories by using spares or by reconfiguring, using duplex memories, i.e. using dual copies

Abstract

ABSTRACT: A computer system in a fault-tolerant configuration employs three identical CPUs executing the same instruction stream, with two identical, self-checking memory modules storing duplicates of the same data. Memory references by the three CPUs are made by three separate busses connected to three separate ports of each of the two memory modules. The three CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all three CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. Each CPU has its own fast cache and also a local memory not accessible by the other CPUs. A hierarchical virtual memory management arrangement for this system employs demand paging to keep the most-used data in the local memory, page-swapping with the global memory. Page swapping with disk memory is through the global memory; the global memory is used as a disk buffer and also to hold pages likely to be needed for loading to local memory. The operating system kernel is kept in local memory. A private-write area is included in the shared memory space in the memory modules to allow functions such as software voting of state information unique to CPUs. All CPUs write state information to their private-write area, then all CPUs read all the private-write areas for functions such as detecting differences in interrupt cause or the like.

Description


RELATED CASES: This application discloses subject matter also disclosed in copending U.S. patent applications Ser. Nos. 282,538, 282,629, 283,139, and 283,141, filed Dec. 9, 1988, and Ser. No. 283,574, filed Dec. 13, 1988, all assigned to Tandem Computers Incorporated.

BACKGROUND OF THE INVENTION
This invention relates to computer systems, and more particularly to a memory management system used in a fault-tolerant computer having multiple CPUs.

Highly reliable digital processing is achieved in various computer architectures employing redundancy. For example, TMR (triple modular redundancy) systems may employ three CPUs executing the same instruction stream, along with three separate main memory units and separate I/O devices which duplicate functions, so if one of each type of element fails, the system continues to operate. Another fault-tolerant type of system is shown in U.S. Patent 4,228,496, issued to Katzman et al., for "Multiprocessor System", assigned to Tandem Computers Incorporated. Various methods have been used for synchronizing the units in redundant systems; for example, in said prior application Ser. No. 118,503, filed Nov. 9, 1987, by R. W. Horst, for "Method and Apparatus for
Synchronizing a Plurality of Processors", also assigned to Tandem Computers Incorporated, a method of "loose" synchronizing is disclosed, in contrast to other systems which have employed lock-step synchronization using a single clock, as shown in U.S. Patent 4,453,215 for "Central Processing Apparatus for Fault-Tolerant Computing", assigned to Stratus Computer, Inc. A technique called "synchronization voting" is disclosed by Davies & Wakerly in "Synchronization and Matching in Redundant Systems", IEEE Transactions on Computers, June 1978, pp. 531-539. A method for interrupt synchronization in redundant fault-tolerant systems is disclosed by Yoneda et al. in Proceedings of the 15th Annual Symposium on Fault-Tolerant Computing, June 1985, pp. 246-251, "Implementation of Interrupt Handler for Loosely Synchronized TMR Systems". U.S. Patent 4,644,498 for "Fault-Tolerant Real Time Clock" discloses a triple modular redundant clock configuration for use in a TMR computer system. U.S. Patent 4,733,353 for "Frame Synchronization of Multiply Redundant Computers" discloses a synchronization method using separately-clocked CPUs which are periodically synchronized by executing a synch frame.

As high-performance microprocessor devices have become available, using higher clock speeds and providing greater capabilities, such as the Intel 80386 and Motorola 68030 chips operating at 25-MHz clock rates, and as other elements of computer systems such as memory, disk drives, and the like have correspondingly become less expensive and of greater capability, the performance and cost of high-reliability processors has been required to follow the same trends. In addition, standardization on a few operating systems in the computer industry in general has vastly increased the availability of applications software, so a similar demand is made on the field of high-reliability systems; i.e., a standard operating system must be available.

It is therefore the principal object of this invention to provide an improved high-reliability computer system, particularly of the fault-tolerant type. Another object is to provide an improved redundant, fault-tolerant type of computing system, and one in which high performance and reduced cost are both possible;
particularly, it is preferable that the improved system avoid the performance burdens usually associated with highly redundant systems. A further object is to provide a high-reliability computer system in which the performance, measured in reliability as well as speed and software compatibility, is improved but yet at a cost comparable to other alternatives of lower performance. An additional object is to provide a high-reliability computer system which is capable of executing an operating system which uses virtual memory management with demand paging, and having a protected (supervisory or "kernel") mode; particularly an operating system also permitting execution of multiple processes; all at a high level of performance.

SUMMARY OF THE INVENTION
In accordance with one embodiment of the invention, a computer system employs three identical CPUs typically executing the same instruction stream, and has two identical, self-checking memory modules storing duplicates of the same data. A configuration of three CPUs and two memories is therefore employed, rather than three CPUs and three memories as in the classic TMR systems. Memory references by the three CPUs are made by three separate busses connected to three separate ports of each of the two memory modules. In order to avoid imposing the performance burden of fault-tolerant operation on the CPUs themselves, and imposing the expense, complexity and timing problems of fault-tolerant clocking, the three CPUs each have their own separate and independent clocks, but are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; the interrupts are also synchronized to the CPUs, ensuring that the CPUs execute the interrupt at the same point in their instruction stream. The three asynchronous memory references via


the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules at the time of the memory request, but read data is not voted when returned to the CPUs. The two memories both perform all write requests received from either the CPUs or the I/O busses, so that both are kept up-to-date, but only one memory module presents read data back to the CPUs or I/Os in response to read requests; the one memory module producing read data is designated the "primary" and the other is the back-up. Accordingly, incoming data is from only one source and is not voted. The memory requests to the two memory modules are implemented while the voting is still going on, so the read data is available to the CPUs a short delay after the last one of the CPUs makes the request. Even write cycles can be substantially overlapped, because the DRAMs used for these memory modules use a large part of the write access to merely read and refresh, and if not strobed for the last part of the write cycle the read is non-destructive; therefore, a write cycle begins as soon as the first CPU makes a request, but does not complete until the last request has been received and voted good. These features of non-voted read-data returns and overlapped accesses allow fault-tolerant operation at high performance, but yet at minimum complexity and expense.
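The overlap of voting with the access itself can be illustrated with a minimal C sketch. This is not from the patent; the structure and function names are hypothetical, and the point is only the begin-on-first, complete-on-last-and-agreeing discipline described above.

```c
#include <stdbool.h>

/* Illustrative model of one memory-module port set: the three
 * latched requests, one per CPU bus. */
struct mem_request {
    unsigned addr;      /* voted */
    unsigned command;   /* voted */
    bool     latched;   /* has this CPU's request arrived yet? */
};

/* The access is STARTED as soon as the first request arrives... */
bool access_may_start(const struct mem_request port[3])
{
    return port[0].latched || port[1].latched || port[2].latched;
}

/* ...but may only COMPLETE after the last request has arrived and
 * all three agree, so voting overlaps the memory access itself. */
bool access_may_complete(const struct mem_request port[3])
{
    if (!(port[0].latched && port[1].latched && port[2].latched))
        return false;                      /* wait for the slowest CPU */
    return port[0].addr == port[1].addr && port[1].addr == port[2].addr
        && port[0].command == port[1].command
        && port[1].command == port[2].command;
}
```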
I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses, and I/O devices are coupled to pairs of the I/O processors but accessed by only one of the I/O processors. Since one memory module is designated primary, only the I/O bus for this module will be controlling the I/O processors, and I/O traffic between memory module and I/O is not voted. The CPUs can access the I/O processors through the memory modules (each access being voted just as the memory accesses are voted), but the I/O processors can only access the memory modules, not the CPUs; the I/O processors can only send interrupts to the CPUs, and these interrupts are collected in the memory modules before presenting to the CPUs. Thus synchronization overhead for I/O device access is not burdening the CPUs, yet fault tolerance is provided. If an I/O processor fails, the other one of the pair can take over control of the I/O devices for this I/O processor by merely changing the addresses used for the I/O device in the I/O page table maintained by the operating system. In this manner, fault tolerance and reintegration of an I/O device is possible without system shutdown, and yet without the hardware expense and performance penalty associated with voting and the like in these I/O paths.
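The address-substitution fail-over just described lends itself to a short sketch. The table layout, names and sizes below are assumptions for illustration only; the patent does not specify the I/O page table's format.

```c
#include <stdint.h>

#define IO_PAGES 64   /* hypothetical table size */

/* Per-device entry in a hypothetical I/O page table: the physical
 * address of the I/O processor currently used to reach the device. */
static uintptr_t io_page_table[IO_PAGES];

/* Re-point every device reached through the failed I/O processor at
 * its partner.  No voting hardware is involved: fail-over is purely
 * an address substitution performed by the operating system. */
void iop_failover(uintptr_t failed_base, uintptr_t partner_base,
                  uintptr_t span)
{
    for (int i = 0; i < IO_PAGES; i++)
        if (io_page_table[i] >= failed_base &&
            io_page_table[i] < failed_base + span)
            io_page_table[i] = partner_base +
                               (io_page_table[i] - failed_base);
}
```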
The memory system used in the illustrated embodiment is hierarchical at several levels. Each CPU has its own cache, operating at essentially the clock speed of the CPU. Then each CPU has a local memory not accessible by the other CPUs, and virtual memory management allows the kernel of the operating system and pages for the current task to be in local memory for all three CPUs, accessible at high speed without fault-tolerance overhead such as voting or synchronizing imposed. Next is the memory module level, referred to as global memory, where voting and synchronization take place, so some access-time burden is introduced; nevertheless, the speed of the global memory is much faster than disk access, so this level is used for page swapping with local memory to keep the most-used data in the fastest area, rather than employing disk for the first level of demand paging.
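The fault-handling policy implied by this hierarchy can be sketched as follows; the residency flags and helper names are invented for illustration, assuming the local/global/disk ordering the text describes.

```c
#include <stdbool.h>

#define NPAGES 1024
static bool in_local[NPAGES], in_global[NPAGES];  /* residency flags */

/* Resolve a page fault: prefer local memory, fall back to global
 * memory, and only then go to disk -- with global memory acting as
 * the disk buffer, as described above. */
static void handle_page_fault(unsigned vpage)
{
    if (in_local[vpage])
        return;                      /* fastest level already holds it */
    if (!in_global[vpage])
        in_global[vpage] = true;     /* disk -> global (disk buffering) */
    in_local[vpage] = true;          /* global -> local page swap */
}
```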
One of the features of the disclosed embodiment of the invention is the ability to replace faulty components, such as CPU modules or memory modules, without shutting down the system. Thus, the system is available for continuous use even though components may fail and have to be replaced. In addition, the ability to obtain a high level of fault tolerance with fewer system components, e.g., no fault-tolerant clocking needed, only two memory modules needed instead of three, voting circuits minimized, etc., means that there are fewer components to fail, and so the reliability is enhanced. That is, there are fewer failures because there are fewer components, and when there are failures the components are isolated to allow the system to keep running, while the components can be replaced without system shut-down.

The CPUs of this system preferably use a commercially-available high-performance microprocessor chip for which operating systems such as Unix™ are available. The parts of the system which make it fault-tolerant are either transparent to the operating system or easily adapted to the operating system. Accordingly, a high-performance fault-tolerant system is provided which allows compatibility with contemporary widely-used multitasking operating systems and applications software.

Although the memory modules are essentially duplicates of one another, storing the same data, there is still a need in some situations to be able to store data separately by each CPU in a manner such that the data is readable by all CPUs. Of course, the CPUs of the example embodiment have local memory (not in the memory modules but instead on the CPU modules), but this local memory is not accessible by the other CPUs. Thus, according to a feature of one embodiment, an area of private-write memory is included in the shared memory area, so that unique state information can be written by each CPU then read by the others to do a compare operation, for example. The private write is accessed in a manner such that the instruction streams of the CPUs are still identical, and the addresses used are identical, so the integrity of the identical code stream is maintained. Voting of data is suspended when a private-write operation is detected by the memory module, since this data may differ, but the addresses and commands are still voted. The area used for private write may be changed, or eliminated, under control of the instruction stream. Accordingly, the ability to compare unique data is provided in a flexible manner, without bypassing the synchronization and voting mechanisms, and without disturbing the identical nature of the code executed by the multiple CPUs.
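A minimal sketch of the software-voting use of the private-write area follows. It is illustrative only: in the real hardware every CPU issues the same address and the memory module steers the data into a per-CPU slot, whereas this C model indexes slots explicitly for readability.

```c
#include <stdint.h>
#include <stdbool.h>

#define NCPU 3

/* Hypothetical layout of the private-write area in global memory:
 * one state word per CPU, all readable by every CPU. */
struct private_area {
    uint32_t state[NCPU];   /* e.g. each CPU's interrupt-cause word */
};

/* Each CPU runs the SAME code: write its own slot (data voting is
 * suspended by the memory module for this area), then read all three
 * slots back and compare them in software. */
bool state_words_agree(volatile struct private_area *pw, int my_id,
                       uint32_t my_state)
{
    pw->state[my_id] = my_state;         /* private write, not voted */
    /* ...a synchronization point is assumed here, so all three
     * writes have completed before the reads below... */
    return pw->state[0] == pw->state[1] &&
           pw->state[1] == pw->state[2]; /* the software vote */
}
```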
BRIEF DESCRIPTION OF THE DRAWINGS

The features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as other features and advantages thereof, may best be understood by reference to the detailed description of a specific embodiment which follows, when read in conjunction with the accompanying drawings, wherein:

Figure 1 is an electrical diagram in block form of a computer system according to one embodiment of the invention;

Figure 2 is an electrical schematic diagram in block form of one of the CPUs of the system of Figure 1;

Figure 3 is an electrical schematic diagram in block form of one of the microprocessor chips used in the CPU of Figure 2;

Figures 4 and 5 are timing diagrams showing events occurring in the CPU of Figures 2 and 3 as a function of time;

Figure 6 is an electrical schematic diagram in block form of one of the memory modules in the computer system of Figure 1;

Figure 7 is a timing diagram showing events occurring on the CPU-to-memory busses in the system of Figure 1;
Figure 8 is an electrical schematic diagram in block form of one of the I/O processors in the computer system of Figure 1;

Figure 9 is a timing diagram showing events vs. time for the transfer protocol between a memory module and an I/O processor in the system of Figure 1;

Figure 10 is a timing diagram showing events vs. time for execution of instructions in the CPUs of Figures 1, 2 and 3;


Figure 10a is a detail view of a part of the diagram of Figure 10;

Figures 11 and 12 are timing diagrams similar to Figure 10 showing events vs. time for execution of instructions in the CPUs of Figures 1, 2 and 3;

Figure 13 is an electrical schematic diagram in block form of the interrupt synchronization circuit used in the CPU of Figure 2;

Figures 14, 15, 16 and 17 are timing diagrams like Figures 10 or 11 showing events vs. time for execution of instructions in the CPUs of Figures 1, 2 and 3 when an interrupt occurs, illustrating various scenarios;

Figure 18 is a physical memory map of the memories used in the system of Figures 1, 2, 3 and 6;

Figure 19 is a virtual memory map of the CPUs used in the system of Figures 1, 2, 3 and 6;
Figure 20 is a diagram of the format of the virtual address and the TLB entries in the microprocessor chips in the CPU according to Figure 2 or 3;

Figure 21 is an illustration of the private memory locations in the memory map of the global memory modules in the system of Figures 1, 2, 3 and 6; and

Figure 22 is an electrical diagram of a fault-tolerant power supply used with the system of the invention according to one embodiment.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENT

With reference to Figure 1, a computer system using features of the invention is shown in one embodiment having three identical processors 11, 12 and 13, referred to as CPU-A, CPU-B and CPU-C, which operate as one logical processor, all three typically executing the same instruction stream; the only time the three processors are not executing the same instruction stream is in such operations as power-up self test, diagnostics and the like. The three processors are coupled to two memory modules 14 and 15, referred to as Memory-#1 and Memory-#2, each memory storing the same data in the same address space. In a preferred embodiment, each one of the processors 11, 12 and 13 contains its own local memory 16, as well, accessible only by the processor containing this memory. Each one of the processors 11, 12 and 13, as well as each one of the memory modules 14 and 15, has its own separate clock oscillator 17; in this embodiment, the processors are not run in "lock step", but instead are loosely synchronized by a method such as is set forth in the above-mentioned application Ser. No. 118,503, i.e., using events such as external memory references to bring the CPUs into synchronization. External interrupts are synchronized among the three CPUs by a technique employing a set of busses 18 for coupling the interrupt requests and status from each of the processors to the other two; each one of the processors CPU-A, CPU-B and CPU-C is responsive to the three interrupt requests, its own and the two received from the other CPUs, to present an interrupt to the CPUs at the same point in the execution stream. The memory modules 14 and 15 vote the memory references, and allow a memory reference to proceed only when all three CPUs have made the same request (with provision for faults). In this manner, the processors are synchronized at the time of external events (memory references), resulting in the processors typically executing the same instruction stream, in the same sequence, but not necessarily during aligned clock
cycles in the time between synchronization events. In addition, external interrupts are synchronized to be executed at the same point in the instruction stream of each CPU.
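As an informal illustration of this loose synchronization, consider the following C sketch. It is purely a model — the real mechanism is hardware stall logic, and reset and fault handling are omitted here.

```c
#include <stdbool.h>

/* A CPU that reaches an external memory reference raises its arrival
 * flag and then stalls; execution proceeds only when all three CPUs
 * have arrived, so the fastest CPU simply waits for the slowest. */
static volatile bool arrived[3];

static bool all_arrived(void)
{
    return arrived[0] && arrived[1] && arrived[2];
}

void memory_reference_sync_point(int my_id)
{
    arrived[my_id] = true;
    while (!all_arrived())
        ;                  /* stall cycles inserted by hardware */
    /* ...the voted memory reference proceeds here; flags would be
     * reset for the next synchronization event... */
}
```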

The CPU-A processor 11 is connected to the Memory-#1 module 14 and to the Memory-#2 module 15 by a bus 21; likewise CPU-B is connected to the modules 14 and 15 by a bus 22, and CPU-C is connected to the memory modules by a bus 23. These busses 21, 22, 23 each include a 32-bit multiplexed address/data bus, a command bus, and control lines for address and data strobes. The CPUs have control of these busses 21, 22 and 23, so there is no arbitration, or bus-request and bus-grant. Each one of the memory modules 14 and 15 is separately coupled to a respective input/output bus 24 or 25, and each of these busses is coupled to two (or more) input/output processors 26 and 27. The system can have multiple I/O processors as needed to accommodate the I/O devices needed for the particular system configuration. Each one of the input/output processors 26 and 27 is connected to a bus 28, which may be of a standard configuration such as a VMEbus, and each bus 28 is connected to one or more bus interface modules 29 for interface with a standard I/O controller 30. Each bus interface module 29 is connected to two of the busses 28, so failure of one I/O processor 26 or 27, or failure of one of the bus channels 28, can be tolerated. The I/O processors 26 and 27 can be addressed by the CPUs 11, 12 and 13 through the memory modules 14 and 15, and can signal an interrupt to the CPUs via the memory modules. Disk drives, terminals with CRT screens and keyboards, and network adapters, are typical peripheral devices operated by the controllers 30. The controllers 30 may make DMA-type references to the memory modules 14 and 15 to transfer blocks of data. Each one of the I/O processors 26, 27, etc., has certain individual lines directly connected to each one of the memory modules for bus request, bus grant, etc.; these point-to-point connections


are called "radials" and are included in a group of radial lines.

A system status bus 32 is individually connected to each one of the CPUs 11, 12 and 13, to each memory module 14 and 15, and to each of the I/O processors 26 and 27, for the purpose of providing information on the status of each element. This status bus provides information about which of the CPUs, memory modules and I/O processors is currently in the system and operating properly.

An acknowledge/status bus 33 connecting the three CPUs and two memory modules includes individual lines by which the modules 14 and 15 send acknowledge signals to the CPUs when memory requests are made by the CPUs, and at the same time a status field is sent to report on the status of the command and whether it executed correctly. The memory modules not only check parity on data read from or written to the global memory, but also check parity on data passing through the memory modules to or from the I/O busses 24 and 25, as well as checking the validity of commands. It is through the status lines in bus 33 that these checks are reported to the CPUs 11, 12 and 13, so if errors occur a fault routine can be entered to isolate a faulty component.

Even though both memory modules 14 and 15 are storing the same data in global memory, and operating to perform every memory reference in duplicate, one of these memory modules is designated as primary and the other as back-up, at any given time. Memory write operations are executed by both memory modules so both are kept current, and also a memory read operation is executed by both, but only the primary module actually loads the read-data back onto the busses 21, 22 and 23, and only the primary memory module controls the arbitration for the multi-master busses 24 and 25. To keep the primary and back-up modules executing the same operations, a bus 34 conveys control information from primary to back-up. Either module can assume


the role of primary at boot-up, and the roles can switch during operation under software control; the roles can also switch when selected error conditions are detected by the CPUs or other error-responsive parts of the system.

Certain interrupts generated in the CPUs are also voted by the memory modules 14 and 15. When the CPUs encounter such an interrupt condition (and are not stalled), they signal an interrupt request to the memory modules by individual lines in an interrupt bus 35, so the three interrupt requests from the three CPUs can be voted. When all interrupts have been voted, the memory modules each send a voted-interrupt signal to the three CPUs via bus 35. This voting of interrupts also functions to check on the operation of the CPUs. The three CPUs synchronize on the voted interrupt signal via the inter-CPU bus 18 and present the interrupt to the processors at a common point in the instruction stream. This interrupt synchronization is accomplished without stalling any of the CPUs.
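The interrupt vote can be modeled in a few lines of C; the names below are illustrative, and the fault-flagging policy is a simplification of what the circuit would do.

```c
#include <stdbool.h>

/* Sketch of the interrupt vote performed in each memory module: one
 * request line per CPU on bus 35. */
struct irq_vote {
    bool request[3];
};

/* The voted-interrupt signal is returned to all three CPUs only when
 * the three requests agree; a lone dissenter would instead be flagged
 * as a possibly faulty CPU. */
bool voted_interrupt(const struct irq_vote *v)
{
    return v->request[0] && v->request[1] && v->request[2];
}
```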
CPU Module

Referring now to Figure 2, one of the processors 11, 12 or 13 is shown in more detail. All three CPU modules are of the same construction in a preferred embodiment, so only CPU-A will be described here. In order to keep costs within a competitive range, and to provide ready access to already-developed software and operating systems, it is preferred to use a commercially-available microprocessor chip, and any one of a number of devices may be chosen. The RISC (reduced instruction set) architecture has some advantage in implementing the loose synchronization as will be described, but more-conventional CISC (complex instruction set) microprocessors such as Motorola 68030 devices or Intel 80386 devices (available in 20-MHz and 25-MHz speeds) could be used. High-speed 32-bit RISC microprocessor devices are available from several sources in three basic types: Motorola produces a device as part number 88000; MIPS Computer Systems,

Inc. and others produce a chip set referred to as the MIPS type; and Sun Microsystems has announced a so-called SPARC™ type (scalable processor architecture). Cypress Semiconductor of San Jose, California, for example, manufactures a microprocessor referred to as part number CY7C601 providing 20-MIPS (million instructions per second), clocked at 33-MHz, supporting the SPARC standard, and Fujitsu manufactures a CMOS RISC microprocessor, part number S-25, also supporting the SPARC standard. The CPU board or module in the illustrative embodiment, used as an example, employs a microprocessor chip 40 which is in this case an R2000 device designed by MIPS Computer Systems, Inc., and also manufactured by Integrated Device Technology, Inc. The R2000 device is a 32-bit processor using RISC architecture to provide high performance, e.g., 12-MIPS at a 16.67-MHz clock rate. Higher-speed versions of this device may be used instead, such as the R3000 that provides 20-MIPS at a 25-MHz clock rate. The processor 40 also has a co-processor used for memory management, including a translation lookaside buffer to cache translations of logical to physical addresses. The processor 40 is coupled to a local bus having a data bus 41, an address bus 42 and a control bus 43. Separate instruction and data cache memories 44 and 45 are coupled to this local bus. These caches are each of 64-KByte size, for example, and are accessed within a single clock cycle of the processor 40. A numeric or floating point co-processor 46 is coupled to the local bus if additional performance is needed for these types of calculations; this numeric processor device is also commercially available from MIPS Computer Systems as part number R2010. The local bus 41, 42, 43 is coupled to an internal bus structure through a write buffer 50 and a read buffer 51. The write buffer is a commercially available device, part number R2020, and functions to allow the processor 40 to continue to execute Run cycles after storing data and address in the write buffer 50 for a write operation, rather than having to execute stall cycles while the write is completing.

., ;~

:
.", ,:.~','' , ' .. i . . . .
. '`", ' ` ~ , . .

:~; - , . .
.. . ~ .

- ~ ,' :. .
~,'; ' ' - ..', :

In addition to the path through the write buffer 50, a path is provided to allow the processor 40 to execute write operations bypassing the write buffer 50. This path, a write buffer bypass 52, allows the processor, under software selection, to perform synchronous writes. If the write buffer bypass 52 is enabled (write buffer 50 not enabled) and the processor executes a write, then the processor will stall until the write completes. In contrast, when writes are executed with the write buffer bypass 52 disabled, the processor will not stall, because data is written into the write buffer 50 (unless the write buffer is full). If the write buffer 50 is enabled when the processor 40 performs a write operation, the write buffer 50 captures the output data from bus 41 and the address from bus 42, as well as controls from bus 43. The write buffer 50 can hold up to four such data-address sets while it waits to pass the data on to the main memory. The write buffer runs synchronously with the clock 17 of the processor chip 40, so the processor-to-buffer transfers are synchronous and at the machine cycle rate of the processor. The write buffer 50 signals the processor if it is full and unable to accept data. Read operations by the processor 40 are checked against the addresses contained in the four-deep write buffer 50, so if a read is attempted to one of the data words waiting in the write buffer to be written to memory 16 or to global memory, the read is stalled until the write is completed. The write and read buffers 50 and 51 are coupled to an internal bus structure having a data bus 53, an address bus 54 and a control bus 55. The local memory 16 is accessed by this internal bus, and a bus interface 56 coupled to the internal bus is used to access the system bus 21 (or bus 22 or 23 for the other CPUs). The separate data and address busses 53 and 54 of the internal bus (as derived from busses 41 and 42 of the local bus) are converted to a multiplexed address/data bus 57 in the system bus 21, and the command and control lines are
:
. . .
,~ :, . . .
-. . .

., , . - ' - ' - ' . ' ` ' ; .
: ~ : - .

correspondingly converted to command lines 58 and control lines 59 in this external bus. The bus interface unit 56 also receives the acknowledge/status lines 33 from the memory modules 14 and 15. In these lines 33, separate status lines 33-1 or 33-2 are coupled from each of the modules 14 and 15, so the responses from both memory modules can be evaluated upon the event of a transfer (read or write) between CPUs and global memory, as will be explained.
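The four-deep write buffer and its read-hazard check can be sketched as follows; the structure and function names are hypothetical, intended only to show the stall-on-match behavior described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_DEPTH 4   /* the write buffer holds up to four entries */

struct wb_entry { uint32_t addr, data; };

struct write_buffer {
    struct wb_entry entry[WB_DEPTH];
    int count;
};

/* Enqueue a write; returns false (processor must stall) when full. */
bool wb_push(struct write_buffer *wb, uint32_t addr, uint32_t data)
{
    if (wb->count == WB_DEPTH)
        return false;
    wb->entry[wb->count].addr = addr;
    wb->entry[wb->count].data = data;
    wb->count++;
    return true;
}

/* A read stalls while its address matches any pending write. */
bool wb_read_must_stall(const struct write_buffer *wb, uint32_t addr)
{
    for (int i = 0; i < wb->count; i++)
        if (wb->entry[i].addr == addr)
            return true;
    return false;
}
```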
The local memory 16, in one embodiment, comprises about 8-Mbyte of RAM which can be accessed in about three or four of the machine cycles of processor 40, and thi~ acces~ is synchronous with the clock 17 Or this CPU, whereas the memory access time to the module~ 14 and 15 is much greater than that to local memory, and thi- access to th- memory modul~ 14 and 15 is asynchronous and sub~-ct to the synchronization overhead imposed by waiting for all CPUs to make the request then voting For eompari~on, access to a typical commercially-available disk memory through the I/O processors 26, 27 and 29 i~ mea-ured in milliseconds, i e , considerably ~lower than acces~ to the modules 14 and 15 Thus, there is a hi-rarchy o~ mem~ry aceess by the CPU chip 40, th- highe~t being the instruetion and data eaehes 44 and 45 which will provide a hit ratio o~ p-rhaps 95% wh-n u~ing 64-XByte cache siz- and uitabl- r$11 algorith~s The s-eond highe~t is the ~ 25 loeal m-mory 16, and again by mploying eont-mporary virtual -, m-mory ~anag-m-nt ~lgorithm- a hit ratio Or p-rhap- 95% is i obtain-d ~or m ~ory r-~-rene-~ ~or whieh a each- miss occurs but a hit in loeal ~ ~ory 16 i~ ~ound, in an xampl- wh-re th- size o~ th- loeal m-mory i~ about 8-MByt- Th- net r-sult, from the standpoint o~ th- proe-~or ehip 40, is that p-rhaps great-r than 99% o~ m-mory r-f-r-ne-~ (but not I/0 r-~erenees) will b-synehronou- and will oeeur in ither th- sam- maehin- cycle or in three or ~our maehin- eyele~

" 16 : ..
.

. .
:,i , ., . . ~ , -.;. .
.
~................................ .
.
,. . .

The local memory 16 i8 accessed ~ro~ thQ internal bus by a memory controller 60 which receives the addrosses from address bus 54, and the addrsss strobes from the control bus 55, and generates separate row and column addrQsses, and RAS and CAS
controls, for example, if the local memory 16 employ~ DRAMs with multiplexed address~ng, a8 is usually th- case Data is written to or read from the local memory via data bus 53 In addition, several local registers 61, as well as non-volatile memory 62 such as NVRAMs, and high-speed PROMs 63, as may be used by the operating system, are accessed by the internal bus; some of this part of the memory i8 used only at power-on, some is used by the operating syste~ and may be almo~t continuously within the cache 44, and other may be within the non-cached part o~ the memory map External interrupts are applied to the processor 40 by one of the pins of the control bus 43 or 55 from an interrupt circuit 65 in the CPU module of Figure 2 This type of interrupt is voted in the circuit 65, so that befor- an interrupt is executed by the processor 40 it is determined whether or not all three CPUs are presented with the interrupt; to this end, the circuit 65 receives interrupt pending inputs 66 from the other two CPUs 12 and 13, and sends an interrupt pending Qignal to the other two CPUs via line 67, these lines being part of the bus 18 connecting the three CPUs 11, 12 and 13 togeth-r Also, for voting other types o~ interrupt~, sp-cifically CPU-g-n-rated interrupts, the circuit 65 can send an int-rrupt r-gu--t rrom this CPU to both of th- m mory modul-- 14 and lS by a lin- 68 in the bus 35, then r-c-iv- ~-parate vot-d-interrupt signals from tho memory modules via lin-- 69 and 70; both memory module~ will present the ext-rnal int-rrupt to b- act-d upon An int-rrupt g-nerated in some ext-rnal ourc- 8uch as a keyboard or disk driv- on one of th- I/O chann-ls 28, for exampl-, will not bo pr-sent-d to the interrupt pin of th- chip 40 rrom th- circuit 65 until each one of tho CPUs 11, 12 and 13 is at th- sam point in the instruction stream, as will be explained ` 17 ~' .

' ; .

Since the processors 40 are clocked by separate clock oscillators 17, there must be some mechanism for periodically bringing the processors 40 back into synchronization. Even though the clock oscillators 17 are of the same nominal frequency, e.g., 16.67-MHz, and the tolerance for these devices is about 25-ppm (parts per million), the processors can potentially become many cycles out of phase unless periodically brought back into synch. Of course, every time an external interrupt occurs the CPUs will be brought into synch in the sense of being interrupted at the same point in their instruction stream (due to the interrupt synch mechanism), but this does not help bring the cycle count into synch. The mechanism of voting memory references in the memory modules 14 and 15 will bring the CPUs into synch (in real time), as will be explained. However, some conditions result in long periods where no memory reference occurs, and so an additional mechanism is used to introduce stall cycles to bring the processors 40 back into synch. A cycle counter 71 is coupled to the clock 17 and the control pins of the processor 40 via control bus 43 to count machine cycles which are Run cycles (but not Stall cycles). This counter 71 includes a count register having a maximum count value selected to represent the period during which the maximum allowable drift between CPUs would occur (taking into account the specified tolerance for the crystal oscillators); when this count register overflows, action is initiated to stall the faster processors until the slower processor or processors catch up. This counter 71 is reset whenever a synchronization is done by a memory reference to the memory modules 14 and 15. Also, a refresh counter 72 is employed to perform refresh cycles on the local memory 16, as will be explained. In addition, a counter 73 counts machine cycles which are Run cycles but not Stall cycles, like the counter 71 does, but this counter 73 is not reset by a memory reference; the counter 73 is used for interrupt synchronization as explained below, and to this end produces the output signals CC-4 and CC-8 to the interrupt synchronization circuit 65.
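The behavior of cycle-counter 71 can be sketched in C. The overflow threshold below is an arbitrary illustrative value; in the hardware it would be derived from the 25-ppm oscillator tolerance.

```c
#include <stdbool.h>

#define MAX_DRIFT 65536   /* illustrative threshold only */

struct cycle_counter { unsigned run_cycles; };

/* Called once per Run cycle (Stall cycles are not counted).  On
 * overflow, the faster CPUs are stalled until all resynchronize. */
void on_run_cycle(struct cycle_counter *c, bool *stall_request)
{
    if (++c->run_cycles >= MAX_DRIFT)
        *stall_request = true;
}

/* A voted memory reference realigns the CPUs and resets the count. */
void on_memory_reference_sync(struct cycle_counter *c, bool *stall_request)
{
    c->run_cycles = 0;
    *stall_request = false;
}
```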
`.~
. .
. - . , :' :' .: . ' . ' ' .
. ~ .
''`" ' . ~ . .

Th- proc-ssor 40 has a RISC instruction s-t which does not suppsrt memory-to-memory instruction~, but in~tead only memory-to-regiater or regi~ter-to-m-mory instructions (i e , load or store) It is important to koep rr-gu-ntly-us-d data and th-S currently-executing code in local m mory Accordingly, a block-transfer operation iB provided by a DMA stat- machine 74 coupled to th- bus interface 56 Th- proce~or 40 wr$tes a word to a regi~ter in the DMA circuit 74 to function as a command, and writes the starting address and l-ngth of the block to registers in this circuit 74 In one embodim nt, the microprocessor stalls while the DMA circuit takes over and xecutes th- block transfer, producing the necessary addresses, command~ and strobes on the busses 53-S5 and 21 The command ex-cut-d by th- processor 40 to initiate this block transfer can b- a r-ad from a register in the DMA circuit 74 Sinc- memory manag-m nt in th- Unix operating system r-lies upon demand paging, thQs- block tran~fers will most often be pages being moved betw -n global and local memory and I/0 traffic A page is 4-RByte- or course, the bu~ses 21, 22 and 23 support single-word read and writ- transfers between CPUs and global memory; th- block transfer- ref-rred to are only po-sibl- betwe-n local and global memory .,.
Th- Proe-s-or , R f-rring now to Figur- 3, th- R2000 or R3000 typ- of - mieroproe-~-or 40 of th- x~pl- badi~ nt i9 ~hown in more 2S d-t~il Thi- d-viC- inelud-- ~ ~ain 32-bit CPU 75 eontaining thirty-two 32-bit g-n-ral purpos- regi-t-r~ 76, a 32-bit ALU 77, a z-ro-to-6~ bit ~hift-r 78, and a 32-by-32 multiply/divide eireuit 79 Thi- CPU al~o ha- ~ program eount-r 80 along with a-~oeiat d iner-~-nt-r ~nd adder Th--- eo~pon-nt~ ar- eoupled ~, 30 to a proc-~or bu~ tructur- 81, whieh i- coupl-d to th- local d~t~ bu~ 41 and to an in~truction d-eod-r 82 with a~soeiated ~ eontrol logie to ex-eut- in~truetion- feteh-d via data bus 41 ;~ Th- 32-bit loe~l addr-~ bu~ 42 i~ driv-n by a virtu~l ~-mory '~ .
. ~ .

, ~'.
:-. : . . . . :! ~. .. :-.:~ ' . . '' '' ' : ' ,'' . ' .' . , , . ;.
:
. . . , ,, . , ' .:

. ~` . ~ . .. . : .. .. . .

management arrangement including a tranglatiOn lookaside buffer (TLB) 83 within an on-chip memory-management coprocessor The TLB 83 contains sixty-four entries to b- compared with a virtual address received from the microprocessor block 75 via virtual address bus 84 The low-order 16-bit part 85 o~ the bus 42 is driven by the low-ordor part of thi~ virtual address bu~ 84, and the high-order part is from th- bus 84 if the virtual address is used as the physical address, or i~ the tag ntry from the TLB 83 via output 86 if virtual addressing is us-d and a hit occurs The control lines 43 of the local bus ar- connected to pipeline and bus control circuitry 87, driven from the internal bus structurQ 81 and the control logic 82 The microprocessor block 75 in the proces_or 40 is of the RISC type in that most in_truetion~ exeeute in one maehine eycle, and the instruetion set use~ register-to-regist-r and load/store instruction~ rathQr than having compl-x inotruetions involving memory ref-rences along with ALU op-ration~ There are no compl-x addressing schQmes includ-d a- part of the in~truction set, sueh as "add the operand whose addr-ss is the Qum of the eontent~ of register Al and register A2 to th- operand whose address is found at th- main memory location addressed by the eontents of regiQter B, and store tho result in main memory at tho loeation whose address is found in r-gister C n Instead, this operation i~ don- in a number of simpl- regist-r-to-register and load/stor- instruetion~ add r-gi-t-r A2 to r-gister Al;
load regist-r Bl trom m mory loeation whos- addres~ i~ in r-gi-t-r B; add regi-t-r Al and r-gi~t-r Bl; tor- r-gi~ter Bl to m mory loeation addr~ d by r-gist-r C Optimizing eompiler t-ehnlqu-- ar- u--d to maximize th- us- of th- thirty-two r-glst-r- 76, 1 - , a~ur- that mo-t op ration~ wlll find the op-rands already in the r-gi~t-r s-t Th- load in~truetion~
aetually tak- longer than on- ~aehine eyel-, and to aeeount for ~ thi~ a lat-ney of on- instruetlon 1~ lntrodue-d; th- data fetched -~ by th- load instruetion is not us-d until th- seeond eyele, and , :
- .. .
:` : ~ . ...

. .
. .

:

the intervening cycle is used for ~ome other ln~truction, if po~sible.
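The decomposition just described can be mirrored in a short C sketch, with each statement annotated with the simple instruction it stands for. The names (r_a1, r_b1 and so on) are invented here for illustration, and the scheduling comment reflects the one-instruction load latency:

#include <stdint.h>

/* The four simple steps named in the text, written as C statements.
 * r_a1, r_a2, r_b, r_c stand for registers A1, A2, B and C;
 * names and types are illustrative only. */
uint32_t risc_steps(uint32_t r_a1, uint32_t r_a2,
                    const uint32_t *r_b, uint32_t *r_c)
{
    r_a1 = r_a1 + r_a2;        /* add register A2 to register A1         */
    uint32_t r_b1 = *r_b;      /* load B1 from the address in register B;
                                  one-cycle load latency: B1 cannot be
                                  used by the very next instruction, so
                                  the compiler schedules an independent
                                  instruction into that slot             */
    uint32_t t = r_a1 + r_b1;  /* add register A1 and register B1        */
    *r_c = t;                  /* store the result at the address in C   */
    return t;
}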

The main CPU 75 is highly pipelined to facilitate the goal of averaging one instruction execution per machine cycle. Referring to Figure 4, a single instruction is executed over a period including five machine cycles, where a machine cycle is one clock period or 60-nsec for a 16.67-MHz clock 17. These five cycles or pipe stages are referred to as IF (instruction fetch from I-cache 44), RD (read operands from register set 76), ALU (perform the required operation in ALU 77), MEM (access D-cache 45 if required), and WB (write back ALU result to register file 76). As seen in Figure 5, these five pipe stages are overlapped so that in a given machine cycle, cycle-5 for example, instruction I#5 is in its first or IF pipe stage and instruction I#1 is in its last or WB stage, while the other instructions are in the intervening pipe stages.

Memory Module

With reference to Figure 6, one of the memory modules 14 or 15 is shown in detail. Both memory modules are of the same construction in a preferred embodiment, so only the Memory#1 module is shown. The memory module includes three input/output ports 91, 92 and 93 coupled to the three busses 21, 22 and 23 coming from the CPUs 11, 12 and 13, respectively. Inputs to these ports are latched into registers 94, 95 and 96, each of which has separate sections to store data, address, command and strobes for a write operation, or address, command and strobes for a read operation. The contents of these three registers are voted by a vote circuit 100 having inputs connected to all sections of all three registers. If all three of the CPUs 11, 12 and 13 make the same memory request (same address, same command), as should be the case since the CPUs are typically executing the same instruction stream, then the memory request is allowed to complete; however, as soon as the first memory request is latched into any one of the three latches 94, 95 or 96, it is passed on immediately to begin the memory access. To this end, the address, data and command are applied to an internal bus including data bus 101, address bus 102 and control bus 103. From this internal bus the memory request accesses various resources, depending upon the address, and depending upon the system configuration. In one embodiment, a large DRAM 104 is accessed by the internal bus, using a memory controller 105 which accepts the address from address bus 102 and memory request and strobes from control bus 103 to generate multiplexed row and column addresses for the DRAM so that data input/output is provided on the data bus 101. This DRAM 104 is also referred to as global memory, and is of a size of perhaps 32-MByte in one embodiment. In addition, the internal bus 101-103 can access control and status registers 106, a quantity of non-volatile RAM 107, and write-protect RAM 108. The memory reference by the CPUs can also bypass the memory in the memory module 14 or 15 and access the I/O busses 24 and 25 by a bus interface 109 which has inputs connected to the internal bus 101-103. If the memory module is the primary memory module, a bus arbitrator 110 in each memory module controls the bus interface 109. If a memory module is the backup module, the bus 34 controls the bus interface 109.

A memory access to the DRAM 104 is initiated as soon as the first request is latched into one of the latches 94, 95 or 96, but is not allowed to complete unless the vote circuit 100 determines that a plurality of the requests are the same, with provision for faults. The arrival of the first of the three requests causes the access to the DRAM 104 to begin. For a read, the DRAM 104 is addressed, the sense amplifiers are strobed, and the data output is produced at the DRAM outputs, so if the vote is good after the third request is received then the requested data is ready for immediate transfer back to the CPUs. In this manner, voting is overlapped with DRAM access.
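The two-out-of-three vote performed by circuit 100 can be sketched behaviorally as follows. This is a minimal software model, not the hardware: the structure and function names are invented, and the early start of the DRAM access on the first arrival is assumed to happen elsewhere, so this check only gates completion.

#include <stdbool.h>
#include <stdint.h>

/* One latched memory request (register 94, 95 or 96); fields simplified. */
typedef struct {
    bool     valid;    /* a request has been latched at this port */
    uint32_t addr;
    uint32_t cmd;
} request_t;

/* Returns true when at least two latched requests agree (two-out-of-three
 * majority). The DRAM access is started elsewhere, on the FIRST arrival;
 * this check only allows it to complete. */
static bool vote_requests(const request_t r[3], int *faulty_cpu)
{
    *faulty_cpu = -1;
    for (int i = 0; i < 3; i++) {
        int j = (i + 1) % 3;
        if (r[i].valid && r[j].valid &&
            r[i].addr == r[j].addr && r[i].cmd == r[j].cmd) {
            int k = (j + 1) % 3;          /* the remaining port          */
            if (r[k].valid &&
                (r[k].addr != r[i].addr || r[k].cmd != r[i].cmd))
                *faulty_cpu = k;          /* differing CPU, reported in  */
                                          /* the status on bus 33        */
            /* a silent third port is handled by a separate time-out */
            return true;                  /* majority: allow completion  */
        }
    }
    return false;                         /* no majority yet: keep waiting */
}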
Referring to Figure 7, the busses 21, 22 and 23 apply memory requests to ports 91, 92 and 93 of the memory modules 14 and 15 in the format illustrated. Each of these busses consists of thirty-two bidirectional multiplexed address/data lines, thirteen unidirectional command lines, and two strobes. The command lines include a field which specifies the type of bus activity, such as read, write, block transfer, single transfer, I/O read or write, etc. Also, a field functions as a byte enable for the four bytes. The strobes are AS, address strobe, and DS, data strobe. The CPUs 11, 12 and 13 each control their own bus 21, 22 or 23; in this embodiment, these are not multi-master busses; there is no contention or arbitration. For a write, the CPU drives the address and command onto the bus in one cycle along with the address strobe AS (active low), then in a subsequent cycle (possibly the next cycle, but not necessarily) drives the data onto the address/data lines of the bus at the same time as a data strobe DS. The address strobe AS from each CPU causes the address and command then appearing at the ports 91, 92 or 93 to be latched into the address and command sections of the registers 94, 95 and 96, as these strobes appear; then the data strobe DS causes the data to be latched. When a plurality (two out of three in this embodiment) of the busses 21, 22 and 23 drive the same memory request into the latches 94, 95 and 96, the vote circuit 100 passes on the final command to the bus 103 and the memory access will be executed; if the command is a write, an acknowledge ACK signal is sent back to each CPU by a line 112 (specifically line 112-1 for Memory#1 and line 112-2 for Memory#2) as soon as the write has been executed, and at the same time status bits are driven via acknowledge/status bus 33 (specifically lines 33-1 for Memory#1 and lines 33-2 for Memory#2) to each CPU at time T3 of Figure 7. The delay T4 between the last strobe DS (or AS if a read) and the ACK at T3 is variable, depending upon how many cycles out of synch the CPUs are at the time of the memory request, and depending upon the delay in the voting circuit and the phase of the internal independent clock 17 of the memory module 14 or 15 compared to the CPU clocks 17. If the memory request issued by the CPUs is a read, then the ACK signal on lines 112-1 and 112-2 and the status bits on lines 33-1 and 33-2 will be sent at the same time as the data is driven to the address/data bus, during time T3; this will release the stall in the CPUs and thus synchronize the CPU chips 40 on the same instruction. That is, the fastest CPU will have executed more stall cycles as it waited for the slower ones to catch up, then all three will be released at the same time, although the clocks 17 will probably be out of phase; the first instruction executed by all three CPUs when they come out of stall will be the same instruction.

All data being sent from the memory module 14 or 15 to the CPUs 11, 12 and 13, whether the data is read data from the DRAM 104 or from the memory locations 106-108, or is I/O data from the busses 24 and 25, goes through a register 114. This register is loaded from the internal data bus 101, and an output 115 from this register is applied to the address/data lines for busses 21, 22 and 23 at ports 91, 92 and 93 at time T3. Parity is checked when the data is loaded to this register 114. All data written to the DRAM 104, and all data on the I/O busses, has parity bits associated with it, but the parity bits are not transferred on busses 21, 22 and 23 to the CPU modules. Parity errors detected at the read register 114 are reported to the CPU via the status busses 33-1 and 33-2. Only the memory module 14 or 15 designated as primary will drive the data in its register 114 onto the busses 21, 22 and 23. The memory module designated as back-up or secondary will complete a read operation all the way up to the point of loading the register 114 and checking parity, and will report status on busses 33-1 and 33-2, but no data will be driven to the busses 21, 22 and 23.

A controller 117 in each memory module 14 or 15 operates as a state machine clocked by the clock oscillator 17 for this module and receiving the various command lines from bus 103 and
busses 21-23, etc., to generate control bits to load registers and busses, generate external control signals, and the like. This controller also is connected to the bus 34 between the memory modules 14 and 15 which transfers status and control information between the two. The controller 117 in the module 14 or 15 currently designated as primary will arbitrate via arbitrator 110 between the I/O side (interface 109) and the CPU side (ports 91-93) for access to the common bus 101-103. This decision made by the controller 117 in the primary memory module 14 or 15 is communicated to the controller 117 of the other memory module by the lines 34, and forces the other memory module to execute the same access. The controller 117 in each memory module also introduces refresh cycles for the DRAM 104, based upon a refresh counter 118 receiving pulses from the clock oscillator 17 for this module. The DRAM must receive 512 refresh cycles every 8-msec, so on average there must be a refresh cycle introduced about every 15-microsec. The counter 118 thus produces an overflow signal to the controller 117 every 15-microsec, and if an idle condition exists (no CPU access or I/O access executing) a refresh cycle is implemented by a command applied to the bus 103. If an operation is in progress, the refresh is executed when the current operation is finished. For lengthy operations such as block transfers used in memory paging, several refresh cycles may be backed up and executed in a burst mode after the transfer is completed; to this end, the number of overflows of counter 118 since the last refresh cycle are accumulated in a register associated with the counter 118.

Interrupt requests for CPU-generated interrupts are received from each CPU 11, 12 and 13 individually by lines 68 in the interrupt bus 35; these interrupt requests are sent to each memory module 14 and 15. These interrupt request lines 68 in bus 35 are applied to an interrupt vote circuit 119 which compares the three requests and produces a voted interrupt signal on outgoing line 69 of the bus 35. The CPUs each receive a voted interrupt signal on the two lines 69 and 70 (one from each module 14 and 15) via the bus 35. The voted interrupts from each memory module 14 and 15 are ORed and presented to the interrupt synchronizing circuit 65. The CPUs, under software control, decide which interrupts to service. External interrupts, generated in the I/O processors or I/O controllers, are also signalled to the CPUs through the memory modules 14 and 15 via lines 69 and 70 in bus 35, and likewise the CPUs only respond to an interrupt from the primary module 14 or 15.
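The refresh arithmetic works out as follows: 512 refresh cycles every 8 msec is one refresh per 8000/512, or roughly every 15.6 microsec, hence the approximately 15-microsec overflow period. A minimal sketch of the deferred-burst bookkeeping described above (the names are invented; the real mechanism is a hardware counter and register, not code):

#include <stdbool.h>
#include <stdint.h>

/* 512 refresh cycles every 8 ms: one refresh about every
 * 8000 us / 512 = 15.6 us (15 with integer division). */
#define REFRESH_PERIOD_US (8000u / 512u)

static uint32_t backlog;   /* overflows of counter 118 since last refresh */

/* Called on each ~15 us overflow of refresh counter 118. */
void on_refresh_overflow(bool bus_idle)
{
    backlog++;
    if (bus_idle) {
        while (backlog > 0) {  /* burst out any deferred refreshes */
            /* issue one refresh command on bus 103 (not shown) */
            backlog--;
        }
    }
    /* else: a CPU or I/O access (e.g., a block transfer used in
     * memory paging) is in progress; the accumulated backlog is
     * executed in a burst when it finishes. */
}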

I/O Processor

Referring now to Figure 8, one of the I/O processors 26 or 27 is shown in detail. The I/O processor has two identical ports, one port 121 to the I/O bus 24 and the other port 122 to the I/O bus 25. Each one of the I/O busses 24 and 25 consists of a 36-bit bidirectional multiplexed address/data bus 123 (containing 32-bits plus 4-bits parity), a bidirectional command bus 124 defining the read, write, block read, block write, etc., type of operation that is being executed, an address line that designates which location is being addressed, either internal to the I/O processor or on busses 28, and the byte mask, and finally control lines 125 including address strobe, data strobe, address acknowledge and data acknowledge. The radial lines in bus 31 include individual lines from each I/O processor to each memory module: bus request from I/O processor to the memory modules, bus grant from the memory modules to the I/O processor, interrupt request lines from I/O processor to memory module, and a reset line from memory to I/O processor. Lines to indicate which memory module is primary are connected to each I/O processor via the system status bus 32. A controller or state machine 126 in the I/O processor of Figure 8 receives the command, control, status and radial lines and internal data, and command lines from the busses 28, and defines the internal operation of the I/O processor, including operation of latches 127 and 128 which receive the contents of busses 24 and 25 and also hold information for transmitting onto the busses.

Transfer on the busses 24 and 25 from memory module to I/O processor uses a protocol as shown in Figure 9, with the address and data separately acknowledged. The arbitrator circuit 110 in the memory module which is designated primary performs the arbitration for ownership of the I/O busses 24 and 25. When a transfer from CPUs to I/O is needed, the CPU request is presented to the arbitration logic 110 in the memory module. When the arbiter 110 grants this request, the memory modules apply the address and command to busses 123 and 124 (of both busses 24 and 25) at the same time the address strobe is asserted on bus 125 (of both busses 24 and 25) in time T1 of Figure 9; when the controller 126 has caused the address to be latched into latches 127 or 128, the address acknowledge is asserted on bus 125, then the memory modules place the data (via both busses 24 and 25) on the bus 123 and a data strobe on lines 125 in time T2, following which the controller causes the data to be latched into both latches 127 and 128 and a data acknowledge signal is placed upon the lines 125, so upon receipt of the data acknowledge, both of the memory modules release the bus 24, 25 by de-asserting the address strobe signal. The I/O processor then deasserts the address acknowledge signal.

For transfers from I/O processor to the memory module, when the I/O processor needs to use the I/O bus, it asserts a bus request by a line in the radial bus 31, to both busses 24 and 25, then waits for a bus grant signal from an arbitrator circuit 110 in the primary memory module 14 or 15, the bus grant line also being one of the radials. When the bus grant has been asserted, the controller 126 then waits until the address strobe and address acknowledge signals on busses 125 are deasserted (i.e., false) meaning the previous transfer is completed. At that time, the controller 126 causes the address to be applied from latches
127 and 128 to lines 123 of both busses 24 and 25, the command to be applied to lines 124, and the address strobe to be applied to the bus 125 of both busses 24 and 25. When address acknowledge is received from both busses 24 and 25, these are followed by applying the data to the address/data busses, along with data strobes, and the transfer is completed with a data acknowledge signal from the memory modules to the I/O processor.

The latches 127 and 128 are coupled to an internal bus 129 including an address bus 129a, a data bus 129b and a control bus 129c, which can address internal status and control registers 130 used to set up the commands to be executed by the controller state machine 126, to hold the status distributed by the bus 32, etc. These registers 130 are addressable for read or write from the CPUs in the address space of the CPUs. A bus interface 131 communicates with the VMEbus 28, under control of the controller 126. The bus 28 includes an address bus 28a, a data bus 28b, a control bus 28c, and radials 28d, and all of these lines are communicated through the bus interface modules 29 to the I/O controllers 30; the bus interface module 29 contains a multiplexer 132 to allow only one set of bus lines 28 (from one I/O processor or the other but not both) to drive the controller 30. Internal to the controller 30 are command, control, status and data registers 133 which (as is standard practice for peripheral controllers of this type) are addressable from the CPUs 11, 12 and 13 for read and write to initiate and control operations in I/O devices.

Each one of the I/O controllers 30 on the VMEbuses 28 has connections via a multiplexer 132 in the BIM 29 to both I/O processors 26 and 27 and can be controlled by either one, but is bound to one or the other by the program executing in the CPUs. A particular address (or set of addresses) is established for control and data-transfer registers 133 representing each controller 30, and these addresses are maintained in an I/O page table (normally in the kernel data section of local memory) by
the operating system. These addresses associate each controller 30 as being accessible only through either I/O processor #1 or #2, but not both. That is, a different address is used to reach a particular register 133 via I/O processor 26 compared to I/O processor 27. The bus interface 131 (and controller 126) can switch the multiplexer 132 to accept bus 28 from one or the other, and this is done by a write to the registers 130 of the I/O processors from the CPUs. Thus, when the device driver is called up to access this controller 30, the operating system uses these addresses in the page table to do it. The processors 40 access the controllers 30 by I/O writes to the control and data-transfer registers 133 in these controllers using the write buffer bypass path 52, rather than through the write buffer 50, so these are synchronous writes, voted by circuits 100, passed through the memory modules to the busses 24 or 25, thus to the selected bus 28; the processors 40 stall until the write is completed. The I/O processor board of Figure 8 is configured to detect certain failures, such as improper commands, time-outs where no response is received over VMEbus 28, parity-checked data if implemented, etc., and when one of these failures is detected the I/O processor quits responding to bus traffic, i.e., quits sending address acknowledge and data acknowledge as discussed above with reference to Figure 9. This is detected by the bus interface 56 as a bus fault, resulting in an interrupt as will be explained, and self-correcting action if possible.
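A rough model of this binding, under the assumption (hypothetical, not stated in the text) that each I/O page table entry simply records both routes to a controller's registers 133 together with the route currently in use:

#include <stdint.h>

/* One I/O page table entry per controller 30; field names invented. */
struct iop_binding {
    uint32_t addr_via_iop1;  /* registers 133 reached through IOP 26 */
    uint32_t addr_via_iop2;  /* same registers reached through IOP 27 */
    uint32_t bound_addr;     /* the route the device driver now uses  */
};

/* Software reintegration: when one I/O processor fails, the kernel
 * rewrites the bound address of every controller to the other route. */
void rebind_controllers(struct iop_binding *tbl, int n, int failed_iop)
{
    for (int i = 0; i < n; i++)
        tbl[i].bound_addr = (failed_iop == 1) ? tbl[i].addr_via_iop2
                                              : tbl[i].addr_via_iop1;
    /* a write to registers 130 also switches the BIM multiplexer 132
     * so that the surviving I/O processor drives bus 28 (not shown) */
}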
Error Recovery

The sequence used by the CPUs 11, 12 and 13 to evaluate responses by the memory modules 14 and 15 to transfers via busses 21, 22 and 23 will now be described. This sequence is defined by the state machine in the bus interface units 56 and in code executed by the CPUs.
In case one, for a read transfer, it is assumed that no data errors are indicated in the status bits on lines 33 from the primary memory. Here, the stall begun by the memory reference is ended by asserting a Ready signal via control busses 55 and 43 to allow instruction execution to continue in each microprocessor 40. But another transfer is not started until acknowledge is received on line 112 from the other (non-primary) memory module (or it times out). An interrupt is posted if any error was detected in either status field (lines 33-1 or 33-2), or if the non-primary memory times out.

In case two, for a read transfer, it is assumed that a data error is indicated in the status lines 33 from the primary memory or that no response is received from the primary memory. The CPUs will wait for an acknowledge from the other memory, and if no data errors are found in status bits from the other memory, circuitry of the bus interface 56 forces a change in ownership (primary memory status), then a retry is instituted to see if data is correctly read from the new primary. If good status is received from the new primary, then the stall is ended as before, and an interrupt is posted to update the system (to note one memory bad and different memory is primary). However, if data error or timeout results from this attempt to read from the new primary, then an interrupt is asserted to the processor 40 via control busses 55 and 43.

For write transfers, with the write buffer 50 bypassed, case one is where no data errors are indicated in status bits 33-1 or 33-2 from either memory module. The stall is ended to allow instruction execution to continue. Again, an interrupt is posted if any error was detected in either status field.

For write transfers, write buffer 50 bypassed, case two is where a data error is indicated in status from the primary memory, or no response is received from the primary memory. The interface controller of each CPU waits for an acknowledge from
the other memory module, and if no data errors are found in the status from the other memory an ownership change is forced and an interrupt is posted. But if data errors or timeout occur for the other (new primary) memory module, then an interrupt is asserted to the processor 40.

For write transfers with the write buffer 50 enabled, so the CPU chip is not stalled by a write operation, case one is with no errors indicated in status from either memory module. The transfer is ended, so another bus transfer can begin. But if any error is detected in either status field, an interrupt is posted.

For write transfers, write buffer 50 enabled, case two is where a data error is indicated in status from the primary memory, or no response is received from the primary memory. The mechanism waits for an acknowledge from the other memory, and if no data error is found in the status from the other memory then an ownership change is forced and an interrupt is posted. But if data error or timeout occur for the other memory, then an interrupt is posted.
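The read-transfer evaluation in cases one and two condenses into a small decision function. This is a simplified sketch: the retry against the new primary and the timeout bookkeeping are not modeled, and all names are invented.

#include <stdbool.h>

typedef enum { STATUS_GOOD, STATUS_DATA_ERROR, STATUS_TIMEOUT } mem_status_t;

/* Evaluate a read transfer's responses from primary and backup memory,
 * per the two cases described above. Returns true if the read may
 * complete (possibly after an ownership change). */
bool evaluate_read(mem_status_t primary, mem_status_t backup,
                   bool *ownership_change, bool *post_irq)
{
    *ownership_change = false;
    *post_irq = false;

    if (primary == STATUS_GOOD) {   /* case one                        */
        if (backup != STATUS_GOOD)
            *post_irq = true;       /* note the sick non-primary       */
        return true;                /* end the stall, continue         */
    }
    /* case two: primary bad or silent; try to promote the backup */
    if (backup == STATUS_GOOD) {
        *ownership_change = true;   /* backup becomes primary          */
        *post_irq = true;           /* report one memory bad           */
        return true;                /* then retry from the new primary */
    }
    *post_irq = true;               /* both bad: interrupt asserted    */
    return false;
}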
Once it has been determined by the mechanism just described that a memory module 14 or 15 is faulty, the fault condition is signalled to the operator, but the system can continue operating. The operator will probably wish to replace the memory board containing the faulty module, which can be done while the system is powered up and operating. The system is then able to re-integrate the new memory board without a shutdown. This mechanism also works to revive a memory module that failed to execute a write due to a soft error but then tested good, so it need not be physically replaced. The task is to get the memory module back to a state where its data is identical to the other memory module. This revive mode is a two step process. First, it is assumed that the memory is uninitialized and may contain parity errors, so good data with good parity must be written into all locations; this could be all zeros at this point, but since
all writes are executed on both memories, the way this first step is accomplished is to read a location in the good memory module then write this data to the same location in both memory modules 14 and 15. This is done while ordinary operations are going on, interleaved with the task being performed. Writes originating from the I/O busses 24 or 25 are ignored by this revive routine in its first stage. After all locations have been thus written, the next step is the same as the first except that I/O accesses are also written; that is, I/O writes from the I/O busses 24 or 25 are executed as they occur in ordinary traffic in the executing task, interleaved with reading every location in the good memory and writing this same data to the same location in both memory modules. When the modules have been addressed from zero to maximum address in this second step, the memories are identical. During this second revive step, both CPUs and I/O processors expect the memory module being revived to perform all operations without errors. The I/O processors 26, 27 will not use data presented by the memory module being revived during data read transfers. After completing the revive process the revived memory can then be (if necessary) designated primary.

A similar revive process is provided for CPU modules. When one CPU is detected faulty (as by the memory voter 100, etc.) the other two continue to operate, and the bad CPU board can be replaced without system shutdown. When the new CPU board has run its power-on self-test routines from on-board ROM 63, it signals this to the other CPUs, and a revive routine is executed. First, the two good CPUs will copy their state to global memory, then all three CPUs will execute a "soft reset" whereby the CPUs reset and start executing from their initialization routines in ROM, so they will all come up at the exact same point in their instruction stream and will be synchronized; then the saved state is copied back into all three CPUs and the task previously executing is continued.
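Both memory revive steps reduce to the same copy loop, sketched below with hypothetical helpers for the two bus operations; what differs between the steps is only how concurrent I/O writes are treated, as the comment notes.

#include <stdint.h>

/* Hypothetical helpers: a read satisfied by the good module only, and a
 * write executed by both modules (as all writes are). */
extern uint32_t read_good_memory(uint32_t addr);
extern void     write_both_memories(uint32_t addr, uint32_t word);

/* One revive pass over global memory, word by word; limit_bytes is the
 * size of the module. The revived module picks up good data and good
 * parity because every write goes to both modules. */
void revive_pass(uint32_t limit_bytes)
{
    for (uint32_t addr = 0; addr < limit_bytes; addr += 4) {
        write_both_memories(addr, read_good_memory(addr));
        /* In the real system this is interleaved with ordinary traffic;
         * in the first pass I/O-originated writes are ignored by the
         * routine, in the second pass they are executed as they occur. */
    }
}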
As noted above, the vote circuit 100 in each memory module determines whether or not all three CPUs make identical memory references. If so, the memory operation is allowed to proceed to completion. If not, a CPU fault mode is entered. The CPU which transmits a different memory reference, as detected at the vote circuit 100, is identified in the status returned on bus 33-1 and/or 33-2. An interrupt is posted and software subsequently puts the faulty CPU offline. This offline status is reflected on status bus 32. The memory reference where the fault was detected is allowed to complete based upon the two-out-of-three vote, then until the bad CPU board has been replaced the vote circuit 100 requires two identical memory requests from the two good CPUs before allowing a memory reference to proceed. The system is ordinarily configured to continue operating with one CPU off-line, but not two. However, if it were desired to operate with only one good CPU, this is an alternative available. A CPU is voted faulty by the voter circuit 100 if different data is detected in its memory request, and also by a time-out; if two CPUs send identical memory requests, but the third does not send any signals for a preselected time-out period, that CPU is assumed to be faulty and is placed off-line as before.

The I/O arrangement of the system has a mechanism for software reintegration in the event of a failure. That is, the CPU and memory module core is hardware fault-protected as just described, but the I/O portion of the system is software fault-protected. When one of the I/O processors 26 or 27 fails, the controllers 30 bound to that I/O processor by software as mentioned above are switched over to the other I/O processor by software; the operating system rewrites the addresses in the I/O page table to use the new addresses for the same controllers, and from then on these controllers are bound to the other one of the pair of I/O processors 26 or 27. The error or fault can be detected by a bus error terminating a bus cycle at the bus interface 56, producing an exception dispatching into the kernel through an exception handler routine that will determine the cause of the exception, and then (by rewriting addresses in the I/O table) move all the controllers 30 from the failed I/O processor 26 or 27 to the other one.

When the bus interface 56 detects a bus error as just described, the fault must be isolated before the reintegration scheme is used. When a CPU does a write, either to one of the I/O processors 26 or 27 or to one of the I/O controllers 30 on one of the busses 28 (e.g., to one of the control or status registers, or data registers, in one of the I/O elements), this is a bypass operation in the memory modules and both memory modules execute the operation, passing it on to the two I/O busses 24 and 25; the two I/O processors 26 and 27 both monitor the busses 24 and 25 and check parity and check the commands for proper syntax via the controllers 126. For example, if the CPUs are executing a write to a register in an I/O processor 26 or 27, if either one of the memory modules presents a valid address, valid command and valid data (as evidenced by no parity errors and proper protocol), the addressed I/O processor will write the data to the addressed location and respond to the memory module with an Acknowledge indication that the write was completed successfully. Both memory modules 14 and 15 are monitoring the responses from the I/O processor 26 or 27 (i.e., the address and data acknowledge signals of Figure 9, and associated status), and both memory modules respond to the CPUs with operation status on lines 33-1 and 33-2. (If this had been a read, only the primary memory module would return data, but both would return status.) Now the CPUs can determine if both executed the write correctly, or only one, or none. If only one returns good status, and that was the primary, then there is no need to force an ownership change, but if the backup returned good and the primary bad, then an ownership change is forced to make the one that executed correctly now the primary. In either case an interrupt is entered to report the fault. At this point the CPUs do not know whether it is a memory module or something downstream of the memory modules that is bad. So, a similar write is attempted to the other I/O processor, but if this succeeds it does not necessarily prove the memory module is bad, because the I/O processor initially addressed could be hanging up a line on the bus 24 or 25, for example, and causing parity errors. So, the process can then selectively shut off the I/O processors and retry the operations, to see if both memory modules can correctly execute a write to the same I/O processor. If so, the system can continue operating with the bad I/O processor off-line until replaced and reintegrated. But if the retry still gives bad status from one memory, the memory can be taken off-line, or further fault-isolation steps taken to make sure the fault is in the memory and not in some other element; this can include switching all the controllers 30 to one I/O processor 26 or 27, then issuing a reset command to the off I/O processor and retrying communication with the online I/O processor with both memory modules live - then if the reset I/O processor had been corrupting the bus 24 or 25, its bus drivers will have been turned off by the reset, so if the retry of communication to the online I/O processor (via both busses 24 and 25) now returns good status, it is known that the reset I/O processor was at fault. In any event, for each bus error, some type of fault isolation sequence is implemented to determine which system component needs to be forced offline.
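One way to read this isolation sequence is as a small decision procedure, sketched below. This is a considerable simplification of the text (it collapses the selective shut-off and retries into two probes), and every name in it is invented.

#include <stdbool.h>

typedef enum { FAULT_IOP, FAULT_MEMORY } fault_t;

/* Hypothetical probes: retry the register write via a given I/O
 * processor and report whether both memory modules returned good
 * status; reset an I/O processor so its bus drivers turn off. */
extern bool write_ok_via(int iop);
extern void reset_iop(int iop);

/* Simplified version of the isolation sequence described above. */
fault_t isolate_bus_fault(int suspect_iop, int other_iop)
{
    /* 1. Try the same write through the other I/O processor. */
    if (!write_ok_via(other_iop))
        return FAULT_MEMORY;   /* both routes bad: suspect a memory */

    /* 2. Success does not clear the memory module yet: the first IOP
     *    may be hanging a line on bus 24 or 25. Reset it so its bus
     *    drivers turn off, then retry with both memory modules live. */
    reset_iop(suspect_iop);
    if (write_ok_via(other_iop))
        return FAULT_IOP;      /* the reset IOP was corrupting the bus */
    return FAULT_MEMORY;       /* still bad status: memory at fault    */
}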
Synchronization

The processors 40 used in the illustrative embodiment are of pipelined architecture with overlapped instruction execution, as discussed above with reference to Figures 4 and 5. Since a synchronization technique used in this embodiment relies upon cycle counting, i.e., incrementing a counter 71 and a counter 73 of Figure 2 every time an instruction is executed, generally as set forth in application Ser. No. 118,503, there must be a definition of what constitutes the execution of an instruction in the processor 40. A straightforward definition is that every time the pipeline advances an instruction is executed. One of the control lines in the control bus 43 is a signal RUN# which indicates that the pipeline is stalled; when RUN# is high the pipeline is stalled, when RUN# is low (logic zero) the pipeline advances each machine cycle. This RUN# signal is used in the numeric processor 46 to monitor the pipeline of the processor 40 so this coprocessor 46 can run in lockstep with its associated processor 40. This RUN# signal in the control bus 43 along with the clock 17 are used by the counters 71 and 73 to count Run cycles.

The size of the counter register 71, in a preferred embodiment, is chosen to be 4096, i.e., 2^12, which is selected because the tolerances of the crystal oscillators used in the clocks 17 are such that the drift in about 4K Run cycles on average results in a skew or difference in number of cycles run by a processor chip 40 of about all that can be reasonably allowed for proper operation of the interrupt synchronization as explained below. One synchronization mechanism is to force action to cause the CPUs to synchronize whenever the counter 71 overflows. One such action is to force a cache miss in response to an overflow signal OVFL from the counter 71; this can be done by merely generating a false Miss signal (e.g., TagValid bit not set) on control bus 43 for the next I-cache reference, thus forcing a cache miss exception routine to be entered, and the resultant memory reference will produce synchronization just as any memory reference does. Another method of forcing synchronization upon overflow of counter 71 is by forcing a stall in the processor 40, which can be done by using the overflow signal OVFL to generate a CP Busy (coprocessor busy) signal on control bus 43 via logic circuit 71a of Figure 2; this CP Busy signal always results in the processor 40 entering stall until CP Busy is deasserted. All three processors will enter this stall because they are executing the same code and will count the same cycles in their counter 71, but the actual time they enter the stall will vary; the logic circuit 71a receives the RUN# signal from bus 43 of the other two processors via input R#, so when all
three have stalled, the CP Busy signal is released and the processors will come out of stall in synch again.
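A behavioral sketch of what logic circuit 71a does with the overflow signal and the three RUN# inputs, under the stated convention that RUN# high means the pipeline is stalled; this is pseudo-hardware written in C, not the actual circuit:

#include <stdbool.h>

struct sync71a {
    bool cp_busy;   /* CP Busy, driven onto control bus 43 */
};

/* Called each machine cycle with this CPU's counter-71 overflow flag
 * and the stall state of all three pipelines (the RUN# signals, the
 * other two arriving via inputs R#). */
void clock_71a(struct sync71a *s, bool ovfl, bool self_stalled,
               bool other1_stalled, bool other2_stalled)
{
    if (ovfl)
        s->cp_busy = true;      /* force own processor into stall */

    /* When all three pipelines are observed stalled, release CP Busy:
     * the processors leave the stall together, back in synch. */
    if (s->cp_busy && self_stalled && other1_stalled && other2_stalled)
        s->cp_busy = false;
}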

Thus, two synchronization techniques have been described, the first being the synchronization resulting from voting the memory references in circuits 100 in the memory modules, and the second by the overflow of counter 71 as just set forth. In addition, interrupts are synchronized, as will be described below. It is important to note, however, that the processors 40 are basically running free at their own clock speed, and are substantially decoupled from one another, except when synchronizing events occur. The fact that pipelined microprocessors are used, as illustrated in Figures 4 and 5, would make lock-step synchronization with a single clock more difficult, and would degrade performance; also, use of the write buffer 50 serves to decouple the processors, and would be much less effective with close coupling of the processors. Likewise, the high performance resulting from using instruction and data caches, and virtual memory management with the TLBs 83, would be more difficult to implement if close coupling were used, and performance would suffer.

The interrupt synchronization technique must distinguish between real time and so-called "virtual time". Real time is the external actual time, clock-on-the-wall time, measured in seconds, or, for convenience, measured in machine cycles which are 60-nsec divisions in the example. The clock generators 17 each produce clock pulses in real time, of course. Virtual time is the internal cycle-count time of each of the processor chips 40 as measured in each one of the cycle counters 71 and 73, i.e., the instruction number of the instruction being executed by the processor chip, measured in instructions since some arbitrary beginning point. Referring to Figure 10, the relationship between real time, shown as t0 to t12, and virtual time, shown as instruction number (modulo-16 count in count register 73) I0 to I15, is illustrated. Each row of Figure 10 is the cycle count
for one of the CPUs A, B or C, and each column is a "point" in real time. The clocks for the CPUs will most likely be out of phase, so the actual time correlation will be as seen in Figure 10a, where the instruction numbers (columns) are not perfectly aligned, i.e., the cycle-count does not change on aligned real-time machine cycle boundaries; however, for explanatory purposes the illustration of Figure 10 will suffice. In Figure 10, at real time t3 the CPU-A is at the third instruction, CPU-B is at count-9 or executing the ninth instruction, and CPU-C is at the fourth instruction. Note that both real time and virtual time can only advance.

The processor chip 40 in a CPU stalls under certain conditions when a resource is not available, such as a D-cache 45 or I-cache 44 miss during a load or an instruction fetch, or a signal that the write buffer 50 is full during a store operation, or a "CP Busy" signal via the control bus 43 that the coprocessor 46 is busy (the coprocessor receives an instruction it cannot yet handle due to data dependency or limited processing resources), or the multiplier/divider 79 is busy (the internal multiply/divide circuit has not completed an operation at the time the processor attempts to access the result register). Of these, the caches 44 and 45 are "passive resources" which do not change state without intervention by the processor 40, but the remainder of the items are active resources that can change state while the processor is not doing anything to act upon the resource. For example, the write buffer 50 can change from full to empty with no action by the processor (so long as the processor does not perform another store operation). So there are two types of stalls: stalls on passive resources and stalls on active resources. Stalls on active resources are called interlock stalls.

Since the code streams executing on the CPUs A, B and C are the same, the states of the passive resources such as caches 44 and 45 in the three CPUs are necessarily the same at every point in virtual time. If a stall is a result of a conflict at a passive resource (e.g., the data cache 45) then all three processors will perform a stall, and the only variable will be the length of the stall. Referring to Figure 11, assume the cache miss occurs at I4, and that the access to the global memory 14 or 15 resulting from the miss takes eight clocks (actually it may be more than eight). In this case, CPU-C begins the access to global memory 14 and 15 at t1, and the controller 117 for global memory begins the memory access when the first processor CPU-C signals the beginning of the memory access. The controller 117 completes the access eight clocks later, at t8, although CPU-A and CPU-B each stalled less than the eight clocks required for the memory access. The result is that the CPUs become synchronized in real time as well as in virtual time. This example also illustrates the advantage of overlapping the access to DRAM 104 and the voting in circuit 100.

Interlock stalls present a different situation from passive resource stalls. One CPU can perform an interlock stall when another CPU does not stall at all. Referring to Figure 12, an interlock stall caused by the write buffer 50 is illustrated. The cycle-counts for CPU-A and CPU-B are shown, and the full flags Awb and Bwb from write buffers 50 for CPU-A and CPU-B are shown below the cycle-counts (high or logic one means full, low or logic zero means empty). The CPU checks the state of the full flag every time a store operation is executed; if the full flag is set, the CPU stalls until the full flag is cleared, then completes the store operation. The write buffer 50 sets the full flag if the store operation fills the buffer, and clears the full flag whenever a store operation drains one word from the buffer, thereby freeing a location for the next CPU store operation. At time t0 the CPU-B is three clocks ahead of CPU-A, and the write buffers are both full. Assume the write buffers are performing a write operation to global memory, so when this write completes during t5 the write buffer full flags will be cleared; this
clearing will occur synchronously in t6 in real time (for the reason illustrated by Figure 11) but not synchronously in virtual time. Now, assume the instruction at cycle-count I6 is a store operation; CPU-A executes this store at t6 after the write buffer full flag is cleared, but CPU-B tries to execute this store operation at t3 and finds the write buffer full flag is still set and so has to stall for three clocks. Thus, CPU-B performs a stall that CPU-A did not.

The property that one CPU may stall and the other not stall imposes a restriction on the interpretation of the cycle counter 71. In Figure 12, assume interrupts are presented to the CPUs on a cycle count of I7 (while the CPU-B is stalling from the I6 instruction). The run cycle for cycle count I7 occurs for both CPUs at t7. If the cycle counter alone presents the interrupt to the CPU, then CPU-A would see the interrupt on cycle count I7 but CPU-B would see the interrupt during a stall cycle resulting from cycle count I6, so this method of presenting interrupts would cause the two CPUs to take an exception on different instructions, a condition that would not have occurred if either all of the CPUs stalled or none stalled.

Another restriction on the interpretation of the cycle counter is that there should not be any delays between detecting the cycle count and performing an action. Again referring to Figure 12, assume interrupts are presented to the CPUs on cycle count I6, but because of implementation restrictions an extra clock delay is interposed between detection of cycle count I6 and presentation of the interrupt to the CPU. The result is that CPU-A sees this interrupt on cycle count I7, but CPU-B will see the interrupt during the stall from cycle count I6, causing the two CPUs to take an exception on different instructions. Again, the importance of monitoring the state of the instruction pipeline in real time is illustrated.
Interrupt Synchronization

The three CPUs of the system of Figures 1-3 are required to function as a single logical processor, thus requiring that the CPUs adhere to certain restrictions regarding their internal state to ensure that the programming model of the three CPUs is that of a single logical processor. Except in failure modes and in diagnostic functions, the instruction streams of the three CPUs are required to be identical. If not identical, then voting global memory accesses at voting circuitry 100 of Figure 6 would be difficult; the voter would not know whether one CPU was faulty or whether it was executing a different sequence of instructions. The synchronization scheme is designed so that if the code stream of any CPU diverges from the code stream of the other CPUs, then a failure is assumed to have occurred. Interrupt synchronization provides one of the mechanisms of maintaining a single CPU image.

All interrupts are required to occur synchronous to virtual time, ensuring that the instruction streams of the three processors CPU-A, CPU-B and CPU-C will not diverge as a result of interrupts (there are other causes of divergent instruction streams, such as one processor reading different data than the data read by the other processors). Several scenarios exist whereby interrupts occurring asynchronous to virtual time would cause the code streams to diverge. For example, an interrupt causing a context switch on one CPU before process A completes, but causing the context switch after process A completes on another CPU, would result in a situation where, at some point later, one CPU continues executing process A, but the other CPU cannot execute process A because that process had already completed. If in this case the interrupts occurred asynchronous to virtual time, then just the fact that the exception program counters were different could cause problems. The act of writing the exception program counters to global memory would result in the voter detecting different data from the three CPUs, producing a vote fault.

Certain types of exceptions in the CPUs are inherently synchronous to virtual time. One example is a breakpoint exception caused by the execution of a breakpoint instruction. Since the instruction streams of the CPUs are identical, the breakpoint exception occurs at the same point in virtual time on all three of the CPUs. Similarly, all such internal exceptions inherently occur synchronous to virtual time. For example, TLB exceptions are internal exceptions that are inherently synchronous. TLB exceptions occur because the virtual page number does not match any of the entries in the TLB 83. Because the act of translating addresses is solely a function of the instruction stream (exactly as in the case of the breakpoint exception), the translation is inherently synchronous to virtual time. In order to ensure that TLB exceptions are synchronous to virtual time, the state of the TLBs 83 must be identical in all three of the CPUs 11, 12 and 13, and this is guaranteed because the TLB 83 can only be modified by software. Again, since all of the CPUs execute the same instruction stream, the state of the TLBs 83 is always changed synchronous to virtual time. So, as a general rule of thumb, if an action is performed by software then the action is synchronous to virtual time. If an action is performed by hardware, which does not use the cycle counters 71, then the action is generally synchronous to real time.

External exceptions are not inherently synchronous to virtual time. I/O devices 26, 27 or 30 have no information about the virtual time of the three CPUs 11, 12 and 13. Therefore, all interrupts that are generated by these I/O devices must be synchronized to virtual time before presenting to the CPUs, as explained below. Floating point exceptions are different from I/O device interrupts because the floating point coprocessor 46 is tightly coupled to the microprocessor 40 within the CPU.
External devices view the three CPUs as one logical processor, and have no information about the synchrony or lack of synchrony between the CPUs, so the external devices cannot produce interrupts that are synchronous with the individual instruction stream (virtual time) of each CPU. Without any sort of synchronization, if some external device drove an interrupt at real time t1 of Figure 10, and the interrupt was presented directly to the CPUs at this time, then the three CPUs would take an exception trap at different instructions, resulting in an unacceptable state of the three CPUs. This is an example of an event (assertion of an interrupt) which is synchronous to real time but not synchronous to virtual time.

Interrupts are synchronized to virtual time in the system of Figures 1-3 by performing a distributed vote on the interrupts and then presenting the interrupt to the processor on a predetermined cycle count. Figure 13 shows a more detailed block diagram of the interrupt synchronization logic 65 of Figure 2. Each CPU contains a distributor 135 which captures the external interrupt from the line 69 or 70 coming from the modules 14 or 15; this capture occurs on a predetermined cycle count, e.g., at count-4 as signalled on an input line CC-4 from the counter 71. The captured interrupt is distributed to the other two CPUs via the inter-CPU bus 18. These distributed interrupts are called pending interrupts. There are three pending interrupts, one from each CPU 11, 12 and 13. A voter circuit 136 captures the pending interrupts and performs a vote to verify that all of the CPUs did receive the external interrupt request. On a predetermined cycle count (detected from the cycle counter 71), in this example cycle-8 received by input line CC-8, the interrupt voter 136 presents the interrupt to the interrupt pin on its respective microprocessor 40 via line 137 and control busses 55 and 43. Since the cycle count that is used to present the interrupt is predetermined, all of the microprocessors 40 will receive the interrupt on the same cycle count and thus the interrupt will have been synchronized to virtual time.
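This distribute-then-vote mechanism, including the holding register 138 discussed below, can be modeled compactly. The sketch is per-CPU and behavioral: the broadcast of pending bits over the inter-CPU bus 18 and the clearing of pending bits when the interrupt is serviced are not modeled, and N, CD and CV are illustrative stand-ins for the decoded counts (e.g., the CC-4 and CC-8 inputs).

#include <stdbool.h>

enum { N = 8, CD = 4, CV = 7 };   /* modulo, distribute and vote cycles */

struct cpu_sync {
    bool pending[3];    /* pending bits as seen on inter-CPU bus 18   */
    bool holding;       /* holding register 138                       */
    bool irq_out;       /* line 137: interrupt pin of processor 40    */
    bool irq_err;       /* error flagged in the interrupt logic       */
};

/* Called once per Run cycle on CPU 'self', with the external voted
 * interrupt line (69/70) and this CPU's cycle count from counter 71. */
void interrupt_sync(struct cpu_sync *s, int self, bool ext_irq, int cc)
{
    if (cc == CD && ext_irq)       /* distributor 135: capture and     */
        s->pending[self] = true;   /* distribute on cycle count CD     */

    if (cc == CV) {                /* voter 136: check on cycle CV     */
        bool all  = s->pending[0] && s->pending[1] && s->pending[2];
        bool some = s->pending[0] || s->pending[1] || s->pending[2];

        if (all) {
            s->irq_out = true;     /* present interrupt to chip 40     */
            s->holding = false;
        } else if (some && s->holding) {
            /* Holding register was set on the previous vote cycle and
             * still not all bits are set: a CPU has failed. Present
             * the interrupt anyway and flag the error.                */
            s->irq_out = true;
            s->irq_err = true;
            s->holding = false;
        } else if (some) {
            s->holding = true;     /* wait one more vote cycle         */
        }
        /* irq_out and holding are cleared when the interrupt is
         * serviced (not shown). */
    }
}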
Figure 14 shows the sequence of events for synchronizing interrupts to virtual time. The rows labeled CPU-A, CPU-B, and CPU-C indicate the cycle count in counter 71 of each CPU at a point in real time. The rows labeled IRQ_A_PEND, IRQ_B_PEND, and IRQ_C_PEND indicate the state of the interrupt pending bits coupled via the inter-CPU bus 18 to the input of the voters 136 (a one signifies that the pending bit is set). The rows labeled IRQ_A, IRQ_B, and IRQ_C indicate the state of the interrupt input pin on the microprocessor 40 (the signals on lines 137), where a one signifies that an interrupt is present at the input pin. In Figure 14, the external interrupt (EX_IRQ) is asserted on line 69 at t0. If the interrupt distributor 135 captures and then distributes the interrupt to the inter-CPU bus 18 on cycle count 4, then IRQ_C_PEND will go active at t1, IRQ_B_PEND will go active at t2, and IRQ_A_PEND will go active at t4. If the interrupt voter 136 captures and then votes the interrupt pending bits on cycle count 8, then IRQ_C will go active at t5, IRQ_B will go active at t6, and IRQ_A will go active at t8. The result is that the interrupts were presented to the CPUs at different points in real time but at the same point in virtual time (i.e., cycle count 8).

Figure 15 illustrates a scenario which requires the algorithm presented in Figure 14 to be modified. Note that the cycle counter 71 is here represented by a modulo 8 counter. The external interrupt (EX_IRQ) is asserted at time t3, and the interrupt distributor 135 captures and then distributes the interrupt to the inter-CPU bus 18 on cycle count 4. Since CPU-B and CPU-C have executed cycle count 4 before time t3, their interrupt distributors do not capture the external interrupt. CPU-A, however, executes cycle count 4 after time t3. The result is that CPU-A captures and distributes the external interrupt at time t4. But if the interrupt voter 136 captures and votes the interrupt pending bits on cycle 7, the interrupt voter on CPU-A
captures the IRQ_A_PEND signal at time t7, when the two other interrupt pending bits are not ~et The interrupt vot-r 136 on CPU-A recognizes that not all of the CPU~ hav- distributed the external interrupt and thus places the captured interrupt pending s blt in a holding register 138 The interrupt voters 136 on CPU-B and CPU-C capture the single interrupt pending bit at times ts and t~ respectively LiXe the interrupt voter on CPU-A, the voters r-cognize that not all o~ the interrupt pending bits are set, and thus the single interrupt pending bit that i5 set is placed into the holding register 138 When the cycle counter 71 on each CPU reaches a cycle count of 7, the counter rolls over and begins counting at cycle count 0 Since the external - interrupt is still asserted, the interrupt distributor 135 on CPU-B and CPU-C will capture the external interrupt at times tlo and tg respectively These times correspond to when the cycle count becomes egual to 4 At time tl~, th- interrupt voter on CPU-C captures the interrupt pending bitff on th- inter-CPU bus 18 Th- voter 136 determineQ that all of th- CPUs did capture and distribute the extornal interrupt and thus presents the interrupt to the processor chip 40 At times tl33 and tl5, the interrupt voters 136 on CPU-B and CPU-A capture the interrupt pending bits and then presents the interrupt to the processor chip 40 The result is that all of tho processor chips received the external interrupt reque~t at identical instructions, and the ~i 25 information saved in the holding regi~ters i~ not needed .s Holding R-gi-t-r In th- int-rrupt scenario pr-sent-d above with reference to Figur- lS, the voter 136 use~ a holding register 138 to save some state information In particular, th- sav-d state wa~ that some, but not all, of the CPU~ captur-d and di~tribut-d an xternal interrupt If th- ~yst-m do-- not hav- any fault- (a~ wa- the situation in Figur- 15) th-n thi~ stat~ information i~ not .
. ~ .. - . . .
.`. . ' ~ . ' . ' ' ''. ' ' :
. . , .~ . - :., .. : . .
:, ::: . - - . . -.~ .
':~

Z0~3342 necessary because, a ~hown in the prQvious example, external interrupto can be synchronized to virtual time without the use o~
the holding register 138 The algorithm is that the interrupt voter 136 capture~ and voteC the interrupt pending bits on a pr-determined cycle count When all of the interrupt pending bits are aaserted, then the interrupt is presented to the processor chip 40 on the predetermined cycle count In the example of Figure 15, the interrupts were voted on cycle count 7 Referring to Figure 15, if CPU-C fails and the failure mode is such that the interrupt distributor 135 does not function correctly, then if the interrupt voters 136 waited until all of the interrupt pending bits were set before presenting the interrupt to the processor chip 40, the result would be that the interrupt would never get presented Thus, a single fault on a single CPU renders the entire interrupt chain on all of the CPUs inoperable The holding register 138 provides a mechanism for the voter ! 136 to know that the last interrupt vote cycle captured at least one, but not all, of the interrupt pending bits The interrupt vote cycle occurs on the cycle count that the interrupt voter capture~ and votes the interrupt pending bits There are only two scenarios that result in some of the interrupt pending bits being set One is the ~cenario presented in reference to Figure $~ 15 in which the external int-rrupt i8 aoserted before the int-rrupt di-tribution cycl- on some of the CPUs but after the int-rrupt di-tributlon cycl- on other CPUa In the second sc-nario, at l-ast on- of the CPUs fails in a manner that disabl-~ the interrupt distributor If th- rea~on that only some of th- int-rrupt p-nding bits ar- set at th- interrupt ~ote cycle is case on- scQnario, then th- interrupt voter is guaranteed that all of th- interrupt p-nding bit~ will b- set on the next interrupt vote cycl- Therefore, if the interrupt voter discover- that the holding register hao been s-t and not all of the interrupt pending bits are set, then an error must exist on : ,. .

. -- .
.-:- :

' .

one or more of the CPU~ This as~umes that the holding regl~ter 138 o~ each CPU gets cleared when an interrupt is serviced, so that the ~tate of the holding register doe- not repre~ent ~tale 3tate on the interrupt pending bits In th- case o~ an error, tho interrupt voter 136 can pre~ent the interrupt to the proces-sor chip 40 and simultaneou~ly indicate that an error has been d-tected in the interrupt synchronization logic :
The interrupt voter 136 does not actually do any voting but instead merely checks the state of the interrupt pending bits and lo the holding register 137 to determin- whether or not to present an interrupt to the processor chip 40 and whether or not to ' indicate an error in the interrupt logic , . ~ .
Modulo Cycle Counters

The interrupt synchronization example of Figure 15 represented the interrupt cycle counter 71 as a modulo N counter (e.g., a modulo 8 counter). Using a modulo N cycle counter simplified the description of the interrupt voting algorithm by allowing the concept of an interrupt vote cycle. With a modulo N cycle counter, the interrupt vote cycle can be described as a single cycle count which lies between 0 and N-1 where N is the modulo of the cycle counter. Whatever value of cycle counter is chosen for the interrupt vote cycle, that cycle count is guaranteed to occur every N cycle counts; as illustrated in Figure 15 for a modulo 8 counter, every eight counts an interrupt vote cycle occurs. The interrupt vote cycle is used here merely to illustrate the periodic nature of a modulo N cycle counter. Any event that is keyed to a particular cycle count of a modulo N cycle counter is guaranteed to occur every N cycle counts. Obviously, an infinite (i.e., non-repeating) counter 71 couldn't be used.
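The periodicity argument can be seen in a few lines of code. This is an illustrative C model only; the text fixes the modulo 8 counter and the vote on count 7 for Figure 15, while the distribution count below is an assumed value:

    /* Model of a modulo-N cycle counter with events keyed to
     * particular counts. Any event keyed to a count recurs every
     * N cycle counts. */
    #include <stdio.h>

    #define N   8   /* modulo of the cycle counter 71 */
    #define C_V 7   /* vote cycle: count 7, as in Figure 15 */
    #define C_D 3   /* distribution cycle: an assumed value  */

    int main(void)
    {
        unsigned count = 0;
        for (int run_cycle = 0; run_cycle < 24; run_cycle++) {
            if (count == C_D) printf("cycle %2d: distribution\n", run_cycle);
            if (count == C_V) printf("cycle %2d: vote\n", run_cycle);
            count = (count + 1) % N;   /* wraps; never infinite */
        }
        return 0;
    }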
A value of N is chosen to maximize system parameters that have a positive effect on the system and to minimize system parameters that have a negative effect on the system. Some of such effects are developed empirically. First, some of the parameters will be described; Cv and Cd are the interrupt vote cycle and the interrupt distribution cycle, respectively (in the circuit of Figure 13 these are the inputs CC-8 and CC-4, respectively). The values of Cv and Cd must lie in the range between 0 and N-1 where N is the modulo of the cycle counter. Dmax is the maximum amount of cycle count drift between the three processors CPU-A, -B and -C that can be tolerated by the synchronization logic. The processor drift is determined by taking a snapshot of the cycle counter 71 from each CPU at a point in real time. The drift is calculated by subtracting the cycle count of the slowest CPU from the cycle count of the fastest CPU, performed as modulo N subtraction. The value of Dmax is described as a function of N and the values of Cv and Cd. First, Dmax will be defined as a function of the difference Cv - Cd, where the subtraction operation is performed as modulo N subtraction. This allows us to choose values of Cv and Cd that maximize Dmax. Consider the scenario in Figure 16. Suppose that Cd = 8 and Cv = 9. From Figure 16 the processor drift can be calculated to be Dmax = 4. The external interrupt on line 69 is asserted at time t4. In this case, CPU-B will capture and distribute the interrupt at time t5. CPU-B will then capture and vote the interrupt pending bits at time t6. This scenario is inconsistent with the interrupt synchronization algorithm presented earlier because CPU-B executes its interrupt vote cycle before CPU-A has performed the interrupt distribution cycle. The flaw with this scenario is that the processors have drifted further apart than the difference between Cv and Cd. The relationship can be formally written as

Equation (1): Cv - Cd < Dmax + e
where e is the time needed for the interrupt pending bits to propagate on the inter-CPU bus 18. In previous examples, e has been assumed to be zero. Since wall-clock time has been quantized in clock cycle (Run cycle) increments, e can also be quantized. Thus the equation becomes

Equation (2): Cv - Cd < Dmax

where Dmax is expressed as an integer number of cycle counts.
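The modulo-N subtraction used to measure drift is worth making concrete; a minimal sketch in C, assuming N = 8 and invented snapshot values:

    /* Processor drift: snapshot the cycle counter 71 of each CPU at
     * one instant of real time, then subtract the slowest CPU's
     * count from the fastest CPU's as modulo-N subtraction. */
    #include <stdio.h>

    #define N 8

    static unsigned drift(unsigned fastest, unsigned slowest)
    {
        return (fastest + N - slowest) % N;   /* modulo-N subtraction */
    }

    int main(void)
    {
        /* the fastest CPU has wrapped around to count 1 while the
         * slowest is still at count 7; the drift is 2 counts */
        printf("drift = %u cycle counts\n", drift(1, 7));
        return 0;
    }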
Next, the maximum drift can be described as a function of N. Figure 17 illustrates a scenario in which N = 4 and the processor drift Dmax = 3. Suppose that Cd = 0. The subscripts on cycle count 0 of each processor denote the quotient part (Q) of the instruction cycle count. Since the cycle count is now represented in modulo N, the value of the cycle counter is the remainder portion of I/N, where I is the number of instructions that have been executed since time t0. The Q of the instruction cycle count is the integer portion of I/N. If the external interrupt is asserted at time t3, then CPU-A will capture and distribute the interrupt at time t4, and CPU-B will execute its interrupt distribution cycle at time t5. This presents a problem because the interrupt distribution cycle for CPU-A has Q=1 and the interrupt distribution cycle for CPU-B has Q=2. The synchronization logic will continue as if there are no problems and will thus present the interrupt to the processors on equal cycle counts. But the interrupt will be presented to the processors on different instructions because the Q of each processor is different. The relationship of Dmax as a function of N is therefore

Equation (3): N/2 > Dmax

where N is an even number and Dmax is expressed as an integer number of cycle counts. (These equations 2 and 3 can be shown to be both equivalent to the Nyquist theorem in sampling theory.) Combining equations 2 and 3 gives

Equation (4): Cv - Cd < N/2 - 1

which allows optimum values of Cv and Cd to be chosen for a given value of N.
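As a quick check of equations (3) and (4), the bounds they impose for a few counter sizes can be tabulated; an illustrative C fragment with example moduli:

    /* Bounds from equations (3) and (4): the tolerable drift Dmax
     * must stay below N/2, and the vote/distribute separation
     * Cv - Cd below N/2 - 1. */
    #include <stdio.h>

    int main(void)
    {
        int moduli[] = { 8, 16, 128 };
        for (int i = 0; i < 3; i++) {
            int N = moduli[i];
            printf("N = %3d: Dmax < %3d, Cv - Cd < %3d\n",
                   N, N / 2, N / 2 - 1);
        }
        return 0;
    }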
All of the above equations suggest that N should be as large as possible. The only factor that tries to drive N to a small number is interrupt latency. Interrupt latency is the time interval between the assertion of the external interrupt on line 69 and the presentation of the interrupt to the microprocessor chip on line 137. Which processor should be used to determine the interrupt latency is not a clear-cut choice. The three microprocessors will operate at different speeds because of the slight differences in the crystal oscillators in clock sources 17 and other factors. There will be a fastest processor, a slowest processor, and the other processor. Defining the interrupt latency with respect to the slowest processor is reasonable because the performance of the system is ultimately determined by the performance of the slowest processor. The maximum interrupt latency is

Equation (5): Lmax = 2N - 1

where Lmax is the maximum interrupt latency expressed in cycle counts. The maximum interrupt latency occurs when the external interrupt is asserted after the interrupt distribution cycle Cd of the fastest processor but before the interrupt distribution cycle Cd of the slowest processor. The calculation of the average interrupt latency Lave is more complicated because it depends on the probability that the external interrupt occurs after the interrupt distribution cycle of the fastest processor and before the interrupt distribution cycle of the slowest processor. This probability depends on the drift between the processors, which in turn is determined by a number of external factors. If we assume that these probabilities are zero, then the average latency may be expressed as

Equation (6): Lave = N/2 + (Cv - Cd)

Using these relationships, values of N, Cv, and Cd are chosen using the system requirement for Dmax and interrupt latency. For example, choosing N = 128 and (Cv - Cd) = 10, Lave = 74 or about 4.4 microsec (with no stall cycles). Using the preferred
embodiment where a four-bit (four binary stage) counter 71a is used as the interrupt synch counter, and the distribute and vote outputs are at CC-4 and CC-8 as discussed, it is seen that N = 16, Cv = 8 and Cd = 4, so Lave = 16/2 + (8-4) = 12 cycles or 0.7 microsec.
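Both worked examples follow directly from equations (5) and (6); the short C sketch below reproduces them. The figure of roughly 60 ns per Run cycle is inferred from 74 cycles being quoted as about 4.4 microsec, and is an assumption:

    /* Interrupt latency per equations (5) and (6):
     *   Lmax = 2N - 1    Lave = N/2 + (Cv - Cd)            */
    #include <stdio.h>

    static void latency(int N, int sep, double ns_per_cycle)
    {
        int lmax = 2 * N - 1;
        int lave = N / 2 + sep;     /* sep = Cv - Cd */
        printf("N = %3d: Lmax = %3d cycles, Lave = %3d cycles (~%.1f us)\n",
               N, lmax, lave, lave * ns_per_cycle / 1000.0);
    }

    int main(void)
    {
        latency(128, 10, 60.0);     /* Lave = 74, about 4.4 microsec */
        latency(16, 4, 60.0);       /* Lave = 12, about 0.7 microsec */
        return 0;
    }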
Refresh Control for Local Memory
The refresh counter 72 counts non-stall cycles (not machine cycles) just as the counters 71 and 71a count. The object is that the refresh cycles will be introduced for each CPU at the same cycle count, measured in virtual time rather than real time. Preferably, each one of the CPUs will interpose a refresh cycle at the same point in the instruction stream as the other two. The DRAMs in local memory 16 must be refreshed on a 512 cycles per 8-msec schedule just as mentioned above regarding the DRAMs 104 of the global memory. Thus, the counter 72 could issue a refresh command to the DRAMs 16 once every 15-microsec, addressing one row of 512, so the refresh specification would be satisfied; if a memory operation was requested during refresh then a Busy response would result until refresh was finished. But letting each CPU handle its own local memory refresh in real time independently of the others could cause the CPUs to get out of synch, and so additional control is needed. For example, if refresh mode is entered just as a divide operation is beginning, the timing is such that one CPU could take two clocks longer than the others; or, if a non-interruptable sequence was entered by a faster CPU and the others went into refresh before entering this routine, the CPUs could walk away from one another. However, using the cycle counter 71 (instead of real time) to avoid some of these problems means that stall cycles are not counted, and if a loop is entered causing many stalls (some can cause a 7-to-1 stall-to-run ratio) then the refresh specification is not met unless the period is decreased substantially from the 15-microsec figure, but that would degrade performance. For this reason, stall cycles are also counted in a second counter 72a,
seen in Figure 2, and every time this counter reaches the same number as that counted in the refresh counter 72, an additional refresh cycle is introduced. For example, the refresh counter 72 counts 2^8 or 256 Run cycles, in step with the counter 71, and when it overflows a refresh is signalled via control bus 43. Meanwhile, counter 72a counts 2^8 stall cycles (responsive to the RUN# signal and clock 17), and every time it overflows a second counter 72b is incremented (counter 72b may be merely bits 9-to-11 for the eight-bit counter 72a), so when a refresh mode is finally entered the CPU does a number of additional refreshes indicated by the number in the counter register 72b. Thus, if a long period of stall-intensive execution is encountered, the average number of refreshes will stay in the one per 15-microsec range, even if up to 7x256 stall cycles are interposed, because when finally going into a refresh mode the number of rows refreshed will catch up to the nominal refresh rate, yet there is no degradation of performance by arbitrarily shortening the refresh cycle.
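The two-counter catch-up scheme reduces to a small amount of bookkeeping. The following C model is a sketch under the 2^8 example above; the structure and function names are invented, and the hardware of course implements this in logic, not code:

    /* Catch-up refresh: counter 72 counts Run (non-stall) cycles and
     * signals a refresh on overflow; counter 72a counts stall cycles,
     * and its overflow count (72b) adds extra rows so the average
     * rate stays near one row per 15 microsec. */
    #include <stdint.h>

    struct refresh_ctl {
        uint8_t run_count;    /* counter 72: 2^8 Run cycles     */
        uint8_t stall_count;  /* counter 72a: 2^8 stall cycles  */
        uint8_t extra;        /* counter 72b: overflows of 72a  */
    };

    /* Called once per clock; returns the number of DRAM rows to
     * refresh now (usually zero). */
    int refresh_tick(struct refresh_ctl *c, int stalled)
    {
        if (stalled) {
            if (++c->stall_count == 0)   /* 72a overflowed */
                c->extra++;              /* one more catch-up row owed */
            return 0;
        }
        if (++c->run_count == 0) {       /* 72 overflowed: refresh due */
            int rows = 1 + c->extra;     /* nominal row plus catch-up  */
            c->extra = 0;
            return rows;
        }
        return 0;
    }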
Memory Management

The CPUs 11, 12 and 13 of Figures 1-3 have memory space organized as illustrated in Figure 18. Using the example that the local memory 16 is 8-MByte and the global memory 14 or 15 is 32-MByte, note that the local memory 16 is part of the same continuous zero-to-40M map of CPU memory access space, rather than being a cache or a separate memory space; realizing that the 0-8M section is triplicated (in the three CPU modules), and the 8-40M section is duplicated, nevertheless logically there is merely a single 0-40M physical address space. An address over 8-MByte on bus 54 causes the bus interface 56 to make a request to the memory modules 14 and 15, but an address under 8-MByte will access the local memory 16 within the CPU module itself.
Performance is improved by placing more of the memory used by the applications being executed in local memory 16, and so as memory chips are available in higher densities at lower cost and higher speeds, additional local memory will be added, as well as additional global memory. For example, the local memory might be 32-MByte and the global memory 128-MByte. On the other hand, if a very minimum-cost system is needed, and performance is not a major determining factor, the system can be operated with no local memory, all main memory being in the global memory area (in memory modules 14 and 15), although the performance penalty is high for such a configuration.

The content of local memory portion 141 of the map of Figure 18 is identical in the three CPUs 11, 12 and 13. Likewise, the two memory modules 14 and 15 contain identically the same data in their space 142 at any given instant. Within the local memory portion 141 is stored the kernel 143 (code) for the Unix operating system, and this area is physically mapped within a fixed portion of the local memory 16 of each CPU. Likewise, kernel data is assigned a fixed area 144 in each local memory 16; except upon boot-up, these blocks do not get swapped to or from global memory or disk. Another portion 145 of local memory 16 is employed for user program (and data) pages, which are swapped to area 146 of the global memory 14 and 15 under control of the operating system. The global memory area 142 is used as a staging area for user pages in area 146, and also as a disk buffer in an area 147; if the CPUs are executing code which performs a write of a block of data or code from local memory 16 to disk 148, then the sequence is to always write to a disk buffer area 147 instead, because the time to copy to area 147 is negligible compared to the time to copy directly to the I/O processors 26 and 27 and thus via I/O controller 30 to disk 148. Then, while the CPUs proceed to execute other code, the write-to-disk operation is done, transparent to the CPUs, to move the block from area 147 to disk 148. In a like manner, the global memory area 146 is mapped to include an I/O staging area 149, for similar treatment of I/O accesses other than disk (e.g., video).
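The unified map means the local/global split is a simple address decode at the bus interface 56; a minimal sketch in C, assuming the 8-MByte and 40-MByte figures of the example:

    /* Address decode for the single 0-40M physical space of
     * Figure 18: below 8-MByte the access goes to the CPU's local
     * memory 16; from 8-MByte to 40-MByte the bus interface 56
     * forwards the request to memory modules 14 and 15. */
    #include <stdint.h>

    #define LOCAL_TOP  (8u  * 1024 * 1024)   /* triplicated local memory */
    #define GLOBAL_TOP (40u * 1024 * 1024)   /* duplicated global memory */

    enum target { TARGET_LOCAL, TARGET_GLOBAL, TARGET_FAULT };

    enum target decode(uint32_t phys_addr)
    {
        if (phys_addr < LOCAL_TOP)
            return TARGET_LOCAL;
        if (phys_addr < GLOBAL_TOP)
            return TARGET_GLOBAL;
        return TARGET_FAULT;
    }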
The physical memory map of Figure 18 is correlated with the virtual memory management system of the processor 40 in each CPU. Figure 19 illustrates the virtual address map of the R2000 processor chip used in the example embodiment, although it is understood that other microprocessor chips supporting virtual memory management with paging and a protection mechanism would provide corresponding features. In Figure 19, two separate 2-GByte virtual address spaces 150 and 151 are illustrated; the processor 40 operates in one of two modes, user mode and kernel mode. The processor can only access the area 150 in the user mode, or can access both the areas 150 and 151 in the kernel mode. The kernel mode is analogous to the supervisory mode provided in many machines. The processor 40 is configured to operate normally in the user mode until an exception is detected forcing it into the kernel mode, where it remains until a restore from exception (RFE) instruction is executed. The manner in which the memory addresses are translated or mapped depends upon the operating mode of the microprocessor, which is defined by a bit in a status register. When in the user mode, a single, uniform virtual address space 150 referred to as "kuseg" of 2-GByte size is available. Each virtual address is also extended with a 6-bit process identifier (PID) field to form unique virtual addresses for up to sixty-four user processes. All references to this segment 150 in user mode are mapped through the TLB 83, and use of the caches 44 and 45 is determined by bit settings for each page entry in the TLB entries; some pages may be cachable and some not, as specified by the programmer.

When in the kernel mode, the virtual address space includes both the areas 150 and 151 of Figure 19, and this space has four separate segments: kuseg 150, kseg0 152, kseg1 153 and kseg2 154. The kuseg 150 segment for the kernel mode is 2-GByte in size, coincident with the "kuseg" of the user mode, so when in the kernel mode the processor treats references to this segment just
like user mode references, thus streamlining kernel access to user data. The kuseg 150 is used to hold user code and data, but the operating system often needs to reference this same code or data. The kseg0 area 152 is a 512-MByte kernel physical address space direct-mapped onto the first 512-MBytes of physical address space, and is cached but does not use the TLB 83; this segment is used for kernel executable code and some kernel data, and is represented by the area 143 of Figure 18 in local memory 16. The kseg1 area 153 is also directly mapped into the first 512-MByte of physical address space, the same as kseg0, and is uncached and uses no TLB entries. Kseg1 differs from kseg0 only in that it is uncached. Kseg1 is used by the operating system for I/O registers, ROM code and disk buffers, and so corresponds to areas 147 and 149 of the physical map of Figure 18. The kseg2 area 154 is a 1-GByte space which, like kuseg, uses TLB 83 entries to map virtual addresses to arbitrary physical ones, with or without caching. This kseg2 area differs from the kuseg area 150 only in that it is not accessible in the user mode, but instead only in the kernel mode. The operating system uses kseg2 for stacks and per-process data that must remap on context switches, for user page tables (memory map), and for some dynamically-allocated data areas. Kseg2 allows selective caching and mapping on a per-page basis, rather than requiring an all-or-nothing approach.

The 32-bit virtual addresses generated in the registers 76 or PC 80 of the microprocessor chip and output on the bus 84 are represented in Figure 20, where it is seen that bits 0-11 are the offset used unconditionally as the low-order 12-bits of the address on bus 42 of Figure 3, while bits 12-31 are the VPN or virtual page number in which bits 29-31 select between kuseg, kseg0, kseg1 and kseg2. The process identifier PID for the currently-executing process is stored in a register also accessible by the TLB. The 64-bit TLB entries are represented in Figure 20 as well, where it is seen that the 20-bit VPN from the virtual address is compared to the 20-bit VPN field located in bits 44-63 of the 64-bit entry, while at the same time the PID is
compared to bits 38-43; if a match is found in any of the sixty-four 64-bit TLB entries, the page frame number PFN at bits 12-31 of the matched entry is used as the output via busses 82 and 42 of Figure 3 (assuming other criteria are met). Other one-bit values in a TLB entry include N, D, V and G. N is the non-cachable indicator, and if set the page is non-cachable and the processor directly accesses local memory or global memory instead of first accessing the cache 44 or 45. D is a write-protect bit, and if set means that the location is "dirty" and therefore writable, but if zero a write operation causes a trap. The V bit means valid if set, and allows the TLB entries to be cleared by merely resetting the valid bits; this V bit is used in the page-swapping arrangement of this system to indicate whether a page is in local or global memory. The G bit is to allow global accesses which ignore the PID match requirement for a valid TLB translation; in kseg2 this allows the kernel to access all mapped data without regard for PID.
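The entry layout and match rule just described can be modeled directly. In the sketch below (C), the VPN, PID and PFN field positions are those given in the text; the bit positions chosen for the N, D, V and G flags are assumptions for illustration:

    /* Model of a 64-bit TLB entry of Figure 20 and its match rule. */
    #include <stdint.h>

    struct tlb_fields {
        uint32_t vpn;    /* bits 44-63: 20-bit virtual page number */
        uint32_t pid;    /* bits 38-43: 6-bit process identifier   */
        uint32_t pfn;    /* bits 12-31: 20-bit page frame number   */
        int n, d, v, g;  /* non-cachable, dirty (writable), valid, global */
    };

    static struct tlb_fields unpack(uint64_t e)
    {
        struct tlb_fields f;
        f.vpn = (uint32_t)(e >> 44) & 0xFFFFFu;
        f.pid = (uint32_t)(e >> 38) & 0x3Fu;
        f.pfn = (uint32_t)(e >> 12) & 0xFFFFFu;
        f.n = (e >> 11) & 1;   /* flag positions are assumed */
        f.d = (e >> 10) & 1;
        f.v = (e >> 9)  & 1;
        f.g = (e >> 8)  & 1;
        return f;
    }

    /* A translation matches when the VPNs agree and either the
     * entry is global (G set) or the PIDs agree; V then tells
     * whether the page is actually resident in local memory. */
    static int tlb_match(struct tlb_fields f, uint32_t vpn, uint32_t pid)
    {
        return f.vpn == vpn && (f.g || f.pid == pid);
    }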

The device controllers 30 cannot do DMA into local memory 16 directly, and so the global memory is used as a staging area for DMA-type block transfers, typically from disk 148 or the like. The CPUs can perform operations directly at the controllers 30, to initiate or actually control operations by the controllers (i.e., programmed I/O), but the controllers 30 cannot do DMA except to global memory; the controllers 30 can become the VMEbus (bus 28) master and through the I/O processor 26 or 27 do reads or writes directly to global memory in the memory modules 14 and 15.
Page swapping between global and local memories (and disk) is initiated either by a page fault or by an aging process. A page fault occurs when a process is executing and attempts to execute from or access a page that is in global memory or on disk; the TLB 83 will show a miss and a trap will result, so low-level trap code in the kernel will show the location of the page, and a routine will be entered to initiate a page swap. If the
page needed is in global memory, a series of commands are sent to the DMA controller 74 to write the least-recently-used page from local memory to global memory and to read the needed page from global to local. If the page is on disk, commands and addresses (sectors) are written to the controller 30 from the CPU to go to disk and acquire the page, then the process which made the memory reference is suspended. When the disk controller has found the data and is ready to send it, an interrupt is signalled which will be used by the memory modules (not reaching the CPUs) to allow the disk controller to begin a DMA to global memory to write the page into global memory, and when finished the CPU is interrupted to begin a block transfer under control of DMA controller 74 to swap a least-used page from local to global and read the needed page to local. Then, the original process is made runnable again, state is restored, and the original memory reference will again occur, finding the needed page in local memory. The other mechanism to initiate page swapping is an aging routine by which the operating system periodically goes through the pages in local memory, marking them as to whether or not each page has been used recently, and those that have not are subject to be pushed out to global memory. A task switch does not itself initiate page swapping, but instead as the new task begins to produce page faults, pages will be swapped as needed, and the candidates for swapping out are those not recently used.

If a memory reference is made and a TLB miss is shown, but the page table lookup resulting from the TLB miss exception shows the page is in local memory, then a TLB entry is made to show this page to be in local memory. That is, the process takes an exception when the TLB miss occurs, goes to the page tables (in the kernel data section), finds the table entry, writes to the TLB, then the process is allowed to proceed. But if the memory reference shows a TLB miss, and the page tables show the corresponding physical address is in global memory (over 8M physical address), the TLB entry is made for this page, and when the process resumes it will find the page entry in the TLB as
before; yet another exception is taken because the valid bit will be zero, indicating the page is physically not in local memory, so this time the exception will enter a routine to swap the page from global to local and validate the TLB entry, so execution can then proceed. In the third situation, if the page tables show the address for the memory reference is on disk, not in local or global memory, then the system operates as indicated above, i.e., the process is put off the run queue and put in the sleep queue, a disk request is made, and when the disk has transferred the page to global memory and signalled a command-complete interrupt, then the page is swapped from global to local, and the TLB updated, then the process can execute again.
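The three cases reduce to a dispatch on the page-table lookup. The sketch below (C) is illustrative only: all helper names are hypothetical, and the two-exception sequence described above for global pages (install with V = 0, then swap on the second exception) is collapsed into one step for brevity:

    /* Dispatch on a TLB miss: page already local (just install the
     * TLB entry), page in global memory (swap it in via DMA
     * controller 74), or page on disk (stage through global memory
     * while the process sleeps). */
    enum page_loc { PAGE_LOCAL, PAGE_GLOBAL, PAGE_DISK };

    extern enum page_loc page_table_lookup(unsigned vpn, unsigned *phys);
    extern void tlb_install(unsigned vpn, unsigned phys, int valid);
    extern void dma_swap_global_to_local(unsigned phys);  /* controller 74 */
    extern void disk_read_to_global(unsigned vpn);        /* controller 30 */
    extern void sleep_until_disk_done(void);  /* off run queue, onto sleep queue */

    void tlb_miss_handler(unsigned vpn)
    {
        unsigned phys;
        switch (page_table_lookup(vpn, &phys)) {
        case PAGE_LOCAL:                     /* under 8M: just map it   */
            tlb_install(vpn, phys, 1);
            break;
        case PAGE_GLOBAL:                    /* over 8M: swap, then map */
            dma_swap_global_to_local(phys);
            tlb_install(vpn, phys, 1);
            break;
        case PAGE_DISK:                      /* stage via global memory */
            disk_read_to_global(vpn);
            sleep_until_disk_done();
            (void)page_table_lookup(vpn, &phys);  /* now in global memory */
            dma_swap_global_to_local(phys);
            tlb_install(vpn, phys, 1);
            break;
        }
    }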
Private Memory

Although the memory modules 14 and 15 store the same data at the same locations, and all three CPUs 11, 12 and 13 have equal access to these memory modules, there is a small area of the memory assigned under software control as a private memory in each one of the memory modules. For example, as illustrated in Figure 21, an area 155 of the map of the memory module locations is designated the private memory area, and is writable only when the CPUs issue a "private memory write" command on bus 59. In an example embodiment, the private memory area 155 is a 4K page starting at the address contained in a register 156 in the bus interface 56 of each one of the CPU modules; this starting address can be changed under software control by writing to this register 156 by the CPU. The private memory area 155 is further divided between the three CPUs; only CPU-A can write to area 155a, CPU-B to area 155b, and CPU-C to area 155c. One of the command signals in bus 57 is set by the bus interface 56 to inform the memory modules 14 and 15 that the operation is a private write, and this is set in response to the address generated by the processor 40 from a Store instruction; bits of the address (and a Write command) are detected by a decoder 157
in the bus interface (which compares bus addresses to the contents of register 156) and used to generate the "private memory write" command for bus 57. In the memory module, when a write command is detected in the registers 94, 95 and 96, and the addresses and commands are all voted good (i.e., in agreement) by the vote circuit 100, then the control circuit 100 allows the data from only one of the CPUs to pass through to the bus 101, this one being determined by two bits of the address from the CPUs. During this private write, all three CPUs present the same address on their bus 57 but different data on their bus 58 (the different data is some state unique to the CPU, for example). The memory modules vote the addresses and commands, and select data from only one CPU based upon part of the address field seen on the address bus. To allow the CPUs to vote some data, all three CPUs will do three private writes (there will be three writes on the busses 21, 22 and 23) of some state information unique to a CPU, into both memory modules 14 and 15. During each write, each CPU sends its unique data, but only one is accepted each time. So, the software sequence executed by all three CPUs is (1) Store (to location 155a), (2) Store (to location 155b), (3) Store (to location 155c). But data from only one CPU is actually written each time, and the data is not voted (because it is or could be different and could show a fault if voted). Then, the CPUs can vote the data by having all three CPUs read all three of the locations 155a, 155b and 155c, and by software compare this data. This type of operation is used in diagnostics, for example, or in interrupts to vote the cause register data.

The private-write mechanism is used in fault detection and recovery. For example, the CPUs may detect a bus error upon making a memory read request, such as a memory module 14 or 15 returning bad status on lines 33-1 or 33-2. At this point a CPU doesn't know if the other CPUs received the same status from the memory module; the CPU could be faulty or its status detection circuit faulty, or, as indicated, the memory could be faulty.
So, to isolate the fault, when the bus fault routine mentioned above is entered, all three CPUs do a private write of the status information they just received from the memory modules in the preceding read attempt. Then all three CPUs read what the others have written, and compare it with their own memory status information. If they all agree, then the memory module is voted off-line. If not, and one CPU shows bad status for a memory module but the others show good status, then that CPU is voted off-line.
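The store/read-back/compare sequence amounts to only a few steps of software; a minimal sketch in C, with hypothetical helpers standing in for the Store instructions to locations 155a, 155b and 155c:

    /* Vote per-CPU state through the private-write areas: every CPU
     * stores its own word to all three slots (the memory module
     * accepts only the owning CPU's data for each slot), then every
     * CPU reads all three slots back and compares them in software. */
    #include <stdint.h>

    extern void private_store(int slot, uint32_t my_state);
    extern uint32_t private_load(int slot);

    /* Returns -1 if all three CPUs wrote the same value (e.g., vote
     * the memory module off-line), else the index of the CPU that
     * disagrees (vote that CPU off-line). */
    int vote_private_state(uint32_t my_state)
    {
        uint32_t seen[3];

        for (int slot = 0; slot < 3; slot++)  /* Store to 155a, 155b, 155c */
            private_store(slot, my_state);

        for (int slot = 0; slot < 3; slot++)
            seen[slot] = private_load(slot);

        if (seen[0] == seen[1] && seen[1] == seen[2])
            return -1;
        if (seen[0] == seen[1]) return 2;     /* CPU-C disagrees */
        if (seen[0] == seen[2]) return 1;     /* CPU-B disagrees */
        return 0;                             /* CPU-A disagrees */
    }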
Fault-Tolerant Power Supply

Referring now to Figure 22, the system of the preferred embodiment may use a fault-tolerant power supply which provides the capability for on-line replacement of failed power supply modules, as well as on-line replacement of CPU modules, memory modules, I/O processor modules, I/O controllers and disk modules as discussed above. In the circuit of Figure 22, an a/c power line 160 is connected directly to a power distribution unit 161 that provides power line filtering, transient suppressors, and a circuit breaker to protect against short circuits. To protect against a/c power line failure, redundant battery packs 162 and 163 provide 4-1/2 minutes of full system power so that orderly system shutdown can be accomplished. Only one of the two battery packs 162 or 163 is required to be operative to safely shut the system down.
The power subsystem has two identical AC to DC bulk power supplies 164 and 165 which exhibit high power factor and energize a pair of 36-volt DC distribution busses 166 and 167. The system can remain operational with one of the two bulk power supplies 164 or 165 operational.

Four separate power distribution busses are included in these busses 166 and 167. The bulk supply 164 drives a power bus
166-1, 167-1, while the bulk supply 165 drives power bus 166-2, 167-2. The battery pack 162 drives bus 166-3, 167-3, and is itself recharged from both 166-1 and 166-2. The battery pack 163 drives bus 166-4, 167-4 and is recharged from busses 166-1 and 167-2. The three CPUs 11, 12 and 13 are driven from different combinations of these four distribution busses.

A number of DC-to-DC converters 168 connected to these 36-v busses 166 and 167 are used to individually power the CPU modules 11, 12 and 13, the memory modules 14 and 15, the I/O processors 26 and 27, and the I/O controllers 30. The bulk power supplies 164 and 165 also power the three system fans 169, and battery chargers for the battery packs 162 and 163. By having these separate DC-to-DC converters for each system component, failure of one converter does not result in system shutdown, but instead the system will continue under one of its failure recovery modes discussed above, and the failed power supply component can be replaced while the system is operating.

The power system can be shut down by either a manual switch (with standby and off functions) or under software control from a maintenance and diagnostic processor 170 which automatically defaults to the power-on state in the event of a maintenance and diagnostic power failure.
While the invention has been described with reference to a specific embodiment, the description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiment, as well as other embodiments of the invention, will be apparent to persons skilled in the art upon reference to this description. It is therefore contemplated that the appended claims will cover any such modifications or embodiments as fall within the true scope of the invention.
Claims (44)

1. A computer system comprising:
a) multiple CPUs each executing the same instruction stream, each CPU employing virtual memory addressing with paging;
b) each CPU having a local memory accessible only by said CPU, the local memory containing selected pages;
c) a global memory accessible by all said CPUs, the local memory having faster access time than the global memory, the global memory containing selected pages and page-swapped with said local memory upon demand to maintain most-used pages in said local memory of each CPU.
2. A system according to claim 1 further including a disk memory coupled to said global memory and having access time slower than said global memory, the disk memory containing pages defined by said virtual memory addressing and page-swapped with said global memory and local memory upon demand.
3. A system according to claim 1 further including an operating system having a kernel stored in said local memory for each CPU.
4. A system according to claim 1 wherein each said CPU has a separate cache memory having access time faster than that of said local memory.
5. A system according to claim 1 wherein said CPUs are clocked independent of one another, and wherein said CPUs are synchronized upon accessing said global memory, and said global memory is duplicated.
6. A system according to claim 1 wherein said global memory is coupled to I/O means accessible only via said global memory, and said global memory is used for staging I/O requests by said CPUs.
7. A method of operating a computer system, comprising the steps of:
a) executing the same instruction stream in multiple CPUs using virtual memory addressing with paging;
b) accessing a local memory by each CPU in execution of said instruction stream, each local memory accessible only by one of said CPUs, to store selected pages in the local memory;
c) accessing a global memory by all of said CPUs in execu-tion of said instruction stream, the global memory accessible by all said CPUs, the local memory having faster access time than the global memory, to store selected pages in the global memory page-swapped with said local memory upon demand to maintain most-used pages in said local memory of each CPU.
8. A method according to claim 7 further including the step of storing pages in a disk memory coupled to said global memory, the disk memory having access time slower than said global memory, the pages stored in said disk memory being defined by said virtual memory addressing and page-swapped with said global memory and local memory upon demand.
9. A method according to claim 7 including executing said instruction stream under an operating system having a kernel stored in said local memory for each CPU.
10. A method according to claim 7 wherein each said CPU has a separate cache memory having access time faster than that of said local memory.
11. A method according to claim 7 including the step of clocking the CPUs independently of one another, including the step of synchronizing the CPUs upon accessing said global memory, and wherein said global memory is duplicated.
12. A method according to claim 7 wherein said global memory is coupled to I/O means accessible only via said global memory, and including the step of transferring data between said CPUs and said I/O means using said global memory for staging.
13. A method of operating a computer system, comprising the steps of:
a) executing the same instruction stream in multiple proces-sors using virtual memory addressing with paging under control of an operating system having a kernel;
b) accessing a local memory by each processor in execution of said instruction stream, each local memory accessible only by one of said processors, to store selected pages in the local memory and to store said kernel of said operating system;
c) accessing a duplicated global memory by all of said processors in execution of said instruction stream, the global memory accessible by all said processors, the local memory having faster access time than the global memory, to store selected pages in the global memory page-swapped with said local memory upon demand under control of said operating system to maintain most-used pages in said local memory of each processor; and d) storing pages in a disk memory coupled to said global memory, the disk memory having access time slower than said global memory, the pages stored in said disk memory being defined by said virtual memory addressing using said operating system and page-swapped with said global memory and local memory upon demand.
14. A method according to claim 13 wherein each said processor has a separate cache memory having access time faster than that of said local memory.
15. A method according to claim 13 including the steps of clocking the processors independently of one another, and includ-ing the step of synchronizing the processors upon accessing said global memory.
16. A method according to claim 13 wherein said global memory is coupled to I/O means accessible only via said global memory, and including the step of transferring data between said processor and said I/O means using said global memory for staging.
17. A computer system, comprising a) a plurality of CPUs each executing an instruction stream, the CPUs being clocked independently of one another to provide execution cycles, the CPUs executing stall cycles while awaiting implementation of some instruction execution;
b) each of the CPUs having a first counter to count execu-tion cycles but not stall cycles, and having a second counter to count stall cycles;
c) each of said CPUs having a local memory requiring perio-dic refresh;
d) and a refresh control for each CPU responsive to said first and second counters to initiate a refresh of said local memory to perform a number of refresh cycles depending upon output of the second counter.
18. A system according to claim 17 wherein said refresh control initiates said refresh at execution of the same instruc-tion in said instruction stream in each of said CPUs.
19. A system according to claim 17 wherein said CPUs are loosely synchronized by voting access to a common memory acces-sible by all said CPUs.
20. A system according to claim 17 wherein there are three said CPUs and wherein said CPUs access a duplicated common global memory.
21. A computer system comprising:
a) a CPU executing an instruction stream, the CPU being clocked to provide execution cycles, the CPU executing stall cycles while awaiting implementation of some instruction execu-tion;
b) the CPU having a first counter to count execution cycles but not stall cycles, and having a second counter to count stall cycles;
c) said CPU having a memory requiring periodic refresh;
d) and a refresh control for said CPU responsive to said first and second counters to initiate a refresh of said memory to perform a number of refresh cycles depending upon output of the second counter.
22. A system according to claim 21 wherein a third counter counts the number of times said second counter overflows, and the number of said refresh cycles is determined by the content of said third counter.
23. A system according to claim 21 wherein said first counter is of a size related to the number of refresh cycles needed by said local memory in a given time period.
24. A method of operating a computer system, comprising the steps of:
a) executing an instruction stream in each of a plurality of CPUs, the CPUs being clocked independently of one another to provide execution cycles, the CPUs executing stall cycles while awaiting implementation of some instruction execution;
b) counting execution cycles but not stall cycles in each of the CPUs in a first counter, and counting stall cycles in each CPU in a second counter;
c) each of said CPUs accessing a local memory requiring periodic refresh;
d) and initiating refresh of said local memory for each CPU
responsive to said first and second counters to perform a number of refresh cycles depending upon output of the second counter.
25. A method according to claim 24 wherein said step of initiating said refresh is done at execution of the same instruc-tion in said instruction stream in each of said CPUs.
26. A method according to claim 24 wherein said CPUs are loosely synchronized by voting access to a common memory acces-sible by all said CPUs.
27. A method according to claim 24 wherein there are three said CPUs and wherein said CPUs access a duplicated common global memory.
28. A computer system comprising:
a) multiple CPUs executing the same instruction stream, b) a common memory having memory space accessed by all said CPUs, c) a private memory space in said common memory for storing state information for each CPU writable only by one CPU, d) said state information in said private memory spaces for all CPUs being readable by all CPUs to thereby evaluate said state information for equality by each CPU.
29. A system according to claim 28 wherein there are a plurality of said private memory spaces, one for each one of said CPUs.
30. A system according to claim 28 wherein memory accesses made by said CPUs to said common memory are voted by said common memory before being executed.
31. A system according to claim 30 wherein memory accesses made by said CPUs to said private memory are voted to compare addresses but not data.
32. A system according to claim 28 wherein said private memory for each CPU has the same logical address associated with instructions executed by said CPUs, but is translated to a unique address for each private memory before addressing said common memory.
33. A computer system having multiple CPUs comprising:
a) a shared memory having memory space accessed by all of said multiple CPUs, b) each one of said multiple CPUs also having a separate private-write memory space in said shared memory for storing state information, each said private-write space writable only by one of said multiple CPUs;
c) said private-write memory spaces for each one of said multiple CPUs being readable by all of said multiple CPUs.
34. A system according to claim 33 wherein said multiple CPUs are executing the same instruction stream.
35. A system according to claim 34 wherein said shared memory votes memory requests made by said multiple CPUs to said shared memory.
36. A system according to claim 33 wherein said shared memory votes write requests made to said private-write spaces by comparing addresses but not data.
37. A method of operating a computer system having multiple processors, comprising the steps of:
a) storing data by each of said multiple processors in a shared memory having memory space accessed by all of said multi-ple processors, b) also storing information by each one of said multiple processors in a private memory space for each multiple processor writable only by one multiple processor.
38. A method according to claim 37 including the step of executing the same instruction stream in each one of said multi-ple processors.
39. A method according to claim 37 wherein said step of storing data includes voting memory requests to said shared memory made by said multiple processors.
40. A method according to claim 37 wherein step of storing information in private memory space includes making a write request to all of said private memory spaces by each of said multiple processors but executing the write request only for the one processor for each write request associated with each private memory space.
41. A method according to claim 37 including the step of evaluating for equality said information from said private memory space by each one of said multiple processors.
42. A method according to claim 37 including the step of reading said information in said private memory spaces for all multiple processors by each multiple processor.
43. A method according to claim 42 including the step of executing the same instruction stream in each one of said multi-ple processors, and wherein said step of storing data includes voting memory requests to said shared memory made by said multi-ple processors.
44. A method according to claim 43 wherein said multiple processors are loosely synchronized upon the event of voting memory requests.
CA002003342A 1988-12-09 1989-11-20 Memory management in high-performance fault-tolerant computer system Abandoned CA2003342A1 (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US28246988A 1988-12-09 1988-12-09
US28254088A 1988-12-09 1988-12-09
US28262988A 1988-12-09 1988-12-09
US282,469 1988-12-09
US282,540 1988-12-09
US28357488A 1988-12-13 1988-12-13
US07/283,573 US4965717A (en) 1988-12-09 1988-12-13 Multiple processor system having shared memory with private-write capability
US283,573 1988-12-13
EP90105103A EP0447578A1 (en) 1988-12-09 1990-03-19 Memory management in high-performance fault-tolerant computer system
EP90105102A EP0447577A1 (en) 1988-12-09 1990-03-19 High-performance computer system with fault-tolerant capability
AU52027/90A AU628497B2 (en) 1988-12-09 1990-03-20 Memory management in high-performance fault-tolerant computer systems

Publications (1)

Publication Number Publication Date
CA2003342A1 true CA2003342A1 (en) 1990-06-09

Family

ID=41040648

Family Applications (2)

Application Number Title Priority Date Filing Date
CA002003342A Abandoned CA2003342A1 (en) 1988-12-09 1989-11-20 Memory management in high-performance fault-tolerant computer system
CA002003337A Abandoned CA2003337A1 (en) 1988-12-09 1989-11-20 High-performance computer system with fault-tolerant capability

Family Applications After (1)

Application Number Title Priority Date Filing Date
CA002003337A Abandoned CA2003337A1 (en) 1988-12-09 1989-11-20 High-performance computer system with fault-tolerant capability

Country Status (7)

Country Link
US (7) US4965717A (en)
EP (5) EP0372579B1 (en)
JP (3) JPH079625B2 (en)
AT (1) ATE158879T1 (en)
AU (1) AU628497B2 (en)
CA (2) CA2003342A1 (en)
DE (1) DE68928360T2 (en)

Families Citing this family (432)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2003338A1 (en) * 1987-11-09 1990-06-09 Richard W. Cutts, Jr. Synchronization of fault-tolerant computer system having multiple processors
AU616213B2 (en) 1987-11-09 1991-10-24 Tandem Computers Incorporated Method and apparatus for synchronizing a plurality of processors
JPH02103656A (en) * 1988-10-12 1990-04-16 Fujitsu Ltd System for controlling successive reference to main storage
US4965717A (en) 1988-12-09 1990-10-23 Tandem Computers Incorporated Multiple processor system having shared memory with private-write capability
AU625293B2 (en) * 1988-12-09 1992-07-09 Tandem Computers Incorporated Synchronization of fault-tolerant computer system having multiple processors
US5148533A (en) * 1989-01-05 1992-09-15 Bull Hn Information Systems Inc. Apparatus and method for data group coherency in a tightly coupled data processing system with plural execution and data cache units
EP0378415A3 (en) * 1989-01-13 1991-09-25 International Business Machines Corporation Multiple instruction dispatch mechanism
US5276828A (en) * 1989-03-01 1994-01-04 Digital Equipment Corporation Methods of maintaining cache coherence and processor synchronization in a multiprocessor system using send and receive instructions
IT1228728B (en) * 1989-03-15 1991-07-03 Bull Hn Information Syst MULTIPROCESSOR SYSTEM WITH GLOBAL DATA REPLICATION AND TWO LEVELS OF ADDRESS TRANSLATION UNIT.
NL8901825A (en) * 1989-07-14 1991-02-01 Philips Nv PIPELINE SYSTEM WITH MULTI-RESOLUTION REAL-TIME DATA PROCESSING.
US5307468A (en) * 1989-08-23 1994-04-26 Digital Equipment Corporation Data processing system and method for controlling the latter as well as a CPU board
JPH0666056B2 (en) * 1989-10-12 1994-08-24 甲府日本電気株式会社 Information processing system
US5551050A (en) * 1989-12-20 1996-08-27 Texas Instruments Incorporated System and method using synchronized processors to perform real time internal monitoring of a data processing device
US5317752A (en) * 1989-12-22 1994-05-31 Tandem Computers Incorporated Fault-tolerant computer system with auto-restart after power-fall
US5327553A (en) * 1989-12-22 1994-07-05 Tandem Computers Incorporated Fault-tolerant computer system with /CONFIG filesystem
US5295258A (en) 1989-12-22 1994-03-15 Tandem Computers Incorporated Fault-tolerant computer system with online recovery and reintegration of redundant components
DE69033954T2 (en) * 1990-01-05 2002-11-28 Sun Microsystems Inc ACTIVE HIGH SPEED BUS
US5263163A (en) * 1990-01-19 1993-11-16 Codex Corporation Arbitration among multiple users of a shared resource
JPH0748190B2 (en) * 1990-01-22 1995-05-24 株式会社東芝 Microprocessor with cache memory
DE69123987T2 (en) * 1990-01-31 1997-04-30 Hewlett Packard Co Push operation for microprocessor with external system memory
US5680574A (en) * 1990-02-26 1997-10-21 Hitachi, Ltd. Data distribution utilizing a master disk unit for fetching and for writing to remaining disk units
US6728832B2 (en) 1990-02-26 2004-04-27 Hitachi, Ltd. Distribution of I/O requests across multiple disk units
JPH03254497A (en) * 1990-03-05 1991-11-13 Mitsubishi Electric Corp Microcomputer
US5247648A (en) * 1990-04-12 1993-09-21 Sun Microsystems, Inc. Maintaining data coherency between a central cache, an I/O cache and a memory
US5289588A (en) * 1990-04-24 1994-02-22 Advanced Micro Devices, Inc. Interlock acquisition for critical code section execution in a shared memory common-bus individually cached multiprocessor system
DE69124285T2 (en) * 1990-05-18 1997-08-14 Fujitsu Ltd Data processing system with an input / output path separation mechanism and method for controlling the data processing system
US5276896A (en) * 1990-06-11 1994-01-04 Unisys Corporation Apparatus for implementing data communications between terminal devices and user programs
US5488709A (en) * 1990-06-27 1996-01-30 Mos Electronics, Corp. Cache including decoupling register circuits
US5732241A (en) * 1990-06-27 1998-03-24 Mos Electronics, Corp. Random access cache memory controller and system
DE69129252T2 (en) * 1990-08-06 1998-12-17 Ncr Int Inc Method for operating a computer memory and arrangement
US5450573A (en) * 1990-08-14 1995-09-12 Siemens Aktiengesellschaft Device for monitoring the functioning of external synchronization modules in a multicomputer system
GB9018993D0 (en) * 1990-08-31 1990-10-17 Ncr Co Work station interfacing means having burst mode capability
GB9019023D0 (en) * 1990-08-31 1990-10-17 Ncr Co Work station having multiplexing and burst mode capabilities
EP0473802B1 (en) * 1990-09-03 1995-11-08 International Business Machines Corporation Computer with extended virtual storage concept
US6108755A (en) * 1990-09-18 2000-08-22 Fujitsu Limited Asynchronous access system to a shared storage
DE69231452T2 (en) * 1991-01-25 2001-05-03 Hitachi Ltd Fault-tolerant computer system with processing units that each have at least three computer units
US6247144B1 (en) * 1991-01-31 2001-06-12 Compaq Computer Corporation Method and apparatus for comparing real time operation of object code compatible processors
US5465339A (en) * 1991-02-27 1995-11-07 Vlsi Technology, Inc. Decoupled refresh on local and system busses in a PC/at or similar microprocessor environment
US5303362A (en) * 1991-03-20 1994-04-12 Digital Equipment Corporation Coupled memory multiprocessor computer system including cache coherency management protocols
US5339404A (en) * 1991-05-28 1994-08-16 International Business Machines Corporation Asynchronous TMR processing system
US5233615A (en) * 1991-06-06 1993-08-03 Honeywell Inc. Interrupt driven, separately clocked, fault tolerant processor synchronization
US5280608A (en) * 1991-06-28 1994-01-18 Digital Equipment Corporation Programmable stall cycles
JPH056344A (en) * 1991-06-28 1993-01-14 Fujitsu Ltd Program run information sampling processing system
US5319760A (en) * 1991-06-28 1994-06-07 Digital Equipment Corporation Translation buffer for virtual machines with address space match
JP3679813B2 (en) * 1991-07-22 2005-08-03 株式会社日立製作所 Parallel computer
US5421002A (en) * 1991-08-09 1995-05-30 Westinghouse Electric Corporation Method for switching between redundant buses in a distributed processing system
US5386540A (en) * 1991-09-18 1995-01-31 Ncr Corporation Method and apparatus for transferring data within a computer using a burst sequence which includes modified bytes and a minimum number of unmodified bytes
JP2520544B2 (en) * 1991-09-26 1996-07-31 インターナショナル・ビジネス・マシーンズ・コーポレイション Method for monitoring task overrun status and apparatus for detecting overrun of task execution cycle
WO1993009494A1 (en) * 1991-10-28 1993-05-13 Digital Equipment Corporation Fault-tolerant computer processing using a shadow virtual processor
EP0543032A1 (en) * 1991-11-16 1993-05-26 International Business Machines Corporation Expanded memory addressing scheme
US5379417A (en) * 1991-11-25 1995-01-03 Tandem Computers Incorporated System and method for ensuring write data integrity in a redundant array data storage system
EP0550358A3 (en) * 1991-12-30 1994-11-02 Eastman Kodak Co Fault tolerant multiprocessor cluster
US5313628A (en) * 1991-12-30 1994-05-17 International Business Machines Corporation Component replacement control for fault-tolerant data processing system
JP2500038B2 (en) * 1992-03-04 1996-05-29 インターナショナル・ビジネス・マシーンズ・コーポレイション Multiprocessor computer system, fault tolerant processing method and data processing system
JPH07504527A (en) * 1992-03-09 1995-05-18 オースペックス システムズ インコーポレイテッド High performance non-volatile RAM protected write cache accelerator system
US5632037A (en) * 1992-03-27 1997-05-20 Cyrix Corporation Microprocessor having power management circuitry with coprocessor support
US5428769A (en) * 1992-03-31 1995-06-27 The Dow Chemical Company Process control interface system having triply redundant remote field units
AU4279793A (en) * 1992-04-07 1993-11-08 Video Technology Computers, Ltd. Self-controlled write back cache memory apparatus
JP2534430B2 (en) * 1992-04-15 1996-09-18 インターナショナル・ビジネス・マシーンズ・コーポレイション Methods for achieving match of computer system output with fault tolerance
DE4219005A1 (en) * 1992-06-10 1993-12-16 Siemens Ag Computer system
AU4400893A (en) * 1992-06-12 1994-01-04 Dow Chemical Company, The Stealth interface for process control computers
US5583757A (en) * 1992-08-04 1996-12-10 The Dow Chemical Company Method of input signal resolution for actively redundant process control computers
US5537655A (en) * 1992-09-28 1996-07-16 The Boeing Company Synchronized fault tolerant reset
US5379415A (en) * 1992-09-29 1995-01-03 Zitel Corporation Fault tolerant memory system
US6951019B1 (en) * 1992-09-30 2005-09-27 Apple Computer, Inc. Execution control for processor tasks
JPH06214969A (en) * 1992-09-30 1994-08-05 Internatl Business Mach Corp <Ibm> Method and equipment for information communication
US5434997A (en) * 1992-10-02 1995-07-18 Compaq Computer Corp. Method and apparatus for testing and debugging a tightly coupled mirrored processing system
US6237108B1 (en) * 1992-10-09 2001-05-22 Fujitsu Limited Multiprocessor system having redundant shared memory configuration
US5781715A (en) * 1992-10-13 1998-07-14 International Business Machines Corporation Fault-tolerant bridge/router with a distributed switch-over mechanism
US5448716A (en) * 1992-10-30 1995-09-05 International Business Machines Corporation Apparatus and method for booting a multiple processor system having a global/local memory architecture
DE69325769T2 (en) * 1992-11-04 2000-03-23 Digital Equipment Corp Detection of command synchronization errors
US5327548A (en) * 1992-11-09 1994-07-05 International Business Machines Corporation Apparatus and method for steering spare bit in a multiple processor system having a global/local memory architecture
EP0600623B1 (en) * 1992-12-03 1998-01-21 Advanced Micro Devices, Inc. Servo loop control
US5751932A (en) * 1992-12-17 1998-05-12 Tandem Computers Incorporated Fail-fast, fail-functional, fault-tolerant multiprocessor system
US6233702B1 (en) * 1992-12-17 2001-05-15 Compaq Computer Corporation Self-checked, lock step processor pairs
US6157967A (en) * 1992-12-17 2000-12-05 Tandem Computer Incorporated Method of data communication flow control in a data processing system using busy/ready commands
JP2826028B2 (en) * 1993-01-28 1998-11-18 富士通株式会社 Distributed memory processor system
US5845329A (en) * 1993-01-29 1998-12-01 Sanyo Electric Co., Ltd. Parallel computer
US5473770A (en) * 1993-03-02 1995-12-05 Tandem Computers Incorporated Fault-tolerant computer system with hidden local memory refresh
JPH0773059A (en) * 1993-03-02 1995-03-17 Tandem Comput Inc Fault-tolerant computer system
EP0616274B1 (en) * 1993-03-16 1996-06-05 Siemens Aktiengesellschaft Synchronisation method for an automation system
JP2819982B2 (en) * 1993-03-18 1998-11-05 株式会社日立製作所 Multiprocessor system with cache match guarantee function that can specify range
JP2784440B2 (en) * 1993-04-14 1998-08-06 インターナショナル・ビジネス・マシーンズ・コーポレイション Data page transfer control method
US5479599A (en) * 1993-04-26 1995-12-26 International Business Machines Corporation Computer console with group ICON control
US5381541A (en) * 1993-05-26 1995-01-10 International Business Machines Corp. Computer system having planar board with single interrupt controller and processor card with plural processors and interrupt director
JP3004861U (en) * 1993-06-04 1994-11-29 ディジタル イクイプメント コーポレイション Fault Tolerant Storage Control System Using Tightly Coupled Dual Controller Modules
US5435001A (en) * 1993-07-06 1995-07-18 Tandem Computers Incorporated Method of state determination in lock-stepped processors
US5909541A (en) * 1993-07-14 1999-06-01 Honeywell Inc. Error detection and correction for data stored across multiple byte-wide memory devices
JPH0793274A (en) * 1993-07-27 1995-04-07 Fujitsu Ltd System and device for transferring data
US5572620A (en) * 1993-07-29 1996-11-05 Honeywell Inc. Fault-tolerant voter system for output data from a plurality of non-synchronized redundant processors
US5530907A (en) * 1993-08-23 1996-06-25 Tcsi Corporation Modular networked image processing system and method therefor
US5548711A (en) * 1993-08-26 1996-08-20 Emc Corporation Method and apparatus for fault tolerant fast writes through buffer dumping
JPH07129456A (en) * 1993-10-28 1995-05-19 Toshiba Corp Computer system
US5604863A (en) * 1993-11-01 1997-02-18 International Business Machines Corporation Method for coordinating executing programs in a data processing system
US5504859A (en) * 1993-11-09 1996-04-02 International Business Machines Corporation Data processor with enhanced error recovery
EP0731945B1 (en) * 1993-12-01 2000-05-17 Marathon Technologies Corporation Fault resilient/fault tolerant computing
US6161162A (en) * 1993-12-08 2000-12-12 Nec Corporation Multiprocessor system for enabling shared access to a memory
US5537538A (en) * 1993-12-15 1996-07-16 Silicon Graphics, Inc. Debug mode for a superscalar RISC processor
JPH07175698A (en) * 1993-12-17 1995-07-14 Fujitsu Ltd File system
US5535405A (en) * 1993-12-23 1996-07-09 Unisys Corporation Microsequencer bus controller system
US5606685A (en) * 1993-12-29 1997-02-25 Unisys Corporation Computer workstation having demand-paged virtual memory and enhanced prefaulting
JPH07219913A (en) * 1994-01-28 1995-08-18 Fujitsu Ltd Method for controlling multiprocessor system and device therefor
TW357295B (en) * 1994-02-08 1999-05-01 United Microelectronics Corp Microprocessor's data writing, reading operations
US5452441A (en) * 1994-03-30 1995-09-19 At&T Corp. System and method for on-line state restoration of one or more processors in an N module redundant voting processor system
JP2679674B2 (en) * 1994-05-02 1997-11-19 日本電気株式会社 Semiconductor production line controller
JPH07334416A (en) * 1994-06-06 1995-12-22 Internatl Business Mach Corp <Ibm> Method and means for initialization of page-mode memory in computer system
US5566297A (en) * 1994-06-16 1996-10-15 International Business Machines Corporation Non-disruptive recovery from file server failure in a highly available file system for clustered computing environments
US5636359A (en) * 1994-06-20 1997-06-03 International Business Machines Corporation Performance enhancement system and method for a hierarchical data cache using a RAID parity scheme
EP0702306A1 (en) * 1994-09-19 1996-03-20 International Business Machines Corporation System and method for interfacing risc busses to peripheral circuits using another template of busses in a data communication adapter
US5530946A (en) * 1994-10-28 1996-06-25 Dell Usa, L.P. Processor failure detection and recovery circuit in a dual processor computer system and method of operation thereof
US5557783A (en) * 1994-11-04 1996-09-17 Canon Information Systems, Inc. Arbitration device for arbitrating access requests from first and second processors having different first and second clocks
US5630045A (en) * 1994-12-06 1997-05-13 International Business Machines Corporation Device and method for fault tolerant dual fetch and store
US5778443A (en) * 1994-12-14 1998-07-07 International Business Machines Corp. Method and apparatus for conserving power and system resources in a computer system employing a virtual memory
US5586253A (en) * 1994-12-15 1996-12-17 Stratus Computer Method and apparatus for validating I/O addresses in a fault-tolerant computer system
KR100397240B1 (en) * 1994-12-19 2003-11-28 코닌클리케 필립스 일렉트로닉스 엔.브이. Variable data processor allocation and memory sharing
US5555372A (en) * 1994-12-21 1996-09-10 Stratus Computer, Inc. Fault-tolerant computer system employing an improved error-broadcast mechanism
FR2730074B1 (en) * 1995-01-27 1997-04-04 Sextant Avionique FAULT-TOLERANT COMPUTER ARCHITECTURE
US5692153A (en) * 1995-03-16 1997-11-25 International Business Machines Corporation Method and system for verifying execution order within a multiprocessor data processing system
US5864654A (en) * 1995-03-31 1999-01-26 Nec Electronics, Inc. Systems and methods for fault tolerant information processing
US5727167A (en) * 1995-04-14 1998-03-10 International Business Machines Corporation Thresholding support in performance monitoring
JP3329986B2 (en) * 1995-04-28 2002-09-30 富士通株式会社 Multiprocessor system
JP3132744B2 (en) * 1995-05-24 2001-02-05 株式会社日立製作所 Operation matching verification method for redundant CPU maintenance replacement
US5632013A (en) * 1995-06-07 1997-05-20 International Business Machines Corporation Memory and system for recovery/restoration of data using a memory controller
JP3502216B2 (en) * 1995-07-13 2004-03-02 富士通株式会社 Information processing equipment
JP3595033B2 (en) * 1995-07-18 2004-12-02 株式会社日立製作所 Highly reliable computer system
DE19529434B4 (en) * 1995-08-10 2009-09-17 Continental Teves Ag & Co. Ohg Microprocessor system for safety-critical control systems
JP3526492B2 (en) * 1995-09-19 2004-05-17 富士通株式会社 Parallel processing system
US5666483A (en) * 1995-09-22 1997-09-09 Honeywell Inc. Redundant processing system architecture
US5673384A (en) * 1995-10-06 1997-09-30 Hewlett-Packard Company Dual disk lock arbitration between equal sized partition of a cluster
US5790775A (en) * 1995-10-23 1998-08-04 Digital Equipment Corporation Host transparent storage controller failover/failback of SCSI targets and associated units
US5708771A (en) * 1995-11-21 1998-01-13 Emc Corporation Fault tolerant controller system and method
US5732209A (en) * 1995-11-29 1998-03-24 Exponential Technology, Inc. Self-testing multi-processor die with internal compare points
US5802265A (en) * 1995-12-01 1998-09-01 Stratus Computer, Inc. Transparent fault tolerant computer system
US5805789A (en) * 1995-12-14 1998-09-08 International Business Machines Corporation Programmable computer system element with built-in self test method and apparatus for repair during power-on
US5812822A (en) * 1995-12-19 1998-09-22 Selway; David W. Apparatus for coordinating clock oscillators in a fully redundant computer system
US5941994A (en) * 1995-12-22 1999-08-24 Lsi Logic Corporation Technique for sharing hot spare drives among multiple subsystems
US5742823A (en) * 1996-01-17 1998-04-21 Nathen P. Edwards Total object processing system and method with assembly line features and certification of results
US5761518A (en) * 1996-02-29 1998-06-02 The Foxboro Company System for replacing control processor by operating processor in partially disabled mode for tracking control outputs and in write enabled mode for transferring control loops
JPH09251443A (en) * 1996-03-18 1997-09-22 Hitachi Ltd Processor fault recovery processing method for information processing system
US5784625A (en) * 1996-03-19 1998-07-21 Vlsi Technology, Inc. Method and apparatus for effecting a soft reset in a processor device without requiring a dedicated external pin
US5724501A (en) * 1996-03-29 1998-03-03 Emc Corporation Quick recovery of write cache in a fault tolerant I/O system
EP0895602B1 (en) * 1996-04-23 2000-10-04 AlliedSignal Inc. Integrated hazard avoidance system
US6141769A (en) 1996-05-16 2000-10-31 Resilience Corporation Triple modular redundant computer system and associated method
TW320701B (en) * 1996-05-16 1997-11-21 Resilience Corp
US5809546A (en) * 1996-05-23 1998-09-15 International Business Machines Corporation Method for managing I/O buffers in shared storage by structuring buffer table having entries including storage keys for controlling accesses to the buffers
US5802397A (en) * 1996-05-23 1998-09-01 International Business Machines Corporation System for storage protection from unintended I/O access using I/O protection key by providing no control by I/O key entries over access by CP entity
US5724551A (en) * 1996-05-23 1998-03-03 International Business Machines Corporation Method for managing I/O buffers in shared storage by structuring buffer table having entries include storage keys for controlling accesses to the buffers
US5900019A (en) * 1996-05-23 1999-05-04 International Business Machines Corporation Apparatus for protecting memory storage blocks from I/O accesses
US5787309A (en) * 1996-05-23 1998-07-28 International Business Machines Corporation Apparatus for protecting storage blocks from being accessed by unwanted I/O programs using I/O program keys and I/O storage keys having M number of bits
KR100195065B1 (en) * 1996-06-20 1999-06-15 유기범 Data network matching device
US5953742A (en) * 1996-07-01 1999-09-14 Sun Microsystems, Inc. Memory management in fault tolerant computer systems utilizing a first and second recording mechanism and a reintegration mechanism
US5784386A (en) * 1996-07-03 1998-07-21 General Signal Corporation Fault tolerant synchronous clock distribution
EP0825506B1 (en) 1996-08-20 2013-03-06 Invensys Systems, Inc. Methods and apparatus for remote process control
US5790397A (en) 1996-09-17 1998-08-04 Marathon Technologies Corporation Fault resilient/fault tolerant computing
US6000040A (en) * 1996-10-29 1999-12-07 Compaq Computer Corporation Method and apparatus for diagnosing fault states in a computer system
DE69718129T2 (en) * 1996-10-29 2003-10-23 Hitachi Ltd Redundant data processing system
US5784394A (en) * 1996-11-15 1998-07-21 International Business Machines Corporation Method and system for implementing parity error recovery schemes in a data processing system
US6167486A (en) 1996-11-18 2000-12-26 Nec Electronics, Inc. Parallel access virtual channel memory system with cacheable channels
US5887160A (en) * 1996-12-10 1999-03-23 Fujitsu Limited Method and apparatus for communicating integer and floating point data over a shared data path in a single instruction pipeline processor
US6161202A (en) * 1997-02-18 2000-12-12 Ee-Signals Gmbh & Co. Kg Method for the monitoring of integrated circuits
US5805606A (en) * 1997-03-13 1998-09-08 International Business Machines Corporation Cache module fault isolation techniques
US6151684A (en) * 1997-03-28 2000-11-21 Tandem Computers Incorporated High availability access to input/output devices in a distributed system
US5951686A (en) * 1997-03-31 1999-09-14 International Business Machines Corporation Method and system for reboot recovery
US6557121B1 (en) 1997-03-31 2003-04-29 International Business Machines Corporation Method and system for fault isolation for PCI bus errors
US6065139A (en) * 1997-03-31 2000-05-16 International Business Machines Corporation Method and system for surveillance of computer system operations
US6502208B1 (en) 1997-03-31 2002-12-31 International Business Machines Corporation Method and system for check stop error handling
US6119246A (en) * 1997-03-31 2000-09-12 International Business Machines Corporation Error collection coordination for software-readable and non-software readable fault isolation registers in a computer system
US6393520B2 (en) * 1997-04-17 2002-05-21 Matsushita Electric Industrial Co., Ltd. Data processor and data processing system with internal memories
US5933857A (en) * 1997-04-25 1999-08-03 Hewlett-Packard Co. Accessing multiple independent microkernels existing in a globally shared memory system
US5923830A (en) * 1997-05-07 1999-07-13 General Dynamics Information Systems, Inc. Non-interrupting power control for fault tolerant computer systems
US5896523A (en) * 1997-06-04 1999-04-20 Marathon Technologies Corporation Loosely-coupled, synchronized execution
US5991893A (en) * 1997-08-29 1999-11-23 Hewlett-Packard Company Virtually reliable shared memory
US6148387A (en) 1997-10-09 2000-11-14 Phoenix Technologies, Ltd. System and method for securely utilizing basic input and output system (BIOS) services
US6542926B2 (en) 1998-06-10 2003-04-01 Compaq Information Technologies Group, L.P. Software partitioned multi-processor system with flexible resource sharing levels
US6381682B2 (en) 1998-06-10 2002-04-30 Compaq Information Technologies Group, L.P. Method and apparatus for dynamically sharing memory in a multiprocessor system
US6332180B1 (en) 1998-06-10 2001-12-18 Compaq Information Technologies Group, L.P. Method and apparatus for communication in a multi-processor computer system
US6647508B2 (en) 1997-11-04 2003-11-11 Hewlett-Packard Development Company, L.P. Multiprocessor computer architecture with multiple operating system instances and software controlled resource allocation
US6199179B1 (en) * 1998-06-10 2001-03-06 Compaq Computer Corporation Method and apparatus for failure recovery in a multi-processor computer system
US6260068B1 (en) 1998-06-10 2001-07-10 Compaq Computer Corporation Method and apparatus for migrating resources in a multi-processor computer system
US6633916B2 (en) 1998-06-10 2003-10-14 Hewlett-Packard Development Company, L.P. Method and apparatus for virtual resource handling in a multi-processor computer system
US6965974B1 (en) * 1997-11-14 2005-11-15 Agere Systems Inc. Dynamic partitioning of memory banks among multiple agents
JP2001523855A (en) 1997-11-14 2001-11-27 マラソン テクノロジーズ コーポレイション Failure recovery / fault-tolerant computer
US6252583B1 (en) * 1997-11-14 2001-06-26 Immersion Corporation Memory and force output management for a force feedback system
US6163840A (en) * 1997-11-26 2000-12-19 Compaq Computer Corporation Method and apparatus for sampling multiple potentially concurrent instructions in a processor pipeline
US6237073B1 (en) 1997-11-26 2001-05-22 Compaq Computer Corporation Method for providing virtual memory to physical memory page mapping in a computer operating system that randomly samples state information
US6442585B1 (en) 1997-11-26 2002-08-27 Compaq Computer Corporation Method for scheduling contexts based on statistics of memory system interactions in a computer system
US6237059B1 (en) 1997-11-26 2001-05-22 Compaq Computer Corporation Method for estimating statistics of properties of memory system interactions among contexts in a computer system
US6332178B1 (en) 1997-11-26 2001-12-18 Compaq Computer Corporation Method for estimating statistics of properties of memory system transactions
US6175814B1 (en) 1997-11-26 2001-01-16 Compaq Computer Corporation Apparatus for determining the instantaneous average number of instructions processed
US6195748B1 (en) 1997-11-26 2001-02-27 Compaq Computer Corporation Apparatus for sampling instruction execution information in a processor pipeline
US6374367B1 (en) 1997-11-26 2002-04-16 Compaq Computer Corporation Apparatus and method for monitoring a computer system to guide optimization
US6549930B1 (en) 1997-11-26 2003-04-15 Compaq Computer Corporation Method for scheduling threads in a multithreaded processor
FR2771526B1 (en) * 1997-11-27 2004-07-23 Bull Sa ARCHITECTURE FOR MANAGING VITAL DATA IN A MULTI-MODULAR MACHINE AND METHOD FOR IMPLEMENTING SUCH AN ARCHITECTURE
US6185646B1 (en) * 1997-12-03 2001-02-06 International Business Machines Corporation Method and apparatus for transferring data on a synchronous multi-drop
US6098158A (en) * 1997-12-18 2000-08-01 International Business Machines Corporation Software-enabled fast boot
US6397281B1 (en) * 1997-12-23 2002-05-28 Emc Corporation Bus arbitration system
US6502149B2 (en) * 1997-12-23 2002-12-31 Emc Corporation Plural bus data storage system
DE69815482T2 (en) * 1997-12-24 2004-04-29 Texas Instruments Inc., Dallas Computer arrangement with processor and memory hierarchy and its operating method
JPH11203157A (en) * 1998-01-13 1999-07-30 Fujitsu Ltd Redundancy device
US6249878B1 (en) * 1998-03-31 2001-06-19 Emc Corporation Data storage system
DE19815263C2 (en) * 1998-04-04 2002-03-28 Astrium Gmbh Device for fault-tolerant execution of programs
US6058490A (en) * 1998-04-21 2000-05-02 Lucent Technologies, Inc. Method and apparatus for providing scaleable levels of application availability
US6216051B1 (en) 1998-05-04 2001-04-10 Nec Electronics, Inc. Manufacturing backup system
US6691183B1 (en) 1998-05-20 2004-02-10 Invensys Systems, Inc. Second transfer logic causing a first transfer logic to check a data ready bit prior to each of multibit transfer of a continuous transfer operation
US6173351B1 (en) * 1998-06-15 2001-01-09 Sun Microsystems, Inc. Multi-processor system bridge
US6148348A (en) * 1998-06-15 2000-11-14 Sun Microsystems, Inc. Bridge interfacing two processing sets operating in a lockstep mode and having a posted write buffer storing write operations upon detection of a lockstep error
US6473840B2 (en) * 1998-06-19 2002-10-29 International Business Machines Corporation Data processing system having a network and method for managing memory by storing discardable pages in a local paging device
US6119215A (en) 1998-06-29 2000-09-12 Cisco Technology, Inc. Synchronization and control system for an arrayed processing engine
US6101599A (en) * 1998-06-29 2000-08-08 Cisco Technology, Inc. System for context switching between processing elements in a pipeline of processing elements
US6836838B1 (en) 1998-06-29 2004-12-28 Cisco Technology, Inc. Architecture for a processor complex of an arrayed pipelined processing engine
US6513108B1 (en) 1998-06-29 2003-01-28 Cisco Technology, Inc. Programmable processing engine for efficiently processing transient data
US6195739B1 (en) 1998-06-29 2001-02-27 Cisco Technology, Inc. Method and apparatus for passing data among processor complex stages of a pipelined processing engine
US6247143B1 (en) * 1998-06-30 2001-06-12 Sun Microsystems, Inc. I/O handling for a multiprocessor computer system
JP2000067009A (en) 1998-08-20 2000-03-03 Hitachi Ltd Main storage shared type multi-processor
KR100589532B1 (en) * 1998-08-21 2006-06-13 Credence Systems Corporation Method and apparatus for built-in self test of integrated circuits
US7013305B2 (en) 2001-10-01 2006-03-14 International Business Machines Corporation Managing the state of coupling facility structures, detecting by one or more systems coupled to the coupling facility, the suspended state of the duplexed command, detecting being independent of message exchange
US6233690B1 (en) * 1998-09-17 2001-05-15 Intel Corporation Mechanism for saving power on long latency stalls
SE515461C2 (en) * 1998-10-05 2001-08-06 Ericsson Telefon Ab L M Method and arrangement for memory management
US6230190B1 (en) * 1998-10-09 2001-05-08 Openwave Systems Inc. Shared-everything file storage for clustered system
US6397345B1 (en) * 1998-10-09 2002-05-28 Openwave Systems Inc. Fault tolerant bus for clustered system
US6412079B1 (en) * 1998-10-09 2002-06-25 Openwave Systems Inc. Server pool for clustered system
US6728839B1 (en) 1998-10-28 2004-04-27 Cisco Technology, Inc. Attribute based memory pre-fetching technique
US6374402B1 (en) 1998-11-16 2002-04-16 Into Networks, Inc. Method and apparatus for installation abstraction in a secure content delivery system
US6763370B1 (en) * 1998-11-16 2004-07-13 Softricity, Inc. Method and apparatus for content protection in a secure content delivery system
US7017188B1 (en) * 1998-11-16 2006-03-21 Softricity, Inc. Method and apparatus for secure content delivery over broadband access networks
US6385747B1 (en) 1998-12-14 2002-05-07 Cisco Technology, Inc. Testing of replicated components of electronic device
US6173386B1 (en) 1998-12-14 2001-01-09 Cisco Technology, Inc. Parallel processor with debug capability
US6920562B1 (en) 1998-12-18 2005-07-19 Cisco Technology, Inc. Tightly coupled software protocol decode with hardware data encryption
US7206877B1 (en) 1998-12-22 2007-04-17 Honeywell International Inc. Fault tolerant data communication network
JP3809930B2 (en) * 1998-12-25 2006-08-16 株式会社日立製作所 Information processing device
US6564311B2 (en) * 1999-01-19 2003-05-13 Matsushita Electric Industrial Co., Ltd. Apparatus for translation between virtual and physical addresses using a virtual page number, a physical page number, a process identifier and a global bit
US6526370B1 (en) * 1999-02-04 2003-02-25 Advanced Micro Devices, Inc. Mechanism for accumulating data to determine average values of performance parameters
US7730169B1 (en) 1999-04-12 2010-06-01 Softricity, Inc. Business method and system for serving third party software applications
US7370071B2 (en) 2000-03-17 2008-05-06 Microsoft Corporation Method for serving third party software applications from servers to client computers
US8099758B2 (en) 1999-05-12 2012-01-17 Microsoft Corporation Policy based composite file system and method
US7089530B1 (en) 1999-05-17 2006-08-08 Invensys Systems, Inc. Process control configuration system with connection validation and configuration
US6754885B1 (en) 1999-05-17 2004-06-22 Invensys Systems, Inc. Methods and apparatus for controlling object appearance in a process control configuration system
US7096465B1 (en) 1999-05-17 2006-08-22 Invensys Systems, Inc. Process control configuration system with parameterized objects
US7272815B1 (en) 1999-05-17 2007-09-18 Invensys Systems, Inc. Methods and apparatus for control configuration with versioning, security, composite blocks, edit selection, object swapping, formulaic values and other aspects
WO2000070531A2 (en) 1999-05-17 2000-11-23 The Foxboro Company Methods and apparatus for control configuration
US7043728B1 (en) 1999-06-08 2006-05-09 Invensys Systems, Inc. Methods and apparatus for fault-detecting and fault-tolerant process control
US6501995B1 (en) 1999-06-30 2002-12-31 The Foxboro Company Process control system and method with improved distribution, installation and validation of components
US6788980B1 (en) 1999-06-11 2004-09-07 Invensys Systems, Inc. Methods and apparatus for control using control devices that provide a virtual machine environment and that communicate via an IP network
US6510352B1 (en) 1999-07-29 2003-01-21 The Foxboro Company Methods and apparatus for object-based process control
US7953931B2 (en) * 1999-08-04 2011-05-31 Super Talent Electronics, Inc. High endurance non-volatile memory devices
US6438710B1 (en) * 1999-08-31 2002-08-20 Rockwell Electronic Commerce Corp. Circuit and method for improving memory integrity in a microprocessor based application
AU6949600A (en) * 1999-08-31 2001-03-26 Times N Systems, Inc. Efficient page ownership control
US6499113B1 (en) * 1999-08-31 2002-12-24 Sun Microsystems, Inc. Method and apparatus for extracting first failure and attendant operating information from computer system devices
US6681341B1 (en) 1999-11-03 2004-01-20 Cisco Technology, Inc. Processor isolation method for integrated multi-processor systems
US6529983B1 (en) 1999-11-03 2003-03-04 Cisco Technology, Inc. Group and virtual locking mechanism for inter processor synchronization
US6708254B2 (en) 1999-11-10 2004-03-16 Nec Electronics America, Inc. Parallel access virtual channel memory system
US6473660B1 (en) 1999-12-03 2002-10-29 The Foxboro Company Process control system and method with automatic fault avoidance
US7555683B2 (en) * 1999-12-23 2009-06-30 Landesk Software, Inc. Inventory determination for facilitating commercial transactions during diagnostic tests
US8019943B2 (en) * 2000-01-06 2011-09-13 Super Talent Electronics, Inc. High endurance non-volatile memory devices
US6574753B1 (en) * 2000-01-10 2003-06-03 Emc Corporation Peer link fault isolation
US6779128B1 (en) 2000-02-18 2004-08-17 Invensys Systems, Inc. Fault-tolerant data transfer
US6892237B1 (en) 2000-03-28 2005-05-10 Cisco Technology, Inc. Method and apparatus for high-speed parsing of network messages
US6820213B1 (en) 2000-04-13 2004-11-16 Stratus Technologies Bermuda, Ltd. Fault-tolerant computer system with voter delay buffer
US6687851B1 (en) 2000-04-13 2004-02-03 Stratus Technologies Bermuda Ltd. Method and system for upgrading fault-tolerant systems
US6862689B2 (en) 2001-04-12 2005-03-01 Stratus Technologies Bermuda Ltd. Method and apparatus for managing session information
US6802022B1 (en) 2000-04-14 2004-10-05 Stratus Technologies Bermuda Ltd. Maintenance of consistent, redundant mass storage images
US6901481B2 (en) 2000-04-14 2005-05-31 Stratus Technologies Bermuda Ltd. Method and apparatus for storing transactional information in persistent memory
US6647516B1 (en) * 2000-04-19 2003-11-11 Hewlett-Packard Development Company, L.P. Fault tolerant data storage systems and methods of operating a fault tolerant data storage system
US6708331B1 (en) * 2000-05-03 2004-03-16 Leon Schwartz Method for automatic parallelization of software
US6675315B1 (en) * 2000-05-05 2004-01-06 Oracle International Corp. Diagnosing crashes in distributed computing systems
US6505269B1 (en) 2000-05-16 2003-01-07 Cisco Technology, Inc. Dynamic addressing mapping to eliminate memory resource contention in a symmetric multiprocessor system
US20020018211A1 (en) * 2000-06-07 2002-02-14 Megerle Clifford A. System and method to detect the presence of a target organism within an air sample using flow cytometry
US6609216B1 (en) 2000-06-16 2003-08-19 International Business Machines Corporation Method for measuring performance of code sequences in a production system
US6804703B1 (en) * 2000-06-22 2004-10-12 International Business Machines Corporation System and method for establishing persistent reserves to nonvolatile storage in a clustered computer environment
US6438647B1 (en) 2000-06-23 2002-08-20 International Business Machines Corporation Method and apparatus for providing battery-backed immediate write back cache for an array of disk drives in a computer system
EP1182569B8 (en) * 2000-08-21 2011-07-06 Texas Instruments Incorporated TLB lock and unlock operation
EP1215577B1 (en) * 2000-08-21 2012-02-22 Texas Instruments Incorporated Fault management and recovery based on task-ID
EP1213650A3 (en) * 2000-08-21 2006-08-30 Texas Instruments France Priority arbitration based on current task and MMU
US6647470B1 (en) * 2000-08-21 2003-11-11 Micron Technology, Inc. Memory device having posted write per command
US6732289B1 (en) 2000-08-31 2004-05-04 Sun Microsystems, Inc. Fault tolerant data storage system
GB2370380B (en) 2000-12-19 2003-12-31 Picochip Designs Ltd Processor architecture
US6948010B2 (en) * 2000-12-20 2005-09-20 Stratus Technologies Bermuda Ltd. Method and apparatus for efficiently moving portions of a memory block
US6990657B2 (en) * 2001-01-24 2006-01-24 Texas Instruments Incorporated Shared software breakpoints in a shared memory system
US6886171B2 (en) * 2001-02-20 2005-04-26 Stratus Technologies Bermuda Ltd. Caching for I/O virtual address translation and validation using device drivers
JP3628265B2 (en) * 2001-02-21 2005-03-09 株式会社半導体理工学研究センター Multiprocessor system unit
US6829693B2 (en) 2001-02-28 2004-12-07 International Business Machines Corporation Auxiliary storage slot scavenger
US7017073B2 (en) * 2001-02-28 2006-03-21 International Business Machines Corporation Method and apparatus for fault-tolerance via dual thread crosschecking
US6766413B2 (en) 2001-03-01 2004-07-20 Stratus Technologies Bermuda Ltd. Systems and methods for caching with file-level granularity
US6874102B2 (en) * 2001-03-05 2005-03-29 Stratus Technologies Bermuda Ltd. Coordinated recalibration of high bandwidth memories in a multiprocessor computer
EP1239369A1 (en) * 2001-03-07 2002-09-11 Siemens Aktiengesellschaft Fault-tolerant computer system and method for its use
US6754788B2 (en) * 2001-03-15 2004-06-22 International Business Machines Corporation Apparatus, method and computer program product for privatizing operating system data
US6751718B1 (en) * 2001-03-26 2004-06-15 Networks Associates Technology, Inc. Method, system and computer program product for using an instantaneous memory deficit metric to detect and reduce excess paging operations in a computer system
US7065672B2 (en) * 2001-03-28 2006-06-20 Stratus Technologies Bermuda Ltd. Apparatus and methods for fault-tolerant computing using a switching fabric
US6928583B2 (en) * 2001-04-11 2005-08-09 Stratus Technologies Bermuda Ltd. Apparatus and method for two computing elements in a fault-tolerant server to execute instructions in lockstep
US6971043B2 (en) * 2001-04-11 2005-11-29 Stratus Technologies Bermuda Ltd Apparatus and method for accessing a mass storage device in a fault-tolerant server
US6862693B2 (en) * 2001-04-13 2005-03-01 Sun Microsystems, Inc. Providing fault-tolerance by comparing addresses and data from redundant processors running in lock-step
US6996750B2 (en) 2001-05-31 2006-02-07 Stratus Technologies Bermuda Ltd. Methods and apparatus for computer bus error termination
US6799186B2 (en) * 2001-10-11 2004-09-28 International Business Machines Corporation SLA monitor calendar buffering
US7120901B2 (en) * 2001-10-26 2006-10-10 International Business Machines Corporation Method and system for tracing and displaying execution of nested functions
US7039774B1 (en) * 2002-02-05 2006-05-02 Juniper Networks, Inc. Memory allocation using a memory address pool
US6832270B2 (en) * 2002-03-08 2004-12-14 Hewlett-Packard Development Company, L.P. Virtualization of computer system interconnects
WO2003089995A2 (en) 2002-04-15 2003-10-30 Invensys Systems, Inc. Methods and apparatus for process, factory-floor, environmental, computer aided manufacturing-based or other control system with real-time data distribution
KR100450320B1 (en) * 2002-05-10 2004-10-01 한국전자통신연구원 Method/Module of Digital TV image signal processing with Auto Error Correction
JP3606281B2 (en) * 2002-06-07 2005-01-05 オムロン株式会社 Programmable controller, CPU unit, special function module, and duplex processing method
US7155721B2 (en) * 2002-06-28 2006-12-26 Hewlett-Packard Development Company, L.P. Method and apparatus for communicating information between lock stepped processors
US7136798B2 (en) * 2002-07-19 2006-11-14 International Business Machines Corporation Method and apparatus to manage multi-computer demand
US20040044508A1 (en) * 2002-08-29 2004-03-04 Hoffman Robert R. Method for generating commands for testing hardware device models
DE10251912A1 (en) * 2002-11-07 2004-05-19 Siemens Ag Data processing synchronization procedure for redundant data processing units, involves obtaining access to run units for common storage zone via access confirmation request
GB2396446B (en) * 2002-12-20 2005-11-16 Picochip Designs Ltd Array synchronization
US8281084B2 (en) * 2003-01-13 2012-10-02 Emulex Design & Manufacturing Corp. Method and interface for access to memory within a first electronic device by a second electronic device
US7149923B1 (en) * 2003-01-17 2006-12-12 Unisys Corporation Software control using the controller as a component to achieve resiliency in a computer system utilizing separate servers for redundancy
US7779285B2 (en) * 2003-02-18 2010-08-17 Oracle America, Inc. Memory system including independent isolated power for each memory module
US7467326B2 (en) 2003-02-28 2008-12-16 Maxwell Technologies, Inc. Self-correcting computer
JP2004302713A (en) * 2003-03-31 2004-10-28 Hitachi Ltd Storage system and its control method
DE10328059A1 (en) * 2003-06-23 2005-01-13 Robert Bosch Gmbh Method and device for monitoring a distributed system
US20050039074A1 (en) * 2003-07-09 2005-02-17 Tremblay Glenn A. Fault resilient/fault tolerant computing
US7779212B2 (en) 2003-10-17 2010-08-17 Micron Technology, Inc. Method and apparatus for sending data from multiple sources over a communications bus
US7107411B2 (en) 2003-12-16 2006-09-12 International Business Machines Corporation Apparatus method and system for fault tolerant virtual memory management
US7472320B2 (en) 2004-02-24 2008-12-30 International Business Machines Corporation Autonomous self-monitoring and corrective operation of an integrated circuit
US7321985B2 (en) * 2004-02-26 2008-01-22 International Business Machines Corporation Method for achieving higher availability of computer PCI adapters
US7761923B2 (en) 2004-03-01 2010-07-20 Invensys Systems, Inc. Process control methods and apparatus for intrusion detection, protection and network hardening
US20050193378A1 (en) * 2004-03-01 2005-09-01 Breault Richard E. System and method for building an executable program with a low probability of failure on demand
JP2005267111A (en) * 2004-03-17 2005-09-29 Hitachi Ltd Storage control system and method for controlling storage control system
JP4056488B2 (en) * 2004-03-30 2008-03-05 エルピーダメモリ株式会社 Semiconductor device testing method and manufacturing method
US20060020852A1 (en) * 2004-03-30 2006-01-26 Bernick David L Method and system of servicing asynchronous interrupts in multiple processors executing a user program
US20050240806A1 (en) * 2004-03-30 2005-10-27 Hewlett-Packard Development Company, L.P. Diagnostic memory dump method in a redundant processor
US7426656B2 (en) * 2004-03-30 2008-09-16 Hewlett-Packard Development Company, L.P. Method and system executing user programs on non-deterministic processors
US8799706B2 (en) * 2004-03-30 2014-08-05 Hewlett-Packard Development Company, L.P. Method and system of exchanging information between processors
JP2005293427A (en) * 2004-04-02 2005-10-20 Matsushita Electric Ind Co Ltd Data transfer processing apparatus and data transfer processing method
US7237144B2 (en) * 2004-04-06 2007-06-26 Hewlett-Packard Development Company, L.P. Off-chip lockstep checking
US7290169B2 (en) * 2004-04-06 2007-10-30 Hewlett-Packard Development Company, L.P. Core-level processor lockstepping
US7296181B2 (en) * 2004-04-06 2007-11-13 Hewlett-Packard Development Company, L.P. Lockstep error signaling
GB0411054D0 (en) * 2004-05-18 2004-06-23 Ricardo Uk Ltd Fault tolerant data processing
US7392426B2 (en) * 2004-06-15 2008-06-24 Honeywell International Inc. Redundant processing architecture for single fault tolerance
US7243212B1 (en) * 2004-08-06 2007-07-10 Xilinx, Inc. Processor-controller interface for non-lock step operation
US7590822B1 (en) 2004-08-06 2009-09-15 Xilinx, Inc. Tracking an instruction through a processor pipeline
US7346759B1 (en) 2004-08-06 2008-03-18 Xilinx, Inc. Decoder interface
US7590823B1 (en) 2004-08-06 2009-09-15 Xilinx, Inc. Method and system for handling an instruction not supported in a coprocessor formed using configurable logic
US7546441B1 (en) 2004-08-06 2009-06-09 Xilinx, Inc. Coprocessor interface controller
US7404105B2 (en) 2004-08-16 2008-07-22 International Business Machines Corporation High availability multi-processor system
TW200609721A (en) * 2004-09-03 2006-03-16 Inventec Corp Redundancy control system and method thereof
US20060080514A1 (en) * 2004-10-08 2006-04-13 International Business Machines Corporation Managing shared memory
JP4182948B2 (en) * 2004-12-21 2008-11-19 日本電気株式会社 Fault tolerant computer system and interrupt control method therefor
US7778812B2 (en) * 2005-01-07 2010-08-17 Micron Technology, Inc. Selecting data to verify in hardware device model simulation test generation
US7418541B2 (en) * 2005-02-10 2008-08-26 International Business Machines Corporation Method for indirect access to a support interface for memory-mapped resources to reduce system connectivity from out-of-band support processor
US7467204B2 (en) * 2005-02-10 2008-12-16 International Business Machines Corporation Method for providing low-level hardware access to in-band and out-of-band firmware
EP1715589A1 (en) * 2005-03-02 2006-10-25 STMicroelectronics N.V. LDPC decoder in particular for DVB-S2 LDPC code decoding
US20060222125A1 (en) * 2005-03-31 2006-10-05 Edwards John W Jr Systems and methods for maintaining synchronicity during signal transmission
US20060222126A1 (en) * 2005-03-31 2006-10-05 Stratus Technologies Bermuda Ltd. Systems and methods for maintaining synchronicity during signal transmission
US20060236168A1 (en) * 2005-04-01 2006-10-19 Honeywell International Inc. System and method for dynamically optimizing performance and reliability of redundant processing systems
US7549082B2 (en) * 2005-04-28 2009-06-16 Hewlett-Packard Development Company, L.P. Method and system of bringing processors to the same computational point
US8103861B2 (en) * 2005-04-28 2012-01-24 Hewlett-Packard Development Company, L.P. Method and system for presenting an interrupt request to processors executing in lock step
US7730350B2 (en) * 2005-04-28 2010-06-01 Hewlett-Packard Development Company, L.P. Method and system of determining the execution point of programs executed in lock step
US7549029B2 (en) * 2005-05-06 2009-06-16 International Business Machines Corporation Methods for creating hierarchical copies
DE102005038567A1 (en) 2005-08-12 2007-02-15 Micronas Gmbh Multi-processor architecture and method for controlling memory access in a multi-process architecture
US8074059B2 (en) 2005-09-02 2011-12-06 Binl ATE, LLC System and method for performing deterministic processing
US7519754B2 (en) * 2005-12-28 2009-04-14 Silicon Storage Technology, Inc. Hard disk drive cache memory and playback device
US20070147115A1 (en) * 2005-12-28 2007-06-28 Fong-Long Lin Unified memory and controller
JP4816911B2 (en) * 2006-02-07 2011-11-16 日本電気株式会社 Memory synchronization method and refresh control circuit
US7860857B2 (en) 2006-03-30 2010-12-28 Invensys Systems, Inc. Digital data processing apparatus and methods for improving plant performance
FR2901379B1 (en) * 2006-05-19 2008-06-27 Airbus France Sas METHOD AND DEVICE FOR SOFTWARE SYNCHRONIZATION CONSOLIDATION IN FLIGHT CONTROL COMPUTERS
US8074109B1 (en) * 2006-11-14 2011-12-06 Unisys Corporation Third-party voting to select a master processor within a multi-processor computer
US8006029B2 (en) * 2006-11-30 2011-08-23 Intel Corporation DDR flash implementation with direct register access to legacy flash functions
US20080189495A1 (en) * 2007-02-02 2008-08-07 Mcbrearty Gerald Francis Method for reestablishing hotness of pages
ATE537502T1 (en) * 2007-03-29 2011-12-15 Fujitsu Ltd INFORMATION PROCESSING APPARATUS AND ERROR PROCESSING METHOD
US7472038B2 (en) * 2007-04-16 2008-12-30 International Business Machines Corporation Method of predicting microprocessor lifetime reliability using architecture-level structure-aware techniques
US7743285B1 (en) * 2007-04-17 2010-06-22 Hewlett-Packard Development Company, L.P. Chip multiprocessor with configurable fault isolation
US8375188B1 (en) * 2007-08-08 2013-02-12 Symantec Corporation Techniques for epoch pipelining
US20090049323A1 (en) * 2007-08-14 2009-02-19 Imark Robert R Synchronization of processors in a multiprocessor system
JP2009087028A (en) * 2007-09-28 2009-04-23 Toshiba Corp Memory system, memory read method, and program
JP5148236B2 (en) 2007-10-01 2013-02-20 ルネサスエレクトロニクス株式会社 Semiconductor integrated circuit and method for controlling semiconductor integrated circuit
GB2454865B (en) * 2007-11-05 2012-06-13 Picochip Designs Ltd Power control
US20090133022A1 (en) * 2007-11-15 2009-05-21 Karim Faraydon O Multiprocessing apparatus, system and method
US7809980B2 (en) * 2007-12-06 2010-10-05 Jehoda Refaeli Error detector in a cache memory using configurable way redundancy
FR2925191B1 (en) * 2007-12-14 2010-03-05 Thales Sa HIGH-INTEGRITY DIGITAL PROCESSING ARCHITECTURE WITH MULTIPLE SUPERVISED RESOURCES
US8243614B2 (en) * 2008-03-07 2012-08-14 Honeywell International Inc. Hardware efficient monitoring of input/output signals
US7996714B2 (en) * 2008-04-14 2011-08-09 Charles Stark Draper Laboratory, Inc. Systems and methods for redundancy management in fault tolerant computing
US8621154B1 (en) 2008-04-18 2013-12-31 Netapp, Inc. Flow based reply cache
US8161236B1 (en) 2008-04-23 2012-04-17 Netapp, Inc. Persistent reply cache integrated with file system
US8386664B2 (en) * 2008-05-22 2013-02-26 International Business Machines Corporation Reducing runtime coherency checking with global data flow analysis
US8281295B2 (en) * 2008-05-23 2012-10-02 International Business Machines Corporation Computer analysis and runtime coherency checking
CN102124432B (en) 2008-06-20 2014-11-26 因文西斯系统公司 Systems and methods for immersive interaction with actual and/or simulated facilities for process, environmental and industrial control
US8285670B2 (en) 2008-07-22 2012-10-09 International Business Machines Corporation Dynamically maintaining coherency within live ranges of direct buffers
WO2010016169A1 (en) * 2008-08-07 2010-02-11 日本電気株式会社 Multiprocessor system and method for controlling the same
TW201015321A (en) * 2008-09-25 2010-04-16 Panasonic Corp Buffer memory device, memory system and data transfer method
US8762621B2 (en) * 2008-10-28 2014-06-24 Micron Technology, Inc. Logical unit operation
JP5439808B2 (en) * 2008-12-25 2014-03-12 富士通セミコンダクター株式会社 System LSI with multiple buses
GB2466661B (en) * 2009-01-05 2014-11-26 Intel Corp Rake receiver
DE102009007215A1 (en) * 2009-02-03 2010-08-05 Siemens Aktiengesellschaft Automation system with a programmable matrix module
JP2010198131A (en) * 2009-02-23 2010-09-09 Renesas Electronics Corp Processor system and operation mode switching method for processor system
US8171227B1 (en) 2009-03-11 2012-05-01 Netapp, Inc. System and method for managing a flow based reply cache
GB2470037B (en) 2009-05-07 2013-07-10 Picochip Designs Ltd Methods and devices for reducing interference in an uplink
US8127060B2 (en) 2009-05-29 2012-02-28 Invensys Systems, Inc Methods and apparatus for control configuration with control objects that are fieldbus protocol-aware
US8463964B2 (en) 2009-05-29 2013-06-11 Invensys Systems, Inc. Methods and apparatus for control configuration with enhanced change-tracking
GB2470891B (en) 2009-06-05 2013-11-27 Picochip Designs Ltd A method and device in a communication network
GB2470771B (en) 2009-06-05 2012-07-18 Picochip Designs Ltd A method and device in a communication network
US8966195B2 (en) * 2009-06-26 2015-02-24 Hewlett-Packard Development Company, L.P. Direct memory access and super page swapping optimizations for a memory blade
JP5676950B2 (en) * 2009-08-20 2015-02-25 キヤノン株式会社 Image forming apparatus
GB2474071B (en) 2009-10-05 2013-08-07 Picochip Designs Ltd Femtocell base station
US8473818B2 (en) * 2009-10-12 2013-06-25 Empire Technology Development Llc Reliable communications in on-chip networks
DE102009050161A1 (en) * 2009-10-21 2011-04-28 Siemens Aktiengesellschaft A method and apparatus for testing a system having at least a plurality of parallel executable software units
US8516356B2 (en) 2010-07-20 2013-08-20 Infineon Technologies Ag Real-time error detection by inverse processing
GB2482869B (en) 2010-08-16 2013-11-06 Picochip Designs Ltd Femtocell access control
GB2484927A (en) * 2010-10-26 2012-05-02 Advanced Risc Mach Ltd Provision of access control data within a data processing system
US8443230B1 (en) * 2010-12-15 2013-05-14 Xilinx, Inc. Methods and systems with transaction-level lockstep
CN102621938A (en) * 2011-01-28 2012-08-01 上海新华控制技术(集团)有限公司 Triple redundancy control system in process control and method thereof
US8972696B2 (en) 2011-03-07 2015-03-03 Microsoft Technology Licensing, Llc Pagefile reservations
GB2489919B (en) 2011-04-05 2018-02-14 Intel Corp Filter
GB2489716B (en) 2011-04-05 2015-06-24 Intel Corp Multimode base system
GB2491098B (en) 2011-05-16 2015-05-20 Intel Corp Accessing a base station
JP5699057B2 (en) * 2011-08-24 2015-04-08 株式会社日立製作所 Programmable device, programmable device reconfiguration method, and electronic device
US8924780B2 (en) * 2011-11-10 2014-12-30 Ge Aviation Systems Llc Method of providing high integrity processing
US8832411B2 (en) 2011-12-14 2014-09-09 Microsoft Corporation Working set swapping using a sequentially ordered swap file
KR101947726B1 (en) * 2012-03-08 2019-02-13 삼성전자주식회사 Image processing apparatus and Method for processing image thereof
JP5850774B2 (en) * 2012-03-22 2016-02-03 ルネサスエレクトロニクス株式会社 Semiconductor integrated circuit device and system using the same
US9378098B2 (en) 2012-06-06 2016-06-28 Qualcomm Incorporated Methods and systems for redundant data storage in a register
JP6111605B2 (en) * 2012-11-08 2017-04-12 日本電気株式会社 Computer system, computer system diagnostic method and diagnostic program
US9569223B2 (en) * 2013-02-13 2017-02-14 Red Hat Israel, Ltd. Mixed shared/non-shared memory transport for virtual machines
US10102148B2 (en) 2013-06-13 2018-10-16 Microsoft Technology Licensing, Llc Page-based compressed storage management
CN107219999B (en) * 2013-08-31 2020-06-26 华为技术有限公司 Data migration method of memory module in server and server
KR102116984B1 (en) * 2014-03-11 2020-05-29 삼성전자 주식회사 Method for controlling memory swap operation and data processing system adopting the same
US9684625B2 (en) 2014-03-21 2017-06-20 Microsoft Technology Licensing, Llc Asynchronously prefetching sharable memory pages
JP6341795B2 (en) * 2014-08-05 2018-06-13 ルネサスエレクトロニクス株式会社 Microcomputer and microcomputer system
US9632924B2 (en) 2015-03-02 2017-04-25 Microsoft Technology Licensing, Llc Using memory compression to reduce memory commit charge
US10037270B2 (en) 2015-04-14 2018-07-31 Microsoft Technology Licensing, Llc Reducing memory commit charge when compressing memory
WO2016187232A1 (en) 2015-05-21 2016-11-24 Goldman, Sachs & Co. General-purpose parallel computing architecture
US11449452B2 (en) 2015-05-21 2022-09-20 Goldman Sachs & Co. LLC General-purpose parallel computing architecture
JP2017146897A (en) * 2016-02-19 2017-08-24 株式会社デンソー Microcontroller and electronic control unit
US9595312B1 (en) 2016-03-31 2017-03-14 Altera Corporation Adaptive refresh scheduling for memory
EP3607454A4 (en) * 2017-04-06 2021-03-31 Goldman Sachs & Co. LLC General-purpose parallel computing architecture
US10802932B2 (en) 2017-12-04 2020-10-13 Nxp Usa, Inc. Data processing system having lockstep operation
CN108804109B (en) * 2018-06-07 2021-11-05 北京四方继保自动化股份有限公司 Industrial deployment and control method based on multi-path functional equivalent module redundancy arbitration
US11099748B1 (en) * 2018-08-08 2021-08-24 United States Of America As Represented By The Administrator Of Nasa Fault tolerant memory card
US10824573B1 (en) 2019-04-19 2020-11-03 Micron Technology, Inc. Refresh and access modes for memory
US11609845B2 (en) * 2019-05-28 2023-03-21 Oracle International Corporation Configurable memory device connected to a microprocessor
US20230267043A1 (en) * 2022-02-23 2023-08-24 Micron Technology, Inc. Parity-based error management for a processing system
CN114610472B (en) * 2022-05-09 2022-12-02 上海登临科技有限公司 Multi-process management method in heterogeneous computing and computing equipment

Family Cites Families (135)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR1587572A (en) * 1968-10-25 1970-03-20
GB1253309A (en) * 1969-11-21 1971-11-10 Marconi Co Ltd Improvements in or relating to data processing arrangements
GB1308497A (en) * 1970-09-25 1973-02-21 Marconi Co Ltd Data processing arrangements
US3864670A (en) * 1970-09-30 1975-02-04 Yokogawa Electric Works Ltd Dual computer system with signal exchange system
SE347826B (en) * 1970-11-20 1972-08-14 Ericsson Telefon Ab L M
US3810119A (en) * 1971-05-04 1974-05-07 Us Navy Processor synchronization scheme
BE790654A (en) * 1971-10-28 1973-04-27 Siemens Ag PROCESSING SYSTEM WITH SYSTEM UNITS
US3760365A (en) * 1971-12-30 1973-09-18 Ibm Multiprocessing computing system with task assignment at the instruction level
DE2202231A1 (en) * 1972-01-18 1973-07-26 Siemens Ag PROCESSING SYSTEM WITH TRIPLE SYSTEM UNITS
US3783250A (en) * 1972-02-25 1974-01-01 Nasa Adaptive voting computer system
US3828321A (en) * 1973-03-15 1974-08-06 Gte Automatic Electric Lab Inc System for reconfiguring central processor and instruction storage combinations
CH556576A (en) * 1973-03-28 1974-11-29 Hasler Ag DEVICE FOR SYNCHRONIZATION OF THREE COMPUTERS.
JPS5024046A (en) * 1973-07-04 1975-03-14
US4099241A (en) * 1973-10-30 1978-07-04 Telefonaktiebolaget L M Ericsson Apparatus for facilitating a cooperation between an executive computer and a reserve computer
FR2253432A5 (en) * 1973-11-30 1975-06-27 Honeywell Bull Soc Ind
FR2253423A5 (en) * 1973-11-30 1975-06-27 Honeywell Bull Soc Ind
IT1014277B (en) * 1974-06-03 1977-04-20 Cselt Centro Studi Lab Telecom CONTROL SYSTEM OF PROCESS COMPUTERS OPERATING IN PARALLEL
FR2285458A1 (en) * 1974-09-20 1976-04-16 Creusot Loire HYDROCARBON RETENTION DEVICE IN A CONVERTER
US4015246A (en) * 1975-04-14 1977-03-29 The Charles Stark Draper Laboratory, Inc. Synchronous fault tolerant multi-processor system
US4015243A (en) * 1975-06-02 1977-03-29 Kurpanek Horst G Multi-processing computer system
US4034347A (en) * 1975-08-08 1977-07-05 Bell Telephone Laboratories, Incorporated Method and apparatus for controlling a multiprocessor system
JPS5260540A (en) * 1975-11-14 1977-05-19 Hitachi Ltd Synchronization control of double-type system
US4224664A (en) * 1976-05-07 1980-09-23 Honeywell Information Systems Inc. Apparatus for detecting when the activity of one process in relation to a common piece of information interferes with any other process in a multiprogramming/multiprocessing computer system
US4228496A (en) 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4456952A (en) * 1977-03-17 1984-06-26 Honeywell Information Systems Inc. Data processing system having redundant control processors for fault detection
JPS53116040A (en) * 1977-03-18 1978-10-11 Nec Corp System controller
US4358823A (en) * 1977-03-25 1982-11-09 Trw, Inc. Double redundant processor
US4101960A (en) * 1977-03-29 1978-07-18 Burroughs Corporation Scientific processor
US4187538A (en) * 1977-06-13 1980-02-05 Honeywell Inc. Read request selection system for redundant storage
GB1545169A (en) * 1977-09-22 1979-05-02 Burroughs Corp Data processor system including data-save controller for protection against loss of volatile memory information during power failure
IT1111606B (en) * 1978-03-03 1986-01-13 Cselt Centro Studi Lab Telecom MULTI-CONFIGURABLE MODULAR PROCESSING SYSTEM INTEGRATED WITH A PRE-PROCESSING SYSTEM
US4270168A (en) * 1978-08-31 1981-05-26 United Technologies Corporation Selective disablement in fail-operational, fail-safe multi-computer control system
US4234920A (en) * 1978-11-24 1980-11-18 Engineered Systems, Inc. Power failure detection and restart system
US4257097A (en) * 1978-12-11 1981-03-17 Bell Telephone Laboratories, Incorporated Multiprocessor system with demand assignable program paging stores
US4253144A (en) * 1978-12-21 1981-02-24 Burroughs Corporation Multi-processor communication network
US4380046A (en) * 1979-05-21 1983-04-12 Nasa Massively parallel processor computer
US4449183A (en) * 1979-07-09 1984-05-15 Digital Equipment Corporation Arbitration scheme for a multiported shared functional device for use in multiprocessing systems
US4428044A (en) * 1979-09-20 1984-01-24 Bell Telephone Laboratories, Incorporated Peripheral unit controller
DE2939487A1 (en) * 1979-09-28 1981-04-16 Siemens AG, 1000 Berlin und 8000 München COMPUTER ARCHITECTURE BASED ON A MULTI-MICROCOMPUTER STRUCTURE AS A FAULT-TOLERANT SYSTEM
US4315310A (en) * 1979-09-28 1982-02-09 Intel Corporation Input/output data processing system
NL7909178A (en) * 1979-12-20 1981-07-16 Philips Nv COMPUTER WITH DISTRIBUTED REDUNDANCY SPREAD OVER DIFFERENT FAULT-ISOLATION AREAS.
FR2474201B1 (en) * 1980-01-22 1986-05-16 Bull Sa METHOD AND DEVICE FOR MANAGING CONFLICTS CAUSED BY MULTIPLE ACCESSES TO THE SAME CACHE OF A DIGITAL INFORMATION PROCESSING SYSTEM COMPRISING AT LEAST TWO PROCESSORS EACH HAVING A CACHE
US4356546A (en) * 1980-02-05 1982-10-26 The Bendix Corporation Fault-tolerant multi-computer system
JPS56119596A (en) * 1980-02-26 1981-09-19 Nec Corp Control signal generator
US4351023A (en) * 1980-04-11 1982-09-21 The Foxboro Company Process control system with improved system security features
US4493019A (en) * 1980-05-06 1985-01-08 Burroughs Corporation Pipelined microprogrammed digital data processor employing microinstruction tasking
JPS573148A (en) * 1980-06-06 1982-01-08 Hitachi Ltd Diagnostic system for other system
US4412281A (en) * 1980-07-11 1983-10-25 Raytheon Company Distributed signal processing system
US4369510A (en) * 1980-07-25 1983-01-18 Honeywell Information Systems Inc. Soft error rewrite control system
US4392196A (en) * 1980-08-11 1983-07-05 Harris Corporation Multi-processor time alignment control system
US4399504A (en) * 1980-10-06 1983-08-16 International Business Machines Corporation Method and means for the sharing of data resources in a multiprocessing, multiprogramming environment
US4375683A (en) * 1980-11-12 1983-03-01 August Systems Fault tolerant computational system and voter circuit
US4371754A (en) * 1980-11-19 1983-02-01 Rockwell International Corporation Automatic fault recovery system for a multiple processor telecommunications switching control
US4414624A (en) * 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
JPH0614328B2 (en) * 1980-11-21 1994-02-23 沖電気工業株式会社 Common memory access method
US4430707A (en) * 1981-03-05 1984-02-07 Burroughs Corporation Microprogrammed digital data processing system employing multi-phase subroutine control for concurrently executing tasks
US4455605A (en) * 1981-07-23 1984-06-19 International Business Machines Corporation Method for establishing variable path group associations and affiliations between "non-static" MP systems and shared devices
US4556952A (en) * 1981-08-12 1985-12-03 International Business Machines Corporation Refresh circuit for dynamic memory of a data processor employing a direct memory access controller
US4438494A (en) * 1981-08-25 1984-03-20 Intel Corporation Apparatus of fault-handling in a multiprocessing system
US4486826A (en) * 1981-10-01 1984-12-04 Stratus Computer, Inc. Computer peripheral control apparatus
US4920540A (en) * 1987-02-25 1990-04-24 Stratus Computer, Inc. Fault-tolerant digital timing apparatus and method
US4597084A (en) * 1981-10-01 1986-06-24 Stratus Computer, Inc. Computer memory apparatus
IN160140B (en) * 1981-10-10 1987-06-27 Westinghouse Brake & Signal
DE3208573C2 (en) * 1982-03-10 1985-06-27 Standard Elektrik Lorenz Ag, 7000 Stuttgart 2 out of 3 selection device for a 3 computer system
US4497059A (en) * 1982-04-28 1985-01-29 The Charles Stark Draper Laboratory, Inc. Multi-channel redundant processing systems
DE3216238C1 (en) * 1982-04-30 1983-11-03 Siemens AG, 1000 Berlin und 8000 München Data processing system with virtual subaddressing of the buffer memory
JPS5914062A (en) * 1982-07-15 1984-01-24 Hitachi Ltd Method for controlling duplicated shared memory
DE3235762A1 (en) * 1982-09-28 1984-03-29 Fried. Krupp Gmbh, 4300 Essen METHOD AND DEVICE FOR SYNCHRONIZING DATA PROCESSING SYSTEMS
NL8203921A (en) * 1982-10-11 1984-05-01 Philips Nv MULTIPLE REDUNDANT CLOCK SYSTEM, CONTAINING A NUMBER OF SYNCHRONIZING CLOCKS, AND CLOCK CIRCUIT FOR USE IN SUCH A CLOCK SYSTEM.
US4667287A (en) * 1982-10-28 1987-05-19 Tandem Computers Incorporated Multiprocessor multisystem communications network
US4473452A (en) * 1982-11-18 1984-09-25 The Trustees Of Columbia University In The City Of New York Electrophoresis using alternating transverse electric fields
US4590554A (en) * 1982-11-23 1986-05-20 Parallel Computers Systems, Inc. Backup fault tolerant computer system
US4648035A (en) * 1982-12-06 1987-03-03 Digital Equipment Corporation Address conversion unit for multiprocessor system
US4541094A (en) * 1983-03-21 1985-09-10 Sequoia Systems, Inc. Self-checking computer circuitry
US4591977A (en) * 1983-03-23 1986-05-27 The United States Of America As Represented By The Secretary Of The Air Force Plurality of processors where access to the common memory requires only a single clock interval
US4644498A (en) * 1983-04-04 1987-02-17 General Electric Company Fault-tolerant real time clock
US4661900A (en) * 1983-04-25 1987-04-28 Cray Research, Inc. Flexible chaining in vector processor with selective use of vector registers as operand and result registers
US4577272A (en) * 1983-06-27 1986-03-18 E-Systems, Inc. Fault tolerant and load sharing processing system
US4646231A (en) * 1983-07-21 1987-02-24 Burroughs Corporation Method of synchronizing the sequence by which a variety of randomly called unrelated activities are executed in a digital processor
JPS6054052A (en) * 1983-09-02 1985-03-28 Nec Corp Processing continuing system
US4912698A (en) * 1983-09-26 1990-03-27 Siemens Aktiengesellschaft Multi-processor central control unit of a telephone exchange system and its operation
DE3334796A1 (en) * 1983-09-26 1984-11-08 Siemens AG, 1000 Berlin und 8000 München METHOD FOR OPERATING A MULTIPROCESSOR CONTROLLER, ESPECIALLY FOR THE CENTRAL CONTROL UNIT OF A TELECOMMUNICATION SWITCHING SYSTEM
US4564903A (en) * 1983-10-05 1986-01-14 International Business Machines Corporation Partitioned multiprocessor programming system
US4631701A (en) * 1983-10-31 1986-12-23 Ncr Corporation Dynamic random access memory refresh control system
US4783733A (en) * 1983-11-14 1988-11-08 Tandem Computers Incorporated Fault tolerant communications controller system
US4607365A (en) * 1983-11-14 1986-08-19 Tandem Computers Incorporated Fault-tolerant communications controller system
US4570261A (en) * 1983-12-09 1986-02-11 Motorola, Inc. Distributed fault isolation and recovery system and method
WO1985002698A1 (en) * 1983-12-12 1985-06-20 Parallel Computers, Inc. Computer processor controller
US4608688A (en) * 1983-12-27 1986-08-26 At&T Bell Laboratories Processing system tolerant of loss of access to secondary storage
US4622631B1 (en) * 1983-12-30 1996-04-09 Recognition Int Inc Data processing system having a data coherence solution
US4625296A (en) * 1984-01-17 1986-11-25 The Perkin-Elmer Corporation Memory refresh circuit with varying system transparency
DE3509900A1 (en) * 1984-03-19 1985-10-17 Konishiroku Photo Industry Co., Ltd., Tokio/Tokyo METHOD AND DEVICE FOR PRODUCING A COLOR IMAGE
US4638427A (en) * 1984-04-16 1987-01-20 International Business Machines Corporation Performance evaluation for an asymmetric multiprocessor system
US4633394A (en) * 1984-04-24 1986-12-30 International Business Machines Corp. Distributed arbitration for multiple processors
US4589066A (en) * 1984-05-31 1986-05-13 General Electric Company Fault tolerant, frame synchronization for multiple processor systems
US4823256A (en) * 1984-06-22 1989-04-18 American Telephone And Telegraph Company, At&T Bell Laboratories Reconfigurable dual processor system
US4959774A (en) * 1984-07-06 1990-09-25 Ampex Corporation Shadow memory system for storing variable backup blocks in consecutive time periods
JPS6184740A (en) * 1984-10-03 1986-04-30 Hitachi Ltd Generating system of general-use object code
US4827401A (en) * 1984-10-24 1989-05-02 International Business Machines Corporation Method and apparatus for synchronizing clocks prior to the execution of a flush operation
AU568977B2 (en) * 1985-05-10 1988-01-14 Tandem Computers Inc. Dual processor error detection system
JPS61265660A (en) * 1985-05-20 1986-11-25 Toshiba Corp Execution mode switching control system in multiprocessor system
US4757442A (en) * 1985-06-17 1988-07-12 NEC Corporation Re-synchronization system using common memory bus to transfer restart data from non-faulty processor to failed processor
US4751639A (en) * 1985-06-24 1988-06-14 NCR Corporation Virtual command rollback in a fault tolerant data processing system
US4683570A (en) * 1985-09-03 1987-07-28 General Electric Company Self-checking digital fault detector for modular redundant real time clock
US4845419A (en) * 1985-11-12 1989-07-04 Norand Corporation Automatic control means providing a low-power responsive signal, particularly for initiating data preservation operation
JPS62135940A (en) * 1985-12-09 1987-06-18 NEC Corp Stall detection system
US4733353A (en) 1985-12-13 1988-03-22 General Electric Company Frame synchronization of multiply redundant computers
JPH0778750B2 (en) * 1985-12-24 1995-08-23 NEC Corporation Highly reliable computer system
US4703452A (en) * 1986-01-03 1987-10-27 Gte Communication Systems Corporation Interrupt synchronizing circuit
US4773038A (en) * 1986-02-24 1988-09-20 Thinking Machines Corporation Method of simulating additional processors in a SIMD parallel processor array
US4799140A (en) * 1986-03-06 1989-01-17 Orbital Sciences Corporation II Majority vote sequencer
US4757505A (en) * 1986-04-30 1988-07-12 Elgar Electronics Corp. Computer power system
US4868832A (en) * 1986-04-30 1989-09-19 Marrington S Paul Computer power system
US4763333A (en) * 1986-08-08 1988-08-09 Universal Vectors Corporation Work-saving system for preventing loss in a computer due to power interruption
US4819159A (en) * 1986-08-29 1989-04-04 Tolerant Systems, Inc. Distributed multiprocess transaction processing system and method
IT1213344B (en) * 1986-09-17 1989-12-20 Honeywell Information Systems Fault-tolerant computer architecture
US4774709A (en) * 1986-10-02 1988-09-27 United Technologies Corporation Symmetrization for redundant channels
GB2211638A (en) * 1987-10-27 1989-07-05 IBM SIMD array processor
US4847837A (en) * 1986-11-07 1989-07-11 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Local area network with fault-checking, priorities and redundant backup
JPS63165950A (en) * 1986-12-27 1988-07-09 PFU Ltd Common memory system
US4831520A (en) * 1987-02-24 1989-05-16 Digital Equipment Corporation Bus interface circuit for digital data processor
US4967353A (en) 1987-02-25 1990-10-30 International Business Machines Corporation System for periodically reallocating page frames in memory based upon non-usage within a time period or after being allocated
US4914657A (en) 1987-04-15 1990-04-03 Allied-Signal Inc. Operations controller for a fault tolerant multiple node processing system
CH675781A5 (en) * 1987-04-16 1990-10-31 BBC Brown Boveri & Cie
US4800462A (en) * 1987-04-17 1989-01-24 Tandem Computers Incorporated Electrical keying for replaceable modules
US4868826A (en) * 1987-08-31 1989-09-19 Triplex Fault-tolerant output circuits
US4868818A (en) * 1987-10-29 1989-09-19 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Fault tolerant hypercube computer system architecture
AU616213B2 (en) * 1987-11-09 1991-10-24 Tandem Computers Incorporated Method and apparatus for synchronizing a plurality of processors
US4879716A (en) * 1987-12-23 1989-11-07 Bull HN Information Systems Inc. Resilient data communications system
US4907232A (en) * 1988-04-28 1990-03-06 The Charles Stark Draper Laboratory, Inc. Fault-tolerant parallel processing system
US4937741A (en) * 1988-04-28 1990-06-26 The Charles Stark Draper Laboratory, Inc. Synchronization of fault-tolerant parallel processing systems
US4873685A (en) * 1988-05-04 1989-10-10 Rockwell International Corporation Self-checking voting logic for fault tolerant computing applications
US4965717A (en) * 1988-12-09 1990-10-23 Tandem Computers Incorporated Multiple processor system having shared memory with private-write capability
US5018148A (en) 1989-03-01 1991-05-21 NCR Corporation Method and apparatus for power failure protection
US5020059A (en) 1989-03-31 1991-05-28 AT&T Bell Laboratories Reconfigurable signal processor

Also Published As

Publication number Publication date
CA2003337A1 (en) 1990-06-09
EP0447578A1 (en) 1991-09-25
JPH079625B2 (en) 1995-02-01
US5146589A (en) 1992-09-08
EP0372579B1 (en) 1997-10-01
US4965717A (en) 1990-10-23
EP0372578A3 (en) 1992-01-15
US4965717B1 (en) 1993-05-25
EP0681239A2 (en) 1995-11-08
US5193175A (en) 1993-03-09
US5588111A (en) 1996-12-24
DE68928360T2 (en) 1998-05-07
EP0681239A3 (en) 1996-01-24
JPH0713789A (en) 1995-01-17
ATE158879T1 (en) 1997-10-15
EP0447577A1 (en) 1991-09-25
EP0372579A3 (en) 1991-07-24
EP0372578A2 (en) 1990-06-13
EP0372579A2 (en) 1990-06-13
US5758113A (en) 1998-05-26
US5388242A (en) 1995-02-07
AU628497B2 (en) 1992-09-17
DE68928360D1 (en) 1997-11-06
AU5202790A (en) 1991-09-26
JPH02202636A (en) 1990-08-10
JPH02202637A (en) 1990-08-10
US5276823A (en) 1994-01-04

Similar Documents

Publication Publication Date Title
CA2003342A1 (en) Memory management in high-performance fault-tolerant computer system
US5890003A (en) Interrupts between asynchronously operating CPUs in fault tolerant computer system
US5384906A (en) Method and apparatus for synchronizing a plurality of processors
US5317726A (en) Multiple-processor computer system with asynchronous execution of identical code streams
US5327553A (en) Fault-tolerant computer system with /CONFIG filesystem
US5317752A (en) Fault-tolerant computer system with auto-restart after power-fail
US5295258A (en) Fault-tolerant computer system with online recovery and reintegration of redundant components
KR970004514B1 (en) Fault tolerant multiprocessor computer system
CA1299756C (en) Dual rail processors with error checking at single rail interfaces
EP0433979A2 (en) Fault-tolerant computer system with /CONFIG filesystem
US6216236B1 (en) Processing unit for a computer and a computer system incorporating such a processing unit
US5434997A (en) Method and apparatus for testing and debugging a tightly coupled mirrored processing system
JPH07219913A (en) Method for controlling multiprocessor system and device therefor
WO1994008293A9 (en) Method and apparatus for operating tightly coupled mirrored processors
EP0683456B1 (en) Fault-tolerant computer system with online reintegration and shutdown/restart
Tamir et al. The UCLA mirror processor: A building block for self-checking self-repairing computing nodes
KR19990057809A (en) Error prevention system
Tamir Self-checking self-repairing computer nodes using the Mirror Processor
CN112506701A (en) Multiprocessor chip error recovery method based on three-mode lockstep

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued