MESI coherency systems, only the owner of a cache line knows that he owns it. Thus, if the owner dies or his connection to the system fails, the correct state of memory is totally unknown. The data structures used by inclusive systems to track inclusion (address tags for caches, and directory entries for directories) contain redundant information about the ownership of lines and, in some cases, up-to-date copies of modified data. The present invention is sometimes capable of providing enough data about the state of memory to allow applications to recover from address errors, such as parity errors. The information provided may be sufficient to allow a complete recovery, but more often, the information provided will allow the system to avoid corrupt data and run long enough to permit graceful shutdown of mission critical applications.

According to a method of the present invention, an address error is detected on a local channel, such as a local bus. The coherency states of one or more lines of cache memory associated with the local channel are then read, and actions are taken in response. Reading of coherency states ranges from a complete and active interrogation of all cache lines, to a selective and passive interrogation, such as in responding to snoop requests. If the data state consistency is unknown, such as when the MESI state is Modified (M) or Exclusive (E), then the corresponding data in main memory is poisoned. Poisoning may be accomplished by writing a detectable but unrecoverable error pattern in the main memory. Alternatively, the same effect may be accomplished by signaling a hard error on the system bus. If the data state consistency of an interrogated cache line is Shared (S) or Invalid (I), the line may be ignored or the line marked invalid. If the state of the cached line is valid and consistent, such as the "Modified uncached" (Mu) state in a MuMESI protocol, then the line may be written to main memory or provided to a snoop requester.

DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of a multiprocessor computing system according to the present invention.

FIG. 2 is a flowchart of a first method according to the present invention.

FIG. 3 is a flowchart of a second method according to the present invention.

FIG. 4 is a flowchart of a third method according to the present invention.

FIG. 5 is a block diagram of modules according to the present invention.

FIG. 1 is a block diagram of a multiprocessor computing system 100. A system bus 105 interconnects major system elements. Although the system bus 105 is illustrated as a bus, a bus is exemplary of any channel (e.g., ring) interconnecting major system elements. Attached to the system bus 105 is a main memory 110. One or more input/output (I/O) devices 115 may also be attached to the system bus 105. Exemplary I/O devices include disk drives, graphics devices and network interfaces. Also attached to the system bus 105 are caches 120 and 125. Although two caches are illustrated, any supportable number of caches is possible. Attached to the caches 120 and 125 are local buses 130 and 135, respectively. As with the system bus 105, the bus structure shown for local buses 130 and 135 is illustrative of any type of channel. Attached to the local bus 130 are processors 140 and 145. Attached to the local bus 135 are processors 150 and 155. Although two processors are illustrated on each local bus, the number of processors is arbitrary within system limits, including only a single processor.

FIGS. 2 and 3 are flowcharts of an address error recovery method for the multiprocessor computing system 100. The flowcharts of FIGS. 2 and 3 illustrate the method in reference to the following example: Assume that the processor 140 executes a read from a given memory address. The processor 145 sees this read on the local bus 130 and detects that the address is erroneous. The address is truly erroneous and cannot be remedied by retransmission or other protocol. Thus, the cache 120 may be inconsistent.

According to the method 200 illustrated in FIG. 2, a detecting step 205 is first taken to determine that the address on the local bus 130 is erroneous. Typically, this detecting step 205 comprises a parity check over the address field. If an error is detected, the local bus 130 is placed in a failed state, according to the placing step 210, so that the local bus 130 is cut off from the rest of the system 100. The placing step 210 has the effect of quiescing the processors 140 and 145 and assuring that any corrupted data that may exist in the processor 140, the processor 145 or the local bus 130 cannot be transferred to the main memory 110 or another part of the system 100. Next, a notifying step 215 is performed to notify another processor of the error. The notified processor may be another processor, such as the processors 150 or 155, or a separate processor, such as a master processor or a maintenance processor (not shown). It is necessary that the notified processor not be isolated from the cache 120 or the main memory 110, so that the processor can execute a recovery routine. Any method of notification can be utilized to reach this non-isolated processor. An exemplary method of notification is signaling an interrupt line, preferably a high priority interrupt line.

In response to the notifying step 215, the non-isolated processor performs an interrogating step 220 on each line in the cache 120. Based upon a coherency state of a line, an appropriate action is taken so as to minimize the impact of the address error. The interrogated coherency states may be stored in the cache 120 in the case of a snoopy system or in a directory (not illustrated) in a directory based coherency system. In a preferred embodiment, the possible coherency states are based upon the MSI, MESI, MOESI, MuMESI or similar protocols, which are collectively referred to as MESI-based schemes. If the MESI state of a given line is Modified uncached (Mu), if this state is implemented, then a main memory writing step 225 is performed, by which the given cache line is written to the main memory 110. The Mu-MESI protocol augments the basic MESI protocol by utilizing an additional "Modified uncached" (Mu) state. This state basically signifies that a cache line is valid and consistent and is described in greater detail in pending U.S. patent application Ser. No. 09/290,430, entitled "Optimization of MESI/MSI Protocol to Improve L3 Cache Performance" (attorney docket no. HP 10981260-1), which is hereby incorporated by reference. The MOESI protocol is another variation of the MESI protocol and is well known to those skilled in the art.

If the MESI state of the given line is Modified (M), representing that the consistency of this data is unknown, then a poisoning step 230 is performed. The objective of the poisoning step 230 is to ensure that corrupt data will not be used by the system 100 in subsequent computations. According to the poisoning step 230, a detectable but uncorrectable error pattern is written onto the data field corresponding to this line in the main memory 110. Thus,
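By way of illustration only, the per-state handling outlined in the summary (write back Mu lines, poison M and E lines, ignore or invalidate S and I lines) might be sketched in Python as follows. This is a minimal software model under stated assumptions, not the claimed implementation: the names State, CacheLine, POISON and recover are hypothetical and do not appear in the specification, a dictionary stands in for the main memory 110, and a sentinel object stands in for a detectable but uncorrectable error pattern.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """Coherency states of a MuMESI-style protocol (names are illustrative)."""
    MODIFIED_UNCACHED = "Mu"
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# Hypothetical sentinel standing in for a detectable but uncorrectable
# error pattern; real hardware would use something like a bad-ECC encoding.
POISON = object()


@dataclass
class CacheLine:
    address: int
    state: State
    data: bytes = b""


def recover(cache_lines, main_memory):
    """Interrogate each cache line and act on its coherency state.

    main_memory is a dict mapping address -> data (an assumption made
    purely for this sketch).
    """
    for line in cache_lines:
        if line.state is State.MODIFIED_UNCACHED:
            # Valid and consistent: write the line back to main memory.
            main_memory[line.address] = line.data
        elif line.state in (State.MODIFIED, State.EXCLUSIVE):
            # Consistency unknown: poison the corresponding memory data.
            main_memory[line.address] = POISON
        elif line.state is State.SHARED:
            # The memory copy is left alone; this sketch takes the
            # "mark invalid" option rather than ignoring the line.
            line.state = State.INVALID
        # Invalid lines are ignored.
    return main_memory


if __name__ == "__main__":
    memory = {0x10: b"old", 0x20: b"old", 0x30: b"old"}
    lines = [
        CacheLine(0x10, State.MODIFIED_UNCACHED, b"new"),
        CacheLine(0x20, State.MODIFIED, b"suspect"),
        CacheLine(0x30, State.SHARED, b"old"),
    ]
    recover(lines, memory)
    print(memory[0x10], memory[0x20] is POISON, memory[0x30])
```

A consumer that later reads a poisoned location can test for the sentinel and refuse to use the data, which mirrors the stated objective of keeping corrupt data out of subsequent computations.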