WO1994011820A1

WO1994011820A1 - Efficient schemes for constructing reliable computing nodes in distributed systems

Info

Publication number: WO1994011820A1
Application number: PCT/GB1993/002225
Authority: WO
Inventors: Santosh Kumar Shrivastava; Neil Alexander Speirs; Sha Tao; Paul Devadoss Ezhilchelvan; Francisco Vilar Brasileiro
Original assignee: University Of Newcastle Upon Tyne
Priority date: 1992-11-06
Filing date: 1993-10-28
Publication date: 1994-05-26
Also published as: AU5343794A; GB2288045B; GB9509299D0; US5754757A; GB2288045A

Abstract

The invention relates to a computing system, a fail-silent node for use in a computing system, and a method of organising information in such a node. A number of microprocessors in a computing node receive messages from other components in the computing system and process the received messages, so as to transmit the results of this processing to other components in the system. The microprocessors compare the results of their processing, and nothing is sent out from the node unless either all the microprocessors in the node, or more than half of them, produce identical results. This is achieved by manipulating the order in which messages are processed by each microprocessor: each microprocessor in the node receives the same messages and orders them so that they are processed in the same order within every microprocessor, thus ensuring that, if all the microprocessors are functioning correctly, the same results are produced.

Description

Efficient Schemes for Constructing Reliable Computing Nodes in Distributed Systems

The invention relates to a computing node in or for use in a computer processing system and particularly a fail-silent computing node.

It is known that replicating computer processing on different computer microprocessors provides a practical means of constructing computer systems capable of tolerating arbitrary computer processor failures. A computing node is composed of a number of conventional computer processors on which applications are replicated to achieve tolerance to failures. Computing nodes are connected via a network.

Typically, individual hardware components do not inherently fail by becoming silent; rather, their output is corrupted. For some devices, simple models of their correct behaviour exist and can thus be used as a checking means, eg, memory devices should output exactly the data that was originally input to them. In these cases, faults can easily be identified by adding redundant information to the data, eg, parity bits, which can be checked when the data is output. However, for complex devices such as microprocessors, for which there is no simple correlation between their inputs and subsequent outputs, the easiest way of adding redundancy for error detection is to duplicate the device and compare the outputs of the two devices.

In typical existing implementations of a fail-silent node, two or more (duplicated) microprocessors are closely coupled and run in micro-synchronisation. Each microprocessor is initialised to an identical state and then performs identical actions on identical data on each tick of the system clock. Hence on every clock cycle the data output by each microprocessor is identical. The principles underlying this node architecture can be explained with reference to Figure 1, which is a diagrammatic representation of a conventional fail-silent node. Since the data streams to be compared are in exact lock-step, a simple hardware comparator (cmp) can be used to check that the data streams are identical and to prevent any output once a discrepancy is detected. Although two replicas are actually running, because they are micro-synchronised and compared by the dedicated hardware comparator, the application is unaware of the replication and of the comparisons undertaken. When this fail-silent technique is used, the only erroneous messages that can be sent over the network are incomplete correct messages, since a fault occurring during the transmission of a message stops transmission within one clock tick. Such incomplete messages are easily identified by the receiver, since they contravene the lowest levels of network protocols.

Fail-silent nodes have been used widely, for example, in commercial transaction processing systems. Such nodes have been designed with the assistance of specialised comparator hardware and clock circuits. A common (reliable) clock source drives a pair of processors that execute in lock-step, with the outputs compared by a (reliable) comparator; no output is produced once a disagreement is detected by the comparator. Note that since only two microprocessors are used within a node to check on each other, the fail-silent characteristics of a node can be guaranteed only if no more than one microprocessor within the node is faulty.

Intuitively, fail-silent behaviour ought to mean that a node never generates an erroneous output, ie, the node either generates correct outputs or remains silent. However, this is impossible to implement in practice, since output messages take a finite time to transmit and a fault may occur during the transmission of a message. A definition of fail-silence must therefore include the case where a message receiver rejects such erroneous messages. Thus a two-microprocessor node will be said to exhibit fail-silent behaviour in the following sense: the outputs produced by it (if any) are either valid messages or detectably invalid messages; this behaviour is guaranteed so long as no more than one microprocessor in the node fails. The disadvantages of the above-described fail-silent node are as follows.

Firstly, with this type of node every new microprocessor architecture is likely to require substantial design overheads. Secondly, tightly synchronised processors may not be resilient to transients that affect the microprocessors in an identical manner, commonly known as common mode transient failures. Thirdly, there may be market resistance from customers to the use of these highly specialised, customised, non-standard nodes. Finally, lock-step synchronisation at very high clock speeds (50-100 MHz) may well turn out to be difficult or impossible to achieve.

It follows from the above that there is a need for a fail-silent node which does not require the microprocessors to be synchronised in lock-step; rather, the microprocessors are synchronised with one another only when sending or receiving information. The microprocessors of a node execute synchronisation and ordering protocols to "keep in step". We have achieved this by providing fail-silent nodes which have an ordering mechanism, so that identical messages are selected for processing in identical order, thus producing identical outputs. A node implemented according to the invention does not require dedicated clock or comparator circuits (the hardware shown in dotted lines in Figure 1 can thus be dispensed with). Further advantages of the invention are as follows. Firstly, technology upgrades are easier: the principles behind the invention do not change with the hardware, so the techniques can easily be ported to any pair of microprocessors. Secondly, because the replicated computations are only loosely synchronised, the architecture is likely to be capable of detecting common mode transient failures, since transients are unlikely to affect the computations on the two microprocessors of a pair in an identical fashion.

According to a first aspect of the invention there is therefore provided a computing system comprising a computing node arranged to receive messages from other components in the system, to process received messages, and to transmit messages to other components in the system; the computing node comprising: a) a plurality of microprocessors linked together and arranged to process received messages; b) a means for ordering the messages to be processed such that similar messages in identical order are selected for processing by correctly functioning microprocessors, which then produce identical outputs; and c) means for comparing the outputs produced by the microprocessors of the node and for controlling the output of the node so that nothing is output from the node unless all the microprocessors in the node give identical output, the node output then being the same as the identical outputs.

In an alternative embodiment of the invention the said means for comparing the outputs produced by the microprocessors of the node and for controlling the output of the node operates so that nothing is output from the node unless more than half of the number of microprocessors in the node give identical output, the node output then being the same as the identical outputs.

According to a second aspect of the invention there is provided a fail-silent node in or for use in a processing system comprising:

a plurality of microprocessors having interface means for enabling communications with other components in the system, such as, for example, other nodes, and a link means to enable communication between said microprocessors in said node, characterised in that said microprocessors further include:

a) authentication means so that each microprocessor can confirm the integrity of any message it receives; b) signature means so that each microprocessor can label a message with its own, preferably unique, signature; c) ordering means so that each microprocessor can order authenticated messages in time-stamped order; d) diffusion means so that each microprocessor can send messages to other microprocessors; and e) comparison and control means so that the outputs produced by each microprocessor can be compared; whereby similar messages are processed in identical order and the same outputs are produced by each microprocessor, so that nothing is output from the node unless all the microprocessors in the node give an identical output, the node output then being the same as the identical outputs.

In an alternative embodiment of the invention the said comparison and control means operates so that nothing is output from the node unless more than half of the number of microprocessors in the node give identical output, the node output then being the same as the identical outputs.

According to either aspect of the invention, the ordering means comprises the provision of clock means within each microprocessor, which clock means are synchronised such that the measurable difference between the readings of the clocks at any instant is bounded by a known maximum constant. Preferably the clock means is a logical clock.

Alternatively, the ordering means comprises the designation of at least one microprocessor as a Leader microprocessor and at least another of said microprocessors as a Follower microprocessor, whereby the Leader receives messages from outside the node and sends said messages to the Follower, such that the order in which requests are processed is dictated by the Leader microprocessor.

In this ideal embodiment the Leader processes the information and then sends the result of this processing to the Follower, so that the Follower can compare this result with its own generated result. In the event that the two results are identical, the Follower is adapted to produce a multiple signed message which is transmitted through the system. In the event that the two outputs are not identical, a multiple signed message is not produced. In addition, the Follower is provided with means which enable it to monitor messages received from outside the node, whereby faults can be detected in the Leader.

Preferably still, said comparison means of said computing system, or said comparison and control means of the fail-silent node, compares incoming messages with those produced locally, so that successfully compared messages can be countersigned by the local microprocessor and the resulting multiple signed message can be transmitted through the system. In the event that the comparison fails, multiple signed messages are not produced and thus no such messages are sent through the system.

Preferably said computing system or said fail-silent node includes receiving means which discards duplicate messages.

Preferably said computing system or said fail-silent node includes microprocessors which are adapted to receive said messages in parallel. This latter arrangement is not present in the aforementioned Leader/Follower arrangement.

According to a yet further aspect of the invention there is provided a method for ordering messages to be processed within a fail-silent computer node, comprising:

a) receiving messages at a microprocessor; b) authenticating said messages so as to confirm their integrity; c) stamping said messages to be ordered with a time-stamp corresponding to a local clock reading at said microprocessor; d) signing said messages; e) diffusing either the signed, time-stamped message or a copy of this signed, time-stamped message via a link means to other microprocessors in the node; f) ordering a plurality of signed, time-stamped messages in time-stamped order; g) processing the ordered messages according to their time-stamped order; h) signing the processed message output; i) diffusing either this signed, processed output message or a copy of this signed, processed output message via a link means to other microprocessors in the node; and j) comparing the output messages in the node and, where a pre-determined number of said output messages are identical, releasing said output messages from said node.

In a preferred embodiment of the invention said pre-determined number is equal to the number of microprocessors in the node.

In an alternative embodiment of the invention said pre-determined number is a number greater than half the number of microprocessors in said node.
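The release decision of step j), under either choice of pre-determined number, can be sketched as below. This is a minimal illustration only, assuming that each microprocessor's output for one request is available as a `bytes` value and that a silent microprocessor simply contributes nothing; the function name `release_output` and its `policy` argument are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def release_output(outputs: Sequence[bytes], n_processors: int,
                   policy: str = "all") -> Optional[bytes]:
    """Return the message the node may release for one request, or None (stay silent).

    outputs      -- output messages actually produced (a silent microprocessor adds none)
    n_processors -- number of microprocessors in the node
    policy       -- "all": every microprocessor must agree (preferred embodiment)
                    "majority": more than half must agree (alternative embodiment)
    """
    if policy == "all":
        threshold = n_processors
    elif policy == "majority":
        threshold = n_processors // 2 + 1  # smallest count that is more than half
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= threshold else None
```

With two microprocessors both policies demand agreement of both (2 // 2 + 1 = 2), which is why the two-microprocessor node is simply fail-silent; the majority policy only differs from unanimity for three or more microprocessors.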

Preferably said messages are received at said microprocessors in a parallel manner.

The method of ordering involves a process of stabilisation, whereby incoming messages are delayed for a pre-determined length of time before they are placed in the time-stamp-ordered queue of messages.

In the Leader/Follower embodiment of the invention, the pre-determined length of time for which incoming messages are delayed in the Leader microprocessor equals 0.

In a preferred method, for a two-microprocessor fail-silent node, the process of ordering or stabilisation involves:

a) diffusing messages according to a First In First Out policy; b) receiving a time-stamped message with a time-stamp equal to T; and c) where T is greater than the local clock value, advancing the local clock to a time T + 1 and stabilising all messages with a time-stamp less than or equal to T; or d) where T is less than or equal to the local clock value, stabilising all messages with a time-stamp less than or equal to T.

This preferred method enhances the efficiency of the computing system or node. An embodiment of the invention will now be described, by way of example only, with reference to the accompanying Figures, wherein:

Figure 2 represents a diagrammatic illustration of a fail-silent node in accordance with the invention;

Figure 3 represents a diagrammatic illustration of the operation of a fail-silent node in accordance with the invention;

Figure 4 represents a diagrammatic illustration of a preferred embodiment of a fail-silent node in accordance with the invention, and particularly a Leader/Follower fail-silent node in accordance with the invention; and

Figure 5 represents a diagrammatic illustration of an improved time-based ordering means.

In this detailed description it is assumed that computer systems have been structured to include a number of computer microprocessors that interact only by way of messages. Messages are defined as data which is sent from one microprocessor to another. Further, it is assumed that computations performed by microprocessors are deterministic, that is to say, if all the correctly functioning replicas of a process have identical initial states, then they will continue to produce identical responses to incoming messages provided the messages are processed in an identical order.

The overall node architecture is shown in Figure 2. Each of the two microprocessors (P₁, P₂) has network interfaces (n₁, n₂) for inter-node communication over (redundant) networks; in addition, the microprocessors are internally connected by a communication link, 1, or alternatively the microprocessors may be linked by external means, for example by use of the interfaces (n₁, n₂). Each non-faulty microprocessor in a node is assumed to be able to sign a message it sends by affixing to the message its (the microprocessor's) unforgeable signature; it is also assumed to be able to authenticate any received message, thereby detecting any attempt to corrupt the message. For example, digital signature based techniques provide such functionality with extremely high probability.

It is necessary that the replicas of computational processes on microprocessors within a node select identical messages for processing, to ensure that they produce identical outputs. Identical message selection can be guaranteed by maintaining identical ordering of messages at input ports and ensuring that application processes pick up messages at the head of their respective input ports. An ordering mechanism is then required to ensure identical ordering if both microprocessors are non-faulty.

Each non-faulty microprocessor of a node, as shown in the Figure 2 arrangement, has the following mechanisms:

a) Diffusion: this takes the messages produced by the application process running on that microprocessor, signs them and sends them to the other microprocessor of the node for comparison. b) Comparison: this authenticates all incoming messages from the neighbouring microprocessor; an authenticated message is compared with its counterpart produced locally. If the comparison succeeds, the authenticated message is countersigned and this doubly signed message is transmitted to the destination nodes. A message that cannot be compared because its counterpart does not arrive, or a comparison that detects a disagreement, indicates a failure. Once a failure is indicated, the comparison mechanism stops, and no further doubly signed messages are produced by the node. c) Receiving: this accepts authentic messages for processing from the network, discarding any duplicates; such valid messages are sent to the local ordering mechanism. d) Ordering: this mechanism negotiates with its counterpart in the other microprocessor and attempts to construct identical queues of valid messages for processing by the computational processes.
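The Diffusion and Comparison mechanisms can be sketched as follows. This is a hedged illustration, not the patented implementation: HMAC-SHA256 from the Python standard library stands in for the unforgeable digital signatures the description assumes (an HMAC is not a true signature, since anyone holding a key can produce tags with it), and the class `Microprocessor`, its key table and its method names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Signature = Tuple[str, bytes]  # (signer id, authentication tag)


def _tag(key: bytes, payload: bytes) -> bytes:
    # Assumption: an HMAC tag stands in for an unforgeable digital signature.
    return hmac.new(key, payload, hashlib.sha256).digest()


@dataclass
class Microprocessor:
    my_id: str
    keys: Dict[str, bytes]  # signer id -> key (hypothetical key distribution)
    failed: bool = False

    def diffuse(self, payload: bytes) -> Tuple[bytes, List[Signature]]:
        """Diffusion: sign a locally produced output and hand it to the neighbour."""
        return payload, [(self.my_id, _tag(self.keys[self.my_id], payload))]

    def authentic(self, payload: bytes, sigs: List[Signature]) -> bool:
        return bool(sigs) and all(
            signer in self.keys
            and hmac.compare_digest(tag, _tag(self.keys[signer], payload))
            for signer, tag in sigs)

    def compare(self, local: bytes, remote: Optional[bytes],
                remote_sigs: List[Signature]) -> Optional[Tuple[bytes, List[Signature]]]:
        """Comparison: return a doubly signed message, or None and stop for good."""
        if self.failed:
            return None  # once a failure is indicated, nothing more is produced
        if remote is None or not self.authentic(remote, remote_sigs) or remote != local:
            self.failed = True  # missing counterpart, corrupted message or disagreement
            return None
        return remote, remote_sigs + [(self.my_id, _tag(self.keys[self.my_id], remote))]
```

For example, `Microprocessor("P2", keys).compare(out, *p1.diffuse(out))` yields a message carrying the signatures of P1 and P2 when both produced `out`; after the first disagreement every later call returns `None`, mirroring the fail-stop behaviour of the comparison mechanism.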

One known method of achieving ordering requires that the physical clocks of both the microprocessors of a node are synchronised such that the measurable difference between the readings of the clocks at any instant is bounded by a known constant.

Essentially, in the known method for ordering, the order process of a microprocessor stamps a message to be ordered with its local clock reading. A copy of the time-stamped message is signed and sent over the link to the order process of the other microprocessor in the node. If T is the time-stamp of a message received from or sent to the order process of the other microprocessor, then the message becomes stable at local clock time T+d+e, where d is the maximum time taken for a time-stamped message to travel over the link from one order process to the other, and e is the maximum difference between the clocks of the two microprocessors. A message with time-stamp T is designated stable if no message with a time-stamp smaller than T will subsequently be received by an order process. Stable messages are queued at the relevant input ports in increasing time-stamp order (with care taken not to queue a stable message if its replica has already been queued).
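The known clock-synchronised ordering scheme can be sketched as below. This is a minimal sketch assuming that every message, whether stamped locally or by the neighbour, is recorded as a (time-stamp, origin, message id) triple; that equal time-stamps are broken by the origin identifier (the description does not say how ties are resolved); and that the message id identifies the replica of the same request stamped by the other microprocessor. The class name `PhysicalClockOrderer` is hypothetical.

```python
import heapq
from typing import List, Set, Tuple


class PhysicalClockOrderer:
    """Known scheme: a message time-stamped T becomes stable at local time T+d+e."""

    def __init__(self, d: float, e: float) -> None:
        self.delay = d + e                  # d: max link delay, e: max clock difference
        self._pending: List[Tuple[float, str, str]] = []  # heap of (T, origin, msg id)
        self._queued: Set[str] = set()      # messages already placed on the input port

    def receive(self, timestamp: float, origin: str, msg_id: str) -> None:
        """Record a time-stamped message, whether stamped locally or by the neighbour."""
        heapq.heappush(self._pending, (timestamp, origin, msg_id))

    def stable(self, now: float) -> List[str]:
        """Pop every message stable at local time `now`, in increasing time-stamp order."""
        ready = []
        while self._pending and self._pending[0][0] + self.delay <= now:
            _, _, msg_id = heapq.heappop(self._pending)
            if msg_id not in self._queued:  # skip a replica that has already been queued
                self._queued.add(msg_id)
                ready.append(msg_id)
        return ready
```

With d = 2 and e = 1, a message stamped 9 is released at local time 12, and a request stamped 10 by one microprocessor and 11 by the other is queued once, at time 13 or later. Every message waits the full d+e, which is the delay that the logical-clock refinement described later sets out to reduce.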

The above operation ensures that the following two properties are met:

Agreement: all the non-faulty replicas of a process receive the same input messages;

Order: all the non-faulty replicas have identical input message queues, that is, identically ordered message queues.

So, if all the non-faulty replicas of a process of a node have identical initial states and the replicas always pick messages at the head of their queues for processing, then identical output messages will be produced by them.
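Why Agreement and Order suffice can be seen with a toy deterministic service. The sketch below is purely illustrative: `service` is a hypothetical account-balance process, not anything from the description. Two replicas fed the same ordered queue produce the same outputs, while the same messages in a different order do not, which is exactly what the ordering mechanism must prevent.

```python
from typing import Iterable, List


def service(queue: Iterable[str]) -> List[str]:
    """Hypothetical deterministic service: an account balance driven by requests."""
    balance = 0
    outputs = []
    for request in queue:
        op, amount = request.split(":")
        if op == "dep":
            balance += int(amount)
        elif op == "wd" and balance >= int(amount):
            balance -= int(amount)  # a withdrawal is honoured only if funds exist
        outputs.append(f"{request} -> {balance}")
    return outputs


same_order = ["dep:5", "wd:5"]
replica_1 = service(same_order)
replica_2 = service(list(same_order))  # identical queue, identical outputs
reordered = service(["wd:5", "dep:5"])  # same messages, different order
```

Here `replica_1` and `replica_2` are both `["dep:5 -> 5", "wd:5 -> 0"]`, whereas `reordered` is `["wd:5 -> 0", "dep:5 -> 5"]`: a comparator would flag correctly functioning replicas as disagreeing if their input queues were not identically ordered.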

We have developed a time-based ordering means for use in a fail-silent node, which will now be explained with reference to Figure 3. Figure 3 depicts the detailed architecture of a node, summarising the major components within a microprocessor of the node and their interactions. The RX_INT and RX_EXT processes are responsible for receiving and authenticating messages from inside and outside the node respectively. An authentic message coming from outside a node will have two distinct signatures (for simplicity, the authentication of internal messages, received from the other microprocessor in the node, is omitted from Figure 3). Similarly, the TX_INT and TX_EXT processes send messages inside and outside the node respectively. The actual computing application is represented in Figure 3 by the Service process. For the purpose of sending and receiving valid messages, each microprocessor maintains several message queues:

(i) Received Message Queue (RMQ): Contains valid received messages intended for ordering.
(ii) Processed Message Queue (PMQ): Contains unsigned output messages produced by computational processes. These messages must be validated, ie checked by the comparator, before transmission to their final destination.
(iii) External Candidate Message Queue (EMQ): Contains singly signed messages that have been received for validation.
(iv) Internal Candidate Message Queue (IMQ): Contains unsigned messages, each waiting for a signed message with identical content to arrive in EMQ.
(v) Delivered Message Queue (DMQ): Contains ordered, signed messages to be delivered to the application process for processing.
(vi) Neighbouring Message Queue (NMQ): Contains signed messages to be relayed to the neighbouring microprocessor of the node. These messages may be either for ordering (from the order process) or for validation (from the diffuse process).
(vii) Compared Message Queue (CMQ): Contains doubly signed messages awaiting output.
(viii) Order Message Queue (OMQ): Contains messages relayed by the neighbouring microprocessor for ordering.

A message received by the RX_INT process may be either for validation, in which case it is deposited in EMQ, or for ordering, in which case it is deposited in OMQ.
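The routing performed by the two receiving processes can be sketched as follows. This is a hedged sketch: the description does not say how RX_INT tells a message for validation from one for ordering, so a hypothetical `purpose` field is assumed; signatures are represented only by a set of signer identifiers (verification is taken as already done); and duplicates are detected by a hypothetical `msg_id`. Only RMQ, EMQ and OMQ are modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, FrozenSet, Set


@dataclass(frozen=True)
class Message:
    msg_id: str
    payload: bytes
    signers: FrozenSet[str]  # signatures assumed already verified
    purpose: str = ""        # hypothetical tag on internal messages: "validate" or "order"


class InputQueues:
    def __init__(self) -> None:
        self.rmq: Deque[Message] = deque()  # valid external messages awaiting ordering
        self.emq: Deque[Message] = deque()  # singly signed messages awaiting validation
        self.omq: Deque[Message] = deque()  # neighbour's messages relayed for ordering
        self._seen: Set[str] = set()

    def rx_int(self, m: Message) -> None:
        """RX_INT: route a message received from the neighbouring microprocessor."""
        if m.purpose == "validate":
            self.emq.append(m)
        elif m.purpose == "order":
            self.omq.append(m)
        else:
            raise ValueError(f"unknown purpose: {m.purpose!r}")

    def rx_ext(self, m: Message) -> bool:
        """RX_EXT: accept a message from outside the node; False if it is rejected."""
        if len(m.signers) < 2:  # an authentic external message has two distinct signatures
            return False
        if m.msg_id in self._seen:  # discard duplicates
            return False
        self._seen.add(m.msg_id)
        self.rmq.append(m)
        return True
```

Using a `frozenset` for the signers means that two signatures by the same microprocessor count once, so a message signed twice by a single faulty source is not mistaken for a doubly signed one.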

The time-based ordering means has been improved so as to reduce the stability delay. Note that the physical clocks are replaced by counters (logical clocks), which are no longer synchronised. The details are as follows. The arrival of a relayed message in OMQ can be used to reduce the stability delay d+e, defined previously, that is imposed by ordering messages. Since each microprocessor is unable to generate a time-stamp smaller than any it has previously generated, and messages are diffused (relayed) according to a First In First Out (FIFO) policy, the time-stamp of a received message defines intervals of time in which messages can be stabilised sooner than after the delay d+e. Figure 5 illustrates this improvement.

For instance, suppose a message with a time-stamp smaller than the local logical time is received. No more messages will be received with time-stamps smaller than the time-stamp of this message (local time-stamps will be greater than the local time, and remote messages will necessarily have time-stamps greater than that of the received message), so all previously received messages, local or remote, with time-stamps smaller than or equal to the time-stamp of the diffused message are designated stable.

Alternatively, suppose a message with a time-stamp greater than the local time is received. In this case it is certain that no more messages will be received with time-stamps smaller than the local time, so every previously received message with a time-stamp smaller than or equal to the local time is stable. Further, the local logical clock is advanced so as to exceed the time-stamp of the received message.
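The logical-clock stabilisation rule for a two-microprocessor node can be sketched as below. It follows the preferred method stated earlier (on receiving a relayed time-stamp T, advance the clock to T + 1 if T exceeds it, then stabilise everything stamped at or below T), which is safe because the neighbour relays with strictly increasing time-stamps in FIFO order and all future local stamps exceed the local clock. Assumptions, all hypothetical: local time-stamps are produced by incrementing a counter, ties are broken by the origin identifier, and a message id identifies the two replicas of one external request; the class name `LogicalClockOrderer` is illustrative only.

```python
import heapq
from typing import List, Set, Tuple

Entry = Tuple[int, str, str]  # (time-stamp, origin microprocessor id, message id)


class LogicalClockOrderer:
    """Two-microprocessor ordering with an unsynchronised logical clock (counter)."""

    def __init__(self, my_id: str) -> None:
        self.my_id = my_id
        self.clock = 0                   # largest time-stamp issued or learnt so far
        self._pending: List[Entry] = []  # heap ordered by (time-stamp, origin)
        self._delivered: Set[str] = set()

    def stamp_local(self, msg_id: str) -> Entry:
        """Time-stamp a message received from outside; the entry is then diffused FIFO."""
        self.clock += 1                  # local time-stamps strictly increase
        entry = (self.clock, self.my_id, msg_id)
        heapq.heappush(self._pending, entry)
        return entry

    def receive_remote(self, entry: Entry) -> List[str]:
        """Handle a message relayed by the neighbour; return newly stable message ids."""
        t = entry[0]
        heapq.heappush(self._pending, entry)
        if t > self.clock:
            self.clock = t + 1           # advance the local clock past T
        # Nothing stamped <= t can still arrive: the neighbour relays FIFO with
        # increasing stamps, and future local stamps exceed self.clock >= t.
        ready = []
        while self._pending and self._pending[0][0] <= t:
            _, _, msg_id = heapq.heappop(self._pending)
            if msg_id not in self._delivered:  # skip the replica stamped by the other side
                self._delivered.add(msg_id)
                ready.append(msg_id)
        return ready
```

If both microprocessors stamp requests `a` and then `b`, each one releases `a` as soon as the neighbour's stamp for `a` arrives and `b` as soon as the neighbour's stamp for `b` arrives, with no fixed d+e wait; both end up with the identical queue `a`, `b`.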

In the above ordering means the two microprocessors operate symmetrically, that is to say, the microprocessors execute the same method. However, considerable performance improvements can be obtained by using an asymmetrical approach, in which different roles are assigned to each of the microprocessors forming a node. We will term one the Leader and the other the Follower.

The order in which the requests will be processed in the node is dictated by the Leader microprocessor. The Leader selects one of the input messages for processing and sends a copy to the Follower. After transmission of this message to the Follower, the Leader processes the message, signs the result and afterwards sends a copy of the output message to the Follower. Meanwhile the Follower receives the message, processes it and waits for the Leader's output message. The Follower then picks up the Leader's output message and compares it with its own. If the two messages agree, a doubly signed message is output; otherwise, the Follower microprocessor stops its activities and no doubly signed messages will be output. Communication in the Follower-to-Leader direction is also necessary so that the Leader can detect faults in the Follower. Also, the Follower must monitor the messages received from outside the node (omitted for simplicity from Figure 4) in order to detect faults occurring in the Leader. Figure 4 shows the architecture of the Leader/Follower fail-silent node. The various processes and queues perform the same functions as described earlier in this section. In the Follower microprocessor, the number of signatures appended to a message received from the Leader determines its destination (EMQ or DMQ): two signatures indicate that the message is to be processed.
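The core Leader/Follower exchange can be sketched as a sequential simulation. This is a hedged sketch only: the real microprocessors run in parallel, signing is represented by a tuple of role names, and the Follower-to-Leader messages and the Follower's monitoring of external messages are omitted. `Follower`, `run_node` and the `Service` alias are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Service = Callable[[str], str]


@dataclass
class Follower:
    service: Service
    stopped: bool = False

    def check(self, request: str, leader_output: str) -> Optional[Tuple[str, Tuple[str, str]]]:
        """Process the relayed request and compare with the Leader's signed output."""
        if self.stopped:
            return None
        own = self.service(request)
        if own != leader_output:
            self.stopped = True  # disagreement: cease all activity
            return None
        return own, ("Leader", "Follower")  # stands in for a doubly signed message


def run_node(requests: List[str], leader_service: Service, follower: Follower) -> List[str]:
    """The Leader dictates the order; only Follower-approved outputs leave the node."""
    released = []
    for request in requests:  # processing order chosen by the Leader
        leader_output = leader_service(request)  # Leader processes (signing elided)
        doubly_signed = follower.check(request, leader_output)
        if doubly_signed is None:
            break
        released.append(doubly_signed[0])
    return released
```

No ordering protocol runs here: the Follower simply processes requests in the order the Leader relays them, which is why the stabilisation delay in the Leader is 0. A Leader that corrupts its second output gets only its first output released before the node falls silent.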

The Service, Diffuse and Compare processes work in almost the same way as in the normal fail-silent architecture. The Rx_Int and Tx_Int processes receive and transmit messages within the node. The Rx_Ext and Timing processes on the Follower are responsible for detecting omission and timing faults occurring in the Leader. In a correctly functioning system, both microprocessors will receive the same request messages from outside the node (although not necessarily in the same order). The Follower's Rx_Ext process receives each request from outside the node and deposits it, with an associated time-out, in the External Received Message Queue (ERMQ); if a copy of the message is already there, having been deposited by the Timing process, the message's time-out is reset instead. The Timing process picks up each message in the Internal Received Message Queue (IRMQ) and resets the time-out associated with its counterpart in ERMQ (if its counterpart is not there, the message is placed in ERMQ with an associated time-out). If a time-out 'fires', the Follower assumes that the Leader has failed and ceases its own activities; as a result, no more doubly signed messages will be output. To solve the problem of detecting omission and timing failures in the Follower, it suffices to make the Follower send to the Leader the singly signed messages that are supposed to be output. After comparing each of these with its own output, the Leader will also output a doubly signed message. If the expected message does not arrive within a 'reasonable' time, the Leader will stop sending messages to the Follower, and so no more doubly signed messages will be output.

To calculate the time taken by this node to process a request, it is necessary to analyse the activities in both microprocessors. As the activities in the Leader and Follower microprocessors are executed in parallel, we cannot simply add the times of the activities executed in each microprocessor. The Service processes are executed in both microprocessors in parallel.
However, the Follower has to wait for the request message sent by the Leader before service can begin (the wait time is equal to […]). The Compare process in the Follower microprocessor has to wait for the local message produced by the Service process (in the Follower) and for the Leader's counterpart of this message. In general, if request messages and local response messages are sent by the Leader to the Follower through independent channels, it is likely that the Compare process will never have to wait for the neighbour's response message, as this message will be sent while the Follower is executing Service. More formally, the time that the Service process in the Follower waits for the request message plus the time that the Compare process waits for the neighbour's response message is equal to […].

Thus the time taken by the Leader/Follower fail-silent node to process a request, T_FS(L/F), is given by the following expression:

[equation illegible in source]

and, following the same approach as in the previous section, we conclude that:

[equation illegible in source]

Since the microprocessors in a fail-silent node must exchange at least one message per request (the message to be compared), the Leader/Follower soft fail-silent node has near-optimal performance for a software-implemented fail-silent node.
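The Follower's detection of omission and timing faults in the Leader, through time-outs attached to ERMQ entries by its Rx_Ext and Timing processes, can be sketched as below. This is a hedged interpretation rather than the patented mechanism: the description says that the second sighting of a request "resets" the time-out, which is read here as cancelling it, so that every request must be seen both directly from outside the node and relayed by the Leader within the time-out. Duplicate external copies are assumed to have been discarded already, and `LeaderWatchdog` and its methods are hypothetical names.

```python
from typing import Dict


class LeaderWatchdog:
    """Follower-side ERMQ: every request must be seen both externally and relayed."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._ermq: Dict[str, float] = {}  # request id -> deadline
        self.leader_failed = False

    def _arrive(self, request_id: str, now: float) -> None:
        if request_id in self._ermq:
            del self._ermq[request_id]  # counterpart already seen: time-out cleared
        else:
            self._ermq[request_id] = now + self.timeout

    def on_external(self, request_id: str, now: float) -> None:
        """Rx_Ext: request received directly from outside the node."""
        self._arrive(request_id, now)

    def on_relayed(self, request_id: str, now: float) -> None:
        """Timing process: request taken from IRMQ, i.e. relayed by the Leader."""
        self._arrive(request_id, now)

    def check(self, now: float) -> bool:
        """Return True (and latch it) once any time-out has fired."""
        if any(deadline <= now for deadline in self._ermq.values()):
            self.leader_failed = True
        return self.leader_failed
```

A Leader that silently drops request `r2` leaves an unmatched ERMQ entry, so `check` starts returning `True` once the time-out expires, at which point the Follower would cease its activities and the node would fall silent.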

When discussing the Leader/Follower mechanism described above, it is necessary to examine the performance of both the Leader and the Follower, since they execute different protocols. The input delay in the Follower is defined to be the time between remove(m,Rx_Ext) at the Leader and remove(m,DMQ) at the Follower. Hence it reflects the time taken for the Leader to receive the message and relay it to the Follower, and for the Follower to remove it from DMQ. The output delays of the Leader and Follower are significantly different. The output delays in the Follower are smaller than in the Leader, because the Leader begins to service the request before the Follower, so that when the Follower is ready to compare its result, the Leader will already have sent (or be sending) its response. If the comparison at the Follower is successful, the Follower outputs the compared message before passing its response to the Leader. Hence the output delay at the Leader reflects this additional time.

The experimental figures given in Table 1 indicate that adopting the Leader/Follower mechanism within a fail-silent node leads to a significant improvement in performance. The overhead of using soft fail-silent nodes is a delay in response ranging from approximately 3.7 ms in a lightly loaded system up to 6 ms when messages are constantly queued awaiting service. In either case, the performance of the Leader/Follower fail-silent node is considerably better than that of either of the fail-silent nodes employing an order protocol. If the application services involve lengthy computations, then the percentage overhead involved in adding replication is extremely small. It is only when communication time between nodes outweighs computation time within nodes that the cost of replication becomes significant.

Modifications of the invention to accommodate a fail-silent node including two or more microprocessors

Fail-silent nodes stop issuing valid messages as soon as a fault is detected in the node. However, it is possible to build multi-microprocessor nodes which mask failures and continue to work in the presence of failures. An N failure masking node contains N microprocessors and continues to work provided that no more than f = (N-1)/2 of these microprocessors fail. Failure masking nodes also require an ordering mechanism for incoming messages, and the performance of this ordering mechanism can be enhanced by applying an extension of the logical clock method described above. The description of N failure masking nodes is more complex, since the nodes must continue to work despite the presence of up to f arbitrary processor failures within a node. The method is described below:

Each non-faulty microprocessor P_i in an N failure masking node maintains a logical clock C_i, which is initialised to 1 when the node is started and whose value only increases with the passage of real time. When microprocessor P_i receives an authentic external message M (that is to say, a message with f + 1 signatures), it composes an internal message comprising the contents of M, a local logical time-stamp and the identity of the microprocessor (i.e. P_i). This composed message is deposited in a message pool called received_i, and C_i is incremented by 1; this ensures that a non-faulty P_i prepares internal messages with increasing time-stamps. The message is then signed and sent to all other microprocessors in the node.
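The clock and message-composition step above might be sketched as follows. The class and field names are hypothetical, the time-stamp is assumed to be the current value of C_i, and "authentic" is reduced to counting f + 1 distinct signer identities; real signature verification is left out.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class InternalMessage:
    contents: bytes  # contents of the external message M
    T: int           # local logical time-stamp
    origin: int      # identity of the preparing microprocessor, P_i


@dataclass
class Processor:
    pid: int
    f: int
    C: int = 1  # logical clock C_i, initialised to 1 when the node starts
    received: Set[InternalMessage] = field(default_factory=set)  # pool received_i

    def on_external(self, contents: bytes, signers: FrozenSet[int]) -> Optional[InternalMessage]:
        # Hypothetical authenticity check: only counts f + 1 distinct signers.
        if len(signers) < self.f + 1:
            return None
        m = InternalMessage(contents, self.C, self.pid)
        self.received.add(m)
        self.C += 1  # the next internal message from P_i gets a larger time-stamp
        # The caller would now sign m and send it to all other microprocessors.
        return m
```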

When microprocessor P_i receives an authentic internal message m with s distinct signatures, s ≥ 1, it deposits m in its pool received_i if m is not already there; if m is a new message, C_i is set to max{C_i, m.T + 1}. Because of this operation, a non-faulty P_i will never prepare and send a different message m' with a time-stamp smaller than that of the original message after having received the original message. If the number of signatures does not exceed f, the received message is countersigned and sent to all other microprocessors in the node that have not signed m. The messages in received_i which have time-stamps smaller than the smallest logical clock value in the node are stable and can be ordered according to their time-stamps.
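A sketch of the receive path, under the same caveats: names are hypothetical, signatures are represented only by the set of signer identities, duplicates are not relayed again (an assumption), microprocessors are assumed to be numbered 0 to N-1, and the smallest logical clock value in the node is passed in as a parameter rather than derived from the time-outs described next. Breaking equal time-stamps by originator identity is also an assumption.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class InternalMessage:
    contents: bytes
    T: int       # time-stamp assigned by the originating microprocessor
    origin: int  # identity of the originating microprocessor


@dataclass
class Processor:
    pid: int
    n: int  # number of microprocessors N, assumed numbered 0 .. N-1
    f: int
    C: int = 1
    received: Set[InternalMessage] = field(default_factory=set)

    def on_internal(self, m: InternalMessage, signers: FrozenSet[int]) -> Optional[List[int]]:
        """Handle an authentic internal message m signed by `signers`.

        Returns the microprocessors to which the countersigned m is relayed,
        or None if m is unsigned or already in received_i.
        """
        if not signers or m in self.received:
            return None
        self.received.add(m)
        self.C = max(self.C, m.T + 1)  # never stamp a new message at or below m.T
        if len(signers) <= self.f:
            # Countersign (signing abstracted away) and relay to all non-signers.
            return [p for p in range(self.n) if p != self.pid and p not in signers]
        return []

    def stable(self, min_clock_in_node: int) -> List[InternalMessage]:
        """Messages stamped below the smallest clock in the node, in order."""
        ready = [m for m in self.received if m.T < min_clock_in_node]
        return sorted(ready, key=lambda m: (m.T, m.origin))  # tie-break is an assumption
```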

A non-faulty microprocessor P_i detects late messages using time-outs that are set up as a result of receiving or sending messages. The principles behind setting these time-outs are as follows:

(i) suppose P_i prepares and sends a message m to every other P_j at its (physical) clock time t; after its clock time t + d, every non-faulty P_j should have C_j > m.T (the time-stamp of m), and hence after its clock time t + 2d, P_i will not accept any different message m' with m'.T ≤ m.T;

(ii) suppose that P_i receives a message m at its clock time t; any non-faulty microprocessor P_j that signed m must have had C_j > m.T at the time it sent m.

Therefore, allowing d for possible message overtaking during transmission, P_i should not accept any single-signed m' with m'.T ≤ m.T from any such P_j after its clock time t + d; and

(iii) suppose that, in case (ii), P_k is a microprocessor that has not signed m. P_k may receive m as late as t + d according to P_i's clock, so P_i must accept any single-signed message m' from P_k so long as its clock reads less than t + 2d. Similar time-outs for multiple-signed messages can be derived, and careful analysis indicates that each additional signature in a message increases the time-out for accepting that message by 2d. For example (as in case (i)), P_i, after having prepared and sent m at its clock time t, should accept any new double-signed message m' with m'.T ≤ m.T whilst its clock reads less than t + 4d. This complexity is necessary to prevent faulty microprocessors from corrupting the ordered queues of correct microprocessors.
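The time-out rules can be summarised as an acceptance deadline. This sketch reads the statement that each additional signature adds 2d as applying uniformly to cases (i) to (iii); only the case (i) double-signed value (t + 4d) is given explicitly above, so the other multiple-signature values are an interpretation. The enum and function names are hypothetical.

```python
from enum import Enum


class Case(Enum):
    SENT = "i"               # P_i prepared and sent m at its clock time t
    FROM_SIGNER = "ii"       # m' comes from a P_j that signed m (m received at t)
    FROM_NON_SIGNER = "iii"  # m' comes from a P_k that did not sign m (m received at t)


# Single-signed time-outs, in multiples of d, for cases (i) to (iii).
BASE_MULTIPLIER = {Case.SENT: 2, Case.FROM_SIGNER: 1, Case.FROM_NON_SIGNER: 2}


def acceptance_deadline(case: Case, t: float, d: float, signatures: int) -> float:
    """Clock time before which P_i still accepts a new m' with m'.T <= m.T."""
    if signatures < 1:
        raise ValueError("a message carries at least one signature")
    # Each signature beyond the first extends the time-out by 2d.
    return t + BASE_MULTIPLIER[case] * d + 2 * d * (signatures - 1)


def accept(case: Case, t: float, d: float, signatures: int, now: float) -> bool:
    """True while P_i's clock reading `now` is still before the deadline."""
    return now < acceptance_deadline(case, t, d, signatures)
```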

This scheme, unlike most ordering mechanisms, does not require the microprocessors' physical clocks to be synchronised within a known bound. It provides an efficient mechanism for maintaining ordered message queues in an N failure masking node. In particular, for the special case N = 3 and f = 1, the above technique can be optimised, resulting in a reduced ordering delay. Still further optimisation is possible when there are no microprocessor failures in the node and the communication time between two non-faulty microprocessors is much less than the estimated upper bound d. Since these conditions generally hold in practical systems, these optimisations give a valuable performance enhancement to the node.

Claims

1. A computing system comprising a computing node arranged to receive messages from other components in the system, to process received messages, and to transmit messages to other components in the system; the computing node comprising: a) a plurality of microprocessors linked together and arranged to process received messages; b) a means for ordering the messages to be processed such that similar messages in identical order are selected for processing by correctly functioning microprocessors which then produce identical outputs; and c) means for comparing the outputs produced by the microprocessors of the node and for controlling the output of the node so that nothing is output from the node unless all the microprocessors in the node give identical output, the node output then being the same as the identical outputs.

2. A computing system according to Claim 1 wherein the said means for comparing the outputs produced by the microprocessors of the node and for controlling the output of the node operates so that nothing is output from the node unless more than half of the number of microprocessors in the node give identical output, the node output then being the same as the identical outputs.

3. A fail-silent node in or for use in a microprocessing system comprising: a plurality of microprocessors having interface means for enabling communications with other components in the system and a link means to enable communication between said processors in said node, characterised in that said microprocessors further include: a) authentication means so that each microprocessor can confirm the integrity of any message it receives; b) signature means so that each microprocessor can label a message with its own signature; c) ordering means so that each microprocessor can order authenticated messages in time-stamped order; d) diffusion means so that each microprocessor can send messages to other microprocessors; and e) comparison and control means so that the outputs produced by each microprocessor can be compared; whereby similar messages are processed in identical order and the same outputs are produced by each microprocessor, so that nothing is output from the node unless all the microprocessors in the node give the same output, the node output then being the same as the said same output.

4. A fail-silent node according to Claim 3 wherein the said means for comparing and controlling the outputs produced by the microprocessors of the node are adapted to operate so that nothing is output from the node unless more than half of the number of microprocessors in the node give the same output, the node output then being the same as the said same output.

5. A computing system or a fail-silent node according to any preceding Claim wherein the ordering means comprises the provision of clock means within each microprocessor, which clock means are synchronised such that a measurable difference between readings of clocks at any instant is represented by a maximum known constant.

6. A computing system or a fail-silent node according to Claim 5 wherein the clock means is a logical clock.

7. A computing system or a fail-silent node according to Claims 1-4 wherein the ordering means comprises the designation of at least one microprocessor as a Leader microprocessor and of at least another of said microprocessors as a Follower microprocessor, whereby the Leader receives messages from outside the node and sends said messages to the Follower such that the order in which requests are processed is dictated by the Leader microprocessor.

8. A computer system or a fail-silent node according to Claim 7 wherein the Leader is adapted to process the information and then send the result of this processing to the Follower so that the Follower can compare this result with its own generated result and, in the event that the two results are identical, the Follower is adapted to produce a multiple signed message which is transmitted through the system.

9. A computer system or a fail-silent node according to Claim 8 wherein the Follower is provided with means which enables it to monitor messages received from outside the node, whereby faults can be detected in the Leader.

10. A computing system or a fail-silent node according to any preceding Claim wherein said comparison means of said computing system or said comparison and control means of said fail-silent node is adapted to compare incoming messages with those produced locally so that successful messages can be countersigned by the local microprocessor and a subsequently generated multiple signed message can be transmitted through the system.

11. A computing system or a fail-silent node according to any preceding Claim wherein said computing system or said fail-silent node includes receiving means which discards duplicate messages.

12. A computing system or fail-silent node according to Claims 1-6, 10 or 11, wherein said computing system or said fail-silent node includes microprocessors which are adapted to receive said messages in parallel.

13. A method for ordering messages to be processed within a fail-silent computer node comprising: a) receiving messages at a microprocessor; b) authenticating said messages so as to confirm the integrity of same; c) stamping said messages to be ordered with a time-stamp corresponding to a local clock reading at said microprocessor; d) signing said messages; e) diffusing either the signed, time-stamped message or a copy of this signed, time-stamped message via a link means to other microprocessors in the node; f) ordering a plurality of signed, time-stamped messages in time-stamped order; g) processing the ordered messages according to their time-stamped order; h) signing the processed message output; i) diffusing either this signed, processed message output or a copy of this signed, processed message output via a link means to other microprocessors in the node; and j) comparing the message outputs in the node and, where a pre-determined number of said message outputs are the same, releasing said same message outputs from said node.

14. A method in accordance with Claim 13 wherein said pre-determined number equals the number of microprocessors in the node.

15. A method according to Claim 13 wherein said pre-determined number equals a number greater than half the number of said microprocessors in said node.

16. A method according to Claims 13, 14 or 15 wherein the method further includes the step of receiving said messages at said microprocessors in a parallel manner.

17. A method according to Claims 13-16 wherein the method of ordering involves a process of stabilisation whereby incoming messages are delayed for a pre-determined length of time before they are queued in the time-stamped order of messages.

18. A method according to Claim 17 wherein one of said microprocessors is designated a Leader microprocessor and at least one other of said microprocessors is designated a Follower microprocessor whereby the Leader receives messages from outside the node and sends said messages to the Follower such that the order in which messages are processed is dictated by the Leader microprocessor; and the pre-determined length of time for which incoming messages are delayed in the Leader microprocessor equals 0.

19. A method according to Claim 17 wherein two microprocessors are provided in said node and the process of ordering or stabilisation involves; a) diffusing messages according to a First In First Out policy; b) receiving a time-stamped message with a time-stamp equal to T; and c) where T is greater than the local clock value, advancing the local clock to a time T+ 1 and stabilising all messages with a time-stamp less than or equal to T; or d) where T is less than or equal to the local clock value, stabilising all messages with a time-stamp less than or equal to T.
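The two-microprocessor stabilisation rule of Claim 19 can be sketched as below. The sketch assumes the FIFO diffusion of step (a) is provided by the link, represents messages as hypothetical (time-stamp, sender, payload) tuples, breaks equal time-stamps by sender identity, assumes an initial clock value of 1, and leaves out how local messages are stamped, which the claim does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Message = Tuple[int, int, str]  # hypothetical (time-stamp, sender, payload)


@dataclass
class TwoProcessorStabiliser:
    clock: int = 1  # assumed initial value of the local logical clock
    pending: List[Message] = field(default_factory=list)  # not yet stable
    ordered: List[Message] = field(default_factory=list)  # stable, in time-stamp order

    def hold(self, msg: Message) -> None:
        """Keep a message until it can be stabilised."""
        self.pending.append(msg)

    def on_peer_message(self, msg: Message) -> None:
        """Steps (b) to (d): a message stamped T arrives from the other microprocessor."""
        T = msg[0]
        self.hold(msg)
        if T > self.clock:
            self.clock = T + 1  # step (c): advance the local clock to T + 1
        # Steps (c) and (d) alike: every held message stamped <= T is now stable.
        stable = sorted(m for m in self.pending if m[0] <= T)
        self.pending = [m for m in self.pending if m[0] > T]
        self.ordered.extend(stable)
```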