WO1998010348A1

WO1998010348A1 - Microcontroller fail-safe system

Info

Publication number: WO1998010348A1
Application number: PCT/EP1997/004876
Authority: WO
Inventors: Wilhard Wendorff; Steve Machell
Original assignee: Motorola Gmbh
Priority date: 1996-09-07
Filing date: 1997-09-08
Publication date: 1998-03-12
Also published as: GB9618702D0; GB2317032A

Abstract

A microcontroller fail-safe system includes a primary microprocessor (20) coupled to a bus (80), and arranged for processing unprocessed signals received from the bus (80) and for providing primary processed signals. A delay arrangement (210) delays the unprocessed and the processed signals by a predetermined period, to provide delayed signals. A secondary microprocessor (120) is arranged for processing the delayed unprocessed signals and for providing secondary processed signals. A comparing arrangement (210) compares the delayed primary processed signals and secondary processed signals such that the compared signals are synchronised through having experienced the same delay.

Description

MICROCONTROLLER FAILSAFE SYSTEM

Field of the Invention

This invention relates to microcontroller fail-safe systems, and particularly but not exclusively to microcontroller fail-safe systems for embedded microcontrollers.

Background of the Invention

In many safety-critical applications, for example in an automotive anti-lock braking system (ABS), electronic systems such as embedded microcontrollers are increasingly used to replace mechanical systems. Typically, safety-relevant embedded microcontroller systems have a mechanical fall-back solution which provides a fail-safe mechanism if the electronic system should fail.

Due to the complexity of these systems and the need for further cost reduction, maintaining a mechanical system and an electronic system is undesirable. However, with an ever increasing demand on safety, some form of fall back arrangement is necessary.

Furthermore semiconductor manufacturers are asked to constantly improve the quality of semiconductor devices. Quality standard programmes typically end when devices are delivered to a customer. In the future it is envisaged that devices be quality checked within customer applications through self-checking means.

One solution includes the use of two or more identical embedded microcontrollers, one arranged to confirm the operation and/or quality of the other.

A problem with such an arrangement is that common mode errors (e.g. power glitches, clock noise) cause identical errors on both (or all) of the microcontrollers, and thus the errors remain undetected.

Furthermore all the system resources, such as memory, peripherals etc. have to be duplicated in order to run a double embedded microcontroller system, giving rise to expensive semi-redundant architecture. In addition a synchronisation protocol has to be present to synchronise the multiple microcontrollers, giving rise to further processing and communication overhead. Lastly, two different software programs are required to be processed by the two microcontrollers. These factors all increase the cost and size of the system, and affect reliability.

This invention seeks to provide a microcontroller fail-safe system which mitigates the above mentioned disadvantages.

Summary of the Invention

According to the present invention there is provided a microcontroller fail-safe system comprising: a bus; a primary microprocessor coupled to the bus, and arranged for processing unprocessed signals received therefrom and for providing primary processed signals; delay means coupled to receive the unprocessed and the processed signals from the bus and arranged for delaying the signals by a predetermined period, to provide delayed signals therefrom; a secondary microprocessor coupled to the delay means and arranged for processing the delayed unprocessed signals received therefrom and for providing secondary processed signals; and, comparing means coupled to the delay means and to the secondary microprocessor, for comparing the delayed primary processed signals and secondary processed signals; whereby the compared signals are synchronised through having experienced the same delay.

The microcontroller fail-safe system preferably further comprises a main memory coupled to the main bus, for storing a program to be executed by the primary microprocessor, and preferably also further comprises a safety bus coupled to the main memory and to the comparing means, and a safety memory for storing redundant data related to data stored in the main memory, such that when data stored in the main memory is written to the bus, the redundant data in the safety memory is written to the comparing means via the safety bus.

The safety bus is preferably coupled to the delay means, and the redundant data is also subject to the delay before being stored in the safety memory. Preferably the primary microprocessor is coupled to a primary peripheral device via the bus.

The secondary microprocessor is preferably coupled to a secondary peripheral device via a secondary bus, the secondary peripheral device duplicating the operation of the primary peripheral device. Preferably the delay comprises a fixed portion and a time dependent portion.

In this way common mode errors are substantially avoided without the need to duplicate all the system resources, and without the need for a synchronisation protocol or two different software programs.

Brief Description of the Drawings

An exemplary embodiment of the invention will now be described with reference to the drawings in which:

FIG. 1 shows a preferred embodiment of a microprocessor fail-safe system in accordance with the invention.

FIG. 2 shows an element of the system of FIG. 1 in greater detail.

Detailed Description of a Preferred Embodiment

Referring to FIG. 1, there is shown a microcontroller fail-safe system 5, comprising a main microcontroller module 10, a redundant microcontroller module 100 and an interface unit 200.

The main microcontroller module 10 comprises a Central Processing Unit (CPU) 20, a main memory 30, a peripheral device 40, an interrupt controller 50, and a bus extension 70, all interconnected by a main bus 80.

The interrupt controller 50 is a standard embedded control interrupt controller. It synchronises the external asynchronous interrupt events and activates on-chip synchronous interrupt lines (not shown).

The redundant microcontroller module 100 comprises a redundant CPU 120, a safety bus address generator 150, and a redundant peripheral device 140 interconnected by a redundant bus 180, and a safety memory 130 connected to a safety bus 190 to be further described below.

The redundant CPU 120 is a standard modular CPU core, substantially identical to the main CPU 20, and is connected to the redundant bus 180 as it would be in a single (non redundant) CPU system. No special code has to be included to control or program the redundant CPU 120.

The redundant peripheral device 140 gives the redundant CPU 120 the ability to disable some features/actuators of the system 5 in case the peripheral device 40 fails. This provides system redundancy beyond that of the CPU redundancy.

The safety bus 190 is coupled to the peripheral device 40, the safety memory 130 and the fail-safe module 210, and is arranged to provide safety data, such as parity bits derived from data written to the main bus 80 by peripherals such as the peripheral device 40. The safety memory 130 is arranged to store this safety bus data. The safety bus address generator 150 is coupled to the main bus 80 and the safety bus 190. It addresses the safety memory 130 and initiates safety memory 130 read and write operations.

The interface unit 200 comprises a fail-safe module 210 and a frequency modulated phase lock loop (PLL) 220. The PLL 220 generates two clocks: a main clock 230, which provides clock signals for the main microcontroller module 10, and a safety clock 240, which provides clock signals for the redundant microcontroller module 100. The safety clock 240 is frequency modulated. This introduces a time dependent phase delay between the main CPU 20 and the redundant CPU 120.

The fail-safe module 210 is coupled to the main bus 80, the redundant bus 180, the safety bus 190, the main clock 230 and the safety clock 240. The fail-safe module 210 includes a bus delay unit 300, a compare unit 310, a test unit 320, an error unit 330, an interrupt delay unit 340 and a safety bus unit 350.

The bus delay unit 300 is coupled between the main bus 80 and the redundant bus 180, and is arranged to receive signals from the main bus 80, and to pass these to the redundant bus 180 with the incorporation of a fixed length delay and a time dependent delay. The time dependent delay is provided as a result of the synchronisation between the two clocks of the PLL 220. In this way the redundant CPU 120 receives the same data as the main CPU 20, but delayed by the fixed length delay and the time dependent delay. To increase security, this fixed length delay should be an odd multiple of half the execution time of the longest instruction processed by the main CPU 20. To deliver all data from the peripheral device 40 and the main memory 30 to the redundant CPU 120, all main bus 80 activities have to be delayed and written to the redundant bus 180. Any data written by the main CPU 20 is also delayed but not written to the redundant bus 180. Instead, it is used by the compare unit 310 as described below.
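The fixed length delay described above can be illustrated with a minimal software sketch. It is not part of the disclosed hardware: the names `fixed_delay` and `BusDelayLine` are hypothetical, time is counted in whole clock steps, and the time dependent delay contributed by the PLL 220 is deliberately left out.

```python
from collections import deque


def fixed_delay(longest_instruction_time: float, odd_multiple: int = 1) -> float:
    """Fixed delay as an odd multiple of half the longest instruction time."""
    if odd_multiple < 1 or odd_multiple % 2 == 0:
        raise ValueError("odd_multiple must be a positive odd integer")
    return odd_multiple * longest_instruction_time / 2


class BusDelayLine:
    """Hypothetical FIFO model: each main-bus value emerges `depth` steps later."""

    def __init__(self, depth: int) -> None:
        # Pre-fill with None so the first `depth` outputs are empty slots.
        self._fifo = deque([None] * depth)

    def step(self, bus_value):
        """Push the current main-bus value; return the value from `depth` steps ago."""
        self._fifo.append(bus_value)
        return self._fifo.popleft()


# Assumed figures for illustration only: longest instruction takes 8 time units.
delay_time = fixed_delay(8, odd_multiple=3)

line = BusDelayLine(depth=3)
delayed = [line.step(v) for v in ["a", "b", "c", "d", "e"]]
```

With a depth of three steps, the redundant bus in this model sees nothing for the first three steps and then receives the main-bus values in their original order.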

The compare unit 310 is coupled to the bus delay unit 300, and is arranged to compare the delayed data written by the main CPU 20 with non-delayed data written by the redundant CPU 120. Since the delay time of the delayed data is the same as the processing delay of the redundant CPU 120, there will be an identical match if both the main CPU 20 and the redundant CPU 120 have operated correctly. If the compare fails, either a CPU error or a compare unit 310 error has occurred, and an error output line to the error unit 330 is driven. The fail-safe module 210 has to have built-in redundancy to detect errors within this circuitry, and this is provided by the test unit 320, which is coupled to each of the units within the fail-safe module 210.
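As a rough illustration of this comparison, the sketch below delays the main CPU write stream by a fixed number of steps and checks it against the redundant CPU write stream. The function name `compare_streams`, the list-based bus model and the use of `None` for an idle bus slot are assumptions made for the example, not features of the described circuit.

```python
from collections import deque
from typing import List, Optional, Sequence


def compare_streams(
    main_writes: Sequence[Optional[int]],
    redundant_writes: Sequence[Optional[int]],
    delay: int,
) -> List[int]:
    """Return the step indices at which the delayed main write and the
    redundant write disagree. None models an idle bus slot."""
    fifo = deque([None] * delay)
    mismatches = []
    for step, (main_value, redundant_value) in enumerate(
        zip(main_writes, redundant_writes)
    ):
        fifo.append(main_value)
        delayed_main = fifo.popleft()  # main write from `delay` steps ago
        if delayed_main != redundant_value:
            mismatches.append(step)
    return mismatches


# The redundant CPU runs two steps behind the main CPU.
main = [1, 2, 3, None, None]
good_redundant = [None, None, 1, 2, 3]
bad_redundant = [None, None, 1, 9, 3]  # one corrupted write

ok = compare_streams(main, good_redundant, delay=2)
faulty = compare_streams(main, bad_redundant, delay=2)
```

An empty result corresponds to the identical match expected when both CPUs operate correctly; any entry would correspond to the error output line being driven.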

The error unit 330 is coupled to receive error signals from the compare unit 310, from the test unit 320, and from the safety bus unit 350. If the compare unit 310 detects an error in the correlation of the redundant bus 180 and the main bus 80 activities, if the safety bus unit 350 detects an error in the correlation between the safety bus 190 activities and the main bus 80 activities, or if the safety bus unit 350 detects an error in the test unit 320 activities (an error in the compare unit 310), an output 335 is driven to notify the system that the microcontroller system 5 is not running correctly. The output 335 could be used to reset the system 5 and to put system peripherals such as actuators or relays into a secure state, for example, to switch off ABS actuators.
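The way the error unit 330 combines its inputs can be sketched as a simple logical OR. The class name `ErrorUnit` and the attribute `output_335` are illustrative, and the latching behaviour (the output stays driven once set) is an assumption that the description does not state.

```python
from dataclasses import dataclass


@dataclass
class ErrorUnit:
    """Hypothetical model of the error unit: ORs three error inputs."""

    output_335: bool = False  # assumed to latch once driven

    def update(
        self, compare_error: bool, test_error: bool, safety_bus_error: bool
    ) -> bool:
        """Drive output 335 if any error source reports a fault."""
        if compare_error or test_error or safety_bus_error:
            self.output_335 = True
        return self.output_335


unit = ErrorUnit()
before = unit.update(False, False, False)  # no fault reported
during = unit.update(False, False, True)   # safety bus unit reports a fault
after = unit.update(False, False, False)   # output remains driven (assumed latch)
```

In a real system the driven output would be the point at which a reset or a secure actuator state is requested.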

The interrupt delay unit 340 is coupled between the main bus 80 and the redundant bus 180, and is arranged to handle all asynchronous external and internal interrupts. Interrupts are synchronised and put onto the main bus 80 by the interrupt controller 50. The interrupt delay unit 340 delays all these interrupts on the main bus 80 by the fixed length delay and the time dependent delay, and writes the delayed interrupts to the redundant bus 180, in the same manner as the delayed data/addresses handled by the bus delay unit 300.

The safety bus unit 350 generates redundant data of the main bus 80 onto the safety bus 190, if the originator of the data is the main CPU 20. It also checks the safety bus for redundant data generated by the peripherals such as the peripheral device 40 or the safety memory 130. Redundant information could be, for example, parity bits, CRC bits etc. The safety bus unit 350 then detects errors due to bus data distortion, peripheral data distortion, or memory data distortion.

In operation, the main CPU 20 processes a standard processing algorithm/program which is stored in the main memory 30. The main CPU 20 addresses the peripheral device 40 and the main memory 30, via the main bus 80, to obtain the program and necessary data and to write back outputs/results.

The redundant CPU 120 reprocesses the same standard processing algorithm/program delayed by the fixed delay and the time dependent delay. The fixed time delay ensures that the main CPU 20 and the redundant CPU 120 are never in the same state of processing at the same time. This prevents common mode errors (errors that are triggered by the same event and cause the same error result in two different circuitries). The time dependent delay decreases EMI radiation of the entire fail-safe microcontroller system 5 by broadening the characteristic spectral peaks and therefore decreasing radiated energy density. Furthermore, the time dependent delay inserts a slight variation into the phase relationship between the main CPU 20 and the redundant CPU 120. This reduces the probability of common mode errors arising from similar or duplicate instructions at different stages of the program stored in the main memory 30, which are processed by the main CPU 20 and the redundant CPU 120 at the same time and might lead to the same error results.
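The effect of the fixed delay on common mode errors can be illustrated with a toy model. Everything here is an assumption made for the example: the "CPU" is a simple 8-bit accumulator, a glitch flips one bit of whatever result is being produced at that instant, and the function names `run` and `glitch_detected` are hypothetical.

```python
from typing import List, Optional, Sequence


def run(program: Sequence[int], glitch_step: Optional[int] = None) -> List[int]:
    """Toy CPU: accumulate operands modulo 256 and output each result.
    A glitch at `glitch_step` flips bit 0 of the accumulator."""
    acc = 0
    outputs = []
    for step, operand in enumerate(program):
        acc = (acc + operand) & 0xFF
        if step == glitch_step:
            acc ^= 0x01
        outputs.append(acc)
    return outputs


def glitch_detected(program: Sequence[int], glitch_time: int, delay: int) -> bool:
    """One glitch hits both CPUs at the same wall-clock time. The redundant
    CPU runs `delay` steps behind, so it is hit at an earlier program step."""
    main_out = run(program, glitch_step=glitch_time)
    redundant_step = glitch_time - delay
    redundant_out = run(
        program, glitch_step=redundant_step if redundant_step >= 0 else None
    )
    # After realignment by the delay unit, the compare sees both sequences.
    return main_out != redundant_out


program = [1, 2, 3, 4, 5]
lockstep = glitch_detected(program, glitch_time=3, delay=0)
offset = glitch_detected(program, glitch_time=3, delay=2)
```

In this model a glitch with no delay corrupts both result streams identically and goes unnoticed, whereas with a two-step offset the two streams diverge and the comparison fails.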

The fail-safe module 210 buffers all data and addresses on the main bus 80. Data and addresses generated by the main CPU 20 are buffered/delayed and compared to the data/addresses generated by the redundant CPU 120. Data/addresses originated by the peripheral device 40 and the main memory 30 are buffered/delayed and forwarded to the redundant bus 180. Asynchronous interrupt events are synchronised by the interrupt controller 50 and also buffered/delayed by the interrupt delay unit 340 of the fail-safe module 210.

When data is sent on the main bus 80 to the main memory 30 or to a memory/register within the peripheral device 40, the safety memory 130 stores redundant data relating to the data on the main bus 80. When data is subsequently retrieved from the main memory 30 or the peripheral device 40, the safety data stored in the safety memory 130 is passed to the safety bus unit 350 of the fail-safe module 210 via the safety bus 190. The safety bus address generator 150 addresses and initiates these write and read operations to and from the safety memory 130. The safety bus unit 350 then compares the redundant data with the data being written to the main bus 80. If the redundant data does not agree with the data being written to the main bus 80 (for example if the parity bit does not match the data value), then the safety bus unit 350 sends an error signal to the error unit 330, signifying that data/address distortion has occurred either on the main bus 80, within the main memory 30, within the safety bus address generator 150, or within the peripheral device 40.
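A minimal sketch of this write-then-check sequence follows, assuming a single even-parity bit per data word as the redundant data. The class `SafetyMemory`, its method names and the dictionary-based storage are hypothetical and stand in for the safety memory 130, the safety bus address generator 150 and the safety bus unit 350 taken together.

```python
def parity_bit(word: int) -> int:
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


class SafetyMemory:
    """Hypothetical model: stores one parity bit per main-bus address."""

    def __init__(self) -> None:
        self._parity = {}

    def on_write(self, address: int, data: int) -> None:
        """On a main-bus write, store the redundant data for that address."""
        self._parity[address] = parity_bit(data)

    def check_read(self, address: int, data: int) -> bool:
        """On a later read, return True if the data agrees with the stored parity."""
        return self._parity[address] == parity_bit(data)


safety = SafetyMemory()
safety.on_write(0x10, 0b1011)

clean_read = safety.check_read(0x10, 0b1011)      # data unchanged
distorted_read = safety.check_read(0x10, 0b1010)  # one bit flipped in storage or on the bus
```

A single parity bit only catches an odd number of flipped bits; the description also mentions CRC bits as an alternative form of redundant information.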

No special test and debug features have to be included. A standard scan path, for example a background debug mode, is included to identify and verify field return defects during the lifetime of the system 5. This can be multiplexed with the respective lines for the main CPU 20. Functional factory testing can be done fully transparently. Functional tests run on the main CPU 20 are performed in parallel (with a delay due to the fail-safe module delay lines) on the redundant CPU 120. The fail-safe module 210 verifies the functional correctness of the redundant CPU 120 by comparing its results with the results of the main CPU 20. Additional tests using a scan path have to be performed to test the fail-safe module 210.

It will be appreciated that alternative embodiments to the one described above are possible. For example, the safety bus 190 and safety memory 130 could be arranged to operate with the same delay as the redundant CPU 120.

Furthermore, to provide further safety, two compare units could be used within the fail-safe module 210. The output of these two units would then be logically "XOR" (exclusively "OR") combined and arranged to drive the output. To increase the robustness of this redundancy even more, a delay between these two compare units could be provided.

In addition, the safety bus address generator 150 could be incorporated as part of the fail safe module 210.

Claims

1. A microcontroller fail-safe system comprising: a bus; a primary microprocessor coupled to the bus, and arranged for processing unprocessed signals received therefrom and for providing primary processed signals; delay means coupled to receive the unprocessed and the processed signals from the bus and arranged for delaying the signals by a predetermined period, to provide delayed signals therefrom; a secondary microprocessor coupled to the delay means and arranged for processing the delayed unprocessed signals received therefrom and for providing secondary processed signals; and, comparing means coupled to the delay means and to the secondary microprocessor, for comparing the delayed primary processed signals and secondary processed signals; whereby the compared signals are synchronised through having experienced the same delay.

2. The microcontroller fail-safe system of claim 1 further comprising a main memory coupled to the main bus, for storing a program to be executed by the primary microprocessor.

3. The microcontroller fail-safe system of claim 2 further comprising a safety bus coupled to the main memory and to the comparing means, and a safety memory for storing redundant data related to data stored in the main memory, such that when data stored in the main memory is written to the bus, the redundant data in the safety memory is written to the comparing means via the safety bus.

4. The microcontroller fail-safe system of claim 3 wherein the safety bus is coupled to the delay means, and wherein the redundant data is also subject to the delay before being stored in the safety memory.

5. The microcontroller fail-safe system of any preceding claim wherein the primary microprocessor is coupled to a primary peripheral device via the bus.

6. The microcontroller fail-safe system of claim 5 wherein the secondary microprocessor is coupled to a secondary peripheral device via a secondary bus, and wherein the secondary peripheral device duplicates the operation of the primary peripheral device.

7. The microcontroller fail-safe system of any preceding claim wherein the delay comprises a fixed portion and a time dependent portion.

8. A microcontroller fail-safe system substantially as hereinbefore described and with reference to the drawings.