CA2657882A1 - Fault tolerance and failover using active copy-cat - Google Patents
Fault tolerance and failover using active copy-cat
- Publication number
- CA2657882A1
- Authority
- CA
- Canada
- Prior art keywords
- primary
- instance
- input
- backup
- copy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1629—Error detection by comparing the output of redundant processing systems
- G06F11/1641—Error detection by comparing the output of redundant processing systems where the comparison is not performed by the redundant processing components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1675—Temporal synchronisation or re-synchronisation of redundant processing components
- G06F11/1687—Temporal synchronisation or re-synchronisation of redundant processing components at event level, e.g. by interrupt or result of polling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1695—Error detection or correction of the data by redundancy in hardware which are operating with time diversity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2028—Failover techniques eliminating a faulty processor or activating a spare
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2038—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1479—Generic software techniques for error detection or fault masking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2046—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Hardware Redundancy (AREA)
Abstract
Fault tolerant operation is disclosed for a primary instance, such as a process, thread, application, processor, etc., using an active copy-cat instance, a.k.a. backup instance, that mirrors operations in the primary instance, but only after those operations have successfully completed in the primary instance. Fault tolerant logic monitors inputs and outputs of the primary instance and gates those inputs to the backup instance once a given input has been processed. The outputs of the backup instance are then compared with the outputs of the primary instance to ensure correct operation. The disclosed embodiments further relate to a fault tolerant failover mechanism allowing the backup instance to take over for the primary instance in a fault situation wherein the primary and backup instances are loosely coupled, i.e. they need not be aware that they are operating in a fault tolerant environment. As such, the primary instance need not be specifically designed or programmed to interact with the fault tolerant mechanisms. Instead, the primary instance need only be designed to adhere to specific basic operating guidelines and shut itself down when it cannot do so. By externally controlling the ability of the primary instance to successfully adhere to its operating guidelines, the fault tolerant mechanisms of the disclosed embodiments can recognize error conditions and easily failover from the primary instance to the backup instance.
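The gating and comparison described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `CopyCatGate`, its method names, and the assumption that the primary answers inputs in the order they arrived are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class CopyCatGate:
    """Hypothetical fault tolerant logic: holds a copy of each input until the
    primary has answered it, then replays it to the backup and compares results."""

    def __init__(self, backup: Callable[[str], str]) -> None:
        self.backup = backup
        # Copies of inputs the primary has received but not yet answered.
        self.pending: Deque[str] = deque()
        # (input, primary result, backup result) for every disagreement.
        self.mismatches: List[Tuple[str, str, str]] = []

    def on_input(self, message: str) -> None:
        # The copy is queued, not forwarded: the backup must not run ahead.
        self.pending.append(message)

    def on_primary_output(self, primary_result: str) -> bool:
        # Assumption: outputs arrive in input order, so the oldest pending
        # input is the one the primary has just finished processing.
        message = self.pending.popleft()
        backup_result = self.backup(message)
        if backup_result != primary_result:
            self.mismatches.append((message, primary_result, backup_result))
            return False
        return True


# Usage: the backup here is just str.upper, standing in for a real instance.
gate = CopyCatGate(backup=str.upper)
gate.on_input("buy 10")
print(gate.on_primary_output("BUY 10"))  # results agree
```

Because the backup only ever sees inputs the primary has already completed, an input that crashes the primary is never replayed automatically to the backup, which is the property the abstract emphasizes.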
Claims (21)
1. A method of providing fault tolerance to a primary instance, the method comprising:
receiving a copy of a first input transmitted to the primary instance;
forwarding the copy of the first input to a backup instance operative to generate a first backup result based on the forwarded copy of the first input;
waiting, in response to the receiving of the copy of the first input, for the primary instance to transmit a first primary result based on the first input;
determining that the first primary result is not likely to be transmitted;
preventing, based on the determination that the first primary result is not likely to be transmitted, the primary instance from completing a transaction that the primary instance is supposed to complete to continue operating; and
transmitting the first backup result when the first primary result is not likely to be transmitted.
2. The method of claim 1 further comprising:
receiving a copy of a second input transmitted to the primary instance, the copy of the second input being received subsequent to the copy of the first input;
waiting, in response to the receiving of the copy of the second input, for the primary instance to transmit a second primary result based on the second input;
wherein the forwarding further comprises forwarding the copy of the first input to the backup instance upon the transmission of the first and second primary results by the primary instance; and
comparing the first primary result with the first backup result and indicating a failure of the backup instance, the primary instance, or a combination thereof, when the first primary result is at least partially different from the first backup result.
3. The method of claim 1, wherein the waiting further comprises waiting for a defined period of time to elapse from when the first input is received, wherein the determining further comprises determining that the first primary result has not been received before the defined period of time has elapsed.
4. The method of claim 1, wherein the transaction comprises a store transaction to a database, the preventing further comprising preventing completion of the store transaction.
5. The method of claim 4, wherein the preventing further comprises causing the database to return a constraint violation in response to the store transaction.
6. The method of claim 1, wherein the primary instance comprises a match server of a financial exchange.
7. The method of claim 1, wherein the primary instance comprises a software application.
8. The method of claim 1, wherein the primary instance comprises a processor.
9. A system for providing fault tolerance to a primary instance, the system comprising:
a receiver operative to receive a copy of a first input transmitted to the primary instance;
an input forwarder coupled with the receiver and operative to forward the copy of the first input to a backup instance, the backup instance being operative to generate a first backup result based on the forwarded copy of the first input;
a fault detector coupled with the receiver and the primary instance and operative to wait, in response to the receiving of the copy of the first input, for the primary instance to transmit a first primary result based on the first input, wherein the fault detector is further operative to determine that the first primary result is not likely to be transmitted;
a transaction inhibitor coupled with the fault detector and operative to prevent, based on the determination that the first primary result is not likely to be transmitted, the primary instance from completing a transaction that the primary instance is supposed to complete to continue operating; and
a transmitter coupled with the backup instance and the fault detector and operative to transmit the first backup result when the first primary result is not likely to be transmitted.
10. The system of claim 9 wherein:
the receiver is further operative to receive a copy of a second input transmitted to the primary instance, the copy of the second input being received subsequent to the copy of the first input;
the fault detector is further operative to wait, in response to the receiving of the copy of the second input, for the primary instance to transmit a second primary result based on the second input;
wherein the input forwarder is further operative to forward the copy of the first input to the backup instance upon the transmission of the first and second primary results by the primary instance;
the system further comprising:
a comparator coupled with the primary instance and the backup instance and operative to compare the first primary result with the first backup result and indicate a failure of the backup instance, the primary instance or a combination thereof, when the first primary result is at least partially different from the first backup result.
11. The system of claim 9, wherein the fault detector is further operative to wait for a defined period of time to elapse from when the first input is received and determine that the first primary result has not been received before the defined period of time has elapsed.
12. The system of claim 9, wherein the transaction comprises a store transaction to a database, the transaction inhibitor being further operative to prevent completion of the store transaction.
13. The system of claim 12, wherein the transaction inhibitor is further operative to cause the database to return a constraint violation in response to the store transaction.
14. The system of claim 9, wherein the primary instance comprises a match server of a financial exchange.
15. The system of claim 9, wherein the primary instance comprises a software application.
16. The system of claim 9, wherein the primary instance comprises a processor.
17. A system for providing fault tolerance to a primary instance, the system comprising:
a processor;
a memory coupled with the processor;
first logic stored in the memory and executable by the processor to receive a copy of a first input transmitted to the primary instance;
second logic stored in the memory, coupled with the first logic and executable by the processor to forward the copy of the first input to a backup instance, the backup instance being operative to generate a first backup result based on the forwarded copy of the first input;
fourth logic stored in the memory, coupled with the first logic and executable by the processor to wait, in response to the receiving of the copy of the first input, for the primary instance to transmit a first primary result based on the first input, wherein the fourth logic is further executable by the processor to determine that the first primary result is not likely to be transmitted;
fifth logic stored in the memory, coupled with the fourth logic and executable by the processor to prevent, based on the fourth logic determining that the first primary result is not likely to be transmitted, the primary instance from completing a transaction that the primary instance is supposed to complete to continue operating; and
sixth logic stored in the memory, coupled with the backup instance and the fourth logic and executable by the processor to transmit the first backup result when the first primary result is not likely to be transmitted.
18. The system of claim 17 wherein:
the first logic is further executable by the processor to receive a copy of a second input transmitted to the primary instance, the copy of the second input being received subsequent to the copy of the first input;
the fourth logic is further executable by the processor to wait, in response to the receiving of the copy of the second input, for the primary instance to transmit a second primary result based on the second input;
wherein the second logic is further executable by the processor to forward the copy of the first input to the backup instance upon the transmission of the first and second primary results by the primary instance;
the system further comprising:
seventh logic stored in the memory, coupled with the primary instance and the backup instance and executable by the processor to compare the first primary result with the first backup result and indicate a failure of the backup instance, the primary instance or a combination thereof, when the first primary result is at least partially different from the first backup result.
19. The system of claim 17, wherein the primary instance comprises a first execution by the processor of seventh logic stored in the memory and executable by the processor, the backup instance comprising a second execution of the seventh logic by the processor.
20. A system for providing fault tolerance to a primary instance means, the system comprising:
means for receiving a copy of a first input transmitted to the primary instance;
means for forwarding, coupled with the means for receiving, the copy of the first input to a backup instance means operative to generate a first backup result based on the forwarded copy of the first input;
means for waiting, in response to the receiving of the copy of the first input, for the primary instance to transmit a first primary result based on the first input, the means for waiting being coupled with the means for receiving;
means for determining, coupled with the means for receiving and the means for waiting, that the first primary result is not likely to be transmitted;
means for preventing, coupled with the means for determining, the primary instance means from completing a transaction that the primary instance means is supposed to complete to continue operating, based on the determination that the first primary result is not likely to be transmitted; and
means for transmitting, coupled with the means for determining and the backup instance means, for transmitting the first backup result when the first primary result is not likely to be transmitted.
21. The system of claim 20 further comprising:
means for receiving a copy of a second input transmitted to the primary instance means, the copy of the second input being received subsequent to the copy of the first input;
means for waiting, in response to the receiving of the copy of the second input, for the primary instance means to transmit a second primary result based on the second input; and
wherein the means for forwarding further comprises means for forwarding the copy of the first input to the backup instance means upon the transmission of the first and second primary results by the primary instance;
the system further comprising:
means for comparing the first primary result with the first backup result and indicating a failure of the backup instance, the primary instance or a combination thereof, when the first primary result is at least partially different from the first backup result, the means for comparing being coupled with the primary instance means and the backup instance means.
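Illustrative sketch (not part of the claims): the failover path recited in claims 1 and 3 to 5 (wait for a defined period, decide the primary result is unlikely to arrive, block the primary's store transaction with a constraint violation, then transmit the backup result) might look roughly like the following Python. The `results` table, the function names, and the use of an in-memory SQLite database with a primary-key conflict as the constraint violation are assumptions made for illustration, not details from the patent.

```python
import sqlite3
import time
from typing import Callable, Optional


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per input sequence number.
    db.execute(
        "CREATE TABLE results (seq INTEGER PRIMARY KEY, writer TEXT NOT NULL)"
    )
    return db


def primary_store(db: sqlite3.Connection, seq: int) -> bool:
    """Store transaction the primary must complete to continue operating."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO results (seq, writer) VALUES (?, 'primary')", (seq,)
            )
        return True
    except sqlite3.IntegrityError:
        # Constraint violation: the primary is expected to shut itself down.
        return False


def inhibit_primary(db: sqlite3.Connection, seq: int) -> bool:
    """Claim the row first so a later INSERT by the primary violates the key.
    Returns False if the primary had already completed its store."""
    try:
        with db:
            db.execute(
                "INSERT INTO results (seq, writer) VALUES (?, 'backup')", (seq,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def await_or_fail_over(
    db: sqlite3.Connection,
    seq: int,
    poll_primary: Callable[[], Optional[str]],
    backup_result: str,
    timeout_s: float,
) -> str:
    """Wait up to timeout_s for the primary result, else fail over."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = poll_primary()
        if result is not None:
            return result  # primary answered in time
        time.sleep(0.001)
    # Primary result is not likely to be transmitted: block its store
    # transaction, then transmit the backup result in its place.
    inhibit_primary(db, seq)
    return backup_result


# Usage: a primary that never answers is failed over and then locked out.
db = make_db()
print(await_or_fail_over(db, 1, lambda: None, "backup-result", 0.01))
print(primary_store(db, 1))  # the primary's store is now rejected
```

The design point this sketch tries to capture is that the primary needs no knowledge of the fault tolerant logic. It only has to treat a failed store as fatal, so controlling the database's response is enough to take it out of service.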
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/502,998 US7480827B2 (en) | 2006-08-11 | 2006-08-11 | Fault tolerance and failover using active copy-cat |
US11/502,998 | 2006-08-11 | ||
PCT/US2007/073141 WO2008021636A2 (en) | 2006-08-11 | 2007-07-10 | Fault tolerance and failover using active copy-cat |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2657882A1 (en) | 2008-02-21 |
CA2657882C (en) | 2012-02-07 |
Family
ID=39082842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA2657882A (Active; CA2657882C (en)) | Fault tolerance and failover using active copy-cat | 2006-08-11 | 2007-07-10 |
Country Status (4)
Country | Link |
---|---|
US (4) | US7480827B2 (en) |
EP (2) | EP3118743B1 (en) |
CA (1) | CA2657882C (en) |
WO (1) | WO2008021636A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11641395B2 (en) * | 2019-07-31 | 2023-05-02 | Stratus Technologies Ireland Ltd. | Fault tolerant systems and methods incorporating a minimum checkpoint interval |
Families Citing this family (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7177866B2 (en) * | 2001-03-16 | 2007-02-13 | Gravic, Inc. | Asynchronous coordinated commit replication and dual write with replication transmission and locking of target database on updates only |
US20060271663A1 (en) * | 2005-05-31 | 2006-11-30 | Fabio Barillari | A Fault-tolerant Distributed Data Processing System |
US7725764B2 (en) * | 2006-08-04 | 2010-05-25 | Tsx Inc. | Failover system and method |
US7434096B2 (en) | 2006-08-11 | 2008-10-07 | Chicago Mercantile Exchange | Match server for a financial exchange having fault tolerant operation |
US7480827B2 (en) * | 2006-08-11 | 2009-01-20 | Chicago Mercantile Exchange | Fault tolerance and failover using active copy-cat |
US8041985B2 (en) * | 2006-08-11 | 2011-10-18 | Chicago Mercantile Exchange, Inc. | Match server for a financial exchange having fault tolerant operation |
JP2008123357A (en) * | 2006-11-14 | 2008-05-29 | Honda Motor Co Ltd | Parallel computer system, parallel computing method, and program for parallel computer |
US7617413B2 (en) * | 2006-12-13 | 2009-11-10 | Inventec Corporation | Method of preventing erroneous take-over in a dual redundant server system |
JP4295326B2 (en) * | 2007-01-10 | 2009-07-15 | 株式会社日立製作所 | Computer system |
JP5152175B2 (en) * | 2007-03-20 | 2013-02-27 | 富士通モバイルコミュニケーションズ株式会社 | Information processing device |
US7929418B2 (en) * | 2007-03-23 | 2011-04-19 | Hewlett-Packard Development Company, L.P. | Data packet communication protocol offload method and system |
US20090055689A1 (en) * | 2007-08-21 | 2009-02-26 | International Business Machines Corporation | Systems, methods, and computer products for coordinated disaster recovery |
US7805632B1 (en) * | 2007-09-24 | 2010-09-28 | Net App, Inc. | Storage system and method for rapidly recovering from a system failure |
US7840839B2 (en) * | 2007-11-06 | 2010-11-23 | Vmware, Inc. | Storage handling for fault tolerance in virtual machines |
JP4491482B2 (en) * | 2007-11-28 | 2010-06-30 | 株式会社日立製作所 | Failure recovery method, computer, cluster system, management computer, and failure recovery program |
JP5213108B2 (en) * | 2008-03-18 | 2013-06-19 | 株式会社日立製作所 | Data replication method and data replication system |
US7792897B2 (en) * | 2008-06-02 | 2010-09-07 | International Business Machines Corporation | Distributed transaction processing system |
US8370679B1 (en) * | 2008-06-30 | 2013-02-05 | Symantec Corporation | Method, apparatus and system for improving failover within a high availability disaster recovery environment |
US8676760B2 (en) | 2008-08-05 | 2014-03-18 | International Business Machines Corporation | Maintaining data integrity in data servers across data centers |
US20100107154A1 (en) * | 2008-10-16 | 2010-04-29 | Deepak Brahmavar | Method and system for installing an operating system via a network |
US9330100B2 (en) * | 2009-02-26 | 2016-05-03 | Red Hat, Inc. | Protocol independent mirroring |
US8369968B2 (en) * | 2009-04-03 | 2013-02-05 | Dell Products, Lp | System and method for handling database failover |
US8201169B2 (en) * | 2009-06-15 | 2012-06-12 | Vmware, Inc. | Virtual machine fault tolerance |
KR101167938B1 (en) * | 2009-09-22 | 2012-08-03 | 엘지전자 주식회사 | Method for using rights to contents |
US20110179303A1 (en) * | 2010-01-15 | 2011-07-21 | Microsoft Corporation | Persistent application activation and timer notifications |
US9053073B1 (en) * | 2011-04-18 | 2015-06-09 | American Megatrends, Inc. | Use of timestamp logic in synchronous replication |
WO2013015777A1 (en) | 2011-07-25 | 2013-01-31 | Hewlett-Packard Development Company, L.P. | Transferring a conference session between conference servers due to failure |
US9063822B2 (en) * | 2011-09-02 | 2015-06-23 | Microsoft Technology Licensing, Llc | Efficient application-aware disaster recovery |
US9087019B2 (en) | 2012-01-27 | 2015-07-21 | Promise Technology, Inc. | Disk storage system with rebuild sequence and method of operation thereof |
KR101322401B1 (en) * | 2012-01-31 | 2013-10-28 | 주식회사 알티베이스 | Apparatus and method for parallel processing in database management system for synchronous replication |
US9836353B2 (en) | 2012-09-12 | 2017-12-05 | International Business Machines Corporation | Reconstruction of system definitional and state information |
US8874956B2 (en) * | 2012-09-18 | 2014-10-28 | Datadirect Networks, Inc. | Data re-protection in a distributed replicated data storage system |
US9436407B1 (en) * | 2013-04-10 | 2016-09-06 | Amazon Technologies, Inc. | Cursor remirroring |
EP2987090B1 (en) * | 2013-04-16 | 2019-03-27 | EntIT Software LLC | Distributed event correlation system |
CA2911001C (en) * | 2013-06-13 | 2019-11-19 | Tsx Inc. | Failover system and method |
US10148523B1 (en) * | 2013-09-16 | 2018-12-04 | Amazon Technologies, Inc. | Resetting computing resources in a service provider network |
US11037239B2 (en) | 2013-11-07 | 2021-06-15 | Chicago Mercantile Exchange Inc. | Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance |
US10366452B2 (en) | 2013-11-07 | 2019-07-30 | Chicago Mercantile Exchange Inc. | Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance |
US10929926B2 (en) | 2013-11-07 | 2021-02-23 | Chicago Mercantile Exchange Inc. | Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance |
US20150127509A1 (en) | 2013-11-07 | 2015-05-07 | Chicago Mercantile Exchange Inc. | Transactionally Deterministic High Speed Financial Exchange Having Improved, Efficiency, Communication, Customization, Performance, Access, Trading Opportunities, Credit Controls, and Fault Tolerance |
US9691102B2 (en) | 2013-11-07 | 2017-06-27 | Chicago Mercantile Exchange Inc. | Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance |
US10332206B2 (en) | 2013-11-07 | 2019-06-25 | Chicago Mercantile Exchange Inc. | Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance |
US10467693B2 (en) | 2013-11-07 | 2019-11-05 | Chicago Mercantile Exchange Inc. | Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance |
US10692143B2 (en) | 2013-11-07 | 2020-06-23 | Chicago Mercantile Exchange Inc. | Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance |
JP6307858B2 (en) * | 2013-11-29 | 2018-04-11 | 富士通株式会社 | Transmission apparatus, transmission system, and monitoring control method |
EP3358466B1 (en) | 2013-12-12 | 2019-11-13 | Huawei Technologies Co., Ltd. | Data replication method and storage system |
US10108496B2 (en) | 2014-06-30 | 2018-10-23 | International Business Machines Corporation | Use of replicated copies to improve database backup performance |
US10402113B2 (en) | 2014-07-31 | 2019-09-03 | Hewlett Packard Enterprise Development Lp | Live migration of data |
WO2016036347A1 (en) | 2014-09-02 | 2016-03-10 | Hewlett Packard Enterprise Development Lp | Serializing access to fault tolerant memory |
WO2016064417A1 (en) | 2014-10-24 | 2016-04-28 | Hewlett Packard Enterprise Development Lp | End-to-end negative acknowledgment |
US20160132420A1 (en) * | 2014-11-10 | 2016-05-12 | Institute For Information Industry | Backup method, pre-testing method for environment updating and system thereof |
AU2015373914B2 (en) * | 2014-12-31 | 2017-09-07 | Servicenow, Inc. | Failure resistant distributed computing system |
CN104618155B (en) * | 2015-01-23 | 2018-06-05 | 华为技术有限公司 | A kind of virtual machine fault-tolerant method, apparatus and system |
US10409681B2 (en) | 2015-01-30 | 2019-09-10 | Hewlett Packard Enterprise Development Lp | Non-idempotent primitives in fault-tolerant memory |
WO2016122642A1 (en) | 2015-01-30 | 2016-08-04 | Hewlett Packard Enterprise Development Lp | Determine failed components in fault-tolerant memory |
US10402287B2 (en) | 2015-01-30 | 2019-09-03 | Hewlett Packard Enterprise Development Lp | Preventing data corruption and single point of failure in a fault-tolerant memory |
US10402261B2 (en) | 2015-03-31 | 2019-09-03 | Hewlett Packard Enterprise Development Lp | Preventing data corruption and single point of failure in fault-tolerant memory fabrics |
US10798146B2 (en) * | 2015-07-01 | 2020-10-06 | Oracle International Corporation | System and method for universal timeout in a distributed computing environment |
US11164248B2 (en) | 2015-10-12 | 2021-11-02 | Chicago Mercantile Exchange Inc. | Multi-modal trade execution with smart order routing |
US11288739B2 (en) | 2015-10-12 | 2022-03-29 | Chicago Mercantile Exchange Inc. | Central limit order book automatic triangulation system |
KR101758558B1 (en) * | 2016-03-29 | 2017-07-26 | 엘에스산전 주식회사 | Energy managemnet server and energy managemnet system having thereof |
CN105915375B (en) * | 2016-04-13 | 2019-06-07 | 北京交通大学 | The activestandby state management method of Dual-Computer Hot-Standby System |
US10580100B2 (en) | 2016-06-06 | 2020-03-03 | Chicago Mercantile Exchange Inc. | Data payment and authentication via a shared data structure |
US11514448B1 (en) | 2016-07-11 | 2022-11-29 | Chicago Mercantile Exchange Inc. | Hierarchical consensus protocol framework for implementing electronic transaction processing systems |
US10417217B2 (en) | 2016-08-05 | 2019-09-17 | Chicago Mercantile Exchange Inc. | Systems and methods for blockchain rule synchronization |
US10748210B2 (en) | 2016-08-09 | 2020-08-18 | Chicago Mercantile Exchange Inc. | Systems and methods for coordinating processing of scheduled instructions across multiple components |
US10943297B2 (en) | 2016-08-09 | 2021-03-09 | Chicago Mercantile Exchange Inc. | Systems and methods for coordinating processing of instructions across multiple components |
US10193634B2 (en) | 2016-09-19 | 2019-01-29 | Hewlett Packard Enterprise Development Lp | Optical driver circuits |
US10007582B2 (en) * | 2016-09-27 | 2018-06-26 | International Business Machines Corporation | Rebuild rollback support in distributed SDS systems |
US10270646B2 (en) | 2016-10-24 | 2019-04-23 | Servicenow, Inc. | System and method for resolving master node failures within node clusters |
US10326862B2 (en) | 2016-12-09 | 2019-06-18 | Chicago Mercantile Exchange Inc. | Distributed and transactionally deterministic data processing architecture |
US10467113B2 (en) | 2017-06-09 | 2019-11-05 | Hewlett Packard Enterprise Development Lp | Executing programs through a shared NVM pool |
WO2018229930A1 (en) * | 2017-06-15 | 2018-12-20 | 株式会社日立製作所 | Controller |
US10389342B2 (en) | 2017-06-28 | 2019-08-20 | Hewlett Packard Enterprise Development Lp | Comparator |
CN110019502B (en) | 2017-08-29 | 2023-03-21 | 阿里巴巴集团控股有限公司 | Synchronization method between primary database and backup database, database system and device |
US10831619B2 (en) * | 2017-09-29 | 2020-11-10 | Oracle International Corporation | Fault-tolerant stream processing |
CN108011698B (en) * | 2017-11-13 | 2020-05-22 | 北京全路通信信号研究设计院集团有限公司 | RSSP-I secure communication method based on dual-system synchronization |
CN112236792A (en) * | 2018-06-06 | 2021-01-15 | E·马伊姆 | Secure transaction system in P2P architecture |
CN112130521B (en) * | 2020-09-22 | 2021-04-20 | 郑州嘉晨电器有限公司 | Control device for safety control of engineering machinery |
Family Cites Families (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4228496A (en) | 1976-09-07 | 1980-10-14 | Tandem Computers Incorporated | Multiprocessor system |
US4590554A (en) | 1982-11-23 | 1986-05-20 | Parallel Computers Systems, Inc. | Backup fault tolerant computer system |
US4674044A (en) * | 1985-01-30 | 1987-06-16 | Merrill Lynch, Pierce, Fenner & Smith, Inc. | Automated securities trading system |
US5088021A (en) * | 1989-09-07 | 1992-02-11 | Honeywell, Inc. | Apparatus and method for guaranteed data store in redundant controllers of a process control system |
JPH05128080A (en) | 1991-10-14 | 1993-05-25 | Mitsubishi Electric Corp | Information processor |
US5363503A (en) | 1992-01-22 | 1994-11-08 | Unisys Corporation | Fault tolerant computer system with provision for handling external events |
US5715386A (en) * | 1992-09-30 | 1998-02-03 | Lucent Technologies Inc. | Apparatus and methods for software rejuvenation |
EP0593062A3 (en) * | 1992-10-16 | 1995-08-30 | Siemens Ind Automation Inc | Redundant networked database system |
US5621885A (en) | 1995-06-07 | 1997-04-15 | Tandem Computers, Incorporated | System and method for providing a fault tolerant computer program runtime support environment |
US5978933A (en) * | 1996-01-11 | 1999-11-02 | Hewlett-Packard Company | Generic fault tolerant platform |
US7515697B2 (en) * | 1997-08-29 | 2009-04-07 | Arbinet-Thexchange, Inc. | Method and a system for settlement of trading accounts |
JP3052908B2 (en) * | 1997-09-04 | 2000-06-19 | 日本電気株式会社 | Transaction program parallel execution method and transaction program parallel execution method |
US6324654B1 (en) | 1998-03-30 | 2001-11-27 | Legato Systems, Inc. | Computer network remote data mirroring system |
US6199171B1 (en) * | 1998-06-26 | 2001-03-06 | International Business Machines Corporation | Time-lag duplexing techniques |
US6393582B1 (en) * | 1998-12-10 | 2002-05-21 | Compaq Computer Corporation | Error self-checking and recovery using lock-step processor pair architecture |
US6169726B1 (en) * | 1998-12-17 | 2001-01-02 | Lucent Technologies, Inc. | Method and apparatus for error free switching in a redundant duplex communication carrier system |
GB2353113B (en) * | 1999-08-11 | 2001-10-10 | Sun Microsystems Inc | Software fault tolerant computer system |
GB2359384B (en) * | 2000-02-16 | 2004-06-16 | Data Connection Ltd | Automatic reconnection of partner software processes in a fault-tolerant computer system |
US20020026400A1 (en) | 2000-08-22 | 2002-02-28 | Bondglobe Inc. | System and method to establish trading mechanisms employing auctions and reverse auctions |
US20040133606A1 (en) * | 2003-01-02 | 2004-07-08 | Z-Force Communications, Inc. | Directory aggregation for files distributed over a plurality of servers in a switched file system |
EP1370947A4 (en) * | 2001-02-13 | 2009-05-27 | Candera Inc | Silicon-based storage virtualization server |
US6971044B2 (en) * | 2001-04-20 | 2005-11-29 | Egenera, Inc. | Service clusters and method in a processing system with failover capability |
JP2003015900A (en) * | 2001-06-28 | 2003-01-17 | Hitachi Ltd | Follow-up type multiplex system and data processing method capable of improving reliability by follow-up |
US6954877B2 (en) * | 2001-11-29 | 2005-10-11 | Agami Systems, Inc. | Fault tolerance using logical checkpointing in computing systems |
US7093004B2 (en) * | 2002-02-04 | 2006-08-15 | Datasynapse, Inc. | Using execution statistics to select tasks for redundant assignment in a distributed computing platform |
US7421478B1 (en) * | 2002-03-07 | 2008-09-02 | Cisco Technology, Inc. | Method and apparatus for exchanging heartbeat messages and configuration information between nodes operating in a master-slave configuration |
GB0206604D0 (en) * | 2002-03-20 | 2002-05-01 | Global Continuity Plc | Improvements relating to overcoming data processing failures |
US6978396B2 (en) * | 2002-05-30 | 2005-12-20 | Solid Information Technology Oy | Method and system for processing replicated transactions parallel in secondary server |
JP3982353B2 (en) * | 2002-07-12 | 2007-09-26 | 日本電気株式会社 | Fault tolerant computer apparatus, resynchronization method and resynchronization program |
US7120825B2 (en) * | 2003-06-06 | 2006-10-10 | Hewlett-Packard Development Company, L.P. | Adaptive batch sizing for asynchronous data redundancy |
US7139939B2 (en) * | 2003-06-20 | 2006-11-21 | International Business Machines Corporation | System and method for testing servers and taking remedial action |
US20050071391A1 (en) * | 2003-09-29 | 2005-03-31 | International Business Machines Corporation | High availability data replication set up using external backup and restore |
JP4288418B2 (en) * | 2003-12-02 | 2009-07-01 | 日本電気株式会社 | Computer system, status acquisition method, and status acquisition program |
US20060112219A1 (en) * | 2004-11-19 | 2006-05-25 | Gaurav Chawla | Functional partitioning method for providing modular data storage systems |
GB0426309D0 (en) | 2004-11-30 | 2004-12-29 | Ibm | Method and system for error strategy in a storage system |
US20070038849A1 (en) * | 2005-08-11 | 2007-02-15 | Rajiv Madampath | Computing system and method |
US7519859B2 (en) | 2005-08-30 | 2009-04-14 | International Business Machines Corporation | Fault recovery for transaction server |
US7725764B2 (en) | 2006-08-04 | 2010-05-25 | Tsx Inc. | Failover system and method |
US8041985B2 (en) * | 2006-08-11 | 2011-10-18 | Chicago Mercantile Exchange, Inc. | Match server for a financial exchange having fault tolerant operation |
US7480827B2 (en) | 2006-08-11 | 2009-01-20 | Chicago Mercantile Exchange | Fault tolerance and failover using active copy-cat |
US7434096B2 (en) * | 2006-08-11 | 2008-10-07 | Chicago Mercantile Exchange | Match server for a financial exchange having fault tolerant operation |
2006
- 2006-08-11: US application US11/502,998, publication US7480827B2 (status: Active)

2007
- 2007-07-10: EP application EP16166682.1A, publication EP3118743B1 (status: Active)
- 2007-07-10: EP application EP07799438.2A, publication EP2049995B1 (status: Active)
- 2007-07-10: WO application PCT/US2007/073141, publication WO2008021636A2 (status: Application Filing)
- 2007-07-10: CA application CA2657882A, publication CA2657882C (status: Active)

2008
- 2008-11-03: US application US12/263,821, publication US7975173B2 (status: Active)

2011
- 2011-06-10: US application US13/157,476, publication US8468390B2 (status: Active)

2013
- 2013-05-16: US application US13/896,093, publication US9244771B2 (status: Active)
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11641395B2 (en) * | 2019-07-31 | 2023-05-02 | Stratus Technologies Ireland Ltd. | Fault tolerant systems and methods incorporating a minimum checkpoint interval |
Also Published As
Publication number | Publication date |
---|---|
US9244771B2 (en) | 2016-01-26 |
US7480827B2 (en) | 2009-01-20 |
CA2657882C (en) | 2012-02-07 |
EP2049995A2 (en) | 2009-04-22 |
EP2049995B1 (en) | 2016-06-01 |
US8468390B2 (en) | 2013-06-18 |
US20110246819A1 (en) | 2011-10-06 |
US20090106328A1 (en) | 2009-04-23 |
EP3118743A1 (en) | 2017-01-18 |
WO2008021636A2 (en) | 2008-02-21 |
US7975173B2 (en) | 2011-07-05 |
WO2008021636A3 (en) | 2008-10-02 |
EP2049995A4 (en) | 2009-11-11 |
EP3118743B1 (en) | 2021-10-06 |
US20080126853A1 (en) | 2008-05-29 |
US20130297970A1 (en) | 2013-11-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2657882A1 (en) | | Fault tolerance and failover using active copy-cat |
CA2659395A1 (en) | | Match server for a financial exchange having fault tolerant operation |
US8572455B2 (en) | | Systems and methods to respond to error detection |
US7321906B2 (en) | | Method of improving replica server performance and a replica server system |
US7761734B2 (en) | | Automated firmware restoration to a peer programmable hardware device |
WO2019136595A1 (en) | | Method for handling i2c bus deadlock, electronic device, and communication system |
US7861022B2 (en) | | Livelock resolution |
US9417946B2 (en) | | Method and system for fault containment |
US8644136B2 (en) | | Sideband error signaling |
US20150161014A1 (en) | | Persistent application activation and timer notifications |
JP2014509012A5 (en) | | |
US20130332795A1 (en) | | Rank-specific cycle redundancy check |
CN105393519A (en) | | Failover system and method |
US20060133410A1 (en) | | Fault tolerant duplex computer system and its control method |
US9348682B2 (en) | | Methods for transitioning control between two controllers of a storage system |
JP2005031993A (en) | | Electronic controller |
US7549082B2 (en) | | Method and system of bringing processors to the same computational point |
JP2009116642A (en) | | Method and program for recovering from pci bus fault |
CN102521086B (en) | | Dual-mode redundant system based on lock step synchronization and implement method thereof |
WO2021111639A1 (en) | | Controller |
JP2012022429A (en) | | Dual system arithmetic processing unit and dual system arithmetic processing method |
CN104683153B (en) | | A kind of active and standby MPU control method of cluster routers and its system |
US11307552B2 (en) | | Method for modifying a configuration and industrial plant system |
US8103861B2 (en) | | Method and system for presenting an interrupt request to processors executing in lock step |
CN108958986B (en) | | Method and apparatus for identifying hardware errors in a microprocessor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request |