CA2657882A1 - Fault tolerance and failover using active copy-cat - Google Patents

Fault tolerance and failover using active copy-cat

Publication number
CA2657882A1
Authority
CA
Canada
Prior art keywords
primary
instance
input
backup
copy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002657882A
Other languages
French (fr)
Other versions
CA2657882C (en)
Inventor
Zuber Shethwala
Paul J. Callaway
Troy Reece
Paul Andrew Bauerschmidt
Robert C. Hageman
Enrico Ferrari
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CME Group Inc
Original Assignee
Chicago Mercantile Exchange
Zuber Shethwala
Paul J. Callaway
Troy Reece
Paul Andrew Bauerschmidt
Robert C. Hageman
Enrico Ferrari
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chicago Mercantile Exchange, Zuber Shethwala, Paul J. Callaway, Troy Reece, Paul Andrew Bauerschmidt, Robert C. Hageman, Enrico Ferrari
Publication of CA2657882A1
Application granted
Publication of CA2657882C
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2097 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
    • G06F11/1629 Error detection by comparing the output of redundant processing systems
    • G06F11/1641 Error detection by comparing the output of redundant processing systems where the comparison is not performed by the redundant processing components
    • G06F11/1675 Temporal synchronisation or re-synchronisation of redundant processing components
    • G06F11/1687 Temporal synchronisation or re-synchronisation of redundant processing components at event level, e.g. by interrupt or result of polling
    • G06F11/1695 Error detection or correction of the data by redundancy in hardware which are operating with time diversity
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare
    • G06F11/2038 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/2046 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

Fault tolerant operation is disclosed for a primary instance, such as a process, thread, application, processor, etc., using an active copy-cat instance, a.k.a. backup instance, that mirrors operations in the primary instance, but only after those operations have successfully completed in the primary instance. Fault tolerant logic monitors inputs and outputs of the primary instance and gates those inputs to the backup instance once a given input has been processed. The outputs of the backup instance are then compared with the outputs of the primary instance to ensure correct operation. The disclosed embodiments further relate to a fault tolerant failover mechanism allowing the backup instance to take over for the primary instance in a fault situation wherein the primary and backup instances are loosely coupled, i.e. they need not be aware that they are operating in a fault tolerant environment. As such, the primary instance need not be specifically designed or programmed to interact with the fault tolerant mechanisms. Instead, the primary instance need only be designed to adhere to specific basic operating guidelines and shut itself down when it cannot do so. By externally controlling the ability of the primary instance to successfully adhere to its operating guidelines, the fault tolerant mechanisms of the disclosed embodiments can recognize error conditions and easily failover from the primary instance to the backup instance.
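The gating and comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `CopyCatGate`, the toy `match` function, and the assumption that the primary answers inputs in first-in, first-out order are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class CopyCatGate:
    """Hypothetical fault tolerant logic that sits outside both instances."""

    def __init__(self, backup: Callable[[str], str]) -> None:
        self.backup = backup                  # backup instance, modeled as a function
        self.pending: Deque[str] = deque()    # copies of inputs the primary has not answered
        self.mismatches: List[Tuple[str, str, str]] = []

    def on_input(self, message: str) -> None:
        """Record a copy of an input sent to the primary; do not forward it yet."""
        self.pending.append(message)

    def on_primary_output(self, primary_result: str) -> bool:
        """Gate the oldest pending input to the backup, then compare the results.

        Assumes (for this sketch only) that the primary answers in FIFO order.
        """
        message = self.pending.popleft()
        backup_result = self.backup(message)
        if backup_result != primary_result:
            self.mismatches.append((message, primary_result, backup_result))
            return False
        return True


def match(order: str) -> str:
    """Toy deterministic 'instance' used only for this illustration."""
    return f"ack:{order.upper()}"


gate = CopyCatGate(backup=match)
gate.on_input("buy 10")
gate.on_input("sell 5")
ok_first = gate.on_primary_output(match("buy 10"))   # primary and backup agree
ok_second = gate.on_primary_output("ack:CORRUPTED")  # simulated divergent primary output
```

Because the backup only ever sees inputs the primary has already completed, neither instance needs to know the gate exists, which is the loose coupling the abstract emphasizes.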

Claims (21)

1. A method of providing fault tolerance to a primary instance, the method comprising:

receiving a copy of a first input transmitted to the primary instance;
forwarding the copy of the first input to a backup instance operative to generate a first backup result based on the forwarded copy of the first input;
waiting, in response to the receiving of the copy of the first input, for the primary instance to transmit a first primary result based on the first input;
determining that the first primary result is not likely to be transmitted;
preventing, based on the determination that the first primary result is not likely to be transmitted, the primary instance from completing a transaction that the primary instance is supposed to complete to continue operating; and
transmitting the first backup result when the first primary result is not likely to be transmitted.
2. The method of claim 1 further comprising:

receiving a copy of a second input transmitted to the primary instance, the copy of the second input being received subsequent to the copy of the first input;
waiting, in response to the receiving of the copy of the second input, for the primary instance to transmit a second primary result based on the second input;
wherein the forwarding further comprises forwarding the copy of the first input to the backup instance upon the transmission of the first and second primary results by the primary instance; and
comparing the first primary result with the first backup result and indicating a failure of the backup instance, the primary instance, or a combination thereof, when the first primary result is at least partially different from the first backup result.
3. The method of claim 1, wherein the waiting further comprises waiting for a defined period of time to elapse from when the first input is received, wherein the determining further comprises determining that the first primary result has not been received before the defined period of time has elapsed.
4. The method of claim 1, wherein the transaction comprises a store transaction to a database, the preventing further comprising preventing completion of the store transaction.
5. The method of claim 4, wherein the preventing further comprises causing the database to return a constraint violation in response to the store transaction.
6. The method of claim 1, wherein the primary instance comprises a match server of a financial exchange.
7. The method of claim 1, wherein the primary instance comprises a software application.
8. The method of claim 1, wherein the primary instance comprises a processor.
9. A system for providing fault tolerance to a primary instance, the system comprising:

a receiver operative to receive a copy of a first input transmitted to the primary instance;

an input forwarder coupled with the receiver and operative to forward the copy of the first input to a backup instance, the backup instance being operative to generate a first backup result based on the forwarded copy of the first input;
a fault detector coupled with the receiver and the primary instance and operative to wait, in response to the receiving of the copy of the first input, for the primary instance to transmit a first primary result based on the first input, wherein the fault detector is further operative to determine that the first primary result is not likely to be transmitted;
a transaction inhibitor coupled with the fault detector and operative to prevent, based on the determination that the first primary result is not likely to be transmitted, the primary instance from completing a transaction that the primary instance is supposed to complete to continue operating; and
a transmitter coupled with the backup instance and the fault detector and operative to transmit the first backup result when the first primary result is not likely to be transmitted.
10. The system of claim 9 wherein:
the receiver is further operative to receive a copy of a second input transmitted to the primary instance, the copy of the second input being received subsequent to the copy of the first input;
the fault detector is further operative to wait, in response to the receiving of the copy of the second input, for the primary instance to transmit a second primary result based on the second input;
wherein the input forwarder is further operative to forward the copy of the first input to the backup instance upon the transmission of the first and second primary results by the primary instance;
the system further comprising:
a comparator coupled with the primary instance and the backup instance and operative to compare the first primary result with the first backup result and indicate a failure of the backup instance, the primary instance or a combination thereof, when the first primary result is at least partially different from the first backup result.
11. The system of claim 9, wherein the fault detector is further operative to wait for a defined period of time to elapse from when the first input is received and determine that the first primary result has not been received before the defined period of time has elapsed.
12. The system of claim 9, wherein the transaction comprises a store transaction to a database, the transaction inhibitor being further operative to prevent completion of the store transaction.
13. The system of claim 12, wherein the transaction inhibitor is further operative to cause the database to return a constraint violation in response to the store transaction.
14. The system of claim 9, wherein the primary instance comprises a match server of a financial exchange.
15. The system of claim 9, wherein the primary instance comprises a software application.
16. The system of claim 9, wherein the primary instance comprises a processor.
17. A system for providing fault tolerance to a primary instance, the system comprising:
a processor;
a memory coupled with the processor;
first logic stored in the memory and executable by the processor to receive a copy of a first input transmitted to the primary instance;
second logic stored in the memory, coupled with the first logic and executable by the processor to forward the copy of the first input to a backup instance, the backup instance being operative to generate a first backup result based on the forwarded copy of the first input;
fourth logic stored in the memory, coupled with the first logic and executable by the processor to wait, in response to the receiving of the copy of the first input, for the primary instance to transmit a first primary result based on the first input, wherein the fourth logic is further executable by the processor to determine that the first primary result is not likely to be transmitted;
fifth logic stored in the memory, coupled with the fourth logic and executable by the processor to prevent, based on the fourth logic determining that the first primary result is not likely to be transmitted, the primary instance from completing a transaction that the primary instance is supposed to complete to continue operating; and
sixth logic stored in the memory, coupled with the backup instance and the fourth logic and executable by the processor to transmit the first backup result when the first primary result is not likely to be transmitted.
18. The system of claim 17 wherein:

the first logic is further executable by the processor to receive a copy of a second input transmitted to the primary instance, the copy of the second input being received subsequent to the copy of the first input;
the fourth logic is further executable by the processor to wait, in response to the receiving of the copy of the second input, for the primary instance to transmit a second primary result based on the second input;
wherein the second logic is further executable by the processor to forward the copy of the first input to the backup instance upon the transmission of the first and second primary results by the primary instance;
the system further comprising:
seventh logic stored in the memory, coupled with the primary instance and the backup instance and executable by the processor to compare the first primary result with the first backup result and indicate a failure of the backup instance, the primary instance or a combination thereof, when the first primary result is at least partially different from the first backup result.
19. The system of claim 17, wherein the primary instance comprises a first execution by the processor of seventh logic stored in the memory and executable by the processor, the backup instance comprising a second execution of the seventh logic by the processor.
20. A system for providing fault tolerance to a primary instance means, the system comprising:

means for receiving a copy of a first input transmitted to the primary instance;

means for forwarding, coupled with the means for receiving, the copy of the first input to a backup instance means operative to generate a first backup result based on the forwarded copy of the first input;

means for waiting, in response to the receiving of the copy of the first input, for the primary instance to transmit a first primary result based on the first input, the means for waiting being coupled with the means for receiving;
means for determining, coupled with the means for receiving and the means for waiting, that the first primary result is not likely to be transmitted;
means for preventing, coupled with the means for determining, the primary instance means from completing a transaction that the primary instance means is supposed to complete to continue operating, based on the determination that the first primary result is not likely to be transmitted; and
means for transmitting, coupled with the means for determining and the backup instance means, for transmitting the first backup result when the first primary result is not likely to be transmitted.
21. The system of claim 20 further comprising:
means for receiving a copy of a second input transmitted to the primary instance means, the copy of the second input being received subsequent to the copy of the first input;

means for waiting, in response to the receiving of the copy of the second input, for the primary instance means to transmit a second primary result based on the second input; and
wherein the means for forwarding further comprises means for forwarding the copy of the first input to the backup instance means upon the transmission of the first and second primary results by the primary instance;
the system further comprising:
means for comparing the first primary result with the first backup result and indicating a failure of the backup instance, the primary instance or a combination thereof, when the first primary result is at least partially different from the first backup result, the means for comparing being coupled with the primary instance means and the backup instance means.
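
The failover path recited in claims 1 and 3 to 5 (a timed wait, a blocked store transaction, and transmission of the backup result) might be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: an in-memory SQLite database stands in for the exchange database, and the `results` table and the functions `store_result`, `inhibit_primary`, and `fail_over_if_late` are hypothetical names. The sketch assumes one possible way to provoke the constraint violation, namely claiming the primary key before the primary can store its own row.

```python
import sqlite3
from typing import Optional

# Stand-in for the shared database; the schema is an assumption for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (input_id INTEGER PRIMARY KEY, owner TEXT, result TEXT)"
)


def store_result(owner: str, input_id: int, result: str) -> bool:
    """Store transaction an instance must complete in order to keep operating."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO results VALUES (?, ?, ?)", (input_id, owner, result)
            )
        return True
    except sqlite3.IntegrityError:
        # Constraint violation: the caller is expected to shut itself down.
        return False


def inhibit_primary(input_id: int, backup_result: str) -> None:
    """Claim the key first so that a late store by the primary is rejected."""
    store_result("backup", input_id, backup_result)


def fail_over_if_late(
    input_id: int,
    elapsed_s: float,
    timeout_s: float,
    primary_done: bool,
    backup_result: str,
) -> Optional[str]:
    """Return the backup result to transmit on failover, or None otherwise."""
    if primary_done or elapsed_s < timeout_s:
        return None
    inhibit_primary(input_id, backup_result)
    return backup_result


# The primary has not answered input 1 within the defined period: fail over.
sent = fail_over_if_late(1, elapsed_s=2.5, timeout_s=1.0,
                         primary_done=False, backup_result="ack:BUY 10")
# A late store attempt by the primary now hits the constraint and fails.
primary_alive = store_result("primary", 1, "ack:BUY 10")
```

In this sketch the primary needs no knowledge of the failover logic; it only has to treat a failed store as a reason to stop, which mirrors the "basic operating guidelines" described in the abstract.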

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11/502,998 US7480827B2 (en) 2006-08-11 2006-08-11 Fault tolerance and failover using active copy-cat
US11/502,998 2006-08-11
PCT/US2007/073141 WO2008021636A2 (en) 2006-08-11 2007-07-10 Fault tolerance and failover using active copy-cat

Publications (2)

Publication Number Publication Date
CA2657882A1 CA2657882A1 (en) 2008-02-21
CA2657882C CA2657882C (en) 2012-02-07

Family

ID=39082842

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2657882A Active CA2657882C (en) 2006-08-11 2007-07-10 Fault tolerance and failover using active copy-cat

Country Status (4)

Country Link
US (4) US7480827B2 (en)
EP (2) EP3118743B1 (en)
CA (1) CA2657882C (en)
WO (1) WO2008021636A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11641395B2 (en) * 2019-07-31 2023-05-02 Stratus Technologies Ireland Ltd. Fault tolerant systems and methods incorporating a minimum checkpoint interval

Families Citing this family (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7177866B2 (en) * 2001-03-16 2007-02-13 Gravic, Inc. Asynchronous coordinated commit replication and dual write with replication transmission and locking of target database on updates only
US20060271663A1 (en) * 2005-05-31 2006-11-30 Fabio Barillari A Fault-tolerant Distributed Data Processing System
US7725764B2 (en) * 2006-08-04 2010-05-25 Tsx Inc. Failover system and method
US7434096B2 (en) 2006-08-11 2008-10-07 Chicago Mercantile Exchange Match server for a financial exchange having fault tolerant operation
US7480827B2 (en) * 2006-08-11 2009-01-20 Chicago Mercantile Exchange Fault tolerance and failover using active copy-cat
US8041985B2 (en) * 2006-08-11 2011-10-18 Chicago Mercantile Exchange, Inc. Match server for a financial exchange having fault tolerant operation
JP2008123357A (en) * 2006-11-14 2008-05-29 Honda Motor Co Ltd Parallel computer system, parallel computing method, and program for parallel computer
US7617413B2 (en) * 2006-12-13 2009-11-10 Inventec Corporation Method of preventing erroneous take-over in a dual redundant server system
JP4295326B2 (en) * 2007-01-10 2009-07-15 株式会社日立製作所 Computer system
JP5152175B2 (en) * 2007-03-20 2013-02-27 富士通モバイルコミュニケーションズ株式会社 Information processing device
US7929418B2 (en) * 2007-03-23 2011-04-19 Hewlett-Packard Development Company, L.P. Data packet communication protocol offload method and system
US20090055689A1 (en) * 2007-08-21 2009-02-26 International Business Machines Corporation Systems, methods, and computer products for coordinated disaster recovery
US7805632B1 (en) * 2007-09-24 2010-09-28 Net App, Inc. Storage system and method for rapidly recovering from a system failure
US7840839B2 (en) * 2007-11-06 2010-11-23 Vmware, Inc. Storage handling for fault tolerance in virtual machines
JP4491482B2 (en) * 2007-11-28 2010-06-30 株式会社日立製作所 Failure recovery method, computer, cluster system, management computer, and failure recovery program
JP5213108B2 (en) * 2008-03-18 2013-06-19 株式会社日立製作所 Data replication method and data replication system
US7792897B2 (en) * 2008-06-02 2010-09-07 International Business Machines Corporation Distributed transaction processing system
US8370679B1 (en) * 2008-06-30 2013-02-05 Symantec Corporation Method, apparatus and system for improving failover within a high availability disaster recovery environment
US8676760B2 (en) 2008-08-05 2014-03-18 International Business Machines Corporation Maintaining data integrity in data servers across data centers
US20100107154A1 (en) * 2008-10-16 2010-04-29 Deepak Brahmavar Method and system for installing an operating system via a network
US9330100B2 (en) * 2009-02-26 2016-05-03 Red Hat, Inc. Protocol independent mirroring
US8369968B2 (en) * 2009-04-03 2013-02-05 Dell Products, Lp System and method for handling database failover
US8201169B2 (en) * 2009-06-15 2012-06-12 Vmware, Inc. Virtual machine fault tolerance
KR101167938B1 (en) * 2009-09-22 2012-08-03 엘지전자 주식회사 Method for using rights to contents
US20110179303A1 (en) * 2010-01-15 2011-07-21 Microsoft Corporation Persistent application activation and timer notifications
US9053073B1 (en) * 2011-04-18 2015-06-09 American Megatrends, Inc. Use of timestamp logic in synchronous replication
WO2013015777A1 (en) 2011-07-25 2013-01-31 Hewlett-Packard Development Company, L.P. Transferring a conference session between conference servers due to failure
US9063822B2 (en) * 2011-09-02 2015-06-23 Microsoft Technology Licensing, Llc Efficient application-aware disaster recovery
US9087019B2 (en) 2012-01-27 2015-07-21 Promise Technology, Inc. Disk storage system with rebuild sequence and method of operation thereof
KR101322401B1 (en) * 2012-01-31 2013-10-28 주식회사 알티베이스 Apparatus and method for parallel processing in database management system for synchronous replication
US9836353B2 (en) 2012-09-12 2017-12-05 International Business Machines Corporation Reconstruction of system definitional and state information
US8874956B2 (en) * 2012-09-18 2014-10-28 Datadirect Networks, Inc. Data re-protection in a distributed replicated data storage system
US9436407B1 (en) * 2013-04-10 2016-09-06 Amazon Technologies, Inc. Cursor remirroring
EP2987090B1 (en) * 2013-04-16 2019-03-27 EntIT Software LLC Distributed event correlation system
CA2911001C (en) * 2013-06-13 2019-11-19 Tsx Inc. Failover system and method
US10148523B1 (en) * 2013-09-16 2018-12-04 Amazon Technologies, Inc. Resetting computing resources in a service provider network
US11037239B2 (en) 2013-11-07 2021-06-15 Chicago Mercantile Exchange Inc. Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance
US10366452B2 (en) 2013-11-07 2019-07-30 Chicago Mercantile Exchange Inc. Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance
US10929926B2 (en) 2013-11-07 2021-02-23 Chicago Mercantile Exchange Inc. Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance
US20150127509A1 (en) 2013-11-07 2015-05-07 Chicago Mercantile Exchange Inc. Transactionally Deterministic High Speed Financial Exchange Having Improved, Efficiency, Communication, Customization, Performance, Access, Trading Opportunities, Credit Controls, and Fault Tolerance
US9691102B2 (en) 2013-11-07 2017-06-27 Chicago Mercantile Exchange Inc. Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance
US10332206B2 (en) 2013-11-07 2019-06-25 Chicago Mercantile Exchange Inc. Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance
US10467693B2 (en) 2013-11-07 2019-11-05 Chicago Mercantile Exchange Inc. Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance
US10692143B2 (en) 2013-11-07 2020-06-23 Chicago Mercantile Exchange Inc. Transactionally deterministic high speed financial exchange having improved, efficiency, communication, customization, performance, access, trading opportunities, credit controls, and fault tolerance
JP6307858B2 (en) * 2013-11-29 2018-04-11 富士通株式会社 Transmission apparatus, transmission system, and monitoring control method
EP3358466B1 (en) 2013-12-12 2019-11-13 Huawei Technologies Co., Ltd. Data replication method and storage system
US10108496B2 (en) 2014-06-30 2018-10-23 International Business Machines Corporation Use of replicated copies to improve database backup performance
US10402113B2 (en) 2014-07-31 2019-09-03 Hewlett Packard Enterprise Development Lp Live migration of data
WO2016036347A1 (en) 2014-09-02 2016-03-10 Hewlett Packard Enterprise Development Lp Serializing access to fault tolerant memory
WO2016064417A1 (en) 2014-10-24 2016-04-28 Hewlett Packard Enterprise Development Lp End-to-end negative acknowledgment
US20160132420A1 (en) * 2014-11-10 2016-05-12 Institute For Information Industry Backup method, pre-testing method for environment updating and system thereof
AU2015373914B2 (en) * 2014-12-31 2017-09-07 Servicenow, Inc. Failure resistant distributed computing system
CN104618155B (en) * 2015-01-23 2018-06-05 华为技术有限公司 A kind of virtual machine fault-tolerant method, apparatus and system
US10409681B2 (en) 2015-01-30 2019-09-10 Hewlett Packard Enterprise Development Lp Non-idempotent primitives in fault-tolerant memory
WO2016122642A1 (en) 2015-01-30 2016-08-04 Hewlett Packard Enterprise Development Lp Determine failed components in fault-tolerant memory
US10402287B2 (en) 2015-01-30 2019-09-03 Hewlett Packard Enterprise Development Lp Preventing data corruption and single point of failure in a fault-tolerant memory
US10402261B2 (en) 2015-03-31 2019-09-03 Hewlett Packard Enterprise Development Lp Preventing data corruption and single point of failure in fault-tolerant memory fabrics
US10798146B2 (en) * 2015-07-01 2020-10-06 Oracle International Corporation System and method for universal timeout in a distributed computing environment
US11164248B2 (en) 2015-10-12 2021-11-02 Chicago Mercantile Exchange Inc. Multi-modal trade execution with smart order routing
US11288739B2 (en) 2015-10-12 2022-03-29 Chicago Mercantile Exchange Inc. Central limit order book automatic triangulation system
KR101758558B1 (en) * 2016-03-29 2017-07-26 엘에스산전 주식회사 Energy managemnet server and energy managemnet system having thereof
CN105915375B (en) * 2016-04-13 2019-06-07 北京交通大学 The activestandby state management method of Dual-Computer Hot-Standby System
US10580100B2 (en) 2016-06-06 2020-03-03 Chicago Mercantile Exchange Inc. Data payment and authentication via a shared data structure
US11514448B1 (en) 2016-07-11 2022-11-29 Chicago Mercantile Exchange Inc. Hierarchical consensus protocol framework for implementing electronic transaction processing systems
US10417217B2 (en) 2016-08-05 2019-09-17 Chicago Mercantile Exchange Inc. Systems and methods for blockchain rule synchronization
US10748210B2 (en) 2016-08-09 2020-08-18 Chicago Mercantile Exchange Inc. Systems and methods for coordinating processing of scheduled instructions across multiple components
US10943297B2 (en) 2016-08-09 2021-03-09 Chicago Mercantile Exchange Inc. Systems and methods for coordinating processing of instructions across multiple components
US10193634B2 (en) 2016-09-19 2019-01-29 Hewlett Packard Enterprise Development Lp Optical driver circuits
US10007582B2 (en) * 2016-09-27 2018-06-26 International Business Machines Corporation Rebuild rollback support in distributed SDS systems
US10270646B2 (en) 2016-10-24 2019-04-23 Servicenow, Inc. System and method for resolving master node failures within node clusters
US10326862B2 (en) 2016-12-09 2019-06-18 Chicago Mercantile Exchange Inc. Distributed and transactionally deterministic data processing architecture
US10467113B2 (en) 2017-06-09 2019-11-05 Hewlett Packard Enterprise Development Lp Executing programs through a shared NVM pool
WO2018229930A1 (en) * 2017-06-15 2018-12-20 株式会社日立製作所 Controller
US10389342B2 (en) 2017-06-28 2019-08-20 Hewlett Packard Enterprise Development Lp Comparator
CN110019502B (en) 2017-08-29 2023-03-21 阿里巴巴集团控股有限公司 Synchronization method between primary database and backup database, database system and device
US10831619B2 (en) * 2017-09-29 2020-11-10 Oracle International Corporation Fault-tolerant stream processing
CN108011698B (en) * 2017-11-13 2020-05-22 北京全路通信信号研究设计院集团有限公司 RSSP-I secure communication method based on dual-system synchronization
CN112236792A (en) * 2018-06-06 2021-01-15 E·马伊姆 Secure transaction system in P2P architecture
CN112130521B (en) * 2020-09-22 2021-04-20 郑州嘉晨电器有限公司 Control device for safety control of engineering machinery

Family Cites Families (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4228496A (en) 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4590554A (en) 1982-11-23 1986-05-20 Parallel Computers Systems, Inc. Backup fault tolerant computer system
US4674044A (en) * 1985-01-30 1987-06-16 Merrill Lynch, Pierce, Fenner & Smith, Inc. Automated securities trading system
US5088021A (en) * 1989-09-07 1992-02-11 Honeywell, Inc. Apparatus and method for guaranteed data store in redundant controllers of a process control system
JPH05128080A (en) 1991-10-14 1993-05-25 Mitsubishi Electric Corp Information processor
US5363503A (en) 1992-01-22 1994-11-08 Unisys Corporation Fault tolerant computer system with provision for handling external events
US5715386A (en) * 1992-09-30 1998-02-03 Lucent Technologies Inc. Apparatus and methods for software rejuvenation
EP0593062A3 (en) * 1992-10-16 1995-08-30 Siemens Ind Automation Inc Redundant networked database system
US5621885A (en) 1995-06-07 1997-04-15 Tandem Computers, Incorporated System and method for providing a fault tolerant computer program runtime support environment
US5978933A (en) * 1996-01-11 1999-11-02 Hewlett-Packard Company Generic fault tolerant platform
US7515697B2 (en) * 1997-08-29 2009-04-07 Arbinet-Thexchange, Inc. Method and a system for settlement of trading accounts
JP3052908B2 (en) * 1997-09-04 2000-06-19 日本電気株式会社 Transaction program parallel execution method and transaction program parallel execution method
US6324654B1 (en) 1998-03-30 2001-11-27 Legato Systems, Inc. Computer network remote data mirroring system
US6199171B1 (en) * 1998-06-26 2001-03-06 International Business Machines Corporation Time-lag duplexing techniques
US6393582B1 (en) * 1998-12-10 2002-05-21 Compaq Computer Corporation Error self-checking and recovery using lock-step processor pair architecture
US6169726B1 (en) * 1998-12-17 2001-01-02 Lucent Technologies, Inc. Method and apparatus for error free switching in a redundant duplex communication carrier system
GB2353113B (en) * 1999-08-11 2001-10-10 Sun Microsystems Inc Software fault tolerant computer system
GB2359384B (en) * 2000-02-16 2004-06-16 Data Connection Ltd Automatic reconnection of partner software processes in a fault-tolerant computer system
US20020026400A1 (en) 2000-08-22 2002-02-28 Bondglobe Inc. System and method to establish trading mechanisms employing auctions and reverse auctions
US20040133606A1 (en) * 2003-01-02 2004-07-08 Z-Force Communications, Inc. Directory aggregation for files distributed over a plurality of servers in a switched file system
EP1370947A4 (en) * 2001-02-13 2009-05-27 Candera Inc Silicon-based storage virtualization server
US6971044B2 (en) * 2001-04-20 2005-11-29 Egenera, Inc. Service clusters and method in a processing system with failover capability
JP2003015900A (en) * 2001-06-28 2003-01-17 Hitachi Ltd Follow-up type multiplex system and data processing method capable of improving reliability by follow-up
US6954877B2 (en) * 2001-11-29 2005-10-11 Agami Systems, Inc. Fault tolerance using logical checkpointing in computing systems
US7093004B2 (en) * 2002-02-04 2006-08-15 Datasynapse, Inc. Using execution statistics to select tasks for redundant assignment in a distributed computing platform
US7421478B1 (en) * 2002-03-07 2008-09-02 Cisco Technology, Inc. Method and apparatus for exchanging heartbeat messages and configuration information between nodes operating in a master-slave configuration
GB0206604D0 (en) * 2002-03-20 2002-05-01 Global Continuity Plc Improvements relating to overcoming data processing failures
US6978396B2 (en) * 2002-05-30 2005-12-20 Solid Information Technology Oy Method and system for processing replicated transactions parallel in secondary server
JP3982353B2 (en) * 2002-07-12 2007-09-26 日本電気株式会社 Fault tolerant computer apparatus, resynchronization method and resynchronization program
US7120825B2 (en) * 2003-06-06 2006-10-10 Hewlett-Packard Development Company, L.P. Adaptive batch sizing for asynchronous data redundancy
US7139939B2 (en) * 2003-06-20 2006-11-21 International Business Machines Corporation System and method for testing servers and taking remedial action
US20050071391A1 (en) * 2003-09-29 2005-03-31 International Business Machines Corporation High availability data replication set up using external backup and restore
JP4288418B2 (en) * 2003-12-02 2009-07-01 日本電気株式会社 Computer system, status acquisition method, and status acquisition program
US20060112219A1 (en) * 2004-11-19 2006-05-25 Gaurav Chawla Functional partitioning method for providing modular data storage systems
GB0426309D0 (en) 2004-11-30 2004-12-29 Ibm Method and system for error strategy in a storage system
US20070038849A1 (en) * 2005-08-11 2007-02-15 Rajiv Madampath Computing system and method
US7519859B2 (en) 2005-08-30 2009-04-14 International Business Machines Corporation Fault recovery for transaction server
US7725764B2 (en) 2006-08-04 2010-05-25 Tsx Inc. Failover system and method
US8041985B2 (en) * 2006-08-11 2011-10-18 Chicago Mercantile Exchange, Inc. Match server for a financial exchange having fault tolerant operation
US7480827B2 (en) 2006-08-11 2009-01-20 Chicago Mercantile Exchange Fault tolerance and failover using active copy-cat
US7434096B2 (en) * 2006-08-11 2008-10-07 Chicago Mercantile Exchange Match server for a financial exchange having fault tolerant operation

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11641395B2 (en) * 2019-07-31 2023-05-02 Stratus Technologies Ireland Ltd. Fault tolerant systems and methods incorporating a minimum checkpoint interval

Also Published As

Publication number Publication date
US9244771B2 (en) 2016-01-26
US7480827B2 (en) 2009-01-20
CA2657882C (en) 2012-02-07
EP2049995A2 (en) 2009-04-22
EP2049995B1 (en) 2016-06-01
US8468390B2 (en) 2013-06-18
US20110246819A1 (en) 2011-10-06
US20090106328A1 (en) 2009-04-23
EP3118743A1 (en) 2017-01-18
WO2008021636A2 (en) 2008-02-21
US7975173B2 (en) 2011-07-05
WO2008021636A3 (en) 2008-10-02
EP2049995A4 (en) 2009-11-11
EP3118743B1 (en) 2021-10-06
US20080126853A1 (en) 2008-05-29
US20130297970A1 (en) 2013-11-07

Similar Documents

Publication Publication Date Title
CA2657882A1 (en) Fault tolerance and failover using active copy-cat
CA2659395A1 (en) Match server for a financial exchange having fault tolerant operation
US8572455B2 (en) Systems and methods to respond to error detection
US7321906B2 (en) Method of improving replica server performance and a replica server system
US7761734B2 (en) Automated firmware restoration to a peer programmable hardware device
WO2019136595A1 (en) Method for handling i2c bus deadlock, electronic device, and communication system
US7861022B2 (en) Livelock resolution
US9417946B2 (en) Method and system for fault containment
US8644136B2 (en) Sideband error signaling
US20150161014A1 (en) Persistent application activation and timer notifications
JP2014509012A5 (en)
US20130332795A1 (en) Rank-specific cyclic redundancy check
CN105393519A (en) Failover system and method
US20060133410A1 (en) Fault tolerant duplex computer system and its control method
US9348682B2 (en) Methods for transitioning control between two controllers of a storage system
JP2005031993A (en) Electronic controller
US7549082B2 (en) Method and system of bringing processors to the same computational point
JP2009116642A (en) Method and program for recovering from pci bus fault
CN102521086B (en) Dual-mode redundant system based on lock step synchronization and implement method thereof
WO2021111639A1 (en) Controller
JP2012022429A (en) Dual system arithmetic processing unit and dual system arithmetic processing method
CN104683153B (en) A kind of active and standby MPU control method of cluster routers and its system
US11307552B2 (en) Method for modifying a configuration and industrial plant system
US8103861B2 (en) Method and system for presenting an interrupt request to processors executing in lock step
CN108958986B (en) Method and apparatus for identifying hardware errors in a microprocessor

Legal Events

Date Code Title Description
EEER Examination request