US20040117687A1 - High-availability architecture using high-speed pipes - Google Patents
- Publication number
- US20040117687A1 (application No. US10/692,252)
- Authority
- US
- United States
- Prior art keywords
- computer system
- data
- transferring
- availability
- pipe
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2038—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
Definitions
- the present invention relates generally to high availability computer system architectures, and in particular to apparatus, systems, and methods for a high-availability computer system architecture using high-speed pipes.
- Apparatus, systems, and methods consistent with the present invention utilize available high-speed pipes to transfer information necessary for high-availability between two computer systems.
- one or more logical pipes are implemented on a physical pipe between two computer systems.
- the use of the term pipe refers to a communication channel.
- a physical pipe refers to a physical communication channel.
- a logical pipe refers to a logical communication channel, and high-availability information refers to data transferred between systems for purposes of implementing a high-availability architecture.
- the logical pipes are used for data transfer between an active system and a standby system so that the standby system has the information necessary to take over from the active system if the active system fails in some way.
- the logical pipes that transfer information necessary for implementing high-availability are part of a physical pipe that also carries other types of information used by the active system.
- the system may also use network interface cards to implement the high-speed pipes.
- the network interface cards may be implemented using conventional interface cards without departing from the principles of the invention.
- a NIC using Virtual Interface (VI) Architecture may be used.
- An apparatus consistent with the present invention comprises a physical pipe for transferring data between an active system and a standby system.
- the apparatus further comprises a first logical pipe for transferring data over the physical pipe, and a second logical pipe for transferring high-availability data over the physical pipe.
- Another apparatus consistent with the present invention comprises a physical pipe for transferring data between an active system and a standby system.
- the apparatus further comprises a network interface card for transferring data and high-availability information over the physical pipe.
- Yet another apparatus consistent with the present invention includes a physical pipe for transferring data between an active system and a standby system.
- the apparatus further comprises a first logical pipe for transferring checkpointing data over the physical pipe, and a second logical pipe for transferring total system state data over the physical pipe.
- a system consistent with the present invention comprises a physical pipe.
- the system further comprises an active system for transferring data and high-availability information over the physical pipe, and a standby system for receiving the high-availability information from the physical pipe.
- a method in a high-availability system having an active system and a standby system is provided.
- the active system sends a message to the standby system to enter a switch-over state.
- the standby system monitors a transfer complete marker.
- the method transfers total system state from the active system to the standby system.
- the method switches the high-availability system from the active system to the standby system upon detecting the transfer complete marker.
- FIG. 1 is a block diagram showing a high-availability computer system consistent with the present invention
- FIG. 2 is a block diagram showing a method using a transfer complete marker consistent with the present invention
- FIG. 3 is a block diagram showing transitions for Graceful Switch Over consistent with the present invention.
- FIG. 4 is a block diagram illustrating the VI Architectural Model consistent with the present invention.
- Apparatus, systems, and methods consistent with the improved high-availability architecture disclosed herein use high-speed pipes to exchange information between an active computer system and a standby system.
- Appendix A, which contains a glossary of the terms and conventions used in describing the invention, is incorporated herein in its entirety as part of this Detailed Description.
- The High Speed Pipes (HSP) system uses high-speed pipes to transfer information necessary for high-availability between two computer systems. This information exchange permits a standby computer system to take over in case the active system fails.
- the present system uses logical pipes on existing physical pipes, thereby realizing significant cost savings compared with conventional systems that require dedicated pipes for transferring high-availability information between computer systems.
- the call processing platform redundancy scheme is based on a 1+1 model and can be expanded to work in an n+1 redundancy model.
- a database in this document means protected memory region, hard disk drive files and any data structures common on both the active and standby system.
- FIG. 1 illustrates a high-availability computer system 100 .
- active system 102 is interconnected with standby system 104 .
- Active system 102 comprises a disk drive 106 , memory 108 , and CPU 109 .
- Standby system 104 comprises a disk drive 110 , memory 112 , and CPU 113 .
- Active system 102 and standby system 104 are interconnected via two interfaces or logical pipes: Interface A 114 and Interface B 116 .
- Interface A 114 transfers two types of traffic:
- Interface A 114, one of the logical pipes, is used to transfer “Heart Beat,” or in other words messages between active node 102 and standby node 104 that make one system aware of another's existence or health.
- Interface A 114 is used to transfer “P.mem updates,” or in other words any protected memory updates that occur on the active node are replicated on the standby node.
- Interface A 114 is used to ensure disk redundancy. For example, any updates or write operations performed on active node disk 106 are replicated on standby node disk 110 .
- any configuration changes made on active node 102 are replicated on standby node 104 by transferring commands or inputs associated with any configuration changes to the standby node using Interface A.
- the pipe is configured to work within a client server type configuration, allowing software to access the pipe in a similar manner to a socket-based TCP/IP connection. All data sent across this pipe will be encapsulated and sent across as a message. Only complete transactions should be sent across this interface at any one time to prevent the case where a partial transaction has been sent to the standby side when active node failure occurs, thus causing an inconsistent database on the standby side.
- the transaction could be built on the inactive side from partial transactions, and then applied as a single transaction once it has been fully built. Due to the symmetrical nature of the system, it can be assumed that if a transaction is completed on the active side, the same transaction will be complete on the standby side, therefore no rollback and retry functionality will be required for the first phase of this system.
- GSO Graceful Switch over
- a data transfer mechanism transfers the total system state at a particular time by the active node to the standby node in the least amount of time possible, allowing the standby node to continue where the previously active node stopped.
- the HSP must exhibit the following characteristics:
- the receiving side must pend and be notified of completion without OS involvement.
- DMA Direct Memory Access
- RDMA Remote Direct Memory Access
- RDMAW Remote Direct Memory Access Write
- RDMAR Remote Direct Memory Access Read
- the two may be viewed as push (the initiator writes directly into the recipient's memory) for RDMAW and pull (the initiator reads the host's memory and copies the data into its own memory) for RDMAR.
- All current adaptors support RDMAW; a few also support RDMAR. Even though FIG. 1 depicts RDMAW operation, one skilled in the art will appreciate that RDMAR may also be used.
- the receiving side must know when the transfer is complete.
- a small loop is entered on the receiving side where a memory address is monitored.
- address location 0x7ffff 206 on standby node 204 is set to 0x0000 initially and is monitored.
- the transfer complete marker indicates that the transfer has been completed, and thus the standby side may assume the role of the active node.
- a value of 0xfabe 210 is depicted as the transfer complete marker.
- this value could be any non-zero value that the active node and the standby node have agreed to treat as the transfer complete marker.
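- The monitoring loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: a `bytearray` stands in for the standby node's memory, `push_state` stands in for the RDMA write, and the names `MARKER_OFFSET`, `IMAGE_OFFSET`, `push_state`, and `wait_for_marker` are hypothetical. The only property taken from the text is that the monitored location starts at zero and the agreed non-zero marker appears once the transfer is complete.

```python
import struct

# Example marker value from the text; any agreed non-zero value would do.
TRANSFER_COMPLETE_MARKER = 0xFABE
# Hypothetical layout: a 16-bit marker word at offset 0, state image after it.
MARKER_OFFSET = 0
IMAGE_OFFSET = 2


def push_state(memory: bytearray, image: bytes) -> None:
    """Stand-in for the RDMA write: copy the image, then set the marker last."""
    if IMAGE_OFFSET + len(image) > len(memory):
        raise ValueError("image does not fit in the receiving region")
    memory[IMAGE_OFFSET:IMAGE_OFFSET + len(image)] = image
    struct.pack_into("<H", memory, MARKER_OFFSET, TRANSFER_COMPLETE_MARKER)


def wait_for_marker(memory: bytearray, max_polls: int = 1_000_000) -> bool:
    """Poll the marker word; return True once it holds the agreed value."""
    for _ in range(max_polls):
        (value,) = struct.unpack_from("<H", memory, MARKER_OFFSET)
        if value == TRANSFER_COMPLETE_MARKER:
            return True
    return False


standby_memory = bytearray(64)  # marker word starts at 0x0000
seen_before = wait_for_marker(standby_memory, max_polls=10)
push_state(standby_memory, b"system image")
seen_after = wait_for_marker(standby_memory)
```

- Writing the marker only after the image has been copied is a design choice of this sketch, so that a reader of the marker never observes a partially written image.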
- FIG. 3 is a block diagram illustrating an overview of the transitions taking place on both nodes during a GSO.
- Side A, the active side, is in the normal active state 302.
- Side B, the standby side, is in the normal standby state 304.
- a GSO event is always initiated by the active side ‘A’ by sending a message to the standby side ‘B’ to enter the GSO receive state.
- side A enters Start GSO 306 state and upon receiving the Start GSO message, side B enters Start GSO 308 state as well.
- Side ‘A’ then enters a PRE-GSO state, the GSO Interrupt State (State 310 ), and waits for an acknowledgment from side ‘B’ that it is ready to receive the system image.
- State 310 the GSO Interrupt State
- Side ‘B’ stops all activity and enters a small loop looking for a specific memory location to change, the GSO Interrupt State (state 312 ). Side ‘A’ then initiates the RDMAW, and enters a loop, similar to side ‘B,’ to prevent it from restarting until the system image has been transferred (state 314 ). Side ‘B’ sends a done message to side ‘A’ when it detects that the transfer complete marker has changed (state 316 ), thus allowing the side ‘A’ to restart (state 318 ) and become the standby node (state 322 ).
- Side ‘B’ then executes a return from interrupt or return from exception instruction, for example, IRET (state 320 ), causing the processor to continue from the point where side ‘A’ jumped into the GSO interrupt, thus assuming the role of the active node (state 324 ).
- IRET exception instruction
- Although FIG. 3 depicts the state transitions in a particular order, the order of these state transitions may be changed.
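- The GSO transitions described for FIG. 3 can be written down as two ordered state paths, one per side. The sketch below is a rough illustration only: the state numbers come from the text, while the `Node` class, the short state labels, and the interleaving in `graceful_switch_over` are assumptions made for the example and represent just one possible ordering.

```python
from dataclasses import dataclass

# State numbers follow FIG. 3 as described in the text; labels are paraphrases.
SIDE_A_PATH = [
    (302, "normal active"),
    (306, "start GSO"),
    (310, "GSO interrupt, waiting for acknowledgment"),
    (314, "RDMAW initiated, held in loop"),
    (318, "restart"),
    (322, "standby"),
]
SIDE_B_PATH = [
    (304, "normal standby"),
    (308, "start GSO"),
    (312, "GSO interrupt, polling marker"),
    (316, "marker detected, done message sent"),
    (320, "return from interrupt"),
    (324, "active"),
]


@dataclass
class Node:
    name: str
    path: list
    step: int = 0

    @property
    def state(self):
        return self.path[self.step]

    def advance(self):
        self.step += 1
        return self.state


def graceful_switch_over(side_a: Node, side_b: Node) -> list:
    """Advance both sides in one possible order and return the event log."""
    order = [side_a, side_b, side_a, side_b, side_a,
             side_b, side_a, side_a, side_b, side_b]
    return [(node.name, *node.advance()) for node in order]


a = Node("A", SIDE_A_PATH)
b = Node("B", SIDE_B_PATH)
log = graceful_switch_over(a, b)
```

- After the run, side A ends in state 322 (standby) and side B ends in state 324 (active), which mirrors the role swap the text describes.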
- NIC Network Interface Card
- a commercially available NIC from Compaq that fulfills the requirements of the HSP may be used to implement the physical and the logical pipes.
- the Servernet card has been externalized for the open systems server market, allowing it to be used as the HSP hardware.
- Some NICs include a virtual interface architecture, such as the Virtual Interface Architecture (VIA) standard.
- VIP Virtual Interface Architecture
- NIC's such as the Servernet card
- X and Y a dual interconnect fabric denoted X and Y allowing transparent link redundancy to be part of the standard interface.
- the NIC has native VIA processing in the hardware.
- a software VIA emulator may be used, allowing software to be written to utilize the VIA interface.
- VIA NICs have to provide the ability to do RDMAW operations because RDMAW is the basic transport mechanism of the VIA interface.
- One consideration for CPP high availability strategy is that the RDMAW can take place with no OS support, because of the requirement that both the active and standby sides are in a locked interrupt state to prevent the OS state from changing.
- VIA is a channel architecture. Therefore, one or more logical pipes may exist through one physical pipe.
- VI Architectural Model 400 depicts the relationship between VI Consumer 402 and VI Provider 404 .
- VI Consumer 402 comprises Application 406 , OS Communication Interface 408 , and VI User Agent 410 .
- OS Communication Interface may consist of sockets, Message Passing Interface (MPI), Cluster, or other communication mechanisms.
- VI Provider 404 comprises VI Kernel Agent 412 , VI Send/Receive and Completion Module 414 , and VI Network Adapter 416 .
- VI Consumer 402 runs in the user mode and VI Provider 404 runs in the kernel mode as depicted in FIG. 4.
- the VI Consumer on the local node always specifies the location of the data.
- the sending process specifies the memory regions that contain the data to be sent.
- the receiving process specifies the memory regions where the data will be placed. Given a single connection, there is a one-to-one correspondence between send Descriptors on the transmitting side and receive Descriptors on the receiving side.
- the VI Consumer at the receiving end pre-posts a Descriptor to the receive queue of a VI send/receive module.
- the VI Consumer at the sending end can then post the message to the corresponding VI's send queue.
- the Send/Receive model of data transfer requires that the VI Consumers be notified of Descriptor completion at both ends of the transfer, for synchronization purposes.
- VI Consumers are responsible for managing flow control on a connection.
- the VI Consumer on the receiving side must post a Receive Descriptor of sufficient size before the sender's data arrives. If the Receive Descriptor at the head of the queue is not large enough to handle the incoming message, or the Receive Queue is empty, an error will occur.
- the connection may be broken if it is intended to be reliable.
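- The pre-posting rule above can be illustrated with a small simulation. This is a sketch, not the VI Architecture programming interface: `ReceiveQueue`, `ReceiveError`, `post`, and `deliver` are hypothetical names, and a descriptor is reduced to a buffer of a given size. It shows only the two failure cases the text names, an empty receive queue and a head descriptor that is too small.

```python
from collections import deque


class ReceiveError(Exception):
    """Raised when an incoming message cannot be placed (hypothetical type)."""


class ReceiveQueue:
    def __init__(self):
        self._descriptors = deque()  # pre-posted receive buffers, FIFO order
        self.completed = []          # messages placed into descriptors

    def post(self, size: int) -> None:
        """Pre-post a receive descriptor able to hold `size` bytes."""
        self._descriptors.append(bytearray(size))

    def deliver(self, message: bytes) -> None:
        """Place an incoming message into the descriptor at the head."""
        if not self._descriptors:
            raise ReceiveError("receive queue is empty")
        buffer = self._descriptors.popleft()
        if len(buffer) < len(message):
            raise ReceiveError("descriptor at head of queue is too small")
        buffer[:len(message)] = message
        self.completed.append(bytes(buffer[:len(message)]))


queue = ReceiveQueue()
queue.post(32)                 # receiver posts before the sender transmits
queue.deliver(b"heart beat")   # fits in the 32-byte descriptor
```

- Because the consumer manages flow control, a real sender would need some agreement with the receiver about how many descriptors are outstanding; that agreement is outside this sketch.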
- the VI Architecture differs from some existing models in that all Send/Receive operations are completed asynchronously.
- the initiator of the data transfer specifies both the source buffer and the destination buffer of the data transfer.
- the VI Consumer specifies the source of the data transfer in one of its local registered memory regions, and the destination of the data transfer within a remote memory region that has been registered on the remote system.
- the source of an RDMA Write can be specified as a gather list of buffers, while the destination must be a single, virtually contiguous region.
- the RDMA Write operation implies that prior to the data transfer, the VI Consumer at the remote end has informed the initiator of the RDMA Write of the location of the destination buffer, and that the buffer itself is enabled for RDMA Write operations.
- the remote location of the data is specified by its virtual address and its associated memory handle.
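- The addressing rules for an RDMA Write can be sketched as follows. This is a simulation under stated assumptions rather than a real adapter interface: `RegisteredRegion`, `rdma_write`, the integer memory handle, and the example addresses are all hypothetical. The sketch reflects three points from the text: the source may be a gather list, the destination is a single contiguous region identified by a virtual address plus a memory handle, and the destination must have been enabled for RDMA Write.

```python
from dataclasses import dataclass


@dataclass
class RegisteredRegion:
    base_address: int            # virtual address where the region starts
    data: bytearray              # the registered memory itself
    rdma_write_enabled: bool = False


def rdma_write(remote_regions: dict, handle: int, virtual_address: int,
               gather_list: list) -> int:
    """Copy a gather list into one contiguous span of a remote region."""
    region = remote_regions[handle]
    if not region.rdma_write_enabled:
        raise PermissionError("region is not enabled for RDMA Write")
    payload = b"".join(gather_list)  # gather the source buffers in order
    offset = virtual_address - region.base_address
    if offset < 0 or offset + len(payload) > len(region.data):
        raise ValueError("destination falls outside the registered region")
    region.data[offset:offset + len(payload)] = payload
    return len(payload)


# The remote side registers a region and tells the initiator where it is.
remote = {7: RegisteredRegion(base_address=0x1000, data=bytearray(16),
                              rdma_write_enabled=True)}
written = rdma_write(remote, handle=7, virtual_address=0x1004,
                     gather_list=[b"ab", b"cd", b"ef"])
```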
- the VI Consumer specifies the source of the data transfer at the remote end, and the destination of the data transfer within a locally registered memory region.
- the VI Consumer on the receiving side must post a Receive Descriptor to receive the Immediate Data, before the sender executes the RDMA Write. If no Descriptor is posted, an error will occur and the connection may be broken.
- RDMAW does not change OS state during operation, on both initiator and receiver.
- memory transfer may be implemented in a variety of ways. For example, the system could start at lower memory location and increment address as data is transferred, or start at high memory location and decrement address as data is transferred.
- Apparatus, systems, and methods consistent with the principles of the invention disclosed herein provide a high-availability architecture using high-speed pipes.
- the high-speed pipes may be implemented using logical pipes over an existing physical pipe.
- the high-speed pipes may also be implemented using conventional network interface cards.
- the apparatus disclosed herein should be understood to support the processes performed thereby, and, similarly, the processes disclosed herein should be understood to support the apparatus necessary to perform the steps of the processes. It should be further understood that the apparatus and methods disclosed herein may be implemented entirely in hardware, entirely in software, or a mixture of hardware and software.
- the apparatus and method consistent with the present invention and disclosed herein are related to apparatus and methods for a high-availability architecture using high-speed pipes.
- Parts of the architecture may be implemented in whole or in part by one or more sequences of instructions which carry out the apparatus and method described herein.
- Such instructions may be read by the computer systems or by network interface cards from a computer-readable medium, such as a storage device.
- Execution of sequences of instructions by the computer system or network interface cards causes performance of process steps consistent with the present invention described herein.
- Execution of sequences of instructions may also be considered to implement apparatus elements that perform the process steps.
- Hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention.
- embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
- Non-volatile memory media includes, for example, optical or magnetic disks.
- Volatile memory media includes RAM.
- Transmission media include, for example, coaxial cables, copper wire, and fiber optics, including the wires. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic storage medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read and use.
- Various forms of computer readable media may be involved in carrying one or more sequences of instructions for execution to implement the high-availability architecture described herein.
- the instructions may initially be carried on a magnetic disk or a remote computer.
- the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
- a modem local to a computer system can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infrared signal.
- An infra-red detector coupled to appropriate circuitry can receive the data carried in the infra-red signal and place the data on a bus.
- the bus may carry data to a memory, from which a processor retrieves and executes the instructions.
- the instructions received by the memory may optionally be stored on a storage device either before or after execution by the processor.
Abstract
Apparatus, system, and methods for a high availability computer system architecture using high-speed pipes are provided. An active computer system and a standby computer system are connected using a physical pipe for transferring data between the active computer system and the standby computer system. A first logical pipe is used for transferring data over the physical pipe, and a second logical pipe is used for transferring high-availability data over the physical pipe. Network-interface cards may be used to implement the high-speed pipes.
Description
- This application claims the benefit of provisional application “Methods and Apparatus for High-Availability Architecture Using High-Speed Pipes,” filed Jun. 2, 1999 bearing Serial No. 60/137,203, the contents of which are relied upon and incorporated by reference.
- A. Field of the Invention
- The present invention relates generally to high availability computer system architectures, and in particular to apparatus, systems, and methods for a high-availability computer system architecture using high-speed pipes.
- B. Description of the Related Art
- Conventional high-availability computer systems use special purpose, dedicated systems for implementing redundancy. For example, some conventional systems utilize two computer systems, one of which is active and the other standby, and special purpose hardware and software that interacts with each computer system to implement high-availability. The special purpose hardware and software communicates with the active computer system to capture status information so that in the event the active system goes down the standby system can start in place of the active system using the information collected by the special purpose hardware and software.
- Thus, conventional high-availability computer system architectures require special purpose hardware and software, which raises system costs. The additional costs make these systems very expensive. There is, therefore, a need for a high-availability computer system architecture that solves the problems associated with special purpose hardware and software high-availability systems.
- Apparatus, systems, and methods consistent with the present invention utilize available high-speed pipes to transfer information necessary for high-availability between two computer systems. In one embodiment, one or more logical pipes are implemented on a physical pipe between two computer systems. The use of the term pipe refers to a communication channel. A physical pipe refers to a physical communication channel. A logical pipe refers to a logical communication channel, and high-availability information refers to data transferred between systems for purposes of implementing a high-availability architecture. The logical pipes are used for data transfer between an active system and a standby system so that the standby system has the information necessary to take over from the active system if the active system fails in some way. In one embodiment, the logical pipes that transfer information necessary for implementing high-availability are part of a physical pipe that also carries other types of information used by the active system.
- The system may also use network interface cards to implement the high-speed pipes. The network interface cards (NIC) may be implemented using conventional interface cards without departing from the principles of the invention. For example, a NIC using Virtual Interface (VI) Architecture may be used.
- By using logical pipes on existing physical pipes, there are significant cost savings as compared to conventional systems that require dedicated pipes to transfer the high-availability information. Moreover, by using network interface cards, additional cost savings may be realized. Logical pipes and network interface cards may also be used in combination. Because the architecture reduces or eliminates special purpose hardware and software, costs are significantly reduced.
- An apparatus consistent with the present invention comprises a physical pipe for transferring data between an active system and a standby system. The apparatus further comprises a first logical pipe for transferring data over the physical pipe, and a second logical pipe for transferring high-availability data over the physical pipe.
- Another apparatus consistent with the present invention comprises a physical pipe for transferring data between an active system and a standby system. The apparatus further comprises a network interface card for transferring data and high-availability information over the physical pipe.
- Yet another apparatus consistent with the present invention includes a physical pipe for transferring data between an active system and a standby system. The apparatus further comprises a first logical pipe for transferring checkpointing data over the physical pipe, and a second logical pipe for transferring total system state data over the physical pipe.
- A system consistent with the present invention comprises a physical pipe. The system further comprises an active system for transferring data and high-availability information over the physical pipe, and a standby system for receiving the high-availability information from the physical pipe.
- A method in a high-availability system having an active system and a standby system is provided. According to this method, the active system sends a message to the standby system to enter a switch-over state. The standby system monitors a transfer complete marker. The method transfers total system state from the active system to the standby system. The method switches the high-availability system from the active system to the standby system upon detecting the transfer complete marker.
- Such apparatus, systems, and methods overcome the problems of conventional high-availability architectures described above. Additional advantages of the invention are apparent from the description which follows, and may be learned by practice of the invention. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
- The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the advantages and principles of the invention. In the drawings,
- FIG. 1 is a block diagram showing a high-availability computer system consistent with the present invention;
- FIG. 2 is a block diagram showing a method using a transfer complete marker consistent with the present invention;
- FIG. 3 is a block diagram showing transitions for Graceful Switch Over consistent with the present invention; and
- FIG. 4 is a block diagram illustrating the VI Architectural Model consistent with the present invention.
- Apparatus, systems, and methods consistent with the improved high-availability architecture disclosed herein use high-speed pipes to exchange information between an active computer system and a standby system. Appendix A, which contains a glossary of the terms and conventions used in describing the invention, is incorporated herein in its entirety as part of this Detailed Description.
- HSP System Overview
- The High Speed Pipes (HSP) system uses high-speed pipes to transfer information necessary for high-availability between two computer systems. This information exchange permits a standby computer system to take over in case the active system fails. The present system uses logical pipes on existing physical pipes, thereby realizing significant cost savings compared with conventional systems that require dedicated pipes for transferring high-availability information between computer systems.
- Due to legacy reasons, the call processing platform redundancy scheme is based on a 1+1 model and can be expanded to work in an n+1 redundancy model.
- The term database in this document means the protected memory region, hard disk drive files, and any data structures common to both the active and standby systems. The term pipe refers to a communication channel. A physical pipe refers to a physical communication channel. A logical pipe refers to a logical communication channel, and high-availability information refers to data transferred between systems for purposes of implementing a high-availability architecture.
- Interface A
- FIG. 1 illustrates a high-availability computer system 100. As shown in FIG. 1, active system 102 is interconnected with standby system 104. Active system 102 comprises a disk drive 106, memory 108, and CPU 109. Standby system 104 comprises a disk drive 110, memory 112, and CPU 113. Active system 102 and standby system 104 are interconnected via two interfaces or logical pipes: Interface A 114 and Interface B 116. Interface A 114 transfers two types of traffic:
- Operational and Management (OA&M) and Health status
- Transactions that change the state of the protected memory interface.
- Accordingly, as shown in FIG. 1, Interface A 114, one of the logical pipes, is used to transfer “Heart Beat” messages, that is, messages between active node 102 and standby node 104 that make one system aware of the other's existence or health. In addition, Interface A 114 is used to transfer “P.mem updates”: any protected memory updates that occur on the active node are replicated on the standby node. Also, as shown in FIG. 1, Interface A 114 is used to ensure disk redundancy. For example, any updates or write operations performed on active node disk 106 are replicated on standby node disk 110. One skilled in the art will appreciate that other operational and management and health status information may also be transferred using Interface A. For example, any configuration changes made on active node 102 are replicated on standby node 104 by transferring commands or inputs associated with any configuration changes to the standby node using Interface A.
- The characteristics of this interface are:
- Low latency
- Low CPU utilization
- Moderate to high bandwidth
- Low latency shortens the window during which a transaction placed into the logical pipe is still in transit, and so reduces the chance that the active system fails mid-transfer and leaves the database on the standby side inconsistent.
- Low CPU utilization is required due to the large number of transactions expected between the two systems during normal operation. The utilization of the main CPU by processes or tasks other than for maintaining high-availability should not be greater than 10% during normal operation.
- In many systems, the average traffic rate across this pipe will be relatively low, although some administrative operations will produce significant bursts of traffic. In one embodiment, the pipe is configured to work within a client-server configuration, allowing software to access the pipe in a manner similar to a socket-based TCP/IP connection. All data sent across this pipe will be encapsulated and sent as a message. Only complete transactions should be sent across this interface at any one time; otherwise a partial transaction could be on the standby side when the active node fails, leaving an inconsistent database on the standby side. Alternatively, the transaction could be built on the inactive side from partial transactions, and then applied as a single transaction once it has been fully built. Due to the symmetrical nature of the system, it can be assumed that if a transaction is completed on the active side, the same transaction will be complete on the standby side; therefore no rollback and retry functionality will be required for the first phase of this system.
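- The alternative described above, in which the inactive side builds a transaction from partial transactions and applies it only once it is complete, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation the specification describes: the fragment layout (transaction id, last-fragment flag, payload), the `StandbyAssembler` class, and the dictionary standing in for the database are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """One message on the logical pipe (hypothetical layout)."""
    txn_id: int
    last: bool       # True on the final fragment of a transaction
    updates: dict    # key -> new value carried by this fragment


@dataclass
class StandbyAssembler:
    """Buffers fragments and applies a transaction only once it is complete."""
    database: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)   # txn_id -> buffered updates

    def receive(self, frag: Fragment) -> bool:
        """Return True if this fragment completed and applied a transaction."""
        buffered = self.pending.setdefault(frag.txn_id, {})
        buffered.update(frag.updates)
        if not frag.last:
            return False
        # The whole transaction is present: apply it in one step.
        self.database.update(self.pending.pop(frag.txn_id))
        return True

    def active_failed(self) -> None:
        """On active-node failure, discard transactions that never completed."""
        self.pending.clear()


standby = StandbyAssembler(database={"call_17": "ringing"})
standby.receive(Fragment(txn_id=1, last=False, updates={"call_17": "connected"}))
db_after_partial = dict(standby.database)    # still the old state
standby.receive(Fragment(txn_id=1, last=True, updates={"call_18": "ringing"}))
db_after_complete = dict(standby.database)   # both updates now visible
```

Because buffered updates never touch `database` until the last fragment arrives, a failure of the active node in the middle of a transaction leaves the standby copy at the last complete transaction.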
- Failure of this link will cause the two databases to become inconsistent. Therefore, a procedure may be used to synchronize the databases without impact to the operation of the system. This synchronization could happen at any time. For example, it could be required following a hardware failure, a software error, or a human error (e.g., a cable is inadvertently removed).
- To ensure database consistency, some form of audit facility may be run periodically. It will be assumed that the active database is correct and any differences will be applied to the standby database in the case of inconsistency.
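- One possible shape for such an audit is sketched below in Python. It is an illustration only: the `audit` function and the use of dictionaries as stand-ins for the two databases are assumptions, and the specification does not say how differences are detected. The sketch follows the stated rule that the active database is taken to be correct.

```python
_MISSING = object()   # sentinel so that a stored None is not mistaken for "absent"


def audit(active_db: dict, standby_db: dict):
    """Make standby_db match active_db in place.

    Returns (updated_keys, removed_keys), each as a sorted list, so the
    caller can log what the audit had to correct.
    """
    updated = []
    for key, value in active_db.items():
        if standby_db.get(key, _MISSING) != value:
            standby_db[key] = value          # active copy wins
            updated.append(key)
    removed = [key for key in standby_db if key not in active_db]
    for key in removed:
        del standby_db[key]                  # entry unknown to the active side
    return sorted(updated), sorted(removed)


active = {"a": 1, "b": 2, "c": 3}
standby = {"a": 1, "b": 9, "d": 4}
updated_keys, removed_keys = audit(active, standby)
```

After the call, `standby` equals `active`; `updated_keys` is `["b", "c"]` and `removed_keys` is `["d"]`.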
- Interface B
- The second logical pipe, Interface B 116, between the two systems is used only during a Graceful Switch Over (GSO). GSO in this context refers to the ability to transfer control from one processing element to a standby element within a brief period, such as one second, without any impact to the functionality of the system. To facilitate GSO, a data transfer mechanism transfers the total system state of the active node at a particular time to the standby node in the least amount of time possible, allowing the standby node to continue where the previously active node stopped.
- Within the CPP system, this data transfer period is known as the stop and copy point. The requirements of the HSP during the stop and copy phase are considerably different from the requirements during normal operation.
- During the stop and copy phase the HSP must exhibit the following characteristics:
- Very High bandwidth.
- OS-independent data transfer.
- Does not change the system state on the active or inactive side.
- The receiving side must pend and be notified of completion without OS involvement.
- These requirements pose a number of technical challenges. Although many technologies offer very high bandwidth (IEEE 1394, Gigabit Ethernet, etc.), many of them require the use of OS services. A Direct Memory Access (DMA) engine fulfills most of the requirements, except for the ability to transfer data between two independent nodes. Remote Direct Memory Access (RDMA) has all the characteristics of a regular DMA engine, except that a DMA transaction can be performed across a pair of nodes, thus allowing a block of data to be transferred directly between the memory subsystems of two independent nodes. Two modes of RDMA exist: Remote Direct Memory Access Write (RDMAW) and Remote Direct Memory Access Read (RDMAR). RDMAW may be viewed as a push (the initiator writes directly into the recipient's memory) and RDMAR as a pull (the initiator reads the host's memory and copies the data into its own memory). All current adaptors support RDMAW; a few also support RDMAR. Even though FIG. 1 depicts RDMAW operation, one skilled in the art will appreciate that RDMAR may also be used.
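- The push and pull views of RDMA can be illustrated with a small in-process model. The Python sketch below is not RDMA: the `Node` class and the `rdma_write` and `rdma_read` functions are hypothetical stand-ins that only show which side initiates the copy and in which direction the data moves.

```python
class Node:
    """A node with a flat block of memory (hypothetical model)."""

    def __init__(self, name: str, size: int):
        self.name = name
        self.memory = bytearray(size)


def _check(node: Node, offset: int, length: int) -> None:
    if offset < 0 or length < 0 or offset + length > len(node.memory):
        raise ValueError(f"range lies outside the memory of {node.name}")


def rdma_write(initiator: Node, local_off: int,
               target: Node, remote_off: int, length: int) -> None:
    """Push: the initiator writes its own data into the target's memory."""
    _check(initiator, local_off, length)
    _check(target, remote_off, length)
    target.memory[remote_off:remote_off + length] = \
        initiator.memory[local_off:local_off + length]


def rdma_read(initiator: Node, local_off: int,
              target: Node, remote_off: int, length: int) -> None:
    """Pull: the initiator copies the target's data into its own memory."""
    _check(initiator, local_off, length)
    _check(target, remote_off, length)
    initiator.memory[local_off:local_off + length] = \
        target.memory[remote_off:remote_off + length]


active = Node("active", 16)
active.memory[0:5] = b"STATE"

standby_pushed = Node("standby", 16)
rdma_write(active, 0, standby_pushed, 0, 5)     # active initiates (RDMAW)

standby_pulled = Node("standby", 16)
rdma_read(standby_pulled, 0, active, 0, 5)      # standby initiates (RDMAR)
```

In both cases the standby ends up holding the same five bytes; the difference is only which node drives the operation.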
- The receiving side must know when the transfer is complete. In one embodiment, a small loop is entered on the receiving side where a memory address is monitored. When this location changes, the last byte of the transfer has completed, allowing the standby side to return out of the GSO and assume the role of the active node. For example, as shown in FIG. 2, address location 0x7ffff 206 on standby node 204 is set to 0x0000 initially and is monitored. When this location changes to a previously agreed upon value, the transfer complete marker, it indicates that the transfer has been completed and thus the standby side may assume the role of the active node. In FIG. 2, for example, a value of 0xfabe 210 is depicted as the transfer complete marker. One skilled in the art will appreciate that this value could be any non-zero value that the active node and the standby node have agreed to treat as the transfer complete marker.
- FIG. 3 is a block diagram illustrating an overview of the transitions taking place on both nodes during a GSO. Initially, Side A, the active side, is in normal active 302 state. Side B, the standby side, is in normal standby 304 state. In one embodiment a GSO event is always initiated by the active side ‘A’ by sending a message to the standby side ‘B’ to enter the GSO receive state. Thus, side A enters Start GSO 306 state and, upon receiving the Start GSO message, side B enters Start GSO 308 state as well. Side ‘A’ then enters a PRE-GSO state, the GSO Interrupt State (state 310), and waits for an acknowledgment from side ‘B’ that it is ready to receive the system image. Side ‘B’ stops all activity and enters a small loop looking for a specific memory location to change, the GSO Interrupt State (state 312). Side ‘A’ then initiates the RDMAW, and enters a loop, similar to side ‘B,’ to prevent it from restarting until the system image has been transferred (state 314). Side ‘B’ sends a done message to side ‘A’ when it detects that the transfer complete marker has changed (state 316), thus allowing side ‘A’ to restart (state 318) and become the standby node (state 322). Side ‘B’ then executes a return from interrupt or return from exception instruction, for example, IRET (state 320), causing the processor to continue from the point where side ‘A’ jumped into the GSO interrupt, thus assuming the role of the active node (state 324). One skilled in the art will appreciate that even though FIG. 3 depicts the state transitions in a particular order, the order of these state transitions may be changed.
- Use of a Network Interface Card (NIC) as HSP
- A commercially available NIC from Compaq (Servernet) that fulfills the requirements of the HSP may be used to implement the physical and the logical pipes. Recently the Servernet card has been externalized for the open systems server market, allowing it to be used as the HSP hardware. Some NICs implement a virtual interface architecture, such as the Virtual Interface Architecture (VIA) standard.
- Conventional NICs, such as the Servernet card, employ a dual interconnect fabric, denoted X and Y, allowing transparent link redundancy to be part of the standard interface. In one embodiment of the invention, the NIC has native VIA processing in the hardware. In another embodiment, a software VIA emulator may be used, allowing software to be written to utilize the VIA interface.
- An important feature of all VIA NICs is that they must provide the ability to do RDMAW operations, because RDMAW is the basic transport mechanism of the VIA interface. One consideration for the CPP high-availability strategy is that the RDMAW can take place with no OS support, because both the active and standby sides are required to be in a locked interrupt state to prevent the OS state from changing.
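- The stop and copy handoff that this RDMAW serves, in which the standby side watches a memory location for the transfer complete marker of FIG. 2, can be sketched in Python as follows. Several things here are assumptions made for illustration: two threads stand in for the two nodes, a `bytearray` stands in for standby memory, the marker is treated as a 16-bit big-endian word, and the polling loop has a timeout and a short sleep that the described embodiment (a tight loop in a locked interrupt state) would not have.

```python
import threading
import time

MARKER_OFFSET = 0x7FFFF       # address monitored on the standby node in FIG. 2
TRANSFER_COMPLETE = 0xFABE    # agreed non-zero marker value shown in FIG. 2


def active_push(image: bytes, standby_mem: bytearray) -> None:
    """Stand-in for the RDMAW: copy the image, then write the marker last."""
    standby_mem[0:len(image)] = image
    standby_mem[MARKER_OFFSET:MARKER_OFFSET + 2] = \
        TRANSFER_COMPLETE.to_bytes(2, "big")


def standby_wait(standby_mem: bytearray, timeout_s: float = 5.0) -> bool:
    """Poll the marker location; return True once the marker appears."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        word = int.from_bytes(
            standby_mem[MARKER_OFFSET:MARKER_OFFSET + 2], "big")
        if word == TRANSFER_COMPLETE:
            return True           # safe to assume the active role
        time.sleep(0.001)         # illustration only; see the lead-in
    return False


standby_mem = bytearray(MARKER_OFFSET + 2)    # marker word starts as 0x0000
image = b"total system state of side A"

pusher = threading.Thread(target=active_push, args=(image, standby_mem))
pusher.start()
switched_over = standby_wait(standby_mem)
pusher.join()
```

The ordering inside `active_push` carries the point of the scheme: the marker is written only after the image, so seeing the marker implies the image is already in place.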
- Software Interface
- The following is an overview of the software interface for the two logical HSPs. It should be noted that VIA is a channel architecture. Therefore, one or more logical pipes may exist through one physical pipe.
- Send/Receive
- The Send/Receive model of the known VI Architecture follows a well-known model of transferring data between two endpoints. As shown in FIG. 4, VI Architectural Model 400 depicts the relationship between VI Consumer 402 and VI Provider 404. VI Consumer 402 comprises Application 406, OS Communication Interface 408, and VI User Agent 410. OS Communication Interface may consist of sockets, Message Passing Interface (MPI), Cluster, or other communication mechanisms. VI Provider 404 comprises VI Kernel Agent 412, VI Send/Receive and Completion Module 414, and VI Network Adapter 416.
- In one implementation VI Consumer 402 runs in the user mode and VI Provider 404 runs in the kernel mode as depicted in FIG. 4.
- In this model, the VI Consumer on the local node always specifies the location of the data. On the sending side, the sending process specifies the memory regions that contain the data to be sent. On the receiving side, the receiving process specifies the memory regions where the data will be placed. Given a single connection, there is a one-to-one correspondence between send Descriptors on the transmitting side and receive Descriptors on the receiving side.
- The VI Consumer at the receiving end pre-posts a Descriptor to the receive queue of a VI send/receive module. The VI Consumer at the sending end can then post the message to the corresponding VI's send queue. The Send/Receive model of data transfer requires that the VI Consumers be notified of Descriptor completion at both ends of the transfer, for synchronization purposes. VI Consumers are responsible for managing flow control on a connection. The VI Consumer on the receiving side must post a Receive Descriptor of sufficient size before the sender's data arrives. If the Receive Descriptor at the head of the queue is not large enough to handle the incoming message, or the Receive Queue is empty, an error will occur. The connection may be broken if it is intended to be reliable. The VI Architecture differs from some existing models in that all Send/Receive operations are completed asynchronously.
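- The pre-posting rule described above can be modelled in a few lines of Python. This is a simplified, synchronous sketch and does not use the VI Provider Library: the `Endpoint` class, the `send` function, and `ReceiveError` are hypothetical names, completion is immediate rather than asynchronous, and what happens to a descriptor after an error is not modelled.

```python
from collections import deque


class ReceiveError(Exception):
    """Raised when no suitable Receive Descriptor is available."""


class Endpoint:
    """Receiving side of one connection (hypothetical model)."""

    def __init__(self):
        self.recv_queue = deque()   # pre-posted receive buffers, FIFO order
        self.completed = []         # messages delivered so far

    def post_recv(self, size: int) -> None:
        """Pre-post a Receive Descriptor able to hold `size` bytes."""
        self.recv_queue.append(bytearray(size))


def send(message: bytes, receiver: Endpoint) -> None:
    """Deliver `message` into the descriptor at the head of the queue."""
    if not receiver.recv_queue:
        raise ReceiveError("receive queue is empty")
    head = receiver.recv_queue[0]
    if len(head) < len(message):
        raise ReceiveError("receive descriptor at head of queue is too small")
    receiver.recv_queue.popleft()
    head[:len(message)] = message
    receiver.completed.append(bytes(head[:len(message)]))


standby = Endpoint()
standby.post_recv(64)                      # receiver posts first
send(b"heartbeat", standby)                # one send consumes one descriptor

errors = []
try:
    send(b"p.mem update", standby)         # nothing posted: error
except ReceiveError as exc:
    errors.append(str(exc))

standby.post_recv(4)
try:
    send(b"p.mem update", standby)         # posted buffer too small: error
except ReceiveError as exc:
    errors.append(str(exc))
```

Each successful send consumes exactly one posted descriptor, which mirrors the one-to-one correspondence between send and receive Descriptors; keeping enough descriptors posted is the flow control the VI Consumers are responsible for.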
- Remote Direct Memory Access (RDMA)
- In the RDMA Model, the initiator of the data transfer specifies both the source buffer and the destination buffer of the data transfer. There are two types of RDMA operations, RDMA Write and RDMA Read.
- For the RDMA Write operation, the VI Consumer specifies the source of the data transfer in one of its local registered memory regions, and the destination of the data transfer within a remote memory region that has been registered on the remote system. The source of an RDMA Write can be specified as a gather list of buffers, while the destination must be a single, virtually contiguous region. The RDMA Write operation implies that prior to the data transfer, the VI Consumer at the remote end has informed the initiator of the RDMA Write of the location of the destination buffer, and that the buffer itself is enabled for RDMA Write operations. The remote location of the data is specified by its virtual address and its associated memory handle. For the RDMA Read operation, the VI Consumer specifies the source of the data transfer at the remote end, and the destination of the data transfer within a locally registered memory region. The VI Consumer on the receiving side must post a Receive Descriptor to receive the Immediate Data, before the sender executes the RDMA Write. If no Descriptor is posted, an error will occur and the connection may be broken.
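- The RDMA Write constraints described above (a gather list as the source, a single contiguous destination, and a destination identified by virtual address plus memory handle) can be sketched as follows. The Python model is an assumption-laden illustration, not the VI Architecture API: `RemoteRegion`, `rdma_write_gather`, the integer handle, and the choice of exceptions are all hypothetical.

```python
class RemoteRegion:
    """A registered, virtually contiguous region on the remote node."""

    def __init__(self, base_addr: int, size: int, handle: int,
                 rdma_write_enabled: bool = True):
        self.base_addr = base_addr
        self.handle = handle
        self.rdma_write_enabled = rdma_write_enabled
        self.memory = bytearray(size)


def rdma_write_gather(sources, region: RemoteRegion,
                      dest_addr: int, handle: int) -> int:
    """Write the gather list `sources` contiguously at `dest_addr`.

    Returns the number of bytes written.
    """
    if not region.rdma_write_enabled:
        raise PermissionError("region is not enabled for RDMA Write")
    if handle != region.handle:
        raise PermissionError("memory handle does not match the region")
    data = b"".join(sources)                 # gather the local buffers
    offset = dest_addr - region.base_addr
    if offset < 0 or offset + len(data) > len(region.memory):
        raise ValueError("destination lies outside the registered region")
    region.memory[offset:offset + len(data)] = data
    return len(data)


# The remote side is assumed to have told the initiator the address and handle.
region = RemoteRegion(base_addr=0x1000, size=32, handle=42)
written = rdma_write_gather([b"cpu-", b"regs|", b"heap"], region,
                            dest_addr=0x1004, handle=42)
```

The three source buffers land back to back starting four bytes into the region, so `region.memory[4:17]` holds `b"cpu-regs|heap"`.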
- The following, using Servernet as an example, is a list of actions and VI Architecture calls required to support both HSP links required for the CPP high availability pipes.
- Hardware init
- ServernetInit
- ServernetReset
- Hardware Connection
- VipOpenNic
- VipCloseNic
- Endpoint Creation and Destruction
- VipCreateVi
- VipDestroyVi
- Connection Management
- VipConnectWait
- VipConnectAccept
- VipConnectRequest
- VipDisconnect
- Data transfer
- VipPostSend
- VipSendDone
- VipSendWait
- VipPostRecv
- VipRecvDone
- VipRecvWait
- Querying Information
- VipQueryNic
- VipSetViAttributes
- VipQueryVi
- VipQuerySystemManagementInfo
- Special requirements for stop and copy functionality.
- RDMAW does not change OS state during operation, on both initiator and receiver.
- During an RDMAW operation, the memory transfer may be implemented in a variety of ways. For example, the system could start at a low memory location and increment the address as data is transferred, or start at a high memory location and decrement the address as data is transferred.
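- The two transfer orders can be compared with a short Python sketch. The `copy_image` function, its chunk size, and the returned list of offsets are assumptions for illustration; a real RDMAW engine moves the data in hardware.

```python
def copy_image(src: bytes, dst: bytearray, chunk: int = 4,
               ascending: bool = True) -> list:
    """Copy src into dst chunk by chunk; return chunk offsets in copy order."""
    if len(dst) < len(src):
        raise ValueError("destination is smaller than the image")
    starts = list(range(0, len(src), chunk))
    if not ascending:
        starts.reverse()                      # begin at the high end instead
    for start in starts:
        end = min(start + chunk, len(src))    # final chunk may be short
        dst[start:end] = src[start:end]
    return starts


image = bytes(range(10))

dst_up = bytearray(10)
order_up = copy_image(image, dst_up, ascending=True)      # low to high

dst_down = bytearray(10)
order_down = copy_image(image, dst_down, ascending=False)  # high to low
```

Both orders produce the same destination contents. As an observation that goes slightly beyond the text, the order determines which location is written last, and therefore where a transfer complete marker that relies on being the final write would need to sit.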
- Apparatus, systems, and methods consistent with the principles of the invention disclosed herein provide a high-availability architecture using high-speed pipes. The high-speed pipes may be implemented using logical pipes over an existing physical pipe. The high-speed pipes may also be implemented using conventional network interface cards.
- It will be apparent to those skilled in the art that various modifications and variations can be made in the high-availability apparatus, system, and methods consistent with the principles of the present invention without departing from the scope or spirit of the invention. Although several embodiments have been described above, other variations are possible within the spirit and scope consistent with the principles of the present invention.
- Although the invention has been described in terms of two systems, the principles may be applied to more than two systems. The principles of the invention, as disclosed herein, may be used in any environment requiring high-availability. For example, the principles may be used in financial settings or call-processing systems.
- The apparatus disclosed herein should be understood to support the processes performed thereby, and, similarly, the processes disclosed herein should be understood to support the apparatus necessary to perform the steps of the processes. It should be further understood that the apparatus and methods disclosed herein may be implemented entirely in hardware, entirely in software, or a mixture of hardware and software.
- The apparatus and method consistent with the present invention and disclosed herein are related to apparatus and methods for a high-availability architecture using high-speed pipes. Parts of the architecture may be implemented in whole or in part by one or more sequences of instructions which carry out the apparatus and method described herein. Such instructions may be read by the computer systems or by network interface cards from a computer-readable medium, such as a storage device. Execution of sequences of instructions by the computer system or network interface cards causes performance of process steps consistent with the present invention described herein. Execution of sequences of instructions may also be considered to implement apparatus elements that perform the process steps. Hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
- The term “computer-readable medium” as used herein refers to any medium that may store instructions for execution. Such a medium may take many forms, including but not limited to, non-volatile memory media, volatile memory media, and transmission media. Non-volatile memory media includes, for example, optical or magnetic disks. Volatile memory media includes RAM. Transmission media includes, for example, coaxial cables, copper wire and fiber optics, including the wires. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic storage medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read and use.
- Various forms of computer readable media may be involved in carrying one or more sequences of instructions for execution to implement the high-availability architecture described herein. For example, the instructions may initially be carried on a magnetic disk or a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to a computer system can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infrared signal. An infra-red detector coupled to appropriate circuitry can receive the data carried in the infra-red signal and place the data on a bus. The bus may carry data to a memory, from which a processor retrieves and executes the instructions. The instructions received by the memory may optionally be stored on a storage device either before or after execution by the processor.
- Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed embodiments. The specification and examples are exemplary only, and the true scope and spirit of the invention is defined by the following claims and their equivalents.
Claims (24)
1. An apparatus for implementing a high-availability computer system architecture, comprising:
a physical pipe for transferring data between an active computer system and a standby computer system;
a first logical pipe for transferring data over the physical pipe; and
a second logical pipe for transferring high-availability data over the physical pipe.
2. The apparatus of claim 1, wherein the data transferred between the active computer system and the standby computer system on the first logical pipe comprises checkpointing data.
3. The apparatus of claim 1, wherein the high-availability data transferred between the active computer system and the standby computer system on the second logical pipe comprises total system state data of the active computer system.
4. The apparatus of claim 1, wherein the second logical pipe uses remote direct memory access write operations for transferring high-availability data.
5. The apparatus of claim 1, wherein the second logical pipe uses remote direct memory access read operations for transferring high-availability data.
6. An apparatus for implementing a high-availability computer system architecture, comprising:
a physical pipe for transferring data between an active computer system and a standby computer system; and
a network interface card for transferring data and high-availability information over the physical pipe.
7. The apparatus of claim 6, wherein the data transferred between the active computer system and the standby computer system on the network interface card comprises checkpointing data.
8. The apparatus of claim 6, wherein the high-availability information transferred between the active computer system and the standby computer system on the network interface card comprises total system state data of the active computer system.
9. The apparatus of claim 6, wherein the second logical pipe uses remote direct memory access write operations for transferring high-availability data.
10. The apparatus of claim 6, wherein the second logical pipe uses remote direct memory access read operations for transferring high-availability data.
11. A system for implementing a high-availability computer system architecture, comprising:
a physical pipe;
an active computer system for transferring data and high-availability information over the physical pipe; and
a standby computer system for receiving the high-availability information from the physical pipe.
12. The system according to claim 11, wherein the active computer system further comprises:
an interface card for transferring the data and high-availability information.
13. The system according to claim 11, wherein the standby computer system further comprises:
an interface card for receiving the high-availability information.
14. A system for implementing a high-availability computer system architecture, comprising:
physical means for transferring data between an active computer system and a standby computer system;
a first logical means for transferring data over the physical means; and
a second logical means for transferring high-availability data over the physical means.
15. The system of claim 14, wherein the data transferred between the active computer system and the standby computer system on the first logical means comprises checkpointing data.
16. The system of claim 14, wherein the high-availability data transferred between the active computer system and the standby computer system on the second logical means comprises total system state data of the active computer system.
17. The system of claim 14, wherein the second logical means uses remote direct memory access read and write operations for transferring high-availability data.
18. An apparatus for implementing a high-availability computer system architecture, comprising:
a physical pipe for transferring data between an active computer system and a standby computer system;
a first logical pipe for transferring checkpointing data over the physical pipe; and
a second logical pipe for transferring total system state data over the physical pipe.
19. The apparatus of claim 18, wherein the second logical pipe uses remote direct memory access write operations for transferring total system state data over the physical pipe.
20. The apparatus of claim 18, wherein the second logical pipe uses remote direct memory access read operations for transferring total system state data over the physical pipe.
21. A method in a high-availability computer system having an active computer system and a standby computer system, the method comprising the steps of:
sending a message to the standby computer system to enter a switch-over state;
monitoring a transfer complete marker;
transferring total system state from the active computer system to the standby computer system; and
switching from the active computer system to the standby computer system upon detecting the transfer complete marker.
22. The method of claim 21, wherein the step of transferring total system state from the active computer system to the standby computer system, further includes the step of:
performing remote direct memory access read and write operations.
23. A computer-readable medium containing instructions for performing a method in a high-availability computer system having an active computer system and a standby computer system, the method comprising the steps of:
sending a message to the standby computer system to enter a switch-over state;
monitoring a transfer complete marker;
transferring total system state from the active computer system to the standby computer system; and
switching from the active computer system to the standby computer system upon detecting the transfer complete marker.
24. The computer-readable medium of claim 23, wherein the step of transferring total system state from the active computer system to the standby computer system, further includes the step of:
performing remote direct memory access read and write operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/692,252 US20040117687A1 (en) | 1999-06-02 | 2003-10-23 | High-availability architecture using high-speed pipes |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13720399P | 1999-06-02 | 1999-06-02 | |
US09/585,577 US6715099B1 (en) | 1999-06-02 | 2000-06-02 | High-availability architecture using high-speed pipes |
US10/692,252 US20040117687A1 (en) | 1999-06-02 | 2003-10-23 | High-availability architecture using high-speed pipes |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/585,577 Continuation US6715099B1 (en) | 1999-06-02 | 2000-06-02 | High-availability architecture using high-speed pipes |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040117687A1 true US20040117687A1 (en) | 2004-06-17 |
Family
ID=31996568
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/585,577 Expired - Lifetime US6715099B1 (en) | 1999-06-02 | 2000-06-02 | High-availability architecture using high-speed pipes |
US10/692,252 Abandoned US20040117687A1 (en) | 1999-06-02 | 2003-10-23 | High-availability architecture using high-speed pipes |
Country Status (1)
Country | Link |
---|---|
US (2) | US6715099B1 (en) |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5715386A (en) * | 1992-09-30 | 1998-02-03 | Lucent Technologies Inc. | Apparatus and methods for software rejuvenation |
US5987621A (en) * | 1997-04-25 | 1999-11-16 | Emc Corporation | Hardware and software failover services for a file server |
US20010056554A1 (en) * | 1997-05-13 | 2001-12-27 | Michael Chrabaszcz | System for clustering software applications |
US6363497B1 (en) * | 1997-05-13 | 2002-03-26 | Micron Technology, Inc. | System for clustering software applications |
US5951695A (en) * | 1997-07-25 | 1999-09-14 | Hewlett-Packard Company | Fast database failover |
US5974114A (en) * | 1997-09-25 | 1999-10-26 | At&T Corp | Method and apparatus for fault tolerant call processing |
US6298457B1 (en) * | 1997-10-17 | 2001-10-02 | International Business Machines Corporation | Non-invasive networked-based customer support |
US6081851A (en) * | 1997-12-15 | 2000-06-27 | Intel Corporation | Method and apparatus for programming a remote DMA engine residing on a first bus from a destination residing on a second bus |
US6378021B1 (en) * | 1998-02-16 | 2002-04-23 | Hitachi, Ltd. | Switch control method and apparatus in a system having a plurality of processors |
US6374262B1 (en) * | 1998-03-25 | 2002-04-16 | Fujitsu Limited | Relational database synchronization method and a recording medium storing a program therefore |
US6115829A (en) * | 1998-04-30 | 2000-09-05 | International Business Machines Corporation | Computer system with transparent processor sparing |
US6205557B1 (en) * | 1998-06-09 | 2001-03-20 | At&T Corp. | Redundant call processing |
US6427213B1 (en) * | 1998-11-16 | 2002-07-30 | Lucent Technologies Inc. | Apparatus, method and system for file synchronization for a fault tolerate network |
US6263363B1 (en) * | 1999-01-28 | 2001-07-17 | Skydesk, Inc. | System and method for creating an internet-accessible working replica of a home computer on a host server controllable by a user operating a remote access client computer |
US6463342B1 (en) * | 2000-04-19 | 2002-10-08 | Ford Motor Company | Method for preventing computer down time |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7085226B1 (en) * | 1999-10-01 | 2006-08-01 | Lg Electronics Inc. | Control apparatus and method for relay node duplexing |
US20050283543A1 (en) * | 2002-03-12 | 2005-12-22 | Hawkins Peter A | Redundant system management controllers |
US7337243B2 (en) * | 2002-03-12 | 2008-02-26 | Intel Corporation | Redundant system management controllers |
US7117393B2 (en) | 2003-08-26 | 2006-10-03 | Hitachi, Ltd. | Failover method in a redundant computer system with storage devices |
US20060150005A1 (en) * | 2004-12-21 | 2006-07-06 | Nec Corporation | Fault tolerant computer system and interrupt control method for the same |
US7441150B2 (en) * | 2004-12-21 | 2008-10-21 | Nec Corporation | Fault tolerant computer system and interrupt control method for the same |
US20060250946A1 (en) * | 2005-04-19 | 2006-11-09 | Marian Croak | Method and apparatus for maintaining active calls during failover of network elements |
US8593939B2 (en) * | 2005-04-19 | 2013-11-26 | At&T Intellectual Property Ii, L.P. | Method and apparatus for maintaining active calls during failover of network elements |
US20130326261A1 (en) * | 2012-06-04 | 2013-12-05 | Verizon Patent And Licensing Inc. | Failover of interrelated services on multiple devices |
US8935562B2 (en) * | 2012-06-04 | 2015-01-13 | Verizon Patent And Licensing Inc. | Failover of interrelated services on multiple devices |
Also Published As
Publication number | Publication date |
---|---|
US6715099B1 (en) | 2004-03-30 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US6715099B1 (en) | High-availability architecture using high-speed pipes | |
JP3266481B2 (en) | Method and associated apparatus for recovering from a failure in a disk access path of a clustered computing system | |
US8191078B1 (en) | Fault-tolerant messaging system and methods | |
JP3718471B2 (en) | Crash recovery without full remirror | |
US5878205A (en) | Method and system for processing complex recovery using polling signals in a shared medium | |
AU723208B2 (en) | Fault resilient/fault tolerant computing | |
JP3156083B2 (en) | Fault-tolerant computer equipment | |
US7600087B2 (en) | Distributed remote copy system | |
US7194652B2 (en) | High availability synchronization architecture | |
CN100591031C (en) | Methods and apparatus for implementing a high availability fibre channel switch | |
US7290086B2 (en) | Method, apparatus and program storage device for providing asynchronous status messaging in a data storage system | |
US8375363B2 (en) | Mechanism to change firmware in a high availability single processor system | |
US6718347B1 (en) | Method and apparatus for maintaining coherence among copies of a database shared by multiple computers | |
US7167963B2 (en) | Storage system with multiple remote site copying capability | |
US8583755B2 (en) | Method and system for communicating between memory regions | |
US9948545B2 (en) | Apparatus and method for failover of device interconnect using remote memory access with segmented queue | |
JP2003503796A (en) | Intelligent splitter, system, and usage | |
JP2002041348A (en) | Communication pass through shared system resource to provide communication with high availability, network file server and its method | |
JP2002525748A (en) | Protocol for replication server | |
US7065673B2 (en) | Staged startup after failover or reboot | |
US7987154B2 (en) | System, a method and a device for updating a data set through a communication network | |
JP4498389B2 (en) | Multi-node computer system | |
JPH086910A (en) | Cluster type computer system | |
US8595452B1 (en) | System and method for streaming data conversion and replication | |
EP1001344A2 (en) | Apparatus, method and system for file synchronization for a fault tolerant network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |