CN102355369A - Virtual clustered system as well as processing method and processing device thereof

Publication number: CN102355369A
Authority: CN (China)
Prior art keywords: node, host node, virtual machine, slave node, common host node
Legal status: Granted; current status Expired - Fee Related
Application number: CN2011103017960A
Other languages: Chinese (zh)
Other versions: CN102355369B (en)
Inventor: 江滢
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by: Huawei Technologies Co Ltd
Priority to: CN201110301796.0A (patent CN102355369B); PCT/CN2012/082196 (patent WO2013044828A1)
Publications: CN102355369A (application); CN102355369B (grant)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L41/0836 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability to enhance reliability, e.g. reduce downtime

Abstract

The invention discloses a virtual cluster system and a processing method and processing device thereof. The system comprises at least two subregions, each comprising one host node and at least one slave node; each host node and each slave node is provided with at least one virtual machine; a peer-type architecture is used between the host nodes of different subregions, and a star architecture is used between the host node and the slave nodes within each subregion. The host nodes comprise one management host node and at least one common host node. When a common host node or slave node fails, the management host node reselects a new common host node or slave node in the subregion of the failed node; when a virtual machine on a common host node or slave node fails, the management host node restarts the virtual machine. Embodiments of the invention can improve the scalability and availability of the system.

Description

Virtual cluster system, and processing method and device thereof
Technical field
The present invention relates to network communication technology, and in particular to a virtual cluster system and a processing method and device thereof.
Background
Cluster systems offer strong overall computing, storage, and management performance, a single-system-image service form, and availability guarantees and fault tolerance that are transparent to users, and they have become the mainstream infrastructure architecture of data centers. Virtualization technology gives cluster development a better and more promising direction. It allows one platform to run multiple operating systems at the same time, with applications running in separate spaces without affecting one another, which significantly improves the working efficiency of the computer. Running multiple virtual machines makes full use of the computing potential of a physical server and gives the data center a fast response capability.
After virtualization technology is introduced, scalability and high availability are the greatest challenges facing cluster systems.
Summary of the invention
Embodiments of the invention provide a virtual cluster system and a processing method and device thereof, which improve the scalability and availability of a virtual machine cluster system.
An embodiment of the invention provides a processing method for a virtual cluster system, comprising:
A node judges whether at least one of the following has occurred: a failed common host node exists, a failed slave node exists, or a faulty virtual machine exists;
After determining that a failed common host node exists, the node reselects a new common host node; after determining that a failed slave node exists, it reselects a new slave node; or, after determining that a faulty virtual machine exists, it restarts the virtual machine;
Wherein the common host nodes and slave nodes are divided into at least two subregions, each subregion comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node; a peer-type architecture is adopted between the host nodes of different subregions; a star architecture is adopted between the host node and the slave nodes within each subregion; and the host nodes comprise one management host node and at least one common host node.
An embodiment of the invention provides a processing device for a virtual cluster system, comprising:
A judging unit, used to judge whether at least one of the following has occurred: a failed common host node exists, a failed slave node exists, or a faulty virtual machine exists;
A processing unit, used to reselect a new common host node after it is determined that a failed common host node exists; to reselect a new slave node after it is determined that a failed slave node exists; or to restart the virtual machine after it is determined that a faulty virtual machine exists;
Wherein the common host nodes and slave nodes are divided into at least two subregions, each subregion comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node; a peer-type architecture is adopted between the host nodes of different subregions; a star architecture is adopted between the host node and the slave nodes within each subregion; and the host nodes comprise one management host node and at least one common host node.
An embodiment of the invention provides a virtual cluster system, comprising:
At least two subregions, each comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node;
A peer-type architecture is adopted between the host nodes of different subregions;
A star architecture is adopted between the host node and the slave nodes within each subregion;
The host nodes comprise one management host node and at least one common host node; the management host node is used to reselect, after a common host node or slave node fails, a new common host node or slave node in the subregion where the failed common host node or slave node is located, or to restart the virtual machine when a virtual machine on a common host node or slave node fails.
As can be seen from the above technical solution, the virtual cluster system of the embodiments of the invention is divided into subregions, so the system can be expanded by adding subregions; the peer-type structure between the host nodes of the subregions eliminates the bottleneck problem and improves reliability; and reselecting a new host node or slave node, or restarting a virtual machine, further improves reliability.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the system structure of the first embodiment of the invention;
Fig. 2 is a schematic flow diagram of the method of the first embodiment of the invention;
Fig. 3 is a schematic diagram of the device structure of the first embodiment of the invention;
Fig. 4 is a schematic flow diagram of the method of the second embodiment of the invention;
Fig. 5 is a schematic diagram of the system structure of the second embodiment of the invention;
Fig. 6 is a schematic flow diagram of the method of the third embodiment of the invention;
Fig. 7 is a schematic diagram of the system structure of the third embodiment of the invention;
Fig. 8 is a schematic flow diagram of the method of the fourth embodiment of the invention;
Fig. 9 is a schematic diagram of the system structure of the fourth embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Fig. 1 is a schematic diagram of the system structure of the first embodiment of the invention. Referring to Fig. 1, the system comprises at least two subregions 1, each comprising one host node (master) 11 and at least one slave node (slave) 12; at least one virtual machine (Virtual Machine, VM) 13 is provided on each host node 11 and on each slave node 12.
For example, referring to Fig. 1, the host nodes include host node A, host node B, host node C, etc.; the slave nodes in the subregion of host node A include slave node a1, slave node a2, etc.; the slave nodes in the subregion of host node B include slave node b1, slave node b2, etc.; and the slave nodes in the subregion of host node C include slave node c1, slave node c2, etc.
A peer-type architecture is adopted between the host nodes 11 of different subregions; that is, a host node can send resource state information to any other host node and can receive resource state information sent by any other host node. A star architecture is adopted between the host node 11 and the slave nodes 12 within each subregion; that is, a slave node can send resource state information to its host node, while the host node does not send resource state information to the slave nodes. The resource state information can indicate whether the corresponding node is normal or has failed.
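As a rough illustration of the two architectures just described, the following Python sketch encodes which node may send resource state information to which. The function name `may_send_resource_state` and the `host_of` mapping are hypothetical and do not appear in the patent; the sketch only assumes that every node knows the host node of its own subregion.

```python
def may_send_resource_state(sender: str, receiver: str, host_of: dict) -> bool:
    """Return True if `sender` may send resource state information to `receiver`.

    `host_of` (hypothetical) maps every node ID to the host node ID of its
    subregion; a host node maps to itself.
    """
    if sender == receiver:
        return False
    sender_is_host = host_of[sender] == sender
    receiver_is_host = host_of[receiver] == receiver
    if sender_is_host and receiver_is_host:
        return True  # peer-type: any host node to any other host node
    if not sender_is_host and receiver_is_host:
        return host_of[sender] == receiver  # star: slave node to its own host node
    return False  # host-to-slave and slave-to-slave state reports are not used


# Topology of Fig. 1, reduced to two subregions for brevity.
HOST_OF = {"A": "A", "a1": "A", "a2": "A", "B": "B", "b1": "B"}
```

With this mapping, host node A may report to host node B and slave node a1 may report to host node A, but host node A does not report to slave node a1.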
The host nodes comprise one management host node (master leader) and at least one common host node. The management host node is used to reselect, after a common host node or slave node fails, a new common host node or slave node in the subregion where the failed common host node or slave node is located, or to restart the virtual machine when a virtual machine on a common host node or slave node fails.
One of the host nodes can be configured in advance as the management host node, and the remaining host nodes are common host nodes. The management host node can store information about every host node and slave node and about the virtual machines on those nodes, manages all nodes in the subregions in a unified way, and handles faults in a unified way after they occur. For example, referring to Fig. 1, host node C can be set as the management host node, and host node A, host node B, etc. are common host nodes.
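The structural constraints of the first embodiment can be written down as a small data model. This is a minimal sketch, not the patent's implementation: the class names `Node`, `Subregion`, and `Cluster`, the `validate` method, and the virtual machine names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    node_id: str
    vms: List[str] = field(default_factory=list)  # at least one VM per node


@dataclass
class Subregion:
    host: Node          # exactly one host node (master)
    slaves: List[Node]  # at least one slave node


@dataclass
class Cluster:
    subregions: List[Subregion]  # at least two subregions
    leader_id: str               # ID of the management host node (master leader)

    def validate(self) -> None:
        """Raise ValueError if a structural constraint of the system is violated."""
        if len(self.subregions) < 2:
            raise ValueError("at least two subregions are required")
        for sub in self.subregions:
            if not sub.slaves:
                raise ValueError("each subregion needs at least one slave node")
            for node in [sub.host, *sub.slaves]:
                if not node.vms:
                    raise ValueError(f"node {node.node_id} has no virtual machine")
        if self.leader_id not in {s.host.node_id for s in self.subregions}:
            raise ValueError("the management host node must be a host node")

    def common_hosts(self) -> List[str]:
        """All host nodes other than the management host node."""
        return [s.host.node_id for s in self.subregions
                if s.host.node_id != self.leader_id]


def fig1_cluster() -> Cluster:
    """Build the example of Fig. 1 with host node C as the management host node."""
    def node(name: str) -> Node:
        return Node(name, [f"vm-{name}"])  # VM names are made up

    layout = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}
    subregions = [Subregion(node(h), [node(s) for s in ss]) for h, ss in layout.items()]
    return Cluster(subregions, leader_id="C")
```

Calling `fig1_cluster().common_hosts()` yields host nodes A and B, matching the example in which host node C is the management host node.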
Corresponding to the above system, the flow among the devices can be as follows.
Fig. 2 is a schematic flow diagram of the method of the first embodiment of the invention, comprising:
Step 21: A node judges whether at least one of the following has occurred: a failed common host node exists, a failed slave node exists, or a faulty virtual machine exists;
Step 22: After determining that a failed common host node exists, the node reselects a new common host node; after determining that a failed slave node exists, it reselects a new slave node; or, after determining that a faulty virtual machine exists, it restarts the virtual machine;
Wherein the common host nodes and slave nodes are divided into at least two subregions, each subregion comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node; a peer-type architecture is adopted between the host nodes of different subregions; a star architecture is adopted between the host node and the slave nodes within each subregion; and the host nodes comprise one management host node and at least one common host node.
The above node may specifically be a common host node, the management host node, or a slave node; for different nodes and different scenarios, the above flow can be implemented differently. For details, see the subsequent embodiments.
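Steps 21 and 22 amount to a mapping from the kind of fault detected to a recovery action. The sketch below shows that mapping only; the `Fault` enumeration, the `handle` function, and the action strings are hypothetical names chosen for illustration.

```python
from enum import Enum


class Fault(Enum):
    COMMON_HOST_FAILED = "common host node failed"
    SLAVE_FAILED = "slave node failed"
    VM_FAULT = "virtual machine fault"


def handle(faults):
    """Step 22: map each fault found in step 21 to its recovery action."""
    actions = {
        Fault.COMMON_HOST_FAILED: "reselect new common host node",
        Fault.SLAVE_FAILED: "reselect new slave node",
        Fault.VM_FAULT: "restart virtual machine",
    }
    return [actions[f] for f in faults]
```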
Accordingly, the device corresponding to this method can be as follows.
Fig. 3 is a schematic diagram of the device structure of the first embodiment of the invention, comprising a judging unit 31 and a processing unit 32. The judging unit 31 is used to judge whether at least one of the following has occurred: a failed common host node exists, a failed slave node exists, or a faulty virtual machine exists. The processing unit 32 is used to reselect a new common host node after it is determined that a failed common host node exists; to reselect a new slave node after it is determined that a failed slave node exists; or to restart the virtual machine after it is determined that a faulty virtual machine exists;
Wherein the common host nodes and slave nodes are divided into at least two subregions, each subregion comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node; a peer-type architecture is adopted between the host nodes of different subregions; a star architecture is adopted between the host node and the slave nodes within each subregion; and the host nodes comprise one management host node and at least one common host node.
Of course, corresponding to the above method flow, the above device can be a common host node, the management host node, or a slave node, and the specific functions of the above units differ for different nodes and scenarios. For details, see the following embodiments.
The virtual cluster system of the embodiments of the invention is divided into subregions, so the system can be expanded by adding subregions; the peer-type structure between the host nodes of the subregions eliminates the bottleneck problem and improves reliability; and reselecting a new host node or slave node, or restarting a virtual machine, further improves reliability.
Fig. 4 is a schematic flow diagram of the method of the second embodiment of the invention, and Fig. 5 is a schematic diagram of the system structure of the second embodiment of the invention. This embodiment takes the failure of a common host node as an example.
Referring to Fig. 4, this embodiment comprises:
Step 41: During normal operation of the cluster, the common host nodes of the subregions detect each other's heartbeats through their heartbeat detection modules (Heartbeat Sync).
For example, the heartbeat detection module of common host node A sends heartbeat messages to the heartbeat detection module of common host node B.
Step 42: If the heartbeat detection module of common host node B detects that the heartbeat messages of common host node A have stopped, it multicasts a failure message that carries the identification information of common host node A, to indicate that common host node A has failed.
Here, if common host node B does not receive a heartbeat message from common host node A within a certain time, it determines that the heartbeat of common host node A has stopped.
The identification information is used to distinguish the nodes, for example the ID or address of common host node A.
All the remaining common host nodes and the management host node can receive this failure message.
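The timeout rule in steps 41 and 42 can be sketched as follows. This is an illustrative sketch under stated assumptions: the class `HeartbeatMonitor`, the dictionary layout of the failure message, and the explicit `now` timestamps are hypothetical, and the actual multicast transport is left out.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time of each peer and reports timed-out peers."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s  # the "certain time" of step 42 (value is an assumption)
        self.last_seen = {}

    def record(self, peer_id: str, now: float) -> None:
        """Called whenever a heartbeat message from `peer_id` arrives."""
        self.last_seen[peer_id] = now

    def failed_peers(self, now: float):
        """Peers whose last heartbeat is older than the timeout."""
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout_s)


def failure_message(failed_id: str) -> dict:
    """Failure message to be multicast; it carries the failed node's identification."""
    return {"type": "failure", "failed_node": failed_id}
```

A caller on common host node B would invoke `record` for each heartbeat received, poll `failed_peers` periodically, and multicast `failure_message(peer)` for each peer returned.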
Step 43: After the heartbeat detection module of the management host node receives the failure message, it reports a host node failure message to the high availability (High Availability, HA) module of the management host node; the host node failure message carries the identification information of common host node A.
Step 44: The HA module of the management host node reselects one slave node in the subregion of common host node A as the new common host node of that subregion.
For example, according to the ID priority and the dynamic load of each slave node, slave node a1 in the subregion of A is selected as the new common host node.
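The text names ID priority and dynamic load as selection criteria but does not say how they are combined. The sketch below assumes, purely for illustration, that a lower priority number wins and that load breaks ties; the function name and the candidate record fields are hypothetical.

```python
def reselect_common_host(slaves):
    """Pick the slave node to promote to common host node (step 44).

    `slaves` is a list of dicts with keys 'id', 'priority' (lower is preferred)
    and 'load' (fraction between 0 and 1). Ordering by priority first and load
    second is an assumption, not something the patent specifies.
    """
    if not slaves:
        raise ValueError("no slave node available in the subregion")
    best = min(slaves, key=lambda s: (s["priority"], s["load"], s["id"]))
    return best["id"]
```

For a subregion with slave nodes a1 (priority 1) and a2 (priority 2), this returns a1, as in the example above.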
Step 45: The HA module of the management host node sends a migrate-virtual-machine request to the resource management module (ResourceMgmt) of the management host node; the request carries the identification information of the new common host node a1 and of common host node A.
Step 46: The resource management module of the management host node migrates the virtual machines on common host node A to the new common host node a1.
For example, the configuration information of the virtual machines on common host node A is sent to the new common host node a1, and a1 is instructed to run this configuration information again to restart the corresponding virtual machines. The configuration information of a virtual machine is information from which the virtual machine can be started, for example virtual machine software; executing that software starts the virtual machine.
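Steps 45 and 46 can be pictured as moving the stored virtual machine configurations from the failed node's entry to the new node's entry and issuing one start instruction per configuration. The following sketch assumes the management host node keeps a dictionary from node ID to configuration list; that layout, the function name, and the command format are hypothetical.

```python
def migrate_vms(vm_configs: dict, failed_id: str, new_id: str):
    """Reassign the failed node's VM configurations to the new node.

    `vm_configs` maps node ID -> list of VM configuration items and is updated
    in place. Returns the start instructions to send to the new node.
    """
    moved = vm_configs.pop(failed_id, [])
    vm_configs.setdefault(new_id, []).extend(moved)
    return [{"type": "start_vm", "node": new_id, "config": cfg} for cfg in moved]
```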
Further, after the new common host node joins, the host nodes also update their member relations:
Step 47: The new common host node multicasts a join request to the remaining common host nodes. After the heartbeat detection module of each remaining common host node detects the join request, it sends a member relation update request to the corresponding member management module (MembershipMgmt); the update request carries the identification information of the new common host node and of the failed common host node.
For example, after common host node B receives the join request multicast by the new common host node a1, the heartbeat detection module of common host node B sends a member relation update request to the member management module of common host node B; the request carries the identification information of A and of a1.
Step 48: The member management module updates the member relation list.
For example, the identification information of the new common host node a1 is added to the member list, and the identification information of the failed common host node A is deleted.
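The list update of step 48 is small enough to show directly. In this minimal sketch the function name and the use of a plain Python list for the member relation list are assumptions.

```python
def update_membership(members, new_id: str, failed_id: str):
    """Return a new member list with `failed_id` removed and `new_id` added."""
    updated = [m for m in members if m != failed_id]
    if new_id not in updated:
        updated.append(new_id)
    return updated
```

For the example above, `update_membership(["A", "B", "C"], "a1", "A")` returns `["B", "C", "a1"]`.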
With reference to the above flow, the corresponding modules can be as follows:
Referring to Fig. 5, this embodiment involves a common host node 51 and a management host node 52. For the common host node, the judging unit is specifically a first heartbeat detection module (Heartbeat Sync) 511, and the processing unit is specifically a first member relation management module (MembershipMgmt) 512. For the management host node, the judging unit is specifically a second heartbeat detection module 521, and the processing unit specifically comprises a first high availability (HA) module 522 and a first resource management module (ResourceMgmt) 523.
The first heartbeat detection module 511 is used to determine, after detecting that the heartbeat of any other common host node has stopped, that a failed common host node exists, and that the common host node whose heartbeat has stopped is the failed common host node;
The first member relation management module 512 is used to receive a first member relation request message, which carries the identification information of the new common host node and of the failed common host node; to add the identification information of the new common host node to a first member relation list; and to delete the identification information of the failed common host node from the first member relation list;
Wherein the new common host node is reselected by the management host node, after it receives a first failure message, from the slave nodes in the subregion where the failed common host node is located; the first failure message is sent by the common host node after it determines that a failed common host node exists, and carries the identification information of the failed common host node.
The second heartbeat detection module 521 is used to determine, after receiving the first failure message, that a failed common host node exists; the first failure message is sent by a common host node after it determines that a failed common host node exists, and carries the identification information of the failed common host node;
The first high availability module 522 is used to receive a host node failure message, which carries the identification information of the failed common host node; to reselect a new common host node from the slave nodes in the subregion of the failed common host node; and to send a first migrate-virtual-machine request carrying the identification information of the new common host node and of the failed common host node. The host node failure message is sent after the first failure message is received;
The first resource management module 523 is used to send, according to the first migrate-virtual-machine request, the identification information of the virtual machines on the failed common host node to the new common host node and to restart those virtual machines.
This embodiment achieves scalability of the cluster system through subregions. By adopting the peer-type architecture between host nodes, the failure of a host node can be learned in time and a new host node reselected, which improves availability.
Fig. 6 is a schematic flow diagram of the method of the third embodiment of the invention, and Fig. 7 is a schematic diagram of the system structure of the third embodiment of the invention. This embodiment takes the failure of a slave node as an example.
Referring to Fig. 6, this embodiment comprises:
Step 601: During normal operation of the cluster, the slave nodes of each subregion send heartbeat messages through their heartbeat detection modules to the heartbeat detection module of the common host node of their subregion.
For example, the heartbeat detection module of slave node a1 sends heartbeat messages to the heartbeat detection module of common host node A of its subregion.
Step 602: If the heartbeat detection module of common host node A detects that the heartbeat of slave node a1 has stopped, it sends a heartbeat detection message to another slave node in its subregion.
For example, if common host node A does not detect a heartbeat message from slave node a1 within the set time, common host node A determines that the heartbeat of slave node a1 has stopped and sends a heartbeat detection message to another slave node a2 in its subregion; the heartbeat detection message carries the identification information of slave node a1.
Step 603: Slave node a2 detects the heartbeat status of slave node a1.
For example, slave node a2 sends a ping message to slave node a1; if it does not receive a response message from slave node a1, the heartbeat of slave node a1 has stopped.
Step 604: Slave node a2 sends the heartbeat detection result for slave node a1 to common host node A.
Step 605: If the heartbeat detection result also shows that the heartbeat of slave node a1 has stopped, common host node A multicasts a failure message that carries the identification information of slave node a1.
All the remaining common host nodes and the management host node can receive this failure message.
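Steps 602 to 605 describe a two-stage confirmation: the host node's own timeout is cross-checked by a second slave node before the failure is announced. The sketch below models the second stage as a callback; the function name, the `probe` callback, and the message layout are hypothetical, and no real ping is sent.

```python
def confirm_slave_failure(suspect_id: str, heartbeat_stopped: bool, probe):
    """Return a failure message to multicast, or None if the failure is not confirmed.

    `probe(suspect_id)` stands in for steps 602 to 604: it asks another slave
    node to ping the suspect and returns True if the suspect answered.
    """
    if not heartbeat_stopped:
        return None  # host node still receives heartbeats, nothing to do
    if probe(suspect_id):
        return None  # the peer slave node reached the suspect, so it is not declared failed
    return {"type": "failure", "failed_node": suspect_id}
```

The cross-check presumably guards against declaring a slave node failed when only the link between it and the host node is disturbed, although the patent does not state the motivation explicitly.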
Step 606: After the heartbeat detection module of the management host node receives this failure message, it sends a slave node failure message to the HA module of the management host node; the slave node failure message carries the identification information of the failed slave node a1.
Step 607: The HA module of the management host node selects another slave node in the subregion of slave node a1 as the slave node to which the virtual machines are migrated.
The other slave node can likewise be selected according to priority, load, and so on.
Step 608: The HA module of the management host node sends a migrate-virtual-machine request to the resource management module of the management host node; the request carries the identification information of the new slave node and of the failed slave node.
For example, if the reselected slave node is a2, the migrate-virtual-machine request carries the identification information of a1 and of a2.
Step 609: The resource management module of the management host node migrates the virtual machines on slave node a1 to slave node a2.
For example, the configuration information of the virtual machines on slave node a1 is sent to slave node a2, and a2 is instructed to run this configuration information again to restart the corresponding virtual machines. The configuration information of a virtual machine is information from which the virtual machine can be started, for example virtual machine software; executing that software starts the virtual machine.
Further, the failed slave node can perform the following actions:
Step 610: After slave node a1 finds that its own heartbeat messages are being lost, it pings the gateway, that is, it sends a ping message to its own gateway.
Step 611: If the ping fails, that is, no response message to the ping message is received, slave node a1 powers itself down.
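The self-check of steps 610 and 611 reduces to one decision. In this sketch the gateway ping is passed in as a callable so the logic can be shown without network access; the function name and the returned action strings are hypothetical.

```python
def self_check(heartbeat_lost: bool, gateway_reachable) -> str:
    """Decide what a slave node does after noticing its heartbeats are lost.

    `gateway_reachable` is a callable standing in for the ping to the node's
    own gateway; it returns True if a response message was received.
    """
    if heartbeat_lost and not gateway_reachable():
        return "power_down"  # step 611: isolated node takes itself offline
    return "keep_running"
```

Powering down an isolated node plausibly keeps its virtual machines from running in two places once they have been restarted on slave node a2; the patent itself only states the action.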
With reference to the above flow, the corresponding modules can be as follows:
Referring to Fig. 7, this embodiment involves a common host node 71, a management host node 72, and a slave node 73. For the common host node, the judging unit and the processing unit are the same module, specifically a third heartbeat detection module 711. For the management host node, the judging unit is specifically a fourth heartbeat detection module 721, and the processing unit specifically comprises a second high availability module 722 and a second resource management module (ResourceMgmt) 723. For the slave node, the judging unit and the processing unit are the same module, specifically a fifth heartbeat detection module 731.
The third heartbeat detection module 711 is used to determine, after detecting that the heartbeat of any slave node in the subregion of the common host node has stopped, that a failed slave node exists, and that the slave node whose heartbeat has stopped is the failed slave node;
The fourth heartbeat detection module 721 is used to determine, after receiving a second failure message, that a failed slave node exists; the second failure message is sent by a common host node after it determines that a failed slave node exists, and carries the identification information of the failed slave node;
The second high availability module 722 is used to receive a slave node failure message, which carries the identification information of the failed slave node; to reselect a new slave node in the subregion of the failed slave node; and to send a second migrate-virtual-machine request carrying the identification information of the new slave node and of the failed slave node. The slave node failure message is sent after the second failure message is received;
The second resource management module 723 is used to send, according to the second migrate-virtual-machine request, the identification information of the virtual machines on the failed slave node to the new slave node and to restart those virtual machines.
The fifth heartbeat detection module 731 is used to send heartbeat messages when the slave node has not failed and to send none when it has failed, so that the common host node of the slave node's subregion can determine from the heartbeat messages whether the slave node has failed; and to perform power-down processing when the node itself is the failed slave node; or, when the node itself is not the failed slave node, to detect, after receiving a detection request, whether the corresponding slave node is the failed slave node and to notify the common host node of the detection result, so that the common host node can carry out the slave node reselection processing. The detection request is sent after the common host node has not received a heartbeat message from some slave node within a certain time, and it carries the identification information of the slave node whose heartbeat has stopped.
This embodiment achieves scalability of the cluster system through subregions. By adopting the star architecture between the slave nodes and the host node, after a slave node fails the host node learns of it in time and the virtual machines on the failed slave node are migrated, which improves availability.
Fig. 8 is a schematic flowchart of the method of the fourth embodiment of the invention, and Fig. 9 is a schematic structural diagram of the system of the fourth embodiment. This embodiment takes a virtual machine failure as an example.
Referring to Fig. 8, this embodiment comprises:
Step 81: While the cluster is operating normally, the virtual machine proxy module on each node sends heartbeat messages to the heartbeat detection module of the node on which it resides.
For example, the virtual machine proxy module of a slave node sends heartbeat messages to the heartbeat detection module of that slave node.
Step 82: If the heartbeat detection module of the slave node detects that the heartbeat of a virtual machine has stopped, it sends a failure message to the common host node of the partition where the slave node is located.
For example, if the heartbeat detection module on the slave node does not receive, within a certain period, a heartbeat message from the virtual machine proxy module on the same node, it determines that the heartbeat of the corresponding virtual machine has stopped.
Step 83: After receiving the failure message, the common host node multicasts a failure message that carries the identification information of the faulty virtual machine.
The above takes a virtual machine failure on a slave node as an example. When a virtual machine on a host node fails, the heartbeat detection module on that host node, having received no heartbeat message from the virtual machine proxy module within a certain period, determines that the virtual machine on the host node is faulty and multicasts the failure message.
The failure message can be received by the other common host nodes and by the management host node.
Step 84: After receiving the failure message, the heartbeat detection module of the management host node sends a virtual machine failure message to the HA module of the management host node; the virtual machine failure message carries the identification information of the faulty virtual machine.
Step 85: The HA module of the management host node sends a restart-virtual-machine request to the resource management module of the management host node; the request carries the identification information of the faulty virtual machine.
Step 86: The resource management module of the management host node restarts the virtual machine.
For example, the configuration information of the faulty virtual machine is delivered again to the node where the virtual machine resides, and that node is instructed to rerun the configuration information to restart the virtual machine. Alternatively, the management host node reselects a node as the target node according to priority, load condition, and the like, then sends the configuration information of the faulty virtual machine to the target node and instructs the target node to rerun the configuration information to restart the virtual machine. Specifically, the resource management module of the target node may rerun the configuration information.
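The two restart options in step 86 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented procedure: `NodeInfo`, `choose_restart_node`, and `build_restart_request` are hypothetical names, and the ranking rule (highest priority first, then lowest load) is only one possible reading of "according to priority, load condition, and the like".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeInfo:
    node_id: str
    priority: int  # assumption: a larger value means a preferred node
    load: float    # assumption: fraction of capacity in use, 0.0 to 1.0


def vm_heartbeat_stopped(last_heartbeat: float, now: float,
                         timeout_s: float) -> bool:
    # Step 82: no heartbeat message within the timeout means the heartbeat stopped.
    return now - last_heartbeat > timeout_s


def choose_restart_node(origin: str, candidates: List[NodeInfo],
                        reselect: bool) -> str:
    # Option 1: restart on the node where the virtual machine already resides.
    if not reselect or not candidates:
        return origin
    # Option 2: reselect a target node (highest priority, then lowest load).
    best = max(candidates, key=lambda n: (n.priority, -n.load))
    return best.node_id


def build_restart_request(vm_id: str, origin: str,
                          vm_configs: Dict[str, dict],
                          candidates: List[NodeInfo],
                          reselect: bool = False) -> Dict[str, object]:
    # Steps 85-86: the request names the faulty virtual machine and carries the
    # configuration information that the target node reruns to restart it.
    return {
        "vm_id": vm_id,
        "target": choose_restart_node(origin, candidates, reselect),
        "config": vm_configs[vm_id],
    }
```

With `reselect=False` the request targets the original node; with `reselect=True` it targets the best-ranked candidate.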
Corresponding to the above flow, the modules may be as follows:
Referring to Fig. 9, this embodiment involves a common host node 91, a management host node 92, and a slave node 93. Further, for the common host node, the judging unit is specifically a sixth heartbeat detection module 911 and the processing unit is specifically a fourth resource management module 912. For the management host node, the judging unit is specifically a seventh heartbeat detection module 921, and the processing unit specifically comprises a third high availability module 922 and a third resource management module 923. For the slave node, the judging unit specifically comprises a virtual machine proxy module 931 and an eighth heartbeat detection module 932, and the processing unit is specifically a fifth resource management module 933.
The sixth heartbeat detection module 911 is configured to determine that a faulty virtual machine exists after receiving a virtual machine failure message sent by any slave node in the partition where the common host node is located, or after detecting that the heartbeat of one of its own virtual machines has stopped, and to determine that the virtual machine indicated by the virtual machine failure message, or the virtual machine whose heartbeat has stopped, is the faulty virtual machine;
The fourth resource management module 912 is configured to, when a virtual machine of the common host node itself fails, receive the configuration information of the faulty virtual machine sent by the management host node and rerun the configuration information to restart the faulty virtual machine. The configuration information of the faulty virtual machine is sent by the management host node after it receives a third failure message; the third failure message is sent by the common host node after determining that a faulty virtual machine exists, and carries the identification information of the faulty virtual machine.
The seventh heartbeat detection module 921 is configured to determine, after receiving the third failure message, that a faulty virtual machine exists; the third failure message carries the identification information of the faulty virtual machine;
The third high availability module 922 is configured to receive a virtual machine failure message and send a restart-virtual-machine request; the virtual machine failure message is sent after the third failure message is received, and both the virtual machine failure message and the restart-virtual-machine request carry the identification information of the faulty virtual machine;
The third resource management module 923 is configured to send the configuration information of the virtual machine corresponding to the faulty virtual machine to the node where the faulty virtual machine resides, and to instruct that node to rerun the configuration information to restart the faulty virtual machine.
The virtual machine proxy module 931 is configured to send heartbeat messages while the corresponding virtual machine is normal and to send none when it is faulty;
The eighth heartbeat detection module 932 is configured to determine, after detecting from the sending of the heartbeat messages that the heartbeat of a virtual machine on the slave node has stopped, that a faulty virtual machine exists and that the virtual machine whose heartbeat has stopped is the faulty virtual machine;
The fifth resource management module 933 is configured to receive the configuration information of the faulty virtual machine sent by the management host node and to rerun the configuration information to restart the faulty virtual machine. The configuration information of the faulty virtual machine is sent by the management host node after it receives the third failure message; the third failure message is sent by the common host node after it receives a virtual machine failure message, and carries the identification information of the faulty virtual machine; the virtual machine failure message is sent by the slave node after it detects that the heartbeat of a virtual machine on the slave node has stopped, and carries the identification information of the faulty virtual machine.
This embodiment achieves scalability of the cluster system through partitioning. By using a peer-to-peer architecture between host nodes and a star architecture between the slave nodes and their host node, a virtual machine failure can be detected promptly and the virtual machine restarted, which improves availability.
In summary, by setting up partitions, the embodiments of the invention allow the cluster to be scaled out by adding partitions. Peer-to-peer management across multiple host nodes eliminates the HA bottleneck. Because host nodes synchronize resource state information with one another but do not synchronize resource utilization information, the communication overhead of fault monitoring and the overhead of state synchronization are both small. After the heartbeat of a slave node stops, the host node of that partition selects another slave node in the partition to arbitrate, which reduces misjudgment and improves availability. Using a peer-to-peer architecture between host nodes, rather than a star architecture, further strengthens host node reliability. Making effective use of slave nodes for virtual machine migration reduces wasted resources and management overhead.
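The partitioned topology summarized above (peer-to-peer between host nodes, star within each partition) can be modeled with a short Python sketch. The representation of a cluster as a mapping from host node ID to slave node IDs, the function names, and the rule of promoting the first slave node are assumptions for illustration only; the description leaves the reselection criteria (priority, load, and so on) to the management host node.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def heartbeat_links(partitions: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return the node pairs that exchange heartbeats.

    `partitions` maps each host node ID to the slave node IDs in its partition.
    """
    links: Set[Tuple[str, str]] = set()
    # Peer-to-peer architecture: every pair of host nodes is linked.
    for a, b in combinations(sorted(partitions), 2):
        links.add((a, b))
    # Star architecture: each slave node is linked only to its own host node.
    for host, slaves in partitions.items():
        for slave in slaves:
            links.add((host, slave))
    return links


def promote_slave(partitions: Dict[str, List[str]],
                  failed_host: str) -> Dict[str, List[str]]:
    """Replace a failed host node with a slave node from the same partition."""
    slaves = partitions[failed_host]
    if not slaves:
        raise ValueError("partition has no slave node to promote")
    # Assumption: promote the first slave node; a real system would rank them.
    new_host, remaining = slaves[0], slaves[1:]
    updated = {h: list(s) for h, s in partitions.items() if h != failed_host}
    updated[new_host] = list(remaining)
    return updated
```

In this model, adding a partition adds one link per existing host node plus one per slave node in the new partition, so slave nodes never exchange heartbeats across partitions.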
It can be understood that related features of the above method and device may be referred to each other. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not indicate that any embodiment is superior or inferior.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (20)

1. A processing method for a virtual cluster system, characterized by comprising:
a node determining whether at least one of the following has occurred: a failed common host node exists, a failed slave node exists, or a faulty virtual machine exists;
the node, after determining that a failed common host node exists, bringing a new common host node into effect; after determining that a failed slave node exists, bringing a new slave node into effect; or, after determining that a faulty virtual machine exists, restarting the virtual machine;
wherein the common host nodes and slave nodes are divided into at least two partitions, each partition comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node; a peer-to-peer architecture is used between the host nodes of different partitions; a star architecture is used between the host node and the slave nodes within each partition; and the host nodes comprise one management host node and at least one common host node.
2. The method according to claim 1, characterized in that, when the node is a common host node,
determining that a failed common host node exists comprises: after detecting that the heartbeat of any other common host node has stopped, determining that a failed common host node exists, and determining that the common host node whose heartbeat has stopped is the failed common host node;
the bringing a new common host node into effect after determining that a failed common host node exists comprises:
receiving a first membership request message that carries the identification information of the new common host node and the identification information of the failed common host node; adding the identification information of the new common host node to a first membership list; and deleting the identification information of the failed common host node from the first membership list;
wherein the new common host node is reselected by the management host node, after it receives a first failure message, from the slave nodes in the partition where the failed common host node is located; the first failure message is sent by the common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node.
3. The method according to claim 1 or 2, characterized in that, when the node is a common host node,
determining that a failed slave node exists comprises: after detecting that the heartbeat of any slave node in the partition where the common host node is located has stopped, determining that a failed slave node exists, and determining that the slave node whose heartbeat has stopped is the failed slave node;
the bringing a new slave node into effect after determining that a failed slave node exists comprises:
receiving a second membership request message that carries the identification information of the new slave node and the identification information of the failed slave node; adding the identification information of the new slave node to a second membership list; and deleting the identification information of the failed slave node from the second membership list;
wherein the new slave node is reselected by the management host node, after it receives a second failure message, from the slave nodes in the partition where the failed slave node is located; the second failure message is sent by the common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node.
4. The method according to claim 1 or 2, characterized in that, when the node is a common host node,
determining that a faulty virtual machine exists comprises: after receiving a virtual machine failure message sent by any slave node in the partition where the common host node is located, or after detecting that the heartbeat of one of its own virtual machines has stopped, determining that a faulty virtual machine exists, and determining that the virtual machine indicated by the virtual machine failure message, or the virtual machine whose heartbeat has stopped, is the faulty virtual machine;
the restarting the virtual machine after determining that a faulty virtual machine exists comprises:
when one of its own virtual machines is faulty, receiving the configuration information of the faulty virtual machine sent by the management host node, and rerunning the configuration information to restart the faulty virtual machine; wherein the configuration information of the faulty virtual machine is sent by the management host node after it receives a third failure message; the third failure message is sent by the common host node after determining that a faulty virtual machine exists, and carries the identification information of the faulty virtual machine.
5. The method according to claim 1, characterized in that, when the node is the management host node,
determining that a failed common host node exists comprises: after receiving a first failure message, determining that a failed common host node exists; wherein the first failure message is sent by a common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node;
the bringing a new common host node into effect after determining that a failed common host node exists comprises:
receiving a host node failure message that carries the identification information of the failed common host node; reselecting a new common host node from the slave nodes in the partition where the failed common host node is located; and sending a first migrate-virtual-machine request that carries the identification information of the new common host node and the identification information of the failed common host node; wherein the host node failure message is sent after the first failure message is received;
according to the first migrate-virtual-machine request message, sending the identification information of the virtual machines on the failed common host node to the new common host node and restarting the virtual machines.
6. The method according to claim 1 or 5, characterized in that, when the node is the management host node,
determining that a failed slave node exists comprises: after receiving a second failure message, determining that a failed slave node exists; wherein the second failure message is sent by a common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node;
the bringing a new slave node into effect after determining that a failed slave node exists comprises:
receiving a slave node failure message that carries the identification information of the failed slave node; reselecting a new slave node in the partition where the failed slave node is located; and sending a second migrate-virtual-machine request that carries the identification information of the new slave node and the identification information of the failed slave node; wherein the slave node failure message is sent after the second failure message is received;
according to the second migrate-virtual-machine request message, sending the identification information of the virtual machines on the failed slave node to the new slave node and restarting the virtual machines.
7. The method according to claim 1 or 5, characterized in that, when the node is the management host node,
determining that a faulty virtual machine exists comprises: after receiving a third failure message, determining that a faulty virtual machine exists; wherein the third failure message carries the identification information of the faulty virtual machine;
the restarting the virtual machine after determining that a faulty virtual machine exists comprises:
receiving a virtual machine failure message and sending a restart-virtual-machine request; wherein the virtual machine failure message is sent after the third failure message is received, and both the virtual machine failure message and the restart-virtual-machine request carry the identification information of the faulty virtual machine;
sending the configuration information of the virtual machine corresponding to the faulty virtual machine to the node where the faulty virtual machine resides, and instructing that node to rerun the configuration information to restart the faulty virtual machine.
8. The method according to claim 1, characterized in that, when the node is a slave node,
determining that a failed slave node exists and, after determining that a failed slave node exists, bringing a new slave node into effect comprises:
the slave node sending heartbeat messages while it has not failed and sending none once it has failed, so that the common host node of the partition where the slave node is located determines from the heartbeat messages whether the slave node has failed, and performing power-off processing when it is itself the failed slave node; or, when it has not itself failed, detecting, after receiving a detection request, whether the corresponding slave node has failed and reporting the detection result to the common host node, so that the common host node carries out the processing for bringing a new slave node into effect; wherein the detection request is sent by the common host node after it has received no heartbeat message from some slave node within a certain period, and carries the identification information of the slave node whose heartbeat has stopped.
9. The method according to claim 1 or 8, characterized in that, when the node is a slave node,
determining that a faulty virtual machine exists comprises:
sending heartbeat messages while the corresponding virtual machine is normal, and sending none when it is faulty;
after detecting from the sending of the heartbeat messages that the heartbeat of a virtual machine on the slave node has stopped, determining that a faulty virtual machine exists, and determining that the virtual machine whose heartbeat has stopped is the faulty virtual machine;
the restarting the virtual machine after determining that a faulty virtual machine exists comprises:
receiving the configuration information of the faulty virtual machine sent by the management host node, and rerunning the configuration information to restart the faulty virtual machine; wherein the configuration information of the faulty virtual machine is sent by the management host node after it receives a third failure message; the third failure message is sent by the common host node after it receives a virtual machine failure message, and carries the identification information of the faulty virtual machine; the virtual machine failure message is sent by the slave node after it detects that the heartbeat of a virtual machine on the slave node has stopped, and carries the identification information of the faulty virtual machine.
10. A processing device for a virtual cluster system, characterized by comprising:
a judging unit, configured to determine whether at least one of the following has occurred: a failed common host node exists, a failed slave node exists, or a faulty virtual machine exists;
a processing unit, configured to bring a new common host node into effect after it is determined that a failed common host node exists; to bring a new slave node into effect after it is determined that a failed slave node exists; or to restart the virtual machine after it is determined that a faulty virtual machine exists;
wherein the common host nodes and slave nodes are divided into at least two partitions, each partition comprising one host node and at least one slave node; at least one virtual machine is provided on each host node and on each slave node; a peer-to-peer architecture is used between the host nodes of different partitions; a star architecture is used between the host node and the slave nodes within each partition; and the host nodes comprise one management host node and at least one common host node.
11. The device according to claim 10, characterized in that, when the device is a common host node,
the judging unit comprises:
a first heartbeat detection module, configured to determine, after detecting that the heartbeat of any other common host node has stopped, that a failed common host node exists and that the common host node whose heartbeat has stopped is the failed common host node;
the processing unit comprises:
a first membership management module, configured to receive a first membership request message that carries the identification information of the new common host node and the identification information of the failed common host node, to add the identification information of the new common host node to a first membership list, and to delete the identification information of the failed common host node from the first membership list;
wherein the new common host node is reselected by the management host node, after it receives a first failure message, from the slave nodes in the partition where the failed common host node is located; the first failure message is sent by the common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node.
12. The device according to claim 10, characterized in that, when the device is the management host node,
the judging unit comprises:
a second heartbeat detection module, configured to determine, after receiving a first failure message, that a failed common host node exists; wherein the first failure message is sent by a common host node after determining that a failed common host node exists, and carries the identification information of the failed common host node;
the processing unit comprises:
a first high availability module, configured to receive a host node failure message that carries the identification information of the failed common host node, to reselect a new common host node from the slave nodes in the partition where the failed common host node is located, and to send a first migrate-virtual-machine request that carries the identification information of the new common host node and the identification information of the failed common host node; wherein the host node failure message is sent after the first failure message is received;
a first resource management module, configured to, according to the first migrate-virtual-machine request message, send the configuration information of the virtual machines on the failed common host node to the new common host node and restart the virtual machines.
13. The device according to claim 10 or 11, characterized in that, when the device is a common host node,
the judging unit and the processing unit are arranged in a third heartbeat detection module; the third heartbeat detection module is configured to determine, after detecting that the heartbeat of any slave node in the partition where the common host node is located has stopped, that a failed slave node exists and that the slave node whose heartbeat has stopped is the failed slave node;
wherein the new slave node is reselected by the management host node, after it receives a second failure message, from the slave nodes in the partition where the failed slave node is located; the second failure message is sent by the common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node.
14. The device according to claim 10 or 12, characterized in that, when the device is the management host node,
the judging unit comprises:
a fourth heartbeat detection module, configured to determine, after receiving a second failure message, that a failed slave node exists; wherein the second failure message is sent by a common host node after determining that a failed slave node exists, and carries the identification information of the failed slave node;
the processing unit comprises:
a second high availability module, configured to receive a slave node failure message that carries the identification information of the failed slave node, to reselect a new slave node in the partition where the failed slave node is located, and to send a second migrate-virtual-machine request that carries the identification information of the new slave node and the identification information of the failed slave node; wherein the slave node failure message is sent after the second failure message is received;
a second resource management module, configured to, according to the second migrate-virtual-machine request message, send the identification information of the virtual machines on the failed slave node to the new slave node and restart the virtual machines.
15. The device according to claim 10, characterized in that, when the device is a slave node,
the judging unit and the processing unit form a fifth heartbeat detection module; the fifth heartbeat detection module is configured to send heartbeat messages while the slave node has not failed and to send none once it has failed, so that the common host node of the partition where the slave node is located determines from the heartbeat messages whether the slave node has failed, and to perform power-off processing when the slave node is itself the failed slave node; or, when the slave node has not itself failed, to detect, after receiving a detection request, whether the corresponding slave node has failed and to report the detection result to the common host node, so that the common host node carries out the processing for bringing a new slave node into effect; wherein the detection request is sent by the common host node after it has received no heartbeat message from some slave node within a certain period, and carries the identification information of the slave node whose heartbeat has stopped.
16. The device according to claim 10 or 11, characterized in that, when the device is a common host node,
the judging unit comprises:
a sixth heartbeat detection module, configured to determine that a faulty virtual machine exists after receiving a virtual machine failure message sent by any slave node in the partition where the common host node is located, or after detecting that the heartbeat of one of its own virtual machines has stopped, and to determine that the virtual machine indicated by the virtual machine failure message, or the virtual machine whose heartbeat has stopped, is the faulty virtual machine;
the processing unit comprises:
a fourth resource management module, configured to, when one of the device's own virtual machines is faulty, receive the configuration information of the faulty virtual machine sent by the management host node and rerun the configuration information to restart the faulty virtual machine; wherein the configuration information of the faulty virtual machine is sent by the management host node after it receives a third failure message; the third failure message is sent by the common host node after determining that a faulty virtual machine exists, and carries the identification information of the faulty virtual machine.
17. The device according to claim 10 or 12, characterized in that, when the device is the management host node,
the judging unit comprises:
a seventh heartbeat detection module, configured to determine, after receiving a third failure message, that a faulty virtual machine exists; wherein the third failure message carries the identification information of the faulty virtual machine;
the processing unit comprises:
a third high availability module, configured to receive a virtual machine failure message and send a restart-virtual-machine request; wherein the virtual machine failure message is sent after the third failure message is received, and both the virtual machine failure message and the restart-virtual-machine request carry the identification information of the faulty virtual machine;
a third resource management module, configured to send the configuration information of the virtual machine corresponding to the faulty virtual machine to the node where the faulty virtual machine resides, and to instruct that node to rerun the configuration information to restart the faulty virtual machine.
18. The equipment according to claim 10 or 15, characterized in that, when the equipment is a slave node,
The judging unit comprises:
A virtual machine agent module, configured to send heartbeat messages while the corresponding virtual machine is running normally, and to send no heartbeat messages when the virtual machine is faulty;
An eighth heartbeat detection module, configured to determine, according to whether the heartbeat messages are being sent, that a faulty virtual machine exists after detecting that the heartbeat of a virtual machine on the slave node has stopped, and to determine that the virtual machine whose heartbeat has stopped is the faulty virtual machine;
The processing unit comprises:
A fifth resource management module, configured to receive configuration information of the faulty virtual machine sent by the management host node, and to re-run the configuration information to restart the faulty virtual machine; wherein the configuration information of the faulty virtual machine is sent by the management host node after receiving a third failure message; the third failure message is sent by the common host node after receiving a virtual machine failure message, and carries identification information of the faulty virtual machine; and the virtual machine failure message is sent by the slave node after detecting that the heartbeat of a virtual machine on the slave node has stopped, and carries the identification information of the faulty virtual machine.
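The heartbeat-based detection on a slave node can be sketched as follows. This is an illustrative sketch only: the claim does not say how long a heartbeat must be absent before it counts as stopped, so the `timeout` parameter, the class name `HeartbeatMonitor`, and the message layout are assumptions. Timestamps are passed in explicitly so that the behaviour is deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical heartbeat detection for the virtual machines on one slave node."""

    def __init__(self, timeout: float) -> None:
        # Seconds of silence after which a heartbeat is treated as stopped (assumed).
        self.timeout = timeout
        # vm_id -> time at which the last heartbeat message was received
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, vm_id: str, now: float) -> None:
        # Called on behalf of the VM agent while its virtual machine is healthy.
        self.last_seen[vm_id] = now

    def faulty_vms(self, now: float) -> List[str]:
        # A VM whose last heartbeat is older than the timeout is considered faulty.
        return sorted(vm for vm, seen in self.last_seen.items()
                      if now - seen > self.timeout)

    def failure_messages(self, now: float) -> List[dict]:
        # One virtual machine failure message per faulty VM, carrying its identifier.
        return [{"type": "vm_failure", "vm_id": vm} for vm in self.faulty_vms(now)]
```

In this sketch the slave node would forward each entry of `failure_messages(...)` to its common host node, which in turn would raise the third failure message towards the management host node.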
19. A virtual cluster system, characterized by comprising:
At least two subregions, each subregion comprising one host node and at least one slave node, wherein at least one virtual machine is provided on each host node and on each slave node;
A peer-to-peer architecture is adopted among the host nodes in different subregions;
A star architecture is adopted between the host node and the slave nodes within each subregion;
The host nodes comprise one management host node and at least one common host node; the management host node is configured to, after a common host node or a slave node fails, reselect a new common host node or slave node in the subregion in which the failed common host node or slave node is located, or, when a virtual machine on a common host node or a slave node fails, restart the virtual machine.
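The two-level topology of this claim can be sketched as a small data structure. The sketch rests on several assumptions that go beyond the claim text: the peer-to-peer architecture is modelled as a full mesh among host nodes, the replacement node is drawn from a hypothetical per-subregion `spares` list (the claim does not say where the new node comes from), and all names (`Subregion`, `build_links`, `reselect`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Subregion:
    host: str                  # the single host node of this subregion
    slaves: List[str]          # at least one slave node
    spares: List[str] = field(default_factory=list)  # hypothetical candidate pool


def build_links(subregions: List[Subregion]) -> Set[Tuple[str, str]]:
    """Return undirected links: a mesh among hosts, a star inside each subregion."""
    links: Set[Tuple[str, str]] = set()
    hosts = [s.host for s in subregions]
    for i, a in enumerate(hosts):
        for b in hosts[i + 1:]:
            links.add(tuple(sorted((a, b))))            # host-to-host peer link
    for s in subregions:
        for slave in s.slaves:
            links.add(tuple(sorted((s.host, slave))))   # host is the hub of the star
    return links


def reselect(subregion: Subregion, failed: str) -> str:
    """Replace a failed node with a spare from the same subregion (assumed policy)."""
    replacement = subregion.spares.pop(0)
    if failed == subregion.host:
        subregion.host = replacement
    else:
        subregion.slaves[subregion.slaves.index(failed)] = replacement
    return replacement
```

With two subregions `h1 -> {s1, s2}` and `h2 -> {s3}`, `build_links` yields one host-to-host link and three host-to-slave links; slave nodes are never linked to each other directly.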
20. The system according to claim 19, characterized in that,
The common host node is the equipment according to claim 11, and the management host node is the equipment according to claim 12;
Or,
The common host node is the equipment according to claim 13, the management host node is the equipment according to claim 14, and the slave node is the equipment according to claim 15;
Or,
The common host node is the equipment according to claim 16, the management host node is the equipment according to claim 17, and the slave node is the equipment according to claim 18.
CN201110301796.0A 2011-09-27 2011-09-27 Virtual clustered system as well as processing method and processing device thereof Expired - Fee Related CN102355369B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201110301796.0A CN102355369B (en) 2011-09-27 2011-09-27 Virtual clustered system as well as processing method and processing device thereof
PCT/CN2012/082196 WO2013044828A1 (en) 2011-09-27 2012-09-27 Virtual cluster system, processing method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110301796.0A CN102355369B (en) 2011-09-27 2011-09-27 Virtual clustered system as well as processing method and processing device thereof

Publications (2)

Publication Number Publication Date
CN102355369A (en) 2012-02-15
CN102355369B (en) 2014-01-08

Family

ID=45578866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110301796.0A Expired - Fee Related CN102355369B (en) 2011-09-27 2011-09-27 Virtual clustered system as well as processing method and processing device thereof

Country Status (2)

Country Link
CN (1) CN102355369B (en)
WO (1) WO2013044828A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102664763A (en) * 2012-03-20 2012-09-12 浪潮电子信息产业股份有限公司 Method for rapidly detecting connection states and making virtual machine HA
WO2013044828A1 (en) * 2011-09-27 2013-04-04 华为技术有限公司 Virtual cluster system, processing method and device thereof
CN103229463A (en) * 2012-12-18 2013-07-31 华为技术有限公司 Method for determining administrative domains and network devices and virtual cluster
CN103294494A (en) * 2012-02-29 2013-09-11 中兴通讯股份有限公司 Automatic deployment method and system of virtual system
CN103607296A (en) * 2013-11-01 2014-02-26 杭州华三通信技术有限公司 Virtual machine fault processing method and equipment thereof
CN105357038A (en) * 2015-10-26 2016-02-24 北京百度网讯科技有限公司 Method and system for monitoring virtual machine cluster
CN105591780A (en) * 2014-10-24 2016-05-18 杭州华三通信技术有限公司 Cluster monitoring method and device
CN106062717A (en) * 2014-11-06 2016-10-26 华为技术有限公司 Distributed storage replication system and method
CN106612314A (en) * 2015-10-26 2017-05-03 上海宝信软件股份有限公司 System for realizing software-defined storage based on virtual machine
CN106789350A (en) * 2017-01-23 2017-05-31 郑州云海信息技术有限公司 A kind of method and device of back-level server virtualization system host node High Availabitity
CN103729234B (en) * 2013-12-20 2017-06-27 中电长城网际系统应用有限公司 A kind of cluster virtual machine management method and device
CN107018041A (en) * 2017-03-31 2017-08-04 杭州数梦工场科技有限公司 Data migration method and device in cluster
CN107315663A (en) * 2017-03-10 2017-11-03 秦皇岛市第医院 Double machine and clustering framework
CN108108255A (en) * 2016-11-25 2018-06-01 中兴通讯股份有限公司 The detection of virtual-machine fail and restoration methods and device
CN108111337A (en) * 2017-12-06 2018-06-01 北京天融信网络安全技术有限公司 Distributed system arbitrates the method and apparatus of host node
CN109361777A (en) * 2018-12-18 2019-02-19 广东浪潮大数据研究有限公司 Synchronous method, synchronization system and the relevant apparatus of distributed type assemblies node state
WO2019178714A1 (en) * 2018-03-19 2019-09-26 华为技术有限公司 Fault detection method, apparatus, and system
CN110661599A (en) * 2018-06-28 2020-01-07 中兴通讯股份有限公司 HA implementation method, device and storage medium between main node and standby node
US10728099B2 (en) 2015-05-14 2020-07-28 Huawei Technologies Co., Ltd. Method for processing virtual machine cluster and computer system
CN113742417A (en) * 2020-05-29 2021-12-03 同方威视技术股份有限公司 Multi-level distributed consensus method and system, electronic device and computer readable medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060155912A1 (en) * 2005-01-12 2006-07-13 Dell Products L.P. Server cluster having a virtual server
CN102110071A (en) * 2011-03-04 2011-06-29 浪潮(北京)电子信息产业有限公司 Virtual machine cluster system and implementation method thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8554981B2 (en) * 2007-02-02 2013-10-08 Vmware, Inc. High availability virtual machine cluster
CN102355369B (en) * 2011-09-27 2014-01-08 华为技术有限公司 Virtual clustered system as well as processing method and processing device thereof

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013044828A1 (en) * 2011-09-27 2013-04-04 华为技术有限公司 Virtual cluster system, processing method and device thereof
CN103294494A (en) * 2012-02-29 2013-09-11 中兴通讯股份有限公司 Automatic deployment method and system of virtual system
CN102664763A (en) * 2012-03-20 2012-09-12 浪潮电子信息产业股份有限公司 Method for rapidly detecting connection states and making virtual machine HA
CN103229463A (en) * 2012-12-18 2013-07-31 华为技术有限公司 Method for determining administrative domains and network devices and virtual cluster
CN103229463B (en) * 2012-12-18 2015-11-25 华为技术有限公司 A kind of method, the network equipment and Virtual Cluster determining management domain
US9973427B2 (en) 2012-12-18 2018-05-15 Huawei Technologies Co., Ltd. Method for determining management domain, network device, and virtual cluster
US9699080B2 (en) 2012-12-18 2017-07-04 Huawei Technologies Co., Ltd. Method for determining management domain, network device, and virtual cluster
CN103607296A (en) * 2013-11-01 2014-02-26 杭州华三通信技术有限公司 Virtual machine fault processing method and equipment thereof
CN103607296B (en) * 2013-11-01 2017-08-22 新华三技术有限公司 A kind of virtual-machine fail processing method and equipment
CN103729234B (en) * 2013-12-20 2017-06-27 中电长城网际系统应用有限公司 A kind of cluster virtual machine management method and device
CN105591780A (en) * 2014-10-24 2016-05-18 杭州华三通信技术有限公司 Cluster monitoring method and device
CN105591780B (en) * 2014-10-24 2019-01-29 新华三技术有限公司 Cluster monitoring method and equipment
US10713134B2 (en) 2014-11-06 2020-07-14 Huawei Technologies Co., Ltd. Distributed storage and replication system and method
CN106062717A (en) * 2014-11-06 2016-10-26 华为技术有限公司 Distributed storage replication system and method
US10728099B2 (en) 2015-05-14 2020-07-28 Huawei Technologies Co., Ltd. Method for processing virtual machine cluster and computer system
CN105357038A (en) * 2015-10-26 2016-02-24 北京百度网讯科技有限公司 Method and system for monitoring virtual machine cluster
CN106612314A (en) * 2015-10-26 2017-05-03 上海宝信软件股份有限公司 System for realizing software-defined storage based on virtual machine
CN108108255A (en) * 2016-11-25 2018-06-01 中兴通讯股份有限公司 The detection of virtual-machine fail and restoration methods and device
CN106789350A (en) * 2017-01-23 2017-05-31 郑州云海信息技术有限公司 A kind of method and device of back-level server virtualization system host node High Availabitity
CN107315663B (en) * 2017-03-10 2020-06-09 秦皇岛市第一医院 Dual-machine cluster architecture
CN107315663A (en) * 2017-03-10 2017-11-03 秦皇岛市第医院 Double machine and clustering framework
CN107018041B (en) * 2017-03-31 2019-05-17 杭州数梦工场科技有限公司 Data migration method and device in cluster
CN109981412A (en) * 2017-03-31 2019-07-05 杭州数梦工场科技有限公司 Data migration method, device, computer equipment and storage medium in cluster
CN107018041A (en) * 2017-03-31 2017-08-04 杭州数梦工场科技有限公司 Data migration method and device in cluster
CN109981412B (en) * 2017-03-31 2020-11-17 杭州数梦工场科技有限公司 Data migration method and device in cluster and storage medium
CN108111337A (en) * 2017-12-06 2018-06-01 北京天融信网络安全技术有限公司 Distributed system arbitrates the method and apparatus of host node
CN108111337B (en) * 2017-12-06 2021-04-06 北京天融信网络安全技术有限公司 Method and equipment for arbitrating main nodes in distributed system
WO2019178714A1 (en) * 2018-03-19 2019-09-26 华为技术有限公司 Fault detection method, apparatus, and system
CN110661599A (en) * 2018-06-28 2020-01-07 中兴通讯股份有限公司 HA implementation method, device and storage medium between main node and standby node
CN109361777A (en) * 2018-12-18 2019-02-19 广东浪潮大数据研究有限公司 Synchronous method, synchronization system and the relevant apparatus of distributed type assemblies node state
CN113742417A (en) * 2020-05-29 2021-12-03 同方威视技术股份有限公司 Multi-level distributed consensus method and system, electronic device and computer readable medium

Also Published As

Publication number Publication date
CN102355369B (en) 2014-01-08
WO2013044828A1 (en) 2013-04-04

Similar Documents

Publication Publication Date Title
CN102355369B (en) Virtual clustered system as well as processing method and processing device thereof
US8615676B2 (en) Providing first field data capture in a virtual input/output server (VIOS) cluster environment with cluster-aware vioses
CN108200124B (en) High-availability application program architecture and construction method
US8949828B2 (en) Single point, scalable data synchronization for management of a virtual input/output server cluster
US8726274B2 (en) Registration and initialization of cluster-aware virtual input/output server nodes
US8473692B2 (en) Operating system image management
US8381017B2 (en) Automated node fencing integrated within a quorum service of a cluster infrastructure
KR101504882B1 (en) Hardware failure mitigation
US10860375B1 (en) Singleton coordination in an actor-based system
JP2014501424A (en) Integrated software and hardware system that enables automated provisioning and configuration based on the physical location of the blade
CN105159798A (en) Dual-machine hot-standby method for virtual machines, dual-machine hot-standby management server and system
US20120151095A1 (en) Enforcing logical unit (lu) persistent reservations upon a shared virtual storage device
CN111506391B (en) Container deployment method and device
CN112052230B (en) Multi-machine room data synchronization method, computing device and storage medium
US10120779B1 (en) Debugging of hosted computer programs
US10387053B1 (en) Memory synchronization in a distributed computing system
CN104158707A (en) Method and device of detecting and processing brain split in cluster
CN104517067A (en) Method, device and system for data access
CN108810183B (en) Method and device for processing conflicting MAC addresses and machine-readable storage medium
CN111221620B (en) Storage method, device and storage medium
US9348672B1 (en) Singleton coordination in an actor-based system
CN106790521B (en) System and method for distributed networking by using node equipment based on FTP
US10133496B1 (en) Bindable state maintaining components
CN116010111B (en) Cross-cluster resource scheduling method, system and terminal equipment
CN114064217B (en) OpenStack-based node virtual machine migration method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140108

Termination date: 20190927
