WO2013044828A1 - Virtual cluster system, processing method and device thereof - Google Patents

Info

Publication number
WO2013044828A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
virtual machine
standby
failed
standby node
Prior art date
Application number
PCT/CN2012/082196
Other languages
French (fr)
Chinese (zh)
Inventor
江滢
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L41/0836 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability to enhance reliability, e.g. reduce downtime

Definitions

  • The present invention relates to network communication technologies, and in particular to a virtualized cluster system and a processing method and device thereof.
  • A cluster system offers strong overall computing, storage, and management performance, presents a single system image, and provides availability guarantees and fault tolerance that are transparent to the user; it has become the main infrastructure of the data center.
  • The application of virtualization technology provides a better and more promising solution for cluster development. Virtualization technology allows a single platform to run multiple operating systems simultaneously, and applications can run in separate spaces without affecting each other, which significantly increases the productivity of the computer. Running multiple virtual machines takes full advantage of the computing power of physical servers and gives the data center a fast response capability.
  • Embodiments of the present invention provide a virtualized cluster system and a processing method and device thereof, which improve scalability and usability of a virtual machine cluster system.
  • An embodiment of the present invention provides a method for processing a virtualized cluster system, including:
  • a node determines whether at least one of the following exists: a failed common master node, a failed standby node, or a faulty virtual machine;
  • after determining that a failed common master node exists, the node enables a new common master node; after determining that a failed standby node exists, the node enables a new standby node; or, after determining that a faulty virtual machine exists, the node restarts the virtual machine;
  • wherein the common master nodes and the standby nodes are divided into at least two partitions, each partition including one master node and at least one standby node; at least one virtual machine is provided on each master node and on each standby node; a peer-to-peer architecture is adopted between the master nodes of different partitions; a star architecture is adopted between the master node and the standby nodes within each partition; and the master nodes include one management master node and at least one common master node.
  • An embodiment of the present invention provides a processing device for a virtualized cluster system, including:
  • a determining unit, configured to determine whether at least one of the following exists: a failed common master node, a failed standby node, or a faulty virtual machine;
  • a processing unit, configured to enable a new common master node after it is determined that a failed common master node exists; to enable a new standby node after it is determined that a failed standby node exists; or to restart the virtual machine after it is determined that a faulty virtual machine exists;
  • wherein the common master nodes and the standby nodes are divided into at least two partitions, each partition including one master node and at least one standby node; at least one virtual machine is provided on each master node and on each standby node; a peer-to-peer architecture is adopted between the master nodes of different partitions; a star architecture is adopted between the master node and the standby nodes within each partition; and the master nodes include one management master node and at least one common master node.
  • An embodiment of the present invention provides a virtualized cluster system, including:
  • at least two partitions, each partition including one master node and at least one standby node, with at least one virtual machine provided on each master node and on each standby node;
  • a peer-to-peer architecture is used between the master nodes of different partitions;
  • a star architecture is used between the master node and the standby nodes within each partition;
  • the master nodes include one management master node and at least one common master node; the management master node is configured to reselect, after a common master node or a standby node fails, a new common master node or standby node in the partition where the failed node is located, or to restart a virtual machine when a virtual machine on a common master node or a standby node is faulty.
  • The virtualized cluster system in the embodiments of the present invention is partitioned, so the system can be expanded by adding partitions; the master nodes of the partitions adopt a peer-to-peer structure, which can eliminate bottlenecks and improve reliability; and reliability can be further improved by reselecting new master nodes or standby nodes, or by restarting virtual machines.
  • FIG. 1 is a schematic structural diagram of a system according to a first embodiment of the present invention.
  • FIG. 2 is a schematic flow chart of a method according to a first embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a device according to a first embodiment of the present invention.
  • FIG. 4 is a schematic flow chart of a method according to a second embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a system according to a second embodiment of the present invention.
  • FIG. 6 is a schematic flow chart of a method according to a third embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a system according to a third embodiment of the present invention.
  • FIG. 8 is a schematic flow chart of a method according to a fourth embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a system according to a fourth embodiment of the present invention.
  • The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
  • The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the scope of the present invention.
  • The system includes at least two partitions 1, each partition including one master node 11 and at least one standby node (slave) 12.
  • At least one virtual machine (VM) 13 is provided on each master node 11 and on each standby node 12.
  • For example, the master nodes include master node A, master node B, and master node C.
  • The standby nodes of the partition where master node A is located include standby node a1, standby node a2, and so on; the standby nodes of the partition where master node B is located include standby node b1, standby node b2, and so on; and the standby nodes of the partition where master node C is located include standby node c1, standby node c2, and so on.
  • The master nodes 11 of different partitions adopt a peer-to-peer architecture; that is, a master node can send resource state information to any other master node and can receive resource state information sent by any other master node.
  • A star architecture is adopted between the master node 11 and the standby nodes 12 within each partition: a standby node sends resource state information to its master node, and the master node does not send resource state information to the standby node.
  • The resource state information indicates whether the corresponding node is normal or has failed.
  • The master nodes include one management master node and at least one common master node. The management master node is configured to reselect, after a common master node or a standby node fails, a new common master node or standby node in the partition where the failed node is located, or to restart a virtual machine when a virtual machine on a common master node or a standby node is faulty.
  • One of the master nodes may be preconfigured as the management master node, and the remaining master nodes are common master nodes. The management master node stores information about every master node, every standby node, and the virtual machines on those nodes, manages all nodes in a unified manner, and handles failures centrally.
  • For example, master node C can be set as the management master node, with master node A and master node B as common master nodes.
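  • The topology described above can be sketched as a small data model. This is only an illustration: the class names `Partition` and `Cluster` and the method `may_send_state` are hypothetical and do not come from the patent text; the node names follow the A/B/C example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    master: str                                   # exactly one master node per partition
    standbys: List[str] = field(default_factory=list)


@dataclass
class Cluster:
    partitions: List[Partition]
    management_master: str                        # one master is preconfigured as manager

    def masters(self) -> List[str]:
        return [p.master for p in self.partitions]

    def partition_of(self, node: str) -> Partition:
        for p in self.partitions:
            if node == p.master or node in p.standbys:
                return p
        raise KeyError(node)

    def may_send_state(self, sender: str, receiver: str) -> bool:
        """Peer-to-peer between masters; star (standby to own master only) inside a partition."""
        if sender == receiver:
            return False
        masters = self.masters()
        if sender in masters:
            # A master reports only to other masters, never down to a standby node.
            return receiver in masters
        # A standby node reports only to the master of its own partition.
        return receiver == self.partition_of(sender).master


cluster = Cluster(
    partitions=[
        Partition("A", ["a1", "a2"]),
        Partition("B", ["b1", "b2"]),
        Partition("C", ["c1", "c2"]),
    ],
    management_master="C",
)
```

  • In this sketch, expanding the system amounts to appending another `Partition` to `cluster.partitions`.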
  • the flow between the devices can be as follows.
  • FIG. 2 is a schematic flow chart of a method according to a first embodiment of the present invention, including:
  • Step 21: A node determines whether at least one of the following exists: a failed common master node, a failed standby node, or a faulty virtual machine.
  • Step 22: After determining that a failed common master node exists, the node enables a new common master node; after determining that a failed standby node exists, the node enables a new standby node; or, after determining that a faulty virtual machine exists, the node restarts the virtual machine.
  • The common master nodes and the standby nodes are divided into at least two partitions, each partition including one master node and at least one standby node; at least one virtual machine is provided on each master node and on each standby node; a peer-to-peer architecture is adopted between the master nodes of different partitions; a star architecture is adopted between the master node and the standby nodes within each partition; and the master nodes include one management master node and at least one common master node.
  • The node mentioned above may be a common master node, the management master node, or a standby node.
  • the foregoing processes may have different specific implementation manners. See the subsequent examples for details.
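  • Steps 21 and 22 amount to a dispatch from the kind of failure to a recovery action. The following is a minimal sketch under that reading; the enum `Failure`, the function `handle_failures`, and the action labels are hypothetical names chosen for illustration.

```python
from enum import Enum, auto


class Failure(Enum):
    COMMON_MASTER = auto()     # a failed common master node
    STANDBY = auto()           # a failed standby node
    VIRTUAL_MACHINE = auto()   # a faulty virtual machine


def handle_failures(detected):
    """Step 22: map each detected (kind, identifier) pair to a recovery action."""
    actions = []
    for kind, ident in detected:
        if kind is Failure.COMMON_MASTER:
            actions.append(("enable_new_common_master", ident))
        elif kind is Failure.STANDBY:
            actions.append(("enable_new_standby", ident))
        elif kind is Failure.VIRTUAL_MACHINE:
            actions.append(("restart_vm", ident))
        else:
            raise ValueError(f"unknown failure kind: {kind!r}")
    return actions
```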
  • the device corresponding to the method can be as follows.
  • FIG. 3 is a schematic structural diagram of a device according to a first embodiment of the present invention, including a determining unit 31 and a processing unit 32.
  • The determining unit 31 is configured to determine whether at least one of the following exists: a failed common master node, a failed standby node, or a faulty virtual machine;
  • The processing unit 32 is configured to enable a new common master node after it is determined that a failed common master node exists; to enable a new standby node after it is determined that a failed standby node exists; or to restart the virtual machine after it is determined that a faulty virtual machine exists. The common master nodes and the standby nodes are divided into at least two partitions, each partition including one master node and at least one standby node; at least one virtual machine is provided on each master node and on each standby node; a peer-to-peer architecture is adopted between the master nodes of different partitions; a star architecture is adopted between the master node and the standby nodes within each partition; and the master nodes include one management master node and at least one common master node.
  • The foregoing device may be a common master node, the management master node, or a standby node, and the specific functions of the units differ across nodes and scenarios. See the embodiments below for details.
  • The virtualized cluster system of this embodiment is partitioned, so the system can be expanded by adding partitions; the master nodes of the partitions adopt a peer-to-peer structure, which can eliminate bottlenecks and improve reliability; and reselecting a new master node or standby node, or restarting a virtual machine, can further improve reliability.
  • FIG. 4 is a schematic flowchart of a method according to a second embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a system according to a second embodiment of the present invention.
  • this embodiment includes:
  • Step 41: When the cluster is working normally, the common master nodes of the partitions monitor one another's heartbeats through their heartbeat detection modules (heartbeatsync).
  • For example, the heartbeat detection module of common master node A sends heartbeat information to the heartbeat detection module of common master node B.
  • Step 42: If the heartbeat detection module of common master node B detects that the heartbeat of common master node A has stopped, it multicasts a fault message that carries the identification information of common master node A to indicate that common master node A has failed.
  • For example, if common master node B does not receive heartbeat information from common master node A within a certain period, it determines that the heartbeat of common master node A has stopped.
  • The identification information is used to distinguish the nodes from one another, for example, the ID or address of common master node A.
  • The remaining common master nodes and the management master node receive the fault message.
  • Step 43: After receiving the fault message, the heartbeat detection module of the management master node reports a master node fault message to the high availability (HA) module of the management master node; the master node fault message carries the identification information of common master node A.
  • Step 44: The HA module of the management master node reselects a standby node, in the partition where common master node A is located, as the new common master node of that partition.
  • For example, standby node a1 in the partition where A is located is selected as the new common master node.
  • Step 45: The HA module of the management master node sends a migrate-virtual-machine request to the resource management module (ResourceMgmt) of the management master node; the request carries the identification information of the new common master node a1 and of common master node A.
  • Step 46: The resource management module of the management master node migrates the virtual machine on common master node A to the new common master node a1.
  • For example, the configuration information of the virtual machine on common master node A is sent to the new common master node a1, and a1 is instructed to run the configuration information again to restart the corresponding virtual machine.
  • The configuration information of a virtual machine is information from which the virtual machine can be started, for example, virtual machine software; the virtual machine can be started by executing the virtual machine software.
  • Step 47: The new common master node multicasts a join request to the remaining common master nodes. After detecting the join request, the heartbeat detection module of each remaining common master node sends a membership update request to its membership management module (MembershipMgmt); the request carries the identification information of the new common master node and of the failed common master node.
  • For example, the heartbeat detection module of common master node B sends a membership update request to the membership management module of common master node B; the request carries the identification information of A and of a1.
  • Step 48: The membership management module updates the membership list.
  • For example, the identification information of the new common master node a1 is added to the membership list, and the identification information of the failed common master node A is deleted.
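  • Steps 41 to 48 can be sketched as a timeout-based heartbeat monitor plus a failover routine. This is a simplified, single-process illustration: `HeartbeatMonitor`, `fail_over_master`, the dictionary layouts, and the "first standby in the list" selection policy are all assumptions, and real nodes would exchange these messages over the network.

```python
class HeartbeatMonitor:
    """Records the last heartbeat time per peer and reports peers that went silent."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, node_id, now):
        self.last_seen[node_id] = now

    def stopped(self, now):
        # A peer is treated as failed when no heartbeat arrived within `timeout`.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def fail_over_master(partitions, vms, members, failed):
    """Steps 44-48 in miniature.

    partitions: dict mapping master id -> list of standby ids in its partition
    vms:        dict mapping node id -> list of virtual machine ids it hosts
    members:    set of master ids (the membership list kept by each master)
    """
    standbys = partitions[failed]
    if not standbys:
        raise RuntimeError(f"no standby node available to replace {failed}")
    new_master = standbys[0]              # assumption: the patent leaves the policy open
    del partitions[failed]
    partitions[new_master] = standbys[1:]
    # Step 46: move the failed master's virtual machines to the new master.
    vms.setdefault(new_master, []).extend(vms.pop(failed, []))
    # Step 48: update the membership list.
    members.discard(failed)
    members.add(new_master)
    return new_master
```

  • Passing `now` explicitly keeps the monitor deterministic; a deployment would more likely read a monotonic clock.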
  • the corresponding module can be as follows:
  • A common master node 51 and a management master node 52 are involved.
  • For the common master node 51, the determining unit is specifically a first heartbeat detection module (heartbeatsync) 511, and the processing unit is specifically a first membership management module (MembershipMgmt) 512.
  • For the management master node 52, the determining unit is specifically a second heartbeat detection module 521, and the processing unit specifically includes a first high availability (HA) module 522 and a first resource management module (ResourceMgmt) 523.
  • The first heartbeat detection module 511 is configured to determine, after detecting that the heartbeat of any other common master node has stopped, that a failed common master node exists, and to identify the common master node whose heartbeat has stopped as the failed common master node;
  • The first membership management module 512 is configured to receive a first membership request message that carries the identification information of the new common master node and of the failed common master node, to add the identification information of the new common master node to a first membership list, and to delete the identification information of the failed common master node from the first membership list.
  • The new common master node is reselected by the management master node, after it receives a first fault message, from the standby nodes in the partition where the failed common master node is located. The first fault message is sent by a common master node after it determines that a failed common master node exists, and carries the identification information of the failed common master node.
  • The second heartbeat detection module 521 is configured to determine, after receiving the first fault message, that a failed common master node exists; the first fault message is sent by a common master node after it determines that a failed common master node exists, and carries the identification information of the failed common master node;
  • The first high availability module 522 is configured to receive a master node fault message that carries the identification information of the failed common master node, to reselect a new common master node from the standby nodes in the partition of the failed common master node, and to send a first migrate-virtual-machine request that carries the identification information of the new common master node and of the failed common master node; the master node fault message is sent after the first fault message is received;
  • The first resource management module 523 is configured to send, according to the first migrate-virtual-machine request, the identification information of the virtual machine on the failed common master node to the new common master node, and to restart the virtual machine.
  • the scalability of the cluster system can be achieved by partitioning.
  • A peer-to-peer architecture is adopted between the master nodes, so that after a master node fails, the failure is detected promptly and a new master node is reselected in time, which improves availability.
  • FIG. 6 is a schematic flowchart of a method according to a third embodiment of the present invention
  • FIG. 7 is a schematic structural diagram of a system according to a third embodiment of the present invention.
  • this embodiment includes:
  • Step 601: When the cluster is working normally, each standby node of each partition sends heartbeat information, through its heartbeat detection module, to the heartbeat detection module of the common master node of its partition.
  • For example, the heartbeat detection module of standby node a1 sends heartbeat information to the heartbeat detection module of common master node A of its partition.
  • Step 602: If the heartbeat detection module of common master node A detects that the heartbeat of standby node a1 has stopped, it sends a heartbeat detection message to another standby node in its partition.
  • For example, after detecting that the heartbeat of standby node a1 has stopped, common master node A sends a heartbeat detection message to another standby node a2 in its partition; the heartbeat detection message carries the identification information of standby node a1.
  • Step 603: Standby node a2 checks the heartbeat of standby node a1.
  • For example, standby node a2 sends a ping message to standby node a1; if no response message is received from standby node a1, a2 determines that the heartbeat of a1 has stopped.
  • Step 604: Standby node a2 sends a heartbeat detection result to common master node A, carrying the result of the heartbeat check on standby node a1.
  • Step 605: If the heartbeat detection result also indicates that the heartbeat of standby node a1 has stopped, common master node A multicasts a fault message that carries the identification information of standby node a1.
  • The remaining common master nodes and the management master node receive the fault message.
  • Step 606: After receiving the fault message, the heartbeat detection module of the management master node sends a standby node fault message to the HA module of the management master node; the standby node fault message carries the identification information of the failed standby node a1.
  • Step 607: The HA module of the management master node selects another standby node, in the partition where standby node a1 is located, as the standby node to which the virtual machine is migrated.
  • The other standby node may also be selected according to priority, load, and the like.
  • Step 608: The HA module of the management master node sends a migrate-virtual-machine request to the resource management module of the management master node; the request carries the identification information of the new standby node and of the failed standby node.
  • For example, the migrate-virtual-machine request carries the identification information of a1 and of a2.
  • Step 609: The resource management module of the management master node migrates the virtual machine on standby node a1 to standby node a2.
  • The configuration information of a virtual machine is information from which the virtual machine can be started, for example, virtual machine software; the virtual machine can be started by executing the virtual machine software.
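  • The two-stage check in Steps 602 to 605 and the target selection in Step 607 can be sketched as follows. The function names, the injected `probe` callback (standing in for the ping between standby nodes), the "least loaded" policy, and the fallback used when no other standby node exists are assumptions made for illustration.

```python
def confirm_standby_failure(suspect, standbys, probe):
    """Ask another standby node of the partition to probe the suspect.

    `probe(peer, target)` returns True if `target` answered `peer`.
    The failure is confirmed only if the peer also gets no response.
    """
    for peer in standbys:
        if peer != suspect:
            return not probe(peer, suspect)
    # Assumption: with no other standby node to ask, trust the master's own detection.
    return True


def choose_standby(candidates, load):
    """Pick the least-loaded candidate; ties are broken by node id."""
    return min(candidates, key=lambda node: (load.get(node, 0), node))


def migrate_vms(vms, failed, target):
    """Move every virtual machine recorded on `failed` to `target`."""
    vms.setdefault(target, []).extend(vms.pop(failed, []))
```

  • The second opinion from a peer standby node guards against the master declaring a node failed when only the link between the two of them is disturbed.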
  • the failed standby node can perform the following actions:
  • Step 610: After standby node a1 finds that its heartbeat information is lost, it sends a ping message to its own gateway.
  • Step 611: If the ping fails, that is, no response message corresponding to the ping message is received, standby node a1 powers itself off.
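  • Steps 610 and 611 describe a self-check on the failed standby node. A minimal sketch, assuming the gateway ping and the power-off action are supplied as callbacks (`ping_gateway` and `power_off` are hypothetical names; a real node would issue an ICMP echo and a platform-specific shutdown):

```python
def self_fence(heartbeat_lost, ping_gateway, power_off):
    """Power the node off when its heartbeat is lost and its gateway is unreachable.

    Returns True if the node powered itself off, False otherwise.
    """
    if not heartbeat_lost:
        return False
    if ping_gateway():
        # The gateway answered, so the node still has network connectivity.
        return False
    power_off()
    return True
```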
  • the corresponding module can be as follows:
  • A common master node 71, a management master node 72, and a standby node 73 are involved. For the common master node, the determining unit and the processing unit are the same module, specifically a third heartbeat detection module 711.
  • For the management master node, the determining unit is specifically a fourth heartbeat detection module 721, and the processing unit specifically includes a second high availability module 722 and a second resource management module 723.
  • For the standby node, the determining unit and the processing unit are the same module, specifically a fifth heartbeat detection module 731.
  • The third heartbeat detection module 711 is configured to determine, after detecting that the heartbeat of any standby node in the partition of the common master node has stopped, that a failed standby node exists, and to identify the standby node whose heartbeat has stopped as the failed standby node;
  • The fourth heartbeat detection module 721 is configured to determine, after receiving a second fault message, that a failed standby node exists; the second fault message is sent by the common master node after it determines that a failed standby node exists, and carries the identification information of the failed standby node;
  • The second high availability module 722 is configured to receive a standby node fault message that carries the identification information of the failed standby node, to reselect a new standby node in the partition of the failed standby node, and to send a second migrate-virtual-machine request that carries the identification information of the new standby node and of the failed standby node; the standby node fault message is sent after the second fault message is received;
  • The second resource management module 723 is configured to send, according to the second migrate-virtual-machine request, the identification information of the virtual machine on the failed standby node to the new standby node, and to restart the virtual machine.
  • The fifth heartbeat detection module 731 is configured to: send heartbeat information while the standby node has not failed, and send none once it has failed, so that the common master node of the partition can determine from the heartbeat information whether the standby node has failed; perform power-off processing when the standby node is itself the failed standby node; or, after receiving a detection request, detect whether the corresponding standby node has failed and send the detection result to the common master node, so that the common master node can proceed with enabling a new standby node. The detection request is sent by the common master node after it fails to receive heartbeat information from a standby node within a certain period, and carries the identification information of the standby node whose heartbeat has stopped.
  • the scalability of the cluster system can be achieved by partitioning.
  • the standby node and the primary node adopt a star architecture, and after a standby node fails, the primary node migrates the virtual machine on the failed standby node in time to improve availability.
  • FIG. 8 is a schematic flowchart of a method according to a fourth embodiment of the present invention
  • FIG. 9 is a schematic structural diagram of a system according to a fourth embodiment of the present invention.
  • this embodiment includes:
  • Step 81: When the cluster is working normally, the virtual machine proxy module on each node sends heartbeat information to the heartbeat detection module of the node where it is located.
  • For example, the virtual machine proxy module of a standby node sends heartbeat information to the heartbeat detection module of that standby node.
  • Step 82: If the heartbeat detection module of the standby node detects that the heartbeat of a virtual machine has stopped, it sends a fault message to the common master node of its partition. For example, if the heartbeat detection module on the standby node does not receive heartbeat information from the virtual machine proxy module on that node within a certain period, it determines that the heartbeat of the corresponding virtual machine has stopped.
  • Step 83: After receiving the fault message, the common master node multicasts a fault message that carries the identification information of the faulty virtual machine.
  • Steps 82 and 83 describe the case in which the faulty virtual machine is on a standby node. When the faulty virtual machine is on a master node, the heartbeat detection module on the master node does not receive heartbeat information from the virtual machine proxy module within a certain period, determines that the virtual machine on the master node is faulty, and multicasts the fault message.
  • The fault message is received by the remaining common master nodes and the management master node.
  • Step 84: After receiving the fault message, the heartbeat detection module of the management master node sends a virtual machine fault message to the HA module of the management master node; the virtual machine fault message carries the identification information of the faulty virtual machine.
  • Step 85: The HA module of the management master node sends a restart-virtual-machine request to the resource management module of the management master node; the request carries the identification information of the faulty virtual machine.
  • Step 86: The resource management module of the management master node restarts the virtual machine.
  • For example, the configuration information of the faulty virtual machine is sent to the node where the virtual machine is located, and that node is instructed to run the configuration information again to restart the virtual machine.
  • Alternatively, the management master node reselects a node as the target node according to priority, load, and the like, sends the configuration information of the faulty virtual machine to the target node, and instructs the target node to run the configuration information again to restart the virtual machine.
  • The resource management module of the target node then runs the configuration information.
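  • The two restart options in Step 86 can be sketched as a small planning function. The name `plan_vm_restart`, the `relocate` flag, and the "least loaded node" rule are assumptions; the text says only that the target may be chosen by priority, load, and the like, and does not say when relocation is preferred over an in-place restart.

```python
def plan_vm_restart(vm_id, host, vm_configs, load, relocate=False):
    """Return (target_node, configuration) for restarting a faulty virtual machine.

    vm_configs: dict mapping virtual machine id -> configuration held by the manager
    load:       dict mapping node id -> current load figure
    relocate:   False restarts on the original host; True picks the least-loaded node
    """
    config = vm_configs[vm_id]
    if relocate:
        target = min(load, key=lambda node: (load[node], node))
    else:
        target = host
    return target, config
```

  • The caller would then send `config` to `target` and instruct it to run the configuration again.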
  • the corresponding module can be as follows:
  • A common master node 91, a management master node 92, and a standby node 93 are involved.
  • For the common master node 91, the determining unit is specifically a sixth heartbeat detection module 911, and the processing unit is specifically a fourth resource management module 912.
  • For the management master node 92, the determining unit is specifically a seventh heartbeat detection module 921, and the processing unit specifically includes a third high availability module 922 and a third resource management module 923.
  • For the standby node 93, the determining unit includes a virtual machine proxy module 931 and an eighth heartbeat detection module 932, and the processing unit is specifically a fifth resource management module 933.
  • The sixth heartbeat detection module 911 is configured to: after receiving a virtual machine fault message sent by any standby node in the partition where the normal master node is located, or after detecting that the heartbeat of a virtual machine on the normal master node itself has stopped, determine that a faulty virtual machine exists, and determine that the virtual machine indicated by the virtual machine fault message, or the virtual machine whose heartbeat has stopped, is the faulty virtual machine;
  • The fourth resource management module 912 is configured to: when a virtual machine on the normal master node itself is faulty, receive the configuration information of the faulty virtual machine sent by the management master node, and re-run the configuration information to restart the faulty virtual machine, where the configuration information of the faulty virtual machine is sent after the management master node receives the third fault message, the third fault message is sent by the normal master node after determining that a faulty virtual machine exists, and the third fault message carries the identification information of the faulty virtual machine.
  • The seventh heartbeat detection module 921 is configured to: after receiving the third fault message, determine that a faulty virtual machine exists, where the third fault message carries the identification information of the faulty virtual machine;
  • The third high availability module 922 is configured to receive a virtual machine fault message and send a restart virtual machine request, where the virtual machine fault message is sent after the third fault message is received, and both the virtual machine fault message and the restart virtual machine request carry the identification information of the faulty virtual machine;
  • The third resource management module 923 is configured to send the configuration information of the faulty virtual machine to the node where the faulty virtual machine is located, and instruct that node to re-run the configuration information to restart the faulty virtual machine.
  • The virtual machine proxy module 931 is configured to send heartbeat information while the corresponding virtual machine is normal, and to stop sending heartbeat information when the virtual machine is faulty;
  • The eighth heartbeat detection module 932 is configured to: after detecting, from the sending of the heartbeat information, that the heartbeat of a virtual machine on the standby node has stopped, determine that a faulty virtual machine exists, and determine that the virtual machine whose heartbeat has stopped is the faulty virtual machine;
  • The fifth resource management module 933 is configured to receive the configuration information of the faulty virtual machine sent by the management master node, and re-run the configuration information to restart the faulty virtual machine, where the configuration information of the faulty virtual machine is sent by the management master node after it receives the third fault message.
  • The third fault message is sent by the normal master node after receiving the virtual machine fault message, and the third fault message carries the identification information of the faulty virtual machine.
  • The virtual machine fault message is sent by the standby node after detecting that the heartbeat of a virtual machine on the standby node has stopped, and the virtual machine fault message carries the identification information of the faulty virtual machine.
  • In this embodiment, the scalability of the cluster system is achieved by partitioning.
  • A peer-to-peer architecture is adopted between the master nodes, and a star architecture is adopted between the standby nodes and the master node, so that after a virtual machine fails, the fault is learned in time and the virtual machine is restarted, which improves availability.
  • The cluster size can be expanded by adding partitions; peer-to-peer management by multiple master nodes eliminates the HA bottleneck; and because the peer master nodes exchange resource status information rather than resource usage information, the communication overhead of fault monitoring and of state synchronization is small.
  • When the heartbeat of a standby node stops, the master node of the partition selects another standby node in the partition for arbitration, which reduces misjudgment and improves availability.
  • A peer-to-peer architecture is adopted between the master nodes; compared with a star architecture, this enhances the reliability of the master nodes. By making effective use of the standby nodes and migrating the VMs, resource waste and management overhead are reduced.
  • When the program is executed, the steps of the foregoing method embodiments are performed; the foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
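The virtual machine restart path of Steps 84 to 86 above can be illustrated with a short Python sketch. It is not taken from the patent: the `Node` fields, the rule "restart in place if the hosting node is still alive, otherwise pick another node", and the ordering by priority and then load are all assumptions made for illustration, since the text only says that a target may be reselected "according to the priority, the load condition, and the like".

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    priority: int      # hypothetical: lower value means more preferred
    load: float        # hypothetical: 0.0 (idle) to 1.0 (saturated)
    alive: bool = True


def choose_restart_target(home, candidates):
    """Pick the node that should re-run the faulty VM's configuration.

    Assumption: the VM is restarted where it already lives if that node
    is alive; otherwise a live candidate is chosen by (priority, load).
    """
    if home.alive:
        return home
    live = [n for n in candidates if n.alive]
    if not live:
        raise RuntimeError("no live node available to restart the VM")
    return min(live, key=lambda n: (n.priority, n.load))


def build_restart_request(vm_id, vm_configs, home, candidates):
    """Compose what the management master's resource module would send."""
    target = choose_restart_target(home, candidates)
    return {
        "target": target.node_id,
        "vm_id": vm_id,                 # identification information
        "config": vm_configs[vm_id],    # configuration to re-run
    }
```

The request is returned as a plain dictionary so that the sketch stays independent of any particular messaging layer.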

Abstract

Disclosed are a virtual cluster system and a processing method and device thereof. The system comprises at least two partitions, each partition comprising one master node and at least one slave node; at least one virtual machine is provided on each master node and each slave node; a peer-to-peer architecture is used between the master nodes of different partitions; a star architecture is used between the master node and the slave nodes in each partition; the master nodes comprise one management master node and at least one normal master node, wherein the management master node is used for reselecting a new normal master node or slave node in the partition of a failed normal master node or slave node when the normal master node or the slave node fails, or for rebooting a virtual machine when the virtual machine on a normal master node or slave node fails. According to the embodiments of the invention, the scalability and availability of the system can be improved.

Description

Virtualized cluster system and processing method and device thereof
This application claims priority to Chinese Patent Application No. 201110301796.0, filed with the Chinese Patent Office on September 27, 2011 and entitled "Virtualized cluster system and processing method and device thereof", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present invention relates to network communication technologies, and in particular, to a virtualized cluster system and a processing method and device thereof.
BACKGROUND OF THE INVENTION
A cluster system offers strong overall computing, storage, and management performance, presents its services as a single system image, and provides availability guarantees and fault tolerance that are transparent to the user; it has therefore become the mainstream infrastructure of the data center. The application of virtualization technology gives cluster development a better and more promising direction. Virtualization technology allows one platform to run multiple operating systems at the same time, and applications can run in mutually independent spaces without affecting each other, which significantly improves the working efficiency of the computer. Running multiple virtual machines makes full use of the computing potential of a physical server and gives the data center a fast response capability.
With the introduction of virtualization technology, scalability and high availability are the biggest challenges facing cluster systems.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide a virtualized cluster system and a processing method and device thereof, so as to improve the scalability and availability of a virtual machine cluster system.
An embodiment of the present invention provides a processing method for a virtualized cluster system, including:
determining, by a node, whether at least one of the following occurs: a failed normal master node exists, a failed standby node exists, or a faulty virtual machine exists;
activating, by the node, a new normal master node after determining that a failed normal master node exists; activating a new standby node after determining that a failed standby node exists; or restarting the virtual machine after determining that a faulty virtual machine exists;
where the normal master nodes and the standby nodes are divided into at least two partitions, each partition containing one master node and at least one standby node; at least one virtual machine is deployed on each master node and each standby node; a peer-to-peer architecture is used between the master nodes of different partitions; a star architecture is used between the master node and the standby nodes within each partition; and the master nodes include one management master node and at least one normal master node.
An embodiment of the present invention provides a processing device for a virtualized cluster system, including:
a determining unit, configured to determine whether at least one of the following occurs: a failed normal master node exists, a failed standby node exists, or a faulty virtual machine exists;
a processing unit, configured to activate a new normal master node after it is determined that a failed normal master node exists; activate a new standby node after it is determined that a failed standby node exists; or restart the virtual machine after it is determined that a faulty virtual machine exists;
where the normal master nodes and the standby nodes are divided into at least two partitions, each partition containing one master node and at least one standby node; at least one virtual machine is deployed on each master node and each standby node; a peer-to-peer architecture is used between the master nodes of different partitions; a star architecture is used between the master node and the standby nodes within each partition; and the master nodes include one management master node and at least one normal master node.
An embodiment of the present invention provides a virtualized cluster system, including:
at least two partitions, each partition containing one master node and at least one standby node, with at least one virtual machine deployed on each master node and each standby node;
a peer-to-peer architecture is used between the master nodes of different partitions;
a star architecture is used between the master node and the standby nodes within each partition;
the master nodes include one management master node and at least one normal master node, and the management master node is configured to: after a normal master node or standby node fails, reselect a new normal master node or standby node in the partition where the failed normal master node or standby node is located; or, when a virtual machine on a normal master node or standby node is faulty, restart the virtual machine.
It can be seen from the above technical solutions that, because the virtualized cluster system in the embodiments of the present invention is divided into partitions, the system can be expanded by adding partitions; a peer-to-peer structure is used between the master nodes of the partitions, which eliminates the bottleneck problem and improves reliability; and reselecting a new master node or standby node, or restarting a virtual machine, further improves reliability.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a system according to a first embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method according to the first embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device according to the first embodiment of the present invention;
FIG. 4 is a schematic flowchart of a method according to a second embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a system according to the second embodiment of the present invention;
FIG. 6 is a schematic flowchart of a method according to a third embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a system according to the third embodiment of the present invention;
FIG. 8 is a schematic flowchart of a method according to a fourth embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a system according to the fourth embodiment of the present invention.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a schematic structural diagram of a system according to a first embodiment of the present invention. Referring to FIG. 1, the system includes at least two partitions 1, each partition containing one master node (master) 11 and at least one standby node (slave) 12; at least one virtual machine (Virtual Machine, VM) 13 is deployed on each master node 11 and each standby node 12. For example, referring to FIG. 1, the master nodes include master node A, master node B, master node C, and so on; the standby nodes in the partition of master node A include standby node a1, standby node a2, and so on; the standby nodes in the partition of master node B include standby node b1, standby node b2, and so on; and the standby nodes in the partition of master node C include standby node c1, standby node c2, and so on.
A peer-to-peer architecture is used between the master nodes 11 of different partitions; that is, a master node can send resource status information to any other master node and can receive resource status information sent by any other master node. A star architecture is used between the master node 11 and the standby nodes 12 within each partition; that is, a standby node sends resource status information to its master node, but the master node does not send resource status information to the standby nodes. The resource status information indicates whether the corresponding node is normal or has failed.
The master nodes include one management master node (master leader) and at least one normal master node. The management master node is configured to: after a normal master node or standby node fails, reselect a new normal master node or standby node in the partition where the failed normal master node or standby node is located; or, when a virtual machine on a normal master node or standby node is faulty, restart the virtual machine.
One of the master nodes can be preset as the management master node, and the remaining master nodes are normal master nodes. The management master node stores information about every master node, every standby node, and the virtual machines on those nodes, manages all nodes in the partitions in a unified manner, and handles faults centrally when they occur. For example, referring to FIG. 1, master node C can be set as the management master node, while master node A, master node B, and so on are normal master nodes.
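As an illustration of the topology just described, the following Python sketch models the partitions and derives where each node sends its resource status information. The class name `Cluster`, its fields, and the method names are hypothetical and introduced only for this example; the node names follow FIG. 1.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    partitions: dict          # master node id -> list of standby node ids
    management_master: str    # one master preset as the management master

    def __post_init__(self):
        # Constraints stated in the text: at least two partitions, at least
        # one standby node per partition, and the management master is one
        # of the master nodes.
        if len(self.partitions) < 2:
            raise ValueError("at least two partitions are required")
        if any(len(s) < 1 for s in self.partitions.values()):
            raise ValueError("each partition needs at least one standby node")
        if self.management_master not in self.partitions:
            raise ValueError("the management master must be a master node")

    def normal_masters(self):
        """All master nodes other than the management master."""
        return [m for m in self.partitions if m != self.management_master]

    def status_targets(self, node_id):
        """Nodes to which node_id sends resource status information."""
        if node_id in self.partitions:
            # Peer-to-peer: a master reports to every other master.
            return [m for m in self.partitions if m != node_id]
        for master, standbys in self.partitions.items():
            if node_id in standbys:
                # Star: a standby node reports only to its own master.
                return [master]
        raise KeyError(node_id)


cluster = Cluster(
    {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]},
    management_master="C",
)
```

Under this model, expanding the cluster amounts to adding one more entry to `partitions`, which mirrors the scalability argument made for partitioning.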
Corresponding to the above system, the flow between the devices can be as follows.
FIG. 2 is a schematic flowchart of a method according to the first embodiment of the present invention, including:
Step 21: A node determines whether at least one of the following occurs: a failed normal master node exists, a failed standby node exists, or a faulty virtual machine exists;
Step 22: After determining that a failed normal master node exists, the node activates a new normal master node; after determining that a failed standby node exists, the node activates a new standby node; or, after determining that a faulty virtual machine exists, the node restarts the virtual machine;
where the normal master nodes and the standby nodes are divided into at least two partitions, each partition containing one master node and at least one standby node; at least one virtual machine is deployed on each master node and each standby node; a peer-to-peer architecture is used between the master nodes of different partitions; a star architecture is used between the master node and the standby nodes within each partition; and the master nodes include one management master node and at least one normal master node.
The node mentioned above may specifically be a normal master node, the management master node, or a standby node. Depending on the node and the scenario, the above flow can be implemented in different specific ways; for details, see the subsequent embodiments.
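Steps 21 and 22 above amount to a detect-then-dispatch loop. The sketch below shows that shape in Python; the enum members, the action names, and the function signatures are hypothetical and only illustrate the mapping from each fault condition to its recovery action.

```python
from enum import Enum


class Fault(Enum):
    NORMAL_MASTER_FAILED = "normal master failed"
    STANDBY_FAILED = "standby failed"
    VM_FAULTY = "vm faulty"


def judge(failed_masters, failed_standbys, faulty_vms):
    """Step 21: collect every fault condition that is currently present."""
    events = [(Fault.NORMAL_MASTER_FAILED, n) for n in failed_masters]
    events += [(Fault.STANDBY_FAILED, n) for n in failed_standbys]
    events += [(Fault.VM_FAULTY, v) for v in faulty_vms]
    return events


# Step 22: one recovery action per fault type (names are illustrative).
ACTIONS = {
    Fault.NORMAL_MASTER_FAILED: "activate_new_normal_master",
    Fault.STANDBY_FAILED: "activate_new_standby",
    Fault.VM_FAULTY: "restart_vm",
}


def process(events):
    """Map each detected fault to (action, affected node or VM id)."""
    return [(ACTIONS[fault], subject) for fault, subject in events]
```

For instance, `process(judge(["A"], [], ["vm3"]))` yields one action for the failed normal master node A and one for the faulty virtual machine vm3.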
Correspondingly, the device for this method can be as follows.
FIG. 3 is a schematic structural diagram of a device according to the first embodiment of the present invention, including a determining unit 31 and a processing unit 32. The determining unit 31 is configured to determine whether at least one of the following occurs: a failed normal master node exists, a failed standby node exists, or a faulty virtual machine exists. The processing unit 32 is configured to activate a new normal master node after it is determined that a failed normal master node exists; activate a new standby node after it is determined that a failed standby node exists; or restart the virtual machine after it is determined that a faulty virtual machine exists. The normal master nodes and the standby nodes are divided into at least two partitions, each partition containing one master node and at least one standby node; at least one virtual machine is deployed on each master node and each standby node; a peer-to-peer architecture is used between the master nodes of different partitions; a star architecture is used between the master node and the standby nodes within each partition; and the master nodes include one management master node and at least one normal master node.
Corresponding to the above method flow, the device may be a normal master node, the management master node, or a standby node, and the specific functions of the above units differ between nodes and scenarios. For details, see the following embodiments.
Because the virtualized cluster system in this embodiment is divided into partitions, the system can be expanded by adding partitions; a peer-to-peer structure is used between the master nodes of the partitions, which eliminates the bottleneck problem and improves reliability; and reselecting a new master node or standby node, or restarting a virtual machine, further improves reliability.
FIG. 4 is a schematic flowchart of a method according to a second embodiment of the present invention, and FIG. 5 is a schematic structural diagram of a system according to the second embodiment. This embodiment takes the failure of a normal master node as an example.
Referring to FIG. 4, this embodiment includes:
Step 41: When the cluster is working normally, the normal master nodes of the partitions detect each other's heartbeats through their heartbeat detection modules (heartbeatsync).
For example, the heartbeat detection module of normal master node A sends heartbeat information to the heartbeat detection module of normal master node B.
Step 42: If the heartbeat detection module of normal master node B detects that the heartbeat information of normal master node A has stopped, it multicasts a fault message that carries the identification information of normal master node A, to indicate that normal master node A has failed.
Normal master node B determines that the heartbeat of normal master node A has stopped when it has not received heartbeat information from normal master node A within a certain period of time.
The identification information is used to distinguish the nodes; for example, it may be the ID or address of normal master node A. All the other normal master nodes and the management master node receive the fault message.
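The timeout rule in Step 42 can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: the class `HeartbeatMonitor`, the explicit `now` timestamps, and the dictionary layout of the fault message are assumptions, and the actual multicast transport is left out.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer master node."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds without a heartbeat = stopped
        self.last_seen = {}         # node id -> time of last heartbeat

    def record(self, node_id, now):
        """Called whenever heartbeat information arrives from node_id."""
        self.last_seen[node_id] = now

    def stopped(self, now):
        """Return the peers whose heartbeat has been silent too long."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def build_fault_message(failed_node_id):
    """Fault message to multicast; it carries the failed node's identifier."""
    return {"type": "master_fault", "failed_node": failed_node_id}
```

Passing `now` explicitly instead of reading a clock inside the class keeps the detection logic deterministic and easy to test; a real deployment would typically feed it a monotonic clock.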
Step 43: After receiving the fault message, the heartbeat detection module of the management master node reports a master node fault message to the high availability (High Availability, HA) module of the management master node, where the master node fault message carries the identification information of normal master node A.
Step 44: The HA module of the management master node reselects one standby node in the partition where normal master node A is located as the new normal master node of that partition.
For example, according to the ID priority and the dynamic load of each standby node, standby node a1 in the partition where A is located is selected as the new normal master node.
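The reselection in Step 44 can be sketched in Python as below. The text names two inputs, ID priority and dynamic load, but does not say how they are combined; the rule used here (skip overloaded nodes when possible, then order by priority and then by load) and the `max_load` parameter are assumptions made purely for illustration.

```python
def select_new_master(standbys, max_load=0.9):
    """Choose a standby node to promote to normal master node.

    standbys: list of (node_id, id_priority, load) tuples, where a lower
    id_priority is preferred and load lies between 0.0 and 1.0.
    """
    if not standbys:
        raise ValueError("the partition has no standby node to promote")
    # Prefer nodes that are not overloaded; fall back to all if none qualify.
    eligible = [s for s in standbys if s[2] <= max_load] or list(standbys)
    node_id, _, _ = min(eligible, key=lambda s: (s[1], s[2]))
    return node_id
```

A weighted score over priority and load would be an equally plausible reading; the lexicographic ordering is used here only because it is the simplest rule that takes both inputs into account.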
Step 45: The HA module of the management master node sends a migrate virtual machine request to the resource management module (ResourceMgmt) of the management master node, where the request carries the identification information of the new normal master node a1 and the identification information of normal master node A.
Step 46: The resource management module of the management master node migrates the virtual machines on normal master node A to the new normal master node a1.
For example, the configuration information of the virtual machines on normal master node A is sent to the new normal master node a1, and the new normal master node a1 is instructed to re-run the configuration information to restart the corresponding virtual machines. The configuration information of a virtual machine is information that can start the virtual machine, for example, virtual machine software; executing the virtual machine software starts the virtual machine.
Further, after the new normal master node joins, the master nodes also update their membership:
Step 47: The new normal master node multicasts a join request to the other normal master nodes. After the heartbeat detection module of each of the other normal master nodes detects the join request, it sends a membership update request to the corresponding membership management module (MembershipMgmt), where the membership update request carries the identification information of the new normal master node and the identification information of the failed normal master node.
For example, after normal master node B receives the join request multicast by the new normal master node a1, the heartbeat detection module of normal master node B sends a membership update request to the membership management module of normal master node B, where the request carries the identification information of A and the identification information of a1.
Step 48: The membership management module updates the membership list.
For example, the identification information of the new normal master node a1 is added to the membership list, and the identification information of the failed normal master node A is deleted.
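The membership update of Steps 47 and 48 is a small list operation, sketched here in Python. The function name and the use of a plain list of node identifiers are assumptions for illustration only.

```python
def update_membership(members, new_master_id, failed_master_id):
    """Return a membership list with the failed master removed and the
    new normal master node added (without creating duplicates)."""
    updated = [m for m in members if m != failed_master_id]
    if new_master_id not in updated:
        updated.append(new_master_id)
    return updated
```

Returning a new list rather than mutating the input makes the update idempotent: applying the same membership update request twice leaves the list unchanged.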
With reference to the above flow, the corresponding modules can be as follows:
Referring to FIG. 5, this embodiment involves a normal master node 51 and a management master node 52. Further, for the normal master node, the determining unit is specifically a first heartbeat detection module (Heartbeat Sync) 511, and the processing unit is specifically a first membership management module (MembershipMgmt) 512. For the management master node, the determining unit is specifically a second heartbeat detection module 521, and the processing unit specifically includes a first high availability (HA) module 522 and a first resource management module (ResourceMgmt) 523.
The first heartbeat detection module 511 is configured to: after detecting that the heartbeat of any other normal master node has stopped, determine that a failed normal master node exists, and determine that the normal master node whose heartbeat has stopped is the failed normal master node;
The first membership management module 512 is configured to receive a first membership request message that carries the identification information of the new normal master node and the identification information of the failed normal master node, add the identification information of the new normal master node to a first membership list, and delete the identification information of the failed normal master node from the first membership list;
where the new normal master node is reselected by the management master node, after it receives a first fault message, from the standby nodes in the partition where the failed normal master node is located; the first fault message is sent by a normal master node after determining that a failed normal master node exists, and carries the identification information of the failed normal master node.
The second heartbeat detection module 521 is configured to determine, after receiving the first fault message, that a failed normal master node exists, where the first fault message is sent by a normal master node after determining that a failed normal master node exists, and carries the identification information of the failed normal master node;
The first high availability module 522 is configured to receive a master node fault message that carries the identification information of the failed normal master node, reselect a new normal master node from the standby nodes in the partition where the failed normal master node is located, and send a first migrate virtual machine request that carries the identification information of the new normal master node and the identification information of the failed normal master node, where the master node fault message is sent after the first fault message is received;
The first resource management module 523 is configured to send, according to the first migrate virtual machine request, the identification information of the virtual machines on the failed normal master node to the new normal master node, and restart the virtual machines.
本实施例通过分区可以实现集群系统的扩展性。本实施例通过主节点间采 用对等型架构, 可以在一个主节点失效后,及时获知主节点失效并重选出新的 主节点, 提高可用性。  In this embodiment, the scalability of the cluster system can be achieved by partitioning. In this embodiment, a peer-to-peer architecture is adopted between the master nodes, and after a master node fails, the master node is known to be invalid and the new master node is reselected in time to improve the availability.
FIG. 6 is a schematic flowchart of the method according to the third embodiment of the present invention, and FIG. 7 is a schematic structural diagram of the system according to the third embodiment. This embodiment takes the failure of a standby node as an example.
Referring to FIG. 6, this embodiment includes:
Step 601: When the cluster is working normally, the standby nodes of each partition send heartbeat information, through their heartbeat detection modules, to the heartbeat detection module of the common master node of their partition.
For example, the heartbeat detection module of standby node a1 sends heartbeat information to the heartbeat detection module of common master node A of its partition.
Step 602: If the heartbeat detection module of common master node A detects that the heartbeat of standby node a1 has stopped, it sends a heartbeat detection message to another standby node in its partition.
For example, if common master node A does not detect the heartbeat information of standby node a1 within a set time, it determines that the heartbeat of a1 has stopped and sends a heartbeat detection message to another standby node a2 in its partition. The heartbeat detection message carries the identification information of standby node a1.
Step 603: Standby node a2 checks the heartbeat of standby node a1.
For example, standby node a2 sends a ping message to standby node a1. If no response message is received from a1, the heartbeat of a1 is considered to have stopped.
Step 604: Standby node a2 sends a heartbeat detection result to common master node A, carrying the result of its heartbeat check on standby node a1.
Step 605: If the heartbeat detection result also indicates that the heartbeat of standby node a1 has stopped, common master node A multicasts a fault message carrying the identification information of standby node a1.
The other common master nodes and the management master node all receive the fault message.
Step 606: After receiving the fault message, the heartbeat detection module of the management master node sends a standby node fault message to the HA module of the management master node. The standby node fault message carries the identification information of the failed standby node a1.
Step 607: The HA module of the management master node selects another standby node, within the partition of standby node a1, as the standby node to which the virtual machines are migrated.
The other standby node may also be selected according to priority, load, and the like.
Step 608: The HA module of the management master node sends a migrate-virtual-machine request to the resource management module of the management master node, carrying the identification information of the new standby node and of the failed standby node.
For example, if the reselected standby node is a2, the migrate-virtual-machine request carries the identification information of a1 and of a2.
Step 609: The resource management module of the management master node migrates the virtual machines on standby node a1 to standby node a2.
For example, the configuration information of the virtual machines on standby node a1 is sent to standby node a2, and a2 is instructed to re-run the configuration information to restart the corresponding virtual machines. The configuration information of a virtual machine is information from which the virtual machine can be started, for example virtual machine software that starts the virtual machine when executed.
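Steps 601 to 609 can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the patented implementation: the class names `StandbyNode` and `CommonMaster`, the function `migrate_vms`, the 3-second timeout, and the "fewest virtual machines" load policy are all hypothetical, and message passing between modules is reduced to direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StandbyNode:
    node_id: str
    alive: bool = True                                   # simulated reachability
    vm_configs: List[str] = field(default_factory=list)  # VM configuration info

    def ping(self, other: "StandbyNode") -> bool:
        """Step 603: True if `other` answers a ping (simulated)."""
        return other.alive


@dataclass
class CommonMaster:
    """Common master node of one partition (hypothetical model)."""
    standbys: Dict[str, StandbyNode]
    timeout: float = 3.0                                 # assumed heartbeat timeout, seconds
    last_seen: Dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, node_id: str, now: float) -> None:
        """Step 601: record a heartbeat received from a standby node."""
        self.last_seen[node_id] = now

    def confirm_failures(self, now: float) -> List[str]:
        """Steps 602-605: ids of standby nodes whose failure an arbiter confirms."""
        failed = []
        for node_id, node in self.standbys.items():
            if now - self.last_seen.get(node_id, now) <= self.timeout:
                continue                                 # heartbeat still fresh
            # Ask another standby node in the same partition to check the suspect.
            # Filtering on `alive` is a simulation shortcut for "an arbiter that responds".
            arbiters = [n for i, n in self.standbys.items()
                        if i != node_id and n.alive]
            if arbiters and not arbiters[0].ping(node):
                failed.append(node_id)                   # both checks agree: report the fault
        return failed


def migrate_vms(failed_id: str, standbys: Dict[str, StandbyNode]) -> Optional[str]:
    """Steps 607-609: pick another standby node and move the VM configurations to it."""
    candidates = [n for i, n in standbys.items() if i != failed_id and n.alive]
    if not candidates:
        return None                                      # no standby node left in the partition
    # Assumed policy: the least-loaded node is the one running the fewest VMs.
    target = min(candidates, key=lambda n: len(n.vm_configs))
    target.vm_configs.extend(standbys[failed_id].vm_configs)
    standbys[failed_id].vm_configs = []
    return target.node_id
```

The second check by a peer standby node is what keeps a lost heartbeat alone from triggering a migration: in the sketch, a node that is silent but still answers the arbiter's ping is not reported as failed.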
Further, the failed standby node may perform the following actions:
Step 610: After standby node a1 finds that its own heartbeat information has been lost, it pings the gateway, that is, it sends a ping message to its own gateway.
Step 611: If the ping fails, that is, no response message to the ping message is received, the node powers off.
With reference to the above process, the corresponding modules may be as follows:
Referring to FIG. 7, this embodiment involves a common master node 71, a management master node 72, and a standby node 73. Further, for the common master node, the determining unit and the processing unit are the same module, specifically the third heartbeat detection module 711. For the management master node, the determining unit is specifically the fourth heartbeat detection module 721, and the processing unit specifically includes the second high-availability module 722 and the second resource management module (ResourceMgmt) 723. For the standby node, the determining unit and the processing unit are the same module, specifically the fifth heartbeat detection module 731.
The third heartbeat detection module 711 is configured to determine, after detecting that the heartbeat of any standby node in the partition of the common master node has stopped, that a failed standby node exists, and to determine that the standby node whose heartbeat has stopped is the failed standby node;
The fourth heartbeat detection module 721 is configured to determine, after receiving a second fault message, that a failed standby node exists. The second fault message is sent by the common master node after it determines that a failed standby node exists, and carries the identification information of the failed standby node;
The second high-availability module 722 is configured to receive a standby node fault message carrying the identification information of the failed standby node, reselect a new standby node in the partition of the failed standby node, and send a second migrate-virtual-machine request carrying the identification information of the new standby node and of the failed standby node. The standby node fault message is sent after the second fault message is received;
The second resource management module 723 is configured to send, according to the second migrate-virtual-machine request, the identification information of the virtual machines on the failed standby node to the new standby node and to restart those virtual machines.
The fifth heartbeat detection module 731 is configured to send heartbeat information while the standby node has not failed and to send none when it has failed, so that the common master node of the standby node's partition determines from the heartbeat information whether the standby node has failed, and to power the node off when the node itself is the failed standby node; or, when the node itself is not the failed standby node and receives a detection request, to check whether the indicated standby node has failed and notify the common master node of the result, so that the common master node carries out the processing of activating a new standby node. The detection request is sent by the common master node after it has received no heartbeat information from a standby node within a certain time, and carries the identification information of the standby node whose heartbeat has stopped.
This embodiment achieves scalability of the cluster system through partitioning. Because the standby nodes and the master node use a star architecture, after a standby node fails the master node can promptly learn of the failure and migrate the virtual machines on the failed standby node, which improves availability.
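The self-check that a failed standby node performs in steps 610 and 611 can be sketched as follows. The function name `self_fence` and its callback parameters are hypothetical and exist only for illustration. The text states what happens when the gateway ping fails; the behaviour when the ping succeeds (the node simply stays up) is an assumption of this sketch.

```python
from typing import Callable


def self_fence(heartbeat_lost: bool,
               ping_gateway: Callable[[], bool],
               power_off: Callable[[], None]) -> bool:
    """Return True if the node decided to power itself off."""
    if not heartbeat_lost:
        return False          # nothing to do while heartbeats are being delivered
    if ping_gateway():
        return False          # assumption: gateway reachable, so the node stays up
    power_off()               # step 611: gateway unreachable, power off
    return True
```

Passing the ping and power-off actions in as callables keeps the decision logic separate from platform-specific commands and makes it easy to exercise in isolation.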
FIG. 8 is a schematic flowchart of the method according to the fourth embodiment of the present invention, and FIG. 9 is a schematic structural diagram of the system according to the fourth embodiment. This embodiment takes a virtual machine fault as an example.
Referring to FIG. 8, this embodiment includes:
Step 81: When the cluster is working normally, the virtual machine agent module on each node sends heartbeat information to the heartbeat detection module of the node on which it runs.
For example, the virtual machine agent module of a standby node sends heartbeat information to the heartbeat detection module of that standby node.
Step 82: If the heartbeat detection module of the standby node detects that the heartbeat of a virtual machine has stopped, it sends a fault message to the common master node of its partition.
For example, if the heartbeat detection module on the standby node does not receive, within a certain time, the heartbeat information sent by the virtual machine agent module on that node, it determines that the heartbeat of the corresponding virtual machine has stopped.
Step 83: After receiving the fault message, the common master node multicasts a fault message carrying the identification information of the faulty virtual machine.
The above takes a virtual machine fault on a standby node as an example. When a virtual machine on a master node fails, the heartbeat detection module on the master node, having received no heartbeat information from the virtual machine agent module within a certain time, determines that the virtual machine on the master node has failed and multicasts a fault message.
The fault message can be received by the other common master nodes and by the management master node.
Step 84: After receiving the fault message, the heartbeat detection module of the management master node sends a virtual machine fault message to the HA module of the management master node. The virtual machine fault message carries the identification information of the faulty virtual machine.
Step 85: The HA module of the management master node sends a restart-virtual-machine request to the resource management module of the management master node. The request carries the identification information of the faulty virtual machine.
Step 86: The resource management module of the management master node restarts the virtual machine.
For example, the configuration information of the faulty virtual machine is sent again to the node on which the virtual machine is located, and that node is instructed to re-run the configuration information to restart the virtual machine. Alternatively, the management master node reselects a node as the target node according to priority, load, and the like, sends the configuration information of the faulty virtual machine to the target node, and instructs the target node to re-run the configuration information to restart the virtual machine. Specifically, the resource management module of the target node may re-run the configuration information.
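The two restart options described for step 86 can be sketched as below. The `Node` class, the `restart_vm` function, and the numeric `load` field are hypothetical names introduced for illustration, and "lowest load wins" is only one possible reading of "priority, load, and the like".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: str
    load: int = 0                                        # assumed load metric
    running: Dict[str, dict] = field(default_factory=dict)

    def run_config(self, vm_id: str, config: dict) -> None:
        """Re-run the configuration information to (re)start the VM."""
        self.running[vm_id] = config


def restart_vm(vm_id: str, config: dict, host: Node,
               candidates: Optional[List[Node]] = None) -> Node:
    """Restart on the original host, or on a reselected target node if candidates are given."""
    if candidates:
        target = min(candidates, key=lambda n: n.load)   # assumed selection policy
    else:
        target = host
    if target is not host:
        host.running.pop(vm_id, None)                    # the VM no longer belongs to the old host
    target.run_config(vm_id, config)
    return target
```

Calling `restart_vm` without `candidates` models the first option (restart in place); passing a list of nodes models the second option, in which the management master node reselects a target.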
With reference to the above process, the corresponding modules may be as follows:
Referring to FIG. 9, this embodiment involves a common master node 91, a management master node 92, and a standby node 93. Further, for the common master node, the determining unit is specifically the sixth heartbeat detection module 911, and the processing unit is specifically the fourth resource management module 912. For the management master node, the determining unit is specifically the seventh heartbeat detection module 921, and the processing unit specifically includes the third high-availability module 922 and the third resource management module 923. For the standby node, the determining unit specifically includes the virtual machine agent module 931 and the eighth heartbeat detection module 932, and the processing unit is specifically the fifth resource management module 933.
The sixth heartbeat detection module 911 is configured to determine that a faulty virtual machine exists after receiving a virtual machine fault message sent by any standby node in the partition of the common master node, or after detecting that the heartbeat of one of its own virtual machines has stopped, and to determine that the virtual machine indicated by the virtual machine fault message, or the virtual machine whose heartbeat has stopped, is the faulty virtual machine;
The fourth resource management module 912 is configured to receive, when one of the node's own virtual machines fails, the configuration information of the faulty virtual machine sent by the management master node, and to re-run the configuration information to restart the faulty virtual machine. The configuration information is sent by the management master node after it receives a third fault message; the third fault message is sent by the common master node after it determines that a faulty virtual machine exists, and carries the identification information of the faulty virtual machine.
The seventh heartbeat detection module 921 is configured to determine, after receiving a third fault message, that a faulty virtual machine exists. The third fault message carries the identification information of the faulty virtual machine;
The third high-availability module 922 is configured to receive a virtual machine fault message and send a restart-virtual-machine request. The virtual machine fault message is sent after the third fault message is received, and both the virtual machine fault message and the restart-virtual-machine request carry the identification information of the faulty virtual machine;
The third resource management module 923 is configured to send the configuration information of the faulty virtual machine to the node on which the faulty virtual machine is located, and to instruct that node to re-run the configuration information to restart the faulty virtual machine.
The virtual machine agent module 931 is configured to send heartbeat information while the corresponding virtual machine is normal and to send none when it is faulty;
The eighth heartbeat detection module 932 is configured to determine, according to whether the heartbeat information is being sent, that a faulty virtual machine exists after detecting that the heartbeat of a virtual machine on the standby node has stopped, and to determine that the virtual machine whose heartbeat has stopped is the faulty virtual machine;
The fifth resource management module 933 is configured to receive the configuration information of the faulty virtual machine sent by the management master node, and to re-run the configuration information to restart the faulty virtual machine. The configuration information is sent by the management master node after it receives a third fault message. The third fault message is sent by the common master node after it receives a virtual machine fault message, and carries the identification information of the faulty virtual machine. The virtual machine fault message is sent by the standby node after it detects that the heartbeat of a virtual machine on the standby node has stopped, and carries the identification information of the faulty virtual machine.
This embodiment achieves scalability of the cluster system through partitioning. Because the master nodes use a peer-to-peer architecture and the standby nodes and the master node use a star architecture, a virtual machine fault is learned promptly and the virtual machine is restarted, which improves availability.
In summary, the embodiments of the present invention offer the following. By setting up partitions, the cluster can be scaled out by adding partitions. By managing multiple master nodes as peers, the HA bottleneck is eliminated. By synchronizing resource status information between master nodes, but not resource usage information, the communication overhead of fault monitoring and the overhead of state synchronization are kept small. When the heartbeat of a standby node stops, the master node of that partition selects another standby node in the partition to arbitrate, which reduces misjudgments and improves availability. The peer-to-peer architecture between master nodes further enhances master node reliability compared with a star architecture. By making effective use of standby nodes to host migrated virtual machines, resource waste and management overhead are reduced.
It can be understood that related features in the above methods and devices may be cross-referenced. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not indicate that any embodiment is better or worse than another.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A processing method for a virtualized cluster system, comprising:
determining, by a node, whether at least one of the following has occurred: a failed common master node exists, a failed standby node exists, or a faulty virtual machine exists;
after determining that a failed common master node exists, activating, by the node, a new common master node; after determining that a failed standby node exists, activating a new standby node; or, after determining that a faulty virtual machine exists, restarting the virtual machine;
wherein the common master nodes and standby nodes are divided into at least two partitions, each partition containing one master node and at least one standby node; at least one virtual machine is set on each master node and on each standby node; the master nodes in different partitions use a peer-to-peer architecture; the master node and the standby nodes within each partition use a star architecture; and the master nodes comprise one management master node and at least one common master node.
2. The method according to claim 1, wherein, if the node is a common master node,
determining that a failed common master node exists comprises: after detecting that the heartbeat of any other common master node has stopped, determining that a failed common master node exists, and determining that the common master node whose heartbeat has stopped is the failed common master node;
activating a new common master node after determining that a failed common master node exists comprises:
receiving a first membership request message carrying the identification information of the new common master node and of the failed common master node, adding the identification information of the new common master node to a first membership list, and deleting the identification information of the failed common master node from the first membership list;
wherein the new common master node is reselected by the management master node, after it receives a first fault message, from the standby nodes in the partition of the failed common master node, and the first fault message is sent by the common master node after it determines that a failed common master node exists and carries the identification information of the failed common master node.
3. The method according to claim 1 or 2, wherein, when the node is a common master node,
determining that a failed standby node exists comprises: after detecting that the heartbeat of any standby node in the partition of the common master node has stopped, determining that a failed standby node exists, and determining that the standby node whose heartbeat has stopped is the failed standby node;
activating a new standby node after determining that a failed standby node exists comprises:
receiving a second membership request message carrying the identification information of the new standby node and of the failed standby node, adding the identification information of the new standby node to a second membership list, and deleting the identification information of the failed standby node from the second membership list;
wherein the new standby node is reselected by the management master node, after it receives a second fault message, from the standby nodes in the partition of the failed standby node, and the second fault message is sent by the common master node after it determines that a failed standby node exists and carries the identification information of the failed standby node.
4. The method according to claim 1 or 2, wherein, when the node is a common master node,
determining that a faulty virtual machine exists comprises: after receiving a virtual machine fault message sent by any standby node in the partition of the common master node, or after detecting that the heartbeat of one of its own virtual machines has stopped, determining that a faulty virtual machine exists, and determining that the virtual machine indicated by the virtual machine fault message, or the virtual machine whose heartbeat has stopped, is the faulty virtual machine;
restarting the virtual machine after determining that a faulty virtual machine exists comprises:
when one of its own virtual machines fails, receiving the configuration information of the faulty virtual machine sent by the management master node, and re-running the configuration information to restart the faulty virtual machine, wherein the configuration information of the faulty virtual machine is sent by the management master node after it receives a third fault message, and the third fault message is sent by the common master node after it determines that a faulty virtual machine exists and carries the identification information of the faulty virtual machine.
5. The method according to claim 1, wherein, when the node is the management master node,
determining that a failed common master node exists comprises: after receiving a first fault message, determining that a failed common master node exists, wherein the first fault message is sent by a common master node after it determines that a failed common master node exists and carries the identification information of the failed common master node;
activating a new common master node after determining that a failed common master node exists comprises:
receiving a master node fault message carrying the identification information of the failed common master node, reselecting a new common master node from the standby nodes in the partition of the failed common master node, and sending a first migrate-virtual-machine request carrying the identification information of the new common master node and of the failed common master node, wherein the master node fault message is sent after the first fault message is received;
sending, according to the first migrate-virtual-machine request, the identification information of the virtual machines on the failed common master node to the new common master node and restarting the virtual machines.
6. The method according to claim 1 or 5, wherein, when the node is the management master node,
determining that a failed standby node exists comprises: after receiving a second fault message, determining that a failed standby node exists, wherein the second fault message is sent by a common master node after it determines that a failed standby node exists and carries the identification information of the failed standby node;
activating a new standby node after determining that a failed standby node exists comprises:
receiving a standby node fault message carrying the identification information of the failed standby node, reselecting a new standby node in the partition of the failed standby node, and sending a second migrate-virtual-machine request carrying the identification information of the new standby node and of the failed standby node, wherein the standby node fault message is sent after the second fault message is received;
sending, according to the second migrate-virtual-machine request, the identification information of the virtual machines on the failed standby node to the new standby node and restarting the virtual machines.
7. The method according to claim 1 or 5, wherein, when the node is the management master node,
determining that a faulty virtual machine exists comprises: determining, after receiving a third fault message, that a faulty virtual machine exists, wherein the third fault message carries identification information of the faulty virtual machine;
restarting the virtual machine after determining that a faulty virtual machine exists comprises:
receiving a virtual machine fault message and sending a restart virtual machine request, wherein the virtual machine fault message is sent after the third fault message is received, and the virtual machine fault message and the restart virtual machine request carry the identification information of the faulty virtual machine; and
sending the configuration information of the virtual machine corresponding to the faulty virtual machine to the node where the faulty virtual machine is located, and instructing that node to re-run the configuration information so as to restart the faulty virtual machine.
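The restart step of claim 7 can be sketched as follows. This is an illustration under stated assumptions: `VmRecord`, `FakeNode` and `handle_vm_fault` are hypothetical names, the inventory is a plain dictionary, and the node merely records the restart command instead of talking to a hypervisor.

```python
from dataclasses import dataclass, field


@dataclass
class VmRecord:
    """What the management master is assumed to store per virtual machine."""
    host_node_id: str
    config: dict


@dataclass
class FakeNode:
    """Stand-in for a cluster node; it only records restart commands."""
    restarted: list = field(default_factory=list)

    def restart_vm(self, vm_id: str, config: dict) -> None:
        self.restarted.append((vm_id, config))


def handle_vm_fault(vm_id: str, inventory: dict[str, VmRecord], nodes: dict[str, FakeNode]) -> str:
    """Send the stored configuration to the node hosting the faulty VM and
    instruct it to re-run that configuration. Returns the host node id."""
    record = inventory[vm_id]
    nodes[record.host_node_id].restart_vm(vm_id, record.config)
    return record.host_node_id


inventory = {"vm7": VmRecord("s1", {"cpus": 2, "memory_mb": 4096})}
nodes = {"s1": FakeNode(), "s2": FakeNode()}
host = handle_vm_fault("vm7", inventory, nodes)
```

The virtual machine is restarted in place on the node that already hosts it, which is the difference from the migration path used when a whole node fails.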
8. The method according to claim 1, wherein, when the node is a standby node, determining that a failed standby node exists and activating a new standby node after determining that a failed standby node exists comprise:
sending heartbeat information while the standby node has not failed, and not sending heartbeat information upon failure, so that the ordinary master node of the partition where the standby node is located determines, according to the heartbeat information, whether the standby node has failed; and performing power-off processing when the node itself is the failed standby node, or, when the node itself is not the failed standby node and a detection request has been received, detecting whether the corresponding standby node is a failed standby node and notifying the ordinary master node of the detection result, so that the ordinary master node performs the standby node activation processing, wherein the detection request is sent by the ordinary master node after it has not received heartbeat information from any one of the standby nodes within a certain period of time, and the detection request carries identification information of the standby node whose heartbeat has stopped.
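The heartbeat timeout described in claim 8 can be sketched from the ordinary master node's side. The class `HeartbeatMonitor`, the 3-second timeout, and the dictionary layout of the detection request are assumptions for illustration; the claim only says "a certain period of time". Timestamps are passed in explicitly so that the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time of each standby node in one partition."""
    timeout_s: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def record(self, node_id: str, now: float) -> None:
        """Called whenever a heartbeat arrives from node_id."""
        self.last_seen[node_id] = now

    def silent_nodes(self, now: float) -> list[str]:
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)

    def detection_requests(self, now: float) -> list[dict]:
        """One request per silent node, carrying the suspect's identifier so
        that the remaining standby nodes can check it."""
        return [{"type": "detection_request", "suspect_id": n} for n in self.silent_nodes(now)]


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record("s1", now=0.0)
monitor.record("s2", now=0.0)
monitor.record("s1", now=4.0)  # s1 keeps sending, s2 has gone quiet
requests = monitor.detection_requests(now=5.0)
```

Asking the other standby nodes to confirm the failure before acting reduces the chance that a lost heartbeat on a single link is mistaken for a node failure.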
9. The method according to claim 1 or 8, wherein, when the node is a standby node,
determining that a faulty virtual machine exists comprises:
sending heartbeat information while the corresponding virtual machine is normal, and not sending heartbeat information upon a fault; and
determining, according to the sending of the heartbeat information, that a faulty virtual machine exists after detecting that the heartbeat of a virtual machine on the standby node has stopped, and determining that the virtual machine whose heartbeat has stopped is the faulty virtual machine;
restarting the virtual machine after determining that a faulty virtual machine exists comprises:
receiving configuration information of the faulty virtual machine sent by the management master node, and re-running the configuration information to restart the faulty virtual machine, wherein the configuration information of the faulty virtual machine is sent by the management master node after it receives a third fault message, the third fault message is sent by the ordinary master node after it receives a virtual machine fault message and carries identification information of the faulty virtual machine, and the virtual machine fault message is sent by the standby node after it detects that the heartbeat of a virtual machine on the standby node has stopped and carries the identification information of the faulty virtual machine.
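The three-hop message chain in claim 9 (standby node to ordinary master node, ordinary master node to management master node, management master node back to the standby node) can be traced with a short sketch. The function `relay_vm_fault` and the message type strings are hypothetical; no wire format is given in the claims.

```python
def relay_vm_fault(vm_id: str, configs: dict[str, dict]) -> tuple[list[tuple[str, str, str]], dict]:
    """Return the (sender, receiver, message type) trace and the final
    configuration message that reaches the standby node."""
    trace = []

    # Hop 1: the standby node detects the stopped VM heartbeat and reports it.
    vm_fault_msg = {"type": "vm_fault", "vm_id": vm_id}
    trace.append(("standby", "ordinary_master", vm_fault_msg["type"]))

    # Hop 2: the ordinary master forwards the VM id in a third fault message.
    third_fault_msg = {"type": "third_fault", "vm_id": vm_fault_msg["vm_id"]}
    trace.append(("ordinary_master", "management_master", third_fault_msg["type"]))

    # Hop 3: the management master looks up the configuration and sends it
    # to the standby node, which re-runs it to restart the VM.
    config_msg = {
        "type": "vm_config",
        "vm_id": third_fault_msg["vm_id"],
        "config": configs[third_fault_msg["vm_id"]],
    }
    trace.append(("management_master", "standby", config_msg["type"]))
    return trace, config_msg


trace, config_msg = relay_vm_fault("vm3", {"vm3": {"cpus": 1}})
```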
10. A processing device for a virtualized cluster system, comprising:
a determining unit, configured to determine whether at least one of the following occurs: a failed ordinary master node exists, a failed standby node exists, or a faulty virtual machine exists; and
a processing unit, configured to activate a new ordinary master node after it is determined that a failed ordinary master node exists; to activate a new standby node after it is determined that a failed standby node exists; or to restart the virtual machine after it is determined that a faulty virtual machine exists;
wherein the ordinary master nodes and the standby nodes are divided into at least two partitions, each partition contains one master node and at least one standby node; at least one virtual machine is set on each master node and on each standby node; a peer-to-peer architecture is used between the master nodes of different partitions; a star architecture is used between the master node and the standby nodes within each partition; and the master nodes comprise one management master node and at least one ordinary master node.
11. The device according to claim 10, wherein, when the device is an ordinary master node,
the determining unit comprises:
a first heartbeat detection module, configured to determine, after detecting that the heartbeat of any other ordinary master node has stopped, that a failed ordinary master node exists, and to determine that the ordinary master node whose heartbeat has stopped is the failed ordinary master node;
the processing unit comprises:
a first membership management module, configured to receive a first membership request message that carries identification information of the new ordinary master node and identification information of the failed ordinary master node, to add the identification information of the new ordinary master node to a first membership list, and to delete the identification information of the failed ordinary master node from the first membership list;
wherein the new ordinary master node is reselected by the management master node, after it receives a first fault message, from the standby nodes in the partition where the failed ordinary master node is located, the first fault message is sent by the ordinary master node after it determines that a failed ordinary master node exists, and the first fault message carries the identification information of the failed ordinary master node.
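The membership list update performed by the first membership management module of claim 11 amounts to one removal and one addition. The sketch below assumes the list is an ordered list of node identifiers; the function name `apply_membership_request` is hypothetical.

```python
def apply_membership_request(members: list[str], new_id: str, failed_id: str) -> list[str]:
    """Return a new membership list with the failed master removed and the
    newly selected master added. The input list is left unchanged."""
    updated = [m for m in members if m != failed_id]
    if new_id not in updated:  # guard against processing the request twice
        updated.append(new_id)
    return updated


masters = ["m1", "m2", "m3"]
masters_after = apply_membership_request(masters, new_id="s3", failed_id="m2")
```

Making the update idempotent is a design choice of this sketch, useful if the same membership request reaches an ordinary master node more than once.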
12. The device according to claim 10, wherein, when the device is the management master node,
the determining unit comprises:
a second heartbeat detection module, configured to determine, after receiving a first fault message, that a failed ordinary master node exists, wherein the first fault message is sent by an ordinary master node after it determines that a failed ordinary master node exists, and the first fault message carries identification information of the failed ordinary master node;
the processing unit comprises:
a first high-availability module, configured to receive a master node fault message that carries the identification information of the failed ordinary master node, to reselect a new ordinary master node from the standby nodes in the partition where the failed ordinary master node is located, and to send a first virtual machine migration request that carries identification information of the new ordinary master node and the identification information of the failed ordinary master node, wherein the master node fault message is sent after the first fault message is received; and
a first resource management module, configured to send, according to the first virtual machine migration request, configuration information of the virtual machines on the failed ordinary master node to the new ordinary master node, and to restart the virtual machines.
13. The device according to claim 10 or 11, wherein, when the device is an ordinary master node,
the determining unit and the processing unit are located in a third heartbeat detection module, and the third heartbeat detection module is configured to determine, after detecting that the heartbeat of any standby node in the partition where the ordinary master node is located has stopped, that a failed standby node exists, and to determine that the standby node whose heartbeat has stopped is the failed standby node;
wherein the new standby node is reselected by the management master node, after it receives a second fault message, from the standby nodes in the partition where the failed standby node is located, the second fault message is sent by the ordinary master node after it determines that a failed standby node exists, and the second fault message carries identification information of the failed standby node.
14. The device according to claim 10 or 12, wherein, when the device is the management master node,
the determining unit comprises:
a fourth heartbeat detection module, configured to determine, after receiving a second fault message, that a failed standby node exists, wherein the second fault identification information is sent by an ordinary master node after it determines that a failed standby node exists, and the second fault identification information carries identification information of the failed standby node;
the processing unit comprises:
a second high-availability module, configured to receive a standby node fault message that carries the identification information of the failed standby node, to reselect a new standby node in the partition where the failed standby node is located, and to send a second virtual machine migration request that carries identification information of the new standby node and the identification information of the failed standby node, wherein the standby node fault message is sent after the second fault message is received; and
a second resource management module, configured to send, according to the second virtual machine migration request, identification information of the virtual machines on the failed standby node to the new standby node, and to restart the virtual machines.
15. The device according to claim 10, wherein, when the device is a standby node,
the determining unit and the processing unit form a fifth heartbeat detection module, and the fifth heartbeat detection module is configured to send heartbeat information while the standby node has not failed and not to send heartbeat information upon failure, so that the ordinary master node of the partition where the standby node is located determines, according to the heartbeat information, whether the standby node has failed; and to perform power-off processing when the node itself is the failed standby node, or, when the node itself is not the failed standby node and a detection request has been received, to detect whether the corresponding standby node is a failed standby node and notify the ordinary master node of the detection result, so that the ordinary master node performs the standby node activation processing, wherein the detection request is sent by the ordinary master node after it has not received heartbeat information from any one of the standby nodes within a certain period of time, and the detection request carries identification information of the standby node whose heartbeat has stopped.
16. The device according to claim 10 or 11, wherein, when the device is an ordinary master node,
the determining unit comprises:
a sixth heartbeat detection module, configured to determine that a faulty virtual machine exists after receiving a virtual machine fault message sent by any standby node in the partition where the ordinary master node is located, or after detecting that the heartbeat of one of its own virtual machines has stopped, and to determine the virtual machine indicated by the virtual machine fault message, or the virtual machine whose heartbeat has stopped, as the faulty virtual machine;
the processing unit comprises:
a fourth resource management module, configured to receive, when one of its own virtual machines is faulty, configuration information of the faulty virtual machine sent by the management master node, and to re-run the configuration information to restart the faulty virtual machine, wherein the configuration information of the faulty virtual machine is sent by the management master node after it receives a third fault message, the third fault message is sent by the ordinary master node after it determines that a failed virtual machine exists, and the third fault message carries identification information of the faulty virtual machine.
17. The device according to claim 10 or 12, wherein, when the device is the management master node,
the determining unit comprises:
a seventh heartbeat detection module, configured to determine, after receiving a third fault message, that a faulty virtual machine exists, wherein the third fault message carries identification information of the faulty virtual machine;
the processing unit comprises:
a third high-availability module, configured to receive a virtual machine fault message and send a restart virtual machine request, wherein the virtual machine fault message is sent after the third fault message is received, and the virtual machine fault message and the restart virtual machine request carry the identification information of the faulty virtual machine; and
a third resource management module, configured to send the configuration information of the virtual machine corresponding to the faulty virtual machine to the node where the faulty virtual machine is located, and to instruct that node to re-run the configuration information so as to restart the faulty virtual machine.
18. The device according to claim 10 or 15, wherein, when the device is a standby node,
the determining unit comprises:
a virtual machine agent module, configured to send heartbeat information while the corresponding virtual machine is normal, and not to send heartbeat information upon a fault; and
an eighth heartbeat detection module, configured to determine, according to the sending of the heartbeat information, that a faulty virtual machine exists after detecting that the heartbeat of a virtual machine on the standby node has stopped, and to determine that the virtual machine whose heartbeat has stopped is the faulty virtual machine;
the processing unit comprises:
a fifth resource management module, configured to receive configuration information of the faulty virtual machine sent by the management master node, and to re-run the configuration information to restart the faulty virtual machine, wherein the configuration information of the faulty virtual machine is sent by the management master node after it receives a third fault message, the third fault message is sent by the ordinary master node after it receives a virtual machine fault message and carries identification information of the faulty virtual machine, and the virtual machine fault message is sent by the standby node after it detects that the heartbeat of a virtual machine on the standby node has stopped and carries the identification information of the faulty virtual machine.
19. A virtualized cluster system, comprising:
at least two partitions, each partition containing one master node and at least one standby node, with at least one virtual machine set on each master node and on each standby node;
wherein a peer-to-peer architecture is used between the master nodes of different partitions;
a star architecture is used between the master node and the standby nodes within each partition; and
the master nodes comprise one management master node and at least one ordinary master node, and the management master node is configured to reselect, after an ordinary master node or a standby node fails, a new ordinary master node or standby node in the partition where the failed ordinary master node or standby node is located, or to restart a virtual machine when the virtual machine on an ordinary master node or a standby node is faulty.
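The two-level topology of claim 19 can be made concrete by enumerating the monitoring links it implies. The sketch assumes that "peer-to-peer" means every pair of master nodes is linked and that "star" means each standby node is linked only to the master node of its own partition; the function name `heartbeat_links` and the node identifiers are hypothetical.

```python
from itertools import combinations


def heartbeat_links(partitions: dict[str, list[str]]) -> set[frozenset[str]]:
    """partitions maps each master node id to the standby node ids of its partition.

    Returns the undirected links: a full mesh among masters plus one star
    per partition.
    """
    # Peer-to-peer layer: every pair of masters from different partitions.
    links = {frozenset(pair) for pair in combinations(partitions, 2)}
    # Star layer: each standby node connects only to its own master.
    for master, standbys in partitions.items():
        links.update(frozenset((master, s)) for s in standbys)
    return links


cluster = {"m1": ["s1", "s2"], "m2": ["s3"], "m3": ["s4"]}
links = heartbeat_links(cluster)
```

With three partitions and four standby nodes this gives 3 mesh links plus 4 star links, whereas a flat full mesh over the same seven nodes would need 21, which illustrates why partitioning keeps the monitoring traffic small as the cluster grows.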
20. The system according to claim 19, wherein
the ordinary master node is the device according to claim 11, and the management master node is the device according to claim 12;
or,
the ordinary master node is the device according to claim 13, the management master node is the device according to claim 14, and the standby node is the device according to claim 15;
or,
the ordinary master node is the device according to claim 16, the management master node is the device according to claim 17, and the standby node is the device according to claim 18.
PCT/CN2012/082196 2011-09-27 2012-09-27 Virtual cluster system, processing method and device thereof WO2013044828A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110301796.0A CN102355369B (en) 2011-09-27 2011-09-27 Virtual clustered system as well as processing method and processing device thereof
CN201110301796.0 2011-09-27

Publications (1)

Publication Number Publication Date
WO2013044828A1 true WO2013044828A1 (en) 2013-04-04

Family

ID=45578866

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/082196 WO2013044828A1 (en) 2011-09-27 2012-09-27 Virtual cluster system, processing method and device thereof

Country Status (2)

Country Link
CN (1) CN102355369B (en)
WO (1) WO2013044828A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355369B (en) * 2011-09-27 2014-01-08 华为技术有限公司 Virtual clustered system as well as processing method and processing device thereof
CN103294494B (en) * 2012-02-29 2018-07-03 中兴通讯股份有限公司 A kind of method and system of virtual system automatically dispose
CN102664763A (en) * 2012-03-20 2012-09-12 浪潮电子信息产业股份有限公司 Method for rapidly detecting connection states and making virtual machine HA
EP2928129B1 (en) 2012-12-18 2018-05-09 Huawei Technologies Co., Ltd. Method and network devices for determining an administrative domain in a virtual cluster
CN103607296B (en) * 2013-11-01 2017-08-22 新华三技术有限公司 A kind of virtual-machine fail processing method and equipment
CN103729234B (en) * 2013-12-20 2017-06-27 中电长城网际系统应用有限公司 A kind of cluster virtual machine management method and device
CN105591780B (en) * 2014-10-24 2019-01-29 新华三技术有限公司 Cluster monitoring method and equipment
EP3159794B1 (en) 2014-11-06 2019-01-16 Huawei Technologies Co. Ltd. Distributed storage replication system and method
CN106302569B (en) 2015-05-14 2019-06-18 华为技术有限公司 Handle the method and computer system of cluster virtual machine
CN106612314A (en) * 2015-10-26 2017-05-03 上海宝信软件股份有限公司 System for realizing software-defined storage based on virtual machine
CN105357038B (en) * 2015-10-26 2019-05-07 北京百度网讯科技有限公司 Monitor the method and system of cluster virtual machine
CN108108255A (en) * 2016-11-25 2018-06-01 中兴通讯股份有限公司 The detection of virtual-machine fail and restoration methods and device
CN106789350A (en) * 2017-01-23 2017-05-31 郑州云海信息技术有限公司 A kind of method and device of back-level server virtualization system host node High Availabitity
CN107315663B (en) * 2017-03-10 2020-06-09 秦皇岛市第一医院 Dual-machine cluster architecture
CN107018041B (en) * 2017-03-31 2019-05-17 杭州数梦工场科技有限公司 Data migration method and device in cluster
CN108111337B (en) * 2017-12-06 2021-04-06 北京天融信网络安全技术有限公司 Method and equipment for arbitrating main nodes in distributed system
WO2019178714A1 (en) * 2018-03-19 2019-09-26 华为技术有限公司 Fault detection method, apparatus, and system
CN110661599B (en) * 2018-06-28 2022-04-29 中兴通讯股份有限公司 HA implementation method, device and storage medium between main node and standby node
CN109361777B (en) * 2018-12-18 2021-08-10 广东浪潮大数据研究有限公司 Synchronization method, synchronization system and related device for distributed cluster node states
CN113742417A (en) * 2020-05-29 2021-12-03 同方威视技术股份有限公司 Multi-level distributed consensus method and system, electronic device and computer readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060155912A1 (en) * 2005-01-12 2006-07-13 Dell Products L.P. Server cluster having a virtual server
US20080189468A1 (en) * 2007-02-02 2008-08-07 Vmware, Inc. High Availability Virtual Machine Cluster
CN102110071A (en) * 2011-03-04 2011-06-29 浪潮(北京)电子信息产业有限公司 Virtual machine cluster system and implementation method thereof
CN102355369A (en) * 2011-09-27 2012-02-15 华为技术有限公司 Virtual clustered system as well as processing method and processing device thereof

Also Published As

Publication number Publication date
CN102355369B (en) 2014-01-08
CN102355369A (en) 2012-02-15

Similar Documents

Publication Publication Date Title
WO2013044828A1 (en) Virtual cluster system, processing method and device thereof
KR102059251B1 (en) Node system, server device, scaling control method and program
US9971660B2 (en) Virtual machine network loss detection and recovery for high availability
CN107526659B (en) Method and apparatus for failover
US7617411B2 (en) Cluster system and failover method for cluster system
WO2017000260A1 (en) Method and apparatus for switching vnf
US10243780B2 (en) Dynamic heartbeating mechanism
EP3338418B1 (en) Data center resource tracking
US9489230B1 (en) Handling of virtual machine migration while performing clustering operations
US10838752B2 (en) Network notification loss detection for virtual machine migration
US10541862B2 (en) VNF processing policy determining method, apparatus, and system
JPWO2015146355A1 (en) Update management system and update management method
JP6432955B2 (en) Method, apparatus and system for migrating virtual network function instances
US10728099B2 (en) Method for processing virtual machine cluster and computer system
JP2015103092A (en) Fault recovery system and method of constructing fault recovery system
WO2018137520A1 (en) Service recovery method and apparatus
US9100443B2 (en) Communication protocol for virtual input/output server (VIOS) cluster communication
WO2020001409A1 (en) Virtual network function (vnf) deployment method and apparatus
CN113709220B (en) High-availability implementation method and system of virtual load equalizer and electronic equipment
US10305987B2 (en) Method to syncrhonize VSAN node status in VSAN cluster
CN116095145B (en) Data control method and system of VPC cluster
US11418382B2 (en) Method of cooperative active-standby failover between logical routers based on health of attached services
JP6077945B2 (en) Network system and control method
US11748131B2 (en) Network updates for virtual machine migration

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 12836786; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 12836786; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1