US8812501B2 - Method or apparatus for selecting a cluster in a group of nodes - Google Patents

Publication number
US8812501B2
Authority
US
United States
Prior art keywords
nodes
token
node
cluster
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US11/491,362
Other versions
US20070033205A1 (en)
Inventor
Tanmay Kumar Pradhan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Valtrus Innovations Ltd
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. (assignment of assignors interest; see document for details). Assignor: PRADHAN, TANMAY KUMAR
Publication of US20070033205A1
Application granted
Publication of US8812501B2
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP (assignment of assignors interest; see document for details). Assignor: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Assigned to VALTRUS INNOVATIONS LIMITED (assignment of assignors interest; see document for details). Assignors: HEWLETT PACKARD ENTERPRISE COMPANY, HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Current legal status: Active
Adjusted expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061: Partitioning or combining of resources
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/505Clust

Abstract

A method and apparatus are disclosed for selecting a cluster in a group of nodes. A token is assigned to a first node of the group, and subgroups of nodes that are interconnected are identified. If the two largest subgroups comprise equal numbers of nodes, the subgroup containing the node to which the token is assigned is selected as the cluster.

Description

RELATED APPLICATION
The present application is based on, and claims priority from, India Application Number IN1097/CHE/2005, filed Aug. 8, 2005, the disclosure of which is hereby incorporated by reference herein in its entirety.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method or apparatus for selecting a cluster in a group of nodes, and more particularly to assigning, identifying, and selecting a subgroup in a group of nodes.
2. Description of the Prior Art
Computer systems which need to be highly reliable, both in terms of service availability and data integrity, are commonly implemented using a cluster architecture. A cluster is made up of a group of interconnected computers (nodes) running cluster software which enables the group to behave like a single computer. The nodes communicate with each other via a set of network connections referred to as a cluster interconnect. A cluster will generally have shared data storage devices connected to the nodes via a shared storage bus. The cluster software running on each node is arranged so that in the event of failure of any node in the cluster, the functions and services provided by the cluster are unaffected.
Failures can occur in the nodes themselves or in the cluster interconnect. In the event of a failure in the cluster interconnect, the cluster becomes split into subgroups of nodes, each unable to communicate with the other subgroups. In such circumstances, the cluster software is arranged to spontaneously reorganize the subgroups to form one or more new candidate clusters. The largest candidate cluster is self-selected to continue to provide the cluster functions and services. Each node knows the total number of nodes in the system, and this data is used by each candidate cluster to determine whether the number of nodes it contains makes it the largest cluster. However, if two candidate clusters are the same size then this method can result in more than one cluster considering itself to be the largest. In this case more than one cluster can access the cluster data set and compromise the integrity of that data.
In order to deal with this problem, some systems use a predetermined hardware element, such as a disk drive, as a tie breaker. This chosen hardware element is connected to the shared storage bus and thus connected to all nodes in the cluster. In the event of a failure in the cluster interconnect, the candidate which acquires access to the hardware first during the reorganization of nodes forms the cluster. In other words, given subgroups of the same size, the subgroup which is first in communication with the specified hardware is chosen to continue as the cluster. However, using a hardware element in this way can increase the overall hardware costs of the cluster system. Also, accessing the hardware element increases the network activity and processing complexity during the node reorganization process.
BRIEF SUMMARY OF THE INVENTION
There will be described a method of selecting, or constituting, a cluster in a group of nodes, the method comprising the steps of:
a) assigning a token to a first node in a group of nodes;
b) identifying subgroups of nodes that are interconnected; and
c) if the two largest subgroups comprise equal numbers of nodes then selecting as the cluster the subgroup containing the node to which the token is assigned.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
FIG. 1 is a schematic illustration of a computer system including a cluster server (cluster) according to an embodiment of the invention;
FIG. 2 is a diagram illustrating the passing of a token between nodes in the cluster of FIG. 1;
FIG. 3 is a flow chart illustrating processing carried out during the formation or reorganization of the cluster of FIG. 1; and
FIG. 4 is a flow chart illustrating processing carried out during the formation and operation of each node of the cluster of FIG. 1.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION
There will be described a method of selecting, or constituting, a cluster in a group of nodes, the method comprising the steps of:
a) assigning a token to a first node in a group of nodes;
b) identifying subgroups of nodes that are interconnected; and
c) if the two largest subgroups comprise equal numbers of nodes then selecting as the cluster the subgroup containing the node to which the token is assigned.
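Steps a) to c) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `select_cluster`, the representation of subgroups as sets of node identifiers, and the behaviour when one subgroup is strictly the largest (it is simply returned) are assumptions made for the example. The detailed description later requires a majority rather than mere largest size, which this sketch does not check.

```python
from typing import FrozenSet, Hashable, Iterable, Optional


def select_cluster(
    subgroups: Iterable[Iterable[Hashable]],
    token_holder: Hashable,
) -> Optional[FrozenSet[Hashable]]:
    """Pick the subgroup that continues as the cluster (hypothetical helper).

    subgroups    -- the interconnected subgroups found in step b)
    token_holder -- the node to which the token is currently assigned (step a)
    """
    groups = [frozenset(g) for g in subgroups]
    if not groups:
        return None
    largest_size = max(len(g) for g in groups)
    tied = [g for g in groups if len(g) == largest_size]
    if len(tied) == 1:
        # Assumption: a strictly largest subgroup is selected outright.
        return tied[0]
    # Step c): equal-sized largest subgroups, so the token breaks the tie.
    for g in tied:
        if token_holder in g:
            return g
    # The token holder is in none of the tied subgroups (e.g. it has crashed).
    return None
```

For the four-node cluster of FIG. 1 split into {103, 105} and {107, 109} with the token on node 107, the second subgroup would be selected.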
The token may be passed from one node to another. The token may be passed between nodes at predetermined time intervals or at random time intervals. The passing of the token can be suspended. The first node may be the first node assigned to the cluster during cluster configuration. If a node holding the token is required to shut down then prior to shut down the token may be passed to another node. If a node holding the token crashes then step c) may be suspended until the node reboots. The passing of the token between nodes may be carried out atomically so that the token is not lost and remains unique. The passing of the token between nodes is carried out using a three phase commit protocol. Each node may be assigned one or more votes, in which case, in step c), if two subgroups hold equal largest numbers of votes then the subgroup containing the node to which the token is assigned is selected as the cluster. The token may count for one or more votes. The numbers of votes or nodes held by two subgroups may be treated as being equal if the difference between them falls within a predetermined limit.
Also described will be an apparatus for selecting a cluster in a group of nodes, the apparatus comprising:
a) a token assigned to a node in a group of nodes;
b) communication means for identifying subgroups of nodes that are interconnected; and
c) selecting means operable, if the two largest subgroups comprise equal numbers of nodes, to select as the cluster the subgroup containing the node to which the token is assigned.
There will also be described a method of operating a node in a cluster, the method comprising the steps of:
a) determining the number of other nodes connected to the current node forming a connected group;
b) if the connected group of nodes comprises more than half of the total nodes in the cluster, then forming the cluster from the group; or
c) if the connected group comprises half of the total nodes then forming the cluster if the connected group includes a node to which a token is assigned.
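Seen from a single node, the rule in steps b) and c) reduces to a comparison against half of the total node count. The sketch below is illustrative only: the function name and parameters are hypothetical, and `group_size` is assumed to count the current node together with the other nodes connected to it.

```python
def should_form_cluster(group_size: int, total_nodes: int, group_has_token: bool) -> bool:
    """Decide whether the connected group forms the cluster (hypothetical helper).

    group_size      -- nodes in the connected group, including the current node
    total_nodes     -- total number of nodes configured in the cluster
    group_has_token -- True if a node in the connected group holds the token
    """
    if 2 * group_size > total_nodes:
        # Step b): more than half of the total nodes.
        return True
    if 2 * group_size == total_nodes:
        # Step c): exactly half, so the token decides.
        return group_has_token
    # Fewer than half: the group does not form the cluster.
    return False
```

Comparing `2 * group_size` with `total_nodes` avoids fractional halves, so an odd total can never satisfy the exactly-half branch and the token is then never consulted.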
Also described will be a node in a cluster comprising:
a) means for determining the number of other nodes connected to the current node forming a connected group;
b) means operable if the connected group of nodes comprises more than half of the total nodes in the cluster, to form the cluster from the group; or
c) means operable if the connected group comprises half of the total nodes to form the cluster if the connected group includes a node to which a token is assigned.
Further described will be a method of selecting a cluster in a group of nodes, the method comprising the steps of:
a) assigning a token at random to a node in a group of nodes;
b) assigning a vote to each node;
c) identifying subgroups of interconnected nodes; and
d) if the numbers of votes of the largest subgroups are equal then selecting as the cluster the subgroup containing the node to which the token is assigned, otherwise selecting the subgroup with the majority of votes.
Some embodiments are implemented as a computer program or group of computer programs arranged to enable a computer or group of computers to carry out a method of selecting a cluster in a group of nodes, the method comprising the steps of:
a) assigning a token to a first node in a group of nodes;
b) identifying subgroups of nodes that are interconnected; and
c) if the two largest subgroups comprise equal numbers of nodes then selecting as the cluster the subgroup containing the node to which the token is assigned.
Some embodiments are implemented as a computer program or group of computer programs arranged to enable a computer or group of computers to provide apparatus for selecting a cluster in a group of nodes, the apparatus comprising:
a) a token assigned to a node in a group of nodes;
b) communication means for identifying subgroups of nodes that are interconnected; and
c) selecting means operable, if the two largest subgroups comprise equal numbers of nodes, to select as the cluster the subgroup containing the node to which the token is assigned.
Some embodiments are implemented as a computer program or group of computer programs arranged to enable a computer or group of computers to carry out a method of operating a node in a cluster, the method comprising the steps of:
a) determining the number of other nodes connected to the current node forming a connected group;
b) if the connected group of nodes comprises more than half of the total nodes in the cluster, then forming the cluster from the group; or
c) if the connected group comprises half of the total nodes then forming the cluster if the connected group includes a node to which a token is assigned.
Some embodiments are implemented as a computer program or group of computer programs arranged to enable a computer or group of computers to provide a node in a cluster comprising:
a) means for determining the number of other nodes connected to the current node forming a connected group;
b) means operable if the connected group of nodes comprises more than half of the total nodes in the cluster, to form the cluster from the group; or
c) means operable if the connected group comprises half of the total nodes to form the cluster if the connected group includes a node to which a token is assigned.
Some embodiments are implemented as a computer program or group of computer programs arranged to enable a computer or group of computers to carry out a method of selecting a cluster in a group of nodes, the method comprising the steps of:
a) assigning a token at random to a node in a group of nodes;
b) assigning a vote to each node;
c) identifying subgroups of interconnected nodes; and
d) if the numbers of votes of the largest subgroups are equal then selecting as the cluster the subgroup containing the node to which the token is assigned, otherwise selecting the subgroup with the majority of votes.
FIG. 1 shows a computer system in the form of a cluster server 101 comprising four computers 103, 105, 107, 109 each running cluster server software and each constituting a node in a group of nodes that form the cluster server 101. The nodes are interconnected by a private network connection called a cluster interconnect 111. The cluster has three shared storage devices 113, 115, 117 which are accessed by another network connection in the form of a shared storage bus 119. A communications link 121 links each node to a wide area network (WAN) 123 in the form of the internet and enables communications between the cluster server 101 and a client computer 125. The client computer 125 is operable to access data and services provided by the cluster server 101 over the WAN 123.
In the event of a failure in the cluster interconnect 111, the cluster server software running on each node 103, 105, 107, 109 is arranged to spontaneously reorganize the subgroups of nodes which are interconnected. Only the subgroup comprising the majority of nodes will be designated to form the new cluster. If no subgroup comprises such a majority then no cluster will be designated until a subsequent reorganization results in a majority subgroup or until the nodes are reconfigured.
Each node knows that the cluster has a total of four nodes and therefore any subgroup with three nodes holds the majority of nodes and will form the new cluster. However, if the cluster splits into two subgroups of two nodes each, neither holds a majority and both are joint largest subgroups. In order to resolve this situation, where two equally large subgroups are created during the formation or reconfiguration of the cluster, a token in the form of a global variable is created when the cluster is first configured. The token is counted as one node when a group of nodes is determining whether it comprises the majority of nodes. If two subgroups comprise equal numbers of nodes, the subgroup including the node holding the token forms the cluster. In other words, each node can be treated as having one vote and, in the event of a tie in the number of votes between candidate clusters, the token provides a tie-breaker vote. In the case where there is an odd number of nodes in the system as a whole, the token is not required and is therefore not counted.
The token is arranged to move from node to node in the cluster at predetermined time intervals. When a node receives the token, it selects another node at random from the list of nodes connected to it. Once the predetermined time interval has elapsed, the node sends the token to the selected node. The token keeps moving among the connected nodes for the life of the cluster. The token is implemented in each node by a global variable called quorum_token. When quorum_token equals zero for a node, then that node does not hold the token. If quorum_token is equal to one for a node, this signifies that the node holds the token. At any point of time only one node has a nonzero quorum_token. The token is initialized when the first node of the cluster is created and starts its rotation from that first node. Moving the token from one node to another is carried out by setting quorum_token to zero on a token transmitting node and setting quorum_token to one on a receiving node. The token movement is an atomic transaction which uses a three phase commit protocol to set quorum_token on the transmitting and receiving nodes.
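The hand-over of quorum_token can be sketched as below. This is a heavily simplified, single-process illustration of the phase ordering only. A real three phase commit exchanges messages between nodes and handles timeouts and failures, none of which is modelled here. The `Node` class, the `pending` flag and the `transfer_token` function are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: int
    quorum_token: int = 0   # 1 if this node holds the token, otherwise 0
    pending: bool = False   # hypothetical marker for an in-flight transfer


def transfer_token(sender: Node, receiver: Node) -> bool:
    """Move the token from sender to receiver; return True on success."""
    # Phase 1 (can-commit): both sides check that the transfer is valid.
    if sender is receiver:
        return False
    if sender.quorum_token != 1 or receiver.quorum_token != 0:
        return False
    if sender.pending or receiver.pending:
        return False

    # Phase 2 (pre-commit): both sides record that a transfer is in flight.
    sender.pending = True
    receiver.pending = True

    # Phase 3 (do-commit): zero on the transmitting node, one on the receiver.
    sender.quorum_token = 0
    receiver.quorum_token = 1
    sender.pending = False
    receiver.pending = False
    return True
```

In this sketch a refused transfer leaves both nodes unchanged, which is the property the description relies on: the token is neither lost nor duplicated.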
An example of this movement is illustrated in FIG. 2. The first node in the cluster is node 103, where the token 201 is initialized. From node 103 the token 201 moves, at successive predetermined intervals, to randomly chosen node 109 and then to randomly chosen node 105. After a further interval the token 201 returns by chance to node 103, as shown in FIG. 2, and then moves to randomly chosen node 107. The token 201 is effectively an autonomous tie breaker with a randomly chosen location within a group of nodes.
The processing carried out by nodes during cluster formation or reconfiguration is performed in co-operation with the other nodes in the subgroup of which any given node forms a part. In other words, the nodes collectively determine whether or not they form the largest subgroup and thus whether they should form the new cluster. This processing will be described in further detail with reference to the flow chart of FIG. 3. At step 301, the process is initiated by a cluster being either created or reconfigured. A cluster may be reconfigured automatically as a result of communications failures between nodes or manually by a system administrator. Processing then moves to step 303 where each node communicates with its connected nodes to identify the size of the subgroup. Processing then moves to step 305 where the subgroup compares its number of nodes to the total number of nodes to determine whether it is the majority group. If the subgroup does not hold a majority of the nodes then processing moves to step 307 where the nodes of that subgroup await a further reconfiguration. If, however, the subgroup comprises a majority of the nodes then processing moves to step 309 where that single largest group is designated as the cluster and provides the functions and services of the cluster. If a subgroup identifies that it comprises exactly half of all the nodes, then it moves from step 305 to step 309 if it holds the token, and to step 307 if it does not.
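The flow of FIG. 3 can be sketched as follows, with step 303 modelled as a breadth-first search over the links that are still working. The data layout (a dictionary mapping each node to the peers it can still reach, assumed symmetric), the function names and the string return values are all assumptions for illustration. The text does not specify how nodes discover their subgroup.

```python
from collections import deque
from typing import Dict, Hashable, Set


def connected_subgroup(start: Hashable, links: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Step 303: find every node reachable from `start` over working links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in links.get(node, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def reconfigure(node: Hashable, links: Dict[Hashable, Set[Hashable]],
                total_nodes: int, token_holder: Hashable) -> str:
    """Steps 303 to 309 as seen from one node (hypothetical helper)."""
    group = connected_subgroup(node, links)            # step 303
    if 2 * len(group) > total_nodes:                   # step 305: majority
        return "cluster"                               # step 309
    if 2 * len(group) == total_nodes and token_holder in group:
        return "cluster"                               # step 309 via the token
    return "wait"                                      # step 307
```

With the interconnect of FIG. 1 split so that 103 and 105 reach only each other, and likewise 107 and 109, a token on node 109 sends the second pair to step 309 and the first pair to step 307.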
FIG. 4 illustrates the processing carried out by each node on receipt of the token 201. At step 401, the node receives the token from a connected node using the three phase commit protocol described above to ensure that the token transfer is atomic. The result of this step is the quorum_token variable being set to one on the current receiving node and to zero on the transmitting node. Processing then moves to step 403 where a timer is initiated with the predetermined time interval. Once the time interval has elapsed, processing moves to step 405 where the node chooses a connected node at random as the recipient of the token 201. Processing then moves to step 407 where the token is transmitted to the new recipient node using the three phase process described above. The process then repeats on the receiving node for the duration of the life of the cluster.
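The loop of FIG. 4 may be sketched, purely as an illustration, as a single-process simulation. The `Node` class, `transfer_token`, and `run_token_walk` are hypothetical names; the timer of step 403 is reduced to a comment, and the three phase commit protocol is replaced by a plain assignment that is only atomic because the simulation runs in one process. A real transfer between machines would need the protocol itself.

```python
import random


class Node:
    """Hypothetical node holding the quorum_token flag described above."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.quorum_token = 0
        self.peers = []  # nodes this node is currently connected to


def transfer_token(sender, receiver):
    """Stand-in for the three phase commit transfer (steps 401 and 407)."""
    if sender.quorum_token != 1:
        raise ValueError("sender does not hold the token")
    receiver.quorum_token = 1
    sender.quorum_token = 0


def run_token_walk(holder, hops, rng):
    """Move the token `hops` times and return the sequence of holders."""
    path = [holder.node_id]
    for _ in range(hops):
        # Step 403: a real node would wait for the predetermined interval here.
        recipient = rng.choice(holder.peers)  # step 405: random connected node
        transfer_token(holder, recipient)     # step 407: hand the token over
        holder = recipient
        path.append(holder.node_id)
    return path


# Hypothetical fully connected group using the node numbers of FIG. 2.
rng = random.Random(0)
nodes = {i: Node(i) for i in (103, 105, 107, 109)}
for n in nodes.values():
    n.peers = [p for p in nodes.values() if p is not n]
nodes[103].quorum_token = 1  # token initialized on the first node
path = run_token_walk(nodes[103], hops=5, rng=rng)
```

After any number of hops exactly one node has `quorum_token` set to one, which is the property the atomic transfer is intended to preserve.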
In another embodiment the token may be passed between nodes at random time intervals. In a further embodiment, the transfer of the token between nodes may be suspended for a period of time under the control of a system user. In other embodiments the token carries more than one vote. In some embodiments if a node has a planned shut down it is arranged to pass the token to another node prior to shutting down. In another embodiment, each node is assigned one or more votes and if two subgroups hold the largest numbers of votes then the subgroup containing the node to which said token is assigned is selected as the cluster. In some embodiments, the token may count for one or more votes and the number of votes or nodes held by a subgroup can be treated as being equal if the difference between them falls within a predetermined limit or band.
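The vote-based embodiments may be sketched in the same spirit. The function `select_cluster`, the per-node vote table, and the `band` parameter are hypothetical; the sketch compares only the two subgroups holding the most votes, and it returns `None` when the tallies are treated as equal but neither subgroup holds the token, which is an assumption rather than something the description specifies.

```python
def select_cluster(subgroups, votes, token_holder, band=0):
    """Pick a winning subgroup by votes, using the token as tie breaker.

    subgroups    : list of sets of node ids
    votes        : hypothetical map {node_id: number of votes}
    token_holder : id of the node currently holding the token
    band         : tallies differing by at most this much count as equal
    """
    def tally(group):
        return sum(votes[n] for n in group)

    ranked = sorted(subgroups, key=tally, reverse=True)
    if len(ranked) == 1:
        return ranked[0]
    top, runner_up = ranked[0], ranked[1]
    if tally(top) - tally(runner_up) <= band:
        # Treated as equal: the subgroup containing the token is selected.
        for group in (top, runner_up):
            if token_holder in group:
                return group
        return None  # assumption: no cluster if the token is in neither
    return top


# Hypothetical example: node 103 carries two votes, the others one each.
votes = {103: 2, 105: 1, 107: 1, 109: 1}
halves = [{103, 105}, {107, 109}]
strict = select_cluster(halves, votes, token_holder=107, band=0)
banded = select_cluster(halves, votes, token_holder=107, band=1)
```

With a band of zero the subgroup with three votes wins outright; with a band of one the tallies of three and two are treated as equal and the token decides.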
It will be understood by those skilled in the art that the apparatus that embodies a part or all of the present technique disclosed here may be a general purpose device having software arranged to provide a part or all of an embodiment of the invention. The device could be a single device or a group of devices and the software could be a single program or a set of programs. Furthermore, any or all of the software used to implement the technique can be communicated via various transmission or storage means such as computer networks or storage devices so that the software can be loaded onto one or more devices.
While the present technique has been illustrated by the description of the embodiments thereof, and while the embodiments have been described in considerable detail, it is not the intention of the applicant to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the technique disclosed here in its broader aspects is not limited to the specific details, representative apparatus and method, and illustrative examples shown and described. Accordingly, departures may be made from such details without departure from the spirit or scope of applicant's general inventive concept.

Claims (23)

The invention claimed is:
1. A method of selecting a cluster in a group of nodes, said method comprising the steps of:
a) assigning a token to a first node in the group of nodes;
b) identifying subgroups, from among the group of nodes, that are interconnected; and
c) if two largest subgroups, from among the identified subgroups, comprise equal numbers of nodes, then selecting as the cluster a subgroup containing the node to which said token is assigned.
2. A method according to claim 1 in which said token is passed from one node to another.
3. A method according to claim 2 in which said token is passed between nodes at predetermined time intervals and/or random time interval.
4. A method according to any of claim 2 in which said passing of said token can be suspended.
5. A method according to claim 2 in which said passing of said token between nodes is carried out atomically so that said token is not lost and remains unique.
6. A method according to claim 2 in which said passing of said token between nodes is carried out using a three phase commit protocol.
7. A method according to claim 1 in which said first node is the first node assigned to the cluster during cluster configuration.
8. A method according to claim 1 in which if a node holding said token is required to shut down then prior to said shut down said token is passed to another node.
9. A method according to claim 1 in which if a node holding said token crashes then step c) is suspended until said node reboots.
10. A method according to claim 1 in which each node is assigned one or more votes and in step c) if two subgroups hold the largest numbers of votes then selecting as the cluster the subgroup containing the node to which said token is assigned.
11. A method according to claim 10 in which said token counts for one or more votes.
12. A method according to claim 10, in which the number of votes or nodes held by subgroups is treated as being equal if the difference between them falls within a predetermined limit.
13. The method according to claim 1, wherein the token is a global variable.
14. An apparatus for selecting a cluster in a group of nodes, wherein the apparatus is a part of a computer system, the apparatus comprising:
a) a token assigned to a first node in the group of nodes;
b) communication means for identifying subgroups of nodes in the group of nodes that are interconnected; and
c) selecting means, operable if two subgroups are largest of the identified subgroups and the two subgroups comprise equal numbers of nodes, to select as the cluster the subgroup containing the node to which said token is assigned.
15. The apparatus of claim 14, wherein said token is between nodes at predetermined time intervals and/or random time interval.
16. The apparatus of claim 15, wherein said passing of said token between nodes is carried out atomically so that said token is not lost and remains unique.
17. The apparatus of claim 14, said first node is the first node assigned to the cluster during cluster configuration.
18. The apparatus of claim 14, wherein if a node holding said token is required to shut down then prior to said shut down said token is passed to another node.
19. The apparatus of claim 14, wherein if a node holding said token crashes then operation of said selecting means is suspended until said node reboots.
20. The apparatus of claim 14, wherein each node is assigned one or more votes and said selecting means is operable if two subgroups hold the largest numbers of votes to select as the cluster the subgroup containing the node to which said token is assigned.
21. The apparatus of claim 20, wherein a number of votes or nodes held by subgroups is treated as being equal if a difference between them falls within a predetermined limit.
22. The apparatus according to claim 14, wherein the token is a global variable.
23. A method of operating a node in a cluster, said method comprising the steps of:
a) determining a number of nodes connected to form a connected group of nodes;
b) if said connected group of nodes comprises more than half of a total number of nodes in the cluster, then forming the cluster from said group of nodes; or
c) if said connected group comprises half of said total number of nodes, then forming said cluster if said connected group includes a node to which a token is assigned;
d) wherein a number of nodes held by subgroups is treated as being equal if a difference between them falls within a predetermined limit;
wherein the token is a global variable.
US11/491,362 2005-08-08 2006-07-24 Method or apparatus for selecting a cluster in a group of nodes Active 2030-02-24 US8812501B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
IN1097CH2005 2005-08-08
IN1097/CHE/2005 2005-08-08
ININ1097/CHE/2005 2005-08-08

Publications (2)

Publication Number Publication Date
US20070033205A1 (en) 2007-02-08
US8812501B2 (en) 2014-08-19

Family

ID=37718775

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/491,362 Active 2030-02-24 US8812501B2 (en) 2005-08-08 2006-07-24 Method or apparatus for selecting a cluster in a group of nodes

Country Status (1)

Country Link
US (1) US8812501B2 (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5784648A (en) * 1995-12-01 1998-07-21 Apple Computer, Inc. Token style arbitration on a serial bus by passing an unrequested bus grand signal and returning the token by a token refusal signal
US6243825B1 (en) * 1998-04-17 2001-06-05 Microsoft Corporation Method and system for transparently failing over a computer name in a server cluster
US6363495B1 (en) * 1999-01-19 2002-03-26 International Business Machines Corporation Method and apparatus for partition resolution in clustered computer systems
US6449641B1 (en) * 1997-10-21 2002-09-10 Sun Microsystems, Inc. Determining cluster membership in a distributed computer system
US20030078946A1 (en) * 2001-06-05 2003-04-24 Laurie Costello Clustered filesystem
US20030187927A1 (en) * 2002-02-22 2003-10-02 Winchell David F. Clustering infrastructure system and method
US6662219B1 (en) * 1999-12-15 2003-12-09 Microsoft Corporation System for determining at subgroup of nodes relative weight to represent cluster by obtaining exclusive possession of quorum resource
US6748429B1 (en) * 2000-01-10 2004-06-08 Sun Microsystems, Inc. Method to dynamically change cluster or distributed system configuration
US20040148326A1 (en) * 2003-01-24 2004-07-29 Nadgir Neelakanth M. System and method for unique naming of resources in networked environments
US20050074806A1 (en) * 1999-10-22 2005-04-07 Genset, S.A. Methods of genetic cluster analysis and uses thereof
US20050268154A1 (en) * 2000-12-06 2005-12-01 Novell, Inc. Method for detecting and resolving a partition condition in a cluster
US20060282443A1 (en) * 2005-06-09 2006-12-14 Sony Corporation Information processing apparatus, information processing method, and information processing program
US20070016822A1 (en) * 2005-07-15 2007-01-18 Rao Sudhir G Policy-based, cluster-application-defined quorum with generic support interface for cluster managers in a shared storage environment
US20070129928A1 (en) * 2005-11-08 2007-06-07 Microsoft Corporation Distributed system simulation: slow message relaxation
US20070255813A1 (en) * 2006-04-26 2007-11-01 Hoover David J Compatibility enforcement in clustered computing systems
US7299294B1 (en) * 1999-11-10 2007-11-20 Emc Corporation Distributed traffic controller for network data
US20070288935A1 (en) * 2006-06-13 2007-12-13 Zvi Tannenbaum Cluster computing support for application programs

Also Published As

Publication number Publication date
US20070033205A1 (en) 2007-02-08

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PRADHAN, TANMAY KUMAR;REEL/FRAME:018379/0995

Effective date: 20060816

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
AS Assignment

Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001

Effective date: 20151027

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

AS Assignment

Owner name: VALTRUS INNOVATIONS LIMITED, IRELAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP;HEWLETT PACKARD ENTERPRISE COMPANY;REEL/FRAME:055360/0424

Effective date: 20210121

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8