CN100410954C - Method and system for collecting software and hardware information in cluster node - Google Patents

Publication number
CN100410954C
Authority
CN
China
Prior art keywords
information
node
node machine
group
aggregating apparatus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB021419299A
Other languages
Chinese (zh)
Other versions
CN1466095A (en
Inventor
程菊生
吴雪丽
胡毅
田宏萍
顾光导
金正操
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CNB021419299A
Publication of CN1466095A
Application granted
Publication of CN100410954C
Anticipated expiration
Expired - Fee Related

Abstract

The present invention provides a method and system for collecting the software and hardware information of the different types of nodes in a monitored cluster, characterized in that the information is collected in a layered and classified manner. The entire cluster can thus be monitored as a single image, which facilitates management and maintenance and thereby improves the operational reliability of the whole cluster.

Description

Method and system for collecting the software and hardware information of nodes in a cluster
Technical field
The present invention relates to a method and system for collecting cluster information, and in particular to a method and system for collecting the software and hardware information of the different types of nodes in a cluster, including computing nodes, login nodes, and I/O nodes.
Background art
A cluster server system is a set of interconnected, independent computers (node machines). These computers may be PCs, workstations, and so on, and each node machine has its own memory, I/O devices, and operating system. The node machines are linked by a high-speed network and, with the help of middleware and similar software, form a single super server. Cluster servers play an important role in large-scale scientific computing, for example in petroleum geology.
Because a cluster has a large number of nodes, collecting information from every part of the cluster system promptly and accurately is an important and pressing problem in cluster monitoring and maintenance. Moreover, a cluster system contains several kinds of nodes, such as computing nodes, login nodes, and I/O nodes, whose software and hardware configurations differ considerably. Only by monitoring every type of node can the overall running state of the cluster system be grasped without omission. At present there is still no good solution for unified, real-time monitoring of the software and hardware information of the different nodes of a cluster system.
Summary of the invention
One object of the present invention is to provide a system and method for collecting the software and hardware information of the different types of nodes in a cluster.
Another object of the present invention is to provide such a system and method that make it easy to expand the number of node machines in the cluster system.
A further object of the present invention is to provide a new system and method for collecting the software and hardware information of the different types of nodes in a cluster that ensure data is collected from all node machines synchronously.
A further object of the present invention is to provide a new monitoring system and method that reduce the use of system resources and thereby reduce operating cost.
A further object of the present invention is to provide a new system and method for collecting the software and hardware information of the different types of nodes in a cluster that collect and transmit the cluster's software information and hardware information separately and effectively.
The invention provides a cluster node information collection and monitoring system for a cluster having at least one group of node machines. The system comprises: an information collecting device on each node machine, for collecting that node machine's information; an information aggregating device for the group of node machines, for aggregating the information collected by the information collecting devices of the node machines; a monitoring device, for receiving and summarizing the information of every node machine in the cluster; and a communication line connecting the group's information aggregating device to the monitoring device.
The invention also provides a method for collecting cluster node information in a cluster that has a plurality of node machines and is equipped with a cluster monitoring device. The method comprises the following steps: dividing the node machines of the cluster into node units; collecting information on each node machine in a node unit; aggregating, within the node unit, the information collected on each node machine; and sending the aggregated node machine information of the group to the cluster monitoring device.
In the method of the invention, the software and hardware information of the different nodes is collected and transmitted in a layered and classified manner.
In the method of the invention, the software information and the hardware information of the different nodes are collected separately and transmitted separately.
Description of drawings
Fig. 1 is a schematic diagram of a cluster node information collection and monitoring system according to the present invention.
Fig. 2 is a schematic diagram of a cluster node information collection and monitoring system according to the present invention applied to N racks.
Fig. 3 is a detailed schematic diagram of a cluster node information collection and monitoring system according to the present invention applied to N racks.
Fig. 4 shows the method and structure by which the cluster node information collection and monitoring system according to the present invention collects and monitors software information.
Fig. 5 and Fig. 6 show the data collection and aggregation processes of the cluster node information collection and monitoring system according to the present invention when collecting and monitoring software information.
Fig. 7 is a schematic diagram of an embodiment of the collecting device of the cluster node information collection and monitoring system according to the present invention.
Fig. 8 is a circuit diagram of an embodiment of the collecting device of the cluster node information collection and monitoring system according to the present invention.
Fig. 9 is a schematic diagram of an embodiment of the aggregating device of the cluster node information collection and monitoring system according to the present invention.
Fig. 10 is a circuit diagram of an embodiment of the aggregating device of the cluster node information collection and monitoring system according to the present invention.
Embodiments
In the technical solution of the cluster node information collection and monitoring system according to the present invention, a cluster system contains several kinds of nodes, such as computing nodes, login nodes, and IO nodes, and the software and hardware structures of these nodes differ. Different approaches are therefore taken to obtain their configuration and running information, so that the nodes can be monitored in a unified way.
For computing nodes, the information is obtained by two routes. Hardware information (such as fan speed and CPU temperature) is obtained on the node machine by a node machine information collecting device (for example, by reading IIC information on the mainboard). It is then sent over a serial bus to the information aggregating device of the rack, and the aggregating device summarizes the information of all node machines in the rack and sends it to the monitoring host of the cluster system. A major advantage of obtaining hardware information through the node machine information collecting device is that it does not depend on the operating system of the node machine, or even on whether the node machine is powered on; hardware information can be collected in either case. In addition, classifying the transmitted information and arranging its transmission in levels saves considerable resources. Other information that can only be obtained through the operating system, that is, software information such as CPU utilization and network traffic, is obtained over Ethernet. To collect and transmit this software information, an agent program first runs on each node machine to collect data; an intermediate proxy layer or intermediate program then aggregates the data collected on the node machines; and the aggregated data is passed over Ethernet to the service program (BSP) running on the monitoring host.
For login nodes, whose structure is similar to that of computing nodes, the information is likewise obtained by the two routes above.
For IO nodes, the node machine information collecting device is not suitable for collecting hardware information, so all of their monitoring information is obtained over the network, for example Ethernet. The agent program running on each node machine queries and collects the node machine's running state information, and the intermediate layer aggregates the collected data and passes it over Ethernet to the service program (BSP) running on the monitoring host. The particular features of information collection for IO nodes are described in detail below.
Because the collecting device used on the other types of node machine cannot work properly on the mainboard of an IO node machine, all information of an IO node machine is obtained over Ethernet. The hardware information of an IO node machine is transmitted differently from that of other node machines, and it is also collected differently, as explained below.
Besides collecting software information (in this respect it is the same as an ordinary NA), the agent program NA on an IO node machine also collects the hardware information of the IO node machine. It reads the hardware information through the BMC chip and the I²C bus on the mainboard, following the IPMI standard. When transmitting to the intermediate proxy layer or intermediate program NP, the NA appends the hardware information after the software information. When the NP receives the information of an IO node machine, it extracts the hardware information and attaches it to the end of the information for the whole rack, which it transmits to the BSP. When the BSP receives the information of a rack that contains an IO node machine, it extracts the hardware information and places it in the hardware information buffer, where it waits to be stored in the database. (At this level the BSP separates the software and hardware information, so that the database and the GCM need not be concerned with how the software and hardware information of IO node machines differ; the difference is hidden from them.)
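The hand-off of IO node hardware information from NA to NP to BSP can be illustrated with a minimal Python sketch. The patent does not specify a message format, so the JSON encoding, the field names (`software`, `hardware`, `hardware_tail`), and the function names below are assumptions made only for illustration.

```python
import json


def na_report(node_id, software, hardware=None):
    """NA payload: software info first; IO nodes append hardware info after it."""
    report = {"node": node_id, "software": software}
    if hardware is not None:
        report["hardware"] = hardware  # present only for IO nodes (assumed layout)
    return report


def np_aggregate(rack_id, reports):
    """NP: gather software info per node and move hardware info to the end of the rack message."""
    software = {r["node"]: r["software"] for r in reports}
    hardware = {r["node"]: r["hardware"] for r in reports if "hardware" in r}
    return json.dumps({"rack": rack_id, "software": software, "hardware_tail": hardware})


def bsp_receive(message, software_buffer, hardware_buffer):
    """BSP: split the rack message into separate software and hardware buffers."""
    msg = json.loads(message)
    software_buffer.update(msg["software"])
    hardware_buffer.update(msg["hardware_tail"])
```

Because the BSP performs the split, anything that reads the two buffers afterwards (the database writer or the GCM in the text) sees hardware information from IO nodes in the same place as hardware information from other nodes.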
In this way, the software and hardware information of every type of node, although collected in different ways, is aggregated at the monitoring host and processed there in a unified manner. According to the technical solution of the invention, the collected data is rich and covers all types of nodes in the cluster; at the same time the data is centralized, which makes management, maintenance, and control easier, so that the whole cluster system is truly monitored as a single image.
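The routing rules described above (serial bus for the hardware information of computing and login nodes, Ethernet for everything else) can be summarized in a small lookup. This is only a sketch; the enum, the constant names, and the function `collection_paths` are hypothetical and do not appear in the patent.

```python
from enum import Enum


class NodeType(Enum):
    COMPUTING = "computing"
    LOGIN = "login"
    IO = "io"


# Labels for the two routes described in the text (names are illustrative).
SERIAL = "serial bus: collecting device -> aggregating device -> monitoring host"
ETHERNET = "Ethernet: NA -> NP -> BSP"


def collection_paths(node_type):
    """Return the route used for each class of information on a given node type."""
    if node_type is NodeType.IO:
        # IO nodes cannot use the collecting device, so both classes go over Ethernet.
        return {"hardware": ETHERNET, "software": ETHERNET}
    # Computing and login nodes: hardware over the serial bus, software over Ethernet.
    return {"hardware": SERIAL, "software": ETHERNET}
```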
Fig. 1 is a schematic diagram of a preferred embodiment of the cluster node information collection and monitoring system according to the present invention. As shown in Fig. 1, the cluster comprises at least one rack 10, and each rack contains at least one node machine 101. A node machine information aggregating device 102 is provided in rack 10, and each node machine 101 is provided with a node machine information collecting device 101A. A rack may contain several types of nodes (not shown in the figure, for clarity). Within the rack, each node machine 101 is connected to the node machine information aggregating device 102 via serial bus 40A, and the aggregating device 102 is connected to monitoring host 401 via serial bus 40B. In this embodiment, serial bus 40A and serial bus 40B (collectively, serial bus 40) are 485 buses. Each node machine 101 is also connected to monitoring host 401 over Ethernet 20 via switch 30. In this embodiment a monitoring rack (not shown) is provided. Besides monitoring host 401, the monitoring rack may also hold other node machines, I/O node machines, and so on. Monitoring host 401 receives and summarizes the various types of node information arriving from network 20. In embodiments of the invention, the nodes include, for example, computing nodes, login nodes, and I/O nodes.
Monitoring host 401 obtains cluster information mainly by two routes: one is serial bus 40; the other is Ethernet 20. Each is described below.
First, the process of obtaining cluster information over the serial bus is described with reference to Fig. 1. Via serial bus 40A, information of a first type collected by each node machine information collecting device 101A, namely hardware information such as fan speed, CPU temperature, and memory voltage, is gathered at the information aggregating device 102 of the rack. Via serial bus 40B, monitoring host 401 polls the node machine information aggregating devices 102 of the racks, summarizes the first-type information they have gathered, and stores it in a storage device (not shown) in the monitoring host for subsequent processing. The polling scheme used by the monitoring host is not described further, since it is well known to those skilled in the art.
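The patent leaves the polling scheme unspecified. As a rough illustration only, the following Python sketch simulates one polling round in memory: the monitoring host visits each rack's aggregating device in address order and stores whatever hardware readings it currently holds. The class `RackAggregator`, the function `poll_once`, and the reading names are assumptions; a real implementation would exchange frames over the 485 bus rather than call methods.

```python
class RackAggregator:
    """Stand-in for an information aggregating device 102 on the serial bus."""

    def __init__(self, address, readings):
        self.address = address
        # readings: {node_id: {"fan_rpm": ..., "cpu_temp_c": ...}} (illustrative fields)
        self._readings = readings

    def answer_poll(self):
        """Return a copy of the hardware readings gathered from the rack's nodes."""
        return dict(self._readings)


def poll_once(aggregators):
    """One polling round: query every rack in address order and store the replies."""
    store = {}
    for aggregator in sorted(aggregators, key=lambda a: a.address):
        store[aggregator.address] = aggregator.answer_poll()
    return store
```

Polling one device at a time suits a shared half-duplex bus, where only one device should transmit at any moment.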
Next, the process of obtaining cluster information over network 20 is described with reference to Fig. 1. Each node machine 101 is connected via Ethernet 20 to switch 30, and switch 30 is in turn connected via Ethernet 20 to monitoring host 401 in monitoring rack 50, so that information of a second type is delivered to monitoring host 401; in this embodiment the second-type information is software information. The agent program running on the operating system of node machine 101 reads the collected second-type information; an intermediate layer or program then aggregates the collected data and passes it over Ethernet to the service program (BSP) running on monitoring host 401. The second-type (software) information collected by this route includes CPU utilization, memory usage, network traffic, user processes, and so on.
Fig. 2 is a schematic diagram of an embodiment of the cluster node information collection and monitoring system according to the present invention applied to N racks or units. The cluster has N racks 10N; each rack contains N node machines 101N and one information aggregating device 102, and each node machine 101 has an information collecting device 101A. The information aggregating device 102 of each rack is connected to monitoring host 401 by serial bus 40B, establishing the first level of serial communication; the information collecting devices 101A of all node machines in each rack are connected to that rack's information aggregating device 102 by serial bus 40A, establishing the second level of serial communication. In addition, the information collecting device 101A of each node machine 101 is linked over Ethernet to monitoring host 401 in monitoring rack 50. A switch 30 is provided in the Ethernet. Besides monitoring host 401, the monitoring rack may also house login node machines and IO node machines, which likewise communicate with monitoring host 401 over Ethernet 20 through switch 30.
Fig. 3 is a more detailed schematic diagram of an embodiment of the cluster node information collection and monitoring system according to the present invention applied to N racks or units. The cluster has N racks 10N; each rack contains N node machines 101N and one information aggregating device 102, and each node machine 101 has an information collecting device 101A. The information aggregating device 102 of each rack is connected to monitoring host 401 by serial bus 40B, establishing the first level of serial communication; the information collecting devices 101A of all node machines in each rack are connected to that rack's information aggregating device 102 by serial bus 40A, establishing the second level of serial communication. First-type information, that is, hardware information, is collected in this way. The information collecting device 101A can collect the hardware information of the node machine through various sensors or directly from the node machine mainboard. The information aggregating device 102 can also collect hardware information for the whole rack directly through various sensors; it combines this rack-level hardware information with the node machine hardware information collected by the information collecting devices 101A and sends the result to monitoring host 401 over serial bus 40B. In addition, the information collecting device 101A of each node machine 101 is linked over Ethernet to monitoring host 401 in the monitoring rack. A switch 30 is provided in the Ethernet. Besides monitoring host 401, the monitoring rack may also house login node machines and IO node machines, which likewise communicate with monitoring host 401 over Ethernet 20 through switch 30. Second-type information, that is, software information, is collected in this way. Monitoring host 401, the information collecting devices 101A, and the information aggregating devices 102 may be provided with alarm devices, which respond with the corresponding alarm to the various fault signals that monitoring host 401 issues after processing the summarized information. The system can also be used to exercise other kinds of control over the cluster, such as monitoring the power status of the cluster and controlling its sequential power-on and power-off.
The Ethernet communication of the invention is organized similarly to the serial network and also uses a layered structure: one layer is the node proxy (Node Proxy, NP) layer, and the other is the node collection (Node Agent, NA) layer. Monitoring host 401 treats the node machines 101 in each rack as one group and, by communicating with the NP processes of the node machines 101, selects one node machine in each group as the group's proxy (NP). Monitoring host 401 communicates only with the NPs. An NA collects operating system information, listens for data collection commands from the NP, and transmits the information data to the NP. With this hierarchy, monitoring host 401 faces a small number of NPs rather than a large number of node machines 101, while each NP faces only the relatively few node machines 101 in its rack 10. This reduces the likelihood of instantaneous communication bursts and also makes the structure of the monitoring software more flexible.
Fig. 4 shows a preferred embodiment of the collection and monitoring system of the invention for collecting second-type information, that is, software information (which may also include the hardware information of IO nodes). A basic service module or program (BSP) 11 runs on the monitoring host. When the running state of the cluster system needs to be known, it sends a data collection command, then waits for and receives the data returned by the node machines, and summarizes and analyzes that data. All node machines of the cluster system are divided into several groups 12, each with N node machines (as shown in Figs. 5 and 6). Only one node machine in each group runs the node proxy module or program (NP) 14, whereas every node machine runs the node collection module or program (NA) 13. After receiving a collection command from the BSP, NP module 14 forwards the command to the NA modules of all node machines in its group, then waits for and receives the data the NA modules return, summarizes it, and sends it to the BSP in one batch. NA module 13 periodically collects the running state data of the node machine it runs on and, on receiving a collection command, immediately returns the most recently collected data. Both the NA module and the NP module are software programs that run on the node machine's operating system.
From the above it can be seen that the modules of the monitoring system are divided by function into three levels: the BSP is the first level, the NP the second, and the NA the third. Modules at different levels perform different tasks and cooperate to monitor the cluster system. The NA module plays the role of the information collecting device, and the NP module plays the role of the information aggregating device.
One information collection cycle can be divided into two steps:
1. Transmission of the collection command. As shown in Fig. 5, BSP module 11 first broadcasts the collection command to all NP modules 14 by UDP broadcast, so that the command reaches all NPs synchronously. On receiving the command, each NP immediately broadcasts it by UDP broadcast to all NA modules 13 in its group 12, so the command again reaches all of those NA modules synchronously. The collection command sent by the BSP thus ultimately reaches all NA modules in the whole cluster system synchronously.
2. Aggregation of the collected data. As shown in Fig. 6, each NA module 13 periodically collects the running state data of its node machine and stores it in a buffer; on receiving the collection command sent by NP module 14, it immediately sends the most recently collected data to the NP. Each NP receives the data returned by all NAs in its group 12, summarizes it, and sends it to BSP module 11 in one batch. The BSP receives the data returned by all NPs, summarizes it, and inserts it into the database.
As described above, because the data collection command reaches all NAs synchronously and each NA returns its most recently collected data immediately on receiving the command, the BSP ultimately receives the running state data of all node machines at the same moment, which reflects the overall running state of the cluster system.
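The two steps (command fan-out from BSP to NP to NA, then data aggregation from NA to NP to BSP) can be sketched as an in-process simulation. This is a minimal sketch, not the patented implementation: the UDP broadcasts are replaced by direct method calls, and the class names, the `seq` command number, and the `cpu_util` field are assumptions introduced for illustration.

```python
class NodeAgent:
    """NA: holds the latest periodic sample in a buffer and returns it on command."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.buffer = {"cpu_util": 0.0}  # placeholder for the most recent sample

    def on_command(self, seq):
        # Reply immediately with the buffered sample, tagged with the command number.
        return {"node": self.node_id, "seq": seq, **self.buffer}


class NodeProxy:
    """NP: forwards the command to every NA in its group and summarizes the replies."""

    def __init__(self, group_id, agents):
        self.group_id = group_id
        self.agents = agents

    def on_command(self, seq):
        replies = [agent.on_command(seq) for agent in self.agents]
        return {"group": self.group_id, "nodes": replies}


class BasicService:
    """BSP: issues one collection command and gathers one summary per group."""

    def __init__(self, proxies):
        self.proxies = proxies
        self.seq = 0

    def collect(self):
        self.seq += 1
        return [proxy.on_command(self.seq) for proxy in self.proxies]


def build_cluster(groups, nodes_per_group):
    """Build a BSP with `groups` NPs, each fronting `nodes_per_group` NAs."""
    proxies = []
    for g in range(groups):
        agents = [NodeAgent(f"g{g}-n{n}") for n in range(nodes_per_group)]
        proxies.append(NodeProxy(g, agents))
    return BasicService(proxies)
```

With `build_cluster(16, 16)`, one call to `collect()` yields 16 group summaries covering 256 nodes, and every node record carries the same command number, which mirrors the claim that all samples belong to the same collection round.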
Because the NA module itself collects data periodically, it can respond to a collection command quickly, which ensures that the information collection process completes rapidly. With a reasonably chosen collection period, the NA module both keeps the data current and minimizes the node machine resources it consumes, which in turn reduces operating cost.
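The trade-off between data freshness and resource use can be seen in a small sketch of an NA-style sampler: reading the node state is done at most once per period, while answering a command only returns the buffered value. The class `PeriodicSampler`, its method names, and the explicit `now` argument are assumptions chosen to keep the sketch deterministic; the patent does not describe the NA at this level of detail.

```python
class PeriodicSampler:
    """Samples node state at most once per period and serves the buffered result."""

    def __init__(self, read_state, period_s):
        self._read_state = read_state    # callable that reads the node's running state
        self._period_s = period_s        # collection period in seconds
        self._last_sample_time = None
        self._buffer = None

    def tick(self, now):
        """Take a new sample if the period has elapsed; return True if one was taken."""
        if self._last_sample_time is None or now - self._last_sample_time >= self._period_s:
            self._buffer = self._read_state()
            self._last_sample_time = now
            return True
        return False

    def latest(self):
        """Answer a collection command: no measurement happens here, only a buffer read."""
        return self._buffer
```

A longer period lowers the load on the node machine but makes the buffered data older when a command arrives; a shorter period does the opposite.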
Another program on the monitoring host, the GCM (monitoring display module), reads the data from the database and can display it graphically.
An example of an application of the preferred embodiment of the invention is given below.
A cluster system consists of 256 node machines (servers) housed in 16 racks, with 16 node machines in each rack. The 256 node machines are connected by Ethernet equipment into one large cluster system.
To monitor the running state of each node machine of this system, according to the invention the 256 node machines are grouped by physical location (rack): the 16 node machines in each rack form one group, for 16 groups in total. Every node machine runs the node collection module NA, and one of the 16 node machines in each group also runs the node proxy module NP. The BSP runs on the monitoring host and communicates with the NPs and NAs over the 485 serial network and Ethernet.
When the BSP on the monitoring host needs to know the running state of the cluster, it first sends the collection command to the NPs of the 16 groups by UDP broadcast; each NP, on receiving the command, immediately sends it by UDP broadcast to the 16 NAs of its group. The collection command sent by the BSP thus reaches the NAs of all 256 node machines synchronously.
Each NA periodically collects the data of its node machine and stores it in a buffer; on receiving the collection command from the NP, it immediately returns the most recently collected data. Each NP receives the data sent back by the 16 NAs of its group, summarizes it, and sends it to the BSP.
As the above description of the embodiment shows, the cluster monitoring system and method of the invention make it easy to expand the number of node machines in the cluster. For example, when the 256 node machines above are expanded to 320, from the BSP's point of view only 4 more NP modules are added with which it exchanges information.
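The scaling figure follows from the grouping: with 16 node machines per group, the BSP talks to one NP per group. A short sketch of the arithmetic, where the function name `proxies_needed` is hypothetical:

```python
import math


def proxies_needed(node_count, nodes_per_group=16):
    """Number of NP modules the BSP communicates with, one per (possibly partial) group."""
    return math.ceil(node_count / nodes_per_group)


before = proxies_needed(256)   # 256 / 16 = 16 groups
after = proxies_needed(320)    # 320 / 16 = 20 groups
added = after - before         # 4 additional NP modules for 64 additional nodes
```

The BSP's fan-out therefore grows with the number of groups rather than with the number of node machines.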
Fig. 7 and Fig. 8 are, respectively, a schematic diagram and a circuit diagram of the node machine information collecting device 101A used in one embodiment of the invention. The device has a central processing unit (microprocessor) and a communication interface that is connected to the central processing unit and is used to exchange information with monitoring host 401; the central processing unit is connected to the node machine mainboard through its I²C bus interface. In this embodiment the communication interface is an RS-485 interface, used to transmit the monitoring information of the node machine mainboard. The single-chip microcomputer is connected to the node machine mainboard through its I²C bus interface and receives the detection information sent by the mainboard. The device also has a switch on the address lines of the central processing unit for setting the ID address of the device, and the device is connected directly to the 5VSB supply of its node machine. Over the I²C bus it receives the temperature inside the node and the fan running state measured by the sensors of the node machine (mainboard); temperature measurement points and fans can be added as required, so the device is readily extensible.
In this embodiment the switch signal and the reset signal are connected to the single-chip microcomputer, so operations such as remote power-on and power-off can be carried out easily. Because the node machine information collecting device 101A is connected directly to the 5VSB supply of its node machine, it can run independently.
Referring to Fig. 8, a single-chip microcomputer U1 is provided. Through the I²C bus interface formed by its ports P1.6 and P1.7, U1 is connected to the corresponding interface of the node machine mainboard; it reads the voltage, temperature, and fan detection information of the mainboard, and reads and controls the temperature and fan speed monitoring chip. The device also has indicator lamps for displaying the monitoring state, connected to the output ports of the central processing unit. Through its output signals LED1-LED6, U1 is connected to LEDs S1 and LED4-LED6, which form the alarm indicators.
In the embodiment a switch control chip U6 is also provided, which outputs the mainboard switch signal and the reset signal RST of single-chip microcomputer U1. The node machine can therefore be shut down automatically when a damaging fault occurs, protecting it from serious damage. In addition, the device has a switch S1 for the ID address on the address lines of U1; this switch sets the device's address information within the whole monitoring system. In this embodiment the device's power supply is connected directly to the 5VSB supply of its node machine, so it can run independently of that node machine.
The invention achieves real-time monitoring of, and alarms for, each node machine of the cluster system and protects the node machines from damage; the user can learn the current running state of the cluster more quickly and can easily carry out operations such as remote power-on and power-off. The device communicates with the node machine information aggregating device 102 of the rack over a 485 high-speed serial bus; it accepts and executes that device's information aggregation commands, power-on/power-off commands, reset commands, and so on, enabling operations such as remote information location and remote power-on and power-off. The invention does not depend on whether the node machine has started, and it provides automatic address recognition.
Fig. 9 and Figure 10 are respectively the synoptic diagram circuit diagram that the used node machine information of one embodiment of the invention compiles warning device 102.Information aggregating apparatus 102 is between monitored node machine and monitoring host computer; the information of compiling the monitored node machine; and carry out alternately with monitoring host computer; the needs that extensive Network of Workstation carried out monitoring management can be satisfied, and each hardware information that monitored object can read node machine 101 can be expanded on a large scale.As shown in Figure 2, this monitor message is compiled warning device and will be compiled from the information of the information collecting device 101A on each node machine 101 in the rack, and communicates by letter with monitoring host computer 401 by 485 buses.
Information aggregating apparatus 102 comprises at least a central processing unit, one or more communication interfaces for communicating with the node machine collecting devices 101A and the monitoring host, and a storage unit. The communication interfaces are connected to the central processing unit, and the central processing unit is connected to the storage unit. Aggregating apparatus 102 also has interfaces for directly connecting sensors that detect the overall state of the rack, such as a connecting interface for a power supply sensor; this connecting interface is connected to the analog-to-digital conversion input of the central processing unit. Aggregating apparatus 102 can therefore also directly collect and aggregate information on the overall state of the rack and directly monitor and operate the rack as a whole, for example collecting the status of the rack power supply and controlling rack power-on and power-off.
Information aggregating apparatus 102 also has a device for setting its ID address, which is connected to the data bus of the central processing unit. It also has a device for setting the hardware board identifier, likewise connected to the data bus of the central processing unit. The aggregating apparatus further has indicator lamps for showing its working state and alarm information; these lamps are connected to the central processing unit.
Referring to Fig. 10, information aggregating apparatus 102 consists of central processing unit U1, RS485 serial communication interfaces U16 and U6, and memories U3 and U4. RS485 serial communication interface U16 is connected directly to central processing unit U1; RS485 serial communication interface U6 is connected to U1 through serial communication chip U18; and U1 is connected to memories U3 and U4 by the data/address bus. Central processing unit U1 is connected through its analog-to-digital conversion ports P5.0/ADC0 and P5.1/ADC1 to connecting interface J9, which is used to attach the sensor that detects the rack power supply. This embodiment also has a device SW8 for setting the ID address: a multi-way switch connected to the data bus of the central processing unit and used to set the identification address of the apparatus manually. Through its output ports P4.2-P4.2, central processing unit U1 is connected to and controls indicator lamps U7, U8, U9 and U10, which show its working state and alarm information.
Information aggregating apparatus 102 is placed in the rack and can directly collect information such as the state of the cabinet fans and the temperature inside the rack; additional fans and temperature sensors can be added as required. Its interface J1 is used to connect a fan, and central processing unit U1 is connected to the fan and controls its speed through this interface. Aggregating apparatus 102 monitors the information it collects itself. At the same time it communicates over the RS485 high-speed serial bus with the information collecting devices on the node machines, and sends the information it has collected, together with the running state and operating parameters of each node machine in the rack, to the monitoring host. It accepts the commands sent by the monitoring host to carry out remote information collection and monitoring, and performs control according to the monitoring instructions, such as switching the node machine power supplies and the rack power supply. When a serious fault occurs, it cuts power to the unit to protect it.
From the above description it will be clear to those skilled in the art that, according to the invention, hardware information is aggregated to the monitoring host after collection, and the monitoring host processes it uniformly, locates faults and raises alarms. The whole cluster is thereby monitored as a single image, which improves the reliability of cluster operation and, on that basis, can further extend the range of applications of the cluster.
Because the collecting device used on other types of node machine cannot work properly on the mainboard of an IO node machine, all information of an IO node machine is obtained over Ethernet. The hardware information of an IO node machine is therefore transmitted differently from that of other node machines, and it is also collected differently. This is explained below.
Besides collecting software information (in which it is the same as an ordinary NA), the NA on an IO node machine also collects the hardware information of the IO node machine. It obtains the hardware information through the BMC chip and the I²C bus on the mainboard, following the IPMI standard. When transmitting to the NP, it appends the hardware information after the software information. When the NP receives the information of an IO node machine, it extracts the hardware information and appends it to the end of the whole rack's information, which is transmitted to the BSP. When the BSP receives the information of the rack containing the IO node machine, it extracts the hardware information and places it in the hardware information buffer to await storage in the database. (At this level the BSP separates software and hardware information, so that the database and the GCM need not be concerned with how IO node machine software and hardware information differ; the difference is thereby hidden.)
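The NA, NP and BSP hand-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not give the message layout, so the `NodeReport` and `RackReport` types, the function names and the example field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Readings = Dict[str, float]


@dataclass
class NodeReport:
    """What one NA sends to the NP (hypothetical layout)."""
    node: str
    software: Readings
    hardware: Optional[Readings] = None  # filled in only by an IO-node NA


@dataclass
class RackReport:
    """What the NP sends to the BSP: software info first, IO hardware last."""
    rack: str
    software: Dict[str, Readings] = field(default_factory=dict)
    io_hardware: Dict[str, Readings] = field(default_factory=dict)


def np_aggregate(rack: str, reports: List[NodeReport]) -> RackReport:
    """NP step: keep software info per node and move any IO-node hardware
    info to the trailing section of the rack report."""
    out = RackReport(rack)
    for report in reports:
        out.software[report.node] = report.software
        if report.hardware is not None:
            out.io_hardware[report.node] = report.hardware
    return out


def bsp_split(report: RackReport,
              hardware_buffer: Dict[str, Readings]) -> Dict[str, Readings]:
    """BSP step: park IO hardware info in the hardware buffer and pass on
    only the software info, so later stages see one uniform shape."""
    hardware_buffer.update(report.io_hardware)
    return report.software
```

With this split, whatever reads the hardware buffer does not need to know whether an entry arrived over the serial bus or over Ethernet.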
The workflow between the monitoring device (monitoring host), the information aggregating apparatus (aggregation card) and the information collecting device (capture card) is described below.
Communication between the monitoring device and the aggregation card is carried out by the BSP. The transmission process between the BSP and the aggregation card is described first.
The BSP assembles a packet in the following format (fields are sent in order from low to high):
Figure C0214192900151
The BSP then waits for the reply from the aggregation card. While monitoring the serial network, the aggregation card, on detecting a signal on the serial line, tries to find the "packet start flag" and the "packet end flag" and then checks whether the destination address in the packet matches its own ID. If it does not, the card discards the packet and continues to monitor the serial network. If the destination address matches its own ID, the card checks the checksum; if the checksum is wrong, it likewise discards the packet and continues to monitor the serial network.
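The receive-side checks just described (find the flags, compare the destination address, verify the checksum) might look like the minimal sketch below. The actual field order and flag values are defined in the packet-format figure, which is not reproduced in the text, so the frame layout, the flag bytes `0x7E`/`0x7F` and the sum-modulo-256 checksum used here are assumptions for illustration only.

```python
from typing import Optional, Tuple

# Hypothetical values: the real flags and field order come from the
# packet-format figure, which the text does not reproduce.
START_FLAG = 0x7E
END_FLAG = 0x7F


def checksum(data: bytes) -> int:
    """Assumed checksum: sum of all bytes, modulo 256."""
    return sum(data) & 0xFF


def build_packet(dest: int, ptype: int, payload: bytes = b"") -> bytes:
    """Assemble START | dest | type | payload | checksum | END."""
    body = bytes([dest, ptype]) + payload
    return bytes([START_FLAG]) + body + bytes([checksum(body), END_FLAG])


def accept_packet(raw: bytes, my_id: int) -> Optional[Tuple[int, bytes]]:
    """Return (type, payload) if the packet is intact and addressed to this
    card; return None to mean 'discard it and keep listening'."""
    start = raw.find(bytes([START_FLAG]))
    end = raw.rfind(bytes([END_FLAG]))
    if start < 0 or end - start < 4:   # a flag is missing or the frame is too short
        return None
    body, received = raw[start + 1:end - 1], raw[end - 1]
    if body[0] != my_id:               # addressed to another card
        return None
    if checksum(body) != received:     # corrupted in transit
        return None
    return body[1], body[2:]
```

A real implementation would also need some escaping rule so that flag values appearing in the payload or in line noise are not mistaken for frame boundaries; that is left out here.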
If all of the above checks pass, the aggregation card performs the operation corresponding to the packet type (collect data, power on, power off, and so on). For a data collection request, it sends the complete set of rack data held in its buffer to the BSP, in the following format:
Figure C0214192900152
Figure C0214192900153
While waiting, if the BSP receives no reply from the aggregation card within a certain time, it resends the command. After receiving a reply from the aggregation card, it first checks the format of the packet; if the format is wrong, it resends the command and waits for a new reply. If the check passes, it extracts the information from the packet and stores it in a buffer that the BSP itself maintains.
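The resend-on-timeout behaviour of the BSP can be sketched as a small loop. The text does not say how long the BSP waits or whether it ever gives up, so `timeout_s` and `max_attempts` are assumptions, and `send`, `receive` and `is_valid` are hypothetical callables standing in for the serial-port code.

```python
from typing import Callable, Optional


def request_with_retry(
    send: Callable[[bytes], None],
    receive: Callable[[float], Optional[bytes]],
    is_valid: Callable[[bytes], bool],
    command: bytes,
    timeout_s: float = 1.0,
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Send a command and return the first well-formed reply, or None if
    every attempt timed out or produced a malformed packet."""
    for _ in range(max_attempts):
        send(command)
        reply = receive(timeout_s)  # expected to return None on timeout
        if reply is not None and is_valid(reply):
            return reply            # caller extracts the data into the BSP buffer
        # timed out or malformed: loop around and resend the command
    return None
```

Bounding the number of attempts is a design choice added here so that an unplugged card cannot block the BSP indefinitely.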
If what the BSP receives is the aggregation card's reply to a power-on/power-off operation, the reply must be relayed to the CMS (because the CMS is the initiator of the power-on/power-off operation).
Figure C0214192900161
From the state value in the packet, the BSP knows whether the aggregation card executed the operation successfully. Whether or not the operation succeeded, the BSP returns the execution result to the CMS.
The data collection and alarm processes between the capture card and the aggregation card are described below.
1. The data collection process is as follows: the aggregation card sends a node machine address, and the capture card whose address matches replies with the node machine address. The aggregation card sends the data collection command (command type 0x03); on receiving it, the capture card replies with command type 0x03. The aggregation card sends command type 0x03 again, and the capture card then sends the requested data.
Transmission format of the data collected by the capture card:
node machine address + CPU voltage + CPU1 temperature + CPU1 fan + CPU2 temperature + CPU2 fan + chassis temperature + system fan 1 + system fan 2 + node machine power supply 5V + node machine power supply 3.3V + node machine power supply 12V + checksum;
Including the checksum, each node machine transmits 13 bytes of data.
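A 13-byte record in this layout could be decoded positionally as in the sketch below. The field names are made up for the example, the values are left as raw byte readings because the text gives no units or scaling, and the checksum rule (sum of the first 12 bytes modulo 256) is an assumption, since the text does not define it.

```python
from typing import Dict

# One name per byte, in transmission order; the names are illustrative.
FIELDS = (
    "node_address", "cpu_voltage", "cpu1_temp", "cpu1_fan",
    "cpu2_temp", "cpu2_fan", "chassis_temp", "system_fan1",
    "system_fan2", "power_5v", "power_3v3", "power_12v",
)
RECORD_LEN = len(FIELDS) + 1  # 12 data bytes + 1 checksum byte = 13


def parse_record(record: bytes) -> Dict[str, int]:
    """Decode one capture-card record into named raw readings."""
    if len(record) != RECORD_LEN:
        raise ValueError(f"expected {RECORD_LEN} bytes, got {len(record)}")
    *values, received = record
    if sum(values) & 0xFF != received:  # assumed checksum rule
        raise ValueError("checksum mismatch")
    return dict(zip(FIELDS, values))
```

Because meaning is carried by position alone, both ends must agree on the field order in advance; changing the order requires changing `FIELDS` on the receiving side.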
2. Capture card alarm (fault display)
The aggregation card sends a node machine address, and the capture card whose address matches replies with the node machine address. The aggregation card sends the fault display command (command type 0x01); on receiving it, the capture card replies with command type 0x01. The aggregation card sends the fault display command once more; on receiving it, the capture card starts its alarm device (lighting or flashing the corresponding fault display lamp) and returns command type 0x01. If an error occurs during communication, the card returns to its initial state.
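Both exchanges follow the same pattern: address, command, repeated command, then action. A capture-card-side state machine for it might be sketched as follows. The sketch assumes that the address and the command type each travel as a single byte, which the text does not state, and the class and state names are hypothetical.

```python
from typing import Optional

CMD_FAULT_DISPLAY = 0x01
CMD_COLLECT = 0x03


class CaptureCard:
    """Capture-card side of the address / command / confirm handshake."""

    def __init__(self, address: int, record: bytes) -> None:
        self.address = address
        self.record = record      # latest collected data, ready to send
        self.alarm_on = False
        self._state = "idle"
        self._pending: Optional[int] = None

    def _reset(self) -> None:
        self._state, self._pending = "idle", None

    def on_byte(self, value: int) -> Optional[bytes]:
        """Handle one byte from the aggregating card; return the reply, if any."""
        if self._state == "idle":
            if value == self.address:
                self._state = "addressed"
                return bytes([self.address])   # echo the node machine address
            return None                        # meant for another card
        if self._state == "addressed" and value in (CMD_COLLECT, CMD_FAULT_DISPLAY):
            self._state, self._pending = "confirmed", value
            return bytes([value])              # echo the command type
        if self._state == "confirmed" and value == self._pending:
            self._reset()
            if value == CMD_COLLECT:
                return self.record             # send the requested data
            self.alarm_on = True               # light or flash the fault lamp
            return bytes([CMD_FAULT_DISPLAY])
        self._reset()                          # protocol error: back to the initial state
        return None
```

Requiring the command twice before acting means a single corrupted byte on the bus cannot by itself trigger a data transfer or an alarm.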
Obviously, those skilled in the art can adjust or change the data format or the number of bytes in the preceding examples to suit different circumstances or conventions. In addition, the node machine information collecting device is responsible for collecting hardware information, including temperature, voltage, fan speed and so on; it sends the data to the information aggregating apparatus in the agreed protocol format, and the aggregating apparatus in turn sends it to the monitoring device. The monitoring device then interprets the data according to the order specified by the protocol: for example, the first datum is the CPU voltage and the second is the fan speed, and this order determines the meaning of each datum. The type of each item of information may thus be stated explicitly in the transmitted content, or only the data may be transmitted, with the type implied by the communication protocol.
It is also easy to see from the above that the collection and monitoring of the software information of all node types, and of the hardware information of IO nodes, is similar to the collection and monitoring of hardware information described above, except that it is carried out over Ethernet between the NA program (another kind of collecting device), the NP program (another kind of aggregating apparatus) and the monitoring host. As for the internal structure of the NA program (collecting device) and the NP program (aggregating apparatus), those skilled in the art can readily program them according to the invention, so they are not described further here.
In this way, the software and hardware information of all types of nodes is collected in different ways, aggregated to the monitoring host, and processed by it uniformly. The invention thus collects information from every type of node in the cluster.
From the above description it will be clear to those skilled in the art that, according to the invention, the software and hardware information of the different types of nodes is aggregated to the monitoring host after collection and processed uniformly by the monitoring host. The whole cluster is thereby monitored as a single object, which improves the reliability of cluster operation and, on that basis, can further extend the range of applications of the cluster.
Finally, it should be noted that the above embodiments only illustrate the invention and do not limit the technical solutions it describes. Therefore, although this specification describes the invention in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the invention can still be modified or equivalently replaced; all technical solutions and improvements that do not depart from the spirit and scope of the invention shall be covered by the claims of the invention.

Claims (14)

1. A cluster node information collecting and monitoring system, the cluster having at least one node group, the system comprising:
An information collecting device on each node machine, for collecting the hardware information and software information of the node machine and delivering the collected hardware information to an information aggregating apparatus;
An information aggregating apparatus for the at least one node group, for aggregating the hardware information collected by the information collecting device of each node machine and delivering it to a monitoring device; and
A monitoring device, for receiving and gathering the hardware information of each node machine of the cluster from the information aggregating apparatus, and receiving and gathering the software information of each node machine of the cluster from the information collecting devices.
2. The cluster node information collecting and monitoring system of claim 1, wherein the information aggregating apparatus is also used to directly collect information relating to the at least one node group as a whole.
3. The cluster node information collecting and monitoring system of claim 1, further comprising a communication line.
4. The cluster node information collecting and monitoring system of claim 3, wherein the communication line comprises a serial bus.
5. The cluster node information collecting and monitoring system of claim 3, wherein the communication line comprises Ethernet.
6. The cluster node information collecting and monitoring system of claim 4, wherein the serial bus further comprises a first serial bus for connecting the information collecting devices of the node machines of the group to the information aggregating apparatus.
7. The cluster node information collecting and monitoring system of claim 6, wherein the serial bus further comprises a second serial bus for connecting the information aggregating apparatus for the node machines of the group to the monitoring device.
8. The cluster node information collecting and monitoring system of claim 5, wherein the Ethernet connects each node machine of the group to the monitoring device.
9. A method of collecting cluster node information, the cluster having a plurality of node machines and being provided with a cluster monitoring device, the method comprising the steps of:
Dividing the node machines of the cluster into at least one node group,
Collecting hardware information and software information on each node machine in the node group by means of an information collecting device,
Aggregating, within the node group, the hardware information collected on each node machine by means of an information aggregating apparatus,
Sending the aggregated hardware information of each node machine in the group to the cluster monitoring device, and sending the software information from the information collecting devices to the cluster monitoring device.
10. The method of collecting cluster node information of claim 9, further comprising directly collecting and generating information relating to the group as a whole, and aggregating it in the aggregating step.
11. A node machine information aggregating apparatus for the cluster node information collecting and monitoring system of claim 1, comprising at least a central processing unit, one or more communication interfaces for communicating with the node machine collecting devices and the monitoring host, and a storage unit, the communication interfaces being connected to the central processing unit and the central processing unit being connected to the storage unit.
12. The node machine information aggregating apparatus of claim 11, further provided with a connecting interface for connecting a sensor that detects the rack power supply, the connecting interface being connected to the analog-to-digital conversion input of the central processing unit.
13. The node machine information aggregating apparatus of claim 11, further provided with a device for setting the ID address of the apparatus, the device being connected to the data bus of the central processing unit.
14. The node machine information aggregating apparatus of claim 11, further provided with indicator lamps for showing the working state and alarm information, the indicator lamps being connected to the central processing unit.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021419299A CN100410954C (en) 2002-06-10 2002-08-27 Method and system for collecting sofeware and hardware information in cluster node

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
CN02237849.9 2002-06-10
CN02237849 2002-06-10
CN022378499 2002-06-10
CN021256268 2002-07-25
CN02125626 2002-07-25
CN02125626.8 2002-07-25
CNB021419299A CN100410954C (en) 2002-06-10 2002-08-27 Method and system for collecting sofeware and hardware information in cluster node

Publications (2)

Publication Number Publication Date
CN1466095A CN1466095A (en) 2004-01-07
CN100410954C true CN100410954C (en) 2008-08-13

Family

ID=34198440

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021419299A Expired - Fee Related CN100410954C (en) 2002-06-10 2002-08-27 Method and system for collecting sofeware and hardware information in cluster node

Country Status (1)

Country Link
CN (1) CN100410954C (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1331042C (en) * 2004-03-29 2007-08-08 联想(北京)有限公司 Message service device and method for console of machine group mornitoring-controlling system
CN101834876B (en) * 2010-05-27 2012-11-21 哈尔滨工业大学 Distributed semi-physics simulation system based on Bluetooth, database and UDP protocol and data summarization and distribution method thereof
CN102313506B (en) * 2010-07-09 2013-12-25 联想(北京)有限公司 Method for detecting physical position of equipment, cabinet and equipment
CN103095739A (en) * 2011-10-27 2013-05-08 英业达科技有限公司 Cabinet server system and node communication method thereof
CN102394779A (en) * 2011-11-07 2012-03-28 百度在线网络技术(北京)有限公司 Centralized management system of racks and method thereof
CN103207825A (en) * 2012-01-13 2013-07-17 百度在线网络技术(北京)有限公司 Method and device for managing faults of entire equipment cabinet
CN102693166B (en) * 2012-05-10 2015-04-22 华为技术有限公司 Method, device and system for processing information
US9529583B2 (en) * 2013-01-15 2016-12-27 Intel Corporation Single microcontroller based management of multiple compute nodes
CN103516553A (en) * 2013-10-22 2014-01-15 浪潮电子信息产业股份有限公司 Rack-mounted server information management and design method based on transparent network switch
CN104156297A (en) * 2014-08-07 2014-11-19 浪潮(北京)电子信息产业有限公司 Warning method and device
CN105354129A (en) * 2015-12-15 2016-02-24 山东海量信息技术研究院 Node management and asset management method for high-end fault-tolerant server
CN107391342B (en) * 2017-07-21 2021-01-15 苏州浪潮智能科技有限公司 Database all-in-one machine and monitoring method thereof
CN109117350A (en) * 2018-09-20 2019-01-01 北京北信源信息安全技术有限公司 Alarm method, device and the server of automatic monitoring computer software and hardware

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5920691A (en) * 1989-07-31 1999-07-06 Kabushiki Kaisha Toshiba Computer network system for collecting distributed management information
WO2001042988A2 (en) * 1999-11-15 2001-06-14 Transcom Software Inc. Computer network information management system and method
US20010029474A1 (en) * 2000-04-07 2001-10-11 Noriaki Yada Asset management system and asset management method
WO2002021276A1 (en) * 2000-09-08 2002-03-14 Goahead Software Inc. A system and method for managing clusters containing multiple nodes

Also Published As

Publication number Publication date
CN1466095A (en) 2004-01-07

Similar Documents

Publication Publication Date Title
CN100410954C (en) Method and system for collecting sofeware and hardware information in cluster node
CN100463822C (en) Computer interlock system
CN101369927B (en) Universal remote automatic data acquisition system
CN100339835C (en) Method and system for cluster fault localization and alarm
CN201623722U (en) Supervising platform for running and maintaining information security of electric power secondary system
CN104123134A (en) Intelligent electricity use data management method and system based on AMI and J2EE
CN104637265A (en) Dispatch-automated multilevel integration intelligent watching alarming system
CN103295155A (en) Security core service system monitoring method
CN106469328A (en) A kind of intelligent management system and approaches to IM
CN110932887A (en) BMC debugging method, system and device
CN106469329A (en) A kind of intelligent management system and approaches to IM
CN104516801A (en) Substation computer monitoring system and method
CN112327777A (en) Data acquisition system and method
CN110430265A (en) A kind of method and device obtaining server and inter-exchange corresponding relationship
CN107918273A (en) A kind of application system of mobile technology of Internet of things on halved belt sorter
CN108107292A (en) The business datum monitoring system and method for Electric Power Quality On-line Monitor System
CN101707503B (en) Embedded method and device for controlling automatic positioning of channel communication failure
CN103152274A (en) Energy efficiency and safety data wireless network routing device, energy efficiency and safety data wireless network routing system and method applying same
CN104504537B (en) A kind of transformer station's AC power monitoring system and method
CN107404416A (en) A kind of visualizing monitor method of power information acquisition system
CN111953525A (en) Special equipment operation and maintenance monitoring system
CN106385332A (en) Operation data collection and fault response method based on WIFI
CN106130186A (en) Small-sized photovoltaic power station data monitoring system
CN110640266A (en) Electric welding remote monitering system
CN110995525A (en) Router detection method based on maintenance matrix

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080813

Termination date: 20200827