CN104503867A - Method for automatic disaster recovery after drop-off and reconnection of extension cabinet - Google Patents

Info

Publication number
CN104503867A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410817445.9A
Other languages
Chinese (zh)
Other versions
CN104503867B (en)
Inventor
段舒文
李艳国
王道邦
王清翰
罗华
周泽湘
方仑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Original Assignee
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority to CN201410817445.9A
Publication of CN104503867A
Application granted
Publication of CN104503867B
Legal status: Active

Abstract

The invention relates to a method for automatic disaster recovery after the drop-off and reconnection of an extension cabinet, and belongs to the technical field of computer storage. The method comprises the following steps: 1, scanning hardware equipment, establishing a hardware topology graph, creating a circular buffer queue, and saving the state identification information of the extension cabinet; 2, writing data into the buffer queue at the same time as it is written into the extension cabinet, updating the state identification information if the data is written successfully, checking whether the extension cabinet is normally connected if the write fails, reading the data in the circular buffer queue and rewriting it if the extension cabinet is normally connected, and otherwise saving the related information; 3, after an extension cabinet is connected to the system, checking whether it is original equipment of the system, and if so, performing storage and configuration recovery according to the related information and importing the data buffered before the drop-off, and otherwise creating a circular buffer queue for the newly added equipment and saving its state identification information. Compared with the prior art, the method improves the usability and data security of redundant array of independent disks products, reduces the number of field maintenance personnel needed, and lowers the maintenance cost of the product.

Description

Method for automatic disaster recovery after drop-off and reconnection of an extension cabinet
Technical field
The present invention relates to a method for automatic disaster recovery after an extension cabinet drops off and is reconnected, and belongs to the technical field of computer storage.
Background
Data storage today commonly uses disk arrays, i.e. a Redundant Array of Independent Disks (RAID). A RAID compensates for the limited capacity and modest performance of a single disk while improving data security, and is therefore widely used. However, even a main cabinet loaded with multiple disks does not provide enough storage space to cope with explosive data growth, so extension cabinets were introduced as capacity extension units of the main cabinet. An extension cabinet is therefore not an independent component and cannot operate apart from the main cabinet. The introduction of extension cabinets also brings new problems to the storage system.
In the prior art, when an extension cabinet drops off during use and is then reconnected to the main cabinet, the system can only discover the storage media in the extension cabinet; it cannot automatically restore the corresponding file-level and block-level storage export services. The customer must either interrupt the storage service of the main cabinet and restart the whole system, or contact a professional to restore the configuration manually, before the extension cabinet can provide storage service again. Moreover, data lost while the extension cabinet was dropping off cannot be retrieved.
Summary of the invention
The object of the invention is to solve two problems of existing systems: data being written is lost when an extension cabinet drops off, and services cannot be restored automatically after reconnection. To this end, the invention provides a method for automatic disaster recovery after an extension cabinet drops off and is reconnected.
The object of the invention is achieved through the following technical solution:
A method for automatic disaster recovery after drop-off and reconnection of an extension cabinet, comprising the following steps:
Step 1: start a daemon process. According to the hardware connections, the daemon creates a hardware topology graph and circular buffer queues, and collects and saves the device identification information of each extension cabinet.
In step 1, after the storage system is powered on, the controller driver is loaded and starts the daemon. The daemon scans the backplanes and disks connected directly or indirectly to the HBA card and creates the hardware topology graph under the HBA card. Treating each extension cabinet as a unit, it creates one circular buffer queue per extension cabinet, and collects and saves the cabinet's device identification information.
The device identification information of an extension cabinet includes, for example, the last write time of each disk in the cabinet and bitmap information indicating whether each storage medium in the cabinet is in use.
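As a rough illustration of what such a record might hold, the following sketch models the two items named above. The class name, the field names, and the choice of one bit per storage slot are assumptions made for the example; the patent does not specify a layout.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceIdentification:
    """Hypothetical identification record for one extension cabinet."""
    # Last write time per disk, keyed by disk serial number (assumed key).
    last_write_time: dict[str, float] = field(default_factory=dict)
    # Bit i is 1 when storage slot i is in use (assumed bitmap encoding).
    used_bitmap: int = 0

    def record_write(self, disk_sn: str, slot: int, when: float) -> None:
        """Update the record after a successful write to one disk."""
        self.last_write_time[disk_sn] = when
        self.used_bitmap |= 1 << slot

    def slot_in_use(self, slot: int) -> bool:
        """Report whether the bitmap marks the given slot as used."""
        return bool((self.used_bitmap >> slot) & 1)


info = DeviceIdentification()
info.record_write("DISK-0001", slot=2, when=1000.0)
```

Two such records can be compared with `==`, which is one simple way to perform the consistency check that the later steps rely on.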
Step 2: protect data consistency when data is written to the extension cabinet.
In step 2, after the daemon receives a request from the system to write data to an extension cabinet, it writes the data to disk and, at the same time, into the circular buffer queue created for that cabinet in step 1. When the extension cabinet reports a successful write, the written data is deleted from the buffer queue and the cabinet's device identification information is updated. If the write to the extension cabinet fails, step 3 is performed.
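The write path of step 2 can be sketched as follows. This is a minimal model under stated assumptions: `FakeCabinet`, `write_with_protection`, the `(lba, data, timestamp)` tuple layout, and the use of a `deque` as the buffer queue are all hypothetical stand-ins, not the patented implementation.

```python
from collections import deque


class FakeCabinet:
    """Stand-in for an extension cabinet; `online` controls write success."""

    def __init__(self):
        self.online = True
        self.blocks = {}

    def write(self, lba, data):
        if not self.online:
            return False
        self.blocks[lba] = data
        return True


def write_with_protection(cabinet, queue, ident, lba, data, now):
    """Write to the cabinet and to its buffer queue at the same time.

    Returns True on success. On failure the entry stays queued so that
    step 3 can re-issue it or keep it for a later reconnection.
    """
    queue.append((lba, data, now))      # buffered copy of the write
    if cabinet.write(lba, data):
        queue.pop()                     # written correctly: drop the copy
        ident["last_write_time"] = now  # update identification information
        return True
    return False                        # caller proceeds to step 3


cab, q, ident = FakeCabinet(), deque(), {}
ok = write_with_protection(cab, q, ident, 0, b"a", 1.0)
cab.online = False                      # simulate a drop-off
failed = write_with_protection(cab, q, ident, 1, b"b", 2.0)
```

After the simulated drop-off, the second write remains in `q`, which is the state step 3 starts from.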
Step 3: when a write to the extension cabinet fails, check whether the extension cabinet is still connected normally.
In step 3, the connection of the extension cabinet is checked. If the extension cabinet is connected normally, the data held in the circular buffer queue from step 2 is re-issued; after it is written successfully, the cabinet's device identification information is updated and the data is deleted from the cabinet's circular buffer queue. If the extension cabinet is found to have dropped off, its device identification information and its topology graph (a subgraph of the hardware topology graph saved in step 1) are saved, and the hardware topology graph is updated.
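The two branches of step 3 might look like the sketch below. The function name, the callback parameters `is_connected` and `reissue`, the dictionary-based topology, and the string return values are assumptions introduced for illustration only.

```python
def handle_write_failure(cabinet_sn, is_connected, reissue, queue,
                         topology, ident, saved):
    """Step 3 sketch: retry if connected, otherwise save state for a rejoin.

    topology: dict mapping cabinet serial number -> list of disk serials.
    saved:    dict holding the state of dropped cabinets, keyed by serial.
    """
    if is_connected(cabinet_sn):
        # Cabinet still attached: re-issue everything held in the queue.
        while queue:
            lba, data, stamp = queue[0]
            if not reissue(lba, data):
                return "retry-failed"
            ident["last_write_time"] = stamp
            queue.pop(0)
        return "rewritten"
    # Cabinet dropped: keep its identification info and topology subgraph,
    # and remove the cabinet from the live topology.
    saved[cabinet_sn] = {
        "ident": dict(ident),
        "subgraph": topology.pop(cabinet_sn, []),
        "queue": queue,
    }
    return "saved"


topology = {"CAB1": ["D1", "D2"]}
saved = {}
outcome = handle_write_failure(
    "CAB1", lambda sn: False, lambda lba, data: True,
    [(1, b"b", 2.0)], topology, {"last_write_time": 1.0}, saved)
```

Keeping the queue together with the saved subgraph means the reconnection steps can find everything for one cabinet under a single key.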
Step 4: the daemon periodically checks the freshness of the cached data.
In step 4, the daemon periodically scans the cached data and compares the time at which each item was saved with the current system time. Cached data older than a given time is discarded, and the space used by the extension cabinet's circular buffer queue, device identification information, and topology graph is released.
Preferably, a default expiry time T is set for cached data and stored in a configuration file, where the user can change it.
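A minimal sketch of the expiry check is given below. The INI section `cache`, the key `expiry_seconds`, the default of one hour, and the `saved_at` field are all assumptions; the patent only states that a default T exists in a configuration file and can be changed by the user.

```python
import configparser
import time

DEFAULT_T = 3600.0  # assumed default in seconds; the patent gives no value


def load_expiry(config_text):
    """Read T from INI-style text; section and key names are assumptions."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getfloat("cache", "expiry_seconds", fallback=DEFAULT_T)


def purge_expired(saved, expiry, now=None):
    """Drop the saved state of every cabinet whose cache exceeds `expiry`."""
    now = time.time() if now is None else now
    expired = [sn for sn, state in saved.items()
               if now - state["saved_at"] > expiry]
    for sn in expired:
        # Releases the queue, identification info and topology subgraph.
        del saved[sn]
    return expired


expiry = load_expiry("[cache]\nexpiry_seconds = 60\n")
saved = {"CAB1": {"saved_at": 0.0}, "CAB2": {"saved_at": 90.0}}
dropped = purge_expired(saved, expiry, now=100.0)
```

In a daemon, `purge_expired` would be called from a periodic timer; the explicit `now` argument exists only to make the example deterministic.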
Step 5: restore services when an extension cabinet is connected to the system.
In step 5, the information of the newly attached extension cabinet is scanned and matched against the extension cabinet topology graphs saved in step 3. If the extension cabinet is recognized as a new device, the hardware topology graph is updated, a circular buffer queue is created for the new cabinet, and its device identification information is collected and saved. If the extension cabinet is recognized as one that originally belonged to the system, the hardware topology graph is updated and the related storage export services are configured automatically from the superblock information and configuration information on the cabinet's disks, restoring those services to their state before the cabinet dropped off.
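The classification in step 5 could be sketched like this. Matching on the cabinet serial number plus its set of disk serial numbers is an assumed criterion, since the text only says that the saved topology graph is matched; the function and parameter names are likewise hypothetical.

```python
def on_cabinet_attached(cabinet_sn, disk_sns, topology, saved, queues):
    """Step 5 sketch: classify an attached cabinet and update the topology.

    saved holds the state of dropped cabinets, including the topology
    subgraph (here a list of disk serial numbers) stored in step 3.
    """
    topology[cabinet_sn] = list(disk_sns)
    state = saved.get(cabinet_sn)
    if state is not None and set(state["subgraph"]) == set(disk_sns):
        # Original cabinet: a real system would now reconfigure the export
        # services from the on-disk superblock and configuration data.
        return "original"
    # New device: give it a fresh buffer queue.
    queues[cabinet_sn] = []
    return "new"


saved = {"CAB1": {"subgraph": ["D1", "D2"]}}
topology, queues = {}, {}
first = on_cabinet_attached("CAB1", ["D2", "D1"], topology, saved, queues)
second = on_cabinet_attached("CAB9", ["D7"], topology, saved, queues)
```

The actual service reconfiguration is left as a comment because it depends on the superblock format, which the patent does not describe.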
Step 6: restore data once the extension cabinet's services have been restored.
In step 6, the current device identification information of the extension cabinet is compared with the recorded device identification information. If they are consistent, the data in the corresponding circular buffer queue is restored to the extension cabinet and the device identification information is updated. If they are inconsistent, the data in the circular buffer queue is discarded, and the device identification information is collected again and saved.
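The consistency check and replay of step 6 might be sketched as follows. The function name, the dictionary form of the identification information, and the return convention (count of replayed writes, or -1 when the cache is discarded) are assumptions for the example.

```python
def restore_cached_data(current_ident, saved_ident, queue, write):
    """Step 6 sketch: replay buffered writes only if the cabinet is unchanged.

    Returns the number of writes replayed, or -1 if the cache was discarded.
    """
    if current_ident != saved_ident:
        queue.clear()            # cabinet changed while offline: drop cache
        return -1
    replayed = 0
    while queue:
        lba, data, stamp = queue[0]
        write(lba, data)
        current_ident["last_write_time"] = stamp
        queue.pop(0)             # remove only after the write was issued
        replayed += 1
    return replayed


written = {}
pending = [(1, b"b", 2.0), (2, b"c", 3.0)]
ident = {"last_write_time": 1.0}
count = restore_cached_data(ident, {"last_write_time": 1.0},
                            pending, written.__setitem__)
```

The comparison guards against replaying stale writes onto a cabinet whose disks were modified elsewhere while it was disconnected.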
Beneficial effects
The invention provides a method for automatic disaster recovery after drop-off and reconnection of an extension cabinet. Whatever caused the extension cabinet to lose its connection (a cable fault, an operator error, and so on), if the cabinet is reconnected to the host within the specified time and the data in the cabinet is recoverable, then once the system recognizes the cabinet it restores the cabinet's storage export services automatically from the cabinet's original superblock information and related configuration information, and recovers the data that was cached when the cabinet dropped off, all without interrupting the storage service of the main cabinet.
Automatically restoring the extension cabinet's storage and configuration after reconnection, and recovering the data cached at drop-off, improves the ease of use of the product and the security of its data and reduces the number of on-site maintenance visits, which lowers the maintenance cost of the product and improves customer satisfaction.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture of an existing storage system;
Fig. 2 is a flow chart of data writing in the method of the invention;
Fig. 3 is a flow chart of extension cabinet attachment.
Embodiment
The invention is described in detail below with reference to the drawings and an embodiment, together with the technical problem it solves and its beneficial effects. The described embodiment is intended only to aid understanding of the invention and does not limit it in any way.
Fig. 1 shows the architecture of an existing storage system. Storage media in the main cabinet are connected to the kernel through the main cabinet's backplane; storage media in an extension cabinet are connected to the kernel through the cascade of extension cabinets (multi-level cascading is possible). To meet different application requirements, the kernel uses the controller driver, the RAID driver, and the volume driver to create NAS volumes, iSCSI volumes, or FC volumes on the storage media and starts the corresponding service processes. Cloud clients access the storage system over the network; on receiving a client request, the storage system provides the export service that matches the protocol the client uses.
The method is described below by walking through system start-up, power loss of an extension cabinet, and subsequent re-powering of the extension cabinet.
Step 1: after the system is powered on, the controller driver is loaded automatically and starts the daemon. The daemon scans the hardware connected directly or indirectly to the HBA card, creates the hardware topology graph with the extension cabinet as the unit, creates one circular buffer queue for each extension cabinet, and collects and saves each cabinet's device identification information.
When the hardware topology graph is created, the hardware serial number (SN) can be used to distinguish hardware devices.
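One simple way to hold such a topology is a nested map keyed by serial number, as in the sketch below. The function name and the `(hba, cabinet_sn, disk_sn)` tuple format of the scan results are assumptions; a real scan would obtain these values from the controller driver.

```python
def build_topology(scan_results):
    """Build a nested map: HBA -> cabinet SN -> sorted list of disk SNs.

    scan_results is an assumed list of (hba, cabinet_sn, disk_sn) tuples
    such as a hardware scan might produce.
    """
    topology = {}
    for hba, cabinet_sn, disk_sn in scan_results:
        disks = topology.setdefault(hba, {}).setdefault(cabinet_sn, set())
        disks.add(disk_sn)
    # Sort disk serials so that two scans of the same hardware compare equal.
    return {hba: {cab: sorted(disks) for cab, disks in cabs.items()}
            for hba, cabs in topology.items()}


graph = build_topology([
    ("hba0", "CAB1", "D2"),
    ("hba0", "CAB1", "D1"),
    ("hba0", "CAB2", "D3"),
])
```

Because every node is identified by its SN, the entry for one cabinet (`graph["hba0"]["CAB1"]`) can serve as the subgraph that is saved when that cabinet drops off.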
The circular buffer queue can adjust its size dynamically according to actual usage; it has a head pointer and a tail pointer.
A RAID, a volume group, and logical volumes are created on the extension cabinet and data is written to it. The processing flow for writing data to the extension cabinet is described next.
Fig. 2 shows the flow chart of data writing, corresponding to steps 2 and 3.
According to step 2, when the system writes data to the extension cabinet, the data is simultaneously written into the circular buffer queue created for that cabinet in step 1. When the extension cabinet reports a successful write, the written data is deleted from the buffer queue (for a circular queue this can be done by assigning the value of the tail pointer to the head pointer) and the cabinet's device identification information is updated. If the write to the extension cabinet fails, the data is kept in the circular buffer queue.
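A circular queue with head and tail indices, dynamic growth, and deletion by assigning the tail to the head can be sketched as follows. The class name, the doubling growth policy, and the separate entry counter are assumptions chosen to keep the example short.

```python
class RingQueue:
    """Circular buffer queue with head and tail indices that grows when full."""

    def __init__(self, capacity=4):
        self._slots = [None] * capacity
        self._head = 0    # index of the oldest entry
        self._tail = 0    # index one past the newest entry
        self._count = 0

    def push(self, item):
        if self._count == len(self._slots):
            self._grow()
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % len(self._slots)
        self._count += 1

    def _grow(self):
        # Copy entries in order into a buffer of twice the size.
        old = list(self)
        self._slots = old + [None] * max(len(old), 1)
        self._head, self._tail = 0, len(old)

    def clear(self):
        # Delete all written data by assigning the tail value to the head.
        self._head = self._tail
        self._count = 0

    def __iter__(self):
        for i in range(self._count):
            yield self._slots[(self._head + i) % len(self._slots)]

    def __len__(self):
        return self._count


ring = RingQueue(capacity=2)
for block in (1, 2, 3):       # the third push forces the buffer to grow
    ring.push(block)
before = list(ring)
ring.clear()
ring.push(4)
```

Clearing by pointer assignment costs constant time regardless of how much data was buffered, which is presumably why the text suggests it.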
To illustrate the handling of a drop-off, the power cable of the extension cabinet is cut while data is being written, causing the cabinet to drop off.
When a write to the extension cabinet fails, the cause may be that the cabinet has suddenly dropped off, or the cabinet may still be online and the write failed for another reason. It is therefore necessary to check whether the extension cabinet is connected normally.
According to step 3, if the extension cabinet is connected normally, the data held in the circular buffer queue from step 2 is re-issued; after it is written successfully, the cabinet's device identification information is updated and the data is deleted from the cabinet's circular buffer queue. If the extension cabinet is found to have dropped off, its device identification information is updated, the device identification information and the cabinet's topology graph are saved, and the hardware topology graph described in step 1 is updated.
Because it is uncertain whether the extension cabinet will ever be reconnected to the main cabinet, the saved cached data must eventually be cleaned up.
According to step 4, the daemon periodically scans the cached data and compares the time at which each item was saved with the current system time. Cached data older than a given time is discarded, and the space used by the extension cabinet's circular buffer queue, device identification information, and topology graph is released.
Preferably, a default expiry time T is set for cached data and stored in a configuration file, where the user can change it.
Fig. 3 shows the processing flow after an extension cabinet joins the system, corresponding to steps 5 and 6.
According to step 5, the information of the newly attached extension cabinet is scanned and matched against the extension cabinet topology graphs saved in step 3. If the extension cabinet is recognized as a new device, the hardware topology graph is updated, a circular buffer queue is created for the new cabinet, and its device identification information is collected and saved. If the extension cabinet is recognized as one that originally belonged to the system, the related storage export services are configured automatically from the superblock information and configuration information on the cabinet's disks, restoring those services to their state before the cabinet dropped off.
Once storage and configuration have been restored successfully, the method attempts to recover the cached data that had not been written when the extension cabinet dropped off. According to step 6, the current identification information of the extension cabinet is compared with the recorded device identification information. If they are consistent, the data in the corresponding circular buffer queue is restored to the extension cabinet and the device identification information is updated. If they are inconsistent, the data in the circular buffer queue is discarded, and the device identification information is collected again and saved.
The specific description above further explains the object, technical solution, and beneficial effects of the invention. It should be understood that the above is only a specific embodiment of the invention and does not limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A method for automatic disaster recovery after drop-off and reconnection of an extension cabinet, characterized by comprising the following steps:
Step 1: start a daemon process; according to the hardware connections, the daemon creates a hardware topology graph and a circular buffer queue, and searches for and saves the device identification information of the extension cabinet;
Step 2: protect data consistency when data is written to the extension cabinet;
Step 3: when a write to the extension cabinet fails, check whether the extension cabinet is connected normally and handle the result accordingly, as follows:
if the extension cabinet is connected normally, re-issue the data held in the circular buffer queue from step 2 and, after it is written successfully, update the device identification information of the extension cabinet and delete the data from the extension cabinet's circular buffer queue; if the extension cabinet is found to have dropped off, save the device identification information and the topology graph of the dropped extension cabinet, and update the hardware topology graph;
Step 4: periodically check the freshness of the cached data and discard expired data;
Step 5: restore services when the extension cabinet is connected to the system;
Step 6: restore data once the services of the extension cabinet have been restored.
2. The method for automatic disaster recovery after drop-off and reconnection of an extension cabinet according to claim 1, characterized in that: when the hardware topology graph is created, the hardware serial number (SN) can be used to distinguish different hardware devices.
3. The method for automatic disaster recovery after drop-off and reconnection of an extension cabinet according to claim 1, characterized in that: the device identification information of the extension cabinet comprises the last write time of each disk in the extension cabinet and bitmap information indicating whether each storage medium in the extension cabinet is in use.
4. The method for automatic disaster recovery after drop-off and reconnection of an extension cabinet according to claim 1, characterized in that: the deletion of data from the extension cabinet's circular buffer queue described in step 3 can be implemented by providing the circular buffer queue with a head pointer and a tail pointer and, when data is deleted, assigning the value of the tail pointer to the head pointer.
5. The method for automatic disaster recovery after drop-off and reconnection of an extension cabinet according to claim 1, characterized in that step 4 is carried out as follows: periodically scan the cached data and compare the time at which each item was saved with the current system time; discard cached data older than the expiry time T, and release the space used by the extension cabinet's circular buffer queue, device identification information, and topology graph.
6. The method for automatic disaster recovery after drop-off and reconnection of an extension cabinet according to claim 5, characterized in that: a default value can be set for the expiry time T and stored in a configuration file, where the user can change it.
7. The method for automatic disaster recovery after drop-off and reconnection of an extension cabinet according to claim 1, characterized in that the service restoration of step 5 can be carried out as follows: scan the information of the newly attached extension cabinet and match it against the extension cabinet topology graph saved in step 3; if the extension cabinet is recognized as a new device, update the hardware topology graph, create a circular buffer queue for the new extension cabinet, and collect and save its device identification information; if the extension cabinet is recognized as one that originally belonged to the system, update the hardware topology graph, configure the related storage export services from the superblock information and configuration information on the extension cabinet's disks, and restore those services to their state before the extension cabinet dropped off.
8. The method for automatic disaster recovery after drop-off and reconnection of an extension cabinet according to claim 1, characterized in that the data restoration of step 6 can be carried out as follows: check whether the device identification information of the extension cabinet is consistent with the device identification information saved in step 3; if consistent, restore the data in the corresponding circular buffer queue to the extension cabinet and update the device identification information; if inconsistent, discard the data in the circular buffer queue, and collect and save the device identification information again.
CN201410817445.9A 2014-12-24 2014-12-24 Method for automatic disaster recovery after drop-off and reconnection of extension cabinet Active CN104503867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410817445.9A CN104503867B (en) 2014-12-24 2014-12-24 Method for automatic disaster recovery after drop-off and reconnection of extension cabinet

Publications (2)

Publication Number Publication Date
CN104503867A (en) 2015-04-08
CN104503867B (en) 2017-07-11

Family

ID=52945267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410817445.9A Active CN104503867B (en) Method for automatic disaster recovery after drop-off and reconnection of extension cabinet 2014-12-24 2014-12-24

Country Status (1)

Country Link
CN (1) CN104503867B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1832489A (en) * 2006-04-19 2006-09-13 杭州华为三康技术有限公司 Method for accessing object magnetic dish and system for extensing disk content
CN101141659A (en) * 2006-09-07 2008-03-12 国际商业机器公司 System and method for dynamic determination of system topology in a multiple building block server system
US20080184217A1 (en) * 2007-01-30 2008-07-31 Fujitsu Limited Storage system, storage unit, and method for hot swapping of firmware
CN101256526A (en) * 2008-03-10 2008-09-03 清华大学 Method for implementing document condition compatibility maintenance in inspection point fault-tolerant technique
CN102073458A (en) * 2009-11-19 2011-05-25 上海圣桥信息科技有限公司 Startup and shutdown time sequence control device for magnetic disk array storage system
CN102508793A (en) * 2011-10-11 2012-06-20 浪潮电子信息产业股份有限公司 Method for preventing disk storage system from expanding and matching disks

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148221A (en) * 2020-09-18 2020-12-29 北京浪潮数据技术有限公司 Method, device, equipment and storage medium for routing inspection of redundant array of disks
CN112148221B (en) * 2020-09-18 2024-02-13 北京浪潮数据技术有限公司 Method, device, equipment and storage medium for inspecting redundant array of inexpensive disks

Also Published As

Publication number Publication date
CN104503867B (en) 2017-07-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant