WO2015176455A1 - Hadoop-based hard disk damage handling method and device - Google Patents

Hadoop-based hard disk damage handling method and device

Info

Publication number
WO2015176455A1
WO2015176455A1 (PCT/CN2014/087477)
Authority
WO
WIPO (PCT)
Prior art keywords
hard disk
replacement
directory
data read
hadoop
Prior art date
Application number
PCT/CN2014/087477
Other languages
French (fr)
Chinese (zh)
Inventor
杨庆平
屠要峰
黄震江
李莹
张家明
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2015176455A1 publication Critical patent/WO2015176455A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing

Definitions

  • the present invention relates to the field of communications, and in particular to a Hadoop-based hard disk damage processing method and apparatus.
  • Hadoop, an open-source big-data storage and analytics platform, has become the de facto industry standard for handling big data.
  • the Hadoop platform consists of two important subsystems: HDFS (the Hadoop Distributed File System) and MapReduce (a parallel computing framework).
  • FIG. 1 is a schematic diagram of a platform architecture of Hadoop storage data in related art.
  • Hadoop is a highly fault-tolerant, multi-copy distributed cluster storage system suited to deployment on inexpensive machines, and it supports parallel data writes and reads across the multiple hard disks of each machine. With the growth of big data, data volumes have increased sharply. To reduce cost, enterprises deploy the Hadoop platform on inexpensive PC servers, each configured with at least 24 hard disks. The largest clusters already exceed 5,000 machines; with 24 disks per machine, such a cluster contains more than 100,000 hard disks, so disks fail essentially every day.
  • In the related art, when a hard disk is damaged, Hadoop proceeds as follows: once a disk in a device fails, the system can no longer write data to it; triggered by the abnormal event, Hadoop moves the failed disk into a damaged-disk list, and in subsequent operation no more data is written to the directory on which the damaged disk is mounted. While the system is running, the damaged-disk list is no longer accessed, i.e. the directories corresponding to damaged disks are not re-checked. After operation and maintenance personnel replace the broken disk with a new one and mount the new disk to the directory, the running system still does not consider the disk usable. Only after the hadoop process is restarted does the system re-check all data directories and let the new disk participate in the system.
  • the above hard disk damage processing has the following defects: (1) Service interruption: because the process must be restarted before the new disk can be used, service is interrupted during the restart, causing business losses. (2) High operation and maintenance cost: after a disk is damaged, not only hardware engineers but also software engineers are needed to restart and observe the Hadoop cluster, which greatly increases labor cost.
  • Therefore, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost.
  • To address this, the present invention provides a Hadoop-based hard disk damage processing method and device, so as to at least solve the problem that, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost.
  • a Hadoop-based hard disk damage processing method, including: detecting replacement-operation success information indicating that a Hadoop-based damaged hard disk has been replaced with a replacement hard disk; determining whether the replacement hard disk supports data read/write operations; and, if the determination result is yes, performing data read/write processing on the replacement hard disk.
  • preferably, before the replacement-operation success information is detected, the method further includes: detecting hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; and, according to the hard disk abnormality information, interrupting data read/write operations on the damaged hard disk.
  • preferably, before determining whether the replacement hard disk supports data read/write operations, the method further includes: establishing an available directory list of directories that support data read/write operations and an unavailable directory list of directories that do not, and determining, according to the established available and unavailable directory lists, whether the replacement hard disk supports data read/write operations.
  • preferably, the method further includes: dynamically updating the available and unavailable directory lists according to detected hard disk state information, where the hard disk state information includes the replacement-operation success information or the hard disk abnormality information.
  • preferably, the replacement-operation success information is detected in at least one of the following ways: receiving a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; or scanning a notification message indicating that the damaged hard disk's directory has changed from abnormal to normal.
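The two detection paths named above — an OS-level mount event, or a scan that finds the damaged disk's directory back to normal — can be sketched as follows. This is a hypothetical Python illustration, not the patented implementation; the event dictionary shape and the writability probe are assumptions made for the example.

```python
import os

def detect_replacement_success(event=None, directory=None, probe=os.access):
    """Return True when a replaced disk should be considered usable again.

    Mirrors the two triggers named in the text (both hypothetical here):
      1. an operating-system mount event for the replacement disk;
      2. a scan noticing that the damaged disk's directory has gone
         from "abnormal" back to "normal" (modeled as: writable again).
    """
    if event is not None and event.get("type") == "mount":
        return True  # OS-layer trigger: the replacement disk was mounted
    if directory is not None:
        # Scan-layer trigger: the directory exists and is writable again.
        return os.path.isdir(directory) and probe(directory, os.W_OK)
    return False
```

Either trigger alone suffices, matching the "at least one of" wording above.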
  • a Hadoop-based hard disk damage processing apparatus, including: a first detecting module configured to detect replacement-operation success information indicating that a Hadoop-based damaged hard disk has been replaced with a replacement hard disk; a determining module configured to determine whether the replacement hard disk supports data read/write operations; and a processing module configured to perform data read/write processing on the replacement hard disk if the determination result is yes.
  • preferably, the device further includes: a second detecting module configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; and an interrupting module configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.
  • preferably, the apparatus further comprises: an establishing module configured to establish an available directory list of directories that support data read/write operations and an unavailable directory list of directories that do not, and to determine, according to the established lists, whether the replacement hard disk supports data read/write operations.
  • preferably, the device further includes: an update module configured to dynamically update the available and unavailable directory lists according to detected hard disk state information, where the hard disk state information includes the replacement-operation success information or the hard disk abnormality information.
  • preferably, the first detecting module of the device comprises at least one of: a receiving unit configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; and a scanning unit configured to scan a notification message indicating that the damaged hard disk's directory has changed from abnormal to normal.
  • With the above solution, replacement-operation success information indicating that the Hadoop-based damaged hard disk has been replaced with a replacement hard disk is detected; whether the replacement hard disk supports data read/write operations is determined; and, if so, data read/write processing is performed on the replacement hard disk. This solves the problem that, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost, thereby achieving replacement processing of the damaged hard disk without restarting the system or interrupting service.
  • FIG. 1 is a schematic diagram of a platform architecture of Hadoop storage data in the related art
  • FIG. 2 is a flowchart of a Hadoop-based hard disk damage processing method according to an embodiment of the present invention
  • FIG. 3 is a structural block diagram of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a block diagram 1 of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention
  • FIG. 5 is a block diagram 2 of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention
  • FIG. 6 is a block diagram 3 of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a block diagram showing a preferred structure of a first detecting module 32 in a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention
  • FIG. 8 is a flowchart of a method for processing an abnormality of a hard disk according to a preferred embodiment of the present invention.
  • FIG. 9 is a flow chart of a method for successfully processing a hard disk replacement in accordance with a preferred embodiment of the present invention.
  • FIG. 2 is a flowchart of a Hadoop-based hard disk damage processing method according to an embodiment of the present invention. As shown in FIG. 2, the process includes the following steps:
  • Step S202: detect replacement-operation success information indicating that the Hadoop-based damaged hard disk has been replaced with a replacement hard disk;
  • Step S204: determine whether the replacement hard disk supports data read/write operations;
  • Step S206: if the determination result is yes, perform data read/write processing on the replacement hard disk.
  • Through the above steps, event-response processing for replacing the damaged hard disk with a replacement hard disk is added, so that the replacement hard disk can be read and written again without restarting the system. This not only solves the problem that, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost, but also achieves replacement processing of the damaged hard disk without restarting the system or interrupting service.
  • Preferably, the method is also compatible with event-response processing for hard disk abnormality. For example, before the replacement-operation success information is detected, hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal can be detected, and data read/write operations on the damaged hard disk are interrupted according to that information. This avoids the situation in which data continues to be read from and written to a damaged disk, causing the system to crash.
  • Preferably, a corresponding labeled directory can be established for each hard disk, and the directories can be divided into available and unavailable according to whether the disk can be enabled. That is, an available directory list of directories that support data read/write operations and an unavailable directory list of directories that do not are established, and whether the replacement hard disk supports data read/write operations is determined according to these lists.
  • Preferably, the available and unavailable directory lists may be dynamically updated according to detected hard disk state information, where the hard disk state information includes replacement-operation success information or hard disk abnormality information. For example, when hard disk abnormality information indicating abnormal data read/write operations on a damaged disk is detected, the directory corresponding to that disk is moved from the available list to the unavailable list; and when replacement-operation success information indicating that the damaged disk has been replaced with a replacement disk is detected, the directory corresponding to that disk is moved from the unavailable list back to the available list. Read and write operations on the hard disks thus follow the dynamically updated directory state.
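The directory bookkeeping described above can be sketched as a small state holder. The class and method names below are invented for illustration; they are not the names used in Hadoop or in the patent.

```python
class DirectoryState:
    """Track which data directories may currently be written to.

    A minimal sketch of the dynamic update described above: a disk
    error moves its directory to the unavailable list, and a successful
    replacement moves it back, with no process restart in between.
    """

    def __init__(self, directories):
        self.available = set(directories)
        self.unavailable = set()

    def on_disk_error(self, directory):
        # Hard disk abnormality information: stop writing to this directory.
        self.available.discard(directory)
        self.unavailable.add(directory)

    def on_replacement_success(self, directory):
        # Replacement-operation success information: re-enable the directory.
        self.unavailable.discard(directory)
        self.available.add(directory)

    def writable(self, directory):
        return directory in self.available
```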
  • Preferably, the replacement-operation success information may be detected in multiple ways, for example by at least one of the following two: receiving a hard disk mount event, triggered by the operating-system layer, indicating that the replacement hard disk has replaced the damaged hard disk; or scanning a notification message, triggered by the service-system layer, indicating that the damaged hard disk's directory has changed from abnormal to normal.
  • In this embodiment, a Hadoop-based hard disk damage processing device is also provided, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated.
  • As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 3 is a structural block diagram of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes a first detecting module 32, a determining module 34, and a processing module 36.
  • The first detecting module 32 is configured to detect replacement-operation success information indicating that a Hadoop-based damaged hard disk has been replaced with a replacement hard disk;
  • The determining module 34 is connected to the first detecting module 32 and is configured to determine whether the replacement hard disk supports data read/write operations;
  • The processing module 36 is connected to the determining module 34 and is configured to perform data read/write processing on the replacement hard disk if the determination result is yes.
  • FIG. 4 is a block diagram of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, in addition to all the modules shown in FIG. 3, the apparatus includes a second detecting module 42 and an interrupting module 44. The device is described below.
  • The second detecting module 42 is configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; the interrupting module 44 is connected to the second detecting module 42 and the first detecting module 32 and is configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.
  • FIG. 5 is a block diagram of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, in addition to all the modules shown above, the apparatus includes an establishing module 52. The device is described below.
  • The establishing module 52 is connected to the first detecting module 32 and the determining module 34 and is configured to establish an available directory list of directories that support data read/write operations and an unavailable directory list of directories that do not, and to determine, according to the established lists, whether the replacement hard disk supports data read/write operations.
  • FIG. 6 is a block diagram 3 of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, in addition to all the modules shown above, the apparatus includes an update module 62, which is described below.
  • the update module 62 is connected to the foregoing establishing module 52 and the determining module 34, and is configured to dynamically update the available directory and the unavailable directory according to the detected hard disk state information, wherein the hard disk state information includes replacement operation success information or hard disk abnormality information.
  • FIG. 7 is a block diagram of a preferred structure of the first detecting module 32 in a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention.
  • The first detecting module 32 includes at least one of the following: a receiving unit 72 and a scanning unit 74, which are described below.
  • The receiving unit 72 is configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; the scanning unit 74 is configured to scan a notification message indicating that the damaged hard disk's directory has changed from abnormal to normal.
  • The processing scheme below, based on combining normal-event and abnormal-event triggers, includes two important event-trigger processes: a hard disk write abnormal event and a hard disk replacement success event.
  • The hard disk write abnormal event is used to trigger the Hadoop cluster to handle the abnormal disk, preventing subsequent data from continuing to be written to it; otherwise system data could not be written and the system would crash.
  • The hard disk replacement success event is used to trigger the Hadoop cluster to activate the newly available hard disk and recognize it automatically at runtime.
  • Step S1: Hadoop writes data, and a hard disk cannot be written;
  • Step S2: the Hadoop process captures the hard disk write abnormality information;
  • Step S3: the Hadoop process removes the data directory corresponding to the failed hard disk from use;
  • Step S4: Hadoop continues writing to the other available, normal hard disks.
  • In detail, Step S1: the hard disk cannot write data, and the hardware reports an error through the operating system;
  • Step S2: the datanode process of the HDFS subsystem in Hadoop captures the current abnormality information;
  • Step S3: the datanode process moves the data directory corresponding to the abnormal hard disk into the unavailable list;
  • Step S4: when writing data, the datanode automatically isolates the removed unavailable directories.
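Steps S1-S4 above can be sketched as a write loop that captures the I/O error and drops the failing directory from rotation. This is an illustrative Python sketch, not the HDFS datanode code; the `write` callback and the block file name are assumptions made for the example.

```python
import os

def write_block(data, directories, write=None):
    """Write `data` to the first healthy directory, isolating failures.

    Mirrors steps S1-S4: a failed write raises OSError (S1), the
    exception is captured (S2), the directory is removed from the
    rotation (S3), and writing continues on the next directory (S4).
    """
    if write is None:
        def write(directory, payload):
            # Hypothetical stand-in for a real block writer.
            with open(os.path.join(directory, "blk"), "wb") as f:
                f.write(payload)
    failed = []
    for directory in list(directories):
        try:
            write(directory, data)
            return directory, failed
        except OSError:
            directories.remove(directory)  # isolate the bad directory
            failed.append(directory)
    raise OSError("no usable data directory left")
```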
  • Step S1: after the damaged hard disk is successfully replaced, the hadoop process is triggered;
  • Step S2: the hadoop process checks whether the directory on which the newly replaced hard disk is mounted is available;
  • Step S3: the hadoop process dynamically moves the directory from the unavailable list to the available list;
  • Step S4: when the hadoop process writes data, data is written to the newly inserted hard disk.
  • In detail, Step S1: the hard disk replacement succeeds, and the operating system triggers the hadoop process;
  • Steps S2-S3: the datanode process of the HDFS subsystem of hadoop checks whether the directory is available and enables it without a restart;
  • Step S4: when the datanode writes data, the newly inserted hard disk is used automatically.
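The replacement-success path above can be sketched the same way: check the directory, move it from the unavailable list to the available list, and let the next write use it, with no hadoop/datanode restart. The function name and the usability probe below are hypothetical.

```python
import os

def handle_replacement(directory, available, unavailable, check=None):
    """Re-admit a replaced disk's directory at runtime (steps S1-S4)."""
    if check is None:
        def check(d):
            # Hypothetical usability probe: directory exists and is writable.
            return os.path.isdir(d) and os.access(d, os.W_OK)
    if directory in unavailable and check(directory):
        unavailable.remove(directory)
        available.append(directory)  # next write can use this directory
        return True
    return False
```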
  • The hard disk replacement automatic-identification access method provided by the foregoing embodiments and preferred implementations reduces the risk of service interruption and greatly simplifies Hadoop operation and maintenance; it is not only reliable and effective but also does not reduce write or read performance.
  • FIG. 8 is a flowchart of a method for processing an abnormality of a hard disk according to a preferred embodiment of the present invention. As shown in FIG. 8, the process includes the following steps:
  • Step S802: after Hadoop starts, HDFS automatically loads all configured data directories into memory, and Hadoop writes in parallel according to the list of available directories;
  • Step S804: when writing hard disk data, HDFS receives a data abnormality reported by the bottom layer and determines whether a hard disk is damaged; if yes, proceed to step S806, otherwise proceed to step S808;
  • Step S806: HDFS captures the data abnormality, removes the damaged hard disk from use, and deletes the abnormal directory from memory;
  • Step S808: HDFS continues writing to the normal data directories in memory;
  • Step S810: the data writing ends.
  • FIG. 9 is a flowchart of a method for successfully processing a hard disk replacement according to a preferred embodiment of the present invention. As shown in FIG. 9, the process includes the following steps:
  • Step S902: the notification that the hard disk is normal again can be implemented in the following two ways:
  • (1) a hard disk mount event is triggered by the operating-system layer; when Hadoop captures this event, it performs the processing;
  • (2) the business system cyclically scans the abnormal directories and notifies Hadoop, which performs the next processing.
  • Step S904: Hadoop receives the event that the abnormal directory has become normal, and HDFS re-checks that the directory is available, i.e. the directory corresponding to the damaged hard disk is restored;
  • Step S906: HDFS updates the directory into memory, i.e. the directory takes effect dynamically;
  • Step S908: HDFS writes data to the available hard disk directories in memory (that is, the system can write data on the new hard disk) without restarting the HDFS process.
  • Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that herein, or they may be fabricated separately into individual integrated-circuit modules, or multiple modules or steps among them may be fabricated into a single integrated-circuit module.
  • the invention is not limited to any specific combination of hardware and software.
  • The above embodiments and preferred implementations solve the problem that, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost, thereby achieving replacement processing of the damaged hard disk without restarting the system or interrupting service.

Abstract

Provided are a Hadoop-based hard disk damage handling method and device, the method comprising: detecting replacement-operation success information indicating that a Hadoop-based damaged hard disk has been replaced with a replacement hard disk (S202); determining whether the replacement hard disk supports data read/write operations (S204); and, if the result of the determination is yes, reading data from and writing data to the replacement hard disk (S206). The above technical solution solves the problem in the related art of handling hard disk damage only by restarting the system, which interrupts services and increases costs; the damaged hard disk is thus replaced without a system restart or service interruption.

Description

基于Hadoop的硬盘损坏处理方法及装置Hard disk damage processing method and device based on Hadoop 技术领域Technical field
本发明涉及通信领域,具体而言,涉及一种基于Hadoop的硬盘损坏处理方法及装置。The present invention relates to the field of communications, and in particular to a Hadoop-based hard disk damage processing method and apparatus.
背景技术Background technique
Hadoop,是一种开源的大数据存储和分析平台,已成为业界处理大数据的事实标准。Hadoop平台包含HDFS(分布式文件系统)和MapReduce(并行计算框架)两个重要的子系统。Hadoop, an open source big data storage and analytics platform, has become the de facto standard for the industry to handle big data. The Hadoop platform consists of two important subsystems: HDFS (Distributed File System) and MapReduce (Parallel Computing Framework).
图1是相关技术中Hadoop存储数据的平台架构示意图,如图1所示,Hadoop是一个高度容错的多副本的集群存储分布式系统,适用于部署在廉价的机器上,并且Hadoop支持机器上多块硬盘的并行数据写入和读取。随着大数据的发展,数据量急剧增加,企业为了减少成本采用Hadoop平台部署在廉价的PC Server上,每台服务器上至少会配置24块以上的硬盘,目前最大的集群机器数量已经超过5000台,每台设备上有24块硬盘,整个集群硬盘数量达到了10多万块硬盘,基本每天都会有硬盘损坏。FIG. 1 is a schematic diagram of a platform architecture of Hadoop storage data in related art. As shown in FIG. 1 , Hadoop is a highly fault-tolerant multi-copy cluster storage distributed system, which is suitable for deployment on a cheap machine, and Hadoop supports multiple machines. Parallel data writing and reading of block hard disks. With the development of big data, the amount of data has increased sharply. Enterprises use Hadoop platform to deploy on cheap PC Server in order to reduce costs. At least 24 hard disks will be configured on each server. The largest number of cluster machines has exceeded 5,000. There are 24 hard disks on each device, and the number of hard disks in the entire cluster reaches more than 100,000 hard disks. Basically, hard disks are damaged every day.
在相关技术中,硬盘损坏,Hadoop所采用的处理如下:当设备中有硬盘损失,系统无法写入数据,由异常事件触发,Hadoop将损失硬盘移入到损坏硬盘列表中,系统在后续运行中将不再往损坏硬盘mount的目录写数据。系统在运行期间,将不再访问损坏硬盘列表,即不对损坏硬盘对应的目录进行校验。当运维人员将新硬盘替换掉坏掉硬盘后,将新硬盘mount到目录,系统在运行期间不认为当前硬盘可以使用。只有在重启hadoop进程后,系统才重新对所有数据目录进行检查,新硬盘参与系统的运行。In the related art, the hard disk is damaged. The processing adopted by Hadoop is as follows: when there is a hard disk loss in the device, the system cannot write data, and triggered by an abnormal event, Hadoop will move the lost hard disk into the damaged hard disk list, and the system will be in the subsequent operation. No more data is written to the directory where the hard disk mount is damaged. During the system running, the damaged hard disk list will no longer be accessed, that is, the directory corresponding to the damaged hard disk will not be verified. After the operation and maintenance personnel replace the new hard disk with the broken hard disk, mount the new hard disk to the directory. The system does not consider the current hard disk to be usable during the running. Only after restarting the hadoop process, the system will recheck all data directories and the new hard disk participates in the system.
然而上述硬盘损坏处理存在以下缺陷:(1)业务需要中断:由于必须要重启进程才可以使得新硬盘使用,在重启进程期间,业务需要中断,因而带来的业务上的损失。(2)运维成本高:硬盘损坏后,不但需要硬件工程师,还需要软件工程师对Hadoop集群进行重启和观察,大大增加了人力成本。However, the above hard disk damage processing has the following defects: (1) The service needs to be interrupted: since the new hard disk can be used because the process must be restarted, the service needs to be interrupted during the restart process, thereby causing a loss of business. (2) High operation and maintenance cost: After the hard disk is damaged, not only hardware engineers but also software engineers need to restart and observe the Hadoop cluster, which greatly increases the labor cost.
因此,在相关技术中对硬盘损坏的处理只采用重启系统的方式,导致中断业务,以及增加成本的问题。 Therefore, the processing of the hard disk damage in the related art only adopts a method of restarting the system, resulting in interruption of the service and an increase in cost.
发明内容Summary of the invention
本发明提供了一种基于Hadoop的硬盘损坏处理方法及装置,以至少解决相关技术中对硬盘损坏的处理只采用重启系统的方式,导致中断业务,以及增加成本的问题。The present invention provides a method and a device for processing a hard disk damage based on Hadoop, so as to at least solve the problem that the processing of the hard disk damage in the related art only adopts a method of restarting the system, resulting in interruption of the service and an increase in cost.
According to one aspect of the present invention, a Hadoop-based hard disk damage handling method is provided, including: detecting replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk; determining whether the replacement hard disk supports data read/write operations; and, if the determination result is yes, performing data read/write processing on the replacement hard disk.

Preferably, before the replacement-success information indicating that the damaged hard disk has been replaced with the replacement hard disk is detected, the method further includes: detecting hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; and interrupting data read/write operations on the damaged hard disk according to the hard disk abnormality information.

Preferably, before determining whether the replacement hard disk supports data read/write operations, the method further includes: establishing an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and determining, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

Preferably, before determining whether the replacement hard disk supports data read/write operations, the method further includes: dynamically updating the available directory and the unavailable directory according to detected hard disk state information, where the hard disk state information includes the replacement-success information or hard disk abnormality information.

Preferably, the replacement-success information indicating that the damaged hard disk has been replaced with the replacement hard disk is detected in at least one of the following ways: receiving a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; or scanning a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
According to another aspect of the present invention, a Hadoop-based hard disk damage handling device is provided, including: a first detection module, configured to detect replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk; a determination module, configured to determine whether the replacement hard disk supports data read/write operations; and a processing module, configured to perform data read/write processing on the replacement hard disk if the determination result is yes.

Preferably, the device further includes: a second detection module, configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; and an interruption module, configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.

Preferably, the device further includes: an establishment module, configured to establish an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and to determine, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

Preferably, the device further includes: an update module, configured to dynamically update the available directory and the unavailable directory according to detected hard disk state information, where the hard disk state information includes the replacement-success information or hard disk abnormality information.

Preferably, the first detection module includes at least one of: a receiving unit, configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; and a scanning unit, configured to scan a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
Through the present invention, replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk is detected; whether the replacement hard disk supports data read/write operations is determined; and, if the determination result is yes, data read/write processing is performed on the replacement hard disk. This solves the problem in the related art that hard disk damage is handled only by restarting the system, which interrupts service and increases cost, and thereby achieves replacement handling of a damaged hard disk without restarting the system or interrupting service.
Brief Description of the Drawings

The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:

FIG. 1 is a schematic diagram of the platform architecture for storing data in Hadoop in the related art;

FIG. 2 is a flowchart of a Hadoop-based hard disk damage handling method according to an embodiment of the present invention;

FIG. 3 is a structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 4 is a first preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 5 is a second preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 6 is a third preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 7 is a preferred structural block diagram of the first detection module 32 in a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 8 is a flowchart of a hard disk exception handling method according to a preferred embodiment of the present invention;

FIG. 9 is a flowchart of a hard disk replacement-success handling method according to a preferred embodiment of the present invention.
Detailed Description

The present invention will be described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with each other.
This embodiment provides a Hadoop-based hard disk damage handling method. FIG. 2 is a flowchart of a Hadoop-based hard disk damage handling method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:

Step S202: replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk is detected;

Step S204: it is determined whether the replacement hard disk supports data read/write operations;

Step S206: if the determination result is yes, data read/write processing is performed on the replacement hard disk.
Through the above steps, event-response handling is added for the normal event of replacing a damaged hard disk with a replacement hard disk. In the related art, after a damaged hard disk is replaced with a replacement hard disk, read/write operations on the replacement hard disk can only be resumed by restarting the system. The above steps therefore solve the problem that hard disk damage is handled only by restarting the system, which interrupts service and increases cost, and achieve replacement handling of a damaged hard disk without restarting the system or interrupting service.

In addition to the event-response handling for the normal event added above, the method is also compatible with event-response handling for hard disk exceptions. For example, before the replacement-success information is detected, hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal may be detected, and data read/write operations on the damaged hard disk may be interrupted according to the hard disk abnormality information. This prevents the system from continuing to read and write data after the hard disk is damaged, which would otherwise cause a system crash.

Various approaches may be used to determine whether the replacement hard disk supports data read/write operations. For example, a corresponding marker directory may be established for each hard disk, and each hard disk may be classified, according to whether it can be enabled, under an available directory or an unavailable directory. That is, an available directory that supports data read/write operations and an unavailable directory that does not are established, and whether the replacement hard disk supports data read/write operations is determined according to the established available directory and unavailable directory.

Before determining whether the replacement hard disk supports data read/write operations, the available directory and the unavailable directory may also be dynamically updated according to detected hard disk state information, where the hard disk state information includes replacement-success information or hard disk abnormality information. For example, when hard disk abnormality information indicating abnormal data read/write operations on a damaged hard disk is detected, the directory corresponding to that hard disk is moved from the available directory to the unavailable directory; when replacement-success information indicating that the damaged hard disk has been replaced with a replacement hard disk is detected, the directory corresponding to that hard disk is moved from the unavailable directory back to the available directory. Whether a hard disk is read from and written to is thus controlled by dynamically updating the directory state.
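The dynamic directory update described above can be sketched in a few lines of Python. This is a simplified illustration with hypothetical names (`DataDirManager`, `on_disk_error`, `on_disk_replaced`), not the actual HDFS datanode code:

```python
class DataDirManager:
    """Tracks the available/unavailable directory lists described above
    (simplified sketch; real HDFS keeps richer per-volume state)."""

    def __init__(self, dirs):
        self.available = list(dirs)  # directories open for data read/write
        self.unavailable = []        # directories on damaged hard disks

    def on_disk_error(self, d):
        # Hard disk abnormality information: isolate the directory at once.
        if d in self.available:
            self.available.remove(d)
            self.unavailable.append(d)

    def on_disk_replaced(self, d):
        # Replacement-success information: re-enable without any restart.
        if d in self.unavailable:
            self.unavailable.remove(d)
            self.available.append(d)

mgr = DataDirManager(["/data1", "/data2", "/data3"])
mgr.on_disk_error("/data2")      # writes now skip /data2
mgr.on_disk_replaced("/data2")   # /data2 is active again, process still running
```

Because both transitions only move entries between two in-memory lists, no process restart is needed in either direction.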
It should be noted that the replacement-success information may be detected in various ways, for example at least one of the following two: receiving a hard disk mount event, triggered by the operating system layer, indicating that the replacement hard disk has replaced the damaged hard disk; or scanning a notification message, triggered by the service system layer, indicating that the directory of the damaged hard disk has changed from abnormal to normal.
This embodiment also provides a Hadoop-based hard disk damage handling device, which is used to implement the above embodiments and preferred implementations; what has already been described will not be repeated. As used below, the term "module" may implement a combination of software and/or hardware of predetermined functions. Although the devices described in the following embodiments are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 3 is a structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 3, the device includes a first detection module 32, a determination module 34, and a processing module 36, which are described below.

The first detection module 32 is configured to detect replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk; the determination module 34, connected to the first detection module 32, is configured to determine whether the replacement hard disk supports data read/write operations; and the processing module 36, connected to the determination module 34, is configured to perform data read/write processing on the replacement hard disk if the determination result is yes.

FIG. 4 is a first preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 4, in addition to all the modules shown in FIG. 3, the device further includes a second detection module 42 and an interruption module 44, which are described below.

The second detection module 42 is configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; the interruption module 44, connected to the second detection module 42 and the first detection module 32, is configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.

FIG. 5 is a second preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 5, in addition to all the modules shown in FIG. 3, the device further includes an establishment module 52, which is described below.

The establishment module 52, connected to the first detection module 32 and the determination module 34, is configured to establish an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and to determine, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

FIG. 6 is a third preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 6, in addition to all the modules shown in FIG. 5, the device further includes an update module 62, which is described below.

The update module 62, connected to the establishment module 52 and the determination module 34, is configured to dynamically update the available directory and the unavailable directory according to detected hard disk state information, where the hard disk state information includes replacement-success information or hard disk abnormality information.

FIG. 7 is a preferred structural block diagram of the first detection module 32 in a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 7, the first detection module 32 includes at least one of a receiving unit 72 and a scanning unit 74, which are described below.

The receiving unit 72 is configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; the scanning unit 74 is configured to scan a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
In the related art, processing is triggered only by exceptions: directories are handled only at startup, processing during operation is triggered only by abnormal events, and there is no corresponding processing flow for normal events, which causes service interruption and increased cost. In this embodiment, not only abnormal-event-triggered processing but also normal-event-triggered processing is considered, so that a newly replaced hard disk takes over from the damaged hard disk as soon as possible. This effectively overcomes the defect that existing Hadoop handles hard disks only at startup and on abnormal events, does not affect the operation of the existing system, and greatly reduces cost. The processing scheme combining normal-event triggering with abnormal-event triggering includes the following two important event-triggered processes: a hard disk write exception event and a hard disk replacement-success event. The hard disk write exception event triggers the Hadoop cluster to handle the current hard disk as abnormal, preventing subsequent data from continuing to be written to this hard disk, which would make system data unwritable and crash the system. The hard disk replacement-success event triggers the Hadoop cluster to activate the newly available hard disk, which is automatically recognized and used during operation.
These two event-triggered processes are described separately below:

Hard disk exception handling:

Step S1: Hadoop writes data, and the hard disk cannot be written to;

Step S2: the Hadoop process catches the hard disk write exception information;

Step S3: the Hadoop process removes the data directory corresponding to the current hard disk;

Step S4: Hadoop continues writing to the other available, normal hard disks.

It should be noted that, in step S1, when the hard disk cannot write data, the hardware itself reports the error through the operating system.

In step S2, the datanode process of the HDFS subsystem of Hadoop catches the current exception information.

In step S3, the datanode process moves the data directory corresponding to the abnormal hard disk into the unavailable directory.

In step S4, the datanode automatically isolates the removed, unavailable directory when writing data.
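Steps S1–S4 above can be modeled with a small sketch. The helper name `write_block` is hypothetical and the logic is deliberately simplified (the real datanode behavior is more involved): a write failure is caught, the directory is moved to the unavailable list, and the write falls through to the next normal disk.

```python
import os
import tempfile

def write_block(available, unavailable, name, data):
    """Try each available data directory in order; on a write error (step S2),
    move the directory to the unavailable list (step S3) and continue with
    the next normal disk (step S4). Simplified sketch, not HDFS code."""
    for d in list(available):
        try:
            path = os.path.join(d, name)
            with open(path, "wb") as f:
                f.write(data)
            return path
        except OSError:              # the hardware error surfaces here (step S1)
            available.remove(d)
            unavailable.append(d)
    raise IOError("no usable data directory left")

good = tempfile.mkdtemp()
available = ["/nonexistent-disk/data", good]  # first entry simulates a damaged disk
unavailable = []
path = write_block(available, unavailable, "blk_0001", b"payload")
```

After the call, the simulated damaged directory sits in the unavailable list and the block has landed on the remaining normal disk, without the writer being restarted.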
Hard disk replacement-success event handling:

Step S1: after the damaged hard disk has been successfully replaced, the hadoop process is triggered;

Step S2: the hadoop process checks whether the directory mounted on the newly replaced hard disk is usable;

Step S3: the hadoop process dynamically moves the current directory from the unavailable list to the available list;

Step S4: when the hadoop process writes data, data is written to the newly inserted hard disk.

It should be noted that, in step S1, when the hard disk has been replaced successfully, the operating system triggers the hadoop process.

In steps S2 and S3, the datanode process of the HDFS subsystem of hadoop is responsible for checking whether the current directory is usable, and re-enables the directory only if it is.

In step S4, the datanode automatically enables the newly inserted hard disk when writing data.
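The usability check in steps S2–S3 can be illustrated as a probe write. The helper `directory_usable` is a hypothetical stand-in; HDFS performs its own directory verification:

```python
import os
import tempfile
import uuid

def directory_usable(d):
    """Probe a mounted directory by creating and removing a marker file.
    Returns True only if the directory really accepts writes; only then
    would it be moved back from the unavailable list to the available list."""
    probe = os.path.join(d, ".probe-" + uuid.uuid4().hex)
    try:
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
        return True
    except OSError:
        return False

replacement = tempfile.mkdtemp()          # stands in for the newly mounted disk
ok = directory_usable(replacement)        # usable: safe to re-enable
bad = directory_usable("/no/such/mount")  # not usable: keep it unavailable
```

Probing with an actual write catches the case where a mount point exists but the underlying disk is absent or read-only, which a simple existence check would miss.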
With the hard disk replacement automatic-recognition access method provided by the above embodiments and preferred implementations, the hadoop process does not need to be restarted, which reduces the risk of service interruption and greatly simplifies the hadoop operation and maintenance mechanism; the method is reliable and effective and does not reduce write or read performance.

Preferred implementations of the present invention are described below.
FIG. 8 is a flowchart of a hard disk exception handling method according to a preferred embodiment of the present invention. As shown in FIG. 8, the flow includes the following steps:

Step S802: after Hadoop starts, HDFS automatically loads all configured data directories into memory, and Hadoop writes in parallel according to the available directory list;

Step S804: when HDFS writes hard disk data abnormally, the lower layer reports a data exception; it is determined whether a hard disk is damaged; if the determination result is yes, the flow proceeds to step S806, otherwise to step S808;

Step S806: HDFS catches the data exception, removes the damaged hard disk, and also removes the abnormal directory from memory;

Step S808: HDFS continues writing to the normal data directories in memory;

Step S810: data writing ends.
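The parallel writing of step S802 against the in-memory available directory list can be sketched as follows. The round-robin placement and the `parallel_write` helper are assumptions for illustration only; actual HDFS block placement works differently:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def parallel_write(available_dirs, blocks):
    """Write blocks concurrently, spreading them round-robin over the
    available directory list loaded into memory (simplified sketch)."""
    def write_one(item):
        i, (name, data) = item
        d = available_dirs[i % len(available_dirs)]  # round-robin over directories
        path = os.path.join(d, name)
        with open(path, "wb") as f:
            f.write(data)
        return path
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so paths line up with blocks.
        return list(pool.map(write_one, enumerate(blocks)))

dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
paths = parallel_write(dirs, [("blk_1", b"a"), ("blk_2", b"b")])
```

Because the writer only consults the in-memory list, shrinking or growing that list (steps S806 and S904–S906) changes placement immediately, with no restart.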
FIG. 9 is a flowchart of a hard disk replacement-success handling method according to a preferred embodiment of the present invention. As shown in FIG. 9, the flow includes the following steps:

Step S902: the hard-disk-normal notification can be implemented in either of the following two ways:

(1) The operating system layer triggers a hard disk mount event; when Hadoop captures this event, it performs the subsequent processing.

(2) The service system cyclically scans the abnormal directories; once an abnormal directory has returned to normal, Hadoop is notified and performs the next processing step.
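Method (2), the cyclic scan by the service system, can be sketched as a single scan pass. The names are hypothetical; `is_usable` and `notify` are injected callables standing in for the real directory check and the notification sent to Hadoop:

```python
def scan_abnormal_dirs(abnormal_dirs, is_usable, notify):
    """One pass of the cyclic scan: any abnormal directory that has returned
    to normal is removed from the list and reported, so Hadoop can perform
    the next processing step. Simplified sketch of method (2)."""
    for d in list(abnormal_dirs):
        if is_usable(d):
            abnormal_dirs.remove(d)
            notify(d)

recovered = []
abnormal = ["/data2", "/data5"]
# Simulate /data2 having returned to normal while /data5 is still abnormal.
scan_abnormal_dirs(abnormal,
                   is_usable=lambda d: d == "/data2",
                   notify=recovered.append)
```

In a deployment this pass would run on a timer; directories still abnormal simply stay on the list for the next cycle.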
Step S904: Hadoop receives the event that the abnormal directory has returned to normal, and HDFS checks again that the directory is usable, that is, the directory corresponding to the damaged hard disk is restored;

Step S906: HDFS updates this directory into memory, that is, the directory takes effect dynamically;

Step S908: HDFS writes data to the usable hard disk directories in memory (that is, the system can write data on the new hard disk) without restarting the HDFS process.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented with program code executable by computing devices, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described here, or they may be fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only of preferred embodiments of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.
Industrial Applicability

As described above, the foregoing embodiments and preferred implementations solve the problem in the related art that hard disk damage is handled only by restarting the system, which interrupts service and increases cost, and thereby achieve replacement handling of a damaged hard disk without restarting the system or interrupting service.

Claims (10)

  1. A Hadoop-based hard disk damage handling method, comprising:

    detecting replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk;

    determining whether the replacement hard disk supports data read/write operations;

    if the determination result is yes, performing data read/write processing on the replacement hard disk.

  2. The method according to claim 1, wherein, before the replacement-success information indicating that the damaged hard disk in the Hadoop system has been replaced with the replacement hard disk is detected, the method further comprises:

    detecting hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal;

    interrupting data read/write operations on the damaged hard disk according to the hard disk abnormality information.

  3. The method according to claim 1 or 2, wherein, before determining whether the replacement hard disk supports data read/write operations, the method further comprises:

    establishing an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and determining, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

  4. The method according to claim 3, wherein, before determining whether the replacement hard disk supports data read/write operations, the method further comprises:

    dynamically updating the available directory and the unavailable directory according to detected hard disk state information, wherein the hard disk state information comprises the replacement-success information or hard disk abnormality information.

  5. The method according to claim 1, 2, or 4, wherein the replacement-success information indicating that the damaged hard disk in the Hadoop system has been replaced with the replacement hard disk is detected in at least one of the following ways:

    receiving a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk;

    scanning a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
  6. A Hadoop-based hard disk damage handling device, comprising:

    a first detection module, configured to detect replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk;

    a determination module, configured to determine whether the replacement hard disk supports data read/write operations;

    a processing module, configured to perform data read/write processing on the replacement hard disk if the determination result is yes.

  7. The device according to claim 6, further comprising:

    a second detection module, configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal;

    an interruption module, configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.

  8. The device according to claim 6 or 7, further comprising:

    an establishment module, configured to establish an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and to determine, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

  9. The device according to claim 8, further comprising:

    an update module, configured to dynamically update the available directory and the unavailable directory according to detected hard disk state information, wherein the hard disk state information comprises the replacement-success information or hard disk abnormality information.

  10. The device according to claim 6, 7, or 9, wherein the first detection module comprises at least one of:

    a receiving unit, configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk;

    a scanning unit, configured to scan a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
PCT/CN2014/087477 2014-05-22 2014-09-25 Hadoop-based hard disk damage handling method and device WO2015176455A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410220454.XA CN105095030B (en) 2014-05-22 2014-05-22 Hadoop-based hard disk damage handling method and device
CN201410220454.X 2014-05-22

Publications (1)

Publication Number Publication Date
WO2015176455A1 true WO2015176455A1 (en) 2015-11-26

Family

ID=54553336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/087477 WO2015176455A1 (en) 2014-05-22 2014-09-25 Hadoop-based hard disk damage handling method and device

Country Status (2)

Country Link
CN (1) CN105095030B (en)
WO (1) WO2015176455A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121620A (en) * 2017-12-22 2018-06-05 联想(北京)有限公司 The restorative procedure and system and server of distributed file system

Citations (5)

Publication number Priority date Publication date Assignee Title
US20080178040A1 (en) * 2005-05-19 2008-07-24 Fujitsu Limited Disk failure restoration method and disk array apparatus
US20080209263A1 (en) * 2007-02-27 2008-08-28 International Business Machines Corporation Rebuildling a failed disk in a disk array
CN101281452A (en) * 2007-04-05 2008-10-08 英业达股份有限公司 Method for automatically rebuilding hard disk
CN102355568A (en) * 2011-09-22 2012-02-15 杭州海康威视数字技术股份有限公司 Method and device for carrying out charged uninstallation and installation of hard disk for digital video recorder
CN102521058A (en) * 2011-12-01 2012-06-27 北京威视数据系统有限公司 Disk data pre-migration method of RAID (Redundant Array of Independent Disks) group

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
KR100644572B1 (en) * 1999-10-02 2006-11-13 삼성전자주식회사 Device operation detecting apparatus and method in directory serve
CN100449472C (en) * 2006-09-08 2009-01-07 华为技术有限公司 Method and apparatus for treating disc hot insert


Also Published As

Publication number Publication date
CN105095030A (en) 2015-11-25
CN105095030B (en) 2019-05-28

Similar Documents

Publication Publication Date Title
US10152382B2 (en) Method and system for monitoring virtual machine cluster
US8688642B2 (en) Systems and methods for managing application availability
US8856592B2 (en) Mechanism to provide assured recovery for distributed application
US9582373B2 (en) Methods and systems to hot-swap a virtual machine
US9652326B1 (en) Instance migration for rapid recovery from correlated failures
US9753954B2 (en) Data node fencing in a distributed file system
US20120174112A1 (en) Application resource switchover systems and methods
US9098439B2 (en) Providing a fault tolerant system in a loosely-coupled cluster environment using application checkpoints and logs
JP5579650B2 (en) Apparatus and method for executing monitored process
WO2018095414A1 (en) Method and apparatus for detecting and recovering fault of virtual machine
WO2014076838A1 (en) Virtual machine synchronization system
US8990608B1 (en) Failover of applications between isolated user space instances on a single instance of an operating system
CN107453932B (en) Distributed storage system management method and device
US20160266924A1 (en) Apparatus and method for identifying a virtual machine having changeable settings
US9444885B2 (en) Workflow processing in a distributed computing environment
US20190303233A1 (en) Automatically Detecting Time-Of-Fault Bugs in Cloud Systems
WO2015176455A1 (en) Hadoop-based hard disk damage handling method and device
CN109254880B (en) Method and device for processing database downtime
US11509555B2 (en) Determining operational status of Internet of Things devices
US10855521B2 (en) Efficient replacement of clients running large scale applications
CN110727652B (en) Cloud storage processing system and method for realizing data processing
CN114880150A (en) Fault isolation and field protection method and system
CN110688193B (en) Disk processing method and device
US20240095011A1 (en) State machine operation for non-disruptive update of a data management system
JPWO2014132466A1 (en) Software safe stop system, software safe stop method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 14892657
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 14892657
Country of ref document: EP
Kind code of ref document: A1