WO2015176455A1 - Hadoop-based hard disk damage handling method and device - Google Patents

Hadoop-based hard disk damage handling method and device

Info

Publication number
WO2015176455A1
WO2015176455A1 (PCT/CN2014/087477)
Authority
WO
WIPO (PCT)
Prior art keywords
hard disk
replacement
directory
data read
hadoop
Prior art date
Application number
PCT/CN2014/087477
Other languages
French (fr)
Chinese (zh)
Inventor
杨庆平
屠要峰
黄震江
李莹
张家明
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2015176455A1 publication Critical patent/WO2015176455A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing

Definitions

  • the present invention relates to the field of communications, and in particular to a Hadoop-based hard disk damage processing method and apparatus.
  • Hadoop, an open-source big-data storage and analytics platform, has become the de facto industry standard for handling big data.
  • the Hadoop platform consists of two important subsystems: HDFS (the Hadoop Distributed File System) and MapReduce (a parallel computing framework).
  • FIG. 1 is a schematic diagram of a platform architecture of Hadoop storage data in related art.
  • Hadoop is a highly fault-tolerant, multi-copy distributed cluster storage system suited to deployment on inexpensive machines, and it supports parallel data writes and reads across the multiple hard disks of each machine. With the growth of big data, data volumes have increased sharply. To reduce cost, enterprises deploy the Hadoop platform on inexpensive PC servers, each configured with at least 24 hard disks. The largest clusters already exceed 5,000 machines; with 24 disks per machine, such a cluster contains more than 100,000 hard disks, so disks fail essentially every day.
  • In the related art, when a hard disk is damaged, Hadoop proceeds as follows: once a disk in a device fails, the system can no longer write data to it; triggered by the abnormal event, Hadoop moves the failed disk into a damaged-disk list, and in subsequent operation no more data is written to the directory on which the damaged disk is mounted. While the system is running, the damaged-disk list is no longer accessed, i.e. the directories corresponding to damaged disks are not re-checked. After operation and maintenance personnel replace the broken disk with a new one and mount the new disk to the directory, the running system still does not consider the disk usable. Only after the hadoop process is restarted does the system re-check all data directories and let the new disk participate in the system.
  • the above hard disk damage processing has the following defects: (1) Service interruption: because the process must be restarted before the new disk can be used, service is interrupted during the restart, causing business losses. (2) High operation and maintenance cost: after a disk is damaged, not only hardware engineers but also software engineers are needed to restart and observe the Hadoop cluster, which greatly increases labor cost.
  • Therefore, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost.
  • To address this, the present invention provides a Hadoop-based hard disk damage processing method and device, so as to at least solve the problem that, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost.
  • a Hadoop-based hard disk damage processing method, including: detecting replacement-operation success information indicating that a Hadoop-based damaged hard disk has been replaced with a replacement hard disk; determining whether the replacement hard disk supports data read/write operations; and, if the determination result is yes, performing data read/write processing on the replacement hard disk.
  • preferably, before the replacement-operation success information is detected, the method further includes: detecting hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; and, according to the hard disk abnormality information, interrupting data read/write operations on the damaged hard disk.
  • preferably, before determining whether the replacement hard disk supports data read/write operations, the method further includes: establishing an available directory list of directories that support data read/write operations and an unavailable directory list of directories that do not, and determining, according to the established available and unavailable directory lists, whether the replacement hard disk supports data read/write operations.
  • preferably, the method further includes: dynamically updating the available and unavailable directory lists according to detected hard disk state information, where the hard disk state information includes the replacement-operation success information or the hard disk abnormality information.
  • preferably, the replacement-operation success information is detected in at least one of the following ways: receiving a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; or scanning a notification message indicating that the damaged hard disk's directory has changed from abnormal to normal.
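The two detection paths named above — an OS-level mount event, or a scan that finds the damaged disk's directory back to normal — can be sketched as follows. This is a hypothetical Python illustration, not the patented implementation; the event dictionary shape and the writability probe are assumptions made for the example.

```python
import os

def detect_replacement_success(event=None, directory=None, probe=os.access):
    """Return True when a replaced disk should be considered usable again.

    Mirrors the two triggers named in the text (both hypothetical here):
      1. an operating-system mount event for the replacement disk;
      2. a scan noticing that the damaged disk's directory has gone
         from "abnormal" back to "normal" (modeled as: writable again).
    """
    if event is not None and event.get("type") == "mount":
        return True  # OS-layer trigger: the replacement disk was mounted
    if directory is not None:
        # Scan-layer trigger: the directory exists and is writable again.
        return os.path.isdir(directory) and probe(directory, os.W_OK)
    return False
```

Either trigger alone suffices, matching the "at least one of" wording above.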
  • a Hadoop-based hard disk damage processing apparatus, including: a first detecting module configured to detect replacement-operation success information indicating that a Hadoop-based damaged hard disk has been replaced with a replacement hard disk; a determining module configured to determine whether the replacement hard disk supports data read/write operations; and a processing module configured to perform data read/write processing on the replacement hard disk if the determination result is yes.
  • preferably, the device further includes: a second detecting module configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; and an interrupting module configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.
  • preferably, the apparatus further comprises: an establishing module configured to establish an available directory list of directories that support data read/write operations and an unavailable directory list of directories that do not, and to determine, according to the established lists, whether the replacement hard disk supports data read/write operations.
  • preferably, the device further includes: an update module configured to dynamically update the available and unavailable directory lists according to detected hard disk state information, where the hard disk state information includes the replacement-operation success information or the hard disk abnormality information.
  • preferably, the first detecting module of the device comprises at least one of: a receiving unit configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; and a scanning unit configured to scan a notification message indicating that the damaged hard disk's directory has changed from abnormal to normal.
  • With the above solution, replacement-operation success information indicating that the Hadoop-based damaged hard disk has been replaced with a replacement hard disk is detected; whether the replacement hard disk supports data read/write operations is determined; and, if so, data read/write processing is performed on the replacement hard disk. This solves the problem that, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost, thereby achieving replacement processing of the damaged hard disk without restarting the system or interrupting service.
  • FIG. 1 is a schematic diagram of a platform architecture of Hadoop storage data in the related art
  • FIG. 2 is a flowchart of a Hadoop-based hard disk damage processing method according to an embodiment of the present invention
  • FIG. 3 is a structural block diagram of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a block diagram 1 of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention
  • FIG. 5 is a block diagram 2 of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention
  • FIG. 6 is a block diagram 3 of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a block diagram showing a preferred structure of a first detecting module 32 in a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention
  • FIG. 8 is a flowchart of a method for processing an abnormality of a hard disk according to a preferred embodiment of the present invention.
  • FIG. 9 is a flow chart of a method for successfully processing a hard disk replacement in accordance with a preferred embodiment of the present invention.
  • FIG. 2 is a flowchart of a Hadoop-based hard disk damage processing method according to an embodiment of the present invention. As shown in FIG. 2, the process includes the following steps:
  • Step S202: detect replacement-operation success information indicating that the Hadoop-based damaged hard disk has been replaced with a replacement hard disk;
  • Step S204: determine whether the replacement hard disk supports data read/write operations;
  • Step S206: if the determination result is yes, perform data read/write processing on the replacement hard disk.
  • Through the above steps, event-response processing for replacing the damaged hard disk with a replacement hard disk is added, so that the replacement hard disk can be read and written again without restarting the system. This not only solves the problem that, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost, but also achieves replacement processing of the damaged hard disk without restarting the system or interrupting service.
  • Preferably, the method is also compatible with event-response processing for hard disk abnormality. For example, before the replacement-operation success information is detected, hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal can be detected, and data read/write operations on the damaged hard disk are interrupted according to that information. This avoids the situation in which data continues to be read from and written to a damaged disk, causing the system to crash.
  • Preferably, a corresponding labeled directory can be established for each hard disk, and the directories can be divided into available and unavailable according to whether the disk can be enabled. That is, an available directory list of directories that support data read/write operations and an unavailable directory list of directories that do not are established, and whether the replacement hard disk supports data read/write operations is determined according to these lists.
  • Preferably, the available and unavailable directory lists may be dynamically updated according to detected hard disk state information, where the hard disk state information includes replacement-operation success information or hard disk abnormality information. For example, when hard disk abnormality information indicating abnormal data read/write operations on a damaged disk is detected, the directory corresponding to that disk is moved from the available list to the unavailable list; and when replacement-operation success information indicating that the damaged disk has been replaced with a replacement disk is detected, the directory corresponding to that disk is moved from the unavailable list back to the available list. Read and write operations on the hard disks thus follow the dynamically updated directory state.
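The directory bookkeeping described above can be sketched as a small state holder. The class and method names below are invented for illustration; they are not the names used in Hadoop or in the patent.

```python
class DirectoryState:
    """Track which data directories may currently be written to.

    A minimal sketch of the dynamic update described above: a disk
    error moves its directory to the unavailable list, and a successful
    replacement moves it back, with no process restart in between.
    """

    def __init__(self, directories):
        self.available = set(directories)
        self.unavailable = set()

    def on_disk_error(self, directory):
        # Hard disk abnormality information: stop writing to this directory.
        self.available.discard(directory)
        self.unavailable.add(directory)

    def on_replacement_success(self, directory):
        # Replacement-operation success information: re-enable the directory.
        self.unavailable.discard(directory)
        self.available.add(directory)

    def writable(self, directory):
        return directory in self.available
```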
  • Preferably, the replacement-operation success information may be detected in multiple ways, for example by at least one of the following two: receiving a hard disk mount event, triggered by the operating-system layer, indicating that the replacement hard disk has replaced the damaged hard disk; or scanning a notification message, triggered by the service-system layer, indicating that the damaged hard disk's directory has changed from abnormal to normal.
  • In this embodiment, a Hadoop-based hard disk damage processing device is also provided, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated.
  • As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 3 is a structural block diagram of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes a first detecting module 32, a determining module 34, and a processing module 36.
  • The first detecting module 32 is configured to detect replacement-operation success information indicating that a Hadoop-based damaged hard disk has been replaced with a replacement hard disk;
  • The determining module 34 is connected to the first detecting module 32 and is configured to determine whether the replacement hard disk supports data read/write operations;
  • The processing module 36 is connected to the determining module 34 and is configured to perform data read/write processing on the replacement hard disk if the determination result is yes.
  • FIG. 4 is a block diagram of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, in addition to all the modules shown in FIG. 3, the apparatus includes a second detecting module 42 and an interrupting module 44. The device is described below.
  • The second detecting module 42 is configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; the interrupting module 44 is connected to the second detecting module 42 and the first detecting module 32 and is configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.
  • FIG. 5 is a block diagram of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, in addition to all the modules shown above, the apparatus includes an establishing module 52. The device is described below.
  • The establishing module 52 is connected to the first detecting module 32 and the determining module 34 and is configured to establish an available directory list of directories that support data read/write operations and an unavailable directory list of directories that do not, and to determine, according to the established lists, whether the replacement hard disk supports data read/write operations.
  • FIG. 6 is a block diagram 3 of a preferred structure of a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, in addition to all the modules shown above, the apparatus includes an update module 62, which is described below.
  • the update module 62 is connected to the foregoing establishing module 52 and the determining module 34, and is configured to dynamically update the available directory and the unavailable directory according to the detected hard disk state information, wherein the hard disk state information includes replacement operation success information or hard disk abnormality information.
  • FIG. 7 is a block diagram of a preferred structure of the first detecting module 32 in a Hadoop-based hard disk damage processing apparatus according to an embodiment of the present invention.
  • The first detecting module 32 includes at least one of the following: a receiving unit 72 and a scanning unit 74, which are described below.
  • The receiving unit 72 is configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; the scanning unit 74 is configured to scan a notification message indicating that the damaged hard disk's directory has changed from abnormal to normal.
  • The processing scheme below, based on combining normal-event and abnormal-event triggers, includes two important event-trigger processes: a hard disk write abnormal event and a hard disk replacement success event.
  • The hard disk write abnormal event is used to trigger the Hadoop cluster to handle the abnormal disk, preventing subsequent data from continuing to be written to it; otherwise system data could not be written and the system would crash.
  • The hard disk replacement success event is used to trigger the Hadoop cluster to activate the newly available hard disk and recognize it automatically at runtime.
  • Step S1: Hadoop writes data, and a hard disk cannot be written;
  • Step S2: the Hadoop process captures the hard disk write abnormality information;
  • Step S3: the Hadoop process removes the data directory corresponding to the failed hard disk from use;
  • Step S4: Hadoop continues writing to the other available, normal hard disks.
  • In detail, Step S1: the hard disk cannot write data, and the hardware reports an error through the operating system;
  • Step S2: the datanode process of the HDFS subsystem in Hadoop captures the current abnormality information;
  • Step S3: the datanode process moves the data directory corresponding to the abnormal hard disk into the unavailable list;
  • Step S4: when writing data, the datanode automatically isolates the removed unavailable directories.
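Steps S1-S4 above can be sketched as a write loop that captures the I/O error and drops the failing directory from rotation. This is an illustrative Python sketch, not the HDFS datanode code; the `write` callback and the block file name are assumptions made for the example.

```python
import os

def write_block(data, directories, write=None):
    """Write `data` to the first healthy directory, isolating failures.

    Mirrors steps S1-S4: a failed write raises OSError (S1), the
    exception is captured (S2), the directory is removed from the
    rotation (S3), and writing continues on the next directory (S4).
    """
    if write is None:
        def write(directory, payload):
            # Hypothetical stand-in for a real block writer.
            with open(os.path.join(directory, "blk"), "wb") as f:
                f.write(payload)
    failed = []
    for directory in list(directories):
        try:
            write(directory, data)
            return directory, failed
        except OSError:
            directories.remove(directory)  # isolate the bad directory
            failed.append(directory)
    raise OSError("no usable data directory left")
```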
  • Step S1: after the damaged hard disk is successfully replaced, the hadoop process is triggered;
  • Step S2: the hadoop process checks whether the directory on which the newly replaced hard disk is mounted is available;
  • Step S3: the hadoop process dynamically moves the directory from the unavailable list to the available list;
  • Step S4: when the hadoop process writes data, data is written to the newly inserted hard disk.
  • In detail, Step S1: the hard disk replacement succeeds, and the operating system triggers the hadoop process;
  • Steps S2-S3: the datanode process of the HDFS subsystem of hadoop checks whether the directory is available and enables it without a restart;
  • Step S4: when the datanode writes data, the newly inserted hard disk is used automatically.
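The replacement-success path above can be sketched the same way: check the directory, move it from the unavailable list to the available list, and let the next write use it, with no hadoop/datanode restart. The function name and the usability probe below are hypothetical.

```python
import os

def handle_replacement(directory, available, unavailable, check=None):
    """Re-admit a replaced disk's directory at runtime (steps S1-S4)."""
    if check is None:
        def check(d):
            # Hypothetical usability probe: directory exists and is writable.
            return os.path.isdir(d) and os.access(d, os.W_OK)
    if directory in unavailable and check(directory):
        unavailable.remove(directory)
        available.append(directory)  # next write can use this directory
        return True
    return False
```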
  • The hard disk replacement automatic-identification access method provided by the foregoing embodiments and preferred implementations reduces the risk of service interruption and greatly simplifies Hadoop operation and maintenance; it is not only reliable and effective but also does not reduce write or read performance.
  • FIG. 8 is a flowchart of a method for processing an abnormality of a hard disk according to a preferred embodiment of the present invention. As shown in FIG. 8, the process includes the following steps:
  • Step S802: after Hadoop starts, HDFS automatically loads all configured data directories into memory, and Hadoop writes in parallel according to the list of available directories;
  • Step S804: when writing hard disk data, HDFS receives a data abnormality reported by the bottom layer and determines whether a hard disk is damaged; if yes, proceed to step S806, otherwise proceed to step S808;
  • Step S806: HDFS captures the data abnormality, removes the damaged hard disk from use, and deletes the abnormal directory from memory;
  • Step S808: HDFS continues writing to the normal data directories in memory;
  • Step S810: the data writing ends.
  • FIG. 9 is a flowchart of a method for successfully processing a hard disk replacement according to a preferred embodiment of the present invention. As shown in FIG. 9, the process includes the following steps:
  • Step S902: the notification that the hard disk is normal again can be implemented in the following two ways:
  • (1) a hard disk mount event is triggered by the operating-system layer; when Hadoop captures this event, it performs the processing;
  • (2) the business system cyclically scans the abnormal directories and notifies Hadoop, which performs the next processing.
  • Step S904: Hadoop receives the event that the abnormal directory has become normal, and HDFS re-checks that the directory is available, i.e. the directory corresponding to the damaged hard disk is restored;
  • Step S906: HDFS updates the directory into memory, i.e. the directory takes effect dynamically;
  • Step S908: HDFS writes data to the available hard disk directories in memory (that is, the system can write data on the new hard disk) without restarting the HDFS process.
  • Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that herein, or they may be fabricated separately into individual integrated-circuit modules, or multiple modules or steps among them may be fabricated into a single integrated-circuit module.
  • the invention is not limited to any specific combination of hardware and software.
  • The above embodiments and preferred implementations solve the problem that, in the related art, hard disk damage is handled only by restarting the system, which interrupts service and increases cost, thereby achieving replacement processing of the damaged hard disk without restarting the system or interrupting service.

Abstract

Provided are a Hadoop-based hard disk damage handling method and device, the method comprising: detecting replacement-operation success information indicating that a Hadoop-based damaged hard disk has been replaced with a replacement hard disk (S202); determining whether the replacement hard disk supports data read/write operations (S204); and, if the result of the determination is yes, reading data from and writing data to the replacement hard disk (S206). The above technical solution solves the problem in the related art of handling hard disk damage only by restarting the system, which interrupts services and increases costs; the damaged hard disk is thus replaced without a system restart or service interruption.

Description

基于Hadoop的硬盘损坏处理方法及装置Hard disk damage processing method and device based on Hadoop 技术领域Technical field
本发明涉及通信领域,具体而言,涉及一种基于Hadoop的硬盘损坏处理方法及装置。The present invention relates to the field of communications, and in particular to a Hadoop-based hard disk damage processing method and apparatus.
背景技术Background technique
Hadoop,是一种开源的大数据存储和分析平台,已成为业界处理大数据的事实标准。Hadoop平台包含HDFS(分布式文件系统)和MapReduce(并行计算框架)两个重要的子系统。Hadoop, an open source big data storage and analytics platform, has become the de facto standard for the industry to handle big data. The Hadoop platform consists of two important subsystems: HDFS (Distributed File System) and MapReduce (Parallel Computing Framework).
图1是相关技术中Hadoop存储数据的平台架构示意图,如图1所示,Hadoop是一个高度容错的多副本的集群存储分布式系统,适用于部署在廉价的机器上,并且Hadoop支持机器上多块硬盘的并行数据写入和读取。随着大数据的发展,数据量急剧增加,企业为了减少成本采用Hadoop平台部署在廉价的PC Server上,每台服务器上至少会配置24块以上的硬盘,目前最大的集群机器数量已经超过5000台,每台设备上有24块硬盘,整个集群硬盘数量达到了10多万块硬盘,基本每天都会有硬盘损坏。FIG. 1 is a schematic diagram of a platform architecture of Hadoop storage data in related art. As shown in FIG. 1 , Hadoop is a highly fault-tolerant multi-copy cluster storage distributed system, which is suitable for deployment on a cheap machine, and Hadoop supports multiple machines. Parallel data writing and reading of block hard disks. With the development of big data, the amount of data has increased sharply. Enterprises use Hadoop platform to deploy on cheap PC Server in order to reduce costs. At least 24 hard disks will be configured on each server. The largest number of cluster machines has exceeded 5,000. There are 24 hard disks on each device, and the number of hard disks in the entire cluster reaches more than 100,000 hard disks. Basically, hard disks are damaged every day.
在相关技术中,硬盘损坏,Hadoop所采用的处理如下:当设备中有硬盘损失,系统无法写入数据,由异常事件触发,Hadoop将损失硬盘移入到损坏硬盘列表中,系统在后续运行中将不再往损坏硬盘mount的目录写数据。系统在运行期间,将不再访问损坏硬盘列表,即不对损坏硬盘对应的目录进行校验。当运维人员将新硬盘替换掉坏掉硬盘后,将新硬盘mount到目录,系统在运行期间不认为当前硬盘可以使用。只有在重启hadoop进程后,系统才重新对所有数据目录进行检查,新硬盘参与系统的运行。In the related art, the hard disk is damaged. The processing adopted by Hadoop is as follows: when there is a hard disk loss in the device, the system cannot write data, and triggered by an abnormal event, Hadoop will move the lost hard disk into the damaged hard disk list, and the system will be in the subsequent operation. No more data is written to the directory where the hard disk mount is damaged. During the system running, the damaged hard disk list will no longer be accessed, that is, the directory corresponding to the damaged hard disk will not be verified. After the operation and maintenance personnel replace the new hard disk with the broken hard disk, mount the new hard disk to the directory. The system does not consider the current hard disk to be usable during the running. Only after restarting the hadoop process, the system will recheck all data directories and the new hard disk participates in the system.
然而上述硬盘损坏处理存在以下缺陷:(1)业务需要中断:由于必须要重启进程才可以使得新硬盘使用,在重启进程期间,业务需要中断,因而带来的业务上的损失。(2)运维成本高:硬盘损坏后,不但需要硬件工程师,还需要软件工程师对Hadoop集群进行重启和观察,大大增加了人力成本。However, the above hard disk damage processing has the following defects: (1) The service needs to be interrupted: since the new hard disk can be used because the process must be restarted, the service needs to be interrupted during the restart process, thereby causing a loss of business. (2) High operation and maintenance cost: After the hard disk is damaged, not only hardware engineers but also software engineers need to restart and observe the Hadoop cluster, which greatly increases the labor cost.
因此,在相关技术中对硬盘损坏的处理只采用重启系统的方式,导致中断业务,以及增加成本的问题。 Therefore, the processing of the hard disk damage in the related art only adopts a method of restarting the system, resulting in interruption of the service and an increase in cost.
发明内容Summary of the invention
本发明提供了一种基于Hadoop的硬盘损坏处理方法及装置,以至少解决相关技术中对硬盘损坏的处理只采用重启系统的方式,导致中断业务,以及增加成本的问题。The present invention provides a method and a device for processing a hard disk damage based on Hadoop, so as to at least solve the problem that the processing of the hard disk damage in the related art only adopts a method of restarting the system, resulting in interruption of the service and an increase in cost.
According to one aspect of the present invention, a Hadoop-based hard disk damage handling method is provided, including: detecting replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk; determining whether the replacement hard disk supports data read/write operations; and, if the determination result is yes, performing data read/write processing on the replacement hard disk.

Preferably, before the replacement-success information indicating that the damaged hard disk has been replaced with the replacement hard disk is detected, the method further includes: detecting hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; and interrupting data read/write operations on the damaged hard disk according to the hard disk abnormality information.

Preferably, before determining whether the replacement hard disk supports data read/write operations, the method further includes: establishing an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and determining, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

Preferably, before determining whether the replacement hard disk supports data read/write operations, the method further includes: dynamically updating the available directory and the unavailable directory according to detected hard disk state information, where the hard disk state information includes the replacement-success information or hard disk abnormality information.

Preferably, the replacement-success information indicating that the damaged hard disk has been replaced with the replacement hard disk is detected in at least one of the following ways: receiving a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; or scanning a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
According to another aspect of the present invention, a Hadoop-based hard disk damage handling device is provided, including: a first detection module, configured to detect replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk; a determination module, configured to determine whether the replacement hard disk supports data read/write operations; and a processing module, configured to perform data read/write processing on the replacement hard disk if the determination result is yes.

Preferably, the device further includes: a second detection module, configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; and an interruption module, configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.

Preferably, the device further includes: an establishment module, configured to establish an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and to determine, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

Preferably, the device further includes: an update module, configured to dynamically update the available directory and the unavailable directory according to detected hard disk state information, where the hard disk state information includes the replacement-success information or hard disk abnormality information.

Preferably, the first detection module includes at least one of: a receiving unit, configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; and a scanning unit, configured to scan a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
Through the present invention, replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk is detected; whether the replacement hard disk supports data read/write operations is determined; and, if the determination result is yes, data read/write processing is performed on the replacement hard disk. This solves the problem in the related art that hard disk damage is handled only by restarting the system, which interrupts service and increases cost, and thereby achieves replacement handling of a damaged hard disk without restarting the system or interrupting service.
Brief Description of the Drawings

The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:

FIG. 1 is a schematic diagram of the platform architecture for storing data in Hadoop in the related art;

FIG. 2 is a flowchart of a Hadoop-based hard disk damage handling method according to an embodiment of the present invention;

FIG. 3 is a structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 4 is a first preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 5 is a second preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 6 is a third preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 7 is a preferred structural block diagram of the first detection module 32 in a Hadoop-based hard disk damage handling device according to an embodiment of the present invention;

FIG. 8 is a flowchart of a hard disk exception handling method according to a preferred embodiment of the present invention;

FIG. 9 is a flowchart of a hard disk replacement-success handling method according to a preferred embodiment of the present invention.
Detailed Description

The present invention will be described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with each other.
This embodiment provides a Hadoop-based hard disk damage handling method. FIG. 2 is a flowchart of a Hadoop-based hard disk damage handling method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:

Step S202: replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk is detected;

Step S204: it is determined whether the replacement hard disk supports data read/write operations;

Step S206: if the determination result is yes, data read/write processing is performed on the replacement hard disk.
Through the above steps, event-response handling is added for the normal event of replacing a damaged hard disk with a replacement hard disk. In the related art, after a damaged hard disk is replaced with a replacement hard disk, read/write operations on the replacement hard disk can only be resumed by restarting the system. The above steps therefore solve the problem that hard disk damage is handled only by restarting the system, which interrupts service and increases cost, and achieve replacement handling of a damaged hard disk without restarting the system or interrupting service.

In addition to the event-response handling for the normal event added above, the method is also compatible with event-response handling for hard disk exceptions. For example, before the replacement-success information is detected, hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal may be detected, and data read/write operations on the damaged hard disk may be interrupted according to the hard disk abnormality information. This prevents the system from continuing to read and write data after the hard disk is damaged, which would otherwise cause a system crash.

Various approaches may be used to determine whether the replacement hard disk supports data read/write operations. For example, a corresponding marker directory may be established for each hard disk, and each hard disk may be classified, according to whether it can be enabled, under an available directory or an unavailable directory. That is, an available directory that supports data read/write operations and an unavailable directory that does not are established, and whether the replacement hard disk supports data read/write operations is determined according to the established available directory and unavailable directory.

Before determining whether the replacement hard disk supports data read/write operations, the available directory and the unavailable directory may also be dynamically updated according to detected hard disk state information, where the hard disk state information includes replacement-success information or hard disk abnormality information. For example, when hard disk abnormality information indicating abnormal data read/write operations on a damaged hard disk is detected, the directory corresponding to that hard disk is moved from the available directory to the unavailable directory; when replacement-success information indicating that the damaged hard disk has been replaced with a replacement hard disk is detected, the directory corresponding to that hard disk is moved from the unavailable directory back to the available directory. Whether a hard disk is read from and written to is thus controlled by dynamically updating the directory state.
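The dynamic directory update described above can be sketched in a few lines of Python. This is a simplified illustration with hypothetical names (`DataDirManager`, `on_disk_error`, `on_disk_replaced`), not the actual HDFS datanode code:

```python
class DataDirManager:
    """Tracks the available/unavailable directory lists described above
    (simplified sketch; real HDFS keeps richer per-volume state)."""

    def __init__(self, dirs):
        self.available = list(dirs)  # directories open for data read/write
        self.unavailable = []        # directories on damaged hard disks

    def on_disk_error(self, d):
        # Hard disk abnormality information: isolate the directory at once.
        if d in self.available:
            self.available.remove(d)
            self.unavailable.append(d)

    def on_disk_replaced(self, d):
        # Replacement-success information: re-enable without any restart.
        if d in self.unavailable:
            self.unavailable.remove(d)
            self.available.append(d)

mgr = DataDirManager(["/data1", "/data2", "/data3"])
mgr.on_disk_error("/data2")      # writes now skip /data2
mgr.on_disk_replaced("/data2")   # /data2 is active again, process still running
```

Because both transitions only move entries between two in-memory lists, no process restart is needed in either direction.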
It should be noted that the replacement-success information may be detected in various ways, for example at least one of the following two: receiving a hard disk mount event, triggered by the operating system layer, indicating that the replacement hard disk has replaced the damaged hard disk; or scanning a notification message, triggered by the service system layer, indicating that the directory of the damaged hard disk has changed from abnormal to normal.
This embodiment also provides a Hadoop-based hard disk damage handling device, which is used to implement the above embodiments and preferred implementations; what has already been described will not be repeated. As used below, the term "module" may implement a combination of software and/or hardware of predetermined functions. Although the devices described in the following embodiments are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 3 is a structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 3, the device includes a first detection module 32, a determination module 34, and a processing module 36, which are described below.

The first detection module 32 is configured to detect replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk; the determination module 34, connected to the first detection module 32, is configured to determine whether the replacement hard disk supports data read/write operations; and the processing module 36, connected to the determination module 34, is configured to perform data read/write processing on the replacement hard disk if the determination result is yes.

FIG. 4 is a first preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 4, in addition to all the modules shown in FIG. 3, the device further includes a second detection module 42 and an interruption module 44, which are described below.

The second detection module 42 is configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal; the interruption module 44, connected to the second detection module 42 and the first detection module 32, is configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.

FIG. 5 is a second preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 5, in addition to all the modules shown in FIG. 3, the device further includes an establishment module 52, which is described below.

The establishment module 52, connected to the first detection module 32 and the determination module 34, is configured to establish an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and to determine, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

FIG. 6 is a third preferred structural block diagram of a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 6, in addition to all the modules shown in FIG. 5, the device further includes an update module 62, which is described below.

The update module 62, connected to the establishment module 52 and the determination module 34, is configured to dynamically update the available directory and the unavailable directory according to detected hard disk state information, where the hard disk state information includes replacement-success information or hard disk abnormality information.

FIG. 7 is a preferred structural block diagram of the first detection module 32 in a Hadoop-based hard disk damage handling device according to an embodiment of the present invention. As shown in FIG. 7, the first detection module 32 includes at least one of a receiving unit 72 and a scanning unit 74, which are described below.

The receiving unit 72 is configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk; the scanning unit 74 is configured to scan a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
In the related art, processing is triggered only by exceptions: directories are handled only at startup, processing during operation is triggered only by abnormal events, and there is no corresponding processing flow for normal events, which causes service interruption and increased cost. In this embodiment, not only abnormal-event-triggered processing but also normal-event-triggered processing is considered, so that a newly replaced hard disk takes over from the damaged hard disk as soon as possible. This effectively overcomes the defect that existing Hadoop handles hard disks only at startup and on abnormal events, does not affect the operation of the existing system, and greatly reduces cost. The processing scheme combining normal-event triggering with abnormal-event triggering includes the following two important event-triggered processes: a hard disk write exception event and a hard disk replacement-success event. The hard disk write exception event triggers the Hadoop cluster to handle the current hard disk as abnormal, preventing subsequent data from continuing to be written to this hard disk, which would make system data unwritable and crash the system. The hard disk replacement-success event triggers the Hadoop cluster to activate the newly available hard disk, which is automatically recognized and used during operation.
These two event-triggered processes are described separately below:

Hard disk exception handling:

Step S1: Hadoop writes data, and the hard disk cannot be written to;

Step S2: the Hadoop process catches the hard disk write exception information;

Step S3: the Hadoop process removes the data directory corresponding to the current hard disk;

Step S4: Hadoop continues writing to the other available, normal hard disks.

It should be noted that, in step S1, when the hard disk cannot write data, the hardware itself reports the error through the operating system.

In step S2, the datanode process of the HDFS subsystem of Hadoop catches the current exception information.

In step S3, the datanode process moves the data directory corresponding to the abnormal hard disk into the unavailable directory.

In step S4, the datanode automatically isolates the removed, unavailable directory when writing data.
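Steps S1–S4 above can be modeled with a small sketch. The helper name `write_block` is hypothetical and the logic is deliberately simplified (the real datanode behavior is more involved): a write failure is caught, the directory is moved to the unavailable list, and the write falls through to the next normal disk.

```python
import os
import tempfile

def write_block(available, unavailable, name, data):
    """Try each available data directory in order; on a write error (step S2),
    move the directory to the unavailable list (step S3) and continue with
    the next normal disk (step S4). Simplified sketch, not HDFS code."""
    for d in list(available):
        try:
            path = os.path.join(d, name)
            with open(path, "wb") as f:
                f.write(data)
            return path
        except OSError:              # the hardware error surfaces here (step S1)
            available.remove(d)
            unavailable.append(d)
    raise IOError("no usable data directory left")

good = tempfile.mkdtemp()
available = ["/nonexistent-disk/data", good]  # first entry simulates a damaged disk
unavailable = []
path = write_block(available, unavailable, "blk_0001", b"payload")
```

After the call, the simulated damaged directory sits in the unavailable list and the block has landed on the remaining normal disk, without the writer being restarted.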
Hard disk replacement-success event handling:

Step S1: after the damaged hard disk has been successfully replaced, the hadoop process is triggered;

Step S2: the hadoop process checks whether the directory mounted on the newly replaced hard disk is usable;

Step S3: the hadoop process dynamically moves the current directory from the unavailable list to the available list;

Step S4: when the hadoop process writes data, data is written to the newly inserted hard disk.

It should be noted that, in step S1, when the hard disk has been replaced successfully, the operating system triggers the hadoop process.

In steps S2 and S3, the datanode process of the HDFS subsystem of hadoop is responsible for checking whether the current directory is usable, and re-enables the directory only if it is.

In step S4, the datanode automatically enables the newly inserted hard disk when writing data.
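The usability check in steps S2–S3 can be illustrated as a probe write. The helper `directory_usable` is a hypothetical stand-in; HDFS performs its own directory verification:

```python
import os
import tempfile
import uuid

def directory_usable(d):
    """Probe a mounted directory by creating and removing a marker file.
    Returns True only if the directory really accepts writes; only then
    would it be moved back from the unavailable list to the available list."""
    probe = os.path.join(d, ".probe-" + uuid.uuid4().hex)
    try:
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
        return True
    except OSError:
        return False

replacement = tempfile.mkdtemp()          # stands in for the newly mounted disk
ok = directory_usable(replacement)        # usable: safe to re-enable
bad = directory_usable("/no/such/mount")  # not usable: keep it unavailable
```

Probing with an actual write catches the case where a mount point exists but the underlying disk is absent or read-only, which a simple existence check would miss.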
With the hard disk replacement automatic-recognition access method provided by the above embodiments and preferred implementations, the hadoop process does not need to be restarted, which reduces the risk of service interruption and greatly simplifies the hadoop operation and maintenance mechanism; the method is reliable and effective and does not reduce write or read performance.

Preferred implementations of the present invention are described below.
FIG. 8 is a flowchart of a hard disk exception handling method according to a preferred embodiment of the present invention. As shown in FIG. 8, the flow includes the following steps:

Step S802: after Hadoop starts, HDFS automatically loads all configured data directories into memory, and Hadoop writes in parallel according to the available directory list;

Step S804: when HDFS writes hard disk data abnormally, the lower layer reports a data exception; it is determined whether a hard disk is damaged; if the determination result is yes, the flow proceeds to step S806, otherwise to step S808;

Step S806: HDFS catches the data exception, removes the damaged hard disk, and also removes the abnormal directory from memory;

Step S808: HDFS continues writing to the normal data directories in memory;

Step S810: data writing ends.
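The parallel writing of step S802 against the in-memory available directory list can be sketched as follows. The round-robin placement and the `parallel_write` helper are assumptions for illustration only; actual HDFS block placement works differently:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def parallel_write(available_dirs, blocks):
    """Write blocks concurrently, spreading them round-robin over the
    available directory list loaded into memory (simplified sketch)."""
    def write_one(item):
        i, (name, data) = item
        d = available_dirs[i % len(available_dirs)]  # round-robin over directories
        path = os.path.join(d, name)
        with open(path, "wb") as f:
            f.write(data)
        return path
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so paths line up with blocks.
        return list(pool.map(write_one, enumerate(blocks)))

dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
paths = parallel_write(dirs, [("blk_1", b"a"), ("blk_2", b"b")])
```

Because the writer only consults the in-memory list, shrinking or growing that list (steps S806 and S904–S906) changes placement immediately, with no restart.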
FIG. 9 is a flowchart of a hard disk replacement-success handling method according to a preferred embodiment of the present invention. As shown in FIG. 9, the flow includes the following steps:

Step S902: the hard-disk-normal notification can be implemented in either of the following two ways:

(1) The operating system layer triggers a hard disk mount event; when Hadoop captures this event, it performs the subsequent processing.

(2) The service system cyclically scans the abnormal directories; once an abnormal directory has returned to normal, Hadoop is notified and performs the next processing step.
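Method (2), the cyclic scan by the service system, can be sketched as a single scan pass. The names are hypothetical; `is_usable` and `notify` are injected callables standing in for the real directory check and the notification sent to Hadoop:

```python
def scan_abnormal_dirs(abnormal_dirs, is_usable, notify):
    """One pass of the cyclic scan: any abnormal directory that has returned
    to normal is removed from the list and reported, so Hadoop can perform
    the next processing step. Simplified sketch of method (2)."""
    for d in list(abnormal_dirs):
        if is_usable(d):
            abnormal_dirs.remove(d)
            notify(d)

recovered = []
abnormal = ["/data2", "/data5"]
# Simulate /data2 having returned to normal while /data5 is still abnormal.
scan_abnormal_dirs(abnormal,
                   is_usable=lambda d: d == "/data2",
                   notify=recovered.append)
```

In a deployment this pass would run on a timer; directories still abnormal simply stay on the list for the next cycle.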
Step S904: Hadoop receives the event that the abnormal directory has returned to normal, and HDFS checks again that the directory is usable, that is, the directory corresponding to the damaged hard disk is restored;

Step S906: HDFS updates this directory into memory, that is, the directory takes effect dynamically;

Step S908: HDFS writes data to the usable hard disk directories in memory (that is, the system can write data on the new hard disk) without restarting the HDFS process.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented with program code executable by computing devices, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described here, or they may be fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only of preferred embodiments of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.
Industrial Applicability

As described above, the foregoing embodiments and preferred implementations solve the problem in the related art that hard disk damage is handled only by restarting the system, which interrupts service and increases cost, and thereby achieve replacement handling of a damaged hard disk without restarting the system or interrupting service.

Claims (10)

  1. A Hadoop-based hard disk damage handling method, comprising:

    detecting replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk;

    determining whether the replacement hard disk supports data read/write operations;

    if the determination result is yes, performing data read/write processing on the replacement hard disk.

  2. The method according to claim 1, wherein, before the replacement-success information indicating that the damaged hard disk in the Hadoop system has been replaced with the replacement hard disk is detected, the method further comprises:

    detecting hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal;

    interrupting data read/write operations on the damaged hard disk according to the hard disk abnormality information.

  3. The method according to claim 1 or 2, wherein, before determining whether the replacement hard disk supports data read/write operations, the method further comprises:

    establishing an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and determining, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

  4. The method according to claim 3, wherein, before determining whether the replacement hard disk supports data read/write operations, the method further comprises:

    dynamically updating the available directory and the unavailable directory according to detected hard disk state information, wherein the hard disk state information comprises the replacement-success information or hard disk abnormality information.

  5. The method according to claim 1, 2, or 4, wherein the replacement-success information indicating that the damaged hard disk in the Hadoop system has been replaced with the replacement hard disk is detected in at least one of the following ways:

    receiving a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk;

    scanning a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
  6. A Hadoop-based hard disk damage handling device, comprising:

    a first detection module, configured to detect replacement-success information indicating that a damaged hard disk in a Hadoop system has been replaced with a replacement hard disk;

    a determination module, configured to determine whether the replacement hard disk supports data read/write operations;

    a processing module, configured to perform data read/write processing on the replacement hard disk if the determination result is yes.

  7. The device according to claim 6, further comprising:

    a second detection module, configured to detect hard disk abnormality information indicating that data read/write operations on the damaged hard disk are abnormal;

    an interruption module, configured to interrupt data read/write operations on the damaged hard disk according to the hard disk abnormality information.

  8. The device according to claim 6 or 7, further comprising:

    an establishment module, configured to establish an available directory that supports data read/write operations and an unavailable directory that does not support data read/write operations, and to determine, according to the established available directory and unavailable directory, whether the replacement hard disk supports data read/write operations.

  9. The device according to claim 8, further comprising:

    an update module, configured to dynamically update the available directory and the unavailable directory according to detected hard disk state information, wherein the hard disk state information comprises the replacement-success information or hard disk abnormality information.

  10. The device according to claim 6, 7, or 9, wherein the first detection module comprises at least one of:

    a receiving unit, configured to receive a hard disk mount event indicating that the replacement hard disk has replaced the damaged hard disk;

    a scanning unit, configured to scan a notification message indicating that the directory of the damaged hard disk has changed from abnormal to normal.
PCT/CN2014/087477 2014-05-22 2014-09-25 Hadoop-based hard disk damage handling method and device WO2015176455A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410220454.XA CN105095030B (en) 2014-05-22 2014-05-22 Hadoop-based hard disk damage handling method and device
CN201410220454.X 2014-05-22

Publications (1)

Publication Number Publication Date
WO2015176455A1 true WO2015176455A1 (en) 2015-11-26

Family

ID=54553336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/087477 WO2015176455A1 (en) 2014-05-22 2014-09-25 Hadoop-based hard disk damage handling method and device

Country Status (2)

Country Link
CN (1) CN105095030B (en)
WO (1) WO2015176455A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121620A (en) * 2017-12-22 2018-06-05 联想(北京)有限公司 The restorative procedure and system and server of distributed file system

Citations (5)

Publication number Priority date Publication date Assignee Title
US20080178040A1 (en) * 2005-05-19 2008-07-24 Fujitsu Limited Disk failure restoration method and disk array apparatus
US20080209263A1 (en) * 2007-02-27 2008-08-28 International Business Machines Corporation Rebuildling a failed disk in a disk array
CN101281452A (en) * 2007-04-05 2008-10-08 英业达股份有限公司 Method for automatically rebuilding hard disk
CN102355568A (en) * 2011-09-22 2012-02-15 杭州海康威视数字技术股份有限公司 Method and device for carrying out charged uninstallation and installation of hard disk for digital video recorder
CN102521058A (en) * 2011-12-01 2012-06-27 北京威视数据系统有限公司 Disk data pre-migration method of RAID (Redundant Array of Independent Disks) group

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
KR100644572B1 (en) * 1999-10-02 2006-11-13 삼성전자주식회사 Device operation detecting apparatus and method in directory serve
CN100449472C (en) * 2006-09-08 2009-01-07 华为技术有限公司 Method and apparatus for treating disc hot insert


Also Published As

Publication number Publication date
CN105095030A (en) 2015-11-25
CN105095030B (en) 2019-05-28

Similar Documents

Publication Publication Date Title
US10152382B2 (en) Method and system for monitoring virtual machine cluster
US8688642B2 (en) Systems and methods for managing application availability
US8856592B2 (en) Mechanism to provide assured recovery for distributed application
US9582373B2 (en) Methods and systems to hot-swap a virtual machine
US9652326B1 (en) Instance migration for rapid recovery from correlated failures
US9753954B2 (en) Data node fencing in a distributed file system
US20120174112A1 (en) Application resource switchover systems and methods
US9098439B2 (en) Providing a fault tolerant system in a loosely-coupled cluster environment using application checkpoints and logs
JP5579650B2 (en) Apparatus and method for executing monitored process
WO2018095414A1 (en) Method and apparatus for detecting and recovering fault of virtual machine
WO2014076838A1 (en) Virtual machine synchronization system
US8990608B1 (en) Failover of applications between isolated user space instances on a single instance of an operating system
CN107453932B (en) Distributed storage system management method and device
US20160266924A1 (en) Apparatus and method for identifying a virtual machine having changeable settings
US9444885B2 (en) Workflow processing in a distributed computing environment
US20190303233A1 (en) Automatically Detecting Time-Of-Fault Bugs in Cloud Systems
WO2015176455A1 (en) Hadoop-based hard disk damage handling method and device
CN109254880B (en) Method and device for processing database downtime
US11509555B2 (en) Determining operational status of Internet of Things devices
US10855521B2 (en) Efficient replacement of clients running large scale applications
CN110727652B (en) Cloud storage processing system and method for realizing data processing
CN114880150A (en) Fault isolation and field protection method and system
CN110688193B (en) Disk processing method and device
US20240095011A1 (en) State machine operation for non-disruptive update of a data management system
JPWO2014132466A1 (en) Software safe stop system, software safe stop method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 14892657
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 14892657
Country of ref document: EP
Kind code of ref document: A1