US20060200726A1 - Failure trend detection and correction in a data storage array - Google Patents

Publication number
US20060200726A1
Authority
US
United States
Prior art keywords
data
data storage
analysis
storage devices
devices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/070,942
Inventor
Robert Gittins
Robert Lester
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seagate Technology LLC
Original Assignee
Seagate Technology LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seagate Technology LLC
Priority to US11/070,942 (US20060200726A1)
Assigned to SEAGATE TECHNOLOGY LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GITTINS, ROBERT SHERWOOD; LESTER, ROBERT MICHAEL
Priority to JP2005202408A (JP5059304B2)
Publication of US20060200726A1
Priority to US11/867,543 (US7765437B2)
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/36 Monitoring, i.e. supervising the progress of recording or reproducing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00 Record carriers by type
    • G11B2220/20 Disc-shaped record carriers
    • G11B2220/25 Disc-shaped record carriers characterised in that the disc is based on a specific recording technology
    • G11B2220/2508 Magnetic discs
    • G11B2220/2516 Hard disks
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00 Record carriers by type
    • G11B2220/40 Combinations of multiple record carriers
    • G11B2220/41 Flat as opposed to hierarchical combination, e.g. library of tapes or discs, CD changer, or groups of record carriers that together store one title

Definitions

  • All of the parametric data collected and analyzed by the individual devices 100 are preferably forwarded to the data log 160 to accumulate the historical data into the log.
  • Another level of analysis provided in FIG. 7 is the aforementioned vertical analysis by the engine 162, depicted at block 176.
  • This provides a second level of verification capability. That is, the engine 162 can carry out the same analysis in tandem with the local processor 126, enhancing system reliability and reducing false positives.
  • The engine 162 can alternatively rely upon the local processors 126 to serve as first pass filter screens, so that alarms set by the individual devices 100 serve as inputs to the engine 162 to commence investigation and analysis at the controller level.
  • The engine 162 applies advanced statistical analyses to the existing data, and may use heuristic methods to request additional data not previously supplied by the associated device 100 (i.e., greater frequency of samples, reporting of other available but not normally reported parameters, etc.) in order to evaluate the situation and arrive at a decision with regard to whether a failure trend has in fact been detected and what corrective action, if any, should be taken.
  • The localized parametric optimization at the individual device level is eliminated, such being carried out instead by the more powerful engine 162.
  • The devices 100 merely upload the associated run-time parametric data to the log with no or minimal analysis thereof.
  • The freeing of system resources at the individual device level on the analysis end can be used to budget greater amounts of data (more samples as well as greater numbers of parameters) to the data log 160 by the individual devices.
  • The vertical analysis represented by block 176 is envisioned as replacing the localized parametric analysis performed by the individual devices 100 (block 174).
  • Because of the greater processing power of the controller 152, more complex and computationally intensive statistical processes can be applied to the data than are presently available.
  • Detection of an initial trend can result in tuned data requests via block 168 to the associated device 100 for more data to enhance the analysis.
  • Block 178 in FIG. 7 depicts the aforementioned horizontal analysis across multiple devices 100 in the MDA 140.
  • This level of analysis is preferably performed in addition to the vertical analyses of blocks 174 and/or 176, such as on a time or parameter basis.
  • The horizontal analysis of block 178 involves performing an analysis on at least a subset of the data in the history log 160, with the subset associated with multiple ones of the devices 100 in the MDA 140 (i.e., spread across multiple devices, or all of the devices in the array as required).
  • User-specified queries and analyses initiated through the GUI 166 are depicted at block 180. It will be noted that the various blocks in FIG. 7 can be utilized singly or in combination, and the output of one can automatically trigger the execution of another.
  • FIG. 8 illustrates one manner in which the analysis blocks can be advantageously utilized.
  • FIG. 8 provides a generic series of parametric history curves 182, 184, 186 and 188, graphically plotted against an index x-axis 190 and a common amplitude y-axis 192. It will be recognized that graphical depiction of the parameter sets is not necessarily required by the engine 162 in order to carry out the associated processes, but such graphs facilitate the present discussion and can readily be provided to the user via the GUI 166, as desired.
  • The curves 182, 184, 186 and 188 represent data for each of the devices 1, 2, 3 and N respectively associated with a particular parameter, in this case, error rate.
  • The data are represented such that lower values are “better” and higher values are “worse,” although such is merely one available formulation.
  • Associated baseline values are denoted via broken lines.
  • An increase in error rate in and of itself does not necessarily suggest a particular cause, but does allow immediate remedial corrective action to be taken, such as reallocation of the affected data, etc. so as to minimize the effects of the trend upon system performance. Further monitoring and diagnostics, however, can take place to isolate one or more causes, leading to elimination of the problem from the system.
  • Exemplary corrective actions include decommissioning of a particular head/media combination, substitution of a particular device for a standby “spare” within the MDA, application of a different RAID or ECC level, performance of routine scheduled maintenance, etc.
  • Analyzing the data across multiple devices within the MDA 140 provides further important information with regard to this event, namely, that only device N is presently experiencing the localized increase in error rate and the other devices are apparently not affected within the applicable time period. In other words, even at this point it appears that the failure event is isolated to the device N.
  • The unified data log approach provides superior analysis and corrective action operations even when the data event is isolated to a single device, and even when the same level of analysis is performed as would be performed at the individual device level.
  • Each of the curves 182, 184, 186, 188 represents a different parameter such as, for example, channel quality, servo qualification time, rotational vibration and off-track errors, respectively, for the same or different devices.
  • Inter-parametric correlations such as at 196 and 198 can be identified, allowing further insight into the inter-dependency of respective parameters.
  • Time lag relationships can also be established such as, for example, the decrease at 198 inducing the corresponding increase at 194. The identification of such relationships can better isolate the true cause of a particular event.
  • The recited first means will be understood to correspond to the controller structure set forth in FIG. 5, with the engine configured to carry out horizontal analyses as depicted in FIGS. 6 and 7.
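  • The time lag relationships between parameters mentioned above can be illustrated with a short Python sketch. It scans a range of candidate lags and reports the one at which two parameter histories are most strongly correlated. The use of a Pearson correlation, the helper names `pearson` and `best_lag`, and the sample values are assumptions made for illustration; the source does not name a particular statistical method.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def best_lag(leader, follower, max_lag):
    """Return (lag, r) with the largest |r| between leader[t] and follower[t + lag]."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        xs = leader[:len(leader) - lag]
        ys = follower[lag:]
        n = min(len(xs), len(ys))
        if n < 3:
            break  # too few overlapping samples to be meaningful
        r = pearson(xs[:n], ys[:n])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Hypothetical histories: channel quality drops, error rate rises two samples later.
channel_quality = [9, 9, 9, 6, 3, 3, 3, 3, 3, 3]
error_rate = [1, 1, 1, 1, 1, 4, 7, 7, 7, 7]
lag, r = best_lag(channel_quality, error_rate, max_lag=4)
print(lag, round(r, 3))
```

  • A strongly negative correlation at a positive lag would be consistent with a decrease in one parameter preceding an increase in another, although correlation alone does not establish which event caused which.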

Abstract

Method and apparatus for detecting and correcting parametric failure trends in a data storage array. A plurality of data storage devices, such as hard disc drives, are arranged to form a multi-device addressable memory array space. A controller controls access to the array space, and is configured to accumulate operational performance data from each of the devices into a history log. A statistical analysis engine of the controller analyzes the data to detect anomalous operation of the devices, including a horizontal analysis of data across multiple devices. The controller initiates a data storage device specific corrective action event in response to the analysis, as required. The analysis by the engine can be in addition to, or in lieu of, analysis by the individual devices. A data request block requests additional data samples for a given parameter, or requests additional parametric data to further the analysis.

Description

    FIELD OF THE INVENTION
  • The claimed invention relates generally to the field of data storage systems and more particularly, but not by way of limitation, to an apparatus and method for detecting and correcting parametric failure trends in a data storage array.
  • BACKGROUND
  • Multi-device arrays (MDAs) are relatively large data space storage systems comprising a number of data storage devices, such as hard disc drives (HDDs), that are grouped together to provide an inter-device addressable memory space. MDAs are increasingly used in a wide variety of data intensive applications, web servers and other network accessed systems.
  • Individual data storage devices can be equipped with routines that monitor various operational parameters to provide early failure trend detection capabilities. This allows a user to take appropriate corrective action, such as reallocation or replacement of the associated data storage device, prior to a system failure event that adversely affects other portions of the system.
  • While such approaches are operable, with the continued increase in the reliance upon and use of MDAs, there remains a continual need for improvements in the manner in which failure trends can be analyzed and system failure events can be avoided.
  • SUMMARY OF THE INVENTION
  • Preferred embodiments of the present invention are generally directed to an apparatus and method for detecting and correcting parametric failure trends in a data storage array.
  • In accordance with preferred embodiments, a plurality of data storage devices, such as hard disc drives, are arranged to form a multi-device addressable memory array space. A controller is provided to control access to the array space.
  • The controller is configured to accumulate operational performance data from each of the devices into a history log. A statistical analysis engine of the controller analyzes the data to detect anomalous operation of the devices, including a horizontal analysis of data across multiple devices. The controller utilizes a corrective action module to initiate a data storage device specific corrective action event in response to the analysis, as required.
  • The analysis by the engine can be in addition to, or in lieu of, analysis by the individual devices. A data request block requests additional data samples for a given parameter, or requests additional parametric data to further the analysis. A graphical user interface (GUI) reports alarm indications to a system user, as well as facilitates user-specified data collection and analyses.
  • These and various other features and advantages which characterize the claimed invention will become apparent upon reading the following detailed description and upon reviewing the associated drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is an exploded view of a data storage device constructed and operated in accordance with preferred embodiments of the present invention.
  • FIG. 2 is a generalized functional block diagram of the device of FIG. 1.
  • FIG. 3 illustrates relevant portions of a multi-device array (MDA) formed from a plurality of data storage devices such as shown in FIGS. 1 and 2.
  • FIG. 4 represents a network system utilizing a number of the MDAs such as shown in FIG. 3.
  • FIG. 5 provides a generalized functional block diagram of operation of a selected MDA/controller sub-system of FIG. 4 in accordance with preferred embodiments of the present invention.
  • FIG. 6 shows a preferred format for the data log of FIG. 5.
  • FIG. 7 provides a flow of alternative statistical analysis strategies carried out by the sub-system of FIG. 5.
  • FIG. 8 graphically illustrates a number of parametric data sets to better set forth preferred operation of the sub-system of FIG. 5.
  • DETAILED DESCRIPTION
  • FIG. 1 shows an exploded view of a data storage device 100. The device 100 is preferably characterized as a 3.5 inch form factor hard disc drive of the type used to store and retrieve computerized data, but such is not limiting to the scope of the claimed subject matter.
  • The device 100 includes a rigid, environmentally controlled housing 102 formed from a base deck 104 and a top cover 106. A spindle motor 108 is mounted within the housing 102 to rotate a number of data storage media 110 at a relatively high speed.
  • Data are arranged on the media 110 in concentric tracks (not shown) which are accessed by a corresponding array of data transducing heads 112. The heads 112 (transducers) are supported by an actuator 114 and moved across the media surfaces by application of current to a voice coil motor, VCM 116. A flex circuit assembly 118 facilitates communication between the actuator 114 and control circuitry on an externally mounted printed circuit board, PCB 120.
  • As shown in FIG. 2, the control circuitry preferably includes an interface circuit 124 which communicates with a host device using a suitable interface protocol. A top level processor 126 provides top level control for the device 100 and is preferably characterized as a programmable, general purpose processor with suitable programming to direct the operation of the device 100.
  • A read/write channel 128 operates in conjunction with a preamplifier/driver circuit (preamp) 130 to write data to and to recover data from the media 110. A servo circuit 132 provides closed loop positional control for the heads 112.
  • It is contemplated that the processor 126 can include programming routines to carry out failure trend detection during operation of the device 100. As those skilled in the art will recognize, various parameters associated with the operation of the device 100 can be monitored over time, and variation in the values of these parameters can signal the onset of degraded performance or imminent failure. Parameters that can be monitored in this way include, but are not limited to, read error rates, channel quality, head bias current magnitudes, servo positioning times, spindle motor speed, vibration levels, operational temperature levels, the occurrence of thermal asperities or other grown defects on the media, etc.
  • In one approach, preselected threshold levels for the various parameters are established. When an associated threshold is reached, the device 100 provides an alarm to the end user who can then take appropriate corrective action to ensure system data integrity, such as reallocation of the data stored by the device and replacement of the failed device with a new unit.
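  • As a minimal sketch of the threshold approach just described, the following Python snippet compares monitored values against preselected limits and reports which parameters have reached them. The parameter names, the limit values, and the `check_thresholds` helper are hypothetical and chosen only for illustration; the source does not specify them.

```python
# Hypothetical per-parameter limits; the values are illustrative only.
THRESHOLDS = {
    "read_error_rate": 1e-6,  # alarm when at or above this rate
    "temperature_c": 60.0,    # alarm when at or above this temperature
}

def check_thresholds(sample, thresholds=THRESHOLDS):
    """Return the sorted names of parameters whose value has reached its threshold."""
    return sorted(
        name
        for name, value in sample.items()
        if name in thresholds and value >= thresholds[name]
    )

# One monitoring sample from a single device.
alarms = check_thresholds({"read_error_rate": 2e-6, "temperature_c": 41.5})
print(alarms)
```

  • This sketch treats higher values as worse for every parameter. A parameter for which lower values indicate degradation would need the comparison reversed.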
  • It is becoming increasingly common to incorporate multiple sets of the devices 100 into a multi-device array (MDA), such as generally represented at 140 in FIG. 3. The MDA 140 pools the data storage capacity of the devices 100 to provide a single, relatively large addressable memory space. Well-known RAID techniques are preferably employed to distribute the recording of data across the various devices 100.
  • The N devices 100 are arranged to communicate with a common input/output block 142. A power supply block 144 and a battery back-up supply 146 are included to meet the normal and standby requirements of the MDA 140.
  • Although not depicted in FIG. 3, it will be understood that the components are preferably arranged into a common housing so as to provide a single plug-and-play unit which can be incorporated into a rack or other system. Additional elements such as cooling fans and interconnection backplanes are omitted for clarity of illustration, and redundant sets of the components shown in FIG. 3 (e.g., two power supplies, two battery back-ups, etc.) are preferably incorporated into the MDA 140 to enhance system reliability and availability.
  • FIG. 4 illustrates a network 150 in which a number of MDAs such as 140 are incorporated. Each MDA 140 is shown to have an associated controller 152 which controls access to each respective MDA 140. Each controller 152 preferably includes a relatively powerful general purpose processor and a relatively large cache memory space to control large scale data transfers with the MDA 140.
  • Although not shown, preferably two controllers 152 and two MDAs 140 are operated in tandem at each location for redundancy. The controllers 152 communicate with a number of host computers 154 through a fabric 156, which can comprise the Internet, a wide area network, or other network connection system.
  • FIG. 5 illustrates a preferred operational architecture of each controller/MDA combination from FIG. 4. As explained in greater detail below, operational parametric data from each of the devices 100 in the MDA 140 are accumulated by the controller 152 into a data log 160.
  • A statistical analysis engine 162 analyzes the data and, when appropriate, initiates a data storage device specific corrective action event using a corrective action module 164. The module 164 interfaces with a GUI 166 (graphical user interface) to provide visual and/or audible alarm indicators and other outputs to a user. The GUI 166 further allows access to the engine 162 to initiate user-specified data requests and analyses. The engine 162 further provides parametric monitoring data requests via command block 168 to adjust the types and/or sampling frequency of parametric data supplied to the log 160, as required.
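  • The feedback path from the engine to the devices can be sketched as follows. In this Python example a stand-in for the command block shortens the sampling interval of any device whose latest sample departs from a baseline. The class name `CommandBlock`, the interval values, the halving policy, and the `suspicious` test are all assumptions made for illustration and are not taken from the source.

```python
from dataclasses import dataclass, field

@dataclass
class CommandBlock:
    """Stand-in for command block 168: tracks the sampling interval requested of each device."""
    default_interval_s: float = 60.0
    min_interval_s: float = 1.0
    intervals: dict = field(default_factory=dict)

    def request_more_samples(self, device_id):
        """Halve the device's sampling interval, bounded below by min_interval_s."""
        current = self.intervals.get(device_id, self.default_interval_s)
        self.intervals[device_id] = max(self.min_interval_s, current / 2)
        return self.intervals[device_id]

def suspicious(samples, baseline, tolerance):
    """Flag a device whose most recent sample departs from baseline by more than tolerance."""
    return bool(samples) and abs(samples[-1] - baseline) > tolerance

block = CommandBlock()
log = {1: [0.10, 0.11, 0.10], 2: [0.10, 0.12, 0.31]}  # hypothetical samples per device
for device_id, samples in log.items():
    if suspicious(samples, baseline=0.10, tolerance=0.05):
        block.request_more_samples(device_id)
print(block.intervals)
```

  • Only the device showing a departure is asked for denser data, which keeps the additional reporting load confined to the device under investigation.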
  • The log 160 is preferably stored in a designated portion of the non-volatile memory space provided by the devices 100 in the MDA 140. From here, the entire log or selected portions thereof are uploaded into the cache memory space of the controller 152 to allow access by the engine 162. Alternatively, a separate memory space (including a dedicated array) accessible by the controller 152 can be provided to store the parametric data from the devices 100.
  • It is contemplated that the log 160 can take any number of forms, depending on the requirements of a given application. A particularly useful format is generally set forth by FIG. 6, which provides individual parametric data from each device 100 in separate “columns” using a common index (such as elapsed time).
  • Thus for example, the column for device 1 can comprise all of the data for a single parameter (e.g., channel quality) in historical sequence over time, with later obtained CQ measurements appended at the end. Similar data are provided in adjacent columns for each of the remaining devices 2-N. Separate “sheets” can be formed to track each of the different operational parameters being monitored.
  • Other constructs for the data log 160 are readily envisioned, however, including formats that group all or related subsets of correlated parameters into the same table, or that provide a different sheet per device. Regardless, the log represents historical parametric data across all of the relevant devices 100 in the MDA 140.
  • This facilitates the execution of a vertical analysis by the engine 162 upon data associated with a single one of the devices 100, as represented by vertical data block 170, as well as a horizontal analysis by the engine 162 across multiple devices, as represented by horizontal data block 172.
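  • One possible arrangement of such a log can be sketched as follows. This is a minimal illustration only, assuming an in-memory structure; the class name `DataLog`, its method names, and the device and parameter labels are hypothetical and do not appear in the disclosure. Each parameter is held as a separate "sheet", each device as a column, and rows share a common index.

```python
from collections import defaultdict


class DataLog:
    """Hypothetical in-memory stand-in for the data log 160."""

    def __init__(self, device_ids):
        self.device_ids = list(device_ids)
        # One "sheet" per parameter; each sheet maps device id -> column of samples.
        self.sheets = defaultdict(lambda: {d: [] for d in self.device_ids})

    def append(self, parameter, samples):
        """Append one row (one sample per device) at the next index value."""
        sheet = self.sheets[parameter]
        for device_id in self.device_ids:
            sheet[device_id].append(samples[device_id])

    def vertical(self, parameter, device_id):
        """History of one parameter for a single device (compare block 170)."""
        return list(self.sheets[parameter][device_id])

    def horizontal(self, parameter, index):
        """One parameter across all devices at a common index (compare block 172)."""
        return {d: self.sheets[parameter][d][index] for d in self.device_ids}


log = DataLog(["dev1", "dev2", "devN"])
log.append("error_rate", {"dev1": 1.0, "dev2": 1.1, "devN": 1.2})
log.append("error_rate", {"dev1": 1.0, "dev2": 1.2, "devN": 2.5})
print(log.vertical("error_rate", "devN"))  # [1.2, 2.5]
print(log.horizontal("error_rate", 1))     # {'dev1': 1.0, 'dev2': 1.2, 'devN': 2.5}
```

  • In this sketch a vertical slice returns a single column, whereas a horizontal slice returns a single row; a horizontal data block spanning several rows could be obtained by repeating the row lookup over a range of index values.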
  • A hierarchy of potential analysis modes is thus envisioned, as set forth by FIG. 7. In some preferred embodiments, the individual devices 100 continue as originally configured to carry out separate monitoring of selected parameters during operation. This is signified by block 174. Such operation is separately carried out by the local top level processor 130 (FIG. 2) in each device.
  • In this example, when a particular parameter is found to be out-of-bounds, an alarm indication can be transmitted via the local I/F block 124 to the MDA I/O block 142, which notifies the controller 152. The controller 152 takes the appropriate action, such as logging the event or notifying the user via the corrective action module 164 and GUI 166. Depending upon the severity of the event, the appropriate corrective action may be taken at the device level, by the device in response to a specific command control input by the controller, or by user intervention.
  • In addition to the foregoing operation, all of the parametric data collected and analyzed by the individual devices 100 are preferably forwarded to the data log 160 to accumulate the historical data into the log.
  • Another level of analysis provided in FIG. 7 is the aforementioned vertical analysis by the engine 162, depicted at block 176. Using the above example where the individual devices 100 continue to perform in situ parametric analysis, this provides a second level of verification capability. That is, the engine 162 can carry out the same analysis in tandem with the local processor 130, enhancing system reliability and reducing false positives.
  • The engine 162 can alternatively rely upon the local processors 130 to serve as first pass filter screens, so that alarms set by the individual devices 100 serve as inputs to the engine 162 to commence investigation and analysis at the controller level. In this case, the engine 162 applies advanced statistical analyses to the existing data, and may use heuristic methods to request additional data not previously supplied by the associated device 100 (i.e., greater frequency of samples, reporting of other available but not normally reported parameters, etc.) in order to evaluate the situation and arrive at a decision with regard to whether a failure trend has in fact been detected and what corrective action, if any, should be taken.
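  • The first-pass filter arrangement can be illustrated with a short sketch in which a device-level alarm causes the engine to request denser and broader data from the affected device. The `MonitoringRequest` structure, the `on_device_alarm` function, and the policy of halving the sampling interval are assumptions made for illustration; the disclosure does not specify the format of the requests issued via command block 168.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringRequest:
    """Hypothetical payload for a parametric monitoring data request."""
    device_id: str
    sample_interval_s: float
    parameters: list = field(default_factory=list)


def on_device_alarm(device_id, parameter, current_interval_s,
                    extra_parameters, min_interval_s=1.0):
    """Respond to a device-level alarm by asking for more data.

    The sampling interval is halved (an assumed policy), but never below
    min_interval_s, and normally unreported parameters are added.
    """
    new_interval = max(min_interval_s, current_interval_s / 2)
    return MonitoringRequest(
        device_id=device_id,
        sample_interval_s=new_interval,
        parameters=[parameter, *extra_parameters],
    )


request = on_device_alarm("devN", "error_rate", 60.0, ["channel_quality"])
print(request.sample_interval_s)  # 30.0
print(request.parameters)         # ['error_rate', 'channel_quality']
```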
  • In another alternative embodiment, the localized parametric optimization at the individual device level is eliminated, such being carried out instead by the more powerful engine 162. In this case the devices 100 merely upload the associated run-time parametric data to the log with no or minimal analysis thereof.
  • An advantage of this particular approach is the simplification of the design and programming of the individual devices, since the power and resources required for such analysis can be eliminated from the design. It will be appreciated by those skilled in the art that such simplifications can result in a not insignificant cost savings per device, which when multiplied by the sheer volume of devices incorporated into the MDAs can result in significant cost savings and system availability advances.
  • Alternatively, the freeing of system resources at the individual device level on the analysis end can be used to budget greater amounts of data (more samples as well as greater numbers of parameters) to the data log 160 by the individual devices.
  • Accordingly, in this alternative approach the vertical analysis represented by block 176 is envisioned as replacing the localized parametric analysis performed by the individual devices 100 (block 174). As before, because of the greater processing power of the controller 152, more complex and computationally intensive statistical processes can be applied to the data than are presently available. Moreover, detection of an initial trend can result in tuned data requests via block 168 to the associated device 100 for more data to enhance the analysis.
  • Block 178 in FIG. 7 depicts the aforementioned horizontal analysis across multiple devices 100 in the MDA 140. This level of analysis is preferably performed in addition to the device-level and vertical analyses of blocks 174 and/or 176, such as on a time or parameter basis. It will be noted that the horizontal analysis of block 178 involves performing an analysis on at least a subset of the data in the history log 160, with the subset associated with multiple ones of the devices 100 in the MDA 140 (i.e., spread across multiple devices, or all of the devices in the array as required).
  • User-specified queries and analyses initiated through the GUI 166 are depicted at block 180. It will be noted that the various blocks in FIG. 7 can be utilized singly or in combination, and the output of one can automatically trigger the execution of another.
  • FIG. 8 illustrates one manner in which the analysis blocks can be advantageously utilized. FIG. 8 provides a generic series of parametric history curves 182, 184, 186 and 188, graphically plotted against an index x-axis 190 and a common amplitude y-axis 192. It will be recognized that graphical depiction of the parameter sets is not necessarily required by the engine 162 in order to carry out the associated processes, but such graphs facilitate the present discussion and can readily be provided to the user via the GUI 166, as desired.
  • In a first example, it will be contemplated that the curves 182, 184, 186 and 188 represent data for each of the devices 1, 2, 3 and N respectively associated with a particular parameter, in this case, error rate. The data are represented such that lower values are “better” and higher values are “worse,” although such is merely one available formulation. Associated baseline values are denoted via broken lines.
  • It can be seen that a significant upward trend in error rate for device N (denoted locally at 194) can be readily detected, either by trend analysis (moving average, etc.) or via cross-over of an associated threshold (not shown).
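  • The two detection approaches mentioned above can be combined in a simple sketch: a trailing moving average smooths the history for one device, and the smoothed value is compared against a baseline plus a threshold. The function names, the window length, and the sample values are assumptions chosen for illustration and are not taken from the disclosure.

```python
def moving_average(values, window):
    """Trailing moving average; yields one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def detect_upward_trend(values, window, baseline, threshold):
    """Return the first sample index at which the smoothed value exceeds
    baseline + threshold, or None if no crossover occurs."""
    smoothed = moving_average(values, window)
    for offset, avg in enumerate(smoothed):
        if avg - baseline > threshold:
            # Map the position in the smoothed series back to a sample index.
            return offset + window - 1
    return None


# Illustrative error-rate history for one device (lower is "better").
error_rate_device_n = [1.0, 1.1, 0.9, 1.0, 1.2, 1.6, 2.1, 2.7, 3.4]
print(detect_upward_trend(error_rate_device_n, window=3,
                          baseline=1.0, threshold=0.5))  # 6
```

  • Smoothing before the threshold test makes the detection less sensitive to a single noisy sample, at the cost of reporting the trend a few samples later than a raw threshold would.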
  • An increase in error rate in and of itself does not necessarily suggest a particular cause, but does allow immediate remedial corrective action to be taken, such as reallocation of the affected data, etc. so as to minimize the effects of the trend upon system performance. Further monitoring and diagnostics, however, can take place to isolate one or more causes, leading to elimination of the problem from the system. Exemplary corrective actions include decommissioning of a particular head/media combination, substitution of a particular device for a standby “spare” within the MDA, application of a different RAID or ECC level, performance of routine scheduled maintenance, etc.
  • Continuing with this example, it will be noted that analyzing the data across multiple devices within the MDA 140 provides further important information with regard to this event, namely, that only device N is presently experiencing the localized increase in error rate and the other devices are apparently not affected within the applicable time period. In other words, even at this point it appears that the failure event is isolated to the device N.
  • The reader may note that the same knowledge would appear to be available simply relying upon the separate, individual device level analysis of block 174, but this is not the case; the failure of any of the other devices in the array to identify an out-of-bounds condition trend is not the same as knowing globally what the specific data are for each of the devices at the same time. Accordingly, the unified data log approach provides superior analysis and corrective action operations even when the data event is isolated to a single device, and even when the same level of analysis is performed as would be performed at the individual device level.
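  • A horizontal comparison of this kind can be sketched by measuring each device against the median of the array at the same index value. The function name `horizontal_outliers`, the use of the median as the reference, and the tolerance value are assumptions made for illustration; the disclosure does not prescribe a particular statistic.

```python
from statistics import median


def horizontal_outliers(sheet, tolerance):
    """Flag, per device, the index values at which that device exceeds the
    array median for the same index by more than `tolerance`.

    `sheet` maps device id -> list of samples on a common index.
    """
    devices = list(sheet)
    length = min(len(sheet[d]) for d in devices)
    flagged = {d: [] for d in devices}
    for i in range(length):
        centre = median(sheet[d][i] for d in devices)
        for d in devices:
            if sheet[d][i] - centre > tolerance:
                flagged[d].append(i)
    return flagged


# Illustrative error-rate sheet: only devN departs from its peers.
sheet = {
    "dev1": [1.0, 1.1, 1.0, 1.1],
    "dev2": [0.9, 1.0, 1.1, 1.0],
    "dev3": [1.1, 1.0, 1.0, 1.1],
    "devN": [1.0, 1.2, 2.4, 3.1],
}
print(horizontal_outliers(sheet, tolerance=0.5))
# {'dev1': [], 'dev2': [], 'dev3': [], 'devN': [2, 3]}
```

  • Because the reference is computed from the peers at the same index, an increase shared by every device would not be flagged here as a single-device outlier, which is the kind of distinction that separate device-level alarms cannot make on their own.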
  • Continuing with another example using FIG. 8, it will now be contemplated that each of the curves 182, 184, 186, 188 represents a different parameter such as, for example, channel quality, servo qualification time, rotational vibration and off-track errors, respectively, for the same or different devices. In this case, inter-parametric correlations such as at 196 and 198 can be identified, allowing further insight into the inter-dependency of respective parameters. Time lag relationships can also be established such as, for example, the decrease at 198 inducing the corresponding increase at 194. The identification of such relationships can better isolate the true cause of a particular event.
  • For example, it might be determined that the device associated with curve 184 (device 2) is inducing the error in curve 188 (device N) by way of acting upon the device represented by curve 186 (device 3). Thus, adjustment or replacement of device 2 would resolve the operational difficulties experienced by devices 3 and N, and so on.
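  • A time lag relationship between two logged series can be estimated, for example, by computing a correlation coefficient at several candidate lags and keeping the lag with the strongest correlation. The following sketch uses the Pearson coefficient; the function names and the sample series are assumptions for illustration, and a strong correlation at some lag suggests, but does not by itself establish, that one series induces the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)


def best_lag(leader, follower, max_lag):
    """Return (lag, r) with the largest |r|, comparing leader[t] with
    follower[t + lag] for lag = 0..max_lag. Series must be equal length."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        xs = leader[:len(leader) - lag]
        ys = follower[lag:]
        r = pearson(xs, ys)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Illustrative series: a decrease in one parameter is followed two
# samples later by an increase in another.
leader = [5, 5, 5, 4, 3, 2, 2, 2, 2, 2]
follower = [1, 1, 1, 1, 1, 2, 3, 4, 4, 4]
lag, r = best_lag(leader, follower, max_lag=4)
print(lag)  # 2
```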
  • It will now be appreciated that the preferred embodiments of the present invention as set forth herein present advantages over the prior art. Using the data log 160 to accumulate historical data across a number of the devices 100 can provide cost savings and the freeing of system resources, deeper and global analysis of the parametric data on a per device basis, and analysis of the data across multiple devices.
  • For purposes of the appended claims, the recited first means will be understood to correspond to the controller structure set forth in FIG. 5, with the engine configured to carry out horizontal analyses as depicted in FIGS. 6 and 7.
  • It is to be understood that even though numerous characteristics and advantages of various embodiments of the present invention have been set forth in the foregoing description, together with details of the structure and function of various embodiments of the invention, this detailed description is illustrative only, and changes may be made in detail, especially in matters of structure and arrangements of parts within the principles of the present invention to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed. For example, the particular elements may vary depending on the particular control environment without departing from the spirit and scope of the present invention.
  • In addition, although the embodiments described herein are directed to a multiple disc array that employs a number of hard disc drives to present a common addressable memory space, it will be appreciated by those skilled in the art that the claimed subject matter is not so limited and various other data storage systems, including optical based and solid state data storage devices, can readily be utilized without departing from the spirit and scope of the claimed invention.

Claims (20)

1. An apparatus comprising a plurality of data storage devices arranged to form a multi-device array space and a controller which controls access to the array space, the controller configured to accumulate operational performance data from each of the plurality of data storage devices into a history log, to analyze said data to detect anomalous operation of said devices, and to initiate a data storage device specific corrective action event in relation to said analysis.
2. The apparatus of claim 1, wherein each of the plurality of data storage devices analyzes the operational performance data that is accumulated into the history log associated with said device to detect anomalous operation of said device.
3. The apparatus of claim 1, wherein the analysis of said data by the controller comprises parametric data associated with multiple ones of the plurality of the data storage devices.
4. The apparatus of claim 1, wherein the controller comprises a statistical analysis engine which operates upon the data stored in the data log to analyze said data.
5. The apparatus of claim 4, wherein the controller further comprises a corrective action module which forwards an alarm indication to a user of the system in response to detection of said anomalous operation of said devices.
6. The apparatus of claim 4, further comprising a graphical user interface in communication with the engine to facilitate user-specified analysis by the engine upon the data accumulated in the data log.
7. The apparatus of claim 4, further comprising a data request block in communication with the engine which issues a request to at least a selected one of the data storage devices to provide additional data to the data log in response to the engine.
8. The apparatus of claim 1, wherein the data log is stored in the array space established by the plurality of data storage devices.
9. The apparatus of claim 1, wherein each of the plurality of data storage devices is characterized as a hard disc drive comprising at least one rotatable data storage medium accessed by a moveable transducer.
10. An apparatus, comprising:
a plurality of data storage devices arranged to form a multi-device memory array space; and
first means for accumulating operational performance data from each of the plurality of data storage devices, for performing an analysis of a subset of said data associated with multiple ones of said devices, and for providing an alarm indication to a user in response to detection of an anomalous event as a result of said analysis.
11. The apparatus of claim 10, wherein at least one of the plurality of data storage devices performs an analysis of the accumulated operational performance data, and wherein the first means operates in response to the analysis performed by the at least one of the plurality of data storage devices.
12. The apparatus of claim 10, wherein the first means further issues a data request command to at least one of the plurality of data storage devices to supply additional data for accumulation and analysis by the first means.
13. A method comprising:
arranging a plurality of data storage devices to form a multi-device memory array space; and
providing a controller which controls access to the array space, the controller configured to accumulate operational performance data from each of the plurality of data storage devices into a history log, to analyze said data to detect anomalous operation of said devices, and to initiate a data storage device specific corrective action event in relation to said analysis.
14. The method of claim 13, further comprising a step of configuring each of the plurality of data storage devices to separately analyze the operational performance data that is accumulated into the history log associated with said device to detect anomalous operation of said device.
15. The method of claim 13, wherein the analysis of said data during the providing step comprises an analysis of parametric data associated with multiple ones of the plurality of the data storage devices.
16. The method of claim 13, wherein the controller of the providing step comprises a statistical analysis engine which operates upon the data stored in the data log to analyze said data.
17. The method of claim 16, wherein the controller of the providing step further comprises a corrective action module which forwards an alarm indication to a user of the system in response to detection of said anomalous operation of said devices.
18. The method of claim 16, wherein the controller of the providing step further comprises a data request block in communication with the engine which issues a request to at least a selected one of the data storage devices to provide additional data to the data log in response to the analysis performed by the engine.
19. The method of claim 13, wherein the data log of the providing step is stored in the array space formed from the plurality of data storage devices.
20. The method of claim 13, wherein each of the plurality of data storage devices is characterized as a hard disc drive comprising at least one rotatable data storage medium accessed by a moveable transducer.
US11/070,942 2005-03-03 2005-03-03 Failure trend detection and correction in a data storage array Abandoned US20060200726A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US11/070,942 US20060200726A1 (en) 2005-03-03 2005-03-03 Failure trend detection and correction in a data storage array
JP2005202408A JP5059304B2 (en) 2005-03-03 2005-07-12 Apparatus and method for detecting and correcting failure trends in data storage arrays
US11/867,543 US7765437B2 (en) 2005-03-03 2007-10-04 Failure trend detection and correction in a data storage array

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/867,543 Continuation US7765437B2 (en) 2005-03-03 2007-10-04 Failure trend detection and correction in a data storage array

Publications (1)

Publication Number Publication Date
US20060200726A1 true US20060200726A1 (en) 2006-09-07

Family

ID=36945437

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/070,942 Abandoned US20060200726A1 (en) 2005-03-03 2005-03-03 Failure trend detection and correction in a data storage array
US11/867,543 Active 2025-08-20 US7765437B2 (en) 2005-03-03 2007-10-04 Failure trend detection and correction in a data storage array

Country Status (2)

Country Link
US (2) US20060200726A1 (en)
JP (1) JP5059304B2 (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060218439A1 (en) * 2005-03-23 2006-09-28 Microsoft Corporation Threat event-driven backup
US20070074083A1 (en) * 2005-09-20 2007-03-29 Seagate Technology Llc Preventive recovery from adjacent track interference
US20110010698A1 (en) * 2009-07-13 2011-01-13 Apple Inc. Test partitioning for a non-volatile memory
US20110119090A1 (en) * 2009-02-09 2011-05-19 Steven Lazar Smart cap with communication function
US20110239064A1 (en) * 2010-03-24 2011-09-29 Apple Inc. Management of a non-volatile memory based on test quality
US20110239065A1 (en) * 2010-03-24 2011-09-29 Apple Inc. Run-time testing of memory locations in a non-volatile memory
US8726095B2 (en) 2010-12-02 2014-05-13 Dell Products L.P. System and method for proactive management of an information handling system with in-situ measurement of end user actions
US8751903B2 (en) 2010-07-26 2014-06-10 Apple Inc. Methods and systems for monitoring write operations of non-volatile memory
US20140233365A1 (en) * 2006-08-25 2014-08-21 International Business Machines Corporation Periodic rotational vibration check for storage devices to compensate for varying loads
US20160170822A1 (en) * 2011-02-09 2016-06-16 Ebay Inc. High-volume distributed script error handling
US20170371689A1 (en) * 2013-03-12 2017-12-28 Intel Corporation Layered virtual machine integrity monitoring
US10514978B1 (en) * 2015-10-23 2019-12-24 Pure Storage, Inc. Automatic deployment of corrective measures for storage arrays
US10599536B1 (en) * 2015-10-23 2020-03-24 Pure Storage, Inc. Preventing storage errors using problem signatures
US10818113B1 (en) 2016-04-11 2020-10-27 State Farm Mutual Automobile Insuance Company Systems and methods for providing awareness of emergency vehicles
US10872379B1 (en) 2016-04-11 2020-12-22 State Farm Mutual Automobile Insurance Company Collision risk-based engagement and disengagement of autonomous control of a vehicle
US10895471B1 (en) 2016-04-11 2021-01-19 State Farm Mutual Automobile Insurance Company System for driver's education
US10930158B1 (en) 2016-04-11 2021-02-23 State Farm Mutual Automobile Insurance Company System for identifying high risk parking lots
US10933881B1 (en) 2016-04-11 2021-03-02 State Farm Mutual Automobile Insurance Company System for adjusting autonomous vehicle driving behavior to mimic that of neighboring/surrounding vehicles
US10989556B1 (en) 2016-04-11 2021-04-27 State Farm Mutual Automobile Insurance Company Traffic risk a avoidance for a route selection system
US11024157B1 (en) 2016-04-11 2021-06-01 State Farm Mutual Automobile Insurance Company Networked vehicle control systems to facilitate situational awareness of vehicles
US11150971B1 (en) 2020-04-07 2021-10-19 International Business Machines Corporation Pattern recognition for proactive treatment of non-contiguous growing defects
US11360844B1 (en) 2015-10-23 2022-06-14 Pure Storage, Inc. Recovery of a container storage provider
US11498537B1 (en) 2016-04-11 2022-11-15 State Farm Mutual Automobile Insurance Company System for determining road slipperiness in bad weather conditions

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5468837B2 (en) * 2009-07-30 2014-04-09 株式会社日立製作所 Anomaly detection method, apparatus, and program
WO2011083687A1 (en) 2010-01-08 2011-07-14 日本電気株式会社 Operation management device, operation management method, and program storage medium
WO2014006701A1 (en) * 2012-07-04 2014-01-09 富士通株式会社 Information processing device, access control program, and access control method
US8970977B1 (en) * 2012-09-28 2015-03-03 Western Digital Technologies, Inc. Disk drive logging failure analysis data when performing an emergency unload
US8908308B1 (en) 2013-11-26 2014-12-09 Seagate Technology Llc Adaptive passive data track erasure healing
US11099924B2 (en) 2016-08-02 2021-08-24 International Business Machines Corporation Preventative system issue resolution
US10621026B2 (en) * 2017-06-04 2020-04-14 Apple Inc. Auto bug capture
US11237893B2 (en) 2019-06-26 2022-02-01 Western Digital Technologies, Inc. Use of error correction-based metric for identifying poorly performing data storage devices
US10969969B2 (en) 2019-06-26 2021-04-06 Western Digital Technologies, Inc. Use of recovery behavior for prognosticating and in-situ repair of data storage devices

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US624890A (en) * 1899-05-09 Valve-controlling device
JPH0651915A (en) * 1992-08-03 1994-02-25 Hitachi Ltd Disk device and disk array control system
JPH11345095A (en) * 1998-06-02 1999-12-14 Toshiba Corp Disk array device and control method therefor
JP2000305720A (en) * 1999-04-15 2000-11-02 Nec Software Hokkaido Ltd Method for automatically resorting array disk and system therefor
JP2004227449A (en) * 2003-01-27 2004-08-12 Hitachi Ltd Diagnostic device for trouble in disk array device
US7317943B2 (en) * 2003-01-31 2008-01-08 Medtronic, Inc. Capture threshold monitoring
EP1810143A4 (en) * 2004-09-22 2011-03-16 Xyratex Tech Ltd System and method for network performance monitoring and predictive failure analysis
US7769975B2 (en) * 2004-11-15 2010-08-03 International Business Machines Corporation Method for configuring volumes in a storage system
US20070079170A1 (en) * 2005-09-30 2007-04-05 Zimmer Vincent J Data migration in response to predicted disk failure

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5450609A (en) * 1990-11-13 1995-09-12 Compaq Computer Corp. Drive array performance monitor
US5774285A (en) * 1995-09-06 1998-06-30 Seagate Technology, Inc. Selection of optimal read/write channel parameters in a hard disc drive
US5721816A (en) * 1996-07-29 1998-02-24 Kusbel; Paul F. Adaptive recovery of read and write errors in a disc drive
US6249890B1 (en) * 1998-06-05 2001-06-19 Seagate Technology Llc Detecting head readback response degradation in a disc drive
US20020053046A1 (en) * 1998-09-21 2002-05-02 Gray William F. Apparatus and method for predicting failure of a disk drive
US6401214B1 (en) * 1999-03-04 2002-06-04 International Business Machines Corporation Preventive recovery action in hard disk drives
US6606210B1 (en) * 1999-04-21 2003-08-12 Seagate Technology Llc Intelligent sector recovery algorithm
US6738757B1 (en) * 1999-06-02 2004-05-18 Workwise, Inc. System for database monitoring and agent implementation
US6832236B1 (en) * 1999-07-08 2004-12-14 International Business Machines Corporation Method and system for implementing automatic filesystem growth monitor for production UNIX computer system
US6415189B1 (en) * 1999-07-23 2002-07-02 International Business Machines Corporation Method and system for predicting disk drive failures
US6460151B1 (en) * 1999-07-26 2002-10-01 Microsoft Corporation System and method for predicting storage device failures
US6611393B1 (en) * 2001-04-30 2003-08-26 Western Digital Technologies, Inc. Disk drive employing field calibration based on marginal sectors
US6760174B2 (en) * 2001-08-06 2004-07-06 Seagate Technology Llc Adaptive fly height for error recovery in a disc drive
US6771440B2 (en) * 2001-12-18 2004-08-03 International Business Machines Corporation Adaptive event-based predictive failure analysis measurements in a hard disk drive
US20030182136A1 (en) * 2002-03-21 2003-09-25 Horton Kurt H. System and method for ranking objects by likelihood of possessing a property
US6732233B2 (en) * 2002-05-21 2004-05-04 International Business Machines Corporation Hot spare reliability for storage arrays and storage networks
US20030221057A1 (en) * 2002-05-21 2003-11-27 International Business Machines Corporation Hot spare reliability for storage arrays and storage networks
US20040051988A1 (en) * 2002-09-16 2004-03-18 Jing Gary Gang Predictive disc drive failure methodology
US20040103246A1 (en) * 2002-11-26 2004-05-27 Paresh Chatterjee Increased data availability with SMART drives
US20050060618A1 (en) * 2003-09-11 2005-03-17 Copan Systems, Inc. Method and system for proactive drive replacement for high availability storage systems

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7636872B2 (en) * 2005-03-23 2009-12-22 Microsoft Corporation Threat event-driven backup
US20060218439A1 (en) * 2005-03-23 2006-09-28 Microsoft Corporation Threat event-driven backup
US20070074083A1 (en) * 2005-09-20 2007-03-29 Seagate Technology Llc Preventive recovery from adjacent track interference
US7747907B2 (en) * 2005-09-20 2010-06-29 Seagate Technology Llc Preventive recovery from adjacent track interference
US10043554B2 (en) * 2006-08-25 2018-08-07 International Business Machines Corporation Periodic rotational vibration check for storage devices to compensate for varying loads
US20140233365A1 (en) * 2006-08-25 2014-08-21 International Business Machines Corporation Periodic rotational vibration check for storage devices to compensate for varying loads
US20110119090A1 (en) * 2009-02-09 2011-05-19 Steven Lazar Smart cap with communication function
US8319613B2 (en) * 2009-02-09 2012-11-27 Steven Lazar Smart cap with communication function
US8683456B2 (en) 2009-07-13 2014-03-25 Apple Inc. Test partitioning for a non-volatile memory
US20110010698A1 (en) * 2009-07-13 2011-01-13 Apple Inc. Test partitioning for a non-volatile memory
US9472285B2 (en) 2009-07-13 2016-10-18 Apple Inc. Test partitioning for a non-volatile memory
US8650446B2 (en) * 2010-03-24 2014-02-11 Apple Inc. Management of a non-volatile memory based on test quality
US8645776B2 (en) * 2010-03-24 2014-02-04 Apple Inc. Run-time testing of memory locations in a non-volatile memory
US20110239065A1 (en) * 2010-03-24 2011-09-29 Apple Inc. Run-time testing of memory locations in a non-volatile memory
US20110239064A1 (en) * 2010-03-24 2011-09-29 Apple Inc. Management of a non-volatile memory based on test quality
US8751903B2 (en) 2010-07-26 2014-06-10 Apple Inc. Methods and systems for monitoring write operations of non-volatile memory
US9146821B2 (en) 2010-07-26 2015-09-29 Apple Inc. Methods and systems for monitoring write operations of non-volatile memory
US8726095B2 (en) 2010-12-02 2014-05-13 Dell Products L.P. System and method for proactive management of an information handling system with in-situ measurement of end user actions
US9195561B2 (en) 2010-12-02 2015-11-24 Dell Products L.P. System and method for proactive management of an information handling system with in-situ measurement of end user actions
US10671469B2 (en) * 2011-02-09 2020-06-02 Ebay Inc. High-volume distributed script error handling
US20160170822A1 (en) * 2011-02-09 2016-06-16 Ebay Inc. High-volume distributed script error handling
US10671416B2 (en) * 2013-03-12 2020-06-02 Intel Corporation Layered virtual machine integrity monitoring
US20170371689A1 (en) * 2013-03-12 2017-12-28 Intel Corporation Layered virtual machine integrity monitoring
US11360844B1 (en) 2015-10-23 2022-06-14 Pure Storage, Inc. Recovery of a container storage provider
AU2016342069B2 (en) * 2015-10-23 2021-05-27 Pure Storage, Inc. Proactively providing corrective measures for storage arrays
US11934260B2 (en) 2015-10-23 2024-03-19 Pure Storage, Inc. Problem signature-based corrective measure deployment
US11874733B2 (en) 2015-10-23 2024-01-16 Pure Storage, Inc. Recovering a container storage system
US11061758B1 (en) 2015-10-23 2021-07-13 Pure Storage, Inc. Proactively providing corrective measures for storage arrays
US11593194B2 (en) 2015-10-23 2023-02-28 Pure Storage, Inc. Cloud-based providing of one or more corrective measures for a storage system
US10599536B1 (en) * 2015-10-23 2020-03-24 Pure Storage, Inc. Preventing storage errors using problem signatures
US10514978B1 (en) * 2015-10-23 2019-12-24 Pure Storage, Inc. Automatic deployment of corrective measures for storage arrays
US10872379B1 (en) 2016-04-11 2020-12-22 State Farm Mutual Automobile Insurance Company Collision risk-based engagement and disengagement of autonomous control of a vehicle
US11727495B1 (en) 2016-04-11 2023-08-15 State Farm Mutual Automobile Insurance Company Collision risk-based engagement and disengagement of autonomous control of a vehicle
US10988960B1 (en) 2016-04-11 2021-04-27 State Farm Mutual Automobile Insurance Company Systems and methods for providing awareness of emergency vehicles
US10989556B1 (en) 2016-04-11 2021-04-27 State Farm Mutual Automobile Insurance Company Traffic risk avoidance for a route selection system
US10818113B1 (en) 2016-04-11 2020-10-27 State Farm Mutual Automobile Insurance Company Systems and methods for providing awareness of emergency vehicles
US11024157B1 (en) 2016-04-11 2021-06-01 State Farm Mutual Automobile Insurance Company Networked vehicle control systems to facilitate situational awareness of vehicles
US10933881B1 (en) 2016-04-11 2021-03-02 State Farm Mutual Automobile Insurance Company System for adjusting autonomous vehicle driving behavior to mimic that of neighboring/surrounding vehicles
US11205340B2 (en) 2016-04-11 2021-12-21 State Farm Mutual Automobile Insurance Company Networked vehicle control systems to facilitate situational awareness of vehicles
US11257377B1 (en) 2016-04-11 2022-02-22 State Farm Mutual Automobile Insurance Company System for identifying high risk parking lots
US10930158B1 (en) 2016-04-11 2021-02-23 State Farm Mutual Automobile Insurance Company System for identifying high risk parking lots
US11498537B1 (en) 2016-04-11 2022-11-15 State Farm Mutual Automobile Insurance Company System for determining road slipperiness in bad weather conditions
US10895471B1 (en) 2016-04-11 2021-01-19 State Farm Mutual Automobile Insurance Company System for driver's education
US11656094B1 (en) 2016-04-11 2023-05-23 State Farm Mutual Automobile Insurance Company System for driver's education
US10991181B1 (en) 2016-04-11 2021-04-27 State Farm Mutual Automobile Insurance Company Systems and method for providing awareness of emergency vehicles
US11851041B1 (en) 2016-04-11 2023-12-26 State Farm Mutual Automobile Insurance Company System for determining road slipperiness in bad weather conditions
US10829966B1 (en) 2016-04-11 2020-11-10 State Farm Mutual Automobile Insurance Company Systems and methods for control systems to facilitate situational awareness of a vehicle
US11150971B1 (en) 2020-04-07 2021-10-19 International Business Machines Corporation Pattern recognition for proactive treatment of non-contiguous growing defects

Also Published As

Publication number Publication date
US20080244316A1 (en) 2008-10-02
US7765437B2 (en) 2010-07-27
JP2006244447A (en) 2006-09-14
JP5059304B2 (en) 2012-10-24

Similar Documents

Publication Publication Date Title
US7765437B2 (en) Failure trend detection and correction in a data storage array
US6600614B2 (en) Critical event log for a disc drive
US7526684B2 (en) Deterministic preventive recovery from a predicted failure in a distributed storage system
JP3348417B2 (en) Data recovery method in storage system
US10606722B2 (en) Method and system for diagnosing remaining lifetime of storages in data center
US6415189B1 (en) Method and system for predicting disk drive failures
US10268553B2 (en) Adaptive failure prediction modeling for detection of data storage device failures
US20060053338A1 (en) Method and system for disk drive exercise and maintenance of high-availability storage systems
CN100368976C (en) Disk array apparatus and backup method of data
JP2005322399A (en) Maintenance method of track data integrity in magnetic disk storage device
US7330325B2 (en) Proactive fault monitoring of disk drives through phase-sensitive surveillance
Huang et al. Characterizing disk health degradation and proactively protecting against disk failures for reliable storage systems
US20060215456A1 (en) Disk array data protective system and method
US8843781B1 (en) Managing drive error information in data storage systems
JP4775843B2 (en) Storage system and storage control method
US7461298B2 (en) Method and apparatus for diagnosing mass storage device anomalies
JP2006252733A (en) Medium storage device and write path diagnosing method for the same
US20060245103A1 (en) Storage device system operating based on system information, and method for controlling thereof
JP2003263703A5 (en)
US9384077B2 (en) Storage control apparatus and method for controlling storage apparatus
CN113179665A (en) Identifying underperforming data storage devices using error correction based metrics
US7457990B2 (en) Information processing apparatus and information processing recovery method
JP7273669B2 (en) Storage system and its control method
JP2880701B2 (en) Disk subsystem
US10599343B2 (en) Proactively resilvering a striped disk array in advance of a predicted disk drive failure

Legal Events

Date Code Title Description
AS Assignment

Owner name: SEAGATE TECHNOLOGY LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GITTINS, ROBERT SHERWOOD;LESTER, ROBERT MICHAEL;REEL/FRAME:016348/0052

Effective date: 20050302

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION