US20040054939A1 - Method and apparatus for power-efficient high-capacity scalable storage system


Info

Publication number
US20040054939A1
Authority
US
United States
Prior art keywords
data storage
drives
data
drive
storage drives
Prior art date
Legal status
Granted
Application number
US10/607,932
Other versions
US7035972B2
US20050268119A9
Inventor
Aloke Guha
Chris Santilli
Gary McMillian
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Priority to US10/607,932 (US7035972B2)
Application filed by Individual
Publication of US20040054939A1
Priority to US11/076,447 (US20050210304A1)
Priority to US11/108,077 (US7210004B2)
Publication of US20050268119A9
Priority to US11/322,787 (US7330931B2)
Priority to US11/351,979 (US7210005B2)
Application granted
Publication of US7035972B2
Priority to US11/716,338 (US7380060B2)
Priority to US11/686,268 (US20070220316A1)
Priority to US11/953,712 (US20080114948A1)
Status: Active - Reinstated
Adjusted expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00: Details not covered by groups G06F 3/00 - G06F 13/00 and G06F 21/00
    • G06F 1/26: Power supply means, e.g. regulation thereof
    • G06F 1/32: Means for saving power
    • G06F 1/3203: Power management, i.e. event-based initiation of a power-saving mode
    • G06F 1/3234: Power saving characterised by the action undertaken
    • G06F 1/325: Power saving in peripheral device
    • G06F 1/3268: Power saving in hard disk drive
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0625: Power saving in storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629: Configuration or reconfiguration of storage systems
    • G06F 3/0634: Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
    • G06F 3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671: In-line storage system
    • G06F 3/0683: Plurality of storage devices
    • G06F 3/0689: Disk arrays, e.g. RAID, JBOD
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates generally to data storage systems, and more particularly to power-efficient, high-capacity data storage systems that are scalable and reliable.
  • RAID: redundant array of inexpensive (independent) disks
  • There are six commonly known RAID “levels” or standard geometries that are generally used for conventional RAID storage systems.
  • the simplest array that provides a form of redundancy, a RAID level 1 system comprises one or more disks for storing data and an equal number of additional mirror disks for storing copies of the information written to the data disks.
  • the remaining RAID levels, identified as RAID level 2-6 systems, segment the data into portions for storage across several data disks. One or more additional disks are utilized to store error check or parity information.
  • RAID storage subsystems typically utilize a control module that shields the user or host system from the details of managing the redundant array.
  • the controller makes the subsystem appear to the host computer as a single, highly reliable, high capacity disk drive even though a RAID controller may distribute the data across many smaller drives.
  • RAID subsystems provide large cache memory structures to further improve the performance of the subsystem.
  • the host system simply requests blocks of data to be read or written and the RAID controller manipulates the disk array and cache memory as required.
  • RAID levels are distinguished by their relative performance capabilities as well as their overhead storage requirements. For example, a RAID level 1 “mirrored” storage system requires more overhead storage than RAID levels 2 - 5 that utilize XOR parity to provide requisite redundancy. RAID level 1 requires 100% overhead since it duplicates all data, while RAID level 5 requires 1/N of the storage capacity used for storing data, where N is the number of data disk drives used in the RAID set.
  • one approach employs energy-conscious provisioning of servers by concentrating request loads to a minimal active set of servers for the current aggregate load level (see Jeffrey S. Chase, Darrell C. Anderson, Prachi N. Thakar, Amin M. Vahdat, and Ronald P. Doyle, Managing energy and server resources in hosting centers, in Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 103-116, October 2001). Active servers always run near a configured utilization threshold, while the excess servers transition to low-power idle states to reduce the energy cost of maintaining surplus capacity during periods of light load. The focus is on power cycling servers and not on storage. Chase, et al. mention that power cycling may reduce the life of the disks, but current disks have a start/stop limit that will likely not be exceeded.
  • MAID: massive array of idle disks
  • FAST: Usenix Conference on File and Storage Technologies
  • the power off schedule is based on a heuristic, such as a least-recently-used or least expected to be used model, i.e., the array of drives is turned off when no data access is expected on any of the drives in the array.
  • Another approach uses archival storage systems where ATA drives are also powered off (as in the case of MAID) based on algorithms similar to the LRU policy (see Kai Li and Howard Lee, Archival data storage system and method, U.S. patent application Ser. No. 2002-0144057, Oct. 3, 2002).
  • the array of drives comprises a RAID set.
  • the entire RAID set is opportunistically powered on or off (see, e.g., Firefly Digital Virtual Library, http://www.asaca.com/DVL/DM_200.htm).
  • control of the drives refers to both controlling access to drives for I/O operations, and providing data protection, such as by using RAID parity schemes.
  • There are two obvious challenges that arise in relation to the interconnection mechanism: the cost of the interconnection and the related complexity of fanout from the controller to the drives.
  • the invention comprises systems and methods for providing scalable, reliable, power-efficient, high-capacity data storage, wherein large numbers of closely packed data drives having corresponding metadata and parity volumes are individually powered on and off, according to usage requirements.
  • the invention is implemented in a RAID-type data storage system.
  • This system employs a large number of hard disk drives that are individually controlled, so that in this embodiment only the disk drives that are in use are powered on. Consequently, the system uses only a fraction of the power that would be consumed if all of the disk drives in the system had to be powered on.
  • the data protection scheme is designed to utilize large, contiguous blocks of space on the data disk drives, and to use the space on one data disk drive at a time, so that the data disk drives which are not in use can be powered down.
  • One embodiment of the invention comprises a method which includes the steps of providing a data storage system having a plurality of data storage drives, performing data accesses to the data storage system, wherein the data accesses involve accesses to a first subset of the data storage drives and wherein the first subset of the data storage drives is powered on, and powering down a second subset of the data storage drives, wherein the data accesses do not involve accesses to the second subset of the data storage drives.
  • FIG. 1 is a diagram illustrating the general structure of a multiple-disk data storage system in accordance with one embodiment.
  • FIGS. 2A and 2B are diagrams illustrating the interconnections between the controllers and disk drives in a densely packed data storage system in accordance with one embodiment.
  • FIG. 3 is a diagram illustrating the physical configuration of a densely packed data storage system in accordance with one embodiment.
  • FIG. 4 is a flow diagram illustrating the manner in which the power management scheme of a densely packed data storage system is determined in accordance with one embodiment.
  • FIG. 5 is a diagram illustrating the manner in which information is written to a parity disk and the manner in which disk drives are powered on and off in accordance with one embodiment.
  • FIG. 6 is a diagram illustrating the content of a metadata disk in accordance with one embodiment.
  • FIG. 7 is a diagram illustrating the structure of information stored on a metadata disk in accordance with one embodiment.
  • FIG. 8 is a diagram illustrating the manner in which containers of data are arranged on a set of disk drives in accordance with one embodiment.
  • FIG. 9 is a diagram illustrating the manner in which the initial segments of data from a plurality of disk drives are stored on a metadata volume in accordance with one embodiment.
  • FIG. 10 is a diagram illustrating the use of a pair of redundant disk drives and corresponding parity and metadata volumes in accordance with one embodiment.
  • FIG. 11 is a diagram illustrating the use of a data storage system as a backup target for the primary storage via a direct connection and as a media (backup) server to a tape library in accordance with one embodiment.
  • FIG. 12 is a diagram illustrating the interconnect from the host (server or end user) to the end disk drives in accordance with one embodiment.
  • FIG. 13 is a diagram illustrating the interconnection of a channel controller with multiple stick controllers in accordance with one embodiment.
  • FIG. 14 is a diagram illustrating the interconnection of the outputs of a SATA channel controller with corresponding stick controller data/command router devices in accordance with one embodiment.
  • FIG. 15 is a diagram illustrating the implementation of a rack controller in accordance with one embodiment.
  • various embodiments of the invention comprise systems and methods for providing scalable, reliable, power-efficient, high-capacity data storage, wherein large numbers of closely packed data drives having corresponding metadata and parity volumes are individually powered on and off, depending upon their usage requirements.
  • the invention is implemented in a RAID-type data storage system.
  • This system employs a large number of hard disk drives.
  • the data is written to one or more of the disk drives.
  • Metadata and parity information corresponding to the data are also written to one or more of the disk drives to reduce the possibility of data being lost or corrupted.
  • the manner in which data is written to the disks typically involves only one data disk at a time, in addition to metadata and parity disks.
  • reads of data typically only involve one data disk at a time. Consequently, data disks which are not currently being accessed can be powered down.
  • the system is therefore configured to individually control the power to each of the disks so that it can power up the subset of disks that are currently being accessed, while powering down the subset of disks that are not being accessed.
  • the power consumption of the system is less than that of a comparable conventional system (i.e., one with approximately the same total number of similar disk drives) in which all of the disk drives have to be powered on at the same time.
  • the present system can therefore be packaged in a smaller enclosure than the comparable conventional system.
  • Another difference between the present system and conventional systems is that conventional systems require switches for routing data to appropriate data disks in accordance with the data protection scheme employed by the system (e.g., RAID level 3 ).
  • most of the disk drives are powered down at any given time, so the data can be distributed by a simple fan-out interconnection, which consumes less power and takes up less volume within the system enclosure than the switches used in conventional systems.
  • the present system can be designed to meet a particular reliability level (e.g., threshold mean time between failures, MTBF), as opposed to conventional systems, which are essentially constrained by the number of disk drives in the system and the reliability of the individual disk drives.
  • MTBF: mean time between failures
  • the various embodiments of the invention may provide advantages over conventional systems (e.g., RAID systems) in the four areas discussed above: power management; data protection; physical packaging; and storage transaction performance. These advantages are described below with respect to the different areas of impact.
  • embodiments of the present invention may not only decrease power consumption, but also increase system reliability by optimally power cycling the drives. In other words, only a subset of the total number of drives is powered on at any time. Consequently, the overall system reliability can be designed to be above a certain acceptable threshold.
  • the power cycling of drives results in a limited number of drives being powered on at any time. This affects performance in two areas. First, the total I/O is bound by the number of powered drives. Second, a random Read operation to a block in a powered down drive would incur a very large penalty in the spin-up time.
  • the embodiments of the present invention use large numbers of individual drives, so that the number of drives that are powered on, even though it will be only a fraction of the total number of drives, will allow the total I/O to be within specification.
  • the data access scheme masks the delay so that the host system does not perceive the delay or experience a degradation in performance.
  • Referring to FIG. 1, a diagram illustrating the general structure of a multiple-disk data storage system in accordance with one embodiment of the invention is shown. It should be noted that the system illustrated in FIG. 1 is a very simplified structure which is intended merely to illustrate one aspect (power cycling) of an embodiment of the invention. A more detailed representation of a preferred embodiment is illustrated in FIG. 2 and the accompanying text below.
  • data storage system 10 includes multiple disk drives 20 .
  • disk drives 20 are connected to a controller 30 via interconnect 40 .
  • disk drives 20 are grouped into two subsets, 50 and 60 .
  • Subset 50 and subset 60 differ in that the disk drives in one of the subsets (e.g., 50 ) are powered on, while the disk drives in the other subset (e.g., 60 ) are powered down.
  • the individual disk drives in the system are powered on (or powered up) only when needed. When they are not needed, they are powered off (powered down).
  • the particular disk drives that make up each subset will change as required to enable data accesses (reads and writes) by one or more users.
  • the system illustrated by FIG. 1 is used here simply to introduce the power cycling aspect of one embodiment of the invention.
  • This and other embodiments described herein are exemplary and numerous variations on these embodiments may be possible.
  • While the embodiment of FIG. 1 utilizes multiple disk drives, other types of data storage, such as solid state memories, optical drives, or the like, could also be used. It is also possible to use mixed media drives, although it is contemplated that this will not often be practical. References herein to disk drives or data storage drives should therefore be construed broadly to cover any type of data storage.
  • While the embodiment of FIG. 1 has two subsets of disk drives, one of which is powered on and one of which is powered off, other power states may also be possible. For instance, there may be various additional states of operation (e.g., standby) in which the disk drives may exist, each state having its own power consumption characteristics.
  • the powering of only a subset of the disk drives in the system enables the use of a greater number of drives within the same footprint as a system in which all of the drives are powered on at once.
  • One embodiment of the invention therefore provides high density packing and interconnection of the disk drives.
  • This system comprises a rack having multiple shelves, wherein each shelf contains multiple rows, or “sticks” of disk drives. The structure of this system is illustrated in FIG. 2.
  • the top level interconnection between the system controller 120 and the shelves 110 is shown on the left side of the figure.
  • the shelf-level interconnection to each of the sticks 150 of disk drives 160 is shown on the right side of the figure.
  • the system has multiple shelves 110 , each of which is connected to a system controller 120 .
  • Each shelf has a shelf controller 140 which is connected to the sticks 150 in the shelf.
  • Each stick 150 is likewise connected to each of the disk drives 160 so that they can be individually controlled, both in terms of the data accesses to the disk drives and the powering on/off of the disk drives.
  • the mechanism for determining the optimal packing and interconnection configuration of the drives in the system is described below.
  • s: the number of shelf units in the system, typically determined by the physical height of the system. For example, for a 44U standard rack system, s can be chosen to be 8.
  • d: the number of disk drives in each stick in a shelf. In a standard rack, d can be
  • The configuration as shown in FIG. 2 is decomposed into shelves, sticks and disks so that the best close packing of disks can be achieved for purposes of maximum volumetric capacity of disk drives.
  • One example of this is shown in FIG. 3. With the large racks that are available, nearly 1000 3.5-inch disks can be packed into the rack.
  • the preferred configuration is determined by the decomposition of N into s, t and d while optimizing with respect to i) the volume constraints of the drives and the overall system (the rack), and ii) the weight constraint of the complete system.
  • the latter constraints are imposed by the physical size and weight limits of standard rack sizes in data centers.
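  • As a rough illustration of this decomposition, the following sketch searches for values of s, t and d that maximize N = s * t * d under simple packing and weight limits. All constraint values in the sketch are assumed placeholders rather than figures from this disclosure; with the defaults shown it arrives at 8 shelves of 8 sticks of 14 drives (896 drives), matching the configuration described later for the preferred interconnect.
      # Illustrative sketch (not from the patent): brute-force search for a shelf/stick/drive
      # decomposition N = s * t * d. The max_* caps stand in for the rack and shelf volume
      # constraints; the weight figures are assumed placeholders.
      def best_decomposition(max_shelves=8,          # s: shelf units that fit the rack height
                             max_sticks_per_shelf=8, # t: sticks that fit a shelf's width/depth
                             max_disks_per_stick=14, # d: drives that fit along a stick
                             drive_weight_kg=0.6,    # assumed per-drive weight
                             rack_weight_limit_kg=900.0):
          best = None
          for s in range(1, max_shelves + 1):
              for t in range(1, max_sticks_per_shelf + 1):
                  for d in range(1, max_disks_per_stick + 1):
                      n = s * t * d
                      if n * drive_weight_kg > rack_weight_limit_kg:
                          continue  # violates the weight constraint of the complete system
                      if best is None or n > best[0]:
                          best = (n, s, t, d)
          return best

      if __name__ == "__main__":
          n, s, t, d = best_decomposition()
          print(f"N={n} drives as s={s} shelves x t={t} sticks x d={d} disks per stick")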
  • One embodiment of the invention comprises a bulk storage or near-online (NOL) system.
  • This storage system is a rack-level disk system comprising multiple shelves. Hosts can connect to the storage system via Fibre Channel ports on the system level rack controller, which interconnects to the shelves in the rack.
  • Each shelf has a local controller that controls all of the drives in the shelf. RAID functionality is supported within each shelf with enough drives for providing redundancy for parity protection as well as disk spares for replacing failed drives.
  • the system is power cycled. More particularly, the individual drives are powered on or off to improve the system reliability over the entire (large) set of drives. Given current known annualized failure rates (AFRs), a set of 1000 ATA drives would be expected to have an MTBF of about 20 days. In an enterprise environment, a drive replacement period of 20 days to service the storage system is not acceptable.
  • the present scheme for power cycling the individual drives effectively extends the real life of the drives significantly.
  • power cycling results in many contact start-stops (CSSs), and increasing CSSs reduces the total life of the drive.
  • CSSs: contact start-stops
  • having fewer powered drives makes it difficult to spread data across a large RAID set. Consequently, it may be difficult to implement data protection at a level equivalent to RAID 5 . Still further, the effective system bandwidth is reduced when there are few powered drives.
  • the approach for determining the power cycling parameters is as shown in the flow diagram of FIG. 4 and as described below. It should be noted that the following description assumes that the disk drives have an exponential failure rate (i.e., the probability of failure by time t is 1 - e^(-λt), where λ is the failure rate, the inverse of the MTBF).
  • the disk drives (or other types of drives) in other embodiments may have failure rates that are more closely approximated by other mathematical functions. For such systems, the calculations described below would use the alternative failure function instead of the present exponential function.
  • the system MTBF can be increased by powering the drives on and off, i.e., power cycling the drives, to increase the overall life of each drive in the system. This facilitates maintenance of the system, since serviceability of computing systems in the enterprise requires deterministic and scheduled service times when components (drives) can be repaired or replaced. Since it is desired to have scheduled service at regular intervals, this constraint is incorporated into the calculations that follow.
  • the effective system MTBF is T, and the effective failure rate of the system is 1/T
  • the ratio R of powered-on drives to all drives at a shelf also determines the number of drives that may be powered ON in total in each shelf. This also limits the number of drives that are used for data writing or reading, as well as any other drives used for holding metadata.
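  • The sketch below illustrates one simple way such power cycling parameters could be estimated. It assumes exponential drive failures and that powered-off drives do not age, so that a duty ratio R scales each drive's effective failure rate; the model and the target service interval are assumptions chosen for illustration, not the exact procedure of FIG. 4.
      # Hedged sketch (assumed model): with exponential failures and powered-off drives
      # assumed not to fail, a system of n drives each powered a fraction R of the time
      # has effective MTBF T = drive_mtbf / (n * R). Solving for R gives the duty ratio.
      def duty_ratio_for_target_mtbf(drive_mtbf_hours, num_drives, target_system_mtbf_hours):
          """Fraction R of time each drive may be powered on so that the system of
          num_drives reaches the target effective MTBF."""
          r = drive_mtbf_hours / (num_drives * target_system_mtbf_hours)
          return min(r, 1.0)

      if __name__ == "__main__":
          drive_mtbf = 500_000.0     # hours, per-drive figure used in the background discussion
          n = 1000                   # drives in the system
          target = 24 * 90.0         # assumed target: one scheduled service event per ~90 days
          r = duty_ratio_for_target_mtbf(drive_mtbf, n, target)
          print(f"R = {r:.2f} -> at most {int(n * r)} of {n} drives powered on at once")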
  • FIG. 4 depicts the flowchart for establishing power cycling parameters.
  • a new RAID variant is implemented in order to meet the needs of the present Power Managed system.
  • the power duty cycle R of the drives will be less than 100% and may be well below 50%. Consequently, when a data volume is written to a RAID volume in a shelf, all drives in the RAID set cannot be powered up (ON).
  • the RAID variant disclosed herein is designed to provide the following features.
  • this scheme is designed to provide adequate parity protection. Further, it is designed to ensure that CSS thresholds imposed by serviceability needs are not violated. Further, the RAID striping parameters are designed to meet the needs of the workload patterns, the bandwidth to be supported at the rack level, and access time. The time to access the first byte must also be much better than tape or sequential media. The scheme is also designed to provide parity based data protection and disk sparing with low overhead.
  • a metadata drive contains metadata for all I/O operations and disk drive operational transitions (power up, power down, sparing, etc.).
  • the data that resides on this volume is organized such that it provides information on the data on the set of disk drives, and also caches data that is to be written or read from drives that are not yet powered on.
  • the metadata volume plays an important role in disk management, I/O performance, and fault tolerance.
  • the RAID variant used in the present system “serializes” writes to the smallest subset of disks in the RAID set, while ensuring that CSS limits are not exceeded and that the write I/O performance does not suffer in access time and data rate.
  • the first assumption is that this data storage system is not to achieve or approach the I/O performance of an enterprise online storage system. In other words, the system is not designed for high I/O transactions, but for reliability.
  • the second assumption is that the I/O workload usage for this data storage is typically large sequential writes and medium to large sequential reads.
  • An initialized set of disk drives consists of a mapped organization of data in which a single disk drive failure will not result in a loss of data. For this technique, all disk drives are initialized to a value of 0.
  • Since the parity disk contains a value equal to the XOR of all three data disks, it is not necessary to power on all of the disks to generate the correct parity. Instead, the old parity (“5”) is simply XOR'ed with the newly written data (“A”) to generate the new parity (“F”). Thus, it is not necessary to XOR out the old data on disk 202.
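  • The parity update described above can be expressed in a few lines of code. The sketch below reproduces the worked example from the text: because the overwritten blocks were initialized to zero, the new parity is simply the old parity XOR'ed with the newly written data.
      # Sketch of the parity update described above. With zero-initialized disks, the
      # general update P' = P xor D_old xor D_new reduces to P' = P xor D_new.
      def update_parity(old_parity: int, old_data: int, new_data: int) -> int:
          """General RAID parity update: new parity = old parity xor old data xor new data."""
          return old_parity ^ old_data ^ new_data

      old_parity = 0x5          # existing parity value ("5")
      new_data   = 0xA          # data newly written to a zero-initialized disk ("A")
      old_data   = 0x0          # the overwritten blocks were initialized to 0

      new_parity = update_parity(old_parity, old_data, new_data)
      assert new_parity == 0xF  # matches the new parity ("F") in the example
      print(hex(new_parity))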
  • MDV: metadata volume
  • This volume is a set of online, operational disk drives which may be mirrored for fault tolerance. This volume resides within the same domain as the set of disk drives. Thus, the operating environment should provide enough power, cooling, and packaging to support this volume.
  • This volume contains metadata that is used for I/O operations and disk drive operational transitions (power up, power down, sparing, etc.).
  • the data that resides on this volume is organized such that it holds copies of subsets of data representing the data on the set of disk drives.
  • a metadata volume is located within each shelf and holds the metadata for all data volumes resident on the disks in the shelf. Referring to FIGS. 6 and 7, the data content of a metadata volume is illustrated. This volume contains all the metadata for the shelf, RAID, disk and enclosure. There also exists metadata for the rack controller. This metadata is used to determine the correct system configuration between the rack controller and disk shelf.
  • the metadata volume contains shelf attributes, such as the number of total drives, drive spares, and unused data; RAID set attributes and memberships; drive attributes, such as the serial number, hardware revisions, and firmware revisions; and volume cache, including read cache and write cache.
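  • The following sketch shows one possible in-memory representation of this per-shelf metadata. The field names are illustrative assumptions; the disclosure does not prescribe a specific layout.
      # Illustrative layout (field names assumed) of the per-shelf metadata described above:
      # shelf attributes, RAID set attributes and membership, per-drive attributes, and the
      # per-volume read/write caches.
      from dataclasses import dataclass, field
      from typing import Dict, List

      @dataclass
      class DriveAttributes:
          serial_number: str
          hardware_revision: str
          firmware_revision: str

      @dataclass
      class RaidSetAttributes:
          members: List[int]        # indices of member drives
          parity_drives: List[int]
          spares: List[int]

      @dataclass
      class ShelfMetadata:
          total_drives: int
          drive_spares: int
          raid_sets: List[RaidSetAttributes] = field(default_factory=list)
          drives: Dict[int, DriveAttributes] = field(default_factory=dict)
          read_cache: Dict[str, bytes] = field(default_factory=dict)   # per-volume VRC
          write_cache: Dict[str, bytes] = field(default_factory=dict)  # per-volume VWC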
  • the metadata volume is a set of mirrored disk drives.
  • the minimum number of the mirrored drives in this embodiment is 2.
  • the number of disk drives in the metadata volume can be configured to match the level of protection requested by the user. The number of disks cannot exceed the number of disk controllers.
  • the metadata volume is mirrored across each disk controller. This eliminates the possibility of a single disk controller disabling the Shelf Controller.
  • the layout of the metadata volume is designed to provide persistent data and state of the disk shelf. This data is used for shelf configuring, RAID set configuring, volume configuring, and disk configuring. This persistent metadata is updated and utilized during all phases of the disk shelf (Initialization, Normal, Reconstructing, Service, etc.).
  • the metadata volume data is used to communicate status and configuration data to the rack controller.
  • the metadata may include “health information” for each disk drive (i.e., information on how long the disk drive has been in service, how many times it has been powered on and off, and other factors that may affect its reliability). If the health information for a particular disk drive indicates that the drive should be replaced, the system may begin copying the data on the disk drive to another drive in case the first drive fails, or it may simply provide a notification that the drive should be replaced at the next normal service interval.
  • the metadata volume data also has designated volume-cache area for each of the volumes. In the event that a volume is offline, the data stored in the metadata volume for the offline volume can be used while the volume comes online.
  • This provides, via a request from the rack controller, a window of 10-12 seconds (or whatever time is necessary to power-on the corresponding drives) during which write data is cached while the drives of the offline volume are being powered up. After the drives are powered up and the volume is online, the cached data is written to the volume.
  • This data is used to bring the disk shelf to an operational mode. Once the disk shelf has completed the initialization, it will wait for the rack controller to initiate the rack controller initialization process.
  • each volume is synchronized with the metadata volume.
  • Each volume will have its associated set of metadata on the disk drive. This is needed in the event of a disastrous metadata volume failure.
  • the metadata volume has reserved space for each volume, including an allocated volume read cache (VRC).
  • This read cache is designed to alleviate the spin-up and seek time of a disk drive once initiated with power.
  • the VRC replicates the initial portion of each volume.
  • the size of data replicated in the VRC will depend on the performance desired and the environmental conditions. Therefore, in the event that an I/O READ request is given to an offline volume, the data can be sourced from the VRC. Care must be taken to ensure that this data is coherent and consistent with the associated volume.
  • the metadata volume also has reserved space for each volume for an allocated volume write cache (VWC). This write cache is designed to alleviate the spin-up and seek time of a disk drive once initiated with power.
  • the VWC has a portion of the initial data, e.g., 512 MB, replicated for each volume. Therefore, in the event that an I/O write request is given to an offline volume, the data can be temporarily stored in the VWC. Again, care must be taken to ensure that this data is coherent and consistent with the associated volume.
  • Referring to FIG. 8, a diagram illustrating the manner in which data is stored on a set of disks is shown.
  • a set of disks are partitioned into “large contiguous” sets of data blocks, known as containers.
  • Single or multiple disk volumes which are presented to the storage user or server can represent a container.
  • the size of the data blocks within a container is dictated by the disk sector size, typically 512 bytes.
  • Each container is statically allocated and addressed from 0 to x, where x is the number of data blocks minus 1.
  • Each container can be then divided into some number of sub-containers.
  • the access to each of the containers is through a level of address indirection.
  • the container is a contiguous set of blocks that is addressed from 0 to x.
  • to access the data blocks in a container, the associated disk drive must be powered and operational.
  • container 0 is fully contained within the address space of disk drive 1 .
  • the only disk drive that is powered on is disk drive 1 .
  • disk drives 1 and 2 must be alternately powered, as container 2 spans both disk drives. Initially, disk drive 1 is powered. Then, disk drive 1 is powered down, and disk drive 2 is powered up. Consequently, there will be a delay for disk drive 2 to become ready for access. Thus, the access of the next set of data blocks on disk drive 2 will be delayed. This generally is not an acceptable behavior for access to a disk drive.
  • the first segment of each disk drive and/or container is therefore cached on a separate set of active/online disk drives.
  • the data blocks for container 2 reside on the metadata volume, as illustrated in FIG. 9.
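  • A minimal sketch of this address indirection is shown below. The names and the size of the cached initial segment are assumptions; the point is that a container-relative block maps to a (drive, offset) pair, and blocks falling in a drive's cached initial segment can be served from the metadata volume while that drive is being powered up.
      # Sketch (assumed naming) of the address indirection described above. Drive indices
      # are zero-based in this sketch.
      BLOCK_SIZE = 512                 # typical disk sector size noted above
      CACHED_INITIAL_BLOCKS = 1024     # per-drive initial segment kept on the metadata volume (assumed)

      def locate_block(container_start_block, blocks_per_drive, container_block):
          """Map a container-relative block number to (drive_index, block_within_drive)."""
          absolute = container_start_block + container_block
          return absolute // blocks_per_drive, absolute % blocks_per_drive

      def serve_from(drive_index, block_in_drive, powered_on_drives):
          if drive_index in powered_on_drives:
              return f"drive {drive_index}"
          if block_in_drive < CACHED_INITIAL_BLOCKS:
              return "metadata volume (cached initial segment)"   # hides the spin-up delay
          return f"drive {drive_index} after power-up"

      # Example: a container that spans the boundary between the first and second drives.
      drive, block = locate_block(container_start_block=150_000, blocks_per_drive=200_000,
                                  container_block=50_500)
      print(drive, block, serve_from(drive, block, powered_on_drives={0}))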
  • This technique, in which a transition between two disk drives is accomplished by powering down one disk drive and powering up the other disk drive, can be applied to more than just a single pair of disk drives.
  • the single drives described above can each be representative of a set of disk drives.
  • This disk drive configuration could comprise RAID 10 or some form of data organization that would “spread” a hot spot over many disk drives (spindles).
  • Referring to FIG. 10, a diagram illustrating the use of a pair of redundant disk drives is shown.
  • the replication is a form of RAID (1, 4, 5, etc.)
  • the process of merging must keep the data coherent. This process may be done synchronously with each write operation, or it may be performed at a later time. Since not all disk drives are powered on at one time, there is additional housekeeping of the current status of a set of disk drives. This housekeeping comprises the information needed to regenerate data blocks, knowing exactly which set of disk drives or subset of disk drives is valid in restoring the data.
  • drives in a RAID set can be reused, even in the event of multiple disk drive failures.
  • Conventionally, failure of more than one drive in a RAID set results in the need to abandon all of the drives in the RAID set, since data is striped or distributed across all of the drives in the RAID set.
  • the set of member drives in the RAID set can be decreased (e.g., from six drives to four).
  • the parity for the reduced set of drives can be calculated from the data that resides on these drives. This allows the preservation of the data on the remaining drives in the event of future drive failures.
  • a new parity drive could be designated for the newly formed RAID set, and the parity information would be stored on this drive.
  • Disk drive metadata is updated to reflect the remaining and/or new drives that now constitute the reduced or newly formed RAID set.
  • a RAID set has five member drives, including four data drives and one parity drive.
  • the data can be reconstructed on the remaining disk drives if sufficient space is available. (If a spare is available to replace the failed drive and it is not necessary to reduce the RAID set, the data can be reconstructed on the new member drive.)
  • the data on the non-failed drives can be retained and operations can proceed with the remaining data on the reduced RAID set, or the reduced RAID set can be re-initialized and used as a new RAID set.
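  • The parity recomputation for a reduced RAID set, as described above, can be sketched as follows; the helper names and block sizes are illustrative assumptions.
      # Minimal sketch of RAID-set reduction: after losing members, the surviving data
      # drives are kept and a new parity block is recomputed over the reduced set.
      from functools import reduce

      def xor_blocks(blocks):
          """XOR a list of equal-length byte blocks together."""
          return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

      def reduce_raid_set(surviving_data_blocks):
          """Recompute parity across the remaining data drives of a reduced RAID set."""
          new_parity = xor_blocks(surviving_data_blocks)
          return {"members": len(surviving_data_blocks), "parity": new_parity}

      # Example: a 4+1 set loses two drives; parity is rebuilt over the two survivors.
      d0 = bytes([0x11] * 8)
      d3 = bytes([0x42] * 8)
      print(reduce_raid_set([d0, d3]))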
  • the sparing of a failed disk in a set of disk drives is performed for both failed-data-block and failed-disk-drive events.
  • in the case of failed data blocks, the affected data is temporarily regenerated.
  • the process of restoring redundancy within a set of disk drives can thereby be made more efficient and effective. This process is matched to the powering of each of the remaining disk drives in a set of disk drives.
  • a spare disk drive is allocated as a candidate for replacement into the RAID set. Since only a limited number of drives can be powered on at one time, only the drive having the failed data blocks and the candidate drive are powered. At this point, only the known good data blocks are copied onto the corresponding address locations of the failed data blocks. Once all the known good blocks have been copied, the process to restore the failed blocks is initiated. Thus the entire RAID set will need to be powered on. Although the entire set of disk drives needs to be powered on, it is only for the time necessary to repair the bad blocks. After all the bad blocks have been repaired, the drives are returned to a powered-off state.
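  • A self-contained sketch of this sparing sequence is given below, with blocks modeled as small integers. The straight copy corresponds to the first step, during which only the failed drive and the spare need power; the regeneration loop corresponds to the second step, during which the whole RAID set is powered just long enough to rebuild the bad blocks from parity.
      # Hedged sketch of the sparing sequence described above (data layout assumed).
      def rebuild_onto_spare(failed_drive, bad_blocks, peer_drives, parity):
          spare = list(failed_drive)                  # step 1: copy known-good blocks as-is
          for i in bad_blocks:                        # step 2: regenerate each bad block from
              value = parity[i]                       # parity XOR the peer drives' blocks
              for peer in peer_drives:
                  value ^= peer[i]
              spare[i] = value
          return spare

      # 2 data drives + parity; block 1 of drive A is unreadable and gets regenerated.
      drive_a = [0x1, 0x0, 0x3]          # 0x0 stands in for the unreadable block (was 0x2)
      drive_b = [0x4, 0x5, 0x6]
      parity  = [0x1 ^ 0x4, 0x2 ^ 0x5, 0x3 ^ 0x6]
      print(rebuild_onto_spare(drive_a, bad_blocks={1}, peer_drives=[drive_b], parity=parity))
      # -> [1, 2, 3]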
  • the end user of the system may use it, for example, as a disk system attached directly to a server as direct attached storage (DAS) or as shared storage in a storage area network (SAN).
  • DAS: direct attached storage
  • SAN: storage area network
  • the system is used as the backup target to the primary storage via a direct connection and then connected via a media (backup) server to a tape library.
  • the system may be used in other ways in other embodiments.
  • the system presents volume images to the servers or users of the system.
  • physical volumes are not directly accessible to the end users. This is because, as described earlier, through the power managed RAID, the system hides the complexity of access to physical drives, whether they are powered on or not.
  • the controller at the rack and the shelf level isolates the logical volume from the physical volume and drives.
  • the system can rewrite, relocate or move the logical volumes to different physical locations.
  • the system may provide independence from the disk drive type, capacity, data rates, etc. This allows migration to new media as they become available and when new technology is adopted. It also eliminates the device (disk) management administration required to incorporate technology obsolescence.
  • the system may also provide automated replication for disaster recovery.
  • the second copy of a primary volume can be independently copied to third party storage devices over the network, either local or over wide-area.
  • the device can be another disk system, another tape system, or the like.
  • the volume could be replicated to multiple sites for simultaneously creating multiple remote or local copies.
  • the system may also provide automatic incremental backup to conserve media and bandwidth. Incremental and differential changes in the storage volume can be propagated to the third or later copies.
  • the system may also provide authentication and authorization services. Access to both the physical and logical volumes and drives can be controlled by the rack and shelf controller since it is interposed between the end user of the volumes and the physical drives.
  • the system may also provide automated data revitalization. Since data on disk media can degrade over time, the system controller can refresh the volume data to different drives automatically so that the data integrity is maintained. Since the controllers have information on when disks and volumes are written, they can keep track of which disk data has to be refreshed or revitalized.
  • the system may also provide concurrent restores: multiple restores can be conducted concurrently, possibly initiated asynchronously or via policy by the controllers in the system.
  • the system may also provide unique indexing of metadata within a storage volume: by keeping metadata information on the details of objects contained within a volume, such as within the metadata volume in a shelf.
  • the metadata can be used by the controller for the rapid search of specific objects across volumes in the system.
  • the system may also provide other storage administration features for the management of secondary and multiple copies of volumes, such as a single view of all data to simplify and reduce the cost of managing all volume copies, automated management of the distribution of the copies of data, and auto-discovery and change detection of the primary volume that is being backed up when the system is used for creating backups.
  • the preferred interconnect system provides a means to connect 896 disk drives, configured as 112 disks per shelf and 8 shelves per rack.
  • the internal system interconnect is designed to provide an aggregate throughput equivalent to six 2 Gb/sec Fibre Channel interfaces (1000 MB/s read or write).
  • the external system interface is Fibre Channel.
  • the interconnect system is optimized for the lowest cost per disk at the required throughput.
  • FIG. 12 shows the interconnect scheme from the host (server or end user) to the end disk drives.
  • the interconnect system incorporates RAID at the shelf level to provide data reliability.
  • the RAID controller is designed to address 112 disks, some of which may be allocated to sparing.
  • the RAID controller spans 8 sticks of 14 disks each.
  • the RAID set should be configured to span multiple sticks to guard against loss of any single stick controller or interconnect or loss of any single disk drive.
  • the system interconnect from shelf to stick can be configured to provide redundancy at the stick level for improved availability.
  • the stick-level interconnect is composed of a stick controller (FPGA/ASIC plus SERDES), shelf controller (FPGA/ASIC plus SERDES, external processor and memory), rack controller (FPGA/ASIC plus SERDES) and associated cables, connectors, printed circuit boards, power supplies and miscellaneous components.
  • the SERDES and/or processor functions may be integrated into an advanced FPGA (e.g., using Xilinx Virtex II Pro).
  • the shelf controller and the associated 8 stick controllers are shown in FIG. 13.
  • the shelf controller is connected to the rack controller (FIG. 15) via Fibre Channel interconnects.
  • It should be noted that, in other embodiments, other types of controllers and interconnects (e.g., SCSI) may be used.
  • the shelf controller can provide different RAID level support, such as RAID 0, 1 and 5 and combinations thereof, across programmable disk RAID sets accessible via eight SATA initiator ports.
  • the RAID functions are implemented in firmware, with acceleration provided by an XOR engine and DMA engine implemented in hardware. In this case, an XOR-equipped Intel IOP321 CPU is used.
  • the Shelf Controller RAID control unit connects to the Stick Controller via a SATA Channel Controller over the PCI-X bus.
  • the 8 SATA outputs of the SATA Channel Controller each connect with a stick controller data/command router device (FIG. 14).
  • Each data/command router controls 14 SATA drives of each stick.
  • the rack controller comprises a motherboard with a ServerWorks GC-LE chipset and four to eight PCI-X slots.
  • the PCI-X slots are populated with dual-port or quad-port 2 G Fibre Channel PCI-X target bus adapters (TBA).
  • TBA: Fibre Channel PCI-X target bus adapter
  • other components, which employ other protocols, may be used.
  • quad-port SCSI adapters using U320 to the shelf units may be used.

Abstract

Systems and methods for providing scalable, reliable, power-efficient, high-capacity data storage, wherein large numbers of closely packed data drives having corresponding metadata and parity volumes are individually powered on and off, depending upon their respective usage. In one embodiment, the invention is implemented in a RAID-type data storage system which employs a large number of hard disk drives that are individually controlled, so that only the disk drives that are in use are powered on. The reduced power consumption allows the disk drives to be contained in a smaller enclosure than would conventionally be possible. In a preferred embodiment, the data protection scheme is designed to utilize large, contiguous blocks of space on the data disk drives, and to use the space on one data disk drive at a time, so that the data disk drives which are not in use can be powered down.

Description

    RELATED APPLICATIONS
  • This application claims priority to U.S. Provisional Patent Application No. 60/409,980, entitled “Method and Apparatus for Efficient Scalable Storage Management,” by Guha, et al., filed Sep. 12, 2002, which is incorporated by reference as if set forth herein in its entirety.[0001]
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0002]
  • The present invention relates generally to data storage systems, and more particularly to power-efficient, high-capacity data storage systems that are scalable and reliable. [0003]
  • 2. Related Art [0004]
  • The need for large data storage motivates the building of large-scale and high-capacity storage systems. While one option for building scalable systems is to connect and centrally manage multiple storage systems across a network, such as a storage area network (SAN), the inherent capacity increase in a single system is still highly desirable for two reasons: first, increasing total storage capacity in a single system in effect provides a multiplier effect for the total storage across a SAN; and second, for many uses, providing a single device that manages a larger capacity of storage is always more cost-effective in testing, integrating and deploying. [0005]
  • Traditionally, tape drives, automated tape libraries or other removable media storage devices have been used to deliver large capacity storage in a single system. This is due in large part to the lower cost and footprint of these types of systems when compared to media such as disk drives. Recent advances in disk technology, however, have caused designers to revisit the design of large scale storage systems using disk drives. There are two primary reasons for this. First, the cost differential between disk and tape devices on per unit storage is decreasing rapidly due to the higher capacity of disk drives available at effectively lower cost. Second, the performance of disk systems with respect to access times and throughput are far greater than tape systems. [0006]
  • Despite the falling cost of disk drives and their performance in throughput and access times, some tape drives still have the advantage of being able to support large numbers (e.g., ten or more) of removable cartridges in a single automated library. Because a single tape drive can access multiple tape volumes, equivalent storage on multiple disk drives will consume more (e.g., ten times more) power than the equivalent tape drive systems, even with a comparable footprint. Furthermore, for a disk-based storage system that has the same number of powered drives as the number of passive cartridges in a tape system, the probability of failures increases in the disk storage system. It would therefore be desirable to provide a single high-capacity disk-based storage system that is as cost effective as tertiary tape storage systems but with high reliability and greater performance. [0007]
  • Traditional RAID and Data Protection Schemes Issues [0008]
  • The dominant approach to building large storage systems is to use a redundant array of inexpensive (independent) disks (RAID). RAID systems are described, for example, in David A. Patterson, G. Gibson, and Randy H. Katz, “A Case for Redundant Arrays of Inexpensive Disks (RAID),” International Conference on Management of Data (SIGMOD), p. 109-116, June 1988. The primary goal for RAID is to provide data protection or fault tolerance in access to data in the case of failures, especially disk failures. A secondary benefit is increasing I/O performance by spreading data over multiple disk spindles and performing operations in parallel, which allows multiple drives to be working on a single transfer request. [0009]
  • There are six commonly known RAID “levels” or standard geometries that are generally used for conventional RAID storage systems. The simplest array that provides a form of redundancy, a RAID level 1 system, comprises one or more disks for storing data and an equal number of additional mirror disks for storing copies of the information written to the data disks. The remaining RAID levels, identified as RAID level 2-6 systems, segment the data into portions for storage across several data disks. One or more additional disks are utilized to store error check or parity information. [0010]
  • RAID storage subsystems typically utilize a control module that shields the user or host system from the details of managing the redundant array. The controller makes the subsystem appear to the host computer as a single, highly reliable, high capacity disk drive even though a RAID controller may distribute the data across many smaller drives. Frequently, RAID subsystems provide large cache memory structures to further improve the performance of the subsystem. The host system simply requests blocks of data to be read or written and the RAID controller manipulates the disk array and cache memory as required. [0011]
  • The various RAID levels are distinguished by their relative performance capabilities as well as their overhead storage requirements. For example, a RAID level 1 “mirrored” storage system requires more overhead storage than RAID levels 2-5 that utilize XOR parity to provide requisite redundancy. RAID level 1 requires 100% overhead since it duplicates all data, while RAID level 5 requires 1/N of the storage capacity used for storing data, where N is the number of data disk drives used in the RAID set. [0012]
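  • As a small illustration of the overhead comparison above, the following sketch computes the redundancy overhead for mirroring versus parity protection as a function of the number of data drives N.
      # Redundancy overhead: RAID 1 duplicates everything (100% overhead), while RAID 5
      # adds one parity drive's worth of capacity per N data drives (overhead 1/N).
      def raid1_overhead(data_drives: int) -> float:
          return 1.0                   # a full mirror for every data drive

      def raid5_overhead(data_drives: int) -> float:
          return 1.0 / data_drives     # one parity drive spread across N data drives

      for n in (4, 8, 16):
          print(f"N={n}: RAID 1 overhead {raid1_overhead(n):.0%}, "
                f"RAID 5 overhead {raid5_overhead(n):.0%}")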
  • Traditional Power Consumption Issues [0013]
  • There have been a few recent efforts at power cycling computing resources at a data center. This is done for a variety of different reasons, such as energy cost and reliability. For example, a data storage system may be scaled upward to incorporate a very large number of disk drives. As the number of disk drives in the system increases, it is apparent that the amount of energy required to operate the system increases. It may be somewhat less apparent that the reliability of the system is likely to decrease because of the increased heat generated by the disk drives in the system. While prior art systems use various approaches to address these problems, they typically involve opportunistically powering down all of the drives in the system, as demonstrated by the following examples. [0014]
  • To reduce energy costs in a data center, one approach employs energy-conscious provisioning of servers by concentrating request loads to a minimal active set of servers for the current aggregate load level (see Jeffrey S. Chase, Darrell C. Anderson, Prachi N. Thakar, Amin M. Vahdat, and Ronald P. Doyle. Managing energy and server resources in hosting centers. In Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 103-116, October 2001). Active servers always run near a configured utilization threshold, while the excess servers transition to low-power idle states to reduce the energy cost of maintaining surplus capacity during periods of light load. The focus is on power cycling servers and not on storage. Chase, et al. mention that power cycling may reduce the life of the disks, but current disks have a start/stop limit that will likely not be exceeded. [0015]
  • Another approach uses a large-capacity storage system which is referred to as a massive array of idle disks, or MAID (see Dennis Colarelli, Dirk Grunwald and Michael Neufeld, The Case for Massive Arrays of Idle Disks (MAID), Usenix Conference on File and Storage Technologies (FAST), January 2002, Monterey, Calif.). In this approach, a block level storage system uses a front-end cache and controller that allow access to the full array of drives. The full array can be powered off opportunistically to extend the life of IDE or ATA drives. The power off schedule is based on a heuristic, such as a least-recently-used or least expected to be used model, i.e., the array of drives is turned off when no data access is expected on any of the drives in the array. Another approach uses archival storage systems where ATA drives are also powered off (as in the case of MAID) based on algorithms similar to the LRU policy (see Kai Li and Howard Lee, Archival data storage system and method, U.S. patent application Ser. No. 2002-0144057, Oct. 3, 2002). In some systems, the array of drives comprises a RAID set. In these systems, the entire RAID set is opportunistically powered on or off (see, e.g., Firefly Digital Virtual Library, http://www.asaca.com/DVL/DM_200.htm). These systems can power down a RAID set that has been in an extended state of inactivity, or power up a RAID set for which I/O requests are pending. [0016]
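  • A least-recently-used power-off heuristic of the kind attributed to these systems could be sketched as follows; the idle threshold and interfaces are assumptions chosen for illustration only.
      # Hedged sketch of an LRU-style power-off heuristic (not the cited systems' code).
      import time

      class LruPowerManager:
          def __init__(self, drive_ids, idle_seconds=600.0):
              self.idle_seconds = idle_seconds
              self.last_access = {d: time.monotonic() for d in drive_ids}
              self.powered = set(drive_ids)

          def note_access(self, drive_id):
              self.last_access[drive_id] = time.monotonic()
              self.powered.add(drive_id)            # spin the drive up on demand

          def sweep(self):
              """Power down any drive not accessed within the idle window."""
              now = time.monotonic()
              for drive_id, last in self.last_access.items():
                  if drive_id in self.powered and now - last > self.idle_seconds:
                      self.powered.discard(drive_id)
              return self.powered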
  • Systems with Very Large Numbers of Drives [0017]
  • One of the challenges that exists in the current data storage environment is to build a storage controller that can handle hundreds of drives for providing large-scale storage capacity, while maintaining performance and reliability. This challenge encompasses several different aspects of the system design: the system reliability; the interconnection and switching scheme for control of the drives; the performance in terms of disk I/O; and the cost of the system. Each of these aspects is addressed briefly below. [0018]
  • System Reliability. [0019]
  • As the number of operational drives increases in the system, especially if many drives are seeking for data concurrently, the probability of a drive failure increases almost linearly with the number of drives, thereby decreasing overall reliability of the system. For example, if a typical disk drive can be characterized as having a mean time to failure (MTTF) of 500,000 hours, a system with 1000 of these drives will be expected to have its first disk fail in 500.5 hours or 21 days. [0020]
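  • The arithmetic behind this example is simply the per-drive MTTF divided by the number of independent drives, as the short sketch below shows.
      # With independent drives, the expected time to the first failure scales as the
      # per-drive MTTF divided by the number of drives.
      drive_mttf_hours = 500_000
      num_drives = 1000

      first_failure_hours = drive_mttf_hours / num_drives
      print(f"{first_failure_hours:.0f} hours ~= {first_failure_hours / 24:.0f} days")
      # -> 500 hours ~= 21 days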
  • Interconnection and Switching Scheme for Control of Drives. [0021]
  • As the number of drives increases, an efficient interconnect scheme is required both to move data and to control commands between the controller and all of the drives. As used here, control of the drives refers to both controlling access to drives for I/O operations, and providing data protection, such as by using RAID parity schemes. There are two obvious challenges that arise in relation to the interconnection mechanism: the cost of the interconnection and the related complexity of fanout from the controller to the drives. [0022]
  • Performance for Disk I/O. [0023]
  • Since the controller will read and write data to and from all of the drives, the bandwidth required between the controller and the drives will scale with the number of active drives. In addition, there is the difficulty of RAIDing across a very large set, since the complexity, the extent of processing logic and the delay of the parity computation will grow with the number of drives in the RAID set. [0024]
  • Cost. [0025]
  • All of the above design issues must be addressed, while ensuring that the cost of the overall disk system can be competitive with typically lower cost tertiary tape storage devices. [0026]
  • SUMMARY OF THE INVENTION
  • One or more of the problems outlined above may be solved by the various embodiments of the invention. Broadly speaking, the invention comprises systems and methods for providing scalable, reliable, power-efficient, high-capacity data storage, wherein large numbers of closely packed data drives having corresponding metadata and parity volumes are individually powered on and off, according to usage requirements. [0027]
  • In one embodiment, the invention is implemented in a RAID-type data storage system. This system employs a large number of hard disk drives that are individually controlled, so that in this embodiment only the disk drives that are in use are powered on. Consequently, the system uses only a fraction of the power that would be consumed if all of the disk drives in the system had to be powered on. In a preferred embodiment, the data protection scheme is designed to utilize large, contiguous blocks of space on the data disk drives, and to use the space on one data disk drive at a time, so that the data disk drives which are not in use can be powered down. [0028]
  • One embodiment of the invention comprises a method which includes the steps of providing a data storage system having a plurality of data storage drives, performing data accesses to the data storage system, wherein the data accesses involve accesses to a first subset of the data storage drives and wherein the first subset of the data storage drives is powered on, and powering down a second subset of the data storage drives, wherein the data accesses do not involve accesses to the second subset of the data storage drives. [0029]
  • Numerous additional embodiments are also possible. [0030]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Other objects and advantages of the invention may become apparent upon reading the following detailed description and upon reference to the accompanying drawings. [0031]
  • FIG. 1 is a diagram illustrating the general structure of a multiple-disk data storage system in accordance with one embodiment. [0032]
  • FIGS. 2A and 2B are diagrams illustrating the interconnections between the controllers and disk drives in a densely packed data storage system in accordance with one embodiment. [0033]
  • FIG. 3 is a diagram illustrating the physical configuration of a densely packed data storage system in accordance with one embodiment. [0034]
  • FIG. 4 is a flow diagram illustrating the manner in which the power management scheme of a densely packed data storage system is determined in accordance with one embodiment. [0035]
  • FIG. 5 is a diagram illustrating the manner in which information is written to a parity disk and the manner in which disk drives are powered on and off in accordance with one embodiment. [0036]
  • FIG. 6 is a diagram illustrating the content of a metadata disk in accordance with one embodiment. [0037]
  • FIG. 7 is a diagram illustrating the structure of information stored on a metadata disk in accordance with one embodiment. [0038]
  • FIG. 8 is a diagram illustrating the manner in which containers of data are arranged on a set of disk drives in accordance with one embodiment. [0039]
  • FIG. 9 is a diagram illustrating the manner in which the initial segments of data from a plurality of disk drives are stored on a metadata volume in accordance with one embodiment. [0040]
  • FIG. 10 is a diagram illustrating the use of a pair of redundant disk drives and corresponding parity and metadata volumes in accordance with one embodiment. [0041]
  • FIG. 11 is a diagram illustrating the use of a data storage system as a backup target for the primary storage via a direct connection and as a media (backup) server to a tape library in accordance with one embodiment. [0042]
  • FIG. 12 is a diagram illustrating the interconnect from the host (server or end user) to the end disk drives in accordance with one embodiment. [0043]
  • FIG. 13 is a diagram illustrating the interconnection of a channel controller with multiple stick controllers in accordance with one embodiment. [0044]
  • FIG. 14 is a diagram illustrating the interconnection of the outputs of a SATA channel controller with corresponding stick controller data/command router devices in accordance with one embodiment. [0045]
  • FIG. 15 is a diagram illustrating the implementation of a rack controller in accordance with one embodiment.[0046]
  • While the invention is subject to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and the accompanying detailed description. It should be understood, however, that the drawings and detailed description are not intended to limit the invention to the particular embodiment which is described. This disclosure is instead intended to cover all modifications, equivalents and alternatives falling within the scope of the present invention as defined by the appended claims. [0047]
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • One or more embodiments of the invention are described below. It should be noted that these and any other embodiments described below are exemplary and are intended to be illustrative of the invention rather than limiting. [0048]
  • As described herein, various embodiments of the invention comprise systems and methods for providing scalable, reliable, power-efficient, high-capacity data storage, wherein large numbers of closely packed data drives having corresponding metadata and parity volumes are individually powered on and off, depending upon their usage requirements. [0049]
  • In one embodiment, the invention is implemented in a RAID-type data storage system. This system employs a large number of hard disk drives. When data is written to the system, the data is written to one or more of the disk drives. Metadata and parity information corresponding to the data are also written to one or more of the disk drives to reduce the possibility of data being lost or corrupted. The manner in which data is written to the disks typically involves only one data disk at a time, in addition to metadata and parity disks. Similarly, reads of data typically only involve one data disk at a time. Consequently, data disks which are not currently being accessed can be powered down. The system is therefore configured to individually control the power to each of the disks so that it can power up the subset of disks that are currently being accessed, while powering down the subset of disks that are not being accessed. [0050]
  • Because only a portion of the disk drives in the system are powered on at any given time, the power consumption of the system is less than that of a comparable conventional system (i.e., one with approximately the same total number of similar disk drives) in which all of the disk drives have to be powered on at the same time. As a result of the lower power consumption of the system, it generates less heat and requires less cooling than the conventional system. [0051]
  • The present system can therefore be packaged in a smaller enclosure than the comparable conventional system. Another difference between the present system and conventional systems is that conventional systems require switches for routing data to appropriate data disks in accordance with the data protection scheme employed by the system (e.g., RAID level 3). In the present system, on the other hand, most of the disk drives are powered down at any given time, so the data can be distributed by a simple fan-out interconnection, which consumes less power and takes up less volume within the system enclosure than the switches used in conventional systems. Yet another difference between the present system and conventional systems is that, given a particular reliability (e.g., mean time to failure, or MTTF) of the individual disk drives, the present system can be designed to meet a particular reliability level (e.g., threshold mean time between failures, MTBF), as opposed to conventional systems, which are essentially constrained by the number of disk drives in the system and the reliability of the individual disk drives. [0052]
  • The various embodiments of the invention may provide advantages over conventional systems (e.g., RAID systems) in the four areas discussed above: power management; data protection; physical packaging; and storage transaction performance. These advantages are described below with respect to the different areas of impact. [0053]
  • Power Management [0054]
  • In regard to power management, embodiments of the present invention may not only decrease power consumption, but also increase system reliability by optimally power cycling the drives. In other words, only a subset of the total number of drives is powered on at any time. Consequently, the overall system reliability can be designed to be above a certain acceptable threshold. [0055]
  • The power cycling of the drives on an individual basis is one feature that distinguishes the present embodiments from conventional systems. As noted above, prior art multi-drive systems do not allow individual drives, or even sets of drives to be powered off in a deterministic manner during operation of the system to conserve energy. Instead, they teach the powering off of entire systems opportunistically. In other words, if it is expected that the system will not be used at all, the entire system can be powered down. During the period in which the system is powered off, of course, it is not available for use. By powering off individual drives while other drives in the system remain powered on, embodiments of the present invention provide power-efficient systems for data storage and enable such features as the use of closely packed drives to achieve higher drive density than conventional systems in the same footprint. [0056]
  • Data Protection [0057]
  • In regard to data protection, it is desirable to provide a data protection scheme that assures efficiency in the storage overhead used while allowing failed disks to be replaced without significant disruption during replacement. This scheme must be optimized with respect to the power cycling of drives, since RAID schemes will have to work with the correct subset of drives that are powered on at any time. Thus, any Read or Write operations must be completed in the expected time even when only a fixed subset of the drives is powered on. Because embodiments of the present invention employ a data protection scheme that does not use most or all of the data disks simultaneously, the drives that are powered off can be easily replaced without significantly disrupting operations. [0058]
  • Physical Packaging [0059]
  • In regard to the physical packaging of the system, most storage devices must conform to a specific volumetric constraint. For example, there are dimensional and weight limits that correspond to a standard rack, and many customers may have to use systems that fall within these limits. The embodiments of the present invention use high density packing and interconnection of drives to optimize the physical organization of the drives and achieve the largest number of drives possible within these constraints. [0060]
  • Storage Transaction Performance [0061]
  • In regard to storage transaction performance, the power cycling of drives results in a limited number of drives being powered on at any time. This affects performance in two areas. First, the total I/O bandwidth is bounded by the number of powered-on drives. Second, a random Read operation to a block on a powered-down drive would incur a very large spin-up delay. The embodiments of the present invention use large numbers of individual drives, so that the number of drives that are powered on, even though it will be only a fraction of the total number of drives, will allow the total I/O to be within specification. In regard to the spin-up delay, the data access scheme masks the delay so that the host system does not perceive the delay or experience a degradation in performance. [0062]
  • Referring to FIG. 1, a diagram illustrating the general structure of a multiple-disk data storage system in accordance with one embodiment of the invention is shown. It should be noted that the system illustrated in FIG. 1 is a very simplified structure which is intended merely to illustrate one aspect (power cycling) of an embodiment of the invention. A more detailed representation of a preferred embodiment is illustrated in FIG. 2 and the accompanying text below. [0063]
  • As depicted in FIG. 1, data storage system 10 includes multiple disk drives 20. It should be noted that, for the purposes of this disclosure, identical items in the figures may be indicated by identical reference numerals followed by a lowercase letter, e.g., 20a, 20b, and so on. The items may be collectively referred to herein simply by the reference numeral. Each of disk drives 20 is connected to a controller 30 via interconnect 40. [0064]
  • It can be seen in FIG. 1 that disk drives 20 are grouped into two subsets, 50 and 60. Subset 50 and subset 60 differ in that the disk drives in one of the subsets (e.g., 50) are powered on, while the disk drives in the other subset (e.g., 60) are powered down. The individual disk drives in the system are powered on (or powered up) only when needed. When they are not needed, they are powered off (powered down). Thus, the particular disk drives that make up each subset will change as required to enable data accesses (reads and writes) by one or more users. This is distinctive because, as noted above, conventional data storage (e.g., RAID) systems only provide power cycling of the entire set of disk drives—they do not allow the individual disk drives in the system to be powered up and down as needed. [0065]
  • As mentioned above, the system illustrated by FIG. 1 is used here simply to introduce the power cycling aspect of one embodiment of the invention. This and other embodiments described herein are exemplary, and numerous variations on these embodiments may be possible. For example, while the embodiment of FIG. 1 utilizes multiple disk drives, other types of data storage, such as solid state memories, optical drives, or the like could also be used. It is also possible to use mixed media drives, although it is contemplated that this will not often be practical. References herein to disk drives or data storage drives should therefore be construed broadly to cover any type of data storage. Similarly, while the embodiment of FIG. 1 has two subsets of disk drives, one of which is powered on and one of which is powered off, other power states may also be possible. For instance, there may be various additional states of operation (e.g., standby) in which the disk drives may exist, each state having its own power consumption characteristics. [0066]
  • The powering of only a subset of the disk drives in the system enables the use of a greater number of drives within the same footprint as a system in which all of the drives are powered on at once. One embodiment of the invention therefore provides high density packing and interconnection of the disk drives. This system comprises a rack having multiple shelves, wherein each shelf contains multiple rows, or “sticks” of disk drives. The structure of this system is illustrated in FIG. 2. [0067]
  • Referring to FIG. 2, the top level interconnection between the system controller 120 and the shelves 110 is shown on the left side of the figure. The shelf-level interconnection to each of the sticks 150 of disk drives 160 is shown on the right side of the figure. As shown on the left side of the figure, the system has multiple shelves 110, each of which is connected to a system controller 120. Each shelf has a shelf controller 140 which is connected to the sticks 150 in the shelf. Each stick 150 is likewise connected to each of the disk drives 160 so that they can be individually controlled, both in terms of the data accesses to the disk drives and the powering on/off of the disk drives. The mechanism for determining the optimal packing and interconnection configuration of the drives in the system is described below. [0068]
  • It should be noted that, for the sake of clarity, not all of the identical items in FIG. 2 are individually identified by reference numbers. For example, only a few of the disk shelves (110a-110c), sticks (150a-150b) and disk drives (160a-160c) are numbered. This is not intended to distinguish the items having reference numbers from the identical items that do not have reference numbers. [0069]
  • Let the number of drives in the system be N, where N is a large number. [0070]
  • N is then decomposed into a 3-tuple, such that N = s·t·d, where [0071]
  • s: number of shelf units in the system, typically determined by the physical height of the system. For example, for a 44U standard rack system, s can be chosen to be 8. [0072]
  • t: the number of “sticks” in each shelf unit, where a stick comprises a column of disks. For example, in a 24-inch-wide rack, t<=8. [0073]
  • d: the number of disk drives in each stick in a shelf. In a standard rack, d can be 14. [0074]
  • The configuration as shown in FIG. 2 is decomposed into shelves, sticks and disks so that the best close packing of disks can be achieved for purposes of maximum volumetric capacity of disk drives. One example of this is shown in FIG. 3. With the large racks that are available, nearly 1000 3.5″ disks can be packed into the rack. [0075]
  • The preferred configuration is determined by the decomposition of N into s, t and d while optimizing with respect to the i) volume constraints of the drives and the overall system (the rack), and ii) the weight constraint of the complete system. The latter constraints are imposed by the physical size and weight limits of standard rack sizes in data centers. [0076]
  • Besides constraints on weight and dimensions, large-scale packing of drives must also provide adequate airflow and heat dissipation to enable the disks to operate below a specified ambient temperature. This thermal dissipation limit also affects how the disks are arranged within the system. [0077]
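  • As a rough illustration of this decomposition, the following sketch (not part of the patent text) enumerates (s, t, d) candidates and keeps the densest packing that fits assumed height, width, and weight limits; all of the constraint values shown are illustrative assumptions, not specified values.

```python
# Illustrative sketch: enumerate (s, t, d) decompositions of the drive count N
# and keep those that satisfy assumed rack height, width, and weight limits.

def candidate_configs(max_shelves=8, max_sticks=8, max_disks_per_stick=14,
                      max_total_weight_kg=900.0, drive_weight_kg=0.7,
                      chassis_weight_kg=250.0):
    """Yield (s, t, d, N) tuples that fit the assumed constraints."""
    for s in range(1, max_shelves + 1):                  # shelves per rack (height limit)
        for t in range(1, max_sticks + 1):               # sticks per shelf (width limit)
            for d in range(1, max_disks_per_stick + 1):  # disks per stick (depth limit)
                n = s * t * d
                weight = chassis_weight_kg + n * drive_weight_kg
                if weight <= max_total_weight_kg:
                    yield s, t, d, n

# Pick the densest packing that fits the assumed constraints.
best = max(candidate_configs(), key=lambda cfg: cfg[3])
print("s=%d shelves, t=%d sticks, d=%d disks -> N=%d drives" % best)
```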
  • One specific implementation that maximizes the density of drives while providing sufficient air flow for heat dissipation is the configuration shown in FIG. 3. [0078]
  • Power Cycling of Drives to Increase System Reliability and Serviceability [0079]
  • One embodiment of the invention comprises a bulk storage or near-online (NOL) system. This storage system is a rack-level disk system comprising multiple shelves. Hosts can connect to the storage system via Fibre Channel ports on the system level rack controller, which interconnects to the shelves in the rack. Each shelf has a local controller that controls all of the drives in the shelf. RAID functionality is supported within each shelf with enough drives for providing redundancy for parity protection as well as disk spares for replacing failed drives. [0080]
  • In this embodiment, the system is power cycled. More particularly, the individual drives are powered on or off to improve the system reliability over the entire (large) set of drives. Given currently known annualized failure rates (AFRs), a set of 1000 ATA drives would be expected to have an MTBF of about 20 days. In an enterprise environment, a drive replacement period of 20 days to service the storage system is not acceptable. The present scheme for power cycling the individual drives effectively extends the real life of the drives significantly. However, such power cycling requires significant optimization for a number of reasons. For example, power cycling results in many contact start-stops (CSSs), and increasing CSSs reduces the total life of the drive. Also, having fewer powered drives makes it difficult to spread data across a large RAID set. Consequently, it may be difficult to implement data protection at a level equivalent to RAID 5. Still further, the effective system bandwidth is reduced when there are few powered drives. [0081]
  • In one embodiment, the approach for determining the power cycling parameters is as shown in the flow diagram of FIG. 4 and as described below. It should be noted that the following description assumes that the disk drives have an exponential failure rate (i.e., the probability of failure by time t is 1−e^(−λt), where λ is the failure rate, the inverse of the MTTF). The disk drives (or other types of drives) in other embodiments may have failure rates that are more closely approximated by other mathematical functions. For such systems, the calculations described below would use the alternative failure function instead of the present exponential function. [0082]
  • With a large number of drives, N, that are closely packed into a single physical system, the MTBF of the system will decrease significantly as N grows to large numbers. [0083]
  • If the MTTF of a single drive is f (typically in hours) where f=1/(failure rate of a drive) then the system MTBF, F, between failures of individual disks in the system is [0084]
  • F=1/(1−(1−1/f)**N)
  • For N=1000 and f=500,000 hrs (or 57 years), F is approximately 500.5 hours, or about 21 days. Such a low MTBF is not acceptable for most data centers and enterprises. As mentioned above, the system MTBF can be increased by powering the drives on and off, i.e., power cycling the drives, to increase the overall life of each drive in the system. This facilitates maintenance of the system, since serviceability of computing systems in the enterprise requires deterministic and scheduled service times when components (drives) can be repaired or replaced. Since it is desired to have scheduled service at regular intervals, this constraint is incorporated into the calculations that follow. [0085]
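  • As a quick check of this formula, the following minimal sketch (not from the patent text) evaluates F for the example values above.

```python
# System-level MTBF from the per-drive MTTF, using the formula in the text:
# F = 1 / (1 - (1 - 1/f)**N), with f in hours and N the number of drives.

def system_mtbf_hours(f_hours: float, n_drives: int) -> float:
    return 1.0 / (1.0 - (1.0 - 1.0 / f_hours) ** n_drives)

f = 500_000   # per-drive MTTF in hours (the example value used above)
N = 1000      # number of drives in the system
F = system_mtbf_hours(f, N)
print(f"F = {F:.1f} hours = {F / 24:.1f} days")   # about 500 hours, roughly 21 days
```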
  • Let the interval to service the system to replace failed disk drives be T, and let the required power cycling duty ratio be R. [0086]
  • The effective system MTBF is T, and the effective failure rate of the system is 1/T [0087]
  • Then, the effective MTBF in a system of N disks is: [0088]
  • f*=1/{1−(1−1/T)**(1/N)}
  • Thus, we can compute the effective per-drive MTTF required in a single system with a large number of drives so that the service interval is T. [0089]
  • Since the actual MTTF is f, the approach we take is to power cycle the drives, i.e., turn off the drives for a length of time and then turn them on for a certain length of time. [0090]
  • If R is the duty ratio to meet the effective MTTF, then [0091]
  • R=f/f*<1
  • Thus, if the ON period of the drives is p hours, then the drives must be OFF for p/R hours. [0092]
  • Further, since at any one time only a subset of all drives are powered on, the effective number of drives in the system that are powered ON is R*N. [0093]
  • Thus, the fraction R of all drives in a shelf also determines the total number of drives that may be powered ON at any time in each shelf. This also limits the number of drives that are used for data writing or reading, as well as any other drives used for holding metadata. [0094]
  • There is one other constraint that must be satisfied in the power cycling that determines the ON period of p hours. [0095]
  • If the typical life of the drive is f hours (same as nominal MTTF), then the number of power cycling events for a drive is CSS (for contact start stops) [0096]
  • CSS=f/(p+p/R)
  • Since CSS is limited to a maximum CSSmax, for any drive [0097]
  • CSS<CSSmax
  • Thus, p must be chosen such that CSSmax is never exceeded. [0098]
  • FIG. 4 depicts the flowchart for establishing power cycling parameters. [0099]
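  • The following minimal sketch (not part of the patent text) walks through this parameter calculation; the service interval T, ON period p, and CSSmax values are illustrative assumptions.

```python
# Sketch of the power-cycling parameter calculation described above.

def required_drive_mttf(T_hours: float, n_drives: int) -> float:
    """f* = 1 / (1 - (1 - 1/T)**(1/N)): the effective per-drive MTTF needed so that
    the expected time between drive failures in the whole system is T."""
    return 1.0 / (1.0 - (1.0 - 1.0 / T_hours) ** (1.0 / n_drives))

f = 500_000            # nominal per-drive MTTF in hours
N = 1000               # number of drives in the system
T = 90 * 24            # desired service interval: 90 days (assumed)
p = 4.0                # ON period per power cycle, in hours (assumed)
CSS_MAX = 50_000       # allowed contact start-stops over the drive's life (assumed)

f_star = required_drive_mttf(T, N)
R = f / f_star         # duty ratio; must be below 1 for power cycling to help
off_hours = p / R      # OFF period per cycle, per the relation in the text
css = f / (p + p / R)  # number of power-cycle events over the drive's life
print(f"f* = {f_star:,.0f} h  R = {R:.2f}  OFF = {off_hours:.1f} h  CSS = {css:,.0f}")
assert R < 1 and css < CSS_MAX   # p must be chosen so that CSSmax is never exceeded
```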
  • Efficient Data Protection Scheme for Near Online (NOL) System [0100]
  • In one embodiment, a new RAID variant is implemented in order to meet the needs of the present power-managed system. To meet the serviceability requirement of the system, the power duty cycle R of the drives will be less than 100% and may be well below 50%. Consequently, when a data volume is written to a RAID volume in a shelf, not all of the drives in the RAID set can be powered up (ON). The RAID variant disclosed herein is designed to provide the following features. [0101]
  • First, this scheme is designed to provide adequate parity protection. Further, it is designed to ensure that CSS thresholds imposed by serviceability needs are not violated. Further, the RAID striping parameters are designed to meet the needs of the workload patterns, the bandwidth to be supported at the rack level, and access time. The time to access the first byte must also be much better than tape or sequential media. The scheme is also designed to provide parity based data protection and disk sparing with low overhead. [0102]
  • There are a number of problems that have to be addressed in the data protection scheme. For instance, failure of a disk during a write (because of the increased probability of a disk failure due to the large number of drives in the system) can lead to an I/O transaction not being completed. Means to ensure data integrity and avoid loss of data during a write should therefore be designed into the scheme. Further, data protection requires RAID redundancy or parity protection. RAID operations, however, normally require all drives to be powered ON, since data and parity are written on multiple drives. Further, using RAID protection and disk sparing typically leads to high disk space overhead that potentially reduces effective capacity. Still further, power cycling increases the number of contact start stops (CSSs), so CSS failure rates increase, possibly by 4 times or more. [0103]
  • In one embodiment, there are 3 types of drives in each shelf: data and parity drives that are power cycled per schedule or by read/write activity; spare drives that are used to migrate data in the event of drive failures; and metadata drives that maintain the state and configuration of any given RAID set. A metadata drive contains metadata for all I/O operations and disk drive operational transitions (power up, power down, sparing, etc.). The data that resides on this volume is organized such that it provides information on the data on the set of disk drives, and also caches data that is to be written or read from drives that are not yet powered on. Thus, the metadata volume plays an important role in disk management, I/O performance, and fault tolerance. [0104]
  • The RAID variant used in the present system “serializes” writes to the smallest subset of disks in the RAID set, while ensuring that CSS limits are not exceeded and that the write I/O performance does not suffer in access time and data rate. [0105]
  • Approach to RAID Variant [0106]
  • In applying data protection techniques, there are multiple states in which the set of drives and the data can reside. In one embodiment, the following states are used. Initialize—in this state, a volume has been allocated, but no data has been written to the corresponding disks, except for possible file metadata. Normal—in this state, a volume has valid data residing within the corresponding set of disk drives. This includes volumes for which I/O operations have resulted in the transferring of data. Data redundancy—in this state, a volume has been previously degraded and is in the process of restoring data redundancy throughout the volume. Sparing—in this state, a disk drive within a set is replaced. [0107]
  • Assumptions [0108]
  • When developing techniques for data protection, there are tradeoffs that have to be made based on the technique that is selected. Two assumptions are made when considering tradeoffs. The first assumption is that this data storage system is not to achieve or approach the I/O performance of an enterprise online storage system. In other words, the system is not designed for high I/O transactions, but for reliability. The second assumption is that the I/O workload usage for this data storage is typically large sequential writes and medium to large sequential reads. [0109]
  • Set of Disk Drives Initialized [0110]
  • An initialized set of disk drives consists of a mapped organization of data in which a single disk drive failure will not result in a loss of data. For this technique, all disk drives are initialized to a value of 0. [0111]
  • The presence of “zero-initialized” disk drives is used as the basis for creating a “rolling parity” update. For instance, referring to FIG. 5, in a set of 4 disk drives, 201-204, all drives (3 data and 1 parity) are initialized to “0”. (It should be noted that the disk drives are arranged horizontally in the figure—each vertically aligned column represents a single disk at different points in time.) The result of the XOR computation denotes the result of the content of the parity drive (0⊕0⊕0=0). If data having a value of “5” is written to the first disk, 201, then the parity written to parity disk 204 would represent a “5” (5⊕0⊕0=5). If the next data disk (disk 202) were written with a value of “A”, then the parity would be represented as “F” (5⊕A⊕0=F). It should be noted that, while the parity disk contains a value equal to the XOR'ing of all three data disks, it is not necessary to power on all of the disks to generate the correct parity. Instead, the old parity (“5”) is simply XOR'ed with the newly written data (“A”) to generate the new parity (“F”). Thus, it is not necessary to XOR out the old data on disk 202. [0112]
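  • A minimal sketch of this rolling parity update (not from the patent text) is shown below; drive contents are modeled as single hex values for brevity.

```python
# Rolling parity on zero-initialized drives: only the newly written data drive
# and the parity drive need to be powered on for each update.

data = [0x0, 0x0, 0x0]   # three data drives, zero-initialized
parity = 0x0             # parity drive, zero-initialized

def write(drive_index: int, value: int) -> None:
    """Write a value to one data drive and fold it into the parity.
    Because the drive previously held 0, XOR'ing the new value into the old
    parity gives the same result as XOR'ing all three data drives together."""
    global parity
    parity ^= data[drive_index] ^ value   # old contents are 0 here
    data[drive_index] = value

write(0, 0x5)   # parity becomes 0x5
write(1, 0xA)   # parity becomes 0xF
assert parity == data[0] ^ data[1] ^ data[2] == 0xF
```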
  • Metadata Volume [0113]
  • In order to maintain the state and configuration of a given RAID set in one embodiment, there exists a “metadata volume” (MDV). This volume is a set of online, operational disk drives which may be mirrored for fault tolerance. This volume resides within the same domain as the set of disk drives. Thus, the operating environment should provide enough power, cooling, and packaging to support this volume. This volume contains metadata that is used for I/O operations and disk drive operational transitions (power up, power down, sparing, etc.). The data that resides on this volume is organized such that it contains copies of subsets of the data on the set of disk drives. [0114]
  • In a preferred implementation, a metadata volume is located within each shelf corresponding to metadata for all data volumes resident on the disks in the shelf. Referring to FIGS. 6 and 7, the data content of a metadata volume is illustrated. This volume contains all the metadata for the shelf, RAID, disk and enclosure. There also exists metadata for the rack controller. This metadata is used to determine the correct system configuration between the rack controller and disk shelf. [0115]
  • In one embodiment, the metadata volume contains shelf attributes, such as the total number of drives, drive spares and unused data; RAID set attributes and memberships; drive attributes, such as the serial number, hardware revisions and firmware revisions; and the volume cache, including the read cache and write cache. [0116]
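  • As a purely illustrative sketch, the metadata categories listed above could be represented as a record along the following lines; the field names and example values are assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ShelfMetadata:
    """Hypothetical layout of the per-shelf metadata described above."""
    total_drives: int                        # shelf attributes
    spare_drives: int
    unused_capacity_blocks: int
    raid_sets: Dict[str, List[int]]          # RAID set memberships: set name -> member drive IDs
    drive_serials: Dict[int, str]            # drive attributes (serial number)
    drive_firmware: Dict[int, str]           # drive attributes (firmware revision)
    read_cache: Dict[str, bytes] = field(default_factory=dict)    # volume read cache (VRC)
    write_cache: Dict[str, bytes] = field(default_factory=dict)   # volume write cache (VWC)

shelf = ShelfMetadata(total_drives=112, spare_drives=4, unused_capacity_blocks=0,
                      raid_sets={"rs0": [0, 1, 2, 3, 4]},
                      drive_serials={0: "SN-0001"}, drive_firmware={0: "FW-1.0"})
```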
  • Volume Configurations [0117]
  • In one embodiment, the metadata volume is a set of mirrored disk drives. The minimum number of the mirrored drives in this embodiment is 2. The number of disk drives in the metadata volume can be configured to match the level of protection requested by the user. The number of disks cannot exceed the number of disk controllers. In order to provide the highest level of fault tolerance within a disk shelf, the metadata volume is mirrored across each disk controller. This eliminates the possibility of a single disk controller disabling the Shelf Controller. [0118]
  • In order to provide the best performance of the metadata volume, dynamic reconfiguration is enabled to determine the best disk controllers on which to have the disk drives operational. Also, in the event of a metadata volume disk failure, the first unallocated disk drive within a disk shelf will be used. If there are no more unallocated disk drives, the first allocated spare disk drive will be used. If there are no more disk drives available, the shelf controller will remain in a stalled state until the metadata volume failure has been addressed. [0119]
  • Volume Layout [0120]
  • The layout of the metadata volume is designed to provide persistent data and state of the disk shelf. This data is used for shelf configuring, RAID set configuring, volume configuring, and disk configuring. This persistent metadata is updated and utilized during all phases of the disk shelf (Initialization, Normal, Reconstructing, Service, etc.). [0121]
  • The metadata volume data is used to communicate status and configuration data to the rack controller. For instance, the metadata may include “health” information for each disk drive (i.e., information on how long the disk drive has been in service, how many times it has been powered on and off, and other factors that may affect its reliability). If the health information for a particular disk drive indicates that the drive should be replaced, the system may begin copying the data on the disk drive to another drive in case the first drive fails, or it may simply provide a notification that the drive should be replaced at the next normal service interval. The metadata volume data also has a designated volume-cache area for each of the volumes. In the event that a volume is offline, the data stored in the metadata volume for the offline volume can be used while the volume comes online. This provides, via a request from the rack controller, a window of 10-12 seconds (or whatever time is necessary to power on the corresponding drives) during which write data is cached while the drives of the offline volume are being powered up. After the drives are powered up and the volume is online, the cached data is written to the volume. [0122]
  • Shelf Initializations [0123]
  • At power-on/reset of the disk shelf, all data is read from the metadata volume. [0124]
  • This data is used to bring the disk shelf to an operational mode. Once the disk shelf has completed the initialization, it will wait for the rack controller to initiate the rack controller initialization process. [0125]
  • Volume Operations [0126]
  • Once the disk shelf is in an operational mode, each volume is synchronized with the metadata volume. Each volume will have its associated set of metadata on the disk drive. This is needed in the event of a disastrous metadata volume failure. [0127]
  • Read Cache Operations [0128]
  • The metadata volume has reserved space for each volume. Within the reserved space of the metadata volume resides an allocated volume read cache (VRC). This read cache is designed to alleviate the spin-up and seek time of a disk drive once power is applied. The VRC replicates the initial portion of each volume. The size of the data replicated in the VRC will depend on the performance desired and the environmental conditions. Therefore, in the event that an I/O READ request is given to an offline volume, the data can be sourced from the VRC. Care must be taken to ensure that this data is coherent and consistent with the associated volume. [0129]
  • Write Cache Operations [0130]
  • As noted above, the metadata volume has reserved space for each volume. Within the reserved space of the metadata volume resides an allocated volume write cache (VWC). This write cache is designed to alleviate the spin-up and seek time of a disk drive once power is applied. The VWC has a portion of the initial data, e.g., 512 MB, replicated for each volume. Therefore, in the event that an I/O write request is given to an offline volume, the data can be temporarily stored in the VWC. Again, care must be taken to ensure that this data is coherent and consistent with the associated volume. [0131]
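  • The following sketch (not from the patent text) illustrates one way the VRC and VWC could be used to hide spin-up latency for an offline volume; the class and field names, and the cached-region size, are assumptions.

```python
class CachedVolume:
    CACHED_PREFIX = 64 * 1024        # size of the initial region kept in the VRC (assumed)

    def __init__(self, disk_image: bytearray):
        self.disk = disk_image                               # stands in for the powered-down drives
        self.online = False
        self.vrc = bytes(disk_image[:self.CACHED_PREFIX])    # volume read cache
        self.vwc = []                                        # volume write cache (staged writes)

    def read(self, offset: int, length: int) -> bytes:
        if not self.online and offset + length <= len(self.vrc):
            return self.vrc[offset:offset + length]          # served without spinning up drives
        self._spin_up()
        return bytes(self.disk[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        if not self.online:
            self.vwc.append((offset, data))                  # cache while the drives power on
            return
        self.disk[offset:offset + len(data)] = data

    def _spin_up(self) -> None:
        self.online = True                                   # real code would wait ~10-12 seconds here
        for offset, data in self.vwc:                        # flush staged writes once online
            self.disk[offset:offset + len(data)] = data
        self.vwc.clear()

vol = CachedVolume(bytearray(b"abcdefgh" * 16))
print(vol.read(0, 4))        # b'abcd', straight from the VRC
vol.write(8, b"ZZZZ")        # staged in the VWC; flushed when the volume spins up
```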
  • Set of Disk I/O Operations [0132]
  • Referring to FIG. 8, a diagram illustrating the manner in which data is stored on a set of disks is shown. A set of disks is partitioned into “large contiguous” sets of data blocks, known as containers. Single or multiple disk volumes, which are presented to the storage user or server, can represent a container. The size of the data blocks within a container is dictated by the disk sector size, typically 512 bytes. Each container is statically allocated and addressed from 0 to x, where x is the number of data blocks minus 1. Each container can then be divided into some number of sub-containers. [0133]
  • The access to each of the containers is through a level of address indirection. The container is a contiguous set of blocks that is addressed from 0 to x. As the device is accessed, the associated disk drive must be powered and operational. As an example, container 0 is fully contained within the address space of disk drive 1. Thus, when container 0 is written or read, the only disk drive that is powered on is disk drive 1. [0134]
  • If there is a limited amount of power and cooling capacity for the system and only one disk drive can be accessed at a time, then in order to access container 2, disk drives 1 and 2 must be alternately powered, as container 2 spans both disk drives. Initially, disk drive 1 is powered. Then, disk drive 1 is powered down, and disk drive 2 is powered up. Consequently, there will be a delay for disk drive 2 to become ready for access. Thus, the access of the next set of data blocks on disk drive 2 will be delayed. This generally is not an acceptable behavior for access to a disk drive. The first segment of each disk drive and/or container is therefore cached on a separate set of active/online disk drives. In this embodiment, the data blocks for container 2 reside on the metadata volume, as illustrated in FIG. 9. [0135]
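  • A toy sketch of this address indirection (not from the patent text) is shown below; the block counts and the zero-based drive numbering are illustrative assumptions.

```python
BLOCKS_PER_DRIVE = 1000   # blocks per data drive (assumed)

def locate(container_start_block: int, logical_block: int):
    """Map a logical block within a container to (drive index, block offset on that drive)."""
    absolute = container_start_block + logical_block
    return absolute // BLOCKS_PER_DRIVE, absolute % BLOCKS_PER_DRIVE

# A container that starts near the end of one drive spills onto the next drive,
# so sequential access crosses a drive boundary and requires a power transition.
container2_start = BLOCKS_PER_DRIVE - 100      # last 100 blocks of the first drive
print(locate(container2_start, 50))            # (0, 950): still on the first drive
print(locate(container2_start, 150))           # (1, 50): the next drive must be powered up
```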
  • This technique, in which a transition between two disk drives is accomplished by powering down one disk drive and powering up the other disk drive, can be applied to more than just a single pair of disk drives. In the event that there is a need for higher bandwidth, the single drives described above can each be representative of a set of disk drives. This disk drive configuration could comprise RAID 10 or some form of data organization that would “spread” a hot spot over many disk drives (spindles). [0136]
  • Set of Disk Drives Becoming Redundant
  • Referring to FIG. 10, a diagram illustrating the use of a pair of redundant disk drives is shown. As data is allocated to a set of disk drives, there is a need for data replication. Assuming that the replication is a form of RAID (1, 4, 5, etc.), then the process of merging must keep the data coherent. This process may be done synchronously with each write operation, or it may be performed at a later time. Since not all disk drives are powered on at one time, there is additional housekeeping of the current status of a set of disk drives. This housekeeping comprises the information needed to regenerate data blocks, knowing exactly which set of disk drives or subset of disk drives is valid in restoring the data. [0137]
  • Variable RAID Set Membership [0138]
  • One significant benefit of the power-managed system described herein is that drives in a RAID set can be reused, even in the event of multiple disk drive failures. In conventional RAID systems, failure of more than one drive in a RAID set results in the need to abandon all of the drives in the RAID set, since data is striped or distributed across all of the drives in the RAID set. In the case of the power-managed system described herein, it is possible to reuse the remaining drives in a different RAID set or a RAID set of different size. This results in much greater utilization of the storage space in the total system. [0139]
  • In the event of multiple drive failures in the same RAID set, the set of member drives in the RAID set can be decreased (e.g., from six drives to four). Using the property of “zero-based” XOR parity as described above, the parity for the reduced set of drives can be calculated from the data that resides on these drives. This allows the preservation of the data on the remaining drives in the event of future drive failures. In the event that the parity drive is one of the failed drives, a new parity drive could be designated for the newly formed RAID set, and the parity information would be stored on this drive. Disk drive metadata is updated to reflect the remaining and/or new drives that now constitute the reduced or newly formed RAID set. [0140]
  • In one exemplary embodiment, a RAID set has five member drives, including four data drives and one parity drive. In the event of a failure of one data drive, the data can be reconstructed on the remaining disk drives if sufficient space is available. (If a spare is available to replace the failed drive and it is not necessary to reduce the RAID set, the data can be reconstructed on the new member drive.) In the event of a simultaneous failure of two or more data drives, the data on the non-failed drives can be retained and operations can proceed with the remaining data on the reduced RAID set, or the reduced RAID set can be re-initialized and used as a new RAID set. [0141]
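  • The parity recomputation for a reduced RAID set can be sketched as follows; the drive contents are toy illustrative values, not from the patent text.

```python
from functools import reduce
from operator import xor

# Contents of the surviving data drives after two of the original members failed
# (toy single-value drive contents).
surviving_data_drives = [0x5, 0xA, 0x3]

# Recompute parity directly over the survivors; this becomes the parity drive
# of the reduced RAID set.
new_parity = reduce(xor, surviving_data_drives, 0x0)
print(hex(new_parity))    # 0xc

# Any single surviving drive can again be rebuilt from the others plus parity:
assert surviving_data_drives[1] == new_parity ^ surviving_data_drives[0] ^ surviving_data_drives[2]
```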
  • This same principle can be applied to expand a set of disk drives. In other words, if it would be desirable to add a drive to a RAID set (e.g., increasing the set from four drives to five), this can also be accomplished in a manner similar to the reduction of the RAID set. In the event a RAID set would warrant an additional disk drive, the disk drive metadata would need to be updated to represent the membership of the new drive(s). [0142]
  • Sparing of a Set of Disk Drives [0143]
  • The sparing of a failed disk in a set of disk drives is performed for both failed data block events and failed disk drive events. In the case of failed data blocks, the affected data is temporarily regenerated. Using both the metadata volume and a ‘spare’ disk drive, the process of restoring redundancy within a set of disk drives can be made more efficient and effective. This process is matched to the powering of each of the remaining disk drives in the set of disk drives. [0144]
  • In the event of an exceeded threshold for failed data blocks, a spare disk drive is allocated as a candidate for replacement into the RAID set. Since only a limited number of drives can be powered on at one time, only the drive having the failed data blocks and the candidate drive are powered. At this point, only the known good data blocks are copied onto the corresponding address locations of the failed data blocks. Once all the known good blocks have been copied, the process to restore the failed blocks is initiated, and the entire RAID set will need to be powered on. Although the entire set of disk drives needs to be powered on, it is only for the time necessary to repair the bad blocks. After all the bad blocks have been repaired, the drives are returned to a powered-off state. [0145]
  • In the event of a failed disk drive, all disk drives in the RAID set are powered-on. The reconstruction process, discussed in the previous section, would then be initiated for the restoration of all the data on the failed disk drive. [0146]
  • RAID Automated Storage Management Features [0147]
  • The end user of the system may use it, for example, as a disk system attached directly to a server as direct attached storage (DAS) or as shared storage in a storage area network (SAN). In FIG. 11, the system is used as the backup target to the primary storage via a direct connection and then connected via a media (backup) server to a tape library. The system may be used in other ways in other embodiments. [0148]
  • In this embodiment, the system presents volume images to the servers or users of the system. However, physical volumes are not directly accessible to the end users. This is because, as described earlier, through the power managed RAID, the system hides the complexity of access to physical drives, whether they are powered on or not. The controller at the rack and the shelf level isolates the logical volume from the physical volume and drives. [0149]
  • Given this presentation of the logical view of the disk volumes, the system can rewrite, relocate or move the logical volumes to different physical locations. [0150]
  • This enables a number of volume-level functions that are described below. For instance, the system may provide independence from the disk drive type, capacity, data rates, etc. This allows migration to new media as they become available and when new technology is adopted. It also eliminates the device (disk) management administration required to cope with technology obsolescence. [0151]
  • The system may also provide automated replication for disaster recovery. The second copy of a primary volume can be independently copied to third party storage devices over the network, either local or over wide-area. Further, the device can be another disk system, another tape system, or the like. Also, the volume could be replicated to multiple sites for simultaneously creating multiple remote or local copies. [0152]
  • The system may also provide automatic incremental backup to conserve media and bandwidth. Incremental and differential changes in the storage volume can be propagated to the third or later copies. [0153]
  • The system may also provide authentication and authorization services. Access to both the physical and logical volumes and drives can be controlled by the rack and shelf controller since it is interposed between the end user of the volumes and the physical drives. [0154]
  • The system may also provide automated data revitalization. Since data on disk media can degrade over time, the system controller can refresh the volume data to different drives automatically so that the data integrity is maintained. Since the controllers have information on when disks and volumes are written, they can keep track of which disk data has to be refreshed or revitalized. [0155]
  • The system may also provide concurrent restores: multiple restores can be conducted concurrently, possibly initiated asynchronously or via policy by the controllers in the system. [0156]
  • The system may also provide unique indexing of metadata within a storage volume. By keeping metadata information on the details of the objects contained within a volume (such as within the metadata volume in a shelf), the controller can rapidly search for specific objects across volumes in the system. [0157]
  • The system may also provide other storage administration features for the management of secondary and multiple copies of volumes, such as a single view of all data to simplify and reduce the cost of managing all volume copies, automated management of the distribution of the copies of data, and auto-discovery and change detection of the primary volume that is being backed up when the system is used for creating backups. [0158]
  • A Preferred Implementation [0159]
  • Interconnect [0160]
  • The preferred interconnect system provides a means to connect 896 disk drives, configured as 112 disks per shelf and 8 shelves per rack. The internal system interconnect is designed to provide an aggregate throughput equivalent to six 2 Gb/sec Fibre Channel interfaces (1000 MB/s read or write). The external system interface is Fibre Channel. The interconnect system is optimized for the lowest cost per disk at the required throughput. FIG. 12 shows the interconnect scheme from the host (server or end user) to the end disk drives. [0161]
  • The interconnect system incorporates RAID at the shelf level to provide data reliability. The RAID controller is designed to address 112 disks, some of which may be allocated to sparing. The RAID controller spans 8 sticks of 14 disks each. The RAID set should be configured to span multiple sticks to guard against loss of any single stick controller or interconnect or loss of any single disk drive. [0162]
  • The system interconnect from shelf to stick can be configured to provide redundancy at the stick level for improved availability. [0163]
  • The stick-level interconnect is composed of a stick controller (FPGA/ASIC plus SERDES), shelf controller (FPGA/ASIC plus SERDES, external processor and memory), rack controller (FPGA/ASIC plus SERDES) and associated cables, connectors, printed circuit boards, power supplies and miscellaneous components. As an option, the SERDES and/or processor functions may be integrated into an advanced FPGA (e.g., using Xilinx Virtex II Pro). [0164]
  • Shelf and Stick Controller [0165]
  • The shelf controller and the associated 8 stick controllers are shown in FIG. 13. In this implementation, the shelf controller is connected to the rack controller (FIG. 15) via Fibre Channel interconnects. It should be noted that, in other embodiments, other types of controllers and interconnects (e.g., SCSI) may be used. [0166]
  • The shelf controller can provide different RAID level support, such as RAID 0, 1 and 5 and combinations thereof, across programmable disk RAID sets accessible via eight SATA initiator ports. The RAID functions are implemented in firmware, with acceleration provided by an XOR engine and DMA engine implemented in hardware. In this case, an XOR-equipped Intel IOP321 CPU is used. [0167]
  • The Shelf Controller RAID control unit connects to the Stick Controller via a SATA Channel Controller over the PCI-X bus. The 8 SATA outputs of the SATA Channel Controller each connect with a stick controller data/command router device (FIG. 14). Each data/command router controls the 14 SATA drives of each stick. [0168]
  • Rack Controller [0169]
  • The rack controller comprises a motherboard with a ServerWorks GC-LE chipset and four to eight PCI-X slots. In the implementation shown in FIG. 15, the PCI-X slots are populated with dual-port or quad-port 2 Gb/sec Fibre Channel PCI-X target bus adapters (TBAs). In other embodiments, other components, which employ other protocols, may be used. For example, in one embodiment, quad-port SCSI adapters using u320 connections to the shelf units may be used. [0170]
  • The benefits and advantages which may be provided by the present invention have been described above with regard to specific embodiments. These benefits and advantages, and any elements or limitations that may cause them to occur or to become more pronounced are not to be construed as critical, required, or essential features of any or all of the claims. As used herein, the terms ‘comprises,’ ‘comprising,’ or any other variations thereof, are intended to be interpreted as non-exclusively including the elements or limitations which follow those terms. Accordingly, a system, method, or other embodiment that comprises a set of elements is not limited to only those elements, and may include other elements not expressly listed or inherent to the claimed embodiment. [0171]
  • While the present invention has been described with reference to particular embodiments, it should be understood that the embodiments are illustrative and that the scope of the invention is not limited to these embodiments. Many variations, modifications, additions and improvements to the embodiments described above are possible. It is contemplated that these variations, modifications, additions and improvements fall within the scope of the invention as detailed within the following claims. [0172]

Claims (39)

What is claimed is:
1. A system comprising:
a plurality of data storage drives; and
a controller coupled to each of the data storage drives;
wherein the controller is configured to power on a first subset of the data storage drives and to power off a second subset of the data storage drives, and wherein each of the first and second subsets contains at least one of the data storage drives.
2. The system of claim 1, wherein the plurality of data storage drives comprise a single RAID set of data storage drives.
3. The system of claim 1, wherein the first subset comprises at least one data storage drive in a first RAID set and the second subset comprises at least one data storage drive in the first RAID set.
4. The system of claim 1, wherein each of the plurality of data storage drives is individually controllable to power the data storage drive on or off, independent of the remainder of the plurality of data storage drives.
5. The system of claim 1, wherein the data storage drives comprise hard disk drives.
6. The system of claim 5, wherein the system comprises a RAID system.
7. The system of claim 5, wherein the system comprises multiple shelves, wherein each shelf comprises multiple subsets of data storage drives.
8. The system of claim 7, wherein the system comprises one or more RAID sets of data storage drives and wherein each of the one or more RAID sets of data storage drives comprises data storage drives from at least two of the shelves.
9. The system of claim 7, wherein the controller comprises a rack controller connected to a plurality of shelf controllers, wherein each shelf controller is configured to control a set of data storage drives on a corresponding shelf.
10. The system of claim 7, wherein the data storage drives are contained in a single physical enclosure.
11. The system of claim 1, wherein the data storage drives comprise optical disk drives.
12. The system of claim 1, further comprising one or more parity drives, each of which is associated with a corresponding RAID set of the plurality of data storage drives.
13. The system of claim 12, wherein the system is configured to compute parity information for the RAID set by XOR'ing an old parity value with a value currently written to one of the data storage drives in the RAID set to generate a current parity value, and storing the current parity value on the parity drive.
14. The system of claim 12, wherein the one or more parity drives are always powered on.
15. The system of claim 1, further comprising one or more metadata drives, each of which is associated with a corresponding group of the plurality of data storage drives.
16. The system of claim 15, wherein the system is configured to store metadata information on the metadata drive, wherein the metadata comprises a mapping of logical addresses for the system to physical addresses for the corresponding group of data storage drives.
17. The system of claim 15, wherein the system is configured to store metadata information on the metadata drive, wherein the metadata comprises health information for the corresponding group of data storage drives.
18. The system of claim 15, wherein the system is configured to store metadata information on the metadata drive, wherein the metadata comprises data which duplicates a portion of each of the corresponding group of data storage drives.
19. The system of claim 15, wherein the one or more metadata drives are always powered on.
20. The system of claim 1, wherein the first subset comprises no more than a predetermined fraction of the plurality of data storage drives.
21. The system of claim 1, wherein the predetermined fraction is a function of a failure rate of individual data storage drives, a minimum required service period, and a total number of data storage drives in the system.
22. The system of claim 21, wherein the predetermined fraction is equal to f/{1−(1−1/T)**1/N}, where f is a mean time between failures of an individual data storage drive, T is a minimum required service period, and N is the total number of data storage drives in the system.
23. A method comprising:
providing a data storage system having a plurality of data storage drives;
performing data accesses to the data storage system, wherein the data accesses involve accesses to a first subset of the data storage drives, wherein the first subset of the data storage drives is powered on; and
powering down a second subset of the data storage drives, wherein the data accesses do not involve accesses to the second subset of the data storage drives.
24. The method of claim 23, wherein the plurality of data storage drives comprise a single RAID set of data storage drives.
25. The method of claim 23, wherein the first subset comprises at least one data storage drive in a first RAID set and the second subset comprises at least one data storage drive in the first RAID set.
26. The method of claim 23, wherein each of the plurality of data storage drives is individually controlled to power the data storage drive on or off, independent of the remainder of the plurality of data storage drives.
27. The method of claim 23, wherein performing data accesses to the data storage system comprises accessing a block of storage that spans a first data storage drive and a second data storage drive, wherein as the first data storage drive is accessed, the first data storage drive is powered on and the second data storage drive is powered off, and as the second data storage drive is accessed, the second data storage drive is powered on and the first data storage drive is powered off.
28. The method of claim 27, further comprising, if the data accesses comprise writes, caching data for the second data storage drive as the second data storage drive is transitioned from a powered off state to a powered on state.
29. The method of claim 27, further comprising, if the data accesses comprise reads, retrieving data corresponding to the second data storage drive from a metadata volume as the second data storage drive is transitioned from a powered off state to a powered on state.
30. The method of claim 23, wherein performing data accesses to the data storage system comprises accessing one or more data storage drives and corresponding parity drives.
31. The method of claim 30, further comprising computing parity information for a RAID set by XOR'ing an old parity value with a value currently written to one of the data storage drives in the RAID set to generate a current parity value, and storing the current parity value on the parity drive.
32. The method of claim 23, wherein performing data accesses to the data storage system comprises accessing one or more data storage drives and corresponding metadata drives.
33. The method of claim 32, wherein accessing the metadata drives comprises storing metadata information on the metadata drive, wherein the metadata comprises health information for the corresponding group of data storage drives.
34. The method of claim 32, wherein accessing the metadata drives comprises storing metadata information on the metadata drive, wherein the metadata comprises data which duplicates a portion of each of the corresponding group of data storage drives.
35. The method of claim 32, further comprising refreshing data on the one or more data storage drives based on information stored on the metadata drive.
36. The method of claim 23, wherein the first subset comprises no more than a predetermined fraction of the plurality of data storage drives.
37. The method of claim 36, wherein the predetermined fraction is a function of a failure rate of individual data storage drives, a minimum required service period, and a total number of data storage drives in the system.
38. The method of claim 37, wherein the predetermined fraction is equal to f/{1−(1−1/T)^(1/N)}, where f is a mean time between failures of an individual data storage drive, T is a minimum required service period, and N is the total number of data storage drives in the system.
39. The method of claim 23, further comprising replacing one or more data storage drives that are in the second subset.
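
For illustration only (not part of the claims): the method of claims 23 through 28 powers on only the subset of drives involved in current accesses, powers down the remaining drives, and caches writes while a target drive transitions from a powered off state to a powered on state. The sketch below is a minimal, hypothetical model of that behavior; the class names, spin-up handling, and cache structure are assumptions for readability and do not reflect the controller described in the specification.

    # Minimal, hypothetical sketch of the power management of claims 23 through 28.
    # The drive model, spin-up handling, and write cache below are illustrative
    # assumptions, not the controller design disclosed in the specification.

    from enum import Enum
    from typing import Dict, Tuple


    class PowerState(Enum):
        OFF = 0
        SPINNING_UP = 1
        ON = 2


    class Drive:
        def __init__(self, drive_id: int) -> None:
            self.drive_id = drive_id
            self.state = PowerState.OFF
            self.blocks: Dict[int, bytes] = {}


    class PowerManagedArray:
        """Keeps only the drives involved in current accesses powered on (claim 23)."""

        def __init__(self, num_drives: int) -> None:
            self.drives = [Drive(i) for i in range(num_drives)]
            # (drive_id, block) -> data cached while the target drive spins up (claim 28).
            self.write_cache: Dict[Tuple[int, int], bytes] = {}

        def _power_on(self, drive: Drive) -> None:
            if drive.state is PowerState.OFF:
                # Spinning a disk up takes seconds; writes arriving now are cached.
                drive.state = PowerState.SPINNING_UP

        def _power_down_idle(self, active_ids: set) -> None:
            # Second subset: powered-on drives not involved in the access are powered down.
            for drive in self.drives:
                if drive.drive_id not in active_ids and drive.state is PowerState.ON:
                    drive.state = PowerState.OFF

        def write(self, drive_id: int, block: int, data: bytes) -> None:
            drive = self.drives[drive_id]
            self._power_on(drive)
            self._power_down_idle({drive_id})
            if drive.state is PowerState.SPINNING_UP:
                self.write_cache[(drive_id, block)] = data
            else:
                drive.blocks[block] = data

        def on_spin_up_complete(self, drive_id: int) -> None:
            drive = self.drives[drive_id]
            drive.state = PowerState.ON
            # Flush writes that were cached during the power transition (claim 28).
            for key in [k for k in self.write_cache if k[0] == drive_id]:
                drive.blocks[key[1]] = self.write_cache.pop(key)

In this model, a write to a powered-down drive triggers a spin-up, idle powered-on drives not involved in the access are turned off, and the cached writes are applied to the target drive once its transition to the powered on state completes.
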
US10/607,932 2002-09-03 2003-06-26 Method and apparatus for power-efficient high-capacity scalable storage system Active - Reinstated 2024-06-30 US7035972B2 (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
US10/607,932 US7035972B2 (en) 2002-09-03 2003-06-26 Method and apparatus for power-efficient high-capacity scalable storage system
US11/076,447 US20050210304A1 (en) 2003-06-26 2005-03-08 Method and apparatus for power-efficient high-capacity scalable storage system
US11/108,077 US7210004B2 (en) 2003-06-26 2005-04-14 Method and system for background processing of data in a storage system
US11/322,787 US7330931B2 (en) 2003-06-26 2005-12-30 Method and system for accessing auxiliary data in power-efficient high-capacity scalable storage system
US11/351,979 US7210005B2 (en) 2002-09-03 2006-02-09 Method and apparatus for power-efficient high-capacity scalable storage system
US11/716,338 US7380060B2 (en) 2002-09-03 2007-03-09 Background processing of data in a storage system
US11/686,268 US20070220316A1 (en) 2002-09-03 2007-03-14 Method and Apparatus for Power-Efficient High-Capacity Scalable Storage System
US11/953,712 US20080114948A1 (en) 2003-06-26 2007-12-10 Method and system for accessing auxiliary data in power-efficient high-capacity scalable storage

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US40729902P 2002-09-03 2002-09-03
US40998002P 2002-09-12 2002-09-12
US10/607,932 US7035972B2 (en) 2002-09-03 2003-06-26 Method and apparatus for power-efficient high-capacity scalable storage system

Related Child Applications (4)

Application Number Title Priority Date Filing Date
US11/076,447 Continuation-In-Part US20050210304A1 (en) 2003-06-26 2005-03-08 Method and apparatus for power-efficient high-capacity scalable storage system
US11/108,077 Continuation-In-Part US7210004B2 (en) 2002-09-03 2005-04-14 Method and system for background processing of data in a storage system
US11/322,787 Continuation-In-Part US7330931B2 (en) 2003-06-26 2005-12-30 Method and system for accessing auxiliary data in power-efficient high-capacity scalable storage system
US11/351,979 Continuation US7210005B2 (en) 2002-09-03 2006-02-09 Method and apparatus for power-efficient high-capacity scalable storage system

Publications (3)

Publication Number Publication Date
US20040054939A1 true US20040054939A1 (en) 2004-03-18
US20050268119A9 US20050268119A9 (en) 2005-12-01
US7035972B2 US7035972B2 (en) 2006-04-25

Family

ID=31997887

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/607,932 Active - Reinstated 2024-06-30 US7035972B2 (en) 2002-09-03 2003-06-26 Method and apparatus for power-efficient high-capacity scalable storage system

Country Status (1)

Country Link
US (1) US7035972B2 (en)

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040260967A1 (en) * 2003-06-05 2004-12-23 Copan Systems, Inc. Method and apparatus for efficient fault-tolerant disk drive replacement in raid storage systems
US20050111249A1 (en) * 2003-11-26 2005-05-26 Hitachi, Ltd. Disk array optimizing the drive operation time
US20050144383A1 (en) * 2003-12-25 2005-06-30 Seiichi Higaki Memory control device and method for controlling the same
US20050259345A1 (en) * 1999-04-05 2005-11-24 Kazuo Hakamata Disk array unit
US20060107099A1 (en) * 2004-10-28 2006-05-18 Nec Laboratories America, Inc. System and Method for Redundant Storage with Improved Energy Consumption
US20060143381A1 (en) * 2003-06-18 2006-06-29 Akihiro Mori System and method for accessing an offline storage unit through an online storage unit
WO2006086066A2 (en) * 2005-02-04 2006-08-17 Dot Hill Systems Corp. Storage device method and apparatus
US20060255409A1 (en) * 2004-02-04 2006-11-16 Seiki Morita Anomaly notification control in disk array
US20060279869A1 (en) * 2005-06-08 2006-12-14 Akira Yamamoto Storage system controlling power supply module and fan
US20070061510A1 (en) * 2005-09-13 2007-03-15 Atsuya Kumagai Storage unit and disk control method
US20070061512A1 (en) * 2005-09-13 2007-03-15 Hitachi, Ltd. Management apparatus, management method and storage management system
US20070061509A1 (en) * 2005-09-09 2007-03-15 Vikas Ahluwalia Power management in a distributed file system
US20070067559A1 (en) * 2005-09-22 2007-03-22 Akira Fujibayashi Storage control apparatus, data management system and data management method
US20070067560A1 (en) * 2005-09-20 2007-03-22 Tomoya Anzai System for controlling spinning of disk
US20070073970A1 (en) * 2004-01-16 2007-03-29 Hitachi, Ltd. Disk array apparatus and disk array apparatus controlling method
EP1770499A1 (en) * 2005-09-22 2007-04-04 Hitachi, Ltd. Storage control apparatus, data management system and data management method
US20070079088A1 (en) * 2005-10-05 2007-04-05 Akira Deguchi Information processing system, control method for information processing system, and storage system
US20070079156A1 (en) * 2005-09-30 2007-04-05 Kazuhisa Fujimoto Computer apparatus, storage apparatus, system management apparatus, and hard disk unit power supply controlling method
US20070130424A1 (en) * 2005-12-02 2007-06-07 Hitachi, Ltd. Storage system and capacity allocation method therefor
US20070143542A1 (en) * 2005-12-16 2007-06-21 Hitachi, Ltd. Storage controller, and method of controlling storage controller
US20070143559A1 (en) * 2005-12-20 2007-06-21 Yuichi Yagawa Apparatus, system and method incorporating virtualization for data storage
WO2007079056A2 (en) 2005-12-30 2007-07-12 Copan Systems, Inc. Method and system for accessing auxiliary data in power-efficient high-capacity scalable storage system
US20070208921A1 (en) * 2006-03-03 2007-09-06 Hitachi, Ltd. Storage system and control method for the same
US20070244964A1 (en) * 2001-12-19 2007-10-18 Challenger James R H Method and system for caching message fragments using an expansion attribute in a fragment link tag
EP1887472A2 (en) * 2006-08-04 2008-02-13 Hitachi, Ltd. Storage system for suppressing failures of storage media group
US20080091876A1 (en) * 2006-10-11 2008-04-17 Akira Fujibayashi Storage apparatus and control method thereof
US20080104280A1 (en) * 2006-10-30 2008-05-01 Biskeborn Robert G On Demand Storage Array
US20080126701A1 (en) * 2006-11-28 2008-05-29 Uehara Go Storage system comprising both power saving and diagnostic functions
US20080256307A1 (en) * 2006-10-17 2008-10-16 Kazuhisa Fujimoto Storage subsystem, storage system, and method of controlling power supply to the storage subsystem
US7472211B2 (en) 2006-07-28 2008-12-30 International Business Machines Corporation Blade server switch module using out-of-band signaling to detect the physical location of an active drive enclosure device
US20090031150A1 (en) * 2007-07-24 2009-01-29 Hitachi, Ltd. Storage controller and method for controlling the same
JP2009020858A (en) * 2007-07-10 2009-01-29 Hitachi Ltd Storage of high power efficiency using data de-duplication
US20090055520A1 (en) * 2007-08-23 2009-02-26 Shunya Tabata Method for scheduling of storage devices
US20090210732A1 (en) * 2008-02-19 2009-08-20 Canon Kabushiki Kaisha Information processing apparatus and method of controlling the same
US20090217067A1 (en) * 2008-02-27 2009-08-27 Dell Products L.P. Systems and Methods for Reducing Power Consumption in a Redundant Storage Array
US20090254702A1 (en) * 2007-12-26 2009-10-08 Fujitsu Limited Recording medium storing data allocation control program, data allocation control device, data allocation control method, and multi-node storage-system
US20090271645A1 (en) * 2008-04-24 2009-10-29 Hitachi, Ltd. Management apparatus, storage apparatus and information processing system
WO2009135530A1 (en) * 2008-05-08 2009-11-12 Idcs System and method for sequential recording and archiving large volumes of video data
US7627610B2 (en) 2005-11-21 2009-12-01 Hitachi, Ltd. Computer system and method of reproducing data for the system
US20100050007A1 (en) * 2008-08-20 2010-02-25 Shan Jiang Solid state disk and method of managing power supply thereof and terminal including the same
US20100122050A1 (en) * 2008-11-13 2010-05-13 International Business Machines Corporation Virtual storage migration technique to minimize spinning disks
US20100149581A1 (en) * 2008-12-11 2010-06-17 Canon Kabushiki Kaisha Information processing system, information processing apparatus, and information processing method
US20110087912A1 (en) * 2009-10-08 2011-04-14 Bridgette, Inc. Dba Cutting Edge Networked Storage Power saving archive system
US20110302137A1 (en) * 2010-06-08 2011-12-08 Dell Products L.P. Systems and methods for improving storage efficiency in an information handling system
US20110320712A1 (en) * 2009-03-12 2011-12-29 Chengdu Huawei Symantec Technologies Co., Ltd. Method and apparatus for controlling state of storage device and storage device
US20120158652A1 (en) * 2010-12-15 2012-06-21 Pavan Ps System and method for ensuring consistency in raid storage array metadata
WO2013112141A1 (en) * 2012-01-25 2013-08-01 Hewlett-Packard Development Company, L.P. Storage system device management
KR101365562B1 (en) * 2011-01-28 2014-02-21 주식회사 디케이아이테크놀로지 Multiple data-storage-system for having power-saving function using meta-data and method for power-saving thereof
US8677154B2 (en) 2011-10-31 2014-03-18 International Business Machines Corporation Protecting sensitive data in a transmission
US8788913B1 (en) * 2011-12-30 2014-07-22 Emc Corporation Selection of erasure code parameters for no data repair
US8788755B2 (en) 2010-07-01 2014-07-22 Infinidat Ltd. Mass data storage system and method of operating thereof
US20140252829A1 (en) * 2013-03-05 2014-09-11 Wonderland Nurserygoods Company Limited Child Safety Seat Assembly
WO2014209915A1 (en) * 2013-06-28 2014-12-31 Western Digital Technologies, Inc. Dynamic raid controller power management
US9176544B2 (en) 2010-06-16 2015-11-03 Hewlett-Packard Development Company, L.P. Computer racks
US9189407B2 (en) 2010-07-01 2015-11-17 Infinidat Ltd. Pre-fetching in a storage system
US20180217772A1 (en) * 2017-01-31 2018-08-02 NE One LLC Controlled access to storage

Families Citing this family (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6721672B2 (en) 2002-01-02 2004-04-13 American Power Conversion Method and apparatus for preventing overloads of power distribution networks
US7210004B2 (en) * 2003-06-26 2007-04-24 Copan Systems Method and system for background processing of data in a storage system
US7210005B2 (en) * 2002-09-03 2007-04-24 Copan Systems, Inc. Method and apparatus for power-efficient high-capacity scalable storage system
JP4322031B2 (en) * 2003-03-27 2009-08-26 株式会社日立製作所 Storage device
US7278053B2 (en) * 2003-05-06 2007-10-02 International Business Machines Corporation Self healing storage system
US20050210304A1 (en) * 2003-06-26 2005-09-22 Copan Systems Method and apparatus for power-efficient high-capacity scalable storage system
US7111182B2 (en) * 2003-08-29 2006-09-19 Texas Instruments Incorporated Thread scheduling mechanisms for processor resource power management
US7484050B2 (en) * 2003-09-08 2009-01-27 Copan Systems Inc. High-density storage systems using hierarchical interconnect
US20060090098A1 (en) * 2003-09-11 2006-04-27 Copan Systems, Inc. Proactive data reliability in a power-managed storage system
US7373559B2 (en) * 2003-09-11 2008-05-13 Copan Systems, Inc. Method and system for proactive drive replacement for high availability storage systems
US7234074B2 (en) * 2003-12-17 2007-06-19 International Business Machines Corporation Multiple disk data storage system for reducing power consumption
US7216244B2 (en) * 2004-02-25 2007-05-08 Hitachi, Ltd. Data storage system with redundant storage media and method therefor
WO2006057671A1 (en) * 2004-07-06 2006-06-01 Prostor Systems, Inc. Data replication systems and methods
US8127068B2 (en) * 2004-07-06 2012-02-28 Tandberg Data Holdings S.A.R.L. Removable cartridge storage devices and methods
US8019908B2 (en) * 2004-07-06 2011-09-13 Tandberg Data Holdings S.A.R.L. Data replication systems and methods
US7434090B2 (en) * 2004-09-30 2008-10-07 Copan System, Inc. Method and apparatus for just in time RAID spare drive pool management
JP4688514B2 (en) * 2005-02-14 2011-05-25 株式会社日立製作所 Storage controller
US7698334B2 (en) * 2005-04-29 2010-04-13 Netapp, Inc. System and method for multi-tiered meta-data caching and distribution in a clustered computer environment
US7962689B1 (en) * 2005-04-29 2011-06-14 Netapp, Inc. System and method for performing transactional processing in a striped volume set
US20070079086A1 (en) * 2005-09-29 2007-04-05 Copan Systems, Inc. System for archival storage of data
US7386666B1 (en) * 2005-09-30 2008-06-10 Emc Corporation Global sparing of storage capacity across multiple storage arrays
US20080065703A1 (en) * 2006-02-22 2008-03-13 Copan Systems, Inc. Configurable views of archived data storage
US7516348B1 (en) 2006-02-24 2009-04-07 Emc Corporation Selective power management of disk drives during semi-idle time in order to save power and increase drive life span
JP2007293448A (en) * 2006-04-21 2007-11-08 Hitachi Ltd Storage system and its power supply control method
US7849261B2 (en) * 2006-06-29 2010-12-07 Seagate Technology Llc Temperature control to reduce cascade failures in a multi-device array
US7661005B2 (en) * 2006-06-30 2010-02-09 Seagate Technology Llc Individual storage device power control in a multi-device array
US7809885B2 (en) * 2006-09-29 2010-10-05 Voom Technologies, Inc. Scalable hard-drive replicator
US20080154920A1 (en) * 2006-12-22 2008-06-26 Copan Systems, Inc. Method and system for managing web content linked in a hierarchy
US7793042B2 (en) * 2007-01-05 2010-09-07 Dell Products, Lp System, method, and module for reducing power states for storage devices and associated logical volumes
US8312214B1 (en) 2007-03-28 2012-11-13 Netapp, Inc. System and method for pausing disk drives in an aggregate
JP4571958B2 (en) * 2007-03-30 2010-10-27 富士通株式会社 Power saving device by controller or disk control
US7702853B2 (en) * 2007-05-04 2010-04-20 International Business Machines Corporation Data storage system with power management control and method
US7870409B2 (en) 2007-09-26 2011-01-11 Hitachi, Ltd. Power efficient data storage with data de-duplication
US8068433B2 (en) * 2007-11-26 2011-11-29 Microsoft Corporation Low power operation of networked devices
US7984259B1 (en) 2007-12-17 2011-07-19 Netapp, Inc. Reducing load imbalance in a storage system
US9619162B1 (en) * 2007-12-21 2017-04-11 EMC IP Holding Company LLC Selecting a data protection strategy for a content unit on a storage system
US20090177836A1 (en) * 2008-01-09 2009-07-09 Yasuyuki Mimatsu Methods and apparatuses for managing data in a computer storage system
JP5379988B2 (en) * 2008-03-28 2013-12-25 株式会社日立製作所 Storage system
US8074014B2 (en) * 2008-03-31 2011-12-06 Microsoft Corporation Storage systems using write off-loading
US20090276647A1 (en) * 2008-04-30 2009-11-05 Intel Corporation Storage device power consumption state
JP4988653B2 (en) * 2008-06-13 2012-08-01 株式会社日立製作所 Disk array recording apparatus and recording control method therefor
JP5207367B2 (en) * 2008-06-16 2013-06-12 株式会社日立製作所 Computer system for reducing power consumption of storage system and control method thereof
US8510577B2 (en) * 2008-07-28 2013-08-13 Microsoft Corporation Reducing power consumption by offloading applications
JP2010049634A (en) * 2008-08-25 2010-03-04 Hitachi Ltd Storage system, and data migration method in storage system
JP2010061702A (en) * 2008-09-01 2010-03-18 Hitachi Ltd Information recording/reproducing apparatus, information recording/reproducing system, information processing apparatus, and information reproducing apparatus
JP5130169B2 (en) * 2008-09-17 2013-01-30 株式会社日立製作所 Method for allocating physical volume area to virtualized volume and storage device
US8448004B2 (en) 2008-10-27 2013-05-21 Netapp, Inc. Power savings using dynamic storage cluster membership
CN101729272B (en) * 2008-10-27 2013-01-23 华为技术有限公司 Method, system and device for content distribution, and media server
WO2010055549A1 (en) * 2008-11-17 2010-05-20 Hitachi, Ltd. Storage control apparatus and storage control method
US8185754B2 (en) * 2009-02-25 2012-05-22 International Business Machines Corporation Time-based storage access and method of power savings and improved utilization thereof
US8356139B2 (en) * 2009-03-24 2013-01-15 Hitachi, Ltd. Storage system for maintaining hard disk reliability
TWI431464B (en) * 2009-04-29 2014-03-21 Micro Star Int Co Ltd Computer system with power control and power control method
US9619163B2 (en) 2009-06-26 2017-04-11 International Business Machines Corporation Maintaining access times in storage systems employing power saving techniques
US8201001B2 (en) * 2009-08-04 2012-06-12 Lsi Corporation Method for optimizing performance and power usage in an archival storage system by utilizing massive array of independent disks (MAID) techniques and controlled replication under scalable hashing (CRUSH)
US8224993B1 (en) * 2009-12-07 2012-07-17 Amazon Technologies, Inc. Managing power consumption in a data center
US8370672B2 (en) * 2010-02-26 2013-02-05 Microsoft Corporation Reducing power consumption of distributed storage systems
KR20110106594A (en) * 2010-03-23 2011-09-29 주식회사 히타치엘지 데이터 스토리지 코리아 Apparatus and method for setting a parity drive in optical disc drive archive system
US9720606B2 (en) 2010-10-26 2017-08-01 Avago Technologies General Ip (Singapore) Pte. Ltd. Methods and structure for online migration of data in storage systems comprising a plurality of storage devices
US8656454B2 (en) 2010-12-01 2014-02-18 Microsoft Corporation Data store including a file location attribute
US9594421B2 (en) 2011-03-08 2017-03-14 Xyratex Technology Limited Power management in a multi-device storage array
US20120254501A1 (en) * 2011-03-28 2012-10-04 Byungcheol Cho System architecture based on flash memory
US20120254500A1 (en) * 2011-03-28 2012-10-04 Byungcheol Cho System architecture based on ddr memory
US9384199B2 (en) 2011-03-31 2016-07-05 Microsoft Technology Licensing, Llc Distributed file system
CN103793035A (en) * 2012-10-30 2014-05-14 鸿富锦精密工业(深圳)有限公司 Hard disk control circuit
US9213611B2 (en) 2013-07-24 2015-12-15 Western Digital Technologies, Inc. Automatic raid mirroring when adding a second boot drive
JP6213130B2 (en) * 2013-10-09 2017-10-18 富士通株式会社 Storage control device, storage control program, and storage control method
US9081828B1 (en) * 2014-04-30 2015-07-14 Igneous Systems, Inc. Network addressable storage controller with storage drive profile comparison
USRE48835E1 (en) * 2014-04-30 2021-11-30 Rubrik, Inc. Network addressable storage controller with storage drive profile comparison
US9116833B1 (en) 2014-12-18 2015-08-25 Igneous Systems, Inc. Efficiency for erasure encoding
US10026454B2 (en) 2015-04-28 2018-07-17 Seagate Technology Llc Storage system with cross flow cooling of power supply unit
US9361046B1 (en) 2015-05-11 2016-06-07 Igneous Systems, Inc. Wireless data storage chassis
US10097636B1 (en) 2015-06-15 2018-10-09 Western Digital Technologies, Inc. Data storage device docking station
US11269861B2 (en) 2019-06-17 2022-03-08 Bank Of America Corporation Database tool
US11100092B2 (en) 2019-06-17 2021-08-24 Bank Of America Corporation Database tool
US11327670B2 (en) * 2020-01-09 2022-05-10 International Business Machines Corporation Reducing power consumption in a dispersed storage network

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5088081A (en) * 1990-03-28 1992-02-11 Prime Computer, Inc. Method and apparatus for improved disk access
US5530658A (en) * 1994-12-07 1996-06-25 International Business Machines Corporation System and method for packing heat producing devices in an array to prevent local overheating
US5666538A (en) * 1995-06-07 1997-09-09 Ast Research, Inc. Disk power manager for network servers
US5680579A (en) * 1994-11-10 1997-10-21 Kaman Aerospace Corporation Redundant array of solid state memory devices
US20020007464A1 (en) * 1990-06-01 2002-01-17 Amphus, Inc. Apparatus and method for modular dynamically power managed power supply and cooling system for computer systems, server applications, and other electronic devices
US20020062454A1 (en) * 2000-09-27 2002-05-23 Amphus, Inc. Dynamic power and workload management for multi-server system
US20020144057A1 (en) * 2001-01-30 2002-10-03 Data Domain Archival data storage system and method
US20030196126A1 (en) * 2002-04-11 2003-10-16 Fung Henry T. System, method, and architecture for dynamic server power management and dynamic workload management for multi-server environment
US20050177755A1 (en) * 2000-09-27 2005-08-11 Amphus, Inc. Multi-server and multi-CPU power management system and method

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5088081A (en) * 1990-03-28 1992-02-11 Prime Computer, Inc. Method and apparatus for improved disk access
US20020007464A1 (en) * 1990-06-01 2002-01-17 Amphus, Inc. Apparatus and method for modular dynamically power managed power supply and cooling system for computer systems, server applications, and other electronic devices
US20030200473A1 (en) * 1990-06-01 2003-10-23 Amphus, Inc. System and method for activity or event based dynamic energy conserving server reconfiguration
US5680579A (en) * 1994-11-10 1997-10-21 Kaman Aerospace Corporation Redundant array of solid state memory devices
US5530658A (en) * 1994-12-07 1996-06-25 International Business Machines Corporation System and method for packing heat producing devices in an array to prevent local overheating
US5787462A (en) * 1994-12-07 1998-07-28 International Business Machines Corporation System and method for memory management in an array of heat producing devices to prevent local overheating
US5666538A (en) * 1995-06-07 1997-09-09 Ast Research, Inc. Disk power manager for network servers
US5961613A (en) * 1995-06-07 1999-10-05 Ast Research, Inc. Disk power manager for network servers
US20020062454A1 (en) * 2000-09-27 2002-05-23 Amphus, Inc. Dynamic power and workload management for multi-server system
US20050177755A1 (en) * 2000-09-27 2005-08-11 Amphus, Inc. Multi-server and multi-CPU power management system and method
US20020144057A1 (en) * 2001-01-30 2002-10-03 Data Domain Archival data storage system and method
US20030196126A1 (en) * 2002-04-11 2003-10-16 Fung Henry T. System, method, and architecture for dynamic server power management and dynamic workload management for multi-server environment

Cited By (145)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060193073A1 (en) * 1999-04-05 2006-08-31 Kazuo Hakamata Disk array unit
US8929018B2 (en) 1999-04-05 2015-01-06 Hitachi, Ltd. Disk array unit
US7355806B2 (en) 1999-04-05 2008-04-08 Hitachi, Ltd. Disk array unit
US20050259345A1 (en) * 1999-04-05 2005-11-24 Kazuo Hakamata Disk array unit
US7554758B2 (en) 1999-04-05 2009-06-30 Hitachi, Ltd. Disk array unit
US20070244964A1 (en) * 2001-12-19 2007-10-18 Challenger James R H Method and system for caching message fragments using an expansion attribute in a fragment link tag
US20040260967A1 (en) * 2003-06-05 2004-12-23 Copan Systems, Inc. Method and apparatus for efficient fault-tolerant disk drive replacement in raid storage systems
US7434097B2 (en) 2003-06-05 2008-10-07 Copan System, Inc. Method and apparatus for efficient fault-tolerant disk drive replacement in raid storage systems
US20060143381A1 (en) * 2003-06-18 2006-06-29 Akihiro Mori System and method for accessing an offline storage unit through an online storage unit
US7366870B2 (en) 2003-06-18 2008-04-29 Hitachi, Ltd. System and method for accessing an offline storage unit through an online storage unit
US8078809B2 (en) 2003-06-18 2011-12-13 Hitachi, Ltd. System for accessing an offline storage unit through an online storage unit
US7657768B2 (en) 2003-11-26 2010-02-02 Hitachi, Ltd. Disk array optimizing the drive operation time
US20080168227A1 (en) * 2003-11-26 2008-07-10 Hitachi, Ltd. Disk Array Optimizing The Drive Operation Time
US7353406B2 (en) 2003-11-26 2008-04-01 Hitachi, Ltd. Disk array optimizing the drive operation time
US20050111249A1 (en) * 2003-11-26 2005-05-26 Hitachi, Ltd. Disk array optimizing the drive operation time
US7669016B2 (en) * 2003-12-25 2010-02-23 Hitachi, Ltd. Memory control device and method for controlling the same
US8516204B2 (en) 2003-12-25 2013-08-20 Hitachi, Ltd. Memory control device and method for controlling the same
US20100122029A1 (en) * 2003-12-25 2010-05-13 Hitachi, Ltd. Memory control device and method for controlling the same
US7360017B2 (en) * 2003-12-25 2008-04-15 Hitachi, Ltd. Storage control device for longevity of the disk spindles based upon access of hard disk drives
US7975113B2 (en) 2003-12-25 2011-07-05 Hitachi, Ltd. Memory control device and method for controlling the same
US20060101222A1 (en) * 2003-12-25 2006-05-11 Hitachi, Ltd. Memory control device and method for controlling the same
US20050144383A1 (en) * 2003-12-25 2005-06-30 Seiichi Higaki Memory control device and method for controlling the same
US20070073970A1 (en) * 2004-01-16 2007-03-29 Hitachi, Ltd. Disk array apparatus and disk array apparatus controlling method
US8402211B2 (en) 2004-01-16 2013-03-19 Hitachi, Ltd. Disk array apparatus and disk array apparatus controlling method
US7373456B2 (en) 2004-01-16 2008-05-13 Hitachi, Ltd. Disk array apparatus and disk array apparatus controlling method
US20080040543A1 (en) * 2004-01-16 2008-02-14 Hitachi, Ltd. Disk array apparatus and disk array apparatus controlling method
US7281088B2 (en) 2004-01-16 2007-10-09 Hitachi, Ltd. Disk array apparatus and disk array apparatus controlling method
US8015442B2 (en) 2004-02-04 2011-09-06 Hitachi, Ltd. Anomaly notification control in disk array
US20060255409A1 (en) * 2004-02-04 2006-11-16 Seiki Morita Anomaly notification control in disk array
US7823010B2 (en) 2004-02-04 2010-10-26 Hitachi, Ltd. Anomaly notification control in disk array
US20070168709A1 (en) * 2004-02-04 2007-07-19 Seiki Morita Anomaly notification control in disk array
US8365013B2 (en) 2004-02-04 2013-01-29 Hitachi, Ltd. Anomaly notification control in disk array
US7516346B2 (en) * 2004-10-28 2009-04-07 Nec Laboratories America, Inc. System and method for dynamically changing the power mode of storage disks based on redundancy and system load
US20060107099A1 (en) * 2004-10-28 2006-05-18 Nec Laboratories America, Inc. System and Method for Redundant Storage with Improved Energy Consumption
WO2006086066A3 (en) * 2005-02-04 2007-04-19 Dot Hill Systems Corp Storage device method and apparatus
WO2006086066A2 (en) * 2005-02-04 2006-08-17 Dot Hill Systems Corp. Storage device method and apparatus
US20060279869A1 (en) * 2005-06-08 2006-12-14 Akira Yamamoto Storage system controlling power supply module and fan
US20100246058A1 (en) * 2005-06-08 2010-09-30 Akira Yamamoto Storage system controlling power supply module and fan
US7227713B2 (en) 2005-06-08 2007-06-05 Hitachi, Ltd. Storage system controlling power supply module and fan
US8325438B2 (en) 2005-06-08 2012-12-04 Hitachi, Ltd. Storage system controlling power supply module and fan
US20070242385A1 (en) * 2005-06-08 2007-10-18 Akira Yamamoto Storage system controlling power supply module and fan
US7710680B2 (en) 2005-06-08 2010-05-04 Hitachi, Ltd. Storage system controlling power supply module and fan
US8125726B2 (en) 2005-06-08 2012-02-28 Hitachi, Ltd. Storage system controlling power supply module and fan
US20070061509A1 (en) * 2005-09-09 2007-03-15 Vikas Ahluwalia Power management in a distributed file system
US8112596B2 (en) 2005-09-13 2012-02-07 Hitachi, Ltd. Management apparatus, management method and storage management system
US20070061510A1 (en) * 2005-09-13 2007-03-15 Atsuya Kumagai Storage unit and disk control method
US20070061512A1 (en) * 2005-09-13 2007-03-15 Hitachi, Ltd. Management apparatus, management method and storage management system
US7444483B2 (en) 2005-09-13 2008-10-28 Hitachi, Ltd. Management apparatus, management method and storage management system
US20090006735A1 (en) * 2005-09-13 2009-01-01 Hitachi, Ltd. Storage unit and disk control method
US20090044035A1 (en) * 2005-09-13 2009-02-12 Hitachi, Ltd. Management apparatus, management method and storage management system
US7404035B2 (en) 2005-09-20 2008-07-22 Hitachi, Ltd. System for controlling spinning of disk
US20070067560A1 (en) * 2005-09-20 2007-03-22 Tomoya Anzai System for controlling spinning of disk
EP1770494A1 (en) * 2005-09-20 2007-04-04 Hitachi, Ltd. System for controlling the spinning of disks
US20080270699A1 (en) * 2005-09-20 2008-10-30 Hitachi, Ltd. System for controlling spinning of disk
US20090276565A1 (en) * 2005-09-22 2009-11-05 Akira Fujibayashi Storage control apparatus, data management system and data management method
US7568075B2 (en) 2005-09-22 2009-07-28 Hitachi, Ltd. Apparatus, system and method for making endurance of storage media
US20070067559A1 (en) * 2005-09-22 2007-03-22 Akira Fujibayashi Storage control apparatus, data management system and data management method
EP1768014A1 (en) 2005-09-22 2007-03-28 Hitachi, Ltd. Storage control apparatus, data management system and data management method
EP1770499A1 (en) * 2005-09-22 2007-04-04 Hitachi, Ltd. Storage control apparatus, data management system and data management method
US20070271413A1 (en) * 2005-09-22 2007-11-22 Akira Fujibayashi Storage control apparatus, data management system and data management method
US7962704B2 (en) 2005-09-22 2011-06-14 Hitachi, Ltd. Storage system of storage hierarchy devices and a method of controlling storage hierarchy devices based on a user policy of I/O performance and power consumption
US20110213916A1 (en) * 2005-09-22 2011-09-01 Akira Fujibayashi Storage control apparatus, data management system and data management method
EP2109035A3 (en) * 2005-09-22 2009-10-21 Hitachi Ltd. Storage control apparatus, data management system and data management method
US8166270B2 (en) 2005-09-22 2012-04-24 Hitachi, Ltd. Storage control apparatus, data management system and data management method for determining storage heirarchy based on a user policy
US7549016B2 (en) 2005-09-22 2009-06-16 Hitachi, Ltd. Storage control apparatus for selecting storage media based on a user-specified performance requirement
US7426646B2 (en) 2005-09-30 2008-09-16 Hitachi, Ltd. Computer apparatus, storage apparatus, system management apparatus, and hard disk unit power supply controlling method
US8024587B2 (en) 2005-09-30 2011-09-20 Hitachi, Ltd. Computer apparatus, storage apparatus, system management apparatus, and hard disk unit power supply controlling method
US20090077392A1 (en) * 2005-09-30 2009-03-19 Kazuhisa Fujimoto Computer apparatus, storage apparatus, system management apparatus, and hard disk unit power supply controlling method
US20070079156A1 (en) * 2005-09-30 2007-04-05 Kazuhisa Fujimoto Computer apparatus, storage apparatus, system management apparatus, and hard disk unit power supply controlling method
US7640443B2 (en) 2005-09-30 2009-12-29 Hitachi, Ltd. Computer apparatus, storage apparatus, system management apparatus, and hard disk unit power supply controlling method
US20070079088A1 (en) * 2005-10-05 2007-04-05 Akira Deguchi Information processing system, control method for information processing system, and storage system
US7529950B2 (en) * 2005-10-05 2009-05-05 Hitachi, Ltd. Information processing system, control method for information processing system, and storage system
US7627610B2 (en) 2005-11-21 2009-12-01 Hitachi, Ltd. Computer system and method of reproducing data for the system
US20090248980A1 (en) * 2005-12-02 2009-10-01 Hitachi, Ltd. Storage System and Capacity Allocation Method Therefor
US7743212B2 (en) 2005-12-02 2010-06-22 Hitachi, Ltd. Storage system and capacity allocation method therefor
US20070130424A1 (en) * 2005-12-02 2007-06-07 Hitachi, Ltd. Storage system and capacity allocation method therefor
US7539817B2 (en) 2005-12-02 2009-05-26 Hitachi, Ltd. Storage system and capacity allocation method therefor
US20070143542A1 (en) * 2005-12-16 2007-06-21 Hitachi, Ltd. Storage controller, and method of controlling storage controller
US7469315B2 (en) 2005-12-16 2008-12-23 Hitachi, Ltd. Storage controller, and method of controlling storage controller to improve the reliability of the storage controller
US7836251B2 (en) 2005-12-16 2010-11-16 Hitachi, Ltd. Storage controller, and method operative to relocate logical storage devices based on times and locations specified in a relocating plan
US20070143559A1 (en) * 2005-12-20 2007-06-21 Yuichi Yagawa Apparatus, system and method incorporating virtualization for data storage
EP1966671A2 (en) * 2005-12-30 2008-09-10 Copan Systems, Inc. Method and system for accessing auxiliary data in power-efficient high-capacity scalable storage system
EP1966671A4 (en) * 2005-12-30 2010-09-08 Copan Systems Inc Method and system for accessing auxiliary data in power-efficient high-capacity scalable storage system
WO2007079056A2 (en) 2005-12-30 2007-07-12 Copan Systems, Inc. Method and system for accessing auxiliary data in power-efficient high-capacity scalable storage system
US20070208921A1 (en) * 2006-03-03 2007-09-06 Hitachi, Ltd. Storage system and control method for the same
US7472211B2 (en) 2006-07-28 2008-12-30 International Business Machines Corporation Blade server switch module using out-of-band signaling to detect the physical location of an active drive enclosure device
EP1887472A2 (en) * 2006-08-04 2008-02-13 Hitachi, Ltd. Storage system for suppressing failures of storage media group
EP1887472A3 (en) * 2006-08-04 2012-02-08 Hitachi, Ltd. Storage system for suppressing failures of storage media group
US8214586B2 (en) 2006-10-11 2012-07-03 Hitachi, Ltd. Apparatus and method for mirroring data between nonvolatile memory and a hard disk drive
US7669019B2 (en) 2006-10-11 2010-02-23 Hitachi, Ltd. Apparatus and method of mirroring data between nonvolatile memory and hard disk
US8082389B2 (en) 2006-10-11 2011-12-20 Hitachi, Ltd. Apparatus and method for mirroring data between nonvolatile memory and a hard disk drive
US20080091876A1 (en) * 2006-10-11 2008-04-17 Akira Fujibayashi Storage apparatus and control method thereof
US20100106903A1 (en) * 2006-10-11 2010-04-29 Akira Fujibayashi Storage apparatus and control method thereof
US8171214B2 (en) * 2006-10-17 2012-05-01 Hitachi, Ltd. Storage subsystem, storage system, and method of controlling power supply to the storage subsystem
US20080256307A1 (en) * 2006-10-17 2008-10-16 Kazuhisa Fujimoto Storage subsystem, storage system, and method of controlling power supply to the storage subsystem
US20080104280A1 (en) * 2006-10-30 2008-05-01 Biskeborn Robert G On Demand Storage Array
US7952882B2 (en) 2006-10-30 2011-05-31 International Business Machines Corporation On demand storage array
US20080126701A1 (en) * 2006-11-28 2008-05-29 Uehara Go Storage system comprising both power saving and diagnostic functions
EP1953620A3 (en) * 2006-11-28 2011-07-06 Hitachi, Ltd. Storage system comprising both power saving and diagnostic functions
US8219748B2 (en) 2006-11-28 2012-07-10 Hitachi, Ltd. Storage system comprising both power saving and diagnostic functions
EP2017712A3 (en) * 2007-07-10 2011-12-07 Hitachi, Ltd. Power efficient storage with data de-duplication
JP2009020858A (en) * 2007-07-10 2009-01-29 Hitachi Ltd Storage of high power efficiency using data de-duplication
US8225036B2 (en) 2007-07-24 2012-07-17 Hitachi, Ltd. Storage controller and method for controlling the same
US20090031150A1 (en) * 2007-07-24 2009-01-29 Hitachi, Ltd. Storage controller and method for controlling the same
EP2026186A3 (en) * 2007-07-24 2011-12-07 Hitachi, Ltd. Storage controller and method for controlling the same
US7996605B2 (en) 2007-07-24 2011-08-09 Hitachi, Ltd. Storage controller and method for controlling the same
EP2026186A2 (en) 2007-07-24 2009-02-18 Hitachi, Ltd. Storage controller and method for controlling the same
US20090055520A1 (en) * 2007-08-23 2009-02-26 Shunya Tabata Method for scheduling of storage devices
US7849285B2 (en) 2007-08-23 2010-12-07 Hitachi, Ltd. Method for scheduling of storage devices
US8141095B2 (en) 2007-12-26 2012-03-20 Fujitsu Limited Recording medium storing data allocation control program, data allocation control device, data allocation control method, and multi-node storage-system
US20090254702A1 (en) * 2007-12-26 2009-10-08 Fujitsu Limited Recording medium storing data allocation control program, data allocation control device, data allocation control method, and multi-node storage-system
US20090210732A1 (en) * 2008-02-19 2009-08-20 Canon Kabushiki Kaisha Information processing apparatus and method of controlling the same
US8762750B2 (en) * 2008-02-19 2014-06-24 Canon Kabushiki Kaisha Information processing apparatus and method of controlling the same
US20090217067A1 (en) * 2008-02-27 2009-08-27 Dell Products L.P. Systems and Methods for Reducing Power Consumption in a Redundant Storage Array
US20090271645A1 (en) * 2008-04-24 2009-10-29 Hitachi, Ltd. Management apparatus, storage apparatus and information processing system
WO2009135530A1 (en) * 2008-05-08 2009-11-12 Idcs System and method for sequential recording and archiving large volumes of video data
US20110106907A1 (en) * 2008-05-08 2011-05-05 V.S.K. Electronics System and method for sequential recording and archiving large volumes of video data
US8195971B2 (en) * 2008-08-20 2012-06-05 Lenovo (Beijing) Limited Solid state disk and method of managing power supply thereof and terminal including the same
US20100050007A1 (en) * 2008-08-20 2010-02-25 Shan Jiang Solid state disk and method of managing power supply thereof and terminal including the same
US20100122050A1 (en) * 2008-11-13 2010-05-13 International Business Machines Corporation Virtual storage migration technique to minimize spinning disks
US8301852B2 (en) 2008-11-13 2012-10-30 International Business Machines Corporation Virtual storage migration technique to minimize spinning disks
US8902443B2 (en) * 2008-12-11 2014-12-02 Canon Kabushiki Kaisha Information processing system, information processing apparatus, and information processing method
US20100149581A1 (en) * 2008-12-11 2010-06-17 Canon Kabushiki Kaisha Information processing system, information processing apparatus, and information processing method
US20110320712A1 (en) * 2009-03-12 2011-12-29 Chengdu Huawei Symantec Technologies Co., Ltd. Method and apparatus for controlling state of storage device and storage device
US20110087912A1 (en) * 2009-10-08 2011-04-14 Bridgette, Inc. Dba Cutting Edge Networked Storage Power saving archive system
US8627130B2 (en) 2009-10-08 2014-01-07 Bridgette, Inc. Power saving archive system
US10191910B2 (en) * 2010-06-08 2019-01-29 Dell Products L.P. Systems and methods for improving storage efficiency in an information handling system
US20160154814A1 (en) * 2010-06-08 2016-06-02 Dell Products L.P. Systems and methods for improving storage efficiency in an information handling system
US9292533B2 (en) * 2010-06-08 2016-03-22 Dell Products L.P. Systems and methods for improving storage efficiency in an information handling system
US20110302137A1 (en) * 2010-06-08 2011-12-08 Dell Products L.P. Systems and methods for improving storage efficiency in an information handling system
US9176544B2 (en) 2010-06-16 2015-11-03 Hewlett-Packard Development Company, L.P. Computer racks
US8788755B2 (en) 2010-07-01 2014-07-22 Infinidat Ltd. Mass data storage system and method of operating thereof
US9189407B2 (en) 2010-07-01 2015-11-17 Infinidat Ltd. Pre-fetching in a storage system
US20120158652A1 (en) * 2010-12-15 2012-06-21 Pavan Ps System and method for ensuring consistency in raid storage array metadata
KR101365562B1 (en) * 2011-01-28 2014-02-21 주식회사 디케이아이테크놀로지 Multiple data-storage-system for having power-saving function using meta-data and method for power-saving thereof
US8677154B2 (en) 2011-10-31 2014-03-18 International Business Machines Corporation Protecting sensitive data in a transmission
US9280416B1 (en) * 2011-12-30 2016-03-08 Emc Corporation Selection of erasure code parameters for no data repair
US8788913B1 (en) * 2011-12-30 2014-07-22 Emc Corporation Selection of erasure code parameters for no data repair
CN104067237A (en) * 2012-01-25 2014-09-24 惠普发展公司,有限责任合伙企业 Storage system device management
WO2013112141A1 (en) * 2012-01-25 2013-08-01 Hewlett-Packard Development Company, L.P. Storage system device management
US20140252829A1 (en) * 2013-03-05 2014-09-11 Wonderland Nurserygoods Company Limited Child Safety Seat Assembly
US9469222B2 (en) * 2013-03-05 2016-10-18 Wonderland Nurserygoods Company Limited Child safety seat assembly
WO2014209915A1 (en) * 2013-06-28 2014-12-31 Western Digital Technologies, Inc. Dynamic raid controller power management
US20180217772A1 (en) * 2017-01-31 2018-08-02 NE One LLC Controlled access to storage
US10474379B2 (en) * 2017-01-31 2019-11-12 NE One LLC Controlled access to storage

Also Published As

Publication number Publication date
US7035972B2 (en) 2006-04-25
US20050268119A9 (en) 2005-12-01

Similar Documents

Publication Publication Date Title
US7035972B2 (en) Method and apparatus for power-efficient high-capacity scalable storage system
US7210005B2 (en) Method and apparatus for power-efficient high-capacity scalable storage system
US7330931B2 (en) Method and system for accessing auxiliary data in power-efficient high-capacity scalable storage system
US7380060B2 (en) Background processing of data in a storage system
US20050210304A1 (en) Method and apparatus for power-efficient high-capacity scalable storage system
Verma et al. SRCMap: Energy Proportional Storage Using Dynamic Consolidation.
EP1912122B1 (en) Storage apparatus and control method thereof
US6058489A (en) On-line disk array reconfiguration
EP1540450B1 (en) Method and apparatus for power-efficient high-capacity scalable storage system
US7882305B2 (en) Storage apparatus and data management method in storage apparatus
US7493441B2 (en) Mass storage controller with apparatus and method for extending battery backup time by selectively providing battery power to volatile memory banks not storing critical data
Xiao et al. Semi-RAID: A reliable energy-aware RAID data layout for sequential data access
JP2012506087A (en) Power and performance management using MAIDX and adaptive data placement
US8171324B2 (en) Information processing device, data writing method, and program for the same
Wan et al. ThinRAID: Thinning down RAID array for energy conservation
Chen et al. An energy-efficient and reliable storage mechanism for data-intensive academic archive systems
JP3597086B2 (en) Disk array controller
Guha A new approach to disk-based mass storage systems
Wan et al. A reliable and energy-efficient storage system with erasure coding cache
Dong et al. HS-RAID2: optimizing small write performance in HS-RAID
Yao Improving energy efficiency and performance in storage server systems
JPH1166692A (en) Disk storage device and its control method

Legal Events

Date Code Title Description
AS Assignment

Owner name: COPAN SYSTEMS, INC., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GUHA, ALOKE;SANTILLI, CHRIS T.;MCMILLIAN, GARY B.;REEL/FRAME:014247/0723;SIGNING DATES FROM 20030616 TO 20030623

Owner name: COPAN SYSTEMS, INC., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GUHA, ALOKE;SANTILLI, CHRIS T.;MCMILLIAN, GARY B.;SIGNING DATES FROM 20030616 TO 20030623;REEL/FRAME:014247/0723

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: SILICON VALLEY BANK, CALIFORNIA

Free format text: SECURITY AGREEMENT;ASSIGNORS:COPAN SYSTEMS, INC.;COPAN SYSTEMS EMEA (PTY) LIMITED;REEL/FRAME:022228/0408

Effective date: 20081024

Owner name: SILICON VALLEY BANK, CALIFORNIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:COPAN SYSTEMS, INC.;REEL/FRAME:022228/0601

Effective date: 20081024

AS Assignment

Owner name: WESTBURY INVESTMENT PARTNERS SBIC, LP, NEW YORK

Free format text: SECURITY AGREEMENT;ASSIGNOR:COPAN SYSTEMS, INC.;REEL/FRAME:022309/0579

Effective date: 20090209

REMI Maintenance fee reminder mailed
FEPP Fee payment procedure

Free format text: PETITION RELATED TO MAINTENANCE FEES FILED (ORIGINAL EVENT CODE: PMFP); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PMFG); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: SILICON GRAPHICS INTERNATIONAL CORP., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SILICON VALLEY BANK;REEL/FRAME:024351/0936

Effective date: 20100223

Owner name: SILICON GRAPHICS INTERNATIONAL CORP., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SILICON VALLEY BANK;REEL/FRAME:024351/0936

Effective date: 20100223

PRDP Patent reinstated due to the acceptance of a late maintenance fee

Effective date: 20100512

FPAY Fee payment

Year of fee payment: 4

SULP Surcharge for late payment
FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: SILICON VALLEY BANK, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WESTBURY INVESTMENT PARTNERS SBIC, LP;REEL/FRAME:032276/0091

Effective date: 20100223

AS Assignment

Owner name: RPX CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SILICON GRAPHICS INTERNATIONAL CORP.;REEL/FRAME:035409/0615

Effective date: 20150327

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT, IL

Free format text: SECURITY AGREEMENT;ASSIGNORS:RPX CORPORATION;RPX CLEARINGHOUSE LLC;REEL/FRAME:038041/0001

Effective date: 20160226

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553)

Year of fee payment: 12

AS Assignment

Owner name: RPX CORPORATION, CALIFORNIA

Free format text: RELEASE (REEL 038041 / FRAME 0001);ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:044970/0030

Effective date: 20171222

Owner name: RPX CLEARINGHOUSE LLC, CALIFORNIA

Free format text: RELEASE (REEL 038041 / FRAME 0001);ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:044970/0030

Effective date: 20171222

AS Assignment

Owner name: JEFFERIES FINANCE LLC, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:RPX CORPORATION;REEL/FRAME:046486/0433

Effective date: 20180619

AS Assignment

Owner name: BARINGS FINANCE LLC, AS COLLATERAL AGENT, NORTH CAROLINA

Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:RPX CLEARINGHOUSE LLC;RPX CORPORATION;REEL/FRAME:054198/0029

Effective date: 20201023

Owner name: BARINGS FINANCE LLC, AS COLLATERAL AGENT, NORTH CAROLINA

Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:RPX CLEARINGHOUSE LLC;RPX CORPORATION;REEL/FRAME:054244/0566

Effective date: 20200823

AS Assignment

Owner name: RPX CORPORATION, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JEFFERIES FINANCE LLC;REEL/FRAME:054486/0422

Effective date: 20201023