US9342389B2 - Neighbor based and dynamic hot threshold based hot data identification - Google Patents

Info

Publication number
US9342389B2
Authority
US
United States
Prior art keywords
hot
metric
dynamic
threshold
received address
Prior art date
Legal status
Active, expires
Application number
US14/169,877
Other versions
US20140304480A1 (en)
Inventor
Xiangyu Tang
Frederick K. H. Lee
Jason Bellorado
Lingqi Zeng
Zheng Wu
Current Assignee
SK Hynix Inc
Original Assignee
SK Hynix Inc
Priority date
Filing date
Publication date
Application filed by SK Hynix Inc
Priority to US14/169,877 (US9342389B2)
Priority to CN201480019802.5A (CN105556485B)
Priority to PCT/US2014/014506 (WO2014163743A1)
Assigned to SK HYNIX MEMORY SOLUTIONS INC. reassignment SK HYNIX MEMORY SOLUTIONS INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BELLORADO, JASON, LEE, FREDERICK K.H., TANG, XIANGYU, WU, ZHENG, ZENG, LINGQI
Assigned to SK Hynix Inc. reassignment SK Hynix Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SK HYNIX MEMORY SOLUTIONS INC.
Publication of US20140304480A1
Application granted
Publication of US9342389B2
Legal status: Active
Adjusted expiration

Classifications

    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F3/0616 Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3471 Address tracing
    • G06F3/064 Management of blocks
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G06F12/121 Replacement control using replacement algorithms

Definitions

  • FIG. 1 is a flowchart illustrating an embodiment of a neighbor based hot data identification process.
  • the process is performed by a storage controller in a solid state storage system.
  • the storage controller is implemented as or using a semiconductor device, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • At step 100, an address is received.
  • the received address is a logical block address (LBA) in solid state storage (e.g., NAND Flash).
  • the process of FIG. 1 is performed during a host write and the address received at step 100 is associated with a host write.
  • At step 102, one or more neighboring hot metrics are determined for one or more neighbors associated with the received address.
  • a neighboring hot metric refers to a metric associated with a neighbor (e.g., a neighbor of the address received at step 100 ).
  • the neighboring hot metrics determined at step 102 are generated on-the-fly so that the neighboring hot metrics are fresh or otherwise up to date. Any one of a variety of hot data identification techniques may be used at step 102 .
  • a metric associated with the received address may be determined based on recency and/or frequency, and then that metric is compared against some hot threshold.
  • a (neighboring) hot metric is one of two possible values: hot or cold.
  • the neighboring hot metrics determined at step 102 may be one of two values: hot (e.g., a one) or cold (e.g., a zero).
  • a (neighboring) hot metric is a score.
  • the neighboring hot metrics determined at step 102 may range in value from one (e.g., very cold) to five (e.g., very hot).
  • At step 104, a hot metric for the received address is determined based at least in part on the neighboring hot metrics. In one example, if any of the neighboring hot metrics determined at step 102 corresponds to a hot value, then the hot metric for the received address is set to a hot value.
  • the process of FIG. 1 may be performed by a solid state storage system (e.g., a NAND Flash storage system) during a host write.
  • the exemplary solid state storage system includes a cache which has quicker access times (i.e., it is faster to read from and/or write to), can tolerate more errors than the non-cache portion, and can tolerate more program/erase cycles (i.e., it is more robust).
  • if an address (e.g., one being written to during the host write) is determined to be hot data, then that data is written into the cache.
  • if an address is determined to be cold data, then that data is written into the non-cache portion.
  • solid state storage does not support in-place updates (e.g., unlike magnetic storage). For example, when old data is superseded by some new data, the old location or address (e.g., of the old data) in a page is marked as invalid and the new data is written to a new location or address in another page in the solid state storage. Over time, the number of locations in a block that are invalid will grow. To reclaim the block, the remaining valid locations are written into another block, thus freeing the entire block in solid state storage for some other use. This reclamation process is referred to as garbage collection. A block which is being garbage collected is likely to contain addresses which are cold. As such, checking neighbors may result in too many false positives during garbage collection in some solid state storage systems.
  • a benefit to neighbor based hot data identification is that it may better detect hot data when (e.g., solid state) storage is first being written to compared to some other hot data identification techniques.
  • some other hot data identification techniques identify hot data by comparing the access count of a location (e.g., the number of times a particular LBA has been accessed in the past) against a hot threshold. If the access count is greater than the hot threshold, then the location is declared to be hot.
  • when a location is accessed for the first time, its access count is zero, and several accesses are required to reach the hot threshold before that location is considered hot. Therefore, for the first few accesses, locations will be misidentified as cold.
  • FIG. 2 is a diagram which shows exemplary simulation results comparing neighbor based hot data identification to another hot data identification technique. As is shown in region 200 of the graph, the neighbor based hot data identification tends to have better correct identification percentages when the access count is below approximately 2×10⁴.
  • FIG. 3 is a diagram illustrating an embodiment of a system which performs neighbor based hot data identification.
  • neighbor based hot data identifier 300 may be part of a storage controller (e.g., which reads from and/or writes to solid state storage media) and/or may be implemented using an ASIC or an FPGA.
  • neighbor based hot data identifier 300 is used only during a host write and is not used during garbage collection.
  • the LBA which is passed to neighbor based hot data identifier 300 may be an LBA being written to (e.g., as part of a host write).
  • neighbor based hot data identification techniques may be used during garbage collection in other embodiments if desired.
  • Neighbor generator 310 receives the LBA and obtains a range of neighbors to consider from settings 304 .
  • settings 304 is implemented as a register and the number of neighbors to consider is programmable (e.g., so that more or fewer neighbors are considered).
  • Based on the range of neighbors to consider, neighbor generator 302 generates one or more LBA(s) of the neighbor(s) which are passed to hot data identifier 306.
  • Hot data identifier 306 generates one or more hot data metric(s) of the neighbor(s).
  • hot data identifier 306 uses a hot data identification technique which is independent of the neighbors of the LBA to be written (i.e., it does not take into consideration the hotness of a neighbor). Any appropriate technique may be employed by hot data identifier 306.
  • the hot metric(s) of the neighbor(s) are passed from hot data identifier 306 to combiner 312 .
  • a hot metric may either be a hot/cold value or some score over a range (e.g., a range of 1-5).
  • if at least one of the neighbors' hot metrics corresponds to a hot value, then a hot metric with a hot value is output by combiner 312 (e.g., combiner 312 is an OR).
  • any manner of combination may be used by combiner 312 .
  • the information used to generate a hot data metric is updated in real time.
  • the hot data metric being generated by neighbor based hot data identifier 300 may directly or indirectly affect a hot data metric, either for that LBA or another LBA.
  • a generated hot metric may affect which block(s) is/are garbage collected, which in turn affects program/erase counts for the affected block(s), which in turn affects a subsequent hot metric.
  • the host write may cause the program/erase count for that LBA to increment, which in turn affects a subsequent hot metric.
  • FIG. 4 is a flowchart illustrating an embodiment of a process for determining one or more neighboring hot metrics.
  • the process is used at step 102 in FIG. 1 .
  • the processing performed by neighbor based hot data identifier 300 in FIG. 3 is shown.
  • one or more neighboring addresses on a second side of the received address is/are determined based on the range. To continue the example from above, step 404 generates the addresses: (ADDR+1), (ADDR+2), and (ADDR+3).
  • steps 402 and 404 are performed by controller 302 .
  • one or more neighboring hot metrics on the first side is/are determined.
  • FIG. 5 is a flowchart illustrating an embodiment of a process for determining a hot metric using neighboring hot metrics.
  • the process is used at step 104 in FIG. 1 .
  • the processing performed by neighbor based hot data identifier 300 in FIG. 3 is shown.
  • it is determined if at least one neighboring hot metric corresponds to a hot value. If so, the hot metric for the received address is set to a hot value at 502 . If not, at 504 , the hot metric for the received address is set to a cold value.
  • the hot metric for the received address would be set to a hot value because the neighbor (LBA+3) has a neighboring hot metric which corresponds to a hot value.
  • the technique is used during a host write. In some embodiments, the technique is used during garbage collection.
  • FIG. 6 is a flowchart illustrating an embodiment of a dynamic hot threshold based hot data identification process.
  • the process is performed by a storage controller in a solid state storage system.
  • the storage controller is implemented as or using a semiconductor device, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • an address is received.
  • the address may be an LBA within a block of a solid state storage system.
  • the received address is associated with a host write.
  • the received address is associated with garbage collection (e.g., it is the address of valid data within a block that also includes invalid data).
  • a dynamic hot threshold is determined. Unlike a static hot threshold, a dynamic hot threshold changes over time, for example as the state of the solid state storage system changes.
  • a dynamic hot threshold is determined based on one or more of the following: an average access count in a block of interest (C_avg), a measure of fullness of a cache associated with solid state storage (f), or some measure of wear leveling associated with the solid state storage (e.g., a wear leveling difference, the wear leveling of the entire solid state storage, the wear leveling of a specific portion of the solid state storage, etc.).
  • a single dynamic hot threshold is determined for the entire solid state storage drive (which may include multiple solid state die).
  • one reason for using a single drive-wide threshold is that the storage controller cannot control where logical block addresses are stored. For example, logical block addresses can move from die to die with each garbage collection.
  • a metric associated with the received address to compare against the dynamic hot threshold is determined. Any appropriate technique may be used to determine the metric at step 604 .
  • the metric determined at step 604 is based on recency (e.g., tracked using a timestamp associated with a last write to that address) as well as frequency (e.g., tracked using access count). For example, an address which is written both frequently and recently is considered hot, but an address which is written frequently but in the distant past or an address that was written recently but infrequently is considered cold. Similarly, an address which was written neither frequently nor recently would also be considered cold.
  • At step 606, a hot metric for the received address is determined based at least in part on the metric associated with the received address and the dynamic hot threshold.
  • Using a dynamic hot threshold at step 606 enables a system to generate hot metrics which adapt to different usage patterns. For example, different users can have different usage patterns, and even the same user can have different usage patterns over time.
  • a dynamic hot threshold enables a system to adapt to the condition of the solid state storage itself, which varies over time. For these reasons, a hot metric which is determined using a dynamic hot threshold is desirable.
  • FIG. 7 is a diagram illustrating an embodiment of a system which performs dynamic hot threshold based hot data identification.
  • an LBA is input to dynamic hot threshold based hot data identifier 700 .
  • the LBA may be an address associated with a host write, or an address which is associated with garbage collection (e.g., it is an address within a block being garbage collected).
  • access count based hot data identifier 706 (e.g., internally) generates an access count for the received LBA.
  • Access count based hot data identifier 706 may use any appropriate technique to track and/or determine access counts for a given LBA.
  • the access count is compared by access count based hot data identifier 706 against a dynamic hot threshold which is generated by dynamic hot threshold generator 704 (described in further detail below).
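  • if the access count is greater than the dynamic hot threshold, a hot metric having a hot value is output for the received LBA; otherwise, a hot metric having a cold value is output for the received LBA.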
  • the hot metric output by dynamic hot threshold based hot data identifier 700 may be either a hot/cold value or a value which spans a range of values (e.g., a range of 1-5).
  • Dynamic hot threshold generator 704 uses the incoming traffic (in the form of LBAs in FIG. 7 ) to adjust the dynamic hot threshold. This is because two identical solid state storage systems may be used in very different manners and depending on the usage, different hot thresholds will result. Dynamic hot threshold generator 704 also uses any number of settings 702 . For example, the settings may specify values of constants used to generate the dynamic hot threshold and/or what system or state information to use in generating the dynamic hot threshold. In some embodiments, the settings used by dynamic hot threshold generator 704 begin with default or initial values and are adjusted over time as the actual usage pattern of the solid state storage system becomes more apparent.
  • dynamic hot threshold generator 704 generates a dynamic hot threshold using C_avg, which is an average access count of LBAs in a block of interest (e.g., a block being garbage collected or the block being written to during a host write). This enables the dynamic hot threshold to adapt to different usage patterns: as the average access count increases, the dynamic hot threshold increases. It does not make sense, for example, to have the same hot threshold for solid state storage systems which experience (e.g., radically) different usage patterns.
  • dynamic hot threshold generator 704 generates a dynamic hot threshold using f, which is a measure of fullness of a cache associated with solid state storage, that is, T(f). As the cache becomes fuller (i.e., as f increases), the dynamic hot threshold increases. This has the effect of causing more data to be classified as cold data and stored in the non-cache portion of the solid state storage as opposed to the cache.
  • dynamic hot threshold generator 704 generates a dynamic hot threshold using w_diff, which is a difference between the wear level of a cache (w_cache) and the wear level of a non-cache portion of the solid state storage (w), that is, T(w_diff). As the cache sustains more wear compared to the rest of the solid state storage (i.e., as w_diff increases), the dynamic hot threshold increases. In such cases it is desirable to store more data in the non-cache portion of the solid state storage, since the cache is wearing out faster than the rest of the solid state storage.
  • C_avg may be some non-negative real number.
  • f may be a number between zero and one (e.g., where zero means that the cache is empty and one means that the cache is full).
  • w_diff, w_cache, and w may be numbers between zero and one (e.g., where one indicates 100% of a maximum number of program/erase cycles and zero indicates 0% of a maximum number of program/erase cycles).
  • any number of constants and/or configurations which affect dynamic hot threshold generation may be obtained from settings 702 .
  • Access count based hot data identifier 706 is merely exemplary and is not intended to be limiting. Any appropriate hot data identification technique which uses a threshold may be used in combination with the dynamic hot threshold technique described herein.
  • a dynamic hot threshold is used in combination with neighbor based hot data identification.
  • the following figures show some examples of this.
  • FIG. 8 is a flowchart illustrating an embodiment of a process for identifying hot data based on neighbors and using a dynamic hot threshold.
  • an address is received. As described above, in various embodiments a received address may be associated with a host write or with garbage collection.
  • one or more neighboring hot metrics for one or more neighbors associated with the received address is/are determined, including by: (1) determining a dynamic hot threshold and (2) for each of the one or more neighbors: (a) determining a metric associated with a given neighbor to compare against the dynamic hot threshold and (b) determining, based at least in part on the metric associated with the given neighbor and the dynamic hot threshold, a hot metric for the given neighbor.
  • one or more neighboring hot metrics are determined where each neighboring hot metric is determined using a dynamic hot threshold that changes over time (e.g., as the state and/or usage of a solid state storage system changes).
  • a hot metric for the received address is determined based at least in part on the neighboring hot metrics. For example, if at least one neighboring hot metric corresponds to a hot value, then the hot metric for the received address is set to a hot value.
  • FIG. 9 is a diagram illustrating an embodiment of a system which performs neighbor based and dynamic hot threshold based hot data identification.
  • an LBA is received by neighbor based and dynamic hot threshold based hot data identifier 900 .
  • the received address is passed to dynamic hot threshold generator 904 and neighbor generator 910 in neighbor based and dynamic hot threshold based hot data identifier 900 .
  • dynamic hot threshold generator 904 uses any number of parameters or settings from settings 902 and the received LBA to generate a dynamic hot threshold which is passed to access count based hot data identifier 906 .
  • Neighbor generator 910 generates one or more LBAs for one or more neighbors of the specified LBA using the received LBA and any number of settings or parameters from settings 902.
  • the LBA(s) of the neighbors are passed from neighbor generator 910 to access count based hot data identifier 906 .
  • Access count based hot data identifier 906 generates a metric (for each neighbor) to compare against the dynamic hot threshold and sends one or more neighboring hot metrics of the neighbors to comparator 912 based on the comparison. In some embodiments, if at least one neighboring hot metric corresponds to a hot value, then the hot metric of the LBA which is output is set to a hot value (e.g., comparator 912 is an OR).
  • Write amplification is defined as the ratio of the total number of (e.g., actual) writes to solid state storage compared to the number of host writes (e.g., which triggered the actual writes). For example, if a host writes one LBA and in doing so causes a garbage collection process to conduct one extra or additional write in addition to the host write, the write amplification is two.
  • Write amplification is a useful parameter because reducing write amplification is an important aspect of solid state storage management. A large write amplification value increases the amount of time to perform a host write, which is undesirable. Also, since solid state storage can only last for a limited number of write cycles, reducing write amplification can prolong the life of the solid state storage system.
  • the neighbor based hot data identification techniques i.e., the last two rows
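The dynamic hot threshold and its combination with neighbors (FIGS. 6 through 9) can be sketched in Python as follows. The description does not give a formula for the threshold, so the linear form, the constants in `ThresholdSettings`, the clipping of negative wear differences to zero, and all function names are assumptions. What the sketch takes from the text is that the threshold rises with C_avg, f, and w_diff, that each neighbor's access count is compared against it, and that neighboring hot metrics are combined with an OR.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThresholdSettings:
    """Hypothetical constants standing in for values read from the settings."""
    base: float = 2.0        # threshold when C_avg, f, and w_diff are all zero
    k_avg: float = 1.0       # weight on C_avg, the block's average access count
    k_fill: float = 4.0      # weight on f, cache fullness (0 = empty, 1 = full)
    k_wear: float = 8.0      # weight on w_diff = w_cache - w
    neighbor_range: int = 3  # neighbors considered on each side of the address


def dynamic_hot_threshold(c_avg: float, f: float, w_diff: float,
                          s: ThresholdSettings) -> float:
    """Assumed linear form that increases with C_avg, f, and w_diff."""
    return s.base + s.k_avg * c_avg + s.k_fill * f + s.k_wear * max(w_diff, 0.0)


def identify(addr: int, access_counts: Dict[int, int], block_lbas: List[int],
             f: float, w_cache: float, w: float, s: ThresholdSettings) -> bool:
    """Neighbor based identification using a dynamic hot threshold."""
    # C_avg: average access count of the LBAs in the block of interest.
    total = sum(access_counts.get(lba, 0) for lba in block_lbas)
    c_avg = total / max(len(block_lbas), 1)
    threshold = dynamic_hot_threshold(c_avg, f, w_cache - w, s)
    r = s.neighbor_range
    neighbors = [addr + k for k in range(-r, r + 1) if k != 0 and addr + k >= 0]
    # A neighbor is hot if its access count exceeds the dynamic threshold;
    # the received address is hot if any neighbor is hot (OR combination).
    return any(access_counts.get(n, 0) > threshold for n in neighbors)


counts = {101: 10, 102: 1, 103: 1}  # hypothetical access counts per LBA
block = [100, 101, 102, 103]        # C_avg = 12 / 4 = 3.0
s = ThresholdSettings()
print(identify(100, counts, block, f=0.5, w_cache=0.3, w=0.2, s=s))  # True
print(identify(100, counts, block, f=1.0, w_cache=0.6, w=0.2, s=s))  # False
```

In the first call the threshold is about 7.8, so neighbor LBA 101 (10 accesses) is hot. In the second call the fuller, more worn cache raises the threshold to about 12.2, so the same access pattern is classified as cold and the data would be steered to the non-cache portion.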

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Memory System (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Radiation Pyrometers (AREA)

Abstract

An address is received. One or more neighbors associated with the received address is/are determined. One or more neighboring hot metrics is/are determined for the one or more neighbors associated with the received address. A hot metric for the received address is determined based at least in part on the neighboring hot metrics.
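The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: LBAs are plain integers, the neighbor range (`rng` here) is symmetric around the received address and clipped to a hypothetical `max_lba`, the per-neighbor identifier `is_hot` is supplied by the caller (any technique that ignores neighbors would do), and neighboring hot metrics are combined with an OR, one of the combinations the description offers as an example. All names are hypothetical.

```python
from typing import Callable, List

# Hypothetical per-address identifier: True means "hot", False means "cold".
HotFn = Callable[[int], bool]


def neighbor_addresses(addr: int, rng: int, max_lba: int) -> List[int]:
    """Up to rng neighbors on each side of addr, clipped to [0, max_lba]."""
    first_side = [addr - k for k in range(rng, 0, -1) if addr - k >= 0]
    second_side = [addr + k for k in range(1, rng + 1) if addr + k <= max_lba]
    return first_side + second_side


def neighbor_based_hot(addr: int, rng: int, max_lba: int, is_hot: HotFn) -> bool:
    """The received address is hot if at least one neighbor is hot (OR)."""
    neighbor_metrics = [is_hot(n) for n in neighbor_addresses(addr, rng, max_lba)]
    return any(neighbor_metrics)


counts = {103: 5}  # hypothetical access counts per LBA


def static_is_hot(lba: int) -> bool:
    # Hypothetical static threshold of 4 accesses, ignoring neighbors.
    return counts.get(lba, 0) >= 4


print(neighbor_addresses(100, 3, 1023))                 # [97, 98, 99, 101, 102, 103]
print(neighbor_based_hot(100, 3, 1023, static_is_hot))  # True: LBA 103 is hot
```

Because each neighbor's metric is computed fresh, an address written for the first time can still be classified as hot when nearby LBAs are hot, which is the cold-start advantage the description attributes to this approach.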

Description

CROSS REFERENCE TO OTHER APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No. 61/808,529 entitled REDUCING WRITE AMPLIFICATION AND INCREASING THROUGHPUT IN SSDS filed Apr. 4, 2013 which is incorporated herein by reference for all purposes.
BACKGROUND OF THE INVENTION
Hot data identification is a process or technique in which data is classified or identified as being either hot or cold. Hot data (generally speaking) is data that will be invalidated shortly in the future. Cold data (generally speaking) is data that will remain valid for a long time in the future. In one example application, solid state storage systems use hot data identification techniques to group hot data together and cold data together, for example when host writes are received (e.g., hot data is written to the cache whereas cold data is written to the larger but slower main drive) or during garbage collection (e.g., hot data in a block being garbage collected is written to a first new block whereas cold data in that same block is written to a second new block). Although a number of hot data identification techniques exist, improved hot data identification techniques would be desirable. Such improved techniques may (for example) improve the accuracy of hot data identification, which in turn improves the efficiency of the solid state system (e.g., by reducing write amplification, which is defined as the ratio of the number of writes to solid state storage compared to the number of host writes).
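As a worked example of the write amplification ratio defined above, the following Python sketch divides total writes to the storage by host writes. The helper name is hypothetical. The second call uses an assumed 64-page block that still holds 16 valid pages when it is reclaimed, so 48 pages become free for host data at the cost of 16 relocation writes.

```python
def write_amplification(host_writes: int, extra_writes: int) -> float:
    """Total writes to solid state storage divided by host writes.

    extra_writes counts writes the host did not request, such as garbage
    collection copying still-valid data out of a block being reclaimed.
    """
    if host_writes <= 0:
        raise ValueError("host_writes must be positive")
    return (host_writes + extra_writes) / host_writes


# One host write that triggers one extra garbage-collection write.
print(write_amplification(1, 1))    # 2.0
# Hypothetical 64-page block reclaimed with 16 valid pages still in it.
print(write_amplification(48, 16))  # 1.3333333333333333
```

Fewer valid pages left in a reclaimed block means fewer extra writes, which is why grouping soon-to-be-invalidated hot data apart from cold data can lower write amplification.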
BRIEF DESCRIPTION OF THE DRAWINGS
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
FIG. 1 is a flowchart illustrating an embodiment of a neighbor based hot data identification process.
FIG. 2 is a diagram which shows exemplary simulation results comparing neighbor based hot data identification to another hot data identification technique.
FIG. 3 is a diagram illustrating an embodiment of a system which performs neighbor based hot data identification.
FIG. 4 is a flowchart illustrating an embodiment of a process for determining one or more neighboring hot metrics.
FIG. 5 is a flowchart illustrating an embodiment of a process for determining a hot metric using neighboring hot metrics.
FIG. 6 is a flowchart illustrating an embodiment of a dynamic hot threshold based hot data identification process.
FIG. 7 is a diagram illustrating an embodiment of a system which performs dynamic hot threshold based hot data identification.
FIG. 8 is a flowchart illustrating an embodiment of a process for identifying hot data based on neighbors and using a dynamic hot threshold.
FIG. 9 is a diagram illustrating an embodiment of a system which performs neighbor based and dynamic hot threshold based hot data identification.
DETAILED DESCRIPTION
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Two hot data identification techniques are described herein. First, various aspects of a neighbor based hot data identification technique are described. Then, examples of a hot data identification technique which uses a dynamic hot threshold are described. Finally, some examples which use both techniques together are described.
FIG. 1 is a flowchart illustrating an embodiment of a neighbor based hot data identification process. In some embodiments, the process is performed by a storage controller in a solid state storage system. In some such embodiments, the storage controller is implemented as or using a semiconductor device, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
At 100, an address is received. In one example, the received address is a logical block address (LBA) in solid state storage (e.g., NAND Flash). In some embodiments, the process of FIG. 1 is performed during a host write and the address received at step 100 is associated with a host write.
At 102, one or more neighboring hot metrics are determined for one or more neighbors associated with the received address. As used herein, a neighboring hot metric refers to a metric associated with a neighbor (e.g., a neighbor of the address received at step 100). In some embodiments, the neighboring hot metrics determined at step 102 are generated on-the-fly so that they are fresh or otherwise up to date. Any one of a variety of hot data identification techniques may be used at step 102. For example, a metric associated with a given neighbor may be determined based on recency and/or frequency, and then that metric is compared against some hot threshold.
In some embodiments, a (neighboring) hot metric is one of two possible values: hot or cold. For example, the neighboring hot metrics determined at step 102 may be one of two values: hot (e.g., a one) or cold (e.g., a zero). In some embodiments, a (neighboring) hot metric is a score. For example, the neighboring hot metrics determined at step 102 may range in value from one (e.g., very cold) to five (e.g., very hot).
At 104, a hot metric for the received address is determined based at least in part on the neighboring hot metrics. In one example, if any of the neighboring hot metrics determined at step 102 corresponds to a hot value, then the hot metric for the received address is set to a hot value.
In one example of how a hot metric generated according to the process of FIG. 1 is used, the process of FIG. 1 may be performed by a solid state storage system (e.g., a NAND Flash storage system) during a host write. The exemplary solid state storage system includes a cache which, compared to the non-cache portion, has quicker access times (e.g., it is faster to read from and/or write to), can tolerate more errors, and can tolerate more program/erase cycles (e.g., it is more robust). In this example, if an address (e.g., being written to during the host write) is determined by the process of FIG. 1 to be hot data, then that data is written into the cache. Conversely, if an address is determined to be cold data, then that data is written into the non-cache portion.
In some applications it may be undesirable to use neighbor based hot data identification during garbage collection. Using solid state storage as an example, solid state storage does not support in-place updates (e.g., unlike magnetic storage). For example, when old data is superseded by some new data, the old location or address (e.g., of the old data) in a page is marked as invalid and the new data is written to a new location or address in another page in the solid state storage. Over time, the number of locations in a block that are invalid will grow. To reclaim the block, the data in the remaining valid locations is written into another block, thus freeing the entire block in solid state storage for some other use. This reclamation process is referred to as garbage collection. A block which is being garbage collected is likely to contain addresses which are cold. As such, checking neighbors may result in too many false positives during garbage collection in some solid state storage systems.
A benefit of neighbor based hot data identification is that it may detect hot data better than some other hot data identification techniques when (e.g., solid state) storage is first being written to. For example, some other hot data identification techniques identify hot data by comparing the access count of a location (e.g., the number of times a particular LBA has been accessed in the past) against a hot threshold. If the access count is greater than the hot threshold, then the location is declared to be hot. When a location is first accessed, the access count for that location is zero, and several accesses are required to reach the hot threshold before that location is considered hot. Therefore, for the first few accesses, locations will be misidentified as cold. In contrast, neighbor based hot data identification will identify data as being hot more quickly because (at least in some embodiments) if at least one neighbor under consideration has an access count which has reached the hot threshold, then the given location will be declared hot. FIG. 2 is a diagram which shows exemplary simulation results comparing neighbor based hot data identification to another hot data identification technique. As is shown in region 200 of the graph, neighbor based hot data identification tends to have better correct identification percentages when the access count is below approximately 2×10^4.
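To illustrate the cold-start argument above, the following Python sketch compares a plain access-count threshold against a neighbor based rule on a toy access history. The threshold value, the neighbor range n, and the helper names are hypothetical and chosen only for illustration; the sketch does not reproduce the simulation behind FIG. 2.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical static hot threshold on access count


def threshold_is_hot(counts: Counter, lba: int) -> bool:
    """Baseline: an LBA is hot once its own access count exceeds the threshold."""
    return counts[lba] > HOT_THRESHOLD


def neighbor_is_hot(counts: Counter, lba: int, n: int = 3) -> bool:
    """Neighbor based: hot if any neighbor within +/- n is hot under the baseline."""
    neighbors = [lba + d for d in range(-n, n + 1) if d != 0 and lba + d >= 0]
    return any(threshold_is_hot(counts, nb) for nb in neighbors)


# Toy history: LBA 10 has been written many times; LBA 12 is written for the first time.
counts = Counter({10: 5})
print(threshold_is_hot(counts, 12))  # False: LBA 12 has no history yet
print(neighbor_is_hot(counts, 12))   # True: neighbor LBA 10 is already hot
```

A missing key in a `Counter` reads as zero, which models a location that has never been accessed.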
FIG. 3 is a diagram illustrating an embodiment of a system which performs neighbor based hot data identification. In various embodiments, neighbor based hot data identifier 300 may be part of a storage controller (e.g., which reads from and/or writes to solid state storage media) and/or may be implemented using an ASIC or an FPGA. In some embodiments, neighbor based hot data identifier 300 is used only during a host write and is not used during garbage collection. For example, the LBA which is passed to neighbor based hot data identifier 300 may be an LBA being written to (e.g., as part of a host write). Naturally, neighbor based hot data identification techniques may be used during garbage collection in other embodiments if desired.
Neighbor generator 302 receives the LBA and obtains a range of neighbors to consider from settings 304. In one example, settings 304 is implemented as a register and the number of neighbors to consider is programmable (e.g., so that more or fewer neighbors are considered). Based on the range of neighbors to consider, neighbor generator 302 generates one or more LBA(s) of the neighbor(s), which are passed to hot data identifier 306.
Hot data identifier 306 generates one or more hot data metric(s) of the neighbor(s). In this particular example, hot data identifier 306 uses a hot data identification technique which is independent of the neighbors of the LBA to be written (i.e., it does not take into consideration the hotness of a neighbor). Any appropriate technique may be employed by hot data identifier 306.
The hot metric(s) of the neighbor(s) are passed from hot data identifier 306 to combiner 312. As described above, a hot metric may either be a hot/cold value or some score over a range (e.g., a range of 1-5). In this particular example, if at least one neighbor being evaluated is hot data, then a hot metric with a hot value is output by combiner 312 (e.g., combiner 312 is an OR). If all of the neighbors have cold data, then a hot metric corresponding to a cold value is output by neighbor based hot data identifier 300. Any manner of combination may be used by combiner 312.
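The FIG. 3 data path can be sketched as a small Python class: a settings value for the neighbor range (standing in for settings 304), a neighbor generator, a neighbor-independent base identifier, and an OR combiner. Using a static access-count threshold as the base identifier is an assumption; the text allows any technique there. The `route_write` helper illustrates the cache placement described for FIG. 1. All class, method, and field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NeighborBasedHotDataIdentifier:
    """Hypothetical model of neighbor based hot data identifier 300."""
    n: int = 3              # range of neighbors to consider (programmable setting)
    hot_threshold: int = 3  # threshold used by the base identifier (assumed static here)
    access_counts: Counter = field(default_factory=Counter)

    def neighbors(self, lba: int) -> list[int]:
        # Neighbor generator: up to n LBAs on each side, skipping negative addresses.
        return [lba + d for d in range(-self.n, self.n + 1) if d != 0 and lba + d >= 0]

    def base_is_hot(self, lba: int) -> bool:
        # Neighbor-independent hot data identifier: compares the LBA's own access count.
        return self.access_counts[lba] > self.hot_threshold

    def is_hot(self, lba: int) -> bool:
        # Combiner implemented as an OR over the neighboring hot metrics.
        return any(self.base_is_hot(nb) for nb in self.neighbors(lba))

    def record_write(self, lba: int) -> None:
        # Update state in real time so later decisions see this write.
        self.access_counts[lba] += 1


def route_write(identifier: NeighborBasedHotDataIdentifier, lba: int) -> str:
    """Place hot data in the cache and cold data in the non-cache portion."""
    destination = "cache" if identifier.is_hot(lba) else "non-cache"
    identifier.record_write(lba)
    return destination
```

In this sketch only the neighbors are consulted, as in FIG. 4 and FIG. 5, so an address's own history does not enter its decision.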
In some embodiments (e.g., relevant to this figure and other figures described below), the information used to generate a hot data metric is updated in real time. For example, the hot data metric generated by neighbor based hot data identifier 300 may directly or indirectly affect a subsequent hot data metric, either for that LBA or for another LBA. In one example, a generated hot metric may affect which block(s) is/are garbage collected, which in turn affects program/erase counts for the affected block(s), which in turn affects a subsequent hot metric. Or, if the received LBA is associated with a host write, then the host write may cause the program/erase count for that LBA to increment, which in turn affects a subsequent hot metric.
FIG. 4 is a flowchart illustrating an embodiment of a process for determining one or more neighboring hot metrics. In some embodiments, the process is used at step 102 in FIG. 1. In the example shown, the process corresponds to the processing performed by neighbor based hot data identifier 300 in FIG. 3.
At 400, a range of neighbors to consider is obtained. For example, suppose n=3 (i.e., three neighbors on each side of the received address are considered). At 402, one or more neighboring addresses on a first side of the received address is/are determined based on the range. In one example, if the received address is ADDR and n=3, then step 402 generates the addresses: (ADDR−3), (ADDR−2), and (ADDR−1). At 404, one or more neighboring addresses on a second side of the received address is/are determined based on the range. To continue the example from above, step 404 generates the addresses: (ADDR+1), (ADDR+2), and (ADDR+3). In FIG. 3, steps 402 and 404 are performed by neighbor generator 302.
At 406, for the neighboring addresses on the first side, one or more neighboring hot metrics on the first side is/are determined. In one example, the following neighboring hot metrics are determined: (ADDR−3)=cold, (ADDR−2)=cold, and (ADDR−1)=cold. At 408, for the neighboring addresses on the second side, one or more neighboring hot metrics on the second side is/are determined. For example, (ADDR+1)=cold, (ADDR+2)=cold, and (ADDR+3)=hot.
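Steps 400 through 408 can be sketched in Python as follows. The sketch assumes addresses are non-negative integers and simply drops neighbors that would fall below address zero; the text does not say how boundaries are handled, and an upper bound (e.g., a maximum LBA) could be treated the same way. The function names and the `hot_metric_of` callback are hypothetical.

```python
from typing import Callable


def neighboring_addresses(addr: int, n: int) -> tuple[list[int], list[int]]:
    """Steps 402 and 404: up to n addresses on each side of addr."""
    first_side = [addr - k for k in range(n, 0, -1) if addr - k >= 0]  # ADDR-n .. ADDR-1
    second_side = [addr + k for k in range(1, n + 1)]                   # ADDR+1 .. ADDR+n
    return first_side, second_side


def neighboring_hot_metrics(addr: int, n: int,
                            hot_metric_of: Callable[[int], str]) -> tuple[list[str], list[str]]:
    """Steps 406 and 408: evaluate a neighbor-independent hot metric on each side."""
    first_side, second_side = neighboring_addresses(addr, n)
    return [hot_metric_of(a) for a in first_side], [hot_metric_of(a) for a in second_side]
```

With ADDR=100 and n=3, and a metric that reports only address 103 as hot, the two sides come back as three cold values and (cold, cold, hot), matching the example in the text.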
FIG. 5 is a flowchart illustrating an embodiment of a process for determining a hot metric using neighboring hot metrics. In some embodiments, the process is used at step 104 in FIG. 1. In the example shown, the process corresponds to the processing performed by neighbor based hot data identifier 300 in FIG. 3. At 500, it is determined if at least one neighboring hot metric corresponds to a hot value. If so, the hot metric for the received address is set to a hot value at 502. If not, at 504, the hot metric for the received address is set to a cold value.
If the example above were processed according to FIG. 5, the hot metric for the received address would be set to a hot value because the neighbor (ADDR+3) has a neighboring hot metric which corresponds to a hot value.
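A minimal sketch of steps 500 through 504, applied to the worked example above. The `combine_scores` variant for score-valued metrics (e.g., 1 to 5) is an assumption; the text only states that any manner of combination may be used.

```python
HOT, COLD = "hot", "cold"


def combine_hot_cold(neighboring_metrics: list[str]) -> str:
    """Steps 500-504: hot if at least one neighboring hot metric is hot, else cold."""
    return HOT if any(m == HOT for m in neighboring_metrics) else COLD


def combine_scores(neighboring_scores: list[int]) -> int:
    """Assumed score-valued analogue (1 = very cold, 5 = very hot): take the hottest neighbor.

    An empty list falls back to the coldest score, 1.
    """
    return max(neighboring_scores, default=1)


# Worked example: ADDR-3 .. ADDR-1 are cold; ADDR+1 and ADDR+2 are cold; ADDR+3 is hot.
first_side = [COLD, COLD, COLD]
second_side = [COLD, COLD, HOT]
print(combine_hot_cold(first_side + second_side))  # hot
```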
The following figures describe some examples of a hot data identification technique which uses a dynamic hot threshold. In some embodiments, the technique is used during a host write. In some embodiments, the technique is used during garbage collection.
FIG. 6 is a flowchart illustrating an embodiment of a dynamic hot threshold based hot data identification process. In some embodiments, the process is performed by a storage controller in a solid state storage system. In some such embodiments, the storage controller is implemented as or using a semiconductor device, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
At 600, an address is received. For example, the address may be an LBA within a block of a solid state storage system. In some embodiments, the received address is associated with a host write. In some embodiments, the received address is associated with garbage collection (e.g., it is the address of valid data within a block that also includes invalid data).
At 602, a dynamic hot threshold is determined. Unlike a static hot threshold, a dynamic hot threshold changes over time, for example as the state of the solid state storage system changes. In some embodiments, a dynamic hot threshold is determined based on one or more of the following: an average access count in a block of interest (Cavg), a measure of fullness of a cache associated with solid state storage (f), or some measure of wear associated with the solid state storage (e.g., a wear level difference, the wear level of the entire solid state storage, the wear level of a specific portion of the solid state storage, etc.).
In some embodiments, a single dynamic hot threshold is determined for the entire solid state storage drive (which may include multiple solid state dies). This may be desirable because, in some applications, the storage controller cannot control where logical block addresses are stored; for example, logical block addresses can move from die to die with each garbage collection.
At 604, a metric associated with the received address to compare against the dynamic hot threshold is determined. Any appropriate technique may be used to determine the metric at step 604. In some embodiments, the metric determined at step 604 is based on recency (e.g., tracked using a timestamp associated with a last write to that address) as well as frequency (e.g., tracked using access count). For example, an address which is written both frequently and recently is considered hot, but an address which is written frequently but in the distant past or an address that was written recently but infrequently is considered cold. Similarly, an address which was written neither frequently nor recently would also be considered cold.
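One way to build a single metric that reflects both frequency and recency, as described for step 604, is sketched below. The sketch assumes that each address has an access count and a last-write timestamp, and uses a hypothetical `recency_window`: the metric is the access count if the last write falls inside the window, and zero otherwise. This is only one illustrative choice among the "any appropriate technique" the text allows, not the method of this disclosure.

```python
from dataclasses import dataclass


@dataclass
class AddressStats:
    access_count: int  # frequency: number of writes to this address
    last_write: float  # recency: timestamp of the most recent write


def recency_frequency_metric(stats: AddressStats, now: float, recency_window: float) -> int:
    """Return the access count if the last write is recent enough, otherwise 0.

    Only addresses written both frequently and recently produce a large metric;
    stale addresses collapse to 0 regardless of how often they were written.
    """
    if now - stats.last_write <= recency_window:
        return stats.access_count
    return 0


def is_hot(metric: float, threshold: float) -> bool:
    """Compare the metric against a (possibly dynamic) hot threshold."""
    return metric > threshold
```

With a threshold of 5, an address written 20 times within the window is hot, while the same count outside the window, or a recent but rarely written address, is cold.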
At 606, a hot metric for the received address is determined based at least in part on the metric associated with the received address and the dynamic hot threshold. Using a dynamic hot threshold at step 606 enables a system to generate hot metrics which adapt to different usage patterns. For example, different users can have different usage patterns, and even the same user can have different usage patterns over time. In addition to or as an alternative to usage patterns, a dynamic hot threshold enables a system to adapt to the condition of the solid state storage itself, which varies over time. For these reasons, a hot metric which is determined using a dynamic hot threshold is desirable.
FIG. 7 is a diagram illustrating an embodiment of a system which performs dynamic hot threshold based hot data identification. In the example shown, an LBA is input to dynamic hot threshold based hot data identifier 700. For example, the LBA may be an address associated with a host write, or an address which is associated with garbage collection (e.g., it is an address within a block being garbage collected).
In this example, access count based hot data identifier 706 (e.g., internally) generates an access count for the received LBA. Access count based hot data identifier 706 may use any appropriate technique to track and/or determine access counts for a given LBA. The access count is compared by access count based hot data identifier 706 against a dynamic hot threshold which is generated by dynamic hot threshold generator 704 (described in further detail below). In this example, if the access count is less than the dynamic hot threshold, then a hot metric having a cold value is output for the received LBA. If the access count is greater than the dynamic hot threshold, then a hot metric having a hot value is output for the received LBA. As described above, the hot metric output by dynamic hot threshold based hot data identifier 700 may be either a hot/cold value or a value which spans a range of values (e.g., a range of 1-5).
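The comparison performed by access count based hot data identifier 706 can be sketched as below, with the threshold supplied by a callable that stands in for dynamic hot threshold generator 704. The text does not specify what happens when the access count equals the threshold; this sketch assumes such an address is cold. Class and method names are hypothetical.

```python
from collections import Counter
from typing import Callable


class DynamicThresholdHotDataIdentifier:
    """Hypothetical model of dynamic hot threshold based hot data identifier 700."""

    def __init__(self, threshold_fn: Callable[[], float]):
        self.threshold_fn = threshold_fn  # stands in for dynamic hot threshold generator 704
        self.access_counts: Counter = Counter()

    def record_access(self, lba: int) -> None:
        self.access_counts[lba] += 1

    def hot_metric(self, lba: int) -> str:
        threshold = self.threshold_fn()  # re-read each time: the threshold may have changed
        # Strictly greater is hot; equal to the threshold is treated as cold (assumption).
        return "hot" if self.access_counts[lba] > threshold else "cold"
```

Because the threshold is re-read on every call, the same access count can be classified differently as the system state changes.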
Dynamic hot threshold generator 704 uses the incoming traffic (in the form of LBAs in FIG. 7) to adjust the dynamic hot threshold. This is because two identical solid state storage systems may be used in very different manners, and different usage may call for different hot thresholds. Dynamic hot threshold generator 704 also uses any number of settings 702. For example, the settings may specify values of constants used to generate the dynamic hot threshold and/or what system or state information to use in generating the dynamic hot threshold. In some embodiments, the settings used by dynamic hot threshold generator 704 begin with default or initial values and are adjusted over time as the actual usage pattern of the solid state storage system becomes more apparent.
In some embodiments, dynamic hot threshold generator 704 generates a dynamic hot threshold using Cavg, which is an average access count of LBAs in a block of interest (e.g., a block being garbage collected or the block being written to during a host write). For example, the dynamic hot threshold (T) may be T(Cavg)=1.2Cavg. This enables the dynamic hot threshold to adapt to different usage patterns: as the average access count increases, the dynamic hot threshold increases. It does not make sense, for example, to have the same hot threshold for solid state storage systems which experience (e.g., radically) different usage patterns.
In some embodiments, dynamic hot threshold generator 704 generates a dynamic hot threshold using f, which is a measure of fullness of a cache associated with solid state storage, that is, T(f). In general, as the cache becomes full (i.e., as f increases), the dynamic hot threshold increases. This has the effect of causing more data to be classified as cold data and stored in the non-cache portion of the solid state storage as opposed to the cache.
In some embodiments, dynamic hot threshold generator 704 generates a dynamic hot threshold using wdiff, which is a difference between the wear level of a cache (wcache) and the wear level of a non-cache portion of the solid state storage (w), that is, T(wdiff). As the cache sustains more wear compared to the rest of the solid state storage (i.e., as wdiff increases), the dynamic hot threshold increases. In such cases it is desirable to store more data in the non-cache portion of the solid state storage since the cache is getting worn out faster than the rest of the solid state storage.
To illustrate exemplary values, Cavg may be some non-negative real number, f may be a number between zero and one (e.g., where zero means that the cache is empty and one means that the cache is full), and wdiff, wcache, and w may be numbers between zero and one (e.g., where one indicates 100% of a maximum number of program/erase cycles and zero indicates 0% of a maximum number of program/erase cycles).
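For the example threshold T(Cavg)=1.2Cavg, Cavg is the mean access count over the LBAs of the block of interest. A minimal sketch, assuming the block's per-LBA access counts are available as a list (how they are tracked is not specified here); the function names are hypothetical.

```python
from statistics import fmean


def average_access_count(block_access_counts: list[int]) -> float:
    """Cavg: mean access count of the LBAs in the block of interest."""
    return fmean(block_access_counts)


def threshold_from_cavg(c_avg: float, scale: float = 1.2) -> float:
    """Example dynamic hot threshold T(Cavg) = 1.2 * Cavg."""
    return scale * c_avg
```

For a block whose LBAs have access counts 1, 2, 3, and 6, Cavg is 3, so the threshold is about 3.6: an LBA in that block with an access count of 4 would be hot and one with 3 would be cold.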
In addition to the variables described above, any number of constants and/or configurations which affect dynamic hot threshold generation may be obtained from settings 702. For example, the dynamic hot threshold (T) may be calculated using:
$$T(C_{avg}, f, w_{diff}) = k_1 C_{avg} \exp\left(k_2 (f - f_k) + k_3 (w_{diff} - w_k)\right),$$
where k1, k2, k3, fk, and wk are constants which are set as desired and are obtained from settings 702.
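The threshold formula above can be evaluated directly. The constant values below are placeholders chosen only so that the example runs; the text leaves k1, k2, k3, fk, and wk to be set as desired. With positive k2 and k3, the threshold grows as the cache fills (larger f) and as the cache wears faster than the rest of the storage (larger wdiff), and it grows in proportion to Cavg.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdSettings:
    """Hypothetical contents of settings 702; the values are placeholders."""
    k1: float = 1.2
    k2: float = 2.0
    k3: float = 4.0
    f_k: float = 0.5  # reference cache fullness
    w_k: float = 0.0  # reference wear-level difference


def dynamic_hot_threshold(c_avg: float, f: float, w_diff: float,
                          s: ThresholdSettings = ThresholdSettings()) -> float:
    """T(Cavg, f, wdiff) = k1 * Cavg * exp(k2 * (f - fk) + k3 * (wdiff - wk))."""
    return s.k1 * c_avg * math.exp(s.k2 * (f - s.f_k) + s.k3 * (w_diff - s.w_k))
```

At the reference point (f = fk and wdiff = wk) the exponential term is 1, so the formula reduces to T = k1·Cavg, which matches the simpler T(Cavg)=1.2Cavg example when k1 = 1.2.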
Access count based hot data identifier 706 is merely exemplary and is not intended to be limiting. Any appropriate hot data identification technique which uses a threshold may be used in combination with the dynamic hot threshold technique described herein.
In some embodiments, a dynamic hot threshold is used in combination with neighbor based hot data identification. The following figures show some examples of this.
FIG. 8 is a flowchart illustrating an embodiment of a process for identifying hot data based on neighbors and using a dynamic hot threshold. At 800, an address is received. As described above, in various embodiments a received address may be associated with a host write or with garbage collection.
At 802, one or more neighboring hot metrics for one or more neighbors associated with the received address is/are determined, including by: (1) determining a dynamic hot threshold and (2) for each of the one or more neighbors: (a) determining a metric associated with a given neighbor to compare against the dynamic hot threshold and (b) determining, based at least in part on the metric associated with the given neighbor and the dynamic hot threshold, a hot metric for the given neighbor. In other words, one or more neighboring hot metrics are determined, where each neighboring hot metric is determined using a dynamic hot threshold that changes over time (e.g., as the state and/or usage of a solid state storage system changes).
At 804, a hot metric for the received address is determined based at least in part on the neighboring hot metrics. For example, if at least one neighboring hot metric corresponds to a hot value, then the hot metric for the received address is set to a hot value.
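The FIG. 8 flow combines the two techniques: neighbors are generated as in FIG. 4, each neighbor's own access count is compared against one dynamic hot threshold, and the results are combined with an OR. A self-contained sketch follows; using the access count as the per-neighbor metric, the `threshold_fn` callable, and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Callable


def neighbor_dynamic_hot_metric(lba: int, n: int, access_counts: Counter,
                                threshold_fn: Callable[[], float]) -> str:
    """Steps 800-804: neighboring hot metrics use a dynamic threshold, then OR them."""
    threshold = threshold_fn()  # (1) determine the dynamic hot threshold once per request
    neighbors = [lba + d for d in range(-n, n + 1) if d != 0 and lba + d >= 0]
    # (2a) metric per neighbor = its access count; (2b) compare it with the threshold.
    neighboring_metrics = ["hot" if access_counts[nb] > threshold else "cold" for nb in neighbors]
    # Step 804: hot if at least one neighboring hot metric is hot.
    return "hot" if "hot" in neighboring_metrics else "cold"
```

Lowering the threshold can turn the same neighborhood from cold to hot, which is how the dynamic threshold feeds into the neighbor based decision.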
FIG. 9 is a diagram illustrating an embodiment of a system which performs neighbor based and dynamic hot threshold based hot data identification. In the example shown, an LBA is received by neighbor based and dynamic hot threshold based hot data identifier 900. The received address is passed to dynamic hot threshold generator 904 and neighbor generator 910 in neighbor based and dynamic hot threshold based hot data identifier 900.
Using any number of parameters or settings from settings 902 and the received LBA, dynamic hot threshold generator 904 generates a dynamic hot threshold which is passed to access count based hot data identifier 906. Neighbor generator 910 generates one or more LBAs for one or more neighbors of the specified LBA using the received LBA and any number of settings or parameters from settings 902. The LBA(s) of the neighbors are passed from neighbor generator 910 to access count based hot data identifier 906.
Access count based hot data identifier 906 generates a metric (for each neighbor) to compare against the dynamic hot threshold and sends one or more neighboring hot metrics of the neighbors to comparator 912 based on the comparison. In some embodiments, if at least one neighboring hot metric corresponds to a hot value, then the hot metric of the LBA which is output is set to a hot value (e.g., comparator 912 is an OR).
The following table illustrates the performance improvement, as measured by write amplification, of the various hot data identification techniques described herein. Write amplification is defined as the ratio of the total number of (e.g., actual) writes to solid state storage to the number of host writes (e.g., which triggered the actual writes). For example, if a host writes one LBA and in doing so causes a garbage collection process to conduct one additional write, the write amplification is two. Write amplification is a useful figure of merit because reducing it is an important aspect of solid state storage management. A large write amplification value increases the amount of time to perform a host write, which is undesirable. Also, since solid state storage can only sustain a limited number of program/erase cycles, reducing write amplification can prolong the life of the solid state storage system.
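The definition of write amplification above reduces to a simple ratio. A minimal sketch using the example from the text, in which one host write triggers one additional garbage collection write; the function name and the split into host and extra writes are illustrative.

```python
def write_amplification(host_writes: int, extra_writes: int) -> float:
    """Total writes to solid state storage divided by the number of host writes."""
    if host_writes <= 0:
        raise ValueError("host_writes must be positive")
    return (host_writes + extra_writes) / host_writes


print(write_amplification(host_writes=1, extra_writes=1))  # 2.0
```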
In the table below, the neighbor based hot data identification techniques (i.e., the last two rows) utilize a value of n=3 (i.e., three neighbors on each side of the LBA of interest are considered). For the dynamic hot threshold techniques (i.e., the third row and last row), the threshold function used is T(Cavg)=1.2Cavg.
TABLE 1
Comparison of Write Amplification Values for Various Hot Data Identification Techniques

| Hot Data Identification Technique | Write Amplification |
|---|---|
| None | 6.1 |
| Some Other Hot Data Identification Technique | 5.01 |
| Dynamic Hot Threshold Based | 4.74 |
| Neighbor Based | 4.65 |
| Both Dynamic Hot Threshold Based and Neighbor Based | 4.55 |
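The relative improvement of each technique over no hot data identification can be computed from TABLE 1 as (6.1 − WA) / 6.1. The sketch below only reworks the table's published numbers; it adds no new measurements.

```python
BASELINE_WA = 6.1  # write amplification of the "None" row in TABLE 1

# Write amplification values copied from TABLE 1.
TABLE_1 = {
    "Some Other Hot Data Identification Technique": 5.01,
    "Dynamic Hot Threshold Based": 4.74,
    "Neighbor Based": 4.65,
    "Both Dynamic Hot Threshold Based and Neighbor Based": 4.55,
}


def relative_reduction(wa: float, baseline: float = BASELINE_WA) -> float:
    """Fractional reduction in write amplification relative to the baseline."""
    return (baseline - wa) / baseline


for technique, wa in TABLE_1.items():
    print(f"{technique}: {relative_reduction(wa):.1%} lower than no hot data identification")
```

Run on the table values, this gives reductions of about 17.9%, 22.3%, 23.8%, and 25.4% for the four rows, respectively.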
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

Claims (23)

What is claimed is:
1. A system, comprising:
a memory with a cache portion and a non-cache portion;
a neighbor generator configured to:
receive an address; and
determine one or more neighbors associated with the received address;
a hot data identifier configured to determine one or more neighboring hot metrics for the one or more neighbors associated with the received address; and
a comparator configured to determine, based at least in part on the neighboring hot metrics, a hot metric for the received address,
wherein the received address is written to the cache portion of the memory when the determined hot metric for the received address corresponds to a hot data value.
2. The system of claim 1, wherein the received address is associated with a host write.
3. The system of claim 1, wherein the received address is associated with garbage collection.
4. The system of claim 1, wherein the hot data identifier is configured to determine the neighboring hot metrics, including by:
obtaining a range of neighbors to consider;
determining one or more neighboring addresses on a first side of the received address based on the range;
determining one or more neighboring addresses on a second side of the received address based on the range;
determining, for the neighboring addresses on the first side, one or more neighboring hot metrics on the first side; and
determining, for the neighboring addresses on the second side, one or more neighboring hot metrics on the second side.
5. The system of claim 1, wherein the comparator is configured to determine the hot metric for the received address, including by:
determining if at least one neighboring hot metric corresponds to a hot value;
in the event it is determined that at least one neighboring hot metric corresponds to the hot value, setting the hot metric for the received address to the hot data value; and
in the event it is determined that at least one neighboring hot metric does not correspond to the hot value, setting the hot metric for the received address to a cold data value.
6. The system of claim 1, wherein the hot data identifier is configured to determine the neighboring hot metrics, including by:
determining a dynamic hot threshold; and
for each of the one or more neighbors:
determining a metric associated with a given neighbor to compare against the dynamic hot threshold; and
determining, based at least in part on the metric associated with the given neighbor and the dynamic hot threshold, a hot metric for the given neighbor.
7. The system of claim 6, wherein the hot data identifier is configured to determine the dynamic hot threshold, including by determining the dynamic hot threshold using an average access count of logical block addresses (LBAs) in a block of interest.
8. The system of claim 6, wherein the hot data identifier is configured to determine the dynamic hot threshold, including by determining the dynamic hot threshold using a measure of fullness of the cache portion of the memory.
9. The system of claim 6, wherein the hot data identifier is configured to determine the dynamic hot threshold, including by determining the dynamic hot threshold using a difference between a wear level of the cache portion and a wear level of the non-cache portion of the memory.
10. A system, comprising:
a memory including a cache portion and a non-cache portion;
a dynamic hot threshold generator configured to determine a dynamic hot threshold; and
a hot data identifier configured to:
receive an address;
determine a metric associated with the received address to compare against the dynamic hot threshold; and
determine, based at least in part on the metric associated with the received address and the dynamic hot threshold, a hot metric for the received address,
wherein the received address is written to the cache portion of the memory when the determined hot metric for the received address corresponds to a hot data value.
11. The system of claim 10, wherein the received address is associated with a host write.
12. The system of claim 10, wherein the received address is associated with garbage collection.
13. The system of claim 10, wherein the hot data identifier is configured to determine the dynamic hot threshold including by determining the dynamic hot threshold using an average access count of logical block addresses (LBAs) in a block of interest.
14. The system of claim 10, wherein the hot data identifier is configured to determine the dynamic hot threshold including by determining the dynamic hot threshold using a measure of fullness of the cache portion of the memory.
15. The system of claim 10, wherein the hot data identifier is configured to determine the dynamic hot threshold including by determining the dynamic hot threshold using a difference between a wear level of the cache portion and the wear level of the non-cache portion of the memory.
16. A method, comprising:
receiving an address;
determining one or more neighbors associated with the received address;
using a processor to determine one or more neighboring hot metrics for the one or more neighbors associated with the received address;
determining, based at least in part on the neighboring hot metrics, a hot metric for the received address; and
writing the received address to a cache portion of a memory when the determined hot metric for the received address corresponds to a hot data value.
17. The method of claim 16, wherein using the processor to determine the neighboring hot metrics includes:
obtaining a range of neighbors to consider;
determining one or more neighboring addresses on a first side of the received address based on the range;
determining one or more neighboring addresses on a second side of the received address based on the range;
determining, for the neighboring addresses on the first side, one or more neighboring hot metrics on the first side; and
determining, for the neighboring addresses on the second side, one or more neighboring hot metrics on the second side.
18. The method of claim 16, wherein determining the hot metric for the received address includes:
determining if at least one neighboring hot metric corresponds to a hot value;
in the event it is determined that at least one neighboring hot metric corresponds to the hot value, setting the hot metric for the received address to the hot data value; and
in the event it is determined that at least one neighboring hot metric does not correspond to the hot value, setting the hot metric for the received address to a cold data value.
19. The method of claim 16, wherein using the processor to determine the neighboring hot metrics includes:
determining a dynamic hot threshold; and
for each of the one or more neighbors:
determining a metric associated with a given neighbor to compare against the dynamic hot threshold; and
determining, based at least in part on the metric associated with the given neighbor and the dynamic hot threshold, a hot metric for the given neighbor.
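Claims 16 to 19 can be read as the procedure below. This minimal sketch makes three assumptions:

- The "first side" and "second side" are the lower and higher LBAs around the received address.
- Each neighbor's metric is its access count, compared against a dynamic hot threshold (claim 19).
- The received address is cold only when none of its neighbors is hot. This is the reading under which the two branches of claim 18 do not overlap.

The function names, the default range of 2, and the 1024-LBA address space are hypothetical.

```python
from collections import Counter
from enum import Enum


class Temperature(Enum):
    HOT = "hot"
    COLD = "cold"


def neighbor_addresses(lba: int, neighbor_range: int, max_lba: int) -> tuple[list[int], list[int]]:
    """Claim 17: neighbors within `neighbor_range` LBAs on each side of `lba`.

    Assumption: the first side is the lower LBAs and the second side the
    higher ones; addresses outside [0, max_lba] are skipped.
    """
    first_side = [n for n in range(lba - neighbor_range, lba) if n >= 0]
    second_side = [n for n in range(lba + 1, lba + neighbor_range + 1) if n <= max_lba]
    return first_side, second_side


def neighbor_hot_metric(neighbor: int, access_counts: Counter, dynamic_hot_threshold: float) -> Temperature:
    # Claim 19: each neighbor is judged against the dynamic hot threshold
    # (its metric is assumed here to be its access count).
    if access_counts[neighbor] > dynamic_hot_threshold:
        return Temperature.HOT
    return Temperature.COLD


def neighbor_based_hot_metric(
    lba: int,
    access_counts: Counter,
    dynamic_hot_threshold: float,
    neighbor_range: int = 2,
    max_lba: int = 1023,
) -> Temperature:
    """Claims 16 and 18: hot if at least one neighbor is hot, otherwise cold.

    The received address's own count is deliberately not consulted.
    """
    first_side, second_side = neighbor_addresses(lba, neighbor_range, max_lba)
    for neighbor in first_side + second_side:
        if neighbor_hot_metric(neighbor, access_counts, dynamic_hot_threshold) is Temperature.HOT:
            return Temperature.HOT
    return Temperature.COLD
```

With counts {11: 7, 500: 9} and a threshold of 3.0, LBA 10 comes out hot because its neighbor 11 is hot. LBA 500 comes out cold even though its own count is high, because only its neighbors are consulted. Claim 16 only requires the decision to rest "at least in part" on the neighbors, so a variant could also fold in the address's own count.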
20. A method, comprising:
using a processor to determine a dynamic hot threshold;
receiving an address;
determining a metric associated with the received address to compare against the dynamic hot threshold;
determining, based at least in part on the metric associated with the received address and the dynamic hot threshold, a hot metric for the received address; and
writing the received address to a cache portion of a memory when the determined hot metric for the received address corresponds to a hot data value.
21. The method of claim 20, wherein using the processor to determine the dynamic hot threshold includes determining the dynamic hot threshold using an average access count of logical block addresses (LBAs) in a block of interest.
22. The method of claim 20, wherein using the processor to determine the dynamic hot threshold includes determining the dynamic hot threshold using a measure of fullness of the cache portion of the memory.
23. The method of claim 20, wherein using the processor to determine the dynamic hot threshold includes determining the dynamic hot threshold using a difference between a wear level of the cache portion and a wear level of a non-cache portion of the memory.
US14/169,877, priority date 2013-04-04, filed 2014-01-31: Neighbor based and dynamic hot threshold based hot data identification. Status: Active, expires 2034-07-18 (US9342389B2).

Priority Applications (3)

Application Number Priority Date Filing Date Title
US14/169,877 US9342389B2 (en) 2013-04-04 2014-01-31 Neighbor based and dynamic hot threshold based hot data identification
CN201480019802.5A CN105556485B (en) 2013-04-04 2014-02-03 Hot data identification based on neighbors and on a dynamic hot threshold
PCT/US2014/014506 WO2014163743A1 (en) 2013-04-04 2014-02-03 Neighbor based and dynamic hot threshold based hot data identification

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361808529P 2013-04-04 2013-04-04
US14/169,877 US9342389B2 (en) 2013-04-04 2014-01-31 Neighbor based and dynamic hot threshold based hot data identification

Publications (2)

Publication Number Publication Date
US20140304480A1 (en) 2014-10-09
US9342389B2 (en) 2016-05-17

Family ID: 51655335

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/169,877 Active 2034-07-18 US9342389B2 (en) 2013-04-04 2014-01-31 Neighbor based and dynamic hot threshold based hot data identification

Country Status (3)

Country Link
US (1) US9342389B2 (en)
CN (1) CN105556485B (en)
WO (1) WO2014163743A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9612964B2 (en) * 2014-07-08 2017-04-04 International Business Machines Corporation Multi-tier file storage management using file access and cache profile information
US20170139826A1 (en) * 2015-11-17 2017-05-18 Kabushiki Kaisha Toshiba Memory system, memory control device, and memory control method
US10733107B2 (en) 2016-10-07 2020-08-04 Via Technologies, Inc. Non-volatile memory apparatus and address classification method thereof
CN106897026B (en) * 2016-10-07 2020-02-07 VIA Technologies, Inc. (威盛电子股份有限公司) Nonvolatile memory device and address classification method thereof
CN106874213B (en) * 2017-01-12 2020-03-20 Hangzhou Dianzi University (杭州电子科技大学) Solid state disk hot data identification method fusing multiple machine learning algorithms
US11055002B2 (en) * 2018-06-11 2021-07-06 Western Digital Technologies, Inc. Placement of host data based on data characteristics
US10884627B2 (en) 2018-09-26 2021-01-05 International Business Machines Corporation Compacting data in a dispersed storage network
US11080205B2 (en) * 2019-08-29 2021-08-03 Micron Technology, Inc. Classifying access frequency of a memory sub-system component
CN111881346B (en) * 2020-07-15 2022-06-17 Beijing Inspur Data Technology Co., Ltd. (北京浪潮数据技术有限公司) Hot data identification method, system and related device
US11561907B2 (en) * 2020-08-18 2023-01-24 Micron Technology, Inc. Access to data stored in quarantined memory media
US11442654B2 (en) * 2020-10-15 2022-09-13 Microsoft Technology Licensing, Llc Managing and ranking memory resources
CN112948398B (en) * 2021-04-29 2023-02-24 University of Electronic Science and Technology of China (电子科技大学) Hierarchical storage system and method for cold and hot data
KR20230060569A (en) * 2021-10-27 2023-05-08 Samsung Electronics Co., Ltd. (삼성전자주식회사) Controller, storage device and operation method of the storage device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9153337B2 (en) * 2006-12-11 2015-10-06 Marvell World Trade Ltd. Fatigue management system and method for hybrid nonvolatile solid state memory system

Patent Citations (12)

Publication number Priority date Publication date Assignee Title
US6321240B1 (en) 1999-03-15 2001-11-20 Trishul M. Chilimbi Data structure partitioning with garbage collection to optimize cache utilization
US20060059474A1 (en) * 2004-09-10 2006-03-16 Microsoft Corporation Increasing data locality of recently accessed resources
US20080282045A1 (en) 2007-05-09 2008-11-13 Sudeep Biswas Garbage collection in storage devices based on flash memories
US20100169586A1 (en) * 2008-12-31 2010-07-01 Li-Pin Chang Memory storage device and a control method thereof
US20110113183A1 (en) 2009-11-09 2011-05-12 Industrial Technology Research Institute Method for Managing a Non-Violate Memory and Computer Readable Medium Thereof
US20110225346A1 (en) 2010-03-10 2011-09-15 Seagate Technology Llc Garbage collection in a storage device
US20110264843A1 (en) 2010-04-22 2011-10-27 Seagate Technology Llc Data segregation in a storage device
US20120297122A1 (en) 2011-05-17 2012-11-22 Sergey Anatolievich Gorobets Non-Volatile Memory and Method Having Block Management with Hot/Cold Data Sorting
WO2012158521A1 (en) 2011-05-17 2012-11-22 Sandisk Technologies Inc. Non-volatile memory and method having block management with hot/cold data sorting
US20130024609A1 (en) * 2011-05-17 2013-01-24 Sergey Anatolievich Gorobets Tracking and Handling of Super-Hot Data in Non-Volatile Memory Systems
US20140013052A1 (en) * 2012-07-06 2014-01-09 Seagate Technology Llc Criteria for selection of data for a secondary cache
US20140013027A1 (en) * 2012-07-06 2014-01-09 Seagate Technology Llc Layered architecture for hybrid controller

Non-Patent Citations (4)

Title
Hsieh et al., "Efficient identification of hot data for flash memory storage systems", ACM Transactions on Storage, pp. 22-40, 2006.
Park et al., "Hot Data Identification for Flash Memory Using Multiple Bloom Filters", 2010.
The International Preliminary Report on Patentability issued by the World Intellectual Property Organization for PCT Appl. No. PCT/US14/14506 on Oct. 15, 2015.
Tjioe et al., "Making Garbage Collection Wear Conscious for Flash SSD", 2012.

Cited By (15)

Publication number Priority date Publication date Assignee Title
US10446197B2 (en) 2017-08-31 2019-10-15 Micron Technology, Inc. Optimized scan interval
US11056156B2 (en) 2017-08-31 2021-07-06 Micron Technology, Inc. Optimized scan interval
US10573357B2 (en) 2017-08-31 2020-02-25 Micron Technology, Inc. Optimized scan interval
US10754580B2 (en) 2017-10-23 2020-08-25 Micron Technology, Inc. Virtual partition management in a memory device
US11340836B2 (en) 2017-10-23 2022-05-24 Micron Technology, Inc. Virtual partition management in a memory device
US11789661B2 (en) 2017-10-23 2023-10-17 Micron Technology, Inc. Virtual partition management
US11455245B2 (en) 2017-12-11 2022-09-27 Micron Technology, Inc. Scheme to improve efficiency of garbage collection in cached flash translation layer
US11720489B2 (en) 2017-12-11 2023-08-08 Micron Technology, Inc. Scheme to improve efficiency of device garbage collection in memory devices
US10713159B2 (en) * 2017-12-22 2020-07-14 SK Hynix Inc. Semiconductor device for managing wear leveling operation of a nonvolatile memory device
US20190196956A1 (en) * 2017-12-22 2019-06-27 SK Hynix Inc. Semiconductor device for managing wear leveling operation of a nonvolatile memory device
KR20190076132A (en) 2017-12-22 2019-07-02 SK Hynix Inc. (에스케이하이닉스 주식회사) Semiconductor device for managing wear levelling operation of nonvolatile memory device
US10365854B1 (en) 2018-03-19 2019-07-30 Micron Technology, Inc. Tracking data temperatures of logical block addresses
US11068197B2 (en) 2018-03-19 2021-07-20 Micron Technology, Inc. Tracking data temperatures of logical block addresses
US20230068529A1 (en) * 2021-09-01 2023-03-02 Micron Technology, Inc. Cold data identification
US11829636B2 (en) * 2021-09-01 2023-11-28 Micron Technology, Inc. Cold data identification

Also Published As

Publication number Publication date
WO2014163743A1 (en) 2014-10-09
CN105556485A (en) 2016-05-04
CN105556485B (en) 2018-12-07
US20140304480A1 (en) 2014-10-09

Legal Events

Date Code Title Description
AS Assignment

Owner name: SK HYNIX MEMORY SOLUTIONS INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TANG, XIANGYU;LEE, FREDERICK K.H.;BELLORADO, JASON;AND OTHERS;REEL/FRAME:032711/0763

Effective date: 20140414

AS Assignment

Owner name: SK HYNIX INC., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SK HYNIX MEMORY SOLUTIONS INC.;REEL/FRAME:033061/0324

Effective date: 20140519

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8