US20020087802A1 - System and method for maintaining prefetch stride continuity through the use of prefetch bits - Google Patents

System and method for maintaining prefetch stride continuity through the use of prefetch bits

Info

Publication number
US20020087802A1
US20020087802A1 (application US09/820,967)
Authority
US
United States
Prior art keywords
cache
data
prefetch
line
prefetched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US09/820,967
Inventor
Khalid Al-Dajani
Mohammad Abdallah
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Priority to US09/820,967 priority Critical patent/US20020087802A1/en
Assigned to INTEL CORPORATION reassignment INTEL CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ABDALLAH, MOHAMMAD A., AL-DAJANI, KHALID D.
Publication of US20020087802A1 publication Critical patent/US20020087802A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824Operand accessing
    • G06F9/383Operand prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • G06F9/345Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
    • G06F9/3455Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results using stride
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824Operand accessing
    • G06F9/383Operand prefetching
    • G06F9/3832Value prediction for operands; operand history buffers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60Details of cache memory
    • G06F2212/6026Prefetching based on access pattern detection, e.g. stride based prefetch

Abstract

A processor includes a cache that has a plurality of lines to store data. The processor also includes prefetch bits, each of which is associated with one of the cache lines. The processor further includes a prefetch manager that calculates prefetch data as if a cache miss occurred whenever a cache request results in a cache hit to a cache line that is associated with a prefetch bit that is set. In a further embodiment, the prefetch manager prefetches data into the cache based on the distance between cache misses for an instruction.

Description

    FIELD OF THE INVENTION
  • Embodiments of the present invention relate to prefetching data from a memory. In particular, the present invention relates to methods and apparatus for prefetching data from a memory for use by a processor. [0001]
    BACKGROUND
  • Instructions executed by a processor often use data that may be stored in a system memory device such as a Random Access Memory (RAM). For example, a processor may execute a LOAD instruction to load a register with data that is stored at a particular memory address. In many systems, because the access time for the system memory is relatively slow, frequently used data elements are copied from the system memory into a faster memory device called a cache and, if possible, the processor uses the copy of the data element in the cache when it needs to access (i.e., read from or write to) that data element. If the memory location that is accessed by an instruction has not been copied into a cache, then the access to the memory location by the instruction is said to cause a “cache miss” because the data needed could not be obtained from the cache. Computer systems operate more efficiently if the number of cache misses is minimized. [0002]
  • One way to decrease the time spent waiting to access a RAM is to “prefetch” data from the system memory before it is needed and, thus, before the cache miss occurs. Many processors have an instruction cycle in which instructions to be executed are obtained from memory in one step (i.e., an instruction fetch) and executed in another step. If the instruction to be executed accesses a memory location (e.g., a memory LOAD), then the data at that location must be fetched into the appropriate section of the processor from a cache or, if a cache miss occurs, from a system memory. A cache prefetcher attempts to anticipate which data addresses will be accessed by instructions in the future and to prefetch this data from the memory before the data is needed. A cache prefetcher typically determines and maintains a data access pattern for an instruction and prefetches data into the cache based on this data access pattern. As used herein, “instruction” refers to a particular instance of an instruction in the program, with each instruction being identified by a different instruction pointer (“IP”) value. [0003]
  • The performance of a cache prefetching scheme degrades if the data access pattern is not properly managed. A prefetcher maintains access pattern “continuity” if the prefetcher maintains a discovered access pattern as long as the pattern is active and relinquishes an access pattern that is no longer active. A prefetcher operates less efficiently if the continuity of the access patterns is not maintained. [0004]
    DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a partial block diagram of a computer system having a processor that maintains prefetch stride continuity through the use of prefetch bits according to an embodiment of the present invention. [0005]
  • FIG. 2 is a partial block diagram of a cache having prefetch bits according to an embodiment of the present invention. [0006]
  • FIG. 3 is a flow diagram of a method of maintaining prefetch stride continuity through the use of prefetch bits according to an embodiment of the present invention. [0007]
  • FIG. 4 is a partial block diagram of a computer system having prefetch bits according to another embodiment of the present invention. [0008]
    DETAILED DESCRIPTION
  • Embodiments of the present invention relate to a prefetcher which prefetches data for an instruction based on an access pattern that has been determined and maintained for the instruction. In one embodiment, the access pattern used is the distance between cache misses caused by the instruction. This distance is the stride for the cache misses and may be referred to as the “miss distance” for the instruction. The miss distance may be stored in a prefetch table. [0009]
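
To make the miss-distance mechanism concrete, the following is a minimal C sketch of a per-instruction prefetch table entry and its update on a miss. The patent does not specify field names, widths, or an indexing scheme, so the struct layout and the on_miss helper below are illustrative assumptions, not the patent's design.

```c
#include <stdint.h>

/* Hypothetical prefetch table entry, one per load instruction (IP). */
typedef struct {
    uint64_t ip;             /* instruction pointer of the load            */
    uint64_t last_miss_addr; /* address of the most recent (virtual) miss  */
    int64_t  miss_distance;  /* stride between consecutive misses          */
    int      valid;
} prefetch_entry_t;

/* Record an actual or virtual miss for this IP and return the address to
 * prefetch next: the miss address plus the learned miss distance.
 * Returns 0 when no stride has been learned yet. */
static uint64_t on_miss(prefetch_entry_t *e, uint64_t ip, uint64_t addr)
{
    if (!e->valid || e->ip != ip) {    /* first miss observed for this IP */
        e->ip = ip;
        e->last_miss_addr = addr;
        e->miss_distance = 0;
        e->valid = 1;
        return 0;
    }
    e->miss_distance = (int64_t)(addr - e->last_miss_addr);
    e->last_miss_addr = addr;
    return addr + (uint64_t)e->miss_distance;
}
```

A real prefetcher would typically confirm that the same distance repeats before issuing prefetches; that confirmation step is omitted here for brevity.
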
  • A prefetcher may incorrectly determine that an access pattern has been dropped when in fact the access pattern is still active. This situation may occur, for example, when the prefetched data is stored in a storage medium, such as a cache, that is not controlled by the prefetcher. If the prefetcher is using a pattern of cache misses as the access pattern, data prefetched into the cache could falsely interrupt the detected access pattern and result in a loss of stride continuity because the prefetched data may cause a cache hit even though this data would have caused a cache miss if it had not been prefetched. That is, the act of prefetching data into the cache causes requests that would have resulted in cache misses to instead result in cache hits. Thus, while the ultimate object of prefetching is to decrease the number of cache misses, a prefetcher that relies on a pattern of cache misses may disrupt the detected access pattern by the very act of prefetching data into the cache. [0010]
  • In order to maintain stride continuity, the invention disclosed in this application handles a cache request that results in a cache hit to prefetched data as if a cache miss had occurred. Such a miss may be referred to as a “virtual miss” because an actual cache miss (“actual miss”) did not occur. In an embodiment, a plurality of prefetch bits (or “virtual bits”) is provided, each associated with one line in the cache and used to indicate that the data stored in the associated line was prefetched into the cache. In an embodiment, the prefetcher will calculate the miss pattern and other prefetch data as though a cache miss occurred if a cache request results in a cache hit to a line that is associated with a set prefetch bit. Thus, the prefetch bits store information that is used by a prefetch manager to determine if data was prefetched into the cache. [0011]
  • Other embodiments use the prefetch bits for purposes in addition to maintaining the continuity of the prefetch access pattern. For example, in an embodiment a prefetch bit may be reset after the first hit to a prefetched cache line (i.e., the first hit occurring after the line is updated with prefetched data) in order to prevent the calculation of a miss distance of zero for instructions with a stride that is smaller than the size of a cache line. In a further embodiment, prefetch bits are also used to prevent cache pollution. [0012]
  • FIG. 1 is a partial block diagram of a computer system having a processor that maintains prefetch stride continuity through the use of prefetch bits according to an embodiment of the present invention. Computer system 100 includes a processor 101 that has a decoder 110 that is coupled to a prefetcher 120. Computer system 100 also has an execution unit 107 that is coupled to decoder 110 and prefetcher 120. The term “coupled” encompasses a direct connection, an indirect connection, an indirect communication, etc. Processor 101 may be any microprocessor capable of processing instructions, such as, for example, a general purpose processor in the INTEL PENTIUM family of processors. Execution unit 107 is a device that executes instructions. Decoder 110 may be a device or program that changes one type of code into another type of code that may be executed. For example, decoder 110 may decode a LOAD instruction that is part of a program, and the decoded LOAD instruction may later be executed by execution unit 107. Processor 101 is coupled to Random Access Memory (RAM) 140. RAM 140 is a system memory. In other embodiments, a type of system memory other than a RAM may be used in computer system 100 instead of or in addition to RAM 140. [0013]
  • In the embodiment shown in FIG. 1, processor 101 contains a cache 130 that is coupled to execution unit 107, prefetcher 120, and RAM 140. In another embodiment, cache 130 may be located outside of processor 101. Cache 130 may be a Static Random Access Memory (SRAM). In an embodiment, cache 130 contains prefetch bits 135. In a further embodiment, cache 130 contains a plurality of lines to store data and each prefetch bit is associated with one of the cache lines. Further details of prefetch bits are discussed below with reference to other figures. [0014]
  • As shown in FIG. 1, prefetcher 120 includes a prefetch manager 122 and a prefetch memory 125. Prefetch manager 122 may include logic to prefetch data for an instruction based on the distance between cache misses caused by the instruction. As used in this application, “logic” may include hardware logic, such as circuits that are wired to perform operations, or program logic, such as firmware that performs operations. Prefetch memory 125 may store a prefetch table 126 that contains entries including the distance between cache misses caused by an instruction. In an embodiment, prefetch memory 125 is a content addressable memory (CAM). Prefetch manager 122 may determine the addresses of data elements to be prefetched based on the miss distance that is recorded for instructions in the prefetch table. [0015]
  • FIG. 2 is a partial block diagram of a cache having prefetch bits 135 according to an embodiment of the present invention. FIG. 2 shows cache 130 including a data array 240 and a least recently used (LRU) array 250. As shown in FIG. 2, data array 240 contains a plurality of cache lines 245. Each cache line may be, for example, 32 bytes long. In an embodiment, data array 240 may be organized into sets and ways, as per conventional techniques, and cache 130 may contain other arrays such as, for example, a tag array. As would be appreciated by a person of skill in the art, LRU array 250 may contain recency of use information that is used, for example, to determine cache lines to be evicted when a portion of the data array 240 becomes full. In an embodiment of the present invention, prefetch bits 135 are stored as part of the LRU array 250. In this embodiment, LRU array 250 contains a prefetch bit for each cache line 245 in data array 240. In this embodiment, each cache line 245 is associated with one of the prefetch bits 135. In another embodiment, the prefetch bits 135 may be located in a part of the cache 130 other than LRU array 250. [0016]
  • FIG. 2 shows a cache manager 260 that is coupled to data array 240 and LRU array 250. In the embodiment shown, cache manager 260 contains prefetch bit management logic 261 and recency of use logic 262. In an embodiment, the prefetch bit management logic 261 manages the values stored in the prefetch bits 135. For example, the prefetch bit management logic 261 may set a prefetch bit each time that a cache line is updated with data that was prefetched into the cache. In an embodiment, the prefetcher 120 sends a signal to prefetch bit management logic 261 whenever the data loaded into the cache is prefetched data. In a further embodiment, prefetch bit management logic 261 resets a prefetch bit in response to a read from the data array line associated with the prefetch bit. Recency of use logic 262 may store recency of use information in LRU array 250 that is associated with each data array line. In an embodiment, the recency of use logic 262 stores information indicating that a data array line has a status of least recently used whenever the data array line is updated with data that was prefetched into the cache. In a further embodiment, the recency of use logic 262 stores information indicating that the data array line last read has a status of most recently used unless the data array line is associated with a prefetch bit that indicates that the data stored in this data array line was prefetched into the cache. [0017]
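
As a rough sketch of what the prefetch bit management logic 261 and recency of use logic 262 described above might do on fills and reads, in C; the line_meta_t fields and function names are invented for illustration and are not taken from the patent.

```c
#include <stdbool.h>

/* Hypothetical per-line metadata kept alongside the LRU array. */
typedef struct {
    bool     prefetch_bit; /* set when the line is filled by a prefetch */
    unsigned lru_rank;     /* 0 = least recently used                   */
} line_meta_t;

/* On a line fill: the prefetcher signals whether the fill came from a
 * prefetch; prefetched fills are marked least recently used so unused
 * prefetched data is first in line for eviction. */
void on_fill(line_meta_t *m, bool was_prefetched)
{
    m->prefetch_bit = was_prefetched;
    if (was_prefetched)
        m->lru_rank = 0;
}

/* On a read hit: report whether the prefetch bit was set (a virtual
 * miss) and clear it, so the second hit behaves like a normal hit. */
bool on_read(line_meta_t *m)
{
    bool was_set = m->prefetch_bit;
    m->prefetch_bit = false;
    return was_set;
}
```
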
  • In an embodiment, a set prefetch bit may indicate that the associated data array line contains prefetched data. As shown in FIG. 2, two of the prefetch bits shown are set (i.e., they have a value of “1”) and five of the prefetch bits shown are not set (i.e., have a value of “0”). If data is loaded into the cache in response to a cache miss, this data would not have been prefetched and thus the associated prefetch bit may indicate that the data was not prefetched. Of course, any value may be used to indicate that the prefetch bit is set. In an embodiment that is discussed in more detail below, the prefetch bit may be cleared the first time that the prefetched data is read, even though the associated cache line will still contain prefetched data, to handle the case where more than one miss occurs for an instruction in the same cache line. [0018]
  • An example of the operation of the present invention is described with reference to FIG. 3. FIG. 3 is a flow diagram of a method of maintaining prefetch stride continuity through the use of prefetch bits according to an embodiment of the present invention. The method shown in FIG. 3 may be used with a system such as that shown in FIGS. 1-2. The processor 101 may be executing a program that contains instructions. As shown in FIG. 3, decoder 110 may decode an instruction (301). This instruction may be, for example, a LOAD instruction that has an IP of XXXX. The LOAD instruction may load data from a location in RAM 140, for example the data element at address YYY. In the example shown in FIG. 3, the instruction decoded has been executed a number of times in the past. This allowed prefetcher 120 to determine an access pattern for the instruction (information on which is stored in prefetch table 126) and to prefetch the next data element to be loaded from RAM according to the access pattern. Thus, in this example the data at address YYY has already been prefetched into a line of cache 130. Because this data was prefetched from the RAM into a line of cache 130, at the time the data was prefetched a prefetch bit associated with the cache line was set by prefetch bit management logic 261. [0019]
  • According to the example shown in FIG. 3, after decoding the instruction the decoder 110 may cause a request to be sent to cache 130 for a data element (e.g., the data stored at address YYY) that is to be used by the instruction (302). Prefetcher 120 will receive information about the response to the cache request and will determine whether the request resulted in a cache hit (303). In this example, the request would have resulted in an actual miss if the data had not been prefetched into the cache. If the request resulted in a cache miss, prefetcher 120 calculates prefetch information for the instruction based on the request having resulted in a cache miss (304). If the request resulted in a cache hit, the prefetcher 120 obtains information for the prefetch bit associated with the cache line that contains the data element requested (305). The prefetcher 120 then determines if the information indicates that the data element was prefetched into the cache (306). If the information indicates that the data element had been prefetched into the cache, the prefetcher 120 treats the request as a virtual miss and calculates prefetch information for the instruction based on the request having resulted in a cache miss (304). If the information indicates that the data element had not been prefetched into the cache, the prefetcher 120 calculates prefetch information for the instruction based on the request having resulted in a cache hit (307). In this embodiment, the cache manager will generate a cache miss response for a cache request if data requested is stored in the cache and the cache manager determines that the data was prefetched into the cache. The cache prefetcher receives cache miss responses from the cache manager and prefetches data into the cache based on the distance between cache misses for an instruction. In an embodiment, the prefetcher 120 updates prefetch table 126 to indicate that a miss response was received whenever either an actual miss or a virtual miss response was received. [0020]
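
The FIG. 3 decision flow might be summarized as follows, reusing the hypothetical prefetch_entry_t and line_meta_t helpers sketched above; the numbered comments correspond to the steps in the figure.

```c
typedef enum { ACTUAL_MISS, VIRTUAL_MISS, ACTUAL_HIT } outcome_t;

/* Handle one cache request for (ip, addr): 'hit' is the cache's raw
 * hit/miss result and 'line' is the metadata of the hit line (ignored
 * on a miss). */
outcome_t handle_request(line_meta_t *line, bool hit,
                         prefetch_entry_t *entry,
                         uint64_t ip, uint64_t addr)
{
    if (!hit) {                   /* 303 -> 304: actual miss           */
        on_miss(entry, ip, addr);
        return ACTUAL_MISS;
    }
    if (on_read(line)) {          /* 305/306: prefetch bit was set     */
        on_miss(entry, ip, addr); /* 304: treat the hit as a miss      */
        return VIRTUAL_MISS;      /* stride continuity is preserved    */
    }
    return ACTUAL_HIT;            /* 307: ordinary hit, no miss update */
}
```
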
  • In the example above, the prefetcher may have detected an access pattern of every fifth address because a cache miss has been detected occurring at every fifth address (e.g., 0x0005, 0x0010, 0x0015, 0x0020, . . . ) for this instruction. Thus, the prefetcher will prefetch the address that is five addresses away from the last address accessed by that instruction (because that is the next expected miss). Once the data element at address 0x0025 has been prefetched into the cache, however, it will not cause an actual cache miss. If the prefetching scheme is based on a detected pattern of cache misses, the presence of the prefetched data element from address 0x0025 in the cache could cause the prefetcher to determine that the pattern has been interrupted (because the request for address 0x0025 did not cause an actual cache miss) even though the access pattern is actually still valid. Thus, the learned stride access pattern of 5 may become corrupted. According to embodiments of the invention disclosed in this application, the prefetcher will determine, based on the content of the prefetch bit for the cache line in question, that the request caused a virtual miss. Thus, the prefetcher will update the prefetch information (e.g., the miss distance) for the instruction as if the request generated an actual miss. [0021]
  • In a further embodiment, the prefetch bit management logic 261 prevents the calculation of a miss distance of zero for instructions that have a stride greater than zero but less than the size of a cache line. In this embodiment, whenever a request results in a virtual miss, the prefetch bit management logic 261 resets the prefetch bit associated with the cache line that contains the data requested. The next time that this data is requested from the cache, the cache will respond to the request by indicating that an actual hit has resulted, even though the data had been prefetched, because the prefetch bit will have been reset. If the stride of the instruction is less than a cache line, the addresses requested by two or more executions of the instruction could fall within the same cache line. Thus, a virtual miss with a computed miss distance of zero would be generated every time such a request hit the same cache line while its prefetch bit was set. Clearing the prefetch bit after the first hit to the prefetched cache line prevents this case from occurring. [0022]
  • In a further embodiment, the cache manager 260 stores recency of use information for the plurality of cache lines and uses information from the prefetch bits to determine this recency of use information. In an embodiment, the recency of use logic 262 stores information in LRU array 250 indicating that a data array line has a status of least recently used whenever the data array line is updated with data that was prefetched into the cache. According to this embodiment, data that has been prefetched into the cache, but has not yet been used, may be selected first for eviction. The recency of use logic 262 stores information indicating that the data array line last read has a status of most recently used unless the data array line is associated with a prefetch bit that indicates that the data stored in this data array line was prefetched into the cache. According to this embodiment, a cache line containing prefetched data that is hit a first time will not be changed to a status of most recently used. Thus, prefetched data that is hit only once may also be evicted first. The prefetch bit is cleared once the cache line is hit, and thus upon the second hit to the cache line the recency of use logic 262 will treat the cache line as if it were not prefetched and will change its status to most recently used. The above embodiments for reducing cache pollution use the same data structure (i.e., the prefetch bits) as is used to indicate that a cache line contains prefetched data. If data is prefetched into the cache but never accessed or reused, this data will be replaced first. [0023]
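
Folding this embodiment's recency rule into the read path might look like the sketch below, again using the hypothetical line_meta_t from earlier; MRU_RANK and the flat rank encoding are assumptions made purely for illustration.

```c
enum { MRU_RANK = 7 }; /* e.g., for an 8-way set; illustrative only */

/* Apply the recency rule after a read hit, given whether the line's
 * prefetch bit was set before the hit (the value returned by on_read
 * above): the first hit to a prefetched line leaves its rank alone, so
 * once-touched prefetched data can still be evicted early; later hits
 * promote the line to most recently used as usual. */
void update_recency_on_hit(line_meta_t *m, bool prefetch_bit_was_set)
{
    if (!prefetch_bit_was_set)
        m->lru_rank = MRU_RANK;
}
```
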
  • FIG. 4 is a partial block diagram of a computer system having prefetch bits according to another embodiment of the present invention. Similar to FIG. 1, FIG. 4 shows a computer system 400 that contains a processor 401 that is coupled to a RAM 440. Processor 401 contains a decoder 410 coupled to an execution unit 407. Processor 401 also contains a prefetcher 420 that is coupled to decoder 410 and execution unit 407. Computer system 400 contains a cache 430 that is coupled to processor 401 and to RAM 440. Unlike the processor 101 of FIG. 1, processor 401 also contains a read request buffer 470 that is coupled to prefetcher 420, cache 430, and RAM 440. In this embodiment, prefetch bits 475 are attached to read request buffer 470. Read request buffer 470 may be a cache fill buffer that starts the prefetch request to memory. When this embodiment is used, the prefetch bit may be associated with the cache line before the data is brought into the cache. If the same instruction hits the prefetched line while it is still in the request stage, then the stride continuity may be maintained and the new prefetch request may be issued while the old prefetch request is in progress. [0024]
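
A minimal sketch of the FIG. 4 variant, in which the prefetch bit travels with the read request buffer entry before the data reaches the cache; the fields and the matching helper are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical read request (cache fill) buffer entry. */
typedef struct {
    uint64_t line_addr;   /* cache line address being fetched     */
    bool     is_prefetch; /* prefetch bit attached to the request */
    bool     pending;     /* request still in flight to memory    */
} fill_buffer_entry_t;

/* A demand access that matches a pending prefetch request can be
 * treated as a virtual miss, so stride continuity is kept and the
 * next prefetch can be issued while this one is still in progress. */
bool hits_pending_prefetch(const fill_buffer_entry_t *e, uint64_t line_addr)
{
    return e->pending && e->is_prefetch && e->line_addr == line_addr;
}
```
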
  • Embodiments of the present invention relate to a prefetcher which prefetches data for an instruction based on an access pattern that has been determined and maintained for the instruction. The present invention maintains stride continuity by handling cache requests resulting in a cache hit to prefetched data as if a cache miss had occurred. Several embodiments of the present invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention. For example, any combination of one or more of the aspects described above may be used. In addition, the invention may be used with physical addresses or linear addresses. In addition, the invention may be used with prefetch schemes based on different types of access patterns, including sequential, linear, or series patterns. [0025]

Claims (24)

What is claimed is:
1. A processor comprising:
a cache having a plurality of lines to store data;
a plurality of prefetch bits each associated with one of the cache lines; and
a prefetcher to calculate prefetch data as though a cache miss occurred if a cache request results in a cache hit to a line that is associated with a set prefetch bit.
2. The processor of claim 1, wherein the prefetch data includes miss distance information for instructions.
3. The processor of claim 1, wherein the prefetch bits are stored in the cache.
4. The processor of claim 1, wherein the prefetcher has logic to reset a prefetch bit associated with a cache line whenever a cache request results in a cache hit to the cache line and the prefetch bit was set.
5. The processor of claim 1 further comprising logic to store recency of use information for the plurality of cache lines which logic uses information from the prefetch bits to determine the recency of use information.
6. A cache comprising:
a data array having a plurality of lines to store data; and
a plurality of prefetch bits each associated with one of the data array lines to indicate that data stored in the associated line was prefetched into the cache.
7. The cache of claim 6, wherein the cache contains logic to reset a prefetch bit in response to a read from the data array line associated with the prefetch bit.
8. The cache of claim 6, wherein the cache further comprises a least recently used (LRU) array, and wherein said plurality of prefetch bits are located in the LRU array.
9. The cache of claim 6, wherein the cache has logic to store recency of use information associated with each data array line, and wherein said logic stores information indicating that a data array line has a status of least recently used whenever the data array line is updated with data that was prefetched into the cache.
10. The cache of claim 9, wherein said logic stores information indicating that the data array line last read has a status of most recently used unless the data array line is associated with a prefetch bit that indicates data being stored in this data array line was prefetched into the cache.
11. A processor comprising:
a cache;
an instruction decoder to decode instructions and to cause cache requests to be sent for data to be used by the instructions decoded;
a cache manager to generate a cache miss response for a cache request if data requested is stored in the cache and the cache manager determines that the data was prefetched into the cache; and
a cache prefetcher to receive cache miss responses from the cache manager and to prefetch data into the cache based on the distance between cache misses for an instruction.
12. The processor of claim 11, wherein the processor further includes a plurality of prefetch bits that store information used by the cache prefetcher to determine if data was prefetched into the cache.
13. The processor of claim 12, wherein the cache contains a read request buffer, and wherein the plurality of prefetch bits are attached to the read request buffer.
14. The processor of claim 12, wherein the processor contains logic to reset the prefetch bit associated with prefetched data in response to the first hit to the prefetched data, and wherein the cache manager will determine that data was not prefetched if the prefetch bit associated with said data is reset.
15. The processor of claim 11, wherein the cache contains bits to store information about the status of each line in the cache, and wherein the cache contains logic to update the status of a cache line to least recently used whenever prefetched data is stored in the cache line and to update the status of said cache line to most recently used whenever the prefetched data is read a second time after the data is prefetched into the cache.
16. A method of maintaining the continuity of prefetch information, the method comprising:
decoding an instruction a first time;
sending a first request to a cache for data to be used by said instruction;
determining that the data requested in the first request is stored in a line in the cache;
determining that a prefetch bit associated with said cache line indicates that the cache line stores data that was prefetched into the cache; and
calculating prefetch information for said instruction, wherein the prefetch information is calculated based on the first request having resulted in a cache miss.
17. The method of claim 16, wherein the method further comprises resetting the prefetch bit associated with the cache line.
18. The method of claim 16, wherein the calculation of prefetch information for an instruction comprises calculating miss distance information for the instruction.
19. The method of claim 16, further comprising:
decoding a second instruction which is to use said data;
sending a second request to the cache for said data;
determining that the data requested in the second request is stored in a line in the cache;
determining that the prefetch bit associated with said cache line indicates that the cache line stores data that was not prefetched into the cache;
calculating prefetch information for the instruction, wherein the prefetch information is calculated based on the second request having resulted in a cache hit; and
updating the status information corresponding to the cache line to indicate that the cache line was most recently used.
20. A processor comprising:
a cache having a plurality of cache lines;
a means for prefetching data into one of the cache lines; and
a means for indicating that a cache line contains prefetched data.
21. The processor of claim 20, further comprising a means for determining that a virtual miss has occurred in response to a request for data sent to the cache whenever the data is stored in a cache line and the means for indicating indicates that the cache line contains prefetched data.
22. The processor of claim 20, wherein the means for prefetching updates a prefetch table to indicate that a miss response was received whenever a response was received for an actual miss or a virtual miss.
23. The processor of claim 20 further comprising a means for preventing the calculation of a miss distance of zero for instructions that have a miss distance that is greater than zero but less than the size of a cache line.
24. The processor of claim 20, wherein the means for indicating that a cache line contains prefetched data stores a data structure that is used to determine whether a cache line contains prefetched data, and wherein the processor further comprises a means for reducing cache pollution that uses the same data structure as said means for indicating.
US09/820,967 2000-12-29 2001-03-30 System and method for maintaining prefetch stride continuity through the use of prefetch bits Abandoned US20020087802A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/820,967 US20020087802A1 (en) 2000-12-29 2001-03-30 System and method for maintaining prefetch stride continuity through the use of prefetch bits

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/749,936 US6584549B2 (en) 2000-12-29 2000-12-29 System and method for prefetching data into a cache based on miss distance
US09/820,967 US20020087802A1 (en) 2000-12-29 2001-03-30 System and method for maintaining prefetch stride continuity through the use of prefetch bits

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/749,936 Continuation-In-Part US6584549B2 (en) 2000-12-29 2000-12-29 System and method for prefetching data into a cache based on miss distance

Publications (1)

Publication Number Publication Date
US20020087802A1 true US20020087802A1 (en) 2002-07-04

Family

ID=25015840

Family Applications (3)

Application Number Title Priority Date Filing Date
US09/749,936 Expired - Lifetime US6584549B2 (en) 2000-12-29 2000-12-29 System and method for prefetching data into a cache based on miss distance
US09/820,967 Abandoned US20020087802A1 (en) 2000-12-29 2001-03-30 System and method for maintaining prefetch stride continuity through the use of prefetch bits
US10/427,908 Expired - Lifetime US6701414B2 (en) 2000-12-29 2003-05-02 System and method for prefetching data into a cache based on miss distance

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/749,936 Expired - Lifetime US6584549B2 (en) 2000-12-29 2000-12-29 System and method for prefetching data into a cache based on miss distance

Family Applications After (1)

Application Number Title Priority Date Filing Date
US10/427,908 Expired - Lifetime US6701414B2 (en) 2000-12-29 2003-05-02 System and method for prefetching data into a cache based on miss distance

Country Status (8)

Country Link
US (3) US6584549B2 (en)
EP (1) EP1346281A2 (en)
CN (1) CN1222870C (en)
AU (1) AU2002241682A1 (en)
HK (1) HK1064170A1 (en)
RU (1) RU2260838C2 (en)
TW (1) TW541498B (en)
WO (1) WO2002054230A2 (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040030839A1 (en) * 2001-10-22 2004-02-12 Stmicroelectronics Limited Cache memory operation
US20040039878A1 (en) * 2002-08-23 2004-02-26 Van De Waerdt Jan-Willem Processor prefetch to match memory bus protocol characteristics
US20040148471A1 (en) * 2003-01-28 2004-07-29 Sun Microsystems, Inc Multiprocessing computer system employing capacity prefetching
US20060090036A1 (en) * 2004-10-27 2006-04-27 Ofir Zohar Method for differential discarding of cached data in distributed storage systems
US20060224830A1 (en) * 2005-03-30 2006-10-05 Ibm Corporation Performance of a cache by detecting cache lines that have been reused
US20070043908A1 (en) * 2003-05-30 2007-02-22 Mips Technologies, Inc. Microprocessor with improved data stream prefetching
US20070239940A1 (en) * 2006-03-31 2007-10-11 Doshi Kshitij A Adaptive prefetching
US20090195542A1 (en) * 2006-05-16 2009-08-06 Panasonic Corporation Image processing apparatus
US7873791B1 (en) * 2007-09-28 2011-01-18 Emc Corporation Methods and systems for incorporating improved tail cutting in a prefetch stream in TBC mode for data storage having a cache memory
US20130185515A1 (en) * 2012-01-16 2013-07-18 Qualcomm Incorporated Utilizing Negative Feedback from Unexpected Miss Addresses in a Hardware Prefetcher
US20140082324A1 (en) * 2012-09-14 2014-03-20 Reuven Elhamias Method and Storage Device for Using File System Data to Predict Host Device Operations
US20140281248A1 (en) * 2013-03-16 2014-09-18 Intel Corporation Read-write partitioning of cache memory
US8949522B1 (en) * 2011-06-21 2015-02-03 Netlogic Microsystems, Inc. Performance of a stride-based prefetcher on an out-of-order processing unit (CPU)
US9009449B2 (en) * 2011-11-10 2015-04-14 Oracle International Corporation Reducing power consumption and resource utilization during miss lookahead
US20180024930A1 (en) * 2016-07-20 2018-01-25 International Business Machines Corporation Processing data based on cache residency
US9904624B1 (en) 2016-04-07 2018-02-27 Apple Inc. Prefetch throttling in a multi-core system
US20180095884A1 (en) * 2016-09-30 2018-04-05 Intel Corporation Mass storage cache in non volatile level of multi-level system memory
US9971694B1 (en) * 2015-06-24 2018-05-15 Apple Inc. Prefetch circuit for a processor with pointer optimization
US10169239B2 (en) 2016-07-20 2019-01-01 International Business Machines Corporation Managing a prefetch queue based on priority indications of prefetch requests
US10180905B1 (en) 2016-04-07 2019-01-15 Apple Inc. Unified prefetch circuit for multi-level caches
US20190138449A1 (en) * 2017-11-06 2019-05-09 Samsung Electronics Co., Ltd. Coordinated cache management policy for an exclusive cache hierarchy
US10331567B1 (en) 2017-02-17 2019-06-25 Apple Inc. Prefetch circuit with global quality factor to reduce aggressiveness in low power modes
US20190303037A1 (en) * 2018-03-30 2019-10-03 Ca, Inc. Using sequential read intention to increase data buffer reuse
US10452395B2 (en) 2016-07-20 2019-10-22 International Business Machines Corporation Instruction to query cache residency
US10521350B2 (en) 2016-07-20 2019-12-31 International Business Machines Corporation Determining the effectiveness of prefetch instructions
US20200201776A1 (en) * 2018-12-20 2020-06-25 Micron Technology, Inc. Using a second content-addressable memory to manage memory burst accesses in memory sub-systems
US11093248B2 (en) * 2018-09-10 2021-08-17 International Business Machines Corporation Prefetch queue allocation protection bubble in a processor
US20230121686A1 (en) * 2021-10-14 2023-04-20 Arm Limited Prefetcher training

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10116863A1 * 2001-04-04 2002-10-17 Infineon Technologies Ag Interface
US8171266B2 (en) * 2001-08-02 2012-05-01 Hewlett-Packard Development Company, L.P. Look-ahead load pre-fetch in a processor
US20030084433A1 (en) * 2001-10-31 2003-05-01 Chi-Keung Luk Profile-guided stride prefetching
US6832296B2 (en) * 2002-04-09 2004-12-14 Ip-First, Llc Microprocessor with repeat prefetch instruction
EP1361518B1 (en) * 2002-05-10 2013-08-07 Texas Instruments Incorporated Reducing TAG-RAM accesses and accelerating cache operation during cache miss
CN1327353C * 2003-04-21 2007-07-18 智慧第一公司 Microprocessor device capable of selectively withdrawing prefetch, and method
US7111126B2 (en) * 2003-09-24 2006-09-19 Arm Limited Apparatus and method for loading data values
US7487296B1 (en) * 2004-02-19 2009-02-03 Sun Microsystems, Inc. Multi-stride prefetcher with a recurring prefetch table
US7120753B2 (en) * 2004-04-20 2006-10-10 International Business Machines Corporation System and method for dynamically adjusting read ahead values based upon memory usage
US7421540B2 (en) * 2005-05-03 2008-09-02 International Business Machines Corporation Method, apparatus, and program to efficiently calculate cache prefetching patterns for loops
US7681047B2 (en) * 2006-04-18 2010-03-16 International Business Machines Corporation Decryption of data in storage systems
US7774578B2 (en) * 2006-06-07 2010-08-10 Advanced Micro Devices, Inc. Apparatus and method of prefetching data in response to a cache miss
US8060701B2 (en) * 2006-12-08 2011-11-15 Qualcomm Incorporated Apparatus and methods for low-complexity instruction prefetch system
US8209488B2 (en) * 2008-02-01 2012-06-26 International Business Machines Corporation Techniques for prediction-based indirect data prefetching
US8161263B2 (en) * 2008-02-01 2012-04-17 International Business Machines Corporation Techniques for indirect data prefetching
US8166277B2 (en) * 2008-02-01 2012-04-24 International Business Machines Corporation Data prefetching using indirect addressing
US8161265B2 (en) * 2008-02-01 2012-04-17 International Business Machines Corporation Techniques for multi-level indirect data prefetching
US8161264B2 (en) * 2008-02-01 2012-04-17 International Business Machines Corporation Techniques for data prefetching using indirect addressing with offset
US7925865B2 (en) * 2008-06-02 2011-04-12 Oracle America, Inc. Accuracy of correlation prefetching via block correlation and adaptive prefetch degree selection
US8140769B2 (en) * 2009-04-20 2012-03-20 Oracle America, Inc. Data prefetcher
US8166251B2 (en) * 2009-04-20 2012-04-24 Oracle America, Inc. Data prefetcher that adjusts prefetch stream length based on confidence
US8880844B1 (en) * 2010-03-12 2014-11-04 Trustees Of Princeton University Inter-core cooperative TLB prefetchers
US8433852B2 (en) * 2010-08-30 2013-04-30 Intel Corporation Method and apparatus for fuzzy stride prefetch
CN102253901B * 2011-07-13 2013-07-24 清华大学 Read/write-distinguishing data storage replacement method based on phase change memory
WO2013101138A1 (en) * 2011-12-30 2013-07-04 Intel Corporation Identifying and prioritizing critical instructions within processor circuitry
US9304932B2 (en) * 2012-12-20 2016-04-05 Qualcomm Incorporated Instruction cache having a multi-bit way prediction mask
US9792120B2 (en) * 2013-03-05 2017-10-17 International Business Machines Corporation Anticipated prefetching for a parent core in a multi-core chip
CN104750696B * 2013-12-26 2018-07-20 华为技术有限公司 Data prefetching method and device
US9792224B2 (en) * 2015-10-23 2017-10-17 Intel Corporation Reducing latency by persisting data relationships in relation to corresponding data in persistent memory
CN106776371B * 2015-12-14 2019-11-26 上海兆芯集成电路有限公司 Stride reference prefetcher, processor, and method for prefetching data into a processor
CN106021128B * 2016-05-31 2018-10-30 东南大学—无锡集成电路技术研究所 Data prefetching device and prefetching method based on stride and data dependence
US10303575B2 (en) * 2017-01-10 2019-05-28 International Business Machines Corporation Time-slice-instrumentation facility
US11099995B2 (en) 2018-03-28 2021-08-24 Intel Corporation Techniques for prefetching data to a first level of memory of a hierarchical arrangement of memory
US11281585B2 (en) 2018-08-30 2022-03-22 Micron Technology, Inc. Forward caching memory systems and methods
US11243884B2 (en) * 2018-11-13 2022-02-08 Advanced Micro Devices, Inc. Control flow guided lock address prefetch and filtering
US11294808B2 (en) 2020-05-21 2022-04-05 Micron Technology, Inc. Adaptive cache
US11422934B2 (en) 2020-07-14 2022-08-23 Micron Technology, Inc. Adaptive address tracking
US11409657B2 (en) 2020-07-14 2022-08-09 Micron Technology, Inc. Adaptive address tracking

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5093777A (en) 1989-06-12 1992-03-03 Bull Hn Information Systems Inc. Method and apparatus for predicting address of a subsequent cache request upon analyzing address patterns stored in separate miss stack
US5790823A (en) * 1995-07-13 1998-08-04 International Business Machines Corporation Operand prefetch table
US6055622A (en) 1997-02-03 2000-04-25 Intel Corporation Global stride prefetching apparatus and method for a high-performance processor
US6253306B1 (en) 1998-07-29 2001-06-26 Advanced Micro Devices, Inc. Prefetch instruction mechanism for processor
US6311260B1 (en) 1999-02-25 2001-10-30 Nec Research Institute, Inc. Method for perfetching structured data
US6574712B1 (en) * 1999-11-08 2003-06-03 International Business Machines Corporation Software prefetch system and method for predetermining amount of streamed data

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6959363B2 (en) * 2001-10-22 2005-10-25 Stmicroelectronics Limited Cache memory operation
US20040030839A1 (en) * 2001-10-22 2004-02-12 Stmicroelectronics Limited Cache memory operation
US7162588B2 (en) * 2002-08-23 2007-01-09 Koninklijke Philips Electronics N.V. Processor prefetch to match memory bus protocol characteristics
US20040039878A1 (en) * 2002-08-23 2004-02-26 Van De Waerdt Jan-Willem Processor prefetch to match memory bus protocol characteristics
US20040148471A1 (en) * 2003-01-28 2004-07-29 Sun Microsystems, Inc Multiprocessing computer system employing capacity prefetching
US7165146B2 (en) * 2003-01-28 2007-01-16 Sun Microsystems, Inc. Multiprocessing computer system employing capacity prefetching
US7664920B2 (en) * 2003-05-30 2010-02-16 Mips Technologies, Inc. Microprocessor with improved data stream prefetching
US20070043908A1 (en) * 2003-05-30 2007-02-22 Mips Technologies, Inc. Microprocessor with improved data stream prefetching
US8078806B2 (en) 2003-05-30 2011-12-13 Mips Technologies, Inc. Microprocessor with improved data stream prefetching
US20110040941A1 (en) * 2003-05-30 2011-02-17 Diefendorff Keith E Microprocessor with Improved Data Stream Prefetching
US7822943B2 (en) 2003-05-30 2010-10-26 Mips Technologies, Inc. Microprocessor with improved data stream prefetching using multiple transaction look-aside buffers (TLBs)
US20090077321A1 (en) * 2003-05-30 2009-03-19 Mips Technologies, Inc. Microprocessor with Improved Data Stream Prefetching
US20060090036A1 (en) * 2004-10-27 2006-04-27 Ofir Zohar Method for differential discarding of cached data in distributed storage systems
US7313654B2 (en) * 2004-10-27 2007-12-25 Xiv Ltd Method for differential discarding of cached data in distributed storage systems
US20080168236A1 (en) * 2005-03-30 2008-07-10 International Business Machines Corporation Performance of a cache by detecting cache lines that have been reused
US7552286B2 (en) * 2005-03-30 2009-06-23 International Business Machines Corporation Performance of a cache by detecting cache lines that have been reused
US7380065B2 (en) * 2005-03-30 2008-05-27 International Business Machines Corporation Performance of a cache by detecting cache lines that have been reused
US20060224830A1 (en) * 2005-03-30 2006-10-05 Ibm Corporation Performance of a cache by detecting cache lines that have been reused
US20070239940A1 (en) * 2006-03-31 2007-10-11 Doshi Kshitij A Adaptive prefetching
US20090195542A1 (en) * 2006-05-16 2009-08-06 Panasonic Corporation Image processing apparatus
JP5131986B2 * 2006-05-16 2013-01-30 パナソニック株式会社 Image processing device
US7873791B1 (en) * 2007-09-28 2011-01-18 Emc Corporation Methods and systems for incorporating improved tail cutting in a prefetch stream in TBC mode for data storage having a cache memory
US8949522B1 (en) * 2011-06-21 2015-02-03 Netlogic Microsystems, Inc. Performance of a stride-based prefetcher on an out-of-order processing unit (CPU)
US9009449B2 (en) * 2011-11-10 2015-04-14 Oracle International Corporation Reducing power consumption and resource utilization during miss lookahead
US20130185515A1 (en) * 2012-01-16 2013-07-18 Qualcomm Incorporated Utilizing Negative Feedback from Unexpected Miss Addresses in a Hardware Prefetcher
US20140082324A1 (en) * 2012-09-14 2014-03-20 Reuven Elhamias Method and Storage Device for Using File System Data to Predict Host Device Operations
US20140281248A1 (en) * 2013-03-16 2014-09-18 Intel Corporation Read-write partitioning of cache memory
US9223710B2 (en) * 2013-03-16 2015-12-29 Intel Corporation Read-write partitioning of cache memory
US9971694B1 (en) * 2015-06-24 2018-05-15 Apple Inc. Prefetch circuit for a processor with pointer optimization
US9904624B1 (en) 2016-04-07 2018-02-27 Apple Inc. Prefetch throttling in a multi-core system
US10180905B1 (en) 2016-04-07 2019-01-15 Apple Inc. Unified prefetch circuit for multi-level caches
US10452395B2 (en) 2016-07-20 2019-10-22 International Business Machines Corporation Instruction to query cache residency
US11080052B2 (en) 2016-07-20 2021-08-03 International Business Machines Corporation Determining the effectiveness of prefetch instructions
US10169239B2 (en) 2016-07-20 2019-01-01 International Business Machines Corporation Managing a prefetch queue based on priority indications of prefetch requests
US20180024930A1 (en) * 2016-07-20 2018-01-25 International Business Machines Corporation Processing data based on cache residency
US10521350B2 (en) 2016-07-20 2019-12-31 International Business Machines Corporation Determining the effectiveness of prefetch instructions
US10572254B2 (en) 2016-07-20 2020-02-25 International Business Machines Corporation Instruction to query cache residency
US10621095B2 (en) * 2016-07-20 2020-04-14 International Business Machines Corporation Processing data based on cache residency
US20180095884A1 (en) * 2016-09-30 2018-04-05 Intel Corporation Mass storage cache in non volatile level of multi-level system memory
US10331567B1 (en) 2017-02-17 2019-06-25 Apple Inc. Prefetch circuit with global quality factor to reduce aggressiveness in low power modes
US20190138449A1 (en) * 2017-11-06 2019-05-09 Samsung Electronics Co., Ltd. Coordinated cache management policy for an exclusive cache hierarchy
US10606752B2 (en) * 2017-11-06 2020-03-31 Samsung Electronics Co., Ltd. Coordinated cache management policy for an exclusive cache hierarchy
US20190303037A1 (en) * 2018-03-30 2019-10-03 Ca, Inc. Using sequential read intention to increase data buffer reuse
US11093248B2 (en) * 2018-09-10 2021-08-17 International Business Machines Corporation Prefetch queue allocation protection bubble in a processor
US20200201776A1 (en) * 2018-12-20 2020-06-25 Micron Technology, Inc. Using a second content-addressable memory to manage memory burst accesses in memory sub-systems
US11442867B2 (en) * 2018-12-20 2022-09-13 Micron Technology, Inc. Using a second content-addressable memory to manage memory burst accesses in memory sub-systems
US11847058B2 (en) 2018-12-20 2023-12-19 Micron Technology, Inc. Using a second content-addressable memory to manage memory burst accesses in memory sub-systems
US20230121686A1 (en) * 2021-10-14 2023-04-20 Arm Limited Prefetcher training
US11853220B2 (en) * 2021-10-14 2023-12-26 Arm Limited Prefetcher training

Also Published As

Publication number Publication date
TW541498B (en) 2003-07-11
US20030196046A1 (en) 2003-10-16
HK1064170A1 (en) 2005-01-21
CN1222870C (en) 2005-10-12
US6584549B2 (en) 2003-06-24
AU2002241682A1 (en) 2002-07-16
RU2260838C2 (en) 2005-09-20
US20020087800A1 (en) 2002-07-04
WO2002054230A3 (en) 2003-12-11
US6701414B2 (en) 2004-03-02
RU2003119149A (en) 2005-01-10
EP1346281A2 (en) 2003-09-24
CN1484788A (en) 2004-03-24
WO2002054230A2 (en) 2002-07-11
WO2002054230A8 (en) 2003-03-06

Similar Documents

Publication Publication Date Title
US20020087802A1 (en) System and method for maintaining prefetch stride continuity through the use of prefetch bits
JP4486750B2 (en) Shared cache structure for temporary and non-temporary instructions
US6138213A (en) Cache including a prefetch way for storing prefetch cache lines and configured to move a prefetched cache line to a non-prefetch way upon access to the prefetched cache line
US5537573A (en) Cache system and method for prefetching of data
US5603004A (en) Method for decreasing time penalty resulting from a cache miss in a multi-level cache system
US5752261A (en) Method and apparatus for detecting thrashing in a cache memory
US5630097A (en) Enhanced cache operation with remapping of pages for optimizing data relocation from addresses causing cache misses
US6480939B2 (en) Method and apparatus for filtering prefetches to provide high prefetch accuracy using less hardware
US20100217937A1 (en) Data processing apparatus and method
US7380047B2 (en) Apparatus and method for filtering unused sub-blocks in cache memories
US20030221072A1 (en) Method and apparatus for increasing processor performance in a computing system
US6487639B1 (en) Data cache miss lookaside buffer and method thereof
US6668307B1 (en) System and method for a software controlled cache
JPH10283261A (en) Method and device for cache entry reservation processing
US7716423B2 (en) Pseudo LRU algorithm for hint-locking during software and hardware address translation cache miss handling modes
JP3262519B2 (en) Method and system for enhancing processor memory performance by removing old lines in second level cache
US9690707B2 (en) Correlation-based instruction prefetching
US6715040B2 (en) Performance improvement of a write instruction of a non-inclusive hierarchical cache memory unit
GB2299879A (en) Instruction/data prefetching using non-referenced prefetch cache
US11397680B2 (en) Apparatus and method for controlling eviction from a storage structure
US7051159B2 (en) Method and system for cache data fetch operations
US11086781B2 (en) Methods and apparatus for monitoring prefetcher accuracy information using a prefetch flag independently accessible from prefetch tag information
JPH0675853A (en) Cache memory device
JP2024011696A (en) Arithmetic processing apparatus and arithmetic processing method
JPH0337745A (en) Cache memory controller

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AL-DAJANI, KHALID D.;ABDALLAH, MOHAMMAD A.;REEL/FRAME:011880/0107

Effective date: 20010529

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION