US7502895B2 - Techniques for reducing castouts in a snoop filter - Google Patents

Techniques for reducing castouts in a snoop filter


Publication number
US7502895B2
Authority
US
United States
Prior art keywords
request
bus
tag
state
shared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US11/225,937
Other versions
US20070061520A1 (en
Inventor
Phillip Matthew Jones
Kourosh Gharachorloo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Valtrus Innovations Ltd
Hewlett Packard Enterprise Development LP
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP
Priority to US11/225,937
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Assignment of assignors interest (see document for details). Assignors: JONES, PHILLIP M., GHARACHORLOO, KOUROSH
Publication of US20070061520A1
Application granted
Publication of US7502895B2
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP. Assignment of assignors interest (see document for details). Assignors: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Assigned to OT PATENT ESCROW, LLC. Patent assignment, security interest, and lien agreement. Assignors: HEWLETT PACKARD ENTERPRISE COMPANY, HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Assigned to VALTRUS INNOVATIONS LIMITED. Assignment of assignors interest (see document for details). Assignors: OT PATENT ESCROW, LLC
Active
Adjusted expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0817Cache consistency protocols using directory methods
    • G06F12/082Associative directories
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means

Definitions

  • Some systems include multiple processing units or microprocessors connected via a processor bus. Implementing multiple processors improves system processing efficiency because the system is able to process requests simultaneously.
  • a host/data controller is generally provided. The host/data controller is further tasked with coordinating the exchange of information between the plurality of processors and the system memory.
  • the host/data controller may be responsible not only for the exchange of information in the typical Read-Only Memory (ROM) and the Random Access Memory (RAM), but also the cache memory in high speed systems.
  • Cache memory is a special high speed storage mechanism which may be provided as a reserved section of the main memory or as an independent high-speed storage device.
  • the cache memory is a portion of the RAM which is typically made of high speed static RAM (SRAM) rather than the slower and cheaper dynamic RAM (DRAM) which may be used for the remainder of the main memory.
  • each processor may have an associated cache memory. By storing frequently accessed data in the cache memory, the processor avoids having to re-access the shared memory each time the information is needed.
  • bus sniffing or bus snooping may be implemented to maintain system memory coherency.
  • an algorithm or apparatus should be designed to promote data changes by any agent to any other agent demand request. That is to say that in order to maintain coherency, each time a processor issues a request for memory data, the other processor caches may need to be searched for copies of that data, depending on the type of request, to ensure that only the most up to date information is used.
  • Some aspects of this apparatus are provided by the processor architecture. For example, X86 architecture maintains coherency across different levels of processor cache. The X86 architecture front side bus definition also deploys a self snooping protocol for agents that share the same bus.
  • Snoop filters or tag caches are a common solution for coordinating system level coherency across multiple bus segments.
  • One of the primary goals of an efficient snoop filter design is to minimize the number of unnecessary snoops to preserve front side bus bandwidth for request and data traffic. This includes request snoops required to retrieve the most recent data or provide an agent exclusive access to data and to “castout” snoops required to make space in the tag cache for a forced inclusion snoop filter.
  • a typical snoop filter is implemented using a direct mapped policy where the tag cache can track only one tag at a given tag index.
  • the snoop filter runs a castout cycle using the current tag, to make room for the new tag.
  • the castout cycle runs a back invalidation to the processor bus(es) being tracked by the snoop filter. If the processor needs the evicted cacheline again, it is forced to fetch the cache line from memory instead of its own cache. This results in a performance penalty, as the latency to an internal cache running at core clock speed compared to the latency to the main memory running at system bus clock speed can be an order of magnitude in difference.
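The direct mapped behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class name `DirectMappedTagCache`, the modulo-based index, and the four-entry size are hypothetical and are not taken from the patent.

```python
class DirectMappedTagCache:
    """Toy direct-mapped tag cache: one tag per index (hypothetical model)."""

    def __init__(self, num_indices):
        self.num_indices = num_indices
        self.tags = [None] * num_indices  # None means the index is empty

    def access(self, line_address):
        """Record an access; return the tag that was cast out, or None."""
        index = line_address % self.num_indices
        tag = line_address // self.num_indices
        current = self.tags[index]
        self.tags[index] = tag
        if current is not None and current != tag:
            # A different tag occupied this index, so the old line is
            # evicted and would be back-invalidated on the processor bus(es).
            return current
        return None


cache = DirectMappedTagCache(num_indices=4)
first = cache.access(5)    # index 1, tag 1: empty slot, no castout
second = cache.access(13)  # index 1, tag 3: conflicts with tag 1
third = cache.access(13)   # same tag again: no castout
```

In this model, two addresses that map to the same index evict each other on every alternation, which is the source of the castout penalty the description goes on to address.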
  • the present invention may address one or more of the problems set forth above.
  • FIG. 1 is a block diagram illustrating an exemplary computer system having a multiple processor bus architecture according to the embodiments of the present invention.
  • FIG. 2 is a block diagram illustrating an exemplary host controller in accordance with embodiments of the present invention.
  • FIGS. 3-7 are respective flow charts illustrating an improved method of processing various request types in a computer system in accordance with embodiments of the present invention.
  • a new approach to snoop filters is provided.
  • the basic operation of the filter described herein is to force inclusion on exclusive access only and not track shared accesses. As long as the cached data remains in the shared state, the data will not be evicted by the snoop filter algorithm.
  • the penalty of this technique is that exclusive requests must invalidate all bus segments to ensure shared data is removed from all processor caches except the request agent.
  • a method is described that allows the snoop filter to track shared accesses until a penalty is encountered.
  • the “castout penalty” refers to the occurrence of a castout snoop on a shared access with no net gain in precise snoop filter state information.
  • the castout tag is retained, the castout snoop is canceled and the shared request is dropped from the snoop filter, as described further below.
  • Referring to FIG. 1 , a block diagram of an exemplary computer system with multiple processor buses and an I/O bus, generally designated as reference numeral 10 , is illustrated.
  • the computer system 10 typically includes one or more processors or CPUs.
  • the system 10 may utilize eight CPUs 12 A- 12 H.
  • Each CPU 12 A- 12 H may include a respective cache memory 13 A- 13 H for storing recently accessed information.
  • the system 10 may utilize a split-bus configuration in which the CPUs 12 A- 12 D are coupled to a first bus 14 A and the CPUs 12 E- 12 H are coupled to a second bus 14 B.
  • processors or CPUs 12 A- 12 H may be of any suitable type, such as a microprocessor available from Intel, AMD, or Motorola, for example. Each CPU 12 A- 12 H may include a segment of cache memory for storage of frequently accessed data and programs. Furthermore, any suitable bus configuration may be coupled to the CPUs 12 A- 12 H, such as a single bus, a split-bus (as illustrated), or individual buses.
  • the exemplary system 10 may utilize Intel Pentium IV processors and the buses 14 A and 14 B may operate at 100/133 MHz.
  • Each of the buses 14 A and 14 B may be coupled to a chip set which includes a host controller 16 and a data controller 18 .
  • the data controller 18 may be effectively a data cross-bar slave device controlled by the host controller 16 .
  • the data controller 18 may be used to store data awaiting transfer from one area of the system 10 to a requesting area of the system 10 . Because of the master/slave relationship between the host controller 16 and the data controller 18 , the chips may be referred to together as the host/data controller 16 , 18 .
  • the host/data controller 16 , 18 is coupled to main memory 20 via a memory bus 22 .
  • the memory 20 may include one or more memory devices, such as dynamic random access memory (DRAM) devices, configured to store data.
  • the memory devices may be configured on one or more memory modules, such as dual inline memory modules (DIMMs). Further, the memory modules may be configured to form a memory array including redundant and/or hot pluggable memory segments.
  • the memory 20 may also include one or more memory controllers (not shown) to coordinate the exchange of requests and data between the memory 20 and a requesting device such as a CPU 12 A- 12 H or I/O device.
  • the host/data controller 16 , 18 is typically coupled to one or more bridges 24 A- 24 C via an Input/Output (I/O) bus 26 .
  • the opposite side of each bridge 24 A- 24 C may be coupled to a respective bus 28 A- 28 C, and a plurality of peripheral devices 30 A and 30 B, 32 A and 32 B, and 34 A and 34 B may be coupled to the respective buses 28 A, 28 B, and 28 C.
  • the bridges 24 A- 24 C may be any of a variety of suitable types, such as PCI, PCI-X, EISA, AGP, etc.
  • the system 10 includes a tag RAM 36 that is configured to store state information corresponding to the state of each system bus, such as the buses 14 A and 14 B.
  • FIG. 2 illustrates a block diagram of the host/data controller 16 , 18 .
  • each of the components illustrated and described with reference to the host controller 16 may have a corresponding companion component in the data controller 18 .
  • the functionality of each component may be described generally with respect to the host controller 16 , which may be configured to receive requests and to coordinate the exchange of requested data through the data controller 18 .
  • the host controller 16 generally coordinates the exchange of requests and data from the processor buses 14 A and 14 B, the I/O bus 26 , and the memory bus 22 .
  • the host controller 16 may include a memory controller MCON that facilitates communication with the memory 20 .
  • the host controller 16 may also include a processor controller PCON for each of the processor and I/O buses 14 A, 14 B, and 26 .
  • the processor controller corresponding to the processor bus 14 A is designated as “PCON 0 .”
  • the processor controller corresponding to the processor bus 14 B is designated as “PCON 1 .”
  • the processor controller corresponding to the I/O bus 26 is designated as “PCON 2 .”
  • each processor controller PCON 0 -PCON 2 serves to connect a respective bus external to the host controller 16 (i.e., processor bus 14 A and 14 B and I/O bus 26 ) to the internal blocks of the host controller 16 .
  • the processor controllers PCON 0 -PCON 2 facilitate the interface from the host controller 16 to each of the buses 14 A, 14 B, and 26 . Further, in an alternate embodiment, a single processor controller PCON may serve as the interface for all of the system buses 14 A, 14 B, and 26 .
  • the processor controllers PCON 0 -PCON 2 may be referred to collectively as “PCON.” Any number of specific designs for the processor controller PCON and the memory controller MCON may be implemented in conjunction with the techniques described herein, as can be appreciated by those skilled in the art.
  • the host controller 16 may also include a tag controller TCON or snoop filter.
  • tag controller TCON and “snoop filter” will be used interchangeably.
  • the tag controller TCON maintains coherency and request cycle ordering in the system 10 .
  • “Cache coherence” refers to a protocol for managing the caches (e.g., caches 13 A- 13 H) and shared system memory 20 to ensure that demand or request accesses to system memory 20 receive data that includes the latest updates. Once the data is received by the requesting agent, it may be stored in a local cache for future accesses (reads and writes). When a subsequent request is made to data previously requested, it may be found in the processor's local cache.
  • the processor will check for a local copy of the data before forwarding the request to the front side bus where it is processed by the host controller (e.g., system 10 ).
  • the tag controller TCON (or snoop filter) is the mechanism that tracks tag and state information, filters snoops and issues castout snoops based on the current request and state information stored in the tag RAM 36 .
  • the tag controller TCON maintains coherency by ordering access to the tag RAM 36 .
  • the tag controller TCON is also tasked with snooping each of the buses 14 A and 14 B and the caches 13 A- 13 H associated with the corresponding buses to retrieve modified data or transfer cacheline ownership between buses 14 A and 14 B.
  • a tag RAM 36 may be provided to identify which data from the main memory is currently stored in each processor cache associated with each memory segment.
  • the tag controller TCON or snoop filter is a mechanism used to reduce bus traffic in certain computer systems, particularly multiple-processor systems.
  • the tag RAM 36 is essentially a specialized cache for storing cache tag and state information of memory cachelines stored in local processor caches 13 A- 13 H of the processors 12 A- 12 H.
  • the snoop filter keeps track of the coherency state of each cache line of each of the processors 12 A- 12 H.
  • the state information in the tag RAM 36 is used by the snoop filter to decide which bus transactions received from the various processors 12 A- 12 H should be passed on to other processors 12 A- 12 H in the system 10 to maintain coherent memory.
  • the snoop filter filters unnecessary bus transactions by preventing them from reaching those processors 12 A- 12 H on adjacent bus segments if coherency can be resolved without accessing those segments. Hence, the snoop filter can have a dramatic positive impact on the overall system performance by reducing snoop traffic on the front side bus.
  • the snoop filter essentially provides a directory to the data stored in the processor caches. For each request, received at the host controller 16 , the address is decomposed into a tag and a direct mapped index. The tag is stored in the tag RAM 36 , along with bus and state identification information. As previously discussed, the tag RAM 36 generally comprises a buffer having a number of indices, wherein each index is configured to store a single tag. Alternatively, each tag index may be configured to store multiple tags. Each time a request accesses a particular tag index and the tag differs from the current tag at that index, the snoop filter runs a castout cycle using the current tag to make room for the new tag.
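The decomposition of a request address into a tag and a direct mapped index can be sketched as follows. The field widths are assumptions chosen for illustration (64-byte cache lines and 4096 tag indices); the description does not specify them, and the function name `decompose` is hypothetical.

```python
LINE_OFFSET_BITS = 6   # assumed 64-byte cache lines
INDEX_BITS = 12        # assumed 4096 tag indices


def decompose(address):
    """Split a physical address into (tag, index) for a direct-mapped lookup."""
    line = address >> LINE_OFFSET_BITS        # drop the byte offset within the line
    index = line & ((1 << INDEX_BITS) - 1)    # low bits select the tag index
    tag = line >> INDEX_BITS                  # remaining high bits form the tag
    return tag, index


tag_a, index_a = decompose(0x12345678)
tag_b, index_b = decompose(0x52345678)
```

Under these assumed widths the two addresses land on the same index with different tags, so in a single-tag-per-index filter the second request would trigger a castout cycle for the first.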
  • each cache line is marked with one of the four MESI states: Modified, Exclusive, Shared or Invalid.
  • the cache lines are marked by encoding two additional bits added to the cache line.
  • the Modified state indicates that a cache line was modified and therefore the underlying data (i.e., the associated data in main memory) is no longer valid. In other words, the data in one of the caches is more recent than the data stored in memory.
  • the Exclusive state indicates that a cache line is only stored in this cache and has not been changed by a write access yet. A copy of data stored in a cache which is in an Exclusive state is writable.
  • the Shared state indicates that a cache line may be stored in other processor caches. Shared state cachelines are generally read-only copies of the data stored in memory.
  • the Invalid state indicates that the data is no longer valid and is no longer present in the cache.
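As a compact illustration of the four states and the two-bit marking mentioned above, the following sketch encodes them as an enumeration. The specific bit patterns and the helper names are assumptions made for illustration; the description only states that two additional bits are used.

```python
from enum import IntEnum


class Mesi(IntEnum):
    # The bit patterns below are illustrative assumptions.
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def memory_is_stale(state):
    """Only a Modified line is more recent than the copy in main memory."""
    return state is Mesi.MODIFIED


def may_be_cached_elsewhere(state):
    """Only a Shared line may also be stored in other processor caches."""
    return state is Mesi.SHARED


stale_states = [s.name for s in Mesi if memory_is_stale(s)]
```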
  • Typical snoop filters for x86 applications track all request allocations, based on MESI protocol. This results in a high number of castouts, which penalize the processor as it continues to access the evicted cacheline.
  • the presently disclosed snoop filter is configured to track only requests allocated to certain states, as previously described (exclusive forced inclusion).
  • a modified MESI protocol is implemented to selectively track Shared state information in an exclusive forced inclusion snoop filter. The criteria for tracking a request agent's Shared state is that the request does not have an associated castout penalty when tracked.
  • a request allocates to Shared (BRLC or BRLD) and the snoop filter determines the request tag is a tagmiss
  • the snoop filter will retain the current tag and state as will be described further below with reference to FIG. 3 and FIG. 4 . This preserves the state history of the current tag and avoids unnecessary castout snoop cycles (castout penalty). Additionally, retaining the current tag and state will preserve any precise state information that has accumulated to the existing tag.
  • BWIL Invalid
  • the snoop filter will retain the existing tag and state history if the castout state is in any state other than Unknown, as described in more detail below with reference to FIG. 6 . If the castout state is unknown, the snoop filter will update the tag and set the state to Invalid. This sets the stage to track a subsequent Shared request should it be a taghit. The following information relevant to the modified protocol states described herein may be helpful.
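The tag-miss rules stated above for Shared (BRLC or BRLD) and Invalid (BWIL) requests can be sketched as a single function. This is a minimal sketch, not the patented implementation: `TagEntry`, `on_tag_miss`, and the string labels for request kinds and states are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    tag: int
    state: str  # "exclusive", "shared", "invalid" or "unknown"


def on_tag_miss(entry, request_kind, request_tag):
    """Return the entry kept at this index after a tag miss.

    request_kind is "shared" (BRLC/BRLD) or "invalid" (BWIL); both labels
    are hypothetical names used only in this sketch.
    """
    if request_kind == "shared":
        # Retain the current tag and state; no castout snoop is issued.
        return entry
    if request_kind == "invalid":
        if entry.state != "unknown":
            return entry  # keep the existing tag and state history
        # Unknown castout state: take the new tag and mark it Invalid.
        return TagEntry(tag=request_tag, state="invalid")
    raise ValueError(f"unhandled request kind: {request_kind}")


kept = on_tag_miss(TagEntry(7, "exclusive"), "shared", 9)
replaced = on_tag_miss(TagEntry(7, "unknown"), "invalid", 9)
```

Requests that allocate to the Exclusive state are deliberately left out here, since their tag-miss handling involves castout snoops and is described with the BRIL flow.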
  • EXCLUSIVE STATE The present snoop filter will track those requests that are allocated to the Exclusive state or the Modified state.
  • the front side bus may not be configured to distinguish between requests that allocate to the Exclusive state or the Modified state and will appear identical to the snoop filter.
  • this embodiment may be implemented by tracking fewer front side bus attributes. This exemplary embodiment is described herein. Accordingly, further references to the “Exclusive state” refer to either an Exclusive or Modified state. Alternatively, the system may be designed such that processor allocation to the Exclusive or Modified state can be tracked independently.
  • the present exemplary snoop filter does not track all requests allocated to the Shared state. Rather, the snoop filter only tracks those requests allocated to the Shared state if the request does not have a castout penalty associated with it. By identifying tags that are Invalid or Exclusive before a request to shared is issued by a processor agent, Shared states can be tracked without causing castout cycles.
  • UNKNOWN STATE Those requests that allocate to Shared state but have a castout penalty associated with them, will not be tracked in the snoop filter and will inherit the default “Unknown state.” This technique results in one or more busses having shared data that is not tracked in the snoop filter.
  • the processor and IO busses are allowed to share the address as long as no bus agent executes a request for exclusive access (BRIL or BWIL).
  • the Invalid state tracks tags taken to the Invalid state by normal program flow, as will be appreciated by those skilled in the art. This includes BWILs that hit an existing tag in the snoop filter or BWLs that take the processor cache to the Invalid state.
  • By tracking shared states under the rules defined in the flow charts illustrated in FIGS. 3-7 , the number of castouts is reduced, allowing the processor to continue accessing the cacheline from its internal cache. Furthermore, since the majority of requests allocate to the Shared state without associated penalty, the snoop filter is available to track exclusive states or modified states (“Exclusive state”) and limited shared allocations (“Shared state”), thereby increasing the apparent size and efficiency of the snoop filter. As discussed above, if shared states are to be tracked, the snoop filter must guarantee that the bus state information is accurate. In other words, if the snoop filter tracks a Shared state on one or more buses (e.g., 14 A and 14 B) for a single address, the remaining buses must be invalid.
  • the snoop filter identifies when an address is known to be invalid or exclusive (i.e., modified/exclusive). From this point, the state tracker can accurately track (under the modified state definitions), the Shared state.
  • a requesting device such as the CPU 12 D or the peripheral device 30 A, for example, may initiate read requests to the host controller 16 .
  • the respective processor controller PCON sends the request to the memory controller MCON and the tag controller TCON.
  • the memory controller MCON passes the request to the memory 20 to obtain the requested data.
  • the tag controller TCON may send a tag lookup request to the tag RAM 36 to determine whether the requested data is currently stored in one of the processor caches 13 A- 13 H.
  • the tag state information indicates an exclusive or modified state (Owned State) on a remote bus
  • the tag controller TCON will issue a snoop cycle. If a tag match is found (HIT#), the remote bus will return modified data which will be reconciled with the original request to memory before the data is returned to the requester.
  • the snoop filter applies forced inclusion to exclusive/modified (owned) cycles.
  • If requests are allocating to the shared state, they will not be tracked in the snoop filter unless there is no penalty associated with tracking the request.
  • the tag controller TCON will force requests to the shared state whenever possible. This is accomplished on the processor bus by asserting HIT# in the snoop phase on all read commands (e.g., Bus Read Line Code and Bus Read Line Data) that support the HIT# snoop response.
  • a snoop will be issued to demote the remote bus to the shared state. If a read command does not support the HIT# snoop response, the processors 12 A- 12 H are configured to take the cache line owned (exclusive/modified). For these cases, the snoop filter 36 will always issue an invalidate snoop to the adjacent processor bus 14 A or 14 B to invalidate a shared copy of the cache line that is potentially cached by one of the processors 12 A- 12 H on that bus.
  • the snoop filter 36 will either (1) issue a snoop to the I/O bus 26 to invalidate a potentially shared copy or (2) actually track I/O reads in the snoop filter and only issue downstream snoops if the address associated with the processor read request matches the I/O address in the snoop filter.
  • snoops are only required if the read address hits an exclusive address in the snoop filter 36 . In this case a snoop will be issued to the bus specified in the tag cache to invalidate the processor cache 13 A- 13 H and retrieve modified data if necessary.
  • I/O cycles allocate to the shared state so they do not need to snoop the processor buses 14 A and 14 B as long as the processor buses are invalid or shared and have no associated penalty.
  • FIG. 3 is a flow chart illustrating the handling of a Bus Read Line Data (BRLD) request.
  • the tag controller TCON determines whether the request is in an exclusive state, an invalid state, a shared state or an unknown state, as indicated in block 48 . If the request is in the exclusive state, the tag controller TCON checks the bus-state information stored in the tag RAM 36 to determine whether the cache line indicates exclusive ownership on a remote bus (a bus other than the bus initiating the request) or indicates exclusive ownership on the same bus as the request, as indicated in block 49 . If the snoop filter bus-state information indicates exclusive ownership on a remote bus, a snoop is issued to demote the exclusive state to shared in the remote-bus CPU cache, as indicated in block 50 .
  • the snoop filter will monitor the snoop phase of the snooped bus to see if a processor agent asserts HIT#, indicating its intention of keeping a shared copy of the data cached. Following this snoop cycle, the tag controller TCON asserts HIT# on the request bus, as indicated in block 51 . Next, the snoop filter will determine whether HIT# is asserted on the remote bus, as indicated in block 52 . If not, the snoop filter tracks the request bus as shared, as indicated in block 53 . If HIT# is asserted on the remote bus, all bus states are known to be shared and both buses (remote and request) are tracked as shared in the snoop filter, as indicated in block 54 .
  • If the existing exclusive agent asserts HIT#, then the shared state on the requesting bus is tracked in the snoop filter, as indicated in block 57 . If the existing exclusive agent does not assert HIT#, then the requesting agent is allowed to allocate to the exclusive state and is tracked as such in the snoop filter, as indicated in block 58 .
  • the tag controller TCON asserts the HIT# on the request bus, as indicated in block 59 .
  • the tag controller TCON determines whether the cache line is shared on the remote bus or shared on the request bus, as indicated in block 60 . If the cacheline corresponding to the request is not shared on the remote bus (and thus is shared on the request bus), the shared state is tracked in the snoop filter on the request bus, as indicated in block 62 . If the cache line is in the shared state on the remote bus, the snoop filter may issue a snoop to the remote bus and monitor the snoop response to determine the next state. The algorithm illustrated in FIG. 3 does not issue a snoop to the remote Shared bus. Consequently, the remote Shared bus must retain the Shared state to remain coherent, as indicated in block 64 .
  • the tag controller TCON asserts the HIT# on the request bus, as indicated in block 66 .
  • the shared state is tracked in the snoop filter on the request bus, as indicated by block 67 .
  • the tag controller TCON asserts HIT#, as indicated in block 68 and the tag state is retained as unknown, as indicated in block 69 .
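The remote-exclusive branch of the BRLD flow described above (blocks 49 through 54) can be sketched as follows. This is an illustrative sketch only: the function name, the action labels, and the bus names are hypothetical. The description does not say what is recorded for the remote bus when it does not assert HIT#, so the sketch simply leaves that bus out of the tracked set.

```python
def brld_remote_exclusive(request_bus, remote_bus, remote_asserts_hit):
    """Return (actions, tracked) for a BRLD that hits a remote-exclusive tag."""
    actions = [
        ("snoop_demote_to_shared", remote_bus),  # block 50
        ("assert_hit", request_bus),             # block 51
    ]
    if remote_asserts_hit:
        # Block 54: all bus states are known, both buses tracked as shared.
        tracked = {request_bus: "shared", remote_bus: "shared"}
    else:
        # Block 53: only the request bus is tracked as shared.
        tracked = {request_bus: "shared"}
    return actions, tracked


actions, both = brld_remote_exclusive("14A", "14B", remote_asserts_hit=True)
_, only_request = brld_remote_exclusive("14A", "14B", remote_asserts_hit=False)
```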
  • FIG. 4 is a flow chart illustrating the processing of a Bus Read Line Code (BRLC) request.
  • If the tag controller TCON determines that there is a tag match (block 74 ), the tag controller will determine whether the request is in an exclusive state, an invalid state, a shared state or an unknown state, as indicated in block 78 . If the request is in the exclusive state, the tag controller TCON checks the bus-state information stored in the tag RAM 36 to determine whether the cache line indicates exclusive ownership on a remote bus (a bus other than the bus initiating the request) or indicates exclusive ownership on the same bus as the request, as indicated in block 80 .
  • the snoop filter bus-state information indicates exclusive ownership on a remote bus
  • a snoop is issued to demote the exclusive state to shared in the remote-bus CPU cache, as indicated in block 81
  • the snoop filter will determine whether HIT# is asserted on the remote bus, as indicated in block 82 . If not, the snoop filter tracks the request bus as shared, as indicated in block 83 . If HIT# is asserted on the remote bus, all bus states are known to be shared and both buses (remote and request) are tracked as shared in the snoop filter, as indicated in block 84 .
  • the tag controller TCON determines whether the cache line is shared on the remote bus or shared on the request bus, as indicated in block 86 . If the cache line corresponding to the request is not shared on the remote bus (and thus is shared on the request bus), the shared state is tracked in the snoop filter on the request bus, as indicated in block 88 . If the cache line is in the shared state on the remote bus, the shared state is tracked in the snoop filter on both the remote bus and the request bus, as indicated in block 90 .
  • the shared state is tracked in the snoop filter on the request bus, as indicated by block 92 .
  • the tag controller TCON asserts HIT#, as indicated in block 94 and the tag is retained in the unknown state, as indicated in block 96 .
  • FIG. 5 is a flow chart illustrating the handling of a Bus Read and Invalidate Line (BRIL) request.
  • the snoop filter 36 tracks BRIL requests as exclusive in the tag RAM 36 since the requesting CPU 12 A- 12 H will allocate the request address as either exclusive or modified.
  • When a BRIL request is executed by a CPU 12 A- 12 H, it is received at the tag controller TCON, as indicated in block 100 .
  • the tag controller TCON will search the snoop filter for the request address, as indicated in block 102 .
  • the tag controller TCON will determine whether there is a tag match (i.e., whether the request address is currently being tracked in the tag RAM 36 ), as indicated in block 104 .
  • the tag controller TCON will determine whether the request is in an exclusive state, an invalid state, a shared state or an unknown state, as indicated in block 106 . If the request is in the exclusive state, the tag controller TCON checks the bus-state information stored in the tag RAM 36 to determine whether the cache line indicates exclusive ownership on a remote bus (a bus other than the bus initiating the request) or indicates exclusive ownership on the same bus as the request, as indicated in block 108 . If the snoop filter bus-state information indicates exclusive ownership on a remote bus, a snoop is issued to demote the state from exclusive to invalid on the remote-bus CPU cache, as indicated in block 110 .
  • the snoop filter tracks the request bus as exclusive in the tag RAM 36 , as indicated in block 112 . If a tag match occurs with an exclusive state and the snoop filter bus-state information indicates exclusive ownership on the same bus as the request (requesting bus), no snoops are issued, because the bus is a self-snooping bus. The snoop filter tracks the request bus as exclusive in the tag RAM 36 , as indicated in block 112 .
  • the tag controller TCON determines whether the cache line is shared on the remote bus or shared on the request bus, as indicated in block 116 . If the cache line corresponding to the request is not shared on the remote bus (and thus is shared on the request bus), the request bus is tracked as exclusive in the tag RAM 36 , as indicated in block 120 . If the cache line is in the shared state on the remote bus, a snoop is issued to demote the state from shared to invalid on the remote-bus CPU cache, as indicated in block 118 . Following this snoop cycle, the snoop filter tracks the request bus as exclusive in the tag RAM 36 , as indicated in block 120 . Finally, if the request is in the invalid state, the request bus is tracked as exclusive in the tag RAM 36 , as indicated by block 121 .
  • invalidate-snoops are issued to any bus that is remote to the request bus using the request tag address, as indicated in block 128 .
  • the purpose for the snoops is to invalidate potential shared states on these remote buses.
  • the request bus is self-snooping and therefore does not receive a specific snoop.
  • the snoop filter tracks the request bus as exclusive in the tag RAM 36 , as indicated in block 129 .
  • the tag controller may determine that there is no tag match. If a tag match does not occur, the tag controller (TCON) will determine whether the tag state is Exclusive, as indicated in block 122 . If the castout state is Exclusive, the bus segment specified by the Exclusive state bits (may be the request bus) is snoop invalidated with the castout tag, as indicated in block 124 . Next, invalidate-snoops are issued to any bus that is remote to the request bus using the request tag address, as indicated in block 126 . The purpose for the snoops is to invalidate potential shared states on these remote buses. As will be appreciated, the request bus is self-snooping and therefore does not receive a specific snoop.
  • the snoop filter tracks the request bus as exclusive in the tag RAM 36 , as indicated in block 127 . If the state associated with the castout tag is any state other than Exclusive, no castout snoops are issued. In either case, all remote busses are snoop invalidated using the request address to invalidate potentially shared data in agent caches.
  • the tag controller TCON assumes that one or more of the remote buses has the same address as the request cached in the shared state. Since BRILs always allocate to exclusive or modified, snoops are issued to all remote buses using the request tag to invalidate their caches, as indicated in block 126 . Finally, the snoop filter tracks the request bus as exclusive in the tag RAM 36 , as indicated in block 127 .
  • FIG. 6 is a flow chart illustrating the handling of a Bus Write and Invalidate Line (BWIL) request.
  • the snoop filter 36 tracks BWIL requests as invalid since the requesting CPU 12A-12H will allocate the request address as invalid after a BWIL request.
  • When a BWIL request is executed by a CPU 12A-12H, it is received at the tag controller TCON, as indicated in block 130.
  • the tag controller TCON will search the snoop filter for the request address, as indicated in block 132 .
  • the tag controller TCON will determine whether there is a tag match (i.e., whether the request address is currently being tracked in the snoop filter 36 ), as indicated in block 134 .
  • the tag controller TCON will determine whether the request is in an exclusive state, an invalid state, a shared state or an unknown state, as indicated in block 136 . If the request is in the exclusive state, the tag controller TCON checks the bus-state information stored in the tag RAM 36 to determine whether the cache line indicates exclusive ownership on a remote bus (a bus other than the bus initiating the request) or indicates exclusive ownership on the same bus as the request, as indicated in block 138 . If the snoop filter bus-state information indicates exclusive ownership on a remote bus, a snoop is issued to demote the state from exclusive to invalid on the remote-bus CPU cache, as indicated in block 140 .
  • the snoop filter tracks all busses as invalid in the tag RAM 36 , as indicated in block 142 .
  • the snoop filter tracks all busses as invalid in the tag RAM 36 , as indicated in block 142 .
  • the tag controller TCON determines whether the cache line is shared on the remote bus or shared on the request bus, as indicated in block 148 . If the cache line is in the shared state on the remote bus, a snoop is issued to demote the state from shared to invalid on the remote-bus CPU cache, as indicated in block 150 . Following this snoop cycle, the snoop filter tracks all busses as invalid in the tag RAM 36 , as indicated in block 152 .
  • the snoop filter tracks all busses as invalid in the tag RAM 36 , as indicated in block 152 . If the request is in the invalid state, the snoop filter continues to track all busses as invalid in the tag RAM 36 , as indicated by block 154 . Finally, if the request is in the unknown state, invalidate-snoops are issued to any bus that is remote to the request bus using the request tag address, as indicated in block 162 , and the snoop filter tracks all busses as invalid in the tag RAM 36 , as indicated in block 164 .
  • the tag controller may determine that there is no tag match. If a tag match does not occur, invalidate-snoops are issued to any bus that is remote to the request bus using the request tag address to invalidate potentially shared copies of the data in processor caches, as indicated in block 156 . Next, it is determined whether the state of the existing tag RAM is unknown, as indicated in block 157 . If the current tag state is Unknown, the snoop filter will track the request tag state as Invalid, as indicated in block 158 . If the state of the existing tag RAM is not unknown, the snoop filter continues to track the existing tag and its current state in the tag RAM 36 , as indicated by block 160 .
  • FIG. 7 is a flow chart illustrating the handling of a Bus Write Line (BWL) request.
  • the snoop filter 36 tracks BWL requests as invalid or exclusive/modified, depending on the CPU architecture (i.e., whether the CPU cache is inclusive or not).
  • When a BWL request is executed by a CPU 12A-12H, it is received at the tag controller TCON, as indicated in block 170.
  • the tag controller TCON will search the snoop filter for the request address, as indicated in block 172 .
  • a tag match will always occur with the snoop filter bus-state information indicating exclusive ownership on the same bus as the request, as indicated in block 174 .
  • the snoop filter will update in one of two ways depending on whether the CPU demotes to invalid, as indicated in block 176 . If the CPU demotes to invalid following a BWL request, the snoop filter 36 is updated to reflect an invalid state across all buses, as indicated in block 178 . If the CPU is allowed to remain at the exclusive or modified state following a BWL, no updates are made to the snoop filter state information and the address will continue to be tracked as exclusive in the snoop filter, as indicated in block 180 .
  • the ordered listing can be embodied in a computer-readable medium for use by or in connection with a computer-based system that can retrieve the instructions and execute them to carry out the previously described processes.
  • the computer-readable medium can be a means that can contain, store, communicate, propagate, transmit or transport the instructions.
  • the computer readable medium can be an electronic, a magnetic, an optical, an electromagnetic, or an infrared system, apparatus, or device.
  • An illustrative, but non-exhaustive list of computer-readable mediums can include an electrical connection (electronic) having one or more wires, a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, and a portable compact disk read-only memory (CDROM).
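The BRIL handling described above with reference to FIG. 5 may be easier to follow as a short sketch. The following Python is illustrative only and is not part of the disclosed apparatus: the `Entry` record, the string state names, the integer bus identifiers, and the `handle_bril` function are assumptions introduced for this example. It returns the invalidate snoops that would be issued and the entry that would be tracked afterwards.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical tag RAM entry: one tag, one state, and the buses holding it."""
    tag: int
    state: str              # "exclusive", "shared", "invalid" or "unknown"
    buses: FrozenSet[int]   # buses the state applies to


def handle_bril(entry: Optional[Entry], request_tag: int, request_bus: int,
                all_buses: List[int]) -> Tuple[List[Tuple[int, int]], Entry]:
    """Return (snoops, new_entry); each snoop is (bus, tag) to invalidate."""
    remote = [b for b in all_buses if b != request_bus]
    snoops: List[Tuple[int, int]] = []
    if entry is not None and entry.tag == request_tag:
        # Tag match: the request bus is self-snooping, so only remote buses
        # that may hold the line receive an invalidate snoop.
        if entry.state in ("exclusive", "shared"):
            snoops = [(b, request_tag) for b in sorted(entry.buses)
                      if b != request_bus]
        elif entry.state == "unknown":
            snoops = [(b, request_tag) for b in remote]
        # "invalid": no snoops are needed.
    else:
        # Tag miss: an Exclusive castout tag is invalidated on its owning bus
        # (which may be the request bus); other castout states need no snoop.
        if entry is not None and entry.state == "exclusive":
            snoops += [(b, entry.tag) for b in sorted(entry.buses)]
        # In either case, remote buses may hold untracked shared copies.
        snoops += [(b, request_tag) for b in remote]
    # BRILs always allocate to exclusive or modified on the request bus.
    return snoops, Entry(request_tag, "exclusive", frozenset({request_bus}))
```

In every path the request bus ends up tracked as exclusive, which mirrors blocks 112, 120, 121, 127 and 129 of the flow chart.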

Abstract

Method and apparatus for reducing castouts in a snoop filter. More specifically, there is provided a system comprising a plurality of buses, one or more processors coupled to each of the plurality of buses and a snoop filter. The snoop filter is configured to eliminate unnecessary snoops of the plurality of buses, and is further configured to track requests from the one or more processors only if tracking the request does not result in a castout penalty.

Description

BACKGROUND OF THE INVENTION
This section is intended to introduce the reader to various aspects of art which may be related to various aspects of the present invention which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
With the advent of standardized architectures and operating systems, computers have become virtually indispensable for a wide variety of uses from business applications to home computing. Whether a computer system is a personal computer or a network of computers connected via a server interface, computers today rely on processors, associated chip sets, and memory chips to perform most of the processing functions, including the processing of system requests. The more complex the system architecture, the more difficult it becomes to process requests in the system efficiently. Despite the increasing complexity of system architectures, demands for improved request processing speed continue to drive system design. Designers are often challenged with finding ways to reduce the cycle time for accessing data and processing requests.
Some systems include multiple processing units or microprocessors connected via a processor bus. By implementing multiple processors, system processing efficiency is improved by providing a system that is able to simultaneously process requests. To coordinate the exchange of information among the processors, a host/data controller is generally provided. The host/data controller is further tasked with coordinating the exchange of information between the plurality of processors and the system memory. The host/data controller may be responsible not only for the exchange of information in the typical Read-Only Memory (ROM) and the Random Access Memory (RAM), but also the cache memory in high speed systems. Cache memory is a special high speed storage mechanism which may be provided as a reserved section of the main memory or as an independent high-speed storage device. Essentially, the cache memory is a portion of the RAM which is typically made of high speed static RAM (SRAM) rather than the slower and cheaper dynamic RAM (DRAM) which may be used for the remainder of the main memory. Alternatively, each processor may have an associated cache memory. By storing frequently accessed data in the cache memory, the processor avoids having to re-access the shared memory each time the information is needed.
For multiprocessor and multibus shared memory systems, bus sniffing or bus snooping may be implemented to maintain system memory coherency. For bus sniffing/bus snooping techniques, an algorithm or apparatus should be designed to make data changes by any agent visible to demand requests from any other agent. That is to say that in order to maintain coherency, each time a processor issues a request for memory data, the other processor caches may need to be searched for copies of that data, depending on the type of request, to ensure that only the most up to date information is used. Some aspects of this apparatus are provided by the processor architecture. For example, X86 architecture maintains coherency across different levels of processor cache. The X86 architecture front side bus definition also deploys a self snooping protocol for agents that share the same bus. If more than one bus segment is supported in the system, a system level solution should be implemented to maintain coherency across a multitude of bus segments. Snoop filters or tag caches are a common solution for coordinating system level coherency across multiple bus segments. One of the primary goals of an efficient snoop filter design is to minimize the number of unnecessary snoops to preserve front side bus bandwidth for request and data traffic. This includes request snoops required to retrieve the most recent data or provide an agent exclusive access to data and “castout” snoops required to make space in the tag cache for a forced inclusion snoop filter.
A typical snoop filter is implemented using a direct mapped policy where the tag cache can track only one tag at a given tag index. Each time a request accesses a particular tag index and the tag differs from the current tag at the index, the snoop filter runs a castout cycle using the current tag, to make room for the new tag. The castout cycle runs a back invalidation to the processor bus(es) being tracked by the snoop filter. If the processor needs the evicted cacheline again, it is forced to fetch the cache line from memory instead of its own cache. This results in a performance penalty, as the latency to an internal cache running at core clock speed compared to the latency to the main memory running at system bus clock speed can be an order of magnitude in difference.
The present invention may address one or more of the problems set forth above.
BRIEF DESCRIPTION OF THE DRAWINGS
Advantages of the invention may become apparent upon reading the following detailed description and upon reference to the drawings in which:
FIG. 1 is a block diagram illustrating an exemplary computer system having a multiple processor bus architecture according to the embodiments of the present invention;
FIG. 2 is a block diagram illustrating an exemplary host controller in accordance with embodiments of the present invention; and
FIGS. 3-7 are respective flow charts illustrating an improved method of processing various request types in a computer system in accordance with embodiments of the present invention.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
One or more specific embodiments of the present invention will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
In accordance with embodiments of the present invention, a new approach to snoop filters is provided. In accordance with the present techniques, the basic operation of the filter described herein is to force inclusion on exclusive access only and not track shared accesses. As long as the cached data remains in the shared state, the data will not be evicted by the snoop filter algorithm. The penalty of this technique is that exclusive requests must invalidate all bus segments to ensure shared data is removed from all processor caches except the request agent. To reduce the number of castouts with exclusive forced inclusion, a method is described that allows the snoop filter to track shared accesses until a penalty is encountered. As used herein, the “castout penalty” refers to the occurrence of a castout snoop on a shared access with no net gain in precise snoop filter state information. In accordance with embodiments of the present invention, if a shared request causes a snoop filter castout that discards precise state information associated with the castout tag, the castout tag is retained, the castout snoop is canceled and the shared request is dropped from the snoop filter, as described further below.
Turning now to the drawings and referring initially to FIG. 1, a block diagram of an exemplary computer system with multiple processor buses and an I/O bus, generally designated as reference numeral 10, is illustrated. The computer system 10 typically includes one or more processors or CPUs. In the exemplary embodiment, the system 10 may utilize eight CPUs 12A-12H. Each CPU 12A-12H may include a respective cache memory 13A-13H for storing recently accessed information. The system 10 may utilize a split-bus configuration in which the CPUs 12A-12D are coupled to a first bus 14A and the CPUs 12E-12H are coupled to a second bus 14B. It should be understood that the processors or CPUs 12A-12H may be of any suitable type, such as a microprocessor available from Intel, AMD, or Motorola, for example. Each CPU 12A-12H may include a segment of cache memory for storage of frequently accessed data and programs. Furthermore, any suitable bus configuration may be coupled to the CPUs 12A-12H, such as a single bus, a split-bus (as illustrated), or individual buses. By way of example, the exemplary system 10 may utilize Intel Pentium IV processors and the buses 14A and 14B may operate at 100/133 MHz.
Each of the buses 14A and 14B may be coupled to a chip set which includes a host controller 16 and a data controller 18. In this embodiment, the data controller 18 may be effectively a data cross-bar slave device controlled by the host controller 16. The data controller 18 may be used to store data awaiting transfer from one area of the system 10 to a requesting area of the system 10. Because of the master/slave relationship between the host controller 16 and the data controller 18, the chips may be referred to together as the host/data controller 16, 18.
The host/data controller 16, 18 is coupled to main memory 20 via a memory bus 22. The memory 20 may include one or more memory devices, such as dynamic random access memory (DRAM) devices, configured to store data. The memory devices may be configured on one or more memory modules, such as dual inline memory modules (DIMMs). Further, the memory modules may be configured to form a memory array including redundant and/or hot pluggable memory segments. The memory 20 may also include one or more memory controllers (not shown) to coordinate the exchange of requests and data between the memory 20 and a requesting device such as a CPU 12A-12H or I/O device.
The host/data controller 16, 18 is typically coupled to one or more bridges 24A-24C via an Input/Output (I/O) bus 26. The opposite side of each bridge 24A-24C may be coupled to a respective bus 28A-28C, and a plurality of peripheral devices 30A and 30B, 32A and 32B, and 34A and 34B may be coupled to the respective buses 28A, 28B, and 28C. The bridges 24A-24C may be any of a variety of suitable types, such as PCI, PCI-X, EISA, AGP, etc. Finally, as described further below with reference to FIG. 2, the system 10 includes a tag RAM 36 that is configured to store state information corresponding to the state of each system bus, such as the buses 14A and 14B.
FIG. 2 illustrates a block diagram of the host/data controller 16, 18. As can be appreciated, each of the components illustrated and described with reference to the host controller 16 may have a corresponding companion component in the data controller 18. The functionality of each component may be described generally with respect to the host controller 16, which may be configured to receive requests and to coordinate the exchange of requested data through the data controller 18. The host controller 16 generally coordinates the exchange of requests and data from the processor buses 14A and 14B, the I/O bus 26, and the memory bus 22.
The host controller 16 may include a memory controller MCON that facilitates communication with the memory 20. The host controller 16 may also include a processor controller PCON for each of the processor and I/O buses 14A, 14B, and 26. For simplicity, the processor controller corresponding to the processor bus 14A is designated as “PCON0.” The processor controller corresponding to the processor bus 14B is designated as “PCON1.” The processor controller corresponding to the I/O bus 26 is designated as “PCON2.” Essentially, each processor controller PCON0-PCON2 serves to connect a respective bus external to the host controller 16 (i.e., processor bus 14A and 14B and I/O bus 26) to the internal blocks of the host controller 16. Thus, the processor controllers PCON0-PCON2 facilitate the interface from the host controller 16 to each of the buses 14A, 14B, and 26. Further, in an alternate embodiment, a single processor controller PCON may serve as the interface for all of the system buses 14A, 14B, and 26. The processor controllers PCON0-PCON2 may be referred to collectively as “PCON.” Any number of specific designs for the processor controller PCON and the memory controller MCON may be implemented in conjunction with the techniques described herein, as can be appreciated by those skilled in the art.
The host controller 16 may also include a tag controller TCON or snoop filter. As used herein, the terms “tag controller TCON” and “snoop filter” will be used interchangeably. Generally, the tag controller TCON maintains coherency and request cycle ordering in the system 10. “Cache coherence” refers to a protocol for managing the caches (e.g., caches 13A-13H) and shared system memory 20 to ensure that demand or request accesses to system memory 20 receive data that includes the latest updates. Once the data is received by the requesting agent, it may be stored in a local cache for future accesses (reads and writes). When a subsequent request is made to data previously requested, it may be found in the processor's local cache. The processor will check for a local copy of the data before forwarding the request to the front side bus where it is processed by the host controller (e.g., system 10). Generally, the tag controller TCON (or snoop filter) is the mechanism that tracks tag and state information, filters snoops and issues castout snoops based on the current request and state information stored in the tag RAM 36. The tag controller TCON maintains coherency by ordering access to the tag RAM 36. The tag controller TCON is also tasked with snooping each of the buses 14A and 14B and the caches 13A-13H associated with the corresponding buses to retrieve modified data or transfer cacheline ownership between buses 14A and 14B.
As previously described, a tag RAM 36 may be provided to identify which data from the main memory is currently stored in each processor cache associated with each memory segment. The tag controller TCON or snoop filter is a mechanism used to reduce bus traffic in certain computer systems, particularly multiple-processor systems. The tag RAM 36 is essentially a specialized cache for storing cache tag and state information of memory cachelines stored in local processor caches 13A-13H of the processors 12A-12H. The snoop filter keeps track of the coherency state of each cache line of each of the processors 12A-12H. The state information in the tag RAM 36 is used by the snoop filter to decide which bus transactions received from the various processors 12A-12H should be passed on to other processors 12A-12H in the system 10 to maintain coherent memory. The snoop filter filters unnecessary bus transactions by preventing them from reaching those processors 12A-12H on adjacent bus segments if coherency can be resolved without accessing those segments. Hence, the snoop filter can have a dramatic positive impact on the overall system performance by reducing snoop traffic on the front side bus.
As previously described, the snoop filter essentially provides a directory to the data stored in the processor caches. For each request, received at the host controller 16, the address is decomposed into a tag and a direct mapped index. The tag is stored in the tag RAM 36, along with bus and state identification information. As previously discussed, the tag RAM 36 generally comprises a buffer having a number of indices, wherein each index is configured to store a single tag. Alternatively, each tag index may be configured to store multiple tags. Each time a request accesses a particular tag index and the tag differs from the current tag at that index, the snoop filter runs a castout cycle using the current tag to make room for the new tag. Whenever the snoop filter evicts a cacheline, an invalidate cycle is issued to invalidate the processor caches, forcing subsequent cycles to retrieve data from system memory 20, rather than the internal cache 13A-13H. This results in performance penalty as the latency to an internal cache 13A-13H running at core clock speed is much smaller than the latency to system memory 20 running at system bus clock speeds.
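As a minimal sketch of the direct mapped lookup just described, the following Python splits a request address into a tag and an index and reports the castout tag when a different tag already occupies the index. The field widths (64-byte cache lines, 1024 indices) and the names `split_address` and `DirectMappedTagRam` are assumptions chosen for illustration; the disclosure does not specify them. This models the conventional behavior in which every tag miss causes a castout.

```python
from typing import List, Optional, Tuple

LINE_BITS = 6     # assumed: 64-byte cache lines
INDEX_BITS = 10   # assumed: 1024 tag indices


def split_address(addr: int) -> Tuple[int, int]:
    """Decompose a request address into (tag, direct mapped index)."""
    line = addr >> LINE_BITS                  # drop the offset within the line
    index = line & ((1 << INDEX_BITS) - 1)    # low bits select the tag index
    tag = line >> INDEX_BITS                  # remaining high bits form the tag
    return tag, index


class DirectMappedTagRam:
    """Hypothetical tag store holding a single tag per index."""

    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None] * (1 << INDEX_BITS)

    def access(self, addr: int) -> Optional[int]:
        """Track addr; return the castout tag if a different tag was evicted."""
        tag, index = split_address(addr)
        current = self.tags[index]
        self.tags[index] = tag
        if current is not None and current != tag:
            return current   # a castout (back invalidation) cycle would run
        return None
```

Two addresses that differ only above the index bits map to the same index, so accessing the second one evicts the first and returns its tag as the castout.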
One commonly used standard for maintaining cache coherence is known as the “MESI protocol.” In accordance with the MESI protocol, each cache line is marked with one of the four MESI states: Modified, Exclusive, Shared or Invalid. The cache lines are marked by encoding two additional bits added to the cache line. As will be appreciated, the Modified state indicates that a cache line was modified and therefore the underlying data (i.e., the associated data in main memory) is no longer valid. In other words, the data in one of the caches is more recent than the data stored in memory. The Exclusive state indicates that a cache line is only stored in this cache and has not been changed by a write access yet. A copy of data stored in a cache which is in an Exclusive state is writable. The Shared state indicates that a cache line may be stored in other processor caches. Shared state cachelines are generally read-only copies of the data stored in memory. The Invalid state indicates that the data is no longer valid and is no longer present in the cache.
Typical snoop filters for x86 applications track all request allocations, based on MESI protocol. This results in a high number of castouts, which penalize the processor as it continues to access the evicted cacheline. In order to reduce the number of castouts (or “evictions”), the presently disclosed snoop filter is configured to track only requests allocated to certain states, as previously described (exclusive forced inclusion). In accordance with embodiments of the present invention, a modified MESI protocol is implemented to selectively track Shared state information in an exclusive forced inclusion snoop filter. The criterion for tracking a request agent's Shared state is that the request does not have an associated castout penalty when tracked. If a request allocates to Shared (BRLC or BRLD) and the snoop filter determines the request tag is a tagmiss, the snoop filter will retain the current tag and state as will be described further below with reference to FIG. 3 and FIG. 4. This preserves the state history of the current tag and avoids unnecessary castout snoop cycles (castout penalty). Additionally, retaining the current tag and state will preserve any precise state information that has accumulated to the existing tag. If a request allocates to Invalid (BWIL) and the snoop filter determines the request is a tagmiss, the snoop filter will retain the existing tag and state history if the castout state is in any state other than Unknown, as described in more detail below with reference to FIG. 6. If the castout state is Unknown, the snoop filter will update the tag and set the state to Invalid. This sets the stage to track a subsequent Shared request should it be a taghit. The following information relevant to the modified protocol states described herein may be helpful.
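The tagmiss rules in the preceding paragraph can be summarized in a few lines. The Python below is a sketch under stated assumptions, not the disclosed implementation: the `State` enumeration, the `on_tag_miss` function and the use of strings for request types are hypothetical names introduced here. Only the request types covered by the paragraph (BRLC, BRLD and BWIL) are modeled.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    EXCLUSIVE = "exclusive"   # covers an Exclusive or Modified allocation
    SHARED = "shared"
    INVALID = "invalid"
    UNKNOWN = "unknown"


SHARED_REQUESTS = {"BRLC", "BRLD"}   # requests that allocate to Shared


def on_tag_miss(request: str, request_tag: int,
                current_tag: int, current_state: State) -> Tuple[int, State]:
    """Return the (tag, state) kept at the index after a tag miss."""
    if request in SHARED_REQUESTS:
        # Retain the current tag and its state history; no castout snoop.
        return current_tag, current_state
    if request == "BWIL":
        if current_state is State.UNKNOWN:
            # Nothing precise to lose: track the request tag as Invalid.
            return request_tag, State.INVALID
        return current_tag, current_state
    raise ValueError("request type not modeled in this sketch: " + request)
```

For example, a BRLD miss against an Exclusive entry leaves that entry untouched, whereas a BWIL miss against an Unknown entry replaces it with the request tag in the Invalid state.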
EXCLUSIVE STATE: The present snoop filter will track those requests that are allocated to the Exclusive state or the Modified state. In accordance with one exemplary embodiment, the front side bus may not be configured to distinguish between requests that allocate to the Exclusive state or the Modified state and will appear identical to the snoop filter. Advantageously, this embodiment may be implemented by tracking fewer front side bus attributes. This exemplary embodiment is described herein. Accordingly, further references to the “Exclusive state” refer to either an Exclusive or Modified state. Alternatively, the system may be designed such that processor allocation to the Exclusive or Modified state can be tracked independently.
SHARED STATE: Further, unlike prior snoop filters, the present exemplary snoop filter does not track all requests allocated to the Shared state. Rather, the snoop filter only tracks those requests allocated to the Shared state if the request does not have a castout penalty associated with it. By identifying tags that are Invalid or Exclusive before a request to shared is issued by a processor agent, Shared states can be tracked without causing castout cycles.
UNKNOWN STATE: Those requests that allocate to the Shared state but have a castout penalty associated with them will not be tracked in the snoop filter and will inherit the default “Unknown state.” This technique results in one or more busses having shared data that is not tracked in the snoop filter. The processor and IO busses are allowed to share the address as long as none of the bus agents executes a request for exclusive access (BRIL or BWIL). As used herein, anytime a tag is referred to as being “dropped” from the snoop filter, it is said to be in the Unknown state.
INVALID STATE: Further, in accordance with the present techniques, the Invalid state tracks tags taken to the Invalid state by normal program flow, as will be appreciated by those skilled in the art. This includes BWILs that hit an existing tag in the snoop filter or BWLs that take the processor cache to the Invalid state.
By tracking shared states under the rules defined in the flow charts illustrated in FIGS. 3-7, the number of castouts is reduced, allowing the processor to continue accessing the cacheline from its internal cache. Furthermore, since the majority of requests allocate to the Shared state without associated penalty, the snoop filter is available to track exclusive states or modified states (“Exclusive state”) and limited shared allocations (“Shared state”), thereby increasing the apparent size and efficiency of the snoop filter. As discussed above, if shared states are to be tracked, the snoop filter must guarantee that the bus state information is accurate. In other words, if the snoop filter tracks a Shared state on one or more buses (e.g., 14A and 14B) for a single address, the remaining buses must be invalid. To ensure the accuracy of the bus state tracker when tracking the Shared states, the snoop filter identifies when an address is known to be invalid or exclusive (i.e., modified/exclusive). From this point, the state tracker can accurately track (under the modified state definitions) the Shared state.
During a typical read operation, a requesting device such as the CPU 12D or the peripheral device 30A, for example, may initiate read requests to the host controller 16. The respective processor controller PCON sends the request to the memory controller MCON and the tag controller TCON. The memory controller MCON passes the request to the memory 20 to obtain the requested data. Concurrently, the tag controller TCON may send a tag lookup request to the tag RAM 36 to determine whether the requested data is currently stored in one of the processor caches 13A-13H. Generally, if the tag state information indicates an exclusive or modified state (Owned State) on a remote bus, the tag controller TCON will issue a snoop cycle. If a tag match is found (HIT#), the remote bus will return modified data which will be reconciled with the original request to memory before the data is returned to the requester.
Generally, depending on the type of request, certain steps are taken to advantageously implement the embodiments of the present invention, as described further below with respect to FIGS. 3-7. In summary, in order to reduce castouts or evictions, the snoop filter applies forced inclusion to exclusive/modified (owned) cycles. As long as requests are allocating to the shared state, they will not be tracked in the snoop filter unless there is no penalty associated with tracking the request. To maximize the benefit of owned forced inclusion, the tag controller TCON will force requests to the shared state whenever possible. This is accomplished on the processor bus by asserting HIT# in the snoop phase on all read commands (e.g., Bus Read Line Code and Bus Read Line Data) that support the HIT# snoop response. For these cases, if a remote bus is in the exclusive state, a snoop will be issued to demote the remote bus to the shared state. If a read command does not support the HIT# snoop response, the processors 12A-12H are configured to take the cache line owned (exclusive/modified). For these cases, the snoop filter 36 will always issue an invalidate snoop to the adjacent processor bus 14A or 14B to invalidate a shared copy of the cache line that is potentially cached by one of the processors 12A-12H on that bus.
One exception to the general rule is if the read request hits an exclusive state on the request bus. In this case, the host controller cannot assert HIT# or an infinite snoop stall may occur. The final state is determined by the read request type and whether or not another agent on the request bus asserts HIT#. The I/O bus 26 is handled like the processor bus 14A or 14B except it cannot go exclusive. The snoop filter 36 will either (1) issue a snoop to the I/O bus 26 to invalidate a potentially shared copy or (2) actually track I/O reads in the snoop filter and only issue downstream snoops if the address associated with the processor read request matches the I/O address in the snoop filter. If the read request originates on the I/O bus 26, snoops are only required if the read address hits an exclusive address in the snoop filter 36. In this case a snoop will be issued to the bus specified in the tag cache to invalidate the processor cache 13A-13H and retrieve modified data if necessary. In general, I/O cycles allocate to the shared state so they do not need to snoop the processor buses 14A and 14B as long as the processor buses are invalid or shared and have no associated penalty.
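The general rule and its exception, as summarized above, may be sketched as a single predicate. This is a minimal illustration under stated assumptions: the function name `host_asserts_hit` and the string encoding of the commands are hypothetical, and only BRLC and BRLD are treated as read commands that support the HIT# snoop response, as in the examples given above.

```python
# Read commands assumed to support the HIT# snoop response.
HIT_CAPABLE_READS = frozenset({"BRLC", "BRLD"})


def host_asserts_hit(command, exclusive_on_request_bus):
    """Decide whether the host controller asserts HIT# in the snoop phase.

    command: bus command name, e.g. "BRLD", "BRLC" or "BRIL".
    exclusive_on_request_bus: True if the snoop filter shows the line
        owned (exclusive/modified) on the bus that issued the request.
    """
    if command not in HIT_CAPABLE_READS:
        # The processor takes the line owned; HIT# cannot force Shared.
        return False
    # Exception: asserting HIT# here could cause an infinite snoop stall.
    return not exclusive_on_request_bus
```

A BRLD request that misses or hits a remote owner therefore returns True, while the same request hitting an owner on its own bus returns False.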
FIG. 3 is a flow chart illustrating the handling of a Bus Read Line Data (BRLD) request. When a BRLD request is executed by a CPU 12A-12H, it is received at the tag controller TCON, as indicated in block 40. The tag controller TCON will search the tag RAM for the request address, as indicated in block 42. First, the tag controller TCON will determine whether there is a tag match (i.e., whether the request address is currently being tracked in the snoop filter 36), as indicated in block 44. If there is no tag match, then the tag controller TCON asserts the HIT# on the request bus, as indicated in block 45. Since the tag cache does not have any history on the new tag, system agents may already have the address cached in the Shared state and the new tag must remain in the Unknown state. No additional information is gained by the snoop filter by tracking the new tag in the Unknown state. Therefore, the current tag is not castout but is retained instead along with any state information that exists, and the request address will not be tracked by the snoop filter 36, as indicated in block 46.
If the tag controller TCON determines that there is a tag match (block 44), the tag controller will determine whether the request is in an exclusive state, an invalid state, a shared state or an unknown state, as indicated in block 48. If the request is in the exclusive state, the tag controller TCON checks the bus-state information stored in the tag RAM 36 to determine whether the cache line indicates exclusive ownership on a remote bus (a bus other than the bus initiating the request) or indicates exclusive ownership on the same bus as the request, as indicated in block 49. If the snoop filter bus-state information indicates exclusive ownership on a remote bus, a snoop is issued to demote the exclusive state to shared in the remote-bus CPU cache, as indicated in block 50. The snoop filter will monitor the snoop phase of the snooped bus to see if a processor agent asserts HIT#, indicating its intention of keeping a shared copy of the data cached. Following this snoop cycle, the tag controller TCON asserts HIT# on the request bus, as indicated in block 51. Next, the snoop filter will determine whether HIT# is asserted on the remote bus, as indicated in block 52. If not, the snoop filter tracks the request bus as shared, as indicated in block 53. If HIT# is asserted on the remote bus, all bus states are known to be shared and both buses (remote and request) are tracked as shared in the snoop filter, as indicated in block 54.
Returning to block 49, if a tag match occurs with an exclusive state and the snoop filter bus-state information indicates exclusive ownership on the same bus as the request (requesting bus), no snoops are issued, because the bus is a self-snooping bus. In this situation (i.e., snoop filter indicates exclusive state on the request bus), the tag controller TCON cannot assert HIT# during the snoop phase as it will lead to an infinite snoop stall if the exclusive CPU asserts HITM#. The next step depends on whether the existing exclusive agent (as opposed to the tag controller TCON) asserts HIT#, as indicated in block 56. If the existing exclusive agent asserts HIT#, then the shared state on the requesting bus is tracked in the snoop filter, as indicated in block 57. If the existing exclusive agent does not assert HIT#, then the requesting agent is allowed to allocate to the exclusive state and is tracked as such in the snoop filter, as indicated in block 58.
Returning to block 48, if there is a tag match, and the request is in the shared state, the tag controller TCON asserts the HIT# on the request bus, as indicated in block 59. Next, the tag controller TCON determines whether the cache line is shared on the remote bus or shared on the request bus, as indicated in block 60. If the cacheline corresponding to the request is not shared on the remote bus (and thus is shared on the request bus), the shared state is tracked in the snoop filter on the request bus, as indicated in block 62. If the cache line is in the shared state on the remote bus, the snoop filter may issue a snoop to the remote bus and monitor the snoop response to determine the next state. The algorithm illustrated in FIG. 3 does not issue a snoop to the remote Shared bus. Consequently, the remote Shared bus must retain the Shared state to remain coherent, as indicated in block 64.
Returning again to block 48, if the request is in the invalid state, the tag controller TCON asserts the HIT# on the request bus, as indicated in block 66. The shared state is tracked in the snoop filter on the request bus, as indicated by block 67. Finally, if the request is in the unknown state, the tag controller TCON asserts HIT#, as indicated in block 68 and the tag state is retained as unknown, as indicated in block 69.
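The BRLD handling described with reference to FIG. 3 may be sketched in software as follows. This is an illustrative model only, not the disclosed hardware: the names `State`, `Tag`, `Outcome` and `handle_brld` are hypothetical, bus names are plain strings, and adding the request bus to an existing shared set (block 64) is an assumption drawn from the requirement that the remote bus retain the Shared state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class State(Enum):
    EXCLUSIVE = "E"  # exclusive or modified (owned)
    SHARED = "S"
    INVALID = "I"
    UNKNOWN = "U"


@dataclass(frozen=True)
class Tag:
    state: State
    buses: FrozenSet[str] = frozenset()  # buses holding the line


@dataclass(frozen=True)
class Outcome:
    tcon_asserts_hit: bool       # does TCON assert HIT# on the request bus?
    snooped: FrozenSet[str]      # buses that receive a demote snoop
    tag: Optional[Tag]           # resulting entry; None means not tracked


def handle_brld(tag, request_bus, remote_hit=False, owner_hit=False):
    """Model FIG. 3. tag is None when there is no tag match."""
    none = frozenset()
    req = frozenset({request_bus})
    if tag is None:
        # Blocks 45-46: assert HIT#, keep the resident tag, do not allocate.
        return Outcome(True, none, None)
    if tag.state is State.EXCLUSIVE:
        if request_bus not in tag.buses:
            # Blocks 50-54: snoop the remote owner down to Shared.
            shared = req | tag.buses if remote_hit else req
            return Outcome(True, tag.buses, Tag(State.SHARED, shared))
        # Blocks 56-58: owner is on the request bus; TCON must not assert HIT#.
        new_state = State.SHARED if owner_hit else State.EXCLUSIVE
        return Outcome(False, none, Tag(new_state, req))
    if tag.state is State.SHARED:
        # Blocks 59-64: no snoop issued; existing sharers stay Shared.
        return Outcome(True, none, Tag(State.SHARED, tag.buses | req))
    if tag.state is State.INVALID:
        # Blocks 66-67: track Shared on the request bus.
        return Outcome(True, none, Tag(State.SHARED, req))
    # Blocks 68-69: Unknown stays Unknown.
    return Outcome(True, none, tag)
```

In this model, a request from bus 14A that hits a line owned on bus 14B snoops 14B and ends with the line tracked as Shared on 14A alone, or on both buses if an agent on 14B asserts HIT#.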
FIG. 4 is a flow chart illustrating the processing of a Bus Read Line Code (BRLC) request. When a BRLC request is executed by a CPU 12A-12H, it is received at the tag controller TCON, as indicated in block 70. The tag controller TCON will search the snoop filter for the request address, as indicated in block 72. First, the tag controller TCON will determine whether there is a tag match (i.e., whether the request address is currently being tracked in the snoop filter), as indicated in block 74. If there is no tag match, then the tag controller TCON asserts the HIT# on the request bus, as indicated in block 75. Since the tag cache does not have any history on the new tag, system agents may already have the address cached in the Shared state and the new tag must remain in the Unknown state. No additional information is gained by the snoop filter by tracking the new tag in the Unknown state. Therefore, the current tag is not castout but is retained instead along with any state information that exists and the request address will not be tracked by the snoop filter 36, as indicated in block 76.
If the tag controller TCON determines that there is a tag match (block 74), the tag controller will determine whether the request is in an exclusive state, an invalid state, a shared state or an unknown state, as indicated in block 78. If the request is in the exclusive state, the tag controller TCON checks the bus-state information stored in the tag RAM 36 to determine whether the cache line indicates exclusive ownership on a remote bus (a bus other than the bus initiating the request) or indicates exclusive ownership on the same bus as the request, as indicated in block 80. If the snoop filter bus-state information indicates exclusive ownership on a remote bus, a snoop is issued to demote the exclusive state to shared in the remote-bus CPU cache, as indicated in block 81. Next, the snoop filter will determine whether HIT# is asserted on the remote bus, as indicated in block 82. If not, the snoop filter tracks the request bus as shared, as indicated in block 83. If HIT# is asserted on the remote bus, all bus states are known to be shared and both buses (remote and request) are tracked as shared in the snoop filter, as indicated in block 84. If a tag match occurs with an exclusive state and the snoop filter bus-state information indicates exclusive ownership on the same bus as the request (requesting bus), no snoops are issued, because the bus is a self-snooping bus. Thus, the request bus is tracked as shared in the snoop filter, as indicated in block 85.
Returning to block 78, if there is a tag match, and the request is in the shared state, the tag controller TCON determines whether the cache line is shared on the remote bus or shared on the request bus, as indicated in block 86. If the cache line corresponding to the request is not shared on the remote bus (and thus is shared on the request bus), the shared state is tracked in the snoop filter on the request bus, as indicated in block 88. If the cache line is in the shared state on the remote bus, the shared state is tracked in the snoop filter on both the remote bus and the request bus, as indicated in block 90.
Returning to block 78, if the request is in the invalid state, the shared state is tracked in the snoop filter on the request bus, as indicated by block 92. Finally, if the request is in the unknown state, the tag controller TCON asserts HIT#, as indicated in block 94 and the tag is retained in the unknown state, as indicated in block 96.
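The BRLC state transitions described with reference to FIG. 4 may be sketched as a compact next-state function. The sketch is illustrative only and rests on assumptions: the name `brlc_next`, the one-letter state codes and the tuple encoding of a tag entry are hypothetical, and HIT# signaling by the tag controller is omitted so that only the snoops and the resulting tracked state are modeled.

```python
def brlc_next(tag, request_bus, remote_hit=False):
    """Model FIG. 4.

    tag: None on a tag miss, else (state, buses) with state in
         "E" (exclusive/modified), "S", "I", "U" and buses a set of names.
    Returns (snooped_buses, new_tag); new_tag None means not tracked.
    """
    if tag is None:
        # Block 76: resident tag is retained; the request is not tracked.
        return (set(), None)
    state, buses = tag
    req = {request_bus}
    if state == "E":
        if request_bus in buses:
            # Block 85: self-snooping bus, no snoop; track Shared.
            return (set(), ("S", req))
        # Blocks 81-84: demote the remote owner, then check its HIT#.
        shared = req | set(buses) if remote_hit else req
        return (set(buses), ("S", shared))
    if state == "S":
        # Blocks 86-90: existing sharers remain, request bus is Shared.
        return (set(), ("S", set(buses) | req))
    if state == "I":
        # Block 92: track Shared on the request bus.
        return (set(), ("S", req))
    # Blocks 94-96: Unknown is retained.
    return (set(), tag)
```

Unlike the BRLD case, a BRLC that hits an owner on its own bus ends in the Shared state in this model, which reflects block 85.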
FIG. 5 is a flow chart illustrating the handling of a Bus Read and Invalidate Line (BRIL) request. Generally, in accordance with embodiments of the present invention, the snoop filter 36 tracks BRIL requests as exclusive in the tag RAM 36 since the requesting CPU 12A-12H will allocate the request address as either exclusive or modified. When a BRIL request is executed by a CPU 12A-12H, it is received at the tag controller TCON, as indicated in block 100. The tag controller TCON will search the snoop filter for the request address, as indicated in block 102. First, the tag controller TCON will determine whether there is a tag match (i.e., whether the request address is currently being tracked in the tag RAM 36), as indicated in block 104.
If a tag match occurs, the tag controller TCON will determine whether the request is in an exclusive state, an invalid state, a shared state or an unknown state, as indicated in block 106. If the request is in the exclusive state, the tag controller TCON checks the bus-state information stored in the tag RAM 36 to determine whether the cache line indicates exclusive ownership on a remote bus (a bus other than the bus initiating the request) or indicates exclusive ownership on the same bus as the request, as indicated in block 108. If the snoop filter bus-state information indicates exclusive ownership on a remote bus, a snoop is issued to demote the state from exclusive to invalid on the remote-bus CPU cache, as indicated in block 110. Following this snoop cycle, the snoop filter tracks the request bus as exclusive in the tag RAM 36, as indicated in block 112. If a tag match occurs with an exclusive state and the snoop filter bus-state information indicates exclusive ownership on the same bus as the request (requesting bus), no snoops are issued, because the bus is a self-snooping bus. The snoop filter tracks the request bus as exclusive in the tag RAM 36, as indicated in block 112.
Returning to block 106, if there is a tag match, and the request is in the shared state, the tag controller TCON determines whether the cache line is shared on the remote bus or shared on the request bus, as indicated in block 116. If the cache line corresponding to the request is not shared on the remote bus (and thus is shared on the request bus), the request bus is tracked as exclusive in the tag RAM 36, as indicated in block 120. If the cache line is in the shared state on the remote bus, a snoop is issued to demote the state from shared to invalid on the remote-bus CPU cache, as indicated in block 118. Following this snoop cycle, the snoop filter tracks the request bus as exclusive in the tag RAM 36, as indicated in block 120. Finally, if the request is in the invalid state, the request bus is tracked as exclusive in the tag RAM 36, as indicated by block 121.
Returning to block 106, if there is a tag match and the request is in the unknown state, invalidate-snoops are issued to any bus that is remote to the request bus using the request tag address, as indicated in block 128. The purpose for the snoops is to invalidate potential shared states on these remote buses. As will be appreciated, the request bus is self-snooping and therefore does not receive a specific snoop. Finally, the snoop filter tracks the request bus as exclusive in the tag RAM 36, as indicated in block 129.
Returning to block 104, the tag controller (TCON) may determine that there is no tag match. If a tag match does not occur, the tag controller (TCON) will determine whether the state of the castout tag is Exclusive, as indicated in block 122. If the castout state is Exclusive, the bus segment specified by the Exclusive state bits (may be the request bus) is snoop invalidated with the castout tag, as indicated in block 124. Next, invalidate-snoops are issued to any bus that is remote to the request bus using the request tag address, as indicated in block 126. The purpose for the snoops is to invalidate potential shared states on these remote buses. As will be appreciated, the request bus is self-snooping and therefore does not receive a specific snoop. Finally, the snoop filter tracks the request bus as exclusive in the tag RAM 36, as indicated in block 127. If the state associated with the castout tag is any state other than Exclusive, no castout snoops are issued. In either case, all remote busses are snoop invalidated using the request address to invalidate potentially shared data in agent caches.
Returning to block 122, if no tag is present at the tag index, then all previous requests to the current request address are assumed to be in the shared state. In this case, the tag controller TCON assumes that one or more of the remote buses has the same address as the request cached in the shared state. Since BRILs always allocate to exclusive or modified, snoops are issued to all remote buses using the request tag to invalidate their caches, as indicated in block 126. Finally, the snoop filter tracks the request bus as exclusive in the tag RAM 36, as indicated in block 127.
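The BRIL handling described with reference to FIG. 5 may be sketched as follows. This is a minimal model under stated assumptions: the name `handle_bril`, the tuple encoding `(state, buses)` and the `victim` parameter (the tag being cast out, or None if no tag is present at the index) are hypothetical. Snoops issued with the castout tag are kept separate from snoops issued with the request address.

```python
def handle_bril(tag, request_bus, all_buses, victim=None):
    """Model FIG. 5.

    tag: None on a tag miss, else (state, buses), state in "E","S","I","U".
    victim: on a miss, the (state, buses) entry being cast out, or None.
    Returns the invalidate snoops to issue and the resulting entry.
    """
    remotes = set(all_buses) - {request_bus}
    castout_snoops = set()
    if tag is None:
        # Blocks 122-124: only an Exclusive victim needs a castout snoop,
        # and its owning bus may be the request bus itself.
        if victim is not None and victim[0] == "E":
            castout_snoops = set(victim[1])
        # Block 126: remote buses may hold untracked shared copies.
        request_snoops = remotes
    else:
        state, buses = tag
        if state == "U":
            request_snoops = remotes             # block 128
        elif state == "I":
            request_snoops = set()               # block 121
        else:
            # "E" or "S": snoop only remote holders (blocks 110, 118);
            # the request bus is self-snooping.
            request_snoops = set(buses) - {request_bus}
    return {
        "castout_snoops": castout_snoops,
        "request_snoops": request_snoops,
        "tag": ("E", {request_bus}),  # BRIL always ends Exclusive
    }
```

In every path the request bus ends up tracked as Exclusive; only the set of buses that must be invalidated differs.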
FIG. 6 is a flow chart illustrating the handling of a Bus Write and Invalidate Line (BWIL) request. Generally, in accordance with embodiments of the present invention, the snoop filter 36 tracks BWIL requests as invalid since the requesting CPU 12A-12H will allocate the request address as invalid after a BWIL request. When a BWIL request is executed by a CPU 12A-12H, it is received at the tag controller TCON, as indicated in block 130. The tag controller TCON will search the snoop filter for the request address, as indicated in block 132. First, the tag controller TCON will determine whether there is a tag match (i.e., whether the request address is currently being tracked in the snoop filter 36), as indicated in block 134.
If a tag match occurs, the tag controller TCON will determine whether the request is in an exclusive state, an invalid state, a shared state or an unknown state, as indicated in block 136. If the request is in the exclusive state, the tag controller TCON checks the bus-state information stored in the tag RAM 36 to determine whether the cache line indicates exclusive ownership on a remote bus (a bus other than the bus initiating the request) or indicates exclusive ownership on the same bus as the request, as indicated in block 138. If the snoop filter bus-state information indicates exclusive ownership on a remote bus, a snoop is issued to demote the state from exclusive to invalid on the remote-bus CPU cache, as indicated in block 140. Following this snoop cycle, the snoop filter tracks all busses as invalid in the tag RAM 36, as indicated in block 142. Returning to block 138, if a tag match occurs with an exclusive state and the snoop filter bus-state information indicates exclusive ownership on the same bus as the request (requesting bus), the snoop filter tracks all busses as invalid in the tag RAM 36, as indicated in block 142.
Returning to block 136, if there is a tag match, and the request is in the shared state, the tag controller TCON determines whether the cache line is shared on the remote bus or shared on the request bus, as indicated in block 148. If the cache line is in the shared state on the remote bus, a snoop is issued to demote the state from shared to invalid on the remote-bus CPU cache, as indicated in block 150. Following this snoop cycle, the snoop filter tracks all busses as invalid in the tag RAM 36, as indicated in block 152. If the cache line corresponding to the request is not shared on the remote bus (and thus is shared on the request bus), the snoop filter tracks all busses as invalid in the tag RAM 36, as indicated in block 152. If the request is in the invalid state, the snoop filter continues to track all busses as invalid in the tag RAM 36, as indicated by block 154. Finally, if the request is in the unknown state, invalidate-snoops are issued to any bus that is remote to the request bus using the request tag address, as indicated in block 162, and the snoop filter tracks all busses as invalid in the tag RAM 36, as indicated in block 164.
Returning to block 134, the tag controller (TCON) may determine that there is no tag match. If a tag match does not occur, invalidate-snoops are issued to any bus that is remote to the request bus using the request tag address to invalidate potentially shared copies of the data in processor caches, as indicated in block 156. Next, it is determined whether the state of the existing tag RAM is unknown, as indicated in block 157. If the current tag state is Unknown, the snoop filter will track the request tag state as Invalid, as indicated in block 158. If the state of the existing tag RAM is not unknown, the snoop filter continues to track the existing tag and its current state in the tag RAM 36, as indicated by block 160.
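The BWIL handling described with reference to FIG. 6 may be sketched in the same manner. The sketch is illustrative and makes assumptions: the name `handle_bwil`, the `resident_state` parameter (the state of the tag currently held at the index on a miss) and the `slot` result, which reports whether the entry afterwards holds the request tag or the resident tag, are hypothetical.

```python
def handle_bwil(tag, request_bus, all_buses, resident_state=None):
    """Model FIG. 6.

    tag: None on a tag miss, else (state, buses), state in "E","S","I","U".
    resident_state: on a miss, state of the tag already at the index.
    Returns the invalidate snoops and which tag occupies the entry.
    """
    remotes = set(all_buses) - {request_bus}
    if tag is None:
        # Block 156: invalidate potentially shared copies on remote buses.
        if resident_state == "U":
            slot = ("request", "I")              # block 158
        else:
            slot = ("resident", resident_state)  # block 160: keep as-is
        return {"snoops": remotes, "slot": slot}
    state, buses = tag
    if state == "U":
        snoops = remotes                         # block 162
    elif state == "I":
        snoops = set()                           # block 154
    else:
        # "E" or "S": invalidate remote holders only (blocks 140, 150).
        snoops = set(buses) - {request_bus}
    # Blocks 142, 152, 154, 164: all buses are tracked as Invalid.
    return {"snoops": snoops, "slot": ("request", "I")}
```

On a tag match the entry always ends Invalid across all buses, whereas on a miss an informative resident tag is left in place rather than being cast out.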
FIG. 7 is a flow chart illustrating the handling of a Bus Write Line (BWL) request. Generally, in accordance with embodiments of the present invention, the snoop filter 36 tracks BWL requests as invalid or exclusive/modified, depending on the CPU architecture (i.e., whether the CPU cache is inclusive or not). When a BWL request is executed by a CPU 12A-12H, it is received at the tag controller TCON, as indicated in block 170. The tag controller TCON will search the snoop filter for the request address, as indicated in block 172. A tag match will always occur with the snoop filter bus-state information indicating exclusive ownership on the same bus as the request, as indicated in block 174. As such, no snoops will be issued to remote busses. Depending on the CPU cache architecture, the snoop filter will update in one of two ways depending on whether the CPU demotes to invalid, as indicated in block 176. If the CPU demotes to invalid following a BWL request, the snoop filter 36 is updated to reflect an invalid state across all buses, as indicated in block 178. If the CPU is allowed to remain at the exclusive or modified state following a BWL, no updates are made to the snoop filter state information and the address will continue to be tracked as exclusive in the snoop filter, as indicated in block 180.
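The BWL update described with reference to FIG. 7 reduces to a single decision, which may be sketched as follows. The name `handle_bwl` and the `cpu_demotes_to_invalid` flag are hypothetical, and raising an error for a tag that is not exclusive on the request bus is an assumption made here to reflect the statement that such a match always occurs.

```python
def handle_bwl(tag, request_bus, cpu_demotes_to_invalid):
    """Model FIG. 7. tag is (state, buses); no snoops are ever issued."""
    state, buses = tag
    if state != "E" or request_bus not in buses:
        # Block 174 states that this match always occurs.
        raise ValueError("BWL expects exclusive ownership on the request bus")
    if cpu_demotes_to_invalid:
        # Block 178: invalid across all buses.
        return ("I", set())
    # Block 180: the CPU keeps the line owned; the entry is unchanged.
    return tag
```

Which branch applies depends on the CPU cache architecture rather than on anything observed on the bus, so in this sketch it is passed in as a configuration flag.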
Many of the steps of the exemplary processes described above with reference to FIGS. 3-7 comprise an ordered listing of executable instructions for implementing logical functions. The ordered listing can be embodied in a computer-readable medium for use by or in connection with a computer-based system that can retrieve the instructions and execute them to carry out the previously described processes. In the context of this application, the computer-readable medium can be a means that can contain, store, communicate, propagate, transmit or transport the instructions. By way of example, the computer readable medium can be an electronic, a magnetic, an optical, an electromagnetic, or an infrared system, apparatus, or device. An illustrative, but non-exhaustive list of computer-readable mediums can include an electrical connection (electronic) having one or more wires, a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, and a portable compact disk read-only memory (CDROM). It is even possible to use paper or another suitable medium upon which the instructions are printed. For instance, the instructions can be electronically captured via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
While the invention may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, it should be understood that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the invention as defined by the following appended claims.

Claims (16)

1. A method of processing requests in a computer system comprising:
receiving a request at a controller, wherein the request has an associated state, a tag and address;
checking a snoop filter for the address;
determining whether there is a tag match in the snoop filter;
determining whether the state of the request is one of exclusive, shared, invalid and unknown;
if the request is shared, determining whether the request is shared on one of a request bus and a remote bus;
if the request is shared on the remote bus, tracking the request bus and remote bus as shared in a tag RAM; and
if the request is shared on the request bus, tracking the request bus as shared in the tag RAM, wherein the tag is not associated with a castout penalty that is an occurrence of a castout snoop on a shared access without a net gain in precise snoop filter state information.
2. The method, as set forth in claim 1, wherein receiving the request comprises receiving a bus read line data (BRLD) request, and further comprising asserting HIT# on the request bus.
3. The method, as set forth in claim 1, wherein receiving the request comprises receiving a bus read line code (BRLC) request.
4. A method of processing requests in a computer system comprising:
receiving a request at a controller, wherein the request has an associated state, tag and address;
checking a snoop filter for the address;
determining whether there is a tag match in the snoop filter;
determining whether the state of the request is one of exclusive, shared, invalid and unknown;
if the request is shared, determining whether the request is shared on one of a request bus and a remote bus; and
if the request is shared on the remote bus, issuing a snoop to demote from an exclusive state to an invalid state on the remote bus, wherein the tag associated with the request is not associated with a castout penalty that is an occurrence of a castout snoop on a shared access without a net gain in precise snoop filter state information.
5. The method, as set forth in claim 4, wherein receiving the request comprises receiving a bus read invalidate line (BRIL) request, and further, if the request is shared on the request bus, tracking the request bus in an exclusive state in the tag RAM.
6. The method, as set forth in claim 4, wherein receiving the request comprises receiving a bus read invalidate (BRIL) request, and further, if the request is shared on the remote bus, issuing a snoop to demote from a shared state in an invalid state on the remote bus and tracking the request in an exclusive state in the tag RAM.
7. The method, as set forth in claim 4, wherein receiving the request comprises receiving a bus write invalidate (BWIL) request, and further, tracking the request bus and the remote bus in an invalid state in the tag RAM.
8. A system comprising:
a plurality of buses;
a tag RAM; and
a tag controller coupled to each of the buses and configured to receive requests, each request having a corresponding one of a plurality of states wherein the tag controller is configured to track only those requests having certain of the plurality of states in the tag RAM and not being associated with a castout penalty that is an occurrence of a castout snoop on a shared access without a net gain in precise snoop filter state information, and wherein the tag controller is further configured not to track those requests not having the certain of the plurality of states.
9. The system, as set forth in claim 8, wherein the tag controller is configured to receive requests, each having a state comprising one of an exclusive state, a shared state, an invalid state and an unknown state.
10. The system, as set forth in claim 9, wherein the certain of the plurality of states excludes the unknown state.
11. The system, as set forth in claim 8, wherein the tag controller is configured to track only those requests allocated to a shared state without an associated castout penalty in the tag RAM.
12. The system, as set forth in claim 8, wherein the tag controller is configured to track Bus Read Line Data (BRLD) requests and Bus Read Line Code (BRLC) requests.
13. The system, as set forth in claim 12, wherein the BRLD requests and the BRLC requests are tracked in the shared state if there is no associated castout penalty.
14. A system comprising:
a plurality of buses;
at least one processor coupled to each of the plurality of buses; and
a snoop filter configured to eliminate unnecessary snoops of the plurality of buses, and further configured to track requests from the at least one processor only if tracking the request does not result in a castout penalty, wherein a castout tag is retained, a castout snoop is canceled and the request is dropped from the snoop filter if the request causes the snoop filter castout that discards state information associated with the castout tag.
15. The system, as set forth in claim 14, wherein the snoop filter is configured to receive the request on a request bus, wherein the request comprises a tag, an address and a state, and wherein the snoop filter is configured to snoop a remote bus for the tag corresponding to the request.
16. The system, as set forth in claim 15, further comprising a tag RAM, wherein the tag corresponding to the request is tracked in the tag RAM only if the state corresponding to the request is one of an exclusive state and a shared state.
US11/225,937 2005-09-13 2005-09-13 Techniques for reducing castouts in a snoop filter Active 2026-08-12 US7502895B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/225,937 US7502895B2 (en) 2005-09-13 2005-09-13 Techniques for reducing castouts in a snoop filter

Publications (2)

Publication Number Publication Date
US20070061520A1 US20070061520A1 (en) 2007-03-15
US7502895B2 true US7502895B2 (en) 2009-03-10

Family

ID=37856651

Country Status (1)

Country Link
US (1) US7502895B2 (en)

Patent Citations (12)

Publication number Priority date Publication date Assignee Title
US5611058A (en) 1993-01-29 1997-03-11 International Business Machines Corporation System and method for transferring information between multiple buses
EP0661641A2 (en) * 1993-12-30 1995-07-05 International Business Machines Corporation A computer system
US6115795A (en) 1997-08-06 2000-09-05 International Business Machines Corporation Method and apparatus for configurable multiple level cache with coherency in a multiprocessor system
US6678798B1 (en) 2000-07-20 2004-01-13 Silicon Graphics, Inc. System and method for reducing memory latency during read requests
US6829683B1 (en) 2000-07-20 2004-12-07 Silicon Graphics, Inc. System and method for transferring ownership of data in a distributed shared memory system
US20040117561A1 (en) * 2002-12-17 2004-06-17 Quach Tuan M. Snoop filter bypass
US20040255085A1 (en) * 2003-06-11 2004-12-16 International Business Machines Corporation Ensuring orderly forward progress in granting snoop castout requests
US7213106B1 (en) * 2004-08-09 2007-05-01 Sun Microsystems, Inc. Conservative shadow cache support in a point-to-point connected multiprocessing node
US20060080508A1 (en) * 2004-10-08 2006-04-13 International Business Machines Corporation Snoop filter directory mechanism in coherency shared memory system
US7305524B2 (en) * 2004-10-08 2007-12-04 International Business Machines Corporation Snoop filter directory mechanism in coherency shared memory system
US7404046B2 (en) * 2005-02-10 2008-07-22 International Business Machines Corporation Cache memory, processing unit, data processing system and method for filtering snooped operations
US20060224839A1 (en) * 2005-03-29 2006-10-05 International Business Machines Corporation Method and apparatus for filtering snoop requests using multiple snoop caches

Cited By (9)

Publication number Priority date Publication date Assignee Title
WO2013188414A3 (en) * 2012-06-15 2014-03-13 Soft Machines, Inc. A method and system for filtering the stores to prevent all stores from having to snoop check against all words of a cache
US9904552B2 (en) 2012-06-15 2018-02-27 Intel Corporation Virtual load store queue having a dynamic dispatch window with a distributed structure
US9928121B2 (en) 2012-06-15 2018-03-27 Intel Corporation Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
US9965277B2 (en) 2012-06-15 2018-05-08 Intel Corporation Virtual load store queue having a dynamic dispatch window with a unified structure
US9990198B2 (en) 2012-06-15 2018-06-05 Intel Corporation Instruction definition to implement load store reordering and optimization
US10019263B2 (en) 2012-06-15 2018-07-10 Intel Corporation Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
US10048964B2 (en) 2012-06-15 2018-08-14 Intel Corporation Disambiguation-free out of order load store queue
US10592300B2 (en) 2012-06-15 2020-03-17 Intel Corporation Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
US10346307B2 (en) 2016-09-28 2019-07-09 Samsung Electronics Co., Ltd. Power efficient snoop filter design for mobile platform

Also Published As

Publication number Publication date
US20070061520A1 (en) 2007-03-15

Similar Documents

Publication Publication Date Title
US7502895B2 (en) Techniques for reducing castouts in a snoop filter
US5623632A (en) System and method for improving multilevel cache performance in a multiprocessing system
US5561779A (en) Processor board having a second level writeback cache system and a third level writethrough cache system which stores exclusive state information for use in a multiprocessor computer system
US7305522B2 (en) Victim cache using direct intervention
US8015365B2 (en) Reducing back invalidation transactions from a snoop filter
US6578116B2 (en) Snoop blocking for cache coherency
US6018791A (en) Apparatus and method of maintaining cache coherency in a multi-processor computer system with global and local recently read states
KR100318104B1 (en) Non-uniform memory access (numa) data processing system having shared intervention support
US6721848B2 (en) Method and mechanism to use a cache to translate from a virtual bus to a physical bus
US6289420B1 (en) System and method for increasing the snoop bandwidth to cache tags in a multiport cache memory subsystem
US7305523B2 (en) Cache memory direct intervention
US7669009B2 (en) Method and apparatus for run-ahead victim selection to reduce undesirable replacement behavior in inclusive caches
US6829665B2 (en) Next snoop predictor in a host controller
US20030070016A1 (en) Efficient snoop filter in a multiple-processor-bus system
US20030154350A1 (en) Methods and apparatus for cache intervention
US9639470B2 (en) Coherency checking of invalidate transactions caused by snoop filter eviction in an integrated circuit
US8015364B2 (en) Method and apparatus for filtering snoop requests using a scoreboard
US5829027A (en) Removable processor board having first, second and third level cache system for use in a multiprocessor computer system
US20190155729A1 (en) Method and apparatus for improving snooping performance in a multi-core multi-processor
US20070038814A1 (en) Systems and methods for selectively inclusive cache
KR20210041485A (en) Memory interface having data signal path and tag signal path
US5987544A (en) System interface protocol with optional module cache
US8332592B2 (en) Graphics processor with snoop filter
US7024520B2 (en) System and method enabling efficient cache line reuse in a computer system
EP0681241A1 (en) Processor board having a second level writeback cache system and a third level writethrough cache system which stores exclusive state information for use in a multiprocessor computer system

Legal Events

Date Code Title Description
AS Assignment
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JONES, PHILLIP M.;GHARACHORLOO, KOUROSH;REEL/FRAME:016999/0795;SIGNING DATES FROM 20050909 TO 20050912

STCF Information on status: patent grant
Free format text: PATENTED CASE

CC Certificate of correction

FPAY Fee payment
Year of fee payment: 4

AS Assignment
Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001
Effective date: 20151027

REMI Maintenance fee reminder mailed

FPAY Fee payment
Year of fee payment: 8

SULP Surcharge for late payment
Year of fee payment: 7

MAFP Maintenance fee payment
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 12

AS Assignment
Owner name: OT PATENT ESCROW, LLC, ILLINOIS
Free format text: PATENT ASSIGNMENT, SECURITY INTEREST, AND LIEN AGREEMENT;ASSIGNORS:HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP;HEWLETT PACKARD ENTERPRISE COMPANY;REEL/FRAME:055269/0001
Effective date: 20210115

AS Assignment
Owner name: VALTRUS INNOVATIONS LIMITED, IRELAND
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OT PATENT ESCROW, LLC;REEL/FRAME:055403/0001
Effective date: 20210201