US20020087803A1 - Apparatus for identifying memory requests originating on remote I/O devices as noncacheable

Publication number
US20020087803A1
Authority
US
United States
Prior art keywords
bridge unit
data
bus
cacheable
cache coherence
Prior art date
Legal status
Granted
Application number
US09/751,505
Other versions
US6463510B1 (en)
Inventor
Phillip Jones
Robert Woods
Current Assignee
Hewlett Packard Enterprise Development LP
Original Assignee
Compaq Information Technologies Group LP
Priority date
Filing date
Publication date
Application filed by Compaq Information Technologies Group LP
Priority to US09/751,505 (patent US6463510B1)
Assigned to COMPAQ COMPUTER CORPORATION reassignment COMPAQ COMPUTER CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JONES, PHILLIP M.
Assigned to COMPAQ COMPUTER CORPORATION reassignment COMPAQ COMPUTER CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: WOODS, ROBERT L.
Assigned to COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P. reassignment COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: COMPAQ COMPUTER CORPORATION
Publication of US20020087803A1
Application granted
Publication of US6463510B1
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. reassignment HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: COMPAQ INFORMATION TECHNOLOGIES GROUP, LP
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP reassignment HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Adjusted expiration
Expired - Lifetime

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0835Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means for main memory peripheral accesses (e.g. I/O or DMA)
    • G06F12/0888Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass

Definitions

  • the present invention generally relates to sharing data among processors using cache memories in a computer system with multiple processors. More particularly, the present invention relates to maintaining the coherence of data in cache memories by using a cache coherence directory and bus snooping. Still more particularly, the invention relates to a computer system in which a sideband signal identifies memory requests as non-cacheable to minimize cache coherence directory lookups and bus snoops.
  • Modern-day computer systems can include a single processor or multiple processors for higher performance.
  • a host bridge unit coupled to each processor of the multiprocessing computer system allows the computer system to support many different kinds of devices attached to a multitude of different buses.
  • the host bridge unit may connect to processor buses, a main memory bus, and an I/O bus and, through an I/O bridge unit, to an advanced graphic port (“AGP”) bus, a peripheral component interconnect (“PCI”) bus, or a peripheral component interconnect extended (“PCIx”) bus.
  • AGP advanced graphic port
  • PCI peripheral component interconnect
  • PCIx peripheral component interconnect extended
  • Each of the processor buses can support a maximum number of processors (e.g., 4, 6, 8, 12, etc.) connected to the processor bus while still maintaining bus communication bandwidth for sufficiently high performance.
  • Each processor of the computer system includes a memory cache either integrated into the processor chip itself or external to the processor chip.
  • the memory cache stores data and instructions and improves processor performance by allowing high-speed access to the needed data and instructions resulting in reduced program execution time.
  • each unit of data is identified as being owned by a particular processor.
  • Requestor processors in the computer system may request a unit of data from an owner processor.
  • the requesting processor may access data to perform either read or write operations. If a requesting processor modifies the data by performing a write, other processors of the computer system may have access to old, unmodified versions of the data.
  • each processor maintains a local record of the addresses cached on the various processors and the particular “state” of each unit of data associated with the address in a cache coherence directory.
  • a “state” describes the copies of the data unit stored in the memory caches of the particular system.
  • the computer system using a cache coherence directory, implements a coherency protocol that enforces the consistency of data in the cache memories.
  • the coherency protocol describes the different states of a data unit.
  • a data unit may be in a shared state that corresponds to processors having a read only copy of the data unit.
  • a data unit may be in an exclusive state in which only one requestor processor contains a copy of the data unit that it may modify.
  • a bus snoop involves accessing the bus to communicate with processors on the processor bus to monitor and maintain coherency of data.
  • a bus snoop is needed whenever a requesting processor needs access to data of which it neither holds an exclusive copy nor is the owner. Large amounts of snoop traffic can seriously impact computer system performance.
  • One solution to this problem is to compare the address of the data to the cache coherence directory to determine if one of the other processors owns the address or has an exclusive copy. If the cache coherence directory indicates ownership of the address or an exclusive copy by a different processor, a bus snoop is performed. If the requesting processor owns the address or has an exclusive copy, a bus snoop is not performed, thus preserving processor bus bandwidth.
  • Hardware to maintain the coherency of the data includes a cache coherence controller and cache coherence directory.
  • the cache coherence directory preferably includes enough Random Access Memory (“RAM”) to maintain a sufficient record of the addresses cached on the various processors and the particular state of each unit of data associated with the address. It would be advantageous if the cache coherence directory and cache coherence protocol could be implemented in such a way as to be able to quickly retrieve memory requests from the processor and peripheral devices. To implement a fast cache coherence directory, interleaved banks of RAM can be used. To further reduce the access time for processor and peripheral device memory requests, the cache coherence protocol could be implemented to reduce the number of memory requests that must be compared to the cache coherence directory. One way to reduce memory request access times would be for the host bridge unit to identify memory requests from peripheral devices as non-cacheable and then skip the cache coherence directory lookup and bus snoop.
  • RAM Random Access Memory
  • the current generation of host bridge units has no dedicated hardware support for the identification of non-cacheable memory requests.
  • the next generation of AGP bus implementations in computer systems will have graphics devices coupled to an I/O bridge in order to permit greater flexibility in the I/O subsystem.
  • graphics devices will send requests for data stored in memory to the I/O bridge that will then forward the request to the host bridge.
  • even if the graphics device and AGP bus are implemented to inform the I/O bridge that the requested data is non-cacheable, that information will not reach the host bridge, because host bridge units have no dedicated hardware support for identifying non-cacheable memory requests.
  • the deficiencies of the prior art described above are solved in large part by an apparatus for identifying memory requests originating on remote I/O devices as non-cacheable in a computer system with multiple processors.
  • the apparatus includes a main memory, memory cache, processor and cache coherence directory all coupled to a host bridge unit (North bridge or memory controller).
  • the I/O device transmits requests for data to the I/O bridge unit.
  • the I/O bridge unit forwards the request for data to the host bridge unit and asserts a sideband signal to the host bridge unit if the request is for non-cacheable data.
  • the host bridge unit includes a cache coherence controller that implements a protocol to maintain the coherence of data stored in each of the processor caches in the computer system.
  • the cache coherence directory connects to the cache coherence controller. If the host bridge unit determines that the data is cacheable (i.e., the sideband signal is not asserted), then it requests the cache coherence controller to perform a cache coherence directory lookup to maintain the coherence of the data. If the data is non-cacheable (i.e., the sideband signal is asserted), then the host bridge unit does not request the cache coherence controller to perform a cache coherence directory lookup.
  • Various I/O devices can be coupled to the I/O bridge unit through an AGP bus, PCI bus, or PCIx bus.
  • the preferred embodiment of the invention comprises a combination of features and advantages that enable it to overcome various problems of prior devices.
  • the various characteristics described above, as well as other features, will be readily apparent to those skilled in the art upon reading the following detailed description of the preferred embodiments of the invention, and by referring to the accompanying drawings.
  • FIG. 1 shows a system diagram of a plurality of processors coupled together through a multitude of processor buses.
  • FIG. 2 shows a block diagram of the hardware to identify memory requests as non-cacheable to reduce cache coherence directory lookups and bus snoops in accordance with the preferred embodiment.
  • computer system 90 comprises one or more processor modules 100 coupled to a main memory 102 and an input/output (“I/O”) bridge unit 104 .
  • computer system 90 includes five processor modules 100 , each processor module 100 coupled to a main memory 102 , an external memory cache 106 , and an I/O bridge unit 104 .
  • Each processor preferably includes processor buses 108 for connection to adjacent processors.
  • a processor module 100 preferably couples through a straight-line processor bus 108 to all other processor modules 100 of computer system 90 .
  • each processor module 100 in the embodiment shown can be connected to four other processor modules 100 through the processor bus 108 .
  • any desired number of processors (e.g., 4, 6, 7, 8, 12, etc.)
  • the I/O bridge unit 104 provides an interface to various input/output devices such as disk drives and display devices as described in greater detail below. Data from the I/O devices thus enters the processor bus 108 of the computer system via the I/O bridge unit 104 .
  • the main memory 102 generally includes a conventional memory device or an array of memory devices in which application programs and data are stored.
  • the capacity of the main memory 102 can be any suitable size.
  • main memory 102 preferably is any suitable type of memory such as dynamic random access memory (“DRAM”) or any of the various types of DRAM circuits such as synchronous dynamic random access memory (“SDRAM”).
  • DRAM dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • an off-chip external cache 106 couples to a processor module 100 through the processor bus 108 .
  • the external cache may be a 1.75-MB, seven-way set associative write-back mixed instruction and data cache.
  • the L2 cache holds physical address data for each block.
  • memory cache 106 may be integrated on-chip into the processor in processor module 100 .
  • the on-chip memory cache may be unified instruction and data cache.
  • the memory cache 106 preferably comprises a 64-KB, two-way set associative, virtually indexed, physically tagged, writeback memory cache with 64-byte cache blocks.
  • each data cache block contains 64 data bytes and associated quadword ECC bits, tag parity bit calculated across the tag and one bit to control round-robin set allocation.
  • the memory cache 106 is organized to contain two cache sets, each with 512 rows containing 64-byte blocks per row (i.e., 32 KB of data per cache set).
  • computer system 90 can be configured so that any processor module 100 can access its own main memory 102 and I/O devices as well as the main memory and I/O devices of all other processors in the network.
  • the computer system has physical connections between each processor module resulting in low interprocessor communication times and improved memory and I/O device access reliability.
  • each hardware block 110 preferably includes main memory 102 coupled to processor module 100 through a memory bus 230 .
  • External cache 106 of the preferred embodiment couples to processor module 100 through processor bus 108 .
  • the processor module 100 preferably includes multiple processor buses 108 (e.g. processor bus 0 and processor bus 1 ) coupling to other processor modules and external caches.
  • An I/O bridge unit (South bridge or I/O controller) 104 couples to the processor module 100 through an I/O bus 240 .
  • a side band signal 250 also couples the processor module 100 to the I/O bridge unit 104 .
  • I/O bridge unit 104 preferably couples the processor module 100 to various peripheral devices through a variety of different peripheral buses.
  • a PCI bus 260 couples the PCI device 265 to the I/O bridge unit 104 .
  • PCI devices which can be coupled to the PCI bus include network interface cards, video accelerators, audio cards, SCSI adapters, and telephony cards, to name a few.
  • the I/O bridge unit 104 also may couple to PCIx devices 275 through a PCIx bus 270 .
  • the I/O bridge unit 104 also couples the processor module 100 to a graphics controller 285 through an AGP bus 280 .
  • the graphics controller 285 is coupled to a display 290 .
  • a suitable display 290 may include, for example, a cathode ray tube (“CRT”), a liquid crystal display (“LCD”), or a virtual retinal display (“VRD”), or any other type of suitable display device for a computer system.
  • the graphics controller controls the output sent to the display 290 .
  • the processor module 100 includes a host bridge unit (North Bridge or memory controller) 210 that couples to a processor 215 through the processor bus 108 .
  • the external cache 106 of the preferred embodiment couples through the processor bus 108 to the host bridge unit 210 and processor 215 .
  • processor bus 0 108 a couples the processor to another processor in the computer system 90 .
  • the host bridge unit 210 also couples to the main memory 102 through the memory bus 230 .
  • I/O bridge unit 104 couples to the host bridge unit 210 through the I/O bus 240 and side band signal 250 .
  • processor module 100 also includes a cache coherence directory 220 that couples to the host bridge unit 210 .
  • Processor bus 1 108 c couples other processors of computer system 90 to processor 215 through host bridge unit 210 .
  • Processor bus 108 couples the processor 215 to the host bridge unit 210 and the memory bus 230 couples the host bridge unit 210 to the main memory 102 .
  • the processor 215 is illustrative of, for example, a Pentium® Pro Microprocessor. It should be understood, however, that other alternative types of processors could be employed.
  • the main memory controller (not shown in FIG. 2) typically is incorporated within the host bridge unit 210 to generate various control signals for accessing the main memory 102 .
  • An interface to a high bandwidth local expansion bus, such as the PCI bus, may also be included as a separate I/O bridge unit.
  • a separate peripheral bus optimized for graphics related data transfers is provided.
  • a popular example of such a bus is the AGP bus.
  • the AGP bus is generally considered a high performance, component level interconnect bus optimized for three dimensional graphical display applications, and is based on a set of performance extensions or enhancements to the PCI standard.
  • the AGP bus was developed in response to the increasing demands placed on memory bandwidths for three-dimensional renderings.
  • a graphics controller can be removed from the PCI bus (where it traditionally was located) to the AGP bus.
  • AGP provides greater bandwidth for data transfer between a graphics accelerator and system memory than is possible with PCI or other conventional bus architectures.
  • Graphics controller 285 controls the rendering of text and images on display 290 .
  • Graphics controller 285 may embody a typical graphics accelerator generally known in the art to render three-dimensional data structures on display 290 . These data structures can be effectively shifted into and out of main memory 102 .
  • the graphics controller 285 therefore may be a master of the AGP bus 280 in that it can request and receive access through the I/O bridge unit 104 to a target interface within the host bridge unit 210 to thereby obtain access to main memory 102 .
  • a dedicated graphics bus accommodates rapid retrieval of data from main memory 102 .
  • graphics controller 285 may further be configured to generate PCI protocol transactions on the AGP bus 280 .
  • the AGP interface of the I/O bridge unit 104 may thus include functionality to support both AGP protocol transactions as well as PCI protocol transactions.
  • Display 118 is any electronic display device upon which an image or text can be represented.
  • Computer system 90 for coupling together various computer buses.
  • Computer system 90 can be implemented with respect to the particular bus architectures shown in FIG. 2 (i.e., PCI, PCIx, and AGP buses), or other bus architectures, as desired.
  • the embodiment described herein assumes buses 260 , 270 , and 280 represent a PCI bus, PCIx bus, and an AGP bus, as shown in FIG. 2.
  • processor 215 is assumed to be a Pentium® Pro processor and thus processor bus 108 represents a Pentium® Pro bus.
  • Host bridge unit 210 of the preferred embodiment includes a cache coherence controller 212 that preferably implements the coherence protocol.
  • a memory request from a peripheral device is transmitted to the host bridge unit 210 .
  • the host bridge unit 210 is informed through sideband signal 250 that a memory request is non-cacheable.
  • the host bridge unit does not send the memory request to the cache coherence controller 212 .
  • the cache coherence controller 212 for non-cacheable memory requests does not perform a cache coherence directory lookup and evaluation.
  • the host bridge unit 210 will return a coherent response indicating that main memory 102 of the owner processor contains the most recent copy.
  • the coherent response from the host bridge unit 210 is significantly faster than those memory requests requiring a cache coherence directory lookup. If the memory request is cacheable and a snoop of the processor bus 108 is required, the host bridge unit 210 broadcasts the memory request to the appropriate processor cache 106 using the processor bus 108 . In the preferred embodiment of the invention described, if a significant number of non-cacheable memory request cycles exist, excluding them from the cache coherence directory lookup and comparison process results in significantly increased bus performance and reduced snoop traffic.
  • transactions on the processor bus 108 preferably follow a strong ordering model for memory accesses, I/O accesses, locked memory accesses, and PCI configuration accesses. Strong ordering of transactions means that the transactions are completed on the processor bus 108 in the order in which they were initiated. If additional explanation of the ordering rules identified above for the Pentium® Pro bus is desired, reference may be made to the Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual.
  • I/O bridge unit 104 receives requests for instructions and data from the peripheral devices.
  • the I/O bridge unit 104 transmits the memory request to the host bridge unit 210, which then performs or skips a cache coherence directory lookup depending on whether the memory request is cacheable or non-cacheable.
  • the I/O bridge unit 104 includes local cache 262 coupled to each PCI device 265 through a PCI bus 260 .
  • PCI devices coupled to the PCI bus 260 may request the I/O bridge 104 to fetch data and instructions from main memory 102 or cache 106 .
  • the I/O bridge 104 is implemented to constantly prefetch and store data and instructions into the local cache 262 to try to stay ahead of PCI device requests.
  • PCIx devices 275 coupled to the PCIx bus 270 request from the I/O bridge 104 a range of memory that they will need in the future. Because the data fetched is not needed immediately, PCIx data and instructions are generally tagged as non-cacheable. The data is retrieved from main memory 102 and stored into local cache 272 . Because most of the data and instructions requested by PCIx devices are non-cacheable, memory requests from PCIx devices can benefit significantly from the apparatus to bypass cache coherence directory lookups and bus snoops described in the preferred embodiment of the invention.
  • the AGP bus is provided as a part of the host bridge unit 210 .
  • the graphics controller 285 couples to the I/O bridge unit 104 through the AGP port or bus 280 . Connection of graphics devices to the I/O bridge unit 104 rather than interfaced directly to the host bridge unit 210 offers greater flexibility in the design of the I/O subsystem.
  • Devices coupled to the AGP bus 280 benefit from bypassing cache coherence directory lookups and bus snoops for non-cacheable requests of data to main memory as described in the preferred embodiment of the invention. This is because AGP enabled graphics devices are capable of non-cacheable data transfer rates peaking at 1 Gigabyte/sec. If the preferred embodiment of the invention that allows distinguishing between cacheable and non-cacheable memory requests is not implemented, requests to memory from AGP devices will significantly impact bus snoop performance.
  • the I/O bus 240 couples the host bridge unit 210 to the I/O bridge unit 104 .
  • the I/O bus 240 generally does not support identification of cacheable and non-cacheable requests for data and instructions as part of its bus protocol. Thus, use of the I/O bus 240 by itself degrades performance as bus snoops are performed unnecessarily for data and instructions that are non-cacheable.
  • one solution to this problem is for the host bridge unit 210 to support a sideband signal 250 that identifies non-cacheable memory requests.
  • When the host bridge unit 210 receives a memory request in which the side band signal 250 is asserted, indicating that the data or instructions are non-cacheable, the host bridge unit will not request the cache coherence controller to perform a cache coherence directory lookup or snoop the processor buses.
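The division of labor between the I/O bus 240 and the side band signal 250 can be illustrated with a small sketch. It is only a model under stated assumptions: the names `DeviceRequest`, `Forwarded`, `io_bridge_forward`, and `lookups_needed` are hypothetical, and the `non_cacheable` field stands in for whatever attribute the originating bus supplies to the I/O bridge unit 104. The forwarded address models what travels on the I/O bus, which carries no cacheability information, while the separate boolean models the sideband level.

```python
from typing import NamedTuple

class DeviceRequest(NamedTuple):
    source_bus: str      # "PCI", "PCIx", or "AGP"
    addr: int
    non_cacheable: bool  # assumed attribute supplied by the device or its bus

class Forwarded(NamedTuple):
    addr: int            # sent over the I/O bus (no cacheability field)
    sideband: bool       # sent over the separate sideband signal

def io_bridge_forward(req):
    """Model of the I/O bridge: pass the address on, assert sideband if non-cacheable."""
    return Forwarded(addr=req.addr, sideband=req.non_cacheable)

def lookups_needed(requests):
    """Count forwarded requests that would still need a directory lookup."""
    return sum(1 for r in map(io_bridge_forward, requests) if not r.sideband)

# Illustrative traffic mix, not measured data.
traffic = [
    DeviceRequest("AGP", 0x8000, True),
    DeviceRequest("PCIx", 0x9000, True),
    DeviceRequest("PCI", 0x1000, False),
]
```

In this illustrative mix only the PCI request would reach the cache coherence controller; the AGP and PCIx requests arrive with the sideband asserted and bypass the lookup.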

Abstract

An apparatus for identifying memory requests originating on remote I/O devices as non-cacheable in a computer system with multiple processors includes a main memory, memory cache, processor, cache coherence directory and cache coherence controller all coupled to a host bridge unit (North bridge). The I/O device transmits requests for data to an I/O bridge unit. The I/O bridge unit forwards the request for data to the host bridge unit and asserts a sideband signal to the host bridge unit if the request is for non-cacheable data. The sideband signal informs the host bridge unit that the memory request is for non-cacheable data and that the cache coherence controller does not need to perform a cache coherence directory lookup. For cacheable data, the cache coherence controller performs a cache coherence directory lookup to maintain the coherence of data stored in a plurality of processor caches in the computer system.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application relates to the following commonly assigned co-pending application entitled “System For Identifying Memory Requests As Non-cacheable To Reduce Cache Coherence Directory Lookups And Bus Snoops,” Ser. No.______ , filed December ???, 2000, Attorney Docket No. 1662-34500 which is incorporated by reference herein. [0001]
  • STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
  • Not applicable. [0002]
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0003]
  • The present invention generally relates to sharing data among processors using cache memories in a computer system with multiple processors. More particularly, the present invention relates to maintaining the coherence of data in cache memories by using a cache coherence directory and bus snooping. Still more particularly, the invention relates to a computer system in which a sideband signal identifies memory requests as non-cacheable to minimize cache coherence directory lookups and bus snoops. [0004]
  • 2. Background of the Invention [0005]
  • Modern-day computer systems can include a single processor or multiple processors for higher performance. A host bridge unit coupled to each processor of the multiprocessing computer system allows the computer system to support many different kinds of devices attached to a multitude of different buses. The host bridge unit may connect to processor buses, a main memory bus, and an I/O bus and, through an I/O bridge unit, to an advanced graphic port (“AGP”) bus, a peripheral component interconnect (“PCI”) bus, or a peripheral component interconnect extended (“PCIx”) bus. Each of the processor buses can support a maximum number of processors (e.g., 4, 6, 8, 12, etc.) connected to the processor bus while still maintaining bus communication bandwidth for sufficiently high performance. [0006]
  • Each processor of the computer system includes a memory cache either integrated into the processor chip itself or external to the processor chip. The memory cache stores data and instructions and improves processor performance by allowing high-speed access to the needed data and instructions resulting in reduced program execution time. In a computer system with multiple processors, each unit of data is identified as being owned by a particular processor. Requestor processors in the computer system may request a unit of data from an owner processor. The requesting processor may access data to perform either read or write operations. If a requesting processor modifies the data by performing a write, other processors of the computer system may have access to old, unmodified versions of the data. To remedy this problem, each processor maintains a local record of the addresses cached on the various processors and the particular “state” of each unit of data associated with the address in a cache coherence directory. [0007]
  • A “state” describes the copies of the data unit stored in the memory caches of the particular system. The computer system, using a cache coherence directory, implements a coherency protocol that enforces the consistency of data in the cache memories. The coherency protocol describes the different states of a data unit. A data unit may be in a shared state that corresponds to processors having a read only copy of the data unit. Alternatively, a data unit may be in an exclusive state in which only one requestor processor contains a copy of the data unit that it may modify. [0008]
  • Use of a coherence protocol requiring a cache coherence directory may call for excessive utilization of the processor bus interconnecting the processors. A “bus snoop” involves accessing the bus to communicate with processors on the processor bus to monitor and maintain coherency of data. A bus snoop is needed whenever a requesting processor needs access to data of which it neither holds an exclusive copy nor is the owner. Large amounts of snoop traffic can seriously impact computer system performance. One solution to this problem is to compare the address of the data to the cache coherence directory to determine if one of the other processors owns the address or has an exclusive copy. If the cache coherence directory indicates ownership of the address or an exclusive copy by a different processor, a bus snoop is performed. If the requesting processor owns the address or has an exclusive copy, a bus snoop is not performed, thus preserving processor bus bandwidth. [0009]
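The directory check described above can be sketched as follows. This is a minimal illustration, not the patented hardware: the `State` enum, the dictionary layout, and the function `needs_snoop` are hypothetical, and the sketch models only the shared and exclusive states named in the text, treating an exclusive copy held by a different processor as the condition that triggers a snoop.

```python
from enum import Enum

class State(Enum):
    SHARED = "shared"        # processors hold read-only copies
    EXCLUSIVE = "exclusive"  # exactly one processor holds a modifiable copy

# Hypothetical directory: block address -> (state, ids of processors holding a copy)
directory = {
    0x1000: (State.EXCLUSIVE, {0}),
    0x2000: (State.SHARED, {0, 1}),
    0x3000: (State.EXCLUSIVE, {2}),
}

def needs_snoop(directory, addr, requester):
    """Return True when a different processor holds an exclusive copy of addr."""
    entry = directory.get(addr)
    if entry is None:
        # Assumption: an address absent from the directory is cached nowhere.
        return False
    state, holders = entry
    return state is State.EXCLUSIVE and requester not in holders
```

Under these assumptions, processor 0 requesting `0x1000` needs no snoop because it already holds the exclusive copy, whereas processor 1 requesting the same address does.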
  • Hardware to maintain the coherency of the data includes a cache coherence controller and cache coherence directory. The cache coherence directory preferably includes enough Random Access Memory (“RAM”) to maintain a sufficient record of the addresses cached on the various processors and the particular state of each unit of data associated with the address. It would be advantageous if the cache coherence directory and cache coherence protocol could be implemented in such a way as to be able to quickly retrieve memory requests from the processor and peripheral devices. To implement a fast cache coherence directory, interleaved banks of RAM can be used. To further reduce the access time for processor and peripheral device memory requests, the cache coherence protocol could be implemented to reduce the number of memory requests that must be compared to the cache coherence directory. One way to reduce memory request access times would be for the host bridge unit to identify memory requests from peripheral devices as non-cacheable and then skip the cache coherence directory lookup and bus snoop. [0010]
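The text does not say how the interleaved banks of RAM would be addressed. One common arrangement, shown here purely as an assumption, selects the bank from the low-order bits of the block address so that consecutive blocks fall in different banks and their lookups can overlap. The constants `BLOCK_BYTES` and `NUM_BANKS` and the helper names are illustrative choices, not values taken from the source.

```python
BLOCK_BYTES = 64  # assumed cache-block size in bytes
NUM_BANKS = 4     # assumed number of directory RAM banks

def bank_of(addr):
    """Bank index: low-order bits of the block address."""
    return (addr // BLOCK_BYTES) % NUM_BANKS

def row_of(addr):
    """Row within the selected bank: remaining block-address bits."""
    return (addr // BLOCK_BYTES) // NUM_BANKS

# Five consecutive 64-byte blocks cycle through the four banks.
addrs = [0x0000, 0x0040, 0x0080, 0x00C0, 0x0100]
banks = [bank_of(a) for a in addrs]
```

With these assumed parameters, back-to-back lookups for adjacent blocks hit different banks, so one bank can be accessed while another is still completing its previous access.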
  • The current generation of host bridge units has no dedicated hardware support for the identification of non-cacheable memory requests. Furthermore, the next generation of AGP bus implementations in computer systems will have graphics devices coupled to an I/O bridge in order to permit greater flexibility in the I/O subsystem. Thus, graphics devices will send requests for data stored in memory to the I/O bridge that will then forward the request to the host bridge. Even if the graphics device and AGP bus are implemented to inform the I/O bridge that the data requested is non-cacheable, because host bridge units have no dedicated hardware support for identification of non-cacheable memory requests, the information that the data for the memory request is non-cacheable will not be provided to the host bridge. [0011]
  • For the reasons discussed above, it would be advantageous to design a computer system with dedicated hardware capable of informing the host bridge that a memory request from a graphics device is for non-cacheable data so as to bypass the cache coherence directory lookup and bus snoop. Despite the apparent performance advantages of such a system, to date no such system has been implemented. [0012]
  • BRIEF SUMMARY OF THE INVENTION
  • The deficiencies of the prior art described above are solved in large part by an apparatus for identifying memory requests originating on remote I/O devices as non-cacheable in a computer system with multiple processors. The apparatus includes a main memory, memory cache, processor and cache coherence directory all coupled to a host bridge unit (North bridge or memory controller). The I/O device transmits requests for data to the I/O bridge unit. The I/O bridge unit forwards the request for data to the host bridge unit and asserts a sideband signal to the host bridge unit if the request is for non-cacheable data. The host bridge unit includes a cache coherence controller that implements a protocol to maintain the coherence of data stored in each of the processor caches in the computer system. The cache coherence directory connects to the cache coherence controller. If the host bridge unit determines that the data is cacheable (i.e., the sideband signal is not asserted), then it requests the cache coherence controller to perform a cache coherence directory lookup to maintain the coherence of the data. If the data is non-cacheable (i.e., the sideband signal is asserted), then the host bridge unit does not request the cache coherence controller to perform a cache coherence directory lookup. Various I/O devices can be coupled to the I/O bridge unit through an AGP bus, PCI bus, or PCIx bus. [0013]
  • The preferred embodiment of the invention comprises a combination of features and advantages that enable it to overcome various problems of prior devices. The various characteristics described above, as well as other features, will be readily apparent to those skilled in the art upon reading the following detailed description of the preferred embodiments of the invention, and by referring to the accompanying drawings.[0014]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a detailed description of the preferred embodiments of the invention, reference will now be made to the accompanying drawings in which: [0015]
  • FIG. 1 shows a system diagram of a plurality of processors coupled together through a multitude of processor buses; and [0016]
  • FIG. 2 shows a block diagram of the hardware to identify memory requests as non-cacheable to reduce cache coherence directory lookups and bus snoops in accordance with the preferred embodiment. [0017]
  • NOTATION AND NOMENCLATURE
  • Certain terms are used throughout the following description and claims to refer to particular system components. As one skilled in the art will appreciate, computer companies may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not function. In the following discussion and in the claims, the terms “including” and “comprising” are used in an open-ended fashion, and thus should be interpreted to mean “including, but not limited to . . . ”. Also, the term “couple” or “couples” is intended to mean either an indirect or direct electrical connection. Thus, if a first device couples to a second device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.[0018]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Referring now to FIG. 1, in accordance with the preferred embodiment of the invention, computer system 90 comprises one or more processor modules 100 coupled to a main memory 102 and an input/output (“I/O”) bridge unit 104. As shown, computer system 90 includes five processor modules 100, each processor module 100 coupled to a main memory 102, an external memory cache 106, and an I/O bridge unit 104. Each processor preferably includes processor buses 108 for connection to adjacent processors. A processor module 100 preferably couples through a straight-line processor bus 108 to all other processor modules 100 of computer system 90. As such, each processor module 100 in the embodiment shown can be connected to four other processor modules 100 through the processor bus 108. Although five processor modules 100 are shown in the exemplary embodiment of FIG. 1, any desired number of processors (e.g., 4, 6, 7, 8, 12, etc.), limited by the communication bandwidth of the processor bus, can be included. [0019]
  • The I/O bridge unit 104 provides an interface to various input/output devices such as disk drives and display devices as described in greater detail below. Data from the I/O devices thus enters the processor bus 108 of the computer system via the I/O bridge unit 104. [0020]
  • In accordance with the preferred embodiment, the main memory 102 generally includes a conventional memory device or an array of memory devices in which application programs and data are stored. The capacity of the main memory 102 can be any suitable size. Further, main memory 102 preferably is any suitable type of memory such as dynamic random access memory (“DRAM”) or any of the various types of DRAM circuits such as synchronous dynamic random access memory (“SDRAM”). [0021]
  • In one exemplary embodiment, an off-chip external cache 106 couples to a processor module 100 through the processor bus 108. The external cache may be a 1.75-MB, seven-way set associative, write-back, mixed instruction and data cache. Preferably, the L2 cache holds physical address data for each block. Alternatively, in another exemplary embodiment, memory cache 106 may be integrated on-chip into the processor in processor module 100. The on-chip memory cache may be a unified instruction and data cache. In the preferred embodiment, the memory cache 106 comprises a 64-KB, two-way set associative, virtually indexed, physically tagged, write-back memory cache with 64-byte cache blocks. During each cycle the memory cache 106 preferably performs one of the following transactions: two quadword (or shorter) read transactions to arbitrary addresses, two quadword write transactions to the same aligned octaword, two non-overlapping less-than-quadword writes to the same aligned quadword, or one sequential read and write transaction from and to the same aligned octaword. Preferably, each data cache block contains 64 data bytes and associated quadword ECC bits, a tag parity bit calculated across the tag, and one bit to control round-robin set allocation. The memory cache 106 is organized to contain two cache sets, each with 512 rows holding one 64-byte block per row (i.e., 32 KB of data per cache set). [0022]
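  • The cache geometry quoted above can be checked with a little arithmetic. The sketch below uses only the figures given in the text (two cache sets, 512 rows per set, 64-byte blocks). The `split_address` helper is a generic tag/row/offset split added for illustration; the text does not say which address bits the hardware actually uses, so treat that part as an assumption.

```python
CACHE_SETS = 2        # two-way set associative ("two cache sets" in the text)
ROWS_PER_SET = 512
BLOCK_BYTES = 64

bytes_per_set = ROWS_PER_SET * BLOCK_BYTES      # 32 KB per cache set
total_bytes = CACHE_SETS * bytes_per_set        # 64 KB in total

offset_bits = BLOCK_BYTES.bit_length() - 1      # log2(64)  = 6
index_bits = ROWS_PER_SET.bit_length() - 1      # log2(512) = 9


def split_address(address):
    """Split an address into (tag, row, byte offset) for this geometry.

    Hypothetical helper: a textbook decomposition, not taken from the source.
    """
    offset = address & (BLOCK_BYTES - 1)
    row = (address >> offset_bits) & (ROWS_PER_SET - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, row, offset
```

  • The totals agree with the text: 512 rows of 64 bytes give 32 KB per cache set, and two sets give the stated 64 KB.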
  • In general, computer system 90 can be configured so that any processor module 100 can access its own main memory 102 and I/O devices as well as the main memory and I/O devices of all other processors in the network. Preferably, the computer system has physical connections between each processor module resulting in low interprocessor communication times and improved memory and I/O device access reliability. [0023]
  • Referring now to FIG. 2, each hardware block 110 preferably includes main memory 102 coupled to processor module 100 through a memory bus 230. External cache 106 of the preferred embodiment couples to processor module 100 through processor bus 108. The processor module 100 preferably includes multiple processor buses 108 (e.g., processor bus 0 and processor bus 1) coupling to other processor modules and external caches. An I/O bridge unit (South bridge or I/O controller) 104 couples to the processor module 100 through an I/O bus 240. In the preferred embodiment, a sideband signal 250 also couples the processor module 100 to the I/O bridge unit 104. I/O bridge unit 104 preferably couples the processor module 100 to various peripheral devices through a variety of different peripheral buses. In the preferred embodiment shown, a PCI bus 260 couples the PCI device 265 to the I/O bridge unit 104. Examples of PCI devices which can be coupled to the PCI bus include network interface cards, video accelerators, audio cards, SCSI adapters, and telephony cards, to name a few. The I/O bridge unit 104 also may couple to PCIx devices 275 through a PCIx bus 270. Preferably, the I/O bridge unit 104 also couples the processor module 100 to a graphics controller 285 through an AGP bus 280. In the preferred embodiment, the graphics controller 285 is coupled to a display 290. A suitable display 290 may include, for example, a cathode ray tube (“CRT”), a liquid crystal display (“LCD”), a virtual retinal display (“VRD”), or any other type of suitable display device for a computer system. The graphics controller controls the output sent to the display 290. [0024]
  • Preferably, the processor module 100 includes a host bridge unit (North Bridge or memory controller) 210 that couples to a processor 215 through the processor bus 108. The external cache 106 of the preferred embodiment couples through the processor bus 108 to the host bridge unit 210 and processor 215. Preferably, processor bus 0 108a couples the processor to another processor in the computer system 90. The host bridge unit 210 also couples to the main memory 102 through the memory bus 230. I/O bridge unit 104 couples to the host bridge unit 210 through the I/O bus 240 and sideband signal 250. In the preferred embodiment, processor module 100 also includes a cache coherence directory 220 that couples to the host bridge unit 210. Processor bus 1 108c couples other processors of computer system 90 to processor 215 through host bridge unit 210. The components discussed above are described in greater detail below. [0025]
  • Processor bus 108 couples the processor 215 to the host bridge unit 210 and the memory bus 230 couples the host bridge unit 210 to the main memory 102. The processor 215 is illustrative of, for example, a Pentium® Pro Microprocessor. It should be understood, however, that other alternative types of processors could be employed. The main memory controller (not shown in FIG. 2) typically is incorporated within the host bridge unit 210 to generate various control signals for accessing the main memory 102. An interface to a high bandwidth local expansion bus, such as the PCI bus, may also be included as a separate I/O bridge unit. [0026]
  • In applications that are graphics intensive, a separate peripheral bus optimized for graphics related data transfers is provided. A popular example of such a bus is the AGP bus. The AGP bus is generally considered a high performance, component level interconnect bus optimized for three dimensional graphical display applications, and is based on a set of performance extensions or enhancements to the PCI standard. In part, the AGP bus was developed in response to the increasing demands placed on memory bandwidths for three-dimensional renderings. With the advent of AGP, a graphics controller can be moved from the PCI bus (where it traditionally was located) to the AGP bus. AGP provides greater bandwidth for data transfer between a graphics accelerator and system memory than is possible with PCI or other conventional bus architectures. The increase in data rate provided by AGP allows some of the three dimensional rendering data structures, such as textures, to be stored in main memory, reducing the cost of incorporating large amounts of memory local to the graphics accelerator or frame buffer. Although the AGP bus uses the PCI specification as an operational baseline, it provides two significant performance extensions or enhancements to that specification. These extensions include a deeply pipelined read and write operation and demultiplexing of address and data on the AGP bus. [0027]
  • Graphics controller 285 controls the rendering of text and images on display 290. Graphics controller 285 may embody a typical graphics accelerator generally known in the art to render three-dimensional data structures on display 290. These data structures can be effectively shifted into and out of main memory 102. The graphics controller 285 therefore may be a master of the AGP bus 280 in that it can request and receive access through the I/O bridge unit 104 to a target interface within the host bridge unit 210 to thereby obtain access to main memory 102. A dedicated graphics bus accommodates rapid retrieval of data from main memory 102. For certain operations, graphics controller 285 may further be configured to generate PCI protocol transactions on the AGP bus 280. The AGP interface of the I/O bridge unit 104 may thus include functionality to support both AGP protocol transactions and PCI protocol transactions. Display 290 is any electronic display device upon which an image or text can be represented. [0028]
  • The prior discussion describes one embodiment of computer system 90 for coupling together various computer buses. Computer system 90 can be implemented with respect to the particular bus architectures shown in FIG. 2 (i.e., PCI, PCIx, and AGP buses), or other bus architectures, as desired. The embodiment described herein, however, assumes buses 260, 270, and 280 represent a PCI bus, PCIx bus, and an AGP bus, as shown in FIG. 2. Further, processor 215 is assumed to be a Pentium® Pro processor and thus processor bus 108 represents a Pentium® Pro bus. These bus protocols and the terminology used with respect to these protocols are well known to those of ordinary skill in the art. If a more thorough understanding of the PCI, PCIx, AGP, or Pentium® Pro buses is desired, reference should be made to the PCI Local Bus Specification (1993), Accelerated Graphics Port Interface Specification (Intel, 1996), and Intel P6 External Bus Specification. [0029]
  • Host bridge unit 210 of the preferred embodiment includes a cache coherence controller 212 that preferably implements the coherence protocol. A memory request from a peripheral device is transmitted to the host bridge unit 210. When the host bridge unit 210 is informed through sideband signal 250 that a memory request is non-cacheable, it does not send the memory request to the cache coherence controller 212. Thus, for non-cacheable memory requests, the cache coherence controller 212 does not perform a cache coherence directory lookup and evaluation. For non-cacheable memory requests, the host bridge unit 210 will return a coherent response indicating that main memory 102 of the owner processor contains the most recent copy. Thus, under this implementation, the coherent response from the host bridge unit 210 is significantly faster than for memory requests requiring a cache coherence directory lookup. If the memory request is cacheable and a snoop of the processor bus 108 is required, the host bridge unit 210 broadcasts the memory request to the appropriate processor cache 106 using the processor bus 108. In the preferred embodiment of the invention described, if a significant number of non-cacheable memory request cycles exist, excluding them from the cache coherence directory lookup and comparison process results in significantly increased bus performance and reduced snoop traffic. [0030]
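  • The request handling described in this paragraph can be modelled in software as follows. This is a behavioural sketch under stated assumptions, not the hardware itself: `MemoryRequest`, `HostBridge`, the dictionary directory (address to owning processor), and the returned strings are all hypothetical names introduced for the example. The counters make the saved lookups and snoops visible.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRequest:
    address: int
    sideband_asserted: bool  # True: the I/O bridge flagged the request as non-cacheable


@dataclass
class HostBridge:
    # Assumed directory shape: address -> id of the processor caching it.
    directory: dict = field(default_factory=dict)
    lookups: int = 0
    snoops: int = 0

    def handle(self, request):
        """Return where the request is satisfied from."""
        if request.sideband_asserted:
            # Non-cacheable: skip the directory lookup and the bus snoop.
            return "main_memory"
        self.lookups += 1
        owner = self.directory.get(request.address)
        if owner is not None:
            self.snoops += 1
            return f"snoop_processor_{owner}"
        return "main_memory"


bridge = HostBridge(directory={0x40: 2})
first = bridge.handle(MemoryRequest(0x40, sideband_asserted=True))
lookups_after_first = bridge.lookups
second = bridge.handle(MemoryRequest(0x40, sideband_asserted=False))
third = bridge.handle(MemoryRequest(0x80, sideband_asserted=False))
```

  • Only the two requests without the sideband signal reach the directory in this model, so a workload dominated by non-cacheable I/O traffic would leave the lookup and snoop counters largely untouched.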
  • For non-cacheable memory, transactions on the processor bus 108 preferably follow a strong ordering model for memory accesses, I/O accesses, locked memory accesses, and PCI configuration accesses. Strong ordering of transactions means that the transactions are completed on the processor bus 108 in the order in which they were initiated. If additional explanation of the ordering rules identified above for the Pentium® Pro bus is desired, reference may be made to the Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual. [0031]
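  • Strong ordering as defined above amounts to first-in, first-out completion. A minimal sketch, assuming a hypothetical `StronglyOrderedBus` class that stands in for the processor bus and plain strings that stand in for transactions:

```python
from collections import deque


class StronglyOrderedBus:
    """Completes transactions strictly in the order they were initiated."""

    def __init__(self):
        self._pending = deque()

    def initiate(self, transaction):
        # New transactions always join the back of the queue.
        self._pending.append(transaction)

    def complete_next(self):
        # The oldest outstanding transaction always completes first.
        return self._pending.popleft()


bus = StronglyOrderedBus()
for name in ("memory read", "I/O write", "locked memory read"):
    bus.initiate(name)
completed = [bus.complete_next() for _ in range(3)]
```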
  • I/O bridge unit 104 receives requests for instructions and data from the peripheral devices. The I/O bridge unit 104 transmits the memory request to the host bridge unit 210, which then performs a cache coherence directory lookup depending on whether the memory request is cacheable or non-cacheable. The I/O bridge unit 104 includes local cache 262 coupled to each PCI device 265 through a PCI bus 260. PCI devices coupled to the PCI bus 260 may request the I/O bridge 104 to fetch data and instructions from main memory 102 or cache 106. Preferably, the I/O bridge 104 is implemented to constantly prefetch and store data and instructions into the local cache 262 to try to stay ahead of PCI device requests. PCIx devices 275 coupled to the PCIx bus 270 request from the I/O bridge 104 a range of memory that they will need in the future. Because the data fetched is not needed immediately, PCIx data and instructions are generally tagged as non-cacheable. The data is retrieved from main memory 102 and stored into local cache 272. Because most of the data and instructions requested by PCIx devices are non-cacheable, memory requests from PCIx devices can benefit significantly from the apparatus to bypass cache coherence directory lookups and bus snoops described in the preferred embodiment of the invention. [0032]
  • Traditionally, the AGP bus is provided as a part of the host bridge unit 210. According to the preferred embodiment, the graphics controller 285 couples to the I/O bridge unit 104 through the AGP port or bus 280. Connecting graphics devices to the I/O bridge unit 104, rather than directly to the host bridge unit 210, offers greater flexibility in the design of the I/O subsystem. Devices coupled to the AGP bus 280 benefit from bypassing cache coherence directory lookups and bus snoops for non-cacheable requests of data to main memory as described in the preferred embodiment of the invention. This is because AGP enabled graphics devices are capable of non-cacheable data transfer rates peaking at 1 Gigabyte/sec. If the preferred embodiment of the invention that allows distinguishing between cacheable and non-cacheable memory requests is not implemented, requests to memory from AGP devices will significantly impact bus snoop performance. [0033]
  • In the preferred embodiment of the invention, the I/O bus 240 couples the host bridge unit 210 to the I/O bridge unit 104. The I/O bus 240 generally does not support identification of cacheable and non-cacheable requests for data and instructions as part of its bus protocol. Thus, use of the I/O bus 240 by itself degrades performance as bus snoops are performed unnecessarily for data and instructions that are non-cacheable. In accordance with the preferred embodiment, one solution to this problem is for the host bridge unit 210 to support a sideband signal 250 that identifies non-cacheable memory requests. When the host bridge unit 210 receives a memory request in which the sideband signal 250 is asserted indicating that the data or instructions are non-cacheable, the host bridge unit will not request the cache coherence controller to perform a cache coherence directory lookup or snoop the processor buses. [0034]
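  • Because the I/O bus protocol itself carries no cacheability information, the indication has to travel beside the request. The sketch below models only the I/O bridge side of that hand-off. The type names, the `non_cacheable` hint supplied by the device or its bus, and the function `forward_to_host_bridge` are assumptions for illustration; the text does not describe how the I/O bridge learns the attribute.

```python
from typing import NamedTuple


class DeviceRequest(NamedTuple):
    source_bus: str       # "PCI", "PCIx", or "AGP"
    address: int
    non_cacheable: bool   # assumed hint from the device or its bus


class ForwardedRequest(NamedTuple):
    address: int          # the part that travels over the I/O bus
    sideband: bool        # separate signal asserted alongside the I/O bus


def forward_to_host_bridge(request):
    """Forward the request and assert the sideband only for non-cacheable data."""
    return ForwardedRequest(address=request.address,
                            sideband=request.non_cacheable)


agp = forward_to_host_bridge(DeviceRequest("AGP", 0x1000, non_cacheable=True))
pci = forward_to_host_bridge(DeviceRequest("PCI", 0x2000, non_cacheable=False))
```

  • Keeping the flag outside the forwarded bus payload mirrors the design choice in the text: the existing I/O bus protocol is left unchanged and one extra signal conveys the missing attribute.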
  • The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications. [0035]

Claims (20)

What is claimed is:
1. An apparatus for identifying memory requests originating on remote I/O devices as non-cacheable in a computer system with multiple processors, comprising:
a main memory coupled to a host bridge unit, wherein said host bridge unit includes a cache coherence controller that implements a protocol to maintain the coherence of data stored in a plurality of processor caches in the computer system;
a cache coherence directory coupled to said cache coherence controller;
an I/O bridge unit coupled to said host bridge unit; and
a peripheral bus coupled to said I/O bridge unit, said peripheral bus transmitting requests from a peripheral device coupled to the peripheral bus for data to the I/O bridge unit, wherein said I/O bridge unit transmits the request for data to the host bridge unit and asserts a sideband signal to the host bridge unit if the request is for non-cacheable data.
2. The apparatus of claim 1, wherein said cache coherence directory includes the addresses of data stored in each of the processor caches and the state of the data, wherein said host bridge unit recognizes requests for data as cacheable or non-cacheable, and said host bridge unit requests the cache coherence controller to bypass the cache coherence directory lookup for non-cacheable data.
3. The apparatus of claim 1 wherein said host bridge unit comprises a memory controller.
4. The apparatus of claim 1 wherein said peripheral bus is an advanced graphic port (“AGP”) bus.
5. The apparatus of claim 4 wherein said peripheral device is an I/O device.
6. The apparatus of claim 1 wherein said peripheral bus is a peripheral component interconnect (“PCI”) bus.
7. The apparatus of claim 1 wherein said peripheral bus is a peripheral component interconnect extended (“PCIx”) bus.
8. An apparatus in a computer system for identifying memory requests originating on remote I/O devices as non-cacheable, comprising:
a memory cache coupled to a first bridge unit;
a main memory coupled to said first bridge unit, wherein said first bridge unit includes a cache coherence controller that implements a protocol to maintain the coherence of data stored in a plurality of processor caches in the computer system;
a cache coherence directory coupled to said cache coherence controller;
a secondary bridge unit coupled to said first bridge unit;
a peripheral bus coupled to said secondary bridge unit, said peripheral bus transmitting requests from a peripheral device coupled to the peripheral bus for data to the secondary bridge unit, wherein said secondary bridge unit transmits the request for data to the first bridge unit and asserts a sideband signal to the first bridge unit if the request is for non-cacheable data; and
a display coupled to said secondary bridge unit.
9. The apparatus of claim 8, wherein said cache coherence directory includes the addresses of data stored in each of the processor caches and the state of the data, wherein said first bridge unit recognizes requests for data as cacheable or non-cacheable, and said first bridge unit requests the cache coherence controller to bypass the cache coherence directory lookup for non-cacheable data.
10. The apparatus of claim 8 wherein said first bridge unit comprises a memory controller.
11. The apparatus of claim 8 wherein said peripheral bus is an advanced graphic port (“AGP”) bus.
12. The apparatus of claim 11 wherein said peripheral device is an I/O device.
13. The apparatus of claim 8 wherein said peripheral bus is a peripheral component interconnect (“PCI”) bus.
14. The apparatus of claim 8 wherein said peripheral bus is a peripheral component interconnect extended (“PCIx”) bus.
15. The apparatus of claim 8 wherein said computer system includes multiple processors coupled together through a processor bus.
16. The apparatus of claim 8 wherein said secondary bridge unit is a South Bridge.
17. The apparatus of claim 8 wherein said secondary bridge unit comprises an I/O controller.
18. The apparatus of claim 8 wherein said first bridge unit comprises a memory controller.
19. The apparatus of claim 8 wherein said cache coherence directory is located in the first bridge unit.
20. A method for identifying memory requests originating on remote I/O devices as non-cacheable in a multiprocessing computer system, comprising:
transmitting requests for data from an I/O device to an I/O bridge unit, said I/O bridge unit coupled to a host bridge unit;
identifying the requests for data as cacheable or non-cacheable, wherein said I/O bridge unit transmits the request for data to the host bridge unit and asserts a sideband signal to the host bridge unit if the request is for non-cacheable data; and
requesting the cache coherence controller bypass the cache coherence directory lookup for non-cacheable data.
US09/751,505 2000-12-29 2000-12-29 Apparatus for identifying memory requests originating on remote I/O devices as noncacheable Expired - Lifetime US6463510B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/751,505 US6463510B1 (en) 2000-12-29 2000-12-29 Apparatus for identifying memory requests originating on remote I/O devices as noncacheable

Publications (2)

Publication Number Publication Date
US20020087803A1 true US20020087803A1 (en) 2002-07-04
US6463510B1 US6463510B1 (en) 2002-10-08

Family

ID=25022282

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/751,505 Expired - Lifetime US6463510B1 (en) 2000-12-29 2000-12-29 Apparatus for identifying memory requests originating on remote I/O devices as noncacheable

Country Status (1)

Country Link
US (1) US6463510B1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030028728A1 (en) * 2001-07-31 2003-02-06 Mitsubishi Denki Kabushiki Kaisha Cache memory control device
US6564299B1 (en) * 2001-07-30 2003-05-13 Lsi Logic Corporation Method and apparatus for defining cacheable address ranges
US20060041706A1 (en) * 2004-08-17 2006-02-23 Yao-Chun Su Apparatus And Related Method For Maintaining Read Caching Data of South Bridge With North Bridge
US7194576B1 (en) * 2003-07-31 2007-03-20 Western Digital Technologies, Inc. Fetch operations in a disk drive control system
US20070162911A1 (en) * 2001-10-22 2007-07-12 Kohn Leslie D Multi-core multi-thread processor
US7806839B2 (en) 2004-06-14 2010-10-05 Ethicon Endo-Surgery, Inc. System and method for ultrasound therapy using grating lobes
US7806892B2 (en) 2001-05-29 2010-10-05 Ethicon Endo-Surgery, Inc. Tissue-retaining system for ultrasound medical treatment
US7846096B2 (en) 2001-05-29 2010-12-07 Ethicon Endo-Surgery, Inc. Method for monitoring of medical treatment using pulse-echo ultrasound
US20140032202A1 (en) * 2012-07-30 2014-01-30 Cheng-Yen Huang Apparatus of system level simulation and emulation, and associated method
US9990154B2 (en) * 2014-03-05 2018-06-05 Renesas Electronics Corporation Semiconductor device
CN109154910A (en) * 2016-05-31 2019-01-04 超威半导体公司 Cache coherence for being handled in memory
US10909035B2 (en) * 2019-04-03 2021-02-02 Apple Inc. Processing memory accesses while supporting a zero size cache in a cache hierarchy
US11263139B2 (en) * 2017-09-06 2022-03-01 Shanghai Zhaoxin Semiconductor Co., Ltd. Hardware accelerators and access methods thereof
EP4235441A1 (en) * 2022-02-28 2023-08-30 INTEL Corporation System, method and apparatus for peer-to-peer communication

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6636946B2 (en) 2001-03-13 2003-10-21 Micron Technology, Inc. System and method for caching data based on identity of requestor
US6681292B2 (en) * 2001-08-27 2004-01-20 Intel Corporation Distributed read and write caching implementation for optimized input/output applications
US6829665B2 (en) * 2001-09-28 2004-12-07 Hewlett-Packard Development Company, L.P. Next snoop predictor in a host controller
US7752281B2 (en) * 2001-11-20 2010-07-06 Broadcom Corporation Bridges performing remote reads and writes as uncacheable coherent operations
US6826653B2 (en) * 2002-02-06 2004-11-30 Hewlett-Packard Development Company, L.P. Block data mover adapted to contain faults in a partitioned multiprocessor system
US7093079B2 (en) * 2002-12-17 2006-08-15 Intel Corporation Snoop filter bypass
CN1310160C (en) * 2004-03-09 2007-04-11 威盛电子股份有限公司 Method for promoting access finishing of computer system which support write transporting action
US8010682B2 (en) * 2004-12-28 2011-08-30 International Business Machines Corporation Early coherency indication for return data in shared memory architecture
US7536514B2 (en) * 2005-09-13 2009-05-19 International Business Machines Corporation Early return indication for read exclusive requests in shared memory architecture
US20070083715A1 (en) * 2005-09-13 2007-04-12 International Business Machines Corporation Early return indication for return data prior to receiving all responses in shared memory architecture
US8341360B2 (en) 2005-12-30 2012-12-25 Intel Corporation Method and apparatus for memory write performance optimization in architectures with out-of-order read/request-for-ownership response
US7853638B2 (en) * 2007-01-26 2010-12-14 International Business Machines Corporation Structure for a flexibly configurable multi central processing unit (CPU) supported hypertransport switching
US7797475B2 (en) * 2007-01-26 2010-09-14 International Business Machines Corporation Flexibly configurable multi central processing unit (CPU) supported hypertransport switching
US20150113221A1 (en) * 2013-03-15 2015-04-23 Herbert Hum Hybrid input/output write operations

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5890216A (en) * 1995-04-21 1999-03-30 International Business Machines Corporation Apparatus and method for decreasing the access time to non-cacheable address space in a computer system
US5918069A (en) * 1996-03-02 1999-06-29 Kabushiki Kaisha Toshiba System for simultaneously writing back cached data via first bus and transferring cached data to second bus when read request is cached and dirty
US6128711A (en) * 1996-11-12 2000-10-03 Compaq Computer Corporation Performance optimization and system bus duty cycle reduction by I/O bridge partial cache line writes
US6338119B1 (en) * 1999-03-31 2002-01-08 International Business Machines Corporation Method and apparatus with page buffer and I/O page kill definition for improved DMA and L1/L2 cache performance

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9005144B2 (en) 2001-05-29 2015-04-14 Michael H. Slayton Tissue-retaining systems for ultrasound medical treatment
US9261596B2 (en) 2001-05-29 2016-02-16 T. Douglas Mast Method for monitoring of medical treatment using pulse-echo ultrasound
US7806892B2 (en) 2001-05-29 2010-10-05 Ethicon Endo-Surgery, Inc. Tissue-retaining system for ultrasound medical treatment
US7846096B2 (en) 2001-05-29 2010-12-07 Ethicon Endo-Surgery, Inc. Method for monitoring of medical treatment using pulse-echo ultrasound
US6564299B1 (en) * 2001-07-30 2003-05-13 Lsi Logic Corporation Method and apparatus for defining cacheable address ranges
US20030028728A1 (en) * 2001-07-31 2003-02-06 Mitsubishi Denki Kabushiki Kaisha Cache memory control device
US7865667B2 (en) * 2001-10-22 2011-01-04 Oracle America, Inc. Multi-core multi-thread processor
US20070162911A1 (en) * 2001-10-22 2007-07-12 Kohn Leslie D Multi-core multi-thread processor
US7194576B1 (en) * 2003-07-31 2007-03-20 Western Digital Technologies, Inc. Fetch operations in a disk drive control system
US7806839B2 (en) 2004-06-14 2010-10-05 Ethicon Endo-Surgery, Inc. System and method for ultrasound therapy using grating lobes
US9132287B2 (en) 2004-06-14 2015-09-15 T. Douglas Mast System and method for ultrasound treatment using grating lobes
US8166226B2 (en) * 2004-08-17 2012-04-24 Via Technologies Inc. Apparatus and related method for maintaining read caching data of south bridge with north bridge
US20060041706A1 (en) * 2004-08-17 2006-02-23 Yao-Chun Su Apparatus And Related Method For Maintaining Read Caching Data of South Bridge With North Bridge
US20140032202A1 (en) * 2012-07-30 2014-01-30 Cheng-Yen Huang Apparatus of system level simulation and emulation, and associated method
US9990154B2 (en) * 2014-03-05 2018-06-05 Renesas Electronics Corporation Semiconductor device
US10558379B2 (en) 2014-03-05 2020-02-11 Renesas Electronics Corporation Semiconductor device
CN109154910A (en) * 2016-05-31 2019-01-04 超威半导体公司 Cache coherence for being handled in memory
US11263139B2 (en) * 2017-09-06 2022-03-01 Shanghai Zhaoxin Semiconductor Co., Ltd. Hardware accelerators and access methods thereof
US10909035B2 (en) * 2019-04-03 2021-02-02 Apple Inc. Processing memory accesses while supporting a zero size cache in a cache hierarchy
EP4235441A1 (en) * 2022-02-28 2023-08-30 INTEL Corporation System, method and apparatus for peer-to-peer communication

Also Published As

Publication number Publication date
US6463510B1 (en) 2002-10-08

Similar Documents

Publication Publication Date Title
US6463510B1 (en) Apparatus for identifying memory requests originating on remote I/O devices as noncacheable
US6470429B1 (en) System for identifying memory requests as noncacheable to reduce cache coherence directory lookups and bus snoops
US6721848B2 (en) Method and mechanism to use a cache to translate from a virtual bus to a physical bus
US6345342B1 (en) Cache coherency protocol employing a read operation including a programmable flag to indicate deallocation of an intervened cache line
US6571322B2 (en) Multiprocessor computer system with sectored cache line mechanism for cache intervention
US5940856A (en) Cache intervention from only one of many cache lines sharing an unmodified value
KR100545951B1 (en) Distributed read and write caching implementation for optimized input/output applications
US5822763A (en) Cache coherence protocol for reducing the effects of false sharing in non-bus-based shared-memory multiprocessors
US5996048A (en) Inclusion vector architecture for a level two cache
US6021468A (en) Cache coherency protocol with efficient write-through aliasing
US4959777A (en) Write-shared cache circuit for multiprocessor system
US5963974A (en) Cache intervention from a cache line exclusively holding an unmodified value
US5946709A (en) Shared intervention protocol for SMP bus using caches, snooping, tags and prioritizing
US5940864A (en) Shared memory-access priorization method for multiprocessors using caches and snoop responses
EP0743601A2 (en) A system and method for improving cache performance in a multiprocessing system
US6321306B1 (en) High performance multiprocessor system with modified-unsolicited cache state
US5943685A (en) Method of shared intervention via a single data provider among shared caches for SMP bus
US6260118B1 (en) Snooping a variable number of cache addresses in a multiple processor system by a single snoop request
US6412047B2 (en) Coherency protocol
US5996049A (en) Cache-coherency protocol with recently read state for data and instructions
KR100322223B1 (en) Memory controller with queue and snoop tables
US6336169B1 (en) Background kill system bus transaction to optimize coherency transactions on a multiprocessor system bus
US6345344B1 (en) Cache allocation mechanism for modified-unsolicited cache state that modifies victimization priority bits
EP0976047B1 (en) Read operations in multiprocessor computer system
US6629213B1 (en) Apparatus and method using sub-cacheline transactions to improve system performance

Legal Events

Date Code Title Description
AS Assignment

Owner name: COMPAQ COMPUTER CORPORATION, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JONES, PHILLIP M.;REEL/FRAME:011814/0252

Effective date: 20010323

AS Assignment

Owner name: COMPAQ COMPUTER CORPORATION, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WOODS, ROBERT L.;REEL/FRAME:011972/0404

Effective date: 20010521

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:COMPAQ COMPUTER CORPORATION;REEL/FRAME:012478/0170

Effective date: 20010720

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: CHANGE OF NAME;ASSIGNOR:COMPAQ INFORMATION TECHNOLOGIES GROUP, LP;REEL/FRAME:015000/0305

Effective date: 20021001

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001

Effective date: 20151027