USRE44610E1 - Node identification for distributed shared memory system - Google Patents

Node identification for distributed shared memory system

Info

Publication number
USRE44610E1
USRE44610E1 (application US 13/468,751; publication US RE44610 E1)
Authority
US
United States
Prior art keywords
node
memory
packet
distributed
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/468,751
Inventor
Shahe Hagop Krakirian
Isam Akkawi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intellectual Ventures Holding 81 LLC
Original Assignee
Intellectual Ventures Holding 80 LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intellectual Ventures Holding 80 LLC filed Critical Intellectual Ventures Holding 80 LLC
Priority to US13/468,751 priority Critical patent/USRE44610E1/en
Assigned to INTELLECTUAL VENTURES HOLDING 80 LLC reassignment INTELLECTUAL VENTURES HOLDING 80 LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: THE FLORIDA STATE UNIVERSITY FOUNDATION, INCORPORATED
Application granted granted Critical
Publication of USRE44610E1 publication Critical patent/USRE44610E1/en
Assigned to INTELLECTUAL VENTURES FUND 81 LLC reassignment INTELLECTUAL VENTURES FUND 81 LLC MERGER (SEE DOCUMENT FOR DETAILS). Assignors: INTELLECTUAL VENTURES HOLDING 80 LLC
Assigned to INTELLECTUAL VENTURES HOLDING 81 LLC reassignment INTELLECTUAL VENTURES HOLDING 81 LLC CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE NAME PREVIOUSLY RECORDED AT REEL: 037575 FRAME: 0812. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER. Assignors: INTELLECTUAL VENTURES HOLDING 80 LLC
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00: Packet switching elements
    • H04L 49/90: Buffering arrangements
    • H04L 49/9042: Separate storage for different parts of the packet, e.g. header and payload
    • H04L 49/9047: Buffering arrangements including multiple buffers, e.g. buffer pools


Abstract

An example embodiment of the present invention provides processes relating to a connection/communication protocol and a memory-addressing scheme for a distributed shared memory system. In the example embodiment, a logical node identifier comprises bits in the physical memory addresses used by the distributed shared memory system. Processes in the embodiment include logical node identifiers in packets which conform to the protocol and which are stored in a connection control block in local memory. By matching the logical node identifiers in a packet against the logical node identifiers in the connection control block, the processes ensure reliable delivery of packet data. Further, in the example embodiment, the logical node identifiers are used to create a virtual server consisting of multiple nodes in the distributed shared memory system.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is for the broadening reissue of U.S. Pat. No. 7,715,400, entitled “NODE IDENTIFICATION FOR DISTRIBUTED SHARED MEMORY SYSTEM,” which issued May 11, 2010 from U.S. patent application Ser. No. 11/740,432, which was filed Apr. 26, 2007.
The present application is related to the following commonly-owned U.S. utility patent application, the disclosure of which is incorporated herein by reference in its entirety for all purposes: U.S. patent application Ser. No. 11/668,275, entitled “Fast Invalidation for Cache Coherency in Distributed Shared Memory System,” filed on Jan. 29, 2007.
TECHNICAL FIELD
The present disclosure relates to an identification process for the nodes in a distributed shared memory system.
BACKGROUND
A distributed shared memory (DSM) is a multiprocessor system in which the processors in the system are connected by a scalable interconnect, such as an InfiniBand switched fabric communications link, instead of a bus. DSM systems present a single memory image to the user, but the memory is physically distributed at the hardware level. Typically, each processor has access to a large shared global memory in addition to a limited local memory, which might be used as a component of the large shared global memory and also as a cache for the large shared global memory. Naturally, each processor will access the limited local memory associated with the processor much faster than the large shared global memory associated with other processors. This discrepancy in access time is called non-uniform memory access (NUMA).
A major technical challenge in DSM systems is ensuring that each processor's memory cache is consistent with each other processor's memory cache. Such consistency is called cache coherence. To maintain cache coherence in larger distributed systems, additional hardware logic (e.g., a chipset) or software is used to implement a coherence protocol, typically directory-based, chosen in accordance with a data consistency model, such as strict consistency. DSM systems that maintain cache coherence are called cache-coherent NUMA (ccNUMA).
Typically, if additional hardware logic is used, a node in the system will comprise a chip that includes the hardware logic and one or more processors and will be connected to the other nodes by the scalable interconnect. For purposes of initial connection and later communication between nodes, the system might employ node identifiers, e.g., serial, random, or centrally-assigned numbers, which in turn might be used as part of an address for physical memory residing on the node.
SUMMARY
In particular embodiments, the present invention provides methods, apparatuses, and systems directed to node identification in a DSM system. In one particular embodiment, the present invention provides node-identification processes for use with a connection/communication protocol and a memory-addressing scheme in a DSM system.
DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram showing a DSM system, which system might be used with some embodiments of the present invention.
FIG. 2 is a block diagram showing some of the physical and functional components of an example DSM-management chip or logic circuit, which chip might be used as part of a node with some embodiments of the present invention.
FIG. 3 is a diagram showing the format of an RDP over Ethernet packet and its header, which formats might be used in some embodiments of the present invention.
FIG. 4 is a diagram showing the format of an RDP over InfiniBand packet and its header, which formats might be used in some embodiments of the present invention.
FIG. 5 is a diagram showing the format of an RDP packet, its header, and its optional trailer, which formats might be used in some embodiments of the present invention.
FIG. 6 is a diagram showing the format of a connection control block, which format might be used in some embodiments of the present invention.
FIG. 7 is a diagram showing an example illustrating the use of LNIDs with respect to the RDP protocol, which protocol might be used with an embodiment of the present invention.
FIG. 8 is a diagram showing a flowchart of an example process for building an RDP packet for transmission over the switched fabric network, which process might be used with an embodiment of the present invention.
FIG. 9 is a diagram showing a flowchart of an example process for validating an RDP packet received over the switched fabric network, which process might be used with an embodiment of the present invention.
FIG. 10 is a diagram showing the format of a 40-bit physical memory address in a 16-node DSM system and the format of a 40-bit physical memory address in a 256-node DSM system, which formats might be used with embodiments of the present invention.
FIG. 11 is a diagram showing, for didactic purposes, the local views of a physical address space for a virtual server comprised of three nodes.
FIG. 12 is a diagram showing a flowchart of an example process for altering a physical memory address prior to transmission over a HyperTransport bus, which process might be used with an embodiment of the present invention.
FIG. 13 is a diagram showing a flowchart of an example process for altering a physical memory address prior to transmission over a switched fabric, which process might be used with an embodiment of the present invention.
DESCRIPTION OF EXAMPLE EMBODIMENT(S)
The following example embodiments are described and illustrated in conjunction with apparatuses, methods, and systems which are meant to be examples and illustrative, not limiting in scope.
A. ccNUMA DSM System with DSM-Management Chips
A DSM system has been developed that provides cache-coherent non-uniform memory access (ccNUMA) through the use of a DSM-management chip. In a particular embodiment, a DSM system may comprise a distributed computer network of up to 16 nodes, connected by a switched fabric, where each node includes two or more Opteron CPUs and one DSM-management chip. In another embodiment, this DSM system comprises up to 256 nodes connected by the switched fabric.
The DSM system allows the creation of a multi-node virtual server, which is a virtual machine consisting of multiple CPUs belonging to two or more nodes. In some embodiments, the nodes use a connection/communication protocol to communicate with each other and with virtual I/O servers in the DSM system. Enforcement of the connection/communication protocol is also handled by the DSM-management chip. Consequently, virtual I/O servers include a DSM-management chip, though they do not contribute any physical memory to the DSM system and therefore do not make use of the chip's functionality directly related to cache coherence, in particular embodiments. For a further description of a virtual I/O server, see U.S. patent application Ser. No. 11/624,542, entitled “Virtualized Access to I/O Subsystems”, and U.S. patent application Ser. No. 11/624,573, entitled “Virtual Input/Output Server”, both filed on Jan. 18, 2007, which are incorporated herein by reference for all purposes. As explained below, the connection/communication protocol uses an identifier called a logical node identifier (LNID) to identify source and destination nodes for packets that travel over the switched fabric.
FIG. 1 is a diagram showing a ccNUMA DSM system, which system might be used with a particular embodiment of the invention. In this DSM system, four nodes (labeled 101, 102, 103, and 104) are connected to each other over a switched fabric (labeled 105) such as Ethernet or InfiniBand. In turn, each of the four nodes includes two Opteron CPUs, a DSM-management chip, and memory in the form of DDR2 SDRAM (double-data-rate two synchronous dynamic random access memory). In this embodiment, each Opteron CPU includes a local main memory connected to the CPU. This DSM system provides NUMA (non-uniform memory access) since each CPU can access its own local main memory faster than it can access the other memories shown in FIG. 1.
Also as shown in FIG. 1, a block of memory has its “home” in the local main memory of one of the Opteron CPUs in node 101. That is to say, this local main memory is where the system's version of the memory block is stored, regardless of whether there are any cached copies of the block. Such cached copies are shown in the DDR2s for nodes 103 and 104. The DSM-management chip includes hardware logic (e.g., the CMM) to enforce a coherence protocol and make the DSM system cache-coherent (e.g., ccNUMA) when multiple nodes are caching copies of the same block of memory.
B. Example System Architecture of a DSM-Management Chip
FIG. 2 is a diagram showing the physical and functional components of a DSM-management chip, which chip might be used as part of a node with particular embodiments of the invention. The DSM-management chip includes interconnect functionality facilitating communications with one or more processors, which might be Opteron processors offered by Advanced Micro Devices (AMD), Inc., of Sunnyvale, Calif., in some embodiments. As FIG. 2 illustrates, the DSM-management chip includes two HyperTransport Managers (HTM), each of which manages communications to and from a processor over an HT (HyperTransport) bus. More specifically, an HTM provides the PHY and link layer functionality for a cache coherent HT interface such as Opteron's ccHT. The HTM captures all received HT packets in a set of receive queues per interface (e.g., posted/non-posted command, request command, probe command and data) which are consumed by the Coherent Memory Manager (CMM). The HTM also captures packets from the CMM in a similar set of transmit queues per interface and transmits those packets on the HT interface. As a result of the two HTMs, the DSM-management chip becomes a coherent agent with respect to any bus snoops broadcast over the cache-coherent HT bus by a processor's memory controller. Of course, other inter-chip or bus communications protocols might be used in other embodiments of the present invention.
Also as shown in FIG. 2, the two HTMs are connected to a Coherent Memory Manager (CMM), which enforces a coherence protocol and thereby provides cache-coherent access to memory shared by the nodes that are part of the DSM fabric. In addition to interfacing with the Opteron processors through the HTM, the CMM interfaces with the fabric via the RDM (Reliable Delivery Manager). Additionally, the CMM provides interfaces to the HTM for DMA (Direct Memory Access) and configuration.
In some embodiments, the CMM behaves like both a processor cache on a cache-coherent (e.g., ccHT) bus and a memory controller on a cache-coherent (e.g., ccHT) bus, depending on the scenario. In particular, when a processor on a node performs an access to a home (or local) memory address, the home (or local) memory will generate a probe request that is used to snoop the caches of all the processors on the node. The CMM will use this probe to determine if it has exported the block of memory containing that address to another node and may generate DSM probes (over the fabric) to respond appropriately to the initial probe. In this scenario, the CMM behaves like a processor cache on the cache-coherent bus.
When a processor on a node performs an access to a remote memory, the processor will direct this access to the CMM. The CMM will examine the request and satisfy it from the local cache, if possible, and, in the process, generate any appropriate probes. If the request cannot be satisfied from the local cache, the CMM will send a DSM request to the remote memory's home node to (a) fetch the block of memory that contains the requested data or (b) request a state upgrade. In this case, the CMM will wait for the DSM response before it responds back to the processor. In this scenario, the CMM behaves like a memory controller on the ccHT bus.
The RDM manages the flow of packets across the DSM-management chip's two fabric interface ports. The RDM has two major clients, the CMM and the DMA Manager (DMM), which initiate packets to be transmitted and consume received packets. The RDM ensures reliable end-to-end delivery of packets using a connection/communication protocol called Reliable Delivery Protocol (RDP). On the fabric side, the RDM interfaces to the selected link/MAC (XGM for Ethernet, IBL for InfiniBand) for each of the two fabric ports. In particular embodiments, the fabric might connect nodes to other nodes. In other embodiments, the fabric might also connect nodes to virtual I/O servers. In particular embodiments, the processes using LNIDs described below might be executed by the RDM.
The XGM provides a 10G Ethernet MAC function, which includes framing, inter-frame gap handling, padding for minimum frame size, Ethernet FCS (CRC) generation and checking, and flow control using PAUSE frames. The XGM supports two link speeds: single data rate XAUI (10 Gbps) and double data rate XAUI (20 Gbps). In particular embodiments, the DSM-management chip has two instances of the XGM, one for each fabric port. Each XGM instance interfaces to the RDM, on one side, and to the associated PCS, on the other side.
The IBL provides a standard 4-lane IB link layer function, which includes link initialization, link state machine, CRC generation and checking, and flow control. The IBL block supports two link speeds, single data rate (8 Gbps) and double data rate (16 Gbps), with automatic speed negotiation. In particular embodiments, the DSM-management chip has two instances of the IBL, one for each fabric port. Each IBL instance interfaces to the RDM, on one side, and to the associated Physical Coding Sub-layer (PCS), on the other side.
The PCS, along with an associated quad-serdes, provides physical layer functionality for a 4-lane InfiniBand SDR/DDR interface, or a 10G/20G Ethernet XAUI/10GBase-CX4 interface. In particular embodiments, the DSM-management chip has two instances of the PCS, one for each fabric port. Each PCS instance interfaces to the associated IBL and XGM.
The DMM shown in FIG. 2 manages and executes direct memory access (DMA) operations over RDP, interfacing to the CMM block on the host side and the RDM block on the fabric side. For DMA, the DMM interfaces to software through the DmaCB table in memory and the on-chip DMA execution and completion queues. The DMM also handles the sending and receiving of RDP interrupt messages and non-RDP packets, and manages the associated inbound and outbound queues.
The DDR2 SDRAM Controller (SDC) attaches to one or two external 240-pin DDR2 SDRAM DIMMs, which are external to the DSM-management chip, as shown in both FIG. 1 and FIG. 2. In particular embodiments, the SDC provides SDRAM access for the CMM and the DMM.
In some embodiments, the DSM-management chip might comprise an application specific integrated circuit (ASIC), whereas in other embodiments the chip might comprise a field-programmable gate array (FPGA). Indeed, the logic encoded in the chip could be implemented in software for DSM systems whose requirements might allow for longer latencies with respect to cache coherence, DMA, interrupts, etc.
C. RDP Packets and Their Headers
FIG. 3 is a diagram showing the format of a packet for RDP over Ethernet and the packet's header, which formats might be used in some embodiments of the present invention. When RDP runs over the Ethernet MAC layer, an RDP packet is encapsulated in an Ethernet MAC frame. The Ethernet header of an encapsulated RDP packet is a VLAN-tagged header (where VLAN stands for virtual local area network). In FIG. 3, SA identifies the 6-byte source MAC address and DA identifies the 6-byte destination MAC address.
The Reliable Delivery Protocol allows RDP and non-RDP packets to co-exist on the same fabric. When RDP runs over the Ethernet MAC layer, RDP and non-RDP packets are distinguished from each other by the presence of the VLAN header and the value of the Length/Type field following it. For an RDP packet: (a) the VLAN header is present, i.e., the first Length/Type field (following the last SA byte) has a value of 0x0081; and (b) the second Length/Type field (following the VLAN header) has a value less than 1536 (frame length). An Ethernet frame that does not satisfy both of the above conditions is a non-RDP packet.
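For illustration, a minimal C sketch of this two-part test is shown below; the struct and function names are hypothetical, and the two Length/Type words are assumed to have been extracted from the frame elsewhere, in the byte order implied by the 0x0081 value above.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two Length/Type words of a VLAN-tagged Ethernet header, assumed
 * to be extracted elsewhere. */
struct eth_len_type_fields {
    uint16_t len_type1;   /* word following the last SA byte */
    uint16_t len_type2;   /* word following the VLAN header  */
};

/* Both conditions must hold for the frame to carry an RDP packet. */
static bool is_rdp_over_ethernet(const struct eth_len_type_fields *h)
{
    return h->len_type1 == 0x0081 &&   /* VLAN header present        */
           h->len_type2 < 1536;        /* a frame length, not a type */
}
```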
FIG. 4 is a diagram showing the format of a packet for RDP over InfiniBand and the packet's header, which formats might be used in some embodiments of the present invention. It will be appreciated that the header includes fields for Source Local ID and Destination Local ID. When RDP runs over the IB link layer, an RDP packet is encapsulated into an IB packet. The format of an IB Local Transport Packet is used, although the 12-byte Base Transport Header (BTH) which is normally present after the Local Route Header (LRH) is replaced by the RDP header (8 bytes) and the first 4 bytes of the RDP payload. From the standpoint of the IB standard, bits 31:24 of the first DWORD of the RDP Header is the OpCode field of Base Transport Header (BTH). The most significant two bits (31:30) of that field have a fixed value of 0x3 (binary 11) for RDP packets, which specifies a ‘Manufacturer Specific OpCode’. The Rsv8 field of the BTH (bits 31:24 of the second DWORD) is not protected by the 32-bit IB Invariant CRC (ICRC). This corresponds to the most significant 8 bits of the DstLNID. Thus, these bits do not have end-to-end protection but do have point-to-point protection by the 16-bit Variant CRC (VCRC), which presents an insignificant risk of failure since the DstLNID is only used as a packet validation field at the destination node in conjunction with many other validation fields. A false match of a corrupted LNID MSB (most significant bit) with good VCRC has very low probability and would only occur if the connection parameters were set up inconsistently at the source and destination nodes.
When RDP runs over the InfiniBand link layer, RDP and non-RDP packets are distinguished by the values of the LNH field in the IB Local Route Header and the OpCode field in the IB Base Transport Header. For an RDP packet: (a) LNH=0x2 (IBA Local); and (b) OpCode bits [7:6]=0x3 (Manufacturer Specific OpCode). An InfiniBand packet that does not satisfy both of the above conditions is a non-RDP packet.
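A corresponding sketch for the InfiniBand case, again with hypothetical names, assuming the LNH and OpCode fields have already been pulled from the IB Local Route Header and Base Transport Header:

```c
#include <stdbool.h>
#include <stdint.h>

/* lnh is the LNH field of the IB Local Route Header; opcode is the
 * OpCode byte of the IB Base Transport Header. */
static bool is_rdp_over_infiniband(uint8_t lnh, uint8_t opcode)
{
    return lnh == 0x2 &&           /* IBA Local                    */
           (opcode >> 6) == 0x3;   /* Manufacturer Specific OpCode */
}
```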
FIG. 5 is a diagram showing the format of an RDP packet and its header, which formats might be used in some embodiments of the present invention. An RDP packet consists of a header, payload, and optional trailer. As shown in FIG. 5, one of the fields in an RDP packet is the DestLNID (Destination Logical Node ID), which identifies the packet's destination node. This is the connection identifier (i.e., remote LNID) at the source node. This field is 16 bits wide. Another field in the RDP packet is the SrcLNID (Source Logical Node ID), which identifies the packet's source node. This is the connection identifier (i.e., remote LNID) at the destination node. This field is also 16 bits wide.
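For reference, the two LNID fields described above can be pictured as a C struct; this is only a sketch of the fields named in the text, with the rest of the header omitted.

```c
#include <stdint.h>

/* Only the two LNID fields discussed in the text are shown; the RDP
 * header carries other fields (and an optional trailer) not listed here. */
struct rdp_header_fields {
    uint16_t dest_lnid;  /* DestLNID: how the source node refers to the destination */
    uint16_t src_lnid;   /* SrcLNID: how the destination node refers to the source  */
    /* ... remaining RDP header fields ... */
};
```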
D. Using LNIDs with RDP
In particular embodiments, the DSM system uses a software data structure called the connection control block (CCB), stored in local memory such as the local main memory shown in FIG. 1, to facilitate implementation of the RDP protocol. The RDM uses a received packet's source LNID as an index into the CCB to find an entry for the connection corresponding to the packet. FIG. 6 is a diagram showing the format of a CCB entry for a single connection, which format might be used in some embodiments of the present invention. As shown in FIG. 6, each entry records the fabric address for two paths, Path 0 and Path 1, which may correspond to the two fabric interface ports shown connected to the RDM in FIG. 2. In other embodiments, there might be more than two paths, corresponding to more than two fabric interface ports. It will be appreciated that the CCB entry has a field called MY_LNID, which identifies the LNID for the RDM's node.
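A rough C sketch of such a CCB entry follows; the field and table names are hypothetical, only the fields discussed in the text are shown, and fabric addresses are written as 6-byte Ethernet MAC addresses (an InfiniBand Local ID would be stored analogously).

```c
#include <stdint.h>

#define RDP_NUM_PATHS 2            /* Path 0 and Path 1 of FIG. 6 */

/* One CCB entry per connection. */
struct ccb_entry {
    uint8_t  remote_fabric_addr[RDP_NUM_PATHS][6];
    uint16_t my_lnid;              /* MY_LNID: the LNID for the RDM's node */
    /* ... other per-connection state ... */
};

/* The CCB lives in local memory and is indexed by a received packet's
 * SrcLNID (the remote LNID). */
extern struct ccb_entry *ccb_table;
```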
For an RDP connection between a pair of nodes, the node at each end uses an LNID to refer to the node at the other end. Within a multi-node virtual server (VS), every node is assigned a unique LNID, possibly by some management entity for the DSM system. For example, within a three-node VS, the LNID values might be 0, 1, and 2, or 1, 3, and 4, i.e., they need not be sequentially incrementing from 0. In addition, every server (multi-node virtual server or standalone server) assigns a unique LNID to each node that communicates with it. For example, a standalone server node that communicates with the virtual server described above might be assigned an LNID value of 16 by the VS. If that same node communicates with another server, it may be assigned the same LNID or a different LNID by that server. Therefore, LNID assignments are unique from the standpoint of a given server, but they are not unique across servers.
An example of LNID assignments is shown in FIG. 7. In the example, a virtual computing environment (VCE) consists of two virtual servers (A and B), an application server (C), and a virtual I/O server (D). In this example, virtual server A assigns LNID values 0, 1, and 2 to each of its own nodes (VS nodes A0, A1, and A2, respectively) and an LNID value of 16 to virtual I/O server D. Virtual server B assigns values of 1 and 5 to each of its own nodes (VS nodes B1 and B5, respectively) and an LNID value of 18 to virtual I/O server D. Application server C assigns an LNID value of 3 to virtual I/O server D. Virtual I/O server D assigns LNID values 0, 2, and 4, to VS nodes A0, A1 and A2, respectively, and LNID values of 6 and 8 to VS nodes B1 and B5. Finally, virtual I/O server D assigns a value of 10 to application server C. These various assignments are collected and summarized in Table 7.1 in FIG. 7.
Table 7.2 shows the SrcLNID and DstLNID values used in the headers of RDP packets exchanged between different node pairs. For example, VS nodes A0 and A1 both belong to virtual server A, so a packet from A0 to A1 will have a SrcLNID value of 0 (the LNID assigned to A0 by VS A) and a DstLNID value of 1 (the LNID assigned to A1 by VS A). As another example, a packet from A1 to I/O server D will have a SrcLNID value of 2 (the LNID assigned to A1 by I/O server D) and a DstLNID value of 16 (the LNID assigned by VS A to I/O server D).
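To make the direction of each assignment concrete, the following sketch records the header values for the A1-to-D example using the numbers from Table 7.1; the struct is purely illustrative.

```c
#include <stdint.h>

/* LNIDs carried in the header of a packet from VS node A1 to virtual
 * I/O server D, per Table 7.1/7.2. */
struct rdp_lnids { uint16_t src_lnid, dst_lnid; };

static const struct rdp_lnids a1_to_d = {
    .src_lnid = 2,    /* the LNID I/O server D assigned to A1   */
    .dst_lnid = 16,   /* the LNID VS A assigned to I/O server D */
};
```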
FIG. 8 is a diagram showing a flowchart of an example process for building an RDP packet for transmission over the switched fabric network, which process might be used with an embodiment of the present invention. In the process's first step 801, the node's Reliable Delivery Manager (RDM) receives a DestLNID and data for an RDP packet from the node's CMM or DMM. The RDM uses the packet's DestLNID to look up the entry corresponding to the DestLNID in the Connection Control Block (CCB), in step 802. If there is no corresponding entry, the RDM sends an error message to the CMM or DMM, as the case may be. Then in step 803, the RDM builds an RDP header for an RDP packet for the data, using the DestLNID and the CCB entry's MY_LNID value. In step 804, the RDM builds a fabric header for the RDP packet, using information in the CCB entry's remote fabric address. Once the RDP packet is complete, the RDM sends the packet to the fabric link for transmission to the remote node, in step 805.
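The following C sketch condenses steps 801-805; it reuses the rdp_header_fields and ccb_entry sketches above, the helper functions are hypothetical placeholders, and error handling is reduced to a status code.

```c
#include <stdint.h>
#include <string.h>

enum rdm_status { RDM_OK, RDM_ERR_NO_CCB_ENTRY };

struct rdp_packet {
    struct rdp_header_fields hdr;
    uint8_t  payload[4096];        /* size chosen arbitrarily for the sketch */
    uint32_t payload_len;
    /* fabric (Ethernet or InfiniBand) header omitted */
};

/* Hypothetical helpers assumed to exist elsewhere. */
struct ccb_entry *ccb_lookup(uint16_t lnid);   /* NULL if no entry */
void build_fabric_header(struct rdp_packet *p, const struct ccb_entry *e);
void fabric_transmit(const struct rdp_packet *p);

/* Steps 801-805 of FIG. 8, condensed; len is assumed to fit payload[]. */
enum rdm_status rdm_build_and_send(uint16_t dest_lnid,
                                   const uint8_t *data, uint32_t len)
{
    struct ccb_entry *e = ccb_lookup(dest_lnid);            /* step 802 */
    if (e == NULL)
        return RDM_ERR_NO_CCB_ENTRY;  /* RDM reports an error to the CMM/DMM */

    struct rdp_packet pkt = {0};
    pkt.hdr.dest_lnid = dest_lnid;                          /* step 803 */
    pkt.hdr.src_lnid  = e->my_lnid;
    memcpy(pkt.payload, data, len);
    pkt.payload_len   = len;

    build_fabric_header(&pkt, e);                           /* step 804 */
    fabric_transmit(&pkt);                                  /* step 805 */
    return RDM_OK;
}
```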
FIG. 9 is a diagram showing a flowchart of an example process for validating an RDP packet received over the switched fabric network, which process might be used with an embodiment of the present invention. In the process's first step 901, a node's RDM receives an RDP packet over the switched fabric network. The RDM then checks to see whether the packet's destination fabric address (e.g., the 6-byte MAC DA in an Ethernet header or the Destination Local ID in an Infiniband LRH) matches the node's fabric address, in step 902. If not, the RDM discards the packet. Otherwise, the RDM goes to step 903 and determines whether the packet is an RDP packet. If not, the RDM will process the packet as a non-RDP packet, in step 904. Otherwise, if the packet is an RDP packet, the RDM uses the packet's SrcLNID to look up the entry corresponding to the SrcLNID in the Connection Control Block (CCB), in step 905. If there is no corresponding entry, the RDM discards the packet. Then the RDM goes to step 906 and checks to make sure that the packet's source fabric address (e.g., the 6-byte MAC SA in an Ethernet header or the Source Local ID in an Infiniband LRH) matches the CCB entry's remote fabric address (e.g., for Path 0 or Path 1). If not, the RDM discards the packet. Otherwise, the RDM checks to determine whether the packet's DestLNID matches the CCB entry's MY_LNID, in step 907. If not, the RDM discards the packet. But if there is a match, the RDM forwards the packet to the CMM or DMM for further processing.
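A condensed C sketch of steps 901-907 follows, reusing the ccb_entry sketch above; the packet struct and helper functions are hypothetical, and fabric addresses are compared as 6-byte MAC addresses (IB Local IDs would be compared the same way).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum rx_verdict { RX_DISCARD, RX_NON_RDP, RX_DELIVER };

struct rx_packet {
    uint8_t  dst_fabric_addr[6];
    uint8_t  src_fabric_addr[6];
    uint16_t src_lnid;
    uint16_t dest_lnid;
};

/* Hypothetical helpers assumed to exist elsewhere. */
bool is_rdp(const struct rx_packet *p);          /* e.g., the checks shown earlier */
struct ccb_entry *ccb_lookup(uint16_t lnid);

/* Steps 901-907 of FIG. 9, condensed. */
enum rx_verdict rdm_validate(const struct rx_packet *p,
                             const uint8_t my_fabric_addr[6])
{
    if (memcmp(p->dst_fabric_addr, my_fabric_addr, 6) != 0)        /* step 902 */
        return RX_DISCARD;
    if (!is_rdp(p))                                                /* step 903 */
        return RX_NON_RDP;                                         /* step 904 */

    struct ccb_entry *e = ccb_lookup(p->src_lnid);                 /* step 905 */
    if (e == NULL)
        return RX_DISCARD;

    if (memcmp(p->src_fabric_addr, e->remote_fabric_addr[0], 6) != 0 &&
        memcmp(p->src_fabric_addr, e->remote_fabric_addr[1], 6) != 0)
        return RX_DISCARD;                                         /* step 906 */

    if (p->dest_lnid != e->my_lnid)                                /* step 907 */
        return RX_DISCARD;

    return RX_DELIVER;               /* forward to the CMM or DMM */
}
```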
E. Using LNIDs With Memory-Addressing Scheme
As indicated earlier, the DSM system also uses LNIDs in its memory-addressing scheme. In particular embodiments, the physical memory address width is 40 bits (e.g., in DSM systems that use the present generation of Opteron CPUs), though it will be appreciated that numerous other widths are suitable. FIG. 10 is a diagram showing the format of a 40-bit physical memory address in a 16-node DSM system and the format of a 40-bit physical memory address in a 256-node DSM system. As shown in FIG. 10, the four most significant bits comprise an LNID in the 16-node DSM system, and the eight most significant bits comprise an LNID in the 256-node DSM system.
In particular embodiments of the DSM system, the physical address space for a virtual server is arranged so that the local node's memory always starts at address 0 (zero). One reason for this arrangement, in particular embodiments, is compatibility with legacy system software: with local memory starting at address 0, system software (e.g., boot code) accesses local memory the same way that it does on a standard server. Another reason is that the arrangement simplifies the address lookup in the CMM. For a memory read/write request from a local processor, an address in the lower 1/16th or 1/256th segment of the 40-bit address space is always local, and all other addresses map to memory in other nodes.
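A short sketch of the FIG. 10 field layout and of this locality test follows; the macro and function names (LNID_OF, OFFSET_OF, addr_is_local) are invented for the example.

#include <stdint.h>
#include <stdio.h>

#define ADDR_BITS      40
#define LNID_BITS_16   4                      /* 16-node DSM system  */
#define LNID_BITS_256  8                      /* 256-node DSM system */

/* The LNID occupies the most significant bits of the 40-bit physical address. */
#define LNID_OF(addr, lnid_bits)   ((uint64_t)(addr) >> (ADDR_BITS - (lnid_bits)))
#define OFFSET_OF(addr, lnid_bits) ((uint64_t)(addr) & ((1ULL << (ADDR_BITS - (lnid_bits))) - 1))

/* In the local view, an address is local iff it falls in the lowest
 * 1/16th (or 1/256th) segment, i.e. its LNID field is zero. */
static int addr_is_local(uint64_t addr, int lnid_bits)
{
    return LNID_OF(addr, lnid_bits) == 0;
}

int main(void)
{
    uint64_t addr = ((uint64_t)2 << 36) | 0x1000;   /* node 2's segment, 16-node system */
    printf("LNID=%llu offset=0x%llx local=%d\n",
           (unsigned long long)LNID_OF(addr, LNID_BITS_16),
           (unsigned long long)OFFSET_OF(addr, LNID_BITS_16),
           addr_is_local(addr, LNID_BITS_16));      /* prints LNID=2 offset=0x1000 local=0 */
    return 0;
}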
To see how the arrangement works, consider the example of a virtual server consisting of three nodes: 0, 1, and 2. In a 16-node DSM system, the total addressable memory space for this virtual server would be 1 terabyte (2^40 bytes), and each node would be allocated a segment which is 1/16 of that space (64 GB, or 2^36 bytes). From a global view, the first 64 GB segment of the physical address space starting at address 0 would be allocated to node 0 (i.e., the node whose LNID equals 0), the next 64 GB segment to node 1, and the following segment to node 2. The remaining 13 segments would be unused, since LNIDs 3-15 are not used.
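The segment arithmetic in this example can be checked with a few lines of code; SEG_SHIFT and global_base() are illustrative names, with the 36-bit shift corresponding to the 64 GB (2^36-byte) segments of a 16-node system.

#include <stdint.h>
#include <stdio.h>

#define SEG_SHIFT 36                           /* 64 GB per node segment */

static uint64_t global_base(unsigned lnid)     /* start of node lnid's segment */
{
    return (uint64_t)lnid << SEG_SHIFT;
}

int main(void)
{
    for (unsigned lnid = 0; lnid < 3; lnid++)
        printf("node %u: 0x%010llx .. 0x%010llx\n", lnid,
               (unsigned long long)global_base(lnid),
               (unsigned long long)(global_base(lnid + 1) - 1));
    /* node 0: 0x0000000000 .. 0x0fffffffff
       node 1: 0x1000000000 .. 0x1fffffffff
       node 2: 0x2000000000 .. 0x2fffffffff */
    return 0;
}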
FIG. 11 shows this physical address space from the local view of each of the three nodes in the virtual server. The local view of node 0 would be the same as the global view and is shown in FIG. 11 under the label “Node 0”, with Local Memory (0) first, Node 1 Memory second, and Node 2 Memory third. The local view of node 1 would be as shown under the label “Node 1”, with Local Memory (1) first, Node 0 Memory second, and Node 2 Memory third. And the local view of node 2 would be as shown under the label “Node 2”, with Local Memory (2) first, Node 1 Memory second, and Node 0 Memory third.
It will be appreciated that in order to accomplish this arrangement, the locations of the local segment and the node 0 segment are swapped in the address map. And since MY_LNID, as defined above, is the LNID assigned to the local node, this is equivalent to swapping MY_LNID with LNID 0 in the address map. However, such a swapping would create confusion in the DSM system if it were applied to memory traffic leaving the node over the switched fabric. Therefore, the node's CMM reverses the swapping for traffic leaving the node.
FIG. 12 is a flowchart of an example process for altering a physical memory address, by the swapping described above, prior to transmission over a HyperTransport bus. In the process's first step 1201, a node's CMM receives a memory operation (e.g., a read, write, or probe) pertaining to a physical memory address from the RDM on the DSM-management chip. In step 1202, the CMM determines whether the four (or eight) most significant bits in the physical address are equal to: (1) the MY_LNID value for the node; or (2) zero. If so, the CMM goes to step 1203, where: (1) if those bits are equal to the MY_LNID value, the CMM sets the bits to zero (e.g., by changing to zero the four (or eight) most significant bits in the physical memory address) before transmission of the operation over the HyperTransport bus; and (2) if those bits are equal to zero, the CMM sets those bits to MY_LNID (e.g., by changing to MY_LNID the four (or eight) most significant bits in the physical memory address) before transmission of the operation over the HyperTransport bus. Otherwise, if those bits are not equal to MY_LNID or zero, the CMM goes to step 1204 and allows the memory operation to proceed without processing relating to LNID swapping.
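The conditional swap of steps 1202 through 1204 amounts to a small bit-field manipulation, sketched below for the direction from the RDM toward the local CPUs; lnid_swap() and the surrounding constants are illustrative names, with LNID_BITS set to 4 for a 16-node system (8 would be used for 256 nodes).

#include <stdint.h>
#include <stdio.h>

#define ADDR_BITS   40
#define LNID_BITS   4
#define LNID_SHIFT  (ADDR_BITS - LNID_BITS)
#define OFFSET_MASK ((1ULL << LNID_SHIFT) - 1)

/* Steps 1202-1204: if the LNID field equals MY_LNID, set it to zero; if it
 * equals zero, set it to MY_LNID; otherwise leave the address unchanged. */
static uint64_t lnid_swap(uint64_t addr, unsigned my_lnid)
{
    uint64_t lnid = addr >> LNID_SHIFT;
    if (lnid == my_lnid)
        lnid = 0;
    else if (lnid == 0)
        lnid = my_lnid;
    return (lnid << LNID_SHIFT) | (addr & OFFSET_MASK);
}

int main(void)
{
    unsigned my_lnid = 2;
    printf("0x%010llx\n", (unsigned long long)lnid_swap(0x2000001000ULL, my_lnid)); /* -> 0x0000001000 */
    printf("0x%010llx\n", (unsigned long long)lnid_swap(0x0000001000ULL, my_lnid)); /* -> 0x2000001000 */
    printf("0x%010llx\n", (unsigned long long)lnid_swap(0x1000001000ULL, my_lnid)); /* unchanged       */
    return 0;
}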
FIG. 13 is a flowchart of an example process for altering a physical memory address, by reversing the swapping as described above, prior to transmission over a switched fabric. In the process's first step 1301, a node's CMM receives a memory operation (e.g., a read, write, or probe) pertaining to a physical memory address from one of the node's CPUs over the HyperTransport (e.g., ccHT) bus that connects the node's CPUs to the node's DSM-management chip. In step 1302, the CMM determines whether the four (or eight) most significant bits in the physical address are equal to: (1) the MY_LNID value for the node; or (2) zero. If so, the CMM goes to step 1303, where: (1) if those bits are equal to the MY_LNID value, the CMM sets the DstLNID value to zero (e.g., by changing to zero the four (or eight) most significant bits in the physical memory address) before transmission of the operation to the RDM; and (2) if those bits are equal to zero, the CMM sets the DstLNID value to MY_LNID (e.g., by changing to MY_LNID the four (or eight) most significant bits in the physical memory address) before transmission of the operation to the RDM. Otherwise, if those bits are not equal to MY_LNID or zero, the CMM goes to step 1304 and allows the memory operation to proceed without processing relating to LNID swapping, provided that the physical memory address is not for exported local memory. (If the physical memory address is for exported local memory, a probe operation to another physical memory address might result, feeding back into the process at step 1301.)
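For the reverse direction of FIG. 13, the same conditional swap is applied to a CPU-generated address before the operation is handed to the RDM, and the resulting field serves as the DstLNID of the outgoing request. The sketch below models only that selection; outbound_dst_lnid() is an invented name, and the exported-local-memory case of step 1304 is not modeled.

#include <stdint.h>
#include <stdio.h>

#define LNID_SHIFT 36                    /* 40-bit address, 16-node system */

/* Steps 1302-1304, returning the DstLNID for the outgoing request. */
static unsigned outbound_dst_lnid(uint64_t cpu_addr, unsigned my_lnid)
{
    unsigned lnid = (unsigned)(cpu_addr >> LNID_SHIFT);
    if (lnid == my_lnid)
        return 0;              /* local segment was swapped with node 0's segment */
    if (lnid == 0)
        return my_lnid;
    return lnid;               /* no LNID swapping needed */
}

int main(void)
{
    /* With MY_LNID = 2: an address in segment 2 targets node 0, an address in
       segment 0 targets node 2 (this node's exported memory), and an address
       in segment 1 targets node 1 unchanged. */
    printf("%u %u %u\n",
           outbound_dst_lnid(0x2000000000ULL, 2),
           outbound_dst_lnid(0x0000000000ULL, 2),
           outbound_dst_lnid(0x1000000000ULL, 2));   /* prints 0 2 1 */
    return 0;
}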
Particular embodiments of the above-described processes might comprise instructions that are stored on storage media. The instructions might be retrieved and executed by a processing system. The instructions are operational when executed by the processing system to direct the processing system to operate in accord with the present invention. Some examples of instructions are software, program code, firmware, and microcode. Some examples of storage media are memory devices, tape, disks, integrated circuits, and servers. The term “processing system” refers to a single processing device or a group of inter-operational processing devices. Some examples of processing devices are integrated circuits and logic circuitry. Those skilled in the art are familiar with instructions, storage media, and processing systems.
Those skilled in the art will appreciate variations of the above-described embodiments that fall within the scope of the invention. In this regard, it will be appreciated that there are many other possible orderings of the steps in the processes described above and many other possible modularizations of those orderings. Also, it will be appreciated that the above processes relating to memory addressing will work with physical memory addresses that exceed 40 bits in width and with DSM systems that have more than 256 nodes. Further, it will be appreciated that the DSM system will work with nodes whose CPUs are not Opterons having a ccHT bus. As a result, the invention is not limited to the specific examples and illustrations discussed above, but only by the following claims and their equivalents.

Claims (23)

What is claimed is:
1. A method, comprising:
receiving, at a distributed memory logic circuit of a first node, data for a packet destined to a distributed memory logic circuit of a second node, wherein the first and second nodes are connected by a network switch fabric and are parts of a distributed shared memory system, and wherein the data for the packet includes a physical memory address in which one or more bits in the physical memory address comprise a destination logical node identifier for the second node;
using the destination logical node identifier as an index into a connection control block to locate an entry for a connection between the first and second nodes, resulting in a located entry of the connection control block, wherein the connection control block is stored in a local memory on the first node;
building the packet in a format of a connection and communication protocol using the data, the destination logical node identifier, and a logical node identifier for the first node, wherein the logical node identifier for the first node is included in the located entry of the connection control block;
adding, to the packet, a header that includes a switch fabric address for the second node, wherein the switch fabric address is identified in the located entry of the connection control block; and
transmitting the packet on a link to the switch fabric.
2. A method as in claim 1, wherein the distributed shared memory system is a cache coherent non-uniform memory access system.
3. A method as in claim 1, wherein the distributed memory logic circuit in the first node sets the destination logical node identifier to zero if the destination logical node identifier in the physical memory address equals the logical node identifier for the first node.
4. A method, comprising:
receiving, at a distributed memory logic circuit of a first node, a packet from a distributed memory logic circuit of a second node, wherein the packet includes a source logical node identifier and wherein the first and second nodes are connected by a network switch fabric and are parts of a distributed shared memory system;
determining whether a destination switch fabric address included in the packet matches a switch fabric address for the first node;
using the source logical node identifier as an index into a connection control block to locate an entry for a connection between the first and second nodes, resulting in a located entry of the connection control block, wherein the connection control block is stored in a local memory on the first node;
determining whether a destination logical node identifier included in the packet matches a logical node identifier for the first node, wherein the logical node identifier for the first node is identified in the located entry of the connection control block; and
accepting data in the packet for further processing by the first node.
5. The method of claim 4, wherein the packet is discarded if the destination switch fabric address included in the packet does not match the switch fabric address for the first node.
6. The method of claim 4, wherein the packet is discarded if the destination logical node identifier does not match the logical node identifier for the first node identified in the located entry of the connection control block.
7. The method of claim 4, wherein the distributed shared memory system is a cache coherent non-uniform memory access system.
8. A distributed memory logic circuit encoded with executable logic, the logic when executed operable to:
receive, at the distributed memory logic circuit of a first node, data for a packet destined to a distributed memory logic circuit of a second node, wherein the first and second nodes are connected by a network switch fabric and are parts of a distributed shared memory system, and wherein the data for the packet includes a physical memory address in which one or more bits in the physical memory address comprise a destination logical node identifier for the second node;
use the destination logical node identifier as an index into a connection control block to locate an entry for a connection between the first and second nodes, resulting in a located entry of the connection control block, wherein the connection control block is stored in a local memory on the first node;
build the packet in a format of a connection and communication protocol using the data, the destination logical node identifier, and a logical node identifier for the first node, wherein the logical node identifier for the first node is included in the located entry of the connection control block;
add, to the packet, a header that includes a switch fabric address for the second node, wherein the switch fabric address is identified in the located entry of the connection control block; and
transmit the packet on a link to the switch fabric.
9. The distributed memory logic circuit of claim 8, wherein the distributed shared memory system is a cache coherent non-uniform memory access system.
10. The distributed memory logic circuit of claim 8, wherein the logic is further operable to set the destination logical node identifier to zero if the destination logical node identifier in the physical memory address equals the logical node identifier for the first node.
11. A distributed memory logic circuit encoded with executable logic, the logic when executed operable to:
receive, at the distributed memory logic circuit of a first node, a packet from a distributed memory logic circuit of a second node, wherein the packet includes a source logical node identifier and wherein the first and second nodes are connected by a network switch fabric and are parts of a distributed shared memory system;
determine whether a destination switch fabric address included in the packet matches a switch fabric address for the first node;
use the source logical node identifier as an index into a connection control block to locate an entry for a connection between the first and second nodes, resulting in a located entry of the connection control block, wherein the connection control block is stored in a local memory on the first node;
determine whether a destination logical node identifier included in the packet matches a logical node identifier for the first node, wherein the logical node identifier for the first node is identified in the located entry of the connection control block; and
accept data in the packet for further processing by the first node.
12. The distributed memory logic circuit of claim 11, wherein the packet is discarded if the destination switch fabric address included in the packet does not match the switch fabric address for the first node.
13. The distributed memory logic circuit of claim 11, wherein the packet is discarded if the destination logical node identifier does not match the logical node identifier for the first node identified in the located entry of the connection control block.
14. The distributed memory logic circuit of claim 11, wherein the distributed shared memory system is a cache coherent non-uniform memory access system.
15. A distributed shared memory system comprising:
a network switch fabric;
two or more nodes in a distributed shared memory system connected by the network switch fabric, each of the two or more nodes comprising:
one or more processors;
local memory; and
a distributed memory logic circuit,
wherein the distributed memory logic circuit is encoded with executable logic, the logic when executed operable to:
receive, at the distributed memory logic circuit of a local node, data for a packet destined to a distributed memory logic circuit of a remote node of the two or more nodes in the distributed shared memory system, wherein the data for the packet includes a physical memory address in which one or more bits in the physical memory address comprise a destination logical node identifier for the remote node,
use the destination logical node identifier as an index into a connection control block to locate an entry for a connection between the local node and the remote node, resulting in a located entry of the connection control block, wherein the connection control block is stored in local memory on the local node,
build the packet in a format of a connection and communication protocol using the data, the destination logical node identifier, and a logical node identifier for the local node, wherein the logical node identifier for the local node is included in the located entry of the connection control block,
add, to the packet, a header that includes a switch fabric address for the remote node, wherein the switch fabric address is identified in the located entry of the connection control block,
transmit the packet on a link to the network switch fabric,
receive, at the distributed memory logic circuit of the local node, a second packet from a distributed memory logic circuit of the remote node or another remote node of the two or more nodes in the distributed shared memory system, wherein the second packet includes a source logical node identifier,
determine whether a destination switch fabric address included in the second packet matches a switch fabric address for the local node,
use the source logical node identifier as an index into the connection control block to locate an entry for a connection between the local and remote node, resulting in a second located entry of the connection control block,
determine whether a destination logical node identifier included in the second packet matches the logical node identifier for the local node, wherein the logical node identifier for the local node is identified in the second located entry of the connection control block, and
accept data in the packet for further processing by the local node.
16. A method comprising:
receiving, at a first node in a distributed shared memory system, a message from a second node in the distributed shared memory system, the distributed shared memory system comprising a plurality of interconnected nodes each having a unique logical node identifier, wherein the message indicates a memory operation related to a local memory of the first node and identifies a memory address;
if a first plurality of contiguous bits of the memory address equal a logical node identifier of the first node, changing the first plurality of contiguous bits to a predetermined value;
if the first plurality of contiguous bits of the memory address equal the predetermined value, changing the first plurality of contiguous bits to the logical node identifier of the first node; and
forwarding the message to a processor of the first node for processing.
17. The method of claim 16, wherein the predetermined value is zero.
18. The method of claim 16, wherein each node of the plurality of interconnected nodes internally accesses a respective local memory having memory addresses with a first plurality of contiguous bits set to the predetermined value.
19. The method of claim 16, wherein a given node of the plurality of interconnected nodes accesses a local memory of another node of the plurality of interconnected nodes that has a logical node identifier equal to the predetermined value using the given node's own respective logical node identifier for the another node.
20. The method of claim 16, wherein the memory operation is one of a read command, a write command, or a probe.
21. A method comprising:
receiving, at a first node in a distributed shared memory system, a message from a processor of the first node identifying a memory operation related to a local memory of a second node in the distributed shared memory system, the distributed shared memory system comprising a plurality of nodes each having a unique logical node identifier, the plurality of nodes being interconnected by a switch fabric, wherein the message identifies a memory address;
if a first plurality of contiguous bits of the memory address equal a logical node identifier of the first node, changing the first plurality of contiguous bits to a predetermined value;
if the first plurality of contiguous bits of the memory address equal the predetermined value, changing the first plurality of contiguous bits to the logical node identifier of the first node; and
forwarding the message to the second node for processing.
22. A distributed shared memory system, comprising:
a network switch fabric; and
a plurality of nodes interconnected by the network switch fabric, each given node of the plurality of nodes comprising:
a logical node identifier of a plurality of contiguous bits;
a local memory;
a distributed shared memory management chip operative to share the local memory of the given node with others of the plurality of nodes in the distributed shared memory system to create a shared memory accessible using binary addresses comprising a plurality of bits, wherein a set of contiguous most-significant bits of the binary addresses collectively represent a logical node identifier of a node of the plurality of nodes; and
one or more processors each operative to access the local memory of the given node, the local memory accessed using binary addresses having the set of contiguous most-significant bits collectively set to a predetermined value,
wherein the distributed shared memory management chip is further operative to map the predetermined value to the logical node identifier of the given node in memory management traffic transmitted between the plurality of nodes that include one or more binary addresses of the shared memory.
23. The distributed shared memory system of claim 22, wherein the distributed shared memory management chip of each node of the plurality of nodes is further operative to:
if the set of contiguous most-significant bits of a given binary address equal the logical node identifier of the given node, change the set of contiguous most-significant bits of the given binary address to the predetermined value; and
if the set of contiguous most-significant bits of the given binary address equal the predetermined value, change the set of contiguous most-significant bits of the given binary address to the logical node identifier of the given node.
US13/468,751 2007-04-26 2012-05-10 Node identification for distributed shared memory system Active 2028-11-11 USRE44610E1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/468,751 USRE44610E1 (en) 2007-04-26 2012-05-10 Node identification for distributed shared memory system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/740,432 US7715400B1 (en) 2007-04-26 2007-04-26 Node identification for distributed shared memory system
US13/468,751 USRE44610E1 (en) 2007-04-26 2012-05-10 Node identification for distributed shared memory system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/740,432 Reissue US7715400B1 (en) 2007-04-26 2007-04-26 Node identification for distributed shared memory system

Publications (1)

Publication Number Publication Date
USRE44610E1 2013-11-26

Family

ID=42139367

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/740,432 Ceased US7715400B1 (en) 2007-04-26 2007-04-26 Node identification for distributed shared memory system
US12/755,113 Abandoned US20110004733A1 (en) 2007-04-26 2010-04-06 Node Identification for Distributed Shared Memory System
US13/468,751 Active 2028-11-11 USRE44610E1 (en) 2007-04-26 2012-05-10 Node identification for distributed shared memory system

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US11/740,432 Ceased US7715400B1 (en) 2007-04-26 2007-04-26 Node identification for distributed shared memory system
US12/755,113 Abandoned US20110004733A1 (en) 2007-04-26 2010-04-06 Node Identification for Distributed Shared Memory System

Country Status (1)

Country Link
US (3) US7715400B1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161879A1 (en) * 2008-12-18 2010-06-24 Lsi Corporation Efficient and Secure Main Memory Sharing Across Multiple Processors
US8825863B2 (en) * 2011-09-20 2014-09-02 International Business Machines Corporation Virtual machine placement within a server farm
WO2013103339A1 (en) * 2012-01-04 2013-07-11 Intel Corporation Bimodal functionality between coherent link and memory expansion
JP6221792B2 (en) * 2014-02-05 2017-11-01 富士通株式会社 Information processing apparatus, information processing system, and information processing system control method
US10223268B2 (en) * 2016-02-23 2019-03-05 International Business Systems Corporation Transactional memory system including cache versioning architecture to implement nested transactions
US20170371783A1 (en) * 2016-06-24 2017-12-28 Qualcomm Incorporated Self-aware, peer-to-peer cache transfers between local, shared cache memories in a multi-processor system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774731A (en) * 1995-03-22 1998-06-30 Hitachi, Ltd. Exclusive control method with each node controlling issue of an exclusive use request to a shared resource, a computer system therefor and a computer system with a circuit for detecting writing of an event flag into a shared main storage
US6160814A (en) * 1997-05-31 2000-12-12 Texas Instruments Incorporated Distributed shared-memory packet switch
US20010037435A1 (en) * 2000-05-31 2001-11-01 Van Doren Stephen R. Distributed address mapping and routing table mechanism that supports flexible configuration and partitioning in a modular switch-based, shared-memory multiprocessor computer system
US20030076831A1 (en) * 2000-05-31 2003-04-24 Van Doren Stephen R. Mechanism for packet component merging and channel assignment, and packet decomposition and channel reassignment in a multiprocessor system
US20040030763A1 (en) * 2002-08-08 2004-02-12 Manter Venitha L. Method for implementing vendor-specific mangement in an inifiniband device
US6757790B2 (en) * 2002-02-19 2004-06-29 Emc Corporation Distributed, scalable data storage facility with cache memory
US20040148472A1 (en) * 2001-06-11 2004-07-29 Barroso Luiz A. Multiprocessor cache coherence system and method in which processor nodes and input/output nodes are equal participants
US6877030B2 (en) * 2002-02-28 2005-04-05 Silicon Graphics, Inc. Method and system for cache coherence in DSM multiprocessor system without growth of the sharing vector
US6922766B2 (en) * 2002-09-04 2005-07-26 Cray Inc. Remote translation mechanism for a multi-node system

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11467883B2 (en) 2004-03-13 2022-10-11 Iii Holdings 12, Llc Co-allocating a reservation spanning different compute resources types
US11960937B2 (en) 2004-03-13 2024-04-16 Iii Holdings 12, Llc System and method for an optimizing reservation in time of compute resources based on prioritization function and reservation policy parameter
US11652706B2 (en) 2004-06-18 2023-05-16 Iii Holdings 12, Llc System and method for providing dynamic provisioning within a compute environment
US11630704B2 (en) 2004-08-20 2023-04-18 Iii Holdings 12, Llc System and method for a workload management and scheduling module to manage access to a compute environment according to local and non-local user identity information
US11886915B2 (en) 2004-11-08 2024-01-30 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11656907B2 (en) 2004-11-08 2023-05-23 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11537435B2 (en) 2004-11-08 2022-12-27 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11861404B2 (en) 2004-11-08 2024-01-02 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11494235B2 (en) 2004-11-08 2022-11-08 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11709709B2 (en) 2004-11-08 2023-07-25 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11762694B2 (en) 2004-11-08 2023-09-19 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11537434B2 (en) 2004-11-08 2022-12-27 Iii Holdings 12, Llc System and method of providing system jobs within a compute environment
US11658916B2 (en) 2005-03-16 2023-05-23 Iii Holdings 12, Llc Simple integration of an on-demand compute environment
US11765101B2 (en) 2005-04-07 2023-09-19 Iii Holdings 12, Llc On-demand access to compute resources
US11533274B2 (en) 2005-04-07 2022-12-20 Iii Holdings 12, Llc On-demand access to compute resources
US11831564B2 (en) 2005-04-07 2023-11-28 Iii Holdings 12, Llc On-demand access to compute resources
US11522811B2 (en) 2005-04-07 2022-12-06 Iii Holdings 12, Llc On-demand access to compute resources
US11496415B2 (en) 2005-04-07 2022-11-08 Iii Holdings 12, Llc On-demand access to compute resources
US11650857B2 (en) 2006-03-16 2023-05-16 Iii Holdings 12, Llc System and method for managing a hybrid computer environment
US11522952B2 (en) 2007-09-24 2022-12-06 The Research Foundation For The State University Of New York Automatic clustering for self-organizing grids
US9465771B2 (en) 2009-09-24 2016-10-11 Iii Holdings 2, Llc Server on a chip and node cards comprising one or more of same
US9680770B2 (en) 2009-10-30 2017-06-13 Iii Holdings 2, Llc System and method for using a multi-protocol fabric module across a distributed server interconnect fabric
US11720290B2 (en) 2009-10-30 2023-08-08 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US10140245B2 (en) 2009-10-30 2018-11-27 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US10135731B2 (en) 2009-10-30 2018-11-20 Iii Holdings 2, Llc Remote memory access functionality in a cluster of data processing nodes
US10050970B2 (en) 2009-10-30 2018-08-14 Iii Holdings 2, Llc System and method for data center security enhancements leveraging server SOCs or server fabrics
US9054990B2 (en) 2009-10-30 2015-06-09 Iii Holdings 2, Llc System and method for data center security enhancements leveraging server SOCs or server fabrics
US9929976B2 (en) 2009-10-30 2018-03-27 Iii Holdings 2, Llc System and method for data center security enhancements leveraging managed server SOCs
US9876735B2 (en) 2009-10-30 2018-01-23 Iii Holdings 2, Llc Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect
US9866477B2 (en) 2009-10-30 2018-01-09 Iii Holdings 2, Llc System and method for high-performance, low-power data center interconnect fabric
US10877695B2 (en) 2009-10-30 2020-12-29 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US11526304B2 (en) 2009-10-30 2022-12-13 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US9749326B2 (en) 2009-10-30 2017-08-29 Iii Holdings 2, Llc System and method for data center security enhancements leveraging server SOCs or server fabrics
US9509552B2 (en) 2009-10-30 2016-11-29 Iii Holdings 2, Llc System and method for data center security enhancements leveraging server SOCs or server fabrics
US9479463B2 (en) 2009-10-30 2016-10-25 Iii Holdings 2, Llc System and method for data center security enhancements leveraging managed server SOCs
US9077654B2 (en) 2009-10-30 2015-07-07 Iii Holdings 2, Llc System and method for data center security enhancements leveraging managed server SOCs
US10021806B2 (en) 2011-10-28 2018-07-10 Iii Holdings 2, Llc System and method for flexible storage and networking provisioning in large scalable processor installations
US9585281B2 (en) 2011-10-28 2017-02-28 Iii Holdings 2, Llc System and method for flexible storage and networking provisioning in large scalable processor installations
US9792249B2 (en) 2011-10-31 2017-10-17 Iii Holdings 2, Llc Node card utilizing a same connector to communicate pluralities of signals
US9069929B2 (en) 2011-10-31 2015-06-30 Iii Holdings 2, Llc Arbitrating usage of serial port in node card of scalable and modular servers
US9092594B2 (en) 2011-10-31 2015-07-28 Iii Holdings 2, Llc Node card management in a modular and large scalable server system
US9965442B2 (en) 2011-10-31 2018-05-08 Iii Holdings 2, Llc Node card management in a modular and large scalable server system
US10623479B2 (en) 2012-08-23 2020-04-14 TidalScale, Inc. Selective migration of resources or remapping of virtual processors to provide access to resources
US10187452B2 (en) 2012-08-23 2019-01-22 TidalScale, Inc. Hierarchical dynamic scheduling
US10205772B2 (en) 2012-08-23 2019-02-12 TidalScale, Inc. Saving and resuming continuation on a physical processor after virtual processor stalls
US11159605B2 (en) 2012-08-23 2021-10-26 TidalScale, Inc. Hierarchical dynamic scheduling
US10645150B2 (en) 2012-08-23 2020-05-05 TidalScale, Inc. Hierarchical dynamic scheduling
US9648102B1 (en) 2012-12-27 2017-05-09 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US11240334B2 (en) 2015-10-01 2022-02-01 TidalScale, Inc. Network attached memory using selective resource migration
US10620992B2 (en) 2016-08-29 2020-04-14 TidalScale, Inc. Resource migration negotiation
US10783000B2 (en) 2016-08-29 2020-09-22 TidalScale, Inc. Associating working sets and threads
US11513836B2 (en) 2016-08-29 2022-11-29 TidalScale, Inc. Scheduling resuming of ready to run virtual processors in a distributed system
US10579421B2 (en) 2016-08-29 2020-03-03 TidalScale, Inc. Dynamic scheduling of virtual processors in a distributed system
US10353736B2 (en) 2016-08-29 2019-07-16 TidalScale, Inc. Associating working sets and threads
US11403135B2 (en) 2016-08-29 2022-08-02 TidalScale, Inc. Resource migration negotiation
US11023135B2 (en) 2017-06-27 2021-06-01 TidalScale, Inc. Handling frequently accessed pages
US11803306B2 (en) 2017-06-27 2023-10-31 Hewlett Packard Enterprise Development Lp Handling frequently accessed pages
US10579274B2 (en) 2017-06-27 2020-03-03 TidalScale, Inc. Hierarchical stalling strategies for handling stalling events in a virtualized environment
US11449233B2 (en) 2017-06-27 2022-09-20 TidalScale, Inc. Hierarchical stalling strategies for handling stalling events in a virtualized environment
US10817347B2 (en) 2017-08-31 2020-10-27 TidalScale, Inc. Entanglement of pages and guest threads
US11907768B2 (en) 2017-08-31 2024-02-20 Hewlett Packard Enterprise Development Lp Entanglement of pages and guest threads
US11656878B2 (en) 2017-11-14 2023-05-23 Hewlett Packard Enterprise Development Lp Fast boot
US11175927B2 (en) 2017-11-14 2021-11-16 TidalScale, Inc. Fast boot

Also Published As

Publication number Publication date
US7715400B1 (en) 2010-05-11
US20110004733A1 (en) 2011-01-06

Similar Documents

Publication Publication Date Title
USRE44610E1 (en) Node identification for distributed shared memory system
US11593291B2 (en) Methods and apparatus for high-speed data bus connection and fabric management
US9996491B2 (en) Network interface controller with direct connection to host memory
US7941613B2 (en) Shared memory architecture
US9304896B2 (en) Remote memory ring buffers in a cluster of data processing nodes
KR100555394B1 (en) Methodology and mechanism for remote key validation for ngio/infiniband applications
EP3042297B1 (en) Universal pci express port
US9025495B1 (en) Flexible routing engine for a PCI express switch and method of use
US20110004732A1 (en) DMA in Distributed Shared Memory System
US6421746B1 (en) Method of data and interrupt posting for computer devices
WO2022001417A1 (en) Data transmission method, processor system, and memory access system
US11829309B2 (en) Data forwarding chip and server
US10936048B2 (en) System, apparatus and method for bulk register accesses in a processor
US20230393997A1 (en) Composable infrastructure enabled by heterogeneous architecture, delivered by cxl based cached switch soc and extensible via cxloverethernet (coe) protocols
US11003607B2 (en) NVMF storage to NIC card coupling over a dedicated bus
JP2017537404A (en) Memory access method, switch, and multiprocessor system
KR20140113439A (en) Apparatus, system and method for providing access to a device function
US20050080941A1 (en) Distributed copies of configuration information using token ring
US20120324078A1 (en) Apparatus and method for sharing i/o device
US11765037B2 (en) Method and system for facilitating high availability in a multi-fabric system
US7549091B2 (en) Hypertransport exception detection and processing
KR20050080704A (en) Apparatus and method of inter processor communication
US20220300442A1 (en) Peripheral component interconnect express device and method of operating the same
US20090006712A1 (en) Data ordering in a multi-node system
US6298409B1 (en) System for data and interrupt posting for computer devices

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTELLECTUAL VENTURES HOLDING 80 LLC, NEVADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THE FLORIDA STATE UNIVERSITY FOUNDATION, INCORPORATED;REEL/FRAME:028347/0991

Effective date: 20111014

AS Assignment

Owner name: INTELLECTUAL VENTURES FUND 81 LLC, NEVADA

Free format text: MERGER;ASSIGNOR:INTELLECTUAL VENTURES HOLDING 80 LLC;REEL/FRAME:037575/0812

Effective date: 20150827

AS Assignment

Owner name: INTELLECTUAL VENTURES HOLDING 81 LLC, NEVADA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE NAME PREVIOUSLY RECORDED AT REEL: 037575 FRAME: 0812. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER;ASSIGNOR:INTELLECTUAL VENTURES HOLDING 80 LLC;REEL/FRAME:038516/0869

Effective date: 20150827

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552)

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12