US20160301610A1 - Interconnect congestion control in a storage grid

Interconnect congestion control in a storage grid

Info

Publication number
US20160301610A1
Authority
US
United States
Prior art keywords
storage
message
watermarked
nodes
grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/682,573
Other versions
US9876698B2 (en)
Inventor
Jonathan Amit
Zah BARZIK
Vladislav DROUKER
Maxim KALAEV
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US14/682,573
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignors: AMIT, JONATHAN; BARZIK, ZAH; DROUKER, VLADISLAV; KALAEV, MAXIM
Publication of US20160301610A1
Priority to US15/805,447
Application granted
Publication of US9876698B2
Legal status: Expired - Fee Related (adjusted expiration)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0876: Network utilisation, e.g. volume of load or congestion level
    • H04L 43/0882: Utilisation of link capacity
    • H04L 43/16: Threshold monitoring
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/11: Identifying congestion
    • H04L 47/12: Avoiding congestion; Recovering from congestion
    • H04L 47/22: Traffic shaping
    • H04L 47/25: Flow control; Congestion control with rate being modified by the source upon detecting a change of network conditions
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

Methods, systems, and computer program product embodiments for controlling congestion in a storage grid, by a processor device, are provided. In a storage grid, a storage request transmit queue length is monitored. Upon the queue length reaching at least one certain threshold, a watermarked message is transmitted to a receiving node, and the receiving node alters storage requests based upon the watermarked message.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates in general to computers, and more particularly to a method, system, and computer program product for controlling congestion in a storage grid, such as in a software defined storage system (SDS) or SDS server environment.
  • 2. Description of the Related Art
  • In today's society, computers are commonplace. Aside from home and personal computers, large corporations and businesses often use many systems interconnected together in a cluster to handle and store large amounts of data. As technology evolves, these storage systems have continued to become software based.
  • SDS or SDS server environments using architectures such as the industry standard InfiniBand® architecture, for example, may be used for interconnecting systems in cluster configurations, by providing a channel-based, switched-fabric technology. In such a configuration, data may be transmitted via messages which are made up of packets. Each device, whether processor or input/output (I/O), may include a channel adapter. The messages are typically transmitted from one device's channel adapter to another device's channel adapter via switches.
  • SUMMARY OF THE DESCRIBED EMBODIMENTS
  • Various embodiments for controlling congestion in a storage grid are provided. In one embodiment, the method comprises monitoring a storage request transmit queue length, wherein upon the queue length reaching at least one certain threshold, a watermarked message is transmitted to a receiving node, the receiving node altering storage requests based upon the watermarked message.
  • In addition to the foregoing exemplary embodiment, various other system and computer program product embodiments are provided and supply related advantages. The foregoing summary has been provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The claimed subject matter is not limited to implementations that solve any or all disadvantages noted in the background.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • In order that the advantages of the invention will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings. Understanding that these drawings depict embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:
  • FIG. 1 illustrates an exemplary computing environment, more specifically a clustered storage system, in which aspects of the present invention may be implemented;
  • FIG. 2 illustrates a portion of a computing environment according to one aspect of the present invention;
  • FIG. 3 illustrates a method according to one aspect of the present invention;
  • FIG. 4 illustrates an additional method according to one aspect of the present invention; and
  • FIG. 5 illustrates still an additional method according to one aspect of the present invention.
  • DETAILED DESCRIPTION OF THE DRAWINGS
  • Modern storage arrays are moving away from physical hardware devices and transitioning to become software defined systems (SDS), or SDS server environments. Technical progression in systems such as these often dictates back-end changes within the systems and the connections between them. For example, within storage arrays, the shift toward SDS requires changes related to the backbone links within the array and the corresponding interconnections between its nodes. These interconnecting links are generally hidden from the end user and serve primarily for such purposes as data resiliency, cache coherency, snapshot management, mirroring, compression, cluster management, etc. within the storage array's internal logic.
  • As aforementioned, InfiniBand®, for example, is an industry standard architecture that may be used for interconnecting systems in cluster configurations, by providing a channel-based, switched-fabric technology. In such a configuration, data may be transmitted via messages which are made up of packets. Each device, whether processor or input/output (I/O), may include a channel adapter. The messages are typically transmitted from one device's channel adapter to another device's channel adapter via switches.
  • Servers or systems such as the aforementioned may be implemented in an SDS, or software defined environment, as will be further discussed. One example of such a server or storage environment may be a virtual storage area network (VSAN). Complex SAN configurations such as the aforementioned enable large numbers of computing components such as servers to access common storage via interconnection switches and cabling. The availability, integrity, and recovery of these interconnections are critical to the reliable operation of the systems. VSANs, modeled after virtual local area networks (VLANs), are often used because of their flexibility in setup.
  • In systems such as InfiniBand® architectures implemented in SDS or SDS server environments, the aforementioned changes are complicated by matters such as interconnecting links being replaced by Ethernet and TCP/IP protocols within the software defined storage grids, making the backbone interconnecting links more expensive with regard to CPU usage, latency, and total bandwidth capacity. For example, the storage grid's nodes may begin to choke one another when an over-commit of work is placed on a specific node within the grid.
  • Present congestion control techniques include such methods as flow control systems that rate-limit some, or all, internal services according to various resource limits, such as disk queue depth or parallel jobs in the node. Under high load, nodes will pass as many requests as possible over the storage grid; however, the TCP/IP stack will inevitably consume a majority of the machine's processing power. This leaves inadequate processing power for performing the actual storage input/output (I/O) operations themselves, and leaves the storage grid congested. Ultimately, this causes such failures as timer aborts due to delayed response times.
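  • As a concrete illustration of the conventional approach just described, the following minimal sketch rate-limits internal I/O jobs against a fixed local resource limit (a cap on parallel jobs in the node); the class name and limit value are illustrative assumptions, and note that nothing here accounts for the CPU cost of the TCP/IP interconnect itself:

```python
import threading

class ResourceLimitFlowControl:
    """Sketch of a conventional per-node resource limit (assumed names/values)."""

    def __init__(self, max_parallel_jobs=32):
        self._slots = threading.Semaphore(max_parallel_jobs)

    def submit(self, io_job):
        self._slots.acquire()      # block once the local parallel-job limit is hit
        try:
            return io_job()        # perform the storage I/O operation
        finally:
            self._slots.release()
```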
  • Accordingly, to address the deficiencies within current methods for congestion control, the present invention provides congestion control within the storage grid, accounting for the amount of processing power required to transfer data over the grid. More specifically, it enables the use of Ethernet TCP/IP networks for interconnect functions without leaving the grid congested and choked.
  • In one embodiment, the method comprises monitoring a storage request transmit queue length, wherein upon the queue length reaching at least one certain threshold, a watermarked message is transmitted to a receiving node, the receiving node altering storage requests based upon the watermarked message.
  • In other words, the new interconnect congestion control is accomplished by monitoring the job transmit queue length. In a congested system, each message that is unable to transmit using the current TCP/IP socket buffer is added to this queue. Additionally, the new method provides a high watermark and a low watermark. When a predefined threshold is reached for the transmit queue, a high watermarked message is created and transmitted to upper level nodes within the grid.
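  • A minimal sketch of this sender-side watermark logic follows, assuming illustrative threshold values and a caller-supplied broadcast callback; a real implementation would likely re-broadcast the choked state message periodically while the condition persists, which is omitted here for brevity:

```python
import collections

class TransmitQueueMonitor:
    """Sketch of high/low watermark monitoring of the job transmit queue."""

    def __init__(self, broadcast, high_watermark=1024, low_watermark=256):
        self.queue = collections.deque()  # messages that did not fit in the TCP/IP socket buffer
        self.broadcast = broadcast        # callback that sends a state message to other grid nodes
        self.high = high_watermark        # assumed threshold values
        self.low = low_watermark
        self.choked = False

    def enqueue(self, message):
        self.queue.append(message)
        if not self.choked and len(self.queue) >= self.high:
            self.choked = True
            self.broadcast({"state": "choked"})   # high watermarked message

    def dequeue(self):
        message = self.queue.popleft()
        if self.choked and len(self.queue) <= self.low:
            self.choked = False
            self.broadcast({"state": "clear"})    # low watermark reached; cease choked message
        return message
```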
  • In one embodiment, when upper level grid nodes receive this high watermarked message, indicating a specific node is experiencing a choked state, the upper level grid nodes will accommodate by slowing or ceasing storage I/O requests to the congested node. Similarly, when the transmit queue slows or becomes free and a low watermark is reached, the congested or choked state message is ceased. Upper layer grid nodes then may resume the transmission of normal storage I/O requests.
  • Unlike traditional methods, such as a pool resource limit, when a choked state message is transmitted to upper level grid nodes, internal I/O job requests are still serviced by neighbor nodes. This is accomplished by broadcasting the choked state message to other grid nodes, indicating they should rate limit internal I/O job requests to the specific node indicating the congestion.
  • In one embodiment, each grid member monitors the choked state message of other nodes within the grid, as well as clearing the choked state message for any given node when the choked state message has not been broadcast for a predetermined amount of time. Again, a high watermark is maintained indicating a choked or congested state, which is broadcast to other nodes within the grid when a predetermined threshold has been reached. A low watermark, or lower threshold then clears this message, allowing normal I/O storage requests to continue. The watermarked messages may relate to a variety of internal queues regarding interconnect function within the storage grid.
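  • One possible sketch of this per-member bookkeeping, assuming a hypothetical five-second value for the predetermined amount of time and illustrative method names:

```python
import time

CHOKED_TIMEOUT_SECONDS = 5.0  # assumed value for the predetermined amount of time

class PeerChokeTracker:
    """Sketch: track when each node last broadcast a choked state and
    clear that state once the message has not been heard for a while."""

    def __init__(self, timeout=CHOKED_TIMEOUT_SECONDS):
        self.timeout = timeout
        self.last_choked = {}  # node id -> time the choked message was last heard

    def on_choked_message(self, node_id):
        self.last_choked[node_id] = time.monotonic()

    def is_choked(self, node_id):
        seen = self.last_choked.get(node_id)
        if seen is None:
            return False
        if time.monotonic() - seen > self.timeout:
            del self.last_choked[node_id]  # not re-broadcast in time; clear the state
            return False
        return True

    def may_send_io(self, node_id):
        # Rate limit (here, simply suppress) internal I/O job requests to choked nodes.
        return not self.is_choked(node_id)
```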
  • Note that in the aforementioned example, the high watermark and low watermark are thresholds for limiting I/O requests to a particular node experiencing congestion or a choked state. In other implementations, however, these watermarked messages may be used in other ways while staying within the spirit of the invention. One example may be speeding I/O job requests when a particular watermark is reached or cleared. Other implementations may exist while staying within the scope of the aforementioned examples.
  • Turning now to the Figures, FIG. 1 illustrates an exemplary clustered storage system or clustered server system, such as an InfiniBand® implementation 100. Switches 102-114 and a router 116 form the subnet 120. Multiple processor nodes 122-126 may be connected to switches within the subnet 120 through InfiniBand® host channel adapters 128-136 to form a cluster. Although a single cluster is shown, multiple clusters, each including multiple processor nodes similar to nodes 122-126, can be connected to switches within the subnet 120 in other embodiments. One of the nodes of subnet 120 hosts a subnet manager node. For example, end node 108 includes a subnet manager 130 and a subnet administrator and its database 132. Subnet manager 130 is used for discovery, configuration, and initialization of the fabric. In an embodiment, subnet manager 130 configures host channel adapters 128-136 with the local addresses for each associated physical port, i.e., the port's local identifier (LID). Although the subnet manager 130 is generally depicted, in some embodiments it may be contained within a server, a console, a processor node, a storage subsystem, an I/O chassis or in another device connected to the subnet 120.
  • As illustrated by processor node 122, a processor node may contain multiple CPUs 140-144 and may have a single InfiniBand® host channel adapter 128. As depicted, the host channel adapter 128 may be connected to both switch 102 and switch 108. As illustrated by processor node 124, a processor node may contain more than one host channel adapter 130 and 132 connected to different switches 102 and 104.
  • Each host channel adapter 128-136 may have a globally unique identifier (GUID) that is assigned by the channel adapter vendor. According to an embodiment, local identification numbers assigned by the subnet manager are static (i.e., they do not change from one power cycle to the next). Additionally, each port may have a port GUID assigned by the manufacturer.
  • Every destination within the subnet 120 may also be configured with one or more unique local identifiers (LIDs), which are statically assigned to each destination endpoint. In an embodiment, in order to maintain static assignment of the LIDs to each destination endpoint, the subnet manager 130 is provided with a mapping table including a mapping of GUIDs to corresponding LIDs. In another embodiment, the mapping table includes a mapping of LID assignments based on switch and port locations that are discoverable by both the subnet manager 130 and the destination endpoints. In still another embodiment, software logic defines a predetermined process for assigning LIDs and corresponding GUIDs.
  • Packets may contain a destination address that specifies the LID of the destination. From the point of view of a switch, a destination LID may represent a path through the switch. Switches 102-114 may be configured with routing tables and an individual packet may be forwarded to an output port based on the packet's destination LID and the switch's forwarding table.
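  • For illustration, the forwarding decision described above reduces to a table lookup keyed by the packet's destination LID; the table contents and packet representation below are invented for the sketch:

```python
# Illustrative forwarding table: destination LID -> output port.
FORWARDING_TABLE = {
    0x0001: "port-1",
    0x0002: "port-2",
}

def forward(packet):
    """Pick the output port from the destination LID in the local route header."""
    dest_lid = packet["dest_lid"]
    port = FORWARDING_TABLE.get(dest_lid)
    if port is None:
        raise LookupError(f"no route for LID {dest_lid:#06x}")
    return port

print(forward({"dest_lid": 0x0001}))  # -> port-1
```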
  • Switches 102-114 may primarily pass packets along based on a destination address within the packet's local route header. Switches 102-114 may also consume packets required for managing the switches 102-114 themselves. Optionally, a switch port may incorporate the properties of a physical InfiniBand® host channel adapter. Switches 102-114 may also support delivery of a single packet to a single destination as well as the delivery of a single packet to multiple destinations.
  • Various types of storage devices may also be connected to switches within the subnet 120. A storage subsystem 146 containing a storage capacity 150, a controller 148 and an InfiniBand® host channel adapter 160 may be connected to switches 104 and 106. A RAID storage subsystem 162 may also be connected via InfiniBand® host channel adapter 164 to switches 108 and 102 within the subnet 120. As well as the storage subsystems 146 and 162, I/O chassis 166 and 168 may be connected to switches 112 and 114 respectively.
  • The mechanisms of the present invention may be applicable to a variety of network topologies, network components, and server type systems. According to one aspect of the present invention, FIG. 1, for example, may be a VSAN environment. Notwithstanding the illustration of some of the functionality attendant to the various embodiments however, one of ordinary skill will appreciate that the methodologies herein may be adapted to a wide variety of implementations and scenarios.
  • FIG. 2 illustrates one aspect of one embodiment of the present invention 200. According to FIG. 2, a host 202 interacts with an interface node 204. A cache node queue 206 processes I/O requests for cache node 208. The cache node queue 206 is shown in a congested state between the cache node 208 and the interface node 204. An exemplary implementation of the present invention, as aforementioned, remedies the congested or choked state shown between the cache node queue 206 and the interface node 204.
  • The mechanisms of the present invention according to FIG. 2 may be implemented to monitor one or more distinct queue lengths. In one embodiment, for example, a flow depth control queue may be monitored within the storage server environment. In another embodiment, for example, a speed queue length may be monitored. In still another example, a duplex or duplex control queue length may be monitored within the storage or server system environment.
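  • A sketch of how these distinct queues might each be monitored against their own high and low watermarks; the queue names mirror the examples above, and all numeric values are assumptions:

```python
# Assumed per-queue watermark configuration for the distinct queue lengths above.
QUEUE_WATERMARKS = {
    "flow_depth_control": {"high": 2048, "low": 512},
    "speed":              {"high": 1024, "low": 256},
    "duplex_control":     {"high": 512,  "low": 128},
}

def check_queues(queue_lengths, choked, broadcast):
    """Apply the high/low watermark rule independently to each monitored queue.

    `queue_lengths` maps queue name -> current length; `choked` maps queue
    name -> bool and is updated in place; `broadcast` sends state messages.
    """
    for name, marks in QUEUE_WATERMARKS.items():
        length = queue_lengths.get(name, 0)
        if not choked.get(name) and length >= marks["high"]:
            choked[name] = True
            broadcast({"queue": name, "state": "choked"})
        elif choked.get(name) and length <= marks["low"]:
            choked[name] = False
            broadcast({"queue": name, "state": "clear"})
```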
  • FIG. 3 illustrates a flow chart according to one embodiment of the present invention 300. Beginning at 302, a storage transmit queue length is monitored by nodes within the storage grid 304. Upon reaching a higher threshold, a high watermark is reached, indicating that a congested or choked state message is to be transmitted to other receiving nodes within the grid 306. The nodes receiving the choked state message alter storage requests based upon the watermarked message 308. In the example given above, this indicates that nodes within the grid limit I/O requests to the particular node transmitting the message; however, as aforementioned, other implementations may exist while staying within the scope of the present invention. The method ends 310.
  • FIG. 4 illustrates an additional flow chart of a method according to one embodiment of the present invention 400. Beginning at 402, nodes monitor internal queues related to interconnect function within the storage grid 404. A determination is made of whether a predetermined threshold has been reached for the various internal queues within the grid 406. If the threshold has not been reached, nodes within the grid continue to process and request storage I/O jobs as normal and continue to monitor the internal queues 404. If a predetermined threshold has been reached, a high watermarked message is transmitted to other nodes within the grid, indicating a congested or choked state by the transmitting node 406. When a high watermarked message is transmitted to higher-level grid nodes, indicating a congested or choked state, the higher-level grid nodes rate limit I/O requests to the node experiencing the congestion 408. Neighbor nodes take on the responsibility of assisting with the requests currently in transmission. The method ends 410.
  • FIG. 5 illustrates still an additional flow chart of a method according to one embodiment of the present invention 500. Beginning at 502, high-level grid nodes limit I/O requests to a particular node that has transmitted a high watermarked message indicating a congested or choked state 504. A determination is made of whether a predetermined threshold has been reached regarding internal interconnect function queues within the node 506. If the predetermined threshold is continually reached, higher-level grid nodes continue to rate limit I/O requests to the particular node experiencing and transmitting the congested, or choked, state 504. If a predetermined threshold, or low watermark, is reached, transmission of the choked state message from the particular node is ceased 508. Higher-level grid nodes then cease rate limiting I/O requests and resume normal I/O operation job requests to the node 510. The method ends 512.
  • The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
  • While one or more embodiments of the present invention have been illustrated in detail, the skilled artisan will appreciate that modifications and adaptations to those embodiments may be made without departing from the scope of the present invention as set forth in the following claims.

Claims (24)

What is claimed is:
1. A method for congestion control in a storage grid, by a processor device, comprising:
monitoring a storage request transmit queue length; wherein upon reaching at least one certain threshold, a watermarked message is transmitted to a receiving node, the receiving node altering storage requests based upon the watermarked message.
2. The method of claim 1, further including monitoring the storage request transmit queue length for a plurality of interconnecting links connecting a plurality of interconnected nodes, wherein the altered storage requests are based upon the watermarked message pertaining to the plurality of interconnecting links connecting the plurality of interconnected nodes.
3. The method of claim 1, further including a high watermarked message and a low watermarked message.
4. The method of claim 3, further including transmitting a choked state message to upper layer grid nodes when the high watermarked message threshold is reached.
5. The method of claim 4, further including ceasing transmission of the choked state message to upper layer grid nodes when the low watermarked message threshold is reached.
6. The method of claim 4, further including rate limiting, by upper layer grid nodes, storage input/output (I/O) operations requests to the node transmitting the choked state message.
7. The method of claim 1, wherein the watermarked messages are applied to various internal queues related to storage interconnect function.
8. The method of claim 1, wherein the storage grid is a Software Defined Storage grid.
9. A system for congestion control in a storage grid, comprising:
a storage grid;
at least one processor device, wherein the processor device:
monitors a storage request transmit queue length; wherein upon reaching at least one certain threshold, a watermarked message is transmitted to a receiving node, the receiving node altering storage requests based upon the watermarked message.
10. The system of claim 9, wherein the processor device monitors the storage request transmit queue length for a plurality of interconnecting links connecting a plurality of interconnected nodes, wherein the altered storage requests are based upon the watermarked message pertaining to the plurality of interconnecting links connecting the plurality of interconnected nodes.
11. The system of claim 9, further including a high watermarked message and a low watermarked message.
12. The system of claim 11, wherein the processor device transmits a choked state message to upper layer grid nodes when the high watermarked message threshold is reached.
13. The system of claim 12, wherein the processor device ceases transmission of the choked state message to upper layer grid nodes when the low watermarked message threshold is reached.
14. The system of claim 12, wherein the processor device rate limits, by upper layer grid nodes, storage input/output (I/O) operations requests to the node transmitting the choked state message.
15. The system of claim 9, wherein the watermarked messages are applied to various internal queues related to storage interconnect function.
16. The system of claim 9, wherein the storage grid is a Software Defined Storage grid.
17. A computer program product for congestion control in a storage grid by a processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion that monitors a storage request transmit queue length; wherein upon reaching at least one certain threshold, a watermarked message is transmitted to a receiving node, the receiving node altering storage requests based upon the watermarked message.
18. The computer program product of claim 17, further including a second executable portion that monitors the storage request transmit queue length for a plurality of interconnecting links connecting a plurality of interconnected nodes, wherein the altered storage requests are based upon the watermarked message pertaining to the plurality of interconnecting links connecting the plurality of interconnected nodes.
19. The computer program product of claim 17, further including a high watermarked message and a low watermarked message.
20. The computer program product of claim 19, further including a second executable portion that transmits a choked state message to upper layer grid nodes when the high watermarked message threshold is reached.
21. The computer program product of claim 20, further including a third executable portion that ceases transmission of the choked state message to upper layer grid nodes when the low watermarked message threshold is reached.
22. The computer program product of claim 20, further including a third executable portion that rate limits, by upper layer grid nodes, storage input/output (I/O) operations requests to the node transmitting the choked state message.
23. The computer program product of claim 17, wherein the watermarked messages are applied to various internal queues related to storage interconnect function.
24. The computer program product of claim 17, wherein the storage grid is a Software Defined Storage grid.
US14/682,573 2015-04-09 2015-04-09 Interconnect congestion control in a storage grid Expired - Fee Related US9876698B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/682,573 US9876698B2 (en) 2015-04-09 2015-04-09 Interconnect congestion control in a storage grid
US15/805,447 US10257066B2 (en) 2015-04-09 2017-11-07 Interconnect congestion control in a storage grid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/682,573 US9876698B2 (en) 2015-04-09 2015-04-09 Interconnect congestion control in a storage grid

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/805,447 Continuation US10257066B2 (en) 2015-04-09 2017-11-07 Interconnect congestion control in a storage grid

Publications (2)

Publication Number Publication Date
US20160301610A1 (en) 2016-10-13
US9876698B2 (en) 2018-01-23

Family

ID=57112879

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/682,573 Expired - Fee Related US9876698B2 (en) 2015-04-09 2015-04-09 Interconnect congestion control in a storage grid
US15/805,447 Expired - Fee Related US10257066B2 (en) 2015-04-09 2017-11-07 Interconnect congestion control in a storage grid

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/805,447 Expired - Fee Related US10257066B2 (en) 2015-04-09 2017-11-07 Interconnect congestion control in a storage grid

Country Status (1)

Country Link
US (2) US9876698B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9876698B2 (en) * 2015-04-09 2018-01-23 International Business Machines Corporation Interconnect congestion control in a storage grid

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9239786B2 (en) 2012-01-18 2016-01-19 Samsung Electronics Co., Ltd. Reconfigurable storage device
US9747034B2 (en) 2013-01-15 2017-08-29 Xiotech Corporation Orchestrating management operations among a plurality of intelligent storage elements
US9430412B2 (en) 2013-06-26 2016-08-30 Cnex Labs, Inc. NVM express controller for remote access of memory and I/O over Ethernet-type networks
US9876698B2 (en) * 2015-04-09 2018-01-23 International Business Machines Corporation Interconnect congestion control in a storage grid

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050108444A1 (en) * 2003-11-19 2005-05-19 Flauaus Gary R. Method of detecting and monitoring fabric congestion
US20100103816A1 (en) * 2008-10-28 2010-04-29 Fujitsu Limited Transmission apparatus, transmission system and transmission method
US20100238941A1 (en) * 2009-03-19 2010-09-23 Fujitsu Limited Packet transmission apparatus, line interface unit, and control method for packet transmission apparatus
US20100271725A1 (en) * 2009-04-28 2010-10-28 Samsung Electronics Co., Ltd. Apparatus and method for preventing queue overflow for hard disk drive protection in computer system
US20110119679A1 (en) * 2009-11-13 2011-05-19 Muppirala Kishore Kumar Method and system of an i/o stack for controlling flows of workload specific i/o requests
US20140310434A1 (en) * 2013-04-11 2014-10-16 Transparent Io, Inc. Enlightened Storage Target
US20150103667A1 (en) * 2013-10-13 2015-04-16 Mellanox Technologies Ltd. Detection of root and victim network congestion
US20160301611A1 (en) * 2013-12-10 2016-10-13 Huawei Technologies Co., Ltd. Method for avoiding congestion on network device and network device
US9454321B1 (en) * 2014-05-30 2016-09-27 Emc Corporation Workload-driven storage configuration management

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11010270B2 (en) 2015-04-28 2021-05-18 Viasat, Inc. Self-organized storage nodes for distributed delivery network
US10069743B2 (en) * 2015-11-19 2018-09-04 Viasat, Inc. Enhancing capacity of a direct communication link
US10536384B2 (en) 2015-11-19 2020-01-14 Viasat, Inc. Enhancing capacity of a direct communication link
US20170331751A1 (en) * 2015-11-19 2017-11-16 Viasat, Inc. Enhancing capacity of a direct communication link
US11032204B2 (en) 2015-11-19 2021-06-08 Viasat, Inc. Enhancing capacity of a direct communication link
US10146465B1 (en) * 2015-12-18 2018-12-04 EMC IP Holding Company LLC Automated provisioning and de-provisioning software defined storage systems
US10684784B2 (en) 2015-12-18 2020-06-16 EMC IP Holding Company LLC Automated provisioning and de-provisioning software defined storage systems
US11252023B2 (en) 2016-01-27 2022-02-15 Oracle International Corporation System and method for application of virtual host channel adapter configuration policies in a high-performance computing environment
US11805008B2 (en) 2016-01-27 2023-10-31 Oracle International Corporation System and method for supporting on-demand setup of local host channel adapter port partition membership in a high-performance computing environment
US10972375B2 (en) 2016-01-27 2021-04-06 Oracle International Corporation System and method of reserving a specific queue pair number for proprietary management traffic in a high-performance computing environment
US11012293B2 (en) 2016-01-27 2021-05-18 Oracle International Corporation System and method for defining virtual machine fabric profiles of virtual machines in a high-performance computing environment
US11451434B2 (en) 2016-01-27 2022-09-20 Oracle International Corporation System and method for correlating fabric-level group membership with subnet-level partition membership in a high-performance computing environment
US11018947B2 (en) 2016-01-27 2021-05-25 Oracle International Corporation System and method for supporting on-demand setup of local host channel adapter port partition membership in a high-performance computing environment
US20200021484A1 (en) * 2016-01-27 2020-01-16 Oracle International Corporation System and method of host-side configuration of a host channel adapter (hca) in a high-performance computing environment
US11128524B2 (en) * 2016-01-27 2021-09-21 Oracle International Corporation System and method of host-side configuration of a host channel adapter (HCA) in a high-performance computing environment
US11868623B2 (en) 2016-12-14 2024-01-09 Ocient Inc. Database management system with coding cluster and methods for use therewith
US11797506B2 (en) * 2016-12-14 2023-10-24 Ocient Inc. Database management systems for managing data with data confidence
US11334257B2 (en) 2016-12-14 2022-05-17 Ocient Inc. Database management system and methods for use therewith
US20220253417A1 (en) * 2016-12-14 2022-08-11 Ocient Inc. Database management systems for managing data with data confidence
US10761745B1 (en) * 2016-12-14 2020-09-01 Ocient Inc. System and method for managing parity within a database management system
US11599278B2 (en) 2016-12-14 2023-03-07 Ocient Inc. Database system with designated leader and methods for use therewith
US10761726B2 (en) * 2018-04-16 2020-09-01 VMware, Inc. Resource fairness control in distributed storage systems using congestion data
US11144226B2 (en) 2019-04-11 2021-10-12 Samsung Electronics Co., Ltd. Intelligent path selection and load balancing
US11740815B2 (en) 2019-04-11 2023-08-29 Samsung Electronics Co., Ltd. Intelligent path selection and load balancing
US11750504B2 (en) 2019-05-23 2023-09-05 Hewlett Packard Enterprise Development Lp Method and system for providing network egress fairness between applications
US11848859B2 (en) 2019-05-23 2023-12-19 Hewlett Packard Enterprise Development Lp System and method for facilitating on-demand paging in a network interface controller (NIC)
US11757763B2 (en) 2019-05-23 2023-09-12 Hewlett Packard Enterprise Development Lp System and method for facilitating efficient host memory access from a network interface controller (NIC)
US11765074B2 (en) 2019-05-23 2023-09-19 Hewlett Packard Enterprise Development Lp System and method for facilitating hybrid message matching in a network interface controller (NIC)
US11777843B2 (en) 2019-05-23 2023-10-03 Hewlett Packard Enterprise Development Lp System and method for facilitating data-driven intelligent network
US11784920B2 (en) 2019-05-23 2023-10-10 Hewlett Packard Enterprise Development Lp Algorithms for use of load information from neighboring nodes in adaptive routing
US11792114B2 (en) 2019-05-23 2023-10-17 Hewlett Packard Enterprise Development Lp System and method for facilitating efficient management of non-idempotent operations in a network interface controller (NIC)
US11799764B2 (en) 2019-05-23 2023-10-24 Hewlett Packard Enterprise Development Lp System and method for facilitating efficient packet injection into an output buffer in a network interface controller (NIC)
US11929919B2 (en) 2019-05-23 2024-03-12 Hewlett Packard Enterprise Development Lp System and method for facilitating self-managing reduction engines
US11916781B2 (en) 2019-05-23 2024-02-27 Hewlett Packard Enterprise Development Lp System and method for facilitating efficient utilization of an output buffer in a network interface controller (NIC)
US11818037B2 (en) 2019-05-23 2023-11-14 Hewlett Packard Enterprise Development Lp Switch device for facilitating switching in data-driven intelligent network
US11757764B2 (en) 2019-05-23 2023-09-12 Hewlett Packard Enterprise Development Lp Optimized adaptive routing to reduce number of hops
US11855881B2 (en) 2019-05-23 2023-12-26 Hewlett Packard Enterprise Development Lp System and method for facilitating efficient packet forwarding using a message state table in a network interface controller (NIC)
US11863431B2 (en) 2019-05-23 2024-01-02 Hewlett Packard Enterprise Development Lp System and method for facilitating fine-grain flow control in a network interface controller (NIC)
WO2020236287A1 (en) * 2019-05-23 2020-11-26 Cray Inc. System and method for facilitating data-driven intelligent network with per-flow credit-based flow control
US11876701B2 (en) 2019-05-23 2024-01-16 Hewlett Packard Enterprise Development Lp System and method for facilitating operation management in a network interface controller (NIC) for accelerators
US11876702B2 (en) 2019-05-23 2024-01-16 Hewlett Packard Enterprise Development Lp System and method for facilitating efficient address translation in a network interface controller (NIC)
US11882025B2 (en) 2019-05-23 2024-01-23 Hewlett Packard Enterprise Development Lp System and method for facilitating efficient message matching in a network interface controller (NIC)
US11899596B2 (en) 2019-05-23 2024-02-13 Hewlett Packard Enterprise Development Lp System and method for facilitating dynamic command management in a network interface controller (NIC)
US11902150B2 (en) 2019-05-23 2024-02-13 Hewlett Packard Enterprise Development Lp Systems and methods for adaptive routing in the presence of persistent flows
US11916782B2 (en) 2019-05-23 2024-02-27 Hewlett Packard Enterprise Development Lp System and method for facilitating global fairness in a network
US11216190B2 (en) 2019-06-10 2022-01-04 Samsung Electronics Co., Ltd. Systems and methods for I/O transmissions in queue pair-based NVMeoF initiator-target system
US11240294B2 (en) 2019-08-23 2022-02-01 Samsung Electronics Co., Ltd. Systems and methods for spike detection and load balancing resource management
US11962490B2 (en) 2020-03-23 2024-04-16 Hewlett Packard Enterprise Development Lp Systems and methods for per traffic class routing

Also Published As

Publication number Publication date
US9876698B2 (en) 2018-01-23
US20180091408A1 (en) 2018-03-29
US10257066B2 (en) 2019-04-09

Similar Documents

Publication Publication Date Title
US10257066B2 (en) Interconnect congestion control in a storage grid
JP6670109B2 (en) Scalable flow and congestion control in networks
US9112794B2 (en) Dynamic multipath forwarding in software defined data center networks
US8386825B2 (en) Method and system for power management in a virtual machine environment without disrupting network connectivity
US9473414B2 (en) Method and system for supporting packet prioritization at a data network
CN107959625B (en) Virtual router with dynamic flow offload capability
US10594565B2 (en) Multicast advertisement message for a network switch in a storage area network
US10103969B2 (en) Open shortest path first routing for hybrid networks
US10411742B2 (en) Link aggregation configuration for a node in a software-defined network
US10826823B2 (en) Centralized label-based software defined network
CN112398676A (en) Vendor independent profile based modeling of service access endpoints in a multi-tenant environment
US11121969B2 (en) Routing between software defined networks and physical networks
US20220045969A1 (en) Mapping nvme-over-fabric packets using virtual output queues
CN115769556A (en) Path visibility, packet loss and delay measurements of service chain data flows
US11632288B2 (en) Determining the impact of network events on network applications
US11438263B2 (en) Policy application
US20150138979A1 (en) Network management control device, network management control system, and network management control method
US11336695B2 (en) Conversation-based policy distribution
EP4262150A1 (en) Layer-3 policy enforcement for layer-7 data flows
US11962498B1 (en) Symmetric networking for orphan workloads in cloud networks
US10764168B1 (en) Adjusting communications parameters based on known characteristics
US9172490B2 (en) Virtual wavelength networks

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AMIT, JONATHAN;BARZIK, ZAH;DROUKER, VLADISLAV;AND OTHERS;REEL/FRAME:035394/0476

Effective date: 20150412

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20220123