WO2004084508A1

WO2004084508A1 - Method and apparatus for controlling congestion in communications network

Info

Publication number: WO2004084508A1
Application number: PCT/SG2003/000054
Authority: WO
Inventors: Mehul Motani; Saravanan Govindan; Peng Yong Kong
Original assignee: Agency For Science, Technology And Research
Priority date: 2003-03-20
Filing date: 2003-03-20
Publication date: 2004-09-30
Also published as: AU2003217146A1

Abstract

A random sampling buffer management (RS) scheme for controlling congestion in a packet switching network is disclosed. Upon receiving a newly arrived packet, a new average occupancy level of the buffer is calculated. The newly arrived packet is admitted into the buffer if the calculated average occupancy level of the buffer lies below a congestion region. If the calculated average occupancy level lies within the congestion region, the newly arrived packet is compared to a number of sample packets randomly selected from the packets in the buffer in a matching process. The newly arrived packet is then discarded with a probability that is proportional to the number of matches found in the matching process. The matching process includes declaring a match upon finding that the newly arrived packet is associated with the same active flow as a packet from among the randomly sampled packets, based on their respective flow identifiers, and tracking the number of matches declared.

Description

Method and Apparatus for Controlling Congestion in a Communications Network

Field of the Invention

This invention relates to congestion control in communications networks. In particular, it relates to a method and apparatus for managing buffers in a packet switching network.

Background

Communications networks such as the Internet transport information in packet form from source nodes to destination nodes over transmission links, which connect the source nodes and the destination nodes. Generally, source-destination node pairs are connected by a series of intermediate nodes, called routers or switches, whose main function is to route packets received from source and/or previous intermediate nodes to the appropriate subsequent nodes and/or destination nodes. The intermediate nodes also have buffers at their input and/or output ports for temporary storage of packets during periods when the number of packets arriving at the intermediate nodes exceeds the bandwidth capacity of the output transmission link.

The Internet and many other communications networks use a variety of transport protocols that implement congestion control and error recovery mechanisms for transporting data in the form of packets. One such protocol that is widely implemented is the Transmission Control Protocol (TCP). TCP uses an adaptive, window-based congestion control mechanism. In most commercial implementations of TCP, when a source node detects packet loss it decreases its transmission rate by reducing the size of its congestion window. Under most circumstances, as long as the source node does not detect any packet losses, it periodically increases the size of its congestion window by one packet over a time interval of the order of a round-trip time. A round-trip time is the time from the start of transmission of a packet to the beginning of receipt of an acknowledgement for the transmitted packet. Packet losses caused by congestion lead to abrupt reductions in transmission rates and slow the subsequent growth of the congestion window. This can limit throughput and lower link utilization. Furthermore, TCP has no way of warning the source nodes about the onset of congestion before the buffer limit is reached. This often results in many existing active flows achieving meager throughputs and new active flows achieving zero throughput, because the first packet sent by each new active flow is discarded by intermediate nodes along the transmission links.

To help alleviate the foregoing problems, an intermediate node typically requires large buffers for accommodating heavy network traffic. However, this is not a desirable solution because it is extremely difficult to predict the maximum traffic flow and therefore select the correct buffer size. Further, excessive buffering can result in high delays that may be unfavorable to many applications. Therefore there is a need to provide buffer management schemes to control congestion on the Internet and the like communications networks.

There are a number of buffer management schemes proposed. These schemes include the Random Early Detection (RED) scheme, the Stabilized RED (SRED) scheme and the "CHOose and Keep for responsive flows, CHOose and Kill for unresponsive flows" (CHOKe) scheme.

The RED scheme is proposed by S. Floyd and V. Jacobson in "Random Early Detection Gateways for Congestion Avoidance," IEEE/ACM Transactions on Networking, vol. 1, no. 4, pp. 397-413, August 1993. This buffer management scheme relies on adaptive transport protocols like TCP. In the RED scheme, source nodes are notified about the onset of congestion at an intermediate node when the average buffer content at the intermediate node exceeds a predetermined threshold. The RED scheme notifies the source node(s) by discarding packets based on a probability that is related to the average occupancy level of the buffer. If the source nodes use the TCP, the source nodes reduce their transmission rates in response to packet drops, thus reducing congestion at intermediate nodes along the transmission links to destination nodes. Some of the key features of the RED scheme include the ability to avoid synchronization of flows by randomly dropping packets from different active flows, the maintenance of low buffer occupancy levels to ensure short delays and the prevention of bias against active flows with bursty traffic patterns.

The RED scheme achieves these features by first computing the average occupancy level of the buffer using an exponentially weighted moving average and comparing it to two thresholds: a lower bound average buffer occupancy threshold (Min_th) and an upper bound average buffer occupancy threshold (Max_th). When the average occupancy level is between the two thresholds, incoming packets are discarded with a packet drop probability that is a linear function of the average occupancy level of the buffer. The packet drop probability is zero when the average buffer occupancy level is below Min_th and one when it is above Max_th, as shown in a packet drop probability curve 100 in FIG. 1. When the average occupancy level of the buffer is between the two thresholds, the packet drop probability P is calculated as follows:

P = Max_p (avg - Min_th) / (Max_th - Min_th)

where Max_p represents a maximum packet drop probability constant and avg represents the average occupancy level of the buffer. Max_p sets the maximum probability for discarding a packet. Depending on the operational requirements of the intermediate node, Max_p can have a value ranging from greater than zero up to one. The average occupancy level of the buffer avg is calculated as follows:

avg = (1 - w)avg_prev + wq

where w represents a weight constant, avg_prev represents the immediate previous average occupancy level of the buffer and q represents the current occupancy level of the buffer. This is essentially a lowpass filter, where w determines the time constant of the lowpass filter. Thus, the average occupancy level of the buffer avg is an exponentially weighted moving average, which is not substantially affected by random bursty traffic.
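The two RED computations above (the moving average and the linear drop probability between Min_th and Max_th) can be sketched in Python as follows. This is a minimal illustration, not a complete RED implementation: the function names are hypothetical, and refinements of the original RED algorithm, such as its handling of idle periods, are omitted.

```python
def ewma(avg_prev: float, q: float, w: float) -> float:
    """One update of the moving average: avg = (1 - w) * avg_prev + w * q."""
    return (1.0 - w) * avg_prev + w * q


def red_drop_probability(avg: float, min_th: float, max_th: float,
                         max_p: float) -> float:
    """RED packet drop probability P as a function of the average occupancy."""
    if avg < min_th:
        return 0.0  # below Min_th: admit every packet
    if avg > max_th:
        return 1.0  # above Max_th: discard every packet
    # Between the thresholds: linear ramp from 0 at Min_th to Max_p at Max_th.
    return max_p * (avg - min_th) / (max_th - min_th)


# A run of arrivals that each see an instantaneous queue of 40 packets.
avg = 0.0
for _ in range(1000):
    avg = ewma(avg, q=40.0, w=0.002)
print(round(avg, 2), red_drop_probability(avg, min_th=20, max_th=60, max_p=0.1))
```

Because w is small, avg rises only gradually toward the instantaneous queue length, which is what shields RED from short bursts.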

The RED scheme works well for random bursty traffic and in situations where congestion is not prolonged. During persistent congestion periods, the RED scheme applies a uniform packet drop probability to all arriving packets irrespective of the level of contribution each of the active flows has on the buffer and therefore on the state of congestion. That is, the RED scheme is not able to distinguish which of the active flows contribute the most to the congestion and is therefore unable to penalize these active flows accordingly. This leads to unfair bandwidth utilization by certain active flows on the Internet and the like communications networks that deploy the RED scheme in the intermediate nodes. Active flows that have high levels of contribution to the congestion at intermediate nodes are referred to as misbehaving flows.

The SRED scheme is proposed by T. Ott et al. in "SRED: Stabilized RED," IEEE INFOCOM, vol. 3, pp. 1346-1355, 1999, and a related US patent no. 6,434,116. The SRED scheme seeks to improve upon the RED scheme by stabilizing the occupancy of a buffer at a level independent of the number of active flows. The SRED scheme achieves this by estimating the number of active flows and finding misbehaving flows. The main idea is to compare the information of a newly arrived packet with information from a randomly chosen entry of a fixed-size data structure called the "Zombie list". The Zombie list contains a list of entries, each of which holds information about a recent packet that traversed the buffer. If the information of the two packets is found to be associated with the same active flow, a match is declared. The newly arrived packet is then discarded with a packet drop probability that is related to whether a match is declared. By using the Zombie list, the SRED scheme is able to identify misbehaving flows. However, the SRED scheme does not propose a simple router mechanism for penalizing the misbehaving flows. Further, the SRED scheme requires additional resources to store, maintain and operate the Zombie list. This results in the intermediate nodes requiring greater hardware and processing capability, which increases the overall cost of the intermediate nodes.

The CHOKe scheme is proposed by R. Pan et al. in "CHOKe - A Stateless Active Queue Management Scheme for Approximating Fair Bandwidth Allocation," IEEE INFOCOM, vol. 2, pp. 942-951, 2000. The CHOKe scheme compares a newly arrived packet with a randomly selected set of packets from the buffer. Once a packet from the randomly selected set is found to originate from the same source node as the newly arrived packet, a match is declared, and both the matched packet and the newly arrived packet are discarded. If a match is not found, the newly arrived packet is admitted into the buffer subject to the RED packet drop probability P described in the foregoing. The assumption behind the CHOKe scheme is that the buffer is likely to contain more packets belonging to misbehaving flows than to normal active flows. Thus, packets from the misbehaving flows are more likely to be selected for comparison with the newly arrived packet. By dropping packets belonging to the misbehaving flows, the CHOKe scheme provides fairer bandwidth utilization for all active flows. However, the CHOKe scheme can result in bursty losses, as each packet drop decision affects both the newly arrived packet and a packet already admitted into the buffer. Further, because the CHOKe scheme fundamentally relies on the RED packet drop probability P, it also suffers from the same drawback as the RED scheme of uniform dropping during persistent congestion periods, since the packet drop probability for each active flow is calculated as a function of the packets from all active flows.
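For comparison with the RS scheme, the CHOKe step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the buffer is modelled as a list of flow identifiers, only one queued packet is drawn per arrival, the RED decision is passed in as a hypothetical `red_drop` callable, and the gating of CHOKe on the RED thresholds is omitted.

```python
import random
from typing import Callable, Hashable, List


def choke_arrival(buffer: List[Hashable], arriving_flow: Hashable,
                  red_drop: Callable[[], bool], rng: random.Random) -> bool:
    """Handle one arrival; return True if the arriving packet is admitted.

    `buffer` holds the flow identifier of each queued packet.
    """
    if buffer:
        victim = rng.randrange(len(buffer))  # draw one queued packet at random
        if buffer[victim] == arriving_flow:
            # Match: discard both the queued packet and the new arrival.
            del buffer[victim]
            return False
    # No match: the RED decision alone determines the arrival's fate.
    if red_drop():
        return False
    buffer.append(arriving_flow)
    return True
```

The `del buffer[victim]` line is what produces the bursty losses noted above: a single match removes two packets of the same flow at once.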

It is therefore desirable to provide a buffer management scheme that exhibits the key features of the RED scheme and yet achieves a high degree of fairness in bandwidth utilization without the limitations of the schemes described in the foregoing.

Summary

A random sampling buffer management (RS) scheme according to the embodiments of the invention provides an efficient buffer management scheme that exhibits a high degree of fairness in allocating bandwidth to all active flows. The RS scheme penalizes active flows in proportion to the level of contribution of each of the active flows to the overall congestion at the intermediate node. The level of contribution of each active flow is determined by randomly sampling packets (the sample) stored in the buffer of the intermediate node and comparing the sample with a newly arrived packet at the intermediate node. A match is declared each time a packet in the sample and the newly arrived packet are determined to have the same flow identifier. The number of matches declared is an indication of the level of contribution the active flow has on the overall state of congestion at the intermediate node. Accordingly, the newly arrived packet is discarded according to a probability based on the number of matches declared.

Therefore, in accordance with a first aspect of the invention, there is disclosed a method for controlling congestion in a communications network, the network comprising a recipient node, the recipient node comprising a buffer for storing packets received at the recipient node, the method comprising the steps of: providing a sample, the sample comprising at least one packet randomly selected from packets stored in the buffer upon receiving a packet at the recipient node, the at least one packet and the received packet each having a flow identifier and a packet size, the flow identifier being indicative of the active flow to which the packet belongs; identifying a match group for grouping the at least one packet in the sample, wherein the flow identifier of each packet in the match group matches the flow identifier of the received packet; generating a discard probability based on the sample and the match group; and discarding the received packet based on the generated discard probability to control congestion in the communications network.

In accordance with a second aspect of the invention, there is disclosed an apparatus for controlling congestion in a communications network, the communications network comprising a recipient node, the recipient node comprising a buffer for storing packets received at the recipient node, the apparatus comprising: a sample, the sample comprising at least one packet randomly selected from packets stored in the buffer upon receiving a packet at the recipient node, the at least one packet and the received packet each having a flow identifier and a packet size, the flow identifier being indicative of the active flow to which the packet belongs; means for identifying a match group for grouping the at least one packet in the sample, wherein the flow identifier of each packet in the match group matches the flow identifier of the received packet; means for generating a discard probability based on the sample and the match group; and means for discarding the received packet based on the discard probability to control congestion in the communications network.

Brief Descriptions of The Drawings

Embodiments of the invention are described hereinafter with reference to the following drawings, in which:

FIG. 1 shows a prior art packet drop probability curve of the Random Early Detection (RED) scheme;

FIG. 2 shows a flowchart of the steps performed by an intermediate node for managing its buffers in accordance with a first embodiment of the invention; and

FIG. 3 shows a flowchart of the steps performed by an intermediate node for managing its buffers in accordance with a second embodiment of the invention.

Detailed Description

A random sampling buffer management (RS) scheme according to the embodiments of the invention is provided hereinafter. The RS scheme provides an efficient buffer management scheme that exhibits a high degree of fairness in allocating bandwidth to all active flows, in addition to the ability to avoid synchronization of active flows by randomly dropping packets, the maintenance of low buffer occupancy levels to ensure short delays and the prevention of bias against active flows with bursty traffic patterns.

A packet switching network such as the Internet consists of multiple intermediate nodes linked together by multiple transmission links for transporting information in packet form from one or more source nodes to one or more destination nodes. Typically, each intermediate node has a switch or a router, which receives packets from source and/or previous intermediate nodes and redirects these packets to their respective subsequent intermediate or destination nodes. During high traffic periods the intermediate nodes may not be able to cope with the large volume of arriving packets due to the limited bandwidth capacity of the outgoing transmission link. High traffic periods occur when many active flows are transmitting information at the same time or when one or more active flows are transmitting large volumes of information over a short time period (i.e. bursty traffic). In such situations, the intermediate nodes typically discard incoming packets to implicitly signal the source nodes to reduce their transmission rates, thereby reducing the number of packets being transmitted. Where the source nodes use TCP, the congestion windows of the source nodes are reduced, typically by one-half. To avoid discarding incoming packets during high traffic or bursty traffic periods, intermediate nodes also have buffers for temporarily storing arriving packets. However, because these buffers cannot be too large, an efficient buffer management scheme is needed to control the congestion at the intermediate nodes while providing fair bandwidth allocation to all active flows traversing the intermediate nodes.

The RS scheme according to the embodiments of the invention operates on the premise that the cause of the congestion at an intermediate node is most likely to be the active flow(s) with the greatest contribution to the state of the buffer. Therefore, to provide a high degree of fairness in the allocation of bandwidth to all active flows traversing the intermediate node, the active flows that contribute the most to the congestion (i.e. misbehaving flows) need to be identified and penalized accordingly. Furthermore, each misbehaving flow is to be penalized in accordance with its level of contribution to the state of the buffer and thereby the state of congestion. This ensures that active flows are not penalized uniformly; rather, each active flow is penalized according to its contribution to the state of congestion. This results in a high degree of fairness in the utilization of the outgoing transmission link bandwidth.

The RS scheme according to a first embodiment of the invention is shown in flowchart 200 in FIG. 2. An intermediate node implementing the RS scheme carries out the steps shown in the flowchart 200 as described hereinafter. When the router at an intermediate node receives a newly arrived packet in a step 202, the router randomly samples a set of packets (the sample) from the buffer for comparison with the newly arrived packet. To alleviate the need for additional buffering, the packets in the sample are randomly selected from the buffer one packet at a time, in a step 206, for comparison in the matching process in the steps 208 to 212. The sample size S_size can range from one packet up to the instantaneous buffer size. If the sample size S_size is too small, the comparisons do not provide an adequate representation of the actual state of the buffer. Misbehaving flows might then go unidentified, while normal active flows can be mistaken for misbehaving flows and erroneously penalized. On the other hand, if the sample size S_size is too large, the router requires a great amount of processing time to carry out the matching process, even though a large sample yields a substantially clearer picture of the state of the buffer and thus identifies the misbehaving flows more accurately. In the first embodiment, to provide a substantially fair bandwidth allocation while minimizing the processing time of the router, the sample size S_size is preferably selected from a range of 20% to 50% of the instantaneous buffer size.

Before the matching process begins, a temporary counter C and a matching counter MC are provided in a step 204 for use in the matching process. The temporary counter C counts the number of packets compared during the matching process, while the matching counter MC tracks the number of matches found. In the step 204, the temporary counter C and the matching counter MC are both initialized to "0". Upon acquiring a randomly sampled packet from the buffer in the step 206, the temporary counter C is incremented by one and a step 208 is activated to begin the matching process. In the step 208, if the temporary counter C is less than or equal to the sample size S_size, the requisite number of randomly sampled packets has not yet been compared. In this case, the newly arrived packet is compared with a previously un-compared packet randomly sampled from the buffer in a step 210. Each active flow has a flow identifier. The flow identifier comprises one or a combination of destination and source addresses, destination and source port numbers, protocol identifier, and like identifiers. A match is found when the newly arrived packet and the randomly sampled packet from the buffer are determined to have the same flow identifier. Each time a match is found, the matching counter MC is incremented by one in a step 212 to register the number of matches found for the newly arrived packet. The router then reverts to the step 206 to retrieve another un-checked randomly sampled packet from the buffer for comparison.

However, if a match is not found in the step 210, the router reverts to the step 206 to compare the next un-checked randomly sampled packet from the buffer.
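The matching process of the steps 204 to 212 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `FlowId` 5-tuple and the function name `count_matches` are hypothetical, and packets are drawn without replacement (by sampling buffer positions with `random.Random.sample`) so that each comparison uses a previously un-compared packet.

```python
import random
from typing import NamedTuple, Sequence


class FlowId(NamedTuple):
    """Hypothetical flow identifier built from the usual 5-tuple."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int


def count_matches(buffer: Sequence[FlowId], arrival: FlowId,
                  s_size: int, rng: random.Random) -> int:
    """Return MC: matches between the arrival and s_size sampled packets.

    Assumes 1 <= s_size <= len(buffer).
    """
    # Distinct buffer positions in random order; only indices are drawn,
    # so the queued packets themselves are never copied.
    positions = rng.sample(range(len(buffer)), s_size)
    c = 0   # temporary counter C (step 204)
    mc = 0  # matching counter MC (step 204)
    while c < s_size:                   # step 208: more comparisons needed?
        sampled = buffer[positions[c]]  # step 206: fetch one sampled packet
        c += 1
        if sampled == arrival:          # step 210: same flow identifier?
            mc += 1                     # step 212: register the match
    return mc
```

Under the preferred 20% to 50% range, a buffer currently holding 200 packets would use an S_size between 40 and 100.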

In the step 208, once the requisite number of comparisons (i.e. the sample size S_size) has been made, the newly arrived packet is discarded in a step 214 with a packet drop probability that is proportional to the matching counter MC. In the first embodiment, the packet drop probability P_di of a newly arrived packet associated with an active flow i is given by:

P_di = MC_i / S_size     (1)

where MC_i represents the number of matches found for the newly arrived packet associated with the active flow i. Equation (1) indicates that the newly arrived packet from the active flow i is discarded in proportion to the number of matches found. If the number of matches found is high, the newly arrived packet from the active flow i has a high probability of being discarded. Thus, the bandwidth of the outgoing transmission link is not monopolized by misbehaving flows.

Once the packet drop probability P_di for a newly arrived packet from an active flow i is obtained, the router preferably generates a random number P uniformly distributed between 0 and 1. The generated random number P is then compared with the packet drop probability P_di for the newly arrived packet. If P is less than P_di, the newly arrived packet is discarded. Otherwise, the newly arrived packet is admitted into the buffer for transmission to either a subsequent intermediate node or the destination node(s). Alternatively, the newly arrived packet can be discarded once the probability P_di is greater than a predetermined threshold.
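The drop decision of the step 214 can be sketched as follows. The sketch takes P_di = MC_i / S_size, i.e. the fraction of sampled packets that match flow i, which keeps P_di between 0 and 1; the function names are hypothetical. `random.Random.random()` returns a float uniformly distributed in [0, 1) and plays the role of the random number P.

```python
import random


def rs_drop_probability(mc_i: int, s_size: int) -> float:
    """P_di taken as the fraction of sampled packets matching flow i."""
    return mc_i / s_size


def discard_randomly(p_di: float, rng: random.Random) -> bool:
    """Discard when a uniform random number P in [0, 1) is less than P_di."""
    return rng.random() < p_di


def discard_by_threshold(p_di: float, threshold: float) -> bool:
    """Alternative rule: discard once P_di exceeds a predetermined threshold."""
    return p_di > threshold


if __name__ == "__main__":
    rng = random.Random(7)
    p = rs_drop_probability(mc_i=12, s_size=40)  # 12 of 40 sampled packets match
    drops = sum(discard_randomly(p, rng) for _ in range(10_000))
    print(p, drops / 10_000)  # the empirical drop rate should be close to p
```

The random rule spreads drops over time, whereas the threshold rule discards deterministically; the random rule is the one that helps avoid synchronizing flows.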

The RS scheme according to a second embodiment of the invention is shown in a flowchart 300 in FIG. 3. An intermediate node implementing the RS scheme of this second embodiment carries out the steps shown in the flowchart 300 as described hereinafter. When the router at an intermediate node receives a newly arrived packet in a step 302, the router proceeds to calculate a new average occupancy level of the buffer avg in a step 304. As in the RED scheme, the average occupancy level of the buffer avg is calculated using an exponentially weighted moving average. Although other methods of calculating the average occupancy level can be used, the exponentially weighted moving average is preferred because it prevents drastic changes in the average occupancy level of the buffer due to random bursty traffic patterns. Thus, the RS scheme does not bias against active flows exhibiting bursty traffic patterns. The average occupancy level of the buffer avg is calculated as follows:

avg = (1 - w)avg_prev + wq     (2)

where w represents a weight constant, avg_prev represents the immediate previous average occupancy level of the buffer and q represents the instantaneous occupancy level of the buffer. The selection of w depends on the operational preferences of the service provider that deploys the intermediate nodes. If w is too large, the averaging procedure does not filter out transient congestion at the intermediate node. On the other hand, if w is too small, the average occupancy level of the buffer avg responds too slowly to changes in the actual buffer occupancy level, and the router is unable to detect the initial stages of congestion. Preferably, w is chosen between 0.001 and 0.01. The average occupancy level of the buffer avg is essentially calculated using a lowpass filter, where w determines the time constant of the lowpass filter.
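To make the trade-off in choosing w concrete, the following sketch (the function name is hypothetical) computes how many packet arrivals the moving average of equation (2) needs to close about 63% (that is, 1 - 1/e) of the gap after a step change in the instantaneous occupancy q. This is the usual notion of the time constant of a first-order lowpass filter, here measured in arrivals rather than seconds.

```python
import math


def arrivals_to_track(w: float, fraction: float = 1 - math.exp(-1)) -> int:
    """Smallest n such that avg closes `fraction` of a step change in q.

    With q held constant, the gap avg - q shrinks by a factor (1 - w) per
    arrival, so we need 1 - (1 - w)**n >= fraction.
    """
    return math.ceil(math.log(1 - fraction) / math.log(1 - w))


for w in (0.001, 0.002, 0.01):
    print(w, arrivals_to_track(w))
```

For the preferred range this gives about 1000 arrivals at w = 0.001 and about 100 arrivals at w = 0.01, i.e. roughly 1/w arrivals in each case.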

Once the new average occupancy level of the buffer avg is obtained, the router checks in a step 306 whether avg falls within the region between two congestion thresholds C1_th and C2_th, with C1_th being lower than C2_th. This region between C1_th and C2_th indicates the onset of congestion at the intermediate node. The values of the congestion thresholds C1_th and C2_th depend on the size of the buffer. Preferably, these values are 50% and 80% of the size of the buffer, respectively. If avg is not within the region between C1_th and C2_th, the router checks in a step 308 whether avg is less than C1_th. If so, the newly arrived packet is admitted into the buffer in a step 310 for transmission to either a subsequent intermediate node or the destination node. If, in the step 308, avg is not less than C1_th, it follows that avg is equal to or greater than C2_th, indicating that the buffer is in a state of heavy congestion. Thus, the newly arrived packet is discarded directly in a step 312.
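The threshold test of the steps 306 to 312 can be sketched as follows, using the preferred values of 50% and 80% of the buffer size for C1_th and C2_th. The `Action` enum and `classify` function are hypothetical names, and treating C1_th as inside the congestion region and C2_th as outside it is an assumption, since the text does not say how the boundaries themselves are handled.

```python
from enum import Enum


class Action(Enum):
    ADMIT = "admit"      # step 310: below the congestion region
    SAMPLE = "sample"    # steps 314 to 320: run the matching process
    DISCARD = "discard"  # step 312: heavy congestion


def classify(avg: float, buffer_size: float,
             c1_frac: float = 0.5, c2_frac: float = 0.8) -> Action:
    """Map the average occupancy avg to the action of the steps 306 to 312."""
    c1_th = c1_frac * buffer_size  # C1_th, preferably 50% of the buffer size
    c2_th = c2_frac * buffer_size  # C2_th, preferably 80% of the buffer size
    if c1_th <= avg < c2_th:       # step 306: onset of congestion
        return Action.SAMPLE
    if avg < c1_th:                # step 308
        return Action.ADMIT
    return Action.DISCARD          # otherwise avg >= C2_th
```

Only arrivals in the SAMPLE case pay the cost of the matching process, so the sampling work is confined to the onset of congestion.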

However, if the average occupancy level of the buffer avg is found in the step 306 to fall within the region between C1_th and C2_th, a sample of packets (the sample) is selected randomly from the buffer for comparison with the newly arrived packet. To alleviate the need for additional buffering, the packets in the sample are randomly selected from the buffer one packet at a time, in a step 316, for comparison in the matching process in the steps 318 to 324. The sample size S_size can range from one packet up to the instantaneous buffer size. As in the first embodiment, if the sample size S_size is too small, the comparison does not provide an adequate representation of the actual state of the buffer. Misbehaving flows might then go unidentified, while normal active flows can be mistaken for misbehaving flows and erroneously penalized. On the other hand, if the sample size S_size is too large, the router requires a great amount of processing time to carry out the matching process, even though a large sample yields a substantially clearer picture of the state of the packets in the buffer and thus identifies the misbehaving flows more accurately. In the second embodiment, to provide substantially fair bandwidth utilization while minimizing the processing time of the router, the sample size S_size is preferably selected from a range of 20% to 50% of the instantaneous occupancy level of the buffer.

Before the matching process begins, a temporary counter C and a matching counter MC are provided in a step 314 for use in the matching process. The temporary counter C counts the number of packets compared during the matching process, while the matching counter MC tracks the number of matches found. In the step 314, the temporary counter C and the matching counter MC are both initialized to "0".

Upon acquiring a randomly sampled packet from the buffer in the step 316, the temporary counter C is incremented by one and a step 318 is activated to begin the matching process. In the step 318, if the temporary counter C is less than or equal to the sample size S_size, the requisite number of randomly sampled packets has not yet been compared. In this case, the newly arrived packet is compared with an un-checked packet randomly sampled from the buffer in a step 322. Each active flow has a flow identifier. The flow identifier comprises one or a combination of destination and source addresses, destination and source port numbers, protocol identifier, and like identifiers. A match is found when the newly arrived packet and the randomly sampled packet from the buffer are determined to have the same flow identifier. Each time a match is found, the matching counter MC is incremented by one in a step 324 to register the number of matches found for the newly arrived packet. The router then reverts to the step 316 to retrieve another un-checked randomly sampled packet from the buffer for comparison.

However, if a match is not found in the step 322, the router reverts to the step 316 to compare the next un-checked randomly sampled packet from the buffer.

In the step 318, once the requisite number of comparisons (i.e. the sample size S_size) has been made, the newly arrived packet is discarded in a step 320 with a packet drop probability that is proportional to the matching counter MC. In the second embodiment, the packet drop probability P_di of a newly arrived packet originating from an active flow i is given by:

where Max_p represents the maximum packet drop probability and MC_i represents the number of matches found for the newly arrived packet associated with the active flow i. Max_p sets the maximum probability for discarding a packet. Depending on the operational requirements of the intermediate node, Max_p can have a value ranging from greater than zero up to one. The value of Max_p is preferably chosen from a range of 0.02 to 0.08, although it will be apparent to one skilled in the art that Max_p can also be chosen from the wider range of greater than zero up to one.

Equation (3) indicates that the newly arrived packet from the active flow i is discarded in proportion to the number of matches found. If the number of matches found is high, the newly arrived packet from the active flow i has a high probability of being discarded. Thus, the bandwidth of the outgoing transmission link is not monopolized by misbehaving flows. Equation (3) also contains a random packet drop profile component that relates to the state of the buffer, given by the average occupancy level of the buffer avg. The larger the average occupancy level avg relative to C2_th, the greater the probability that the newly arrived packet from the active flow i is discarded. The random packet drop profile component in equation (3) gives the RS scheme the ability to avoid synchronization of active flows and keeps buffer occupancy at a level that provides high throughput and low delay.

Once the packet drop probability P_di for a newly arrived packet from an active flow i is obtained, the router preferably generates a random number P. The generated random number P is then compared with the packet drop probability P_di for the newly arrived packet. If P is less than P_di, the newly arrived packet from the active flow i is discarded. Otherwise, the newly arrived packet is admitted into the buffer for transmission to either a subsequent intermediate node or the destination node(s). Alternatively, the newly arrived packet can be discarded once the probability P_di is greater than a predetermined threshold.

The steps in the flowcharts 200 and 300 described in the foregoing, according to the first and second embodiments of the invention respectively, assume that the packets in the buffer are of equal size. In most cases this holds, because most routers at intermediate nodes split packets into equal-sized units during buffering to allow efficient memory management. However, where the packets in the buffer are not of equal size, the RS scheme can be modified to provide a fairer comparison.

In accordance with a third embodiment of the invention, the RS scheme according to the first and second embodiments as described in the foregoing is modified to cater for cases where the packets in the buffer are not of equal size. Accordingly, the steps in the flowcharts 200 and 300, as shown in FIG. 2 and FIG. 3 respectively, described in the foregoing are incorporated herein with the exception of the steps 206 and 212 in the flowchart 200 in FIG. 2 and the steps 316 and 324 in the flowchart 300 in FIG. 3.

Typically, each packet in the buffer is 1500 bytes in size, although some packets in the buffer may be larger or smaller. To account for differences in packet size while ensuring that all active flows are treated fairly, the temporary counter C (step 206 in the flowchart 200 in FIG. 2 and step 316 in the flowchart 300 in FIG. 3) and the matching counter MC (step 212 in the flowchart 200 in FIG. 2 and step 324 in the flowchart 300 in FIG. 3) are incremented by the size of the randomly sampled packet. For example, if two randomly sampled packets of 1500 bytes and 500 bytes are checked, the temporary counter C is incremented by 1500 and by 500, respectively, for a total of 2000. If the comparisons result in a match only between the newly arrived packet from the active flow i and the 500-byte packet, the matching counter MC is incremented by 500.

Therefore, the functions of the steps 206 and 212 in the flowchart 200 in FIG. 2 and the steps 316 and 324 in the flowchart 300 in FIG. 3 are changed as described in the foregoing to cater for cases where the packets in the buffer differ in size. Further, the sample size S_size no longer represents a number of packets as in the first and second embodiments of the invention; in the third embodiment, S_size represents the total size of the sample in bytes.
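The counting rules of the three embodiments can be sketched together as follows. This is a minimal sketch under stated assumptions: packets are modelled as hypothetical `(flow_id, size_in_bytes)` tuples, sampling is with replacement, and sampling stops once the temporary counter C reaches S_size; the function name `sample_and_match` and the `by_bytes` switch are illustrative and not taken from the specification.

```python
import random
from typing import List, Optional, Tuple

Packet = Tuple[str, int]  # hypothetical (flow_id, size_in_bytes) record


def sample_and_match(
    buffer: List[Packet],
    arrived_flow: str,
    s_size: int,
    by_bytes: bool = True,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Return (MC, C) after randomly sampling the buffer.

    by_bytes=True (third embodiment): both counters grow by the sampled
    packet's size and s_size is a byte budget.
    by_bytes=False (first and second embodiments): both counters grow by
    one and s_size is a packet count.
    Assumption: sampling is with replacement and stops once C >= s_size.
    """
    if rng is None:
        rng = random.Random()
    if not buffer:
        return 0, 0
    if any(size <= 0 for _, size in buffer):
        raise ValueError("packet sizes must be positive")
    mc = 0  # matching counter MC
    c = 0   # temporary counter C
    while c < s_size:
        flow_id, size = rng.choice(buffer)
        step = size if by_bytes else 1
        c += step
        if flow_id == arrived_flow:  # same flow identifier: declare a match
            mc += step
    return mc, c
```

The ratio MC / C then corresponds to MQ_size / SQ_size in claim 7, or to MQ_count / SQ_count in claim 5 when `by_bytes=False`. Weighting by bytes keeps a flow of many small packets from appearing as heavy as a flow of the same number of full-size packets.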

In the foregoing manner, a random sampling buffer management scheme is described according to the embodiments of the invention for addressing one or more of the foregoing disadvantages of conventional buffer management schemes. It will be apparent to one skilled in the art in view of this disclosure that numerous changes, modifications and combinations can be made without departing from the scope and spirit of the invention. For example, an equally effective method of calculating the average occupancy level can replace the exponentially weighted moving average method. Similarly, the random packet drop probability component in equation (3) can be replaced by any one of numerous known methods of calculating a random probability. In the embodiments of the invention, an alternative to retrieving the randomly sampled packets from the buffer is to retrieve only the information relating to the randomly sampled packets for the matching process. Further, an alternative to randomly sampling packets stored in the buffer is to identify packets at a fixed location in the buffer. Since the buffer operates on a first in first out (FIFO) basis, the movement of the packets in the buffer from the entry position to the exit position provides a random characteristic. Although the embodiments of the invention are described in the context of the Internet using TCP to transport information from source nodes to destination nodes, the RS scheme can be implemented for use in other switching networks, such as a frame relay network or an Asynchronous Transfer Mode (ATM) network, that use other transport protocols. Further, the RS scheme according to the embodiments of the invention can be implemented in software, firmware, special purpose digital logic, or any combination thereof.
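The fixed-location alternative, combined with retrieving only flow identifiers rather than whole packets, might look like the sketch below. It assumes the buffer is a FIFO of flow identifiers held in a `collections.deque`, with index 0 as the exit (oldest) end, and that the sampled positions are a hypothetical fixed tuple of indices; the specification does not state which positions are used.

```python
from collections import deque
from typing import Deque, Sequence, Tuple


def fixed_location_matches(
    fifo: Deque[str],
    arrived_flow: str,
    positions: Sequence[int] = (0, -1),
) -> Tuple[int, int]:
    """Return (matches, sampled) using packets at fixed buffer positions.

    Only flow identifiers are read. As packets move through the FIFO from
    entry to exit, the identifier found at a fixed position changes over
    time, standing in for explicit random sampling.
    """
    matches = 0
    sampled = 0
    n = len(fifo)
    for pos in positions:
        if -n <= pos < n:  # skip positions beyond the current occupancy
            sampled += 1
            if fifo[pos] == arrived_flow:
                matches += 1
    return matches, sampled


queue = deque(["a", "b", "a", "c"])  # flow IDs, exit end first
print(fixed_location_matches(queue, "a"))  # (1, 2): exit is "a", entry is "c"
```

Reading a few fixed slots avoids generating random indices per arrival, at the cost of relying on traffic mixing within the FIFO for the random characteristic.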

Claims


1. A method for controlling congestion in a communications network, the network comprising a recipient node, the recipient node comprising a buffer for storing packets received at the recipient node, the method comprising the steps of: providing a sample, the sample comprising at least one packet randomly selected from packets stored in the buffer upon receiving a packet at the recipient node, the at least one packet and the received packet each having a flow identifier and a packet size, the flow identifier being indicative of the active flow to which the packet belongs; identifying a match group for grouping the at least one packet in the sample, wherein the flow identifier of each packet in the match group matches the flow identifier of the received packet; generating a discard probability based on the sample and the match group; and discarding the received packet based on the generated discard probability to control congestion in the communications network.

2. The method as in claim 1, wherein the step of identifying the match group comprises the steps of: declaring a match when the flow identifier of the at least one packet in the sample and the received packet is the same; and incrementing a counter in response to the match being declared.

3. The method as in claim 2, wherein the step of incrementing the counter comprises the step of incrementing the counter by a factor relating to the size of the at least one packet in the sample being compared.

4. The method as in claim 1, wherein the step of generating the discard probability comprises the steps of: determining a sample quantity, the sample quantity being the size of the sample; determining a match quantity, the match quantity being the size of the match group; and generating the discard probability from the sample quantity and the match quantity.

5. The method as in claim 4, wherein the step of generating the discard probability comprises the step of calculating the discard probability in accordance with:

$$P_d = \frac{MQ_{count}}{SQ_{count}}$$

where P_d represents the discard probability, MQ_count represents the match quantity and SQ_count represents the sample quantity.

6. The method as in claim 1, wherein the step of generating the discard probability comprises the steps of: determining a sample quantity, the sample quantity being a summation of the packet size of each of the at least one packet in the sample; determining a match quantity, the match quantity being a summation of the packet size of each packet in the match group; and generating the discard probability from the sample quantity and the match quantity.

7. The method as in claim 6, wherein the step of generating the discard probability comprises the step of calculating the discard probability in accordance with:

$$P_d = \frac{MQ_{size}}{SQ_{size}}$$

where P_d represents the discard probability, MQ_size represents the match quantity and SQ_size represents the sample quantity.

8. The method as in claim 1, further comprising the steps of: calculating an average occupancy level of the buffer upon receiving the packet at the recipient node; and admitting the received packet into the buffer if the average occupancy level of the buffer is less than a first threshold.

9. The method as in claim 8, further comprising the step of discarding the received packet if the average occupancy level of the buffer is greater than a second threshold, wherein the second threshold has a value ranging from a value greater than the first threshold up to the size of the buffer.

10. The method as in claim 8, wherein the step of calculating the average occupancy level of the buffer comprises the step of calculating the average occupancy level of the buffer avg in accordance with:

$$avg = (1 - w)\,avg_{prev} + w\,q$$

where w represents a weight constant, avg_prev represents the immediate previous average occupancy level of the buffer and q represents the instantaneous occupancy level of the buffer.

11. The method as in claim 10, wherein the step of generating the discard probability comprises the steps of: determining a sample quantity, the sample quantity being the size of the sample; determining a match quantity, the match quantity being the size of the match group; and calculating the discard probability in accordance with:

where P_d represents the discard probability, Max_p represents a maximum packet drop probability, avg represents the average occupancy level of the buffer, C1_th and C2_th represent the first and second thresholds, respectively, MQ_count represents the match quantity and SQ_count represents the sample quantity.

12. The method as in claim 10, wherein the step of generating the discard probability comprises the steps of: determining a sample quantity, the sample quantity being a summation of the packet size of each of the at least one packet in the sample; determining a match quantity, the match quantity being a summation of the packet size of each packet in the match group; and calculating the discard probability in accordance with:

where P_d represents the discard probability, Max_p represents a maximum packet drop probability, avg represents the average occupancy level of the buffer, C1_th and C2_th represent the first and second thresholds, respectively, MQ_size represents the match quantity and SQ_size represents the sample quantity.

13. An apparatus for controlling congestion in a communications network, the network comprising a recipient node, the recipient node comprising a buffer for storing packets received at the recipient node, the apparatus comprising: a sample, the sample comprising at least one packet randomly selected from packets stored in the buffer upon receiving a packet at the recipient node, the at least one packet and the received packet each having a flow identifier and a packet size, the flow identifier being indicative of the active flow to which the packet belongs; means for identifying a match group for grouping the at least one packet in the sample, wherein the flow identifier of each packet in the match group matches the flow identifier of the received packet; means for generating a discard probability based on the sample and the match group; and means for discarding the received packet based on the discard probability to control congestion in the communications network.

14. The apparatus as in claim 13, wherein the means for identifying the match group comprises: means for declaring a match when the flow identifier of the at least one packet in the sample and the received packet is the same; and a counter for keeping a count in response to the match being declared.

15. The apparatus as in claim 14, wherein the counter increases the count by a factor relating to the size of the at least one packet in the sample being compared.

16. The apparatus as in claim 13, wherein the means for generating the discard probability comprises: means for determining a sample quantity, the sample quantity being the size of the sample; means for determining a match quantity, the match quantity being the size of the match group; and means for generating the discard probability from the sample quantity and the match quantity.

17. The apparatus as in claim 16, wherein the means for generating the discard probability comprises means for calculating the discard probability in accordance with:

$$P_d = \frac{MQ_{count}}{SQ_{count}}$$

where P_d represents the discard probability, MQ_count represents the match quantity and SQ_count represents the sample quantity.

18. The apparatus as in claim 13, wherein the means for generating the discard probability comprises: means for determining a sample quantity, the sample quantity being a summation of the packet size of each of the at least one packet in the sample; means for determining a match quantity, the match quantity being a summation of the packet size of each packet in the match group; and means for generating the discard probability from the sample quantity and the match quantity.

19. The apparatus as in claim 18, wherein the means for generating the discard probability comprises means for calculating the discard probability in accordance with:

$$P_d = \frac{MQ_{size}}{SQ_{size}}$$

where P_d represents the discard probability, MQ_size represents the match quantity and SQ_size represents the sample quantity.

20. The apparatus as in claim 13, further comprising: means for calculating an average occupancy level of the buffer upon receiving the packet at the recipient node; and means for admitting the received packet into the buffer if the average occupancy level of the buffer is less than a first threshold.

21. The apparatus as in claim 20, further comprising means for discarding the received packet if the average occupancy level of the buffer is greater than a second threshold, wherein the second threshold has a value ranging from a value greater than the first threshold up to the size of the buffer.

22. The apparatus as in claim 20, wherein the means for calculating the average occupancy level of the buffer comprises means for calculating the average occupancy level of the buffer avg in accordance with:

$$avg = (1 - w)\,avg_{prev} + w\,q$$

23. The apparatus as in claim 22, wherein the means for generating the discard probability comprises: means for determining a sample quantity, the sample quantity being the size of the sample; means for determining a match quantity, the match quantity being the size of the match group; and means for calculating the discard probability in accordance with:

where P_d represents the discard probability, Max_p represents a maximum packet drop probability, avg represents the average occupancy level of the buffer, C1_th and C2_th represent the first and second thresholds, respectively, MQ_count represents the match quantity and SQ_count represents the sample quantity.

24. The apparatus as in claim 22, wherein the means for generating the discard probability comprises: means for determining a sample quantity, the sample quantity being a summation of the packet size of each of the at least one packet in the sample; means for determining a match quantity, the match quantity being a summation of the packet size of each packet in the match group; and means for calculating the discard probability in accordance with:

where P_d represents the discard probability, Max_p represents a maximum packet drop probability, avg represents the average occupancy level of the buffer, C1_th and C2_th represent the first and second thresholds, respectively, MQ_size represents the match quantity and SQ_size represents the sample quantity.