Network Working Group                                          M. Allman
Request for Comments: 2581                  NASA Glenn/Sterling Software
Obsoletes: 2001                                                V. Paxson
Category: Standards Track                                   ACIRI / ICSI
                                                              W. Stevens
                                                              Consultant
                                                              April 1999
TCP Congestion Control
Status of this Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved.
Abstract
This document defines TCP's four intertwined congestion control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery. In addition, the document specifies how TCP should begin transmission after a relatively long idle period, as well as discussing various acknowledgment generation methods.
1. Introduction
This document specifies four TCP [Pos81] congestion control algorithms: slow start, congestion avoidance, fast retransmit and fast recovery. These algorithms were devised in [Jac88] and [Jac90]. Their use with TCP is standardized in [Bra89].
This document is an update of [Ste97]. In addition to specifying the congestion control algorithms, this document specifies what TCP connections should do after a relatively long idle period, as well as specifying and clarifying some of the issues pertaining to TCP ACK generation.
Note that [Ste94] provides examples of these algorithms in action and [WS95] provides an explanation of the source code for the BSD implementation of these algorithms.
This document is organized as follows. Section 2 provides various definitions which will be used throughout the document. Section 3 provides a specification of the congestion control algorithms. Section 4 outlines concerns related to the congestion control algorithms and finally, section 5 outlines security considerations.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [Bra97].
2. Definitions
This section provides the definition of several terms that will be used throughout the remainder of this document.
SEGMENT: A segment is ANY TCP/IP data or acknowledgment packet (or both).
SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the largest segment that the sender can transmit. This value can be based on the maximum transmission unit of the network, the path MTU discovery [MD90] algorithm, RMSS (see next item), or other factors. The size does not include the TCP/IP headers and options.
RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the largest segment the receiver is willing to accept. This is the value specified in the MSS option sent by the receiver during connection startup or, if the MSS option is not used, 536 bytes [Bra89]. The size does not include the TCP/IP headers and options.
FULL-SIZED SEGMENT: A segment that contains the maximum number of data bytes permitted (i.e., a segment containing SMSS bytes of data).
RECEIVER WINDOW (rwnd): The most recently advertised receiver window.
CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount of data a TCP can send. At any given time, a TCP MUST NOT send data with a sequence number higher than the sum of the highest acknowledged sequence number and the minimum of cwnd and rwnd.
INITIAL WINDOW (IW): The initial window is the size of the sender's congestion window after the three-way handshake is completed.
LOSS WINDOW (LW): The loss window is the size of the congestion window after a TCP sender detects loss using its retransmission timer.
RESTART WINDOW (RW): The restart window is the size of the congestion window after a TCP restarts transmission after an idle period (if the slow start algorithm is used; see section 4.1 for more discussion).
FLIGHT SIZE: The amount of data that has been sent but not yet acknowledged.
3. Congestion Control Algorithms
This section defines the four congestion control algorithms: slow start, congestion avoidance, fast retransmit and fast recovery, developed in [Jac88] and [Jac90]. In some situations it may be beneficial for a TCP sender to be more conservative than the algorithms allow, however a TCP MUST NOT be more aggressive than the following algorithms allow (that is, MUST NOT send data when the value of cwnd computed by the following algorithms would not allow the data to be sent).
3.1 Slow Start and Congestion Avoidance
The slow start and congestion avoidance algorithms MUST be used by a TCP sender to control the amount of outstanding data being injected into the network. To implement these algorithms, two variables are added to the TCP per-connection state. The congestion window (cwnd) is a sender-side limit on the amount of data the sender can transmit into the network before receiving an acknowledgment (ACK), while the receiver's advertised window (rwnd) is a receiver-side limit on the amount of outstanding data. The minimum of cwnd and rwnd governs data transmission.
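As an illustration of the rule that the minimum of cwnd and rwnd governs data transmission, the following Python sketch computes how many more bytes a sender could transmit. The function name usable_window and the flight_size parameter are assumptions made for this example and are not defined by this document.

```python
def usable_window(cwnd: int, rwnd: int, flight_size: int) -> int:
    """Bytes the sender may still transmit (hypothetical helper).

    All arguments are in bytes; flight_size is the amount of data
    that has been sent but not yet acknowledged.
    """
    # The effective send window is the smaller of the two limits.
    window = min(cwnd, rwnd)
    # Data already in flight counts against the window.
    return max(0, window - flight_size)
```

For example, with cwnd = 4000, rwnd = 8000 and 1000 bytes in flight, the sketch permits 3000 more bytes.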
Another state variable, the slow start threshold (ssthresh), is used to determine whether the slow start or congestion avoidance algorithm is used to control data transmission, as discussed below.
Beginning transmission into a network with unknown conditions requires TCP to slowly probe the network to determine the available capacity, in order to avoid congesting the network with an inappropriately large burst of data. The slow start algorithm is used for this purpose at the beginning of a transfer, or after repairing loss detected by the retransmission timer.
IW, the initial value of cwnd, MUST be less than or equal to 2*SMSS bytes and MUST NOT be more than 2 segments.
We note that a non-standard, experimental TCP extension allows that a TCP MAY use a larger initial window (IW), as defined in equation 1 [AFP98]:
IW = min (4*SMSS, max (2*SMSS, 4380 bytes)) (1)
With this extension, a TCP sender MAY use a 3 or 4 segment initial window, provided the combined size of the segments does not exceed 4380 bytes. We do NOT allow this change as part of the standard defined by this document. However, we include discussion of (1) in the remainder of this document as a guideline for those experimenting with the change, rather than conforming to the present standards for TCP congestion control.
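The two bounds on IW can be compared with a short Python sketch. The function name initial_window and the experimental flag are assumptions introduced for illustration only; the flag selects equation (1) from [AFP98], which is not part of this standard.

```python
def initial_window(smss: int, experimental: bool = False) -> int:
    """Upper bound on IW in bytes (illustrative helper)."""
    if experimental:
        # Equation (1) from [AFP98]; NOT part of this standard.
        return min(4 * smss, max(2 * smss, 4380))
    # Standard bound: at most 2*SMSS bytes (and at most 2 segments).
    return 2 * smss
```

With SMSS = 1460 bytes, the standard bound is 2920 bytes, while equation (1) yields 4380 bytes (three segments).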
The initial value of ssthresh MAY be arbitrarily high (for example, some implementations use the size of the advertised window), but it may be reduced in response to congestion. The slow start algorithm is used when cwnd < ssthresh, while the congestion avoidance algorithm is used when cwnd > ssthresh. When cwnd and ssthresh are equal the sender may use either slow start or congestion avoidance.
During slow start, a TCP increments cwnd by at most SMSS bytes for each ACK received that acknowledges new data. Slow start ends when cwnd exceeds ssthresh (or, optionally, when it reaches it, as noted above) or when congestion is observed.
During congestion avoidance, cwnd is incremented by 1 full-sized segment per round-trip time (RTT). Congestion avoidance continues until congestion is detected. One formula commonly used to update cwnd during congestion avoidance is given in equation 2:
cwnd += SMSS*SMSS/cwnd (2)
This adjustment is executed on every incoming non-duplicate ACK. Equation (2) provides an acceptable approximation to the underlying principle of increasing cwnd by 1 full-sized segment per RTT. (Note that for a connection in which the receiver acknowledges every data segment, (2) proves slightly more aggressive than 1 segment per RTT, and for a receiver acknowledging every-other packet, (2) is less aggressive.)
Implementation Note: Since integer arithmetic is usually used in TCP implementations, the formula given in equation 2 can fail to increase cwnd when the congestion window is very large (larger than SMSS*SMSS). If the above formula yields 0, the result SHOULD be rounded up to 1 byte.
Implementation Note: older implementations have an additional additive constant on the right-hand side of equation (2). This is incorrect and can actually lead to diminished performance [PAD+98].
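The per-ACK window update described above can be sketched in Python as follows. This is a minimal illustration, assuming cwnd is kept in bytes. The function name on_new_ack is hypothetical, and the sketch chooses congestion avoidance when cwnd equals ssthresh, which is one of the two permitted choices.

```python
def on_new_ack(cwnd: int, ssthresh: int, smss: int) -> int:
    """Return the new cwnd (bytes) after an ACK for new data."""
    if cwnd < ssthresh:
        # Slow start: grow by at most SMSS bytes per ACK.
        return cwnd + smss
    # Congestion avoidance, equation (2), in integer arithmetic.
    increment = smss * smss // cwnd
    # Round a zero increment up to 1 byte, per the implementation note.
    return cwnd + max(increment, 1)
```

With SMSS = 1460 and cwnd = 14600 (ten segments), each ACK adds 146 bytes, so ten ACKs in one RTT add roughly one full-sized segment.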
Another acceptable way to increase cwnd during congestion avoidance is to count the number of bytes that have been acknowledged by ACKs for new data. (A drawback of this implementation is that it requires maintaining an additional state variable.) When the number of bytes acknowledged reaches cwnd, then cwnd can be incremented by up to SMSS bytes. Note that during congestion avoidance, cwnd MUST NOT be increased by more than the larger of either 1 full-sized segment per RTT, or the value computed using equation 2.
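The byte-counting alternative might look like the Python sketch below. The class name ByteCountingCA and its attributes are assumptions for illustration. Carrying any excess acknowledged bytes over into the next count is a design choice of this sketch, not a requirement of this document.

```python
class ByteCountingCA:
    """Congestion avoidance by counting acknowledged bytes (sketch)."""

    def __init__(self, cwnd: int, smss: int) -> None:
        self.cwnd = cwnd
        self.smss = smss
        self.bytes_acked = 0  # the additional state variable

    def on_new_ack(self, acked: int) -> None:
        """Account for `acked` newly acknowledged bytes."""
        self.bytes_acked += acked
        if self.bytes_acked >= self.cwnd:
            # A full window has been acknowledged: roughly one RTT.
            self.bytes_acked -= self.cwnd
            self.cwnd += self.smss
```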
Implementation Note: some implementations maintain cwnd in units of bytes, while others in units of full-sized segments. The latter will find equation (2) difficult to use, and may prefer to use the counting approach discussed in the previous paragraph.
When a TCP sender detects segment loss using the retransmission timer, the value of ssthresh MUST be set to no more than the value given in equation 3:
ssthresh = max (FlightSize / 2, 2*SMSS) (3)
As discussed above, FlightSize is the amount of outstanding data in the network.
Implementation Note: an easy mistake to make is to simply use cwnd, rather than FlightSize, which in some implementations may incidentally increase well beyond rwnd.
Furthermore, upon a timeout cwnd MUST be set to no more than the loss window, LW, which equals 1 full-sized segment (regardless of the value of IW). Therefore, after retransmitting the dropped segment the TCP sender uses the slow start algorithm to increase the window from 1 full-sized segment to the new value of ssthresh, at which point congestion avoidance again takes over.
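The response to a retransmission timeout can be summarized in a few lines of Python. The function name on_retransmission_timeout is hypothetical. The sketch sets ssthresh and cwnd to the largest values permitted, although a sender may choose smaller ones.

```python
from typing import Tuple


def on_retransmission_timeout(flight_size: int, smss: int) -> Tuple[int, int]:
    """Return (ssthresh, cwnd) in bytes after the retransmission timer fires."""
    # Equation (3): based on FlightSize, not on cwnd.
    ssthresh = max(flight_size // 2, 2 * smss)
    # Loss window LW: one full-sized segment, regardless of IW.
    cwnd = smss
    return ssthresh, cwnd
```

For instance, with 20000 bytes outstanding and SMSS = 1460, ssthresh becomes 10000 and cwnd becomes 1460.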
3.2 Fast Retransmit/Fast Recovery
A TCP receiver SHOULD send an immediate duplicate ACK when an out-of-order segment arrives. The purpose of this ACK is to inform the sender that a segment was received out-of-order and which sequence number is expected. From the sender's perspective, duplicate ACKs can be caused by a number of network problems. First, they can be caused by dropped segments. In this case, all segments after the dropped segment will trigger duplicate ACKs. Second, duplicate ACKs can be caused by the re-ordering of data segments by the network (not a rare event along some network paths [Pax97]). Finally, duplicate ACKs can be caused by replication of ACK or data segments by the network. In addition, a TCP receiver SHOULD send an immediate ACK when the incoming segment fills in all or part of a gap in the sequence space. This will generate more timely information for a sender recovering from a loss through a retransmission timeout, a fast retransmit, or an experimental loss recovery algorithm, such as NewReno [FH98].
The TCP sender SHOULD use the "fast retransmit" algorithm to detect and repair loss, based on incoming duplicate ACKs. The fast retransmit algorithm uses the arrival of 3 duplicate ACKs (4 identical ACKs without the arrival of any other intervening packets) as an indication that a segment has been lost. After receiving 3 duplicate ACKs, TCP performs a retransmission of what appears to be the missing segment, without waiting for the retransmission timer to expire.
After the fast retransmit algorithm sends what appears to be the missing segment, the "fast recovery" algorithm governs the transmission of new data until a non-duplicate ACK arrives. The reason for not performing slow start is that the receipt of the duplicate ACKs not only indicates that a segment has been lost, but also that segments are most likely leaving the network (although a massive segment duplication by the network can invalidate this conclusion). In other words, since the receiver can only generate a duplicate ACK when a segment has arrived, that segment has left the network and is in the receiver's buffer, so we know it is no longer consuming network resources. Furthermore, since the ACK "clock" [Jac88] is preserved, the TCP sender can continue to transmit new segments (although transmission must continue using a reduced cwnd).
The fast retransmit and fast recovery algorithms are usually implemented together as follows.
1. When the third duplicate ACK is received, set ssthresh to no more than the value given in equation 3.
2. Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS. This artificially "inflates" the congestion window by the number of segments (three) that have left the network and which the receiver has buffered.
3. For each additional duplicate ACK received, increment cwnd by SMSS. This artificially inflates the congestion window in order to reflect the additional segment that has left the network.
4. Transmit a segment, if allowed by the new value of cwnd and the receiver's advertised window.
5. When the next ACK arrives that acknowledges new data, set cwnd to ssthresh (the value set in step 1). This is termed "deflating" the window.
This ACK should be the acknowledgment elicited by the retransmission from step 2, one RTT after the retransmission (though it may arrive sooner in the presence of significant out-of-order delivery of data segments at the receiver). Additionally, this ACK should acknowledge all the intermediate segments sent between the lost segment and the receipt of the third duplicate ACK, if none of these were lost.
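The five steps above can be modeled with a small Python state machine. This is a toy sketch under stated assumptions: the class RenoSender, its attributes, and the retransmits counter are hypothetical names, actual segment transmission (including step 4) is left to the caller, and ssthresh is set to the largest value that equation (3) permits.

```python
class RenoSender:
    """Toy model of fast retransmit / fast recovery (steps 1-5)."""

    DUP_THRESHOLD = 3

    def __init__(self, cwnd: int, ssthresh: int, smss: int) -> None:
        self.cwnd, self.ssthresh, self.smss = cwnd, ssthresh, smss
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmits = 0  # counts fast retransmissions, for illustration

    def on_dup_ack(self, flight_size: int) -> None:
        self.dup_acks += 1
        if self.dup_acks == self.DUP_THRESHOLD and not self.in_recovery:
            # Step 1: set ssthresh per equation (3).
            self.ssthresh = max(flight_size // 2, 2 * self.smss)
            # Step 2: retransmit, then inflate by the three segments
            # that have left the network.
            self.retransmits += 1
            self.cwnd = self.ssthresh + 3 * self.smss
            self.in_recovery = True
        elif self.in_recovery:
            # Step 3: each further duplicate ACK inflates cwnd by SMSS.
            self.cwnd += self.smss
        # Step 4 (sending new data if cwnd and rwnd allow) is the caller's job.

    def on_new_ack(self) -> None:
        if self.in_recovery:
            # Step 5: deflate the window.
            self.cwnd = self.ssthresh
            self.in_recovery = False
        self.dup_acks = 0
```

Starting from cwnd = 14600 with 14600 bytes in flight and SMSS = 1460, three duplicate ACKs give ssthresh = 7300 and cwnd = 11680; the ACK for new data then deflates cwnd to 7300.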
Note: This algorithm is known to generally not recover very efficiently from multiple losses in a single flight of packets [FF96]. One proposed set of modifications to address this problem can be found in [FH98].
4. Additional Considerations
4.1 Re-starting Idle Connections
A known problem with the TCP congestion control algorithms described above is that they allow a potentially inappropriate burst of traffic to be transmitted after TCP has been idle for a relatively long period of time. After an idle period, TCP cannot use the ACK clock to strobe new segments into the network, as all the ACKs have drained from the network. Therefore, as specified above, TCP can potentially send a cwnd-size line-rate burst into the network after an idle period.
[Jac88] recommends that a TCP use slow start to restart transmission after a relatively long idle period. Slow start serves to restart the ACK clock, just as it does at the beginning of a transfer. This mechanism has been widely deployed in the following manner. When TCP has not received a segment for more than one retransmission timeout, cwnd is reduced to the value of the restart window (RW) before transmission begins.
For the purposes of this standard, we define RW = IW.
We note that the non-standard experimental extension to TCP defined in [AFP98] defines RW = min(IW, cwnd), with the definition of IW adjusted per equation (1) above.
Using the last time a segment was received to determine whether or not to decrease cwnd fails to deflate cwnd in the common case of persistent HTTP connections [HTH98]. In this case, a WWW server receives a request before transmitting data to the WWW browser. The reception of the request makes the test for an idle connection fail, and allows the TCP to begin transmission with a possibly inappropriately large cwnd.
Therefore, a TCP SHOULD set cwnd to no more than RW before beginning transmission if the TCP has not sent data in an interval exceeding the retransmission timeout.
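The idle-restart rule can be sketched in Python as below. The function name cwnd_after_idle and its parameters are assumptions for illustration. The sketch takes RW = IW at its upper bound of 2*SMSS and measures idle time from the last transmission of data, as the SHOULD above requires.

```python
def cwnd_after_idle(cwnd: int, idle_time: float, rto: float, smss: int) -> int:
    """Return the cwnd (bytes) to use when sending resumes.

    idle_time is the time in seconds since data was last SENT;
    rto is the current retransmission timeout in seconds.
    """
    # For this standard RW = IW; the sketch uses the 2*SMSS bound on IW.
    restart_window = 2 * smss
    if idle_time > rto:
        # cwnd must be no more than RW; it is never raised here.
        return min(cwnd, restart_window)
    return cwnd
```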
4.2 Generating Acknowledgments
The delayed ACK algorithm specified in [Bra89] SHOULD be used by a TCP receiver. When used, a TCP receiver MUST NOT excessively delay acknowledgments. Specifically, an ACK SHOULD be generated for at least every second full-sized segment, and MUST be generated within 500 ms of the arrival of the first unacknowledged packet.
The requirement that an ACK "SHOULD" be generated for at least every second full-sized segment is listed in [Bra89] in one place as a SHOULD and another as a MUST. Here we unambiguously state it is a SHOULD. We also emphasize that this is a SHOULD, meaning that an implementor should indeed only deviate from this requirement after careful consideration of the implications. See the discussion of "Stretch ACK violation" in [PAD+98] and the references therein for a discussion of the possible performance problems with generating ACKs less frequently than every second full-sized segment.
In some cases, the sender and receiver may not agree on what constitutes a full-sized segment. An implementation is deemed to comply with this requirement if it sends at least one acknowledgment every time it receives 2*RMSS bytes of new data from the sender, where RMSS is the Maximum Segment Size specified by the receiver to the sender (or the default value of 536 bytes, per [Bra89], if the receiver does not specify an MSS option during connection establishment). The sender may be forced to use a segment size less than RMSS due to the maximum transmission unit (MTU), the path MTU discovery algorithm or other factors. For instance, consider the case when the receiver announces an RMSS of X bytes but the sender ends up using a segment size of Y bytes (Y < X) due to path MTU discovery (or the sender's MTU size). The receiver will generate stretch ACKs if it waits for 2*X bytes to arrive before an ACK is sent. Clearly this will take more than 2 segments of size Y bytes. Therefore, while a specific algorithm is not defined, it is desirable for receivers to attempt to prevent this situation, for example by acknowledging at least every second segment, regardless of size. Finally, we repeat that an ACK MUST NOT be delayed for more than 500 ms waiting on a second full-sized segment to arrive.
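One way a receiver might follow the suggestion of acknowledging at least every second segment, regardless of size, is sketched below in Python. The class DelayedAckReceiver and its methods are hypothetical, and the sketch handles only in-order data. The timer check uses the 500 ms bound itself, so a practical implementation would presumably choose a shorter delay.

```python
class DelayedAckReceiver:
    """Decides when to ACK in-order data (illustrative sketch)."""

    MAX_DELAY = 0.5  # seconds; the 500 ms upper bound

    def __init__(self) -> None:
        self.unacked_segments = 0
        self.first_unacked_at = None  # arrival time of oldest unACKed segment

    def on_segment(self, now: float) -> bool:
        """Record an in-order data segment; return True to ACK at once."""
        if self.unacked_segments == 0:
            self.first_unacked_at = now
        self.unacked_segments += 1
        # ACK at least every second segment, regardless of its size.
        if self.unacked_segments >= 2:
            self._reset()
            return True
        return False

    def on_timer(self, now: float) -> bool:
        """Return True if the delayed-ACK deadline has been reached."""
        if self.unacked_segments and now - self.first_unacked_at >= self.MAX_DELAY:
            self._reset()
            return True
        return False

    def _reset(self) -> None:
        self.unacked_segments = 0
        self.first_unacked_at = None
```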
Out-of-order data segments SHOULD be acknowledged immediately, in order to accelerate loss recovery. To trigger the fast retransmit algorithm, the receiver SHOULD send an immediate duplicate ACK when it receives a data segment above a gap in the sequence space. To provide feedback to senders recovering from losses, the receiver SHOULD send an immediate ACK when it receives a data segment that fills in all or part of a gap in the sequence space.
A TCP receiver MUST NOT generate more than one ACK for every incoming segment, other than to update the offered window as the receiving application consumes new data [page 42, Pos81][Cla82].
4.3 Loss Recovery Mechanisms
A number of loss recovery algorithms that augment fast retransmit and fast recovery have been suggested by TCP researchers. While some of these algorithms are based on the TCP selective acknowledgment (SACK) option [MMFR96], such as [FF96,MM96a,MM96b], others do not require SACKs [Hoe96,FF96,FH98]. The non-SACK algorithms use "partial acknowledgments" (ACKs which cover new data, but not all the data outstanding when loss was detected) to trigger retransmissions. While this document does not standardize any of the specific algorithms that may improve fast retransmit/fast recovery, these enhanced algorithms are implicitly allowed, as long as they follow the general principles of the basic four algorithms outlined above.
Therefore, when the first loss in a window of data is detected, ssthresh MUST be set to no more than the value given by equation (3). Second, until all lost segments in the window of data in question are repaired, the number of segments transmitted in each RTT MUST be no more than half the number of outstanding segments when the loss was detected. Finally, after all loss in the given window of segments has been successfully retransmitted, cwnd MUST be set to no more than ssthresh and congestion avoidance MUST be used to further increase cwnd. Loss in two successive windows of data, or the loss of a retransmission, should be taken as two indications of congestion and, therefore, cwnd (and ssthresh) MUST be lowered twice in this case.
The algorithms outlined in [Hoe96,FF96,MM96a,MM96b] follow the principles of the basic four congestion control algorithms outlined in this document.
5. Security Considerations
This document requires a TCP to diminish its sending rate in the presence of retransmission timeouts and the arrival of duplicate acknowledgments. An attacker can therefore impair the performance of a TCP connection by either causing data packets or their acknowledgments to be lost, or by forging excessive duplicate acknowledgments. Causing two congestion control events back-to-back will often cut ssthresh to its minimum value of 2*SMSS, causing the connection to immediately enter the slower-performing congestion avoidance phase.
The Internet to a considerable degree relies on the correct implementation of these algorithms in order to preserve network stability and avoid congestion collapse. An attacker could cause TCP endpoints to respond more aggressively in the face of congestion by forging excessive duplicate acknowledgments or excessive acknowledgments for new data. Conceivably, such an attack could drive a portion of the network into congestion collapse.
6. Changes Relative to RFC 2001
This document has been extensively rewritten editorially and it is not feasible to itemize the list of changes between the two documents. The intention of this document is not to change any of the recommendations given in RFC 2001, but to further clarify cases that were not discussed in detail in RFC 2001. Specifically, this document suggests what TCP connections should do after a relatively long idle period, as well as specifying and clarifying some of the issues pertaining to TCP ACK generation. Finally, the allowable upper bound for the initial congestion window has also been raised from one to two segments.
Acknowledgments
The four algorithms that are described were developed by Van Jacobson.
Some of the text from this document is taken from "TCP/IP Illustrated, Volume 1: The Protocols" by W. Richard Stevens (Addison-Wesley, 1994) and "TCP/IP Illustrated, Volume 2: The Implementation" by Gary R. Wright and W. Richard Stevens (Addison-Wesley, 1995). This material is used with the permission of Addison-Wesley.
Neal Cardwell, Sally Floyd, Craig Partridge and Joe Touch contributed a number of helpful suggestions.
References
[AFP98] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's Initial Window Size", RFC 2414, September 1998.
[Bra89] Braden, R., "Requirements for Internet Hosts -- Communication Layers", STD 3, RFC 1122, October 1989.
[Bra97] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[Cla82] Clark, D., "Window and Acknowledgment Strategy in TCP", RFC 813, July 1982.
[FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of Tahoe, Reno and SACK TCP", Computer Communication Review, July 1996. ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.
[FH98] Floyd, S. and T. Henderson, "The NewReno Modification to TCP's Fast Recovery Algorithm", RFC 2582, April 1999.
[Flo94] Floyd, S., "TCP and Successive Fast Retransmits", Technical report, October 1994. ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.
[Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion Control Scheme for TCP", In ACM SIGCOMM, August 1996.
[HTH98] Hughes, A., Touch, J. and J. Heidemann, "Issues in TCP Slow-Start Restart After Idle", Work in Progress.
[Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
[Jac90] Jacobson, V., "Modified TCP Congestion Avoidance Algorithm", end2end-interest mailing list, April 30, 1990. ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail.
[MD90] Mogul, J. and S. Deering, "Path MTU Discovery", RFC 1191, November 1990.
[MM96a] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: Refining TCP Congestion Control", Proceedings of SIGCOMM'96, August, 1996, Stanford, CA. Available from http://www.psc.edu/networking/papers/papers.html
[MM96b] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding Parameters", Technical report. Available from http://www.psc.edu/networking/papers/FACKnotes/current.
[MMFR96] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP Selective Acknowledgement Options", RFC 2018, October 1996.
[PAD+98] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, J., Heavens, I., Lahey, K., Semke, J. and B. Volz, "Known TCP Implementation Problems", RFC 2525, March 1999.
[Pax97] Paxson, V., "End-to-End Internet Packet Dynamics", Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997.
[Pos81] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.
[Ste94] Stevens, W., "TCP/IP Illustrated, Volume 1: The Protocols", Addison-Wesley, 1994.
[Ste97] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.
[WS95] Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2: The Implementation", Addison-Wesley, 1995.
Authors' Addresses
Mark Allman
NASA Glenn Research Center/Sterling Software
Lewis Field
21000 Brookpark Rd. MS 54-2
Cleveland, OH 44135
216-433-6586
EMail: mallman@grc.nasa.gov
http://roland.grc.nasa.gov/~mallman
Vern Paxson
ACIRI / ICSI
1947 Center Street
Suite 600
Berkeley, CA 94704-1198
Phone: +1 510/642-4274 x302
EMail: vern@aciri.org
W. Richard Stevens
1202 E. Paseo del Zorro
Tucson, AZ 85718
520-297-9416
EMail: rstevens@kohala.com
http://www.kohala.com/~rstevens
Full Copyright Statement
Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.