Internet Engineering Task Force (IETF)                        K. Nichols
Request for Comments: 8289                                 Pollere, Inc.
Category: Experimental                                       V. Jacobson
ISSN: 2070-1721                                         A. McGregor, Ed.
                                                         J. Iyengar, Ed.
                                                                  Google
                                                            January 2018
        

Controlled Delay Active Queue Management

Abstract

This document describes CoDel (Controlled Delay) -- a general framework that controls bufferbloat-generated excess delay in modern networking environments. CoDel consists of an estimator, a setpoint, and a control loop. It requires no configuration in normal Internet deployments.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for examination, experimental implementation, and evaluation.

This document defines an Experimental Protocol for the Internet community. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8289.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Terms Used in This Document
   3.  Understanding the Building Blocks of Queue Management
     3.1.  Estimator
     3.2.  Target Setpoint
     3.3.  Control Loop
   4.  Overview of the CoDel AQM
     4.1.  Non-starvation
     4.2.  Setting INTERVAL
     4.3.  Setting TARGET
     4.4.  Use with Multiple Queues
     4.5.  Setting Up CoDel
   5.  Annotated Pseudocode for CoDel AQM
     5.1.  Data Types
     5.2.  Per-Queue State (codel_queue_t Instance Variables)
     5.3.  Constants
     5.4.  Enqueue Routine
     5.5.  Dequeue Routine
     5.6.  Helper Routines
     5.7.  Implementation Considerations
   6.  Further Experimentation
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Applying CoDel in the Data Center
   Acknowledgments
   Authors' Addresses
        
1. Introduction

The "persistently full buffer" problem has been discussed in the IETF community since the early 80s [RFC896]. The IRTF's End-to-End Research Group called for the deployment of Active Queue Management (AQM) to solve the problem in 1998 [RFC2309]. Despite this awareness, the problem has only gotten worse as growth in memory density per Moore's Law fueled an exponential increase in buffer pool size. Efforts to deploy AQM have been frustrated by difficult configuration and negative impact on network utilization. This "bufferbloat" problem [BLOAT] has become increasingly important throughout the Internet but particularly at the consumer edge. Queue management has become more critical due to increased consumer use of the Internet, mixing large video transactions with time-critical VoIP and gaming.

An effective AQM remediates bufferbloat at a bottleneck while "doing no harm" at hops where buffers are not bloated. However, the development and deployment of AQM are frequently subject to misconceptions about the cause of packet queues in network buffers. Network buffers exist to absorb the packet bursts that occur naturally in statistically multiplexed networks. Buffers helpfully absorb the queues created by reasonable packet network behavior such as short-term mismatches in traffic arrival and departure rates that arise from upstream resource contention, transport conversation startup transients, and/or changes in the number of conversations sharing a link. Unfortunately, other less useful network behaviors can cause queues to fill, and their effects are not nearly as benign. Discussion of these issues and the reason why the solution is not simply "smaller buffers" can be found in [RFC2309], [VANQ2006], [REDL1998], and [CODEL2012]. To understand queue management, it is critical to understand the difference between the necessary, useful "good" queue and the counterproductive "bad" queue.

Several approaches to AQM have been developed over the past two decades, but none have been widely deployed due to performance problems. When designed with the wrong conceptual model for queues, AQMs have limited operational range, require a lot of configuration tweaking, and frequently impair rather than improve performance. Learning from this past history, the CoDel approach is designed to meet the following goals:

o Make AQM parameterless for normal operation, with no knobs for operators, users, or implementers to adjust.

o Be able to distinguish "good" queue from "bad" queue and treat them differently, that is, keep delay low while permitting necessary bursts of traffic.

o Control delay while insensitive (or nearly so) to round-trip delays, link rates, and traffic loads; this goal is to "do no harm" to network traffic while controlling delay.

o Adapt to dynamically changing link rates with no negative impact on utilization.

o Allow simple and efficient implementation (can easily span the spectrum from low-end access points and home routers up to high-end router hardware).

CoDel has five major differences from prior AQMs: use of the local queue minimum to track congestion ("bad" queue), use of an efficient single state variable representation of that tracked statistic, use of packet sojourn time as the observed datum (rather than packets, bytes, or rates), use of a mathematically determined setpoint derived from maximizing network power [KLEIN81], and use of a modern state-space controller.

CoDel configures itself based on a round-trip time metric that can be set to 100 ms for the normal, terrestrial Internet. With no changes to parameters, CoDel is expected to work across a wide range of conditions, with varying links and the full range of terrestrial round-trip times.

CoDel is easily adapted to multiple queue systems as shown by [RFC8290]. Implementers and users SHOULD use the fq_codel multiple-queue approach as it deals with many problems beyond the reach of an AQM on a single queue.

CoDel was first published in [CODEL2012] and has been implemented in the Linux kernel.

Note that while this document refers to dropping packets when indicated by CoDel, it may be reasonable to ECN-mark packets instead.

2. Conventions and Terms Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

The following terms are used in this document and are defined as follows:

sojourn time: the amount of time a packet has spent in a particular buffer, i.e., the time a packet departs the buffer minus the time the packet arrived at the buffer. A packet can depart a buffer via transmission or drop.

standing queue: a queue (in packets, bytes, or time delay) in a buffer that persists for a "long" time, where "long" is on the order of the longer round-trip times through the buffer, as discussed in Section 4.2. A standing queue occurs when the minimum queue over the "long" time is non-zero and is usually tolerable and even desirable as long as it does not exceed some target delay.

bottleneck bandwidth: the limiting bandwidth along a network path.

3. Understanding the Building Blocks of Queue Management

At the heart of queue management is the notion of "good" queue and "bad" queue and the search for ways to get rid of the "bad" queue (which only adds delay) while preserving the "good" queue (which provides for good utilization). This section explains queueing, both good and bad, and covers the CoDel building blocks that can be used to manage packet buffers to keep their queues in the "good" range.

Packet queues form in buffers facing bottleneck links, i.e., where the line rate goes from high to low or where many links converge. The well-known bandwidth-delay product (sometimes called "pipe size") is the bottleneck's bandwidth multiplied by the sender-receiver-sender round-trip delay; it is the amount of data that has to be in transit between two hosts in order to run the bottleneck link at 100% utilization. To explore how queues can form, consider a long-lived TCP connection with a 25-packet window sending through a connection with a bandwidth-delay product of 20 packets. After an initial burst of packets, the connection will settle into a 5-packet (+/-1) standing queue; this standing queue size is determined by the mismatch between the window size and the pipe size and is unrelated to the connection's sending rate. The connection has 25 packets in flight at all times, but only 20 packets arrive at the destination over a round-trip time. If the TCP connection has a 30-packet window, the queue will be 10 packets with no change in sending rate. Similarly, if the window is 20 packets, there will be no queue, but the sending rate is the same. Nothing can be inferred about the sending rate from the queue size, and any queue other than transient bursts only creates delays in the network. The sender needs to reduce the number of packets in flight rather than the sending rate.

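The arithmetic of this example can be captured in a short sketch (illustrative only; the function and variable names below are not part of CoDel):

   // Illustrative only: the standing queue is the window's excess over
   // the pipe size (bandwidth-delay product) and says nothing about the
   // sending rate.
   #include <cstdint>

   uint32_t standing_queue(uint32_t window_pkts, uint32_t pipe_pkts)
   {
       return (window_pkts > pipe_pkts) ? window_pkts - pipe_pkts : 0;
   }

   // standing_queue(25, 20) == 5, standing_queue(30, 20) == 10, and
   // standing_queue(20, 20) == 0, yet the bottleneck delivers 20 packets
   // per round-trip time in every case.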

In the above example, the 5-packet standing queue can be seen to contribute nothing but delay to the connection and thus is clearly "bad" queue. If, in our example, there is a single bottleneck link and it is much slower than the link that feeds it (say, a high-speed Ethernet link into a limited DSL uplink), then a 20-packet buffer at the bottleneck might be necessary to temporarily hold the 20 packets in flight to keep the bottleneck link's utilization high. The burst of packets should drain completely (to 0 or 1 packets) within a round-trip time, and this transient queue is "good" queue because it allows the connection to keep the 20 packets in flight and the bottleneck link to be fully utilized. In terms of the delay experienced, the "good" queue goes away in about a round-trip time, while "bad" queue hangs around for longer, causing delays.

Effective queue management detects "bad" queue while ignoring "good" queue and takes action to get rid of the "bad" queue when it is detected. The goal is a queue controller that accomplishes this objective. To control a queue, we need three basic components:

o Estimator - To figure out what we've got.

o Target setpoint - To know what we want.

o Control loop - If what we've got isn't what we want, we need a way to move it there.

3.1. Estimator

The estimator both observes the queue and detects when "good" queue turns to "bad" queue and vice versa. CoDel has two parts to its estimator: what is observed as an indicator of queue and how the observations are used to detect "good"/"bad" queue.

Queue length has been widely used as an observed indicator of congestion and is frequently conflated with sending rate. Use of queue length as a metric is sensitive to how and when the length is observed. A high-speed arrival link to a buffer serviced at a much lower rate can rapidly build up a queue that might disperse completely or down to a single packet before a round-trip time has elapsed. If the queue length is monitored at packet arrival (as in original Random Early Detection (RED)) or departure time, every packet will see a queue with one possible exception. If the queue length itself is time sampled (as recommended in [REDL1998]), a truer picture of the queue's occupancy can be gained at the expense of considerable implementation complexity.

The use of queue length is further complicated in networks that are subject to both short- and long-term changes in available link rate (as in WiFi). Link rate drops can result in a spike in queue length that should be ignored unless it persists. It is not the queue length that should be controlled but the amount of excess delay packets experience due to a persistent or standing queue, which means that the packet sojourn time in the buffer is exactly what we want to track. Tracking the packet sojourn times in the buffer observes the actual delay experienced by each packet. Sojourn time allows queue management to be independent of link rate, gives superior performance to use of buffer size, and is directly related to user-visible performance. It works regardless of line rate changes or link sharing by multiple queues (which the individual queues may experience as changing rates).

Consider a link shared by two queues with different priorities. Packets that arrive at the high-priority queue are sent as soon as the link is available, while packets in the other queue have to wait until the high-priority queue is empty (i.e., a strict priority scheduler). The number of packets in the high-priority queue might be large, but the queue is emptied quickly, and the amount of time each packet spends enqueued (the sojourn time) is not large. The other queue might have a smaller number of packets, but packet sojourn times will include the waiting time for the high-priority packets to be sent. This makes the sojourn time a good sample of the congestion that each separate queue is experiencing. This example also shows how the metric of sojourn time is independent of the number of queues or the service discipline used and is instead indicative of congestion seen by the individual queues.

How can observed sojourn time be used to separate "good" queue from "bad" queue? Although averages, especially of queue length, have previously been widely used as an indicator of "bad" queue, their efficacy is questionable. Consider the burst that disperses every round-trip time. The average queue will be one-half the burst size, though this might vary depending on when the average is computed and the timing of arrivals. The average queue sojourn time would be one-half the time it takes to clear the burst. The average then would indicate a persistent queue where there is none. Instead of averages, we recommend tracking the minimum sojourn time; then, if there is one packet that has a zero sojourn time, there is no persistent queue.

A persistent queue can be detected by tracking the (local) minimum queue delay packets experience. To ensure that this minimum value does not become stale, it has to have been experienced recently, i.e., during an appropriate past time interval. This interval is the maximum amount of time a minimum value is considered to be in effect and is related to the amount of time it takes for the largest expected burst to drain. Conservatively, this interval SHOULD be at least a round-trip time to avoid falsely detecting a persistent queue and not a lot more than a round-trip time to avoid delay in detecting the persistent queue. This suggests that the appropriate interval value is the maximum round-trip time of all the connections sharing the buffer.

Note that the following key insight makes computation of the local minimum efficient: it is sufficient to keep a single state variable that indicates how long the minimum has been above or below the target value rather than retaining all the local values to compute the minimum, which leads to both storage and computational savings. We use this insight in the pseudocode for CoDel later in the document.

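A minimal sketch of this idea follows, using the TARGET and INTERVAL constants and the first_above_time_ state variable that appear in the pseudocode of Section 5 (the helper name is illustrative and is not part of the normative pseudocode):

   // Illustrative only: a single state variable replaces a window of
   // sojourn-time samples.  first_above_time_ records when the sojourn
   // time will have been above TARGET for a full INTERVAL; it is reset
   // whenever a sojourn time below TARGET is seen.
   flag_t codel_queue_t::persistent_queue(time_t sojourn_time, time_t now)
   {
       if (sojourn_time < TARGET) {
           first_above_time_ = 0;              // minimum dipped below TARGET
           return false;
       }
       if (first_above_time_ == 0) {
           first_above_time_ = now + INTERVAL; // first time above; start the clock
           return false;
       }
       return now >= first_above_time_;        // above TARGET for a full INTERVAL
   }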

These two parts, use of sojourn time as the observed value and the local minimum as the statistic to monitor queue congestion, are key to CoDel's estimator building block. The local minimum sojourn time provides an accurate and robust measure of standing queue and has an efficient implementation. In addition, use of the minimum sojourn time has important advantages in implementation. The minimum packet sojourn can only be decreased when a packet is dequeued, which means that all the work of CoDel can take place when packets are dequeued for transmission and that no locks are needed in the implementation. The minimum is the only statistic with this property.

A more detailed explanation with many pictures can be found in [TSV84].

3.2. Target Setpoint

Now that we have a robust way of detecting standing queue, we need a target setpoint that tells us when to act. If the controller is set to take action as soon as the estimator has a non-zero value, the average drop rate will be maximized, which minimizes TCP goodput [MACTCP1997]. Also, this policy results in no backlog over time (no persistent queue), which negates much of the value of having a buffer, since it maximizes the bottleneck link bandwidth lost due to normal stochastic variation in packet interarrival time. We want a target that maximizes utilization while minimizing delay. Early in the history of packet networking, Kleinrock developed the analytic machinery to do this using a quantity he called "power", which is the ratio of a normalized throughput to a normalized delay [KLEIN81].

It is straightforward to derive an analytic expression for the average goodput of a TCP conversation at a given round-trip time r and target f (where f is expressed as a fraction of r). Reno TCP, for example, yields:

   goodput = r (3 + 6f - f^2) / (4 (1+f))
        

Since the peak queue delay is simply the product of f and r, power is solely a function of f since the r's in the numerator and denominator cancel:

   power is proportional to (1 + 2f - 1/3 f^2) / (1 + f)^2
        

As Kleinrock observed, the best operating point (in terms of bandwidth/delay trade-off) is the peak power point, since points off the peak represent a higher cost (in delay) per unit of bandwidth. The power vs. f curve for any Additive Increase Multiplicative Decrease (AIMD) TCP is monotone decreasing. But the curve is very flat for f < 0.1, followed by an increasing curvature with a knee around f = 0.2, then a steep, almost linear fall off [TSV84]. Since the previous equation showed that goodput is monotone increasing with f, the best operating point is near the right edge of the flat top, since that represents the highest goodput achievable for a negligible increase in delay. However, since the r in the model is a conservative upper bound, a target of 0.1r runs the risk of pushing shorter RTT connections over the knee and giving them higher delay for no significant goodput increase. Generally, a more conservative target of 0.05r offers a good utilization vs. delay trade-off while giving enough headroom to work well with a large variation in real RTT.

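To see the shape of this curve numerically, the proportional expression above can be evaluated directly (a standalone sketch; the constant of proportionality is omitted):

   // Illustrative only: print the relative power at a few values of f to
   // show the flat top for small f and the fall-off beyond the knee.
   #include <cstdio>

   int main()
   {
       for (int i = 1; i <= 8; ++i) {
           double f = 0.05 * i;
           double power = (1.0 + 2.0*f - f*f/3.0) / ((1.0 + f) * (1.0 + f));
           std::printf("f = %.2f   relative power = %.3f\n", f, power);
       }
       return 0;
   }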

As the above analysis shows, a very small standing queue gives close to 100% utilization of the bottleneck link. While this result was for Reno TCP, the derivation uses only properties that must hold for any "TCP friendly" transport. We have verified by both analysis and simulation that this result holds for Reno, Cubic, and Westwood [TSV84]. This results in a particularly simple form for the target: the ideal range for the permitted standing queue, or the target setpoint, is between 5% and 10% of the TCP connection's RTT.

We used simulation to explore the impact when TCPs are mixed with other traffic and with connections of different RTTs. Accordingly, we experimented extensively with values in the 5-10% of RTT range and, overall, used target values between 1 and 20 milliseconds for RTTs from 30 to 500 ms and link bandwidths of 64 Kbps to 100 Mbps to experimentally explore a value for the target that gives consistently high utilization while controlling delay across a range of bandwidths, RTTs, and traffic loads. Our results were notably consistent with the mathematics above.

A congested (but not overloaded) CoDel link with traffic composed solely or primarily of long-lived TCP flows will have a median delay through the link that will tend to the target. For bursty traffic loads and for overloaded conditions (where it is difficult or impossible for all the arriving flows to be accommodated), the median queues will be longer than the target.

The non-starvation drop inhibit feature dominates where the link rate becomes very small. By inhibiting drops when there is less than an (outbound link) MTU worth of bytes in the buffer, CoDel adapts to very low bandwidth links, as shown in [CODEL2012].

3.3. Control Loop

Section 3.1 describes a simple, reliable way to measure "bad" (persistent) queue. Section 3.2 shows that TCP congestion control dynamics gives rise to a target setpoint for this measure that's a provably good balance between enhancing throughput and minimizing delay. Section 3.2 also shows that this target is a constant fraction of the same "largest average RTT" interval used to distinguish persistent from transient queue. The only remaining building block needed for a basic AQM is a "control loop" algorithm to effectively drive the queueing system from any "persistent queue above the target" state to a state where the persistent queue is below the target.

Control theory provides a wealth of approaches to the design of control loops. Most of classical control theory deals with the control of linear, time-invariant, Single-Input-Single-Output (SISO) systems. Control loops for these systems generally come from a well-understood class known as Proportional-Integral-Derivative (PID) controllers. Unfortunately, a queue is not a linear system, and an AQM operates at the point of maximum non-linearity (where the output link bandwidth saturates, so increased demand creates delay rather than higher utilization). Output queues are also not time invariant since traffic is generally a mix of connections that start and stop at arbitrary times and that can have radically different behaviors ranging from "open-loop" UDP audio/video to "closed-loop" congestion-avoiding TCP. Finally, the constantly changing mix of connections (which can't be converted to a single "lumped parameter" model because of their transfer function differences) makes the system Multi-Input-Multi-Output (MIMO), not SISO.

Since queueing systems match none of the prerequisites for a classical controller, a better approach is a modern state-space controller with "no persistent queue" and "has persistent queue" states. Since Internet traffic mixtures change rapidly and unpredictably, a noise- and error-tolerant adaptation algorithm like stochastic gradient is a good choice. Since there's essentially no information in the amount of persistent queue [TSV84], the adaptation should be driven by how long it has persisted.

Consider the two extremes of traffic behavior: a single, open-loop UDP video stream and a single, long-lived TCP bulk data transfer. If the average bandwidth of the UDP video stream is greater than the bottleneck link rate, the link's queue will grow, and the controller will eventually enter "has persistent queue" state and start dropping packets. Since the video stream is open loop, its arrival rate is unaffected by drops, so the queue will persist until the average drop rate is greater than the output bandwidth deficit (= average arrival rate - average departure rate); the job of the adaptation algorithm is to discover this rate. For this example, the adaptation could consist of simply estimating the arrival and departure rates and then dropping at a rate slightly greater than their difference, but this class of algorithm won't work at all for the bulk data TCP stream. TCP runs in closed-loop flow balance [TSV84], so its arrival rate is almost always exactly equal to the departure rate -- the queue isn't the result of a rate imbalance but rather a mismatch between the TCP sender's window and the source-destination-source round-trip path capacity (i.e., the connection's bandwidth-delay product). The sender's TCP congestion avoidance algorithm will slowly increase the send window (one packet per round-trip time) [RFC5681], which will eventually cause the bottleneck to enter "has persistent queue" state. But, since the average input rate is the same as the average output rate, the rate deficit estimation that gave the correct drop rate for the video stream would compute a drop rate of zero for the TCP stream. However, if the output link drops one packet as it enters "has persistent queue" state, when the sender discovers this (via TCP's normal packet loss repair mechanisms), it will reduce its window by a factor of two [RFC5681]; so, one round-trip time after the drop, the persistent queue will go away.

If there were N TCP conversations sharing the bottleneck, the controller would have to drop O(N) packets (one from each conversation) to make all the conversations reduce their window to get rid of the persistent queue. If the traffic mix consists of short (<= bandwidth-delay product) conversations, the aggregate behavior becomes more like the open-loop video example since each conversation is likely to have already sent all its packets by the time it learns about a drop so each drop has negligible effect on subsequent traffic.

The controller does not know the number, duration, or kind of conversations creating its queue, so it has to learn the appropriate response. Since single drops can have a large effect if the degree of multiplexing (the number of active conversations) is small, dropping at too high a rate is likely to have a catastrophic effect on throughput. Dropping at a low rate (< 1 packet per round-trip time) and then increasing the drop rate slowly until the persistent queue goes below the target is unlikely to overdrop and is guaranteed to eventually dissipate the persistent queue. This stochastic gradient learning procedure is the core of CoDel's control loop (the gradient exists because a drop always reduces the (instantaneous) queue, so an increasing drop rate always moves the system "down" toward no persistent queue, regardless of traffic mix).

The "next drop time" is decreased in inverse proportion to the square root of the number of drops since the drop state was entered, using the well-known non-linear relationship of drop rate to throughput to get a linear change in throughput [REDL1998][MACTCP1997].

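In the pseudocode of Section 5, this relationship is implemented by a control_law() helper of roughly the following form (a sketch; see the helper routines in Section 5.6):

   // The next drop time decreases in inverse proportion to the square
   // root of the number of drops since the drop state was entered.
   time_t codel_queue_t::control_law(time_t t, uint32_t count)
   {
       return t + INTERVAL / sqrt(count);
   }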

Since the best rate to start dropping is at slightly more than one packet per RTT, the controller's initial drop rate can be directly derived from the estimator's interval. When the minimum sojourn time first crosses the target and CoDel drops a packet, the earliest the controller could see the effect of the drop is the round-trip time (interval) + the local queue wait time (the target). If the next drop happens any earlier than this time (interval + target), CoDel will overdrop. In practice, the local queue waiting time tends to vary, so making the initial drop spacing (i.e., the time to the second drop) be exactly the minimum possible also leads to overdropping. Analysis of simulation and real-world measured data shows that the 75th percentile magnitude of this variation is less than the target, so the initial drop spacing SHOULD be set to the estimator's interval (i.e., initial drop spacing = interval) to ensure that the controller has accounted for acceptable congestion delays.

Use of the minimum statistic lets the controller be placed in the dequeue routine with the estimator. This means that the control signal (the drop) can be sent at the first sign of "bad" queue (as indicated by the sojourn time) and that the controller can stop acting as soon as the sojourn time falls below the target. Dropping at dequeue has both implementation and control advantages.

4. Overview of the CoDel AQM

CoDel was initially designed as a bufferbloat solution for the consumer network edge. The CoDel building blocks are able to adapt to different or time-varying link rates, be easily used with multiple queues, have excellent utilization with low delay, and have a simple and efficient implementation.

The CoDel algorithm described in the rest of this document uses two key variables: TARGET, which is the controller's target setpoint described in Section 3.2, and INTERVAL, which is the estimator's interval described in Section 3.3.

The only setting CoDel requires is the INTERVAL value, and as 100 ms satisfies that definition for normal Internet usage, CoDel can be parameter-free for consumer use. To ensure that link utilization is not adversely affected, CoDel's estimator sets TARGET to one that optimizes power. CoDel's controller does not drop packets when the drop would leave the queue empty or with fewer than a Maximum Transmission Unit (MTU) worth of bytes in the buffer. Section 3.2 shows that an ideal TARGET is 5-10% of the connection round-trip time (RTT). In the open terrestrial-based Internet, especially at the consumer edge, we expect most unbloated RTTs to have a ceiling of 100 ms [CHARB2007]. Using this RTT gives a minimum TARGET of 5 ms and INTERVAL of 100 ms. In practice, uncongested links will see sojourn times below TARGET more often than once per RTT, so the estimator is not overly sensitive to the value of INTERVAL.

When the estimator finds a persistent delay above TARGET, the controller enters the drop state where a packet is dropped, and the next drop time is set. As discussed in Section 3.3, the initial next drop spacing is intended to be long enough to give the endpoints time to react to the single drop and therefore SHOULD be set to a value equal to INTERVAL. If the estimator's output falls below TARGET, the controller cancels the next drop and exits the drop state. (The controller is more sensitive than the estimator to an overly short INTERVAL value, since an unnecessary drop would occur and lower link utilization). If the next drop time is reached while the controller is still in drop state, the packet being dequeued is dropped, and the next drop time is recalculated.

Additional logic prevents re-entering the drop state too soon after exiting it and resumes the drop state at a recent control level, if one exists. This logic is described more precisely in the pseudocode below. Additional work is required to determine the frequency and importance of re-entering the drop state.

Note that CoDel AQM only enters its drop state when the local minimum sojourn delay has exceeded TARGET for a time period long enough for normal bursts to dissipate, ensuring that a burst of packets that fits in the pipe will not be dropped.

4.1. Non-starvation

CoDel's goals are to control delay with little or no impact on link utilization and to be deployed on a wide range of link bandwidths, including variable-rate links, without reconfiguration. To keep from making drops when it would starve the output link, CoDel makes another check before dropping to see if at least an MTU worth of bytes remains in the buffer. If not, the packet SHOULD NOT be dropped; therefore, CoDel exits the drop state. The MTU size can be set adaptively to the largest packet seen so far or can be read from the interface driver.

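A minimal sketch of this check, using the bytes() method and the maxpacket constant described with the pseudocode in Section 5 (the helper name is illustrative):

   // Illustrative only: permit a drop only if at least an MTU's worth of
   // bytes would remain in the buffer; otherwise CoDel exits the drop
   // state instead of dropping.
   flag_t codel_queue_t::drop_would_not_starve()
   {
       return queue_t::bytes() > maxpacket;
   }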

4.2. Setting INTERVAL

The INTERVAL value is chosen to give endpoints time to react to a drop without being so long that response times suffer. CoDel's estimator, TARGET, and control loop all use INTERVAL. Understanding their derivation shows that CoDel is the most sensitive to the value of INTERVAL for single long-lived TCPs with a decreased sensitivity for traffic mixes. This is fortunate, as RTTs vary across connections and are not known a priori. The best policy seems to be to use an INTERVAL value slightly larger than the RTT seen by most of the connections using a link, a value that can be determined as the largest RTT seen if the value is not an outlier (use of a 95-99th percentile value should work). In practice, this value is not known or measured (however, see Appendix A for an application where INTERVAL is measured). An INTERVAL setting of 100 ms works well across a range of RTTs from 10 ms to 1 second (excellent performance is achieved in the range from 10 ms to 300 ms). For devices intended for the normal terrestrial Internet, INTERVAL SHOULD have a value of 100 ms. This will only cause overdropping where a long-lived TCP has an RTT longer than 100 ms and there is little or no mixing with other connections through the link.

4.3. Setting TARGET

TARGET is the maximum acceptable persistent queue delay above which CoDel is dropping or preparing to drop and below which CoDel will not drop. TARGET SHOULD be set to 5 ms for normal Internet traffic.

The calculations of Section 3.2 show that the best TARGET value is 5-10% of the RTT, with the low end of 5% preferred. Extensive simulations exploring the impact of different TARGET values when used with mixed traffic flows with different RTTs and different bandwidths show that below a TARGET of 5 ms, utilization suffers for some conditions and traffic loads, while values above 5 ms show very little or no improvement in utilization.

Sojourn times must remain above the TARGET for INTERVAL amount of time in order to enter the drop state. Any packet with a sojourn time less than TARGET will reset the time that the queue was last below TARGET. Since Internet traffic has very dynamic characteristics, the actual sojourn delay experienced by packets varies greatly and is often less than TARGET unless the overload is excessive. When a link is not overloaded, it is not a bottleneck, and packet sojourn times will be small or nonexistent. In the usual case, there are only one or two places along a path where packets will encounter a bottleneck (usually at the edge), so the total amount of queueing delay experienced by a packet should be less than 10 ms even under extremely congested conditions. This net delay is substantially lower than common current queueing delays on the Internet that grow to orders of seconds [NETAL2010] [CHARB2007].

Regarding the roles of TARGET and the minimum-tracking INTERVAL, note that TARGET SHOULD NOT be increased in response to lower layers that have a bursty nature, where packets are transmitted for short periods interspersed with idle periods where the link is waiting for permission to send. CoDel's estimator will "see" the effective transmission rate over an INTERVAL amount of time, and increasing TARGET only leads to longer queue delays. On the other hand, where a significant additional delay is added to the intrinsic RTT of most or all packets due to the waiting time for a transmission, it is necessary to increase INTERVAL by that extra delay. TARGET SHOULD NOT be adjusted for such short-term bursts, but INTERVAL MAY need to be adjusted if the path's intrinsic RTT changes.

4.4. Use with Multiple Queues

CoDel is easily adapted to multiple queue systems. With other approaches, there is always a question of how to account for the fact that each queue receives less than the full link rate over time and usually sees a varying rate over time. This is what CoDel excels at: using a packet's sojourn time in the buffer completely circumvents this problem. In a multiple-queue setting, a separate CoDel algorithm runs on each queue, but each CoDel instance uses the packet sojourn time the same way a single-queue CoDel does. Just as a single-queue CoDel adapts to changing link bandwidths [CODEL2012], so does a multiple-queue CoDel system. As an optimization to avoid queueing more than necessary, when testing for queue occupancy before dropping, the total occupancy of all queues sharing the same output link SHOULD be used. This property of CoDel has been exploited in fq_codel [RFC8290], which hashes on the packet header fields to determine a specific bin, or sub-queue, for the packet and runs CoDel on each bin or sub-queue, thus creating a well-mixed output flow and obviating issues of reverse path flows (including "ack compression").

4.5. Setting Up CoDel

CoDel is set for use in devices in the open Internet. An INTERVAL setting of 100 ms is used, TARGET is set to 5% of INTERVAL, and the initial drop spacing is also set to the INTERVAL. These settings have been chosen so that a device, such as a small WiFi router, can be sold without the need for any values to be made adjustable, yielding a parameterless implementation. In addition, CoDel is useful in environments with significantly different characteristics from the normal Internet, for example, in switches used as a cluster interconnect within a data center. Since cluster traffic is entirely internal to the data center, round-trip latencies are low (typically <100 us) but bandwidths are high (1-40 Gbps), so it's relatively easy for the aggregation phase of a distributed computation (e.g., the Reduce part of a Map/Reduce) to persistently fill and then overflow the modest per-port buffering available in most high-speed switches. A CoDel configured for this environment (TARGET and INTERVAL in the microsecond rather than millisecond range) can minimize drops or Explicit Congestion Notification (ECN) marks while keeping throughput high and latency low.

Devices destined for these environments MAY use a different value for INTERVAL, where suitable. If appropriate analysis indicates, the TARGET MAY be set to some other value in the 5-10% of INTERVAL, and the initial drop spacing MAY be set to a value of 1.0 to 1.2 times INTERVAL. But these settings will cause problems, such as overdropping and low throughput, if used on the open Internet, so devices that allow CoDel to be configured SHOULD default to the Internet-appropriate values given in this document.

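For implementations that do expose these values, the tunable quantities reduce to three, as in the following sketch (the structure and field names are illustrative; the defaults shown are the Internet-appropriate values of this document):

   // Illustrative only: the complete set of CoDel tunables and their
   // Internet-appropriate defaults.
   struct codel_params_t {
       time_t interval    = MS2TIME(100);  // ~ worst-case expected RTT
       time_t target      = MS2TIME(5);    // 5% of INTERVAL
       time_t initial_gap = MS2TIME(100);  // initial drop spacing = INTERVAL
   };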

5. Annotated Pseudocode for CoDel AQM

What follows is the CoDel algorithm in C++-like pseudocode. Since CoDel adds relatively little new code to a basic tail-drop FIFO queue, we have attempted to highlight just these additions by presenting CoDel as a sub-class of a basic FIFO queue base class. The reference code is included to aid implementers who wish to apply CoDel to queue management as described here or to adapt its principles to other applications.

Implementors are strongly encouraged to also look at the Linux kernel version of CoDel -- a well-written, well-tested, real-world, C-based implementation. As of this writing, it is available at https://github.com/torvalds/linux/blob/master/net/sched/sch_codel.c.

5.1. Data Types

time_t is an integer time value in units convenient for the system. The code presented here uses 0 as a flag value to indicate "no time set."

packet_t* is a pointer to a packet descriptor. We assume it has a tstamp field capable of holding a time_t and that the field is available for use by CoDel (it will be set by the enqueue routine and used by the dequeue routine).

queue_t is a base class for queue objects (the parent class for codel_queue_t objects). We assume it has enqueue() and dequeue() methods that can be implemented in child classes. We assume it has a bytes() method that returns the current queue size in bytes. This can be an approximate value. The method is invoked in the dequeue() method but shouldn't require a lock with the enqueue() method.

flag_t is a Boolean.

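Taken together, a minimal set of declarations consistent with the descriptions above might look like the following (illustrative only; real definitions are implementation specific):

   typedef uint64_t time_t;   // e.g., nanoseconds; 0 means "no time set"
   typedef bool     flag_t;   // Boolean

   struct packet_t {
       time_t tstamp;         // set by enqueue(), read by dequeue()
       // ... other packet descriptor fields
   };

   class queue_t {            // base class; codel_queue_t derives from it
   public:
       virtual void      enqueue(packet_t* pkt);
       virtual packet_t* dequeue();
       size_t            bytes() const;  // approximate queue size in bytes
   };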

5.2. Per-Queue State (codel_queue_t Instance Variables)
   time_t first_above_time_ = 0; // Time to declare sojourn time above
                                 // TARGET
   time_t drop_next_ = 0;        // Time to drop next packet
   uint32_t count_ = 0;          // Packets dropped in drop state
   uint32_t lastcount_ = 0;      // Count from previous iteration
   flag_t dropping_ = false;     // Set to true if in drop state
        
5.3. Constants
   time_t TARGET = MS2TIME(5);     // 5 ms TARGET queue delay
   time_t INTERVAL = MS2TIME(100); // 100 ms sliding-minimum window
   u_int maxpacket = 512;          // Maximum packet size in bytes
                                   // (SHOULD use interface MTU)
        
5.4. Enqueue Routine

All the work of CoDel is done in the dequeue routine. The only CoDel addition to enqueue is putting the current time in the packet's tstamp field so that the dequeue routine can compute the packet's sojourn time. Note that packets arriving at a full buffer will be dropped, but these drops are not counted towards CoDel's computations.

   void codel_queue_t::enqueue(packet_t* pkt)
   {
       pkt->tstamp = clock();
       queue_t::enqueue(pkt);
   }
        
5.5. Dequeue Routine

This is the heart of CoDel. There are two branches based on whether the controller is in drop state: (i) if the controller is in drop state (that is, the minimum packet sojourn time is greater than TARGET), then the controller checks if it is time to leave drop state or schedules the next drop(s); or (ii) if the controller is not in drop state, it determines if it should enter drop state and do the initial drop.

   packet_t* codel_queue_t::dequeue()
   {
       time_t now = clock();
       dodequeue_result r = dodequeue(now);
       uint32_t delta;
        
       if (dropping_) {
           if (! r.ok_to_drop) {
               // sojourn time below TARGET - leave drop state
               dropping_ = false;
           }
           // Time for the next drop.  Drop current packet and dequeue
           // next.  If the dequeue doesn't take us out of dropping
           // state, schedule the next drop.  A large backlog might
           // result in drop rates so high that the next drop should
           // happen now, hence the 'while' loop.
           while (now >= drop_next_ && dropping_) {
               drop(r.p);
               ++count_;
               r = dodequeue(now);
               if (! r.ok_to_drop) {
                   // leave drop state
                   dropping_ = false;
        
               } else {
                   // schedule the next drop.
                   drop_next_ = control_law(drop_next_, count_);
               }
           }
       // If we get here, we're not in drop state.  The 'ok_to_drop'
       // return from dodequeue means that the sojourn time has been
       // above 'TARGET' for 'INTERVAL', so enter drop state.
       } else if (r.ok_to_drop) {
           drop(r.p);
           r = dodequeue(now);
           dropping_ = true;
        
           // If min went above TARGET close to when it last went
           // below, assume that the drop rate that controlled the
           // queue on the last cycle is a good starting point to
           // control it now.  ('drop_next' will be at most 'INTERVAL'
           // later than the time of the last drop, so 'now - drop_next'
           // is a good approximation of the time from the last drop
           // until now.) Implementations vary slightly here; this is
           // the Linux version, which is more widely deployed and
           // tested.
           delta = count_ - lastcount_;
           count_ = 1;
           if ((delta > 1) && (now - drop_next_ < 16*INTERVAL))
               count_ = delta;
        
           drop_next_ = control_law(now, count_);
           lastcount_ = count_;
       }
       return (r.p);
   }
        
5.6. Helper Routines

Since the degree of multiplexing and the nature of the traffic sources are unknown, CoDel acts as a closed-loop servo system that gradually increases the frequency of dropping until the queue is controlled (sojourn time goes below TARGET). This is the control law that governs the servo. It has this form because of the sqrt(p) dependence of TCP throughput on drop probability. Note that for embedded systems or kernel implementation, the inverse sqrt can be computed efficiently using only integer multiplication.

   time_t codel_queue_t::control_law(time_t t, uint32_t count)
   {
       return t + INTERVAL / sqrt(count);
   }
        

Next is a helper routine that does the actual packet dequeue and tracks whether the sojourn time is above or below TARGET and, if above, if it has remained above continuously for at least INTERVAL amount of time. It returns two values: a Boolean indicating if it is OK to drop (sojourn time above TARGET for at least INTERVAL) and the packet dequeued.

   typedef struct {
       packet_t* p;
       flag_t ok_to_drop;
   } dodequeue_result;
        
   dodequeue_result codel_queue_t::dodequeue(time_t now)
   {
       dodequeue_result r = { queue_t::dequeue(), false };
       if (r.p == NULL) {
           // queue is empty - we can't be above TARGET
           first_above_time_ = 0;
           return r;
       }
        
       // To span a large range of bandwidths, CoDel runs two
       // different AQMs in parallel.  One is based on sojourn time
       // and takes effect when the time to send an MTU-sized
       // packet is less than TARGET.  The 1st term of the "if"
       // below does this.  The other is based on backlog and takes
       // effect when the time to send an MTU-sized packet is >=
       // TARGET.  The goal here is to keep the output link
       // utilization high by never allowing the queue to get
       // smaller than the amount that arrives in a typical
       // interarrival time (MTU-sized packets arriving spaced
       // by the amount of time it takes to send such a packet on
       // the bottleneck).  The 2nd term of the "if" does this.
       time_t sojourn_time = now - r.p->tstamp;
        if (sojourn_time < TARGET || bytes() <= maxpacket) {
           // went below - stay below for at least INTERVAL
           first_above_time_ = 0;
       } else {
           if (first_above_time_ == 0) {
               // just went above from below. if still above at
               // first_above_time, will say it's ok to drop.
               first_above_time_ = now + INTERVAL;
           } else if (now >= first_above_time_) {
               r.ok_to_drop = true;
           }
       }
       return r;
   }
        
5.7. Implementation Considerations

time_t is an integer time value in units convenient for the system. Resolution to at least a millisecond is required, and better resolution is useful up to the minimum possible packet time on the output link. 64- or 32-bit widths are acceptable, but with 32 bits the resolution should be no finer than 2^-16 to leave enough dynamic range to represent a wide range of queue waiting times. Narrower widths also have implementation issues due to overflow (wrapping) and underflow (limit cycles because of truncation to zero) that are not addressed in this pseudocode.

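As one illustration of handling wrap with a narrower width, a 32-bit time value can be compared through a signed subtraction, in the style of the Linux kernel's time_after() helpers; the type and function names below are assumptions of this sketch.

   #include <cstdint>

   typedef uint32_t ctime32_t;   // hypothetical 32-bit CoDel time value

   // True if a is later than b even after the counter wraps, provided
   // the two values are within half the counter range of each other.
   inline bool time_after(ctime32_t a, ctime32_t b)
   { return (int32_t)(a - b) > 0; }

   inline bool time_after_eq(ctime32_t a, ctime32_t b)
   { return (int32_t)(a - b) >= 0; }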

Since CoDel requires relatively little per-queue state and no direct communication or state sharing between the enqueue and dequeue routines, it is relatively simple to add CoDel to almost any packet processing pipeline, including forwarding engines based on Application-Specific Integrated Circuits (ASICs) or Network Processors (NPUs). One issue to consider is dodequeue()'s use of a 'bytes()' function to determine the current queue size in bytes. This value does not need to be exact. If the enqueue part of the pipeline keeps a running count of the total number of bytes it has put into the queue, and the dequeue routine keeps a running count of the total bytes it has removed from the queue, 'bytes()' is simply the difference between these two counters (32-bit counters should be adequate). Enqueue has to update its counter once per packet queued, but it does not matter when (before, during, or after the packet has been added to the queue). The worst that can happen is a slight, transient underestimate of the queue size, which might cause a drop to be briefly deferred.

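A minimal sketch of that counter-based bytes() is shown below, assuming C++ atomics for the two running totals; the structure and names are illustrative only.

   #include <atomic>
   #include <cstdint>

   struct byte_counters {
       std::atomic<uint32_t> enqueued{0};   // updated only by the enqueue path
       std::atomic<uint32_t> dequeued{0};   // updated only by the dequeue path
   };

   // Unsigned wrap keeps the subtraction correct even after either
   // counter overflows; a slight, transient underestimate is acceptable.
   inline uint32_t approx_bytes(const byte_counters& c)
   {
       return c.enqueued.load(std::memory_order_relaxed) -
              c.dequeued.load(std::memory_order_relaxed);
   }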

6. Further Experimentation

We encourage experimentation with the recommended values of TARGET and INTERVAL for Internet settings. CoDel provides general, efficient, parameterless building blocks for queue management that can be applied to single or multiple queues in a variety of data networking scenarios. CoDel's settings may be modified for other special-purpose networking applications.

7. Security Considerations

This document describes an active queue management algorithm for implementation in networked devices. There are no known security exposures associated with CoDel at this time.

8. IANA Considerations

This document does not require actions by IANA.

9. References
9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

9.2. Informative References

[BLOAT] Gettys, J. and K. Nichols, "Bufferbloat: Dark Buffers in the Internet", Communications of the ACM, Volume 55, Issue 1, DOI 10.1145/2063176.2063196, January 2012.

[CHARB2007] Dischinger, M., Haeberlen, A., Gummadi, K., and S. Saroiu, "Characterizing Residential Broadband Networks", Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, DOI 10.1145/1298306.1298313, October 2007.

[CODEL2012] Nichols, K. and V. Jacobson, "Controlling Queue Delay", ACM Queue, Volume 10, Issue 5, DOI 10.1145/2208917.2209336, May 2012.

[KLEIN81] Kleinrock, L. and R. Gail, "An Invariant Property of Computer Network Power", Proceedings of the International Conference on Communications, June 1981, <http://www.lk.cs.ucla.edu/data/files/Gail/power.pdf>.

[MACTCP1997] Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", ACM SIGCOMM Computer Communications Review, Volume 27, Issue 3, pp. 67-82, DOI 10.1145/263932.264023, July 1997.

[NETAL2010] Kreibich, C., Weaver, N., Paxson, V., and B. Nechaev, "Netalyzr: Illuminating the Edge Network", Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, DOI 10.1145/1879141.1879173, November 2010.

[REDL1998] Nichols, K., Jacobson, V., and K. Poduri, "RED in a Different Light", Technical Report, Cisco Systems, September 1999, <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.9406>.

[RFC896] Nagle, J., "Congestion Control in IP/TCP Internetworks", RFC 896, DOI 10.17487/RFC0896, January 1984, <https://www.rfc-editor.org/info/rfc896>.

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, <https://www.rfc-editor.org/info/rfc2309>.

[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, <https://www.rfc-editor.org/info/rfc5681>.

[RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler and Active Queue Management Algorithm", RFC 8290, DOI 10.17487/RFC8290, January 2018, <https://www.rfc-editor.org/info/rfc8290>.

[TSV84] Jacobson, V., "CoDel", IETF 84, Transport Area Open Meeting, July 2012, <http://www.ietf.org/proceedings/84/slides/slides-84-tsvarea-4.pdf>.

[VANQ2006] Jacobson, V., "A Rant on Queues", Talk at MIT Lincoln Labs, Lexington, MA, July 2006, <http://www.pollere.net/Pdfdocs/QrantJul06.pdf>.

Appendix A. Applying CoDel in the Data Center

Nandita Dukkipati and her group at Google realized that the CoDel building blocks could be applied to bufferbloat problems in data-center servers, not just to Internet routers. The Linux CoDel queueing discipline (qdisc) was adapted in three ways to tackle this bufferbloat problem.

1. The default CoDel action was modified to be a direct feedback from qdisc to the TCP layer at dequeue. The direct feedback simply reduces TCP's congestion window just as congestion control would do in the event of drop. The scheme falls back to ECN marking or packet drop if the TCP socket lock could not be acquired at dequeue.

2. Being located in the server makes it possible to monitor the actual RTT to use as CoDel's interval rather than making a "best guess" of RTT. The CoDel interval is dynamically adjusted by using the maximum TCP round-trip time (RTT) of those connections sharing the same qdisc/bucket. In particular, there is a history entry of the maximum RTT experienced over the last second. As a packet is dequeued, the RTT estimate is accessed from its TCP socket. If the estimate is larger than the current CoDel interval, the CoDel interval is immediately refreshed to the new value. If the CoDel interval is not refreshed for over a second, it is decreased to the history entry, and the process is repeated. The use of the dynamic TCP RTT estimate allows the interval to adapt to the actual maximum value currently seen and thus lets the controller space its drop intervals appropriately. A rough sketch of this adaptation appears after this list.

3. Since the mathematics of computing the setpoint are invariant, a TARGET of 5% of the RTT or CoDel interval was used here.

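The following is only a rough sketch of the interval adaptation described in item 2; every identifier is hypothetical, and the deployed code is not reproduced in this document.

   time_t interval_     = MS2TIME(100);  // current CoDel interval for this bucket
   time_t max_rtt_hist_ = 0;             // largest RTT seen over the last second
   time_t last_refresh_ = 0;             // when interval_ was last raised

   // Called for each dequeued packet with the RTT estimate from its TCP socket.
   void maybe_adapt_interval(time_t now, time_t rtt_estimate)
   {
       if (rtt_estimate > interval_) {
           interval_     = rtt_estimate;    // refresh immediately
           last_refresh_ = now;
       } else if (now - last_refresh_ > MS2TIME(1000)) {
           interval_     = max_rtt_hist_;   // fall back to the recent maximum
           max_rtt_hist_ = 0;
           last_refresh_ = now;
       }
       if (rtt_estimate > max_rtt_hist_)
           max_rtt_hist_ = rtt_estimate;
   }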

Non-data packets were not dropped, as these are typically small and sometimes critical control packets. Being located on the server, there is no concern with misbehaving users as there would be on the public Internet.

In several data-center workload benchmarks, which are typically bursty, CoDel reduced the queueing latencies at the qdisc and thereby improved the mean and 99th-percentile latencies from several tens of milliseconds to less than one millisecond. The minimum tracking part of the CoDel framework proved useful in disambiguating "good" queue versus "bad" queue, which is particularly helpful in controlling qdisc buffers that are inherently bursty because of TCP Segmentation Offload (TSO).

Acknowledgments

The authors thank Jim Gettys for the constructive nagging that made us get the work "out there" before we thought it was ready. We thank Dave Taht, Eric Dumazet, and the open source community for showing the value of getting it "out there" and for making it real. We thank Nandita Dukkipati for contributions to Section 6 and for comments that helped to substantially improve this document. We thank the AQM Working Group and the Transport Area Shepherd, Wes Eddy, for patiently prodding this document all the way to publication as an RFC.

Authors' Addresses

Kathleen Nichols
Pollere, Inc.
PO Box 370201
Montara, CA 94037
United States of America

   Email: nichols@pollere.com
        

Van Jacobson
Google

   Email: vanj@google.com
        

Andrew McGregor (editor)
Google

   Email: andrewmcgr@google.com
        

Janardhan Iyengar (editor)
Google

   Email: jri@google.com
        