Internet Engineering Task Force (IETF)                           A. Ford
Request for Comments: 6824                                         Cisco
Category: Experimental                                         C. Raiciu
ISSN: 2070-1721                             U. Politechnica of Bucharest
                                                              M. Handley
                                                       U. College London
                                                          O. Bonaventure
                                                U. catholique de Louvain
                                                            January 2013
TCP Extensions for Multipath Operation with Multiple Addresses
Abstract
TCP/IP communication is currently restricted to a single path per connection, yet multiple paths often exist between peers. The simultaneous use of these multiple paths for a TCP/IP session would improve resource usage within the network and, thus, improve user experience through higher throughput and improved resilience to network failure.
Multipath TCP provides the ability to simultaneously use multiple paths between peers. This document presents a set of extensions to traditional TCP to support multipath operation. The protocol offers the same type of service to applications as TCP (i.e., reliable bytestream), and it provides the components necessary to establish and use multiple TCP flows across potentially disjoint paths.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for examination, experimental implementation, and evaluation.
This document defines an Experimental Protocol for the Internet community. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6824.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction
   1.1. Design Assumptions
   1.2. Multipath TCP in the Networking Stack
   1.3. Terminology
   1.4. MPTCP Concept
   1.5. Requirements Language
2. Operation Overview
   2.1. Initiating an MPTCP Connection
   2.2. Associating a New Subflow with an Existing MPTCP Connection
   2.3. Informing the Other Host about Another Potential Address
   2.4. Data Transfer Using MPTCP
   2.5. Requesting a Change in a Path's Priority
   2.6. Closing an MPTCP Connection
   2.7. Notable Features
3. MPTCP Protocol
   3.1. Connection Initiation
   3.2. Starting a New Subflow
   3.3. General MPTCP Operation
        3.3.1. Data Sequence Mapping
        3.3.2. Data Acknowledgments
        3.3.3. Closing a Connection
        3.3.4. Receiver Considerations
        3.3.5. Sender Considerations
        3.3.6. Reliability and Retransmissions
        3.3.7. Congestion Control Considerations
        3.3.8. Subflow Policy
   3.4. Address Knowledge Exchange (Path Management)
        3.4.1. Address Advertisement
        3.4.2. Remove Address
   3.5. Fast Close
   3.6. Fallback
   3.7. Error Handling
   3.8. Heuristics
        3.8.1. Port Usage
        3.8.2. Delayed Subflow Start
        3.8.3. Failure Handling
4. Semantic Issues
5. Security Considerations
6. Interactions with Middleboxes
7. Acknowledgments
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
Appendix A. Notes on Use of TCP Options
Appendix B. Control Blocks
   B.1. MPTCP Control Block
        B.1.1. Authentication and Metadata
        B.1.2. Sending Side
        B.1.3. Receiving Side
   B.2. TCP Control Blocks
        B.2.1. Sending Side
        B.2.2. Receiving Side
Appendix C. Finite State Machine
Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to provide a Multipath TCP [2] service, which enables a transport connection to operate across multiple paths simultaneously. This document presents the protocol changes required to add multipath capability to TCP; specifically, those for signaling and setting up multiple paths ("subflows"), managing these subflows, reassembly of data, and termination of sessions. This is not the only information required to create a Multipath TCP implementation, however. This document is complemented by three others:
o Architecture [2], which explains the motivations behind Multipath TCP, contains a discussion of high-level design decisions on which this design is based, and an explanation of a functional separation through which an extensible MPTCP implementation can be developed.
o Congestion control [5] presents a safe congestion control algorithm for coupling the behavior of the multiple paths in order to "do no harm" to other network users.
o Application considerations [6] discusses what impact MPTCP will have on applications, what applications will want to do with MPTCP, and as a consequence of these factors, what API extensions an MPTCP implementation should present.
In order to limit the potentially huge design space, the working group imposed two key constraints on the Multipath TCP design presented in this document:
o It must be backwards-compatible with current, regular TCP, to increase its chances of deployment.
o It can be assumed that one or both hosts are multihomed and multiaddressed.
To simplify the design, we assume that the presence of multiple addresses at a host is sufficient to indicate the existence of multiple paths. These paths need not be entirely disjoint: they may share one or many routers between them. Even in such a situation, making use of multiple paths is beneficial, improving resource utilization and resilience to a subset of node failures. The congestion control algorithms defined in [5] ensure this does not act detrimentally. Furthermore, there may be some scenarios where different TCP ports on a single host can provide disjoint paths (such as through certain Equal-Cost Multipath (ECMP) implementations [7]), and so the MPTCP design also supports the use of ports in path identifiers.
There are three aspects to the backwards-compatibility listed above (discussed in more detail in [2]):
External Constraints: The protocol must function through the vast majority of existing middleboxes such as NATs, firewalls, and proxies, and as such must resemble existing TCP as far as possible on the wire. Furthermore, the protocol must not assume the segments it sends on the wire arrive unmodified at the destination: they may be split or coalesced; TCP options may be removed or duplicated.
Application Constraints: The protocol must be usable with no change to existing applications that use the common TCP API (although it is reasonable that such legacy applications may not have access to all features). Furthermore, the protocol must provide the same service model as regular TCP to the application.
Fallback: The protocol should be able to fall back to standard TCP with no interference from the user, to be able to communicate with legacy hosts.
The complementary application considerations document [6] discusses the necessary features of an API to provide backwards-compatibility, as well as API extensions to convey the behavior of MPTCP at a level of control and information equivalent to that available with regular, single-path TCP.
Further discussion of the design constraints and associated design decisions is given in the MPTCP Architecture document [2] and in [8].
MPTCP operates at the transport layer and aims to be transparent to both higher and lower layers. It is a set of additional features on top of standard TCP; Figure 1 illustrates this layering. MPTCP is designed to be usable by legacy applications with no changes; detailed discussion of its interactions with applications is given in [6].
                               +-------------------------------+
                               |           Application         |
   +---------------+           +-------------------------------+
   |  Application  |           |             MPTCP             |
   +---------------+           + - - - - - - - + - - - - - - - +
   |      TCP      |           | Subflow (TCP) | Subflow (TCP) |
   +---------------+           +-------------------------------+
   |      IP       |           |       IP      |      IP       |
   +---------------+           +-------------------------------+
Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks
This document makes use of a number of terms that are either MPTCP-specific or have defined meaning in the context of MPTCP, as follows:
Path: A sequence of links between a sender and a receiver, defined in this context by a 4-tuple of source and destination address/port pairs.
Subflow: A flow of TCP segments operating over an individual path, which forms part of a larger MPTCP connection. A subflow is started and terminated in the same way as a regular TCP connection.
(MPTCP) Connection: A set of one or more subflows, over which an application can communicate between two hosts. There is a one-to-one mapping between a connection and an application socket.
Data-level: The payload data is nominally transferred over a connection, which in turn is transported over subflows. Thus, the term "data-level" is synonymous with "connection level", in contrast to "subflow-level", which refers to properties of an individual subflow.
Token: A locally unique identifier given to a multipath connection by a host. May also be referred to as a "Connection ID".
Host: An end host operating an MPTCP implementation, and either initiating or accepting an MPTCP connection.
In addition to these terms, note that MPTCP's interpretation of, and effect on, regular single-path TCP semantics are discussed in Section 4.
This section provides a high-level summary of normal operation of MPTCP, and is illustrated by the scenario shown in Figure 2. A detailed description of operation is given in Section 3.
o To a non-MPTCP-aware application, MPTCP will behave the same as normal TCP. Extended APIs could provide additional control to MPTCP-aware applications [6]. An application begins by opening a TCP socket in the normal way. MPTCP signaling and operation are handled by the MPTCP implementation.
o An MPTCP connection begins similarly to a regular TCP connection. This is illustrated in Figure 2 where an MPTCP connection is established between addresses A1 and B1 on Hosts A and B, respectively.
o If extra paths are available, additional TCP sessions (termed MPTCP "subflows") are created on these paths, and are combined with the existing session, which continues to appear as a single connection to the applications at both ends. The creation of the additional TCP session is illustrated between Address A2 on Host A and Address B1 on Host B.
o MPTCP identifies multiple paths by the presence of multiple addresses at hosts. Combinations of these multiple addresses equate to the additional paths. In the example, other potential paths that could be set up are A1<->B2 and A2<->B2. Although this additional session is shown as being initiated from A2, it could equally have been initiated from B1.
o The discovery and setup of additional subflows will be achieved through a path management method; this document describes a mechanism by which a host can initiate new subflows by using its own additional addresses, or by signaling its available addresses to the other host.
o MPTCP adds connection-level sequence numbers to allow the reassembly of segments arriving on multiple subflows with differing network delays.
o Subflows are terminated as regular TCP connections, with a four-way FIN handshake. The MPTCP connection is terminated by a connection-level FIN.
            Host A                               Host B
   ------------------------             ------------------------
   Address A1    Address A2             Address B1    Address B2
   ----------    ----------             ----------    ----------
     |             |                      |             |
     |     (initial connection setup)     |             |
     |----------------------------------->|             |
     |<-----------------------------------|             |
     |             |                      |             |
     |             |  (additional subflow setup)        |
     |             |--------------------->|             |
     |             |<---------------------|             |
     |             |                      |             |
     |             |                      |             |
Figure 2: Example MPTCP Usage Scenario
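As a rough illustration of the statement above that combinations of addresses equate to paths, the candidate paths between two hosts can be enumerated as the Cartesian product of their address sets, minus the pairs that already carry a subflow. This is a minimal Python sketch, assuming addresses are plain string labels like those in Figure 2; `candidate_paths` is a hypothetical helper, and it ignores ports even though a path is formally a 4-tuple.

```python
from itertools import product


def candidate_paths(local_addrs, remote_addrs, existing):
    """Return (local, remote) address pairs that do not yet carry a subflow.

    Hypothetical helper: addresses are simple labels and ports are ignored.
    """
    return [pair for pair in product(local_addrs, remote_addrs)
            if pair not in existing]


# Figure 2: subflows already run over A1<->B1 and A2<->B1.
existing = {("A1", "B1"), ("A2", "B1")}
print(candidate_paths(["A1", "A2"], ["B1", "B2"], existing))
# prints [('A1', 'B2'), ('A2', 'B2')]
```

Whether any of these remaining pairs is actually used is a path-management decision; either host may initiate the corresponding subflow.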
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [3].
This section presents a single description of common MPTCP operation, with reference to the protocol operation. This is a high-level overview of the key functions; the full specification follows in Section 3. Extensibility and negotiated features are not discussed here. Considerable reference is made to symbolic names of MPTCP options throughout this section -- these are subtypes of the IANA-assigned MPTCP option (see Section 8), and their formats are defined in the detailed protocol specification that follows in Section 3.
A Multipath TCP connection provides a bidirectional bytestream between two hosts communicating like normal TCP and, thus, does not require any change to the applications. However, Multipath TCP enables the hosts to use different paths with different IP addresses to exchange packets belonging to the MPTCP connection. A Multipath TCP connection appears like a normal TCP connection to an application. However, to the network layer, each MPTCP subflow looks like a regular TCP flow whose segments carry a new TCP option type. Multipath TCP manages the creation, removal, and utilization of these subflows to send data. The number of subflows that are managed within a Multipath TCP connection is not fixed and it can fluctuate during the lifetime of the Multipath TCP connection.
All MPTCP operations are signaled with a TCP option -- a single numerical type for MPTCP, with "sub-types" for each MPTCP message. What follows is a summary of the purpose and rationale of these messages.
An MPTCP connection is initiated with the same signaling as a normal TCP connection, but the SYN, SYN/ACK, and ACK packets also carry the MP_CAPABLE option. This option is of variable length and serves multiple purposes. Firstly, it verifies whether the remote host supports Multipath TCP; secondly, it allows the hosts to exchange some information to authenticate the establishment of additional subflows. Further details are given in Section 3.1.
   Host A                                  Host B
   ------                                  ------
   MP_CAPABLE                      ->
   [A's key, flags]
                                   <-      MP_CAPABLE
                                           [B's key, flags]
   ACK + MP_CAPABLE                ->
   [A's key, B's key, flags]
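The keys exchanged in the MP_CAPABLE handshake are later hashed to obtain the connection token used in MP_JOIN and the initial data sequence number (IDSN). Section 3.1 specifies this for HMAC-SHA1: the token is the most significant 32 bits of the SHA-1 hash of the key, and the IDSN is the least significant 64 bits. The Python sketch below assumes the 64-bit key is hashed as 8 octets in network byte order; `token_and_idsn` is a hypothetical helper name.

```python
import hashlib
import secrets
import struct


def token_and_idsn(key: int) -> tuple[int, int]:
    """Derive (token, IDSN) from a 64-bit MPTCP key (HMAC-SHA1 variant).

    Assumes the key is hashed as 8 octets in network (big-endian) byte order.
    """
    digest = hashlib.sha1(struct.pack("!Q", key)).digest()  # 20-octet SHA-1 digest
    token = int.from_bytes(digest[:4], "big")   # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")   # least significant 64 bits
    return token, idsn


# Each host picks its own random 64-bit key for the MP_CAPABLE exchange.
key_a = secrets.randbits(64)
token_a, idsn_a = token_and_idsn(key_a)
```

Because the token is a function of the receiver's key, a host that sends MP_JOIN names the connection using the peer's token, and the peer looks that token up among its locally unique tokens.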
The exchange of keys in the MP_CAPABLE handshake provides material that can be used to authenticate the endpoints when new subflows will be set up. Additional subflows begin in the same way as initiating a normal TCP connection, but the SYN, SYN/ACK, and ACK packets also carry the MP_JOIN option.
Host A initiates a new subflow between one of its addresses and one of Host B's addresses. The token, generated from the key, identifies which MPTCP connection the subflow is joining, and a Hash-based Message Authentication Code (HMAC) authenticates it. The HMAC uses the keys exchanged in the MP_CAPABLE handshake and the random numbers (nonces) exchanged in these MP_JOIN options. MP_JOIN also contains flags and an Address ID that can be used to refer to the source address without the sender needing to know if it has been changed by a NAT. Further details are in Section 3.2.
   Host A                                  Host B
   ------                                  ------
   MP_JOIN                         ->
   [B's token, A's nonce,
    A's Address ID, flags]
                                   <-      MP_JOIN
                                           [B's HMAC, B's nonce,
                                            B's Address ID, flags]
   ACK + MP_JOIN                   ->
   [A's HMAC]

                                   <-      ACK
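The HMAC computation in the MP_JOIN exchange can be sketched as follows. Section 3.2 defines it for HMAC-SHA1: each host keys the HMAC with its own key followed by the peer's key, over its own nonce followed by the peer's nonce; Host B's SYN/ACK carries the leftmost 64 bits of its HMAC, and Host A's ACK carries the full 160 bits. In this Python sketch, `join_hmac` is a hypothetical helper, and keys and nonces are assumed to be packed as big-endian 64-bit and 32-bit integers.

```python
import hashlib
import hmac
import secrets
import struct


def join_hmac(own_key: int, peer_key: int, own_nonce: int, peer_nonce: int) -> bytes:
    """HMAC-SHA1 keyed by own_key||peer_key over own_nonce||peer_nonce."""
    key = struct.pack("!QQ", own_key, peer_key)       # two 64-bit keys, concatenated
    msg = struct.pack("!II", own_nonce, peer_nonce)   # two 32-bit nonces, concatenated
    return hmac.new(key, msg, hashlib.sha1).digest()  # 20 octets (160 bits)


key_a, key_b = secrets.randbits(64), secrets.randbits(64)  # exchanged in MP_CAPABLE
r_a, r_b = secrets.randbits(32), secrets.randbits(32)      # nonces carried in MP_JOIN

# SYN/ACK: Host B sends the leftmost 64 bits of its HMAC; Host A recomputes and compares.
sent_by_b = join_hmac(key_b, key_a, r_b, r_a)[:8]
ok_at_a = hmac.compare_digest(sent_by_b, join_hmac(key_b, key_a, r_b, r_a)[:8])

# Third ACK: Host A sends its full HMAC; Host B recomputes and compares.
sent_by_a = join_hmac(key_a, key_b, r_a, r_b)
ok_at_b = hmac.compare_digest(sent_by_a, join_hmac(key_a, key_b, r_a, r_b))
```

Both sides are simulated in one process here for brevity; in practice each host recomputes the peer's HMAC from the keys and nonces it already holds and rejects the subflow on a mismatch. Using `hmac.compare_digest` avoids timing side channels in the comparison.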
The set of IP addresses associated with a multihomed host may change during the lifetime of an MPTCP connection. MPTCP supports the addition and removal of addresses on a host both implicitly and explicitly. If Host A has established a subflow starting at address IP#-A1 and wants to open a second subflow starting at address IP#-A2, it simply initiates the establishment of the subflow as explained above. The remote host will then be implicitly informed about the new address.
In some circumstances, a host may want to advertise to the remote host the availability of an address without establishing a new subflow, for example, when a NAT prevents setup in one direction. In the example below, Host A informs Host B about its alternative IP address (IP#-A2). Host B may later send an MP_JOIN to this new address. Due to the presence of middleboxes that may translate IP addresses, this option uses an address identifier to unambiguously identify an address on a host. Further details are in Section 3.4.1.
   Host A                                  Host B
   ------                                  ------
   ADD_ADDR                        ->
   [IP#-A2,
    IP#-A2's Address ID]
There is a corresponding signal for address removal, which uses the Address ID signaled in the ADD_ADDR message. Further details are in Section 3.4.2.
   Host A                                  Host B
   ------                                  ------
   REMOVE_ADDR                     ->
   [IP#-A2's Address ID]
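Because a NAT may rewrite addresses, a receiver can key its record of the peer's addresses by Address ID rather than by IP address, so that a later REMOVE_ADDR is matched to the right subflows even if the address seen on the wire differs. A minimal Python sketch, assuming the one-octet Address ID field; the `AddressTable` class and its method names are hypothetical, and the address shown comes from the 192.0.2.0/24 documentation range.

```python
class AddressTable:
    """Hypothetical per-connection record of the addresses a peer has advertised."""

    def __init__(self) -> None:
        self.by_id: dict[int, str] = {}     # Address ID -> address as advertised
        self.subflows: dict[str, int] = {}  # subflow name -> peer Address ID it uses

    def on_add_addr(self, addr_id: int, addr: str) -> None:
        if not 0 <= addr_id <= 0xFF:
            raise ValueError("Address ID must fit in one octet")
        self.by_id[addr_id] = addr

    def on_remove_addr(self, addr_id: int) -> list[str]:
        """Forget the address; return the subflows that used it so they can be closed."""
        self.by_id.pop(addr_id, None)
        affected = [name for name, aid in self.subflows.items() if aid == addr_id]
        for name in affected:
            del self.subflows[name]
        return affected


# Host B learns IP#-A2 via ADD_ADDR, opens a subflow to it, then receives REMOVE_ADDR.
table = AddressTable()
table.on_add_addr(2, "192.0.2.20")
table.subflows["B1->A2"] = 2
print(table.on_remove_addr(2))  # prints ['B1->A2']
```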
To ensure reliable, in-order delivery of data over subflows that may appear and disappear at any time, MPTCP uses a 64-bit data sequence number (DSN) to number all data sent over the MPTCP connection. Each subflow has its own 32-bit sequence number space and an MPTCP option maps the subflow sequence space to the data sequence space. In this way, data can be retransmitted on different subflows (mapped to the same DSN) in the event of failure.
The "Data Sequence Signal" carries the "Data Sequence Mapping". The data sequence mapping consists of the subflow sequence number, data sequence number, and length for which this mapping is valid. This option can also carry a connection-level acknowledgment (the "Data ACK") for the received DSN.
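The mapping described above is essentially an offset translation between the two sequence spaces. This Python sketch assumes a mapping is held as a (subflow sequence number, DSN, length) triple with the subflow number in the same 32-bit space as the received bytes; `DataSequenceMapping` is a hypothetical name, and Section 3.3.1 defines the actual on-the-wire encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSequenceMapping:
    """Hypothetical in-memory form of one Data Sequence Mapping."""
    subflow_seq: int  # first octet of the mapping in the 32-bit subflow space
    data_seq: int     # DSN of that octet in the 64-bit data sequence space
    length: int       # number of octets the mapping covers

    def to_dsn(self, ssn: int) -> Optional[int]:
        """Translate a subflow sequence number to a DSN, or None if unmapped."""
        offset = (ssn - self.subflow_seq) % 2**32  # tolerate 32-bit wrap-around
        if offset >= self.length:
            return None
        return (self.data_seq + offset) % 2**64


# Subflow 1 wraps its 32-bit space inside the mapping.
m1 = DataSequenceMapping(subflow_seq=2**32 - 10, data_seq=1000, length=100)
# The same data retransmitted on subflow 2 maps to the same DSNs.
m2 = DataSequenceMapping(subflow_seq=500, data_seq=1000, length=100)
print(m1.to_dsn(5), m2.to_dsn(515))  # prints 1015 1015
```

The second mapping shows why retransmission on a different subflow works: the subflow sequence numbers differ, but both translate to the same DSN, so the receiver treats the bytes as the same connection-level data.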
With MPTCP, all subflows share the same receive buffer and advertise the same receive window. There are two levels of acknowledgment in MPTCP. Regular TCP acknowledgments are used on each subflow to acknowledge the reception of the segments sent over the subflow independently of their DSN. In addition, there are connection-level acknowledgments for the data sequence space. These acknowledgments track the advancement of the bytestream and slide the receiving window.
Further details are in Section 3.3.
   Host A                                  Host B
   ------                                  ------
   DATA_SEQUENCE_SIGNAL            ->
   [Data Sequence Mapping]
   [Data ACK]
   [Checksum]
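On the receiving side, segments from all subflows land in one connection-level buffer ordered by DSN, and the cumulative Data ACK is the next DSN expected in order. A minimal Python sketch, assuming each segment's DSN has already been recovered from its mapping, and ignoring 64-bit wrap-around, window limits, and subflow-level TCP ACKs; `ConnectionReceiver` is a hypothetical class.

```python
import heapq


class ConnectionReceiver:
    """Hypothetical connection-level receive buffer shared by all subflows."""

    def __init__(self, initial_dsn: int) -> None:
        self.data_ack = initial_dsn                 # next in-order DSN (cumulative Data ACK)
        self.pending: list[tuple[int, bytes]] = []  # min-heap of out-of-order (dsn, payload)
        self.delivered = bytearray()                # bytes released to the application

    def on_segment(self, dsn: int, payload: bytes) -> int:
        """Accept a segment from any subflow and return the updated Data ACK."""
        heapq.heappush(self.pending, (dsn, payload))
        while self.pending and self.pending[0][0] <= self.data_ack:
            seg_dsn, data = heapq.heappop(self.pending)
            skip = self.data_ack - seg_dsn  # octets already delivered (duplicate/overlap)
            if skip < len(data):
                self.delivered += data[skip:]
                self.data_ack += len(data) - skip
        return self.data_ack


rx = ConnectionReceiver(initial_dsn=1000)
rx.on_segment(1005, b"world")  # arrives first on a faster subflow; Data ACK stays 1000
rx.on_segment(1000, b"hello")  # fills the gap; Data ACK becomes 1010
rx.on_segment(1000, b"hello")  # duplicate retransmitted on another subflow; still 1010
```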
Hosts can indicate at initial subflow setup whether they wish the subflow to be used as a regular or backup path -- a backup path only being used if there are no regular paths available. During a connection, Host A can request a change in the priority of a subflow through the MP_PRIO signal to Host B. Further details are in Section 3.3.8.
   Host A                                  Host B
   ------                                  ------
   MP_PRIO                         ->
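The backup semantics above amount to a simple eligibility rule for deciding which subflows may carry data. A Python sketch under stated assumptions: the `Subflow` record, its `backup` and `usable` fields, and `eligible_subflows` are hypothetical, and the sender's actual policy is described in Section 3.3.8.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    name: str
    backup: bool  # priority requested at subflow setup, or later via MP_PRIO
    usable: bool  # established and not known to have failed


def eligible_subflows(subflows: list[Subflow]) -> list[Subflow]:
    """Use regular subflows when any are usable; otherwise fall back to backups."""
    regular = [s for s in subflows if s.usable and not s.backup]
    return regular or [s for s in subflows if s.usable and s.backup]


primary = Subflow("A1-B1", backup=False, usable=True)
standby = Subflow("A2-B1", backup=True, usable=True)
print([s.name for s in eligible_subflows([primary, standby])])  # prints ['A1-B1']
primary.usable = False  # the regular path fails
print([s.name for s in eligible_subflows([primary, standby])])  # prints ['A2-B1']
```

An MP_PRIO request would, in this model, simply flip the `backup` field of the named subflow on the receiving host.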
When Host A wants to inform Host B that it has no more data to send, it signals this "Data FIN" as part of the Data Sequence Signal (see above). It has the same semantics and behavior as a regular TCP FIN, but at the connection level. Once all the data on the MPTCP connection has been successfully received, then this message is acknowledged at the connection level with a DATA_ACK. Further details are in Section 3.3.3.
   Host A                                  Host B
   ------                                  ------
   DATA_SEQUENCE_SIGNAL            ->
   [Data FIN]
                                   <-      (MPTCP DATA_ACK)
It is worth highlighting that MPTCP's signaling has been designed with several key requirements in mind:
o To cope with NATs on the path, addresses are referred to by Address IDs, in case the IP packet's source address gets changed by a NAT. Setting up a new TCP flow is not possible if the passive opener is behind a NAT; to allow subflows to be created when either end is behind a NAT, MPTCP uses the ADD_ADDR message.
o MPTCP falls back to ordinary TCP if MPTCP operation is not possible, for example, if one host is not MPTCP capable or if a middlebox alters the payload.
o To meet the threats identified in [9], the following steps are taken: keys are sent in the clear in the MP_CAPABLE messages; MP_JOIN messages are secured with HMAC-SHA1 ([10], [4]) using those keys; and standard TCP validity checks are made on the other messages (ensuring sequence numbers are in-window).
This section describes the operation of the MPTCP protocol, and is subdivided into sections for each key part of the protocol operation.
All MPTCP operations are signaled using optional TCP header fields. A single TCP option number ("Kind") has been assigned by IANA for MPTCP (see Section 8), and then individual messages will be determined by a "subtype", the values of which are also stored in an IANA registry (and are also listed in Section 8).
Throughout this document, when reference is made to an MPTCP option by symbolic name, such as "MP_CAPABLE", this refers to a TCP option with the single MPTCP option type, and with the subtype value of the symbolic name as defined in Section 8. This subtype is a 4-bit field -- the first 4 bits of the option payload, as shown in Figure 3. The MPTCP messages are defined in the following sections.
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-----------------------+
   |     Kind      |    Length     |Subtype|                       |
   +---------------+---------------+-------+                       |
   |                     Subtype-specific data                     |
   |                       (variable length)                       |
   +---------------------------------------------------------------+
Figure 3: MPTCP Option Format
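To make the common layout concrete, here is a minimal Python sketch that splits one MPTCP option into Kind, Length, and the 4-bit Subtype. It assumes the option bytes have already been extracted from the TCP header, and it hard-codes Kind 30 and the subtype values from the IANA registry described in Section 8. The names `parse_mptcp_option` and `MptcpOptionHeader` are illustrative, not part of any real TCP stack.

```python
import struct
from typing import NamedTuple

MPTCP_KIND = 30  # TCP option Kind assigned to MPTCP by IANA (Section 8)

# Subtype values from the IANA MPTCP option subtype registry (Section 8).
SUBTYPE_NAMES = {
    0x0: "MP_CAPABLE",
    0x1: "MP_JOIN",
    0x2: "DSS",
    0x3: "ADD_ADDR",
    0x4: "REMOVE_ADDR",
    0x5: "MP_PRIO",
    0x6: "MP_FAIL",
    0x7: "MP_FASTCLOSE",
}


class MptcpOptionHeader(NamedTuple):
    kind: int
    length: int      # total option length in octets, including Kind and Length
    subtype: int     # first 4 bits of the option payload
    low_nibble: int  # remaining 4 bits of that octet (meaning depends on subtype)
    data: bytes      # subtype-specific octets after the third octet


def parse_mptcp_option(option: bytes) -> MptcpOptionHeader:
    """Split one MPTCP option (Figure 3) into its common fields."""
    if len(option) < 3:
        raise ValueError("option too short for Kind, Length and Subtype")
    kind, length, third = struct.unpack("!BBB", option[:3])
    if kind != MPTCP_KIND:
        raise ValueError(f"not an MPTCP option (Kind={kind})")
    if not 3 <= length <= len(option):
        raise ValueError(f"invalid option length {length}")
    return MptcpOptionHeader(kind, length, third >> 4, third & 0x0F, option[3:length])


if __name__ == "__main__":
    hdr = parse_mptcp_option(bytes([30, 12, 0x00, 0x81]) + bytes(8))
    print(SUBTYPE_NAMES[hdr.subtype], hdr.length)  # MP_CAPABLE 12
```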
Those MPTCP options associated with subflow initiation are used on packets with the SYN flag set. Additionally, there is one MPTCP option for signaling metadata to ensure segmented data can be recombined for delivery to the application.
The remaining options, however, are signals that do not need to be on a specific packet, such as those for signaling additional addresses. Whilst an implementation may desire to send MPTCP options as soon as possible, it may not be possible to combine all desired options (both those for MPTCP and for regular TCP, such as SACK (selective acknowledgment) [11]) on a single packet. Therefore, an implementation may choose to send duplicate ACKs containing the additional signaling information. This changes the semantics of a duplicate ACK; these are usually only sent as a signal of a lost segment [12] in regular TCP. Therefore, an MPTCP implementation receiving a duplicate ACK that contains an MPTCP option MUST NOT treat it as a signal of congestion. Additionally, an MPTCP implementation SHOULD NOT send more than two duplicate ACKs in a row for the purposes of sending MPTCP options alone, in order to ensure no middleboxes misinterpret this as a sign of congestion.
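The two duplicate-ACK rules above can be sketched as follows. This is an illustrative Python fragment, not a congestion-control implementation: the function and class names are hypothetical, and it assumes the caller already knows whether a segment is a duplicate ACK and whether it carries an MPTCP option.

```python
from dataclasses import dataclass

# Rule from the text: SHOULD NOT send more than two duplicate ACKs in a row
# purely to carry MPTCP options.
MAX_OPTION_ONLY_DUPACKS = 2


def counts_as_loss_signal(is_duplicate_ack: bool, has_mptcp_option: bool) -> bool:
    """Receiver side: a duplicate ACK that carries an MPTCP option MUST NOT be
    treated as a congestion (lost segment) signal."""
    return is_duplicate_ack and not has_mptcp_option


@dataclass
class OptionOnlyDupAckLimiter:
    """Sender side: caps consecutive duplicate ACKs sent only for MPTCP signaling."""
    consecutive: int = 0

    def try_send(self) -> bool:
        """Return True (and count it) if one more option-only duplicate ACK may go out."""
        if self.consecutive >= MAX_OPTION_ONLY_DUPACKS:
            return False
        self.consecutive += 1
        return True

    def reset(self) -> None:
        """Call when any other segment (new data or a new ACK) is sent."""
        self.consecutive = 0
```

When `try_send()` returns False, one reasonable (assumed) policy is to hold the remaining options until the next data segment or new ACK.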
Furthermore, standard TCP validity checks (such as ensuring the sequence number and acknowledgment number are within window) MUST be undertaken before processing any MPTCP signals, as described in [13].
Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a single path. Each packet contains the Multipath Capable (MP_CAPABLE) TCP option (Figure 4). This option declares its sender is capable of performing Multipath TCP and wishes to do so on this particular connection.
This option is used to declare the 64-bit key that the sender has generated for this MPTCP connection. This key is used to authenticate the addition of future subflows to this connection. This is the only time the key will be sent in clear on the wire (unless "fast close", Section 3.5, is used); all future subflows will identify the connection using a 32-bit "token". This token is a cryptographic hash of this key. The algorithm for this process is dependent on the authentication algorithm selected; the method of selection is defined later in this section.
This key is generated by its sender, and its method of generation is implementation specific. The key MUST be hard to guess, and it MUST be unique for the sending host at any one time. Recommendations for generating random numbers for use in keys are given in [14]. Connections will be indexed at each host by the token (a one-way hash of the key). Therefore, an implementation will require a mapping from each token to the corresponding connection, and in turn to the keys for the connection.
There is a risk that two different keys will hash to the same token. The risk of hash collisions is usually small, unless the host is handling many tens of thousands of connections. Therefore, an implementation SHOULD check its list of connection tokens to ensure there is not a collision before sending its key in the SYN/ACK. This would, however, be costly for a server with thousands of connections. Even without this check, the subflow handshake mechanism (Section 3.2) ensures that new subflows join only the correct connection: it relies on the cryptographic handshake, checks the connection tokens in both directions, and ensures that sequence numbers are in-window. So, in the worst case, if there is a token collision, the new subflow will not succeed, but the MPTCP connection will continue to provide a regular TCP service.
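A minimal sketch of key generation and the token index described above, assuming HMAC-SHA1 was negotiated, so that the token is the most significant 32 bits of the SHA-1 hash of the key (the rule given later in this section for bit "H"). Python's `secrets` module stands in for the random-number guidance of [14]. `ConnectionTable` is a hypothetical structure; a real stack would also need locking and entry removal.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


def token_from_key(key: bytes) -> int:
    """Token: most significant 32 bits of SHA-1(key), key in network byte order."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


class ConnectionTable:
    """Hypothetical per-host index from local token to (connection, local key)."""

    def __init__(self) -> None:
        self._by_token: Dict[int, Tuple[object, bytes]] = {}

    def new_local_key(self, max_attempts: int = 8) -> Tuple[bytes, int]:
        """Draw a hard-to-guess 64-bit key whose token is not already in use.
        Distinct tokens imply distinct keys, so this also keeps keys unique."""
        for _ in range(max_attempts):
            key = secrets.token_bytes(8)  # OS CSPRNG
            token = token_from_key(key)
            if token not in self._by_token:  # the SHOULD-level collision check
                return key, token
        raise RuntimeError("no collision-free token found")

    def register(self, token: int, connection: object, key: bytes) -> None:
        self._by_token[token] = (connection, key)

    def lookup(self, token: int) -> Optional[Tuple[object, bytes]]:
        """Used later to map an incoming MP_JOIN token back to its connection."""
        return self._by_token.get(token)
```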
The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets that start the first subflow of an MPTCP connection. The data carried by each packet is as follows, where A = initiator and B = listener.
o SYN (A->B): A's Key for this connection.
o SYN/ACK (B->A): B's Key for this connection.
o ACK (A->B): A's Key followed by B's Key.
The contents of the option are determined by the SYN and ACK flags of the packet and verified by the option's length field. In Figure 4, "sender" and "receiver" refer to the sender or receiver of the TCP packet (which can be either host). If the SYN flag is set, a single key is included; if only the ACK flag is set, both keys are present.
B's Key is echoed in the ACK in order to allow the listener (Host B) to act statelessly until the TCP connection reaches the ESTABLISHED state. If the listener acts in this way, however, it MUST generate its key in a way that would allow it to verify that it generated the key when it is echoed in the ACK.
This exchange allows the safe passage of MPTCP options on SYN packets to be determined. If any of these options are dropped, MPTCP will gracefully fall back to regular single-path TCP, as documented in Section 3.6. Note that new subflows MUST NOT be established (using the process documented in Section 3.2) until a Data Sequence Signal (DSS) option has been successfully received across the path (as documented in Section 3.3).
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-------+---------------+
   |     Kind      |    Length     |Subtype|Version|A|B|C|D|E|F|G|H|
   +---------------+---------------+-------+-------+---------------+
   |                   Option Sender's Key (64 bits)               |
   |                                                               |
   |                                                               |
   +---------------------------------------------------------------+
   |                  Option Receiver's Key (64 bits)              |
   |                     (if option Length == 20)                  |
   |                                                               |
   +---------------------------------------------------------------+
Figure 4: Multipath Capable (MP_CAPABLE) Option
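The following Python sketch builds and checks the MP_CAPABLE option of Figure 4. It assumes a version-0 option with only bit "H" (HMAC-SHA1) offered, treats a 12-octet option as the SYN or SYN/ACK form and a 20-octet option as the third-ACK form, and raises `ValueError` wherever the text says the option is to be ignored or treated as invalid. The helper names are hypothetical.

```python
import struct
from typing import Optional

MPTCP_KIND = 30
MP_CAPABLE = 0x0
FLAG_A = 0x80       # "A": checksum required
FLAG_B = 0x40       # "B": extensibility, MUST be 0 in this version
CRYPTO_MASK = 0x3F  # bits "C" through "H"
FLAG_H = 0x01       # "H": HMAC-SHA1


def build_mp_capable(sender_key: bytes, receiver_key: Optional[bytes] = None,
                     checksum: bool = True) -> bytes:
    """One key -> 12-octet SYN or SYN/ACK form; two keys -> 20-octet third-ACK form."""
    flags = FLAG_H | (FLAG_A if checksum else 0)
    keys = sender_key + (receiver_key or b"")
    # Third octet: subtype in the high nibble, version 0 in the low nibble.
    return struct.pack("!BBBB", MPTCP_KIND, 4 + len(keys), MP_CAPABLE << 4, flags) + keys


def parse_mp_capable(option: bytes, syn_flag: bool) -> dict:
    kind, length, sub_ver, flags = struct.unpack("!BBBB", option[:4])
    if kind != MPTCP_KIND or sub_ver >> 4 != MP_CAPABLE:
        raise ValueError("not an MP_CAPABLE option")
    if flags & FLAG_B:
        raise ValueError("B=1 not understood: silently ignore this SYN")
    if (flags & CRYPTO_MASK) == 0:
        raise ValueError("no crypto algorithm: treat as a regular TCP handshake")
    expected = 12 if syn_flag else 20
    if length != expected or len(option) < length:
        raise ValueError(f"length {length} does not match the packet's flags")
    return {
        "version": sub_ver & 0x0F,
        "checksum_required": bool(flags & FLAG_A),
        "hmac_sha1": bool(flags & FLAG_H),
        "sender_key": option[4:12],
        "receiver_key": option[12:20] if length == 20 else None,
    }
```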
The first 4 bits of the first octet in the MP_CAPABLE option (Figure 4) define the MPTCP option subtype (see Section 8; for MP_CAPABLE, this is 0), and the remaining 4 bits of this octet specify the MPTCP version in use (for this specification, this is 0).
The second octet is reserved for flags, allocated as follows:
A: The leftmost bit, labeled "A", SHOULD be set to 1 to indicate "Checksum Required", unless the system administrator has decided that checksums are not required (for example, if the environment is controlled and no middleboxes exist that might adjust the payload).
B: The second bit, labeled "B", is an extensibility flag, and MUST be set to 0 for current implementations. This will be used for an extensibility mechanism in a future specification, and the impact of this flag will be defined at a later date. If a host receives a message with the "B" flag set to 1 and does not understand it, then this SYN MUST be silently ignored; the sender is expected to retry with a format compatible with this legacy specification. Note that the length of the MP_CAPABLE option, and the meanings of bits "C" through "H", may be altered by setting B=1.
C through H: The remaining bits, labeled "C" through "H", are used for crypto algorithm negotiation. Currently only the rightmost bit, labeled "H", is assigned. Bit "H" indicates the use of HMAC-SHA1 (as defined in Section 3.2). An implementation that only supports this method MUST set bit "H" to 1, and bits "C" through "G" to 0.
A crypto algorithm MUST be specified. If flag bits C through H are all 0, the MP_CAPABLE option MUST be treated as invalid and ignored (that is, it must be treated as a regular TCP handshake).
The selection of the authentication algorithm also impacts the algorithm used to generate the token and the initial data sequence number (IDSN). In this specification, with only the SHA-1 algorithm (bit "H") specified and selected, the token MUST be a truncated (most significant 32 bits) SHA-1 hash ([4], [15]) of the key. A different, 64-bit truncation (the least significant 64 bits) of the SHA-1 hash of the key MUST be used as the initial data sequence number. Note that the key MUST be hashed in network byte order. Also note that the "least significant" bits MUST be the rightmost bits of the SHA-1 digest, as per [4]. Future specifications of the use of the crypto bits may choose to specify different algorithms for token and IDSN generation.
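As a worked example of these truncation rules, the sketch below derives both values from a 64-bit key with Python's `hashlib`. The key is hashed as the 8 octets carried on the wire (network byte order); the token takes the leftmost 4 octets of the 20-octet SHA-1 digest and the IDSN the rightmost 8. The function name and the example key are illustrative.

```python
import hashlib
from typing import Tuple


def derive_token_and_idsn(key: bytes) -> Tuple[int, int]:
    """Token and IDSN for HMAC-SHA1 (bit "H"), from a 64-bit key."""
    if len(key) != 8:
        raise ValueError("the MPTCP key is 64 bits (8 octets)")
    digest = hashlib.sha1(key).digest()        # 20 octets, key hashed in network byte order
    token = int.from_bytes(digest[:4], "big")  # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")  # least significant (rightmost) 64 bits
    return token, idsn


if __name__ == "__main__":
    key_a = (0x0123456789ABCDEF).to_bytes(8, "big")  # example key only
    token_a, idsn_a = derive_token_and_idsn(key_a)
    print(f"Token-A = {token_a:#010x}, IDSN-A = {idsn_a:#018x}")
```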
Both the crypto and checksum bits negotiate capabilities in similar ways. For the Checksum Required bit (labeled "A"), if either host requires the use of checksums, checksums MUST be used. In other words, the only way for checksums not to be used is if both hosts set A=0 in their SYNs. This decision is confirmed by the setting of the "A" bit in the third packet (the ACK) of the handshake. For example, if the initiator sets A=0 in the SYN, but the responder sets A=1 in the SYN/ACK, checksums MUST be used in both directions, and the initiator will set A=1 in the ACK. The decision whether to use checksums will be stored by an implementation in a per-connection binary state variable.
For crypto negotiation, the responder has the choice. The initiator creates a proposal setting a bit for each algorithm it supports to 1 (in this version of the specification, there is only one proposal, so bit "H" will be always set to 1). The responder responds with only 1 bit set -- this is the chosen algorithm. The rationale for this behavior is that the responder will typically be a server with potentially many thousands of connections, so it may wish to choose an algorithm with minimal computational complexity, depending on the load. If a responder does not support (or does not want to support) any of the initiator's proposals, it can respond without an MP_CAPABLE option, thus forcing a fallback to regular TCP.
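Both negotiation rules can be expressed in a few lines. In this hedged Python sketch, checksum use is the logical OR of the "A" bits from the SYN and SYN/ACK, and the responder selects exactly one algorithm bit from the overlap of the initiator's proposal and its own support. The preference list (for example, cheapest algorithm first under load) is a hypothetical local policy; only bit "H" is defined by this specification.

```python
from typing import List, Optional

FLAG_H = 0x01       # HMAC-SHA1, the only algorithm defined here
CRYPTO_MASK = 0x3F  # bits "C" through "H"


def negotiate_checksum(syn_a_bit: bool, synack_a_bit: bool) -> bool:
    """Checksums are used unless both the SYN and the SYN/ACK carried A=0.
    The initiator echoes the result in the "A" bit of the third ACK."""
    return syn_a_bit or synack_a_bit


def choose_crypto(proposed: int, supported: int, preference: List[int]) -> Optional[int]:
    """Responder picks exactly one algorithm bit, or None to reply without
    MP_CAPABLE and force a fallback to regular TCP."""
    common = proposed & supported & CRYPTO_MASK
    for bit in preference:  # hypothetical local ordering, e.g. cheapest first
        if common & bit:
            return bit
    return None
```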
The MP_CAPABLE option is only used in the first subflow of a connection, in order to identify the connection; all following subflows will use the "Join" option (see Section 3.2) to join the existing connection.
If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it is assumed that the passive opener is not multipath capable; thus, the MPTCP session MUST operate as a regular, single-path TCP. If a SYN does not contain an MP_CAPABLE option, the SYN/ACK MUST NOT contain one in response. If the third packet (the ACK) does not contain the MP_CAPABLE option, then the session MUST fall back to operating as a regular, single-path TCP. This is to maintain compatibility with middleboxes on the path that drop some or all TCP options. Note that an implementation MAY choose to attempt sending MPTCP options more than one time before making this decision to operate as regular TCP (see Section 3.8).
If the SYN packets are unacknowledged, it is up to local policy to decide how to respond. It is expected that a sender will eventually fall back to single-path TCP (i.e., without the MP_CAPABLE option) in order to work around middleboxes that may drop packets with unknown options; however, the number of multipath-capable attempts that are made first will be up to local policy. It is possible that MPTCP and non-MPTCP SYNs could get reordered in the network. Therefore, the final state is inferred from the presence or absence of the MP_CAPABLE option in the third packet of the TCP handshake. If this option is not present, the connection SHOULD fall back to regular TCP, as documented in Section 3.6.
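The fallback rules can be summarized as a decision over the three handshake packets. This Python sketch takes the listener's point of view, since the listener sees the SYN, its own SYN/ACK, and the third ACK. The retry policy in `syn_carries_mp_capable` is a hypothetical example of the local policy mentioned above, not a recommended value.

```python
from enum import Enum


class Mode(Enum):
    MPTCP = "MPTCP"
    REGULAR_TCP = "regular TCP"


def handshake_outcome(syn_mpc: bool, synack_mpc: bool, ack_mpc: bool) -> Mode:
    """Listener's view: which packets carried the MP_CAPABLE option."""
    if not syn_mpc:
        if synack_mpc:
            raise ValueError("a SYN/ACK MUST NOT carry MP_CAPABLE if the SYN did not")
        return Mode.REGULAR_TCP
    if not synack_mpc:
        return Mode.REGULAR_TCP  # passive opener is not multipath capable
    if not ack_mpc:
        return Mode.REGULAR_TCP  # option stripped on the path, or SYNs reordered
    return Mode.MPTCP


def syn_carries_mp_capable(attempt: int, mptcp_attempts: int = 2) -> bool:
    """Hypothetical local policy: the first `mptcp_attempts` SYNs (attempt 0, 1, ...)
    carry MP_CAPABLE; later retransmissions are plain TCP SYNs."""
    return attempt < mptcp_attempts
```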
The initial data sequence number on an MPTCP connection is generated from the key. The algorithm for IDSN generation is also determined from the negotiated authentication algorithm. In this specification, with only the SHA-1 algorithm specified and selected, the IDSN of a host MUST be the least significant 64 bits of the SHA-1 hash of its key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This deterministic generation of the IDSN allows a receiver to ensure that there are no gaps in sequence space at the start of the connection. The SYN with MP_CAPABLE occupies the first octet of data sequence space, although this does not need to be acknowledged at the connection level until the first data is sent (see Section 3.3).
Once an MPTCP connection has begun with the MP_CAPABLE exchange, further subflows can be added to the connection. Hosts have knowledge of their own address(es), and can become aware of the other host's addresses through signaling exchanges as described in Section 3.4. Using this knowledge, a host can initiate a new subflow over a currently unused pair of addresses. It is permitted for either host in a connection to initiate the creation of a new subflow, but it is expected that this will normally be the original connection initiator (see Section 3.8 for heuristics).
A new subflow is started as a normal TCP SYN/ACK exchange. The Join Connection (MP_JOIN) TCP option is used to identify the connection to be joined by the new subflow. It uses keying material that was exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that handshake also negotiates the crypto algorithm in use for the MP_JOIN handshake.
This section specifies the behavior of MP_JOIN using the HMAC-SHA1 algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK of the three-way handshake, although in each case with a different format.
In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the initiator sends a token, random number, and address ID.
The token is used to identify the MPTCP connection and is a cryptographic hash of the receiver's key, as exchanged in the initial MP_CAPABLE handshake (Section 3.1). In this specification, the tokens presented in this option are generated by the SHA-1 ([4], [15]) algorithm, truncated to the most significant 32 bits. The token included in the MP_JOIN option is the token that the receiver of the packet uses to identify this connection; i.e., Host A will send Token-B (which is generated from Key-B). Note that the hash generation algorithm can be overridden by the choice of cryptographic handshake algorithm, as defined in Section 3.1.
The MP_JOIN SYN sends not only the token (which is static for a connection) but also random numbers (nonces) that are used to prevent replay attacks on the authentication method. Recommendations for the generation of random numbers for this purpose are given in [14].
The MP_JOIN option includes an "Address ID". This is an identifier that only has significance within a single connection, where it identifies the source address of this packet, even if the IP header has been changed in transit by a middlebox. The Address ID allows address removal (Section 3.4.2) without needing to know what the source address at the receiver is, thus allowing address removal through NATs. The Address ID also allows correlation between new subflow setup attempts and address signaling (Section 3.4.1), to prevent setting up duplicate subflows on the same path, if an MP_JOIN and ADD_ADDR are sent at the same time.
The Address IDs of the subflow used in the initial SYN exchange of the first subflow in the connection are implicit, and have the value zero. A host MUST store the mappings between Address IDs and addresses both for itself and the remote host. An implementation will also need to know which local and remote Address IDs are associated with which established subflows, for when addresses are removed from a local or remote host.
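One possible shape for the per-connection Address ID state is sketched below in Python. `AddressBook`, its fields, and the string subflow handles are hypothetical. The only rules taken from the text are that the first subflow uses Address ID 0 on both sides, and that removing an address by ID must identify the subflows that used it.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Tuple


@dataclass
class AddressBook:
    """Hypothetical per-connection Address ID state."""
    local: Dict[int, str] = field(default_factory=dict)   # local ID -> address
    remote: Dict[int, str] = field(default_factory=dict)  # remote ID -> address
    # subflow handle (e.g. its 5-tuple) -> (local Address ID, remote Address ID)
    subflows: Dict[Hashable, Tuple[int, int]] = field(default_factory=dict)

    def start(self, local_addr: str, remote_addr: str, first_subflow: Hashable) -> None:
        # The initial subflow's Address IDs are implicit and equal to 0.
        self.local[0] = local_addr
        self.remote[0] = remote_addr
        self.subflows[first_subflow] = (0, 0)

    def add_subflow(self, handle: Hashable, local_id: int, remote_id: int) -> None:
        self.subflows[handle] = (local_id, remote_id)

    def remove_remote(self, remote_id: int) -> List[Hashable]:
        """On removal of a remote address by ID: forget it and return affected subflows."""
        self.remote.pop(remote_id, None)
        affected = [h for h, (_, rid) in self.subflows.items() if rid == remote_id]
        for h in affected:
            del self.subflows[h]
        return affected
```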
The MP_JOIN option on packets with the SYN flag set also includes 4 bits of flags, 3 of which are currently reserved and MUST be set to zero by the sender. The final bit, labeled "B", indicates whether the sender of this option wishes this subflow to be used as a backup path (B=1) in the event of failure of other paths, or whether it wants it to be used as part of the connection immediately. By setting B=1, the sender of the option is requesting the other host to only send data on this subflow if there are no available subflows where B=0. Subflow policy is discussed in more detail in Section 3.3.8.
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-----+-+---------------+
   |     Kind      |  Length = 12  |Subtype|     |B|   Address ID  |
   +---------------+---------------+-------+-----+-+---------------+
   |                   Receiver's Token (32 bits)                  |
   +---------------------------------------------------------------+
   |                Sender's Random Number (32 bits)               |
   +---------------------------------------------------------------+
Figure 5: Join Connection (MP_JOIN) Option (for Initial SYN)
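A minimal Python sketch of encoding and decoding the SYN form of MP_JOIN (Figure 5) with `struct`. It assumes Kind 30 and subtype 1 for MP_JOIN, places the "B" flag in the lowest bit of the third octet with the three reserved bits left at zero, and draws the 32-bit random number from `secrets`. The function names are illustrative.

```python
import secrets
import struct
from typing import Optional, Tuple

MPTCP_KIND = 30
MP_JOIN = 0x1


def build_mp_join_syn(receiver_token: int, address_id: int, backup: bool = False,
                      nonce: Optional[int] = None) -> Tuple[bytes, int]:
    """Return the 12-octet option and the 32-bit random number it carries."""
    if nonce is None:
        nonce = secrets.randbits(32)  # R-A; must be kept for the HMAC check
    third = (MP_JOIN << 4) | (0x01 if backup else 0x00)  # 3 reserved bits left at 0
    option = struct.pack("!BBBBII", MPTCP_KIND, 12, third, address_id,
                         receiver_token, nonce)
    return option, nonce


def parse_mp_join_syn(option: bytes) -> dict:
    kind, length, third, address_id, token, nonce = struct.unpack("!BBBBII", option[:12])
    if kind != MPTCP_KIND or length != 12 or third >> 4 != MP_JOIN:
        raise ValueError("not an MP_JOIN option in its SYN form")
    return {"backup": bool(third & 0x01), "address_id": address_id,
            "token": token, "nonce": nonce}
```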
When receiving a SYN with an MP_JOIN option that contains a valid token for an existing MPTCP connection, the recipient SHOULD respond with a SYN/ACK that also carries an MP_JOIN option, containing a random number and a truncated (leftmost 64 bits) Hash-based Message Authentication Code (HMAC). This version of the option is shown in Figure 6. If the token is unknown, or the host wants to refuse subflow establishment (for example, due to a limit on the number of subflows it will permit), the receiver will send back a reset (RST) signal, analogous to an unknown port in TCP. Although calculating an HMAC requires cryptographic operations, it is believed that the 32-bit token in the MP_JOIN SYN gives sufficient protection against blind state exhaustion attacks; therefore, there is no need to provide mechanisms to allow a responder to operate statelessly at the MP_JOIN stage.
An HMAC is sent by both hosts -- by the initiator (Host A) in the third packet (the ACK) and by the responder (Host B) in the second packet (the SYN/ACK). Doing the HMAC exchange at this stage allows both hosts to have first exchanged random data (in the first two SYN packets) that is used as the "message". This specification defines that HMAC as defined in [10] is used, along with the SHA-1 hash algorithm [4] (potentially implemented as in [15]), thus generating a 160-bit / 20-octet HMAC. Due to option space limitations, the HMAC included in the SYN/ACK is truncated to the leftmost 64 bits, but this is acceptable since random numbers are used; thus, an attacker only has one chance to guess the HMAC correctly (if the HMAC is incorrect, the TCP connection is closed, so a new MP_JOIN negotiation with a new random number is required).
The initiator's authentication information is sent in its first ACK (the third packet of the handshake), as shown in Figure 7. This data needs to be sent reliably, since it is the only time this HMAC is sent; therefore, receipt of this packet MUST trigger a regular TCP ACK in response, and the packet MUST be retransmitted if this ACK is not received. In other words, sending the ACK/MP_JOIN packet places the subflow in the PRE_ESTABLISHED state, and it moves to the ESTABLISHED state only on receipt of an ACK from the receiver. It is not permitted to send data while in the PRE_ESTABLISHED state. The reserved bits in this option MUST be set to zero by the sender.
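The PRE_ESTABLISHED rule can be modeled as a small state machine on the initiator. This Python sketch is hypothetical and covers only the transitions described here: sending ACK + MP_JOIN(HMAC-A) enters PRE_ESTABLISHED, a returning ACK moves the subflow to ESTABLISHED, and data is refused until then.

```python
from enum import Enum, auto


class JoinState(Enum):
    SYN_SENT = auto()
    PRE_ESTABLISHED = auto()
    ESTABLISHED = auto()


class JoinInitiator:
    """Hypothetical initiator-side state for one MP_JOIN subflow."""

    def __init__(self) -> None:
        self.state = JoinState.SYN_SENT

    def send_third_ack(self) -> None:
        # ACK + MP_JOIN(HMAC-A) has been sent; this is the only copy of HMAC-A.
        self.state = JoinState.PRE_ESTABLISHED

    def on_ack(self) -> None:
        # The responder's regular TCP ACK confirms that HMAC-A was received.
        if self.state is JoinState.PRE_ESTABLISHED:
            self.state = JoinState.ESTABLISHED

    def must_retransmit_third_ack(self, timer_expired: bool) -> bool:
        return self.state is JoinState.PRE_ESTABLISHED and timer_expired

    def may_send_data(self) -> bool:
        return self.state is JoinState.ESTABLISHED
```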
The key for the HMAC algorithm, in the case of the message transmitted by Host A, will be Key-A followed by Key-B, and in the case of Host B, Key-B followed by Key-A. These are the keys that were exchanged in the original MP_CAPABLE handshake. The "message" for the HMAC algorithm in each case is the concatenation of the two hosts' random numbers (denoted by R): for Host A, R-A followed by R-B; and for Host B, R-B followed by R-A.
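These key and message orderings can be checked with Python's standard `hmac` and `hashlib` modules. The sketch assumes each 32-bit random number is encoded in network byte order, as it appears in the option; the helper names are hypothetical.

```python
import hashlib
import hmac
import struct


def mp_join_hmac(own_key: bytes, peer_key: bytes, own_nonce: int, peer_nonce: int) -> bytes:
    """160-bit HMAC-SHA1 with Key = own key || peer key, Msg = own R || peer R."""
    key = own_key + peer_key
    msg = struct.pack("!II", own_nonce, peer_nonce)  # assumed network byte order
    return hmac.new(key, msg, hashlib.sha1).digest()


def truncate64(mac: bytes) -> bytes:
    """Leftmost 64 bits, as carried in the SYN/ACK (Figure 6)."""
    return mac[:8]


def check(expected: bytes, received: bytes) -> bool:
    """Constant-time comparison; a mismatch means the subflow is reset."""
    return hmac.compare_digest(expected, received)
```

In this sketch, Host B would send `truncate64(mp_join_hmac(key_b, key_a, r_b, r_a))` in the SYN/ACK, Host A would recompute the same value and compare it with `check`, and Host A's third ACK would carry the full `mp_join_hmac(key_a, key_b, r_a, r_b)`.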
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-----+-+---------------+
   |     Kind      |  Length = 16  |Subtype|     |B|   Address ID  |
   +---------------+---------------+-------+-----+-+---------------+
   |                                                               |
   |                Sender's Truncated HMAC (64 bits)              |
   |                                                               |
   +---------------------------------------------------------------+
   |                Sender's Random Number (32 bits)               |
   +---------------------------------------------------------------+
Figure 6: Join Connection (MP_JOIN) Option (for Responding SYN/ACK)
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-----------------------+
   |     Kind      |  Length = 24  |Subtype|      (reserved)       |
   +---------------+---------------+-------+-----------------------+
   |                                                               |
   |                                                               |
   |                   Sender's HMAC (160 bits)                    |
   |                                                               |
   |                                                               |
   +---------------------------------------------------------------+
Figure 7: Join Connection (MP_JOIN) Option (for Third ACK)
These various TCP options fit together to enable authenticated subflow setup as illustrated in Figure 8.
            Host A                                  Host B
   ------------------------                       ----------
   Address A1    Address A2                       Address B1
   ----------    ----------                       ----------
       |             |                                |
       |     SYN + MP_CAPABLE(Key-A)                  |
       |--------------------------------------------->|
       |<---------------------------------------------|
       |          SYN/ACK + MP_CAPABLE(Key-B)         |
       |             |                                |
       |        ACK + MP_CAPABLE(Key-A, Key-B)        |
       |--------------------------------------------->|
       |             |                                |
       |             |   SYN + MP_JOIN(Token-B, R-A)  |
       |             |------------------------------->|
       |             |<-------------------------------|
       |             | SYN/ACK + MP_JOIN(HMAC-B, R-B) |
       |             |                                |
       |             |     ACK + MP_JOIN(HMAC-A)      |
       |             |------------------------------->|
       |             |<-------------------------------|
       |             |             ACK                |

   HMAC-A = HMAC(Key=(Key-A+Key-B), Msg=(R-A+R-B))
   HMAC-B = HMAC(Key=(Key-B+Key-A), Msg=(R-B+R-A))
Figure 8: Example Use of MPTCP Authentication
If the token received at Host B is unknown or local policy prohibits the acceptance of the new subflow, the recipient MUST respond with a TCP RST for the subflow.
If the token is accepted at Host B, but the HMAC returned to Host A does not match the one expected, Host A MUST close the subflow with a TCP RST.
If Host B does not receive the expected HMAC, or the MP_JOIN option is missing from the ACK, it MUST close the subflow with a TCP RST.
If the HMACs are verified as correct, then both hosts have authenticated each other as being the same peers as existed at the start of the connection, and they have agreed on which connection this subflow will become part of.
If the SYN/ACK as received at Host A does not have an MP_JOIN option, Host A MUST close the subflow with a RST.
This covers all cases of the loss of an MP_JOIN. In more detail, if MP_JOIN is stripped from the SYN on the path from A to B, and Host B does not have a passive opener on the relevant port, it will respond with a RST in the normal way. If in response to a SYN with an MP_JOIN option, a SYN/ACK is received without the MP_JOIN option (either since it was stripped on the return path, or it was stripped on the outgoing path but the passive opener on Host B responded as if it were a new regular TCP session), then the subflow is unusable and Host A MUST close it with a RST.
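The RST rules in this part of the section can be collected into three small decision functions, one per packet of the MP_JOIN handshake. This is a hedged Python sketch: HMAC verification and policy checks are abstracted as booleans, and the returned strings only name the packet to send; the function names are hypothetical.

```python
from typing import Set


def responder_on_join_syn(token: int, known_tokens: Set[int], policy_allows: bool) -> str:
    """Host B: unknown token or refused by local policy -> RST."""
    if token not in known_tokens or not policy_allows:
        return "RST"
    return "SYN/ACK + MP_JOIN(HMAC-B, R-B)"


def initiator_on_synack(has_mp_join: bool, hmac_b_valid: bool) -> str:
    """Host A: MP_JOIN missing (stripped, or B answered as plain TCP) or bad HMAC-B -> RST."""
    if not has_mp_join or not hmac_b_valid:
        return "RST"
    return "ACK + MP_JOIN(HMAC-A)"


def responder_on_third_ack(has_mp_join: bool, hmac_a_valid: bool) -> str:
    """Host B: MP_JOIN missing or bad HMAC-A -> RST; otherwise acknowledge."""
    if not has_mp_join or not hmac_a_valid:
        return "RST"
    return "ACK"
```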
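The RST rules above can be summarized as a small decision sketch. The function names are hypothetical, as are the set of known tokens and the boolean inputs (whether MP_JOIN was present, and whether the received HMAC matched the expected value). HMAC computation and option parsing are assumed to happen elsewhere.

```python
from typing import Optional

RST = "RST"


def on_join_syn(token: int, known_tokens: set, policy_allows: bool) -> Optional[str]:
    """Host B, on SYN + MP_JOIN: reset if the token is unknown or policy refuses."""
    if token not in known_tokens or not policy_allows:
        return RST
    return None  # continue: reply with SYN/ACK + MP_JOIN(HMAC-B, R-B)


def on_join_reply(has_mp_join: bool, hmac_matches: bool) -> Optional[str]:
    """Host A on the SYN/ACK, or Host B on the third ACK.

    In both cases a missing MP_JOIN option or an unexpected HMAC means the
    subflow is unusable and MUST be closed with a RST.
    """
    if not has_mp_join or not hmac_matches:
        return RST
    return None  # continue: Host A sends ACK + MP_JOIN(HMAC-A); Host B accepts the subflow
```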
Note that additional subflows can be created between any pair of ports (but see Section 3.8 for heuristics); no explicit application-level accept calls or bind calls are required to open additional subflows. To associate a new subflow with an existing connection, the token supplied in the subflow's SYN exchange is used for demultiplexing. This then binds the 5-tuple of the TCP subflow to the local token of the connection. A consequence is that any port pair can be used for a connection.
Demultiplexing subflow SYNs MUST be done using the token; this is unlike traditional TCP, where the destination port is used for demultiplexing SYN packets. Once a subflow is set up, demultiplexing packets is done using the 5-tuple, as in traditional TCP. The 5-tuples will be mapped to the local connection identifier (token). Note that Host A will know its local token for the subflow even though it is not sent on the wire -- only the responder's token is sent.
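The two-stage demultiplexing described above can be sketched with two lookup tables: one keyed by local token and used only for MP_JOIN SYNs, and one keyed by 5-tuple and used for every other segment. This is a minimal sketch. The class and method names are hypothetical, and address and port values are plain Python values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# (local address, local port, remote address, remote port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


@dataclass
class Connection:
    local_token: int
    subflows: Set[FiveTuple] = field(default_factory=set)


class Demux:
    """Hypothetical per-host lookup tables for MPTCP demultiplexing."""

    def __init__(self) -> None:
        self.by_token: Dict[int, Connection] = {}
        self.by_tuple: Dict[FiveTuple, Connection] = {}

    def add_connection(self, local_token: int, first_subflow: FiveTuple) -> Connection:
        conn = Connection(local_token, {first_subflow})
        self.by_token[local_token] = conn
        self.by_tuple[first_subflow] = conn
        return conn

    def on_join_syn(self, five_tuple: FiveTuple, token: int) -> Optional[Connection]:
        """SYN + MP_JOIN: demultiplex on the token, not on the destination port."""
        conn = self.by_token.get(token)
        if conn is not None:
            conn.subflows.add(five_tuple)
            self.by_tuple[five_tuple] = conn  # bind the 5-tuple to the local token
        return conn  # None: unknown token, so the caller responds with a RST

    def on_segment(self, five_tuple: FiveTuple) -> Optional[Connection]:
        """Any other segment: demultiplex on the 5-tuple, as in regular TCP."""
        return self.by_tuple.get(five_tuple)
```

Because the lookup for a join is by token, the new subflow's ports need not match those of the first subflow.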
This section discusses operation of MPTCP for data transfer. At a high level, an MPTCP implementation will take one input data stream from an application, and split it into one or more subflows, with sufficient control information to allow it to be reassembled and delivered reliably and in order to the recipient application. The following subsections define this behavior in detail.
The data sequence mapping and the Data ACK are signaled in the Data Sequence Signal (DSS) option (Figure 9). Either or both can be signaled in one DSS, dependent on the flags set. The data sequence mapping defines how the sequence space on the subflow maps to the connection level, and the Data ACK acknowledges receipt of data at the connection level. These functions are described in more detail in the following two subsections.
Either or both the data sequence mapping and the Data ACK can be signaled in the DSS option, dependent on the flags set.
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+----------------------+
   |     Kind      |    Length     |Subtype| (reserved) |F|m|M|a|A|
   +---------------+---------------+-------+----------------------+
   |           Data ACK (4 or 8 octets, depending on flags)       |
   +--------------------------------------------------------------+
   |   Data sequence number (4 or 8 octets, depending on flags)   |
   +--------------------------------------------------------------+
   |              Subflow Sequence Number (4 octets)              |
   +-------------------------------+------------------------------+
   |  Data-Level Length (2 octets) |      Checksum (2 octets)     |
   +-------------------------------+------------------------------+
Figure 9: Data Sequence Signal (DSS) Option
The flags, when set, define the contents of this option, as follows:
o A = Data ACK present
o a = Data ACK is 8 octets (if not set, Data ACK is 4 octets)
o M = Data Sequence Number (DSN), Subflow Sequence Number (SSN), Data-Level Length, and Checksum present
o m = Data sequence number is 8 octets (if not set, DSN is 4 octets)
The flags 'a' and 'm' only have meaning if the corresponding 'A' or 'M' flags are set; otherwise, they will be ignored. The maximum length of this option, with all flags set, is 28 octets.
The 'F' flag indicates "DATA_FIN". If present, this means that this mapping covers the final data from the sender. This is the connection-level equivalent to the FIN flag in single-path TCP. A connection is not closed unless there has been a DATA_FIN exchange or a timeout. The purpose of the DATA_FIN and the interactions between this flag, the subflow-level FIN flag, and the data sequence mapping are described in Section 3.3.3. The remaining reserved bits MUST be set to zero by an implementation of this specification.
Note that the checksum is only present in this option if the use of MPTCP checksumming has been negotiated at the MP_CAPABLE handshake (see Section 3.1). The presence of the checksum can be inferred from the length of the option. If a checksum is present, but its use had not been negotiated in the MP_CAPABLE handshake, the checksum field MUST be ignored. If a checksum is not present when its use has been negotiated, the receiver MUST close the subflow with a RST as it is considered broken.
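The flag rules and the length-based checksum inference above can be sketched as a length calculator. The flag bit values are read off Figure 9, with 'A' as the least significant bit. The constant and function names are hypothetical.

```python
# Flag bits as laid out in Figure 9, with 'A' as the least significant bit.
FLAG_DATA_ACK = 0x01    # A: Data ACK present
FLAG_DATA_ACK_8 = 0x02  # a: Data ACK is 8 octets
FLAG_MAPPING = 0x04     # M: DSN, SSN, Data-Level Length (and Checksum) present
FLAG_DSN_8 = 0x08       # m: DSN is 8 octets
FLAG_DATA_FIN = 0x10    # F: DATA_FIN


def dss_length(flags: int, checksum_negotiated: bool) -> int:
    """Expected DSS option length in octets for the given flags."""
    length = 4  # Kind, Length, Subtype/reserved, reserved/flags
    if flags & FLAG_DATA_ACK:
        length += 8 if flags & FLAG_DATA_ACK_8 else 4
    if flags & FLAG_MAPPING:
        length += 8 if flags & FLAG_DSN_8 else 4  # data sequence number
        length += 4 + 2                            # subflow sequence number, Data-Level Length
        if checksum_negotiated:
            length += 2                            # checksum
    return length


def checksum_present(flags: int, option_length: int) -> bool:
    """Infer from the option length whether a checksum field is present."""
    return bool(flags & FLAG_MAPPING) and option_length == dss_length(flags, True)
```

With all flags set and checksums negotiated, dss_length returns the 28-octet maximum stated above. A receiver that negotiated checksums and finds checksum_present() false would, per the rule above, close the subflow with a RST.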
The data stream as a whole can be reassembled through the use of the data sequence mapping components of the DSS option (Figure 9), which define the mapping from the subflow sequence number to the data sequence number. This is used by the receiver to ensure in-order delivery to the application layer. Meanwhile, the subflow-level sequence numbers (i.e., the regular sequence numbers in the TCP header) have subflow-only relevance. It is expected (but not mandated) that SACK [11] is used at the subflow level to improve efficiency.
The data sequence mapping specifies a mapping from subflow sequence space to data sequence space. This is expressed in terms of starting sequence numbers for the subflow and the data level, and a length of bytes for which this mapping is valid. This explicit mapping for a range of data was chosen rather than per-packet signaling to assist with compatibility with situations where TCP/IP segmentation or coalescing is undertaken separately from the stack that is generating the data flow (e.g., through the use of TCP segmentation offloading on network interface cards, or by middleboxes such as performance enhancing proxies). It also allows a single mapping to cover many packets, which may be useful in bulk transfer situations.
A mapping is fixed, in that the subflow sequence number is bound to the data sequence number after the mapping has been processed. A sender MUST NOT change this mapping after it has been declared; however, the same data sequence number can be mapped to by different subflows for retransmission purposes (see Section 3.3.6). This would also permit the same data to be sent simultaneously on multiple subflows for resilience or efficiency purposes, especially in the case of lossy links. Although the detailed specification of such operation is outside the scope of this document, an implementation SHOULD treat the first data received on any subflow for a given range of the data sequence space as the data to be delivered to the application, and ignore any later data for that range.
The data sequence number is specified as an absolute value, whereas the subflow sequence numbering is relative (the SYN at the start of the subflow has relative subflow sequence number 0). This is to allow middleboxes to change the initial sequence number of a subflow, such as firewalls that undertake ISN randomization.
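A minimal sketch of how a receiver might apply one mapping follows. It first converts an absolute subflow sequence number to a relative one, where the SYN is 0, and then translates it to a data sequence number. The names Mapping, relative_ssn, and to_dsn are hypothetical. The sketch does not handle the infinite mapping (Data-Level Length 0).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    dsn: int     # absolute data sequence number of the first mapped octet
    ssn: int     # relative subflow sequence number of the first mapped octet
    length: int  # Data-Level Length in octets


def relative_ssn(absolute_seq: int, subflow_isn: int) -> int:
    """Relative subflow sequence number: the SYN (at the ISN) is 0, modulo 2**32."""
    return (absolute_seq - subflow_isn) % (1 << 32)


def to_dsn(mapping: Mapping, rel_ssn: int) -> Optional[int]:
    """Translate a relative subflow sequence number to a data sequence number."""
    offset = rel_ssn - mapping.ssn
    if 0 <= offset < mapping.length:
        return mapping.dsn + offset
    return None  # not covered by this mapping
```

Because mappings use relative subflow sequence numbers, a middlebox that rewrites the subflow's ISN does not invalidate them. Only the subtraction in relative_ssn depends on the ISN that the receiver actually observed.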
The data sequence mapping also contains a checksum of the data that this mapping covers, if use of checksums has been negotiated at the MP_CAPABLE exchange. Checksums are used to detect if the payload has been adjusted in any way by a non-MPTCP-aware middlebox. If this checksum fails, it will trigger a failure of the subflow, or a fallback to regular TCP, as documented in Section 3.6, since MPTCP can no longer reliably know the subflow sequence space at the receiver to build data sequence mappings.
The checksum algorithm used is the standard TCP checksum [1], operating over the data covered by this mapping, along with a pseudo-header as shown in Figure 10.
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +--------------------------------------------------------------+
   |                                                              |
   |                Data Sequence Number (8 octets)               |
   |                                                              |
   +--------------------------------------------------------------+
   |              Subflow Sequence Number (4 octets)              |
   +-------------------------------+------------------------------+
   |  Data-Level Length (2 octets) |        Zeros (2 octets)      |
   +-------------------------------+------------------------------+
Figure 10: Pseudo-Header for DSS Checksum
Note that the data sequence number used in the pseudo-header is always the 64-bit value, irrespective of what length is used in the DSS option itself. The standard TCP checksum algorithm has been chosen since it will be calculated anyway for the TCP subflow, and if calculated first over the data before adding the pseudo-headers, it only needs to be calculated once. Furthermore, since the TCP checksum is additive, the checksum for a DSN_MAP can be constructed by simply adding together the checksums for the data of each constituent TCP segment, and adding the checksum for the DSS pseudo-header.
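The DSS checksum can be sketched as the standard 16-bit ones' complement sum. It is taken over the mapped data and then over the Figure 10 pseudo-header, which always carries the 64-bit DSN. This is a minimal sketch. It assumes that odd-length data is zero-padded at the end, as in the regular TCP checksum. It also assumes that the additivity property is exercised only on segments of even length, since odd-length pieces would need byte swapping. The function names are hypothetical.

```python
import struct


def fold_add(a: int, b: int) -> int:
    """16-bit ones' complement addition with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """Ones' complement sum of 16-bit big-endian words (odd length zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = initial
    for i in range(0, len(data), 2):
        total = fold_add(total, (data[i] << 8) | data[i + 1])
    return total


def dss_checksum(payload: bytes, dsn64: int, rel_ssn: int, data_level_length: int) -> int:
    # Pseudo-header of Figure 10: 64-bit DSN, 32-bit SSN, 16-bit length, 16 zero bits.
    pseudo = struct.pack("!QIHH", dsn64, rel_ssn, data_level_length, 0)
    total = ones_complement_sum(payload)        # the data sum, shareable with the subflow checksum
    total = ones_complement_sum(pseudo, total)  # then add the pseudo-header
    return ~total & 0xFFFF
```

Per-segment sums can be combined with fold_add, which is the additive property the text relies on. A receiver can check a mapping by summing the data, the pseudo-header, and the received checksum and expecting 0xFFFF.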
Note that checksumming relies on the TCP subflow containing contiguous data; therefore, a TCP subflow MUST NOT use the Urgent Pointer to interrupt an existing mapping. Further note, however, that if Urgent data is received on a subflow, it SHOULD be mapped to the data sequence space and delivered to the application analogous to Urgent data in regular TCP.
To avoid possible deadlock scenarios, subflow-level processing should be undertaken separately from that at connection level. Therefore, even if a mapping does not exist from the subflow space to the data-level space, the data SHOULD still be ACKed at the subflow (if it is in-window). This data cannot, however, be acknowledged at the data level (Section 3.3.2) because its data sequence numbers are unknown. Implementations MAY hold onto such unmapped data for a short while in the expectation that a mapping will arrive shortly. Such unmapped data cannot be counted as being within the connection-level receive window because this is relative to the data sequence numbers, so if the receiver runs out of memory to hold this data, it will have to be discarded. If a mapping for that subflow-level sequence space does not arrive within a receive window of data, that subflow SHOULD be treated as broken, closed with a RST, and any unmapped data silently discarded.
Data sequence numbers are always 64-bit quantities and MUST be maintained as such in implementations. If a connection is progressing at a slow enough rate that protection against wrapped sequence numbers is not required, then it is permissible to include just the lower 32 bits of the data sequence number in the data sequence mapping and/or Data ACK as an optimization, and an implementation can make this choice independently for each packet.
An implementation MUST send the full 64-bit data sequence number if it is transmitting at a sufficiently high rate that the 32-bit value could wrap within the Maximum Segment Lifetime (MSL) [16]. The lengths of the DSNs used in these values (which may be different) are declared with flags in the DSS option. Implementations MUST accept a 32-bit DSN and implicitly promote it to a 64-bit quantity by incrementing the upper 32 bits of sequence number each time the lower 32 bits wrap. A sanity check MUST be implemented to ensure that a wrap occurs at an expected time (e.g., the sequence number jumps from a very high number to a very low number) and is not triggered by out-of-order packets.
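One possible way to promote a received 32-bit DSN (or Data ACK) to 64 bits is to pick the 64-bit value nearest to a recent reference on the same sequence space, such as the next expected DSN. This "nearest candidate" rule is an assumption standing in for the required sanity check, not a mechanism defined here. The function name promote_dsn32 is hypothetical.

```python
HIGH32 = 0xFFFFFFFF_00000000


def promote_dsn32(low32: int, reference64: int) -> int:
    """Promote a received 32-bit DSN (or Data ACK) to 64 bits.

    reference64 is a recent 64-bit value on the same sequence space. The
    candidate closest to it wins, so a wrap is only inferred when that is the
    nearer interpretation (the low bits jumped by more than half of the 32-bit
    space). An out-of-order value from just before a wrap therefore maps back
    to the previous upper half instead of triggering a new wrap.
    """
    base = reference64 & HIGH32
    candidates = (base - (1 << 32) + low32, base + low32, base + (1 << 32) + low32)
    valid = [c for c in candidates if 0 <= c < (1 << 64)]
    return min(valid, key=lambda c: abs(c - reference64))
```

For example, promote_dsn32(0x10, 0x1_FFFF_FFF0) returns 0x2_0000_0010 (a wrap). A late packet with lower bits 0xFFFFFFE0 arriving after that point is promoted to 0x1_FFFF_FFE0 rather than being treated as another wrap.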
As with the standard TCP sequence number, the data sequence number should not start at zero, but at a random value to make blind session hijacking harder. This specification requires setting the initial data sequence number (IDSN) of each host to the least significant 64 bits of the SHA-1 hash of the host's key, as described in Section 3.1.
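The IDSN derivation can be sketched with the standard library. Section 3.1 derives the token from the same SHA-1 hash (its most significant 32 bits), so both are shown together. The sketch assumes the 64-bit key is hashed as 8 octets in network byte order. The function name is hypothetical.

```python
import hashlib


def idsn_and_token(key: int) -> tuple:
    """Derive (IDSN, token) from a host's 64-bit key.

    Assumes the key is hashed as 8 octets in network byte order.
    """
    digest = hashlib.sha1(key.to_bytes(8, "big")).digest()  # 20 octets
    idsn = int.from_bytes(digest[-8:], "big")  # least significant 64 bits
    token = int.from_bytes(digest[:4], "big")  # most significant 32 bits
    return idsn, token
```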
A data sequence mapping does not need to be included in every MPTCP packet, as long as the subflow sequence space in that packet is covered by a mapping known at the receiver. This can be used to reduce overhead in cases where the mapping is known in advance; one such case is when there is a single subflow between the hosts, another is when segments of data are scheduled in larger than packet-sized chunks.
An "infinite" mapping can be used to fall back to regular TCP by mapping the subflow-level data to the connection-level data for the remainder of the connection (see Section 3.6). This is achieved by setting the Data-Level Length field of the DSS option to the reserved value of 0. The checksum, in such a case, will also be set to zero.
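A sketch of encoding a mapping-only DSS option, including the infinite mapping, follows. It sets the 'M' and 'm' flags (64-bit DSN) and sends no Data ACK. Two values are taken from the MPTCP option definitions elsewhere in this specification rather than from this section: the TCP option kind 30 and the DSS subtype 0x2. The function names are hypothetical.

```python
import struct
from typing import Optional

MPTCP_KIND = 30     # TCP option kind for MPTCP (from the option definitions)
DSS_SUBTYPE = 0x2   # DSS subtype (from the option definitions)
FLAG_MAPPING, FLAG_DSN_8, FLAG_DATA_FIN = 0x04, 0x08, 0x10


def build_dss_mapping(dsn: int, rel_ssn: int, data_level_length: int,
                      checksum: Optional[int], data_fin: bool = False) -> bytes:
    """Encode a DSS option carrying only a mapping with a 64-bit DSN."""
    flags = FLAG_MAPPING | FLAG_DSN_8 | (FLAG_DATA_FIN if data_fin else 0)
    body = struct.pack("!QIH", dsn, rel_ssn, data_level_length)
    if checksum is not None:  # present only if checksums were negotiated
        body += struct.pack("!H", checksum)
    return bytes([MPTCP_KIND, 4 + len(body), DSS_SUBTYPE << 4, flags]) + body


def build_infinite_mapping(dsn: int, rel_ssn: int, checksum_negotiated: bool) -> bytes:
    """Infinite mapping: Data-Level Length set to the reserved value 0, checksum 0."""
    return build_dss_mapping(dsn, rel_ssn, 0, 0 if checksum_negotiated else None)
```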
To provide full end-to-end resilience, MPTCP provides a connection-level acknowledgment, to act as a cumulative ACK for the connection as a whole. This is the "Data ACK" field of the DSS option (Figure 9). The Data ACK is analogous to the behavior of the standard TCP cumulative ACK -- indicating how much data has been successfully received (with no holes). This is in comparison to the subflow-level ACK, which acts analogous to TCP SACK, given that there may still be holes in the data stream at the connection level. The Data ACK specifies the next data sequence number it expects to receive.
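The cumulative behavior of the Data ACK can be sketched as follows. Given the (DSN, length) ranges received at the connection level, the Data ACK advances only across contiguous data and stops at the first hole. The function name and the list-of-ranges representation are hypothetical.

```python
from typing import List, Tuple


def advance_data_ack(data_ack: int, received: List[Tuple[int, int]]) -> int:
    """Return the next DSN expected, given (dsn, length) ranges received at data level."""
    for dsn, length in sorted(received):
        if dsn > data_ack:
            break  # hole: the cumulative Data ACK cannot move past it
        data_ack = max(data_ack, dsn + length)
    return data_ack
```

For example, with a Data ACK of 100 and ranges (100, 50), (150, 50), and (300, 10), the Data ACK advances to 200. The range starting at 300 remains out of order until the hole is filled, even if it has already been ACKed at subflow level.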
The Data ACK, as for the DSN, can be sent as the full 64-bit value, or as the lower 32 bits. If data is received with a 64-bit DSN, it MUST be acknowledged with a 64-bit Data ACK. If the DSN received is 32 bits, it is valid for the implementation to choose whether to send a 32-bit or 64-bit Data ACK.
The Data ACK proves that the data, and all required MPTCP signaling, has been received and accepted by the remote end. One key use of the Data ACK signal is that it is used to indicate the left edge of the advertised receive window. As explained in Section 3.3.4, the receive window is shared by all subflows and is relative to the Data ACK. Because of this, an implementation MUST NOT use the RCV.WND field of a TCP segment at the connection level if it does not also carry a DSS option with a Data ACK field. Furthermore, separating the connection-level acknowledgments from the subflow level allows processing to be done separately, and a receiver has the freedom to drop segments after acknowledgment at the subflow level, for example, due to memory constraints when many segments arrive out of order.
An MPTCP sender MUST NOT free data from the send buffer until it has been acknowledged by both a Data ACK received on any subflow and at the subflow level by all subflows on which the data was sent. The former condition ensures liveness of the connection and the latter condition ensures liveness and self-consistence of a subflow when data needs to be retransmitted. Note, however, that if some data needs to be retransmitted multiple times over a subflow, there is a risk of blocking the sending window. In this case, the MPTCP sender can decide to terminate the subflow that is behaving badly by sending a RST.
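The two conditions for freeing send-buffer data can be expressed as a single predicate over a sent chunk. The chunk must be covered by a Data ACK, and it must be ACKed at subflow level on every subflow it was sent on. The record layout and names are hypothetical, and DSN wrap is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SentChunk:
    dsn: int
    length: int
    sent_on: set = field(default_factory=set)        # subflow ids the data was sent on
    subflow_acked: set = field(default_factory=set)  # subflow ids that ACKed it at subflow level


def can_free(chunk: SentChunk, data_ack: int) -> bool:
    """Free only if DATA_ACKed and ACKed at subflow level on every subflow it was sent on."""
    data_acked = chunk.dsn + chunk.length <= data_ack
    return data_acked and chunk.sent_on <= chunk.subflow_acked  # subset test
```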
The Data ACK MAY be included in all segments; however, optimizations SHOULD be considered in more advanced implementations, where the Data ACK is present in segments only when the Data ACK value advances, and this behavior MUST be treated as valid. This behavior ensures the sender buffer is freed, while reducing overhead when the data transfer is unidirectional.
In regular TCP, a FIN announces to the receiver that the sender has no more data to send. In order to allow subflows to operate independently and to keep the appearance of TCP over the wire, a FIN in MPTCP only affects the subflow on which it is sent. This allows nodes to exercise considerable freedom over which paths are in use at any one time. The semantics of a FIN remain as for regular TCP; i.e., the subflow is not fully closed until both sides have ACKed each other's FINs.
When an application calls close() on a socket, this indicates that it has no more data to send; for regular TCP, this would result in a FIN on the connection. For MPTCP, an equivalent mechanism is needed, and this is referred to as the DATA_FIN.
A DATA_FIN is an indication that the sender has no more data to send, and as such can be used to verify that all data has been successfully received. A DATA_FIN, as with the FIN on a regular TCP connection, is a unidirectional signal.
The DATA_FIN is signaled by setting the 'F' flag in the Data Sequence Signal option (Figure 9) to 1. A DATA_FIN occupies 1 octet (the final octet) of the connection-level sequence space. Note that the DATA_FIN is included in the Data-Level Length, but not at the subflow level. For example, a segment with DSN 80 and Data-Level Length 11, with DATA_FIN set, would map 10 octets from the subflow into data sequence space 80-89, and the DATA_FIN would be DSN 90; therefore, this segment, including the DATA_FIN, would be acknowledged with a DATA_ACK of 91.
Note that when the DATA_FIN is not attached to a TCP segment containing data, the Data Sequence Signal MUST have a subflow sequence number of 0, a Data-Level Length of 1, and the data sequence number that corresponds with the DATA_FIN itself. The checksum in this case will only cover the pseudo-header.
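The DATA_FIN arithmetic above can be checked with a small helper. It splits a mapping's Data-Level Length into subflow payload octets plus the one DATA_FIN octet. A DATA_FIN sent without data is the special case of a Data-Level Length of 1 (sent with subflow sequence number 0). The function name and the returned field names are hypothetical.

```python
def data_fin_mapping(dsn: int, data_level_length: int) -> dict:
    """Split a mapping that has DATA_FIN set into payload octets and the DATA_FIN octet."""
    subflow_octets = data_level_length - 1  # the DATA_FIN has no subflow-level octet
    data_fin_dsn = dsn + subflow_octets     # the DATA_FIN occupies the final octet
    return {
        "payload_dsn_range": (dsn, data_fin_dsn - 1) if subflow_octets else None,
        "data_fin_dsn": data_fin_dsn,
        "data_ack": data_fin_dsn + 1,
    }
```

data_fin_mapping(80, 11) reproduces the worked example: payload in DSN 80-89, DATA_FIN at DSN 90, and a DATA_ACK of 91.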
A DATA_FIN has the same semantics and behavior as a regular TCP FIN, but at the connection level. Notably, it is only DATA_ACKed once all data has been successfully received at the connection level. Note, therefore, that a DATA_FIN is decoupled from a subflow FIN. It is only permissible to combine these signals on one subflow if there is no data outstanding on other subflows. Otherwise, it may be necessary to retransmit data on different subflows. Essentially, a host MUST NOT close all functioning subflows unless it is safe to do so, i.e., until all outstanding data has been DATA_ACKed, or until the segment with the DATA_FIN flag set is the only outstanding segment.
Once a DATA_FIN has been acknowledged, all remaining subflows MUST be closed with standard FIN exchanges. Both hosts SHOULD send FINs on all subflows, as a courtesy to allow middleboxes to clean up state even if an individual subflow has failed. It is also encouraged to reduce the timeouts (Maximum Segment Life) on subflows at end hosts. In particular, any subflows where there is still outstanding data queued (which has been retransmitted on other subflows in order to get the DATA_FIN acknowledged) MAY be closed with a RST.
A connection is considered closed once both hosts' DATA_FINs have been acknowledged by DATA_ACKs.
As specified above, a standard TCP FIN on an individual subflow only shuts down the subflow on which it was sent. If all subflows have been closed with a FIN exchange, but no DATA_FIN has been received and acknowledged, the MPTCP connection is treated as closed only after a timeout. This implies that an implementation will have TIME_WAIT states at both the subflow and connection levels (see Appendix C). This permits "break-before-make" scenarios where connectivity is lost on all subflows before a new one can be re-established.
Regular TCP advertises a receive window in each packet, telling the sender how much data the receiver is willing to accept past the cumulative ack. The receive window is used to implement flow control, throttling down fast senders when receivers cannot keep up.
MPTCP also uses a single receive window, shared between the subflows. The idea is to allow any subflow to send data as long as the receiver is willing to accept it. The alternative, maintaining per-subflow receive windows, could end up stalling some subflows while others would not use up their window.
The receive window is relative to the DATA_ACK. As in TCP, a receiver MUST NOT shrink the right edge of the receive window (i.e., DATA_ACK + receive window). The receiver will use the data sequence number to tell if a packet should be accepted at the connection level.
When deciding to accept packets at subflow level, regular TCP checks the sequence number in the packet against the allowed receive window. With multipath, such a check is done using only the connection-level window. A sanity check SHOULD be performed at subflow level to ensure that the subflow and mapped sequence numbers meet the following test: SSN - SUBFLOW_ACK <= DSN - DATA_ACK, where SSN is the subflow sequence number of the received packet and SUBFLOW_ACK is the RCV.NXT (next expected sequence number) of the subflow (with the equivalent connection-level definitions for DSN and DATA_ACK).
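The connection-level acceptance test and the subflow-level sanity check can be sketched as two predicates. The acceptance test is a simplified overlap check modeled on regular TCP's segment acceptability test, which is an assumption, as are the function names. All values are assumed to be already unwrapped integers.

```python
def in_connection_window(dsn: int, length: int, data_ack: int, rcv_wnd: int) -> bool:
    """Accept at connection level if the segment overlaps [DATA_ACK, DATA_ACK + window)."""
    return dsn < data_ack + rcv_wnd and dsn + length > data_ack


def mapping_sane(ssn: int, subflow_ack: int, dsn: int, data_ack: int) -> bool:
    """Sanity check from the text: SSN - SUBFLOW_ACK <= DSN - DATA_ACK."""
    return ssn - subflow_ack <= dsn - data_ack
```

The sanity check rejects a packet whose distance ahead of the subflow's RCV.NXT exceeds its mapped distance ahead of the Data ACK. For a consistently mapped packet, the first distance cannot be the larger of the two.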
In regular TCP, once a segment is deemed in-window, it is put either in the in-order receive queue or in the out-of-order queue. In Multipath TCP, the same happens but at the connection level: a segment is placed in the connection level in-order or out-of-order queue if it is in-window at both connection and subflow levels. The stack still has to remember, for each subflow, which segments were received successfully so that it can ACK them at subflow level appropriately. Typically, this will be implemented by keeping per subflow out-of-order queues (containing only message headers, not the payloads) and remembering the value of the cumulative ACK.
It is important for implementers to understand how large a receiver buffer is appropriate. The lower bound for full network utilization is the maximum bandwidth-delay product of any one of the paths. However, this might be insufficient when a packet is lost on a slower subflow and needs to be retransmitted (see Section 3.3.6). A tight upper bound would be the maximum round-trip time (RTT) of any path multiplied by the total bandwidth available across all paths. This permits all subflows to continue at full speed while a packet is fast-retransmitted on the maximum RTT path. Even this might be insufficient to maintain full performance in the event of a retransmit timeout on the maximum RTT path. It is for future study to determine the relationship between retransmission strategies and receive buffer sizing.
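The two buffer-sizing bounds above can be computed directly. The lower bound is the largest single-path bandwidth-delay product. The tight upper bound is the largest RTT multiplied by the total bandwidth of all paths. The sketch uses integer arithmetic with bandwidth in bit/s and RTT in milliseconds. The function name and the example path values are hypothetical.

```python
from typing import List, Tuple


def receive_buffer_bounds(paths: List[Tuple[int, int]]) -> Tuple[int, int]:
    """paths: (bandwidth in bit/s, RTT in ms). Returns (lower, upper) bounds in bytes."""
    lower = max(bw * rtt for bw, rtt in paths) // 8000  # largest single-path BDP
    upper = max(rtt for _, rtt in paths) * sum(bw for bw, _ in paths) // 8000
    return lower, upper
```

Consider two paths of 10 Mbit/s at 20 ms and 100 Mbit/s at 80 ms. They give a lower bound of 1,000,000 bytes and an upper bound of 1,100,000 bytes. As noted above, even the upper figure may not sustain full performance after a retransmission timeout on the 80 ms path.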
The sender remembers receiver window advertisements from the receiver. It should only update its local receive window values when the largest sequence number allowed (i.e., DATA_ACK + receive window) increases, on the receipt of a DATA_ACK. This is important to allow using paths with different RTTs, and thus different feedback loops.
MPTCP uses a single receive window across all subflows, and if the receive window was guaranteed to be unchanged end-to-end, a host could always read the most recent receive window value. However, some classes of middleboxes may alter the TCP-level receive window. Typically, these will shrink the offered window, although for short periods of time it may be possible for the window to be larger (however, note that this would not continue for long periods since ultimately the middlebox must keep up with delivering data to the receiver). Therefore, if receive window sizes differ on multiple subflows, when sending data MPTCP SHOULD take the largest of the most recent window sizes as the one to use in calculations. This rule is implicit in the requirement not to reduce the right edge of the window.
The sender MUST also remember the receive windows advertised by each subflow. The allowed window for subflow i is (ack_i, ack_i + rcv_wnd_i), where ack_i is the subflow-level cumulative ACK of subflow i. This ensures data will not be sent to a middlebox unless there is enough buffering for the data.
Putting the two rules together, we get the following: a sender is allowed to send data segments with data-level sequence numbers between (DATA_ACK, DATA_ACK + receive_window). Each of these segments will be mapped onto subflows, as long as subflow sequence numbers are in the allowed windows for those subflows. Note that subflow sequence numbers do not generally affect flow control if the same receive window is advertised across all subflows. They will perform flow control for those subflows with a smaller advertised receive window.
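The combined sender-side rule can be sketched as a small window tracker. It keeps a connection-level right edge (DATA_ACK + receive window) that only ever advances. Per subflow, it records the pair (ack_i, rcv_wnd_i). A segment may be sent only if it fits both windows. This is a minimal sketch. Windows are treated as half-open ranges, sequence wrap is ignored, and the class and method names are hypothetical.

```python
from typing import Dict, Tuple


class SenderWindows:
    def __init__(self, data_ack: int, rcv_wnd: int) -> None:
        self.data_ack = data_ack
        self.right_edge = data_ack + rcv_wnd                # DATA_ACK + receive window
        self.subflow: Dict[int, Tuple[int, int]] = {}       # id -> (ack_i, rcv_wnd_i)

    def on_data_ack(self, data_ack: int, rcv_wnd: int) -> None:
        self.data_ack = max(self.data_ack, data_ack)
        # Only move the right edge forward, never shrink it.
        self.right_edge = max(self.right_edge, data_ack + rcv_wnd)

    def on_subflow_ack(self, sid: int, ack: int, rcv_wnd: int) -> None:
        self.subflow[sid] = (ack, rcv_wnd)

    def may_send(self, sid: int, dsn: int, ssn: int, length: int) -> bool:
        ack_i, wnd_i = self.subflow[sid]
        conn_ok = self.data_ack <= dsn and dsn + length <= self.right_edge
        sub_ok = ack_i <= ssn and ssn + length <= ack_i + wnd_i
        return conn_ok and sub_ok
```

Keeping the right edge monotonic covers both the rule against shrinking the window and the rule of using the largest recent window when subflows report different values.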
The send buffer MUST, at a minimum, be as big as the receive buffer, to enable the sender to reach maximum throughput.
The data sequence mapping allows senders to resend data with the same data sequence number on a different subflow. When doing this, a host MUST still retransmit the original data on the original subflow, in order to preserve the subflow integrity (middleboxes could replay old data, and/or could reject holes in subflows), and a receiver will ignore these retransmissions. While this is clearly suboptimal, for compatibility reasons this is sensible behavior. Optimizations could be negotiated in future versions of this protocol.
This protocol specification does not mandate any mechanisms for handling retransmissions, and much will be dependent upon local policy (as discussed in Section 3.3.8). One can imagine aggressive connection-level retransmissions policies where every packet lost at subflow level is retransmitted on a different subflow (hence, wasting bandwidth but possibly reducing application-to-application delays), or conservative retransmission policies where connection-level retransmits are only used after a few subflow-level retransmission timeouts occur.
本协议规范不强制规定任何处理重传的机制,这在很大程度上取决于本地策略(如第3.3.8节所述)。可以设想激进的连接级重传策略,即在子流级丢失的每个数据包都在另一个子流上重传(这会浪费带宽,但可能减少应用程序到应用程序的延迟);也可以设想保守的重传策略,即仅在发生若干次子流级重传超时之后才使用连接级重传。
It is envisaged that a standard connection-level retransmission mechanism would be implemented around a connection-level data queue: all segments that haven't been DATA_ACKed are stored. A timer is set when the head of the connection-level queue is ACKed at subflow level but its corresponding data is not ACKed at data level. This timer will guard against failures in retransmission by middleboxes that proactively ACK data.
设想标准的连接级重传机制将围绕一个连接级数据队列来实现:所有尚未被DATA_ACK确认的数据段都存储在该队列中。当连接级队列的队首在子流级别得到确认,但其对应的数据尚未在数据级别得到确认时,设置一个计时器。该计时器用于防范主动确认数据的中间盒所导致的重传失败。
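A minimal sketch of the connection-level queue and timer described above, assuming segments are keyed by their starting data sequence number. `ConnRetransmitQueue` and its methods are hypothetical; the timer is modelled as a boolean flag rather than a real clock, and what happens when it fires is left to local policy.

```python
from collections import OrderedDict


class ConnRetransmitQueue:
    """Connection-level queue of segments not yet DATA_ACKed, in send order."""

    def __init__(self):
        self.segments = OrderedDict()  # starting DSN -> payload bytes
        self.subflow_acked = set()     # DSNs already ACKed at subflow level
        self.timer_armed = False

    def on_send(self, dsn, payload):
        self.segments[dsn] = payload
        self._update_timer()

    def on_subflow_ack(self, dsn):
        if dsn in self.segments:
            self.subflow_acked.add(dsn)
        self._update_timer()

    def on_data_ack(self, data_ack):
        # Forget every segment that lies wholly below the cumulative DATA_ACK.
        for dsn in list(self.segments):
            if dsn + len(self.segments[dsn]) <= data_ack:
                del self.segments[dsn]
                self.subflow_acked.discard(dsn)
        self._update_timer()

    def _update_timer(self):
        # Armed only while the head is ACKed at subflow level but not at data
        # level, e.g. because a middlebox ACKed it on the receiver's behalf.
        head = next(iter(self.segments), None)
        self.timer_armed = head is not None and head in self.subflow_acked
```

For example, after sending two 4-byte segments at DSN 0 and 4, a subflow ACK for DSN 0 arms the timer, and a DATA_ACK of 4 removes the head and disarms it again.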
The sender MUST keep data in its send buffer as long as the data has not been acknowledged at both connection level and on all subflows on which it has been sent. In this way, the sender can always retransmit the data if needed, on the same subflow or on a different one. A special case is when a subflow fails: the sender will typically resend the data on other working subflows after a timeout, and will keep trying to retransmit the data on the failed subflow too. The sender will declare the subflow failed after a predefined upper bound on retransmissions is reached (which MAY be lower than the usual TCP limits of the Maximum Segment Life), or on the receipt of an ICMP error, and only then delete the outstanding data segments.
只要数据未在连接级别和发送数据的所有子流上得到确认,发送方就必须将数据保留在其发送缓冲区中。这样,发送方总是可以在需要时在同一子流或不同子流上重新传输数据。一种特殊情况是子流失败:发送方通常会在超时后重新发送其他工作子流上的数据,并会继续尝试重新传输失败子流上的数据。在达到重新传输的预定义上限(可能低于最长数据段寿命的通常TCP限制)或收到ICMP错误后,发送方将声明子流失败,然后才删除未完成的数据段。
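The retention rule can be expressed as a single predicate. This is a sketch under the assumption that each buffered segment records which subflows carried it and which of them have ACKed it; `BufferedSegment` and `can_release` are hypothetical names, and a subflow that has been declared failed is simply excluded from the set still awaiting an ACK.

```python
from dataclasses import dataclass, field


@dataclass
class BufferedSegment:
    dsn: int
    length: int
    sent_on: set = field(default_factory=set)   # subflow IDs that carried it
    acked_on: set = field(default_factory=set)  # subflow IDs that ACKed it


def can_release(seg, data_ack, failed_subflows):
    """True once `seg` may be deleted from the send buffer."""
    data_acked = seg.dsn + seg.length <= data_ack
    # A copy is still owed on every sending subflow that has neither ACKed it
    # nor been declared failed.
    still_pending = seg.sent_on - seg.acked_on - failed_subflows
    return data_acked and not still_pending
```

A segment sent on subflows 1 and 2 but ACKed only on 1 therefore stays buffered, even after the DATA_ACK covers it, until subflow 2 either ACKs it or is declared failed.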
Multiple retransmissions are triggers that will indicate that a subflow performs badly and could lead to a host resetting the subflow with a RST. However, additional research is required to understand the heuristics of how and when to reset underperforming subflows. For example, a highly asymmetric path may be misdiagnosed as underperforming.
多次重传是一种触发器,表示子流性能不好,并可能导致主机使用RST重置子流。然而,需要更多的研究来理解如何以及何时重置表现不佳的子流的启发式方法。例如,高度不对称的路径可能被误诊为表现不佳。
Different subflows in an MPTCP connection have different congestion windows. To achieve fairness at bottlenecks and resource pooling, it is necessary to couple the congestion windows in use on each subflow, in order to push most traffic to uncongested links. One algorithm for achieving this is presented in [5]; the algorithm does not achieve perfect resource pooling but is "safe" in that it is readily deployable in the current Internet. By this, we mean that it does not take up more capacity on any one path than if it were a single-path flow using only that route, and thus ensures fair coexistence with single-path TCP at shared bottlenecks.
MPTCP连接中的不同子流具有不同的拥塞窗口。为了在瓶颈处实现公平性并实现资源池化,有必要将各子流上使用的拥塞窗口耦合起来,以便将大部分流量推向未拥塞的链路。[5]中给出了实现这一目标的一种算法;该算法不能实现完美的资源池化,但它是“安全”的,因为它可以很容易地部署在当前的互联网中。也就是说,它在任何一条路径上占用的容量都不会超过仅使用该路由的单路径流,从而确保在共享瓶颈处与单路径TCP公平共存。
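The document defers the coupling algorithm to [5]. As a hedged illustration only, the sketch below transcribes the "Linked Increases" congestion-avoidance rule as we understand it from [5]: for each ACK on subflow i, cwnd_i grows by min(alpha * bytes_acked * MSS_i / cwnd_total, bytes_acked * MSS_i / cwnd_i), with alpha = cwnd_total * max_i(cwnd_i / rtt_i^2) / (sum_i cwnd_i / rtt_i)^2. Slow start, loss handling, and integer arithmetic are omitted, the function names are hypothetical, and [5] remains the normative description.

```python
def lia_alpha(cwnds, rtts):
    """Aggressiveness factor alpha; windows in bytes, RTTs in seconds."""
    total = sum(cwnds)
    best = max(c / (r * r) for c, r in zip(cwnds, rtts))
    denom = sum(c / r for c, r in zip(cwnds, rtts)) ** 2
    return total * best / denom


def lia_increase(i, bytes_acked, cwnds, rtts, mss):
    """Congestion-avoidance increase of cwnds[i] (bytes) for one ACK on subflow i."""
    coupled = lia_alpha(cwnds, rtts) * bytes_acked * mss[i] / sum(cwnds)
    uncoupled = bytes_acked * mss[i] / cwnds[i]  # what single-path TCP would add
    # Taking the minimum caps each subflow at single-path TCP aggressiveness.
    return min(coupled, uncoupled)


# One subflow: alpha is 1 and the rule behaves like regular TCP (about 146 bytes).
print(lia_increase(0, 1460, [14600], [0.1], [1460]))
# Two identical subflows: alpha is 0.5, so each grows by about 25 bytes instead of 100.
print(lia_increase(0, 1000, [10000, 10000], [0.1, 0.1], [1000, 1000]))
```

The single-subflow case is a useful sanity check: the coupled and uncoupled terms coincide, so an MPTCP connection with one path is no more aggressive than TCP.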
It is foreseeable that different congestion controllers will be implemented for MPTCP, each aiming to achieve different properties in the resource pooling/fairness/stability design space, as well as those for achieving different properties in quality of service, reliability, and resilience.
可以预见,MPTCP将采用不同的拥塞控制器,每个拥塞控制器旨在实现资源池/公平性/稳定性设计空间中的不同属性,以及在服务质量、可靠性和弹性方面实现不同属性的拥塞控制器。
Regardless of the algorithm used, the design of the MPTCP protocol aims to provide the congestion control implementations sufficient information to take the right decisions; this information includes, for each subflow, which packets were lost and when.
无论使用何种算法,MPTCP协议的设计都旨在为拥塞控制实现提供足够的信息,以便做出正确的决策;对于每个子流,此信息包括哪些数据包丢失以及何时丢失。
Within a local MPTCP implementation, a host may use any local policy it wishes to decide how to share the traffic to be sent over the available paths.
在本地MPTCP实现中,主机可以采用任意本地策略,来决定如何在可用路径之间分配待发送的流量。
In the typical use case, where the goal is to maximize throughput, all available paths will be used simultaneously for data transfer, using coupled congestion control as described in [5]. It is expected, however, that other use cases will appear.
在典型用例中,目标是最大化吞吐量,所有可用路径将同时用于数据传输,使用[5]中描述的耦合拥塞控制。然而,预计会出现其他用例。
For instance, a possibility is an 'all-or-nothing' approach, i.e., have a second path ready for use in the event of failure of the first path, but alternatives could include entirely saturating one path before using an additional path (the 'overflow' case). Such choices would be most likely based on the monetary cost of links, but may also be based on properties such as the delay or jitter of links, where stability (of delay or bandwidth) is more important than throughput. Application requirements such as these are discussed in detail in [6].
例如,一种可能性是“全有或全无”方法,即在第一条路径出现故障时准备好第二条路径以供使用,但替代方法可能包括在使用其他路径之前完全饱和一条路径(“溢出”情况)。这种选择很可能基于链路的货币成本,但也可能基于链路的延迟或抖动等特性,其中稳定性(延迟或带宽)比吞吐量更重要。[6]中详细讨论了此类应用要求。
The ability to make effective choices at the sender requires full knowledge of the path "cost", which is unlikely to be available. It would be desirable for a receiver to be able to signal their own preferences for paths, since they will often be the multihomed party, and may have to pay for metered incoming bandwidth.
要在发送方做出有效的选择,需要完全了解路径的“成本”,而这通常难以做到。如果接收方能够发出自己对路径偏好的信号,将会很有帮助,因为接收方往往是多宿主的一方,并且可能需要为按流量计费的入站带宽付费。
Whilst fine-grained control may be the most powerful solution, that would require some mechanism such as overloading the Explicit Congestion Notification (ECN) signal [17], which is undesirable, and it is felt that there would not be sufficient benefit to justify an entirely new signal. Therefore, the MP_JOIN option (see Section 3.2) contains the 'B' bit, which allows a host to indicate to its peer that this path should be treated as a backup path to use only in the event of failure of other working subflows (i.e., a subflow where the receiver has indicated B=1 SHOULD NOT be used to send data unless there are no usable subflows where B=0).
虽然细粒度控制可能是最强大的解决方案,但这需要某种机制,例如重载显式拥塞通知(ECN)信号[17],而这是不可取的;并且人们认为其收益不足以证明引入一个全新的信号是合理的。因此,MP_JOIN选项(参见第3.2节)包含“B”位,它允许主机向其对等方指示,该路径应被视为备份路径,仅在其他工作子流发生故障时使用(即,对于接收方已指示B=1的子流,除非不存在B=0的可用子流,否则不应使用该子流发送数据)。
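The backup rule amounts to a two-tier scheduler. The sketch below is one possible reading of it, assuming each subflow is described by a hypothetical `(name, backup, usable)` tuple where `backup` is the B flag set by the peer; since the rule is a SHOULD NOT, a real sender may apply further local policy.

```python
def pick_subflows(subflows):
    """subflows: iterable of (name, backup, usable) for the peer-set B flag.

    Returns the subflows a sender would schedule data on: those with B=0 if
    any of them is usable, otherwise the usable B=1 (backup) subflows.
    """
    usable = [(name, backup) for name, backup, ok in subflows if ok]
    primary = [name for name, backup in usable if not backup]
    return primary if primary else [name for name, backup in usable if backup]


print(pick_subflows([("wifi", False, True), ("lte", True, True)]))   # ['wifi']
print(pick_subflows([("wifi", False, False), ("lte", True, True)]))  # ['lte']
```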
In the event that the available set of paths changes, a host may wish to signal a change in priority of subflows to the peer (e.g., a subflow that was previously set as backup should now take priority over all remaining subflows). Therefore, the MP_PRIO option, shown in Figure 11, can be used to change the 'B' flag of the subflow on which it is sent.
在可用路径集合发生变化时,主机可能希望向对等方通告子流优先级的变化(例如,先前设置为备份的子流现在应优先于其余所有子流)。因此,可以使用图11所示的MP_PRIO选项来更改发送该选项的子流的“B”标志。
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----+-+--------------+
|     Kind      |     Length    |Subtype|     |B| AddrID (opt) |
+---------------+---------------+-------+-----+-+--------------+
Figure 11: Change Subflow Priority (MP_PRIO) Option
图11:更改子流优先级(MP_PRIO)选项
It should be noted that the backup flag is a request from a data receiver to a data sender only, and the data sender SHOULD adhere to these requests. A host cannot assume that the data sender will do so, however, since local policies -- or technical difficulties -- may override MP_PRIO requests. Note also that this signal applies to a single direction, and so the sender of this option could choose to continue using the subflow to send data even if it has signaled B=1 to the other host.
需要注意的是,备份标志仅是数据接收方对数据发送方的请求,数据发送方应遵守这些请求。但是,主机不能假定数据发送方会这样做,因为本地策略或技术困难可能会覆盖MP_PRIO请求。还请注意,此信号适用于单个方向,因此此选项的发送方可以选择继续使用子流发送数据,即使它已向其他主机发出B=1的信号。
This option can also be applied to other subflows than the one on which it is sent, by setting the optional Address ID field. This applies the given setting of B to all subflows in this connection that use the address identified by the given Address ID. The presence of this field is determined by the option length; if Length==4, then it is present. If Length==3, then the option applies to the current subflow only. A typical use case is a host signaling to its peer that an address is temporarily unavailable (for example, if it has radio coverage issues), so that the peer drops all subflows using that Address ID to backup state.
通过设置可选的地址ID字段,此选项还可以应用于发送该选项的子流以外的其他子流。这会将给定的B设置应用于此连接中所有使用该地址ID所标识地址的子流。该字段是否存在由选项长度决定:如果Length==4,则该字段存在;如果Length==3,则此选项仅适用于当前子流。其典型用例是:主机可以向对等方发出信号,表明某个地址暂时不可用(例如,存在无线覆盖问题),因此对等方应将使用该地址ID的所有子流转为备份状态。
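Figure 11 translates directly into a few bytes. The sketch below encodes and decodes MP_PRIO following that layout: Subtype in the top 4 bits of the third octet, 3 reserved bits, then B in the lowest bit, with the Address ID octet present only when Length is 4. The option kind 30 and the MP_PRIO subtype value 0x5 are taken from the MPTCP IANA registrations as we recall them; they are not stated in this excerpt, so treat them as assumptions. The function names are hypothetical.

```python
import struct

MPTCP_KIND = 30        # TCP option kind used by all MPTCP options (assumed)
MP_PRIO_SUBTYPE = 0x5  # assumed from the IANA subtype registry


def encode_mp_prio(backup, addr_id=None):
    # Subtype in the top 4 bits, 3 reserved zero bits, then the B flag.
    flags = (MP_PRIO_SUBTYPE << 4) | (1 if backup else 0)
    if addr_id is None:
        return struct.pack("!BBB", MPTCP_KIND, 3, flags)  # current subflow only
    return struct.pack("!BBBB", MPTCP_KIND, 4, flags, addr_id)


def decode_mp_prio(opt):
    kind, length, flags = struct.unpack_from("!BBB", opt)
    if kind != MPTCP_KIND or flags >> 4 != MP_PRIO_SUBTYPE or length not in (3, 4):
        raise ValueError("not an MP_PRIO option")
    addr_id = opt[3] if length == 4 else None  # None: applies to this subflow
    return bool(flags & 0x01), addr_id


print(encode_mp_prio(True).hex())                 # 1e0351
print(decode_mp_prio(bytes([30, 4, 0x50, 7])))    # (False, 7)
```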
We use the term "path management" to refer to the exchange of information about additional paths between hosts, which in this design is managed by multiple addresses at hosts. For more detail of the architectural thinking behind this design, see the MPTCP Architecture document [2].
我们使用术语“路径管理”指的是主机之间其他路径的信息交换,在本设计中,这些路径由主机上的多个地址管理。有关此设计背后的体系结构思想的更多详细信息,请参阅MPTCP体系结构文档[2]。
This design makes use of two methods of sharing such information, and both can be used on a connection. The first is the direct setup of new subflows, already described in Section 3.2, where the initiator has an additional address. The second method, described in the following subsections, signals addresses explicitly to the other host to allow it to initiate new subflows. The two mechanisms are complementary: the first is implicit and simple, while the second is explicit, more complex, and more robust. Together, the mechanisms allow addresses to change in flight (and thus support operation through NATs, since the source address need not be known), and also allow the signaling of previously unknown addresses, and of addresses belonging to other address families (e.g., both IPv4 and IPv6).
此设计使用两种共享此类信息的方法,并且两种方法都可以在同一连接上使用。第一种是直接建立新子流,如第3.2节所述,此时发起方拥有一个附加地址。第二种方法(在下面的小节中描述)向另一台主机显式通告地址,以便其发起新的子流。这两种机制是互补的:第一种是隐式且简单的,而第二种显式机制更复杂,但也更稳健。这些机制共同允许地址在连接进行过程中发生变化(从而支持穿越NAT运行,因为无需知道源地址),还允许通告先前未知的地址以及属于其他地址族的地址(例如IPv4和IPv6)。
Here is an example of typical operation of the protocol:
以下是协议的典型操作示例:
o An MPTCP connection is initially set up between address/port A1 of Host A and address/port B1 of Host B. If Host A is multihomed and multiaddressed, it can start an additional subflow from its address A2 to B1, by sending a SYN with a Join option from A2 to B1, using B's previously declared token for this connection. Alternatively, if B is multihomed, it can try to set up a new subflow from B2 to A1, using A's previously declared token. In either case, the SYN will be sent to the port already in use for the original subflow on the receiving host.
o MPTCP连接最初在主机A的地址/端口A1与主机B的地址/端口B1之间建立。如果主机A是多宿主且拥有多个地址的,它可以使用B先前为此连接声明的令牌,从A2向B1发送带有MP_JOIN选项的SYN,从而建立从其地址A2到B1的额外子流。或者,如果B是多宿主的,它可以尝试使用A先前声明的令牌,建立从B2到A1的新子流。在任何一种情况下,SYN都将发送到接收主机上原始子流已在使用的端口。
o Simultaneously (or after a timeout), an ADD_ADDR option (Section 3.4.1) is sent on an existing subflow, informing the receiver of the sender's alternative address(es). The recipient can use this information to open a new subflow to the sender's additional address. In our example, A will send ADD_ADDR option informing B of address/port A2. The mix of using the SYN-based option and the ADD_ADDR option, including timeouts, is implementation specific and can be tailored to agree with local policy.
o 同时(或在超时之后),在现有子流上发送ADD_ADDR选项(第3.4.1节),将发送方的备用地址告知接收方。接收方可以利用该信息向发送方的附加地址建立新的子流。在我们的示例中,A将发送ADD_ADDR选项,将地址/端口A2告知B。基于SYN的方式与ADD_ADDR选项如何结合使用(包括超时)取决于具体实现,并可根据本地策略进行调整。
o If subflow A2-B1 is successfully set up, Host B can use the Address ID in the Join option to correlate this with the ADD_ADDR option that will also arrive on an existing subflow; now B knows not to open A2-B1, ignoring the ADD_ADDR. Otherwise, if B has not received the A2-B1 MP_JOIN SYN but received the ADD_ADDR, it can try to initiate a new subflow from one or more of its addresses to address A2. This permits new sessions to be opened if one host is behind a NAT.
o 如果子流A2-B1成功建立,主机B可以使用MP_JOIN选项中的地址ID,将其与同样会在现有子流上到达的ADD_ADDR选项关联起来;这样B就知道不必再建立A2-B1,从而忽略该ADD_ADDR。否则,如果B没有收到A2-B1的MP_JOIN SYN,但收到了ADD_ADDR,它可以尝试从其一个或多个地址向地址A2发起新的子流。这使得在一台主机位于NAT之后时仍能建立新的会话。
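Host B's correlation step in the example above can be sketched as a small piece of per-connection bookkeeping. `PeerAddressTracker` and its methods are hypothetical; the sketch assumes that an MP_JOIN and the matching ADD_ADDR carry the same Address ID, and it leaves the timing of B's own connection attempts to local policy.

```python
class PeerAddressTracker:
    """Host B's view of Host A's extra addresses for one connection."""

    def __init__(self):
        self.joined_ids = set()  # Address IDs already seen in MP_JOIN SYNs
        self.to_connect = []     # (addr_id, address) B may still try to reach

    def on_mp_join(self, addr_id):
        self.joined_ids.add(addr_id)
        # A pending attempt toward the same address is now redundant.
        self.to_connect = [(i, a) for i, a in self.to_connect if i != addr_id]

    def on_add_addr(self, addr_id, address):
        if addr_id in self.joined_ids:
            return               # A already opened this path: ignore ADD_ADDR
        self.to_connect.append((addr_id, address))
```

Whichever signal arrives first, B ends up with at most one subflow toward A2: an ADD_ADDR that follows a successful MP_JOIN is ignored, and an MP_JOIN that follows an ADD_ADDR cancels B's pending attempt.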
Other ways of using the two signaling mechanisms are possible; for instance, signaling addresses in other address families can only be done explicitly using the Add Address option.
使用这两种信令机制的其他方式也是可能的;例如,通告其他地址族中的地址只能通过ADD_ADDR选项显式完成。
The Add Address (ADD_ADDR) TCP option announces additional addresses (and optionally, ports) on which a host can be reached (Figure 12). Multiple instances of this TCP option can be added in a single message if there is sufficient TCP option space; otherwise, multiple TCP messages containing this option will be sent. This option can be used at any time during a connection, depending on when the sender wishes to enable multiple paths and/or when paths become available. As with all MPTCP signals, the receiver MUST undertake standard TCP validity checks before acting upon it.
添加地址(ADD_ADDR)TCP选项用于通告可以到达主机的其他地址(以及可选的端口)(图12)。如果TCP选项空间足够,可以在一条消息中添加此TCP选项的多个实例;否则,将发送多条包含此选项的TCP消息。此选项可在连接期间的任何时间使用,具体取决于发送方希望何时启用多条路径和/或路径何时变得可用。与所有MPTCP信号一样,接收方在据此采取行动之前必须进行标准的TCP有效性检查。
Every address has an Address ID that can be used for uniquely identifying the address within a connection for address removal. This is also used to identify MP_JOIN options (see Section 3.2) relating to the same address, even when address translators are in use. The Address ID MUST uniquely identify the address to the sender (within the scope of the connection), but the mechanism for allocating such IDs is implementation specific.
每个地址都有一个地址ID,可用于在连接内唯一标识该地址,以便删除地址。即使在使用地址转换器的情况下,它也用于识别与同一地址相关的MP_JOIN选项(见第3.2节)。地址ID必须(在连接范围内)为发送方唯一地标识该地址,但分配此类ID的机制取决于具体实现。
All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be stored by the receiver in a data structure that gathers all the Address ID to address mappings for a connection (identified by a token pair). In this way, there is a stored mapping between Address ID, observed source address, and token pair for future processing of control information for a connection. Note that an implementation MAY discard incoming address advertisements at will, for example, for avoiding the required mapping state, or because advertised addresses are of no use to it (for example, IPv6 addresses when it has IPv4 only). Therefore, a host MUST treat address advertisements as soft state, and it MAY choose to refresh advertisements periodically.
通过MP_JOIN或ADD_ADDR学习的所有地址ID应由接收方存储在数据结构中,该数据结构收集连接的所有地址ID到地址映射(由令牌对标识)。以这种方式,在地址ID、观察到的源地址和令牌对之间存在存储的映射,以便将来处理连接的控制信息。请注意,实现可能会随意丢弃传入地址播发,例如,为了避免所需的映射状态,或者因为播发的地址对它没有任何用处(例如,当它只有IPv4时,IPv6地址)。因此,主机必须将地址播发视为软状态,并且可以选择定期刷新播发。
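One way to hold the Address ID mappings as soft state is a per-connection table with an expiry time. In this sketch the token pair is modelled as a tuple, the `ttl` value is a hypothetical choice (the document does not define a lifetime), and the clock is injectable so the behaviour can be tested; `AddressMap` is not an API from any real stack.

```python
import time


class AddressMap:
    """Per-connection soft state: token pair -> {Address ID: (address, expiry)}."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl      # hypothetical lifetime; not set by the spec
        self.clock = clock
        self.by_conn = {}

    def learn(self, tokens, addr_id, address):
        """Record an ADD_ADDR, or the observed source address of an MP_JOIN."""
        expiry = self.clock() + self.ttl
        self.by_conn.setdefault(tokens, {})[addr_id] = (address, expiry)

    def lookup(self, tokens, addr_id):
        entry = self.by_conn.get(tokens, {}).get(addr_id)
        if entry is None or entry[1] < self.clock():
            return None     # never learned, or expired soft state
        return entry[0]
```

Because entries simply age out, a host that discards or forgets an advertisement loses nothing permanent; a later refresh from the peer re-creates the entry.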
This option is shown in Figure 12. The illustration is sized for IPv4 addresses (IPVer = 4). For IPv6, the IPVer field will read 6, and the length of the address will be 16 octets (instead of 4).
此选项如图12所示。图示按IPv4地址(IPVer = 4)绘制。对于IPv6,IPVer字段的值为6,地址长度为16个八位组(而不是4个)。
The final 2 octets, specifying the TCP port number to use, are optional, and their presence can be inferred from the length of the option. Although it is expected that the majority of use cases will use the same port pairs as used for the initial subflow (e.g., port 80 remains port 80 on all subflows, as does the ephemeral port at the client), there may be cases (such as port-based load balancing) where the explicit specification of a different port is required. If no port is specified, MPTCP SHOULD attempt to connect to the specified address on the same port as is already in use by the subflow on which the ADD_ADDR signal was sent; this is discussed in more detail in Section 3.8.
最后2个八位字节(指定要使用的TCP端口号)的存在是可选的,可以从选项的长度推断出来。虽然预计大多数用例将使用与初始子流相同的端口对(例如,端口80在所有子流上保持端口80,客户端的临时端口也是如此),但可能存在需要明确指定不同端口的情况(例如基于端口的负载平衡)。如果未指定端口,MPTCP应尝试连接到发送ADD_ADDR信号的子流已在使用的同一端口上的指定地址;第3.8节对此进行了更详细的讨论。
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
|     Kind      |     Length    |Subtype| IPVer |  Address ID   |
+---------------+---------------+-------+-------+---------------+
|          Address (IPv4 - 4 octets / IPv6 - 16 octets)         |
+-------------------------------+-------------------------------+
|   Port (2 octets, optional)   |
+-------------------------------+
Figure 12: Add Address (ADD_ADDR) Option
图12:添加地址(ADD_ADDR)选项
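The layout in Figure 12 fixes the option length: 4 header octets plus a 4- or 16-octet address, plus 2 more octets if a port is present (8 or 10 octets for IPv4, 20 or 22 for IPv6). The sketch below encodes and decodes ADD_ADDR on that basis using Python's standard `ipaddress` module. The kind value 30 and the ADD_ADDR subtype 0x3 are assumptions taken from the MPTCP IANA registrations as we recall them, and the function names are hypothetical.

```python
import ipaddress
import struct

MPTCP_KIND = 30         # assumed MPTCP option kind
ADD_ADDR_SUBTYPE = 0x3  # assumed from the IANA subtype registry


def encode_add_addr(addr_id, address, port=None):
    ip = ipaddress.ip_address(address)
    body = ip.packed + (struct.pack("!H", port) if port is not None else b"")
    length = 4 + len(body)  # Kind, Length, Subtype/IPVer, Address ID
    first = (ADD_ADDR_SUBTYPE << 4) | ip.version
    return struct.pack("!BBBB", MPTCP_KIND, length, first, addr_id) + body


def decode_add_addr(opt):
    kind, length, first, addr_id = struct.unpack_from("!BBBB", opt)
    ipver = first & 0x0F
    addr_len = 4 if ipver == 4 else 16
    if kind != MPTCP_KIND or first >> 4 != ADD_ADDR_SUBTYPE or ipver not in (4, 6):
        raise ValueError("not an ADD_ADDR option")
    if length not in (4 + addr_len, 4 + addr_len + 2):
        raise ValueError("bad ADD_ADDR length")
    address = str(ipaddress.ip_address(opt[4:4 + addr_len]))
    has_port = length == 4 + addr_len + 2
    port = struct.unpack_from("!H", opt, 4 + addr_len)[0] if has_port else None
    return addr_id, address, port   # port None: reuse the current subflow's port
```

Inferring the port's presence from Length, rather than from a flag, mirrors the text: the decoder only needs IPVer to know where the address ends.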
Due to the proliferation of NATs, it is reasonably likely that one host may attempt to advertise private addresses [18]. It is not desirable to prohibit this, since there may be cases where both hosts have additional interfaces on the same private network, and a host MAY want to advertise such addresses. The MP_JOIN handshake to create a new subflow (Section 3.2) provides mechanisms to minimize security risks. The MP_JOIN message contains a 32-bit token that uniquely identifies the connection to the receiving host. If the token is unknown, the host will return with a RST. In the unlikely event that the token is known, subflow setup will continue, but the HMAC exchange must occur for authentication. This will fail, and will provide sufficient protection against two unconnected hosts accidentally setting up a new subflow upon the signal of a private address. Further security considerations around the issue of ADD_ADDR messages that accidentally misdirect, or maliciously direct, new MP_JOIN attempts are discussed in Section 5.
由于NAT的大量使用,一台主机很可能会尝试通告私有地址[18]。不宜禁止这样做,因为可能存在两台主机在同一私有网络上都有额外接口的情况,此时主机可能希望通告此类地址。用于创建新子流的MP_JOIN握手(第3.2节)提供了将安全风险降至最低的机制。MP_JOIN消息包含一个32位令牌,用于在接收主机上唯一标识该连接。如果令牌未知,主机将以RST回应。在极少数令牌恰好已知的情况下,子流建立过程将继续,但必须完成HMAC交换以进行认证。该交换将失败,从而足以防止两台互不相关的主机在收到私有地址通告后意外建立新的子流。关于ADD_ADDR消息意外误导或恶意引导新的MP_JOIN尝试这一问题的进一步安全考虑,见第5节。
Ideally, ADD_ADDR and REMOVE_ADDR options would be sent reliably, and in order, to the other end. This would ensure that this address management does not unnecessarily cause an outage in the connection when remove/add addresses are processed in reverse order, and that all possible paths are used. Note, however, that losing reliability and ordering will not break the multipath connections; it will just reduce the opportunity to open multipath paths and to survive different patterns of path failures.
理想情况下,ADD_ADDR和REMOVE_ADDR选项将按顺序可靠地发送到另一端。这将确保在按相反顺序处理删除/添加地址时,此地址管理不会不必要地导致连接中断,并确保使用所有可能的路径。但是,请注意,失去可靠性和顺序不会中断多路径连接,它只会减少打开多路径路径和在不同路径故障模式下生存的机会。
Therefore, implementing reliability signals for these TCP options is not necessary. In order to minimize the impact of the loss of these options, however, it is RECOMMENDED that a sender should send these options on all available subflows. If these options need to be received in order, an implementation SHOULD only send one ADD_ADDR/ REMOVE_ADDR option per RTT, to minimize the risk of misordering.
因此,没有必要为这些TCP选项实现可靠性信令。但是,为了尽量减少这些选项丢失带来的影响,建议发送方在所有可用子流上发送这些选项。如果这些选项需要按顺序接收,实现在每个RTT内应只发送一个ADD_ADDR/REMOVE_ADDR选项,以尽量降低乱序的风险。
A host can send an ADD_ADDR message with an already assigned Address ID, but the Address MUST be the same as previously assigned to this Address ID, and the Port MUST be different from one already in use for this Address ID. If these conditions are not met, the receiver SHOULD silently ignore the ADD_ADDR. A host wishing to replace an existing Address ID MUST first remove the existing one (Section 3.4.2).
主机可以使用已分配的地址ID发送ADD_ADDR消息,但地址必须与之前分配给该地址ID的地址相同,并且端口必须与该地址ID已在使用的端口不同。如果不满足这些条件,接收方应静默忽略该ADD_ADDR。希望替换现有地址ID的主机必须先删除现有的地址ID(第3.4.2节)。
A host that receives an ADD_ADDR but finds a connection set up to that IP address and port number is unsuccessful SHOULD NOT perform further connection attempts to this address/port combination for this connection. A sender that wants to trigger a new incoming connection attempt on a previously advertised address/port combination can therefore refresh ADD_ADDR information by sending the option again.
如果主机收到ADD_ADDR,但发现向该IP地址和端口号建立连接不成功,则不应再为此连接尝试连接到该地址/端口组合。因此,希望在先前通告的地址/端口组合上触发新的传入连接尝试的发送方,可以通过再次发送该选项来刷新ADD_ADDR信息。
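The ADD_ADDR reuse and retry rules above can be combined in one receiver-side sketch. It rests on an interpretation that should be treated as an assumption: a port counts as "in use" for an Address ID only once a subflow using it has been established, so re-announcing an address/port whose connection attempt failed is accepted as a refresh. `AddAddrReceiver` and its methods are hypothetical.

```python
class AddAddrReceiver:
    """Per-connection handling of incoming ADD_ADDR options."""

    def __init__(self):
        self.addr_of = {}   # Address ID -> address
        self.active = set()  # (addr_id, port) with an established subflow
        self.failed = set()  # (address, port) whose connection attempt failed

    def on_add_addr(self, addr_id, address, port):
        """Return True if the option is accepted, False if silently ignored."""
        if addr_id in self.addr_of:
            # Reuse of an ID: same address required, and a port not in use.
            if self.addr_of[addr_id] != address or (addr_id, port) in self.active:
                return False
        self.addr_of[addr_id] = address
        self.failed.discard((address, port))  # a re-announcement is a refresh
        return True

    def may_connect(self, address, port):
        return (address, port) not in self.failed

    def on_connect_result(self, addr_id, address, port, ok):
        if ok:
            self.active.add((addr_id, port))
        else:
            self.failed.add((address, port))
```

Replacing the address behind an existing ID is rejected here, matching the requirement that the old ID be removed with REMOVE_ADDR first.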
During normal MPTCP operation, it is unlikely that there will be sufficient TCP option space for ADD_ADDR to be included along with those for data sequence numbering (Section 3.3.1). Therefore, it is expected that an MPTCP implementation will send the ADD_ADDR option on separate ACKs. As discussed earlier, however, an MPTCP implementation MUST NOT treat duplicate ACKs with any MPTCP option, with the exception of the DSS option, as indications of congestion [12], and an MPTCP implementation SHOULD NOT send more than two duplicate ACKs in a row for signaling purposes.
在正常的MPTCP操作中,TCP选项空间不太可能足以同时容纳ADD_ADDR和数据序列编号所需的选项(第3.3.1节)。因此,预计MPTCP实现会在单独的ACK上发送ADD_ADDR选项。但是,如前所述,MPTCP实现不得将携带任何MPTCP选项(DSS选项除外)的重复ACK视为拥塞指示[12],并且MPTCP实现不应出于信令目的连续发送两个以上的重复ACK。
If, during the lifetime of an MPTCP connection, a previously announced address becomes invalid (e.g., if the interface disappears), the affected host SHOULD announce this so that the peer can remove subflows related to this address.
如果在MPTCP连接的生存期内,先前宣布的地址变得无效(例如,如果接口消失),则受影响的主机应宣布该地址,以便对等方可以删除与该地址相关的子流。
This is achieved through the Remove Address (REMOVE_ADDR) option (Figure 13), which will remove a previously added address (or list of addresses) from a connection and terminate any subflows currently using that address.
这是通过删除地址(REMOVE_ADDR)选项(图13)实现的,该选项会从连接中删除先前添加的地址(或地址列表),并终止当前使用该地址的所有子流。
For security purposes, if a host receives a REMOVE_ADDR option, it must ensure the affected path(s) are no longer in use before it instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger the sending of a TCP keepalive [19] on the path, and if a response is received the path SHOULD NOT be removed. Typical TCP validity tests on the subflow (e.g., ensuring sequence and ACK numbers are correct) MUST also be undertaken. An implementation can use indications of these test failures as part of intrusion detection or error logging.
出于安全目的,如果主机接收到REMOVE_ADDR选项,则必须确保受影响的路径不再被使用,然后才能启动关闭。接收REMOVE_ADDR应首先触发在路径上发送TCP keepalive[19],如果收到响应,则不应删除路径。还必须对子流进行典型的TCP有效性测试(例如,确保序列号和ACK号正确)。实现可以使用这些测试失败的指示作为入侵检测或错误日志记录的一部分。
The sending and receipt (if no keepalive response was received) of this message SHOULD trigger the sending of RSTs by both hosts on the affected subflow(s) (if possible), as a courtesy to cleaning up middlebox state, before cleaning up any local state.
此消息的发送和接收(如果未收到keepalive响应)应触发两台主机在受影响的子流(如果可能)上发送RST,这是为了在清除任何本地状态之前清除中间盒状态。
Address removal is undertaken by ID, so as to permit the use of NATs and other middleboxes that rewrite source addresses. If there is no address at the requested ID, the receiver will silently ignore the request.
地址删除按ID执行,以便允许使用NAT及其他会重写源地址的中间盒。如果请求的ID上没有对应地址,接收方将静默忽略该请求。
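The receive-side procedure for REMOVE_ADDR can be sketched as a filter over the connection's subflows. The sketch assumes the usual TCP validity checks have already passed, maps each subflow to the peer's Address ID, and models the keepalive probe as a hypothetical callable `keepalive_answered`; none of these names come from a real stack.

```python
def handle_remove_addr(addr_ids, subflows, keepalive_answered):
    """Return the subflows to reset for a received REMOVE_ADDR.

    addr_ids: Address IDs listed in the option.
    subflows: dict mapping subflow ID -> the peer's Address ID on it.
    keepalive_answered: callable(subflow_id) -> bool; sends a TCP keepalive
    on that path and reports whether a reply came back.
    """
    to_reset = []
    for sf_id, remote_id in subflows.items():
        if remote_id not in addr_ids:
            continue                # IDs with no address are silently ignored
        if keepalive_answered(sf_id):
            continue                # path still in use: SHOULD NOT remove it
        to_reset.append(sf_id)      # send RST, then clean up local state
    return to_reset
```

Probing before closing means a forged or stale REMOVE_ADDR cannot tear down a path that is demonstrably still working.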
A subflow that is still functioning MUST be closed with a FIN exchange as in regular TCP, rather than using this option. For more information, see Section 3.3.3.
仍在正常工作的子流必须像常规TCP那样通过FIN交换来关闭,而不是使用此选项。有关更多信息,请参见第3.3.3节。
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
|     Kind      |  Length = 3+n |Subtype|(resvd)|   Address ID  | ...
+---------------+---------------+-------+-------+---------------+
                            (followed by n-1 Address IDs, if required)
Figure 13: Remove Address (REMOVE_ADDR) Option
图13:删除地址(REMOVE_ADDR)选项
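Following Figure 13, a REMOVE_ADDR option is a 3-octet header (with Length = 3 + n) followed by n one-octet Address IDs. The sketch below encodes and decodes it; the kind value 30 and the REMOVE_ADDR subtype 0x4 are assumptions based on the MPTCP IANA registrations as we recall them, and the function names are hypothetical.

```python
import struct

MPTCP_KIND = 30            # assumed MPTCP option kind
REMOVE_ADDR_SUBTYPE = 0x4  # assumed from the IANA subtype registry


def encode_remove_addr(addr_ids):
    if not addr_ids:
        raise ValueError("at least one Address ID is required")
    # Length = 3 + n; the 4 bits after Subtype are reserved and sent as zero.
    header = struct.pack("!BBB", MPTCP_KIND, 3 + len(addr_ids),
                         REMOVE_ADDR_SUBTYPE << 4)
    return header + bytes(addr_ids)


def decode_remove_addr(opt):
    kind, length, first = struct.unpack_from("!BBB", opt)
    if kind != MPTCP_KIND or first >> 4 != REMOVE_ADDR_SUBTYPE or length < 4:
        raise ValueError("not a REMOVE_ADDR option")
    return list(opt[3:length])


print(encode_remove_addr([5]).hex())  # 1e044005
```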
Regular TCP has the means of sending a reset (RST) signal to abruptly close a connection. With MPTCP, the RST only has the scope of the subflow and will only close the concerned subflow but not affect the remaining subflows. MPTCP's connection will stay alive at the data level, in order to permit break-before-make handover between subflows. It is therefore necessary to provide an MPTCP-level "reset" to allow the abrupt closure of the whole MPTCP connection, and this is the MP_FASTCLOSE option.
常规TCP可以通过发送重置(RST)信号来突然关闭连接。在MPTCP中,RST的作用范围仅限于子流,只会关闭相关子流,而不影响其余子流。MPTCP连接将在数据级别保持活动状态,以支持子流之间“先断后连”(break-before-make)的切换。因此,有必要提供MPTCP级别的“重置”,以允许突然关闭整个MPTCP连接,这就是MP_FASTCLOSE选项。
MP_FASTCLOSE is used to indicate to the peer that the connection will be abruptly closed and no data will be accepted anymore. The reasons for triggering an MP_FASTCLOSE are implementation specific. Regular TCP does not allow sending a RST while the connection is in a synchronized state [1]. Nevertheless, implementations allow the sending of a RST in this state, if, for example, the operating system is running out of resources. In these cases, MPTCP should send the MP_FASTCLOSE. This option is illustrated in Figure 14.
MP_FASTCLOSE用于向对等方指示连接将突然关闭,不再接受任何数据。触发MP_FASTCLOSE的原因是特定于实现的。常规TCP不允许在连接处于同步状态时发送RST[1]。然而,实现允许在这种状态下发送RST,例如,如果操作系统正在耗尽资源。在这些情况下,MPTCP应发送MP_FASTCLOSE。该选项如图14所示。
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----------------------+
|     Kind      |    Length     |Subtype|      (reserved)       |
+---------------+---------------+-------+-----------------------+
|                     Option Receiver's Key                     |
|                           (64 bits)                           |
|                                                               |
+---------------------------------------------------------------+
Figure 14: Fast Close (MP_FASTCLOSE) Option
图14:快速关闭(MP_FASTCLOSE)选项
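Figure 14 gives a fixed-size option: Kind, Length, a 16-bit field holding the 4-bit Subtype and 12 reserved bits, then the 64-bit key, for a Length of 12 octets. The sketch below encodes it and checks a received option against the local key. The kind value 30 and the MP_FASTCLOSE subtype 0x7 are assumptions from the MPTCP IANA registrations as we recall them; the function names are hypothetical.

```python
import struct

MPTCP_KIND = 30             # assumed MPTCP option kind
MP_FASTCLOSE_SUBTYPE = 0x7  # assumed from the IANA subtype registry


def encode_mp_fastclose(receiver_key):
    # Subtype occupies the top 4 bits of a 16-bit field; the other 12 are reserved.
    return struct.pack("!BBHQ", MPTCP_KIND, 12, MP_FASTCLOSE_SUBTYPE << 12,
                       receiver_key)


def fastclose_key_matches(opt, my_key):
    """True if `opt` is an MP_FASTCLOSE carrying this host's own key."""
    kind, length, sub_resv, key = struct.unpack("!BBHQ", opt)
    return (kind == MPTCP_KIND and length == 12
            and sub_resv >> 12 == MP_FASTCLOSE_SUBTYPE and key == my_key)
```

Carrying the receiver's key is what lets Host B accept the fast close only from a party that saw the initial handshake.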
If Host A wants to force the closure of an MPTCP connection, the MPTCP Fast Close procedure is as follows:
如果主机A想要强制关闭MPTCP连接,MPTCP快速关闭过程如下:
o Host A sends an ACK containing the MP_FASTCLOSE option on one subflow, containing the key of Host B as declared in the initial connection handshake. On all the other subflows, Host A sends a regular TCP RST to close these subflows, and tears them down. Host A now enters FASTCLOSE_WAIT state.
o 主机A在一个子流上发送一个包含MP_FASTCLOSE选项的ACK,其中携带主机B在初始连接握手中声明的密钥。在其他所有子流上,主机A发送常规TCP RST以关闭这些子流,并将其拆除。主机A随后进入FASTCLOSE_WAIT状态。
o Upon receipt of an MP_FASTCLOSE, containing the valid key, Host B answers on the same subflow with a TCP RST and tears down all subflows. Host B can now close the whole MPTCP connection (it transitions directly to CLOSED state).
o 收到包含有效密钥的MP_FASTCLOSE后,主机B在同一子流上以TCP RST应答,并拆除所有子流。主机B现在可以关闭整个MPTCP连接(直接转换到CLOSED状态)。
o As soon as Host A has received the TCP RST on the remaining subflow, it can close this subflow and tear down the whole connection (transition from FASTCLOSE_WAIT to CLOSED states). If Host A receives an MP_FASTCLOSE instead of a TCP RST, both hosts attempted fast closure simultaneously. Host A should reply with a TCP RST and tear down the connection.
o 一旦主机A在剩余子流上收到TCP RST,它就可以关闭该子流并拆除整个连接(从FASTCLOSE_WAIT状态转换到CLOSED状态)。如果主机A收到的是MP_FASTCLOSE而不是TCP RST,则说明两台主机同时尝试了快速关闭。主机A应以TCP RST回复并拆除连接。
o If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE after one retransmission timeout (RTO) (the RTO of the subflow where the MP_FASTCLOSE has been sent), it SHOULD retransmit the MP_FASTCLOSE. The number of retransmissions SHOULD be limited to avoid this connection being retained for a long time, but this limit is implementation specific. A RECOMMENDED number is 3.
o 如果主机A在一个重传超时(RTO)(即发送MP_FASTCLOSE的子流的RTO)之后仍未收到对其MP_FASTCLOSE的TCP RST响应,则应重传MP_FASTCLOSE。重传次数应受到限制,以免该连接被长时间保留,但具体限制取决于实现。建议值为3。
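The Fast Close steps above can be modelled as a small state machine for Host A. This is a sketch, not a reference implementation: `FastCloseInitiator` is hypothetical, transmitted segments are only logged, the retransmission limit uses the RECOMMENDED value of 3, and moving to CLOSED once that limit is exhausted is an assumed local policy that the text leaves to the implementation.

```python
class FastCloseInitiator:
    """Host A's side of the MPTCP Fast Close procedure."""

    MAX_RETRANSMITS = 3  # the RECOMMENDED limit

    def __init__(self, subflows):
        self.subflows = list(subflows)
        self.state = "ESTABLISHED"
        self.retransmits = 0
        self.sent = []  # log of (subflow, what was sent)

    def start(self):
        keep, *others = self.subflows
        self.sent.append((keep, "ACK+MP_FASTCLOSE"))
        for sf in others:
            self.sent.append((sf, "RST"))  # tear the other subflows down now
        self.subflows = [keep]
        self.state = "FASTCLOSE_WAIT"

    def on_rto(self):
        if self.state != "FASTCLOSE_WAIT":
            return
        if self.retransmits < self.MAX_RETRANSMITS:
            self.retransmits += 1
            self.sent.append((self.subflows[0], "ACK+MP_FASTCLOSE"))
        else:
            self.state = "CLOSED"  # assumed policy: give up after the limit

    def on_segment(self, kind):
        if self.state != "FASTCLOSE_WAIT":
            return
        if kind == "MP_FASTCLOSE":  # both hosts fast-closed simultaneously
            self.sent.append((self.subflows[0], "RST"))
        if kind in ("RST", "MP_FASTCLOSE"):
            self.state = "CLOSED"
```

A normal run is `start()` followed by `on_segment("RST")`; the simultaneous case replies with a RST before closing, as the last bullet describes.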
Sometimes, middleboxes will exist on a path that could prevent the operation of MPTCP. MPTCP has been designed in order to cope with many middlebox modifications (see Section 6), but there are still some cases where a subflow could fail to operate within the MPTCP requirements. These cases are notably the following: the loss of TCP options on a path and the modification of payload data. If such an event occurs, it is necessary to "fall back" to the previous, safe operation. This may be either falling back to regular TCP or removing a problematic subflow.
有时,路径上会存在可能妨碍MPTCP运行的中间盒。MPTCP的设计能够应对许多中间盒修改(见第6节),但仍有一些情况下子流可能无法满足MPTCP的运行要求。这些情况主要包括:路径上TCP选项的丢失以及有效负载数据的修改。如果发生此类事件,则必须“回退”到先前的安全操作方式。这可能是回退到常规TCP,也可能是删除有问题的子流。
At the start of an MPTCP connection (i.e., the first subflow), it is important to ensure that the path is fully MPTCP capable and the necessary TCP options can reach each host. The handshake as described in Section 3.1 SHOULD fall back to regular TCP if either of the SYN messages does not have the MPTCP options: this is the same, and desired, behavior in the case where a host is not MPTCP capable, or the path does not support the MPTCP options. When attempting to join an existing MPTCP connection (Section 3.2), if a path is not MPTCP capable and the TCP options do not get through on the SYNs, the subflow will be closed according to the MP_JOIN logic.
在MPTCP连接开始时(即第一个子流),确保路径完全支持MPTCP且必要的TCP选项能够到达每台主机非常重要。如果任一SYN消息中没有MPTCP选项,第3.1节所述的握手应回退到常规TCP:这与主机不支持MPTCP或路径不支持MPTCP选项时的行为相同,也是期望的行为。尝试加入现有MPTCP连接(第3.2节)时,如果路径不支持MPTCP且TCP选项无法在SYN上通过,则子流将根据MP_JOIN逻辑被关闭。
There is, however, another corner case that should be addressed: the MPTCP options getting through on the SYN, but not on regular packets. This can be resolved using the following rules, provided the subflow is the first subflow and thus all data in flight is contiguous.
然而,还有另一种需要处理的边界情况:MPTCP选项能在SYN上通过,却无法在常规数据包上通过。如果该子流是第一个子流(因此所有在途数据都是连续的),则可以使用以下规则解决此问题。
A sender MUST include a DSS option with data sequence mapping in every segment until one of the sent segments has been acknowledged with a DSS option containing a Data ACK. Upon reception of the acknowledgment, the sender has the confirmation that the DSS option passes in both directions and may choose to send fewer DSS options than once per segment.
发送方必须在每个段中包含一个带有数据序列映射的DSS选项,直到其中一个发送段已被包含数据确认的DSS选项确认。在收到确认后,发送方确认DSS选项在两个方向上通过,并且可以选择发送少于每个段一次的DSS选项。
If, however, an ACK is received for data (not just for the SYN) without a DSS option containing a Data ACK, the sender determines the path is not MPTCP capable. In the case of this occurring on an additional subflow (i.e., one started with MP_JOIN), the host MUST close the subflow with a RST. In the case of the first subflow (i.e., that started with MP_CAPABLE), it MUST drop out of an MPTCP mode back to regular TCP. The sender will send one final data sequence mapping, with the Data-Level Length value of 0 indicating an infinite mapping (in case the path drops options in one direction only), and then revert to sending data on the single subflow without any MPTCP options.
但是,如果收到了对数据(而不仅仅是对SYN)的ACK,却没有包含数据ACK的DSS选项,则发送方判定该路径不支持MPTCP。如果这种情况发生在附加子流上(即以MP_JOIN建立的子流),主机必须使用RST关闭该子流。如果发生在第一个子流上(即以MP_CAPABLE建立的子流),则必须退出MPTCP模式,回退到常规TCP。发送方将发送最后一个数据序列映射,其数据级长度值为0,表示无限映射(以防路径只在一个方向上丢弃选项),然后回退到在单个子流上发送不带任何MPTCP选项的数据。
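The fallback rules above reduce to a small decision function evaluated while the path is still unverified. The `Action` enum and the function signature are hypothetical; the sketch only classifies the situation and leaves sending the RST or the final infinite mapping to the caller.

```python
from enum import Enum


class Action(Enum):
    CONTINUE_MPTCP = "continue"
    RESET_SUBFLOW = "rst"
    FALLBACK_TO_TCP = "fallback"


def on_ack_before_verification(is_initial_subflow, acks_data, has_dss_data_ack):
    """Classify an incoming ACK received before the path is known to carry DSS."""
    if not acks_data or has_dss_data_ack:
        # An ACK of only the SYN proves nothing; a DSS Data ACK verifies the path.
        return Action.CONTINUE_MPTCP
    if not is_initial_subflow:
        return Action.RESET_SUBFLOW  # MP_JOIN subflow: close it with a RST
    # Initial subflow: send one last DSS with Data-Level Length 0 (infinite
    # mapping), then continue as plain TCP without MPTCP options.
    return Action.FALLBACK_TO_TCP
```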
Note that this rule essentially prohibits the sending of data on the third packet of an MP_CAPABLE or MP_JOIN handshake, since both that option and a DSS cannot fit in TCP option space. If the initiator is to send first, another segment must be sent that contains the data and DSS. Note also that an additional subflow cannot be used until the initial path has been verified as MPTCP capable.
请注意,此规则实际上禁止在MP_CAPABLE或MP_JOIN握手的第三个数据包上发送数据,因为该选项与DSS无法同时放入TCP选项空间。如果由发起方先发送数据,则必须另外发送一个包含数据和DSS的段。还请注意,在初始路径被验证为支持MPTCP之前,不能使用额外的子流。
These rules should cover all cases where such a failure could happen: whether it occurs on the forward or the reverse path, and whether the server or the client sends data first. If options are lost on data packets on any subflow other than the initial one, this should be treated as a standard path failure: the data will not be DATA_ACKed (since there is no mapping for it), and the subflow can be closed with a RST.
The case described above is a specialized case of fallback, for when the lack of MPTCP support is detected on a subflow before any data has been acknowledged at the connection level. More generally, fallback (either closing a subflow or reverting to regular TCP) can become necessary at any point during a connection if a non-MPTCP-aware middlebox changes the data stream.
As described in Section 3.3, each portion of data for which there is a mapping is protected by a checksum. This mechanism is used to detect if middleboxes have made any adjustments to the payload (added, removed, or changed data). A checksum will fail if the data has been changed in any way. This will also detect if the length of data on the subflow is increased or decreased, and this means the data sequence mapping is no longer valid. The sender no longer knows what subflow-level sequence number the receiver is genuinely operating at (the middlebox will be faking ACKs in return), and it cannot signal any further mappings. Furthermore, in addition to the possibility of payload modifications that are valid at the application layer, there is the possibility that false positives could be hit across MPTCP segment boundaries, corrupting the data. Therefore, all data from the start of the segment that failed the checksum onwards is not trustworthy.
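To see why a payload change is caught, the following Python sketch computes a DSS-style checksum: a 16-bit one's complement Internet checksum over a pseudo-header followed by the mapped payload. The pseudo-header layout used here (64-bit data sequence number, 32-bit relative subflow sequence number, 16-bit data-level length, 16 zero bits) is our reading of Section 3.3, and the function names are illustrative.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry; odd input is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def dss_checksum(dsn: int, rel_ssn: int, length: int, payload: bytes) -> int:
    # Pseudo-header: DSN (8 octets), relative subflow seq (4), data-level length (2), zeros (2).
    pseudo = struct.pack("!QIHH", dsn, rel_ssn, length, 0)
    return ~ones_complement_sum16(pseudo + payload) & 0xFFFF
```

A middlebox that rewrites even one byte of the mapped data changes the result, so the receiver's recomputed checksum no longer matches the value carried in the DSS option.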
When multiple subflows are in use, the data in flight on a subflow will likely involve data that is not contiguously part of the connection-level stream, since segments will be spread across the multiple subflows. Due to the problems identified above, it is not possible to determine what the adjustment has done to the data (notably, any changes to the subflow sequence numbering). Therefore, it is not possible to recover the subflow, and the affected subflow must be immediately closed with a RST, featuring an MP_FAIL option (Figure 15), which defines the data sequence number at the start of the segment (defined by the data sequence mapping) that had the checksum failure. Note that the MP_FAIL option requires the use of the full 64-bit sequence number, even if 32-bit sequence numbers are normally in use in the DSS signals on the path.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+----------------------+
|     Kind      |   Length=12   |Subtype|      (reserved)      |
+---------------+---------------+-------+----------------------+
|                                                              |
|               Data Sequence Number (8 octets)                |
|                                                              |
+--------------------------------------------------------------+
Figure 15: Fallback (MP_FAIL) Option
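The layout in Figure 15 can be packed and parsed with Python's `struct` module. This sketch assumes the MPTCP option Kind value 30 and the MP_FAIL subtype value 0x6 registered in Section 8; the helper names `encode_mp_fail()` and `decode_mp_fail()` are ours.

```python
import struct

MPTCP_KIND = 30        # TCP option Kind assigned to MPTCP (Section 8)
MP_FAIL_SUBTYPE = 0x6  # MP_FAIL subtype value (Section 8)


def encode_mp_fail(dsn: int) -> bytes:
    """Build the 12-octet MP_FAIL option carrying a full 64-bit data sequence number."""
    if not 0 <= dsn < 2**64:
        raise ValueError("MP_FAIL always carries a full 64-bit data sequence number")
    # Subtype is the high 4 bits of the third octet; the other 12 bits are reserved (zero).
    subtype_reserved = MP_FAIL_SUBTYPE << 12
    return struct.pack("!BBHQ", MPTCP_KIND, 12, subtype_reserved, dsn)


def decode_mp_fail(option: bytes) -> int:
    """Return the data sequence number from an MP_FAIL option, or raise ValueError."""
    kind, length, subtype_reserved, dsn = struct.unpack("!BBHQ", option)
    if kind != MPTCP_KIND or length != 12 or subtype_reserved >> 12 != MP_FAIL_SUBTYPE:
        raise ValueError("not an MP_FAIL option")
    return dsn
```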
The receiver MUST discard all data following the data sequence number specified. Failed data MUST NOT be DATA_ACKed and so will be retransmitted on other subflows (Section 3.3.6).
A special case is when there is a single subflow and it fails with a checksum error. If it is known that all unacknowledged data in flight is contiguous (which will usually be the case with a single subflow), an infinite mapping can be applied to the subflow without the need to close it first, essentially turning off all further MPTCP signaling. In this case, if a receiver identifies a checksum failure when there is only one path, it will send back an MP_FAIL option on the subflow-level ACK, referring to the data-level sequence number of the start of the segment on which the checksum error was detected. The sender will receive this and, if all unacknowledged data in flight is contiguous, will signal an infinite mapping. This infinite mapping will be a DSS option (Section 3.3) on the first new packet, containing a data sequence mapping that acts retroactively, referring to the subflow sequence number at the start of the last segment that was known to be delivered intact. From that point onwards, data can be altered by a middlebox without affecting MPTCP, as the data stream is equivalent to a regular, legacy TCP session.
In the rare case that the data is not contiguous (which could happen when there is only one subflow but it is retransmitting data from a subflow that has recently been uncleanly closed), the receiver MUST close the subflow with a RST with MP_FAIL. The receiver MUST discard all data that follows the data sequence number specified. The sender MAY attempt to create a new subflow belonging to the same connection, and, if it chooses to do so, SHOULD place the single subflow immediately in single-path mode by setting an infinite data sequence mapping. This mapping will begin from the data-level sequence number that was declared in the MP_FAIL.
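The choice between keeping a single failing subflow alive with an infinite mapping and resetting it hinges on whether the data in flight is contiguous at the data level. The Python sketch below expresses that decision from the receiver's side; `Segment`, `is_contiguous()`, `on_checksum_failure()`, and the returned action strings are illustrative names, not part of the protocol.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    dsn: int     # data sequence number of the segment's first byte
    length: int  # data-level length in bytes


def is_contiguous(segments: list[Segment]) -> bool:
    """True if the segments cover one gap-free data-level range."""
    ordered = sorted(segments, key=lambda s: s.dsn)
    return all(a.dsn + a.length == b.dsn for a, b in zip(ordered, ordered[1:]))


def on_checksum_failure(num_subflows: int, in_flight: list[Segment]) -> str:
    if num_subflows == 1 and is_contiguous(in_flight):
        # Keep the subflow: report MP_FAIL on the ACK; the sender is expected to
        # answer with an infinite mapping.
        return "ack-with-mp-fail"
    # Several subflows, or non-contiguous data: the subflow cannot be recovered.
    return "rst-with-mp-fail"
```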
After a sender signals an infinite mapping, it MUST use only subflow ACKs to clear its send buffer. This is because Data ACKs may become misaligned with the subflow ACKs when middleboxes insert or delete data. The receiver SHOULD stop generating Data ACKs after it receives an infinite mapping.
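A minimal Python sketch of the send-buffer rule, assuming a hypothetical `SenderAckState` that tracks the highest data-level byte that may be freed. Before fallback only Data ACKs free data; after an infinite mapping that starts at data sequence number `dsn_start` and subflow sequence number `ssn_start`, each subflow ACK is translated one-to-one onto the data level and Data ACKs are ignored. Sequence-number wraparound is ignored for brevity.

```python
from __future__ import annotations

from typing import Optional


class SenderAckState:
    """Tracks how far the connection-level send buffer may be freed."""

    def __init__(self) -> None:
        self.infinite_mapping: Optional[tuple[int, int]] = None  # (dsn_start, ssn_start)
        self.acked_dsn = 0  # data below this DSN can be released

    def signal_infinite_mapping(self, dsn_start: int, ssn_start: int) -> None:
        self.infinite_mapping = (dsn_start, ssn_start)

    def on_ack(self, subflow_ack: int, data_ack: Optional[int] = None) -> int:
        if self.infinite_mapping is not None:
            # After fallback, trust only the subflow ACK: Data ACKs may be misaligned
            # because middleboxes can now insert or delete bytes.
            dsn_start, ssn_start = self.infinite_mapping
            candidate = dsn_start + (subflow_ack - ssn_start)
        elif data_ack is not None:
            candidate = data_ack
        else:
            # In MPTCP mode, a subflow ACK alone never frees connection-level data.
            return self.acked_dsn
        self.acked_dsn = max(self.acked_dsn, candidate)
        return self.acked_dsn
```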
When a connection has fallen back, only one subflow can send data; otherwise, the receiver would not know how to reorder the data. In practice, this means that all MPTCP subflows will have to be terminated except one. Once MPTCP falls back to regular TCP, it MUST NOT revert to MPTCP later in the connection.
It should be emphasized that we are not attempting to prevent the use of middleboxes that want to adjust the payload. An MPTCP-aware middlebox could provide such functionality by also rewriting checksums.
In addition to the fallback mechanism described above, the standard classes of TCP errors may need to be handled in an MPTCP-specific way. Note that changes in semantics, such as the relevance of a RST, are covered in Section 4. Where possible, we do not want to deviate from regular TCP behavior.
The following list covers possible errors and the appropriate MPTCP behavior:
o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's behavior on an unknown port)
o DSN out of window (during normal operation): drop the data, do not send Data ACKs
o Remove request for unknown address ID: silently ignore
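One way to organize this list in code is a table from error class to action, plus a data-level window check for the DSN case. In this Python sketch the enum, table, and function names are hypothetical, and the window test is simplified: it treats any DSN outside [rcv_nxt, rcv_nxt + rcv_wnd) as out of window, whereas a real receiver would likely treat retransmissions of already-received data separately.

```python
from enum import Enum


class MptcpError(Enum):
    UNKNOWN_JOIN_TOKEN = "unknown token in MP_JOIN"
    JOIN_ACK_HMAC_FAILURE = "HMAC failure in MP_JOIN ACK"
    SYNACK_WITHOUT_JOIN = "missing MP_JOIN in SYN/ACK"
    DSN_OUT_OF_WINDOW = "DSN out of window"
    REMOVE_UNKNOWN_ADDR_ID = "remove request for unknown address ID"


# Behavior from the list above, keyed by error class.
ERROR_ACTIONS = {
    MptcpError.UNKNOWN_JOIN_TOKEN: "send RST",
    MptcpError.JOIN_ACK_HMAC_FAILURE: "send RST",
    MptcpError.SYNACK_WITHOUT_JOIN: "send RST",
    MptcpError.DSN_OUT_OF_WINDOW: "drop data, send no Data ACK",
    MptcpError.REMOVE_UNKNOWN_ADDR_ID: "ignore silently",
}


def dsn_in_window(dsn: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    # Simplified 64-bit modular check against [rcv_nxt, rcv_nxt + rcv_wnd).
    return (dsn - rcv_nxt) % 2**64 < rcv_wnd


def classify_data(dsn: int, rcv_nxt: int, rcv_wnd: int):
    """Return None if the data is acceptable, else the matching error class."""
    return None if dsn_in_window(dsn, rcv_nxt, rcv_wnd) else MptcpError.DSN_OUT_OF_WINDOW
```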
There are a number of heuristics that are needed for performance or deployment but that are not required for protocol correctness. In this section, we detail such heuristics. Note that buffering and certain sender and receiver window behaviors are discussed in Sections 3.3.4 and 3.3.5, and retransmission in Section 3.3.6.
Under typical operation, an MPTCP implementation SHOULD use the same ports as already in use. In other words, the destination port of a SYN containing an MP_JOIN option SHOULD be the same as the remote port of the first subflow in the connection. The local port for such SYNs SHOULD also be the same as for the first subflow (and as such, an implementation SHOULD reserve ephemeral ports across all local IP addresses), although there may be cases where this is infeasible. This strategy is intended to maximize the probability of the SYN being permitted by a firewall or NAT at the recipient and to avoid confusing any network monitoring software.
There may also be cases, however, where the passive opener wishes to signal to the other host that a specific port should be used, and this facility is provided in the Add Address option as documented in Section 3.4.1. It is therefore feasible to allow multiple subflows between the same two addresses but using different port pairs, and such a facility could be used to allow load balancing within the network based on 5-tuples (e.g., some ECMP implementations [7]).
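The port rules above reduce to a small decision. The Python function below is a hypothetical helper, not an API from any stack: `local_port_free` stands for whether the first subflow's local port is reserved on the new local address, and `add_addr_port` is the port, if any, carried in an ADD_ADDR option.

```python
from __future__ import annotations

from typing import Optional


def join_syn_ports(first_local_port: int, first_remote_port: int,
                   add_addr_port: Optional[int] = None,
                   local_port_free: bool = True,
                   ephemeral_port: Optional[int] = None) -> tuple[Optional[int], int]:
    """Choose (local port, destination port) for a SYN carrying MP_JOIN."""
    # Destination: a port carried in ADD_ADDR takes precedence; otherwise reuse the
    # first subflow's remote port so firewalls and NATs see familiar traffic.
    dest = add_addr_port if add_addr_port is not None else first_remote_port
    # Source: reuse the first subflow's local port if it is reserved on the new local
    # address; where that is infeasible, fall back to a caller-chosen ephemeral port.
    local = first_local_port if local_port_free else ephemeral_port
    return local, dest
```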
Many TCP connections are short-lived and consist of only a few segments, so the overheads of using MPTCP would outweigh any benefits. A heuristic is therefore required to decide when to start using additional subflows in an MPTCP connection. We expect that experience gathered from deployments will provide further guidance on this, and that this guidance will be affected by particular application characteristics (which are likely to change over time). However, a suggested general-purpose heuristic that an implementation MAY choose to employ is as follows. Results from experimental deployments are needed in order to verify the correctness of this proposal.
If a host has data buffered for its peer (which implies that the application has received a request for data), the host opens one subflow for each initial window's worth of data that is buffered.
Consideration should also be given to limiting the rate of adding new subflows, as well as limiting the total number of subflows open for a particular connection. A host may choose to vary these values based on its load or knowledge of traffic and path characteristics.
Note that this heuristic alone is probably insufficient. Traffic for many common applications, such as downloads, is highly asymmetric, and the host that is multihomed may well be the client that will never fill its buffers, and thus never use MPTCP. Advanced APIs that allow an application to signal its traffic requirements would aid in these decisions.
An additional time-based heuristic could be applied, opening additional subflows after a given period of time has passed. This would alleviate the above issue, and also provide resilience for low-bandwidth but long-lived applications.
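The buffer-based and time-based heuristics, together with the rate and total limits, could be combined as in the following Python sketch. All parameters are hypothetical tuning knobs, and the sketch interprets "one subflow for each initial window's worth of buffered data" as a target total number of subflows, including the first one. For example, with an assumed 14,600-byte initial window and 50,000 bytes buffered, the target is three subflows, reached one new subflow per round when `max_new_per_round` is 1.

```python
from __future__ import annotations

from typing import Optional


def subflows_to_open(buffered_bytes: int, open_subflows: int, initial_window: int,
                     max_subflows: int, max_new_per_round: int = 1,
                     elapsed_s: float = 0.0,
                     time_threshold_s: Optional[float] = None) -> int:
    """Number of additional subflows to open now (all parameters are hypothetical knobs)."""
    # Buffer-based heuristic: one subflow per full initial window of buffered data.
    target = buffered_bytes // initial_window
    # Time-based heuristic: long-lived connections get one more subflow past a threshold.
    if time_threshold_s is not None and elapsed_s >= time_threshold_s:
        target = max(target, open_subflows + 1)
    # Limit the total number of subflows for this connection...
    target = min(target, max_subflows)
    # ...and the rate at which new ones are added.
    return max(0, min(target - open_subflows, max_new_per_round))
```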
This section has shown some of the considerations that an implementer should take into account when developing MPTCP heuristics, but it is not intended to be prescriptive.
Requirements for MPTCP's handling of unexpected signals have been given in Section 3.7. There are other failure cases, however, where a host can choose appropriate behavior.
For example, Section 3.1 suggests that a host SHOULD fall back to trying regular TCP SYNs after one or more failures of MPTCP SYNs for a connection. A host may keep a system-wide cache of such information, so that it can back off from using MPTCP, firstly for that particular destination host, and eventually on a whole interface, if MPTCP connections continue failing.
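A system-wide cache of this kind might look like the following Python sketch. The class name, the per-destination and per-interface thresholds, and their default values are arbitrary placeholders, not values from this document; a real cache would probably also age entries out.

```python
from collections import defaultdict


class MptcpFailureCache:
    """Hypothetical system-wide record of failed MPTCP attempts."""

    def __init__(self, per_host_limit: int = 3, per_interface_limit: int = 10):
        self.per_host_limit = per_host_limit
        self.per_interface_limit = per_interface_limit
        self.host_failures = defaultdict(int)   # (interface, destination) -> count
        self.iface_failures = defaultdict(int)  # interface -> count

    def record_failure(self, iface: str, dest: str) -> None:
        self.host_failures[(iface, dest)] += 1
        self.iface_failures[iface] += 1

    def record_success(self, iface: str, dest: str) -> None:
        self.host_failures.pop((iface, dest), None)

    def use_mptcp(self, iface: str, dest: str) -> bool:
        # Back off for one destination first, and for the whole interface if failures persist.
        if self.iface_failures.get(iface, 0) >= self.per_interface_limit:
            return False
        return self.host_failures.get((iface, dest), 0) < self.per_host_limit
```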
Another failure could occur when the MP_JOIN handshake fails. Section 3.7 specifies that an incorrect handshake MUST lead to the subflow being closed with a RST. A host operating an active intrusion detection system may choose to start blocking MP_JOIN packets from the source host if multiple failed MP_JOIN attempts are seen. From the connection initiator's point of view, if an MP_JOIN fails, it SHOULD NOT attempt to connect to the same IP address and port during the lifetime of the connection, unless the other host refreshes the information with another ADD_ADDR option. Note that the ADD_ADDR option is informational only, and does not guarantee the other host will attempt a connection.
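The initiator-side rule (no retry to the same address and port after a failed MP_JOIN, unless a fresh ADD_ADDR arrives) can be kept in a per-connection set. This Python sketch uses the hypothetical class name `JoinTargets`; note that re-enabling a target does not mean the peer will connect, since ADD_ADDR is informational only.

```python
from __future__ import annotations


class JoinTargets:
    """Per-connection record of (address, port) pairs where an MP_JOIN failed."""

    def __init__(self) -> None:
        self._failed: set[tuple[str, int]] = set()

    def on_join_failed(self, addr: str, port: int) -> None:
        self._failed.add((addr, port))

    def on_add_addr(self, addr: str, port: int) -> None:
        # A new ADD_ADDR refreshes the information, so the target may be tried again.
        self._failed.discard((addr, port))

    def may_try(self, addr: str, port: int) -> bool:
        return (addr, port) not in self._failed
```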
In addition, an implementation may learn, over a number of connections, that certain interfaces or destination addresses consistently fail and may default to not trying to use MPTCP for these. Behavior could also be learned for particularly badly performing subflows or subflows that regularly fail during use, in order to temporarily choose not to use these paths.
In order to support multipath operation, the semantics of some TCP components have changed. To aid clarity, this section collects these semantic changes as a reference.
Sequence number: The (in-header) TCP sequence number is specific to the subflow. To allow the receiver to reorder application data, an additional data-level sequence space is used. In this data-level sequence space, the initial SYN and the final DATA_FIN each occupy 1 octet of sequence space. There is an explicit mapping of data sequence space to subflow sequence space, which is signaled through TCP options in data packets.
ACK: The ACK field in the TCP header acknowledges only the subflow sequence number, not the data-level sequence space. Implementations SHOULD NOT attempt to infer a data-level acknowledgment from the subflow ACKs. This separates subflow- and connection-level processing at an end host.
Duplicate ACK: A duplicate ACK that includes any MPTCP signaling (with the exception of the DSS option) MUST NOT be treated as a signal of congestion. To limit the chances of non-MPTCP-aware entities mistakenly interpreting duplicate ACKs as a signal of congestion, MPTCP SHOULD NOT send more than two duplicate ACKs containing (non-DSS) MPTCP signals in a row.
Receive Window: The receive window in the TCP header indicates the amount of free buffer space for the whole data-level connection (as opposed to for this subflow) that is available at the receiver. This is the same semantics as regular TCP, but to maintain these semantics the receive window must be interpreted at the sender as relative to the sequence number given in the DATA_ACK rather than the subflow ACK in the TCP header. In this way, the original flow control role is preserved. Note that some middleboxes may change the receive window, and so a host SHOULD use the maximum value of those recently seen on the constituent subflows for the connection-level receive window, and also needs to maintain a subflow-level window for subflow-level processing.
FIN: The FIN flag in the TCP header applies only to the subflow it is sent on, not to the whole connection. For connection-level FIN semantics, the DATA_FIN option is used.
RST: The RST flag in the TCP header applies only to the subflow it is sent on, not to the whole connection. The MP_FASTCLOSE option provides the fast close functionality of a RST at the MPTCP connection level.
Address List: Address list management (i.e., knowledge of the local and remote hosts' lists of available IP addresses) is handled on a per-connection basis (as opposed to per subflow, per host, or per pair of communicating hosts). This permits the application of per-connection local policy. Adding an address to one connection (either explicitly through an Add Address message, or implicitly through a Join) has no implication for other connections between the same pair of hosts.
5-tuple: The 5-tuple (protocol, local address, local port, remote address, remote port) presented by kernel APIs to the application layer in a non-multipath-aware application is that of the first subflow, even if the subflow has since been closed and removed from the connection. This decision, and other related API issues, are discussed in more detail in [6].
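Two of these semantic changes, the explicit data-to-subflow sequence mapping and the connection-level receive window anchored at the Data ACK, can be illustrated together. In this Python sketch `Mapping`, `reassemble()`, and `ConnectionSendWindow` are hypothetical names; window scaling, 64-bit wraparound, and the octet consumed by the SYN and DATA_FIN are left out for brevity, and the `history` length stands in for "recently seen".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    dsn: int     # data-level sequence number of the first mapped byte
    ssn: int     # relative subflow sequence number of the first mapped byte
    length: int  # data-level length in bytes

    def to_dsn(self, ssn: int) -> Optional[int]:
        """Data sequence number for subflow byte `ssn`, or None if not covered."""
        offset = ssn - self.ssn
        return self.dsn + offset if 0 <= offset < self.length else None


def reassemble(chunks: list[tuple[Mapping, bytes]]) -> bytes:
    """Order payloads received on any subflow by DSN (assumes no gaps or overlaps)."""
    return b"".join(payload for _, payload in sorted(chunks, key=lambda c: c[0].dsn))


class ConnectionSendWindow:
    """Sender-side view of the connection-level receive window."""

    def __init__(self, history: int = 4):
        self.history = history
        self.recent_windows: list[int] = []  # windows recently seen on any subflow
        self.data_ack = 0                    # highest Data ACK received

    def on_ack(self, data_ack: int, rcv_wnd: int) -> None:
        self.data_ack = max(self.data_ack, data_ack)
        self.recent_windows = (self.recent_windows + [rcv_wnd])[-self.history:]

    def right_edge(self) -> int:
        # The window is relative to the Data ACK, not the subflow ACK; taking the
        # maximum recent value tolerates a middlebox shrinking it on one subflow.
        return self.data_ack + max(self.recent_windows, default=0)

    def can_send(self, dsn: int, length: int) -> bool:
        return dsn + length <= self.right_edge()
```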
As identified in [9], the addition of multipath capability to TCP will bring with it a number of new classes of threat. In order to prevent these, [2] presents a set of requirements for a security solution for MPTCP. The fundamental goal is for the security of MPTCP to be "no worse" than regular TCP today, and the key security requirements are:
o Provide a mechanism to confirm that the parties in a subflow handshake are the same as in the original connection setup.
o Provide verification that the peer can receive traffic at a new address before using it as part of a connection.
o Provide replay protection, i.e., ensure that a request to add/remove a subflow is 'fresh'.
In order to achieve these goals, MPTCP includes a hash-based handshake algorithm documented in Sections 3.1 and 3.2.
The security of the MPTCP connection depends on the use of keys that are shared once at the start of the first subflow and are never sent again over the network (unless used in the fast close mechanism, Section 3.5). To ease demultiplexing while not giving away any cryptographic material, future subflows use a truncated cryptographic hash of this key as the connection identification "token". The keys are concatenated and used as keys for creating Hash-based Message Authentication Codes (HMACs) used on subflow setup, in order to verify that the parties in the handshake are the same as in the original connection setup. This also provides verification that the peer can receive traffic at this new address. Replay attacks would still be possible when only keys are used; therefore, the handshakes use single-use random numbers (nonces) at both ends, which ensures the HMAC will never be the same on two handshakes. Guidance on generating random numbers suitable for use as keys is given in [14] and discussed in Section 3.1.
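The token and HMAC computations described above can be sketched with Python's standard `hashlib` and `hmac` modules. The details (SHA-1, a token made of the most significant 32 bits of the key's hash, 64-bit keys, 32-bit nonces, and each host placing its own key and nonce first) reflect our reading of Sections 3.1 and 3.2 for the HMAC-SHA1 algorithm, and the function names are illustrative. Because fresh nonces enter the HMAC on every handshake, a recorded handshake cannot simply be replayed.

```python
import hashlib
import hmac


def connection_token(key: bytes) -> bytes:
    """Token: the most significant 32 bits of SHA-1 over the 64-bit key."""
    return hashlib.sha1(key).digest()[:4]


def join_hmac(own_key: bytes, peer_key: bytes, own_nonce: bytes, peer_nonce: bytes) -> bytes:
    # HMAC-SHA1 keyed with both keys concatenated, over both nonces, own values first.
    return hmac.new(own_key + peer_key, own_nonce + peer_nonce, hashlib.sha1).digest()
```

The responder verifies the initiator's HMAC by recomputing it with the same argument order and comparing with `hmac.compare_digest()`, which avoids timing side channels.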
The use of crypto capability bits in the initial connection handshake to negotiate use of a particular algorithm allows the deployment of additional crypto mechanisms in the future. Note that this would be susceptible to bid-down attacks only if the attacker was on-path (and thus would be able to modify the data anyway). The security mechanism presented in this document should therefore protect against all forms of flooding and hijacking attacks discussed in [9].
During normal operation, regular TCP protection mechanisms (such as ensuring sequence numbers are in-window) will provide the same level of protection against attacks on individual TCP subflows as exists for regular TCP today. Implementations will introduce additional buffers compared to regular TCP, to reassemble data at the connection level. The application of window sizing will minimize the risk of denial-of-service attacks consuming resources.
As discussed in Section 3.4.1, a host may advertise its private addresses, but these might point to different hosts in the receiver's network. The MP_JOIN handshake (Section 3.2) will ensure that this does not succeed in setting up a subflow to the incorrect host. However, it could still create unwanted TCP handshake traffic. This feature of MPTCP could be a target for denial-of-service exploits, with malicious participants in MPTCP connections encouraging the recipient to target other hosts in the network. Therefore, implementations should consider heuristics (Section 3.8) at both the sender and receiver to reduce the impact of this.
A small security risk could theoretically exist with key reuse, but in order to accomplish a replay attack, both the sender and receiver keys, and the sender and receiver random numbers, in the MP_JOIN handshake (Section 3.2) would have to match.
Whilst this specification defines a "medium" security solution, meeting the criteria specified at the start of this section and the threat analysis ([9]), since attacks only ever get worse, it is likely that a future Standards Track version of MPTCP would need to be able to support stronger security. There are several ways the security of MPTCP could potentially be improved; some of these would be compatible with MPTCP as defined in this document, whilst others may not be. For now, the best approach is to get experience with the current approach, establish what might work, and check that the threat analysis is still accurate.
Possible ways of improving MPTCP security could include:
o defining a new MPTCP cryptographic algorithm, as negotiated in MP_CAPABLE. A sub-case could be to include an additional deployment assumption, such as stateful servers, in order to allow a more powerful algorithm to be used.
o defining how to secure data transfer with MPTCP, whilst not changing the signaling part of the protocol.
o defining security that requires more option space, perhaps in conjunction with a "long options" proposal for extending the TCP options space (such as those surveyed in [20]), or perhaps building on the current approach with a second stage of MPTCP-option-based security.
o revisiting the working group's decision to exclusively use TCP options for MPTCP signaling, and instead considering also making use of the TCP payloads.
MPTCP has been designed with several methods available to indicate a new security mechanism, including:
o available flags in MP_CAPABLE (Figure 4);
o available subtypes in the MPTCP option (Figure 3);
o the version field in MP_CAPABLE (Figure 4).
Multipath TCP was designed to be deployable in the present world. Its design takes into account "reasonable" existing middlebox behavior. In this section, we outline a few representative middlebox-related failure scenarios and show how Multipath TCP handles them. Next, we list the design decisions Multipath TCP has made to accommodate the different middleboxes.
A primary concern is our use of a new TCP option. Middleboxes should forward packets with unknown options unchanged, yet there are some that don't. These we expect will either strip options and pass the data, drop packets with new options, copy the same option into multiple segments (e.g., when doing segmentation), or drop options during segment coalescing.
MPTCP uses a single new TCP option "Kind", and all message types are defined by "subtype" values (see Section 8). This should reduce the chances of only some types of MPTCP options being passed; instead, the key characteristics that determine whether options get through are the path taken and the presence of the SYN flag.
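Because every MPTCP message shares one option Kind, a receiver distinguishes messages by the subtype nibble. This Python sketch walks a raw TCP options field and names the MPTCP subtypes it finds; it assumes Kind 30 and the subtype values 0x0 to 0x7 registered in Section 8, and the helper names are ours.

```python
from __future__ import annotations

MPTCP_KIND = 30

SUBTYPES = {
    0x0: "MP_CAPABLE", 0x1: "MP_JOIN", 0x2: "DSS", 0x3: "ADD_ADDR",
    0x4: "REMOVE_ADDR", 0x5: "MP_PRIO", 0x6: "MP_FAIL", 0x7: "MP_FASTCLOSE",
}


def iter_tcp_options(opts: bytes):
    """Yield (kind, raw option bytes) for each option in a TCP options field."""
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation (padding)
            i += 1
            continue
        if i + 1 >= len(opts):
            raise ValueError("truncated TCP option")
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            raise ValueError("malformed TCP option")
        yield kind, opts[i:i + length]
        i += length


def mptcp_subtypes(opts: bytes) -> list[str]:
    # All MPTCP messages share one Kind; the subtype is the high nibble of the third octet.
    return [SUBTYPES.get(raw[2] >> 4, "unknown")
            for kind, raw in iter_tcp_options(opts)
            if kind == MPTCP_KIND and len(raw) >= 3]
```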
MPTCP SYN packets on the first subflow of a connection contain the MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD fall back to regular TCP. If packets with the MP_JOIN option (Section 3.2) are dropped, the paths will simply not be used.
If a middlebox strips options but otherwise passes the packets unchanged, MPTCP will behave safely. If an MP_CAPABLE option is dropped on either the outgoing or the return path, the initiating host can fall back to regular TCP, as illustrated in Figure 16 and discussed in Section 3.1.
Subflow SYNs contain the MP_JOIN option. If this option is stripped on the outgoing path, the SYN will appear to be a regular SYN to Host B. Depending on whether there is a listening socket on the target port, Host B will reply either with SYN/ACK or RST (subflow connection fails). When Host A receives the SYN/ACK it sends a RST because the SYN/ACK does not contain the MP_JOIN option and its token. Either way, the subflow setup fails, but otherwise does not affect the MPTCP connection as a whole.
Host A                               Host B
  |            Middlebox M              |
  |                 |                   |
  | SYN(MP_CAPABLE) |        SYN        |
  |-----------------|------------------>|
  |               SYN/ACK               |
  |<------------------------------------|
a) MP_CAPABLE option stripped on outgoing path
Host A                               Host B
  |           SYN(MP_CAPABLE)           |
  |------------------------------------>|
  |            Middlebox M              |
  |                 |                   |
  |     SYN/ACK     |SYN/ACK(MP_CAPABLE)|
  |<----------------|-------------------|
b) MP_CAPABLE option stripped on return path
Figure 16: Connection Setup with Middleboxes that Strip Options from Packets
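The initiator's decisions in both panels of Figure 16, and in the stripped-MP_JOIN case, can be summarized in a few lines of Python. The `Mode` enum and both functions are illustrative names that take the set of MPTCP option names present in a packet; how the passive opener later notices case (b) is covered in Section 3.1 and is not modelled here.

```python
from __future__ import annotations

from enum import Enum, auto


class Mode(Enum):
    MPTCP = auto()
    REGULAR_TCP = auto()
    SEND_RST = auto()


def initiator_on_synack(is_join: bool, synack_options: set[str]) -> Mode:
    """Initiator's reaction to a SYN/ACK, given the MPTCP option names it carries."""
    if is_join:
        # A subflow SYN/ACK without MP_JOIN (and its HMAC) cannot belong to this connection.
        return Mode.MPTCP if "MP_JOIN" in synack_options else Mode.SEND_RST
    # First subflow: MP_CAPABLE missing from the SYN/ACK, whichever direction
    # stripped it, means the connection continues as regular TCP.
    return Mode.MPTCP if "MP_CAPABLE" in synack_options else Mode.REGULAR_TCP


def responder_on_syn(syn_options: set[str]) -> Mode:
    # If MP_CAPABLE was stripped on the outgoing path, the SYN is just a regular SYN.
    return Mode.MPTCP if "MP_CAPABLE" in syn_options else Mode.REGULAR_TCP
```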
We now examine data flow with MPTCP, assuming the flow is correctly set up, which implies the options in the SYN packets were allowed through by the relevant middleboxes. If options are allowed through and there is no resegmentation or coalescing of TCP segments, Multipath TCP flows can proceed without problems.
The case when options get stripped on data packets has been discussed in the Fallback section. If a fraction of options are stripped, behavior is not deterministic. If some data sequence mappings are lost, the connection can continue so long as mappings exist for the subflow-level data (e.g., if multiple maps have been sent that reinforce each other). If some subflow-level space is left unmapped, however, the subflow is treated as broken and is closed, through the process described in Section 3.6. MPTCP should survive with a loss of some Data ACKs, but performance will degrade as the fraction of stripped options increases. We do not expect such cases to appear in practice, though: most middleboxes will either strip all options or let them all through.
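Whether a partially stripped stream is still usable comes down to whether every byte of subflow-level data is covered by at least one mapping. This Python sketch (the function name is hypothetical) computes the uncovered parts of a subflow range from a possibly redundant set of mappings; a non-empty result, once no further mapping can arrive for that range, corresponds to the broken-subflow case.

```python
from __future__ import annotations


def unmapped_ranges(start: int, end: int,
                    mappings: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the parts of subflow range [start, end) not covered by any mapping.

    mappings: (relative subflow sequence number, length) pairs; duplicates and
    overlaps are allowed, as when several maps reinforce each other.
    """
    gaps, cursor = [], start
    for ssn, length in sorted(mappings):
        if ssn > cursor:
            gaps.append((cursor, min(ssn, end)))  # bytes no mapping has covered yet
        cursor = max(cursor, ssn + length)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps
```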
We end this section with a list of middlebox classes, their behavior, and the elements in the MPTCP design that allow operation through such middleboxes. Issues surrounding dropping packets with options or stripping options were discussed above, and are not included here:
o NATs [21] (Network Address (and Port) Translators) change the source address (and often source port) of packets. This means that a host will not know its public-facing address for signaling in MPTCP. Therefore, MPTCP permits implicit address addition via the MP_JOIN option, and the handshake mechanism ensures that connection attempts to private addresses [18] do not cause problems. Explicit address removal refers to an Address ID, so that no knowledge of the source address is required.
o Performance Enhancing Proxies (PEPs) [22] might proactively ACK data to increase performance. MPTCP, however, relies on accurate congestion control signals from the end host, and non-MPTCP-aware PEPs will not be able to provide such signals. MPTCP will, therefore, fall back to single-path TCP, or close the problematic subflow (see Section 3.6).
o Traffic Normalizers [23] may not allow holes in sequence numbers, and may cache packets and retransmit the same data. MPTCP looks like standard TCP on the wire, and will not retransmit different data on the same subflow sequence number. In the event of a retransmission, the same data will be retransmitted on the original TCP subflow even if it is additionally retransmitted at the connection level on a different subflow.
o Firewalls [24] might perform initial sequence number randomization on TCP connections. MPTCP uses relative sequence numbers in the data sequence mapping to cope with this. Like NATs, firewalls will often block incoming connections, so MPTCP supports address signaling (ADD_ADDR) so that a multiaddressed host can invite its peer behind the firewall/NAT to connect out to its additional interface.
o Intrusion Detection Systems look out for traffic patterns and content that could threaten a network. With multipath, such data may be spread across several subflows, making it more difficult for an IDS to analyze the whole traffic and potentially increasing the risk of false positives. However, an MPTCP-aware IDS can read the tokens to correlate multiple subflows and reassemble them for analysis.
o Application-level middleboxes such as content-aware firewalls may alter the payload within a subflow, such as rewriting URIs in HTTP traffic. MPTCP will detect these using the checksum and close the affected subflow(s), if there are other subflows that can be used. If all subflows are affected, multipath will fall back to TCP, allowing such middleboxes to change the payload. MPTCP-aware middleboxes should be able to adjust the payload and MPTCP metadata in order not to break the connection.
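The firewall entry above relies on relative subflow sequence numbers. A minimal sketch of why they survive initial sequence number randomization, assuming the firewall shifts every sequence number on a subflow by one constant offset (the offset value and the helper names are hypothetical):

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


def relative_ssn(seq: int, isn: int) -> int:
    """Offset of `seq` from the subflow's initial sequence number, modulo 2^32."""
    return (seq - isn) % SEQ_SPACE


def rewrite(seq: int, offset: int) -> int:
    """Model of a firewall that shifts every sequence number on a subflow by `offset`."""
    return (seq + offset) % SEQ_SPACE


sender_isn = 4_294_967_000                # close to the wrap point on purpose
seq = (sender_isn + 1_000) % SEQ_SPACE    # a byte 1000 positions after the ISN
offset = 123_456_789                      # hypothetical randomization applied by the firewall

sender_view = relative_ssn(seq, sender_isn)
receiver_view = relative_ssn(rewrite(seq, offset), rewrite(sender_isn, offset))
print(sender_view, receiver_view)         # 1000 1000
```

Because the ISN and each segment's sequence number move by the same amount, their difference modulo 2^32 is unchanged, so a mapping expressed in relative terms still refers to the same bytes.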
In addition, all classes of middleboxes may affect TCP traffic in the following ways:
o TCP options may be removed, or packets with unknown options dropped, by many classes of middleboxes. It is intended that the initial SYN exchange, with a TCP option, will be sufficient to identify the path capabilities. If such a packet does not get through, MPTCP will end up falling back to regular TCP.
o Segmentation/Coalescing (e.g., TCP segmentation offloading) might copy options between packets and might strip some options. MPTCP's data sequence mapping includes the relative subflow sequence number instead of using the sequence number in the segment. In this way, the mapping is independent of the packets that carry it.
o The receive window may be shrunk by some middleboxes at the subflow level. MPTCP will use the maximum window at the data level, but will also obey subflow-specific windows.
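The last entry can be illustrated with a sender-side sketch, assuming the connection-level window is the largest window advertised on any subflow while each subflow's own (possibly shrunk) window still bounds what is sent on that subflow. The function and parameter names are hypothetical, and sequence-number wrap-around is ignored for brevity.

```python
def sendable_on_subflow(
    data_ack: int,      # connection-level SND.UNA (latest DATA_ACK received)
    data_snd_nxt: int,  # connection-level SND.NXT
    data_wnd: int,      # connection-level window (assumed: maximum over subflows)
    sub_snd_una: int,   # subflow-level SND.UNA
    sub_snd_nxt: int,   # subflow-level SND.NXT
    sub_wnd: int,       # window seen on this subflow, possibly shrunk by a middlebox
) -> int:
    """Bytes that may be sent now on one subflow, obeying both windows."""
    data_room = data_ack + data_wnd - data_snd_nxt       # room at the data level
    subflow_room = sub_snd_una + sub_wnd - sub_snd_nxt   # room on this subflow
    return max(0, min(data_room, subflow_room))


# The data-level window is large, but a middlebox has shrunk this subflow's window.
print(sendable_on_subflow(1000, 3000, 65535, 500, 1500, 4096))  # 3096
```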
The authors were originally supported by Trilogy (http://www.trilogy-project.org), a research project (ICT-216372) partially funded by the European Community under its Seventh Framework Program.
Alan Ford was originally supported by Roke Manor Research.
The authors gratefully acknowledge significant input into this document from Sebastien Barre, Christoph Paasch, and Andrew McDonald.
The authors also wish to acknowledge reviews and contributions from Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Sean Turner, Stephen Farrell, and Martin Stiemerling.
This document defines a new TCP option for MPTCP, assigned a value of 30 (decimal) from the TCP option space. This value is the value of "Kind" as seen in all MPTCP options in this document. This value is defined as:
+------+--------+-----------------------+-----------+
| Kind | Length | Meaning               | Reference |
+------+--------+-----------------------+-----------+
| 30   | N      | Multipath TCP (MPTCP) | RFC 6824  |
+------+--------+-----------------------+-----------+
Table 1: TCP Option Kind Numbers
This document also defines a 4-bit subtype field, for which IANA has created and will maintain a new sub-registry entitled "MPTCP Option Subtypes" under the "Transmission Control Protocol (TCP) Parameters" registry. Initial values for the MPTCP option subtype registry are given below; future assignments are to be defined by Standards Action as defined by [25]. Assignments consist of the MPTCP subtype's symbolic name and its associated value, as per the following table.
+-------+--------------+----------------------------+---------------+
| Value | Symbol       | Name                       | Reference     |
+-------+--------------+----------------------------+---------------+
| 0x0   | MP_CAPABLE   | Multipath Capable          | Section 3.1   |
| 0x1   | MP_JOIN      | Join Connection            | Section 3.2   |
| 0x2   | DSS          | Data Sequence Signal (Data | Section 3.3   |
|       |              | ACK and data sequence      |               |
|       |              | mapping)                   |               |
| 0x3   | ADD_ADDR     | Add Address                | Section 3.4.1 |
| 0x4   | REMOVE_ADDR  | Remove Address             | Section 3.4.2 |
| 0x5   | MP_PRIO      | Change Subflow Priority    | Section 3.3.8 |
| 0x6   | MP_FAIL      | Fallback                   | Section 3.6   |
| 0x7   | MP_FASTCLOSE | Fast Close                 | Section 3.5   |
+-------+--------------+----------------------------+---------------+
Table 2: MPTCP Option Subtypes
Values 0x8 through 0xe are currently unassigned. The value 0xf is reserved for Private Use within controlled testbeds.
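As an illustration of how the Kind value and the subtype registry fit together, here is a sketch that walks a TCP options field and names the MPTCP subtype. It assumes, per the option layout in Section 3, that the subtype occupies the upper 4 bits of the octet following Kind and Length. The helper names and the sample bytes (including the all-zero placeholder key) are made up for the example.

```python
from typing import Iterator, Tuple

MPTCP_KIND = 30
SUBTYPES = {
    0x0: "MP_CAPABLE", 0x1: "MP_JOIN", 0x2: "DSS", 0x3: "ADD_ADDR",
    0x4: "REMOVE_ADDR", 0x5: "MP_PRIO", 0x6: "MP_FAIL", 0x7: "MP_FASTCLOSE",
}


def iter_options(opts: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, full option bytes) for each option in a TCP options field."""
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:  # End of Option List
            return
        if kind == 1:  # No-Operation (padding)
            i += 1
            continue
        if i + 1 >= len(opts):
            raise ValueError("truncated option")
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            raise ValueError("bad option length")
        yield kind, opts[i:i + length]
        i += length


def mptcp_subtype(option: bytes) -> str:
    """Name the subtype carried in the upper 4 bits of the third octet."""
    value = option[2] >> 4
    if value in SUBTYPES:
        return SUBTYPES[value]
    return "Private Use" if value == 0xF else "Unassigned"


timestamp = bytes([8, 10]) + bytes(8)
mp_capable = bytes([30, 12, 0x00, 0x81]) + bytes(8)  # subtype 0, flags A and H
options = bytes([1, 1]) + timestamp + mp_capable

for kind, opt in iter_options(options):
    if kind == MPTCP_KIND:
        print(mptcp_subtype(opt))  # MP_CAPABLE
```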
IANA has created another sub-registry, "MPTCP Handshake Algorithms" under the "Transmission Control Protocol (TCP) Parameters" registry, based on the flags in MP_CAPABLE (Section 3.1). The flags consist of 8 bits, labeled "A" through "H", and this document assigns the bits as follows:
+----------+-------------------+-----------------------+
| Flag Bit | Meaning           | Reference             |
+----------+-------------------+-----------------------+
| A        | Checksum required | RFC 6824, Section 3.1 |
| B        | Extensibility     | RFC 6824, Section 3.1 |
| C-G      | Unassigned        |                       |
| H        | HMAC-SHA1         | RFC 6824, Section 3.2 |
+----------+-------------------+-----------------------+
Table 3: MPTCP Handshake Algorithms
Note that the meanings of bits C through H can be dependent upon bit B, depending on how Extensibility is defined in future specifications; see Section 3.1 for more information.
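A sketch of decoding the MP_CAPABLE flags octet against Table 3, assuming that "A" is the most significant bit and "H" the least significant, as in Section 3.1. The function name is hypothetical, and the sketch does not attempt the bit-B-dependent interpretation of C through H mentioned in the note on extensibility.

```python
FLAG_NAMES = "ABCDEFGH"  # "A" is the most significant bit, "H" the least


def decode_flags(flags: int) -> dict:
    """Map each flag letter to whether it is set in the 8-bit flags octet."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one octet")
    return {name: bool(flags & (0x80 >> i)) for i, name in enumerate(FLAG_NAMES)}


flags = decode_flags(0x81)
checksum_required = flags["A"]                     # Table 3: checksum required
hmac_sha1 = flags["H"]                             # Table 3: HMAC-SHA1
unassigned_set = any(flags[c] for c in "CDEFG")    # Table 3: unassigned bits
print(checksum_required, hmac_sha1, unassigned_set)  # True True False
```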
Future assignments in this registry are also to be defined by Standards Action as defined by [25]. Assignments consist of the value of the flags, a symbolic name for the algorithm, and a reference to its specification.
[1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.
[2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar, "Architectural Guidelines for Multipath TCP Development", RFC 6182, March 2011.
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[4] National Institute of Standards and Technology, "Secure Hash Standard", Federal Information Processing Standard (FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/fips/fips180-3/fips180-3_final.pdf>.
[5] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion Control for Multipath Transport Protocols", RFC 6356, October 2011.
[6] Scharf, M. and A. Ford, "MPTCP Application Interface Considerations", Work in Progress, October 2012.
[7] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", RFC 2992, November 2000.
[8] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP", Usenix Symposium on Networked Systems Design and Implementation 2012, 2012, <https://www.usenix.org/conference/nsdi12/how-hard-can-it-be-designing-and-implementing-deployable-multipath-tcp>.
[9] Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6181, March 2011.
[10] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, February 1997.
[11] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.
[12] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.
[13] Gont, F., "Survey of Security Hardening Methods for Transmission Control Protocol (TCP) Implementations", Work in Progress, March 2012.
[14] Eastlake, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", BCP 106, RFC 4086, June 2005.
[15] Eastlake, D. and T. Hansen, "US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF)", RFC 6234, May 2011.
[16] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for High Performance", RFC 1323, May 1992.
[17] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.
[18] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. Lear, "Address Allocation for Private Internets", BCP 5, RFC 1918, February 1996.
[19] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.
[20] Ramaiah, A., "TCP option space extension", Work in Progress, March 2012.
[21] Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, January 2001.
[22] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. Shelby, "Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations", RFC 3135, June 2001.
[23] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics", Usenix Security 2001, 2001, <http://www.usenix.org/events/sec01/full_papers/handley/handley.pdf>.
[24] Freed, N., "Behavior of and Requirements for Internet Firewalls", RFC 2979, October 2000.
[25] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.
The TCP option space is limited due to the length of the Data Offset field in the TCP header (4 bits), which defines the TCP header length in 32-bit words. With the standard TCP header being 20 bytes, this leaves a maximum of 40 bytes for options, and many of these may already be used by options such as timestamp and SACK.
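The 40-byte limit follows directly from the width of the Data Offset field, as this short calculation shows:

```python
DATA_OFFSET_BITS = 4   # width of the Data Offset field
WORD = 4               # Data Offset counts 32-bit words
BASE_HEADER = 20       # fixed part of the TCP header, in bytes

max_header = (2 ** DATA_OFFSET_BITS - 1) * WORD   # 15 words = 60 bytes
max_options = max_header - BASE_HEADER            # room left for options
print(max_header, max_options)                    # 60 40
```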
We have performed a brief study on the commonly used TCP options in SYN, data, and pure ACK packets, and found that there is enough room to fit all the options we propose using in this document.
SYN packets typically include Maximum Segment Size (MSS) (4 bytes), window scale (3 bytes), SACK permitted (2 bytes), and timestamp (10 bytes) options. Together these sum to 19 bytes. Some operating systems appear to pad each option up to a word boundary, thus using 24 bytes (a brief survey suggests Windows XP and Mac OS X do this, whereas Linux does not). Optimistically, therefore, we have 21 bytes spare, or 16 if it has to be word-aligned. In either case, however, the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16 bytes) options will fit in this remaining space.
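The SYN arithmetic above can be checked with a short script. The per-option padding rule (round each option up to 4 bytes) is an assumption that models the word-aligned case the survey describes; the MPTCP sizes are the ones quoted in the text.

```python
OPTION_SPACE = 40


def pad4(n: int) -> int:
    """Round an option length up to a 32-bit word boundary."""
    return (n + 3) // 4 * 4


syn_options = {"MSS": 4, "window scale": 3, "SACK permitted": 2, "timestamp": 10}

packed = sum(syn_options.values())                    # 19
aligned = sum(pad4(n) for n in syn_options.values())  # 4 + 4 + 4 + 12 = 24
spare_packed = OPTION_SPACE - packed                  # 21
spare_aligned = OPTION_SPACE - aligned                # 16

# Worst case: the SYN-time MPTCP options must fit even when options are word-aligned.
for name, size in [("MP_CAPABLE", 12), ("MP_JOIN", 16)]:
    assert size <= spare_aligned, name
print(packed, aligned, spare_packed, spare_aligned)   # 19 24 21 16
```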
TCP data packets typically carry timestamp options in every packet, taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, if word-aligned). The Data Sequence Signal (DSS) option varies in length depending on whether the data sequence mapping and DATA_ACK are included, and whether the sequence numbers in use are 4 or 8 octets. The maximum size of the DSS option is 28 bytes, so even that will fit in the available space. But unless a connection is both bidirectional and high-bandwidth, it is unlikely that all that option space will be required on each DSS option.
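The DSS sizes quoted above can be reproduced with a small length calculator. This is a sketch assuming the DSS layout of Section 3.3 as we read it: a 4-byte header (Kind, Length, subtype, and flags), a 4- or 8-byte DATA_ACK, and a mapping made of a 4- or 8-byte data sequence number, a 4-byte relative subflow sequence number, a 2-byte data-level length, and a 2-byte checksum when checksums are in use. The function name is hypothetical.

```python
def dss_length(data_ack: bool, ack_8: bool, mapping: bool, dsn_8: bool, checksum: bool) -> int:
    """Length in bytes of a DSS option with the given parts present."""
    length = 4                       # Kind, Length, subtype, and flags
    if data_ack:
        length += 8 if ack_8 else 4  # DATA_ACK
    if mapping:
        length += 8 if dsn_8 else 4  # data sequence number
        length += 4                  # relative subflow sequence number
        length += 2                  # data-level length
        if checksum:
            length += 2              # DSS checksum, only when checksums are in use
    return length


TIMESTAMP_PADDED = 12
room = 40 - TIMESTAMP_PADDED                         # space left on a data packet
largest = dss_length(True, True, True, True, True)   # every part, 8-byte numbers
print(room, largest, largest <= room)                # 28 28 True
```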
Within the DSS option, it is not necessary to include the data sequence mapping and DATA_ACK in every packet, and in many cases it may be possible to include them in alternate packets (so long as the mapping covers the data being sent in the following packet). It would also be possible to alternate between 4- and 8-byte sequence numbers in each option.
On subflow and connection setup, an MPTCP option is also set on the third packet (an ACK). These are 20 bytes (for Multipath Capable) and 24 bytes (for Join), both of which will fit in the available option space.
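The third-ACK sizes can be broken down as follows, assuming the option layouts of Sections 3.1 and 3.2: a 4-byte common header, followed either by both 64-bit keys (MP_CAPABLE) or by a full 160-bit HMAC-SHA1 (MP_JOIN).

```python
HEADER = 4        # Kind, Length, subtype, and flag/reserved bits
KEY = 8           # 64-bit key
HMAC_SHA1 = 20    # full 160-bit HMAC-SHA1 output

mp_capable_ack = HEADER + 2 * KEY   # sender's key and receiver's key
mp_join_ack = HEADER + HMAC_SHA1    # sender's HMAC
print(mp_capable_ack, mp_join_ack)  # 20 24
```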
Pure ACKs in TCP typically contain only timestamps (10 bytes). Here, Multipath TCP typically needs to encode only the DATA_ACK (maximum of 12 bytes). Occasionally, ACKs will contain SACK information. Depending on the number of lost packets, SACK may utilize the entire option space. If a DATA_ACK had to be included, then it is probably necessary to reduce the number of SACK blocks to accommodate the DATA_ACK. However, the presence of the DATA_ACK is unlikely to be necessary in a case where SACK is in use, since until at least some of the SACK blocks have been retransmitted, the cumulative data-level ACK will not be moving forward (or if it does, due to retransmissions on another path, then that path can also be used to transmit the new DATA_ACK).
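To make the SACK trade-off concrete, this sketch counts how many SACK blocks fit beside a word-aligned timestamp option, with and without a 12-byte DATA_ACK. It assumes the SACK option costs 2 bytes plus 8 bytes per block, as defined in RFC 2018 [11], and ignores any NOP padding; the function name is hypothetical.

```python
OPTION_SPACE = 40
TIMESTAMP = 12     # timestamp option padded to a word boundary
DATA_ACK_MAX = 12  # DSS carrying only an 8-byte DATA_ACK


def max_sack_blocks(space: int) -> int:
    """SACK blocks fitting in `space` bytes: 2-byte header plus 8 bytes per block."""
    return max(0, (space - 2) // 8)


without_dack = max_sack_blocks(OPTION_SPACE - TIMESTAMP)              # 28 bytes free
with_dack = max_sack_blocks(OPTION_SPACE - TIMESTAMP - DATA_ACK_MAX)  # 16 bytes free
print(without_dack, with_dack)                                       # 3 1
```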
The ADD_ADDR option can be between 8 and 22 bytes, depending on whether IPv4 or IPv6 is used, and whether or not the port number is present. It is unlikely that such signaling would fit in a data packet (although if there is space, it is fine to include it). It is recommended to use duplicate ACKs with no other payload or options in order to transmit these rare signals. Note this is the reason for mandating that duplicate ACKs with MPTCP options are not taken as a signal of congestion.
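The 8 to 22 byte range for ADD_ADDR can be reproduced from a 4-byte header (Kind, Length, subtype and IP version, Address ID), a 4- or 16-byte address, and an optional 2-byte port, assuming the layout of Section 3.4.1; the function name below is hypothetical.

```python
def add_addr_length(ipv6: bool, with_port: bool) -> int:
    """Bytes in an ADD_ADDR option for the given address family and port presence."""
    length = 4                        # Kind, Length, subtype/IP version, Address ID
    length += 16 if ipv6 else 4       # the advertised address
    length += 2 if with_port else 0   # optional port
    return length


sizes = {(v6, port): add_addr_length(v6, port) for v6 in (False, True) for port in (False, True)}
print(min(sizes.values()), max(sizes.values()))  # 8 22
```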
Finally, there are issues with reliable delivery of options. Because options can also be sent on pure ACKs, they are not reliably delivered. This is not an issue for DATA_ACK due to its cumulative nature, but may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is recommended to send these options redundantly (whether on multiple paths or on the same path on a number of ACKs, but interspersed with data in order to avoid interpretation as congestion). The cases where options are stripped by middleboxes are discussed in Section 6.
Conceptually, an MPTCP connection can be represented as an MPTCP control block that contains several variables that track the progress and the state of the MPTCP connection and a set of linked TCP control blocks that correspond to the subflows that have been established.
RFC 793 [1] specifies several state variables. Whenever possible, we reuse the same terminology as RFC 793 to describe the state variables that are maintained by MPTCP.
The MPTCP control block contains the following variables per connection.
Local.Token (32 bits): This is the token chosen by the local host on this MPTCP connection. The token is generated from the local key and MUST be unique among all established MPTCP connections.
Local.Key (64 bits): This is the key sent by the local host on this MPTCP connection.
Remote.Token (32 bits): This is the token chosen by the remote host on this MPTCP connection, generated from the remote key.
Remote.Key (64 bits): This is the key chosen by the remote host on this MPTCP connection.
MPTCP.Checksum (flag): This flag is set to true if at least one of the hosts has set the A bit ("Checksum required") in the MP_CAPABLE options exchanged during connection establishment, and is set to false otherwise. If this flag is set, the checksum must be computed in all DSS options.
SND.UNA (64 bits): This is the data sequence number of the next byte to be acknowledged, at the MPTCP connection level. This variable is updated upon reception of a DSS option containing a DATA_ACK.
SND.NXT (64 bits): This is the data sequence number of the next byte to be sent. SND.NXT is used to determine the value of the DSN in the DSS option.
SND.WND (32 bits with RFC 1323, 16 bits otherwise): This is the sending window. MPTCP maintains the sending window at the MPTCP connection level and the same window is shared by all subflows. All subflows use the MPTCP connection-level SND.WND to compute the SEG.WND value that is sent in each transmitted segment.
RCV.NXT (64 bits): This is the data sequence number of the next byte that is expected on the MPTCP connection. This state variable is modified upon reception of in-order data. The value of RCV.NXT is used to specify the DATA_ACK that is sent in the DSS option on all subflows.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the connection-level receive window, which is the maximum of the RCV.WND on all the subflows.
The MPTCP control block also contains a list of the TCP control blocks that are associated with the MPTCP connection.
Note that the TCP control block on the TCP subflows does not contain the RCV.WND and SND.WND state variables as these are maintained at the MPTCP connection level and not at the subflow level.
Inside each TCP control block, the following state variables are defined.
SND.UNA (32 bits): This is the sequence number of the next byte to be acknowledged on the subflow. This variable is updated upon reception of each TCP acknowledgment on the subflow.
SND.NXT (32 bits): This is the sequence number of the next byte to be sent on the subflow. SND.NXT is used to set the value of SEG.SEQ upon transmission of the next segment.
RCV.NXT (32 bits): This is the sequence number of the next byte that is expected on the subflow. This state variable is modified upon reception of in-order segments. The value of RCV.NXT is copied to the SEG.ACK field of the next segments transmitted on the subflow.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the subflow-level receive window that is updated with the window field from the segments received on this subflow.
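The variables listed above map naturally onto a pair of record types. This Python sketch is illustrative only: the class, field, and method names are hypothetical, sequence-number wrap-around is ignored, and the connection-level RCV.WND is derived as the maximum of the subflow windows, following the RCV.WND definitions given above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubflowTCB:
    snd_una: int  # 32-bit subflow sequence numbers
    snd_nxt: int
    rcv_nxt: int
    rcv_wnd: int  # subflow-level receive window


@dataclass
class MptcpCB:
    local_token: int   # 32 bits
    local_key: int     # 64 bits
    remote_token: int  # 32 bits
    remote_key: int    # 64 bits
    checksum: bool     # MPTCP.Checksum
    snd_una: int = 0   # 64-bit data sequence numbers
    snd_nxt: int = 0
    snd_wnd: int = 0
    rcv_nxt: int = 0
    subflows: List[SubflowTCB] = field(default_factory=list)

    @property
    def rcv_wnd(self) -> int:
        """Connection-level RCV.WND: the maximum RCV.WND over all subflows."""
        return max((s.rcv_wnd for s in self.subflows), default=0)

    def on_data_ack(self, data_ack: int) -> None:
        """Advance SND.UNA when a DSS option carries a newer, valid DATA_ACK."""
        if self.snd_una < data_ack <= self.snd_nxt:
            self.snd_una = data_ack


cb = MptcpCB(local_token=0x1234, local_key=1, remote_token=0x5678,
             remote_key=2, checksum=True, snd_nxt=5000)
cb.subflows += [SubflowTCB(0, 0, 0, 4096), SubflowTCB(0, 0, 0, 65535)]
cb.on_data_ack(3000)
print(cb.rcv_wnd, cb.snd_una)  # 65535 3000
```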
The diagram in Figure 17 shows the Finite State Machine for connection-level closure. This illustrates how the DATA_FIN connection-level signal (indicated as the DFIN flag on a DATA_ACK) interacts with subflow-level FINs, and permits "break-before-make" handover between subflows.
                              +---------+
                              | M_ESTAB |
                              +---------+
                     M_CLOSE    |     |    rcv DATA_FIN
                      -------   |     |    -------
 +---------+      snd DATA_FIN /       \ snd DATA_ACK[DFIN]  +---------+
 |  M_FIN  |<-----------------           ------------------->| M_CLOSE |
 | WAIT-1  |---------------------------                      |  WAIT   |
 +---------+              rcv DATA_FIN \                     +---------+
   | rcv DATA_ACK[DFIN]     -------     |                 M_CLOSE |
   | --------------       snd DATA_ACK  |                 ------- |
   | CLOSE all subflows                 |            snd DATA_FIN |
   V                                    V                         V
 +-----------+                    +-----------+             +-----------+
 |M_FINWAIT-2|                    | M_CLOSING |             | M_LAST-ACK|
 +-----------+                    +-----------+             +-----------+
   |                 rcv DATA_ACK[DFIN] |      rcv DATA_ACK[DFIN] |
   | rcv DATA_FIN        -------------- |          -------------- |
   |  -------        CLOSE all subflows |      CLOSE all subflows |
   | snd DATA_ACK[DFIN]                 V        delete MPTCP PCB V
   \                              +-----------+              +---------+
    ----------------------------->|M_TIME WAIT|------------->| M_CLOSED|
                                  +-----------+              +---------+
                                           All subflows in CLOSED
                                           ------------
                                           delete MPTCP PCB
Figure 17: Finite State Machine for Connection Closure
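Figure 17 can be transcribed into a transition table. In the sketch below, the state spellings (underscores in place of the figure's spaces and hyphens) and the event names are hypothetical, and it models only the transitions drawn in the figure, not timers or subflow-level FIN handling.

```python
from typing import Dict, List, Tuple

# (state, event) -> (next state, actions), transcribed from Figure 17
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, List[str]]] = {
    ("M_ESTAB", "M_CLOSE"): ("M_FIN_WAIT_1", ["snd DATA_FIN"]),
    ("M_ESTAB", "rcv DATA_FIN"): ("M_CLOSE_WAIT", ["snd DATA_ACK[DFIN]"]),
    ("M_FIN_WAIT_1", "rcv DATA_ACK[DFIN]"): ("M_FIN_WAIT_2", ["CLOSE all subflows"]),
    ("M_FIN_WAIT_1", "rcv DATA_FIN"): ("M_CLOSING", ["snd DATA_ACK"]),
    ("M_CLOSE_WAIT", "M_CLOSE"): ("M_LAST_ACK", ["snd DATA_FIN"]),
    ("M_FIN_WAIT_2", "rcv DATA_FIN"): ("M_TIME_WAIT", ["snd DATA_ACK[DFIN]"]),
    ("M_CLOSING", "rcv DATA_ACK[DFIN]"): ("M_TIME_WAIT", ["CLOSE all subflows"]),
    ("M_LAST_ACK", "rcv DATA_ACK[DFIN]"): ("M_CLOSED", ["CLOSE all subflows", "delete MPTCP PCB"]),
    ("M_TIME_WAIT", "all subflows CLOSED"): ("M_CLOSED", ["delete MPTCP PCB"]),
}


def run(events: List[str], state: str = "M_ESTAB") -> Tuple[str, List[str]]:
    """Apply events in order; return the final state and the actions taken."""
    actions: List[str] = []
    for event in events:
        try:
            state, taken = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not expected in state {state}") from None
        actions.extend(taken)
    return state, actions


# Active close: this host sends DATA_FIN first.
final, actions_taken = run(["M_CLOSE", "rcv DATA_ACK[DFIN]", "rcv DATA_FIN", "all subflows CLOSED"])
print(final)          # M_CLOSED
print(actions_taken)  # ['snd DATA_FIN', 'CLOSE all subflows', 'snd DATA_ACK[DFIN]', 'delete MPTCP PCB']
```

An event that the figure does not draw for the current state raises `ValueError` rather than being silently ignored, which keeps the transcription easy to check against the diagram.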
Authors' Addresses
Alan Ford
Cisco
Ruscombe Business Park
Ruscombe, Berkshire RG10 9NN
UK
EMail: alanford@cisco.com
Costin Raiciu
University Politehnica of Bucharest
Splaiul Independentei 313
Bucharest
Romania
EMail: costin.raiciu@cs.pub.ro
Mark Handley
University College London
Gower Street
London WC1E 6BT
UK
EMail: m.handley@cs.ucl.ac.uk
Olivier Bonaventure
Universite catholique de Louvain
Pl. Ste Barbe, 2
Louvain-la-Neuve 1348
Belgium
EMail: olivier.bonaventure@uclouvain.be