Network Working Group                                          P. Culley
Request for Comments: 5044                       Hewlett-Packard Company
Category: Standards Track                                       U. Elzur
                                                    Broadcom Corporation
                                                                R. Recio
                                                         IBM Corporation
                                                               S. Bailey
                                                   Sandburst Corporation
                                                              J. Carrier
                                                               Cray Inc.
                                                            October 2007
        

Marker PDU Aligned Framing for TCP Specification

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Abstract

Marker PDU Aligned Framing (MPA) is designed to work as an "adaptation layer" between TCP and the Direct Data Placement protocol (DDP) as described in RFC 5041. It preserves the reliable, in-order delivery of TCP, while adding the preservation of higher-level protocol record boundaries that DDP requires. MPA is fully compliant with applicable TCP RFCs and can be utilized with existing TCP implementations. MPA also supports integrated implementations that combine TCP, MPA and DDP to reduce buffering requirements in the implementation and improve performance at the system level.

Table of Contents

   1. Introduction
      1.1. Motivation
      1.2. Protocol Overview
   2. Glossary
   3. MPA's Interactions with DDP
   4. MPA Full Operation Phase
      4.1. FPDU Format
      4.2. Marker Format
      4.3. MPA Markers
      4.4. CRC Calculation
      4.5. FPDU Size Considerations
   5. MPA's interactions with TCP
      5.1. MPA transmitters with a standard layered TCP
      5.2. MPA receivers with a standard layered TCP
   6. MPA Receiver FPDU Identification
   7. Connection Semantics
      7.1. Connection Setup
           7.1.1. MPA Request and Reply Frame Format
           7.1.2. Connection Startup Rules
           7.1.3. Example Delayed Startup Sequence
           7.1.4. Use of Private Data
                  7.1.4.1. Motivation
                  7.1.4.2. Example Immediate Startup Using Private Data
           7.1.5. "Dual Stack" Implementations
      7.2. Normal Connection Teardown
   8. Error Semantics
   9. Security Considerations
      9.1. Protocol-Specific Security Considerations
           9.1.1. Spoofing
                  9.1.1.1. Impersonation
                  9.1.1.2. Stream Hijacking
                  9.1.1.3. Man-in-the-Middle Attack
           9.1.2. Eavesdropping
      9.2. Introduction to Security Options
      9.3. Using IPsec with MPA
      9.4. Requirements for IPsec Encapsulation of MPA/DDP
   10. IANA Considerations
   Appendix A. Optimized MPA-Aware TCP Implementations
      A.1. Optimized MPA/TCP Transmitters
      A.2. Effects of Optimized MPA/TCP Segmentation
      A.3. Optimized MPA/TCP Receivers
      A.4. Re-segmenting Middleboxes and Non-Optimized MPA/TCP Senders
      A.5. Receiver Implementation
           A.5.1. Network Layer Reassembly Buffers
           A.5.2. TCP Reassembly Buffers
   Appendix B. Analysis of MPA over TCP Operations
      B.1. Assumptions
           B.1.1. MPA Is Layered beneath DDP
           B.1.2. MPA Preserves DDP Message Framing
           B.1.3. The Size of the ULPDU Passed to MPA Is Less Than
                  EMSS Under Normal Conditions
           B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery
      B.2. The Value of FPDU Alignment
           B.2.1. Impact of Lack of FPDU Alignment on the Receiver
                  Computational Load and Complexity
           B.2.2. FPDU Alignment Effects on TCP Wire Protocol
   Appendix C. IETF Implementation Interoperability with RDMA
               Consortium Protocols
      C.1. Negotiated Parameters
      C.2. RDMAC RNIC and Non-Permissive IETF RNIC
           C.2.1. RDMAC RNIC Initiator
           C.2.2. Non-Permissive IETF RNIC Initiator
           C.2.3. RDMAC RNIC and Permissive IETF RNIC
           C.2.4. RDMAC RNIC Initiator
           C.2.5. Permissive IETF RNIC Initiator
      C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC
   Normative References
   Informative References
   Contributors

Table of Figures

   Figure 1: ULP MPA TCP Layering
   Figure 2: FPDU Format
   Figure 3: Marker Format
   Figure 4: Example FPDU Format with Marker
   Figure 5: Annotated Hex Dump of an FPDU
   Figure 6: Annotated Hex Dump of an FPDU with Marker
   Figure 7: Fully Layered Implementation
   Figure 8: MPA Request/Reply Frame
   Figure 9: Example Delayed Startup Negotiation
   Figure 10: Example Immediate Startup Negotiation
   Figure 11: Optimized MPA/TCP Implementation
   Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream
   Figure 13: Aligned FPDU Placed Immediately after TCP Header
   Figure 14: Connection Parameters for the RNIC Types
   Figure 15: MPA Negotiation between an RDMAC RNIC and a
              Non-Permissive IETF RNIC
   Figure 16: MPA Negotiation between an RDMAC RNIC and a Permissive
              IETF RNIC
   Figure 17: MPA Negotiation between a Non-Permissive IETF RNIC and
              a Permissive IETF RNIC
        
1. Introduction

This section discusses the reason for creating MPA on TCP and a general overview of the protocol.

1.1. Motivation

The Direct Data Placement protocol [DDP], when used with TCP [RFC793], requires a mechanism to detect record boundaries. This document refers to DDP records as Upper Layer Protocol Data Units (ULPDUs). The ability to locate the ULPDU boundary is useful to a hardware network adapter that uses DDP to place data directly in the application buffer based on the control information carried in the ULPDU header. This may be done without requiring that the packets arrive in order. Potential benefits of this capability are the avoidance of memory copy overhead and a smaller memory requirement for handling out-of-order or dropped packets.

Many approaches have been proposed for a generalized framing mechanism. Some are probabilistic in nature and others are deterministic. An example probabilistic approach is characterized by a detectable value embedded in the octet stream, with no method of preventing that value from appearing elsewhere within user data. It is probabilistic because under some conditions the receiver may incorrectly interpret application data as the detectable value. Under these conditions, the protocol may fail with unacceptable frequency. One deterministic approach is characterized by embedded controls at known locations in the octet stream. Because the receiver can guarantee it will only examine the data stream at locations that are known to contain the embedded control, the protocol can never misinterpret application data as being embedded control data. For unambiguous handling of an out-of-order packet, a deterministic approach is preferred.

The MPA protocol provides a framing mechanism for DDP running over TCP using the deterministic approach. It allows the location of the ULPDU to be determined in the TCP stream even if the TCP segments arrive out of order.
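The difference between the two approaches can be illustrated with a small Python sketch. It is not MPA: the two-octet SYNC value, the 8-octet record layout, and the function names are hypothetical. A probabilistic scanner reports every occurrence of SYNC, including one inside user data, while a deterministic reader only looks at offsets it knows hold control fields.

```python
# Illustrative only: SYNC is a hypothetical "detectable value", not part of MPA.
SYNC = b"\xA5\x5A"


def probabilistic_boundaries(stream: bytes) -> list[int]:
    """Scan for SYNC anywhere in the stream; user data containing SYNC yields false hits."""
    hits, i = [], stream.find(SYNC)
    while i != -1:
        hits.append(i)
        i = stream.find(SYNC, i + 1)
    return hits


def deterministic_controls(stream: bytes, interval: int, width: int) -> list[bytes]:
    """Read control fields only at the known offsets 0, interval, 2*interval, ..."""
    return [stream[off:off + width] for off in range(0, len(stream), interval)]


# Two 8-octet records: a 2-octet control field followed by 6 octets of user data.
# The second record's user data happens to contain the SYNC pattern.
stream = SYNC + b"hello!" + SYNC + b"ab\xA5\x5Acd"

print(probabilistic_boundaries(stream))      # [0, 8, 12]; offset 12 is user data
print(deterministic_controls(stream, 8, 2))  # only offsets 0 and 8 are examined
```

The scanner misreads offset 12 as a boundary; the deterministic reader cannot, because it never inspects that offset.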

1.2. Protocol Overview

The layering of PDUs with MPA is shown in Figure 1, below.

               +------------------+
               |     ULP client   |
               +------------------+  <- Consumer messages
               |        DDP       |
               +------------------+  <- ULPDUs
               |        MPA*      |
               +------------------+  <- FPDUs (containing ULPDUs)
               |        TCP*      |
               +------------------+  <- TCP Segments (containing FPDUs)
               |      IP etc.     |
               +------------------+
                * These may be fully layered or optimized together.
        

Figure 1: ULP MPA TCP Layering

MPA is described as an extra layer above TCP and below DDP. The operation sequence is:

1. A TCP connection is established by ULP action. This is done using methods not described by this specification. The ULP may exchange some amount of data in streaming mode prior to starting MPA, but is not required to do so.

2. The Consumer negotiates the use of DDP and MPA at both ends of a connection. The mechanisms to do this are not described in this specification. The negotiation may be done in streaming mode, or by some other mechanism (such as a pre-arranged port number).

3. The ULP activates MPA on each end in the Startup Phase, either as an Initiator or a Responder, as determined by the ULP. This phase verifies the usage of MPA, specifies the use of CRC and Markers, and allows the ULP to communicate some additional data via a Private Data exchange. See Section 7.1, Connection Setup, for more details on the startup process.

4. At the end of the Startup Phase, the ULP puts MPA (and DDP) into Full Operation and begins sending DDP data as further described below. In this document, DDP data chunks are called ULPDUs. For a description of the DDP data, see [DDP].
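The ordering of the phases in steps 1 through 4 can be modeled as a small state machine. This is a hypothetical sketch: the MpaEndpoint class, its method names, and the Phase names are illustrative, negotiation (step 2) is assumed to have happened already, and the actual Request/Reply frame exchange defined in Section 7.1 is not modeled.

```python
from enum import Enum, auto


class Phase(Enum):
    STREAMING = auto()       # optional plain TCP streaming before MPA starts (step 1)
    STARTUP = auto()         # Startup Phase: MPA use, CRC and Markers settled (step 3)
    FULL_OPERATION = auto()  # FPDUs carrying ULPDUs are exchanged (step 4)


class MpaEndpoint:
    """Hypothetical model that only enforces the phase ordering described above."""

    def __init__(self, role: str):
        if role not in ("initiator", "responder"):
            raise ValueError(f"unknown role: {role}")
        self.role = role
        self.phase = Phase.STREAMING
        self.use_crc = None
        self.use_markers = None

    def startup_frame_sent(self) -> str:
        # Per the Glossary: the Initiator sends the MPA Request Frame and the
        # Responder answers with the MPA Reply Frame.
        return "MPA Request Frame" if self.role == "initiator" else "MPA Reply Frame"

    def start_mpa(self, use_crc: bool, use_markers: bool) -> None:
        # Simplification: the CRC/Marker choices are set locally rather than
        # carried in Request/Reply frames.
        if self.phase is not Phase.STREAMING:
            raise RuntimeError("MPA already started")
        self.use_crc, self.use_markers = use_crc, use_markers
        self.phase = Phase.STARTUP

    def enter_full_operation(self) -> None:
        if self.phase is not Phase.STARTUP:
            raise RuntimeError("Full Operation requires a completed Startup Phase")
        self.phase = Phase.FULL_OPERATION
```

An endpoint that tries to enter Full Operation without passing through the Startup Phase is rejected, which mirrors the requirement that MPA settle CRC and Marker usage before any FPDU is sent.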

Following is a description of data transfer when MPA is in Full Operation.

1. DDP determines the Maximum ULPDU (MULPDU) size by querying MPA for this value. MPA derives this information from TCP or IP, when it is available, or chooses a reasonable value.

2. DDP creates ULPDUs of MULPDU size or smaller, and hands them to MPA at the sender.

3. MPA creates a Framed Protocol Data Unit (FPDU) by prepending a header, optionally inserting Markers, and appending a CRC field after the ULPDU and PAD (if any). MPA delivers the FPDU to TCP.

4. The TCP sender puts the FPDUs into the TCP stream. If the sender is optimized MPA/TCP, it segments the TCP stream in such a way that a TCP Segment boundary is also the boundary of an FPDU. TCP then passes each segment to the IP layer for transmission.

5. The receiver may or may not be optimized. If it is optimized MPA/TCP, it may separate passing the TCP payload to MPA from passing the TCP payload ordering information to MPA. In either case, RFC-compliant TCP wire behavior is observed at both the sender and receiver.

6. The MPA receiver locates and assembles complete FPDUs within the stream, verifies their integrity, and removes MPA Markers (when present), ULPDU_Length, PAD, and the CRC field.

7. MPA then provides the complete ULPDUs to DDP. MPA may also separate passing MPA payload to DDP from passing the MPA payload ordering information.
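Sender step 3 and receiver step 6 can be sketched in Python for the case without Markers. This sketch assumes a 2-octet big-endian ULPDU_Length header, zero PAD octets so that header, ULPDU, and PAD together are a multiple of 4 octets, and a CRC32c (Castagnoli) value over those octets appended as 4 big-endian octets. Section 4.1 (FPDU Format) and Section 4.4 (CRC Calculation) are normative; this sketch may differ from them in detail, for example in CRC bit and byte ordering. The function names are hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_fpdu(ulpdu: bytes) -> bytes:
    """Sender step 3 (no Markers): length header, ULPDU, PAD, then CRC."""
    if len(ulpdu) > 0xFFFF:
        raise ValueError("ULPDU too large for a 2-octet length field")
    body = struct.pack("!H", len(ulpdu)) + ulpdu
    body += b"\x00" * (-len(body) % 4)  # PAD: zero octets, none if already aligned
    return body + struct.pack("!I", crc32c(body))


def parse_fpdu(stream: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Receiver step 6: verify the FPDU at `offset`; return (ULPDU, next offset)."""
    (length,) = struct.unpack_from("!H", stream, offset)
    padded = 2 + length + (-(2 + length) % 4)  # header + ULPDU + PAD
    end = offset + padded + 4                  # plus the 4-octet CRC
    if end > len(stream):
        raise ValueError("incomplete FPDU")
    (crc,) = struct.unpack_from("!I", stream, offset + padded)
    if crc != crc32c(stream[offset:offset + padded]):
        raise ValueError("CRC mismatch")
    return stream[offset + 2:offset + 2 + length], end
```

For example, build_fpdu(b"hello") yields 12 octets (2 + 5 + 1 PAD + 4 CRC), and the receiver walks a stream of concatenated FPDUs by feeding each returned offset into the next parse_fpdu call.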

A fully layered MPA on TCP is implemented as a data stream ULP for TCP and is therefore RFC compliant.

An optimized DDP/MPA/TCP uses a TCP layer that potentially contains some additional behaviors as suggested in this document. When DDP/MPA/TCP are cross-layer optimized, the behavior of TCP (especially sender segmentation) may change from that of the un-optimized implementation, but the changes are within the bounds permitted by the TCP RFC specifications, and will interoperate with an un-optimized TCP. The additional behaviors are described in Appendix A and are not normative; they are described at a TCP interface layer as a convenience. Implementations may achieve the described functionality using any method, including cross-layer optimizations between TCP, MPA, and DDP.

An optimized DDP/MPA/TCP sender is able to segment the data stream such that TCP segments begin with FPDUs (FPDU Alignment). This has significant advantages for receivers. When segments arrive with aligned FPDUs, the receiver usually need not buffer any portion of the segment, allowing DDP to place it in its destination memory immediately, thus avoiding copies from intermediate buffers (DDP's reason for existence).
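A sender that preserves FPDU Alignment can be sketched as greedy packing of whole FPDUs into segments no larger than EMSS. This is one possible policy, not the method described in Appendix A. The sketch assumes the FPDU layout of a 2-octet length header, PAD to a multiple of 4, and a 4-octet CRC; toy_fpdu and the other names are hypothetical, and the toy FPDUs carry a dummy zero CRC.

```python
def toy_fpdu(ulpdu_len: int) -> bytes:
    """Hypothetical FPDU: length header, zero-filled ULPDU and PAD, dummy zero CRC."""
    pad = -(2 + ulpdu_len) % 4
    return ulpdu_len.to_bytes(2, "big") + bytes(ulpdu_len + pad + 4)


def segment_aligned(fpdus: list[bytes], emss: int) -> list[bytes]:
    """Pack whole FPDUs into segments of at most `emss` octets, so each segment
    starts with an FPDU header and holds an integer number of FPDUs."""
    segments, current = [], b""
    for fpdu in fpdus:
        if len(fpdu) > emss:
            raise ValueError("FPDU larger than EMSS cannot be aligned")
        if current and len(current) + len(fpdu) > emss:
            segments.append(current)
            current = b""
        current += fpdu
    if current:
        segments.append(current)
    return segments


def is_fpdu_aligned(segment: bytes) -> bool:
    """Walk FPDUs by their length headers; aligned if the walk ends exactly at the end."""
    offset = 0
    while offset < len(segment):
        if offset + 2 > len(segment):
            return False
        length = int.from_bytes(segment[offset:offset + 2], "big")
        offset += 2 + length + (-(2 + length) % 4) + 4
    return offset == len(segment)


fpdus = [toy_fpdu(n) for n in (10, 20, 30)]   # 16, 28 and 36 octets
segments = segment_aligned(fpdus, emss=50)     # segments of 44 and 36 octets
```

Cutting the same 80-octet stream blindly at 50 octets would split the third FPDU across two segments; the aligned packer instead sends a shorter first segment so that each segment can be processed on arrival.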

An optimized DDP/MPA/TCP receiver allows a DDP on MPA implementation to locate the start of ULPDUs that may be received out of order. It also allows the implementation to determine if the entire ULPDU has been received. As a result, MPA can pass out-of-order ULPDUs to DDP for immediate use. This enables a DDP on MPA implementation to save a significant amount of intermediate storage by placing the ULPDUs in the right locations in the application buffers when they arrive, rather than waiting until full ordering can be restored.
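Because Markers sit at fixed intervals in the octet stream (every 512 octets, per the Glossary), a receiver can compute where Markers fall inside a segment from the segment's stream position alone, regardless of arrival order. This sketch assumes positions are counted in stream octets from the start of Full Operation (offset 0); Section 4.3 gives the normative placement rule. The function name is hypothetical.

```python
MARKER_INTERVAL = 512  # octets between Markers, per the Glossary
MARKER_LEN = 4         # a Marker is a four-octet field


def marker_offsets(seg_start: int, seg_len: int) -> list[int]:
    """Offsets, relative to the segment start, of Markers that begin inside a
    segment covering stream octets [seg_start, seg_start + seg_len).

    Assumes seg_start >= 0. A Marker that begins in the last few octets of the
    previous segment may spill into this one; that case is not reported here.
    """
    first = -(-seg_start // MARKER_INTERVAL) * MARKER_INTERVAL  # round up
    return [pos - seg_start
            for pos in range(first, seg_start + seg_len, MARKER_INTERVAL)]


# A 1460-octet segment starting at stream offset 1000 contains Markers that
# begin at stream offsets 1024, 1536 and 2048.
print(marker_offsets(1000, 1460))
```

The receiver can then read each Marker's FPDU pointer to find an FPDU header in the segment without having seen any earlier segment.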

The ability of a receiver to recover out-of-order ULPDUs is optional and declared to the transmitter during startup. When the receiver declares that it does not support out-of-order recovery, the transmitter does not add the control information to the data stream needed for out-of-order recovery.

If the receiver is fully layered, then MPA receives a strictly ordered stream of data and does not deal with out-of-order ULPDUs. In this case, MPA passes each ULPDU to DDP when the last bytes arrive from TCP, along with the indication that they are in order.

MPA implementations that support recovery of out-of-order ULPDUs MUST support a mechanism to indicate the ordering of ULPDUs as the sender transmitted them and indicate when missing intermediate segments arrive. These mechanisms allow DDP to reestablish record ordering and report Delivery of complete messages (groups of records).
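The distinction between passing ULPDUs to DDP out of order and Delivering them in order can be sketched with a min-heap keyed by stream position. This is a hypothetical illustration: the DeliveryTracker class is not defined by MPA, ULPDUs are identified by their starting stream offset and length, and duplicates and overlapping retransmissions are ignored.

```python
import heapq


class DeliveryTracker:
    """Hypothetical bookkeeping: ULPDUs may be passed to DDP (placed) in any
    order, but are Delivered strictly in TCP stream order."""

    def __init__(self, next_seq: int = 0):
        self.next_seq = next_seq  # first stream octet not yet Delivered
        self.pending: list[tuple[int, int, str]] = []  # (start, length, name)

    def arrive(self, start: int, length: int, name: str) -> list[str]:
        """Record a ULPDU that was just passed to DDP; return the ULPDUs that
        can now be Delivered, in order."""
        heapq.heappush(self.pending, (start, length, name))
        delivered = []
        while self.pending and self.pending[0][0] == self.next_seq:
            _, length, name = heapq.heappop(self.pending)
            self.next_seq += length
            delivered.append(name)
        return delivered


tracker = DeliveryTracker()
tracker.arrive(100, 50, "B")  # placed early, but nothing can be Delivered yet
tracker.arrive(150, 30, "C")
tracker.arrive(0, 100, "A")   # the missing ULPDU arrives; A, B and C are Delivered
```

B and C can be placed in application memory as soon as they arrive; only their Delivery waits for A.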

MPA also addresses enhanced data integrity. Some users of TCP have noted that the TCP checksum is not as strong as could be desired (see [CRCTCP]). Studies such as [CRCTCP] have shown that the TCP checksum indicates segments in error at a much higher rate than the underlying link characteristics would indicate. With these higher error rates, the chance that an error will escape detection, when using only the TCP checksum for data integrity, becomes a concern. A stronger integrity check can reduce the chance of data errors being missed.
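One well-known weakness of the TCP checksum is that a ones' complement sum of 16-bit words does not depend on word order, so swapping two words goes undetected. A CRC depends on the position of each bit, and a 32-bit CRC detects every error confined to a span of 32 bits or fewer. The Python sketch below compares the two. It uses an RFC 1071 style checksum and CRC32c; the function names are illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


original = bytes([0x12, 0x34, 0x56, 0x78])
swapped = bytes([0x56, 0x78, 0x12, 0x34])  # the two 16-bit words exchanged

print(internet_checksum(original) == internet_checksum(swapped))  # True: error missed
print(crc32c(original) == crc32c(swapped))                        # False: error caught
```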

MPA includes a CRC check to increase the ULPDU data integrity to the level provided by other modern protocols, such as SCTP [RFC4960]. It is possible to disable this CRC check; however, CRCs MUST be enabled unless it is clear that the end-to-end connection through the network has data integrity at least as good as an MPA with CRC enabled (for example, when IPsec is implemented end to end). DDP's ULP expects this level of data integrity and therefore the ULP does not have to provide its own duplicate data integrity and error recovery for lost data.

2. Glossary

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Consumer - the ULPs or applications that lie above MPA and DDP. The Consumer is responsible for making TCP connections, starting MPA and DDP connections, and generally controlling operations.

CRC - Cyclic Redundancy Check.

Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as the process of informing DDP that a particular PDU is ordered for use. A PDU is Delivered in the exact order that it was sent by the original sender; MPA uses TCP's byte stream ordering to determine when Delivery is possible. This is specifically different from "passing the PDU to DDP", which may generally occur in any order, while the order of Delivery is strictly defined.

EMSS - Effective Maximum Segment Size. EMSS is the smaller of the TCP maximum segment size (MSS) as defined in RFC 793 [RFC793], and the current path Maximum Transmission Unit (MTU) [RFC1191].

FPDU - Framed Protocol Data Unit. The unit of data created by an MPA sender.

FPDU Alignment - The property that an FPDU is Header Aligned with the TCP segment, and the TCP segment includes an integer number of FPDUs. A TCP segment with an FPDU Alignment allows immediate processing of the contained FPDUs without waiting on other TCP segments to arrive or combining with prior segments.

FPDU Pointer (FPDUPTR) - This field of the Marker is used to indicate the beginning of an FPDU.

Full Operation (Full Operation Phase) - After the completion of the Startup Phase, MPA begins exchanging FPDUs.

Header Alignment - The property that a TCP segment begins with an FPDU. The FPDU is Header Aligned when the FPDU header is exactly at the start of the TCP segment (right behind the TCP headers on the wire).

Initiator - The endpoint of a connection that sends the MPA Request Frame, i.e., the first to actually send data (which may not be the one that sends the TCP SYN).

Marker - A four-octet field that is placed in the MPA data stream at fixed octet intervals (every 512 octets).

MPA-aware TCP - A TCP implementation that is aware of the receiver efficiencies of MPA FPDU Alignment and is capable of sending TCP segments that begin with an FPDU.

MPA-enabled - MPA is enabled if the MPA protocol is visible on the wire. When the sender is MPA-enabled, it is inserting framing and Markers. When the receiver is MPA-enabled, it is interpreting framing and Markers.

MPA Request Frame - Data sent from the MPA Initiator to the MPA Responder during the Startup Phase.

MPA Reply Frame - Data sent from the MPA Responder to the MPA Initiator during the Startup Phase.

MPA - Marker-based ULP PDU Aligned Framing for TCP protocol. This document defines the MPA protocol.

MULPDU - Maximum ULPDU. The current maximum size of the record that is acceptable for DDP to pass to MPA for transmission.

Node - A computing device attached to one or more links of a network. A Node in this context does not refer to a specific application or protocol instantiation running on the computer. A Node may consist of one or more MPA on TCP devices installed in a host computer.

PAD - A 1-3 octet group of zeros used to fill an FPDU to an exact modulo 4 size.

PDU - Protocol Data Unit.

Private Data - A block of data exchanged between MPA endpoints during initial connection setup.

Protection Domain - An RDMA concept (see [VERBS-RDMA] and [RDMASEC]) that ties use of various endpoint resources (memory access, etc.) to the specific RDMA/DDP/MPA connection.

RDDP - A suite of protocols including MPA, [DDP], [RDMAP], an overall security document [RDMASEC], a problem statement [RFC4297], an architecture document [RFC4296], and an applicability document [APPL].

RDMA - Remote Direct Memory Access; a protocol that uses DDP and MPA to enable applications to transfer data directly from memory buffers. See [RDMAP].

Remote Peer - The MPA protocol implementation on the opposite end of the connection. Used to refer to the remote entity when describing protocol exchanges or other interactions between two Nodes.

Responder - The connection endpoint that responds to an incoming MPA connection request (the MPA Request Frame). This may not be the endpoint that awaited the TCP SYN.

Startup Phase - The initial exchanges of an MPA connection, which serve to identify the MPA endpoints more fully to each other and to pass connection-specific setup information between them.

ULP - Upper Layer Protocol. The protocol layer above the protocol layer currently being referenced. The ULP for MPA is DDP [DDP].

ULPDU - Upper Layer Protocol Data Unit. The data record defined by the layer above MPA (DDP). ULPDU corresponds to DDP's DDP segment.

ULPDU_Length - A field in the FPDU describing the length of the included ULPDU.

3. MPA's Interactions with DDP

DDP requires MPA to maintain DDP record boundaries from the sender to the receiver. When using MPA on TCP to send data, DDP provides records (ULPDUs) to MPA. MPA will use the reliable transmission abilities of TCP to transmit the data, and will insert appropriate additional information into the TCP stream to allow the MPA receiver to locate the record boundary information.

As such, MPA accepts complete records (ULPDUs) from DDP at the sender and returns them to DDP at the receiver.

MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU contained in one FPDU.

MPA over a standard TCP stack can usually provide FPDU Alignment with the TCP Header if the FPDU is equal to TCP's EMSS. An optimized MPA/TCP stack can also maintain alignment as long as the FPDU is less than or equal to TCP's EMSS. Since FPDU Alignment is generally desired by the receiver, DDP cooperates with MPA to ensure FPDUs' lengths do not exceed the EMSS under normal conditions. This is done with the MULPDU mechanism.

MPA MUST provide information to DDP on the current maximum size of the record that is acceptable to send (MULPDU). DDP SHOULD limit each record size to MULPDU. The range of MULPDU values MUST be between 128 octets and 64768 octets, inclusive.

The sending DDP MUST NOT post a ULPDU larger than 64768 octets to MPA. DDP MAY post a ULPDU of any size between one and 64768 octets; however, MPA is not REQUIRED to support a ULPDU Length that is greater than the current MULPDU.

While the maximum theoretical length supported by the MPA header ULPDU_Length field is 65535, TCP over IP requires the IP datagram maximum length to be 65535 octets. To enable MPA to support FPDU Alignment, the maximum size of the FPDU must fit within an IP datagram. Thus, the ULPDU limit of 64768 octets was derived by taking the maximum IP datagram length, subtracting from it the maximum total length of the sum of the IPv4 header, TCP header, IPv4 options, TCP options, and the worst-case MPA overhead, and then rounding the result down to a 128-octet boundary.
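
The arithmetic above can be re-run as a short Python sketch. The individual overhead figures are assumptions chosen for illustration (a 20-octet IPv4 header with up to 40 octets of options, a 20-octet TCP header with up to 40 octets of options, and a worst-case MPA overhead of 6 framing octets, 3 pad octets, and one 4-octet Marker per 512 octets of a maximum-size datagram); this document does not list them individually.

```python
# Sketch: re-derive the 64768-octet ULPDU ceiling under ASSUMED overheads.
IP_DATAGRAM_MAX = 65535            # maximum IP datagram length (octets)
IPV4_HDR, IPV4_OPTS_MAX = 20, 40   # assumed IPv4 header and option sizes
TCP_HDR, TCP_OPTS_MAX = 20, 40     # assumed TCP header and option sizes
MARKER_SIZE, MARKER_INTERVAL = 4, 512


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def ulpdu_ceiling() -> int:
    # Worst-case MPA overhead: ULPDU_Length (2) + CRC (4), up to 3 pad
    # octets, and one Marker per 512 octets of the largest datagram.
    markers = MARKER_SIZE * ceil_div(IP_DATAGRAM_MAX, MARKER_INTERVAL)
    mpa_overhead = 6 + 3 + markers
    headers = IPV4_HDR + IPV4_OPTS_MAX + TCP_HDR + TCP_OPTS_MAX
    room = IP_DATAGRAM_MAX - (headers + mpa_overhead)
    # Round down to a 128-octet boundary.
    return room - room % 128


print(ulpdu_ceiling())  # 64768
```

Because the result is rounded down to a 128-octet boundary, any assumed overhead that leaves between 64768 and 64895 octets of room produces the same limit.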

Note that MULPDU will be significantly smaller than the theoretical maximum in most implementations for most circumstances, due to link MTUs, use of extra headers such as required for IPsec, etc.

On receive, MPA MUST pass each ULPDU with its length to DDP when it has been validated.

If an MPA implementation supports passing out-of-order ULPDUs to DDP, the MPA implementation SHOULD:

* Pass each ULPDU with its length to DDP as soon as it has been fully received and validated.

* Provide a mechanism to indicate the ordering of ULPDUs as the sender transmitted them. One possible mechanism might be providing the TCP sequence number for each ULPDU.

* Provide a mechanism to indicate when a given ULPDU (and prior ULPDUs) are complete (Delivered to DDP). One possible mechanism might be to allow DDP to see the current outgoing TCP ACK sequence number.

* Provide an indication to DDP that the TCP has closed or has begun to close the connection (e.g., received a FIN).
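
The bullets above leave the mechanisms to the implementation. A minimal Python sketch, assuming TCP sequence numbers are used both to record the sender's ordering and, via the outgoing ACK point, to report completion (the two possibilities suggested above), might look like this. The class, its methods, and the serial-number comparison helper are hypothetical, not interfaces defined by MPA or DDP.

```python
from typing import Callable, Dict, List

SEQ_MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def seq_leq(a: int, b: int) -> bool:
    """Serial-number comparison: True if a is at or before b."""
    return (b - a) % SEQ_MOD < (1 << 31)


class OutOfOrderDelivery:
    """Hypothetical receive-side helper; not an interface defined by MPA or DDP."""

    def __init__(self, deliver: Callable[[int, bytes], None]) -> None:
        self._deliver = deliver             # DDP upcall: (start_seq, ulpdu)
        self._pending: Dict[int, int] = {}  # start_seq -> end_seq (exclusive)
        self.closing = False

    def on_validated_fpdu(self, start_seq: int, end_seq: int, ulpdu: bytes) -> None:
        # Hand the ULPDU (length = len(ulpdu)) to DDP immediately, tagged
        # with the TCP sequence number that records the sender's ordering.
        self._pending[start_seq] = end_seq
        self._deliver(start_seq, ulpdu)

    def on_ack_advanced(self, ack_seq: int) -> List[int]:
        # ULPDUs whose FPDUs end at or before the outgoing ACK point are
        # complete; return their start sequence numbers in stream order.
        done = [s for s, e in self._pending.items() if seq_leq(e, ack_seq)]
        for s in done:
            del self._pending[s]
        return sorted(done, key=lambda s: (s - ack_seq) % SEQ_MOD)

    def on_fin(self) -> None:
        # TCP has received a FIN: tell DDP the connection is closing.
        self.closing = True
```

ULPDUs are passed up as they validate, in whatever order they arrive, while completion is reported strictly in stream order as the ACK point advances.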

MPA MUST provide the protocol version negotiated with its peer to DDP. DDP will use this version to set the version in its header and to report the version to [RDMAP].

4. MPA Full Operation Phase

The following sections describe the main semantics of the Full Operation Phase of MPA.

4.1. FPDU Format

MPA senders create FPDUs out of ULPDUs. The format of an FPDU shown below MUST be used for all MPA FPDUs. For purposes of clarity, Markers are not shown in Figure 2.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          ULPDU_Length         |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
      |                                                               |
      ~                                                               ~
      ~                            ULPDU                              ~
      |                                                               |
      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               |          PAD (0-3 octets)     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             CRC                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 2: FPDU Format

ULPDU_Length: 16 bits (unsigned integer). This is the number of octets of the contained ULPDU. It does not include the length of the FPDU header itself, the pad, the CRC, or of any Markers that fall within the ULPDU. The 16-bit ULPDU Length field is large enough to support the largest IP datagrams for IPv4 or IPv6.

PAD: The PAD field trails the ULPDU and contains between 0 and 3 octets of data. The pad data MUST be set to zero by the sender and ignored by the receiver (except for CRC checking). The length of the pad is set so as to make the size of the FPDU an integral multiple of four.

CRC: 32 bits. When CRCs are enabled, this field contains a CRC32c check value that is used to verify the entire contents of the FPDU; see Section 4.4, CRC Calculation. When CRCs are not enabled, this field is still present, may contain any value, and MUST NOT be checked.

The FPDU adds a minimum of 6 octets to the length of the ULPDU. In addition, the total length of the FPDU will include the length of any Markers and the 0 to 3 pad octets added to make the FPDU size a multiple of four.
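
The framing rules above (ULPDU_Length, zero PAD to a 4-octet multiple, trailing CRC) can be sketched in Python for the case where Markers are disabled. The bitwise CRC32c routine uses the reflected Castagnoli polynomial 0x82F63B78 with the usual all-ones initial value and final inversion. Writing the CRC least-significant octet first follows the iSCSI digest test vectors and is an assumption to confirm against Section 4.4 and [iSCSI]; `build_fpdu` is a hypothetical helper name.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli), reflected; slow but easy to audit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_fpdu(ulpdu: bytes, use_crc: bool = True) -> bytes:
    """Frame one ULPDU with no Markers: length, ULPDU, PAD, CRC."""
    if not 1 <= len(ulpdu) <= 64768:
        raise ValueError("ULPDU length must be between 1 and 64768 octets")
    header = struct.pack(">H", len(ulpdu))          # ULPDU_Length, network order
    pad = bytes(-(len(header) + len(ulpdu)) % 4)    # 0-3 zero octets
    covered = header + ulpdu + pad
    # With CRCs disabled the field is still sent; zero is one legal value.
    crc = crc32c(covered) if use_crc else 0
    return covered + struct.pack("<I", crc)         # ASSUMED octet order
```

For example, a 5-octet ULPDU becomes a 12-octet FPDU: 2 octets of ULPDU_Length, the 5 ULPDU octets, 1 PAD octet, and the 4-octet CRC.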

4.2. Marker Format

The format of a Marker MUST be as specified in Figure 3:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           RESERVED            |            FPDUPTR            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 3: Marker Format

RESERVED: The Reserved field MUST be set to zero on transmit and ignored on receive (except for CRC calculation).

FPDUPTR: The FPDU Pointer is a relative pointer, 16 bits long, interpreted as an unsigned integer that indicates the number of octets in the TCP stream from the beginning of the ULPDU Length field to the first octet of the entire Marker. The least significant two bits MUST always be set to zero at the transmitter, and the receivers MUST always treat these as zero for calculations.

4.3. MPA Markers

MPA Markers are used to identify the start of FPDUs when packets are received out of order. This is done by locating the Markers at fixed intervals in the data stream (which is correlated to the TCP sequence number) and using the Marker value to locate the preceding FPDU start.

All MPA Markers are included in the containing FPDU CRC calculation (when both CRCs and Markers are in use).

The MPA receiver's ability to locate out-of-order FPDUs and pass the ULPDUs to DDP is implementation dependent. MPA/DDP allows those receivers that are able to deal with out-of-order FPDUs in this way to require the insertion of Markers in the data stream. When the receiver cannot deal with out-of-order FPDUs in this way, it may disable the insertion of Markers at the sender. All MPA senders MUST be able to generate Markers when their use is declared by the opposing receiver (see Section 7.1, Connection Setup).

When Markers are enabled, MPA senders MUST insert a Marker into the data stream at a 512-octet periodic interval in the TCP Sequence Number Space. The Marker contains a 16-bit unsigned integer referred to as the FPDUPTR (FPDU Pointer).

If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16-bit relative back-pointer. FPDUPTR MUST contain the number of octets in the TCP stream from the beginning of the ULPDU Length field to the first octet of the Marker, unless the Marker falls between FPDUs. Thus, the location of the first octet of the previous FPDU header can be determined by subtracting the value of the given Marker from the current octet-stream sequence number (i.e., TCP sequence number) of the first octet of the Marker. Note that this computation MUST take into account that the TCP sequence number could have wrapped between the Marker and the header.
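
A receiver-side sketch of this computation in Python follows. It reads FPDUPTR in network byte order (as in the hex dumps later in this section), treats the two reserved low-order bits as zero, handles the 0x0000 special case described next, and does all arithmetic modulo 2^32 so that a wrapped sequence number is handled. The function name is hypothetical.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def fpdu_header_seq(marker_seq: int, marker: bytes) -> int:
    """Sequence number of the ULPDU_Length field that a Marker refers to.

    marker_seq is the TCP sequence number of the Marker's first octet and
    marker holds its 4 octets: RESERVED (ignored) then FPDUPTR."""
    if len(marker) != 4:
        raise ValueError("a Marker is exactly 4 octets")
    fpduptr = int.from_bytes(marker[2:4], "big") & ~0x3  # low 2 bits read as zero
    if fpduptr == 0:
        # Marker falls between FPDUs: the next header follows it directly.
        return (marker_seq + 4) % SEQ_MOD
    # Relative back-pointer; the subtraction may wrap past zero.
    return (marker_seq - fpduptr) % SEQ_MOD
```

With the values from Figure 6, a Marker at sequence 0x200 carrying FPDUPTR 0x0014 locates the FPDU header at 0x1EC.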

An FPDUPTR value of 0x0000 is a special case -- it is used when the Marker falls exactly between FPDUs (between the preceding FPDU CRC field and the next FPDU's ULPDU Length field). In this case, the Marker is considered to be contained in the following FPDU; the Marker MUST be included in the CRC calculation of the FPDU following the Marker (if CRCs are being generated or checked). Thus, an FPDUPTR value of 0x0000 means that immediately following the Marker is an FPDU header (the ULPDU Length field).

Since all FPDUs are integral multiples of 4 octets, the bottom two bits of the FPDUPTR as calculated by the sender are zero. MPA reserves these bits so they MUST be treated as zero for computation at the receiver.

When Markers are enabled (see Section 7.1, Connection Setup), the MPA Markers MUST be inserted immediately preceding the first FPDU of Full Operation Phase, and at every 512th octet of the TCP octet stream thereafter. As a result, the first Marker has an FPDUPTR value of 0x0000. If the first Marker begins at octet sequence number SeqStart, then Markers are inserted such that the first octet of the Marker is at octet sequence number SeqNum if the remainder of (SeqNum - SeqStart) mod 512 is zero. Note that SeqNum can wrap.

For example, if the TCP sequence number were used to calculate the insertion point of the Marker, the starting TCP sequence number is unlikely to be zero, and 512-octet multiples are unlikely to fall on a modulo 512 of zero. If the MPA connection is started at TCP sequence number 11, then the 1st Marker will begin at 11, and subsequent Markers will begin at 523, 1035, etc.
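
The placement rule and this example can be expressed as a small Python sketch. Because 2^32 is a multiple of 512, reducing the difference modulo 2^32 before taking it modulo 512 keeps Marker positions consistent across a sequence-number wrap. Both function names are hypothetical.

```python
SEQ_MOD = 1 << 32      # TCP sequence space
MARKER_INTERVAL = 512  # 2**32 is a multiple of 512, so wrap is harmless


def is_marker_seq(seq_num: int, seq_start: int) -> bool:
    """True if a Marker's first octet belongs at seq_num, where seq_start
    is the sequence number at which the first Marker began."""
    return (seq_num - seq_start) % SEQ_MOD % MARKER_INTERVAL == 0


def next_marker_seq(seq_num: int, seq_start: int) -> int:
    """Sequence number of the first Marker at or after seq_num."""
    offset = (seq_num - seq_start) % SEQ_MOD
    return (seq_num + (-offset) % MARKER_INTERVAL) % SEQ_MOD
```

With SeqStart = 11, the positions below 1100 that satisfy `is_marker_seq` are 11, 523, and 1035, matching the example above.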

If an FPDU is large enough to contain multiple Markers, they MUST all point to the same point in the TCP stream: the first octet of the ULPDU Length field for the FPDU.

If a Marker interval contains multiple FPDUs (the FPDUs are small), the Marker MUST point to the start of the ULPDU Length field for the FPDU containing the Marker unless the Marker falls between FPDUs, in which case the Marker MUST be zero.
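
Putting the Marker rules together, the following Python sketch frames a list of ULPDUs into an MPA byte stream with Markers and CRCs enabled. It is illustrative only: offsets are measured from SeqStart, the CRC is a slow bitwise CRC32c, and writing the CRC least-significant octet first is an assumption based on the iSCSI digest test vectors. A Marker that falls between FPDUs gets FPDUPTR 0 and is counted in the following FPDU's CRC, and a Marker that lands right after the PAD stays inside the current FPDU, matching the CRC coverage rules of Section 4.4. `frame_with_markers` is a hypothetical name.

```python
import struct

MARKER_INTERVAL = 512  # octets between Marker starts
MAX_ULPDU = 64768


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli polynomial, reflected form)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def _marker(fpduptr: int) -> bytes:
    # RESERVED (16 bits, zero) followed by FPDUPTR (16 bits), network order.
    return struct.pack(">HH", 0, fpduptr)


def frame_with_markers(ulpdus) -> bytes:
    """Frame ULPDUs into an MPA stream with Markers and CRCs enabled.

    Offsets are relative to SeqStart, so a Marker is due whenever the
    stream length is a multiple of 512. FPDUs, Markers, and CRCs are all
    multiples of 4 octets, so checking between 4-octet chunks visits
    every possible Marker position."""
    out = bytearray()
    for ulpdu in ulpdus:
        if not 1 <= len(ulpdu) <= MAX_ULPDU:
            raise ValueError("ULPDU length out of range")
        fpdu_start = len(out)  # CRC coverage begins here
        if len(out) % MARKER_INTERVAL == 0:
            # Marker between FPDUs (or before the first FPDU): FPDUPTR is 0
            # and the Marker belongs to, and is covered by, this FPDU.
            out += _marker(0)
        header_off = len(out)
        pad = bytes(-(2 + len(ulpdu)) % 4)
        body = struct.pack(">H", len(ulpdu)) + ulpdu + pad
        for i in range(0, len(body), 4):
            if len(out) % MARKER_INTERVAL == 0:
                out += _marker(len(out) - header_off)  # back-pointer to header
            out += body[i:i + 4]
        if len(out) % MARKER_INTERVAL == 0:
            # Marker immediately after the PAD stays inside this FPDU.
            out += _marker(len(out) - header_off)
        crc = crc32c(bytes(out[fpdu_start:]))
        out += struct.pack("<I", crc)  # ASSUMED octet order (see lead-in)
    return bytes(out)
```

Framing the DDP segments from Figures 5 and 6 (with a 482-octet first ULPDU, an assumption chosen so that the first FPDU plus its leading Marker occupies 492 octets) reproduces the offsets shown there: the Marker at 0x200 carries FPDUPTR 0x0014 and the stream ends at 0x21F. Comparing the generated CRC octets against those figures is a useful check of the byte-order assumption.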

The following example shows an FPDU containing a Marker.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ULPDU Length (0x0010)   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +                                                               +
   |                         ULPDU (octets 0-9)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            (0x0000)           |        FPDU ptr (0x000C)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ULPDU (octets 10-15)                   |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |          PAD (2 octets:0,0)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              CRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 4: Example FPDU Format with Marker

MPA Receivers MUST preserve ULPDU boundaries when passing data to DDP. MPA Receivers MUST pass the ULPDU data and the ULPDU Length to DDP and not the Markers, headers, and CRC.

4.4. CRC Calculation

An MPA implementation MUST implement CRC support and MUST either:

(1) always use CRCs; the MPA provider is not REQUIRED to support an administrator's request that CRCs not be used.

or

(2a) only indicate a preference not to use CRCs on the explicit request of the system administrator, via an interface not defined in this spec. The default configuration for a connection MUST be to use CRCs.

(2b) disable CRC checking (and possibly generation) if both the local and remote endpoints indicate preference not to use CRCs.

An administrative decision to have a host request CRC suppression SHOULD NOT be made unless there is assurance that the TCP connection involved provides protection from undetected errors that is at least as strong as an end-to-end CRC32c. End-to-end usage of an IPsec cryptographic integrity check is among the ways to provide such protection, and the use of channel bindings [NFSv4CHANNEL] by the ULP can provide a high level of assurance that the IPsec protection scope is end-to-end with respect to the ULP.

The process MUST be invisible to the ULP.

After receipt of an MPA startup declaration indicating that its peer requires CRCs, an MPA instance MUST continue generating and checking CRCs until the connection terminates. If an MPA instance has declared that it does not require CRCs, it MUST turn off CRC checking immediately after receipt of an MPA mode declaration indicating that its peer also does not require CRCs. It MAY continue generating CRCs. See Section 7.1, Connection Setup, for details on the MPA startup.

When sending an FPDU, the sender MUST include a CRC field. When CRCs are enabled, the CRC field in the MPA FPDU MUST be computed using the CRC32c polynomial in the manner described in the iSCSI Protocol [iSCSI] document for Header and Data Digests.

The fields which MUST be included in the CRC calculation when sending an FPDU are as follows:

1) If a Marker does not immediately precede the ULPDU Length field, the CRC-32c is calculated from the first octet of the ULPDU Length field, through all the ULPDU and Markers (if present), to the last octet of the PAD (if present), inclusive. If there is a Marker immediately following the PAD, the Marker is included in the CRC calculation for this FPDU.

2) If a Marker immediately precedes the first octet of the ULPDU Length field of the FPDU, (i.e., the Marker fell between FPDUs, and thus is required to be included in the second FPDU), the CRC-32c is calculated from the first octet of the Marker, through the ULPDU Length header, through all the ULPDU and Markers (if present), to the last octet of the PAD (if present), inclusive.

3) After calculating the CRC-32c, the resultant value is placed into the CRC field at the end of the FPDU.

When an FPDU is received, and CRC checking is enabled, the receiver MUST first perform the following:

1) Calculate the CRC of the incoming FPDU in the same fashion as defined above.

2) Verify that the calculated CRC-32c value is the same as the received CRC-32c value found in the FPDU CRC field. If not, the receiver MUST treat the FPDU as an invalid FPDU.

The procedure for handling invalid FPDUs is covered in Section 8, Error Semantics.
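
On the receive side, the check reduces to recomputing CRC32c over the same octets the sender covered and comparing the result with the CRC field. A minimal Python sketch follows, assuming the caller has already delimited the covered region (including any leading, embedded, or post-PAD Markers) and using the same assumed least-significant-octet-first CRC order as the iSCSI test vectors. `fpdu_crc_ok` is hypothetical, and what to do on failure is left to Section 8.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli polynomial, reflected form)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def fpdu_crc_ok(region: bytes, crc_enabled: bool = True) -> bool:
    """region: every octet covered by the FPDU's CRC followed by the
    4-octet CRC field itself."""
    if not crc_enabled:
        return True  # the field is present but MUST NOT be checked
    if len(region) < 8 or len(region) % 4:
        return False  # too short, or not a whole number of 4-octet units
    received = struct.unpack("<I", region[-4:])[0]  # ASSUMED octet order
    return crc32c(region[:-4]) == received
```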

The following is an annotated hex dump of an example FPDU sent as the first FPDU on the stream; as such, it starts with a Marker. The FPDU contains a 42-octet ULPDU (an example DDP segment), which in turn carries 24 octets of payload for the layer above DDP, a data load of all zeros. The CRC32c has been correctly calculated and can be used as a reference. See the [DDP] and [RDMAP] specifications for definitions of the DDP Control field, Queue, MSN, MO, and Send Data.

   Octet  Contents  Annotation
   Count

   0000   00        Marker: Reserved
   0001   00
   0002   00        Marker: FPDUPTR
   0003   00
   0004   00        ULPDU Length
   0005   2a
   0006   41        DDP Control Field, Send with Last flag set
   0007   43
   0008   00        Reserved (DDP STag position with no STag)
   0009   00
   000a   00
   000b   00
   000c   00        DDP Queue = 0
   000d   00
   000e   00
   000f   00
   0010   00        DDP MSN = 1
   0011   00
   0012   00
   0013   01
   0014   00        DDP MO = 0
   0015   00
   0016   00
   0017   00
   0018   00        DDP Send Data (24 octets of zeros)
   ...
   002f   00
   0030   52        CRC32c
   0031   23
   0032   99
   0033   83

Figure 5: Annotated Hex Dump of an FPDU

The following is an example sent as the second FPDU of the stream where the first FPDU (which is not shown here) had a length of 492 octets and was also a Send to Queue 0 with Last Flag set. This example contains a Marker.

   Octet  Contents  Annotation
   Count

   01ec   00        Length
   01ed   2a
   01ee   41        DDP Control Field: Send with Last Flag set
   01ef   43
   01f0   00        Reserved (DDP STag position with no STag)
   01f1   00
   01f2   00
   01f3   00
   01f4   00        DDP Queue = 0
   01f5   00
   01f6   00
   01f7   00
   01f8   00        DDP MSN = 2
   01f9   00
   01fa   00
   01fb   02
   01fc   00        DDP MO = 0
   01fd   00
   01fe   00
   01ff   00
   0200   00        Marker: Reserved
   0201   00
   0202   00        Marker: FPDUPTR
   0203   14
   0204   00        DDP Send Data (24 octets of zeros)
   ...
   021b   00
   021c   84        CRC32c
   021d   92
   021e   58
   021f   98

Figure 6: Annotated Hex Dump of an FPDU with Marker

4.5. FPDU Size Considerations

MPA defines the Maximum Upper Layer Protocol Data Unit (MULPDU) as the size of the largest ULPDU fitting in an FPDU. For an empty TCP Segment, MULPDU is EMSS minus the FPDU overhead (6 octets) minus space for Markers and pad octets.

The maximum ULPDU Length for a single ULPDU when Markers are present MUST be computed as:

       MULPDU = EMSS - (6 + 4 * Ceiling(EMSS / 512) + EMSS mod 4)
        

The formula above accounts for the worst-case number of Markers.

The maximum ULPDU Length for a single ULPDU when Markers are NOT present MUST be computed as:

       MULPDU = EMSS - (6 + EMSS mod 4)
        

As a further optimization of wire efficiency, an MPA implementation MAY dynamically adjust the MULPDU (see Section 5 for latency and wire efficiency trade-offs). When one or more FPDUs are already packed into a TCP Segment, MULPDU MAY be reduced accordingly.

DDP SHOULD provide ULPDUs that are as large as possible, but less than or equal to MULPDU.

If the TCP implementation needs to adjust EMSS to support MTU changes or changing TCP options, the MULPDU value is changed accordingly.

In certain rare situations, the EMSS may shrink below 128 octets in size. If this occurs, the MPA on TCP sender MUST NOT shrink the MULPDU below 128 octets and is not required to follow the segmentation rules in Section 5.1 and Appendix A.

If one or more FPDUs are already packed into a TCP segment, such that the remaining room is less than 128 octets, MPA MUST NOT provide a MULPDU smaller than 128. In this case, MPA would typically provide a MULPDU for the next full-sized segment, but may still pack the next FPDU into the small remaining room, provided that the next FPDU is small enough to fit.
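
The Section 4.5 formulas and the 128-octet floor can be combined in a short Python sketch. It also applies the 64768-octet ceiling from Section 3 and, as a simplifying assumption, computes MULPDU only for an empty TCP segment; the optional dynamic reduction when FPDUs are already packed into a segment is not modeled. The function name is hypothetical.

```python
MULPDU_MIN = 128     # floor required in Sections 3 and 4.5
MULPDU_MAX = 64768   # ceiling required in Section 3


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def mulpdu(emss: int, markers: bool) -> int:
    """MULPDU for an empty TCP segment using the Section 4.5 formulas."""
    if markers:
        # Worst-case Marker count for a segment of EMSS octets.
        value = emss - (6 + 4 * ceil_div(emss, 512) + emss % 4)
    else:
        value = emss - (6 + emss % 4)
    # Never advertise less than 128 or more than 64768 octets.
    return max(MULPDU_MIN, min(MULPDU_MAX, value))
```

With an EMSS of 1460 octets, this gives a MULPDU of 1442 octets with Markers and 1454 octets without them.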

The value 128 is chosen to allow DDP designers room for the DDP Header and some user data.

5. MPA's Interactions with TCP

The following sections describe MPA's interactions with TCP. This section discusses using a standard layered TCP stack with MPA attached above a TCP socket. Appendix A discusses using an optimized MPA-aware TCP together with an MPA implementation that takes advantage of the extra optimizations.

                   +-----------------------------------+
                   | +-----+       +-----------------+ |
                   | | MPA |       | Other Protocols | |
                   | +-----+       +-----------------+ |
                   |    ||                  ||         |
                   |  ----- socket API --------------  |
                   |            ||                     |
                   |         +-----+                   |
                   |         | TCP |                   |
                   |         +-----+                   |
                   |            ||                     |
                   |         +-----+                   |
                   |         | IP  |                   |
                   |         +-----+                   |
                   +-----------------------------------+
        

Figure 7: Fully Layered Implementation

The fully layered implementation is described for completeness; however, the user is cautioned that the reduced probability of FPDU alignment when transmitting with this implementation will tend to introduce higher overhead at optimized receivers. In addition, the lack of out-of-order receive processing will significantly reduce the value of DDP/MPA by imposing higher buffering and copying overhead on the local receiver.

5.1. MPA transmitters with a standard layered TCP

MPA transmitters SHOULD calculate a MULPDU as described in Section 4.5. If the TCP implementation allows EMSS to be determined by MPA, that value should be used. If the transmit side TCP implementation is not able to report the EMSS, MPA SHOULD use the current MTU value to establish a likely FPDU size, taking into account the various expected header sizes.

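When TCP cannot report the EMSS, the estimate from the MTU is a matter of subtracting the expected headers. The sketch below is illustrative only. Its defaults are assumptions: an IPv4 header without options (20 octets), a base TCP header (20 octets), and 12 octets for the TCP timestamp option (10 octets padded to 12). `estimate_emss` is a hypothetical name.

```python
def estimate_emss(mtu: int, ip_header: int = 20, tcp_header: int = 20,
                  tcp_options: int = 12) -> int:
    """Estimate the payload octets per TCP segment from the path MTU.

    Defaults assume IPv4 without options, a base TCP header, and the
    timestamp option; pass ip_header=40 for IPv6.
    """
    emss = mtu - ip_header - tcp_header - tcp_options
    if emss <= 0:
        raise ValueError("MTU too small for the assumed headers")
    return emss
```

For a 1500-octet Ethernet MTU this gives 1448, or 1460 if no TCP options are expected. The result would then feed the MULPDU calculation of Section 4.5.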
MPA transmitters SHOULD also use whatever facilities the TCP stack presents to cause the TCP transmitter to start TCP segments at FPDU boundaries. Multiple FPDUs MAY be packed into a single TCP segment as determined by the EMSS calculation as long as they are entirely contained in the TCP segment.

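Packing several complete FPDUs into one EMSS-sized send buffer, without splitting any FPDU across buffers, can be sketched with a greedy loop. This is an illustration under the assumption that each FPDU was already built against a MULPDU that keeps it within the EMSS. `pack_fpdus` is a hypothetical helper.

```python
def pack_fpdus(fpdus: list[bytes], emss: int) -> list[bytes]:
    """Group whole FPDUs into buffers of at most `emss` octets each."""
    buffers: list[bytes] = []
    current = bytearray()
    for fpdu in fpdus:
        if len(fpdu) > emss:
            # Would straddle a segment; indicates a MULPDU computed too large.
            raise ValueError("FPDU larger than EMSS")
        if len(current) + len(fpdu) > emss:
            buffers.append(bytes(current))   # close the current segment-sized buffer
            current = bytearray()
        current += fpdu                      # FPDUs are never split
    if current:
        buffers.append(bytes(current))
    return buffers
```

For example, three 600-octet FPDUs with an EMSS of 1448 produce one 1200-octet buffer followed by one 600-octet buffer.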
For example, passing FPDU buffers sized to the current EMSS to the TCP socket and using the TCP_NODELAY socket option to disable the Nagle [RFC896] algorithm will usually result in many of the segments starting with an FPDU.

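With Python's standard `socket` module, the approach above amounts to setting `TCP_NODELAY` and issuing one send per EMSS-sized buffer. This is a sketch only. TCP is still free to split or coalesce data, so segment alignment is likely but not guaranteed. `send_fpdus_nodelay` is a hypothetical name.

```python
import socket


def send_fpdus_nodelay(sock: socket.socket, buffers: list[bytes]) -> None:
    """Send EMSS-sized FPDU buffers with the Nagle algorithm disabled."""
    # Disable Nagle so each buffer is handed to TCP without waiting to coalesce.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for buf in buffers:
        # One send call per buffer; TCP usually, not always, starts a segment here.
        sock.sendall(buf)
```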
It is recognized that various effects can cause an FPDU Alignment to be lost. Following are a few of the effects:

* ULPDUs that are smaller than the MULPDU. If these are sent in a continuous stream, FPDU Alignment will be lost. Note that careful use of a dynamic MULPDU can help in this case; the MULPDU for future FPDUs can be adjusted to re-establish alignment with the segments based on the current EMSS.

* Sending enough data that the TCP receive window limit is reached. TCP may send a smaller segment to exactly fill the receive window.

* Sending data when TCP is operating up against the congestion window. If TCP is not tracking the congestion window in segments, it may transmit a smaller segment to exactly fill the congestion window.

* Changes in EMSS due to varying TCP options, or changes in MTU.

If FPDU Alignment with TCP segments is lost for any reason, the alignment is regained after a break in transmission where the TCP send buffers are emptied. Many usage models for DDP/MPA will include such breaks.

MPA receivers are REQUIRED to be able to operate correctly even if alignment is lost (see Section 6).

5.2. MPA receivers with a standard layered TCP

MPA receivers will get TCP data in the usual ordered stream. The receivers MUST identify FPDU boundaries by using the ULPDU_LENGTH field, as described in Section 6. Receivers MAY utilize markers to check for FPDU boundary consistency, but they are NOT required to examine the markers to determine the FPDU boundaries.

6. MPA Receiver FPDU Identification

An MPA receiver MUST first verify the FPDU before passing the ULPDU to DDP. To do this, the receiver MUST:

* locate the start of the FPDU unambiguously,

* verify its CRC (if CRC checking is enabled).

If the above conditions are true, the MPA receiver passes the ULPDU to DDP.

To detect the start of the FPDU unambiguously, one of the following MUST be used:

1: In an ordered TCP stream, when the current FPDU has a valid CRC, its ULPDU Length field can be used to identify the beginning of the next FPDU.

2: For optimized MPA/TCP receivers that support out-of-order reception of FPDUs (see Section 4.3, MPA Markers) a Marker can always be used to locate the beginning of an FPDU (in FPDUs with valid CRCs). Since the location of the Marker is known in the octet stream (sequence number space), the Marker can always be found.

3: Having found an FPDU by means of a Marker, an optimized MPA/TCP receiver can find following contiguous FPDUs by using the ULPDU Length fields (from FPDUs with valid CRCs) to establish the next FPDU boundary.

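Because Markers sit at fixed positions in sequence-number space, a receiver can compute where they fall in any segment, even one that arrives out of order. The sketch below assumes the 512-octet Marker interval of Section 4.3. It takes `mpa_start_seq` to be the sequence number of the first octet sent in Full Operation, which is where the initial Marker is placed. It shows only where Markers fall. Interpreting a Marker's contents to find the FPDU start follows Section 4.3 and is not reproduced. `marker_offsets` is a hypothetical name.

```python
MARKER_INTERVAL = 512  # octets between Markers (Section 4.3), assumed here


def marker_offsets(mpa_start_seq: int, seg_seq: int, seg_len: int) -> list[int]:
    """Offsets within a received segment at which Markers are expected.

    mpa_start_seq: TCP sequence number where the first Marker was sent
    seg_seq, seg_len: sequence number of the segment's first octet, and its length
    """
    # Position in the MPA stream modulo 2**32; since 2**32 is a multiple of
    # 512, sequence-number wraparound does not disturb Marker positions.
    rel = (seg_seq - mpa_start_seq) % 2**32
    first = (-rel) % MARKER_INTERVAL        # distance to the next Marker boundary
    return list(range(first, seg_len, MARKER_INTERVAL))
```

For a segment that starts 100 octets after the Marker origin and carries 1000 octets, Markers are expected at offsets 412 and 924.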
The ULPDU Length field (see Section 4) MUST be used to determine if the entire FPDU is present before forwarding the ULPDU to DDP.

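An in-order receiver that walks FPDU boundaries with the ULPDU Length field, and holds back an incomplete FPDU until more data arrives, might look like the following sketch. It assumes Markers are disabled and CRCs are not in use, so the CRC field is treated as valid regardless of its contents. The assumed FPDU layout is a 2-octet big-endian ULPDU_LENGTH, the ULPDU, 0 to 3 pad octets, and a 4-octet CRC. `build_fpdu`, `fpdu_length`, and `extract_ulpdus` are hypothetical names.

```python
import struct


def build_fpdu(ulpdu: bytes) -> bytes:
    """Frame one ULPDU: Markers disabled, CRC field zero-filled (CRCs not in use)."""
    if len(ulpdu) > 0xFFFF:
        raise ValueError("ULPDU_LENGTH is a 16-bit field")
    pad = (-(2 + len(ulpdu))) % 4
    return struct.pack("!H", len(ulpdu)) + ulpdu + bytes(pad) + bytes(4)


def fpdu_length(ulpdu_length: int) -> int:
    """Total FPDU octets implied by a ULPDU_LENGTH value (no Markers)."""
    body = 2 + ulpdu_length
    return body + (-body) % 4 + 4


def extract_ulpdus(buffered: bytes) -> tuple[list[bytes], bytes]:
    """Split in-order stream data into complete ULPDUs plus any partial FPDU."""
    ulpdus = []
    offset = 0
    while len(buffered) - offset >= 2:          # need the ULPDU_LENGTH field first
        (ulpdu_length,) = struct.unpack_from("!H", buffered, offset)
        total = fpdu_length(ulpdu_length)
        if len(buffered) - offset < total:
            break                               # incomplete FPDU: wait for more TCP data
        # A real receiver would verify the CRC here when CRCs are enabled.
        ulpdus.append(buffered[offset + 2: offset + 2 + ulpdu_length])
        offset += total                         # next FPDU starts right after the CRC
    return ulpdus, buffered[offset:]
```

The leftover bytes are prepended to the next chunk read from TCP, so an FPDU split across reads is passed to DDP only once it is complete.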
CRC calculation is discussed in Section 4.4 above.

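MPA's CRC is CRC32c, the same CRC used by iSCSI. For reference, a bitwise CRC32c (reflected polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF) can be sketched as below. The sketch computes only the checksum value. Which FPDU octets are covered, and how the result is placed on the wire, are defined in Section 4.4 and not reproduced here. Production code would normally use a table-driven or hardware-accelerated implementation.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32c (Castagnoli). Pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF                    # undo the final XOR / apply the initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78   # reflected Castagnoli polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF
```

The standard check value for the ASCII string "123456789" is 0xE3069283, and computing over "12345" and then continuing over "6789" gives the same result.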
7. Connection Semantics
7.1. Connection Setup

MPA requires that the Consumer MUST activate MPA, and any TCP enhancements for MPA, on a TCP half connection at the same location in the octet stream at both the sender and the receiver. This is required in order for the Marker scheme to correctly locate the Markers (if enabled) and to correctly locate the first FPDU.

MPA, and any TCP enhancements for MPA, are enabled by the ULP in both directions at once at an endpoint.

This can be accomplished several ways, and is left up to DDP's ULP:

* DDP's ULP MAY require DDP on MPA startup immediately after TCP connection setup. This has the advantage that no streaming mode negotiation is needed. An example of such a protocol is shown in Figure 10: Example Immediate Startup negotiation.

This may be accomplished by using a well-known port, or a service locator protocol to locate an appropriate port on which DDP on MPA is expected to operate.

* DDP's ULP MAY negotiate the start of DDP on MPA sometime after a normal TCP startup, using TCP streaming data exchanges on the same connection. The exchange establishes that DDP on MPA (as well as other ULPs) will be used, and exactly locates the point in the octet stream where MPA is to begin operation. Note that such a negotiation protocol is outside the scope of this specification. A simplified example of such a protocol is shown in Figure 9: Example Delayed Startup Negotiation.

An MPA endpoint operates in two distinct phases.

The Startup Phase is used to verify correct MPA setup, exchange CRC and Marker configuration, and optionally pass Private Data between endpoints prior to completing a DDP connection. During this phase, specifically formatted frames are exchanged as TCP byte streams without using CRCs or Markers. During this phase a DDP endpoint need not be "bound" to the MPA connection. In fact, the choice of DDP endpoint and its operating parameters may not be known until the Consumer-supplied Private Data (if any) has been examined by the Consumer.

The second distinct phase is Full Operation during which FPDUs are sent using all the rules that pertain (CRCs, Markers, MULPDU restrictions, etc.). A DDP endpoint MUST be "bound" to the MPA connection at entry to this phase.

When Private Data is passed between ULPs in the Startup Phase, the ULP is responsible for interpreting that data, and then placing MPA into Full Operation.

Note: The following text differentiates the two endpoints by calling them Initiator and Responder. This is quite arbitrary and is NOT related to the TCP startup (SYN, SYN/ACK sequence). The Initiator is the side that sends first in the MPA startup sequence (the MPA Request Frame).

Note: The possibility that both endpoints would be allowed to make a connection at the same time, sometimes called an active/active connection, was considered by the work group and rejected. There were several motivations for this decision. One was that applications needing this facility were few (none other than theoretical at the time of this document). Another was that the facility created some implementation difficulties, particularly with the "dual stack" designs described later on. A last issue was that dealing with rejected connections at startup would have required at least an additional frame type, and more recovery actions, complicating the protocol. While none of these issues was overwhelming, the group and implementers were not motivated to do the work to resolve these issues. The protocol includes a method of detecting these active/active startup attempts so that they can be rejected and an error reported.

The ULP is responsible for determining which side is Initiator or Responder. For client/server type ULPs, this is easy. For peer-peer ULPs (which might utilize a TCP style active/active startup), some mechanism (not defined by this specification) must be established, or some streaming mode data exchanged prior to MPA startup to determine which side starts in Initiator and which starts in Responder MPA mode.

7.1.1. MPA Request and Reply Frame Format
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   0  |                                                               |
      +         Key (16 bytes containing "MPA ID Req Frame")          +
   4  |      (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65)        |
      +         Or  (16 bytes containing "MPA ID Rep Frame")          +
   8  |      (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65)        |
      +                                                               +
   12 |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   16 |M|C|R| Res     |     Rev       |          PD_Length            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      ~                                                               ~
      ~                   Private Data                                ~
      |                                                               |
      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 8: MPA Request/Reply Frame

Key: This field contains the "key" used to validate that the sender is an MPA sender. Initiator mode senders MUST set this field to the fixed value "MPA ID Req Frame" or (in byte order) 4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65 (in hexadecimal). Responder mode receivers MUST check this field for the same value, and close the connection and report an error locally if any other value is detected. Responder mode senders MUST set this field to the fixed value "MPA ID Rep Frame" or (in byte order) 4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65 (in hexadecimal). Initiator mode receivers MUST check this field for the same value, and close the connection and report an error locally if any other value is detected.

M: This bit declares an endpoint's REQUIRED Marker usage. When this bit is '1' in an MPA Request Frame, the Initiator declares that Markers are REQUIRED in FPDUs sent from the Responder. When set to '1' in an MPA Reply Frame, this bit declares that Markers are REQUIRED in FPDUs sent from the Initiator. When this bit is '0' in a received MPA Request Frame or MPA Reply Frame, the receiving endpoint MUST NOT add Markers to the data stream it sends. When it is '1', the receiving endpoint MUST add Markers as described in Section 4.3, MPA Markers.

C: This bit declares an endpoint's preferred CRC usage. When this bit is '0' in both the MPA Request Frame and the MPA Reply Frame, CRCs MUST NOT be checked and need not be generated by either endpoint. When this bit is '1' in either the MPA Request Frame or the MPA Reply Frame, CRCs MUST be generated and checked by both endpoints. Note that even when not in use, the CRC field remains present in the FPDU. When CRCs are not in use, the CRC field MUST be considered valid for FPDU checking regardless of its contents.

R: This bit is set to zero in the MPA Request Frame and is not checked on reception. In the MPA Reply Frame, this bit is the Rejected Connection bit, set by the Responder's ULP to indicate acceptance ('0') or rejection ('1') of the connection parameters provided in the Private Data.

Res: This field is reserved for future use. It MUST be set to zero when sending, and not checked on reception.

Rev: This field contains the revision of MPA. For this version of the specification, senders MUST set this field to one. MPA receivers compliant with this version of the specification MUST check this field. If the MPA receiver cannot interoperate with the received version, then it MUST close the connection and report an error locally. Otherwise, the MPA receiver should report the received version to the ULP.

PD_Length: This field MUST contain the length in octets of the Private Data field. A value of zero indicates that there is no Private Data field present at all. If the receiver detects that the PD_Length field does not match the length of the Private Data field, or if the length of the Private Data field exceeds 512 octets, the receiver MUST close the connection and report an error locally. Otherwise, the MPA receiver should pass the PD_Length value and Private Data to the ULP.

Private Data: This field may contain any value defined by ULPs or may not be present. The Private Data field MUST be between 0 and 512 octets in length. ULPs define how to size, set, and validate this field within these limits. Private Data usage is further discussed in Section 7.1.4.

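The frame layout and field rules above can be exercised with a small encoder, decoder, and negotiation helper. This is a sketch, not a reference implementation. It assumes the caller has already delimited exactly one frame (20 header octets followed by PD_Length octets). It also assumes, as an illustrative choice, that only revision 1 is interoperable. `MpaFrame`, `build_frame`, `parse_frame`, `negotiate`, and `MpaStartupError` are hypothetical names. Bit 0 in Figure 8 is the most significant bit, so M, C, and R are 0x80, 0x40, and 0x20 of octet 16.

```python
import struct
from dataclasses import dataclass

REQ_KEY = b"MPA ID Req Frame"   # 4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65
REP_KEY = b"MPA ID Rep Frame"   # 4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65
HEADER_LEN = 20
MAX_PRIVATE_DATA = 512
MPA_REVISION = 1                # assumption: only revision 1 is supported


@dataclass
class MpaFrame:
    is_request: bool
    markers: bool        # M bit
    crc: bool            # C bit
    rejected: bool       # R bit; meaningful only in a Reply Frame
    revision: int
    private_data: bytes


class MpaStartupError(Exception):
    """Caller should close the connection and report the error locally."""


def build_frame(f: MpaFrame) -> bytes:
    if len(f.private_data) > MAX_PRIVATE_DATA:
        raise ValueError("Private Data exceeds 512 octets")
    flags = (0x80 if f.markers else 0) | (0x40 if f.crc else 0)
    if not f.is_request and f.rejected:
        flags |= 0x20                       # R is always sent as zero in a Request
    key = REQ_KEY if f.is_request else REP_KEY
    return key + struct.pack("!BBH", flags, f.revision, len(f.private_data)) + f.private_data


def parse_frame(data: bytes, expect_request: bool) -> MpaFrame:
    """Validate one complete frame; a wrong Key also catches Initiator/Initiator."""
    expected = REQ_KEY if expect_request else REP_KEY
    if len(data) < HEADER_LEN or data[:16] != expected:
        raise MpaStartupError("missing or unexpected Key")
    flags, rev, pd_length = struct.unpack_from("!BBH", data, 16)
    if rev != MPA_REVISION:
        raise MpaStartupError(f"cannot interoperate with revision {rev}")
    if pd_length > MAX_PRIVATE_DATA or len(data) != HEADER_LEN + pd_length:
        raise MpaStartupError("PD_Length does not match Private Data")
    return MpaFrame(expect_request, bool(flags & 0x80), bool(flags & 0x40),
                    (not expect_request) and bool(flags & 0x20),  # Res bits ignored
                    rev, data[HEADER_LEN:])


def negotiate(request: MpaFrame, reply: MpaFrame) -> dict:
    return {
        "crc": request.crc or reply.crc,             # '1' in either frame enables CRCs
        "responder_sends_markers": request.markers,  # Request M governs the Responder
        "initiator_sends_markers": reply.markers,    # Reply M governs the Initiator
    }
```

Round-tripping a Request with M set and a Reply with C set yields CRCs on in both directions and Markers only in FPDUs sent by the Responder.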
7.1.2. Connection Startup Rules

The following rules apply to MPA connection Startup Phase:

1. When MPA is started in the Initiator mode, the MPA implementation MUST send a valid MPA Request Frame. The MPA Request Frame MAY include ULP-supplied Private Data.

2. When MPA is started in the Responder mode, the MPA implementation MUST wait until an MPA Request Frame is received and validated before entering Full MPA/DDP Operation.

If the MPA Request Frame is improperly formatted, the implementation MUST close the TCP connection and exit MPA.

If the MPA Request Frame is properly formatted but the Private Data is not acceptable, the implementation SHOULD return an MPA Reply Frame with the Rejected Connection bit set to '1'; the MPA Reply Frame MAY include ULP-supplied Private Data; the implementation MUST exit MPA, leaving the TCP connection open. The ULP may close TCP or use the connection for other purposes.

If the MPA Request Frame is properly formatted and the Private Data is acceptable, the implementation SHOULD return an MPA Reply Frame with the Rejected Connection bit set to '0'; the MPA Reply Frame MAY include ULP-supplied Private Data; and the Responder SHOULD prepare to interpret any data received as FPDUs and pass any received ULPDUs to DDP.

Note: Since the receiver's ability to deal with Markers is unknown until the Request and Reply Frames have been received, sending FPDUs before this occurs is not possible.

Note: The requirement to wait on a Request Frame before sending a Reply Frame is a design choice. It makes for a well-ordered sequence of events at each end, and avoids having to specify how to deal with situations where both ends start at the same time.

3. MPA Initiator mode implementations MUST receive and validate an MPA Reply Frame.

If the MPA Reply Frame is improperly formatted, the implementation MUST close the TCP connection and exit MPA.

If the MPA Reply Frame is properly formatted but the Private Data is not acceptable, or if the Rejected Connection bit is set to '1', the implementation MUST exit MPA, leaving the TCP connection open. The ULP may close TCP or use the connection for other purposes.

If the MPA Reply Frame is properly formatted, the Private Data is acceptable, and the Rejected Connection bit is set to '0', the implementation SHOULD enter the Full MPA/DDP Operation Phase, interpreting any received data as FPDUs and sending DDP ULPDUs as FPDUs.

4. MPA Responder mode implementations MUST receive and validate at least one FPDU before sending any FPDUs or Markers.

Note: This requirement is present to allow the Initiator time to get its receiver into Full Operation before an FPDU arrives, avoiding potential race conditions at the Initiator. This was also subject to some debate in the work group before rough consensus was reached. Eliminating this requirement would allow faster startup in some types of applications. However, that would also make certain implementations (particularly "dual stack") much harder.

5. If a received "Key" does not match the expected value (see Section 7.1.1, MPA Request and Reply Frame Format) the TCP/DDP connection MUST be closed, and an error returned to the ULP.

6. The received Private Data fields may be used by Consumers at either end to further validate the connection and set up DDP or other ULP parameters. The Initiator ULP MAY close the TCP/MPA/DDP connection as a result of validating the Private Data fields. The Responder SHOULD return an MPA Reply Frame with the Rejected Connection bit set to '1' if the Private Data is not acceptable to the ULP.

7. When the first FPDU is to be sent, then if Markers are enabled, the first octets sent are the special Marker 0x00000000, followed by the start of the FPDU (the FPDU's ULPDU Length field). If Markers are not enabled, the first octets sent are the start of the FPDU (the FPDU's ULPDU Length field).

8. MPA implementations MUST use the difference between the MPA Request Frame and the MPA Reply Frame to check for incorrect "Initiator/Initiator" startups. Implementations SHOULD put a timeout on waiting for the MPA Request Frame when started in Responder mode, to detect incorrect "Responder/Responder" startups.

9. MPA implementations MUST validate the PD_Length field. The buffer that receives the Private Data field MUST be large enough to receive that data; the amount of Private Data MUST NOT exceed the PD_Length or the application buffer. If any of these checks fails, the startup frame MUST be considered improperly formatted.

10. MPA implementations SHOULD implement a reasonable timeout while waiting for the entire set of startup frames; this prevents certain denial-of-service attacks. ULPs SHOULD implement a reasonable timeout while waiting for FPDUs, ULPDUs, and application level messages to guard against application failures and certain denial-of-service attacks.

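The Responder-side rules (2, 4, 8, and 10) can be summarized as a small state machine. The sketch below is hypothetical throughout. The state names, the action strings, and the choice to close on a startup timeout are illustrative, not defined by this specification. Frame validation itself is assumed to happen elsewhere and is reported through the `well_formed` and `private_data_ok` flags.

```python
from enum import Enum, auto


class ResponderState(Enum):
    WAIT_REQUEST = auto()      # rule 2: waiting for a valid MPA Request Frame
    WAIT_FIRST_FPDU = auto()   # Reply sent; decode FPDUs but send none (rule 4)
    FULL_OPERATION = auto()    # may send FPDUs (and Markers, if negotiated)
    EXITED = auto()            # MPA exited; TCP left open for the ULP
    CLOSED = auto()            # TCP connection closed, error reported locally


class ResponderStartup:
    """Hypothetical tracker for the Responder side of the Startup Phase."""

    def __init__(self) -> None:
        self.state = ResponderState.WAIT_REQUEST

    def on_request(self, well_formed: bool, private_data_ok: bool) -> str:
        if self.state is not ResponderState.WAIT_REQUEST:
            raise RuntimeError("Request Frame not expected in " + self.state.name)
        if not well_formed:
            self.state = ResponderState.CLOSED
            return "close-tcp"
        if not private_data_ok:
            self.state = ResponderState.EXITED
            return "send-reply-rejected"     # R bit '1'; TCP stays open
        self.state = ResponderState.WAIT_FIRST_FPDU
        return "send-reply-accepted"         # R bit '0'

    def on_valid_fpdu(self) -> None:
        # Rule 4: only after one validated FPDU may the Responder send FPDUs.
        if self.state is ResponderState.WAIT_FIRST_FPDU:
            self.state = ResponderState.FULL_OPERATION

    def on_timeout(self) -> str:
        # Rules 8 and 10: no Request Frame (e.g. Responder/Responder startup).
        if self.state is ResponderState.WAIT_REQUEST:
            self.state = ResponderState.CLOSED
            return "close-tcp"
        return "none"

    def may_send_fpdu(self) -> bool:
        return self.state is ResponderState.FULL_OPERATION
```

An accepted Request moves the tracker to `WAIT_FIRST_FPDU`, where `may_send_fpdu()` still returns False. That avoids the Initiator-side race described in the note to rule 4.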
7.1.3. Example Delayed Startup Sequence

A variety of startup sequences are possible when using MPA on TCP. Following is an example of an MPA/DDP startup that occurs after TCP has been running for a while and has exchanged some amount of streaming data. This example does not use any Private Data (an example that does is shown later in Section 7.1.4.2, Example Immediate Startup Using Private Data), although it is perfectly legal to include the Private Data. Note that since the example does not use any Private Data, there are no ULP interactions shown between receiving "startup frames" and putting MPA into Full Operation.

           Initiator                                 Responder

  +---------------------------+
  |ULP streaming mode         |
  |  <Hello> request to       |
  |  transition to DDP/MPA    |           +---------------------------+
  |  mode (optional).         | --------> |ULP gets request;          |
  +---------------------------+           |  enables MPA Responder    |
                                          |  mode with last (optional)|
                                          |  streaming mode           |
                                          |  <Hello Ack> for MPA to   |
                                          |  send.                    |
  +---------------------------+           |MPA waits for incoming     |
  |ULP receives streaming     | <-------- |  <MPA Request Frame>.     |
  |  <Hello Ack>;             |           +---------------------------+
  |Enters MPA Initiator mode; |
  |MPA sends                  |
  |  <MPA Request Frame>;     |
  |MPA waits for incoming     |           +---------------------------+
  |  <MPA Reply Frame>.       | - - - - > |MPA receives               |
  +---------------------------+           |  <MPA Request Frame>.     |
                                          |Consumer binds DDP to MPA; |
                                          |MPA sends the              |
                                          |  <MPA Reply Frame>.       |
                                          |DDP/MPA enables FPDU       |
  +---------------------------+           |  decoding, but does not   |
  |MPA receives the           | < - - - - |  send any FPDUs.          |
  |  <MPA Reply Frame>        |           +---------------------------+
  |Consumer binds DDP to MPA; |
  |DDP/MPA begins Full        |
  |  Operation.               |
  |MPA sends first FPDU (as   |           +---------------------------+
  |  DDP ULPDUs become        | ========> |MPA receives first FPDU.   |
  |  available).              |           |MPA sends first FPDU (as   |
  +---------------------------+           |  DDP ULPDUs become        |
                                  <====== |  available).              |
                                          +---------------------------+
        

Figure 9: Example Delayed Startup Negotiation

An example Delayed Startup sequence is described below:

* Active and passive sides start up a TCP connection in the usual fashion, probably using sockets APIs. They exchange some amount of streaming mode data. At some point, one side (the MPA Initiator) sends streaming mode data that effectively says "Hello, let's go into MPA/DDP mode".

* When the remote side (the MPA Responder) gets this streaming mode message, the Consumer would send a last streaming mode message that effectively says "I acknowledge your Hello, and am now in MPA Responder mode". The exchange of these messages establishes the exact point in the TCP stream where MPA is enabled. The Responding Consumer enables MPA in the Responder mode and waits for the initial MPA startup message.

* The Initiating Consumer would enable MPA startup in the Initiator mode which then sends the MPA Request Frame. It is assumed that no Private Data messages are needed for this example, although it is possible to do so. The Initiating MPA (and Consumer) would also wait for the MPA connection to be accepted.

* The Responding MPA would receive the initial MPA Request Frame and would inform the Consumer that this message arrived. The Consumer can then accept the MPA/DDP connection or close the TCP connection.

* To accept the connection request, the Responding Consumer would use an appropriate API to bind the TCP/MPA connection to a DDP endpoint, thus bringing MPA/DDP into Full Operation. In the process of entering Full Operation, MPA sends the MPA Reply Frame. MPA/DDP waits for the first incoming FPDU before sending any FPDUs.

* If the initial TCP data was not a properly formatted MPA Request Frame, MPA will close or reset the TCP connection immediately.

* The Initiating MPA would receive the MPA Reply Frame and would report this message to the Consumer. The Consumer can then accept the MPA/DDP connection, or close or reset the TCP connection to abort the process.

* On determining that the connection is acceptable, the Initiating Consumer would use an appropriate API to bind the TCP/MPA connection to a DDP endpoint, thus bringing MPA/DDP into Full Operation. MPA/DDP would then begin sending DDP messages as MPA FPDUs.
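
The Responder's check on the first bytes it receives after enabling MPA (close or reset the connection if they are not a properly formatted MPA Request Frame) can be sketched as follows. This is a minimal Python sketch, not a reference implementation. It assumes the MPA Request Frame layout defined earlier in this specification: a 16-octet Key "MPA ID Req Frame", a flags octet carrying the M, C, and R bits, a Rev octet equal to 1, a 16-bit PD_Length of at most 512, then the Private Data. The names parse_startup_frame, StartupFrame, and BadStartupFrame are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Assumed layout (MPA Request/Reply Frame definition earlier in this spec):
# 16-octet Key | flags octet (M, C, R + reserved) | Rev | PD_Length | Private Data
REQ_KEY = b"MPA ID Req Frame"
HEADER_LEN = 20
FLAG_M, FLAG_C, FLAG_R = 0x80, 0x40, 0x20
MPA_REV = 1
MAX_PD_LEN = 512  # assumed upper bound on PD_Length


class BadStartupFrame(Exception):
    """Raised when MPA must close or reset the TCP connection."""


@dataclass
class StartupFrame:
    markers: bool
    crc: bool
    rejected: bool
    private_data: bytes


def parse_startup_frame(buf: bytes, key: bytes = REQ_KEY) -> Optional[StartupFrame]:
    """Validate the bytes received so far after MPA Responder mode is enabled.

    Returns None while more TCP data is needed, a StartupFrame once a complete
    well-formed frame is present, and raises BadStartupFrame as soon as the
    bytes cannot be a valid frame.
    """
    # Even a partial Key that does not match means this is not MPA startup.
    if buf[:len(key)] != key[:len(buf)]:
        raise BadStartupFrame("Key mismatch")
    if len(buf) < HEADER_LEN:
        return None
    flags, rev, pd_len = struct.unpack("!BBH", buf[16:HEADER_LEN])
    if rev != MPA_REV:
        raise BadStartupFrame(f"unsupported Rev {rev}")
    if pd_len > MAX_PD_LEN:
        raise BadStartupFrame(f"PD_Length {pd_len} too large")
    if len(buf) < HEADER_LEN + pd_len:
        return None
    return StartupFrame(
        markers=bool(flags & FLAG_M),
        crc=bool(flags & FLAG_C),
        rejected=bool(flags & FLAG_R),
        private_data=bytes(buf[HEADER_LEN:HEADER_LEN + pd_len]),
    )
```

A Responder would call parse_startup_frame on everything received so far each time TCP data arrives: None means wait for more data, and BadStartupFrame means close or reset the TCP connection. Rejecting on a Key prefix mismatch lets MPA give up on non-MPA traffic without waiting for a full 20-octet header.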

7.1.4. Use of Private Data

This section is advisory in nature, in that it suggests a method by which a ULP can deal with pre-DDP connection information exchange.

7.1.4.1. Motivation

Prior RDMA protocols have been developed that provide Private Data via out-of-band mechanisms. As a result, many applications now expect some form of Private Data to be available for application use prior to setting up the DDP/RDMA connection. Following are some examples of the use of Private Data.

An RDMA endpoint (referred to as a Queue Pair, or QP, in InfiniBand and in [VERBS-RDMA]) must be associated with a Protection Domain. No receive operations may be posted to the endpoint before it is associated with a Protection Domain. Indeed, under both the InfiniBand verbs and the proposed RDMA/DDP verbs [VERBS-RDMA], an endpoint/QP is created within a Protection Domain.

There are some applications where the choice of Protection Domain is dependent upon the identity of the remote ULP client. For example, if a user session requires multiple connections, it is highly desirable for all of those connections to use a single Protection Domain. Note: Use of Protection Domains is further discussed in [RDMASEC].

InfiniBand, the DAT APIs [DAT-API], and the IT-API [IT-API] all provide for the active-side ULP to provide Private Data when requesting a connection. This data is passed to the ULP to allow it to determine whether to accept the connection, and if so with which endpoint (and implicitly which Protection Domain).

The Private Data can also be used to ensure that both ends of the connection have configured their RDMA endpoints compatibly on such matters as the RDMA Read capacity (see [RDMAP]). Further ULP-specific uses are also presumed, such as establishing the identity of the client.

Private Data may also be supplied when accepting the connection, to complete any negotiation of RDMA resources or for other ULP purposes.

There are several potential ways to exchange this Private Data. For example, the InfiniBand specification includes a connection management protocol that allows a small amount of Private Data to be exchanged using datagrams before actually starting the RDMA connection.

This document allows for small amounts of Private Data to be exchanged as part of the MPA startup sequence. The actual Private Data fields are carried in the MPA Request Frame and the MPA Reply Frame.

If larger amounts of Private Data or more negotiation is necessary, TCP streaming mode messages may be exchanged prior to enabling MPA.

7.1.4.2. Example Immediate Startup Using Private Data

   Initiator                               Responder

   +---------------------------+
   |TCP SYN sent.              |           +--------------------------+
   +---------------------------+ --------> |TCP gets SYN packet;      |
   +---------------------------+           |  sends SYN-Ack.          |
   |TCP gets SYN-Ack;          | <-------- +--------------------------+
   |  sends Ack.               |
   +---------------------------+ --------> +--------------------------+
   +---------------------------+           |Consumer enables MPA      |
   |Consumer enables MPA       |           |Responder mode, waits for |
   |Initiator mode with        |           |  <MPA Request Frame>.    |
   |Private Data; MPA sends    |           +--------------------------+
   |  <MPA Request Frame>;     |
   |MPA waits for incoming     |           +--------------------------+
   |  <MPA Reply Frame>.       | - - - - > |MPA receives              |
   +---------------------------+           |  <MPA Request Frame>.    |
                                           |Consumer examines Private |
                                           |Data, provides MPA with   |
                                           |return Private Data,      |
                                           |binds DDP to MPA, and     |
                                           |enables MPA to send an    |
                                           |  <MPA Reply Frame>.      |
                                           |DDP/MPA enables FPDU      |
   +---------------------------+           |decoding, but does not    |
   |MPA receives the           | < - - - - |send any FPDUs.           |
   |  <MPA Reply Frame>.       |           +--------------------------+
   |Consumer examines Private  |
   |Data, binds DDP to MPA,    |
   |and enables DDP/MPA to     |
   |begin Full Operation.      |
   |MPA sends first FPDU (as   |           +--------------------------+
   |DDP ULPDUs become          | ========> |MPA receives first FPDU.  |
   |available).                |           |MPA sends first FPDU (as  |
   +---------------------------+           |DDP ULPDUs become         |
                                   <====== |available).               |
                                           +--------------------------+

Figure 10: Example Immediate Startup Negotiation

Note: The exact point in the TCP connection sequence at which MPA is started is implementation dependent; the above diagram shows one possible sequence. Also, the Initiator's "Ack" of the Responder's "SYN-Ack" may be carried in the same TCP segment as the MPA Request Frame (as allowed by the TCP RFCs).

The example immediate startup sequence is described below:

* The passive side (Responding Consumer) would listen on the TCP destination port, to indicate its readiness to accept a connection.

* The active side (Initiating Consumer) would request a connection from a TCP endpoint (one that is expected to be upgraded to MPA/DDP/RDMA and to carry Private Data) to a destination address and port.

* The Initiating Consumer would initiate a TCP connection to the destination port. Acceptance/rejection of the connection would proceed as per normal TCP connection establishment.

* The passive side (Responding Consumer) would receive the TCP connection request as usual, allowing normal TCP gatekeepers, such as INETD and TCPserver, to exercise their normal safeguard/logging functions. On acceptance of the TCP connection, the Responding Consumer would enable MPA in the Responder mode and wait for the initial MPA startup message.

* The Initiating Consumer would enable MPA startup in the Initiator mode to send an initial MPA Request Frame that includes its Private Data message. The Initiating MPA (and Consumer) would then wait for the MPA connection to be accepted and for any returned Private Data.

* The Responding MPA would receive the initial MPA Request Frame with the Private Data message and would pass the Private Data through to the Consumer. The Consumer can then accept the MPA/DDP connection, close the TCP connection, or reject the MPA connection with a return message.

* To accept the connection request, the Responding Consumer would use an appropriate API to bind the TCP/MPA connection to a DDP endpoint, thus bringing MPA/DDP into Full Operation. In the process of entering Full Operation, MPA sends the MPA Reply Frame, which includes the Consumer-supplied Private Data containing any appropriate Consumer response. MPA/DDP waits for the first incoming FPDU before sending any FPDUs.

* If the initial TCP data was not a properly formatted MPA Request Frame, MPA will close or reset the TCP connection immediately.

* To reject the MPA connection request, the Responding Consumer would send an MPA Reply Frame with the "Rejected Connection" bit set to '1' and any ULP-supplied Private Data (such as the reason for rejection), and may then close the TCP connection.

* The Initiating MPA would receive the MPA Reply Frame with the Private Data message and would report this message to the Consumer, including the supplied Private Data.

If the "Rejected Connection" bit is set to a '1', MPA will close the TCP connection and exit.

If the "Rejected Connection" bit is set to '0', and the Initiating Consumer determines from the MPA Reply Frame Private Data that the connection is acceptable, the Consumer would use an appropriate API to bind the TCP/MPA connection to a DDP endpoint, thus bringing MPA/DDP into Full Operation. MPA/DDP would then begin sending DDP messages as MPA FPDUs.
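
The Private Data handling in this sequence can be sketched as two helpers: one that encodes a Request or Reply Frame around Consumer-supplied Private Data, and one that decides the Initiator's next step from a Reply. This is an illustrative Python sketch under an assumed frame layout taken from the frame definition earlier in this specification (Keys "MPA ID Req Frame" and "MPA ID Rep Frame", M/C/R flag bits, Rev 1, PD_Length of at most 512 octets). The function names, the keyword defaults, and the consumer_accepts callback are hypothetical.

```python
import struct

# Restated assumptions so the sketch is self-contained.
REQ_KEY = b"MPA ID Req Frame"
REP_KEY = b"MPA ID Rep Frame"
FLAG_M, FLAG_C, FLAG_R = 0x80, 0x40, 0x20
MAX_PD_LEN = 512


def build_startup_frame(key, private_data=b"", *, markers=False, crc=True, rejected=False):
    """Encode an MPA Request or Reply Frame carrying Consumer Private Data."""
    if len(private_data) > MAX_PD_LEN:
        # Larger exchanges belong in TCP streaming mode before MPA is enabled.
        raise ValueError("Private Data too large for an MPA startup frame")
    flags = (FLAG_M if markers else 0) | (FLAG_C if crc else 0) | (FLAG_R if rejected else 0)
    return key + struct.pack("!BBH", flags, 1, len(private_data)) + private_data


def initiator_on_reply(reply, consumer_accepts):
    """Decide the Initiator's next step for a complete, already validated Reply.

    Returns (action, private_data). The Private Data is always reported to the
    Consumer, whether or not the connection was rejected.
    """
    flags = reply[16]
    (pd_len,) = struct.unpack("!H", reply[18:20])
    private_data = reply[20:20 + pd_len]
    if flags & FLAG_R:
        return "close", private_data            # Rejected Connection = 1
    if consumer_accepts(private_data):
        return "full_operation", private_data   # bind TCP/MPA to a DDP endpoint
    return "close", private_data                # Consumer aborts the startup
```

Oversized Private Data is rejected rather than truncated here, on the reasoning that such exchanges belong in TCP streaming mode before MPA is enabled.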

7.1.5. "Dual Stack" Implementations

MPA/DDP is commonly expected to be implemented as part of a "dual stack" architecture. One stack is the traditional TCP stack, usually with a sockets API (Application Programming Interface). The second stack is the MPA/DDP stack, with its own API and potentially separate code or hardware to handle the MPA/DDP data. Implementations may vary, so the following comments are advisory only.

The use of the two stacks offers advantages:

TCP connection setup is usually done with the TCP stack. This allows use of the usual naming and addressing mechanisms. It also means that any mechanisms used to "harden" the connection setup against security threats are also used when starting MPA/DDP.

Some applications may have been originally designed for TCP, but are "enhanced" to utilize MPA/DDP after a negotiation reveals the capability to do so. The negotiation process takes place in TCP's streaming mode, using the usual TCP APIs.

Some new applications, designed for RDMA or DDP, still need to exchange some data prior to starting MPA/DDP. This exchange can be of arbitrary length or complexity, but often consists of only a small amount of Private Data, perhaps only a single message. Using the TCP streaming mode for this exchange allows this to be done using well-understood methods.

The main disadvantage of using two stacks is the conversion of an active TCP connection between them. This process must be done with care to prevent loss of data.

To avoid some of the problems when using a "dual stack" architecture, the following additional restrictions may be required by the implementation:

1. Enabling the DDP/MPA stack SHOULD be done only when no incoming stream data is expected. This is typically managed by the ULP. When following the recommended startup sequence, the Responder side enters DDP/MPA mode, sends the last streaming mode data, and then waits for the MPA Request Frame. No additional streaming mode data is expected. The Initiator side ULP receives the last streaming mode data, and then enters DDP/MPA mode. Again, no additional streaming mode data is expected.

2. The DDP/MPA MAY provide the ability to send a "last streaming message" as part of its Responder DDP/MPA enable function. This allows the DDP/MPA stack to more easily manage the conversion to DDP/MPA mode (and avoid problems with a very fast return of the MPA Request Frame from the Initiator side).
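
The benefit of restriction 2 can be illustrated with a toy model that routes received, in-order TCP payload either to the streaming (sockets) consumer or to MPA startup processing, switching at an exact byte position. This is a hypothetical Python sketch: the class and method names are invented for illustration, and a real dual stack would need responder_enable to be atomic with respect to receive processing, which a single-threaded model cannot demonstrate.

```python
class DualStackConnection:
    """Toy model of one TCP connection handed from the sockets stack to MPA/DDP."""

    def __init__(self):
        self.mode = "streaming"
        self.tx = bytearray()         # bytes queued to TCP for transmission
        self.stream_rx = bytearray()  # bytes delivered through the sockets API
        self.mpa_rx = bytearray()     # bytes handed to MPA startup processing

    def send_streaming(self, data):
        if self.mode != "streaming":
            raise RuntimeError("streaming mode already ended")
        self.tx += data

    def responder_enable(self, last_streaming_message=b""):
        # Restriction 2: queue the last streaming message and enter MPA
        # Responder mode in one step, so an MPA Request Frame returned very
        # quickly by the Initiator is never routed to the streaming consumer.
        self.tx += last_streaming_message
        self.mode = "mpa_responder"

    def on_tcp_data(self, data):
        # TCP has already recovered from loss, re-segmentation and reordering.
        if self.mode == "streaming":
            self.stream_rx += data
        else:
            self.mpa_rx += data
```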

Note: Regardless of the "stack" architecture used, TCP's rules MUST be followed. For example, if network data is lost, re-segmented, or re-ordered, TCP MUST recover appropriately even when this occurs while switching stacks.

7.2. Normal Connection Teardown

Each half connection of MPA terminates when DDP closes the corresponding TCP half connection.

MPA SHOULD provide DDP with a mechanism that makes DDP aware that TCP has received a graceful close of the TCP connection (e.g., a FIN has been received).

8. Error Semantics

The following errors MUST be detected by MPA and the codes SHOULD be provided to DDP or other Consumer:

Code   Error

1 TCP connection closed, terminated, or lost. This includes lost by timeout, too many retries, RST received, or FIN received.

2 Received MPA CRC does not match the calculated value for the FPDU.

3 In the event that the CRC is valid, received MPA Marker (if enabled) and ULPDU Length fields do not agree on the start of an FPDU. If the FPDU start determined from previous ULPDU Length fields does not match with the MPA Marker position, MPA SHOULD deliver an error to DDP. It may not be possible to make this check as a segment arrives, but the check SHOULD be made when a gap creating an out-of-order sequence is closed and any time a Marker points to an already identified FPDU. It is OPTIONAL for a receiver to check each Marker, if multiple Markers are present in an FPDU, or if the segment is received in order.

4 Invalid MPA Request Frame or MPA Reply Frame received. In this case, the TCP connection MUST be immediately closed. DDP and other ULPs should treat this similarly to code 1, above.
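
The Marker consistency check behind error code 3 can be sketched as below. This Python sketch assumes that each Marker's FPDUPTR counts octets back from the Marker to the ULPDU Length field of an FPDU, and that the receiver has already computed each FPDU's full on-wire length (including pad, CRC, and any embedded Markers); chain_fpdu_starts and check_markers are hypothetical names. It compares the FPDU start implied by each Marker with the starts derived from the ULPDU Length chain, using 32-bit sequence arithmetic.

```python
SEQ_MASK = 0xFFFFFFFF          # TCP sequence numbers wrap at 2**32
MPA_ERR_MARKER_MISMATCH = 3    # error code 3 in the table above


def chain_fpdu_starts(first_start, wire_lengths):
    """FPDU start sequence numbers implied by the ULPDU Length chain.

    wire_lengths: each FPDU's full on-wire size, assumed already computed.
    """
    starts, seq = [], first_start
    for n in wire_lengths:
        starts.append(seq)
        seq = (seq + n) & SEQ_MASK
    return starts


def check_markers(fpdu_starts, markers):
    """Return error code 3 if any Marker disagrees with the chain, else None.

    markers: iterable of (marker_seq, fpduptr) pairs, i.e. the Marker's TCP
    sequence number and its FPDUPTR value.
    """
    known = set(fpdu_starts)
    for marker_seq, fpduptr in markers:
        implied_start = (marker_seq - fpduptr) & SEQ_MASK
        if implied_start not in known:
            return MPA_ERR_MARKER_MISMATCH
    return None
```

Because checking every Marker is OPTIONAL, a receiver could pass only the Markers it chooses to examine, for instance those near a gap that has just been closed.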

When conditions 2 or 3 above are detected, an optimized MPA/TCP implementation MAY choose to silently drop the TCP segment rather than reporting the error to DDP. In this case, the sending TCP will retry the segment, usually correcting the error, unless the problem was at the source. In that case, the source will usually exceed the number of retries and terminate the connection.
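
The CRC check behind error code 2, including the optional silent drop, might look like the sketch below. MPA uses CRC-32C (the Castagnoli polynomial); this bitwise Python version favors clarity over speed. The sketch assumes the caller passes exactly the octets covered by the CRC and the received CRC already converted to an integer; on-the-wire byte ordering is outside its scope, and on_fpdu is a hypothetical name.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c(data, crc=0):
    """Bitwise CRC-32C; pass a previous result as crc to continue a computation."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def on_fpdu(covered_octets, received_crc, silently_drop=False):
    """Receive-side CRC check for one FPDU (error code 2)."""
    if crc32c(covered_octets) == received_crc:
        return "deliver"
    if silently_drop:
        # Optimized MPA/TCP: discard the segment (and do not acknowledge it),
        # so the sending TCP retries it.
        return "drop_segment"
    return "report_error_2"
```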

Once MPA delivers an error of any type, it MUST NOT pass or deliver any additional FPDUs on that half connection.

For Error codes 2 and 3, MPA MUST NOT close the TCP connection following a reported error. Closing the connection is the responsibility of DDP's ULP.

Note that since MPA will not Deliver any FPDUs on a half connection following an error detected on the receive side of that connection, DDP's ULP is expected to tear down the connection. This may not occur until after one or more last messages are transmitted on the opposite half connection. This allows a diagnostic error message to be sent.
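
The reporting rules in this section can be modeled as a small state machine for the receive side of one half connection. This is a hypothetical Python sketch: the enum names, the report_to_ddp and close_tcp callbacks, and the choice to report only the first error are illustrative assumptions.

```python
from enum import IntEnum


class MpaError(IntEnum):
    TCP_LOST = 1           # closed, terminated or lost (timeout, retries, RST, FIN)
    CRC_MISMATCH = 2
    MARKER_MISMATCH = 3
    BAD_STARTUP_FRAME = 4  # invalid MPA Request Frame or MPA Reply Frame


class ReceiveHalf:
    """Receive side of one MPA half connection (hypothetical model)."""

    def __init__(self, report_to_ddp, close_tcp):
        self.report_to_ddp = report_to_ddp
        self.close_tcp = close_tcp
        self.failed = False
        self.delivered = []

    def deliver_fpdu(self, fpdu):
        if self.failed:
            return False           # no FPDUs are passed after an error
        self.delivered.append(fpdu)
        return True

    def report_error(self, code):
        if self.failed:
            return                 # this sketch reports only the first error
        self.failed = True
        self.report_to_ddp(code)
        if code == MpaError.BAD_STARTUP_FRAME:
            self.close_tcp()       # code 4: close the TCP connection immediately
        # Codes 2 and 3: MPA leaves TCP open; DDP's ULP tears the connection
        # down, possibly after sending a diagnostic on the other half connection.
```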

9. Security Considerations

This section discusses the security considerations for MPA.

9.1. Protocol-Specific Security Considerations

The vulnerabilities of MPA to third-party attacks are no greater than those of any other protocol running over TCP. A third party, by sending packets into the network that are delivered to an MPA receiver, could launch a variety of attacks that take advantage of how MPA operates. For example, a third party could send random packets that are valid for TCP but contain no FPDU headers. An MPA receiver reports an error to DDP when a packet arrives that, although properly located on an FPDU boundary, cannot be validated as an FPDU. A third party could also send packets that are valid for TCP, MPA, and DDP, but do not target valid buffers. These types of attacks ultimately result in loss of connection and thus become a type of DoS (Denial of Service) attack. Communication security mechanisms such as IPsec [RFC2401, RFC4301] may be used to prevent such attacks.

Independent of how MPA operates, a third party could use ICMP messages to reduce the path MTU to such a small size that performance would likewise be severely impacted. Range checking on path MTU sizes in ICMP packets may be used to prevent such attacks.
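
One way to apply such range checking is to clamp the path MTU advertised in ICMP "too big" messages to a floor. The sketch below assumes the protocol minimums (68 octets for IPv4 per RFC 1191, 1280 octets for IPv6) as default floors and lets local policy supply a stricter one; pmtu_after_icmp and policy_floor are hypothetical names.

```python
IPV4_MIN_PMTU = 68     # RFC 1191: never reduce the PMTU estimate below 68 octets
IPV6_MIN_PMTU = 1280   # IPv6 minimum link MTU


def pmtu_after_icmp(current_pmtu, advertised_mtu, ipv6, policy_floor=None):
    """Path MTU to use after an ICMP 'Fragmentation Needed' / 'Packet Too Big'."""
    floor = IPV6_MIN_PMTU if ipv6 else IPV4_MIN_PMTU
    if policy_floor is not None:
        floor = max(floor, policy_floor)   # stricter local floor, if configured
    if advertised_mtu >= current_pmtu:
        return current_pmtu                # such messages only ever lower the PMTU
    return max(advertised_mtu, floor)      # clamp implausibly small values
```

A stricter policy floor trades some robustness on genuinely small-MTU paths for protection against an attacker driving MPA down to tiny segments.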

[RDMAP] and [DDP] are used to control, read, and write data buffers over IP networks. Therefore, the control and the data packets of these protocols are vulnerable to the spoofing, tampering, and information disclosure attacks listed below. In addition, connection to/from an unauthorized or unauthenticated endpoint is a potential problem with most applications using RDMA, DDP, and MPA.

9.1.1. Spoofing

Spoofing attacks can be launched by the Remote Peer or by a network-based attacker. A network-based spoofing attack applies to all Remote Peers. Because the MPA Stream requires a TCP Stream in the ESTABLISHED state, certain traditional forms of wire attack do not apply: an end-to-end handshake must have occurred to establish the MPA Stream. So, the only form of spoofing that applies is one in which a remote node can both send and receive packets. Yet even with this limitation, the Stream is still exposed to the following spoofing attacks.

9.1.1.1. Impersonation

A network-based attacker can impersonate a legal MPA/DDP/RDMAP peer (by spoofing a legal IP address) and establish an MPA/DDP/RDMAP Stream with the victim. End-to-end authentication (i.e., IPsec or ULP authentication) provides protection against this attack.

9.1.1.2. Stream Hijacking

Stream hijacking happens when a network-based attacker follows the Stream establishment phase and waits until the authentication phase (if such a phase exists) has completed successfully. The attacker can then spoof the IP address and redirect the Stream from the victim to the attacker's own machine. For example, an attacker can wait until an iSCSI authentication has completed successfully, and then hijack the iSCSI Stream.

The best protection against this form of attack is end-to-end integrity protection and authentication, such as IPsec, to prevent spoofing. Another option is to provide physical security. Discussion of physical security is out of scope for this document.

9.1.1.3. Man-in-the-Middle Attack

If a network-based attacker has the ability to delete, inject, replay, or modify packets that will still be accepted by MPA (e.g., TCP sequence number is correct, FPDU is valid, etc.), then the Stream can be exposed to a man-in-the-middle attack. The attacker could potentially use the services of [DDP] and [RDMAP] to read the contents of the associated Data Buffer, to modify the contents of the associated Data Buffer, or to disable further access to the buffer. Other attacks on the connection setup sequence and even on TCP can be used to cause denial of service. The only countermeasure for this form of attack is to either secure the MPA/DDP/RDMAP Stream (i.e., integrity protect) or attempt to provide physical security to prevent man-in-the-middle type attacks.

The best protection against this form of attack is end-to-end integrity protection and authentication, such as IPsec, to prevent spoofing or tampering. If Stream or session level authentication and integrity protection are not used, then a man-in-the-middle attack can occur, enabling spoofing and tampering.

Another approach is to restrict access to the local subnet/link only and to provide some mechanism to limit access, such as physical security or IEEE 802.1X. This model is an extremely limited deployment scenario and is not examined further here.

9.1.2. Eavesdropping

Generally speaking, Stream confidentiality protects against eavesdropping. Stream and/or session authentication and integrity protection are countermeasures against various spoofing and tampering attacks. The effectiveness of authentication and integrity protection against a specific attack depends on whether the authentication is machine-level authentication (such as that provided by IPsec) or ULP authentication.

9.2. Introduction to Security Options

The following security services can be applied to an MPA/DDP/RDMAP Stream:

1. Session confidentiality - protects against eavesdropping.

2. Per-packet data source authentication - protects against the following spoofing attacks: network-based impersonation, Stream hijacking, and man-in-the-middle attacks.

3. Per-packet integrity - protects against tampering done by network-based modification of FPDUs (indirectly affecting buffer content through DDP services).

4. Packet sequencing - protects against replay attacks, which is a special case of the above tampering attack.

If an MPA/DDP/RDMAP Stream may be subject to impersonation attacks, or Stream hijacking attacks, it is recommended that the Stream be authenticated, integrity protected, and protected from replay attacks. It may use confidentiality protection to protect from eavesdropping (in case the MPA/DDP/RDMAP Stream traverses a public network).

IPsec is capable of providing the above security services for IP and TCP traffic.

ULP protocols may be able to provide part of the above security services. See [NFSv4CHAN] for additional information on a promising approach called "channel binding". From [NFSv4CHAN]:

"The concept of channel bindings allows applications to prove that the end-points of two secure channels at different network layers are the same by binding authentication at one channel to the session protection at the other channel. The use of channel bindings allows applications to delegate session protection to lower layers, which may significantly improve performance for some applications."

9.3. Using IPsec with MPA

IPsec can be used to protect against the packet injection attacks outlined above. Because IPsec is designed to secure individual IP packets, MPA can run above IPsec without change. IPsec packets are processed (e.g., integrity checked and decrypted) in the order they are received, and an MPA receiver will process the decrypted FPDUs contained in these packets in the same manner as FPDUs contained in unsecured IP packets.

MPA implementations MUST implement IPsec as described in Section 9.4 below. The use of IPsec is up to ULPs and administrators.

9.4. Requirements for IPsec Encapsulation of MPA/DDP

The IP Storage working group has spent significant time and effort to define the normative IPsec requirements for IP storage [RFC3723]. Portions of that specification are applicable to a wide variety of protocols, including the RDDP protocol suite. In order not to replicate this effort, an MPA on TCP implementation MUST follow the requirements defined in RFC 3723, Sections 2.3 and 5, including the associated normative references for those sections.

Additionally, since IPsec acceleration hardware may only be able to handle a limited number of active Internet Key Exchange Protocol (IKE) Phase 2 security associations (SAs), Phase 2 delete messages MAY be sent for idle SAs, as a means of keeping the number of active Phase 2 SAs to a minimum. The receipt of an IKE Phase 2 delete message MUST NOT be interpreted as a reason for tearing down a DDP/RDMA Stream. Rather, it is preferable to leave the Stream up, and if additional traffic is sent on it, to bring up another IKE Phase 2 SA to protect it. This avoids the potential for continually bringing Streams up and down.
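
The handling of IKE Phase 2 deletes described above can be sketched as follows. This hypothetical Python model simplifies IPsec by keying SAs per Stream (real SAs are selected by traffic selectors), and the negotiate_sa callback stands in for an IKE Phase 2 exchange. The point is only that SA deletion, whether sent or received, never closes the Stream.

```python
class Phase2SaTable:
    """Hypothetical per-Stream view of IKE Phase 2 SAs."""

    def __init__(self, negotiate_sa):
        self.negotiate_sa = negotiate_sa   # runs a Phase 2 exchange, returns an SA id
        self.sa_for_stream = {}
        self.open_streams = set()

    def open_stream(self, stream):
        self.open_streams.add(stream)
        self.sa_for_stream[stream] = self.negotiate_sa(stream)

    def expire_idle(self, stream, send_delete):
        # Local policy MAY send a Phase 2 delete for an idle SA to conserve
        # IPsec offload resources; the Stream itself stays up.
        sa = self.sa_for_stream.pop(stream, None)
        if sa is not None:
            send_delete(sa)

    def on_phase2_delete(self, stream):
        # Receiving a delete MUST NOT tear down the DDP/RDMA Stream.
        self.sa_for_stream.pop(stream, None)

    def protect(self, stream, data):
        # New traffic on a Stream without an SA brings up a new Phase 2 SA.
        if stream not in self.open_streams:
            raise KeyError(stream)
        if stream not in self.sa_for_stream:
            self.sa_for_stream[stream] = self.negotiate_sa(stream)
        return self.sa_for_stream[stream], data
```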

The IPsec requirements for RDDP are based on the version of IPsec specified in RFC 2401 [RFC2401] and related RFCs, as profiled by RFC 3723 [RFC3723], despite the existence of a newer version of IPsec specified in RFC 4301 [RFC4301] and related RFCs. One of the important early applications of the RDDP protocols is their use with iSCSI [iSER]; RDDP's IPsec requirements follow those of iSCSI in order to facilitate that usage by allowing a common profile of IPsec to be used with iSCSI and the RDDP protocols. In the future, RFC 3723 may be updated to the newer version of IPsec; the IPsec security requirements of any such update should apply uniformly to iSCSI and the RDDP protocols.

Note that there are serious security issues if IPsec is not implemented end-to-end. For example, if IPsec is implemented as a tunnel in the middle of the network, any hosts between the peer and the IPsec tunneling device can freely attack the unprotected Stream.

10. IANA Considerations

No IANA actions are required by this document.

If a well-known port is chosen as the mechanism to identify a DDP on MPA on TCP, the well-known port must be registered with IANA. Because the use of the port is DDP specific, registration of the port with IANA is left to DDP.

Appendix A. Optimized MPA-Aware TCP Implementations

This appendix is for information only and is NOT part of the standard.

This appendix gives implementers guidance on optimized MPA-aware TCP implementations. It is intended for implementations that want to send and receive as much traffic as possible in an aligned, zero-copy fashion.

                   +-----------------------------------+
                   | +-----------+ +-----------------+ |
                   | | Optimized | | Other Protocols | |
                   | |  MPA/TCP  | +-----------------+ |
                   | +-----------+        ||           |
                   |         \\     --- socket API --- |
                   |          \\          ||           |
                   |           \\      +-----+         |
                   |            \\     | TCP |         |
                   |             \\    +-----+         |
                   |              \\    //             |
                   |             +-------+             |
                   |             |  IP   |             |
                   |             +-------+             |
                   +-----------------------------------+
        
Figure 11: Optimized MPA/TCP Implementation

Figure 11 shows a block diagram of a potential implementation. The network sub-system can support traditional sockets-based connections using the normal API, as shown on the right side of the diagram. Connections for DDP/MPA/TCP run using the facilities shown on the left side of the diagram.

The DDP/MPA/TCP connections can be started using the facilities shown on the left side using some suitable API, or they can be initiated using the facilities shown on the right side and transitioned to the left side at the point in the connection setup where MPA goes to "Full MPA/DDP Operation Phase" as described in Section 7.1.2.

The optimized MPA/TCP implementations (left side of diagram and described below) are only applicable to MPA. All other TCP applications continue to use the standard TCP stacks and interfaces shown in the right side of the diagram.

A.1. Optimized MPA/TCP Transmitters

The various TCP RFCs allow considerable choice in segmenting a TCP stream. In order to optimize FPDU recovery at the MPA receiver, an optimized MPA/TCP implementation uses additional segmentation rules.

To provide optimum performance, an optimized MPA/TCP transmit side implementation should be enabled to:

* With an EMSS large enough to contain the FPDU(s), segment the outgoing TCP stream such that the first octet of every TCP segment begins with an FPDU. Multiple FPDUs may be packed into a single TCP segment as long as they are entirely contained in the TCP segment.

* Report the current EMSS from the TCP to the MPA transmit layer.

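As an illustration of the first rule, the following sketch groups FPDU sizes greedily into segments of at most EMSS octets, so that each segment starts on an FPDU and holds only whole FPDUs. It assumes every FPDU already fits within the EMSS. The function name pack_fpdus is hypothetical, and a real sender would also honor the receive window and any Nagle setting.

```python
from typing import List


def pack_fpdus(fpdu_sizes: List[int], emss: int) -> List[List[int]]:
    """Group FPDUs (given by size in octets) into TCP segments so that every
    segment begins with an FPDU and contains only whole FPDUs."""
    segments: List[List[int]] = []
    current: List[int] = []
    used = 0
    for size in fpdu_sizes:
        if size > emss:
            # An FPDU larger than the EMSS cannot stay aligned; the sender is
            # expected to size FPDUs so this does not normally happen.
            raise ValueError(f"FPDU of {size} octets exceeds EMSS of {emss}")
        if used + size > emss:
            # Close the current segment on an FPDU boundary.
            segments.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        segments.append(current)
    return segments
```

For example, with an EMSS of 1460, FPDUs of 1460, 300, 400, 800, and 100 octets yield the segments [1460], [300, 400], and [800, 100].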

There are exceptions to the above rule. Once a ULPDU is provided to MPA, the MPA/TCP sender either transmits it or fails the connection; it cannot be repudiated. As a result, during changes in MTU and EMSS, or when TCP's Receive Window size (RWIN) becomes too small, it may be necessary to send FPDUs that do not conform to the segmentation rule above.

A possible, but less desirable, alternative is to use IP fragmentation on accepted FPDUs to deal with MTU reductions or extremely small EMSS.

Even when alignment with TCP segments is lost, the sender still formats the FPDU according to FPDU format as shown in Figure 2.

On a retransmission, TCP does not necessarily preserve original TCP segmentation boundaries. This can lead to the loss of FPDU Alignment and containment within a TCP segment during TCP retransmissions. An optimized MPA/TCP sender should try to preserve original TCP segmentation boundaries on a retransmission.
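One way to preserve the original boundaries is to remember each transmitted segment and resend it whole. The sketch below keeps the original (sequence number, payload) pairs. For a lost sequence number, it returns the complete original segment that contained it instead of re-segmenting the byte stream. RetransmitQueue is a hypothetical name, and the sketch ignores 32-bit sequence-number wraparound.

```python
import bisect
from typing import Dict, List, Tuple


class RetransmitQueue:
    """Remember every transmitted segment exactly as sent, so that a
    retransmission reuses the original boundaries and FPDU Alignment."""

    def __init__(self) -> None:
        self._starts: List[int] = []            # sorted segment start sequence numbers
        self._segments: Dict[int, bytes] = {}   # start sequence number -> payload

    def record(self, seq: int, payload: bytes) -> None:
        bisect.insort(self._starts, seq)
        self._segments[seq] = payload

    def acknowledge(self, ack: int) -> None:
        # Forget segments fully covered by the cumulative ACK.
        while self._starts:
            seq = self._starts[0]
            if seq + len(self._segments[seq]) > ack:
                break
            self._starts.pop(0)
            del self._segments[seq]

    def segment_for_retransmit(self, lost_seq: int) -> Tuple[int, bytes]:
        # Return the whole original segment that contained lost_seq.
        i = bisect.bisect_right(self._starts, lost_seq) - 1
        if i >= 0:
            seq = self._starts[i]
            if lost_seq < seq + len(self._segments[seq]):
                return seq, self._segments[seq]
        raise KeyError(lost_seq)
```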

A.2. Effects of Optimized MPA/TCP Segmentation

Optimized MPA/TCP senders will fill TCP segments to the EMSS with a single FPDU when a DDP message is large enough. Since the DDP message may not exactly fit into TCP segments, a "message tail" often occurs that results in an FPDU that is smaller than a single TCP segment. Additionally, some DDP messages may be considerably shorter than the EMSS. If a small FPDU is sent in a single TCP segment, the result is a "short" TCP segment.

Applications expected to see strong advantages from Direct Data Placement include transaction-based applications and throughput applications. Request/response protocols typically send one FPDU per TCP segment and then wait for a response. Under these conditions, these "short" TCP segments are an appropriate and expected effect of the segmentation.

Another possibility is that the application might be sending multiple messages (FPDUs) to the same endpoint before waiting for a response. In this case, the segmentation policy would tend to reduce the available connection bandwidth by under-filling the TCP segments.

Standard TCP implementations often utilize the Nagle [RFC896] algorithm to ensure that segments are filled to the EMSS whenever the round-trip latency is large enough that the source stream can fully fill segments before ACKs arrive. The algorithm does this by delaying the transmission of TCP segments until a ULP can fill a segment, or until an ACK arrives from the far side. The algorithm thus allows for smaller segments when latencies are shorter to keep the ULP's end-to-end latency to reasonable levels.
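Nagle's rule can be sketched as a send decision, in the form it is commonly stated from RFC 896 and RFC 1122. This is a simplified model, not any particular stack's code, and it ignores the receive window. A segment goes out immediately if a full EMSS of data is queued, if nothing is outstanding, or if Nagle is disabled; otherwise, small data waits for an ACK.

```python
def nagle_may_send(queued: int, emss: int, unacked: int, nodelay: bool) -> bool:
    """Simplified Nagle decision for the data queued at the sender.

    queued  -- octets ready to send
    emss    -- effective maximum segment size
    unacked -- octets sent but not yet acknowledged
    nodelay -- True when Nagle is disabled (cf. TCP_NODELAY)
    """
    if queued == 0:
        return False
    if nodelay or queued >= emss:
        return True                 # full segment, or Nagle turned off
    return unacked == 0             # small segment: wait for outstanding ACKs
```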

Use of the Nagle algorithm is not mandatory [RFC1122].

When used with optimized MPA/TCP stacks, Nagle and similar algorithms can result in the "packing" of multiple FPDUs into TCP segments.

If a "message tail", small DDP messages, or the start of a larger DDP message are available, MPA may pack multiple FPDUs into TCP segments. When this is done, the TCP segments can be more fully utilized, but, due to the size constraints of FPDUs, segments may not be filled to the EMSS. A dynamic MULPDU that informs DDP of the size of the remaining TCP segment space makes filling the TCP segment more effective.
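Once the FPDU overhead is known, the dynamic MULPDU can be computed from the remaining segment space. The sketch below assumes Markers are disabled. It uses an FPDU layout of a 2-octet ULPDU_Length, the ULPDU, pad to a 4-octet boundary, and a 4-octet CRC. With Markers enabled, the overhead would also depend on where the FPDU falls in the TCP stream. The function names are hypothetical.

```python
ULPDU_LENGTH_FIELD = 2   # octets
CRC_FIELD = 4            # octets


def fpdu_size(ulpdu_len: int) -> int:
    """Octets on the wire for one FPDU with Markers disabled."""
    unpadded = ULPDU_LENGTH_FIELD + ulpdu_len
    pad = (-unpadded) % 4            # pad to a 4-octet boundary
    return unpadded + pad + CRC_FIELD


def dynamic_mulpdu(remaining: int) -> int:
    """Largest ULPDU whose FPDU fits in `remaining` octets of the current
    TCP segment; 0 means no FPDU with payload fits."""
    aligned = (remaining - CRC_FIELD) // 4 * 4   # room for length + ULPDU + pad
    return max(aligned - ULPDU_LENGTH_FIELD, 0)
```

With 1460 octets of segment space remaining, dynamic_mulpdu returns 1454, and fpdu_size(1454) is exactly 1460.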

Note that MPA receivers do more processing of a TCP segment that contains multiple FPDUs; this may affect the performance of some receiver implementations.

It is up to the ULP to decide whether Nagle is useful with DDP/MPA. Note that many of the applications expected to take advantage of MPA/DDP prefer to avoid the extra delays caused by Nagle. In such scenarios, there is expected to be minimal opportunity for packing at the transmitter, and receivers may choose to optimize their performance for this anticipated behavior.

Therefore, the application is expected to set TCP parameters such that it can trade off latency and wire efficiency. Implementations should provide a connection option that disables Nagle for MPA/TCP similar to the way the TCP_NODELAY socket option is provided for a traditional sockets interface.
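For comparison, the following shows the traditional sockets option referred to above, using Python's standard socket module. It only illustrates TCP_NODELAY on an ordinary TCP socket. The equivalent MPA/TCP connection option is implementation specific and is not shown.

```python
import socket


def open_low_latency_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled (TCP_NODELAY)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```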

When latency is not critical, the application is expected to leave Nagle enabled. In this case, the TCP implementation may pack any available FPDUs into TCP segments so that the segments are filled to the EMSS. If the amount of data available is not enough to fill the TCP segment when it is prepared for transmission, TCP can send the segment partly filled or use the Nagle algorithm to wait for the ULP to post more data.

A.3. Optimized MPA/TCP Receivers

When an MPA receive implementation and the MPA-aware receive side TCP implementation support handling out-of-order ULPDUs, the TCP receive implementation performs the following functions:

1) The implementation passes incoming TCP segments to MPA as soon as they have been received and validated, even if not received in order. The TCP layer commits to keeping each segment before it can be passed to the MPA. This means that the segment must have passed the TCP, IP, and lower layer data integrity validation (i.e., checksum), must be in the receive window, must be part of the same epoch (if timestamps are used to verify this), and must have passed any other checks required by TCP RFCs.

This is not to imply that the data must be completely ordered before use. An implementation can accept out-of-order segments, SACK them [RFC2018], and pass them to MPA immediately, before the reception of the segments needed to fill in the gaps. MPA expects to utilize these segments when they are complete FPDUs or can be combined into complete FPDUs to allow the passing of ULPDUs to DDP when they arrive, independent of ordering. DDP uses the passed ULPDU to "place" the DDP segments (see [DDP] for more details).

Since MPA performs a CRC calculation and other checks on received FPDUs, the MPA/TCP implementation ensures that any TCP segments that duplicate data already received and processed (as can happen during TCP retries) do not overwrite already received and processed FPDUs. This avoids the possibility that duplicate data may corrupt already validated FPDUs.

2) The implementation provides a mechanism to indicate the ordering of TCP segments as the sender transmitted them. One possible mechanism might be attaching the TCP sequence number to each segment.

3) The implementation also provides a mechanism to indicate when a given TCP segment (and the prior TCP stream) is complete. One possible mechanism might be to utilize the leading (left) edge of the TCP Receive Window.

MPA uses the ordering and completion indications to inform DDP when a ULPDU is complete; MPA Delivers the FPDU to DDP. DDP uses the indications to "deliver" its messages to the DDP consumer (see [DDP] for more details).
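The receive-side bookkeeping described in items 1 through 3 can be sketched as follows. The sketch assumes that the TCP layer reports each validated FPDU with its starting sequence number and reports the left edge of the receive window as it advances. PlacementTracker and its methods are hypothetical names. FPDUs are Placed as soon as they arrive. Duplicates are ignored so they cannot overwrite validated data. Delivery happens in sequence order, and only once the left edge has passed an FPDU.

```python
from typing import Dict, List, Tuple


class PlacementTracker:
    """Place validated FPDUs immediately; Deliver them in order once the
    left edge of the TCP receive window has moved past them."""

    def __init__(self, rcv_nxt: int) -> None:
        self.rcv_nxt = rcv_nxt                  # left edge of the receive window
        self._placed: Dict[int, bytes] = {}     # start seq -> validated ULPDU
        self._lengths: Dict[int, int] = {}      # start seq -> FPDU length on wire
        self.delivered: List[Tuple[int, bytes]] = []

    def place(self, seq: int, fpdu_len: int, ulpdu: bytes) -> bool:
        # Ignore duplicate data (e.g., from TCP retransmissions) so it never
        # overwrites an FPDU that has already been validated and Placed.
        if seq < self.rcv_nxt or seq in self._placed:
            return False
        self._placed[seq] = ulpdu
        self._lengths[seq] = fpdu_len
        return True

    def advance_left_edge(self, new_rcv_nxt: int) -> None:
        # Deliver, in sequence order, every FPDU now entirely left of the edge.
        self.rcv_nxt = max(self.rcv_nxt, new_rcv_nxt)
        for seq in sorted(self._placed):
            if seq + self._lengths[seq] > self.rcv_nxt:
                break
            self.delivered.append((seq, self._placed.pop(seq)))
            del self._lengths[seq]
```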

DDP on MPA utilizes the above two mechanisms to establish the Delivery semantics that DDP's consumers agree to. These semantics are described fully in [DDP]. These include requirements on DDP's consumer to respect ownership of buffers prior to the time that DDP delivers them to the Consumer.

The use of SACK [RFC2018] significantly improves network utilization and performance and is therefore recommended. When combined with the out-of-order passing of segments to MPA and DDP, significant buffering and copying of received data can be avoided.

A.4. Re-Segmenting Middleboxes and Non-Optimized MPA/TCP Senders

Since MPA senders often start FPDUs on TCP segment boundaries, a receiving optimized MPA/TCP implementation may be able to optimize the reception of data in various ways.

However, MPA receivers MUST NOT depend on FPDU Alignment on TCP segment boundaries.

Some MPA senders may be unable to conform to the sender requirements because their TCP implementation was not designed with MPA in mind. Even with optimized MPA/TCP senders, the network may contain "middleboxes" that modify the TCP stream by changing its segmentation. Such re-segmentation is generally interoperable with TCP and its users, and MPA must be no exception.

The presence of Markers in MPA (when enabled) allows an optimized MPA/TCP receiver to recover the FPDUs despite these obstacles, although it may be necessary to utilize additional buffering at the receiver to do so.

Some of the cases that a receiver may have to contend with are listed below as a reminder to the implementer:

* A single aligned and complete FPDU, either in order or out of order: This can be passed to DDP as soon as validated, and Delivered when ordering is established.

* Multiple FPDUs in a TCP segment, aligned and fully contained, either in order or out of order: These can be passed to DDP as soon as validated, and Delivered when ordering is established.

* Incomplete FPDU: The receiver should buffer until the remainder of the FPDU arrives. If the remainder of the FPDU is already available, this can be passed to DDP as soon as validated, and Delivered when ordering is established.

* Unaligned FPDU start: The partial FPDU must be combined with its preceding portion(s). If the preceding parts are already available, and the whole FPDU is present, this can be passed to DDP as soon as validated, and Delivered when ordering is established. If the whole FPDU is not available, the receiver should buffer until the remainder of the FPDU arrives.

* Combinations of unaligned or incomplete FPDUs (and potentially other complete FPDUs) in the same TCP segment: If any FPDU is present in its entirety, or can be completed with portions already available, it can be passed to DDP as soon as validated, and Delivered when ordering is established.

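For in-order data, the cases above reduce to scanning reassembled stream bytes for complete FPDUs and keeping any incomplete tail buffered. The sketch below assumes that Markers are disabled and that the buffer begins at an FPDU boundary. It does not verify the CRC. extract_fpdus is a hypothetical helper, not part of MPA's specification.

```python
import struct
from typing import List, Tuple


def extract_fpdus(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Split in-order, reassembled stream bytes into complete ULPDUs.

    Returns the ULPDUs found and any trailing bytes of an incomplete FPDU,
    which the receiver keeps buffered until the rest arrives."""
    ulpdus: List[bytes] = []
    offset = 0
    while len(stream) - offset >= 2:
        (ulpdu_len,) = struct.unpack_from("!H", stream, offset)  # ULPDU_Length
        unpadded = 2 + ulpdu_len
        total = unpadded + (-unpadded) % 4 + 4    # + pad + CRC
        if len(stream) - offset < total:
            break                                  # incomplete FPDU: buffer it
        ulpdus.append(stream[offset + 2: offset + 2 + ulpdu_len])
        offset += total
    return ulpdus, stream[offset:]
```

An unaligned FPDU start is handled by appending the newly arrived bytes to the buffered tail and scanning again.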

A.5. Receiver Implementation

Transport & Network Layer Reassembly Buffers:

The use of reassembly buffers (either TCP reassembly buffers or IP fragmentation reassembly buffers) is implementation dependent. When MPA is enabled, reassembly buffers are needed if out-of-order packets arrive and Markers are not enabled. Buffers are also needed if FPDU alignment is lost or if IP fragmentation occurs. This is because the incoming out-of-order segment may not contain enough information for MPA to process all of the FPDU. For cases where a re-segmenting middlebox is present, or where the TCP sender is not optimized, the presence of Markers significantly reduces the amount of buffering needed.

Recovery from IP fragmentation is transparent to the MPA Consumers.

A.5.1 Network Layer Reassembly Buffers

The MPA/TCP implementation should set the IP Don't Fragment (DF) bit at the IP layer. Then, upon a path MTU change, intermediate devices drop any IP datagram that is too large and reply with an ICMP message that tells the source TCP that the path MTU has changed. This causes TCP to emit segments that conform to the new path MTU size. Under most conditions, therefore, IP fragments should not arrive at the receiver, although they remain possible.

There are several options for implementation of network layer reassembly buffers:

1. drop any IP fragments, and reply with an ICMP message according to [RFC792] (fragmentation needed and DF set) to tell the Remote Peer to resize its TCP segment.

2. support an IP reassembly buffer, but have it of limited size (possibly the same size as the local link's MTU). The end node would normally never Advertise a path MTU larger than the local link MTU. It is recommended that a dropped IP fragment cause an ICMP message to be generated according to RFC 792.

3. multiple IP reassembly buffers, of effectively unlimited size.

4. support an IP reassembly buffer for the largest IP datagram (64 KB).

5. support for a large IP reassembly buffer that could span multiple IP datagrams.

An implementation should support at least 2 or 3 above, to avoid dropping packets that have traversed the entire fabric.

There is no end-to-end ACK for IP reassembly buffers, so there is no flow control on the buffer. The only end-to-end ACK is a TCP ACK, which can only occur when a complete IP datagram is delivered to TCP. Because of this, under worst case, pathological scenarios, the largest IP reassembly buffer is the TCP receive window (to buffer multiple IP datagrams that have all been fragmented).

Note that if the Remote Peer does not implement re-segmentation of the data stream upon receiving the ICMP reply updating the path MTU, it is possible to halt forward progress because the opposite peer would continue to retransmit using a transport segment size that is too large. This deadlock scenario is no different than if the fabric MTU (not last-hop MTU) was reduced after connection setup, and the remote node's behavior is not compliant with [RFC1122].

A.5.2 TCP Reassembly Buffers

A TCP reassembly buffer is also needed. TCP reassembly buffers are needed if FPDU Alignment is lost when using TCP with MPA or when the MPA FPDU spans multiple TCP segments. Buffers are also needed if Markers are disabled and out-of-order packets arrive.

Since lost FPDU Alignment often means that FPDUs are incomplete, an MPA on TCP implementation must have a reassembly buffer large enough to recover an FPDU that is less than or equal to the MTU of the locally attached link (this should be the largest possible Advertised TCP path MTU). If the MTU is smaller than 140 octets, a buffer at least 140 octets long is needed to support the minimum FPDU size. The 140 octets allow for the minimum MULPDU of 128 octets, 2 octets of pad, 2 octets of ULPDU_Length, 4 octets of CRC, and 4 octets for a possible Marker. As usual, additional buffering is likely to provide better performance.
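The 140-octet figure can be checked by adding up its components. In this sketch, the constant names and the function name min_tcp_reassembly_buffer are illustrative only.

```python
MIN_MULPDU = 128    # minimum MULPDU, octets
PAD = 2             # pad after a 2-octet length plus a 128-octet ULPDU
ULPDU_LENGTH = 2    # ULPDU_Length field
CRC = 4             # CRC field
MARKER = 4          # room for one possible Marker

MIN_FPDU_BUFFER = MIN_MULPDU + PAD + ULPDU_LENGTH + CRC + MARKER


def min_tcp_reassembly_buffer(local_link_mtu: int) -> int:
    """Smallest TCP reassembly buffer described above: room for an FPDU up
    to the local link MTU, and never less than MIN_FPDU_BUFFER octets."""
    return max(local_link_mtu, MIN_FPDU_BUFFER)
```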

Note that if the TCP segments were not stored, it would be possible to deadlock the MPA algorithm. If the path MTU is reduced, FPDU Alignment requires the source TCP to re-segment the data stream to the new path MTU. The source MPA will detect this condition and reduce the MPA segment size, but any FPDUs already posted to the source TCP will be re-segmented and lose FPDU Alignment. If the destination does not support a TCP reassembly buffer, these segments can never be successfully transmitted and the protocol deadlocks.

When a complete FPDU is received, processing continues normally.

Appendix B. Analysis of MPA over TCP Operations

This appendix is for information only and is NOT part of the standard.

This appendix is an analysis of MPA on TCP and why it is useful to integrate MPA with TCP (with modifications to typical TCP implementations) to reduce overall system buffering and overhead.

One of MPA's high-level goals is to provide enough information, when combined with the Direct Data Placement Protocol [DDP], to enable out-of-order placement of DDP payload into the final Upper Layer Protocol (ULP) Buffer. Note that DDP separates the act of placing data into a ULP Buffer from that of notifying the ULP that the ULP Buffer is available for use. In DDP terminology, the former is defined as "Placement", and the latter is defined as "Delivery". MPA supports in-order Delivery of the data to the ULP, including support for Direct Data Placement in the final ULP Buffer location when TCP segments arrive out of order. Effectively, the goal is to use the pre-posted ULP Buffers as the TCP receive buffer, where the reassembly of the ULP Protocol Data Unit (PDU) by TCP (with MPA and DDP) is done in place, in the ULP Buffer, with no data copies.

This appendix walks through the advantages and disadvantages of the TCP sender modifications proposed by MPA:

1) that MPA prefers the TCP sender to do Header Alignment, where a TCP segment should begin with an MPA Framing Protocol Data Unit (FPDU) (if there is payload present).

2) that there be an integral number of FPDUs in a TCP segment (under conditions where the path MTU is not changing).

This appendix concludes that the scaling advantages of FPDU Alignment are strong, based primarily on fairly drastic TCP receive buffer reduction requirements and simplified receive handling. The analysis also shows that there is little effect to TCP wire behavior.

B.1. Assumptions
B.1.1 MPA Is Layered beneath DDP

MPA is an adaptation layer between DDP and TCP. DDP requires preservation of DDP segment boundaries and a CRC32c digest covering the DDP header and data. MPA adds these features to the TCP stream so that DDP over TCP has the same basic properties as DDP over SCTP.

B.1.2. MPA Preserves DDP Message Framing

MPA was designed as a framing layer specifically for DDP and was not intended as a general-purpose framing layer for any other ULP using TCP.

A framing layer allows ULPs using it to receive indications from the transport layer only when complete ULPDUs are present. As a framing layer, MPA is not aware of the content of the DDP PDU, only that it has received and, if necessary, reassembled a complete PDU for Delivery to the DDP.

B.1.3. The Size of the ULPDU Passed to MPA Is Less Than EMSS under Normal Conditions

To make reception of a complete DDP PDU on every received segment possible, DDP passes to MPA a PDU that is no larger than the EMSS of the underlying fabric. Each FPDU that MPA creates contains sufficient information for the receiver to directly place the ULP payload in the correct location in the correct receive buffer.

Edge cases when this condition does not occur are dealt with, but do not need to be on the fast path.

B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery

DDP receives complete DDP PDUs from MPA. Each DDP PDU contains the information necessary to place its ULP payload directly in the correct location in host memory.

Because each DDP segment is self-describing, it is possible for DDP segments received out of order to have their ULP payload placed immediately in the ULP receive buffer.

Data delivery to the ULP is guaranteed to be in the order the data was sent. DDP only indicates data delivery to the ULP after TCP has acknowledged the complete byte stream.

B.2. The Value of FPDU Alignment

Significant receiver optimizations can be achieved when Header Alignment and complete FPDUs are the common case. The optimizations allow utilizing significantly fewer buffers on the receiver and less computation per FPDU. The net effect is the ability to build a "flow-through" receiver that enables TCP-based solutions to scale to 10G and beyond in an economical way. The optimizations are especially relevant to hardware implementations of receivers that process multiple protocol layers -- Data Link Layer (e.g., Ethernet), Network and Transport Layer (e.g., TCP/IP), and even some ULP on top of TCP (e.g., MPA/DDP). As network speed increases, there is an increasing desire to use a hardware-based receiver in order to achieve an efficient high performance solution.

A TCP receiver, under worst-case conditions, has to allocate buffers (BufferSizeTCP) whose capacities are a function of the bandwidth-delay product. Thus:

BufferSizeTCP = K * bandwidth [octets/second] * Delay [seconds].

Where bandwidth is the end-to-end bandwidth of the connection, delay is the round-trip delay of the connection, and K is an implementation-dependent constant.
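As a worked example of the formula, the sketch below evaluates BufferSizeTCP for hypothetical numbers: K = 1, a round-trip delay of 1 ms, and bandwidths of 1 Gb/s (125,000,000 octets/second) and 10 Gb/s. The tenfold increase in bandwidth produces a tenfold increase in buffering, from 125,000 to 1,250,000 octets.

```python
def buffer_size_tcp(bandwidth: float, delay: float, k: float = 1.0) -> float:
    """BufferSizeTCP = K * bandwidth [octets/second] * Delay [seconds]."""
    return k * bandwidth * delay


ONE_GBIT = 1e9 / 8          # 1 Gb/s expressed in octets/second
RTT = 1e-3                  # hypothetical 1 ms round-trip delay

for gbits in (1, 10):
    print(f"{gbits:>2} Gb/s: {buffer_size_tcp(gbits * ONE_GBIT, RTT):,.0f} octets")
```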

Thus, BufferSizeTCP scales with the end-to-end bandwidth (10x more buffers for a 10x increase in end-to-end bandwidth). As this buffering approach may scale poorly for hardware or software implementations alike, several approaches allow reduction in the amount of buffering required for high-speed TCP communication.

The MPA/DDP approach is to enable the ULP's Buffer to be used as the TCP receive buffer. If the application pre-posts a sufficient amount of buffering, and each TCP segment has sufficient information to place the payload into the right application buffer, when an out-of-order TCP segment arrives it could potentially be placed directly in the ULP Buffer. However, placement can only be done when a complete FPDU with the placement information is available to the receiver, and the FPDU contents contain enough information to place the data into the correct ULP Buffer (e.g., there is a DDP header available).

For the case when the FPDU is not aligned with the TCP segment, it may take, on average, 2 TCP segments to assemble one FPDU. Therefore, the receiver has to allocate BufferSizeNAF (Buffer Size, Non-Aligned FPDU) octets:

BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS

Where K1 and K2 are implementation-dependent constants and EMSS is the effective maximum segment size.

For example, a 1 GB/sec link with 10,000 connections and an EMSS of 1500 B would require 15 MB of memory (taking K1 = 1 and neglecting the K2 term). Often the number of connections used scales with the network speed, aggravating the situation at higher speeds.
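The example figure can be checked with a short Python sketch of the BufferSizeNAF formula. The defaults K1 = 1 and K2 = 0 are assumptions that reproduce the 15 MB value; real implementations choose their own constants.

```python
def buffer_size_naf(emss: int, connections: int, k1: float = 1.0, k2: float = 0.0) -> float:
    """BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS, in octets."""
    return k1 * emss * connections + k2 * emss


# 10,000 connections with an EMSS of 1500 octets, as in the example above.
print(buffer_size_naf(1500, 10_000) / 1e6, "MB")  # prints: 15.0 MB
```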

FPDU Alignment would allow the receiver to allocate BufferSizeAF (Buffer Size, Aligned FPDU) octets:

BufferSizeAF = K2 * EMSS

for the same conditions. An FPDU Aligned receiver may require memory in the range of ~100s of KB -- which is feasible for an on-chip memory and enables a "flow-through" design, in which the data flows through the network interface card (NIC) and is placed directly in the destination buffer. Assuming most of the connections support FPDU Alignment, the receiver buffers no longer scale with the number of connections.
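The difference in scaling can be illustrated with hypothetical constants K1 = 1 and K2 = 100, the latter chosen so that BufferSizeAF lands in the ~100s of KB range mentioned above. This Python sketch only evaluates the two formulas; it says nothing about how a particular NIC actually sizes its memory.

```python
EMSS = 1500
K1, K2 = 1.0, 100.0  # hypothetical implementation-dependent constants


def buffer_size_naf(connections: int) -> float:
    """Non-aligned receiver: grows linearly with the number of connections."""
    return K1 * EMSS * connections + K2 * EMSS


def buffer_size_af() -> float:
    """Aligned receiver: independent of the number of connections."""
    return K2 * EMSS


for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} connections: NAF {buffer_size_naf(n) / 1e6:7.2f} MB, "
          f"AF {buffer_size_af() / 1e3:5.0f} KB")
```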

Additional optimizations can be achieved in a balanced I/O sub-system -- where the system interface of the network controller provides ample bandwidth as compared with the network bandwidth. For almost twenty years this has been the case, and the trend is expected to continue. While Ethernet speeds have scaled by a factor of 1000 (from 10 megabit/sec to 10 gigabit/sec), I/O bus bandwidth of volume CPU architectures has scaled from ~2 MB/sec to ~2 GB/sec (PC-XT bus to PCI-X DDR). Under these conditions, the FPDU Alignment approach allows BufferSizeAF to be independent of network speed; it is primarily a function of the local processing time for a given frame.

Thus, when the FPDU Alignment approach is used, receive buffering is expected to scale gracefully (i.e., less than linear scaling) as network speed is increased.

B.2.1. Impact of Lack of FPDU Alignment on the Receiver Computational Load and Complexity

The receiver must perform IP and TCP processing, and then perform FPDU CRC checks, before it can trust the FPDU header placement information. For simplicity of the description, the assumption is that an FPDU is carried in no more than 2 TCP segments. In reality, with no FPDU Alignment, an FPDU can be carried by more than 2 TCP segments (e.g., if the path MTU was reduced).
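The following Python sketch counts how many TCP segments a single non-aligned FPDU touches. For simplicity it assumes that every segment carries exactly `mss` payload octets; `segments_spanned` is a hypothetical helper, not part of MPA.

```python
def segments_spanned(fpdu_offset: int, fpdu_len: int, mss: int) -> int:
    """Number of TCP segments touched by an FPDU of fpdu_len octets that starts
    fpdu_offset octets into a segment, assuming full-size segments of mss octets."""
    if not 0 <= fpdu_offset < mss:
        raise ValueError("fpdu_offset must lie inside the first segment")
    if fpdu_len <= 0:
        raise ValueError("fpdu_len must be positive")
    last_octet = fpdu_offset + fpdu_len - 1  # stream position of the FPDU's last octet
    return last_octet // mss + 1


print(segments_spanned(0, 1460, 1460))    # aligned: 1 segment
print(segments_spanned(700, 1460, 1460))  # non-aligned: 2 segments
print(segments_spanned(0, 1460, 536))     # after a path MTU reduction: 3 segments
```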

   ----++-----------------------------++-----------------------++-----
   +---||---------------+    +--------||--------+   +----------||----+
   |   TCP Seg X-1      |    |     TCP Seg X    |   |  TCP Seg X+1   |
   +---||---------------+    +--------||--------+   +----------||----+
   ----++-----------------------------++-----------------------++-----
                   FPDU #N-1                  FPDU #N
        
Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream

The receiver algorithm for processing TCP segments (e.g., TCP segment #X in Figure 12) carrying non-aligned FPDUs (in order or out of order) includes:

1. Data Link Layer processing (whole frame) -- typically including a CRC calculation.

2. Network Layer processing (assuming not an IP fragment, the whole Data Link Layer frame contains one IP datagram. IP fragments should be reassembled in a local buffer. This is not a performance optimization goal.)

3. Transport Layer processing -- TCP protocol processing, header and checksum checks.

a. Classify incoming TCP segment using the 5-tuple (IP SRC, IP DST, TCP SRC Port, TCP DST Port, protocol).

4. Find FPDU message boundaries.

a. Get MPA state information for the connection.

b. If the TCP segment is in order, use the receiver-managed MPA state information to calculate where the previous FPDU message (#N-1) ends in the current TCP segment #X. (Previously, when the MPA receiver processed the first part of FPDU #N-1, it used the MPA Length field to calculate the number of octets remaining to complete FPDU #N-1.)

Get the stored partial CRC for FPDU #N-1.

Complete CRC calculation for FPDU #N-1 data (first portion of TCP segment #X).

Check CRC calculation for FPDU #N-1.

If no FPDU CRC errors, placement is allowed.

Locate the local buffer for the first portion of FPDU #N-1; CopyData(local buffer of first portion of FPDU #N-1, host buffer address, length).

Compute host buffer address for second portion of FPDU #N-1.

CopyData (local buffer of second portion of FPDU #N-1, host buffer address for second portion, length).

Calculate the octet offset into the TCP segment for the next FPDU #N.

Start the CRC calculation over the data available for FPDU #N.

Store partial CRC results for FPDU #N.

Store local buffer address of first portion of FPDU #N.

No further action is possible on FPDU #N before it is completely received.

c. If the TCP segment is out of order, the receiver must buffer the data until at least one complete FPDU is received. Typically, buffering for more than one TCP segment per connection is required. Use the MPA Markers to calculate where the FPDU boundaries are.

When a complete FPDU is available, a similar procedure to the in-order algorithm above is used. There is additional complexity, though, because when the missing segment arrives, this TCP segment must be run through the CRC engine after the CRC is calculated for the missing segment.
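The in-order path above can be sketched in Python as a small state machine that carries a partial length field, a partial CRC, and locally buffered pieces from one segment to the next. The wire format is deliberately simplified and hypothetical (a 2-octet length, the payload, and a 4-octet CRC, with no padding or Markers), and `zlib.crc32` stands in for the CRC32c that MPA actually uses; only the incremental use of the running CRC matters here.

```python
import struct
import zlib


def make_fpdu(data: bytes) -> bytes:
    """Frame data in the simplified FPDU format used by this sketch."""
    header = struct.pack("!H", len(data))
    return header + data + struct.pack("!I", zlib.crc32(header + data))


class NonAlignedReceiver:
    """In-order receiver for FPDUs that may straddle TCP segment boundaries."""

    def __init__(self) -> None:
        self.placed = []  # payloads copied to the "host buffer" after a CRC check
        self._reset()

    def _reset(self) -> None:
        self.header = b""      # partially received length field
        self.remaining = None  # payload octets still expected (None = header incomplete)
        self.partial_crc = 0   # stored partial CRC for the current FPDU
        self.local = []        # local buffer holding pieces of the current FPDU
        self.crc_field = b""   # partially received CRC field

    def on_segment(self, payload: bytes) -> None:
        i = 0
        while i < len(payload):
            if self.remaining is None:
                take = payload[i:i + 2 - len(self.header)]
                self.header += take
                i += len(take)
                if len(self.header) == 2:
                    (self.remaining,) = struct.unpack("!H", self.header)
                    self.partial_crc = zlib.crc32(self.header)
            elif self.remaining > 0:
                chunk = payload[i:i + self.remaining]
                self.partial_crc = zlib.crc32(chunk, self.partial_crc)  # resume the CRC
                self.local.append(chunk)
                self.remaining -= len(chunk)
                i += len(chunk)
            else:
                take = payload[i:i + 4 - len(self.crc_field)]
                self.crc_field += take
                i += len(take)
                if len(self.crc_field) == 4:
                    (received,) = struct.unpack("!I", self.crc_field)
                    if received != self.partial_crc:
                        raise ValueError("FPDU CRC error")
                    self.placed.append(b"".join(self.local))  # placement allowed
                    self._reset()


stream = make_fpdu(b"A" * 1000) + make_fpdu(b"B" * 700)
rx = NonAlignedReceiver()
for start in range(0, len(stream), 1460):  # cut the stream into 1460-octet segments
    rx.on_segment(stream[start:start + 1460])
print([len(p) for p in rx.placed])  # [1000, 700]
```

The second FPDU straddles the two segments, so its CRC is started while processing the first segment and completed while processing the second, which is exactly the state the steps above must save and restore.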

If we assume FPDU Alignment, the following diagram and the algorithm below apply. Note that when using MPA, the receiver is assumed to actively detect presence or loss of FPDU Alignment for every TCP segment received.

      +--------------------------+      +--------------------------+
   +--|--------------------------+   +--|--------------------------+
   |  |       TCP Seg X          |   |  |         TCP Seg X+1      |
   +--|--------------------------+   +--|--------------------------+
      +--------------------------+      +--------------------------+
                FPDU #N                          FPDU #N+1
        
Figure 13: Aligned FPDU Placed Immediately after TCP Header

The receiver algorithm for FPDU Aligned frames (in order or out of order) includes:

1) Data Link Layer processing (whole frame) -- typically including a CRC calculation.

2) Network Layer processing (assuming not an IP fragment, the whole Data Link Layer frame contains one IP datagram. IP fragments should be reassembled in a local buffer. This is not a performance optimization goal.)

3) Transport Layer processing -- TCP protocol processing, header and checksum checks.

a. Classify incoming TCP segment using the 5-tuple (IP SRC, IP DST, TCP SRC Port, TCP DST Port, protocol).

4) Check for Header Alignment (described in detail in Section 6). The rest of the algorithm below assumes Header Alignment.

a. If the header is not aligned, use the non-aligned FPDU algorithm described above.

5) Whether the TCP segment is in order or out of order, the MPA header is at the beginning of the current TCP payload. Get the FPDU length from the FPDU header.

6) Calculate the CRC over the FPDU.

7) Check CRC calculation for FPDU #N.

8) If no FPDU CRC errors, placement is allowed.

9) CopyData(TCP segment #X, host buffer address, length).

10) Loop to step 5 until all the FPDUs in the TCP segment are consumed; this handles FPDU packing.
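For comparison, the aligned algorithm can be sketched in Python as a single stateless function per segment. It reuses the same hypothetical simplified FPDU format as a non-aligned sketch would (2-octet length, payload, 4-octet CRC, no padding or Markers), with `zlib.crc32` standing in for CRC32c; `process_aligned_segment` is an illustrative name, not an MPA-defined interface.

```python
import struct
import zlib
from typing import List


def make_fpdu(data: bytes) -> bytes:
    """Frame data in the simplified FPDU format used by this sketch."""
    header = struct.pack("!H", len(data))
    return header + data + struct.pack("!I", zlib.crc32(header + data))


def process_aligned_segment(payload: bytes) -> List[bytes]:
    """Steps 5-10 for one segment whose payload begins with an FPDU header and
    holds only complete FPDUs; returns the payloads that may be placed."""
    placed = []
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < 6:
            raise ValueError("trailing octets are not a complete FPDU")
        (length,) = struct.unpack_from("!H", payload, offset)  # step 5: FPDU length
        end = offset + 2 + length
        if end + 4 > len(payload):
            raise ValueError("FPDU not contained in segment; use the non-aligned path")
        computed = zlib.crc32(payload[offset:end])              # step 6
        (received,) = struct.unpack_from("!I", payload, end)
        if computed != received:                                # steps 7-8
            raise ValueError("FPDU CRC error")
        placed.append(payload[offset + 2:end])                  # step 9: CopyData
        offset = end + 4                                        # step 10: next packed FPDU
    return placed


segment = make_fpdu(b"x" * 10) + make_fpdu(b"y" * 20)  # two packed FPDUs
print([len(p) for p in process_aligned_segment(segment)])  # [10, 20]
```

No per-connection partial CRC or local buffer is needed: everything required to validate and place the data is in the segment itself.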

Implementation note: In both cases, the receiver has to classify the incoming TCP segment and associate it with one of the flows it maintains. In the case of no FPDU Alignment, the receiver is forced to classify incoming traffic before it can calculate the FPDU CRC. In the case of FPDU Alignment, the operations order is left to the implementer.
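Classification can be pictured as a lookup keyed by the 5-tuple. The Python sketch below is hypothetical: `FiveTuple`, `MpaRxState`, and `FlowTable` are illustrative names, the addresses and port 4000 are made-up example values, and the state fields are simply the items the non-aligned algorithm must carry between segments.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple, Optional

IPPROTO_TCP = 6  # IANA protocol number for TCP


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


@dataclass
class MpaRxState:
    """Per-connection receive state carried between TCP segments (illustrative)."""
    octets_left_in_fpdu: int = 0
    partial_crc: int = 0


class FlowTable:
    def __init__(self) -> None:
        self._flows: Dict[FiveTuple, MpaRxState] = {}

    def add(self, key: FiveTuple) -> MpaRxState:
        state = MpaRxState()
        self._flows[key] = state
        return state

    def classify(self, key: FiveTuple) -> Optional[MpaRxState]:
        """Return the state for a known flow, or None for unknown traffic."""
        return self._flows.get(key)


table = FlowTable()
conn = FiveTuple("192.0.2.1", "192.0.2.2", 49152, 4000, IPPROTO_TCP)
table.add(conn)
print(table.classify(conn) is not None)  # True
```

Without FPDU Alignment this lookup must complete before the CRC can be resumed, because the partial CRC lives in the per-flow state; with FPDU Alignment the CRC can be computed from the segment alone, so the lookup can happen in either order.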

The FPDU Aligned receiver algorithm is significantly simpler. There is no need to locally buffer portions of FPDUs. Accessing state information is also substantially simplified -- the normal case does not require retrieving information to find out where an FPDU starts and ends or retrieval of a partial CRC before the CRC calculation can commence. This avoids adding internal latencies, having multiple data passes through the CRC machine, or scheduling multiple commands for moving the data to the host buffer.

The aligned FPDU approach is useful for in-order and out-of-order reception. The receiver can use the same mechanisms for data storage in both cases, and only needs to account for when all the TCP segments have arrived to enable Delivery. The Header Alignment, along with the high probability that at least one complete FPDU is found with every TCP segment, allows the receiver to perform data placement for out-of-order TCP segments with no need for intermediate buffering. Essentially, the TCP receive buffer has been eliminated and TCP reassembly is done in place within the ULP Buffer.

In case FPDU Alignment is not found, the receiver should follow the algorithm for non-aligned FPDU reception, which may be slower and less efficient.

B.2.2. FPDU Alignment Effects on TCP Wire Protocol

In an optimized MPA/TCP implementation, TCP exposes its EMSS to MPA. MPA uses the EMSS to calculate its MULPDU, which it then exposes to DDP, its ULP. DDP uses the MULPDU to segment its payload so that each FPDU sent by MPA fits completely into one TCP segment. This has no impact on wire protocol, and exposing this information is already supported on many TCP implementations, including all modern flavors of BSD networking, through the TCP_MAXSEG socket option.
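A sender-side sketch in Python: the current MSS is read with the standard `TCP_MAXSEG` socket option (platform support varies), and a ULP message is cut into chunks of at most MULPDU octets. How MULPDU is derived from EMSS is left as a parameter; the 1448-octet value in the example is hypothetical and is not computed from the MPA overhead rules.

```python
import socket
from typing import List


def query_mss(sock: socket.socket) -> int:
    """Read the current MSS of a TCP socket via the TCP_MAXSEG option."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)


def ddp_segment(message: bytes, mulpdu: int) -> List[bytes]:
    """Split a ULP message into chunks of at most mulpdu octets, so that each
    chunk, once framed by MPA, fits into one TCP segment."""
    if mulpdu <= 0:
        raise ValueError("mulpdu must be positive")
    return [message[i:i + mulpdu] for i in range(0, len(message), mulpdu)]


chunks = ddp_segment(bytes(65536), mulpdu=1448)  # hypothetical MULPDU
print(len(chunks), len(chunks[-1]))  # 46 376
```

In a real stack, the MULPDU passed to `ddp_segment` would be derived from the EMSS that TCP exposes, for example through a query like `query_mss`.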

In the common case, the ULP (i.e., DDP over MPA) messages provided to the TCP layer are segmented to MULPDU size. It is assumed that the ULP message size is bounded by MULPDU, such that a single ULP message can be encapsulated in a single TCP segment. Therefore, in the common case, there is no increase in the number of TCP segments emitted. For smaller ULP messages, the sender can also apply packing, i.e., the sender packs as many complete FPDUs as possible into one TCP segment. The requirement to always have a complete FPDU may increase the number of TCP segments emitted. Typically, a ULP message size varies from a few bytes to multiple EMSSs (e.g., 64 Kbytes). In some cases, the ULP may post more than one message at a time for transmission, giving the sender an opportunity for packing. When more than one FPDU is available for transmission, the FPDUs are encapsulated into a TCP segment, and there is no room in that segment for the next complete FPDU, another TCP segment is sent. In this corner case, some of the TCP segments are not full size. In the worst-case scenario, the ULP may choose an FPDU size of EMSS/2 + 1 and have multiple messages available for transmission. For this poor choice of FPDU size, the average TCP segment size is about half of the EMSS, and the number of TCP segments emitted approaches twice the number that would be needed without the requirement to encapsulate an integer number of complete FPDUs in every TCP segment. This is a dynamic situation that lasts only while the sender ULP has multiple non-optimal messages for transmission, and it has only a minor impact on wire utilization.

However, it is not expected that requiring FPDU Alignment will have a measurable impact on wire behavior of most applications. Throughput applications with large I/Os are expected to take full advantage of the EMSS. Another class of applications with many small outstanding buffers (as compared to EMSS) is expected to use packing when applicable. Transaction-oriented applications are also optimal.
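The worst case described above can be reproduced with a greedy packing sketch in Python. `pack_fpdus` is a hypothetical helper that places FPDUs in order and starts a new segment whenever the next complete FPDU does not fit; the EMSS of 1460 octets is an assumed value.

```python
import math
from typing import List


def pack_fpdus(fpdu_sizes: List[int], emss: int) -> List[List[int]]:
    """Greedy in-order packing of complete FPDUs into TCP segments of at most emss octets."""
    segments: List[List[int]] = []
    current: List[int] = []
    used = 0
    for size in fpdu_sizes:
        if size > emss:
            raise ValueError("each FPDU must fit in one TCP segment")
        if used + size > emss:  # next complete FPDU does not fit: send another segment
            segments.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        segments.append(current)
    return segments


EMSS = 1460
worst = [EMSS // 2 + 1] * 1000  # 1000 FPDUs of EMSS/2 + 1 octets
aligned = len(pack_fpdus(worst, EMSS))
stream = math.ceil(sum(worst) / EMSS)  # segments needed if FPDUs could straddle segments
print(aligned, stream, round(aligned / stream, 3))  # 1000 501 1.996
```

The ratio approaches, but never reaches, 2x, matching the bound stated for this corner case; with FPDUs much smaller than the EMSS, packing keeps segments nearly full.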

TCP retransmission is another area that can affect sender behavior. TCP supports retransmission of the exact, originally transmitted segment (see [RFC793], Sections 2.6 and 3.7 (under "Managing the Window") and [RFC1122], Section 4.2.2.15). In the unlikely event that part of the original segment has been received and acknowledged by the Remote Peer (e.g., because of a re-segmenting middlebox, as documented in Appendix A.4, Re-Segmenting Middleboxes and Non-Optimized MPA/TCP Senders), better bandwidth utilization may be possible by retransmitting only the missing octets. If an optimized MPA/TCP sender retransmits complete FPDUs, there may be some marginal bandwidth loss.

Another area where a change in the TCP segment number may have impact is that of slow start and congestion avoidance. Slow-start exponential increase is measured in segments per second, as the algorithm focuses on the overhead per segment at the source for congestion that eventually results in dropped segments. Slow-start exponential bandwidth growth for optimized MPA/TCP is similar to any TCP implementation. Congestion avoidance allows for a linear growth in available bandwidth when recovering after a packet drop. Similar to the analysis for slow start, optimized MPA/TCP doesn't change the behavior of the algorithm. Therefore, the average size of the segment versus EMSS is not a major factor in the assessment of the bandwidth growth for a sender. Both slow start and congestion avoidance for an optimized MPA/TCP will behave similarly to any TCP sender and allow an optimized MPA/TCP to enjoy the theoretical performance limits of the algorithms.

In summary, the ULP messages generated at the sender (e.g., the number of messages grouped for every transmission request) and the message size distribution have the most significant impact on the number of TCP segments emitted. The worst-case effect for certain ULPs (with an average message size of EMSS/2+1 to EMSS) is bounded by an increase of up to 2x in the number of TCP segments and acknowledgements. In practice, the effect is expected to be marginal.

Appendix C. IETF Implementation Interoperability with RDMA Consortium Protocols

This appendix is for information only and is NOT part of the standard.

This appendix covers methods of making MPA implementations interoperate with both IETF and RDMA Consortium versions of the protocols.

The RDMA Consortium created early specifications of the MPA/DDP/RDMA protocols, and some manufacturers created implementations of those protocols before the IETF versions were finalized. These protocols are very similar to the IETF versions, making it possible for implementations to be created or modified to support either set of specifications.

For those interested, the RDMA Consortium protocol documents (draft-culley-iwarp-mpa-v1.0.pdf [RDMA-MPA], draft-shah-iwarp-ddp-v1.0.pdf [RDMA-DDP], and draft-recio-iwarp-rdmac-v1.0.pdf [RDMA-RDMAC]) can be obtained at http://www.rdmaconsortium.org/home.

In this section, implementations of MPA/DDP/RDMA that conform to the RDMAC specifications are called RDMAC RNICs. Implementations of MPA/DDP/RDMA that conform to the IETF RFCs are called IETF RNICs.

Without the exchange of MPA Request/Reply Frames, there is no standard mechanism for enabling RDMAC RNICs to interoperate with IETF RNICs. Even if a ULP uses a well-known port to start an IETF RNIC immediately in RDMA mode (i.e., without exchanging the MPA Request/Reply messages), there is no reason to believe an IETF RNIC will interoperate with an RDMAC RNIC because of the differences in the version number in the DDP and RDMAP headers on the wire.

Therefore, the ULP or other supporting entity at the RDMAC RNIC must implement MPA Request/Reply Frames on behalf of the RNIC in order to negotiate the connection parameters. The following section describes the results following the exchange of the MPA Request/Reply Frames before the conversion from streaming to RDMA mode.

C.1. Negotiated Parameters

Three types of RNICs are considered:

Upgraded RDMAC RNIC - an RNIC implementing the RDMAC protocols that has a ULP or other supporting entity that exchanges the MPA Request/Reply Frames in streaming mode before the conversion to RDMA mode.

Non-permissive IETF RNIC - an RNIC implementing the IETF protocols that is not capable of implementing the RDMAC protocols. Such an RNIC can only interoperate with other IETF RNICs.

Permissive IETF RNIC - an RNIC implementing the IETF protocols that is capable of implementing the RDMAC protocols on a per-connection basis.

The Permissive IETF RNIC is recommended for those implementers that want maximum interoperability with other RNIC implementations.

The values used by these three RNIC types for the MPA, DDP, and RDMAP versions as well as MPA Markers and CRC are summarized in Figure 14.

    +----------------++-----------+-----------+-----------+-----------+
    | RNIC TYPE      || DDP/RDMAP |    MPA    |    MPA    |    MPA    |
    |                ||  Version  | Revision  |  Markers  |    CRC    |
    +----------------++-----------+-----------+-----------+-----------+
    +----------------++-----------+-----------+-----------+-----------+
    | RDMAC          ||     0     |     0     |     1     |     1     |
    |                ||           |           |           |           |
    +----------------++-----------+-----------+-----------+-----------+
    | IETF           ||     1     |     1     |  0 or 1   |  0 or 1   |
    | Non-permissive ||           |           |           |           |
    +----------------++-----------+-----------+-----------+-----------+
    | IETF           ||  1 or 0   |  1 or 0   |  0 or 1   |  0 or 1   |
    | permissive     ||           |           |           |           |
    +----------------++-----------+-----------+-----------+-----------+
        
Figure 14: Connection Parameters for the RNIC Types for MPA Markers and MPA CRC, enabled=1, disabled=0.

It is assumed there is no mixing of versions allowed between MPA, DDP, and RDMAP. The RNIC either generates the RDMAC protocols on the wire (version is zero) or uses the IETF protocols (version is one).

During the exchange of the MPA Request/Reply Frames, each peer provides its MPA Revision, Marker preference (M: 0=disabled, 1=enabled), and CRC preference. The MPA Revision provided in the MPA Request Frame and the MPA Reply Frame may differ.

From the information in the MPA Request/Reply Frames, each side sets the Version field (V: 0=RDMAC, 1=IETF) of the DDP/RDMAP protocols as well as the state of the Markers for each half connection. Between DDP and RDMAP, no mixing of versions is allowed. Moreover, the DDP and RDMAP version MUST be identical in the two directions. The RNIC either generates the RDMAC protocols on the wire (version is zero) or uses the IETF protocols (version is one).

In the following sections, the figures do not discuss CRC negotiation because there is no interoperability issue for CRCs. Since the RDMAC RNIC always requests CRC use, both peers MUST generate and check CRCs, according to the IETF MPA specification.

C.2. RDMAC RNIC and Non-Permissive IETF RNIC

Figure 15 shows that a Non-permissive IETF RNIC cannot interoperate with an RDMAC RNIC, despite the fact that both peers exchange MPA Request/Reply Frames. For a Non-permissive IETF RNIC, the MPA negotiation has no effect on the DDP/RDMAP version and it is unable to interoperate with the RDMAC RNIC.

The rows in the figure show the state of the Marker field in the MPA Request Frame sent by the MPA Initiator. The columns show the state of the Marker field in the MPA Reply Frame sent by the MPA Responder. Each type of RNIC is shown as an Initiator and a Responder. The connection results are shown in the lower right corner, at the intersection of the different RNIC types, where V=0 is the RDMAC DDP/RDMAP version, V=1 is the IETF DDP/RDMAP version, M=0 means MPA Markers are disabled, and M=1 means MPA Markers are enabled. The negotiated Marker state is shown as X/Y, for the receive direction of the Initiator/Responder.

          +---------------------------++-----------------------+
          |   MPA                     ||          MPA          |
          | CONNECT                   ||       Responder       |
          |   MODE  +-----------------++-------+---------------+
          |         |   RNIC          || RDMAC |     IETF      |
          |         |   TYPE          ||       | Non-permissive|
          |         |          +------++-------+-------+-------+
          |         |          |MARKER|| M=1   | M=0   |  M=1  |
          +---------+----------+------++-------+-------+-------+
          +---------+----------+------++-------+-------+-------+
          |         |   RDMAC  | M=1  || V=0   | close | close |
          |         |          |      || M=1/1 |       |       |
          |         +----------+------++-------+-------+-------+
          |   MPA   |          | M=0  || close | V=1   | V=1   |
          |Initiator|   IETF   |      ||       | M=0/0 | M=0/1 |
          |         |Non-perms.+------++-------+-------+-------+
          |         |          | M=1  || close | V=1   | V=1   |
          |         |          |      ||       | M=1/0 | M=1/1 |
          +---------+----------+------++-------+-------+-------+
        
Figure 15: MPA Negotiation between an RDMAC RNIC and a Non-Permissive IETF RNIC

C.2.1. RDMAC RNIC Initiator

If the RDMAC RNIC is the MPA Initiator, its ULP sends an MPA Request Frame with Rev field set to zero and the M and C bits set to one. Because the Non-permissive IETF RNIC cannot dynamically downgrade the version number it uses for DDP and RDMAP, it would send an MPA Reply Frame with the Rev field equal to one and then gracefully close the connection.

C.2.2. Non-Permissive IETF RNIC Initiator

If the Non-permissive IETF RNIC is the MPA Initiator, it sends an MPA Request Frame with Rev field equal to one. The ULP or supporting entity for the RDMAC RNIC responds with an MPA Reply Frame that has the Rev field equal to zero and the M bit set to one. The Non-permissive IETF RNIC will gracefully close the connection after it reads the incompatible Rev field in the MPA Reply Frame.
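The Rev check that ends these exchanges can be sketched in Python. The frame layout used here (the 16-octet key "MPA ID Req Frame" or "MPA ID Rep Frame", then an octet carrying the M and C flags in its two most significant bits, the 8-bit Rev field, and a 16-bit private data length) is our reading of the MPA Request/Reply Frame definition earlier in this specification and should be verified against it; the R (reject) bit and the private data semantics are ignored in this sketch.

```python
import struct
from typing import Tuple

REQ_KEY = b"MPA ID Req Frame"
REP_KEY = b"MPA ID Rep Frame"
M_BIT, C_BIT = 0x80, 0x40  # assumed positions of the M and C flags


def build_frame(key: bytes, markers: bool, crc: bool, rev: int,
                private_data: bytes = b"") -> bytes:
    """Build a Request/Reply Frame in the assumed layout."""
    flags = (M_BIT if markers else 0) | (C_BIT if crc else 0)
    return key + struct.pack("!BBH", flags, rev, len(private_data)) + private_data


def parse_frame(frame: bytes) -> Tuple[bytes, bool, bool, int]:
    """Return (key, M, C, Rev) from a Request/Reply Frame."""
    flags, rev, _pd_len = struct.unpack_from("!BBH", frame, 16)
    return frame[:16], bool(flags & M_BIT), bool(flags & C_BIT), rev


def non_permissive_accepts(reply: bytes) -> bool:
    """A Non-permissive IETF Initiator keeps the connection only for a Rev 1 Reply."""
    key, _m, _c, rev = parse_frame(reply)
    return key == REP_KEY and rev == 1


rdmac_reply = build_frame(REP_KEY, markers=True, crc=True, rev=0)
print(non_permissive_accepts(rdmac_reply))  # False: close the connection gracefully
```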

C.2.3. RDMAC RNIC and Permissive IETF RNIC

Figure 16 shows that a Permissive IETF RNIC can interoperate with an RDMAC RNIC regardless of its Marker preference. The figure uses the same format as shown with the Non-permissive IETF RNIC.

          +---------------------------++-----------------------+
          |   MPA                     ||          MPA          |
          | CONNECT                   ||       Responder       |
          |   MODE  +-----------------++-------+---------------+
          |         |   RNIC          || RDMAC |     IETF      |
          |         |   TYPE          ||       |  Permissive   |
          |         |          +------++-------+-------+-------+
          |         |          |MARKER|| M=1   | M=0   | M=1   |
          +---------+----------+------++-------+-------+-------+
          +---------+----------+------++-------+-------+-------+
          |         |   RDMAC  | M=1  || V=0   | N/A   | V=0   |
          |         |          |      || M=1/1 |       | M=1/1 |
          |         +----------+------++-------+-------+-------+
          |   MPA   |          | M=0  || V=0   | V=1   | V=1   |
          |Initiator|   IETF   |      || M=1/1 | M=0/0 | M=0/1 |
          |         |Permissive+------++-------+-------+-------+
          |         |          | M=1  || V=0   | V=1   | V=1   |
          |         |          |      || M=1/1 | M=1/0 | M=1/1 |
          +---------+----------+------++-------+-------+-------+
        

Figure 16: MPA Negotiation between an RDMAC RNIC and a Permissive IETF RNIC

A truly Permissive IETF RNIC will recognize an RDMAC RNIC from the Rev field of the MPA Request/Reply Frames and then adjust its receive Marker state and DDP/RDMAP version to accommodate the RDMAC RNIC. As a result, as an MPA Responder, the Permissive IETF RNIC will never return an MPA Reply Frame with the M bit set to zero to an RDMAC RNIC. This case is shown as not applicable (N/A) in Figure 16.
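The cells of Figure 16 (and the close cases of the Non-permissive RNIC) can be reproduced with a small decision function. This is a hypothetical sketch, not part of the specification: the type labels and the `negotiate` name are illustrative, `init_m`/`resp_m` stand for each side's Marker preference, and the result is read as (V, Initiator M bit, Responder M bit), following the "V=x, M=a/b" cells. For the N/A cell the function returns the outcome the Permissive Responder actually produces, since it overrides its M=0 preference.

```python
RDMAC = "RDMAC"
PERMISSIVE = "IETF permissive"
NON_PERMISSIVE = "IETF non-permissive"


def negotiate(init_type: str, init_m: int, resp_type: str, resp_m: int):
    """Return (V, M_initiator, M_responder), or None if the connection closes."""
    if RDMAC in (init_type, resp_type):
        if NON_PERMISSIVE in (init_type, resp_type):
            # A Non-permissive IETF RNIC cannot drop to version 0, so it closes.
            return None
        # An RDMAC RNIC always uses Rev 0 with Markers on; a Permissive IETF
        # RNIC adapts, overriding its own Marker preference.
        return (0, 1, 1)
    # Two IETF RNICs: Rev 1, and each side's M bit is used as sent.
    return (1, init_m, resp_m)


# Permissive Initiator preferring no Markers, RDMAC Responder: V=0, M=1/1.
print(negotiate(PERMISSIVE, 0, RDMAC, 1))  # (0, 1, 1)
```

Keeping the RDMAC check first mirrors the text: the Rev field alone identifies an RDMAC peer, and only then does permissiveness matter.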

C.2.4. RDMAC RNIC Initiator

When the RDMAC RNIC is the MPA Initiator, its ULP or other supporting entity prepares an MPA Request Frame with the Rev field set to zero and the M and C bits set to one.

The Permissive IETF Responder receives the MPA Request Frame and checks the Rev field. Since it is capable of generating RDMAC DDP/RDMAP headers, it sends an MPA Reply Frame with the Rev field set to zero and the M and C bits set to one. The Responder must inform its ULP that it is generating version zero DDP/RDMAP messages.
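The Responder behavior described above might look like the following sketch. The `MpaFrame` type, the function name, and the `my_m`/`my_c` preference parameters are hypothetical; the Rev=1 branch assumes the Responder simply answers with its own preferences and reports version 1, the version used by the IETF DDP and RDMAP specifications (RFC 5041, RFC 5040).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpaFrame:
    """Hypothetical view of the MPA Request/Reply Frame fields used here."""
    rev: int  # Rev field
    m: int    # M bit (Marker flag)
    c: int    # C bit (CRC flag)


def permissive_responder(request: MpaFrame, my_m: int, my_c: int):
    """Permissive IETF RNIC acting as MPA Responder.

    Returns (reply, version): the MPA Reply Frame to send and the
    DDP/RDMAP version the Responder reports to its ULP.
    """
    if request.rev == 0:
        # Peer is an RDMAC RNIC: reply in its terms (Rev=0, M=1, C=1)
        # and tell the ULP to generate version zero DDP/RDMAP messages.
        return MpaFrame(rev=0, m=1, c=1), 0
    # Peer is an IETF RNIC: reply with Rev=1 and our own preferences.
    return MpaFrame(rev=1, m=my_m, c=my_c), 1


reply, version = permissive_responder(MpaFrame(rev=0, m=1, c=1), my_m=0, my_c=0)
print(reply, version)  # MpaFrame(rev=0, m=1, c=1) 0
```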

C.2.5. Permissive IETF RNIC Initiator

If the Permissive IETF RNIC is the MPA Initiator, it prepares the MPA Request Frame setting the Rev field to one. Regardless of the value of the M bit in the MPA Request Frame, the ULP or other supporting entity for the RDMAC RNIC will create an MPA Reply Frame with Rev equal to zero and the M bit set to one.

When the Initiator reads the Rev field of the MPA Reply Frame and finds that its peer is an RDMAC RNIC, it must inform its ULP that it should generate version zero DDP/RDMAP messages and enable MPA Markers and CRC.
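The Initiator's handling of the MPA Reply Frame can be sketched as below. The function name and the returned settings dictionary are hypothetical. The Rev=1 branch goes beyond this appendix and assumes the M and C flag rules from the MPA frame definition in the body of this specification: an M bit of one asks the peer to insert Markers in the data it sends, and a C bit of one in either frame turns CRC on for both endpoints.

```python
def permissive_initiator_on_reply(my_m: int, my_c: int,
                                  reply_rev: int, reply_m: int, reply_c: int) -> dict:
    """Permissive IETF RNIC acting as MPA Initiator (it sent Rev=1, M=my_m, C=my_c).

    Returns hypothetical settings for the ULP and data path after the
    MPA Reply Frame has been read.
    """
    if reply_rev == 0:
        # RDMAC peer: version zero DDP/RDMAP, Markers in both directions, CRC on,
        # regardless of the M bit this Initiator sent.
        return {"version": 0, "send_markers": True,
                "expect_markers": True, "crc": True}
    return {
        "version": 1,
        "send_markers": bool(reply_m),   # peer set M=1: insert Markers toward it
        "expect_markers": bool(my_m),    # we set M=1: peer inserts Markers toward us
        "crc": bool(my_c or reply_c),    # CRC if either frame has C=1
    }


print(permissive_initiator_on_reply(my_m=0, my_c=0, reply_rev=0, reply_m=1, reply_c=1))
# {'version': 0, 'send_markers': True, 'expect_markers': True, 'crc': True}
```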

C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC

For completeness, Figure 17 below shows the results of MPA negotiation between a Non-permissive IETF RNIC and a Permissive IETF RNIC. The important point from this figure is that an IETF RNIC cannot detect whether its peer is a Permissive or Non-permissive RNIC.

      +---------------------------++-------------------------------+
      |   MPA                     ||              MPA              |
      | CONNECT                   ||            Responder          |
      |   MODE  +-----------------++---------------+---------------+
      |         |   RNIC          ||     IETF      |     IETF      |
      |         |   TYPE          || Non-permissive|  Permissive   |
      |         |          +------++-------+-------+-------+-------+
      |         |          |MARKER|| M=0   | M=1   | M=0   | M=1   |
      +---------+----------+------++-------+-------+-------+-------+
      +---------+----------+------++-------+-------+-------+-------+
      |         |          | M=0  || V=1   | V=1   | V=1   | V=1   |
      |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
      |         |Non-perms.+------++-------+-------+-------+-------+
      |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
      |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
      |   MPA   +----------+------++-------+-------+-------+-------+
      |Initiator|          | M=0  || V=1   | V=1   | V=1   | V=1   |
      |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
      |         |Permissive+------++-------+-------+-------+-------+
      |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
      |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
      +---------+----------+------++-------+-------+-------+-------+
        

Figure 17: MPA Negotiation between a Non-permissive IETF RNIC and a Permissive IETF RNIC
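Why the Initiator cannot tell the two kinds of IETF Responder apart can be shown in a few lines. This is a hypothetical sketch: `ietf_reply` and `marker_directions` are illustrative names, and the direction mapping assumes the M flag rule from the MPA frame definition (an endpoint that sets M=1 requires Markers in the data it receives), applied to each "M=a/b" cell read as Initiator/Responder.

```python
def ietf_reply(responder_is_permissive: bool, resp_m: int, resp_c: int):
    """MPA Reply (Rev, M, C) an IETF Responder sends to a Rev=1 Request.

    Permissiveness only changes behavior when the peer sends Rev=0, so
    responder_is_permissive does not affect the Reply here.
    """
    return (1, resp_m, resp_c)


def marker_directions(init_m: int, resp_m: int) -> set:
    """Directions that carry Markers for a Figure 17 cell "M=init_m/resp_m"."""
    directions = set()
    if resp_m:
        directions.add("Initiator to Responder")
    if init_m:
        directions.add("Responder to Initiator")
    return directions


for m in (0, 1):
    # Same Reply from either kind of Responder: the Initiator cannot tell them apart.
    assert ietf_reply(True, m, 1) == ietf_reply(False, m, 1)

print(sorted(marker_directions(1, 0)))  # ['Responder to Initiator']
```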

Normative References

[iSCSI] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E. Zeidner, "Internet Small Computer Systems Interface (iSCSI)", RFC 3720, April 2004.

[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

[RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and F. Travostino, "Securing Block Storage Protocols over IP", RFC 3723, April 2004.

[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RDMASEC] Pinkerton, J. and E. Deleganes, "Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security", RFC 5042, October 2007.

Informative References

[APPL] Bestler, C. and L. Coene, "Applicability of Remote Direct Memory Access Protocol (RDMA) and Direct Data Placement (DDP)", RFC 5045, October 2007.

[CRCTCP] Stone, J. and C. Partridge, "When the CRC and TCP checksum disagree", ACM SIGCOMM, September 2000.

[DAT-API] DAT Collaborative, "kDAPL (Kernel Direct Access Programming Library) and uDAPL (User Direct Access Programming Library)", http://www.datcollaborative.org.

[DDP] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct Data Placement over Reliable Transports", RFC 5041, October 2007.

[iSER] Ko, M., Chadalapaka, M., Hufferd, J., Elzur, U., Shah, H., and P. Thaler, "Internet Small Computer System Interface (iSCSI) Extensions for Remote Direct Memory Access (RDMA)", RFC 5046, October 2007.

[IT-API] The Open Group, "Interconnect Transport API (IT-API)", Version 2.1, http://www.opengroup.org.

[NFSv4CHAN] Williams, N., "On the Use of Channel Bindings to Secure Channels", Work in Progress, June 2006.

[RDMA-DDP] "Direct Data Placement over Reliable Transports (Version 1.0)", RDMA Consortium, October 2002, <http://www.rdmaconsortium.org/home/draft-shah-iwarp-ddp-v1.0.pdf>.

[RDMA-MPA] "Marker PDU Aligned Framing for TCP Specification (Version 1.0)", RDMA Consortium, October 2002, <http://www.rdmaconsortium.org/home/draft-culley-iwarp-mpa-v1.0.pdf>.

[RDMA-RDMAC] "An RDMA Protocol Specification (Version 1.0)", RDMA Consortium, October 2002, <http://www.rdmaconsortium.org/home/draft-recio-iwarp-rdmac-v1.0.pdf>.

[RDMAP] Recio, R., Culley, P., Garcia, D., Hilland, J., and B. Metzler, "A Remote Direct Memory Access Protocol Specification", RFC 5040, October 2007.

[RFC792] Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, September 1981.

[RFC896] Nagle, J., "Congestion control in IP/TCP internetworks", RFC 896, January 1984.

[RFC1122] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, September 2007.

[RFC4296] Bailey, S. and T. Talpey, "The Architecture of Direct Data Placement (DDP) and Remote Direct Memory Access (RDMA) on Internet Protocols", RFC 4296, December 2005.

[RFC4297] Romanow, A., Mogul, J., Talpey, T., and S. Bailey, "Remote Direct Memory Access (RDMA) over IP Problem Statement", RFC 4297, December 2005.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[VERBS-RMDA] "RDMA Protocol Verbs Specification", RDMA Consortium standard, April 2003, <http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf>.

Contributors

Dwight Barron Hewlett-Packard Company 20555 SH 249 Houston, TX 77070-2698 USA Phone: 281-514-2769 EMail: dwight.barron@hp.com

Jeff Chase Department of Computer Science Duke University Durham, NC 27708-0129 USA Phone: +1 919 660 6559 EMail: chase@cs.duke.edu

Ted Compton EMC Corporation Research Triangle Park, NC 27709 USA Phone: 919-248-6075 EMail: compton_ted@emc.com

Dave Garcia 24100 Hutchinson Rd. Los Gatos, CA 95033 Phone: 831 247 4464 EMail: Dave.Garcia@StanfordAlumni.org

Hari Ghadia Gen10 Technology, Inc. 1501 W Shady Grove Road Grand Prairie, TX 75050 Phone: (972) 301 3630 EMail: hghadia@gen10technology.com

Howard C. Herbert Intel Corporation MS CH7-404 5000 West Chandler Blvd. Chandler, AZ 85226 Phone: 480-554-3116 EMail: howard.c.herbert@intel.com

Jeff Hilland Hewlett-Packard Company 20555 SH 249 Houston, TX 77070-2698 USA Phone: 281-514-9489 EMail: jeff.hilland@hp.com

Mike Ko IBM 650 Harry Rd. San Jose, CA 95120 Phone: (408) 927-2085 EMail: mako@us.ibm.com

Mike Krause Hewlett-Packard Corporation, 43LN 19410 Homestead Road Cupertino, CA 95014 USA Phone: +1 (408) 447-3191 EMail: krause@cup.hp.com

Dave Minturn Intel Corporation MS JF1-210 5200 North East Elam Young Parkway Hillsboro, Oregon 97124 Phone: 503-712-4106 EMail: dave.b.minturn@intel.com

Jim Pinkerton Microsoft, Inc. One Microsoft Way Redmond, WA 98052 USA EMail: jpink@microsoft.com

Hemal Shah Broadcom Corporation 5300 California Avenue Irvine, CA 92617 USA Phone: +1 (949) 926-6941 EMail: hemal@broadcom.com

Allyn Romanow Cisco Systems 170 W Tasman Drive San Jose, CA 95134 USA Phone: +1 408 525 8836 EMail: allyn@cisco.com

Tom Talpey Network Appliance 1601 Trapelo Road #16 Waltham, MA 02451 USA Phone: +1 (781) 768-5329 EMail: thomas.talpey@netapp.com

Patricia Thaler Broadcom 16215 Alton Parkway Irvine, CA 92618 Phone: 916 570 2707 EMail: pthaler@broadcom.com

Jim Wendt Hewlett Packard Corporation 8000 Foothills Boulevard MS 5668 Roseville, CA 95747-5668 USA Phone: +1 916 785 5198 EMail: jim_wendt@hp.com

Jim Williams Emulex Corporation 580 Main Street Bolton, MA 01740 USA Phone: +1 978 779 7224 EMail: jim.williams@emulex.com

Authors' Addresses

Paul R. Culley Hewlett-Packard Company 20555 SH 249 Houston, TX 77070-2698 USA Phone: 281-514-5543 EMail: paul.culley@hp.com

Uri Elzur 5300 California Avenue Irvine, CA 92617, USA Phone: 949.926.6432 EMail: uri@broadcom.com

Renato J Recio IBM Internal Zip 9043 11400 Burnett Road Austin, Texas 78759 Phone: 512-838-3685 EMail: recio@us.ibm.com

Stephen Bailey Sandburst Corporation 600 Federal Street Andover, MA 01810 USA Phone: +1 978 689 1614 EMail: steph@sandburst.com

John Carrier Cray Inc. 411 First Avenue S, Suite 600 Seattle, WA 98104-2860 Phone: 206-701-2090 EMail: carrier@cray.com

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
