Network Working Group V. Kashyap Request for Comments: 4755 IBM Category: Standards Track December 2006
Network Working Group V. Kashyap Request for Comments: 4755 IBM Category: Standards Track December 2006
IP over InfiniBand: Connected Mode
InfiniBand上的IP:连接模式
Status of This Memo
关于下段备忘
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
本文件规定了互联网社区的互联网标准跟踪协议,并要求进行讨论和提出改进建议。有关本协议的标准化状态和状态,请参考当前版本的“互联网官方协议标准”(STD 1)。本备忘录的分发不受限制。
Copyright Notice
版权公告
Copyright (C) The IETF Trust (2006).
版权所有(C)IETF信托基金(2006年)。
Abstract
摘要
This document specifies transmission of IPv4/IPv6 packets and address resolution over the connected modes of InfiniBand.
本文档指定通过InfiniBand的连接模式传输IPv4/IPv6数据包和地址解析。
Table of Contents
目录
1. Introduction ....................................................2 2. IPoIB-connected Mode ............................................3 2.1. Multicasting ...............................................3 2.2. Outline of Address Resolution ..............................4 2.3. Outline of Connection Setup ................................4 3. Address Resolution ..............................................4 3.1. Link-layer Address .........................................4 3.2. IB Connection Setup ........................................6 3.3. Simultaneous IB Connections ................................6 3.4. IPoIB-CM IB Connection Teardown ............................7 3.5. Service-ID .................................................7 4. Frame Format ....................................................8 5. Maximum Transmission Unit .......................................8 5.1. Per-Connection MTU .........................................9 6. Private-Data Format .............................................9 7. IPoIB-CM Considerations ........................................10 7.1. A Cautionary Note on IPoIB-RC .............................10 7.2. IPoIB-CM Per-Destination MTU ..............................10 8. Security Considerations ........................................11 9. IANA Considerations ............................................11 10. Acknowledgements ..............................................11 11. Normative References ..........................................11 12. Informative References ........................................11
1. Introduction ....................................................2 2. IPoIB-connected Mode ............................................3 2.1. Multicasting ...............................................3 2.2. Outline of Address Resolution ..............................4 2.3. Outline of Connection Setup ................................4 3. Address Resolution ..............................................4 3.1. Link-layer Address .........................................4 3.2. IB Connection Setup ........................................6 3.3. Simultaneous IB Connections ................................6 3.4. IPoIB-CM IB Connection Teardown ............................7 3.5. Service-ID .................................................7 4. Frame Format ....................................................8 5. Maximum Transmission Unit .......................................8 5.1. Per-Connection MTU .........................................9 6. Private-Data Format .............................................9 7. IPoIB-CM Considerations ........................................10 7.1. A Cautionary Note on IPoIB-RC .............................10 7.2. IPoIB-CM Per-Destination MTU ..............................10 8. Security Considerations ........................................11 9. IANA Considerations ............................................11 10. Acknowledgements ..............................................11 11. Normative References ..........................................11 12. Informative References ........................................11
The InfiniBand specification [IB_ARCH] can be found at www.infinibandta.org. The document [RFC4392] provides a short overview of InfiniBand architecture along with consideration for specifying IP over InfiniBand networks.
InfiniBand规范[IB_ARCH]可在www.infinibandta.org上找到。文档[RFC4392]提供了InfiniBand体系结构的简要概述,以及在InfiniBand网络上指定IP的注意事项。
The InfiniBand Architecture (IBA) defines multiple modes of transports. Of these the unreliable datagram (UD) transport method best matches the needs of IP. IP over InfiniBand (IPoIB) over UD is described in [RFC4391]. This document describes IP transmission over the connected modes of IBA.
InfiniBand体系结构(IBA)定义了多种传输模式。其中,不可靠数据报(UD)传输方法最符合IP的需要。[RFC4391]中描述了UD上的InfiniBand IP(IPoIB)。本文件描述了IBA连接模式下的IP传输。
IBA defines two connected modes:
IBA定义了两种连接模式:
1. Reliable Connected (RC) 2. Unreliable Connected (UC)
1. 可靠连接(RC)2。不可靠连接(UC)
As is evident from the nomenclature, the two modes differ mainly in providing reliability of data delivery across the connection. This document applies equally to both the connected modes. IPoIB over these two modes is referred to as IPoIB-CM (connected mode) in this
从术语中可以明显看出,这两种模式的主要区别在于跨连接提供数据传输的可靠性。本文件同样适用于两种连接模式。这两种模式上的IPoIB在本规范中称为IPoIB CM(连接模式)
document. For clarity, IPoIB over the unreliable datagram mode as described in [RFC4391] is referred to as IPoIB-UD.
文件为清楚起见,[RFC4391]中所述的不可靠数据报模式下的IPoIB称为IPoIB UD。
IBA requires that all Host Channel Adapters (HCAs) support the reliable and unreliable connected modes [IB_ARCH]. It is optional for Target Channel Adapters (TCAs) to support the connected modes.
IBA要求所有主机通道适配器(HCA)支持可靠和不可靠的连接模式[IB_ARCH]。目标通道适配器(TCA)支持连接模式是可选的。
The connected modes offer link MTUs of up to 2^31 octets in length. Thus, the use of connected modes can offer significant benefits by supporting reasonably large MTUs. The datagram modes of InfiniBand Architecture (IBA) are limited to 4096 octets.
连接模式提供长度为2^31个八位字节的链路MTU。因此,通过支持相当大的MTU,使用连接模式可以提供显著的好处。InfiniBand体系结构(IBA)的数据报模式限制为4096个八位字节。
Reliability is also enhanced if the underlying feature of "automatic path migration" supported by the connected modes is utilized.
如果利用连接模式支持的“自动路径迁移”的基本功能,可靠性也会得到增强。
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
本文件中的关键词“必须”、“不得”、“要求”、“应”、“不应”、“应”、“不应”、“建议”、“可”和“可选”应按照RFC 2119[RFC2119]中所述进行解释。
IPoIB over connected mode is an OPTIONAL extension to IPoIB-UD. Every IPoIB implementation MUST support [RFC4391] and MAY support the extensions described in this document.
IPoIB over connected模式是IPoIB UD的可选扩展。每个IPoIB实现都必须支持[RFC4391],并且可能支持本文档中描述的扩展。
Therefore, IP encapsulation, default MTU, link-layer address format, and the IPv6 stateless autoconfiguration mechanism apply to IPoIB-CM exactly as described in [RFC4391].
因此,IP封装、默认MTU、链路层地址格式和IPv6无状态自动配置机制完全按照[RFC4391]中的描述应用于IPoIB CM。
The connected modes of IBA define a non-broadcast, multiple-access network. The connected modes of IBA do not support multicasting though every node can communicate with every other node if desired.
IBA的连接模式定义了一个非广播、多址网络。IBA的连接模式不支持多播,但如果需要,每个节点都可以与其他节点通信。
This requires that multicasting be emulated in some form by the network. However, in the case of an InfiniBand network, instead of an emulation, an unreliable datagram (UD) queue pair (QP) can be used to support multicasting while the connected mode QP is used for unicast traffic. Since every IPoIB implementation is required to support the UD mode, every implementation supporting IPoIB-CM will be able to utilize the pre-existing IPoIB-UD QP for all broadcast/multicast communications. Multicast mapping, transmission, and reception of multicast packets and multicast routing MUST use the UD QP associated with the IPoIB interface.
这要求网络以某种形式模拟多播。然而,在InfiniBand网络的情况下,不可靠数据报(UD)队列对(QP)可用于支持多播,而连接模式QP用于单播业务,而不是模拟。由于每个IPoIB实现都需要支持UD模式,因此每个支持IPoIB CM的实现都将能够将预先存在的IPoIB UD QP用于所有广播/多播通信。多播映射、多播数据包的传输和接收以及多播路由必须使用与IPoIB接口相关联的UD QP。
Every IPoIB-CM interface MUST have two sets of QPs associated with it:
每个IPoIB CM接口必须有两组与之相关的QP:
1) One unreliable datagram QP 2) One or more connected mode QPs
1) 一个不可靠数据报QP 2)一个或多个连接模式QP
[RFC4391] describes the address resolution method to determine the link address of the peer. This response is received on the UD QP associated with the IPoIB interface.
[RFC4391]描述了确定对等方链路地址的地址解析方法。此响应在与IPoIB接口相关的UD QP上接收。
Once the link address of the remote node is known, an IB connection must be set up between the nodes before any IP communication may occur.
一旦知道远程节点的链路地址,就必须在节点之间建立IB连接,然后才能进行任何IP通信。
To make a connection, the sender must know the service-ID to use in the request to make a connection [IB_ARCH]. It must also supply the "connection mode" queue pair to the remote node. The peer replies with its queue pair. Each IB connection is peer to peer and uses one connected mode QP at each end.
要建立连接,发送方必须知道要在建立连接的请求中使用的服务ID[ibu ARCH]。它还必须向远程节点提供“连接模式”队列对。对等方使用其队列对进行答复。每个IB连接都是对等的,并且在每一端使用一个连接模式QP。
Though the address resolution occurs at an individual IP address level, the connection between the nodes is at the IB layer. Therefore, every individual address resolution does not imply a new connection between the peers.
虽然地址解析发生在单个IP地址级别,但节点之间的连接在IB层。因此,每个单独的地址解析并不意味着对等点之间的新连接。
Address resolution queries are sent out on the "broadcast-GID" (Broadcast-Group Identifier) over the UD QP associated with the IPoIB interface [RFC4391]. A unicast reply is received on the UD QP.
地址解析查询通过与IPoIB接口[RFC4391]关联的UD QP在“广播GID”(广播组标识符)上发送。在UD-QP上接收单播应答。
IPoIB encapsulation [RFC4391] describes the link-layer address as follows:
IPoIB封装[RFC4391]对链路层地址的描述如下:
<1 octet reserved>:QP: GID
<1 octet reserved>:QP: GID
This document extends the link-layer address as follows:
本文档扩展了链接层地址,如下所示:
<Flags>:QPN:GID
<Flags>:QPN:GID
Flags:
旗帜:
This is a single-octet field. The bits indicate the connected modes supported by the interface.
这是一个八位组字段。这些位表示接口支持的连接模式。
Bit 0 specifies the support for the "reliable connected" (RC) mode. Bit 1 indicates the support for the "unreliable connected" (UC) mode. All other bits in the octet are reserved and MUST be set to 0 on transmits and ignored on receives. The format of the flags is as follows:
位0指定对“可靠连接”(RC)模式的支持。位1表示支持“不可靠连接”(UC)模式。八位字节中的所有其他位都是保留的,在传输时必须设置为0,在接收时忽略。标志的格式如下:
+--+--+--+--+--+--+--+--+ |RC|UC| 0| 0| 0| 0| 0| 0| +--+--+--+--+--+--+--+--+
+--+--+--+--+--+--+--+--+ |RC|UC| 0| 0| 0| 0| 0| 0| +--+--+--+--+--+--+--+--+
Both the RC and UC MAY be set at the same time if the interface supports both the modes. Since the IPoIB-UD mode is always supported, there are no flags to indicate IPoIB-UD support.
如果接口支持这两种模式,则可以同时设置RC和UC。由于始终支持IPoIB UD模式,因此没有任何标志指示IPoIB UD支持。
If IPoIB-CM is not supported, i.e., if the implementation only supports IPoIB-UD, then the implementation MUST ignore the <Flags> on reception. It MUST set the <Flags> octet to all zeros on transmission as specified in [RFC4391].
如果不支持IPoIB CM,即如果实现仅支持IPoIB UD,则实现必须忽略接收时的<Flags>。它必须按照[RFC4391]中的规定,在传输时将<Flags>八位字节设置为全零。
QPN:
QPN:
The queue-pair number (QPN) on which the unicast address resolution replies will be received [RFC4391]. An IPoIB interface has only one UD QP associated with it whether or not it supports this extension.
将在其上接收单播地址解析答复的队列对号(QPN)[RFC4391]。IPoIB接口只有一个关联的UD QP,无论它是否支持此扩展。
The QPN also serves another purpose. It is used to form the Service-ID that is used to set up the IB connection.
QPN还有另一个用途。它用于形成用于设置IB连接的服务ID。
On receiving the multicast/broadcast address resolution request, the receiver replies with its own link address, including the associated UD QPN and the appropriate flags.
在接收到多播/广播地址解析请求时,接收器用其自己的链路地址(包括关联的UD QPN和适当的标志)进行应答。
The receiver's reply is unicast back to the sender after the receiver has, as in the case of IPoIB-UD, resolved the GID to the Local Identifier (LID), and determined other required parameters [RFC4391]. Once the address resolution is completed, the underlying IB connection on the supported connection modes can be set up. An implementation is NOT REQUIRED to set up a connection merely because the peer indicates the capability. The decision to make such a connection is left to the implementation.
在接收方将GID解析为本地标识符(LID)并确定其他所需参数后(如IPoIB UD的情况),接收方的回复将单播回发送方[RFC4391]。一旦地址解析完成,就可以在支持的连接模式上建立基础IB连接。仅仅因为对等方指示了该功能,就不需要实现来建立连接。建立这种连接的决定权留给实现。
Once the address resolution is complete, the IB connection can be set up by either of the peers. To set up a connection, IB Management Datagrams (MADs) are directed to the peer's communication manager (CM). The connection request always contains a Service-ID for the peer to associate the request with the appropriate service. If the request is accepted, the peer returns the relevant connected mode QPN in the response MAD. The format of the CM connection messages and the IB connection setup process is described in [IB_ARCH]. The overall handshake is of the form:
地址解析完成后,任何一个对等方都可以建立IB连接。为了建立连接,IB管理数据报(MAD)被定向到对等方的通信管理器(CM)。连接请求始终包含一个服务ID,以便对等方将请求与适当的服务相关联。如果请求被接受,对等方将在响应MAD中返回相关的连接模式QPN。CM连接消息的格式和IB连接设置过程在[IB_ARCH]中描述。整个握手的形式如下:
REQ ----> <---- REP [or REJ(reject)] RTA ----> [or REJ(reject)]
REQ ----> <---- REP [or REJ(reject)] RTA ----> [or REJ(reject)]
The CM messages include, among other parameters, the Service-ID, Local connection-mode QPN, and the payload size to use over the connection.
除其他参数外,CM消息包括服务ID、本地连接模式QPN和通过连接使用的有效负载大小。
Note: The IB connection is set up using the Service-ID as defined in Section 3.5 below. The node MUST keep a record of IB connections it is participating in. The node MAY attempt another connection to the remote peer using the same Service-ID as used for an existing IB connection. Similarly, the receiver of such a connection MAY drop the request with a suitable error indication in the CM response. The decision to accept or initiate multiple connections from or to an IPoIB interface is left to the implementation.
注:IB连接使用下面第3.5节中定义的服务ID进行设置。节点必须保留其参与的IB连接的记录。该节点可以尝试使用与现有IB连接相同的服务ID与远程对等方建立另一个连接。类似地,这种连接的接收器可以在CM响应中丢弃具有适当错误指示的请求。从IPoIB接口接受或启动多个连接的决定权留给实现。
The node that initiated the connection is aware of the target node's IP address as described above. The node receiving the IB connection request, however, cannot determine the initiating node's link address. To enable this determination, every CM message exchanged in setting up the IB connection MUST include the sender's IPoIB-UD QPN in the "private data" [IB_ARCH] field. The IPoIB-UD QPN MUST be included in all "REJ" [IB_ARCH] messages too.
如上所述,发起连接的节点知道目标节点的IP地址。但是,接收IB连接请求的节点无法确定发起节点的链路地址。为实现此确定,在建立IB连接时交换的每个CM消息必须在“专用数据”[IB_ARCH]字段中包含发送方的IPoIB UD QPN。IPoIB UD QPN也必须包含在所有“REJ”[IB_ARCH]消息中。
To ensure that two IB connections are not set up between the peers due to REQ crossing, the following rules MUST be followed:
为确保由于REQ交叉,对等端之间未建立两个IB连接,必须遵守以下规则:
The receiver forms the remote node's link-layer address using the UD QPN received in the "private data" field of the "REQ" message and the GID of the sender included in the "REQ" message. The link-layer address is used to determine if there is already an
接收机使用在“REQ”消息的“私有数据”字段中接收的UD QPN和包括在“REQ”消息中的发送方的GID来形成远程节点的链路层地址。链路层地址用于确定是否已经存在
outstanding connection request "REQ" sent by the local interface to the given received link-layer address. If such an outstanding request is determined, then the two link-layer addresses (local and remote) are numerically compared. If the local link-layer address is numerically smaller, then the connection is accepted, otherwise rejected. The error code in "REJ" MAD is set to "Consumer Reject" [IB_ARCH].
本地接口发送到给定接收链路层地址的未完成连接请求“REQ”。如果确定了这样一个未完成的请求,则对两个链路层地址(本地和远程)进行数字比较。如果本地链路层地址的数值较小,则接受连接,否则拒绝连接。“REJ”MAD中的错误代码设置为“消费者拒绝”[IB_ARCH]。
Note: The link-layer addresses formed for comparison zero out the connection mode flags specified in Section 3.1. The comparison is performed from the most significant octet to the least significant octet of the link-layer address.
注:为比较而形成的链路层地址使第3.1节中规定的连接模式标志为零。从链路层地址的最高有效八位字节到最低有效八位字节执行比较。
The above holds even if the receiver supports multiple IB connections from the same peer. This is to ensure that only one more connection is set up when the "REQ" messages cross.
即使接收器支持来自同一对等方的多个IB连接,上述情况仍然成立。这是为了确保当“REQ”消息交叉时,只会再建立一个连接。
IB connections created through IPoIB-CM are considered part of an IPoIB interface. As such, they SHOULD be torn down when the IPoIB interfaces they are associated with are torn down.
通过IPoIB CM创建的IB连接被视为IPoIB接口的一部分。因此,当与它们相关联的IPoIB接口被拆除时,它们应该被拆除。
Furthermore, the IB connection between two peers MAY be torn down by either peer whenever the address resolution entry expires. An implementation is free to implement alternative policies for tearing down of IB connections between peers.
此外,每当地址解析条目过期时,两个对等方之间的IB连接可能会被任一对等方中断。一个实现可以自由地实现用于断开对等点之间IB连接的替代策略。
The InfiniBand specification defines a block of Service-IDs for IETF use. The InfiniBand specification has left the definition and management of this block to the IETF [IB_ARCH]. The 64-bit block is as follows:
InfiniBand规范定义了IETF使用的服务ID块。InfiniBand规范将此块的定义和管理留给了IETF[ibu ARCH]。64位块如下所示:
+--------+--------+--------+--------+-------+--------+--------+------+ |00000001|<-------------------IETF use------------------------------>| +--------+--------+--------+--------+-------+--------+--------+------+
+--------+--------+--------+--------+-------+--------+--------+------+ |00000001|<-------------------IETF use------------------------------>| +--------+--------+--------+--------+-------+--------+--------+------+
The Service-IDs used by IPoIB will be in the following format:
IPoIB使用的服务ID将采用以下格式:
+--------+--------+--------+--------+-------+-------+--------+-------+ |00000001| Type | Reserved | QPN | +--------+--------+--------+--------+-------+-------+--------+-------+
+--------+--------+--------+--------+-------+-------+--------+-------+ |00000001| Type | Reserved | QPN | +--------+--------+--------+--------+-------+-------+--------+-------+
The "Type" field MUST be set to 0.
“类型”字段必须设置为0。
The "Reserved" field MUST be set to zeros.
“保留”字段必须设置为零。
The QPN MUST be the UD QP exchanged during address resolution.
QPN必须是地址解析期间交换的UD QP。
All IP datagrams transported over InfiniBand are prefixed by a 4-octet encapsulation header as described in [RFC4391].
所有通过InfiniBand传输的IP数据报都以[RFC4391]中所述的4-octet封装头作为前缀。
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | Type | Reserved | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | Type | Reserved | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The type field SHALL indicate the encapsulated protocol as per the following table.
类型字段应根据下表指示封装协议。
+----------+-------------+ | Type | Protocol | |------------------------| | 0x800 | IPv4 | |------------------------| | 0x86DD | IPv6 | +------------------------+
+----------+-------------+ | Type | Protocol | |------------------------| | 0x800 | IPv4 | |------------------------| | 0x86DD | IPv6 | +------------------------+
These values are taken from the "ETHER TYPE" numbers assigned by Internet Assigned Numbers Authority (IANA). Other network protocols, identified by different values of "ETHER TYPE", may use the encapsulation format defined herein, but such use is outside of the scope of this document.
这些值取自互联网分配号码管理局(IANA)分配的“乙醚类型”号码。由不同的“乙醚类型”值标识的其他网络协议可以使用本文定义的封装格式,但这种使用不在本文档的范围内。
The IB connection setup might be used for both IPv4 and IPv6 or it could be used for only one of them while a different connection is used for the other. The link MTU MUST be able to support the minimum MTU required by the protocols.
IB连接设置可以同时用于IPv4和IPv6,也可以仅用于其中一个,而另一个使用不同的连接。链路MTU必须能够支持协议要求的最小MTU。
The default MTU of the IPoIB-CM interface is 2044 octets, i.e., 2048-octet IPoIB-link MTU minus the 4-octet encapsulation header.
IPoIB CM接口的默认MTU为2044个八位字节,即2048个八位字节的IPoIB链路MTU减去4个八位字节的封装头。
However, connected modes of InfiniBand allow message sizes up to 2^31 octets. Therefore, IPoIB-CM can use a much larger MTU for unicast communication between any two endpoints. The maximum and/or optimal payload that can be received or sent over an InfiniBand connection is dependent on the implementation, IB Channel Adapter, and the resources configured.
但是,InfiniBand的连接模式允许消息大小达到2^31个八位字节。因此,IPoIB CM可以在任意两个端点之间使用更大的MTU进行单播通信。通过InfiniBand连接可以接收或发送的最大和/或最佳负载取决于实现、IB通道适配器和配置的资源。
An implementation MAY utilize the following mechanism to exchange the optimal message size across the IB connection.
实现可以利用以下机制在IB连接上交换最佳消息大小。
Every IB connection setup message includes a "private data" field [IB_ARCH]. The "private data" field in the connection setup message (CM REQ) MUST include the "Receive MTU". This indicates the maximum packet size the requester can accept. The requester MUST be able to accept smaller MTU sizes as well.
每个IB连接设置消息都包含一个“专用数据”字段[IB_ARCH]。连接设置消息(CM REQ)中的“专用数据”字段必须包含“接收MTU”。这表示请求者可以接受的最大数据包大小。请求者也必须能够接受较小的MTU尺寸。
It is up to the implementation to utilize this mechanism for setting the per-IB connection MTU. To calculate the resultant IPoIB MTU over the connection the smaller of the two IB "Receive MTU" values is used by both the peers. The IPoIB interface must also account for the 4- octet encapsulation header and so the IPoIB MTU over the connection will be further reduced by that amount.
利用此机制设置每IB连接MTU取决于实现。为了计算连接上产生的IPoIB MTU,两个对等方使用两个IB“接收MTU”值中较小的值。IPoIB接口还必须考虑4-octet封装头,因此连接上的IPoIB MTU将进一步减少该数量。
The "private data" field in every CM message for connection establishment must include the following values:
每个CM“连接”字段中的“建立”值必须包括以下值:
1. UD QPN of the sender 2. Receive MTU supported by the sender
1. 发送方2的UD QPN。接收发送方支持的MTU
The format of the "private data" field MUST be as follows:
“专用数据”字段的格式必须如下所示:
0 7 15 23 31 +--------+--------+--------+--------+ |Reserved| UD QPN | +--------+--------+--------+--------+ | Receive MTU | +--------+--------+--------+--------+
0 7 15 23 31 +--------+--------+--------+--------+ |Reserved| UD QPN | +--------+--------+--------+--------+ | Receive MTU | +--------+--------+--------+--------+
The Reserved value MUST be set to zero on transmit and ignored on receive.
传输时保留值必须设置为零,接收时忽略。
Every IPoIB interface supports IPoIB-UD. It may additionally support one or both of the IPoIB-CM modes. Therefore, there can be multiple methods of communicating between any two peers. This implies that an interface MAY transmit/receive a packet over any of the RC, UC, or UD modes depending on the modes supported between it and the peer. It further follows that every IPoIB implementation compliant with this document MUST accept all IP unicast transmissions over any of the IPoIB modes it supports. Multicast and broadcast packets by their nature will always be transmitted and received over the IPoIB-UD QP. Additionally, all address resolution responses (ARP or Neighbor Discovery) MUST always be encapsulated in a UD mode packet.
每个IPoIB接口都支持IPoIB UD。它还可以支持一种或两种IPoIB CM模式。因此,任何两个对等方之间可以有多种通信方法。这意味着接口可以通过任何RC、UC或UD模式发送/接收数据包,这取决于它与对等方之间支持的模式。此外,符合本文档的每个IPoIB实现必须接受通过其支持的任何IPoIB模式进行的所有IP单播传输。多播和广播数据包的性质将始终通过IPoIB UD QP传输和接收。此外,所有地址解析响应(ARP或邻居发现)必须始终封装在UD模式数据包中。
The RC mode of InfiniBand guarantees in-order delivery of packets. Every message transmitted over the RC connection is broken into physical MTU-sized packets by the RC connection. If any packet is lost, it is retransmitted until the complete message is exchanged. Therefore, there is a possibility of an upper transport layer experiencing a timeout, while the RC layer is still in the process of transferring the complete message. TCP will view the timeout as an indicator of congestion and enter slow-start thereby affecting throughput drastically [RFC2581]. Other upper-layer protocols might insert retransmissions into the fabric, adding to the already existing congestion.
InfiniBand的RC模式保证了数据包的有序传递。通过RC连接传输的每条消息都被RC连接分解为MTU大小的物理数据包。如果任何数据包丢失,它将被重新传输,直到交换完整的消息。因此,当RC层仍在传输完整消息的过程中时,上层传输层有可能经历超时。TCP将超时视为拥塞的指示器,并进入慢速启动,从而显著影响吞吐量[RFC2581]。其他上层协议可能会将重传插入到结构中,从而增加已经存在的拥塞。
The applicability of Infiniband reliability is on a fabric with short latencies (not wide area). Therefore, the RC timer values should be short compared with the starting minimum time values used by the upper end-to-end transports. In addition, because the RC mode does not have measurement-based reliable transmission, its use over fabrics with long latency or very dynamic latency may be a concern for congestion-aware traffic traversing those fabrics.
Infiniband可靠性适用于具有短延迟(而非广域)的结构。因此,与高端端到端传输使用的起始最小时间值相比,RC定时器值应该较短。此外,由于RC模式没有基于测量的可靠传输,因此在具有长延迟或非常动态延迟的结构上使用RC模式可能会引起通过这些结构的拥塞感知流量的关注。
As described above, interfaces on the same subnet may support different link MTUs based on the negotiated value or due to the link type (UD or connected mode). Therefore, an implementation might choose to define a large IP MTU, which is reduced based on the MTU to the destination. The relevant MTU may be stored in a suitable per-destination object, such as a route cache or a neighbor cache. The per-destination MTU is known to the IPoIB-CM interface as described in Section 5.
如上所述,基于协商值或由于链路类型(UD或连接模式),同一子网上的接口可能支持不同的链路MTU。因此,一个实现可能会选择定义一个大型的IP MTU,该MTU根据目标的MTU而减少。相关MTU可以存储在适当的每个目的地对象中,例如路由缓存或邻居缓存。IPoIB CM接口已知每个目的地MTU,如第5节所述。
Implementations might choose not to support differing MTU values and always support an MTU equal to the IPoIB-UD MTU determined from the broadcast GID.
实现可能选择不支持不同的MTU值,并且始终支持等于根据广播GID确定的IPoIB UD MTU的MTU。
An impostor may return a false set of flags to an IPOIB interface. This may cause unnecessary attempts and some delay/disruption in IPoIB communication. The same is the case if wrong/spurious QPN values are provided during address resolution broadcast/multicast.
冒名顶替者可能会向IPOIB接口返回一组错误的标志。这可能会导致不必要的尝试以及IPoIB通信中的一些延迟/中断。如果在地址解析广播/多播期间提供了错误/虚假的QPN值,则情况也是如此。
Future uses of the reserved bits and octets in the link-layer address (Section 3.1), Service-ID (Section 3.5), and "Private-Data Format" (Section 6) MUST be published as RFCs. This document requires that the reserved bits be set to zero on sends.
链路层地址(第3.1节)、服务ID(第3.5节)和“专用数据格式”(第6节)中保留位和八位字节的未来使用必须作为RFC发布。本文档要求在发送时将保留位设置为零。
The author thanks the IPoIB Working Group for the various comments and suggestions. A special thanks to Bernie King-Smith and Dror Goldenberg for the detailed review and suggestions.
作者感谢IPoIB工作组提出的各种意见和建议。特别感谢伯尼·金·史密斯和德罗·戈登伯格的详细评论和建议。
[IB_ARCH] InfiniBand Architecture Specification, version 1.2 www.infinibandta.org
[IB_ARCH]InfiniBand体系结构规范,版本1.2 www.infinibandta.org
[RFC4392] Kashyap, V., "IP over InfiniBand (IPoIB) Architecture", RFC 4392, April 2006.
[RFC4392]Kashyap,V.,“InfiniBand上的IP(IPoIB)架构”,RFC 4392,2006年4月。
[RFC4391] Chu, J. and V. Kashyap, "Transmission of IP over InfiniBand (IPoIB)", RFC 4391, April 2006.
[RFC4391]Chu,J.和V.Kashyap,“InfiniBand上的IP传输(IPoIB)”,RFC 4391,2006年4月。
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control ", RFC 2581, April 1999.
[RFC2581]Allman,M.,Paxson,V.和W.Stevens,“TCP拥塞控制”,RFC 25811999年4月。
Author's Address
作者地址
Vivek Kashyap 15350, SW Koll Parkway Beaverton OR 97006
Vivek Kashyap 15350,科尔公园大道西南比弗顿或97006
Phone: +1 503 578 3422 EMail: vivk@us.ibm.com
Phone: +1 503 578 3422 EMail: vivk@us.ibm.com
Full Copyright Statement
完整版权声明
Copyright (C) The IETF Trust (2006).
版权所有(C)IETF信托基金(2006年)。
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
本文件受BCP 78中包含的权利、许可和限制的约束,除其中规定外,作者保留其所有权利。
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST, AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
本文件及其包含的信息以“原样”为基础提供,贡献者、他/她所代表或赞助的组织(如有)、互联网协会、IETF信托基金和互联网工程任务组不承担任何明示或暗示的担保,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。
Intellectual Property
知识产权
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
IETF对可能声称与本文件所述技术的实施或使用有关的任何知识产权或其他权利的有效性或范围,或此类权利下的任何许可可能或可能不可用的程度,不采取任何立场;它也不表示它已作出任何独立努力来确定任何此类权利。有关RFC文件中权利的程序信息,请参见BCP 78和BCP 79。
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
向IETF秘书处披露的知识产权副本和任何许可证保证,或本规范实施者或用户试图获得使用此类专有权利的一般许可证或许可的结果,可从IETF在线知识产权存储库获取,网址为http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
IETF邀请任何相关方提请其注意任何版权、专利或专利申请,或其他可能涵盖实施本标准所需技术的专有权利。请将信息发送至IETF的IETF-ipr@ietf.org.
Acknowledgement
确认
Funding for the RFC Editor function is currently provided by the Internet Society.
RFC编辑功能的资金目前由互联网协会提供。