Network Working Group                                             J. Chu
Request for Comments: 4391                              Sun Microsystems
Category: Standards Track                                     V. Kashyap
                                                                     IBM
                                                              April 2006
        

Transmission of IP over InfiniBand (IPoIB)

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This document specifies a method for encapsulating and transmitting IPv4/IPv6 and Address Resolution Protocol (ARP) packets over InfiniBand (IB). It describes the link-layer address to be used when resolving the IP addresses in IP over InfiniBand (IPoIB) subnets. The document also describes the mapping from IP multicast addresses to InfiniBand multicast addresses. In addition, this document defines the setup and configuration of IPoIB links.

Table of Contents

   1. Introduction
   2. IP over UD Mode
   3. InfiniBand Datalink
   4. Multicast Mapping
      4.1. Broadcast-GID Parameters
   5. Setting Up an IPoIB Link
   6. Frame Format
   7. Maximum Transmission Unit
   8. IPv6 Stateless Autoconfiguration
      8.1. IPv6 Link-Local Address
   9. Address Mapping - Unicast
      9.1. Link Information
           9.1.1. Link-Layer Address/Hardware Address
           9.1.2. Auxiliary Link Information
      9.2. Address Resolution in IPv4 Subnets
      9.3. Address Resolution in IPv6 Subnets
      9.4. Cautionary Note on QPN Caching
   10. Sending and Receiving IP Multicast Packets
   11. IP Multicast Routing
   12. New Types of Vulnerability in IB Multicast
   13. Security Considerations
   14. IANA Considerations
   15. Acknowledgements
   16. References
      16.1. Normative References
      16.2. Informative References
        
1. Introduction

The InfiniBand specification [IBTA] can be found at http://www.infinibandta.org. The document [RFC4392] provides a short overview of InfiniBand architecture (IBA) along with considerations for specifying IP over InfiniBand networks.

IBA defines multiple modes of transport over which IP may be implemented. The Unreliable Datagram (UD) transport mode best matches the needs of IP and the need for universality as described in [RFC4392].

This document specifies IPoIB over IB's UD mode. The implementation of IP subnets over IB's other transport mechanisms is out of scope of this document.

This document describes the steps required to lay out an IP network on top of an IB network. It describes all the elements of an IPoIB link, how to configure its associated attributes, and how to set up basic broadcast and multicast services for it.

It further describes IP address resolution and the encapsulation of IP and Address Resolution Protocol (ARP) packets in InfiniBand frames.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. IP over UD Mode

The unreliable datagram mode of communication is supported by all IB elements, be they IB routers, Host Channel Adapters (HCAs), or Target Channel Adapters (TCAs). In addition to being the only universal transmission method, it supports multicasting, partitioning, and a 32-bit Cyclic Redundancy Check (CRC) [IBTA]. Though multicasting support is optional in IB fabrics, IPoIB architecture requires the participating components to support it.

All IPoIB implementations MUST support IP over the UD transport mode of IBA.

3. InfiniBand Datalink

An IB subnet is formed by a network of IB nodes interconnected either directly or via IB switches. IB subnets may be connected using IB routers to form a fabric made of multiple IB subnets. Nodes residing in different IB subnets can communicate directly with one another through IB routers at the IB network layer. Multiple IP subnets may be overlaid over this IB network.

An IP subnet is configured over a communication facility or medium over which nodes can communicate at the "link" layer [IPV6]. For example, an ethernet segment is a link formed by interconnected switches/hubs/bridges. The segment is therefore defined by the physical topology of the network. This is not the case with IPoIB. IPoIB subnets are built over an abstract "link". The link is defined by its members and common characteristics such as the P_Key, link MTU, and the Q_Key.

Any two ports using UD communication mode in an IB fabric can communicate only if they are in the same partition (i.e., have the same P_Key and the same Q_Key) [RFC4392]. The link MTU provides a limit to the size of the payload that may be used. The packet transmission and routing within the IB fabric are also affected by additional parameters such as the traffic class (TClass), hop limit (HopLimit), service level (SL), and the flow label (FlowLabel) [RFC4392]. The determination and use of these values for IPoIB communication are described in the following sections.

4. Multicast Mapping

IB identifies multicast groups by the Multicast Global Identifiers (MGIDs), which follow the same rules as IPv6 multicast addresses. Hence the MGIDs follow the same rules regarding the transient addresses and scope bits albeit in the context of the IB fabric. The resultant address therefore resembles IPv6 multicast addresses. The documents [IBTA, RFC4392] give a detailed description of IB multicast.

The IPoIB multicast mapping is depicted in figure 1. The same mapping function is used for both IPv4 and IPv6 except for the IPoIB signature field.

Unless explicitly stated, all addresses and fields in the protocol headers in this document are stored in the network byte order.

   |   8    |  4 |  4 |     16 bits     | 16 bits |      80 bits      |
   +--------+----+----+-----------------+---------+-------------------+
   |11111111|0001|scop|<IPoIB signature>|< P_Key >|      group ID     |
   +--------+----+----+-----------------+---------+-------------------+

Figure 1

Since an MGID allocated for transporting IP multicast datagrams is considered only a transient link-layer multicast address [RFC4392], all IB MGIDs allocated for IPoIB purpose MUST set T-flag to 1 [IBTA].

A special signature is embedded to identify the MGID for IPoIB use only. For IPv4 over IB, the signature MUST be "0x401B". For IPv6 over IB, the signature MUST be "0x601B".

The IP multicast address is used together with a given IPoIB link P_Key to form the MGID of the IB multicast group. For IPv6, the lower 80 bits of the group ID are used directly in the lower 80 bits of the MGID. For IPv4, the group ID is only 28 bits long, and is placed directly in the lower 28 bits of the MGID. The rest of the group ID bits in the MGID are filled with 0.

E.g., on an IPoIB link that is fully contained within a single IB subnet with a P_Key of 0x8000, the MGIDs for the all-router multicast group with group ID 2 [AARCH, IGMP3] are:

FF12:401B:8000::2, for IPv4 in compressed format, and FF12:601B:8000::2, for IPv6 in compressed format.
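
As an illustration of the mapping in Figure 1, the following Python sketch builds an MGID from an IP multicast address, a P_Key, and the scope bits. The function name ip_multicast_to_mgid, the default scope of 0x2, and the use of the ipaddress module to hold the 128-bit value are assumptions made for this example; they are not defined by this document.

```python
import ipaddress

IPV4_SIGNATURE = 0x401B  # IPoIB signature for IPv4 over IB
IPV6_SIGNATURE = 0x601B  # IPoIB signature for IPv6 over IB


def ip_multicast_to_mgid(group, p_key, scope=0x2):
    """Map an IP multicast address to an IPoIB MGID (layout of Figure 1)."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IP multicast address")
    if addr.version == 4:
        signature = IPV4_SIGNATURE
        group_id = int(addr) & ((1 << 28) - 1)  # low 28 bits, rest stays 0
    else:
        signature = IPV6_SIGNATURE
        group_id = int(addr) & ((1 << 80) - 1)  # low 80 bits
    mgid = (
        (0xFF << 120)                # 8 bits: 11111111
        | (0x1 << 116)               # 4 bits: flags with T-flag set to 1
        | ((scope & 0xF) << 112)     # 4 bits: scop
        | (signature << 96)          # 16 bits: IPoIB signature
        | ((p_key & 0xFFFF) << 80)   # 16 bits: P_Key
        | group_id                   # 80 bits: group ID
    )
    # An MGID has the same shape as an IPv6 address, so IPv6Address is used
    # here purely as a convenient 128-bit container.
    return ipaddress.IPv6Address(mgid)
```

With a P_Key of 0x8000 and the default scope, calling the function with "224.0.0.2" and with "ff02::2" reproduces the two MGIDs given in the example above.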

A special case exists for the IPv4 limited broadcast address "255.255.255.255" [HOSTS]. The address SHALL be mapped to the "broadcast-GID", which is defined as follows:

   |   8    |  4 |  4 |     16 bits    | 16 bits | 48 bits  | 32 bits |
   +--------+----+----+----------------+---------+----------+---------+
   |11111111|0001|scop|0100000000011011|< P_Key >|00.......0|<all 1's>|
   +--------+----+----+----------------+---------+----------+---------+
        

Figure 2

All MGIDs used in the IPoIB subnet MUST use the same scop bits as in the corresponding broadcast-GID.
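
The layout in Figure 2 and the scope rule above can be sketched in Python as follows. The helper names broadcast_gid, scope_of, and uses_broadcast_scope are hypothetical, and the 128-bit values are held in ipaddress.IPv6Address objects only for convenience.

```python
import ipaddress


def broadcast_gid(p_key, scope=0x2):
    """Build the broadcast-GID of Figure 2 for a given P_Key and scope."""
    value = (
        (0xFF << 120)                # 8 bits: 11111111
        | (0x1 << 116)               # 4 bits: 0001
        | ((scope & 0xF) << 112)     # 4 bits: scop
        | (0x401B << 96)             # 16 bits: IPv4 IPoIB signature
        | ((p_key & 0xFFFF) << 80)   # 16 bits: P_Key
        | 0xFFFFFFFF                 # 48 zero bits, then 32 one bits
    )
    return ipaddress.IPv6Address(value)


def scope_of(gid):
    """Extract the 4 scop bits from an MGID."""
    return (int(ipaddress.IPv6Address(gid)) >> 112) & 0xF


def uses_broadcast_scope(mgid, bcast_gid):
    """Check that an MGID carries the same scop bits as the broadcast-GID."""
    return scope_of(mgid) == scope_of(bcast_gid)
```

An implementation could run a check of this kind before joining any other multicast group on the link.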

4.1. Broadcast-GID Parameters

The broadcast-GID is set up with the following attributes:

1. P_Key

A "Full Membership" P_Key (high-order bit is set to 1) MUST be used so that all members may communicate with one another.

2. Q_Key

It is RECOMMENDED that a controlled Q_Key be used with the high-order bit set. This is to prevent non-privileged software from fabricating and sending out bogus IP datagrams.

3. IB MTU

The value assigned to the broadcast-GID must not be greater than any physical link MTU spanned by the IPoIB subnet.

The following attributes are required in multicast transmissions and also in unicast transmissions if an IPoIB link covers more than a single IB subnet.

4. Other parameters

The selection of TClass, FlowLabel, and HopLimit values is implementation dependent. But it must take into account the topology of IB subnets comprising the IPoIB link in order to allow successful communication between any two nodes in the same IPoIB link.

An SL also needs to be assigned to the broadcast-GID. This SL is used in all multicast communication in the subnet.

The broadcast-GID's scope bits need to be set based on whether the IPoIB link is confined within an IB subnet or spans multiple IB subnets. A default of local-subnet scope (i.e., 0x2) is RECOMMENDED. A node might determine the scope bits to use by iteratively searching for a broadcast-GID of progressively greater scope, starting with the local scope. Alternatively, an implementation might include the scope bits as a configuration parameter.
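
The attribute requirements of this section lend themselves to a simple configuration check. The Python sketch below is only an illustration: the function name check_broadcast_group is hypothetical, and the 32-bit width of the Q_Key is taken from the InfiniBand architecture rather than from this document.

```python
FULL_MEMBER_BIT = 0x8000          # high-order bit of the 16-bit P_Key
CONTROLLED_QKEY_BIT = 0x80000000  # high-order bit of the Q_Key (assumed 32 bits)


def check_broadcast_group(p_key, q_key, group_mtu, physical_link_mtus):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if not p_key & FULL_MEMBER_BIT:
        findings.append("P_Key is not a Full Membership P_Key (MUST)")
    if not q_key & CONTROLLED_QKEY_BIT:
        findings.append("Q_Key is not a controlled Q_Key (RECOMMENDED)")
    # The group MTU must not exceed any physical link MTU spanned by the subnet
    if group_mtu > min(physical_link_mtus):
        findings.append("group MTU exceeds the smallest physical link MTU")
    return findings
```

For example, a P_Key of 0x8000 with a Q_Key whose high-order bit is set and a group MTU of 2048 over links of 2048 and 4096 octets produces no findings.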

5. Setting Up an IPoIB Link

The broadcast-GID, as defined in the previous section, MUST be set up for an IPoIB subnet to be formed. Every IPoIB interface MUST "FullMember" join the IB multicast group defined by the broadcast-GID. This multicast group will henceforth be referred to as the broadcast group. The join operation returns the MTU, the Q_Key, and other parameters associated with the broadcast group. The node then associates the parameters received as a result of the join operation with its IPoIB interface. The broadcast group also serves to provide a link-layer broadcast service for protocols like ARP, net-directed, subnet-directed, and all-subnets-directed broadcasts in IPv4 over IB networks.

The join operation is successful only if the Subnet Manager (SM) determines that the joining node can support the MTU registered with the broadcast group [RFC4392] ensuring support for a common link MTU. The SM also ensures that all the nodes joining the broadcast-GID have paths to one another and can therefore send and receive unicast packets. It further ensures that all the nodes do indeed form a multicast tree that allows packets sent from any member to be replicated to every other member. Thus, the IPoIB link is formed by the IPoIB nodes joining the broadcast group. There is no physical demarcation of the IPoIB link other than that determined by the broadcast group membership.
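
The MTU check performed at join time can be pictured with a toy model. The Python sketch below is not an SM interface: the class BroadcastGroup, its join method, and the returned dictionary are hypothetical names chosen for this example, and path and multicast-tree setup are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class BroadcastGroup:
    """Toy stand-in for the state kept for the broadcast group."""
    mtu: int
    q_key: int
    members: set = field(default_factory=set)

    def join(self, port_gid, port_mtu):
        """Admit a port only if it can support the MTU registered with the group."""
        if port_mtu < self.mtu:
            raise ValueError("port cannot support the broadcast group MTU")
        self.members.add(port_gid)
        # The joining node associates these parameters with its IPoIB interface
        return {"mtu": self.mtu, "q_key": self.q_key}
```

In this model the IPoIB link is simply the set of members; a port that fails the MTU check never becomes part of the link.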

The P_Key is a configuration parameter that must be known before the broadcast-GID can be formed. For a node to join a partition, one of its ports must be assigned the relevant P_Key by the SM [RFC4392].

The method of creation of the broadcast group and the assignment/choice of its parameters are up to the implementation and/or the administrator of the IPoIB subnet. The broadcast group may be created by the first IPoIB node to be initialized, or it can be created administratively before the IPoIB subnet is set up. It is RECOMMENDED that the creation and deletion of the broadcast group be under administrative control.

InfiniBand multicast management, which includes the creation, joining, and leaving of IB multicast groups by IB nodes, is described in [RFC4392].

6. Frame Format

All IP and ARP datagrams transported over InfiniBand are prefixed by a 4-octet encapsulation header as illustrated below.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |                               |
   |         Type                  |       Reserved                |
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 3

The "Reserved" field MUST be set to zero on send and ignored on receive unless specified differently in a future document.

The "Type" field SHALL indicate the encapsulated protocol as per the following table.

                      +----------+-------------+
                      | Type     |    Protocol |
                      +----------+-------------+
                      | 0x800    |    IPv4     |
                      +----------+-------------+
                      | 0x806    |    ARP      |
                      +----------+-------------+
                      | 0x8035   |    RARP     |
                      +----------+-------------+
                      | 0x86DD   |    IPv6     |
                      +----------+-------------+
        

Table 1

These values are taken from the "ETHER TYPE" numbers assigned by Internet Assigned Numbers Authority (IANA) [IANA]. Other network protocols, identified by different values of "ETHER TYPE", may use the encapsulation format defined herein, but such use is outside of the scope of this document.
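
A minimal Python sketch of the 4-octet encapsulation header of Figure 3 and the Type values of Table 1 follows. The function names encapsulate and decapsulate are hypothetical; rejecting Type values outside Table 1 on send is a choice made for this example, since other protocols are merely out of scope here rather than forbidden.

```python
import struct

# Type values from Table 1
ETHER_TYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x8035: "RARP", 0x86DD: "IPv6"}

# 16-bit Type followed by 16-bit Reserved, in network byte order
_HEADER = struct.Struct("!HH")


def encapsulate(ether_type, datagram):
    """Prefix a datagram with the encapsulation header (Reserved set to zero)."""
    if ether_type not in ETHER_TYPES:
        raise ValueError(f"Type 0x{ether_type:04X} is not listed in Table 1")
    return _HEADER.pack(ether_type, 0) + datagram


def decapsulate(payload):
    """Split an IB payload into (Type, datagram); Reserved is ignored."""
    if len(payload) < _HEADER.size:
        raise ValueError("payload is shorter than the 4-octet header")
    ether_type, _reserved = _HEADER.unpack_from(payload)
    return ether_type, payload[_HEADER.size:]
```

Note that decapsulate deliberately does not inspect the Reserved field, which matches the "ignored on receive" rule.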

   |<------ IB Frame headers -------->|<- Payload ->|<- IB trailers ->|
   +-------+------+---------+---------+-------------+---------+-------+
   |Local  |      |Base     |Datagram |   4-octet   |         |       |
   |Routing| GRH* |Transport|Extended |   header    |Invariant|Variant|
   |Header |Header|Header   |Transport|      +      |  CRC    |  CRC  |
   |       |      |         |Header   |   IP/ARP    |         |       |
   +-------+------+---------+---------+-------------+---------+-------+
        

Figure 4

Figure 4 depicts the IB frame encapsulating an IP/ARP datagram. The InfiniBand specification requires the use of Global Routing Header (GRH) [RFC4392] when multicasting or when an InfiniBand packet traverses from one IB subnet to another through an IB router. Its use is optional when used for unicast transmission between nodes within an IB subnet. The IPoIB implementation MUST be able to handle packets received with or without the use of GRH.

7. Maximum Transmission Unit

IB MTU: The IB components, that is, IB links, switches, Channel Adapters (CAs), and IB routers, may support maximum payloads of 256, 512, 1024, 2048, or 4096 octets. The maximum IB payload supported by the IB components in any IB path is the IB MTU for the path.

IPoIB-Link MTU: The IPoIB-link MTU is the MTU value associated with the broadcast group. The IPoIB-link MTU can be set to any value up to the smallest IB MTU supported by the IB components comprising the IPoIB link.

In order to reduce problems with fragmentation and path-MTU discovery, this document requires that all IPoIB implementations support an MTU of 2044 octets, that is, a 2048-octet IPoIB-link MTU minus the 4-octet encapsulation overhead. Larger and smaller MTUs MAY be supported subject to other existing MTU requirements [IPV6], but the default configuration must support an MTU of 2044 octets.
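
The arithmetic behind the 2044-octet figure can be written out as a short Python sketch. The function names max_ipoib_link_mtu and ip_mtu are hypothetical and serve only to illustrate the relationship between the IB MTU, the IPoIB-link MTU, and the MTU seen by IP.

```python
ENCAP_HEADER_LEN = 4                        # octets of encapsulation header
IB_MTU_VALUES = (256, 512, 1024, 2048, 4096)


def max_ipoib_link_mtu(component_mtus):
    """Largest usable IPoIB-link MTU: the smallest IB MTU among the components."""
    mtus = list(component_mtus)
    if not mtus:
        raise ValueError("at least one IB component MTU is needed")
    for mtu in mtus:
        if mtu not in IB_MTU_VALUES:
            raise ValueError(f"{mtu} is not an IB MTU value")
    return min(mtus)


def ip_mtu(ipoib_link_mtu):
    """MTU available to IP after the 4-octet encapsulation header."""
    return ipoib_link_mtu - ENCAP_HEADER_LEN
```

For components supporting 4096, 2048, and 4096 octets, the IPoIB-link MTU can be at most 2048 octets, which leaves 2044 octets for IP.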

8. IPv6 Stateless Autoconfiguration

IB architecture associates an EUI-64 identifier termed the Globally Unique Identifier (GUID) [RFC4392, IBTA] with each port. The Local Identifier (LID) is unique within an IB subnet only.

The interface identifier may be chosen from the following:

1) The EUI-64-compliant GUID assigned by the manufacturer.

2) If the IPoIB subnet is fully contained within an IB subnet, any of the unique 16-bit LIDs of the port associated with the IPoIB interface.

The LID values of a port may change after a reboot/power-cycle of the IB node. Therefore, if a persistent value is desired, it would be prudent not to use the LID to form the interface identifier.

On the other hand, the LID provides an identifier that can be used to create a more anonymous IPv6 address since the LID is not globally unique and is subject to change over time.

It is RECOMMENDED that the link-local address be constructed from the port's EUI-64 identifier as given below.

[AARCH] requires that the interface identifier be created in the "Modified EUI-64" format when derived from an EUI-64 identifier. [IBTA] is unclear if the GUID should use IEEE EUI-64 format or the "Modified EUI-64" format. Therefore, when creating an interface identifier from the GUID, an implementation MUST do the following:

=> Determine if the GUID is a modified EUI-64 identifier ("u" bit is toggled) as defined by [AARCH]

=> If the GUID is a modified EUI-64 identifier, then the "u" bit MUST NOT be toggled when creating the interface identifier

=> If the GUID is an unmodified EUI-64 identifier, then the "u" bit MUST be toggled in compliance with [AARCH]

8.1. IPv6 Link-Local Address

The IPv6 link-local address for an IPoIB interface is formed as described in [AARCH] using the interface identifier as described in the previous section.
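
The "u" bit handling of Section 8 and the link-local construction can be sketched in Python as follows. How an implementation learns whether a GUID is already in "Modified EUI-64" format is not specified here, so the sketch takes it as an input flag; the names interface_id and link_local_address are hypothetical.

```python
import ipaddress

# "u" (universal/local) bit: 0x02 in the first octet of the 64-bit identifier
U_BIT = 0x02 << 56
LINK_LOCAL_PREFIX = 0xFE80 << 112  # fe80::/64


def interface_id(guid, guid_is_modified_eui64):
    """Derive the 64-bit interface identifier from a port GUID."""
    if not 0 <= guid < (1 << 64):
        raise ValueError("GUID must fit in 64 bits")
    if guid_is_modified_eui64:
        return guid          # "u" bit MUST NOT be toggled again
    return guid ^ U_BIT      # "u" bit MUST be toggled


def link_local_address(iid):
    """Combine the link-local prefix with the interface identifier."""
    return ipaddress.IPv6Address(LINK_LOCAL_PREFIX | iid)
```

Under these assumptions, an unmodified GUID of 0x0002C90300001234 gives the interface identifier 0x0202C90300001234 and the link-local address fe80::202:c903:0:1234.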

9. Address Mapping - Unicast

Address resolution in IPv4 subnets is accomplished through Address Resolution Protocol (ARP) [ARP]. It is accomplished in IPv6 subnets using the Neighbor Discovery protocol [DISC].

9.1. Link Information

An InfiniBand packet over the UD mode includes multiple headers such as the LRH (local route header), GRH (global route header), BTH (base transport header), DETH (datagram extended transport header) as depicted in figure 4 and specified in the InfiniBand architecture [IBTA]. All these headers comprise the link-layer in an IPoIB link.

The parameters needed in these IBA headers constitute the link-layer information that needs to be determined before an IP packet may be transmitted across the IPoIB link.

The parameters that need to be determined are as follows:

a) LID

The LID is always needed. A packet always includes the LRH that is targeted at the remote node's LID, or an IB router's LID to get to the remote node in another IB subnet.

b) Global Identifier (GID)

The GID is not needed when exchanging information within an IB subnet, though it may be included in any packet. It is an absolute necessity when transmitting across IB subnets since the IB routers use the GID to correctly forward the packets. The source and destination GIDs are fields included in the GRH.

The GID, if formed using the GUID, can be used to unambiguously identify an endpoint.

c) Queue Pair Number (QPN)

Every unicast UD communication is always directed to a particular queue pair (QP) at the peer.

d) Q_Key

A Q_Key is associated with each Unreliable Datagram QPN. The received packets must contain a Q_Key that matches the QP's Q_Key to be accepted.

e) P_Key

A successful communication between two IB nodes using UD mode can occur only if the two nodes have compatible P_Keys. This is referred to as being in the same partition [IBTA].

f) SL

Every IBA packet contains an SL value. A path in IBA is defined by the three-tuple (source LID, destination LID, SL). The SL in turn is mapped to a virtual lane (VL) at every CA and switch that sends or forwards the packet [RFC4392]. Multiple SLs may be used between two endpoints to provide for load balancing. SLs may be used for providing a Quality of Service (QoS) infrastructure, or may be used to avoid deadlocks in the IBA fabric.

Another auxiliary piece of information, not included in the IBA headers, is the following:

g) Path rate

IBA defines multiple link speeds. A higher-speed transmitter can swamp switches and the CAs. To avoid such congestion, every source transmitting at greater than 1x speeds is required to determine the "path rate" before the data may be transmitted [IBTA].

9.1.1. Link-Layer Address/Hardware Address

Though the list of information required for a successful transmittal of an IPoIB packet is large, not all the information need be determined during the IP address resolution process.

The 20-octet IPoIB link-layer address used in the source/target link-layer address option in IPv6 and the "hardware address" in IPv4/ARP have the same format.

The format is as described below:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |    Reserved   |              Queue Pair Number                |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       +                                                               +
       |                                                               |
       +                            GID                                +
       |                                                               |
       +                                                               +
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 5

图5

a) Reserved Flags

a) 保留标志

These 8 bits are reserved for future use. These bits MUST be set to zero on send and ignored on receive unless specified differently in a future document.

这8位保留供将来使用。除非在将来的文档中另有规定,否则这些位在发送时必须设置为零,在接收时必须忽略。

b) QPN

Every unicast communication in IB architecture is directed to a specific QP [IBTA]. This QP number is included in the link description. All IP communication to the relevant IPoIB interface MUST be directed to this QPN. In the case of IPv4 subnets, the Address Resolution Protocol (ARP) reply packets are also directed to the same QPN.

IB体系结构中的每个单播通信都指向特定的QP[IBTA]。此QP编号包含在链接描述中。与相关IPoIB接口的所有IP通信必须定向到此QPN。在IPv4子网的情况下,地址解析协议(ARP)应答包也被定向到相同的QPN。

The choice of the QPN for IP/ARP communication is up to the implementation.

IP/ARP通信的QPN选择取决于实现。

c) GID

c) GID

This is one of the GIDs of the port associated with the IPoIB interface [IBTA]. IB associates multiple GIDs with a port. It is RECOMMENDED that the GID formed by the combination of the IB subnet prefix and the port's "Port GUID" [IBTA] be included in the link-layer/hardware address.

这是与IPoIB接口[IBTA]关联的端口的GID之一。IB将多个GID与一个端口关联。建议在链路层/硬件地址中包括由IB子网前缀和端口的“端口GUID”[IBTA]组合形成的GID。
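
The layout in Figure 5 can be sketched in Python as a pack/unpack pair. This is a minimal illustration, not part of the specification: the function names and the constant are hypothetical, and the only facts taken from the text are the 8 reserved bits (zero on send, ignored on receive), the 24-bit QPN, and the 16-octet GID.

```python
import struct
from typing import Tuple

IPOIB_HWADDR_LEN = 20  # 1 reserved octet + 3-octet QPN + 16-octet GID


def pack_ipoib_hwaddr(qpn: int, gid: bytes) -> bytes:
    """Build the 20-octet link-layer address laid out in Figure 5."""
    if not 0 <= qpn < (1 << 24):
        raise ValueError("QPN must fit in 24 bits")
    if len(gid) != 16:
        raise ValueError("GID must be exactly 16 octets")
    # The reserved octet is zero on send; the QPN fills the low 24 bits
    # of the first 32-bit word, in network byte order.
    return struct.pack("!I", qpn) + gid


def unpack_ipoib_hwaddr(addr: bytes) -> Tuple[int, bytes]:
    """Return (qpn, gid); the reserved octet is ignored on receive."""
    if len(addr) != IPOIB_HWADDR_LEN:
        raise ValueError("IPoIB link-layer address must be 20 octets")
    (word,) = struct.unpack("!I", addr[:4])
    return word & 0x00FFFFFF, addr[4:]
```

Masking the first word on receive, rather than rejecting a non-zero reserved octet, keeps the receiver compatible with any future document that assigns meaning to those bits.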

9.1.2. Auxiliary Link Information
9.1.2. 辅助链路信息

The rest of the parameters are determined as follows:

其余参数的确定如下所示:

a) LID

a) LID

The method of determining the peer's LID is not defined in this document. It is up to the implementation to use any of the IBA-approved methods to determine the destination LID. One such method is to use the GID determined during address resolution to retrieve the associated LID from the IB routing infrastructure or the Subnet Administrator (SA).

本文件中未定义确定对等方LID的方法。由实施方使用IBA批准的任何方法确定目的地LID。其中一种方法是使用地址解析期间确定的GID,从IB路由基础设施或子网管理员(SA)检索关联的LID。

It is the responsibility of the administrator to ensure that the IB subnet(s) have unicast connectivity between the IPoIB nodes. The GID exchanged between two endpoints in a multicast message (ARP/ND) does not guarantee the existence of a unicast path between the two.

管理员有责任确保IB子网在IPoIB节点之间具有单播连接。多播消息(ARP/ND)中两个端点之间交换的GID不能保证两个端点之间存在单播路径。

There may be multiple LIDs, and hence paths, between the endpoints. The criteria for selection of the LIDs are beyond the scope of this document.

端点之间可能有多个LID,因此也可能有多条路径。LID的选择标准超出了本文件的范围。

b) Q_Key

b) Q_键

The Q_Key received on joining the broadcast group MUST be used for all IPoIB communication over the particular IPoIB link.

加入广播组时收到的Q_Key必须用于特定IPoIB链路上的所有IPoIB通信。

c) P_Key

c) P_键

The P_Key to be used in the IP subnet is not discovered but is a configuration parameter.

未发现IP子网中要使用的P_密钥,但它是一个配置参数。

d) SL

The method of determining the SL is not defined in this document. The SL is determined by any of the IBA-approved methods.

本文件中未定义确定SL的方法。SL由IBA批准的任何方法确定。

e) Path rate

e) 路径速率

The implementation must leverage IB methods to determine the path rate as required.

实现必须根据需要利用IB方法来确定路径速率。

9.2. Address Resolution in IPv4 Subnets
9.2. IPv4子网中的地址解析

The ARP packet header is as defined in [ARP]. The hardware type is set to 32 (decimal) as specified by IANA [IANA]. The rest of the fields are used as per [ARP].

ARP数据包头的定义见[ARP]。按照IANA[IANA]的规定,硬件类型设置为32(十进制)。其余字段按[ARP]使用。

       16 bits: hardware type
       16 bits: protocol
        8 bits: length of hardware address
        8 bits: length of protocol address
       16 bits: ARP operation

       16位:硬件类型
       16位:协议
        8位:硬件地址长度
        8位:协议地址长度
       16位:ARP操作

The remaining fields in the packet hold the sender/target hardware and protocol addresses.

数据包中的其余字段保存发送方/目标硬件和协议地址。

               [ sender hardware address ]
               [ sender protocol address ]
               [ target hardware address ]
               [ target protocol address ]
        

The hardware address included in the ARP packet will be as specified in section 9.1.1 and depicted in figure 5.

ARP数据包中包含的硬件地址如第9.1.1节所述,如图5所示。

The length of the hardware address carried in the ARP packet header is therefore 20 octets.

因此,ARP数据包头中使用的硬件地址长度为20。
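
As an illustration of the resulting packet layout, the following Python sketch assembles an IPv4 ARP packet for an IPoIB link. It assumes the standard ARP header from [ARP], the IANA hardware type 32, and a hardware address length of 20. The function and constant names are hypothetical, and the 20-octet hardware addresses are taken as opaque byte strings already built per Figure 5.

```python
import socket
import struct

ARP_HRD_INFINIBAND = 32  # hardware type assigned by IANA for InfiniBand
ARP_PRO_IPV4 = 0x0800    # protocol type for IPv4
IPOIB_HLEN = 20          # length of the IPoIB hardware address (Figure 5)
IPV4_PLEN = 4            # length of an IPv4 address
ARP_OP_REQUEST = 1
ARP_OP_REPLY = 2


def build_ipoib_arp(op: int, sha: bytes, spa: str, tha: bytes, tpa: str) -> bytes:
    """Return an ARP packet: 8-octet header, then sender and target addresses."""
    if len(sha) != IPOIB_HLEN or len(tha) != IPOIB_HLEN:
        raise ValueError("IPoIB hardware addresses must be 20 octets")
    header = struct.pack(
        "!HHBBH", ARP_HRD_INFINIBAND, ARP_PRO_IPV4, IPOIB_HLEN, IPV4_PLEN, op
    )
    # Field order follows the layout above: sender hardware, sender
    # protocol, target hardware, target protocol.
    return header + sha + socket.inet_aton(spa) + tha + socket.inet_aton(tpa)
```

With 20-octet hardware addresses and 4-octet IPv4 addresses, the packet is 8 + 2 * (20 + 4) = 56 octets long.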

9.3. Address Resolution in IPv6 Subnets
9.3. IPv6子网中的地址解析

The Source/Target Link-layer address option is used in Router Solicitation, Router Advertisement, Redirect, Neighbor Solicitation, and Neighbor Advertisement messages when such messages are transmitted on InfiniBand networks.

源/目标链路层地址选项用于在InfiniBand网络上传输路由器请求、路由器播发、重定向、邻居请求和邻居播发消息时使用。

The source/target address option is specified as follows:

源/目标地址选项指定如下:

Type:
     Source Link-layer address     1
     Target Link-layer address     2

类型:
     源链路层地址       1
     目标链路层地址     2

Length: 3

长度:3

Link-layer address:

链路层地址:

The link-layer address is as specified in section 9.1.1 and depicted in figure 5.

链路层地址如第9.1.1节所述,如图5所示。

[DISC] specifies the length of the source/target option in units of 8 octets, as indicated by the length of '3' above, so the option occupies 24 octets in total. The Type and Length fields take one octet each and the IPoIB link-layer address is only 20 octets long, so two octets of zero MUST be prepended to the address to fill the total option length of 24 octets.

[DISC]指定源/目标选项的长度(以8个八位字节为单位),如上面的长度“3”所示。由于IPoIB链路层地址只有20个八位字节长,因此必须在前面加上两个零八位字节,以填充24个八位字节的总选项长度。
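
The padding rule can be checked with a short Python sketch. The function name is hypothetical; the option type values, the length of 3, and the two prepended zero octets come from the text above.

```python
ND_OPT_SOURCE_LLADDR = 1
ND_OPT_TARGET_LLADDR = 2
ND_OPT_LEN_UNITS = 3  # option length, counted in units of 8 octets


def build_lladdr_option(opt_type: int, hwaddr: bytes) -> bytes:
    """Return the 24-octet Source/Target Link-layer Address option."""
    if opt_type not in (ND_OPT_SOURCE_LLADDR, ND_OPT_TARGET_LLADDR):
        raise ValueError("option type must be 1 (source) or 2 (target)")
    if len(hwaddr) != 20:
        raise ValueError("IPoIB link-layer address must be 20 octets")
    # Type (1 octet) + Length (1 octet) + 2 zero octets + 20-octet address.
    option = bytes([opt_type, ND_OPT_LEN_UNITS]) + b"\x00\x00" + hwaddr
    if len(option) != ND_OPT_LEN_UNITS * 8:
        raise AssertionError("option must fill exactly 24 octets")
    return option
```

Placing the zero octets before the address, rather than after it, matches the "prepended" wording and leaves the 20-octet address aligned on a 4-octet boundary within the option.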

9.4. Cautionary Note on QPN Caching
9.4. 关于QPN缓存的注意事项

The link-layer address for IPoIB includes the QPN, which might not be constant across reboots or even across network interface resets. Cached QPN entries, such as in static ARP entries or in Reverse Address Resolution Protocol (RARP) servers, will only work if the implementation(s) using these options ensure that the QPN associated with an interface is invariant across reboots/network resets.

IPoIB的链路层地址包括QPN,QPN在重新启动或网络接口重置期间可能不是恒定的。缓存的QPN条目,如静态ARP条目或反向地址解析协议(RARP)服务器中的QPN条目,只有在使用这些选项的实现确保与接口相关联的QPN在重新启动/网络重置期间保持不变的情况下才会起作用。

It is RECOMMENDED that implementations revalidate ARP caches periodically due to the aforementioned QPN-induced volatility of IPoIB link-layer addresses.

由于前面提到的QPN导致IPoIB链路层地址的波动性,建议实施定期重新验证ARP缓存。
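
One way to apply the revalidation recommendation is to timestamp each cache entry and re-resolve it once it reaches a maximum age. The Python sketch below is only an illustration: the class name, the method names, and the idea of a fixed maximum age are assumptions, since this document does not specify a revalidation period or mechanism.

```python
import time


class NeighborEntry:
    """A cached IPoIB link-layer address with the time it was last validated."""

    def __init__(self, hwaddr: bytes, now: float = None):
        self.hwaddr = hwaddr
        self.validated_at = time.monotonic() if now is None else now

    def needs_revalidation(self, max_age_s: float, now: float = None) -> bool:
        """True once the entry is at least max_age_s seconds old."""
        now = time.monotonic() if now is None else now
        return now - self.validated_at >= max_age_s

    def refresh(self, hwaddr: bytes, now: float = None) -> None:
        """Record a freshly resolved address (the QPN may have changed)."""
        self.hwaddr = hwaddr
        self.validated_at = time.monotonic() if now is None else now
```

The optional `now` argument exists only so that the aging logic can be exercised without waiting for real time to pass.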

10. Sending and Receiving IP Multicast Packets
10. 发送和接收IP多播数据包

Multicast in InfiniBand differs in a number of ways from multicast in Ethernet. This adds some complexity to an IPoIB implementation when supporting IP multicast over IB.

InfiniBand中的多播与以太网中的多播有许多不同之处。当支持IB上的IP多播时,这增加了IPoIB实现的复杂性。

A) An IB multicast group must be explicitly created through the SA before it can be used.

A) IB多播组必须通过SA显式创建才能使用。

This implies that in order to send a packet destined for an IP multicast address, the IPoIB implementation must check with the SA on the outbound link first for a "MCMemberRecord" that matches the MGID. If one does exist, the Multicast Local Identifier (MLID) associated with the multicast group is used as the Destination Local Identifier (DLID) for the packet. Otherwise, it implies no member exists on the local link. If the scope of the IP multicast group is beyond link-local, the packet must be sent to the on-link routers through the use of the all-router multicast group or the broadcast group. This is to allow local routers to forward the packet to multicast listeners on remote networks. The all-router multicast group is preferred over the broadcast group for better efficiency. If the all-router multicast group does not exist, the sender can assume that there are no routers on the local link; hence the packet can be safely dropped.

这意味着,为了发送目的地为IP多播地址的数据包,IPoIB实现必须首先与出站链路上的SA检查与MGID匹配的“MCMemberRecord”。如果确实存在多播本地标识符,则将与多播组关联的多播本地标识符(MLID)用作数据包的目标本地标识符(DLID)。否则,它意味着本地链接上不存在任何成员。如果IP多播组的范围超出链路本地,则必须通过使用全路由器多播组或广播组将数据包发送到链路上的路由器。这是为了允许本地路由器将数据包转发给远程网络上的多播侦听器。为了提高效率,全路由器多播组优于广播组。如果全路由器多播组不存在,发送方可以假设本地链路上没有路由器;因此,可以安全地丢弃数据包。

B) A multicast sender must join the target multicast group before outgoing multicast messages from it can be successfully routed. The "SendOnlyNonMember" join is different from the regular "FullMember" join in two aspects. First, both types of joins enable multicast packets to be routed FROM the local port, but only the "FullMember" join causes multicast packets to be routed TO the port. Second, the sender port of a "SendOnlyNonMember" join will not be counted as a member of the multicast group for purposes of group creation and deletion.

B) 多播发送方必须先加入目标多播组,其发出的多播消息才能被成功路由。“SendOnlyNonMember”加入在两个方面不同于常规的“FullMember”加入。首先,这两种类型的加入都允许从本地端口路由多播数据包,但只有“FullMember”加入会导致多播数据包路由到该端口。第二,“SendOnlyNonMember”加入的发送方端口在创建和删除组时不会被算作多播组的成员。

The following code snippet demonstrates the steps in a typical implementation when processing an egress multicast packet.

下面的代码片段演示了处理出口多播数据包时典型实现中的步骤。

if the egress port is already a "SendOnlyNonMember", or a "FullMember" => send the packet

如果出口端口已经是“SendOnlyNonMember”或“FullMember”=>发送数据包

else if the target multicast group exists => do "SendOnlyNonMember" join => send the packet

否则,如果目标多播组存在=>do“SendOnlyNonMember”join=>发送数据包

else if scope > link-local AND the all-router multicast group exists => send the packet to all routers

否则,如果作用域>链路本地且存在所有路由器多播组=>将数据包发送到所有路由器

else => drop the packet

else=>丢弃数据包
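
The same decision sequence can be written as a small Python function. This is a sketch that mirrors the steps above one for one. The enum, the function, and the four boolean inputs are hypothetical names, and a real implementation would derive them from its join state and from SA queries.

```python
from enum import Enum


class EgressAction(Enum):
    SEND = "send the packet"
    JOIN_THEN_SEND = "SendOnlyNonMember join, then send the packet"
    SEND_TO_ALL_ROUTERS = "send the packet to all routers"
    DROP = "drop the packet"


def egress_multicast_action(
    already_joined: bool,           # port is SendOnlyNonMember or FullMember
    group_exists: bool,             # target IB multicast group exists
    beyond_link_local: bool,        # IP multicast scope > link-local
    all_router_group_exists: bool,  # all-router multicast group exists
) -> EgressAction:
    """Choose what to do with one egress IP multicast packet."""
    if already_joined:
        return EgressAction.SEND
    if group_exists:
        return EgressAction.JOIN_THEN_SEND
    if beyond_link_local and all_router_group_exists:
        return EgressAction.SEND_TO_ALL_ROUTERS
    return EgressAction.DROP
```

The checks are ordered from cheapest to most expensive, so a port that has already joined the group never needs to consult the SA.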

Implementations should cache the information about the existence of an IB multicast group, its MLID and other attributes. This is to avoid expensive SA calls on every outgoing multicast packet. Senders MUST subscribe to the multicast group create and delete traps in order to monitor the status of specific IB multicast groups. For example, multicast packets directed to the all-router multicast group due to a lack of listener on the local subnet must be forwarded to the right multicast group if the group is created later. This happens when a listener shows up on the local subnet.

实现应该缓存有关IB多播组的存在、其MLID和其他属性的信息。这是为了避免对每个传出的多播数据包进行昂贵的SA调用。发送方必须订阅多播组的创建和删除陷阱,以监视特定IB多播组的状态。例如,如果稍后创建组,则由于本地子网上缺少侦听器而定向到全路由器多播组的多播数据包必须转发到正确的多播组。当侦听器出现在本地子网上时,就会发生这种情况。
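
The caching recommendation above could be realized along the following lines. This Python sketch is hypothetical throughout: `query_sa` stands in for whatever SA lookup an implementation provides, and the trap handler simply invalidates the entry so that the next lookup asks the SA again.

```python
class MulticastGroupCache:
    """Remembers SA answers about IB multicast groups, including misses."""

    def __init__(self, query_sa):
        # query_sa is an assumed callable: MGID bytes -> MLID int, or None
        # when the SA has no MCMemberRecord for that MGID.
        self._query_sa = query_sa
        self._mlid_by_mgid = {}

    def lookup(self, mgid: bytes):
        """Return the cached MLID (or None), asking the SA only on a miss."""
        if mgid not in self._mlid_by_mgid:
            self._mlid_by_mgid[mgid] = self._query_sa(mgid)
        return self._mlid_by_mgid[mgid]

    def on_group_trap(self, mgid: bytes) -> None:
        """Handle a group create or delete trap by forgetting the entry."""
        self._mlid_by_mgid.pop(mgid, None)
```

Caching negative answers as well is what makes the trap subscription necessary: without the create trap, a sender would keep directing packets to the all-router group after a listener has appeared on the local subnet.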

A node joining an IP multicast group must first construct an MGID according to the rule described in section 4 above. Once the correct MGID is calculated, the node must call the SA of the outbound link to attempt a "FullMember" join of the IB multicast group corresponding to the MGID. If the IB multicast group does not already exist, one must be created first with the IPoIB link MTU. The MGID MUST use the same P_Key, Q_Key, SL, MTU, and HopLimit as those used in the broadcast-GID. The rest of the attributes SHOULD follow the values used in the broadcast-GID as well.

加入IP多播组的节点必须首先根据上面第4节中描述的规则构造MGID。一旦计算出正确的MGID,节点必须调用出站链路的SA,以尝试与MGID对应的IB多播组的“FullMember”加入。如果IB多播组不存在,则必须首先使用IPoIB链路MTU创建一个。MGID必须使用与广播GID中使用的相同的P_键、Q_键、SL、MTU和HopLimit。其余属性也应遵循广播GID中使用的值。
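
The attribute-inheritance rule can be illustrated with a small Python sketch. The record type and its field names are hypothetical stand-ins for the attributes an SA join or create request would carry; the only rule taken from the text is that the P_Key, Q_Key, SL, MTU, and HopLimit are copied from the broadcast group. The numeric values in the usage lines are placeholders, not recommended settings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class McastGroupAttrs:
    """Subset of multicast group attributes that must match the broadcast group."""
    mgid: bytes
    p_key: int
    q_key: int
    sl: int
    mtu: int
    hop_limit: int


def attrs_for_group(mgid: bytes, broadcast: McastGroupAttrs) -> McastGroupAttrs:
    """Return attributes for a new group: broadcast values with a new MGID."""
    if len(mgid) != 16:
        raise ValueError("MGID must be 16 octets")
    return replace(broadcast, mgid=mgid)


# Usage with placeholder values:
broadcast_attrs = McastGroupAttrs(
    mgid=bytes(16), p_key=0xFFFF, q_key=0x0B1B, sl=0, mtu=2048, hop_limit=0
)
new_attrs = attrs_for_group(b"\x01" * 16, broadcast_attrs)
```

Deriving every new request from the broadcast group's record, instead of filling fields individually, makes it harder to create a group whose parameters diverge from the rest of the IPoIB link.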

The join request will cause the local port to be added to the multicast group. It also enables the SM to program IB switches and routers with the new multicast information to ensure the correct forwarding of multicast packets for the group.

加入请求将导致本地端口添加到多播组。它还使SM能够使用新的多播信息对IB交换机和路由器进行编程,以确保为组正确转发多播数据包。

When a node leaves an IP multicast group, it SHOULD make a "FullMember" leave request to the SA. This gives the SM an opportunity to update relevant forwarding information, to delete an IB multicast group if the local port is the last FullMember to leave, and to free up the MLID allocated for it. The specific algorithm is implementation-dependent and is out of the scope of this document.

当节点离开IP多播组时,它应该向SA发出“FullMember”离开请求。这使SM有机会更新相关转发信息,删除IB多播组(如果本地端口是最后一个离开的完整成员),并释放分配给它的MLID。具体算法取决于实现,不在本文档范围内。

Note that for an IPoIB link that spans more than one IB subnet connected by IB routers, adequate multicast forwarding support at the IB level is required for multicast packets to reach listeners on a remote IB subnet. The specific mechanism for this is beyond the scope of IPoIB.

请注意,对于跨越IB路由器连接的多个IB子网的IPoIB链路,需要在IB级别提供足够的多播转发支持,以便多播数据包到达远程IB子网上的侦听器。这方面的具体机制超出了IPoIB的范围。

11. IP Multicast Routing
11. IP组播路由

IP multicast routing requires each interface over which the router is operating to be configured to listen to all link-layer multicast addresses generated by IP [IPMULT, IP6MLD]. For an Ethernet interface, this is often achieved by turning on the promiscuous multicast mode on the interface.

IP多播路由要求路由器运行的每个接口都配置为侦听IP[IPMULT,IP6MLD]生成的所有链路层多播地址。对于以太网接口,这通常是通过打开接口上的混杂多播模式来实现的。

IBA does not provide any hardware support for promiscuous multicast mode. Fortunately, a promiscuous multicast mode can be emulated in the software running on a router through the following steps:

IBA不为混杂多播模式提供任何硬件支持。幸运的是,可以通过以下步骤在路由器上运行的软件中模拟混杂多播模式:

A) Obtain a list of all active IB multicast groups from the local SA.

A) 从本地SA获取所有活动IB多播组的列表。

B) Make a "NonMember" join request to the SA for every group that has a signature in its MGID matching the one for either IPv4 or IPv6.

B) 对其MGID中的签名与IPv4或IPv6的签名匹配的每个组向SA发出“非成员”加入请求。

C) Subscribe to the IB multicast group creation events using a wildcarded MGID so that the router can "NonMember" join all IB multicast groups created subsequently for IPv4 or IPv6.

C) 使用通配符MGID订阅IB多播组创建事件,以便路由器可以“非成员”加入随后为IPv4或IPv6创建的所有IB多播组。

The "NonMember" join has the same effect as a "FullMember" join except that the former will not be counted as a member of the multicast group for purposes of group creation or deletion. That is, when the last "FullMember" leaves a multicast group, the group can be safely deleted by the SA without regard to any "NonMember" routers.

“非成员”联接与“完整成员”联接具有相同的效果,但前者在创建或删除组时不会被视为多播组的成员。也就是说,当最后一个“完整成员”离开多播组时,SA可以安全地删除该组,而无需考虑任何“非成员”路由器。
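
Step B depends on recognizing the IPv4 or IPv6 signature inside an MGID. The Python sketch below assumes the MGID layout and signature values defined in section 4 of this document, which lies outside this excerpt: a leading 0xFF octet, one flags/scope octet, then a 16-bit signature of 0x401B for IPv4 or 0x601B for IPv6. The function names are hypothetical, and the SA query that produces the list of active groups is not shown.

```python
IPOIB_SIGNATURE_IPV4 = 0x401B  # assumed from section 4 of this document
IPOIB_SIGNATURE_IPV6 = 0x601B  # assumed from section 4 of this document


def is_ipoib_mgid(mgid: bytes) -> bool:
    """True if the MGID carries the IPv4 or IPv6 over IB signature."""
    if len(mgid) != 16 or mgid[0] != 0xFF:
        return False
    signature = int.from_bytes(mgid[2:4], "big")
    return signature in (IPOIB_SIGNATURE_IPV4, IPOIB_SIGNATURE_IPV6)


def groups_to_join(active_mgids):
    """Pick the groups a router should "NonMember" join (steps A and B)."""
    return [mgid for mgid in active_mgids if is_ipoib_mgid(mgid)]
```

The same predicate can be applied to each MGID reported by a group creation event in step C, so that groups created later are joined in the same way.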

12. New Types of Vulnerability in IB Multicast
12. IB多播中的新型漏洞

Many IB multicast functions are subject to failures due to a number of possible resource constraints. These include the creation of IB multicast groups, the join calls ("SendOnlyNonMember", "FullMember", and "NonMember"), and the attaching of a QP to a multicast group.

由于许多可能的资源限制,许多IB多播功能都会出现故障。这些包括IB多播组的创建、连接调用(“SendOnlyNonMember”、“FullMember”和“NonMember”)以及将QP附加到多播组。

In general, the occurrence of these failure conditions is highly implementation-dependent, and is believed to be rare. Usually, a failed multicast operation at the IB level can be propagated back to the IP level, causing the original operation to fail and the initiator of the operation to be notified. But some IB multicast functions are not tied to any foreground operation, making their failures hard to detect. For example, if an IP multicast router attempts to "NonMember" join a newly created multicast group in the local subnet, but the join call fails, packet forwarding for that particular multicast group will likely fail silently, that is, without the attention of local multicast senders. This type of problem can add more vulnerability to the already unreliable IP multicast operations.

通常,这些故障条件的发生高度依赖于实现,并且被认为是罕见的。通常,IB级别失败的多播操作可以传播回IP级别,导致原始操作失败并通知操作的发起方。但是一些IB多播功能与任何前台操作都没有关联,因此很难检测到它们的故障。例如,如果IP多播路由器尝试“非成员”加入本地子网中新创建的多播组,但加入呼叫失败,则该特定多播组的数据包转发可能会无声地失败,即,没有本地多播发送者的注意。这类问题会给已经不可靠的IP多播操作增加更多漏洞。

Implementations SHOULD log error messages upon any failure from an IB multicast operation. Network administrators should be aware of this vulnerability, and preserve enough multicast resources at the points where IP multicast will be used heavily. For example, HCAs with ample multicast resources should be used at any IP multicast router.

实现应该在IB多播操作出现任何故障时记录错误消息。网络管理员应意识到此漏洞,并在IP多播将被大量使用的位置保留足够的多播资源。例如,任何IP多播路由器都应该使用具有充足多播资源的HCA。

13. Security Considerations
13. 安全考虑

This document specifies IP transmission over a multicast network. Any network of this kind is vulnerable to a sender claiming another's identity and forging traffic or eavesdropping. It is the responsibility of the higher layers or applications to implement suitable countermeasures if this is a problem.

本文档指定通过多播网络进行IP传输。任何此类网络都容易受到声称他人身份、伪造流量或窃听的发件人的攻击。如果这是一个问题,则高层或应用程序有责任实施适当的对策。

Successful transmission of IP packets depends on the correct setup of the IPoIB link, creation of the broadcast-GID, creation of the QP and its attachment to the broadcast-GID, and the correct determination of various link parameters such as the LID, service level, and path rate. These operations, many of which involve interactions with the SM/SA, MUST be protected by the underlying operating system. This is to prevent malicious, non-privileged software from hijacking important resources and configurations.

IP包的成功传输取决于IPoIB链路的正确设置、广播GID的创建、QP的创建及其与广播GID的连接,以及各种链路参数(如LID、服务级别和路径速率)的正确确定。这些操作(其中许多涉及与SM/SA的交互)必须受到底层操作系统的保护。这是为了防止恶意、非特权软件劫持重要资源和配置。

Controlled Q_Keys SHOULD be used in all transmissions. This is to prevent non-privileged software from fabricating IP datagrams.

所有传输均应使用受控Q_Key。这是为了防止非特权软件伪造IP数据报。

14. IANA Considerations
14. IANA考虑

To support ARP over InfiniBand, a value for the Address Resolution Parameter "Number Hardware Type (hrd)" is required. IANA has assigned the number "32" to indicate InfiniBand [IANA_ARP].

要支持InfiniBand上的ARP,需要地址解析参数“数字硬件类型(hrd)”的值。IANA已指定数字“32”表示InfiniBand[IANA_ARP]。

Future uses of the reserved bits in the frame format (Figure 3) and link-layer address (Figure 5) MUST be published as RFCs. This document requires that the reserved bits be set to zero on send and ignored on receive.

帧格式(图3)和链路层地址(图5)中保留位的未来使用必须发布为RFC。本文档要求在发送时将保留位设置为零,在接收时将其忽略。

15. Acknowledgements
15. 致谢

The authors would like to thank Bruce Beukema, David Brean, Dan Cassiday, Aditya Dube, Yaron Haviv, Michael Krause, Thomas Narten, Erik Nordmark, Greg Pfister, Jim Pinkerton, Renato Recio, Kevin Reilly, Kanoj Sarcar, Satya Sharma, Madhu Talluri, and David L. Stevens for their suggestions and many clarifications on the IBA specification.

作者要感谢Bruce Beukema、David Brean、Dan Cassiday、Aditya Dube、Yaron Haviv、Michael Krause、Thomas Narten、Erik Nordmark、Greg Pfister、Jim Pinkerton、Renato Recio、Kevin Reilly、Kanoj Sarcar、Satya Sharma、Madhu Talluri和David L.Stevens对IBA规范的建议和许多澄清。

16. References
16. 工具书类
16.1. Normative References
16.1. 规范性引用文件

[AARCH] Hinden, R. and S. Deering, "Internet Protocol Version 6 (IPv6) Addressing Architecture", RFC 3513, April 2003.

[AARCH]Hinden,R.和S.Deering,“互联网协议版本6(IPv6)寻址体系结构”,RFC 3513,2003年4月。

[ARP] Plummer, David C., "Ethernet Address Resolution Protocol: Or converting network protocol addresses to 48.bit Ethernet address for transmission on Ethernet hardware", STD 37, RFC 826, November 1982.

[ARP]Plummer,David C.,“以太网地址解析协议:或将网络协议地址转换为48位以太网地址,以便在以太网硬件上传输”,STD 37,RFC 826,1982年11月。

[DISC] Narten, T., Nordmark, E., and W. Simpson, "Neighbor Discovery for IP Version 6 (IPv6)", RFC 2461, December 1998.

[DISC]Narten,T.,Nordmark,E.,和W.Simpson,“IP版本6(IPv6)的邻居发现”,RFC 2461,1998年12月。

[IANA] Internet Assigned Numbers Authority, URL http://www.iana.org

[IANA]互联网分配号码管理局,URLhttp://www.iana.org

[IANA_ARP] URL http://www.iana.org/assignments/arp-parameters

[IBTA] InfiniBand Architecture Specification, URL http://www.infinibandta.org/specs

[RFC4392] Kashyap, V., "IP over InfiniBand (IPoIB) Architecture", RFC 4392, April 2006.

[RFC4392]Kashyap,V.,“InfiniBand上的IP(IPoIB)架构”,RFC 4392,2006年4月。

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。

16.2. Informative References
16.2. 资料性引用

[HOSTS] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[主机]Braden,R.,“互联网主机的要求-通信层”,标准3,RFC 1122,1989年10月。

[IGMP3] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A. Thyagarajan, "Internet Group Management Protocol, Version 3", RFC 3376, October 2002.

[IGMP3]Cain,B.,Deering,S.,Kouvelas,I.,Fenner,B.,和A.Thyagarajan,“互联网组管理协议,第3版”,RFC 3376,2002年10月。

[IP6MLD] Deering, S., Fenner, W., and B. Haberman, "Multicast Listener Discovery (MLD) for IPv6", RFC 2710, October 1999.

[IP6MLD]Deering,S.,Fenner,W.,和B.Haberman,“IPv6的多播侦听器发现(MLD)”,RFC 2710,1999年10月。

[IPMULT] Deering, S., "Host extensions for IP multicasting", STD 5, RFC 1112, August 1989.

[IPMULT]Deering,S.,“IP多播的主机扩展”,STD 5,RFC 1112,1989年8月。

[IPV6] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

[IPV6]Deering,S.和R.Hinden,“互联网协议,第6版(IPV6)规范”,RFC 2460,1998年12月。

Authors' Addresses

作者地址

   H.K. Jerry Chu
   17 Network Circle, UMPK17-201
   Menlo Park, CA 94025
   USA

H.K.Jerry Chu 17美国加利福尼亚州门罗公园UMPK17-201网络圈,邮编94025

   Phone: +1 650 786 5146
   EMail: jerry.chu@sun.com
        

   Vivek Kashyap
   15350, SW Koll Parkway
   Beaverton, OR 97006
   USA

Vivek Kashyap 15350,美国比弗顿西南科尔公园路,或97006

   Phone: +1 503 578 3422
   EMail: vivk@us.ibm.com
        

Full Copyright Statement

完整版权声明

Copyright (C) The Internet Society (2006).

版权所有(C)互联网协会(2006年)。

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

本文件受BCP 78中包含的权利、许可和限制的约束,除其中规定外,作者保留其所有权利。

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

本文件及其包含的信息是按“原样”提供的,贡献者、他/她所代表或赞助的组织(如有)、互联网协会和互联网工程任务组不承担任何明示或暗示的担保,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。

Intellectual Property

知识产权

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

IETF对可能声称与本文件所述技术的实施或使用有关的任何知识产权或其他权利的有效性或范围,或此类权利下的任何许可可能或可能不可用的程度,不采取任何立场;它也不表示它已作出任何独立努力来确定任何此类权利。有关RFC文件中权利的程序信息,请参见BCP 78和BCP 79。

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

向IETF秘书处披露的知识产权副本和任何许可证保证,或本规范实施者或用户试图获得使用此类专有权利的一般许可证或许可的结果,可从IETF在线知识产权存储库获取,网址为http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

IETF邀请任何相关方提请其注意任何版权、专利或专利申请,或其他可能涵盖实施本标准所需技术的专有权利。请将信息发送至IETF的IETF-ipr@ietf.org.

Acknowledgement

确认

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).

RFC编辑器功能的资金由IETF行政支持活动(IASA)提供。