Internet Engineering Task Force (IETF)                    C. Lever, Ed.
Request for Comments: 8166                                        Oracle
Obsoletes: 5666                                              W. Simpson
Category: Standards Track                                       Red Hat
ISSN: 2070-1721                                               T. Talpey
                                                              Microsoft
                                                              June 2017
Remote Direct Memory Access Transport for Remote Procedure Call Version 1
Abstract
This document specifies a protocol for conveying Remote Procedure Call (RPC) messages on physical transports capable of Remote Direct Memory Access (RDMA). This protocol is referred to as the RPC-over-RDMA version 1 protocol in this document. It requires no revision to application RPC protocols or the RPC protocol itself. This document obsoletes RFC 5666.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc8166.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.
Table of Contents
1. Introduction
   1.1. RPCs on RDMA Transports
2. Terminology
   2.1. Requirements Language
   2.2. RPCs
   2.3. RDMA
3. RPC-over-RDMA Protocol Framework
   3.1. Transfer Models
   3.2. Message Framing
   3.3. Managing Receiver Resources
   3.4. XDR Encoding with Chunks
   3.5. Message Size
4. RPC-over-RDMA in Operation
   4.1. XDR Protocol Definition
   4.2. Fixed Header Fields
   4.3. Chunk Lists
   4.4. Memory Registration
   4.5. Error Handling
   4.6. Protocol Elements No Longer Supported
   4.7. XDR Examples
5. RPC Bind Parameters
6. ULB Specifications
   6.1. DDP-Eligibility
   6.2. Maximum Reply Size
   6.3. Additional Considerations
   6.4. ULP Extensions
7. Protocol Extensibility
   7.1. Conventional Extensions
8. Security Considerations
   8.1. Memory Protection
   8.2. RPC Message Security
9. IANA Considerations
10. References
   10.1. Normative References
   10.2. Informative References
Appendix A. Changes from RFC 5666
   A.1. Changes to the Specification
   A.2. Changes to the Protocol
Acknowledgments
Authors' Addresses
This document specifies the RPC-over-RDMA version 1 protocol, based on existing implementations of RFC 5666 and experience gained through deployment. This document obsoletes RFC 5666.
This specification clarifies text that was subject to multiple interpretations and removes support for unimplemented RPC-over-RDMA version 1 protocol elements. It clarifies the role of Upper-Layer Bindings (ULBs) and describes what they are to contain.
In addition, this document describes current practice using RPCSEC_GSS [RFC7861] on RDMA transports.
The protocol version number has not been changed because the protocol specified in this document fully interoperates with implementations of the RPC-over-RDMA version 1 protocol specified in [RFC5666].
RDMA [RFC5040] [RFC5041] [IBARCH] is a technique for moving data efficiently between end nodes. Directing data into destination buffers as it is sent on a network, and placing it there via direct memory access by hardware, yields faster transfers and reduced host overhead.
Open Network Computing Remote Procedure Call (ONC RPC, often shortened in NFSv4 documents to RPC) [RFC5531] is a remote procedure call protocol that runs over a variety of transports. Most RPC implementations today use UDP [RFC768] or TCP [RFC793]. On UDP, RPC messages are encapsulated inside datagrams, while on a TCP byte stream, RPC messages are delineated by a record marking protocol. An RDMA transport also conveys RPC messages in a specific fashion that must be fully described if RPC implementations are to interoperate.
RDMA transports present semantics that differ from either UDP or TCP. They retain message delineations like UDP but provide reliable and sequenced data transfer like TCP. They also provide an offloaded bulk transfer service not provided by UDP or TCP. RDMA transports are therefore appropriately viewed as a new transport type by RPC.
In this context, the Network File System (NFS) protocols, as described in [RFC1094], [RFC1813], [RFC7530], [RFC5661], and future NFSv4 minor versions, are all obvious beneficiaries of RDMA transports. A complete problem statement is presented in [RFC5532]. Many other RPC-based protocols can also benefit.
Although the RDMA transport described herein can provide relatively transparent support for any RPC application, this document also describes mechanisms that can optimize data transfer even further, when RPC applications are willing to exploit awareness of RDMA as the transport.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This section highlights key elements of the RPC [RFC5531] and External Data Representation (XDR) [RFC4506] protocols, upon which RPC-over-RDMA version 1 is constructed. Strong grounding in these protocols is recommended before reading this document.
RPCs are an abstraction used to implement the operations of an Upper-Layer Protocol (ULP). "ULP" refers to an RPC Program and Version tuple, which is a versioned set of procedure calls that comprise a single well-defined API. One example of a ULP is the Network File System Version 4.0 [RFC7530].
In this document, the term "RPC consumer" refers to an implementation of a ULP running on an RPC client endpoint.
Like a local procedure call, every RPC procedure has a set of "arguments" and a set of "results". A calling context invokes a procedure, passing arguments to it, and the procedure subsequently returns a set of results. Unlike a local procedure call, the called procedure is executed remotely rather than in the local application's execution context.
The RPC protocol as described in [RFC5531] is fundamentally a message-passing protocol between one or more clients (where RPC consumers are running) and a server (where a remote execution context is available to process RPC transactions on behalf of those consumers).
ONC RPC transactions are made up of two types of messages:
CALL: An "RPC Call message" requests that work be done. This type of message is designated by the value zero (0) in the message's msg_type field. An arbitrary unique value is placed in the message's XID field in order to match this RPC Call message to a corresponding RPC Reply message.
REPLY: An "RPC Reply message" reports the results of work requested by an RPC Call message. An RPC Reply message is designated by the value one (1) in the message's msg_type field. The value contained in an RPC Reply message's XID field is copied from the RPC Call message whose results are being reported.
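The CALL/REPLY distinction and XID matching can be sketched in Python. This is a minimal illustration, not part of the protocol: it looks only at the first two XDR words of an RPC message (xid, then msg_type, each a big-endian 32-bit value per [RFC5531]), and the `PendingCalls` table is a hypothetical Requester-side structure.

```python
import struct

CALL, REPLY = 0, 1  # msg_type values defined by RFC 5531


def encode_prefix(xid: int, msg_type: int) -> bytes:
    """Encode the first two XDR words of an RPC message: xid, then msg_type."""
    return struct.pack(">II", xid, msg_type)


def decode_prefix(message: bytes) -> tuple:
    """Return (xid, msg_type) from the start of an encoded RPC message."""
    return struct.unpack_from(">II", message, 0)


class PendingCalls:
    """Hypothetical Requester-side table that matches Replies to Calls by XID."""

    def __init__(self) -> None:
        self._pending = {}  # xid -> description of the outstanding call

    def send_call(self, xid: int, procedure: str) -> bytes:
        if xid in self._pending:
            raise ValueError("XID is already in use by an outstanding call")
        self._pending[xid] = procedure
        return encode_prefix(xid, CALL)

    def receive_reply(self, message: bytes) -> str:
        xid, msg_type = decode_prefix(message)
        if msg_type != REPLY:
            raise ValueError("not an RPC Reply message")
        # Completing the transaction retires the XID; an unknown XID raises KeyError.
        return self._pending.pop(xid)
```

Once `receive_reply` returns, the XID is no longer in the table, mirroring how the transaction completes and its XID is retired.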
The RPC client endpoint acts as a "Requester". It serializes the procedure's arguments and conveys them to a server endpoint via an RPC Call message. This message contains an RPC protocol header, a header describing the requested upper-layer operation, and all arguments.
The RPC server endpoint acts as a "Responder". It deserializes the arguments and processes the requested operation. It then serializes the operation's results into another byte stream. This byte stream is conveyed back to the Requester via an RPC Reply message. This message contains an RPC protocol header, a header describing the upper-layer reply, and all results.
The Requester deserializes the results and allows the original caller to proceed. At this point, the RPC transaction designated by the XID in the RPC Call message is complete, and the XID is retired.
In summary, RPC Call messages are sent by Requesters to Responders to initiate RPC transactions. RPC Reply messages are sent by Responders to Requesters to complete the processing on an RPC transaction.
The role of an "RPC transport" is to mediate the exchange of RPC messages between Requesters and Responders. An RPC transport bridges the gap between the RPC message abstraction and the native operations of a particular network transport.
RPC-over-RDMA is a connection-oriented RPC transport. When a connection-oriented transport is used, clients initiate transport connections, while servers wait passively for incoming connection requests.
One cannot assume that all Requesters and Responders represent data objects the same way internally. RPC uses External Data Representation (XDR) to translate native data types and serialize arguments and results [RFC4506].
The XDR protocol encodes data independently of the endianness or size of host-native data types, allowing unambiguous decoding of data on the receiving end. RPC Programs are specified by writing an XDR definition of their procedures, argument data types, and result data types.
XDR assumes that the number of bits in a byte (octet) and their order are the same on both endpoints and on the physical network. The smallest indivisible unit of XDR encoding is a group of four octets. XDR also flattens lists, arrays, and other complex data types so they can be conveyed as a stream of bytes.
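XDR fixes byte order and width regardless of the host, so integer encoding reduces to packing values big-endian into four-octet units. A minimal Python sketch using the standard `struct` module follows; the function names are illustrative and do not come from this specification or [RFC4506].

```python
import struct


def xdr_uint32(value: int) -> bytes:
    """XDR unsigned int: one four-octet unit, most significant octet first."""
    return struct.pack(">I", value)


def xdr_int32(value: int) -> bytes:
    """XDR int: four-octet two's complement, most significant octet first."""
    return struct.pack(">i", value)


def xdr_uhyper(value: int) -> bytes:
    """XDR unsigned hyper: eight octets, i.e., two four-octet units."""
    return struct.pack(">Q", value)
```

The `>` format prefix selects big-endian byte order with standard sizes, so the output is identical on little-endian and big-endian hosts.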
A serialized stream of bytes that is the result of XDR encoding is referred to as an "XDR stream". A sending endpoint encodes native data into an XDR stream and then transmits that stream to a receiver. A receiving endpoint decodes incoming XDR byte streams into its native data representation format.
Sometimes, a data item must be transferred as is: without encoding or decoding. The contents of such a data item are referred to as "opaque data". XDR encoding places the content of opaque data items directly into an XDR stream without altering it in any way. ULPs or applications perform any needed data translation in this case. Examples of opaque data items include the content of files or generic byte strings.
The number of octets in a variable-length data item precedes that item in an XDR stream. If the size of an encoded data item is not a multiple of four octets, octets containing zero are added after the end of the item so that the next encoded data item in the XDR stream starts on a four-octet boundary. The encoded size of the item is not changed by the addition of the extra octets, and these extra octets are never exposed to ULPs.
This technique is referred to as "XDR roundup", and the extra octets are referred to as "XDR roundup padding".
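Variable-length opaque data and XDR roundup can be sketched in Python. This is a minimal illustration with hypothetical helper names: the length word records the unpadded size, and the decoder skips the roundup padding without returning it to the caller.

```python
import struct


def xdr_pad_len(length: int) -> int:
    """Zero octets of XDR roundup padding needed after `length` octets."""
    return (4 - length % 4) % 4


def encode_opaque(data: bytes) -> bytes:
    """Encode variable-length opaque data: length word, content, roundup padding."""
    return struct.pack(">I", len(data)) + data + b"\x00" * xdr_pad_len(len(data))


def decode_opaque(buf: bytes, offset: int = 0) -> tuple:
    """Return (content, next_offset). The padding is skipped, never returned."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    content = buf[start:start + length]
    return content, start + length + xdr_pad_len(length)
```

For example, `encode_opaque(b"abcde")` yields 12 octets: a length word of 5, the five content octets, and three zero octets of padding, so the next item begins on a four-octet boundary.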
RPC Requesters and Responders can be made more efficient if large RPC messages are transferred by a third party, such as intelligent network-interface hardware (data movement offload), and placed in the receiver's memory so that no additional adjustment of data alignment has to be made (direct data placement or "DDP"). RDMA transports enable both optimizations.
Typically, RPC implementations copy the contents of RPC messages into a buffer before sending them. An efficient RPC implementation sends bulk data without copying it into a separate send buffer first.
However, socket-based RPC implementations are often unable to receive data directly into its final place in memory. Receivers often need to copy incoming data to finish an RPC operation: sometimes, only to adjust data alignment.
In this document, "RDMA" refers to the physical mechanism an RDMA transport utilizes when moving data. Although it may not be efficient, a sender may copy data into an intermediate buffer before an RDMA transfer, and a receiver may copy that data again to its final destination after the transfer.
In this document, the term "DDP" refers to any optimized data transfer where it is unnecessary for a receiving host's CPU to copy transferred data to another location after it has been received.
Just as [RFC5666] did, this document focuses on the use of RDMA Read and Write operations to achieve both data movement offload and DDP. However, not all RDMA-based data transfer qualifies as DDP, and DDP can be achieved using non-RDMA mechanisms.
To achieve good performance during receive operations, RDMA transports require that RDMA consumers provision resources in advance to receive incoming messages.
An RDMA consumer might provide Receive buffers in advance by posting an RDMA Receive Work Request for every expected RDMA Send from a remote peer. These buffers are provided before the remote peer posts RDMA Send Work Requests; thus, this is often referred to as "pre-posting" buffers.
An RDMA Receive Work Request remains outstanding until hardware matches it to an inbound Send operation. The resources associated with that Receive must be retained in host memory, or "pinned", until the Receive completes.
Given these basic tenets of RDMA transport operation, the RPC-over-RDMA version 1 protocol assumes each transport provides the following abstract operations. A more complete discussion of these operations is found in [RFC5040].
Registered Memory: Registered memory is a region of memory that is assigned a steering tag that temporarily permits access by the RDMA provider to perform data-transfer operations. The RPC-over-RDMA version 1 protocol assumes that each region of registered memory MUST be identified with a steering tag of no more than 32 bits and memory addresses of up to 64 bits in length.
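A registered region is therefore described to a peer by a steering tag, a length, and an address. The Python sketch below enforces the width limits stated above. Its field order and 32-bit length follow the handle/length/offset segment structure given in the XDR protocol definition later in this document; the `Segment` class itself is hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    handle: int  # steering tag: no more than 32 bits
    length: int  # length of the registered region, in octets
    offset: int  # memory address (or offset): up to 64 bits

    def __post_init__(self) -> None:
        if not 0 <= self.handle < 2**32:
            raise ValueError("steering tag does not fit in 32 bits")
        if not 0 <= self.length < 2**32:
            raise ValueError("length does not fit in 32 bits")
        if not 0 <= self.offset < 2**64:
            raise ValueError("address does not fit in 64 bits")

    def encode(self) -> bytes:
        """XDR-encode as uint32 handle, uint32 length, uint64 offset (16 octets)."""
        return struct.pack(">IIQ", self.handle, self.length, self.offset)
```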
RDMA Send: The RDMA provider supports an RDMA Send operation, with completion signaled on the receiving peer after data has been placed in a pre-posted buffer. Sends complete at the receiver in the order they were issued at the sender. The amount of data transferred by a single RDMA Send operation is limited by the size of the remote peer's pre-posted buffers.
RDMA Receive: The RDMA provider supports an RDMA Receive operation to receive data conveyed by incoming RDMA Send operations. To reduce the amount of memory that must remain pinned awaiting incoming Sends, the amount of pre-posted memory is limited. Flow control to prevent overrunning receiver resources is provided by the RDMA consumer (in this case, the RPC-over-RDMA version 1 protocol).
RDMA Write: The RDMA provider supports an RDMA Write operation to place data directly into a remote memory region. The local host initiates an RDMA Write, and completion is signaled there. No completion is signaled on the remote peer. The local host provides a steering tag, memory address, and length of the remote peer's memory region.
RDMA Writes are not ordered with respect to one another, but are ordered with respect to RDMA Sends. A subsequent RDMA Send completion obtained at the write initiator guarantees that prior RDMA Write data has been successfully placed in the remote peer's memory.
RDMA Read: The RDMA provider supports an RDMA Read operation to place peer source data directly into the read initiator's memory. The local host initiates an RDMA Read, and completion is signaled there. No completion is signaled on the remote peer. The local host provides steering tags, memory addresses, and a length for the remote source and local destination memory region.
The local host signals Read completion to the remote peer as part of a subsequent RDMA Send message. The remote peer can then release steering tags and subsequently free associated source memory regions.
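The Read and Write semantics can be modeled without hardware. In this hypothetical Python sketch, an `Endpoint` keeps registered regions keyed by steering tag, `rdma_write` pushes data into a target's region, `rdma_read` pulls data from one, and `invalidate` revokes remote access, as a peer does once it learns that a Read has completed. Offsets here are positions within a region rather than virtual addresses, which is a simplification.

```python
class Endpoint:
    """Hypothetical endpoint that registers memory regions under steering tags."""

    def __init__(self) -> None:
        self._regions = {}  # steering tag -> registered bytearray
        self._next_tag = 1

    def register(self, size: int) -> int:
        """Register a zero-filled region and return its steering tag."""
        tag = self._next_tag
        self._next_tag += 1
        self._regions[tag] = bytearray(size)
        return tag

    def invalidate(self, tag: int) -> None:
        """Revoke remote access to a region (e.g., after Read completion is signaled)."""
        del self._regions[tag]

    def region(self, tag: int) -> bytearray:
        if tag not in self._regions:
            raise PermissionError("steering tag is not valid")
        return self._regions[tag]


def rdma_write(target: Endpoint, tag: int, offset: int, data: bytes) -> None:
    """Push data into the target's region; the target gets no completion."""
    region = target.region(tag)
    if offset < 0 or offset + len(data) > len(region):
        raise ValueError("write falls outside the registered region")
    region[offset:offset + len(data)] = data


def rdma_read(target: Endpoint, tag: int, offset: int, length: int) -> bytes:
    """Pull data from the target's region into the initiator's memory."""
    region = target.region(tag)
    if offset < 0 or offset + length > len(region):
        raise ValueError("read falls outside the registered region")
    return bytes(region[offset:offset + length])
```

Nothing in `rdma_write` notifies the target endpoint. As described above, the target learns that data has been placed only from a subsequent RDMA Send.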
The RPC-over-RDMA version 1 protocol is designed to be carried over RDMA transports that support the above abstract operations. This protocol conveys information sufficient for an RPC peer to direct an RDMA provider to perform transfers containing RPC data and to communicate their result(s).
A "transfer model" designates which endpoint exposes its memory and which is responsible for initiating the transfer of data. To enable RDMA Read and Write operations, for example, an endpoint first exposes regions of its memory to a remote endpoint, which initiates these operations against the exposed memory.
Read-Read: Requesters expose their memory to the Responder, and the Responder exposes its memory to Requesters. The Responder reads, or pulls, RPC arguments or whole RPC calls from each Requester. Requesters pull RPC results or whole RPC replies from the Responder.
Write-Write: Requesters expose their memory to the Responder, and the Responder exposes its memory to Requesters. Requesters write, or push, RPC arguments or whole RPC calls to the Responder. The Responder pushes RPC results or whole RPC replies to each Requester.
Read-Write: Requesters expose their memory to the Responder, but the Responder does not expose its memory. The Responder pulls RPC arguments or whole RPC calls from each Requester. The Responder pushes RPC results or whole RPC replies to each Requester.
Write-Read: The Responder exposes its memory to Requesters, but Requesters do not expose their memory. Requesters push RPC arguments or whole RPC calls to the Responder. Requesters pull RPC results or whole RPC replies from the Responder.
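One way to read the four transfer models: whichever endpoint is the target of an RDMA Read or Write must expose its memory. The Python sketch below records, for each model, which peer initiates which operation for call data and for reply data, then derives the exposing endpoints from that rule. The data structure is illustrative only.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    initiator: str  # "Requester" or "Responder"
    op: str         # "Read" (pull) or "Write" (push)


MODELS = {
    # model: (how call data moves, how reply data moves)
    "Read-Read":   (Transfer("Responder", "Read"),  Transfer("Requester", "Read")),
    "Write-Write": (Transfer("Requester", "Write"), Transfer("Responder", "Write")),
    "Read-Write":  (Transfer("Responder", "Read"),  Transfer("Responder", "Write")),
    "Write-Read":  (Transfer("Requester", "Write"), Transfer("Requester", "Read")),
}


def peer(role: str) -> str:
    return "Responder" if role == "Requester" else "Requester"


def exposed_by(model: str) -> set:
    """The target (the initiator's peer) of every Read or Write exposes memory."""
    return {peer(t.initiator) for t in MODELS[model]}
```

The derived sets match the definitions above. In Read-Write, for instance, only Requesters expose memory, because the Responder initiates every transfer.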
On an RPC-over-RDMA transport, each RPC message is encapsulated by an RPC-over-RDMA message. An RPC-over-RDMA message consists of two XDR streams.
RPC Payload Stream: The "Payload stream" contains the encapsulated RPC message being transferred by this RPC-over-RDMA message. This stream always begins with the Transaction ID (XID) field of the encapsulated RPC message.
Transport Stream: The "Transport stream" contains a header that describes and controls the transfer of the Payload stream in this RPC-over-RDMA message. This header is analogous to the record marking used for RPC on TCP sockets but is more extensive, since RDMA transports support several modes of data transfer.
In its simplest form, an RPC-over-RDMA message consists of a Transport stream followed immediately by a Payload stream conveyed together in a single RDMA Send. To transmit large RPC messages, a combination of one RDMA Send operation and one or more other RDMA operations is employed.
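The simplest form of an RPC-over-RDMA message can be sketched in Python. This sketch assumes the fixed header fields (rdma_xid, rdma_vers, rdma_credit, rdma_proc) and the RDMA_MSG procedure defined in Section 4 of this document, with rdma_xid carrying the same value as the RPC message's XID and with the Read list, Write list, and Reply chunk all absent, each encoded as a single zero XDR word. `encode_rdma_msg` is a hypothetical helper, not an interface this document defines.

```python
import struct

RPCRDMA_VERSION = 1  # value of rdma_vers for this protocol version
RDMA_MSG = 0         # rdma_proc: the Payload stream follows the header inline


def encode_rdma_msg(rpc_message: bytes, credits: int) -> bytes:
    """Prepend a minimal Transport stream to an RPC message (the Payload stream)."""
    # The Payload stream always begins with the RPC message's XID;
    # the Transport header carries the same value in rdma_xid.
    (xid,) = struct.unpack_from(">I", rpc_message, 0)
    header = struct.pack(">IIII", xid, RPCRDMA_VERSION, credits, RDMA_MSG)
    # Read list, Write list, and Reply chunk are all absent: one zero word each.
    header += struct.pack(">III", 0, 0, 0)
    return header + rpc_message
```

In this sketch, the Transport stream occupies 28 octets and the Payload stream starts immediately after it. Both travel in one RDMA Send, so the whole message has to fit in the receiver's pre-posted buffer.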
RPC-over-RDMA framing replaces all other RPC framing (such as TCP record marking) when used atop an RPC-over-RDMA association, even when the underlying RDMA protocol may itself be layered atop a transport with a defined RPC framing (such as TCP).
However, it is possible for RPC-over-RDMA to be dynamically enabled in the course of negotiating the use of RDMA via a ULP exchange. Because RPC framing delimits an entire RPC request or reply, the resulting shift in framing must occur between distinct RPC messages, and in concert with the underlying transport.
It is critical to provide RDMA Send flow control for an RDMA connection. If any pre-posted Receive buffer on the connection is not large enough to accept an incoming RDMA Send, or if a pre-posted Receive buffer is not available to accept an incoming RDMA Send, the RDMA connection can be terminated. This is different from conventional TCP/IP networking, in which buffers are allocated dynamically as messages are received.
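The two failure cases can be modeled with a hypothetical Python receive queue. This is a simplification: it treats either failure as terminating the connection, although the text above says only that termination can occur, and it assumes that buffers are consumed in the order they were posted.

```python
from collections import deque


class ConnectionTerminated(Exception):
    """Raised when an inbound RDMA Send cannot be accepted."""


class ReceiveQueue:
    """Hypothetical model of the pre-posted Receive buffers on one connection."""

    def __init__(self) -> None:
        self._posted = deque()  # buffer sizes, consumed in the order posted

    def post_receive(self, size: int) -> None:
        """Pre-post one Receive buffer of `size` octets."""
        self._posted.append(size)

    def deliver_send(self, message: bytes) -> int:
        """Match an inbound Send to the next Receive buffer; return octets placed."""
        if not self._posted:
            raise ConnectionTerminated("no pre-posted Receive buffer available")
        size = self._posted.popleft()
        if len(message) > size:
            raise ConnectionTerminated("pre-posted Receive buffer is too small")
        return len(message)
```

Unlike a TCP socket buffer, nothing here grows on demand. The sender has to know, through flow control, that a suitable buffer is already waiting.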
The longevity of an RDMA connection mandates that sending endpoints respect the resource limits of peer receivers. To ensure messages can be sent and received reliably, there are two operational parameters for each connection.
Flow control for RDMA Send operations directed to the Responder is implemented as a simple request/grant protocol in the RPC-over-RDMA header associated with each RPC message.
An RPC-over-RDMA version 1 credit is the capability to handle one RPC-over-RDMA transaction. Each RPC-over-RDMA message sent from Requester to Responder requests a number of credits from the Responder. Each RPC-over-RDMA message sent from Responder to Requester informs the Requester how many credits the Responder has granted. The requested and granted values are carried in each RPC-over-RDMA message's rdma_credit field (see Section 4.2.3).
Practically speaking, the critical value is the granted value. A Requester MUST NOT send unacknowledged requests in excess of the Responder's granted credit limit. If the granted value is exceeded, the RDMA layer may signal an error, possibly terminating the connection. The granted value MUST NOT be zero, since such a value would result in deadlock.
RPC calls complete in any order, but the current granted credit limit at the Responder is known to the Requester from RDMA Send ordering properties. The number of allowed new requests the Requester may send is then the lower of the current requested and granted credit values, minus the number of requests in flight. Advertised credit values are not altered when individual RPCs are started or completed.
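The credit arithmetic above can be illustrated with a small Requester-side sketch. The class and method names are hypothetical and do not come from any RPC-over-RDMA implementation. The sketch assumes the Requester starts with a single credit (as it may before the first reply arrives) and that each reply's rdma_credit value replaces the previously granted value.

```python
class CreditWindow:
    """Hypothetical Requester-side view of RPC-over-RDMA version 1 credits."""

    def __init__(self, requested: int, granted: int = 1) -> None:
        # Before the first reply, a Requester may assume only one credit.
        if requested < 1 or granted < 1:
            raise ValueError("credit values must be at least 1")
        self.requested = requested
        self.granted = granted
        self.in_flight = 0

    def available(self) -> int:
        # Lower of the requested and granted values, minus requests in flight.
        return max(0, min(self.requested, self.granted) - self.in_flight)

    def start_call(self) -> None:
        if self.available() == 0:
            raise RuntimeError("sending now would exceed the granted credit limit")
        self.in_flight += 1

    def complete_call(self, granted_in_reply: int) -> None:
        # A reply retires one request and carries the Responder's current grant.
        if self.in_flight == 0:
            raise RuntimeError("no request is outstanding")
        if granted_in_reply == 0:
            raise ValueError("a zero grant would deadlock the connection")
        self.in_flight -= 1
        self.granted = granted_in_reply
```

Starting or completing a call only moves the in-flight count; the granted value changes only when a reply advertises a new one.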
The requested and granted credit values MAY be adjusted to match the needs or policies in effect on either peer. For instance, a Responder may reduce the granted credit value to accommodate the available resources in a Shared Receive Queue. The Responder MUST ensure that an increase in receive resources is effected before the next RPC Reply message is sent.
A Requester MUST maintain enough receive resources to accommodate expected replies. Responders have to be prepared for there to be no receive resources available on Requesters with no pending RPC transactions.
Certain RDMA implementations may impose additional flow-control restrictions, such as limits on RDMA Read operations in progress at the Responder. Accommodation of such restrictions is considered the responsibility of each RPC-over-RDMA version 1 implementation.
An "inline threshold" value is the largest message size (in octets) that can be conveyed in one direction between peer implementations using RDMA Send and Receive. The inline threshold value is the smaller of the largest number of bytes the sender can post via a single RDMA Send operation and the largest number of bytes the receiver can accept via a single RDMA Receive operation. Each connection has two inline threshold values: one for messages flowing from Requester-to-Responder (referred to as the "call inline threshold") and one for messages flowing from Responder-to-Requester (referred to as the "reply inline threshold").
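A minimal sketch of the per-direction computation described above follows. The PeerLimits type and its field names are hypothetical; the only rule taken from the text is that each threshold is the smaller of the sender's largest Send and the receiver's largest Receive, computed separately for calls and replies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerLimits:
    """Hypothetical per-endpoint buffer limits, in octets."""
    max_send: int     # largest message this endpoint can post in one RDMA Send
    max_receive: int  # largest message one of its posted Receive buffers accepts


def inline_threshold(sender: PeerLimits, receiver: PeerLimits) -> int:
    # The smaller of what the sender can send and what the receiver can accept.
    return min(sender.max_send, receiver.max_receive)


def connection_thresholds(requester: PeerLimits, responder: PeerLimits) -> tuple:
    call = inline_threshold(requester, responder)   # Requester-to-Responder
    reply = inline_threshold(responder, requester)  # Responder-to-Requester
    return call, reply
```

For example, a Requester that can send 4096 octets but only receive 1024, paired with a Responder that can send 2048 and receive 8192, ends up with a call inline threshold of 4096 and a reply inline threshold of 1024.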
Unlike credit limits, inline threshold values are not advertised to peers via the RPC-over-RDMA version 1 protocol, and there is no provision for inline threshold values to change during the lifetime of an RPC-over-RDMA version 1 connection.
When a connection is first established, peers might not know how many receive resources the other has, nor how large the other peer's inline thresholds are.
As a basis for an initial exchange of RPC requests, each RPC-over-RDMA version 1 connection provides the ability to exchange at least one RPC message at a time, whose RPC Call and Reply messages are no more than 1024 bytes in size. A Responder MAY exceed this basic level of configuration, but a Requester MUST NOT assume more than one credit is available and MUST receive a valid reply from the Responder carrying the actual number of available credits, prior to sending its next request.
Receiver implementations MUST support inline thresholds of 1024 bytes but MAY support larger inline threshold values. An independent mechanism for discovering a peer's inline thresholds before a connection is established may be used to optimize the use of RDMA Send and Receive operations. In the absence of such a mechanism, senders and receivers MUST assume the inline thresholds are 1024 bytes.
When a DDP capability is available, the transport places the contents of one or more XDR data items directly into the receiver's memory, separately from the transfer of other parts of the containing XDR stream.
RPC-over-RDMA version 1 provides a mechanism for moving part of an RPC message via a data transfer distinct from an RDMA Send/Receive pair. The sender removes one or more XDR data items from the Payload stream. They are conveyed via other mechanisms, such as one or more RDMA Read or Write operations. As the receiver decodes an incoming message, it skips over directly placed data items.
The portion of an XDR stream that is split out and moved separately is referred to as a "chunk". In some contexts, data in an RPC-over-RDMA header that describes these split-out regions of memory may also be referred to as a "chunk".
A Payload stream after chunks have been removed is referred to as a "reduced" Payload stream. Likewise, a data item that has been removed from a Payload stream to be transferred separately is referred to as a "reduced" data item.
Not all XDR data items benefit from DDP. For example, small data items or data items that require XDR unmarshaling by the receiver do not benefit from DDP. In addition, it is impractical for receivers to prepare for every possible XDR data item in a protocol to be transferred in a chunk.
To maintain interoperability on an RPC-over-RDMA transport, it must be determined which few XDR data items in each ULP are allowed to use DDP.
This is done by additional specifications that describe how ULPs employ DDP. A "ULB specification" identifies which specific individual XDR data items in a ULP MAY be transferred via DDP. Such data items are referred to as "DDP-eligible". All other XDR data items MUST NOT be reduced.
Detailed requirements for ULBs are provided in Section 6.
When encoding a Payload stream that contains a DDP-eligible data item, a sender may choose to reduce that data item. When it chooses to do so, the sender does not place the item into the Payload stream. Instead, the sender records in the RPC-over-RDMA header the location and size of the memory region containing that data item.
The Requester provides location information for DDP-eligible data items in both RPC Call and Reply messages. The Responder uses this information to retrieve arguments contained in the specified region of the Requester's memory or place results in that memory region.
An "RDMA segment", or "plain segment", is an RPC-over-RDMA Transport header data object that contains the precise coordinates of a contiguous memory region that is to be conveyed separately from the Payload stream. Plain segments contain the following information:
Handle: Steering tag (STag) or R_key generated by registering this memory with the RDMA provider.
Length: The length of the RDMA segment's memory region, in octets. An "empty segment" is an RDMA segment with the value zero (0) in its length field.
Offset: The offset or beginning memory address of the RDMA segment's memory region.
See [RFC5040] for further discussion.
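To make the layout of a plain segment concrete, the sketch below encodes and decodes one as XDR, following the field order and widths of the xdr_rdma_segment definition given in this document (uint32 handle, uint32 length, uint64 offset, all big-endian). The RdmaSegment type and its helper methods are hypothetical names chosen for illustration.

```python
import struct
from typing import NamedTuple


class RdmaSegment(NamedTuple):
    handle: int  # STag or R_key from memory registration
    length: int  # length of the memory region, in octets
    offset: int  # offset or starting address of the memory region

    def encode(self) -> bytes:
        # XDR: uint32 handle, uint32 length, uint64 offset (16 octets total).
        return struct.pack(">IIQ", self.handle, self.length, self.offset)

    @classmethod
    def decode(cls, buf: bytes) -> "RdmaSegment":
        return cls(*struct.unpack(">IIQ", buf[:16]))

    def is_empty(self) -> bool:
        # An "empty segment" has zero in its length field.
        return self.length == 0
```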
In RPC-over-RDMA version 1, a "chunk" refers to a portion of the Payload stream that is moved independently of the RPC-over-RDMA Transport header and Payload stream. Chunk data is removed from the sender's Payload stream, transferred via separate operations, and then reinserted into the receiver's Payload stream to form a complete RPC message.
Each chunk consists of RDMA segments. Each RDMA segment represents a single contiguous piece of that chunk. A Requester MAY divide a chunk into RDMA segments using any boundaries that are convenient. The length of a chunk is the sum of the lengths of the RDMA segments that comprise it.
The RPC-over-RDMA version 1 transport protocol does not place a limit on chunk size. However, each ULP may cap the amount of data that can be transferred by a single RPC (for example, NFS has "rsize" and "wsize", which restrict the payload size of NFS READ and WRITE operations). The Responder can use such limits to sanity-check chunk sizes before using them in RDMA operations.
If a chunk contains a counted array data type, the count of array elements MUST remain in the Payload stream, while the array elements MUST be moved to the chunk. For example, when encoding an opaque byte array as a chunk, the count of bytes stays in the Payload stream, while the bytes in the array are removed from the Payload stream and transferred within the chunk.
Individual array elements appear in a chunk in their entirety. For example, when encoding an array of arrays as a chunk, the count of items in the enclosing array stays in the Payload stream, but each enclosed array, including its item count, is transferred as part of the chunk.
If a chunk contains an optional-data data type, the "is present" field MUST remain in the Payload stream, while the data, if present, MUST be moved to the chunk.
A union data type MUST NOT be made DDP-eligible, but one or more of its arms MAY be DDP-eligible, subject to the other requirements in this section.
Except in special cases (covered in Section 3.5.3), a chunk MUST contain exactly one XDR data item. This makes it straightforward to reduce variable-length data items without affecting the XDR alignment of data items in the Payload stream.
When a variable-length XDR data item is reduced, the sender MUST remove XDR roundup padding for that data item from the Payload stream so that data items remaining in the Payload stream begin on four-byte alignment.
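The counted-array and padding rules above can be seen with a variable-length opaque. In this sketch (function names are hypothetical), the unreduced encoding carries a 4-octet count, the bytes, and roundup padding; after reduction, only the count remains in the Payload stream, the bytes move to the chunk, and the padding disappears.

```python
import struct


def xdr_pad(n: int) -> int:
    """Octets of XDR roundup padding that follow n octets of opaque data."""
    return (4 - n % 4) % 4


def encode_opaque(data: bytes) -> bytes:
    # Unreduced XDR: 4-octet count, the bytes, then zero padding to a 4-octet boundary.
    return struct.pack(">I", len(data)) + data + bytes(xdr_pad(len(data)))


def reduce_opaque(data: bytes) -> tuple:
    """Split a variable-length opaque into (Payload stream part, chunk part)."""
    # The count stays in the Payload stream; the bytes move to the chunk,
    # and the roundup padding is dropped from the Payload stream.
    return struct.pack(">I", len(data)), data
```

Because the Payload stream keeps only the 4-octet count, whatever data item follows the reduced one still begins on a four-byte boundary.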
A "Read chunk" represents an XDR data item that is to be pulled from the Requester to the Responder.
A Read chunk is a list of one or more RDMA read segments. An RDMA read segment consists of a Position field followed by a plain segment. See Section 4.1.2 for details.
Position: The byte offset in the unreduced Payload stream where the receiver reinserts the data item conveyed in a chunk. The Position value MUST be computed from the beginning of the unreduced Payload stream, which begins at Position zero. All RDMA read segments belonging to the same Read chunk have the same value in their Position field.
While constructing an RPC Call message, a Requester registers memory regions that contain data to be transferred via RDMA Read operations. It advertises the coordinates of these regions in the RPC-over-RDMA Transport header of the RPC Call message.
After receiving an RPC Call message sent via an RDMA Send operation, a Responder transfers the chunk data from the Requester using RDMA Read operations. The Responder reconstructs the transferred chunk data by concatenating the contents of each RDMA segment, in list order, into the received Payload stream at the Position value recorded in that RDMA segment.
Put another way, the Responder inserts the first RDMA segment in a Read chunk into the Payload stream at the byte offset indicated by its Position field. RDMA segments whose Position field value matches this offset are concatenated afterwards, until there are no more RDMA segments at that Position value.
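The reconstruction rule above can be sketched as follows. The rdma_read callable is a hypothetical stand-in for an actual RDMA Read, and read segments are plain (position, handle, length, offset) tuples. The sketch also assumes the receiver restores XDR roundup padding when a chunk omits it, so that later Position values (which count padding in the unreduced stream) still line up; a real decoder may instead consume chunks in place without building one contiguous buffer.

```python
from itertools import groupby


def xdr_pad(n: int) -> int:
    return (4 - n % 4) % 4


def reconstruct_payload(reduced: bytes, read_list, rdma_read) -> bytes:
    """Rebuild an unreduced Payload stream from a reduced one.

    read_list: (position, handle, length, offset) tuples in Read list order.
    rdma_read: hypothetical callable (handle, length, offset) -> bytes.
    """
    out = bytearray(reduced)
    # Read segments sharing a Position form one Read chunk. Sorting is stable,
    # so segments of the same chunk keep their list order. Working in
    # increasing Position order means earlier insertions have already restored
    # the bytes that precede later Positions in the unreduced stream.
    ordered = sorted(read_list, key=lambda seg: seg[0])
    for position, chunk in groupby(ordered, key=lambda seg: seg[0]):
        data = b"".join(rdma_read(h, length, off) for _, h, length, off in chunk)
        # Assumption: restore XDR roundup padding if the chunk omitted it.
        data += bytes(xdr_pad(len(data)))
        out[position:position] = data
    return bytes(out)
```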
The Position field in a read segment indicates where the containing Read chunk starts in the Payload stream. The value in this field MUST be a multiple of four. All segments in the same Read chunk share the same Position value, even if one or more of the RDMA segments have a non-four-byte-aligned length.
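The following sketch shows how read segments might be encoded on the wire, assuming the Read list is carried as an XDR optional-data linked list, as the xdr_read_list definition with its *next pointer implies: each entry is preceded by TRUE (1), and the list ends with FALSE (0). The function names are hypothetical; the four-byte Position check comes from the rule stated above.

```python
import struct


def encode_read_segment(position: int, handle: int, length: int, offset: int) -> bytes:
    # xdr_read_chunk: uint32 position, then a plain xdr_rdma_segment
    # (uint32 handle, uint32 length, uint64 offset), all big-endian.
    if position % 4:
        raise ValueError("Position must be a multiple of four")
    return struct.pack(">IIIQ", position, handle, length, offset)


def encode_read_list(segments) -> bytes:
    # XDR optional-data list: TRUE (1) precedes each entry, FALSE (0) ends it.
    out = bytearray()
    for seg in segments:
        out += struct.pack(">I", 1) + encode_read_segment(*seg)
    out += struct.pack(">I", 0)
    return bytes(out)
```

Two read segments belonging to the same Read chunk share a Position value but are still encoded as two separate list entries.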
While decoding a received Payload stream, whenever the XDR offset in the Payload stream matches that of a Read chunk, the Responder initiates an RDMA Read to pull the chunk's data content into registered local memory.
The Responder acknowledges its completion of use of Read chunk source buffers when it sends an RPC Reply message to the Requester. The Requester may then release Read chunks advertised in the request.
When reducing a variable-length argument data item, the Requester SHOULD NOT include the data item's XDR roundup padding in the chunk. The length of a Read chunk is determined as follows:
o If the Requester chooses to include roundup padding in a Read chunk, the chunk's total length MUST be the sum of the encoded length of the data item and the length of the roundup padding. The length of the data item that was encoded into the Payload stream remains unchanged.
The sender can increase the length of the chunk by adding another RDMA segment containing only the roundup padding, or it can do so by extending the final RDMA segment in the chunk.
o If the sender chooses not to include roundup padding in the chunk, the chunk's total length MUST be the same as the encoded length of the data item.
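The two options in the list above can be expressed as a small helper that returns the RDMA segment lengths for a Read chunk. The function and parameter names are hypothetical; data_segments holds the lengths of the segments covering the item's encoded data.

```python
def xdr_pad(n: int) -> int:
    return (4 - n % 4) % 4


def read_chunk_lengths(data_segments: list, include_padding: bool,
                       pad_in_own_segment: bool = False) -> list:
    """Return RDMA segment lengths for a Read chunk carrying one data item."""
    lengths = list(data_segments)
    pad = xdr_pad(sum(lengths))
    if include_padding and pad:
        if pad_in_own_segment:
            lengths.append(pad)   # an extra segment holding only the padding
        else:
            lengths[-1] += pad    # extend the final RDMA segment instead
    return lengths
```

For a 4101-octet item split as [4096, 5], the chunk totals 4101 octets without padding, or 4104 octets with it (either [4096, 8] or [4096, 5, 3]); the length encoded in the Payload stream stays 4101 in every case.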
While constructing an RPC Call message, a Requester prepares memory regions in which to receive DDP-eligible result data items. A "Write chunk" represents an XDR data item that is to be pushed from a Responder to a Requester. It is made up of an array of zero or more plain segments.
Write chunks are provisioned by a Requester long before the Responder has prepared the reply Payload stream. A Requester often does not know the actual length of the result data items to be returned, since the result does not yet exist. Thus, it MUST register Write chunks long enough to accommodate the maximum possible size of each returned data item.
In addition, the XDR position of DDP-eligible data items in the reply's Payload stream is not predictable when a Requester constructs an RPC Call message. Therefore, RDMA segments in a Write chunk do not have a Position field.
For each Write chunk provided by a Requester, the Responder pushes one data item to the Requester, filling the chunk contiguously and in segment array order until that data item has been completely written to the Requester. The Responder MUST copy the segment count and all segments from the Requester-provided Write chunk into the RPC Reply message's Transport header. As it does so, the Responder updates each segment length field to reflect the actual amount of data that is being returned in that segment. The Responder then sends the RPC Reply message via an RDMA Send operation.
An "empty Write chunk" is a Write chunk with a zero segment count. By definition, the length of an empty Write chunk is zero. An "unused Write chunk" has a non-zero segment count, but all of its segments are empty segments.
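The Responder's handling of a Write chunk, and the "empty" and "unused" terms defined above, can be sketched as follows. The rdma_write callable is a hypothetical stand-in for an RDMA Write, and the label "used" is an illustrative name not defined by this document. The result argument holds only the encoded data, so no XDR roundup padding is ever written into the chunk.

```python
from typing import Callable, List, NamedTuple


class Segment(NamedTuple):
    handle: int
    length: int
    offset: int


def fill_write_chunk(chunk: List[Segment], result: bytes,
                     rdma_write: Callable[[int, int, bytes], None]) -> List[Segment]:
    """Push one result data item into a Requester-provided Write chunk.

    Returns the segment array to copy into the reply's Transport header.
    """
    if len(result) > sum(seg.length for seg in chunk):
        raise ValueError("result does not fit in the Write chunk")
    returned, pos = [], 0
    for seg in chunk:  # fill contiguously, in segment array order
        n = min(seg.length, len(result) - pos)
        if n:
            rdma_write(seg.handle, seg.offset, result[pos:pos + n])
        # Keep every segment (the count is copied) but report the bytes written.
        returned.append(seg._replace(length=n))
        pos += n
    return returned


def classify_write_chunk(chunk: List[Segment]) -> str:
    if not chunk:
        return "empty"   # zero segment count
    if all(seg.length == 0 for seg in chunk):
        return "unused"  # segments present, all of them empty
    return "used"        # hypothetical label for any other Write chunk
```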
After receiving the RPC Reply message, the Requester reconstructs the transferred data by concatenating the contents of each segment, in array order, into the RPC Reply message's XDR stream at the known XDR position of the associated DDP-eligible result data item.
When provisioning a Write chunk for a variable-length result data item, the Requester SHOULD NOT include additional space for XDR roundup padding. A Responder MUST NOT write XDR roundup padding into a Write chunk, even if the Requester made space available for it. Therefore, when returning a single variable-length result data item, a returned Write chunk's total length MUST be the same as the encoded length of the result data item.
A receiver of RDMA Send operations is required by RDMA to have previously posted one or more adequately sized buffers. Memory savings are achieved on both Requesters and Responders by posting small Receive buffers. However, not all RPC messages are small. RPC-over-RDMA version 1 provides several mechanisms that allow messages of any size to be conveyed efficiently.
RPC messages are frequently smaller than typical inline thresholds. For example, the NFS version 3 GETATTR operation is only 56 bytes: 20 bytes of RPC header, a 32-byte file handle argument, and 4 bytes for its length. The reply to this common request is about 100 bytes.
Since all RPC messages conveyed via RPC-over-RDMA require an RDMA Send operation, the most efficient way to send an RPC message that is smaller than the inline threshold is to append the Payload stream directly to the Transport stream. An RPC-over-RDMA header with a small RPC Call or Reply message immediately following is transferred using a single RDMA Send operation. No other operations are needed.
An RPC-over-RDMA transaction using Short Messages:
       Requester                              Responder
           |        RDMA Send (RDMA_MSG)          |
      Call |   ------------------------------>    |
           |                                      |
           |                                      | Processing
           |                                      |
           |        RDMA Send (RDMA_MSG)          |
           |   <------------------------------    | Reply
If DDP-eligible data items are present in a Payload stream, a sender MAY reduce some or all of these items by removing them from the Payload stream. The sender uses a separate mechanism to transfer the reduced data items. The Transport stream with the reduced Payload stream immediately following is then transferred using a single RDMA Send operation.
After receiving the Transport and Payload streams of an RPC Call message accompanied by Read chunks, the Responder uses RDMA Read operations to move reduced data items in Read chunks. Before sending the Transport and Payload streams of an RPC Reply message containing Write chunks, the Responder uses RDMA Write operations to move reduced data items in Write and Reply chunks.
An RPC-over-RDMA transaction with a Read chunk:
       Requester                              Responder
           |        RDMA Send (RDMA_MSG)          |
      Call |   ------------------------------>    |
           |        RDMA Read                     |
           |   <------------------------------    |
           |        RDMA Response (arg data)      |
           |   ------------------------------>    |
           |                                      |
           |                                      | Processing
           |                                      |
           |        RDMA Send (RDMA_MSG)          |
           |   <------------------------------    | Reply
An RPC-over-RDMA transaction with a Write chunk:
       Requester                              Responder
           |        RDMA Send (RDMA_MSG)          |
      Call |   ------------------------------>    |
           |                                      |
           |                                      | Processing
           |                                      |
           |        RDMA Write (result data)      |
           |   <------------------------------    |
           |        RDMA Send (RDMA_MSG)          |
           |   <------------------------------    | Reply
When a Payload stream is larger than the receiver's inline threshold, the Payload stream is reduced by removing DDP-eligible data items and placing them in chunks to be moved separately. If there are no DDP-eligible data items in the Payload stream, or the Payload stream is still too large after it has been reduced, the RDMA transport MUST use RDMA Read or Write operations to convey the Payload stream itself. This mechanism is referred to as a "Long Message".
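One possible Requester policy for choosing among the forms described so far is sketched below: send inline when the whole message fits, reduce DDP-eligible items when that is enough, and otherwise fall back to a Long Call. Reducing only when needed is an assumption (a sender MAY reduce eligible items even when the message would fit), as is counting the Transport header against the threshold, which is defined as the largest message conveyable in one RDMA Send. The names are hypothetical, and header_len is treated as fixed for simplicity.

```python
from enum import Enum


class CallForm(Enum):
    SHORT = "RDMA_MSG with the whole Payload stream inline"
    CHUNKED = "RDMA_MSG with a reduced Payload stream plus chunks"
    LONG = "RDMA_NOMSG with the Payload stream in a Position Zero Read chunk"


def choose_call_form(header_len: int, payload_len: int, reduced_len: int,
                     call_inline_threshold: int) -> CallForm:
    """reduced_len equals payload_len when the call has no DDP-eligible items."""
    # The inline threshold bounds the whole RDMA Send, Transport header included.
    if header_len + payload_len <= call_inline_threshold:
        return CallForm.SHORT
    if header_len + reduced_len <= call_inline_threshold:
        return CallForm.CHUNKED
    return CallForm.LONG
```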
To transmit a Long Message, the sender conveys only the Transport stream with an RDMA Send operation. The Payload stream is not included in the Send buffer in this instance. Instead, the Requester provides chunks that the Responder uses to move the Payload stream.
Long Call: To send a Long Call message, the Requester provides a special Read chunk that contains the RPC Call message's Payload stream. Every RDMA read segment in this chunk MUST contain zero in its Position field. Thus, this chunk is known as a "Position Zero Read chunk".
Long Reply: To enable the Responder to send a Long Reply, the Requester provides in advance a single special Write chunk, known as the "Reply chunk", that will contain the RPC Reply message's Payload stream. The Requester sizes the Reply chunk to accommodate the maximum expected reply size for that upper-layer operation.
Though the purpose of a Long Message is to handle large RPC messages, Requesters MAY use a Long Message at any time to convey an RPC Call message.
A Responder chooses which form of reply to use based on the chunks provided by the Requester. If Write chunks were provided and the Responder has a DDP-eligible result, it first reduces the reply Payload stream. If a Reply chunk was provided and the reduced Payload stream is larger than the reply inline threshold, the Responder MUST use the Requester-provided Reply chunk for the reply.
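The Responder's decision described above can be sketched as a small function. The names are hypothetical; reduce_via_write_chunk stands for "a Write chunk was provided and the result is DDP-eligible". Counting the Transport header against the reply inline threshold is an assumption of this sketch, and the case of an oversized reply with no Reply chunk is simply reported as an error here rather than modeled as the protocol's actual error response.

```python
from enum import Enum


class ReplyForm(Enum):
    INLINE = "RDMA_MSG: (possibly reduced) Payload stream sent inline"
    LONG = "RDMA_NOMSG: Payload stream written into the Reply chunk"


def choose_reply_form(header_len: int, payload_len: int, reduced_len: int,
                      reduce_via_write_chunk: bool, have_reply_chunk: bool,
                      reply_inline_threshold: int) -> ReplyForm:
    # Step 1: with a Write chunk and a DDP-eligible result, reduce first.
    size = reduced_len if reduce_via_write_chunk else payload_len
    # Step 2: send inline if the (reduced) reply fits in one RDMA Send.
    if header_len + size <= reply_inline_threshold:
        return ReplyForm.INLINE
    # Step 3: otherwise the Requester-provided Reply chunk MUST be used.
    if have_reply_chunk:
        return ReplyForm.LONG
    raise ValueError("reply exceeds the reply inline threshold and no Reply chunk was provided")
```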
XDR data items may appear in these special chunks without regard to their DDP-eligibility. As these chunks contain a Payload stream, such chunks MUST include appropriate XDR roundup padding to maintain proper XDR alignment of their contents.
An RPC-over-RDMA transaction using a Long Call:
       Requester                              Responder
           |        RDMA Send (RDMA_NOMSG)        |
      Call |   ------------------------------>    |
           |        RDMA Read                     |
           |   <------------------------------    |
           |        RDMA Response (RPC call)      |
           |   ------------------------------>    |
           |                                      |
           |                                      | Processing
           |                                      |
           |        RDMA Send (RDMA_MSG)          |
           |   <------------------------------    | Reply
An RPC-over-RDMA transaction using a Long Reply:
       Requester                              Responder
           |        RDMA Send (RDMA_MSG)          |
      Call |   ------------------------------>    |
           |                                      |
           |                                      | Processing
           |                                      |
           |        RDMA Write (RPC reply)        |
           |   <------------------------------    |
           |        RDMA Send (RDMA_NOMSG)        |
           |   <------------------------------    | Reply
Every RPC-over-RDMA version 1 message has a header that includes a copy of the message's transaction ID, data for managing RDMA flow-control credits, and lists of RDMA segments describing chunks. All RPC-over-RDMA header content is contained in the Transport stream; thus, it MUST be XDR encoded.
RPC message layout is unchanged from that described in [RFC5531] except for the possible reduction of data items that are moved by separate operations.
The RPC-over-RDMA protocol passes RPC messages without regard to their type (CALL or REPLY). Apart from restrictions imposed by ULBs, each endpoint of a connection MAY send RDMA_MSG or RDMA_NOMSG message header types at any time (subject to credit limits).
This section contains a description of the core features of the RPC-over-RDMA version 1 protocol, expressed in the XDR language [RFC4506].
This description is provided in a way that makes it simple to extract into ready-to-compile form. The reader can apply the following shell script to this document to produce a machine-readable XDR description of the RPC-over-RDMA version 1 protocol.
<CODE BEGINS>
#!/bin/sh
grep '^ *///' | sed 's?^ /// ??' | sed 's?^ *///$??'
<CODE ENDS>
That is, if the above script is stored in a file called "extract.sh" and this document is in a file called "spec.txt", then the reader can do the following to extract an XDR description file:
<CODE BEGINS>
sh extract.sh < spec.txt > rpcrdma_corev1.x
<CODE ENDS>
Code components extracted from this document must include the following license text. When the extracted XDR code is combined with other complementary XDR code, which itself has an identical license, only a single copy of the license text need be preserved.
<CODE BEGINS>
 /// /*
 ///  * Copyright (c) 2010-2017 IETF Trust and the persons
 ///  * identified as authors of the code. All rights reserved.
 ///  *
 ///  * The authors of the code are:
 ///  * B. Callaghan, T. Talpey, and C. Lever
 ///  *
 ///  * Redistribution and use in source and binary forms, with
 ///  * or without modification, are permitted provided that the
 ///  * following conditions are met:
 ///  *
 ///  * - Redistributions of source code must retain the above
 ///  *   copyright notice, this list of conditions and the
 ///  *   following disclaimer.
 ///  *
 ///  * - Redistributions in binary form must reproduce the above
 ///  *   copyright notice, this list of conditions and the
 ///  *   following disclaimer in the documentation and/or other
 ///  *   materials provided with the distribution.
 ///  *
 ///  * - Neither the name of Internet Society, IETF or IETF
 ///  *   Trust, nor the names of specific contributors, may be
 ///  *   used to endorse or promote products derived from this
 ///  *   software without specific prior written permission.
 ///  *
 ///  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
 ///  * AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
 ///  * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 ///  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 ///  * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
 ///  * EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 ///  * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 ///  * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 ///  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 ///  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 ///  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 ///  * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 ///  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 ///  * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 ///  * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 ///  */
 ///
<CODE ENDS>
<代码结束>
The XDR data items defined in this section encode the Transport stream in each RPC-over-RDMA version 1 message. Comments identify items that cannot be changed in subsequent versions.
本节中定义的XDR数据项对每个RPC over RDMA版本1消息中的传输头流进行编码。注释标识在后续版本中无法更改的项。
<CODE BEGINS>
<代码开始>
/// /*
/// * Plain RDMA segment (Section 3.4.3)
/// */
/// struct xdr_rdma_segment {
///     uint32 handle; /* Registered memory handle */
///     uint32 length; /* Length of the chunk in bytes */
///     uint64 offset; /* Chunk virtual address or offset */
/// };
///
/// /*
/// * RDMA read segment (Section 3.4.5)
/// */
/// struct xdr_read_chunk {
///     uint32 position; /* Position in XDR stream */
///     struct xdr_rdma_segment target;
/// };
///
/// /*
/// * Read list (Section 4.3.1)
/// */
/// struct xdr_read_list {
///     struct xdr_read_chunk entry;
///     struct xdr_read_list *next;
/// };
///
/// /*
/// * Write chunk (Section 3.4.6)
/// */
/// struct xdr_write_chunk {
///     struct xdr_rdma_segment target<>;
/// };
///
/// /*
/// * Write list (Section 4.3.2)
/// */
/// struct xdr_write_list {
///     struct xdr_write_chunk entry;
///     struct xdr_write_list *next;
/// };
///
/// /*
/// * Chunk lists (Section 4.3)
/// */
/// struct rpc_rdma_header {
///     struct xdr_read_list *rdma_reads;
///     struct xdr_write_list *rdma_writes;
///     struct xdr_write_chunk *rdma_reply;
///     /* rpc body follows */
/// };
///
/// struct rpc_rdma_header_nomsg {
///     struct xdr_read_list *rdma_reads;
///     struct xdr_write_list *rdma_writes;
///     struct xdr_write_chunk *rdma_reply;
/// };
///
/// /* Not to be used */
/// struct rpc_rdma_header_padded {
///     uint32 rdma_align;
///     uint32 rdma_thresh;
///     struct xdr_read_list *rdma_reads;
///     struct xdr_write_list *rdma_writes;
///     struct xdr_write_chunk *rdma_reply;
///     /* rpc body follows */
/// };
///
/// /*
/// * Error handling (Section 4.5)
/// */
/// enum rpc_rdma_errcode {
///     ERR_VERS = 1, /* Value fixed for all versions */
///     ERR_CHUNK = 2
/// };
///
/// /* Structure fixed for all versions */
/// struct rpc_rdma_errvers {
///     uint32 rdma_vers_low;
///     uint32 rdma_vers_high;
/// };
///
/// union rpc_rdma_error switch (rpc_rdma_errcode err) {
///     case ERR_VERS:
///         rpc_rdma_errvers range;
///     case ERR_CHUNK:
///         void;
/// };
///
/// /*
/// * Procedures (Section 4.2.4)
/// */
/// enum rdma_proc {
///     RDMA_MSG = 0, /* Value fixed for all versions */
///     RDMA_NOMSG = 1, /* Value fixed for all versions */
///     RDMA_MSGP = 2, /* Not to be used */
///     RDMA_DONE = 3, /* Not to be used */
///     RDMA_ERROR = 4 /* Value fixed for all versions */
/// };
///
/// /* The position of the proc discriminator field is
/// * fixed for all versions */
/// union rdma_body switch (rdma_proc proc) {
///     case RDMA_MSG:
///         rpc_rdma_header rdma_msg;
///     case RDMA_NOMSG:
///         rpc_rdma_header_nomsg rdma_nomsg;
///     case RDMA_MSGP: /* Not to be used */
///         rpc_rdma_header_padded rdma_msgp;
///     case RDMA_DONE: /* Not to be used */
///         void;
///     case RDMA_ERROR:
///         rpc_rdma_error rdma_error;
/// };
///
/// /*
/// * Fixed header fields (Section 4.2)
/// */
/// struct rdma_msg {
///     uint32 rdma_xid; /* Position fixed for all versions */
///     uint32 rdma_vers; /* Position fixed for all versions */
///     uint32 rdma_credit; /* Position fixed for all versions */
///     rdma_body rdma_body;
/// };
<CODE ENDS>
<代码结束>
The RPC-over-RDMA header begins with four fixed 32-bit fields that control the RDMA interaction.
RPC over RDMA标头以四个固定的32位字段开始,这些字段控制RDMA交互。
The first three words are individual fields in the rdma_msg structure. The fourth word is the first word of the rdma_body union, which acts as the discriminator for the switched union. The contents of this field are described in Section 4.2.4.
前三个字是rdma_msg结构中的独立字段。第四个字是rdma_body联合体的第一个字,它充当该可区分联合体的判别式。第4.2.4节描述了该字段的内容。
These four fields must remain with the same meanings and in the same positions in all subsequent versions of the RPC-over-RDMA protocol.
这四个字段在RPCoverRDMA协议的所有后续版本中必须保持相同的含义和位置。
The rdma_xid field carries the XID generated for the RPC Call and Reply messages. Having the XID at a fixed location in the header makes it easy for the receiver to establish context as soon as each RPC-over-RDMA message arrives. This XID MUST be the same as the XID in the RPC message. The receiver MAY perform its processing based solely on the XID in the RPC-over-RDMA header, and thereby ignore the XID in the RPC message, if it so chooses.
为RPC调用和回复消息生成的XID。将XID放在报头中的一个固定位置可以使接收方在每个RPC over RDMA消息到达时很容易建立上下文。此XID必须与RPC消息中的XID相同。接收方可以仅基于RPC over RDMA报头中的XID来执行其处理,从而忽略RPC消息中的XID(如果它选择这样做的话)。
For RPC-over-RDMA version 1, the rdma_vers field MUST contain the value one (1). Rules regarding changes to this transport protocol version number can be found in Section 7.
对于RPC over RDMA版本1,此字段必须包含值1(1)。有关此传输协议版本号更改的规则,请参见第7节。
The rdma_credit field carries the requested credit value when sent with an RPC Call message and the granted credit value when sent with an RPC Reply message. Further discussion of how the credit value is determined can be found in Section 3.3.
当与RPC调用消息一起发送时,将提供请求的信用值。当与RPC回复消息一起发送时,将返回授予的信用值。关于如何确定信用价值的进一步讨论见第3.3节。
RDMA_MSG = 0 indicates that chunk lists and a Payload stream follow. The format of the chunk lists is discussed below.
RDMA_MSG=0表示块列表和有效负载流跟随。区块列表的格式如下所述。
RDMA_NOMSG = 1 indicates that after the chunk lists there is no Payload stream. In this case, the chunk lists provide information to allow the Responder to transfer the Payload stream using explicit RDMA operations.
RDMA_NOMSG=1表示在区块列表之后没有有效负载流。在这种情况下,区块列表提供了允许响应者使用显式RDMA操作传输有效负载流的信息。
RDMA_MSGP = 2 is reserved.
保留RDMA_MSGP=2。
RDMA_DONE = 3 is reserved.
RDMA_DONE=3是保留的。
RDMA_ERROR = 4 is used to signal an encoding error in the RPC-over-RDMA header.
RDMA_ERROR=4用于表示RPC over RDMA标头中的编码错误。
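As an illustration of the four fixed words described above, the following Python sketch packs and unpacks rdma_xid, rdma_vers, rdma_credit, and the rdma_proc discriminator as big-endian XDR unsigned 32-bit integers. It is not taken from any implementation; the names `FixedHeader`, `pack_fixed_header`, and `unpack_fixed_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# Four big-endian (XDR) unsigned 32-bit words, no padding between them.
FIXED_HEADER = struct.Struct(">IIII")


class FixedHeader(NamedTuple):
    xid: int     # rdma_xid: same XID as in the RPC message
    vers: int    # rdma_vers: 1 for RPC-over-RDMA version 1
    credit: int  # rdma_credit: requested (Call) or granted (Reply) credits
    proc: int    # first word of rdma_body: the rdma_proc discriminator


def pack_fixed_header(hdr: FixedHeader) -> bytes:
    """Encode the four fixed words that begin every Transport stream."""
    return FIXED_HEADER.pack(*hdr)


def unpack_fixed_header(buf: bytes) -> FixedHeader:
    """Decode the four fixed words; raise ValueError if buf is too short."""
    if len(buf) < FIXED_HEADER.size:
        raise ValueError("buffer shorter than the four fixed header words")
    return FixedHeader(*FIXED_HEADER.unpack_from(buf))
```

Because the positions of these words are fixed for all versions, a receiver can read them with the same code before it knows which version it is handling.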
An RDMA_MSG procedure conveys the Transport stream and the Payload stream via an RDMA Send operation. The Transport stream contains the four fixed fields followed by the Read and Write lists and the Reply chunk, though any or all three MAY be marked as not present. The Payload stream then follows, beginning with its XID field. If a Read or Write chunk list is present, a portion of the Payload stream has been reduced and is conveyed via separate operations.
RDMA_MSG过程通过RDMA发送操作传送传输流和有效负载流。传输流包含四个固定字段,后跟读列表、写列表和回复块,但其中任何一个或全部三个都可能被标记为不存在。随后是有效负载流,从其XID字段开始。如果存在读或写块列表,则有效负载流的一部分已被缩减,并通过单独的操作传送。
An RDMA_NOMSG procedure conveys the Transport stream via an RDMA Send operation. The Transport stream contains the four fixed fields followed by the Read and Write chunk lists and the Reply chunk. Though any of these MAY be marked as not present, one MUST be present and MUST hold the Payload stream for this RPC-over-RDMA message. If a Read or Write chunk list is present, a portion of the Payload stream has been excised and is conveyed via separate operations.
RDMA_NOMSG过程通过RDMA发送操作传送传输流。传输流包含四个固定字段,后跟读写块列表和应答块。尽管这些消息中的任何一个都可能被标记为不存在,但其中一个必须存在,并且必须持有此RPCoverRDMA消息的有效负载流。如果存在读或写区块列表,则有效负载流的一部分已被切除,并通过单独的操作传送。
An RDMA_ERROR procedure conveys the Transport stream via an RDMA Send operation. The Transport stream contains the four fixed fields followed by formatted error information. No Payload stream is conveyed in this type of RPC-over-RDMA message.
RDMA_ERROR过程通过RDMA发送操作传送传输流。传输流包含四个固定字段,后跟格式化的错误信息。在这种类型的RPC over RDMA消息中不传输有效负载流。
A Requester MUST NOT send an RPC-over-RDMA header with the RDMA_ERROR procedure. A Responder MUST silently discard RDMA_ERROR procedures.
请求者不得发送带有RDMA_ERROR过程的RPC over RDMA头。响应者必须以静默方式丢弃RDMA_ERROR过程。
The Transport stream and Payload stream can be constructed in separate buffers. However, the total length of the gathered buffers cannot exceed the inline threshold.
传输流和有效负载流可以在单独的缓冲器中构造。但是,收集的缓冲区的总长度不能超过内联阈值。
The chunk lists in an RPC-over-RDMA version 1 header are three XDR optional-data fields that follow the fixed header fields in RDMA_MSG and RDMA_NOMSG procedures. Read Section 4.19 of [RFC4506] carefully to understand how optional-data fields work. Examples of XDR-encoded chunk lists are provided in Section 4.7 as an aid to understanding.
RPC over RDMA version 1标头中的区块列表是三个XDR可选数据字段,它们位于RDMA_MSG和RDMA_NOMSG过程中的固定标头字段之后。仔细阅读[RFC4506]第4.19节,了解可选数据字段的工作原理。第4.7节提供了XDR编码块列表的示例,以帮助理解。
Often, an RPC-over-RDMA message has no associated chunks. In this case, the Read list, Write list, and Reply chunk are all marked "not present".
通常,RPCoverRDMA消息没有关联的块。在这种情况下,读列表、写列表和回复区块都标记为“不存在”。
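A minimal sketch of such a chunkless header, assuming an RDMA_MSG procedure (an RDMA_NOMSG procedure needs at least one chunk to carry its Payload stream) and a hypothetical helper name `encode_chunkless_header`. Each "not present" optional-data field is a single XDR zero word, so the Transport stream is exactly seven words long.

```python
import struct

RDMA_MSG = 0
XDR_FALSE = 0  # optional-data discriminant meaning "not present"


def encode_chunkless_header(xid: int, credit: int) -> bytes:
    """Transport stream of a version 1 RDMA_MSG with no chunks.

    The four fixed words are followed by the Read list, the Write list,
    and the Reply chunk, each encoded as one XDR FALSE word.
    """
    fixed = struct.pack(">IIII", xid, 1, credit, RDMA_MSG)
    chunk_lists = struct.pack(">III", XDR_FALSE, XDR_FALSE, XDR_FALSE)
    return fixed + chunk_lists  # the Payload stream would follow inline
```

The result is 28 bytes, which is the minimal header size used in the validity checks described in this section.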
Each RDMA_MSG or RDMA_NOMSG procedure has one "Read list". The Read list is a list of zero or more RDMA read segments, provided by the Requester, that are grouped by their Position fields into Read chunks. Each Read chunk advertises the location of argument data the Responder is to pull from the Requester. The Requester has reduced the data items in these chunks from the call's Payload stream.
每个RDMA_MSG或RDMA_NOMSG过程都有一个“读取列表”。读列表是由请求者提供的零个或多个RDMA读段的列表,这些读段按其位置字段分组为读块。每个读取块都会公布响应程序要从请求程序中提取的参数数据的位置。请求者已经从调用的有效负载流中减少了这些数据块中的数据项。
A Requester may transmit the Payload stream of an RPC Call message using a Position Zero Read chunk. If the RPC Call message has no argument data that is DDP-eligible and the Position Zero Read chunk is not being used, the Requester leaves the Read list empty.
请求者可以使用位置为零的读取块来传输RPC调用消息的有效负载流。如果RPC调用消息没有符合DDP条件的参数数据,并且未使用位置为零的读取块,则请求者会将读取列表保留为空。
Responders MUST leave the Read list empty in all replies.
响应者必须在所有回复中保留读取列表为空。
When reducing a DDP-eligible argument data item, a Requester records the XDR stream offset of that data item in the Read chunk's Position field. The Responder can then tell unambiguously where that chunk is to be reinserted into the received Payload stream to form a complete RPC Call message.
当减少DDP合格参数数据项时,请求者在读取区块的位置字段中记录该数据项的XDR流偏移量。然后,响应者可以明确地告诉将在何处将该块重新插入到接收的有效负载流中,以形成完整的RPC调用消息。
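The reduce-and-reinsert round trip can be sketched in Python under simplifying assumptions: a single reduced data item, chunk data carried in a Python object instead of being pulled with RDMA Read, and no attention to XDR length fields or padding, whose treatment is governed elsewhere in this document and by the ULB. `ReducedItem`, `reduce_item`, and `reinsert_item` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class ReducedItem(NamedTuple):
    position: int  # XDR stream offset recorded in the Read chunk's Position field
    data: bytes    # bytes the Responder would pull from the Requester


def reduce_item(payload: bytes, offset: int, length: int) -> Tuple[bytes, ReducedItem]:
    """Requester side: excise payload[offset:offset+length] and record its position."""
    if offset < 0 or length < 0 or offset + length > len(payload):
        raise ValueError("data item lies outside the Payload stream")
    reduced = payload[:offset] + payload[offset + length:]
    return reduced, ReducedItem(position=offset, data=payload[offset:offset + length])


def reinsert_item(reduced: bytes, item: ReducedItem) -> bytes:
    """Responder side: splice the chunk data back in at its recorded Position."""
    return reduced[:item.position] + item.data + reduced[item.position:]
```

Since the Position value is an offset in the XDR stream, the Responder needs no knowledge of the ULP's argument types to rebuild the complete RPC Call message.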
Each RDMA_MSG or RDMA_NOMSG procedure has one "Write list". The Write list is a list of zero or more Write chunks, provided by the Requester. Each Write chunk is an array of plain segments; thus, the Write list is a list of counted arrays.
每个RDMA_MSG或RDMA_NOMSG过程都有一个“写入列表”。写列表是由请求者提供的零个或多个写块的列表。每个写块是一个普通段数组;因此,写入列表是计数数组的列表。
If an RPC Reply message has no possible DDP-eligible result data items, the Requester leaves the Write list empty. When a Requester provides a Write list, the Responder MUST push data corresponding to DDP-eligible result data items to Requester memory referenced in the Write list. The Responder removes these data items from the reply's Payload stream.
如果RPC回复消息没有可能的DDP合格结果数据项,则请求者将写列表保留为空。当请求者提供写入列表时,响应者必须将与DDP合格结果数据项对应的数据推送到写入列表中引用的请求者内存中。应答器从应答的有效负载流中删除这些数据项。
A Requester constructs the Write list for an RPC transaction before the Responder has formulated its reply. When there is only one DDP-eligible result data item, the Requester inserts only a single Write chunk in the Write list. If the returned Write chunk is not an unused Write chunk, the Requester knows with certainty which result data item is contained in it.
请求者在响应者制定其回复之前为RPC事务构造写列表。当只有一个DDP合格结果数据项时,请求者只在写入列表中插入一个写入块。如果返回的写块不是未使用的写块,那么请求者肯定知道其中包含哪个结果数据项。
When a Requester has provided multiple Write chunks, the Responder fills in each Write chunk with one DDP-eligible result until there are either no more DDP-eligible results or no more Write chunks.
当请求者提供了多个写块时,响应者用一个符合DDP条件的结果填充每个写块,直到不再有符合DDP条件的结果或写入块为止。
The Requester might not be able to predict in advance which DDP-eligible data item goes in which chunk. Thus, the Requester is responsible for allocating and registering Write chunks large enough to accommodate the largest result data item that might be associated with each chunk in the Write list.
请求者可能无法提前预测哪个DDP合格数据项进入哪个区块。因此,请求者负责分配和注册足够大的写块,以容纳可能与写列表中的每个块关联的最大结果数据项。
As a Requester decodes a reply Payload stream, it is clear from the contents of the RPC Reply message which Write chunk contains which result data item.
当请求者解码应答有效负载流时,从RPC应答消息的内容可以清楚地看出,哪个写块包含哪个结果数据项。
There are occasions when a Requester provides a non-empty Write chunk but the Responder is not able to use it. For example, a ULP may define a union result where some arms of the union contain a DDP-eligible data item while other arms do not. The Responder is required to use Requester-provided Write chunks in this case, but if the Responder returns a result that uses an arm of the union that has no DDP-eligible data item, that Write chunk remains unconsumed.
有时,请求者提供非空写块,但响应者无法使用它。例如,ULP可以定义联合结果,其中联合的一些分支包含DDP合格数据项,而其他分支不包含DDP合格数据项。在这种情况下,响应程序需要使用请求程序提供的写块,但如果响应程序返回的结果使用的是没有DDP合格数据项的联合分支,则该写块将保持未使用状态。
If there is a subsequent DDP-eligible result data item in the RPC Reply message, it MUST be placed in that unconsumed Write chunk. Therefore, the Requester MUST provision each Write chunk so it can be filled with the largest DDP-eligible data item that can be placed in it.
如果RPC回复消息中有后续DDP合格结果数据项,则必须将其放置在未使用的写入区块中。因此,请求者必须对每个写块进行设置,以便可以将最大的符合DDP条件的数据项填入其中。
If this is the last or only Write chunk available and it remains unconsumed, the Responder MUST return this Write chunk as an unused Write chunk (see Section 3.4.6). The Responder sets the segment count to a value matching the Requester-provided Write chunk, but returns only empty segments in that Write chunk.
如果这是最后一个或唯一可用的写块,并且它仍然未使用,则响应者必须将此写块作为未使用的写块返回(请参阅第3.4.6节)。响应程序将段计数设置为与请求程序提供的写块匹配的值,但只返回该写块中的空段。
Unused Write chunks, or unused bytes in Write chunk segments, are returned to the RPC consumer as part of RPC completion. Even if a Responder indicates that a Write chunk is not consumed, the Responder may have written data into one or more segments before choosing not to return that data item. The Requester MUST NOT assume that the memory regions backing a Write chunk have not been modified.
未使用的写块或写块段中未使用的字节作为RPC完成的一部分返回给RPC使用者。即使响应程序指示未使用写入块,响应程序也可能在选择不返回该数据项之前已将数据写入一个或多个段。请求者不得假设支持写块的内存区域未被修改。
To force a Responder to return a DDP-eligible result inline, a Requester employs the following mechanism:
为了强制响应者内联返回DDP合格结果,请求者采用以下机制:
o When there is only one DDP-eligible result item in an RPC Reply message, the Requester provides an empty Write list.
o 当RPC回复消息中只有一个DDP合格结果项时,请求者提供一个空的写列表。
o When there are multiple DDP-eligible result data items and a Requester prefers that a data item is returned inline, the Requester provides an empty Write chunk for that item (see Section 3.4.6). The Responder MUST return the corresponding result data item inline and MUST return an empty Write chunk in that Write list position in the RPC Reply message.
o 当存在多个符合DDP条件的结果数据项且请求者希望以内联方式返回数据项时,请求者将为该项提供一个空的写块(参见第3.4.6节)。响应程序必须内联返回相应的结果数据项,并且必须在RPC应答消息的写入列表位置返回空的写入块。
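The Write chunk rules above (filling chunks in order, returning an unused Write chunk when no DDP-eligible result remains, and honoring an empty Write chunk as a request for inline return) can be sketched as follows. This is an illustration only: each Write chunk is modeled as a list of segment capacities in bytes, an empty Write chunk is modeled as a chunk with no segments, an unused Write chunk keeps its segment count with every Length set to zero (this sketch's reading of "empty segments"), and `fill_write_list` is a hypothetical name.

```python
from typing import List, Tuple

# A Write chunk modeled as its segment capacities in bytes; [] is an empty Write chunk.
WriteChunk = List[int]


def fill_write_list(chunks: List[WriteChunk],
                    result_sizes: List[int]) -> Tuple[List[WriteChunk], List[int]]:
    """Responder side: place DDP-eligible results into Requester-provided Write chunks.

    result_sizes lists the reply's DDP-eligible result data items in order.
    Returns the Write list to echo in the reply (same shape, each segment
    length set to the bytes written) and the indices of results that were
    not placed in any Write chunk.
    """
    returned: List[WriteChunk] = []
    pending = list(enumerate(result_sizes))
    not_placed: List[int] = []
    for chunk in chunks:
        if not chunk:
            # Empty Write chunk: return the corresponding result inline and
            # echo an empty Write chunk at this Write list position.
            if pending:
                not_placed.append(pending.pop(0)[0])
            returned.append([])
        elif not pending:
            # No result left for this chunk: return it as an unused Write
            # chunk with the same segment count and zero-length segments.
            returned.append([0] * len(chunk))
        else:
            index, size = pending.pop(0)
            if size > sum(chunk):
                # The Requester must provision each chunk for the largest item.
                raise ValueError(f"result {index} does not fit its Write chunk")
            lengths: List[int] = []
            remaining = size
            for capacity in chunk:
                used = min(capacity, remaining)
                lengths.append(used)
                remaining -= used
            returned.append(lengths)
    # Results left once the Write chunks run out are not placed in any chunk.
    not_placed.extend(index for index, _ in pending)
    return returned, not_placed
```

A union arm that carries no DDP-eligible item simply contributes nothing to `result_sizes`, so the next DDP-eligible result lands in the chunk that arm left unconsumed.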
As always, a Requester and Responder must prepare for a Long Reply to be used if the resulting RPC Reply might be too large to be conveyed in an RDMA Send.
与往常一样,如果生成的RPC应答可能太大而无法在RDMA发送中传输,则请求者和响应者必须准备使用长应答。
Each RDMA_MSG or RDMA_NOMSG procedure has one "Reply chunk" slot. A Requester MUST provide a Reply chunk whenever the maximum possible size of the RPC Reply message's Transport and Payload streams is larger than the inline threshold for messages from Responder to Requester. Otherwise, the Requester marks the Reply chunk as not present.
每个RDMA_MSG或RDMA_NOMSG过程都有一个“回复块”插槽。每当RPC回复消息的传输和有效负载流的最大可能大小大于从响应者到请求者的消息的内联阈值时,请求者必须提供回复区块。否则,请求者将应答块标记为不存在。
If the Transport stream and Payload stream together are smaller than the reply inline threshold, the Responder MAY return the RPC Reply message as a Short message rather than using the Requester-provided Reply chunk.
如果传输流和有效负载流一起小于应答内联阈值,则响应者可以将RPC应答消息作为短消息返回,而不是使用请求者提供的应答区块。
When a Requester provides a Reply chunk in an RPC Call message, the Responder MUST copy that chunk into the Transport header of the RPC Reply message. As with Write chunks, the Responder modifies the copied Reply chunk in the RPC Reply message to reflect the actual amount of data that is being returned in the Reply chunk.
当请求者在RPC调用消息中提供应答区块时,应答者必须将该区块复制到RPC应答消息的传输头中。与写块一样,响应程序修改RPC应答消息中复制的应答块,以反映应答块中返回的实际数据量。
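The Reply chunk decisions can be expressed as a few small functions. In this hypothetical sketch, `fits_inline` applies the rule that the gathered Transport and Payload lengths must not exceed the inline threshold, `requester_needs_reply_chunk` decides whether a Requester offers a Reply chunk, and `fill_reply_chunk` computes the per-segment lengths a Responder writes into the copied Reply chunk for a Long Reply. Raising `ValueError` stands in for a Responder detecting, before writing, that the Reply chunk is too small.

```python
from typing import List


def fits_inline(transport_len: int, payload_len: int, inline_threshold: int) -> bool:
    """True when the Transport and Payload streams together do not exceed the threshold."""
    return transport_len + payload_len <= inline_threshold


def requester_needs_reply_chunk(max_transport_len: int, max_payload_len: int,
                                reply_inline_threshold: int) -> bool:
    """Requester: offer a Reply chunk when the largest possible Reply might not fit inline."""
    return not fits_inline(max_transport_len, max_payload_len, reply_inline_threshold)


def fill_reply_chunk(segment_capacities: List[int], payload_len: int) -> List[int]:
    """Responder: segment lengths for the copied Reply chunk in a Long Reply."""
    if payload_len > sum(segment_capacities):
        raise ValueError("Reply chunk too small for this RPC Reply")
    lengths: List[int] = []
    remaining = payload_len
    for capacity in segment_capacities:
        used = min(capacity, remaining)
        lengths.append(used)
        remaining -= used
    return lengths
```

For example, with a 1024-byte reply inline threshold and a 28-byte Transport stream, at most 996 bytes of Payload stream fit in a Short reply.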
The cost of registering and invalidating memory can be a significant proportion of the cost of an RPC-over-RDMA transaction. Thus, an important implementation consideration is how to minimize registration activity without exposing system memory needlessly.
注册内存和使内存无效的成本可能是RPC over RDMA事务成本的很大一部分。因此,一个重要的实现考虑是如何在不不必要地暴露系统内存的情况下最小化注册活动。
Data transferred via RDMA Read and Write can reside in a memory allocation not in the control of the RPC-over-RDMA transport. These memory allocations can persist outside the bounds of an RPC transaction. They are registered and invalidated as needed, as part of each RPC transaction.
通过RDMA读写传输的数据可以驻留在内存分配中,而不受RPC over RDMA传输的控制。这些内存分配可以在RPC事务的边界之外保持。作为每个RPC事务的一部分,它们会根据需要进行注册和失效。
The Requester endpoint must ensure that memory regions associated with each RPC transaction are protected from Responder access before allowing upper-layer access to the data contained in them. Moreover, the Requester must not access these memory regions while the Responder has access to them.
请求者端点必须确保与每个RPC事务相关联的内存区域在允许上层访问其中包含的数据之前受到保护,不被响应者访问。此外,当响应者可以访问这些内存区域时,请求者不得访问这些内存区域。
This includes memory regions that are associated with canceled RPCs. A Responder cannot know that the Requester is no longer waiting for a reply, and it might proceed to read or even update memory that the Requester might have released for other use.
这包括与已取消的RPC关联的内存区域。响应者无法知道请求者不再等待响应,它可能会继续读取甚至更新请求者可能已释放用于其他用途的内存。
The interface by which a ULP implementation communicates the eligibility of a data item to its local RPC-over-RDMA endpoint is not described by this specification.
ULP实现将数据项的合格性本地传递给其本地RPC over RDMA端点的接口不在本规范中描述。
Depending on the implementation and the constraints imposed by ULBs, it is possible to implement reduction transparently to upper layers. Such implementations may lead to inefficiencies, either because they require the RPC layer to perform expensive registration and invalidation of memory "on the fly" or because they require using RDMA chunks in RPC Reply messages, along with the resulting additional handshaking with the RPC-over-RDMA peer.
根据ULB施加的实现和约束,可以对上层透明地实现缩减。这种实现可能会导致效率低下,因为它们需要RPC层“动态”执行昂贵的内存注册和失效,或者可能需要在RPC应答消息中使用RDMA块,以及由此产生的与RPC over RDMA对等方的额外握手。
However, these issues are internal and generally confined to the local interface between RPC and its upper layers, one in which implementations are free to innovate. The only requirement, beyond constraints imposed by the ULB, is that the resulting RPC-over-RDMA protocol sent to the peer be valid for the upper layer.
然而,这些问题是内部的,通常局限于RPC及其上层之间的本地接口,其中的实现可以自由创新。除了ULB施加的约束之外,唯一的要求是发送给对等方的RPC over RDMA协议对上层有效。
The choice of which memory registration strategies to employ is left to Requester and Responder implementers. To support the widest array of RDMA implementations, as well as the most general steering tag scheme, an Offset field is included in each RDMA segment.
使用哪种内存注册策略的选择留给请求者和响应者实现者。为了支持最广泛的RDMA实现,以及最通用的转向标记方案,每个RDMA段中都包含一个偏移字段。
While zero-based offset schemes are available in many RDMA implementations, their use by RPC requires individual registration of each memory region. For such implementations, this can be a significant overhead. By providing an offset in each chunk, many pre-registration or region-based registrations can be readily supported.
虽然在许多RDMA实现中可以使用基于零的偏移方案,但RPC使用这些方案需要对每个内存区域进行单独注册。对于这样的实现,这可能是一个巨大的开销。通过在每个块中提供偏移量,可以很容易地支持许多预注册或基于区域的注册。
A receiver performs basic validity checks on the RPC-over-RDMA header and chunk contents before it passes the RPC message to the RPC layer. If an incoming RPC-over-RDMA message is shorter than a minimal-size RPC-over-RDMA header (28 bytes), the receiver cannot trust the value of the XID field; therefore, it MUST silently discard the message before performing any parsing. If other errors are detected in the RPC-over-RDMA header of an RPC Call message, a Responder MUST send an RDMA_ERROR message back to the Requester. If errors are detected in the RPC-over-RDMA header of an RPC Reply message, a Requester MUST silently discard the message.
接收方在将RPC消息传递到RPC层之前,对RPC over RDMA头和块内容执行基本的有效性检查。如果传入的RPC over RDMA消息短于最小的RPC over RDMA头(28字节),则接收方不能信任XID字段的值;因此,它必须在执行任何解析之前以静默方式丢弃该消息。如果在RPC调用消息的RPC over RDMA头中检测到其他错误,则响应者必须向请求者发回RDMA_ERROR消息。如果在RPC回复消息的RPC over RDMA头中检测到错误,请求者必须以静默方式丢弃该消息。
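The receiver-side decision can be summarized in a short Python sketch. The 28-byte minimum is the four fixed words plus the three one-word chunk-list fields of a header with no chunks. `receiver_action` and the `Action` values are hypothetical, and the `header_valid` flag stands in for whatever validation an implementation actually performs.

```python
from enum import Enum

# Four fixed words plus three one-word "not present" chunk-list fields.
MIN_HEADER_LEN = 7 * 4


class Action(Enum):
    PROCESS = "pass the message to the RPC layer"
    DISCARD = "silently discard"
    SEND_RDMA_ERROR = "reply with an RDMA_ERROR procedure"


def receiver_action(message_len: int, is_call: bool, header_valid: bool) -> Action:
    """Disposition of an incoming message after basic header validation."""
    if message_len < MIN_HEADER_LEN:
        # Too short to trust rdma_xid: drop before parsing anything.
        return Action.DISCARD
    if header_valid:
        return Action.PROCESS
    # A Responder reports a bad Call header; a Requester drops a bad Reply.
    return Action.SEND_RDMA_ERROR if is_call else Action.DISCARD
```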
To form an RDMA_ERROR procedure:
要构造RDMA_ERROR过程:
o The rdma_xid field MUST contain the same XID that was in the rdma_xid field in the failing request;
o rdma_xid字段必须包含与失败请求中rdma_xid字段相同的XID;
o The rdma_vers field MUST contain the same version that was in the rdma_vers field in the failing request;
o rdma_vers字段必须包含与失败请求中rdma_vers字段相同的版本;
o The rdma_proc field MUST contain the value RDMA_ERROR; and
o rdma_proc字段必须包含值RDMA_ERROR;以及
o The rdma_err field contains a value that reflects the type of error that occurred, as described below.
o rdma_err字段包含一个反映所发生错误类型的值,如下所述。
An RDMA_ERROR procedure indicates a permanent error. Receipt of this procedure completes the RPC transaction associated with the XID in the rdma_xid field. A receiver MUST silently discard an RDMA_ERROR procedure that it cannot decode.
RDMA_ERROR过程表示永久性错误。收到此过程即完成与rdma_xid字段中的XID相关联的RPC事务。接收方必须以静默方式丢弃其无法解码的RDMA_ERROR过程。
When a Responder detects an RPC-over-RDMA header version that it does not support (currently this document defines only version 1), it MUST reply with an RDMA_ERROR procedure, set the rdma_err value to ERR_VERS, and provide the lowest and highest version numbers (inclusive) that it does support.
当响应者检测到它不支持的RPC over RDMA头版本时(本文档目前仅定义版本1),它必须使用RDMA_ERROR过程进行回复,将rdma_err值设置为ERR_VERS,并提供它实际支持的最低和最高版本号(含)。
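A sketch of how a Responder might encode an RDMA_ERROR Transport stream, following the field rules above and the rpc_rdma_error union in the XDR description. The rules do not say what rdma_credit should carry, so this hypothetical `build_rdma_error` helper leaves that value to the caller.

```python
import struct
from typing import Optional, Tuple

RDMA_ERROR = 4
ERR_VERS = 1
ERR_CHUNK = 2


def build_rdma_error(failed_xid: int, failed_vers: int, credit: int, err: int,
                     supported: Optional[Tuple[int, int]] = None) -> bytes:
    """Encode an RDMA_ERROR Transport stream answering a failed Call.

    rdma_xid and rdma_vers are copied from the failing request. For ERR_VERS,
    the inclusive (low, high) range of supported versions follows rdma_err.
    """
    msg = struct.pack(">IIIII", failed_xid, failed_vers, credit, RDMA_ERROR, err)
    if err == ERR_VERS:
        if supported is None:
            raise ValueError("ERR_VERS requires the supported version range")
        low, high = supported
        msg += struct.pack(">II", low, high)
    elif err != ERR_CHUNK:
        raise ValueError(f"unknown rdma_err value {err}")
    return msg
```

An ERR_VERS reply is therefore seven words long, while an ERR_CHUNK reply ends right after rdma_err.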
A receiver might encounter an XDR parsing error that prevents it from processing the incoming Transport stream. Examples of such errors include an invalid value in the rdma_proc field; an RDMA_NOMSG message in which the Read list, Write list, and Reply chunk are all marked not present; and an rdma_xid field whose value does not match the value of the XID field in the accompanying RPC message. If the rdma_vers field contains a recognized value but an XDR parsing error occurs, the Responder MUST reply with an RDMA_ERROR procedure and set the rdma_err value to ERR_CHUNK.
接收方可能会遇到XDR解析错误,从而无法处理传入的传输流。此类错误的示例包括:rdma_proc字段中的无效值;读列表、写列表和回复块均被标记为不存在的RDMA_NOMSG消息;或者rdma_xid字段的值与随附RPC消息中XID字段的值不匹配。如果rdma_vers字段包含可识别的值,但发生XDR解析错误,则响应者必须使用RDMA_ERROR过程进行回复,并将rdma_err值设置为ERR_CHUNK。
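The checks named above can be sketched as a function over an already-decoded Call header. This is a hypothetical helper, not a complete decoder: it assumes the chunk lists were parsed elsewhere and reports only the example errors given in the text, checking the version first because ERR_CHUNK applies only when rdma_vers holds a recognized value. RDMA_MSGP, RDMA_DONE, and RDMA_ERROR have their own handling rules and are assumed to be dispatched before this check.

```python
from typing import Optional

SUPPORTED_LOW, SUPPORTED_HIGH = 1, 1  # this document defines only version 1
RDMA_MSG, RDMA_NOMSG, RDMA_MSGP, RDMA_DONE, RDMA_ERROR = range(5)
ERR_VERS, ERR_CHUNK = 1, 2


def check_call_header(rdma_xid: int, rdma_vers: int, rdma_proc: int,
                      has_read_list: bool, has_write_list: bool,
                      has_reply_chunk: bool, rpc_xid: int) -> Optional[int]:
    """Return None if acceptable, else the rdma_err value for an RDMA_ERROR reply."""
    if not SUPPORTED_LOW <= rdma_vers <= SUPPORTED_HIGH:
        return ERR_VERS
    if rdma_proc > RDMA_ERROR:
        return ERR_CHUNK  # undefined rdma_proc value
    if rdma_proc == RDMA_NOMSG and not (has_read_list or has_write_list or has_reply_chunk):
        return ERR_CHUNK  # RDMA_NOMSG with nowhere to carry the Payload stream
    if rdma_xid != rpc_xid:
        return ERR_CHUNK  # header XID disagrees with the RPC message
    return None
```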
When a Responder receives a valid RPC-over-RDMA header but the Responder's ULP implementation cannot parse the RPC arguments in the RPC Call message, the Responder SHOULD return an RPC Reply message with status GARBAGE_ARGS, using an RDMA_MSG procedure. This type of parsing failure might be due to mismatches between chunk sizes or offsets and the contents of the Payload stream, for example.
当响应者接收到有效的RPC over RDMA头,但响应者的ULP实现无法解析RPC调用消息中的RPC参数时,响应者应使用RDMA_MSG过程返回状态为GARBAGE_ARGS的RPC回复消息。例如,这种类型的解析失败可能是由于块大小或偏移量与有效负载流的内容不匹配造成的。
In RPC-over-RDMA version 1, the Responder initiates RDMA Read and Write operations that target the Requester's memory. Problems might arise as the Responder attempts to use Requester-provided resources for RDMA operations. For example:
在RPC over RDMA版本1中,响应程序启动以请求程序内存为目标的RDMA读写操作。当响应者试图使用请求者提供的资源进行RDMA操作时,可能会出现问题。例如:
o Usually, chunks can be validated only by using their contents to perform data transfers. If chunk contents are invalid (e.g., a memory region is no longer registered or a chunk length exceeds the end of the registered memory region), a Remote Access Error occurs.
o 通常,块只能通过使用其内容执行数据传输来验证。如果区块内容无效(例如,内存区域不再注册或区块长度超过已注册内存区域的末尾),则会发生远程访问错误。
o If a Requester's Receive buffer is too small, the Responder's Send operation completes with a Local Length Error.
o 如果请求者的接收缓冲区太小,则响应者的发送操作将以本地长度错误完成。
o If the Requester-provided Reply chunk is too small to accommodate a large RPC Reply message, a Remote Access Error occurs. A Responder might detect this problem before attempting to write past the end of the Reply chunk.
o 如果请求者提供的回复区块太小,无法容纳大型RPC回复消息,则会发生远程访问错误。响应程序可能会在尝试写入超过应答块末尾之前检测到此问题。
RDMA operational errors are typically fatal to the connection. To avoid a retransmission loop and repeated connection loss that deadlocks the connection, once the Requester has re-established a connection, the Responder should send an RDMA_ERROR reply with an rdma_err value of ERR_CHUNK to indicate that no RPC-level reply is possible for that XID.
RDMA操作错误通常对连接是致命的。为了避免重传循环和重复的连接丢失导致连接死锁,一旦请求者重新建立连接,响应者应发送rdma_err值为ERR_CHUNK的RDMA_ERROR回复,以指示该XID不可能有RPC级别的回复。
While a Requester is constructing an RPC Call message, an unrecoverable problem might occur that prevents the Requester from posting further RDMA Work Requests on behalf of that message. As with other transports, if a Requester is unable to construct and transmit an RPC Call message, the associated RPC transaction fails immediately.
当请求者构造RPC调用消息时,可能会出现不可恢复的问题,阻止请求者代表该消息发布进一步的RDMA工作请求。与其他传输一样,如果请求者无法构造和传输RPC调用消息,则关联的RPC事务将立即失败。
After a Requester has received a reply, if it is unable to invalidate a memory region due to an unrecoverable problem, the Requester MUST close the connection to protect that memory from Responder access before the associated RPC transaction is complete.
请求者收到回复后,如果由于不可恢复的问题而无法使内存区域无效,则在相关RPC事务完成之前,请求者必须关闭连接以保护该内存不被响应者访问。
While a Responder is constructing an RPC Reply message or error message, an unrecoverable problem might occur that prevents the Responder from posting further RDMA Work Requests on behalf of that message. If a Responder is unable to construct and transmit an RPC Reply or RPC-over-RDMA error message, the Responder MUST close the connection to signal to the Requester that a reply was lost.
当响应程序正在构造RPC回复消息或错误消息时,可能会出现不可恢复的问题,阻止响应程序代表该消息发布更多RDMA工作请求。如果响应程序无法构造和传输RPC Reply或RPC over RDMA错误消息,则响应程序必须关闭连接以向请求程序发出响应丢失的信号。
The RDMA connection and physical link provide some degree of error detection and retransmission. iWARP's Marker PDU Aligned (MPA) layer (when used over TCP), the Stream Control Transmission Protocol (SCTP), as well as the InfiniBand [IBARCH] link layer all provide Cyclic Redundancy Check (CRC) protection of the RDMA payload, and CRC-class protection is a general attribute of such transports.
RDMA连接和物理链路提供一定程度的错误检测和重传。iWARP的标记PDU对齐(MPA)层(通过TCP使用时)、流控制传输协议(SCTP)以及InfiniBand[IBARCH]链路层都提供RDMA有效负载的循环冗余校验(CRC)保护,CRC类保护是此类传输的一般属性。
Additionally, the RPC layer itself can accept errors from the transport and recover via retransmission. RPC recovery can handle complete loss and re-establishment of a transport connection.
此外,RPC层本身可以接受来自传输的错误,并通过重传进行恢复。RPC恢复可以处理传输连接的完全丢失和重新建立。
The details of reporting and recovery from RDMA link-layer errors are described in specific link-layer APIs and operational specifications and are outside the scope of this protocol specification. See Section 8 for further discussion of the use of RPC-level integrity schemes to detect errors.
RDMA链路层错误报告和恢复的详细信息在特定链路层API和操作规范中描述,不在本协议规范的范围内。有关使用RPC级别完整性方案检测错误的进一步讨论,请参见第8节。
The following protocol elements are no longer supported in RPC-over-RDMA version 1. Related enum values and structure definitions remain in the RPC-over-RDMA version 1 protocol for backwards compatibility.
RPC over RDMA版本1中不再支持以下协议元素。相关的枚举值和结构定义保留在RPC over RDMA版本1协议中,以实现向后兼容性。
The specification of RDMA_MSGP in Section 3.9 of [RFC5666] is incomplete. To fully specify RDMA_MSGP would require:
[RFC5666]第3.9节中的RDMA_MSGP规范不完整。要完全指定RDMA_MSGP,需要:
o Updating the definition of DDP-eligibility to include data items that may be transferred, with padding, via RDMA_MSGP procedures
o 更新DDP资格的定义,以包括可通过RDMA_MSGP程序通过填充传输的数据项
o Adding full operational descriptions of the alignment and threshold fields
o 添加对齐和阈值字段的完整操作说明
o Discussing how alignment preferences are communicated between two peers without using CCP
o 讨论如何在不使用CCP的情况下在两个对等方之间沟通对齐偏好
o Describing the treatment of RDMA_MSGP procedures that convey Read or Write chunks
o 描述RDMA_MSGP过程的处理,该过程传递读或写块
The RDMA_MSGP message type is beneficial only when the padded data payload is at the end of an RPC message's argument or result list. This is not typical for NFSv4 COMPOUND RPCs, which often include a GETATTR operation as the final element of the compound operation array.
RDMA_MSGP消息类型只有在填充数据有效负载位于RPC消息的参数或结果列表的末尾时才有用。这对于NFSv4复合RPC并不典型,它通常包括一个GETATTR操作作为复合操作数组的最终元素。
Without a full specification of RDMA_MSGP, there has been no fully implemented prototype of it. Without a complete prototype of RDMA_MSGP support, it is difficult to assess whether this protocol element has benefit or can even be made to work interoperably.
没有RDMA_MSGP的完整规范,就没有完全实现的原型。如果没有RDMA_MSGP支持的完整原型,就很难评估此协议元素是否有好处,或者甚至可以使其协同工作。
Therefore, senders MUST NOT send RDMA_MSGP procedures. When receiving an RDMA_MSGP procedure, Responders SHOULD reply with an RDMA_ERROR procedure, setting the rdma_err field to ERR_CHUNK; Requesters MUST silently discard the message.
因此,发送方不得发送RDMA_MSGP过程。收到RDMA_MSGP过程时,响应者应使用RDMA_ERROR过程进行回复,将rdma_err字段设置为ERR_CHUNK;请求者必须以静默方式丢弃该消息。
Because no implementation of RPC-over-RDMA version 1 uses the Read-Read transfer model, there is never a need to send an RDMA_DONE procedure.
由于RPC over RDMA版本1的任何实现都不使用读-读传输模型,因此永远不需要发送RDMA_DONE过程。
Therefore, senders MUST NOT send RDMA_DONE messages. Receivers MUST silently discard RDMA_DONE messages.
因此,发件人不得发送RDMA_DONE消息。接收者必须悄悄地丢弃RDMA_DONE消息。
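Taken together with the rules for RDMA_ERROR earlier in this section, these requirements give each receiver a fixed disposition per procedure. The Python sketch below encodes that table; `Proc`, `Disposition`, and `receiver_action` are hypothetical names, and the RDMA_MSGP reply follows the SHOULD above rather than a MUST.

```python
from enum import Enum, IntEnum


class Proc(IntEnum):
    RDMA_MSG = 0
    RDMA_NOMSG = 1
    RDMA_MSGP = 2   # not to be used
    RDMA_DONE = 3   # not to be used
    RDMA_ERROR = 4


class Disposition(Enum):
    PROCESS = "process normally"
    DISCARD = "silently discard"
    REPLY_ERR_CHUNK = "reply with RDMA_ERROR carrying ERR_CHUNK"


def receiver_action(is_responder: bool, proc: Proc) -> Disposition:
    """How a Requester or Responder treats an incoming procedure."""
    if proc == Proc.RDMA_DONE:
        return Disposition.DISCARD  # every receiver drops RDMA_DONE
    if proc == Proc.RDMA_MSGP:
        return Disposition.REPLY_ERR_CHUNK if is_responder else Disposition.DISCARD
    if proc == Proc.RDMA_ERROR:
        # Requesters never send RDMA_ERROR, so a Responder drops one it receives.
        return Disposition.DISCARD if is_responder else Disposition.PROCESS
    return Disposition.PROCESS  # RDMA_MSG and RDMA_NOMSG, still subject to validation
```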
RPC-over-RDMA chunk lists are complex data types. In this section, illustrations are provided to help readers grasp how chunk lists are represented inside an RPC-over-RDMA header.
A plain segment is the simplest component, being made up of a 32-bit handle (H), a 32-bit length (L), and 64 bits of offset (OO). Once flattened into an XDR stream, plain segments appear as
HLOO
An RDMA read segment has an additional 32-bit position field (P). RDMA read segments appear as
PHLOO
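XDR encodes integers in big-endian order without alignment gaps, so the HLOO and PHLOO layouts above map directly onto Python's struct module. The following sketch is illustrative only, and the function names are hypothetical.

```python
import struct


def encode_plain_segment(handle: int, length: int, offset: int) -> bytes:
    """HLOO: 32-bit handle, 32-bit length, 64-bit offset (big-endian)."""
    return struct.pack(">IIQ", handle, length, offset)


def encode_read_segment(position: int, handle: int, length: int,
                        offset: int) -> bytes:
    """PHLOO: a 32-bit position followed by a plain segment."""
    return struct.pack(">I", position) + encode_plain_segment(handle, length, offset)
```

A plain segment therefore occupies 16 octets on the wire, and an RDMA read segment occupies 20.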
A Read chunk is a list of RDMA read segments. Each RDMA read segment is preceded by a 32-bit word containing a one if a segment follows or a zero if there are no more segments in the list. In XDR form, this would look like
1 PHLOO 1 PHLOO 1 PHLOO 0
where P would hold the same value for each RDMA read segment belonging to the same Read chunk.
The Read list is also a list of RDMA read segments. In XDR form, this would look like a Read chunk, except that the P values could vary across the list. An empty Read list is encoded as a single 32-bit zero.
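The "1 PHLOO ... 0" pattern above is XDR optional-data used as a linked list. The sketch below shows one way to encode it. It is hypothetical code, not taken from any implementation. Segments are given as (position, handle, length, offset) tuples. A Read chunk is the special case in which every tuple carries the same position.

```python
import struct


def encode_read_list(segments) -> bytes:
    """Encode a Read list from (position, handle, length, offset) tuples.

    Each segment is preceded by a 32-bit 1 ("another segment follows"),
    and the list is terminated by a 32-bit 0.  An empty list is therefore
    a single 32-bit zero.
    """
    out = bytearray()
    for position, handle, length, offset in segments:
        out += struct.pack(">I", 1)
        out += struct.pack(">IIIQ", position, handle, length, offset)  # PHLOO
    out += struct.pack(">I", 0)
    return bytes(out)
```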
A Write chunk is a counted array of plain segments. In XDR form, the count would appear as the first 32-bit word, followed by an HLOO for each element of the array. For instance, a Write chunk with three elements would look like
3 HLOO HLOO HLOO
The Write list is a list of counted arrays. In XDR form, this is a combination of optional-data and counted arrays. To represent a Write list containing a Write chunk with three segments and a Write chunk with two segments, XDR would encode
1 3 HLOO HLOO HLOO 1 2 HLOO HLOO 0
An empty Write list is encoded as a single 32-bit zero.
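The Write chunk and Write list encodings above differ only in how their elements are framed. A Write chunk is a counted array. The Write list wraps each chunk in the same 1/0 optional-data framing that the Read list uses. The following is a minimal sketch with hypothetical names. Segments are (handle, length, offset) tuples.

```python
import struct


def encode_write_chunk(segments) -> bytes:
    """One Write chunk: a 32-bit count followed by HLOO for each segment."""
    out = bytearray(struct.pack(">I", len(segments)))
    for handle, length, offset in segments:
        out += struct.pack(">IIQ", handle, length, offset)
    return bytes(out)


def encode_write_list(chunks) -> bytes:
    """Write list: each chunk is preceded by a 32-bit 1; a 32-bit 0 ends it."""
    out = bytearray()
    for chunk in chunks:
        out += struct.pack(">I", 1)
        out += encode_write_chunk(chunk)
    out += struct.pack(">I", 0)
    return bytes(out)
```

Encoding a three-segment chunk followed by a two-segment chunk reproduces the "1 3 HLOO HLOO HLOO 1 2 HLOO HLOO 0" layout, which is 100 octets in total.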
The Reply chunk is a Write chunk. However, since it is an optional-data field, there is a 32-bit field in front of it that contains a one if the Reply chunk is present or a zero if it is not. After encoding, a Reply chunk with two segments would look like
1 2 HLOO HLOO
Frequently, a Requester does not provide any chunks. In that case, after the four fixed fields in the RPC-over-RDMA header, there are simply three 32-bit fields that contain zero.
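The Reply chunk and the common "no chunks" case can be sketched the same way. This is a hypothetical helper, not taken from any implementation. When a Requester provides no chunks, the three chunk-list fields that follow the four fixed header fields are an empty Read list, an empty Write list, and an absent Reply chunk. Each of these is a single 32-bit zero.

```python
import struct


def encode_reply_chunk(segments=None) -> bytes:
    """Optional Reply chunk: 0 if absent, else 1 + count + HLOO per segment."""
    if segments is None:
        return struct.pack(">I", 0)
    out = bytearray(struct.pack(">II", 1, len(segments)))
    for handle, length, offset in segments:
        out += struct.pack(">IIQ", handle, length, offset)
    return bytes(out)


def encode_no_chunks() -> bytes:
    """Chunk-list fields when no chunks are provided: three 32-bit zeros."""
    empty_read_list = struct.pack(">I", 0)
    empty_write_list = struct.pack(">I", 0)
    return empty_read_list + empty_write_list + encode_reply_chunk(None)
```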
In setting up a new RDMA connection, the first action by a Requester is to obtain a transport address for the Responder. The means used to obtain this address, and to open an RDMA connection, is dependent on the type of RDMA transport and is the responsibility of each RPC protocol binding and its local implementation.
RPC services normally register with a portmap or rpcbind service [RFC1833], which associates an RPC Program number with a service address. This policy is no different with RDMA transports. However, a different and distinct service address (port number) might sometimes be required for ULP operation with RPC-over-RDMA.
When mapped atop the iWARP transport [RFC5040] [RFC5041], which uses IP port addressing due to its layering on TCP and/or SCTP, port mapping is trivial and consists merely of issuing the port in the connection process. The NFS/RDMA protocol service address has been assigned port 20049 by IANA, for both iWARP/TCP and iWARP/SCTP [RFC5667].
When mapped atop InfiniBand [IBARCH], which uses a service endpoint naming scheme based on a Group Identifier (GID), a translation MUST be employed. One such translation is described in Annexes A3 (Application Specific Identifiers), A4 (Sockets Direct Protocol (SDP)), and A11 (RDMA IP CM Service) of [IBARCH], which is appropriate for translating IP port addressing to the InfiniBand network. Therefore, in this case, IP port addressing may be readily employed by the upper layer.
When a mapping standard or convention exists for IP ports on an RDMA interconnect, there are several possibilities for each upper layer to consider:
o One possibility is to have the Responder register its mapped IP port with the rpcbind service under the netid (or netids) defined here. An RPC-over-RDMA-aware Requester can then resolve its desired service to a mappable port and proceed to connect. This is the most flexible and compatible approach, for those upper layers that are defined to use the rpcbind service.
o A second possibility is to have the Responder's portmapper register itself on the RDMA interconnect at a "well-known" service address (on UDP or TCP, this corresponds to port 111). A Requester could connect to this service address and use the portmap protocol to obtain a service address in response to a program number, e.g., an iWARP port number or an InfiniBand GID.
o Alternately, the Requester could simply connect to the mapped well-known port for the service itself, if it is appropriately defined. By convention, the NFS/RDMA service, when operating atop such an InfiniBand fabric, uses the same 20049 assignment as for iWARP.
Historically, different RPC protocols have taken different approaches to their port assignment. Therefore, the specific method is left to each RPC-over-RDMA-enabled ULB and is not addressed in this document.
In Section 9, this specification defines two new netid values, to be used for registration of upper layers atop iWARP [RFC5040] [RFC5041] and (when a suitable port translation service is available) InfiniBand [IBARCH]. Additional RDMA-capable networks MAY define their own netids, or if they provide a port translation, they MAY share the one defined in this document.
An Upper-Layer Protocol (ULP) is typically defined independently of any particular RPC transport. An Upper-Layer Binding (ULB) specification provides guidance that helps the ULP interoperate correctly and efficiently over a particular transport. For RPC-over-RDMA version 1, a ULB may provide:
o A taxonomy of XDR data items that are eligible for DDP
o Constraints on which upper-layer procedures may be reduced and on how many chunks may appear in a single RPC request
o A method for determining the maximum size of the reply Payload stream for all procedures in the ULP
o An rpcbind port assignment for operation of the RPC Program and Version on an RPC-over-RDMA transport
Each RPC Program and Version tuple that utilizes RPC-over-RDMA version 1 needs to have a ULB specification.
A ULB designates some XDR data items as eligible for DDP. As an RPC-over-RDMA message is formed, DDP-eligible data items can be removed from the Payload stream and placed directly in the receiver's memory.
An XDR data item should be considered for DDP-eligibility if there is a clear benefit to moving the contents of the item directly from the sender's memory to the receiver's memory. Criteria for DDP-eligibility include:
o The XDR data item is frequently sent or received, and its size is often much larger than typical inline thresholds.
o If the XDR data item is a result, its maximum size must be predictable in advance by the Requester.
o Transport-level processing of the XDR data item is not needed. For example, the data item is an opaque byte array, which requires no XDR encoding and decoding of its content.
o The content of the XDR data item is sensitive to address alignment. For example, a data copy operation would be required on the receiver to enable the message to be parsed correctly, or to enable the data item to be accessed.
o The XDR data item does not contain DDP-eligible data items.
In addition to defining the set of data items that are DDP-eligible, a ULB may also limit the use of chunks to particular upper-layer procedures. If more than one data item in a procedure is DDP-eligible, the ULB may also limit the number of chunks that a Requester can provide for a particular upper-layer procedure.
Senders MUST NOT reduce data items that are not DDP-eligible. Such data items MAY, however, be moved as part of a Position Zero Read chunk or a Reply chunk.
The programming interface by which an upper-layer implementation indicates the DDP-eligibility of a data item to the RPC transport is not described by this specification. The only requirements are that the receiver can re-assemble the transmitted RPC-over-RDMA message into a valid XDR stream, and that DDP-eligibility rules specified by the ULB are respected.
There is no provision to express DDP-eligibility within the XDR language. The only definitive specification of DDP-eligibility is a ULB.
In general, a DDP-eligibility violation occurs when:
o A Requester reduces a non-DDP-eligible argument data item. The Responder MUST NOT process this RPC Call message and MUST report the violation as described in Section 4.5.2.
o A Responder reduces a non-DDP-eligible result data item. The Requester MUST terminate the pending RPC transaction and report an appropriate permanent error to the RPC consumer.
o A Responder does not reduce a DDP-eligible result data item into an available Write chunk. The Requester MUST terminate the pending RPC transaction and report an appropriate permanent error to the RPC consumer.
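The three violation cases above can be expressed as simple checks against the set of data items that a ULB declares DDP-eligible. This is a hypothetical model for illustration. DataItem, the item labels, and the function names do not come from any ULB or implementation. Moving a non-eligible item inside a Position Zero Read chunk or a Reply chunk does not count as reduction, so such items are not marked as reduced here.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    name: str       # hypothetical label for an XDR data item, e.g. "READ.data"
    reduced: bool   # True if the sender removed it from the Payload stream into a chunk


def call_violates_ddp(args, eligible) -> bool:
    """Responder check: a Requester reduced a non-DDP-eligible argument."""
    return any(item.reduced and item.name not in eligible for item in args)


def reply_violates_ddp(results, eligible, write_chunks_for) -> bool:
    """Requester check on a received reply.

    write_chunks_for: names of result items for which a Write chunk was provided.
    """
    for item in results:
        if item.reduced and item.name not in eligible:
            return True  # a non-DDP-eligible result was reduced
        if (item.name in eligible and item.name in write_chunks_for
                and not item.reduced):
            return True  # an eligible result ignored an available Write chunk
    return False
```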
A Requester provides resources for both an RPC Call message and its matching RPC Reply message. A Requester forms the RPC Call message itself; thus, the Requester can compute the exact resources needed.
A Requester must allocate resources for the RPC Reply message (an RPC-over-RDMA credit, a Receive buffer, and possibly a Write list and Reply chunk) before the Responder has formed the actual reply. To accommodate all possible replies for the procedure in the RPC Call message, a Requester must allocate reply resources based on the maximum possible size of the expected RPC Reply message.
If there are procedures in the ULP for which there is no clear reply size maximum, the ULB needs to specify a dependable means for determining the maximum.
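The sizing decision above can be sketched as a small planning function. This is a simplified, hypothetical model. The caller supplies the ULB-derived maximum reply size, the receive inline threshold, and the expected size of the reply's RPC-over-RDMA header. Any part of the reply that is not placed through a Write chunk must either fit inline or be directed into a Reply chunk.

```python
def plan_reply_resources(max_reply: int, inline_threshold: int,
                         header_size: int, ddp_result_max: int = 0) -> dict:
    """Size the reply resources a Requester sets aside before sending a Call.

    max_reply: ULB-derived upper bound on the reply Payload stream,
        including any DDP-eligible result.
    ddp_result_max: upper bound on a DDP-eligible result for which the
        Requester provides a Write chunk (0 if none).
    header_size: expected RPC-over-RDMA header size on the reply.
    """
    remaining = max_reply - ddp_result_max
    needs_reply_chunk = header_size + remaining > inline_threshold
    return {
        "credits": 1,                        # one credit per outstanding RPC
        "receive_buffer": inline_threshold,  # Receive buffer sized to the inline threshold
        "write_chunk_bytes": ddp_result_max,
        "reply_chunk_bytes": remaining if needs_reply_chunk else 0,
    }
```

For example, with a 1024-octet threshold and a 28-octet header (four fixed fields plus three empty chunk lists), a reply whose maximum is 800 octets fits inline. A reply that may reach 4000 octets needs a Reply chunk.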
There may be other details provided in a ULB.
o A ULB may recommend inline threshold values or other transport-related parameters for RPC-over-RDMA version 1 connections bearing that ULP.
o A ULP may provide a means to communicate these transport-related parameters between peers. Note that RPC-over-RDMA version 1 does not specify any mechanism for changing any transport-related parameter after a connection has been established.
o Multiple ULPs may share a single RPC-over-RDMA version 1 connection when their ULBs allow the use of RPC-over-RDMA version 1 and the rpcbind port assignments for the Protocols allow connection sharing. In this case, the same transport parameters (such as inline threshold) apply to all Protocols using that connection.
Each ULB needs to be designed to allow correct interoperation without regard to the transport parameters actually in use. Furthermore, implementations of ULPs must be designed to interoperate correctly regardless of the connection parameters in effect on a connection.
An RPC Program and Version tuple may be extensible. For instance, there may be a minor versioning scheme that is not reflected in the RPC version number, or the ULP may allow additional features to be specified after the original RPC Program specification was ratified.
ULBs are provided for interoperable RPC Programs and Versions by extending existing ULBs to reflect the changes made necessary by each addition to the existing XDR.
The RPC-over-RDMA header format is specified using XDR, unlike the message header used with RPC-over-TCP. To maintain a high degree of interoperability among implementations of RPC-over-RDMA, any change to this XDR requires a protocol version number change. New versions of RPC-over-RDMA may be published as separate protocol specifications without updating this document.
The first four fields in every RPC-over-RDMA header must remain aligned at the same fixed offsets for all versions of the RPC-over-RDMA protocol. The version number must be in a fixed place to enable implementations to detect protocol version mismatches.
For version mismatches to be reported in a fashion that all future version implementations can reliably decode, the rdma_proc field must remain in a fixed place, the value of ERR_VERS must always remain the same, and the field placement in struct rpc_rdma_errvers must always remain the same.
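Because the first four header fields (rdma_xid, rdma_vers, rdma_credit, rdma_proc) never move, a receiver can check the version before it parses anything else. The sketch below parses those fields and builds an RDMA_ERROR / ERR_VERS reply. That reply carries the low and high supported versions after the error code. The helper names are hypothetical. The choices of rdma_vers = 1 in the error reply and of echoing the incoming credit value are assumptions made for illustration.

```python
import struct

RDMA_ERROR = 4
ERR_VERS = 1
SUPPORTED_VERSION = 1


def parse_fixed_fields(header: bytes):
    """Return (xid, vers, credit, proc) from the four fixed 32-bit words."""
    if len(header) < 16:
        raise ValueError("header shorter than the four fixed fields")
    return struct.unpack_from(">IIII", header, 0)


def version_error_reply(xid: int, credit: int) -> bytes:
    """RDMA_ERROR with ERR_VERS, then rdma_vers_low and rdma_vers_high.

    Assumption: the reply uses version 1 and echoes the credit value.
    """
    return struct.pack(">IIIIIII", xid, SUPPORTED_VERSION, credit,
                       RDMA_ERROR, ERR_VERS,
                       SUPPORTED_VERSION, SUPPORTED_VERSION)


def check_version(header: bytes):
    """Return an ERR_VERS reply for an unsupported version, else None."""
    xid, vers, credit, _proc = parse_fixed_fields(header)
    if vers != SUPPORTED_VERSION:
        return version_error_reply(xid, credit)
    return None
```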
Introducing new capabilities to RPC-over-RDMA version 1 is limited to the adoption of conventions that make use of existing XDR (defined in this document) and allowed abstract RDMA operations. Because no mechanism for detecting optional features exists in RPC-over-RDMA version 1, implementations must rely on ULPs to communicate the existence of such extensions.
Such extensions must be specified in a Standards Track RFC with appropriate review by the NFSv4 Working Group and the IESG. An example of a conventional extension to RPC-over-RDMA version 1 is the specification of backward direction message support to enable NFSv4.1 callback operations, described in [RFC8167].
A primary consideration is the protection of the integrity and confidentiality of local memory by an RPC-over-RDMA transport. The use of an RPC-over-RDMA transport protocol MUST NOT introduce vulnerabilities to system memory contents nor to memory owned by user processes.
It is REQUIRED that any RDMA provider used for RPC transport be conformant to the requirements of [RFC5042] in order to satisfy these protections. These protections are provided by the RDMA layer specifications, and in particular, their security models.
The use of Protection Domains to limit the exposure of memory regions to a single connection is critical. Any attempt by an endpoint not participating in that connection to reuse memory handles needs to result in immediate failure of that connection. Because ULP security mechanisms rely on this aspect of Reliable Connection behavior, strong authentication of remote endpoints is recommended.
Unpredictable memory handles should be used for any operation requiring advertised memory regions. Advertising a continuously registered memory region allows a remote host to read or write to that region even when an RPC involving that memory is not under way. Therefore, implementations should avoid advertising persistently registered memory.
Requesters should register memory regions for remote access only when they are about to be the target of an RPC operation that involves an RDMA Read or Write.
Registered memory regions should be invalidated as soon as related RPC operations are complete. Invalidation and DMA unmapping of memory regions should be complete before message integrity checking is done and before the RPC consumer is allowed to continue execution and use or alter the contents of a memory region.
An RPC transaction on a Requester might be terminated before a reply arrives if the RPC consumer exits unexpectedly (for example, it is signaled or a segmentation fault occurs). When an RPC terminates abnormally, memory regions associated with that RPC should be invalidated appropriately before the regions are released to be reused for other purposes on the Requester.
A detailed discussion of denial-of-service exposures that can result from the use of an RDMA transport is found in Section 6.4 of [RFC5042].
A Responder is not obliged to pull Read chunks that are unreasonably large. The Responder can use an RDMA_ERROR response to terminate RPCs with unreadable Read chunks. If a Responder transmits more data than a Requester is prepared to receive in a Write or Reply chunk, the RDMA Network Interface Cards (RNICs) typically terminate the connection. For further discussion, see Section 4.5. Such repeated chunk errors can deny service to other users sharing the connection from the errant Requester.
An RPC-over-RDMA transport implementation is not responsible for throttling the RPC request rate, other than to keep the number of concurrent RPC transactions at or under the number of credits granted per connection. This is explained in Section 3.3.1. A sender that repeatedly exceeds the credit grant can cause a denial of service against itself.
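A sender can enforce the credit limit above with a simple counter of outstanding transactions. This is a hypothetical sketch. The grant is refreshed from the rdma_credit value carried in each reply, and the class and method names are illustrative only.

```python
class CreditWindow:
    """Keep concurrent RPC transactions at or below the current credit grant."""

    def __init__(self, granted: int):
        self.granted = granted
        self.outstanding = 0

    def can_send(self) -> bool:
        return self.outstanding < self.granted

    def start(self) -> None:
        # Refuse rather than exceed the grant; the caller waits for a reply.
        if not self.can_send():
            raise RuntimeError("credit limit reached; wait for a reply")
        self.outstanding += 1

    def complete(self, new_grant: int) -> None:
        # A reply retires one transaction and carries the current grant.
        self.outstanding -= 1
        self.granted = new_grant
```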
When an RPC has been canceled due to a signal or premature exit of an application process, a Requester may invalidate the RPC's Write and Reply chunks. Invalidation prevents the subsequent arrival of the Responder's reply from altering the memory regions associated with those chunks after the memory has been reused.
On the Requester, a malfunctioning application or a malicious user can create a situation where RPCs are continuously initiated and then aborted, resulting in Responder replies that terminate the underlying RPC-over-RDMA connection repeatedly. Such situations can deny service to other users sharing the connection from that Requester.
ONC RPC provides cryptographic security via the RPCSEC_GSS framework [RFC7861]. RPCSEC_GSS implements message authentication (rpc_gss_svc_none), per-message integrity checking (rpc_gss_svc_integrity), and per-message confidentiality (rpc_gss_svc_privacy) in the layer above RPC-over-RDMA. The latter two services require significant computation and movement of data on each endpoint host. Some performance benefits enabled by RDMA transports can be lost.
For any RPC transport, utilizing RPCSEC_GSS integrity or privacy services has performance implications. Protection below the RPC transport is often more appropriate in performance-sensitive deployments, especially if it, too, can be offloaded. Certain configurations of IPsec can be co-located in RDMA hardware, for example, without change to RDMA consumers and little loss of data movement efficiency. Such arrangements can also provide a higher degree of privacy by hiding endpoint identity or altering the frequency at which messages are exchanged, at a performance cost.
The use of protection in a lower layer MAY be negotiated through the use of an RPCSEC_GSS security flavor defined in [RFC7861] in conjunction with the Channel Binding mechanism [RFC5056] and IPsec Channel Connection Latching [RFC5660]. Use of such mechanisms is REQUIRED where integrity or confidentiality is desired and where efficiency is required.
Not all RDMA devices and fabrics support the above protection mechanisms. Also, per-message authentication is still required on NFS clients where multiple users access NFS files. In these cases, RPCSEC_GSS can protect NFS traffic conveyed on RPC-over-RDMA connections.
RPCSEC_GSS extends the ONC RPC protocol [RFC5531] without changing the format of RPC messages. By observing the conventions described in this section, an RPC-over-RDMA transport can convey RPCSEC_GSS-protected RPC messages interoperably.
As part of the ONC RPC protocol, protocol elements of RPCSEC_GSS that appear in the Payload stream of an RPC-over-RDMA message (such as control messages exchanged as part of establishing or destroying a security context or data items that are part of RPCSEC_GSS authentication material) MUST NOT be reduced.
Some NFS client implementations use a separate connection to establish a Generic Security Service (GSS) context for NFS operation. These clients use TCP and the standard NFS port (2049) for context establishment. To enable the use of RPCSEC_GSS with NFS/RDMA, an NFS server MUST also provide a TCP-based NFS service on port 2049.
The RPCSEC_GSS authentication service has no impact on the DDP-eligibility of data items in a ULP.
However, RPCSEC_GSS authentication material appearing in an RPC message header can be larger than, say, an AUTH_SYS authenticator. In particular, when an RPCSEC_GSS pseudoflavor is in use, a Requester needs to accommodate a larger RPC credential when marshaling RPC Call messages and needs to provide for a maximum size RPCSEC_GSS verifier when allocating reply buffers and Reply chunks.
RPC messages, and thus Payload streams, are made larger as a result. ULP operations that fit in a Short Message when a simpler form of authentication is in use might need to be reduced, or conveyed via a Long Message, when RPCSEC_GSS authentication is in use. It is more likely that a Requester provides both a Read list and a Reply chunk in the same RPC-over-RDMA header to convey a Long Call and provision a receptacle for a Long Reply. More frequent use of Long Messages can impact transport efficiency.
The RPCSEC_GSS integrity service enables endpoints to detect modification of RPC messages in flight. The RPCSEC_GSS privacy service prevents all but the intended recipient from viewing the cleartext content of RPC arguments and results. RPCSEC_GSS integrity and privacy services are end-to-end. They protect RPC arguments and results from application to server endpoint, and back.
The RPCSEC_GSS integrity and encryption services operate on whole RPC messages after they have been XDR encoded for transmit, and before they have been XDR decoded after receipt. Both sender and receiver endpoints use intermediate buffers to prevent exposure of encrypted data or unverified cleartext data to RPC consumers. After verification, encryption, and message wrapping have been performed, the transport layer MAY use RDMA data transfer between these intermediate buffers.
The process of reducing a DDP-eligible data item removes the data item and its XDR padding from the encoded XDR stream. XDR padding of a reduced data item is not transferred in an RPC-over-RDMA message. After reduction, the Payload stream contains fewer octets than the whole XDR stream did beforehand. XDR padding octets are often zero bytes, but they don't have to be. Thus, reducing DDP-eligible items affects the result of message integrity verification or encryption.
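The padding arithmetic behind this point is simple. XDR rounds opaque data up to a multiple of four octets, so reducing an item removes its content octets plus zero to three padding octets. The sketch below is hypothetical code. As described above, it removes only the content and padding of each reduced item.

```python
def xdr_pad(n: int) -> int:
    """Padding octets XDR appends to n octets of opaque data (round up to 4)."""
    return (4 - n % 4) % 4


def reduced_payload_size(total_xdr_len: int, reduced_item_lens) -> int:
    """Payload stream length after removing reduced items and their padding.

    total_xdr_len: length of the whole XDR stream before reduction.
    reduced_item_lens: content lengths of the data items moved into chunks.
    """
    removed = sum(n + xdr_pad(n) for n in reduced_item_lens)
    return total_xdr_len - removed
```

For example, reducing a 5-octet opaque removes 8 octets (5 content plus 3 padding). A checksum computed over the sender's whole XDR stream therefore no longer covers the octets the receiver sees in the Payload stream.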
Therefore, a sender MUST NOT reduce a Payload stream when RPCSEC_GSS integrity or encryption services are in use. Effectively, no data item is DDP-eligible in this situation, and Chunked Messages cannot be used. In this mode, an RPC-over-RDMA transport operates in the same manner as a transport that does not support DDP.
When an RPCSEC_GSS integrity or privacy service is in use, a Requester provides both a Read list and a Reply chunk in the same RPC-over-RDMA header to convey a Long Call and provision a receptacle for a Long Reply.
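The transfer choices described above can be sketched as a decision function. The rpc_gss_service_t values come from the RPCSEC_GSS specification (none = 1, integrity = 2, privacy = 3). Everything else is a simplified, hypothetical model. In particular, the sketch compares whole message sizes against a single inline threshold. When integrity or privacy is in use, no item may be reduced, so a large message has to travel as a Long Message rather than as a Chunked Message.

```python
RPC_GSS_SVC_NONE, RPC_GSS_SVC_INTEGRITY, RPC_GSS_SVC_PRIVACY = 1, 2, 3


def choose_transfer(gss_service: int, call_len: int, max_reply_len: int,
                    inline_threshold: int) -> dict:
    """Decide how a Requester conveys a Call and provisions for its Reply."""
    protected = gss_service in (RPC_GSS_SVC_INTEGRITY, RPC_GSS_SVC_PRIVACY)
    return {
        # No data item is DDP-eligible under integrity or privacy.
        "may_reduce": not protected,
        # A Call too large to send inline becomes a Long Call (Read list).
        "long_call": call_len > inline_threshold,
        # A possible reply too large to receive inline needs a Reply chunk.
        "reply_chunk": max_reply_len > inline_threshold,
    }
```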
Like the base fields in an ONC RPC message (XID, call direction, and so on), the contents of an RPC-over-RDMA message's Transport stream are not protected by RPCSEC_GSS. This exposes XIDs, connection credit limits, and chunk lists (but not the content of the data items they refer to) to malicious behavior, which could redirect data that is transferred by the RPC-over-RDMA message, result in spurious retransmits, or trigger connection loss.
In particular, if an attacker alters the information contained in the chunk lists of an RPC-over-RDMA header, data contained in those chunks can be redirected to other registered memory regions on Requesters. An attacker might alter the arguments of RDMA Read and RDMA Write operations on the wire to similar effect. If such alterations occur, the use of RPCSEC_GSS integrity or privacy services enables a Requester to detect unexpected material in a received RPC message.
Encryption at lower layers, as described in Section 8.2.1, protects the content of the Transport stream. To address attacks on RDMA protocols themselves, RDMA transport implementations should conform to [RFC5042].
A set of RPC netids for resolving RPC-over-RDMA services is specified by this document. This is unchanged from [RFC5666].
The RPC-over-RDMA transport has been assigned an RPC netid, which is an rpcbind [RFC1833] string used to describe the underlying protocol in order for RPC to select the appropriate transport framing, as well as the format of the service addresses and ports.
The following netid registry strings are defined for this purpose:
NC_RDMA  "rdma"
NC_RDMA6 "rdma6"
The "rdma" netid is to be used when IPv4 addressing is employed by the underlying transport, and "rdma6" for IPv6 addressing. The netid assignment policy and registry are defined in [RFC5665].
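Netid selection by address family can be sketched with Python's standard ipaddress module. The universal-address helper assumes that "rdma" and "rdma6" use the same "address.p1.p2" universal address format as "tcp" and "tcp6". In that format p1 and p2 are the high and low octets of the port. Treat this as an illustrative assumption, not a restatement of the netid registry rules.

```python
import ipaddress

NFS_RDMA_PORT = 20049  # port assigned to the NFS/RDMA service


def rdma_netid(addr: str) -> str:
    """Return "rdma" for an IPv4 service address and "rdma6" for IPv6."""
    return "rdma" if ipaddress.ip_address(addr).version == 4 else "rdma6"


def universal_address(addr: str, port: int) -> str:
    """Assumed rpcbind universal address: the IP text form plus ".p1.p2"."""
    ip = ipaddress.ip_address(addr)
    return f"{ip}.{port >> 8}.{port & 0xFF}"
```

For example, an NFS/RDMA service at 192.0.2.1 would register under netid "rdma" with universal address "192.0.2.1.78.81", because 20049 = 78 * 256 + 81.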
These netids MAY be used for any RDMA network that satisfies the requirements of Section 2.3.2 and that is able to identify service endpoints using IP port addressing, possibly through use of a translation service as described in Section 5.
The use of the RPC-over-RDMA protocol has no effect on RPC Program numbers or existing registered port numbers. However, new port numbers MAY be registered for use by RPC-over-RDMA-enabled services, as appropriate to the new networks over which the services will operate.
For example, the NFS/RDMA service defined in [RFC5667] has been assigned the port 20049 in the "Service Name and Transport Protocol Port Number Registry". This is distinct from the port number defined for NFS on TCP, which is assigned the port 2049 in the same registry. NFS clients use the same RPC Program number for NFS (100003) when using either transport [RFC5531] (see the "Remote Procedure Call (RPC) Program Numbers" registry).
例如,[RFC5667]中定义的NFS/RDMA服务已在“服务名称和传输协议端口号注册表”中分配了端口20049。这不同于为TCP上的NFS定义的端口号,后者在同一注册表中被分配为端口2049。NFS客户端无论使用哪种传输,都使用相同的NFS RPC程序编号(100003)[RFC5531](参见“远程过程调用(RPC)程序编号”注册表)。
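rpcbind represents a service address as a universal address, whose formats are defined in RFC 5665. In that format, the port is appended to the IP address as two dotted decimal octets, with the high byte first. The Python sketch below uses a hypothetical `universal_address` helper to show how the NFS/RDMA port and the NFS-on-TCP port differ in that form. The address 192.0.2.10 is a documentation address, not a real server.

```python
import ipaddress

NFS_TCP_PORT = 2049    # NFS on TCP
NFS_RDMA_PORT = 20049  # NFS/RDMA


def universal_address(address: str, port: int) -> str:
    """Format an IP address and port as an RFC 5665 universal address.

    The port is appended as two decimal octets, high byte first,
    so port 2049 (0x0801) becomes ".8.1".
    """
    if not 0 <= port <= 0xFFFF:
        raise ValueError(f"port out of range: {port}")
    ip = ipaddress.ip_address(address)  # normalizes the textual form
    return f"{ip}.{port >> 8}.{port & 0xFF}"


if __name__ == "__main__":
    print(universal_address("192.0.2.10", NFS_RDMA_PORT))  # 192.0.2.10.78.81
    print(universal_address("192.0.2.10", NFS_TCP_PORT))   # 192.0.2.10.8.1
```

Both services use RPC program number 100003. An rpcbind registration also carries a netid and a universal address, and a client looking up NFS/RDMA asks for netid "rdma" or "rdma6".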
[RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", RFC 1833, DOI 10.17487/RFC1833, August 1995, <http://www.rfc-editor.org/info/rfc1833>.
[RFC1833]Srinivasan,R.,“ONC RPC版本2的绑定协议”,RFC 1833,DOI 10.17487/RFC1833,1995年8月<http://www.rfc-editor.org/info/rfc1833>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.
[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,DOI 10.17487/RFC2119,1997年3月<http://www.rfc-editor.org/info/rfc2119>.
[RFC4506] Eisler, M., Ed., "XDR: External Data Representation Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May 2006, <http://www.rfc-editor.org/info/rfc4506>.
[RFC4506]Eisler,M.,Ed.,“XDR:外部数据表示标准”,STD 67,RFC 4506,DOI 10.17487/RFC4506,2006年5月<http://www.rfc-editor.org/info/rfc4506>.
[RFC5042] Pinkerton, J. and E. Deleganes, "Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security", RFC 5042, DOI 10.17487/RFC5042, October 2007, <http://www.rfc-editor.org/info/rfc5042>.
[RFC5042]Pinkerton,J.和E.Deleganes,“直接数据放置协议(DDP)/远程直接内存访问协议(RDMAP)安全”,RFC 5042,DOI 10.17487/RFC5042,2007年10月<http://www.rfc-editor.org/info/rfc5042>.
[RFC5056] Williams, N., "On the Use of Channel Bindings to Secure Channels", RFC 5056, DOI 10.17487/RFC5056, November 2007, <http://www.rfc-editor.org/info/rfc5056>.
[RFC5056]Williams,N.,“关于使用通道绑定保护通道”,RFC 5056,DOI 10.17487/RFC5056,2007年11月<http://www.rfc-editor.org/info/rfc5056>.
[RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol Specification Version 2", RFC 5531, DOI 10.17487/RFC5531, May 2009, <http://www.rfc-editor.org/info/rfc5531>.
[RFC5531]Thurlow,R.,“RPC:远程过程调用协议规范版本2”,RFC 5531,DOI 10.17487/RFC5531,2009年5月<http://www.rfc-editor.org/info/rfc5531>.
[RFC5660] Williams, N., "IPsec Channels: Connection Latching", RFC 5660, DOI 10.17487/RFC5660, October 2009, <http://www.rfc-editor.org/info/rfc5660>.
[RFC5660]Williams,N.,“IPsec通道:连接锁存”,RFC 5660,DOI 10.17487/RFC5660,2009年10月<http://www.rfc-editor.org/info/rfc5660>.
[RFC5665] Eisler, M., "IANA Considerations for Remote Procedure Call (RPC) Network Identifiers and Universal Address Formats", RFC 5665, DOI 10.17487/RFC5665, January 2010, <http://www.rfc-editor.org/info/rfc5665>.
[RFC5665]Eisler,M.“远程过程调用(RPC)网络标识符和通用地址格式的IANA注意事项”,RFC 5665,DOI 10.17487/RFC5665,2010年1月<http://www.rfc-editor.org/info/rfc5665>.
[RFC7861] Adamson, A. and N. Williams, "Remote Procedure Call (RPC) Security Version 3", RFC 7861, DOI 10.17487/RFC7861, November 2016, <http://www.rfc-editor.org/info/rfc7861>.
[RFC7861]Adamson,A.和N.Williams,“远程过程调用(RPC)安全版本3”,RFC 7861,DOI 10.17487/RFC7861,2016年11月<http://www.rfc-editor.org/info/rfc7861>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <http://www.rfc-editor.org/info/rfc8174>.
[RFC8174]Leiba,B.,“RFC 2119关键词中大写与小写的歧义”,BCP 14,RFC 8174,DOI 10.17487/RFC8174,2017年5月<http://www.rfc-editor.org/info/rfc8174>.
[IBARCH] InfiniBand Trade Association, "InfiniBand Architecture Specification Volume 1", Release 1.3, March 2015, <http://www.infinibandta.org/content/pages.php?pg=technology_download>.
[IBARCH]InfiniBand贸易协会,“InfiniBand体系结构规范第1卷”,第1.3版,2015年3月<http://www.infinibandta.org/content/pages.php?pg=technology_download>.
[RFC768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 10.17487/RFC0768, August 1980, <http://www.rfc-editor.org/info/rfc768>.
[RFC768]Postel,J.,“用户数据报协议”,STD 6,RFC 768,DOI 10.17487/RFC0768,1980年8月<http://www.rfc-editor.org/info/rfc768>.
[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, DOI 10.17487/RFC0793, September 1981, <http://www.rfc-editor.org/info/rfc793>.
[RFC793]Postel,J.,“传输控制协议”,STD 7,RFC 793,DOI 10.17487/RFC0793,1981年9月<http://www.rfc-editor.org/info/rfc793>.
[RFC1094] Nowicki, B., "NFS: Network File System Protocol specification", RFC 1094, DOI 10.17487/RFC1094, March 1989, <http://www.rfc-editor.org/info/rfc1094>.
[RFC1094]Nowicki,B.,“NFS:网络文件系统协议规范”,RFC 1094,DOI 10.17487/RFC1094,1989年3月<http://www.rfc-editor.org/info/rfc1094>.
[RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 Protocol Specification", RFC 1813, DOI 10.17487/RFC1813, June 1995, <http://www.rfc-editor.org/info/rfc1813>.
[RFC1813]Callaghan,B.,Pawlowski,B.,和P.Staubach,“NFS版本3协议规范”,RFC 1813,DOI 10.17487/RFC1813,1995年6月<http://www.rfc-editor.org/info/rfc1813>.
[RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. Garcia, "A Remote Direct Memory Access Protocol Specification", RFC 5040, DOI 10.17487/RFC5040, October 2007, <http://www.rfc-editor.org/info/rfc5040>.
[RFC5040]Recio,R.,Metzler,B.,Culley,P.,Hilland,J.,和D.Garcia,“远程直接内存访问协议规范”,RFC 5040,DOI 10.17487/RFC5040,2007年10月<http://www.rfc-editor.org/info/rfc5040>.
[RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct Data Placement over Reliable Transports", RFC 5041, DOI 10.17487/RFC5041, October 2007, <http://www.rfc-editor.org/info/rfc5041>.
[RFC5041]Shah,H.,Pinkerton,J.,Recio,R.,和P.Culley,“可靠传输上的直接数据放置”,RFC 5041,DOI 10.17487/RFC5041,2007年10月<http://www.rfc-editor.org/info/rfc5041>.
[RFC5532] Talpey, T. and C. Juszczak, "Network File System (NFS) Remote Direct Memory Access (RDMA) Problem Statement", RFC 5532, DOI 10.17487/RFC5532, May 2009, <http://www.rfc-editor.org/info/rfc5532>.
[RFC5532]Talpey,T.和C.Juszczak,“网络文件系统(NFS)远程直接内存访问(RDMA)问题声明”,RFC 5532,DOI 10.17487/RFC5532,2009年5月<http://www.rfc-editor.org/info/rfc5532>.
[RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, <http://www.rfc-editor.org/info/rfc5661>.
[RFC5661]Shepler,S.,Ed.,Eisler,M.,Ed.,和D.Noveck,Ed.,“网络文件系统(NFS)版本4次要版本1协议”,RFC 5661,DOI 10.17487/RFC5661,2010年1月<http://www.rfc-editor.org/info/rfc5661>.
[RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., "Network File System (NFS) Version 4 Minor Version 1 External Data Representation Standard (XDR) Description", RFC 5662, DOI 10.17487/RFC5662, January 2010, <http://www.rfc-editor.org/info/rfc5662>.
[RFC5662]Shepler,S.,Ed.,Eisler,M.,Ed.,和D.Noveck,Ed.,“网络文件系统(NFS)版本4次要版本1外部数据表示标准(XDR)说明”,RFC 5662,DOI 10.17487/RFC5662,2010年1月<http://www.rfc-editor.org/info/rfc5662>.
[RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access Transport for Remote Procedure Call", RFC 5666, DOI 10.17487/RFC5666, January 2010, <http://www.rfc-editor.org/info/rfc5666>.
[RFC5666]Talpey,T.和B.Callaghan,“远程过程调用的远程直接内存访问传输”,RFC 5666,DOI 10.17487/RFC5666,2010年1月<http://www.rfc-editor.org/info/rfc5666>.
[RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS) Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667, January 2010, <http://www.rfc-editor.org/info/rfc5667>.
[RFC5667]Talpey,T.和B.Callaghan,“网络文件系统(NFS)直接数据放置”,RFC 5667,DOI 10.17487/RFC5667,2010年1月<http://www.rfc-editor.org/info/rfc5667>.
[RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015, <http://www.rfc-editor.org/info/rfc7530>.
[RFC7530]Haynes,T.,Ed.和D.Noveck,Ed.,“网络文件系统(NFS)第4版协议”,RFC 7530,DOI 10.17487/RFC7530,2015年3月<http://www.rfc-editor.org/info/rfc7530>.
[RFC8167] Lever, C., "Bidirectional Remote Procedure Call on RPC-over-RDMA Transports", RFC 8167, DOI 10.17487/RFC8167, June 2017, <http://www.rfc-editor.org/info/rfc8167>.
[RFC8167]Lever,C.,“RPC over RDMA传输上的双向远程过程调用”,RFC 8167,DOI 10.17487/RFC8167,2017年6月<http://www.rfc-editor.org/info/rfc8167>.
The following alterations have been made to the RPC-over-RDMA version 1 specification. The section numbers below refer to [RFC5666].
对RPC over RDMA版本1规范进行了以下修改。以下章节编号参考[RFC5666]。
o Section 2 has been expanded to introduce and explain key RPC [RFC5531], XDR [RFC4506], and RDMA [RFC5040] terminology. These terms are now used consistently throughout the specification.
o 第2节已经扩展,以介绍和解释关键RPC[RFC5531]、XDR[RFC4506]和RDMA[RFC5040]术语。这些术语现在在整个规范中使用一致。
o Section 3 has been reorganized and split into subsections to help readers locate specific requirements and definitions.
o 第3节已重新组织并分为几个小节,以帮助读者定位特定的需求和定义。
o Sections 4 and 5 have been combined to improve the organization of this information.
o 第4节和第5节已合并,以改进此信息的组织。
o The optional Connection Configuration Protocol (CCP) has never been implemented. Its specification has been deleted from this document.
o 可选连接配置协议从未实现过。CCP规范已从本规范中删除。
o A section consolidating requirements for ULBs has been added.
o 增加了一个章节,整合了ULBs的要求。
o An XDR extraction mechanism is provided, along with full copyright, matching the approach used in [RFC5662].
o 提供了一种XDR提取机制及完整的版权声明,与[RFC5662]中使用的方法一致。
o The "Security Considerations" section has been expanded to include a discussion of how RPC-over-RDMA security depends on features of the underlying RDMA transport.
o “安全注意事项”一节已经扩展,包括关于RPC over RDMA安全性如何依赖于底层RDMA传输特性的讨论。
o A subsection describing the use of RPCSEC_GSS [RFC7861] with RPC-over-RDMA version 1 has been added.
o 增加了一小节,说明RPCSEC_GSS[RFC7861]与RPC over RDMA版本1的使用。
Although the protocol described herein interoperates with existing implementations of [RFC5666], the following changes have been made relative to the protocol described in that document:
尽管本文所述的协议与[RFC5666]的现有实现互操作,但相对于该文档中所述的协议,已做出以下更改:
o Support for the Read-Read transfer model has been removed. Read-Read is a slower transfer model than Read-Write. As a result, implementers have chosen not to support it. Removal of Read-Read simplifies explanatory text, and the RDMA_DONE procedure is no longer part of the protocol.
o 已删除对读-读(Read-Read)传输模型的支持。读-读是一种比读-写(Read-Write)更慢的传输模型,因此实现者选择不支持它。删除读-读模型简化了解释性文本,且RDMA_DONE过程不再是协议的一部分。
o The specification of RDMA_MSGP in [RFC5666] is not adequate, although some incomplete implementations exist. Even if an adequate specification were provided and an implementation were produced, benefit for protocols such as NFSv4.0 [RFC7530] is doubtful. Therefore, the RDMA_MSGP message type is no longer supported.
o [RFC5666]中的RDMA_MSGP规范不充分,尽管存在一些不完整的实现。即使提供了足够的规范和实现,NFSv4.0[RFC7530]等协议的好处也值得怀疑。因此,不再支持RDMA_MSGP消息类型。
o Technical issues with regard to handling RPC-over-RDMA header errors have been corrected.
o 关于处理RPC over RDMA报头错误的技术问题已得到纠正。
o Specific requirements related to implicit XDR roundup and complex XDR data types have been added.
o 增加了与隐式XDR填充对齐(roundup)和复杂XDR数据类型相关的具体要求。
o Explicit guidance is provided related to sizing Write chunks, managing multiple chunks in the Write list, and handling unused Write chunks.
o 提供了与调整写入块大小、管理写入列表中的多个块以及处理未使用的写入块相关的明确指导。
o Clear guidance about Send and Receive buffer sizes has been introduced. This enables better decisions about when a Reply chunk must be provided.
o 关于发送和接收缓冲区大小的明确指导已经引入。这可以更好地决定何时必须提供回复块。
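The "implicit XDR roundup" mentioned in this list refers to a rule in XDR (RFC 4506): variable-length opaque data and strings are padded with zero bytes to a multiple of four bytes. The Python sketch below shows only that arithmetic, and the helper names are hypothetical. The rules for when RPC-over-RDMA chunks carry or omit this padding are those given in the body of this specification, not in the sketch.

```python
XDR_UNIT = 4  # XDR basic block size in bytes (RFC 4506)


def xdr_roundup(length: int) -> int:
    """Round a byte count up to the next multiple of the 4-byte XDR unit."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return (length + XDR_UNIT - 1) & ~(XDR_UNIT - 1)


def xdr_pad_bytes(length: int) -> int:
    """Number of zero bytes XDR appends after `length` bytes of opaque data."""
    return xdr_roundup(length) - length


if __name__ == "__main__":
    for n in (0, 1, 4, 5, 8191):
        print(n, xdr_roundup(n), xdr_pad_bytes(n))
```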
Acknowledgments
致谢
The editor gratefully acknowledges the work of Brent Callaghan and Tom Talpey on the original RPC-over-RDMA Version 1 specification [RFC5666].
编辑非常感谢Brent Callaghan和Tom Talpey在原始RPC over RDMA版本1规范[RFC5666]上所做的工作。
Dave Noveck provided excellent review, constructive suggestions, and consistent navigational guidance throughout the process of drafting this document. Dave also contributed much of the organization and content of Section 7 and helped the authors understand the complexities of XDR extensibility.
在起草本文件的整个过程中,Dave Noveck提供了出色的审查、建设性的建议和始终如一的方向性指导。Dave还为第7节贡献了大部分组织结构和内容,并帮助作者理解XDR可扩展性的复杂之处。
The comments and contributions of Karen Deitke, Dai Ngo, Chunli Zhang, Dominique Martinet, and Mahesh Siddheshwar are accepted with great thanks. The editor also wishes to thank Bill Baker, Greg Marsden, and Matt Benjamin for their support of this work.
非常感谢Karen Deitke、Dai Ngo、Chunli Zhang、Dominique Martinet和Mahesh Siddheshwar的评论和贡献。编辑还要感谢Bill Baker、Greg Marsden和Matt Benjamin对这项工作的支持。
The extract.sh shell script and formatting conventions were first described by the authors of the NFSv4.1 XDR specification [RFC5662].
extract.sh shell脚本和格式约定首先由NFSv4.1 XDR规范[RFC5662]的作者描述。
Special thanks go to Transport Area Director Spencer Dawkins, NFSV4 Working Group Chair and Document Shepherd Spencer Shepler, and NFSV4 Working Group Secretary Thomas Haynes for their support.
特别感谢传输领域主管(Transport Area Director)Spencer Dawkins、NFSV4工作组主席兼文档负责人(Document Shepherd)Spencer Shepler以及NFSV4工作组秘书Thomas Haynes的支持。
Authors' Addresses
作者地址
Charles Lever (editor) Oracle Corporation 1015 Granger Avenue Ann Arbor, MI 48104 United States of America
Charles Lever(编辑) Oracle Corporation 美国密歇根州安阿伯市Granger Avenue 1015号,邮编:48104
Phone: +1 248 816 6463 Email: chuck.lever@oracle.com
Phone: +1 248 816 6463 Email: chuck.lever@oracle.com
William Allen Simpson Red Hat 1384 Fontaine Madison Heights, MI 48071 United States of America
William Allen Simpson Red Hat 美国密歇根州麦迪逊高地Fontaine 1384号,邮编:48071
Email: william.allen.simpson@gmail.com
Email: william.allen.simpson@gmail.com
Tom Talpey Microsoft Corp. One Microsoft Way Redmond, WA 98052 United States of America
Tom Talpey Microsoft Corp. 美国华盛顿州雷德蒙市One Microsoft Way,邮编:98052
Phone: +1 425 704-9945 Email: ttalpey@microsoft.com
Phone: +1 425 704-9945 Email: ttalpey@microsoft.com