Internet Engineering Task Force (IETF)                        C. Hellwig
Request for Comments: 8154                                      May 2017
Category: Standards Track
ISSN: 2070-1721
        

Parallel NFS (pNFS) Small Computer System Interface (SCSI) Layout

Abstract

The Parallel Network File System (pNFS) allows a separation between the metadata (onto a metadata server) and data (onto a storage device) for a file. The Small Computer System Interface (SCSI) layout type is defined in this document as an extension to pNFS to allow the use of SCSI-based block storage devices.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc8154.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions Used in This Document
     1.2.  General Definitions
     1.3.  Code Components Licensing Notice
     1.4.  XDR Description
   2.  SCSI Layout Description
     2.1.  Background and Architecture
     2.2.  layouttype4
     2.3.  GETDEVICEINFO
       2.3.1.  Volume Identification
       2.3.2.  Volume Topology
     2.4.  Data Structures: Extents and Extent Lists
       2.4.1.  Layout Requests and Extent Lists
       2.4.2.  Layout Commits
       2.4.3.  Layout Returns
       2.4.4.  Layout Revocation
       2.4.5.  Client Copy-on-Write Processing
       2.4.6.  Extents Are Permissions
       2.4.7.  Partial-Block Updates
       2.4.8.  End-of-File Processing
       2.4.9.  Layout Hints
       2.4.10. Client Fencing
     2.5.  Crash Recovery Issues
     2.6.  Recalling Resources: CB_RECALL_ANY
     2.7.  Transient and Permanent Errors
     2.8.  Volatile Write Caches
   3.  Enforcing NFSv4 Semantics
     3.1.  Use of Open Stateids
     3.2.  Enforcing Security Restrictions
     3.3.  Enforcing Locking Restrictions
   4.  Security Considerations
   5.  IANA Considerations
   6.  Normative References
   Acknowledgments
   Author's Address
        
1. Introduction

Figure 1 shows the overall architecture of a Parallel NFS (pNFS) system:

        +-----------+
        |+-----------+                                 +-----------+
        ||+-----------+                                |           |
        |||           |       NFSv4.1 + pNFS           |           |
        +||  Clients  |<------------------------------>|   Server  |
         +|           |                                |           |
          +-----------+                                |           |
               |||                                     +-----------+
               |||                                           |
               |||                                           |
               ||| Storage        +-----------+              |
               ||| Protocol       |+-----------+             |
               ||+----------------||+-----------+  Control   |
               |+-----------------|||           |    Protocol|
               +------------------+||  Storage  |------------+
                                   +|  Systems  |
                                    +-----------+
        

Figure 1

The overall approach is that pNFS-enhanced clients obtain sufficient information from the server to enable them to access the underlying storage (on the storage systems) directly. See Section 12 of [RFC5661] for more details. This document is concerned with access from pNFS clients to storage devices over block storage protocols based on the SCSI Architecture Model [SAM-5], e.g., the Fibre Channel Protocol (FCP), Internet SCSI (iSCSI), or Serial Attached SCSI (SAS). pNFS SCSI layout requires block-based SCSI command sets, for example, SCSI Block Commands [SBC3]. While SCSI command sets for non-block-based access exist, these are not supported by the SCSI layout type, and all future references to SCSI storage devices will imply a block-based SCSI command set.

The Server to Storage System protocol, called the "Control Protocol", is not of concern for interoperability, although it will typically be the same SCSI-based storage protocol.

This document is based on [RFC5663] and makes changes to the block layout type to provide a better pNFS layout protocol for SCSI-based storage devices. Despite these changes, [RFC5663] remains the defining document for the existing block layout type. pNFS Block Disk Protection [RFC6688] is unnecessary in the context of the SCSI layout type because the new layout type provides mandatory disk access protection as part of the layout type definition. In contrast to [RFC5663], this document uses SCSI protocol features to provide reliable fencing by using SCSI persistent reservations, and it can provide reliable and efficient device discovery by using SCSI device identifiers instead of having to rely on probing all devices potentially attached to a client. This new layout type also optimizes the Input/Output (I/O) path by reducing the size of the LAYOUTCOMMIT payload.


The above two paragraphs summarize the major functional differences from [RFC5663]. There are other minor differences, e.g., the "base" volume type in this specification is used instead of the "simple" volume type in [RFC5663], but there are no significant differences in the data structures that describe the volume topology above this level (Section 2.3.2) or in the data structures that describe extents (Section 2.4).

1.1. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

1.2. General Definitions

The following definitions provide context for the reader.

Byte: an octet, i.e., a datum exactly 8 bits in length.

Client: the entity that accesses the NFS server's resources. The client may be an application that contains the logic to access the NFS server directly. The client may also be the traditional operating system client that provides remote file system services for a set of applications.

Server: the entity that is responsible for coordinating client access to a set of file systems and that is identified by a server owner.

Metadata Server (MDS): a pNFS server that provides metadata information for a file system object. It also is responsible for generating layouts for file system objects. Note that the MDS is also responsible for directory-based operations.

1.3. Code Components Licensing Notice

The external data representation (XDR) description and scripts for extracting the XDR description are Code Components as described in Section 4 of "Legal Provisions Relating to IETF Documents" [LEGAL]. These Code Components are licensed according to the terms of Section 4 of "Legal Provisions Relating to IETF Documents".

1.4. XDR Description

This document contains the XDR [RFC4506] description of the NFSv4.1 SCSI layout protocol. The XDR description is embedded in this document in a way that makes it simple for the reader to extract into a ready-to-compile form. The reader can feed this document into the following shell script to produce the machine-readable XDR description of the NFSv4.1 SCSI layout:

    #!/bin/sh
    grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'
        

That is, if the above script is stored in a file called "extract.sh", and this document is in a file called "spec.txt", then the reader can do:

sh extract.sh < spec.txt > scsi_prot.x

The effect of the script is to remove leading white space from each line, plus a sentinel sequence of "///".
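
For readers without a POSIX shell, the same extraction can be sketched in Python. This is only an illustration of what the grep and sed steps do, not part of the specification; the function name extract_xdr is an assumption introduced here.

```python
import re


def extract_xdr(text: str) -> str:
    """Return the lines marked with the '///' sentinel, sentinel removed.

    Hypothetical helper mirroring the shell pipeline shown above.
    """
    out = []
    for line in text.splitlines():
        # grep '^ *///': keep lines whose first non-space characters are '///'.
        if not re.match(r"^ *///", line):
            continue
        # sed 's?^ */// ??': drop leading spaces, the sentinel, and one space.
        line = re.sub(r"^ */// ", "", line, count=1)
        # sed 's?^ *///$??': a bare sentinel becomes an empty line.
        line = re.sub(r"^ *///$", "", line, count=1)
        out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

Passing the full text of this document to extract_xdr should yield the same output as the shell script, since each regular expression is a direct transcription of the corresponding grep or sed pattern.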

The embedded XDR file header follows. Subsequent XDR descriptions with the sentinel sequence are embedded throughout the document.

Note that the XDR code contained in this document depends on types from the NFSv4.1 nfs4_prot.x file [RFC5662]. This includes both NFS types that end with a 4, such as offset4, length4, etc., as well as more generic types such as uint32_t and uint64_t.

       /// /*
       ///  * This code was derived from RFC 8154.
       ///  * Please reproduce this note if possible.
       ///  */
       /// /*
       ///  * Copyright (c) 2017 IETF Trust and the persons
       ///  * identified as authors of the code.  All rights reserved.
       ///  *
       ///  * Redistribution and use in source and binary forms, with
       ///  * or without modification, are permitted provided that the
       ///  * following conditions are met:
       ///  *
       ///  * - Redistributions of source code must retain the above
       ///  *   copyright notice, this list of conditions and the
       ///  *   following disclaimer.
       ///  *
       ///  * - Redistributions in binary form must reproduce the above
       ///  *   copyright notice, this list of conditions and the
       ///  *   following disclaimer in the documentation and/or other
       ///  *   materials provided with the distribution.
       ///  *
       ///  * - Neither the name of Internet Society, IETF or IETF
       ///  *   Trust, nor the names of specific contributors, may be
       ///  *   used to endorse or promote products derived from this
       ///  *   software without specific prior written permission.
       ///  *
       ///  *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
       ///  *   AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
       ///  *   WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
       ///  *   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
       ///  *   FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
       ///  *   EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
       ///  *   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
       ///  *   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
       ///  *   NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
       ///  *   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
       ///  *   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
       ///  *   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
       ///  *   OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
       ///  *   IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
       ///  *   ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
       ///  */
       ///
       /// /*
       ///  *      nfs4_scsi_layout_prot.x
       ///  */
       ///
       /// %#include "nfsv41.h"
       ///
        
2. SCSI Layout Description

2.1. Background and Architecture

The fundamental storage model supported by SCSI storage devices is a logical unit (LU) consisting of a sequential series of fixed-size blocks. Logical units used as devices for NFS SCSI layouts, and the SCSI initiators used for the pNFS metadata server and clients, MUST support SCSI persistent reservations as defined in [SPC4].

A pNFS layout for this SCSI class of storage is responsible for mapping from an NFS file (or portion of a file) to the blocks of storage volumes that contain the file. The blocks are expressed as extents with 64-bit offsets and lengths using the existing NFSv4 offset4 and length4 types. Clients MUST be able to perform I/O to the block extents without affecting additional areas of storage (especially important for writes); therefore, extents MUST be aligned to logical block size boundaries of the underlying logical units (typically 512 or 4096 bytes). For complex volume topologies, the server MUST ensure extents are aligned to the logical block size boundaries of the largest logical block size in the volume topology.
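
The alignment requirement above can be illustrated with a short Python sketch. The function name and the idea of passing the logical block sizes of all LUs as a list are assumptions made for this example; the sketch also assumes non-negative offsets and lengths.

```python
from typing import Sequence


def extent_is_aligned(offset: int, length: int,
                      block_sizes: Sequence[int]) -> bool:
    """Check an extent against the largest logical block size.

    block_sizes is assumed to hold the logical block size, in bytes,
    of every logical unit in the volume topology.
    """
    if not block_sizes:
        raise ValueError("at least one logical block size is required")
    # For complex topologies, alignment is judged against the largest
    # logical block size in the topology.
    block = max(block_sizes)
    return offset % block == 0 and length % block == 0
```

For a topology mixing 512-byte and 4096-byte logical units, an extent starting at byte 8192 with length 4096 passes the check, whereas one starting at byte 512 does not.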

The pNFS operation for requesting a layout (LAYOUTGET) includes the "layoutiomode4 loga_iomode" argument, which indicates whether the requested layout is for read-only use or read-write use. A read-only layout may contain holes that are read as zero, whereas a read-write layout will contain allocated but uninitialized storage in those holes (read as zero, can be written by client). This document also supports client participation in copy-on-write (e.g., for file systems with snapshots) by providing both read-only and uninitialized storage for the same range in a layout. Reads are initially performed on the read-only storage, with writes going to the uninitialized storage. After the first write that initializes the uninitialized storage, all reads are performed to that now-initialized writable storage, and the corresponding read-only storage is no longer used.
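
The copy-on-write read and write routing described above can be sketched as follows. This is a simplified model, not a client implementation: the class name, the per-block tracking granularity, and the string labels are assumptions for illustration, and the sketch assumes every write covers whole blocks.

```python
class CowRange:
    """Models one range backed by both read-only and uninitialized storage."""

    def __init__(self, num_blocks: int) -> None:
        # No block of the writable storage has been initialized yet.
        self.initialized = [False] * num_blocks

    def read_source(self, block: int) -> str:
        # Reads use the read-only storage until the block has been
        # written; afterwards they use the now-initialized storage.
        return "writable" if self.initialized[block] else "read-only"

    def write(self, block: int) -> str:
        # Writes always go to the writable storage. The first write
        # initializes the block, so later reads follow it there.
        self.initialized[block] = True
        return "writable"
```

After write(1), read_source(1) returns "writable" while read_source(2) still returns "read-only", matching the rule that the read-only storage stops being used only once the corresponding writable storage has been initialized.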

The SCSI layout solution expands the security responsibilities of the pNFS clients, and there are a number of environments where the mandatory-to-implement security properties for NFS cannot be satisfied. The additional security responsibilities of the client follow, and a full discussion is present in Section 4 ("Security Considerations").

o Typically, SCSI storage devices provide access control mechanisms (e.g., Logical Unit Number (LUN) mapping and/or masking), which operate at the granularity of individual hosts, not individual blocks. For this reason, block-based protection must be provided by the client software.

o Similarly, SCSI storage devices typically are not able to validate NFS locks that apply to file regions. For instance, if a file is covered by a mandatory read-only lock, the server can ensure that only readable layouts for the file are granted to pNFS clients. However, it is up to each pNFS client to ensure that the readable layout is used only to service read requests and not to allow writes to the existing parts of the file.

Since SCSI storage devices are generally not capable of enforcing such file-based security, in environments where pNFS clients cannot be trusted to enforce such policies, pNFS SCSI layouts MUST NOT be used.

2.2. layouttype4

The layouttype4 enumeration defined in [RFC5662] is extended with a new value as follows:

        enum layouttype4 {
            LAYOUT4_NFSV4_1_FILES   = 1,
            LAYOUT4_OSD2_OBJECTS    = 2,
            LAYOUT4_BLOCK_VOLUME    = 3,
            LAYOUT4_SCSI            = 5
        };
        

This document defines the structure associated with the layouttype4 value LAYOUT4_SCSI. [RFC5661] specifies the loc_body structure as an XDR type "opaque". The opaque layout is uninterpreted by the generic pNFS client layers but obviously must be interpreted by the layout type implementation.

2.3. GETDEVICEINFO

2.3.1. Volume Identification

SCSI targets implementing [SPC4] export unique LU names for each LU through the Device Identification Vital Product Data (VPD) page (page code 0x83), which can be obtained using the INQUIRY command with the Enable VPD (EVPD) bit set to one. This document uses a subset of this information to identify LUs backing pNFS SCSI layouts. The Device Identification VPD page descriptors used to identify LUs for use with pNFS SCSI layouts must adhere to the following restrictions:

1. The "ASSOCIATION" MUST be set to 0 (The "DESIGNATOR" field is associated with the addressed logical unit).

2. The "DESIGNATOR TYPE" MUST be set to one of four values that are required for the mandatory logical unit name in Section 7.7.3 of [SPC4], as explicitly listed in the "pnfs_scsi_designator_type" enumeration:

PS_DESIGNATOR_T10 - based on T10 vendor ID

PS_DESIGNATOR_EUI64 - based on EUI-64

PS_DESIGNATOR_NAA - Network Address Authority (NAA)

PS_DESIGNATOR_NAME - SCSI name string

3. Any other association or designator type MUST NOT be used. Use of T10 vendor IDs is discouraged when one of the other types can be used.

The "CODE SET" VPD page field is stored in the "sbv_code_set" field of the "pnfs_scsi_base_volume_info4" data structure, the "DESIGNATOR TYPE" is stored in "sbv_designator_type", and the DESIGNATOR is stored in "sbv_designator". Due to the use of an XDR array, the "DESIGNATOR LENGTH" field does not need to be set separately. Only certain combinations of "sbv_code_set" and "sbv_designator_type" are valid; please refer to [SPC4] for details, and note that ASCII MAY be used as the code set for UTF-8 text that contains only printable ASCII characters. Note that a Device Identification VPD page MAY contain multiple descriptors with the same association, code set, and designator type. Thus, NFS clients MUST check all the descriptors for a possible match to "sbv_code_set", "sbv_designator_type", and "sbv_designator".
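
The descriptor matching that clients are required to perform can be sketched in Python as follows. The byte offsets reflect the designation descriptor layout of the Device Identification VPD page as understood from [SPC4] and should be verified against that specification; the function names are assumptions introduced for this example, and obtaining the page via INQUIRY is out of scope.

```python
import struct
from typing import Iterator, Tuple


def iter_designators(page: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (association, code_set, designator_type, designator)
    for each descriptor in a Device Identification VPD page."""
    if len(page) < 4 or page[1] != 0x83:
        raise ValueError("not a Device Identification VPD page")
    (page_length,) = struct.unpack_from(">H", page, 2)
    end = min(len(page), 4 + page_length)
    pos = 4
    while pos + 4 <= end:
        code_set = page[pos] & 0x0F              # low nibble of byte 0
        association = (page[pos + 1] >> 4) & 0x03
        designator_type = page[pos + 1] & 0x0F   # low nibble of byte 1
        length = page[pos + 3]                   # DESIGNATOR LENGTH
        if pos + 4 + length > end:
            break                                # truncated descriptor
        yield (association, code_set, designator_type,
               page[pos + 4:pos + 4 + length])
        pos += 4 + length


def lu_matches(page: bytes, sbv_code_set: int,
               sbv_designator_type: int, sbv_designator: bytes) -> bool:
    """Check every descriptor, since a page may carry several."""
    return any(
        assoc == 0 and cs == sbv_code_set
        and dt == sbv_designator_type and d == sbv_designator
        for assoc, cs, dt, d in iter_designators(page)
    )
```

The sketch compares only descriptors whose association is 0, in line with restriction 1 above, and it does not stop at the first descriptor of a given type because several descriptors may share the same association, code set, and designator type.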

Storage devices such as storage arrays can have multiple physical network interfaces that need not be connected to a common network, resulting in a pNFS client having simultaneous multipath access to the same storage volumes via different ports on different networks. Selection of one or multiple ports to access the storage device is left up to the client.

Additionally, the server returns a persistent reservation key in the "sbv_pr_key" field. See Section 2.4.10 for more details on the use of persistent reservations.

2.3.2. Volume Topology

The pNFS SCSI layout volume topology is expressed in terms of the volume types described below. The individual components of the topology are contained in an array, and components MAY refer to other components by using array indices.

   /// enum pnfs_scsi_volume_type4 {
   ///     PNFS_SCSI_VOLUME_SLICE  = 1,  /* volume is a slice of
   ///                                      another volume */
   ///     PNFS_SCSI_VOLUME_CONCAT = 2,  /* volume is a
   ///                                      concatenation of
   ///                                      multiple volumes */
   ///     PNFS_SCSI_VOLUME_STRIPE = 3,  /* volume is striped across
   ///                                      multiple volumes */
   ///     PNFS_SCSI_VOLUME_BASE   = 4   /* volume maps to a single
   ///                                      LU */
   /// };
   ///
        
   /// /*
   ///  * Code sets from SPC-4.
   ///  */
   /// enum pnfs_scsi_code_set {
   ///     PS_CODE_SET_BINARY     = 1,
   ///     PS_CODE_SET_ASCII      = 2,
   ///     PS_CODE_SET_UTF8       = 3
   /// };
   ///
   /// /*
   ///  * Designator types taken from SPC-4.
   ///  *
   ///  * Other values are allocated in SPC-4 but are not mandatory to
   ///  * implement or aren't logical unit names.
   ///  */
   /// enum pnfs_scsi_designator_type {
   ///     PS_DESIGNATOR_T10      = 1,
   ///     PS_DESIGNATOR_EUI64    = 2,
   ///     PS_DESIGNATOR_NAA      = 3,
   ///     PS_DESIGNATOR_NAME     = 8
   /// };
   ///
   /// /*
   ///  * Logical unit name + reservation key.
   ///  */
   /// struct pnfs_scsi_base_volume_info4 {
   ///     pnfs_scsi_code_set             sbv_code_set;
   ///     pnfs_scsi_designator_type      sbv_designator_type;
   ///     opaque                         sbv_designator<>;
   ///     uint64_t                       sbv_pr_key;
   /// };
   ///
        
   /// struct pnfs_scsi_slice_volume_info4 {
   ///     offset4  ssv_start;            /* offset of the start of
   ///                                       the slice in bytes */
   ///     length4  ssv_length;           /* length of slice in
   ///                                       bytes */
   ///     uint32_t ssv_volume;           /* array index of sliced
   ///                                       volume */
   /// };
   ///
        
   ///
   /// struct pnfs_scsi_concat_volume_info4 {
   ///     uint32_t  scv_volumes<>;       /* array indices of volumes
   ///                                       that are concatenated */
   /// };
        
   ///
   /// struct pnfs_scsi_stripe_volume_info4 {
   ///     length4  ssv_stripe_unit;      /* size of stripe in bytes */
   ///     uint32_t ssv_volumes<>;        /* array indices of
   ///                                       volumes that are striped
   ///                                       across -- MUST be same
   ///                                       size */
   /// };
        
   ///
   /// union pnfs_scsi_volume4 switch (pnfs_scsi_volume_type4 type) {
   ///     case PNFS_SCSI_VOLUME_BASE:
   ///         pnfs_scsi_base_volume_info4 sv_simple_info;
   ///     case PNFS_SCSI_VOLUME_SLICE:
   ///         pnfs_scsi_slice_volume_info4 sv_slice_info;
   ///     case PNFS_SCSI_VOLUME_CONCAT:
   ///         pnfs_scsi_concat_volume_info4 sv_concat_info;
   ///     case PNFS_SCSI_VOLUME_STRIPE:
   ///         pnfs_scsi_stripe_volume_info4 sv_stripe_info;
   /// };
   ///
        
   /// /* SCSI layout-specific type for da_addr_body */
   /// struct pnfs_scsi_deviceaddr4 {
   ///     pnfs_scsi_volume4 sda_volumes<>; /* array of volumes */
   /// };
   ///
        

The "pnfs_scsi_deviceaddr4" data structure allows arbitrarily complex nested volume structures to be encoded. The types of aggregations that are allowed are stripes, concatenations, and slices. Note that the volume topology expressed in the "pnfs_scsi_deviceaddr4" data structure will always resolve to a set of volumes of type PNFS_SCSI_VOLUME_BASE. The array of volumes is ordered such that the root of the volume hierarchy is the last element of the array. Concat, slice, and stripe volumes MUST refer to volumes defined by lower-indexed elements of the array.
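
As an illustration only, the following Python sketch shows one way a client might check the index ordering rule and resolve an offset on the root volume down to a base volume. The class and function names are hypothetical and not part of this specification. The "size" member of Base is an assumption (the LU capacity is learned from the LU itself, not from the XDR), and the round-robin stripe arithmetic is an assumed conventional striping scheme.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Base:                 # PNFS_SCSI_VOLUME_BASE
    designator: bytes       # stands in for sbv_designator
    size: int               # assumed: LU capacity in bytes

@dataclass
class Slice:                # PNFS_SCSI_VOLUME_SLICE
    start: int              # ssv_start
    length: int             # ssv_length
    volume: int             # ssv_volume

@dataclass
class Concat:               # PNFS_SCSI_VOLUME_CONCAT
    volumes: List[int]      # scv_volumes

@dataclass
class Stripe:               # PNFS_SCSI_VOLUME_STRIPE
    stripe_unit: int        # ssv_stripe_unit
    volumes: List[int]      # ssv_volumes

Volume = Union[Base, Slice, Concat, Stripe]

def check_references(vols: List[Volume]) -> None:
    """Reject any component that refers to itself or a higher index."""
    for i, v in enumerate(vols):
        refs = [v.volume] if isinstance(v, Slice) else getattr(v, "volumes", [])
        if any(r >= i for r in refs):
            raise ValueError(f"volume {i} refers to a non-lower index")

def size_of(vols: List[Volume], idx: int) -> int:
    v = vols[idx]
    if isinstance(v, Base):
        return v.size
    if isinstance(v, Slice):
        return v.length
    if isinstance(v, Concat):
        return sum(size_of(vols, c) for c in v.volumes)
    # Stripe members MUST be the same size, so one member is enough.
    return len(v.volumes) * size_of(vols, v.volumes[0])

def resolve(vols: List[Volume], idx: int, offset: int) -> Tuple[bytes, int]:
    """Map a byte offset on volume idx to (designator, offset on the LU)."""
    v = vols[idx]
    if isinstance(v, Base):
        return v.designator, offset
    if isinstance(v, Slice):
        if offset >= v.length:
            raise ValueError("offset beyond end of slice")
        return resolve(vols, v.volume, v.start + offset)
    if isinstance(v, Concat):
        for child in v.volumes:
            child_size = size_of(vols, child)
            if offset < child_size:
                return resolve(vols, child, offset)
            offset -= child_size
        raise ValueError("offset beyond end of concatenation")
    # Assumed round-robin striping across the member volumes.
    unit = offset // v.stripe_unit
    child = v.volumes[unit % len(v.volumes)]
    child_off = (unit // len(v.volumes)) * v.stripe_unit + offset % v.stripe_unit
    return resolve(vols, child, child_off)
```

Because the root is the last array element, a caller would start resolution with resolve(vols, len(vols) - 1, offset) after check_references(vols) has succeeded.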

The "pnfs_scsi_deviceaddr4" data structure is returned by the server as the storage-protocol-specific opaque field "da_addr_body" in the "device_addr4" data structure by a successful GETDEVICEINFO operation [RFC5661].

As noted above, all "device_addr4" data structures eventually resolve to a set of volumes of type PNFS_SCSI_VOLUME_BASE. Complicated volume hierarchies may be composed of dozens of volumes, each with several components; thus, the device address may require several kilobytes. The client SHOULD be prepared to allocate a large buffer to contain the result. In the case of the server returning NFS4ERR_TOOSMALL, the client SHOULD allocate a buffer of at least gdir_mincount bytes to contain the expected result and retry the GETDEVICEINFO request.

2.4. Data Structures: Extents and Extent Lists

A pNFS SCSI layout is a list of extents within a flat array of data blocks in a volume. The details of the volume topology can be determined by using the GETDEVICEINFO operation. The SCSI layout describes the individual block extents on the volume that make up the file. The offsets and length contained in an extent are specified in units of bytes.

   /// enum pnfs_scsi_extent_state4 {
   ///     PNFS_SCSI_READ_WRITE_DATA = 0, /* the data located by
   ///                                       this extent is valid
   ///                                       for reading and
   ///                                       writing. */
   ///     PNFS_SCSI_READ_DATA      = 1,  /* the data located by this
   ///                                       extent is valid for
   ///                                       reading only; it may not
   ///                                       be written. */
   ///     PNFS_SCSI_INVALID_DATA   = 2,  /* the location is valid; the
   ///                                       data is invalid.  It is a
   ///                                       newly (pre-)allocated
   ///                                       extent.  The client MUST
   ///                                       NOT read from this
   ///                                       space. */
   ///     PNFS_SCSI_NONE_DATA      = 3   /* the location is invalid.
   ///                                       It is a hole in the file.
   ///                                       The client MUST NOT read
   ///                                       from or write to this
   ///                                       space. */
   /// };
        
   ///
   /// struct pnfs_scsi_extent4 {
   ///     deviceid4    se_vol_id;         /* id of the volume on
   ///                                        which extent of file is
   ///                                        stored */
   ///     offset4      se_file_offset;    /* starting byte offset
   ///                                        in the file */
   ///     length4      se_length;         /* size in bytes of the
   ///                                        extent */
   ///     offset4      se_storage_offset; /* starting byte offset
   ///                                        in the volume */
   ///     pnfs_scsi_extent_state4 se_state;
   ///                                     /* state of this extent */
   /// };
   ///
        
   /// /* SCSI layout-specific type for loc_body */
   /// struct pnfs_scsi_layout4 {
   ///     pnfs_scsi_extent4 sl_extents<>;
   ///                                    /* extents that make up this
   ///                                       layout */
   /// };
   ///
        

The SCSI layout consists of a list of extents that map the regions of the file to locations on a volume. The "se_storage_offset" field within each extent identifies a location on the volume specified by the "se_vol_id" field in the extent. The "se_vol_id" itself is shorthand for the whole topology of the volume on which the file is stored. The client is responsible for translating this volume-relative offset into an offset on the appropriate underlying SCSI LU.
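
The first step of that translation, from a file offset to a volume-relative offset, can be sketched as follows. This is a minimal illustration in Python; the Extent class and the volume_offset function are hypothetical names, with members mirroring the "pnfs_scsi_extent4" fields.

```python
from dataclasses import dataclass
from typing import Tuple

PNFS_SCSI_NONE_DATA = 3

@dataclass(frozen=True)
class Extent:
    vol_id: bytes         # se_vol_id
    file_offset: int      # se_file_offset
    length: int           # se_length
    storage_offset: int   # se_storage_offset
    state: int            # se_state

def volume_offset(extent: Extent, file_pos: int) -> Tuple[bytes, int]:
    """Return (se_vol_id, byte offset on that volume) for a file position."""
    if extent.state == PNFS_SCSI_NONE_DATA:
        # se_storage_offset is not valid for holes.
        raise ValueError("extent has no storage location")
    if not extent.file_offset <= file_pos < extent.file_offset + extent.length:
        raise ValueError("file position is outside this extent")
    return extent.vol_id, extent.storage_offset + (file_pos - extent.file_offset)
```

The volume-relative offset returned here still has to be mapped through the volume topology obtained via GETDEVICEINFO to reach an offset on an underlying LU.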

Each extent maps a region of the file onto a portion of the specified LU. The "se_file_offset", "se_length", and "se_state" fields for an extent returned from the server are valid for all extents. In contrast, the interpretation of the "se_storage_offset" field depends on the value of "se_state" as follows (in increasing order):

PNFS_SCSI_READ_WRITE_DATA:  "se_storage_offset" is valid and points to valid/initialized data that can be read and written.

PNFS_SCSI_READ_DATA:  "se_storage_offset" is valid and points to valid/initialized data that can only be read. Write operations are prohibited.

PNFS_SCSI_INVALID_DATA:  "se_storage_offset" is valid but points to invalid, uninitialized data. This data MUST NOT be read from the disk until it has been initialized. A read request for a PNFS_SCSI_INVALID_DATA extent MUST fill the user buffer with zeros, unless the extent is covered by a PNFS_SCSI_READ_DATA extent of a copy-on-write file system. Write requests MUST write whole server-sized blocks to the disk; bytes not initialized by the user MUST be set to zero. Any write to storage in a PNFS_SCSI_INVALID_DATA extent changes the written portion of the extent to PNFS_SCSI_READ_WRITE_DATA; the pNFS client is responsible for reporting this change via LAYOUTCOMMIT.

PNFS_SCSI_NONE_DATA:  "se_storage_offset" is not valid, and this extent MUST NOT be used to satisfy write requests. Read requests MAY be satisfied by zero-filling as for PNFS_SCSI_INVALID_DATA. PNFS_SCSI_NONE_DATA extents MAY be returned by requests for readable extents; they are never returned if the request was for a writable extent.

An extent list contains all relevant extents in increasing order of the se_file_offset of each extent; any ties are broken by increasing order of the extent state (se_state).
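
The per-state rules and the list ordering can be summarized in a short Python sketch. It is illustrative only: the function names and the string results are hypothetical, and extents are modeled as plain (se_file_offset, se_length, se_storage_offset, se_state) tuples.

```python
READ_WRITE_DATA, READ_DATA, INVALID_DATA, NONE_DATA = 0, 1, 2, 3

def sort_extents(extents):
    """Order extents by file offset, breaking ties by increasing state."""
    return sorted(extents, key=lambda e: (e[0], e[3]))

def read_source(state, covered_by_read_data=False):
    """Say where a read of an extent in the given state gets its data."""
    if state in (READ_WRITE_DATA, READ_DATA):
        return "storage"             # se_storage_offset holds valid data
    if state == INVALID_DATA and covered_by_read_data:
        return "read-data-extent"    # copy-on-write: use the covering extent
    return "zeros"                   # INVALID_DATA or NONE_DATA: zero-fill

def may_write(state):
    """Only READ_WRITE_DATA and INVALID_DATA extents permit writes."""
    return state in (READ_WRITE_DATA, INVALID_DATA)
```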

2.4.1. Layout Requests and Extent Lists

Each request for a layout specifies at least three parameters: file offset, desired size, and minimum size. If the status of a request indicates success, the extent list returned MUST meet the following criteria:

o A request for a readable (but not writable) layout MUST return either PNFS_SCSI_READ_DATA or PNFS_SCSI_NONE_DATA extents. It SHALL NOT return PNFS_SCSI_INVALID_DATA or PNFS_SCSI_READ_WRITE_DATA extents.

o A request for a writable layout MUST return PNFS_SCSI_READ_WRITE_DATA or PNFS_SCSI_INVALID_DATA extents, and it MAY return additional PNFS_SCSI_READ_DATA extents for ranges covered by PNFS_SCSI_INVALID_DATA extents to allow client-side copy-on-write operations. A request for a writable layout SHALL NOT return PNFS_SCSI_NONE_DATA extents.

o The first extent in the list MUST contain the requested starting offset.

o The total size of extents within the requested range MUST cover at least the minimum size. One exception is allowed: the total size MAY be smaller if only readable extents were requested and EOF is encountered.

o Extents in the extent list MUST be logically contiguous for a read-only layout. For a read-write layout, the set of writable extents (i.e., excluding PNFS_SCSI_READ_DATA extents) MUST be logically contiguous. Every PNFS_SCSI_READ_DATA extent in a read-write layout MUST be covered by one or more PNFS_SCSI_INVALID_DATA extents. This overlap of PNFS_SCSI_READ_DATA and PNFS_SCSI_INVALID_DATA extents is the only permitted extent overlap.

o Extents MUST be ordered in the list by starting offset, with PNFS_SCSI_READ_DATA extents preceding PNFS_SCSI_INVALID_DATA extents in the case of equal se_file_offsets.
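
A client that wants to sanity-check a returned extent list against the criteria above might do so along the following lines. This is a rough Python sketch under simplifying assumptions: extents are (se_file_offset, se_length, se_state) tuples, the function name and messages are hypothetical, and the EOF exception is modeled only by a caller-supplied hit_eof flag.

```python
READ_WRITE_DATA, READ_DATA, INVALID_DATA, NONE_DATA = 0, 1, 2, 3

def _covered(start, end, ranges):
    """True if [start, end) is fully covered by the (start, end) ranges."""
    pos = start
    for s, e in sorted(ranges):
        if s <= pos < e:
            pos = e
    return pos >= end

def check_extent_list(extents, offset, minlength, writable, hit_eof=False):
    """Return a list of rule violations; an empty list means none found."""
    if not extents:
        return ["empty extent list"]
    problems = []
    allowed = ({READ_WRITE_DATA, INVALID_DATA, READ_DATA} if writable
               else {READ_DATA, NONE_DATA})
    if any(state not in allowed for _, _, state in extents):
        problems.append("state not permitted for this layout type")
    if list(extents) != sorted(extents, key=lambda e: (e[0], e[2])):
        problems.append("extents not ordered by offset, then state")
    first_off, first_len, _ = extents[0]
    if not first_off <= offset < first_off + first_len:
        problems.append("first extent does not contain requested offset")
    # In a writable layout, READ_DATA extents are excluded from contiguity.
    primary = [e for e in extents if not (writable and e[2] == READ_DATA)]
    if not primary:
        return problems + ["writable layout has no writable extents"]
    end = primary[0][0] + primary[0][1]
    for file_off, length, _ in primary[1:]:
        if file_off != end:
            problems.append("extents are not logically contiguous")
            break
        end = file_off + length
    if end - offset < minlength and not (hit_eof and not writable):
        problems.append("extents cover less than the minimum size")
    if writable:
        invalid = [(o, o + n) for o, n, s in extents if s == INVALID_DATA]
        for o, n, s in extents:
            if s == READ_DATA and not _covered(o, o + n, invalid):
                problems.append("READ_DATA extent not covered by INVALID_DATA")
    return problems
```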

According to [RFC5661], if the minimum requested size, loga_minlength, is zero, this is an indication to the metadata server that the client desires any layout at offset loga_offset or less that the metadata server has "readily available". Given the lack of a clear definition of this phrase, in the context of the SCSI layout type, when loga_minlength is zero, the metadata server SHOULD do the following:

o when processing requests for readable layouts, return all such layouts, even if some extents are in the PNFS_SCSI_NONE_DATA state.

o when processing requests for writable layouts, return extents that can be returned in the PNFS_SCSI_READ_WRITE_DATA state.

2.4.2. Layout Commits
     ///
     /// /* SCSI layout-specific type for lou_body */
     ///
     /// struct pnfs_scsi_range4 {
     ///     offset4      sr_file_offset;   /* starting byte offset
     ///                                       in the file */
     ///     length4      sr_length;        /* size in bytes */
     /// };
     ///
     /// struct pnfs_scsi_layoutupdate4 {
     ///     pnfs_scsi_range4 slu_commit_list<>;
     ///                                    /* list of extents that
     ///                                     * now contain valid data.
     ///                                     */
     /// };
        

The "pnfs_scsi_layoutupdate4" data structure is used by the client as the SCSI layout-specific argument in a LAYOUTCOMMIT operation. The "slu_commit_list" field is a list covering regions of the file layout that were previously in the PNFS_SCSI_INVALID_DATA state but have been written by the client and SHOULD now be considered in the PNFS_SCSI_READ_WRITE_DATA state. The extents in the commit list MUST be disjoint and MUST be sorted by sr_file_offset. Implementors should be aware that a server MAY be unable to commit regions at a granularity smaller than a file system block (typically 4 KB or 8 KB). As noted above, the block size that the server uses is available as an NFSv4 attribute, and any extents included in the "slu_commit_list" MUST be aligned to this granularity and have a size that is a multiple of this granularity. Since the block in question is in state PNFS_SCSI_INVALID_DATA, byte ranges not written SHOULD be filled with zeros. This applies even if it appears that the area being written is beyond what the client believes to be the end of file.
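
One possible way for a client to assemble "slu_commit_list" from the byte ranges it has written is sketched below in Python. The function name is hypothetical. The sketch assumes the client has already written whole blocks, zero-filling the unwritten bytes, so that rounding each range out to block boundaries is legitimate.

```python
def build_commit_list(written, block_size):
    """Turn written (offset, length) byte ranges into a sorted, disjoint,
    block-aligned list of (sr_file_offset, sr_length) pairs."""
    aligned = []
    for off, length in written:
        if length <= 0:
            continue
        start = (off // block_size) * block_size              # round down
        end = -(-(off + length) // block_size) * block_size   # round up
        aligned.append((start, end))
    aligned.sort()
    merged = []
    for start, end in aligned:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]
```

Merging adjacent ranges is a design choice made here to keep the list short; leaving adjacent ranges separate would also satisfy the disjointness and ordering requirements.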

2.4.3. Layout Returns

A LAYOUTRETURN operation represents an explicit release of resources by the client. This MAY be done in response to a CB_LAYOUTRECALL or before any recall, in order to avoid a future CB_LAYOUTRECALL. When the LAYOUTRETURN operation specifies a LAYOUTRETURN4_FILE return type, then the "layoutreturn_file4" data structure specifies the region of the file layout that is no longer needed by the client.

The LAYOUTRETURN operation is done without any data specific to the SCSI layout. The opaque "lrf_body" field of the "layoutreturn_file4" data structure MUST have length zero.

2.4.4. Layout Revocation

Layouts MAY be unilaterally revoked by the server due to the client's lease time expiring or the client failing to return a layout that has been recalled in a timely manner. For the SCSI layout type, this is accomplished by fencing off the client from access to storage as described in Section 2.4.10. When this is done, it is necessary that all I/Os issued by the fenced-off client be rejected by the storage. This includes any in-flight I/Os that the client issued before the layout was revoked.

Note that the granularity of this operation can only be at the host/LU level. Thus, if one of a client's layouts is unilaterally revoked by the server, it will effectively render useless *all* of the client's layouts for files located on the storage units comprising the volume. This may render useless the client's layouts for files in other file systems. See Section 2.4.10.5 for a discussion of recovery from fencing.

2.4.5. Client Copy-on-Write Processing

Copy-on-write is a mechanism used to support file and/or file system snapshots. When writing to unaligned regions, or to regions smaller than a file system block, the writer MUST copy the portions of the original file data to a new location on disk. This behavior can be implemented either on the client or the server. The paragraphs below describe how a pNFS SCSI layout client implements access to a file that requires copy-on-write semantics.

Distinguishing the PNFS_SCSI_READ_WRITE_DATA and PNFS_SCSI_READ_DATA extent types in combination with the allowed overlap of PNFS_SCSI_READ_DATA extents with PNFS_SCSI_INVALID_DATA extents allows copy-on-write processing to be done by pNFS clients. In classic NFS, this operation would be done by the server. Since pNFS enables clients to do direct block access, it is useful for clients to participate in copy-on-write operations. All SCSI pNFS clients MUST support this copy-on-write processing.

When a client wishes to write data covered by a PNFS_SCSI_READ_DATA extent, it MUST have requested a writable layout from the server; that layout will contain PNFS_SCSI_INVALID_DATA extents to cover all the data ranges of that layout's PNFS_SCSI_READ_DATA extents. More precisely, for any se_file_offset range covered by one or more PNFS_SCSI_READ_DATA extents in a writable layout, the server MUST include one or more PNFS_SCSI_INVALID_DATA extents in the layout that cover the same se_file_offset range. When performing a write to such an area of a layout, the client MUST effectively copy the data from the PNFS_SCSI_READ_DATA extent for any partial blocks of se_file_offset and range, merge in the changes to be written, and write the result to the PNFS_SCSI_INVALID_DATA extent for the blocks for that se_file_offset and range. That is, if entire blocks of data are to be overwritten by an operation, the corresponding PNFS_SCSI_READ_DATA blocks need not be fetched, but any partial-block writes MUST be merged with data fetched via PNFS_SCSI_READ_DATA extents before storing the result via PNFS_SCSI_INVALID_DATA extents. For the purposes of this discussion, "entire blocks" and "partial blocks" refer to the block size of the server's file system. Storing of data in a PNFS_SCSI_INVALID_DATA extent converts the written portion of the PNFS_SCSI_INVALID_DATA extent to a PNFS_SCSI_READ_WRITE_DATA extent; all subsequent reads MUST be performed from this extent; the corresponding portion of the PNFS_SCSI_READ_DATA extent MUST NOT be used after storing data in a PNFS_SCSI_INVALID_DATA extent. If a client writes only a portion of an extent, the extent MAY be split at block-aligned boundaries.

When a client wishes to write data to a PNFS_SCSI_INVALID_DATA extent that is not covered by a PNFS_SCSI_READ_DATA extent, it MUST treat this write identically to a write to a file not involved with copy-on-write semantics. Thus, data MUST be written in at least block-sized increments and aligned to multiples of block-sized offsets, and unwritten portions of blocks MUST be zero filled.
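
The block-level merge described in this section can be illustrated with a small Python sketch. The function name and parameters are hypothetical. The old_block argument stands for the block contents the client has fetched via the covering PNFS_SCSI_READ_DATA extent; passing None models an uncovered PNFS_SCSI_INVALID_DATA extent, where unwritten bytes are zero filled.

```python
def merge_block(block_size, offset_in_block, data, old_block=None):
    """Build the full block to store via a PNFS_SCSI_INVALID_DATA extent."""
    if offset_in_block < 0 or offset_in_block + len(data) > block_size:
        raise ValueError("write does not fit inside one block")
    if old_block is None:
        base = bytearray(block_size)          # zero fill unwritten bytes
    else:
        if len(old_block) != block_size:
            raise ValueError("old block has the wrong size")
        base = bytearray(old_block)           # copy-on-write source data
    base[offset_in_block:offset_in_block + len(data)] = data
    return bytes(base)
```

When the new data covers the entire block, the result does not depend on old_block, which matches the rule that whole-block overwrites need not fetch the PNFS_SCSI_READ_DATA blocks.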

2.4.6. Extents Are Permissions

Layout extents returned to pNFS clients grant permission to read or write; PNFS_SCSI_READ_DATA and PNFS_SCSI_NONE_DATA are read-only (PNFS_SCSI_NONE_DATA reads as zeros), and PNFS_SCSI_READ_WRITE_DATA and PNFS_SCSI_INVALID_DATA are read-write (PNFS_SCSI_INVALID_DATA reads as zeros; any write converts it to PNFS_SCSI_READ_WRITE_DATA). This is the only means a client has of obtaining permission to perform direct I/O to storage devices; a pNFS client MUST NOT perform direct I/O operations that are not permitted by an extent held by the client. Client adherence to this rule places the pNFS server in control of potentially conflicting storage device operations, enabling the server to determine what does conflict and how to avoid conflicts by granting and recalling extents to/from clients.

If a client makes a layout request that conflicts with an existing layout delegation, the request will be rejected with the error NFS4ERR_LAYOUTTRYLATER. This client is then expected to retry the request after a short interval. During this interval, the server SHOULD recall the conflicting portion of the layout delegation from the client that currently holds it. This reject-and-retry approach does not prevent client starvation when there is contention for the layout of a particular file. For this reason, a pNFS server SHOULD implement a mechanism to prevent starvation. One possibility is that the server can maintain a queue of rejected layout requests. Each new layout request can be checked to see if it conflicts with a previous rejected request, and if so, the newer request can be rejected. Once the original requesting client retries its request, its entry in the rejected request queue can be cleared, or the entry in the rejected request queue can be removed when it reaches a certain age.
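
The queue of rejected requests suggested above could take many forms. The Python sketch below shows one hypothetical shape, with invented class and method names, byte ranges expressed as half-open [start, end) intervals, and an aging policy chosen purely for illustration.

```python
import time

class RejectedRequests:
    """Remembers rejected layout requests so later conflicting requests
    from other clients can be turned away until the original retries."""

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self.entries = []   # (client, start, end, time), oldest first

    def _expire(self):
        now = self.clock()
        self.entries = [e for e in self.entries if now - e[3] < self.max_age]

    def should_reject(self, client, start, end):
        """True if the oldest overlapping entry belongs to another client."""
        self._expire()
        for c, s, e, _ in self.entries:
            if start < e and s < end:
                return c != client
        return False

    def record_rejection(self, client, start, end):
        if not any(x[:3] == (client, start, end) for x in self.entries):
            self.entries.append((client, start, end, self.clock()))

    def clear(self, client, start, end):
        """Drop the entry once the original client's retry is handled."""
        self.entries = [x for x in self.entries if x[:3] != (client, start, end)]
```

A server using such a structure would consult should_reject() before its normal conflict check, call record_rejection() whenever it returns NFS4ERR_LAYOUTTRYLATER, and call clear() when the original client retries.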

NFSv4 supports mandatory locks and share reservations. These are mechanisms that clients can use to restrict the set of I/O operations that are permissible to other clients. Since all I/O operations ultimately arrive at the NFSv4 server for processing, the server is in a position to enforce these restrictions. However, with pNFS layouts, I/Os will be issued from the clients that hold the layouts directly to the storage devices that host the data. These devices have no knowledge of files, mandatory locks, or share reservations, and they are not in a position to enforce such restrictions. For this reason, the NFSv4 server MUST NOT grant layouts that conflict with mandatory locks or share reservations. Further, if a conflicting mandatory lock request or a conflicting OPEN request arrives at the server, the server MUST recall the part of the layout in conflict with the request before granting the request.

2.4.7. Partial-Block Updates

SCSI storage devices do not provide byte granularity access and can only perform read and write operations atomically on a block granularity. Writes to SCSI storage devices thus require read-modify-write cycles to write data that is smaller than the block size or that is otherwise not block aligned. Write operations from multiple clients to the same block can thus lead to data corruption even if the byte range written by the applications does not overlap. When there are multiple clients who wish to access the same block, a pNFS server MUST avoid these conflicts by implementing a concurrency control policy of single writer XOR multiple readers for a given data block.

pNFS服务器必须通过为给定数据块实现单写器XOR多读器的并发控制策略来避免这些冲突。
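
One way to picture the "single writer XOR multiple readers" policy is a per-block table of holders. The sketch below is a simplified illustration; the class and method names are hypothetical, and a real server would track extents and layout stateids rather than bare block numbers.

```python
class BlockAccessPolicy:
    """Per-block bookkeeping: either one writer or any number of readers."""

    def __init__(self):
        self.readers = {}   # block number -> set of client ids
        self.writer = {}    # block number -> client id

    def grant(self, client, block, write):
        """Return True and record the grant if it keeps the block conflict-free."""
        writer = self.writer.get(block)
        readers = self.readers.get(block, set())
        if write:
            # A writer excludes every other writer and every other reader.
            if writer not in (None, client) or (readers - {client}):
                return False
            self.writer[block] = client
            return True
        # A reader is excluded only by another client's write access.
        if writer not in (None, client):
            return False
        self.readers.setdefault(block, set()).add(client)
        return True

    def release(self, client, block):
        """Forget any access the client held on the block (layout returned)."""
        if self.writer.get(block) == client:
            del self.writer[block]
        self.readers.get(block, set()).discard(client)
```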

2.4.8. End-of-File Processing
2.4.8. 文件结尾处理

The end-of-file location can be changed in two ways: implicitly as the result of a WRITE or LAYOUTCOMMIT beyond the current end of file or explicitly as the result of a SETATTR request. Typically, when a file is truncated by an NFSv4 client via the SETATTR call, the server frees any disk blocks belonging to the file that are beyond the new end-of-file byte and MUST write zeros to the portion of the new end-of-file block beyond the new end-of-file byte. These actions render semantically invalid any pNFS layouts that refer to the blocks that are freed or written. Therefore, the server MUST recall from clients the portions of any pNFS layouts that refer to blocks that will be freed or written by the server before effecting the file truncation. These recalls may take time to complete; as explained in [RFC5661], if the server cannot respond to the client SETATTR request in a reasonable amount of time, it SHOULD reply to the client with the error NFS4ERR_DELAY.

可以通过两种方式更改文件结尾位置:隐式地作为当前文件结尾之外的写入或LAYOUTCOMMIT的结果,或显式地作为SETATTR请求的结果。通常,当NFSv4客户端通过SETATTR调用截断文件时,服务器会释放属于该文件的超出新文件结尾字节的任何磁盘块,并且必须将零写入新文件结尾字节以外的新文件结尾块部分。这些操作使得引用已释放或写入的块的任何pNFS布局在语义上无效。因此,在执行文件截断之前,服务器必须从客户端调用任何pNFS布局中引用服务器将释放或写入的块的部分。这些召回可能需要时间才能完成;如[RFC5661]中所述,如果服务器无法在合理的时间内响应客户端SETATTR请求,则应向客户端回复错误NFS4ERR_DELAY。
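
The block arithmetic behind a truncation can be made concrete with a small helper. This is a sketch under the assumption that the file occupies contiguous logical blocks numbered from zero; the function name and the returned dictionary keys are hypothetical.

```python
def truncation_plan(old_size, new_size, block_size):
    """Describe what a truncate from old_size to new_size frees, zeroes, and recalls.

    Returns None when the file is not shrinking, in which case nothing is
    freed or zeroed and no layout recall is needed.
    """
    if new_size >= old_size:
        return None
    old_blocks = -(-old_size // block_size)    # ceiling division
    kept_blocks = -(-new_size // block_size)
    return {
        # Blocks lying entirely beyond the new end-of-file byte.
        "freed_blocks": range(kept_blocks, old_blocks),
        # Byte range of the new end-of-file block that must be zero-filled.
        "zero_bytes": (new_size, kept_blocks * block_size),
        # Layout portions from this block onward refer to freed or written blocks.
        "recall_from_block": new_size // block_size,
    }
```

For example, shrinking a 10000-byte file to 5000 bytes with 4096-byte blocks frees block 2, zeroes bytes 5000 through 8191, and requires recalling layouts from block 1 onward.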

Blocks in the PNFS_SCSI_INVALID_DATA state that lie beyond the new end-of-file block present a special case. The server has reserved these blocks for use by a pNFS client with a writable layout for the file, but the client has yet to commit the blocks, and they are not yet a part of the file mapping on disk. The server MAY free these blocks while processing the SETATTR request. If so, the server MUST recall any layouts from pNFS clients that refer to the blocks before processing the truncate. If the server does not free the PNFS_SCSI_INVALID_DATA blocks while processing the SETATTR request, it need not recall layouts that refer only to the PNFS_SCSI_INVALID_DATA blocks.

处于PNFS_SCSI_INVALID_DATA状态且位于新文件结尾块之外的块是一种特殊情况。服务器已保留这些块供具有文件可写布局的pNFS客户端使用,但客户端尚未提交这些块,并且它们还不是磁盘上文件映射的一部分。服务器可以在处理SETATTR请求时释放这些块。如果是这样,服务器必须在处理截断之前从pNFS客户端召回引用这些块的任何布局。如果服务器在处理SETATTR请求时未释放PNFS_SCSI_INVALID_DATA块,则无需召回仅引用PNFS_SCSI_INVALID_DATA块的布局。

When a file is extended implicitly by a WRITE or LAYOUTCOMMIT beyond the current end of file, or extended explicitly by a SETATTR request, the server need not recall any portions of any pNFS layouts.

当文件被WRITE或LAYOUTCOMMIT隐式扩展到文件的当前结尾之外,或者被SETATTR请求显式扩展时,服务器不需要调用任何pNFS布局的任何部分。

2.4.9. Layout Hints
2.4.9. 布局提示

The layout hint attribute specified in [RFC5661] is not supported by the SCSI layout, and the pNFS server MUST reject setting a layout hint attribute with a loh_type value of LAYOUT4_SCSI_VOLUME during OPEN or SETATTR operations. On a file system only supporting the SCSI layout, a server MUST NOT report the layout_hint attribute in the supported_attrs attribute.

SCSI布局不支持[RFC5661]中指定的布局提示属性,pNFS服务器必须在OPEN或SETATTR操作期间拒绝设置loh_type值为LAYOUT4_SCSI_VOLUME的布局提示属性。在仅支持SCSI布局的文件系统上,服务器不得在supported_attrs属性中报告layout_hint属性。

2.4.10. Client Fencing
2.4.10. 客户端隔离

The pNFS SCSI protocol must handle situations in which a system failure, typically a network connectivity issue, requires the server to unilaterally revoke extents from a client after the client fails to respond to a CB_LAYOUTRECALL request. This is implemented by fencing off a non-responding client from access to the storage device.

pNFS SCSI协议必须处理以下情况:系统故障(通常是网络连接问题)要求服务器在客户端未能响应CB_LAYOUTRECALL请求后单方面撤销客户端的扩展数据块。这是通过阻止无响应的客户端访问存储设备来实现的。

The pNFS SCSI protocol implements fencing using persistent reservations (PRs), similar to the fencing method used by existing shared disk file systems. By placing a PR of type "Exclusive Access - Registrants Only" on each SCSI LU exported to pNFS clients, the MDS prevents access from any client that does not have an outstanding device ID that gives the client a reservation key to access the LU and allows the MDS to revoke access to the logical unit at any time.

pNFS SCSI协议使用持久保留(PRs)实现防护,类似于现有共享磁盘文件系统使用的防护方法。通过在导出到pNFS客户端的每个SCSI LU上放置一个类型为“独占访问-仅注册者”的PR,MDS可以防止来自任何没有未完成设备ID的客户端的访问,该设备ID为客户端提供访问LU的保留密钥,并允许MDS随时撤销对逻辑单元的访问。

2.4.10.1. PRs -- Key Generation
2.4.10.1. PRs——密钥生成

To allow fencing individual systems, each system MUST use a unique persistent reservation key. [SPC4] does not specify a way to generate keys. This document assigns the burden of generating unique keys to the MDS, which MUST generate a key for itself before exporting a volume and a key for each client that accesses SCSI layout volumes. Individual keys for each volume that a client can access are permitted but not required.

要允许隔离单个系统,每个系统必须使用唯一的持久保留密钥。[SPC4]未指定生成密钥的方法。本文档将为MDS分配生成唯一密钥的负担,MDS必须在导出卷之前为自己生成一个密钥,并为访问SCSI布局卷的每个客户端生成一个密钥。允许但不需要客户端可以访问的每个卷的单个密钥。
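
Since the key generation method is left to the MDS, one simple approach is to draw random 64-bit values and reject collisions. The sketch below is only one possibility; the class name is hypothetical, and skipping the value zero reflects the common SCSI convention that an all-zero key in a REGISTER action removes a registration.

```python
import secrets


class ReservationKeyAllocator:
    """Hands out unique, non-zero 64-bit persistent reservation keys."""

    def __init__(self):
        self._in_use = set()
        self._client_keys = {}
        self.mds_key = self._new_key()   # the MDS's own key, created first

    def _new_key(self):
        while True:
            key = secrets.randbits(64)
            if key != 0 and key not in self._in_use:
                self._in_use.add(key)
                return key

    def key_for_client(self, client_id):
        """Return the client's key, generating it on first use."""
        if client_id not in self._client_keys:
            self._client_keys[client_id] = self._new_key()
        return self._client_keys[client_id]
```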

2.4.10.2. PRs -- MDS Registration and Reservation
2.4.10.2. PRs——MDS注册和保留

Before returning a PNFS_SCSI_VOLUME_BASE volume to the client, the MDS needs to prepare the volume for fencing using PRs. This is done by registering the reservation key generated for the MDS with the device using the "PERSISTENT RESERVE OUT" command with a service action of "REGISTER", followed by a "PERSISTENT RESERVE OUT" command with a service action of "RESERVE" and the "TYPE" field set to 8h (Exclusive Access - Registrants Only). To make sure all I_T nexuses (see Section 3.1.45 of [SAM-5]) are registered, the MDS SHOULD set the "All Target Ports" (ALL_TG_PT) bit when registering the key or otherwise ensure the registration is performed for each target port, and it MUST perform registration for each initiator port.

在将PNFS_SCSI_VOLUME_BASE卷返回客户端之前,MDS需要准备卷,以便使用PRs进行隔离。具体做法是:先使用服务操作为“REGISTER”的“PERSISTENT RESERVE OUT”命令向设备注册为MDS生成的保留密钥,然后发出服务操作为“RESERVE”的“PERSISTENT RESERVE OUT”命令,并将“TYPE”字段设置为8h(独占访问-仅注册者)。为确保所有I_T nexus(见[SAM-5]第3.1.45节)都已注册,MDS应在注册密钥时设置“All Target Ports”(ALL_TG_PT)位,或以其他方式确保为每个目标端口执行注册,并且必须为每个启动器端口执行注册。

2.4.10.3. PRs -- Client Registration
2.4.10.3. PRs——客户端注册

Before performing the first I/O to a device returned from a GETDEVICEINFO operation, the client will register the reservation key returned in sbv_pr_key with the storage device by issuing a "PERSISTENT RESERVE OUT" command with a service action of "REGISTER" with the "SERVICE ACTION RESERVATION KEY" set to the reservation key returned in sbv_pr_key. To make sure all I_T nexuses are registered, the client SHOULD set the "All Target Ports" (ALL_TG_PT) bit when registering the key or otherwise ensure the registration is performed for each target port, and it MUST perform registration for each initiator port.

在对从GETDEVICEINFO操作返回的设备执行第一次I/O之前,客户端将向存储设备注册sbv_pr_key中返回的保留密钥,方法是发出服务操作为“REGISTER”的“PERSISTENT RESERVE OUT”命令,并将“SERVICE ACTION RESERVATION KEY”设置为sbv_pr_key中返回的保留密钥。为了确保所有I_T nexus都已注册,客户端在注册密钥时应设置“All Target Ports”(ALL_TG_PT)位,或以其他方式确保为每个目标端口执行注册,并且必须为每个启动器端口执行注册。

When a client stops using a device earlier returned by GETDEVICEINFO, it MUST unregister the earlier registered key by issuing a "PERSISTENT RESERVE OUT" command with a service action of "REGISTER" with the "RESERVATION KEY" set to the earlier registered reservation key.

当客户端停止使用GETDEVICEINFO先前返回的设备时,它必须注销先前注册的密钥,方法是发出服务操作为“REGISTER”的“PERSISTENT RESERVE OUT”命令,并将“RESERVATION KEY”设置为先前注册的保留密钥。

2.4.10.4. PRs -- Fencing Action
2.4.10.4. PRs——隔离操作

In case of a non-responding client, the MDS fences the client by issuing a "PERSISTENT RESERVE OUT" command with the service action set to "PREEMPT" or "PREEMPT AND ABORT", the "RESERVATION KEY" field set to the server's reservation key, the service action "RESERVATION KEY" field set to the reservation key associated with the non-responding client, and the "TYPE" field set to 8h (Exclusive Access - Registrants Only).

对于无响应的客户端,MDS通过发出“PERSISTENT RESERVE OUT”命令来隔离该客户端,其中服务操作设置为“PREEMPT”或“PREEMPT AND ABORT”,“RESERVATION KEY”字段设置为服务器的保留密钥,服务操作“RESERVATION KEY”字段设置为与无响应客户端关联的保留密钥,“TYPE”字段设置为8h(独占访问-仅注册者)。
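
To make the command fields in Sections 2.4.10.2 and 2.4.10.4 more tangible, the following sketch builds the CDB and parameter list for the MDS-side commands. The byte layout (opcode 5Fh, a 10-byte CDB, and a 24-byte basic parameter list with ALL_TG_PT in bit 2 of byte 20) reflects the author of this example's reading of [SPC4] and should be checked against the standard. The function names are hypothetical, and the TYPE value is simply the one stated in the text above.

```python
import struct

PERSISTENT_RESERVE_OUT = 0x5F
SA_REGISTER, SA_RESERVE = 0x00, 0x01
SA_PREEMPT, SA_PREEMPT_AND_ABORT = 0x04, 0x05
ALL_TG_PT = 0x04     # assumed position: bit 2 of parameter list byte 20
PR_TYPE = 0x8        # TYPE value as given in the text above


def pr_out(service_action, reservation_key, sa_reservation_key,
           pr_type=0, all_tg_pt=False):
    """Return (cdb, parameter_list) for one PERSISTENT RESERVE OUT command."""
    flags = ALL_TG_PT if all_tg_pt else 0
    # Two 8-byte keys, 4 obsolete bytes, flags byte, reserved byte, 2 obsolete bytes.
    params = struct.pack(">QQ4xBx2x", reservation_key, sa_reservation_key, flags)
    # Opcode, service action, scope (0) and type, 2 reserved bytes,
    # 4-byte parameter list length, control byte.
    cdb = struct.pack(">BBB2xIB", PERSISTENT_RESERVE_OUT,
                      service_action & 0x1F, pr_type & 0x0F, len(params), 0)
    return cdb, params


def mds_prepare(mds_key):
    """REGISTER the MDS key on all target ports, then RESERVE the LU."""
    return [pr_out(SA_REGISTER, 0, mds_key, all_tg_pt=True),
            pr_out(SA_RESERVE, mds_key, 0, pr_type=PR_TYPE)]


def mds_fence(mds_key, client_key, abort=False):
    """PREEMPT (optionally AND ABORT) the registration of a non-responding client."""
    action = SA_PREEMPT_AND_ABORT if abort else SA_PREEMPT
    return pr_out(action, mds_key, client_key, pr_type=PR_TYPE)
```

Sending the resulting bytes to a device requires a platform-specific pass-through interface, which is outside the scope of this sketch.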

After the MDS preempts a client, all client I/O to the LU fails. The client SHOULD at this point return any layout that refers to the device ID that points to the LU. Note that the client can distinguish I/O errors due to fencing from other errors based on the "RESERVATION CONFLICT" SCSI status. Refer to [SPC4] for details.

MDS抢占客户端后,该客户端对LU的所有I/O都会失败。此时,客户端应返回引用指向该LU的设备ID的任何布局。请注意,客户端可以根据“RESERVATION CONFLICT”SCSI状态将隔离引起的I/O错误与其他错误区分开来。有关详细信息,请参阅[SPC4]。

2.4.10.5. Client Recovery after a Fence Action
2.4.10.5. 隔离操作后的客户端恢复

A client that detects a "RESERVATION CONFLICT" SCSI status (I/O error) on the storage devices MUST commit all layouts that use the storage device through the MDS, return all outstanding layouts for the device, forget the device ID, and unregister the reservation key. Future GETDEVICEINFO calls MAY refer to the storage device again, in which case the client will perform a new registration based on the key provided (via sbv_pr_key) at that time.

在存储设备上检测到“RESERVATION CONFLICT”SCSI状态(I/O错误)的客户端必须通过MDS提交使用该存储设备的所有布局,返回该设备的所有未完成布局,忘记设备ID,并注销保留密钥。未来的GETDEVICEINFO调用可能会再次引用该存储设备,在这种情况下,客户端将根据当时提供的密钥(通过sbv_pr_key)执行新的注册。

2.5. Crash Recovery Issues
2.5. 崩溃恢复问题

A critical requirement in crash recovery is that both the client and the server know when the other has failed. Additionally, it is required that a client sees a consistent view of data across server restarts. These requirements and a full discussion of crash recovery issues are covered in Section 8.4 ("Crash Recovery") of the NFSv4.1 specification [RFC5661]. This document contains additional crash recovery material specific only to the SCSI layout.

崩溃恢复中的一个关键要求是,客户端和服务器都知道另一方何时出现故障。此外,要求客户端在服务器重新启动时看到一致的数据视图。NFSv4.1规范[RFC5661]第8.4节(“崩溃恢复”)涵盖了这些要求和崩溃恢复问题的全面讨论。本文档包含仅针对SCSI布局的其他崩溃恢复资料。

If the server crashes while the client holds a writable layout, the client has written data to blocks covered by the layout, and those blocks are still in the PNFS_SCSI_INVALID_DATA state, the client has two options for recovery. If the data that has been written to these blocks is still cached by the client, the client can simply re-write the data via NFSv4 once the server has come back online. However, if the data is no longer in the client's cache, the client MUST NOT attempt to source the data from the data servers. Instead, it SHOULD attempt to commit the blocks in question to the server during the server's recovery grace period by sending a LAYOUTCOMMIT with the "loca_reclaim" flag set to true. This process is described in detail in Section 18.42.4 of [RFC5661].

如果服务器崩溃时客户端持有可写布局,客户端已将数据写入布局覆盖的块,并且这些块仍处于PNFS_SCSI_INVALID_DATA状态,则客户端有两个恢复选项。如果已写入这些块的数据仍由客户端缓存,则一旦服务器恢复联机,客户端可以通过NFSv4简单地重新写入数据。但是,如果数据不再在客户端缓存中,客户端不得尝试从数据服务器获取数据。相反,它应该在服务器的恢复宽限期内,通过发送“loca_reclaim”标志设置为true的LAYOUTCOMMIT,尝试将有问题的块提交给服务器。[RFC5661]第18.42.4节详细描述了该过程。

2.6. Recalling Resources: CB_RECALL_ANY
2.6. 召回资源:CB_RECALL_ANY

The server MAY decide that it cannot hold all of the state for layouts without running out of resources. In such a case, it is free to recall individual layouts using CB_LAYOUTRECALL to reduce the load, or it MAY choose to request that the client return any layout.

服务器可能会决定,如果没有资源,它无法保存布局的所有状态。在这种情况下,可以使用CB_LAYOUTRECALL调用单个布局以减少负载,也可以选择请求客户端返回任何布局。

The NFSv4.1 specification [RFC5661] defines the following types:

NFSv4.1规范[RFC5661]定义了以下类型:

const RCA4_TYPE_MASK_BLK_LAYOUT = 4;


       struct CB_RECALL_ANY4args {
              uint32_t      craa_objects_to_keep;
              bitmap4       craa_type_mask;
       };
        

When the server sends a CB_RECALL_ANY request to a client specifying the RCA4_TYPE_MASK_BLK_LAYOUT bit in craa_type_mask, the client SHOULD immediately respond with NFS4_OK and then asynchronously return complete file layouts until the number of files with layouts cached on the client is less than craa_objects_to_keep.

当服务器向客户端发送CB_RECALL_ANY请求并在craa_type_mask中指定RCA4_TYPE_MASK_BLK_LAYOUT位时,客户端应立即以NFS4_OK响应,然后异步返回完整的文件布局,直到客户端上缓存有布局的文件数小于craa_objects_to_keep。
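
A client-side handler for this request might decide which layouts to return along the following lines. This is a sketch only: the helper names are hypothetical, the bitmap4 value is modeled as a list of 32-bit words, and the input list is assumed to be ordered with the best candidates for return first.

```python
RCA4_TYPE_MASK_BLK_LAYOUT = 4   # bit number within craa_type_mask


def bitmap4_test(bitmap, bit):
    """Check one bit in a bitmap4 modeled as a list of 32-bit words."""
    word, offset = divmod(bit, 32)
    return word < len(bitmap) and bool(bitmap[word] & (1 << offset))


def layouts_to_return(craa_objects_to_keep, craa_type_mask, files_with_layouts):
    """Pick the files whose complete layouts should be returned asynchronously."""
    if not bitmap4_test(craa_type_mask, RCA4_TYPE_MASK_BLK_LAYOUT):
        return []
    # The text asks for strictly fewer files than craa_objects_to_keep to remain.
    may_remain = max(craa_objects_to_keep - 1, 0)
    excess = len(files_with_layouts) - may_remain
    return list(files_with_layouts[:max(excess, 0)])
```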

2.7. Transient and Permanent Errors
2.7. 瞬时误差和永久误差

The server may respond to LAYOUTGET with a variety of error statuses. These errors can convey transient conditions or more permanent conditions that are unlikely to be resolved soon.

服务器可能会以各种错误状态响应LAYOUTGET。这些错误可能传递暂时性条件或不太可能很快解决的更永久性条件。

The error NFS4ERR_RECALLCONFLICT indicates that the server has recently issued a CB_LAYOUTRECALL to the requesting client, making it necessary for the client to respond to the recall before processing the layout request. A client can wait for that recall to be received and processed, or it can retry as for NFS4ERR_LAYOUTTRYLATER, as described below.

错误NFS4ERR_RECALLCONFLICT表示服务器最近向请求客户端发出了CB_LAYOUTRECALL,因此客户端有必要先响应该召回,布局请求才能得到处理。客户端可以等待接收和处理该召回,也可以按NFS4ERR_LAYOUTTRYLATER的方式重试,如下所述。

The error NFS4ERR_LAYOUTTRYLATER is used to indicate that the server cannot immediately grant the layout to the client. This may be due to constraints on writable sharing of blocks by multiple clients or to a conflict with a recallable lock (e.g., a delegation). In either case, a reasonable approach for the client is to wait several milliseconds and retry the request. The client SHOULD track the number of retries, and if forward progress is not made, the client SHOULD abandon the attempt to get a layout and perform READ and WRITE operations by sending them to the server.

错误NFS4ERR_LAYOUTTRYLATER用于指示服务器无法立即将布局授予客户端。这可能是由于多个客户端对块的可写共享的限制,或与可召回锁(例如委托)的冲突造成的。在这两种情况下,客户端的合理做法是等待几毫秒,然后重试请求。客户端应跟踪重试次数,如果没有取得进展,则应放弃获取布局的尝试,并通过将READ和WRITE操作发送到服务器来执行它们。

The error NFS4ERR_LAYOUTUNAVAILABLE MAY be returned by the server if layouts are not supported for the requested file or its containing file system. The server MAY also return this error code if the server is in the process of migrating the file from secondary storage, there is a conflicting lock that would prevent the layout from being granted, or any other reason causes the server to be unable to supply the layout. As a result of receiving NFS4ERR_LAYOUTUNAVAILABLE, the client SHOULD abandon the attempt to get a layout and perform READ and WRITE operations by sending them to the MDS. It is expected that a client will not cache the file's layoutunavailable state forever. In particular, when the file is closed or opened by the client, issuing a new LAYOUTGET is appropriate.

如果请求的文件或其包含的文件系统不支持布局,服务器可能会返回错误NFS4ERR_LAYOUTUNAVAILABLE。如果服务器正在从辅助存储迁移文件,存在冲突的锁会阻止授予布局,或者任何其他原因导致服务器无法提供布局,则服务器也可能返回此错误代码。由于接收到NFS4ERR_LAYOUTUNAVAILABLE,客户端应放弃获取布局的尝试,并通过将其发送到MDS来执行读写操作。预计客户端不会永远缓存文件的layoutunavailable状态。特别是,当客户端关闭或打开文件时,发出新的LAYOUTGET是合适的。
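
The client behavior described in this section amounts to a bounded retry loop with a fallback to I/O through the MDS. The sketch below illustrates that control flow; the enum, the function name, the retry limit, and the five-millisecond delay are assumptions chosen for the example, and layoutget stands for whatever routine actually sends the LAYOUTGET operation.

```python
import time
from enum import Enum, auto


class Status(Enum):
    OK = auto()
    TRYLATER = auto()           # the try-later error described above
    RECALLCONFLICT = auto()     # NFS4ERR_RECALLCONFLICT
    LAYOUTUNAVAILABLE = auto()  # NFS4ERR_LAYOUTUNAVAILABLE


def get_layout_or_fall_back(layoutget, max_retries=5, delay=0.005, sleep=time.sleep):
    """Return "use_layout" or "io_via_mds" after handling transient errors."""
    for attempt in range(max_retries + 1):
        status = layoutget()
        if status is Status.OK:
            return "use_layout"
        if status is Status.LAYOUTUNAVAILABLE:
            # Treated as the more permanent condition: stop asking for now.
            return "io_via_mds"
        # Transient condition: wait briefly, then retry while attempts remain.
        if attempt < max_retries:
            sleep(delay)
    # No forward progress: abandon the layout and send READ/WRITE to the MDS.
    return "io_via_mds"
```

A caller would not cache the "io_via_mds" outcome forever; requesting a layout again when the file is next opened is consistent with the text above.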

2.8. Volatile Write Caches
2.8. 易失写缓存

Many storage devices implement volatile write caches that require an explicit flush to persist the data from write operations to stable storage. Storage devices implementing [SBC3] should indicate a volatile write cache by setting the Write Cache Enable (WCE) bit to 1 in the Caching mode page. When a volatile write cache is used, the pNFS server MUST ensure the volatile write cache has been committed to stable storage before the LAYOUTCOMMIT operation returns by using one of the SYNCHRONIZE CACHE commands.

许多存储设备实现易失性写缓存,需要显式刷新才能将数据从写操作持久化到稳定存储。实现[SBC3]的存储设备应通过在缓存模式页面中将写缓存启用(WCE)位设置为1来指示易失性写缓存。使用易失性写缓存时,pNFS服务器必须确保在使用同步缓存命令之一返回LAYOUTCOMMIT操作之前,已将易失性写缓存提交到稳定存储。
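
The ordering requirement can be illustrated with a small sketch. The bit position used here (WCE as bit 2 of byte 2 of the Caching mode page, page code 08h) is the author of this example's reading of [SBC3] and should be verified against the standard; the function names and the two callbacks are hypothetical stand-ins for the server's real SCSI and metadata code.

```python
CACHING_MODE_PAGE = 0x08
WCE_BIT = 0x04   # assumed position: bit 2 of byte 2 of the Caching mode page


def volatile_write_cache_enabled(page):
    """Inspect raw Caching mode page bytes (starting at the page code byte)."""
    if len(page) < 3 or (page[0] & 0x3F) != CACHING_MODE_PAGE:
        raise ValueError("not a Caching mode page")
    return bool(page[2] & WCE_BIT)


def handle_layoutcommit(page, synchronize_cache, commit_metadata):
    """Flush the device cache, if volatile, before LAYOUTCOMMIT returns."""
    if volatile_write_cache_enabled(page):
        synchronize_cache()   # stands in for a SYNCHRONIZE CACHE command
    commit_metadata()         # make the committed extents part of the file
    return "NFS4_OK"
```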

3. Enforcing NFSv4 Semantics
3. 强制NFSv4语义

The functionality provided by SCSI persistent reservations makes it possible for the MDS to control access by individual client machines to specific LUs. Individual client machines may be allowed to or prevented from reading or writing to certain block devices. Finer-grained access control methods are not generally available.

SCSI持久保留提供的功能使MDS能够控制单个客户机对特定LU的访问。可以允许或阻止单个客户端计算机读取或写入某些块设备。细粒度的访问控制方法通常不可用。

For this reason, certain responsibilities for enforcing NFSv4 semantics, including security and locking, are delegated to pNFS clients when SCSI layouts are being used. The metadata server's role is to only grant layouts appropriately, and the pNFS clients have to be trusted to only perform accesses allowed by the layout extents they currently hold (e.g., not access storage for files on which a layout extent is not held). In general, the server will not be able to prevent a client that holds a layout for a file from accessing parts of the physical disk not covered by the layout. Similarly, the server will not be able to prevent a client from accessing blocks covered by a layout that it has already returned. The pNFS client must respect the layout model for this mapping type to appropriately respect NFSv4 semantics.

因此,在使用SCSI布局时,强制执行NFSv4语义(包括安全性和锁定)的某些职责将委托给pNFS客户端。元数据服务器的角色是仅适当地授予布局,并且必须信任pNFS客户端仅执行其当前持有的布局扩展数据块所允许的访问(例如,不访问未持有布局扩展数据块的文件的存储)。通常,服务器将无法阻止持有文件布局的客户端访问布局未覆盖的物理磁盘部分。类似地,服务器将无法阻止客户端访问其已返回的布局所覆盖的块。pNFS客户端必须遵守此映射类型的布局模型,才能适当地遵守NFSv4语义。

Furthermore, there is no way for the storage to determine the specific NFSv4 entity (principal, openowner, lockowner) on whose behalf the I/O operation is being done. This fact may limit the functionality to be supported and require the pNFS client to implement server policies other than those describable by layouts. In cases in which layouts previously granted become invalid, the server has the option of recalling them. In situations in which communication difficulties prevent this from happening, layouts may be revoked by the server. This revocation is accompanied by changes in persistent reservation that have the effect of preventing SCSI access to the LUs in question by the client.

此外,存储器无法确定正在代表其执行I/O操作的特定NFSv4实体(主体、openowner、lockowner)。这一事实可能会限制所支持的功能,并要求pNFS客户端实现除布局可描述的服务器策略以外的其他服务器策略。在以前授予的布局无效的情况下,服务器可以选择调用它们。在通信困难阻止这种情况发生的情况下,服务器可能会撤销布局。此撤销伴随着永久保留的更改,其效果是阻止客户机对相关LU的SCSI访问。

3.1. Use of Open Stateids
3.1. 开放状态ID的使用

The effective implementation of these NFSv4 semantic constraints is complicated by the different granularities of the actors for the different types of the functionality to be enforced:

这些NFSv4语义约束的有效实现因要实施的不同类型功能的参与者粒度不同而变得复杂:

o To enforce security constraints for particular principals.

o 为特定主体强制实施安全约束。

o To enforce locking constraints for particular owners (openowners and lockowners).

o 为特定所有者(openowners和lockowners)强制实施锁定约束。

Fundamental to enforcing both of these sorts of constraints is the principle that a pNFS client must not issue a SCSI I/O operation unless it possesses both:

强制实施这两种约束的基本原则是,pNFS客户端不得发出SCSI I/O操作,除非它同时具备以下两种条件:

o A valid open stateid for the file in question that allows I/O of the type being performed and that is associated with the openowner and principal on whose behalf the I/O is to be done.

o 相关文件的有效open stateid,该stateid允许执行相关类型的I/O,并与将代表其执行I/O的openowner和主体相关联。

o A valid layout stateid for the file in question that covers the byte range on which the I/O is to be done and that allows I/O of that type to be done.

o 所讨论文件的有效布局stateid,包括要在其上执行I/O的字节范围,并允许执行该类型的I/O。

As a result, if the equivalent of I/O with an anonymous or write-bypass stateid is to be done, it MUST NOT be done using the pNFS SCSI layout type. The client MAY attempt such I/O using READs and WRITEs that do not use pNFS and are directed to the MDS.

因此,如果要执行相当于使用匿名或写旁路stateid的I/O,则不得使用pNFS SCSI布局类型来完成。客户端可以尝试使用不使用pNFS且定向到MDS的READ和WRITE进行此类I/O。
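
The two-stateid rule can be expressed as a single client-side predicate. The sketch below is illustrative only: the record types and field names are hypothetical simplifications of the state a real client keeps, and passing None models an anonymous or bypass stateid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenState:
    owner: str
    principal: str
    can_read: bool
    can_write: bool
    valid: bool = True


@dataclass(frozen=True)
class LayoutState:
    offset: int
    length: int
    writable: bool
    valid: bool = True


def may_issue_scsi_io(open_state, layout, owner, principal, offset, length, write):
    """True only if both a suitable open stateid and a covering layout are held."""
    if open_state is None or layout is None:
        return False   # no real stateid: such I/O must go to the MDS instead
    open_ok = (open_state.valid
               and open_state.owner == owner
               and open_state.principal == principal
               and (open_state.can_write if write else open_state.can_read))
    layout_ok = (layout.valid
                 and layout.offset <= offset
                 and offset + length <= layout.offset + layout.length
                 and (layout.writable or not write))
    return open_ok and layout_ok
```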

When open stateids are revoked, due to lease expiration or any form of administrative revocation, the server MUST recall all layouts that allow I/O to be done on any of the files for which open revocation happens. When there is a failure to successfully return those layouts, the client MUST be fenced.

当由于租约到期或任何形式的管理撤销而撤销open stateid时,服务器必须召回所有允许对发生open撤销的任何文件执行I/O的布局。当无法成功返回这些布局时,必须对客户端进行隔离。

3.2. Enforcing Security Restrictions
3.2. 实施安全限制

The restriction noted above provides adequate enforcement of appropriate security restriction when the principal issuing the I/O is the same as that opening the file. The server is responsible for checking that the I/O mode requested by the OPEN is allowed for the principal doing the OPEN. If the correct sort of I/O is done on behalf of the same principal, then the security restriction is thereby enforced.

当发布I/O的主体与打开文件的主体相同时,上述限制提供了适当安全限制的充分实施。服务器负责检查开放请求的I/O模式是否允许主体进行开放。如果代表同一主体执行正确类型的I/O,则安全限制将因此而强制执行。

If I/O is done by a principal different from the one that opened the file, the client SHOULD send the I/O to be performed by the metadata server rather than doing it directly to the storage device.

如果I/O由与打开文件的主体不同的主体完成,则客户端应将该I/O发送给元数据服务器执行,而不是直接对存储设备执行。

3.3. Enforcing Locking Restrictions
3.3. 强制执行锁定限制

Mandatory enforcement of whole-file locking by means of share reservations is provided when the pNFS client obeys the requirement set forth in Section 3.1. Since performing I/O requires a valid open stateid, an I/O that violates an existing share reservation would only be possible when the server allows conflicting open stateids to exist.

当pNFS客户遵守第3.1节规定的要求时,通过共享保留强制执行整个文件锁定。由于执行I/O需要有效的OpenStateID,因此只有当服务器允许存在冲突的OpenStateID时,才可能发生违反现有共享保留的I/O。

The nature of the SCSI layout type is that such implementation/enforcement of mandatory byte-range locks is very difficult. Given that layouts are granted to clients rather than owners, the pNFS client is in no position to successfully arbitrate among multiple lockowners on the same client. Suppose lockowner A is doing a write and, while the I/O is pending, lockowner B requests a mandatory byte-range lock for a byte range potentially overlapping the pending I/O. In such a situation, the lock request cannot be granted while the I/O is pending. In a non-pNFS environment, the server would have to wait for pending I/O before granting the mandatory byte-range lock. In the pNFS environment, the server does not issue the I/O and is thus in no position to wait for its completion. The server may recall such layouts, but in doing so, it has no way of distinguishing those being used by lockowners A and B, making it difficult to allow B to perform I/O while forbidding A from doing so. Given this fact, the MDS needs to successfully recall all layouts that overlap the range being locked before returning a successful response to the LOCK request. While the lock is in effect, the server SHOULD respond to requests for layouts that overlap a currently locked area with NFS4ERR_LAYOUTUNAVAILABLE. To simplify the required logic, a server MAY do this for all layout requests on the file in question as long as there are any byte-range locks in effect.

SCSI布局类型的本质是,这种强制字节范围锁的实现/实施非常困难。鉴于布局是授予客户端而不是所有者的,pNFS客户端无法在同一客户端上的多个锁所有者之间成功仲裁。假设锁所有者A正在执行写入操作,当I/O挂起时,锁所有者B请求强制的字节范围锁,该字节范围可能与挂起的I/O重叠。在这种情况下,当I/O挂起时,无法授予锁请求。在非pNFS环境中,服务器必须等待挂起的I/O,然后才能授予强制的字节范围锁。在pNFS环境中,服务器不会发出I/O,因此无法等待其完成。服务器可以召回此类布局,但这样做时,它无法区分锁所有者A和B使用的布局,这使得在禁止A执行I/O的同时,很难允许B执行I/O。鉴于这一事实,MDS需要成功地召回与被锁定范围重叠的所有布局,然后才能返回对LOCK请求的成功响应。锁定生效期间,服务器应以NFS4ERR_LAYOUTUNAVAILABLE响应与当前锁定区域重叠的布局请求。为了简化所需的逻辑,只要存在任何有效的字节范围锁,服务器就可以对相关文件上的所有布局请求执行此操作。
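
The server-side check for layout requests while byte-range locks are in effect reduces to an interval overlap test. The sketch below shows both the precise check and the simplified whole-file policy; the function name, the lock representation as (offset, length) pairs, and the flag name are assumptions made for the example.

```python
def layout_request_status(byte_range_locks, offset, length, refuse_whole_file=False):
    """Decide how to answer a layout request given the locks currently in effect.

    byte_range_locks is a list of (offset, length) pairs for one file.
    """
    if refuse_whole_file and byte_range_locks:
        # Simplified policy: refuse every layout while any lock exists.
        return "NFS4ERR_LAYOUTUNAVAILABLE"
    end = offset + length
    for lock_offset, lock_length in byte_range_locks:
        # Half-open intervals overlap when each starts before the other ends.
        if lock_offset < end and offset < lock_offset + lock_length:
            return "NFS4ERR_LAYOUTUNAVAILABLE"
    return "NFS4_OK"
```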

Given these difficulties, it may be difficult for servers supporting mandatory byte-range locks to also support SCSI layouts. Servers can support advisory byte-range locks instead. The NFSv4 protocol currently has no way of determining whether byte-range lock support on a particular file system will be mandatory or advisory, except by trying operations that would conflict if mandatory locking is in effect. Therefore, to avoid confusion, servers SHOULD NOT switch between mandatory and advisory byte-range locking based on whether any SCSI layouts have been obtained or whether a client that has obtained a SCSI layout has requested a byte-range lock.

鉴于这些困难,支持强制字节范围锁的服务器可能很难同时支持SCSI布局。服务器可以改为支持建议性字节范围锁。NFSv4协议目前无法确定特定文件系统上的字节范围锁支持是强制的还是建议性的,除非尝试在强制锁定生效时会发生冲突的操作。因此,为避免混淆,服务器不应根据是否已获得任何SCSI布局或已获得SCSI布局的客户端是否已请求字节范围锁,在强制和建议性字节范围锁之间切换。

4. Security Considerations
4. 安全考虑

Access to SCSI storage devices is logically at a lower layer of the I/O stack than NFSv4; hence, NFSv4 security is not directly applicable to protocols that access such storage directly. Depending on the protocol, some of the security mechanisms provided by NFSv4 (e.g., encryption and cryptographic integrity) may not be available or may be provided via different means. At one extreme, pNFS with SCSI layouts can be used with storage access protocols (e.g., Serial Attached SCSI [SAS3]) that provide essentially no security functionality. At the other extreme, pNFS may be used with storage protocols such as iSCSI [RFC7143] that can provide significant security functionality. It is the responsibility of those administering and deploying pNFS with a SCSI storage access protocol to ensure that appropriate protection is provided to that protocol (physical security is a common means for protocols not based on IP). In environments where the security requirements for the storage protocol cannot be met, pNFS SCSI layouts SHOULD NOT be used.

对SCSI存储设备的访问在逻辑上位于比NFSv4更低的I/O堆栈层;因此,NFSv4安全性不直接适用于直接访问此类存储的协议。根据协议的不同,NFSv4提供的一些安全机制(例如,加密和密码完整性)可能不可用,或者可能通过不同的方式提供。在一个极端情况下,带有SCSI布局的pNFS可以与基本上不提供安全功能的存储访问协议(例如,串行连接SCSI[SAS3])一起使用。在另一个极端,pNFS可以与iSCSI[RFC7143]等存储协议一起使用,这些协议可以提供重要的安全功能。使用SCSI存储访问协议管理和部署pNFS的人员负责确保为该协议提供适当的保护(物理安全是不基于IP的协议的常见手段)。在无法满足存储协议安全要求的环境中,不应使用pNFS SCSI布局。

When using IP-based storage protocols such as iSCSI, IPsec should be used as outlined in [RFC3723] and updated in [RFC7146].

使用基于IP的存储协议(如iSCSI)时,应按照[RFC3723]中的说明使用IPsec,并在[RFC7146]中进行更新。

When security is available for a storage protocol, it is generally at a different granularity and with a different notion of identity than NFSv4 (e.g., NFSv4 controls user access to files, and iSCSI controls initiator access to volumes). The responsibility for enforcing appropriate correspondences between these security layers is placed upon the pNFS client. As with the issues in the first paragraph of this section, in environments where the security requirements are such that client-side protection from access to storage outside of the layout is not sufficient, pNFS SCSI layouts SHOULD NOT be used.

当存储协议具有安全性时,其粒度和身份概念通常与NFSv4不同(例如,NFSv4控制用户对文件的访问,iSCSI控制启动器对卷的访问)。pNFS客户端负责在这些安全层之间实施适当的对应关系。与本节第一段中的问题一样,如果安全要求使得仅靠客户端侧的保护不足以防止访问布局之外的存储,则不应使用pNFS SCSI布局。

5. IANA Considerations
5. IANA考虑

IANA has assigned a new pNFS layout type in the "pNFS Layout Types Registry" as follows:

IANA已在“pNFS布局类型注册表”中分配了新的pNFS布局类型,如下所示:

   Layout Type Name: LAYOUT4_SCSI
   Value: 0x00000005
   RFC: RFC 8154
   How: L
   Minor Versions: 1

   布局类型名称:LAYOUT4_SCSI
   值:0x00000005
   RFC:RFC 8154
   方式:L
   次要版本:1

6. Normative References
6. 规范性引用文件

[LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents", March 2015, <http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf>.

[LEGAL]IETF信托,“与IETF文件相关的法律规定”,2015年3月<http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf>。

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.

[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,DOI 10.17487/RFC2119,1997年3月<http://www.rfc-editor.org/info/rfc2119>.

[RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and F. Travostino, "Securing Block Storage Protocols over IP", RFC 3723, DOI 10.17487/RFC3723, April 2004, <http://www.rfc-editor.org/info/rfc3723>.

[RFC3723]Aboba,B.,Tseng,J.,Walker,J.,Rangan,V.,和F.Travostino,“通过IP保护块存储协议”,RFC 3723,DOI 10.17487/RFC3723,2004年4月<http://www.rfc-editor.org/info/rfc3723>.

[RFC4506] Eisler, M., Ed., "XDR: External Data Representation Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May 2006, <http://www.rfc-editor.org/info/rfc4506>.

[RFC4506]艾斯勒,M.,编辑,“XDR:外部数据表示标准”,STD 67,RFC 4506,DOI 10.17487/RFC4506,2006年5月<http://www.rfc-editor.org/info/rfc4506>.

[RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, <http://www.rfc-editor.org/info/rfc5661>.

[RFC5661]Shepler,S.,Ed.,Eisler,M.,Ed.,和D.Noveck,Ed.,“网络文件系统(NFS)版本4次要版本1协议”,RFC 5661,DOI 10.17487/RFC5661,2010年1月<http://www.rfc-editor.org/info/rfc5661>.

[RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., "Network File System (NFS) Version 4 Minor Version 1 External Data Representation Standard (XDR) Description", RFC 5662, DOI 10.17487/RFC5662, January 2010, <http://www.rfc-editor.org/info/rfc5662>.

[RFC5662]Shepler,S.,Ed.,Eisler,M.,Ed.,和D.Noveck,Ed.,“网络文件系统(NFS)版本4次要版本1外部数据表示标准(XDR)说明”,RFC 5662,DOI 10.17487/RFC5662,2010年1月<http://www.rfc-editor.org/info/rfc5662>.

[RFC5663] Black, D., Fridella, S., and J. Glasgow, "Parallel NFS (pNFS) Block/Volume Layout", RFC 5663, DOI 10.17487/RFC5663, January 2010, <http://www.rfc-editor.org/info/rfc5663>.

[RFC5663]Black,D.,Fridella,S.,和J.Glasgow,“并行NFS(pNFS)块/卷布局”,RFC 5663,DOI 10.17487/RFC5663,2010年1月<http://www.rfc-editor.org/info/rfc5663>.

[RFC6688] Black, D., Ed., Glasgow, J., and S. Faibish, "Parallel NFS (pNFS) Block Disk Protection", RFC 6688, DOI 10.17487/RFC6688, July 2012, <http://www.rfc-editor.org/info/rfc6688>.

[RFC6688]Black,D.,Ed.,Glasgow,J.,和S.Faibish,“并行NFS(pNFS)块磁盘保护”,RFC 6688,DOI 10.17487/RFC6688,2012年7月<http://www.rfc-editor.org/info/rfc6688>.

[RFC7143] Chadalapaka, M., Satran, J., Meth, K., and D. Black, "Internet Small Computer System Interface (iSCSI) Protocol (Consolidated)", RFC 7143, DOI 10.17487/RFC7143, April 2014, <http://www.rfc-editor.org/info/rfc7143>.

[RFC7143]Chadalapaka,M.,Satran,J.,Meth,K.,和D.Black,“互联网小型计算机系统接口(iSCSI)协议(整合)”,RFC 7143,DOI 10.17487/RFC7143,2014年4月<http://www.rfc-editor.org/info/rfc7143>.

[RFC7146] Black, D. and P. Koning, "Securing Block Storage Protocols over IP: RFC 3723 Requirements Update for IPsec v3", RFC 7146, DOI 10.17487/RFC7146, April 2014, <http://www.rfc-editor.org/info/rfc7146>.

[RFC7146]Black,D.和P.Koning,“通过IP保护块存储协议:IPsec v3的RFC 3723要求更新”,RFC 7146,DOI 10.17487/RFC7146,2014年4月<http://www.rfc-editor.org/info/rfc7146>.

[SAM-5] INCITS Technical Committee T10, "Information Technology - SCSI Architecture Model - 5 (SAM-5)", ANSI INCITS 515-2016, 2016.

[SAM-5]INCITS技术委员会T10,“信息技术-SCSI体系结构模型-5(SAM-5)”,ANSI INCITS 515-2016,2016。

[SAS3] INCITS Technical Committee T10, "Information technology - Serial Attached SCSI-3 (SAS-3)", ANSI INCITS 519-2014, ISO/IEC 14776-154, 2014.

[SAS3]INCITS技术委员会T10,“信息技术-串行连接SCSI-3(SAS-3)”,ANSI INCITS 519-2014,ISO/IEC 14776-154,2014。

[SBC3] INCITS Technical Committee T10, "Information Technology - SCSI Block Commands - 3 (SBC-3)", ANSI INCITS 514-2014, ISO/IEC 14776-323, 2014.

[SBC3]INCITS技术委员会T10,“信息技术-SCSI块命令-3(SBC-3)”,ANSI INCITS 514-2014,ISO/IEC 14776-323,2014。

[SPC4] INCITS Technical Committee T10, "Information Technology - SCSI Primary Commands - 4 (SPC-4)", ANSI INCITS 513-2015, 2015.

[SPC4]INCITS技术委员会T10,“信息技术-SCSI主命令-4(SPC-4)”,ANSI INCITS 513-2015,2015。

Acknowledgments

致谢

Large parts of this document were copied verbatim from [RFC5663], and some parts were inspired by it. Thanks to David Black, Stephen Fridella, and Jason Glasgow for their work on the pNFS block/volume layout protocol.

本文档的大部分内容是从[RFC5663]一字不差地复制而来的,有些部分是受其启发而来的。感谢David Black、Stephen Fridella和Jason Glasgow在pNFS块/卷布局协议方面的工作。

David Black, Robert Elliott, and Tom Haynes provided a thorough review of drafts of this document, and their input led to the current form of the document.

David Black、Robert Elliott和Tom Haynes对本文件的草稿进行了全面审查,他们的意见形成了本文件的当前形式。

David Noveck provided ample feedback to various drafts of this document, wrote the section on enforcing NFSv4 semantics, and rewrote various sections to better catch the intent.

David Noveck为本文档的各个草稿提供了充分的反馈,编写了关于强制NFSv4语义的部分,并重写了各个部分以更好地理解意图。

Author's Address

作者地址

Christoph Hellwig

克里斯托夫·赫尔维格

   Email: hch@lst.de
        