Independent Submission                                          J. Kunze
Request for Comments: 8493                    California Digital Library
Category: Informational                                       J. Littman
ISSN: 2070-1721                                       Stanford Libraries
                                                               E. Madden
                                                     Library of Congress
                                                            J. Scancella
                                                                C. Adams
                                                     Library of Congress
                                                            October 2018
The BagIt File Packaging Format (V1.0)
Abstract
This document describes BagIt, a set of hierarchical file layout conventions for storage and transfer of arbitrary digital content. A "bag" has just enough structure to enclose descriptive metadata "tags" and a file "payload" but does not require knowledge of the payload's internal semantics. This BagIt format is suitable for reliable storage and transfer.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for informational purposes.
This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not candidates for any level of Internet Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8493.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Table of Contents
1.  Introduction
  1.1.  Purpose
  1.2.  Requirements
  1.3.  Terminology
2.  Structure
  2.1.  Required Elements
    2.1.1.  Bag Declaration: bagit.txt
    2.1.2.  Payload Directory: data/
    2.1.3.  Payload Manifest: manifest-algorithm.txt
  2.2.  Optional Elements
    2.2.1.  Tag Manifest: tagmanifest-algorithm.txt
    2.2.2.  Bag Metadata: bag-info.txt
    2.2.3.  Fetch File: fetch.txt
    2.2.4.  Other Tag Files
  2.3.  Text Tag File Format
  2.4.  Bag Checksum Algorithms
3.  Complete and Valid Bags
4.  Examples
  4.1.  Example of a Basic Bag
  4.2.  Example Bag Using fetch.txt
5.  Security Considerations
  5.1.  Special Directory Characters
  5.2.  Control of URLs in fetch.txt
  5.3.  File Sizes in fetch.txt
  5.4.  Attacks on Payload File Content
6.  Practical Considerations (Non-normative)
  6.1.  Interoperability
    6.1.1.  Filename Normalization
    6.1.2.  Windows and Unix File Naming
    6.1.3.  Legacy Checksum Tools
7.  Augmented Backus-Naur Form (Non-normative)
  7.1.  Bag Declaration: bagit.txt
  7.2.  Payload Manifest: manifest-algorithm.txt
  7.3.  Bag Metadata: bag-info.txt
  7.4.  Fetch File: fetch.txt
8.  IANA Considerations
9.  References
  9.1.  Normative References
  9.2.  Informative References
Acknowledgements
Contributors
Authors' Addresses
1.  Introduction
1.1.  Purpose
BagIt is a set of hierarchical file layout conventions designed to support storage and transfer of arbitrary digital content. A "bag" consists of a directory containing the payload files and other accompanying metadata files known as "tag" files. The "tags" are metadata files intended to facilitate and document the storage and transfer of the bag. Processing a bag does not require any understanding of the payload file contents, and the payload files can be accessed without processing the BagIt metadata.
The name, BagIt, is inspired by the "enclose and deposit" method [ENCDEP], sometimes referred to as "bag it and tag it". BagIt differs from serialized archival formats such as MIME, TAR, or ZIP in two general areas:
1. Strong integrity assurances. The format supports cryptographic-quality hash algorithms (see Section 2.4) and allows for in-place upgrades to add additional manifests using stronger algorithms without breaking backwards compatibility. This provides high levels of confidence against data corruption, but it is not designed to be secure against active attacks.
2. Direct file access. Because BagIt specifies an actual filesystem hierarchy rather than a serialized representation of one, files can be accessed using standard operating system utilities, implementations do not need to process a potentially large archival file to extract a subset of data, and the format imposes no size limits for either individual files or a bag.
BagIt is widely used for preserving digital assets originating from different domains. Organizations involved in digital preservation with BagIt include the Library of Congress, Dryad Data Repository, NSF DataONE, and the Rockefeller Archive Center. Software implementations are available for many languages, including Python, Ruby, Java, Perl, and PHP. It is also used in the libraries of many universities, such as Cornell, Purdue, Stanford, Ghent University, New York University, and the University of California.
1.2.  Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Implementers are strongly encouraged to review the interoperability considerations described in Section 6.1.
1.3.  Terminology
The following terms have precise definitions as used in this document:
bag: A set of opaque files contained within the structure defined by this document.
bag declaration: The file required to be in all bags conforming to this document. Contains values necessary to process the rest of a bag. See Section 2.1.1.
bag checksum algorithm: The name of a cryptographic checksum algorithm that has been normalized for use in a manifest or tag manifest file name (e.g., "sha512") as described in Section 2.4.
manifest: A tag file that maps filepaths to checksums. A manifest can be a payload manifest (see Section 2.1.3) or a tag manifest (see Section 2.2.1).
payload: The data encapsulated by the bag as a set of named files, which may be organized in subdirectories. The contents of the payload files are opaque to this document, and, with respect to BagIt processing, are always considered as sequences of uninterpreted octets. See Section 2.1.2.
tag directory: A directory that contains one or more tag files.
tag file: A file that contains metadata about the bag or its payload. This document defines the standard BagIt tag files: the bag declaration in "bagit.txt" (see Section 2.1.1), payload manifests (see Section 2.1.3), tag manifests (see Section 2.2.1), bag metadata in "bag-info.txt" (see Section 2.2.2), and remote payload in "fetch.txt" (see Section 2.2.3). This document also allows other arbitrary tag files as described in Section 2.2.4.
complete: A bag that contains every element required by this document, every payload file listed in a manifest, and any optional files that are listed in a tag manifest. See Section 3.
valid: A complete bag where every checksum in every manifest has been successfully verified against the corresponding file.
2.  Structure
A bag MUST consist of a base directory containing the following:
1. a set of required and optional tag files (see Section 2.2);
2. a subdirectory named "data", called the payload directory (see Section 2.1.2); and
3. a set of optional tag directories.
The tag files in the base directory consist of one or more files named "manifest-_algorithm_.txt" (see Sections 2.1.3 and 2.4), a file named "bagit.txt" (see Section 2.1.1), and zero or more additional tag files (see Section 2.2). The tag files and directories are in arbitrary file hierarchies and MAY have any name that is not reserved for a file or directory in this document.
The base directory can have any name, as illustrated by the figure below.
<base directory>/
|
+-- bagit.txt
|
+-- manifest-<algorithm>.txt
|
+-- [additional tag files]
|
+-- data/
|     |
|     +-- [payload files]
|
+-- [tag directories]/
      |
      +-- [tag files]
2.1.  Required Elements
2.1.1.  Bag Declaration: bagit.txt
The "bagit.txt" tag file MUST consist of exactly two lines in this order:
BagIt-Version: M.N
Tag-File-Character-Encoding: ENCODING
_M.N_ identifies the BagIt major (M) and minor (N) version numbers. _ENCODING_ identifies the character set encoding used by the remaining tag files. _ENCODING_ SHOULD be "UTF-8", but for backwards compatibility it MAY be any other encoding registered in [cs-registry]. The bag declaration itself MUST be encoded in UTF-8 and MUST NOT contain a Byte Order Mark (BOM) [RFC3629].
The number for this version of BagIt is "1.0".
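As an illustration only, the following Python (3.9+) sketch writes and reads a bag declaration. The function names are hypothetical. The two-line layout, the UTF-8 requirement, and the ban on a BOM come from the text above; requiring exactly ": " between label and value is an assumption of this sketch, and bags made by older tools may be less strict.

```python
import codecs
import re

# BagIt allows exactly three line terminators: CRLF, CR, and LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")
_LABELS = ("BagIt-Version", "Tag-File-Character-Encoding")


def make_bag_declaration(version: str = "1.0", encoding: str = "UTF-8") -> bytes:
    """Return the bytes of a bagit.txt file: two lines, UTF-8, no BOM."""
    text = f"{_LABELS[0]}: {version}\n{_LABELS[1]}: {encoding}\n"
    return text.encode("utf-8")


def parse_bag_declaration(data: bytes) -> tuple[str, str]:
    """Return (version, encoding) read from the bytes of a bagit.txt file."""
    if data.startswith(codecs.BOM_UTF8):
        raise ValueError("bagit.txt MUST NOT begin with a byte order mark")
    lines = _LINE_BREAK.split(data.decode("utf-8"))
    if lines[-1] == "":
        lines.pop()  # a terminated last line leaves one empty trailing item
    if len(lines) != 2:
        raise ValueError("bagit.txt must contain exactly two lines")
    values = []
    for line, label in zip(lines, _LABELS):
        name, sep, value = line.partition(": ")
        if name != label or not sep:
            raise ValueError(f"expected a {label!r} line, got {line!r}")
        values.append(value)
    return values[0], values[1]
```

Because parsing goes through an explicit CRLF/CR/LF split instead of str.splitlines(), other Unicode line separators are not mistaken for line terminators.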
2.1.2.  Payload Directory: data/
The base directory MUST contain a subdirectory named "data".
The payload directory contains the arbitrary digital content within the bag. The files under the payload directory are called payload files, or the payload. Each payload file is treated as an opaque octet stream when verifying file correctness. Payload files MAY be organized in arbitrary subdirectory structures within the payload directory; however, for the purpose of this document, such subdirectory structures and filenames have no given meaning.
2.1.3.  Payload Manifest: manifest-algorithm.txt
A payload manifest file provides a complete listing of each payload file name along with a corresponding checksum to permit data integrity checking. A bag can have more than one payload manifest, with each using a different checksum algorithm. Manifest entries MUST satisfy the following constraints:
o Every bag MUST contain at least one payload manifest file and MAY contain more than one.
o Every payload manifest MUST list every payload file name exactly once.
o A payload manifest file MUST have a name of the form "manifest-_algorithm_.txt", where _algorithm_ is a string specifying the checksum algorithm used by that manifest as described in Section 2.4.
Example payload manifest filenames:
manifest-sha256.txt
manifest-sha512.txt
Each line of a payload manifest file MUST be of the form
checksum filepath
where _filepath_ is the pathname of a file relative to the base directory, and _checksum_ is a hex-encoded checksum calculated by applying _algorithm_ over the file.
o The hex-encoded checksum MAY use uppercase and/or lowercase letters.
o The slash character ('/') MUST be used as a path separator in _filepath_.
o One or more linear whitespace characters (spaces or tabs) MUST separate _checksum_ from _filepath_.
o There is no limitation on the length of a pathname.
o The payload manifest MUST NOT reference files outside the payload directory.
o If a _filepath_ includes a Line Feed (LF), a Carriage Return (CR), a Carriage-Return Line Feed (CRLF), or a percent sign (%), those characters (and only those) MUST be percent-encoded following [RFC3986].
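The line rules above can be read back with a small parser. The Python sketch below is not taken from any particular BagIt library, and its names are hypothetical. It lowercases the checksum, because either letter case is allowed, and it reverses only the three percent-encodings the rules require. One limitation: because one or more whitespace characters separate the fields, a filepath that itself begins with a space or tab cannot be told apart from the separator.

```python
import re

# A checksum, then one or more spaces or tabs, then the (encoded) filepath.
_MANIFEST_LINE = re.compile(r"([0-9A-Fa-f]+)[ \t]+(.+)")
# Only LF, CR, and "%" are percent-encoded in a BagIt filepath.
_ENCODED = re.compile(r"%(0A|0D|25)", re.IGNORECASE)
_DECODED = {"0A": "\n", "0D": "\r", "25": "%"}


def decode_filepath(encoded: str) -> str:
    """Undo %0A, %0D and %25 in a single left-to-right pass.

    A single pass means "%250A" decodes to the literal text "%0A"
    rather than to a line feed.
    """
    return _ENCODED.sub(lambda m: _DECODED[m.group(1).upper()], encoded)


def parse_manifest_line(line: str) -> tuple[str, str]:
    """Split one manifest line into (lowercased checksum, decoded filepath)."""
    match = _MANIFEST_LINE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"malformed manifest line: {line!r}")
    checksum, filepath = match.groups()
    # Hex digits MAY be upper- or lowercase; normalize them for comparison.
    return checksum.lower(), decode_filepath(filepath)
```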
A manifest MUST NOT reference directories. Bag creators who wish to create an otherwise empty directory have typically done so by creating an empty placeholder file with a name such as ".keep".
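A payload manifest can be produced by walking the payload directory and hashing only regular files, so directories are never listed and a placeholder such as ".keep" is listed like any other file. The Python sketch below is one way to do it under stated assumptions: the function names are hypothetical, the algorithm name is passed straight to the standard-library hashlib.new(), symbolic links are followed, and entries are sorted only to make the output deterministic (the text above imposes no order).

```python
import hashlib
from pathlib import Path


def encode_filepath(path: str) -> str:
    """Percent-encode only "%", CR and LF, as the rules above require."""
    # Encode "%" first so the "%0D"/"%0A" escapes are not encoded again.
    return path.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def build_payload_manifest(bag_dir: str, algorithm: str = "sha512") -> str:
    """Return manifest-<algorithm>.txt content for every file under data/."""
    base = Path(bag_dir)
    lines = []
    for path in sorted((base / "data").rglob("*")):
        if not path.is_file():
            continue  # directories themselves are never listed
        digest = hashlib.new(algorithm)
        with path.open("rb") as f:
            # Stream in 1 MiB chunks so large payload files are not read at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        # Relative to the base directory, with "/" separators on every OS.
        rel = path.relative_to(base).as_posix()
        lines.append(f"{digest.hexdigest()}  {encode_filepath(rel)}\n")
    return "".join(lines)
```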
2.2.  Optional Elements
2.2.1.  Tag Manifest: tagmanifest-algorithm.txt
A tag manifest is a tag file that lists other tag files and checksums for those tag files generated using a particular bag checksum algorithm.
A bag MAY contain one or more tag manifests, in which case each tag manifest SHOULD list the same set of tag files.
Each tag manifest MUST list every payload manifest. Each tag manifest MUST NOT list any tag manifests but SHOULD list the remaining tag files present in the bag.
A tag manifest file MUST have a name of the form "tagmanifest-_algorithm_.txt", where _algorithm_ is a string following the format described in Section 2.4 that specifies the bag checksum algorithm used in that manifest.
Tag manifests SHOULD use the same algorithms as the payload manifests that are present in the bag.
Example tag manifest filenames:
tagmanifest-sha256.txt
tagmanifest-sha512.txt
A tag manifest file has the same form as the payload manifest file described in Section 2.1.3 but MUST NOT list any payload files. As a result, no _filepath_ listed in a tag manifest begins "data/".
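The Python sketch below illustrates the tag manifest rules: it lists every tag file, including payload manifests and files in tag directories, and skips payload files and the tag manifests themselves. The function name is hypothetical. It assumes tag manifests sit in the base directory, and for brevity it reads each tag file into memory, on the assumption that tag files are small.

```python
import hashlib
from pathlib import Path


def build_tag_manifest(bag_dir: str, algorithm: str = "sha512") -> str:
    """Return tagmanifest-<algorithm>.txt content for a bag directory.

    Lists every tag file (payload manifests included) but skips payload
    files under data/ and any tag manifest in the base directory.
    """
    base = Path(bag_dir)
    lines = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(base).as_posix()
        if rel.startswith("data/"):
            continue  # payload files MUST NOT appear in a tag manifest
        if "/" not in rel and rel.startswith("tagmanifest-") and rel.endswith(".txt"):
            continue  # a tag manifest MUST NOT list any tag manifest
        digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
        # Same percent-encoding rule as for payload manifest filepaths.
        encoded = rel.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
        lines.append(f"{digest}  {encoded}\n")
    return "".join(lines)
```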
2.2.2.  Bag Metadata: bag-info.txt
The "bag-info.txt" file is a tag file that contains metadata elements describing the bag and the payload. The metadata elements contained in the "bag-info.txt" file are intended primarily for human use. All metadata elements are OPTIONAL and MAY be repeated. Because "bag-info.txt" is intended for human reading and editing, ordering MAY be significant and the ordering of metadata elements MUST be preserved.
A metadata element MUST consist of a label, a colon ":", a single linear whitespace character (space or tab), and a value that is terminated with an LF, a CR, or a CRLF.
The label MUST NOT contain a colon (:), LF, or CR. The label MAY contain linear whitespace characters but MUST NOT start or end with whitespace.
It is RECOMMENDED that lines not exceed 79 characters in length. Long values MAY be continued onto the next line by inserting a LF, CR, or CRLF, and then indenting the next line with one or more linear white space characters (spaces or tabs). Except for linebreaks, such padding does not form part of the value.
Implementations wishing to support previous BagIt versions MUST accept multiple linear whitespace characters before and after the colon when the bag version is earlier than 1.0; such whitespace does not form part of the label or value.
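The Python sketch below shows one way to apply these element rules to text that has already been decoded. The function name is hypothetical. It returns an ordered list instead of a dictionary, so ordering and repeated labels survive. Two behaviors are assumptions of the sketch: a continuation line is appended to the previous value with its indentation removed and a single LF kept as the line break, and the pre-1.0 leniency about extra whitespace around the colon is not implemented.

```python
import re

_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def parse_bag_info(text: str) -> list[tuple[str, str]]:
    """Parse decoded bag-info.txt text into ordered (label, value) pairs."""
    elements: list[tuple[str, str]] = []
    for line in _LINE_BREAK.split(text):
        if not line:
            continue  # e.g. the empty item left after a final terminator
        if line[0] in " \t":
            # Continuation line: indentation is padding, not part of the value.
            if not elements:
                raise ValueError("continuation line before the first element")
            label, value = elements[-1]
            elements[-1] = (label, value + "\n" + line.lstrip(" \t"))
            continue
        label, sep, rest = line.partition(":")
        if not sep or not label or label != label.strip(" \t"):
            raise ValueError(f"malformed metadata element: {line!r}")
        if rest[:1] not in (" ", "\t"):
            raise ValueError(f"missing whitespace after colon: {line!r}")
        elements.append((label, rest[1:]))  # drop exactly one whitespace char
    return elements
```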
The following are reserved metadata elements. The use of these reserved metadata elements is OPTIONAL but encouraged. Reserved metadata element names are case insensitive. Except where indicated otherwise, these metadata element names MAY be repeated to capture multiple values.
Source-Organization: Organization transferring the content.
Organization-Address: Mailing address of the source organization.
Contact-Name: Person at the source organization who is responsible for the content transfer.
Contact-Phone: International format telephone number of person or position responsible.
Contact-Email: Fully qualified email address of person or position responsible.
External-Description: A brief explanation of the contents and provenance.
Bagging-Date: Date (YYYY-MM-DD) that the content was prepared for transfer. This metadata element SHOULD NOT be repeated.
External-Identifier: A sender-supplied identifier for the bag.
Bag-Size: The size or approximate size of the bag being transferred, followed by an abbreviation such as MB (megabytes), GB (gigabytes), or TB (terabytes): for example, 42600 MB, 42.6 GB, or .043 TB. Compared to Payload-Oxum (described next), Bag-Size is intended for human consumption. This metadata element SHOULD NOT be repeated.
Payload-Oxum: The "octetstream sum" of the payload, which is intended for the purpose of quickly detecting incomplete bags before performing checksum validation. This is strictly an optimization, and implementations MUST perform the standard checksum validation process before proclaiming a bag to be valid. This element MUST NOT be present more than once and, if present, MUST be in the form "_OctetCount_._StreamCount_", where _OctetCount_ is the total number of octets (8-bit bytes) across all payload file content and _StreamCount_ is the total number of payload files. This metadata element MUST NOT be repeated.
Bag-Group-Identifier: A sender-supplied identifier for the set, if any, of bags to which it logically belongs. This identifier SHOULD be unique across the sender's content, and if it is recognizable as belonging to a globally unique scheme, the receiver SHOULD make an effort to honor the reference to it. This metadata element SHOULD NOT be repeated.
Bag-Count: Two numbers separated by "of", in particular, "N of T", where T is the total number of bags in a group of bags and N is the ordinal number within the group. If T is not known, specify it as "?" (question mark): for example, 1 of 2, 4 of 4, 3 of ?, 89 of 145. This metadata element SHOULD NOT be repeated. If this metadata element is present, it is RECOMMENDED to also include the Bag-Group-Identifier element.
Internal-Sender-Identifier: An alternate sender-specific identifier for the content and/or bag.
Internal-Sender-Description: A sender-local explanation of the contents and provenance.
In addition to these metadata elements, other arbitrary metadata elements MAY also be present.
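Payload-Oxum, described above, is a cheap pre-check. It sums file sizes and counts files without reading any content. The Python sketch below computes and compares it, using hypothetical function names. A match only suggests that the payload is not obviously incomplete; as the text requires, it never replaces checksum validation.

```python
from pathlib import Path


def payload_oxum(bag_dir: str) -> str:
    """Return "<OctetCount>.<StreamCount>" for the files under data/."""
    octets = streams = 0
    for path in (Path(bag_dir) / "data").rglob("*"):
        if path.is_file():
            octets += path.stat().st_size  # size in octets, no content read
            streams += 1
    return f"{octets}.{streams}"


def oxum_matches(bag_dir: str, declared: str) -> bool:
    """Cheap completeness pre-check; never a substitute for checksum validation."""
    octets, sep, streams = declared.strip().partition(".")
    if not (sep and octets.isdecimal() and streams.isdecimal()):
        raise ValueError(f"malformed Payload-Oxum: {declared!r}")
    return payload_oxum(bag_dir) == f"{int(octets)}.{int(streams)}"
```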
An example of "bag-info.txt" file is as follows:
Source-Organization: FOO University
Organization-Address: 1 Main St., Cupertino, California, 11111
Contact-Name: Jane Doe
Contact-Phone: +1 111-111-1111
Contact-Email: example@example.com
External-Description: Uncompressed greyscale TIFF images from the FOO papers colle...
Bagging-Date: 2008-01-15
External-Identifier: university_foo_001
Payload-Oxum: 279164409832.1198
Bag-Group-Identifier: university_foo
Bag-Count: 1 of 15
Internal-Sender-Identifier: /storage/images/foo
Internal-Sender-Description: Uncompressed greyscale TIFFs created from microfilm and are...
2.2.3.  Fetch File: fetch.txt
For reasons of efficiency, a bag MAY be sent with a list of files to be fetched and added to the payload before it can meaningfully be checked for completeness. The fetch file allows a bag to be transmitted with "holes" in it, which can be practical for several reasons. For example, it obviates the need for the sender to stage a large serialized copy of the content while the bag is transferred to the receiver. Also, this method allows a sender to construct a bag from components that are either a subset of logically related components (e.g., the localized logical object could be much larger than what is intended for export) or assembled from logically distributed sources (e.g., the object components for export are not stored locally under one filesystem tree). An OPTIONAL tag file, called the fetch file, contains such a list.
The fetch file MUST be named "fetch.txt". Every file listed in the fetch file MUST be listed in every payload manifest. A fetch file MUST NOT list any tag files.
Each line of a fetch file MUST be of the form
url length filepath
where _url_ identifies the file to be fetched and MUST be an absolute URI as defined in [RFC3986], _length_ is the number of octets in the file (or "-", to leave it unspecified), and _filepath_ identifies the corresponding payload file, relative to the base directory.
The slash character ('/') MUST be used as a path separator in _filepath_. One or more linear whitespace characters (spaces or tabs) MUST separate these three values, and any such characters in the _url_ MUST be percent-encoded [RFC3986]. If _filepath_ includes an LF, a CR, a CRLF, or a percent sign (%), those characters (and only those) MUST be percent-encoded as described in [RFC3986]. There is no limitation on the length of any of the fields in the fetch file.
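A single fetch.txt line can be split into its three fields as the Python sketch below shows. The names are hypothetical, and the URL in the test is a placeholder. The absolute-URI test is deliberately rough: it checks for a scheme and the absence of a fragment, not full RFC 3986 syntax. The sketch also does not reject ".." components in _filepath_, so implementations still need the safeguards discussed in Section 5.1.

```python
import re
from typing import NamedTuple, Optional


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when the line gives "-"
    filepath: str


# url, whitespace, octet length or "-", whitespace, filepath
_FETCH_LINE = re.compile(r"(\S+)[ \t]+([0-9]+|-)[ \t]+(.+)")
_ENCODED = re.compile(r"%(0A|0D|25)", re.IGNORECASE)
_DECODED = {"0A": "\n", "0D": "\r", "25": "%"}
# Rough absolute-URI check: a scheme, ":", and no "#fragment".
_ABSOLUTE_URI = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:[^#]*")


def parse_fetch_line(line: str) -> FetchEntry:
    match = _FETCH_LINE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"malformed fetch.txt line: {line!r}")
    url, length, filepath = match.groups()
    if not _ABSOLUTE_URI.fullmatch(url):
        raise ValueError(f"not an absolute URI: {url!r}")
    filepath = _ENCODED.sub(lambda m: _DECODED[m.group(1).upper()], filepath)
    if not filepath.startswith("data/"):
        raise ValueError("fetch.txt may list payload files only")
    return FetchEntry(url, None if length == "-" else int(length), filepath)
```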
2.2.4.  Other Tag Files
A bag MAY contain other tag files that are not defined by this document. Implementations MUST perform standard checksum validation on any tag file that is listed in a tag manifest but MUST otherwise ignore their contents.
2.3.  Text Tag File Format
All tag files specifically described in this document MUST adhere to the text tag file format described below. Other tag files MAY adhere to the text tag file format described below.
Text tag files are line oriented, and each line MUST be terminated by an LF, a CR, or a CRLF. It is RECOMMENDED that the last line in a tag file also end with LF, CR, or CRLF. Text tag file names MUST end in the extension ".txt".
In all text tag files except for the bag declaration file, text MUST use the character encoding specified in the "bagit.txt" bag declaration file. Text tag files except for the bag declaration file MAY include a Byte Order Mark (BOM) only if the specified encoding requires it for proper decoding. In accordance with [RFC3629], when "bagit.txt" specifies UTF-8, the tag files MUST NOT begin with a BOM. See Section 2.1.1.
The use of UTF-8 for text tag files is strongly RECOMMENDED. A future version of BagIt may disallow encodings other than UTF-8.
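The Python sketch below decodes a text tag file with the encoding declared in "bagit.txt" and splits it into lines. The function name is hypothetical, and the encoding name is resolved through the standard codecs module, which may not recognize every name in [cs-registry]. It accepts all three terminators and tolerates a missing terminator on the last line.

```python
import codecs
import re

# CRLF is tried first so it counts as one terminator, not CR followed by LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def read_text_tag_file(data: bytes, encoding: str) -> list[str]:
    """Decode a tag file with the encoding declared in bagit.txt; return its lines."""
    if codecs.lookup(encoding).name == "utf-8" and data.startswith(codecs.BOM_UTF8):
        raise ValueError("UTF-8 tag files MUST NOT begin with a BOM")
    lines = _LINE_BREAK.split(data.decode(encoding))
    if lines[-1] == "":
        lines.pop()  # the last line was terminated, as RECOMMENDED
    return lines
```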
2.4.  Bag Checksum Algorithms
The payload manifest and tag manifest permit validating the integrity of the payload and tag files in a bag, using the checksums produced by the bag checksum algorithms. Checksum values MUST be encoded so as to conform to the manifest format specified in Section 2.1.3. However, the internal details of a checksum are outside the scope of this document.
To avoid future ambiguity, the checksum algorithm SHOULD be registered in IANA's "Named Information Hash Algorithm Registry" [ni-registry] according to [RFC6920] but MAY, for backwards compatibility, also be MD5 [RFC1321] or SHA-1 [RFC3174].
The name of the checksum algorithm MUST be normalized for use in the manifest's filename by lowercasing the common name of the algorithm and removing all non-alphanumeric characters. Following is a partial list that maps common algorithm names to normalized names:
o MD5: md5
o SHA-1: sha1
o SHA-256: sha256
o SHA-512: sha512
Starting with BagIt 1.0, bag creation and validation tools MUST support the SHA-256 and SHA-512 algorithms [RFC6234] and SHOULD enable SHA-512 by default when creating new bags. For backwards compatibility, implementers SHOULD support MD5 [RFC1321] and SHA-1 [RFC3174]. Implementers are encouraged to simplify the process of adding additional manifests using new algorithms to streamline the process of in-place upgrades.
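The normalization rule and the required algorithm set can be combined as in the Python sketch below. The names are hypothetical. Mapping the normalized names onto Python's hashlib constructors is an implementation choice of this sketch, not something the text prescribes.

```python
import hashlib
import re


def normalize_algorithm_name(common_name: str) -> str:
    """Lowercase, then drop every non-alphanumeric character ("SHA-512" -> "sha512")."""
    return re.sub(r"[^a-z0-9]", "", common_name.lower())


# Normalized names mapped to hashlib constructors.
# MUST support: sha256, sha512. SHOULD support for backwards compatibility: md5, sha1.
HASHERS = {
    "sha256": hashlib.sha256,
    "sha512": hashlib.sha512,
    "sha1": hashlib.sha1,
    "md5": hashlib.md5,
}


def manifest_name(common_name: str, tag: bool = False) -> str:
    """Return e.g. "manifest-sha512.txt" or "tagmanifest-sha512.txt"."""
    algorithm = normalize_algorithm_name(common_name)
    if algorithm not in HASHERS:
        raise ValueError(f"unsupported checksum algorithm: {common_name!r}")
    return f"{'tagmanifest' if tag else 'manifest'}-{algorithm}.txt"
```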
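The following sketches a manifest writer that defaults to SHA-512 and reads files in chunks. It relies only on Python's `hashlib`. The function names are hypothetical, and the sketch assumes that the payload lives under "data/" and that filepaths need percent-encoding only for "%", CR, and LF, as in the pct-encoded ABNF rule:

```python
import hashlib
from pathlib import Path
from typing import List


# Hypothetical manifest writer; a sketch, not a reference implementation.
def file_digest(path: Path, algorithm: str = "sha512", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def encode_filepath(filepath: str) -> str:
    """Percent-encode only '%', CR and LF."""
    # '%' is encoded first so the escapes added afterwards are not re-encoded.
    return filepath.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def manifest_lines(bag_dir: Path, algorithm: str = "sha512") -> List[str]:
    """One 'checksum  filepath' line for every payload file under data/."""
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            rel = path.relative_to(bag_dir).as_posix()
            lines.append(f"{file_digest(path, algorithm)}  {encode_filepath(rel)}")
    return lines
```

To produce a payload manifest, join the returned lines with "\n", add a final newline, and write the result to "manifest-sha512.txt". Passing algorithm="sha256" yields content for "manifest-sha256.txt" instead, which is one way to add a manifest during an in-place upgrade.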
A _complete_ bag MUST meet the following requirements:
1. Every required element MUST be present (see Section 2.1).
2. Every file listed in every tag manifest MUST be present.
3. Every file listed in every payload manifest MUST be present.
4. For BagIt 1.0, every payload file MUST be listed in every payload manifest. Note that older versions of BagIt allowed payload files to be listed in just one of the manifests.
5. Every element present MUST conform to BagIt 1.0.
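Requirements 1 through 4 can be checked mechanically, as in the sketch below. Requirement 5 (conformance of every element) is not attempted. The function names are hypothetical. The sketch assumes UTF-8 manifests, skips percent-decoding of filepaths, and assumes that the filepaths have already passed the path-safety checks described under security considerations:

```python
import re
from pathlib import Path
from typing import Dict, List


# Hypothetical completeness checker; a sketch, not a reference implementation.
def read_manifest(path: Path) -> Dict[str, str]:
    """Map each filepath in a manifest to its checksum (UTF-8, no decoding)."""
    entries: Dict[str, str] = {}
    for line in re.split(r"\r\n|\r|\n", path.read_text(encoding="utf-8")):
        if line.strip():
            checksum, filepath = line.split(maxsplit=1)
            entries[filepath] = checksum
    return entries


def completeness_problems(bag_dir: Path) -> List[str]:
    """Return human-readable reasons why bag_dir is not complete."""
    problems: List[str] = []
    payload_dir = bag_dir / "data"
    # Requirement 1: the required elements are present.
    if not (bag_dir / "bagit.txt").is_file():
        problems.append("missing required element: bagit.txt")
    if not payload_dir.is_dir():
        problems.append("missing required element: data/")
    payload_manifests = sorted(bag_dir.glob("manifest-*.txt"))
    tag_manifests = sorted(bag_dir.glob("tagmanifest-*.txt"))
    if not payload_manifests:
        problems.append("missing required element: payload manifest")
    # Requirements 2 and 3: every listed file exists.
    for manifest in payload_manifests + tag_manifests:
        for filepath in read_manifest(manifest):
            if not (bag_dir / filepath).is_file():
                problems.append(f"{manifest.name}: listed file missing: {filepath}")
    # Requirement 4: every payload file appears in every payload manifest.
    on_disk = set()
    if payload_dir.is_dir():
        on_disk = {p.relative_to(bag_dir).as_posix()
                   for p in payload_dir.rglob("*") if p.is_file()}
    for manifest in payload_manifests:
        for filepath in sorted(on_disk - set(read_manifest(manifest))):
            problems.append(f"{manifest.name}: payload file not listed: {filepath}")
    return problems
```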
A _valid_ bag MUST meet the following requirements:
1. The bag MUST be _complete_.
2. Every checksum in every payload manifest and tag manifest has been successfully verified against the contents of the corresponding file.
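The checksum half of validation might look like the following sketch. It assumes that the bag is already complete, that each manifest's filename ends in an algorithm name that `hashlib.new()` accepts (such as "md5" or "sha512"), and that filepaths need no percent-decoding. For brevity, it reads each file whole. The function name is hypothetical:

```python
import hashlib
import re
from pathlib import Path
from typing import List


# Hypothetical validity check; a sketch, not a reference implementation.
def checksum_failures(bag_dir: Path) -> List[str]:
    """Recompute each checksum in every manifest-*.txt and tagmanifest-*.txt."""
    failures: List[str] = []
    for manifest in sorted(bag_dir.glob("*manifest-*.txt")):
        # "manifest-sha512" or "tagmanifest-sha512" -> "sha512"
        algorithm = manifest.stem.split("-", 1)[1]
        text = manifest.read_text(encoding="utf-8")
        for line in re.split(r"\r\n|\r|\n", text):
            if not line.strip():
                continue
            expected, filepath = line.split(maxsplit=1)
            h = hashlib.new(algorithm)
            h.update((bag_dir / filepath).read_bytes())
            # Manifests may use uppercase hex, so compare case-insensitively.
            if h.hexdigest() != expected.lower():
                failures.append(f"{manifest.name}: checksum mismatch for {filepath}")
    return failures
```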
This is the layout of a basic bag containing an image and a companion Optical Character Recognition (OCR) file. Lines of file content are shown with added parentheses to indicate each complete line. For brevity, this example uses MD5 rather than the recommended SHA-512.
   myfirstbag/
   |
   |   manifest-md5.txt
   |     (49afbd86a1ca9f34b677a3f09655eae9 data/27613-h/images/q172.png)
   |     (408ad21d50cef31da4df6d9ed81b01a7 data/27613-h/images/q172.txt)
   |
   |   bagit.txt
   |     (BagIt-version: 1.0                                           )
   |     (Tag-File-Character-Encoding: UTF-8                           )
   |
   \--- data/
         |
         |   27613-h/images/q172.png
         |     (... image bytes ...                                    )
         |
         |   27613-h/images/q172.txt
         |     (... OCR text ...                                       )
         ....
This is the layout of a bag that expects the receiver to download the files listed in the payload manifests prior to validation. Lines of file content are shown with added parentheses to indicate each complete line. For brevity, this example uses MD5 rather than the recommended SHA-512.
   highsmith-tahoe/
   |
   |   manifest-md5.txt
   |     (102b0e6effe208ef9b29864946de9e22 data/23364a.tif            )
   |
   |   fetch.txt
   |     (https://cdn.loc.gov/master/pnp/highsm/23300/23364a.tif
   |      216951362 data/23364a.tif                                   )
   |
   |   bagit.txt
   |     (BagIt-version: 1.0                                          )
   |     (Tag-File-Character-Encoding: UTF-8                          )
   |
   |   bag-info.txt
   |     (Internal-Sender-Description: Download link found at         )
   |     (  https://www.loc.gov/resource/highsm.23364/                )
The paths specified in the payload manifests, tag manifests, and fetch files do not prohibit special directory characters that have special meaning on some operating systems. Implementers MUST ensure that files outside the bag directory structure are not accessed when reading or writing files based on paths specified in a bag.
All implementations SHOULD have a test suite to guard against special directory characters.
For example, a maliciously crafted "tagmanifest-sha512.txt" file might contain entries that begin with a path character such as "/", "..", or a "~username" home directory reference in an attempt to cause a naive implementation to leak or overwrite targeted files on a POSIX operating system.
Windows implementations SHOULD test their implementations to ensure that safety checks prevent use of drive letters and the less commonly used namespace sequences (e.g., "\\?\C:\...") described in [MSFNAM].
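A defensive check like the following could run before any file named in a manifest or fetch file is read or written. It is a sketch with stated assumptions. The function name is hypothetical. Backslashes are rejected outright rather than interpreted, and the final containment test uses `Path.resolve()`, which follows any symlinks that already exist on disk:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


# Hypothetical path-safety check; a sketch, not a complete defense.
def is_safe_bag_path(bag_root: Path, filepath: str) -> bool:
    """Return True only if a manifest/fetch filepath stays inside bag_root."""
    if not filepath or "\\" in filepath or "\x00" in filepath:
        return False  # backslashes would act as separators on Windows
    posix = PurePosixPath(filepath)
    win = PureWindowsPath(filepath)
    if posix.is_absolute() or win.drive or win.root:
        return False  # "/etc/passwd", "C:foo", "//server/share/x", ...
    if filepath.startswith("~") or ".." in posix.parts:
        return False  # home-directory or parent-directory references
    # Last line of defense: resolve existing symlinks and require containment.
    root = bag_root.resolve()
    target = (root / filepath).resolve()
    return root in target.parents
```

The function returns False for entries such as "/etc/passwd", "../outside", "~user/.ssh/id_rsa", "C:foo", and "\\?\C:\x". These entries are natural seeds for the test suite recommended above.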
To assist implementers, the Library of Congress conformance suite [LC-CONFORMANCE-SUITE] has some tests for invalid bags that are expected to fail on POSIX or Windows clients.
Implementers of tools that complete bags by retrieving URLs listed in a fetch file need to be aware that some of those URLs might point to hosts, intentionally or unintentionally, that are not under control of the bag's sender. Moreover, older checksum algorithms, even if reasonable for detecting corruption during transit, may not offer strong cryptographic protection against intentional spoofing.
The size of files, as optionally reported in the fetch file, cannot be guaranteed to match the actual file size to be downloaded. Implementers SHOULD take steps to monitor and abort transfer when the received file size exceeds the file size reported in the fetch file. Implementers SHOULD NOT use the file size in the fetch file for critical resource allocation, such as buffer sizing or storage requisitioning.
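One way to honor the size advice is to count bytes as they arrive and stop once the declared size is exceeded, without using that size for allocation. This sketch assumes that `src` is any binary stream with a `read(n)` method, such as the response object returned by Python's `urllib.request.urlopen()`. The function name is hypothetical:

```python
from typing import BinaryIO, Optional


# Hypothetical helper; a sketch, not a complete download client.
def copy_with_limit(src: BinaryIO, dst: BinaryIO, declared_length: Optional[int],
                    chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst, aborting once more than declared_length bytes arrive.

    declared_length is None when fetch.txt gives "-" (size unknown). The
    declared size is only compared against, never used to size buffers.
    """
    received = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return received
        received += len(chunk)
        if declared_length is not None and received > declared_length:
            raise ValueError(f"more than the declared {declared_length} bytes received; aborting")
        dst.write(chunk)
```

A short transfer is not treated as an error here. Checksum verification of the completed bag is what ultimately decides whether the file is correct.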
The integrity assurance provided by manifests is designed to provide high levels of confidence against data corruption but is not designed to be secure against active attacks. Organizations that need to secure bags against such threats SHOULD agree on additional measures, such as digital signatures, that are out of scope for this specification.
This section lists practical considerations for implementers and users. None of the points below are required, but they are recommended for general-purpose usage.
Upon discovering errors in bags, an implementation is free to take action (for example, logging or reporting) in an application-specific manner. This document does not mandate any particular action.
The Library of Congress conformance suite [LC-CONFORMANCE-SUITE] is provided as a public resource to test new implementations for compatibility and error handling.
This section provides background information on various challenges caused by differences in how operating systems, filesystems, and common tools handle filenames. This section is followed by a list of recommendations for implementers in Section 6.1.1.3.
There are three challenges for interoperability related to filename case:
o Filesystems such as File Allocation Table (FAT) or Extended File Allocation Table (EXFAT) always convert filenames to uppercase: "example.txt" will be stored as "EXAMPLE.TXT".
o Many Unix filesystems save filenames exactly as provided, which allows multiple files that differ only in case: "example.txt" and "Example.txt" are separate files.
o New Technology File System (NTFS) and Apple's Hierarchical File System (HFS) Plus usually preserve case when storing files but are case insensitive when retrieving them. A file saved as "Example.txt" will be retrieved by that name but will also be retrieved as "EXAMPLE.TXT", "example.txt", etc.
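A bag containing both "example.txt" and "Example.txt" is fine on many Unix filesystems but cannot be unpacked intact onto NTFS or HFS Plus. One way to detect such names is to group them by a case-folded key. This sketch uses Python's `str.casefold()`, which only approximates the case rules of any particular filesystem. The function name is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


# Hypothetical helper; str.casefold() only approximates filesystem case rules.
def case_collisions(filepaths: Iterable[str]) -> List[List[str]]:
    """Groups of two or more filepaths whose casefold() forms are equal."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in filepaths:
        groups[path.casefold()].append(path)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```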
The Unicode specification has many common cases where different character sequences produce the same human-meaningful text. Such sequences are referred to as "canonically equivalent", and the Unicode specification defines several normalization forms; see [UNICODE-TR15] for the full details.
The example below shows the common surname "Núñez" normalized in different forms.
Normalization Form D (Decomposition)
   Char    UTF8 Hex   Name
   ----------------------------------------------
   N       4e         LATIN CAPITAL LETTER N
   u       75         LATIN SMALL LETTER U
   \u0301  cc81       COMBINING ACUTE ACCENT
   n       6e         LATIN SMALL LETTER N
   \u0303  cc83       COMBINING TILDE
   e       65         LATIN SMALL LETTER E
   z       7a         LATIN SMALL LETTER Z
Normalization Form C (Canonical Composition)
   Char    UTF8 Hex   Name
   ----------------------------------------------
   N       4e         LATIN CAPITAL LETTER N
   ú       c3ba       LATIN SMALL LETTER U WITH ACUTE
   ñ       c3b1       LATIN SMALL LETTER N WITH TILDE
   e       65         LATIN SMALL LETTER E
   z       7a         LATIN SMALL LETTER Z
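Python's standard `unicodedata` module can reproduce both tables above. This is a quick way to confirm that NFC and NFD encode "Núñez" as different byte sequences that become equal once both are normalized to the same form. The helper name `utf8_hex` is hypothetical:

```python
import unicodedata


def utf8_hex(text: str) -> str:
    """Hex dump of the UTF-8 encoding, in the style of the tables above."""
    return text.encode("utf-8").hex()


nfc = unicodedata.normalize("NFC", "Nu\u0301n\u0303ez")
nfd = unicodedata.normalize("NFD", nfc)
print(utf8_hex(nfc))                              # 4ec3bac3b1657a
print(utf8_hex(nfd))                              # 4e75cc816ecc83657a
print(nfc == nfd)                                 # False
print(nfc == unicodedata.normalize("NFC", nfd))   # True
```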
Unicode normalization is relevant to BagIt implementers because different systems have different standards for normalization:
o Apple's HFS Plus filesystem always normalizes filenames to a fully decomposed form based on the Unicode 2.0 specification (see [TN1150]).
o Windows treats filenames as opaque character sequences (see [MSFNAM]) and will store and return the encoded bytes exactly as provided.
o Linux and other common Unix systems are generally similar to Windows in storing and returning opaque byte streams, but this behavior is technically dependent on the filesystem.
o Utilities used for file management, transfer, and archiving may ignore this issue, apply an arbitrary normalization form, or allow the user to control how normalization is applied.
In practice, this means that the encoded filename stored in a manifest may fail a simple file existence check because the filename's normalization was changed at some point after the manifest was written. This situation is very confusing for users because the filenames are visually indistinguishable, and the "missing" file is obviously present in the payload directory.
o Implementations SHOULD discourage the creation of bags containing files that differ only in case.
o Implementations SHOULD prevent the creation of bags containing files that differ only in normalization form.
o BagIt implementations SHOULD tolerate differences in normalization form by applying the same normalization form to both the filesystem names and the manifest names before comparing the two lists.
o Implementations SHOULD issue a warning when multiple manifests are present that differ only in case or normalization form.
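The tolerant comparison recommended above might look like the following sketch. It normalizes both sides to NFC, which is an arbitrary choice; any single form works if it is applied to both sides. It also flags on-disk names that collide after normalization. The function name and return shape are hypothetical:

```python
import unicodedata
from typing import Dict, Iterable, List, Tuple


# Hypothetical helper; a sketch of normalization-tolerant name matching.
def reconcile_names(manifest_paths: Iterable[str], disk_paths: Iterable[str],
                    form: str = "NFC") -> Tuple[Dict[str, str], List[str], List[str]]:
    """Return (matches, missing, warnings).

    matches maps each manifest path to the actual on-disk name, missing lists
    manifest paths with no unambiguous match, and warnings reports on-disk
    names that differ only in normalization form.
    """
    by_norm: Dict[str, List[str]] = {}
    for name in disk_paths:
        by_norm.setdefault(unicodedata.normalize(form, name), []).append(name)
    warnings = [f"names differ only in normalization form: {sorted(names)}"
                for names in by_norm.values() if len(names) > 1]
    matches: Dict[str, str] = {}
    missing: List[str] = []
    for path in manifest_paths:
        candidates = by_norm.get(unicodedata.normalize(form, path), [])
        if path in candidates:
            matches[path] = path           # exact match, no normalization needed
        elif len(candidates) == 1:
            matches[path] = candidates[0]  # matches only after normalization
        else:
            missing.append(path)           # absent, or ambiguous
    return matches, missing, warnings
```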
As specified above, only the Unix-based path separator ('/') may be used inside filenames listed in BagIt manifest and fetch.txt files. When bags are exchanged between Windows and Unix platforms, the path separator SHOULD be translated as needed. Receivers of bags on physical media SHOULD be prepared for filesystems created under either Windows or Unix. Beyond the fundamental difference in path separators ('\' versus '/'), Windows filesystems generally impose more limitations than Unix filesystems.
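Separator translation at a platform boundary can lean on Python's `pathlib` pure-path classes, as in the sketch below. The function names are hypothetical. The sketch assumes that the filepath has already been checked for unsafe components, as the security considerations above require. Note that `PurePosixPath` also drops "." components and repeated slashes while splitting:

```python
from pathlib import PurePosixPath, PureWindowsPath


# Hypothetical helpers for translating separators at a platform boundary.
def manifest_path_to_windows(filepath: str) -> str:
    """Turn a '/'-separated manifest filepath into a '\\'-separated Windows path."""
    return str(PureWindowsPath(*PurePosixPath(filepath).parts))


def windows_path_to_manifest(path: str) -> str:
    """Turn a relative Windows path back into the '/' form manifests require."""
    return PureWindowsPath(path).as_posix()
```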
Windows path names have a maximum of 255 characters, and none of the following characters may be used in a path component:
< > : " / | ? *
Windows also reserves the following names, with or without a file extension:
   CON, PRN, AUX, NUL
   COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
   LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
See [MSFNAM] for more information and possible alternatives.
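The following sketches a pre-flight check that a tool might run before writing a bag to a Windows filesystem. It is based only on the limits listed above. The function and constant names are hypothetical. The check measures only the bag-relative filepath, so a real tool would also need to account for the length of the directory that the bag is written into:

```python
from typing import List

# Hypothetical constants built from the lists above.
FORBIDDEN_CHARS = set('<>:"/|?*')
RESERVED_NAMES = ({"CON", "PRN", "AUX", "NUL"}
                  | {f"COM{i}" for i in range(1, 10)}
                  | {f"LPT{i}" for i in range(1, 10)})


def windows_name_problems(filepath: str, max_length: int = 255) -> List[str]:
    """List reasons a '/'-separated bag filepath may fail on Windows."""
    problems: List[str] = []
    if len(filepath) > max_length:
        problems.append(f"path longer than {max_length} characters")
    for component in filepath.split("/"):
        bad = sorted(set(component) & FORBIDDEN_CHARS)
        if bad:
            problems.append(f"{component!r} contains {''.join(bad)}")
        # Reserved names apply with or without an extension ("con.txt").
        stem = component.split(".", 1)[0].upper()
        if stem in RESERVED_NAMES:
            problems.append(f"{component!r} uses reserved name {stem}")
    return problems
```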
Some bags have been manually assembled using checksum utilities such as those contained in the GNU Coreutils package (md5sum, sha1sum, etc.), collectively referred to here as "md5sum". Implementers who desire wide support of legacy content should be aware of some known quirks of these tools.
md5sum can be run in either "binary mode" or "text mode"; text mode causes it to normalize line endings on some operating systems. On Unix-like systems, both modes will usually produce the same results; on systems like Windows, they can produce different results depending on the file contents. The md5sum output format has two characters between the checksum and the filepath: the first is always a space, and the second is an asterisk ("*") for binary mode or a space for text mode.
A final note about md5sum-generated manifests is that, for a _filepath_ containing a backslash ('\'), the manifest line will have a backslash inserted in front of the _checksum_ and, under Windows, the backslashes inside _filepath_ can be doubled.
Implementers MAY wish to accept this format by ignoring a leading asterisk or handling differences in line termination gracefully but, if so, implementations MUST warn the user that the bag in question will fail strict validation. In such cases, it is RECOMMENDED that tools provide an easy option to update the bag with valid manifests.
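A lenient parser for md5sum-style lines might look like this sketch. It accepts the binary-mode asterisk and md5sum's backslash escaping. It assumes GNU-style escapes, where "\\" stands for a backslash and "\n" for a newline. It reports through `strict` whether a quirk was tolerated, so that the caller can issue the required warning. All names are hypothetical:

```python
import re
from typing import NamedTuple


# Hypothetical lenient reader for manifests produced by md5sum-style tools.
class ParsedLine(NamedTuple):
    checksum: str
    filepath: str
    strict: bool  # False when an md5sum quirk had to be tolerated


_LINE = re.compile(r"([0-9A-Fa-f]+)[ \t]+(.+)")
_ESCAPE = re.compile(r"\\([\\n])")


def parse_legacy_manifest_line(line: str) -> ParsedLine:
    line = line.rstrip("\r\n")        # drop trailing CR/LF, whatever the convention
    escaped = line.startswith("\\")   # md5sum prefixes escaped names with '\'
    m = _LINE.fullmatch(line[1:] if escaped else line)
    if m is None:
        raise ValueError(f"unrecognized manifest line: {line!r}")
    checksum, filepath = m.groups()
    binary = filepath.startswith("*")  # binary-mode marker, not part of the name
    if binary:
        filepath = filepath[1:]
    if escaped:
        # Undo GNU-style escapes: backslash-backslash and backslash-n.
        filepath = _ESCAPE.sub(lambda e: "\\" if e.group(1) == "\\" else "\n", filepath)
    return ParsedLine(checksum.lower(), filepath, strict=not (escaped or binary))
```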
The Augmented Backus-Naur Form (ABNF) rules provided below are non-normative. If there is a discrepancy between requirements in the normative sections and the ABNF, the requirements in the normative sections prevail. Some definitions use the core rules (e.g., DIGIT and HEXDIG) defined in [RFC5234].
bagit.txt ABNF rules:
   bagit-txt = "BagIt-Version: " 1*DIGIT "." 1*DIGIT ending
               "Tag-File-Character-Encoding: " encoding ending
   encoding  = 1*CHAR
   ending    = CR / LF / CRLF
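These rules translate directly into a regular expression. In this sketch, the function name is hypothetical. The literal labels are matched case-insensitively because quoted strings in ABNF are case-insensitive per [RFC5234], and the encoding name is assumed to contain no whitespace:

```python
import re
from typing import Tuple

# Hypothetical bagit.txt parser following the non-normative ABNF above.
_ENDING = r"(?:\r\n|\r|\n)"
_BAGIT_TXT = re.compile(
    r"BagIt-Version: ([0-9]+)\.([0-9]+)" + _ENDING
    + r"Tag-File-Character-Encoding: (\S+)" + _ENDING,
    re.IGNORECASE,  # ABNF quoted strings are case-insensitive
)


def parse_bagit_txt(text: str) -> Tuple[Tuple[int, int], str]:
    """Return ((major, minor), encoding) or raise ValueError."""
    m = _BAGIT_TXT.fullmatch(text)
    if m is None:
        raise ValueError("bagit.txt does not match the bagit-txt rule")
    major, minor, encoding = m.groups()
    return (int(major), int(minor)), encoding
```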
Payload Manifest ABNF rules:
   payload-manifest      = 1*payload-manifest-line
   payload-manifest-line = checksum 1*WSP filepath ending
   checksum              = 1*case-hexdig
   case-hexdig           = DIGIT / "A" / "a" / "B" / "b" / "C" / "c" /
                           "D" / "d" / "E" / "e" / "F" / "f"
   filepath              = "data/" 1*( unreserved / pct-encoded /
                           sub-delims )
   unreserved            = ALPHA / DIGIT / "-" / "." / "_" / "~"
   sub-delims            = "!" / "$" / "&" / DQUOTE / "'" / "(" / ")" /
                           "*" / "+" / "," / ";" / "=" / "/"
   pct-encoded           = "%0D" / "%0d" / "%0A" / "%0a" / "%25"
   ending                = CR / LF / CRLF
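The following sketches parsing one payload-manifest-line whose line ending has already been removed. It decodes only the three pct-encoded sequences. Decoding happens in a single left-to-right pass, so "%250A" becomes the literal text "%0A" rather than a newline. The sketch does not enforce the full character set of the filepath rule. The function name is hypothetical:

```python
import re
from typing import Tuple

# Hypothetical payload-manifest-line parser (line ending already stripped).
_LINE = re.compile(r"([0-9A-Fa-f]+)[ \t]+(data/.+)")
_PCT = re.compile(r"%(?:0[DdAa]|25)")
_DECODED = {"%0D": "\r", "%0A": "\n", "%25": "%"}


def parse_payload_manifest_line(line: str) -> Tuple[str, str]:
    """Return (lowercase checksum, decoded filepath)."""
    m = _LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a payload-manifest-line: {line!r}")
    checksum, encoded = m.groups()
    # re.sub scans once, left to right, so decoded text is never re-decoded.
    filepath = _PCT.sub(lambda e: _DECODED[e.group(0).upper()], encoded)
    return checksum.lower(), filepath
```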
bag-info.txt ABNF rules:
   metadata      = 1*metadata-line
   metadata-line = key ":" WSP value ending *(continuation ending)
   key           = 1*non-reserved
   value         = 1*non-reserved
   continuation  = WSP 1*non-reserved
   non-reserved  = VCHAR / WSP
                   ; any valid character for the specific encoding
                   ; except those that match "ending"
   ending        = CR / LF / CRLF
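The following sketches reading bag-info.txt into ordered (label, value) pairs, keeping repeated labels. Joining a continuation line to the previous value with a single space is this sketch's own choice. The function name is hypothetical:

```python
import re
from typing import List, Tuple


# Hypothetical bag-info.txt reader; a sketch, not a reference implementation.
def parse_bag_info(text: str) -> List[Tuple[str, str]]:
    """Return (label, value) pairs in file order; labels may repeat."""
    pairs: List[Tuple[str, str]] = []
    for line in re.split(r"\r\n|\r|\n", text):
        if not line:
            continue  # e.g. the empty piece after a final line ending
        if line[0] in " \t":
            # Continuation: leading WSP folds this line into the previous value.
            if not pairs:
                raise ValueError("continuation line before any metadata line")
            label, value = pairs[-1]
            pairs[-1] = (label, value + " " + line.strip(" \t"))
        else:
            label, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"no ':' separator in {line!r}")
            pairs.append((label.strip(" \t"), value.strip(" \t")))
    return pairs
```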
fetch.txt ABNF rules:
   fetch      = 1*fetch-line
   fetch-line = url 1*WSP length 1*WSP filepath ending
   url        = <absolute-URI, see [RFC3986], Section 4.3>
   length     = 1*DIGIT / "-"
   filepath   = ("data/" 1*( unreserved / pct-encoded / sub-delims ))
   ending     = CR / LF / CRLF
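The following sketches parsing one fetch-line, with its ending removed, into its three fields. A "-" length is treated as unknown. The names are hypothetical. The URL is assumed to contain no whitespace, which holds for an absolute URI, and the filepath is returned still percent-encoded:

```python
import re
from typing import NamedTuple, Optional


# Hypothetical fetch.txt line parser (line ending already stripped).
class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when the length field is "-"
    filepath: str


_FETCH_LINE = re.compile(r"(\S+)[ \t]+([0-9]+|-)[ \t]+(data/.+)")


def parse_fetch_line(line: str) -> FetchEntry:
    m = _FETCH_LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a fetch-line: {line!r}")
    url, length, filepath = m.groups()
    return FetchEntry(url, None if length == "-" else int(length), filepath)
```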
This document has no IANA actions.
[cs-registry] IANA, "Character Set", <https://www.iana.org/assignments/character-sets>.
[ni-registry] IANA, "Named Information Hash Algorithm", <https://www.iana.org/assignments/named-information>.
[RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, DOI 10.17487/RFC1321, April 1992, <https://www.rfc-editor.org/info/rfc1321>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC3174] Eastlake 3rd, D. and P. Jones, "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, DOI 10.17487/RFC3174, September 2001, <https://www.rfc-editor.org/info/rfc3174>.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2003, <https://www.rfc-editor.org/info/rfc3629>.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005, <https://www.rfc-editor.org/info/rfc3986>.
[RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF)", RFC 6234, DOI 10.17487/RFC6234, May 2011, <https://www.rfc-editor.org/info/rfc6234>.
[RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keranen, A., and P. Hallam-Baker, "Naming Things with Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013, <https://www.rfc-editor.org/info/rfc6920>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[ENCDEP] Tabata, K., Okada, T., Nagamori, M., Sakaguchi, T., and S. Sugimoto, "A Collaboration Model between Archival Systems to Enhance the Reliability of Preservation by an Enclose-and-Deposit Method", 2005, <https://web.archive.org/web/20060508015635/http://www.iwaw.net/05/papers/iwaw05-tabata.pdf>.
[LC-CONFORMANCE-SUITE] The Library of Congress, "Test cases for validating Bagit Implementations", commit 43bcbdf, November 2017, <https://github.com/LibraryOfCongress/bagit-conformance-suite/>.
[MSFNAM] Microsoft, Inc., "Naming Files, Paths, and Namespaces", May 2018, <http://msdn2.microsoft.com/en-us/library/aa365247.aspx>.
[RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, January 2008, <https://www.rfc-editor.org/info/rfc5234>.
[TN1150] Apple Inc., "Technical Note TN1150: HFS Plus Volume Format", March 2004, <https://developer.apple.com/legacy/library/technotes/tn/tn1150.html>.
[UNICODE-TR15] Unicode Consortium, "Unicode Standard Annex #15: Unicode Normalization Forms", Technical Report, Unicode 11.0.0, May 2018, <http://www.unicode.org/reports/tr15/>.
Acknowledgements
BagIt benefitted from the thoughtful assistance of Stephen Abrams, Mike Ashenfelder, Dan Chudnov, Dave Crocker, Scott Fisher, Brad Hards, Erik Hetzner, Keith Johnson, Leslie Johnston, David Loy, Mark Phillips, Tracy Seneca, Stian Soiland-Reyes, Brian Tingle, Adam Turoff, and Jim Tuttle.
Contributors
Additional contributors to the authoring of BagIt are Andy Boyko, David Brunton, Rosie Storey, Ed Summers, Brian Vargas, and Kate Zwaard.
Authors' Addresses
John A. Kunze
California Digital Library
415 20th St, 4th Floor
Oakland, CA 94612
United States of America
Email: jak@ucop.edu
Justin Littman
Stanford Libraries
518 Memorial Way
Stanford, CA 94305
United States of America
Email: justinlittman@stanford.edu
Liz Madden
Library of Congress
101 Independence Avenue SE
Washington, DC 20540
United States of America
Email: emad@loc.gov
John Scancella
Email: john.scancella@gmail.com
Chris Adams
Library of Congress
101 Independence Avenue SE
Washington, DC 20540
United States of America
Email: cadams@loc.gov