Internet Engineering Task Force (IETF)                        C. Bormann
Request for Comments: 7049                       Universitaet Bremen TZI
Category: Standards Track                                     P. Hoffman
ISSN: 2070-1721                                           VPN Consortium
                                                            October 2013
Concise Binary Object Representation (CBOR)
Abstract
The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation. These design goals make it different from earlier binary serializations such as ASN.1 and MessagePack.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7049.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction
  1.1. Objectives
  1.2. Terminology
2. Specification of the CBOR Encoding
  2.1. Major Types
  2.2. Indefinite Lengths for Some Major Types
    2.2.1. Indefinite-Length Arrays and Maps
    2.2.2. Indefinite-Length Byte Strings and Text Strings
  2.3. Floating-Point Numbers and Values with No Content
  2.4. Optional Tagging of Items
    2.4.1. Date and Time
    2.4.2. Bignums
    2.4.3. Decimal Fractions and Bigfloats
    2.4.4. Content Hints
      2.4.4.1. Encoded CBOR Data Item
      2.4.4.2. Expected Later Encoding for CBOR-to-JSON Converters
      2.4.4.3. Encoded Text
    2.4.5. Self-Describe CBOR
3. Creating CBOR-Based Protocols
  3.1. CBOR in Streaming Applications
  3.2. Generic Encoders and Decoders
  3.3. Syntax Errors
    3.3.1. Incomplete CBOR Data Items
    3.3.2. Malformed Indefinite-Length Items
    3.3.3. Unknown Additional Information Values
  3.4. Other Decoding Errors
  3.5. Handling Unknown Simple Values and Tags
  3.6. Numbers
  3.7. Specifying Keys for Maps
  3.8. Undefined Values
  3.9. Canonical CBOR
  3.10. Strict Mode
4. Converting Data between CBOR and JSON
  4.1. Converting from CBOR to JSON
  4.2. Converting from JSON to CBOR
5. Future Evolution of CBOR
  5.1. Extension Points
  5.2. Curating the Additional Information Space
6. Diagnostic Notation
  6.1. Encoding Indicators
7. IANA Considerations
  7.1. Simple Values Registry
  7.2. Tags Registry
  7.3. Media Type ("MIME Type")
  7.4. CoAP Content-Format
  7.5. The +cbor Structured Syntax Suffix Registration
8. Security Considerations
9. Acknowledgements
10. References
  10.1. Normative References
  10.2. Informative References
Appendix A. Examples
Appendix B. Jump Table
Appendix C. Pseudocode
Appendix D. Half-Precision
Appendix E. Comparison of Other Binary Formats to CBOR's Design Objectives
  E.1. ASN.1 DER, BER, and PER
  E.2. MessagePack
  E.3. BSON
  E.4. UBJSON
  E.5. MSDTP: RFC 713
  E.6. Conciseness on the Wire
1. Introduction
There are hundreds of standardized formats for binary representation of structured data (also known as binary serialization formats). Of those, some are for specific domains of information, while others are generalized for arbitrary data. In the IETF, probably the best-known formats in the latter category are ASN.1's BER and DER [ASN.1].
The format defined here follows some specific design goals that are not well met by current formats. The underlying data model is an extended version of the JSON data model [RFC4627]. It is important to note that this is not a proposal that the grammar in RFC 4627 be extended in general, since doing so would cause a significant backwards incompatibility with already deployed JSON documents. Instead, this document simply defines its own data model that starts from JSON.
Appendix E lists some existing binary formats and discusses how well they do or do not fit the design objectives of the Concise Binary Object Representation (CBOR).
1.1. Objectives
The objectives of CBOR, roughly in decreasing order of importance, are:
1. The representation must be able to unambiguously encode most common data formats used in Internet standards.
* It must represent a reasonable set of basic data types and structures using binary encoding. "Reasonable" here is largely influenced by the capabilities of JSON, with the major addition of binary byte strings. The structures supported are limited to arrays and trees; loops and lattice-style graphs are not supported.
* There is no requirement that all data formats be uniquely encoded; that is, it is acceptable that the number "7" might be encoded in multiple different ways.
2. The code for an encoder or decoder must be able to be compact in order to support systems with very limited memory, processor power, and instruction sets.
* An encoder and a decoder need to be implementable in a very small amount of code (for example, in class 1 constrained nodes as defined in [CNN-TERMS]).
* The format should use contemporary machine representations of data (for example, not requiring binary-to-decimal conversion).
3. Data must be able to be decoded without a schema description.
* Similar to JSON, encoded data should be self-describing so that a generic decoder can be written.
4. The serialization must be reasonably compact, but data compactness is secondary to code compactness for the encoder and decoder.
* "Reasonable" here is bounded by JSON as an upper bound in size, and by implementation complexity maintaining a lower bound. Using either general compression schemes or extensive bit-fiddling violates the complexity goals.
5. The format must be applicable to both constrained nodes and high-volume applications.
* This means it must be reasonably frugal in CPU usage for both encoding and decoding. This is relevant both for constrained nodes and for potential usage in applications with a very high volume of data.
6. The format must support all JSON data types for conversion to and from JSON.
* It must support a reasonable level of conversion as long as the data represented is within the capabilities of JSON. It must be possible to define a unidirectional mapping towards JSON for all types of data.
7. The format must be extensible, and the extended data must be decodable by earlier decoders.
* The format is designed for decades of use.
* The format must support a form of extensibility that allows fallback so that a decoder that does not understand an extension can still decode the message.
* The format must be able to be extended in the future by later IETF standards.
1.2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119, BCP 14 [RFC2119] and indicate requirement levels for compliant CBOR implementations.
The term "byte" is used in its now-customary sense as a synonym for "octet". All multi-byte values are encoded in network byte order (that is, most significant byte first, also known as "big-endian").
This specification makes use of the following terminology:
Data item: A single piece of CBOR data. The structure of a data item may contain zero, one, or more nested data items. The term is used both for the data item in representation format and for the abstract idea that can be derived from that by a decoder.
Decoder: A process that decodes a CBOR data item and makes it available to an application. Formally speaking, a decoder contains a parser to break up the input using the syntax rules of CBOR, as well as a semantic processor to prepare the data in a form suitable to the application.
Encoder: A process that generates the representation format of a CBOR data item from application information.
Data Stream: A sequence of zero or more data items, not further assembled into a larger containing data item. The independent data items that make up a data stream are sometimes also referred to as "top-level data items".
Well-formed: A data item that follows the syntactic structure of CBOR. A well-formed data item uses the initial bytes and the byte strings and/or data items that are implied by their values as defined in CBOR and is not followed by extraneous data.
Valid: A data item that is well-formed and also follows the semantic restrictions that apply to CBOR data items.
Stream decoder: A process that decodes a data stream and makes each of the data items in the sequence available to an application as they are received.
Where bit arithmetic or data types are explained, this document uses the notation familiar from the programming language C, except that "**" denotes exponentiation. Similar to the "0x" notation for hexadecimal numbers, numbers in binary notation are prefixed with "0b". Underscores can be added to such a number solely for readability, so 0b00100001 (0x21) might be written 0b001_00001 to emphasize the desired interpretation of the bits in the byte; in this case, it is split into three bits and five bits.
2. Specification of the CBOR Encoding
A CBOR-encoded data item is structured and encoded as described in this section. The encoding is summarized in Table 5.
The initial byte of each data item contains both information about the major type (the high-order 3 bits, described in Section 2.1) and additional information (the low-order 5 bits). When the value of the additional information is less than 24, it is directly used as a small unsigned integer. When it is 24 to 27, the additional bytes for a variable-length integer immediately follow; the values 24 to 27 of the additional information specify that its length is a 1-, 2-, 4-, or 8-byte unsigned integer, respectively. Additional information value 31 is used for indefinite-length items, described in Section 2.2. Additional information values 28 to 30 are reserved for future expansion.
In all additional information values, the resulting integer is interpreted depending on the major type. It may represent the actual data: for example, in integer types, the resulting integer is used for the value itself. It may instead supply length information: for example, in byte strings it gives the length of the byte string data that follows.
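As a rough illustration of the layout just described, the following Python sketch splits an initial byte into its major type and additional information and reads any argument bytes that follow. The function name `decode_head` and its return shape are assumptions made for this example; they are not defined by this specification.

```python
def decode_head(data: bytes, offset: int = 0):
    """Return (major_type, additional_info, argument, next_offset).

    Hypothetical helper for illustration only. `argument` is None when the
    additional information is 31 (indefinite length, or "break").
    """
    initial = data[offset]
    major_type = initial >> 5        # high-order 3 bits
    additional = initial & 0x1F      # low-order 5 bits
    offset += 1
    if additional < 24:
        # The value is carried directly in the initial byte.
        return major_type, additional, additional, offset
    if additional <= 27:
        # 24, 25, 26, 27 announce a 1-, 2-, 4-, or 8-byte unsigned integer.
        size = 1 << (additional - 24)
        if offset + size > len(data):
            raise ValueError("input ends inside the argument bytes")
        # Multi-byte values are in network byte order (big-endian).
        argument = int.from_bytes(data[offset:offset + size], "big")
        return major_type, additional, argument, offset + size
    if additional == 31:
        return major_type, additional, None, offset
    raise ValueError("additional information 28 to 30 is reserved")
```

For the three bytes 0x1901f4, this sketch reports major type 0, additional information 25, and the argument 500. Whether the argument is then used as a value, a length, or a count depends on the major type.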
A CBOR decoder implementation can be based on a jump table with all 256 defined values for the initial byte (Table 5). A decoder in a constrained implementation can instead use the structure of the initial byte and following bytes for more compact code (see Appendix C for a rough impression of how this could look).
2.1. Major Types
The following lists the major types and the additional information and other bytes associated with the type.
Major type 0: an unsigned integer. The 5-bit additional information is either the integer itself (for additional information values 0 through 23) or the length of additional data. Additional information 24 means the value is represented in an additional uint8_t, 25 means a uint16_t, 26 means a uint32_t, and 27 means a uint64_t. For example, the integer 10 is denoted as the one byte 0b000_01010 (major type 0, additional information 10). The integer 500 would be 0b000_11001 (major type 0, additional information 25) followed by the two bytes 0x01f4, which is 500 in decimal.
Major type 1: a negative integer. The encoding follows the rules for unsigned integers (major type 0), except that the value is then -1 minus the encoded unsigned integer. For example, the integer -500 would be 0b001_11001 (major type 1, additional information 25) followed by the two bytes 0x01f3, which is 499 in decimal.
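The integer rules for major types 0 and 1 can be sketched in Python as follows. The names `encode_head` and `encode_int` are hypothetical, and choosing the shortest possible form is a decision of this sketch rather than a requirement stated above.

```python
import struct

def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a major type with an unsigned argument, shortest form first."""
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument does not fit in 64 bits")
    if argument < 24:
        # Small values live in the low-order 5 bits of the initial byte.
        return bytes([(major_type << 5) | argument])
    forms = ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
             (26, ">I", 1 << 32), (27, ">Q", 1 << 64))
    for additional, fmt, limit in forms:
        if argument < limit:
            # ">" selects network byte order for the following bytes.
            return bytes([(major_type << 5) | additional]) + struct.pack(fmt, argument)

def encode_int(value: int) -> bytes:
    """Encode a Python int as major type 0 (>= 0) or major type 1 (< 0)."""
    if value >= 0:
        return encode_head(0, value)
    # A negative integer n is stored as the unsigned integer -1 - n.
    return encode_head(1, -1 - value)
```

With this sketch, `encode_int(500)` gives the bytes 0x1901f4 and `encode_int(-500)` gives 0x3901f3, matching the two examples in the text.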
Major type 2: a byte string. The string's length in bytes is represented following the rules for unsigned integers (major type 0). For example, a byte string whose length is 5 would have an initial byte of 0b010_00101 (major type 2, additional information 5 for the length), followed by 5 bytes of binary content. A byte string whose length is 500 would begin with 3 bytes: the initial byte 0b010_11001 (major type 2, additional information 25 to indicate a two-byte length) and the two bytes 0x01f4 for a length of 500. These are followed by 500 bytes of binary content.
Major type 3: a text string, specifically a string of Unicode characters that is encoded as UTF-8 [RFC3629]. The format of this type is identical to that of byte strings (major type 2), that is, as with major type 2, the length gives the number of bytes. This type is provided for systems that need to interpret or display human-readable text, and allows the differentiation between unstructured bytes and text that has a specified repertoire and encoding. In contrast to formats such as JSON, the Unicode characters in this type are never escaped. Thus, a newline character (U+000A) is always represented in a string as the byte 0x0a, and never as the bytes 0x5c6e (the characters "\" and "n") or as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and "a").
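A minimal Python sketch of the two string types follows. The helper names are hypothetical, and the length is always written in its shortest form, which is a choice of this sketch. Note that the length of a text string counts UTF-8 bytes, not characters.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Hypothetical helper: initial byte plus any big-endian length bytes."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major_type << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("length does not fit in 64 bits")

def encode_bytes(content: bytes) -> bytes:
    """Major type 2: length in bytes, then the raw content."""
    return encode_head(2, len(content)) + content

def encode_text(text: str) -> bytes:
    """Major type 3: length in bytes of the UTF-8 form, then the UTF-8 bytes."""
    raw = text.encode("utf-8")   # no escaping: a newline stays the byte 0x0a
    return encode_head(3, len(raw)) + raw
```

In this sketch a 500-byte byte string starts with the three bytes 0x5901f4, and a text string holding only a newline is encoded as the two bytes 0x610a.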
Major type 4: an array of data items. Arrays are also called lists, sequences, or tuples. The array's length follows the rules for byte strings (major type 2), except that the length denotes the number of data items, not the length in bytes that the array takes up. Items in an array do not need to all be of the same type. For example, an array that contains 10 items of any type would have an initial byte of 0b100_01010 (major type of 4, additional information of 10 for the length) followed by the 10 remaining items.
Major type 5: a map of pairs of data items. Maps are also called tables, dictionaries, hashes, or objects (in JSON). A map is comprised of pairs of data items, each pair consisting of a key that is immediately followed by a value. The map's length follows the rules for byte strings (major type 2), except that the length denotes the number of pairs, not the length in bytes that the map takes up. For example, a map that contains 9 pairs would have an initial byte of 0b101_01001 (major type of 5, additional information of 9 for the number of pairs) followed by the 18 remaining items. The first item is the first key, the second item is the first value, the third item is the second key, and so on. A map that has duplicate keys may be well-formed, but it is not valid, and thus it causes indeterminate decoding; see also Section 3.7.
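The way arrays and maps nest other data items can be sketched with a small recursive encoder in Python. The function names are hypothetical, and the mapping from Python lists and dicts to major types 4 and 5 is an assumption of this example. Only integers, text strings, lists, and dicts are handled.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Hypothetical helper: initial byte plus any big-endian argument bytes."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major_type << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")

def encode_item(item) -> bytes:
    """Encode an int, str, list, or dict as a definite-length data item."""
    if isinstance(item, bool):
        raise TypeError("simple values are outside this sketch")
    if isinstance(item, int):
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return encode_head(3, len(raw)) + raw
    if isinstance(item, list):
        # The argument counts data items, not bytes.
        return encode_head(4, len(item)) + b"".join(encode_item(x) for x in item)
    if isinstance(item, dict):
        # The argument counts pairs; each key is immediately followed by its value.
        out = encode_head(5, len(item))
        for key, value in item.items():
            out += encode_item(key) + encode_item(value)
        return out
    raise TypeError("type not handled in this sketch")
```

Because a Python dict cannot hold duplicate keys, this sketch never produces the invalid duplicate-key maps mentioned above. As a small example, `encode_item([1, [2, 3]])` gives the bytes 0x8201820203.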
Major type 6: optional semantic tagging of other major types. See Section 2.4.
Major type 7: floating-point numbers and simple data types that need no content, as well as the "break" stop code. See Section 2.3.
These eight major types lead to a simple table showing which of the 256 possible values for the initial byte of a data item are used (Table 5).
In major types 6 and 7, many of the possible values are reserved for future specification. See Section 7 for more information on these values.
2.2. Indefinite Lengths for Some Major Types
Four CBOR items (arrays, maps, byte strings, and text strings) can be encoded with an indefinite length using additional information value 31. This is useful if the encoding of the item needs to begin before the number of items inside the array or map, or the total length of the string, is known. (The application of this is often referred to as "streaming" within a data item.)
Indefinite-length arrays and maps are dealt with differently than indefinite-length byte strings and text strings.
2.2.1. Indefinite-Length Arrays and Maps
Indefinite-length arrays and maps are simply opened without indicating the number of data items that will be included in the array or map, using the additional information value of 31. The initial major type and additional information byte is followed by the elements of the array or map, just as they would be in other arrays or maps. The end of the array or map is indicated by encoding a "break" stop code in a place where the next data item would normally have been included. The "break" is encoded with major type 7 and additional information value 31 (0b111_11111) but is not itself a data item: it is just a syntactic feature to close the array or map. That is, the "break" stop code comes after the last item in the array or map, and it cannot occur anywhere else in place of a data item. In this way, indefinite-length arrays and maps look identical to other arrays and maps except for beginning with the additional information value 31 and ending with the "break" stop code.
Arrays and maps with indefinite lengths allow any number of items (for arrays) and key/value pairs (for maps) to be given before the "break" stop code. There is no restriction against nesting indefinite-length array or map items. A "break" only terminates a single item, so nested indefinite-length items need exactly as many "break" stop codes as there are type bytes starting an indefinite-length item.
For example, assume an encoder wants to represent the abstract array [1, [2, 3], [4, 5]]. The definite-length encoding would be 0x8301820203820405:
83        -- Array of length 3
   01     -- 1
   82     -- Array of length 2
      02  -- 2
      03  -- 3
   82     -- Array of length 2
      04  -- 4
      05  -- 5
Indefinite-length encoding could be applied independently to each of the three arrays encoded in this data item, as required, leading to representations such as:
0x9f018202039f0405ffff
9F        -- Start indefinite-length array
   01     -- 1
   82     -- Array of length 2
      02  -- 2
      03  -- 3
   9F     -- Start indefinite-length array
      04  -- 4
      05  -- 5
      FF  -- "break" (inner array)
   FF     -- "break" (outer array)
0x9f01820203820405ff
9F        -- Start indefinite-length array
   01     -- 1
   82     -- Array of length 2
      02  -- 2
      03  -- 3
   82     -- Array of length 2
      04  -- 4
      05  -- 5
   FF     -- "break"
0x83018202039f0405ff
83        -- Array of length 3
   01     -- 1
   82     -- Array of length 2
      02  -- 2
      03  -- 3
   9F     -- Start indefinite-length array
      04  -- 4
      05  -- 5
      FF  -- "break"
0x83019f0203ff820405
83        -- Array of length 3
   01     -- 1
   9F     -- Start indefinite-length array
      02  -- 2
      03  -- 3
      FF  -- "break"
   82     -- Array of length 2
      04  -- 4
      05  -- 5
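To check that the encodings above all describe the same abstract array, a small decoder can be sketched in Python. It handles only integers and arrays, and the names `read_argument` and `decode_item` are hypothetical. Note how each "break" closes exactly one indefinite-length array, and how a "break" found where a data item is required is rejected.

```python
BREAK = 0xFF  # major type 7, additional information 31

def read_argument(data: bytes, offset: int, additional: int):
    """Return (argument, next_offset) for additional information 0..27."""
    if additional < 24:
        return additional, offset
    if additional <= 27:
        size = 1 << (additional - 24)
        if offset + size > len(data):
            raise ValueError("input ends inside the argument bytes")
        return int.from_bytes(data[offset:offset + size], "big"), offset + size
    raise ValueError("additional information not usable here")

def decode_item(data: bytes, offset: int = 0):
    """Decode one integer or array; return (value, next_offset)."""
    initial = data[offset]
    major_type, additional = initial >> 5, initial & 0x1F
    offset += 1
    if major_type == 4 and additional == 31:
        items = []
        while data[offset] != BREAK:          # read items until the "break"
            value, offset = decode_item(data, offset)
            items.append(value)
        return items, offset + 1              # step over this array's "break"
    argument, offset = read_argument(data, offset, additional)
    if major_type == 0:
        return argument, offset
    if major_type == 1:
        return -1 - argument, offset
    if major_type == 4:
        items = []
        for _ in range(argument):             # argument is the item count
            value, offset = decode_item(data, offset)
            items.append(value)
        return items, offset
    raise ValueError("major type not handled in this sketch")
```

Applied to any of the five encodings shown above, this sketch yields the nested list `[1, [2, 3], [4, 5]]`. Truncated input surfaces as an `IndexError` or `ValueError` rather than being handled gracefully.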
An example of an indefinite-length map (that happens to have two key/value pairs) might be:
0xbf6346756ef563416d7421ff
BF           -- Start indefinite-length map
   63        -- First key, UTF-8 string length 3
      46756e --   "Fun"
   F5        -- First value, true
   63        -- Second key, UTF-8 string length 3
      416d74 --   "Amt"
   21        -- -2
   FF        -- "break"
2.2.2. Indefinite-Length Byte Strings and Text Strings
Indefinite-length byte strings and text strings are actually a concatenation of zero or more definite-length byte or text strings ("chunks") that are together treated as one contiguous string. Indefinite-length strings are opened with the major type and additional information value of 31, but what follows are a series of byte or text strings that have definite lengths (the chunks). The end of the series of chunks is indicated by encoding the "break" stop code (0b111_11111) in a place where the next chunk in the series would occur. The contents of the chunks are concatenated together, and the overall length of the indefinite-length string will be the sum of the lengths of all of the chunks. In summary, an indefinite-length string is encoded similarly to how an indefinite-length array of its chunks would be encoded, except that the major type of the indefinite-length string is that of a (text or byte) string and matches the major types of its chunks.
For indefinite-length byte strings, every data item (chunk) between the indefinite-length indicator and the "break" MUST be a definite-length byte string item; if the parser sees any item type other than a byte string before it sees the "break", it is an error.
对于不定长字节字符串,不定长指示符和“break”之间的每个数据项(块)必须是定长字节字符串项;如果解析器在看到“break”之前看到除字节字符串以外的任何项类型,则这是一个错误。
For example, assume the sequence:
例如,假设序列为:
0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111

5F              -- Start indefinite-length byte string
   44           -- Byte string of length 4
      aabbccdd  -- Bytes content
   43           -- Byte string of length 3
      eeff99    -- Bytes content
   FF           -- "break"
After decoding, this results in a single byte string with seven bytes: 0xaabbccddeeff99.
解码后,将生成一个包含七个字节的单字节字符串:0xaabbccddeeff99。
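The chunk concatenation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the specification: the function name `decode_indefinite_bytes` is hypothetical, and it assumes every chunk length fits in the 5-bit additional information (0..23), which is enough for the example sequence.

```python
def decode_indefinite_bytes(data: bytes) -> bytes:
    """Concatenate the chunks of an indefinite-length byte string.

    Assumption for this sketch: chunk lengths are 0..23, i.e. encoded
    directly in the additional information of the chunk's initial byte.
    """
    if not data or data[0] != 0x5F:  # 0b010_11111
        raise ValueError("not an indefinite-length byte string")
    out = bytearray()
    pos = 1
    while True:
        if pos >= len(data):
            raise ValueError("data ended before the break stop code")
        initial = data[pos]
        pos += 1
        if initial == 0xFF:  # "break" stop code, 0b111_11111
            return bytes(out)
        major, info = initial >> 5, initial & 0x1F
        if major != 2:
            # Any item other than a byte string before the break is an error.
            raise ValueError("chunk is not a definite-length byte string")
        if info > 23:
            raise ValueError("chunk length form not handled in this sketch")
        if pos + info > len(data):
            raise ValueError("chunk is truncated")
        out += data[pos:pos + info]
        pos += info


example = bytes.fromhex("5f44aabbccdd43eeff99ff")
result = decode_indefinite_bytes(example)
```

Here `result` holds the seven bytes of the two chunks joined in order. A text-string variant would check for major type 3 instead and would additionally have to validate each chunk as UTF-8 on its own.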
Text strings with indefinite lengths act the same as byte strings with indefinite lengths, except that all their chunks MUST be definite-length text strings. Note that this implies that the bytes of a single UTF-8 character cannot be spread between chunks: a new chunk can only be started at a character boundary.
长度不定的文本字符串与长度不定的字节字符串的作用相同,只是它们的所有块必须是定长文本字符串。请注意,这意味着单个UTF-8字符的字节不能分布在块之间:新块只能在字符边界处启动。
Major type 7 is for two types of data: floating-point numbers and "simple values" that do not need any content. Each value of the 5-bit additional information in the initial byte has its own separate meaning, as defined in Table 1. Like the major types for integers, items of this major type do not carry content data; all the information is in the initial bytes.
主要类型7用于两种类型的数据:浮点数和不需要任何内容的“简单值”。初始字节中5位附加信息的每个值都有其单独的含义,如表1所示。与整数的主要类型一样,此主要类型的项不携带内容数据;所有信息都在初始字节中。
+-------------+--------------------------------------------------+
| 5-Bit Value | Semantics                                        |
+-------------+--------------------------------------------------+
| 0..23       | Simple value (value 0..23)                       |
|             |                                                  |
| 24          | Simple value (value 32..255 in following byte)   |
|             |                                                  |
| 25          | IEEE 754 Half-Precision Float (16 bits follow)   |
|             |                                                  |
| 26          | IEEE 754 Single-Precision Float (32 bits follow) |
|             |                                                  |
| 27          | IEEE 754 Double-Precision Float (64 bits follow) |
|             |                                                  |
| 28-30       | (Unassigned)                                     |
|             |                                                  |
| 31          | "break" stop code for indefinite-length items    |
+-------------+--------------------------------------------------+
Table 1: Values for Additional Information in Major Type 7
表1:主要类型7的附加信息值
As with all other major types, the 5-bit value 24 signifies a single-byte extension: it is followed by an additional byte to represent the simple value. (To minimize confusion, only the values 32 to 255 are used.) This maintains the structure of the initial bytes: as for the other major types, the length of these always depends on the additional information in the first byte. Table 2 lists the values assigned and available for simple types.
与所有其他主要类型一样,5位值24表示一个单字节扩展:后跟一个附加字节以表示简单值。(为了尽量减少混淆,只使用值32到255。)这保持了初始字节的结构:对于其他主要类型,它们的长度始终取决于第一个字节中的附加信息。表2列出了为简单类型分配和可用的值。
+---------+-----------------+
| Value   | Semantics       |
+---------+-----------------+
| 0..19   | (Unassigned)    |
|         |                 |
| 20      | False           |
|         |                 |
| 21      | True            |
|         |                 |
| 22      | Null            |
|         |                 |
| 23      | Undefined value |
|         |                 |
| 24..31  | (Reserved)      |
|         |                 |
| 32..255 | (Unassigned)    |
+---------+-----------------+
Table 2: Simple Values
表2:简单值
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit IEEE 754 binary floating-point values. These floating-point values are encoded in the additional bytes of the appropriate size. (See Appendix D for some information about 16-bit floating point.)
25、26和27的5位值用于16位、32位和64位IEEE 754二进制浮点值。这些浮点值以适当大小的附加字节进行编码。(有关16位浮点的一些信息,请参见附录D。)
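To make Table 1 concrete, the following Python sketch dispatches on the 5-bit additional information of a major type 7 item. It is an illustration under stated assumptions: the function name `decode_major7` and the `(kind, value)` return shape are invented here, and the sketch relies on the `"e"` (half-precision) format code of Python's standard `struct` module.

```python
import struct

# Additional information -> struct format (big-endian IEEE 754).
FLOAT_FORMATS = {25: ">e", 26: ">f", 27: ">d"}


def decode_major7(data: bytes):
    """Decode one major type 7 item into a (kind, value) pair."""
    if not data or data[0] >> 5 != 7:
        raise ValueError("not a major type 7 item")
    info = data[0] & 0x1F
    if info <= 23:
        return ("simple", info)  # value is in the initial byte itself
    if info == 24:
        if len(data) < 2:
            raise ValueError("missing extension byte")
        if data[1] < 32:
            raise ValueError("simple values below 32 must not use an extension byte")
        return ("simple", data[1])
    if info in FLOAT_FORMATS:
        fmt = FLOAT_FORMATS[info]
        size = struct.calcsize(fmt)
        if len(data) < 1 + size:
            raise ValueError("floating-point value is truncated")
        return ("float", struct.unpack(fmt, data[1:1 + size])[0])
    if info == 31:
        return ("break", None)
    raise ValueError("unassigned additional information value")


true_item = decode_major7(bytes.fromhex("f5"))
half_item = decode_major7(bytes.fromhex("f93e00"))
```

The simple values 20 to 23 could then be mapped to native false, true, null, and undefined representations by the caller; the sketch leaves them as numbers so that unassigned simple values can be passed through unchanged.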
In CBOR, a data item can optionally be preceded by a tag to give it additional semantics while retaining its structure. The tag is major type 6, and represents an integer number as indicated by the tag's integer value; the (sole) data item is carried as content data. If a tag requires structured data, this structure is encoded into the nested data item. The definition of a tag usually restricts what kinds of nested data item or items can be carried by a tag.
在CBOR中,可以选择在数据项前面加上标记,以便在保留其结构的同时为其提供额外的语义。标记是主类型6,表示由标记的整数值指示的整数;(唯一)数据项作为内容数据携带。如果标记需要结构化数据,则此结构将编码到嵌套数据项中。标记的定义通常限制标记可以携带哪种类型的嵌套数据项。
The initial bytes of the tag follow the rules for positive integers (major type 0). The tag is followed by a single data item of any type. For example, assume that a byte string of length 12 is marked with a tag to indicate it is a positive bignum (Section 2.4.2). This would be marked as 0b110_00010 (major type 6, additional information 2 for the tag) followed by 0b010_01100 (major type 2, additional information of 12 for the length) followed by the 12 bytes of the bignum.
标记的初始字节遵循正整数规则(主类型0)。标记后面跟着任何类型的单个数据项。例如,假设一个长度为12的字节字符串被标记为一个标记,表示它是一个正的bignum(第2.4.2节)。这将被标记为0b110_00010(主类型6,标记的附加信息2),后跟0b010_01100(主类型2,长度的附加信息为12),后跟bignum的12个字节。
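Because a tag's initial bytes follow the same rules as an unsigned integer, one helper can produce the head of either. The sketch below is illustrative only; `encode_head` is a hypothetical name, and it covers values up to 64 bits.

```python
def encode_head(major: int, value: int) -> bytes:
    """Encode the initial byte(s) for a major type and an unsigned value."""
    if not 0 <= major <= 7:
        raise ValueError("major type must be 0..7")
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 24:
        # The value fits in the 5-bit additional information.
        return bytes([(major << 5) | value])
    # Additional information 24..27 announces 1, 2, 4, or 8 following bytes.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("value does not fit in 64 bits")


# Tag 2 followed by the head of a 12-byte byte string, as in the text.
bignum_prefix = encode_head(6, 2) + encode_head(2, 12)
```

With these inputs, `bignum_prefix` is the two bytes 0xc2 0x4c, matching 0b110_00010 followed by 0b010_01100; the 12 content bytes of the bignum would follow.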
Decoders do not need to understand tags, and thus tags may be of little value in applications where the implementation creating a particular CBOR data item and the implementation decoding that stream know the semantic meaning of each item in the data flow. Their primary purpose in this specification is to define common data types such as dates. A secondary purpose is to allow optional tagging when the decoder is a generic CBOR decoder that might be able to benefit from hints about the content of items. Understanding the semantic tags is optional for a decoder; it can just jump over the initial bytes of the tag and interpret the tagged data item itself.
解码器不需要理解标记,因此,在创建特定CBOR数据项的实现和该流的实现解码知道数据流中每个项的语义的应用中,标记可能没有什么价值。在本规范中,它们的主要目的是定义常见的数据类型,如日期。第二个目的是,当解码器是通用CBOR解码器时,允许进行可选标记,该解码器可能能够从项目内容的提示中获益。理解语义标记对于解码器是可选的;它可以跳过标记的初始字节并解释标记的数据项本身。
A tag always applies to the item that directly follows it. Thus, if tag A is followed by tag B, which is followed by data item C, tag A applies to the result of applying tag B on data item C. That is, a tagged item is a data item consisting of a tag and a value. The content of the tagged item is the data item (the value) that is being tagged.
标记始终应用于紧跟其后的项目。因此,如果标签A后面跟着标签B,标签B后面跟着数据项C,则标签A适用于将标签B应用于数据项C的结果。也就是说,标签项是由标签和值组成的数据项。标记项的内容是要标记的数据项(值)。
IANA maintains a registry of tag values as described in Section 7.2. Table 3 provides a list of initial values, with definitions in the rest of this section.
IANA维护第7.2节所述的标签值注册表。表3提供了初始值列表,定义见本节其余部分。
+--------------+------------------+---------------------------------+
| Tag          | Data Item        | Semantics                       |
+--------------+------------------+---------------------------------+
| 0            | UTF-8 string     | Standard date/time string; see  |
|              |                  | Section 2.4.1                   |
|              |                  |                                 |
| 1            | multiple         | Epoch-based date/time; see      |
|              |                  | Section 2.4.1                   |
|              |                  |                                 |
| 2            | byte string      | Positive bignum; see Section    |
|              |                  | 2.4.2                           |
|              |                  |                                 |
| 3            | byte string      | Negative bignum; see Section    |
|              |                  | 2.4.2                           |
|              |                  |                                 |
| 4            | array            | Decimal fraction; see Section   |
|              |                  | 2.4.3                           |
|              |                  |                                 |
| 5            | array            | Bigfloat; see Section 2.4.3     |
|              |                  |                                 |
| 6..20        | (Unassigned)     | (Unassigned)                    |
|              |                  |                                 |
| 21           | multiple         | Expected conversion to          |
|              |                  | base64url encoding; see         |
|              |                  | Section 2.4.4.2                 |
|              |                  |                                 |
| 22           | multiple         | Expected conversion to base64   |
|              |                  | encoding; see Section 2.4.4.2   |
|              |                  |                                 |
| 23           | multiple         | Expected conversion to base16   |
|              |                  | encoding; see Section 2.4.4.2   |
|              |                  |                                 |
| 24           | byte string      | Encoded CBOR data item; see     |
|              |                  | Section 2.4.4.1                 |
|              |                  |                                 |
| 25..31       | (Unassigned)     | (Unassigned)                    |
|              |                  |                                 |
| 32           | UTF-8 string     | URI; see Section 2.4.4.3        |
|              |                  |                                 |
| 33           | UTF-8 string     | base64url; see Section 2.4.4.3  |
|              |                  |                                 |
| 34           | UTF-8 string     | base64; see Section 2.4.4.3     |
|              |                  |                                 |
| 35           | UTF-8 string     | Regular expression; see         |
|              |                  | Section 2.4.4.3                 |
|              |                  |                                 |
| 36           | UTF-8 string     | MIME message; see Section       |
|              |                  | 2.4.4.3                         |
|              |                  |                                 |
| 37..55798    | (Unassigned)     | (Unassigned)                    |
|              |                  |                                 |
| 55799        | multiple         | Self-describe CBOR; see         |
|              |                  | Section 2.4.5                   |
|              |                  |                                 |
| 55800+       | (Unassigned)     | (Unassigned)                    |
+--------------+------------------+---------------------------------+
Table 3: Values for Tags
表3:标签的值
Tag value 0 is for date/time strings that follow the standard format described in [RFC3339], as refined by Section 3.3 of [RFC4287].
标记值0表示遵循[RFC3339]中所述标准格式的日期/时间字符串,如[RFC4287]第3.3节所述。
Tag value 1 is for numerical representation of seconds relative to 1970-01-01T00:00Z in UTC time. (For the non-negative values that the Portable Operating System Interface (POSIX) defines, the number of seconds is counted in the same way as for POSIX "seconds since the epoch" [TIME_T].) The tagged item can be a positive or negative integer (major types 0 and 1), or a floating-point number (major type 7 with additional information 25, 26, or 27). Note that the number can be negative (time before 1970-01-01T00:00Z) and, if a floating-point number, indicate fractional seconds.
标记值1表示UTC时间中相对于1970-01-01T00:00Z的秒数。(对于便携式操作系统接口(POSIX)定义的非负值,秒数的计算方法与POSIX“自历元起的秒数”[时间])的计算方法相同。标记项可以是正整数或负整数(主要类型0和1),也可以是浮点数(主要类型7,带有附加信息25、26或27). 请注意,数字可以是负数(1970-01-01T00:00Z之前的时间),如果是浮点数,则表示小数秒。
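As an illustration of the tag 1 semantics, an application might convert the tagged number to a native timestamp as sketched below. The helper name `epoch_to_datetime` is hypothetical, and the use of Python's `datetime` type is an assumption about the receiving application, not something CBOR prescribes.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_datetime(seconds):
    """Convert the content of a tag 1 item (int or float) to a UTC datetime.

    Adding a timedelta to the epoch handles negative values (times before
    1970) and fractional seconds (floating-point content) uniformly.
    """
    return EPOCH + timedelta(seconds=seconds)


whole = epoch_to_datetime(1363896240)
fractional = epoch_to_datetime(1363896240.5)
before_epoch = epoch_to_datetime(-1)
```

Note that `datetime` resolves only to microseconds, so a floating-point value with finer fractions would be rounded in this sketch.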
Bignums are integers that do not fit into the basic integer representations provided by major types 0 and 1. They are encoded as a byte string data item, which is interpreted as an unsigned integer n in network byte order. For tag value 2, the value of the bignum is n. For tag value 3, the value of the bignum is -1 - n. Decoders that understand these tags MUST be able to decode bignums that have leading zeroes.
Bignums是不符合主要类型0和1提供的基本整数表示形式的整数。它们被编码为字节字符串数据项,按网络字节顺序被解释为无符号整数n。对于标记值2,bignum的值为n。对于标记值3,bignum的值是-1-n。理解这些标记的解码器必须能够解码具有前导零的bignum。
For example, the number 18446744073709551616 (2**64) is represented as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major type 2, length 9), followed by 0x010000000000000000 (one byte 0x01 and eight bytes 0x00). In hexadecimal:
例如,数字18446744073709551616(2**64)表示为0b110_00010(主类型6,标记2),后跟0b010_01001(主类型2,长度9),后跟0x010000000000(一个字节0x01和八个字节0x00)。十六进制:
C2                        -- Tag 2
   49                     -- Byte string of length 9
      010000000000000000  -- Bytes content
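The bignum rules (value n for tag 2, value -1 - n for tag 3) can be exercised with a short Python sketch. It is illustrative only: the names `encode_bignum` and `decode_bignum` are hypothetical, and the sketch assumes the byte string is at most 23 bytes long so that its length fits in the initial byte. Note that 0b010_01001, the head of a 9-byte byte string, is 0x49 in hexadecimal.

```python
def encode_bignum(n: int) -> bytes:
    """Encode an integer as a tag 2 (n >= 0) or tag 3 (n < 0) bignum."""
    tag, magnitude = (0xC2, n) if n >= 0 else (0xC3, -1 - n)
    size = max(1, (magnitude.bit_length() + 7) // 8)
    if size > 23:
        raise ValueError("longer byte strings are not handled in this sketch")
    # 0x40 is major type 2 (byte string); the length goes in the low 5 bits.
    return bytes([tag, 0x40 | size]) + magnitude.to_bytes(size, "big")


def decode_bignum(data: bytes) -> int:
    """Decode a tag 2 or tag 3 bignum; leading zero bytes are accepted."""
    if len(data) < 2 or data[0] not in (0xC2, 0xC3):
        raise ValueError("not a bignum tag")
    if data[1] >> 5 != 2 or (data[1] & 0x1F) > 23:
        raise ValueError("bignum content must be a short byte string here")
    length = data[1] & 0x1F
    content = data[2:2 + length]
    if len(content) != length:
        raise ValueError("byte string is truncated")
    n = int.from_bytes(content, "big")  # network byte order
    return n if data[0] == 0xC2 else -1 - n


encoded = encode_bignum(2**64)
```

For 2**64, `encoded` is 0xc2 0x49 followed by one byte 0x01 and eight bytes 0x00, in line with the binary breakdown given in the text.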
Decimal fractions combine an integer mantissa with a base-10 scaling factor. They are most useful if an application needs the exact representation of a decimal fraction such as 1.1 because there is no exact representation for many decimal fractions in binary floating point.
小数将整数尾数与以10为基数的比例因子相结合。如果应用程序需要精确表示十进制分数(如1.1),它们最有用,因为二进制浮点中的许多十进制分数没有精确表示。
Bigfloats combine an integer mantissa with a base-2 scaling factor. They are binary floating-point values that can exceed the range or the precision of the three IEEE 754 formats supported by CBOR (Section 2.3). Bigfloats may also be used by constrained applications that need some basic binary floating-point capability without the need for supporting IEEE 754.
BigFloat将整数尾数与基数为2的比例因子相结合。它们是二进制浮点值,可能超过CBOR支持的三种IEEE 754格式的范围或精度(第2.3节)。BigFloat也可用于需要一些基本二进制浮点功能而不需要支持IEEE 754的受限应用程序。
A decimal fraction or a bigfloat is represented as a tagged array that contains exactly two integer numbers: an exponent e and a mantissa m. Decimal fractions (tag 4) use base-10 exponents; the value of a decimal fraction data item is m*(10**e). Bigfloats (tag 5) use base-2 exponents; the value of a bigfloat data item is m*(2**e). The exponent e MUST be represented in an integer of major type 0 or 1, while the mantissa also can be a bignum (Section 2.4.2).
小数或大浮点表示为一个标记数组,该数组正好包含两个整数:指数e和尾数m。小数(标记4)使用以10为基数的指数;小数点数据项的值为m*(10**e)。BigFloat(标记5)使用基数为2的指数;bigfloat数据项的值为m*(2**e)。指数e必须用主类型为0或1的整数表示,尾数也可以是bignum(第2.4.2节)。
An example of a decimal fraction is that the number 273.15 could be represented as 0b110_00100 (major type of 6 for the tag, additional information of 4 for the type of tag), followed by 0b100_00010 (major type of 4 for the array, additional information of 2 for the length of the array), followed by 0b001_00001 (major type of 1 for the first integer, additional information of 1 for the value of -2), followed by 0b000_11001 (major type of 0 for the second integer, additional information of 25 for a two-byte value), followed by 0b0110101010110011 (27315 in two bytes). In hexadecimal:
小数的一个例子是,数字273.15可以表示为0b110_00100(标记的主要类型为6,标记类型的附加信息为4),然后是0b100_00010(数组的主要类型为4,数组长度的附加信息为2),然后是0b001_00001(第一个整数的主类型为1,值为-2的附加信息为1),然后是0b000_11001(第二个整数的主类型为0,两个字节值的附加信息为25),然后是0b0110101010110011(两个字节中的27315)。十六进制:
C4             -- Tag 4
   82          -- Array of length 2
      21       -- -2
      19 6ab3  -- 27315
An example of a bigfloat is that the number 1.5 could be represented as 0b110_00101 (major type of 6 for the tag, additional information of 5 for the type of tag), followed by 0b100_00010 (major type of 4 for the array, additional information of 2 for the length of the array), followed by 0b001_00000 (major type of 1 for the first integer, additional information of 0 for the value of -1), followed by 0b000_00011 (major type of 0 for the second integer, additional information of 3 for the value of 3). In hexadecimal:
大浮点的一个例子是数字1.5可以表示为0b110_00101(标记的主要类型为6,标记类型的附加信息为5),然后是0b100_00010(数组的主要类型为4,数组长度的附加信息为2),最后是0b001_00000(第一个整数的主类型为1,值为-1的附加信息为0),然后是0b000_00011(第二个整数的主类型为0,值为3的附加信息为3)。十六进制:
C5             -- Tag 5
   82          -- Array of length 2
      20       -- -1
      03       -- 3
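The two formulas m*(10**e) and m*(2**e) can be checked against the examples with exact rational arithmetic. The sketch below is illustrative; `tagged_value` is a hypothetical helper that takes the already-decoded exponent and mantissa, and the choice of Python's `Fraction` type is an assumption made to avoid binary rounding.

```python
from fractions import Fraction

# Tag number -> base of the scaling factor.
BASES = {4: 10, 5: 2}


def tagged_value(tag: int, exponent: int, mantissa: int) -> Fraction:
    """Return the exact value of a decimal fraction (tag 4) or bigfloat (tag 5)."""
    if tag not in BASES:
        raise ValueError("expected tag 4 or tag 5")
    return mantissa * Fraction(BASES[tag]) ** exponent


decimal_example = tagged_value(4, -2, 27315)  # the array [-2, 27315]
bigfloat_example = tagged_value(5, -1, 3)     # the array [-1, 3]
```

Here `decimal_example` is exactly 27315/100 and `bigfloat_example` is exactly 3/2. Converting 273.15 to a binary floating-point value would lose that exactness, which is the motivation given above for decimal fractions.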
Decimal fractions and bigfloats provide no representation of Infinity, -Infinity, or NaN; if these are needed in place of a decimal fraction or bigfloat, the IEEE 754 half-precision representations from Section 2.3 can be used. For constrained applications, where there is a choice between representing a specific number as an integer and as a decimal fraction or bigfloat (such as when the exponent is small and non-negative), there is a quality-of-implementation expectation that the integer representation is used directly.
小数和大浮点数不表示无穷大、无穷大或NaN;如果需要用它们代替小数或大浮点,则可以使用第2.3节中的IEEE 754半精度表示法。对于受约束的应用程序,如果可以选择将特定数字表示为整数和十进制分数或bigfloat(例如当指数很小且非负时),则实现质量要求直接使用整数表示。
The tags in this section are for content hints that might be used by generic CBOR processors.
本节中的标记用于通用CBOR处理器可能使用的内容提示。
Sometimes it is beneficial to carry an embedded CBOR data item that is not meant to be decoded immediately at the time the enclosing data item is being parsed. Tag 24 (CBOR data item) can be used to tag the embedded byte string as a data item encoded in CBOR format.
有时,携带嵌入的CBOR数据项是有益的,该数据项不意味着在解析封闭数据项时立即解码。标记24(CBOR数据项)可用于将嵌入的字节字符串标记为以CBOR格式编码的数据项。
Tags 21 to 23 indicate that a byte string might require a specific encoding when interoperating with a text-based representation. These tags are useful when an encoder knows that the byte string data it is writing is likely to be later converted to a particular JSON-based usage. That usage specifies that some strings are encoded as base64, base64url, and so on. The encoder uses byte strings instead of doing the encoding itself to reduce the message size, to reduce the code size of the encoder, or both. The encoder does not know whether or not the converter will be generic, and therefore wants to say what it believes is the proper way to convert binary strings to JSON.
标记21到23表示在与基于文本的表示进行互操作时,字节字符串可能需要特定的编码。当编码器知道它正在写入的字节字符串数据可能稍后转换为基于JSON的特定用法时,这些标记非常有用。该用法指定将某些字符串编码为base64、base64url等。编码器使用字节字符串而不是编码本身来减小消息大小,或者减小编码器的代码大小,或者两者兼而有之。编码器不知道转换器是否是通用的,因此想要说明它认为将二进制字符串转换为JSON的正确方法。
The data item tagged can be a byte string or any other data item. In the latter case, the tag applies to all of the byte string data items contained in the data item, except for those contained in a nested data item tagged with an expected conversion.
标记的数据项可以是字节字符串或任何其他数据项。在后一种情况下,标记应用于数据项中包含的所有字节字符串数据项,但包含在使用预期转换标记的嵌套数据项中的数据项除外。
These three tag types suggest conversions to three of the base data encodings defined in [RFC4648]. For base64url encoding, padding is not used (see Section 3.2 of RFC 4648); that is, all trailing equals signs ("=") are removed from the base64url-encoded string. Later tags might be defined for other data encodings of RFC 4648 or for other ways to encode binary data in strings.
这三种标记类型建议转换为[RFC4648]中定义的三种基本数据编码。对于base64url编码,不使用填充(参见RFC 4648第3.2节);也就是说,所有尾随等号(“=”)都将从base64url编码字符串中删除。稍后的标记可以为RFC4648的其他数据编码或以字符串形式编码二进制数据的其他方式定义。
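A converter that honors tags 21 to 23 might apply the three encodings as sketched below, using Python's standard `base64` module. The function names are hypothetical; the only behavior taken from the text is that base64url output carries no padding.

```python
import base64


def to_base64url(data: bytes) -> str:
    """Tag 21: base64url, with all trailing "=" padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def to_base64(data: bytes) -> str:
    """Tag 22: classical base64 (padding kept in this sketch)."""
    return base64.b64encode(data).decode("ascii")


def to_base16(data: bytes) -> str:
    """Tag 23: base16 (uppercase hexadecimal)."""
    return base64.b16encode(data).decode("ascii")


raw = bytes.fromhex("01020304")
as_url = to_base64url(raw)
as_b64 = to_base64(raw)
as_hex = to_base16(raw)
```

For the four bytes above, `as_url` is "AQIDBA", `as_b64` is "AQIDBA==", and `as_hex` is "01020304".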
Some text strings hold data that have formats widely used on the Internet, and sometimes those formats can be validated and presented to the application in appropriate form by the decoder. There are tags for some of these formats.
有些文本字符串保存的数据格式在互联网上广泛使用,有时解码器可以验证这些格式,并将其以适当的形式呈现给应用程序。其中一些格式有标签。
o Tag 32 is for URIs, as defined in [RFC3986];
o 标签32用于URI,如[RFC3986]中所定义;
o Tags 33 and 34 are for base64url- and base64-encoded text strings, as defined in [RFC4648];
o 标记33和34用于base64url和base64编码的文本字符串,如[RFC4648]中所定义;
o Tag 35 is for regular expressions in Perl Compatible Regular Expressions (PCRE) / JavaScript syntax [ECMA262];
o 标记35用于Perl兼容正则表达式(PCRE)/JavaScript语法[ECMA262]中的正则表达式。
o Tag 36 is for MIME messages (including all headers), as defined in [RFC2045].
o 标记36用于[RFC2045]中定义的MIME消息(包括所有头);
Note that tags 33 and 34 differ from 21 and 22 in that the data is transported in base-encoded form for the former and in raw byte string form for the latter.
注意,标记33和34与21和22的不同之处在于,前者以基本编码形式传输数据,后者以原始字节字符串形式传输数据。
In many applications, it will be clear from the context that CBOR is being employed for encoding a data item. For instance, a specific protocol might specify the use of CBOR, or a media type is indicated that specifies its use. However, there may be applications where such context information is not available, such as when CBOR data is stored in a file and disambiguating metadata is not in use. Here, it may help to have some distinguishing characteristics for the data itself.
在许多应用中,从上下文中可以清楚地看出,CBOR被用于对数据项进行编码。例如,特定的协议可能指定CBOR的使用,或者指示指定其使用的媒体类型。然而,在某些应用程序中,此类上下文信息可能不可用,例如CBOR数据存储在文件中,而消除歧义元数据未被使用。在这里,它可能有助于对数据本身具有一些区别特征。
Tag 55799 is defined for this purpose. It does not impart any special semantics on the data item that follows; that is, the semantics of a data item tagged with tag 55799 is exactly identical to the semantics of the data item itself.
标签55799是为此目的而定义的。它不会对后面的数据项赋予任何特殊语义;也就是说,使用标记55799标记的数据项的语义与数据项本身的语义完全相同。
The serialization of this tag is 0xd9d9f7, which appears not to be in use as a distinguishing mark for frequently used file types. In particular, it is not a valid start of a Unicode text in any Unicode encoding if followed by a valid CBOR data item.
此标记的序列化为0xd9d9f7,它似乎没有用作常用文件类型的区分标记。特别是,如果后跟有效的CBOR数据项,则在任何Unicode编码中,它都不是Unicode文本的有效开头。
For instance, a decoder might be able to parse both CBOR and JSON. Such a decoder would need to mechanically distinguish the two formats. An easy way for an encoder to help the decoder would be to tag the entire CBOR item with tag 55799, the serialization of which will never be found at the beginning of a JSON text.
例如,解码器可能能够解析CBOR和JSON。这样的解码器需要机械地区分这两种格式。编码器帮助解码器的一个简单方法是使用标记55799标记整个CBOR项,在JSON文本的开头永远找不到该标记的序列化。
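A decoder that accepts several formats could test for the three-byte serialization of tag 55799 before choosing a parser. The following Python sketch is illustrative only; both function names are hypothetical.

```python
SELF_DESCRIBE_PREFIX = bytes.fromhex("d9d9f7")  # serialization of tag 55799


def has_self_describe_tag(data: bytes) -> bool:
    """Return True if the data starts with the self-describe CBOR tag."""
    return data.startswith(SELF_DESCRIBE_PREFIX)


def strip_self_describe_tag(data: bytes) -> bytes:
    """Drop the tag; the remaining item has exactly the same semantics."""
    if has_self_describe_tag(data):
        return data[len(SELF_DESCRIBE_PREFIX):]
    return data


tagged = SELF_DESCRIBE_PREFIX + bytes.fromhex("83010203")  # tagged array [1, 2, 3]
json_text = b'[1, 2, 3]'
```

The absence of the prefix does not prove that the data is not CBOR, since the tag is optional; the check only helps when the encoder chose to add it.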
Data formats such as CBOR are often used in environments where there is no format negotiation. A specific design goal of CBOR is to not need any included or assumed schema: a decoder can take a CBOR item and decode it with no other knowledge.
诸如CBOR之类的数据格式通常用于没有格式协商的环境中。CBOR的一个特定设计目标是不需要任何包含的或假定的模式:解码器可以获取CBOR项并在没有其他知识的情况下对其进行解码。
Of course, in real-world implementations, the encoder and the decoder will have a shared view of what should be in a CBOR data item. For example, an agreed-to format might be "the item is an array whose first value is a UTF-8 string, second value is an integer, and subsequent values are zero or more floating-point numbers" or "the item is a map that has byte strings for keys and contains at least one pair whose key is 0xab01".
当然,在现实世界的实现中,编码器和解码器将拥有CBOR数据项中应该包含的内容的共享视图。例如,商定的格式可能是“该项是一个数组,其第一个值为UTF-8字符串,第二个值为整数,后续值为零或多个浮点数”或“该项是一个映射,其中包含密钥的字节字符串,并且至少包含一对密钥为0xab01”。
This specification puts no restrictions on CBOR-based protocols. An encoder can be capable of encoding as many or as few types of values as is required by the protocol in which it is used; a decoder can be capable of understanding as many or as few types of values as is required by the protocols in which it is used. This lack of restrictions allows CBOR to be used in extremely constrained environments.
本规范对基于CBOR的协议没有任何限制。编码器能够根据使用它的协议的要求编码任意多或任意少的值类型;解码器能够理解使用它的协议所要求的尽可能多或尽可能少的值类型。由于缺乏限制,CBOR可以在极度受限的环境中使用。
This section discusses some considerations in creating CBOR-based protocols. It is advisory only and explicitly excludes any language from RFC 2119 other than words that could be interpreted as "MAY" in the sense of RFC 2119.
本节讨论创建基于CBOR的协议时的一些注意事项。它只是建议性的,并明确排除RFC 2119中的任何语言,但RFC 2119意义上可解释为“可能”的词语除外。
In a streaming application, a data stream may be composed of a sequence of CBOR data items concatenated back-to-back. In such an environment, the decoder immediately begins decoding a new data item if data is found after the end of a previous data item.
在流应用中,数据流可以由背靠背连接的CBOR数据项序列组成。在这样的环境中,如果在先前数据项的结束之后发现数据,则解码器立即开始解码新数据项。
Not all of the bytes making up a data item may be immediately available to the decoder; some decoders will buffer additional data until a complete data item can be presented to the application. Other decoders can present partial information about a top-level data item to an application, such as the nested data items that could already be decoded, or even parts of a byte string that hasn't completely arrived yet.
并非构成数据项的所有字节可立即用于解码器;一些解码器将缓冲额外的数据,直到完整的数据项可以呈现给应用程序。其他解码器可以向应用程序提供有关顶级数据项的部分信息,例如可以解码的嵌套数据项,甚至是尚未完全到达的字节字符串部分。
Note that some applications and protocols will not want to use indefinite-length encoding. Using indefinite-length encoding allows an encoder to not need to marshal all the data for counting, but it requires a decoder to allocate increasing amounts of memory while waiting for the end of the item. This might be fine for some applications but not others.
请注意,某些应用程序和协议不希望使用不定长编码。使用不定长编码允许编码器不需要封送所有数据进行计数,但它需要解码器在等待项目结束时分配越来越多的内存。这可能适用于某些应用程序,但不适用于其他应用程序。
A generic CBOR decoder can decode all well-formed CBOR data and present them to an application. CBOR data is well-formed if it uses the initial bytes, as well as the byte strings and/or data items that are implied by their values, in the manner defined by CBOR, and no extraneous data follows (Appendix C).
通用CBOR解码器可以解码所有格式良好的CBOR数据,并将其呈现给应用程序。如果CBOR数据以CBOR定义的方式使用初始字节以及其值所隐含的字节字符串和/或数据项,并且没有任何无关数据,则CBOR数据是格式良好的(附录C)。
Even though CBOR attempts to minimize these cases, not all well-formed CBOR data is valid: for example, the format excludes simple values below 32 that are encoded with an extension byte. Also, specific tags may make semantic constraints that may be violated, such as by enclosing another tag in a bignum tag or by enclosing a byte string in a date tag. Finally, the data may be invalid, such as invalid UTF-8 strings or date strings that do not conform to [RFC3339]. There is no requirement that generic encoders and decoders make unnatural choices for their application interface to enable the processing of invalid data. Generic encoders and decoders are expected to forward simple values and tags even if their specific codepoints are not registered at the time the encoder/decoder is written (Section 3.5).
尽管CBOR试图最小化这些情况,但并非所有格式良好的CBOR数据都有效:例如,该格式排除了使用扩展字节编码的32以下的简单值。此外,特定的标记可能会产生可能被违反的语义约束,例如在bignum标记中包含一个标记,或者在日期标记中跟随一个字节字符串。最后,数据可能无效,例如无效的UTF-8字符串或不符合[RFC3339]的日期字符串。不要求通用编码器和解码器对其应用程序接口做出不自然的选择,以允许处理无效数据。通用编码器和解码器应转发简单值和标签,即使在编写编码器/解码器时未注册其特定码点(第3.5节)。
Generic decoders provide ways to present well-formed CBOR values, both valid and invalid, to an application. The diagnostic notation (Section 6) may be used to present well-formed CBOR values to humans.
通用解码器提供向应用程序显示格式良好的CBOR值(有效和无效)的方法。诊断符号(第6节)可用于向人类呈现形式良好的CBOR值。
Generic encoders provide an application interface that allows the application to specify any well-formed value, including simple values and tags unknown to the encoder.
通用编码器提供一个应用程序接口,允许应用程序指定任何格式良好的值,包括编码器未知的简单值和标记。
A decoder encountering a CBOR data item that is not well-formed generally can choose to completely fail the decoding (issue an error and/or stop processing altogether), substitute the problematic data and data items using a decoder-specific convention that clearly indicates there has been a problem, or take some other action.
遇到格式不正确的CBOR数据项的解码器通常可以选择完全失败解码(发出错误和/或完全停止处理),使用明确指示存在问题的解码器特定约定替换有问题的数据和数据项,或采取一些其他行动。
The representation of a CBOR data item has a specific length, determined by its initial bytes and by the structure of any data items enclosed in the data items. If less data is available, this can be treated as a syntax error. A decoder may also implement incremental parsing, that is, decode the data item as far as it is available and present the data found so far (such as in an event-based interface), with the option of continuing the decoding once further data is available.
CBOR数据项的表示具有特定的长度,由其初始字节和数据项中包含的任何数据项的结构决定。如果可用数据较少,则可以将其视为语法错误。解码器还可以实现增量解析,即,尽可能地对数据项进行解码,并呈现迄今为止发现的数据(例如在基于事件的接口中),具有一旦进一步的数据可用就继续解码的选项。
Examples of incomplete data items include:
不完整数据项的示例包括:
o A decoder expects a certain number of array or map entries but instead encounters the end of the data.
o 解码器需要一定数量的数组或映射项,但会遇到数据的结尾。
o A decoder processes what it expects to be the last pair in a map and comes to the end of the data.
o 解码器处理它期望的映射中的最后一对,并到达数据的末尾。
o A decoder has just seen a tag and then encounters the end of the data.
o 解码器刚刚看到一个标记,然后遇到数据的结尾。
o A decoder has seen the beginning of an indefinite-length item but encounters the end of the data before it sees the "break" stop code.
o 解码器看到了不定长项的开头,但在看到“中断”停止码之前遇到了数据的结尾。
Examples of malformed indefinite-length data items include:
格式错误的不定长数据项的示例包括:
o Within an indefinite-length byte string or text, a decoder finds an item that is not of the appropriate major type before it finds the "break" stop code.
o 在不定长字节字符串或文本中,解码器在找到“中断”停止码之前,会先找到不属于适当主类型的项。
o Within an indefinite-length map, a decoder encounters the "break" stop code immediately after reading a key (the value is missing).
o 在不定长映射中,解码器在读取密钥后立即遇到“中断”停止码(值丢失)。
Another error is finding a "break" stop code at a point in the data where there is no immediately enclosing (unclosed) indefinite-length item.
另一个错误是在数据中没有立即封闭(未闭合)的不定长项的点上查找“中断”停止代码。
At the time of writing, some additional information values are unassigned and reserved for future versions of this document (see Section 5.2). Since the overall syntax for these additional information values is not yet defined, a decoder that sees an additional information value that it does not understand cannot continue parsing.
在撰写本文时,一些附加信息值未分配,并保留给本文件的未来版本(见第5.2节)。由于这些附加信息值的总体语法尚未定义,因此看到不理解的附加信息值的解码器无法继续解析。
A CBOR data item may be syntactically well-formed but present a problem with interpreting the data encoded in it in the CBOR data model. Generally speaking, a decoder that finds a data item with such a problem might issue a warning, might stop processing altogether, might handle the error and make the problematic value available to the application as such, or take some other type of action.
CBOR数据项可能在语法上格式良好,但在解释CBOR数据模型中编码在其中的数据时存在问题。一般来说,发现有此类问题的数据项的解码器可能会发出警告,可能会完全停止处理,可能会处理错误并使有问题的值可供应用程序使用,或者采取其他类型的操作。
Such problems might include:
这些问题可能包括:
Duplicate keys in a map: Generic decoders (Section 3.2) make data available to applications using the native CBOR data model. That data model includes maps (key-value mappings with unique keys), not multimaps (key-value mappings where multiple entries can have the same key). Thus, a generic decoder that gets a CBOR map item that has duplicate keys will decode to a map with only one instance of that key, or it might stop processing altogether. On the other hand, a "streaming decoder" may not even be able to notice (Section 3.7).
地图中的重复键:通用解码器(第3.2节)使数据可用于使用本机CBOR数据模型的应用程序。该数据模型包括映射(具有唯一键的键值映射),而不是多映射(多个条目可以具有相同键的键值映射)。因此,获取具有重复密钥的CBOR映射项的通用解码器将仅使用该密钥的一个实例解码到映射,或者可能完全停止处理。另一方面,“流解码器”甚至可能无法注意到(第3.7节)。
Inadmissible type on the value following a tag: Tags (Section 2.4) specify what type of data item is supposed to follow the tag; for example, the tags for positive or negative bignums are supposed to be put on byte strings. A decoder that decodes the tagged data item into a native representation (a native big integer in this example) is expected to check the type of the data item being tagged. Even decoders that don't have such native representations available in their environment may perform the check on those tags known to them and react appropriately.
标签后面的值的不允许类型:标签(第2.4节)指定标签后面应该是什么类型的数据项;例如,正bignum或负bignum的标记应该放在字节字符串上。将标记的数据项解码为本机表示(本例中为本机大整数)的解码器应检查被标记的数据项的类型。即使是在其环境中没有此类本机表示的解码器也可能对其已知的标记执行检查,并做出适当的反应。
Invalid UTF-8 string: A decoder might or might not want to verify that the sequence of bytes in a UTF-8 string (major type 3) is actually valid UTF-8 and react appropriately.
无效的UTF-8字符串:解码器可能希望也可能不希望验证UTF-8字符串(主要类型3)中的字节序列是否实际有效,并做出适当的反应。
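The duplicate-key case above can be illustrated with a small Python sketch that builds a native map from already-decoded key/value pairs. The function name `pairs_to_map` and the `strict` switch are hypothetical, and keeping the first occurrence in lenient mode is an assumption of this sketch; the text only says that one instance of the key survives.

```python
def pairs_to_map(pairs, strict=True):
    """Build a dict from decoded (key, value) pairs of a CBOR map.

    strict=True  : stop processing when a key repeats.
    strict=False : keep one instance of the key (here, the first seen).
    """
    result = {}
    for key, value in pairs:
        if key in result:
            if strict:
                raise ValueError(f"duplicate map key: {key!r}")
            continue  # lenient mode: ignore later duplicates
        result[key] = value
    return result


decoded_pairs = [("Fun", True), ("Amt", -2), ("Fun", False)]
lenient = pairs_to_map(decoded_pairs, strict=False)
```

A streaming decoder that hands pairs to the application one at a time never builds such a table, which is why it may not notice the duplicate at all.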
A decoder that comes across a simple value (Section 2.3) that it does not recognize, such as a value that was added to the IANA registry after the decoder was deployed or a value that the decoder chose not to implement, might issue a warning, might stop processing altogether, might handle the error by making the unknown value available to the application as such (as is expected of generic decoders), or take some other type of action.
解码器遇到其无法识别的简单值(第2.3节),例如部署解码器后添加到IANA注册表的值或解码器选择不实现的值,可能会发出警告,可能会完全停止处理,可以通过使未知值本身对应用程序可用来处理错误(如通用解码器所期望的),或者采取其他类型的操作。
A decoder that comes across a tag (Section 2.4) that it does not recognize, such as a tag that was added to the IANA registry after the decoder was deployed or a tag that the decoder chose not to implement, might issue a warning, might stop processing altogether, might handle the error and present the unknown tag value together with the contained data item to the application (as is expected of generic decoders), might ignore the tag and simply present the contained data item only to the application, or take some other type of action.
遇到无法识别的标记(第2.4节)的解码器,例如部署解码器后添加到IANA注册表的标记或解码器选择不实施的标记,可能会发出警告,可能会完全停止处理,可能会处理错误并将未知标记值与包含的数据项一起呈现给应用程序(正如通用解码器所期望的那样),可能会忽略标记并仅将包含的数据项呈现给应用程序,或者采取其他类型的操作。
For the purposes of this specification, all number representations for the same numeric value are equivalent. This means that an encoder can encode a floating-point value of 0.0 as the integer 0. However, it also means that an application that expects to find only integer values might find floating-point values if the encoder decides these are desirable, such as when the floating-point value is more compact than a 64-bit integer.
在本规范中,相同数值的所有数字表示形式都是等效的。这意味着编码器可以将0.0的浮点值编码为整数0。然而,这也意味着,如果编码器认为浮点值是可取的,则期望仅查找整数值的应用程序可能会查找浮点值,例如当浮点值比64位整数更紧凑时。
An application or protocol that uses CBOR might restrict the representations of numbers. For instance, a protocol that only deals with integers might say that floating-point numbers may not be used and that decoders of that protocol do not need to be able to handle floating-point numbers. Similarly, a protocol or application that uses CBOR might say that decoders need to be able to handle either type of number.
使用CBOR的应用程序或协议可能会限制数字的表示。例如,一个只处理整数的协议可能会说不使用浮点数,并且该协议的解码器不需要能够处理浮点数。类似地,使用CBOR的协议或应用程序可能会说解码器需要能够处理任意一种类型的数字。
CBOR-based protocols should take into account that different language environments pose different restrictions on the range and precision of numbers that are representable. For example, the JavaScript number system treats all numbers as floating point, which may result in silent loss of precision in decoding integers with more than 53 significant bits. A protocol that uses numbers should define its expectations on the handling of non-trivial numbers in decoders and receiving applications.
基于CBOR的协议应考虑到不同的语言环境对可表示数字的范围和精度造成不同的限制。例如,JavaScript数字系统将所有数字视为浮点数,这可能会导致解码有效位超过53位的整数时无提示地丢失精度。使用数字的协议应定义其在解码器和接收应用程序中处理非平凡数字的期望。
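The precision limit mentioned above can be illustrated with a short Python sketch. Python's float is an IEEE 754 double, the same representation JavaScript uses for all of its numbers, so a round trip through float shows which integers would survive. The function name survives_double is an assumption made for this illustration only.

```python
def survives_double(n: int) -> bool:
    """Return True if the integer n is unchanged by a round trip
    through a 64-bit IEEE 754 floating-point value."""
    try:
        return int(float(n)) == n
    except OverflowError:
        # n is too large to be converted to a double at all.
        return False


if __name__ == "__main__":
    for candidate in (2**53, 2**53 + 1, 2**64 - 1):
        print(candidate, survives_double(candidate))
```

A protocol that has to interwork with such environments might use a check of this kind to decide which integers need a different representation.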
A CBOR-based protocol that includes floating-point numbers can restrict which of the three formats (half-precision, single-precision, and double-precision) are to be supported. For an integer-only application, a protocol may want to completely exclude the use of floating-point values.
包含浮点数的基于CBOR的协议可以限制支持三种格式(半精度、单精度和双精度)中的哪种格式。对于仅限整数的应用程序,协议可能希望完全排除浮点值的使用。
A CBOR-based protocol designed for compactness may want to exclude specific integer encodings that are longer than necessary for the application, such as to save the need to implement 64-bit integers. There is an expectation that encoders will use the most compact integer representation that can represent a given value. However, a compact application should accept values that use a longer-than-needed encoding (such as encoding "0" as 0b000_11001 followed by two bytes of 0x00) as long as the application can decode an integer of the given size.
为紧凑性而设计的基于CBOR的协议可能希望排除比应用程序所需长度更长的特定整数编码,例如省去实现64位整数的需要。有一个期望,编码器将使用最紧凑的整数表示,可以代表一个给定的值。但是,只要应用程序能够解码给定大小的整数,紧凑型应用程序应该接受使用比需要的编码更长的值(例如将“0”编码为0b000_11001,后跟两个字节0x00)。
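As a minimal sketch of a decoder that tolerates longer-than-needed encodings, the following Python function decodes an unsigned integer (major type 0) of any of the defined sizes. The function name decode_uint and its return convention (value plus number of bytes consumed) are assumptions made for this illustration.

```python
import struct


def decode_uint(data: bytes) -> tuple[int, int]:
    """Decode a major type 0 data item at the start of data.

    Returns (value, number of bytes consumed). Non-minimal encodings
    are accepted: the size is taken from the additional information,
    not from the magnitude of the value.
    """
    initial = data[0]
    if initial >> 5 != 0:
        raise ValueError("not an unsigned integer (major type 0)")
    info = initial & 0x1F
    if info < 24:
        return info, 1                 # value sits in the initial byte
    if info > 27:
        raise ValueError("unsupported additional information value")
    size = 1 << (info - 24)            # 1, 2, 4, or 8 bytes follow
    if len(data) < 1 + size:
        raise ValueError("truncated data item")
    fmt = {1: ">B", 2: ">H", 4: ">I", 8: ">Q"}[size]
    (value,) = struct.unpack_from(fmt, data, 1)
    return value, 1 + size
```

With this sketch, the one-byte encoding 0x00 and the three-byte encoding 0x19 0x00 0x00 (additional information 25) both decode to the value 0. An application that does not implement 64-bit integers could reject additional information 27 at the point where the size is determined.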
The encoding and decoding applications need to agree on what types of keys are going to be used in maps. In applications that need to interwork with JSON-based applications, keys probably should be limited to UTF-8 strings only; otherwise, there has to be a specified mapping from the other CBOR types to Unicode characters, and this often leads to implementation errors. In applications where keys are numeric in nature and numeric ordering of keys is important to the application, directly using the numbers for the keys is useful.
编码和解码应用程序需要就地图中使用的密钥类型达成一致。在需要与基于JSON的应用程序交互的应用程序中,密钥可能仅限于UTF-8字符串;否则,必须有从其他CBOR类型到Unicode字符的指定映射,这通常会导致实现错误。在键本质上是数字且键的数字顺序对应用程序很重要的应用程序中,直接使用键的数字非常有用。
If multiple types of keys are to be used, consideration should be given to how these types would be represented in the specific programming environments that are to be used. For example, in JavaScript objects, a key of integer 1 cannot be distinguished from a key of string "1". This means that, if integer keys are used, the simultaneous use of string keys that look like numbers needs to be avoided. Again, this leads to the conclusion that keys should be of a single CBOR type.
如果要使用多种类型的键,则应考虑如何在要使用的特定编程环境中表示这些类型。例如,在JavaScript对象中,不能将整数1的键与字符串“1”的键区分开来。这意味着,如果使用整数键,则需要避免同时使用看起来像数字的字符串键。同样,这导致了一个结论,即密钥应为单一CBOR类型。
Decoders that deliver data items nested within a CBOR data item immediately on decoding them ("streaming decoders") often do not keep the state that is necessary to ascertain uniqueness of a key in a map. Similarly, an encoder that can start encoding data items before the enclosing data item is completely available ("streaming encoder") may want to reduce its overhead significantly by relying on its data source to maintain uniqueness.
解码后立即交付嵌套在CBOR数据项中的数据项的解码器(“流式解码器”)通常不保持确定地图中密钥唯一性所需的状态。类似地,可以在封闭数据项完全可用之前开始编码数据项的编码器(“流式编码器”)可能希望通过依赖其数据源来保持唯一性来显著降低其开销。
A CBOR-based protocol should make an intentional decision about what to do when a receiving application does see multiple identical keys in a map. The resulting rule in the protocol should respect the CBOR data model: it cannot prescribe a specific handling of the entries with the identical keys, except that it might have a rule that having identical keys in a map indicates a malformed map and that the decoder has to stop with an error. Duplicate keys are also prohibited by CBOR decoders that are using strict mode (Section 3.10).
基于CBOR的协议应该有意识地决定当接收应用程序确实在映射中看到多个相同的键时应该做什么。协议中的最终规则应尊重CBOR数据模型:它不能规定对具有相同键的条目的特定处理,但它可以有一条规则,即映射中出现相同的键表示映射格式不正确,并且解码器必须因错误而停止。使用严格模式的CBOR解码器也禁止重复键(第3.10节)。
The CBOR data model for maps does not allow ascribing semantics to the order of the key/value pairs in the map representation. Thus, it would be a very bad practice to define a CBOR-based protocol in such a way that changing the key/value pair order in a map would change the semantics, apart from trivial aspects (cache usage, etc.). (A CBOR-based protocol can prescribe a specific order of serialization, such as for canonicalization.)
地图的CBOR数据模型不允许将语义归因于地图表示中键/值对的顺序。因此,将基于CBOR的协议定义为改变映射中的键/值对顺序会改变语义,除了琐碎的方面(缓存使用等),这将是一种非常糟糕的做法。(基于CBOR的协议可以规定特定的序列化顺序,例如规范化。)
Applications for constrained devices that have maps with 24 or fewer frequently used keys should consider using small integers (and those with up to 48 frequently used keys should consider also using small negative integers) because the keys can then be encoded in a single byte.
应用具有24个或更少频繁使用密钥的映射设备的应用程序应该考虑使用小整数(并且那些多达48个频繁使用的密钥应该考虑也使用小的负整数),因为密钥可以被编码在一个字节中。
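The figures of 24 and 48 keys follow from the integers that fit into the initial byte. A minimal Python sketch, with the hypothetical helper names encodes_in_one_byte and encode_small_int, is:

```python
def encodes_in_one_byte(key: int) -> bool:
    """True for the 48 integers whose whole encoding is the initial byte:
    0..23 (major type 0) and -1..-24 (major type 1)."""
    return -24 <= key <= 23


def encode_small_int(key: int) -> bytes:
    """Encode an integer key that fits into a single byte."""
    if not encodes_in_one_byte(key):
        raise ValueError("key needs more than one byte")
    if key >= 0:
        return bytes([key])              # major type 0, value in low 5 bits
    return bytes([0x20 | (-1 - key)])    # major type 1 carries -1 - key
```

A protocol designer might assign the most frequently used keys to 0 through 23 first and only then use -1 through -24.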
In some CBOR-based protocols, the simple value (Section 2.3) of Undefined might be used by an encoder as a substitute for a data item with an encoding problem, in order to allow the rest of the enclosing data items to be encoded without harm.
在一些基于CBOR的协议中,编码器可能会使用未定义的简单值(第2.3节)代替存在编码问题的数据项,以便对其余封闭数据项进行编码而不会造成损害。
Some protocols may want encoders to only emit CBOR in a particular canonical format; those protocols might also have the decoders check that their input is canonical. Those protocols are free to define what they mean by a canonical format and what encoders and decoders are expected to do. This section lists some suggestions for such protocols.
一些协议可能希望编码器仅以特定规范格式发出CBOR;这些协议还可能让解码器检查其输入是否规范。这些协议可以自由定义规范格式的含义以及编码器和解码器的预期用途。本节列出了对此类协议的一些建议。
If a protocol considers "canonical" to mean that two encoder implementations starting with the same input data will produce the same CBOR output, the following four rules would suffice:
如果协议认为“规范”意味着以相同输入数据开始的两个编码器实现将产生相同的CBOR输出,那么以下四条规则就足够了:
o Integers must be as small as possible.
o 整数必须尽可能小。
* 0 to 23 and -1 to -24 must be expressed in the same byte as the major type;
* 24 to 255 and -25 to -256 must be expressed only with an additional uint8_t;
* 256 to 65535 and -257 to -65536 must be expressed only with an additional uint16_t;
* 65536 to 4294967295 and -65537 to -4294967296 must be expressed only with an additional uint32_t.
* 65536至4294967295和-65537至-4294967296只能用附加的uint32表示。
o The expression of lengths in major types 2 through 5 must be as short as possible. The rules for these lengths follow the above rule for integers.
o 主要类型2至5中的长度表达式必须尽可能短。这些长度的规则遵循上述整数规则。
o The keys in every map must be sorted lowest value to highest. Sorting is performed on the bytes of the representation of the key data items without paying attention to the 3/5 bit splitting for major types. (Note that this rule allows maps that have keys of different types, even though that is probably a bad practice that could lead to errors in some canonicalization implementations.) The sorting rules are:
o 每个映射中的键必须按从低到高的顺序排列。排序是在关键数据项表示的字节上执行的,而不注意主要类型的3/5位拆分。(请注意,此规则允许映射具有不同类型的键,尽管这可能是一种不好的做法,可能会导致某些规范化实现中出现错误。)排序规则包括:
* If two keys have different lengths, the shorter one sorts earlier;
* 如果两个键的长度不同,则较短的键排序较早;
* If two keys have the same length, the one with the lower value in (byte-wise) lexical order sorts earlier.
* 如果两个键具有相同的长度,则在(字节)词法顺序中具有较低值的键排序较早。
o Indefinite-length items must be made into definite-length items.
o 不定长物品必须制成定长物品。
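The four rules above can be sketched in Python for a small subset of the data model (integers, text strings, and maps). All function names here are assumptions made for this illustration, and the sketch only ever produces definite-length items, which takes care of the fourth rule.

```python
import struct


def encode_head(major: int, value: int) -> bytes:
    """Encode the initial byte and argument in the shortest form."""
    if not 0 <= value < 1 << 64:
        raise ValueError("argument out of range")
    mt = major << 5
    if value < 24:
        return bytes([mt | value])
    if value < 1 << 8:
        return bytes([mt | 24, value])
    if value < 1 << 16:
        return bytes([mt | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([mt | 26]) + struct.pack(">I", value)
    return bytes([mt | 27]) + struct.pack(">Q", value)


def encode_int(n: int) -> bytes:
    """Major type 0 for n >= 0, major type 1 (carrying -1 - n) otherwise."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def encode_text(s: str) -> bytes:
    raw = s.encode("utf-8")
    return encode_head(3, len(raw)) + raw    # length follows the integer rule


def encode_item(item) -> bytes:
    if isinstance(item, bool):
        raise TypeError("simple values are outside this sketch")
    if isinstance(item, int):
        return encode_int(item)
    if isinstance(item, str):
        return encode_text(item)
    if isinstance(item, dict):
        return encode_map(item)
    raise TypeError(f"unsupported type: {type(item).__name__}")


def encode_map(m: dict) -> bytes:
    """Sort entries by encoded key: shorter keys first, then bytewise."""
    entries = sorted(
        ((encode_item(k), encode_item(v)) for k, v in m.items()),
        key=lambda kv: (len(kv[0]), kv[0]),
    )
    return encode_head(5, len(entries)) + b"".join(k + v for k, v in entries)
```

Note that the sort operates on the encoded key bytes, not on the Python values, so a map mixing integer and text keys still has a well-defined order.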
If a protocol allows for IEEE floats, then additional canonicalization rules might need to be added. One example rule might be to have all floats start as a 64-bit float, then do a test conversion to a 32-bit float; if the result is the same numeric value, use the shorter value and repeat the process with a test conversion to a 16-bit float. (This rule selects 16-bit float for positive and negative Infinity as well.) Also, there are many representations for NaN. If NaN is an allowed value, it must always be represented as 0xf97e00.
如果协议允许IEEE浮动,则可能需要添加其他规范化规则。一个示例规则可能是让所有浮点以64位浮点开始,然后将测试转换为32位浮点;如果结果是相同的数值,则使用较短的值并通过测试转换为16位浮点重复该过程。(该规则还为正无穷大和负无穷大选择16位浮点。)此外,NaN有许多表示形式。如果NaN是允许的值,则它必须始终表示为0xf97e00。
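The example rule for floats can be sketched in Python using the struct module, whose "e", "f", and "d" format characters correspond to the half-, single-, and double-precision formats. The function name encode_float_shortest is an assumption made for this illustration.

```python
import math
import struct


def encode_float_shortest(x: float) -> bytes:
    """Start from a 64-bit float and shorten it while the numeric value
    is preserved: first to 32 bits and, if that worked, to 16 bits."""
    if math.isnan(x):
        return bytes.fromhex("f97e00")       # single NaN representation
    initial, packed = 0xFB, struct.pack(">d", x)
    for fmt, candidate_initial in ((">f", 0xFA), (">e", 0xF9)):
        try:
            candidate = struct.pack(fmt, x)
        except OverflowError:
            break                            # magnitude too large for this format
        if struct.unpack(fmt, candidate)[0] != x:
            break                            # conversion would change the value
        initial, packed = candidate_initial, candidate
    return bytes([initial]) + packed
```

For instance, 1.5 shortens to the three-byte form 0xf93e00, while 1.1 has no exact single-precision representation and stays in the nine-byte double-precision form.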
CBOR tags present additional considerations for canonicalization. The absence or presence of tags in a canonical format is determined by the optionality of the tags in the protocol. In a CBOR-based protocol that allows optional tagging anywhere, the canonical format must not allow them. In a protocol that requires tags in certain places, the tag needs to appear in the canonical format. A CBOR-based protocol that uses canonicalization might instead say that all tags that appear in a message must be retained regardless of whether they are optional.
CBOR标记为规范化提供了额外的注意事项。规范格式的标签是否存在取决于协议中标签的可选性。在允许在任何地方进行可选标记的基于CBOR的协议中,规范格式不得允许进行可选标记。在某些地方需要标记的协议中,标记需要以规范格式显示。使用规范化的基于CBOR的协议可能会说,无论消息中出现的所有标记是否可选,都必须保留这些标记。
Some areas of application of CBOR do not require canonicalization (Section 3.9) but may require that different decoders reach the same (semantically equivalent) results, even in the presence of potentially malicious data. This can be required if one application (such as a firewall or other protecting entity) makes a decision based on the data that another application, which independently decodes the data, relies on.
CBOR的某些应用领域不需要规范化(第3.9节),但可能需要不同的解码器达到相同(语义等效)的结果,即使存在潜在的恶意数据。如果一个应用程序(如防火墙或其他保护实体)根据另一个独立解码数据的应用程序所依赖的数据做出决定,则可能需要这样做。
Normally, it is the responsibility of the sender to avoid ambiguously decodable data. However, the sender might be an attacker specially making up CBOR data such that it will be interpreted differently by different decoders in an attempt to exploit that as a vulnerability. Generic decoders used in applications where this might be a problem need to support a strict mode in which it is also the responsibility of the receiver to reject ambiguously decodable data. It is expected that firewalls and other security systems that decode CBOR will only decode in strict mode.
通常,发送方有责任避免含糊不清的可解码数据。但是,发送方可能是专门编造CBOR数据的攻击者,因此不同的解码器会对其进行不同的解释,试图利用该漏洞进行攻击。在这可能是一个问题的应用中使用的通用解码器需要支持严格模式,在这种模式下,接收器也有责任拒绝含糊不清的可解码数据。预计对CBOR进行解码的防火墙和其他安全系统将仅在严格模式下进行解码。
A decoder in strict mode will reliably reject any data that could be interpreted by other decoders in different ways. It will reliably reject data items with syntax errors (Section 3.3). It will also expend the effort to reliably detect other decoding errors (Section 3.4). In particular, a strict decoder needs to have an API that reports an error (and does not return data) for a CBOR data item that contains any of the following:
严格模式下的解码器将可靠地拒绝任何可能由其他解码器以不同方式解释的数据。它将可靠地拒绝出现语法错误的数据项(第3.3节)。它还将努力可靠地检测其他解码错误(第3.4节)。特别是,严格解码器需要有一个API,用于报告包含以下任何一项的CBOR数据项的错误(且不返回数据):
o a map (major type 5) that has more than one entry with the same key
o 具有多个具有相同键的条目的映射(主类型5)
o a tag that is used on a data item of the incorrect type
o 在类型不正确的数据项上使用的标记
o a data item that is incorrectly formatted for the type given to it, such as invalid UTF-8 or data that cannot be interpreted with the specific tag that it has been tagged with
o 一种数据项,其格式与给定的类型不符,例如无效的UTF-8或无法用其标记的特定标记进行解释的数据
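The three checks listed above might look as follows in a Python decoder. This is a sketch under stated assumptions: the decoder is assumed to hand over map entries as a list of (key, value) pairs with hashable keys, and the names StrictModeError, check_map, check_bignum, and check_text are hypothetical.

```python
class StrictModeError(ValueError):
    """Raised, instead of returning data, when a strict-mode check fails."""


def check_map(pairs):
    """Reject a map (major type 5) with more than one entry per key.

    Caveat of this sketch: Python treats 1 and 1.0 as the same key.
    """
    seen = set()
    for key, _value in pairs:
        if key in seen:
            raise StrictModeError(f"duplicate map key: {key!r}")
        seen.add(key)
    return dict(pairs)


def check_bignum(tag: int, content) -> int:
    """Reject bignum tags (2 and 3) that are not placed on a byte string."""
    if tag not in (2, 3):
        raise ValueError("not a bignum tag")
    if not isinstance(content, bytes):
        raise StrictModeError("bignum tag on a data item that is not a byte string")
    magnitude = int.from_bytes(content, "big")
    return magnitude if tag == 2 else -1 - magnitude


def check_text(raw: bytes) -> str:
    """Reject a text string (major type 3) that is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise StrictModeError("text string is not valid UTF-8") from exc
```

Each function either returns the decoded value or raises, so no data reaches the application when a check fails.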
A decoder in strict mode can do one of two things when it encounters a tag or simple value that it does not recognize:
严格模式下的解码器在遇到无法识别的标记或简单值时,可以执行以下两种操作之一:
o It can report an error (and not return data).
o 它可以报告错误(而不返回数据)。
o It can emit the unknown item (type, value, and, for tags, the decoded tagged data item) to the application calling the decoder with an indication that the decoder did not recognize that tag or simple value.
o 它可以向调用解码器的应用程序发出未知项(类型、值,以及对于标签,解码的标签数据项),并指示解码器未识别该标签或简单值。
The latter approach, which is also appropriate for non-strict decoders, supports forward compatibility with newly registered tags and simple values without the requirement to update the encoder at the same time as the calling application. (For this, the API for the decoder needs to have a way to mark unknown items so that the calling application can handle them in a manner appropriate for the program.)
后一种方法也适用于非严格解码器,支持与新注册的标记和简单值的前向兼容性,而无需在调用应用程序的同时更新编码器。(为此,解码器的API需要有一种标记未知项的方法,以便调用应用程序能够以适合程序的方式处理它们。)
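One possible shape for such an API is sketched below in Python. The wrapper types UnknownTag and UnknownSimple, the handler table KNOWN_TAG_HANDLERS, and the reject_unknown parameter are all assumptions made for this illustration; they are not part of this specification.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UnknownTag:
    """Marks a tag the decoder did not recognize, with its decoded content."""
    number: int
    content: Any


@dataclass(frozen=True)
class UnknownSimple:
    """Marks a simple value the decoder did not recognize."""
    value: int


# Hypothetical table of tags this decoder understands.
KNOWN_TAG_HANDLERS = {
    1: lambda content: ("epoch-time", content),
}


def deliver_tag(number: int, content: Any, reject_unknown: bool = False) -> Any:
    """Convert a known tag; report or pass through an unknown one."""
    handler = KNOWN_TAG_HANDLERS.get(number)
    if handler is not None:
        return handler(content)
    if reject_unknown:
        raise ValueError(f"unrecognized tag {number}")
    return UnknownTag(number, content)
```

The calling application can test for UnknownTag with isinstance and decide on its own whether to process, ignore, or reject the item.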
Since some of this processing may have an appreciable cost (in particular with duplicate detection for maps), support of strict mode is not a requirement placed on all CBOR decoders.
由于其中一些处理可能具有可观的成本(特别是地图的重复检测),因此并非所有CBOR解码器都要求支持严格模式。
Some encoders will rely on their applications to provide input data in such a way that unambiguously decodable CBOR results. A generic encoder also may want to provide a strict mode where it reliably limits its output to unambiguously decodable CBOR, independent of whether or not its application is providing API-conformant data.
一些编码器将依靠其应用程序以这样一种方式提供输入数据,即明确可解码的CBOR结果。通用编码器还可能希望提供一种严格的模式,在这种模式下,它可靠地将其输出限制为明确可解码的CBOR,而不依赖于其应用程序是否提供符合API的数据。
This section gives non-normative advice about converting between CBOR and JSON. Implementations of converters are free to use whichever advice here they want.
本节给出了关于在CBOR和JSON之间转换的非规范性建议。转换器的实现可以自由使用他们想要的任何建议。
It is worth noting that a JSON text is a sequence of characters, not an encoded sequence of bytes, while a CBOR data item consists of bytes, not characters.
值得注意的是,JSON文本是字符序列,而不是字节编码序列,而CBOR数据项由字节组成,而不是字符。
Most of the types in CBOR have direct analogs in JSON. However, some do not, and someone implementing a CBOR-to-JSON converter has to consider what to do in those cases. The following non-normative advice deals with these by converting them to a single substitute value, such as a JSON null.
CBOR中的大多数类型在JSON中都有直接的类比。然而,有些人不这样做,并且有人实现CBOR到JSON转换器必须考虑在这些情况下要做什么。以下非规范性建议通过将它们转换为单个替代值(如JSON null)来处理这些问题。
o An integer (major type 0 or 1) becomes a JSON number.
o 整数(主类型0或1)成为JSON编号。
o A byte string (major type 2) that is not embedded in a tag that specifies a proposed encoding is encoded in base64url without padding and becomes a JSON string.
o 未嵌入指定拟议编码的标记中的字节字符串(主要类型2)在base64url中编码,不带填充,并成为JSON字符串。
o A UTF-8 string (major type 3) becomes a JSON string. Note that JSON requires escaping certain characters (RFC 4627, Section 2.5): quotation mark (U+0022), reverse solidus (U+005C), and the "C0 control characters" (U+0000 through U+001F). All other characters are copied unchanged into the JSON UTF-8 string.
o UTF-8字符串(主要类型3)变成JSON字符串。请注意,JSON需要转义某些字符(RFC 4627,第2.5节):引号(U+0022)、反向索利多金币(U+005C)和“C0控制字符”(U+0000到U+001F)。所有其他字符都会原封不动地复制到JSON UTF-8字符串中。
o An array (major type 4) becomes a JSON array.
o 数组(主类型4)变成JSON数组。
o A map (major type 5) becomes a JSON object. This is possible directly only if all keys are UTF-8 strings. A converter might also convert other keys into UTF-8 strings (such as by converting integers into strings containing their decimal representation); however, doing so introduces a danger of key collision.
o 映射(主类型5)成为JSON对象。仅当所有键都是UTF-8字符串时,才可以直接执行此操作。转换器还可以将其他键转换为UTF-8字符串(例如,通过将整数转换为包含其十进制表示的字符串);但是,这样做会带来钥匙碰撞的危险。
o False (major type 7, additional information 20) becomes a JSON false.
o False(主要类型7,附加信息20)变为JSON False。
o True (major type 7, additional information 21) becomes a JSON true.
o True(主类型7,附加信息21)变为JSON True。
o Null (major type 7, additional information 22) becomes a JSON null.
o Null(主要类型7,附加信息22)变为JSON Null。
o A floating-point value (major type 7, additional information 25 through 27) becomes a JSON number if it is finite (that is, it can be represented in a JSON number); if the value is non-finite (NaN, or positive or negative Infinity), it is represented by the substitute value.
o 如果浮点值(主要类型7,附加信息25到27)是有限的(也就是说,它可以用JSON数表示),则它将成为JSON数;如果该值是非有限的(NaN,或正无穷大或负无穷大),则由替换值表示。
o Any other simple value (major type 7, any additional information value not yet discussed) is represented by the substitute value.
o 任何其他简单值(主要类型7,任何尚未讨论的附加信息值)由替代值表示。
o A bignum (major type 6, tag value 2 or 3) is represented by encoding its byte string in base64url without padding and becomes a JSON string. For tag value 3 (negative bignum), a "~" (ASCII tilde) is inserted before the base-encoded value. (The conversion to a binary blob instead of a number is to prevent a likely numeric overflow for the JSON decoder.)
o bignum(主类型6,标记值2或3)通过在base64url中编码其字节字符串来表示,无需填充,并成为JSON字符串。对于标记值3(负bignum),在基编码值之前插入“~”(ASCII波浪号)。(转换为二进制blob而不是数字是为了防止JSON解码器可能出现的数字溢出。)
o A byte string with an encoding hint (major type 6, tag value 21 through 23) is encoded as described and becomes a JSON string.
o 带有编码提示的字节字符串(主类型6,标记值21到23)按所述进行编码,并成为JSON字符串。
o For all other tags (major type 6, any other tag value), the embedded CBOR item is represented as a JSON value; the tag value is ignored.
o 对于所有其他标记(主类型6,任何其他标记值),嵌入的CBOR项表示为JSON值;标记值将被忽略。
o Indefinite-length items are made definite before conversion.
o 不定长项目在转换前确定。
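The advice above can be sketched as a Python function that operates on already decoded data. The in-memory model is an assumption of this sketch: integers, floats, bytes, str, list, dict, bool, and None stand for the corresponding CBOR items, a hypothetical Tag tuple stands for tagged items, and JSON null (None) is chosen as the substitute value. Encoding hints are only honored when the tag sits directly on a byte string, and the use of padding for tag 22 is a choice made here.

```python
import base64
import math
from typing import Any, NamedTuple


class Tag(NamedTuple):
    number: int
    content: Any


SUBSTITUTE = None   # becomes JSON null


def b64url(raw: bytes) -> str:
    """base64url without padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def to_json_value(item: Any) -> Any:
    """Map a decoded CBOR item to a value that json.dumps accepts."""
    if item is None or isinstance(item, (bool, str, int)):
        return item
    if isinstance(item, float):
        return item if math.isfinite(item) else SUBSTITUTE
    if isinstance(item, bytes):
        return b64url(item)
    if isinstance(item, list):
        return [to_json_value(x) for x in item]
    if isinstance(item, dict):
        if not all(isinstance(k, str) for k in item):
            raise ValueError("only text string keys convert directly")
        return {k: to_json_value(v) for k, v in item.items()}
    if isinstance(item, Tag) and isinstance(item.content, bytes):
        if item.number == 2:
            return b64url(item.content)
        if item.number == 3:
            return "~" + b64url(item.content)
        if item.number == 21:
            return b64url(item.content)
        if item.number == 22:
            return base64.b64encode(item.content).decode("ascii")
        if item.number == 23:
            return item.content.hex().upper()
    if isinstance(item, Tag):
        return to_json_value(item.content)   # tag value is ignored
    return SUBSTITUTE                        # e.g., other simple values
```

This sketch raises an error for maps with non-string keys instead of converting them, which avoids the key collisions mentioned above at the cost of rejecting such maps.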
All JSON values, once decoded, directly map into one or more CBOR values. As with any kind of CBOR generation, decisions have to be made with respect to number representation. In a suggested conversion:
所有JSON值一旦解码,就直接映射为一个或多个CBOR值。与任何类型的CBOR生成一样,必须就数字表示作出决定。在建议的转换中:
o JSON numbers without fractional parts (integer numbers) are represented as integers (major types 0 and 1, possibly major type 6 tag value 2 and 3), choosing the shortest form; integers longer than an implementation-defined threshold (which is usually either 32 or 64 bits) may instead be represented as floating-point values. (If the JSON was generated from a JavaScript implementation, its precision is already limited to 53 bits maximum.)
o 不带小数部分(整数)的JSON数字表示为整数(主要类型0和1,可能主要类型6标记值2和3),选择最短形式;大于实现定义的阈值(通常为32或64位)的整数可以表示为浮点值。(如果JSON是从JavaScript实现生成的,则其精度已限制为最多53位。)
o Numbers with fractional parts are represented as floating-point values. Preferably, the shortest exact floating-point representation is used; for instance, 1.5 is represented in a 16-bit floating-point value (not all implementations will be capable of efficiently finding the minimum form, though). There may be an implementation-defined limit to the precision that will affect the precision of the represented values. Decimal representation should only be used if that is specified in a protocol.
o 带有小数部分的数字表示为浮点值。优选地,使用最短精确浮点表示;例如,1.5用16位浮点值表示(但并非所有实现都能有效地找到最小形式)。可能存在实现定义的精度限制,这将影响所表示值的精度。只有在协议中指定了十进制表示时,才应使用十进制表示。
CBOR has been designed to generally provide a more compact encoding than JSON. One implementation strategy that might come to mind is to perform a JSON-to-CBOR encoding in place in a single buffer. This strategy would need to carefully consider a number of pathological cases, such as that some strings represented with no or very few escapes and longer (or much longer) than 255 bytes may expand when encoded as UTF-8 strings in CBOR. Similarly, a few of the binary floating-point representations might cause expansion from some short decimal representations (1.1, 1e9) in JSON. This may be hard to get right, and any ensuing vulnerabilities may be exploited by an attacker.
CBOR通常被设计为提供比JSON更紧凑的编码。可能想到的一种实现策略是在单个缓冲区中执行JSON到CBOR编码。该策略需要仔细考虑一些病理情况,例如,在CBOR中编码为UTF-8字符串时,一些字符串表示为没有或很少的逃逸,并且更长(或更长)超过255字节。类似地,一些二进制浮点表示可能会导致JSON中某些短十进制表示(1.1,1e9)的扩展。这可能很难纠正,攻击者可能会利用任何后续漏洞。
Successful protocols evolve over time. New ideas appear, implementation platforms improve, related protocols are developed and evolve, and new requirements from applications and protocols are added. Facilitating protocol evolution is therefore an important design consideration for any protocol development.
成功的协议会随着时间的推移而发展。出现了新的想法,改进了实现平台,开发和发展了相关协议,并增加了来自应用程序和协议的新需求。因此,促进协议演化是任何协议开发的重要设计考虑因素。
For protocols that will use CBOR, CBOR provides some useful mechanisms to facilitate their evolution. Best practices for this are well known, particularly from the development of JSON-based protocols. Therefore, such best practices are outside the scope of this specification.
对于将使用CBOR的协议,CBOR提供了一些有用的机制来促进它们的发展。这方面的最佳实践是众所周知的,特别是从基于JSON协议的JSON格式开发。因此,此类最佳实践不在本规范的范围内。
However, facilitating the evolution of CBOR itself is very well within its scope. CBOR is designed to both provide a stable basis for development of CBOR-based protocols and to be able to evolve.
然而,促进CBOR本身的发展完全在其范围之内。CBOR的设计目的是为基于CBOR的协议的开发提供稳定的基础,并能够不断发展。
Since a successful protocol may live for decades, CBOR needs to be designed for decades of use and evolution. This section provides some guidance for the evolution of CBOR. It is necessarily more subjective than other parts of this document. It is also necessarily incomplete, lest it turn into a textbook on protocol development.
由于一个成功的协议可能存在几十年,CBOR需要设计为几十年的使用和发展。本节为CBOR的发展提供了一些指导。它必然比本文件的其他部分更加主观。它也必然是不完整的,以免成为协议开发的教科书。
In a protocol design, opportunities for evolution are often included in the form of extension points. For example, there may be a codepoint space that is not fully allocated from the outset, and the protocol is designed to tolerate and embrace implementations that start using more codepoints than initially allocated.
在协议设计中,演化的机会通常以扩展点的形式包含。例如,可能存在一个从一开始就没有完全分配的代码点空间,并且协议被设计为容忍和接受开始使用比最初分配的更多代码点的实现。
Sizing the codepoint space may be difficult because the range required may be hard to predict. An attempt should be made to make the codepoint space large enough so that it can slowly be filled over the intended lifetime of the protocol.
确定码点空间的大小可能很困难,因为所需的范围可能很难预测。应尝试使代码点空间足够大,以便在协议的预期生命周期内缓慢填充。
CBOR has three major extension points:
CBOR有三个主要扩展点:
o the "simple" space (values in major type 7). Of the 24 efficient (and 224 slightly less efficient) values, only a small number have been allocated. Implementations receiving an unknown simple data item may be able to process it as such, given that the structure of the value is indeed simple. The IANA registry in Section 7.1 is the appropriate way to address the extensibility of this codepoint space.
o “简单”空间(主要类型7中的值)。在24个有效值(以及224个效率稍低的值)中,只分配了一小部分。如果值的结构确实很简单,那么接收未知简单数据项的实现可能能够对其进行处理。第7.1节中的IANA注册表是解决此代码点空间可扩展性的适当方式。
o the "tag" space (values in major type 6). Again, only a small part of the codepoint space has been allocated, and the space is abundant (although the early numbers are more efficient than the later ones). Implementations receiving an unknown tag can choose to simply ignore it or to process it as an unknown tag wrapping the following data item. The IANA registry in Section 7.2 is the appropriate way to address the extensibility of this codepoint space.
o “标记”空间(主要类型6中的值)。同样,只分配了一小部分代码点空间,而且空间非常丰富(尽管早期的数字比后期的数字更有效)。接收未知标记的实现可以选择忽略它,或者将其作为包装以下数据项的未知标记进行处理。第7.2节中的IANA注册表是解决此代码点空间可扩展性的适当方法。
o the "additional information" space. An implementation receiving an unknown additional information value has no way to continue parsing, so allocating codepoints to this space is a major step. There are also very few codepoints left.
o “附加信息”空间。接收未知附加信息值的实现无法继续解析,因此将代码点分配到该空间是一个主要步骤。剩下的代码点也非常少。
The human mind is sometimes drawn to filling in little perceived gaps to make something neat. We expect the remaining gaps in the codepoint space for the additional information values to be an attractor for new ideas, just because they are there.
人类的思维有时会被吸引来填补一些感知到的小间隙,从而使事物变得整洁。我们预计,代码点空间中剩余的额外信息值的缺口将成为新想法的吸引器,仅仅因为它们存在。
The present specification does not manage the additional information codepoint space by an IANA registry. Instead, allocations out of this space can only be done by updating this specification.
本规范不通过IANA注册表管理附加信息代码点空间。相反,只能通过更新此规范来完成此空间之外的分配。
For an additional information value of n >= 24, the size of the additional data typically is 2**(n-24) bytes. Therefore, additional information values 28 and 29 should be viewed as candidates for 128-bit and 256-bit quantities, in case a need arises to add them to the protocol. Additional information value 30 is then the only additional information value available for general allocation, and there should be a very good reason for allocating it before assigning it through an update of this protocol.
对于n>=24的附加信息值,附加数据的大小通常为2**(n-24)字节。因此,如果需要将附加信息值28和29添加到协议中,则应将其视为128位和256位量的候选值。因此,附加信息值30是可用于一般分配的唯一附加信息值,在通过更新此协议分配之前,应该有很好的理由进行分配。
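The size relation can be written out as a small Python function. The name projected_argument_bytes is an assumption made for this illustration, and the results for 28 and 29 describe candidate sizes only, not allocations made by this specification.

```python
def projected_argument_bytes(n: int) -> int:
    """Size in bytes of the additional data for additional information n.

    Values 24..27 are defined by this specification; 28 and 29 are
    included only as the candidates discussed in the text.
    """
    if not 24 <= n <= 29:
        raise ValueError("size relation applies to 24..29 only")
    return 2 ** (n - 24)


if __name__ == "__main__":
    for n in range(24, 30):
        size = projected_argument_bytes(n)
        print(f"additional information {n}: {size} bytes ({size * 8} bits)")
```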
CBOR is a binary interchange format. To facilitate documentation and debugging, and in particular to facilitate communication between entities cooperating in debugging, this section defines a simple human-readable diagnostic notation. All actual interchange always happens in the binary format.
CBOR是一种二进制交换格式。为了便于文档编制和调试,特别是为了便于在调试中协作的实体之间的通信,本节定义了一个简单的人类可读的诊断符号。所有实际的交换总是以二进制格式进行的。
Note that this truly is a diagnostic format; it is not meant to be parsed. Therefore, no formal definition (as in ABNF) is given in this document. (Implementers looking for a text-based format for representing CBOR data items in configuration files may also want to consider YAML [YAML].)
请注意,这确实是一种诊断格式;它并不意味着要被解析。因此,本文件未给出正式定义(如ABNF)。(实现基于文本的格式在配置文件中表示CBOR数据项的实现者也可能需要考虑YAML[YAML])。
The diagnostic notation is loosely based on JSON as it is defined in RFC 4627, extending it where needed.
诊断符号松散地基于RFC4627中定义的JSON,并在需要时进行扩展。
The notation borrows the JSON syntax for numbers (integer and floating point), True (>true<), False (>false<), Null (>null<), UTF-8 strings, arrays, and maps (maps are called objects in JSON; the diagnostic notation extends JSON here by allowing any data item in the key position). Undefined is written >undefined< as in JavaScript. The non-finite floating-point numbers Infinity, -Infinity, and NaN are written exactly as in this sentence (this is also a way they can be written in JavaScript, although JSON does not allow them). A tagged item is written as an integer number for the tag followed by the item in parentheses; for instance, an RFC 3339 (ISO 8601) date could be notated as:
该符号借用了JSON语法来表示数字(整数和浮点)、True(>True<)、False(>False<)、Null(>Null<)、UTF-8字符串、数组和映射(映射在JSON中称为对象;诊断符号通过允许键位置的任何数据项来扩展JSON)。Undefined是在JavaScript中编写的>Undefined<。非有限浮点数Infinity、-Infinity和NaN的编写方式与这句话完全相同(这也是一种可以用JavaScript编写的方式,尽管JSON不允许)。标记的项目写为标记的整数,后跟括号中的项目;例如,RFC 3339(ISO 8601)日期可以表示为:
0("2013-03-21T20:04:00Z")
0("2013-03-21T20:04:00Z")
or the equivalent relative time as
或等效的相对时间
1(1363896240)
1(1363896240)
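That the two notations describe the same instant can be checked with Python's datetime module. The helper names rfc3339_to_epoch and diag_tag are assumptions made for this illustration, and the parser only handles the "Z" form of the timestamp used in the example.

```python
import json
from datetime import datetime, timezone


def rfc3339_to_epoch(text: str) -> int:
    """Convert a UTC timestamp of the form YYYY-MM-DDThh:mm:ssZ to
    seconds since 1970-01-01T00:00:00Z."""
    moment = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return int(moment.replace(tzinfo=timezone.utc).timestamp())


def diag_tag(tag: int, content) -> str:
    """Write a tagged item as the tag number followed by the item in
    parentheses; json.dumps supplies the JSON-style quoting."""
    return f"{tag}({json.dumps(content)})"


if __name__ == "__main__":
    stamp = "2013-03-21T20:04:00Z"
    print(diag_tag(0, stamp))
    print(diag_tag(1, rfc3339_to_epoch(stamp)))
```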
Byte strings are notated in one of the base encodings, without padding, enclosed in single quotes, prefixed by >h< for base16, >b32< for base32, >h32< for base32hex, >b64< for base64 or base64url (the actual encodings do not overlap, so the string remains unambiguous). For example, the byte string 0x12345678 could be written h'12345678', b32'CI2FM6A', or b64'EjRWeA'.
字节字符串用一种基本编码表示,没有填充,用单引号括起来,前缀为:对于base16为>h<,对于base32为>b32<,对于base32hex为>h32<,对于base64或base64url为>b64<(实际编码不重叠,因此字符串保持明确)。例如,字节字符串0x12345678可以写入h'12345678',b32'CI2FM6A'或b64'EjRWeA'。
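The byte string notations can be produced with Python's base64 module, as in the following sketch. The function name diag_bytes and its base parameter are assumptions made for this illustration; base32hex is derived from base32 by translating the alphabet, and the b64 form uses the base64url alphabet.

```python
import base64

# base32 and base32hex differ only in their alphabets.
_B32_TO_B32HEX = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    "0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def diag_bytes(raw: bytes, base: str = "h") -> str:
    """Write a byte string in diagnostic notation, without padding."""
    if base == "h":
        body = raw.hex()
    elif base == "b32":
        body = base64.b32encode(raw).decode("ascii").rstrip("=")
    elif base == "h32":
        body = base64.b32encode(raw).decode("ascii").rstrip("=")
        body = body.translate(_B32_TO_B32HEX)
    elif base == "b64":
        body = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    else:
        raise ValueError(f"unknown base prefix: {base}")
    return f"{base}'{body}'"


if __name__ == "__main__":
    data = bytes.fromhex("12345678")
    for prefix in ("h", "b32", "h32", "b64"):
        print(diag_bytes(data, prefix))
```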
Unassigned simple values are given as "simple()" with the appropriate integer in the parentheses. For example, "simple(42)" indicates major type 7, value 42.
未分配的简单值以“simple()”的形式给出,括号中包含适当的整数。例如,“simple(42)”表示主类型7,值42。
Sometimes it is useful to indicate in the diagnostic notation which of several alternative representations were actually used; for example, a data item written >1.5< by a diagnostic decoder might have been encoded as a half-, single-, or double-precision float.
有时,在诊断符号中指出实际使用了几种替代表示法中的哪一种是有用的;例如,由诊断解码器写为>1.5<的数据项可能已编码为半精度、单精度或双精度浮点数。
The convention for encoding indicators is that anything starting with an underscore and all following characters that are alphanumeric or underscore, is an encoding indicator, and can be ignored by anyone not interested in this information. Encoding indicators are always optional.
对指示符进行编码的惯例是,任何以下划线开头的字符以及以下所有字母数字或下划线字符都是编码指示符,任何对该信息不感兴趣的人都可以忽略它。编码指示符总是可选的。
A single underscore can be written after the opening brace of a map or the opening bracket of an array to indicate that the data item was represented in indefinite-length format. For example, [_ 1, 2] contains an indicator that an indefinite-length representation was used to represent the data item [1, 2].
可以在映射的左大括号或数组的左方括号后写一个下划线,以指示数据项是以不定长格式表示的。例如,[_ 1, 2]包含一个指示符,表示使用了不定长表示来表示数据项[1, 2]。
An underscore followed by a decimal digit n indicates that the preceding item (or, for arrays and maps, the item starting with the preceding bracket or brace) was encoded with an additional information value of 24+n. For example, 1.5_1 is a half-precision floating-point number, while 1.5_3 is encoded as double precision. This encoding indicator is not shown in Appendix A. (Note that the encoding indicator "_" is thus an abbreviation of the full form "_7", which is not used.)
下划线后跟十进制数字n表示前面的项(或者,对于数组和映射,以前面的括号或大括号开头的项)使用24+n的附加信息值进行编码。例如,1.5_1是半精度浮点数,而1.5_3编码为双精度。该编码指示符未在附录A中显示。(注意,编码指示符“_”因此是未使用的完整形式“_7”的缩写。)
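The relation between the additional information value and the encoding indicator can be sketched in Python as follows. The function names encoding_indicator and diag_float are assumptions made for this illustration.

```python
def encoding_indicator(additional_info: int) -> str:
    """Return the encoding indicator for an additional information value.

    24..27 map to "_0".."_3"; 31 (indefinite length) is written as a
    bare "_", the abbreviation of the unused full form "_7".
    """
    if additional_info == 31:
        return "_"
    if 24 <= additional_info <= 27:
        return f"_{additional_info - 24}"
    raise ValueError("no encoding indicator for this value")


def diag_float(value: float, additional_info: int) -> str:
    """Write a finite float together with the indicator of its encoded size."""
    return f"{value!r}{encoding_indicator(additional_info)}"


if __name__ == "__main__":
    print(diag_float(1.5, 25))   # encoded as half precision
    print(diag_float(1.5, 27))   # encoded as double precision
```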
As a special case, byte and text strings of indefinite length can be notated in the form (_ h'0123', h'4567') and (_ "foo", "bar").
作为一种特殊情况,长度不定的字节和文本字符串可以用(_ h'0123', h'4567')和(_ "foo", "bar")的形式表示。
IANA has created two registries for new CBOR values. The registries are separate, that is, not under an umbrella registry, and follow the rules in [RFC5226]. IANA has also assigned a new MIME media type and an associated Constrained Application Protocol (CoAP) Content-Format entry.
IANA为新的CBOR值创建了两个注册中心。注册中心是独立的,即不在总括注册中心之下,并遵循[RFC5226]中的规则。IANA还分配了一个新的MIME媒体类型和相关的受限应用程序协议(CoAP)内容格式条目。
IANA has created the "Concise Binary Object Representation (CBOR) Simple Values" registry. The initial values are shown in Table 2.
IANA创建了“简明二进制对象表示(CBOR)简单值”注册表。初始值如表2所示。
New entries in the range 0 to 19 are assigned by Standards Action. It is suggested that these Standards Actions allocate values starting with the number 16 in order to reserve the lower numbers for contiguous blocks (if any).
范围为0到19的新条目由标准操作分配。建议这些标准操作分配从数字16开始的值,以便为连续块(如果有)保留较低的数字。
New entries in the range 32 to 255 are assigned by Specification Required.
范围在32到255之间的新条目按所需规格分配。
IANA has created the "Concise Binary Object Representation (CBOR) Tags" registry. The initial values are shown in Table 3.
IANA创建了“简明二进制对象表示(CBOR)标记”注册表。初始值如表3所示。
New entries in the range 0 to 23 are assigned by Standards Action. New entries in the range 24 to 255 are assigned by Specification Required. New entries in the range 256 to 18446744073709551615 are assigned by First Come First Served. The template for registration requests is:
范围为0到23的新条目由标准操作分配。24到255范围内的新条目按所需规格分配。256至18446744073709551615范围内的新条目由先到先得分配。注册请求的模板是:
o Data item
o 数据项
o Semantics (short form)
o 语义学(简称)
In addition, First Come First Served requests should include:
此外,先到先得的请求应包括:
o Point of contact
o 接触点
o Description of semantics (URL). This description is optional; the URL can point to something like an Internet-Draft or a web page.
o 语义描述(URL)此描述是可选的;URL可以指向互联网草稿或网页之类的内容。
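The three tag ranges above can be expressed as a small lookup. The following C sketch is illustrative only; the enum and the function name tag_policy_for are hypothetical and not part of any registry tooling.

```c
#include <stdint.h>

/* Registration policies named in the text for the CBOR Tags registry. */
enum tag_policy {
    STANDARDS_ACTION,
    SPECIFICATION_REQUIRED,
    FIRST_COME_FIRST_SERVED
};

/* Hypothetical helper: maps a tag number to the policy of its range. */
enum tag_policy tag_policy_for(uint64_t tag)
{
    if (tag <= 23)
        return STANDARDS_ACTION;        /* 0..23 */
    if (tag <= 255)
        return SPECIFICATION_REQUIRED;  /* 24..255 */
    return FIRST_COME_FIRST_SERVED;     /* 256..18446744073709551615 */
}
```

The range boundaries coincide with encoding sizes: tags 0 to 23 fit into the initial byte, and tags 24 to 255 need one additional byte.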
The Internet media type [RFC6838] for CBOR data is application/cbor.
Type name: application
Subtype name: cbor
Required parameters: n/a
Optional parameters: n/a
Encoding considerations: binary
Security considerations: See Section 8 of this document
Interoperability considerations: n/a
Published specification: This document
Applications that use this media type: None yet, but it is expected that this format will be deployed in protocols and applications.
Additional information:
   Magic number(s): n/a
   File extension(s): .cbor
   Macintosh file type code(s): n/a
Person & email address to contact for further information: Carsten Bormann cabo@tzi.org
Intended usage: COMMON
Restrictions on usage: none
Author: Carsten Bormann <cabo@tzi.org>
Change controller: The IESG <iesg@ietf.org>
Media Type: application/cbor
Encoding: -
Id: 60
Reference: [RFC7049]
Name: Concise Binary Object Representation (CBOR)
+suffix: +cbor
References: [RFC7049]
Encoding Considerations: CBOR is a binary format.
Interoperability Considerations: n/a
Fragment Identifier Considerations: The syntax and semantics of fragment identifiers specified for +cbor SHOULD be as specified for "application/cbor". (At publication of this document, there is no fragment identification syntax defined for "application/cbor".)
The syntax and semantics for fragment identifiers for a specific "xxx/yyy+cbor" SHOULD be processed as follows:
For cases defined in +cbor, where the fragment identifier resolves per the +cbor rules, then process as specified in +cbor.
For cases defined in +cbor, where the fragment identifier does not resolve per the +cbor rules, then process as specified in "xxx/yyy+cbor".
For cases not defined in +cbor, then process as specified in "xxx/yyy+cbor".
Security Considerations: See Section 8 of this document
Contact: Apps Area Working Group (apps-discuss@ietf.org)
Author/Change Controller: The Apps Area Working Group. The IESG has change control over this registration.
A network-facing application can exhibit vulnerabilities in its processing logic for incoming data. Complex parsers are well known as a likely source of such vulnerabilities, such as the ability to remotely crash a node, or even remotely execute arbitrary code on it. CBOR attempts to narrow the opportunities for introducing such vulnerabilities by reducing parser complexity, by giving the entire range of encodable values a meaning where possible.
Resource exhaustion attacks might attempt to lure a decoder into allocating very big data items (strings, arrays, maps) or exhaust the stack depth by setting up deeply nested items. Decoders need to have appropriate resource management to mitigate these attacks. (Items for which very large sizes are given can also attempt to exploit integer overflow vulnerabilities.)
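One possible form of such resource management is to validate every declared size against the input that is actually available before allocating or recursing. The C sketch below is an assumption-laden illustration: the names length_ok, count_ok, and depth_ok and the MAX_NESTING limit are hypothetical, and the count check only applies to definite-length items in a fully buffered input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NESTING 64   /* assumed application limit, not taken from this document */

/* True if a string that declares `declared` bytes fits into the
 * `remaining` input. Comparing (rather than adding to an offset)
 * avoids integer overflow for very large declared lengths. */
bool length_ok(uint64_t declared, size_t remaining)
{
    return declared <= (uint64_t)remaining;
}

/* True if an array or map that declares `count` entries could fit:
 * every data item occupies at least one byte, and a map entry is a
 * pair of data items. */
bool count_ok(uint64_t count, size_t remaining, bool is_map)
{
    uint64_t rem = (uint64_t)remaining;
    return is_map ? count <= rem / 2 : count <= rem;
}

/* True if another level of nesting is still acceptable. */
bool depth_ok(unsigned depth)
{
    return depth < MAX_NESTING;
}
```

A decoder built this way rejects an item such as a byte string claiming 2^64-1 bytes before any allocation takes place.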
Applications where a CBOR data item is examined by a gatekeeper function and later used by a different application may exhibit vulnerabilities when multiple interpretations of the data item are possible. For example, an attacker could make use of duplicate keys in maps and precision issues in numbers to make the gatekeeper base its decisions on a different interpretation than the one that will be used by the second application. Protocols that are used in a security context should be defined in such a way that these multiple interpretations are reliably reduced to a single one. To facilitate this, encoder and decoder implementations used in such contexts should provide at least one strict mode of operation (Section 3.10).
CBOR was inspired by MessagePack. MessagePack was developed and promoted by Sadayuki Furuhashi ("frsyuki"). This reference to MessagePack is solely for attribution; CBOR is not intended as a version of or replacement for MessagePack, as it has different design goals and requirements.
The need for functionality beyond the original MessagePack Specification became obvious to many people at about the same time around the year 2012. BinaryPack is a minor derivation of MessagePack that was developed by Eric Zhang for the binaryjs project. A similar, but different, extension was made by Tim Caswell for his msgpack-js and msgpack-js-browser projects. Many people have contributed to the recent discussion about extending MessagePack to separate text string representation from byte string representation.
The encoding of the additional information in CBOR was inspired by the encoding of length information designed by Klaus Hartke for CoAP.
This document also incorporates suggestions made by many people, notably Dan Frost, James Manger, Joe Hildebrand, Keith Moore, Matthew Lepinski, Nico Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray, Tony Finch, Tony Hansen, and Yaron Sheffer.
[ECMA262] European Computer Manufacturers Association, "ECMAScript Language Specification 5.1 Edition", ECMA Standard ECMA-262, June 2011, <http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf>.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom Syndication Format", RFC 4287, December 2005.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, October 2006.
[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.
[TIME_T] The Open Group Base Specifications, "Vol. 1: Base Definitions, Issue 7", Section 4.15 'Seconds Since the Epoch', IEEE Std 1003.1, 2013 Edition, 2013, <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_15>.
[ASN.1] International Telecommunication Union, "Information Technology -- ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER)", ITU-T Recommendation X.690, 1994.
[BSON] Various, "BSON - Binary JSON", 2013, <http://bsonspec.org/>.
[CNN-TERMS] Bormann, C., Ersue, M., and A. Keranen, "Terminology for Constrained Node Networks", Work in Progress, July 2013.
[MessagePack] Furuhashi, S., "MessagePack", 2013, <http://msgpack.org/>.
[RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission Protocol", RFC 713, April 1976.
[RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type Specifications and Registration Procedures", BCP 13, RFC 6838, January 2013.
[UBJSON] The Buzz Media, "Universal Binary JSON Specification", 2013, <http://ubjson.org/>.
[YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup Language (YAML[TM]) Version 1.2", 3rd Edition, October 2009, <http://www.yaml.org/spec/1.2/spec.html>.
The following table provides some CBOR-encoded values in hexadecimal (right column), together with diagnostic notation for these values (left column). Note that the string "\u00fc" is one form of diagnostic notation for a UTF-8 string containing the single Unicode character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut). Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often representing "water"), and "\ud800\udd51" is a UTF-8 string in diagnostic notation with a single character U+10151 (GREEK ACROPHONIC ATTIC FIFTY STATERS). (Note that all these single-character strings could also be represented in native UTF-8 in diagnostic notation, just not in an ASCII-only specification like the present one.) In the diagnostic notation provided for bignums, their intended numeric value is shown as a decimal number (such as 18446744073709551616) instead of showing a tagged byte string (such as 2(h'010000000000000000')).
+------------------------------+------------------------------------+
| Diagnostic                   | Encoded                            |
+------------------------------+------------------------------------+
| 0                            | 0x00                               |
| 1                            | 0x01                               |
| 10                           | 0x0a                               |
| 23                           | 0x17                               |
| 24                           | 0x1818                             |
| 25                           | 0x1819                             |
| 100                          | 0x1864                             |
| 1000                         | 0x1903e8                           |
| 1000000                      | 0x1a000f4240                       |
| 1000000000000                | 0x1b000000e8d4a51000               |
| 18446744073709551615         | 0x1bffffffffffffffff               |
| 18446744073709551616         | 0xc249010000000000000000           |
| -18446744073709551616        | 0x3bffffffffffffffff               |
| -18446744073709551617        | 0xc349010000000000000000           |
| -1                           | 0x20                               |
| -10                          | 0x29                               |
| -100                         | 0x3863                             |
| -1000                        | 0x3903e7                           |
| 0.0                          | 0xf90000                           |
| -0.0                         | 0xf98000                           |
| 1.0                          | 0xf93c00                           |
| 1.1                          | 0xfb3ff199999999999a               |
| 1.5                          | 0xf93e00                           |
| 65504.0                      | 0xf97bff                           |
| 100000.0                     | 0xfa47c35000                       |
| 3.4028234663852886e+38       | 0xfa7f7fffff                       |
| 1.0e+300                     | 0xfb7e37e43c8800759c               |
| 5.960464477539063e-8         | 0xf90001                           |
| 0.00006103515625             | 0xf90400                           |
| -4.0                         | 0xf9c400                           |
| -4.1                         | 0xfbc010666666666666               |
| Infinity                     | 0xf97c00                           |
| NaN                          | 0xf97e00                           |
| -Infinity                    | 0xf9fc00                           |
| Infinity                     | 0xfa7f800000                       |
| NaN                          | 0xfa7fc00000                       |
| -Infinity                    | 0xfaff800000                       |
| Infinity                     | 0xfb7ff0000000000000               |
| NaN                          | 0xfb7ff8000000000000               |
| -Infinity                    | 0xfbfff0000000000000               |
| false                        | 0xf4                               |
| true                         | 0xf5                               |
| null                         | 0xf6                               |
| undefined                    | 0xf7                               |
| simple(16)                   | 0xf0                               |
| simple(24)                   | 0xf818                             |
| simple(255)                  | 0xf8ff                             |
| 0("2013-03-21T20:04:00Z")    | 0xc074323031332d30332d32315432303a |
|                              | 30343a30305a                       |
| 1(1363896240)                | 0xc11a514b67b0                     |
| 1(1363896240.5)              | 0xc1fb41d452d9ec200000             |
| 23(h'01020304')              | 0xd74401020304                     |
| 24(h'6449455446')            | 0xd818456449455446                 |
| 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 |
|                              | 616d706c652e636f6d                 |
| h''                          | 0x40                               |
| h'01020304'                  | 0x4401020304                       |
| ""                           | 0x60                               |
| "a"                          | 0x6161                             |
| "IETF"                       | 0x6449455446                       |
| "\"\\"                       | 0x62225c                           |
| "\u00fc"                     | 0x62c3bc                           |
| "\u6c34"                     | 0x63e6b0b4                         |
| "\ud800\udd51"               | 0x64f0908591                       |
| []                           | 0x80                               |
| [1, 2, 3]                    | 0x83010203                         |
| [1, [2, 3], [4, 5]]          | 0x8301820203820405                 |
| [1, 2, 3, 4, 5, 6, 7, 8, 9,  | 0x98190102030405060708090a0b0c0d0e |
| 10, 11, 12, 13, 14, 15, 16,  | 0f101112131415161718181819         |
| 17, 18, 19, 20, 21, 22, 23,  |                                    |
| 24, 25]                      |                                    |
| {}                           | 0xa0                               |
| {1: 2, 3: 4}                 | 0xa201020304                       |
| {"a": 1, "b": [2, 3]}        | 0xa26161016162820203               |
| ["a", {"b": "c"}]            | 0x826161a161626163                 |
| {"a": "A", "b": "B", "c":    | 0xa5616161416162614261636143616461 |
| "C", "d": "D", "e": "E"}     | 4461656145                         |
| (_ h'0102', h'030405')       | 0x5f42010243030405ff               |
| (_ "strea", "ming")          | 0x7f657374726561646d696e67ff       |
| [_ ]                         | 0x9fff                             |
| [_ 1, [2, 3], [_ 4, 5]]      | 0x9f018202039f0405ffff             |
| [_ 1, [2, 3], [4, 5]]        | 0x9f01820203820405ff               |
| [1, [2, 3], [_ 4, 5]]        | 0x83018202039f0405ff               |
| [1, [_ 2, 3], [4, 5]]        | 0x83019f0203ff820405               |
| [_ 1, 2, 3, 4, 5, 6, 7, 8,   | 0x9f0102030405060708090a0b0c0d0e0f |
| 9, 10, 11, 12, 13, 14, 15,   | 101112131415161718181819ff         |
| 16, 17, 18, 19, 20, 21, 22,  |                                    |
| 23, 24, 25]                  |                                    |
| {_ "a": 1, "b": [_ 2, 3]}    | 0xbf61610161629f0203ffff           |
| ["a", {_ "b": "c"}]          | 0x826161bf61626163ff               |
| {_ "Fun": true, "Amt": -2}   | 0xbf6346756ef563416d7421ff         |
+------------------------------+------------------------------------+
Table 4: Examples of Encoded CBOR Data Items
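The relationship between the escaped diagnostic notation and the encoded bytes of the three single-character strings can be reproduced with a short C sketch. The helper names from_surrogates and utf8_encode are hypothetical, and the encoder performs no validation (for example, it does not reject lone surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair, as written in "\ud800\udd51",
 * into a single Unicode code point. */
uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + ((uint32_t)(hi - 0xd800u) << 10)
                    + (uint32_t)(lo - 0xdc00u);
}

/* Encode one code point as UTF-8 (no validation); returns the number
 * of bytes written, 1 to 4. */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xc0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xe0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (uint8_t)(0x80 | (cp & 0x3f));
        return 3;
    }
    out[0] = (uint8_t)(0xf0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (uint8_t)(0x80 | (cp & 0x3f));
    return 4;
}
```

The CBOR encoding of each string is then the text string head (0x60 plus the UTF-8 byte count) followed by these bytes, for example 0x64 followed by f0 90 85 91 for U+10151.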
For brevity, this jump table does not show initial bytes that are reserved for future extension. It also only shows a selection of the initial bytes that can be used for optional features. (All unsigned integers are in network byte order.)
+-----------------+-------------------------------------------------+
| Byte            | Structure/Semantics                             |
+-----------------+-------------------------------------------------+
| 0x00..0x17      | Integer 0x00..0x17 (0..23)                      |
| 0x18            | Unsigned integer (one-byte uint8_t follows)     |
| 0x19            | Unsigned integer (two-byte uint16_t follows)    |
| 0x1a            | Unsigned integer (four-byte uint32_t follows)   |
| 0x1b            | Unsigned integer (eight-byte uint64_t follows)  |
| 0x20..0x37      | Negative integer -1-0x00..-1-0x17 (-1..-24)     |
| 0x38            | Negative integer -1-n (one-byte uint8_t for n   |
|                 | follows)                                        |
| 0x39            | Negative integer -1-n (two-byte uint16_t for n  |
|                 | follows)                                        |
| 0x3a            | Negative integer -1-n (four-byte uint32_t for n |
|                 | follows)                                        |
| 0x3b            | Negative integer -1-n (eight-byte uint64_t for  |
|                 | n follows)                                      |
| 0x40..0x57      | byte string (0x00..0x17 bytes follow)           |
| 0x58            | byte string (one-byte uint8_t for n, and then n |
|                 | bytes follow)                                   |
| 0x59            | byte string (two-byte uint16_t for n, and then  |
|                 | n bytes follow)                                 |
| 0x5a            | byte string (four-byte uint32_t for n, and then |
|                 | n bytes follow)                                 |
| 0x5b            | byte string (eight-byte uint64_t for n, and     |
|                 | then n bytes follow)                            |
| 0x5f            | byte string, byte strings follow, terminated by |
|                 | "break"                                         |
| 0x60..0x77      | UTF-8 string (0x00..0x17 bytes follow)          |
| 0x78            | UTF-8 string (one-byte uint8_t for n, and then  |
|                 | n bytes follow)                                 |
| 0x79            | UTF-8 string (two-byte uint16_t for n, and then |
|                 | n bytes follow)                                 |
| 0x7a            | UTF-8 string (four-byte uint32_t for n, and     |
|                 | then n bytes follow)                            |
| 0x7b            | UTF-8 string (eight-byte uint64_t for n, and    |
|                 | then n bytes follow)                            |
| 0x7f            | UTF-8 string, UTF-8 strings follow, terminated  |
|                 | by "break"                                      |
| 0x80..0x97      | array (0x00..0x17 data items follow)            |
| 0x98            | array (one-byte uint8_t for n, and then n data  |
|                 | items follow)                                   |
| 0x99            | array (two-byte uint16_t for n, and then n data |
|                 | items follow)                                   |
| 0x9a            | array (four-byte uint32_t for n, and then n     |
|                 | data items follow)                              |
| 0x9b            | array (eight-byte uint64_t for n, and then n    |
|                 | data items follow)                              |
| 0x9f            | array, data items follow, terminated by "break" |
| 0xa0..0xb7      | map (0x00..0x17 pairs of data items follow)     |
| 0xb8            | map (one-byte uint8_t for n, and then n pairs   |
|                 | of data items follow)                           |
| 0xb9            | map (two-byte uint16_t for n, and then n pairs  |
|                 | of data items follow)                           |
| 0xba            | map (four-byte uint32_t for n, and then n pairs |
|                 | of data items follow)                           |
| 0xbb            | map (eight-byte uint64_t for n, and then n      |
|                 | pairs of data items follow)                     |
| 0xbf            | map, pairs of data items follow, terminated by  |
|                 | "break"                                         |
| 0xc0            | Text-based date/time (data item follows; see    |
|                 | Section 2.4.1)                                  |
| 0xc1            | Epoch-based date/time (data item follows; see   |
|                 | Section 2.4.1)                                  |
| 0xc2            | Positive bignum (data item "byte string"        |
|                 | follows)                                        |
| 0xc3            | Negative bignum (data item "byte string"        |
|                 | follows)                                        |
| 0xc4            | Decimal Fraction (data item "array" follows;    |
|                 | see Section 2.4.3)                              |
| 0xc5            | Bigfloat (data item "array" follows; see        |
|                 | Section 2.4.3)                                  |
| 0xc6..0xd4      | (tagged item)                                   |
| 0xd5..0xd7      | Expected Conversion (data item follows; see     |
|                 | Section 2.4.4.2)                                |
| 0xd8..0xdb      | (more tagged items, 1/2/4/8 bytes and then a    |
|                 | data item follow)                               |
| 0xe0..0xf3      | (simple value)                                  |
| 0xf4            | False                                           |
| 0xf5            | True                                            |
| 0xf6            | Null                                            |
| 0xf7            | Undefined                                       |
| 0xf8            | (simple value, one byte follows)                |
| 0xf9            | Half-Precision Float (two-byte IEEE 754)        |
| 0xfa            | Single-Precision Float (four-byte IEEE 754)     |
| 0xfb            | Double-Precision Float (eight-byte IEEE 754)    |
| 0xff            | "break" stop code                               |
+-----------------+-------------------------------------------------+
Table 5: Jump Table for Initial Byte
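The regular structure behind the jump table can be made explicit by splitting the initial byte into its two fields. The following C sketch uses hypothetical function names; it shows that the number of argument bytes after the initial byte depends only on the low-order five bits.

```c
#include <stdint.h>

/* Major type: the high-order 3 bits of the initial byte. */
unsigned major_type(uint8_t ib)
{
    return ib >> 5;
}

/* Additional information: the low-order 5 bits of the initial byte. */
unsigned additional_info(uint8_t ib)
{
    return ib & 0x1f;
}

/* Number of argument bytes that follow the initial byte (0, 1, 2, 4,
 * or 8), or -1 if the additional information is reserved (28..30) or
 * marks an indefinite-length item or "break" (31). */
int argument_bytes(uint8_t ib)
{
    unsigned ai = additional_info(ib);
    if (ai < 24)
        return 0;                /* value is carried in the initial byte */
    if (ai <= 27)
        return 1 << (ai - 24);   /* 24 -> 1, 25 -> 2, 26 -> 4, 27 -> 8 */
    return -1;
}
```

For example, 0x1b splits into major type 0 with eight argument bytes, and 0xf9 splits into major type 7 with two argument bytes, matching the corresponding table rows.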
The well-formedness of a CBOR item can be checked by the pseudocode in Figure 1. The data is well-formed if and only if:
o the pseudocode does not "fail";
o after execution of the pseudocode, no bytes are left in the input (except in streaming applications).
The pseudocode has the following prerequisites:
o take(n) reads n bytes from the input data and returns them as a byte string. If n bytes are no longer available, take(n) fails.
o uint() converts a byte string into an unsigned integer by interpreting the byte string in network byte order.
o Arithmetic works as in C.
o All variables are unsigned integers of sufficient range.
well_formed (breakable = false) {
  // process initial bytes
  ib = uint(take(1));
  mt = ib >> 5;
  val = ai = ib & 0x1f;
  switch (ai) {
    case 24: val = uint(take(1)); break;
    case 25: val = uint(take(2)); break;
    case 26: val = uint(take(4)); break;
    case 27: val = uint(take(8)); break;
    case 28: case 29: case 30: fail();
    case 31:
      return well_formed_indefinite(mt, breakable);
  }
  // process content
  switch (mt) {
    // case 0, 1, 7 do not have content; just use val
    case 2: case 3: take(val); break; // bytes/UTF-8
    case 4: for (i = 0; i < val; i++) well_formed(); break;
    case 5: for (i = 0; i < val*2; i++) well_formed(); break;
    case 6: well_formed(); break;     // 1 embedded data item
  }
  return mt;                    // finite data item
}

well_formed_indefinite(mt, breakable) {
  switch (mt) {
    case 2: case 3:
      while ((it = well_formed(true)) != -1)
        if (it != mt)           // need finite embedded
          fail();               //    of same type
      break;
    case 4: while (well_formed(true) != -1); break;
    case 5: while (well_formed(true) != -1) well_formed(); break;
    case 7:
      if (breakable)
        return -1;              // signal break out
      else fail();              // no enclosing indefinite
    default: fail();            // wrong mt
  }
  return 0;                     // no break out
}
Figure 1: Pseudocode for Well-Formedness Check
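The pseudocode can be turned into compilable C with few changes. The sketch below is one possible rendering, not a reference implementation: the names (wf_input, wf_item, cbor_well_formed), the use of negative return codes in place of fail(), and the WF_MAX_DEPTH nesting limit are all assumptions added for illustration. It operates on a complete buffer, so the streaming exception does not apply.

```c
#include <stddef.h>
#include <stdint.h>

#define WF_BREAK (-1)     /* "break" stop code seen where it is allowed */
#define WF_FAIL  (-2)     /* not well-formed, or nesting too deep */
#define WF_MAX_DEPTH 32   /* assumed limit; not part of the pseudocode */

struct wf_input { const uint8_t *p; size_t left; };

static int wf_item(struct wf_input *in, int breakable, int depth);

/* Check n consecutive complete data items. */
static int wf_items(struct wf_input *in, uint64_t n, int depth)
{
    for (uint64_t i = 0; i < n; i++)
        if (wf_item(in, 0, depth) < 0)
            return WF_FAIL;
    return 0;
}

static int wf_indefinite(struct wf_input *in, int mt, int breakable, int depth)
{
    int it;
    switch (mt) {
    case 2: case 3:   /* chunks: definite-length strings of the same type */
        while ((it = wf_item(in, 1, depth)) != WF_BREAK)
            if (it != mt)
                return WF_FAIL;
        return 0;
    case 4:
        while ((it = wf_item(in, 1, depth)) != WF_BREAK)
            if (it == WF_FAIL)
                return WF_FAIL;
        return 0;
    case 5:
        while ((it = wf_item(in, 1, depth)) != WF_BREAK) {   /* key */
            if (it == WF_FAIL)
                return WF_FAIL;
            if (wf_item(in, 0, depth) < 0)                   /* value */
                return WF_FAIL;
        }
        return 0;
    case 7:
        return breakable ? WF_BREAK : WF_FAIL;
    default:
        return WF_FAIL;   /* major types 0, 1, 6 have no indefinite form */
    }
}

/* Returns the major type (0..7) of a definite item, 0 for an
 * indefinite-length item, WF_BREAK, or WF_FAIL. */
static int wf_item(struct wf_input *in, int breakable, int depth)
{
    if (depth > WF_MAX_DEPTH || in->left < 1)
        return WF_FAIL;
    uint8_t ib = *in->p++;
    in->left--;
    int mt = ib >> 5;
    unsigned ai = ib & 0x1f;
    uint64_t val = ai;
    if (ai >= 28 && ai <= 30)
        return WF_FAIL;                       /* reserved */
    if (ai == 31)
        return wf_indefinite(in, mt, breakable, depth + 1);
    if (ai >= 24) {                           /* 1, 2, 4, or 8 argument bytes */
        size_t n = (size_t)1 << (ai - 24);
        if (in->left < n)
            return WF_FAIL;
        val = 0;
        while (n--) {                         /* network byte order */
            val = val << 8 | *in->p++;
            in->left--;
        }
    }
    switch (mt) {
    case 2: case 3:                           /* skip string content */
        if (val > in->left)
            return WF_FAIL;
        in->p += val;
        in->left -= (size_t)val;
        break;
    case 4:
        if (wf_items(in, val, depth + 1) < 0)
            return WF_FAIL;
        break;
    case 5:                                   /* val keys plus val values */
        if (wf_items(in, val, depth + 1) < 0 || wf_items(in, val, depth + 1) < 0)
            return WF_FAIL;
        break;
    case 6:
        if (wf_item(in, 0, depth + 1) < 0)
            return WF_FAIL;
        break;
    }
    return mt;
}

/* Returns 1 if buf holds exactly one well-formed data item, else 0. */
int cbor_well_formed(const uint8_t *buf, size_t len)
{
    struct wf_input in = { buf, len };
    return wf_item(&in, 0, 0) >= 0 && in.left == 0;
}
```

Using signed return codes sidesteps the pseudocode's reliance on -1 within otherwise unsigned arithmetic, and checking a map as two runs of val items avoids computing val*2, which could overflow for very large counts.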
Note that the remaining complexity of a complete CBOR decoder is about presenting data that has been parsed to the application in an appropriate form.
Major types 0 and 1 are designed in such a way that they can be encoded in C from a signed integer without actually doing an if-then-else for positive/negative (Figure 2). This uses the fact that (-1-n), the transformation for major type 1, is the same as ~n (bitwise complement) in C unsigned arithmetic; ~n can then be expressed as (-1)^n for the negative case, while 0^n leaves n unchanged for non-negative. The sign of a number can be converted to -1 for negative and 0 for non-negative (0 or positive) by arithmetic-shifting the number by one bit less than the bit length of the number (for example, by 63 for 64-bit numbers).
   void encode_sint(int64_t n) {
     uint64_t ui = n >> 63;   // extend sign to whole length
     mt = ui & 0x20;          // extract major type
     ui ^= n;                 // complement negatives
     if (ui < 24)
       *p++ = mt + ui;
     else if (ui < 256) {
       *p++ = mt + 24;
       *p++ = ui;
     } else
          ...
Figure 2: Pseudocode for Encoding a Signed Integer
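The same branch-free sign handling can be sketched in Python, whose right shift on negative integers is also arithmetic. The function below is an illustrative assumption rather than text from this specification: it returns the encoded bytes instead of writing through a pointer, and it fills in the longer argument sizes (2, 4, and 8 bytes) that Figure 2 leaves elided, following the additional-information values 25, 26, and 27.

```python
def encode_sint(n):
    """Encode a signed 64-bit integer as CBOR major type 0 or 1."""
    if not -2**63 <= n < 2**63:
        raise ValueError("n does not fit in int64_t")
    sign = n >> 63       # -1 for negative, 0 for non-negative
    mt = sign & 0x20     # 0x20 selects major type 1, 0x00 major type 0
    ui = sign ^ n        # ~n == -1 - n for negatives, n unchanged otherwise
    if ui < 24:
        return bytes([mt + ui])
    # Additional information 24..27 announces a 1-, 2-, 4-, or 8-byte argument.
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if ui < 1 << (8 * size):
            return bytes([mt + ai]) + ui.to_bytes(size, "big")
    raise AssertionError("unreachable for 64-bit input")
```

For instance, encode_sint(1000) and encode_sint(-1000) differ only in the major type bit of the initial byte and in the argument (1000 versus 999), matching the (-1-n) transformation described above.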
As half-precision floating-point numbers were only added to IEEE 754 in 2008, today's programming platforms often still only have limited support for them. It is nevertheless easy to include at least decoding support for them, even on platforms that lack native support. An example of a small decoder for half-precision floating-point numbers in the C language is shown in Figure 3. A similar program for Python is in Figure 4; this code assumes that the 2-byte value has already been decoded as an (unsigned short) integer in network byte order (as would be done by the pseudocode in Appendix C).
   #include <math.h>

   double decode_half(unsigned char *halfp) {
     int half = (halfp[0] << 8) + halfp[1];
     int exp = (half >> 10) & 0x1f;
     int mant = half & 0x3ff;
     double val;
     if (exp == 0) val = ldexp(mant, -24);
     else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
     else val = mant == 0 ? INFINITY : NAN;
     return half & 0x8000 ? -val : val;
   }
Figure 3: C Code for a Half-Precision Decoder
   import struct
   from math import ldexp

   def decode_single(single):
       return struct.unpack("!f", struct.pack("!I", single))[0]

   def decode_half(half):
       valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
       if ((half & 0x7c00) != 0x7c00):
           return ldexp(decode_single(valu), 112)
       return decode_single(valu | 0x7f800000)
Figure 4: Python Code for a Half-Precision Decoder
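On platforms that do support half precision, a hand-written decoder can be cross-checked against the native one. The following sketch ports the exponent/mantissa logic of the C decoder in Figure 3 to Python and compares it with the "e" (binary16) format of Python's struct module, available since Python 3.6, over all 65536 bit patterns. The helper name mismatches is an assumption made for this illustration.

```python
import math
import struct


def decode_half(half):
    """Decode a 16-bit integer holding an IEEE 754 binary16 value."""
    exp = (half >> 10) & 0x1F
    mant = half & 0x3FF
    if exp == 0:
        val = math.ldexp(mant, -24)             # zero and subnormals
    elif exp != 31:
        val = math.ldexp(mant + 1024, exp - 25)  # normal numbers
    else:
        val = math.inf if mant == 0 else math.nan
    return -val if half & 0x8000 else val


def mismatches():
    """Return every bit pattern where decode_half disagrees with struct."""
    bad = []
    for half in range(0x10000):
        ours = decode_half(half)
        ref = struct.unpack("!e", half.to_bytes(2, "big"))[0]
        same = (math.isnan(ours) and math.isnan(ref)) or ours == ref
        if not same:
            bad.append(half)
    return bad
```

An empty list from mismatches() indicates that the two decoders agree on every input, with all NaN bit patterns treated as equivalent.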
Appendix E. Comparison of Other Binary Formats to CBOR's Design Objectives
The proposal for CBOR follows a history of binary formats that is as long as the history of computers themselves. Different formats have had different objectives. In most cases, the objectives of the format were never stated, although they can sometimes be implied by the context where the format was first used. Some formats were meant to be universally usable, although history has proven that no binary format meets the needs of all protocols and applications.
CBOR differs from many of these formats in that it starts with a set of objectives and attempts to meet just those. This section compares a few of the dozens of formats with CBOR's objectives in order to help the reader decide if they want to use CBOR or a different format for a particular protocol or application.
Note that the discussion here is not meant to be a criticism of any format: to the best of our knowledge, no format before CBOR was meant to cover CBOR's objectives in the priority we have assigned them. A brief recap of the objectives from Section 1.1 is:
1. unambiguous encoding of most common data formats from Internet standards

2. code compactness for encoder or decoder

3. no schema description needed

4. reasonably compact serialization

5. applicability to constrained and unconstrained applications

6. good JSON conversion

7. extensibility
[ASN.1] has many serializations. In the IETF, DER and BER are the most common. The serialized output is not particularly compact for many items, and the code needed to decode numeric items can be complex on a constrained device.
Few (if any) IETF protocols have adopted one of the several variants of Packed Encoding Rules (PER). There could be many reasons for this, but one that is commonly stated is that PER makes use of the schema even for parsing the surface structure of the data stream, requiring significant tool support. There are different versions of the ASN.1 schema language in use, which has also hampered adoption.
[MessagePack] is a concise, widely implemented counted binary serialization format, similar in many properties to CBOR, although somewhat less regular. While the data model can be used to represent JSON data, MessagePack has also been used in many remote procedure call (RPC) applications and for long-term storage of data.
MessagePack has been essentially stable since it was first published around 2011; it has not yet had a transition. The evolution of MessagePack is impeded by an imperative to maintain complete backwards compatibility with existing stored data, while only few bytecodes are still available for extension. Repeated requests over the years from the MessagePack user community to separate out binary and text strings in the encoding recently have led to an extension proposal that would leave MessagePack's "raw" data ambiguous between its usages for binary and text data. The extension mechanism for MessagePack remains unclear.
[BSON] is a data format that was developed for the storage of JSON-like maps (JSON objects) in the MongoDB database. Its major distinguishing feature is the capability for in-place update, forgoing a compact representation. BSON uses a counted representation except for map keys, which are null-byte terminated. While BSON can be used for the representation of JSON-like objects on the wire, its specification is dominated by the requirements of the database application and has become somewhat baroque. The status of how BSON extensions will be implemented remains unclear.
[UBJSON] has a design goal to make JSON faster and somewhat smaller, using a binary format that is limited to exactly the data model JSON uses. Thus, there is expressly no intention to support, for example, binary data; however, there is a "high-precision number", expressed as a character string in JSON syntax. UBJSON is not optimized for code compactness, and its type byte coding is optimized for human recognition and not for compact representation of native types such as small integers. Although UBJSON is mostly counted, it provides a reserved "unknown-length" value to support streaming of arrays and maps (JSON objects). Within these containers, UBJSON also has a "Noop" type for padding.
The Message Services Data Transmission Protocol (MSDTP) is a very early example of a compact message format; it is described in [RFC0713], written in 1976. It is included here for its historical value, not because it was ever widely used.
While CBOR's design objective of code compactness for encoders and decoders is a higher priority than its objective of conciseness on the wire, many people focus on the wire size. Table 6 shows some encoding examples for the simple nested array [1, [2, 3]]; where some form of indefinite-length encoding is supported by the encoding, [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.
   +---------------+-------------------------+-------------------------+
   | Format        | [1, [2, 3]]             | [_ 1, [2, 3]]           |
   +---------------+-------------------------+-------------------------+
   | RFC 713       | c2 05 81 c2 02 82 83    |                         |
   |               |                         |                         |
   | ASN.1 BER     | 30 0b 02 01 01 30 06 02 | 30 80 02 01 01 30 06 02 |
   |               | 01 02 02 01 03          | 01 02 02 01 03 00 00    |
   |               |                         |                         |
   | MessagePack   | 92 01 92 02 03          |                         |
   |               |                         |                         |
   | BSON          | 22 00 00 00 10 30 00 01 |                         |
   |               | 00 00 00 04 31 00 13 00 |                         |
   |               | 00 00 10 30 00 02 00 00 |                         |
   |               | 00 10 31 00 03 00 00 00 |                         |
   |               | 00 00                   |                         |
   |               |                         |                         |
   | UBJSON        | 61 02 42 01 61 02 42 02 | 61 ff 42 01 61 02 42 02 |
   |               | 42 03                   | 42 03 45                |
   |               |                         |                         |
   | CBOR          | 82 01 82 02 03          | 9f 01 82 02 03 ff       |
   +---------------+-------------------------+-------------------------+
Table 6: Examples for Different Levels of Conciseness
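The CBOR row of Table 6 can be reproduced with a few lines of Python. This is a deliberately tiny sketch that handles only unsigned integers below 24 and arrays with fewer than 24 elements, which is all the example needs; the Indefinite marker class and the encode function are assumptions introduced for illustration.

```python
class Indefinite(list):
    """Hypothetical marker: a list to be encoded with indefinite length."""


def encode(item):
    # Small unsigned integers: major type 0, value in the initial byte.
    if isinstance(item, int) and not isinstance(item, bool) and 0 <= item < 24:
        return bytes([item])
    # Indefinite-length array: 0x9f, the items, then the "break" byte 0xff.
    if isinstance(item, Indefinite):
        return b"\x9f" + b"".join(encode(x) for x in item) + b"\xff"
    # Definite-length array: major type 4 (0x80) plus the element count.
    if isinstance(item, list) and len(item) < 24:
        return bytes([0x80 + len(item)]) + b"".join(encode(x) for x in item)
    raise ValueError("outside the subset handled by this sketch")
```

Encoding [1, [2, 3]] yields the five bytes 82 01 82 02 03, and wrapping the outer array in Indefinite yields 9f 01 82 02 03 ff, as listed in the table.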
Authors' Addresses
Carsten Bormann
Universitaet Bremen TZI
Postfach 330440
D-28359 Bremen
Germany

Phone: +49-421-218-63921
EMail: cabo@tzi.org

Paul Hoffman
VPN Consortium

EMail: paul.hoffman@vpnc.org