Internet Engineering Task Force (IETF)                       N. Williams
Request for Comments: 7464                                  Cryptonector
Category: Standards Track                                  February 2015
ISSN: 2070-1721
        
Internet Engineering Task Force (IETF)                       N. Williams
Request for Comments: 7464                                  Cryptonector
Category: Standards Track                                  February 2015
ISSN: 2070-1721
        

JavaScript Object Notation (JSON) Text Sequences

JavaScript对象表示法(JSON)文本序列

Abstract

摘要

This document describes the JavaScript Object Notation (JSON) text sequence format and associated media type "application/json-seq". A JSON text sequence consists of any number of JSON texts, all encoded in UTF-8, each prefixed by an ASCII Record Separator (0x1E), and each ending with an ASCII Line Feed character (0x0A).

本文档描述JavaScript对象表示法(JSON)文本序列格式和相关媒体类型“application/JSON seq”。JSON文本序列由任意数量的JSON文本组成,所有文本均以UTF-8编码,每个文本以ASCII记录分隔符(0x1E)作为前缀,每个文本以ASCII换行符(0x0A)结尾。

Status of This Memo

关于下段备忘

This is an Internet Standards Track document.

这是一份互联网标准跟踪文件。

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

本文件是互联网工程任务组(IETF)的产品。它代表了IETF社区的共识。它已经接受了公众审查,并已被互联网工程指导小组(IESG)批准出版。有关互联网标准的更多信息,请参见RFC 5741第2节。

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7464.

有关本文件当前状态、任何勘误表以及如何提供反馈的信息,请访问http://www.rfc-editor.org/info/rfc7464.

Copyright Notice

版权公告

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

版权所有(c)2015 IETF信托基金和确定为文件作者的人员。版权所有。

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

本文件受BCP 78和IETF信托有关IETF文件的法律规定的约束(http://trustee.ietf.org/license-info)自本文件出版之日起生效。请仔细阅读这些文件,因为它们描述了您对本文件的权利和限制。从本文件中提取的代码组件必须包括信托法律条款第4.e节中所述的简化BSD许可证文本,并提供简化BSD许可证中所述的无担保。

Table of Contents

目录

   1. Introduction and Motivation .....................................2
      1.1. Conventions Used in This Document ..........................2
   2. JSON Text Sequence Format .......................................3
      2.1. JSON Text Sequence Parsing .................................3
      2.2. JSON Text Sequence Encoding ................................4
      2.3. Incomplete/Invalid JSON Texts Need Not Be Fatal ............4
      2.4. Top-Level Values: numbers, true, false, and null ...........5
   3. Security Considerations .........................................6
   4. IANA Considerations .............................................6
   5. Normative References ............................................7
   Acknowledgements ...................................................8
   Author's Address ...................................................8
        
   1. Introduction and Motivation .....................................2
      1.1. Conventions Used in This Document ..........................2
   2. JSON Text Sequence Format .......................................3
      2.1. JSON Text Sequence Parsing .................................3
      2.2. JSON Text Sequence Encoding ................................4
      2.3. Incomplete/Invalid JSON Texts Need Not Be Fatal ............4
      2.4. Top-Level Values: numbers, true, false, and null ...........5
   3. Security Considerations .........................................6
   4. IANA Considerations .............................................6
   5. Normative References ............................................7
   Acknowledgements ...................................................8
   Author's Address ...................................................8
        
1. Introduction and Motivation
1. 介绍和动机

The JavaScript Object Notation (JSON) [RFC7159] is a very handy serialization format. However, when serializing a large sequence of values as an array, or a possibly indeterminate-length or never-ending sequence of values, JSON becomes difficult to work with.

JavaScript对象表示法(JSON)[RFC7159]是一种非常方便的序列化格式。然而,当将一个大的值序列序列化为一个数组时,或者将一个可能不确定长度或永无止境的值序列序列化时,JSON变得很难使用。

Consider a sequence of one million values, each possibly one kilobyte when encoded -- roughly one gigabyte. It is often desirable to process such a dataset in an incremental manner without having to first read all of it before beginning to produce results. Traditionally, the way to do this with JSON is to use a "streaming" parser, but these are not widely available, widely used, or easy to use.

考虑一个一百万个值的序列,每个编码时可能有一个千字节,大约一千兆字节。通常希望以增量方式处理此类数据集,而不必在开始生成结果之前先读取所有数据集。传统上,使用JSON实现这一点的方法是使用“流”解析器,但这些解析器并不广泛可用、广泛使用或易于使用。

This document describes the concept and format of "JSON text sequences", which are specifically not JSON texts themselves but are composed of (possible) JSON texts. JSON text sequences can be parsed (and produced) incrementally without having to have a streaming parser (nor streaming encoder).

本文档描述了“JSON文本序列”的概念和格式,这些序列本身不是JSON文本,而是由(可能的)JSON文本组成。JSON文本序列可以增量解析(和生成),而无需使用流式解析器(或流式编码器)。

1.1. Conventions Used in This Document
1.1. 本文件中使用的公约

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

本文件中的关键词“必须”、“不得”、“必需”、“应”、“不应”、“建议”、“不建议”、“可”和“可选”应按照[RFC2119]中的说明进行解释。

2. JSON Text Sequence Format
2. JSON文本序列格式

Two different sets of ABNF rules are provided for the definition of JSON text sequences: one for parsers and one for encoders. Having two different sets of rules permits recovery by parsers from sequences where some of the elements are truncated for whatever reason. The syntax for parsers is specified in terms of octet strings that are then interpreted as JSON texts, if possible. The syntax for encoders, on the other hand, assumes that sequence elements are not truncated.

为JSON文本序列的定义提供了两组不同的ABNF规则:一组用于解析器,另一组用于编码器。拥有两组不同的规则允许解析器从某些元素因任何原因被截断的序列中恢复。解析器的语法是根据八进制字符串指定的,如果可能的话,八进制字符串将被解释为JSON文本。另一方面,编码器的语法假定序列元素没有被截断。

JSON text sequences MUST use UTF-8 encoding; other encodings of JSON (i.e., UTF-16 and UTF-32) MUST NOT be used.

JSON文本序列必须使用UTF-8编码;不得使用JSON的其他编码(即UTF-16和UTF-32)。

2.1. JSON Text Sequence Parsing
2.1. JSON文本序列解析

The ABNF [RFC5234] for the JSON text sequence parser is as given in Figure 1.

JSON文本序列解析器的ABNF[RFC5234]如图1所示。

      input-JSON-sequence = *(1*RS possible-JSON)
      RS = %x1E; "record separator" (RS), see RFC 20
               ; Also known as: Unicode Character INFORMATION SEPARATOR
               ;                TWO (U+001E)
      possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
                                ; JSON text (see RFC 7159)
      not-RS = %x00-1d / %x1f-ff; any octets other than RS
        
      input-JSON-sequence = *(1*RS possible-JSON)
      RS = %x1E; "record separator" (RS), see RFC 20
               ; Also known as: Unicode Character INFORMATION SEPARATOR
               ;                TWO (U+001E)
      possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
                                ; JSON text (see RFC 7159)
      not-RS = %x00-1d / %x1f-ff; any octets other than RS
        

Figure 1: JSON Text Sequence ABNF

图1:JSON文本序列ABNF

In prose: a series of octet strings, each containing any octet other than a record separator (RS) (0x1E) [RFC20]. All octet strings are preceded by an RS byte. Each octet string in the sequence is to be parsed as a JSON text in the UTF-8 encoding [RFC3629].

散文中的一系列八位字节字符串,每个字符串包含除记录分隔符(RS)(0x1E)[RFC20]以外的任何八位字节。所有八位字节字符串前面都有一个RS字节。序列中的每个八位字节字符串将被解析为UTF-8编码[RFC3629]中的JSON文本。

If parsing of such an octet string as a UTF-8-encoded JSON text fails, the parser SHOULD nonetheless continue parsing the remainder of the sequence. The parser can report such failures to applications, which might then choose to terminate parsing of a sequence. Multiple consecutive RS octets do not denote empty sequence elements between them and can be ignored.

如果解析UTF-8编码的JSON文本这样的八位字节字符串失败,解析器仍应继续解析序列的其余部分。解析器可以向应用程序报告此类故障,然后应用程序可能选择终止对序列的解析。多个连续的RS八位字节不表示它们之间的空序列元素,可以忽略。

This document does not define a mechanism for reliably identifying text sequence by position (for example, when sending individual elements of an array as unique text sequences). For applications where truncation is a possibility, this means that intended sequence elements can be truncated and can even be missing entirely; therefore, a reference to an nth element would be unreliable.

本文档未定义按位置可靠标识文本序列的机制(例如,当将数组的单个元素作为唯一文本序列发送时)。对于可能截断的应用程序,这意味着可以截断预期的序列元素,甚至可能完全缺失;因此,对第n个元素的引用是不可靠的。

There is no end of sequence indicator.

没有序列结束指示器。

2.2. JSON Text Sequence Encoding
2.2. JSON文本序列编码

The ABNF for the JSON text sequence encoder is given in Figure 2.

JSON文本序列编码器的ABNF如图2所示。

      JSON-sequence = *(RS JSON-text LF)
      RS = %x1E; see RFC 20
               ; Also known as: Unicode Character INFORMATION SEPARATOR
               ;                TWO (U+001E)
      LF = %x0A; "line feed" (LF), see RFC 20
      JSON-text = <given by RFC 7159, using UTF-8 encoding>
        
      JSON-sequence = *(RS JSON-text LF)
      RS = %x1E; see RFC 20
               ; Also known as: Unicode Character INFORMATION SEPARATOR
               ;                TWO (U+001E)
      LF = %x0A; "line feed" (LF), see RFC 20
      JSON-text = <given by RFC 7159, using UTF-8 encoding>
        

Figure 2: JSON Text Sequence ABNF

图2:JSON文本序列ABNF

In prose: any number of JSON texts, each encoded in UTF-8 [RFC3629], each preceded by one ASCII RS character, and each followed by a line feed (LF). Since RS is an ASCII control character, it may only appear in JSON strings in escaped form (see [RFC7159]), and since RS may not appear in JSON texts in any other form, RS unambiguously delimits the start of any element in the sequence. RS is sufficient to unambiguously delimit all top-level JSON value types other than numbers. Following each JSON text in the sequence with an LF allows detection of truncated JSON texts consisting of a number at the top-level; see Section 2.4.

散文中:任意数量的JSON文本,每个文本以UTF-8[RFC3629]编码,前面有一个ASCII RS字符,后面有一个换行符(LF)。由于RS是ASCII控制字符,它可能仅以转义形式出现在JSON字符串中(请参见[RFC7159]),并且由于RS可能不会以任何其他形式出现在JSON文本中,因此RS明确地限定了序列中任何元素的开头。RS足以明确界定除数字以外的所有顶级JSON值类型。在序列中的每个JSON文本后面加上LF,可以检测到由顶级数字组成的截断JSON文本;见第2.4节。

JSON text sequence encoders are expected to ensure that the sequence elements are properly formed. When the JSON text sequence encoder does the JSON text encoding, the sequence elements will naturally be properly formed. When the JSON text sequence encoder accepts already-encoded JSON texts, the JSON text sequence encoder ought to parse them before adding them to a sequence.

JSON文本序列编码器应确保序列元素的格式正确。当JSON文本序列编码器进行JSON文本编码时,序列元素自然会正确形成。当JSON文本序列编码器接受已经编码的JSON文本时,JSON文本序列编码器应该在将其添加到序列之前对其进行解析。

Note that on some systems it"s possible to input RS by typing "ctrl-^"; on some system or applications, the correct sequence may be "ctrl-v ctrl-^". This is helpful when constructing a sequence manually with a text editor.

请注意,在某些系统上,可以通过键入“ctrl-^”来输入RS;在某些系统或应用程序上,正确的顺序可能是“ctrl-v ctrl-^”。这在使用文本编辑器手动构建序列时非常有用。

2.3. Incomplete/Invalid JSON Texts Need Not Be Fatal
2.3. 不完整/无效的JSON文本不一定是致命的

Per Section 2.1, JSON text sequence parsers should not abort when an octet string contains a malformed JSON text. Instead, the JSON text sequence parser should skip to the next RS. Such a situation may arise in contexts where, for example, data that is appended to log files to log files is truncated by the filesystem (e.g., due to a crash or administrative process termination).

根据第2.1节,当八位字节字符串包含格式错误的JSON文本时,JSON文本序列解析器不应中止。相反,JSON文本序列解析器应该跳到下一个RS。例如,在附加到日志文件到日志文件的数据被文件系统截断的情况下(例如,由于崩溃或管理进程终止),可能会出现这种情况。

Incremental JSON text parsers may be used, though of course failure to parse a given text may result after first producing some incremental parse results.

可以使用增量JSON文本解析器,当然,在首先生成一些增量解析结果之后,可能会导致解析给定文本失败。

Sequence parsers should have an option to warn about truncated JSON texts.

序列解析器应该有一个选项来警告被截断的JSON文本。

2.4. Top-Level Values: numbers, true, false, and null
2.4. 顶级值:数字、true、false和null

While objects, arrays, and strings are self-delimited in JSON texts, numbers and the values 'true', 'false', and 'null' are not. Only whitespace can delimit the latter four kinds of values.

虽然对象、数组和字符串在JSON文本中是自分隔的,但数字和值“true”、“false”和“null”则不是。只有空格可以分隔后四种值。

JSON text sequences use 0x0A as a "canary" octet to detect truncation.

JSON文本序列使用0x0A作为“金丝雀”八位元来检测截断。

Parsers MUST check that any JSON texts that are a top-level number, or that might be 'true', 'false', or 'null', include JSON whitespace (at least one byte matching the "ws" ABNF rule from [RFC7159]) after that value; otherwise, the JSON-text may have been truncated. Note that the LF following each JSON text matches the "ws" ABNF rule.

解析器必须检查顶级数字或可能为“真”、“假”或“空”的任何JSON文本是否在该值后包含JSON空格(至少一个字节与[RFC7159]中的“ws”ABNF规则匹配);否则,JSON文本可能已被截断。请注意,每个JSON文本后面的LF与“ws”ABNF规则匹配。

Parsers MUST drop JSON-text sequence elements consisting of non-self-delimited top-level values that may have been truncated (that are not delimited by whitespace). Parsers can report such texts as warnings (including, optionally, the parsed text and/or the original octet string).

解析器必须删除JSON文本序列元素,这些元素由可能已被截断(不由空格分隔)的非自分隔顶级值组成。解析器可以将此类文本报告为警告(可选地包括解析文本和/或原始八位字节字符串)。

For example, '<RS>123<RS>' might have been intended to carry the top-level number 1234, but it got truncated. Similarly, '<RS>true<RS>' might have been intended to carry the invalid text 'trueish'. '<RS>truefalse<RS>' is not two top-level values, 'true', and 'false'; it is simply not a valid JSON text.

例如,“<RS>123<RS>”可能是用来携带顶级数字1234的,但它被截断了。类似地,<RS>true<RS>“可能是用来携带无效文本“trueish”<RS>truefalse<RS>'不是两个顶级值,“true”和“false”;它根本不是有效的JSON文本。

Implementations may produce a value when parsing '<RS>"foo"<RS>' because their JSON text parser might be able to consume bytes incrementally; since the JSON text in this case is a self-delimiting top-level value, the parser can produce the result without consuming an additional byte. Such implementations ought to skip to the next RS byte, possibly reporting any intervening non-whitespace bytes.

在解析“<RS>”foo“<RS>”时,实现可能会产生一个值,因为它们的JSON文本解析器可能能够增量地消耗字节;由于本例中的JSON文本是一个自定界的顶级值,因此解析器可以生成结果,而无需消耗额外的字节。这样的实现应该跳到下一个RS字节,可能会报告任何中间的非空白字节。

Detection of truncation of non-self-delimited sequence elements (numbers, true, false, and null) is only possible when the sequence encoder produces or receives complete JSON texts. Implementations where the sequence encoder is not also in charge of encoding the individual JSON texts should ensure that those JSON texts are complete.

只有当序列编码器生成或接收完整的JSON文本时,才能检测非自分隔序列元素(数字、true、false和null)的截断。序列编码器不负责编码单个JSON文本的实现应确保这些JSON文本是完整的。

3. Security Considerations
3. 安全考虑

All the security considerations of JSON [RFC7159] apply. This format provides no cryptographic integrity protection of any kind.

JSON[RFC7159]的所有安全注意事项都适用。此格式不提供任何类型的加密完整性保护。

As usual, parsers must operate on input that is assumed to be untrusted. This means that parsers must fail gracefully in the face of malicious inputs.

通常,解析器必须对假定不可信的输入进行操作。这意味着解析器在面对恶意输入时必须正常地失败。

Note that incremental JSON text parsers can produce partial results and later indicate failure to parse the remainder of a text. A sequence parser that uses an incremental JSON text parser might treat a sequence like '<RS>"foo"<LF>456<LF><RS>' as a sequence of one element ("foo"), while a sequence parser that uses a non-incremental JSON text parser might treat the same sequence as being empty. This effect, and texts that fail to parse and are ignored, can be used to smuggle data past sequence parsers that don't warn about JSON text failures.

请注意,增量JSON文本解析器可能会产生部分结果,然后表示无法解析文本的其余部分。使用增量JSON文本解析器的序列解析器可能将类似“<RS>”foo”<LF>456<LF><RS>”的序列视为一个元素的序列(“foo”),而使用非增量JSON文本解析器的序列解析器可能将同一序列视为空。这种效果,以及无法解析且被忽略的文本,可用于通过序列解析器走私数据,而序列解析器不会警告JSON文本失败。

Repeated parsing and re-encoding of a JSON text sequence can result in the addition (or stripping) of trailing LF bytes from (to) individual sequence element JSON texts. This can break signature validation. JSON has no canonical form for JSON texts, therefore neither does the JSON text sequence format.

重复解析和重新编码JSON文本序列可能会导致从单个序列元素JSON文本中添加(或剥离)后续LF字节。这可能会破坏签名验证。JSON没有JSON文本的标准格式,因此JSON文本序列格式也没有。

4. IANA Considerations
4. IANA考虑

The MIME media type for JSON text sequences is application/json-seq.

JSON文本序列的MIME媒体类型为application/JSON seq。

Type name: application

类型名称:应用程序

Subtype name: json-seq

子类型名称:json seq

Required parameters: N/A

所需参数:不适用

Optional parameters: N/A

可选参数:不适用

Encoding considerations: binary

编码注意事项:二进制

Security considerations: See RFC 7464, Section 3.

安全注意事项:见RFC 7464,第3节。

Interoperability considerations: Described herein.

互操作性注意事项:本文描述。

Published specification: RFC 7464.

已发布规范:RFC 7464。

Applications that use this media type:

使用此媒体类型的应用程序:

      <https://stedolan.github.io/jq>
      <https://github.com/mapbox/cligj>
      <https://github.com/hildjj/json-text-sequence>
        
      <https://stedolan.github.io/jq>
      <https://github.com/mapbox/cligj>
      <https://github.com/hildjj/json-text-sequence>
        

Fragment identifier considerations: N/A

片段标识符注意事项:不适用

Additional information:

其他信息:

o Deprecated alias names for this type: N/A

o 此类型的已弃用别名:不适用

o Magic number(s): N/A

o 幻数:不适用

o File extension(s): N/A

o 文件扩展名:不适用

o Macintosh file type code(s): N/A

o Macintosh文件类型代码:不适用

Person & email address to contact for further information:

联系人和电子邮件地址,以获取更多信息:

json@ietf.org

json@ietf.org

Intended usage: COMMON

预期用途:普通

Author: Nicolas Williams (nico@cryptonector.com)

作者:尼古拉斯·威廉姆斯(nico@cryptonector.com)

Change controller: IETF

更改控制器:IETF

5. Normative References
5. 规范性引用文件

[RFC20] Cerf, V., "ASCII format for network interchange", STD 80, RFC 20, October 1969, <http://www.rfc-editor.org/info/rfc20>.

[RFC20]Cerf,V.,“网络交换的ASCII格式”,STD 80,RFC 20,1969年10月<http://www.rfc-editor.org/info/rfc20>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.

[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月<http://www.rfc-editor.org/info/rfc2119>.

[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003, <http://www.rfc-editor.org/info/rfc3629>.

[RFC3629]Yergeau,F.,“UTF-8,ISO 10646的转换格式”,STD 63,RFC 3629,2003年11月<http://www.rfc-editor.org/info/rfc3629>.

[RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008, <http://www.rfc-editor.org/info/rfc5234>.

[RFC5234]Crocker,D.,Ed.和P.Overell,“语法规范的扩充BNF:ABNF”,STD 68,RFC 5234,2008年1月<http://www.rfc-editor.org/info/rfc5234>.

[RFC7159] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data Interchange Format", RFC 7159, March 2014, <http://www.rfc-editor.org/info/rfc7159>.

[RFC7159]Bray,T.,Ed.“JavaScript对象表示法(JSON)数据交换格式”,RFC 7159,2014年3月<http://www.rfc-editor.org/info/rfc7159>.

Acknowledgements

致谢

Phillip Hallam-Baker proposed the use of JSON text sequences for logfiles and pointed out the need for resynchronization. Stephen Dolan created <https://github.com/stedolan/jq>, which uses something like JSON text sequences (with LF as the separator between texts on output, and requiring only such whitespace as needed to disambiguate on input). Carsten Bormann suggested the use of ASCII RS, and Joe Hildebrand suggested the use of LF in addition to RS for disambiguating top-level number values. Paul Hoffman shepherded the document. Many others contributed reviews and comments on the JSON Working Group mailing list.

Phillip Hallam Baker建议对日志文件使用JSON文本序列,并指出需要重新同步。斯蒂芬·多兰创造了<https://github.com/stedolan/jq>,它使用类似于JSON文本序列的内容(输出时文本之间的分隔符是LF,输入时只需要这样的空格来消除歧义)。Carsten Bormann建议使用ASCII RS,Joe Hildebrand建议在RS之外使用LF来消除顶级数值的歧义。保罗·霍夫曼负责管理这份文件。许多其他人对JSON工作组邮件列表发表了评论。

Author's Address

作者地址

Nicolas Williams Cryptonector, LLC

Nicolas Williams Cryptonector有限责任公司

   EMail: nico@cryptonector.com
        
   EMail: nico@cryptonector.com