Network Working Group                                         F. Yergeau
Request for Comments: 3629                             Alis Technologies
STD: 63                                                    November 2003
Obsoletes: 2279
Category: Standards Track
        

UTF-8, a transformation format of ISO 10646

Status of this Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

ISO/IEC 10646-1 defines a large character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems. The originally proposed encodings of the UCS, however, were not compatible with many current applications and protocols, and this has led to the development of UTF-8, the object of this memo. UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values. This memo obsoletes and replaces RFC 2279.

Table of Contents

   1.  Introduction
   2.  Notational conventions
   3.  UTF-8 definition
   4.  Syntax of UTF-8 Byte Sequences
   5.  Versions of the standards
   6.  Byte order mark (BOM)
   7.  Examples
   8.  MIME registration
   9.  IANA Considerations
   10. Security Considerations
   11. Acknowledgements
   12. Changes from RFC 2279
   13. Normative References
   14. Informative References
   15. URI's
   16. Intellectual Property Statement
   17. Author's Address
   18. Full Copyright Statement

1. Introduction

ISO/IEC 10646 [ISO.10646] defines a large character set called the Universal Character Set (UCS), which encompasses most of the world's writing systems. The same set of characters is defined by the Unicode standard [UNICODE], which further defines additional character properties and other application details of great interest to implementers. Up to the present time, changes in Unicode and amendments and additions to ISO/IEC 10646 have tracked each other, so that the character repertoires and code point assignments have remained in sync. The relevant standardization committees have committed to maintain this very useful synchronism.

ISO/IEC 10646 and Unicode define several encoding forms of their common repertoire: UTF-8, UCS-2, UTF-16, UCS-4 and UTF-32. In an encoding form, each character is represented as one or more encoding units. All standard UCS encoding forms except UTF-8 have an encoding unit larger than one octet, making them hard to use in many current applications and protocols that assume 8 or even 7 bit characters.

UTF-8, the object of this memo, has a one-octet encoding unit. It uses all bits of an octet, but has the quality of preserving the full US-ASCII [US-ASCII] range: US-ASCII characters are encoded in one octet having the normal US-ASCII value, and any octet with such a value can only stand for a US-ASCII character, and nothing else.

UTF-8 encodes UCS characters as a varying number of octets, where the number of octets, and the value of each, depend on the integer value assigned to the character in ISO/IEC 10646 (the character number, a.k.a. code position, code point or Unicode scalar value). This encoding form has the following characteristics (all values are in hexadecimal):

o Character numbers from U+0000 to U+007F (US-ASCII repertoire) correspond to octets 00 to 7F (7 bit US-ASCII values). A direct consequence is that a plain ASCII string is also a valid UTF-8 string.

o US-ASCII octet values do not appear otherwise in a UTF-8 encoded character stream. This provides compatibility with file systems or other software (e.g., the printf() function in C libraries) that parse based on US-ASCII values but are transparent to other values.

o Round-trip conversion is easy between UTF-8 and other encoding forms.

o The first octet of a multi-octet sequence indicates the number of octets in the sequence.

o The octet values C0, C1, F5 to FF never appear.

o Character boundaries are easily found from anywhere in an octet stream.

o The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers. Of course this is of limited interest since a sort order based on character numbers is almost never culturally valid.

o The Boyer-Moore fast search algorithm can be used with UTF-8 data.

o UTF-8 strings can be fairly reliably recognized as such by a simple algorithm, i.e., the probability that a string of characters in any other encoding appears as valid UTF-8 is low, diminishing with increasing string length.
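
Two of the characteristics listed above can be illustrated with a short Python sketch. It is not part of the specification; the function names is_continuation and char_start are hypothetical, and the input is assumed to be valid UTF-8. The first part steps back from an arbitrary offset to the start of the enclosing character; the second part checks that sorting by octet values gives the same order as sorting by character numbers.

```python
def is_continuation(octet: int) -> bool:
    """True for octets of the form 10xxxxxx (80..BF)."""
    return octet & 0xC0 == 0x80


def char_start(data: bytes, index: int) -> int:
    """Offset of the first octet of the character that contains data[index].

    Assumes `data` is valid UTF-8, so at most three steps back are needed.
    """
    while index > 0 and is_continuation(data[index]):
        index -= 1
    return index


text = "A\u2262\u0391."            # U+0041 U+2262 U+0391 U+002E
data = text.encode("utf-8")        # 41 E2 89 A2 CE 91 2E
print([char_start(data, i) for i in range(len(data))])  # [0, 1, 1, 1, 4, 4, 6]

# Byte-value ordering versus ordering by character number.
words = ["\u00e9", "z", "\U000233B4", "\uFFFD", "a"]
by_octets = sorted(words, key=lambda s: s.encode("utf-8"))
by_character_number = sorted(words)   # Python compares str by code point
print(by_octets == by_character_number)  # True
```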

UTF-8 was devised in September 1992 by Ken Thompson, guided by design criteria specified by Rob Pike, with the objective of defining a UCS transformation format usable in the Plan9 operating system in a non-disruptive manner. Thompson's design was stewarded through standardization by the X/Open Joint Internationalization Group XOJIG (see [FSS_UTF]), bearing the names FSS-UTF (variant FSS/UTF), UTF-2 and finally UTF-8 along the way.

2. Notational conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

UCS characters are designated by the U+HHHH notation, where HHHH is a string of from 4 to 6 hexadecimal digits representing the character number in ISO/IEC 10646.
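
As a small illustration of the notation, the following Python sketch formats a character number as U+HHHH with at least 4 and at most 6 hexadecimal digits. The helper name u_notation is hypothetical, and limiting the input to 10FFFF is an assumption taken from the range this memo uses.

```python
def u_notation(character_number: int) -> str:
    """Format a character number in U+HHHH notation (4 to 6 hex digits)."""
    if not 0 <= character_number <= 0x10FFFF:
        raise ValueError("character number outside 0000..10FFFF")
    # 04X pads to four digits; larger values simply use five or six.
    return f"U+{character_number:04X}"


print(u_notation(0x41))      # U+0041
print(u_notation(0x233B4))   # U+233B4
```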

3. UTF-8 definition

UTF-8 is defined by the Unicode Standard [UNICODE]. Descriptions and formulae can also be found in Annex D of ISO/IEC 10646-1 [ISO.10646].

In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 accessible range) are encoded using sequences of 1 to 4 octets. The only octet of a "sequence" of one has the higher-order bit set to 0, the remaining 7 bits being used to encode the character number. In a sequence of n octets, n>1, the initial octet has the n higher-order bits set to 1, followed by a bit set to 0. The remaining bit(s) of that octet contain bits from the number of the character to be encoded. The following octet(s) all have the higher-order bit set to 1 and the following bit set to 0, leaving 6 bits in each to contain bits from the character to be encoded.

The table below summarizes the format of these different octet types. The letter x indicates bits available for encoding bits of the character number.

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
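
The table can be read in two directions, which the following Python sketch illustrates. The function names are hypothetical. utf8_length maps a character number to the number of octets in its row; length_from_first_octet maps the first octet of a sequence to the length it announces. The sketch does not treat the surrogate range D800..DFFF specially; it only mirrors the table.

```python
def utf8_length(character_number: int) -> int:
    """Number of octets the table assigns to a character number."""
    if not 0 <= character_number <= 0x10FFFF:
        raise ValueError("character number outside U+0000..U+10FFFF")
    if character_number <= 0x7F:
        return 1
    if character_number <= 0x7FF:
        return 2
    if character_number <= 0xFFFF:
        return 3
    return 4


def length_from_first_octet(octet: int) -> int:
    """Sequence length announced by the first octet of a sequence."""
    if 0x00 <= octet <= 0x7F:    # 0xxxxxxx
        return 1
    if 0xC2 <= octet <= 0xDF:    # 110xxxxx (C0 and C1 never appear)
        return 2
    if 0xE0 <= octet <= 0xEF:    # 1110xxxx
        return 3
    if 0xF0 <= octet <= 0xF4:    # 11110xxx (F5 and above never appear)
        return 4
    raise ValueError(f"octet {octet:02X} cannot start a UTF-8 sequence")


print(utf8_length(0x2262), length_from_first_octet(0xE2))  # 3 3
```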
        

Encoding a character to UTF-8 proceeds as follows:

1. Determine the number of octets required from the character number and the first column of the table above. It is important to note that the rows of the table are mutually exclusive, i.e., there is only one valid way to encode a given character.

2. Prepare the high-order bits of the octets as per the second column of the table.

3. Fill in the bits marked x from the bits of the character number, expressed in binary. Start by putting the lowest-order bit of the character number in the lowest-order position of the last octet of the sequence, then put the next higher-order bit of the character number in the next higher-order position of that octet, etc. When the x bits of the last octet are filled in, move on to the next to last octet, then to the preceding one, etc. until all x bits are filled in.
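
The three steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the name encode_char is hypothetical. It fills the x bits starting from the last octet, as step 3 describes, and it refuses character numbers in D800..DFFF, which the definition of UTF-8 prohibits.

```python
def encode_char(character_number: int) -> bytes:
    """Encode one character number as a UTF-8 octet sequence."""
    if not 0 <= character_number <= 0x10FFFF:
        raise ValueError("character number outside U+0000..U+10FFFF")
    if 0xD800 <= character_number <= 0xDFFF:
        raise ValueError("surrogate character numbers cannot be encoded")

    # Step 1: pick the row of the table (number of octets).
    if character_number <= 0x7F:
        return bytes([character_number])
    if character_number <= 0x7FF:
        count, lead = 2, 0xC0    # 110xxxxx
    elif character_number <= 0xFFFF:
        count, lead = 3, 0xE0    # 1110xxxx
    else:
        count, lead = 4, 0xF0    # 11110xxx

    # Steps 2 and 3: set the high-order bits and fill the x bits,
    # lowest-order bits into the last octet first.
    octets = [0] * count
    bits = character_number
    for i in range(count - 1, 0, -1):
        octets[i] = 0x80 | (bits & 0x3F)   # 10xxxxxx
        bits >>= 6
    octets[0] = lead | bits
    return bytes(octets)


print(encode_char(0x2262).hex())    # e289a2
print(encode_char(0x233B4).hex())   # f0a38eb4
```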

The definition of UTF-8 prohibits encoding character numbers between U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding form (as surrogate pairs) and do not directly represent characters. When encoding in UTF-8 from UTF-16 data, it is necessary to first decode the UTF-16 data to obtain character numbers, which are then encoded in UTF-8 as described above. This contrasts with CESU-8 [CESU-8], which is a UTF-8-like encoding that is not meant for use on the Internet. CESU-8 operates similarly to UTF-8 but encodes the UTF-16 code values (16-bit quantities) instead of the character number (code point). This leads to different results for character numbers above 0xFFFF; the CESU-8 encoding of those characters is NOT valid UTF-8.
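
The difference can be made concrete with the character U+233B4, whose UTF-16 form is the surrogate pair D84C DFB4. The Python sketch below first decodes the pair to a character number and encodes that in UTF-8. For contrast, it then produces a CESU-8-style result by encoding each 16-bit code value separately. Python has no CESU-8 codec; using the "surrogatepass" error handler to imitate one is an assumption made purely for illustration, and the name combine_surrogates is hypothetical.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Decode a UTF-16 surrogate pair to a character number."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = 0xD84C, 0xDFB4
character_number = combine_surrogates(high, low)     # 0x233B4

# UTF-8: encode the character number itself (four octets).
utf8 = chr(character_number).encode("utf-8")

# CESU-8 style: encode each 16-bit code value on its own (3 + 3 octets).
cesu8_style = b"".join(
    chr(unit).encode("utf-8", "surrogatepass") for unit in (high, low)
)

print(utf8.hex())          # f0a38eb4
print(cesu8_style.hex())   # eda18cedbeb4
```

A strict UTF-8 decoder rejects the second octet sequence, which is the behavior this memo requires.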

Decoding a UTF-8 character proceeds as follows:

1. Initialize a binary number with all bits set to 0. Up to 21 bits may be needed.

2. Determine which bits encode the character number from the number of octets in the sequence and the second column of the table above (the bits marked x).

3. Distribute the bits from the sequence to the binary number, first the lower-order bits from the last octet of the sequence and proceeding to the left until no x bits are left. The binary number is now equal to the character number.

Implementations of the decoding algorithm above MUST protect against decoding invalid sequences. For instance, a naive implementation may decode the overlong UTF-8 sequence C0 80 into the character U+0000, or the surrogate pair ED A1 8C ED BE B4 into U+233B4. Decoding invalid sequences may have security consequences or cause other problems. See Security Considerations (Section 10) below.
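
A decoder that follows the steps above and also protects against invalid sequences might look like the Python sketch below. It is one possible arrangement, not the only one; the name decode_char and the choice to raise ValueError are assumptions. The bits are accumulated from the first octet onward by shifting, which yields the same binary number as distributing them from the last octet backward.

```python
from typing import Tuple


def decode_char(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one character at data[pos]; return (character number, next pos)."""
    first = data[pos]
    if first <= 0x7F:
        return first, pos + 1
    if 0xC2 <= first <= 0xDF:
        count, value, minimum = 2, first & 0x1F, 0x80
    elif 0xE0 <= first <= 0xEF:
        count, value, minimum = 3, first & 0x0F, 0x800
    elif 0xF0 <= first <= 0xF4:
        count, value, minimum = 4, first & 0x07, 0x10000
    else:
        # 80..BF (stray continuation), C0, C1 and F5..FF never start a sequence.
        raise ValueError(f"invalid first octet {first:02X}")

    if pos + count > len(data):
        raise ValueError("truncated sequence")
    for octet in data[pos + 1 : pos + count]:
        if octet & 0xC0 != 0x80:
            raise ValueError(f"expected 10xxxxxx, found {octet:02X}")
        value = (value << 6) | (octet & 0x3F)

    if value < minimum:
        raise ValueError("overlong sequence")
    if 0xD800 <= value <= 0xDFFF:
        raise ValueError("surrogate character number")
    if value > 0x10FFFF:
        raise ValueError("character number above U+10FFFF")
    return value, pos + count


print(decode_char(bytes.fromhex("F0 A3 8E B4")))   # (144308, 4), i.e. U+233B4
```

With these checks, both C0 80 and ED A1 8C ED BE B4 are rejected instead of being decoded to U+0000 and U+233B4.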

4. Syntax of UTF-8 Byte Sequences

For the convenience of implementors using ABNF, a definition of UTF-8 in ABNF syntax is given here.

A UTF-8 string is a sequence of octets representing a sequence of UCS characters. An octet sequence is valid UTF-8 only if it matches the following syntax, which is derived from the rules for encoding UTF-8 and is expressed in the ABNF of [RFC2234].

   UTF8-octets = *( UTF8-char )
   UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
   UTF8-1      = %x00-7F
   UTF8-2      = %xC2-DF UTF8-tail
   UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
   UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
                 %xF4 %x80-8F 2( UTF8-tail )
   UTF8-tail   = %x80-BF

NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This grammar is believed to describe the same thing Unicode describes, but does not claim to be authoritative. Implementors are urged to rely on the authoritative source, rather than on this ABNF.
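
For implementors who prefer a regular expression, the grammar above can be transcribed almost rule by rule. The Python sketch below is such a transcription; the names UTF8_CHAR, UTF8_OCTETS and is_valid_utf8 are hypothetical, and the same caveat applies as for the ABNF itself: it is believed to match the authoritative definition but does not replace it.

```python
import re

# One alternative per branch of the ABNF rules.
UTF8_CHAR = (
    rb"[\x00-\x7F]"                       # UTF8-1
    rb"|[\xC2-\xDF][\x80-\xBF]"           # UTF8-2
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"       # UTF8-3
    rb"|[\xE1-\xEC][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|[\xEE-\xEF][\x80-\xBF]{2}"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"    # UTF8-4
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)
UTF8_OCTETS = re.compile(rb"(?:" + UTF8_CHAR + rb")*")   # *( UTF8-char )


def is_valid_utf8(data: bytes) -> bool:
    """True if the whole octet sequence matches UTF8-octets."""
    return UTF8_OCTETS.fullmatch(data) is not None


print(is_valid_utf8(bytes.fromhex("41 E2 89 A2 CE 91 2E")))   # True
print(is_valid_utf8(bytes.fromhex("C0 80")))                  # False
```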

5. Versions of the standards

ISO/IEC 10646 is updated from time to time by publication of amendments and additional parts; similarly, new versions of the Unicode standard are published over time. Each new version obsoletes and replaces the previous one, but implementations, and more significantly data, are not updated instantly.

In general, the changes amount to adding new characters, which does not pose particular problems with old data. In 1996, Amendment 5 to the 1993 edition of ISO/IEC 10646 and Unicode 2.0 moved and expanded the Korean Hangul block, thereby making any previous data containing Hangul characters invalid under the new version. Unicode 2.0 has the same difference from Unicode 1.1. The justification for allowing such an incompatible change was that there were no major implementations and no significant amounts of data containing Hangul. The incident has been dubbed the "Korean mess", and the relevant committees have pledged to never, ever again make such an incompatible change (see Unicode Consortium Policies [1]).

New versions, and in particular any incompatible changes, have consequences regarding MIME charset labels, to be discussed in MIME registration (Section 8).

6. Byte order mark (BOM)

The UCS character U+FEFF "ZERO WIDTH NO-BREAK SPACE" is also known informally as "BYTE ORDER MARK" (abbreviated "BOM"). This character can be used as a genuine "ZERO WIDTH NO-BREAK SPACE" within text, but the BOM name hints at a second possible usage of the character: to prepend a U+FEFF character to a stream of UCS characters as a "signature". A receiver of such a serialized stream may then use the initial character as a hint that the stream consists of UCS characters and also to recognize which UCS encoding is involved and, with encodings having a multi-octet encoding unit, as a way to recognize the serialization order of the octets. UTF-8 having a single-octet encoding unit, this last function is useless and the BOM will always appear as the octet sequence EF BB BF.

It is important to understand that the character U+FEFF appearing at any position other than the beginning of a stream MUST be interpreted with the semantics for the zero-width non-breaking space, and MUST NOT be interpreted as a signature. When interpreted as a signature, the Unicode standard suggests that an initial U+FEFF character may be stripped before processing the text. Such stripping is necessary in some cases (e.g., when concatenating two strings, because otherwise the resulting string may contain an unintended "ZERO WIDTH NO-BREAK SPACE" at the connection point), but might affect an external process at a different layer (such as a digital signature or a count of the characters) that is relying on the presence of all characters in the stream. It is therefore RECOMMENDED to avoid stripping an initial U+FEFF interpreted as a signature without a good reason, to ignore it instead of stripping it when appropriate (such as for display) and to strip it only when really necessary.

U+FEFF in the first position of a stream MAY be interpreted as a zero-width non-breaking space, and is not always a signature. In an attempt at diminishing this uncertainty, Unicode 3.2 adds a new character, U+2060 "WORD JOINER", with exactly the same semantics and usage as U+FEFF except for the signature function, and strongly recommends its exclusive use for expressing word-joining semantics. Eventually, following this recommendation will make it all but certain that any initial U+FEFF is a signature, not an intended "ZERO WIDTH NO-BREAK SPACE".

In the meantime, the uncertainty unfortunately remains and may affect Internet protocols. Protocol specifications MAY restrict usage of U+FEFF as a signature in order to reduce or eliminate the potential ill effects of this uncertainty. In the interest of striking a balance between the advantages (reduction of uncertainty) and drawbacks (loss of the signature function) of such restrictions, it is useful to distinguish a few cases:

o A protocol SHOULD forbid use of U+FEFF as a signature for those textual protocol elements that the protocol mandates to be always UTF-8, the signature function being totally useless in those cases.

o A protocol SHOULD also forbid use of U+FEFF as a signature for those textual protocol elements for which the protocol provides character encoding identification mechanisms, when it is expected that implementations of the protocol will be in a position to always use the mechanisms properly. This will be the case when the protocol elements are maintained tightly under the control of the implementation from the time of their creation to the time of their (properly labeled) transmission.

o A protocol SHOULD NOT forbid use of U+FEFF as a signature for those textual protocol elements for which the protocol does not provide character encoding identification mechanisms, when a ban would be unenforceable, or when it is expected that implementations of the protocol will not be in a position to always use the mechanisms properly. The latter two cases are likely to occur with larger protocol elements such as MIME entities, especially when implementations of the protocol will obtain such entities from file systems, from protocols that do not have encoding identification mechanisms for payloads (such as FTP) or from other protocols that do not guarantee proper identification of character encoding (such as HTTP).

When a protocol forbids use of U+FEFF as a signature for a certain protocol element, then any initial U+FEFF in that protocol element MUST be interpreted as a "ZERO WIDTH NO-BREAK SPACE". When a protocol does NOT forbid use of U+FEFF as a signature for a certain protocol element, then implementations SHOULD be prepared to handle a signature in that element and react appropriately: using the signature to identify the character encoding as necessary and stripping or ignoring the signature as appropriate.
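
As an illustration of handling a signature when a protocol permits one, the Python sketch below reports whether an octet stream begins with EF BB BF and leaves the decision to strip or keep it to the caller. The function name split_signature is hypothetical. codecs.BOM_UTF8 is the standard library constant for the three octets, and the "utf-8-sig" codec is Python's built-in way of dropping one initial signature while decoding.

```python
import codecs


def split_signature(data: bytes):
    """Return (has_signature, payload) for an octet stream.

    Only an initial EF BB BF counts; U+FEFF anywhere else is ordinary text.
    """
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


stream = bytes.fromhex("EF BB BF F0 A3 8E B4")
has_signature, payload = split_signature(stream)
print(has_signature, payload.hex())            # True f0a38eb4

# Built-in alternatives: "utf-8" keeps the initial U+FEFF, "utf-8-sig" drops it.
print(len(stream.decode("utf-8")))             # 2
print(len(stream.decode("utf-8-sig")))         # 1
```

Returning the flag separately lets a caller ignore the signature for display without altering the stored octets, in line with the recommendation to strip only when really necessary.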

7. Examples

The character sequence U+0041 U+2262 U+0391 U+002E "A<NOT IDENTICAL TO><ALPHA>." is encoded in UTF-8 as follows:

       --+--------+-----+--
       41 E2 89 A2 CE 91 2E
       --+--------+-----+--
        

The character sequence U+D55C U+AD6D U+C5B4 (Korean "hangugeo", meaning "the Korean language") is encoded in UTF-8 as follows:

       --------+--------+--------
       ED 95 9C EA B5 AD EC 96 B4
       --------+--------+--------
        

The character sequence U+65E5 U+672C U+8A9E (Japanese "nihongo", meaning "the Japanese language") is encoded in UTF-8 as follows:

       --------+--------+--------
       E6 97 A5 E6 9C AC E8 AA 9E
       --------+--------+--------
        

The character U+233B4 (a Chinese character meaning 'stump of tree'), prepended with a UTF-8 BOM, is encoded in UTF-8 as follows:

       --------+-----------
       EF BB BF F0 A3 8E B4
       --------+-----------
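
The four examples in this section can be checked against any conforming encoder. The Python sketch below uses the built-in "utf-8" codec and compares its output with the octets listed above; the variable names are illustrative only.

```python
# Character sequences from this section, paired with the octets given for them.
examples = {
    "A\u2262\u0391.": "41 E2 89 A2 CE 91 2E",
    "\uD55C\uAD6D\uC5B4": "ED 95 9C EA B5 AD EC 96 B4",
    "\u65E5\u672C\u8A9E": "E6 97 A5 E6 9C AC E8 AA 9E",
    "\uFEFF\U000233B4": "EF BB BF F0 A3 8E B4",
}

for text, expected in examples.items():
    encoded = text.encode("utf-8")
    # bytes.fromhex skips the spaces between octets.
    assert encoded == bytes.fromhex(expected)
    print(" ".join(f"{octet:02X}" for octet in encoded))
```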
        
8. MIME registration

This memo serves as the basis for registration of the MIME charset parameter for UTF-8, according to [RFC2978]. The charset parameter value is "UTF-8". This string labels media types containing text consisting of characters from the repertoire of ISO/IEC 10646 including all amendments at least up to amendment 5 of the 1993 edition (Korean block), encoded to a sequence of octets using the encoding scheme outlined above. UTF-8 is suitable for use in MIME content types under the "text" top-level type.

It is noteworthy that the label "UTF-8" does not contain a version identification, referring generically to ISO/IEC 10646. This is intentional, the rationale being as follows:

A MIME charset label is designed to give just the information needed to interpret a sequence of bytes received on the wire into a sequence of characters, nothing more (see [RFC2045], section 2.2). As long as a character set standard does not change incompatibly, version numbers serve no purpose, because one gains nothing by learning from the tag that newly assigned characters may be received that one doesn't know about. The tag itself doesn't teach anything about the new characters, which are going to be received anyway.

Hence, as long as the standards evolve compatibly, the apparent advantage of having labels that identify the versions is only that, apparent. But there is a disadvantage to such version-dependent labels: when an older application receives data accompanied by a newer, unknown label, it may fail to recognize the label and be completely unable to deal with the data, whereas a generic, known label would have triggered mostly correct processing of the data, which may well not contain any new characters.

Now the "Korean mess" (ISO/IEC 10646 amendment 5) is an incompatible change, in principle contradicting the appropriateness of a version independent MIME charset label as described above. But the compatibility problem can only appear with data containing Korean Hangul characters encoded according to Unicode 1.1 (or equivalently ISO/IEC 10646 before amendment 5), and there is arguably no such data to worry about, this being the very reason the incompatible change was deemed acceptable.

In practice, then, a version-independent label is warranted, provided the label is understood to refer to all versions after Amendment 5, and provided no incompatible change actually occurs. Should incompatible changes occur in a later version of ISO/IEC 10646, the MIME charset label defined here will stay aligned with the previous version until and unless the IETF specifically decides otherwise.

9. IANA Considerations

The entry for UTF-8 in the IANA charset registry has been updated to point to this memo.

10. Security Considerations

Implementers of UTF-8 need to consider the security aspects of how they handle illegal UTF-8 sequences. It is conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by sending it an octet sequence that is not permitted by the UTF-8 syntax.

A particularly subtle form of this attack can be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain illegal octet sequences as characters. For example, a parser might prohibit the NUL character when encoded as the single-octet sequence 00, but erroneously allow the illegal two-octet sequence C0 80 and interpret it as a NUL character. Another example might be a parser which prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the illegal octet sequence 2F C0 AE 2E 2F. This last exploit has actually been used in a widespread virus attacking Web servers in 2001; thus, the security threat is very real.
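
The second example can be reproduced in a few lines of Python. The sketch below is for illustration only: lenient_decode is a deliberately incorrect decoder written to show the flaw, and checked_path is a hypothetical name for one way to avoid it, namely decoding strictly first and applying the check to the decoded characters.

```python
def lenient_decode(data: bytes) -> str:
    """Deliberately WRONG: accepts overlong two-octet forms such as C0 AE."""
    chars = []
    i = 0
    while i < len(data):
        octet = data[i]
        if octet & 0xE0 == 0xC0 and i + 1 < len(data):
            # Missing check that the value really needs two octets: the bug.
            chars.append(chr(((octet & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            chars.append(chr(octet))
            i += 1
    return "".join(chars)


def checked_path(data: bytes) -> str:
    """Decode strictly, then check the decoded characters."""
    text = data.decode("utf-8")      # raises UnicodeDecodeError on C0 AE
    if "/../" in text:
        raise ValueError("path traversal sequence")
    return text


attack = bytes.fromhex("2F C0 AE 2E 2F")
print(b"/../" in attack)         # False: the octet-level check sees nothing
print(lenient_decode(attack))    # /../ : the lenient decoder restores it
```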

Another security issue occurs when encoding to UTF-8: the ISO/IEC 10646 description of UTF-8 allows encoding character numbers up to U+7FFFFFFF, yielding sequences of up to 6 bytes. There is therefore a risk of buffer overflow if the range of character numbers is not explicitly limited to U+10FFFF or if buffer sizing doesn't take into account the possibility of 5- and 6-byte sequences.
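
The arithmetic behind this risk is sketched below in Python. Python itself is memory safe, so the code only illustrates the sizing logic that an implementation in a language with fixed buffers would need. The names legacy_length and output_size are hypothetical; the five- and six-octet bounds are those of the older description that RFC 2279 followed.

```python
MAX_CHARACTER_NUMBER = 0x10FFFF


def legacy_length(value: int) -> int:
    """Octets needed under the older description (values up to 7FFFFFFF)."""
    if value < 0:
        raise ValueError("negative value")
    upper_bounds = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)
    for octets, bound in enumerate(upper_bounds, start=1):
        if value <= bound:
            return octets
    raise ValueError("value exceeds 7FFFFFFF")


def output_size(character_numbers) -> int:
    """Worst-case UTF-8 size in octets, after enforcing the U+10FFFF limit."""
    for number in character_numbers:
        if not 0 <= number <= MAX_CHARACTER_NUMBER:
            raise ValueError(f"character number {number:X} out of range")
    return 4 * len(character_numbers)


print(legacy_length(0x10FFFF), legacy_length(0x7FFFFFFF))   # 4 6
print(output_size([0x41, 0x2262, 0x233B4]))                 # 12
```

A buffer sized at four octets per character is sufficient only because output_size rejects values above U+10FFFF before any encoding takes place.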

Security may also be impacted by a characteristic of several character encodings, including UTF-8: the "same thing" (as far as a user can tell) can be represented by several distinct character sequences. For instance, an e with acute accent can be represented by the precomposed U+00E9 E ACUTE character or by the canonically equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though UTF-8 provides a single byte sequence for each character sequence, the existence of multiple character sequences for "the same thing" may have security consequences whenever string matching, indexing, searching, sorting, regular expression matching and selection are involved. An example would be string matching of an identifier appearing in a credential and in access control list entries. This issue is amenable to solutions based on Unicode Normalization Forms, see [UAX15].

包括搜索、排序、正则表达式匹配和选择。例如,凭证和访问控制列表条目中出现的标识符的字符串匹配。此问题适用于基于Unicode规范化表单的解决方案,请参见[UAX15]。
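
The e-with-acute example above can be illustrated with a short sketch that compares identifiers after normalization. The helper name `same_identifier` is hypothetical, and the choice of Normalization Form C is an assumption made for this example; [UAX15] defines several forms, and the appropriate one depends on the application.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers after normalizing both to NFC (assumed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # U+00E9, precomposed e with acute accent
decomposed = "e\u0301"   # U+0065 U+0301, canonically equivalent sequence

if __name__ == "__main__":
    # The two spellings differ as character sequences and as UTF-8 octets...
    print(precomposed == decomposed)
    print(precomposed.encode("utf-8").hex(), decomposed.encode("utf-8").hex())
    # ...but compare equal once both are normalized.
    print(same_identifier(precomposed, decomposed))
```

A comparison of the raw strings, or of their UTF-8 octets, would treat the two spellings as different identifiers, which is the situation the access control example describes.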

11. Acknowledgements

The following have participated in the drafting and discussion of this memo: James E. Agenbroad, Harald Alvestrand, Andries Brouwer, Mark Davis, Martin J. Duerst, Patrick Faltstrom, Ned Freed, David Goldsmith, Tony Hansen, Edwin F. Hart, Paul Hoffman, David Hopwood, Simon Josefsson, Kent Karlsson, Dan Kohn, Markus Kuhn, Michael Kung, Alain LaBonte, Ira McDonald, Alexey Melnikov, MURATA Makoto, John Gardiner Myers, Chris Newman, Dan Oscarsson, Roozbeh Pournader, Murray Sargent, Markus Scherer, Keld Simonsen, Arnold Winkler, Kenneth Whistler and Misha Wolf.

12. Changes from RFC 2279

o Restricted the range of characters to 0000-10FFFF (the UTF-16 accessible range).

o Made Unicode the source of the normative definition of UTF-8, keeping ISO/IEC 10646 as the reference for characters.

o Straightened out terminology. UTF-8 now described in terms of an encoding form of the character number. UCS-2 and UCS-4 almost disappeared.

o Turned the note warning against decoding of invalid sequences into a normative MUST NOT.

o Added a new section about the UTF-8 BOM, with advice for protocols.

o Removed suggested UNICODE-1-1-UTF-8 MIME charset registration.

o Added an ABNF syntax for valid UTF-8 octet sequences.

o Expanded Security Considerations section, in particular impact of Unicode normalization.

13. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[ISO.10646] International Organization for Standardization, "Information Technology - Universal Multiple-octet coded Character Set (UCS)", ISO/IEC Standard 10646, comprised of ISO/IEC 10646-1:2000, "Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane", ISO/IEC 10646-2:2001, "Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 2: Supplementary Planes" and ISO/IEC 10646-1:2000/Amd 1:2002, "Mathematical symbols and other characters".

[UNICODE] The Unicode Consortium, "The Unicode Standard -- Version 4.0", defined by The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), April 2003, <http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode_4_0_0>.

14. Informative References

[CESU-8] Phipps, T., "Unicode Technical Report #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8)", UTR 26, April 2002, <http://www.unicode.org/unicode/reports/tr26/>.

[FSS_UTF] X/Open Company Ltd., "X/Open Preliminary Specification -- File System Safe UCS Transformation Format (FSS-UTF)", May 1993, <http://wwwold.dkuug.dk/jtc1/sc22/wg20/docs/N193-FSS-UTF.pdf>.

[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.

[RFC2978] Freed, N. and J. Postel, "IANA Charset Registration Procedures", BCP 19, RFC 2978, October 2000.

[UAX15] Davis, M. and M. Duerst, "Unicode Standard Annex #15: Unicode Normalization Forms", An integral part of The Unicode Standard, Version 4.0.0, April 2003, <http://www.unicode.org/unicode/reports/tr15>.

[US-ASCII] American National Standards Institute, "Coded Character Set - 7-bit American Standard Code for Information Interchange", ANSI X3.4, 1986.

15. URIs
   [1]  <http://www.unicode.org/unicode/standard/policies.html>
        
16. Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

17. Author's Address

   Francois Yergeau
   Alis Technologies
   100, boul. Alexis-Nihon, bureau 600
   Montreal, QC  H4M 2P2
   Canada

   Phone: +1 514 747 2547
   Fax:   +1 514 747 2561
   EMail: fyergeau@alis.com
        
18. Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.
