Internet Engineering Task Force (IETF)    J. Klensin
Request for Comments: 5890    August 2010
Obsoletes: 3490
Category: Standards Track
ISSN: 2070-1721
Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
Abstract
This document is one of a collection that, together, describe the protocol and usage context for a revision of Internationalized Domain Names for Applications (IDNA), superseding the earlier version. It describes the document collection and provides definitions and other material that are common to the set.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5890.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.
Table of Contents
1. Introduction
   1.1. IDNA2008
      1.1.1. Audiences
      1.1.2. Normative Language
   1.2. Road Map of IDNA2008 Documents
2. Definitions and Terminology
   2.1. Characters and Character Sets
   2.2. DNS-Related Terminology
   2.3. Terminology Specific to IDNA
      2.3.1. LDH Label
      2.3.2. Terms for IDN Label Codings
         2.3.2.1. IDNA-valid strings, A-label, and U-label
         2.3.2.2. NR-LDH Label
         2.3.2.3. Internationalized Domain Name and Internationalized Label
         2.3.2.4. Label Equivalence
         2.3.2.5. ACE Prefix
         2.3.2.6. Domain Name Slot
      2.3.3. Order of Characters in Labels
      2.3.4. Punycode is an Algorithm, Not a Name or Adjective
3. IANA Considerations
4. Security Considerations
   4.1. General Issues
   4.2. U-label Lengths
   4.3. Local Character Set Issues
   4.4. Visually Similar Characters
   4.5. IDNA Lookup, Registration, and the Base DNS Specifications
   4.6. Legacy IDN Label Strings
   4.7. Security Differences from IDNA2003
   4.8. Summary
5. Acknowledgments
6. References
   6.1. Normative References
   6.2. Informative References
This document is one of a collection that, together, describe the protocol and usage context for a revision of Internationalized Domain Names for Applications (IDNA) that was largely completed in 2008, known within the series and elsewhere as "IDNA2008". The series replaces an earlier version of IDNA [RFC3490] [RFC3491]. For convenience, that version of IDNA is referred to in these documents as "IDNA2003". The newer version continues to use the Punycode algorithm [RFC3492] and ACE (ASCII-compatible encoding) prefix from that earlier version. The document collection is described in Section 1.2. As indicated there, this document provides definitions and other material that are common to the set.
While many IETF specifications are directed exclusively to protocol implementers, the character of IDNA requires that it be understood and properly used by those whose responsibilities include making decisions about:
o what names are permitted in DNS zone files,
o policies related to names and naming, and
o the handling of domain name strings in files and systems, even with no immediate intention of looking them up.
This document and those documents concerned with the protocol definition, rules for handling strings that include characters written right to left, and the actual list of characters and categories will be of primary interest to protocol implementers. This document and the one containing explanatory material will be of primary interest to others, although they may have to fill in some details by reference to other documents in the set.
This document and the associated ones are written from the perspective of an IDNA-aware user, application, or implementation. While they may reiterate fundamental DNS rules and requirements for the convenience of the reader, they make no attempt to be comprehensive about DNS principles and should not be considered as a substitute for a thorough understanding of the DNS protocols and specifications.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
IDNA2008 consists of the following documents:
o This document, containing definitions and other material that are needed for understanding other documents in the set. It is referred to informally in other documents in the set as "Defs" or "Definitions".
o A document, RFC 5894 [RFC5894], that provides an overview of the protocol and associated tables together with explanatory material and some rationale for the decisions that led to IDNA2008. That document also contains advice for registry operations and those who use Internationalized Domain Names (IDNs). It is referred to informally in other documents in the set as "Rationale". It is not normative.
o A document, RFC 5891 [RFC5891], that describes the core IDNA2008 protocol and its operations. In combination with the Bidi document, described immediately below, it explicitly updates and replaces RFC 3490. It is referred to informally in other documents in the set as "Protocol".
o A document, RFC 5893 [RFC5893], that specifies special rules (Bidi) for labels that contain characters that are written from right to left.
o A specification, RFC 5892 [RFC5892], of the categories and rules that identify the code points allowed in a label written in native character form (defined more specifically as a "U-label" in Section 2.3.2.1 below), based on Unicode 5.2 [Unicode52] code point assignments and additional rules unique to IDNA2008. The Unicode-based rules are expected to be stable across Unicode updates and hence independent of Unicode versions. That specification obsoletes RFC 3491 and IDN use of the tables to which it refers. It is referred to informally in other documents in the set as "Tables".
o A document [IDNA2008-Mapping] that discusses the issue of mapping characters into other characters and that provides guidance for doing so when that is appropriate. That document, referred to informally as "Mapping", provides advice; it is not a required part of IDNA.
A code point is an integer value in the codespace of a coded character set. In Unicode, these are integers from 0 to 0x10FFFF.
Unicode [Unicode52] is a coded character set containing somewhat over 100,000 characters assigned to code points as of version 5.2. A single Unicode code point is denoted in these documents by "U+" followed by four to six hexadecimal digits, while a range of Unicode code points is denoted by two four to six digit hexadecimal numbers separated by "..", with no prefixes.
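As an informal illustration of the notation just described, the following Python sketch formats a single code point with the "U+" prefix and a range of code points with ".." and no prefix. The function names are hypothetical; they are not defined by IDNA2008.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode codespace


def format_code_point(cp: int) -> str:
    """Return the "U+" notation for one code point (four to six hex digits)."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return f"U+{cp:04X}"


def format_range(first: int, last: int) -> str:
    """Return the range notation: two hex numbers separated by "..", no prefix."""
    if not 0 <= first <= last <= MAX_CODE_POINT:
        raise ValueError("not a valid code point range")
    return f"{first:04X}..{last:04X}"


print(format_code_point(0x002D))     # the hyphen-minus
print(format_range(0x0041, 0x005A))  # ASCII uppercase letters
```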
ASCII means US-ASCII [ASCII], a coded character set containing 128 characters associated with code points in the range 0000..007F. Unicode is a superset of ASCII and may be thought of as a generalization of it; it includes all the ASCII characters and associates them with the equivalent code points.
"Letters" are, informally, generalizations from the ASCII and common-sense understanding of that term, i.e., characters that are used to write text and that are not digits, symbols, or punctuation. Formally, they are characters with a Unicode General Category value starting in "L" (see Section 4.5 of The Unicode Standard [Unicode52]).
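The formal definition of "letters" can be checked mechanically. The sketch below uses Python's standard unicodedata module, whose tables follow whatever Unicode version ships with the interpreter (not necessarily Unicode 5.2). The helper name is an assumption made for illustration.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """True if the character's Unicode General Category starts with "L"."""
    return unicodedata.category(ch).startswith("L")


for ch in ("a", "\u00e9", "5", "-"):
    print(repr(ch), unicodedata.category(ch), is_letter(ch))
```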
When discussing the DNS, this document generally assumes the terminology used in the DNS specifications [RFC1034] [RFC1035] as subsequently modified [RFC1123] [RFC2181]. The term "lookup" is used to describe the combination of operations performed by the IDNA2008 protocol and those actually performed by a DNS resolver. The process of placing an entry into the DNS is referred to as "registration". This is similar to common contemporary usage of that term in other contexts. Consequently, any DNS zone administration is described as a "registry", and the terms "registry" and "zone administrator" are used interchangeably, regardless of the actual administrative arrangements or level in the DNS tree. More details about that relationship are included in the Rationale document.
The term "LDH code point" is defined in this document to refer to the code points associated with ASCII letters (Unicode code points 0041..005A and 0061..007A), digits (0030..0039), and the hyphen-minus (U+002D). "LDH" is an abbreviation for "letters, digits, hyphen" but is used specifically in this document to refer to the set of naming rules described in Section 2.3.1 below.
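The set of LDH code points is small enough to express directly. The following sketch tests membership using the ranges given above; the function name is hypothetical.

```python
def is_ldh_code_point(cp: int) -> bool:
    """True for ASCII letters, digits, and the hyphen-minus."""
    return (
        0x0041 <= cp <= 0x005A     # uppercase letters
        or 0x0061 <= cp <= 0x007A  # lowercase letters
        or 0x0030 <= cp <= 0x0039  # digits
        or cp == 0x002D            # hyphen-minus
    )


print([c for c in "a-Z_9." if is_ldh_code_point(ord(c))])
```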
The base DNS specifications [RFC1034] [RFC1035] discuss "domain names" and "hostnames", but many people use the terms interchangeably, as do sections of these specifications. Lack of clarity about that terminology has contributed to confusion about intent in some cases. These documents generally use the term "domain name". When they refer to, e.g., hostname syntax restrictions, they explicitly cite the relevant defining documents. The remaining definitions in this subsection are essentially a review: if there is any perceived difference between those definitions and the definitions in the base DNS documents or those cited below, the definitions in the other documents take precedence.
A label is an individual component of a domain name. Labels are usually shown separated by dots; for example, the domain name "www.example.com" is composed of three labels: "www", "example", and "com". (The complete name convention using a trailing dot described in RFC 1123 [RFC1123], which can be explicit as in "www.example.com." or implicit as in "www.example.com", is not considered in this specification.) IDNA extends the set of usable characters in labels that are treated as text (as distinct from the binary string labels discussed in RFC 1035 and RFC 2181 [RFC2181] and bitstring ones [RFC2673]), but only in certain contexts. The different contexts for different sets of usable characters are outlined in the next section. For the rest of this document and in the related ones, the term "label" is shorthand for "text label", and "every label" means "every text label", including the expanded context.
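For readers who think in code, the decomposition of a domain name into labels might be sketched as follows. Because the trailing-dot convention is not considered in this specification, this hypothetical helper simply rejects empty labels instead of interpreting them; that choice is an assumption of the sketch.

```python
from typing import List


def split_labels(domain_name: str) -> List[str]:
    """Split a dotted domain name into its text labels."""
    labels = domain_name.split(".")
    if any(label == "" for label in labels):
        # Covers the trailing-dot form, which is out of scope here.
        raise ValueError("empty label")
    return labels


print(split_labels("www.example.com"))
```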
This section defines some terminology to reduce dependence on terms and definitions that have been problematic in the past. The relationships among these definitions are illustrated in Figure 1 and Figure 2. In the first of those figures, the parenthesized numbers refer to the notes below the figure.
This is the classical label form used, albeit with some additional restrictions, in hostnames [RFC0952]. Its syntax is identical to that described as the "preferred name syntax" in Section 3.5 of RFC 1034 [RFC1034] as modified by RFC 1123 [RFC1123]. Briefly, it is a string consisting of ASCII letters, digits, and the hyphen with the further restriction that the hyphen cannot appear at the beginning or end of the string. Like all DNS labels, its total length must not exceed 63 octets.
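The LDH label rules summarized above (ASCII letters, digits, and hyphen; no hyphen at the beginning or end; at most 63 octets) can be sketched as a single regular expression. The pattern and function name are illustrative assumptions, not text from the DNS specifications.

```python
import re

# One letter or digit, optionally followed by up to 61 LDH characters and
# a final letter or digit: between 1 and 63 ASCII characters in total.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if the string obeys the LDH label syntax described above."""
    return _LDH_LABEL.fullmatch(label) is not None


for candidate in ("example", "-abcd", "xyz-", "_tcp"):
    print(candidate, is_ldh_label(candidate))
```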
LDH labels include the specialized labels used by IDNA (described as "A-labels" below) and some additional restricted forms (also described below).
To facilitate clear description, two new subsets of LDH labels are created by the introduction of IDNA. These are called Reserved LDH labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels). Reserved LDH labels, known as "tagged domain names" in some other contexts, contain "--" in the third and fourth character positions but otherwise conform to LDH label rules. Only a subset of the R-LDH labels can be used in IDNA-aware applications. That subset consists of the class of labels that begin with the prefix "xn--" (case independent), but otherwise conform to the rules for LDH labels. That subset is called "XN-labels" in this set of documents. XN-labels are further divided into those whose remaining characters (after the "xn--") are valid output of the Punycode algorithm [RFC3492] and those that are not (see below). The XN-labels that are valid Punycode output are known as "A-labels" if they also meet the other criteria for IDNA-validity described below. Because LDH labels (and, indeed, any DNS label) must not be more than 63 octets in length, the portion of an XN-label derived from the Punycode algorithm is limited to no more than 59 ASCII characters. Non-Reserved LDH labels are the set of valid LDH labels that do not have "--" in the third and fourth positions.
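The subdivision of LDH labels described in this paragraph can be sketched as a small classifier. The category strings returned here are ad hoc names chosen for the sketch. It does not attempt to separate A-labels from Fake A-labels, which requires the further tests described below.

```python
import re

_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
ACE_PREFIX = "xn--"
# 63 octets for the whole label, minus the four-character prefix.
MAX_PUNYCODE_CHARS = 63 - len(ACE_PREFIX)


def classify_ascii_label(label: str) -> str:
    """Place an ASCII label into one of the classes described above."""
    if _LDH_LABEL.fullmatch(label) is None:
        return "non-LDH"
    if label[2:4] != "--":
        return "NR-LDH"
    if label[:4].lower() == ACE_PREFIX:  # the prefix is case independent
        return "XN-label"
    return "R-LDH (not XN)"


for candidate in ("example", "xn--bcher-kva", "ab--cd", "_tcp"):
    print(candidate, "->", classify_ascii_label(candidate))
```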
A consequence of the restrictions on valid characters in the native Unicode character form (see U-labels) turns out to be that mixed-case annotation, of the sort outlined in Appendix A of RFC 3492 [RFC3492], is never useful. Therefore, since a valid A-label is the result of Punycode encoding of a U-label, A-labels should be produced only in lowercase, despite matching other (mixed-case or uppercase) potential labels in the DNS.
Some strings that are prefixed with "xn--" to form labels may not be the output of the Punycode algorithm, may fail the other tests outlined below, or may violate other IDNA restrictions and thus are also not valid IDNA labels. They are called "Fake A-labels" for convenience.
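One of the ways an XN-label can turn out to be a Fake A-label is that the part after the prefix is not valid Punycode output. The sketch below checks only that condition, using the "punycode" codec from the Python standard library. It assumes the input has already been identified as an XN-label, and the function name is hypothetical. A True result is necessary but not sufficient for A-label status, because the other IDNA tests are not performed.

```python
ACE_PREFIX = "xn--"


def punycode_part_round_trips(label: str) -> bool:
    """True if the text after "xn--" decodes as Punycode and re-encodes
    to the same (lowercase) string."""
    if label[:4].lower() != ACE_PREFIX:
        return False
    tail = label[4:].lower()
    if not tail:
        return False
    try:
        decoded = tail.encode("ascii").decode("punycode")
        return decoded.encode("punycode").decode("ascii") == tail
    except UnicodeError:
        # The codec rejected the input, so this is not Punycode output.
        return False


print(punycode_part_round_trips("xn--bcher-kva"))
print(punycode_part_round_trips("xn--z"))
```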
Labels within the class of R-LDH labels that are not prefixed with "xn--" are also not valid IDNA labels. To allow for future use of mechanisms similar to IDNA, those labels MUST NOT be processed as ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be mixed with IDNA labels in the same zone.
These distinctions among possible LDH labels are only of significance for software that is IDNA-aware or for future extensions based on the same "prefix and encoding" model. For IDNA-aware systems, the valid label types are: A-labels, U-labels, and NR-LDH labels.
IDNA labels come in two flavors: an ACE-encoded form and a Unicode (native character) form. These are referred to as A-labels and U-labels, respectively, and are described in detail in the next section.
ASCII Label
   LDH Label (1) (4)
      IDN Reserved LDH Labels ("??--") or R-LDH Labels
         XN-labels
            A-labels "xn--" (2)
            Fake A-labels (3)
      NON-RESERVED LDH Labels (NR-LDH labels)
   NON-LDH label
      Underscore labels, e.g., _tcp
      Labels with leading or trailing hyphens: "-abcd" or "xyz-" or "-uvw-"
      Labels with other non-LDH ASCII chars, e.g., #$%_
(1) ASCII letters (uppercase and lowercase), digits, hyphen. Hyphen may not appear in first or last position. No more than 63 octets.
(2) Note that the string following "xn--" must be the valid output of the Punycode algorithm and must be convertible into valid U-label form.
(3) Note that a Fake A-label has a prefix "xn--" but the remainder of the label is NOT the valid output of the Punycode algorithm.
(4) LDH label subtypes are indistinguishable to applications that are not IDNA-aware.
Figure 1: IDNA and Related DNS Terminology Space -- ASCII Labels
Non-ASCII
   U-label (5)
   Binary Label (including high bit on)
   Bit String Label
(5) To applications that are not IDNA-aware, U-labels are indistinguishable from Binary ones.
Figure 2: Non-ASCII Labels
For IDNA-aware applications, the three types of valid labels are "A-labels", "U-labels", and "NR-LDH labels", each of which is defined below. The relationships among them are illustrated in Figure 1 and Figure 2.
o A string is "IDNA-valid" if it meets all of the requirements of these specifications for an IDNA label. IDNA-valid strings may appear in either of the two forms defined immediately below, or may be drawn from the NR-LDH label subset. IDNA-valid strings must also conform to all basic DNS requirements for labels. These documents make specific reference to the form appropriate to any context in which the distinction is important.
o An "A-label" is the ASCII-Compatible Encoding (ACE, see Section 2.3.2.5) form of an IDNA-valid string. It must be a complete label: IDNA is defined for labels, not for parts of them and not for complete domain names. This means, by definition, that every A-label will begin with the IDNA ACE prefix, "xn--" (see Section 2.3.2.5), followed by a string that is a valid output of the Punycode algorithm [RFC3492] and hence a maximum of 59 ASCII characters in length. The prefix and string together must conform to all requirements for a label that can be stored in the DNS including conformance to the rules for LDH labels (Section 2.3.1). If and only if a string meeting the above requirements can be decoded into a U-label is it an A-label.
o A "U-label" is an IDNA-valid string of Unicode characters, in Normalization Form C (NFC) and including at least one non-ASCII character, expressed in a standard Unicode Encoding Form (such as UTF-8). It is also subject to the constraints about permitted characters that are specified in Section 4.2 of the Protocol document and the rules in Sections 2 and 3 of the Tables document, the Bidi constraints in the Bidi document [RFC5893] if it contains any character from scripts that are written right to left, and the symmetry constraint described immediately below. Conversions between U-labels and A-labels are performed according to the "Punycode" specification [RFC3492], adding or removing the ACE prefix as needed.
To be valid, U-labels and A-labels must obey an important symmetry constraint. While that constraint may be tested in any of several ways, an A-label A1 must be capable of being produced by conversion from a U-label U1, and that U-label U1 must be capable of being produced by conversion from A-label A1. Among other things, this implies that both U-labels and A-labels must be strings in Unicode NFC [Unicode-UAX15] normalized form. These strings MUST contain only characters specified elsewhere in this document series, and only in the contexts indicated as appropriate.
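The conversion and symmetry constraint might be sketched as follows, again using the "punycode" codec from the Python standard library. This is only a partial check: it covers NFC normalization and the round trip between the two forms, but not the permitted-character, contextual, or Bidi rules defined elsewhere in the document series. All function names are assumptions of the sketch.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Punycode-encode a putative U-label and add the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Remove the ACE prefix and Punycode-decode the remainder."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def satisfies_symmetry(u_label: str) -> bool:
    """True if the string is in NFC and survives U -> A -> U conversion."""
    if unicodedata.normalize("NFC", u_label) != u_label:
        return False
    return to_u_label(to_a_label(u_label)) == u_label


print(to_a_label("b\u00fccher"))
print(satisfies_symmetry("bu\u0308cher"))  # decomposed form, not NFC
```

In the usage lines, the first string uses the precomposed character U+00FC, while the second uses "u" followed by the combining diaeresis U+0308 and therefore fails the NFC test.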
Any rules or conventions that apply to DNS labels in general apply to whichever of the U-label or A-label would be more restrictive. There are two exceptions to this principle. First, the restriction to ASCII characters does not apply to the U-label. Second, expansion of the A-label form to a U-label may produce strings that are much longer than the normal 63 octet DNS limit (potentially up to 252 characters) due to the compression efficiency of the Punycode algorithm. Such extended-length U-labels are valid from the standpoint of IDNA, but caution should be exercised as shorter limits may be imposed by some applications.
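The second exception can be observed by comparing the two forms of the same label. The sketch below measures a putative U-label in UTF-8 octets and its A-label form in ASCII characters. The test string of thirty repeated characters is purely illustrative and is not claimed to be an IDNA-valid or sensible label; the function names are hypothetical.

```python
from typing import Tuple

ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # DNS limit that applies to the A-label form


def label_lengths(u_label: str) -> Tuple[int, int]:
    """Return (UTF-8 octets of the U-label, characters of the A-label)."""
    a_label = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    return len(u_label.encode("utf-8")), len(a_label)


def a_label_fits_dns(u_label: str) -> bool:
    """True if the A-label form respects the 63-octet DNS label limit."""
    return label_lengths(u_label)[1] <= MAX_LABEL_OCTETS


utf8_octets, a_chars = label_lengths("\u4e2d" * 30)
print(utf8_octets, a_chars, a_label_fits_dns("\u4e2d" * 30))
```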
For context, applications that are not IDNA-aware treat all LDH labels as valid for appearance in DNS zone files and queries and some of them may permit additional types of labels (i.e., not impose the LDH restriction). IDNA-aware applications permit only A-labels and NR-LDH labels to appear in zone files and queries. U-labels can appear, along with the other two, in presentation and user interface forms, and in protocols that use IDNA forms but that do not involve the DNS itself.
对于上下文,不支持IDNA的应用程序会将所有LDH标签视为在DNS区域文件和查询中显示的有效标签,其中一些可能允许其他类型的标签(即,不施加LDH限制)。支持IDNA的应用程序只允许A标签和NR-LDH标签出现在区域文件和查询中。U标签可以与其他两种标签一起出现在表示形式和用户界面形式中,以及使用IDNA形式但不涉及DNS本身的协议中。
Specifically, for IDNA-aware applications and contexts, the three allowed categories are A-label, U-label, and NR-LDH label. Of the Reserved LDH labels (R-LDH labels) only A-labels are valid for IDNA use.
具体而言,对于IDNA感知的应用程序和上下文,三个允许的类别是A标签、U标签和NR-LDH标签。在保留的LDH标签(R-LDH标签)中,只有A-标签可用于IDNA。
Strings that appear to be A-labels or U-labels are processed in various operations of the Protocol document [RFC5891]. Those strings are not yet demonstrably conformant with the conditions outlined above because they are in the process of validation. Such strings may be referred to as "unvalidated", "putative", or "apparent", or as being "in the form of" one of the label types to indicate that they have not been verified to meet the specified conformance requirements.
在协议文档[RFC5891]的各种操作中处理看似A标签或U标签的字符串。这些字符串尚未明显符合上述条件,因为它们正在验证过程中。此类字符串可被称为“未验证”、“假定”或“明显”,或被称为标签类型之一的“形式”,以表明其未经验证符合规定的一致性要求。
Unvalidated A-labels are known only to be XN-labels, while Fake A-labels have been demonstrated to fail some of the A-label tests. Similarly, unvalidated U-labels are simply non-ASCII labels that may or may not meet the requirements for U-labels.
未经验证的A标签已知仅为XN标签,而伪造的A标签已被证明无法通过一些A标签测试。同样,未经验证的U型标签只是非ASCII标签,可能满足也可能不满足U型标签的要求。
These specifications use the term "NR-LDH label" strictly to refer to an all-ASCII label that obeys the LDH label syntax discussed in Section 2.3.1 and that is neither an IDN nor a label form reserved by IDNA (R-LDH label). It should be stressed that all A-labels obey the "hostname" [RFC0952] rules other than the length restriction in those rules.
这些规范严格使用术语“NR-LDH标签”来指代符合第2.3.1节中讨论的LDH标签语法的全ASCII标签,它既不是IDN,也不是IDNA保留的标签形式(R-LDH标签)。应该强调的是,除这些规则中的长度限制外,所有A标签都遵守“主机名”[RFC0952]规则。
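The label categories above can be illustrated with a purely syntactic classifier. This is a sketch under stated assumptions: the LDH syntax (letters, digits, and hyphen, no leading or trailing hyphen, at most 63 octets) and the R-LDH test (hyphens in the third and fourth positions) are taken from Section 2.3.1, and the function name and returned strings are hypothetical. A result of "XN" or "non-ASCII" means only "putative A-label" or "putative U-label"; no validation is performed.

```python
import re

# LDH syntax: ASCII letters, digits, hyphen; no leading/trailing hyphen; 1..63 octets.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def classify_label(label: str) -> str:
    """Return a coarse, syntax-only category for a single label."""
    if not label.isascii():
        return "non-ASCII"   # putative U-label
    if not _LDH.fullmatch(label):
        return "non-LDH"     # e.g. underscore labels used with SRV
    if label[2:4] == "--":
        if label[:2].lower() == "xn":
            return "XN"      # putative A-label, still unvalidated
        return "R-LDH"       # reserved, not valid for IDNA use
    return "NR-LDH"
```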
An "internationalized domain name" (IDN) is a domain name that contains at least one A-label or U-label, but that otherwise may contain any mixture of NR-LDH labels, A-labels, or U-labels. Just as has been the case with ASCII names, some DNS zone administrators may impose restrictions, beyond those imposed by DNS or IDNA, on the characters or strings that may be registered as labels in their zones. Because of the diversity of characters that can be used in a U-label and the confusion they might cause, such restrictions are mandatory for IDN registries and zones even though the particular restrictions are not part of these specifications (the issue is discussed in more detail in Section 4.3 of the Protocol document [RFC5891]). Because these restrictions, commonly known as "registry restrictions", only affect what can be registered and not lookup processing, they have no effect on the syntax or semantics of DNS protocol messages; a query for a name that matches no records will yield the same response regardless of the reason why it is not in the zone. Clients issuing queries or interpreting responses cannot be assumed to have any knowledge of zone-specific restrictions or conventions. See the section on registration policy in the Rationale document [RFC5894] for additional discussion.
“国际化域名”(IDN)是至少包含一个A标签或U标签的域名,除此之外可以包含NR-LDH标签、A标签或U标签的任意组合。与ASCII名称一样,一些DNS区域管理员可能会对其区域中可注册为标签的字符或字符串施加超出DNS或IDNA要求的限制。由于U标签中可使用的字符的多样性及其可能造成的混淆,即使特定限制不是这些规范的一部分,这些限制对于IDN注册中心和区域也是强制性的(该问题在协议文件[RFC5891]第4.3节中有更详细的讨论)。因为这些限制(通常称为“注册限制”)仅影响可以注册的内容,而不影响查找处理,所以它们对DNS协议消息的语法或语义没有影响;对不匹配任何记录的名称的查询将产生相同的响应,无论其不在区域中的原因为何。不能假定发出查询或解释响应的客户端了解任何特定区域的限制或约定。有关更多讨论,请参阅基本原理文件[RFC5894]中有关注册政策的章节。
"Internationalized label" is used when a term is needed to refer to a single label of an IDN, i.e., one that might be any of an NR-LDH label, A-label, or U-label. There are some standardized DNS label formats, such as the "underscore labels" used for service location (SRV) records [RFC2782], that do not fall into any of the three categories and hence are not internationalized labels.
当需要一个术语来指代IDN的单个标签时,即可能是NR-LDH标签、a-标签或U-标签中的任何一个时,使用“国际化标签”。有一些标准化的DNS标签格式,例如用于服务位置(SRV)记录[RFC2782]的“下划线标签”,它们不属于这三类中的任何一类,因此不是国际化标签。
In IDNA, equivalence of labels is defined in terms of the A-labels. If the A-labels are equal in a case-independent comparison, then the labels are considered equivalent, no matter how they are represented. Because of the isomorphism of A-labels and U-labels in IDNA2008, it is possible to compare U-labels directly; see the Protocol document [RFC5891] for details. Traditional LDH labels already have a notion of equivalence: within that list of characters, uppercase and lowercase are considered equivalent. The IDNA notion of equivalence is an extension of that older notion but, because the protocol does not specify any mandatory mapping and only those isomorphic forms are considered, the only equivalents are:
在IDNA中,标签的等价性是根据A标签定义的。如果在独立于大小写的比较中,A标签相等,则认为标签相等,无论它们是如何表示的。由于IDNA2008中A-标签和U-标签的同构,可以直接比较U-标签;详见协议文件[RFC5891]。传统的LDH标签已经有了一个等价的概念:在这个字符列表中,大写和小写被认为是等价的。IDNA等效概念是该旧概念的延伸,但由于协议未规定任何强制性映射,且仅考虑这些同构形式,因此唯一的等效概念为:
o Exact (bit-string identity) matches between a pair of U-labels.
o 一对U型标签之间的精确(位字符串标识)匹配。
o Matches between a pair of A-labels, using normal DNS case-insensitive matching rules.
o 使用普通DNS不区分大小写的匹配规则在一对a标签之间进行匹配。
o Equivalence between a U-label and an A-label determined by translating the U-label form into an A-label form and then testing for a match between the A-labels using normal DNS case-insensitive matching rules.
o 通过将U标签形式转换为a标签形式,然后使用正常DNS不区分大小写的匹配规则测试a标签之间的匹配,确定U标签和a标签之间的等效性。
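The three equivalence cases listed above can be sketched as follows. This is an illustration only: it assumes both inputs are already validated labels (an A-label, a U-label, or an NR-LDH label), uses Python's built-in "punycode" codec for the conversion, and the function names are hypothetical.

```python
def _a_form(label: str) -> str:
    """ASCII form of a validated label, lowercased for DNS-style matching."""
    if label.isascii():
        return label.lower()
    return ("xn--" + label.encode("punycode").decode("ascii")).lower()

def labels_equivalent(x: str, y: str) -> bool:
    """Apply the three equivalence cases to a pair of validated labels."""
    if not x.isascii() and not y.isascii():
        # Two U-labels: exact (bit-string identity) match only.
        return x == y
    # Otherwise compare in A-label form, case-insensitively.
    return _a_form(x) == _a_form(y)
```

Because no mapping is applied, two non-ASCII strings that differ only in case are not treated as equivalent; only the ASCII forms receive case-insensitive comparison.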
The "ACE prefix" is defined in this document to be a string of ASCII characters, "xn--", that appears at the beginning of every A-label. "ACE" stands for "ASCII-Compatible Encoding".
“ACE前缀”在本文档中定义为ASCII字符串“xn--”,出现在每个A标签的开头。“ACE”代表“ASCII兼容编码”。
A "domain name slot" is defined in this document to be a protocol element or a function argument or a return value (and so on) explicitly designated for carrying a domain name. Examples of domain name slots include the QNAME field of a DNS query; the name argument of the gethostbyname() or getaddrinfo() standard C library functions; the part of an email address following the at sign ("@") in the parameter to the SMTP MAIL or RCPT commands or the "From:" field of an email message header; and the host portion of the URI in the "src" attribute of an HTML "<IMG>" tag. A string that has the syntax of a domain name but that appears in general text is not in a domain name slot. For example, a domain name appearing in the plain text body of an email message is not occupying a domain name slot.
“域名槽”在本文档中定义为明确指定用于承载域名的协议元素、函数参数或返回值(等等)。域名槽的示例包括DNS查询的QNAME字段;gethostbyname()或getaddrinfo()标准C库函数的name参数;SMTP MAIL或RCPT命令参数中at符号(“@”)后的电子邮件地址部分,或电子邮件头的“From:”字段;以及HTML“<IMG>”标记的“src”属性中URI的主机部分。具有域名语法但出现在常规文本中的字符串不在域名槽中。例如,出现在电子邮件纯文本正文中的域名没有占用域名槽。
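As a small illustration of two of the slots named above, the sketch below pulls the host portion out of a URI and the domain part out of a mailbox using the Python standard library. The function names and sample values are hypothetical, and real mail and HTML processing involves considerably more parsing than shown here.

```python
from urllib.parse import urlsplit

def host_slot_of_uri(uri: str):
    """Host portion of a URI, e.g. the value of an <IMG> "src" attribute."""
    return urlsplit(uri).hostname

def domain_slot_of_mailbox(mailbox: str) -> str:
    """Part of a mailbox following the last at sign."""
    return mailbox.rpartition("@")[2]
```

The same string appearing in free text, such as the body of a message, would not pass through either function and is not in a domain name slot.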
An "IDNA-aware domain name slot" is defined for this set of documents to be a domain name slot explicitly designated for carrying an internationalized domain name as defined in this document. The designation may be static (for example, in the specification of the protocol or interface) or dynamic (for example, as a result of negotiation in an interactive session).
“IDNA感知域名槽”是为这组文档定义的域名槽,明确指定用于承载本文档中定义的国际化域名。指定可以是静态的(例如,在协议或接口的规范中)或动态的(例如,作为交互会话中协商的结果)。
Name slots that are not IDNA-aware obviously include any domain name slot whose specification predates IDNA. Note that the requirements of some protocols that use the DNS for data storage prevent the use of IDNs. For example, the format required for the underscore labels used by the service location protocol [RFC2782] precludes representation of a non-ASCII label in the DNS using A-labels because those SRV-related labels must start with underscores. Of course, non-ASCII IDN labels may be part of a domain name that also includes underscore labels.
不知道IDNA的名称槽显然包括规范早于IDNA的任何域名槽。请注意,某些使用DNS进行数据存储的协议的要求会阻止使用IDN。例如,服务位置协议[RFC2782]使用的下划线标签所需的格式排除了使用a标签在DNS中表示非ASCII标签的可能性,因为这些SRV相关标签必须以下划线开头。当然,非ASCII IDN标签可能是还包括下划线标签的域名的一部分。
Because IDN labels may contain characters that are read, and preferentially displayed, from right to left, there is a potential ambiguity about which character in a label is "first". For the purposes of these specifications, labels are considered, and characters numbered, strictly in the order in which they appear "on the wire". That order is equivalent to the leftmost character being treated as first in a label that is read left to right and to the rightmost character being first in a label that is read right to left. The Bidi specification contains additional discussion of the conditions that influence reading order.
由于IDN标签可能包含从右到左读取并优先显示的字符,因此标签中的哪个字符是“第一个”可能存在歧义。在这些规范中,标签和字符的编号严格按照其在“电线上”的出现顺序进行。该顺序相当于在从左到右读取的标签中将最左边的字符视为第一个,在从右到左读取的标签中将最右边的字符视为第一个。Bidi规范包含对影响阅读顺序的条件的附加讨论。
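The "on the wire" ordering can be illustrated with a short sketch. It assumes a Python string holding the label in logical (transmission) order, which is how strings decoded from network data are normally stored; the function name and sample label are hypothetical.

```python
import unicodedata

def first_on_wire(label: str) -> str:
    """The first character of a label is simply index 0 in logical order."""
    return label[0]

# Alef, bet, gimel in logical order; a display would render alef rightmost.
hebrew_label = "\u05d0\u05d1\u05d2"
first = first_on_wire(hebrew_label)
print(unicodedata.name(first), unicodedata.bidirectional(first))
```

The Bidi class reported for the first character ("R" here) affects only how the label is displayed, not how its characters are numbered.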
There has been some confusion about whether a "Punycode string" does or does not include the ACE prefix and about whether it is required that such strings could have been the output of the ToASCII operation (see RFC 3490, Section 4 [RFC3490]). This specification discourages the use of the term "Punycode" to describe anything but the encoding method and algorithm of RFC 3492 [RFC3492]. The terms defined above are preferred because they are much clearer than the term "Punycode string".
关于“Punycode字符串”是否包含ACE前缀以及是否要求此类字符串可能是ToASCII操作的输出(请参阅RFC 3490,第4节[RFC3490]),存在一些混淆。本规范不鼓励使用术语“Punycode”来描述RFC 3492[RFC3492]的编码方法和算法以外的任何内容。上面定义的术语比术语“Punycode字符串”清晰得多,因此应优先使用。
IANA actions for this version of IDNA (IDNA2008) are specified in the Tables document [RFC5892]. An overview of the relationships among the various IANA registries appears in the Rationale document [RFC5894]. This document does not specify any actions for IANA.
此版本IDNA(IDNA2008)的IANA操作在表格文档[RFC5892]中指定。各种IANA注册中心之间关系的概述见基本原理文件[RFC5894]。本文档未指定IANA的任何操作。
Security on the Internet partly relies on the DNS. Thus, any change to the characteristics of the DNS can change the security of much of the Internet.
互联网上的安全部分依赖于DNS。因此,DNS特性的任何改变都可能改变互联网的大部分安全性。
Domain names are used by users to identify and connect to Internet hosts and other network resources. The security of the Internet is compromised if a user entering a single internationalized name is connected to different servers based on different interpretations of the internationalized domain name. In addition to characters that are permitted by IDNA2003 and its mapping conventions (see Section 4.6), the current specification changes the interpretation of a few characters that were mapped to others in the earlier version; zone administrators should be aware of the problems that this might raise and take appropriate measures. The context for this issue is discussed in more detail in the Rationale document [RFC5894].
域名被用户用来识别并连接到Internet主机和其他网络资源。如果输入单个国际化域名的用户根据对国际化域名的不同解释连接到不同的服务器,则会危及互联网的安全。除了IDNA2003及其映射约定(见第4.6节)允许的字符外,当前规范更改了早期版本中映射到其他字符的一些字符的解释;区域管理员应意识到这可能引发的问题,并采取适当的措施。基本原理文件[RFC5894]中更详细地讨论了该问题的背景。
In addition to the Security Considerations material that appears in this document, the Bidi document [RFC5893] contains a discussion of security issues specific to labels containing characters from scripts that are normally written right to left.
除了本文件中出现的安全注意事项材料外,Bidi文件[RFC5893]还讨论了特定于标签的安全问题,标签包含通常从右向左书写的脚本中的字符。
Labels associated with the DNS have traditionally been limited to 63 octets by the general restrictions in RFC 1035 and by the need to treat them as a six-bit string length followed by the string in actual calls to the DNS. That format is used in some other applications and, in general, representations of domain names as dot-separated labels and as length-string pairs have been treated as interchangeable. Because A-labels (the form actually used in the DNS) are potentially much more compressed than UTF-8 (and UTF-8 is, in general, more compressed than UTF-16 or UTF-32), U-labels that obey all of the relevant symmetry (and other) constraints of these documents may be quite a bit longer, potentially up to 252 characters (Unicode code points). A fully-qualified domain name containing several such labels can obviously also exceed the nominal 255 octet limit for such names. Application authors using U-labels must exert due caution to avoid buffer overflow and truncation errors and attacks in contexts where shorter strings are expected.
由于RFC 1035中的一般限制以及在实际调用DNS时需要将其视为六位字符串长度后跟字符串,因此与DNS相关联的标签传统上被限制为63个八位字节。该格式也在其他一些应用程序中使用,并且通常,域名的点分隔标签表示和长度-字符串对表示被视为可互换的。由于A标签(DNS中实际使用的形式)可能比UTF-8压缩得多(通常,UTF-8比UTF-16或UTF-32压缩得多),因此遵守这些文档的所有相关对称性(和其他)约束的U标签可能要长得多,可能最多252个字符(Unicode代码点)。一个包含几个这样的标签的完全限定域名显然也可以超过此类名称标称的255个八位字节限制。使用U标签的应用程序作者必须格外小心,以避免在预期字符串较短的上下文中出现缓冲区溢出、截断错误和攻击。
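The length concern can be made concrete by measuring one label in several encodings. The sketch below is illustrative only: it assumes Python's built-in "punycode" codec, the helper names are hypothetical, and the input is treated as an already validated U-label.

```python
MAX_LABEL_OCTETS = 63  # DNS limit that applies to the A-label form

def label_sizes(u_label: str) -> dict:
    """Sizes of one label as code points, UTF-8, UTF-16, and A-label octets."""
    a_label = "xn--" + u_label.encode("punycode").decode("ascii")
    return {
        "code_points": len(u_label),
        "utf8_octets": len(u_label.encode("utf-8")),
        "utf16_octets": len(u_label.encode("utf-16-le")),
        "a_label_octets": len(a_label),
    }

def fits_dns_label(u_label: str) -> bool:
    """True if the A-label form respects the 63-octet DNS label limit."""
    return label_sizes(u_label)["a_label_octets"] <= MAX_LABEL_OCTETS

sizes = label_sizes("\u4e2d" * 30)  # thirty copies of one CJK character
```

In this example the A-label form fits within a DNS label while the UTF-8 form of the same label occupies 90 octets, so a buffer sized for 63 octets would be too small for the U-label.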
When systems use local character sets other than ASCII and Unicode, these specifications leave the problem of converting between the local character set and Unicode up to the application or local system. If different applications (or different versions of one application) implement different rules for conversions among coded character sets, they could interpret the same name differently and contact different servers. This problem is not solved by security protocols, such as Transport Layer Security (TLS) [RFC5246], that do not take local character sets into account.
当系统使用ASCII和Unicode以外的本地字符集时,这些规范将本地字符集和Unicode之间的转换问题留给应用程序或本地系统。如果不同的应用程序(或一个应用程序的不同版本)在编码字符集之间实现不同的转换规则,那么它们可能会对同一名称进行不同的解释,并联系不同的服务器。安全协议(如传输层安全协议(Transport Layer security,TLS)[RFC5246])不能解决这个问题,因为它们不考虑本地字符集。
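The effect of differing charset conversions can be sketched as follows. The example assumes the same three octets are interpreted once as ISO 8859-1 and once as ISO 8859-5, using Python's standard codecs; the helper name and sample bytes are hypothetical.

```python
def a_label_from_local_bytes(raw: bytes, charset: str) -> str:
    """Decode locally encoded octets to Unicode, then form a putative A-label."""
    u_label = raw.decode(charset)
    return "xn--" + u_label.encode("punycode").decode("ascii")

raw = b"b\xe4r"
as_latin1 = a_label_from_local_bytes(raw, "latin-1")      # 0xE4 -> U+00E4
as_cyrillic = a_label_from_local_bytes(raw, "iso8859_5")  # 0xE4 -> U+0444
```

The two results are different A-labels, so two applications that disagree about the local character set would look up different names for the same user input.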
To help prevent confusion between characters that are visually similar (sometimes called "confusables"), it is suggested that implementations provide visual indications where a domain name contains multiple scripts, especially when the scripts contain characters that are easily confused visually, such as an omicron in Greek mixed with Latin text. Such mechanisms can also be used to show when a name contains a mixture of Simplified Chinese characters with Traditional ones that have Simplified forms, or to distinguish zero and one from uppercase "O" and lowercase "L". DNS zone administrators may impose restrictions (subject to the limitations identified elsewhere in these documents) that try to minimize characters that have similar appearance or similar interpretations.
为了帮助防止视觉上相似的字符之间的混淆(有时称为“混淆”),建议在域名包含多个脚本的情况下,实现提供视觉指示,特别是当脚本包含视觉上容易混淆的字符时,例如希腊语的omicron与拉丁语文本混合。这种机制还可用于显示名称何时包含简体中文字符和具有简化形式的繁体中文字符的混合,或用于区分零和一与大写字母“O”和小写字母“L”。DNS区域管理员可能会施加限制(根据这些文档中其他地方确定的限制),以尽量减少具有类似外观或类似解释的字符。
If multiple characters appear in a label and the label consists only of characters in one script, individual characters that might be confused with others if compared separately may be unambiguous and non-confusing. On the other hand, that observation makes labels containing characters from more than one script (often called "mixed-script labels") even more risky -- users will tend to see what they expect to see and context is a powerful reinforcement to perception. At the same time, while the risks associated with mixed-script labels are clear, simply prohibiting them will not eliminate problems, especially where closely related scripts are involved. For example, there are many strings that are entirely in Greek or Cyrillic scripts that can be confused with each other or with Latin script strings.
如果标签中出现多个字符,且标签仅由一个脚本中的字符组成,则单独比较可能会与其他字符混淆的单个字符可能是明确且不混淆的。另一方面,这种观察使包含多个脚本中的字符的标签(通常称为“混合脚本标签”)更具风险——用户将倾向于看到他们期望看到的内容,而上下文是对感知的有力强化。与此同时,虽然混合脚本标签的风险显而易见,但简单地禁止它们并不能消除问题,特别是在涉及密切相关脚本的情况下。例如,有许多字符串完全是希腊或西里尔文字,它们可能相互混淆或与拉丁文字字符串混淆。
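A rough way to flag mixed-script labels for a visual indication is sketched below. This is a heuristic, not the Unicode Script property: the Python standard library does not expose that property, so the sketch assumes the first word of each character name (for example "LATIN", "GREEK", "CYRILLIC") is an adequate hint. The function names are hypothetical.

```python
import unicodedata

def script_hints(label: str) -> set:
    """First word of each letter's Unicode name, as a crude script hint."""
    hints = set()
    for ch in label:
        if ch.isalpha():  # ignore digits and hyphens
            hints.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return hints

def is_mixed_script(label: str) -> bool:
    """True if the letters in the label appear to come from several scripts."""
    return len(script_hints(label)) > 1
```

A check of this kind catches a Cyrillic or Greek letter embedded in Latin text, but it does nothing for labels written entirely in one script that resemble labels in another.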
It is worth noting that there are no comprehensive technical solutions to the problems of confusable characters. One can reduce the extent of the problems in various ways, but probably never eliminate them. Some specific suggestions about identification and handling of confusable characters appear in a Unicode Consortium publication [Unicode-UTR36].
值得注意的是,对于易混淆字符的问题没有全面的技术解决方案。人们可以通过各种方式降低问题的程度,但可能永远无法消除这些问题。Unicode联盟出版物[Unicode-UTR36]中给出了一些关于识别和处理易混淆字符的具体建议。
The Protocol specification [RFC5891] describes procedures for registering and looking up labels that are not compatible with the preferred syntax described in the base DNS specifications (see Section 2.3.1) because they contain non-ASCII characters. These procedures depend on the use of a special ASCII-compatible encoding form that contains only characters permitted in hostnames by those earlier specifications. The encoding used is Punycode [RFC3492]. No security issues such as string length increases or new allowed values are introduced by the encoding process or the use of these encoded values, apart from those introduced by the ACE encoding itself.
协议规范[RFC5891]描述了注册和查找与基本DNS规范(见第2.3.1节)中描述的首选语法不兼容的标签的过程,因为这些标签包含非ASCII字符。这些过程依赖于使用一种特殊的ASCII兼容编码形式,该编码形式只包含那些早期规范允许的主机名中的字符。使用的编码是Punycode[RFC3492]。除了ACE编码本身引入的安全问题外,编码过程或这些编码值的使用不会引入任何安全问题,如字符串长度增加或新的允许值。
Domain names (or portions of them) are sometimes compared against a set of domains to be given special treatment if a match occurs, e.g., treated as more privileged than others or blocked in some way. In such situations, it is especially important that the comparisons be done properly, as specified in the "Requirements" section of the Protocol document [RFC5891]. For labels already in ASCII form, the proper comparison reduces to the same case-insensitive ASCII comparison that has always been used for ASCII labels although IDNA-aware applications are expected to look up only A-labels and NR-LDH labels, i.e., to avoid looking up R-LDH labels that are not A-labels.
域名(或其中的一部分)有时会与一组域进行比较,如果出现匹配,将给予特殊处理,例如,被视为比其他域名更具特权或以某种方式被阻止。在这种情况下,按照协议文件[RFC5891]中“要求”部分的规定,正确进行比较尤为重要。对于已经采用ASCII格式的标签,正确的比较将简化为始终用于ASCII标签的不区分大小写的ASCII比较,尽管IDNA感知应用程序预计只查找A标签和NR-LDH标签,即避免查找非A标签的R-LDH标签。
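Matching a name against a list of specially treated domains might be sketched as follows. This is not the comparison procedure of the Protocol document: it assumes every label is already validated, splits only on the ASCII full stop, uses Python's built-in "punycode" codec, and the function names are hypothetical.

```python
def canonical_name(name: str) -> str:
    """Lowercased A-label/NR-LDH form of a dot-separated domain name."""
    labels = []
    for label in name.rstrip(".").split("."):
        if not label.isascii():
            label = "xn--" + label.encode("punycode").decode("ascii")
        labels.append(label.lower())
    return ".".join(labels)

def matches_listed_domain(name: str, listed: set) -> bool:
    """True if the name equals, or is a subdomain of, any listed domain."""
    listed_canon = {canonical_name(d) for d in listed}
    parts = canonical_name(name).split(".")
    # Compare whole-label suffixes so that "notexample.com" != "example.com".
    return any(".".join(parts[i:]) in listed_canon for i in range(len(parts)))
```

Comparing in a single canonical form means a listed A-label also matches the corresponding U-label spelling, which a naive string comparison would miss.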
The introduction of IDNA meant that any existing labels that start with the ACE prefix would be construed as A-labels, at least until they failed one of the relevant tests, whether or not that was the intent of the zone administrator or registrant. There is no evidence that this has caused any practical problems since RFC 3490 was adopted, but the risk still exists in principle.
IDNA的引入意味着任何以ACE前缀开头的现有标签都将被解释为A标签,至少在它们未能通过某项相关测试之前是如此,无论这是否是区域管理员或注册人的意图。自采用RFC 3490以来,没有证据表明这导致了任何实际问题,但原则上风险仍然存在。
The URI Standard [RFC3986] and a number of application specifications (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII labels in DNS names used with those protocols, i.e., only the A-label form of IDNs is permitted in those contexts. If only A-labels are used, differences in interpretation between IDNA2003 and this version arise only for characters whose interpretation has actually changed (e.g., characters, such as ZWJ and ZWNJ, that were mapped to nothing in IDNA2003 and that are considered legitimate in some contexts by these specifications). Despite that prohibition, there are a significant number of files and databases on the Internet in which domain name strings appear in native-character form; a subset of those strings use native-character labels that require IDNA2003 mapping to produce valid A-labels. The treatment of such labels will vary by types of applications and application-designer preference: in some situations, warnings to the user or outright rejection may be appropriate; in others, it may be preferable to attempt to apply the earlier mappings if lookup strictly conformant to these specifications fails or even to do lookups under both sets of rules. This general situation is discussed in more detail in the Rationale document [RFC5894]. However, in the absence of care by registries about how strings that could have different interpretations under IDNA2003 and the current specification are handled, it is possible that the differences could be used as a component of name-matching or name-confusion attacks. Such care is therefore appropriate.
URI标准[RFC3986]和许多应用程序规范(例如SMTP[RFC5321]和HTTP[RFC2616])不允许在与这些协议一起使用的DNS名称中使用非ASCII标签,即,在这些上下文中只允许IDN的A标签形式。如果仅使用A标签,IDNA2003和本版本之间的解释差异仅出现在解释已实际更改的字符上(例如ZWJ和ZWNJ等字符,它们在IDNA2003中被映射为空,而这些规范认为它们在某些上下文中是合法的)。尽管有这项禁令,互联网上仍有大量文件和数据库中的域名字符串以本机字符形式出现;这些字符串的一个子集使用需要IDNA2003映射才能生成有效A标签的本机字符标签。此类标签的处理方式因应用程序类型和应用程序设计者偏好而异:在某些情况下,向用户发出警告或直接拒绝可能是合适的;在其他情况下,如果严格符合这些规范的查找失败,则最好尝试应用早期的映射,甚至在两组规则下都进行查找。基本原理文件[RFC5894]中对这种一般情况进行了更详细的讨论。然而,如果注册中心不注意如何处理在IDNA2003和当前规范下可能具有不同解释的字符串,这些差异可能被用作名称匹配或名称混淆攻击的一个组成部分。因此,这种谨慎是适当的。
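The difference in interpretation can be seen by encoding the same strings with and without the IDNA2003 mappings. The sketch assumes Python's built-in "idna" codec, which implements RFC 3490 with Nameprep, as the IDNA2003 side, and plain Punycode with no mapping as a stand-in for the other side; the IDNA2008 validity and contextual rules are not checked, and the helper names are hypothetical.

```python
def idna2003_to_ascii(label: str) -> str:
    """IDNA2003 ToASCII, including the Nameprep mappings."""
    return label.encode("idna").decode("ascii")

def unmapped_a_label(label: str) -> str:
    """Punycode plus ACE prefix with no mapping step (no validation either)."""
    return "xn--" + label.encode("punycode").decode("ascii")

sharp_s = "stra\u00dfe"   # contains U+00DF LATIN SMALL LETTER SHARP S
with_zwj = "a\u200db"     # contains U+200D ZERO WIDTH JOINER

print(idna2003_to_ascii(sharp_s), unmapped_a_label(sharp_s))
print(idna2003_to_ascii(with_zwj), unmapped_a_label(with_zwj))
```

Under the IDNA2003 mappings the sharp s becomes "ss" and the joiner disappears, so each string can resolve to a different name than the one obtained when no mapping is applied.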
The registration and lookup models described in this set of documents change the mechanisms available for lookup applications to determine the validity of labels they encounter. In some respects, the ability to test is strengthened. For example, putative labels that contain unassigned code points will now be rejected, while IDNA2003 permitted them (see the Rationale document [RFC5894] for a discussion of the reasons for this). On the other hand, the Protocol specification no longer assumes that the application that looks up a name will be able to determine, and apply, information about the protocol version used in registration. In theory, that may increase risk since the application will be able to do less pre-lookup validation. In practice, the protection afforded by that test has been largely illusory for reasons explained in RFC 4690 [RFC4690] and elsewhere in these documents.
这组文档中描述的注册和查找模型更改了查找应用程序可用的机制,以确定它们遇到的标签的有效性。在某些方面,测试能力得到了加强。例如,包含未分配代码点的假定标签现在将被拒绝,而IDNA2003允许这些标签(有关其原因的讨论,请参阅基本原理文档[RFC5894])。另一方面,协议规范不再假设查找名称的应用程序将能够确定并应用有关注册中使用的协议版本的信息。从理论上讲,这可能会增加风险,因为应用程序将能够执行较少的预查找验证。实际上,由于RFC 4690[RFC4690]和这些文件中其他地方解释的原因,该测试提供的保护在很大程度上是虚幻的。
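A pre-lookup check for unassigned code points might look like the following. This is a sketch only: it assumes the General_Category value "Cn" from the Unicode database bundled with the Python runtime, so the answer depends on that runtime's Unicode version rather than on the derived property values of the Tables document. The function name is hypothetical.

```python
import unicodedata

def unassigned_code_points(label: str) -> list:
    """List code points in the label whose General_Category is Cn."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if unicodedata.category(ch) == "Cn"
    ]
```

A putative label for which this list is non-empty would be rejected before lookup rather than passed through as IDNA2003 allowed.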
Any change to the Stringprep [RFC3454] procedure that is profiled and used in IDNA2003, or, more broadly, the IETF's model of the use of internationalized character strings in different protocols, creates some risk of inadvertent changes to those protocols, invalidating deployed applications or databases, and so on. But these specifications do not change Stringprep at all; they merely bypass it. Because these documents do not depend on Stringprep, the question of upgrading other protocols that do have that dependency can be left to experts on those protocols: the IDNA changes and possible upgrades to security protocols or conventions are independent issues.
IDNA2003中描述和使用的Stringprep[RFC3454]过程的任何更改,或者更广泛地说,IETF在不同协议中使用国际化字符串的模型的任何更改,都会造成对这些协议的意外更改、使部署的应用程序或数据库失效等风险。但这些规范根本没有改变;他们只是绕过它。由于这些文档不依赖于Stringprep,因此升级具有该依赖性的其他协议的问题可以留给这些协议的专家解决:IDNA更改和可能升级到安全协议或约定是独立的问题。
No mechanism involving names or identifiers alone can protect against a wide variety of security threats and attacks that are largely independent of the naming or identification system. These attacks include spoofed pages, DNS query trapping and diversion, and so on.
仅涉及名称或标识符的任何机制都无法抵御在很大程度上独立于命名或标识系统的各种安全威胁和攻击。这些攻击包括伪造页面、DNS查询捕获和转移等。
The initial version of this document was created largely by extracting text from early draft versions of the Rationale document [RFC5894]. See the "Acknowledgments" and "Contributors" sections of that document.
本文件的初始版本主要是从基本原理文件[RFC5894]的早期草案版本中提取文本创建的。请参阅该文件中的“致谢”和“贡献者”两节。
Specific textual suggestions after the extraction process came from Vint Cerf, Lisa Dusseault, Bill McQuillan, Andrew Sullivan, and Ken Whistler. Other changes were made in response to more general comments, lists of concerns or specific errors from participants in the Working Group and other observers, including Lyman Chapin, James Mitchell, Subramanian Moonesamy, and Dan Winship.
提取过程后的具体文本建议来自Vint Cerf、Lisa Dusseault、Bill McQuillan、Andrew Sullivan和Ken Whistler。针对工作组参与者和其他观察员(包括莱曼·查宾、詹姆斯·米切尔、Subramanian Moonesay和丹·温希普)提出的更一般性的评论、关注事项清单或具体错误,做出了其他修改。
[ASCII] American National Standards Institute (formerly United States of America Standards Institute), "USA Code for Information Interchange", ANSI X3.4-1968, 1968. ANSI X3.4-1968 has been replaced by newer versions with slight modifications, but the 1968 version remains definitive for the Internet.
[ASCII]美国国家标准协会(前美国标准协会),“美国信息交换代码”,ANSI X3.4-1968,1968年。ANSI X3.4-1968已被稍作修改的较新版本所取代,但1968年版本仍然是互联网的最终版本。
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, November 1987.
[RFC1034]Mockapetris,P.,“域名-概念和设施”,STD 13,RFC 1034,1987年11月。
[RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, November 1987.
[RFC1035]Mockapetris,P.,“域名-实现和规范”,STD 13,RFC 1035,1987年11月。
[RFC1123] Braden, R., "Requirements for Internet Hosts - Application and Support", STD 3, RFC 1123, October 1989.
[RFC1123]Braden,R.,“互联网主机的要求-应用和支持”,STD 3,RFC 1123,1989年10月。
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。
[Unicode-UAX15] The Unicode Consortium, "Unicode Standard Annex #15: Unicode Normalization Forms, Revision 31", September 2009, <http://www.unicode.org/reports/tr15/tr15-31.html>.
[Unicode-UAX15]Unicode联盟,“Unicode标准附录15:Unicode规范化表单,第31版”,2009年9月<http://www.unicode.org/reports/tr15/tr15-31.html>.
[Unicode52] The Unicode Consortium. The Unicode Standard, Version 5.2.0, defined by: "The Unicode Standard, Version 5.2.0", (Mountain View, CA: The Unicode Consortium, 2009. ISBN 978-1-936213-00-9). <http://www.unicode.org/versions/Unicode5.2.0/>.
[Unicode52]Unicode联盟。Unicode标准,版本5.2.0,定义为:“Unicode标准,版本5.2.0”(加利福尼亚州山景城:Unicode联盟,2009年。ISBN 978-1-936213-00-9)。<http://www.unicode.org/versions/Unicode5.2.0/>.
[IDNA2008-Mapping] Resnick, P. and P. Hoffman, "Mapping Characters in Internationalized Domain Names for Applications (IDNA)", Work in Progress, April 2010.
[IDNA2008映射]Resnick,P.和P.Hoffman,“应用程序国际化域名(IDNA)中的字符映射”,正在进行的工作,2010年4月。
[RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet host table specification", RFC 952, October 1985.
[RFC0952]Harrenstien,K.,Stahl,M.和E.Feinler,“国防部互联网主机表规范”,RFC 952,1985年10月。
[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS Specification", RFC 2181, July 1997.
[RFC2181]Elz,R.和R.Bush,“DNS规范的澄清”,RFC 2181,1997年7月。
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC2616]菲尔丁,R.,盖蒂斯,J.,莫卧儿,J.,弗莱斯蒂克,H.,马斯特,L.,利奇,P.,和T.伯纳斯李,“超文本传输协议——HTTP/1.1”,RFC 2616,1999年6月。
[RFC2673] Crawford, M., "Binary Labels in the Domain Name System", RFC 2673, August 1999.
[RFC2673]克劳福德,M.,“域名系统中的二进制标签”,RFC2673,1999年8月。
[RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for specifying the location of services (DNS SRV)", RFC 2782, February 2000.
[RFC2782]Gulbrandsen,A.,Vixie,P.和L.Esibov,“用于指定服务位置(DNS SRV)的DNS RR”,RFC 2782,2000年2月。
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ("stringprep")", RFC 3454, December 2002.
[RFC3454]Hoffman,P.和M.Blanchet,“国际化弦的准备(“stringprep”)”,RFC 3454,2002年12月。
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.
[RFC3490]Faltstrom,P.,Hoffman,P.,和A.Costello,“应用程序中的域名国际化(IDNA)”,RFC 3490,2003年3月。
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[RFC3491]Hoffman,P.和M.Blanchet,“Nameprep:国际化域名(IDN)的Stringprep配置文件”,RFC 3491,2003年3月。
[RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, March 2003.
[RFC3492]Costello,A.,“Punycode:应用程序中国际化域名的Unicode引导字符串编码(IDNA)”,RFC 3492,2003年3月。
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC3986]Berners Lee,T.,Fielding,R.,和L.Masinter,“统一资源标识符(URI):通用语法”,STD 66,RFC 3986,2005年1月。
[RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and Recommendations for Internationalized Domain Names (IDNs)", RFC 4690, September 2006.
[RFC4690]Klensin,J.,Faltstrom,P.,Karp,C.,和IAB,“国际化域名(IDN)的审查和建议”,RFC 4690,2006年9月。
[RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.
[RFC5246]Dierks,T.和E.Rescorla,“传输层安全(TLS)协议版本1.2”,RFC 5246,2008年8月。
[RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, October 2008.
[RFC5321]Klensin,J.,“简单邮件传输协议”,RFC 5321,2008年10月。
[RFC5891] Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, August 2010.
[RFC5891]Klensin,J.,“应用程序中的国际化域名(IDNA):协议”,RFC 5891,2010年8月。
[RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)", RFC 5892, August 2010.
[RFC5892]Faltstrom,P.,Ed.“Unicode代码点和应用程序的国际化域名(IDNA)”,RFC 5892,2010年8月。
[RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)", RFC 5893, August 2010.
[RFC5893]Alvestrand,H.和C.Karp,“应用程序国际化域名(IDNA)的从右到左脚本”,RFC 5893,2010年8月。
[RFC5894] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale", RFC 5894, August 2010.
[RFC5894]Klensin,J.,“应用程序的国际化域名(IDNA):背景、解释和基本原理”,RFC 5894,2010年8月。
[Unicode-UTR36] The Unicode Consortium, "Unicode Technical Report #36: Unicode Security Considerations, Revision 7", July 2008, <http://www.unicode.org/reports/tr36/tr36-7.html>.
[Unicode-UTR36]Unicode联盟,“Unicode技术报告#36:Unicode安全注意事项,第7版”,2008年7月<http://www.unicode.org/reports/tr36/tr36-7.html>.
Author's Address
作者地址
John C Klensin 1770 Massachusetts Ave, Ste 322 Cambridge, MA 02140 USA
美国马萨诸塞州剑桥322号马萨诸塞大道1770号约翰·C·克伦辛邮编:02140
Phone: +1 617 245 1457 EMail: john+ietf@jck.com