Network Working Group                                         J. Klensin
Request for Comments: 4290                                 December 2005
Category: Informational
Suggested Practices for Registration of Internationalized Domain Names (IDN)
Status of This Memo
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2005).
IESG Note
This RFC is not a candidate for any level of Internet Standard. The IETF disclaims any knowledge of the fitness of this RFC for any purpose and notes that the decision to publish is not based on IETF review apart from IESG review for conflict with IETF work. The RFC Editor has chosen to publish this document at its discretion. See RFC 3932 for more information.
Abstract
This document explores the issues in the registration of internationalized domain names (IDNs). The basic IDN definition allows a very large number of possible characters in domain names, and this richness may lead to serious user confusion about similar-looking names. To avoid this confusion, the IDN registration process must impose rules that disallow some otherwise-valid name combinations. This document suggests a set of mechanisms that registries might use to define and implement such rules for a broad range of languages, including adaptation of methods developed for Chinese, Japanese, and Korean domain names.
Table of Contents
   1. Introduction
      1.1. Background
      1.2. The Nature and Status of these Recommendations
      1.3. Terminology
         1.3.1. Languages and Scripts
         1.3.2. Characters, Variants, Registrations, and Other Issues
         1.3.3. Confusion, Fraud, and Cybersquatting
      1.4. A Review of the JET Guidelines
         1.4.1. JET Model
         1.4.2. Reserved Names and Label Packages
      1.5. Languages, Scripts, and Variants
         1.5.1. Languages versus Scripts
         1.5.2. Variant Selection
      1.6. Variants are not a Universal Remedy
      1.7. Reservations and Exclusions
         1.7.1. Sequence Exclusions for Valid Characters
         1.7.2. Character Pairing Issues
      1.8. The Registration Bundle
         1.8.1. Definitions and Structure
         1.8.2. Application of the Registration Bundle
   2. Some Implications of This Approach
   3. Possible Modifications of the JET Model
   4. Conclusions and Recommendations About the General Approach
   5. A Model Table Format
   6. A Model Label Registration Procedure: "CreateBundle"
      6.1. Description of the CreateBundle Mechanism
      6.2. The "no-variants" Case
      6.3. CreateBundle and Nameprep Mapping
   7. IANA Considerations
   8. Internationalization Considerations
   9. Security Considerations
   10. Acknowledgements
   11. Informative References
The IDNA (Internationalized Domain Names in Applications) specification [RFC3490] defines the basic model for encoding non-ASCII strings in the DNS. Additional specifications [RFC3491] [RFC3492] define the mechanisms and tables needed to support IDNA. As work on these specifications neared completion, it became apparent that it would be desirable for registries to impose additional restrictions on the names that could actually be registered (e.g., see [IESG-IDN] and [ICANN-IDN]) to reduce potential confusion among characters that were similar in some way. This document explores these IDN (international domain name) registration issues and suggests a set of mechanisms that IDN registries might use.
Registration restrictions are part of a long tradition. For example, while the original DNS specifications [RFC1035] permitted any string of octets in a DNS label, they also recommended the use of a much more restricted subset. This subset was derived from the much older "hostname" rules [RFC952] and defined by the "LDH" convention (for the three permitted types of characters: letters, digits, and the hyphen). Enforcement of this restricted subset in registrations was the responsibility of the registry or domain administrator. The definition of the subset was not embedded in the DNS protocol itself, although some applications protocols, notably those concerned with electronic mail, did impose and enforce similar rules.
If there are no constraints on registration in a zone, people can register characters that increase the risk of misunderstandings, cybersquatting, and other forms of confusion. A similar situation existed even before the introduction of IDNA, as exemplified by domain names such as example.com and examp1e.com (note that the latter domain contains the digit "1" instead of the letter "l").
For non-ASCII names (so-called "internationalized domain names" or "IDNs"), the problem is more complicated. In the earlier situation that led to the LDH (hostname) rules, all protocols, hosts, and DNS zones used ASCII exclusively in practice, so the LDH restriction could reasonably be applied uniformly across the Internet. Support for IDNs introduces a very large character repertoire, different geographical and political locations, and languages that require different collections of characters. The optimal registration restrictions are no longer a global matter; they may be different in different areas and, hence, in different DNS zones.
For some human writing systems, there are characters and/or strings that have equivalent or near-equivalent usages. If a name can be registered with such a character or string, the registry might want to automatically associate all of the names that have the same meaning with the registered name. The registry might also decide whether the names that are associated with, or generated by, one registration should, as a group or individually, go into the zone or should be blocked from registration by different parties.
To date, the best-developed system for handling registration restrictions for IDNs is the JET Guidelines for Chinese, Japanese, and Korean [RFC3743], the so-called "CJK" languages. The JET Guidelines are limited to the CJK languages and, in particular, to their common script base. Those languages are also the best-known and most widely-used examples of writing systems constructed on "ideographic" or "pictographic" principles. This document explores the principles behind the JET guidelines. It then examines some of the issues that might arise in adapting them to alphabetic languages, i.e., to languages whose characters primarily represent sounds rather than meanings.
This document describes five things:
1. The general background and considerations for non-ASCII scripts in names.
2. Suggested practices for describing character variants.
3. A method for using a zone's character variants to determine which names should be associated with a registration.
4. A format for publishing a zone's table of character variants. Such tables are referred to below as "language tables" or simply "tables".
5. A model algorithm for name registration given the presence of language tables.
The document makes recommendations for consideration by registries and, where relevant, by those who coordinate them, and by those who use their services. None of the recommendations are intended to be normative. Instead, the intent of the document is to illustrate a framework for developing variations to meet the needs of particular registries and their processing of particular languages. Of course, if registries make similar decisions and utilize similar tools, costs and confusion may be reduced -- both between registries and for users and registrars who have relationships with more than one domain.
Just as the JET Guidelines contain some suggestions that may not be applicable to alphabetic scripts, some of the suggestions here, especially the more specific ones, may be applicable to some scripts and not others.
This document uses the term "language" in what may be, to many readers, an odd way. Neither this specification, nor IDNA, nor the DNS are directly concerned with natural language, but only with the characters that make up a given label. In some respects, the term "script", used in the character coding community for a collection of characters, might be more appropriate. However, different subsets of the same script may be used with different languages, and the same language may be written using different characters (or even completely different scripts) in different locations, so "script" is not precisely correct either.
Long-standing confusion has also resulted from the fact that most scripts are, informally at least, named after one of the languages written in them. "Chinese" describes both a language and a collection of characters that are also used in writing Japanese, Korean, and, at least historically, some other languages. "Latin" describes a language, the characters used to write that language, and, often, characters used to write a number of contemporary languages that are derived from or similar to those used to write the Latin language. The script used to write the Arabic language is called "Arabic", but it is also used (typically with some additions or deletions) to write a number of other languages. Situations in which a script has a clearly-defined name that is independent of the name of a language are the exception, rather than the rule; examples include Hangul, used to write Korean, Katakana and Hiragana, used to write Japanese, and a few others. Some scholars have historically used "Roman" or "Roman-derived" for the script in an attempt to distinguish between a script and the Latin language.
The term "language" is therefore used in this document in the informal sense of a written language and is defined, for this purpose, by the characters used to write it, i.e., as a language-specific subset of a script. In this context, a "language" is defined by the combination of a code (see Section 1.4.1) and an authority that has chosen to use that code and establish a character-listing for it. Authorities are normally TLD (top-level domain) registries; see Section 7 and [IANA-language-registry]. However, it is expected that TLD registries will find appropriate experts and that advice from language and script experts selected by international neutral bodies will also become part of the registration system. In addition, as discussed below in Section 7, registries may conclude that the best interests of registrants, stakeholders, and the Internet community would be served by constructing "language tables" that mix scripts and characters in ways that conform to no known language. Conventions should be developed for such registrations that do not misleadingly reflect specific language codes.
1. Characters in this document are specified by their Unicode codepoints in U+xxxx format, by their official names, or both.
2. The following terms are used in this document.
* String
A "string" is a sequence of one or more characters.
* Base Character
This document discusses characters that may have equivalent or near-equivalent characters or strings. A "base character" is a character that has zero or more equivalents. In the JET Guidelines, base characters are referred to as "valid characters". In a table with variants, as described in Section 5, the base characters occupy the first column. Normally (and always, if the recommendation of Section 6.3 is adopted), the base characters will be the characters that appear in registration requests from registrants; any other character will invalidate the registration attempt.
* Native Script
Native script is the form in which the relevant string would normally be represented. For example, it might use Lower Slobbovian characters and the glyphs normally used to write them. It would not be punycode as a presentation form.
* Variant Characters/Strings
The "variant(s)" are character(s) and/or string(s) that are treated as equivalent to the base character. Note that these might not be exactly equivalent characters; a particular original character may be a base character with a mapping to a particular variant character, but that variant character may not have a mapping to the original base character. Indeed, the variant character may not appear in the base character list, and hence may not be valid for use in a registration. Usually, characters or strings to be designated as variants are considered either equivalent or sufficiently similar (by some registry-specific definition) that confusion between them and the base character might occur.
* Base Registration
The "base registration" is the single name that the registrant requested from the registry. The JET Guidelines use the term "label string" for this concept.
* Registered, Activated
A label (or "name") is described as "registered" if it is actually entered into a domain (i.e., into a zone file) by the registry, so that it can be accessed and resolved using standard DNS tools. The JET Guidelines describe a "registered" label as "activated". However, some domains use a slightly different registration logic in which a name can be registered with the registrar (if one is involved) and with the registry, but not actually entered into the zone file until an additional activation or delegation step occurs. This document does not make that distinction, but is compatible with it.
As specified in the IDNA Standard, the name actually placed in the zone file is always the internal ("punycode") form. There is no provision for actually entering any other form of an IDN into the DNS. It remains controversial, with different registrars and registries having adopted different policies, as to whether the registration, as submitted by the registrant, is in the form of:
o the native-script name, either in UTF-8 or in some coding specified by the registrar, or
o the internal-form ("punycode") name, or
o both forms of the name together, so that the registrar and registry can verify the intended translation.
If any of the approaches defined in this document is used, it is almost certain to be necessary that the native-script form of the requested string be available to the registry.
* Registration Bundle
A "registration bundle" is the set of all labels that come from expanding the base characters for a single name into their variants. The presence of a label in a registration bundle does not imply that it is registered. In the JET Guidelines, a registration bundle is called an "IDN Package".
* Reserved Label
A "reserved label" is a label in a registration bundle that is not actually registered.
* Registry
A "registry" is the administrative authority for a DNS zone. The registry is the body that enforces, and typically makes, policies that are used in a particular zone in the DNS.
* Coded Character Set
A "Coded Character Set" (CCS) is a list of characters and the code positions assigned to them. ASCII and Unicode are CCSs.
* Language
A "language" is something spoken by humans, independent of how it is written or coded. ISO Standard 639 and IETF BCP 47 (RFC 3066) [RFC3066] list and define codes for identifying languages.
* Script
A "script" is a collection of characters (glyphs, independent of coding) that are used together, typically to represent one or more languages. Note that the script for one language may heavily overlap the script for another. This does not imply that they have identical scripts.
* Charset
"Charset" is an IETF-invented term to describe, more or less, the combination of a script, a CCS that encodes that script, and rules for serializing encoded bytes that are stored on a computer or transmitted over the network.
The last four of these definitions are redundant with, but deliberately somewhat less precise than, the definitions in [RFC3536], which also provides sources. The two sets of definitions are intended to be consistent.
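The "Registered, Activated" entry above notes that only the internal ("punycode") form is placed in the zone file, and that some registries ask for both forms so that the translation can be verified. The following is a minimal sketch of that check, assuming Python's built-in "idna" codec (which implements the IDNA specification of [RFC3490], including Nameprep). The function names are hypothetical and are not taken from this document or from any registry system.

```python
def to_internal_form(native_label: str) -> str:
    """Return the ASCII ("punycode") form of one native-script label."""
    # The "idna" codec applies Nameprep and then the Punycode encoding.
    return native_label.encode("idna").decode("ascii")


def to_native_form(internal_label: str) -> str:
    """Return the native-script form of one ASCII-compatible label."""
    return internal_label.encode("ascii").decode("idna")


def forms_agree(native_label: str, internal_label: str) -> bool:
    """Check that a submitted native/internal pair names the same label."""
    return to_internal_form(native_label) == internal_label.lower()
```

A registrar that accepts both forms could call forms_agree() before passing the request on, and reject the request if the two forms do not match. For instance, the native label "b\u00fccher" corresponds to the internal form "xn--bcher-kva".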
The term "confusion" is used very generically in this document to cover the entire range from accidental user misperception of the relationship between characters with some characteristic in common (typically appearance, sound, or meaning) to cybersquatting and (other) deliberately fraudulent attempts to exploit those relationships based on the nature of the characters.
In the JET Guidelines model, a prospective registrant approaches the registry for a zone (perhaps through an intermediate registrar) with a candidate base registration -- a proposed name to be registered -- and a list of languages in which that name is to be interpreted. The languages are defined according to the fairly high-resolution coding of [RFC3066] or, if the registry considers it more appropriate, a coding based on scripts such as those in [LTRU-Registry]. In this way, Chinese as used on the mainland of the People's Republic of China ("zh-cn") can, at registry option, consist of a somewhat different list of characters (code points) and be represented by a separate table compared to Chinese as used in Taiwan ("zh-tw").
The design of the JET Guidelines took one important constraint as a basis: IDNA was treated as a firm standard. A procedure that modified some portion of the IDNA functions, or was a variant on them, was considered a violation of those standards and should not be encouraged (or, probably, even permitted).
Each registry is expected to construct (or obtain) a table for each language it considers relevant and appropriate. These tables list, for the particular zone, the characters permitted for that language. If a character does not appear as a base character (called a "valid code point" in the JET document) in that table, then a name containing it cannot be registered. If multiple languages are listed for the registration, then the character must appear in the tables for each of those languages.
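The table check described above can be sketched as follows. This is only an illustration: the tables, the language codes chosen, and the function names are hypothetical, and a real registry would load its own published tables in place of the small sets used here.

```python
import string

# Letters, digits, and hyphen: the traditional "LDH" repertoire.
LDH = set(string.ascii_lowercase + string.digits + "-")

# Hypothetical per-zone tables: language code -> permitted base characters.
TABLES = {
    "en": LDH,
    "fi": LDH | set("\u00e5\u00e4\u00f6"),  # adds a-ring, a-diaeresis, o-diaeresis
    "sv": LDH | set("\u00e5\u00e4\u00f6"),
}


def invalid_characters(label, languages, tables=TABLES):
    """Map each requested language to the label characters its table lacks."""
    problems = {}
    for lang in languages:
        missing = sorted(set(label) - tables[lang])
        if missing:
            problems[lang] = missing
    return problems


def may_register(label, languages, tables=TABLES):
    """A label passes only if every character is in every listed table."""
    return bool(languages) and not invalid_characters(label, languages, tables)
```

With these assumed tables, a label containing U+00E4 passes when only "fi" is listed but fails when "en" is listed as well, because the character must appear in the table for each language named in the request.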
The tables may also contain columns that specify alternate or variant forms of the valid character. If these variants appear, they are used to synthesize labels that are alternatives to the original one. These labels are all reserved and can be registered or "activated" (placed into the DNS) only by the action or request of the original registrant; some (the "preferred variant labels") are typically registered automatically. The zone is expected to establish appropriate policies for situations in which the variant forms of one label conflict with already-reserved or already-registered labels.
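One plausible way to synthesize the alternative labels is to take every combination of each base character with its variants. The sketch below assumes that reading; the variant table is invented for illustration, and the function names are hypothetical rather than part of the JET Guidelines or of this document.

```python
from itertools import product

# Hypothetical variant table: base character -> list of variant strings.
VARIANTS = {
    "a": ["\u00e0", "\u00e1"],
    "e": ["\u00e9"],
    "x": [],
}


def synthesize_labels(base_label, variants=VARIANTS):
    """Return the requested label plus every label built from its variants."""
    choices = []
    for ch in base_label:
        if ch not in variants:
            # Characters outside the base column invalidate the request.
            raise ValueError(f"{ch!r} is not a base character in this table")
        choices.append([ch, *variants[ch]])
    return {"".join(combo) for combo in product(*choices)}


def reserved_labels(base_label, activated=(), variants=VARIANTS):
    """Labels that are held for the registrant but not placed in the DNS."""
    return synthesize_labels(base_label, variants) - {base_label, *activated}
```

Because the number of combinations is the product of the per-character choices, long labels with many variants can produce very large sets. A registry adopting this kind of expansion would probably need a limit or a narrower rule for which combinations are generated.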
Most of these concepts were introduced because of concerns about specific issues with CJK characters, beginning from the requirement that the use of Simplified Chinese by some registrants and Traditional Chinese by others not be permitted to create confusion or opportunities for fraud. While they may be applicable to registry tables constructed for alphabetic scripts, the translation should be done with care, since many analogies are not exact.
Some of the important issues are discussed in the sections that follow, especially Section 3. The JET model may be considered as a variation on, and inspiration for, the model and method presented by the rest of this document, although the JET model has been completely developed only for CJK characters. Other languages or scripts, especially alphabetic ones, may require other variations.
A basic assumption of the JET model is that, if the evolution of specific characters or the properties of Unicode [Unicode] [Unicode32] or IDNA cause two strings to appear similar enough to cause confusion, then both should be registered by the same party or one of them should become unregisterable. The definition of "appear similar enough" will differ for different cultures and circumstance, and hence DNS zones, but the principle is fairly general. In the JET model, all of the variant strings are identified, some are registered into the DNS automatically, and others are simply reserved and can be registered, if at all, only by the original registrant. Other zones might find other policies appropriate. For example, a zone might conclude that having similar strings registered in the DNS was undesirable. If so, the list of variant strings would be used only to build a list of names that would be reserved and prohibited from being registered.
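The two policies contrasted above can be modeled with a small in-memory zone. This is a sketch under stated assumptions: the zone is a plain dictionary, the variant labels are supplied by the caller, and the names register() and activate() are hypothetical. Passing an empty "preferred" collection models a zone that only reserves variants; a zone that never calls activate() for variants models the policy that blocks them entirely.

```python
def register(zone, registrant, base, variant_labels, preferred=()):
    """Register `base`, activating preferred variants and reserving the rest.

    `zone` maps label -> (owner, state), where state is "activated"
    or "reserved".
    """
    labels = set(variant_labels) | {base}
    conflicts = sorted(
        label for label in labels
        if label in zone and zone[label][0] != registrant
    )
    if conflicts:
        # Similar strings must belong to one party, so the request fails.
        raise ValueError(f"conflicts with existing labels: {conflicts}")
    for label in labels:
        wanted = "activated" if label == base or label in preferred else "reserved"
        # Never downgrade a label this registrant has already activated.
        if wanted == "activated" or label not in zone:
            zone[label] = (registrant, wanted)


def activate(zone, registrant, label):
    """Move a reserved label into the DNS; only its holder may ask."""
    owner, _state = zone.get(label, (None, None))
    if owner != registrant:
        raise PermissionError(f"{label!r} is not reserved for {registrant!r}")
    zone[label] = (registrant, "activated")
```

The conflict test treats reserved and activated labels alike, which reflects the assumption that either both strings are held by the same party or the second one is unregisterable.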
Conversations about scripts -- collections of characters associated with particular languages -- are common when discussing character sets and codes. However, the boundaries between one script and another are not well-defined. The Unicode Standard ([Unicode], [Unicode32]), for example, does not define script boundaries at all, even though it is structured in terms of usually-related blocks of characters. The issue is complicated by the common origin of most alphabetic scripts in use in the world today (see, for example, [Drucker] or the more scholarly [Daniels]).
Because of that history, certain characters (or, more precisely, symbols representing characters) appear in the scripts associated with multiple languages, sometimes with very different sounds or meanings. This differs from the CJK situation in which, if a character appears in more than one of the relevant languages, it will usually have the same interpretation in each one. For the subset of characters that actually are ideographs or pictographs, pronunciation is expected to vary widely while meaning is preserved. At least in part because of that similarity of meaning, it made sense in the JET case to permit a registration to specify multiple languages, to verify that the characters in the label string (the requested "Base registration") were valid for each, and then to generate variant labels using each language in turn. For many alphabetic languages, it may be more sensible to prohibit the label string submitted for registration from being associated with more than one language. Indeed, "one label, one language" has been suggested as an important barrier against common sources of "look-alike" confusion. For example, the imposition of that rule in a zone would prevent the insertion of a few Greek or Cyrillic characters with shapes identical to the Latin ones into what was otherwise a Latin-based string. For a particular table, the list of base characters may be thought of as the script associated with the relevant language, with the understanding that the table design does not prevent the same character from appearing in the tables for multiple languages.
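The look-alike case mentioned above (a few Greek or Cyrillic characters placed in an otherwise Latin string) can be detected with a rough script check. The sketch below is an approximation only: Python's standard unicodedata module does not expose the Unicode Script property, so the first word of each character name is used as a stand-in, and digits and the hyphen are treated as common to all scripts. A registry applying "one label, one language" would more likely test the label against a single language table instead. The function names are hypothetical.

```python
import unicodedata

# Name prefixes treated as shared by all scripts (digits and the hyphen).
COMMON = {"DIGIT", "HYPHEN-MINUS"}


def scripts_in_label(label):
    """Approximate the set of scripts used in a label from character names."""
    scripts = set()
    for ch in label:
        first_word = unicodedata.name(ch, "UNKNOWN").split()[0]
        if first_word not in COMMON:
            scripts.add(first_word)
    return scripts


def is_single_script(label):
    """True if the label draws on at most one (approximate) script."""
    return len(scripts_in_label(label)) <= 1
```

For instance, a label in which U+0430 (CYRILLIC SMALL LETTER A) replaces the Latin "a" in "example" is reported as mixing LATIN and CYRILLIC, even though the two strings may look identical when displayed.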
Indeed, this notion of a script that is local and specifically identified can be turned around: so-called "language tables" are associated with languages only insofar as thinking about the character structure and word forms associated with a given language helps to inform the construction of the table. A country like Finland, for example, might select among:
事实上,这种本地脚本和特定脚本的概念是可以改变的:所谓的“语言表”仅在考虑与给定语言相关联的字符结构和单词形式有助于通知表的构造的情况下才与语言相关。例如,像芬兰这样的国家可能会选择:
o One table each for Finnish, Swedish, and English characters and conventions, permitting a string to be registered in one, two, or all three languages. However, a three-language registration would necessarily prohibit any characters that did not appear in all three languages, since the label would make little sense otherwise.
o 芬兰语、瑞典语和英语字符和约定各一个表,允许字符串以一种、两种或全部三种语言注册。然而,三种语言的注册必然会禁止未在所有三种语言中都出现的任何字符,因为否则标签就毫无意义。
o One table each, but with a "one label, one language" rule for the zone.
o 每个表一个,但分区有“一个标签,一种语言”规则。
o A combined table based on the observation that all three writing systems were based on Roman characters and that the possibilities for confusion of interest to the registry would not be reduced by "language" differentiation. This option raises an interesting issue about language labeling as described in Section 1.4.1; see the discussion in Section 7 below.
o 一个综合表格是根据以下观察结果编制的,即所有三种书写系统均以罗马字符为基础,而且书记官处感兴趣的混淆可能性不会因“语言”差异而减少。如第1.4.1节所述,该选项提出了一个有关语言标签的有趣问题;见下文第7节的讨论。
Regardless of what decisions were made about those languages and scripts, they might have a separate table for registration of labels containing Cyrillic characters. That table might contain some Roman-derived characters (either as base characters or as variants), just as some CJK tables do. See also Section 2, below.
不管对这些语言和脚本做出了什么决定,它们都可能有一个单独的表来注册包含西里尔字母的标签。该表可能包含一些罗马派生字符(作为基本字符或变体),就像某些CJK表一样。另见下文第2节。
Tables that present multiple languages, as described above, have introduced confusion and discomfort among those who have failed to understand these definitions. The consequence of these definitions is that use of a language or script code in a registration is a mnemonic, rather than a normative statement about the language or script itself. When that confusion is likely to occur, it is appropriate to simply use the registry identifier and a sequence number to identify the registration.
如上所述,呈现多种语言的表格给那些未能理解这些定义的人带来了困惑和不适。这些定义的结果是,在注册中使用语言或脚本代码是一种助记符,而不是关于语言或脚本本身的规范性陈述。当可能出现这种混淆时,只需使用注册表标识符和序列号即可识别注册。
As the JET Guidelines stress, no tables or systems of this type -- even if identified with a language as a means of defining or describing the table -- can assure linguistic or even syntactic correctness of labels with regard to that language. That assurance may not be possible without human intervention or at least dictionary lookups of complete proposed labels. It may even not be desirable to attempt that level of correctness (see Section 2).
正如JET指南所强调的那样,这种类型的表或系统——即使用一种语言作为定义或描述表的手段——都不能保证与该语言相关的标签在语言上甚至句法上的正确性。如果没有人为干预或至少对完整的建议标签进行字典查找,则可能无法实现这一保证。甚至可能不希望尝试这种级别的正确性(参见第2节)。
Of course, if any language-based tests or constraints, including "one label, one language", are to be applied to limit the associated sources of confusion, each zone must have a table for each language in which it expects to accept registrations. The notion of a single combined table for the zone is, in the general case, simply unworkable. One could use a single table for the zone if the intent were to impose only minimal restrictions, e.g., to force alphabetic and numeric characters only, excluding symbols and punctuation. That type of restriction might be useful in eliminating some problems, such as those of unreadable labels, but it would be unlikely to be very helpful with, e.g., confusion caused by similar-looking characters.
当然,如果要应用任何基于语言的测试或限制,包括“一个标签,一种语言”,以限制相关的混淆源,则每个区域必须为其预期接受注册的每种语言提供一个表。在一般情况下,分区的单一组合表的概念根本不可行。如果目的只是施加最低限度的限制,例如,只强制使用字母和数字字符,不包括符号和标点符号,则可以为区域使用单个表格。这种类型的限制可能有助于消除某些问题,例如不可读标签的问题,但对于由外观相似的字符引起的混淆等问题,则不太可能有很大帮助。
The area of character variants is rife with difficulties (and perhaps opportunities). There is no universal agreement about which base characters have variants, or if they do, what those variants are. For example, in some regions of the world and in some languages, LATIN SMALL LETTER O WITH DIAERESIS (U+00F6) and LATIN SMALL LETTER O WITH STROKE (U+00F8) are variants of each other, while in other regions, most people would think that LATIN SMALL LETTER O WITH STROKE has no variants. In some cases, the list of variants is difficult to enumerate. For example, it required several years for the Chinese language community to create variant tables for use with IDNA, and it remains, at the time of this writing, questionable how widely those tables will be accepted among users of Chinese from areas of the world other than those represented by the groups that created them.
角色变体领域充满了困难(也许还有机会)。关于哪些基本字符有变体,或者如果有变体,这些变体是什么,目前还没有普遍的共识。例如,在世界的一些地区和一些语言中,带分音符的拉丁小写字母O(U+00F6)和带笔划的拉丁小写字母O(U+00F8)是彼此的变体,而在其他地区,大多数人会认为带笔划的拉丁小写字母O没有变体。在某些情况下,很难列举变体列表。例如,汉语社区需要几年时间才能创建与IDNA一起使用的变体表格,而在撰写本文时,这些表格在世界其他地区的汉语用户(而不是创建它们的团体所代表的用户)中的接受程度仍存在疑问。
Thus, the first thing a registry should ask is whether or not any of the characters that they want to permit to be used have variants. If not, the registry's work is much simpler. This is not to say that a registry should ignore variants if they exist: adding variants after a registry has started to take registrations will be nearly as difficult administratively as removing characters from the list of acceptable characters. That is, if a registry later decides that two characters are variants of each other, and there are actively-used names in the zones that differ only on the new variants, the registry might have to transfer ownership of one of the names to a different owner, using some process that is certain to be controversial.
因此,注册表应该询问的第一件事是它们希望允许使用的任何字符是否有变体。如果没有,注册处的工作就简单多了。这并不是说注册中心应该忽略存在的变体:在注册中心开始进行注册后添加变体在管理上几乎与从可接受字符列表中删除字符一样困难。也就是说,如果注册表后来决定两个字符是彼此的变体,并且区域中存在仅在新变体上不同的活动使用名称,则注册表可能必须使用某些肯定会引起争议的过程,将其中一个名称的所有权转移给其他所有者。
This situation is likely to be much easier for areas and zones that use characters that previously did not occur in the DNS at all than it will be for zones in which non-English labels have been registered in ASCII characters for some time, presumably because the language of interest uses additional "Latin" characters with some conventions when only ASCII is available. In the former case, the rules and conventions can be established before any registrations occur. In the latter, there may be conflicts or opportunities for confusion between existing registrations and now-permitted Roman-based characters that do not appear in ASCII. For example, a domain name might exist today that uses the name of a city in Canada spelled as "Montreal". If the zone in which it occurs changes its rules to permit the use of the character LATIN SMALL LETTER E WITH ACUTE (U+00E9), does the name of the city, spelled (correctly) using that character, conflict with the existing domain name registration?
对于使用DNS中以前根本没有出现的字符的区域和区域,这种情况可能比使用ASCII字符注册非英语标签一段时间的区域要容易得多,这可能是因为感兴趣的语言使用了额外的“拉丁语”只有ASCII可用时具有某些约定的字符。在前一种情况下,规则和惯例可以在任何注册发生之前建立。在后一种情况下,现有注册和现在允许的不出现在ASCII中的基于罗马的字符之间可能存在冲突或混淆的机会。例如,今天可能存在一个域名,它使用加拿大一个城市的名称拼写为“Montreal”。如果发生这种情况的区域改变了规则,允许使用带锐音符(U+00E9)的拉丁小写字母E,那么使用该字符拼写(正确)的城市名称是否与现有域名注册冲突?
Certainly, if both are permitted, and permitted to be registered by separate parties, there are many opportunities for confusion.
当然,如果两者都被允许,并且被允许由不同的当事人注册,那么就有很多混淆的机会。
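One way a registry might look for such collisions before admitting new characters is sketched below. The folding rule (drop combining marks after canonical decomposition) and the function names are assumptions chosen for illustration; they are not a policy recommended by this document, and a registry would need to decide which equivalences matter for its users.

```python
import unicodedata


def strip_diacritics(label):
    """Decompose the label (NFD) and drop combining marks such as the acute accent."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def possible_conflicts(new_label, existing_labels):
    """Return existing labels that match new_label once diacritics and case are ignored."""
    key = strip_diacritics(new_label).lower()
    return [label for label in existing_labels
            if strip_diacritics(label).lower() == key]
```

Note that this heuristic only catches characters that have a canonical decomposition. A character such as LATIN SMALL LETTER O WITH STROKE (U+00F8) does not decompose and would need an explicit table entry.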
Of course, zone managers should inform all current registrants when the registration policy for the zone changes. This includes the times when IDN characters are first allowed in the zone, when additional characters are permitted, and when any change occurs in the character variant tables.
当然,当区域的注册政策发生变化时,区域经理应通知所有当前注册人。这包括首次允许在区域中使用IDN字符的时间、允许使用其他字符的时间以及字符变体表中发生任何更改的时间。
Many languages contain two variants for a character, one of which is strongly preferred. A registry might restrict the base registration to the preferred form, or it might allow any form for the base registration. If the variant tables are created carefully, the resulting bundles will be the same, but some registries will give special status to the base registration such as its appearance in "Whois" databases.
许多语言包含字符的两种变体,其中一种是首选的。注册表可能会将基本注册限制为首选表单,也可能允许任何表单进行基本注册。如果仔细创建变量表,则生成的bundle将是相同的,但是一些注册中心将为基本注册提供特殊状态,例如它在“Whois”数据库中的外观。
It is worth stressing that there are many obvious opportunities for confusion that variant systems, by virtue of being based on processing of individual characters, cannot address. For example, if a language can be written with more than one script, or transliterations of the language into another script are common, variant models are insufficient to prevent conflicting registration of the related forms. Avoiding those types of problems would require different mechanisms, perhaps based on phonetic or natural language processing techniques for the entire proposed base registration.
值得强调的是,有许多明显的混淆机会,而变体系统由于基于单个字符的处理而无法解决。例如,如果一种语言可以使用多个脚本编写,或者将该语言翻译成另一个脚本是常见的,那么变体模型不足以防止相关表单的注册冲突。避免这些类型的问题需要不同的机制,可能是基于整个拟议基础注册的语音或自然语言处理技术。
The JET Guidelines are based on processing only single characters. Pairs or longer sequences of characters can, at the option of the registry, be handled through what the Guidelines describe as "additional processing". These registry-specific string processing procedures are specifically permitted by the guidelines to supplement the per-character processing that generates the variants.
JET指南仅基于处理单个字符。根据注册表的选择,成对或更长的字符序列可以通过指南描述为“附加处理”的方式进行处理。指南特别允许这些特定于注册表的字符串处理过程,以补充生成变体的每字符处理。
A different zone with different needs could use a modified version of the table structure, or different types of additional processing, to prohibit particular sequences of characters by marking them as invalid, and to accept characters by marking them as valid. Other modifications or extensions might be designed to prevent certain letters from appearing at the beginning or end of labels. The use of regular expressions in the "valid characters" column might be one way to implement these types of restrictions, but there has been no experience so far with that approach.
具有不同需求的不同区域可以使用表结构的修改版本或不同类型的附加处理,通过将特定字符序列标记为无效来禁止它们,并通过将它们标记为有效来接受字符。可能会设计其他修改或扩展,以防止某些字母出现在标签的开头或结尾。在“有效字符”列中使用正则表达式可能是实施这些类型的限制的一种方法,但迄今为止还没有这种方法的经验。
In particular, in some scripts derived from Roman characters, sequences that have historically been typographically represented by single "ligature" or "digraph" characters may also be represented by the separate characters (e.g., "ae" for U+00E6 or "ij" for U+0133). If it is desired to either prohibit these, or to treat them as variants, some extensions to the single-character JET model may be needed. Some careful thinking about IDNA (especially nameprep) may also be needed, since some of these combinations are excluded there.
特别是,在一些源自罗马字符的脚本中,历史上由单个“连字”或“二合字母”字符印刷表示的序列也可以由单独的字符表示(例如,U+00E6的“ae”或U+0133的“ij”)。如果希望禁止这些,或将其视为变体,则可能需要对单字符JET模型进行一些扩展。可能还需要仔细考虑IDNA(特别是nameprep),因为其中一些组合在那里被排除在外。
Some character pairings -- the use of a character form (glyph) in one language and a different form with the same properties in a related one -- closely approximate the issues with mapping between Traditional and Simplified Chinese, although the history is different. For example, it might be useful to have "o" with a stroke (U+00F8) as a variant for "o" with diaeresis above it (U+00F6) (and the equivalent upper-case pair) in a Swedish table, and vice versa in a Norwegian one, or to prohibit one of these characters entirely in each table. In a German table, U+00F8 would presumably be prohibited, while U+00F6 might have "oe" as a variant. Obviously, if the relevant language of registration is unknown, this type of variant matching cannot be applied in any sensible way.
一些字符配对——在一种语言中使用字符形式(字形),在相关语言中使用具有相同属性的不同形式——非常接近繁体中文和简体中文之间的映射问题,尽管历史不同。例如,在瑞典表中,将带有笔划(U+00F8)的“o”作为“o”的变体(U+00F6)(以及等效的大写字母对),在挪威表中则可能有用,或者在每个表中完全禁止这些字符中的一个。在德国表格中,U+00F8可能被禁止,而U+00F6可能有“oe”作为变体。显然,如果相关的注册语言未知,则无法以任何合理的方式应用这种类型的变体匹配。
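The language dependence of these pairings can be shown with a small sketch. The three tables below contain only the characters named in the paragraph above plus a plain "o"; the language tags, the table layout, and the function name are assumptions for illustration.

```python
# Hypothetical tables: language tag -> {base character: list of variant strings}.
# A character that is absent from a table is prohibited for that language.
VARIANT_TABLES = {
    "sv": {"o": [], "\u00f6": ["\u00f8"]},  # Swedish: o-stroke is a variant of o-diaeresis
    "no": {"o": [], "\u00f8": ["\u00f6"]},  # Norwegian: the reverse pairing
    "de": {"o": [], "\u00f6": ["oe"]},      # German: "oe" as a variant, o-stroke prohibited
}


def variants_for(char, language, tables=VARIANT_TABLES):
    """Return the variant strings for char, or None if char is not a base character.

    Raises LookupError when the language is unknown, since variant matching
    cannot be applied without knowing the language of registration.
    """
    if language not in tables:
        raise LookupError(f"no table for language {language!r}")
    return tables[language].get(char)
```

The same character therefore yields a different answer depending on the table consulted: U+00F6 has the variant U+00F8 in the assumed Swedish table, the variant string "oe" in the German one, and is not a base character at all in the Norwegian one.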
As one of its critical innovations, the JET model defines an "IDN package", known in this document as a "registration bundle", which consists of the primary registered string (which is used as the name of the bundle), the information about the language table(s) used, the variant labels for that string, and indications of which of those labels are registered in the relevant zone file ("activated" in the JET terminology). Registration bundles are also atomic -- one can not add or remove variant labels from one without unregistering the entire package. A label exists in only one registration bundle at a time; if a new label is registered that would generate a variant that matches one that appears in an existing package, that variant simply is not included in the second package. A subsequent de-registration of the first package does not cause the variant to be added to the second. While it might be possible to change this in other models, the JET conclusion was that other options would be far too complex to implement and operate and would cause many new types of name conflicts.
作为其关键创新之一,JET模型定义了一个“IDN包”,在本文档中称为“注册包”,该包由主要注册字符串(用作包的名称)、所用语言表的信息、该字符串的变体标签、,以及在相关区域文件中登记了哪些标签的指示(在JET术语中为“激活”)。注册包也是原子的——如果不注销整个包,就不能添加或删除变体标签。标签一次只存在于一个注册包中;如果注册了一个新标签,该标签将生成一个与现有包中出现的标签相匹配的变体,则该变体不会包含在第二个包中。第一个包的后续注销不会导致将变体添加到第二个包中。虽然在其他模型中可能会改变这一点,但JET的结论是,其他选项将过于复杂,无法实施和操作,并将导致许多新类型的名称冲突。
A registry has three options for handling the case where the registration bundle contains more than one label. The policy options are:
注册表有三个选项用于处理注册包包含多个标签的情况。政策选择包括:
o Register and resolve all labels in the zone, making the zone information identical to that of the registered labels. This option will allow end users to find names with variants more easily, but will result in larger zone files. For some language tables, the zone file could become so large that it could negatively affect the ability of the registry to perform name resolution. If the base registration contains several characters that have equivalents, the owner could end up having to take care of large numbers of zones. For instance, if DIGIT ONE is a variant of LATIN SMALL LETTER L, the owner of the domain name all-lollypops.example.com will have to manage 32 zones. If the intent is to keep the contents of those zones identical, the owner may then face a significant administrative problem. If other concerns dictate short times to live and absolute consistency of DNS responses, the challenges may be nearly impossible.
o 注册并解析分区中的所有标签,使分区信息与注册标签的信息相同。此选项将允许最终用户更轻松地查找带有变体的名称,但将导致更大的区域文件。对于某些语言表,区域文件可能会变得太大,从而对注册表执行名称解析的能力产生负面影响。如果基本注册包含多个具有等效项的字符,则所有者可能最终不得不处理大量区域。例如,如果数字1是拉丁小写字母L的变体,则域名all-lollypops.example.com的所有者必须管理32个区域。如果目的是保持这些区域的内容相同,则所有者可能会面临重大的管理问题。如果其他问题要求DNS响应的生存时间短且绝对一致,那么挑战几乎是不可能的。
o Block all labels other than the registered label so they cannot be registered in the future. This option does not increase the size of the zone file and provides maximum safety against false positives, but it may cause end users to not be able to find names with variants that they would expect. If the base registration contains characters that have equivalents, Internet users who do not know what base characters were used in the registration will not know what character to type in to get a DNS response. For instance, if DIGIT ONE is a variant of LATIN SMALL LETTER L, and LATIN SMALL LETTER L is a variant of DIGIT ONE, the user who sees "pale.example.com" will not know whether to type a "1" or a "l" after the "pa" in the first label.
o 阻止除已注册标签之外的所有标签,以便以后无法注册。此选项不会增加区域文件的大小,并提供最大程度的防误报安全性,但它可能会导致最终用户无法找到具有他们期望的变体的名称。如果基本注册包含具有等效项的字符,则不知道注册中使用了哪些基本字符的Internet用户将不知道键入哪个字符以获得DNS响应。例如,如果数字1是拉丁小写字母L的变体,而拉丁小写字母L是数字1的变体,则看到“pale.example.com”的用户将不知道在第一个标签的“pa”后面键入“1”还是“L”。
o Resolve some labels and block some other labels. This option is likely to cause the most confusion with users because including some variants will cause a name to be found, but using other variants will cause the name to be not found. For example, even if people understood that DIGIT ONE and LATIN SMALL LETTER L were variants, a typical DNS user wouldn't know which character to type because they wouldn't know whether this pair were used to register or block the labels. However, this option can be used to balance the desires of the name owner (that every possible attempt to enter their name will work) with the desires of the zone administrator (to make the zone more manageable and possibly to be compensated for greater amounts of work needed for a single registration). For many circumstances, it may be the most attractive option.
o 解析一些标签并阻止其他一些标签。此选项可能会导致与用户的最大混淆,因为包含某些变体会导致找到名称,但使用其他变体会导致找不到名称。例如,即使人们理解数字1和拉丁小写字母L是变体,典型的DNS用户也不知道键入哪个字符,因为他们不知道这对字符是用来注册还是阻止标签。但是,此选项可用于平衡名称所有者的愿望(即每一次可能的输入其名称的尝试都会起作用)与区域管理员的愿望(使区域更易于管理,并可能因单次注册所需的更大工作量而获得补偿)。在许多情况下,这可能是最有吸引力的选择。
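The figure of 32 zones in the first option follows from the five occurrences of LATIN SMALL LETTER L in "all-lollypops", each of which may independently be replaced by DIGIT ONE (2^5 = 32). The sketch below enumerates those labels and then partitions them according to the third option. The function names and the choice of which variant to activate are assumptions for illustration.

```python
from itertools import product


def expand_variants(label, variants):
    """Return every label obtained by substituting variants character by character.

    variants maps a character to a list of variant strings; the base label is
    always the first element of the result.
    """
    choices = [[ch] + variants.get(ch, []) for ch in label]
    return ["".join(parts) for parts in product(*choices)]


def split_bundle(labels, base, activate=()):
    """Partition labels into those placed in the zone and those that are only blocked."""
    active = [label for label in labels if label == base or label in activate]
    blocked = [label for label in labels if label not in active]
    return active, blocked


labels = expand_variants("all-lollypops", {"l": ["1"]})
print(len(labels))  # 32, because the label contains five "l" characters
```

With "resolve all", every one of the 32 labels would appear in the zone. With "block all", only the base label would. With the mixed policy, `split_bundle` keeps the base label plus whichever variants the registrant chose to activate, and the rest remain reserved.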
In all cases, at least the registered label should appear in the zone. It would be almost impossible to describe to name owners why the name that they asked for is not in the zone, but some other name that they now control is. By implication, if the requested label is already registered, the entire registration request must be rejected.
在所有情况下,区域中至少应显示注册标签。几乎不可能向业主描述为什么他们要求的名字不在区域内,而他们现在控制的其他名字在区域内。因此,如果请求的标签已注册,则必须拒绝整个注册请求。
Historically, DNS labels were considered to be arbitrary identifier strings, without any inherent meaning. Even in ASCII, there was no requirement that labels form words. Labels that could not possibly represent words in any Romance or Germanic language (the languages that have been written in "Latin" scripts since medieval times or earlier) have actually been quite common. In general, in those languages, words contain at least one vowel and do not have embedded numbers. As a result, a string such as "bc345df" cannot possibly be a "word" in these languages. More generally, the more one moves toward "language"-based registry restrictions, the less it is going to be possible to construct labels out of fanciful strings. While fanciful strings are terrible candidates for "words", they may make very good identifiers. To take a trivial example using only ASCII characters, "rtr32w", "rtr32x", and "rtr32z" might be very good DNS labels for a particular zone and application. However, given the embedded digits and lack of vowels, they, like the "bc345df" example given above, would fail even the most superficial of tests for valid English (or German or French (etc.)) word forms.
历史上,DNS标签被认为是任意标识符字符串,没有任何固有的含义。即使在ASCII中,也没有要求标签形成单词。不可能用任何罗曼史或日耳曼语(中世纪或更早时期就用“拉丁”文字书写的语言)来表示单词的标签实际上已经相当普遍。一般来说,在这些语言中,单词至少包含一个元音,并且没有嵌入数字。因此,像“bc345df”这样的字符串在这些语言中不可能是“单词”。更一般地说,人们越倾向于基于“语言”的注册表限制,就越不可能用奇异的字符串构造标签。虽然奇幻的字符串是“单词”的可怕候选者,但它们可能是非常好的标识符。举一个仅使用ASCII字符的简单示例,“rtr32w”、“rtr32x”和“rtr32z”对于特定区域和应用程序可能是非常好的DNS标签。然而,考虑到嵌入的数字和缺少元音,它们就像上面给出的“bc345df”示例一样,即使是最肤浅的有效英语(或德语或法语(等))词形测试也会失败。
It is worth noting that several DNS experts have suggested that a number of problems could be solved by prohibiting meaningful names in labels, requiring instead that the labels be random or nonsense strings. If methods similar to those discussed in this document were used to force identifiers to be closer to meaningful words in real languages, the result would be directly contradictory to those "random name" approaches.
值得注意的是,一些DNS专家建议,可以通过禁止标签中有意义的名称来解决一些问题,而不是要求标签是随机或无意义的字符串。如果使用与本文件中讨论的方法类似的方法来强制标识符更接近真实语言中的有意义的单词,那么结果将与那些“随机名称”方法直接矛盾。
Interestingly, if one were trying to develop an "only words" system, a rather different -- but very restrictive -- model could be developed using lookups in a dictionary for the relevant language and a listing of valid business names for the relevant area. If a string did not appear in either, it would not be permitted to be registered. Models that require a prior national business listing (or registration) that is identical to the proposed domain name label have historically been used to restrict registrations in some country-code top level domains, so this is not a new idea. On the other hand, if look-alike characters are a concern, even that type of rule (or restriction) would still not avoid the need to consider character variants.
有趣的是,如果一个人试图开发一个“只有单词”的系统,那么可以通过在字典中查找相关语言和列出相关领域的有效商业名称来开发一个完全不同但非常严格的模型。如果字符串没有出现在任何一个中,则不允许注册该字符串。要求与拟议域名标签相同的事先国家商业上市(或注册)的模式历来被用于限制某些国家代码顶级域的注册,因此这不是一个新想法。另一方面,如果要考虑外观相似的字符,即使是那种类型的规则(或限制)仍然不能避免需要考虑字符变体。
Consequently, registries applying the principles outlined in this document should be careful not to apply more severe restrictions than are reasonable and appropriate while, at the same time, being aware of how difficult it usually is to add restrictions at a later time.
因此,适用本文件所述原则的登记处应小心,不要适用比合理和适当的更严格的限制,同时要意识到在以后增加限制通常有多么困难。
The JET model was designed for CJK characters. The discussion above implies that some extensions to it may be needed to handle the characteristics of various alphabetic scripts and the decisions that might be made about them in different zones. Those extensions might include facilities to process:
JET模型是为CJK角色设计的。上面的讨论意味着可能需要对其进行一些扩展,以处理各种字母脚本的特征以及在不同区域可能对其做出的决定。这些扩展可能包括处理以下内容的设施:
o Two-character (or more) sequences, such as ligatures and typographic spelling conventions, as variants.
o 两个字符(或更多)序列,如连字和排版拼写约定,作为变体。
o Regular expressions or some other mechanism for dealing with string positions of characters (e.g., characters that must, or must not, appear at the beginning or end of strings).
o 正则表达式或处理字符字符串位置的其他机制(例如,必须或不得出现在字符串开头或结尾的字符)。
o Delimiter breaks to permit multiple languages to be used, separately, within the same label. E.g., is it possible to define a label as consisting of two or more components, each in a different language, with some particular delimiter to define the boundaries of the components?
o 分隔符中断,允许在同一标签内单独使用多种语言。例如,是否可以将标签定义为由两个或多个组件组成,每个组件使用不同的语言,并使用特定的分隔符来定义组件的边界?
After examining the implications of the potential use of the full range of characters permitted by IDNA in DNS labels, multiple groups, including IESG [IESG-IDN] and ICANN [ICANN-IDN] [ICANN-IDN2], have concluded that some restrictions are needed to prevent many forms of user confusion about the actual structure of a name or the word, phrase, or term that it appears to spell out. The best way to approach such restrictions appears to draw from the language and culture of the community of registrants and users in the relevant zone: if particular characters are likely to be surprising or unintelligible to both of those groups, it is probably wise to not permit them to be used in registrations. Registration restrictions can be carried much further than restricting permitted characters to a selected Unicode subset. The idea of a reserved "bundle" of related labels permits probably-confusing combinations or sets of characters to be bound together, under the control of a single registrant. While that registrant might still use the package in a way that confused his or her own users (the approach outlined here will not prevent either ill-thought-out ideas or stupidity), the possibility of turning potential confusion into a hostile attack would be considerably reduced.
在研究了IDNA允许在DNS标签中使用的所有字符的潜在影响后,包括IESG[IESG-IDN]和ICANN[ICANN-IDN][ICANN-IDN2]在内的多个小组得出结论,需要一些限制,以防止用户对名称或单词、短语的实际结构产生多种形式的混淆,或者是它看起来拼写出来的术语。处理此类限制的最佳方式似乎是借鉴相关区域注册人和用户社区的语言和文化:如果特定字符可能会让这两个群体感到惊讶或无法理解,则不允许在注册中使用这些字符可能是明智的。注册限制可以比将允许的字符限制到选定的Unicode子集更进一步。保留相关标签的“捆绑”概念允许在单个注册人的控制下将可能混淆的字符组合或字符集绑定在一起。虽然该注册人可能仍然以一种让他或她自己的用户感到困惑的方式使用该软件包(这里概述的方法既不能防止考虑不周的想法,也不能防止愚蠢),但将潜在的混乱转化为敌对攻击的可能性将大大降低。
At the same time, excessive restrictions may make DNS identifiers less useful for their original purpose: identifying particular hosts and similar resources on the network in an orderly way. Registries creating rules and policies about what can be registered in particular zones -- whether those are based on the JET Guidelines or the suggestions in this document -- should balance the need for restrictions against the need for flexibility in constructing identifiers.
同时,过度的限制可能会使DNS标识符对其原始用途(以有序的方式识别网络上的特定主机和类似资源)不太有用。创建关于在特定区域中可以注册哪些内容的规则和策略的注册中心——无论这些规则和策略是基于JET指南还是本文档中的建议——应该在限制需求和构造标识符的灵活性需求之间取得平衡。
The discussion above provides many options that could be selected, defined, and applied in different ways in different registries (zones). Registrars and registrants would almost certainly prefer systems in which they can predict, at least to a first order approximation, the implications of a particular potential registration. Predictability of that sort probably requires more standards, and less flexibility, than the model itself might suggest.
上面的讨论提供了许多选项,可以在不同的注册表(区域)中以不同的方式选择、定义和应用这些选项。登记人和登记人几乎肯定更喜欢能够预测(至少是一阶近似值)特定潜在登记的影响的系统。这种可预测性可能需要比模型本身更高的标准和更少的灵活性。
The format of the table is meant to be machine-readable but not human-readable. It is fairly trivial to convert the table into one that can be read by people.
表的格式是机器可读的,但不是人类可读的。将表转换为人们可以阅读的表是相当简单的。
Each character in the table is given in the "U+" notation for Unicode characters. The lines of the table are terminated with either a carriage return character (ASCII 0x0D), a linefeed character (ASCII 0x0A), or a sequence of carriage return followed by linefeed (ASCII 0x0D 0x0A). The order of the lines in the table may or may not matter, depending on how the table is constructed.
表中的每个字符都以Unicode字符的“U+”表示法给出。表中的行以回车符(ASCII 0x0D)、换行符(ASCII 0x0A)或换行符序列(ASCII 0x0D 0x0A)结尾。表中行的顺序可能重要,也可能不重要,这取决于表的构造方式。
Comment lines in the table are preceded with a "#" character (ASCII 0x23).
表中的注释行前面带有“#”字符(ASCII 0x23)。
Each non-comment line in the table starts with the character that is allowed in the registry and expected to be used in registrations, which is also called the "base character". If the base character has any variants, the base character is followed by a vertical bar character ("|", ASCII 0x7C) and the variant string. If the base character has more than one variant, the variants are separated by a colon (":", ASCII 0x3A). Strings are given with a hyphen ("-", ASCII 0x2D) between each character. Comments begin with a "#" (ASCII 0x23) and may be preceded by spaces (" ", ASCII 0x20).
表中的每个非注释行都以注册表中允许的字符开头,该字符预计将用于注册,也称为“基本字符”。如果基字符有任何变体,则基字符后面会跟一个竖线字符(“|”,ASCII 0x7C)和变体字符串。如果基字符有多个变体,则变体之间用冒号(“:”,ASCII 0x3A)分隔。字符串在每个字符之间带有连字符(“-”,ASCII 0x2D)。注释以“#”(ASCII 0x23)开头,前面可以有空格(“ ”,ASCII 0x20)。
The following is an example of how a table might look. The entries in this table are purposely silly and should not be used by any registry as the basis for choosing variants. For the example, assume that the registry:
下面是一个表的外观示例。此表中的条目故意愚蠢,任何注册表都不应将其用作选择变体的基础。例如,假设注册表:
o allows the FOR ALL character (U+2200) with no variants
o 允许对所有字符(U+2200)使用无变体
o allows the COMPLEMENT character (U+2201) which has a single variant of LATIN CAPITAL LETTER C (U+0043)
o 允许补码字符(U+2201),该字符具有拉丁大写字母C(U+0043)的单一变体
o allows the PROPORTION character (U+2237) which has one variant which is the string COLON (U+003A) COLON (U+003A)
o 允许比例字符(U+2237),其中有一个变量,即字符串冒号(U+003A)冒号(U+003A)
o allows the PARTIAL DIFFERENTIAL character (U+2202) which has two variants: LATIN SMALL LETTER D (U+0064) and GREEK SMALL LETTER DELTA (U+03B4)
o 允许偏微分字符(U+2202),它有两种变体:拉丁字母D(U+0064)和希腊字母DELTA(U+03B4)
The table contents (after any required header information, see [IANA-language-registry] and the discussion in Section 7 below) would look like:
表内容(在任何必需的标题信息之后,请参见[IANA语言注册表]和下面第7节中的讨论)如下所示:
   # An example of a table
   U+2200
   U+2201|U+0043
   U+2237|U+003A-U+003A  # Note that the variant is a string
   U+2202|U+0064:U+03B4  # Two variants for the same character
Implementers of table processors should remember that there are tens of thousands of characters whose codepoints are greater than 0xFFFF. Thus, any program that assumes that each character in the table is represented in exactly six octets ("U", "+", and four octets representing the character value) will fail with tables that use characters whose value is greater than 0xFFFF.
表处理器的实现者应该记住,有成千上万个字符的代码点大于0xFFFF。因此,如果任何程序假定表中的每个字符都以六个八位字节(“U”、“+”和四个代表字符值的八位字节)表示,那么使用值大于0xFFFF的字符的表将失败。
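A table in this format can be read with a few lines of code. The following is a minimal sketch, not a reference implementation: the function names and the returned data structure (a dictionary from base character to a list of variant strings) are assumptions, and any header information that might precede the table is not handled. The code point pattern accepts four to six hexadecimal digits so that characters above 0xFFFF are not truncated.

```python
import re

# "U+" followed by 4 to 6 hex digits, so code points above 0xFFFF are accepted.
CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")

EXAMPLE_TABLE = (
    "# An example of a table\n"
    "U+2200\r\n"
    "U+2201|U+0043\r"
    "U+2237|U+003A-U+003A  # Note that the variant is a string\n"
    "U+2202|U+0064:U+03B4  # Two variants for the same character\n"
)


def parse_code_points(text):
    """Convert a hyphen-separated string such as "U+003A-U+003A" into "::"."""
    chars = []
    for token in text.split("-"):
        match = CODE_POINT.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"bad code point: {token!r}")
        chars.append(chr(int(match.group(1), 16)))
    return "".join(chars)


def parse_table(text):
    """Return {base character: [variant strings]} for a table in the format above."""
    table = {}
    # splitlines() accepts CR, LF, and CR LF line endings (among others).
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        base, _, variant_part = line.partition("|")
        base_char = parse_code_points(base)
        if len(base_char) != 1:
            raise ValueError(f"base must be a single character: {base!r}")
        variants = ([parse_code_points(v) for v in variant_part.split(":")]
                    if variant_part else [])
        table[base_char] = variants
    return table
```

Calling `parse_table(EXAMPLE_TABLE)` yields four entries, with U+2237 mapped to the two-character variant string "::" and U+2202 mapped to two single-character variants.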
This procedure has three inputs:
此程序有三个输入:
1. the proposed base registration,
1. 拟议的基地注册,
2. the language (or script, if the registration is script-based, but "language" is used for convenience below) for the proposed base registration, and
2. 拟议基本注册的语言(或脚本,如果注册是基于脚本的,但为了方便下文使用“语言”),以及
3. the processing table associated with that language.
3. 与该语言关联的处理表。
The output of the process is either failure (the base registration cannot be registered at all), or a registration bundle that contains one or more labels (always including the base registration). As described earlier, the registration bundle should be stored with its date of creation so that issues with overlapping elements between bundles can later be resolved on a first-come, first-served basis.
该过程的输出要么是失败(根本无法注册基本注册),要么是包含一个或多个标签(始终包括基本注册)的注册包。如前所述,注册包应与其创建日期一起存储,以便以后可以在先到先得的基础上解决包之间元素重叠的问题。
There are two steps to processing the registration:
处理注册有两个步骤:
1. Check whether the proposed base registration exists in any bundle. If it does, stop immediately with a failure.
1. 检查建议的基本注册是否存在于任何捆绑包中。如果出现故障,请立即停止。
2. Process the base registration with the mechanism described as "CreateBundle" in Section 6.1, below.
2. 使用下面第6.1节中描述的“CreateBundle”机制处理基本注册。
Note that the process must be executed only once. The process must not be performed on any output of the process, only on the proposed base registration.
请注意,该过程只能执行一次。不得对流程的任何输出执行流程,仅对提议的基础注册执行流程。
The CreateBundle mechanism determines whether a registration bundle can be created and, if so, populates that bundle with valid labels.
CreateBundle机制确定是否可以创建注册捆绑包,如果可以,则使用有效标签填充该捆绑包。
During the processing, a "temporary bundle" contains partial labels, that is, labels that are being built and are not complete labels. The partial labels in the temporary bundle consist of strings.
在处理过程中,“临时捆绑包”包含部分标签,即正在生成但不是完整标签的标签。临时束中的部分标签由字符串组成。
The steps are:
这些步骤是:
1. Split the base registration into individual characters, called "candidate characters". Compare every candidate character against the base characters in the table. If any candidate character does not exist in the set of base characters, the system must stop and not register any names (that is, it must not register either the base registration or any labels that would have come from character variants).
1. 将基本注册拆分为单个字符,称为“候选字符”。将每个候选字符与表中的基本字符进行比较。如果基本字符集中不存在任何候选字符,则系统必须停止并不注册任何名称(即,它不得注册基本注册或可能来自字符变体的任何标签)。
2. Perform the steps in IDNA's ToASCII sequence for the base registration. If ToASCII fails for the base registration, the system must stop and not register any label (that is, it must not register either the base registration or labels that might have been created from variants of characters contained in it). If ToASCII succeeds, place the base registration into the registration bundle.
2. 执行IDNA的ToASCII序列中的步骤进行基本注册。如果ToASCII未能进行基本注册,则系统必须停止,并且不得注册任何标签(即,不得注册基本注册或可能已由其中包含的字符变体创建的标签)。如果ToASCII成功,将基本注册放入注册包中。
3. For every candidate character in the base registration, do the following:
3. 对于基本注册中的每个候选角色,请执行以下操作:
o Create the set of characters that consists of the candidate character and any variants.
o 创建由候选角色和任何变体组成的角色集。
o For each character in the set from the previous step, duplicate the temporary bundle that resulted from the previous candidate character, and add the new character to the end of each partial label.
o 对于上一步中集合中的每个字符,复制由上一个候选字符生成的临时绑定,并将新字符添加到每个部分标签的末尾。
4. The temporary bundle now contains zero or more labels that consist of Unicode characters. For every label in the temporary bundle, do the following:
4. 临时捆绑包现在包含零个或多个由Unicode字符组成的标签。对于临时捆绑包中的每个标签,请执行以下操作:
o Process the label with ToASCII to see if ToASCII succeeds. If it does, add the label to the registration bundle. Otherwise, do not process this label from the temporary bundle any further; it will not go into the registration bundle.
o 使用ToASCII处理标签,查看ToASCII是否成功。如果有,请将标签添加到注册包中。否则,请勿从临时捆绑中进一步处理此标签;它不会进入注册包。
The result of the processing outlined above is the registration bundle with the base registration and possibly other labels.
上述处理的结果是带有基本注册和其他标签的注册包。
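The four steps above can be sketched as follows. This is an illustrative reading of the procedure, not a normative implementation. It assumes the table has already been loaded into a dictionary from base character to a list of variant strings, and it uses the `ToASCII` helper from Python's standard `encodings.idna` module (an IDNA 2003 implementation) as the ToASCII check. The function names are hypothetical, and the preliminary check against existing bundles is left out.

```python
from encodings import idna  # standard-library IDNA 2003 helpers


def to_ascii_ok(label):
    """True if IDNA ToASCII succeeds for a single label."""
    try:
        idna.ToASCII(label)
        return True
    except UnicodeError:
        return False


def create_bundle(base_registration, table):
    """Return the registration bundle as a list (base first), or None on failure."""
    # Step 1: every candidate character must be a base character in the table.
    if any(ch not in table for ch in base_registration):
        return None
    # Step 2: the base registration itself must pass ToASCII.
    if not to_ascii_ok(base_registration):
        return None
    bundle = [base_registration]
    # Step 3: grow the partial labels one candidate character at a time,
    # appending the character itself and each of its variants.
    temporary = [""]
    for ch in base_registration:
        choices = [ch] + table[ch]
        temporary = [partial + choice
                     for partial in temporary for choice in choices]
    # Step 4: keep only the labels for which ToASCII succeeds.
    for label in temporary:
        if label not in bundle and to_ascii_ok(label):
            bundle.append(label)
    return bundle
```

Because the temporary bundle grows multiplicatively with each character that has variants, a production system would probably want a limit on bundle size; that limit is a policy choice this sketch does not make.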
It is clear that, for many scripts, registries will choose to create tables without variants, either because variants are clearly not necessary or because they are determined to cause more confusion and overhead than is justified by the circumstances. For those situations the table model of Section 5 becomes a trivial listing of base characters and only the first two steps of CreateBundle (verifying that all candidate characters are in the base ("valid") character list and verifying that the resulting characters will succeed in the ToASCII operation) are applicable. Even the second of those steps becomes pro forma if the advice in the next subsection is followed.
很明显,对于许多脚本,注册中心将选择创建没有变体的表,这要么是因为变体显然是不必要的,要么是因为它们被确定会导致比实际情况更大的混乱和开销。对于这些情况,第5节的表模型成为基本字符的简单列表,并且只有CreateBundle的前两个步骤(验证所有候选字符都在基本(“有效”)字符列表中,并验证生成的字符将在ToASCII操作中成功)适用。如果遵循下一小节中的建议,即使第二个步骤也会变成形式上的。
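For a table without variants, the check reduces to something like the following sketch. The function name `may_register_without_variants` and the use of a plain set for the valid character list are assumptions made for illustration; ToASCII again comes from Python's `encodings.idna` module.

```python
from encodings import idna


def may_register_without_variants(label, valid_characters):
    """Apply only the first two CreateBundle steps to a label."""
    # First step: every candidate character must be in the valid list.
    if not all(ch in valid_characters for ch in label):
        return False
    # Second step: the label must survive IDNA's ToASCII operation.
    try:
        idna.ToASCII(label)
    except UnicodeError:
        return False
    return True
```

No temporary bundle is built in this case, so the cost of a registration check is linear in the length of the label.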
One of the functions of Nameprep, and IDNA more generally, is to map a large number of Unicode characters (code points) into a smaller number to avoid a different but overlapping set of confusion problems. For example, when a non-ASCII script makes distinctions between "upper case" and "lower case", Nameprep maps the upper case characters to the lower case ones in order to simulate the DNS protocol's rule that ASCII characters are interpreted in a case-insensitive way. Unicode also contains many code points that are typographic variants of each other (e.g., forms with different widths and code points that designate font variations for mathematical uses). The Unicode standard explicitly identifies them that way, and Nameprep maps them onto base characters.
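The two kinds of mapping described above can be observed with the `nameprep` function in Python's standard `encodings.idna` module. The sample strings are chosen only for illustration: one contains upper case letters (including U+00DC), the other consists of fullwidth Latin capitals (U+FF21 to U+FF23).

```python
from encodings import idna

samples = {
    "upper case": "B\u00dcCHER",
    "fullwidth forms": "\uff21\uff22\uff23",
}

# Nameprep folds case and replaces compatibility forms by base characters.
mapped = {name: idna.nameprep(text) for name, text in samples.items()}

for name, text in mapped.items():
    print(name, "->", text)
```

Both inputs come out as lower case base characters, so several distinct inputs can yield the same mapped label.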
While having these mapping functions available during lookup may be quite helpful to users who type equivalent forms, registrations are probably best performed in terms of the IDNA base characters only, i.e., those characters that Nameprep will not change. This has two advantages:
o Registrants will never find themselves in the rather confusing position of having submitted one string for registration and finding a different string in the registry database (which could otherwise occur even if the relevant language table does not contain variants).
o Those who are interested in what characters are permitted by a given registry will only need to examine the relevant tables, rather than simulating the IDNA algorithm to determine the result of processing particular characters.
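A registry that follows this advice might reject any submitted label that Nameprep would alter. The sketch below shows one way to express that test; the function name `is_idna_base_form` is hypothetical, and the check relies on `nameprep` from Python's `encodings.idna` module.

```python
from encodings import idna


def is_idna_base_form(label):
    """Return True if Nameprep leaves the label unchanged."""
    try:
        return idna.nameprep(label) == label
    except UnicodeError:
        # Nameprep rejects the label outright (e.g., prohibited characters).
        return False
```

Under this rule a registrant who submits "B\u00fccher" is asked to resubmit in lower case, rather than later finding a different string in the registry database.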
Under ICANN (not IETF) direction and management, the IANA has created a registry for language variant tables. The authoritative documentation for that registry is in [IANA-language-registry]. Since the registry exists and is being managed under ICANN direction, the material that follows is a review of the theory of this registry, rather than new instructions for IANA.
As described above and suggested in the JET Guidelines, the registration rules generally require only that:
o The application be submitted or endorsed by a TLD registry, to ensure that someone cares about the particular table.
o The table be identified by the following:
* the name -- usually the top-level domain name -- of the submitting or endorsing registry;
* one of: a language designation (consistent with [RFC3066] or with some other system approved by the IANA), a script designation, a combination of the two, or a sequence number acceptable to IANA for this purpose;
* a version number; and
* a date.
o Characters listed in the table be identified by Unicode code points, as discussed above.
o The table format may correspond to that identified in [RFC3743], or in Section 5 above, or may be some variation on those themes appropriate to the local processing model (with or without variants).
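The identifying items listed above could be held in a small record such as the one sketched here. Everything in it is an assumption for illustration: the class name `TableIdentity`, its field names, the sample values ("example" as a registry name, "ja" as an RFC 3066 language tag, the date), and the "U+XXXX" spelling accepted by `parse_code_point`. The registry itself defines the authoritative format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TableIdentity:
    """Hypothetical record of the items that identify a registered table."""
    registry: str      # submitting or endorsing registry, usually a TLD name
    designation: str   # language, script, combination, or sequence number
    version: str
    issued: date


def parse_code_point(text):
    """Convert a "U+XXXX" notation into the character it designates."""
    if not text.upper().startswith("U+"):
        raise ValueError("expected U+XXXX notation: %r" % text)
    return chr(int(text[2:], 16))


# Illustrative values only; they do not describe any real registered table.
example = TableIdentity("example", "ja", "1.0", date(2005, 12, 1))
valid_characters = {parse_code_point(cp) for cp in ("U+3042", "U+3044")}
```

Identifying characters by code point rather than by glyph keeps the table unambiguous even when two characters look alike in print.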
This raises some issues that will need to be worked out as experiences accumulate. For example, more standardization of table formats would be desirable to allow processing by the same computer tools for different registries and languages. But standardization seems premature at this time due to differences in languages, processing, and requirements and lack of experience with them. Similarly, if a registry concludes that it should use a table that contains characters from several scripts, it is not clear how such a table should be designated. Identifying it with a language code (either according to [RFC3066] or an independent code registered with IANA) is likely to just introduce more confusion, especially given other Internet uses of the language codes. It appears that some other convention will be needed for those cases, and it should be developed (if it has not already been established by the time this document is published).
This document specifies a model mechanism for registering Internationalized Domain Names (IDNs) that can be used to reduce confusion among similar-appearing names. The proposal is designed to facilitate internationalization while balancing internationalization concerns against the need to keep the Internet global and to keep domain name system references unique, both in the perception of the user and in practice.
Registration of labels in the DNS that contain essentially unrestricted sequences of arbitrary Unicode characters may introduce opportunities for either attacks or simple confusion. Some of these risks, such as confusion about which character (of several that look alike) is actually intended, may be associated with the presentation form of DNS names. Others may be linked to databases associated with the DNS, e.g., with the difficulty of finding an entry in a "Whois file" when it is not clear how to enter or to search for the characters that make up a name. This document discusses a family of restrictions on the names that can be registered. Restrictions of the type described can be imposed by a DNS zone ("registry"). The document also describes some possible tools for implementing such restrictions.
While the increased number and types of characters made available by Unicode considerably increase the scale of the potential problems, the problems addressed by this document are not new. No plausible set of restrictions will eliminate all problems and sources of confusion: for example, it has often been pointed out that, even in ASCII, the characters digit-one ("1") and lower case L ("l") can easily be confused in some display fonts. But, to the degree to which security may be aided by sensible risk reduction, these techniques may be helpful.
Discussions during the development of the JET Guidelines were vital in developing this document, and all of the JET participants are consequently acknowledged. Attempts to explain some of the issues uncovered there to Vint Cerf, Wendy Rickard, and members of the ICANN IDN Committee, and the feedback they provided, were also helpful in the thinking leading up to this document.
An effort by Paul Hoffman to create a generic specification for registration restrictions of this type helped to inspire this document, which takes a somewhat different, more language-oriented, approach than his initial draft. While the initial version of that draft indicated that multiple languages (or multiple language tables) for a single zone were infeasible, more recent versions [Hoffman-reg] shifted to inclusion of language-based approaches. The current version of this document incorporates considerable text, and even more ideas, from those drafts, with Paul Hoffman's generous permission.
Feedback was provided by several registry operators (of both country code and generic TLDs), including Edmon Chung and Ram Mohan of Afilias, and by ICANN and IANA staff, notably Tina Dam and Theresa Swinehart. This feedback about issues encountered in registering tables and designing IDN implementations resulted in the addition of significant clarifying text to the current version of the document.
The opinions expressed here are the sole responsibility of the author. Some of those whose ideas and comments are reflected in this document may disagree with the conclusions the author has drawn from them. The first draft version of this document was posted in June 2003.
[Daniels] Daniels, P.T. and W. Bright, "The World's Writing Systems", Oxford University Press, 1996.
[Drucker] Drucker, J., "The Alphabetic Labyrinth: The Letters in History and Imagination", 1995.
[Hoffman-reg] Hoffman, P., "A Method for Registering Internationalized Domain Names", Work in Progress, October 2003.
[IESG-IDN] Internet Engineering Steering Group, IETF, "IESG Statement on IDN", IESG Statement available from http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt, February 2003.
[ICANN-IDN] Internet Corporation for Assigned Names and Numbers (ICANN), "Guidelines for the Implementation of Internationalized Domain Names, Version 1.0", June 2003.
[ICANN-IDN2] Internet Corporation for Assigned Names and Numbers (ICANN), "Guidelines for the Implementation of Internationalized Domain Names, Version 2.0", September 2005.
[IANA-language-registry] Internet Assigned Numbers Authority (IANA), "IDN Language Table Registry", April 2004.
[LTRU-Registry] Phillips, A., Ed. and M. Davis, Ed., "Tags for Identifying Languages", Work in Progress, October 2005.
[RFC952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet host table specification", RFC 952, October 1985.
[RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, November 1987.
[RFC3066] Alvestrand, H., "Tags for the Identification of Languages", BCP 47, RFC 3066, January 2001.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, March 2003.
[RFC3536] Hoffman, P., "Terminology Used in Internationalization in the IETF", RFC 3536, May 2003.
[RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint Engineering Team (JET) Guidelines for Internationalized Domain Names (IDN) Registration and Administration for Chinese, Japanese, and Korean", RFC 3743, April 2004.
[Unicode] The Unicode Consortium, "The Unicode Standard -- Version 3.0", January 2000.
[Unicode32] The Unicode Consortium, "Unicode Standard Annex #28: Unicode 3.2", March 2002.
Author's Address
John C Klensin 1770 Massachusetts Ave, #322 Cambridge, MA 02140 USA
Phone: +1 617 491 5735 EMail: john-ietf@jck.com
Full Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions contained in BCP 78 and at www.rfc-editor.org/copyright.html, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is currently provided by the Internet Society.