Network Working Group                                         J. Klensin
Request for Comments: 4690                                  P. Faltstrom
Category: Informational                                    Cisco Systems
                                                                 C. Karp
                                       Swedish Museum of Natural History
                                                                     IAB
                                                          September 2006
Review and Recommendations for Internationalized Domain Names (IDNs)
Status of This Memo
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
This note describes issues raised by the deployment and use of Internationalized Domain Names. It describes problems both at the time of registration and for use of those names in the DNS. It recommends that the IETF update the RFCs relating to IDNs, proposes a framework to be followed in doing so, and summarizes and identifies some work that is required outside the IETF. In particular, it proposes that some changes be investigated for the Internationalizing Domain Names in Applications (IDNA) standard and its supporting tables, based on experience gained since those standards were completed.
Table of Contents
   1. Introduction
      1.1. The Role of IDNs and This Document
      1.2. Status of This Document and Its Recommendations
      1.3. The IDNA Standard
      1.4. Unicode Documents
      1.5. Definitions
         1.5.1. Language
         1.5.2. Script
         1.5.3. Multilingual
         1.5.4. Localization
         1.5.5. Internationalization
      1.6. Statements and Guidelines
         1.6.1. IESG Statement
         1.6.2. ICANN Statements
   2. General Problems and Issues
      2.1. User Conceptions, Local Character Sets, and Input issues
      2.2. Examples of Issues
         2.2.1. Language-Specific Character Matching
         2.2.2. Multiple Scripts
         2.2.3. Normalization and Character Mappings
         2.2.4. URLs in Printed Form
         2.2.5. Bidirectional Text
         2.2.6. Confusable Character Issues
         2.2.7. The IESG Statement and IDNA issues
   3. Migrating to New Versions of Unicode
      3.1. Versions of Unicode
      3.2. Version Changes and Normalization Issues
         3.2.1. Unnormalized Combining Sequences
         3.2.2. Combining Characters and Character Components
         3.2.3. When does normalization occur?
   4. Framework for Next Steps in IDN Development
      4.1. Issues within the Scope of the IETF
         4.1.1. Review of IDNA
         4.1.2. Non-DNS and Above-DNS Internationalization Approaches
         4.1.3. Security Issues, Certificates, etc.
         4.1.4. Protocol Changes and Policy Implications
         4.1.5. Non-US-ASCII in Local Part of Email Addresses
         4.1.6. Use of the Unicode Character Set in the IETF
      4.2. Issues That Fall within the Purview of ICANN
         4.2.1. Dispute Resolution
         4.2.2. Policy at Registries
         4.2.3. IDNs at the Top Level of the DNS
   5. Specific Recommendations for Next Steps
      5.1. Reduction of Permitted Character List
         5.1.1. Elimination of All Non-Language Characters
         5.1.2. Elimination of Word-Separation Punctuation
      5.2. Updating to New Versions of Unicode
      5.3. Role and Uses of the DNS
      5.4. Databases of Registered Names
   6. Security Considerations
   7. Acknowledgements
   8. References
      8.1. Normative References
      8.2. Informative References
While IDNs have been advocated as the solution to a wide range of problems, this document is written from the perspective that they are no more and no less than DNS names, reflecting the same requirements for use, stability, and accuracy as traditional "hostnames", but using a much larger collection of permitted characters. In particular, while IDNs represent a step toward an Internet that is equally accessible from all languages and scripts, they, at best, address only a small part of that very broad objective. There has been controversy since IDNs were first suggested about how important they will actually turn out to be; that controversy will probably continue. Accessibility from all languages is an important objective, hence it is important that our standards and definitions for IDNs be smoothly adaptable to additional scripts as they are added to the Unicode character set.
The utility of IDNs must be evaluated in terms of their application by users and in protocols: the ability to simply put a name into the DNS and retrieve it is not, in and of itself, important. From this point of view, IDNs will be useful and effective if they provide stable and predictable references -- references that are no less stable and predictable, and no less secure, than their ASCII counterparts.
This combination of objectives and criteria has proven very difficult to satisfy. Experience in developing the IDNA standard and during the initial years of its implementation and deployment suggests that it may be impossible to fully satisfy all of them and that engineering compromises are needed to yield a result that is workable, even if not completely satisfactory. Based on that experience and issues that have been raised, it is now appropriate to review some of the implications of IDNs, the decisions made in defining them, and the foundation on which they rest and determine whether changes are needed and, if so, which ones.
The design of the DNS itself imposes some additional constraints. If the DNS is to remain globally interoperable, there are specific characteristics that no implementation of IDNs, or the DNS more generally, can change. For example, because the DNS is a global hierarchal administrative namespace with only a single name at any given node, there is one and only one owner of each domain name. Also, when strings are looked up in the DNS, positive responses can only reflect exact matches: if there is no exact match, then one gets an error reply, not a list of near matches or other supplemental information. Searches and approximate matchings are not possible.
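The exact-match behavior described above can be illustrated with a toy lookup table. This is only a sketch under stated assumptions: the `ZONE` dictionary, the names and addresses in it, and the `lookup` helper are hypothetical, and a real DNS server does far more. ASCII case-insensitivity is modeled here simply by lowercasing the query.

```python
# Toy model of DNS lookup semantics: a positive answer requires an exact
# match on the stored name; there is no search or "near match" fallback.
# ZONE and lookup() are illustrative names, not part of any real DNS library.
from typing import Optional

ZONE = {
    "example.com": "192.0.2.1",
    "xn--bcher-kva.example": "192.0.2.2",
}

def lookup(name: str) -> Optional[str]:
    """Return the stored address for an exact match, else None (an error reply)."""
    return ZONE.get(name.lower())

print(lookup("example.com"))   # exact match: an address is returned
print(lookup("examp1e.com"))   # visually close, but not a match: None
```

Any "did you mean" behavior would have to live above the DNS, for example in a search or directory service, since the lookup itself can only succeed or fail.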
Finally, because the DNS is a distributed system where any server might cache responses, and later use those cached responses to attempt to satisfy queries before a global lookup is done, every server must use the same matching criteria.
This document reviews the IDN landscape from an IETF perspective and presents the recommendations and conclusions of the IAB, based partially on input from an ad hoc committee charged with reviewing IDN issues and the path forward (see Section 7). Its recommendations are advice to the IETF, or in a few cases to other bodies, for topics to be investigated and actions to be taken if those bodies, after their examinations, consider those actions appropriate.
During 2002, the IETF completed the following RFCs that, together, define IDNs:
RFC 3454 Preparation of Internationalized Strings ("Stringprep") [RFC3454]. Stringprep is a generic mechanism for taking a Unicode string and converting it into a canonical format. Stringprep itself is just a collection of rules, tables, and operations. Any protocol or algorithm that uses it must define a "Stringprep profile", which specifies which of those rules are applied, how, and with which characteristics.
RFC 3490 Internationalizing Domain Names in Applications (IDNA) [RFC3490]. IDNA is the base specification in this group. It specifies that Nameprep is used as the Stringprep profile for domain names, and that Punycode is the relevant encoding mechanism for use in generating an ASCII-compatible ("ACE") form of the name. It also applies some additional conversions and character filtering that are not part of Nameprep.
RFC 3491 Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN) [RFC3491]. Nameprep is designed to meet the specific needs of IDNs and, in particular, to support case-folding for scripts that support what are traditionally known as upper- and lowercase forms of the same letters. The result of the Nameprep algorithm is a string containing a subset of the Unicode Character set, normalized and case-folded so that case-insensitive comparison can be made.
RFC 3492 Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA) [RFC3492]. Punycode is a mechanism for encoding a Unicode string in ASCII characters. The characters used are the same subset of characters that are allowed in the hostname definition of DNS, i.e., the "letter, digit, and hyphen" characters, sometimes known as "LDH".
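How the four specifications fit together can be sketched with Python's standard library, whose `encodings.idna` module and `punycode` codec follow the 2003 RFCs listed above. The helper `to_ace_label` and the constant `ACE_PREFIX` are names introduced here for illustration only. The sketch deliberately omits the label-length and hostname-syntax checks that a full ToASCII operation performs.

```python
# Sketch of the per-label IDNA pipeline: Nameprep (a Stringprep profile),
# then Punycode, then the ACE prefix. Not a complete ToASCII implementation.
from encodings import idna

ACE_PREFIX = "xn--"  # prefix that marks an ASCII-compatible encoded label

def to_ace_label(label: str) -> str:
    """Nameprep one label and Punycode-encode it if it is not pure ASCII."""
    prepared = idna.nameprep(label)  # mapping, case folding, NFKC, prohibited checks
    if prepared.isascii():
        return prepared              # nothing left to encode
    return ACE_PREFIX + prepared.encode("punycode").decode("ascii")

label = "B\u00fccher"                # "Bücher", with a precomposed u-umlaut
print(idna.nameprep(label))          # case-folded, normalized Unicode form
print(to_ace_label(label))           # LDH-only ACE form
print(label.encode("idna"))          # the built-in codec, for comparison
```

The result contains only letters, digits, and hyphens, which is what allows it to be stored in the DNS unchanged.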
Unicode is used as the base, and defining, character set for IDNs. Unicode is standardized by the Unicode Consortium, and synchronized with ISO to create ISO/IEC 10646 [ISO10646]. At the time the RFCs mentioned earlier were created, Unicode was at Version 3.2. For reasons explained later, it was necessary to pick a particular, then-current, version of Unicode when IDNA was adopted. Consequently, the RFCs are explicitly dependent on Unicode Version 3.2 [Unicode32]. There is, at present, no established mechanism for modifying the IDNA RFCs to use newer Unicode versions (see Section 3.1).
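The practical effect of pinning IDNA to Unicode 3.2 can be observed, as a rough illustration, with Python's `stringprep` module, which carries the RFC 3454 tables derived from the Unicode 3.2 database. The helper name `unassigned_in_unicode_3_2` is introduced here for illustration. U+0221 is used as the sample because it appears in RFC 3454 Table A.1 (unassigned code points) but is an assigned letter in later Unicode versions.

```python
# Compare the Unicode 3.2 view used by Stringprep with the Unicode version
# that ships with the running interpreter.
import stringprep
import unicodedata

def unassigned_in_unicode_3_2(ch: str) -> bool:
    """True if ch falls in RFC 3454 Table A.1 (unassigned in Unicode 3.2)."""
    return stringprep.in_table_a1(ch)

ch = "\u0221"  # LATIN SMALL LETTER D WITH CURL, assigned after Unicode 3.2
print(unicodedata.unidata_version)     # Unicode version known to this interpreter
print(unicodedata.category(ch))        # general category in that newer version
print(unassigned_in_unicode_3_2(ch))   # status from the Unicode 3.2 viewpoint
```

A character in this position exists for users of current systems but is outside the repertoire that the IDNA RFCs were written against.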
Unicode is a very large and complex character set. (The term "character set" or "charset" is used in a way that is peculiar to the IETF and may not be the same as the usage in other bodies and contexts.) The Unicode Standard and related documents are created and maintained by the Unicode Technical Committee (UTC), one of the committees of the Unicode Consortium.
The Consortium first published The Unicode Standard [Unicode10] in 1991, and continues to develop standards based on that original work. Unicode is developed in conjunction with the International Organization for Standardization, and it shares its character repertoire with ISO/IEC 10646. Unicode and ISO/IEC 10646 function equivalently as character encodings, but The Unicode Standard contains much more information for implementers, covering -- in depth -- topics such as bitwise encoding, collation, and rendering. The Unicode Standard enumerates a multitude of character properties, including those needed for supporting bidirectional text. The Unicode Consortium and ISO standards do use slightly different terminology.
The following terms and their meanings are critical to understanding the rest of this document and to discussions of IDNs more generally. These terms are derived from [RFC3536], which contains additional discussion of some of them.
A language is a way that humans interact. The use of language occurs in many forms, including speech, writing, and signing.
Some languages have a close relationship between the written and spoken forms, while others have a looser relationship. RFC 3066 [RFC3066] discusses languages in more detail and provides identifiers for languages for use in Internet protocols. Computer languages are explicitly excluded from this definition. The most recent IETF work in this area, and on script identification (see below), is documented in [RFC4645] and [RFC4646].
A script is a set of graphic characters used for the written form of one or more languages. This definition is the one used in [ISO10646].
Examples of scripts are Arabic, Cyrillic, Greek, Han (the so-called ideographs used in writing Chinese, Japanese, and Korean), and "Latin". Arabic, Greek, and Latin are, of course, also names of languages.
Historically, the script that is known as "Latin" in Unicode and most contexts associated with information technology standards is known in the linguistic community as "Roman" or "Roman-derived". The latter terminology distinguishes between the Latin language and the characters used to write it, especially in Republican times, from the much richer and more decorated script derived and adapted from those characters. Since IDNA is defined using Unicode and that standard used the term "LATIN" in its character names and descriptions, that terminology will be used in this document as well except when "Roman-derived" is needed for clarity. However, readers approaching this document from a cultural or linguistic standpoint should be aware that the use of, or references to, "Latin script" in this document refers to the entire collection of Roman-derived characters, not just the characters used to write the Latin language. Some other issues with script identification and relationships with other standards are discussed in [RFC4646].
The term "multilingual" has many widely-varying definitions and thus is not recommended for use in standards. Some of the definitions relate to the ability to handle international characters; other definitions relate to the ability to handle multiple charsets; and still others relate to the ability to handle multiple languages.
While this term has been deprecated for IETF-related uses and does not otherwise appear in this document, a discussion here seemed appropriate since the term is still widely used in some discussions of IDNs.
Localization is the process of adapting an internationalized application platform or application to a specific cultural environment. In localization, the same semantics are preserved while the syntax or presentation forms may be changed.
Localization is the act of tailoring an application for a different language or script or culture. Some internationalized applications can handle a wide variety of languages. Typical users understand only a small number of languages, so the program must be tailored to interact with users in just the languages they know.
Somewhat different definitions for localization and internationalization (see below) are used by groups other than the IETF. See [W3C-Localization] for one example.
In the IETF, the term "internationalization" is used to describe adding or improving the handling of non-ASCII text in a protocol. Other bodies use the term in other ways, often with subtle variation in meaning. The term "internationalization" is often abbreviated "i18n" (and localization as "l10n").
Many protocols that handle text only handle the characters associated with one script (often, a subset of the characters used in writing English text), or leave the question of what character set is used up to local guesswork (which leads to interoperability problems). Adding non-ASCII text to such a protocol allows the protocol to handle more scripts, with the intention of being able to include all of the scripts that are useful in the world. It is naive (sic) to believe that all English words can be written in ASCII, various mythologies notwithstanding.
When the IDNA RFCs were published, the IESG and ICANN made statements that were intended to guide deployment and future work. In recent months, ICANN has updated its statement and others have also made contributions. It is worth noting that the quality of understanding of internationalization issues as applied to the DNS has evolved considerably over the last few years. Organizations that took specific positions a year or more ago might not make exactly the same statements today.
The IESG made a statement on IDNA [IESG-IDN]:
IDNA, through its requirement of Nameprep [RFC3491], uses equivalence tables that are based only on the characters themselves; no attention is paid to the intended language (if any) for the domain name. However, for many domain names, the intended language of one or more parts of the domain name actually does matter to the users.
Similarly, many names cannot be presented and used without ambiguity unless the scripts to which their characters belong are known. In both cases, this additional information should be of concern to the registry.
The statement is longer than this, but these paragraphs are the important ones. The rest of the statement consists of explanations and examples.
Soon after the IDNA standards were adopted, ICANN produced an initial version of its "IDN Guidelines" [ICANNv1]. This document was intended to serve two purposes. The first was to provide a basis for releasing the Generic Top Level Domain (gTLD) registries that had been established by ICANN from a contractual restriction on the registration of labels containing hyphens in the third and fourth positions. The second was to provide a general framework for the development of registry policies for the implementation of IDNs.
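The restriction on hyphens in the third and fourth positions matters because the ACE form defined by IDNA begins with the prefix "xn--", so every encoded IDN label necessarily has hyphens in exactly those positions. A minimal sketch of the two checks follows. Both function names are hypothetical and are not taken from any registry system.

```python
# Labels with "--" in positions 3 and 4 were contractually blocked; IDNA's
# ACE labels ("xn--...") are the case the guidelines needed to release.

def has_hyphens_in_positions_3_and_4(label: str) -> bool:
    """True if the third and fourth characters of the label are both hyphens."""
    return label[2:4] == "--"

def looks_like_ace(label: str) -> bool:
    """True if the label carries the IDNA ACE prefix (compared case-insensitively)."""
    return label[:4].lower() == "xn--"

for label in ("example", "xn--bcher-kva", "ab--cd"):
    print(label, has_hyphens_in_positions_3_and_4(label), looks_like_ace(label))
```

A label such as "ab--cd" has the reserved hyphen pattern without being an IDNA label, so the two tests are not interchangeable.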
One of the key components of this framework prescribed strict compliance with RFCs 3490, 3491, and 3492. With the framework, ICANN specified that IDNA was to be the sole mechanism to be used in the DNS to represent IDNs.
Limitations on the characters available for inclusion in IDNs were mandated by two mechanisms. The first was by requiring an "inclusion-based approach (meaning that code points that are not explicitly permitted by the registry are prohibited) for identifying permissible code points from among the full Unicode repertoire." The second mechanism required the association of every IDN with a specific language, with additional policies also being language based:
"In implementing the IDN standards, top-level domain registries will (a) associate each registered internationalized domain name with one language or set of languages, (b) employ language-specific registration and administration rules that are documented and publicly available, such as the reservation of all domain names with equivalent character variants in the languages associated with the registered domain name, and, (c) where the registry finds that the registration and administration rules for a given language would benefit from a character variants table, allow registrations in that language only when an appropriate table is available. ... In implementing the IDN standards, top-level domain registries should, at least initially, limit any given domain label (such as a second-level domain name) to the characters associated with one language or set of languages only."
It was left to each TLD registry to define the character repertoire it would associate with any given language. This led to significant variation from registry to registry, with further heterogeneity in the underlying language-based IDN policies. If the guidelines had made provision for IDN policies also being based on script, a substantial amount of the resulting ambiguity could have been avoided. However, they did not, and the sequence of events leading to the present review of IDNA was thus triggered.
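The inclusion-based, per-language approach described above might be sketched as follows. Everything in this example is an assumption made for illustration: the `REPERTOIRE` table, its two language entries, and both helper functions are hypothetical, and real registry tables are larger and differ from registry to registry. Labels are assumed to have been through Nameprep already, so only lowercase forms are listed.

```python
# Inclusion-based check: any code point not explicitly permitted for the
# declared language is rejected. The repertoires below are invented examples.
from typing import Dict, FrozenSet, List

LDH = frozenset("abcdefghijklmnopqrstuvwxyz0123456789-")

REPERTOIRE: Dict[str, FrozenSet[str]] = {
    "sv": LDH | frozenset("\u00e5\u00e4\u00f6"),  # a-ring, a-umlaut, o-umlaut
    "de": LDH | frozenset("\u00e4\u00f6\u00fc"),  # a-umlaut, o-umlaut, u-umlaut
}

def rejected_code_points(label: str, language: str) -> List[str]:
    """List the code points in label that are not permitted for language."""
    permitted = REPERTOIRE[language]
    return [f"U+{ord(ch):04X}" for ch in label if ch not in permitted]

def is_registrable(label: str, language: str) -> bool:
    """A label is acceptable only if every code point is explicitly permitted."""
    return not rejected_code_points(label, language)

print(is_registrable("sm\u00f6rg\u00e5s", "sv"))
print(rejected_code_points("sm\u00f6rg\u00e5s", "de"))
```

The same label can be acceptable under one language table and rejected under another, which is one source of the registry-to-registry variation noted above.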
One of the responses of the TLD registries to what was widely perceived as a crisis situation was to invoke the mechanism described in the initial guidelines: "As the deployment of IDNs proceeds, ICANN and the IDN registries will review these Guidelines at regular intervals, and revise them as necessary based on experience."
The pivotal requirement was the modification of the guidelines to permit script-based policies for IDNs. Further concern was expressed about the need for realistically implementable mechanisms for the propagation of TLD registry policies into the lower levels of their name trees. In addition to the anticipated increase of constraint on the protocol level, one obvious additional approach would be to replace the guidelines by an instrument that itself had clear status in the IETF's normative framework. A BCP was therefore seen as the appropriate focus for longer-term effort. The most pressing issues would be dealt with in the interim by incremental modification to the guidelines, but no need was seen for the detailed further development of those guidelines once that incremental modification was complete.
The outcome of this action was a version 2.0 of the guidelines [ICANNv2], which was endorsed by the ICANN Board on November 8, 2005 for a period of nine months. The Board stated further that it "tasks the IDN working group to continue its important work and return to the board with specific IDN improvement recommendations before the ICANN Meeting in Morocco" and "supports the working group's continued action to reframe the guidelines completely in a manner appropriate for further development as a Best Current Practices (BCP) document, to ensure that the Guideline directions will be used deeper into the DNS hierarchy and within TLD's where ICANN has a lesser policy relationship."
Retaining the inclusion-based approach established in version 1.0, the crucial addition to the policy framework is that:
"All code points in a single label will be taken from the same script as determined by the Unicode Standard Annex #24: Script Names at http://www.unicode.org/reports/tr24. Exception to this is permissible for languages with established orthographies and conventions that require the commingled use of multiple scripts. In such cases, visually confusable characters from different scripts will not be allowed to coexist in a single set of permissible codepoints unless a corresponding policy and character table is clearly defined."
Additionally:
"Permissible code points will not include: (a) line symbol-drawing characters (as those in the Unicode Box Drawing block), (b) symbols and icons that are neither alphanumeric nor ideographic language characters, such as typographic and pictographic dingbats, (c) characters with well-established functions as protocol elements, (d) punctuation marks used solely to indicate the structure of sentences."
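The single-script rule and the exclusion of symbols quoted above can be approximated with Python's `unicodedata` module. This is a heuristic sketch, not an implementation of UAX #24: Python's standard library does not expose the Unicode script property, so the first word of each character name ("LATIN", "CYRILLIC", and so on) is used as a stand-in, which is an assumption that will misclassify some characters. All function names are hypothetical.

```python
# Heuristic checks for two of the version 2.0 guideline rules.
import unicodedata
from typing import Set

def letter_scripts(label: str) -> Set[str]:
    """Approximate the set of scripts used by the letters in label.

    Assumption: the first word of the character name stands in for the
    script property; authoritative data is in the UAX #24 script data file.
    """
    scripts = set()
    for ch in label:
        # Only letters are considered; digits and hyphen are script-neutral.
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_single_script(label: str) -> bool:
    """True if all letters in the label appear to come from one script."""
    return len(letter_scripts(label)) <= 1

def has_symbol(label: str) -> bool:
    """True if any character has a symbol category (Sm, Sc, Sk, or So)."""
    return any(unicodedata.category(ch).startswith("S") for ch in label)

mixed = "p\u0430ypal"  # second character is CYRILLIC SMALL LETTER A
print(letter_scripts("paypal"), is_single_script("paypal"))
print(letter_scripts(mixed), is_single_script(mixed))
print(has_symbol("box\u2500"))  # U+2500 is a Box Drawing character
```

The mixed-script label is visually close to its all-Latin counterpart, which is the kind of case the single-script rule is meant to keep out of a registry.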
Attention has been called to several points that are not adequately dealt with (if at all) in the version 2.0 guidelines but that ought to be included in the policy framework without waiting for the production and release of a document based on a "best practices" model. The term "BCP" above does not necessarily refer to an IETF consensus document.
The intention in November 2005 was for the recommended major revision to be put to the ICANN Board prior to its meeting in Morocco (in late June 2006), but for the changes to be collated incrementally and appear in interim version 2.n releases of the guidelines. The IAB's understanding is that, while there has been some progress with this, other issues relating to IDNs subsequently diverted much of the energy that was intended to be devoted to the more extensive treatment of the guidelines.
This section interweaves problems and issues of several types. Each subsection outlines something that is perceived to be a problem or issue "with IDNs", therefore needing correction. Some of these issues can be at least partially resolved by making changes to elements of the IDNA protocol or tables. Others will exist as long as people have expectations of IDNs that are inconsistent with the basic DNS architecture. It is important to identify this entire range of problems because users, registrants, and policy makers often do not understand the protocol and other technical issues but only the difference between what they believe happens or should happen and what actually happens. As long as those differences exist, there will be demands for functionality or policy changes for IDNs. Of course, some of these demands will be less realistic than others, but even the realistic ones should be understood in the same context as the others.
Most of the issues that have been raised, and that are discussed in this document, exist whether IDNA remains tied to Unicode 3.2 or whether migration to new Unicode versions is contemplated. A migration path is necessary to accommodate newly-coded scripts and to permit the maximum number of languages and scripts to be represented in domain names. However, the migration issues are largely separate from those involving a single Unicode version or Version 3.2 in particular, so they have been separated into this section and Section 3.
The labels of the DNS are just strings of characters that are not inherently tied to a particular language. As mentioned briefly in the Introduction, DNS labels that could not lexically be words in any language are possible and indeed common. There appears to be no reason to impose protocol restrictions on IDNs that would restrict them more than all-ASCII hostname labels have been restricted. For that reason, even describing DNS labels or strings of them as "names" is something of a misnomer, one that has probably added to user confusion about what to expect.
Ordinarily, people use "words" when they think of things and wish others to think of them too, for example, "orange", "tree", "restaurant" or "Acme Inc". Words are normally in a specific language, such as English or Swedish. The character-string labels supported by the DNS are, as suggested above, not inherently "words". While it is useful, especially for mnemonic value or to identify objects, for actual words to be used as DNS labels, other constraints on the DNS make it impossible to guarantee that it will be possible to represent every word in every language as a DNS label, internationalized or not.
When writing or typing the label (or word), a script must be selected and a charset must be picked for use with that script. The choice of charset is typically not under the control of the user on a per-word or per-document basis, but may depend on local input devices, keyboard or terminal drivers, or other decisions made by operating system or even hardware designers and implementers.
If that charset, or the local charset being used by the relevant operating system or application software, is not Unicode, a further conversion must be performed to produce Unicode. How often this is an issue depends on estimates of how widely Unicode is deployed as the native character set for hardware, operating systems, and applications. Those estimates differ widely, but it should be noted that, among other difficulties:
o ISO 8859 versions [ISO.8859.2003], and even national variations of ISO 646 [ISO.646.1991], are still widely used in parts of Europe;
o code-table switching methods, typically based on the techniques of ISO 2022 [ISO.2022.1986], are still in general use in many parts of the world, especially in Japan with Shift-JIS and its variations; and
o computing, systems, and communications in China tend to use one or more of the national "GB" standards rather than native Unicode.
Additionally, not all charsets define their characters in the same way and not all preexisting coding systems were incorporated into Unicode without changes. Sometimes local distinctions were made that Unicode does not make or vice versa. Consequently, conversion from other systems to Unicode may potentially lose information.
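A short illustration of why the conversion step matters: the same byte, decoded under two different local charsets, yields two different Unicode code points, and only one of them is the letter the user typed. The sketch below uses Python's built-in "latin-1" and "cp437" codecs purely as stand-ins for whatever charsets a local system might use; the function name `decode_candidates` is hypothetical.

```python
RAW = b"\xf6"  # one byte as it might arrive from a non-Unicode system

def decode_candidates(raw: bytes, charsets=("latin-1", "cp437")) -> dict[str, str]:
    """Decode the same bytes under several assumed local charsets."""
    return {name: raw.decode(name) for name in charsets}

if __name__ == "__main__":
    for charset, text in decode_candidates(RAW).items():
        print(charset, [f"U+{ord(ch):04X}" for ch in text])
```

If the charset is guessed wrongly at this stage, everything that IDNA does afterwards operates on the wrong string.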
The Unicode string that results from this processing -- processing that is trivial in a Unicode-native system but that may be significant in others -- is then used as input to IDNA.
While much of the discussion below is stated in terms of Unicode codings and associated rules, the IAB believes that some of the issues are actually not about the Unicode character set per se, but about how distributed matching systems operate in reality, and about what implications the distributed delayed search for stored data that characterizes the DNS has on the mapping algorithms.
There are similar words that can be expressed in multiple languages. Consider, for example, the name Torbjorn in Norwegian and Swedish. In Norwegian it is spelled with the character U+00F8 (LATIN SMALL LETTER O WITH STROKE) in the second syllable, while in Swedish it is spelled with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS). Those characters are not treated as equivalent according to the Unicode Standard and its Annexes while most people speaking Swedish, Danish, or Norwegian probably think they are equivalent.
It is neither possible nor desirable to make these characters equivalent on a global basis. To do so would, for this example, rationalize the situation in Sweden while causing considerable confusion in Germany because the U+00F8 character is never used in the German language. But the "variant" model introduced in [RFC3743] and [RFC4290] can be used by a registry to prevent the worst consequence of the possible confusion, by ensuring either that both names are registered to the same party in a given domain or that one of them is completely prohibited.
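The distinction between what Unicode normalization does and what a registry "variant" table does can be sketched as follows. This is an illustration under assumptions, not the mechanism of [RFC3743] or [RFC4290]: the table `VARIANTS` and the functions `variant_key` and `same_bundle` are hypothetical, and a real table would be defined per registry and per language.

```python
import unicodedata

NORWEGIAN = "torbj\u00f8rn"  # spelled with LATIN SMALL LETTER O WITH STROKE
SWEDISH = "torbj\u00f6rn"    # spelled with LATIN SMALL LETTER O WITH DIAERESIS

# Hypothetical registry variant table: code point -> preferred form.
VARIANTS = {"\u00f8": "\u00f6"}

def variant_key(label: str) -> str:
    """Normalize, then fold registry-defined variants to one preferred form."""
    normalized = unicodedata.normalize("NFKC", label)
    return "".join(VARIANTS.get(ch, ch) for ch in normalized)

def same_bundle(a: str, b: str) -> bool:
    """True when a registry using VARIANTS would treat the labels as variants."""
    return variant_key(a) == variant_key(b)

if __name__ == "__main__":
    nfkc_equal = (unicodedata.normalize("NFKC", NORWEGIAN)
                  == unicodedata.normalize("NFKC", SWEDISH))
    print("equal under NFKC:", nfkc_equal)
    print("same variant bundle:", same_bundle(NORWEGIAN, SWEDISH))
```

The point of keeping the table in registry policy is that a different registry can leave `VARIANTS` empty and treat the two spellings as unrelated.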
There are languages in the world that can be expressed using multiple scripts. For example, some Eastern European and Central Asian languages can be expressed in either Cyrillic or Latin (see Section 1.5.2) characters, or some African and Southeast Asian languages can be expressed in either Arabic or Latin characters. A few languages can even be written in three different scripts. In other cases, the language is typically written in a combination of scripts (e.g., Kanji, Kana, and Romaji for Japanese; Hangul and Hanja for Korean). Because of this, the same word, in the same language, can be expressed in different ways. For some languages, only a single script is normally used to write a single word; for others, mixed scripts are required; and, for still others, special circumstances may dictate mixing scripts in labels although that is not normally done for "words". For IDN purposes, these variations make the definition of "script" extremely sensitive, especially since ICANN is now recommending that it be used as the primary basis for registry policies. However essential it may be to prohibit mixed-script labels, additional policy nuance is required for "languages with established orthographies and conventions that require the commingled use of multiple scripts".
Unicode contains several different models for representing characters. The Chinese (Han)-derived characters of the "CJK" (Chinese, Japanese, and Korean) languages are "unified", i.e., characters with common derivation and similar appearances are assigned to the same code point. European characters derived from a Greek-Latin base are separated into separate code blocks for Latin, Greek, and Cyrillic even when individual characters are identical in both form and semantics. Separate code points based on font differences alone are generally prohibited, but a large number of characters for "mathematical" use have been assigned separate code points even though they differ from base ASCII characters only by font attributes such as "script", "bold", or "italic". Some characters that often appear together are treated as typographical digraphs with specific code points assigned to the combination; others require that the two-character sequences be used; and still others are available in both forms. Some Roman-derived letters that were developed as decorated variations on the basic Latin letter collection (e.g., by addition of diacritical marks) are assigned code points as individual characters; others must be built up as two (or more) character sequences using "combining characters".
Many of these differences result from the desire to maintain backward compatibility while the standard evolved historically, and are hence understandable. However, the DNS requires precise knowledge of which codes and code sequences represent the same character and which ones do not. Limiting the potential difficulties with confusable characters (see Section 2.2.6) requires even more knowledge of which characters might look alike in some fonts but not in others. These variations make it difficult or impossible to apply a single set of rules to all of Unicode and, in doing so, satisfy everyone and their perceived needs. Instead, more or less complex mapping tables, defined on a character-by-character basis, are required to "normalize" different representations of the same character to a single form so that matching is possible.
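Two of the representation models mentioned above, font-variant "mathematical" letters and typographical ligatures, can be seen being folded to a single form with the compatibility normalization available in Python's standard `unicodedata` module. This is a minimal sketch; the names `SAMPLES` and `fold` are illustrative only.

```python
import unicodedata

SAMPLES = {
    "ligature fi": "\ufb01",              # LATIN SMALL LIGATURE FI
    "mathematical bold a": "\U0001d41a",  # MATHEMATICAL BOLD SMALL A
}

def fold(text: str) -> str:
    """Apply compatibility normalization (NFKC) to a string."""
    return unicodedata.normalize("NFKC", text)

if __name__ == "__main__":
    for description, text in SAMPLES.items():
        before = " ".join(f"U+{ord(ch):04X}" for ch in text)
        after = " ".join(f"U+{ord(ch):04X}" for ch in fold(text))
        print(f"{description}: {before} -> {after}")
```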
Unless normalization rules, such as those that underlie Nameprep, are applied, characters that are essentially identical will not match in the DNS, creating many opportunities for problems. The most common of these problems is that, due to the processing applied (and discussed above) before a word is represented as a Unicode string, a single word can end up being expressed as several different Unicode strings. Even if normalization rules are applied, some strings that are considered identical by users will not compare equal. That problem is discussed in more detail elsewhere in this document, particularly in Section 3.2.1.
IDNA attempts to compensate for these problems by using a normalization algorithm defined by the Unicode Consortium. This algorithm can change a sequence of one or more Unicode characters to another set of characters. One example is that the base character U+0061 (LATIN SMALL LETTER A) followed by U+0308 (COMBINING DIAERESIS) is changed to the single Unicode character U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS).
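The example in the preceding paragraph can be reproduced with Python's standard `unicodedata` module. The sketch shows that the two spellings differ as raw strings, and therefore as bytes on the wire, but compare equal once both are normalized. The helper name `matches_after_normalization` is hypothetical.

```python
import unicodedata

DECOMPOSED = "a\u0308"   # LATIN SMALL LETTER A + COMBINING DIAERESIS
PRECOMPOSED = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS

def matches_after_normalization(a: str, b: str, form: str = "NFKC") -> bool:
    """Compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

if __name__ == "__main__":
    print("raw strings equal:", DECOMPOSED == PRECOMPOSED)
    print("UTF-8 bytes:", DECOMPOSED.encode("utf-8"), PRECOMPOSED.encode("utf-8"))
    print("equal after NFKC:", matches_after_normalization(DECOMPOSED, PRECOMPOSED))
```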
This Unicode normalization process accounts only for simple character equivalences, not equivalences that are language or script dependent. For example, as mentioned above, the characters U+00F8 (LATIN SMALL LETTER O WITH STROKE) and U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) are considered to match in Swedish (and some other languages), but not for all languages that use either of the characters. Having these characters be treated as equivalent in some contexts and not in others requires decisions and mechanisms that, in turn, depend much more on context than either IDNA or the Unicode character-based normalization tables can provide.
Additional complications occur if the sequences are more complicated or if an attacker is making a deliberate effort to confuse the normalization process. For example, if the sequence U+0069 U+0308 (LATIN SMALL LETTER I followed by COMBINING DIAERESIS) appears, the Unicode Normalization Method known as NFKC maps it into U+00EF (LATIN SMALL LETTER I WITH DIAERESIS), which is what one would predict. But consider U+0131 U+0308 (LATIN SMALL LETTER DOTLESS I and COMBINING DIAERESIS): is that the same character? Is U+0131 U+0307 U+0307 (dotless i and two combining dot-above characters) equivalent to U+00EF or U+0069, or neither? NFKC does not appear to tell us, nor does the definition of U+0307 appear to tell us what happens when it is combined with other "symbol above" arrangements (unlike some of the "accent above" combining characters, which more or less specify kerning). Similar issues arise when U+00EF is combined with various dot-above combining characters. Each of these questions provides some opportunities for spoofing if different display implementations interpret the rules in different ways.
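What NFKC actually does with short sequences of this kind can be inspected directly. The sketch below feeds a few i-based sequences through the normalizer in Python's `unicodedata` module and reports the resulting code points. It makes no claim about how any particular font or display engine renders the results, which is where the spoofing risk described above arises. The names `SEQUENCES`, `describe`, and `nfkc_report` are illustrative.

```python
import unicodedata

SEQUENCES = [
    "i\u0308",             # i + COMBINING DIAERESIS
    "i\u0307",             # i + COMBINING DOT ABOVE
    "\u0131\u0308",        # dotless i + COMBINING DIAERESIS
    "\u0131\u0307\u0307",  # dotless i + two COMBINING DOT ABOVE
]

def describe(text: str) -> str:
    """Render a string as a space-separated list of code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def nfkc_report(sequences) -> dict[str, str]:
    """Map each input sequence to its NFKC form, both as code point lists."""
    return {describe(s): describe(unicodedata.normalize("NFKC", s))
            for s in sequences}

if __name__ == "__main__":
    for before, after in nfkc_report(SEQUENCES).items():
        print(f"{before} -> {after}")
```

Sequences that the normalizer leaves unchanged but that render like a precomposed character are exactly the ones a registry or user interface has to treat with suspicion.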
If we leave Latin scripts and examine those based on Chinese characters, we see there is also an absence of specific, lexigraphic, rules for transformations between Traditional and Simplified Chinese. Even if there were such rules, unification of Japanese and Korean characters with Chinese ones would make it impossible to normalize Traditional Chinese characters into Simplified Chinese ones without causing problems in Japanese and Korean use of the same characters.
More generally, while some mappings, such as those between precomposed Latin script characters and the equivalent multiple code point composed character sequences, depend only on the characters themselves, in many or most cases, such as the case with Swedish above, the mapping is language or culturally dependent. There have been discussions as to whether different canonicalization rules (in addition to or instead of Unicode normalization) should be, or could be, applied differently to different languages or scripts. The fact that most scripts included in Unicode have been initially incorporated by copying an existing standard more or less intact has impact on the optimization of these algorithms and on forward compatibility. Even if the language is known and language-specific rules can be defined, dependencies on the language do not disappear. Canonicalization operations are not possible unless they either depend only on short sequences of text or have significant context available that is not obvious from the text itself. DNS lookups and many other operations do not have a way to capture and utilize the language or other information that would be needed to provide that context.
These variations in languages and in user perceptions of characters make it difficult or impossible to provide uniform algorithms for matching Unicode strings in a way that no end users are ever surprised by the result. For closely-related scripts or characters, surprises may even be frequent. However, because uniform algorithms are required for mappings that are applied when names are looked up in the DNS, the rules that are chosen will always represent an approximation that will be more or less successful in minimizing those user surprises. The current Nameprep and Stringprep algorithms use mapping tables to "normalize" different representations of the same text to a single form so that matching is possible.
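Python's standard library happens to ship an implementation of the Nameprep profile (`encodings.idna.nameprep`) and an "idna" codec that implements the original IDNA specification, so the effect of the mapping tables can be observed without extra software. The wrapper names `prepared` and `to_ascii` below are hypothetical conveniences, and the sketch says nothing about how other implementations behave.

```python
from encodings.idna import nameprep

def prepared(label: str) -> str:
    """Apply the standard library's Nameprep implementation to one label."""
    return nameprep(label)

def to_ascii(name: str) -> bytes:
    """Encode a dotted name with the built-in IDNA codec (Nameprep + Punycode)."""
    return name.encode("idna")

if __name__ == "__main__":
    # Different representations of the same text collapse to one form...
    print(ascii(prepared("B\u00dcCHER")), ascii(prepared("bu\u0308cher")))
    # ...while language-dependent "equivalents" remain distinct.
    print(ascii(prepared("torbj\u00f8rn")), ascii(prepared("torbj\u00f6rn")))
    print(to_ascii("b\u00fccher.example"))
```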
More details on the creation of the normalization algorithms can be found in the Unicode Specification and the associated Technical Reports [UTR] and Annexes. Technical Report #36 [UTR36] and [UTR39] are specifically related to the IDN discussion.
URLs and other identifiers appear, not only in electronic forms from which they can (at least in principle) be accurately copied and "pasted" but in printed forms from which the user must transcribe them into the computer system. This is often known as the "side-of-the-bus problem" because a particularly problematic version of it requires that the user be able to observe and accurately remember a URL that is quickly glimpsed in a transient form -- a billboard seen while driving, a sign on the side of a passing vehicle, a television advertisement that is not frequently repeated or on-screen for a long time, and so on.
The difficulty, in short, is that two Unicode strings that are actually different might look exactly the same, especially when there is no time to study them. This is because, for example, some glyphs in Cyrillic, Greek, and Latin do look the same, but have been assigned different code points in Unicode. Worse, one needs to be reasonably familiar with a script and how it is used to understand how much characters can reasonably vary as the result of artistic fonts and typography. For example, there are a few fonts for Latin characters that are sufficiently highly ornamented that an observer might easily confuse some of the characters with characters in Thai script. Uppercase ITC Blackadder (a registered trademark of International Typeface Corporation) and Curlz MT are two fairly obvious examples; these fonts use loops at the end of serifs, creating a resemblance to Thai (in some fonts) for some characters.
Some scripts (and because of that some words in some languages) are written not left to right, but right to left. And, to complicate things, one might have something written in Arabic script right to left that includes some characters that are read from left to right, such as European-style digits. This implies that some texts might have a mixed left-to-right AND right-to-left order (even though in most implementations, and in IDNA, all texts have a major direction, with the other as an exception).
IDNA permits the inclusion of European digits in a label that is otherwise a sequence of right-to-left characters, but prohibits most other mixed-directional (or bidirectional) strings. This prohibition can cause other problems such as the rejection of some otherwise linguistically and culturally sensible strings. As Unicode and conventions for handling so-called bidirectional ("BIDI") strings evolve, the prohibition in IDNA should be reviewed and reevaluated.
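The restriction described above comes from the bidirectional rule that Stringprep applies, and Python's standard `stringprep` module exposes the two character tables involved: `in_table_d1` (right-to-left characters) and `in_table_d2` (left-to-right characters). The function `bidi_ok` below is a hypothetical sketch of that rule, written from the description above rather than copied from any implementation. It also shows one of the "otherwise sensible" rejections: a right-to-left label that ends in a digit.

```python
import stringprep

def bidi_ok(label: str) -> bool:
    """Sketch of the Stringprep bidirectional check for a single label."""
    has_rtl = any(stringprep.in_table_d1(ch) for ch in label)
    if not has_rtl:
        return True  # labels without right-to-left characters are unaffected
    if any(stringprep.in_table_d2(ch) for ch in label):
        return False  # right-to-left and left-to-right letters are mixed
    # A right-to-left label must both start and end with a right-to-left character.
    return stringprep.in_table_d1(label[0]) and stringprep.in_table_d1(label[-1])

if __name__ == "__main__":
    samples = {
        "Hebrew letters only": "\u05d0\u05d1",
        "digit in the middle": "\u05d01\u05d1",
        "digit at the end": "\u05d0\u05d11",
        "Latin letter mixed in": "\u05d0a\u05d1",
    }
    for description, label in samples.items():
        print(description, bidi_ok(label))
```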
Similar-looking characters in identifiers can cause actual problems on the Internet since they can result, deliberately or accidentally, in people being directed to the wrong host or mailbox by believing that they are typing, or clicking on, intended characters that are different from those that actually appear in the domain name or reference. See Section 4.1.3 for further discussion of this issue.
IDNs complicate these issues, not only by providing many additional characters that look sufficiently alike to be potentially confused, but also by raising new policy questions. For example, if a language can be written in two different scripts, is a label constructed from a word written in one script equivalent to a label constructed from the same word written in the other script? Is the answer the same for words in two different languages that translate into each other?
It is now generally understood that, in addition to the collision problems of possibly equivalent words and hence labels, it is possible to utilize characters that look alike -- "confusable" characters -- to spoof names in order to mislead or defraud users. That issue, driven by particular attacks such as those known as "phishing", has introduced stronger requirements for registry efforts to prevent problems than were previously generally recognized as important.
One commonly-proposed approach is to have a registry establish restrictions on the characters, and combinations of characters, it will permit to be included in a string to be registered as a label. Taking the Swedish top-level domain, .SE, as an example, a rule might be adopted that the registry "only accepts registrations in Swedish, using Latin script, and because of this, Unicode characters Latin-a, -b, -c,...". But, because there is not a 1:1 mapping between country and language, even a Country Code Top Level Domain (ccTLD) like .SE might have to accept registrations in other languages. For example, there may be a requirement for Finnish (the second most-used language in Sweden). What rules and code points are then defined for Finnish? Does it have special mappings that collide with those that are defined for Swedish? And what does one do in countries that use more than one script? (Finnish and Swedish use the same script.) In all cases, the dispute will ultimately be about whether two strings are the same (or confusingly similar) or not. That, in turn, will generate a discussion of how one defines "what is the same" and "what is similar enough to be a problem".
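An inclusion-based registry table of the kind described above might be sketched as follows. Both tables are hypothetical and are not the actual policy of .SE or any other registry: `SWEDISH` lists the basic Latin letters, digits, hyphen, and the three additional Swedish letters, and `FINNISH_EXTRA` adds two letters that Finnish orthography uses in loanwords. The function `rejected_code_points` simply reports what a given table would refuse.

```python
import string

# Hypothetical code point table for registrations in Swedish (Latin script).
SWEDISH = set(string.ascii_lowercase + string.digits + "-" + "\u00e5\u00e4\u00f6")

# Hypothetical additions if Finnish registrations were also accepted:
# s with caron and z with caron.
FINNISH_EXTRA = set("\u0161\u017e")

def rejected_code_points(label: str, permitted: set[str]) -> list[str]:
    """List the code points in a label that the table does not permit."""
    return [f"U+{ord(ch):04X}" for ch in label if ch not in permitted]

if __name__ == "__main__":
    label = "\u0161akki"
    print(rejected_code_points(label, SWEDISH))
    print(rejected_code_points(label, SWEDISH | FINNISH_EXTRA))
```

Merging tables for two languages is easy in this case because they share a script. The harder question raised above, whether the merged table creates new collisions, is not something a set union can answer.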
Another example arose recently that further illustrates the problem. If one were to use Cyrillic characters to represent the country code for Russia in a localized equivalent to the ccTLD label, the characters themselves would be indistinguishable from the Latin characters "P" and "Y" (in either lower- or uppercase) in most fonts. We presume this might cause some consternation in Paraguay.
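The Cyrillic and Latin strings in this example are different code point sequences even though they may be indistinguishable on screen. The sketch below makes that visible and shows a simplified "skeleton" comparison, loosely in the spirit of the confusability data discussed in [UTR39]. The table `LOOKALIKES` is a tiny hypothetical stand-in for such data, and `skeleton` and `confusable` are illustrative names, not an existing API.

```python
import unicodedata

CYRILLIC_RU = "\u0440\u0443"  # CYRILLIC SMALL LETTER ER, CYRILLIC SMALL LETTER U
LATIN_PY = "py"

# Hypothetical, very small look-alike table (Cyrillic -> Latin).
LOOKALIKES = {"\u0440": "p", "\u0443": "y", "\u0430": "a",
              "\u0435": "e", "\u043e": "o"}

def skeleton(label: str) -> str:
    """Replace each character by a representative look-alike, if one is listed."""
    normalized = unicodedata.normalize("NFKC", label)
    return "".join(LOOKALIKES.get(ch, ch) for ch in normalized)

def confusable(a: str, b: str) -> bool:
    """True when two distinct labels reduce to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)

if __name__ == "__main__":
    for label in (CYRILLIC_RU, LATIN_PY):
        print([unicodedata.name(ch) for ch in label])
    print("confusable:", confusable(CYRILLIC_RU, LATIN_PY))
```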
These difficulties can never be completely eliminated by algorithmic means. Some of the problem can be addressed by appropriate tuning of the protocols and their tables, other parts by registry actions to reduce confusion and conflicts, and still other parts can be addressed by careful design of user interfaces in application programs. But, ultimately, some responsibility to avoid being tricked or harmfully confused will rest with the user.
Another registry technique that has been extensively explored involves looking at confusable characters and confusion between complete labels, restricting the labels that can be registered based on relationships to what is registered already. Registries that adopt this approach might establish special mapping rules such as:
1. If you register something with code point A, domain names with B instead of A will be blocked from registration by others (where B is a character at a separate code point that has a confusingly similar appearance to A).
2. If you register something with code point A, you also get the domain name with B instead of A.
These approaches are discussed in more detail for "CJK" characters in RFC 3743 [RFC3743] and more generally in RFC 4290 [RFC4290].
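The two mapping rules listed above differ only in what happens to the variant labels, which a small model makes concrete. The following is a sketch under stated assumptions, not the procedure of [RFC3743] or [RFC4290]: the `CONFUSABLE` table, the `variants` function (which swaps only one character at a time), and the `Registry` class are all hypothetical.

```python
# Hypothetical table of confusable pairs (A <-> B), listed in both directions.
CONFUSABLE = {"\u00f8": "\u00f6", "\u00f6": "\u00f8"}

def variants(label: str) -> set[str]:
    """Labels obtained by swapping a single confusable character."""
    out = set()
    for i, ch in enumerate(label):
        if ch in CONFUSABLE:
            out.add(label[:i] + CONFUSABLE[ch] + label[i + 1:])
    return out

class Registry:
    def __init__(self, bundle: bool = False):
        self.bundle = bundle                 # False: rule 1 (block); True: rule 2 (bundle)
        self.owner: dict[str, str] = {}      # delegated label -> registrant
        self.reserved: dict[str, str] = {}   # blocked label -> registrant who caused the block

    def register(self, label: str, registrant: str) -> bool:
        if label in self.owner:
            return False  # already delegated
        if self.reserved.get(label, registrant) != registrant:
            return False  # blocked by someone else's earlier registration
        self.owner[label] = registrant
        for variant in variants(label):
            if variant in self.owner:
                continue
            if self.bundle:
                self.owner[variant] = registrant            # rule 2
            else:
                self.reserved.setdefault(variant, registrant)  # rule 1
        return True
```

Under rule 1 the original registrant may still register the blocked variant later; under rule 2 it is delegated at once. Either way, the outcome depends entirely on what the registry chooses to put in its table.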
The issues above, at least as they were understood at the time, provided the background for the IESG statement included in Section 1.6.1 (which, in turn, was part of the basis for the initial ICANN Guidelines) that a registry should have a policy about the scripts, languages, code points and text directions for which registrations will be accepted. While "accept all" might be an acceptable policy, it implies there is also a dispute resolution process that takes the problems listed above into account. This process must be designed for dealing with all types of potential disputes. For example, issues might arise between registrant and registry over a decision by the registry on collisions with already registered domain names and between registrant and trademark holder (that a domain name infringes on a trademark). In both cases, the parties disagreeing have different views on whether two strings are "equivalent" or not. They may believe that a string that is not allowed to be registered is actually different from one that is already registered. Or they might believe that two strings are the same, even though the rules adopted by the registry to prevent confusion define them as two different domain names.
While opinions differ about how important the issues are in practice, the use of Unicode and its supporting tables for IDNA appears to be far more sensitive to subtle changes than it is in typical Unicode applications. This may be, at least in part, because many other applications are internally sensitive only to the appearance of characters and not to their representation. Or those applications may be able to take effective advantage of script, language, or character class identification. The working group that developed IDNA concluded that attempting to encode any ancillary character information into the DNS label would be impractical and unwise, and the IAB, based in part on the comments in the ad hoc committee, saw no reason to review that decision.
The Unicode Consortium has sometimes used the likelihood of a combination of characters actually appearing in a natural language as a criterion for the safety of a possible change. However, as discussed above, DNS names are often fabrications -- abbreviations, strings deliberately formed to be unusual, members of a series sequenced by numbers or other characters, and so on. Consequently, a criterion that considers a change to be safe if it would not be visible in properly-constructed running text is not helpful for DNS purposes: a change that would be safe under that criterion could still be quite problematic for the DNS.
This sensitivity to changes has made it quite difficult to migrate IDNA from one version of Unicode to the next if any changes are made that are not strictly additive. A change in a code point assignment or definition may be extremely disruptive if a DNS label has been defined using the earlier form and any of its previous components has been moved from one table position or normalization rule to another. Unicode normalization tables, tables of scripts or languages and characters that belong to them, and even tables of confusable characters as an adjunct to security recommendations may be very helpful in designing registry restrictions on registrations and applications provisions for avoiding or identifying suspicious names. Ironically, they also extend the sensitivity of IDNA and its implementations to all forms of change between one version of Unicode and the next. Consequently, they make Unicode version migration more difficult.
An example of the type of change that appears to be just a small correction from one perspective but may be problematic from another was the correction to the normalization definition in 2004 [Unicode-PR29]. Community input suggested that the change would cause problems for Stringprep, but the Unicode Technical Committee decided, on balance, that the change was worthwhile. Because of difficulties with consistency, some deployed implementations have decided to adopt the change and others have not, leading to subtle incompatibilities.
This situation leads to a dilemma. On the one hand, it is completely unacceptable to freeze IDNA at a Unicode version level that excludes more recently-defined characters and scripts that are important to those who use them. On the other hand, it is equally unacceptable to migrate from one version of Unicode to the next if such migration might invalidate an existing registered DNS name or some of its registered properties or might make the string or representation of that name ambiguous. If IDNA is to be modified to accommodate new versions of Unicode, the IETF will need to work with the Unicode Consortium and other bodies to find an appropriate balance in this area, but progress will be possible only if all relevant parties are able to fairly consider and discuss possible decisions that may be very difficult and unpalatable.
It would also prove useful if, during the course of that dialog, the need for Unicode Consortium concern with security issues in applications of the Unicode character set could be clarified. It would be unfortunate from almost every perspective considered here, if such matters slowed the inclusion of as yet unencoded scripts.
One of the advantages of the Unicode model of combining characters, as with previous systems that use character overstriking to accomplish similar purposes, is that it is possible to use sequences of code points to generate characters that are not explicitly provided for in the character set. However, unless sequences that are not explicitly provided for are prohibited by some mechanism (such as the normalization tables), such combining sequences can permit two related dangers.
o The first is another risk of character confusion, especially if the relationship of the combining character with the characters it combines with is not precisely defined or unexpected combinations of combining characters are used. That issue is discussed in more detail, with an example, in Section 2.2.3.
o These same issues also inherently impact the stability of the normalization tables. Suppose that, somewhere in the world, there is a character that looks like a Roman-derived lowercase "i", but with three (not one or two) dots above it. And suppose that the users of that character agree to represent it by combining a traditional "i" (U+0069) with a combining diaeresis (U+0308). So far, no problem. But, later, a broader need for this character is discovered and it is coded into Unicode either as a single precomposed character or, more likely under existing rules, by introducing a three-dot-above combining character. In either case, that version of Unicode should include a rule in NFKC that maps the "i"-plus-diaeresis sequence into the new, approved, one. If one does not do so, then there is arguably a normalization that should occur that does not. If one does so, then strings that were valid and normalized (although unanticipated) under the previous versions of Unicode become unnormalized under the new version. That, in turn, would impact IDNA comparisons because, effectively, it would introduce a change in the matching rules.
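The way normalization treats combining sequences can be observed with any Unicode library. The following is a minimal Python sketch using the standard `unicodedata` module; the helper name `is_stable_under_nfkc` is an illustrative assumption, not part of any specification. It shows that a sequence with a precomposed equivalent is rewritten by NFKC, while a sequence with no precomposed equivalent (here "q" plus U+0308) passes through unchanged. A string of the second kind is the sort that would stop being normalized if a later Unicode version introduced a mapping for it.

```python
import unicodedata


def is_stable_under_nfkc(text: str) -> bool:
    """Return True if NFKC normalization leaves the text unchanged."""
    return unicodedata.normalize("NFKC", text) == text


# "i" + COMBINING DIAERESIS has a precomposed equivalent (U+00EF), so
# NFKC rewrites the two-code-point sequence into a single code point.
composed = unicodedata.normalize("NFKC", "i\u0308")

# "q" + COMBINING DIAERESIS has no precomposed equivalent, so the
# sequence is already in normalized form and is left as it is.
unanticipated = "q\u0308"

print([f"U+{ord(ch):04X}" for ch in composed])
print(is_stable_under_nfkc(unanticipated))
```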
It would be useful to consider rules that would avoid or minimize these problems with the understanding that, for reasons given elsewhere, simply minimizing them may not be good enough for IDNA. One partial solution might be to ban any combination of a base character and a combining character that does not appear in a hypothetical "anticipated combinations" table from being used in a domain name label. The next subsection discusses a more radical, if impractical, view of the problem and its solutions.
For several reasons, including those discussed above, one thing that increases IDNA complexity and the need for normalization is that combining characters are permitted. Without them, complexity might be reduced enough to permit easier transitions to new versions. The community should consider the impact of entirely prohibiting combining characters from IDNs. While it is almost certainly unfeasible to introduce this change into Unicode as it is now defined and doing so would be extremely disruptive even if it were feasible, the thought experiment can be helpful in understanding both the issues and the implications of the paths not taken. For example, one consequence of this, of course, is that each new language or script, and several existing ones, would require that all of its characters have Unicode assignments to specific, precomposed, code points.
Note that this is not currently permitted within Unicode for Latin scripts. For non-Latin scripts, some such code points have been defined. The decisions that govern the assignment of such code points are managed entirely within the Unicode Consortium. Were the IETF to choose to reduce IDNA complexity by excluding combining characters, no doubt there would be additional input to the Unicode Consortium from users and proponents of scripts that precomposed characters be required. The IAB and the IETF should examine whether it is appropriate to press the Unicode Consortium to revise these policies or otherwise to recommend actions that would reduce the need for normalization and the related complexities. However, we have been told that the Technical Committee does not believe it is reasonable or feasible to add all possible precomposed characters to Unicode. If Unicode cannot be modified to contain the precomposed characters necessary to support existing languages and scripts, much less new ones, this option for IDN restrictions will not be feasible.
In many Unicode applications, the preferred solution is to pick a style of normalization and require that all text that is stored or transmitted be normalized to that form. (This is the approach taken in ongoing work in the IETF on a standard Unicode text form [net-utf8]). IDNA does not impose this requirement. Text is normalized and case-reduced at registration time, and only the normalized version is placed in the DNS. However, there is no requirement that applications show only the native (and lower-case where appropriate) characters associated with the normalized form in discussions or references such as URLs. If conventions used for all-ASCII DNS labels are to be extended to internationalized forms, such a requirement would be unreasonable, since it would prohibit the use of mixed-case references for clarity or market identification. It might even be culturally inappropriate. However, without that restriction, the comparison that will ultimately be made in the DNS will be between strings normalized at different times and under different versions of Unicode. The assertion that a string in normalized form under one version of Unicode will still be in normalized form under all future versions is not sufficient. Normalization at different times also requires that a given source string always normalizes to the same target string, regardless of the version under which it is normalized. That criterion is much more difficult to fulfill. The discussion above suggests that it may even be impossible.
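The stronger criterion, that a given source string normalizes to the same target under every version, can at least be spot-checked for individual strings. The sketch below assumes CPython, whose `unicodedata` module exposes a Unicode 3.2 database as `unicodedata.ucd_3_2_0` alongside the database of the running interpreter. The function names are hypothetical. A check like this can only reveal instability for the strings it is given; it cannot demonstrate stability in general.

```python
import unicodedata


def normalize_under_both(text: str) -> tuple[str, str]:
    """NFKC-normalize text under Unicode 3.2 and under the running version."""
    old = unicodedata.ucd_3_2_0.normalize("NFKC", text)
    new = unicodedata.normalize("NFKC", text)
    return old, new


def is_version_stable(text: str) -> bool:
    """True if both Unicode versions produce the same normalized string."""
    old, new = normalize_under_both(text)
    return old == new


# "bu" + COMBINING DIAERESIS + "cher": an unnormalized spelling of "bücher".
sample = "bu\u0308cher"
print(unicodedata.unidata_version, is_version_stable(sample))
```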
Ignoring these issues with combining characters entirely, as IDNA effectively does today, may leave us "stuck" at Unicode 3.2, leading either to incompatibilities with applications that otherwise use a modern version of Unicode (while IDN remains at Unicode 3.2) or to painful transitions to new versions. If decisions are made quickly, it may still be possible to make a one-time version upgrade to Version 4.1 or Version 5 of Unicode. However, unless we can impose sufficient global restrictions to permit smooth transitions, upgrading to versions beyond that one is likely to be painful (e.g., potentially requiring changing strings already in the DNS or even a new Punycode prefix) or impossible.
The IETF should consider reviewing RFCs 3454, 3490, 3491, and/or 3492, and update, replace, or supplement them to meet the criteria of this paragraph (one or more of them may prove impractical after further study). Any new versions or additional specifications should be adapted to the version of Unicode that is current when they are created. Ideally, they should specify a path for adapting to future versions of Unicode (some suggestions below may facilitate this). The IETF should also consider whether there are significant advantages to mapping some groups of characters, such as code points assigned to font variations, into others or whether clarity and comprehensibility for the user would be better served by simply prohibiting those characters. More generally, it appears that it would be worthwhile for the IETF to review whether the Unicode normalization rules now invoked by the Stringprep profile in Nameprep are optimal for the DNS or whether more restrictive rules, or an even more restrictive set of permitted character combinations, would provide better support for DNS internationalization.
The IAB has concluded that there is a consensus within the broader community that lists of code points should be specified by the use of an inclusion-based mechanism (i.e., identifying the characters that are permitted), rather than by excluding a small number of characters from the total Unicode set as Stringprep and Nameprep do today. That conclusion should be reviewed by the IETF community and action taken as appropriate.
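The difference between the two mechanisms can be made concrete with a small sketch. In the Python example below, the set `PERMITTED` and both function names are hypothetical; a real inclusion list would come from a reviewed table, not from a literal in code. Under an inclusion-based rule, anything not listed is rejected by default, so code points added in later Unicode versions stay excluded until someone decides to admit them.

```python
# Hypothetical inclusion list: ASCII letters, digits, and hyphen, plus three
# illustrative non-ASCII letters (U+00E5, U+00E4, U+00F6).
PERMITTED = frozenset("abcdefghijklmnopqrstuvwxyz0123456789-") | frozenset(
    "\u00e5\u00e4\u00f6"
)


def rejected_code_points(label: str) -> list[str]:
    """Return the code points in label that are not on the inclusion list."""
    return [f"U+{ord(ch):04X}" for ch in label if ch not in PERMITTED]


def is_permitted(label: str) -> bool:
    """A label is acceptable only if every code point is explicitly listed."""
    return not rejected_code_points(label)


print(is_permitted("sm\u00f6rg\u00e5s"))
print(rejected_code_points("caf\u00e9"))
```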
We suggest that the individuals doing the review of the code points should work as a specialized design team. To the extent possible, that work should be done jointly by people with experience from the IETF and deep knowledge of the constraints of the DNS and application design, participants from the Unicode Consortium, and other people necessary to be able to reach a generally-accepted result. Because any work along these lines would be modifications and updates to standards-track documents, final review and approval of any proposals would necessarily follow normal IETF processes.
It is worth noting that sufficiently extreme changes to IDNA would require a new Punycode prefix, probably with long-term support for both the old prefix and the new one in both registration arrangements and applications. An alternative, which is almost certainly impractical, would be some sort of "flag day", i.e., a date on which the old rules are simultaneously abandoned by everyone and the new ones adopted. However, preliminary analysis indicates that few, if any, of the changes recommended for consideration elsewhere in this document would require this type of version change. For example, suppose additional restrictions, such as those implied above, are imposed on what can be registered. Those restrictions might require policy decisions about how labels are to be disposed of if they conformed to the earlier rules but not to the new ones. But they would not inherently require changes in the protocol or prefix.
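For readers unfamiliar with the prefix under discussion, the following Python sketch shows where it appears in an encoded label. It relies on Python's built-in `idna` and `punycode` codecs, which implement the IDNA procedure of RFC 3490 and the Punycode encoding of RFC 3492. The constant `ACE_PREFIX` and the helper `is_ace_label` are illustrative names only.

```python
ACE_PREFIX = "xn--"  # prefix used for IDNA labels under RFC 3490

label = "b\u00fccher"  # "bücher"

# Full IDNA conversion: Nameprep, then Punycode, then the prefix.
ace = label.encode("idna").decode("ascii")

# The bare Punycode encoding, without the prefix.
raw = label.encode("punycode").decode("ascii")


def is_ace_label(text: str) -> bool:
    """True if the label carries the current IDNA prefix."""
    return text.lower().startswith(ACE_PREFIX)


print(ace, raw, is_ace_label(ace))
```

A changed prefix would mean that every application and registry recognizing labels in this way would need to handle both the old and the new form for some time.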
The IETF should once again examine the extent to which it is appropriate to try to solve internationalization problems via the DNS and what place the many varieties of so-called "keyword systems" or other Internet navigational techniques might have. Those techniques can be designed to impose fewer constraints, or at least different constraints, than IDNA and the DNS. As discussed elsewhere in this document, IDNA cannot support information about scripts, languages, or Unicode versions on lookup. As a consequence of the nature of DNS lookups, characters and labels either match or do not match; a near-match is simply not a possible concept in the DNS. By contrast, observation of near-matching is common in human communication and in matching operations performed by people, especially when they have a particular script or language context in mind. The DNS is further constrained by a fairly rigid internal aliasing system (via CNAME and DNAME resource records), while some applications of international naming may require more flexibility. Finally, the rigid hierarchy of the DNS --and the tendency in practice for it to become flat at levels nearest the root-- and the need for names to be unique are more suitable for some purposes than others and may not be a good match for some purposes for which people wish to use IDNs. Each of these constraints can be relaxed or changed by one or more systems that would provide alternatives to direct use of the DNS by users. Some of the issues involved are discussed further in Section 5.3 and various ideas have been discussed in detail in the IETF or IRTF. Many of those ideas have even been described in Internet Drafts or other documents. As experience with IDNs and with expectations for them accumulates, it will probably become appropriate for the IETF or IRTF to revisit the underlying questions and possibilities.
4.1.3. Security Issues, Certificates, etc.
Some characters look like others, often as the result of common origins. The problem with these "confusable" characters, often incorrectly called homographs, has always existed when characters are presented to humans who interpret what is displayed and then make decisions based on what is seen. This is not a problem that exists only when working with internationalized domain names, but they make the problem worse. The result of a survey that would explain what the problems are might be interesting. Many of these issues are mentioned in Unicode Technical Report #36 [UTR36].
In this and other issues associated with IDNs, precise use of terminology is important lest even more confusion result. The definition of the term 'homograph' that normally appears in dictionaries and linguistic texts states that homographs are different words that are spelled identically (for example, the adjective 'brief' meaning short, the noun 'brief' meaning a document, and the verb 'brief' meaning to inform). By definition, letters in two different alphabets are not the same, regardless of similarities in appearance. This means that sequences of letters from two different scripts that appear to be identical on a computer display cannot be homographs in the accepted sense, even if they are both words in the dictionary of some language. Assuming that there is a language written with Cyrillic script in which "cap" is a word, regardless of what it might mean, it is not a homograph of the Latin-script English word "cap".
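The "cap" example can be checked directly. The short Python sketch below uses the standard `unicodedata` module to list the code points behind the Latin string and a Cyrillic string that looks the same in many fonts; the helper name `describe` is an assumption made for illustration. The two strings share no code points, so the DNS treats them as unrelated labels however alike they look on screen.

```python
import unicodedata

latin = "cap"
cyrillic = "\u0441\u0430\u0440"  # Cyrillic letters ES, A, ER


def describe(text: str) -> list[str]:
    """List each character as its code point and Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(latin) + describe(cyrillic):
    print(line)

# The two strings are different sequences of code points.
print(latin == cyrillic)
```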
When the security implications of visually confusable characters were brought to the forefront in 2005, the term homograph was used to designate any instance of graphic similarity, even when comparing individual characters. This usage is not only incorrect, but risks introducing even more confusion and hence should be avoided. The current preferred terminology is to describe these similar-looking characters as "confusable characters" or even "confusables".
Many people have suggested that confusable characters are a problem that must be addressed, at least in part, directly in the user interfaces of application software. While it should almost certainly be part of a complete solution, that approach creates its own set of difficulties. For example, a user switching between systems, or even between applications on the same system, may be surprised by different types of behavior and different levels of protection. In addition, it is unclear how a secure setup for the end user should be designed. Today, in the web browser, a padlock is a traditional way of describing some level of security for the end user. Is this binary signaling enough? Should there be any connection between a risk for a displayed string including confusable characters and the padlock or similar signaling to the user?
Many web browsers have adopted a convention, based on a "whitelist" or similar technique, of restricting the display of native characters to subdomains of top-level domains that are deemed to have safe practices for the registration of potentially confusable labels. IDNs in other domains are displayed as Punycode. These techniques may not be sufficiently sensitive to differences in policies among top-level domains and their subdomains and so, while they are clearly helpful, they may not be adequate. Are other methods of dealing with confusable characters possible? Would other methods of identifying and listing policies about avoiding confusing registrations be feasible and helpful?
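The display convention described above can be sketched in a few lines. The Python example below is only an illustration of the general idea, not a description of any particular browser: the function name, the `trusted_tlds` parameter, and the reserved names "example" and "test" used as stand-in top-level domains are all assumptions. It also shows the coarseness noted above, since the decision depends on the top-level domain alone and ignores the policies of zones beneath it.

```python
def display_form(domain: str, trusted_tlds: set[str]) -> str:
    """Return the form of a domain name that a user interface would show.

    Names under a trusted top-level domain are shown in native characters;
    all other names are shown in their ASCII-compatible encoding.
    """
    tld = domain.rsplit(".", 1)[-1].lower()
    if tld in trusted_tlds:
        return domain
    # Python's built-in "idna" codec converts each label separately.
    return domain.encode("idna").decode("ascii")


trusted = {"example"}  # hypothetical whitelist
print(display_form("b\u00fccher.example", trusted))
print(display_form("b\u00fccher.test", trusted))
```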
It would be interesting to see a more coordinated effort in establishing guidelines for user interfaces. If nothing else, the current whitelists are browser-specific and can, and do, differ between implementations.
Some potential protocol or table changes raise important policy issues about what to do with existing, registered, names. Should such changes be needed, their impact must be carefully evaluated in the IETF, ICANN, and possibly other forums. In particular, protocol or policy changes that would not permit existing names to be registered under the newer rules should be considered carefully, balancing their importance against possible disruption and the issues of invalidating older names against the importance of consistency as seen by the user.
Work is going on in the IETF related to the local part of email addresses. It should be noted that the local part of email addresses has much different syntax and constraints than a domain name label, so to directly apply IDNA on the local part is not possible.
Unicode and the closely-related ISO 10646 are the only coded character sets that aspire to include all of the world's characters. As such, they permit use of international characters without having to identify particular character coding standards or tables. The requirement for a single character set is particularly important for use with the DNS since there is no place to put character set identification. The decision to use Unicode as the base for IETF protocols going forward is discussed in [RFC2277]. The IAB does not see any reason to revisit the decision to use Unicode in IETF protocols.
IDNs create new types of collisions between trademarks and domain names as well as collisions between domain names. These have impact on dispute resolution processes used by registries and otherwise. It is important that deployment of IDNs evolve in parallel with review and updating of ICANN or registry-specific dispute resolution processes.
The IAB recommends that registries use an inclusion-based model when choosing what characters to allow at the time of registration. This list of characters is in turn to be a subset of what is allowed according to the updated IDNA standard. The IAB further recommends that registries develop their inclusion-based models in parallel with dispute resolution process at the registry itself.
Most established policies for dealing with claimed or apparent confusion or conflicts of names are based on dispute resolution. Decisions about legitimate use or registration of one or more names are resolved at or after the time of registration on a case-by-case basis and using policies that are specific to the particular DNS zone or jurisdiction involved. These policies have generally not been extended below the level of the DNS that is directly controlled by the top-level registry.
Because of the number of conflicts that can be generated by the larger number of available and confusable characters in Unicode, we recommend that registration-restriction and dispute resolution policies be developed to constrain registration of IDNs and zone administrators at all levels of the DNS tree. Of course, many of these policies will be less formal than others and there is no requirement for complete global consistency, but the arguments for reduction of confusable characters and other issues in TLDs should apply to all zones below that specific TLD.
Consistency across all zones can obviously only be accomplished by changes to the protocols. Such changes should be considered by the IETF if particular restrictions are identified that are important and consistent enough to be applied globally.
Some potential protocol changes or changes to character-mapping tables might, if adopted, have profound registry policy implications. See Section 4.1.4.
The IAB has concluded that there is not one issue with IDNs at the top level of the DNS (IDN TLDs) but at least three very separate ones:
o If IDNs are to be entered in the root zone, decisions must first be made about how these TLDs are to be named and delegated. These decisions fall within the traditional IANA scope and are ICANN issues today.
o There has been discussion of permitting some or all existing TLDs to be referenced by multiple labels, with those labels presumably representing some understanding of the "name" of the TLD in different languages. If actual aliases of this type are desired for existing domains, the IETF may need to consider whether the use of DNAME records in the root is appropriate to meet that need, what constraints, if any, are needed, whether alternate approaches, such as those of [RFC4185], are appropriate or whether further alternatives should be investigated. But, to the extent to which aliases are considered desirable and feasible, decisions presumably must be made as to which, if any, root IDN labels should be associated with DNAME records and which ones should be handled by normal delegation records or other mechanisms. That decision is one of DNS root-level namespace policy and hence falls to ICANN although we would expect ICANN to pay careful attention to any technical, operational, or security recommendations that may be produced by other bodies.
o Finally, if IDN labels are to be placed in the root zone, there are issues associated with how they are to be encoded and deployed. This area may have implications for work that has been done, or should be done, in the IETF.
Consistent with the framework described above, the IAB offers these recommendations as steps for further consideration in the identified groups.
Generalize from the original "hostname" rules to non-ASCII characters, permitting as few characters as possible to do that job. This would involve a restrictive model for characters permitted in IDN labels, thus contrasting with the approach used to develop the original IDNA/Nameprep tables. That approach was to include all Unicode characters that there was not a clear reason to exclude.
The specific recommendation here is to specify such internationalized hostnames. Such an activity would fall to the IETF, although the task of developing the appropriate list of permitted characters will require effort both in the IETF and elsewhere. The effort should be as linguistically and culturally sensitive as possible, but smooth and effective operation of the DNS, including minimizing of complexity, should be primary goals. The following should be considered as possible mechanisms for achieving an appropriate minimum number of characters.
Unicode characters that are not needed to write words or numbers in any of the world's languages should be eliminated from the list of characters that are appropriate in DNS labels. In addition to such characters as those used for box-drawing and sentence punctuation, this should exclude punctuation for word structure and other delimiters. While DNS labels may conveniently be used to express words in many circumstances, the goal is not to express words (or sentences or phrases), but to permit the creation of unambiguous labels with good mnemonic value.
用世界上任何一种语言书写单词或数字时不需要的Unicode字符应该从DNS标签中合适的字符列表中删除。除了用于方框图和句子标点符号的字符外,还应排除用于单词结构和其他分隔符的标点符号。虽然DNS标签可以方便地用于在许多情况下表达单词,但目标不是表达单词(或句子或短语),而是允许创建具有良好记忆价值的明确标签。
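As an illustration only, the rule described above can be approximated with Unicode general categories. The sketch below assumes that letters (L*), combining marks (M*), and decimal digits (Nd) stand in for "characters needed to write words or numbers". That mapping, and the names is_candidate_char and rejected_chars, are assumptions of this example and are not defined by IDNA or by this document.

```python
import unicodedata
from typing import List

# Assumption: these general categories approximate "needed to write
# words or numbers". A real permitted-character list would need
# script-by-script review.
ALLOWED_MAJOR_CLASSES = {"L", "M"}   # letters and combining marks
ALLOWED_EXACT_CATEGORIES = {"Nd"}    # decimal digits


def is_candidate_char(ch: str) -> bool:
    """Return True if ch falls in one of the assumed permitted categories."""
    category = unicodedata.category(ch)
    return (category[0] in ALLOWED_MAJOR_CLASSES
            or category in ALLOWED_EXACT_CATEGORIES)


def rejected_chars(label: str) -> List[str]:
    """List the characters of a label that the sketch would exclude."""
    return [ch for ch in label if not is_candidate_char(ch)]


if __name__ == "__main__":
    # U+2500 is a box-drawing character; "." and "!" are punctuation.
    print(rejected_chars("ab\u2500c"))
    print(rejected_chars("a.b!"))
```

Under these assumptions the ASCII hyphen (category Pd) is rejected as well, so a real rule would have to treat it as a separate, explicit exception.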
The inclusion of the hyphen in the original hostname rules is a historical artifact from an older, flat, namespace. The community should consider whether it is appropriate to treat it as a simple legacy property of ASCII names and not attempt to generalize it to other scripts. We might, for example, not permit claimed equivalents to the hyphen from other scripts to be used in IDNs. We might even consider banning use of the hyphen itself in non-ASCII strings or, less restrictively, strings that contained non-Latin characters.
在原始主机名规则中包含连字符是来自旧的、扁平的命名空间的历史产物。社区应该考虑将其视为ASCII名称的简单遗留属性是否合适,而不是试图将其推广到其他脚本。例如,我们可能不允许在IDN中使用其他脚本中声明的连字符等价物。我们甚至可以考虑禁止在非ASCII字符串中使用连字符本身,或者更少限制性地使用包含非拉丁字符的字符串。
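The two hyphen options mentioned above can be sketched as a single check. This is a hypothetical policy, not an adopted rule: it assumes that "claimed equivalents to the hyphen" can be approximated by the Unicode dash-punctuation category (Pd), and that the stricter variant (no U+002D in any label containing non-ASCII characters) is the one chosen. The function name hyphen_policy_ok is invented for the example.

```python
import unicodedata


def hyphen_policy_ok(label: str) -> bool:
    """Apply an illustrative hyphen policy to a single label.

    Assumed rules:
      * U+002D HYPHEN-MINUS is allowed only in all-ASCII labels.
      * Any other dash-like character (category Pd) is never allowed.
    """
    has_non_ascii = any(ord(ch) > 0x7F for ch in label)
    for ch in label:
        if ch == "-":
            if has_non_ascii:
                return False
        elif unicodedata.category(ch) == "Pd":
            return False
    return True


if __name__ == "__main__":
    print(hyphen_policy_ok("my-host"))              # all-ASCII label
    print(hyphen_policy_ok("b\u00fccher-laden"))    # hyphen plus non-ASCII
    print(hyphen_policy_ok("a\u2010b"))             # U+2010 HYPHEN look-alike
```

The less restrictive variant in the text (banning the hyphen only alongside non-Latin characters) would need script information that this sketch does not compute.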
As new scripts, to support new languages, continue to be added to Unicode, it is important that IDNA track updates. If it does not do so, but remains "stuck" at Unicode 3.2 or some single later version, it will not be possible to include labels in the DNS that are derived from words in languages that require characters that are available only in later versions. Making those upgrades is difficult, and will continue to be difficult, as long as new versions require, not just addition of characters, but changes to canonicalization conventions, normalization tables, or matching procedures (see Section 3.1). Anything that can be done to lower complexity and simplify forward transitions should be seriously considered.
随着支持新语言的新脚本不断添加到Unicode中,IDNA跟踪更新非常重要。如果它没有这样做,但仍然停留在3.2或某个更高版本上,则不可能在DNS中包含从需要仅在更高版本中可用的字符的语言中的单词派生的标签。只要新版本不仅需要添加字符,而且还需要更改规范化约定、规范化表或匹配过程(请参见第3.1节),那么进行这些升级是困难的,并且将继续是困难的。任何可以降低复杂性和简化前向转换的方法都应该认真考虑。
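The effect of staying at Unicode 3.2 can be made concrete with a small sketch. Python's unicodedata module happens to ship both its current character database and a frozen Unicode 3.2 view (unicodedata.ucd_3_2_0), so the example below uses them to list characters in a label that are assigned today but were unassigned in 3.2. The function name missing_in_unicode_3_2 and the choice of an N'Ko letter as sample input are assumptions made for illustration.

```python
import unicodedata
from typing import List

# Frozen Unicode 3.2 database that Python provides for IDNA2003 processing.
UCD_3_2 = unicodedata.ucd_3_2_0


def missing_in_unicode_3_2(label: str) -> List[str]:
    """Return characters assigned in the current database but not in 3.2.

    General category "Cn" means "unassigned" in the database consulted.
    """
    return [
        ch for ch in label
        if UCD_3_2.category(ch) == "Cn" and unicodedata.category(ch) != "Cn"
    ]


if __name__ == "__main__":
    print("current Unicode database:", unicodedata.unidata_version)
    # U+07CA NKO LETTER A was added to Unicode after version 3.2.
    print(missing_in_unicode_3_2("a\u07ca"))
```

A label containing any character returned by this function could not be processed under tables that remain fixed at Unicode 3.2.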
We wish to remind the community that there are boundaries to the appropriate uses of the DNS. It was designed and implemented to serve some specific purposes. There are additional things that it does well, other things that it does badly, and still other things it cannot do at all. No amount of protocol work on IDNs will solve problems with alternate spellings, near-matches, searching for appropriate names, and so on. Registration restrictions and carefully designed user interfaces can reduce the risk and pain when attempts to do some of these things go wrong, and can also reduce the risks of various sorts of deliberate bad behavior, but, beyond a certain point, use of the DNS simply because it is available becomes a bad tradeoff. The tradeoff may be particularly unfortunate when the use of IDNs does not actually solve the proposed problem. For example, internationalization of DNS names does not eliminate the ASCII protocol identifiers and structure of URIs [RFC3986] and even IRIs [RFC3987]. Hence, DNS internationalization itself, at any or all levels of the DNS tree, is not a sufficient response to the desire of populations to use the Internet entirely in their own languages and the characters associated with those languages.
我们希望提醒社区,DNS的适当使用是有界限的。它的设计和实现是为了满足某些特定的目的。还有其他一些事情它做得很好,其他一些事情它做得不好,还有一些事情它根本做不到。无论在IDN上做多少协议工作,都无法解决交替拼写、近似匹配、搜索适当名称等问题。注册限制和精心设计的用户界面可以用来降低尝试做这些事情时出错的风险和痛苦,以及降低各种故意不良行为的风险,但是,超过一定程度后,仅仅因为DNS可用而使用DNS将成为一种不好的权衡。当使用IDN并不能真正解决所提出的问题时,这种权衡可能特别不幸。例如,DNS名称的国际化并没有消除URI[RFC3986]甚至IRIs[RFC3987]的ASCII协议标识符和结构。因此,在DNS树的任何或所有级别上,DNS国际化本身都不足以满足人们完全以自己的语言和与这些语言相关的字符使用互联网的愿望。
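The URI example above can be illustrated with a short sketch. It uses Python's built-in "idna" codec, which implements the 2003 IDNA conversion, to show that only the host labels are transformed while the scheme and the delimiters remain ASCII. The function name to_ascii_uri and the sample address are hypothetical, and the sketch deliberately ignores ports, user information, queries, and fragments.

```python
from urllib.parse import urlsplit


def to_ascii_uri(iri: str) -> str:
    """Convert only the host of a simple IRI to its ASCII (ACE) form.

    The scheme ("http") and the "://" and "/" delimiters are ASCII
    protocol elements; IDNA does nothing to internationalize them.
    """
    parts = urlsplit(iri)
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    return "{}://{}{}".format(parts.scheme, ascii_host, parts.path)


if __name__ == "__main__":
    # The host contains U+00FC (u with diaeresis); the scheme stays "http".
    print(to_ascii_uri("http://b\u00fccher.example/katalog"))
```

Even with a fully non-ASCII host name, a user still has to type or read the ASCII scheme and punctuation, which is the limitation the paragraph describes.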
These issues are discussed at greater length, and alternatives are presented, in [RFC2825], [RFC3467], [INDNS], and [DNS-Choices].
[RFC2825]、[RFC3467]、[INDNS]和[DNS-Choices]对这些问题进行了更详细的讨论,并给出了备选方案。
In addition to their presence in the DNS, IDNs introduce issues in other contexts in which domain names are used. In particular, the design and content of databases that bind registered names to information about the registrant (commonly described as "whois" databases) will require review and updating. For example, the whois protocol itself [RFC3912] has no standard capability for handling non-ASCII text: one cannot search consistently for, or report, either a DNS name or contact information that is not in ASCII characters. This may provide some additional impetus for a switch to IRIS [RFC3981] [RFC3982] but also raises a number of other questions about what information, and in what languages and scripts, should be included or permitted in such databases.
IDN除了在DNS中存在外,还引入了使用域名的其他上下文中的问题。特别是,将注册人姓名与注册人信息绑定的数据库(通常称为“whois”数据库)的设计和内容需要审查和更新。例如,whois协议本身[RFC3912]没有处理非ASCII文本的标准功能:无法一致地搜索或报告非ASCII字符的DNS名称或联系人信息。这可能为转换到IRIS[RFC3981][RFC3982]提供了一些额外的动力,但也提出了一些其他问题,即在此类数据库中应包含或允许哪些信息以及哪些语言和脚本。
This document is simply a discussion of IDNs and IDNA issues; it raises no new security concerns. However, if some of its recommendations (reducing IDNA complexity, reducing the number of available characters, and constraining the use of confusable characters) are followed and prove successful, the risks of name spoofing and other problems may be reduced.
本文件仅讨论IDN和IDNA问题;它没有引起新的安全问题。但是,如果遵循其关于降低IDNA复杂性、可用字符数量和限制易混淆字符使用的各种方法的一些建议并证明成功,则名称欺骗和其他问题的风险可能会降低。
The contributions to this report from members of the IAB-IDN ad hoc committee are gratefully acknowledged. Of course, not all of the members of that group endorse every comment and suggestion of this report. In particular, this report does not claim to reflect the views of the Unicode Consortium as a whole or those of particular participants in the work of that Consortium.
感谢IAB-IDN特设委员会成员对本报告的贡献。当然,并非该小组的所有成员都赞同本报告的每一条评论和建议。特别是,本报告并不要求反映整个Unicode联盟或该联盟工作的特定参与者的观点。
The members of the ad hoc committee were: Rob Austein, Leslie Daigle, Tina Dam, Mark Davis, Patrik Faltstrom, Scott Hollenbeck, Cary Karp, John Klensin, Gervase Markham, David Meyer, Thomas Narten, Michael Suignard, Sam Weiler, Bert Wijnen, Kurt Zeilenga, and Lixia Zhang.
特设委员会的成员有:罗布·奥斯汀、莱斯利·戴格尔、蒂娜·达姆、马克·戴维斯、帕特里克·法茨特罗姆、斯科特·霍伦贝克、卡里·卡普、约翰·克莱辛、格瓦塞·马卡姆、大卫·迈耶、托马斯·纳腾、迈克尔·苏伊格纳德、萨姆·韦勒、伯特·维恩、库尔特·泽林加和张丽霞。
Thanks are due to Tina Dam and others associated with the ICANN IDN Working Group for contributions of considerable specific text, to Marcos Sanz and Paul Hoffman for careful late-stage reading and extensive comments, and to Pete Resnick for many contributions and comments, both in conjunction with his former IAB service and subsequently. Olaf M. Kolkman took over IAB leadership for this document after Patrik Faltstrom and Pete Resnick stepped down in March 2006.
感谢Tina Dam和与ICANN IDN工作组相关的其他人对大量具体文本的贡献,感谢Marcos Sanz和Paul Hoffman在后期仔细阅读和广泛评论,感谢Pete Resnick对其前IAB服务和后续服务的许多贡献和评论。在Patrik Faltstrom和Pete Resnick于2006年3月辞职后,Olaf M.Kolkman接管了IAB对该文件的领导权。
Members of the IAB at the time of approval of this document were: Bernard Aboba, Loa Andersson, Brian Carpenter, Leslie Daigle, Patrik Faltstrom, Bob Hinden, Kurtis Lindqvist, David Meyer, Pekka Nikander, Eric Rescorla, Pete Resnick, Jonathan Rosenberg and Lixia Zhang.
在批准本文件时,IAB的成员有:伯纳德·阿博巴、洛亚·安德森、布赖恩·卡彭特、莱斯利·戴格尔、帕特里克·法特斯特罗姆、鲍勃·希登、库尔蒂斯·林克维斯特、大卫·迈耶、佩卡·尼坎德、埃里克·雷索拉、皮特·雷斯尼克、乔纳森·罗森伯格和张丽霞。
[ISO10646] International Organization for Standardization, "Information Technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane", ISO/IEC 10646-1:2000, October 2000.
[ISO10646]国际标准化组织,“信息技术-通用多八位编码字符集(UCS)-第1部分:体系结构和基本多语言平面”,ISO/IEC 10646-1:2000,2000年10月。
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ("stringprep")", RFC 3454, December 2002.
[RFC3454]Hoffman,P.和M.Blanchet,“国际化弦的准备(“stringprep”)”,RFC 3454,2002年12月。
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.
[RFC3490]Faltstrom,P.,Hoffman,P.,和A.Costello,“应用程序中的域名国际化(IDNA)”,RFC 3490,2003年3月。
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[RFC3491]Hoffman,P.和M.Blanchet,“Nameprep:国际化域名(IDN)的Stringprep配置文件”,RFC 3491,2003年3月。
[RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, March 2003.
[RFC3492]Costello,A.,“Punycode:应用程序中国际化域名的Unicode引导字符串编码(IDNA)”,RFC 3492,2003年3月。
[Unicode32] The Unicode Consortium, "The Unicode Standard, Version 3.0", 2000. (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5). Version 3.2 consists of the definition in that book as amended by the Unicode Standard Annex #27: Unicode 3.1 (http://www.unicode.org/reports/tr27/) and by the Unicode Standard Annex #28: Unicode 3.2 (http://www.unicode.org/reports/tr28/).
[Unicode32]Unicode联盟,“Unicode标准,3.0版”,2000年。(Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5)。第3.2版由该书中的定义构成,并经Unicode标准附录#27:Unicode 3.1(http://www.unicode.org/reports/tr27/)和Unicode标准附录#28:Unicode 3.2(http://www.unicode.org/reports/tr28/)修订。
[DNS-Choices] Faltstrom, P., "Design Choices When Expanding DNS", Work in Progress, June 2005.
[DNS-Choices]Faltstrom,P.,“扩展DNS时的设计选择”,正在进行的工作,2005年6月。
[ICANNv1] ICANN, "Guidelines for the Implementation of Internationalized Domain Names, Version 1.0", March 2003, <http://www.icann.org/general/idn-guidelines-20jun03.htm>.
[ICANNv1]ICANN,“国际化域名实施指南,1.0版”,2003年3月,<http://www.icann.org/general/idn-guidelines-20jun03.htm>。
[ICANNv2] ICANN, "Guidelines for the Implementation of Internationalized Domain Names, Version 2.0", November 2005, <http://www.icann.org/general/idn-guidelines-20sep05.htm>.
[ICANNv2]ICANN,“国际化域名实施指南,2.0版”,2005年11月,<http://www.icann.org/general/idn-guidelines-20sep05.htm>。
[IESG-IDN] Internet Engineering Steering Group (IESG), "IESG Statement on IDN", IESG Statements IDN Statement, February 2003, <http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt>.
[IESG-IDN]互联网工程指导小组(IESG),“IESG关于IDN的声明”,IESG声明IDN声明,2003年2月,<http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt>。
[INDNS] National Research Council, "Signposts in Cyberspace: The Domain Name System and Internet Navigation", National Academy Press ISBN 0309-09640-5 (Book) 0309-54979-5 (PDF), 2005, <http://www7.nationalacademies.org/cstb/pub_dns.html>.
[INDNS]国家研究委员会,“网络空间中的路标:域名系统和互联网导航”,国家科学院出版社ISBN 0309-09640-5(图书)0309-54979-5(PDF),2005年,<http://www7.nationalacademies.org/cstb/pub_dns.html>。
[ISO.2022.1986] International Organization for Standardization, "Information Processing: ISO 7-bit and 8-bit coded character sets: Code extension techniques", ISO Standard 2022, 1986.
[ISO.2022.1986]国际标准化组织,“信息处理:ISO 7位和8位编码字符集:代码扩展技术”,ISO标准2022,1986年。
[ISO.646.1991] International Organization for Standardization, "Information technology - ISO 7-bit coded character set for information interchange", ISO Standard 646, 1991.
[ISO.646.1991]国际标准化组织,“信息技术-信息交换用ISO 7位编码字符集”,ISO标准646,1991年。
[ISO.8859.2003] International Organization for Standardization, "Information processing - 8-bit single-byte coded graphic character sets - Part 1: Latin alphabet No. 1 (1998) - Part 2: Latin alphabet No. 2 (1999) - Part 3: Latin alphabet No. 3 (1999) - Part 4: Latin alphabet No. 4 (1998) - Part 5: Latin/Cyrillic alphabet (1999) - Part 6: Latin/Arabic alphabet (1999) - Part 7: Latin/Greek alphabet (2003) - Part 8: Latin/Hebrew alphabet (1999) - Part 9: Latin alphabet No. 5 (1999) - Part 10: Latin alphabet No. 6 (1998) - Part 11: Latin/Thai alphabet (2001) - Part 13: Latin alphabet No. 7 (1998) - Part 14: Latin alphabet No. 8 (Celtic) (1998) - Part 15: Latin alphabet No. 9 (1999) - Part 16: Latin alphabet No. 10 (2001)", ISO Standard 8859, 2003.
[ISO.8859.2003]国际标准化组织,“信息处理-8位单字节编码图形字符集-第1部分:第1号拉丁字母(1998)-第2部分:第2号拉丁字母(1999)-第3部分:第3号拉丁字母(1999)-第4部分:第4号拉丁字母(1998)-第5部分:拉丁/西里尔字母(1999)-第6部分:拉丁/阿拉伯字母表(1999年)-第7部分:拉丁/希腊字母表(2003年)-第8部分:拉丁/希伯来字母表(1999年)-第9部分:第5号拉丁字母表(1999年)-第10部分:第6号拉丁字母表(1998年)-第11部分:拉丁/泰国字母表(2001年)-第13部分:第7号拉丁字母表(1998年)-第14部分:第8号拉丁字母表(凯尔特语)(1998年)-第15部分:第9号拉丁字母(1999)-第16部分:第10号拉丁字母(2001)”,ISO标准8859,2003。
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and Languages", BCP 18, RFC 2277, January 1998.
[RFC2277]Alvestrand,H.,“IETF字符集和语言政策”,BCP 18,RFC 2277,1998年1月。
[RFC2825] IAB and L. Daigle, "A Tangled Web: Issues of I18N, Domain Names, and the Other Internet protocols", RFC 2825, May 2000.
[RFC2825]IAB和L.Daigle,“一个混乱的网络:I18N、域名和其他互联网协议的问题”,RFC 2825,2000年5月。
[RFC3066] Alvestrand, H., "Tags for the Identification of Languages", BCP 47, RFC 3066, January 2001.
[RFC3066]Alvestrand,H.,“语言识别标签”,BCP 47,RFC 3066,2001年1月。
[RFC3467] Klensin, J., "Role of the Domain Name System (DNS)", RFC 3467, February 2003.
[RFC3467]Klensin,J.,“域名系统(DNS)的作用”,RFC 3467,2003年2月。
[RFC3536] Hoffman, P., "Terminology Used in Internationalization in the IETF", RFC 3536, May 2003.
[RFC3536]Hoffman,P.,“IETF国际化中使用的术语”,RFC3536,2003年5月。
[RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint Engineering Team (JET) Guidelines for Internationalized Domain Names (IDN) Registration and Administration for Chinese, Japanese, and Korean", RFC 3743, April 2004.
[RFC3743]Konishi,K.,Huang,K.,Qian,H.,和Y.Ko,“中文、日文和韩文国际化域名(IDN)注册和管理联合工程团队(JET)指南”,RFC 3743,2004年4月。
[RFC3912] Daigle, L., "WHOIS Protocol Specification", RFC 3912, September 2004.
[RFC3912]Daigle,L.,“WHOIS协议规范”,RFC 3912,2004年9月。
[RFC3981] Newton, A. and M. Sanz, "IRIS: The Internet Registry Information Service (IRIS) Core Protocol", RFC 3981, January 2005.
[RFC3981]Newton,A.和M.Sanz,“IRIS:互联网注册信息服务(IRIS)核心协议”,RFC 3981,2005年1月。
[RFC3982] Newton, A. and M. Sanz, "IRIS: A Domain Registry (dreg) Type for the Internet Registry Information Service (IRIS)", RFC 3982, January 2005.
[RFC3982]Newton,A.和M.Sanz,“IRIS:Internet注册表信息服务(IRIS)的域注册表(dreg)类型”,RFC 3982,2005年1月。
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC3986]Berners Lee,T.,Fielding,R.,和L.Masinter,“统一资源标识符(URI):通用语法”,STD 66,RFC 3986,2005年1月。
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005.
[RFC3987]Duerst,M.和M.Suignard,“国际化资源标识符(IRIs)”,RFC 3987,2005年1月。
[RFC4185] Klensin, J., "National and Local Characters for DNS Top Level Domain (TLD) Names", RFC 4185, October 2005.
[RFC4185]Klensin,J.,“DNS顶级域名(TLD)的国家和地方字符”,RFC 4185,2005年10月。
[RFC4290] Klensin, J., "Suggested Practices for Registration of Internationalized Domain Names (IDN)", RFC 4290, December 2005.
[RFC4290]Klensin,J.,“国际域名(IDN)注册的建议做法”,RFC 4290,2005年12月。
[RFC4645] Ewell, D., "Initial Language Subtag Registry", RFC 4645, September 2006.
[RFC4645]Ewell,D.,“初始语言子标签注册”,RFC 4645,2006年9月。
[RFC4646] Phillips, A. and M. Davis, "Tags for Identifying Languages", BCP 47, RFC 4646, September 2006.
[RFC4646]Phillips,A.和M.Davis,“识别语言的标记”,BCP 47,RFC 4646,2006年9月。
[UTR] Unicode Consortium, "Unicode Technical Reports", <http://www.unicode.org/reports/>.
[UTR]Unicode联盟,“Unicode技术报告”<http://www.unicode.org/reports/>.
[UTR36] Davis, M. and M. Suignard, "Unicode Technical Report #36: Unicode Security Considerations", November 2005, <http://www.unicode.org/draft/reports/tr36/tr36.html>.
[UTR36]Davis,M.和M.Suignard,“Unicode技术报告#36:Unicode安全注意事项”,2005年11月,<http://www.unicode.org/draft/reports/tr36/tr36.html>。
[UTR39] Davis, M. and M. Suignard, "Unicode Technical Standard #39 (proposed): Unicode Security Considerations", July 2005, <http://www.unicode.org/draft/reports/tr39/tr39.html>.
[UTR39]Davis,M.和M.Suignard,“Unicode技术标准#39(提议):Unicode安全注意事项”,2005年7月,<http://www.unicode.org/draft/reports/tr39/tr39.html>。
[Unicode-PR29] The Unicode Consortium, "Public Review Issue #29: Normalization Issue", Unicode PR 29, February 2004.
[Unicode-PR29]Unicode联盟,“公共评论问题#29:规范化问题”,Unicode PR 29,2004年2月。
[Unicode10] The Unicode Consortium, "The Unicode Standard, Version 1.0", 1991.
[Unicode10]Unicode联盟,“Unicode标准,1.0版”,1991年。
[W3C-Localization] Ishida, R. and S. Miller, "Localization vs. Internationalization", W3C International/questions/qa-i18n.txt, December 2005.
[W3C-Localization]Ishida,R.和S.Miller,“本地化与国际化”,W3C International/questions/qa-i18n.txt,2005年12月。
[net-utf8] Klensin, J. and M. Padlipsky, "Unicode Format for Network Interchange", Work in Progress, April 2006.
[net-utf8]Klensin,J.和M.Padlipsky,“网络交换的Unicode格式”,正在进行的工作,2006年4月。
Authors' Addresses
作者地址
John C Klensin 1770 Massachusetts Ave, #322 Cambridge, MA 02140 USA
John C Klensin,1770 Massachusetts Ave, #322,美国马萨诸塞州剑桥市,邮编02140
Phone: +1 617 491 5735 EMail: john-ietf@jck.com
Patrik Faltstrom Cisco Systems
Patrik Faltstrom思科系统公司
EMail: paf@cisco.com
Cary Karp Swedish Museum of Natural History Box 50007 Stockholm SE-10405 Sweden
瑞典自然历史博物馆瑞典斯德哥尔摩SE-10405信箱50007
Phone: +46 8 5195 4055 EMail: ck@nrm.museum
IAB
EMail: iab@iab.org
Full Copyright Statement
完整版权声明
Copyright (C) The Internet Society (2006).
版权所有(C)互联网协会(2006年)。
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
本文件受BCP 78中包含的权利、许可和限制的约束,除其中规定外,作者保留其所有权利。
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
本文件及其包含的信息是按“原样”提供的,贡献者、他/她所代表或赞助的组织(如有)、互联网协会和互联网工程任务组不承担任何明示或暗示的担保,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。
Intellectual Property
知识产权
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
IETF对可能声称与本文件所述技术的实施或使用有关的任何知识产权或其他权利的有效性或范围,或此类权利下的任何许可可能或可能不可用的程度,不采取任何立场;它也不表示它已作出任何独立努力来确定任何此类权利。有关RFC文件中权利的程序信息,请参见BCP 78和BCP 79。
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
向IETF秘书处披露的知识产权副本和任何许可证保证,或本规范实施者或用户试图获得使用此类专有权利的一般许可证或许可的结果,可从IETF在线知识产权存储库获取,网址为http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
IETF邀请任何相关方提请其注意任何版权、专利或专利申请,或其他可能涵盖实施本标准所需技术的专有权利。请将信息发送至IETF,地址为ietf-ipr@ietf.org。
Acknowledgement
确认
Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).
RFC编辑器功能的资金由IETF行政支持活动(IASA)提供。