Independent Submission                                        P. Resnick
Request for Comments: 5895                         Qualcomm Incorporated
Category: Informational                                       P. Hoffman
ISSN: 2070-1721                                           VPN Consortium
                                                          September 2010
        

Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008

Abstract

In the original version of the Internationalized Domain Names in Applications (IDNA) protocol, any Unicode code points taken from user input were mapped into a set of Unicode code points that "made sense", and then encoded and passed to the domain name system (DNS). The IDNA2008 protocol (described in RFCs 5890, 5891, 5892, and 5893) presumes that the input to the protocol comes from a set of "permitted" code points, which it then encodes and passes to the DNS, but does not specify what to do with the result of user input. This document describes the actions that can be taken by an implementation between receiving user input and passing permitted code points to the new IDNA protocol.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not candidates for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5895.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

1. Introduction

This document describes the operations that can be applied to user input in order to get it into a form that is acceptable by the Internationalized Domain Names in Applications (IDNA) protocol [IDNA2008protocol]. It includes a general implementation procedure for mapping.

It should be noted that this document does not specify the behavior of a protocol that appears "on the wire". It describes an operation that is to be applied to user input in order to prepare that user input for use in an "on the network" protocol. As unusual as this may be for a document concerning Internet protocols, it is necessary to describe this operation for implementors who may have designed around the original IDNA protocol (herein referred to as IDNA2003), which conflates this user-input operation into the protocol.

It is very important to note that there are many potential valid mappings of characters from user input. The mapping described in this document is the basis for other mappings, and is not likely to be useful without modification. Any useful mapping will have features designed to reduce the surprise for users and is likely to be slightly (or sometimes radically) different depending on the locale of the user, the type of input being used (such as typing, copy-and-paste, voice, and so on), the type of application used, etc. Although most common mappings will probably produce similar results for the same input, there will be subtle differences between applications.

1.1. The Dividing Line between User Interface and Protocol

The user interface to applications is much more complicated than most network implementers think. When we say "the user enters an internationalized domain name in the application", we are talking about a very complex process that encompasses everything from the user formulating the name and deciding which symbols to use to express that name, to the user entering the symbols into the computer using some input method (be it a keyboard, a stylus, or even a voice recognition program), to the computer interpreting that input (be it keyboard scan codes, a graphical representation, or digitized sounds) into some representation of those symbols, through finally normalizing those symbols into a particular character repertoire in an encoding recognizable to IDNA processes and the domain name system.

Designing a user interface for internationalized domain names involves taking into account culture, context, and locale for any given user. A simple and well-known example is the lowercasing of LATIN CAPITAL LETTER I (U+0049) when it is used in Turkish and other languages. A capital "I" in Turkish is properly lowercased to a LATIN SMALL LETTER DOTLESS I (U+0131), not to a LATIN SMALL LETTER I (U+0069). This lowercasing is clearly dependent on the locale of the system and/or the locale of the user. Using a single context-free mapping without considering the user interface properties has the potential of doing exactly the wrong thing for the user.
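
The Turkish example can be illustrated with a short Python sketch. It is not part of the original memo. The function names are hypothetical, and the Turkish-aware variant handles only the two code points needed for this illustration; a real implementation would rely on a locale-aware library.

```python
def lowercase_context_free(text: str) -> str:
    # str.lower() applies Unicode default case mapping and ignores locale.
    return text.lower()


def lowercase_turkish(text: str) -> str:
    # Hypothetical Turkish-aware variant (an assumption for illustration):
    # U+0049 (I) maps to U+0131 (dotless i), and U+0130 (I with dot above)
    # maps to U+0069 (i), before default lowercasing handles the rest.
    return text.replace("\u0049", "\u0131").replace("\u0130", "\u0069").lower()


label = "DIYARBAKIR"
print(lowercase_context_free(label))  # diyarbakir
print(lowercase_turkish(label))       # dıyarbakır
```

The two results differ only in the dotless i, which is exactly the kind of difference a context-free mapping cannot get right for every user.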

The original version of IDNA conflated user interface processing and protocol. It took whatever characters the user produced in whatever encoding the application used, assumed some conversion to Unicode code points, and then without regard to context, locale, or anything about the user's intentions, mapped them into a particular set of other characters, and then re-encoded them in Punycode, in order to have the entire operation be contained within the protocol. Ignoring context, locale, and user preference in the IDNA protocol made life significantly less complicated for the application developer, but at the expense of violating the principle of "least user surprise" for consumers and producers of domain names.

In IDNA2008, the dividing line between "user interface" and "protocol" is clear. The IDNA2008 specification defines the protocol part of IDNA: it explicitly does not deal with the user interface. Mappings such as the one described in this document explicitly deal with the user interface and not the protocol. That is, a mapping is only to be applied before a string of characters is treated as a domain name (in the "user interface") and is never to be applied during domain name processing (in the "protocol").

1.2. The Design of This Mapping

The user interface mapping in this document is a set of expansions to IDNA2008 that are meant to be sensible and friendly and mostly obvious to people throughout the world when using typical applications with domain names that are entered by hand. It is also designed to let applications be mostly backwards compatible with IDNA2003. By definition, it cannot meet all of those design goals for all people, and in fact is known to fail on some of those goals for quite large populations of people.

A good mapping in the real world might use the "sensible and friendly and mostly obvious" design goal but come up with a different algorithm. Many algorithms will have results that are close to what is described here, but will differ in assumptions about the users' way of thinking or typing. Having said that, it is likely that some mappings will be significantly different. For example, a mapping might apply to a spoken user interface instead of a typed one. Another example is that a mapping might be different for users that are typing than for users that are copying-and-pasting from different applications. Yet another example is that a user interface that allows typed input that is transliterated from Latin characters could have very different mappings than one that applies to typing in other character sets; this would be typical in a Pinyin input method for Chinese characters.

2. The General Procedure

This section defines a general algorithm that applications ought to implement in order to produce Unicode code points that will be valid under the IDNA protocol. An application might implement the full mapping as described below, or it can choose a different mapping. This mapping is very general and was designed to be acceptable to the widest user community, but as stated above, it does not take into account any particular context, culture, or locale.

The general algorithm that an application (or the input method provided by an operating system) ought to use is relatively straightforward:

1. Uppercase characters are mapped to their lowercase equivalents by using the algorithm for mapping case in Unicode characters. This step was chosen because the output will behave more like ASCII host names behave.

2. Fullwidth and halfwidth characters (those defined with Decomposition Types <wide> and <narrow>) are mapped to their decomposition mappings as shown in the Unicode character database. This step was chosen because many input mechanisms, particularly in Asia, do not allow you to easily enter characters in the form used by IDNA2008. Even if they do allow the correct character form, the user might not know which form they are entering.

3. All characters are mapped using Unicode Normalization Form C (NFC). This step was chosen because it maps combinations of combining characters into canonical composed form. As with the fullwidth/halfwidth mapping, users are not generally aware of the particular form of characters that they are entering, and IDNA2008 requires that only the canonical composed forms from NFC be used.

4. [IDNA2008protocol] is specified such that the protocol acts on the individual labels of the domain name. If an implementation of this mapping is also performing the step of separation of the parts of a domain name into labels by using the FULL STOP character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002) can be mapped to the FULL STOP before label separation occurs. There are other characters that are used as "full stops" that one could consider mapping as label separators, but their use as such has not been investigated thoroughly. This step was chosen because some input mechanisms do not allow the user to easily enter proper label separators. Only the IDEOGRAPHIC FULL STOP character (U+3002) is added in this mapping because the authors have not fully investigated the applicability of other characters and the environments where they should and should not be considered domain name label separators.

Note that the steps above are ordered.
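
The four ordered steps can be sketched in Python using the standard unicodedata module. This is a minimal illustration, not a recommended implementation. The function names are hypothetical, str.lower() is used as an approximation of the case mapping described here, and the results depend on the Unicode version bundled with the Python interpreter rather than on Unicode 5.2 specifically.

```python
import unicodedata


def map_width(ch: str) -> str:
    """Step 2 helper: replace a <wide> or <narrow> character by its
    decomposition mapping; return any other character unchanged."""
    decomposition = unicodedata.decomposition(ch)
    if decomposition.startswith(("<wide>", "<narrow>")):
        code_points = decomposition.split()[1:]
        return "".join(chr(int(cp, 16)) for cp in code_points)
    return ch


def map_user_input(text: str, split_labels: bool = True) -> str:
    # Step 1: map uppercase to lowercase (approximated with str.lower()).
    text = text.lower()
    # Step 2: map fullwidth and halfwidth characters.
    text = "".join(map_width(ch) for ch in text)
    # Step 3: apply Unicode Normalization Form C.
    text = unicodedata.normalize("NFC", text)
    # Step 4: only if this code also separates labels, map
    # IDEOGRAPHIC FULL STOP (U+3002) to FULL STOP (U+002E).
    if split_labels:
        text = text.replace("\u3002", "\u002E")
    return text


# Fullwidth letters and an ideographic full stop, as an input method
# might produce them.
print(map_user_input("\uFF25\uFF38\uFF21\uFF2D\uFF30\uFF2C\uFF25\u3002\uFF43\uFF4F\uFF4D"))
# example.com
```

The split_labels flag reflects the condition in step 4: the full stop mapping is relevant only when the same code is responsible for separating the domain name into labels.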

Definitions for the rules in this algorithm can be found in [Unicode52]. Specifically:

o Unicode Normalization Form C can be found in Annex #15 of [Unicode-UAX15].

o In order to map uppercase characters to their lowercase equivalents (defined in Section 3.13 of [Unicode52]), first map characters to the "Lowercase_Mapping" property (the "<lower>" entry in the second column) in <http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt>, if any. Then, map characters to the "Simple_Lowercase_Mapping" property (the fourteenth column) in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>, if any.

o In order to map fullwidth and halfwidth characters to their decomposition mappings, map any character whose "Decomposition_Type" (contained in the first part of the sixth column) in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt> is either "<wide>" or "<narrow>" to the "Decomposition_Mapping" of that character (contained in the second part of the sixth column) in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>.

o The Unicode Character Database [TR44] has useful descriptions of the contents of these files.
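
To make the column references concrete, the following Python sketch parses one line in the UnicodeData.txt format and extracts the sixth column (decomposition type and mapping) and the fourteenth column (Simple_Lowercase_Mapping). The sample line and the function name are assumptions made for illustration; an actual implementation would read the data file for the Unicode version it targets.

```python
# Sample entry in UnicodeData.txt format (semicolon-separated fields),
# for FULLWIDTH LATIN CAPITAL LETTER A. Included here for illustration.
SAMPLE_LINE = (
    "FF21;FULLWIDTH LATIN CAPITAL LETTER A;Lu;0;L;<wide> 0041;;;;N;;;;FF41;"
)


def parse_unicodedata_line(line: str) -> dict:
    fields = line.strip().split(";")
    # Sixth column (index 5): optional "<type>" tag, then the mapping.
    parts = fields[5].split()
    decomposition_type = None
    if parts and parts[0].startswith("<"):
        decomposition_type = parts[0]
        parts = parts[1:]
    return {
        "code_point": int(fields[0], 16),
        "decomposition_type": decomposition_type,
        "decomposition_mapping": [int(cp, 16) for cp in parts],
        # Fourteenth column (index 13): Simple_Lowercase_Mapping, if any.
        "simple_lowercase": int(fields[13], 16) if fields[13] else None,
    }


record = parse_unicodedata_line(SAMPLE_LINE)
print(record["decomposition_type"])              # <wide>
print([hex(cp) for cp in record["decomposition_mapping"]])  # ['0x41']
print(hex(record["simple_lowercase"]))           # 0xff41
```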

If the mappings in this document are applied to versions of Unicode later than Unicode 5.2, the later versions of the Unicode Standard should be consulted.
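
As a small illustration, an implementation built on Python's unicodedata module could check which Unicode version its tables come from before relying on them. The function name is hypothetical, and what to do with the result is left to the implementer.

```python
import unicodedata


def unicode_version() -> tuple:
    # unidata_version is a string such as "5.2.0"; convert it to a
    # tuple of integers so that versions compare numerically.
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


if unicode_version() > (5, 2, 0):
    print("Unicode tables are newer than 5.2; consult the later standard.")
```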

These form a minimal set of mappings that an application should strongly consider doing. Of course, there are many others that might be done.

3. Implementing This Mapping

If you are implementing a mapping for an application or operating system by using exactly the four steps in Section 2, the authors of this document have a request: please don't. We mean it. Section 2 does not describe a universal mapping algorithm because, as we said, there is no universally-applicable mapping algorithm.

If you read the material in Section 2 without reading Section 1, go back and carefully read all of Section 1; in many ways, Section 1 is more important than Section 2. Further, you can probably think of user interface considerations that we did not list in Section 1. If you did read Section 1 but somehow decided that the algorithm in Section 2 is completely correct for the intended users of your application or operating system, you are probably not thinking hard enough about your intended users.

4. Security Considerations

This document suggests creating mappings that might cause confusion for some users while alleviating confusion in other users. Such confusion is not covered in any depth in this document (nor in the other IDNA-related documents).

5. Acknowledgements

This document is the product of many contributions from numerous people in the IETF.

6. Normative References

[IDNA2008protocol] Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, August 2010.

[TR44] The Unicode Consortium, "Unicode Technical Report #44: Unicode Character Database", September 2009, <http://www.unicode.org/reports/tr44/tr44-4.html>.

[Unicode-UAX15] The Unicode Consortium, "Unicode Standard Annex #15: Unicode Normalization Forms, Revision 31", September 2009, <http://www.unicode.org/reports/tr15/tr15-31.html>.

[Unicode52] The Unicode Consortium. The Unicode Standard, Version 5.2.0, defined by: "The Unicode Standard, Version 5.2.0", (Mountain View, CA: The Unicode Consortium, 2009. ISBN 978-1-936213-00-9). <http://www.unicode.org/versions/Unicode5.2.0/>.

Authors' Addresses

   Peter W. Resnick
   Qualcomm Incorporated
   5775 Morehouse Drive
   San Diego, CA 92121-1714
   US

   Phone: +1 858 651 4478
   EMail: presnick@qualcomm.com
   URI:   http://www.qualcomm.com/~presnick/
        

   Paul Hoffman
   VPN Consortium
   127 Segre Place
   Santa Cruz, CA 95060
   US

   Phone: 1-831-426-9827
   EMail: paul.hoffman@vpnc.org
