Network Working Group                                        A. Costello
Request for Comments: 3492                 Univ. of California, Berkeley
Category: Standards Track                                     March 2003
        

Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)

Status of this Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

Punycode is a simple and efficient transfer encoding syntax designed for use with Internationalized Domain Names in Applications (IDNA). It uniquely and reversibly transforms a Unicode string into an ASCII string. ASCII characters in the Unicode string are represented literally, and non-ASCII characters are represented by ASCII characters that are allowed in host name labels (letters, digits, and hyphens). This document defines a general algorithm called Bootstring that allows a string of basic code points to uniquely represent any string of code points drawn from a larger set. Punycode is an instance of Bootstring that uses particular parameter values specified by this document, appropriate for IDNA.

Table of Contents

   1. Introduction
       1.1 Features
       1.2 Interaction of protocol parts
   2. Terminology
   3. Bootstring description
       3.1 Basic code point segregation
       3.2 Insertion unsort coding
       3.3 Generalized variable-length integers
       3.4 Bias adaptation
   4. Bootstring parameters
   5. Parameter values for Punycode
   6. Bootstring algorithms
       6.1 Bias adaptation function
       6.2 Decoding procedure
       6.3 Encoding procedure
       6.4 Overflow handling
   7. Punycode examples
       7.1 Sample strings
       7.2 Decoding traces
       7.3 Encoding traces
   8. Security Considerations
   9. References
       9.1 Normative References
       9.2 Informative References
   A. Mixed-case annotation
   B. Disclaimer and license
   C. Punycode sample implementation
   Author's Address
   Full Copyright Statement
        
1. Introduction

[IDNA] describes an architecture for supporting internationalized domain names. Labels containing non-ASCII characters can be represented by ACE labels, which begin with a special ACE prefix and contain only ASCII characters. The remainder of the label after the prefix is a Punycode encoding of a Unicode string satisfying certain constraints. For the details of the prefix and constraints, see [IDNA] and [NAMEPREP].

Punycode is an instance of a more general algorithm called Bootstring, which allows strings composed from a small set of "basic" code points to uniquely represent any string of code points drawn from a larger set. Punycode is Bootstring with particular parameter values appropriate for IDNA.

1.1 Features

Bootstring has been designed to have the following features:

* Completeness: Every extended string (sequence of arbitrary code points) can be represented by a basic string (sequence of basic code points). Restrictions on what strings are allowed, and on length, can be imposed by higher layers.

* Uniqueness: There is at most one basic string that represents a given extended string.

* Reversibility: Any extended string mapped to a basic string can be recovered from that basic string.

* Efficient encoding: The ratio of basic string length to extended string length is small. This is important in the context of domain names because RFC 1034 [RFC1034] restricts the length of a domain label to 63 characters.

* Simplicity: The encoding and decoding algorithms are reasonably simple to implement. The goals of efficiency and simplicity are at odds; Bootstring aims at a good balance between them.

* Readability: Basic code points appearing in the extended string are represented as themselves in the basic string (although the main purpose is to improve efficiency, not readability).

Punycode can also support an additional feature that is not used by the ToASCII and ToUnicode operations of [IDNA]. When extended strings are case-folded prior to encoding, the basic string can use mixed case to tell how to convert the folded string into a mixed-case string. See appendix A "Mixed-case annotation".

1.2 Interaction of protocol parts

Punycode is used by the IDNA protocol [IDNA] for converting domain labels into ASCII; it is not designed for any other purpose. It is explicitly not designed for processing arbitrary free text.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [RFC2119].

A code point is an integral value associated with a character in a coded character set.

As in the Unicode Standard [UNICODE], Unicode code points are denoted by "U+" followed by four to six hexadecimal digits, while a range of code points is denoted by two hexadecimal numbers separated by "..", with no prefixes.

The operators div and mod perform integer division; (x div y) is the quotient of x divided by y, discarding the remainder, and (x mod y) is the remainder, so (x div y) * y + (x mod y) == x. Bootstring uses these operators only with nonnegative operands, so the quotient and remainder are always nonnegative.

The break statement jumps out of the innermost loop (as in C).

An overflow is an attempt to compute a value that exceeds the maximum value of an integer variable.

3. Bootstring description

Bootstring represents an arbitrary sequence of code points (the "extended string") as a sequence of basic code points (the "basic string"). This section describes the representation. Section 6 "Bootstring algorithms" presents the algorithms as pseudocode. Sections 7.2 "Decoding traces" and 7.3 "Encoding traces" trace the algorithms for sample inputs.

The following sections describe the four techniques used in Bootstring. "Basic code point segregation" is a very simple and efficient encoding for basic code points occurring in the extended string: they are simply copied all at once. "Insertion unsort coding" encodes the non-basic code points as deltas, and processes the code points in numerical order rather than in order of appearance, which typically results in smaller deltas. The deltas are represented as "generalized variable-length integers", which use basic code points to represent nonnegative integers. The parameters of this integer representation are dynamically adjusted using "bias adaptation", to improve efficiency when consecutive deltas have similar magnitudes.

3.1 Basic code point segregation

All basic code points appearing in the extended string are represented literally at the beginning of the basic string, in their original order, followed by a delimiter if (and only if) the number of basic code points is nonzero. The delimiter is a particular basic code point, which never appears in the remainder of the basic string. The decoder can therefore find the end of the literal portion (if there is one) by scanning for the last delimiter.
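
As an illustration, locating the literal portion can be sketched in Python as follows. This is a minimal sketch and not part of the specification: the name split_basic is hypothetical, and the delimiter is assumed to be the hyphen-minus that Punycode uses (see section 5).

```python
DELIMITER = "-"  # assumption: Punycode's delimiter; other Bootstring profiles may differ


def split_basic(encoded):
    """Return (literal portion, remainder) of a basic string."""
    pos = encoded.rfind(DELIMITER)  # scan for the LAST delimiter
    if pos < 1:
        # No delimiter, or nothing before it: there is no literal portion,
        # so the whole string is left for the delta decoder.
        return "", encoded
    return encoded[:pos], encoded[pos + 1:]
```

For example, split_basic("bcher-kva") yields ("bcher", "kva"), while a string that contains no delimiter yields an empty literal portion.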

3.2 Insertion unsort coding

The remainder of the basic string (after the last delimiter if there is one) represents a sequence of nonnegative integral deltas as generalized variable-length integers, described in section 3.3. The meaning of the deltas is best understood in terms of the decoder.

The decoder builds the extended string incrementally. Initially, the extended string is a copy of the literal portion of the basic string (excluding the last delimiter). The decoder inserts non-basic code points, one for each delta, into the extended string, ultimately arriving at the final decoded string.

At the heart of this process is a state machine with two state variables: an index i and a counter n. The index i refers to a position in the extended string; it ranges from 0 (the first position) to the current length of the extended string (which refers to a potential position beyond the current end). If the current state is <n,i>, the next state is <n,i+1> if i is less than the length of the extended string, or <n+1,0> if i equals the length of the extended string. In other words, each state change causes i to increment, wrapping around to zero if necessary, and n counts the number of wrap-arounds.

Notice that the state always advances monotonically (there is no way for the decoder to return to an earlier state). At each state, an insertion is either performed or not performed. At most one insertion is performed in a given state. An insertion inserts the value of n at position i in the extended string. The deltas are a run-length encoding of this sequence of events: they are the lengths of the runs of non-insertion states preceding the insertion states. Hence, for each delta, the decoder performs delta state changes, then an insertion, and then one more state change. (An implementation need not perform each state change individually, but can instead use division and remainder calculations to compute the next insertion state directly.) It is an error if the inserted code point is a basic code point (because basic code points were supposed to be segregated as described in section 3.1).
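
The state machine can be sketched in Python as follows, taking the deltas as already-decoded integers. This is an illustrative sketch only: the name apply_deltas is hypothetical, the initial value of n is assumed to be 128 (the Punycode value from section 5), and the error checks (overflow, insertion of a basic code point) are omitted for brevity.

```python
def apply_deltas(literal, deltas, initial_n=128):
    """Insert one code point per delta, following the <n,i> state machine."""
    output = [ord(c) for c in literal]
    n, i = initial_n, 0
    for delta in deltas:
        i += delta                    # skip `delta` non-insertion states
        n += i // (len(output) + 1)   # every wrap-around of i increments n
        i %= len(output) + 1          # position within the current string
        output.insert(i, n)           # the insertion state
        i += 1                        # one more state change
    return "".join(chr(cp) for cp in output)
```

For example, apply_deltas("bcher", [745]) returns "bücher": 745 state changes over a string with 6 possible positions wrap around 124 times, so n becomes 128 + 124 = 252 (U+00FC) and i becomes 1.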

The encoder's main task is to derive the sequence of deltas that will cause the decoder to construct the desired string. It can do this by repeatedly scanning the extended string for the next code point that the decoder would need to insert, and counting the number of state changes the decoder would need to perform, mindful of the fact that the decoder's extended string will include only those code points that have already been inserted. Section 6.3 "Encoding procedure" gives a precise algorithm.
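
The scanning and counting described above might look like the following Python sketch, which returns the literal portion and the raw deltas without turning them into digits. It is a sketch under stated assumptions rather than the normative procedure: the name compute_deltas is hypothetical, every basic code point is assumed to be below initial_n (true for Punycode), and overflow checks are omitted because Python integers do not overflow.

```python
def compute_deltas(text, initial_n=128):
    """Return (literal portion, list of deltas) for an extended string."""
    cps = [ord(c) for c in text]
    basic = [c for c in cps if c < initial_n]  # assumption: basic means < initial_n
    n, delta, h = initial_n, 0, len(basic)     # h = code points handled so far
    deltas = []
    while h < len(cps):
        m = min(c for c in cps if c >= n)      # next code point to insert
        delta += (m - n) * (h + 1)             # whole passes the decoder skips
        n = m
        for c in cps:
            if c < n:
                delta += 1                     # position occupied in decoder's string
            if c == n:
                deltas.append(delta)           # decoder inserts n here
                delta = 0
                h += 1
        delta += 1                             # finish the pass for this n
        n += 1
    return "".join(chr(c) for c in basic), deltas
```

For example, compute_deltas("bücher") returns ("bcher", [745]).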

3.3 Generalized variable-length integers

In a conventional integer representation the base is the number of distinct symbols for digits, whose values are 0 through base-1. Let digit_0 denote the least significant digit, digit_1 the next least significant, and so on. The value represented is the sum over j of digit_j * w(j), where w(j) = base^j is the weight (scale factor) for position j. For example, in the base 8 integer 437, the digits are 7, 3, and 4, and the weights are 1, 8, and 64, so the value is 7 + 3*8 + 4*64 = 287. This representation has two disadvantages: First, there are multiple encodings of each value (because there can be extra zeros in the most significant positions), which is inconvenient when unique encodings are needed. Second, the integer is not self-delimiting, so if multiple integers are concatenated the boundaries between them are lost.

The generalized variable-length representation solves these two problems. The digit values are still 0 through base-1, but now the integer is self-delimiting by means of thresholds t(j), each of which is in the range 0 through base-1. Exactly one digit, the most significant, satisfies digit_j < t(j). Therefore, if several integers are concatenated, it is easy to separate them, starting with the first if they are little-endian (least significant digit first), or starting with the last if they are big-endian (most significant digit first). As before, the value is the sum over j of digit_j * w(j), but the weights are different:

      w(0) = 1
      w(j) = w(j-1) * (base - t(j-1)) for j > 0
        

For example, consider the little-endian sequence of base 8 digits 734251... Suppose the thresholds are 2, 3, 5, 5, 5, 5... This implies that the weights are 1, 1*(8-2) = 6, 6*(8-3) = 30, 30*(8-5) = 90, 90*(8-5) = 270, and so on. 7 is not less than 2, and 3 is not less than 3, but 4 is less than 5, so 4 is the last digit. The value of 734 is 7*1 + 3*6 + 4*30 = 145. The next integer is 251, with value 2*1 + 5*6 + 1*30 = 62. Decoding this representation is very similar to decoding a conventional integer: Start with a current value of N = 0 and a weight w = 1. Fetch the next digit d and increase N by d * w. If d is less than the current threshold (t) then stop, otherwise increase w by a factor of (base - t), update t for the next position, and repeat.
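
The decoding loop just described can be sketched in Python as follows. This is an illustration with a fixed list of thresholds, not the bias-driven thresholds Bootstring actually uses; the name decode_gvli is hypothetical, and the last threshold in the list is assumed to repeat for all later positions.

```python
def decode_gvli(digits, thresholds, base):
    """Split a little-endian digit sequence into the integers it encodes."""
    values = []
    pos = 0
    while pos < len(digits):
        value, weight, j = 0, 1, 0
        while True:
            d = digits[pos]  # raises IndexError if the input ends mid-integer
            pos += 1
            t = thresholds[min(j, len(thresholds) - 1)]
            value += d * weight
            if d < t:        # a digit below its threshold is the last digit
                break
            weight *= base - t
            j += 1
        values.append(value)
    return values
```

With the example above, decode_gvli([7, 3, 4, 2, 5, 1], [2, 3, 5], 8) returns [145, 62].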

Encoding this representation is similar to encoding a conventional integer: If N < t then output one digit for N and stop, otherwise output the digit for t + ((N - t) mod (base - t)), then replace N with (N - t) div (base - t), update t for the next position, and repeat.
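
A matching Python sketch of the encoding step follows. As before, it is illustrative only: the name encode_gvli is hypothetical, the thresholds are passed as a fixed list, and the last threshold is assumed to repeat for all later positions.

```python
def encode_gvli(value, thresholds, base):
    """Return the little-endian digits of one nonnegative integer."""
    digits = []
    j = 0
    while True:
        t = thresholds[min(j, len(thresholds) - 1)]
        if value < t:
            digits.append(value)  # final digit: the only one below its threshold
            return digits
        digits.append(t + (value - t) % (base - t))
        value = (value - t) // (base - t)
        j += 1
```

Using base 8 and thresholds 2, 3, 5, 5, ..., encode_gvli(145, [2, 3, 5], 8) returns [7, 3, 4] and encode_gvli(62, [2, 3, 5], 8) returns [2, 5, 1], the digit groups from the decoding example.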

For any particular set of values of t(j), there is exactly one generalized variable-length representation of each nonnegative integral value.

Bootstring uses little-endian ordering so that the deltas can be separated starting with the first. The t(j) values are defined in terms of the constants base, tmin, and tmax, and a state variable called bias:

      t(j) = base * (j + 1) - bias,
      clamped to the range tmin through tmax
        

The clamping means that if the formula yields a value less than tmin or greater than tmax, then t(j) = tmin or tmax, respectively. (In the pseudocode in section 6 "Bootstring algorithms", the expression base * (j + 1) is denoted by k for performance reasons.) These t(j) values cause the representation to favor integers within a particular range determined by the bias.
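
The clamped formula can be written directly in Python. The sketch below assumes the Punycode constants from section 5 (base = 36, tmin = 1, tmax = 26); the function name threshold is hypothetical.

```python
BASE, TMIN, TMAX = 36, 1, 26  # assumption: Punycode values from section 5


def threshold(j, bias):
    """t(j) = base * (j + 1) - bias, clamped to the range tmin through tmax."""
    return max(TMIN, min(TMAX, BASE * (j + 1) - bias))
```

With the initial Punycode bias of 72, the thresholds for positions 0, 1, and 2 are 1, 1, and 26.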

3.4 Bias adaptation
3.4 偏差适应

After each delta is encoded or decoded, bias is set for the next delta as follows:

1. Delta is scaled in order to avoid overflow in the next step:

let delta = delta div 2

But when this is the very first delta, the divisor is not 2, but instead a constant called damp. This compensates for the fact that the second delta is usually much smaller than the first.

2. Delta is increased to compensate for the fact that the next delta will be inserting into a longer string:

let delta = delta + (delta div numpoints)

numpoints is the total number of code points encoded/decoded so far (including the one corresponding to this delta itself, and including the basic code points).

3. Delta is repeatedly divided until it falls within a threshold, to predict the minimum number of digits needed to represent the next delta:

         while delta > ((base - tmin) * tmax) div 2
         do let delta = delta div (base - tmin)
        

4. The bias is set:

         let bias =
           (base * the number of divisions performed in step 3) +
           (((base - tmin + 1) * delta) div (delta + skew))
        

The motivation for this procedure is that the current delta provides a hint about the likely size of the next delta, and so t(j) is set to tmax for the more significant digits starting with the one expected to be last, tmin for the less significant digits up through the one expected to be third-last, and somewhere between tmin and tmax for the digit expected to be second-last (balancing the hope of the expected-last digit being unnecessary against the danger of it being insufficient).

4. Bootstring parameters

Given a set of basic code points, one needs to be designated as the delimiter. The base cannot be greater than the number of distinguishable basic code points remaining. The digit-values in the range 0 through base-1 need to be associated with distinct non-delimiter basic code points. In some cases multiple code points need to have the same digit-value; for example, uppercase and lowercase versions of the same letter need to be equivalent if basic strings are case-insensitive.

The initial value of n cannot be greater than the minimum non-basic code point that could appear in extended strings.

The remaining five parameters (tmin, tmax, skew, damp, and the initial value of bias) need to satisfy the following constraints:

      0 <= tmin <= tmax <= base-1
      skew >= 1
      damp >= 2
      initial_bias mod base <= base - tmin
        

Provided the constraints are satisfied, these five parameters affect efficiency but not correctness. They are best chosen empirically.
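
Someone defining a new Bootstring profile could check a candidate parameter set against these constraints with a small helper such as the Python sketch below. The name check_parameters is hypothetical and the helper is only an illustration of the four inequalities listed above.

```python
def check_parameters(base, tmin, tmax, skew, damp, initial_bias):
    """Return True if the parameters satisfy the section 4 constraints."""
    return (
        0 <= tmin <= tmax <= base - 1
        and skew >= 1
        and damp >= 2
        and initial_bias % base <= base - tmin
    )
```

The Punycode values given in section 5 pass the check: check_parameters(36, 1, 26, 38, 700, 72) returns True.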

If support for mixed-case annotation is desired (see appendix A), make sure that the code points corresponding to 0 through tmax-1 all have both uppercase and lowercase forms.

5. Parameter values for Punycode

Punycode uses the following Bootstring parameter values:

      base         = 36
      tmin         = 1
      tmax         = 26
      skew         = 38
      damp         = 700
      initial_bias = 72
      initial_n    = 128 = 0x80

Although the only restriction Punycode imposes on the input integers is that they be nonnegative, these parameters are especially designed to work well with Unicode [UNICODE] code points, which are integers in the range 0..10FFFF (but not D800..DFFF, which are reserved for use by the UTF-16 encoding of Unicode). The basic code points are the ASCII [ASCII] code points (0..7F), of which U+002D (-) is the delimiter, and some of the others have digit-values as follows:

      code points    digit-values
      ------------   ----------------------
      41..5A (A-Z) =  0 to 25, respectively
      61..7A (a-z) =  0 to 25, respectively
      30..39 (0-9) = 26 to 35, respectively
        

Using hyphen-minus as the delimiter implies that the encoded string can end with a hyphen-minus only if the Unicode string consists entirely of basic code points, but IDNA forbids such strings from being encoded. The encoded string can begin with a hyphen-minus, but IDNA prepends a prefix. Therefore IDNA using Punycode conforms to the RFC 952 rule that host name labels neither begin nor end with a hyphen-minus [RFC952].

A decoder MUST recognize the letters in both uppercase and lowercase forms (including mixtures of both forms). An encoder SHOULD output only uppercase forms or only lowercase forms, unless it uses mixed-case annotation (see appendix A).
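
The digit-value table and the case rules above can be sketched in Python as follows. The function names digit_value and digit_char are hypothetical, and the choice of lowercase output in digit_char is one of the two options the text permits, not a requirement.

```python
def digit_value(ch):
    """Map a basic code point to its digit-value, accepting either case."""
    cp = ord(ch)
    if 0x41 <= cp <= 0x5A:      # A-Z -> 0..25
        return cp - 0x41
    if 0x61 <= cp <= 0x7A:      # a-z -> 0..25
        return cp - 0x61
    if 0x30 <= cp <= 0x39:      # 0-9 -> 26..35
        return cp - 0x30 + 26
    raise ValueError("code point has no digit-value: %r" % ch)


def digit_char(value):
    """Map a digit-value 0..35 to a basic code point (lowercase letters only)."""
    if not 0 <= value <= 35:
        raise ValueError("digit-value out of range: %r" % value)
    return chr(0x61 + value) if value < 26 else chr(0x30 + value - 26)
```

For instance, digit_value("k") and digit_value("K") both return 10, and digit_char(35) returns "9".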

Presumably most users will not manually write or type encoded strings (as opposed to cutting and pasting them), but those who do will need to be alert to the potential visual ambiguity between the following sets of characters:

      G 6
      I l 1
      O 0
      S 5
      U V
      Z 2

Such ambiguities are usually resolved by context, but in a Punycode encoded string there is no context apparent to humans.

6. Bootstring algorithms

Some parts of the pseudocode can be omitted if the parameters satisfy certain conditions (for which Punycode qualifies). These parts are enclosed in {braces}, and notes immediately following the pseudocode explain the conditions under which they can be omitted.

Formally, code points are integers, and hence the pseudocode assumes that arithmetic operations can be performed directly on code points. In some programming languages, explicit conversion between code points and integers might be necessary.

6.1 Bias adaptation function
   function adapt(delta,numpoints,firsttime):
     if firsttime then let delta = delta div damp
     else let delta = delta div 2
     let delta = delta + (delta div numpoints)
     let k = 0
     while delta > ((base - tmin) * tmax) div 2 do begin
       let delta = delta div (base - tmin)
       let k = k + base
     end
     return k + (((base - tmin + 1) * delta) div (delta + skew))
        

It does not matter whether the modifications to delta and k inside adapt() affect variables of the same name inside the encoding/decoding procedures, because after calling adapt() the caller does not read those variables before overwriting them.
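
For readers who prefer a runnable form, the adapt() pseudocode translates almost line for line into Python. The sketch below assumes the Punycode constants from section 5; the uppercase constant names are an illustrative choice, and Python's // operator plays the role of div because all operands are nonnegative.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700  # assumption: Punycode values


def adapt(delta, numpoints, firsttime):
    """Compute the bias for the next delta."""
    delta = delta // DAMP if firsttime else delta // 2   # step 1: scale
    delta += delta // numpoints                          # step 2: longer string
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:           # step 3: repeated division
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)  # step 4
```

For example, adapt(745, 6, True) returns 0: the first delta is divided by damp, leaving 1, which is too small to raise the bias.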

6.2 Decoding procedure
   let n = initial_n
   let i = 0
   let bias = initial_bias
   let output = an empty string indexed from 0
   consume all code points before the last delimiter (if there is one)
     and copy them to output, fail on any non-basic code point
   if more than zero code points were consumed then consume one more
     (which will be the last delimiter)
   while the input is not exhausted do begin
     let oldi = i
     let w = 1
     for k = base to infinity in steps of base do begin
       consume a code point, or fail if there was none to consume
       let digit = the code point's digit-value, fail if it has none
       let i = i + digit * w, fail on overflow
       let t = tmin if k <= bias {+ tmin}, or
               tmax if k >= bias + tmax, or k - bias otherwise
       if digit < t then break
       let w = w * (base - t), fail on overflow
     end
     let bias = adapt(i - oldi, length(output) + 1, test oldi is 0?)
     let n = n + i div (length(output) + 1), fail on overflow
     let i = i mod (length(output) + 1)
     {if n is a basic code point then fail}
     insert n into output at position i
     increment i
   end
        

The full statement enclosed in braces (checking whether n is a basic code point) can be omitted if initial_n exceeds all basic code points (which is true for Punycode), because n is never less than initial_n.

如果initial_n超过所有基本代码点(对于Punycode是这样),则可以省略大括号中的完整语句(检查n是否为基本代码点),因为n永远不小于initial_n。

In the assignment of t, where t is clamped to the range tmin through tmax, "+ tmin" can always be omitted. This makes the clamping calculation incorrect when bias < k < bias + tmin, but that cannot happen because of the way bias is computed and because of the constraints on the parameters.

在t的赋值中(t被限制在tmin到tmax的范围内),“+ tmin”总是可以省略。当bias < k < bias + tmin时,这会使限幅计算不正确,但由于bias的计算方式以及对参数的约束,这种情况不会发生。
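
As an illustration only, the simplified clamping (with "+ tmin" omitted) might be written in C as follows. The helper name threshold is hypothetical, and the constants assume the Punycode parameter values base = 36, tmin = 1, tmax = 26 specified elsewhere in this document.

```c
/* Punycode parameter values (assumed from the parameter section). */
enum { BASE = 36, TMIN = 1, TMAX = 26 };

/* Hypothetical helper: threshold t for digit position k under the */
/* current bias, using the simplified test "k <= bias" in place of */
/* "k <= bias + tmin".                                             */
static unsigned threshold(unsigned k, unsigned bias)
{
  if (k <= bias) return TMIN;          /* clamp from below */
  if (k >= bias + TMAX) return TMAX;   /* clamp from above */
  return k - bias;                     /* strictly between tmin and tmax */
}
```

With the initial bias of 72, the first two digit positions (k = 36 and k = 72) get threshold 1 and every later position gets threshold 26.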

Because the decoder state can only advance monotonically, and there is only one representation of any delta, there is therefore only one encoded string that can represent a given sequence of integers. The only error conditions are invalid code points, unexpected end-of-input, overflow, and basic code points encoded using deltas instead of appearing literally. If the decoder fails on these errors as shown above, then it cannot produce the same output for two distinct inputs. Without this property it would have been necessary to re-encode the output and verify that it matches the input in order to guarantee the uniqueness of the encoding.

由于解码器状态只能单调前进,并且任何增量都只有一种表示,因此只有一个编码字符串可以表示给定的整数序列。唯一的错误情况是无效代码点、输入意外结束、溢出,以及基本代码点使用增量编码而不是按字面出现。如果解码器如上所示在遇到这些错误时失败,那么它不会为两个不同的输入产生相同的输出。如果没有这一特性,就必须对输出重新编码并验证其与输入匹配,才能保证编码的唯一性。
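
The inner loop of the decoding procedure, together with three of the error conditions listed above (invalid code point, unexpected end-of-input, overflow), can be sketched in C roughly as follows. This is not the sample implementation from appendix C; the names digit_value and decode_delta and the use of unsigned long are assumptions made for the sketch, and the constants assume the Punycode parameter values.

```c
#include <stddef.h>
#include <limits.h>

enum { BASE = 36, TMIN = 1, TMAX = 26 };  /* Punycode parameters (assumed) */

/* Digit-value of an ASCII code point, or -1 if it has none. */
static int digit_value(int cp)
{
  if (cp >= 48 && cp <= 57) return cp - 22;   /* '0'..'9' -> 26..35 */
  if (cp >= 65 && cp <= 90) return cp - 65;   /* 'A'..'Z' -> 0..25  */
  if (cp >= 97 && cp <= 122) return cp - 97;  /* 'a'..'z' -> 0..25  */
  return -1;
}

/* Hypothetical helper: consumes one variable-length integer from   */
/* in[*pos..len) and adds it to *i.  Returns 0 on success, -1 on    */
/* invalid code point, unexpected end of input, or overflow.        */
static int decode_delta(const char *in, size_t len, size_t *pos,
                        unsigned long bias, unsigned long *i)
{
  unsigned long w = 1, k, t;
  int d;

  for (k = BASE; ; k += BASE) {
    if (*pos >= len) return -1;               /* nothing left to consume */
    d = digit_value((unsigned char) in[(*pos)++]);
    if (d < 0) return -1;                     /* no digit-value */
    if ((unsigned long) d > (ULONG_MAX - *i) / w) return -1;
    *i += (unsigned long) d * w;
    t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
    if ((unsigned long) d < t) break;         /* last digit of this delta */
    if (w > ULONG_MAX / (BASE - t)) return -1;
    w *= BASE - t;
  }
  return 0;
}
```

Starting from i = 0 and bias = 72, the digits "ihq" yield 19853, which agrees with the first step of the decoding trace of example B in section 7.2.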

6.3 Encoding procedure
6.3 编码程序
   let n = initial_n
   let delta = 0
   let bias = initial_bias
   let h = b = the number of basic code points in the input
   copy them to the output in order, followed by a delimiter if b > 0
   {if the input contains a non-basic code point < n then fail}
   while h < length(input) do begin
     let m = the minimum {non-basic} code point >= n in the input
     let delta = delta + (m - n) * (h + 1), fail on overflow
     let n = m
     for each code point c in the input (in order) do begin
       if c < n {or c is basic} then increment delta, fail on overflow
       if c == n then begin
         let q = delta
         for k = base to infinity in steps of base do begin
           let t = tmin if k <= bias {+ tmin}, or
                   tmax if k >= bias + tmax, or k - bias otherwise
           if q < t then break
           output the code point for digit t + ((q - t) mod (base - t))
           let q = (q - t) div (base - t)
         end
         output the code point for digit q
         let bias = adapt(delta, h + 1, test h equals b?)
         let delta = 0
         increment h
       end
     end
     increment delta and n
   end
        

The full statement enclosed in braces (checking whether the input contains a non-basic code point less than n) can be omitted if all code points less than initial_n are basic code points (which is true for Punycode if code points are unsigned).

如果所有小于initial_n的代码点都是基本代码点(如果代码点是无符号的,则Punycode为真),则可以省略大括号中的完整语句(检查输入是否包含小于n的非基本代码点)。

The brace-enclosed conditions "non-basic" and "or c is basic" can be omitted if initial_n exceeds all basic code points (which is true for Punycode), because the code point being tested is never less than initial_n.

如果initial_n超过所有基本代码点(对于Punycode是这样),则可以省略大括号内的条件“non-basic”和“or c is basic”,因为被测试的代码点永远不小于initial_n。

In the assignment of t, where t is clamped to the range tmin through tmax, "+ tmin" can always be omitted. This makes the clamping calculation incorrect when bias < k < bias + tmin, but that cannot happen because of the way bias is computed and because of the constraints on the parameters.

在t的赋值中(t被限制在tmin到tmax的范围内),“+ tmin”总是可以省略。当bias < k < bias + tmin时,这会使限幅计算不正确,但由于bias的计算方式以及对参数的约束,这种情况不会发生。

The checks for overflow are necessary to avoid producing invalid output when the input contains very large values or is very long.

当输入包含非常大的值或非常长时,检查溢出是必要的,以避免产生无效输出。

The increment of delta at the bottom of the outer loop cannot overflow because delta < length(input) before the increment, and length(input) is already assumed to be representable. The increment of n could overflow, but only if h == length(input), in which case the procedure is finished anyway.

外层循环底部对delta的递增不会溢出,因为递增之前delta < length(input),并且已假定length(input)是可表示的。n的递增可能溢出,但仅当h == length(input)时才会发生,而此时过程无论如何都已结束。
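
The innermost loop of the encoding procedure, which writes one delta as a generalized variable-length integer, might look like this in C. It is a sketch under the assumption of the Punycode parameter values; the names digit_char and encode_delta, and the convention of returning 0 when the buffer is too small, are choices made for this illustration and are not taken from appendix C.

```c
#include <stddef.h>

enum { BASE = 36, TMIN = 1, TMAX = 26 };  /* Punycode parameters (assumed) */

/* Lowercase basic code point for digit d: 0..25 -> a..z, 26..35 -> 0..9. */
static char digit_char(unsigned long d)
{
  return (char) (d + 22 + 75 * (d < 26));
}

/* Hypothetical helper: writes the digits for delta q under the given */
/* bias into out[0..out_max).  Returns the number of digits written,  */
/* or 0 if the buffer is too small (a delta always needs >= 1 digit). */
static size_t encode_delta(unsigned long q, unsigned long bias,
                           char *out, size_t out_max)
{
  unsigned long k, t;
  size_t n = 0;

  for (k = BASE; ; k += BASE) {
    t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
    if (q < t) break;
    if (n >= out_max) return 0;
    out[n++] = digit_char(t + (q - t) % (BASE - t));
    q = (q - t) / (BASE - t);
  }
  if (n >= out_max) return 0;
  out[n++] = digit_char(q);   /* final digit is below its threshold */
  return n;
}
```

For example, a delta of 19853 under the initial bias of 72 comes out as "ihq", matching the encoding trace of example B in section 7.3.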

6.4 Overflow handling
6.4 溢出处理

For IDNA, 26-bit unsigned integers are sufficient to handle all valid IDNA labels without overflow, because any string that needed a 27-bit delta would have to exceed either the code point limit (0..10FFFF) or the label length limit (63 characters). However, overflow handling is necessary because the inputs are not necessarily valid IDNA labels.

对于IDNA,26位无符号整数足以处理所有有效的IDNA标签而不会溢出,因为任何需要27位增量的字符串都必须超过代码点限制(0..10FFFF)或标签长度限制(63个字符)。但是,溢出处理是必要的,因为输入不一定是有效的IDNA标签。

If the programming language does not provide overflow detection, the following technique can be used. Suppose A, B, and C are representable nonnegative integers and C is nonzero. Then A + B overflows if and only if B > maxint - A, and A + (B * C) overflows if and only if B > (maxint - A) div C, where maxint is the greatest integer for which maxint + 1 cannot be represented. Refer to appendix C "Punycode sample implementation" for demonstrations of this technique in the C language.

如果编程语言不提供溢出检测,则可以使用以下技术。假设A、B和C是可表示的非负整数,C是非零。然后A+B溢出当且仅当B>maxint-A,A+(B*C)溢出当且仅当B>(maxint-A)div C,其中maxint是不能表示maxint+1的最大整数。请参阅附录C“Punycode示例实现”,以了解该技术在C语言中的演示。
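
The two tests described above translate directly into C. The following sketch uses unsigned int with maxint = UINT_MAX; the function names are illustrative and do not appear in appendix C.

```c
#include <limits.h>

/* Nonzero if a + b would exceed UINT_MAX. */
static int add_overflows(unsigned a, unsigned b)
{
  return b > UINT_MAX - a;
}

/* Nonzero if a + (b * c) would exceed UINT_MAX; c must be nonzero. */
static int muladd_overflows(unsigned a, unsigned b, unsigned c)
{
  return b > (UINT_MAX - a) / c;
}
```

Both tests are evaluated before the arithmetic is performed, so no intermediate value ever wraps around.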

The decoding and encoding algorithms shown in sections 6.2 and 6.3 handle overflow by detecting it whenever it happens. Another approach is to enforce limits on the inputs that prevent overflow from happening. For example, if the encoder were to verify that no input code points exceed M and that the input length does not exceed L, then no delta could ever exceed (M - initial_n) * (L + 1), and hence no overflow could occur if integer variables were capable of representing values that large. This prevention approach would impose more restrictions on the input than the detection approach does, but might be considered simpler in some programming languages.

第6.2节和第6.3节中所示的解码和编码算法通过检测溢出来处理溢出。另一种方法是对输入施加限制,以防止发生溢出。例如,如果编码器要验证输入代码点不超过M且输入长度不超过L,则增量永远不会超过(M-initial_n)*(L+1),因此,如果整数变量能够表示如此大的值,则不会发生溢出。这种预防方法将比检测方法对输入施加更多的限制,但在某些编程语言中可能会被认为更简单。
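
A sketch of the prevention approach for the encoder, assuming initial_n = 128 (the Punycode value) and unsigned long variables, could compute the bound (M - initial_n) * (L + 1) once and compare it against the largest representable value. The names INITIAL_N, inputs_cannot_overflow, and delta_bound are hypothetical.

```c
#include <limits.h>

#define INITIAL_N 128UL   /* Punycode's initial_n (assumed) */

/* Returns 1 if no delta can overflow an unsigned long for inputs   */
/* whose code points are <= max_cp and whose length is <= max_len.  */
static int inputs_cannot_overflow(unsigned long max_cp,
                                  unsigned long max_len)
{
  if (max_cp <= INITIAL_N) return 1;      /* bound is zero */
  if (max_len == ULONG_MAX) return 0;     /* L + 1 itself would wrap */
  return max_cp - INITIAL_N <= ULONG_MAX / (max_len + 1);
}

/* The bound itself; call only after the check above has succeeded. */
static unsigned long delta_bound(unsigned long max_cp,
                                 unsigned long max_len)
{
  return max_cp <= INITIAL_N ? 0 : (max_cp - INITIAL_N) * (max_len + 1);
}
```

Taking M = 0x10FFFF and L = 63 as illustrative limits, the bound is 71294912, which fits comfortably in 32 bits.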

In theory, the decoder could use an analogous approach, limiting the number of digits in a variable-length integer (that is, limiting the number of iterations in the innermost loop). However, the number of digits that suffice to represent a given delta can sometimes represent much larger deltas (because of the adaptation), and hence this approach would probably need integers wider than 32 bits.

理论上,解码器可以使用类似的方法,限制可变长度整数中的位数(即限制最内层循环中的迭代次数)。然而,足以表示给定增量的位数有时可以表示更大的增量(因为自适应),因此这种方法可能需要大于32位的整数。

Yet another approach for the decoder is to allow overflow to occur, but to check the final output string by re-encoding it and comparing to the decoder input. If and only if they do not match (using a case-insensitive ASCII comparison) overflow has occurred. This delayed-detection approach would not impose any more restrictions on the input than the immediate-detection approach does, and might be considered simpler in some programming languages.

解码器的另一种方法是允许发生溢出,但通过重新编码并与解码器输入进行比较来检查最终输出字符串。当且仅当它们不匹配时(使用不区分大小写的ASCII比较),发生溢出。与即时检测方法相比,这种延迟检测方法不会对输入施加更多的限制,并且在某些编程语言中可能会被认为更简单。
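
The comparison step of the delayed-detection approach needs only a case-insensitive ASCII comparison of the decoder input with the re-encoded output. One possible shape for that comparison is sketched below; the function names are illustrative, and the re-encoding itself is assumed to be done by an encoder such as the one in appendix C.

```c
#include <stddef.h>

/* Maps ASCII 'A'..'Z' to 'a'..'z'; leaves everything else alone. */
static int ascii_lower(int c)
{
  return (c >= 65 && c <= 90) ? c + 32 : c;
}

/* Returns 1 if the two strings are equal ignoring ASCII case. */
static int ascii_caseless_equal(const char *a, size_t a_len,
                                const char *b, size_t b_len)
{
  size_t j;

  if (a_len != b_len) return 0;
  for (j = 0; j < a_len; ++j) {
    if (ascii_lower((unsigned char) a[j]) !=
        ascii_lower((unsigned char) b[j])) return 0;
  }
  return 1;
}
```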

In fact, if the decoder is used only inside the IDNA ToUnicode operation [IDNA], then it need not check for overflow at all, because ToUnicode performs a higher level re-encoding and comparison, and a mismatch has the same consequence as if the Punycode decoder had failed.

事实上,如果解码器仅在IDNA ToUnicode操作[IDNA]内使用,则根本不需要检查溢出,因为ToUnicode执行更高级别的重新编码和比较,并且不匹配的后果与Punycode解码器失败的后果相同。

7. Punycode examples
7. Punycode示例
7.1 Sample strings
7.1 示例字符串

In the Punycode encodings below, the ACE prefix is not shown. Backslashes show where line breaks have been inserted in strings too long for one line.

在下面的Punycode编码中,未显示ACE前缀。反斜杠表示在字符串中插入换行符的位置过长。

The first several examples are all translations of the sentence "Why can't they just speak in <language>?" (courtesy of Michael Kaplan's "provincial" page [PROVINCIAL]). Word breaks and punctuation have been removed, as is often done in domain names.

前几个例子都是“Why can't they just speak in <language>?”(他们为什么不能直接说<语言>?)这句话的翻译(由Michael Kaplan的“provincial”页面[PROVINCIAL]提供)。单词间隔和标点符号已被删除,域名中通常都这样做。

   (A) Arabic (Egyptian):
       u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644
       u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F
       Punycode: egbpdaj6bu4bxfgehfvwxn
        
   (B) Chinese (simplified):
       u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587
       Punycode: ihqwcrb4cv8a8dqg056pqjye
        
   (C) Chinese (traditional):
       u+4ED6 u+5011 u+7232 u+4EC0 u+9EBD u+4E0D u+8AAA u+4E2D u+6587
       Punycode: ihqwctvzc91f659drss3x8bo0yb
        
   (D) Czech: Pro<ccaron>prost<ecaron>nemluv<iacute><ccaron>esky
       U+0050 u+0072 u+006F u+010D u+0070 u+0072 u+006F u+0073 u+0074
       u+011B u+006E u+0065 u+006D u+006C u+0075 u+0076 u+00ED u+010D
       u+0065 u+0073 u+006B u+0079
       Punycode: Proprostnemluvesky-uyb24dma41a
        
   (E) Hebrew:
       u+05DC u+05DE u+05D4 u+05D4 u+05DD u+05E4 u+05E9 u+05D5 u+05D8
       u+05DC u+05D0 u+05DE u+05D3 u+05D1 u+05E8 u+05D9 u+05DD u+05E2
       u+05D1 u+05E8 u+05D9 u+05EA
       Punycode: 4dbcagdahymbxekheh6e0a7fei0b
        
   (F) Hindi (Devanagari):
       u+092F u+0939 u+0932 u+094B u+0917 u+0939 u+093F u+0928 u+094D
       u+0926 u+0940 u+0915 u+094D u+092F u+094B u+0902 u+0928 u+0939
       u+0940 u+0902 u+092C u+094B u+0932 u+0938 u+0915 u+0924 u+0947
       u+0939 u+0948 u+0902
       Punycode: i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd
        
   (G) Japanese (kanji and hiragana):
       u+306A u+305C u+307F u+3093 u+306A u+65E5 u+672C u+8A9E u+3092
       u+8A71 u+3057 u+3066 u+304F u+308C u+306A u+3044 u+306E u+304B
       Punycode: n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa
        
   (H) Korean (Hangul syllables):
       u+C138 u+ACC4 u+C758 u+BAA8 u+B4E0 u+C0AC u+B78C u+B4E4 u+C774
       u+D55C u+AD6D u+C5B4 u+B97C u+C774 u+D574 u+D55C u+B2E4 u+BA74
       u+C5BC u+B9C8 u+B098 u+C88B u+C744 u+AE4C
       Punycode: 989aomsvi5e83db1d2a355cv1e0vak1dwrv93d5xbh15a0dt30a5j\
                 psd879ccm6fea98c
        
   (I) Russian (Cyrillic):
       U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
       u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
       u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
       u+0438
       Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l
        
   (J) Spanish: Porqu<eacute>nopuedensimplementehablarenEspa<ntilde>ol
       U+0050 u+006F u+0072 u+0071 u+0075 u+00E9 u+006E u+006F u+0070
       u+0075 u+0065 u+0064 u+0065 u+006E u+0073 u+0069 u+006D u+0070
       u+006C u+0065 u+006D u+0065 u+006E u+0074 u+0065 u+0068 u+0061
       u+0062 u+006C u+0061 u+0072 u+0065 u+006E U+0045 u+0073 u+0070
       u+0061 u+00F1 u+006F u+006C
       Punycode: PorqunopuedensimplementehablarenEspaol-fmd56a
        
   (K) Vietnamese:
       T<adotbelow>isaoh<odotbelow>kh<ocirc>ngth<ecirchookabove>ch\
       <ihookabove>n<oacute>iti<ecircacute>ngVi<ecircdotbelow>t
       U+0054 u+1EA1 u+0069 u+0073 u+0061 u+006F u+0068 u+1ECD u+006B
       u+0068 u+00F4 u+006E u+0067 u+0074 u+0068 u+1EC3 u+0063 u+0068
       u+1EC9 u+006E u+00F3 u+0069 u+0074 u+0069 u+1EBF u+006E u+0067
       U+0056 u+0069 u+1EC7 u+0074
       Punycode: TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g
        

The next several examples are all names of Japanese music artists, song titles, and TV programs, just because the author happens to have them handy (but Japanese is useful for providing examples of single-row text, two-row text, ideographic text, and various mixtures thereof).

接下来的几个例子都是日本音乐艺术家的名字、歌曲名称和电视节目,只是因为作者碰巧手头有这些名字(但日语在提供单行文字、两行文字、表意文字以及各种混合文字的例子时很有用)。

   (L) 3<nen>B<gumi><kinpachi><sensei>
       u+0033 u+5E74 U+0042 u+7D44 u+91D1 u+516B u+5148 u+751F
       Punycode: 3B-ww4c5e180e575a65lsy2b
        
   (M) <amuro><namie>-with-SUPER-MONKEYS
       u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074
       u+0068 u+002D U+0053 U+0055 U+0050 U+0045 U+0052 u+002D U+004D
       U+004F U+004E U+004B U+0045 U+0059 U+0053
       Punycode: -with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n
        
   (N) Hello-Another-Way-<sorezore><no><basho>
       U+0048 u+0065 u+006C u+006C u+006F u+002D U+0041 u+006E u+006F
       u+0074 u+0068 u+0065 u+0072 u+002D U+0057 u+0061 u+0079 u+002D
       u+305D u+308C u+305E u+308C u+306E u+5834 u+6240
       Punycode: Hello-Another-Way--fc4qua05auwb3674vfr0b
        
   (O) <hitotsu><yane><no><shita>2
       u+3072 u+3068 u+3064 u+5C4B u+6839 u+306E u+4E0B u+0032
       Punycode: 2-u9tlzr9756bt3uc0v
        
   (P) Maji<de>Koi<suru>5<byou><mae>
       U+004D u+0061 u+006A u+0069 u+3067 U+004B u+006F u+0069 u+3059
       u+308B u+0035 u+79D2 u+524D
       Punycode: MajiKoi5-783gue6qz075azm5e
        
   (Q) <pafii>de<runba>
       u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
       Punycode: de-jg4avhby1noc0d
        
   (R) <sono><supiido><de>
       u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
       Punycode: d9juau41awczczp
        

The last example is an ASCII string that breaks the existing rules for host name labels. (It is not a realistic example for IDNA, because IDNA never encodes pure ASCII labels.)

最后一个示例是一个ASCII字符串,它打破了主机名标签的现有规则。(这不是IDNA的实际示例,因为IDNA从不编码纯ASCII标签。)

   (S) -> $1.00 <-
       u+002D u+003E u+0020 u+0024 u+0031 u+002E u+0030 u+0030 u+0020
       u+003C u+002D
       Punycode: -> $1.00 <--
        
7.2 Decoding traces
7.2 解码跟踪

In the following traces, the evolving state of the decoder is shown as a sequence of hexadecimal values, representing the code points in the extended string. An asterisk appears just after the most recently inserted code point, indicating both n (the value preceding the asterisk) and i (the position of the value just after the asterisk). Other numerical values are decimal.

在以下跟踪中,解码器的演变状态显示为十六进制值序列,表示扩展字符串中的代码点。最近插入的代码点后面会出现一个星号,表示n(星号前面的值)和i(星号后面的值的位置)。其他数值为十进制。
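
Each line of a trace that contains an asterisk reflects one state update of the decoding procedure in section 6.2: the decoded delta advances i, the quotient advances n, and n is inserted at the remainder position. A rough C sketch of that update follows; the helper name apply_delta and its signature are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <limits.h>

/* Hypothetical helper: applies one decoded delta to the state       */
/* <n, i> and inserts n into out[0..*len).  Returns 0 on success,    */
/* -1 if the buffer is full or an integer would overflow.            */
static int apply_delta(unsigned long *out, size_t *len, size_t max,
                       unsigned long *n, unsigned long *i,
                       unsigned long delta)
{
  unsigned long slots = (unsigned long) *len + 1;

  if (*len >= max) return -1;                  /* no room to insert */
  if (delta > ULONG_MAX - *i) return -1;       /* i would overflow */
  *i += delta;
  if (*i / slots > ULONG_MAX - *n) return -1;  /* n would overflow */
  *n += *i / slots;                            /* wrap-arounds advance n */
  *i %= slots;                                 /* insertion position */
  memmove(out + *i + 1, out + *i, (*len - *i) * sizeof *out);
  out[*i] = *n;
  ++*len;
  ++*i;                                        /* position after the asterisk */
  return 0;
}
```

Starting from n = 128 and i = 0, applying the deltas 19853, 64, and 37 in turn produces 4E0D, then 4E0D 4E2D, then 4E3A 4E0D 4E2D with i = 1, as in the first steps of the trace of example B.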

Decoding trace of example B from section 7.1:

第7.1节示例B的解码轨迹:

   n is 128, i is 0, bias is 72
   input is "ihqwcrb4cv8a8dqg056pqjye"
   there is no delimiter, so extended string starts empty
   delta "ihq" decodes to 19853
   bias becomes 21
   4E0D *
   delta "wc" decodes to 64
   bias becomes 20
   4E0D 4E2D *
   delta "rb" decodes to 37
   bias becomes 13
   4E3A * 4E0D 4E2D
   delta "4c" decodes to 56
   bias becomes 17
   4E3A 4E48 * 4E0D 4E2D
   delta "v8a" decodes to 599
   bias becomes 32
   4E3A 4EC0 * 4E48 4E0D 4E2D
   delta "8d" decodes to 130
   bias becomes 23
   4ED6 * 4E3A 4EC0 4E48 4E0D 4E2D
   delta "qg" decodes to 154
   bias becomes 25
   4ED6 4EEC * 4E3A 4EC0 4E48 4E0D 4E2D
   delta "056p" decodes to 46301
   bias becomes 84
   4ED6 4EEC 4E3A 4EC0 4E48 4E0D 4E2D 6587 *
   delta "qjye" decodes to 88531
   bias becomes 90
   4ED6 4EEC 4E3A 4EC0 4E48 4E0D 8BF4 * 4E2D 6587
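
The "bias becomes" values in this trace can be reproduced with the bias adaptation function of section 6.1. The C sketch below assumes the Punycode parameter values (base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700) and uses unsigned long for simplicity; it is meant as a way to check the trace, not as a replacement for appendix C.

```c
/* Punycode parameter values (assumed from the parameter section). */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700 };

/* Bias adaptation: delta is the delta just decoded or encoded,     */
/* numpoints is the number of code points handled so far (including */
/* the current one), firsttime is nonzero for the very first delta. */
static unsigned long adapt(unsigned long delta, unsigned long numpoints,
                           int firsttime)
{
  unsigned long k;

  delta = firsttime ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  for (k = 0; delta > ((BASE - TMIN) * TMAX) / 2; k += BASE) {
    delta /= BASE - TMIN;
  }
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}
```

For example, adapt(19853, 1, 1) gives 21 and adapt(64, 2, 0) gives 20, matching the first two bias values in the trace of example B.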
        

Decoding trace of example L from section 7.1:

第7.1节示例L的解码轨迹:

   n is 128, i is 0, bias is 72
   input is "3B-ww4c5e180e575a65lsy2b"
   literal portion is "3B-", so extended string starts as:
   0033 0042
   delta "ww4c" decodes to 62042
   bias becomes 27
   0033 0042 5148 *
   delta "5e" decodes to 139
   bias becomes 24
   0033 0042 516B * 5148
   delta "180e" decodes to 16683
   bias becomes 67
   0033 5E74 * 0042 516B 5148
   delta "575a" decodes to 34821
   bias becomes 82
   0033 5E74 0042 516B 5148 751F *
   delta "65l" decodes to 14592
   bias becomes 67
   0033 5E74 0042 7D44 * 516B 5148 751F
   delta "sy2b" decodes to 42088
   bias becomes 84
   0033 5E74 0042 7D44 91D1 * 516B 5148 751F

7.3 Encoding traces
7.3 编码跟踪

In the following traces, code point values are hexadecimal, while other numerical values are decimal.

在以下跟踪中,代码点值为十六进制,而其他数值为十进制。

Encoding trace of example B from section 7.1:

第7.1节示例B的编码跟踪:

   bias is 72
   input is:
   4ED6 4EEC 4E3A 4EC0 4E48 4E0D 8BF4 4E2D 6587
   there are no basic code points, so no literal portion
   next code point to insert is 4E0D
   needed delta is 19853, encodes as "ihq"
   bias becomes 21
   next code point to insert is 4E2D
   needed delta is 64, encodes as "wc"
   bias becomes 20
   next code point to insert is 4E3A
   needed delta is 37, encodes as "rb"
   bias becomes 13
   next code point to insert is 4E48
   needed delta is 56, encodes as "4c"
   bias becomes 17
   next code point to insert is 4EC0
   needed delta is 599, encodes as "v8a"
   bias becomes 32
   next code point to insert is 4ED6
   needed delta is 130, encodes as "8d"
   bias becomes 23
   next code point to insert is 4EEC
   needed delta is 154, encodes as "qg"
   bias becomes 25
   next code point to insert is 6587
   needed delta is 46301, encodes as "056p"
   bias becomes 84
   next code point to insert is 8BF4
   needed delta is 88531, encodes as "qjye"
   bias becomes 90
   output is "ihqwcrb4cv8a8dqg056pqjye"

Encoding trace of example L from section 7.1:

第7.1节示例L的编码跟踪:

   bias is 72
   input is:
   0033 5E74 0042 7D44 91D1 516B 5148 751F
   basic code points (0033, 0042) are copied to literal portion: "3B-"
   next code point to insert is 5148
   needed delta is 62042, encodes as "ww4c"
   bias becomes 27
   next code point to insert is 516B
   needed delta is 139, encodes as "5e"
   bias becomes 24
   next code point to insert is 5E74
   needed delta is 16683, encodes as "180e"
   bias becomes 67
   next code point to insert is 751F
   needed delta is 34821, encodes as "575a"
   bias becomes 82
   next code point to insert is 7D44
   needed delta is 14592, encodes as "65l"
   bias becomes 67
   next code point to insert is 91D1
   needed delta is 42088, encodes as "sy2b"
   bias becomes 84
   output is "3B-ww4c5e180e575a65lsy2b"

8. Security Considerations
8. 安全考虑

Users expect each domain name in DNS to be controlled by a single authority. If a Unicode string intended for use as a domain label could map to multiple ACE labels, then an internationalized domain name could map to multiple ASCII domain names, each controlled by a different authority, some of which could be spoofs that hijack service requests intended for another. Therefore Punycode is designed so that each Unicode string has a unique encoding.

用户希望DNS中的每个域名都由一个机构控制。如果打算用作域标签的Unicode字符串可以映射到多个ACE标签,则国际化域名可以映射到多个ASCII域名,每个ASCII域名由不同的机构控制,其中一些可能是欺骗,劫持针对另一个机构的服务请求。因此,Punycode的设计使得每个Unicode字符串都具有唯一的编码。

However, there can still be multiple Unicode representations of the "same" text, for various definitions of "same". This problem is addressed to some extent by the Unicode standard under the topic of canonicalization, and this work is leveraged for domain names by Nameprep [NAMEPREP].

但是,对于“相同”的各种定义,“相同”文本仍然可以有多个Unicode表示。Unicode标准在规范化(canonicalization)主题下在一定程度上解决了这个问题,Nameprep [NAMEPREP]则将这项工作应用于域名。

9. References
9. 参考文献
9.1 Normative References
9.1 规范性引用文件

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。

9.2 Informative References
9.2 资料性引用

[RFC952] Harrenstien, K., Stahl, M. and E. Feinler, "DOD Internet Host Table Specification", RFC 952, October 1985.

[RFC952]Harrenstien,K.,Stahl,M.和E.Feinler,“国防部互联网主机表规范”,RFC952,1985年10月。

[RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities", STD 13, RFC 1034, November 1987.

[RFC1034]Mockapetris,P.,“域名-概念和设施”,STD 13,RFC 1034,1987年11月。

[IDNA] Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.

[IDNA] Faltstrom, P.、Hoffman, P.和A. Costello,“应用程序中的域名国际化(IDNA)”,RFC 3490,2003年3月。

[NAMEPREP] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.

[NAMEPREP] Hoffman, P.和M. Blanchet,“Nameprep:国际化域名(IDN)的Stringprep配置文件”,RFC 3491,2003年3月。

[ASCII] Cerf, V., "ASCII format for Network Interchange", RFC 20, October 1969.

[ASCII]Cerf,V.,“网络交换的ASCII格式”,RFC 20,1969年10月。

[PROVINCIAL] Kaplan, M., "The 'anyone can be provincial!' page", http://www.trigeminal.com/samples/provincial.html.

[PROVINCIAL] Kaplan, M.,“The 'anyone can be provincial!' page”,http://www.trigeminal.com/samples/provincial.html.

[UNICODE] The Unicode Consortium, "The Unicode Standard", http://www.unicode.org/unicode/standard/standard.html.

[UNICODE]UNICODE联盟,“UNICODE标准”,http://www.unicode.org/unicode/standard/standard.html.

A. Mixed-case annotation

A. 混合大小写注释

In order to use Punycode to represent case-insensitive strings, higher layers need to case-fold the strings prior to Punycode encoding. The encoded string can use mixed case as an annotation telling how to convert the folded string into a mixed-case string for display purposes. Note, however, that mixed-case annotation is not used by the ToASCII and ToUnicode operations specified in [IDNA], and therefore implementors of IDNA can disregard this appendix.

为了使用Punycode表示不区分大小写的字符串,更高层需要在Punycode编码之前对字符串进行大小写折叠。编码字符串可以使用混合大小写作为注释,说明如何将折叠后的字符串转换为混合大小写字符串以用于显示。但是,请注意,[IDNA]中指定的ToASCII和ToUnicode操作不使用混合大小写注释,因此IDNA的实现者可以忽略本附录。

Basic code points can use mixed case directly, because the decoder copies them verbatim, leaving lowercase code points lowercase, and leaving uppercase code points uppercase. Each non-basic code point is represented by a delta, which is represented by a sequence of basic code points, the last of which provides the annotation. If it is uppercase, it is a suggestion to map the non-basic code point to uppercase (if possible); if it is lowercase, it is a suggestion to map the non-basic code point to lowercase (if possible).

基本代码点可以直接使用混合大小写,因为解码器会逐字复制它们,将小写代码点保留为小写,将大写代码点保留为大写。每个非基本代码点由一个增量表示,该增量由一系列基本代码点表示,最后一个基本代码点提供注释。如果是大写,建议将非基本代码点映射为大写(如果可能);如果是小写,则建议将非基本代码点映射为小写(如果可能)。

These annotations do not alter the code points returned by decoders; the annotations are returned separately, for the caller to use or ignore. Encoders can accept annotations in addition to code points, but the annotations do not alter the output, except to influence the uppercase/lowercase form of ASCII letters.

这些注释不会改变解码器返回的代码点;注释是单独返回的,供调用方使用或忽略。除了代码点之外,编码器还可以接受注释,但是注释不会改变输出,除非影响ASCII字母的大小写形式。
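
A minimal sketch of how an implementation might read and write these annotations is shown below. The names is_flagged and encode_digit are chosen for this illustration; unlike a branch-free formulation, this version simply ignores the flag for the digits 26 through 35, which have no uppercase form.

```c
/* Nonzero if the basic code point is an uppercase ASCII letter,   */
/* that is, if it carries a "map to uppercase" suggestion.         */
static int is_flagged(int bcp)
{
  return bcp >= 65 && bcp <= 90;
}

/* Basic code point for digit-value d (0..35).  Letters (0..25)    */
/* are emitted in uppercase when flag is nonzero; digit-values     */
/* 26..35 map to '0'..'9' regardless of the flag.                  */
static char encode_digit(unsigned d, int flag)
{
  if (d >= 26) return (char) (d - 26 + 48);   /* '0'..'9' */
  return (char) (d + (flag ? 65 : 97));       /* 'A'..'Z' or 'a'..'z' */
}
```

A decoder that supports annotations would apply is_flagged to the last basic code point of each delta and report the result alongside the decoded code point, leaving the code point itself unchanged.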

Punycode encoders and decoders need not support these annotations, and higher layers need not use them.

Punycode编码器和解码器不需要支持这些注释,更高层也不需要使用它们。

B. Disclaimer and license

B.免责声明和许可证

Regarding this entire document or any portion of it (including the pseudocode and C code), the author makes no guarantees and is not responsible for any damage resulting from its use. The author grants irrevocable permission to anyone to use, modify, and distribute it in any way that does not diminish the rights of anyone else to use, modify, and distribute it, provided that redistributed derivative works do not contain misleading author or version information. Derivative works need not be licensed under similar terms.

对于整个文档或其任何部分(包括伪代码和C代码),作者不作任何保证,也不对其使用造成的任何损坏负责。作者向任何人授予不可撤销的使用、修改和分发许可,允许其以任何方式使用、修改和分发,但不得削弱任何其他人使用、修改和分发的权利,前提是重新分发的衍生作品不包含误导性的作者或版本信息。衍生作品无需根据类似条款获得许可。

C. Punycode sample implementation

C.Punycode示例实现

/*
punycode.c from RFC 3492
http://www.nicemice.net/idn/
Adam M. Costello
http://www.nicemice.net/amc/
        

This is ANSI C code (C89) implementing Punycode (RFC 3492).

这是实现Punycode(RFC3492)的ANSI C代码(C89)。

*/


/************************************************************/
/* Public interface (would normally go in its own .h file): */
        
#include <limits.h>
        
enum punycode_status {
  punycode_success,
  punycode_bad_input,   /* Input is invalid.                       */
  punycode_big_output,  /* Output would exceed the space provided. */
  punycode_overflow     /* Input needs wider integers to process.  */
};
        
#if UINT_MAX >= (1 << 26) - 1
typedef unsigned int punycode_uint;
#else
typedef unsigned long punycode_uint;
#endif
        

enum punycode_status punycode_encode(
  punycode_uint input_length,
  const punycode_uint input[],
  const unsigned char case_flags[],
  punycode_uint *output_length,
  char output[] );

    /* punycode_encode() converts Unicode to Punycode.  The input     */
    /* is represented as an array of Unicode code points (not code    */
    /* units; surrogate pairs are not allowed), and the output        */
    /* will be represented as an array of ASCII code points.  The     */
    /* output string is *not* null-terminated; it will contain        */
    /* zeros if and only if the input contains zeros.  (Of course     */
    /* the caller can leave room for a terminator and add one if      */
    /* needed.)  The input_length is the number of code points in     */
    /* the input.  The output_length is an in/out argument: the       */
    /* caller passes in the maximum number of code points that it     */
    /* can receive, and on successful return it will contain the      */
    /* number of code points actually output.  The case_flags array   */
    /* holds input_length boolean values, where nonzero suggests that */
    /* the corresponding Unicode character be forced to uppercase     */
    /* after being decoded (if possible), and zero suggests that      */
    /* it be forced to lowercase (if possible).  ASCII code points    */
    /* are encoded literally, except that ASCII letters are forced    */
    /* to uppercase or lowercase according to the corresponding       */
    /* uppercase flags.  If case_flags is a null pointer then ASCII   */
    /* letters are left as they are, and other code points are        */
    /* treated as if their uppercase flags were zero.  The return     */
    /* value can be any of the punycode_status values defined above   */
    /* except punycode_bad_input; if not punycode_success, then       */
    /* output_length and output might contain garbage.                */
        

enum punycode_status punycode_decode(
  punycode_uint input_length,
  const char input[],
  punycode_uint *output_length,
  punycode_uint output[],
  unsigned char case_flags[] );

    /* punycode_decode() converts Punycode to Unicode.  The input is  */
    /* represented as an array of ASCII code points, and the output   */
    /* will be represented as an array of Unicode code points.  The   */
    /* input_length is the number of code points in the input.  The   */
    /* output_length is an in/out argument: the caller passes in      */
    /* the maximum number of code points that it can receive, and     */
    /* on successful return it will contain the actual number of      */
    /* code points output.  The case_flags array needs room for at    */
    /* least output_length values, or it can be a null pointer if the */
    /* case information is not needed.  A nonzero flag suggests that  */
    /* the corresponding Unicode character be forced to uppercase     */
    /* by the caller (if possible), while zero suggests that it be    */
    /* forced to lowercase (if possible).  ASCII code points are      */
    /* output already in the proper case, but their flags will be set */
    /* appropriately so that applying the flags would be harmless.    */
    /* The return value can be any of the punycode_status values      */
    /* defined above; if not punycode_success, then output_length,    */
    /* output, and case_flags might contain garbage.  On success, the */
    /* decoder will never need to write an output_length greater than */
    /* input_length, because of how the encoding is defined.          */
        
/**********************************************************/
/* Implementation (would normally go in its own .c file): */
        
#include <string.h>
        
/*** Bootstring parameters for Punycode ***/
        
enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700,
       initial_bias = 72, initial_n = 0x80, delimiter = 0x2D };
        
/* basic(cp) tests whether cp is a basic code point: */
#define basic(cp) ((punycode_uint)(cp) < 0x80)
        
/* delim(cp) tests whether cp is a delimiter: */
#define delim(cp) ((cp) == delimiter)
        
/* decode_digit(cp) returns the numeric value of a basic code */
/* point (for use in representing integers) in the range 0 to */
/* base-1, or base if cp does not represent a value.          */
        
static punycode_uint decode_digit(punycode_uint cp)
{
  return  cp - 48 < 10 ? cp - 22 :  cp - 65 < 26 ? cp - 65 :
          cp - 97 < 26 ? cp - 97 :  base;
}
        
/* encode_digit(d,flag) returns the basic code point whose value      */
/* (when used for representing integers) is d, which needs to be in   */
/* the range 0 to base-1.  The lowercase form is used unless flag is  */
/* nonzero, in which case the uppercase form is used.  The behavior   */
/* is undefined if flag is nonzero and digit d has no uppercase form. */
        
static char encode_digit(punycode_uint d, int flag)
{
  return d + 22 + 75 * (d < 26) - ((flag != 0) << 5);
  /*  0..25 map to ASCII a..z or A..Z */
  /* 26..35 map to ASCII 0..9         */
}
        
/* flagged(bcp) tests whether a basic code point is flagged */
/* (uppercase).  The behavior is undefined if bcp is not a  */
/* basic code point.                                        */
        
#define flagged(bcp) ((punycode_uint)(bcp) - 65 < 26)
        
/* encode_basic(bcp,flag) forces a basic code point to lowercase */
/* if flag is zero, uppercase if flag is nonzero, and returns    */
/* the resulting code point.  The code point is unchanged if it  */
/* is caseless.  The behavior is undefined if bcp is not a basic */
/* code point.                                                   */
        

static char encode_basic(punycode_uint bcp, int flag)
{
  bcp -= (bcp - 97 < 26) << 5;
  return bcp + ((!flag && (bcp - 65 < 26)) << 5);
}
        
/*** Platform-specific constants ***/
        
/* maxint is the maximum value of a punycode_uint variable: */
static const punycode_uint maxint = -1;
/* Because maxint is unsigned, -1 becomes the maximum value. */
        
/*** Bias adaptation function ***/
        
static punycode_uint adapt(
  punycode_uint delta, punycode_uint numpoints, int firsttime )
{
  punycode_uint k;
        
  delta = firsttime ? delta / damp : delta >> 1;
  /* delta >> 1 is a faster way of doing delta / 2 */
  delta += delta / numpoints;
        
  for (k = 0;  delta > ((base - tmin) * tmax) / 2;  k += base) {
    delta /= base - tmin;
  }
        
  return k + (base - tmin + 1) * delta / (delta + skew);
}
        
/*** Main encode function ***/
        
enum punycode_status punycode_encode(
  punycode_uint input_length,
  const punycode_uint input[],
  const unsigned char case_flags[],
  punycode_uint *output_length,
  char output[] )
{
  punycode_uint n, delta, h, b, out, max_out, bias, j, m, q, k, t;
        
  /* Initialize the state: */
        
  n = initial_n;
  delta = out = 0;
  max_out = *output_length;
  bias = initial_bias;
        
  /* Handle the basic code points: */
        
  for (j = 0;  j < input_length;  ++j) {
    if (basic(input[j])) {
      if (max_out - out < 2) return punycode_big_output;
      output[out++] =
        case_flags ?  encode_basic(input[j], case_flags[j]) : input[j];
    }
    /* else if (input[j] < n) return punycode_bad_input; */
    /* (not needed for Punycode with unsigned code points) */
  }
        
  h = b = out;
        
  /* h is the number of code points that have been handled, b is the  */
  /* number of basic code points, and out is the number of characters */
  /* that have been output.                                           */
        
  if (b > 0) output[out++] = delimiter;
        
  /* Main encoding loop: */
        
  while (h < input_length) {
    /* All non-basic code points < n have been     */
    /* handled already.  Find the next larger one: */
        
    for (m = maxint, j = 0;  j < input_length;  ++j) {
      /* if (basic(input[j])) continue; */
      /* (not needed for Punycode) */
      if (input[j] >= n && input[j] < m) m = input[j];
    }
        
    /* Increase delta enough to advance the decoder's    */
    /* <n,i> state to <m,0>, but guard against overflow: */
        
    if (m - n > (maxint - delta) / (h + 1)) return punycode_overflow;
    delta += (m - n) * (h + 1);
    n = m;
        
    for (j = 0;  j < input_length;  ++j) {
      /* Punycode does not need to check whether input[j] is basic: */
      if (input[j] < n /* || basic(input[j]) */ ) {
        if (++delta == 0) return punycode_overflow;
      }
        
      if (input[j] == n) {
        /* Represent delta as a generalized variable-length integer: */
        
        for (q = delta, k = base;  ;  k += base) {
          if (out >= max_out) return punycode_big_output;
        
          t = k <= bias /* + tmin */ ? tmin :     /* +tmin not needed */
              k >= bias + tmax ? tmax : k - bias;
          if (q < t) break;
          output[out++] = encode_digit(t + (q - t) % (base - t), 0);
          q = (q - t) / (base - t);
        }
        
        output[out++] = encode_digit(q, case_flags && case_flags[j]);
        bias = adapt(delta, h + 1, h == b);
        delta = 0;
        ++h;
      }
    }
        
    ++delta, ++n;
  }
        
  *output_length = out;
  return punycode_success;
}
        
/*** Main decode function ***/
        
enum punycode_status punycode_decode(
  punycode_uint input_length,
  const char input[],
  punycode_uint *output_length,
  punycode_uint output[],
  unsigned char case_flags[] )
{
  punycode_uint n, out, i, max_out, bias,
                 b, j, in, oldi, w, k, digit, t;
        
  /* Initialize the state: */
        
  n = initial_n;
  out = i = 0;
  max_out = *output_length;
  bias = initial_bias;
        
  /* Handle the basic code points:  Let b be the number of input code */
  /* points before the last delimiter, or 0 if there is none, then    */
  /* copy the first b code points to the output.                      */
        
  for (b = j = 0;  j < input_length;  ++j) if (delim(input[j])) b = j;
  if (b > max_out) return punycode_big_output;
        
  for (j = 0;  j < b;  ++j) {
    if (case_flags) case_flags[out] = flagged(input[j]);
    if (!basic(input[j])) return punycode_bad_input;
    output[out++] = input[j];
  }
        
  /* Main decoding loop:  Start just after the last delimiter if any  */
  /* basic code points were copied; start at the beginning otherwise. */
        
  for (in = b > 0 ? b + 1 : 0;  in < input_length;  ++out) {
        
    /* in is the index of the next character to be consumed, and */
    /* out is the number of code points in the output array.     */
        
    /* Decode a generalized variable-length integer into delta,  */
    /* which gets added to i.  The overflow checking is easier   */
    /* if we increase i as we go, then subtract off its starting */
    /* value at the end to obtain delta.                         */
        
    for (oldi = i, w = 1, k = base;  ;  k += base) {
      if (in >= input_length) return punycode_bad_input;
      digit = decode_digit(input[in++]);
      if (digit >= base) return punycode_bad_input;
      if (digit > (maxint - i) / w) return punycode_overflow;
      i += digit * w;
      t = k <= bias /* + tmin */ ? tmin :     /* +tmin not needed */
          k >= bias + tmax ? tmax : k - bias;
      if (digit < t) break;
      if (w > maxint / (base - t)) return punycode_overflow;
      w *= (base - t);
    }
        
    bias = adapt(i - oldi, out + 1, oldi == 0);
        
    /* i was supposed to wrap around from out+1 to 0,   */
    /* incrementing n each time, so we'll fix that now: */
        
    if (i / (out + 1) > maxint - n) return punycode_overflow;
    n += i / (out + 1);
    i %= (out + 1);
        
    /* Insert n at position i of the output: */
        
    /* not needed for Punycode: */
    /* if (decode_digit(n) <= base) return punycode_bad_input; */
    if (out >= max_out) return punycode_big_output;
        
    if (case_flags) {
      memmove(case_flags + i + 1, case_flags + i, out - i);
        
      /* Case of last character determines uppercase flag: */
      case_flags[i] = flagged(input[in - 1]);
    }
        
    memmove(output + i + 1, output + i, (out - i) * sizeof *output);
    output[i++] = n;
  }
        
  *output_length = out;
  return punycode_success;
}
        
/******************************************************************/
/* Wrapper for testing (would normally go in a separate .c file): */
        
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
        
/* For testing, we'll just set some compile-time limits rather than */
/* use malloc(), and set a compile-time option rather than using a  */
/* command-line option.                                             */
        
enum {
  unicode_max_length = 256,
  ace_max_length = 256
};
        
static void usage(char **argv)
{
  fprintf(stderr,
    "\n"
    "%s -e reads code points and writes a Punycode string.\n"
    "%s -d reads a Punycode string and writes code points.\n"
    "\n"
    "Input and output are plain text in the native character set.\n"
    "Code points are in the form u+hex separated by whitespace.\n"
    "Although the specification allows Punycode strings to contain\n"
    "any characters from the ASCII repertoire, this test code\n"
    "supports only the printable characters, and needs the Punycode\n"
    "string to be followed by a newline.\n"
    "The case of the u in u+hex is the force-to-uppercase flag.\n"
    , argv[0], argv[0]);
  exit(EXIT_FAILURE);
}
        

static void fail(const char *msg)
{
  fputs(msg,stderr);
  exit(EXIT_FAILURE);
}
        
static const char too_big[] =
  "input or output is too large, recompile with larger limits\n";
static const char invalid_input[] = "invalid input\n";
static const char overflow[] = "arithmetic overflow\n";
static const char io_error[] = "I/O error\n";
        
/* The following string is used to convert printable */
/* characters between ASCII and the native charset:  */
        
static const char print_ascii[] =
  "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  " !\"#$%&'()*+,-./"
  "0123456789:;<=>?"
  "@ABCDEFGHIJKLMNO"
  "PQRSTUVWXYZ[\\]^_"
  "`abcdefghijklmno"
  "pqrstuvwxyz{|}~\n";
        
int main(int argc, char **argv)
{
  enum punycode_status status;
  int r;
  punycode_uint input_length, output_length, j;
  unsigned char case_flags[unicode_max_length];
        
  if (argc != 2) usage(argv);
  if (argv[1][0] != '-') usage(argv);
  if (argv[1][2] != 0) usage(argv);
        
  if (argv[1][1] == 'e') {
    punycode_uint input[unicode_max_length];
    unsigned long codept;
    char output[ace_max_length+1], uplus[3];
    int c;
        
    /* Read the input code points: */
        

    input_length = 0;

    for (;;) {
      r = scanf("%2s%lx", uplus, &codept);
      if (ferror(stdin)) fail(io_error);
        
      if (r == EOF || r == 0) break;
        
      if (r != 2 || uplus[1] != '+' || codept > (punycode_uint)-1) {
        fail(invalid_input);
      }
        
      if (input_length == unicode_max_length) fail(too_big);
        
      if (uplus[0] == 'u') case_flags[input_length] = 0;
      else if (uplus[0] == 'U') case_flags[input_length] = 1;
      else fail(invalid_input);
        
      input[input_length++] = codept;
    }
        
    /* Encode: */
        
    output_length = ace_max_length;
    status = punycode_encode(input_length, input, case_flags,
                             &output_length, output);
    if (status == punycode_bad_input) fail(invalid_input);
    if (status == punycode_big_output) fail(too_big);
    if (status == punycode_overflow) fail(overflow);
    assert(status == punycode_success);
        
    /* Convert to native charset and output: */
        
    for (j = 0;  j < output_length;  ++j) {
      c = output[j];
      assert(c >= 0 && c <= 127);
      if (print_ascii[c] == 0) fail(invalid_input);
      output[j] = print_ascii[c];
    }
        
    output[j] = 0;
    r = puts(output);
    if (r == EOF) fail(io_error);
    return EXIT_SUCCESS;
  }
        
  if (argv[1][1] == 'd') {
    char input[ace_max_length+2], *p, *pp;
    punycode_uint output[unicode_max_length];
        
    /* Read the Punycode input string and convert to ASCII: */
        
    fgets(input, ace_max_length+2, stdin);
    if (ferror(stdin)) fail(io_error);
        
    if (feof(stdin)) fail(invalid_input);
    input_length = strlen(input) - 1;
    if (input[input_length] != '\n') fail(too_big);
    input[input_length] = 0;
        
    for (p = input;  *p != 0;  ++p) {
      pp = strchr(print_ascii, *p);
      if (pp == 0) fail(invalid_input);
      *p = pp - print_ascii;
    }
        
    /* Decode: */
        
    output_length = unicode_max_length;
    status = punycode_decode(input_length, input, &output_length,
                             output, case_flags);
    if (status == punycode_bad_input) fail(invalid_input);
    if (status == punycode_big_output) fail(too_big);
    if (status == punycode_overflow) fail(overflow);
    assert(status == punycode_success);
        
    /* Output the result: */
        
    for (j = 0;  j < output_length;  ++j) {
      r = printf("%s+%04lX\n",
                 case_flags[j] ? "U" : "u",
                 (unsigned long) output[j] );
      if (r < 0) fail(io_error);
    }
        
    return EXIT_SUCCESS;
  }
        
  usage(argv);
  return EXIT_SUCCESS;  /* not reached, but quiets compiler warning */
}
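
The text format that the encoder branch of main() reads, and that the decoder branch writes, can be shown in isolation. The following is a minimal sketch and is not part of the sample implementation: the helper names parse_codept_token and format_codept_token are hypothetical, and the formatter assumes a C99 library for snprintf. It reuses the "%2s%lx" and "%s+%04lX" format strings from main() above, where a leading 'u' means the case flag is clear and 'U' means it is set.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (an assumption for illustration only).      */
/* Parses one token such as "u+0062" or "U+00FC" the same way the  */
/* encoder branch of main() does.  Returns 1 on success and 0 if   */
/* the token is malformed.  On success, *flag is 0 for a leading   */
/* 'u' and 1 for a leading 'U'.                                    */

static int parse_codept_token(const char *s, unsigned long *codept,
                              int *flag)
{
  char uplus[3];
  int r;

  assert(s != NULL && codept != NULL && flag != NULL);

  /* "%2s" reads at most two characters ("u+" or "U+"); */
  /* "%lx" reads the hexadecimal code point after them. */
  r = sscanf(s, "%2s%lx", uplus, codept);
  if (r != 2 || uplus[1] != '+') return 0;

  if (uplus[0] == 'u') *flag = 0;
  else if (uplus[0] == 'U') *flag = 1;
  else return 0;

  return 1;
}

/* Hypothetical helper (an assumption for illustration only).      */
/* Writes one token in the form used by the decoder branch of      */
/* main(), without the trailing newline.  Returns the value        */
/* returned by snprintf.                                           */

static int format_codept_token(char *buf, size_t size,
                               unsigned long codept, int flag)
{
  assert(buf != NULL && size > 0);
  return snprintf(buf, size, "%s+%04lX", flag ? "U" : "u", codept);
}
```

Under these assumptions, parsing "U+00FC" yields the code point 0xFC with the flag set, and formatting that pair again yields "U+00FC". A token with a missing '+' or with any prefix letter other than 'u' or 'U' is rejected, matching the cases in which main() calls fail(invalid_input).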
        

Author's Address

   Adam M. Costello
   University of California, Berkeley
   http://www.nicemice.net/amc/
        

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.
