Internet Engineering Task Force (IETF)                        A. Freytag
Request for Comments: 8228                                   August 2017
Category: Informational
ISSN: 2070-1721
        

Guidance on Designing Label Generation Rulesets (LGRs) Supporting Variant Labels

Abstract

Rules for validating identifier labels and alternate representations of those labels (variants) are known as Label Generation Rulesets (LGRs); they are used for the implementation of identifier systems such as Internationalized Domain Names (IDNs). This document describes ways to design LGRs to support variant labels. In designing LGRs, it is important to ensure that the label generation rules are consistent and well behaved in the presence of variants. The design decisions can then be expressed using the XML representation of LGRs that is defined in RFC 7940.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc8228.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Variant Relations
   3.  Symmetry and Transitivity
   4.  A Word on Notation
   5.  Variant Mappings
   6.  Variant Labels
   7.  Variant Types and Label Dispositions
   8.  Allocatable Variants
   9.  Blocked Variants
   10. Pure Variant Labels
   11. Reflexive Variants
   12. Limiting Allocatable Variants by Subtyping
   13. Allowing Mixed Originals
   14. Handling Out-of-Repertoire Variants
   15. Conditional Variants
   16. Making Conditional Variants Well Behaved
   17. Variants for Sequences
   18. Corresponding XML Notation
   19. IANA Considerations
   20. Security Considerations
   21. References
     21.1.  Normative References
     21.2.  Informative References
   Acknowledgments
   Author's Address
        
1. Introduction

Label Generation Rulesets (LGRs) that define the set of permissible labels may be applied to identifier systems that rely on labels, such as the Domain Name System (DNS) [RFC1034] [RFC1035]. To date, LGRs have mostly been used to define policies for implementing Internationalized Domain Names (IDNs) using IDNA2008 [RFC5890] [RFC5891] [RFC5892] [RFC5893] [RFC5894] in the DNS. This document aims to discuss the generation of LGRs for such circumstances, but the techniques and considerations here are almost certainly applicable to a wider range of internationalized identifiers.

In addition to determining whether a given label is eligible, LGRs may also define the condition under which alternate representations of these labels, so-called "variant labels", may exist and their status (disposition). In the most general sense, variant labels are typically labels that are either visually or semantically indistinguishable from another label in the context of the writing system or script supported by the LGR. Unlike merely similar labels, where there may be a measurable degree of similarity, variant labels considered here represent a form of equivalence in meaning or appearance. What constitutes an appropriate variant in any writing system or given context, particularly in the DNS, is assumed to have been determined ahead of time and therefore is not a subject of this document.

Once identified, variant labels are typically delegated to some entity together with the applied-for label, or permanently reserved, based on the disposition derived from the LGR. Correctly defined, variant labels can improve the security of an LGR, yet successfully defining variant rules for an LGR so that the result is well behaved is not always trivial. This document describes the basic considerations and constraints that must be taken into account and gives examples of what might be use cases for different types of variant specifications in an LGR.

This document does not address whether variants are an appropriate means to solve any given issue or the basis on which they should be defined. It is intended to explain in more detail the effects of various declarations and the trade-offs in making design choices. It implicitly assumes that any LGR will be expressed using the XML representation defined in [RFC7940] and therefore conforms to any requirements stated therein. Purely for clarity of exposition, examples in this document use a more compact notation than the XML syntax defined in [RFC7940]. However, the reader is expected to have some familiarity with the concepts described in that RFC (see Section 4).

The user of any identifier system, such as the DNS, interacts with it in the context of labels; variants are experienced as variant labels, i.e., two (or more) labels that are functionally "same as" under the conventions of the writing system used, even though their code point sequences are different. An LGR specification, on the other hand, defines variant mappings between code points and, only in a secondary step, derives the variant labels from these mappings. For a discussion of this process, see [RFC7940].

The designer of an LGR can control whether some or all of the variant labels created from an original label should be allocatable, i.e., available for allocation (to the original applicant), or whether some or all of these labels should be blocked instead, i.e., remain not allocatable (to anyone). This document describes how this choice of label disposition is accomplished (see Section 7).

The choice of desired label disposition would be based on the expectations of the users of the particular zone; it is not the subject of this document. Likewise, this document does not address the possibility of an LGR defining custom label dispositions. Instead, this document suggests ways of designing an LGR to achieve the selected design choice for handling variants in the context of the two standard label dispositions: "allocatable" and "blocked".

The information in this document is based on operational experience gained in developing LGRs for a wide range of languages and scripts using RFC 7940. This information is provided here as a benefit to the wider community. It does not alter the specification found in RFC 7940 in any way.

2. Variant Relations

A variant relation is fundamentally a "same as" relation; in other words, it is an equivalence relation. Now, the strictest sense of "same as" would be equality, and for any equality, we have both symmetry

     A = B => B = A

and transitivity

     A = B and B = C => A = C

The variant relation with its functional sense of "same as" must really satisfy the same constraint. Once we say A is the "same as" B, we also assert that B is the "same as" A. In this document, the symbol "~" means "has a variant relation with". Thus, we get

     A ~ B => B ~ A

Likewise, if we make the same claim for B and C (B ~ C), then we get A ~ C, because if B is the "same as" both A and C, then A must be the "same as" C:

     A ~ B and B ~ C => A ~ C

3. Symmetry and Transitivity

Not all potential relations between labels constitute equivalence, and those that do not are not transitive and may not be symmetric. For example, the degree to which labels are confusable is not transitive: two labels can be confusingly similar to a third without necessarily being confusable with each other, such as when the third one has a shape that is "in between" the other two. In contrast, a relation based on identical or effectively identical appearance would meet the criterion of transitivity, and we would consider it a variant relation. Examples of variant relations include other forms of equivalence, such as semantic equivalence.

Using [RFC7940], a set of mappings could be defined that is neither symmetric nor transitive; such a specification would be formally valid. However, a symmetric and transitive set of mappings is strongly preferred as a basis for an LGR, not least because of the benefits from an implementation point of view; for example, if all mappings are symmetric and transitive, it greatly simplifies the check for collisions between labels with variants. For this reason, we will limit the discussion in this document to those relations that are symmetric and transitive. Incidentally, it is often straightforward to verify mechanically whether an LGR is symmetric and/or transitive and to compute any mappings required to make it so (but see Section 15).
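The mechanical check mentioned above can be sketched as follows. This is a minimal illustration, assuming that mappings are held as a set of (source, target) pairs of single code points; the function names are hypothetical and not taken from [RFC7940], and variant types and conditional variants (see Section 15) are ignored.

```python
from itertools import product

def is_symmetric(mappings):
    """True if every mapping a --> b has a reverse mapping b --> a."""
    return all((b, a) in mappings for a, b in mappings)

def is_transitive(mappings):
    """True if a --> b and b --> c always imply a --> c (for a != c)."""
    return all(
        (a, d) in mappings
        for (a, b), (c, d) in product(mappings, repeat=2)
        if b == c and a != d
    )

def close(mappings):
    """Return the smallest symmetric and transitive superset of mappings."""
    closed = set(mappings)
    while True:
        # Reverse mappings needed for symmetry.
        extra = {(b, a) for a, b in closed}
        # Chained mappings needed for transitivity (reflexive ones skipped).
        extra |= {(a, d) for (a, b), (c, d) in product(closed, repeat=2)
                  if b == c and a != d}
        if extra <= closed:
            return closed
        closed |= extra

partial = {("A", "B"), ("B", "C")}
closed = close(partial)
print(is_symmetric(partial), is_transitive(partial))
print(sorted(closed))
```

Starting from only A --> B and B --> C, the closure contains all six mappings among A, B, and C.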

4. A Word on Notation

[RFC7940] defines an XML schema for Label Generation Rulesets in general and variant code points and sequences in particular (see Section 18). That notation is rather verbose and can easily obscure salient features to anyone not trained to read XML. For this reason, this document uses a symbolic shorthand notation in presenting the examples for discussion. This shorthand is merely a didactic tool for presentation and is not intended as an alternative to or replacement for the XML syntax that is used in formally specifying an LGR under [RFC7940].

When it comes time to capture the LGR in a formal definition, the notation used for any of the examples in this document can be converted to the XML format as described in Section 18.

5. Variant Mappings

So far, we have treated variant relations as simple "same as" relations, ignoring that each relation representing equivalence would consist of a symmetric pair of reciprocal mappings. In this document, the symbol "-->" means "maps to".

   A ~ B => A --> B, B --> A

In an LGR, these mappings are not defined directly between labels but between code points (or code point sequences; see Section 17). In the transitive case, given

   A ~ B => A --> B, B --> A
   A ~ C => A --> C, C --> A

we also get

   B ~ C => B --> C, C --> B

for a total of six possible mappings. Conventionally, these are listed in tables in order of the source code point, like so:

     A --> B
     A --> C
     B --> A
     B --> C
     C --> A
     C --> B

As we can see, A, B, and C can each be mapped two ways.
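As an illustration, the table of mappings can be derived from a set of mutually equivalent code points. The sketch below assumes single code points and uses a hypothetical helper name; it is not part of [RFC7940].

```python
from itertools import permutations

def mappings_for(variant_set):
    """List every mapping source --> target within one set of mutually
    equivalent code points, ordered by source code point."""
    return list(permutations(sorted(variant_set), 2))

for source, target in mappings_for({"A", "B", "C"}):
    print(f"{source} --> {target}")
```

A set of n equivalent code points yields n * (n - 1) mappings, which is six for A, B, and C.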

6. Variant Labels

To create a variant label, each code point in the original label is successively replaced by all variant code points defined by a mapping from the original code point. For a label AAA (the letter "A" three times), the variant labels (given the mappings from the transitive example above) would be

     AAB ABA ABB BAA BAB BBA BBB AAC ... CCC

So far, we have merely defined what the variant labels are, but we have not considered their possible dispositions. In the next section, we discuss how to set up the variant mappings so that some variant labels are mutually exclusive (blocked), but some may be allocated to the same applicant as the original label (allocatable).
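The permutation step described in this section can be sketched as follows. This is a minimal illustration under the assumption that all mappings are between single code points; the names MAPPINGS and variant_labels are hypothetical.

```python
from itertools import product

# Mappings from the transitive example in Section 5.
MAPPINGS = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

def variant_labels(label, mappings):
    """Return all variant labels of `label`, excluding the label itself."""
    # At each position, the choices are the original code point
    # plus every target it maps to.
    choices = [[cp] + mappings.get(cp, []) for cp in label]
    labels = ("".join(combo) for combo in product(*choices))
    return [variant for variant in labels if variant != label]

variants = variant_labels("AAA", MAPPINGS)
print(len(variants))  # 26
```

With three choices at each of three positions, there are 27 permutations, one of which is the original label AAA itself.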

7. Variant Types and Label Dispositions

Assume we wanted to allow a variant relation between code points O and A, and perhaps between O and B or O and C as well. Assuming transitivity, this would give us:

     O ~ A ~ B ~ C

Now, further assume that we would like to distinguish the case where someone applies for OOO from the case where someone applies for the label ABC. In the former case, we would like to allocate only the applied-for label OOO, but in the latter case, we would like to allow the allocation of the applied-for label ABC, the variant label OOO, or both, but not of any of the other possible variant labels, like OAO, BCO, or the like. (A real-world example might be the case where O represents an unaccented letter, while A, B, and C might represent various accented forms of the same letter. Because unaccented letters are a common fallback, there might be a desire to allocate an unaccented label as a variant, but not the other way around.)

How would we specify such a distinction?

The answer lies in labeling the mappings A --> O, B --> O, and C --> O with the type "allocatable" and the mappings O --> A, O --> B, and O --> C with the type "blocked". In this document, the symbol "x-->" means "maps with type blocked", and the symbol "a-->" means "maps with type allocatable". Thus:

     O x--> A
     O x--> B
     O x--> C
     A a--> O
     B a--> O
     C a--> O

When we generate all permutations of labels, we use mappings with different types depending on which code points we start from. The set of all permuted variant labels would be the same, but the disposition of the variant label depends on which label we start from (we call that label the "original" or "applied-for" label).

In creating an LGR with variants, all variant mappings should always be labeled with a type ([RFC7940] does not formally require a type, but any well-behaved LGR would be fully typed). By default, these types correspond directly to the dispositions for variant labels, with the most restrictive type determining the disposition of the variant label. However, as we shall see later, it is sometimes useful to assign types from a wider array of values than the final dispositions for the labels and then define explicitly how to derive label dispositions from them.

8. Allocatable Variants

If we start with AAA and use the mappings from Section 7, the permutation OOO will be the result of applying the mapping A a--> O at each code point. That is, only mappings with type "a" (allocatable) were used. To know whether we can allocate both the label OOO and the original label AAA, we track the types of the mappings used in generating the label.

We record the variant types for each of the variant mappings used in creating the permutation in an ordered list. Such an ordered list of variant types is called a "variant type list". In running text, we often show it enclosed in square brackets. For example, [a x -] means the variant label was derived from a variant mapping with the "a" variant type in the first code point position, "x" in the second code point position, and the original code point in the third position ("-" means "no variant mapping").

For our example permutation, we get the following variant type list (brackets dropped):

     AAA --> OOO : a a a

From the variant type list, we derive a "variant type set", denoted by curly braces, that contains an unordered set of unique variant types in the variant type list. For the variant type list for the given permutation, [a a a], the variant type set is { a }, which has a single element "a".

Deciding whether to allow the allocation of a variant label then amounts to deriving a disposition for the variant label from the variant type set created from the variant mappings that were used to create the label. For example, the derivation

     if "all-variants" = "a" => set label disposition to "allocatable"

would allow OOO to be allocated, because the types of all variant mappings used to create that variant label from AAA are "a".

The "all-variants" condition is tolerant of an extra "-" in the variant type set (unlike the "only-variants" condition described in Section 10). So, had we started with AOA, OAA, or AAO, the variant type set for the permuted variant OOO would have been { a - } because in each case one of the code points remains the same code point as the original. The "-" means that because of the absence of a mapping O --> O, there is no variant type for the O in each of these labels.

The "all-variants" = "a" condition ignores the "-", so using the derivation from above, we find that OOO is an allocatable variant for each of the labels AOA, OAA, or AAO.
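The bookkeeping described in this section can be sketched as follows. This is an illustration only: the typed mappings are those from Section 7, the function names are hypothetical, and the treatment of "-" follows the description in this section rather than the normative text of [RFC7940].

```python
# Typed mappings from Section 7: (source, target) -> variant type.
TYPES = {
    ("O", "A"): "x", ("O", "B"): "x", ("O", "C"): "x",
    ("A", "O"): "a", ("B", "O"): "a", ("C", "O"): "a",
}

def type_list(original, variant, types):
    """Variant type list: one entry per code point position."""
    result = []
    for src, dst in zip(original, variant):
        if (src, dst) in types:
            result.append(types[(src, dst)])
        elif src == dst:
            result.append("-")  # unchanged code point, no mapping used
        else:
            raise ValueError(f"no mapping {src} --> {dst}")
    return result

def all_variants(tlist, allowed):
    """Sketch of "all-variants": every mapping used has an allowed type;
    "-" entries are ignored, but at least one mapping must be present."""
    used = set(tlist) - {"-"}
    return bool(used) and used <= set(allowed)

print(type_list("AAA", "OOO", TYPES))  # ['a', 'a', 'a']
print(type_list("AOA", "OOO", TYPES))  # ['a', '-', 'a']
print(all_variants(type_list("AOA", "OOO", TYPES), {"a"}))  # True
```

The variant type set is simply `set(tlist)`; the list form is kept here because it is easier to compare against the examples in the text.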

Allocatable variant labels, especially large numbers of allocatable variants per label, incur a certain cost to users of the LGR. A well-behaved LGR will minimize the number of allocatable variants.

9. Blocked Variants

Blocked variants are not available to another registrant. They therefore protect the applicant of the original label from someone else registering a label that is the "same as" under some user-perceived metric. Blocked variants can be a useful tool even for scripts for which no allocatable labels are ever defined.

If we start with OOO and use the mappings from Section 7, the permutation AAA will have been the result of applying only mappings with type "blocked", and we cannot allocate the label AAA, only the original label OOO. This corresponds to the following derivation:

     if "any-variants" = "x" => set label disposition to "blocked"

Additionally, to prevent allocating ABO as a variant label for AAA, we need to make sure that the mapping A --> B has been defined with type "blocked", as in

     A x--> B

so that

     AAA --> ABO: - x a.

Thus, the set {x a} contains at least one "x" and satisfies the derivation of a blocked disposition for ABO when AAA is applied for.
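The two derivations seen so far can be combined in a short sketch. The function names are hypothetical, reflexive mappings are not yet considered, and returning None for a label that matches no derivation is an assumption made only for this illustration.

```python
# Mappings from Section 7 plus the blocked mapping A x--> B.
TYPES = {
    ("O", "A"): "x", ("O", "B"): "x", ("O", "C"): "x",
    ("A", "O"): "a", ("B", "O"): "a", ("C", "O"): "a",
    ("A", "B"): "x",
}

def type_set(original, variant, types):
    """Variant type set; "-" marks positions where the code point is unchanged.
    Raises KeyError if a changed position has no mapping."""
    return {"-" if s == d else types[(s, d)] for s, d in zip(original, variant)}

def disposition(tset):
    if "x" in tset:              # if "any-variants" = "x"
        return "blocked"
    if tset - {"-"} == {"a"}:    # if "all-variants" = "a"
        return "allocatable"
    return None                  # no derivation matched

print(disposition(type_set("AAA", "ABO", TYPES)))  # blocked
print(disposition(type_set("OOO", "AAA", TYPES)))  # blocked
print(disposition(type_set("AAA", "OOO", TYPES)))  # allocatable
```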

If an LGR results in a symmetric and transitive set of variant labels, then the task of determining whether a label or its variants collide with another label or its variants can be implemented very efficiently. Symmetry and transitivity imply that sets of labels that are mutual variants of each other are disjoint from all other such sets. Only labels within the same set can be variants of each other. Identifying the variant set can be an O(1) operation, and enumerating all variants is not necessary.
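One way to exploit this property is to reduce every label to a canonical key and compare keys instead of enumerating variants. The sketch below assumes symmetric and transitive mappings between single code points (no sequences or conditional variants); the helper names and the choice of the smallest code point as representative are illustrative assumptions.

```python
def build_canon(variant_sets):
    """Map each code point to one representative of its variant set."""
    canon = {}
    for vset in variant_sets:
        representative = min(vset)
        for cp in vset:
            canon[cp] = representative
    return canon

def index_key(label, canon):
    """Replace each code point by its representative; code points
    without variants represent themselves."""
    return "".join(canon.get(cp, cp) for cp in label)

canon = build_canon([{"O", "A", "B", "C"}])
registered = {index_key("OOO", canon): "OOO"}

# ABC collides with the registered label OOO; OOD does not.
print(index_key("ABC", canon) in registered)  # True
print(index_key("OOD", canon) in registered)  # False
```

Two labels are variants of each other exactly when their keys are equal, so a collision check becomes a single dictionary lookup once the key has been computed.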

10. Pure Variant Labels

Now, if we wanted to prevent allocation of AOA when we start from AAA, we would need a rule disallowing a mix of original code points and variant code points; this is easily accomplished by use of the "only-variants" qualifier, which requires that the label consist entirely of variants and that all the variants are from the same set of types.

     if "only-variants" = "a" => set label disposition to "allocatable"

The two code points A in AOA are not arrived at by variant mappings, because the code points are unchanged and no variant mappings are defined for A --> A. So, in our example, the variant type list is

     AAA --> AOA:  - a -

but unlike the "all-variants" condition, "only-variants" requires a variant type set { a } corresponding to a variant type list [a a a] (no "-" allowed). By adding a final derivation

     else if "any-variants" = "a" => set label disposition to "blocked"

and executing that derivation only on any remaining labels, we disallow AOA when starting from AAA but still allow OOO.

Derivation conditions are always applied in order, with later derivations only applying to labels that did not match any earlier conditions, as indicated by the use of "else" in the last example. In other words, they form a cascade.
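The cascade can be sketched as an ordered series of tests in which the first match wins. The function names are hypothetical, and the variant type lists are written out by hand rather than computed from mappings.

```python
def any_variants(tlist, types):
    """At least one position uses a mapping of one of the given types."""
    return any(t in types for t in tlist)

def only_variants(tlist, types):
    """Every position uses a mapping of one of the given types; no "-"."""
    return all(t in types for t in tlist)

def disposition(tlist):
    # Conditions are tried in order; later ones see only unmatched labels.
    if any_variants(tlist, {"x"}):
        return "blocked"
    if only_variants(tlist, {"a"}):
        return "allocatable"
    if any_variants(tlist, {"a"}):
        return "blocked"
    return None  # no derivation matched (assumption for this sketch)

print(disposition(["a", "a", "a"]))  # AAA --> OOO: allocatable
print(disposition(["-", "a", "-"]))  # AAA --> AOA: blocked
```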

11. Reflexive Variants

But what if we started from AOA? We would expect the variant label OOO to be allocatable, but, using the mappings from Section 7, the variant type list would be

     AOA --> OOO:  a - a

because the middle O is unchanged from the original code point. Here is where we use a reflexive mapping. Realizing that O is the "same as" O, we can map it to itself. This is normally redundant, but adding an explicit reflexive mapping allows us to specify a disposition on that mapping:

     O a--> O

With that, the variant type list for AOA --> OOO becomes:

     AOA --> OOO: a a a

and the label OOO again passes the derivation condition

     if "only-variants" = "a" => set label disposition to "allocatable"

as desired. This use of reflexive variants is typical whenever derivations with the "only-variants" qualifier are used. If any code point uses a reflexive variant, a well-behaved LGR would specify an appropriate reflexive variant for all code points.
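The effect of the reflexive mapping can be seen by computing the variant type list with and without it. This is a sketch with hypothetical names; an unchanged code point is given the type of its reflexive mapping when one is defined, and "-" otherwise.

```python
BASE = {
    ("O", "A"): "x", ("O", "B"): "x", ("O", "C"): "x",
    ("A", "O"): "a", ("B", "O"): "a", ("C", "O"): "a",
}
WITH_REFLEXIVE = {**BASE, ("O", "O"): "a"}  # adds O a--> O

def type_list(original, variant, types):
    result = []
    for src, dst in zip(original, variant):
        if (src, dst) in types:
            result.append(types[(src, dst)])  # includes reflexive mappings
        elif src == dst:
            result.append("-")                # unchanged, no reflexive mapping
        else:
            raise ValueError(f"no mapping {src} --> {dst}")
    return result

def only_variants(tlist, types):
    return all(t in types for t in tlist)

without = type_list("AOA", "OOO", BASE)
with_reflexive = type_list("AOA", "OOO", WITH_REFLEXIVE)
print(without, only_variants(without, {"a"}))
print(with_reflexive, only_variants(with_reflexive, {"a"}))
```

Note that AAA --> AOA still yields [- a -] with either table, because no reflexive mapping is defined for A in this example.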

12. Limiting Allocatable Variants by Subtyping

As we have seen, the number of variant labels can potentially be large, due to combinatorics. Sometimes it is possible to divide variants into categories and to stipulate that only variant labels with variants from the same category should be allocatable. For some LGRs, this constraint can be implemented by a rule that disallows code points from different categories from occurring in the same allocatable label. For other LGRs, the appropriate mechanism may be dividing the allocatable variants into subtypes.

To recap, in the standard case, a code point C can have (up to) two types of variant mappings

     C x--> X
     C a--> A

where a--> means a variant mapping with type "allocatable" and x--> means "blocked". For the purpose of the following discussion, we name the target code point with the corresponding uppercase letter.

Subtyping allows us to distinguish among different types of allocatable variants. For example, we can define three new types: "s", "t", and "b". Of these, "s" and "t" are mutually incompatible, but "b" is compatible with either "s" or "t" (in this case, "b" stands for "both"). A real-world example for this might be variant mappings appropriate for "simplified" or "traditional" Chinese variants, or appropriate for both.

With subtypes defined as above, a code point C might have (up to) four types of variant mappings:

     C x--> X
     C s--> S
     C t--> T
     C b--> B

and explicit reflexive mappings of one of these types:

     C s--> C
     C t--> C
     C b--> C

As before, all mappings must have one and only one type, but each code point may map to any number of other code points.

We define the compatibility of "b" with "t" or "s" by our choice of derivation conditions as follows:

     if "any-variant" = "x" =>  blocked
     else if "only-variants" = "s" or "b" =>  allocatable
     else if "only-variants" = "t" or "b" =>  allocatable
     else if "any-variant" = "s" or "t" or "b" =>  blocked
        

An original label of four code points

     CCCC

may have many variant labels, such as this example listed with its corresponding variant type list:

     CCCC --> XSTB : x s t b
        

This variant label is blocked because getting from C to X requires x-->. (Because variant mappings are defined for specific source code points, we need to show the starting label for each of these examples, not merely the code points in the variant label.) The variant label

     CCCC --> SSBB : s s b b
        

is allocatable, because the variant type list contains only allocatable mappings of subtype "s" or "b", which we have defined as being compatible by our choice of derivations. The actual set of variant types {s, b} has only two members, but the examples are easier to follow if we list each type. The label

     CCCC --> TTBB : t t b b
        

is again allocatable, because the variant type set {t, b} contains only allocatable mappings of the mutually compatible allocatable subtypes "t" or "b". In contrast,

     CCCC --> SSTT : s s t t
        

is not allocatable, because the type set contains incompatible subtypes "t" and "s" and thus would be blocked by the final derivation.

The variant labels

     CCCC --> CSBB : c s b b
     CCCC --> CTBB : c t b b
        

are only allocatable based on the subtype for the C --> C mapping, which is denoted here by "c" and (depending on what was chosen for the type of the reflexive mapping) could correspond to "s", "t", or "b".

If the subtype is "s", the first of these two labels is allocatable; if it is "t", the second of these two labels is allocatable; if it is "b", both labels are allocatable.

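The derivations and examples of this section can be checked with a short Python sketch. It rests on assumptions: the function name derive_disposition and the representation of a variant label as a list of type strings are hypothetical, and the result None for a label that triggers none of the four derivations is a choice made for this sketch, since this section does not define that case.

```python
from typing import Iterable, Optional


def derive_disposition(variant_types: Iterable[str]) -> Optional[str]:
    """Apply the four derivations in order; the first one triggered wins."""
    types = set(variant_types)  # variant type set, e.g. {"s", "b"}
    if "x" in types:                     # any variant of type "x"
        return "blocked"
    if types and types <= {"s", "b"}:    # only variants of type "s" or "b"
        return "allocatable"
    if types and types <= {"t", "b"}:    # only variants of type "t" or "b"
        return "allocatable"
    if types & {"s", "t", "b"}:          # any variant of type "s", "t", or "b"
        return "blocked"
    return None  # no derivation triggered; not covered by this sketch


# Variant type lists from the examples above (original label CCCC).
EXAMPLES = {
    "XSTB": ["x", "s", "t", "b"],
    "SSBB": ["s", "s", "b", "b"],
    "TTBB": ["t", "t", "b", "b"],
    "SSTT": ["s", "s", "t", "t"],
}

for label, types in EXAMPLES.items():
    print(label, derive_disposition(types))
```

For the labels CSBB and CTBB, the type chosen for the reflexive mapping C --> C simply takes the first position in the list; for instance, ["s", "s", "b", "b"] models CSBB when the reflexive mapping has subtype "s".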

So far, the scheme does not seem to have brought any huge reduction in allocatable variant labels, but that is because we tacitly assumed that C could have all three types of allocatable variants "s", "t", and "b" at the same time.

In a real-world example, the types "s", "t", and "b" are assigned so that each code point C normally has, at most, one non-reflexive variant mapping labeled with one of these subtypes, and all other mappings would be assigned type "x" (blocked). This holds true for most code points in existing tables (such as those used in current IDN Top-Level Domains (TLDs)), although certain code points have exceptionally complex variant relations and may have an extra mapping.

13. Allowing Mixed Originals

If the desire is to allow original labels (but not variant labels) that are s/t mixed, then the scheme needs to be slightly refined to distinguish between reflexive and non-reflexive variants. In this document, the symbol "r-n" means "a reflexive (identity) mapping of type 'n'". The reflexive mappings of the preceding section thus become:

     C r-s--> C
     C r-t--> C
     C r-b--> C

With this convention, and redefining the derivations

   if "any-variant" = "x" =>  blocked
   else if "only-variants" = "s" or "r-s" or "b" or "r-b" => allocatable
   else if "only-variants" = "t" or "r-t" or "b" or "r-b" => allocatable
   else if "any-variant" = "s" or "t" or "b"  => blocked
   else => allocatable
        

any labels that contain only reflexive mappings of otherwise mixed type (in other words, any mixed original label) now fall through, and their disposition is set to "allocatable" in the final derivation.

In a well-behaved LGR, it is preferable to explicitly define the derivation for allocatable labels instead of using a fall through. In the derivation above, code points without any variant mappings fall through and become allocatable by default if they are part of an original label. Especially in a large repertoire, it can be difficult to identify which code points are affected. Instead, it is preferable to mark them with their own reflexive mapping type "neither" or "r-n".

     C r-n--> C

With that, we can change

     else => allocatable

to

     else if "only-variants" = "r-s" or "r-t" or "r-b" or "r-n"
          =>  allocatable
     else => invalid
        

This makes the intent more explicit, and by ensuring that all code points in the LGR have a reflexive mapping of some kind, it is easier to verify the correct assignment of their types.

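The refined cascade, including the explicit final derivations, might be modeled as a table of actions evaluated in order. This is only a sketch: the ACTIONS table layout and the function name derive_disposition are hypothetical, and each label is represented simply as the list of mapping types used to produce it.

```python
# Each entry: (kind, set of types, resulting disposition).
# "any"  triggers if at least one mapping type of the label is in the set.
# "only" triggers if every mapping type of the label is in the set.
ACTIONS = [
    ("any", {"x"}, "blocked"),
    ("only", {"s", "r-s", "b", "r-b"}, "allocatable"),
    ("only", {"t", "r-t", "b", "r-b"}, "allocatable"),
    ("any", {"s", "t", "b"}, "blocked"),
    ("only", {"r-s", "r-t", "r-b", "r-n"}, "allocatable"),
]


def derive_disposition(variant_types):
    """Return the disposition set by the first action that triggers."""
    types = set(variant_types)
    for kind, allowed, disposition in ACTIONS:
        if kind == "any" and types & allowed:
            return disposition
        if kind == "only" and types and types <= allowed:
            return disposition
    return "invalid"  # explicit final derivation instead of a fall through


# A mixed original label uses only reflexive mappings and stays allocatable.
print(derive_disposition(["r-s", "r-t", "r-s", "r-t"]))
# A mixed variant label contains non-reflexive "s" and "t" and is blocked.
print(derive_disposition(["s", "t", "r-b", "r-n"]))
```

Keeping the derivations in a table makes their order explicit, which matters because the first action triggered determines the result.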

14. Handling Out-of-Repertoire Variants

At first, it may seem counterintuitive to define variants that map to code points that are not part of the repertoire. However, for zones for which multiple LGRs are defined, there may be situations where labels valid under one LGR should be blocked if a label under another LGR is already delegated. This situation can arise whether or not the repertoires of the affected LGRs overlap and, where repertoires overlap, whether or not the labels are both restricted to the common subset.

In order to handle this exclusion relation through definition of variants, it is necessary to be able to specify variant mappings to some code point X that is outside an LGR's repertoire, R:

     C  x--> X : where C = elementOf(R) and X != elementOf(R)
        

Because of symmetry, it is necessary to also specify the inverse mapping in the LGR:

     X  x--> C : where X != elementOf(R) and C = elementOf(R)
        

This makes X a source of variant mappings, and it becomes necessary to identify X as being outside the repertoire, so that any attempt to apply for a label containing X will lead to a disposition of "invalid", just as if X had never been listed in the LGR. The mechanism to do this uses reflexive variants but with a new type of reflexive mapping of "out-of-repertoire-var", shown as "r-o-->":

     X r-o--> X

This indicates X != elementOf(R), as long as the LGR is provided with a suitable derivation, so that any label containing "r-o-->" is assigned a disposition of "invalid", just as if X was any other code point not part of the repertoire. The derivation used is:

     if "any-variant" = "out-of-repertoire-var" => invalid
        

It is inserted ahead of any other derivation of the "any-variant" kind in the chain of derivations. As a result, instead of the minimum two symmetric variants, for any out-of-repertoire variants, there are a minimum of three variant mappings defined:

     C x--> X
     X x--> C
     X r-o--> X

where C = elementOf(R) and X != elementOf(R).

Because no variant label with any code point outside the repertoire could ever be allocated, the only logical choice for the non-reflexive mappings to out-of-repertoire code points is "blocked".

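A checker for the three required mappings could look like the following Python sketch. The names check_out_of_repertoire and derive_disposition, the tuple layout (source, type, target), and the abbreviation "r-o" for "out-of-repertoire-var" are assumptions made for illustration. The fall-through result "valid" in derive_disposition is likewise an assumption; only the ordering of the first two derivations is taken from the text above.

```python
def check_out_of_repertoire(repertoire, mappings):
    """Report problems with mappings that leave the repertoire.

    repertoire: set of code points that are members of the LGR.
    mappings:   set of (source, type, target) tuples.
    """
    problems = []
    for source, mtype, target in sorted(mappings):
        if source in repertoire and target not in repertoire:
            if mtype != "x":
                problems.append(f"{source} {mtype}--> {target} must be blocked")
            if (target, "x", source) not in mappings:
                problems.append(f"missing inverse mapping {target} x--> {source}")
            if (target, "r-o", target) not in mappings:
                problems.append(f"missing reflexive mapping {target} r-o--> {target}")
    return problems


def derive_disposition(variant_types):
    """The out-of-repertoire derivation comes ahead of all other ones."""
    types = set(variant_types)
    if "r-o" in types:
        return "invalid"
    if "x" in types:
        return "blocked"
    return "valid"  # assumption: remaining derivations are omitted here


REPERTOIRE = {"C"}
MAPPINGS = {("C", "x", "X"), ("X", "x", "C"), ("X", "r-o", "X")}
print(check_out_of_repertoire(REPERTOIRE, MAPPINGS))
```

A label applied for that contains X uses the reflexive mapping X r-o--> X and is therefore caught by the first derivation before the "blocked" derivation is considered.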

15. Conditional Variants

Variant mappings are based on whether code points appear the "same" to the user. In some writing systems, code points change shape depending on where they occur in a word (positional forms). Some code points have matching shapes in some positions but not in others. In such cases, the variant mapping exists only for some positions or, more generally, only for some contexts; in all other contexts, the variant mapping does not exist.

For example, take two code points that have the same shape at the end of a label (or in final position) but not in any other position. In that case, they are variants only when they occur in the final position, something we indicate like this:

     final: C --> D
        

In cursively connected scripts, like Arabic, a code point may take its final form when next to any following code point that interrupts the cursive connection, not just at the end of a label. (We ignore the isolated form to keep the discussion simple; if included, "final" might be "final-or-isolate", for example.)

From symmetry, we expect that the mapping D --> C should also exist only when the code point D is in final position. (Similar considerations apply to transitivity.)

Sometimes a code point has a final form that is practically the same as that of one code point while sharing its initial and medial forms with a different code point:

     final: C --> D
     !final: C --> E
        

Here, the case where the condition is the opposite of final is shown as "!final".

Because shapes differ by position, when a context is applied to a variant mapping, it is treated independently from the same mapping in other contexts. This extends to the assignment of types. For example, the mapping C --> F may be "allocatable" in final position but "blocked" in any other context:

     final:  C  a--> F
     !final: C  x--> F
        

Now, the type assigned to the forward mapping is independent of the reverse symmetric mapping or any transitive mappings. Imagine a situation where the symmetric mapping is defined as F a--> C, that is, all mappings from F to C are "allocatable":

     final:  F  a--> C
     !final: F  a--> C
        

Why not simply write F a--> C? Because the forward mapping is divided by context. Adding a context makes the two forward variant mappings distinct, and that needs to be accounted for explicitly in the reverse mappings so that human and machine readers can easily verify symmetry and transitivity of the variant mappings in the LGR. (This is true even though the two opposite contexts of "final" and "!final" should together cover all possible cases.)
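The benefit of spelling out the reverse mappings per context shows up when symmetry is verified mechanically. The following Python sketch assumes a hypothetical Mapping record and function name; the context is stored as a plain string, with the empty string standing for "no context".

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    context: str  # "final", "!final", or "" for an unconditional mapping
    source: str
    type: str     # "a" (allocatable) or "x" (blocked)
    target: str


def missing_reverse_mappings(mappings):
    """List (context, source, target) triples whose reverse is not declared."""
    declared = {(m.context, m.source, m.target) for m in mappings}
    missing = {
        (m.context, m.target, m.source)
        for m in mappings
        if (m.context, m.target, m.source) not in declared
    }
    return sorted(missing)


FULLY_QUALIFIED = [
    Mapping("final", "C", "a", "F"),
    Mapping("!final", "C", "x", "F"),
    Mapping("final", "F", "a", "C"),
    Mapping("!final", "F", "a", "C"),
]
print(missing_reverse_mappings(FULLY_QUALIFIED))
```

With the reverse mapping written as a single unconditional F a--> C, the same check would report every mapping as lacking its counterpart, because the type is allowed to differ between directions but the context is compared literally.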

16. Making Conditional Variants Well Behaved

To ensure that an LGR with contextual variants is well behaved, it is best to always use "fully qualified" variant mappings, in which forward and reverse mappings always agree in the names of their context rules. It is also necessary to ensure that no label can match more than one context for the same mapping. Using mutually exclusive contexts, such as "final" and "!final", is an easy way to ensure that.

However, it is not always necessary to define dual or multiple contexts that together cover all possible cases. For example, here are two contexts that do not cover all possible positional contexts:

     final: C --> D
     initial: C --> D

A well-behaved LGR using these two contexts would define all symmetric and transitive mappings involving C, D, and their variants consistently in terms of the two conditions "final" and "initial" and ensure that both cannot be satisfied at the same time by some label.

In addition to never defining the same mapping with two contexts that may be satisfied by the same label, a well-behaved LGR never combines a variant mapping that has a context with the same variant mapping without a context:

     context: C --> D
     C --> D
        

Inadvertent mixing of conditional and unconditional variants can be detected and flagged by a parser, but verifying that two formally distinct contexts are never satisfied by the same label would depend on the interaction between labels and context rules, which means that it will be up to the LGR designer to ensure that the LGR is well behaved.

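The mechanical part of this check, flagging a mapping that is declared both with and without a context, might be sketched as follows. The function name and the tuple layout (context, source, target) are hypothetical; None stands for "no context". The sketch deliberately does not attempt the harder question of whether two distinct contexts can be satisfied by the same label.

```python
from collections import defaultdict


def find_mixed_mappings(mappings):
    """Return (source, target) pairs declared both with and without a context."""
    contexts = defaultdict(set)
    for context, source, target in mappings:
        contexts[(source, target)].add(context)
    return sorted(
        pair for pair, seen in contexts.items()
        if None in seen and len(seen) > 1
    )


MAPPINGS = [
    ("final", "C", "D"),
    (None, "C", "D"),      # unconditional duplicate of the mapping above
    ("final", "C", "E"),
    ("initial", "C", "E"),
]
print(find_mixed_mappings(MAPPINGS))
```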

A well-behaved LGR never assigns conditions on a reflexive variant, as that is effectively no different from having a context on the code point itself; the latter is preferred.

Finally, for symmetry to work as expected, the context must be defined such that it is satisfied for both the original code point in the context of the original label and for the variant code point in the variant label. In other words, the context should be "stable under variant substitution" anywhere in the label.

Positional contexts usually satisfy this last condition; for example, a code point that interrupts a cursive connection would likely share this property with any of its variants. However, as it is possible in principle to define other kinds of contexts, it is necessary to make sure that the LGR is well behaved in this aspect at the time the LGR is designed.

Due to the difficulty in verifying these constraints mechanically, it is essential that an LGR designer document the reasons why the LGR can be expected to meet them and the details of the techniques used to ensure that outcome. This information should be found in the description element of the LGR.

In summary, conditional contexts can be useful for some cases, but additional care must be taken to ensure that an LGR containing conditional contexts is well behaved. LGR designers would be well advised to avoid using conditional contexts and to prefer unconditional rules whenever practical, even though it will doubtlessly reduce the number of labels practically available.

17. Variants for Sequences

Variant mappings can be defined between sequences or between a code point and a sequence. For example, one might define a "blocked" variant between the sequence "rn" and the code point "m" because they are practically indistinguishable in common UI fonts.

Such variants are no different from variants defined between single code points, except if a sequence is defined such that there is a code point or shorter sequence that is a prefix (initial subsequence) and both it and the remainder are also part of the repertoire. In that case, it is possible to create duplicate variants with conflicting dispositions.

The following shows such an example resulting in conflicting reflexive variants:

     A a--> C
     AB x--> CD

where AB is a sequence with an initial subsequence of A. For example, B might be a combining code point used in sequence AB. If B only occurs in the sequence, there is no issue, but if B also occurs by itself, for example:

     B a--> D

then a label "AB" might correspond to either {A}{B}, that is, the two code points, or {AB}, the sequence, where the curly braces show the sequence boundaries as they would be applied during label validation and variant mapping.

A label AB would then generate the "allocatable" variant label {C}{D} and the "blocked" variant label {CD}, thus creating two variant labels with conflicting dispositions.

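The conflict can be reproduced with a small Python sketch that enumerates every segmentation of a label into repertoire members and collects the dispositions of the resulting variant labels. All names are hypothetical, single letters stand in for code points, and the simplified disposition rule (blocked if any mapping is of type "x", otherwise allocatable) is an assumption of the sketch. Only labels built entirely from variant mappings are generated.

```python
from itertools import product

REPERTOIRE = {"A", "B", "AB"}
# Member of the repertoire -> list of (type, target) variant mappings.
VARIANTS = {
    "A": [("a", "C")],
    "B": [("a", "D")],
    "AB": [("x", "CD")],
}


def segmentations(label, repertoire):
    """Yield every way to split label into members of the repertoire."""
    if not label:
        yield []
        return
    for end in range(1, len(label) + 1):
        head = label[:end]
        if head in repertoire:
            for rest in segmentations(label[end:], repertoire):
                yield [head] + rest


def variant_dispositions(label, repertoire, variants):
    """Map each variant label to the set of dispositions derived for it."""
    result = {}
    for segments in segmentations(label, repertoire):
        choices = [variants.get(segment, []) for segment in segments]
        for combo in product(*choices):
            text = "".join(target for _, target in combo)
            blocked = any(mtype == "x" for mtype, _ in combo)
            result.setdefault(text, set()).add(
                "blocked" if blocked else "allocatable"
            )
    return result


print(variant_dispositions("AB", REPERTOIRE, VARIANTS))
```

Removing B from the repertoire leaves {AB} as the only segmentation, and the variant label CD then has the single disposition "blocked".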

For the example of a blocked variant between "m" and "rn" (and vice versa), there is no issue as long as "r" and "n" do not have variant mappings of their own, so that there cannot be multiple variant labels for the same input. However, it is preferable to avoid ambiguities altogether where possible.

The easiest way to avoid an ambiguous segmentation into sequences is by never allowing both a sequence and all of its constituent parts simultaneously as independent parts of the repertoire, for example, by not defining B by itself as a member of the repertoire.

Sequences are often used for combining sequences that consist of a base character B followed by one or more combining marks C. By enumerating all sequences in which a certain combining mark is expected and by not listing the combining mark by itself in the LGR, the mark cannot occur outside of these specifically enumerated contexts. In cases where enumeration is not possible or practicable, other techniques can be used to prevent ambiguous segmentation, for example, a context rule on code points that disallows B preceding C in any label except as part of a predefined sequence or class of sequences. The details of such techniques are outside the scope of this document (see [RFC7940] for information on context rules for code points).

18. Corresponding XML Notation

The XML format defined in [RFC7940] corresponds fairly directly to the notation used for variant mappings in this document. (There is no notation in [RFC7940] for variant type sets.) In an LGR document, a simple member of a repertoire that does not have any variants is listed as:

   <char cp="nnnn" />
        

where nnnn is the [UNICODE] code point value in the standard uppercase hexadecimal notation padded to at least 4 digits and without leading "U+". For a code point sequence of length 2, the XML notation becomes:

   <char cp="uuuu vvvvv" />
        
Variant mappings are defined by nesting <var> elements inside the <char> element. For example, a variant relation of type "blocked"
        

     C x--> X

is expressed as

     <char cp="nnnn">
       <var cp="mmmm" type="blocked" />
     </char>
        

where "x-->" identifies a "blocked" type. (Other types include "a-->" for "allocatable", for example.) Here, nnnn and mmmm are the [UNICODE] code point values for C and X, respectively. Either C or X could be a code point sequence or a single code point.

A reflexive mapping is specified the same way, except that it always uses the same code point value for both the <char> and <var> elements, for example:

     X r-o--> X

would correspond to

   <char cp="nnnn"><var cp="nnnn" type="out-of-repertoire-var" /></char>
        

Multiple <var> elements may be nested inside a single <char> element, but their "cp" values must be distinct (unless attributes for context rules are present and the combination of "cp" value and context attributes is distinct).

     <char cp="nnnn">
       <var cp="kkkk" type="allocatable" />
       <var cp="mmmm" type="blocked" />
     </char>
        

A set of conditional variants like

     final: C  a--> K
     !final: C  x--> K
        

would correspond to

     <var cp="kkkk" when="final" type="allocatable" />
     <var cp="kkkk" not-when="final" type="blocked" />
        

where the string "final" references a name of a context rule. Context rules are defined in [RFC7940]; they conceptually correspond to regular expressions. The details of how to create and define these rules are outside the scope of this document. If the label matches the context defined in the rule, the variant mapping is valid and takes part in further processing. Otherwise, it is invalid and ignored. Using the "not-when" attribute inverts the sense of the match. The two attributes are mutually exclusive.

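How a processor might read such conditional <var> elements can be sketched in Python with the standard xml.etree.ElementTree module. The fragment is simplified for illustration: the code point values 0043 and 004B are placeholders for C and K, the XML namespace that a real LGR document declares is omitted, and the function names are hypothetical. Whether a context rule is satisfied is passed in as a set of rule names rather than evaluated.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<char cp="0043">
  <var cp="004B" when="final" type="allocatable" />
  <var cp="004B" not-when="final" type="blocked" />
</char>
"""


def check_var_elements(char_xml):
    """Check the constraints on <var> elements nested in one <char>."""
    problems = []
    seen = set()
    for var in ET.fromstring(char_xml.strip()).findall("var"):
        when, not_when = var.get("when"), var.get("not-when")
        if when is not None and not_when is not None:
            problems.append("when and not-when are mutually exclusive")
        key = (var.get("cp"), when, not_when)
        if key in seen:
            problems.append(f"duplicate mapping for cp {var.get('cp')}")
        seen.add(key)
    return problems


def applicable_variants(char_xml, satisfied_rules):
    """Return (cp, type) for each mapping whose context condition holds."""
    result = []
    for var in ET.fromstring(char_xml.strip()).findall("var"):
        when, not_when = var.get("when"), var.get("not-when")
        if when is not None and when not in satisfied_rules:
            continue  # required context not matched: mapping is ignored
        if not_when is not None and not_when in satisfied_rules:
            continue  # excluded context matched: mapping is ignored
        result.append((var.get("cp"), var.get("type")))
    return result


print(check_var_elements(FRAGMENT))
print(applicable_variants(FRAGMENT, {"final"}))
```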

A derivation of a variant label disposition

     if "only-variants" = "s" or "b" => allocatable
        

is expressed as

     <action disp="allocatable" only-variants="s b" />
        

Instead of using "if" and "else if", the <action> elements implicitly form a cascade, where the first action triggered defines the disposition of the label. The order of action elements is thus significant.

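The cascade behavior can be illustrated with a Python sketch that evaluates <action> elements in document order. This is a simplification under stated assumptions: the fragment omits the XML namespace of a real LGR document, only the "any-variant" and "only-variants" attributes are modeled, "only-variants" is reduced to a subset test on the variant type set, and the function name evaluate_actions is hypothetical. See [RFC7940] for the actual processing rules.

```python
import xml.etree.ElementTree as ET

ACTIONS_XML = """
<rules>
  <action disp="blocked" any-variant="x" />
  <action disp="allocatable" only-variants="s b" />
  <action disp="allocatable" only-variants="t b" />
  <action disp="blocked" any-variant="s t b" />
</rules>
"""


def evaluate_actions(actions_xml, variant_types):
    """Return the disposition of the first action that triggers, else None."""
    types = set(variant_types)
    for action in ET.fromstring(actions_xml.strip()).iter("action"):
        any_of = action.get("any-variant")
        only = action.get("only-variants")
        if any_of is not None and not (types & set(any_of.split())):
            continue  # no variant of a listed type is present
        if only is not None and not (types and types <= set(only.split())):
            continue  # some variant has a type outside the listed ones
        return action.get("disp")
    return None


print(evaluate_actions(ACTIONS_XML, ["s", "s", "b", "b"]))
print(evaluate_actions(ACTIONS_XML, ["s", "s", "t", "t"]))
```

Swapping the first and last <action> elements in the fragment would change the outcome for a label with types "x" and "s" only in which action triggers, not in the result, but moving an "only-variants" action after the final "any-variant" action would make it unreachable, which is why the order has to be reviewed carefully.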

For the full specification of the XML format, see [RFC7940].

19. IANA Considerations

This document does not require any IANA actions.

20. Security Considerations

As described in [RFC7940], variants may be used as a tool to reduce certain avenues of attack in security-relevant identifiers by allowing certain labels to be "mutually exclusive or registered only to the same user". However, if indiscriminately designed, variants may themselves contribute to risks to the security or usability of the identifiers, whether resulting from an ambiguous definition or from allowing too many allocatable variants per label.

The information in this document is intended to allow the reader to design a specification of an LGR that is "well behaved" with respect to variants; as used here, this term refers to an LGR that is predictable in its effects to the LGR author (and reviewer) and more reliable in its implementation.

A well-behaved LGR is not merely one that can be expressed in [RFC7940], but, in addition, it actively avoids certain edge cases not prevented by the schema, such as those that would result in ambiguities in the specification of the intended disposition for some variant labels. By applying the additional considerations introduced in this document, including adding certain declarations that are optional under the schema and may not alter the results of processing a label, such an LGR becomes easier to review and its implementations easier to verify.

It should be noted that variants are an important part, but only a part, of an LGR design. There are many other features of an LGR that this document does not touch upon. Also, the question of whether to define variants at all, or what labels are to be considered variants of each other, is not addressed here.

21. References
21.1. Normative References

[RFC7940] Davies, K. and A. Freytag, "Representing Label Generation Rulesets Using XML", RFC 7940, DOI 10.17487/RFC7940, August 2016, <https://www.rfc-editor.org/info/rfc7940>.

21.2. Informative References

[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, <https://www.rfc-editor.org/info/rfc1034>.

[RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, November 1987, <https://www.rfc-editor.org/info/rfc1035>.

[RFC5890] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", RFC 5890, DOI 10.17487/RFC5890, August 2010, <https://www.rfc-editor.org/info/rfc5890>.

[RFC5891] Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, DOI 10.17487/RFC5891, August 2010, <https://www.rfc-editor.org/info/rfc5891>.

[RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)", RFC 5892, DOI 10.17487/RFC5892, August 2010, <https://www.rfc-editor.org/info/rfc5892>.

[RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010, <https://www.rfc-editor.org/info/rfc5893>.

[RFC5894] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, <https://www.rfc-editor.org/info/rfc5894>.

[UNICODE] The Unicode Consortium, "The Unicode Standard", <http://www.unicode.org/versions/latest/>.

Acknowledgments

Contributions that have shaped this document have been provided by Marc Blanchet, Ben Campbell, Patrik Faltstrom, Scott Hollenbeck, Mirja Kuehlewind, Sarmad Hussain, John Klensin, Alexey Melnikov, Nicholas Ostler, Michel Suignard, Andrew Sullivan, Wil Tan, and Suzanne Woolf.

Author's Address

Asmus Freytag

   Email: asmus@unicode.org
        