Internet Engineering Task Force (IETF)                         K. Davies
Request for Comments: 7940                                         ICANN
Category: Standards Track                                     A. Freytag
ISSN: 2070-1721                                              ASMUS, Inc.
                                                             August 2016
        

Representing Label Generation Rulesets Using XML

Abstract

This document describes a method of representing rules for validating identifier labels and alternate representations of those labels using Extensible Markup Language (XML). These policies, known as "Label Generation Rulesets" (LGRs), are used for the implementation of Internationalized Domain Names (IDNs), for example. The rulesets are used to implement and share that aspect of policy defining which labels and Unicode code points are permitted for registrations, which alternative code points are considered variants, and what actions may be performed on labels containing those variants.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7940.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Design Goals
   3. Normative Language
   4. LGR Format
      4.1. Namespace
      4.2. Basic Structure
      4.3. Metadata
           4.3.1. The "version" Element
           4.3.2. The "date" Element
           4.3.3. The "language" Element
           4.3.4. The "scope" Element
           4.3.5. The "description" Element
           4.3.6. The "validity-start" and "validity-end" Elements
           4.3.7. The "unicode-version" Element
           4.3.8. The "references" Element
   5. Code Points and Variants
      5.1. Sequences
      5.2. Conditional Contexts
      5.3. Variants
           5.3.1. Basic Variants
           5.3.2. The "type" Attribute
           5.3.3. Null Variants
           5.3.4. Variants with Reflexive Mapping
           5.3.5. Conditional Variants
      5.4. Annotations
           5.4.1. The "ref" Attribute
           5.4.2. The "comment" Attribute
      5.5. Code Point Tagging
   6. Whole Label and Context Evaluation
      6.1. Basic Concepts
      6.2. Character Classes
           6.2.1. Declaring and Invoking Named Classes
           6.2.2. Tag-Based Classes
           6.2.3. Unicode Property-Based Classes
           6.2.4. Explicitly Declared Classes
           6.2.5. Combined Classes
      6.3. Whole Label and Context Rules
           6.3.1. The "rule" Element
           6.3.2. The Match Operators
           6.3.3. The "count" Attribute
           6.3.4. The "name" and "by-ref" Attributes
           6.3.5. The "choice" Element
           6.3.6. Literal Code Point Sequences
           6.3.7. The "any" Element
           6.3.8. The "start" and "end" Elements
           6.3.9. Example Context Rule from IDNA Specification
      6.4. Parameterized Context or When Rules
           6.4.1. The "anchor" Element
           6.4.2. The "look-behind" and "look-ahead" Elements
           6.4.3. Omitting the "anchor" Element
   7. The "action" Element
      7.1. The "match" and "not-match" Attributes
      7.2. Actions with Variant Type Triggers
           7.2.1. The "any-variant", "all-variants", and "only-variants" Attributes
           7.2.2. Example from Tables in the Style of RFC 3743
      7.3. Recommended Disposition Values
      7.4. Precedence
      7.5. Implied Actions
      7.6. Default Actions
   8. Processing a Label against an LGR
      8.1. Determining Eligibility for a Label
           8.1.1. Determining Eligibility Using Reflexive Variant Mappings
      8.2. Determining Variants for a Label
      8.3. Determining a Disposition for a Label or Variant Label
      8.4. Duplicate Variant Labels
      8.5. Checking Labels for Collision
   9. Conversion to and from Other Formats
   10. Media Type
   11. IANA Considerations
      11.1. Media Type Registration
      11.2. URN Registration
      11.3. Disposition Registry
   12. Security Considerations
      12.1. LGRs Are Only a Partial Remedy for Problem Space
      12.2. Computational Expense of Complex Tables
   13. References
      13.1. Normative References
      13.2. Informative References
   Appendix A. Example Tables
   Appendix B. How to Translate Tables Based on RFC 3743 into the XML Format
   Appendix C. Indic Syllable Structure Example
      C.1. Reducing Complexity
   Appendix D. RELAX NG Compact Schema
   Acknowledgements
   Authors' Addresses
        
1. Introduction

This document specifies a method of using Extensible Markup Language (XML) to describe Label Generation Rulesets (LGRs). LGRs are algorithms used to determine whether, and under what conditions, a given identifier label is permitted, based on the code points it contains and their context. These algorithms comprise a list of permissible code points, variant code point mappings, and a set of rules that act on the code points and mappings. LGRs form part of an administrator's policies. In deploying Internationalized Domain Names (IDNs), they have also been known as IDN tables or variant tables.

There are other kinds of policies relating to labels that are not normally covered by LGRs and are therefore not necessarily representable by the XML format described here. These include, but are not limited to, policies around trademarks, or prohibition of fraudulent or objectionable words.

Administrators of the zones for top-level domain registries have historically published their LGRs using ASCII text or HTML. The formatting of these documents has been loosely based on the format used for the Language Variant Table described in [RFC3743]. [RFC4290] also provides a "model table format" that describes a similar set of functionality. Common to these formats is that the algorithms used to evaluate the data therein are implicit or specified elsewhere.

Through the first decade of IDN deployment, experience has shown that LGRs derived from these formats are difficult to consistently implement and compare, due to their differing formats. A universal format, such as one using a structured XML format, will assist by improving machine readability, consistency, reusability, and maintainability of LGRs.

When used to represent a simple list of permitted code points, the format is quite straightforward. At the cost of some complexity in the resulting file, it also allows for an implementation of more sophisticated handling of conditional variants that reflects the known requirements of current zone administrator policies.

Another feature of this format is that it allows many of the algorithms to be made explicit and machine implementable. A remaining small set of implicit algorithms is described in this document to allow commonality in implementation.

While the predominant usage of this specification is to represent IDN label policy, the format is not limited to IDN usage and may also be used for describing ASCII domain name label rulesets, or other types of identifier labels beyond those used for domain names.

2. Design Goals

The following goals informed the design of this format:

o The format needs to be implementable in a reasonably straightforward manner in software.

o The format should be able to be automatically checked for formatting errors, so that common mistakes can be caught.

o An LGR needs to be able to express the set of valid code points that are allowed for registration under a specific administrator's policies.

o An LGR needs to be able to express computed alternatives to a given identifier based on mapping relationships between code points, whether one-to-one or many-to-many. These computed alternatives are commonly known as "variants".

o Variant code points should be able to be tagged with explicit dispositions or categories that can be used to support registry policy (such as whether to allocate the computed variant or to merely block it from usage or registration).

o Variants and code points must be able to be stipulated based on contextual information. For example, some variants may only be applicable when they follow a certain code point or when the code point is displayed in a specific presentation form.

o The data contained within an LGR must be able to be interpreted unambiguously, so that independent implementations that utilize the contents will arrive at the same results.

o To the largest extent possible, policy rules should be able to be specified in the XML format without relying on hidden or built-in algorithms in implementations.

o LGRs should be suitable for comparison and reuse, such that one could easily compare the contents of two or more to see the differences, to merge them, and so on.

o As many existing IDN tables as practicable should be able to be migrated to the LGR format with all applicable interpretation logic retained.

These requirements are partly derived from reviewing the existing corpus of published IDN tables, plus the requirements of ICANN's work to implement an LGR for the DNS root zone [LGR-PROCEDURE]. In particular, Section B of that document identifies five specific requirements for an LGR methodology.

The syntax and rules in [RFC5892] and [RFC3743] were also reviewed.

It is explicitly not the goal of this format to stipulate what code points should be listed in an LGR by a zone administrator. Which registration policies are used for a particular zone are outside the scope of this memo.

3. Normative Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

4. LGR Format

An LGR is expressed as a well-formed XML document [XML] that conforms to the schema defined in Appendix D.

As XML is case sensitive, an LGR must be authored with the correct casing. For example, the XML element names MUST be in lowercase as described in this specification, and matching of attribute values is only performed in a case-sensitive manner.

A document that is not well-formed, is non-conforming, or violates other constraints specified in this specification MUST be rejected.

4.1. Namespace

The XML Namespace URI is "urn:ietf:params:xml:ns:lgr-1.0".

See Section 11.2 for more information.

4.2. Basic Structure

The basic XML framework of the document is as follows:

       <?xml version="1.0"?>
       <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
           ...
       </lgr>
        
The "lgr" element contains up to three sub-elements or sections. First is an optional "meta" element that contains all metadata associated with the LGR, such as its authorship, what it is used for, implementation notes, and references. This is followed by a required "data" element that contains the substantive code point data. Finally, an optional "rules" element contains information on rules for evaluating labels, if any, along with "action" elements providing for the disposition of labels and computed variant labels.

       <?xml version="1.0"?>
       <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
           <meta>
               ...
           </meta>
           <data>
               ...
           </data>
           <rules>
               ...
           </rules>
       </lgr>
        
A document MUST contain exactly one "lgr" element. Each "lgr" element MUST contain zero or one "meta" element, exactly one "data" element, and zero or one "rules" element; and these three elements MUST be in that order.
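
The ordering and cardinality constraints above can be checked mechanically. The following Python sketch is illustrative only: the function name check_basic_structure is hypothetical, the check covers only the top-level layout, and it is not a substitute for validation against the schema in Appendix D.

```python
import xml.etree.ElementTree as ET

LGR_NS = "urn:ietf:params:xml:ns:lgr-1.0"


def check_basic_structure(xml_text: str) -> None:
    """Raise ValueError if the top-level layout of an LGR is violated."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A document that is not well-formed is rejected outright.
        raise ValueError(f"not well-formed: {exc}") from exc

    # ElementTree spells a namespaced name as "{namespace-uri}local-name".
    if root.tag != f"{{{LGR_NS}}}lgr":
        raise ValueError("root element must be 'lgr' in the LGR namespace")

    sections = [f"{{{LGR_NS}}}{name}" for name in ("meta", "data", "rules")]
    children = [child.tag for child in root]

    if f"{{{LGR_NS}}}data" not in children:
        raise ValueError("exactly one 'data' element is required")

    # Keeping only the sections that occur, in the mandated order, must
    # reproduce the child list exactly; otherwise a section is repeated,
    # out of order, or an unexpected element is present.
    expected = [tag for tag in sections if tag in children]
    if children != expected:
        raise ValueError("children must be [meta], data, [rules], in that order")
```

Because XML names are case sensitive, a root element spelled "LGR" fails the first comparison, which matches the lowercase requirement stated in Section 4.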

Some elements that are direct or nested child elements of the "rules" element MUST be placed in a specific relative order to other elements for the LGR to be valid. An LGR that violates these constraints MUST be rejected. In other cases, changing the ordering would result in a valid, but different, specification.

In the following descriptions, required, non-repeating elements or attributes are generally not called out explicitly, in contrast to "OPTIONAL" ones, or those that "MAY" be repeated. For attributes that take lists as values, the elements MUST be space-separated.

4.3. Metadata

The "meta" element expresses metadata associated with the LGR, and the element SHOULD be included so that the associated metadata are available as part of the LGR and cannot become disassociated. The following subsections describe elements that may appear within the "meta" element.

The "meta" element can be used to identify the author or relevant contact person, explain the intended usage of the LGR, and provide implementation notes as well as references. Detailed metadata allow the LGR document to become self-documenting -- for example, if rendered in a human-readable format by an appropriate tool.

Providing metadata pertaining to the date and version of the LGR is particularly encouraged to make it easier for interoperating consumers to ensure that they are using the correct LGR.

With the exception of the "unicode-version" element, the data contained within is not required by software consuming the LGR in order to calculate valid labels or to calculate variants. If present, the "unicode-version" element MUST be used by a consumer of the table to identify that it has the correct Unicode property data to perform operations on the table. This ensures that possible differences in code point properties between editions of the Unicode Standard do not impact the product of calculations utilizing an LGR.
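
As an illustration of the "unicode-version" check, the following Python sketch compares the declared version with the Unicode data bundled with the Python interpreter. It rests on assumptions that are not part of this specification: the function name is hypothetical, the interpreter's unicodedata module is taken as the consumer's only source of property data, and strict string equality is used as the test for "correct" data.

```python
import unicodedata
import xml.etree.ElementTree as ET

NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}


def unicode_version_matches(xml_text: str) -> bool:
    """Report whether the local Unicode data matches the LGR's declaration."""
    root = ET.fromstring(xml_text)
    declared = root.findtext("lgr:meta/lgr:unicode-version", namespaces=NS)
    if declared is None:
        # Assumption: with no declared version there is nothing to compare.
        return True
    # unicodedata.unidata_version is a string such as "15.0.0".
    return declared.strip() == unicodedata.unidata_version
```

A consumer that carries property data for several Unicode versions would instead select the matching data set; this sketch only detects a mismatch.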

4.3.1. The "version" Element

The "version" element is OPTIONAL. It is used to uniquely identify each version of the LGR. No specific format is required, but it is RECOMMENDED that it be the decimal representation of a single positive integer, which is incremented with each revision of the file.

An example of a typical first edition of a document:

       <version>1</version>
        
The "version" element may have an OPTIONAL "comment" attribute.

       <version comment="draft">1</version>
        
4.3.2. The "date" Element

The OPTIONAL "date" element is used to identify the date the LGR was posted. The contents of this element MUST be a valid ISO 8601 "full-date" string as described in [RFC3339].

Example of a date:

       <date>2009-11-01</date>
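
A consumer may want to verify the "full-date" form before using the value. The Python sketch below is one possible check; the function name parse_full_date is hypothetical, and the input is assumed to be the element content with no surrounding whitespace.

```python
import datetime
import re

# RFC 3339 "full-date": four-digit year, two-digit month, two-digit day.
_FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_full_date(text: str) -> datetime.date:
    """Parse the content of a "date" element, raising ValueError if invalid."""
    match = _FULL_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DD form: {text!r}")
    year, month, day = (int(group) for group in match.groups())
    # datetime.date rejects impossible calendar dates such as 2009-02-30.
    return datetime.date(year, month, day)
```

The explicit pattern is used because a general-purpose ISO 8601 parser may accept forms, such as the basic format without hyphens, that are not a "full-date".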
        
4.3.3. The "language" Element

Each OPTIONAL "language" element identifies a language or script for which the LGR is intended. The value of the "language" element MUST be a valid language tag as described in [RFC5646]. The tag may refer to a script plus undefined language if the LGR is not intended for a specific language.

Example of an LGR for the English language:

       <language>en</language>
        
If the LGR applies to a script rather than a specific language, the "und" language tag SHOULD be used followed by the relevant script subtag from [RFC5646]. For example, for a Cyrillic script LGR:

       <language>und-Cyrl</language>
        
If the LGR covers a set of multiple languages or scripts, the "language" element MAY be repeated. However, for cases of a script-specific LGR exhibiting insignificant admixture of code points from other scripts, it is RECOMMENDED to use a single "language" element identifying the predominant script. In the exceptional case of a multi-script LGR where no script is predominant, use Zyyy (Common):

       <language>und-Zyyy</language>
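
To show how a consumer might tell a language-specific LGR from a script-specific one, the following Python sketch splits a tag into its primary language and script subtags. It is deliberately simplified and is not a full [RFC5646] validator: the function name is hypothetical, and extended language subtags, private-use tags, and grandfathered tags are not handled.

```python
import re
from typing import Optional, Tuple

# Simplified shape: a 2-3 letter primary language subtag, an optional
# 4-letter script subtag, then any further subtags (ignored here).
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-[A-Za-z0-9]+)*"
)


def language_and_script(tag: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (language, script); language is None for the "und" tag."""
    match = _TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed language tag: {tag!r}")
    language = match.group("language").lower()
    script = match.group("script")
    return (
        None if language == "und" else language,
        script.title() if script else None,
    )
```

With this helper, "en" yields a language and no script, while "und-Cyrl" and "und-Zyyy" yield a script and no language.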
        
4.3.4. The "scope" Element

This OPTIONAL element refers to a scope, such as a domain, to which this policy is applied. The "type" attribute specifies the type of scope being defined. A type of "domain" means that the scope is a domain that represents the apex of the DNS zone to which the LGR is applied. For that type, the content of the "scope" element MUST be a domain name written relative to the root zone, in presentation format with no trailing dot. However, in the unique case of the DNS root zone, it is represented as ".".

       <scope type="domain">example.com</scope>
        
There may be multiple "scope" tags used -- for example, to reflect a list of domains to which the LGR is applied.
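
The rules for scopes of type "domain" can be sketched in Python as follows. The function name domain_scopes is hypothetical, and the sketch checks only the trailing-dot rule and the root zone exception; it does not otherwise validate the domain name.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}


def domain_scopes(xml_text: str) -> List[str]:
    """Collect the content of every "scope" element of type "domain"."""
    root = ET.fromstring(xml_text)
    scopes = []
    # Only "scope" elements in the LGR namespace are matched, so scopes
    # defined in an alternate namespace are skipped.
    for scope in root.findall("lgr:meta/lgr:scope", NS):
        if scope.get("type") != "domain":
            continue
        name = (scope.text or "").strip()
        # The root zone is written as "."; any other name has no trailing dot.
        if name != "." and (not name or name.endswith(".")):
            raise ValueError(f"invalid domain scope: {name!r}")
        scopes.append(name)
    return scopes
```

Returning a list reflects that the "scope" element may be repeated.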

No other values of the "type" attribute are defined by this specification; however, this specification can be used for applications other than domain names. Implementers of LGRs for applications other than domain names SHOULD define the scope extension grammar in an IETF specification or use XML namespaces to distinguish their scoping mechanism distinctly from the base LGR namespace. An explanation of any custom usage of the scope in the "description" element is RECOMMENDED.

       <scope xmlns="http://example.com/ns/scope/1.0">
           ... content per alternate namespace ...
       </scope>
        
4.3.5. The "description" Element

The "description" element is an OPTIONAL, free-form element that contains any additional description that helps the user interpret the LGR. Typically, this field contains authorship information, as well as additional context on how the LGR was formulated and how it applies, such as citations and references that apply to the LGR as a whole.

This field should not be relied upon for providing instructions on how to parse or utilize the data contained elsewhere in the specification. Authors of tables should expect that software applications that parse and use LGRs will not use the "description" element to condition the application of the LGR's data and rules.

The element has an OPTIONAL "type" attribute, which refers to the Internet media type [RFC2045] of the enclosed data. Typical types would be "text/plain" or "text/html". The attribute SHOULD be a valid media type. If supplied, it will be assumed that the contents are of that media type. If the description lacks a "type" value, it will be assumed to be plain text ("text/plain").

4.3.6. The "validity-start" and "validity-end" Elements

The "validity-start" and "validity-end" elements are OPTIONAL elements that give, respectively, the date from which the contents of the LGR become valid (are used in registry policy) and the date on which the contents of the LGR cease to be used.

The dates MUST conform to the "full-date" format described in Section 5.6 of [RFC3339].

       <validity-start>2014-03-12</validity-start>
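
A minimal Python sketch of how a consumer might check these dates follows. The function names are hypothetical, and treating both the start and end dates as inclusive is an assumption of this example; the text above does not settle that question.

```python
import re
from datetime import date

# "full-date" is exactly YYYY-MM-DD, using ASCII digits only.
_FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

def parse_full_date(text):
    """Parse a "full-date" string into a date, or raise ValueError."""
    match = _FULL_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a full-date: {text!r}")
    year, month, day = (int(group) for group in match.groups())
    # date() itself rejects impossible dates such as 2014-02-30.
    return date(year, month, day)

def is_in_validity_window(today, start=None, end=None):
    """Assumption: both bounds are inclusive; a missing bound is open."""
    if start is not None and today < parse_full_date(start):
        return False
    if end is not None and today > parse_full_date(end):
        return False
    return True
```

An explicit pattern is used because general-purpose date parsers tend to accept looser spellings (for example, a single-digit month) that "full-date" does not allow.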
        
4.3.7. The "unicode-version" Element

Whenever an LGR depends on character properties from a given version of the Unicode Standard, the version number used in creating the LGR MUST be listed in the form x.y.z, where x, y, and z are non-negative decimal integers (see [Unicode-Versions]). If any software processing the table does not have access to character property data of the requisite version, it MUST NOT perform any operations relating to whole-label evaluation relying on Unicode character properties (Section 6.2.3).

The value of a given Unicode character property may change between versions of the Unicode Character Database [UAX44], unless such change has been explicitly disallowed in [Unicode-Stability]. It is RECOMMENDED to only reference properties defined as stable or immutable. As an alternative to referencing the property, the information can be presented explicitly in the LGR.

       <unicode-version>6.3.0</unicode-version>
        
An LGR that does not make use of Unicode character properties need not include a "unicode-version" element; however, including one is RECOMMENDED.
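
The version check described in this section might be sketched in Python as follows. The function names are hypothetical. The sketch assumes the processor's only source of character property data is Python's built-in "unicodedata" module, whose version is reported by "unicodedata.unidata_version".

```python
import re
import unicodedata

def parse_unicode_version(text):
    """Parse "x.y.z" into a tuple of three integers, or raise ValueError."""
    if re.fullmatch(r"[0-9]+\.[0-9]+\.[0-9]+", text) is None:
        raise ValueError(f"not of the form x.y.z: {text!r}")
    return tuple(int(part) for part in text.split("."))

def may_evaluate_property_rules(required,
                                available=unicodedata.unidata_version):
    """True only if the property data at hand is of the required version."""
    return parse_unicode_version(required) == parse_unicode_version(available)
```

A processor for which this returns False would have to skip whole-label evaluation that relies on Unicode character properties, or obtain property data of the requisite version by other means.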

4.3.8. The "references" Element

An LGR may define a list of references that are used to associate various individual elements in the LGR to one or more normative references. A common use for references is to annotate that code points belong to an externally defined collection or standard or to give normative references for rules.

References are specified in an OPTIONAL "references" element containing one or more "reference" elements, each with a unique "id" attribute. It is RECOMMENDED that the "id" attribute be a zero-based integer; however, in addition to digits 0-9, it MAY contain uppercase letters A-Z, as well as a period, hyphen, colon, or underscore. The value of each "reference" element SHOULD be the citation of a standard, dictionary, or other specification in any suitable format. In addition to an "id" attribute, a "reference" element MAY have a "comment" attribute for an optional free-form annotation.

       <references>
         <reference id="0">The Unicode Consortium.  The Unicode
           Standard, Version 8.0.0, (Mountain View, CA: The Unicode
           Consortium, 2015.  ISBN 978-1-936213-10-8)
           http://www.unicode.org/versions/Unicode8.0.0/</reference>
         <reference id="1">Big-5: Computer Chinese Glyph and Character
            Code Mapping Table, Technical Report C-26, 1984</reference>
         <reference id="2" comment="synchronized with Unicode 6.1">
            ISO/IEC
            10646:2012 3rd edition</reference>
         ...
       </references>
       ...
       <data>
         <char cp="0620" ref="0 2" />
         ...
       </data>
        
A reference is associated with an element by using its id as part of an optional "ref" attribute (see Section 5.4.1). The "ref" attribute may be used with many kinds of elements in the "data" or "rules" sections of the LGR, most notably those defining code points, variants, and rules. However, a "ref" attribute may not occur in certain kinds of elements, including references to named character classes or rules. See below for the description of these elements.
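
As an illustration, the following Python sketch checks that every id cited in a "ref" attribute is declared by a "reference" element and that no id is declared twice. The sample document and the function name are invented for this example, and the XML namespace that a real LGR declares is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace-free fragment for brevity; a real LGR declares an XML namespace.
SAMPLE = """<lgr>
  <meta>
    <references>
      <reference id="0">The Unicode Standard, Version 8.0.0</reference>
      <reference id="2">ISO/IEC 10646:2012 3rd edition</reference>
    </references>
  </meta>
  <data>
    <char cp="0620" ref="0 2" />
    <char cp="0621" ref="7" />
  </data>
</lgr>"""

def check_references(xml_text):
    """Return a list of problems: duplicate ids and unresolved "ref" tokens."""
    root = ET.fromstring(xml_text)
    problems = []
    defined = set()
    for reference in root.iter("reference"):
        ref_id = reference.get("id")
        if ref_id in defined:
            problems.append(f"duplicate reference id {ref_id!r}")
        defined.add(ref_id)
    for element in root.iter():
        # A "ref" attribute holds one or more space-separated reference ids.
        for token in element.get("ref", "").split():
            if token not in defined:
                problems.append(f"{element.tag} cites undefined id {token!r}")
    return problems
```

For the sample above, the only problem reported is the citation of the undeclared id "7".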

5. Code Points and Variants

The bulk of an LGR is a description of which set of code points is eligible for a given label. For rulesets that perform operations that result in potential variants, the code point-level relationships between variants need to also be described.

The code point data is collected within the "data" element. Within this element, a series of "char" and "range" elements describe eligible code points or ranges of code points, respectively. Collectively, these are known as the repertoire.

Discrete permissible code points or code point sequences (see Section 5.1) are declared with a "char" element. Here is a minimal example declaration for a single code point, with the code point value given in the "cp" attribute:

       <char cp="002D"/>
        
As described below, a full declaration for a "char" element, whether it is used for a single code point or for a sequence (see Section 5.1), may have optional child elements defining variants. Both the "char" and "range" elements can take a number of optional attributes for conditional inclusion, commenting, cross-referencing, and character tagging, as described below.

Ranges of permissible code points may be declared with a "range" element, as in this minimal example:

       <range first-cp="0030" last-cp="0039"/>
        
The range is inclusive of the first and last code points. Any additional attributes defined for a "range" element act as if applied to each code point within. A "range" element has no child elements.

It is always possible to substitute a list of individually specified code points for a "range" element. The reverse is not necessarily the case. Whenever such a substitution is possible, it makes no difference in processing the data. Tools reading or writing the LGR format are free to aggregate sequences of consecutive code points of the same properties into "range" elements.
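
The aggregation mentioned above might look like the following Python sketch. It assumes that all code points passed in carry identical attributes and define no variants, since a "range" element has no child elements. Both function names are hypothetical.

```python
def aggregate(code_points):
    """Collapse code points into inclusive (first, last) runs."""
    runs = []
    for cp in sorted(set(code_points)):
        if runs and cp == runs[-1][1] + 1:
            # Extend the current run by one code point.
            runs[-1] = (runs[-1][0], cp)
        else:
            runs.append((cp, cp))
    return runs

def to_xml(runs):
    """Emit a "char" element for single code points, else a "range"."""
    lines = []
    for first, last in runs:
        if first == last:
            lines.append(f'<char cp="{first:04X}"/>')
        else:
            lines.append(
                f'<range first-cp="{first:04X}" last-cp="{last:04X}"/>')
    return lines
```

Applied to U+002D and the digits U+0030 through U+0039, this yields the two minimal declarations shown earlier in this section.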

Code points MUST be represented according to the standard Unicode convention but without the prefix "U+": they are expressed in uppercase hexadecimal and are zero-padded to a minimum of 4 digits.
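
A small Python sketch of reading and writing this notation follows. The function names are hypothetical. The upper limit of six hexadecimal digits is an assumption derived from the Unicode code space (maximum U+10FFFF). The sketch does not reject superfluous leading zeros such as "00002D".

```python
import re

# Uppercase hexadecimal, at least 4 digits; 6 digits cover up to 10FFFF.
_CODE_POINT = re.compile(r"[0-9A-F]{4,6}")

def parse_cp(value):
    """Parse a "cp" attribute into a tuple of integers.

    An empty attribute value yields an empty tuple.
    """
    tokens = value.split(" ") if value else []
    result = []
    for token in tokens:
        if _CODE_POINT.fullmatch(token) is None:
            raise ValueError(f"malformed code point: {token!r}")
        code_point = int(token, 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"beyond the Unicode code space: {token!r}")
        result.append(code_point)
    return tuple(result)

def format_cp(code_points):
    """Format integers as uppercase hex, zero-padded to at least 4 digits."""
    return " ".join(f"{cp:04X}" for cp in code_points)
```

For example, parse_cp("006C 00B7 006C") returns three integers, while "2d" and "U+002D" are both rejected.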

The rationale for not allowing other encoding formats, including native Unicode encoding in XML, is explored in [UAX42]. The XML conventions used in this format, such as element and attribute names, mirror those of [UAX42] where practical and reasonable to do so. It is RECOMMENDED to list all "char" elements in ascending order of the "cp" attribute. Not doing so makes it unnecessarily difficult for authors and reviewers to check for errors, such as duplications, or to review and compare against listings of code points in other documents and specifications.

All "char" elements in the "data" section MUST have distinct "cp" attributes. The "range" elements MUST NOT specify code point ranges that overlap either another range or any single code point "char" element. An LGR that defines the same code point more than once by any combination of "char" or "range" elements MUST be rejected.
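
One way to detect such conflicts is sketched below in Python. The function name and the input shapes are assumptions of this example. Only single code points and ranges are considered; sequences are not expanded, since a sequence may coexist with its constituent code points.

```python
def find_conflicts(chars, ranges):
    """Return pairs of overlapping declarations.

    chars:  iterable of single code points (integers)
    ranges: iterable of inclusive (first, last) tuples
    """
    intervals = sorted([(cp, cp) for cp in chars] + list(ranges))
    conflicts = []
    widest = None  # interval reaching furthest to the right so far
    for first, last in intervals:
        if first > last:
            raise ValueError(f"range {first:04X}..{last:04X} is reversed")
        if widest is not None and first <= widest[1]:
            conflicts.append((widest, (first, last)))
        if widest is None or last > widest[1]:
            widest = (first, last)
    return conflicts
```

A processor following the rule above would reject the LGR whenever the returned list is non-empty.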

5.1. Sequences

A sequence of two or more code points may be specified in an LGR -- for example, when defining the source for n:m variant mappings. Another use of sequences would be in cases when the exact sequence of code points is required to occur in order for the constituent elements to be eligible, such as when some code point is only eligible when preceded or followed by a certain code point. The following would define the eligibility of the MIDDLE DOT (U+00B7) only when both preceded and followed by the LATIN SMALL LETTER L (U+006C):

       <char cp="006C 00B7 006C" comment="Catalan middle dot"/>
        
All sequences defined this way must be distinct, but sub-sequences may be defined. Thus, the sequence defined here may coexist with single code point definitions such as:

       <char cp="006C" />
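
To show how sequences and single code points can coexist in a repertoire, the following Python sketch tests whether a label can be split entirely into repertoire entries. It is a simplified illustration under assumed names; the sample repertoire adds U+0061 for demonstration only, and the normative processing steps are described later in this document.

```python
# Entries are tuples of code points; a one-element tuple is a single
# code point, and longer tuples are sequences.
REPERTOIRE = {(0x0061,), (0x006C,), (0x006C, 0x00B7, 0x006C)}

def is_covered(label, repertoire):
    """True if the label (a tuple of code points) can be segmented
    completely into entries of the repertoire."""
    length = len(label)
    # reachable[i] is True if label[:i] can be fully segmented.
    reachable = [True] + [False] * length
    for start in range(length):
        if not reachable[start]:
            continue
        for entry in repertoire:
            if label[start:start + len(entry)] == entry:
                reachable[start + len(entry)] = True
    return reachable[length]
```

Here U+00B7 is accepted only between two instances of U+006C, because it appears in no other entry.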
        
As an alternative to using sequences to define a required context, a "char" or "range" element may specify a conditional context using an optional "when" attribute as described below in Section 5.2. Using a conditional context is more flexible because a context is not limited to a specific sequence of code points. In addition, using a context allows the choice of specifying either a prohibited or a required context.

5.2. Conditional Contexts

A conditional context is specified by a rule that must be satisfied (or, alternatively, must not be satisfied) for a code point in a given label, often at a particular location in a label.

To specify a conditional context, either a "when" or "not-when" attribute may be used. The value of each "when" or "not-when" attribute is a context rule as described below in Section 6.3. This rule can be a rule evaluating the whole label or a parameterized context rule. The context condition is met when the rule specified in the "when" attribute is matched or when the rule in the "not-when" attribute fails to match. It is an error to reference a rule that is not actually defined in the "rules" element.

A parameterized context rule (see Section 6.4) defines the context immediately surrounding a given code point; unlike a sequence, the context is not limited to a specific fixed code point but, for example, may designate any member of a certain character class or a code point that has a certain Unicode character property.

Given a suitable definition of a parameterized context rule named "follows-virama", this example specifies that a ZERO WIDTH JOINER (U+200D) is restricted to immediately follow any of several code points classified as virama:

       <char cp="200D" when="follows-virama" />
        
For a complete example, see Appendix A.

In contrast, a whole label rule (see Section 6.3) specifies a condition to be met by the entire label -- for example, that it must contain at least one code point from a given script anywhere in the label. In the following example, no digit from either range may occur in a label that mixes digits from both ranges:

       <data>
          <range first-cp="0660" last-cp="0669" not-when="mixed-digits"
                 tag="arabic-indic-digits" />
          <range first-cp="06F0" last-cp="06F9" not-when="mixed-digits"
                 tag="extended-arabic-indic-digits" />
       </data>
        
(See Section 6.3.9 for an example of the "mixed-digits" rule.)
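
The effect of a "when" or "not-when" attribute that names a whole label rule can be modeled in Python as below. The predicate "mixed_digits" is a stand-in written for this example, not the rule definition given in this document. Rules are modeled as plain functions over the label, which does not cover parameterized context rules that depend on a position.

```python
ARABIC_INDIC_DIGITS = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC_DIGITS = range(0x06F0, 0x06FA)  # U+06F0..U+06F9

def mixed_digits(label):
    """Stand-in rule: matches if digits from both ranges occur."""
    return (any(cp in ARABIC_INDIC_DIGITS for cp in label)
            and any(cp in EXTENDED_ARABIC_INDIC_DIGITS for cp in label))

def context_met(label, when=None, not_when=None):
    """Evaluate a conditional context for a whole label rule.

    "when" is met if the rule matches; "not-when" is met if it fails
    to match. The two attributes cannot be combined.
    """
    if when is not None and not_when is not None:
        raise ValueError('"when" and "not-when" are mutually exclusive')
    if when is not None:
        return when(label)
    if not_when is not None:
        return not not_when(label)
    return True  # no condition attached
```

A label that uses digits from only one of the two ranges satisfies the "not-when" condition; a label that mixes them does not, which makes the label invalid.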

The OPTIONAL "when" and "not-when" attributes are mutually exclusive. They MAY be applied to both "char" and "range" elements in the "data" element, including "char" elements defining sequences of code points, as well as to "var" elements (see Section 5.3.5).

If a label contains one or more code points that fail to satisfy a conditional context, the label is invalid (see Section 7.5). For variants, the conditional context restricts the definition of the variant to the case where the condition is met. Outside the specified context, a variant is not defined.

5.3. Variants

Most LGRs only determine simple code point eligibility, and for them, the elements described so far would be the only ones required for their "data" section. Others additionally specify a mapping of code points to other code points, known as "variants". What constitutes a variant code point is a matter of policy and varies for each implementation. The following examples are intended to demonstrate the syntax; they are not necessarily typical.

5.3.1. Basic Variants

Variant code points are specified using one or more "var" elements as children of a "char" element. The target mapping is specified using the "cp" attribute. Other, optional attributes for the "var" element are described below.

For example, to map LATIN SMALL LETTER V (U+0076) as a variant of LATIN SMALL LETTER U (U+0075):

       <char cp="0075">
           <var cp="0076"/>
       </char>
        
A sequence of multiple code points can be specified as a variant of a single code point. For example, the sequence of LATIN SMALL LETTER O (U+006F) then LATIN SMALL LETTER E (U+0065) might hypothetically be specified as a variant for a LATIN SMALL LETTER O WITH DIAERESIS (U+00F6) as follows:

       <char cp="00F6">
           <var cp="006F 0065"/>
       </char>
        
The source and target of a variant mapping may both be sequences but not ranges.

If the source of one mapping is a prefix sequence of the source for another, both variant mappings will be considered at the same location in the input label when generating permuted variant labels. If poorly designed, an LGR containing such an instance of a prefix relation could generate multiple instances of the same variant label for the same original label, but with potentially different dispositions. Any duplicate variant labels encountered MUST be treated as an error (see Section 8.4).

The "var" element specifies variant mappings in only one direction, even though the variant relation is usually considered symmetric; that is, if A is a variant of B, then B should also be a variant of A. The format requires that the inverse of the variant be given explicitly to fully specify symmetric variant relations in the LGR. This has the beneficial side effect of making the symmetry explicit:

       <char cp="006F 0065">
           <var cp="00F6"/>
       </char>
        
Variant relations are normally not only symmetric but also transitive. If A is a variant of B and B is a variant of C, then A is also a variant of C. As with symmetry, these transitive relations are only part of the LGR if spelled out explicitly. Implementations that require an LGR to be symmetric and transitive should verify this mechanically.
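
A mechanical check of this kind could be sketched in Python as follows. The function name and the representation of mappings as (source, target) pairs of "cp" strings are assumptions of this example. Conditional contexts are ignored, and reflexive mappings implied by transitivity (A to B and B to A implying A to A) are deliberately not demanded.

```python
def missing_mappings(variants):
    """Return (missing_symmetric, missing_transitive) for a set of
    (source, target) variant mappings."""
    missing_symmetric = {
        (target, source)
        for (source, target) in variants
        if (target, source) not in variants
    }
    missing_transitive = set()
    for (a, b) in variants:
        for (c, d) in variants:
            # a -> b and b -> d should imply a -> d (skipping a -> a).
            if b == c and a != d and (a, d) not in variants:
                missing_transitive.add((a, d))
    return missing_symmetric, missing_transitive
```

An LGR is symmetric and transitive in this simplified sense when both returned sets are empty.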

All variant mappings are unique. For a given "char" element, all "var" elements MUST have a unique combination of "cp", "when", and "not-when" attributes. It is RECOMMENDED to list the "var" elements in ascending order of their target code point sequence. (For "when" and "not-when" attributes, see Section 5.3.5.)

5.3.2. The "type" Attribute

Variants may be tagged with an OPTIONAL "type" attribute. The value of the "type" attribute may be any non-empty value not starting with an underscore and not containing spaces. This value is used to resolve the disposition of any variant labels created using a given variant. (See Section 7.2.)

By default, the values of the "type" attribute directly describe the target policy status (disposition) for a variant label that was generated using a particular variant, with any variant label being assigned a disposition corresponding to the most restrictive variant type. Several conventional disposition values are predefined below in Section 7. Whenever these values can represent the desired policy, they SHOULD be used.

       <char cp="767C">
           <var cp="53D1" type="allocatable"/>
           <var cp="5F42" type="blocked"/>
           <var cp="9AEA" type="blocked"/>
           <var cp="9AEE" type="blocked"/>
       </char>
        
By default, if a variant label contains any instance of one of the variants of type "blocked", the label would be blocked, but if it contained only instances of variants to be allocated, it could be allocated. See the discussion about implied actions in Section 7.6.
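
The default behavior just described can be paraphrased in a few lines of Python. This is only a sketch of the two cases named in the preceding paragraph; the function name is hypothetical, and the handling of other variant types is left open here because it is governed by the actions discussed in Section 7.

```python
def default_disposition(variant_types):
    """Disposition of a variant label, given the "type" values of the
    variant mappings used to generate it (two default cases only)."""
    types = list(variant_types)
    if "blocked" in types:
        # Any blocked variant blocks the whole label.
        return "blocked"
    if types and all(t == "allocatable" for t in types):
        return "allocatable"
    return None  # not decided by this sketch
```

For instance, a variant label built from one "allocatable" and one "blocked" variant is blocked.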

The XML format for the LGR makes the relation between the values of the "type" attribute on variants and the resulting disposition of variant labels fully explicit. See the discussion in Section 7.2. Making this relation explicit allows a generalization of the "type" attribute from directly reflecting dispositions to a more differentiated intermediate value that is then used in the resolution of label disposition. Instead of the default action of applying the most restrictive disposition to the entire label, such a generalized resolution can be used to achieve additional goals, such as limiting the set of allocatable variant labels or implementing other policies found in existing LGRs (see, for example, Appendix B).

Because variant mappings MUST be unique, it is not possible to define the same variant for the same "char" element with different "type" attributes (however, see Section 5.3.5).

5.3.3. Null Variants

A null variant is a variant string that maps to no code point. This is used when a particular code point sequence is considered discretionary in the context of a whole label. To specify a null variant, use an empty "cp" attribute. For example, to map a string containing a ZERO WIDTH NON-JOINER (U+200C) to the same string without the ZERO WIDTH NON-JOINER:

       <char cp="200C">
           <var cp=""/>
       </char>
        
This is useful in expressing the intent that some code points in a label are to be mapped away when generating a canonical variant of the label. However, in tables that are designed to have symmetric variant mappings, this could lead to combinatorial explosion if not handled carefully.

The symmetric form of a null variant is expressed as follows:

       <char cp="">
           <var cp="200C" type="invalid" />
       </char>
        
A "char" element with an empty "cp" attribute MUST specify at least one variant mapping. It is strongly RECOMMENDED to use a type of "invalid" or equivalent when defining variant mappings from null sequences, so that variant mappings from null sequences are removed in variant label generation (see Section 5.3.2).

5.3.4. Variants with Reflexive Mapping

At first glance, there seems to be no call for adding variant mappings for which source and target code points are the same -- that is, for which the mapping is reflexive, or, in other words, an identity mapping. Yet, such reflexive mappings occur frequently in LGRs that follow [RFC3743].

Adding a "var" element allows both a type and a reference id to be specified for it. While the reference id is not used in processing, the type of the variant can be used to trigger actions. In permuting the label to generate all possible variants, the type associated with a reflexive variant mapping is applied to any of the permuted labels containing the original code point.

In the following example, let's assume that the goal is to allocate only those labels that contain a variant that is considered "preferred" in some way. As defined in the example, the code point U+3473 exists both as a variant of U+3447 and as a variant of itself (reflexive mapping). Assuming an original label of "U+3473 U+3447", the permuted variant "U+3473 U+3473" would consist of the reflexive variant of U+3473 followed by a variant of U+3447. Given the variant mappings as defined here, the types for both of the variant mappings used to generate that particular permutation would have the value "preferred":

       <char cp="3447" ref="0">
         <var cp="3473" type="preferred" ref="1 3" />
       </char>
       <char cp="3473" ref="0">
         <var cp="3447" type="blocked" ref="1 3" />
         <var cp="3473" type="preferred" ref="0" />
       </char>
        
Having established the variant types in this way, a set of actions could be defined that return a disposition of "allocatable" or "activated" for a label consisting exclusively of variants with type "preferred", for example. (For details on how to define actions based on variant types, see Section 7.2.1.)
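
The permutation of the example label, and a check for labels made up only of "preferred" variants, can be traced with the Python sketch below. It is limited to single code point mappings and uses hypothetical names. Adding an untyped identity option where no reflexive mapping is defined is a convention assumed for this illustration; Section 8 describes the actual procedure.

```python
from itertools import product

# Mappings from the example: source -> [(target, type), ...]
VARIANTS = {
    0x3447: [(0x3473, "preferred")],
    0x3473: [(0x3447, "blocked"), (0x3473, "preferred")],
}

def permute(label):
    """Yield (variant_label, types) for every combination of mappings."""
    choices = []
    for cp in label:
        options = list(VARIANTS.get(cp, []))
        if all(target != cp for target, _ in options):
            # No reflexive mapping: keep the original, with no type.
            options.append((cp, None))
        choices.append(options)
    for combination in product(*choices):
        yield (tuple(target for target, _ in combination),
               tuple(kind for _, kind in combination))

def all_preferred(types):
    """True if every mapping used has type "preferred"."""
    return all(kind == "preferred" for kind in types)
```

For the original label "U+3473 U+3447", the permuted label "U+3473 U+3473" is the only one of the four combinations whose types are all "preferred".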

In general, using reflexive variant mappings in this manner makes it possible to calculate disposition values using a uniform approach for all labels, whether they consist of mapped variant code points, original code points, or a mixture of both. In particular, the dispositions for two otherwise identical labels may differ based on which variant mappings were executed in order to generate each of them. (For details on how to generate variants and evaluate dispositions, see Section 8.)

Another useful convention that uses reflexive variants is described below in Section 7.2.1.

5.3.5. Conditional Variants

Fundamentally, variants are mappings between two sequences of code points. However, in some instances, for a variant relationship to exist, some context external to the code point sequence must also be considered. For example, a positional context may determine whether two code point sequences are variants of each other.

One example is Arabic code points, which can have different forms depending on position; some code points share forms, which makes them variants in the positions corresponding to those forms. Such positional context cannot be derived from the code point alone, as the code point is the same for the various forms.

As described in Section 5.2, an OPTIONAL "when" or "not-when" attribute may be given for any "var" element to specify required or prohibited contextual conditions under which the variant is defined.

Assuming that the "rules" element contains suitably defined rules for "arabic-isolated" and "arabic-final", the following example shows how to mark ARABIC LETTER ALEF WITH WAVY HAMZA BELOW (U+0673) as a variant of ARABIC LETTER ALEF WITH HAMZA BELOW (U+0625), but only when it appears in its isolated or final forms:

       <char cp="0625">
           <var cp="0673" when="arabic-isolated"/>
           <var cp="0673" when="arabic-final"/>
       </char>
        
While a "var" element MUST NOT contain multiple conditions (it is only allowed a single "when" or "not-when" attribute), multiple "var" elements using the same mapping MAY be specified with different "when" or "not-when" attributes. The combination of mapping and conditional context defines a unique variant.

For each variant label, care must be taken to ensure that at most one of the contextual conditions is met for variants with the same mapping; otherwise, duplicate variant labels would be created for the same input label. Any such duplicate variant labels MUST be treated as an error; see Section 8.4.

Two contexts may be complementary, as in the following example, which shows ARABIC LETTER TEH MARBUTA (U+0629) as a variant of ARABIC LETTER HEH (U+0647), but with two different types.

       <char cp="0647" >
         <var cp="0629" not-when="arabic-final" type="blocked" />
         <var cp="0629" when="arabic-final" type="allocatable" />
       </char>
        
The intent is that a label that uses U+0629 instead of U+0647 in a final position should be considered essentially the same label and, therefore, allocatable to the same entity, while the same substitution in a non-final position leads to labels that are different, but considered confusable, so that either one, but not both, should be delegatable.

For symmetry, the reverse mappings must exist and must agree in their "when" or "not-when" attributes. However, symmetry does not apply to the other attributes. For example, these are potential reverse mappings for the above:

       <char cp="0629" >
         <var cp="0647" not-when="arabic-final" type="allocatable" />
         <var cp="0647" when="arabic-final" type="allocatable" />
       </char>
        

Here, both variants have the same "type" attribute. In this instance, the "when" and "not-when" attributes are complementary and between them cover every possible context, so it is tempting to fold the two entries into one. Nevertheless, it is strongly RECOMMENDED to use the format shown in the example, which makes the symmetry easily verifiable by parsers and tools. (The same applies to entries created for transitivity.)
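The symmetry requirement described above lends itself to a simple automated check. The following Python sketch is a minimal illustration; the tuple representation and the name `missing_reverse_mappings` are assumptions of this example and are not defined by the LGR format.

```python
# Each variant is modeled as (source, target, condition), where condition is
# a (kind, rule_name) pair such as ("when", "arabic-final"), or None if the
# variant is unconditional. The "type" attribute is deliberately left out,
# because symmetry does not apply to it.

def missing_reverse_mappings(variants):
    """Return the reverse mappings that would be needed for symmetry."""
    present = set(variants)
    missing = []
    for source, target, condition in variants:
        reverse = (target, source, condition)
        if reverse not in present:
            missing.append(reverse)
    return missing


forward = [
    ("0647", "0629", ("not-when", "arabic-final")),
    ("0647", "0629", ("when", "arabic-final")),
]
reverse = [
    ("0629", "0647", ("not-when", "arabic-final")),
    ("0629", "0647", ("when", "arabic-final")),
]

print(missing_reverse_mappings(forward + reverse))  # prints []
```

Keeping both conditional entries in each direction lets a tool of this kind compare entries one-to-one, without having to reason about whether two conditions together cover all contexts.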

Arabic is an example of a script for which such conditional variants have been implemented based on the joining contexts for Arabic code points. The mechanism defined here supports other forms of conditional variants that may be required by other scripts.

5.4. Annotations

Two attributes, the "ref" and "comment" attributes, can be used to annotate individual elements in the LGR. They are ignored in machine-processing of the LGR. The "ref" attribute is intended for formal annotations and the "comment" attribute for free-form annotations. The latter can be applied more widely.

5.4.1. The "ref" Attribute

Reference information MAY optionally be specified by a "ref" attribute consisting of a space-delimited sequence of reference identifiers (see Section 4.3.8).

       <char cp="5220" ref="0">
           <var cp="5220" ref="5"/>
           <var cp="522A" ref="2 3"/>
       </char>
        

This facility is typically used to give source information for code points or variant relations. This information is ignored when machine-processing an LGR. If applied to a range, the "ref" attribute applies to every code point in the range. All reference identifiers MUST be from the set declared in the "references" element (see Section 4.3.8). It is an error to repeat a reference identifier in the same "ref" attribute. It is RECOMMENDED that identifiers be listed in ascending order.

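The constraints on the "ref" attribute stated above can be sketched as a small validation routine. This Python example is illustrative only; the function name `check_ref` and the way declared identifiers are passed in are assumptions, and the ordering recommendation is not checked.

```python
def check_ref(ref_value, declared_ids):
    """Return a list of problems found in one "ref" attribute value.

    ref_value is the raw attribute string; declared_ids is the set of
    identifiers declared in the "references" element.
    """
    problems = []
    seen = set()
    for ref_id in ref_value.split():
        if ref_id not in declared_ids:
            problems.append(f"undeclared reference identifier: {ref_id}")
        if ref_id in seen:
            problems.append(f"repeated reference identifier: {ref_id}")
        seen.add(ref_id)
    return problems


declared = {"0", "2", "3", "5"}
print(check_ref("2 3", declared))  # prints []
print(check_ref("2 2", declared))
```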

In addition to "char", "range", and "var" elements in the "data" section, a "ref" attribute may be present on several element types contained in the "rules" element, as described below: actions and literals ("char" inside a rule), as well as definitions of rules and classes. It may not be present on references to named character classes or rules that use the "by-ref" attribute defined below; the "by-ref" and "ref" attributes are mutually exclusive. None of the elements in the metadata take a "ref" attribute; to provide additional information, use the "description" element instead.

5.4.2. The "comment" Attribute

Any "char", "range", or "var" element in the "data" section may contain an OPTIONAL "comment" attribute. The contents of a "comment" attribute are free-form plain text. Comments are ignored in machine processing of the table. "comment" attributes MAY also be placed on all elements in the "rules" section of the document, such as actions and match operators, as well as definitions of classes and rules, but not on child elements of the "class" element. Finally, in the metadata, only the "version" and "reference" elements MAY have "comment" attributes (to match the syntax in [RFC3743]).

5.5. Code Point Tagging

Typically, LGRs are used to explicitly designate allowable code points, where any label that contains a code point not explicitly listed in the LGR is considered an ineligible label according to the ruleset.

For more-complex registry rules, there may be a need to discern one or more subsets of code points. This can be accomplished by applying an OPTIONAL "tag" attribute to "char" or "range" elements that are child elements of the "data" element. By collecting code points that share the same tag value, character classes may be defined (see Section 6.2.2) that can then be used in parameterized context or whole label rules (see Section 6.3.2).

Each "tag" attribute MAY contain multiple values separated by white space. A tag value is an identifier that may also include certain punctuation marks, such as a colon. Formally, it MUST correspond to the XML 1.0 Nmtoken (Name token) production (see [XML] Section 2.3). It is an error to duplicate a value within the same "tag" attribute. A "tag" attribute for a "range" element applies to all code points in the range. Because code point sequences are not proper members of a set of code points, a "tag" attribute MUST NOT be present in a "char" element defining a code point sequence.

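The constraints on the "tag" attribute can be sketched as follows. This Python example is a simplification under stated assumptions: the regular expression accepts only the ASCII subset of the XML 1.0 Nmtoken production (the full production also admits many non-ASCII characters), and the function name `check_tag` is hypothetical.

```python
import re

# ASCII-only approximation of the XML 1.0 Nmtoken production (an assumption
# made for brevity; the real production is wider).
ASCII_NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")


def check_tag(tag_value, cp_value):
    """Return a list of problems for a "tag" attribute on a "char" element."""
    problems = []
    # A "cp" attribute holding several code points denotes a sequence.
    if len(cp_value.split()) > 1:
        problems.append("tag attribute on a code point sequence")
    seen = set()
    for tag in tag_value.split():
        if not ASCII_NMTOKEN.fullmatch(tag):
            problems.append(f"not an Nmtoken (ASCII subset): {tag}")
        if tag in seen:
            problems.append(f"duplicate tag value: {tag}")
        seen.add(tag)
    return problems


print(check_tag("letter lower", "0061"))  # prints []
print(check_tag("letter", "0061 0062"))
```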

6. Whole Label and Context Evaluation
6.1. Basic Concepts

The "rules" element contains the specification of both context-based and whole label rules. Collectively, these are known as Whole Label Evaluation (WLE) rules (Section 6.3). The "rules" element also contains the character classes (Section 6.2) that they depend on, and any actions (Section 7) that assign dispositions to labels based on rules or variant mappings.

A whole label rule is applied to the whole label. It is used to validate both original labels and any variant labels computed from them.

A rule implementing a conditional context as discussed in Section 5.2 does not necessarily apply to the whole label but may be specific to the context around a single code point or code point sequence. Certain code points in a label sometimes need to satisfy context-based rules -- for example, for the label to be considered valid, or to satisfy the context for a variant mapping (see the description of the "when" attribute in Section 6.4).

For example, if a rule is referenced in the "when" attribute of a variant mapping, it is used to describe the conditional context under which the particular variant mapping is defined to exist.

Each rule is defined in a "rule" element. A rule may contain the following as child elements:

o literal code points or code point sequences

o character classes, which define sets of code points to be used for context comparisons

o context operators, which define when character classes and literals may appear

o nested rules, whether defined in place or invoked by reference

Collectively, these are called "match operators" and are listed in Section 6.3.2. An LGR containing rules or match operators that

1. are incorrectly defined or nested,

2. have invalid attributes, or

3. have invalid or undefined attribute values

MUST be rejected. Note that not all of the constraints defined here are validated by the schema.

6.2. Character Classes

Character classes are sets of characters that often share a particular property. While they function like sets in every way, even supporting the usual set operators, they are called "character classes" here in a nod to the use of that term in regular expression syntax. (This also avoids confusion with the term "character set" in the sense of character encoding.)

Character classes can be specified in several ways:

o by defining the class via matching a tag in the code point data. All characters with the same "tag" attribute are part of the same class;

o by referencing a value of one of the Unicode character properties defined in the Unicode Character Database;

o by explicitly listing all the code points in the class; or

o by defining the class as a set combination of any number of other classes.

6.2.1. Declaring and Invoking Named Classes

A character class has an OPTIONAL "name" attribute consisting of a single identifier not containing spaces. All names for classes must be unique. If the "name" attribute is omitted, the class is anonymous and exists only inside the rule or combined class where it is defined. A named character class is defined independently and can be referenced by name from within any rules or as part of other character class definitions.

       <class name="example" comment="an example class definition">
           0061 4E00
       </class>
       ...
       <rule>
           <class by-ref="example" />
       </rule>
        

An empty "class" element with a "by-ref" attribute is a reference to an existing named class. The "by-ref" attribute MUST NOT be used in the same "class" element with any of these attributes: "name", "from-tag", "property", or "ref". The "name" attribute MUST be present if and only if the class is a direct child element of the "rules" element. It is an error to reference a named class for which the definition has not been seen.

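The reference rules above can be sketched as a document-order scan. This Python example is illustrative only: it works on a namespace-free fragment, the function name `check_class_references` is hypothetical, and it covers just the attribute exclusivity and the "definition has been seen" checks.

```python
import xml.etree.ElementTree as ET

FORBIDDEN_WITH_BY_REF = ("name", "from-tag", "property", "ref")
# Elements that can define a named class (plain classes and set operators).
CLASS_DEFINERS = {"class", "complement", "union", "intersection",
                  "difference", "symmetric-difference"}


def check_class_references(rules_xml):
    """Return problems found in class references inside a "rules" fragment."""
    root = ET.fromstring(rules_xml)
    defined = set()
    problems = []
    for element in root.iter():  # iter() visits elements in document order
        by_ref = element.get("by-ref")
        if element.tag == "class" and by_ref is not None:
            for attribute in FORBIDDEN_WITH_BY_REF:
                if element.get(attribute) is not None:
                    problems.append(
                        f'"by-ref" used together with "{attribute}"')
            if by_ref not in defined:
                problems.append(
                    f"class {by_ref!r} referenced before its definition")
        elif element.tag in CLASS_DEFINERS and element.get("name"):
            defined.add(element.get("name"))
    return problems


SAMPLE = """
<rules>
  <class name="example">0061 4E00</class>
  <rule name="uses-classes">
    <class by-ref="example"/>
    <class by-ref="later"/>
  </rule>
  <class name="later">0062</class>
</rules>
"""
print(check_class_references(SAMPLE))
```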

6.2.2. Tag-Based Classes

The "char" or "range" elements that are child elements of the "data" element MAY contain a "tag" attribute that consists of one or more space-separated tag values; for example:

       <char cp="0061" tag="letter lower"/>
       <char cp="4E00" tag="letter"/>
        

This defines two tags for use with code point U+0061, the tag "letter" and the tag "lower". Use

       <class name="letter" from-tag="letter" />
       <class name="lower" from-tag="lower" />
        

to define two named character classes, "letter" and "lower", containing all code points with the respective tags, the first with 0061 and 4E00 as elements, and the latter with 0061 but not 4E00 as an element. The "name" attribute may be omitted for an anonymous in-place definition of a nested, tag-based class.

Tag values are typically identifiers, with the addition of a few punctuation symbols, such as a colon. Formally, they MUST correspond to the XML 1.0 Nmtoken production. While a "tag" attribute may contain a list of tag values, the "from-tag" attribute MUST always contain a single tag value.

If the document contains no "char" or "range" elements with a corresponding tag, the character class represents the empty set. This is valid, to allow a common "rules" element to be shared across files. However, it is RECOMMENDED that implementations allow for a warning to ensure that referring to an undefined tag in this way is intentional.

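How a tag-based class is collected from the code point data can be sketched in a few lines. This Python example is a minimal illustration; it uses a namespace-free "data" fragment with single code points only, and the function name `class_from_tag` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

DATA = """
<data>
  <char cp="0061" tag="letter lower"/>
  <char cp="4E00" tag="letter"/>
</data>
"""


def class_from_tag(data_element, tag):
    """Collect the code points of all "char" elements carrying the tag."""
    members = set()
    for char in data_element.iter("char"):
        # A "tag" attribute may hold several space-separated values.
        if tag in char.get("tag", "").split():
            members.add(int(char.get("cp"), 16))
    return members


data = ET.fromstring(DATA)
print(sorted(hex(cp) for cp in class_from_tag(data, "letter")))
print(class_from_tag(data, "digit"))  # prints set()
```

The second call returns the empty set because no element carries the tag "digit"; an implementation might emit a warning at this point, in line with the recommendation above.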

6.2.3. Unicode Property-Based Classes

A class is defined in terms of Unicode properties by giving the Unicode property alias and the property value or property value alias, separated by a colon.

       <class name="virama" property="ccc:9" />
        

The example above selects all code points for which the Unicode Canonical Combining Class (ccc) value is 9. This value of the ccc is assigned to all code points that encode viramas.

Unicode property values MUST be designated via a composite of the attribute name and value as defined for the property value in [UAX42], separated by a colon. Loose matching of property values and names as described in [UAX44] is not appropriate for an XML schema and is not supported; it is likewise not supported in the XML representation [UAX42] of the Unicode Character Database itself.

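The effect of a property-based class such as "ccc:9" can be approximated with Python's standard `unicodedata` module. This is a sketch under assumptions: the candidate set is a small hypothetical sample rather than all of Unicode, and `unicodedata` reflects the Unicode version bundled with the Python interpreter, which may differ from the version declared in a given LGR.

```python
import unicodedata


def class_from_ccc(candidates, ccc_value):
    """Select the candidates whose Canonical Combining Class equals ccc_value."""
    return {cp for cp in candidates
            if unicodedata.combining(chr(cp)) == ccc_value}


# Hypothetical sample: LATIN SMALL LETTER A, DEVANAGARI LETTER KA,
# DEVANAGARI SIGN VIRAMA, TAMIL SIGN VIRAMA.
candidates = {0x0061, 0x0915, 0x094D, 0x0BCD}
virama = class_from_ccc(candidates, 9)
print(sorted(f"{cp:04X}" for cp in virama))  # prints ['094D', '0BCD']
```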

A property-based class MAY be anonymous, or, when defined as an immediate child of the "rules" element, it MAY be named to relate a formal property definition to its usage, such as the use of the value 9 for ccc to designate a virama (or halant) in various scripts.

Unicode properties may, in principle, change between versions of the Unicode Standard. However, the values assigned for a given version are fixed. If Unicode properties are used, a Unicode version MUST be declared in the "unicode-version" element in the header. (Note: Some Unicode properties are by definition stable across versions and do not change once assigned; see [Unicode-Stability].)

All implementations processing LGR files SHOULD provide support for the following minimal set of Unicode properties:

o General Category (gc)

o Script (sc)

o Canonical Combining Class (ccc)

o Bidi Class (bc)

o Arabic Joining Type (jt)

o Indic Syllabic Category (InSC)

o Deprecated (Dep)

The short name for each property is given in parentheses.

If a program that is using an LGR to determine the validity of a label encounters a property that it does not support, it MUST abort with an error.

6.2.4. Explicitly Declared Classes

A class of code points may also be declared by listing all code points that are members of the class. This is useful when tagging cannot be used because code points are not listed individually as part of the eligible set of code points for the given LGR -- for example, because they only occur in code point sequences.

To define a class in terms of an explicit list of code points, use a space-separated list of hexadecimal code point values:

       <class name="abcd">0061 0062 0063 0064</class>
        

This defines a class named "abcd" containing the code points for characters "a", "b", "c", and "d". The ordering of the code points is not material, but it is RECOMMENDED to list them in ascending order; not doing so makes it unnecessarily difficult for users to detect errors such as duplicates or to compare and review these classes against other specifications.

In a class definition, ranges of code points are represented by a hexadecimal start and end value separated by a hyphen. The following declaration is equivalent to the preceding:

       <class name="abcd">0061-0064</class>
        

Range and code point declarations can be freely intermixed:

       <class name="abcd">0061 0062-0063 0064</class>
        

The contents of a class differ from a repertoire in that the latter MAY contain sequences as elements, while the former MUST NOT. Instead, they closely resemble character classes as found in regular expressions.

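The equivalence of the three "abcd" declarations above can be demonstrated by parsing the element content into a set. This Python sketch is illustrative; the function name `parse_class_content` is an assumption, and rejecting a range whose start exceeds its end is a choice made for this example.

```python
def parse_class_content(text):
    """Parse space-separated hex code points and hyphenated ranges into a set."""
    members = set()
    for token in text.split():
        if "-" in token:
            first, last = (int(part, 16) for part in token.split("-"))
            if first > last:
                raise ValueError(f"range start exceeds end: {token}")
            members.update(range(first, last + 1))
        else:
            members.add(int(token, 16))
    return members


listed = parse_class_content("0061 0062 0063 0064")
ranged = parse_class_content("0061-0064")
mixed = parse_class_content("0061 0062-0063 0064")
print(listed == ranged == mixed)  # prints True
```

Because the result is a set, the order of the tokens and any accidental duplicates do not affect membership, which is consistent with the statement that ordering is not material.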

6.2.5. Combined Classes

Classes may be combined using operators for set complement, union, intersection, difference (elements of the first class that are not in the second), and symmetric difference (elements in either class but not both). Because classes fundamentally function like sets, the result of any such combination is itself a class; for example, the union of several character classes is a class.

   +-------------------+----------------------------------------------+
   | Logical Operation | Example                                      |
   +-------------------+----------------------------------------------+
   | Complement        | <complement>                                 |
   |                   |    <class by-ref="xxx"/>                     |
   |                   | </complement>                                |
   +-------------------+----------------------------------------------+
   | Union             | <union>                                      |
   |                   |    <class by-ref="class-1"/>                 |
   |                   |    <class by-ref="class-2"/>                 |
   |                   |    <class by-ref="class-3"/>                 |
   |                   | </union>                                     |
   +-------------------+----------------------------------------------+
   | Intersection      | <intersection>                               |
   |                   |    <class by-ref="class-1"/>                 |
   |                   |    <class by-ref="class-2"/>                 |
   |                   | </intersection>                              |
   +-------------------+----------------------------------------------+
   | Difference        | <difference>                                 |
   |                   |    <class by-ref="class-1"/>                 |
   |                   |    <class by-ref="class-2"/>                 |
   |                   | </difference>                                |
   +-------------------+----------------------------------------------+
   | Symmetric         | <symmetric-difference>                       |
   | Difference        |    <class by-ref="class-1"/>                 |
   |                   |    <class by-ref="class-2"/>                 |
   |                   | </symmetric-difference>                      |
   +-------------------+----------------------------------------------+
        

Set Operators

The elements from this table may be arbitrarily nested inside each other, subject to the following restriction: a "complement" element MUST contain precisely one "class" or one of the operator elements, while an "intersection", "symmetric-difference", or "difference" element MUST contain precisely two, and a "union" element MUST contain two or more of these elements.

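The nesting and arity rules for the set operators can be sketched as a recursive evaluator. This Python example is illustrative only: it handles just references to named classes (not anonymous classes with content), and the `universe` parameter against which a complement is taken is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# (minimum, maximum) number of child elements; None means unbounded.
ARITY = {
    "complement": (1, 1),
    "intersection": (2, 2),
    "difference": (2, 2),
    "symmetric-difference": (2, 2),
    "union": (2, None),
}


def evaluate(element, named_classes, universe):
    """Evaluate a set-operator element tree into a set of code points."""
    if element.tag == "class":
        return set(named_classes[element.get("by-ref")])
    minimum, maximum = ARITY[element.tag]
    operands = [evaluate(child, named_classes, universe) for child in element]
    if len(operands) < minimum or (maximum is not None
                                   and len(operands) > maximum):
        raise ValueError(f"<{element.tag}> has {len(operands)} child elements")
    if element.tag == "complement":
        return universe - operands[0]
    if element.tag == "union":
        return set().union(*operands)
    if element.tag == "intersection":
        return operands[0] & operands[1]
    if element.tag == "difference":
        return operands[0] - operands[1]
    return operands[0] ^ operands[1]  # symmetric-difference


named = {"class-1": {0x61, 0x62, 0x63}, "class-2": {0x63, 0x64}}
expression = ET.fromstring("""
<difference>
  <union><class by-ref="class-1"/><class by-ref="class-2"/></union>
  <intersection><class by-ref="class-1"/><class by-ref="class-2"/></intersection>
</difference>
""")
result = evaluate(expression, named, universe=set(range(0x61, 0x7B)))
print(sorted(hex(cp) for cp in result))  # prints ['0x61', '0x62', '0x64']
```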

An anonymous combined class can be defined directly inside a rule or any of the match operator elements that allow child elements (see Section 6.3.2) by using the set combination as the outer element.

       <rule>
           <union>
               <class by-ref="xxx"/>
               <class by-ref="yyy"/>
           </union>
       </rule>
        

The example shows the definition of an anonymous combined class that represents the union of classes "xxx" and "yyy". There is no need to wrap this union inside another "class" element, and, in fact, set combination elements MUST NOT be nested inside a "class" element.

Lastly, to create a named combined class that can be referenced in other classes or in rules as <class by-ref="xxxyyy"/>, add a "name" attribute to the set combination element -- for example, <union name="xxxyyy" /> -- and place it at the top level immediately below the "rules" element (see Section 6.2.1).

       <rules>
          <union name="xxxyyy">
              <class by-ref="xxx"/>
              <class by-ref="yyy"/>
          </union>
            ...
       </rules>
        

Because (as for ordinary sets) a combination of classes is itself a class, no matter by what combinations of set operators a combined class is created, a reference to it always uses the "class" element as described in Section 6.2.1. That is, a named class is always referenced via an empty "class" element using the "by-ref" attribute containing the name of the class to be referenced.

6.3. Whole Label and Context Rules

Each rule comprises a series of matching operators that must be satisfied in order to determine whether a label meets a given condition. Rules may reference other rules or character classes defined elsewhere in the table.

6.3.1. The "rule" Element

A matching rule is defined by a "rule" element, each child element of which is one of the match operators from Section 6.3.2. In evaluating a rule, each child element is matched in order. "rule" elements MAY be nested inside each other and inside certain match operators.

The following simple rule matches a label in which all characters are members of a class called "preferred-codepoint":

       <rule name="preferred-label">
           <start />
           <class by-ref="preferred-codepoint" count="1+"/>
           <end />
       </rule>
        

Rules are paired with explicit and implied actions, triggering these actions when a rule matches a label. For example, a simple explicit action for the rule shown above would be:

       <action disp="allocatable" match="preferred-label" />
        

The rule in this example would have the effect of setting the policy disposition for a label made up entirely of preferred code points to "allocatable". Explicit actions are further discussed in Section 7 and implicit actions in Section 7.5. Another use of rules is in defining conditional contexts for code points and variants as discussed in Sections 5.2 and 5.3.5.

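The combined effect of the "preferred-label" rule and its action can be approximated with a regular expression. This Python sketch is an analogy only, since LGR rules are not regular expressions; the sample class contents and both function names are assumptions of this example, and `None` merely stands for "this explicit action was not triggered".

```python
import re


def build_preferred_label_pattern(preferred_codepoints):
    """Build a pattern equivalent to start, class with count 1+, end."""
    members = "".join(re.escape(chr(cp))
                      for cp in sorted(preferred_codepoints))
    return re.compile(f"[{members}]+")


def explicit_disposition(label, pattern):
    """Return "allocatable" if the whole label matches, else None."""
    # fullmatch plays the role of the "start" and "end" operators.
    return "allocatable" if pattern.fullmatch(label) else None


# Hypothetical "preferred-codepoint" class containing a, b, and c.
pattern = build_preferred_label_pattern({0x61, 0x62, 0x63})
print(explicit_disposition("abcab", pattern))  # prints allocatable
print(explicit_disposition("abd", pattern))    # prints None
```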

A rule that is an immediate child element of the "rules" element MUST be named using a "name" attribute containing a single identifier string with no spaces. A named rule may be incorporated into another rule by reference and may also be referenced by an "action" element, "when" attribute, or "not-when" attribute. If the "name" attribute is omitted, the rule is anonymous and MUST be nested inside another rule or match operator.

6.3.2. The Match Operators

The child elements of a rule are a series of match operators, which are listed here by type and name and with a basic example or two.

   +------------+-------------+------------------------------------+
   | Type       | Operator    | Examples                           |
   +------------+-------------+------------------------------------+
   | logical    | any         | <any />                            |
   |            +-------------+------------------------------------+
   |            | choice      | <choice>                           |
   |            |             |  <rule by-ref="alternative1"/>     |
   |            |             |  <rule by-ref="alternative2"/>     |
   |            |             | </choice>                          |
   +------------+-------------+------------------------------------+
   | positional | start       | <start />                          |
   |            +-------------+------------------------------------+
   |            | end         | <end />                            |
   +------------+-------------+------------------------------------+
   | literal    | char        | <char cp="0061 0062 0063" />       |
   +------------+-------------+------------------------------------+
   | set        | class       | <class by-ref="class1" />          |
   |            |             | <class>0061 0064-0065</class>      |
   +------------+-------------+------------------------------------+
   | group      | rule        | <rule by-ref="rule1" />            |
   |            |             | <rule><any /></rule>               |
   +------------+-------------+------------------------------------+
   | contextual | anchor      | <anchor />                         |
   |            +-------------+------------------------------------+
   |            | look-ahead  | <look-ahead><any /></look-ahead>   |
   |            +-------------+------------------------------------+
   |            | look-behind | <look-behind><any /></look-behind> |
   +------------+-------------+------------------------------------+
        

Match Operators

Any element defining an anonymous class can be used as a match operator, including any of the set combination operators (see Section 6.2.5) as well as references to named classes.

Match operators shown as empty elements in the Examples column of the table above do not support child elements of their own; all other match operators MAY contain nested match operators. In particular, anonymous "rule" elements can be used for grouping.

6.3.3. The "count" Attribute

The OPTIONAL "count" attribute, when present, specifies the minimum required or maximum permitted number of times a match operator is used to match input. If the "count" attribute is

   n     the match operator matches the input exactly n times, where n is 1 or greater.

   n+    the match operator matches the input at least n times, where n is 0 or greater.

   n:m   the match operator matches the input at least n times, where n is 0 or greater, but matches the input up to m times in total, where m > n. If m = n and n > 0, the match operator matches the input exactly n times.

If there is no "count" attribute, the match operator matches the input exactly once.

In matching, greedy evaluation is used in the sense defined for regular expressions: beyond the required number of times, the input is matched as many times as possible, but not so often as to prevent a match of the remainder of the rule.
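As a non-normative illustration, the three forms of the "count" attribute can be mapped onto regular-expression quantifiers. The Python sketch below is only one possible reading of the text above: the helper names parse_count and quantifier are hypothetical and not part of this specification, and Python's "re" module is used solely to demonstrate greedy evaluation with backtracking.

```python
import re
from typing import Optional, Tuple

# Accepts the three forms "n", "n+", and "n:m".
_COUNT_RE = re.compile(r"([0-9]+)(\+|:([0-9]+))?")


def parse_count(value: Optional[str]) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) repetitions; a maximum of None is unbounded."""
    if value is None:
        return (1, 1)  # no "count" attribute: match exactly once
    m = _COUNT_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"malformed count: {value!r}")
    n = int(m.group(1))
    if m.group(2) is None:  # form "n": exactly n times, n >= 1
        if n < 1:
            raise ValueError("n must be 1 or greater")
        return (n, n)
    if m.group(2) == "+":  # form "n+": at least n times, n >= 0
        return (n, None)
    upper = int(m.group(3))  # form "n:m"
    if upper < n or upper == 0:
        # Assumption: "0:0" is rejected because m = n requires n > 0.
        raise ValueError("m must not be less than n, and 0:0 is not allowed")
    return (n, upper)


def quantifier(value: Optional[str]) -> str:
    """Translate a "count" value into a greedy regex quantifier."""
    low, high = parse_count(value)
    return "{%d,%s}" % (low, "" if high is None else high)


# Greedy evaluation: "a" with count="1+" first consumes "aaa", then gives one
# "a" back so that the remainder of the rule ("ab") can still match.
match = re.compile("(a%s)(ab)" % quantifier("1+")).fullmatch("aaab")
print(match.group(1))  # prints "aa"
```

The quantifiers produced here are greedy by default, which corresponds to the evaluation order described above.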

A "count" attribute MUST NOT be applied to any element that contains a "name" attribute but MAY be applied to operators such as "class" that declare anonymous classes (including combined classes) or invoke any predefined classes by reference. The "count" attribute MUST NOT be applied to any "class" element, or element defining a combined class, when it is nested inside a combined class.

A "count" attribute MUST NOT be applied to match operators of type "start", "end", "anchor", "look-ahead", or "look-behind" or to any operators, such as "rule" or "choice", that contain a nested instance of them. This limitation applies recursively and irrespective of whether a "rule" element containing these nested instances is declared in place or used by reference.

However, the "count" attribute MAY be applied to any other instances of either an anonymous "rule" element or a "choice" element, including those instances nested inside other match operators. It MAY also be applied to the elements "any" and "char", when used as match operators.

6.3.4. The "name" and "by-ref" Attributes

Like classes (see Section 6.2.1), rules declared as immediate child elements of the "rules" element MUST be named using a unique "name" attribute, and all other instances MUST NOT be named. Anonymous rules and classes can be nested inside other match operators directly; named rules and classes can be nested by reference.

To reference a named rule or class inside a rule or match operator, use a "rule" or "class" element with an OPTIONAL "by-ref" attribute containing the name of the referenced element. It is an error to reference a rule or class for which the complete definition has not been seen. In other words, it is explicitly not possible to define recursive rules or class definitions. The "by-ref" attribute MUST NOT appear in the same element as the "name" attribute or in an element that has any child elements.

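The requirement that a referenced rule or class be completely defined before use can be checked in a single pass over the children of the "rules" element. The following Python sketch is a hedged illustration, not a complete validator: the function name undefined_references is hypothetical, and it assumes that "rule" references resolve only against named rules while "class" references resolve against any other named top-level element.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as "{urn:...}rule"."""
    return tag.rsplit("}", 1)[-1]


def kind_of(tag: str) -> str:
    # Assumption: everything that is not a "rule" lives in the class namespace.
    return "rule" if local_name(tag) == "rule" else "class"


def undefined_references(rules_xml: str) -> List[str]:
    """Return the by-ref names used before their definition is complete."""
    root = ET.fromstring(rules_xml)
    defined = set()
    problems = []
    for child in root:  # immediate children of the "rules" element
        for elem in child.iter():  # the child itself and all descendants
            ref = elem.get("by-ref")
            if ref is not None and (kind_of(elem.tag), ref) not in defined:
                problems.append(ref)
        # A name becomes usable only after its whole definition has been
        # scanned, so a self-reference (recursion) is reported as an error.
        name = child.get("name")
        if name is not None:
            defined.add((kind_of(child.tag), name))
    return problems
```

Because a name is recorded only after its element has been fully scanned, both forward references and recursive definitions are reported.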

The example shows several named classes and a named rule referencing some of them by name.

       <class name="letter" property="gc:L"/>
       <class name="combining-mark" property="gc:M"/>
       <class name="digit" property="gc:Nd" />
       <rule name="letter-grapheme">
          <class by-ref="letter" count="1+"/>
          <class by-ref="combining-mark" count="0+"/>
       </rule>
        
6.3.5. The "choice" Element

The "choice" element is used to represent a list of two or more alternatives:

       <rule name="ldh">
          <choice count="1+">
              <class by-ref="letter"/>
              <class by-ref="digit"/>
              <char cp="002D" comment="literal HYPHEN"/>
          </choice>
       </rule>
        

Each child element of a "choice" element represents one alternative. The first matching alternative determines the match for the "choice" element. To express a choice where an alternative itself consists of a sequence of elements, the sequence must be wrapped in an anonymous rule.

6.3.6. Literal Code Point Sequences

A literal code point sequence matches a single code point or a sequence. It is defined by a "char" element, with the code point or sequence to be matched given by the "cp" attribute. When used as a literal, a "char" element MAY contain a "count" attribute in addition to the "cp" attribute and OPTIONAL "comment" or "ref" attributes. No other attributes or child elements are permitted.

6.3.7. The "any" Element

The "any" element is an empty element that matches any single code point. It MAY have a "count" attribute. For an example, see Section 6.3.9.

Unlike a literal, the "any" element MUST NOT have a "ref" attribute.

6.3.8. The "start" and "end" Elements

To match the beginning or end of a label, use the "start" or "end" element. An empty label would match this rule:

       <rule name="empty-label">
           <start/>
           <end/>
       </rule>
        

Conceptually, whole label rules evaluate the label as a whole, but in practice, many rules do not actually need to be specified to match the entire label. For example, to express a requirement of not starting a label with a digit, a rule needs to describe only the initial part of a label.

This example uses the previously defined rules, together with "start" and "end" elements, to define a rule that requires that an entire label be well-formed. For this example, that means that it must start with a letter and that it contains no leading digits or combining marks nor combining marks placed on digits.

       <rule name="leading-letter" >
         <start />
         <rule by-ref="letter-grapheme" count="1"/>
         <choice count="0+">
           <rule by-ref="letter-grapheme" count="0+"/>
           <class by-ref="digit" count="0+"/>
         </choice>
         <end />
       </rule>
        

Each "start" or "end" element occurs at most once in a rule, except if nested inside a "choice" element in such a way that in matching each alternative at most one occurrence of each is encountered. Otherwise, the result is an error, as is any case where a "start" or "end" element is not encountered as the first or last element to be matched, respectively, in matching a rule. "start" and "end" elements are empty elements that do not have a "count" attribute or any other attribute other than "comment". It is an error for any match operator enclosing a nested "start" or "end" element to have a "count" attribute.

6.3.9. Example Context Rule from IDNA Specification

This is an example of the WLE rule from [RFC5892] forbidding the mixture of the Arabic-Indic and extended Arabic-Indic digits in the same label. It is implemented as a whole label rule associated with the code point ranges using the "not-when" attribute, which defines an impermissible context. The example also demonstrates several instances of the use of anonymous rules for grouping.

       <data>
          <range first-cp="0660" last-cp="0669" not-when="mixed-digits"
                 tag="arabic-indic-digits" />
          <range first-cp="06F0" last-cp="06F9" not-when="mixed-digits"
                 tag="extended-arabic-indic-digits" />
       </data>
       <rules>
          <rule name="mixed-digits">
             <choice>
                <rule>
                   <class from-tag="arabic-indic-digits"/>
                   <any count="0+"/>
                   <class from-tag="extended-arabic-indic-digits"/>
                </rule>
                <rule>
                   <class from-tag="extended-arabic-indic-digits"/>
                   <any count="0+"/>
                   <class from-tag="arabic-indic-digits"/>
                </rule>
             </choice>
          </rule>
       </rules>
        

As specified in the example, a code point from either of the two digit ranges is invalid in any label matching the "mixed-digits" rule, that is, any time that a code point from the other range is also present. Note that invalidating the label is not the same as invalidating the definition of the "range" elements; in particular, the definition of the tag values does not depend on the "when" attribute.
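The effect of the "mixed-digits" rule can be restated procedurally. The following Python sketch is an informal equivalent, assuming the rule may match anywhere in the label (it has no "start" or "end" element); the function and constant names are hypothetical.

```python
# Code point ranges taken from the "range" elements in the example above.
ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9


def matches_mixed_digits(label: str) -> bool:
    """True if the label contains digits from both ranges, in either order.

    The two alternatives of the "choice" element cover the two possible
    orders, and <any count="0+"/> allows arbitrary code points in between,
    so the rule reduces to "both kinds of digit are present".
    """
    code_points = [ord(ch) for ch in label]
    has_arabic_indic = any(cp in ARABIC_INDIC for cp in code_points)
    has_extended = any(cp in EXTENDED_ARABIC_INDIC for cp in code_points)
    return has_arabic_indic and has_extended
```

Because both "range" elements carry not-when="mixed-digits", a label for which this function returns True would be invalid.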

6.4. Parameterized Context or When Rules

To recap: When a rule is intended to provide a context for evaluating the validity of a code point or variant mapping, it is invoked by the "when" or "not-when" attributes described in Section 5.2. For "char" and "range" elements, an action implied by a context rule always has a disposition of "invalid" whenever the rule given by the "when" attribute is not matched (see Section 7.5). Conversely, a "not-when" attribute results in a disposition of "invalid" whenever the rule is matched. When a rule is used in this way, it is called a context or "when" rule.

The example in the previous section shows a whole label rule used as a context rule, essentially making the whole label the context. The next sections describe several match operators that can be used to provide a more precise specification of a context, allowing a parameterized context rule. See Section 7 for an alternative method of defining an invalid disposition for a label not matching a whole label rule.

6.4.1. The "anchor" Element

Such parameterized context rules are rules that contain a special placeholder represented by an "anchor" element. As each When Rule is evaluated, if an "anchor" element is present, it is replaced by a literal corresponding to the "cp" attribute of the element containing the "when" (or "not-when") attribute. The match to the "anchor" element must be at the same position in the label as the code point or variant mapping triggering the When Rule.

For example, the Greek lower numeral sign is invalid if not immediately preceding a character in the Greek script. This is most naturally addressed with a parameterized When Rule using "look-ahead":

       <char cp="0375" when="preceding-greek"/>
       ...
       <class name="greek-script" property="sc:Grek"/>
       <rule name="preceding-greek">
           <anchor/>
           <look-ahead>
               <class by-ref="greek-script"/>
           </look-ahead>
       </rule>
        

In evaluating this rule, the "anchor" element is treated as if it was replaced by a literal

       <char cp="0375"/>
        

but only the instance of U+0375 at the given position is evaluated. If a label had two instances of U+0375 with the first one matching the rule and the second not, then evaluating the When Rule MUST succeed for the first instance and fail for the second.

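The per-position evaluation described above can be sketched as follows. This Python illustration is an approximation under stated assumptions: Python's standard library does not expose the Unicode Script property, so the hypothetical helper is_greek stands in for "sc:Grek" using the Greek and Greek Extended block ranges, which is not identical to the script property.

```python
from typing import Dict

GREEK_LOWER_NUMERAL_SIGN = 0x0375


def is_greek(ch: str) -> bool:
    # Assumption: block ranges approximate sc:Grek for illustration only.
    cp = ord(ch)
    return 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF


def preceding_greek(label: str, pos: int) -> bool:
    """Evaluate the rule with the anchor fixed at index pos.

    The "look-ahead" element requires the code point immediately after
    the anchor to be in the Greek script.
    """
    return pos + 1 < len(label) and is_greek(label[pos + 1])


def evaluate_when(label: str, cp: int = GREEK_LOWER_NUMERAL_SIGN) -> Dict[int, bool]:
    """Evaluate the When Rule separately for every occurrence of cp."""
    return {
        index: preceding_greek(label, index)
        for index, ch in enumerate(label)
        if ord(ch) == cp
    }
```

For a label with two instances of U+0375, the result holds one entry per instance, so the first can succeed while the second fails.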

Unlike other rules, rules containing an "anchor" element MUST only be invoked via the "when" or "not-when" attributes on code points or variants; otherwise, their "anchor" elements cannot be evaluated. However, it is possible to invoke rules not containing an "anchor" element from a "when" or "not-when" attribute. (See Section 6.4.3.)

The "anchor" element is an empty element, with no attributes permitted except "comment".

6.4.2. The "look-behind" and "look-ahead" Elements

Context rules use the "look-behind" and "look-ahead" elements to define context before and after the code point sequence matched by the "anchor" element. If the "anchor" element is omitted, neither the "look-behind" nor the "look-ahead" element may be present in a rule.

Here is an example of a rule that defines an "initial" context for an Arabic code point:

       <class name="transparent" property="jt:T"/>
       <class name="right-joining" property="jt:R"/>
       <class name="left-joining" property="jt:L"/>
       <class name="dual-joining" property="jt:D"/>
       <class name="non-joining" property="jt:U"/>
       <rule name="Arabic-initial">
         <look-behind>
           <choice>
             <start/>
             <rule>
               <class by-ref="transparent" count="0+"/>
               <class by-ref="non-joining"/>
             </rule>
           </choice>
         </look-behind>
         <anchor/>
         <look-ahead>
           <class by-ref="transparent" count="0+" />
           <choice>
             <class by-ref="right-joining" />
             <class by-ref="dual-joining" />
           </choice>
         </look-ahead>
       </rule>
        

A "when" rule (or context rule) is a named rule that contains any combination of "look-behind", "anchor", and "look-ahead" elements, in that order. Each of these elements occurs at most once, except if nested inside a "choice" element in such a way that in matching each alternative at most one occurrence of each is encountered. Otherwise, the result is undefined. None of these elements takes a "count" attribute, nor does any enclosing match operator; otherwise, the result is undefined. If a context rule contains a "look-ahead" or "look-behind" element, it MUST contain an "anchor" element. If, because of a "choice" element, a required anchor is not actually encountered, the results are undefined.

6.4.3. Omitting the "anchor" Element

If the "anchor" element is omitted, the evaluation of the context rule is not tied to the position of the code point or sequence associated with the "when" attribute.

According to [RFC5892], the Katakana middle dot is invalid in any label not containing at least one Japanese character anywhere in the label. Because this requirement is independent of the position of the middle dot, the rule does not require an "anchor" element.

       <char cp="30FB" when="japanese-in-label"/>
       <rule name="japanese-in-label">
           <union>
               <class property="sc:Hani"/>
               <class property="sc:Kata"/>
               <class property="sc:Hira"/>
           </union>
       </rule>
        

The Katakana middle dot is used only with Han, Katakana, or Hiragana. The corresponding When Rule requires that at least one code point in the label be in one of these scripts, but the position of that code point is independent of the location of the middle dot; therefore, no anchor is required. (Note that the Katakana middle dot itself is of script Common, that is, "sc:Zyyy".)

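A position-independent When Rule of this kind reduces to a test over the whole label. The Python sketch below is illustrative only: the ranges in JAPANESE_RANGES are a rough, hypothetical stand-in for the Hiragana, Katakana, and Han script properties (they cover only the most common blocks), and the function names are assumptions.

```python
# Assumption: simplified block ranges approximating sc:Hira, sc:Kata, sc:Hani.
JAPANESE_RANGES = [
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters (excludes U+30FB itself)
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]

KATAKANA_MIDDLE_DOT = "\u30FB"


def japanese_in_label(label: str) -> bool:
    """True if at least one code point anywhere in the label is Japanese."""
    return any(
        low <= ord(ch) <= high
        for ch in label
        for low, high in JAPANESE_RANGES
    )


def middle_dot_valid(label: str) -> bool:
    """The middle dot is valid only when the When Rule matches the label."""
    return KATAKANA_MIDDLE_DOT not in label or japanese_in_label(label)
```

No position is passed to japanese_in_label, which mirrors the absence of an "anchor" element in the rule.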

7. The "action" Element

The purpose of an action is to assign a disposition to a label in response to being triggered by the label meeting a specified condition. Often, the action simply results in blocking or invalidating a label that does not match a rule. An example of an action invalidating a label because it does not match a rule named "leading-letter" is as follows:

       <action disp="invalid" not-match="leading-letter"/>
        

If an action is to be triggered on matching a rule, a "match" attribute is used instead. Actions are evaluated in the order that they appear in the XML file. Once an action is triggered by a label, the disposition defined in the "disp" attribute is assigned to the label and no other actions are evaluated for that label.

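The ordered, first-match evaluation of actions can be sketched as a simple loop. In this Python illustration, actions are modeled as dictionaries keyed by attribute name and rules as predicates on a label; these representations, the function name assign_disposition, and the simplified stand-in for the "leading-letter" rule are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Rule = Callable[[str], bool]
Action = Dict[str, str]


def assign_disposition(label: str, actions: List[Action],
                       rules: Dict[str, Rule]) -> Optional[str]:
    """Return the "disp" value of the first action triggered by the label."""
    for action in actions:  # document order
        if "match" in action:
            triggered = rules[action["match"]](label)
        elif "not-match" in action:
            triggered = not rules[action["not-match"]](label)
        else:
            triggered = True  # no condition: triggered unconditionally
        if triggered:
            return action["disp"]  # no further actions are evaluated
    return None


# Hypothetical stand-in for the "leading-letter" rule of Section 6.3.8.
RULES = {"leading-letter": lambda label: bool(label) and label[0].isalpha()}

ACTIONS = [
    {"disp": "invalid", "not-match": "leading-letter"},
    {"disp": "allocatable"},
]

print(assign_disposition("abc", ACTIONS, RULES))   # prints "allocatable"
print(assign_disposition("1abc", ACTIONS, RULES))  # prints "invalid"
```

Placing the unconditional action last makes it act as a default for labels that trigger none of the earlier actions.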

The goal of the LGR is to identify all labels and variant labels and to assign them disposition values. These dispositions are then fed into a further process that ultimately implements all aspects of policy. To allow this specification to be used with the widest range of policies, the permissible values for the "disp" attribute are neither defined nor restricted. Nevertheless, a set of commonly used disposition values is RECOMMENDED. (See Section 7.3.)

7.1. The "match" and "not-match" Attributes

An OPTIONAL "match" or "not-match" attribute specifies a rule that must be matched or not matched as a condition for triggering an action. Only a single rule may be named as the value of a "match" or "not-match" attribute. Because rules may be composed of other rules, this restriction to a single attribute value does not impose any limitation on the contexts that can trigger an action.

An action MUST NOT contain both a "match" and a "not-match" attribute, and the value of either attribute MUST be the name of a previously defined rule; otherwise, the document MUST be rejected. An action without any attributes is triggered by all labels unconditionally. For a very simple LGR, the following action would allocate all labels that match the repertoire:

       <action disp="allocatable" />
        

Since rules are evaluated for all labels, whether they are the original label or computed by permuting the defined and valid variant mappings for the label's code points, actions based on matching or not matching a rule may be triggered for both original and variant labels, but the rules are not affected by the disposition attributes of the variant mappings. To trigger any actions based on these dispositions requires the use of additional optional attributes for actions described next.

7.2. Actions with Variant Type Triggers

7.2.1. The "any-variant", "all-variants", and "only-variants" Attributes

An action may contain one of the OPTIONAL attributes "any-variant", "all-variants", or "only-variants" defining triggers based on variant types. The permitted value for these attributes consists of one or more variant type values, separated by spaces. These MAY include type values that are not used in any "var" element in the LGR. When a variant label is generated, these variant type values are compared to the set of type values on the variant mappings used to generate the particular variant label (see Section 8).

Any single match may trigger an action that contains an "any-variant" attribute, while for an "all-variants" or "only-variants" attribute, the variant type for all variant code points must match one or several of the type values specified in the attribute to trigger the action. There is no requirement that the entire list of variant type values be matched, as long as all variant code points match at least one of the values.

An "only-variants" attribute will trigger the action only if all code points of the variant label have variant mappings from the original code points. In other words, the label contains no original code points other than those with a reflexive mapping (see Section 5.3.4).

       <char cp="0078" comment="x">
           <var cp="0078" type="allocatable" comment="reflexive" />
           <var cp="0079" type="blocked" />
       </char>
       <char cp="0079" comment="y">
           <var cp="0078" type="allocatable" />
       </char>
       ...
       <action disp="blocked" any-variant="blocked" />
       <action disp="allocatable" only-variants="allocatable" />
       <action disp="some-disp" any-variant="allocatable" />
        

In the example above, the label "xx" would have variant labels "xx", "xy", "yx", and "yy". The first action would result in blocking any variant label containing "y", because the variant mapping from "x" to "y" is of type "blocked", triggering the "any-variant" condition. Because in this example "x" has a reflexive variant mapping to itself of type "allocatable", the original label "xx" has a reflexive variant "xx" that would trigger the "only-variants" condition on the second action.

A label "yy" would have the variants "xy", "yx", and "xx". Because the variant mapping from "y" to "x" is of type "allocatable" and a mapping from "y" to "y" is not defined, the labels "xy" and "yx" trigger the "any-variant" condition on the third action. The variant "xx", being generated using the mapping from "y" to "x" of type "allocatable", would trigger the "only-variants" condition on the second action. As there is no reflexive variant "yy", the original label "yy" cannot trigger any variant type triggers. However, it could still trigger an action defined as matching or not matching a rule.
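The walk-through above can be reproduced with a small model of the three variant type triggers. In this Python sketch, a variant label is represented by the list of variant types of the mappings used at each position, with None marking an original code point kept without any mapping; this representation and the function names are assumptions for illustration. The treatment of "all-variants" for a label with no variant code points is also an assumption (it is taken not to trigger).

```python
from typing import Dict, List, Optional

# The three actions from the example above, in document order.
ACTIONS = [
    {"disp": "blocked", "any-variant": "blocked"},
    {"disp": "allocatable", "only-variants": "allocatable"},
    {"disp": "some-disp", "any-variant": "allocatable"},
]


def triggered(action: Dict[str, str], types: List[Optional[str]]) -> bool:
    """Check the variant type trigger of one action against one label."""
    used = [t for t in types if t is not None]  # positions with a mapping
    if "any-variant" in action:
        wanted = action["any-variant"].split()
        return any(t in wanted for t in used)
    if "all-variants" in action:
        wanted = action["all-variants"].split()
        # Assumption: at least one variant code point is required.
        return bool(used) and all(t in wanted for t in used)
    if "only-variants" in action:
        wanted = action["only-variants"].split()
        # Every position must come from a mapping (reflexive ones included).
        return (bool(types) and len(used) == len(types)
                and all(t in wanted for t in used))
    return True  # no variant type trigger present


def disposition(types: List[Optional[str]]) -> Optional[str]:
    """Return the disposition of the first triggered action, if any."""
    for action in ACTIONS:
        if triggered(action, types):
            return action["disp"]
    return None


# Original "xx": variant "xy" uses x->x (allocatable) and x->y (blocked).
print(disposition(["allocatable", "blocked"]))      # prints "blocked"
# Original "xx": reflexive variant "xx" uses x->x twice.
print(disposition(["allocatable", "allocatable"]))  # prints "allocatable"
# Original "yy": variant "xy" uses y->x once; the second "y" has no mapping.
print(disposition(["allocatable", None]))           # prints "some-disp"
# Original "yy" itself: no mapping applies to either position.
print(disposition([None, None]))                    # prints "None"
```

Rule-based triggers ("match" and "not-match") are deliberately left out of this model; they would be combined with the variant type trigger by a logical AND.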

In each action, one variant type trigger may be present by itself or in conjunction with an attribute matching or not matching a rule. If variant triggers and rule-matching triggers are used together, the label MUST "match" or respectively "not-match" the specified rule AND satisfy the conditions on the variant type values given by the "any-variant", "all-variants", or "only-variants" attribute.

A useful convention combines the "any-variant" trigger with reflexive variant mappings (Section 5.3.4). This convention is used, for example, when multiple LGRs are defined within the same registry and for overlapping repertoire. In some cases, the delegation of a label from one LGR must prohibit the delegation of another label in some other LGR. This can be done using a variant of type "blocked" as in this example from an Armenian LGR, where the Armenian, Latin, and Cyrillic letters all look identical:

       <char cp="0570" comment="ARMENIAN SMALL LETTER HO">
         <var cp="0068" type="blocked" comment="LATIN SMALL LETTER H" />
         <var cp="04BB" type="blocked"
              comment="CYRILLIC SMALL LETTER SHHA" />
       </char>
        

The issue is that the target code points for these two variants are both outside the Armenian repertoire. By using a reflexive variant with the following convention:

       <char cp="0068" comment="not part of repertoire">
         <var cp="0068" type="out-of-repertoire-var"
              comment="reflexive mapping" />
         <var cp="04BB" type="blocked" />
         <var cp="0570" type="blocked" />
       </char>
       ...

and associating this with an action of the form:

       <action disp="invalid" any-variant="out-of-repertoire-var" />
        

it is possible to list the symmetric and transitive variant mappings in the LGR even where they involve out-of-repertoire code points. By associating the action shown with the special type for these reflexive mappings, any original labels containing one or more of the out-of-repertoire code points are filtered out, just as if these code points had not been listed in the LGR in the first place. Nevertheless, they do participate in the permutation of variant labels for in-repertoire labels (Armenian in the example), and these permuted variants can be used to detect collisions with out-of-repertoire labels (see Section 8).

7.2.2. Example from Tables in the Style of RFC 3743

This section gives an example of using variant type triggers, combined with variants with reflexive mappings (Section 5.3.4), to achieve LGRs that implement tables like those defined according to [RFC3743] where the goal is to allow as variants only labels that consist entirely of simplified or traditional variants, in addition to the original label.

This example assumes an LGR where all variants have been given suitable "type" attributes of "blocked", "simplified", "traditional", or "both", similar to the ones discussed in Appendix B. Given such an LGR, the following example actions evaluate the disposition for the variant label:

       <action disp="blocked" any-variant="blocked" />
       <action disp="allocatable" only-variants="simplified both" />
       <action disp="allocatable" only-variants="traditional both" />
       <action disp="blocked" all-variants="simplified traditional" />
       <action disp="allocatable" />
        

The first action matches any variant label for which at least one of the code point variants is of type "blocked". The second matches any variant label for which all of the code point variants are of type "simplified" or "both" -- in other words, an all-simplified label. The third matches any label for which all variants are of type "traditional" or "both" -- that is, all traditional. These two actions are not triggered by any variant labels containing some original code points, unless each of those code points has a variant defined with a reflexive mapping (Section 5.3.4).

The final two actions rely on the fact that actions are evaluated in sequence and that the first action triggered also defines the final disposition for a variant label (see Section 7.4). They further rely on the assumption that the only variants with type "both" are also reflexive variants.

Given these assumptions, any remaining simplified or traditional variants must then be part of a mixed label and so are blocked; all labels surviving to the last action are original code points only (that is, the original label). The example assumes that an original label may be a mixed label; if that is not the case, the disposition for the last action would be set to "blocked".

There are exceptions where the assumption on reflexive mappings made above does not hold, so this basic scheme needs some refinements to cover all cases. For a more complete example, see Appendix B.
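The action list in this section can be exercised with a small sketch. The Python below is illustrative only and not part of this specification: the names Action, triggers, and disposition are hypothetical, whole label rules ("match" and "not-match") are left out, and the sketch assumes that a label with no recorded variant types triggers none of the variant type triggers. The fully_mapped flag stands for the "only-variants" condition that every code point in the label is the target of an applied variant mapping, reflexive mappings included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One <action> element; at most one variant type trigger is set."""
    disp: str
    any_variant: frozenset = frozenset()
    all_variants: frozenset = frozenset()
    only_variants: frozenset = frozenset()


def ts(value):
    """Parse a space-separated attribute value into a set of types."""
    return frozenset(value.split())


def triggers(action, types, fully_mapped):
    """Return True if the recorded variant types trigger the action."""
    if action.any_variant:
        return any(t in action.any_variant for t in types)
    if action.all_variants:
        # Assumption: an empty set of recorded types does not trigger.
        return bool(types) and all(t in action.all_variants for t in types)
    if action.only_variants:
        return (bool(types) and fully_mapped
                and all(t in action.only_variants for t in types))
    return True  # no trigger attributes: catch-all


# The five example actions of this section, in order of appearance.
ACTIONS = [
    Action("blocked", any_variant=ts("blocked")),
    Action("allocatable", only_variants=ts("simplified both")),
    Action("allocatable", only_variants=ts("traditional both")),
    Action("blocked", all_variants=ts("simplified traditional")),
    Action("allocatable"),
]


def disposition(types, fully_mapped, actions=ACTIONS):
    """The first action triggered defines the disposition."""
    for action in actions:
        if triggers(action, types, fully_mapped):
            return action.disp
    return "valid"  # not reached here: the last action is a catch-all
```

Under these assumptions, an all-simplified variant label is "allocatable", a label mixing "simplified" and "traditional" types is "blocked", and the original label (no recorded types) falls through to the final catch-all.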

7.3. Recommended Disposition Values

The precise nature of the policy action taken in response to a disposition and the name of the corresponding "disp" attributes are only partially defined here. It is strongly RECOMMENDED to use the following dispositions only in their conventional sense.

invalid:  The resulting string is not a valid label. This disposition may be assigned implicitly; see Section 7.5. No variant labels should be generated from a variant mapping with this type.

blocked:  The resulting string is a valid label but should be blocked from registration. This would typically apply for a derived variant that is undesirable due to having no practical use or being confusingly similar to some other label.

allocatable:  The resulting string should be reserved for use by the same operator of the origin string but not automatically allocated for use.

activated:  The resulting string should be activated for use. (This is the same as a Preferred Variant [RFC3743].)

valid:  The resultant string is a valid label. (This is the typical default action if no dispositions are defined.)

7.4. Precedence

Actions are applied in the order of their appearance in the file. This defines their relative precedence. The first action triggered by a label defines the disposition for that label. To define the order of precedence, list the actions in the desired order. The conventional order of precedence for the actions defined in Section 7.3 is "invalid", "blocked", "allocatable", "activated", and then "valid". This default precedence is used for the default actions defined in Section 7.6.

7.5. Implied Actions

The context rules on code points ("not-when" or "when" rules) carry an implied action with a disposition of "invalid" (not eligible) if a "when" context is not satisfied or a "not-when" context is matched, respectively. These rules are evaluated at the time the code points for a label or its variant labels are checked for validity (see Section 8). In other words, they are evaluated before any of the actions are applied, and with higher precedence. The context rules for variant mappings are evaluated when variants are generated and/or when variant tables are made symmetric and transitive. They have an implied action with a disposition of "invalid", which means that a putative variant mapping does not exist whenever the given context matches a "not-when" rule or fails to match a "when" rule specified for that mapping. The result of that disposition is that the variant mapping is ignored in generating variant labels and the value is therefore not accessible to trigger any explicit actions.

Note that such a nonexistent variant mapping is different from a blocked variant, which is a variant code point mapping that exists but results in a label that may not be allocated.

7.6. Default Actions

If a label does not trigger any of the actions defined explicitly in the LGR, the following implicitly defined default actions are evaluated. They are shown below in their relative order of precedence (see Section 7.4). Default actions have a lower order of precedence than explicit actions (see Section 8.3).

The default actions for variant labels are defined as follows. The first set is triggered based on the standard variant type values of "invalid", "blocked", "allocatable", and "activated":

       <action disp="invalid" any-variant="invalid"/>
       <action disp="blocked" any-variant="blocked"/>
       <action disp="allocatable" any-variant="allocatable"/>
       <action disp="activated" all-variants="activated"/>
        

A final default action sets the disposition to "valid" for any label matching the repertoire for which no other action has been triggered. This "catch-all" action also matches all remaining variant labels from variants that do not have a type value.

       <action disp="valid" comment="Catch-all if other rules not met"/>
        

Conceptually, the implicitly defined default actions act just like a block of "action" elements that is added (virtually) beyond the last of the user-supplied actions. Any label not processed by the user-supplied actions would thus be processed by the default actions as if they were present in the LGR. As the last default action is a "catch-all", all processing is guaranteed to end with a definite disposition for the label.

8. Processing a Label against an LGR
8.1. Determining Eligibility for a Label

In order to test a given label for membership in the LGR, a consumer of the LGR must iterate through each code point within a given label and test that each instance of a code point is a member of the LGR. If any instance of a code point is not a member of the LGR, the label shall be deemed invalid.

An individual instance of a code point is deemed a member of the LGR when it is listed using a "char" element, or is part of a range defined with a "range" element, and all necessary conditions in any "when" or "not-when" attributes are correctly satisfied for that instance.

Alternatively, an instance of a code point is also deemed a member of the LGR when it forms part of a sequence that corresponds to a sequence listed using a "char" element for which the "cp" attribute defines a sequence, and all necessary conditions in any "when" or "not-when" attributes are correctly satisfied for that instance of the sequence.

In determining eligibility, at each position the longest possible sequence of code points is evaluated first. If that sequence matches a sequence defined in the LGR and satisfies any required context at that position, the instances of its constituent code points are deemed members of the LGR and evaluation proceeds with the next code point following the sequence. If the sequence does not match a defined sequence or does not satisfy the required context, successively shorter sequences are evaluated until only a single code point remains. The eligibility of that code point is determined as described above for an individual code point instance.
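The longest-sequence-first evaluation described above can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function name is_eligible, the representation of the repertoire as a set of strings, and the context_ok callback (a stand-in for "when" and "not-when" evaluation) are all assumptions made for the example. The deferred check against actions with a disposition of "invalid" is not covered.

```python
def is_eligible(label, repertoire, context_ok=None):
    """Check repertoire membership, trying the longest sequence first.

    label       -- the label as a string of code points
    repertoire  -- set of strings, each a single code point or a sequence
    context_ok  -- hypothetical callback (sequence, position) -> bool
                   standing in for "when" / "not-when" evaluation
    """
    if context_ok is None:
        context_ok = lambda seq, pos: True
    longest = max((len(s) for s in repertoire), default=0)
    pos = 0
    while pos < len(label):
        # Try successively shorter sequences down to one code point.
        for n in range(min(longest, len(label) - pos), 0, -1):
            seq = label[pos:pos + n]
            if seq in repertoire and context_ok(seq, pos):
                pos += n  # continue after the matched sequence
                break
        else:
            return False  # no sequence or code point matched here
    return True
```

For a repertoire containing "a", "ab", and "c", the label "abc" is accepted through the sequence "ab" followed by "c", while "ba" is rejected because "b" on its own is not a member.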

A label must also not trigger any action that results in a disposition of "invalid"; otherwise, it is deemed not eligible. (This step may need to be deferred until variant code point dispositions have been determined.)

8.1.1. Determining Eligibility Using Reflexive Variant Mappings

For LGRs that contain reflexive variant mappings (defined in Section 5.3.4), the final evaluation of eligibility for the label must be deferred until variants are generated. In essence, LGRs that use this feature treat the original label as the (identity) variant of itself. For such LGRs, the ordinary determination of eligibility described here is but a first step that generally excludes only a subset of invalid labels.

To further check the validity of a label with reflexive mappings, it is not necessary to generate all variant labels. Only a single variant needs to be created, where any reflexive variants are applied for each code point, and the label disposition is evaluated (as described in Section 8.3). A disposition of "invalid" results in the label being not eligible. (In the exceptional case where context rules are present on reflexive mappings, multiple reflexive variants may be defined, but for each original label, at most one of these can be valid at each code position. However, see Section 8.4.)

8.2. Determining Variants for a Label

For a given eligible label, the set of variant labels is deemed to consist of each possible permutation of original code points and substituted code points or sequences defined in "var" elements, whereby all "when" and "not-when" attributes are correctly satisfied for each "char" or "var" element in the given permutation and all applicable whole label rules are satisfied as follows:

1. Create each possible permutation of a label by substituting each code point or code point sequence in turn by any defined variant mapping (including any reflexive mappings).

2. Apply variant mappings with "when" or "not-when" attributes only if the conditions are satisfied; otherwise, they are not defined.

3. Record each of the "type" values on the variant mappings used in creating a given variant label in a disposition set; for any unmapped code point, record the "type" value of any reflexive variant (see Section 5.3.4).

4. Determine the disposition for each variant label per Section 8.3.

5. If the disposition is "invalid", remove the label from the set.

6. If final evaluation of the disposition for the unpermuted label per Section 8.3 results in a disposition of "invalid", remove all associated variant labels from the set.
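Steps 1 and 3 above can be sketched as a brute-force generator. This is an illustration under stated assumptions rather than a recommended implementation: the names partitions and variant_labels are hypothetical, labels and mappings are plain strings, context rules (step 2) are assumed to be absent, and the disposition steps 4 to 6 are left to the caller. The generator walks every partition of the label into repertoire members, so a sequence and its individual code points are both permuted.

```python
from itertools import product


def partitions(label, repertoire, pos=0):
    """Yield every split of label[pos:] into repertoire members."""
    if pos == len(label):
        yield []
        return
    for end in range(pos + 1, len(label) + 1):
        piece = label[pos:end]
        if piece in repertoire:
            for rest in partitions(label, repertoire, end):
                yield [piece] + rest


def variant_labels(label, repertoire, variants):
    """Yield (variant_label, recorded_types) for every permutation.

    variants maps a source code point or sequence to a list of
    (target, type) pairs, mirroring the "var" elements of a "char".
    """
    for pieces in partitions(label, repertoire):
        choices = []
        for piece in pieces:
            options = list(variants.get(piece, []))
            # Without a reflexive mapping, the original code point is
            # kept as an unmapped option that records no type.
            if all(target != piece for target, _ in options):
                options.append((piece, None))
            choices.append(options)
        for combo in product(*choices):
            text = "".join(target for target, _ in combo)
            types = tuple(t for _, t in combo if t is not None)
            yield text, types
```

The output includes the unpermuted original label. Because the number of results grows multiplicatively with label length, a sketch like this is only suitable for short labels or for testing.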

The number of potential permutations can be very large. In practice, implementations would use suitable optimizations to avoid having to actually create all permutations (see Section 8.5).

In determining the permuted set of variant labels in step (1) above, all eligible partitions into sequences must be evaluated. A label "ab" that matches a sequence "ab" defined in the LGR but also matches the sequence of individual code points "a" and "b" (both defined in the LGR) must be permuted using any defined variant mappings for both the sequence "ab" and the code points "a" and "b" individually.

8.3. Determining a Disposition for a Label or Variant Label

For a given label (variant or original), its disposition is determined by evaluating, in order of their appearance, all actions for which the label or variant label satisfies the conditions.

1. For any label that contains code points or sequences not defined in the repertoire, or does not satisfy the context rules on all of its code points and variants, the disposition is "invalid".

2. For all other labels, the disposition is given by the value of the "disp" attribute for the first action triggered by the label. An action is triggered if all of the following are true:

* the label matches the whole label rule given in the "match" attribute for that action;

* the label does not match the whole label rule given in the "not-match" attribute for that action;

* any of the recorded variant types for a variant label match the types given in the "any-variant" attribute for that action;

* all of the recorded variant types for a variant label match the types given in the "all-variants" or "only-variants" attribute given for that action;

* in case of an "only-variants" attribute, the label contains only code points that are the target of applied variant mappings;

or

* the action does not contain any "match", "not-match", "any-variant", "all-variants", or "only-variants" attributes: catch-all.

3. For any remaining variant label, assign the variant label the disposition using the default actions defined in Section 7.6. For this step, variant types outside the predefined recommended set (see Section 7.3) are ignored.

4. For any remaining label, set the disposition to "valid".

8.4. Duplicate Variant Labels

For a poorly designed LGR, it is possible to generate duplicate variant labels from the same input label, but with different, and potentially conflicting, dispositions. Implementations MUST treat any duplicate variant labels encountered as an error, irrespective of their dispositions.

This situation can arise in two ways. One is described in Section 5.3.5 and involves defining the same variant mapping with two context rules that are formally distinct but nevertheless overlap so that they are not mutually exclusive for the same label.

The other case involves variants defined for sequences, where one sequence is a prefix of another (see Section 5.3.1). The following shows such an example resulting in conflicting reflexive variants:

       <char cp="0061">
         <var cp="0061" type="allocatable"/>
       </char>
       <char cp="0062"/>
       <char cp="0061 0062">
         <var cp="0061 0062" type="blocked"/>
       </char>
        

A label "ab" would generate the variant labels "{a}{b}" and "{ab}" where the curly braces show the sequence boundaries as they were applied during variant mapping. The result is a duplicate variant label "ab", one based on a variant of type "allocatable" plus an original code point "b" that has no variant, and another one based on a single variant of type "blocked", thus creating two variant labels with conflicting dispositions.
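One way an implementation might detect this error during variant label generation is to record each generated label and fail on the second occurrence. The sketch below is illustrative only; the exception class DuplicateVariantLabel and the function collect_variants are hypothetical names, and the input is assumed to be an iterable of (variant_label, recorded_types) pairs produced by some generator.

```python
class DuplicateVariantLabel(ValueError):
    """Raised when the same variant label is generated more than once."""


def collect_variants(generated):
    """Collect generated variants, treating any duplicate as an error.

    The error is raised irrespective of whether the recorded types
    (and hence the dispositions) of the two occurrences agree.
    """
    seen = {}
    for label, types in generated:
        if label in seen:
            raise DuplicateVariantLabel(
                f"{label!r} generated twice: {seen[label]} and {types}"
            )
        seen[label] = types
    return seen
```

For the example above, the two results for "ab" (one recording "allocatable", the other "blocked") would raise the error on the second occurrence.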

In the general case, it is difficult, if not impossible, to prove by mechanical inspection of the LGR that duplicate variant labels will never occur, so implementations have to be prepared to detect this error during variant label generation. The condition is easily avoided by careful design of context rules and special attention to the relation among code point sequences with variants.

8.5. Checking Labels for Collision

The obvious method for checking for collision between labels is to generate the fully permuted set of variants for one of them and see whether it contains the other label as a member. As discussed in Section 8.2, this can be prohibitively expensive and is not necessary.

Because of symmetry and transitivity, all variant mappings form disjoint sets. In each of these sets, the source and target of each mapping are also variants of the sources and targets of all the other mappings. However, members of two different sets are never variants of each other.

If two labels have code points at the same position that are members of two different variant mapping sets, any variant labels of one cannot be variant labels of the other: the sets of their variant labels are likewise disjoint. Instead of generating all permutations to compare all possible variants, it is enough to find out whether code points at the same position belong to the same variant set or not.

For that, it is sufficient to substitute an "index" mapping that identifies the set. This index mapping could be, for example, the variant mapping for which the target code point (or sequence) comes first in some sorting order. This index mapping would, in effect, identify the set of variant mappings for that position.

To check for collision then means generating a single variant label from the original by substituting the respective "index" value for each code point. This results in an "index label". Two labels collide whenever the index labels for them are the same.
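The index label technique can be sketched as follows. This is a simplified illustration and not a complete collision check: the names build_index, index_label, and collide are hypothetical, only single code point mappings are handled (no sequences), context rules are ignored, and the "index" of each variant set is assumed to be its smallest member in code point order. A union-find structure groups the mappings into disjoint sets.

```python
def build_index(mappings):
    """Map every code point to the smallest member of its variant set.

    mappings is an iterable of (source, target) code point pairs.
    """
    parent = {}

    def find(cp):
        parent.setdefault(cp, cp)
        while parent[cp] != cp:
            parent[cp] = parent[parent[cp]]  # path halving
            cp = parent[cp]
        return cp

    for source, target in mappings:
        a, b = find(source), find(target)
        if a != b:
            # Keep the smaller code point as the root of the merged set.
            parent[max(a, b)] = min(a, b)
    return {cp: find(cp) for cp in list(parent)}


def index_label(label, index):
    """Substitute the index value for each code point of the label."""
    return "".join(index.get(cp, cp) for cp in label)


def collide(label_a, label_b, index):
    """Two labels collide when their index labels are the same."""
    return index_label(label_a, index) == index_label(label_b, index)
```

With the Armenian example from Section 7.2.1, build_index([("\u0570", "h"), ("\u0570", "\u04bb")]) places U+0570, U+0068, and U+04BB in one set, so two labels that differ only in which of these three code points they use have the same index label.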

9. Conversion to and from Other Formats

Both [RFC3743] and [RFC4290] provide different grammars for IDN tables. The formats in those documents are unable to fully support the increased requirements of contemporary IDN variant policies.

This specification is a superset of functionality provided by the older IDN table formats; thus, any table expressed in those formats can be expressed in this new format. Automated conversion can be conducted between tables conformant with the grammar specified in each document.

For notes on how to translate a table in the style of RFC 3743, see Appendix B.

10. Media Type

Well-formed LGRs that comply with this specification SHOULD be transmitted with a media type of "application/lgr+xml". This media type will signal to an LGR-aware client that the content is designed to be interpreted as an LGR.

11. IANA Considerations

IANA has completed the following actions:

11.1. Media Type Registration

The media type "application/lgr+xml" has been registered to denote transmission of LGRs that are compliant with this specification, in accordance with [RFC6838].

Type name: application

Subtype name: lgr+xml

Required parameters: N/A

Optional parameters: charset (as for application/xml per [RFC7303])

Security considerations: See the security considerations for application/xml in [RFC7303] and the specific security considerations for Label Generation Rulesets (LGRs) in RFC 7940

Interoperability considerations: As for application/xml per [RFC7303]

Published specification: See RFC 7940

Applications that use this media type: Software using LGRs for international identifiers, such as IDNs, including registry applications and client validators.

Additional information:

Deprecated alias names for this type: N/A

      Magic number(s): N/A
        

File extension(s): .lgr

      Macintosh file type code(s): N/A
        

Person & email address to contact for further information:

      Kim Davies <kim.davies@icann.org>
        
      Asmus Freytag <asmus@unicode.org>
        

Intended usage: COMMON

Restrictions on usage: N/A

Author:

      Kim Davies <kim.davies@icann.org>
        
      Asmus Freytag <asmus@unicode.org>
        

Change controller: IESG

Provisional registration? (standards tree only): No

11.2. URN Registration

This specification uses a URN to describe the XML namespace, in accordance with [RFC3688].

   URI: urn:ietf:params:xml:ns:lgr-1.0
        

Registrant Contact: See the Authors of this document.

XML: None.

11.3. Disposition Registry

This document establishes a vocabulary of "Label Generation Ruleset Dispositions", which has been reflected as a new IANA registry. This registry is divided into two subregistries:

o Standard Dispositions - This registry lists dispositions that have been defined in published specifications, i.e., the eligibility for such registrations is "Specification Required" [RFC5226]. The initial set of registrations are the five dispositions in this document described in Section 7.3.

o Private Dispositions - This registry lists dispositions that have been registered "First Come First Served" [RFC5226] by third parties with the IANA. Such dispositions must take the form "entity:disposition" where the entity is a domain name that uniquely identifies the private user of the namespace. For example, "example.org:reserved" could be a private extension used by the example organization to denote a disposition relating to reserved labels. These extensions are not intended to be interoperable, but registration is designed to minimize potential conflicts. It is strongly recommended that any new dispositions that require interoperability and have applicability beyond a single organization be defined as Standard Dispositions.

In order to distinguish them from Private Dispositions, Standard Dispositions MUST NOT contain the ":" character. All disposition names shall be in lowercase ASCII.
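The naming constraints above lend themselves to a simple syntactic check. The sketch below is an illustration only; the function classify_disposition is a hypothetical name, it checks form alone (lowercase ASCII, presence of ":"), and it neither consults the IANA registry nor verifies that the entity part is a valid domain name.

```python
def classify_disposition(name):
    """Classify a disposition name by its syntactic form.

    Returns "private" for names of the form "entity:disposition" and
    "standard" for names without ":". Raises ValueError for names
    that are empty, not ASCII, or not entirely lowercase.
    """
    if not name or not name.isascii() or name != name.lower():
        raise ValueError(f"not a lowercase ASCII name: {name!r}")
    if ":" in name:
        entity, _, disposition = name.partition(":")
        if not entity or not disposition:
            raise ValueError(f"malformed private disposition: {name!r}")
        return "private"
    return "standard"
```

For example, "blocked" is classified as having the standard form and "example.org:reserved" as having the private form, while "Blocked" is rejected because it is not lowercase.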

The IANA registry provides data on the name of the disposition, the intended purposes, and the registrant or defining specification for the disposition.

12. Security Considerations
12.1. LGRs Are Only a Partial Remedy for Problem Space

Substantially unrestricted use of non-ASCII characters in security-relevant identifiers such as domain name labels may cause user confusion and invite various types of attacks. In many languages, in particular those using complex or large scripts, an attacker has an opportunity to divert or confuse users as a result of different code points with identical appearance or similar semantics.

The use of an LGR provides a partial remedy for these risks by supplying a framework for prohibiting inappropriate code points or sequences from being registered at all and for permitting "variant" code points to be grouped together so that labels containing them may be mutually exclusive or registered only to the same owner.

In addition, by being fully machine processable the format may enable automated checks for known weaknesses in label generation rules. However, the use of this format, or compliance with this specification, by itself does not ensure that the LGRs expressed in this format are free of risk. Additional approaches may be considered, depending on the acceptable trade-off between flexibility and risk for a given application. One method of managing risk may involve a case-by-case evaluation of a proposed label in context with already-registered labels -- for example, when reviewing labels for their degree of visual confusability.

12.2. Computational Expense of Complex Tables

A naive implementation attempting to generate all variant labels for a given label could lead to the possibility of exhausting the resources on the machine running the LGR processor, potentially causing denial-of-service consequences. For many operations, brute-force generation can be avoided by optimization, and if needed, the number of permuted labels can be estimated more cheaply ahead of time.
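As an illustration of estimating the number of permuted labels ahead of time, the count can be bounded by multiplying the number of choices at each position. The sketch below is one possible approach under simplifying assumptions, not a prescribed method: the function estimate_permutations is a hypothetical name, only single code point mappings are considered, and context rules and whole label rules (which can only reduce the count) are ignored, so the result is an upper bound.

```python
from math import prod


def estimate_permutations(label, variants):
    """Upper bound on the number of permuted labels for a label.

    variants maps a code point to an iterable of its variant targets.
    Each position contributes the number of distinct choices: the
    original code point plus its variant targets.
    """
    return prod(len({cp, *variants.get(cp, ())}) for cp in label)
```

A processor could compare this bound against a configured limit and refuse to enumerate variants for labels that exceed it.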

The implementation of WLE rules, using certain backtracking algorithms, can take exponential time for pathological rules or labels and exhaust stack resources. This can be mitigated by proper implementation and enforcing the restrictions on permissible label length.

13. References
13.1. Normative References

[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, <http://www.rfc-editor.org/info/rfc2045>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.

[RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002, <http://www.rfc-editor.org/info/rfc3339>.

[RFC5646] Phillips, A., Ed., and M. Davis, Ed., "Tags for Identifying Languages", BCP 47, RFC 5646, DOI 10.17487/RFC5646, September 2009, <http://www.rfc-editor.org/info/rfc5646>.

[UAX42] The Unicode Consortium, "Unicode Character Database in XML", May 2016, <http://unicode.org/reports/tr42/>.

[Unicode-Stability] The Unicode Consortium, "Unicode Encoding Stability Policy, Property Value Stability", April 2015, <http://www.unicode.org/policies/stability_policy.html#Property_Value>.

[Unicode-Versions] The Unicode Consortium, "Unicode Version Numbering", June 2016, <http://unicode.org/versions/#Version_Numbering>.

[XML] Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth Edition)", World Wide Web Consortium, November 2008, <http://www.w3.org/TR/REC-xml/>.

13.2. Informative References

[ASIA-TABLE] DotAsia Organisation, ".ASIA ZH IDN Language Table", February 2012, <http://www.dot.asia/policies/ASIA-ZH-1.2.pdf>.

[LGR-PROCEDURE] Internet Corporation for Assigned Names and Numbers, "Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels", December 2012, <http://www.icann.org/en/resources/idn/draft-lgr-procedure-07dec12-en.pdf>.

[RELAX-NG] The Organization for the Advancement of Structured Information Standards (OASIS), "RELAX NG Compact Syntax", November 2002, <https://www.oasis-open.org/committees/relax-ng/compact-20021121.html>.

[RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, DOI 10.17487/RFC3688, January 2004, <http://www.rfc-editor.org/info/rfc3688>.

[RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint Engineering Team (JET) Guidelines for Internationalized Domain Names (IDN) Registration and Administration for Chinese, Japanese, and Korean", RFC 3743, DOI 10.17487/RFC3743, April 2004, <http://www.rfc-editor.org/info/rfc3743>.

[RFC4290] Klensin, J., "Suggested Practices for Registration of Internationalized Domain Names (IDN)", RFC 4290, DOI 10.17487/RFC4290, December 2005, <http://www.rfc-editor.org/info/rfc4290>.

[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, DOI 10.17487/RFC5226, May 2008, <http://www.rfc-editor.org/info/rfc5226>.

[RFC5564] El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman, "Linguistic Guidelines for the Use of the Arabic Language in Internet Domains", RFC 5564, DOI 10.17487/RFC5564, February 2010, <http://www.rfc-editor.org/info/rfc5564>.

[RFC5891] Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, DOI 10.17487/RFC5891, August 2010, <http://www.rfc-editor.org/info/rfc5891>.

[RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)", RFC 5892, DOI 10.17487/RFC5892, August 2010, <http://www.rfc-editor.org/info/rfc5892>.

[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type Specifications and Registration Procedures", BCP 13, RFC 6838, DOI 10.17487/RFC6838, January 2013, <http://www.rfc-editor.org/info/rfc6838>.

[RFC7303] Thompson, H. and C. Lilley, "XML Media Types", RFC 7303, DOI 10.17487/RFC7303, July 2014, <http://www.rfc-editor.org/info/rfc7303>.

[TDIL-HINDI] Technology Development for Indian Languages (TDIL) Programme, "Devanagari Script Behaviour for Hindi Ver2.0", <http://tdil-dc.in/index.php?option=com_download&task=showresourceDetails&toolid=1625&lang=en>.

[UAX44] The Unicode Consortium, "Unicode Character Database", June 2016, <http://unicode.org/reports/tr44/>.

[WLE-RULES] Internet Corporation for Assigned Names and Numbers, "Whole Label Evaluation (WLE) Rules", August 2016, <https://community.icann.org/download/attachments/43989034/WLE-Rules.pdf>.

Appendix A. Example Tables

The following presents a minimal LGR table defining the lowercase LDH (letters, digits, hyphen) repertoire and containing no rules or metadata elements. Many simple LGR tables will look quite similar, except that they would contain some metadata.

   <?xml version="1.0" encoding="utf-8"?>
   <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
   <data>
       <char cp="002D" comment="HYPHEN (-)" />
       <range first-cp="0030" last-cp="0039"
         comment="DIGIT ZERO - DIGIT NINE" />
       <range first-cp="0061" last-cp="007A"
         comment="LATIN SMALL LETTER A - LATIN SMALL LETTER Z" />
   </data>
   </lgr>
        
In practice, any LGR that includes the hyphen might also contain rules invalidating any label that begins with a hyphen, ends with a hyphen, or contains consecutive hyphens in the third and fourth positions, as required by [RFC5891]:

   <?xml version="1.0" encoding="utf-8"?>
   <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
   <data>
       <char cp="002D"
             not-when="hyphen-minus-disallowed" />
       <range first-cp="0030" last-cp="0039" />
       <range first-cp="0061" last-cp="007A" />
   </data>
   <rules>
       <rule name="hyphen-minus-disallowed"
             comment="RFC5891 restrictions on U+002D">
         <choice>
           <rule comment="no leading hyphen">
             <look-behind>
               <start />
             </look-behind>
             <anchor />
           </rule>
           <rule comment="no trailing hyphen">
             <anchor />
             <look-ahead>
               <end />
             </look-ahead>
           </rule>
           <rule comment="no consecutive hyphens
                   in third and fourth positions">
             <look-behind>
               <start />
               <any />
               <any />
               <char cp="002D" comment="hyphen-minus" />
             </look-behind>
             <anchor />
           </rule>
         </choice>
       </rule>
   </rules>
   </lgr>
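
To make the effect of the "hyphen-minus-disallowed" rule concrete, the following Python sketch checks the same three positions procedurally. It is only an illustration of the positions the XML rule describes, not an LGR processor. The function names are hypothetical, and the label is assumed to be a plain string of code points.

```python
HYPHEN = "\u002D"


def hyphen_minus_disallowed(label, index):
    """Return True if the hyphen at label[index] sits in a disallowed position.

    Mirrors the three alternatives of the <choice> in the rule above:
    leading hyphen, trailing hyphen, and a hyphen in the fourth position
    that directly follows a hyphen in the third position.
    """
    if label[index] != HYPHEN:
        return False
    if index == 0:                          # <start /> immediately before
        return True
    if index == len(label) - 1:             # <end /> immediately after
        return True
    if index == 3 and label[2] == HYPHEN:   # start, any, any, hyphen, anchor
        return True
    return False


def label_is_valid(label):
    """A label is invalid if any of its hyphens matches the not-when rule."""
    return not any(hyphen_minus_disallowed(label, i)
                   for i in range(len(label)))
```

For instance, `label_is_valid("ab-cd")` holds, while a label such as "ab--cd" is rejected because its fourth code point is a hyphen preceded by a hyphen.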
        

The following sample LGR shows a more complete collection of the elements and attributes defined in this specification in a somewhat typical context.

   <?xml version="1.0" encoding="utf-8"?>

   <!-- This example uses a large subset of the features of this
        specification.  It does not include every set operator,
        match operator element, or action trigger attribute, their
        use being largely parallel to the ones demonstrated. -->

   <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
   <!-- meta element with all optional elements -->
   <meta>
       <version comment="initial version">1</version>
       <date>2010-01-01</date>
       <language>sv</language>
       <scope type="domain">example.com</scope>
       <validity-start>2010-01-01</validity-start>
       <validity-end>2013-12-31</validity-end>
       <description type="text/html">
           <![CDATA[
           This language table was developed with the
           <a href="http://swedish.example/">Swedish
           examples institute</a>.
           ]]>
       </description>
       <unicode-version>6.3.0</unicode-version>
       <references>
         <reference id="0" comment="the most recent" >The
               Unicode Standard 9.0</reference>
         <reference id="1" >RFC 5892</reference>
         <reference id="2" >Big-5: Computer Chinese Glyph
            and Character Code Mapping Table, Technical Report
            C-26, 1984</reference>
       </references>
    </meta>

    <!-- the "data" section describing the repertoire -->
    <data>
       <!-- single code point "char" element -->
       <char cp="002D" ref="1" comment="HYPHEN" />

       <!-- "range" elements for contiguous code points, with tags -->
       <range first-cp="0030" last-cp="0039" ref="1" tag="digit" />
       <range first-cp="0061" last-cp="007A" ref="1" tag="letter" />

       <!-- code point sequence -->
       <char cp="006C 00B7 006C" comment="Catalan middle dot" />

       <!-- alternatively, use a When Rule -->
       <char cp="00B7" when="catalan-middle-dot" />

       <!-- code point with context rule -->
       <char cp="200D" when="joiner" ref="2" />

       <!-- code points with variants -->
       <char cp="4E16" tag="preferred" ref="0">
         <var cp="4E17" type="blocked" ref="2" />
         <var cp="534B" type="allocatable" ref="2" />
       </char>
       <char cp="4E17" ref="0">
         <var cp="4E16" type="allocatable" ref="2" />
         <var cp="534B" type="allocatable" ref="2" />
       </char>
       <char cp="534B" ref="0">
         <var cp="4E16" type="allocatable" ref="2" />
         <var cp="4E17" type="blocked" ref="2" />
       </char>
     </data>

     <!-- Context and whole label rules -->
     <rules>
       <!-- Require the given code point to be between two 006C
            code points -->
       <rule name="catalan-middle-dot" ref="0">
           <look-behind>
               <char cp="006C" />
           </look-behind>
           <anchor />
           <look-ahead>
               <char cp="006C" />
           </look-ahead>
       </rule>

       <!-- example of a context rule based on property -->
       <class name="virama" property="ccc:9" />
       <rule name="joiner"  ref="1" >
           <look-behind>
               <class by-ref="virama" />
           </look-behind>
           <anchor />
       </rule>

       <!-- example of using set operators -->

       <!-- Subtract vowels from letters to get
            consonant, demonstrating the different
            set notations and the difference operator -->
       <difference name="consonants">
           <class comment="all letters">0061-007A</class>
           <class comment="all vowels">
                   0061 0065 0069 006F 0075
           </class>
       </difference>

       <!-- by using the start and end, rule matches whole label -->
       <rule name="three-or-more-consonants">
           <start />
           <!-- reference the class defined by the difference,
                and require three or more matches -->
           <class by-ref="consonants" count="3+" />
           <end />
       </rule>

       <!-- rule for negative matching -->
       <rule name="non-preferred"
             comment="matches any non-preferred code point">
           <complement comment="non-preferred" >
               <class from-tag="preferred" />
           </complement>
       </rule>

      <!-- actions triggered by matching rules and/or
           variant types -->
       <action disp="invalid"
               match="three-or-more-consonants" />
       <action disp="blocked" any-variant="blocked" />
       <action disp="allocatable" all-variants="allocatable"
               not-match="non-preferred" />
     </rules>
   </lgr>
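
Two of the rules in the sample can be paraphrased in a few lines of Python, which may help when reading the XML. The sketch below is an informal restatement and not an LGR processor: the names are hypothetical, labels are assumed to be plain strings, and only the "catalan-middle-dot" context rule and the "three-or-more-consonants" whole-label rule are covered.

```python
# <difference name="consonants">: all letters minus the vowels.
LETTERS = {chr(cp) for cp in range(0x0061, 0x007A + 1)}
VOWELS = {chr(cp) for cp in (0x0061, 0x0065, 0x0069, 0x006F, 0x0075)}
CONSONANTS = LETTERS - VOWELS


def catalan_middle_dot(label, index):
    """Context rule: the code point at `index` must sit between two U+006C."""
    return (0 < index < len(label) - 1
            and label[index - 1] == "\u006C"
            and label[index + 1] == "\u006C")


def three_or_more_consonants(label):
    """Whole-label rule: start, three or more consonants, end."""
    return len(label) >= 3 and all(ch in CONSONANTS for ch in label)
```

In the sample, a label matching `three_or_more_consonants` receives the "invalid" disposition through the first "action" element.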
        

Appendix B. How to Translate Tables Based on RFC 3743 into the XML Format

As background, the rules specified in [RFC3743] work as follows:

1. The original (requested) label is checked to make sure that all the code points are a subset of the repertoire.

2. If it passes the check, the original label is allocatable.

3. Generate the all-simplified and all-traditional variant labels (union of all the labels generated using all the simplified variants of the code points) for allocation.

To illustrate by example, here is one of the more complicated sets of variants:

U+4E7E U+4E81 U+5E72 U+5E79 U+69A6 U+6F27

The following shows the relevant section of the Chinese language table published by the .ASIA registry [ASIA-TABLE]. Its entries read:

    <codepoint>;<simpl-variant(s)>;<trad-variant(s)>;<other-variant(s)>
        
These are the lines corresponding to the set of variants listed above:

   U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
   U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6
   U+5E72;U+5E72;U+5E72,U+4E7E,U+5E79;U+4E7E,U+4E81,U+69A6,U+6F27
   U+5E79;U+5E72;U+5E79;U+69A6,U+4E7E,U+4E81,U+6F27
   U+69A6;U+5E72;U+69A6;U+5E79,U+4E7E,U+4E81,U+6F27
   U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6
        
The corresponding "data" section in the XML format would look like this:

     <data>
       <char cp="4E7E">
       <var cp="4E7E" type="both" comment="identity" />
       <var cp="4E81" type="blocked" />
       <var cp="5E72" type="simp" />
       <var cp="5E79" type="blocked" />
       <var cp="69A6" type="blocked" />
       <var cp="6F27" type="blocked" />
       </char>
       <char cp="4E81">
       <var cp="4E7E" type="trad" />
       <var cp="5E72" type="simp" />
       <var cp="5E79" type="blocked" />
       <var cp="69A6" type="blocked" />
       <var cp="6F27" type="blocked" />
       </char>
       <char cp="5E72">
       <var cp="4E7E" type="trad"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="both" comment="identity"/>
       <var cp="5E79" type="trad"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="blocked"/>
       </char>
       <char cp="5E79">
       <var cp="4E7E" type="blocked"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="simp"/>
       <var cp="5E79" type="trad" comment="identity"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="blocked"/>
       </char>
       <char cp="69A6">
       <var cp="4E7E" type="blocked"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="simp"/>
       <var cp="5E79" type="blocked"/>
       <var cp="69A6" type="trad" comment="identity"/>
       <var cp="6F27" type="blocked"/>
       </char>
       <char cp="6F27">
       <var cp="4E7E" type="simp"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="blocked"/>
       <var cp="5E79" type="blocked"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="trad" comment="identity"/>
       </char>
     </data>
        

Here, the simplified variants have been given a type of "simp" and the traditional variants one of "trad", and all other ones are given "blocked".

Because some variant mappings show in more than one column, while the XML format allows only a single type value, they have been given the type of "both".
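
The column-to-type assignment described above can be sketched in Python as follows. This is a minimal illustration that assumes the semicolon-separated line format shown earlier. The function names are hypothetical, and the precedence used here (a code point listed as simplified or traditional keeps that type even if it also appears among the other variants) is inferred from the example data rather than stated as a rule.

```python
def parse_table_line(line):
    """Split '<codepoint>;<simpl>;<trad>;<other>' into its four fields."""
    def code_points(field):
        return [cp.strip().replace("U+", "")
                for cp in field.split(",") if cp.strip()]

    source, simp, trad, other = line.strip().split(";")
    return (source.strip().replace("U+", ""),
            code_points(simp), code_points(trad), code_points(other))


def variant_types(line):
    """Return (source, {target: type}) for one table line."""
    source, simp, trad, other = parse_table_line(line)
    types = {}
    for cp in sorted(set(simp) | set(trad) | set(other)):
        if cp in simp and cp in trad:
            types[cp] = "both"
        elif cp in simp:
            types[cp] = "simp"
        elif cp in trad:
            types[cp] = "trad"
        else:
            types[cp] = "blocked"
    return source, types
```

Applied to the first table line above, `variant_types` yields the same six types as the "var" elements listed under code point 4E7E.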

Note that some variant mappings map to themselves (identity); that is, the mapping is reflexive (see Section 5.3.4). In creating the permutation of all variant labels, these mappings have no effect, other than adding a value to the variant type list for the variant label containing them.

In the example so far, all of the entries with type="both" are also mappings where source and target are identical. That is, they are reflexive mappings as defined in Section 5.3.4.

Given a label "U+4E7E U+4E81", the following labels would be ruled allocatable per [RFC3743], based on how that standard is commonly implemented in domain registries:

       Original label:     U+4E7E U+4E81
       Simplified label 1: U+4E7E U+5E72
       Simplified label 2: U+5E72 U+5E72
       Traditional label:  U+4E7E U+4E7E
        
However, if allocatable labels were generated simply by a straight permutation of all variants with a type other than "blocked", without regard to which variants are simplified and which are traditional, we would end up with an extra allocatable label of "U+5E72 U+4E7E". This label is composed of both a Simplified Chinese code point and a Traditional Chinese code point and therefore shouldn't be allocatable.

Resolving the dispositions more fully requires defining several actions, as described in Section 7.2.2, that override the default actions from Section 7.6. After blocking all labels that contain a variant with type "blocked", these actions set labels to "allocatable" based on the following variant types: "simp", "trad", and "both". Note that these variant types do not directly relate to dispositions for the variant label; rather, the actions resolve them to the Standard Dispositions on labels, i.e., "blocked" and "allocatable".

To resolve label dispositions requires five actions to be defined (in the "rules" section of the XML document in question); these actions apply in order, and the first one triggered defines the disposition for the label. The actions are as follows:

1. Block all variant labels containing at least one blocked variant.

2. Allocate all labels that consist entirely of variants that are "simp" or "both".

3. Also allocate all labels that are entirely "trad" or "both".

4. Block all surviving labels containing any one of the dispositions "simp" or "trad" or "both", because they are now known to be part of an undesirable mixed simplified/traditional label.

5. Allocate any remaining label; the original label would be such a label.

The rules declarations would be represented as:

     <rules>
       <!--"action" elements - order defines precedence-->
       <action disp="blocked" any-variant="blocked" />
       <action disp="allocatable" only-variants="simp both" />
       <action disp="allocatable" only-variants="trad both" />
       <action disp="blocked" any-variant="simp trad" />
       <action disp="allocatable" comment="catch-all" />
     </rules>
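
The ordered evaluation of these five actions can be traced with a short Python sketch for the example label "U+4E7E U+4E81". It is a simplified model, not an LGR processor. It assumes a reading of Section 7.2.1 in which "any-variant" triggers when any listed type occurs among the label's variant types, and "only-variants" additionally requires every code point to be the target of an applied variant mapping (reflexive mappings included). All names below are hypothetical.

```python
from itertools import product

# Variant mappings for the two code points of the example label,
# transcribed from the "data" section above: source -> {target: type}.
VARIANTS = {
    "4E7E": {"4E7E": "both", "4E81": "blocked", "5E72": "simp",
             "5E79": "blocked", "69A6": "blocked", "6F27": "blocked"},
    "4E81": {"4E7E": "trad", "5E72": "simp", "5E79": "blocked",
             "69A6": "blocked", "6F27": "blocked"},
}

# The five actions, in order: (disposition, trigger attribute, types).
ACTIONS = [
    ("blocked", "any-variant", {"blocked"}),
    ("allocatable", "only-variants", {"simp", "both"}),
    ("allocatable", "only-variants", {"trad", "both"}),
    ("blocked", "any-variant", {"simp", "trad"}),
    ("allocatable", None, set()),  # catch-all
]


def variant_labels(label):
    """Yield (variant label, per-position types); None marks an original
    code point to which no variant mapping was applied."""
    per_position = []
    for cp in label:
        mappings = VARIANTS.get(cp, {})
        options = list(mappings.items())
        if cp not in mappings:  # no reflexive mapping: keep the original
            options.append((cp, None))
        per_position.append(options)
    for combo in product(*per_position):
        yield tuple(cp for cp, _ in combo), [t for _, t in combo]


def disposition(types):
    """Return the disposition set by the first action that triggers."""
    type_set = {t for t in types if t is not None}
    all_mapped = None not in types
    for disp, trigger, wanted in ACTIONS:
        if trigger is None:
            return disp
        if trigger == "any-variant" and type_set & wanted:
            return disp
        if trigger == "only-variants" and all_mapped and type_set <= wanted:
            return disp
    return None


def allocatable_labels(label):
    return sorted(v for v, types in variant_labels(label)
                  if disposition(types) == "allocatable")
```

Under these assumptions, `allocatable_labels(("4E7E", "4E81"))` returns the original label, the two simplified labels, and the traditional label listed earlier, while the mixed label "U+5E72 U+4E7E" is blocked by the fourth action.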
        
Up to now, variants with type "both" have occurred only in association with reflexive variant mappings. The "action" elements defined above rely on the assumption that this is always the case. However, consider the following set of variants:

       U+62E0;U+636E;U+636E;U+64DA
       U+636E;U+636E;U+64DA;U+62E0
       U+64DA;U+636E;U+64DA;U+62E0
        
The corresponding XML would be:

       <char cp="62E0">
       <var cp="636E" type="both" comment="both, but not reflexive" />
       <var cp="64DA" type="blocked" />
       </char>
       <char cp="636E">
       <var cp="636E" type="simp" comment="reflexive, but not both" />
       <var cp="64DA" type="trad" />
       <var cp="62E0" type="blocked" />
       </char>
       <char cp="64DA">
       <var cp="636E" type="simp" />
       <var cp="64DA" type="trad" comment="reflexive" />
       <var cp="62E0" type="blocked" />
       </char>
        
To make such variant sets work requires a way to selectively trigger an action based on whether a variant type is associated with an identity or reflexive mapping, or is associated with an ordinary variant mapping. This can be done by adding a prefix "r-" to the "type" attribute on reflexive variant mappings. For example, the "trad" for code point U+64DA in the preceding figure would become "r-trad".
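
The prefixing step can be sketched as a small Python helper. It assumes the variant types for one source code point are held in a dictionary keyed by target code point; the function name and data layout are hypothetical.

```python
def mark_reflexive(source, var_types):
    """Prefix "r-" to the type of the mapping whose target equals the source.

    `var_types` maps each target code point to its variant type; all
    ordinary (non-reflexive) mappings are returned unchanged.
    """
    return {target: ("r-" + vtype if target == source else vtype)
            for target, vtype in var_types.items()}
```

For code point 64DA in the preceding figure, this turns the reflexive "trad" mapping into "r-trad" and leaves "simp" and "blocked" as they are.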

With the dispositions prepared in this way, only a slight modification to the actions is needed to yield the correct set of allocatable labels:

   <action disp="blocked" any-variant="blocked" />
   <action disp="allocatable" only-variants="simp r-simp both r-both" />
   <action disp="allocatable" only-variants="trad r-trad both r-both" />
   <action disp="blocked" all-variants="simp trad both" />
   <action disp="allocatable" />
        

The first three actions get triggered by the same labels as before.

The fourth action blocks any label that combines an original code point with any mix of ordinary variant mappings; however, no labels that are a combination of only original code points (code points having either no variant mappings or a reflexive mapping) would be affected. These are the original labels, and they are allocated in the last action.
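
The way these five actions interact can be traced with a small Python sketch. It is a simplified model and not the normative evaluation procedure: a label is reduced to one variant type per code point (None for an original code point with no mapping), and it is assumed that "all-variants" and "only-variants" do not trigger when no variant type is present. The names ACTIONS and disposition are hypothetical.

```python
# The five actions from the figure above, in order of precedence.
ACTIONS = [
    ("blocked", "any-variant", {"blocked"}),
    ("allocatable", "only-variants", {"simp", "r-simp", "both", "r-both"}),
    ("allocatable", "only-variants", {"trad", "r-trad", "both", "r-both"}),
    ("blocked", "all-variants", {"simp", "trad", "both"}),
    ("allocatable", None, None),  # catch-all action without a trigger
]


def disposition(types_per_cp):
    """Return the disposition of the first action that triggers.

    types_per_cp holds one entry per code point of the label: the variant
    type used for that code point, or None for an original code point
    that has no variant mapping.
    """
    present = {t for t in types_per_cp if t is not None}
    only_variants = all(t is not None for t in types_per_cp)
    for disp, trigger, listed in ACTIONS:
        if trigger is None:
            return disp
        if trigger == "any-variant" and present & listed:
            return disp
        if trigger == "all-variants" and present and present <= listed:
            return disp
        if (trigger == "only-variants" and only_variants
                and present and present <= listed):
            return disp
    return None
```

For example, disposition(["simp", None]) yields "blocked" through the fourth action, because an ordinary variant is combined with an original code point, while disposition(["r-trad", None]) falls through to the last action and yields "allocatable".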

Using this scheme of assigning types to ordinary and reflexive variants, all tables in the style of RFC 3743 can be converted to XML. By defining a set of actions as outlined above, the LGR will yield the correct set of allocatable variants: all variants consisting completely of variant code points preferred for simplified or traditional, respectively, will be allocated, as will be the original label. All other variant labels will be blocked.
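
As an illustration of such a conversion, the following Python sketch derives the variant types for one table row. It assumes rows shaped like the three shown earlier in this appendix (code point; preferred simplified; preferred traditional; other variants), with optional comma-separated values in a field. The function name convert_row is hypothetical, and rows that list the source code point only in the last field are not handled.

```python
def convert_row(row):
    """Convert a 'cp;simp;trad;variants' row into (variant cp, type) pairs."""
    fields = [
        [value.strip().replace("U+", "") for value in field.split(",")
         if value.strip()]
        for field in row.split(";")
    ]
    (source,), simp, trad, others = fields
    result = []
    # dict.fromkeys removes repeated targets while keeping first-seen order.
    for target in dict.fromkeys(simp + trad + others):
        if target in simp and target in trad:
            vtype = "both"
        elif target in simp:
            vtype = "simp"
        elif target in trad:
            vtype = "trad"
        else:
            vtype = "blocked"
        if target == source:
            vtype = "r-" + vtype  # reflexive mapping
        result.append((target, vtype))
    return result
```

Applied to the row for U+64DA, this gives "simp" for U+636E, "r-trad" for U+64DA itself, and "blocked" for U+62E0, which matches the XML shown earlier once the "r-" prefix is added.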

Appendix C. Indic Syllable Structure Example

In LGRs for Indic scripts, it may be desirable to restrict valid labels to sequences of valid Indic syllables, or aksharas. This appendix gives a sample set of rules designed to enforce this restriction.

Below is an example of BNF for an akshara, which has been published in "Devanagari Script Behaviour for Hindi" [TDIL-HINDI]. The rules for other languages and scripts used in India are expected to be generally similar.

For Hindi, the BNF has the form:

       V[m]|{C[N]H}C[N](H|[v][m])
        

Where:

V (uppercase) is any independent vowel

m is any vowel modifier (Devanagari Anusvara, Visarga, and Candrabindu)

C is any consonant (with inherent vowel)

N is Nukta

H is a halant (or virama)

v (lowercase) is any dependent vowel sign (matra)

{} encloses items that may be repeated one or more times

[ ] encloses items that may or may not be present

| separates items, out of which only one can be present
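
To make the notation concrete, the BNF can be transcribed into a regular expression over the class letters themselves. This Python sketch is an illustration only: each character of the input stands for one class (V, m, C, N, H, v), not for an actual code point, and the repeated group is treated as "zero or more" so that a lone consonant is accepted, as the count="0+" in the XML rules of this appendix does. The names AKSHARA and is_akshara are hypothetical.

```python
import re

# V[m] | {C[N]H} C[N] (H | [v][m]), written over the class letters.
AKSHARA = re.compile(r"Vm?|(?:CN?H)*CN?(?:H|v?m?)")


def is_akshara(symbols):
    """Return True if the string of class letters forms exactly one akshara."""
    return AKSHARA.fullmatch(symbols) is not None
```

For instance, "CHC" (consonant, halant, consonant) and "Vm" are accepted, whereas "v" on its own or "Vv" are rejected.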

By using the Unicode character property "InSC" or "Indic_Syllabic_Category", which corresponds rather directly to the classification of characters in the BNF above, we can translate the BNF into a set of WLE rules matching the definition of an akshara.

     <rules>
       <!--Character class definitions go here-->
       <class name="halant" property="InSC:Virama" />
       <union name="vowel-modifier">
         <class property="InSC:Visarga" />
         <class property="InSC:Bindu" comment="includes anusvara" />
       </union>
       <!--Whole label evaluation and context rules go here-->
       <rule name="consonant-with-optional-nukta">
         <class property="InSC:Consonant" />
         <class property="InSC:Nukta" count="0:1" />
       </rule>
       <rule name="independent-vowel-with-optional-modifier">
         <class property="InSC:Vowel_Independent" />
         <class by-ref="vowel-modifier" count="0:1" />
       </rule>
       <rule name="optional-dependent-vowel-with-opt-modifier" >
         <class property="InSC:Vowel_Dependent" count="0:1" />
         <class by-ref="vowel-modifier" count="0:1" />
       </rule>
       <rule name="consonant-cluster">
         <rule count="0+">
           <rule by-ref="consonant-with-optional-nukta" />
           <class by-ref="halant" />
         </rule>
         <rule by-ref="consonant-with-optional-nukta" />
         <choice>
           <class by-ref="halant" />
           <rule by-ref="optional-dependent-vowel-with-opt-modifier" />
         </choice>
       </rule>
       <rule name="akshara">
         <choice>
           <rule by-ref="independent-vowel-with-optional-modifier" />
           <rule by-ref="consonant-cluster" />
         </choice>
       </rule>
        
       <rule name="WLE-akshara-or-other" comment="series of one or
           more aksharas, possibly alternating with other types of
           code points such as digits">
         <start />
         <choice count="1+">
           <class property="InSC:other" />
           <rule by-ref="akshara" />
         </choice>
         <end />
       </rule>
       <!--"action" elements go here - order defines precedence-->
       <action disp="invalid" not-match="WLE-akshara-or-other" />
     </rules>

With the rules and classes as defined above, the final action assigns a disposition of "invalid" to all labels that are not composed of a sequence of well-formed aksharas, optionally interspersed with other characters, such as digits.

The relevant Unicode character property could be replicated by tagging repertoire values directly in the LGR; this would remove the dependency on any specific version of the Unicode Standard.
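
A rough Python analogue of this idea is a hand-written table that tags a few code points with the class letters of the BNF, followed by a whole-label check in the spirit of "WLE-akshara-or-other". The table TAGS is a small hypothetical excerpt and not a complete Devanagari repertoire; treating ASCII digits as the "other" class (O) is an assumption made for the example.

```python
import re

# Hypothetical excerpt of a tagged repertoire: code point -> class letter.
TAGS = {
    0x0905: "V",  # DEVANAGARI LETTER A
    0x0915: "C",  # DEVANAGARI LETTER KA
    0x0937: "C",  # DEVANAGARI LETTER SSA
    0x093C: "N",  # DEVANAGARI SIGN NUKTA
    0x094D: "H",  # DEVANAGARI SIGN VIRAMA
    0x093F: "v",  # DEVANAGARI VOWEL SIGN I
    0x0902: "m",  # DEVANAGARI SIGN ANUSVARA
    0x0903: "m",  # DEVANAGARI SIGN VISARGA
}
for digit in range(0x30, 0x3A):
    TAGS[digit] = "O"  # ASCII digits stand in for "other" code points

# One or more aksharas or "other" code points, covering the whole label.
LABEL = re.compile(r"(?:O|Vm?|(?:CN?H)*CN?(?:H|v?m?))+")


def is_valid_label(label):
    """Check a label against the akshara structure using the tag table."""
    try:
        symbols = "".join(TAGS[ord(ch)] for ch in label)
    except KeyError:
        return False  # code point outside the tagged repertoire
    return LABEL.fullmatch(symbols) is not None
```

Because the classes come from the table and not from a Unicode data file, the result does not change when the property values are revised in a later version of the Unicode Standard.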

Generally, dependent vowels may only follow consonant expressions; however, for some scripts, like Bengali, the Unicode Standard supports sequences of dependent vowels or their application on independent vowels. This makes the definition of akshara less restrictive.

C.1. Reducing Complexity

As presented in this example, the rules are rather complex -- although useful in demonstrating the features of the XML format, such complexity would be an undesirable feature in an actual LGR.

It is possible to reduce the complexity of the rules in this example by defining alternate rules that simply define the permissible pair-wise context of adjacent code points by character class, such as a rule that a halant can only follow a (nuktated) consonant. Such pair-wise contexts are easier to understand, implement, and verify, and have the additional benefit of allowing tools to better pinpoint why a label failed to validate. They also tend to correspond more directly to the kind of well-formedness requirements that are most relevant to DNS security, like the requirement to limit the application of a combining mark (such as a vowel modifier) to only selected base characters (in this case, vowels). (See the example and discussion in [WLE-RULES].)
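
The pair-wise approach can be sketched in Python over the class letters of the BNF in this appendix. The table ALLOWED_BEFORE is an assumption derived from that BNF for illustration (for example, a halant may only follow a consonant or a nukta); it is not taken from [WLE-RULES], and the function name first_violation is hypothetical.

```python
# For each restricted class: the classes allowed immediately before it.
# Classes not listed (V, C) may appear in any context in this sketch.
ALLOWED_BEFORE = {
    "N": {"C"},
    "H": {"C", "N"},
    "v": {"C", "N"},
    "m": {"V", "C", "N", "v"},
}


def first_violation(symbols):
    """Return (index, reason) for the first bad pair, or None if all pass."""
    previous = None  # None marks the start of the label
    for index, current in enumerate(symbols):
        allowed = ALLOWED_BEFORE.get(current)
        if allowed is not None and previous not in allowed:
            return index, f"{current!r} may not follow {previous!r}"
        previous = current
    return None
```

Since every check involves only two adjacent positions, a failure such as first_violation("VH") points directly at the offending position and the rule that was violated.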

Appendix D. RELAX NG Compact Schema

This schema is provided in RELAX NG Compact format [RELAX-NG].
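
Several of the simple types in the schema are defined by regular expression patterns. As an informal aid to reading them, the following Python sketch copies four of those patterns and tests sample values against them. It assumes that Python's re.fullmatch behaves like the implicitly anchored XSD pattern facet for these particular expressions, and it ignores the whitespace normalization of xsd:token. The names PATTERNS and conforms are hypothetical.

```python
import re

# Patterns copied from the corresponding simple types in the schema.
PATTERNS = {
    "code-point": r"[0-9A-F]{4,6}",
    "code-point-sequence": r"[0-9A-F]{4,6}( [0-9A-F]{4,6})+",
    "count-pattern": r"\d+(\+|:\d+)?",
    "date-pattern": r"\d{4}-\d\d-\d\d",
}


def conforms(type_name, value):
    """Return True if the whole value matches the pattern of the named type."""
    return re.fullmatch(PATTERNS[type_name], value) is not None
```

For example, "62E0" conforms to "code-point" while the lowercase "62e0" does not, and "0:1" and "1+" both conform to "count-pattern".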

<CODE BEGINS>
   #
   # LGR XML Schema 1.0
   #

   default namespace = "urn:ietf:params:xml:ns:lgr-1.0"
        

   #
   # SIMPLE TYPES
   #

   # RFC 5646 language tag (e.g., "de", "und-Latn")
   language-tag = xsd:token

   # The scope to which the LGR applies.  For the "domain" scope type,
   # it should be a fully qualified domain name.
   scope-value = xsd:token {
       minLength = "1"
   }
        
   ## a single code point
   code-point = xsd:token {
       pattern = "[0-9A-F]{4,6}"
   }
        
   ## a space-separated sequence of code points
   code-point-sequence = xsd:token {
       pattern = "[0-9A-F]{4,6}( [0-9A-F]{4,6})+"
   }
        

   ## single code point, or a sequence of code points, or empty string
   code-point-literal = code-point | code-point-sequence | ""

   ## code point or sequence only
   non-empty-code-point-literal = code-point | code-point-sequence

   ## code point set represented in short form
   code-point-set-shorthand = xsd:token {
       pattern = "([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
                 ~ "( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*"
   }
        
   ## dates are used in information fields in the meta
   ## section ("YYYY-MM-DD")
   date-pattern = xsd:token {
       pattern = "\d{4}-\d\d-\d\d"
   }
        
   ## variant type
   ## the variant type MUST be non-empty and MUST NOT
   ## start with a "_"; using xsd:NMTOKEN here because
   ## we need space-separated lists of them
   variant-type = xsd:NMTOKEN
        
   ## variant type list for action triggers
   ## the list MUST NOT be empty, and entries MUST NOT
   ## start with a "_"
   variant-type-list = xsd:NMTOKENS
        
   ## reference to a rule name (used in "when" and "not-when"
   ## attributes, as well as the "by-ref" attribute of the "rule"
   ## element).
   rule-ref = xsd:IDREF
        
   ## a space-separated list of tags.  Tags should generally follow
   ## xsd:Name syntax.  However, we are using the xsd:NMTOKENS here
   ## because there is no native XSD datatype for space-separated
   ## xsd:Name
   tags = xsd:NMTOKENS
        
   ## The value space of a "from-tag" attribute.  Although it is closer
   ## to xsd:IDREF lexically and semantically, tags are not unique in
   ## the document.  As such, we are unable to take advantage of
   ## facilities provided by a validator.  xsd:NMTOKEN is used instead
   ## of the stricter xsd:Names here so as to be consistent with
   ## the above.
   tag-ref = xsd:NMTOKEN
        
   ## an identifier type (used by "name" attributes).
   identifier = xsd:ID
        
   ## used in the class "by-ref" attribute to reference another class of
   ## the same "name" attribute value.
   class-ref = xsd:IDREF
        
   ## "count" attribute pattern ("n", "n+", or "n:m")
   count-pattern = xsd:token {
       pattern = "\d+(\+|:\d+)?"
   }
        
   ## "ref" attribute pattern
   ## space-separated list of "id" attribute values for
   ## "reference" elements.  These reference ids
   ## must be declared in a "reference" element
   ## before they can be used in a "ref" attribute
   ref-pattern = xsd:token {
       pattern = "[\-_.:0-9A-Z]+( [\-_.:0-9A-Z]+)*"
   }
        

   #
   # STRUCTURES
   #

   ## Representation of a single code point or a sequence of code
   ## points
   char = element char {
       attribute cp { code-point-literal },
       attribute comment { text }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute tag { tags }?,
       attribute ref { ref-pattern }?,
         variant*
   }
        
   ## Representation of a range of code points
   range = element range {
       attribute first-cp { code-point },
       attribute last-cp { code-point },
       attribute comment { text }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute tag { tags }?,
       attribute ref { ref-pattern }?
   }
        
   ## Representation of a variant code point or sequence
   variant = element var {
       attribute cp { code-point-literal },
       attribute type { xsd:NMTOKEN }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?
   }
        

   #
   # Classes
   #

   ## a "class" element that references the name of another "class"
   ## (or set-operator like "union") defined elsewhere.
   ## If used as a matcher (appearing under a "rule" element),
   ## the "count" attribute may be present.
   class-invocation = element class { class-invocation-content }
        
   class-invocation-content =
       attribute by-ref { class-ref },
       attribute count { count-pattern }?,
       attribute comment { text }?
        
   ## defines a new class (set of code points) using Unicode property
   ## or code points of the same tag value or code point literals
   class-declaration = element class { class-declaration-content }
        
   class-declaration-content =
       # "name" attribute MUST be present if this is a "top-level"
       # class declaration, i.e., appearing directly under the "rules"
       # element.  Otherwise, it MUST be absent.
       attribute name { identifier }?,
       # If used as a matcher (appearing in a "rule" element, but not
       # when nested inside a set-operator or class), the "count"
       # attribute may be present.  Otherwise, it MUST be absent.
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       (
         # define the class by property (e.g., property="sc:Latn"), OR
         attribute property { xsd:NMTOKEN }
         # define the class by tagged code points, OR
         | attribute from-tag { tag-ref }
         # text node to allow for shorthand notation
         # e.g., "0061 0062-0063"
         | code-point-set-shorthand
       )
        
   class-invocation-or-declaration = element class {
     class-invocation-content | class-declaration-content
   }
        

   class-or-set-operator-nested =
     class-invocation-or-declaration | set-operator

   class-or-set-operator-declaration =
     # a "class" element or set-operator (effectively defining a class)
     # directly in the "rules" element.
     class-declaration | set-operator

   #
   # set-operators
   #

   complement-operator = element complement {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested
   }
        
   union-operator = element union {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       # needs two or more child elements
       class-or-set-operator-nested+
   }
        
   intersection-operator = element intersection {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }
        
   difference-operator = element difference {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }
        
   symmetric-difference-operator = element symmetric-difference {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }
        
   ## operators that transform class(es) into a new class.
   set-operator = complement-operator
                  | union-operator
                  | intersection-operator
                  | difference-operator
                  | symmetric-difference-operator
        

   #
   # Match operators (matchers)
   #

   any-matcher = element any {
       attribute count { count-pattern }?,
       attribute comment { text }?
   }
        
   choice-matcher = element choice {
       ## "count" attribute MUST only be used when the choice-matcher
       ## contains no nested "start", "end", "anchor", "look-behind",
       ## or "look-ahead" operators and no nested rule-matchers
       ## containing any of these elements
       attribute count { count-pattern }?,
       attribute comment { text }?,
       # two or more match operators
       match-operator-choice,
       match-operator-choice+
   }
        
   char-matcher =
     # for use as a matcher - like "char" but without a "tag" attribute
     element char {
       attribute cp { non-empty-code-point-literal },
       # If used as a matcher (appearing in a "rule" element), the
       # "count" attribute may be present.  Otherwise, it MUST be
       # absent.
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?
   }
        
   start-matcher = element start {
       attribute comment { text }?
   }
        
   end-matcher = element end {
       attribute comment { text }?
   }
        
   anchor-matcher = element anchor {
       attribute comment { text }?
   }
        
   look-ahead-matcher = element look-ahead {
       attribute comment { text }?,
       match-operators-non-pos
   }
   look-behind-matcher = element look-behind {
       attribute comment { text }?,
       match-operators-non-pos
   }
        
   ## non-positional match operator that can be used as a direct child
   ## element of the choice-matcher.
   match-operator-choice = (
     any-matcher | choice-matcher | start-matcher | end-matcher
     | char-matcher | class-or-set-operator-nested | rule-matcher
   )
        
   ## non-positional match operators do not contain any "anchor",
   ## "look-behind", or "look-ahead" elements.
   match-operators-non-pos = (
     start-matcher?,
     (any-matcher | choice-matcher | char-matcher
      | class-or-set-operator-nested | rule-matcher)*,
     end-matcher?
   )
        
   ## positional match operators have an "anchor" element, which may be
   ## preceded by a "look-behind" element, or followed by a "look-ahead"
   ## element, or both.
   match-operators-pos =
     look-behind-matcher?, anchor-matcher, look-ahead-matcher?
        

   match-operators = match-operators-non-pos | match-operators-pos

   #
   # Rules
   #

   # top-level rule must have "name" attribute
   rule-declaration-top = element rule {
       attribute name { identifier },
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       match-operators
   }
        
   ## "rule" element used as a matcher (either "by-ref" or contains
   ## other match operators itself)
   rule-matcher =
     element rule {
       ## "count" attribute MUST only be used when the rule-matcher
       ## contains no nested "start", "end", "anchor", "look-behind",
       ## or "look-ahead" operators and no nested rule-matchers
       ## containing any of these elements
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       (attribute by-ref { rule-ref } | match-operators)
     }
        

   #
   # Actions
   #

   action-declaration = element action {
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # dispositions are often named after variant types or vice versa
       attribute disp { variant-type },
       ( attribute match { rule-ref }
         | attribute not-match { rule-ref } )?,
       ( attribute any-variant { variant-type-list }
         | attribute all-variants { variant-type-list }
         | attribute only-variants { variant-type-list } )?
   }
        

   #
   # DOCUMENT STRUCTURE
   #

   start = lgr
   lgr = element lgr {
       meta-section?,
       data-section,
       rules-section?
   }
        
   ## Meta section - information recorded with an LGR that generally
   ## does not affect machine processing (except for "unicode-version").
   ## However, if any "class-declaration" uses the "property" attribute,
   ## a "unicode-version" element MUST be present.
   meta-section = element meta {
       element version {
           attribute comment { text }?,
           text
       }?
       & element date { date-pattern }?
       & element language { language-tag }*
       & element scope {
           # type may be "domain" or an application-defined value
           attribute type { xsd:NCName },
           scope-value
       }*
       & element validity-start { date-pattern }?
       & element validity-end { date-pattern }?
       & element unicode-version {
           xsd:token {
               pattern = "\d+\.\d+\.\d+"
           }
       }?
       & element description {
           # this SHOULD be a valid MIME type
           attribute type { text }?,
           text
       }?
       & element references {
           element reference {
               attribute id {
                   xsd:token {
                       # limit "id" attribute to uppercase letters,
                       # digits, and a few punctuation marks; use of
                       # integers is RECOMMENDED
                       pattern = "[\-_.:0-9A-Z]*"
                       minLength = "1"
                   }
                },
                attribute comment { text }?,
                text
           }*
       }?
   }
        
   data-section = element data { (char | range)+ }
        
   ## Note that action declarations are strictly order dependent.
   ## class-or-set-operator-declaration and rule-declaration-top
   ## are weakly order dependent; they must precede first use of the
   ## identifier via "by-ref".
   rules-section = element rules {
     ( class-or-set-operator-declaration
       | rule-declaration-top
       | action-declaration)*
   }
        

<CODE ENDS>
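
As an informal illustration only, the following Python sketch checks a few of the constraints expressed by the schema above: the child order of the "lgr" element, the "unicode-version" pattern, the "id" pattern for references, and the mutually exclusive attributes of "action". It is not a substitute for validation against the schema. The sample document, the helper names "q" and "check_structure", and the choice of Python's standard library are assumptions made for this sketch; the "char" and "start" elements used in the sample are defined in parts of the schema not shown here.

```python
import re
import xml.etree.ElementTree as ET

# Namespace assumed from the schema's "default namespace" declaration.
NS = "urn:ietf:params:xml:ns:lgr-1.0"

# Hypothetical minimal LGR used only for this sketch.
SAMPLE = """\
<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <meta>
    <version>1</version>
    <unicode-version>6.3.0</unicode-version>
    <references>
      <reference id="0">Example reference</reference>
    </references>
  </meta>
  <data>
    <char cp="0061"/>
  </data>
  <rules>
    <rule name="leading-a"><start/><char cp="0061"/></rule>
    <action disp="invalid" match="leading-a"/>
    <action disp="valid"/>
  </rules>
</lgr>
"""


def q(name):
    """Return the namespace-qualified tag name used by ElementTree."""
    return "{%s}%s" % (NS, name)


def check_structure(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != q("lgr"):
        problems.append("root element is not 'lgr'")
    # lgr = meta-section?, data-section, rules-section?
    children = " ".join(child.tag.split("}", 1)[-1] for child in root)
    if not re.fullmatch(r"(meta )?data( rules)?", children):
        problems.append("unexpected children of 'lgr': " + children)
    # unicode-version must consist of three dot-separated numbers
    uv = root.find(q("meta") + "/" + q("unicode-version"))
    if uv is not None:
        if not re.fullmatch(r"\d+\.\d+\.\d+", (uv.text or "").strip()):
            problems.append("bad unicode-version: %r" % uv.text)
    # reference ids: at least one character from the restricted set
    for ref in root.iter(q("reference")):
        if not re.fullmatch(r"[\-_.:0-9A-Z]+", ref.get("id", "")):
            problems.append("bad reference id: %r" % ref.get("id"))
    # action: "disp" is required; the alternatives are mutually exclusive
    for action in root.iter(q("action")):
        attrs = action.attrib
        if "disp" not in attrs:
            problems.append("action without 'disp'")
        if "match" in attrs and "not-match" in attrs:
            problems.append("action has both 'match' and 'not-match'")
        triggers = [a for a in ("any-variant", "all-variants", "only-variants")
                    if a in attrs]
        if len(triggers) > 1:
            problems.append("action has several variant triggers")
    return problems
```

Calling check_structure(SAMPLE) returns an empty list for the sample above. Changing the reference "id" to lowercase letters, or shortening the "unicode-version" value to two components, adds one entry per violation.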

Acknowledgements

This format builds upon the work on documenting IDN tables by many different registry operators. Notably, a comprehensive language table for Chinese, Japanese, and Korean was developed by the "Joint Engineering Team" [RFC3743]; this table is the basis of many registry policies. Also, a set of guidelines for Arabic script registrations [RFC5564] was published by the Arabic-language community.

Contributions that have shaped this document have been provided by Francisco Arias, Julien Bernard, Mark Davis, Martin Duerst, Paul Hoffman, Sarmad Hussain, Barry Leiba, Alexander Mayrhofer, Alexey Melnikov, Nicholas Ostler, Thomas Roessler, Audric Schiltknecht, Steve Sheng, Michel Suignard, Andrew Sullivan, Wil Tan, and John Yunker.

Authors' Addresses

   Kim Davies
   Internet Corporation for Assigned Names and Numbers
   12025 Waterfront Drive
   Los Angeles, CA  90094
   United States of America

   Phone: +1 310 301 5800
   Email: kim.davies@icann.org
   URI:   http://www.icann.org/
        

   Asmus Freytag
   ASMUS, Inc.

   Email: asmus@unicode.org
        