Internet Engineering Task Force (IETF)                          M. Davis
Request for Comments: 6497                                        Google
Category: Informational                                      A. Phillips
ISSN: 2070-1721                                                   Lab126
                                                               Y. Umaoka
                                                                     IBM
                                                                 C. Falk
                                                       Infinite Automata
                                                           February 2012

BCP 47 Extension T - Transformed Content
Abstract
This document specifies an Extension to BCP 47 that provides subtags for specifying the source language or script of transformed content, including content that has been transliterated, transcribed, or translated, or in some other way influenced by the source. It also provides for additional information used for identification.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for informational purposes.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6497.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction
   1.1. Requirements Language
2. BCP 47 Required Information
   2.1. Overview
   2.2. Structure
   2.3. Canonicalization
   2.4. BCP 47 Registration Form
   2.5. Field Definitions
   2.6. Registration of Field Subtags
   2.7. Registration of Additional Fields
   2.8. Committee Responses to Registration Proposals
   2.9. Machine-Readable Data
3. Acknowledgements
4. IANA Considerations
5. Security Considerations
6. References
   6.1. Normative References
   6.2. Informative References
[BCP47] permits the definition and registration of language tag extensions "that contain a language component and are compatible with applications that understand language tags". This document defines an extension for specifying the source of content that has been transformed, including text that has been transliterated, transcribed, or translated, or in some other way influenced by the source. It may be used in queries to request content that has been transformed. The "singleton" identifier for this extension is 't'.
Language tags, as defined by [BCP47], are useful for identifying the language of content. There are mechanisms for specifying variant subtags for special purposes. However, these variants are insufficient for specifying content that has undergone transformations, including content that has been transliterated, transcribed, or translated. The correct interpretation of the content may depend upon knowledge of the conventions used for the transformation.
Suppose that Italian or Russian cities on a map are transcribed for Japanese users. Each name needs to be transliterated into katakana using rules appropriate for the specific source and target language. When tagging such data, it is important to be able to indicate not only the resulting content language ("ja" in this case), but also the source language.
Transforms such as transliterations may vary depending not only on the source and target scripts, but also on the source and target languages. Thus, the Russian <U+041F U+0443 U+0442 U+0438 U+043D> (which corresponds to the Cyrillic <PE, U, TE, I, EN>) transliterates into "Putin" in English but "Poutine" in French. The identifier could be used to indicate a desired mechanical transformation in an API, or could be used to tag data that has been converted (mechanically or by hand) according to a transliteration method.
In addition, many different conventions have arisen for how to transform text, even between the same languages and scripts. For example, "Gaddafi" is commonly transliterated from Arabic to English as any of (G/Q/K/Kh)a(d/dh/dd/dhdh/th/zz)af(i/y). Some examples of standardized conventions used for transcribing or transliterating text include:
a. United Nations Group of Experts on Geographical Names (UNGEGN)
b. US Library of Congress (LOC)
c. US Board on Geographic Names (BGN)
d. Korean Ministry of Culture, Sports and Tourism (MCST)
e. International Organization for Standardization (ISO)
The usage of this extension is not limited to formal transformations, and may include other instances where the content is in some other way influenced by the source. For example, this extension could be used to designate a request for a speech recognizer that is tailored specifically for second-language speakers who are first-language speakers of a particular language (e.g., a recognizer for "English spoken with a Chinese accent").
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
Identification of transformed content can be done using the 't' extension defined in this document. This extension is formed by the 't' singleton followed by a sequence of subtags that would form a language tag as defined by [BCP47]. This allows the source language or script to be specified to the degree of precision required. There are restrictions on the sequence of subtags: they MUST form a regular, valid, canonical language tag, and MUST NOT include extensions or private use sequences introduced by the singleton 'x'. Where only the script is relevant (such as when identifying a script-to-script transliteration), 'und' is used for the primary language subtag.
For example:
+---------------------+---------------------------------------------+
| Language Tag        | Description                                 |
+---------------------+---------------------------------------------+
| ja-t-it             | The content is Japanese, transformed from   |
|                     | Italian.                                    |
| ja-Kana-t-it        | The content is Japanese Katakana,           |
|                     | transformed from Italian.                   |
| und-Latn-t-und-cyrl | The content is in the Latin script,         |
|                     | transformed from the Cyrillic script.       |
+---------------------+---------------------------------------------+
Note that the sequence of subtags governed by 't' cannot contain a singleton (a single-character subtag), because that would start a new extension. For example, the tag "ja-t-i-ami" does not indicate that the source is in "i-ami", because "i-ami" is not a regular language tag in [BCP47]. That tag would express an empty 't' extension followed by an 'i' extension.
The 't' extension is not intended for use in structured data that already provides separate source and target language identifiers. For example, this is the case in localization interchange formats such as XLIFF. In such cases, it would be inappropriate to use "ja-t-it" for the target language tag because the source language tag "it" would already be present in the data. Instead, one would use the language tag "ja".
As noted earlier, it is sometimes necessary to indicate additional information about a transformation. This additional information is optionally supplied after the source in a series of one or more fields, where each field consists of a field separator subtag followed by one or more non-separator subtags. Each field separator subtag consists of a single letter followed by a single digit.
A transformation mechanism is an optional field that indicates the specification used for the transformation, such as "UNGEGN" for the United Nations Group of Experts on Geographical Names transliterations and transcriptions. It uses the 'm0' field separator followed by certain subtags.
For example:
+------------------------------------+------------------------------+
| Language Tag                       | Description                  |
+------------------------------------+------------------------------+
| und-Cyrl-t-und-latn-m0-ungegn-2007 | The content is in Cyrillic,  |
|                                    | transformed from Latin,      |
|                                    | according to a UNGEGN        |
|                                    | specification dated 2007.    |
+------------------------------------+------------------------------+
The field separator subtags, such as 'm0', were chosen because they are short, visually distinctive, and cannot occur in a language subtag (except within an extension or after the private use singleton 'x'), thus eliminating the potential for collision or confusion with the source language tag.
The field subtags are defined by Section 3 of Unicode Technical Standard #35: Unicode Locale Data Markup Language (LDML) [UTS35], the main specification for the Unicode Common Locale Data Repository (CLDR) project. That section also defines the parallel 'u' extension [RFC6067], for which the Unicode Consortium is also the maintaining authority. As required by BCP 47, subtags follow the language tag ABNF and other rules for the formation of language tags and subtags, are restricted to the ASCII letters and digits, are not case sensitive, and do not exceed eight characters in length.
The LDML specification is available over the Internet at no cost, via a royalty-free license at http://unicode.org/copyright.html. LDML is versioned, and each version of LDML is numbered, dated, and stable. Extension subtags, once defined by LDML, are never retracted or substantially changed in meaning.
The maintaining authority for the 't' extension is the Unicode Consortium:
+---------------+---------------------------------------------------+
| Item          | Value                                             |
+---------------+---------------------------------------------------+
| Name          | Unicode Consortium                                |
| Contact Email | cldr-contact@unicode.org                          |
| Discussion    | cldr-users@unicode.org                            |
| List Email    |                                                   |
| URL Location  | cldr.unicode.org                                  |
| Specification | Unicode Technical Standard #35 Unicode Locale     |
|               | Data Markup Language (LDML),                      |
|               | http://unicode.org/reports/tr35/                  |
| Section       | Section 3 Unicode Language and Locale Identifiers |
+---------------+---------------------------------------------------+
The subtags in the 't' extension are of the following form:
   t-ext    = "t"                      ; Extension
              (("-" lang *("-" field)) ; Source + optional field(s)
              / 1*("-" field))         ; Field(s) only (no source)

   lang     = language                 ; BCP 47, with restrictions
              ["-" script]
              ["-" region]
              *("-" variant)

   field    = fsep 1*("-" 3*8alphanum) ; With restrictions

   fsep     = ALPHA DIGIT              ; Subtag separators
   alphanum = ALPHA / DIGIT
where <language>, <script>, <region>, and <variant> rules are specified in [BCP47], and <ALPHA> and <DIGIT> rules in [RFC5234].
Description and restrictions:
a. The 't' extension MUST have at least one subtag.
b. The 't' extension normally starts with a source language tag, which MUST be a regular, canonical language tag as specified by [BCP47]. Tags described by the 'irregular' production in BCP 47 MUST NOT be used to form the language tag. The source language tag MAY be omitted: some field values do not require it.
c. There is optionally a sequence of fields, where each field has a separator followed by a sequence of one or more subtags. Two identical field separators MUST NOT be present in the language tag.
d. The order of the fields in a 't' extension is not significant. The order of subtags within a field is significant. See Section 2.3 ("Canonicalization").
e. The 't' subtag fields are defined by Section 3 of Unicode Technical Standard #35: Unicode Locale Data Markup Language [UTS35].
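As an informal illustration (not part of this specification), the structure and restrictions above can be sketched in Python. The function name split_t_extension is hypothetical. The sketch only separates the source from the fields and checks restrictions a and c; it assumes the source subtags form a valid [BCP47] language tag and does not verify that against the registry.

```python
import re

FSEP = re.compile(r"^[a-z][0-9]$")             # fsep = ALPHA DIGIT
FIELD_SUBTAG = re.compile(r"^[a-z0-9]{3,8}$")  # 3*8alphanum


def split_t_extension(tag):
    """Return (source, fields) for the 't' extension of a language tag.

    source is the source language tag ('' if omitted); fields maps each
    field separator to the ordered list of subtags that follow it.
    The source itself is NOT validated here (an assumption of this sketch).
    """
    subtags = tag.lower().split("-")
    start = None
    for i, subtag in enumerate(subtags[1:], 1):
        if subtag == "x":       # private use: nothing after it is an extension
            break
        if subtag == "t":
            start = i
            break
    if start is None:
        raise ValueError("no 't' extension in tag")

    # The extension ends at the next singleton (single-character subtag).
    body = []
    for subtag in subtags[start + 1:]:
        if len(subtag) == 1:
            break
        body.append(subtag)
    if not body:
        raise ValueError("the 't' extension MUST have at least one subtag")

    source, fields, current = [], {}, None
    for subtag in body:
        if FSEP.match(subtag):
            if subtag in fields:
                raise ValueError("duplicate field separator: " + subtag)
            current = fields[subtag] = []
        elif current is None:
            source.append(subtag)
        elif FIELD_SUBTAG.match(subtag):
            current.append(subtag)
        else:
            raise ValueError("invalid field subtag: " + subtag)
    for separator, values in fields.items():
        if not values:
            raise ValueError("field without subtags: " + separator)
    return "-".join(source), fields
```

With this sketch, "ja-t-i-ami" is rejected because the 't' extension is empty, which matches the earlier note that 'i' would start a new extension.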
As required by [BCP47], the use of uppercase or lowercase letters is not significant in the subtags used in this extension. The canonical form for all subtags in the extension is lowercase, with the fields ordered by the separators, alphabetically. The order of subtags within a field is significant, and MUST NOT be changed in the process of canonicalizing.
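The canonicalization rule can be illustrated with a short Python sketch. The function name canonicalize_t_extension is hypothetical, and the 'm2' separator in the usage note is only a placeholder for a possible future field. The sketch lowercases every subtag, orders fields by separator, and leaves the order of subtags within each field unchanged; it does not canonicalize the source language tag beyond lowercasing.

```python
import re

FSEP = re.compile(r"^[a-z][0-9]$")  # fsep = ALPHA DIGIT


def canonicalize_t_extension(extension):
    """Canonicalize a 't' extension given as a string such as 't-it-M0-ungegn'."""
    subtags = extension.lower().split("-")
    if subtags[0] != "t":
        raise ValueError("not a 't' extension")
    source, fields = [], []
    for subtag in subtags[1:]:
        if FSEP.match(subtag):
            fields.append([subtag])      # start a new field
        elif fields:
            fields[-1].append(subtag)    # keep order within the field
        else:
            source.append(subtag)        # still part of the source tag
    fields.sort(key=lambda field: field[0])  # order fields by separator
    ordered = [subtag for field in fields for subtag in field]
    return "-".join(["t"] + source + ordered)
```

For example, canonicalize_t_extension("T-IT-m2-bbb-aaa-M0-UNGEGN-2007") yields "t-it-m0-ungegn-2007-m2-bbb-aaa": the fields are reordered, but "bbb-aaa" keeps its original order.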
Per RFC 5646, Section 3.7 [BCP47]:
   %%
   Identifier: t
   Description: Specifying Transformed Content
   Comments: Subtags for the identification of content that has been
     transformed, including but not limited to: transliteration,
     transcription, and translation.
   Added: 2011-12-16
   RFC: RFC 6497
   Authority: Unicode Consortium
   Contact_Email: cldr-contact@unicode.org
   Mailing_List: cldr-users@unicode.org
   URL: http://www.unicode.org/Public/cldr/latest/core.zip
   %%
Assignment of 't' field subtags is determined by the Unicode CLDR Technical Committee, in accordance with the policies and procedures in http://www.unicode.org/consortium/tc-procedures.html, and subject to the Unicode Consortium Policies on http://www.unicode.org/policies/policies.html.
Assignments that can be made by successive versions of LDML [UTS35] by the Unicode Consortium without requiring a new RFC include:
o The allocation of new field separator subtags for use after the 't' extension.
o The allocation of subtags valid after a field separator subtag.
o The addition of subtag aliases and descriptions.
o The modification of subtag descriptions.
Changes to the syntax or meaning of the 't' extension would require a new RFC that obsoletes this document; such an RFC would break stability, and would thus be contrary to the policies of the Unicode Consortium.
At the time this document was published, one field separator subtag was specified in [UTS35]: the transform mechanism. That field is summarized here:
a. The transform mechanism consists of a sequence of subtags starting with the 'm0' separator followed by one or more mechanism subtags. Each mechanism subtag has a length of 3 to 8 alphanumeric characters. The sequence as a whole provides an identification of the specification for the transform, such as the mechanism subtag 'ungegn' in "und-Cyrl-t-und-latn-m0-ungegn". In many cases, only one mechanism subtag is necessary, but multiple subtags MAY be defined in [UTS35] where necessary.
b. Any purely numeric subtag is a representation of a date in the Gregorian calendar. It MAY occur in any mechanism field, but it SHOULD only be used where necessary. If it does occur:
* it MUST occur as the final subtag in the field
* it MUST NOT be the only subtag in the field
* it MUST only consist of a sequence of digits of the form YYYY, YYYYMM, or YYYYMMDD
* it SHOULD be as short as possible
Note: The format is related to that of [RFC3339], but is not the same. The RFC 3339 full-date won't work because it uses hyphens. The offset ("Z") is not used because the date is a publication date (aka 'floating date'). For more information, see Section 3.3 ("Floating Time") of [W3C-TimeZones].
c. Examples:
* 20110623 represents June 23, 2011.
* There are three dated versions of the UNGEGN transliteration specification for Hebrew to Latin. They can be represented by the following language tags:
+ und-Hebr-t-und-latn-m0-ungegn-1972
+ und-Hebr-t-und-latn-m0-ungegn-1977
+ und-Hebr-t-und-latn-m0-ungegn-2007
* Suppose that the BGN transliteration specification for Cyrillic to Latin had three versions, dated June 11, 1999; Dec 30, 1999; and May 1, 2011. In that case, the corresponding first two DATE subtags would require the months to be distinctive (199906 and 199912), but the last subtag would only require the year (2011).
d. Some mechanisms may use a versioning system that is not distinguished by date, or not by date alone. In the latter case, the version will be of a form specified by [UTS35] for that mechanism. For example, if the mechanism xxx uses versions of the form v21a, then a tag could look like "ja-t-it-m0-xxx-v21a". If there are multiple sub-versions distinguished by date, then a tag could look like "ja-t-it-m0-xxx-v21a-2007".
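The date rules in item b and the BGN example in item c can be illustrated with a short Python sketch. The function names is_date_subtag and shortest_date_subtags are hypothetical, and the choice to treat "as short as possible" as "the shortest of YYYY, YYYYMM, or YYYYMMDD that no other version shares" is an interpretation made for this sketch.

```python
import re
from datetime import date

DATE_SUBTAG = re.compile(r"^[0-9]{4}([0-9]{2}([0-9]{2})?)?$")


def is_date_subtag(subtag):
    """Check the YYYY, YYYYMM, or YYYYMMDD form, including month/day ranges."""
    if not DATE_SUBTAG.match(subtag):
        return False
    if len(subtag) >= 6 and not 1 <= int(subtag[4:6]) <= 12:
        return False
    if len(subtag) == 8:
        try:
            date(int(subtag[:4]), int(subtag[4:6]), int(subtag[6:8]))
        except ValueError:
            return False
    return True


def shortest_date_subtags(dates):
    """For each version date, pick the shortest form that is unique in the set."""
    formats = ("%Y", "%Y%m", "%Y%m%d")
    result = []
    for current in dates:
        for fmt in formats:
            rendered = current.strftime(fmt)
            clash = any(other != current and other.strftime(fmt) == rendered
                        for other in dates)
            if not clash:
                break
        result.append(rendered)
    return result
```

Applied to the three hypothetical BGN dates above (June 11, 1999; Dec 30, 1999; May 1, 2011), shortest_date_subtags returns "199906", "199912", and "2011".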
A language tag with the 't' extension MAY be used to request a specific transform of content. In such a case, the recipient SHOULD return content that corresponds as closely as feasible to the requested transform, including the specification of the mechanism. For example, if the request is ja-t-it-m0-xxx-v21a-2007, and the recipient has content corresponding to both ja-t-it-m0-xxx-v21a and ja-t-it-m0-xxx-v21b-2009, then the v21a version would be preferred. As is the case for language matching as discussed in [BCP47], different implementations MAY have different measures of "closeness".
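Because implementations MAY measure "closeness" differently, the following Python sketch shows only one possible measure: the number of leading subtags a candidate shares with the request. The function names shared_prefix_length and best_match are hypothetical, and the sketch assumes all tags are already in canonical form so that fields appear in the same order.

```python
def shared_prefix_length(requested, candidate):
    """Count the leading subtags that two tags have in common."""
    count = 0
    pairs = zip(requested.lower().split("-"), candidate.lower().split("-"))
    for wanted, offered in pairs:
        if wanted != offered:
            break
        count += 1
    return count


def best_match(requested, available):
    """Return the available tag sharing the longest subtag prefix with the request."""
    return max(available, key=lambda tag: shared_prefix_length(requested, tag))
```

For the request ja-t-it-m0-xxx-v21a-2007, this measure prefers ja-t-it-m0-xxx-v21a (six shared subtags) over ja-t-it-m0-xxx-v21b-2009 (five), consistent with the example above. Other measures, such as ones that weigh dates or aliases, are equally permissible.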
Registration of transform mechanisms is requested by filing a ticket at http://cldr.unicode.org/. The proposal in the ticket MUST contain the following information:
+-------------+-----------------------------------------------------+
| Item        | Description                                         |
+-------------+-----------------------------------------------------+
| Subtag      | The proposed mechanism subtag (or subtag sequence). |
| Description | A description of the proposed mechanism; that       |
|             | description MUST be sufficient to distinguish it    |
|             | from other mechanisms in use.                       |
| Version     | If versioning for the mechanism is not done         |
|             | according to date, then a description of the        |
|             | versioning conventions used for the mechanism.      |
+-------------+-----------------------------------------------------+
Clarifications of descriptions or additional aliases may also be requested by filing a ticket.
The committee MAY define a template for submissions that requests more information, if it is found that such information would be useful in evaluating proposals.
In the event that it proves necessary to add an additional field (such as 'm2'), it can be requested by filing a ticket at http://cldr.unicode.org/. The proposal in the ticket MUST contain a full description of the proposed field semantics and subtag syntax, and MUST conform to the ABNF syntax for "field" presented in Section 2.2.
The committee MUST post each proposal publicly within 2 weeks after reception, to allow for comments. The committee must respond publicly to each proposal within 4 weeks after reception.
The response MAY:
o request more information or clarification
o accept the proposal, optionally with modifications to the subtag or description
o reject the proposal, because of significant objections raised on the mailing list or due to problems with constraints in this document or in [UTS35]
Accepted tickets result in a new entry in the machine-readable CLDR BCP 47 data or, in the case of a clarified description, modifications to the description attribute value for an existing entry.
Beginning with CLDR version 1.7.2, machine-readable files are available listing the data defined for BCP 47 extensions for each successive version of [UTS35]. The data in these files is used for testing the validity of subtags for the 't' extension and for the 'u' extension [RFC6067], for which the Unicode Consortium is also the maintaining authority. These releases are listed on http://cldr.unicode.org/index/downloads. Each release has an associated data directory of the form "http://unicode.org/Public/cldr/<version>", where "<version>" is replaced by the release number. For example, for version 1.7.2, the "core.zip" file is located at http://unicode.org/Public/cldr/1.7.2/core.zip. The most recent version is always identified by the version "latest" and can be accessed by the URL in Section 2.4.
Inside the "core.zip" file, the directory "common/bcp47" contains the data files listing the valid attributes, keys, and types for each successive version of [UTS35]. Each data file lists the keys and types relevant to that topic.
The XML structure lists the keys, such as <key extension="t" name="m0" description="Transliteration extension mechanism">, with subelements for the types, such as <type name="ungegn" description="United Nations Group of Experts on Geographical Names"/>. The currently defined attributes for the mechanisms include:
+-------------+-------------------------------+---------------------+
| Attribute   | Description                   | Examples            |
+-------------+-------------------------------+---------------------+
| name        | The name of the mechanism,    | UNGEGN, ALALC       |
|             | limited to 3-8 characters (or |                     |
|             | sequences of them).           |                     |
| description | A description of the name,    | United Nations      |
|             | with all and only that        | Group of Experts on |
|             | information necessary to      | Geographical Names; |
|             | distinguish one name from     | American Library    |
|             | others with which it might be | Association-Library |
|             | confused. Descriptions are    | of Congress         |
|             | not intended to provide       |                     |
|             | general background            |                     |
|             | information.                  |                     |
| since       | Indicates the first version   | 1.9, 2.0.1          |
|             | of CLDR where the name        |                     |
|             | appears. (Required for new    |                     |
|             | items.)                       |                     |
| alias       | Alternative name of the key   |                     |
|             | or type, not limited in       |                     |
|             | number of characters.         |                     |
|             | Aliases are intended for      |                     |
|             | backwards compatibility, not  |                     |
|             | to provide all possible       |                     |
|             | alternate names or            |                     |
|             | designations. (Optional.)     |                     |
+-------------+-------------------------------+---------------------+
The file for the transform extension is "transform.xml". The initial version of that file contains the following information.
   <keyword>
     <key extension="t" name="m0"
          description="Transliteration extension mechanism">
       <type name="ungegn"
             description="United Nations Group of Experts on
                          Geographical Names"
             since="21"/>
       <type name="alaloc"
             description="American Library Association-Library of
                          Congress"
             since="21"/>
       <type name="bgn"
             description="US Board on Geographic Names"
             since="21"/>
       <type name="mcst"
             description="Korean Ministry of Culture, Sports and
                          Tourism"
             since="21"/>
       <type name="iso"
             description="International Organization for
                          Standardization"
             since="21"/>
       <type name="din"
             description="Deutsches Institut fuer Normung"
             since="21"/>
       <type name="gost"
             description="Euro-Asian Council for Standardization,
                          Metrology and Certification"
             since="21"/>
     </key>
   </keyword>
To get the version information in XML when working with the data files, the XML parser must be validating. When the 'core.zip' file is unzipped, the 'dtd' directory will be at the same level as the 'bcp47' directory; this is required for correct validation. For each release after CLDR 1.8, types introduced in that release are also marked in the data files by the XML attribute "since", such as in the following example:

   <type name="adp" since="1.9"/>
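The "since" marking described above can be read programmatically. The sketch below groups types by the release in which they were introduced, using Python's standard xml.etree.ElementTree module. Note the assumptions: ElementTree is a non-validating parser, so it sees only attributes written explicitly in the file and not any values supplied by the DTD; the embedded SAMPLE is an abridged copy of the fragment shown earlier rather than the real "transform.xml"; and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Abridged copy of the fragment shown earlier; a real run would read
# bcp47/transform.xml from the unzipped CLDR 'core.zip' instead.
SAMPLE = """
<keyword>
  <key extension="t" name="m0"
       description="Transliteration extension mechanism">
    <type name="ungegn"
          description="United Nations Group of Experts on Geographical Names"
          since="21"/>
    <type name="bgn" description="US Board on Geographic Names" since="21"/>
  </key>
</keyword>
"""


def types_by_release(xml_text: str) -> Dict[str, List[str]]:
    """Map each 'since' value to the "key-type" pairs introduced in it."""
    root = ET.fromstring(xml_text)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key in root.iter("key"):
        for type_el in key.iter("type"):
            # Types with no explicit 'since' attribute go under "unknown".
            since = type_el.get("since", "unknown")
            grouped[since].append(f"{key.get('name')}-{type_el.get('name')}")
    return dict(grouped)


if __name__ == "__main__":
    for release, names in types_by_release(SAMPLE).items():
        print(release, names)
```

Obtaining the DTD-supplied version information that the text mentions would require a validating parser with the 'dtd' directory in place, which this sketch does not attempt.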
The data is also currently maintained in a source code repository, with each release tagged, for viewing directly without unzipping. For example, see:
o http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/
o http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/
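The two URLs above suggest a naming pattern in which the dots of the release number become hyphens in the tag name. A small sketch of that pattern follows. The pattern is inferred only from these two examples; whether other releases follow it, or whether the URLs still resolve, is an assumption, and the function name is hypothetical.

```python
def bcp47_tag_url(version: str) -> str:
    """Build the repository URL for a tagged CLDR release such as "1.8"."""
    parts = version.split(".")
    # Accept only dotted runs of digits, e.g. "1.8" or "1.7.2".
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"unexpected version string: {version!r}")
    return (
        "http://unicode.org/repos/cldr/tags/release-"
        + "-".join(parts)
        + "/common/bcp47/"
    )


if __name__ == "__main__":
    print(bcp47_tag_url("1.7.2"))
    print(bcp47_tag_url("1.8"))
```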
For more information, see http://cldr.unicode.org/index/bcp47-extension.
Thanks to John Emmons and the rest of the Unicode CLDR Technical Committee for their work in developing the BCP 47 subtags for LDML.
IANA has inserted the record of Section 2.4 into the Language Extensions Registry, according to Section 3.7 ("Extensions and the Extensions Registry") of "Tags for Identifying Languages" [BCP47]. Per Section 5.2 of [BCP47], there might be occasional (rare) requests by the Unicode Consortium (the "Authority" listed in the record) for maintenance of this record. Changes that can be submitted to IANA without the publication of a new RFC are limited to modification of the Comments, Contact_Email, Mailing_List, and URL fields. Any such requested changes MUST use the domain 'unicode.org' in any new addresses or URIs, MUST explicitly cite this document (so that IANA can reference these requirements), and MUST originate from the 'unicode.org' domain. The domain or authority can only be changed via a new RFC.
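The constraints on record maintenance described above can be illustrated with a small checker. This is a sketch under stated assumptions, not a procedure used by IANA: it treats subdomains of 'unicode.org' as acceptable, treats Comments as free text without inspecting it for embedded addresses, and does not model the requirements that a request cite this document and originate from the 'unicode.org' domain. All function and constant names are hypothetical.

```python
from urllib.parse import urlsplit

# Fields that the text above allows to change without a new RFC.
EDITABLE_FIELDS = {"Comments", "Contact_Email", "Mailing_List", "URL"}
REQUIRED_DOMAIN = "unicode.org"


def _in_required_domain(host: str) -> bool:
    # Assumption: subdomains such as cldr.unicode.org count as 'unicode.org'.
    host = host.lower().rstrip(".")
    return host == REQUIRED_DOMAIN or host.endswith("." + REQUIRED_DOMAIN)


def change_is_permitted(field: str, new_value: str) -> bool:
    """Check one requested field change against the constraints above."""
    if field not in EDITABLE_FIELDS:
        return False  # e.g. Authority can only change via a new RFC
    if field == "Comments":
        return True  # simplification: free text is not scanned for URIs
    if field == "URL":
        return _in_required_domain(urlsplit(new_value).hostname or "")
    # Contact_Email and Mailing_List are treated as email addresses.
    _, at_sign, host = new_value.rpartition("@")
    return bool(at_sign) and _in_required_domain(host)


if __name__ == "__main__":
    print(change_is_permitted("URL", "http://cldr.unicode.org/index/bcp47-extension"))
    print(change_is_permitted("Contact_Email", "someone@example.org"))
    print(change_is_permitted("Authority", "Example Organization"))
```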
The security considerations for this extension are the same as those for [BCP47]. See RFC 5646, Section 6, Security Considerations [BCP47].
[BCP47] Phillips, A., Ed., and M. Davis, Ed., "Tags for Identifying Languages", BCP 47, RFC 5646, September 2009.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5234] Crocker, D., Ed., and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.
[UTS35] Davis, M., "Unicode Technical Standard #35: Locale Data Markup Language (LDML)", February 2012, <http://www.unicode.org/reports/tr35/>.
[RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.
[RFC6067] Davis, M., Phillips, A., and Y. Umaoka, "BCP 47 Extension U", RFC 6067, December 2010.
[W3C-TimeZones] Phillips, Ed., "W3C Working Group Note: Working with Time Zones", July 2011, <http://www.w3.org/TR/2011/NOTE-timezone-20110705/>.
Authors' Addresses

Mark Davis
Google
EMail: mark@macchiato.com

Addison Phillips
Lab126
EMail: addison@lab126.com

Yoshito Umaoka
IBM
EMail: yoshito_umaoka@us.ibm.com

Courtney Falk
Infinite Automata
EMail: court@infiauto.com