Network Working Group H. Alvestrand Request for Comments: 2277 UNINETT BCP: 18 January 1998 Category: Best Current Practice
Network Working Group H. Alvestrand Request for Comments: 2277 UNINETT BCP: 18 January 1998 Category: Best Current Practice
IETF Policy on Character Sets and Languages
IETF字符集和语言策略
Status of this Memo
本备忘录的状况
This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements. Distribution of this memo is unlimited.
本文件规定了互联网社区的最佳现行做法,并要求进行讨论和提出改进建议。本备忘录的分发不受限制。
Copyright Notice
版权公告
Copyright (C) The Internet Society (1998). All Rights Reserved.
版权所有(C)互联网协会(1998年)。版权所有。
The Internet is international.
互联网是国际化的。
With the international Internet follows an absolute requirement to interchange data in a multiplicity of languages, which in turn utilize a bewildering number of characters.
随着国际互联网的发展,人们绝对需要以多种语言交换数据,而这些语言又使用了大量令人困惑的字符。
This document is the current policies being applied by the Internet Engineering Steering Group (IESG) towards the standardization efforts in the Internet Engineering Task Force (IETF) in order to help Internet protocols fulfill these requirements.
本文件是互联网工程指导小组(IESG)对互联网工程任务组(IETF)的标准化工作所采用的现行政策,以帮助互联网协议满足这些要求。
The document is very much based upon the recommendations of the IAB Character Set Workshop of February 29-March 1, 1996, which is documented in RFC 2130 [WR]. This document attempts to be concise, explicit and clear; people wanting more background are encouraged to read RFC 2130.
该文件在很大程度上基于1996年2月29日至3月1日IAB字符集研讨会的建议,该研讨会记录在RFC 2130[WR]中。本文件力求简明扼要;希望了解更多背景的人可以阅读RFC2130。
The document uses the terms 'MUST', 'SHOULD' and 'MAY', and their negatives, in the way described in [RFC 2119]. In this case, 'the specification' as used by RFC 2119 refers to the processing of protocols being submitted to the IETF standards process.
本文件以[RFC 2119]中所述的方式使用术语“必须”、“应该”和“可能”及其否定词。在这种情况下,RFC 2119使用的“规范”指的是提交给IETF标准过程的协议处理。
Internationalization is for humans. This means that protocols are not subject to internationalization; text strings are. Where protocol elements look like text tokens, such as in many IETF application layer protocols, protocols MUST specify which parts are protocol and which are text. [WR 2.2.1.1]
国际化是为了人类。这意味着协议不受国际化的约束;文本字符串为。当协议元素看起来像文本标记时,例如在许多IETF应用层协议中,协议必须指定哪些部分是协议,哪些部分是文本。[WR 2.2.1.1]
Names are a problem, because people feel strongly about them, many of them are mostly for local usage, and all of them tend to leak out of the local context at times. RFC 1958 [RFC 1958] recommends US-ASCII for all globally visible names.
名字是一个问题,因为人们对它们的感觉很强烈,很多名字都是本地使用的,而且有时都会泄露出本地上下文。RFC 1958[RFC 1958]建议对所有全局可见名称使用US-ASCII。
This document does not mandate a policy on name internationalization, but requires that all protocols describe whether names are internationalized or US-ASCII.
本文件不强制规定名称国际化政策,但要求所有协议描述名称是国际化还是US-ASCII。
NOTE: In the protocol stack for any given application, there is usually one or a few layers that need to address these problems.
注意:在任何给定应用程序的协议栈中,通常有一层或几层需要解决这些问题。
It would, for instance, not be appropriate to define language tags for Ethernet frames. But it is the responsibility of the WGs to ensure that whenever responsibility for internationalization is left to "another layer", those responsible for that layer are in fact aware that they HAVE that responsibility.
例如,为以太网帧定义语言标记是不合适的。但是,工作组有责任确保,每当国际化的责任留给“另一层”时,负责这一层的人实际上意识到他们有这一责任。
This document uses the term "charset" to mean a set of rules for mapping from a sequence of octets to a sequence of characters, such as the combination of a coded character set and a character encoding scheme; this is also what is used as an identifier in MIME "charset=" parameters, and registered in the IANA charset registry [REG]. (Note that this is NOT a term used by other standards bodies, such as ISO).
本文件使用术语“字符集”表示用于从八位字节序列映射到字符序列的一组规则,例如编码字符集和字符编码方案的组合;这也是MIME“charset=”参数中用作标识符的内容,并在IANA字符集注册表[REG]中注册。(请注意,这不是其他标准机构(如ISO)使用的术语)。
For a definition of the term "coded character set", refer to the workshop report.
有关术语“编码字符集”的定义,请参阅《车间维修报告》。
A "name" is an identifier such as a person's name, a hostname, a domainname, a filename or an E-mail address; it is often treated as an identifier rather than as a piece of text, and is often used in protocols as an identifier for entities, without surrounding text.
“名称”是一个标识符,如人名、主机名、域名、文件名或电子邮件地址;它通常被视为一个标识符而不是一段文本,并且在协议中经常被用作实体的标识符,而不包含文本。
All protocols MUST identify, for all character data, which charset is in use.
对于所有字符数据,所有协议都必须标识正在使用的字符集。
Protocols MUST be able to use the UTF-8 charset, which consists of the ISO 10646 coded character set combined with the UTF-8 character encoding scheme, as defined in [10646] Annex R (published in Amendment 2), for all text.
协议必须能够对所有文本使用UTF-8字符集,该字符集由ISO 10646编码字符集和UTF-8字符编码方案组成,如[10646]附录R(修订件2中发布)中所定义。
Protocols MAY specify, in addition, how to use other charsets or other character encoding schemes for ISO 10646, such as UTF-16, but lack of an ability to use UTF-8 is a violation of this policy; such a violation would need a variance procedure ([BCP9] section 9) with clear and solid justification in the protocol specification document before being entered into or advanced upon the standards track.
此外,协议可规定如何使用其他字符集或ISO 10646的其他字符编码方案,如UTF-16,但缺乏使用UTF-8的能力违反了本政策;在进入标准轨道或进入标准轨道之前,此类违规行为需要在协议规范文件中有明确和可靠理由的变更程序([BCP9]第9节)。
For existing protocols or protocols that move data from existing datastores, support of other charsets, or even using a default other than UTF-8, may be a requirement. This is acceptable, but UTF-8 support MUST be possible.
对于现有协议或从现有数据存储中移动数据的协议,可能需要支持其他字符集,甚至使用UTF-8以外的默认字符集。这是可以接受的,但必须能够支持UTF-8。
When using other charsets than UTF-8, these MUST be registered in the IANA charset registry, if necessary by registering them when the protocol is published.
当使用UTF-8以外的其他字符集时,必须在IANA字符集注册表中注册这些字符集,如有必要,可在协议发布时进行注册。
(Note: ISO 10646 calls the UTF-8 CES a "Transformation Format" rather than a "character encoding scheme", but it fits the charset workshop report definition of a character encoding scheme).
(注:ISO10646将UTF-8 CES称为“转换格式”,而不是“字符编码方案”,但它符合字符编码方案的字符集研讨会报告定义)。
When the protocol allows a choice of multiple charsets, someone must make a decision on which charset to use.
当协议允许选择多个字符集时,必须有人决定使用哪个字符集。
In some cases, like HTTP, there is direct or semi-direct communication between the producer and the consumer of data containing text. In such cases, it may make sense to negotiate a charset before sending data.
在某些情况下,如HTTP,包含文本的数据的生产者和消费者之间存在直接或半直接通信。在这种情况下,在发送数据之前协商字符集可能是有意义的。
In other cases, like E-mail or stored data, there is no such communication, and the best one can do is to make sure the charset is clearly identified with the stored data, and choosing a charset that is as widely known as possible.
在其他情况下,如电子邮件或存储的数据,没有这种通信,最好的办法是确保字符集与存储的数据清楚地标识,并选择一个尽可能广为人知的字符集。
Note that a charset is an absolute; text that is encoded in a charset cannot be rendered comprehensibly without supporting that charset.
注意,字符集是绝对的;编码在字符集中的文本如果不支持该字符集,则无法以可理解的方式呈现。
(This also applies to English texts; charsets like EBCDIC do NOT have ASCII as a proper subset)
(这也适用于英文文本;像EBCDIC这样的字符集没有ASCII作为适当的子集)
Negotiating a charset may be regarded as an interim mechanism that is to be supported until support for interchange of UTF-8 is prevalent; however, the timeframe of "interim" may be at least 50 years, so there is every reason to think of it as permanent in practice.
协商字符集可被视为一种临时机制,在普遍支持UTF-8交换之前予以支持;然而,“临时”的时限可能至少为50年,因此有充分的理由认为它在实践中是永久性的。
All human-readable text has a language.
所有人类可读的文本都有一种语言。
Many operations, including high quality formatting, text-to-speech synthesis, searching, hyphenation, spellchecking and so on benefit greatly from access to information about the language of a piece of text. [WC 3.1.1.4].
许多操作,包括高质量的格式化、文本到语音合成、搜索、断字、拼写检查等,都从访问有关文本语言的信息中受益匪浅。[WC 3.1.1.4]。
Humans have some tolerance for foreign languages, but are generally very unhappy with being presented text in a language they do not understand; this is why negotiation of language is needed.
人类对外语有一定的容忍度,但通常对用他们不懂的语言呈现文本感到非常不高兴;这就是为什么需要进行语言谈判。
In most cases, machines will not be able to deduce the language of a transmitted text by themselves; the protocol must specify how to transfer the language information if it is to be available at all.
在大多数情况下,机器无法自行推断传输文本的语言;协议必须指定如何传输语言信息(如果语言信息可用)。
The interaction between language and processing is complex; for instance, if I compare "name-of-thing(lang=en)" to "name-of-thing(lang=no)" for equality, I will generally expect a match, while the word "ask(no)" is a kind of tree, and is hardly useful as a command verb.
语言与加工之间的相互作用是复杂的;例如,如果我比较“事物名称(lang=en)”和“事物名称(lang=no)”是否相等,我通常会期望匹配,而单词“ask(no)”是一种树,作为命令动词几乎没有用处。
Protocols that transfer text MUST provide for carrying information about the language of that text.
传输文本的协议必须提供有关该文本语言的信息。
Protocols SHOULD also provide for carrying information about the language of names, where appropriate.
在适当的情况下,协议还应规定携带有关姓名语言的信息。
Note that this does NOT mean that such information must always be present; the requirement is that if the sender of information wishes to send information about the language of a text, the protocol provides a well-defined way to carry this information.
注意,这并不意味着此类信息必须始终存在;要求是,如果信息发送者希望发送关于文本语言的信息,协议提供了一种定义明确的方式来携带该信息。
The RFC 1766 language tag is at the moment the most flexible tool available for identifying a language; protocols SHOULD use this, or provide clear and solid justification for doing otherwise in the document.
RFC1766语言标签是目前识别语言最灵活的工具;协议应使用这一点,或在文件中提供明确和可靠的理由来证明不这样做。
Note also that a language is distinct from a POSIX locale; a POSIX locale identifies a set of cultural conventions, which may imply a language (the POSIX or "C" locale of course do not), while a language tag as described in RFC 1766 identifies only a language.
还要注意,一种语言不同于POSIX语言环境;POSIX语言环境标识一组文化约定,这可能意味着一种语言(POSIX或“C”语言环境当然不是),而RFC1766中描述的语言标记仅标识一种语言。
Protocols where users have text presented to them in response to user actions MUST provide for support of multiple languages.
协议中,如果用户在响应用户操作时向其显示文本,则必须提供对多种语言的支持。
How this is done will vary between protocols; for instance, in some cases, a negotiation where the client proposes a set of languages and the server replies with one is appropriate; in other cases, a server may choose to send multiple variants of a text and let the client pick which one to display.
如何做到这一点将因协议而异;例如,在某些情况下,客户机提出一组语言,而服务器用一种语言回复的协商是合适的;在其他情况下,服务器可以选择发送文本的多个变体,并让客户端选择显示哪个变体。
Negotiation is useful in the case where one side of the protocol exchange is able to present text in multiple languages to the other side, and the other side has a preference for one of these; the most common example is the text part of error responses, or Web pages that are available in multiple languages.
在协议交换的一方能够以多种语言向另一方呈现文本,并且另一方优先选择其中一种语言的情况下,协商是有用的;最常见的例子是错误响应的文本部分,或者可以使用多种语言的网页。
Negotiating a language should be regarded as a permanent requirement of the protocol that will not go away at any time in the future.
谈判一种语言应被视为《议定书》的一项永久性要求,在今后任何时候都不会消失。
In many cases, it should be possible to include it as part of the connection establishment, together with authentication and other preferences negotiation.
在许多情况下,应该可以将其与身份验证和其他首选项协商一起作为连接建立的一部分。
When human-readable text must be presented in a context where the sender has no knowledge of the recipient's language preferences (such as login failures or E-mailed warnings, or prior to language negotiation), text SHOULD be presented in Default Language.
当人类可读文本必须在发送者不知道接收者的语言偏好(如登录失败或电子邮件警告,或在语言协商之前)的上下文中呈现时,文本应以默认语言呈现。
Default Language is assigned the tag "i-default" according to the procedures of RFC 1766. It is not a specific language, but rather identifies the condition where the language preferences of the user cannot be established.
根据RFC1766的程序,默认语言被分配标签“i-Default”。它不是一种特定的语言,而是标识了无法建立用户语言首选项的条件。
Messages in Default Language MUST be understandable by an English-speaking person, since English is the language which, worldwide, the greatest number of people will be able to get adequate help in interpreting when working with computers.
默认语言的信息必须能被说英语的人理解,因为英语是世界上最大数量的人在使用计算机时能够在口译方面得到充分帮助的语言。
Note that negotiating English is NOT the same as Default Language; Default Language is an emergency measure in otherwise unmanageable situations.
注意谈判英语与默认语言不同;默认语言在其他无法管理的情况下是一种紧急措施。
In many cases, using only English text is reasonable; in some cases, the English text may be augumented by text in other languages.
在许多情况下,仅使用英文文本是合理的;在某些情况下,英文文本可能由其他语言的文本作为预兆。
The POSIX standard [POSIX] defines a concept called a "locale", which includes a lot of information about collating order for sorting, date format, currency format and so on.
POSIX标准[POSIX]定义了一个称为“语言环境”的概念,其中包含大量关于排序排序顺序、日期格式、货币格式等的信息。
In some cases, and especially with text where the user is expected to do processing on the text, locale information may be usefully attached to the text; this would identify the sender's opinion about appropriate rules to follow when processing the document, which the recipient may choose to agree with or ignore.
在某些情况下,尤其是对于预期用户对文本进行处理的文本,可以将区域设置信息有效地附加到文本中;这将确定发件人对处理文档时应遵循的适当规则的意见,收件人可以选择同意或忽略这些规则。
This document does not require the communication of locale information on all text, but encourages its inclusion when appropriate.
本文件不要求在所有文本上传达区域设置信息,但鼓励在适当的时候将其包括在内。
Note that language and character set information will often be present as parts of a locale tag (such as no_NO.iso-8859-1; the language is before the underscore and the character set is after the dot); care must be taken to define precisely which specification of character set and language applies to any one text item.
请注意,语言和字符集信息通常作为区域设置标记的一部分出现(例如no_no.iso-8859-1;语言在下划线之前,字符集在点之后);必须注意精确定义适用于任何一个文本项的字符集和语言规范。
The default locale is the "POSIX" locale.
默认的语言环境是“POSIX”语言环境。
In documents that deal with internationalization issues at all, a synopsis of the approaches chosen for internationalization SHOULD be collected into a section called "Internationalization considerations", and placed next to the Security Considerations section.
在所有涉及国际化问题的文档中,选择国际化方法的概要应收集到一个名为“国际化注意事项”的部分,并放在安全注意事项部分的旁边。
This provides an easy reference for those who are looking for advice on these issues when implementing the protocol.
这为那些在实施协议时就这些问题寻求建议的人提供了一个简单的参考。
Apart from the fact that security warnings in a foreign language may cause inappropriate behaviour from the user, and the fact that multilingual systems usually have problems with consistency between language variants, no security considerations relevant have been identified.
除了使用外语发出的安全警告可能会导致用户做出不适当的行为,以及多语言系统通常在语言变体之间的一致性方面存在问题之外,还没有发现任何相关的安全注意事项。
[10646] ISO/IEC, Information Technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane, May 1993, with amendments
[10646]ISO/IEC,信息技术-通用多八位编码字符集(UCS)-第1部分:体系结构和基本多语言平面,1993年5月,修订版
[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC 2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。
[WR] Weider, C., Preston, C., Simonsen, K., Alvestrand, H, Atkinson, R., Crispin, M., and P. Svanberg, "The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996", RFC 2130, April 1997.
[WR]Weider,C.,Preston,C.,Simonsen,K.,Alvestrand,H,Atkinson,R.,Crispin,M.,和P.Svanberg,“1996年2月29日至3月1日举行的IAB字符集研讨会报告”,RFC 21301997年4月。
[RFC 1958] Carpenter, B., "Architectural Principles of the Internet", RFC 1958, June 1996.
[RFC 1958]Carpenter,B.,“互联网的建筑原理”,RFC 19581996年6月。
[POSIX] ISO/IEC 9945-2:1993 Information technology -- Portable Operating System Interface (POSIX) -- Part 2: Shell and Utilities
[POSIX] ISO/IEC 9945-2:1993 Information technology -- Portable Operating System Interface (POSIX) -- Part 2: Shell and Utilities
[REG] Freed, N., and J. Postel, "IANA Charset Registration Procedures", BCP 19, RFC 2278, January 1998.
[REG]Freed,N.和J.Postel,“IANA字符集注册程序”,BCP 19,RFC 2278,1998年1月。
[UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 2279, January 1998.
[UTF-8]Yergeau,F.,“UTF-8,ISO 10646的转换格式”,RFC 2279,1998年1月。
[BCP9] Bradner, S., "The Internet Standards Process -- Revision 3," BCP 9, RFC 2026, October 1996.
[BCP9]Bradner,S.,“互联网标准过程——第3版”,BCP 9,RFC 2026,1996年10月。
Harald Tveit Alvestrand UNINETT P.O.Box 6883 Elgeseter N-7002 TRONDHEIM NORWAY
挪威特隆赫姆哈拉尔德·特维特·阿尔韦斯特兰·尤尼特邮政信箱6883埃尔格塞特N-7002
Phone: +47 73 59 70 94 EMail: Harald.T.Alvestrand@uninett.no
Phone: +47 73 59 70 94 EMail: Harald.T.Alvestrand@uninett.no
Copyright (C) The Internet Society (1998). All Rights Reserved.
版权所有(C)互联网协会(1998年)。版权所有。
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
本文件及其译本可复制并提供给他人,对其进行评论或解释或协助其实施的衍生作品可全部或部分编制、复制、出版和分发,不受任何限制,前提是上述版权声明和本段包含在所有此类副本和衍生作品中。但是,不得以任何方式修改本文件本身,例如删除版权通知或对互联网协会或其他互联网组织的引用,除非出于制定互联网标准的需要,在这种情况下,必须遵循互联网标准过程中定义的版权程序,或根据需要将其翻译成英语以外的其他语言。
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
上述授予的有限许可是永久性的,互联网协会或其继承人或受让人不会撤销。
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
本文件和其中包含的信息是按“原样”提供的,互联网协会和互联网工程任务组否认所有明示或暗示的保证,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。