Network Working Group                                       E. Whitehead
Request for Comments: 2376                                     UC Irvine
Category: Informational                                        M. Murata
                                                Fuji Xerox Info. Systems
                                                               July 1998
XML Media Types
Status of this Memo
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1998). All Rights Reserved.
Abstract
This document proposes two new media subtypes, text/xml and application/xml, for use in exchanging network entities that conform to the Extensible Markup Language (XML). XML entities are currently exchanged via the HyperText Transfer Protocol on the World Wide Web, are an integral part of the WebDAV protocol for remote web authoring, and are expected to have utility in many domains.
Table of Contents
1 INTRODUCTION
2 NOTATIONAL CONVENTIONS
3 XML MEDIA TYPES
   3.1 Text/xml Registration
   3.2 Application/xml Registration
4 SECURITY CONSIDERATIONS
5 THE BYTE ORDER MARK (BOM) AND CONVERSIONS TO/FROM UTF-16
6 EXAMPLES
   6.1 text/xml with UTF-8 Charset
   6.2 text/xml with UTF-16 Charset
   6.3 text/xml with ISO-2022-KR Charset
   6.4 text/xml with Omitted Charset
   6.5 application/xml with UTF-16 Charset
   6.6 application/xml with ISO-2022-KR Charset
   6.7 application/xml with Omitted Charset and UTF-16 XML Entity
   6.8 application/xml with Omitted Charset and UTF-8 Entity
   6.9 application/xml with Omitted Charset and Internal Encoding Declaration
7 REFERENCES
8 ACKNOWLEDGEMENTS
9 ADDRESSES OF AUTHORS
10 FULL COPYRIGHT STATEMENT
1 Introduction
The World Wide Web Consortium (W3C) has issued a Recommendation [REC-XML] that defines the Extensible Markup Language (XML), version 1.0. To enable the exchange of XML network entities, this document proposes two new media types, text/xml and application/xml.
XML entities are currently exchanged on the World Wide Web, and XML is also used for property values and parameter marshalling by the WebDAV protocol for remote web authoring. Thus, there is a need for a media type to properly label the exchange of XML network entities. (Note that, as sometimes happens between two communities, both MIME and XML have defined the term entity, with different meanings.)
Although XML is a subset of the Standard Generalized Markup Language (SGML) [ISO-8897], and SGML content is currently labeled with the media types text/sgml and application/sgml, there are several reasons why using text/sgml or application/sgml to label XML is inappropriate. First, many applications can process XML but cannot process SGML, because of SGML's larger feature set. Second, SGML applications cannot always process XML entities, because XML uses features from recent technical corrigenda to SGML. Third, the definition of text/sgml and application/sgml [RFC-1874] includes parameters for the SGML bit combination transformation format (SGML-bctf) and the SGML boot attribute (SGML-boot). Since XML does not use these parameters, it would be ambiguous if such parameters were given for an XML entity. For these reasons, the best approach for labeling XML network entities is to provide new media types for XML.
Since XML is an integral part of the WebDAV Distributed Authoring Protocol, and since World Wide Web Consortium Recommendations have conventionally been assigned IETF tree media types, and since similar media types (HTML, SGML) have been assigned IETF tree media types, the XML media types also belong in the IETF media types tree.
2 Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC-2119].
3 XML Media Types
This document introduces two new media types for XML entities, text/xml and application/xml. Registration information for these media types is given in the sections below.
Every XML entity is suitable for use with the application/xml media type without modification. But this does not exploit the fact that XML can be treated as plain text in many cases. MIME user agents (and web user agents) that do not have explicit support for application/xml will treat it as application/octet-stream, for example, by offering to save it to a file.
To indicate that an XML entity should be treated as plain text by default, use the text/xml media type. This restricts the encoding used in the XML entity to those that are compatible with the requirements for text media types as described in [RFC-2045] and [RFC-2046], e.g., UTF-8, but not UTF-16 (except for HTTP).
XML provides a general framework for defining sequences of structured data. In some cases, it may be desirable to define new media types which use XML but define a specific application of XML, perhaps due to domain-specific security considerations or runtime information. This document does not prohibit future media types dedicated to such XML applications. However, developers of such media types are recommended to use this document as a basis. In particular, the charset parameter should be used in the same manner.
Within the XML specification, XML entities can be classified into four types. In the XML terminology, they are called "document entities", "external DTD subsets", "external parsed entities", and "external parameter entities". The media types text/xml and application/xml can be used for any of these four types.
3.1 Text/xml Registration
MIME media type name: text
MIME subtype name: xml
Mandatory parameters: none
Optional parameters: charset
Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the character encoding of the XML entity. The charset parameter can also be used to provide protocol-specific operations, such as charset-based content negotiation in HTTP. "UTF-8" [RFC-2279] is the recommended value, representing the UTF-8 charset. UTF-8 is supported by all conforming XML processors [REC-XML].
If the XML entity is transmitted via HTTP, which uses a MIME-like mechanism that is exempt from the restrictions on the text top-level type (see section 19.4.1 of HTTP 1.1 [RFC-2068]), "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]) is also recommended. UTF-16 is supported by all conforming XML processors [REC-XML]. Since the handling of CR, LF and NUL for text types in most MIME applications would cause undesired transformations of individual octets in UTF-16 multi-octet characters, gateways from HTTP to these MIME applications MUST transform the XML entity from a text/xml; charset="utf-16" to application/xml; charset="utf-16".
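A gateway that forwards HTTP content into a MIME application such as e-mail could apply the relabeling rule above roughly as follows. This is a minimal Python sketch under stated assumptions: the function name `gateway_content_type` is hypothetical, it relies only on the standard library's `email.message.Message` header parsing, and it rewrites only the Content-Type value. The UTF-16 octets themselves pass through unchanged and would still need a base64 content-transfer-encoding on the mail leg.

```python
from email.message import Message


def gateway_content_type(content_type: str) -> str:
    """Relabel text/xml; charset="utf-16" as application/xml for MIME transports.

    Hypothetical helper: any other media type or charset is returned unchanged.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() and get_content_charset() both return lowercase values.
    if msg.get_content_type() == "text/xml" and msg.get_content_charset() == "utf-16":
        return 'application/xml; charset="utf-16"'
    return content_type
```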
Conformant with [RFC-2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii". In cases where the XML entity is transmitted via HTTP, the default charset value is still "us-ascii".
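The charset rules for text/xml can be condensed into a small lookup. The following Python sketch is illustrative only: `mime_charset` is a hypothetical name, and returning `None` for application/xml is an assumption standing in for "no MIME-level default; leave it to the XML processor", as described in the application/xml registration.

```python
from email.message import Message
from typing import Optional


def mime_charset(content_type: str) -> Optional[str]:
    """Return the charset a MIME processor should assume for an XML entity."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, or None if the parameter is absent
    if charset is not None:
        return charset  # an explicit charset parameter is authoritative
    if msg.get_content_type() == "text/xml":
        return "us-ascii"  # [RFC-2046] default for text types, also over HTTP
    return None  # application/xml: assume nothing at the MIME level
```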
Since the charset parameter is authoritative, the charset is not always declared within an XML encoding declaration. Thus, special care is needed when the recipient strips the MIME header and provides persistent storage of the received XML entity (e.g., in a file system). Unless the charset is UTF-8 or UTF-16, the recipient SHOULD also persistently store information about the charset, perhaps by embedding a correct XML encoding declaration within the XML entity.
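One way to follow the storage advice above is to add or rewrite the XML declaration so that it names the charset before the MIME header is discarded. The Python sketch below makes several assumptions: the helper name is hypothetical, the charset label must be one that Python's codecs recognize, the declaration is located with simple regular expressions rather than a real XML parser, and any existing encoding pseudo-attribute is replaced because the charset parameter is authoritative.

```python
import re

# XML declaration at the very start of the entity, e.g. <?xml version="1.0"?>
_XML_DECL = re.compile(r"^<\?xml(\s[^?]*)?\?>")
_ENCODING_ATTR = re.compile(r"""\s+encoding\s*=\s*(["'])[^"']*\1""")
_VERSION_ATTR = re.compile(r"""version\s*=\s*(["'])[^"']*\1""")


def embed_encoding_declaration(data: bytes, charset: str) -> bytes:
    """Make a stored XML entity self-describing before its MIME header is dropped."""
    charset = charset.lower()
    if charset in ("utf-8", "utf-16"):
        return data  # XML processors can identify these without a declaration
    text = data.decode(charset)
    match = _XML_DECL.match(text)
    if match:
        # The charset parameter is authoritative, so drop any existing encoding.
        decl = _ENCODING_ATTR.sub("", match.group(0))
        version = _VERSION_ATTR.search(decl)
        at = version.end() if version else len("<?xml")
        decl = decl[:at] + f' encoding="{charset}"' + decl[at:]
        text = decl + text[match.end():]
    else:
        # No declaration at all: prepend a complete one.
        text = f'<?xml version="1.0" encoding="{charset}"?>' + text
    return text.encode(charset)
```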
Encoding considerations:
This media type MAY be encoded as appropriate for the charset and the capabilities of the underlying MIME transport. For 7-bit transports, data in both UTF-8 and UTF-16 is encoded in quoted-printable or base64. For 8-bit clean transport (e.g., ESMTP, 8BITMIME, or NNTP), UTF-8 is not encoded, but UTF-16 is base64 encoded. For binary clean transports (e.g., HTTP), no content-transfer-encoding is necessary.
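The transport rules above can be expressed as a small decision function. This Python sketch is hypothetical: the transport classes "7bit", "8bit" and "binary" are labels chosen for illustration, the set of 7-bit charsets is an assumption, and where the text allows either quoted-printable or base64 for UTF-8 the sketch simply picks quoted-printable. The identity values 7bit, 8bit and binary mean that no transformation is applied.

```python
# Charsets assumed to use only 7-bit octets (illustrative, not exhaustive).
SEVEN_BIT_CHARSETS = {"us-ascii", "iso-2022-kr"}


def content_transfer_encoding(charset: str, transport: str) -> str:
    """Pick a Content-Transfer-Encoding for an XML entity.

    `transport` is a hypothetical class label: "7bit" (e.g. SMTP),
    "8bit" (e.g. ESMTP with 8BITMIME, NNTP) or "binary" (e.g. HTTP).
    """
    if transport not in ("7bit", "8bit", "binary"):
        raise ValueError(f"unknown transport class: {transport!r}")
    charset = charset.lower()
    if transport == "binary":
        return "binary"  # binary clean: no transformation needed
    if charset in SEVEN_BIT_CHARSETS:
        return "7bit"  # already safe on any transport
    if charset == "utf-16":
        return "base64"  # NUL, CR and LF octets would not survive 7-bit or 8-bit paths
    if transport == "8bit":
        return "8bit"  # UTF-8 and other octet-oriented charsets pass unencoded
    return "quoted-printable"  # base64 would also be acceptable here
```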
Security considerations:
See section 4 below.
Interoperability considerations:
XML has proven to be interoperable across WebDAV clients and servers, and for import and export from multiple XML authoring tools.
Published specification: see [REC-XML]
Applications which use this media type:
XML is device-, platform-, and vendor-neutral and is supported by a wide range of Web user agents, WebDAV clients and servers, as well as XML authoring tools.
Additional information:
Magic number(s): none
Although no byte sequences can be counted on to always be present, XML entities in ASCII-compatible charsets (including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"). For more information, see Appendix F of [REC-XML].
File extension(s): .xml, .dtd
Macintosh File Type Code(s): "TEXT"
Person & email address for further information:
Dan Connolly <connolly@w3.org>
Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp>
Intended usage: COMMON
Author/Change controller:
The XML specification is a work product of the World Wide Web Consortium's XML Working Group, and was edited by:
Tim Bray <tbray@textuality.com>
Jean Paoli <jeanpa@microsoft.com>
C. M. Sperberg-McQueen <cmsmcq@uic.edu>
The W3C and the W3C XML Working Group have change control over the XML specification.
3.2 Application/xml Registration
MIME media type name: application
MIME subtype name: xml
Mandatory parameters: none
Optional parameters: charset
Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML entity. The charset parameter can also be used to provide protocol-specific operations, such as charset-based content negotiation in HTTP.
"UTF-8" [RFC-2279] and "UTF-16" (Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]) are the recommended values, representing the UTF-8 and UTF-16 charsets, respectively. These charsets are preferred since they are supported by all conforming XML processors [REC-XML].
If an application/xml entity is received where the charset parameter is omitted, no information is being provided about the charset by the MIME Content-Type header. Conforming XML processors MUST follow the requirements in section 4.3.3 of [REC-XML] which directly address this contingency. However, MIME processors which are not XML processors should not assume a default charset if the charset parameter is omitted from an application/xml entity.
Since the charset parameter is authoritative, the charset is not always declared within an XML encoding declaration. Thus, special care is needed when the recipient strips the MIME header and provides persistent storage of the received XML entity (e.g., in a file system). Unless the charset is UTF-8 or UTF-16, the recipient SHOULD also persistently store information about the charset, perhaps by embedding a correct XML encoding declaration within the XML entity.
Encoding considerations:
This media type MAY be encoded as appropriate for the charset and the capabilities of the underlying MIME transport. For 7-bit transports, data in both UTF-8 and UTF-16 is encoded in quoted-printable or base64. For 8-bit clean transport (e.g., ESMTP, 8BITMIME, or NNTP), UTF-8 is not encoded, but UTF-16 is base64 encoded. For binary clean transport (e.g., HTTP), no content-transfer-encoding is necessary.
Security considerations:
See section 4 below.
Interoperability considerations:
XML has proven to be interoperable for import and export from multiple XML authoring tools.
Published specification: see [REC-XML]
Applications which use this media type:
XML is device-, platform-, and vendor-neutral and is supported by a wide range of Web user agents and XML authoring tools.
Additional information:
Magic number(s): none
Although no byte sequences can be counted on to always be present, XML entities in ASCII-compatible charsets (including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"), and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00 3F 00 78 00 6D 00 6C or FF FE 3C 00 3F 00 78 00 6D 00 6C 00 (the Byte Order Mark (BOM) followed by "<?xml", in big-endian and little-endian order respectively). For more information, see Appendix F of [REC-XML].
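A content sniffer might test for the byte sequences listed above. In this Python sketch, `has_xml_signature` and `XML_SIGNATURES` are hypothetical names; as the registration says, none of these prefixes is guaranteed, so a match can serve only as a hint and a miss proves nothing.

```python
# Byte prefixes listed in the registration. None is guaranteed to be present.
XML_SIGNATURES = (
    b"\x3c\x3f\x78\x6d\x6c",  # "<?xml" in an ASCII-compatible charset, e.g. UTF-8
    b"\xfe\xff\x00\x3c\x00\x3f\x00\x78\x00\x6d\x00\x6c",  # BOM + "<?xml", UTF-16 big-endian
    b"\xff\xfe\x3c\x00\x3f\x00\x78\x00\x6d\x00\x6c\x00",  # BOM + "<?xml", UTF-16 little-endian
)


def has_xml_signature(data: bytes) -> bool:
    """Return True if data starts with one of the known XML signatures (a hint only)."""
    return data.startswith(XML_SIGNATURES)
```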
File extension(s): .xml, .dtd
Macintosh File Type Code(s): "TEXT"
Person & email address for further information:
Dan Connolly <connolly@w3.org>
Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp>
Intended usage: COMMON
Author/Change controller:
The XML specification is a work product of the World Wide Web Consortium's XML Working Group, and was edited by:
Tim Bray <tbray@textuality.com>
Jean Paoli <jeanpa@microsoft.com>
C. M. Sperberg-McQueen <cmsmcq@uic.edu>
The W3C and the W3C XML Working Group have change control over the XML specification.
4 Security Considerations
XML, as a subset of SGML, has the same security considerations as specified in [RFC-1874].
To paraphrase section 3 of [RFC-1874], XML entities contain information to be parsed and processed by the recipient's XML system. These entities may contain explicit system-level commands, and such systems may permit those commands to be executed while processing the data. To the extent that an XML system will execute arbitrary command strings, recipients of XML entities may be at risk. In general, it may be possible to specify commands that perform unauthorized file operations or make changes to the display processor's environment that affect subsequent operations.
Use of XML is expected to be varied and widespread. A wide range of communities are examining XML for use as a common syntax for community-specific metadata. For example, the Dublin Core group is using XML for document metadata, and a new effort has begun that is considering the use of XML for medical information. Other groups view XML as a mechanism for marshalling parameters for remote procedure calls. More uses of XML will undoubtedly arise.
Security considerations will vary by domain of use. For example, XML medical records will have much more stringent privacy and security considerations than XML library metadata. Similarly, use of XML as a parameter marshalling syntax necessitates a case-by-case security review.
XML may also have some of the same security concerns as plain text. Like plain text, XML can contain escape sequences which, when displayed, have the potential to change the display processor environment in ways that adversely affect subsequent operations. Possible effects include, but are not limited to, locking the keyboard, changing display parameters so subsequent displayed text is unreadable, or even changing display parameters to deliberately obscure or distort subsequent displayed material so that its meaning is lost or altered. Display processors should either filter such material from displayed text or else make sure to reset all important settings after a given display operation is complete.
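As a minimal illustration of the filtering option, a display processor could replace control characters before rendering text on a terminal. This Python sketch is an example under assumptions, not a complete defence: the helper name is hypothetical, and it treats C0 controls other than TAB, LF and CR, plus DEL and the C1 range, as unsafe. XML 1.0 already excludes most C0 controls from documents, but it permits C1 characters, and that range contains the single-character Control Sequence Introducer U+009B. The sketch substitutes U+FFFD so that any tampering stays visible.

```python
import re

# C0 controls except TAB (U+0009), LF (U+000A) and CR (U+000D); DEL; and the C1 range.
_UNSAFE_FOR_DISPLAY = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def sanitize_for_display(text: str) -> str:
    """Replace characters that could start terminal control sequences with U+FFFD."""
    return _UNSAFE_FOR_DISPLAY.sub("\ufffd", text)
```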
Some terminal devices have keys whose output, when pressed, can be changed by sending the display processor a character sequence. If this is possible, the display of a text object containing such character sequences could reprogram keys to perform some illicit or dangerous action when the key is subsequently pressed by the user. In some cases, keys can not only be programmed but also be triggered remotely, making it possible for a text display operation to directly perform some unwanted action. Key programming should therefore be blocked, either by filtering out such sequences or by disabling the ability to program keys entirely.
Note that it is also possible to construct XML documents that use what XML calls "entity references" (using the XML meaning of the term "entity", which differs from the MIME definition of this term) to produce repeated expansions of text. Recursive expansions are prohibited by [REC-XML], and XML processors are required to detect them. However, even non-recursive expansions may exhaust a computer's finite resources if they are performed many times, for example when entities are nested so that each level multiplies the size of the level below it.
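To see why non-recursive expansion still matters, consider nested entities in which each level references the level below it ten times. The Python sketch below computes the expanded length without building the string, and rejects both recursion and oversized results. The entity table format (a plain dict of name to replacement text), the helper name and the 10,000,000-character limit are all assumptions for illustration; real XML processors apply their own limits.

```python
import re

# General entity reference such as &name; (character references like &#60; do not match).
_ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")
_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}


def expanded_length(entities: dict, name: str, limit: int = 10_000_000) -> int:
    """Return the fully expanded length of entity `name`, or raise ValueError.

    `entities` maps entity names to replacement text; undeclared names raise KeyError.
    """
    memo = {}
    in_progress = set()

    def length(n: str) -> int:
        if n in memo:
            return memo[n]
        if n not in entities and n in _PREDEFINED:
            return 1  # each predefined entity expands to a single character
        if n in in_progress:
            raise ValueError(f"recursive reference to &{n};")
        in_progress.add(n)
        text = entities[n]
        total = len(_ENTITY_REF.sub("", text))  # literal characters
        for ref in _ENTITY_REF.findall(text):
            total += length(ref)
            if total > limit:
                raise ValueError(f"&{n}; expands to more than {limit} characters")
        in_progress.discard(n)
        memo[n] = total
        return total

    return length(name)


# Ten copies per level: level k expands to 3 * 10**k characters.
lol = {"l0": "lol"}
for k in range(1, 10):
    lol[f"l{k}"] = f"&l{k - 1};" * 10
```

With this table, `expanded_length(lol, "l3")` is small, while `expanded_length(lol, "l9")` is rejected long before anything close to three billion characters is materialized.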
5 The Byte Order Mark (BOM) and Conversions to/from UTF-16
The XML Recommendation, in section 4.3.3, specifies that UTF-16 XML entities must begin with a byte order mark (BOM), which is the ZERO WIDTH NO-BREAK SPACE character, U+FEFF. When serialized, the BOM is the octet sequence FE FF in big-endian order or FF FE in little-endian order. The XML Recommendation further states that the BOM is an encoding signature and is not part of either the markup or the character data of the XML document.
Due to the BOM, applications which convert XML from the UTF-16 encoding to another encoding SHOULD strip the BOM before conversion. Similarly, when converting from another encoding into UTF-16, the BOM SHOULD be added after conversion is complete.
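The BOM handling described above might look like this in Python. It is a sketch under assumptions: the helper names `from_utf16` and `to_utf16` are hypothetical, the caller is assumed to know the source or target charset, and neither function rewrites an encoding declaration inside the entity, which a real converter would also need to update.

```python
import codecs


def from_utf16(data: bytes, target: str) -> bytes:
    """Convert a UTF-16 XML entity to `target`, stripping the BOM first."""
    if data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    elif data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    else:
        raise ValueError("UTF-16 XML entity must begin with a BOM")
    return text.encode(target)


def to_utf16(data: bytes, source: str, big_endian: bool = True) -> bytes:
    """Convert an XML entity from `source` to UTF-16, adding the BOM after conversion."""
    text = data.decode(source)
    if big_endian:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```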
6 Examples
The examples below give the value of the Content-type MIME header and the XML declaration (which includes the encoding declaration) inside the XML entity. For UTF-16 examples, the Byte Order Mark character is denoted as "{BOM}", and the XML declaration is assumed to come at the beginning of the XML entity, immediately following the BOM. Note that other MIME headers may be present, and the XML entity may contain other data in addition to the XML declaration; the examples focus on the Content-type header and the encoding declaration for clarity.
6.1 text/xml with UTF-8 Charset
Content-type: text/xml; charset="utf-8"
<?xml version="1.0" encoding="utf-8"?>
This is the recommended charset value for use with text/xml. Since the charset parameter is provided, MIME and XML processors must treat the enclosed entity as UTF-8 encoded.
If sent using a 7-bit transport (e.g., SMTP), the XML entity must use a content-transfer-encoding of either quoted-printable or base64. For an 8-bit clean transport (e.g., ESMTP, 8BITMIME, or NNTP) or a binary clean transport (e.g., HTTP), no content-transfer-encoding is necessary.
6.2 text/xml with UTF-16 Charset
Content-type: text/xml; charset="utf-16"
{BOM}<?xml version='1.0' encoding='utf-16'?>
This is possible only when the XML entity is transmitted via HTTP, which uses a MIME-like mechanism and is a binary-clean protocol, hence does not perform CR and LF transformations and allows NUL octets. This differs from typical text MIME type processing (see section 19.4.1 of HTTP 1.1 [RFC-2068] for details).
Since HTTP is binary clean, no content-transfer-encoding is necessary.
6.3 text/xml with ISO-2022-KR Charset
Content-type: text/xml; charset="iso-2022-kr"
<?xml version="1.0" encoding='iso-2022-kr'?>
This example shows text/xml with a Korean charset (e.g., Hangul) encoded following the specification in [RFC-1557]. Since the charset parameter is provided, MIME and XML processors must treat the enclosed entity as encoded per [RFC-1557].
Since ISO-2022-KR has been defined to use only 7 bits of data, no content-transfer-encoding is necessary with any transport.
6.4 text/xml with Omitted Charset
Content-type: text/xml
{BOM}<?xml version="1.0" encoding="utf-16"?>
This example shows text/xml with the charset parameter omitted. In this case, MIME and XML processors must assume the charset is "us-ascii", the default charset value for text media types specified in [RFC-2046]. The default of "us-ascii" holds even if the text/xml entity is transported using HTTP.
Omitting the charset parameter is NOT RECOMMENDED for text/xml. For example, even if the contents of the XML entity are UTF-16 or UTF-8, or the XML entity has an explicit encoding declaration, XML and MIME processors must assume the charset is "us-ascii".
6.5 application/xml with UTF-16 Charset
Content-type: application/xml; charset="utf-16"
{BOM}<?xml version="1.0"?>
This is a recommended charset value for use with application/xml. Since the charset parameter is provided, MIME and XML processors must treat the enclosed entity as UTF-16 encoded.
If sent using a 7-bit transport (e.g., SMTP) or an 8-bit clean transport (e.g., ESMTP, 8BITMIME, or NNTP), the XML entity must be encoded in quoted-printable or base64. For a binary clean transport (e.g., HTTP), no content-transfer-encoding is necessary.
6.6 application/xml with ISO-2022-KR Charset
Content-type: application/xml; charset="iso-2022-kr"
<?xml version="1.0" encoding="iso-2022-kr"?>
This example shows application/xml with a Korean charset (e.g., Hangul) encoded following the specification in [RFC-1557]. Since the charset parameter is provided, MIME and XML processors must treat the enclosed entity as encoded per [RFC-1557], independent of whether the XML entity has an internal encoding declaration (this example does show such a declaration, which agrees with the charset parameter).
Since ISO-2022-KR has been defined to use only 7 bits of data, no content-transfer-encoding is necessary with any transport.
6.7 application/xml with Omitted Charset and UTF-16 XML Entity
Content-type: application/xml
{BOM}<?xml version='1.0'?>
For this example, the XML entity begins with a BOM. Since the charset has been omitted, a conforming XML processor follows the requirements of [REC-XML], section 4.3.3. Specifically, the XML processor reads the BOM, and thus knows deterministically that the charset encoding is UTF-16.
An XML-unaware MIME processor should make no assumptions about the charset of the XML entity.
6.8 application/xml with Omitted Charset and UTF-8 Entity
Content-type: application/xml
<?xml version='1.0'?>
In this example, the charset parameter has been omitted, and there is no BOM. Since there is no BOM, the XML processor follows the requirements in section 4.3.3, and optionally applies the mechanism described in Appendix F (which is non-normative) of [REC-XML], to determine that the charset encoding is UTF-8. The XML entity does not contain an encoding declaration, but since the encoding is UTF-8, it is still a conforming XML entity.
An XML-unaware MIME processor should make no assumptions about the charset of the XML entity.
6.9 application/xml with Omitted Charset and Internal Encoding Declaration
Content-type: application/xml
<?xml version='1.0' encoding="ISO-10646-UCS-4"?>
In this example, the charset parameter has been omitted, and there is no BOM. However, the XML entity does have an encoding declaration inside the XML entity which specifies the entity's charset. Following the requirements in section 4.3.3, and optionally applying the mechanism described in appendix F (non-normative) of [REC-XML], the XML processor determines the charset encoding of the XML entity (in this example, UCS-4).
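The detection that examples 6.7 through 6.9 rely on can be approximated from the first octets of the entity. This Python sketch follows the table in Appendix F of [REC-XML] as closely as a short example allows. The function name and the returned family labels (for instance "ucs-4-be") are hypothetical; EBCDIC and the unusual UCS-4 octet orders are reported without further analysis or not at all; and a real XML processor would go on to read the encoding declaration in whatever family it detects.

```python
import re

# First-octet patterns from the non-normative Appendix F of [REC-XML].
_PREFIXES = (
    (b"\xfe\xff", "utf-16-be"),          # BOM, big-endian
    (b"\xff\xfe", "utf-16-le"),          # BOM, little-endian
    (b"\x00\x00\x00\x3c", "ucs-4-be"),
    (b"\x3c\x00\x00\x00", "ucs-4-le"),
    (b"\x00\x3c\x00\x3f", "utf-16-be"),  # no BOM: strictly speaking in error
    (b"\x3c\x00\x3f\x00", "utf-16-le"),  # no BOM: strictly speaking in error
    (b"\x4c\x6f\xa7\x94", "ebcdic"),
)
_ENCODING_DECL = re.compile(
    rb"""^<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(entity: bytes) -> str:
    """Guess the encoding of an application/xml entity received without a charset."""
    for prefix, family in _PREFIXES:
        if entity.startswith(prefix):
            return family
    if entity.startswith(b"<?xm"):
        # ASCII-compatible family: an encoding declaration, if present, names the charset.
        match = _ENCODING_DECL.match(entity)
        return match.group(1).decode("ascii").lower() if match else "utf-8"
    return "utf-8"  # no declaration: UTF-8, or a corrupt or wrapped stream
```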
An XML-unaware MIME processor should make no assumptions about the charset of the XML entity.
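For the encoding-declaration case, a processor can use the leading octets to pick a provisional decoding that is just good enough to read the XML declaration, then extract the `encoding` pseudo-attribute. The Python sketch below assumes UCS-4 can be probed with Python's `utf-32` codecs and uses a simplified regular expression for the declaration; the names are hypothetical, and mapping the returned name (for example "ISO-10646-UCS-4") to an actual decoder is left out.

```python
import re
from typing import Optional

# Hypothetical probe table: first four octets (no BOM) -> Python codec
# that is sufficient to read the ASCII-only XML declaration.
_PROBE_CODECS = {
    b"\x00\x00\x00\x3c": "utf-32-be",  # UCS-4, big-endian
    b"\x3c\x00\x00\x00": "utf-32-le",  # UCS-4, little-endian
    b"\x00\x3c\x00\x3f": "utf-16-be",
    b"\x3c\x00\x3f\x00": "utf-16-le",
    b"\x3c\x3f\x78\x6d": "ascii",      # any ASCII-compatible encoding
}

# Simplified match for <?xml ... encoding="EncName" ...?>
_ENCODING_DECL = re.compile(r"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(entity: bytes) -> Optional[str]:
    """Return the encoding name from the XML declaration, or None if absent."""
    if entity[:2] in (b"\xfe\xff", b"\xff\xfe"):
        probe = "utf-16"  # the utf-16 codec consumes the BOM itself
    else:
        probe = _PROBE_CODECS.get(entity[:4])
    if probe is None:
        return None
    # Decode only a short prefix; errors are ignored because only the
    # ASCII characters of the declaration matter here.
    head = entity[:512].decode(probe, errors="ignore")
    match = _ENCODING_DECL.match(head)
    return match.group(1) if match else None
```

If the entity in this example is serialized as big-endian UCS-4, its first four octets are 0x00 0x00 0x00 0x3C and `declared_encoding` returns "ISO-10646-UCS-4".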
7 References
[ISO-10646] ISO/IEC, Information Technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane, May 1993.
[ISO-8897] ISO (International Organization for Standardization), ISO 8879:1986(E), Information Processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML), First edition, 1986-10-15.
[REC-XML] Bray, T., Paoli, J., and C. M. Sperberg-McQueen, "Extensible Markup Language (XML)", World Wide Web Consortium Recommendation REC-xml-19980210. http://www.w3.org/TR/1998/REC-xml-19980210.
[RFC-1557] Choi, U., Chon, K., and H. Park, "Korean Character Encoding for Internet Messages", RFC 1557, December 1993.
[RFC-1874] Levinson, E., "SGML Media Types", RFC 1874, December 1995.
[RFC-2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC-2045] Freed, N., and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.
[RFC-2046] Freed, N., and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.
[RFC-2068] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, January 1997.
[RFC-2279] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 2279, January 1998.
[UNICODE] The Unicode Consortium, "The Unicode Standard -- Version 2.0", Addison-Wesley, 1996.
8 Acknowledgements
Chris Newman and Yaron Y. Goland both contributed content to the security considerations section of this document. In particular, some text in the security considerations section is copied verbatim from work in progress, draft-newman-mime-textpara-00, by permission of the author. Chris Newman additionally contributed content to the encoding considerations sections. Dan Connolly contributed content discussing when to use text/xml. Discussions with Ned Freed and Dan Connolly helped refine the authors' understanding of the text media type; feedback from Larry Masinter was also very helpful in understanding media type registration issues.
Members of the W3C XML Working Group and XML Special Interest Group have made significant contributions to this document, and the authors would like to specially recognize James Clark, Martin Duerst, Rick Jelliffe, and Gavin Nicol for their many thoughtful comments.
9 Addresses of Authors
E. James Whitehead, Jr.
Dept. of Information and Computer Science
University of California, Irvine
Irvine, CA 92697-3425
EMail: ejw@ics.uci.edu
Murata Makoto (Family Given)
Fuji Xerox Information Systems
KSP 9A7, 2-1, Sakado 3-chome, Takatsu-ku
Kawasaki-shi, Kanagawa-ken, 213
Japan
EMail: murata@fxis.fujixerox.co.jp
10 Full Copyright Statement
Copyright (C) The Internet Society (1998). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.