Network Working Group                                           J. Boyer
Request for Comments: 3076                       PureEdge Solutions Inc.
Category: Informational                                       March 2001
        
Canonical XML Version 1.0

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

Any XML (Extensible Markup Language) document is part of a set of XML documents that are logically equivalent within an application context, but which vary in physical representation based on syntactic changes permitted by XML 1.0 and Namespaces in XML. This specification describes a method for generating a physical representation, the canonical form, of an XML document that accounts for the permissible changes. Except for limitations regarding a few unusual cases, if two documents have the same canonical form, then the two documents are logically equivalent within the given application context. Note that two documents may have differing canonical forms yet still be equivalent in a given context based on application-specific equivalence rules for which no generalized XML specification could account.

Table of Contents

   1. Introduction
   1.1 Terminology
   1.2 Applications
   1.3 Limitations
   2. XML Canonicalization
   2.1 Data Model
   2.2 Document Order
   2.3 Processing Model
   2.4 Document Subsets
   3. Examples of XML Canonicalization
   3.1 PIs, Comments, and Outside of Document Element
   3.2 Whitespace in Document Content
   3.3 Start and End Tags
   3.4 Character Modifications and Character References
   3.5 Entity References
   3.6 UTF-8 Encoding
   3.7 Document Subsets
   4. Resolutions
   4.1 No XML Declaration
   4.2 No Character Model Normalization
   4.3 Handling of Whitespace Outside Document Element
   4.4 No Namespace Prefix Rewriting
   4.5 Order of Namespace Declarations and Attributes
   4.6 Superfluous Namespace Declarations
   4.7 Propagation of Default Namespace Declaration in Document Subsets
   4.8 Sorting Attributes by Namespace URI
   Security Considerations
   References
   Author's Address
   Acknowledgements
   Full Copyright Statement
        
1. Introduction

The XML 1.0 Recommendation [XML] specifies the syntax of a class of resources called XML documents. The Namespaces in XML Recommendation [Names] specifies additional syntax and semantics for XML documents. It is possible for XML documents which are equivalent for the purposes of many applications to differ in physical representation. For example, they may differ in their entity structure, attribute ordering, and character encoding. It is the goal of this specification to establish a method for determining whether two documents are identical, or whether an application has not changed a document, except for transformations permitted by XML 1.0 and Namespaces.

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [Keywords].

See [Names] for the definition of QName.

A document subset is a portion of an XML document indicated by a node-set that may not include all of the nodes in the document.

The canonical form of an XML document is the physical representation of the document produced by the method described in this specification. The changes are summarized in the following list:

* The document is encoded in UTF-8
* Line breaks are normalized to #xA on input, before parsing
* Attribute values are normalized, as if by a validating processor
* Character and parsed entity references are replaced
* CDATA sections are replaced with their character content
* The XML declaration and document type declaration (DTD) are removed
* Empty elements are converted to start-end tag pairs
* Whitespace outside of the document element and within start and end tags is normalized
* All whitespace in character content is retained (excluding characters removed during line feed normalization)
* Attribute value delimiters are set to quotation marks (double quotes)
* Special characters in attribute values and character content are replaced by character references
* Superfluous namespace declarations are removed from each element
* Default attributes are added to each element
* Lexicographic order is imposed on the namespace declarations and attributes of each element
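
As a minimal sketch of the character-reference item above, the following Python functions apply the replacement tables that Section 2.3 of this specification gives for text nodes (ampersand, open and close angle brackets, #xD) and for attribute nodes (ampersand, open angle bracket, quotation mark, #x9, #xA, #xD). The function and table names are illustrative only, and the input is assumed to be already normalized as described in the list.

```python
# Illustrative helpers; not part of any library API.
# "&" is replaced first so the "&" inside later references is not re-escaped.
TEXT_REFS = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ("\r", "&#xD;")]
ATTR_REFS = [
    ("&", "&amp;"), ("<", "&lt;"), ('"', "&quot;"),
    ("\t", "&#x9;"), ("\n", "&#xA;"), ("\r", "&#xD;"),
]


def _apply(value: str, table: list[tuple[str, str]]) -> str:
    """Apply each (character, reference) replacement in table order."""
    for char, ref in table:
        value = value.replace(char, ref)
    return value


def escape_text(value: str) -> str:
    """Escape the string value of a text node for canonical output."""
    return _apply(value, TEXT_REFS)


def escape_attr(value: str) -> str:
    """Escape a normalized attribute value written between double quotes."""
    return _apply(value, ATTR_REFS)


print(escape_text("a < b && c > d\r"))
print(escape_attr('say "hi"\tnow\n'))
```

Note the asymmetry: ">" is escaped in text but not in attribute values, and tab and line feed are written as references only inside attribute values.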

The term canonical XML refers to XML that is in canonical form. The XML canonicalization method is the algorithm defined by this specification that generates the canonical form of a given XML document or document subset. The term XML canonicalization refers to the process of applying the XML canonicalization method to an XML document or document subset.

The XPath 1.0 Recommendation [XPath] defines the term node-set and specifies a data model for representing an input XML document as a set of nodes of various types (element, attribute, namespace, text, comment, processing instruction, and root). The nodes are included in or excluded from a node-set based on the evaluation of an expression. Within this specification, a node-set is used to directly indicate whether or not each node should be rendered in the canonical form (in this sense, it is used as a formal mathematical set). A node that is excluded from the set is not rendered in the canonical form being generated, even if its parent node is included in the node-set. However, an omitted node may still impact the rendering of its descendants (e.g., by augmenting the namespace context of the descendants).

1.2 Applications

Since the XML 1.0 Recommendation [XML] and the Namespaces in XML Recommendation [Names] define multiple syntactic methods for expressing the same information, XML applications tend to take liberties with changes that have no impact on the information content of the document. XML canonicalization is designed to be useful to applications that require the ability to test whether the information content of a document or document subset has been changed. This is done by comparing the canonical form of the original document before application processing with the canonical form of the document that results from the application processing.

For example, a digital signature over the canonical form of an XML document or document subset would allow the signature digest calculations to be oblivious to changes in the original document's physical representation, provided that the changes are defined to be logically equivalent by the XML 1.0 or Namespaces in XML. During signature generation, the digest is computed over the canonical form of the document. The document is then transferred to the relying party, which validates the signature by reading the document and computing a digest of the canonical form of the received document. The equivalence of the digests computed by the signing and relying parties (and hence the equivalence of the canonical forms over which they were computed) ensures that the information content of the document has not been altered since it was signed.
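
The sign-then-verify flow above can be sketched in Python. This is an illustration under stated assumptions, not a conforming implementation: the standard library has no Canonical XML 1.0 canonicalizer, so `xml.etree.ElementTree.canonicalize()` (Python 3.8+), which implements Canonical XML 2.0, stands in for it, and SHA-256 is an arbitrary digest choice. For simple documents such as the one below, both versions sort attributes, use double quotes, and expand empty elements in the same way.

```python
import hashlib
import xml.etree.ElementTree as ET


def canonical_digest(xml_text: str) -> str:
    """Hex SHA-256 digest computed over a canonical form of the document.

    Stand-in only: ET.canonicalize() implements Canonical XML 2.0, not 1.0.
    """
    canonical = ET.canonicalize(xml_text)  # returns str when no `out` is given
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The signer and the relying party hold physically different documents
# that differ only in quoting, attribute order, and empty-element syntax.
signed = "<order id='7'  status=\"new\"/>"
received = '<order status="new" id="7"></order>'

print(canonical_digest(signed) == canonical_digest(received))
```

Any change to the information content, such as a different `id` value, yields a different digest.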

1.3 Limitations

Two XML documents may have differing information content that is nonetheless logically equivalent within a given application context. Although two XML documents are equivalent (aside from limitations given in this section) if their canonical forms are identical, it is not a goal of this work to establish a method such that two XML documents are equivalent if and only if their canonical forms are identical. Such a method is unachievable, in part due to application-specific rules such as those governing unimportant whitespace and equivalent data (e.g., <color>black</color> versus <color>rgb(0,0,0)</color>). There are also equivalencies established by other W3C Recommendations and Working Drafts. Accounting for these additional equivalence rules is beyond the scope of this work. They can be applied by the application or become the subject of future specifications.

The canonical form of an XML document may not be completely operational within the application context, though the circumstances under which this occurs are unusual. This problem may be of concern in certain applications since the canonical form of a document and the canonical form of the canonical form of the document are equivalent. For example, in a digital signature application, the canonical form can be substituted for the original document without changing the digest calculation. However, the security risk only occurs in the unusual circumstances described below, which can all be resolved or at least detected prior to digital signature generation.

The difficulties arise due to the loss of the following information not available in the data model:

1. base URI, especially in content derived from the replacement text of external general parsed entity references
2. notations and external unparsed entity references
3. attribute types in the document type declaration

In the first case, note that a document containing a relative URI [URI] is only operational when accessed from a specific URI that provides the proper base URI. In addition, if the document contains external general parsed entity references to content containing relative URIs, then the relative URIs will not be operational in the canonical form, which replaces the entity reference with internal content (thereby implicitly changing the default base URI of that content). Both of these problems can typically be solved by adding support for the xml:base attribute [XBase] to the application, then adding appropriate xml:base attributes to the document element and all top-level elements in external entities. In addition, applications often have an opportunity to resolve relative URIs prior to the need for a canonical form. For example, in a digital signature application, a document is often retrieved and processed prior to signature generation. The processing SHOULD create a new document in which relative URIs have been converted to absolute URIs, thereby mitigating any security risk for the new document.
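
The pre-signing step recommended above, converting relative URIs to absolute ones, might look like the following Python sketch. It assumes the application already knows which attributes carry URI references; the `URI_ATTRIBUTES` names and the `absolutize_uris` helper are hypothetical, and any xml:base attributes already present in the document are ignored for brevity. Resolution against the base URI uses the standard `urllib.parse.urljoin`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical: the attributes this application treats as URI references.
URI_ATTRIBUTES = ("href", "src")


def absolutize_uris(xml_text: str, base_uri: str) -> ET.Element:
    """Return a new tree whose URI-valued attributes are absolute."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # includes the root element itself
        for name in URI_ATTRIBUTES:
            value = element.get(name)
            if value is not None:
                # Absolute references are returned unchanged by urljoin.
                element.set(name, urljoin(base_uri, value))
    return root


doc = absolutize_uris('<doc><img src="img/b.png"/></doc>',
                      "http://example.com/docs/a.xml")
print(doc.find("img").get("src"))
```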

In the second case, the loss of external unparsed entity references and the notations that bind them to applications means that canonical forms cannot properly distinguish among XML documents that incorporate unparsed data via this mechanism. This is an unusual case precisely because most XML processors currently discard the document type declaration, which discards the notation, the entity's binding to a URI, and the attribute type that binds the attribute value to an entity name. For documents that must be subjected to more than one XML processor, the XML design typically indicates a reference to unparsed data using a URI in the attribute value.

In the third case, the loss of attribute types can affect the canonical form in different ways depending on the type. Attributes of type ID cease to be ID attributes. Hence, any XPath expressions that refer to the canonical form using the id() function cease to operate. The attribute types ENTITY and ENTITIES are not part of this case; they are covered in the second case above. Attributes of enumerated type and of type ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, and NOTATION fail to be appropriately constrained during future attempts to change the attribute value if the canonical form replaces the original document during application processing. Applications can avoid the difficulties of this case by ensuring that an appropriate document type declaration is prepended prior to using the canonical form in further XML processing. This is likely to be an easy task since attribute lists are usually acquired from a standard external DTD subset, and any entity and notation declarations not also in the external DTD subset are typically constructed from application configuration information and added to the internal DTD subset.

While these limitations are not severe, it would be possible to resolve them in a future version of XML canonicalization if, for example, a new version of XPath were created based on the XML Information Set [Infoset] currently under development at the W3C.

2. XML Canonicalization
2.1 Data Model

The data model defined in the XPath 1.0 Recommendation [XPath] is used to represent the input XML document or document subset. Implementations SHOULD but need not be based on an XPath implementation. XML canonicalization is defined in terms of the XPath definition of a node-set, and implementations MUST produce equivalent results.

The first parameter of input to the XML canonicalization method is either an XPath node-set or an octet stream containing a well-formed XML document. Implementations MUST support the octet stream input and SHOULD also support the document subset feature via node-set input. For the purpose of describing canonicalization in terms of an XPath node-set, this section describes how an octet stream is converted to an XPath node-set.

The second parameter of input to the XML canonicalization method is a boolean flag indicating whether or not comments should be included in the canonical form output by the XML canonicalization method. If a canonical form contains comments corresponding to the comment nodes in the input node-set, the result is called canonical XML with comments. Note that the XPath data model does not create comment nodes for comments appearing within the document type declaration (DTD). Implementations are REQUIRED to be capable of producing canonical XML excluding all comments that may have appeared in the input document or document subset. Support for canonical XML with comments is RECOMMENDED.
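
The effect of the boolean comment parameter can be illustrated with Python's `xml.etree.ElementTree.canonicalize()` (Python 3.8+), whose `with_comments` keyword plays a similar role. Note the assumption: that function implements Canonical XML 2.0, not this specification, so the sketch only shows the effect of the flag and is not a conforming implementation.

```python
import xml.etree.ElementTree as ET

doc = "<doc><!-- audit note --><item/></doc>"

# Canonical XML (comments excluded; this is the default).
without_comments = ET.canonicalize(doc)
# Canonical XML with comments.
with_comments = ET.canonicalize(doc, with_comments=True)

print(without_comments)
print(with_comments)
```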

If an XML document must be converted to a node-set, XPath REQUIRES that an XML processor be used to create the nodes of its data model to fully represent the document. The XML processor performs the following tasks in order:

1. normalize line feeds
2. normalize attribute values
3. replace CDATA sections with their character content
4. resolve character and parsed entity references
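
The first two steps can be sketched in Python following the end-of-line handling and attribute-value normalization rules of XML 1.0. The sketch is simplified: it works on literal text, ignores character and entity references (a real processor expands them during these steps, and characters produced by character references are appended without whitespace replacement), and leaves the choice between the CDATA and tokenized rules to the caller, whereas a validating processor takes it from the attribute's declared type. The function names are hypothetical.

```python
import re


def normalize_line_feeds(text: str) -> str:
    """Step 1: translate #xD#xA pairs and lone #xD characters to #xA."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_cdata_attr(literal: str) -> str:
    """Step 2 for CDATA attributes: each #x20, #xD, #xA, or #x9 becomes #x20.

    Assumes a literal value with no character or entity references.
    """
    return re.sub(r"[ \t\r\n]", " ", literal)


def normalize_tokenized_attr(literal: str) -> str:
    """Step 2 for non-CDATA types (e.g. NMTOKENS, IDREFS): additionally
    trim leading/trailing spaces and collapse runs of spaces to one."""
    return re.sub(" +", " ", normalize_cdata_attr(literal)).strip(" ")


print(repr(normalize_line_feeds("a\r\nb\rc\n")))
print(repr(normalize_tokenized_attr("  x   y\t")))
```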

The input octet stream MUST contain a well-formed XML document, but the input need not be validated. However, the attribute value normalization and entity reference resolution MUST be performed in accordance with the behaviors of a validating XML processor. As well, nodes for default attributes (declared in the ATTLIST with an AttValue but not specified) are created in each element. Thus, the declarations in the document type declaration are used to help create the canonical form, even though the document type declaration is not retained in the canonical form.

The XPath data model represents data using UCS characters. Implementations MUST use XML processors that support UTF-8 and UTF-16 and translate to the UCS character domain. For UTF-16, the leading byte order mark is treated as an artifact of encoding and stripped from the UCS character data (subsequent zero width non-breaking spaces appearing within the UTF-16 data are not removed) [UTF-16, Section 3.2]. Support for ISO-8859-1 encoding is RECOMMENDED, and all other character encodings are OPTIONAL.
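
A minimal Python sketch of the UTF-16 rule, assuming the input is a complete UTF-16 document that starts with a byte order mark: the standard `utf-16` codec uses the leading mark to choose the byte order and drops it, while any later U+FEFF characters are kept. The helper names are hypothetical. The same sketch shows the output side: the canonical form is UTF-8 without a byte order mark, which is what Python's plain `utf-8` codec produces (the `utf-8-sig` codec would add one).

```python
import codecs


def decode_utf16_document(data: bytes) -> str:
    """Decode UTF-16 input, removing only the leading byte order mark."""
    if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        raise ValueError("sketch expects UTF-16 input with a leading BOM")
    # The "utf-16" codec reads the byte order from the BOM and strips it;
    # later U+FEFF (zero width no-break space) characters remain in the text.
    return data.decode("utf-16")


def encode_canonical(text: str) -> bytes:
    """Canonical form is UTF-8; the plain "utf-8" codec writes no BOM."""
    return text.encode("utf-8")


raw = codecs.BOM_UTF16_LE + "<a>\ufeffx</a>".encode("utf-16-le")
text = decode_utf16_document(raw)
print(repr(text), encode_canonical(text))
```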

All whitespace within the root document element MUST be preserved (except for any #xD characters deleted by line delimiter normalization). This includes all whitespace in external entities. Whitespace outside of the root document element MUST be discarded.

In the XPath data model, there exist the following node types: root, element, comment, processing instruction, text, attribute and namespace. There exists a single root node whose children are processing instruction nodes and comment nodes to represent information outside of the document element (and outside of the document type declaration). The root node also has a single element node representing the top-level document element. Each element node can have child nodes of type element, text, processing instruction, and comment. The attributes and namespaces associated with an element are not considered to be child nodes of the element, but they are associated with the element by inclusion in the element's attribute and namespace axes. Note that attribute and namespace axes may not directly correspond to the text appearing in the element's start tag in the original document.

Note: An element has attribute nodes to represent the non-namespace attribute declarations appearing in its start tag as well as nodes to represent the default attributes.

By virtue of the XPath data model, XML canonicalization is namespace-aware [Names]. However, it cannot and therefore does not account for namespace equivalencies using namespace prefix rewriting (see explanation in Section 4). In the XPath data model, each element and attribute has a name returned by the function name() which can, at the discretion of the application, be the QName appearing in the original document. XML canonicalization REQUIRES that the XML processor retain sufficient information such that the QName of the element as it appeared in the original document can be provided.

Note: An element E has namespace nodes that represent its namespace declarations as well as any namespace declarations made by its ancestors that have not been overridden in E's declarations, the default namespace if it is non-empty, and the declaration of the prefix xml.

Note: This specification supports the recent XML plenary decision to deprecate relative namespace URIs as follows: implementations of XML canonicalization MUST report an operation failure on documents containing relative namespace URIs. XML canonicalization MUST NOT be implemented with an XML parser that converts relative URIs to absolute URIs.
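
One way to report the required failure on relative namespace URIs is sketched below in Python, using the `start-ns` events of `xml.etree.ElementTree.iterparse` to see each namespace declaration as written. Treating a URI without a scheme as relative is a simplifying assumption, and the `RelativeNamespaceError` class is hypothetical. An empty value (`xmlns=""`) undeclares the default namespace and is not treated as relative.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


class RelativeNamespaceError(ValueError):
    """Hypothetical error used to report the operation failure."""


def check_namespace_uris(xml_bytes: bytes) -> None:
    """Raise RelativeNamespaceError if any namespace URI is relative."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, uri) in events:
        # An empty URI undeclares the default namespace; it is not relative.
        if uri and not urlsplit(uri).scheme:
            raise RelativeNamespaceError(
                f"relative namespace URI {uri!r} (prefix {prefix!r})")


check_namespace_uris(b'<a xmlns="http://example.com/ns"><b xmlns=""/></a>')
```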

Character content is represented in the XPath data model with text nodes. All consecutive characters are placed into a single text node. Furthermore, the text node's characters are represented in the UCS character domain. The XML canonicalization method does not perform character model normalization (see explanation in Section 4). However, the XML processor used to prepare the XPath data model input is REQUIRED to use Normalization Form C [NFC, NFC-Corrigendum] when converting an XML document to the UCS character domain from any encoding that is not UCS-based (currently, UCS-based encodings include UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, and UCS-4).
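
A minimal Python sketch of this requirement, assuming Python codec names identify the input encoding: text decoded from a UCS-based encoding is passed through unchanged, and text decoded from any other encoding is converted to Normalization Form C with `unicodedata.normalize`. Python has no codec named UCS-2 or UCS-4, so the UTF-32 codecs stand in for UCS-4 here; the helper name `to_ucs` is hypothetical.

```python
import codecs
import unicodedata

# Python codec names treated as UCS-based (UTF-32 approximates UCS-4;
# Python has no separate UCS-2 codec).
UCS_BASED = {"utf-8", "utf-16", "utf-16-be", "utf-16-le",
             "utf-32", "utf-32-be", "utf-32-le"}


def to_ucs(data: bytes, encoding: str) -> str:
    """Decode input, applying NFC only for encodings that are not UCS-based."""
    name = codecs.lookup(encoding).name  # canonical codec name, e.g. "utf-8"
    text = data.decode(name)
    if name in UCS_BASED:
        return text  # canonicalization itself performs no normalization
    return unicodedata.normalize("NFC", text)


# Windows-1258 encodes some accents as combining marks, so the decoded
# "e" + U+0301 is composed to U+00E9 by NFC.
print(to_ucs(b"e\xec", "cp1258") == "\u00e9")
```

Decomposed input that arrives in UTF-8, by contrast, is left decomposed.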

Since XML canonicalization converts an XPath node-set into a canonical form, the first parameter MUST either be an XPath node-set or it must be converted from an octet stream to a node-set by performing the XML processing necessary to create the XPath nodes described above, then setting an initial XPath evaluation context of:

* A context node, initialized to the root node of the input XML document.
* A context position, initialized to 1.
* A context size, initialized to 1.
* Any library of functions conforming to the XPath Recommendation.
* An empty set of variable bindings.
* An empty set of namespace declarations.

and evaluating the following default expression:

     Comment Parameter Value    Default XPath Expression
     -----------------------    ------------------------

     Without (false):
                      (//. | //@* | //namespace::*)[not(self::comment())]

     With (true):
                      (//. | //@* | //namespace::*)

The expressions in this table generate a node-set containing every node of the XML document (except the comments if the comment parameter value is false).

If the input is an XPath node-set, then the node-set must explicitly contain every node to be rendered to the canonical form. For example, the result of the XPath expression id("E") is a node-set containing only the node corresponding to the element with an ID attribute value of "E". Since none of its descendant nodes, attribute nodes and namespace nodes are in the set, the canonical form would consist solely of the element's start and end tags, less the attribute and namespace declarations, with no internal content. Section 3.7 exemplifies how to serialize an identified element along with its internal content, attributes and namespace declarations.

2.2 Document Order

Although an XPath node-set is defined to be unordered, the XPath 1.0 Recommendation [XPath] defines the term document order to be the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities, except for namespace and attribute nodes whose document order is application-dependent.

The XML canonicalization method processes a node-set by imposing the following additional document order rules on the namespace and attribute nodes of each element:

* An element's namespace and attribute nodes have a document order position greater than the element but less than any child node of the element.
* Namespace nodes have a lesser document order position than attribute nodes.
* An element's namespace nodes are sorted lexicographically by local name (the default namespace node, if one exists, has no local name and is therefore lexicographically least).
* An element's attribute nodes are sorted lexicographically with namespace URI as the primary key and local name as the secondary key (an empty namespace URI is lexicographically least).
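
The two sorting rules can be sketched in Python with plain sort keys. The `NamespaceNode` and `AttributeNode` types are hypothetical stand-ins for XPath namespace and attribute nodes; Python compares `str` values by code point, which matches the lexicographic comparison this section requires.

```python
from typing import NamedTuple


class NamespaceNode(NamedTuple):
    prefix: str  # local name of the namespace node; "" for the default namespace
    uri: str


class AttributeNode(NamedTuple):
    namespace_uri: str  # "" when the attribute is in no namespace
    local_name: str
    value: str


def order_namespace_nodes(nodes):
    """Sort by local name; the default namespace ("") sorts first."""
    return sorted(nodes, key=lambda n: n.prefix)


def order_attribute_nodes(nodes):
    """Namespace URI is the primary key, local name the secondary key."""
    return sorted(nodes, key=lambda a: (a.namespace_uri, a.local_name))


attrs = order_attribute_nodes([
    AttributeNode("http://www.w3.org", "attr", "1"),
    AttributeNode("", "b", "2"),
    AttributeNode("http://www.ietf.org", "attr2", "3"),
    AttributeNode("", "a", "4"),
])
print([(a.namespace_uri, a.local_name) for a in attrs])
```

Because attributes are keyed on namespace URI rather than prefix, the prefix an attribute happens to use in the original document has no effect on its position.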

Lexicographic comparison, which orders strings from least to greatest alphabetically, is based on the UCS codepoint values, which is equivalent to lexicographic ordering based on UTF-8.
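
The equivalence stated above can be checked directly in Python, whose `str` comparison is by code point. For contrast (a point not made in the text above), ordering by UTF-16 code units is not equivalent, because the surrogate pairs that encode characters above U+FFFF sort below characters in the range U+E000 to U+FFFF.

```python
a = "\uff61"      # U+FF61, inside the Basic Multilingual Plane
b = "\U00010000"  # U+10000, first code point outside the BMP

by_codepoint = a < b
by_utf8 = a.encode("utf-8") < b.encode("utf-8")
by_utf16 = a.encode("utf-16-be") < b.encode("utf-16-be")

print(by_codepoint, by_utf8, by_utf16)  # True True False
```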

2.3 Processing Model

The XPath node-set is converted into an octet stream, the canonical form, by generating the representative UCS characters for each node in the node-set in ascending document order, then encoding the result in UTF-8 (without a leading byte order mark). No node is processed more than once. Note that processing an element node E includes the processing of all members of the node-set for which E is an ancestor. Therefore, directly after the representative text for E is generated, E and all nodes for which E is an ancestor are removed from the node-set (or some logically equivalent operation occurs such that the node-set's next node in document order has not been processed). Note, however, that an element node is not removed from the node-set until after its children are processed.

The result of processing a node depends on its type and on whether or not it is in the node-set. If a node is not in the node-set, then no text is generated for the node except for the result of processing its namespace and attribute axes (elements only) and its children (elements and the root node). If the node is in the node-set, then text is generated to represent the node in the canonical form in addition to the text generated by processing the node's namespace and attribute axes and child nodes.

Note: The node-set is treated as a set of nodes, not a list of subtrees. To canonicalize an element including its namespaces, attributes, and content, the node-set must actually contain all of the nodes corresponding to these parts of the document, not just the element node.
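
The processing model can be illustrated with a small Python sketch. It assumes a hypothetical, simplified Node class (elements, attributes, and text only), deliberately ignores namespaces, escaping, and attribute sorting, and represents the node-set as a Python set of node objects. Nodes outside the set contribute no text of their own, yet their attributes and children are still visited.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes fit in a set
class Node:
    # Hypothetical, simplified node: kind is "element", "attribute", or "text".
    kind: str
    name: str = ""
    value: str = ""
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def process(node: Node, node_set: Set[Node], out: List[str]) -> None:
    """Append the text generated for node (and what it contains) to out."""
    in_set = node in node_set
    if node.kind == "text":
        if in_set:
            out.append(node.value)
        return
    if node.kind == "attribute":
        if in_set:
            out.append(f' {node.name}="{node.value}"')
        return
    # Element: its tags are generated only if it is in the node-set, but its
    # attribute axis and child nodes are processed in either case.
    if in_set:
        out.append(f"<{node.name}")
    for attr in node.attributes:
        process(attr, node_set, out)
    if in_set:
        out.append(">")
    for child in node.children:
        process(child, node_set, out)
    if in_set:
        out.append(f"</{node.name}>")


def canonicalize(root: Node, node_set: Set[Node]) -> str:
    out: List[str] = []
    process(root, node_set, out)
    return "".join(out)
```

With only an element node in the set, the sketch outputs just its start and end tags; its attributes and text appear only when those nodes are also members of the node-set.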

The text generated for a node is dependent on the node type and given in the following list:

* Root Node- The root node is the parent of the top-level document element. The result is obtained by processing each of its child nodes that is in the node-set, in document order. The root node does not generate a byte order mark, an XML declaration, or anything from within the document type declaration.

* Element Nodes- If the element is not in the node-set, then the result is obtained by processing the namespace axis, then the attribute axis, then processing the child nodes of the element that are in the node-set (in document order). If the element is in the node-set, then the result is an open angle bracket (<), the element QName, the result of processing the namespace axis, the result of processing the attribute axis, a close angle bracket (>), the result of processing the child nodes of the element that are in the node-set (in document order), an open angle bracket, a forward slash (/), the element QName, and a close angle bracket.

o Namespace Axis- Consider a list L containing only namespace nodes in the axis and in the node-set in lexicographic order (ascending). To begin processing L, if the first node is not the default namespace node (a node with no namespace URI and no local name), then generate a space followed by xmlns="" if and only if the following conditions are met:

+ The element E that owns the axis is in the node-set
+ The nearest ancestor element of E in the node-set has a default namespace node in the node-set (default namespace nodes always have non-empty values in XPath)

The latter condition eliminates unnecessary occurrences of xmlns="" in the canonical form since an element only receives an xmlns="" if its default namespace is empty and if it has an immediate parent in the canonical form that has a non-empty default namespace. To finish processing L, simply process every namespace node in L, except omit the namespace node with local name xml, which defines the xml prefix, if its string value is http://www.w3.org/XML/1998/namespace.

o Attribute Axis- In lexicographic order (ascending), process each node that is in the element's attribute axis and in the node-set.

* Namespace Nodes- A namespace node N is ignored if the nearest ancestor element of the node's parent element that is in the node-set has a namespace node in the node-set with the same local name and value as N. Otherwise, process the namespace node N in the same way as an attribute node, except assign the local name xmlns to the default namespace node if it exists (in XPath, the default namespace node has an empty URI and local name).

* Attribute Nodes- a space, the node's QName, an equals sign, an open quotation mark (double quote), the modified string value, and a close quotation mark (double quote). The string value of the node is modified by replacing all ampersands (&) with &amp;, all open angle brackets (<) with &lt;, all quotation mark (double quote) characters with &quot;, and the whitespace characters #x9, #xA, and #xD, with character references. The character references are written in uppercase hexadecimal with no leading zeroes (for example, #xD is represented by the character reference &#xD;).

* Text Nodes- the string value, except all ampersands are replaced by &amp;, all open angle brackets (<) are replaced by &lt;, all closing angle brackets (>) are replaced by &gt;, and all #xD characters are replaced by &#xD;.

* Processing Instruction (PI) Nodes- The opening PI symbol (<?), the PI target name of the node, a leading space and the string value if it is not empty, and the closing PI symbol (?>). If the string value is empty, then the leading space is not added. Also, a trailing #xA is rendered after the closing PI symbol for PI children of the root node with a lesser document order than the document element, and a leading #xA is rendered before the opening PI symbol of PI children of the root node with a greater document order than the document element.

* Comment Nodes- Nothing if generating canonical XML without comments. For canonical XML with comments, generate the opening comment symbol (<!--), the string value of the node, and the closing comment symbol (-->). Also, a trailing #xA is rendered after the closing comment symbol for comment children of the root node with a lesser document order than the document element, and a leading #xA is rendered before the opening comment symbol of comment children of the root node with a greater document order than the document element. (Comment children of the root node represent comments outside of the top-level document element and outside of the document type declaration.)
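
The escaping rules for attribute and text nodes, and the line-feed placement for PI and comment children of the root node, can be sketched in Python as follows. The position parameter ("before", "after", or "inside" relative to the document element) is a hypothetical device for this sketch; a real implementation would derive it from document order.

```python
def escape_attribute(value: str) -> str:
    # Attribute nodes: &, <, " and the whitespace characters #x9, #xA, #xD.
    # "&" must be replaced first so the later references are not re-escaped.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;")
                 .replace("\t", "&#x9;")
                 .replace("\n", "&#xA;")
                 .replace("\r", "&#xD;"))


def escape_text(value: str) -> str:
    # Text nodes: &, <, > and #xD.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace(">", "&gt;")
                 .replace("\r", "&#xD;"))


def _outside_document_element(markup: str, position: str) -> str:
    # Root-node children get a trailing #xA before the document element
    # and a leading #xA after it; nothing is added inside it.
    if position == "before":
        return markup + "\n"
    if position == "after":
        return "\n" + markup
    return markup


def render_pi(target: str, data: str, position: str = "inside") -> str:
    # The separating space is emitted only when the string value is non-empty.
    pi = f"<?{target} {data}?>" if data else f"<?{target}?>"
    return _outside_document_element(pi, position)


def render_comment(text: str, position: str = "inside",
                   with_comments: bool = True) -> str:
    if not with_comments:
        return ""  # canonical XML without comments drops them entirely
    return _outside_document_element(f"<!--{text}-->", position)
```

Note that '>' is left unescaped in attribute values but escaped in text, matching the two rules above.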

The QName of a node is either the local name, if the namespace prefix string is empty, or the namespace prefix, a colon, and then the local name of the node. The namespace prefix used in the QName MUST be the same one which appeared in the input document.
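
The namespace-axis rules for element nodes (the conditional xmlns="" and the suppression of declarations already made by the nearest output ancestor) can be sketched in Python. The inputs are assumptions of this sketch: the element's namespace nodes that are in the node-set are given as a dictionary from prefix ("" for the default namespace) to URI, together with the same dictionary for the nearest ancestor element that is in the node-set.

```python
from typing import Dict, Optional

XML_NS = "http://www.w3.org/XML/1998/namespace"


def escape_attribute(value: str) -> str:
    # Same escaping as for attribute nodes ("&" first).
    return (value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
            .replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;"))


def render_namespace_axis(own: Dict[str, str],
                          ancestor: Optional[Dict[str, str]]) -> str:
    """Namespace declarations for an element E that is in the node-set.

    own: E's namespace nodes in the node-set, prefix ("" = default) -> URI.
    ancestor: the same for the nearest ancestor element in the node-set,
              or None if no ancestor element is in the node-set.
    """
    ancestor = ancestor or {}
    parts = []
    # xmlns="" only if E has no default namespace node but the nearest
    # output ancestor does (XPath never has an empty default namespace node).
    if "" not in own and "" in ancestor:
        parts.append(' xmlns=""')
    for prefix in sorted(own):  # lexicographic by local name; "" sorts first
        uri = own[prefix]
        if prefix == "xml" and uri == XML_NS:
            continue  # the xml prefix declaration is never output
        if ancestor.get(prefix) == uri:
            continue  # redundant: same declaration on the nearest output ancestor
        name = "xmlns" if prefix == "" else f"xmlns:{prefix}"
        parts.append(f' {name}="{escape_attribute(uri)}"')
    return "".join(parts)
```

Applied to e5 through e9 of the example in Section 3.3, and to e1 and e3 of the document subset example in Section 3.7, this sketch reproduces the namespace declarations shown in those canonical forms.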

2.4 Document Subsets

Some applications require the ability to create a physical representation for an XML document subset (other than the one generated by default, which can be a proper subset of the document if the comments are omitted). Implementations of XML canonicalization that are based on XPath can provide this functionality with little additional overhead by accepting a node-set as input rather than an octet stream.

The processing of an element node E MUST be modified slightly when an XPath node-set is given as input and the element's parent is omitted from the node-set. The method for processing the attribute axis of an element E in the node-set is enhanced. All element nodes along E's ancestor axis are examined for nearest occurrences of attributes in the xml namespace, such as xml:lang and xml:space (whether or not they are in the node-set). From this list of attributes, remove any that are in E's attribute axis (whether or not they are in the node-set). Then, lexicographically merge this attribute list with the nodes of E's attribute axis that are in the node-set. The result of visiting the attribute axis is computed by processing the attribute nodes in this merged attribute list.
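
The enhanced attribute-axis processing can be sketched in Python as follows, assuming (hypothetically) that attributes are (namespace URI, local name, value) tuples and that the caller supplies the attribute axes of E's ancestors, nearest first. The function applies only when E's parent is not in the node-set.

```python
from typing import Dict, List, Tuple

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical representation: (namespace URI, local name, value).
Attr = Tuple[str, str, str]


def subset_attribute_axis(own_axis: List[Attr], own_in_set: List[Attr],
                          ancestor_axes: List[List[Attr]]) -> List[Attr]:
    """Attribute list for an element E in the node-set whose parent is not.

    own_axis: every attribute of E, whether or not it is in the node-set.
    own_in_set: the attributes of E that are in the node-set.
    ancestor_axes: attribute axes of E's ancestor elements, nearest first.
    """
    inherited: Dict[str, Attr] = {}
    for axis in ancestor_axes:
        for attr in axis:
            # Keep only the nearest occurrence of each xml:* attribute.
            if attr[0] == XML_NS and attr[1] not in inherited:
                inherited[attr[1]] = attr
    # Remove any xml:* attribute that E itself carries (in the node-set or not).
    for attr in own_axis:
        if attr[0] == XML_NS:
            inherited.pop(attr[1], None)
    # Merge with E's in-set attributes in lexicographic (URI, local name) order.
    return sorted(own_in_set + list(inherited.values()), key=lambda a: (a[0], a[1]))
```

For e3 in the example of Section 3.7, whose parent e2 is omitted, the result is id followed by the inherited xml:space, matching that example's canonical form.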

Note: XML entities can derive application-specific meaning from anywhere in the XML markup as well as by rules not expressed in XML 1.0 and the Namespaces Recommendations. Clearly, these rules cannot be specified in this document, so the creator of the input node-set must be responsible for preserving the information necessary to capture the full semantics of the members of the resulting node-set.

The canonical XML generated for an entire XML document is well-formed. The canonical form of an XML document subset may not be well-formed XML. However, since the canonical form may be subjected to further XML processing, most XPath node-sets provided for canonicalization will be designed to produce a canonical form that is a well-formed XML document or external general parsed entity. Whether from a full document or a document subset, if the canonical form is well-formed XML, then subsequent applications of the same XML canonicalization method to the canonical form make no changes.

3. Examples of XML Canonicalization

The examples in this section assume a non-validating processor, primarily so that a document type declaration can be used to declare entities as well as default attributes and attributes of various types (such as ID and enumerated) without having to declare all attributes for all elements in the document. As well, one example contains an element that deliberately violates a validity constraint (because it is still well-formed).

3.1 PIs, Comments, and Outside of Document Element
   Input Document
   --------------
   <?xml version="1.0"?>

   <?xml-stylesheet   href="doc.xsl"
      type="text/xsl"   ?>

   <!DOCTYPE doc SYSTEM "doc.dtd">

   <doc>Hello, world!<!-- Comment 1 --></doc>

   <?pi-without-data ?>

   <!-- Comment 2 -->

   <!-- Comment 3 -->

   Canonical Form (uncommented)
   ----------------------------
   <?xml-stylesheet href="doc.xsl"
      type="text/xsl"   ?>
   <doc>Hello, world!</doc>
   <?pi-without-data?>

   Canonical Form (commented)
   --------------------------
   <?xml-stylesheet href="doc.xsl"
      type="text/xsl"   ?>
   <doc>Hello, world!<!-- Comment 1 --></doc>
   <?pi-without-data?>
   <!-- Comment 2 -->
   <!-- Comment 3 -->

Demonstrates:

* Loss of XML declaration
* Loss of DTD
* Normalization of whitespace outside of document element (first character of both canonical forms is '<'; single line breaks separate PIs and comments outside of document element)
* Loss of whitespace between PITarget and its data
* Retention of whitespace inside PI data
* Comment removal from uncommented canonical form, including delimiter for comments outside document element (the last character in both canonical forms is '>')

3.2 Whitespace in Document Content
   Input Document
   --------------
   <doc>
      <clean>   </clean>
      <dirty>   A   B   </dirty>
      <mixed>
         A
         <clean>   </clean>
         B
         <dirty>   A   B   </dirty>
         C
      </mixed>
   </doc>

   Canonical Form
   --------------
   <doc>
      <clean>   </clean>
      <dirty>   A   B   </dirty>
      <mixed>
         A
         <clean>   </clean>
         B
         <dirty>   A   B   </dirty>
         C
      </mixed>
   </doc>

Demonstrates:

* Retain all whitespace between consecutive start tags, clean or dirty
* Retain all whitespace between consecutive end tags, clean or dirty
* Retain all whitespace between end tag/start tag pair, clean or dirty
* Retain all whitespace in character content, clean or dirty

Note: In this example, the input document and canonical form are identical. Both end with '>' character.

3.3 Start and End Tags
Input Document
--------------
<!DOCTYPE doc [<!ATTLIST e9 attr CDATA "default">]>
<doc>
   <e1   />
   <e2   ></e2>
   <e3    name = "elem3"   id="elem3"    />
   <e4    name="elem4"   id="elem4"    ></e4>
   <e5 a:attr="out" b:attr="sorted" attr2="all" attr="I'm"
       xmlns:b="http://www.ietf.org"
       xmlns:a="http://www.w3.org"
       xmlns="http://example.org"/>
   <e6 xmlns="" xmlns:a="http://www.w3.org">
       <e7 xmlns="http://www.ietf.org">
           <e8 xmlns="" xmlns:a="http://www.w3.org">
               <e9 xmlns="" xmlns:a="http://www.ietf.org"/>
           </e8>
       </e7>
   </e6>
</doc>

Canonical Form
--------------
<doc>
   <e1></e1>
   <e2></e2>
   <e3 id="elem3" name="elem3"></e3>
   <e4 id="elem4" name="elem4"></e4>
   <e5 xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I'm" attr2="all" b:attr="sorted" a:attr="out"></e5>
   <e6 xmlns:a="http://www.w3.org">
       <e7 xmlns="http://www.ietf.org">
           <e8 xmlns="">
               <e9 xmlns:a="http://www.ietf.org" attr="default"></e9>
           </e8>
       </e7>
   </e6>
</doc>

Demonstrates:

* Empty element conversion to start-end tag pair
* Normalization of whitespace in start and end tags
* Relative order of namespace and attribute axes
* Lexicographic ordering of namespace and attribute axes
* Retention of namespace prefixes from original document
* Elimination of superfluous namespace declarations
* Addition of default attribute

Note: Some start tags in the canonical form are very long, but each start tag in this example is entirely on a single line.

Note: In e5, b:attr precedes a:attr because the primary key is namespace URI not namespace prefix, and attr2 precedes b:attr because the default namespace is not applied to unqualified attributes (so the namespace URI for attr2 is empty).

3.4 Character Modifications and Character References
Input Document
--------------
<!DOCTYPE doc [
<!ATTLIST normId id ID #IMPLIED>
<!ATTLIST normNames attr NMTOKENS #IMPLIED>
]>
<doc>
   <text>First line&#x0d;&#10;Second line</text>
   <value>&#x32;</value>
   <compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]>
   </compute>
   <compute expr='value>"0" &amp;&amp; value&lt;"10" ?"valid":"error"'>valid</compute>
   <norm attr=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/>
   <normNames attr='   A   &#x20;&#13;&#xa;&#9;   B   '/>
   <normId id=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/>
</doc>

Canonical Form
--------------
<doc>
   <text>First line&#xD;
Second line</text>
   <value>2</value>
   <compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"
   </compute>
   <compute expr="value>&quot;0&quot; &amp;&amp; value&lt;&quot;10&quot; ?&quot;valid&quot;:&quot;error&quot;">valid</compute>
   <norm attr=" '    &#xD;&#xA;&#x9;   ' "></norm>
   <normNames attr="A &#xD;&#xA;&#x9; B"></normNames>
   <normId id="' &#xD;&#xA;&#x9; '"></normId>
</doc>

Demonstrates:

* Character reference replacement
* Attribute value delimiters set to quotation marks (double quotes)
* Attribute value normalization
* CDATA section replacement
* Encoding of special characters as character references in attribute values (&amp;, &lt;, &quot;, &#xD;, &#xA;, &#x9;)
* Encoding of special characters as character references in text (&amp;, &lt;, &gt;, &#xD;)

Note: The last element, normId, is well-formed but violates a validity constraint for attributes of type ID. For testing canonical XML implementations based on validating processors, remove the line containing this element from the input and canonical form. In general, XML consumers should be discouraged from using this feature of XML.

Note: Whitespace character references other than &#x20; are not affected by attribute value normalization [XML].

Note: In the canonical form, the value of the attribute named attr in the element norm begins with a space, a single quote, then four spaces before the first character reference.

Note: The expr attribute of the second compute element contains no line breaks.
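
To see how the attribute values in this example arise, the following Python sketch applies XML 1.0 attribute-value normalization and then the canonical escaping rules. For simplicity it assumes that only character references and the five predefined entities occur in attribute values; other entity references are not supported by this sketch.

```python
import re

# Only the five predefined entities are handled; any other entity
# reference would raise KeyError in this sketch.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);")
WHITESPACE = re.compile(r"[\t\n\r]")


def normalize_attribute(literal: str, cdata: bool = True) -> str:
    """XML 1.0 attribute-value normalization of the text between the quotes."""
    out = []
    pos = 0
    for m in REFERENCE.finditer(literal):
        # Literal whitespace characters become #x20 ...
        out.append(WHITESPACE.sub(" ", literal[pos:m.start()]))
        ref = m.group(1)
        # ... but characters produced by references are kept unchanged.
        if ref.startswith("#x"):
            out.append(chr(int(ref[2:], 16)))
        elif ref.startswith("#"):
            out.append(chr(int(ref[1:])))
        else:
            out.append(PREDEFINED[ref])
        pos = m.end()
    out.append(WHITESPACE.sub(" ", literal[pos:]))
    value = "".join(out)
    if not cdata:
        # Non-CDATA types (ID, NMTOKENS, ...): trim and collapse runs of #x20.
        value = re.sub(" +", " ", value.strip(" "))
    return value


def canonical_attribute(value: str) -> str:
    # Canonical escaping of an attribute value ("&" first).
    return (value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
            .replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;"))
```

Run on the literals of the norm, normNames, normId, and second compute elements, this sketch yields the attribute values shown in the canonical form above.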

3.5 Entity References
   Input Document
   --------------
   <!DOCTYPE doc [
   <!ATTLIST doc attrExtEnt ENTITY #IMPLIED>
   <!ENTITY ent1 "Hello">
   <!ENTITY ent2 SYSTEM "world.txt">
   <!ENTITY entExt SYSTEM "earth.gif" NDATA gif>
   <!NOTATION gif SYSTEM "viewgif.exe">
   ]>
   <doc attrExtEnt="entExt">
      &ent1;, &ent2;!
   </doc>

   <!-- Let world.txt contain "world" (excluding the quotes) -->

   Canonical Form (uncommented)
   ----------------------------
   <doc attrExtEnt="entExt">
      Hello, world!
   </doc>

Demonstrates:

* Internal parsed entity reference replacement
* External parsed entity reference replacement (including whitespace outside elements and PIs)
* External unparsed entity reference

3.6 UTF-8 Encoding
   Input Document
   --------------
   <?xml version="1.0" encoding="ISO-8859-1"?>
   <doc>&#169;</doc>

   Canonical Form
   --------------
   <doc>#xC2#xA9</doc>

Demonstrates:

* Effect of transcoding from a sample encoding to UTF-8

Note: The content of the doc element is NOT the string #xC2#xA9 but rather the two octets whose hexadecimal values are C2 and A9, which is the UTF-8 encoding of the UCS codepoint for the copyright symbol (c).
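
The transcoding in this example can be checked with Python's built-in codecs: the single ISO-8859-1 byte A9 and the character reference &#169; both denote U+00A9, whose UTF-8 encoding is the two octets C2 A9.

```python
# U+00A9 COPYRIGHT SIGN: one byte in ISO-8859-1, written as &#169; in the input.
copyright_sign = chr(169)
assert copyright_sign.encode("iso-8859-1") == b"\xa9"

# The canonical form is always UTF-8, where the same character takes two octets.
canonical = f"<doc>{copyright_sign}</doc>".encode("utf-8")
print(canonical)  # b'<doc>\xc2\xa9</doc>'
```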

3.7 Document Subsets
Input Document
--------------
<!DOCTYPE doc [
<!ATTLIST e2 xml:space (default|preserve) 'preserve'>
<!ATTLIST e3 id ID #IMPLIED>
]>
<doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org">
   <e1>
      <e2 xmlns="">
         <e3 id="E3"/>
      </e2>
   </e1>
</doc>

Document Subset Expression
--------------------------
(//. | //@* | //namespace::*)
[
   self::ietf:e1 or (parent::ietf:e1 and not(self::text() or self::e2))
   or
   count(id("E3")|ancestor-or-self::node()) = count(ancestor-or-self::node())
]

Canonical Form
--------------
<e1 xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"><e3 xmlns="" id="E3" xml:space="preserve"></e3></e1>

Demonstrates:

* Empty default namespace propagation from omitted parent element
* Propagation of attributes in xml namespace in document subsets
* Persistence of omitted namespace declarations in descendants

Note: In the document subset expression, the subexpression (//. | //@* | //namespace::*) selects all nodes in the input document, subjecting each to the predicate expression in square brackets. The expression is true for e1 and its implicit namespace nodes, and it is true if the element identified by E3 is in the ancestor-or-self path of the context node (such that ancestor-or-self stays the same size under union with the element identified by E3).

Note: The canonical form contains no line delimiters.

4. Resolutions

This section discusses a number of key decision points as well as a rationale for each decision. Although this specification now defines XML canonicalization in terms of the XPath data model rather than XML Infoset, the canonical form described in this document is quite similar in most respects to the canonical form described in the January 2000 Canonical XML draft [C14N-20000119]. However, some differences exist, and a number of the subsections discuss the changes.

4.1 No XML Declaration

The XML declaration, including version number and character encoding, is omitted from the canonical form. The encoding is not needed since the canonical form is encoded in UTF-8. The version is not needed since the absence of a version number unambiguously indicates XML 1.0.

Future versions of XML will be required to include an XML declaration to indicate the version number. However, the canonicalization method described in this specification may not be applicable to future versions of XML without some modifications. When canonicalization of a new version of XML is required, this specification could be updated to include the XML declaration, as presumably the absence of the XML declaration from the XPath data model can be remedied by that time (e.g., by reissuing a new XPath based on the Infoset data model).

4.2 No Character Model Normalization

The Unicode standard [Unicode] allows multiple different representations of certain "precomposed characters" (a simple example is U+00E7, "LATIN SMALL LETTER C WITH CEDILLA"). Thus two XML documents with content that is equivalent for the purposes of most applications may contain differing character sequences. The W3C is preparing a normalized representation [CharModel]. The C14N-20000119 Canonical XML draft used this normalized form. However, many XML 1.0 processors do not perform this normalization. Furthermore, applications that must solve this problem typically enforce character model normalization at all times starting when character content is created in order to avoid processing failures that could otherwise result (e.g., see example from Cowan). Therefore, character model normalization has been moved out of scope for XML canonicalization. However, the XML processor used to prepare the XPath data model input is required (by the Data Model) to use Normalization Form C [NFC, NFC-Corrigendum] when converting an XML document to the UCS character domain from any encoding that is not UCS-based (currently, UCS-based encodings include UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, and UCS-4).
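
The precomposed-character issue is easy to reproduce with Python's standard unicodedata module. This only illustrates Normalization Form C; as stated above, canonical XML itself does not perform this normalization.

```python
import unicodedata

precomposed = "\u00e7"   # LATIN SMALL LETTER C WITH CEDILLA
decomposed = "c\u0327"   # "c" followed by COMBINING CEDILLA

# Different codepoint sequences, hence different UTF-8 octets ...
print(precomposed.encode("utf-8"), decomposed.encode("utf-8"))  # b'\xc3\xa7' b'c\xcc\xa7'
# ... that Normalization Form C maps to the same precomposed character.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```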

4.3 Handling of Whitespace Outside Document Element

The C14N-20000119 Canonical XML draft placed a #xA after each PI outside of the document element as well as a #xA after the end tag of the document element. The method in this specification performs the same function except for omitting the final #xA after the last PI (or comment or end tag of the document element). This technique ensures that PI (and comment) children of the root are separated from markup by a line feed even if the root node or the document element is omitted from the output node-set.

4.4 No Namespace Prefix Rewriting

The C14N-20000119 Canonical XML draft described a method for rewriting namespace prefixes such that two documents having logically equivalent namespace declarations would also have identical namespace prefixes. The goal was to eliminate dependence on the particular namespace prefixes in a document when testing for logical equivalence. However, there now exist a number of contexts in which namespace prefixes can impart information value in an XML document. For example, an XPath expression in an attribute value or element content can reference a namespace prefix. Thus, rewriting the namespace prefixes would damage such a document by changing its meaning (and it cannot be logically equivalent if its meaning has changed).

More formally, let D1 be a document containing an XPath in an attribute value or element content that refers to namespace prefixes used in D1. Further assume that the namespace prefixes in D1 will all be rewritten by the canonicalization method. Let D2 = D1, then modify the namespace prefixes in D2 and modify the XPath expression's references to namespace prefixes such that D2 and D1 remain logically equivalent. Since namespace rewriting does not include occurrences of namespace references in attribute values and element content, the canonical form of D1 does not equal the canonical form of D2 because the XPath will be different. Thus, although namespace rewriting normalizes the namespace declarations, the goal of eliminating dependence on the particular namespace prefixes in the document is not achieved.

Moreover, it is possible to prove that namespace rewriting is harmful, rather than simply ineffective. Let D1 be a document containing an XPath in an attribute value or element content that refers to namespace prefixes used in D1. Further assume that the namespace prefixes in D1 will all be rewritten by the canonicalization method. Now let D2 be the canonical form of D1. Clearly, the canonical forms of D1 and D2 are equivalent (since the canonical form of D2 is the canonical form of the canonical form of D1), yet D1 and D2 are not logically equivalent because the aforementioned XPath works in D1 and does not work in D2.

Note that an argument similar to this can be leveled against the XML canonicalization method based on any of the cases in the Limitations section. However, the problems cannot easily be fixed in those cases, whereas here we have an opportunity to avoid purposefully introducing such a limitation.

Applications that must test for logical equivalence must perform more sophisticated tests than mere octet stream comparison. However, this is quite likely to be necessary in any case in order to test for logical equivalencies based on application rules as well as rules from other XML-related recommendations, working drafts, and future works.

4.5 Order of Namespace Declarations and Attributes

The C14N-20000119 Canonical XML draft alternated between namespace declarations and attribute declarations. This is part of the namespace prefix rewriting scheme, which this specification eliminates. This specification follows the XPath data model of putting all namespace nodes before all attribute nodes.
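
A hedged Python sketch of the resulting order inside a canonical start tag follows. It assumes names and values are already escaped and that the caller has already decided which declarations to emit; the function name `start_tag` and its argument shapes are hypothetical. All namespace declarations come first, ordered by prefix with the default namespace (empty prefix) first, and all attributes follow, ordered by namespace URI and then local name.

```python
from typing import Dict, List, Tuple

# (namespace URI or "" if unqualified, local name, qualified name as written, value)
Attr = Tuple[str, str, str, str]


def start_tag(qname: str, ns_decls: Dict[str, str], attrs: List[Attr]) -> str:
    parts = ["<" + qname]
    # All namespace declarations first; "" (the default namespace) sorts first.
    for prefix in sorted(ns_decls):
        name = "xmlns" if prefix == "" else "xmlns:" + prefix
        parts.append(f' {name}="{ns_decls[prefix]}"')
    # Then all attributes, keyed on (namespace URI, local name).
    for uri, local, name, value in sorted(attrs, key=lambda a: (a[0], a[1])):
        parts.append(f' {name}="{value}"')
    return "".join(parts) + ">"


print(start_tag("e", {"b": "urn:b", "": "urn:d", "a": "urn:a"},
                [("urn:a", "y", "a:y", "2"), ("", "x", "x", "1")]))
# <e xmlns="urn:d" xmlns:a="urn:a" xmlns:b="urn:b" x="1" a:y="2">
```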

4.6 Superfluous Namespace Declarations

Unnecessary namespace declarations are not made in the canonical form. Whether for an empty default namespace, a non-empty default namespace, or a namespace prefix binding, the XML canonicalization method omits a declaration if it determines that the immediate parent element in the canonical form has an equivalent declaration in scope. The root document element is handled specially since it has no parent element. All namespace declarations in it are retained, except the declaration of an empty default namespace is automatically omitted.
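
The omission rule can be sketched in Python as a comparison between an element's namespace context and that of its nearest rendered ancestor. This is a simplified illustration, not a complete implementation: the function name `rendered_decls` and the dictionary shapes are hypothetical, and the maps are assumed to exclude the built-in `xml` prefix. Because the comparison is against the nearest *rendered* ancestor, the same logic re-emits declarations from omitted ancestors in document subsets.

```python
from typing import Dict, Optional


def rendered_decls(element_ns: Dict[str, str],
                   output_parent_ns: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Return the namespace declarations to write on one element.

    element_ns: the element's in-scope namespaces, prefix -> URI, with "" as the
        default-namespace prefix. As in the XPath data model, an empty default
        namespace is represented by the absence of the "" key.
    output_parent_ns: in-scope namespaces of the nearest ancestor element that is
        rendered, or None when no ancestor is rendered (e.g. the document element).
    """
    parent = output_parent_ns or {}
    # Keep only bindings the rendered parent does not already have in scope.
    decls = {p: uri for p, uri in element_ns.items() if parent.get(p) != uri}
    # If the rendered parent has a non-empty default namespace that this element
    # lacks, undeclare it; with no rendered parent, xmlns="" is never emitted.
    if "" not in element_ns and parent.get("", "") != "":
        decls[""] = ""
    return decls


print(rendered_decls({"": "a:b", "p": "urn:p"}, None))  # {'': 'a:b', 'p': 'urn:p'}
print(rendered_decls({"": "a:b"}, {"": "a:b"}))        # {}
print(rendered_decls({}, {"": "a:b"}))                 # {'': ''}
```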

Relative to the method of simply rendering the entire namespace context of each element, implementations are not hindered by more than a constant factor in processing time and memory use. The advantages include:

* Eliminates overrun of xmlns="" from canonical forms of applications that may not even use namespaces, or support them only minimally.
* Eliminates namespace declarations from elements where they may not belong according to the application's content model, thereby simplifying the task of reattaching a document type declaration to a canonical form.

Note that in document subsets, an element with omissions from its ancestral element chain will be rendered to the canonical form with namespace declarations that may have been made in its omitted ancestors, thus preserving the meaning of the element.

4.7 Propagation of Default Namespace Declaration in Document Subsets

The XPath data model represents an empty default namespace with the absence of a node, not with the presence of a default namespace node having an empty value. Thus, with respect to the fact that element e3 in the following examples is not namespace qualified, we cannot tell the difference between <e1 xmlns="a:b"><e2 xmlns=""><e3/></e2></e1> versus <e1 xmlns="a:b"><e2><e3 xmlns=""/></e2></e1>. All we know is that e3 was not namespace qualified on input, so we preserve this information on output if e2 is omitted so that e3 does not take on the default namespace qualification of e1.
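
The ambiguity can be observed with Python's `xml.etree.ElementTree`, used here only as a stand-in for a namespace-aware data model (it is not an XPath implementation). In both inputs e3 ends up with the same unqualified name, and nothing attached to e3 records whether the undeclaration happened on e2 or on e3 itself, so a canonicalizer that omits e2 must emit xmlns="" on e3.

```python
import xml.etree.ElementTree as ET

# The two inputs discussed above.
doc_a = '<e1 xmlns="a:b"><e2 xmlns=""><e3/></e2></e1>'
doc_b = '<e1 xmlns="a:b"><e2><e3 xmlns=""/></e2></e1>'


def expanded_names(xml_text):
    """Element names in document order, as "{namespace-uri}local" or plain "local"."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


print(expanded_names(doc_a))  # ['{a:b}e1', 'e2', 'e3']
print(expanded_names(doc_b))  # ['{a:b}e1', '{a:b}e2', 'e3']
```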

4.8 Sorting Attributes by Namespace URI

Given the requirement to preserve the namespace prefixes declared in a document, sorting attributes with the prefix, rather than the namespace URI, as the primary key is viable and easier to implement.

However, the namespace URI was selected as the primary key because this is closer to the intent of the XML Names specification, which is to identify names by namespace URI and local name, not by a prefix and local name. The effect of the sort is to group together all attributes that are in the same namespace.
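
The two candidate keys produce different orders whenever prefix order and namespace URI order disagree. The Python sketch below compares them on a hypothetical element with the bindings a="http://www.w3.org" and b="http://www.ietf.org"; unqualified attributes have an empty namespace URI and therefore sort first under either key.

```python
# (prefix, namespace URI, local name) for the attributes of a hypothetical
# element, using the bindings a="http://www.w3.org" and b="http://www.ietf.org".
attrs = [
    ("a", "http://www.w3.org", "attr"),
    ("b", "http://www.ietf.org", "attr"),
    ("", "", "attr2"),
    ("", "", "attr"),
]


def names(items):
    """Render each attribute as its qualified name."""
    return [f"{p}:{local}" if p else local for p, _, local in items]


# Python compares str by code point, matching a lexicographic sort on UCS code points.
by_prefix = sorted(attrs, key=lambda a: (a[0], a[2]))
by_uri = sorted(attrs, key=lambda a: (a[1], a[2]))

print(names(by_prefix))  # ['attr', 'attr2', 'a:attr', 'b:attr']
print(names(by_uri))     # ['attr', 'attr2', 'b:attr', 'a:attr']
```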

Security Considerations

Security issues are discussed in section 1.3.

References

[C14N-20000119] Canonical XML Version 1.0, W3C Working Draft. T. Bray, J. Clark, J. Tauber, and J. Cowan. January 19, 2000. http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html.

[CharModel] Character Model for the World Wide Web, W3C Working Draft. eds. Martin J. Durst, Francois Yergeau, Misha Wolf, Asmus Freytag, Tex Texin. http://www.w3.org/TR/charmod/.

[Cowan] Example of Harmful Effect of Character Model Normalization, Letter in XML Signature Working Group Mail Archive. John Cowan, July 7, 2000 http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0038.html.

[Infoset] XML Information Set, W3C Working Draft. John Cowan, Richard Tobin. http://www.w3.org/TR/xml-infoset.

[ISO-8859-1] ISO-8859-1 Latin 1 Character Set. http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html or http://www.iso.ch/cate/cat.html.

[Keywords] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[Namespaces] Namespaces in XML, W3C Recommendation. eds. Tim Bray, Dave Hollander, and Andrew Layman. http://www.w3.org/TR/REC-xml-names/.

[NFC] TR15, Unicode Normalization Forms. M. Davis, M. Durst. Revision 18: November 1999. http://www.unicode.org/unicode/reports/tr15/tr15-18.html.

[NFC-Corrigendum] NFC-Corrigendum. The Unicode Consortium. http://www.unicode.org/unicode/uni2errata/Normalization_Corrigendum.html.

[Unicode] The Unicode Standard, version 3.0. The Unicode Consortium. ISBN 0-201-61633-5. http://www.unicode.org/unicode/standard/versions/Unicode3.0.html.

[UTF-16] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646", RFC 2781, February 2000.

[UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 2279, January 1998.

[URI] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

[XBase] XML Base, ed. Jonathan Marsh. 07 June 2000. http://www.w3.org/TR/xmlbase/.

[XML] Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation. eds. Tim Bray, Jean Paoli, C. M. Sperberg-McQueen and Eve Maler. 6 October 2000. http://www.w3.org/TR/REC-xml.

[XML DSig] Eastlake, D., Reagle, J. and D. Solo, "XML-Signature Syntax and Processing", RFC 3075, July 2000.

[XML Plenary Decision] W3C XML Plenary Decision on relative URI References In namespace declarations, W3C Document. 11 September 2000. http://lists.w3.org/Archives/Public/xml-uri/2000Sep/0083.html.

[XPath] XML Path Language (XPath) Version 1.0, W3C Recommendation. eds. James Clark and Steven DeRose. 16 November 1999. http://www.w3.org/TR/1999/REC-xpath-19991116.

Author's Address

John Boyer
PureEdge Solutions Inc.

Phone: 1-888-517-2675
EMail: jboyer@PureEdge.com

Acknowledgements

The following people provided valuable feedback that improved the quality of this specification:

* Doug Bunting, Ariba
* John Cowan, Reuters
* Martin J. Durst, W3C
* Donald Eastlake 3rd, Motorola
* Merlin Hughes, Baltimore
* Gregor Karlinger, IAIK TU Graz
* Susan Lesch, W3C
* Jonathan Marsh, Microsoft
* Joseph Reagle, W3C
* Petteri Stenius, Done360
* Kent Tamura, IBM

Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.
