Internet Engineering Task Force (IETF) M. Ohye Request for Comments: 6596 J. Kupke Category: Informational April 2012 ISSN: 2070-1721
Internet Engineering Task Force (IETF) M. Ohye Request for Comments: 6596 J. Kupke Category: Informational April 2012 ISSN: 2070-1721
The Canonical Link Relation
规范链接关系
Abstract
摘要
RFC 5988 specifies a way to define relationships between links on the web. This document describes a new type of such a relationship, "canonical", to designate an Internationalized Resource Identifier (IRI) as preferred over resources with duplicative content.
RFC 5988指定了一种定义web上链接之间关系的方法。本文档描述了这种关系的一种新类型“canonical”,它将国际化资源标识符(IRI)指定为优先于具有重复内容的资源。
Status of This Memo
关于下段备忘
This document is not an Internet Standards Track specification; it is published for informational purposes.
本文件不是互联网标准跟踪规范;它是为了提供信息而发布的。
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.
本文件是互联网工程任务组(IETF)的产品。它代表了IETF社区的共识。它已经接受了公众审查,并已被互联网工程指导小组(IESG)批准出版。并非IESG批准的所有文件都适用于任何级别的互联网标准;见RFC 5741第2节。
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6596.
有关本文件当前状态、任何勘误表以及如何提供反馈的信息,请访问http://www.rfc-editor.org/info/rfc6596.
Copyright Notice
版权公告
Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.
版权所有(c)2012 IETF信托基金和确定为文件作者的人员。版权所有。
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
本文件受BCP 78和IETF信托有关IETF文件的法律规定的约束(http://trustee.ietf.org/license-info)自本文件出版之日起生效。请仔细阅读这些文件,因为它们描述了您对本文件的权利和限制。从本文件中提取的代码组件必须包括信托法律条款第4.e节中所述的简化BSD许可证文本,并提供简化BSD许可证中所述的无担保。
The canonical link relation specifies the preferred IRI from resources with duplicative content. Common implementations of the canonical link relation are to specify the preferred version of an IRI from duplicate pages created with the addition of IRI parameters (e.g., session IDs) or to specify the single-page version as preferred over the same content separated on multiple component pages.
规范链接关系指定来自具有重复内容的资源的首选IRI。规范链接关系的常见实现是,从添加IRI参数(例如,会话ID)创建的重复页面中指定IRI的首选版本,或者在多个组件页面上分离的相同内容中指定单页面版本作为首选版本。
In regard to the link relation type, "canonical" can be described informally as the author's preferred version of a resource. More formally, the canonical link relation specifies the preferred IRI from a set of resources that return the context IRI's content in duplicated form. Once specified, applications such as search engines can focus processing on the canonical, and references to the context (referring) IRI can be updated to reference the target (canonical) IRI.
关于链接关系类型,“canonical”可以非正式地描述为作者对资源的首选版本。更正式地说,规范链接关系从一组以重复形式返回上下文IRI内容的资源中指定首选IRI。一旦指定,搜索引擎等应用程序可以将处理重点放在规范上,对上下文(引用)IRI的引用可以更新为引用目标(规范)IRI。
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
本文件中的关键词“必须”、“不得”、“必需”、“应”、“不应”、“应”、“不应”、“建议”、“可”和“可选”应按照[RFC2119]中所述进行解释。
The target (canonical) IRI MUST identify content that is either duplicative or a superset of the content at the context (referring) IRI. Authors who declare the canonical link relation ought to anticipate that applications such as search engines can:
目标(规范)IRI必须在上下文(引用)IRI中标识重复的内容或内容的超集。声明规范链接关系的作者应该预见到搜索引擎等应用程序可以:
o Index content only from the target IRI (i.e., content from the context IRIs will be likely disregarded as duplicative).
o 仅对目标IRI中的内容进行索引(即,上下文IRI中的内容可能会被忽略为重复内容)。
o Consolidate IRI properties, such as link popularity, to the target IRI.
o 将IRI属性(如链接流行度)整合到目标IRI。
o Display the target IRI as the representative IRI.
o 将目标IRI显示为代表性IRI。
The target (canonical) IRI MAY:
目标(规范)IRI可能:
o Specify a relative IRI (see [RFC3986], Section 4.2).
o 指定相对IRI(请参见[RFC3986],第4.2节)。
o Be self-referential (context IRI identical to target IRI).
o 自我参照(上下文IRI与目标IRI相同)。
o Exist on a different hostname or domain.
o 存在于不同的主机名或域上。
o Have different scheme names, such as "http" to "https" or "gopher" to "ftp".
o 具有不同的方案名称,例如“http”到“https”或“gopher”到“ftp”。
o Be a superset of the content at the context IRI.
o 是上下文IRI中内容的超集。
* As an example, each component page (e.g., page-1.html, page-2.html) of a multi-page article MAY specify the "view-all" version (e.g., page-all.html), the superset of their content, as the target IRI. This is because the content from each component page is contained within the view-all version. Given this implementation, applications can mark page-1.html and page-2.html as duplicates of page-all.html, process content only from page-all.html, and disregard the component pages. All references can then be made to the view-all version (page-all.html, the target IRI), and no content will have been lost in this process.
* 例如,多页文章的每个组件页(例如,page-1.html,page-2.html)可以指定“查看所有”版本(例如,page-all.html),即其内容的超集,作为目标IRI。这是因为每个组件页面的内容都包含在view all版本中。鉴于此实现,应用程序可以将page-1.html和page-2.html标记为page-all.html的副本,只处理page-all.html中的内容,而忽略组件页面。然后,可以对ViewAll版本(page-All.html,目标IRI)进行所有引用,并且在此过程中不会丢失任何内容。
* Using the same example above, page-2.html SHOULD NOT designate page-1.html as the target (canonical) IRI because this may cause a loss of data. When page-2.html designates page-1.html as the canonical, only content from the target IRI, page-1.html, will be processed. page-2.html may be marked as a duplicate of page-1.html and its content disregarded.
* 使用上面相同的示例,page-2.html不应将page-1.html指定为目标(规范)IRI,因为这可能会导致数据丢失。当page-2.html将page-1.html指定为规范时,将只处理来自目标IRI page-1.html的内容。page-2.html可能被标记为page-1.html的副本,其内容将被忽略。
o Be the source IRI of a temporary redirect. For HTTP, this refers to status codes 302, 303, or 307 (Sections 10.3.3, 10.3.4, and 10.3.8, respectively, of [RFC2616]).
o 是临时重定向的源IRI。对于HTTP,这是指状态代码302、303或307(分别为[RFC2616]第10.3.3、10.3.4和10.3.8节)。
To better ensure that applications properly handle the canonical link relation, administrators ought to consider the following guidelines:
为了更好地确保应用程序正确地处理规范的链接关系,管理员应该考虑以下准则:
o Specify only one canonical link relation for a resource. (It would be confusing to consider/label/designate more than one IRI as authoritative.)
o 仅为资源指定一个规范链接关系。(考虑/标记/指定多个IRI为权威性IRI会让人感到困惑。)
o Avoid designating the target (canonical) as:
o 避免将目标(规范)指定为:
* The source IRI of a permanent redirect (for HTTP, this refers to 300 and 301 response codes, defined in Sections 10.3.1 and 10.3.2 of [RFC2616]).
* 永久重定向的源IRI(对于HTTP,指[RFC2616]第10.3.1节和第10.3.2节中定义的300和301响应代码)。
* An IRI that also specifies a canonical link relation to an IRI other than itself.
* 一种IRI,它还指定与IRI(而非自身)的规范链接关系。
* An IRI that returns an error code, such as a 4xx response in HTTP (Section 10.4 of [RFC2616]).
* 返回错误代码的IRI,如HTTP中的4xx响应(RFC2616第10.4节)。
* The first page of a multi-page article or multi-page listing of items (since the first page is not duplicative or a superset of the context IRI). For example, page-2.html and page-3.html of an article SHOULD NOT specify page-1.html as the canonical. This may cause a loss of data from page-2.html and page-3.html as they will be marked duplicative of page-1.html with only content from page-1.html being processed.
* 多页文章或多页项目列表的第一页(因为第一页不是重复的或上下文IRI的超集)。例如,文章的page-2.html和page-3.html不应将page-1.html指定为规范。这可能会导致第2.html页和第3.html页的数据丢失,因为它们将被标记为与第1.html页重复,并且只处理第1.html页的内容。
When the canonical link relation is declared improperly, such as creating chained canonicals (i.e., target IRI specifies the source IRI of a permanent redirect) or designating a target IRI that returns a 4xx response, applications can use their own heuristics when processing the resource. For instance, an application can choose to ignore any improper canonical designation and continue to process the remaining content on a page.
当规范链接关系声明不正确时,例如创建链式规范(即,目标IRI指定永久重定向的源IRI)或指定返回4xx响应的目标IRI,应用程序可以在处理资源时使用自己的启发式方法。例如,应用程序可以选择忽略任何不正确的规范指定,并继续处理页面上的剩余内容。
The following example illustrates:
以下示例说明:
o Three IRIs that serve duplicate content.
o 提供重复内容的三个虹膜。
o One IRI that is the canonical or "preferred version".
o 一个IRI是规范的或“首选版本”。
o Two IRIs with additional query parameters, making them the non-preferred version of the content (duplicates). The canonical link relation is therefore specified on these duplicates.
o 具有附加查询参数的两个IRI,使其成为内容的非首选版本(重复)。因此,在这些副本上指定了规范链接关系。
If the preferred version of a IRI and its content exists at:
如果IRI的首选版本及其内容存在于:
http://www.example.com/page.php?item=purse
http://www.example.com/page.php?item=purse
Then duplicate content IRIs such as:
然后复制内容,例如:
http://www.example.com/page.php?item=purse&category=bags http://www.example.com/page.php?item=purse&category=bags&sid=1234
http://www.example.com/page.php?item=purse&category=bags http://www.example.com/page.php?item=purse&category=bags&sid=1234
may designate the canonical link relation in HTML as specified in [REC-html401-19991224]:
可以按照[REC-html401-19991224]中的规定在HTML中指定规范链接关系:
<link rel="canonical" href="http://www.example.com/page.php?item=purse">
<link rel="canonical" href="http://www.example.com/page.php?item=purse">
or as a relative IRI:
或作为相对IRI:
<link rel="canonical" href="page.php?item=purse">
<link rel="canonical" href="page.php?item=purse">
or alternatively, in the HTTP header field as specified in Section 5 of [RFC5988]:
或者,在[RFC5988]第5节规定的HTTP头字段中:
Link: <http://www.example.com/page.php?item=purse>; rel="canonical"
Link: <http://www.example.com/page.php?item=purse>; rel="canonical"
This signals to applications, such as search engines, that these are duplicates of the target (canonical) IRI:
这向应用程序(如搜索引擎)发出信号,表明这些是目标(规范)IRI的副本:
http://www.example.com/page.php?item=purse.
http://www.example.com/page.php?item=purse.
Applications may then select the canonical value as the display IRI (such as in search results), and additional IRI properties such as indexing and ranking signals can be transferred as well.
然后,应用程序可以选择规范值作为显示IRI(例如在搜索结果中),也可以传输额外的IRI属性,例如索引和排名信号。
Before adding the canonical link relation, verification of the following is RECOMMENDED:
在添加规范链接关系之前,建议验证以下内容:
1. The content of the context IRI is duplicated within the content of the target (canonical) IRI.
1. 上下文IRI的内容在目标(规范)IRI的内容中重复。
2. For HTTP, permanent HTTP redirects (Section 10.3.2 of [RFC2616]), the traditional strong indicator that a IRI's content has been permanently moved, could not be implemented in place of the canonical link relation.
2. 对于HTTP,永久HTTP重定向(RFC2616)的第10.3.2节,即IRI内容已永久移动的传统强指示符,无法代替规范链接关系来实现。
3. In the case where the target (canonical) IRI is a superset of content from the context IRI (i.e., the case where page-1.html and page-2.html designate page-all.html as the canonical), that the user experience is strongly taken into consideration, both in regard to possible increased load time and potential complexity in navigation.
3. 如果目标(规范)IRI是来自上下文IRI的内容超集(即,page-1.html和page-2.html将page-all.html指定为规范),则强烈考虑用户体验,包括可能增加的加载时间和导航的潜在复杂性。
IANA has registered the Canonical Link Relation below as per [RFC5988].
IANA已根据[RFC5988]注册了以下规范链接关系。
Relation Name:
关系名称:
canonical
典型的
Description:
说明:
Designates the preferred version of a resource (the IRI and its contents).
指定资源的首选版本(IRI及其内容)。
Reference:
参考:
This specification.
本规范。
Notes:
笔记:
None.
没有一个
Application Data:
应用数据:
None.
没有一个
When a site is compromised, the canonical link relation can be implemented with malicious intent to designate the attacker's IRI as the preferred version of the content. While this technique is largely unnoticeable to humans, automated programs may cluster the compromised resource as duplicative of the attacker's target IRI, transferring properties such as link popularity away from the compromised resource to the attacker's designated canonical. (Naturally, even a site that is not compromised could provide inaccurate or misleading information about which URI is canonical.)
当站点被破坏时,规范链接关系可能会被恶意实施,以将攻击者的IRI指定为内容的首选版本。虽然这种技术在很大程度上不为人类所察觉,但自动化程序可能会将受损资源群集为攻击者目标IRI的副本,从而将诸如链接流行度之类的属性从受损资源转移到攻击者指定的规范资源。(当然,即使是一个未被破坏的站点也可能提供关于哪个URI是规范的不准确或误导性信息。)
Internationalization considerations for link relations are provided in Section 8 of [RFC5988].
[RFC5988]第8节提供了链接关系的国际化注意事项。
[REC-html401-19991224] Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01 Specification", W3C Recommendation REC-html401-19991224, December 1999, <http://www.w3.org/TR/1999/REC-html401-19991224>.
[REC-html401-19991224]Raggett,D.,Le Hors,A.,和I.Jacobs,“HTML 4.01规范”,W3C建议REC-html401-19991224,1999年12月<http://www.w3.org/TR/1999/REC-html401-19991224>.
Latest version available at <http://www.w3.org/TR/html401>.
最新版本可于<http://www.w3.org/TR/html401>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC2616]菲尔丁,R.,盖蒂斯,J.,莫卧儿,J.,弗莱斯蒂克,H.,马斯特,L.,利奇,P.,和T.伯纳斯李,“超文本传输协议——HTTP/1.1”,RFC 2616,1999年6月。
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC3986]Berners Lee,T.,Fielding,R.,和L.Masinter,“统一资源标识符(URI):通用语法”,STD 66,RFC 3986,2005年1月。
[RFC5988] Nottingham, M., "Web Linking", RFC 5988, October 2010.
[RFC5988]诺丁汉,M.,“网络链接”,RFC 5988,2010年10月。
Automated programs that implement functionality with regard for the canonical link relation include:
实现规范链接关系相关功能的自动化程序包括:
o Google, canonical link relation HTML and HTTP header support, within the same domain and across domains:
o Google、规范链接关系HTML和HTTP标头支持,在同一域内和跨域:
* <http://googlewebmastercentral.blogspot.com/2009/02/ specify-your-canonical.html>
* <http://googlewebmastercentral.blogspot.com/2009/02/ specify-your-canonical.html>
* <http://googlewebmastercentral.blogspot.com/2011/06/ supporting-relcanonical-http-headers.html>
* <http://googlewebmastercentral.blogspot.com/2011/06/ supporting-relcanonical-http-headers.html>
* <http://googlewebmastercentral.blogspot.com/2009/12/ handling-legitimate-cross-domain.html>
* <http://googlewebmastercentral.blogspot.com/2009/12/ handling-legitimate-cross-domain.html>
o Yahoo, canonical link relation HTML support within the same domain:
o Yahoo,同一域内的规范链接关系HTML支持:
* <http://www.ysearchblog.com/2009/02/12/ fighting-duplication-adding-more-arrows-to-your-quiver/>
* <http://www.ysearchblog.com/2009/02/12/ fighting-duplication-adding-more-arrows-to-your-quiver/>
o Bing, canonical link relation HTML support within the same domain:
o Bing,同一域内的规范链接关系HTML支持:
* <http://www.bing.com/community/site_blogs/b/webmaster/archive/ 2009/02/12/ partnering-to-help-solve-duplicate-content-issues.aspx>
* <http://www.bing.com/community/site_blogs/b/webmaster/archive/ 2009/02/12/ partnering-to-help-solve-duplicate-content-issues.aspx>
Authors' Addresses
作者地址
Maile Ohye
主管麦里·奥耶
EMail: maileohye@gmail.com URI: http://maileohye.com/
EMail: maileohye@gmail.com URI: http://maileohye.com/
Joachim Kupke
约阿希姆·库普克
EMail: joachim@kupke.za.net
EMail: joachim@kupke.za.net