Network Working Group                                       D. McPherson
Request for Comments: 3345                                           TCB
Category: Informational                                          V. Gill
                                                   AOL Time Warner, Inc.
                                                               D. Walton
                                                               A. Retana
                                                     Cisco Systems, Inc.
                                                             August 2002
Border Gateway Protocol (BGP) Persistent Route Oscillation Condition
Status of this Memo
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
In particular configurations, the BGP scaling mechanisms defined in "BGP Route Reflection - An Alternative to Full Mesh IBGP" and "Autonomous System Confederations for BGP" will introduce persistent BGP route oscillation. This document discusses the two types of persistent route oscillation that have been identified, describes when these conditions will occur, and provides some network design guidelines to avoid introducing such occurrences.
The Border Gateway Protocol (BGP) is an inter-Autonomous System routing protocol. The primary function of a BGP speaking system is to exchange network reachability information with other BGP systems.
In particular configurations, the BGP [1] scaling mechanisms defined in "BGP Route Reflection - An Alternative to Full Mesh IBGP" [2] and "Autonomous System Confederations for BGP" [3] will introduce persistent BGP route oscillation.
The problem is inherent in the way BGP works: locally defined routing policies may conflict globally, and certain types of conflicts can cause persistent oscillation of the protocol. Given current practices, we happen to see the problem manifest itself in the context of MED + route reflectors or confederations.
The current specification of BGP-4 [4] states that the MULTI_EXIT_DISC is only comparable between routes learned from the same neighboring AS. This limitation is consistent with the description of the attribute: "The MULTI_EXIT_DISC attribute may be used on external (inter-AS) links to discriminate among multiple exit or entry points to the same neighboring AS." [1,4]
In a full mesh iBGP network, all the internal routers have complete visibility of the available exit points into a neighboring AS. The comparison of the MULTI_EXIT_DISC for only some paths is not a problem.
Because of the scalability implications of a full mesh iBGP network, two alternatives have been standardized: route reflectors [2] and AS confederations [3]. Both alternatives describe methods by which route distribution may be achieved without a full iBGP mesh in an AS.
The route reflector alternative defines the ability to re-advertise (reflect) iBGP-learned routes to other iBGP peers once the best path is selected [2]. AS Confederations specify the operation of a collection of autonomous systems under a common administration as a single entity (i.e. from the outside, the internal topology and the existence of separate autonomous systems are not visible). In both cases, the reduction of the iBGP full mesh results in the fact that not all the BGP speakers in the AS have complete visibility of the available exit points into a neighboring AS. In fact, the visibility may be partial and inconsistent depending on the location (and function) of the router in the AS.
In certain topologies involving either route reflectors or confederations (detailed description later in this document), the partial visibility of the available exit points into a neighboring AS may result in an inconsistent best path selection decision as the routers don't have all the relevant information. If the inconsistencies span more than one peering router, they may result in a persistent route oscillation. The best path selection rules applied in this document are consistent with the current specification [4].
The persistent route oscillation behavior is deterministic and can be avoided by employing some rudimentary BGP network design principles until protocol enhancements resolve the problem.
In the following sections a taxonomy of the types of oscillations is presented and a description of the set of conditions that will trigger route oscillations is given. We continue by providing several network design alternatives that remove the potential of this occurrence.
It is the intent of the authors that this document serve to increase operator awareness of the problem, as well as to trigger discussion and subsequent proposals for potential protocol enhancements that remove the possibility of its occurrence.
The oscillations are classified into Type I and Type II depending upon the criteria documented below.
In the following two subsections we provide configurations under which Type I Churn will occur. We begin with a discussion of the problem when using Route Reflection, and then discuss the problem as it relates to AS Confederations.
In general, Type I Churn occurs only when BOTH of the following conditions are met:
1) a single-level Route Reflection or AS Confederations design is used in the network AND
2) the network accepts the BGP MULTI_EXIT_DISC (MED) attribute from two or more ASs for a single prefix and the MED values are unique.
It is also possible for the non-deterministic ordering of paths to cause the route oscillation problem. [1] does not specify that paths should be ordered based on MEDs but it has been proven that non-deterministic ordering can lead to loops and inconsistent routing decisions. Most vendors have either implemented deterministic ordering as default behavior, or provide a knob that permits the operator to configure the router to order paths in a deterministic manner based on MEDs.
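The effect of path ordering can be illustrated with a minimal Python sketch. It assumes a simplified two-step decision (MED compared only between paths from the same neighboring AS, then lowest IGP cost) and borrows the three paths that Ra holds in Step 3 of the Route Reflection example later in this document. The names Path, sequential_best and deterministic_best are hypothetical and do not reflect any vendor's implementation.

```python
from collections import namedtuple
from itertools import permutations

# as_path is a tuple of AS numbers; the first entry is the neighboring AS.
Path = namedtuple("Path", "as_path med igp")

def better(a, b):
    """Pairwise rule: MED only between paths from the same neighboring AS,
    otherwise the lower IGP cost to the NEXT_HOP wins."""
    if a.as_path[0] == b.as_path[0] and a.med != b.med:
        return a if a.med < b.med else b
    return a if a.igp <= b.igp else b

def sequential_best(paths):
    """Order-dependent: compare the running best with each path in arrival order."""
    best = paths[0]
    for p in paths[1:]:
        best = better(best, p)
    return best

def deterministic_best(paths):
    """Order-independent: keep the lowest-MED path per neighboring AS first,
    then compare the per-AS winners on IGP cost."""
    winners = {}
    for p in paths:
        cur = winners.get(p.as_path[0])
        if cur is None or (p.med, p.igp) < (cur.med, cur.igp):
            winners[p.as_path[0]] = p
    return min(winners.values(), key=lambda p: p.igp)

PATHS = [
    Path((6, 100), 0, 13),
    Path((6, 100), 1, 4),
    Path((10, 100), 10, 5),
]

sequential_results = {sequential_best(list(o)) for o in permutations(PATHS)}
deterministic_results = {deterministic_best(list(o)) for o in permutations(PATHS)}
print(len(sequential_results), len(deterministic_results))  # prints: 3 1
```

Under these assumptions, each of the three paths can end up as "best" depending on arrival order when paths are compared sequentially, while grouping by neighboring AS first always yields the '10 100' path.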
We now discuss Type I oscillation as it relates to Route Reflection. To begin, consider the topology depicted in Figure 1:
         ---------------------------------------------------------------
       /     --------------------               --------------------     \
      |     /                    \             /                    \     |
      |    |      Cluster 1       |           |      Cluster 2       |    |
      |    |                      |           |                      |    |
      |    |                      |    *1     |                      |    |
      |    |       Ra(RR) . . . . . . . . . . . . . . Rd(RR)         |    |
      |    |        .  .          |           |          .           |    |
      |    |       .*5  .*4       |           |          .*12        |    |
      |    |      .        .      |           |          .           |    |
      |    |  Rb(C)        Rc(C)  |           |        Re(C)         |    |
      |    |    .            .    |           |          .           |    |
      |     \   .            .   /             \         .          /     |
      |      ---.------------.---               ---------.----------      |
       \        .(10)        .(1)       AS1              .(0)            /
         -------.------------.---------------------------.--------------
                .            .                           .
             ------            .      ------------     .
            /      \              .  /            \  .
           |  AS10  |               |     AS6      |
            \      /                 \            /
             ------                   ------------
                 .                          .
                    .                     .
                       . -------------- .
                        /              \
                       |     AS100      |- 10.0.0.0/8
                        \              /
                         --------------
Figure 1: Example Route Reflection Topology
In Figure 1 AS1 contains two Route Reflector Clusters, Clusters 1 and 2. Each Cluster contains one Route Reflector (RR), Ra and Rd respectively; each RR is marked with 'RR' in parentheses. Cluster 1 contains two RR Clients (Rb and Rc), and Cluster 2 contains one RR Client (Re); each RR Client is marked with 'C' in parentheses. The dotted lines are used to represent BGP peering sessions.
The number contained in parentheses on the AS1 EBGP peering sessions represents the MED value advertised by the peer to be associated with the 10.0.0.0/8 network reachability advertisement.
The number following each '*' on the IBGP peering sessions represents the additive IGP metrics that are to be associated with the BGP NEXT_HOP attribute for the concerned route. For example, the Ra IGP metric value associated with a NEXT_HOP learned via Rb would be 5; while the metric value associated with the NEXT_HOP learned via Re would be 13.
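As a rough illustration of how these additive metrics combine, the following Python sketch sums the '*' values along a chain of peering sessions in Figure 1. It assumes that links are symmetric and that the RR Client itself is the NEXT_HOP; the LINKS table and the igp_cost helper are hypothetical names introduced only for this example.

```python
# Link metrics read off Figure 1 (the '*' values); links are treated as symmetric.
LINKS = {("Ra", "Rb"): 5, ("Ra", "Rc"): 4, ("Ra", "Rd"): 1, ("Rd", "Re"): 12}

def igp_cost(hops):
    """Sum the link metrics along a list of routers, e.g. ["Ra", "Rd", "Re"]."""
    total = 0
    for a, b in zip(hops, hops[1:]):
        metric = LINKS.get((a, b), LINKS.get((b, a)))
        if metric is None:
            raise KeyError(f"no link between {a} and {b}")
        total += metric
    return total

print(igp_cost(["Ra", "Rb"]))        # 5
print(igp_cost(["Ra", "Rd", "Re"]))  # 13 (1 + 12)
```

The same arithmetic gives the costs that Rd sees for routes reflected by Ra, for example 1 + 4 = 5 for a NEXT_HOP learned via Rc.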
Table 1 depicts the 10.0.0.0/8 route attributes as seen by routers Rb, Rc and Re, respectively. Note that the IGP metrics in Figure 1 are only of concern when advertising the route to an IBGP peer.
      Router    MED    AS_PATH
      ------------------------
      Rb         10    10 100
      Rc          1    6 100
      Re          0    6 100
Table 1: Route Attribute Table
For the following steps 1 through 5, the best path will be marked with a '*'.
1) Ra has the following installed in its BGP table, with the path learned via AS10 marked best:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
        6 100                1             4
      * 10 100              10             5
The '10 100' route should not be marked as best, though this is not the cause of the persistent route oscillation. Ra realizes it has the wrong route marked as best since the '6 100' path has a lower IGP metric. As such, Ra makes this change and advertises an UPDATE message to its neighbors to let them know that it now considers the '6 100, 1, 4' route as best.
2) Rd receives the UPDATE from Ra, which leaves Rd with the following installed in its BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
      * 6 100                0            12
        6 100                1             5
Rd then marks the '6 100, 0, 12' route as best because it has a lower MED. Rd sends an UPDATE message to its neighbors to let them know that this is the best route.
3) Ra receives the UPDATE message from Rd and now has the following in its BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
        6 100                0            13
        6 100                1             4
      * 10 100              10             5
The first route (6 100, 0, 13) beats the second route (6 100, 1, 4) because of a lower MED. Then the third route (10 100, 10, 5) beats the first route because of lower IGP metric to NEXT_HOP. Ra sends an UPDATE message to its peers informing them of the new best route.
4) Rd receives the UPDATE message from Ra, which leaves Rd with the following BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
        6 100                0            12
      * 10 100              10             6
Rd selects the '10 100, 10, 6' path as best because of the IGP metric. Rd sends an UPDATE/withdraw to its peers letting them know this is the best route.
5) Ra receives the UPDATE message from Rd, which leaves Ra with the following BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
        6 100                1             4
      * 10 100              10             5
Ra received an UPDATE/withdraw for '6 100, 0, 13', which changes what is considered the best route for Ra. This is why Ra has the '10 100, 10, 5' route selected as best in Step 1, even though '6 100, 1, 4' is actually better.
At this point, we've made a full loop and are back at Step 1. The router realizes it is using the incorrect best path, and repeats the cycle. This is an example of Type I Churn when using Route Reflection.
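The cycle walked through in Steps 1 through 5 can be approximated with a small round-based Python sketch. It is a simplification under stated assumptions: only Ra and Rd are modeled, each RR advertises to the other RR only a best path learned from one of its own clients, selection uses MED within a neighboring AS followed by IGP cost, and UPDATE timing is ignored. The names Path, best_path and simulate are hypothetical.

```python
from collections import namedtuple

# origin is the router that received the path over EBGP.
Path = namedtuple("Path", "origin as_path med igp")

def best_path(paths):
    """MED decides only among paths from the same neighboring AS;
    the per-AS winners are then compared on IGP cost."""
    winners = {}
    for p in paths:
        cur = winners.get(p.as_path[0])
        if cur is None or (p.med, p.igp) < (cur.med, cur.igp):
            winners[p.as_path[0]] = p
    return min(winners.values(), key=lambda p: p.igp)

def simulate(rounds=4, inter_cluster=1):
    """Alternate best-path selection at Ra and Rd using the Figure 1 values."""
    ra_clients = [Path("Rb", (10, 100), 10, 5), Path("Rc", (6, 100), 1, 4)]
    rd_clients = [Path("Re", (6, 100), 0, 12)]
    from_rd = None  # what Rd currently advertises to Ra
    history = []
    for _ in range(rounds):
        ra_best = best_path(ra_clients + ([from_rd] if from_rd else []))
        # Reflect to the other RR only if the best path came from a client.
        from_ra = (ra_best._replace(igp=ra_best.igp + inter_cluster)
                   if ra_best in ra_clients else None)
        rd_best = best_path(rd_clients + ([from_ra] if from_ra else []))
        from_rd = (rd_best._replace(igp=rd_best.igp + inter_cluster)
                   if rd_best in rd_clients else None)
        history.append((ra_best.origin, rd_best.origin))
    return history

print(simulate())
# [('Rc', 'Re'), ('Rb', 'Rb'), ('Rc', 'Re'), ('Rb', 'Rb')]
```

Each tuple lists the exit router chosen by Ra and by Rd in one round. In this model the pair never settles: Ra alternates between the paths via Rc and Rb, and Rd alternates between the paths via Re and Rb.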
Now we provide an example of Type I Churn occurring with AS Confederations. To begin, consider the topology depicted in Figure 2:
         ---------------------------------------------------------------
       /     --------------------               --------------------     \
      |     /                    \             /                    \     |
      |    |     Sub-AS 65000     |           |     Sub-AS 65001     |    |
      |    |                      |           |                      |    |
      |    |                      |    *1     |                      |    |
      |    |         Ra . . . . . . . . . . . . . . . . . Rd         |    |
      |    |        .  .          |           |           .          |    |
      |    |       .*3  .*2       |           |           .*6        |    |
      |    |     .         .      |           |           .          |    |
      |    |   Rb . . . . . Rc    |           |          Re          |    |
      |    |    .    *5      .    |           |          .           |    |
      |     \   .            .   /             \         .          /     |
      |      ---.------------.---               ---------.----------      |
       \        .(10)        .(1)       AS1              .(0)            /
         -------.------------.---------------------------.--------------
                .            .                           .
             ------            .      ------------     .
            /      \              .  /            \  .
           |  AS10  |               |     AS6      |
            \      /                 \            /
             ------                   ------------
                 .                          .
                    .                     .
                       . -------------- .
                        /              \
                       |     AS100      |- 10.0.0.0/8
                        \              /
                         --------------
Figure 2: Example AS Confederations Topology
The number contained in parentheses on each AS1 EBGP peering session represents the MED value advertised by the peer to be associated with the 10.0.0.0/8 network reachability advertisement.
The number following each '*' on the IBGP peering sessions represents the additive IGP metrics that are to be associated with the BGP NEXT_HOP attribute for the concerned route.
For example, the Ra IGP metric value associated with a NEXT_HOP learned via Rb would be 3; while the metric value associated with the NEXT_HOP learned via Re would be 7 (1 + 6).
Table 2 depicts the 10.0.0.0/8 route attributes as seen by routers Rb, Rc and Re, respectively. Note that the IGP metrics in Figure 2 are only of concern when advertising the route to an IBGP peer.
      Router    MED    AS_PATH
      ------------------------
      Rb         10    10 100
      Rc          1    6 100
      Re          0    6 100
Table 2: Route Attribute Table
For the following steps 1 through 6 the best route will be marked with an '*'.
1) Ra has the following BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
      * 10 100              10             3
        (65001) 6 100        0             7
        6 100                1             2
The '10 100' route is selected as best and is advertised to Rd, though this is not the cause of the persistent route oscillation.
2) Rd has the following in its BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
        6 100                0             6
      * (65000) 10 100      10             4
The '(65000) 10 100' route is selected as best because it has the lowest IGP metric. As a result, Rd sends an UPDATE/withdraw to Ra for the '6 100' route that it had previously advertised.
3) Ra receives the withdraw from Rd. Ra now has the following in its BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
      * 10 100              10             3
        6 100                1             2
Ra received a withdraw for '(65001) 6 100', which changes what is considered the best route for Ra. Ra does not compute the best path for a prefix unless its best route was withdrawn. This is why Ra has the '10 100, 10, 3' route selected as best, even though the '6 100, 1, 2' route is better.
4) Ra's periodic BGP scanner runs and realizes that the '6 100' route is better because of the lower IGP metric. Ra sends an UPDATE/withdraw to Rd for the '10 100' route since Ra is now using the '6 100' path as its best route.
Ra's BGP table looks like this:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
        10 100              10             3
      * 6 100                1             2
5) Rd receives the UPDATE from Ra and now has the following in its BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
        (65000) 6 100        1             3
      * 6 100                0             6
Rd selects the '6 100, 0, 6' route as best because of the lower MED value. Rd sends an UPDATE message to Ra, reporting that '6 100, 0, 6' is now the best route.
6) Ra receives the UPDATE from Rd. Ra now has the following in its BGP table:
                                    NEXT_HOP
        AS_PATH            MED      IGP Cost
      --------------------------------------
      * 10 100              10             3
        (65001) 6 100        0             7
        6 100                1             2
At this point we have made a full cycle and are back to step 1. This is an example of Type I Churn with AS Confederations.
There are a number of alternatives that can be employed to avoid this problem:
1) When using Route Reflection make sure that the inter-Cluster links have a higher IGP metric than the intra-Cluster links. This is the preferred choice when using Route Reflection. Had the inter-Cluster IGP metrics been much larger than the intra-Cluster IGP metrics, the above would not have occurred.
2) When using AS Confederations ensure that the inter-Sub-AS links have a higher IGP metric than the intra-Sub-AS links. This is the preferred option when using AS Confederations. Had the inter-Sub-AS IGP metrics been much larger than the intra-Sub-AS IGP metrics, the above would not have occurred.
3) Do not accept MEDs from peers (this may not be a feasible alternative).
4) Utilize other BGP attributes higher in the decision process so that the BGP decision algorithm never reaches the MED step. As using this completely overrides MEDs, Option 3 may make more sense.
5) Always compare BGP MEDs, regardless of whether or not they were obtained from a single AS. This is probably a bad idea since MEDs may be derived in a number of ways, and are typically done so as a matter of operator-specific policy. As such, comparing MED values for a single prefix learned from multiple ASs is ill-advised. Of course, this mostly defeats the purpose of MEDs, and as such, Option 3 may be a more viable alternative.
6) Use a full IBGP mesh. This is not a feasible solution for ASs with a large number of BGP speakers.
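Two of the alternatives above (Option 1, higher inter-Cluster metrics, and Option 5, always comparing MEDs) can be explored with a short Python sketch built on the Figure 1 values. It is a simplified round-based model, not a protocol implementation: only Ra and Rd are modeled, an RR passes to the other RR only a best path learned from its own clients, and the inter-Cluster metric of 100 is an arbitrary illustrative value. The names run, converged and best_path are hypothetical.

```python
from collections import namedtuple

Path = namedtuple("Path", "origin as_path med igp")

def best_path(paths, always_compare_med=False):
    if always_compare_med:
        # Option 5: MED is compared across all paths, whatever the neighboring AS.
        return min(paths, key=lambda p: (p.med, p.igp))
    # Default: MED only within a neighboring AS, then IGP cost between winners.
    winners = {}
    for p in paths:
        cur = winners.get(p.as_path[0])
        if cur is None or (p.med, p.igp) < (cur.med, cur.igp):
            winners[p.as_path[0]] = p
    return min(winners.values(), key=lambda p: p.igp)

def run(inter_cluster=1, always_compare_med=False, rounds=6):
    """Return the (Ra exit, Rd exit) pair chosen in each round."""
    ra_clients = [Path("Rb", (10, 100), 10, 5), Path("Rc", (6, 100), 1, 4)]
    rd_clients = [Path("Re", (6, 100), 0, 12)]
    from_rd, history = None, []
    for _ in range(rounds):
        ra_best = best_path(ra_clients + ([from_rd] if from_rd else []),
                            always_compare_med)
        from_ra = (ra_best._replace(igp=ra_best.igp + inter_cluster)
                   if ra_best in ra_clients else None)
        rd_best = best_path(rd_clients + ([from_ra] if from_ra else []),
                            always_compare_med)
        from_rd = (rd_best._replace(igp=rd_best.igp + inter_cluster)
                   if rd_best in rd_clients else None)
        history.append((ra_best.origin, rd_best.origin))
    return history

def converged(history):
    """True when the last two rounds made the same selections."""
    return history[-1] == history[-2]

print(converged(run()))                         # False: Figure 1 as drawn
print(converged(run(inter_cluster=100)))        # True: Option 1
print(converged(run(always_compare_med=True)))  # True: Option 5
```

With the larger inter-Cluster metric, Rd keeps preferring the path from its own client Re, so it never withdraws that path from Ra. With MEDs always compared, both RRs settle on the MED 0 path via Re, at the cost of comparing MEDs from unrelated ASs.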
In the following subsection we provide configurations under which Type II Churn will occur when using AS Confederations. For the sake of brevity, we avoid similar discussion of the occurrence when using Route Reflection.
In general, Type II churn occurs only when BOTH of the following conditions are met:
1) More than one tier of Route Reflection or Sub-ASs is used in the network AND
2) the network accepts the BGP MULTI_EXIT_DISC (MED) attribute from two or more ASs for a single prefix and the MED values are unique.
Let's now examine the occurrence of Type II Churn as it relates to AS Confederations. Figure 3 provides our sample topology:
         ---------------------------------------------------------------
       /                       -------------------                       \
      |   AS 1                /   Sub-AS 65500    \                       |
      |                      |                     |                      |
      |                      |   Rc  . . . .  Rd   |                      |
      |                      |   .     *2      .   |                      |
      |                       \ .               . /                       |
      |                        .-----------------.                        |
      |                       .*40                .*40                    |
      |        --------------.-----             --.-----------------      |
      |       /              .      \          /   .                 \    |
      |      |  Sub-AS      .        |        |     .      Sub-AS     |   |
      |      |  65501      .         |        |      .     65502      |   |
      |      |           Rb          |        |       Re              |   |
      |      |           .           |        |      .   .            |   |
      |      |           .*10        |        |   *2.     .*3         |   |
      |      |           .           |        |    .       .          |   |
      |      |          Ra           |        | . Rg . . . Rf         |   |
      |       \          .          /         .              .       /    |
      |        ----------.----------        .   -------------.-------     |
       \                 .(0)             .(1)               .()         /
         ----------------.---------------.-------------------.----------
                         .              .                    .
                        ---------   .                    ---------
                        |AS 200 |                        |AS 300 |
                        ---------                        ---------
                               .                          .
                                  .                    .
                                    -------------------
                                    |      AS 400     | - 10.0.0.0/8
                                    -------------------
Figure 3: Example AS Confederations Topology
In Figure 3 AS 1 contains three Sub-ASs, 65500, 65501 and 65502. No RR is used within the Sub-ASs, and as such, all routers within each Sub-AS are fully meshed. Ra and Rb are members of Sub-AS 65501. Rc and Rd are members of Sub-AS 65500. Re, Rf and Rg are members of Sub-AS 65502. Ra and Rg have EBGP peerings with AS 200; router Rf has an EBGP peering with AS 300. AS 200 and AS 300 provide transit for AS 400, and in particular, the 10/8 network. The dotted lines are used to represent BGP peering sessions.
The number following each '*' on the BGP peering sessions represents the additive IGP metrics that are to be associated with the BGP NEXT_HOP. The number contained in parentheses on each AS 1 EBGP peering session represents the MED value advertised by the peer to be associated with the network reachability advertisement (10.0.0.0/8).
Rc, Rd and Re are the primary routers involved in the churn, and as such, theirs will be the only BGP tables that we will monitor step by step.
For the following steps 1 through 8 each router's best route will be marked with a '*'.
1) Re receives the AS 400 10.0.0.0/8 route advertisement via AS 200 from Rg and AS 300 from Rf. Re selects the path via Rg and AS 200 because of IGP metric (Re didn't consider MED because the advertisements were received from different ASs).
                                 NEXT_HOP
      Router   AS_PATH    MED    IGP Cost
      -----------------------------------
      Re     * 200 400      1           2
               300 400                  3
Re sends an UPDATE message to Rd advertising its new best path '200 400, 1'.
2) The '200 400, 0' path was advertised from Ra to Rb, and then from Rb to Rc. Rd learns the '200 400, 1' path from Re.
                                 NEXT_HOP
      Router   AS_PATH    MED    IGP Cost
      -----------------------------------
      Rc     * 200 400      0          50
      Rd     * 200 400      1          42
      Re       300 400                  3
             * 200 400      1           2
3) Rc and Rd advertise their best paths to each other; Rd selects '200 400, 0' because of the MED.
                                 NEXT_HOP
      Router   AS_PATH    MED    IGP Cost
      -----------------------------------
      Rc     * 200 400      0          50
               200 400      1          44
      Rd     * 200 400      0          52
               200 400      1          42
      Re       300 400                  3
             * 200 400      1           2
Rd has a new best path, so it sends an UPDATE to Re announcing the new path, and an UPDATE/withdraw for '200 400, 1' to Rc.
4) Re now selects '300 400' (with no MED) because '200 400, 0' beats '200 400, 1' based on MED and '300 400' beats '200 400, 0' because of IGP metric.
                                 NEXT_HOP
      Router   AS_PATH    MED    IGP Cost
      -----------------------------------
      Rc     * 200 400      0          50
      Rd     * 200 400      0          52
               200 400      1          42
      Re     * 300 400                  3
               200 400      0          92
Re has a new best path and sends an UPDATE to Rd for '300 400'.
5) Rd selects the '300 400' path because of IGP metric.
                                 NEXT_HOP
      Router   AS_PATH    MED    IGP Cost
      -----------------------------------
      Rc     * 200 400      0          50
      Rd       200 400      0          52
             * 300 400                 43
      Re     * 300 400                  3
               200 400      0          92
               200 400      1           2
Rd has a new best path so it sends an UPDATE to Rc and an UPDATE/withdraw to Re for '200 400, 0'.
6) Rc selects '300 400' because of the IGP metric. Re selects '200 400, 1' because of the IGP metric.
                                 NEXT_HOP
      Router   AS_PATH    MED    IGP Cost
      -----------------------------------
      Rc       200 400      0          50
             * 300 400                 45
      Rd       200 400      0          52
             * 300 400                 43
      Re       300 400                  3
             * 200 400      1           2
Rc sends an UPDATE/withdraw for '200 400, 0' to Rd. Re sends an UPDATE for '200 400, 1' to Rd.
7) Rd selects '200 400, 1' as its new best path based on the IGP metric.
                                 NEXT_HOP
      Router   AS_PATH    MED    IGP Cost
      -----------------------------------
      Rc       200 400      0          50
             * 300 400                 45
      Rd     * 200 400      1          42
      Re       300 400                  3
             * 200 400      1           2
Rd sends an UPDATE to Rc, announcing '200 400, 1' and implicitly withdraws '300 400'.
8) Rc selects '200 400, 0'.
                                 NEXT_HOP
      Router   AS_PATH    MED    IGP Cost
      -----------------------------------
      Rc     * 200 400      0          50
               200 400      1          44
      Rd     * 200 400      1          42
      Re       300 400                  3
             * 200 400      1           2
At this point we are back to Step 2 and are in a loop.
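The loop can also be reproduced with a simplified synchronous model in Python. This is a sketch under stated assumptions, not the exact message sequence above: in each round every router recomputes its best path from its neighbors' selections in the previous round, a router does not advertise a path back toward the peer it learned it from, and the IGP costs are taken from the tables in Steps 1 through 8. The names P0, P1, P3, step and find_cycle are hypothetical, and a missing MED is treated as 0 (which does not matter here, since only one path comes from AS 300).

```python
from collections import namedtuple

Path = namedtuple("Path", "name asn med")

P0 = Path("200 400, 0", 200, 0)   # enters via Ra, reaches Rc through Rb
P1 = Path("200 400, 1", 200, 1)   # enters via Rg, reaches Re
P3 = Path("300 400", 300, None)   # enters via Rf, reaches Re, no MED

# IGP cost to each path's NEXT_HOP as seen from each router (from the tables).
IGP = {
    "Rc": {P0: 50, P1: 44, P3: 45},
    "Rd": {P0: 52, P1: 42, P3: 43},
    "Re": {P0: 92, P1: 2, P3: 3},
}
# Paths each router learns from outside the Rc-Rd-Re chain.
OWN = {"Rc": {P0}, "Rd": set(), "Re": {P1, P3}}

def best(router, paths):
    """MED within a neighboring AS, then IGP cost between the per-AS winners."""
    if not paths:
        return None
    winners = {}
    for p in paths:
        cur = winners.get(p.asn)
        if cur is None or (p.med or 0) < (cur.med or 0):
            winners[p.asn] = p
    return min(winners.values(), key=lambda p: IGP[router][p])

def step(state):
    """Compute the next (Rc, Rd, Re) best paths from the previous round."""
    rc, rd, re = state
    rc_to_rd = rc if rc in OWN["Rc"] else None
    re_to_rd = re if re in OWN["Re"] else None
    rd_to_rc = rd if rd in OWN["Re"] else None  # Rd passes Re-learned paths to Rc
    rd_to_re = rd if rd in OWN["Rc"] else None  # and Rc-learned paths to Re
    def pick(router, learned):
        return best(router, OWN[router] | {p for p in learned if p is not None})
    return (pick("Rc", [rd_to_rc]),
            pick("Rd", [rc_to_rd, re_to_rd]),
            pick("Re", [rd_to_re]))

def find_cycle(limit=50):
    """Return (round where the loop starts, loop length), or None if none seen."""
    state, seen, n = (None, None, None), {}, 0
    while state not in seen and n < limit:
        seen[state] = n
        state = step(state)
        n += 1
    return (seen[state], n - seen[state]) if state in seen else None

print(find_cycle())  # (2, 6)
```

In this model the selections at Rc, Rd and Re first repeat after round 2 and then cycle through six distinct states, which corresponds to the loop from Step 8 back to Step 2.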
There are a number of alternatives that can be employed to avoid this problem:
1) Do not accept MEDs from peers (this may not be a feasible alternative).
2) Utilize other BGP attributes higher in the decision process so that the BGP decision algorithm selects a single AS before it reaches the MED step. For example, if local-pref were set based on the advertising AS, then you first eliminate all routes except those in a single AS. In the example, router Re would pick either AS 200 or AS 300 based on your local-pref and never change selections.
This leaves two simple workarounds for the two types of problems.
Type I: Make inter-cluster or inter-sub-AS link metrics higher than intra-cluster or intra-sub-AS metrics.
Type II: Make route selections based on local-pref assigned to the advertising AS first and then use IGP cost and MED to make selection among routes from the same AS.
Note that this requires per-prefix policies, as well as near intimate knowledge of other networks by the network operator. The authors are not aware of ANY [large] provider today that performs per-prefix policies on routes learned from peers. Implicitly removing this dynamic portion of route selection does not appear to be a viable option in today's networks. The main point is that a workaround is available: using local-pref so that no two ASs advertise a given prefix at the same local-pref solves Type II Churn.
3) Always compare BGP MEDs, regardless of whether or not they were obtained from a single AS. This is probably a bad idea since MEDs may be derived in a number of ways, and are typically set as a matter of operator-specific policy, largely as a function of the metric space available in the employed IGP. As such, comparing MED values for a single prefix learned from multiple ASs is ill-advised. This mostly defeats the purpose of MEDs; Option 1 may be a more viable alternative.
4) Do not use more than one tier of Route Reflection or Sub-ASs in the network. The risk of route oscillation should be considered when designing networks that might use a multi-tiered routing isolation architecture.
5) In a RR topology, mesh the clients. For confederations, mesh the border routers at each level in the hierarchy. In Figure 3, for example, if Rb and Re are peers, then there's no churn.
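The local-pref workaround for Type II churn described in guideline 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a definitive implementation: the `LOCAL_PREF_BY_AS` mapping and its values are hypothetical, every neighboring AS is assumed to have its own distinct local-pref, and a missing MED is treated as 0. The names `Path` and `select_best` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical policy: one distinct local-pref per advertising (neighboring) AS.
LOCAL_PREF_BY_AS: Dict[int, int] = {200: 200, 300: 100}


@dataclass(frozen=True)
class Path:
    as_path: Tuple[int, ...]   # the first entry is the neighboring AS
    med: Optional[int]         # None when the path carries no MED
    igp_cost: int              # IGP cost to the NEXT_HOP


def local_pref(path: Path) -> int:
    # Raises KeyError for an AS without a configured preference, so that
    # two ASs can never silently share a default value in this sketch.
    return LOCAL_PREF_BY_AS[path.as_path[0]]


def select_best(paths: Sequence[Path]) -> Path:
    # Step 1: highest local-pref wins. With distinct values per AS this
    # leaves only paths from a single neighboring AS.
    top = max(local_pref(p) for p in paths)
    same_as = [p for p in paths if local_pref(p) == top]
    # Step 2: lowest MED. All remaining paths share one AS, so the
    # comparison is always valid (missing MED assumed to be 0).
    lowest = min(0 if p.med is None else p.med for p in same_as)
    candidates = [p for p in same_as if (0 if p.med is None else p.med) == lowest]
    # Step 3: lowest IGP cost to the NEXT_HOP.
    return min(candidates, key=lambda p: p.igp_cost)


# Re's paths from the example, plus a hypothetical cheaper AS 300 path.
re_paths = [Path((300, 400), None, 3), Path((200, 400), 1, 2)]
re_paths_cheaper_300 = [Path((300, 400), None, 1), Path((200, 400), 1, 2)]

print(select_best(re_paths))
print(select_best(re_paths_cheaper_300))
```

Because the first step already narrows the candidates to one AS, the result no longer depends on which paths from other ASs happen to be visible, so an arriving or withdrawn path from AS 300 cannot flip the choice in this sketch.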
It should be stated that protocol enhancements regarding this problem must be pursued. Imposing network design requirements, such as those outlined above, is clearly an unreasonable long-term solution. Problems such as this should not occur under 'default' protocol configurations.
This discussion introduces no new security concerns to BGP or other specifications referenced in this document.
The authors would like to thank Curtis Villamizar, Tim Griffin, John Scudder, Ron Da Silva, Jeffrey Haas and Bill Fenner.
[1] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 1771, March 1995.
[2] Bates, T., Chandra, R. and E. Chen, "BGP Route Reflection - An Alternative to Full Mesh IBGP", RFC 2796, April 2000.
[3] Traina, P., McPherson, D. and J. Scudder, "Autonomous System Confederations for BGP", RFC 3065, February 2001.
[4] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)", Work in Progress.
Danny McPherson TCB EMail: danny@tcb.net
Vijay Gill AOL Time Warner, Inc. 12100 Sunrise Valley Drive Reston, VA 20191 EMail: vijay@umbc.edu
Daniel Walton Cisco Systems, Inc. 7025 Kit Creek Rd. Research Triangle Park, NC 27709 EMail: dwalton@cisco.com
Alvaro Retana Cisco Systems, Inc. 7025 Kit Creek Rd. Research Triangle Park, NC 27709 EMail: aretana@cisco.com
Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the Internet Society.