Internet Engineering Task Force (IETF)
Request for Comments: 7948
Category: Informational
ISSN: 2070-1721
N. Hilliard (INEX), E. Jasinska (BigWave IT), R. Raszuk (Bloomberg LP), N. Bakker (Akamai Technologies B.V.)
September 2016
Internet Exchange BGP Route Server Operations
Abstract
The popularity of Internet Exchange Points (IXPs) brings new challenges to interconnecting networks. While bilateral External BGP (EBGP) sessions between exchange participants were historically the most common means of exchanging reachability information over an IXP, the overhead associated with this interconnection method causes serious operational and administrative scaling problems for IXP participants.
Multilateral interconnection using Internet route servers can dramatically reduce the administrative and operational overhead associated with connecting to IXPs; in some cases, route servers are used by IXP participants as their preferred means of exchanging routing information.
This document describes operational considerations for multilateral interconnections at IXPs.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for informational purposes.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7948.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
   1. Introduction
      1.1. Notational Conventions
   2. Bilateral BGP Sessions
   3. Multilateral Interconnection
   4. Operational Considerations for Route Server Installations
      4.1. Path Hiding
      4.2. Route Server Scaling
         4.2.1. Tackling Scaling Issues
            4.2.1.1. View Merging and Decomposition
            4.2.1.2. Destination Splitting
            4.2.1.3. NEXT_HOP Resolution
      4.3. Prefix Leakage Mitigation
      4.4. Route Server Redundancy
      4.5. AS_PATH Consistency Check
      4.6. Export Routing Policies
         4.6.1. BGP Communities
         4.6.2. Internet Routing Registries
         4.6.3. Client-Accessible Databases
      4.7. Layer 2 Reachability Problems
      4.8. BGP NEXT_HOP Hijacking
      4.9. BGP Operations and Security
   5. Security Considerations
   6. References
      6.1. Normative References
      6.2. Informative References
   Acknowledgments
   Authors' Addresses
1. Introduction
Internet Exchange Points (IXPs) provide IP data interconnection facilities for their participants, using data link-layer protocols such as Ethernet. The Border Gateway Protocol (BGP) [RFC4271] is normally used to facilitate exchange of network reachability information over these media.
As bilateral interconnection between IXP participants requires operational and administrative overhead, BGP route servers [RFC7947] are often deployed by IXP operators to provide a simple and convenient means of interconnecting IXP participants with each other. A route server redistributes BGP routes received from its BGP clients to other clients according to a prespecified policy, and it can be viewed as similar to an EBGP equivalent of an Internal BGP (IBGP) [RFC4456] route reflector.
Route servers at IXPs require careful management, and it is important for route server operators to thoroughly understand both how they work and what their limitations are. In this document, we discuss several issues of operational relevance to route server operators and provide recommendations to help route server operators provision a reliable interconnection service.
1.1. Notational Conventions
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The phrase "BGP route" in this document should be interpreted as the term "Route" described in [RFC4271].
2. Bilateral BGP Sessions
Bilateral interconnection is a method of interconnecting routers using individual BGP sessions between each pair of participant routers on an IXP, in order to exchange reachability information. If an IXP participant wishes to implement an open interconnection policy -- i.e., a policy of interconnecting with as many other IXP participants as possible -- it is necessary for the participant to liaise with each of their intended interconnection partners. Interconnection can then be implemented bilaterally by configuring a BGP session on both participants' routers to exchange network reachability information. If each exchange participant interconnects with each other participant, a full mesh of BGP sessions is needed, as shown in Figure 1.
                        ___      ___
                       /   \    /   \
                    ..| AS1 |..| AS2 |..
                   :   \___/____\___/   :
                   :     | \    / |     :
                   :     |  \  /  |     :
                   : IXP |   \/   |     :
                   :     |   /\   |     :
                   :     |  /  \  |     :
                   :    _|_/____\_|_    :
                   :   /   \    /   \   :
                    ..| AS3 |..| AS4 |..
                       \___/    \___/
Figure 1: Full-Mesh Interconnection at an IXP
Figure 1 depicts an IXP platform with four connected routers, administered by four separate exchange participants, each of them with a locally unique Autonomous System (AS) number: AS1, AS2, AS3, and AS4. The lines between the routers depict BGP sessions; the dotted edge represents the IXP border. Each of these four participants wishes to exchange traffic with all other participants; this is accomplished by configuring a full mesh of BGP sessions on each router connected to the exchange, resulting in six BGP sessions across the IXP fabric.
The number of BGP sessions at an exchange has an upper bound of n*(n-1)/2, where n is the number of routers at the exchange. As many exchanges have large numbers of participating networks, the amount of administrative and operational overhead required to implement an open interconnection policy scales quadratically. New participants to an IXP require significant initial resourcing in order to gain value from their IXP connection, while existing exchange participants need to commit ongoing resources in order to benefit from interconnecting with these new participants.
3. Multilateral Interconnection
Multilateral interconnection is implemented using a route server configured to distribute BGP routes among client routers. The route server preserves the BGP NEXT_HOP attribute from all received BGP routes and passes them with unchanged NEXT_HOP to its route server clients according to its configured routing policy, as described in [RFC7947]. Using this method of exchanging BGP routes, an IXP participant router can receive an aggregated list of BGP routes from all other route server clients using a single BGP session to the route server instead of depending on BGP sessions with each router at the exchange. This reduces the overall number of BGP sessions at an Internet exchange from n*(n-1)/2 to n, where n is the number of routers at the exchange.
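The two session counts quoted above can be compared with a short calculation. This is only an illustrative sketch: the function names are invented for this example, and the route server case assumes a single route server at the exchange.

```python
def bilateral_sessions(n: int) -> int:
    """Upper bound on BGP sessions for a full mesh of n routers."""
    return n * (n - 1) // 2


def route_server_sessions(n: int) -> int:
    """Sessions needed when each of n routers peers only with one route server."""
    return n


# Print router count, full-mesh sessions, and route server sessions.
for n in (4, 50, 500):
    print(n, bilateral_sessions(n), route_server_sessions(n))
```

For the four routers of Figure 1, the full mesh needs six sessions, while the route server model needs four.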
Although a route server uses BGP to exchange reachability information with each of its clients, it does not forward traffic itself and is therefore not a router.
In practical terms, this allows dense interconnection between IXP participants with low administrative overhead and significantly simpler and smaller router configurations. In particular, new IXP participants benefit from immediate and extensive interconnection, while existing route server participants receive reachability information from these new participants without necessarily having to modify their configurations.
                        ___      ___
                       /   \    /   \
                    ..| AS1 |..| AS2 |..
                   :   \___/    \___/   :
                   :      \      /      :
                   :       \    /       :
                   :        \__/        :
                   : IXP   /    \       :
                   :      |  RS  |      :
                   :       \____/       :
                   :        /  \        :
                   :       /    \       :
                   :    __/      \__    :
                   :   /   \    /   \   :
                    ..| AS3 |..| AS4 |..
                       \___/    \___/
Figure 2: IXP-Based Interconnection with Route Server
As illustrated in Figure 2, each router on the IXP fabric requires only a single BGP session to the route server, from which it can receive reachability information for all other routers on the IXP that also connect to the route server.
Multilateral and bilateral interconnections between different autonomous systems are not exclusive to each other, and it is not unusual to have both sorts of sessions configured in parallel at an IXP. This configuration will lead to additional paths being available to the BGP Decision Process, which will calculate a best path as normal.
4. Operational Considerations for Route Server Installations
4.1. Path Hiding
"Path hiding" is a term used in [RFC7947] to describe the process whereby a route server may mask individual paths by applying conflicting routing policies to its Loc-RIB. When this happens, route server clients receive incomplete information from the route server about network reachability.
There are several approaches that may be used to mitigate the effect of path hiding; these are described in [RFC7947]. However, the only method that does not require explicit support from the route server client is for the route server itself to maintain an individual Loc-RIB for each client that is the subject of conflicting routing policies.
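The effect can be illustrated with a toy model. The sketch below is not how any particular route server is implemented: the `Path` type, the integer `preference` (a stand-in for the BGP Decision Process), and the `blocked` set of (announcing client, receiving client) pairs are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Path:
    prefix: str
    from_client: str
    preference: int  # toy stand-in for the BGP Decision Process; lower wins


def best_path(paths: Iterable[Path]) -> Optional[Path]:
    """Return the most preferred path, or None if there are no paths."""
    return min(paths, key=lambda p: p.preference, default=None)


def export_shared_rib(paths, client: str, blocked: Set[Tuple[str, str]]):
    """Single shared Loc-RIB: choose the best path first, then apply policy."""
    best = best_path(paths)
    if best is None or (best.from_client, client) in blocked:
        return None  # best path is filtered and the alternatives stay hidden
    return best


def export_per_client_rib(paths, client: str, blocked: Set[Tuple[str, str]]):
    """Per-client Loc-RIB: apply policy first, then choose among what remains."""
    allowed = [p for p in paths if (p.from_client, client) not in blocked]
    return best_path(allowed)


paths = [
    Path("192.0.2.0/24", "AS1", preference=10),
    Path("192.0.2.0/24", "AS2", preference=20),
]
blocked = {("AS1", "AS3")}  # AS1 does not want its routes sent to AS3

print(export_shared_rib(paths, "AS3", blocked))      # None: the AS2 path is hidden
print(export_per_client_rib(paths, "AS3", blocked))  # the path learned from AS2
```

With a shared Loc-RIB, AS3 receives no route for the prefix even though a usable path via AS2 exists; with a per-client Loc-RIB, it receives the AS2 path.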
4.2. Route Server Scaling
While deployment of multiple Loc-RIBs on the route server presents a simple way to avoid the path-hiding problem noted in Section 4.1, this approach requires significantly more computing resources on the route server than where a single Loc-RIB is deployed for all clients. As the BGP Decision Process [RFC4271] must be applied to all Loc-RIBs deployed on the route server, both CPU and memory requirements on the host computer scale approximately according to O(P * N), where P is the total number of unique paths received by the route server, and N is the number of route server clients that require a unique Loc-RIB. As this is a super-linear scaling relationship, large route servers may derive benefit from deploying per-client Loc-RIBs only where they are required.
Regardless of whether any Loc-RIB optimization technique is implemented, the route server's theoretical upper-bound network bandwidth requirements will scale according to O(P_tot * N), where P_tot is the total number of unique paths received by the route server, and N is the total number of route server clients. In the case where P_avg (the arithmetic mean number of unique paths received per route server client) remains roughly constant even as the number of connected clients increases, the total number of paths will equal the average number of paths per client multiplied by the number of clients. Symbolically, this can be written as P_tot = P_avg * N. If we assume that in the worst case, each prefix is associated with a different set of BGP path attributes and so must be transmitted individually, the network bandwidth scaling function can be rewritten as O((P_avg * N) * N), which is O(N^2) when P_avg is constant. This quadratic upper bound on the network traffic requirements indicates that the route server model may not scale well for larger numbers of clients.
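The worst-case bound can be made concrete with a small calculation. This is a sketch of the arithmetic only; the function name and the sample value of P_avg are assumptions chosen for illustration, not measurements.

```python
def worst_case_route_transmissions(p_avg: int, n_clients: int) -> int:
    """Worst-case number of routes sent by the route server: P_tot * N."""
    p_tot = p_avg * n_clients      # P_tot = P_avg * N
    return p_tot * n_clients       # (P_avg * N) * N, quadratic in N


# With P_avg held constant, doubling the clients quadruples the bound.
for n in (10, 20, 40):
    print(n, worst_case_route_transmissions(p_avg=100, n_clients=n))
```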
In practice, most prefixes will be associated with a limited number of BGP path attribute sets, allowing more efficient transmission of BGP routes from the route server than the theoretical analysis suggests. In the analysis above, P_tot will increase monotonically according to the number of clients, but it will have an upper limit of the size of the full default-free routing table of the network in which the IXP is located. Observations from production route servers have shown that most route server clients generally avoid using custom routing policies, and consequently, the route server may not need to deploy per-client Loc-RIBs. These practical bounds reduce the theoretical worst-case scaling scenario to the point where route server deployments are manageable even on larger IXPs.
4.2.1. Tackling Scaling Issues
The problem of scaling route servers still presents serious practical challenges and requires careful attention. Scaling analysis indicates problems in three key areas: route processor CPU overhead associated with BGP Decision Process calculations, the memory requirements for handling many different BGP path entries, and the network traffic bandwidth required to distribute these BGP routes from the route server to each route server client.
4.2.1.1. View Merging and Decomposition
View merging and decomposition, outlined in [RS-ARCH], describes a method of optimizing memory and CPU requirements where multiple route server clients are subject to exactly the same routing policies. In this situation, multiple Loc-RIB views can be merged into a single view.
There are several variations of this approach. If the route server operator has prior knowledge of interconnection relationships between route server clients, then the operator may configure separate Loc-RIBs only for route server clients with unique routing policies. As this approach requires prior knowledge of interconnection relationships, the route server operator must depend on each client sharing their interconnection policies either in an internal provisioning database controlled by the operator or in an external data store such as an Internet Routing Registry Database.
Conversely, the route server implementation itself may implement internal view decomposition by creating virtual Loc-RIBs based on a single in-memory master Loc-RIB, with delta differences for each prefix subject to different routing policies. This allows a more fine-grained and flexible approach to the problem of Loc-RIB scaling, at the expense of requiring a more complex in-memory Loc-RIB structure.
Whatever method of view merging and decomposition is chosen on a route server, pathological edge cases can be created whereby they will scale no better than fully non-optimized per-client Loc-RIBs. However, as most route server clients connect to a route server for the purposes of reducing overhead, rather than implementing complex per-client routing policies, edge cases tend not to arise in practice.
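The operator-configured variant of view merging can be sketched as a grouping step: clients with identical policies share one Loc-RIB view. The policy representation used here (for each client, the set of peers whose routes it must not receive) is a simplification assumed for this example and is not taken from [RS-ARCH].

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List


def merge_views(
    client_policies: Dict[str, Iterable[str]]
) -> Dict[FrozenSet[str], List[str]]:
    """Group clients whose policies are identical so they can share a view."""
    views: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for client, excluded_peers in client_policies.items():
        views[frozenset(excluded_peers)].append(client)
    return dict(views)


# Hypothetical policies: only AS3 is subject to a non-default policy.
policies = {"AS1": [], "AS2": [], "AS3": ["AS1"], "AS4": []}
views = merge_views(policies)
print(len(views), "Loc-RIB views instead of", len(policies))
```

In the pathological case where every client has a different policy, the grouping yields one view per client, which matches the observation above that the optimization then scales no better than per-client Loc-RIBs.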
4.2.1.2. Destination Splitting
Destination splitting, also described in [RS-ARCH], describes a method for route server clients to connect to multiple route servers and to send non-overlapping sets of prefixes to each route server. As each route server computes the best path for its own set of prefixes, the quadratic scaling requirement operates on multiple smaller sets of prefixes. This reduces the overall computational and memory requirements for managing multiple Loc-RIBs and performing the best-path calculation on each.
In practice, the route server operator would need all route server clients to send a full set of BGP routes to each route server. The route server operator could then selectively filter these prefixes for each route server by using either BGP Outbound Route Filtering [RFC5291] or inbound prefix filters configured on client BGP sessions.
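One way to picture the split is as an inbound filter on each route server that accepts only its own share of the address space. The sketch below assumes an IPv4-only, two-way split at the /1 boundary and the names `rs1` and `rs2`; these are illustrative choices, not something [RS-ARCH] or this document prescribes.

```python
import ipaddress

# Hypothetical assignment of IPv4 address space to two route servers.
SPLITS = {
    "rs1": ipaddress.ip_network("0.0.0.0/1"),
    "rs2": ipaddress.ip_network("128.0.0.0/1"),
}


def route_server_for(prefix: str) -> str:
    """Return the route server whose inbound filter would accept this prefix."""
    net = ipaddress.ip_network(prefix)
    for name, block in SPLITS.items():
        if net.subnet_of(block):
            return name
    raise ValueError(f"{prefix} does not fit inside any configured split")


for prefix in ("10.0.0.0/8", "198.51.100.0/24"):
    print(prefix, "->", route_server_for(prefix))
```

Because the two blocks do not overlap, each prefix is handled by at most one route server; a prefix that spans both blocks, such as 0.0.0.0/0, is accepted by neither.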
4.2.1.3. NEXT_HOP Resolution
As route servers are usually deployed at IXPs where all connected routers are on the same Layer 2 broadcast domain, recursive resolution of the NEXT_HOP attribute is generally not required and can be replaced by a simple check to ensure that the NEXT_HOP value for each received BGP route is a network address on the IXP LAN's IP address range.
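The simple check described above amounts to a membership test. In this minimal sketch, the IXP LAN range is a documentation prefix chosen for illustration and the function name is an assumption.

```python
import ipaddress

# Hypothetical IXP peering LAN; a real deployment would use its own range.
IXP_LAN = ipaddress.ip_network("192.0.2.0/24")


def next_hop_on_ixp_lan(next_hop: str, lan=IXP_LAN) -> bool:
    """Return True if the NEXT_HOP address lies within the IXP LAN range."""
    return ipaddress.ip_address(next_hop) in lan


print(next_hop_on_ixp_lan("192.0.2.10"))    # inside the peering LAN
print(next_hop_on_ixp_lan("198.51.100.1"))  # outside the peering LAN
```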
4.3. Prefix Leakage Mitigation
Prefix leakage occurs when a BGP client unintentionally distributes BGP routes to one or more neighboring BGP routers. Prefix leakage of this form to a route server can cause serious connectivity problems at an IXP if each route server client is configured to accept all BGP routes from the route server. It is therefore RECOMMENDED when deploying route servers that, due to the potential for collateral damage caused by BGP route leakage, route server operators deploy prefix leakage mitigation measures in order to prevent unintentional prefix announcements or else limit the scale of any such leak. Although not foolproof, per-client inbound prefix limits can restrict the damage caused by prefix leakage in many cases. Per-client inbound prefix filtering on the route server is a more deterministic and usually more reliable means of preventing prefix leakage but requires more administrative resources to maintain properly.
If a route server operator implements per-client inbound prefix filtering, then it is RECOMMENDED that the operator also builds in mechanisms to automatically compare the Adj-RIB-In received from each client with the inbound prefix lists configured for those clients. Naturally, it is the responsibility of the route server client to ensure that their stated prefix list is compatible with what they announce to an IXP route server. However, many network operators do not carefully manage their published routing policies, and it is not uncommon to see significant variation between the two sets of prefixes. Route server operator visibility into this discrepancy can provide significant advantages to both operator and client.
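The comparison recommended above can be sketched as a set difference between what a client announces and what its filter allows. This example assumes exact-match prefix lists (no length ranges) and uses invented names; a production tool would also need to handle more-specific matches.

```python
from typing import Dict, Iterable, List


def compare_prefixes(
    adj_rib_in: Iterable[str], configured: Iterable[str]
) -> Dict[str, List[str]]:
    """Report prefixes received but not configured, and configured but not received."""
    received, allowed = set(adj_rib_in), set(configured)
    return {
        "unexpected": sorted(received - allowed),
        "not_announced": sorted(allowed - received),
    }


# Hypothetical data for one route server client.
report = compare_prefixes(
    adj_rib_in=["192.0.2.0/24", "203.0.113.0/24"],
    configured=["192.0.2.0/24", "198.51.100.0/24"],
)
print(report)
```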
4.4. Route Server Redundancy
As the purpose of an IXP route server implementation is to provide a reliable reachability brokerage service, it is RECOMMENDED that exchange operators who implement route server systems provision multiple route servers on each shared Layer 2 domain. There is no requirement to use the same BGP implementation or operating system for each route server on the IXP fabric; however, it is RECOMMENDED that where an operator provisions more than a single server on the same shared Layer 2 domain, each route server implementation be configured equivalently and in such a manner that the path reachability information from each system is identical.
4.5. AS_PATH Consistency Check
[RFC4271] requires that every BGP speaker that advertises a BGP route to another external BGP speaker prepends its own AS number as the last element of the AS_PATH sequence (that is, in the leftmost position). Therefore, the leftmost AS in an AS_PATH attribute should be equal to the AS number of the BGP speaker that sent the BGP route.
As [RFC7947] suggests that route servers should not modify the AS_PATH attribute, a consistency check on the AS_PATH of a BGP route received by a route server client would normally fail. It is therefore RECOMMENDED that route server clients disable the AS_PATH consistency check towards the route server.
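The following sketch shows why the check fails for routes learned from a route server and what disabling it means. The AS numbers are taken from the documentation range, and the `enforce_first_as` parameter is a name invented for this example rather than any vendor's configuration option.

```python
from typing import Sequence


def first_as_matches_peer(as_path: Sequence[int], peer_as: int) -> bool:
    """AS_PATH consistency check: the leftmost AS must equal the sending peer's AS."""
    return bool(as_path) and as_path[0] == peer_as


def accept_route(as_path: Sequence[int], peer_as: int, enforce_first_as: bool = True) -> bool:
    """Accept the route unless the consistency check is enabled and fails."""
    if not enforce_first_as:
        return True
    return first_as_matches_peer(as_path, peer_as)


ROUTE_SERVER_AS = 64500      # hypothetical route server AS
as_path = [64496, 64497]     # the route server did not prepend its own AS

print(accept_route(as_path, ROUTE_SERVER_AS))                          # check enabled
print(accept_route(as_path, ROUTE_SERVER_AS, enforce_first_as=False))  # check disabled
```

With the check enabled, the route is rejected because the leftmost AS (64496) is not the route server's AS; with the check disabled on the session to the route server, it is accepted.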
4.6. Export Routing Policies
Policy filtering is commonly implemented on route servers to provide prefix distribution control mechanisms for route server clients. A route server "export" policy is a policy that affects prefixes sent from the route server to a route server client. Several different strategies are commonly used for implementing route server export policies.
4.6.1. BGP Communities
Prefixes sent to the route server are tagged with specific standard BGP Communities [RFC1997] or Extended Communities [RFC4360] attributes, based on predefined values agreed between the operator and all clients. Based on these Communities values, BGP routes may be propagated to all other clients, a subset of clients, or none. This mechanism allows route server clients to instruct the route server to implement per-client export routing policies.
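As an illustration of community-driven export, the sketch below evaluates a route's communities against one possible convention. The convention itself is an assumption made for this example (0:peer-as blocks one client, 0:rs-as blocks all clients, rs-as:peer-as permits one client); the actual values are whatever the operator and clients have agreed.

```python
from typing import Iterable, List, Set, Tuple

RS_AS = 64500  # hypothetical route server AS number

Community = Tuple[int, int]  # (high 16 bits, low 16 bits)


def export_targets(
    communities: Set[Community], clients: Iterable[int], sender: int
) -> List[int]:
    """Return the client ASes that should receive a route from `sender`."""
    others = [c for c in clients if c != sender]
    if (0, RS_AS) in communities:
        # Block everyone except clients explicitly permitted by rs-as:peer-as.
        return [c for c in others if (RS_AS, c) in communities]
    # Default: send to everyone except clients blocked by 0:peer-as.
    return [c for c in others if (0, c) not in communities]


clients = [64496, 64497, 64498]
print(export_targets(set(), clients, sender=64496))
print(export_targets({(0, 64497)}, clients, sender=64496))
print(export_targets({(0, RS_AS), (RS_AS, 64497)}, clients, sender=64496))
```

The three calls correspond to the three outcomes named above: propagation to all other clients, to all but a blocked client, and to an explicitly permitted subset.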
As both standard BGP Communities and Extended Communities values are restricted to 6 octets or fewer, it is not possible for both the global and local administrator fields in the BGP Communities value to fit a 4-octet AS number. Bearing this in mind, the route server operator SHOULD take care to ensure that the predefined BGP Communities values mechanism used on their route server is compatible with 4-octet AS numbers [RFC6793].
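The size constraint is easiest to see for standard communities, which [RFC1997] defines as 32-bit values conventionally written as two 16-bit halves. The helper below is a sketch with an invented name; it only demonstrates that a 4-octet AS number cannot be placed in either half.

```python
def encode_standard_community(high: int, low: int) -> int:
    """Pack two 16-bit halves into a 32-bit standard community value."""
    for value in (high, low):
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{value} does not fit in 16 bits")
    return (high << 16) | low


# A 2-octet AS number fits in either half.
print(hex(encode_standard_community(64500, 64496)))

# A 4-octet AS number (65536 is from the documentation range) does not.
try:
    encode_standard_community(64500, 65536)
except ValueError as err:
    print("cannot encode:", err)
```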
4.6.2. Internet Routing Registries
Internet Routing Registry databases (IRRDBs) may be used by route server operators to construct per-client routing policies. "Routing Policy Specification Language (RPSL)" [RFC2622] provides a comprehensive grammar for describing interconnection relationships, and several toolsets exist that can be used to translate RPSL policy description into route server configurations.
4.6.3. Client-Accessible Databases
Should the route server operator not wish to use either BGP Communities or the public IRRDBs for implementing client export policies, they may implement their own routing policy database system for managing their clients' requirements. A database of this form SHOULD allow a route server client operator to update their routing policy and provide a mechanism for allowing the client to specify whether they wish to exchange all their prefixes with any other route server client. Optionally, the implementation may allow a client to specify unique routing policies for individual prefixes over which they have routing policy control.
4.7. Layer 2 Reachability Problems
Layer 2 reachability problems on an IXP can cause serious operational problems for IXP participants that depend on route servers for interconnection. Ethernet switch forwarding bugs have occasionally been observed to cause non-transitive reachability. For example, given a route server and two IXP participants, A and B, if the two participants can reach the route server but cannot reach each other, then traffic between the participants may be dropped until such time as the Layer 2 forwarding problem is resolved. This situation does not tend to occur in bilateral interconnection arrangements, as the routing control path between the two hosts is usually (but not always, due to IXP inter-switch connectivity load-balancing algorithms) the same as the data path between them.
Problems of this form can be partially mitigated by using Bidirectional Forwarding Detection (BFD) [RFC5881]. However, as this is a bilateral protocol configured between routers, and as there is currently no protocol to automatically configure BFD sessions between route server clients, BFD does not currently provide an optimal means of handling the problem. Even if automatic BFD session configuration were possible, practical problems would remain. If two IXP route server clients were configured to run BFD between each other and the protocol detected a non-transitive loss of reachability between them, each of those routers would internally mark the other's prefixes as unreachable via the BGP path announced by the route server. As the route server only propagates a single best path to each client, this could cause either sub-optimal routing or complete connectivity loss if there were no alternative paths learned from other BGP sessions.
Item 2 in Section 5.1.3 of [RFC4271] allows EBGP speakers to change the NEXT_HOP address of a received BGP route to be a different Internet address on the same subnet. This is the mechanism that allows route servers to operate on a shared Layer 2 IXP network. However, the mechanism can be abused by route server clients to redirect traffic for their prefixes to other IXP participant routers.
                 ____
                /    \
               | AS99 |
                \____/
                 /  \
                /    \
             __/      \__
            /   \    /   \
         ..| AS1 |..| AS2 |..
        :   \___/    \___/   :
        :       \    /       :
        :        \  /        :
        :         \__/       :
        : IXP    /    \      :
        :       |  RS  |     :
        :        \____/      :
        :                    :
         ....................
Figure 3: BGP NEXT_HOP Hijacking Using a Route Server
For example, in Figure 3, if AS1 and AS2 both announce BGP routes for AS99 to the route server, AS1 could set the NEXT_HOP address for AS99's routes to be the address of AS2's router, thereby diverting traffic for AS99 via AS2. This may override the routing policies of AS99 and AS2.
Worse still, if the route server operator does not use inbound prefix filtering, AS1 could announce any arbitrary prefix to the route server with a NEXT_HOP address of any other IXP participant. This could be used as a denial-of-service mechanism in two ways: against the users of the announced address space, by illicitly diverting their traffic, or against the other IXP participant, by overloading its network with traffic that would not normally be sent there.
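As an illustration of inbound prefix filtering, the following sketch accepts an announcement only if it falls within a per-client list of permitted prefixes. The function name `permitted`, the idea of accepting more-specific prefixes inside an allowed block, and the example prefixes (documentation ranges) are assumptions made for this example; how the permitted list is built is a matter of route server operator policy.

```python
import ipaddress

def permitted(prefix, allowed_prefixes):
    """Return True if `prefix` is covered by one of the client's allowed prefixes."""
    net = ipaddress.ip_network(prefix)
    return any(
        # Compare only networks of the same IP version; subnet_of() would
        # otherwise raise TypeError.
        net.version == allowed.version and net.subnet_of(allowed)
        for allowed in map(ipaddress.ip_network, allowed_prefixes)
    )

# Hypothetical permitted list for one route server client.
allowed_for_as1 = ["198.51.100.0/24"]

for candidate in ("198.51.100.0/25", "203.0.113.0/24", "2001:db8::/32"):
    print(candidate, permitted(candidate, allowed_for_as1))
```

With such a filter in place, an arbitrary prefix announced by AS1 would be rejected regardless of the NEXT_HOP address attached to it.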
This problem is not specific to route servers; the same attack can also be carried out over bilateral BGP sessions. However, the potential damage is amplified by route servers because a single BGP session can be used to affect many networks simultaneously.
Because route server clients cannot easily implement next-hop policy checks against route server BGP sessions, route server operators SHOULD check that the BGP NEXT_HOP attribute for BGP routes received from a route server client matches the interface address of the client. If the route server receives a BGP route where these addresses are different and where the announcing route server client is in a different AS to the route server client that uses the next-hop address, the BGP route SHOULD be dropped. Permitting next-hop rewriting for the same AS allows an organization with multiple connections into an IXP configured with different IP addresses to direct traffic off the IXP infrastructure through any of their connections for traffic engineering or other purposes.
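The check described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `accept_next_hop`, the `clients` mapping from IXP interface address to AS number, and the example addresses and AS numbers (documentation ranges) are all assumptions. The sketch also rejects a NEXT_HOP that belongs to no known client, which is a choice made for this example rather than something stated in the text.

```python
import ipaddress

def accept_next_hop(next_hop, announcing_peer, clients):
    """Decide whether a route's NEXT_HOP is acceptable from `announcing_peer`.

    `clients` maps each client's IXP interface address (str) to its AS number.
    """
    table = {ipaddress.ip_address(addr): asn for addr, asn in clients.items()}
    nh = ipaddress.ip_address(next_hop)
    peer = ipaddress.ip_address(announcing_peer)
    if nh == peer:
        return True  # NEXT_HOP matches the announcing client's interface
    owner_as = table.get(nh)
    # Allow rewriting only towards another router in the announcer's own AS.
    return owner_as is not None and owner_as == table.get(peer)

# Hypothetical IXP: AS64501 has two connections, AS64502 has one.
clients = {"192.0.2.1": 64501, "192.0.2.2": 64502, "192.0.2.3": 64501}

for nh in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
    print(nh, accept_next_hop(nh, "192.0.2.1", clients))
```

In this example, routes announced from 192.0.2.1 are accepted when the NEXT_HOP is 192.0.2.1 itself or 192.0.2.3 (same AS), and rejected when it points at 192.0.2.2, which belongs to a different AS.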
BGP route servers SHOULD be configured and operated in compliance with [RFC7454] with the exception of Section 11, "BGP Community Scrubbing", which may not necessarily apply on a route server, depending on the route server operator policy.
On route server installations that do not employ path-hiding mitigation techniques, the path-hiding problem outlined in Section 4.1 could be used by an IXP participant to prevent the route server from sending any BGP routes for a particular prefix to other route server clients, even if there was a valid path to that destination via another route server client.
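The effect can be illustrated with a simplified model of a route server that selects one best path per prefix before applying per-client export policy. Everything in this sketch is an assumption made for illustration: the function names, the use of AS path length as the only selection criterion, and the `denied` set representing an announcer's request not to export its paths to a given client.

```python
def advertised_path(candidates, denied, client):
    """Model a route server without path-hiding mitigation.

    `candidates` is a list of (announcer, as_path_length) for one prefix.
    `denied` is a set of (announcer, client) pairs where the announcer's
    policy blocks export of its paths to that client.
    """
    if not candidates:
        return None
    best = min(candidates, key=lambda c: c[1])  # single best path for all clients
    announcer = best[0]
    return None if (announcer, client) in denied else announcer

def advertised_path_mitigated(candidates, denied, client):
    """Per-client selection: ignore paths the client is not allowed to receive."""
    allowed = [c for c in candidates if (c[0], client) not in denied]
    return min(allowed, key=lambda c: c[1])[0] if allowed else None

# AS1 announces the shortest path but blocks export to AS3; AS2 offers a
# valid, longer alternative.
candidates = [("AS1", 1), ("AS2", 2)]
denied = {("AS1", "AS3")}

print(advertised_path(candidates, denied, "AS3"))
print(advertised_path_mitigated(candidates, denied, "AS3"))
```

Without mitigation, AS3 receives no path for the prefix even though AS2's path is valid; with per-client selection, AS3 receives the path via AS2.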
If the route server operator does not implement prefix leakage mitigation as described in Section 4.3, it is trivial for route server clients to implement denial-of-service attacks against arbitrary Internet networks by leaking BGP routes to a route server.
Route server installations SHOULD be secured against BGP NEXT_HOP hijacking, as described in Section 4.8.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.
[RFC7947] Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker, "Internet Exchange BGP Route Server", RFC 7947, DOI 10.17487/RFC7947, September 2016, <http://www.rfc-editor.org/info/rfc7947>.
[RFC1997] Chandra, R., Traina, P., and T. Li, "BGP Communities Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996, <http://www.rfc-editor.org/info/rfc1997>.
[RFC2622] Alaettinoglu, C., Villamizar, C., Gerich, E., Kessens, D., Meyer, D., Bates, T., Karrenberg, D., and M. Terpstra, "Routing Policy Specification Language (RPSL)", RFC 2622, DOI 10.17487/RFC2622, June 1999, <http://www.rfc-editor.org/info/rfc2622>.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006, <http://www.rfc-editor.org/info/rfc4271>.
[RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February 2006, <http://www.rfc-editor.org/info/rfc4360>.
[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006, <http://www.rfc-editor.org/info/rfc4456>.
[RFC5291] Chen, E. and Y. Rekhter, "Outbound Route Filtering Capability for BGP-4", RFC 5291, DOI 10.17487/RFC5291, August 2008, <http://www.rfc-editor.org/info/rfc5291>.
[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, DOI 10.17487/RFC5881, June 2010, <http://www.rfc-editor.org/info/rfc5881>.
[RFC6793] Vohra, Q. and E. Chen, "BGP Support for Four-Octet Autonomous System (AS) Number Space", RFC 6793, DOI 10.17487/RFC6793, December 2012, <http://www.rfc-editor.org/info/rfc6793>.
[RFC7454] Durand, J., Pepelnjak, I., and G. Doering, "BGP Operations and Security", BCP 194, RFC 7454, DOI 10.17487/RFC7454, February 2015, <http://www.rfc-editor.org/info/rfc7454>.
[RS-ARCH] Govindan, R., Alaettinoglu, C., Varadhan, K., and D. Estrin, "A Route Server Architecture for Inter-Domain Routing", 1995, <http://www.cs.usc.edu/assets/003/83191.pdf>.
Acknowledgments
The authors would like to thank Chris Hall, Ryan Bickhart, Steven Bakker, and Eduardo Ascenco Reis for their valuable input.
Authors' Addresses
Nick Hilliard INEX 4027 Kingswood Road Dublin 24 Ireland
Email: nick@inex.ie
Elisa Jasinska BigWave IT ul. Skawinska 27/7 Krakow, MP 31-066 Poland
Email: elisa@bigwaveit.org
Robert Raszuk Bloomberg LP 731 Lexington Ave. New York, NY 10022 United States of America
Email: robert@raszuk.net
Niels Bakker Akamai Technologies B.V. Kingsfordweg 151 Amsterdam 1043 GR Netherlands
Email: nbakker@akamai.com