Internet Engineering Task Force (IETF)                          M. Shand
Request for Comments: 5714                                     S. Bryant
Category: Informational                                    Cisco Systems
ISSN: 2070-1721                                             January 2010
        

IP Fast Reroute Framework

Abstract

This document provides a framework for the development of IP fast-reroute mechanisms that provide protection against link or router failure by invoking locally determined repair paths. Unlike MPLS fast-reroute, the mechanisms are applicable to a network employing conventional IP routing and forwarding.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5714.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Scope and Applicability
   4.  Problem Analysis
   5.  Mechanisms for IP Fast-Reroute
     5.1.  Mechanisms for Fast Failure Detection
     5.2.  Mechanisms for Repair Paths
       5.2.1.  Scope of Repair Paths
       5.2.2.  Analysis of Repair Coverage
       5.2.3.  Link or Node Repair
       5.2.4.  Maintenance of Repair Paths
       5.2.5.  Local Area Networks
       5.2.6.  Multiple Failures and Shared Risk Link Groups
     5.3.  Mechanisms for Micro-Loop Prevention
   6.  Management Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  Informative References
        
1. Introduction

When a link or node failure occurs in a routed network, there is inevitably a period of disruption to the delivery of traffic until the network re-converges on the new topology. Packets for destinations that were previously reached by traversing the failed component may be dropped or may suffer looping. Traditionally, such disruptions have lasted for periods of at least several seconds, and most applications have been constructed to tolerate such a quality of service.

Recent advances in routers have reduced this interval to under a second for carefully configured networks using link state IGPs. However, new Internet services are emerging that may be sensitive to periods of traffic loss that are orders of magnitude shorter than this.

Addressing these issues is difficult because the distributed nature of the network imposes an intrinsic limit on the minimum convergence time that can be achieved.

However, there is an alternative approach, which is to compute backup routes that allow the failure to be repaired locally by the router(s) detecting the failure without the immediate need to inform other routers of the failure. In this case, the disruption time can be limited to the small time taken to detect the adjacent failure and invoke the backup routes. This is analogous to the technique employed by MPLS fast-reroute [RFC4090], but the mechanisms employed for the backup routes in pure IP networks are necessarily very different.

This document provides a framework for the development of this approach.

Note that in order to further minimize the impact on user applications, it may be necessary to design the network such that backup paths with suitable characteristics (for example, capacity and/or delay) are available for the algorithms to select. Such considerations are outside the scope of this document.

2. Terminology

This section defines words and acronyms used in this document and other documents discussing IP fast-reroute.

D
   Used to denote the destination router under discussion.

Distance_opt(A,B)
   The metric sum of the shortest path from A to B.

Downstream Path
   This is a subset of the loop-free alternates where the neighbor N meets the following condition: Distance_opt(N, D) < Distance_opt(S, D)

E
   Used to denote the router that is the primary neighbor to get from S to the destination D. Where there is an ECMP set for the shortest path from S to D, these are referred to as E_1, E_2, etc.

ECMP
   Equal cost multi-path: Where, for a particular destination D, multiple primary next-hops are used to forward traffic because there exist multiple shortest paths from S via different output layer-3 interfaces.

FIB
   Forwarding Information Base. The database used by the packet forwarder to determine what actions to perform on a packet.

IPFRR
   IP fast-reroute.

Link(A->B)
   A link connecting router A to router B.

LFA
   Loop-Free Alternate. A neighbor N, that is not a primary neighbor E, whose shortest path to the destination D does not go back through the router S. The neighbor N must meet the following condition: Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)

Loop-Free Neighbor
   A neighbor N_i, which is not the particular primary neighbor E_k under discussion, and whose shortest path to D does not traverse S. For example, if there are two primary neighbors E_1 and E_2, E_1 is a loop-free neighbor with regard to E_2, and vice versa.

Loop-Free Link-Protecting Alternate
   A path via a Loop-Free Neighbor N_i that reaches destination D without going through the particular link of S that is being protected. In some cases, the path to D may go through the primary neighbor E.

Loop-Free Node-Protecting Alternate
   A path via a Loop-Free Neighbor N_i that reaches destination D without going through the particular primary neighbor (E) of S that is being protected.

N_i
   The ith neighbor of S.

Primary Neighbor
   A neighbor N_i of S which is one of the next hops for destination D in S's FIB prior to any failure.

R_i_j
   The jth neighbor of N_i.

Repair Path
   The path used by a repairing node to send traffic that it is unable to send via the normal path owing to a failure.

Routing Transition
   The process whereby routers converge on a new topology. In conventional networks, this process frequently causes some disruption to packet delivery.

RPF
   Reverse Path Forwarding, i.e., checking that a packet is received over the interface that would be used to send packets addressed to the source address of the packet.

S
   Used to denote a router that is the source of a repair that is computed in anticipation of the failure of a neighboring router denoted as E, or of the link between S and E. It is the viewpoint from which IP fast-reroute is described.

SPF
   Shortest Path First, e.g., Dijkstra's algorithm.

SPT
   Shortest path tree.

Upstream Forwarding Loop
   A forwarding loop that involves a set of routers, none of which is directly connected to the link that has caused the topology change that triggered a new SPF in any of the routers.
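
The LFA and Downstream Path conditions defined in this section are simple inequalities over Distance_opt values, so they can be checked mechanically. The following Python fragment is a minimal sketch of such a check. The router names N1, N2, and N3, the distance values, and the function names are all invented for illustration and do not come from this document; it also assumes the caller passes only neighbors that are not the primary neighbor E.

```python
# Hypothetical Distance_opt values (metric sums of shortest paths),
# keyed by (from, to). The numbers are invented for illustration.
DISTANCE_OPT = {
    ("S", "D"): 2,
    ("N1", "S"): 1, ("N1", "D"): 2,
    ("N2", "S"): 5, ("N2", "D"): 7,
    ("N3", "S"): 3, ("N3", "D"): 1,
}


def is_loop_free_alternate(dist, n, s, d):
    """LFA: Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)."""
    return dist[(n, d)] < dist[(n, s)] + dist[(s, d)]


def is_downstream_path(dist, n, s, d):
    """Downstream Path: Distance_opt(N, D) < Distance_opt(S, D)."""
    return dist[(n, d)] < dist[(s, d)]


def classify_neighbors(dist, neighbors, s, d):
    """Label each non-primary neighbor of S for destination D."""
    labels = {}
    for n in neighbors:
        if not is_loop_free_alternate(dist, n, s, d):
            labels[n] = "not loop-free"
        elif is_downstream_path(dist, n, s, d):
            labels[n] = "downstream LFA"
        else:
            labels[n] = "LFA, not downstream"
    return labels


LABELS = classify_neighbors(DISTANCE_OPT, ["N1", "N2", "N3"], "S", "D")
```

With these assumed distances, N1 satisfies the LFA inequality but is no closer to D than S is, N2 fails the LFA inequality because its shortest path to D returns through S, and N3 satisfies both conditions.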

3. Scope and Applicability

The initial scope of this work is in the context of link state IGPs. Link state protocols provide ubiquitous topology information, which facilitates the computation of repair paths.

Provision of similar facilities in non-link state IGPs and BGP is a matter for further study, but the correct operation of the repair mechanisms for traffic with a destination outside the IGP domain is an important consideration for solutions based on this framework.

Complete protection against multiple unrelated failures is out of scope of this work.

4. Problem Analysis

The duration of the packet delivery disruption caused by a conventional routing transition is determined by a number of factors:

1. The time taken to detect the failure. This may be of the order of a few milliseconds when it can be detected at the physical layer, up to several tens of seconds when a routing protocol Hello is employed. During this period, packets will be unavoidably lost.

2. The time taken for the local router to react to the failure. This will typically involve generating and flooding new routing updates, perhaps after some hold-down delay, and re-computing the router's FIB.

3. The time taken to pass the information about the failure to other routers in the network. In the absence of routing protocol packet loss, this is typically between 10 milliseconds and 100 milliseconds per hop.

4. The time taken to re-compute the forwarding tables. This is typically a few milliseconds for a link state protocol using Dijkstra's algorithm.

5. The time taken to load the revised forwarding tables into the forwarding hardware. This time is very implementation dependent and also depends on the number of prefixes affected by the failure, but may be several hundred milliseconds.

The disruption will last until the routers adjacent to the failure have completed steps 1 and 2, and until all the routers in the network whose paths are affected by the failure have completed the remaining steps.
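
As a rough illustration of how the five contributions above combine, the following Python sketch adds them up for one remote router and compares the result with the disruption expected when a pre-computed repair is invoked locally. Every number is a hypothetical value chosen from within the ranges quoted above, the class and method names are invented, and the simple sum ignores any overlap between steps, so it should be read as an estimate under stated assumptions rather than a model of any real network.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Per-step contributions to disruption time, in milliseconds."""
    detection_ms: int          # step 1: detect the failure
    local_reaction_ms: int     # step 2: generate and flood updates
    flooding_per_hop_ms: int   # step 3: per-hop propagation delay
    hops: int                  # hops from the failure to the router
    spf_ms: int                # step 4: re-compute forwarding tables
    fib_load_ms: int           # step 5: load tables into hardware

    def conventional_ms(self):
        """Disruption seen via a router 'hops' away, steps 1 to 5."""
        return (self.detection_ms
                + self.local_reaction_ms
                + self.flooding_per_hop_ms * self.hops
                + self.spf_ms
                + self.fib_load_ms)

    def with_local_repair_ms(self, repair_invocation_ms):
        """Disruption when a pre-computed repair path is invoked."""
        return self.detection_ms + repair_invocation_ms


# Hypothetical figures, not measurements.
BUDGET = ConvergenceBudget(
    detection_ms=20, local_reaction_ms=50, flooding_per_hop_ms=50,
    hops=5, spf_ms=5, fib_load_ms=300)

CONVENTIONAL = BUDGET.conventional_ms()          # 20+50+250+5+300
REPAIRED = BUDGET.with_local_repair_ms(5)        # 20+5
```

Under these assumptions the conventional transition costs 625 milliseconds while the locally repaired case costs 25 milliseconds, which shows why the detection time becomes the dominant term once repair paths are available.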

The initial packet loss is caused by the router(s) adjacent to the failure continuing to attempt to transmit packets across the failure until it is detected. This loss is unavoidable, but the detection time can be reduced to a few tens of milliseconds as described in Section 5.1.

In some topologies, subsequent packet loss may be caused by the "micro-loops" which may form as a result of temporary inconsistencies between routers' forwarding tables [RFC5715]. These inconsistencies are caused by steps 3, 4, and 5 above, and in many routers it is step 5 that is both the largest factor and that has the greatest variance between routers. The large variance arises from implementation differences and from the differing impact that a failure has on each individual router. For example, the number of prefixes affected by the failure may vary dramatically from one router to another.

In order to reduce packet disruption times to a duration commensurate with the failure detection times, two mechanisms may be required:

a. A mechanism for the router(s) adjacent to the failure to rapidly invoke a repair path, which is unaffected by any subsequent re-convergence.

b. In topologies that are susceptible to micro-loops, a micro-loop control mechanism may be required [RFC5715].

Performing the first task without the second may result in the repair path being starved of traffic and hence being redundant. Performing the second without the first will result in traffic being discarded by the router(s) adjacent to the failure.

Repair paths may always be used in isolation where the failure is short-lived. In this case, the repair paths can be kept in place until the failure is repaired, therefore there is no need to advertise the failure to other routers.

Similarly, micro-loop avoidance may be used in isolation to prevent loops arising from pre-planned management action. In that case, the link or node being shut down can remain in service for a short time after its removal has been announced into the network, and hence it can function as its own "repair path".

Note that micro-loops may also occur when a link or node is restored to service, and thus a micro-loop avoidance mechanism may be required for both link up and link down cases.

5. Mechanisms for IP Fast-Reroute

The set of mechanisms required for an effective solution to the problem can be broken down into the sub-problems described in this section.

5.1. Mechanisms for Fast Failure Detection

It is critical that the failure detection time is minimized. A number of well-documented approaches are possible, such as:

1. Physical detection; for example, loss of light.

2. Protocol detection that is routing protocol independent; for example, the Bidirectional Forwarding Detection protocol [BFD].

3. Routing protocol detection; for example, use of "fast Hellos".

When configuring packet-based failure detection mechanisms it is important that consideration be given to the likelihood and consequences of false indications of failure. The incidence of false indication of failure may be minimized by appropriately prioritizing the transmission, reception, and processing of the packets used to detect link or node failure. Note that this is not an issue that is specific to IPFRR.

5.2. Mechanisms for Repair Paths

Once a failure has been detected by one of the above mechanisms, traffic that previously traversed the failure is transmitted over one or more repair paths. The design of the repair paths should be such that they can be pre-calculated in anticipation of each local failure and made available for invocation with minimal delay. There are three basic categories of repair paths:

1. Equal cost multi-paths (ECMP). Where such paths exist, and one or more of the alternate paths do not traverse the failure, they may trivially be used as repair paths.

2. Loop-free alternate paths. Such a path exists when a direct neighbor of the router adjacent to the failure has a path to the destination that can be guaranteed not to traverse the failure.

3. Multi-hop repair paths. When there is no feasible loop-free alternate path it may still be possible to locate a router, which is more than one hop away from the router adjacent to the failure, from which traffic will be forwarded to the destination without traversing the failure.

ECMP and loop-free alternate paths (as described in [RFC5286]) offer the simplest repair paths and would normally be used when they are available. It is anticipated that around 80% of failures (see Section 5.2.2) can be repaired using these basic methods alone.
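
The choice between the three categories of repair path listed above can be sketched for a single link failure as follows. This is an illustrative Python fragment only: the six-router topology, its unit metrics, and the function names are assumptions made for the example, it considers link protection but not node protection, and the "multi-hop" result means only that neither basic method applies, not that a multi-hop repair is known to exist.

```python
import heapq


def spf(graph, source):
    """Dijkstra's algorithm: metric sum of the shortest path to each node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry
        for neighbor, metric in graph[node].items():
            candidate = cost + metric
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def classify_repair(graph, s, e, d):
    """Basic repair category at S for destination D if Link(S->E) fails."""
    dist = {node: spf(graph, node) for node in graph}
    # Primary neighbors: next hops of S that lie on a shortest path to D.
    primaries = [n for n, metric in graph[s].items()
                 if metric + dist[n][d] == dist[s][d]]
    if e not in primaries:
        return "unaffected"
    if len(primaries) > 1:
        return "ecmp"
    for n in graph[s]:
        # LFA condition from the Terminology section.
        if n != e and dist[n][d] < dist[n][s] + dist[s][d]:
            return "lfa"
    return "multi-hop"


# Hypothetical topology: a five-router ring S-E-D-B-A plus router X
# attached to S and D. All metrics are 1.
TOPOLOGY = {
    "S": {"E": 1, "A": 1, "X": 1},
    "E": {"S": 1, "D": 1},
    "D": {"E": 1, "B": 1, "X": 1},
    "B": {"D": 1, "A": 1},
    "A": {"B": 1, "S": 1},
    "X": {"S": 1, "D": 1},
}
```

In this assumed topology, traffic from S to D is covered by ECMP when Link(S->E) fails, traffic from S to B is covered by a loop-free alternate via E when Link(S->A) fails, and traffic from S to E itself has neither when Link(S->E) fails, because every other neighbor of S reaches E at least as cheaply by returning through S.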

Multi-hop repair paths are more complex, both in the computations required to determine their existence, and in the mechanisms required to invoke them. They can be further classified as:

a. Mechanisms where one or more alternate FIBs are pre-computed in all routers, and the repaired packet is instructed to be forwarded using a "repair FIB" by some method of per-packet signaling such as detecting a "U-turn" [UTURN], [FIFR] or by marking the packet [SIMULA].

b. Mechanisms functionally equivalent to a loose source route that is invoked using the normal FIB. These include tunnels [TUNNELS], alternative shortest paths [ALT-SP], and label-based mechanisms.

c. Mechanisms employing special addresses or labels that are installed in the FIBs of all routers with routes pre-computed to avoid certain components of the network. For example, see [NOTVIA].

In many cases, a repair path that reaches two hops away from the router detecting the failure will suffice, and it is anticipated that around 98% of failures (see Section 5.2.2) can be repaired by this method. However, to provide complete repair coverage, some use of longer multi-hop repair paths is generally necessary.

5.2.1. Scope of Repair Paths

A particular repair path may be valid for all destinations that require repair or may only be valid for a subset of destinations. If a repair path is valid for a node immediately downstream of the failure, then it will be valid for all destinations previously reachable by traversing the failure. However, in cases where such a repair path is difficult to achieve because it requires a high-order multi-hop repair path, it may still be possible to identify lower-order repair paths (possibly even loop-free alternate paths) that allow the majority of destinations to be repaired. When IPFRR is unable to provide complete repair, it is desirable that the extent of the repair coverage can be determined and reported via network management.

There is a trade-off between minimizing the number of repair paths to be computed, and minimizing the overheads incurred in using higher-order multi-hop repair paths for destinations for which they are not strictly necessary. However, the computational cost of determining repair paths on an individual destination basis can be very high.

It will frequently be the case that the majority of destinations may be repaired using only the "basic" repair mechanism, leaving a smaller subset of the destinations to be repaired using one of the more complex multi-hop methods. Such a hybrid approach may go some way to resolving the conflict between completeness and complexity.

The use of repair paths may result in excessive traffic passing over a link, resulting in congestion discard. This reduces the effectiveness of IPFRR. Mechanisms to influence the distribution of repaired traffic to minimize this effect are therefore desirable.

5.2.2. Analysis of Repair Coverage

The repair coverage obtained is dependent on the repair strategy and highly dependent on the detailed topology and metrics. Estimates of the repair coverage quoted in this document are for illustrative purposes only and may not always be achievable.

In some cases the repair strategy will permit the repair of all single link or node failures in the network for all possible destinations. This can be defined as 100% coverage. However, where the coverage is less than 100%, it is important for the purposes of comparisons between different proposed repair strategies to define what is meant by such a percentage. There are four possibilities:

1. The percentage of links (or nodes) that can be fully protected (i.e., for all destinations). This is appropriate where the requirement is to protect all traffic, but some percentage of the possible failures may be identified as being un-protectable.

2. The percentage of destinations that can be protected for all link (or node) failures. This is appropriate where the requirement is to protect against all possible failures, but some percentage of destinations may be identified as being un-protectable.

3. For all destinations (d) and for all failures (f), the percentage of the total potential failure cases (d*f) that are protected. This is appropriate where the requirement is an overall "best-effort" protection.

4. The percentage of packets normally passing through the network that will continue to reach their destination. This requires a traffic matrix for the network as part of the analysis.
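
The difference between the first three definitions can be seen by computing each of them from the same data. The Python sketch below assumes a hypothetical table, here called PROTECTED, that records for every failure and destination whether a repair exists; the table contents and all names are invented for illustration. The fourth definition is not computed because it needs a traffic matrix, which this example does not assume.

```python
def coverage_metrics(protected):
    """Coverage as fractions; protected[f][d] is True if d is repaired for f."""
    failures = list(protected)
    destinations = list(protected[failures[0]])
    # Definition 1: failures that are protected for all destinations.
    fully_protected_failures = sum(
        all(protected[f][d] for d in destinations) for f in failures)
    # Definition 2: destinations that are protected for all failures.
    always_protected_destinations = sum(
        all(protected[f][d] for f in failures) for d in destinations)
    # Definition 3: protected share of all (d * f) failure cases.
    protected_cases = sum(
        protected[f][d] for f in failures for d in destinations)
    return {
        "by_failure": fully_protected_failures / len(failures),
        "by_destination": always_protected_destinations / len(destinations),
        "by_case": protected_cases / (len(failures) * len(destinations)),
    }


# Hypothetical results for 4 link failures and 5 destinations.
PROTECTED = {
    "link1": {"D1": True, "D2": True, "D3": True, "D4": True, "D5": True},
    "link2": {"D1": True, "D2": True, "D3": True, "D4": True, "D5": True},
    "link3": {"D1": True, "D2": True, "D3": True, "D4": True, "D5": False},
    "link4": {"D1": True, "D2": True, "D3": True, "D4": False, "D5": False},
}

COVERAGE = coverage_metrics(PROTECTED)
```

For this invented table the same repair strategy scores 50% by the first definition (2 of 4 links), 60% by the second (3 of 5 destinations), and 85% by the third (17 of 20 cases), which is why a quoted coverage figure is only meaningful together with its definition.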

5.2.3. Link or Node Repair

A repair path may be computed to protect against failure of an adjacent link, or failure of an adjacent node. In general, link protection is simpler to achieve. A repair which protects against node failure will also protect against link failure for all destinations except those for which the adjacent node is a single point of failure.

In some cases, it may be necessary to distinguish between a link or node failure in order that the optimal repair strategy is invoked. Methods for link/node failure determination may be based on techniques such as BFD [BFD]. This determination may be made prior to invoking any repairs, but this will increase the period of packet loss following a failure unless the determination can be performed as part of the failure detection mechanism itself. Alternatively, a subsequent determination can be used to optimize an already invoked default strategy.

5.2.4. Maintenance of Repair Paths

In order to meet the response-time goals, it is expected (though not required) that repair paths, and their associated FIB entries, will be pre-computed and installed ready for invocation when a failure is detected. Following invocation, the repair paths remain in effect until they are no longer required. This will normally be when the routing protocol has re-converged on the new topology taking into account the failure, and traffic will no longer be using the repair paths.

The repair paths have the property that they are unaffected by any topology changes resulting from the failure that caused their instantiation. Therefore, there is no need to re-compute them during the convergence period. They may be affected by an unrelated simultaneous topology change, but such events are out of scope of this work (see Section 5.2.6).

Once the routing protocol has re-converged, it is necessary for all repair paths to take account of the new topology. Various optimizations may permit the efficient identification of repair paths that are unaffected by the change, and hence do not require full re-computation. Since the new repair paths will not be required until the next failure occurs, the re-computation may be performed as a background task and be subject to a hold-down, but excessive delay in completing this operation will increase the risk of a new failure occurring before the repair paths are in place.

5.2.5. Local Area Networks

Protection against partial or complete failure of LANs is more complex than the point-to-point case. In general, there is a trade-off between the simplicity of the repair and the ability to provide complete and optimal repair coverage.

5.2.6. Multiple Failures and Shared Risk Link Groups

Complete protection against multiple unrelated failures is out of scope of this work. However, it is important that the occurrence of a second failure while one failure is undergoing repair should not result in a level of service which is significantly worse than that which would have been achieved in the absence of any repair strategy.

Shared Risk Link Groups (SRLGs) are an example of multiple related failures, and the more complex aspects of their protection are a matter for further study.

One specific example of an SRLG that is clearly within the scope of this work is a node failure. This causes the simultaneous failure of multiple links, but their closely defined topological relationship makes the problem more tractable.

5.3. Mechanisms for Micro-Loop Prevention

Ensuring the absence of micro-loops is important not only because they can cause packet loss in traffic that is affected by the failure, but because by saturating a link with looping packets micro-loops can cause congestion. This congestion can then lead to routers discarding traffic that would otherwise be unaffected by the failure.
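
How a micro-loop forms from inconsistent forwarding tables can be shown with a small simulation. The Python sketch below assumes a hypothetical chain C - A - B - D with a high-metric backup link C - D, in which Link(B->D) fails and the routers update their FIBs one at a time in the order B, A, C. The topology, the update order, and the function name are assumptions made for this example only.

```python
def forward(fibs, start, dest, ttl=8):
    """Follow per-router next hops; return (path, looped)."""
    path = [start]
    node = start
    while node != dest and ttl > 0:
        node = fibs[node][dest]
        ttl -= 1
        if node in path:
            # The packet has returned to a router it already visited.
            return path + [node], True
        path.append(node)
    return path, False


# Stage 1: B (adjacent to the failure) has updated and now sends
# traffic for D back to A, but A still uses its old next hop B.
STAGE_1 = {"A": {"D": "B"}, "B": {"D": "A"}, "C": {"D": "A"}}

# Stage 2: A has updated and sends to C, but C still points at A.
STAGE_2 = {"A": {"D": "C"}, "B": {"D": "A"}, "C": {"D": "A"}}

# Converged: C has updated and uses its direct link to D.
CONVERGED = {"A": {"D": "C"}, "B": {"D": "A"}, "C": {"D": "D"}}

LOOP_1 = forward(STAGE_1, "A", "D")
LOOP_2 = forward(STAGE_2, "B", "D")
FINAL = forward(CONVERGED, "B", "D")
```

In the first two stages the packet loops (A, B, A and then B, A, C, A), and only the converged tables deliver it along B, A, C, D. Each loop persists for as long as the neighboring routers' tables disagree, which is the interval a micro-loop prevention mechanism aims to control.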

A number of solutions to the problem of micro-loop formation have been proposed and are summarized in [RFC5715]. The following factors are significant in their classification:

1. Partial or complete protection against micro-loops.

2. Convergence delay.

3. Tolerance of multiple failures (from node failures, and in general).

4. Computational complexity (pre-computed or real time).

5. Applicability to scheduled events.

6. Applicability to link/node reinstatement.

7. Topological constraints.

6. Management Considerations

While many of the management requirements will be specific to particular IPFRR solutions, the following general aspects need to be addressed:

1. Configuration

1. 配置

A. Enabling/disabling IPFRR support.

B. Enabling/disabling protection on a per-link or per-node basis.

C. Expressing preferences regarding the links/nodes used for repair paths.

D. Configuration of failure detection mechanisms.

E. Configuration of loop-avoidance strategies.

2. Monitoring and operational support

A. Notification of links/nodes/destinations that cannot be protected.

B. Notification of pre-computed repair paths, and anticipated traffic patterns.

C. Counts of failure detections, protection invocations, and packets forwarded over repair paths.

D. Testing repairs.
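
The items above can be pictured as a small management data model. The following Python sketch is only one possible shape for it: every class, field, and function name is hypothetical, and the default values (such as "bfd") are placeholders rather than recommendations from this framework.

```python
from dataclasses import dataclass, field


@dataclass
class IpfrrConfig:
    enabled: bool = False                                    # item 1.A
    protected_links: set = field(default_factory=set)        # item 1.B
    protected_nodes: set = field(default_factory=set)        # item 1.B
    avoid_for_repair: set = field(default_factory=set)       # item 1.C
    failure_detection: str = "bfd"                           # item 1.D (placeholder)
    loop_avoidance: str = "none"                             # item 1.E (placeholder)


@dataclass
class IpfrrCounters:                                         # item 2.C
    failures_detected: int = 0
    protection_invocations: int = 0
    packets_repaired: int = 0

    def record_repair(self, packets: int) -> None:
        """Count one detected failure, one invocation, and the packets repaired."""
        self.failures_detected += 1
        self.protection_invocations += 1
        self.packets_repaired += packets


def unprotectable(cfg: IpfrrConfig, repair_paths: dict) -> list:
    """Links with protection enabled but no computed repair path (item 2.A)."""
    if not cfg.enabled:
        return []
    return sorted(l for l in cfg.protected_links if not repair_paths.get(l))


cfg = IpfrrConfig(enabled=True, protected_links={"A-B", "B-C"})
paths = {"A-B": ["A", "C", "B"]}      # no repair path was found for B-C
print(unprotectable(cfg, paths))      # ['B-C']
```

A real implementation would expose these through whatever management interface the platform provides; the sketch only shows how the configuration and monitoring items relate to each other.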

7. Security Considerations

This framework document does not itself introduce any security issues, but attention must be paid to the security implications of any proposed solutions to the problem.

Where the chosen solution uses tunnels it is necessary to ensure that the tunnel is not used as an attack vector. One method of addressing this is to use a set of tunnel endpoint addresses that are excluded from use by user traffic.
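
One way to picture the address-exclusion method is as a filter applied to traffic entering from user-facing interfaces. The Python sketch below is an assumption-laden illustration: the reserved block 192.0.2.0/24 (a documentation prefix) and the function name admit_user_packet are hypothetical, and a real deployment would apply the equivalent rule in the forwarding plane.

```python
import ipaddress

# Hypothetical block set aside for repair-tunnel endpoints only.
TUNNEL_ENDPOINTS = ipaddress.ip_network("192.0.2.0/24")


def admit_user_packet(dst: str) -> bool:
    """Reject user traffic addressed to a reserved tunnel endpoint."""
    return ipaddress.ip_address(dst) not in TUNNEL_ENDPOINTS


print(admit_user_packet("192.0.2.7"))     # False: targets a tunnel endpoint
print(admit_user_packet("198.51.100.1"))  # True: ordinary destination
```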

There is a compatibility issue between IPFRR and reverse path forwarding (RPF) checking. Many of the solutions described in this document result in traffic arriving from a direction inconsistent with a standard RPF check. When a network relies on RPF checking for security purposes, an alternative security mechanism will need to be deployed in order to permit IPFRR to be used.
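
The conflict can be seen in a minimal Python sketch of a strict RPF check, which accepts a packet only if the route back to its source uses the interface on which the packet arrived. The router, the interface names, the prefix, and the function name are all hypothetical.

```python
def strict_rpf_accept(routes, src_prefix, in_iface):
    """Strict RPF: accept only if the route to the source uses the arrival interface."""
    return routes.get(src_prefix) == in_iface


# Hypothetical router C: its best route toward the source prefix points at neighbor A.
routes_at_c = {"10.0.0.0/24": "if_to_A"}

# Pre-failure, the traffic arrives from A and passes the check.
normal = strict_rpf_accept(routes_at_c, "10.0.0.0/24", "if_to_A")

# During repair, the same traffic arrives from B and is rejected.
repaired = strict_rpf_accept(routes_at_c, "10.0.0.0/24", "if_to_B")

print(normal, repaired)  # True False
```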

Because the repair path will often be of a different length than the pre-failure path, security mechanisms that rely on specific Time to Live (TTL) values will be adversely affected.
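
A small Python sketch makes the effect concrete. It assumes a hypothetical policy in which a receiver accepts packets from an adjacent sender only if they arrive with a TTL of 255; the function names, the threshold, and the hop counts are illustrative assumptions.

```python
def ttl_on_arrival(initial_ttl, routers_traversed):
    """Each router that forwards the packet decrements the TTL by one."""
    return initial_ttl - routers_traversed


def ttl_check(arrival_ttl, min_ttl):
    """Accept the packet only if its TTL is at least the configured minimum."""
    return arrival_ttl >= min_ttl


MIN_TTL = 255  # hypothetical policy: the sender must be directly connected

# Pre-failure: the sender is adjacent, so no router decrements the TTL.
direct = ttl_check(ttl_on_arrival(255, 0), MIN_TTL)

# During repair: the packet detours through two routers and arrives with TTL 253.
via_repair = ttl_check(ttl_on_arrival(255, 2), MIN_TTL)

print(direct, via_repair)  # True False
```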

8. Acknowledgements

The authors would like to acknowledge contributions made by Alia Atlas, Clarence Filsfils, Pierre Francois, Joel Halpern, Stefano Previdi, and Alex Zinin.

9. Informative References

[ALT-SP] Tian, A., "Fast Reroute using Alternative Shortest Paths", Work in Progress, July 2004.

[BFD] Katz, D. and D. Ward, "Bidirectional Forwarding Detection", Work in Progress, January 2010.

[FIFR] Nelakuditi, S., Lee, S., Lu, Y., Zhang, Z., and C. Chuah, "Fast Local Rerouting for Handling Transient Link Failures", IEEE/ACM Transactions on Networking, Vol. 15, No. 2, DOI 10.1109/TNET.2007.892851, available from http://www.ieeexplore.ieee.org, April 2007.

[NOTVIA] Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute Using Not-via Addresses", Work in Progress, July 2009.

[RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090, May 2005.

[RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast Reroute: Loop-Free Alternates", RFC 5286, September 2008.

[RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free Convergence", RFC 5715, January 2010.

[SIMULA] Kvalbein, A., Hansen, A., Cicic, T., Gjessing, S., and O. Lysne, "Fast IP Network Recovery using Multiple Routing Configurations", Infocom, DOI 10.1109/INFOCOM.2006.227, available from http://www.ieeexplore.ieee.org, April 2006.

[TUNNELS] Bryant, S., Filsfils, C., Previdi, S., and M. Shand, "IP Fast Reroute using tunnels", Work in Progress, November 2007.

[UTURN] Atlas, A., "U-turn Alternates for IP/LDP Fast-Reroute", Work in Progress, February 2006.

Authors' Addresses

Mike Shand
Cisco Systems
250, Longwater Avenue.
Reading, Berks RG2 6GB
UK

   EMail: mshand@cisco.com

Stewart Bryant
Cisco Systems
250, Longwater Avenue.
Reading, Berks RG2 6GB
UK

   EMail: stbryant@cisco.com