RFC 3386: Network Hierarchy and Multilayer Survivability 中文翻译

URL : https://datatracker.ietf.org/doc/html/rfc3386
标题 : RFC 3386
翻译类型 : 自动生成

Network Working Group                                        W. Lai, Ed.
Request for Comments: 3386                                          AT&T
Category: Informational                                  D. McDysan, Ed.
                                                                WorldCom
                                                           November 2002

Network Hierarchy and Multilayer Survivability

网络层次结构与多层生存性

Status of this Memo

本备忘录的状况

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

本备忘录为互联网社区提供信息。它没有规定任何类型的互联网标准。本备忘录的分发不受限制。

版权公告

Abstract

摘要

This document presents a proposal of the near-term and practical requirements for network survivability and hierarchy in current service provider environments.

本文档提出了当前服务提供商环境中网络生存性和层次结构的近期和实际需求建议。

Conventions used in this document

本文件中使用的约定

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [2].

本文件中的关键词“必须（MUST）”、“不得（MUST NOT）”、“要求（REQUIRED）”、“应当（SHALL）”、“不应当（SHALL NOT）”、“应该（SHOULD）”、“不应该（SHOULD NOT）”、“推荐（RECOMMENDED）”、“可以（MAY）”和“可选（OPTIONAL）”应按照BCP 14、RFC 2119 [2] 中的描述进行解释。

Table of Contents

   1. Introduction
   2. Terminology and Concepts
   2.1 Hierarchy
   2.1.1 Vertical Hierarchy
   2.1.2 Horizontal Hierarchy
   2.2 Survivability Terminology
   2.2.1 Survivability
   2.2.2 Generic Operations
   2.2.3 Survivability Techniques
   2.2.4 Survivability Performance
   2.3 Survivability Mechanisms: Comparison
   3. Survivability
   3.1 Scope
   3.2 Required initial set of survivability mechanisms
   3.2.1 1:1 Path Protection with Pre-Established Capacity
   3.2.2 1:1 Path Protection with Pre-Planned Capacity
   3.2.3 Local Restoration
   3.2.4 Path Restoration
   3.3 Applications Supported
   3.4 Timing Bounds for Survivability Mechanisms
   3.5 Coordination Among Layers
   3.6 Evolution Toward IP Over Optical
   4. Hierarchy Requirements
   4.1 Historical Context
   4.2 Applications for Horizontal Hierarchy
   4.3 Horizontal Hierarchy Requirements
   5. Survivability and Hierarchy
   6. Security Considerations
   7. References
   8. Acknowledgments
   9. Contributing Authors
   Appendix A: Questions used to help develop requirements
   Editors' Addresses
   Full Copyright Statement

1. Introduction

1. 介绍

This document is the result of the Network Hierarchy and Survivability Techniques Design Team established within the Traffic Engineering Working Group. This team collected and documented current and near-term requirements for survivability and hierarchy in service provider environments. For clarity, an expanded set of definitions is included. The team determined that there appears to be a need to define a small set of interoperable survivability approaches in packet and non-packet networks. Suggested approaches include path-based approaches as well as one that repairs connections in proximity to the network fault. They operate primarily at a single network layer. For hierarchy, there did not appear to be a driving near-term need for work on "vertical hierarchy," defined as communication between network layers such as Time Division Multiplexed (TDM)/optical and Multi-Protocol Label Switching (MPLS). In particular, instead of direct exchange of signaling and routing between vertical layers, some looser form of coordination and communication, such as the specification of hold-off timers, is a nearer-term need. For "horizontal hierarchy" in data networks, there are several pressing needs. The requirement is to be able to set up many Label Switched Paths (LSPs) in a service provider network with hierarchical Interior Gateway Protocol (IGP). This is necessary to support layer 2 and layer 3 Virtual Private Network (VPN) services that require edge-to-edge signaling across a core network.

本文件是流量工程工作组内成立的网络层次结构和生存性技术设计团队的成果。该团队收集并记录了服务提供商环境中生存性和层次结构的当前和近期需求。为清楚起见，本文包含了一组扩展的定义。该团队认为，似乎有必要在分组和非分组网络中定义一小组可互操作的生存性方法。建议的方法包括基于路径的方法，以及在网络故障附近修复连接的方法。它们主要在单个网络层上运行。对于层次结构，近期内似乎没有推动“垂直层次结构”工作的迫切需求；垂直层次结构定义为网络层之间的通信，例如时分复用（TDM）/光层与多协议标签交换（MPLS）之间的通信。特别是，相比在垂直层之间直接交换信令和路由，某种更松散的协调和通信形式（例如规定延迟（hold-off）定时器）是更近期的需求。对于数据网络中的“水平层次结构”，存在几项迫切需求：要求能够在采用分层内部网关协议（IGP）的服务提供商网络中建立大量标签交换路径（LSP）。这对于支持需要跨核心网络进行边到边信令的第2层和第3层虚拟专用网（VPN）服务是必要的。

This document presents a proposal of the near-term and practical requirements for network survivability and hierarchy in current service provider environments. With feedback from the working group solicited, the objective is to help focus the work that is being addressed in the TEWG (Traffic Engineering Working Group), CCAMP (Common Control and Measurement Plane Working Group), and other working groups. A main goal of this work is to provide some expedience for required functionality in multi-vendor service provider networks. The initial focus is primarily on intra-domain operations. However, to maintain consistency in the provision of end-to-end service in a multi-provider environment, rules governing the operations of survivability mechanisms at domain boundaries must also be specified. While such issues are raised and discussed, where appropriate, they will not be treated in depth in the initial release of this document.

本文档提出了当前服务提供商环境中网络生存性和层次结构的近期实际需求建议。通过征求工作组的反馈意见，其目标是帮助聚焦TEWG（流量工程工作组）、CCAMP（通用控制和测量平面工作组）以及其他工作组正在开展的工作。这项工作的一个主要目标是为多供应商服务提供商网络中所需的功能提供一些便利。最初的重点主要是域内操作。然而，为了在多提供商环境中保持端到端服务提供的一致性，还必须规定在域边界处管理生存性机制操作的规则。虽然在适当的地方提出并讨论了这些问题，但本文件的初始版本不会深入讨论它们。

The document first develops a set of definitions to be used later in this document and potentially in other documents as well. It then addresses the requirements and issues associated with service restoration and hierarchy, and finally presents a short discussion of survivability in a hierarchical context.

本文件首先制定了一套定义，将在本文件后面以及其他文件中使用。然后讨论与服务恢复、层次结构相关的需求和问题，最后简要讨论层次结构环境中的生存性。

Here is a summary of the findings:

以下是调查结果的摘要：

A. Survivability Requirements

A.生存能力要求

o  need to define a small set of interoperable survivability approaches in packet and non-packet networks
o  suggested survivability mechanisms include
   -  1:1 path protection with pre-established backup capacity (non-shared)
   -  1:1 path protection with pre-planned backup capacity (shared)
   -  local restoration with repairs in proximity to the network fault
   -  path restoration through source-based rerouting
o  timing bounds for service restoration to support voice call cutoff (140 msec to 2 sec), protocol timer requirements in premium data services, and mission critical applications
o  use of restoration priority for service differentiation

o  需要在分组和非分组网络中定义一小组可互操作的生存性方法
o  建议的生存性机制包括
   -  采用预先建立的备份容量（非共享）的1:1路径保护
   -  采用预先规划的备份容量（共享）的1:1路径保护
   -  在网络故障附近进行修复的本地恢复
   -  通过基于源的重路由实现的路径恢复
o  服务恢复的时间界限，以满足语音呼叫切断（140毫秒至2秒）、高级数据服务中的协议定时器以及任务关键型应用的要求
o  使用恢复优先级实现服务差异化

B. Hierarchy Requirements

B. 层次结构要求

B.1. Horizontally Oriented Hierarchy (Intra-Domain)

B.1. 水平方向的层次结构（域内）

o  ability to set up many LSPs in a service provider network with hierarchical IGP, for the support of layer 2 and layer 3 VPN services
o  requirements for multi-area traffic engineering need to be developed to provide guidance for any necessary protocol extensions

o  能够在采用分层IGP的服务提供商网络中建立大量LSP，以支持第2层和第3层VPN服务
o  需要制定多区域流量工程的需求，为任何必要的协议扩展提供指导

B.2. Vertically Oriented Hierarchy

B.2. 垂直方向的层次结构

The following functionality for survivability is common on most routing equipment today.

以下生存能力功能在当今大多数路由设备上都很常见。

o  near-term need is some loose form of coordination and communication based on the use of nested hold-off timers, instead of direct exchange of signaling and routing between vertical layers
o  means for an upper layer to immediately begin recovery actions in the event that a lower layer is not configured to perform recovery

o  近期需求是基于嵌套延迟定时器的某种松散形式的协调和通信，而不是在垂直层之间直接交换信令和路由
o  提供一种手段，使上层能够在下层未配置为执行恢复时立即开始恢复操作

C. Survivability Requirements in Horizontal Hierarchy

C.水平层次结构中的生存能力要求

o  protection of end-to-end connection is based on a concatenated set of connections, each protected within their area
o  mechanisms for connection routing may include (1) a network element that participates on both sides of a boundary (e.g., OSPF ABR; note that this is a common point of failure); (2) a route server
o  need for inter-area signaling of survivability information (1) to enable a "least common denominator" survivability mechanism at the boundary; (2) to convey the success or failure of the service restoration action; e.g., if a part of a "connection" is down on one side of a boundary, there is no need for the other side to recover from failures

o  端到端连接的保护基于一组串联的连接，每个连接在其所在区域内受到保护
o  用于连接路由的机制可包括：（1）同时参与边界两侧的网元（例如OSPF ABR；注意这是一个公共故障点）；（2）路由服务器
o  需要在区域间传递生存性信息的信令：（1）以便在边界处启用“最低共同标准”的生存性机制；（2）以便传达服务恢复操作的成功或失败；例如，如果“连接”的某一部分在边界一侧已中断，则另一侧无需进行故障恢复

2. Terminology and Concepts

2. 术语和概念

2.1 Hierarchy

2.1 层次结构

Hierarchy is a technique used to build scalable complex systems. It is based on an abstraction, at each level, of what is most significant from the details and internal structures of the levels further away. This approach makes use of a general property of all hierarchical systems composed of related subsystems: interactions between subsystems decrease as the level of communication between subsystems decreases.

层次结构是一种用于构建可扩展复杂系统的技术。它基于这样一种抽象：在每一层次上，从更远层次的细节和内部结构中提炼出最重要的内容。该方法利用了由相关子系统组成的所有层次系统的一个一般特性，即子系统之间的交互随着子系统之间通信层级的降低而减少。

Network hierarchy is an abstraction of part of a network's topology, routing and signaling mechanisms. Abstraction may be used as a mechanism to build large networks or as a technique for enforcing administrative, topological, or geographic boundaries. For example, network hierarchy might be used to separate the metropolitan and long-haul regions of a network, or to separate the regional and backbone sections of a network, or to interconnect service provider networks (with BGP which reduces a network to an Autonomous System).

网络层次结构是网络拓扑、路由和信令机制的一部分的抽象。抽象可以用作构建大型网络的机制，也可以用作强制执行管理、拓扑或地理边界的技术。例如，网络层次结构可用于分离网络的城域和长途区域，或分离网络的区域和骨干部分，或互连服务提供商网络（使用BGP将网络简化为自治系统）。

In this document, network hierarchy is considered from two perspectives:

本文件从两个角度考虑了网络层次结构：

(1) Vertically oriented: between two network technology layers. (2) Horizontally oriented: between two areas or administrative subdivisions within the same network technology layer.

(1) 垂直方向：在两个网络技术层之间。（2）横向：在同一网络技术层内的两个区域或行政分区之间。

2.1.1 Vertical Hierarchy

2.1.1 垂直层次

Vertical hierarchy is the abstraction, or reduction in information, which would be of benefit when communicating information across network technology layers, as in propagating information between optical and router networks.

垂直层次结构是信息的抽象或减少，在跨网络技术层进行信息通信时，如在光网络和路由器网络之间传播信息时，这将是有益的。

In the vertical hierarchy, the total network functions are partitioned into a series of functional or technological layers with clear logical, and maybe even physical, separation between adjacent layers. Survivability mechanisms either currently exist or are being developed at multiple layers in networks [3]. The optical layer is now becoming capable of providing dynamic ring and mesh restoration functionality, in addition to traditional 1+1 or 1:1 protection. The Synchronous Digital Hierarchy (SDH)/Synchronous Optical NETwork (SONET) layer provides survivability capability with automatic protection switching (APS), as well as self-healing ring and mesh restoration architectures. Similar functionality has been defined in the Asynchronous Transfer Mode (ATM) layer, with work ongoing to also provide such functionality using MPLS [4]. At the IP layer, rerouting is used to restore service continuity following link and node outages. Rerouting at the IP layer, however, occurs after a period of routing convergence, which may require a few seconds to several minutes to complete [5].

在垂直层次结构中，整个网络功能被划分为一系列功能层或技术层，相邻层之间具有清晰的逻辑分离，甚至可能是物理分离。生存性机制目前已存在于或正在被开发于网络的多个层次上[3]。除了传统的1+1或1:1保护之外，光层现在已能够提供动态的环形和网状恢复功能。同步数字体系（SDH）/同步光网络（SONET）层通过自动保护倒换（APS）以及自愈环和网状恢复架构提供生存能力。异步传输模式（ATM）层中已定义了类似的功能，目前也正在开展使用MPLS提供此类功能的工作[4]。在IP层，重路由用于在链路和节点中断后恢复服务连续性。然而，IP层的重路由要在经过一段路由收敛时间后才会发生，这可能需要几秒钟到几分钟才能完成[5]。

2.1.2 Horizontal Hierarchy

2.1.2 水平层次

Horizontal hierarchy is the abstraction that allows a network at one technology layer, for instance a packet network, to scale. Examples of horizontal hierarchy include BGP confederations, separate Autonomous Systems, and multi-area OSPF.

水平层次是允许网络在一个技术层（例如分组网络）进行扩展的抽象。水平层次结构的示例包括BGP联盟、独立自治系统和多区域OSPF。

In the horizontal hierarchy, a large network is partitioned into multiple smaller, non-overlapping sub-networks. The partitioning criteria can be based on topology, network function, administrative policy, or service domain demarcation. Two networks at the *same* hierarchical level, e.g., two Autonomous Systems in BGP, may share a peer relation with each other through some loose form of coupling. On the other hand, for routing in large networks using multi-area OSPF, abstraction through the aggregation of routing information is achieved through a hierarchical partitioning of the network.

在水平层次结构中，一个大型网络被划分为多个较小、不重叠的子网络。分区标准可以基于拓扑、网络功能、管理策略或服务域划分。处于*相同*层级的两个网络，例如BGP中的两个自治系统，可以通过某种松散的耦合形式彼此共享对等关系。另一方面，对于使用多区域OSPF的大型网络中的路由，通过对网络进行分层分区，通过聚合路由信息实现抽象。

2.2 Survivability Terminology

2.2 生存能力术语

In alphabetical order, the following terms are defined in this section:

本节按字母顺序定义了以下术语：

   backup entity, same as protection entity (section 2.2.2)
   extra traffic (section 2.2.2)
   non-revertive mode (section 2.2.2)
   normalization (section 2.2.2)
   preemptable traffic, same as extra traffic (section 2.2.2)
   preemption priority (section 2.2.4)
   protection (section 2.2.3)
   protection entity (section 2.2.2)
   protection switching (section 2.2.3)
   protection switch time (section 2.2.4)
   recovery (section 2.2.2)
   recovery by rerouting, same as restoration (section 2.2.3)
   recovery entity, same as protection entity (section 2.2.2)
   restoration (section 2.2.3)
   restoration priority (section 2.2.4)
   restoration time (section 2.2.4)
   revertive mode (section 2.2.2)
   shared risk group (SRG) (section 2.2.2)
   survivability (section 2.2.1)
   working entity (section 2.2.2)

   备份实体，同保护实体（第2.2.2节）
   额外流量（第2.2.2节）
   非返回模式（第2.2.2节）
   正常化（第2.2.2节）
   可抢占流量，同额外流量（第2.2.2节）
   抢占优先级（第2.2.4节）
   保护（第2.2.3节）
   保护实体（第2.2.2节）
   保护切换（第2.2.3节）
   保护切换时间（第2.2.4节）
   恢复（recovery）（第2.2.2节）
   重路由恢复，同恢复（restoration）（第2.2.3节）
   恢复实体，同保护实体（第2.2.2节）
   恢复（restoration）（第2.2.3节）
   恢复优先级（第2.2.4节）
   恢复时间（第2.2.4节）
   返回模式（第2.2.2节）
   共享风险组（SRG）（第2.2.2节）
   生存性（第2.2.1节）
   工作实体（第2.2.2节）

2.2.1 Survivability

2.2.1 生存能力

Survivability is the capability of a network to maintain service continuity in the presence of faults within the network [6]. Survivability mechanisms such as protection and restoration are implemented either on a per-link basis, on a per-path basis, or throughout an entire network to alleviate service disruption at affordable costs. The degree of survivability is determined by the network's capability to survive single failures, multiple failures, and equipment failures.

生存性是指网络在网络内部出现故障时保持服务连续性的能力[6]。保护和恢复等生存性机制可以按每条链路、每条路径或在整个网络范围内实施，以便以可承受的成本减轻服务中断。生存性的程度取决于网络承受单个故障、多个故障以及设备故障的能力。

2.2.2 Generic Operations

2.2.2 一般操作

This document does not discuss the sequence of events of how network failures are monitored, detected, and mitigated. For more detail of this aspect, see [4]. Also, the repair process following a failure is out of the scope here.

本文档不讨论如何监控、检测和缓解网络故障的事件顺序。有关这方面的更多详细信息，请参见[4]。此外，故障后的维修过程也不在此处的范围之内。

A working entity is the entity that is used to carry traffic in normal operation mode. Depending upon the context, an entity can be a channel or a transmission link in the physical layer, a Label Switched Path (LSP) in MPLS, or a logical bundle of one or more LSPs.

工作实体是用于在正常运行模式下承载流量的实体。根据上下文，实体可以是物理层中的信道或传输链路、MPLS中的标签交换路径（LSP）或一个或多个LSP的逻辑束。

A protection entity, also called backup entity or recovery entity, is the entity that is used to carry protected traffic in recovery operation mode, i.e., when the working entity is in error or has failed.

保护实体，也称为备份实体或恢复实体，是用于在恢复操作模式下（即，当工作实体出错或出现故障时）承载受保护流量的实体。

Extra traffic, also referred to as preemptable traffic, is the traffic carried over the protection entity while the working entity is active. Extra traffic is not protected, i.e., when the protection entity is required to protect the traffic that is being carried over the working entity, the extra traffic is preempted.

额外通信量，也称为可抢占通信量，是在工作实体处于活动状态时通过保护实体传输的通信量。额外流量不受保护，即，当保护实体需要保护通过工作实体传输的流量时，额外流量被抢占。

A shared risk group (SRG) is a set of network elements that are collectively impacted by a specific fault or fault type. For example, a shared risk link group (SRLG) is the union of all the links on those fibers that are routed in the same physical conduit in a fiber-span network. This concept includes, besides shared conduit, other types of compromise such as shared fiber cable, shared right of way, shared optical ring, shared office without power sharing, etc. The span of an SRG, such as the length of the sharing for compromised outside plant, needs to be considered on a per fault basis. The concept of SRG can be extended to represent a "risk domain" and its associated capabilities and summarization for traffic engineering purposes. See [7] for further discussion.

共享风险组（SRG）是一组受特定故障或故障类型共同影响的网络元件。例如，共享风险链路组（SRLG）是在光纤跨段网络中，路由经过同一物理管道的那些光纤上所有链路的并集。除共享管道外，这一概念还包括其他类型的共同风险，例如共享光缆、共享路权、共享光环、不共享电源的共享局所等。SRG的范围（例如受影响的外线设施的共享长度）需要按每种故障分别考虑。SRG的概念可以扩展为表示一个“风险域”及其相关能力和汇总信息，以用于流量工程目的。有关进一步的讨论，请参见[7]。
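As a rough illustration of how SRLG information might be consumed, the sketch below checks whether a working path and a candidate protection path are SRLG-diverse, i.e., whether any single shared-risk fault could take out both. The link names, SRLG identifiers, and the `srlg_map` structure are hypothetical assumptions made for this example only.

```python
from typing import Dict, List, Set

# Hypothetical input: link name -> set of SRLG identifiers (e.g., conduit IDs).
# Links absent from the map are treated as belonging to no SRLG.
SrlgMap = Dict[str, Set[str]]


def path_srlgs(path: List[str], srlg_map: SrlgMap) -> Set[str]:
    """Return the union of the SRLGs of every link on the path."""
    groups: Set[str] = set()
    for link in path:
        groups |= srlg_map.get(link, set())
    return groups


def srlg_diverse(working: List[str], protection: List[str], srlg_map: SrlgMap) -> bool:
    """True if the two paths share no SRLG, so no single shared-risk fault hits both."""
    return not (path_srlgs(working, srlg_map) & path_srlgs(protection, srlg_map))


# Example: A-C runs in the same conduit as A-B, so the first candidate is not diverse.
srlg_map = {
    "A-B": {"conduit-1"},
    "B-D": {"conduit-2"},
    "A-C": {"conduit-1"},
    "C-D": {"conduit-3"},
    "A-E": {"conduit-4"},
    "E-D": {"conduit-5"},
}
working = ["A-B", "B-D"]
print(srlg_diverse(working, ["A-C", "C-D"], srlg_map))  # False
print(srlg_diverse(working, ["A-E", "E-D"], srlg_map))  # True
```

Treating unknown links as risk-free is a deliberate simplification; a more conservative planner might instead reject paths whose SRLG data is missing.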

Normalization is the sequence of events and actions taken by a network that returns the network to the preferred state upon completing repair of a failure. This could include the switching or rerouting of affected traffic to the original repaired working entities or new routes. Revertive mode refers to the case where traffic is automatically returned to a repaired working entity (also called switch back).

正常化是指在故障修复完成后，网络为使自身回到首选状态而发生的一系列事件和采取的操作。这可能包括将受影响的流量切换或重路由到修复后的原始工作实体或新路由上。返回模式是指流量自动返回到修复后的工作实体（也称为倒回，switch back）的情况。

Recovery is the sequence of events and actions taken by a network after the detection of a failure to maintain the required performance level for existing services (e.g., according to service level agreements) and to allow normalization of the network. The actions include notification of the failure followed by two parallel processes: (1) a repair process with fault isolation and repair of the failed components, and (2) a reconfiguration process using survivability mechanisms to maintain service continuity. In protection, reconfiguration involves switching the affected traffic from a working entity to a protection entity. In restoration, reconfiguration involves path selection and rerouting for the affected traffic.

恢复是在检测到故障后，网络采取的一系列事件和行动，以维持现有服务所需的性能级别（例如，根据服务级别协议），并允许网络正常化。这些措施包括故障通知，然后是两个并行过程：（1）具有故障隔离和故障组件修复的修复过程，以及（2）使用生存性机制来维持服务连续性的重新配置过程。在保护中，重新配置涉及将受影响的流量从工作实体切换到保护实体。在恢复中，重新配置涉及受影响流量的路径选择和重新路由。

Revertive mode is a procedure in which revertive action, i.e., switch back from the protection entity to the working entity, is taken once the failed working entity has been repaired. In non-revertive mode, such action is not taken. To minimize service interruption, switch-back in revertive mode should be performed at a time when there is the least impact on the traffic concerned, or by using the make-before-break concept.

返回模式是这样一种过程：一旦故障的工作实体得到修复，就执行返回操作，即从保护实体切换回工作实体。在非返回模式下，不执行此类操作。为了最大限度地减少服务中断，返回模式下的倒回应在对相关流量影响最小的时间进行，或者采用先通后断（make-before-break）的方式进行。

Non-revertive mode is the case where there is no preferred path or it may be desirable to minimize further disruption of the service brought on by a revertive switching operation. A switch-back to the original working path is not desired or not possible since the original path may no longer exist after the occurrence of a fault on that path.

非返回模式适用于没有首选路径的情况，或者希望尽量减少返回切换操作给服务带来的进一步中断的情况。不希望或不可能切换回原始工作路径，因为在该路径上发生故障后，原始路径可能已不复存在。
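The difference between revertive and non-revertive mode can be captured in a tiny state machine. This is a minimal sketch of a hypothetical 1:1 protection group; the class and method names are illustrative, and the switch-back happens immediately on repair rather than at a low-impact time or via make-before-break as the text recommends.

```python
from enum import Enum


class Entity(Enum):
    WORKING = "working"
    PROTECTION = "protection"


class ProtectionGroup:
    """Hypothetical 1:1 protection group showing revertive vs non-revertive behavior."""

    def __init__(self, revertive: bool) -> None:
        self.revertive = revertive
        self.active = Entity.WORKING  # entity currently carrying protected traffic
        self.working_ok = True

    def on_working_failed(self) -> None:
        # Recovery: move protected traffic to the protection entity.
        self.working_ok = False
        self.active = Entity.PROTECTION

    def on_working_repaired(self) -> None:
        # Normalization: only revertive mode switches back automatically.
        self.working_ok = True
        if self.revertive:
            self.active = Entity.WORKING


for mode in (True, False):
    group = ProtectionGroup(revertive=mode)
    group.on_working_failed()
    group.on_working_repaired()
    print(mode, group.active.value)
# True working
# False protection
```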

2.2.3 Survivability Techniques

2.2.3 生存能力技术

Protection, also called protection switching, is a survivability technique based on predetermined failure recovery: as the working entity is established, a protection entity is also established. Protection techniques can be implemented by several architectures: 1+1, 1:1, 1:n, and m:n. In the context of SDH/SONET, they are referred to as Automatic Protection Switching (APS).

保护，也称为保护切换，是一种基于预定故障恢复的生存性技术：在建立工作实体时，也会建立保护实体。保护技术可以通过几种体系结构实现：1+1、1:1、1:n和m:n。在SDH/SONET的上下文中，它们被称为自动保护切换（APS）。

In the 1+1 protection architecture, a protection entity is dedicated to each working entity. The dual-feed mechanism is used whereby the working entity is permanently bridged onto the protection entity at the source of the protected domain. In normal operation mode, identical traffic is transmitted simultaneously on both the working and protection entities. At the other end (sink) of the protected domain, both feeds are monitored for alarms and maintenance signals. A selection between the working and protection entity is made based on some predetermined criteria, such as the transmission performance requirements or defect indication.

在1+1保护体系结构中，每个工作实体都有一个专用的保护实体。它采用双发（dual-feed）机制，即在受保护域的源端将工作实体永久桥接到保护实体上。在正常运行模式下，相同的流量同时在工作实体和保护实体上传输。在受保护域的另一端（宿端），两路馈送都会被监控是否存在告警和维护信号。工作实体与保护实体之间的选择基于某些预定标准，例如传输性能要求或缺陷指示。
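The sink-side selection step of a 1+1 architecture can be sketched as a pure function of the two feeds' status. The `FeedStatus` fields, the 1e-6 error-ratio threshold, and the policy of staying on the working feed when both are bad are all hypothetical choices for illustration; in this sketch the decision is made purely locally at the sink.

```python
from dataclasses import dataclass


@dataclass
class FeedStatus:
    signal_fail: bool   # defect indication, e.g., loss of signal
    error_ratio: float  # measured bit error ratio on this feed


def select_feed(working: FeedStatus, protection: FeedStatus,
                max_error_ratio: float = 1e-6) -> str:
    """Sink-side selector for a 1+1 dual feed: prefer the working feed if acceptable."""

    def acceptable(feed: FeedStatus) -> bool:
        return not feed.signal_fail and feed.error_ratio <= max_error_ratio

    if acceptable(working):
        return "working"
    if acceptable(protection):
        return "protection"
    # Neither feed meets the criteria: stay on working (hypothetical policy).
    return "working"


print(select_feed(FeedStatus(False, 1e-9), FeedStatus(False, 1e-9)))  # working
print(select_feed(FeedStatus(True, 0.0), FeedStatus(False, 1e-9)))    # protection
print(select_feed(FeedStatus(False, 1e-3), FeedStatus(False, 1e-9)))  # protection
```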

In the 1:1 protection architecture, a protection entity is also dedicated to each working entity. The protected traffic is normally transmitted by the working entity. When the working entity fails, the protected traffic is switched to the protection entity. The two ends of the protected domain must signal detection of the fault and initiate the switchover.

在1:1保护体系结构中，保护实体也专用于每个工作实体。受保护的通信量通常由工作实体传输。当工作实体出现故障时，受保护的通信量将切换到保护实体。受保护域的两端必须发出故障检测信号并启动切换。

In the 1:n protection architecture, a dedicated protection entity is shared by n working entities. In this case, not all of the affected traffic may be protected.

在1:n保护体系结构中，一个专用保护实体由n个工作实体共享。在这种情况下，并非所有受影响的流量都能得到保护。

The m:n architecture is a generalization of the 1:n architecture. Typically m <= n, where m dedicated protection entities are shared by n working entities.

m:n体系结构是1:n体系结构的推广。通常m<=n，其中m个专用保护实体由n个工作实体共享。

Restoration, also referred to as recovery by rerouting [4], is a survivability technique that establishes new paths or path segments on demand, for restoring affected traffic after the occurrence of a fault. The resources in these alternate paths are the currently unassigned (unreserved) resources in the same layer. Preemption of extra traffic may also be used if spare resources are not available to carry the higher-priority protected traffic. As initiated by detection of a fault on the working path, the selection of a recovery path may be based on preplanned configurations, network routing policies, or current network status such as network topology and fault information. Signaling is used for establishing the new paths to bypass the fault. Thus, restoration involves a path selection process followed by rerouting of the affected traffic from the working entity to the recovery entity.

恢复，也称为重路由恢复[4]，是一种生存性技术，可根据需要建立新的路径或路径段，用于在故障发生后恢复受影响的通信量。这些备用路径中的资源是同一层中当前未分配（未保留）的资源。如果没有备用资源来承载更高优先级的受保护流量，也可以使用额外流量的抢占。当检测到工作路径上的故障时，恢复路径的选择可以基于预先计划的配置、网络路由策略或当前网络状态，例如网络拓扑和故障信息。信令用于建立绕过故障的新路径。因此，恢复涉及路径选择过程，然后将受影响的流量从工作实体重新路由到恢复实体。
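The path-selection step of restoration can be illustrated with an ordinary shortest-path computation over the surviving topology. This minimal sketch assumes the current topology is known and modelled as a weighted graph, and it runs Dijkstra's algorithm while skipping failed links; capacity checks, preemption, and the signaling that actually establishes the new path are deliberately omitted. The graph format and function name are illustrative assumptions.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link cost}; list both directions


def restore_path(graph: Graph, src: str, dst: str,
                 failed_links: Set[Tuple[str, str]]) -> Optional[List[str]]:
    """Shortest src->dst path that avoids every failed (undirected) link, or None."""

    def usable(u: str, v: str) -> bool:
        return (u, v) not in failed_links and (v, u) not in failed_links

    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            if not usable(u, v):
                continue
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # no surviving route: traffic cannot be restored
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


graph = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 1},
    "C": {"A": 2, "D": 2},
    "D": {"B": 1, "C": 2},
}
print(restore_path(graph, "A", "D", set()))                     # ['A', 'B', 'D']
print(restore_path(graph, "A", "D", {("A", "B")}))              # ['A', 'C', 'D']
print(restore_path(graph, "A", "D", {("A", "B"), ("C", "D")}))  # None
```

Because nothing is computed until the fault is known, the cost of this step is paid during the outage, which is one reason restoration tends to be slower and more variable than protection switching.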

2.2.4 Survivability Performance

2.2.4 生存性能

Protection switch time is the time interval from the occurrence of a network fault until the completion of the protection-switching operations. It includes the detection time necessary to initiate the protection switch, any hold-off time to allow for the interworking of protection schemes, and the switch completion time.

保护切换时间是从网络故障发生到保护切换操作完成的时间间隔。它包括启动保护切换所需的检测时间、为实现各保护方案之间的互通而设置的任何延迟（hold-off）时间，以及切换完成时间。
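The definition of protection switch time is a simple sum of three components, which makes it easy to see how a hold-off timer adds directly to the total. The component values below are hypothetical and in milliseconds.

```python
def protection_switch_time_ms(detection_ms: float, hold_off_ms: float,
                              completion_ms: float) -> float:
    """Protection switch time = detection + hold-off + switch completion (all in ms)."""
    parts = (detection_ms, hold_off_ms, completion_ms)
    if any(p < 0 for p in parts):
        raise ValueError("time components must be non-negative")
    return sum(parts)


# Hypothetical component values, in milliseconds.
print(protection_switch_time_ms(10, 100, 50))  # 160
print(protection_switch_time_ms(10, 0, 50))    # 60
```

In this example, removing the 100 ms hold-off moves the total from above to below 140 ms, the lower end of the voice call cutoff range quoted in the summary, so hold-off values chosen for inter-layer coordination need to be weighed against such timing bounds.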

Restoration time is the time interval from the occurrence of a network fault to the instant when the affected traffic is either completely restored, or until spare resources are exhausted, and/or no more extra traffic exists that can be preempted to make room.

恢复时间是指从网络故障发生到受影响的通信量完全恢复，或直到备用资源耗尽和/或不再存在可抢占空间的额外通信量的时间间隔。

Restoration priority is a method of giving preference to protect higher-priority traffic ahead of lower-priority traffic. Its use is to help determine the order of restoring traffic after a failure has occurred. The purpose is to differentiate service restoration time as well as to control access to available spare capacity for different classes of traffic.

恢复优先级是一种优先保护优先级较高的流量而不是优先级较低的流量的方法。它的用途是帮助确定故障发生后恢复通信量的顺序。其目的是区分服务恢复时间，以及控制对不同类别流量的可用备用容量的访问。

Preemption priority is a method of determining which traffic can be disconnected in the event that not all traffic with a higher restoration priority is restored after the occurrence of a failure.

抢占优先级是一种确定在故障发生后并非所有具有较高恢复优先级的流量都恢复时，哪些流量可以断开连接的方法。
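One way to see how restoration priority and preemption priority interact is a greedy sketch over a single pool of spare capacity: affected demands are restored in restoration-priority order, and extra traffic is preempted, least important first, only when spare capacity runs out. The single shared pool, the `Demand` type, and the "smaller number = more important" convention are hypothetical simplifications; real restoration works per link and per path.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Demand:
    name: str
    bandwidth: float
    priority: int  # smaller value = more important (hypothetical convention)


def restore_by_priority(affected: List[Demand], extra: List[Demand],
                        spare: float) -> Tuple[List[str], List[str], List[str]]:
    """Restore affected demands in priority order, preempting extra traffic as needed.
    Returns (restored, preempted, unrestored) demand names."""
    restored: List[str] = []
    preempted: List[str] = []
    unrestored: List[str] = []
    # Preemption order: least important extra traffic first.
    victims = sorted(extra, key=lambda d: d.priority, reverse=True)
    for demand in sorted(affected, key=lambda d: d.priority):
        reachable = spare + sum(v.bandwidth for v in victims)
        if reachable < demand.bandwidth:
            unrestored.append(demand.name)  # not enough even after preemption
            continue
        while spare < demand.bandwidth:
            victim = victims.pop(0)
            spare += victim.bandwidth
            preempted.append(victim.name)
        spare -= demand.bandwidth
        restored.append(demand.name)
    return restored, preempted, unrestored


affected = [Demand("bronze", 5, 2), Demand("gold", 4, 0), Demand("silver", 3, 1)]
extra = [Demand("x1", 2, 5), Demand("x2", 2, 7)]
print(restore_by_priority(affected, extra, spare=5))
# (['gold', 'silver'], ['x2'], ['bronze'])
```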

2.3 Survivability Mechanisms: Comparison

2.3 生存机制：比较

In a survivable network design, spare capacity and diversity must be built into the network from the beginning to support some degree of self-healing whenever failures occur. A common strategy is to associate each working entity with a protection entity having either dedicated resources or shared resources that are pre-reserved or reserved-on-demand. According to the methods of setting up a protection entity, different approaches to providing survivability can be classified. Generally, protection techniques are based on having a dedicated protection entity set up prior to failure. Such is not the case in restoration techniques, which mainly rely on the use of spare capacity in the network. Hence, in terms of trade-offs, protection techniques usually offer fast recovery from failure with enhanced availability, while restoration techniques usually achieve better resource utilization.

在可生存的网络设计中，必须从一开始就将备用容量和多样性构建到网络中，以便在发生故障时支持某种程度的自愈。一种常见的策略是将每个工作实体与一个保护实体相关联，该保护实体具有预先保留或按需保留的专用资源或共享资源。根据建立保护实体的方法，可以对提供生存能力的不同方法进行分类。通常，保护技术基于在故障前建立专用保护实体。恢复技术并非如此，它主要依赖于网络中备用容量的使用。因此，在权衡方面，保护技术通常提供从故障中快速恢复并增强可用性，而恢复技术通常实现更好的资源利用率。

A 1+1 protection architecture is rather expensive since resource duplication is required for the working and protection entities. It is generally used for specific services that need a very high availability.

1+1保护体系结构相当昂贵，因为工作实体和保护实体需要资源复制。它通常用于需要非常高可用性的特定服务。

A 1:1 architecture is inherently slower in recovering from failure than a 1+1 architecture since communication between both ends of the protection domain is required to perform the switch-over operation. An advantage is that the protection entity can optionally be used to carry low-priority extra traffic in normal operation, if traffic preemption is allowed. Packet networks can pre-establish a protection path for later use with pre-planned but not pre-reserved capacity. That is, if no packets are sent onto a protection path, then no bandwidth is consumed. This is not the case in transmission networks like optical or TDM, where path establishment and resource reservation cannot be decoupled.

由于执行切换操作需要保护域两端之间进行通信，因此1:1体系结构从故障中恢复的速度天生比1+1体系结构慢。其优点是，如果允许流量抢占，则在正常运行期间可以选择使用保护实体承载低优先级的额外流量。分组网络可以预先建立保护路径以供以后使用，其容量是预先规划的，但并未预先保留。也就是说，如果没有数据包发送到保护路径上，就不会消耗带宽。而在光网络或TDM等传输网络中情况并非如此，因为这些网络中的路径建立与资源预留无法解耦。
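The difference between dedicated (non-shared) and pre-planned shared backup capacity can be made concrete with a small planning calculation. Assuming only one working link fails at a time and that the working links share no SRLG, a backup link needs enough capacity for the worst single failure rather than for every protection path that crosses it. The LSP tuple format and function name are hypothetical; this is a planning sketch, not a signaling procedure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical model of a protected LSP: (bandwidth, working links, protection links).
Lsp = Tuple[float, List[str], List[str]]


def planned_backup(lsps: List[Lsp]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (shared, dedicated) backup capacity per link,
    assuming one working-link failure at a time."""
    dedicated: Dict[str, float] = defaultdict(float)
    # need[backup_link][failed_link] = bandwidth activated on backup_link if failed_link fails
    need: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for bw, working, protection in lsps:
        for b in protection:
            dedicated[b] += bw
            for f in working:
                need[b][f] += bw
    # Shared planning only has to cover the worst single failure on each backup link.
    shared = {b: max(per_failure.values()) for b, per_failure in need.items()}
    return shared, dict(dedicated)


lsps = [
    (10.0, ["A-B"], ["A-C", "C-B"]),
    (6.0, ["D-E"], ["D-C", "C-B"]),
]
shared, dedicated = planned_backup(lsps)
print(shared["C-B"], dedicated["C-B"])  # 10.0 16.0
```

Because the two working links fail independently under the stated assumption, link C-B only needs to be planned for 10 units rather than 16; with pre-planned but not pre-reserved capacity, none of it is consumed until a failure actually occurs.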

In the 1:n protection architecture, traffic is normally sent on the working entities. When multiple working entities have failed simultaneously, only one of them can be restored by the common protection entity. This contention could be resolved by assigning a different preemptive priority to each working entity. As in the 1:1 case, the protection entity can optionally be used to carry preemptable traffic in normal operation.

在1:n保护体系结构中，流量通常在工作实体上发送。当多个工作实体同时发生故障时，公共保护实体只能恢复其中一个。这种争用可以通过为每个工作实体分配不同的抢占优先级来解决。与1:1情况一样，在正常运行时可以选择使用保护实体承载可抢占流量。
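Resolving contention for a shared protection entity by priority, as described above, amounts to ranking the failed working entities and keeping the top m (m = 1 for 1:n). The priority convention and the tie-break by name are hypothetical choices made so the result is deterministic.

```python
from typing import Dict, List


def assign_protection(failed: List[str], priority: Dict[str, int], m: int = 1) -> List[str]:
    """Choose which failed working entities the m shared protection entities carry.
    Smaller priority value = more important; ties broken by name (hypothetical)."""
    # Any extra traffic on the chosen protection entities would be preempted here.
    ranked = sorted(failed, key=lambda name: (priority[name], name))
    return ranked[:m]


priority = {"w1": 2, "w2": 0, "w3": 1}
print(assign_protection(["w1", "w2", "w3"], priority))       # ['w2'] (1:n)
print(assign_protection(["w1", "w2", "w3"], priority, m=2))  # ['w2', 'w3'] (m:n)
```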

While the m:n architecture can improve system availability with small cost increases, it has rarely been implemented or standardized.

虽然m:n体系结构可以在成本增加不大的情况下提高系统可用性，但它很少被实现或标准化。

When compared with protection mechanisms, restoration mechanisms are generally more frugal as no resources are committed until after the fault occurs and the location of the fault is known. However, restoration mechanisms are inherently slower, since more must be done following the detection of a fault. Also, the time it takes for the dynamic selection and establishment of alternate paths may vary, depending on the amount of traffic and connections to be restored, and is influenced by the network topology, technology employed, and the type and severity of the fault. As a result, restoration time tends to be more variable than the protection switch time needed with pre-selected protection entities. Hence, in using restoration mechanisms, it is essential to use restoration priority to ensure that service objectives are met cost-effectively.

与保护机制相比，恢复机制通常更节省，因为在故障发生且故障位置已知之前，不会提交任何资源。然而，恢复机制本质上较慢，因为在检测到故障后必须做更多的工作。此外，动态选择和建立备用路径所需的时间可能会有所不同，这取决于要恢复的流量和连接的数量，并且受网络拓扑、所采用的技术以及故障的类型和严重性的影响。因此，恢复时间往往比预先选择的保护实体所需的保护切换时间变化更大。因此，在使用恢复机制时，必须使用恢复优先级以确保经济高效地实现服务目标。

Once the network routing algorithms have converged after a fault, it may be preferable in some cases to reoptimize the network by performing a reroute based on the current state of the network and network policies.

一旦网络路由算法在故障后收敛，在某些情况下，通过基于网络的当前状态和网络策略执行重新路由来重新优化网络可能是优选的。

3. Survivability

3. 生存能力

3.1 Scope

3.1 范围

Interoperable approaches to network survivability were determined to be an immediate requirement in packet networks as well as in SDH/SONET framed TDM networks. Not as pressing at this time were techniques that would cover all-optical networks (e.g., where framing is unknown), as the control of these networks in a multi-vendor environment appeared to have some other hurdles to first deal with. Also, not of immediate interest were approaches to coordinate or explicitly communicate survivability mechanisms across network layers (such as from a TDM or optical network to/from an IP network). However, a capability should be provided for a network operator to perform fault notification and to control the operation of survivability mechanisms among different layers. This may require the development of corresponding OAM functionality. However, such issues and those related to OAM are currently outside the scope of this document. (For proposed MPLS OAM requirements, see [8, 9].)

可互操作的网络生存性方法被确定为分组网络以及SDH/SONET成帧TDM网络中的一项迫切需求。目前，覆盖全光网络（例如帧格式未知的情况）的技术并不那么紧迫，因为在多供应商环境中控制这类网络似乎还有其他障碍需要先解决。此外，跨网络层（例如从TDM或光网络到IP网络，或从IP网络到TDM或光网络）协调或显式通信生存性机制的方法也不是当前的直接关注点。但是，应为网络运营商提供执行故障通知以及控制不同层之间生存性机制操作的能力。这可能需要开发相应的OAM功能。然而，这些问题以及与OAM有关的问题目前不在本文件的范围之内。（有关提议的MPLS OAM要求，请参见[8, 9]。）

The initial scope is to address only "backhoe failures" in the inter-office connections of a service provider network. A link connection in the router layer is typically comprised of multiple spans in the lower layers. Therefore, the types of network failures that cause a recovery to be performed include link/span failures. However, linecard and node failures may not need to be treated any differently than their respective link/span failures, as a router failure may be represented as a set of simultaneous link failures.

最初的范围是仅解决服务提供商网络的办公室间连接中的“反铲故障”。路由器层中的链路连接通常由较低层中的多个跨距组成。因此，导致执行恢复的网络故障类型包括链路/跨度故障。但是，线路卡和节点故障的处理可能不需要与它们各自的链路/跨度故障有任何不同，因为路由器故障可以表示为一组同时发生的链路故障。

Depending on the actual network configuration, the drop-side interface (e.g., between a customer and an access router, or between a router and an optical cross-connect) may be considered either inter-domain or inter-layer. Another inter-domain scenario is the use of intra-office links for interconnecting a metro network and a core network, with both networks being administered by the same service provider. Failures at such interfaces may be similarly protected by the mechanisms of this section.

根据实际网络配置，下行侧（drop-side）接口（例如，客户与接入路由器之间，或路由器与光交叉连接之间的接口）可被视为域间接口或层间接口。另一个域间场景是使用局内链路互连城域网络和核心网络，两个网络由同一服务提供商管理。此类接口处的故障同样可通过本节的机制加以保护。

Other more complex failure mechanisms such as systematic control-plane failure, configuration error, or breach of security are not within the scope of the survivability mechanisms discussed in this document. Network impairments such as congestion that result in lower throughput are also not covered.

其他更复杂的故障机制，如系统控制平面故障、配置错误或安全漏洞，不在本文件讨论的生存性机制的范围内。网络损坏（如导致吞吐量降低的拥塞）也不包括在内。

3.2 Required initial set of survivability mechanisms

3.2 所需的初始生存能力机制集

3.2.1 1:1 Path Protection with Pre-Established Capacity

3.2.1 具有预设容量的1:1路径保护

In this protection mode, the head end of a working connection establishes a protection connection to the destination. There should be the ability to maintain relative restoration priorities between working and protection connections, as well as between different classes of protection connections.

在此保护模式下，工作连接的前端建立到目标的保护连接。应能够在工作连接和保护连接之间以及不同类别的保护连接之间保持相对的恢复优先级。

In normal operation, traffic is only sent on the working connection, though the ability to signal that traffic will be sent on both connections (1+1 Path for signaling purposes) would be valuable in non-packet networks. Some distinction between working and protection connections is likely, either through explicit objects, or preferably through implicit methods such as general classes or priorities. Head ends need the ability to create connections that are as failure-disjoint as possible from each other. This requires SRG information that can generally be assigned to either nodes or links and propagated through the control or management plane. In this mechanism, capacity in the protection connection is pre-established; however, it should be capable of carrying preemptable extra traffic in non-packet networks. When protection capacity is called into service during recovery, there should be the ability to promote the protection connection to working status (for non-revertive mode operation) with some form of make-before-break capability.

在正常操作中，流量仅在工作连接上发送，尽管在非分组网络中，能够发出信号表明流量将在两个连接上发送（用于信令目的的1+1路径）是有价值的。工作连接和保护连接之间可能存在一些区别，或者是通过显式对象，或者最好是通过隐式方法，如通用类或优先级。前端需要能够创建在故障意义上彼此尽可能不相交的连接。这需要通常可分配给节点或链路、并通过控制平面或管理平面传播的SRG信息。在这种机制中，保护连接中的容量是预先建立的，但是它应该能够在非分组网络中承载可抢占的额外流量。当保护容量在恢复期间投入使用时，应能够通过某种形式的先通后断能力将保护连接提升到工作状态（对于非回切模式操作）。
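
Head ends could use propagated SRG information to check how failure-disjoint a candidate protection path is. The sketch below assumes SRGs arrive as plain sets of identifiers keyed by node and by link, and simply picks, among precomputed candidate paths, the one sharing the fewest SRGs with the working path. It is not a disjoint-path algorithm such as Suurballe's, and the topology, SRG names, and function names are invented for illustration.

```python
def path_srgs(path, link_srgs, node_srgs):
    """SRGs touched by a path (list of node names), excluding the shared end points."""
    srgs = set()
    for node in path[1:-1]:
        srgs |= node_srgs.get(node, set())
    for a, b in zip(path, path[1:]):
        # Links are treated as undirected, so look up either orientation.
        srgs |= link_srgs.get((a, b), link_srgs.get((b, a), set()))
    return srgs


def pick_protection(working, candidates, link_srgs, node_srgs):
    """Return the candidate sharing the fewest SRGs with the working path, plus those SRGs."""
    used = path_srgs(working, link_srgs, node_srgs)
    best = min(candidates, key=lambda p: len(used & path_srgs(p, link_srgs, node_srgs)))
    return best, used & path_srgs(best, link_srgs, node_srgs)


link_srgs = {
    ("A", "B"): {"conduit-1"}, ("B", "D"): {"conduit-2"},
    ("A", "C"): {"conduit-1"}, ("C", "D"): {"conduit-3"},
    ("A", "E"): {"conduit-4"}, ("E", "D"): {"conduit-5"},
}
node_srgs = {"B": {"office-7"}, "C": {"office-7"}, "E": {"office-9"}}
path, shared = pick_protection(["A", "B", "D"], [["A", "C", "D"], ["A", "E", "D"]], link_srgs, node_srgs)
print(path, shared)  # ['A', 'E', 'D'] set()
```

The candidate through `C` looks node-disjoint but shares both `conduit-1` and `office-7` with the working path, which is exactly the kind of hidden common risk that SRG information is meant to expose.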

3.2.2 1:1 Path Protection with Pre-Planned Capacity

3.2.2 具有预先规划容量的1:1路径保护

Similar to the above 1:1 protection with pre-established capacity, the protection connection in this case is also pre-signaled. The difference is in the way protection capacity is assigned. With pre-planned capacity, the mechanism supports the ability for the protection capacity to be shared, or "double-booked". Operators need the ability to provision different amounts of protection capacity according to expected failure modes and service level agreements. Thus, an operator may wish to provision sufficient restoration capacity to handle a single failure affecting all connections in an SRG, or may wish to provision less or more restoration capacity. Mechanisms should be provided to allow restoration capacity on each link to be shared by SRG-disjoint failures. In a sense, this is 1:1 from a path perspective; however, the protection capacity in the network (on a link by link basis) is shared in a 1:n fashion, e.g., see the proposals in [10, 11]. If capacity is planned but not allocated, some form of signaling could be required before traffic may be sent on protection connections, especially in TDM networks.

与上述具有预设容量的1:1保护类似，这种情况下的保护连接也会预先发出信号。不同之处在于保护容量的分配方式。对于预先规划的容量，该机制支持共享或“双重预订”保护容量的能力。运营商需要能够根据预期故障模式和服务水平协议提供不同数量的保护容量。因此，运营商可能希望提供足够的恢复容量来处理影响SRG中所有连接的单一故障，或者可能希望提供更少或更多的恢复容量。应提供机制，以允许SRG不相交故障共享每条链路上的恢复容量。从某种意义上说，从路径的角度来看，这是1:1；然而，网络中的保护容量（以链路为基础）以1:n的方式共享，例如，参见[10,11]中的建议。如果容量已规划但未分配，则在通过保护连接发送流量之前，可能需要某种形式的信令，尤其是在TDM网络中。
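
The "double-booking" of pre-planned capacity can be illustrated with a small sizing calculation. Assuming only one SRG fails at a time (an operator could provision more or less, as noted above), each link needs protection capacity equal to its worst single-SRG failure rather than the sum of every connection it protects. The function name and input format are hypothetical.

```python
from collections import defaultdict


def shared_backup_capacity(connections):
    """
    Size protection capacity per link so that any single SRG failure can be restored.

    connections: iterable of (bandwidth, working_srgs, protection_links).
    Connections whose working paths are SRG-disjoint never fail together,
    so their reservations on a common protection link can overlap.
    """
    # demand[link][srg] = bandwidth that moves onto `link` if `srg` fails
    demand = defaultdict(lambda: defaultdict(int))
    for bandwidth, working_srgs, protection_links in connections:
        for link in protection_links:
            for srg in working_srgs:
                demand[link][srg] += bandwidth
    # Each link needs only enough for its worst single-SRG failure.
    return {link: max(per_srg.values()) for link, per_srg in demand.items()}


conns = [
    (10, {"srg-1"}, ["X-Y"]),
    (10, {"srg-2"}, ["X-Y"]),
    (5, {"srg-1"}, ["X-Y", "Y-Z"]),
]
print(shared_backup_capacity(conns))  # {'X-Y': 15, 'Y-Z': 5}
```

Dedicated 1:1 reservation would need 25 units on `X-Y`; sharing needs 15, because the `srg-1` and `srg-2` connections cannot fail together under the single-failure assumption.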

The use of this approach improves network resource utilization, but may require more careful planning. So, initial deployment might be based on 1:1 path protection with pre-established capacity and the local restoration mechanism to be described next.

使用此方法可以提高网络资源利用率，但可能需要更仔细的规划。因此，初始部署可以基于具有预设容量的1:1路径保护，以及下文将介绍的本地恢复机制。

3.2.3 Local Restoration

3.2.3 本地恢复

Due to the time impact of signal propagation, dynamic recovery of an entire path may not meet the service requirements of some networks. The solution to this is to restore connectivity of the link or span in immediate proximity to the fault, e.g., see the proposals in [12, 13]. At a minimum, this approach should be able to protect against connectivity-type SRGs, though protecting against node-based SRGs might be worthwhile. Also, this approach is applicable to support restoration on the inter-domain and inter-layer interconnection scenarios using intra-office links as described in the Scope Section.

由于信号传播的时间影响，整个路径的动态恢复可能无法满足某些网络的服务要求。解决这一问题的方法是恢复紧邻故障的链路或跨度的连通性，例如，参见[12,13]中的建议。至少，这种方法应该能够防止连接类型的SRG，尽管保护基于节点的SRG可能是值得的。此外，此方法还适用于支持使用“范围”一节中所述的办公室内链路在域间和层间互连场景上进行恢复。
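
A minimal sketch of local restoration, assuming an undirected topology given as an adjacency list and a fewest-hop detour: the node upstream of the failed link finds a bypass to the node downstream and splices it into the path, so only the fault's immediate neighborhood is involved. Names are hypothetical; a bypass meant to cover a connectivity-type SRG would also ban every other link in that SRG.

```python
from collections import deque


def bfs_path(adj, src, dst, banned_links=frozenset()):
    """Fewest-hop path from src to dst that avoids banned links; None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev and frozenset((node, nxt)) not in banned_links:
                prev[nxt] = node
                queue.append(nxt)
    return None


def local_repair(adj, working, failed_link):
    """Splice a detour around failed_link (u, v) into the working path next to the fault."""
    u, v = failed_link
    i = working.index(u)
    if i + 1 >= len(working) or working[i + 1] != v:
        raise ValueError("failed link is not on the working path")
    detour = bfs_path(adj, u, v, banned_links={frozenset(failed_link)})
    if detour is None:
        return None  # no local bypass; fall back to head-end recovery
    return working[:i] + detour + working[i + 2:]


adj = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D", "E"], "D": ["C"], "E": ["B", "C"]}
print(local_repair(adj, ["A", "B", "C", "D"], ("B", "C")))  # ['A', 'B', 'E', 'C', 'D']
```

The resulting path may be longer than necessary, which is why head ends still need a way to re-groom the path later. The sketch also does not remove loops that a detour could create if it crossed the working path further downstream.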

Head end systems must have some control as to whether their connections are candidates for or excluded from local restoration. For example, best-effort and preemptable traffic may be excluded from local restoration; they only get restored if there is bandwidth available. This type of control may require the definition of an object in signaling.

前端系统必须对其连接是否为本地恢复的候选连接或排除在本地恢复之外进行一定的控制。例如，尽最大努力和可抢占的流量可能被排除在本地恢复之外；只有在有可用带宽的情况下，它们才会恢复。这种类型的控制可能需要在信令中定义对象。

Since local restoration may be suboptimal, a means for head end systems to later perform path-level re-grooming must be supported for this approach.

由于局部恢复可能是次优的，因此这种方法必须支持前端系统稍后执行路径级重新整理的方法。

3.2.4 Path Restoration

3.2.4 路径恢复

In this approach, connections that are impacted by a fault are rerouted by the originating network element upon notification of connection failure. Such a source-based approach is efficient in its use of network resources, but typically takes longer to accomplish restoration. It does not involve any new mechanisms; it is mentioned merely as another common approach to protecting against faults in a network.

在这种方法中，受故障影响的连接在收到连接故障通知后由发起网元重新路由。这种基于源的方法对于网络资源是有效的，但通常需要更长的时间来完成恢复。它不涉及任何新的机制。这只是提到了另一种常见的防止网络故障的方法。
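
Path restoration, by contrast, recomputes the whole route at the head end. The sketch below assumes link costs are known to the source and uses Dijkstra's algorithm as a stand-in for whatever route computation a real network element performs; it reroutes every connection whose path used the failed link. Function names and the example topology are hypothetical.

```python
import heapq


def shortest_path(links, src, dst, failed=frozenset()):
    """Dijkstra over undirected links {(a, b): cost}, ignoring failed links."""
    adj = {}
    for (a, b), cost in links.items():
        if frozenset((a, b)) in failed:
            continue
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, cost in adj.get(node, []):
            if d + cost < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = d + cost, node
                heapq.heappush(heap, (d + cost, nxt))
    return None


def path_restoration(links, connections, failed_link):
    """Head-end restoration: recompute the whole path of each connection that used failed_link."""
    failed = {frozenset(failed_link)}
    rerouted = {}
    for name, path in connections.items():
        if any(frozenset(hop) in failed for hop in zip(path, path[1:])):
            rerouted[name] = shortest_path(links, path[0], path[-1], failed)
    return rerouted


links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}
connections = {"c1": ["A", "B", "D"], "c2": ["A", "C", "D"]}
print(path_restoration(links, connections, ("B", "D")))  # {'c1': ['A', 'C', 'D']}
```

Because each affected connection is handled in turn, the sketch also hints at why restoration time grows with the number of connections a fault touches.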

3.3 Applications Supported

3.3 支持的应用程序

With service continuity under failure as a goal, a network is "survivable" if, in the face of a network failure, connectivity is interrupted for a "brief" period and then recovered before the network failure ends. The length of this interrupted period is dependent upon the application supported. Here are some typical applications and considerations that drive the requirements for an acceptable protection switch time or restoration time:

以故障下的服务连续性为目标，如果面对网络故障，连接中断“短暂”一段时间，然后在网络故障结束前恢复，则网络是“可生存的”。此中断周期的长度取决于支持的应用程序。以下是一些典型应用和注意事项，这些应用和注意事项推动了对可接受的保护切换时间或恢复时间的要求：

- Best-effort data: recovery of network connectivity by rerouting at the IP layer would be sufficient
- Premium data service: need to meet TCP timeout or application protocol timer requirements
- Voice: call cutoff is in the range of 140 msec to 2 sec (the time that a person waits after interruption of the speech path before hanging up, or the time after which a telephone switch will disconnect a call)
- Other real-time service (e.g., streaming, fax) where an interruption would cause the session to terminate
- Mission-critical applications that cannot tolerate even brief interruptions, for example, real-time financial transactions

- 尽力而为的数据：通过在IP层重新路由恢复网络连接就足够了
- 高级数据服务：需要满足TCP超时或应用协议计时器要求
- 语音：呼叫中断在140毫秒到2秒的范围内（即语音路径中断后用户在挂断之前等待的时间，或电话交换机断开呼叫的时间）
- 其他实时服务（例如流媒体、传真），中断会导致会话终止
- 任务关键型应用，甚至不能容忍短暂中断，例如实时金融交易

3.4 Timing Bounds for Survivability Mechanisms

3.4 生存性机制的时间界限

The approach to picking the recommended types of survivability mechanisms was to consider a spectrum of mechanisms that can be used to protect traffic with varying survivability characteristics and speeds of protection/restoration, and then to select a few general points that provide some coverage across that spectrum. The focus of this work is to provide requirements from which a small set of detailed proposals may be developed, allowing the operator some (limited) flexibility in how to meet their design goals when engineering multi-vendor networks. The requirements of the different applications listed in the previous subsection were discussed in general terms; however, no one on the team would likely vouch for the scientific merit of the timing bounds below with respect to meeting any specific application's needs. A few assumptions include:

选择推荐的生存性机制类型的方法是：考虑一系列可用于保护流量的机制，它们具有不同的生存性特性和保护/恢复速度，然后尝试选择若干在该范围内提供一定覆盖的一般性方案。这项工作的重点是提供需求，据此可制定一小部分详细提案，使运营商在设计多供应商网络、实现其设计目标的方法上具有一定（有限）的灵活性。上一小节中列出的不同应用的需求得到了一般性的讨论，但是团队中可能没有人能够担保以下时间界限在满足任何特定应用需求方面的科学依据。一些假设包括：

1. Approaches in which protection switching occurs without propagation of information are likely to be faster than those that require some form of fault notification to some or all elements in a network.

1. 无信息传播的保护切换方法可能比需要向网络中的某些或所有元件发出某种形式的故障通知的方法更快。

2. Approaches that require some form of signaling after a fault will also likely suffer some timing impact.

2. 在发生故障后需要某种形式信令的方法也可能会受到一些时间上的影响。

Proposed timing bounds for different survivability mechanisms are as follows (all bounds are exclusive of signal propagation):

不同生存性机制的拟议时间界限如下（所有界限不包括信号传播）：

   1:1 path protection with pre-established capacity:  100-500 ms
   1:1 path protection with pre-planned capacity:      100-750 ms
   Local restoration:                                  50 ms
   Path restoration:                                   1-5 seconds
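
One way to read the timing table is as a filter against an application's interruption budget. The sketch below is illustrative only: it treats the upper end of each range (and the single 50 ms figure for local restoration) as the worst case, and it requires the caller to add a propagation estimate, since the bounds exclude signal propagation. The dictionary keys and function name are hypothetical.

```python
# Worst-case bounds from the table, in milliseconds, excluding signal propagation.
WORST_CASE_MS = {
    "1:1 path protection, pre-established capacity": 500,
    "1:1 path protection, pre-planned capacity": 750,
    "local restoration": 50,
    "path restoration": 5000,
}


def mechanisms_within(budget_ms, propagation_ms=0):
    """Mechanisms whose worst-case bound plus propagation delay fits the budget, fastest first."""
    fits = [(bound, name) for name, bound in WORST_CASE_MS.items()
            if bound + propagation_ms <= budget_ms]
    return [name for _, name in sorted(fits)]


print(mechanisms_within(140))  # ['local restoration']
print(mechanisms_within(2000, propagation_ms=20))
```

With the 140 ms lower end of the voice call-cutoff range from Section 3.3 as the budget, only local restoration qualifies; a 2-second budget admits every mechanism except path restoration.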

To ensure that the service requirements for different applications can be met within the above timing bounds, restoration priority must be implemented to determine the order in which connections are restored (to minimize service restoration time as well as to gain access to available spare capacity on the best paths). For example, mission critical applications may require high restoration priority. At the fiber layer, instead of specific applications, it may be possible that priority be given to certain classifications of customers with their traffic types enclosed within the customer aggregate. Preemption priority should only be used in the event that not all connections can be restored, in which case connections with lower preemption priority should be released. Depending on a service provider's strategy in provisioning network resources for backup, preemption may or may not be needed in the network.

为确保在上述时间范围内满足不同应用程序的服务要求，必须实施恢复优先级，以确定恢复连接的顺序（以最大限度地减少服务恢复时间，并获得最佳路径上可用备用容量的访问权）。例如，任务关键型应用程序可能需要高恢复优先级。在光纤层，可能会优先考虑某些类别的客户，而不是特定的应用，其流量类型包含在客户集合中。只有在不能恢复所有连接的情况下才应使用抢占优先级，在这种情况下，应释放抢占优先级较低的连接。根据服务提供商为备份配置网络资源的策略，网络中可能需要也可能不需要抢占。
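
The interplay of restoration priority and preemption priority can be sketched as a greedy loop. Assumptions: all failed connections compete for one backup route with a single pool of spare capacity, lower numbers mean higher priority for both fields, and preemption is attempted only when a connection cannot otherwise fit. `Conn` and `restore` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    name: str
    bandwidth: int
    restore_prio: int  # hypothetical: lower value is restored earlier
    preempt_prio: int  # hypothetical: lower value is harder to preempt


def restore(failed, spare, preemptable):
    """
    Restore failed connections in restoration-priority order onto one backup route.

    spare: free capacity on that route; preemptable: connections already using it.
    Preemption is used only when spare capacity alone cannot carry a connection.
    Returns (restored, preempted, unrestored) lists of names.
    """
    restored, preempted, unrestored = [], [], []
    # Weakest (highest preempt_prio value) first, so they are bumped first.
    victims = sorted(preemptable, key=lambda c: c.preempt_prio, reverse=True)
    for conn in sorted(failed, key=lambda c: c.restore_prio):
        eligible = [v for v in victims if v.preempt_prio > conn.preempt_prio]
        if spare + sum(v.bandwidth for v in eligible) < conn.bandwidth:
            unrestored.append(conn.name)  # would not fit even after preemption
            continue
        while spare < conn.bandwidth:
            victim = victims.pop(0)  # eligible victims form a prefix of the sorted list
            spare += victim.bandwidth
            preempted.append(victim.name)
        spare -= conn.bandwidth
        restored.append(conn.name)
    return restored, preempted, unrestored


failed = [Conn("bronze", 4, 2, 3), Conn("gold", 8, 0, 0), Conn("silver", 6, 1, 1)]
extra = [Conn("extra", 5, 9, 5)]
print(restore(failed, spare=10, preemptable=extra))
# (['gold', 'silver'], ['extra'], ['bronze'])
```

Restoration priority decides the order (gold before silver before bronze); preemption priority decides who may be displaced, so the extra traffic is released only once silver cannot fit in the remaining spare capacity.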

3.5 Coordination Among Layers

3.5 层间协调

A common design goal for networks with multiple technological layers is to provide the desired level of service in the most cost-effective manner. Multilayer survivability may allow the optimization of spare resources through the improvement of resource utilization by sharing spare capacity across different layers, though further investigations are needed. Coordination during recovery among different network layers (e.g., IP, SDH/SONET, optical layer) might necessitate development of vertical hierarchy. The benefits of providing survivability mechanisms at multiple layers, and the optimization of the overall approach, must be weighed with the associated cost and service impacts.

具有多个技术层的网络的一个共同设计目标是以最具成本效益的方式提供所需的服务级别。多层生存能力可以通过跨不同层共享备用容量来提高资源利用率，从而优化备用资源，但还需要进一步研究。不同网络层（如IP、SDH/SONET、光学层）之间恢复期间的协调可能需要开发垂直层次结构。在多层提供生存性机制的好处以及整体方法的优化，必须与相关的成本和服务影响进行权衡。

A default coordination mechanism for inter-layer interaction could be the use of nested timers and current SDH/SONET fault monitoring, as has been done traditionally for backward compatibility. Thus, when lower-layer recovery happens in a longer time period than higher-layer recovery, a hold-off timer is utilized to avoid contention between the different single-layer survivability schemes. In other words, multilayer interaction is addressed by having successively higher multiplexing levels operate at a protection/restoration time scale greater than the next lowest layer. This can impact the overall time to recover service. For example, if SDH/SONET protection switching is used, MPLS recovery timers must wait until SDH/SONET has had time to switch. Setting such timers involves a tradeoff between rapid recovery and creation of a race condition where multiple layers are responding to the same fault, potentially allocating resources in an inefficient manner.

层间交互的默认协调机制可以是使用嵌套计时器和当前SDH/SONET故障监测，传统上是为了向后兼容。因此，当较低层恢复发生在比较高层恢复更长的时间段内时，利用延迟定时器来避免不同单层生存性方案之间的争用。换言之，通过使连续较高的复用级别在大于下一最低层的保护/恢复时间尺度下操作来解决多层交互。这可能会影响恢复服务的总时间。例如，如果使用SDH/SONET保护交换，MPLS恢复计时器必须等到SDH/SONET有时间进行交换。设置此类计时器需要在快速恢复和创建竞态条件之间进行权衡，在竞态条件下，多个层对同一故障做出响应，从而可能以低效的方式分配资源。

In other configurations where the lower layer does not have a restoration capability or is not expected to protect, say an unprotected SDH/SONET linear circuit, then there must be a mechanism for the lower layer to trigger the higher layer to take recovery actions immediately. This difference in network configuration means that implementations must allow for adjustment of hold-off timer values and/or a means for a lower layer to immediately indicate to a higher layer that a fault has occurred so that the higher layer can take restoration or protection actions.

在低层不具备恢复能力或预计不会保护的其他配置中，例如未受保护的SDH/SONET线性电路，则必须有一种机制使低层触发高层立即采取恢复操作。网络配置中的这种差异意味着实现必须允许调整延迟计时器值和/或允许较低层立即向较高层指示发生了故障，以便较高层可以采取恢复或保护措施。
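
The nested-timer coordination, and why an unprotected lower layer needs an immediate indication, can be modeled with one function. This is a simplified sketch: the numbers in the example are illustrative rather than recommended values, fault detection time is ignored, and the function name is hypothetical.

```python
def service_outage(lower_recovery_ms, hold_off_ms, upper_recovery_ms):
    """
    Return (layer that recovers the traffic, outage in ms) for one fault.

    lower_recovery_ms is None when the lower layer has no protection of its own
    (e.g. an unprotected SDH/SONET linear circuit).
    """
    if lower_recovery_ms is not None and lower_recovery_ms <= hold_off_ms:
        # Lower layer finished inside the hold-off window; the upper layer stays idle.
        return "lower", lower_recovery_ms
    # Hold-off expired without lower-layer recovery, so the upper layer acts.
    # If the lower layer is still switching at this point, both layers are
    # reacting to the same fault (the race the hold-off is meant to avoid).
    return "upper", hold_off_ms + upper_recovery_ms


print(service_outage(50, 60, 100))    # ('lower', 50)
print(service_outage(None, 60, 100))  # ('upper', 160)
print(service_outage(None, 0, 100))   # ('upper', 100): immediate indication from below
```

The second and third calls show the cost of a fixed hold-off when the lower layer cannot protect: an adjustable timer, or an immediate fault indication (modeled here as a zero hold-off), removes the wasted wait.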

Furthermore, faults at higher layers should not trigger restoration or protection actions at lower layers [3, 4].

此外，较高层的故障不应触发较低层的恢复或保护动作[3,4]。

It was felt that the current approaches to coordinating survivability mechanisms did not have significant operational shortfalls. These approaches include protecting traffic solely at one layer (e.g., at the IP layer over linear WDM, or at the SDH/SONET layer). Where survivability mechanisms might be deployed at several layers, such as when a routed network rides over an SDH/SONET-protected network, it was felt that current coordination approaches were sufficient in many cases. One exception is the hold-off of MPLS recovery until the completion of SDH/SONET protection switching, as described above. This lengthens the recovery time that fast MPLS restoration can achieve. Also, by design, the operations and mechanisms within a given layer tend to be invisible to other layers.

有人认为，目前协调各层生存性机制的方法并没有明显的运营缺陷。这些方法包括仅在一层保护流量（例如，在线性WDM之上的IP层，或在SDH/SONET层）。在生存性机制可能部署在多个层次的情况下，例如当路由网络承载于受SDH/SONET保护的网络之上时，人们认为当前的协调方法在许多情况下是足够的。一个例外是如上所述，MPLS恢复需要延迟到SDH/SONET保护倒换完成之后才进行。这延长了MPLS快速恢复所能达到的恢复时间。此外，根据设计，给定层中的操作和机制往往对其他层不可见。

3.6 Evolution Toward IP Over Optical

3.6 向光上IP（IP over Optical）的演进

As more pressing requirements for survivability and horizontal hierarchy for edge-to-edge signaling are met with technical proposals, it is believed that the benefits of merging (in some manner) the control planes of multiple layers will be outlined. When these benefits are self-evident, it would then seem to be the right time to review whether vertical hierarchy mechanisms are needed, and what the requirements might be. For example, a future requirement might be to provide a better match between the recovery requirements of IP networks with the recovery capability of optical transport. One such proposal is described in [14].

随着边到边信令的生存性和水平层次结构的更迫切要求通过技术提案得到满足，相信将概述（以某种方式）合并多层控制平面的好处。如果这些好处是不言而喻的，那么现在似乎是审查是否需要垂直层次结构机制以及可能的要求的适当时机。例如，未来的需求可能是在IP网络的恢复需求与光传输的恢复能力之间提供更好的匹配。[14]中描述了一个这样的提案。

4. Hierarchy Requirements

4. 层次结构要求

Efforts in the area of network hierarchy should focus on mechanisms that would allow more scalable edge-to-edge signaling, or signaling across networks with existing network hierarchy (such as multi-area OSPF). This appears to be a more urgent need than mechanisms that might be needed to interconnect networks at different layers.

网络层次结构领域的工作应侧重于允许更具可伸缩性的边到边信令，或通过现有网络层次结构（如多区域OSPF）跨网络信令的机制。这似乎比在不同层互连网络所需的机制更为迫切。

4.1 Historical Context

4.1 历史背景

One reason for horizontal hierarchy is functionality (e.g., metro versus backbone). Geographic "islands" or partitions reduce the need for interoperability and make administration and operations less complex. Using a simpler, more interoperable survivability scheme at metro/backbone boundaries is natural for many provider network architectures. In transmission networks, creating geographic islands of different vendor equipment has been done for a long time because multi-vendor interoperability has been difficult to achieve. Traditionally, providers have to coordinate the equipment on either end of a "connection," and making this interoperable reduces complexity. A provider should be able to concatenate survivability mechanisms in order to provide a "protected link" to the next higher level. Think of SDH/SONET rings connecting to TDM DXCs with 1+1 line-layer protection between the ADM and the DXC port. The TDM connection, e.g., a DS3, is protected, but usually all equipment on each SDH/SONET ring is from a single vendor. The DXC cross connections are controlled by the provider and the ports are physically protected, resulting in a highly available design. Thus, concatenation of survivability approaches can be used to cascade across a horizontal hierarchy. While not perfect, it is workable in the near to mid-term until multi-vendor interoperability is achieved.

水平层次结构的一个原因是功能性（例如，城域网与主干网）。地理上的“孤岛”或分区减少了互操作性的需要，并降低了管理和操作的复杂性。对于许多提供商网络架构来说，在城域网/主干网边界使用更简单、更具互操作性的生存性方案是很自然的。在传输网络中，由于难以实现多供应商互操作性，长期以来一直在创建不同供应商设备的地理孤岛。传统上，提供商必须协调“连接”两端的设备，而实现这种互操作可以降低复杂性。提供商应该能够串联生存性机制，以便向更高一级提供“受保护链路”。设想SDH/SONET环连接到TDM DXC，在ADM和DXC端口之间具有1+1线路层保护。TDM连接（如DS3）受到保护，但每个SDH/SONET环上的所有设备通常来自同一供应商。DXC交叉连接由提供商控制，端口受到物理保护，从而实现高可用性设计。因此，可生存性方法的串联可用于跨水平层次进行级联。虽然并不完美，但在实现多供应商互操作性之前，它在近中期是可行的。

While the problems associated with multi-vendor interoperability may necessitate horizontal hierarchy as a practical matter in the near to mid-term (at least this has been the case in TDM networks), there should not be a technical reason for it in the standards developed by the IETF for core networks, or even most access networks. Establishing interoperability of survivability mechanisms between multi-vendor equipment in core IP networks is urgently required to enable adoption of IP as a viable core transport technology and to facilitate the traffic engineering of future multi-service IP networks [3].

虽然与多供应商互操作性相关的问题可能需要在近期至中期内将水平层次结构作为一个实际问题（至少在TDM网络中是这样），但IETF为核心网络，甚至大多数接入网络制定的标准中不应该有这样的技术原因。迫切需要在核心IP网络中的多供应商设备之间建立可生存性机制的互操作性，以便能够采用IP作为可行的核心传输技术，并促进未来多业务IP网络的流量工程[3]。

Some of the largest service provider networks currently run a single area/level IGP. Some service providers, as well as many large enterprise networks, run multi-area Open Shortest Path First (OSPF) to gain increases in scalability. Often, this was from an original design, so it is difficult to say if the network truly required the hierarchy to reach its current size.

一些最大的服务提供商网络目前运行单个区域/级别的IGP。一些服务提供商，以及许多大型企业网络，运行多区域开放最短路径优先（OSPF），以提高可扩展性。通常，这是一个原始设计，因此很难说网络是否真的需要层次结构来达到其当前大小。

Some proposals on improved mechanisms to address network hierarchy have been suggested [15, 16, 17, 18, 19]. This document aims to provide the concrete requirements so that these and other proposals can first aim to meet some limited objectives.

已经提出了一些关于改进机制以解决网络层次结构的建议[15、16、17、18、19]。本文件旨在提供具体要求，以便这些建议和其他建议能够首先达到一些有限的目标。

4.2 Applications for Horizontal Hierarchy

4.2 水平层次结构的应用

A primary driver for intra-domain horizontal hierarchy is signaling capabilities in the context of edge-to-edge VPNs, potentially across traffic-engineered data networks. There are a number of different approaches to layer 2 and layer 3 VPNs and they are currently being addressed by different emerging protocols in the provider-provisioned VPNs (e.g., virtual routers) and Pseudo Wire Edge-to-Edge Emulation (PWE3) efforts based on either MPLS and/or IP tunnels. These may or may not need explicit signaling from edge to edge, but it is a common perception that in order to meet SLAs, some form of edge-to-edge signaling may be required.

域内水平层次结构的主要驱动因素是边缘到边缘VPN环境中的信令能力，可能跨越流量工程数据网络。对于第2层和第3层VPN，有许多不同的方法，目前正通过提供商提供的VPN（例如，虚拟路由器）和基于MPLS和/或IP隧道的伪线边到边仿真（PWE3）工作中的不同新兴协议来解决这些方法。这些可能需要也可能不需要从边缘到边缘的明确信令，但人们普遍认为，为了满足SLA，可能需要某种形式的边缘到边缘信令。

With a large number of edges (N), scalability is concerned with avoiding the O(N^2) properties of edge-to-edge signaling. However, the main issue here is not with the scalability of large amounts of signaling, such as in O(N^2) meshes with a "connection" between every edge-pair. This is because, even if establishing and maintaining connections is feasible in a large network, there might be an impact on core survivability mechanisms which would cause protection/restoration times to grow with N^2, which would be undesirable. While some value of N may be inevitable, approaches to reduce N (e.g., to pull in from the edge to aggregation points) might be of value.

对于大量边缘节点（N），可伸缩性关注的是避免边到边信令的O(N^2)特性。然而，这里的主要问题不是大量信令的可伸缩性，例如在每个边缘对之间都有“连接”的O(N^2)网格中。这是因为，即使在大型网络中建立和维护连接是可行的，也可能会对核心生存性机制产生影响，从而导致保护/恢复时间随N^2增长，这是不可取的。虽然一定规模的N可能不可避免，但减少N的方法（例如从边缘收敛到汇聚点）可能是有价值的。
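
The O(N^2) growth, and the benefit of pulling connections in to aggregation points, can be quantified with a short count. The sketch assumes each edge homes on exactly one aggregation point and counts each bidirectional connection once; the function names are hypothetical.

```python
def full_mesh(n):
    """Edge-to-edge connections needed to link every pair of n edges."""
    return n * (n - 1) // 2


def aggregated(n_edges, n_aggregation_points):
    """Each edge connects only to its aggregation point; aggregation points are fully meshed."""
    return n_edges + full_mesh(n_aggregation_points)


print(full_mesh(1000))       # 499500
print(aggregated(1000, 20))  # 1190
```

If restoration work scales with the number of connections crossing a failed core link, reducing the effective N from 1000 edges to 20 aggregation points shrinks the core mesh from 499,500 connections to 190.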

Thus, most service providers feel that O(N^2) meshes are not necessary for VPNs, and that the number of tunnels to support VPNs would be within the scalability bounds of current protocols and implementations. That may be the case, as there is currently a lack of ability to signal MPLS tunnels from edge to edge across IGP hierarchy, such as OSPF areas. This may require the development of signaling standards that support dynamic establishment and potentially the restoration of LSPs across a 2-level IGP hierarchy.

因此，大多数服务提供商认为VPN不需要O（N^2）网格，并且支持VPN的隧道数量将在当前协议和实现的可伸缩性范围内。可能是这样，因为目前缺乏跨IGP层次结构（如OSPF区域）从边缘到边缘发送MPLS隧道信号的能力。这可能需要制定信令标准，以支持动态建立，并可能在2级IGP层次结构中恢复LSP。

For routing scalability, especially in data applications, a major concern is the amount of processing/state that is required in the variety of network elements. If some nodes might not be able to communicate and process the state of every other node, it might be preferable to limit the information. There is one school of thought that says that the amount of information contained by a horizontal barrier should be significant, and that the impacts this might have on optimality in route selection and on the ability to provide global survivability are accepted tradeoffs.

对于路由可伸缩性，尤其是在数据应用程序中，一个主要问题是各种网络元素中所需的处理/状态量。如果某些节点可能无法通信和处理每个其他节点的状态，则最好限制信息。有一个学派认为，水平屏障所包含的信息量应该是重要的，而这可能对路线选择的最佳性和提供全局生存能力的能力产生的影响是公认的权衡。

4.3 Horizontal Hierarchy Requirements

4.3 水平层次要求

Mechanisms are required to allow for edge-to-edge signaling of connections through a network. One network scenario includes medium to large networks that currently have hierarchical interior routing such as multi-area OSPF or multi-level Intermediate System to Intermediate System (IS-IS). The primary context of this is edge-to-edge signaling, which is thought to be required to assure the SLAs for the layer 2 and layer 3 VPNs that are being carried across the network. Another possible context would be edge-to-edge signaling in TDM SDH/SONET networks with IP control, where metro and core networks again might be in a hierarchical interior routing domain.

需要机制来允许通过网络连接的边到边信令。一种网络场景包括当前具有分层内部路由的中大型网络，例如多区域OSPF或多级中间系统到中间系统（IS-IS）。这方面的主要内容是边到边信令，这被认为是确保通过网络传输的第2层和第3层VPN的SLA所必需的。另一种可能的情况是具有IP控制的TDM SDH/SONET网络中的边到边信令，其中城域网和核心网也可能位于分层内部路由域中。

To support edge-to-edge signaling in the above network scenarios within the framework of existing horizontal hierarchies, current traffic engineering (TE) methods [20, 6] may need to be extended. Requirements for multi-area TE need to be developed to provide guidance for any necessary protocol extensions.

为了在现有水平层次结构的框架内支持上述网络场景中的边到边信令，可能需要扩展当前的流量工程（TE）方法[20,6]。需要制定多区域TE的要求，以便为任何必要的协议扩展提供指导。

5. Survivability and Hierarchy

5. 生存性与层次结构

When horizontal hierarchy exists in a network technology layer, a question arises as to how survivability can be provided along a connection that crosses hierarchical boundaries.

当网络技术层中存在水平层次结构时，会出现一个问题，即如何沿着跨越层次结构边界的连接提供生存能力。

In designing protocols to meet the requirements of hierarchy, an approach to consider is that boundaries are either clean, or are of minimal value. However, the concept of network elements that participate on both sides of a boundary might be a consideration (e.g., OSPF ABRs). That would allow for devices on either side to take an intra-area approach within their region of knowledge, and for the ABR to do this in both areas, and splice the two protected connections together at a common point (granted it is a common point of failure now). If the limitations of this approach start to appear in operational settings, then perhaps it would be time to start thinking about route-servers and signaling propagated directives. However, one initial approach might be to signal through a common border router, and to consider the service as protected as it consists of a concatenated set of connections which are each protected within their area. Another approach might be to have a least common denominator mechanism at the boundary, e.g., 1+1 port protection. There should also be some standardized means for a survivability scheme on one side of such a boundary to communicate with the scheme on the other side regarding the success or failure of the recovery action. For example, if a part of a "connection" is down on one side of such a boundary, there is no need for the other side to recover from failures.

在设计满足层次结构要求的协议时，一种可考虑的思路是：边界要么是清晰的，要么价值极小。然而，也可以考虑在边界两侧都参与的网络元素这一概念（例如OSPF ABR）。这将允许任何一侧的设备在其知识范围内采取区域内方法，由ABR在两个区域内都这样做，并在一个公共点将两个受保护连接拼接在一起（诚然，该点现在是一个公共故障点）。如果这种方法的局限性开始在运营环境中显现，那么也许是时候开始考虑路由服务器和信令传播的指令了。然而，一种初步的方法可能是通过公共边界路由器发出信令，并认为服务是受保护的，因为它由一组串接的连接组成，每个连接都在其所在区域内受到保护。另一种方法可能是在边界处采用最小公分母机制，例如1+1端口保护。对于这种边界一侧的生存性方案，还应该有一些标准化的方法，以便与另一侧的方案就恢复行动的成功或失败进行通信。例如，如果“连接”的一部分在该边界的一侧中断，则另一侧无需从故障中恢复。
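
Concatenated protection across an area boundary, with the cross-boundary success/failure notification suggested above, might be modeled as below. No encoding for that notification is defined in this document, so the sketch only shows the decision logic; `Segment` and `recovery_actions` are hypothetical, and the border router where segments are spliced (itself a common point of failure) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    area: str        # e.g. an OSPF area; segments are spliced at border routers
    protected: bool  # segment has its own intra-area protection
    failed: bool = False


def recovery_actions(segments):
    """Decide per-area actions for a connection built from concatenated segments."""
    down = {s.area for s in segments if s.failed and not s.protected}
    actions = {}
    for s in segments:
        if s.area in down:
            actions[s.area] = "down; notify other areas"
        elif s.failed and down:
            actions[s.area] = "skip recovery"  # connection is down elsewhere anyway
        elif s.failed:
            actions[s.area] = "recover locally"
        else:
            actions[s.area] = "no action"
    return actions


conn = [Segment("area1", protected=True, failed=True),
        Segment("area0", protected=True),
        Segment("area2", protected=False, failed=True)]
print(recovery_actions(conn))
```

Here `area1` could recover its own segment, but because `area2` reports that the connection is down on its side, `area1` skips a recovery that would not restore end-to-end service.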

In summary, at this time, approaches as described above that allow concatenation of survivability schemes across hierarchical boundaries seem sufficient.

总之，目前，如上所述的允许跨层次边界连接生存性方案的方法似乎足够了。

6. Security Considerations

6. 安全考虑

The set of SRGs that are defined for a network under a common administrative control and the corresponding assignment of these SRGs to nodes and links within the administrative control is sensitive information and needs to be protected. An SRG is an acknowledgement that nodes and links that belong to an SRG are susceptible to a common threat. An adversary with access to information contained in an SRG could use that information to design an attack and to determine the scope of damage caused by the attack, and could therefore maximize the effect of the attack.

为公共管理控制下的网络定义的SRG集以及这些SRG对管理控制内的节点和链路的相应分配是敏感信息，需要加以保护。SRG是对属于SRG的节点和链路易受共同威胁的确认。能够访问SRG中包含的信息的对手可以使用该信息设计攻击，确定攻击造成的损害范围，从而最大限度地发挥攻击的效果。

The label used to refer to a particular SRG must allow for an encoding such that sensitive information such as physical location, function, purpose, customer, fault type, etc. is not readily discernible by unauthorized users.

用于指代特定SRG的标签必须允许编码，以便未经授权的用户无法轻易识别敏感信息，如物理位置、功能、用途、客户、故障类型等。
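
One way to satisfy the labeling requirement is to derive each SRG label from a keyed hash of its sensitive description, so that the label reveals nothing without the operator's key. This is an assumption for illustration, not a mechanism this document specifies; the 32-bit width, the key, the description, and the function name are all hypothetical.

```python
import hashlib
import hmac


def opaque_srg_id(secret: bytes, description: str) -> int:
    """Derive an opaque 32-bit SRG label from a sensitive description."""
    digest = hmac.new(secret, description.encode("utf-8"), hashlib.sha256).digest()
    # Truncating to 32 bits can collide; an operator would check new labels
    # against its private description-to-label table before use.
    return int.from_bytes(digest[:4], "big")


key = b"operator-private-key"  # hypothetical; kept in the management system only
description = "conduit under Main St, shared by customers X and Y"  # hypothetical
label = opaque_srg_id(key, description)
print(f"{label:#010x}")
```

Opaque labels hide what an SRG is, not which nodes and links share it; the membership information itself still needs confidentiality protection in transit.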

SRG information that is propagated through the control and management plane should allow for an encryption mechanism. An example of an approach would be to use IPSEC [21] on all packets carrying SRG information.

通过控制和管理平面传播的SRG信息应允许加密机制。一种方法的示例是在所有携带SRG信息的数据包上使用IPSEC[21]。

7. References

7. 参考文献

[1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[1] Bradner，S.，“互联网标准过程——第3版”，BCP 9，RFC 2026，1996年10月。

[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[2] Bradner，S.，“RFC中用于表示需求水平的关键词”，BCP 14，RFC 2119，1997年3月。

[3] K. Owens, V. Sharma, and M. Oommen, "Network Survivability Considerations for Traffic Engineered IP Networks", Work in Progress.

[3] K.Owens、V.Sharma和M.Oommen，“流量工程IP网络的网络生存性考虑”，正在进行中。

[4] V. Sharma, B. Crane, S. Makam, K. Owens, C. Huang, F. Hellstrand, J. Weil, L. Andersson, B. Jamoussi, B. Cain, S. Civanlar, and A. Chiu, "Framework for MPLS-based Recovery", Work in Progress.

[4] V.Sharma、B.Crane、S.Makam、K.Owens、C.Huang、F.Hellstrand、J.Weil、L.Andersson、B.Jamoussi、B.Cain、S.Civanlar和A.Chiu，“基于MPLS的恢复框架”，正在进行的工作。

[5] M. Thorup, "Fortifying OSPF/ISIS Against Link Failure", http://www.research.att.com/~mthorup/PAPERS/lf_ospf.ps

[6] Awduche, D., Chiu, A., Elwalid, A., Widjaja, I. and X. Xiao, "Overview and Principles of Internet Traffic Engineering", RFC 3272, May 2002.

[7] S. Dharanikota, R. Jain, D. Papadimitriou, R. Hartani, G. Bernstein, V. Sharma, C. Brownmiller, Y. Xue, and J. Strand, "Inter-domain routing with Shared Risk Groups", Work in Progress.

[8] N. Harrison, P. Willis, S. Davari, E. Cuevas, B. Mack-Crane, E. Franze, H. Ohta, T. So, S. Goldfless, and F. Chen, "Requirements for OAM in MPLS Networks", Work in Progress.

[9] D. Allan and M. Azad, "A Framework for MPLS User Plane OAM", Work in Progress.

[10] S. Kini, M. Kodialam, T.V. Lakshman, S. Sengupta, and C. Villamizar, "Shared Backup Label Switched Path Restoration", Work in Progress.

[11] G. Li, C. Kalmanek, J. Yates, G. Bernstein, F. Liaw, and V. Sharma, "RSVP-TE Extensions For Shared-Mesh Restoration in Transport Networks", Work in Progress.

[12] P. Pan (Editor), D.H. Gan, G. Swallow, J. Vasseur, D. Cooper, A. Atlas, and M. Jork, "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", Work in Progress.

[13] A. Atlas, C. Villamizar, and C. Litvanyi, "MPLS RSVP-TE Interoperability for Local Protection/Fast Reroute", Work in Progress.

[14] A. Chiu and J. Strand, "Joint IP/Optical Layer Restoration after a Router Failure", Proc. OFC'2001, Anaheim, CA, March 2001.

[15] K. Kompella and Y. Rekhter, "Multi-area MPLS Traffic Engineering", Work in Progress.

[16] G. Ash, et al., "Requirements for Multi-Area TE", Work in Progress.

[17] A. Iwata, N. Fujita, G.R. Ash, and A. Farrel, "Crankback Routing Extensions for MPLS Signaling", Work in Progress.

[18] C-Y Lee, A. Celer, N. Gammage, S. Ghanti, G. Ash, "Distributed Route Exchangers", Work in Progress.

[19] C-Y Lee and S. Ghanti, "Path Request and Path Reply Message", Work in Progress.

[20] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J. McManus, "Requirements for Traffic Engineering Over MPLS", RFC 2702, September 1999.

[21] Kent, S. and R. Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

8. Acknowledgments

Much of the direction taken in this document, and by the team in its initial effort, was steered by the insightful questions provided by Bala Rajagoplan, Greg Bernstein, Yangguang Xu, and Avri Doria. The set of questions is attached as Appendix A of this document.

After the release of the first draft, a number of comments were received. Thanks go to Jerry Ash, Sudheer Dharanikota, Chuck Kalmanek, Dan Koller, Lyndon Ong, Steve Plote, and Yong Xue for their input.

9. Contributing Authors

Jim Boyle (PDNets), Rob Coltun (Movaz), Tim Griffin (AT&T), Ed Kern, Tom Reddington (Lucent) and Malin Carlzon.

Appendix A: Questions used to help develop requirements

A. Definitions

1. In determining the specific requirements, the design team should precisely define the concepts "survivability", "restoration", "protection", "protection switching", "recovery", "re-routing", etc., and their relations. This would enable the requirements document to describe precisely which of these will be addressed. In the following, the term "restoration" is used to indicate the broad set of policies and mechanisms used to ensure survivability.

B. Network types and protection modes

1. What is the scope of the requirements with regard to the types of networks covered? Specifically, are the following in scope:

- Restoration of connections in mesh optical networks (opaque or transparent)
- Restoration of connections in hybrid mesh-ring networks
- Restoration of LSPs in MPLS networks (composed of LSRs overlaid on a transport network, e.g., optical)
- Any other types of networks?

Is commonality of approach, or optimization of approach, more important?

2. What are the requirements with regard to the protection modes to be supported in each network type covered? (Examples of protection modes include 1+1, M:N, shared mesh, UPSR, BLSR, newly defined modes such as P-cycles, etc.)

3. What are the requirements on local span (i.e., link-by-link) protection and end-to-end protection, and on the interaction between them? For example, what should the granularity of connections be for each type (single connection, bundle of connections, etc.)?
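
Question 2 lists M:N among the protection modes. The sketch below illustrates what sharing means there by modeling one common reading of M:N, in which M protection channels are shared by N working channels. Some texts write the two numbers in the opposite order. A failed working channel is switched to any free protection channel, and once all M are in use, further failures go unprotected. The class `MNProtectionGroup` and its methods are hypothetical. The model ignores signaling, priorities, preemption, and revertive timing.

```python
from dataclasses import dataclass, field


@dataclass
class MNProtectionGroup:
    """Hypothetical M:N protection group: m protection channels shared by n working channels."""
    m: int
    n: int
    # Maps a failed working-channel index to the protection-channel index now carrying it.
    assignments: dict = field(default_factory=dict)

    def fail(self, working: int) -> bool:
        """Switch a failed working channel onto a free protection channel.

        Returns True if the failure is protected, False if all m protection
        channels are already in use.
        """
        if not 0 <= working < self.n:
            raise ValueError(f"unknown working channel {working}")
        if working in self.assignments:
            return True  # already switched to protection
        in_use = set(self.assignments.values())
        for p in range(self.m):
            if p not in in_use:
                self.assignments[working] = p
                return True
        return False  # no free protection channel: this failure is unprotected

    def repair(self, working: int) -> None:
        """Revert a repaired working channel, releasing its protection channel."""
        self.assignments.pop(working, None)


if __name__ == "__main__":
    group = MNProtectionGroup(m=1, n=3)
    print(group.fail(0), group.fail(2))  # second concurrent failure finds no free channel
    group.repair(0)
    print(group.fail(2))
```

With `m=1, n=3`, the first failure is protected, but a second concurrent failure is not until the first working channel is repaired. This trade-off, less protection capacity for less coverage, is what distinguishes shared modes such as M:N from dedicated 1+1 protection.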

C. Hierarchy

1. Vertical (between two network layers): What are the requirements for the interaction between restoration procedures across two network layers, when these features are offered in both layers? (For example, an MPLS network realized over point-to-point optical connections.) In such a case:

(a) Are there any criteria to choose which layer should provide protection?

(b) If both layers provide survivability features, what are the requirements to coordinate these mechanisms?

(d) Would the benefits be worth the additional complexity associated with routing isolation (e.g., VPNs, areas), security, address isolation, and policy/authentication processes?

2. Horizontal (between two areas or administrative subdivisions within the same network layer):

(a) What are the criteria that trigger the creation of protocol or administrative boundaries pertaining to restoration (e.g., scalability? multi-vendor interoperability? multi-provider operation?), and what are the practical issues? Should multi-vendor operation necessitate hierarchical separation?

When such boundaries are defined:

(b) What are the requirements on how protection/restoration is performed end-to-end across such boundaries?

(c) If different restoration mechanisms are implemented on two sides of a boundary, what are the requirements on their interaction?

What is the primary driver of horizontal hierarchy? (select one)
- functionality (e.g., metro vs. backbone)
- routing scalability
- signaling scalability
- current network architecture, i.e., trying to layer TE on top of an already hierarchical network architecture
- routing and signaling

For signaling scalability, is it:
- manageability
- processing/state of the network
- an edge-to-edge N^2 type issue

For routing scalability, is it:
- processing/state of the network
- are you flat and want to go hierarchical, or already hierarchical?
- data or TDM application?
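
A short calculation makes the "edge-to-edge N^2 type issue" in the signaling scalability question concrete. Assume one unidirectional LSP per ordered pair of edge nodes in a full mesh. Each of N edge nodes then originates N - 1 LSPs, for N(N - 1) in total. The helper names below are illustrative only.

```python
def full_mesh_lsps(edge_nodes: int) -> int:
    """Unidirectional LSPs needed for a full mesh among edge_nodes edge nodes."""
    # Each of the N edge nodes signals one LSP to each of the other N - 1 nodes.
    return edge_nodes * (edge_nodes - 1)


def lsp_growth(sizes):
    """Return (N, LSP count) pairs showing the roughly quadratic growth."""
    return [(n, full_mesh_lsps(n)) for n in sizes]


if __name__ == "__main__":
    for n, count in lsp_growth([10, 100, 1000]):
        print(f"{n:>5} edge nodes -> {count:>7} LSPs")
```

For 10, 100, and 1000 edge nodes, this gives 90, 9,900, and 999,000 LSPs respectively. This illustrates why the questions above tie signaling scalability to horizontal hierarchy.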

D. Policy

1. What are the requirements for policy support during protection/restoration (e.g., restoration priority, preemption, etc.)?

E. Signaling Mechanisms

1. What are the requirements on the signaling transport mechanism (e.g., in-band over SDH/SONET overhead bytes, out-of-band over an IP network, etc.) used to communicate restoration protocol messages between network elements? What are the bandwidth and other requirements on the signaling channels?

2. What are the requirements on fault detection/localization mechanisms (which are the prelude to performing restoration procedures) in the case of opaque and transparent optical networks? What are the requirements in the case of MPLS restoration?

3. What are the requirements on signaling protocols to be used in restoration procedures (e.g., high priority processing, security, etc)?

4. Are there any requirements on the operation of restoration protocols?

F. Quantitative

1. What are the quantitative requirements (e.g., latency) for completing restoration under different protection modes (for both local and end-to-end protection)?

G. Management

1. What information should be measured/maintained by the control plane at each network element pertaining to restoration events?

2. What are the requirements for the correlation between control plane and data plane failures from the restoration point of view?

Editors' Addresses

   Wai Sum Lai
   AT&T
   200 Laurel Avenue
   Middletown, NJ 07748, USA

   Phone: +1 732-420-3712
   EMail: wlai@att.com

   Dave McDysan
   WorldCom
   22001 Loudoun County Pkwy
   Ashburn, VA 20147, USA

   EMail: dave.mcdysan@wcom.com

Full Copyright Statement

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.
