Internet Engineering Task Force (IETF)                            Y. Nir
Request for Comments: 6027                                   Check Point
Category: Informational                                     October 2010
ISSN: 2070-1721
        

IPsec Cluster Problem Statement

Abstract

This document defines the terminology, problem statement, and requirements for implementing Internet Key Exchange (IKE) and IPsec on clusters. It also describes gaps in existing standards and their implementation that need to be filled in order to allow peers to interoperate with clusters from different vendors. Agreed-upon terminology, problem statement, and requirements will allow IETF working groups to consider development of IPsec/IKEv2 mechanisms to simplify cluster implementations.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6027.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions Used in This Document
   2. Terminology
   3. The Problem Statement
      3.1. Scope
      3.2. A Lot of Long-Lived State
      3.3. IKE Counters
      3.4. Outbound SA Counters
      3.5. Inbound SA Counters
      3.6. Missing Synch Messages
      3.7. Simultaneous Use of IKE and IPsec SAs by Different Members
           3.7.1. Outbound SAs Using Counter Modes
      3.8. Different IP Addresses for IKE and IPsec
      3.9. Allocation of SPIs
   4. Security Considerations
   5. Acknowledgements
   6. References
      6.1. Normative References
      6.2. Informative References
        
1. Introduction

IKEv2, as described in [RFC5996], and IPsec, as described in [RFC4301] and others, allow deployment of VPNs between different sites as well as from VPN clients to protected networks.

As VPNs become increasingly important to the organizations deploying them, there is a demand to make IPsec solutions more scalable and less prone to down time, by using more than one physical gateway to either share the load or back each other up, forming a "cluster" (see Section 2). Similar demands have been made in the past for other critical pieces of an organization's infrastructure, such as DHCP and DNS servers, Web servers, databases, and others.

IKE and IPsec are, in particular, less friendly to clustering than these other protocols, because they store more state, and that state is more volatile. Section 2 defines terminology for use in this document and in the envisioned solution documents.

In general, deploying IKE and IPsec in a cluster requires such a large amount of information to be synchronized among the members of the cluster that it becomes impractical. Alternatively, if less information is synchronized, failover would mean a prolonged and intensive recovery phase, which negates the scalability and availability promises of using clusters. In Section 3, we will describe this in more detail.

1.1. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Terminology

"Single Gateway" is an implementation of IKE and IPsec enforcing a certain policy, as described in [RFC4301].

"Cluster" is a set of two or more gateways, implementing the same security policy, and protecting the same domain. Clusters exist to provide both high availability through redundancy and scalability through load sharing.

"Member" is one gateway in a cluster.

"Availability" is a measure of a system's ability to perform the service for which it was designed. It is measured as the percentage of time a service is available from the time it is supposed to be available. Colloquially, availability is sometimes expressed in "nines" rather than percentage, with 3 "nines" meaning 99.9% availability, 4 "nines" meaning 99.99% availability, etc.

"High Availability" is a property of a system, not a configuration type. A system is said to have high availability if its expected down time is low. High availability can be achieved in various ways, one of which is clustering. All the clusters described in this document achieve high availability. What "high" means depends on the application, but usually is 4 to 6 "nines" (at most 0.5-50 minutes of down time per year in a system that is supposed to be available all the time).
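
The relation between "nines" and yearly down time can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the function name and the choice of a 365-day year are assumptions made for this example and are not part of any specification.

```python
# Illustrative arithmetic only: converts "nines" of availability into the
# maximum down time per year. A 365-day year is assumed.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly down time budget for N "nines" of availability."""
    unavailability = 10.0 ** -nines  # 3 nines -> 0.001, i.e. 99.9% available
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under these assumptions, 4 "nines" allows roughly 52.6 minutes per year and 6 "nines" roughly half a minute, which matches the approximate 0.5-50 minute range quoted above.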

"Fault Tolerance" is a property related to high availability, where a system maintains service availability, even when a specified set of fault conditions occur. In clusters, we expect the system to maintain service availability, when one or more of the cluster members fails.

"Completely Transparent Cluster" is a cluster where the occurrence of a fault is never visible to the peers.

"Partially Transparent Cluster" is a cluster where the occurrence of a fault may be visible to the peers.

"Hot Standby Cluster", or "HS Cluster" is a cluster where only one of the members is active at any one time. This member is also referred to as the "active" member, whereas the other(s) are referred to as "standbys". The Virtual Router Redundancy Protocol (VRRP) ([RFC5798]) is one method of building such a cluster.

"Load Sharing Cluster", or "LS Cluster" is a cluster where more than one of the members may be active at the same time. The term "load balancing" is also common, but it implies that the load is actually balanced between the members, and this is not a requirement.

"Failover" is the event where one member takes over some load from some other member. In a hot standby cluster, this happens when a standby member becomes active due to a failure of the former active member, or because of an administrator command. In a load sharing cluster, this usually happens because of a failure of one of the members, but certain load-balancing technologies may allow a particular load (such as all the flows associated with a particular child Security Association (SA)) to move from one member to another to even out the load, even without any failures.

"Tight Cluster" is a cluster where all the members share an IP address. This could be accomplished using configured interfaces with specialized protocols or hardware, such as VRRP, or through the use of multicast addresses, but in any case, peers need only be configured with one IP address in the Peer Authentication Database.

"Loose Cluster" is a cluster where each member has a different IP address. Peers find the correct member using some method such as DNS queries or the IKEv2 redirect mechanism ([RFC5685]). In some cases, a member's IP address(es) may be allocated to another member at failover.

"Synch Channel" is a communications channel among the cluster members, which is used to transfer state information. The synch channel may or may not be IP based, may or may not be encrypted, and may work over short or long distances. The security and physical characteristics of this channel are out of scope for this document, but it is a requirement that its use be minimized for scalability.

3. The Problem Statement

This section starts by scoping the problem, and goes on to list each of the issues encountered while setting up a cluster of IPsec VPN gateways.

3.1. Scope
3.1. 范围

This document will make no attempt to describe the problems in setting up a generic cluster. It describes only problems related to the IKE/IPsec protocols.

The problem of synchronizing the policy between cluster members is out of scope, as this is an administrative issue that is not particular to either clusters or to IPsec.

The interesting scenario here is VPN, whether inter-domain or remote access. Host-to-host transport mode is not expected to benefit from this work.

We do not describe in full the problems of the communication channel between cluster members (the Synch Channel), nor do we intend to specify anything in this space later. Specifically, mixed-vendor clusters are out of scope.

The problem statement anticipates possible protocol-level solutions between IKE/IPsec peers in order to improve the availability and/or performance of VPN clusters. One vendor's IPsec endpoint should be able to work, optimally, with another vendor's cluster.

3.2. A Lot of Long-Lived State

IKE and IPsec have a lot of long-lived state:

o IKE SAs last for minutes, hours, or days, and carry keys and other information. Some gateways may carry thousands to hundreds of thousands of IKE SAs.

o IPsec SAs last for minutes or hours, and carry keys, selectors, and other information. Some gateways may carry hundreds of thousands of such IPsec SAs.

o SPD (Security Policy Database) cache entries. While the SPD is unchanging, the SPD cache changes on the fly due to narrowing. Entries last at least as long as the SAD (Security Association Database) entries, but tend to last even longer than that.

A naive implementation of a cluster would have no synchronized state, and a failover would produce an effect similar to that of a rebooted gateway. [RFC5723] describes how new IKE and IPsec SAs can be recreated in such a case.

3.3. IKE Counters

We can overcome the first problem described in Section 3.2, by synchronizing states -- whenever an SA is created, we can synch this new state to all other members. However, those states are not only long lived, they are also ever changing.

IKE has message counters. A peer MUST NOT process message n until after it has processed message n-1. Skipping message IDs is not allowed. So a newly active member needs to know the last message IDs both received and transmitted.

One possible solution is to synchronize information about the IKE message counters after every IKE exchange. This way, the newly active member knows what messages it is allowed to process, and what message IDs to use on IKE requests, so that peers process them. This solution may be appropriate in some cases, but may be too onerous in systems with a lot of SAs. It also has the drawback that it never recovers from the missing synch message problem, which is described in Section 3.6.
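
As a rough sketch of what would have to be synchronized after every IKE exchange, the per-SA counter state can be modeled as two numbers. The class and method names below are hypothetical, and the model is deliberately simplified: it assumes a window size of one and ignores retransmissions and responses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IkeCounters:
    """Hypothetical per-IKE-SA message counter state (window size of 1)."""

    next_send_id: int = 0      # Message ID for the next request we initiate
    next_expected_id: int = 0  # Message ID of the next request we may process

    def accept_request(self, msg_id: int) -> bool:
        """Process request n only if request n-1 was processed before it."""
        if msg_id != self.next_expected_id:
            return False  # skipped or repeated Message ID: do not process
        self.next_expected_id += 1
        return True

    def allocate_request_id(self) -> int:
        """Return the Message ID to use for our next outgoing request."""
        msg_id = self.next_send_id
        self.next_send_id += 1
        return msg_id

    def snapshot(self) -> Tuple[int, int]:
        """State that would be sent on the synch channel after an exchange."""
        return (self.next_send_id, self.next_expected_id)
```

A standby member that receives the latest snapshot can be constructed with `IkeCounters(*snapshot)` and continue from the same Message IDs. If the last snapshot was lost, it resumes from stale values, which is the situation discussed in Section 3.6.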

3.4. Outbound SA Counters

The Encapsulating Security Payload (ESP) and Authentication Header (AH) have an optional anti-replay feature, where every protected packet carries a counter number. Repeating counter numbers is considered an attack, so the newly active member MUST NOT use a replay counter number that has already been used. The peer will drop those packets as duplicates and/or warn of an attack.

Though it may be feasible to synchronize the IKE message counters, it is almost never feasible to synchronize the IPsec packet counters for every IPsec packet transmitted. So we have to assume that at least for IPsec, the replay counter will not be up to date on the newly active member, and the newly active member may repeat a counter.

A possible solution is to synch replay counter information, not for each packet emitted, but only at regular intervals, say, every 10,000 packets or every 0.5 seconds. After a failover, the newly active member advances the counters for outbound IPsec SAs by 10,000 packets. To the peer, this looks like up to 10,000 packets were lost, but this should be acceptable, as neither ESP nor AH guarantees reliable delivery.
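
The periodic synch and the jump after failover can be sketched as follows. This is a minimal model under stated assumptions: the names are hypothetical, only the packet-count trigger is modeled (not the 0.5-second timer), and every due synch is assumed to reach the standby.

```python
SYNC_INTERVAL = 10_000  # packets between synchs; the value is illustrative


class OutboundSa:
    """Hypothetical outbound SA state holding only the replay counter."""

    def __init__(self, seq: int = 0):
        self.seq = seq          # last counter number placed in a packet
        self.last_synced = seq  # counter value last sent on the synch channel

    def next_seq(self) -> int:
        """Return the counter number for the next outgoing packet."""
        self.seq += 1
        return self.seq

    def sync_due(self) -> bool:
        return self.seq - self.last_synced >= SYNC_INTERVAL

    def mark_synced(self) -> int:
        """Record that the current counter was sent to the other members."""
        self.last_synced = self.seq
        return self.seq


def resume_after_failover(last_synced_seq: int) -> OutboundSa:
    """Skip a full interval so that no counter number can be repeated."""
    return OutboundSa(last_synced_seq + SYNC_INTERVAL)
```

If the failed member had synched at 20,000 and then sent up to packet 25,000, the newly active member would start at 30,001. An implementation that cannot rule out a lost or delayed synch message might want a larger margin than one interval.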

3.5. Inbound SA Counters

An even tougher issue is the synchronization of packet counters for inbound IPsec SAs. If a packet arrives at a newly active member, there is no way to determine whether or not this packet is a replay. The periodic synch does not solve this problem at all. Suppose we synchronize every 10,000 packets, and the last synch before the failover had the counter at 170,000. It is probable, though not certain, that packet number 180,000 has not yet been processed, but if packet 175,000 arrives at the newly active member, it has no way of determining whether or not that packet has already been processed. The synchronization does prevent the processing of really old packets, such as those with counter number 165,000. Ignoring all counters below 180,000 won't work either, because that's up to 10,000 dropped packets, which may be very noticeable.
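
The three cases in the example above can be written out as a small classifier. This is only an illustration of the reasoning; the function name and the labels are made up for this sketch, and the anti-replay window just below the last synched value is ignored for simplicity.

```python
SYNC_INTERVAL = 10_000  # packets between synchs, as in the example above


def classify_after_failover(seq: int, last_synced: int) -> str:
    """Classify an inbound counter number seen by a newly active member.

    last_synced is the inbound counter value from the last synch message
    received before the failover (170,000 in the example).
    """
    if seq <= last_synced:
        return "stale"         # really old packet: safe to reject
    if seq < last_synced + SYNC_INTERVAL:
        return "unknown"       # may or may not have been processed already
    return "probably-new"      # probably, though not certainly, unprocessed


if __name__ == "__main__":
    for seq in (165_000, 175_000, 180_000):
        print(seq, classify_after_failover(seq, 170_000))
```

The "unknown" band is the heart of the problem: rejecting it drops up to 10,000 legitimate packets, while accepting it leaves a window in which a replay cannot be detected.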

The easiest solution is to learn the replay counter from the incoming traffic. This is allowed by the standards, because replay counter verification is an optional feature (see Section 3.2 in [RFC4301]). The case can even be made that it is relatively secure, because non-attack traffic will reset the counters to what they should be, so an attacker faces the dual challenge of a very narrow window for attack, and the need to time the attack to a failover event. Unless the attacker can actually cause the failover, this would be very difficult. It should be noted, though, that although this solution is acceptable as far as RFC 4301 goes, it is a matter of policy whether this is acceptable.

Another possible solution to the inbound IPsec SA problem is to rekey all child SAs following a failover. This may or may not be feasible depending on the implementation and the configuration.

3.6. Missing Synch Messages

The synch channel is unlikely to be infallible. Before failover is detected, some synchronization messages may have been missed. For example, the active member may have created a new child SA using message n. The new information (entry in the SAD and update to counters of the IKE SA) is sent on the synch channel. Still, whatever technology is used, the update may be missed before the failover.

This is a bad situation, because the IKE SA is doomed. The newly active member has two problems:

o It does not have the new IPsec SA pair. It will drop all incoming packets protected with such an SA. This could be fixed by sending some DELETEs and INVALID_SPI notifications, if it weren't for the other problem.

o The counters for the IKE SA show that only request n-1 has been sent. The next request will get the message ID n, but that will be rejected by the peer. After a sufficient number of retransmissions and rejections, the whole IKE SA with all associated IPsec SAs will get dropped.

The above scenario may be rare enough that it is acceptable that on a configuration with thousands of IKE SAs, a few will need to be recreated from scratch or using session resumption techniques. However, detecting this may take a long time (several minutes) and this negates the goal of creating a cluster in the first place.

3.7. Simultaneous Use of IKE and IPsec SAs by Different Members

For load sharing clusters, all active members may need to use the same SAs, both IKE and IPsec. This is an even greater problem than in the case of hot standby clusters, because consecutive packets may need to be sent by different members to the same peer gateway.

The solution to the IKE SA issue is up to the implementation. It's possible to create some locking mechanism over the synch channel, or else have one member "own" the IKE SA and manage the child SAs for all other members. For IPsec, solutions fall into two broad categories.

The first is the "sticky" category, where all communications with a single peer, or all communications involving a certain SPD cache entry, go through a single member. In this case, all packets that match any particular SA go through the same member, so no synchronization of the replay counter needs to be done. Inbound processing is a "sticky" issue (no pun intended), because the packets have to be processed by the correct member based on peer and the Security Parameter Index (SPI), and most load balancers will not be able to match the SPIs to the correct member, unless stickiness extends to all traffic with a particular peer. Another disadvantage of sticky solutions is that the load tends to not distribute evenly, especially if one SA covers a significant portion of IPsec traffic.
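
One conceivable way to obtain per-peer stickiness is to derive the member from a hash of the peer's IP address. The sketch below is an assumption made for illustration, not a mechanism described in this document; the function name and member labels are hypothetical.

```python
import hashlib
import ipaddress
from typing import List


def member_for_peer(peer_ip: str, members: List[str]) -> str:
    """Pick the member that handles all traffic for a given peer.

    The choice depends only on the peer address, so every packet from the
    same peer lands on the same member for a fixed member list.
    """
    packed = ipaddress.ip_address(peer_ip).packed  # 4 or 16 raw bytes
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(members)
    return members[index]


if __name__ == "__main__":
    cluster = ["member-a", "member-b", "member-c"]
    print(member_for_peer("192.0.2.10", cluster))
```

This illustrates both properties noted above: the mapping needs no per-SA state in the load balancer, but it cannot even out the load when a single peer accounts for most of the traffic. A plain modulo also remaps many peers whenever the member list changes.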

The second is the "duplicate" category, where the child SA is duplicated for each pair of IPsec SAs for each active member. Different packets for the same peer go through different members, and get protected using different SAs with the same selectors and matching the same entries in the SPD cache. This has some shortcomings:

o It requires multiple parallel SAs, for which the peer has no use. Section 2.8 of [RFC5996] specifically allows this, but some implementations might have a policy against long-term maintenance of redundant SAs.

o Different packets that belong to the same flow may be protected by different SAs, which may seem "weird" to the peer gateway, especially if it is integrated with some deep-inspection middleware such as a firewall. It is not known whether this will cause problems with current gateways. It is also impossible to mandate against this, because the definition of "flow" varies from one implementation to another.

o Reply packets may arrive with an IPsec SA that is not "matched" to the one used for the outgoing packets. Also, they might arrive at a different member. This problem is beyond the scope of this document and should be solved by the application, perhaps by forwarding misdirected packets to the correct gateway for deep inspection.

3.7.1. Outbound SAs Using Counter Modes

For SAs involving counter mode ciphers such as Counter Mode (CTR) ([RFC3686]) or Galois/Counter Mode (GCM) ([RFC4106]) there is yet another complication. The initial vector for such modes MUST NOT be repeated, and senders use methods such as counters or linear feedback shift registers (LFSRs) to ensure this. For an SA shared between more than one active member, or even failing over from one member to another, the cluster members need to make sure that they do not generate the same initial vector. See [COUNTER_MODES] for a discussion of this problem in another context.
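
One conceivable way for members to avoid generating the same initial vector is to reserve part of the 8-octet IV for a member identifier. The sketch below is an assumption made for illustration and is not taken from this document; the 8-bit split and all names are hypothetical.

```python
MEMBER_BITS = 8                  # hypothetical split: up to 256 members
COUNTER_BITS = 64 - MEMBER_BITS  # remaining bits form a per-member counter


class IvGenerator:
    """Generates 8-octet IVs that are unique across cluster members."""

    def __init__(self, member_id: int):
        if not 0 <= member_id < (1 << MEMBER_BITS):
            raise ValueError("member_id does not fit in MEMBER_BITS")
        self.member_id = member_id
        self.counter = 0

    def next_iv(self) -> bytes:
        if self.counter >= (1 << COUNTER_BITS):
            raise OverflowError("IV space exhausted; the SA must be rekeyed")
        # High bits identify the member, low bits count packets it has sent.
        value = (self.member_id << COUNTER_BITS) | self.counter
        self.counter += 1
        return value.to_bytes(8, "big")
```

With such a split, two members sharing an SA, or a member taking over after a failover, never emit the same IV as long as each member identifier is used by only one member per SA and a restarted member does not reset its counter.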

3.8. Different IP Addresses for IKE and IPsec

In many implementations there are separate IP addresses for the cluster, and for each member. While the packets protected by tunnel mode child SAs are encapsulated in IP headers with the cluster IP address, the IKE packets originate from a specific member, and carry that member's IP address. This may be done so that IPsec traffic bypasses the load balancer for greater scalability. For the peer, this looks weird, as the usual thing is for the IPsec packets to come from the same IP address as the IKE packets. Unmodified peers may drop such packets.

One obvious solution is to use some fancy capability of the IKE host to change things so that IKE packets also come out of the cluster IP address. This can be achieved through NAT or through assigning multiple addresses to interfaces. This is not, however, possible for all implementations, and will not reduce load on the balancer.

[ARORA] discusses this problem in greater depth, and proposes another solution that does involve protocol changes.

3.9. Allocation of SPIs

The SPI associated with each child SA, and with each IKE SA, MUST be unique relative to the peer of the SA. Thus, in the context of a cluster, each cluster member MUST generate SPIs in a fashion that avoids collisions (with other cluster members) for these SPI values. The means by which cluster members achieve this requirement is a local matter, outside the scope of this document.
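
Since the means of avoiding collisions is a local matter, the following is just one possible sketch: each member draws random SPIs from its own slice of the 32-bit child SA SPI space. The 4-bit member field and all names are assumptions made for this example. IKE SPIs are 64 bits long and are not covered here.

```python
import secrets
from typing import Set

MEMBER_BITS = 4                 # hypothetical: up to 16 cluster members
LOCAL_BITS = 32 - MEMBER_BITS   # bits chosen freely by each member


def allocate_spi(member_id: int, in_use: Set[int]) -> int:
    """Allocate a 32-bit child SA SPI inside this member's slice."""
    if not 0 <= member_id < (1 << MEMBER_BITS):
        raise ValueError("member_id does not fit in MEMBER_BITS")
    while True:
        spi = (member_id << LOCAL_BITS) | secrets.randbits(LOCAL_BITS)
        # SPI values 0 through 255 are reserved and must not be allocated.
        if spi > 255 and spi not in in_use:
            in_use.add(spi)
            return spi
```

Because the top bits differ per member, two members can never pick the same value, so `in_use` only needs to track the SPIs of the local member and no synch message is needed at allocation time.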

4. Security Considerations

Implementations running on clusters MUST be as secure as implementations running on single gateways. In other words, no extension or interpretation used to allow operation in a cluster may facilitate attacks that are not possible for single gateways.

Moreover, thought must be given to the synching requirements of any protocol extension to make sure that it does not create an opportunity for denial-of-service attacks on the cluster.

As mentioned in Section 3.5, allowing an inbound child SA to failover to another member has the effect of disabling replay counter protection for a short time. Though the threat is arguably low, it is a policy decision whether this is acceptable.

Section 3.7 describes the problem of the two directions of a flow being protected by two SAs that are not part of a matched pair or that are not even being processed by the same cluster member. This is not a security problem as far as IPsec is concerned because IPsec has policy at the IP, protocol and port level only. However, many IPsec implementations are integrated with stateful firewalls, which need to see both sides of a flow. Such implementations may have to forward packets to other members for the firewall to properly inspect the traffic.

5. Acknowledgements

This document is a collective work and includes contributions from many people who participate in the IPsecME working group.

The editor would particularly like to acknowledge the extensive contribution of the following people (in alphabetical order): Jitender Arora, Jean-Michel Combes, Dan Harkins, David Harrington, Steve Kent, Tero Kivinen, Alexey Melnikov, Yaron Sheffer, Melinda Shore, and Rodney Van Meter.

6. References

6.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4301] Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[RFC5996] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, "Internet Key Exchange Protocol Version 2 (IKEv2)", RFC 5996, September 2010.

6.2. Informative References

[ARORA] Arora, J. and P. Kumar, "Alternate Tunnel Addresses for IKEv2", Work in Progress, April 2010.

[COUNTER_MODES] McGrew, D. and B. Weis, "Using Counter Modes with Encapsulating Security Payload (ESP) and Authentication Header (AH) to Protect Group Traffic", Work in Progress, March 2010.

[RFC3686] Housley, R., "Using Advanced Encryption Standard (AES) Counter Mode With IPsec Encapsulating Security Payload (ESP)", RFC 3686, January 2004.

[RFC4106] Viega, J. and D. McGrew, "The Use of Galois/Counter Mode (GCM) in IPsec Encapsulating Security Payload (ESP)", RFC 4106, June 2005.

[RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for IKEv2", RFC 5685, November 2009.

[RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", RFC 5723, January 2010.

[RFC5798] Nadas, S., "Virtual Router Redundancy Protocol (VRRP)", RFC 5798, March 2010.

Author's Address

   Yoav Nir
   Check Point Software Technologies Ltd.
   5 Hasolelim st.
   Tel Aviv 67897
   Israel

   EMail: ynir@checkpoint.com
        