Independent Submission                                      P. Garg, Ed.
Request for Comments: 7637                                  Y. Wang, Ed.
Category: Informational                                        Microsoft
ISSN: 2070-1721                                           September 2015
        

NVGRE: Network Virtualization Using Generic Routing Encapsulation

Abstract

This document describes the usage of the Generic Routing Encapsulation (GRE) header for Network Virtualization (NVGRE) in multi-tenant data centers. Network Virtualization decouples virtual networks and addresses from physical network infrastructure, providing isolation and concurrency between multiple virtual networks on the same physical network infrastructure. This document also introduces a Network Virtualization framework to illustrate the use cases, but the focus is on specifying the data-plane aspect of NVGRE.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7637.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Table of Contents

   1. Introduction
      1.1. Terminology
   2. Conventions Used in This Document
   3. Network Virtualization Using GRE (NVGRE)
      3.1. NVGRE Endpoint
      3.2. NVGRE Frame Format
      3.3. Inner Tag as Defined by IEEE 802.1Q
      3.4. Reserved VSID
   4. NVGRE Deployment Considerations
      4.1. ECMP Support
      4.2. Broadcast and Multicast Traffic
      4.3. Unicast Traffic
      4.4. IP Fragmentation
      4.5. Address/Policy Management and Routing
      4.6. Cross-Subnet, Cross-Premise Communication
      4.7. Internet Connectivity
      4.8. Management and Control Planes
      4.9. NVGRE-Aware Devices
      4.10. Network Scalability with NVGRE
   5. Security Considerations
   6. Normative References
   Contributors
   Authors' Addresses
        
1. Introduction

Conventional data center network designs cater to largely static workloads and cause fragmentation of network and server capacity [6] [7]. There are several issues that limit dynamic allocation and consolidation of capacity. Layer 2 networks use the Rapid Spanning Tree Protocol (RSTP), which is designed to eliminate loops by blocking redundant paths. These eliminated paths translate to wasted capacity and a highly oversubscribed network. There are alternative approaches such as the Transparent Interconnection of Lots of Links (TRILL) that address this problem [13].

The network utilization inefficiencies are exacerbated by network fragmentation due to the use of VLANs for broadcast isolation. VLANs are used for traffic management and also as the mechanism for providing security and performance isolation among services belonging to different tenants. The Layer 2 network is carved into smaller-sized subnets (typically, one subnet per VLAN), with VLAN tags configured on all the Layer 2 switches connected to server racks that host a given tenant's services. The current VLAN limits theoretically allow for 4,000 such subnets to be created. The size of the broadcast domain is typically restricted due to the overhead of broadcast traffic. The 4,000-subnet limit on VLANs is no longer sufficient in a shared infrastructure servicing multiple tenants.

Data center operators must be able to achieve high utilization of server and network capacity. In order to achieve efficiency, it should be possible to assign workloads that operate in a single Layer 2 network to any server in any rack in the network. It should also be possible to migrate workloads to any server anywhere in the network while retaining the workloads' addresses. This can be achieved today by stretching VLANs; however, when workloads migrate, the network needs to be reconfigured and that is typically error prone. By decoupling the workload's location on the LAN from its network address, the network administrator configures the network once, not every time a service migrates. This decoupling enables any server to become part of any server resource pool.

The following are key design objectives for next-generation data centers:

a) location-independent addressing

b) the ability to scale the number of logical Layer 2 / Layer 3 networks, irrespective of the underlying physical topology or the number of VLANs

c) preserving Layer 2 semantics for services and allowing them to retain their addresses as they move within and across data centers

d) providing broadcast isolation as workloads move around without burdening the network control plane

This document describes use of the Generic Routing Encapsulation (GRE) header [3] [4] for network virtualization. Network virtualization decouples a virtual network from the underlying physical network infrastructure by virtualizing network addresses. Combined with a management and control plane for the virtual-to-physical mapping, network virtualization can enable flexible virtual machine placement and movement and provide network isolation for a multi-tenant data center.

Network virtualization enables customers to bring their own address spaces into a multi-tenant data center, while the data center administrators can place the customer virtual machines anywhere in the data center without reconfiguring their network switches or routers, irrespective of the customer address spaces.

1.1. Terminology

Please refer to RFCs 7364 [10] and 7365 [11] for more formal definitions of terminology. The following terms are used in this document.

Customer Address (CA): This is the virtual IP address assigned and configured on the virtual Network Interface Controller (NIC) within each VM. This is the only address visible to VMs and applications running within VMs.

Network Virtualization Edge (NVE): This is an entity that performs the network virtualization encapsulation and decapsulation.

Provider Address (PA): This is the IP address used in the physical network. PAs are associated with VM CAs through the network virtualization mapping policy.

Virtual Machine (VM): This is an instance of an OS running on top of the hypervisor over a physical machine or server. Multiple VMs can share the same physical server via the hypervisor, yet are completely isolated from each other in terms of CPU usage, storage, and other OS resources.

Virtual Subnet Identifier (VSID): This is a 24-bit ID that uniquely identifies a virtual subnet or virtual Layer 2 broadcast domain.

2. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].

In this document, these words will appear with that interpretation only when in ALL CAPS. Lowercase uses of these words are not to be interpreted as carrying the significance defined in RFC 2119.

3. Network Virtualization Using GRE (NVGRE)

This section describes Network Virtualization using GRE (NVGRE). Network virtualization involves creating virtual Layer 2 topologies on top of a physical Layer 3 network. Connectivity in the virtual topology is provided by tunneling Ethernet frames in GRE over IP over the physical network.

In NVGRE, every virtual Layer 2 network is associated with a 24-bit identifier, called a Virtual Subnet Identifier (VSID). A VSID is carried in an outer header as defined in Section 3.2. This allows unique identification of a tenant's virtual subnet to various devices in the network. A 24-bit VSID supports up to 16 million virtual subnets in the same management domain, in contrast to only 4,000 that are achievable with VLANs. Each VSID represents a virtual Layer 2 broadcast domain, which can be used to identify a virtual subnet of a given tenant. To support multi-subnet virtual topology, data center administrators can configure routes to facilitate communication between virtual subnets of the same tenant.

GRE is a Proposed Standard from the IETF [3] [4] and provides a way for encapsulating an arbitrary protocol over IP. NVGRE leverages the GRE header to carry VSID information in each packet. The VSID information in each packet can be used to build multi-tenant-aware tools for traffic analysis, traffic inspection, and monitoring.

The following sections detail the packet format for NVGRE; describe the functions of an NVGRE endpoint; illustrate typical traffic flow both within and across data centers; and discuss address/policy management, and deployment considerations.

3.1. NVGRE Endpoint

NVGRE endpoints are the ingress/egress points between the virtual and the physical networks. The NVGRE endpoints are the NVEs as defined in the Network Virtualization over Layer 3 (NVO3) Framework document [11]. Any physical server or network device can be an NVGRE endpoint. One common deployment is for the endpoint to be part of a hypervisor. The primary function of this endpoint is to encapsulate/decapsulate Ethernet data frames to and from the GRE tunnel, ensure Layer 2 semantics, and apply isolation policy scoped on VSID. The endpoint can optionally participate in routing and function as a gateway in the virtual topology. To encapsulate an Ethernet frame, the endpoint needs to know the location information for the destination address in the frame. This information can be provisioned via a management plane or obtained via a combination of control-plane distribution or data-plane learning approaches. This document assumes that the location information, including VSID, is available to the NVGRE endpoint.
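
As a minimal sketch of the location lookup described above, the following Python models a mapping table that a management plane might provision. The class and method names (MappingTable, provision, lookup) and the choice to key entries by VSID and inner destination MAC address are assumptions made for illustration; this document does not define them.

```python
from typing import Dict, Optional, Tuple


class MappingTable:
    """Hypothetical table mapping (VSID, inner destination MAC) to a PA."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], str] = {}

    def provision(self, vsid: int, inner_mac: str, provider_address: str) -> None:
        # Entries could equally come from control-plane distribution or
        # data-plane learning; here they are provisioned explicitly.
        self._entries[(vsid, inner_mac.lower())] = provider_address

    def lookup(self, vsid: int, inner_mac: str) -> Optional[str]:
        # Scoping the key by VSID keeps tenants isolated even when they
        # reuse the same MAC or IP address space.
        return self._entries.get((vsid, inner_mac.lower()))


table = MappingTable()
table.provision(0x1000, "00:11:22:33:44:55", "192.0.2.10")
print(table.lookup(0x1000, "00:11:22:33:44:55"))
print(table.lookup(0x1001, "00:11:22:33:44:55"))
```

The second lookup finds nothing because the same MAC address in a different VSID is a different virtual subnet.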

3.2. NVGRE Frame Format

The GRE header format as specified in RFCs 2784 [3] and 2890 [4] is used for communication between NVGRE endpoints. NVGRE leverages the Key extension specified in RFC 2890 [4] to carry the VSID. The packet format for Layer 2 encapsulation in GRE is shown in Figure 1.

   Outer Ethernet Header:
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Outer) Destination MAC Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |(Outer)Destination MAC Address |  (Outer)Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Outer) Source MAC Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q| Outer VLAN Tag Information    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype 0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        
   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  HL   |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live | Protocol 0x2F |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Outer) Source Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Outer) Destination Address                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        
   GRE Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| |1|0|   Reserved0     | Ver |   Protocol Type 0x6558        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Virtual Subnet ID (VSID)        |    FlowID     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Inner) Destination MAC Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |(Inner)Destination MAC Address |  (Inner)Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  (Inner) Source MAC Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype 0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        
   Inner IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  HL   |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Original IP Payload                      |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Figure 1: GRE Encapsulation Frame Format

Note: HL stands for Header Length.

The outer/delivery headers include the outer Ethernet header and the outer IP header:

o The outer Ethernet header: The source Ethernet address in the outer frame is set to the MAC address associated with the NVGRE endpoint. The destination endpoint may or may not be on the same physical subnet. The destination Ethernet address is set to the MAC address of the next-hop IP address for the destination NVE. The outer VLAN tag information is optional and can be used for traffic management and broadcast scalability on the physical network.

o The outer IP header: Both IPv4 and IPv6 can be used as the delivery protocol for GRE. The IPv4 header is shown for illustrative purposes. Henceforth, the IP address in the outer frame is referred to as the Provider Address (PA). There can be one or more PAs associated with an NVGRE endpoint, with policy controlling the choice of which PA to use for a given Customer Address (CA) for a customer VM.

In the GRE header:

o The C (Checksum Present) and S (Sequence Number Present) bits in the GRE header MUST be zero.

o The K (Key Present) bit in the GRE header MUST be set to one. The 32-bit Key field in the GRE header is used to carry the Virtual Subnet ID (VSID) and the FlowID:

- Virtual Subnet ID (VSID): This is a 24-bit value that is used to identify the NVGRE-based Virtual Layer 2 Network.

- FlowID: This is an 8-bit value that is used to provide per-flow entropy for flows in the same VSID. The FlowID MUST NOT be modified by transit devices. The encapsulating NVE SHOULD provide as much entropy as possible in the FlowID. If a FlowID is not generated, it MUST be set to all zeros.

o The Protocol Type field in the GRE header is set to 0x6558 (Transparent Ethernet Bridging) [2].
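
The GRE header rules above can be sketched in Python as follows. This is an illustrative encoding rather than a reference implementation: the function names pack_nvgre_header and unpack_nvgre_header are hypothetical, and the sketch covers only the 8-byte GRE header (flags, Protocol Type, and Key), not the outer or inner headers.

```python
import struct
from typing import Tuple

GRE_C_BIT = 0x8000          # Checksum Present, MUST be zero
GRE_K_BIT = 0x2000          # Key Present, MUST be one
GRE_S_BIT = 0x1000          # Sequence Number Present, MUST be zero
PROTO_TEB = 0x6558          # Transparent Ethernet Bridging


def pack_nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Build the 8-byte GRE header; FlowID defaults to all zeros."""
    if not 0 <= vsid <= 0xFFFFFF:
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id <= 0xFF:
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id  # upper 24 bits VSID, lower 8 bits FlowID
    return struct.pack("!HHI", GRE_K_BIT, PROTO_TEB, key)


def unpack_nvgre_header(header: bytes) -> Tuple[int, int]:
    """Validate the flag bits and return (VSID, FlowID)."""
    if len(header) < 8:
        raise ValueError("GRE header with Key is 8 bytes")
    flags, proto, key = struct.unpack("!HHI", header[:8])
    if flags & (GRE_C_BIT | GRE_S_BIT):
        raise ValueError("C and S bits must be zero")
    if not flags & GRE_K_BIT:
        raise ValueError("K bit must be set")
    if proto != PROTO_TEB:
        raise ValueError("Protocol Type must be 0x6558")
    return key >> 8, key & 0xFF


header = pack_nvgre_header(0x123456, 0xAB)
print(header.hex())
print(unpack_nvgre_header(header))
```

Packing the VSID into the upper 24 bits of the Key and the FlowID into the lower 8 bits mirrors the field order shown in Figure 1.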

In the inner headers (headers of the GRE payload):

o The inner Ethernet frame comprises an inner Ethernet header followed by an optional inner IP header, followed by the IP payload. The inner frame could be any Ethernet data frame, not just IP. Note that the inner Ethernet frame's Frame Check Sequence (FCS) is not encapsulated.

o For illustrative purposes, IPv4 headers are shown as the inner IP headers, but IPv6 headers may be used. Henceforth, the IP address contained in the inner frame is referred to as the Customer Address (CA).

3.3. Inner Tag as Defined by IEEE 802.1Q

The inner Ethernet header of NVGRE MUST NOT contain the tag as defined by IEEE 802.1Q [5]. The encapsulating NVE MUST remove any existing IEEE 802.1Q tag before encapsulation of the frame in NVGRE. A decapsulating NVE MUST drop the frame if the inner Ethernet frame contains an IEEE 802.1Q tag.
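
A minimal sketch of these two rules, assuming the inner frame is available as raw bytes and that the tag is identified by the customer tag TPID 0x8100 immediately after the two MAC addresses. The function names are hypothetical, and other tag types (for example, service tags) are outside this sketch.

```python
TPID_8021Q = 0x8100   # C-Tag TPID
MAC_BYTES = 12        # destination MAC (6) + source MAC (6)
TAG_BYTES = 4         # TPID (2) + Tag Control Information (2)


def has_8021q_tag(frame: bytes) -> bool:
    """True if the field after the MAC addresses is an 802.1Q TPID."""
    return (
        len(frame) >= MAC_BYTES + TAG_BYTES
        and int.from_bytes(frame[12:14], "big") == TPID_8021Q
    )


def strip_8021q_tag(frame: bytes) -> bytes:
    """Encapsulating side: remove any 802.1Q tag before encapsulation."""
    while has_8021q_tag(frame):
        frame = frame[:MAC_BYTES] + frame[MAC_BYTES + TAG_BYTES:]
    return frame


def accept_on_decapsulation(inner_frame: bytes) -> bool:
    """Decapsulating side: a tagged inner frame must be dropped."""
    return not has_8021q_tag(inner_frame)
```

For example, a frame whose bytes 12 through 15 are 81 00 00 64 (VLAN 100) loses those four bytes on encapsulation, so the original Ethertype directly follows the source MAC address.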

3.4. Reserved VSID

The VSID range 0 through 0xFFF is reserved for future use.

The VSID 0xFFFFFF is reserved for vendor-specific NVE-to-NVE communication. The sender NVE SHOULD verify the receiver NVE's vendor before sending a packet using this VSID; however, such a verification mechanism is out of scope of this document. Implementations SHOULD choose a mechanism that meets their requirements.
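
The reserved ranges can be expressed as a small classification helper. The function name and the three labels it returns are hypothetical and chosen only for this sketch.

```python
def classify_vsid(vsid: int) -> str:
    """Classify a 24-bit VSID according to the reserved ranges."""
    if not 0 <= vsid <= 0xFFFFFF:
        raise ValueError("VSID must fit in 24 bits")
    if vsid <= 0xFFF:
        return "reserved"          # 0 through 0xFFF, reserved for future use
    if vsid == 0xFFFFFF:
        return "vendor-specific"   # vendor-specific NVE-to-NVE communication
    return "assignable"            # usable for a virtual subnet


for vsid in (0x0, 0xFFF, 0x1000, 0xFFFFFE, 0xFFFFFF):
    print(hex(vsid), classify_vsid(vsid))
```

A provisioning system could use such a check to reject reserved values before a VSID is assigned to a virtual subnet.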

4. NVGRE Deployment Considerations
4.1. ECMP Support

Equal-Cost Multipath (ECMP) may be used to provide load balancing. If ECMP is used, it is RECOMMENDED that the ECMP hash be calculated using either the outer IP frame fields and the entire Key field (32 bits) or the inner IP and transport frame fields.
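
The first of the two recommended hash inputs can be sketched as follows. This document does not specify a hash function, so zlib.crc32 is used purely as a stand-in, and the function name and the list-of-next-hops model are assumptions for illustration.

```python
import ipaddress
import zlib
from typing import Sequence


def ecmp_next_hop(outer_src: str, outer_dst: str, gre_key: int,
                  next_hops: Sequence[str]) -> str:
    """Pick a next hop from the outer IP addresses and the 32-bit GRE Key."""
    if not 0 <= gre_key <= 0xFFFFFFFF:
        raise ValueError("GRE Key must fit in 32 bits")
    material = (
        ipaddress.ip_address(outer_src).packed
        + ipaddress.ip_address(outer_dst).packed
        + gre_key.to_bytes(4, "big")   # VSID (24 bits) + FlowID (8 bits)
    )
    # crc32 is an illustrative hash only, not a recommendation.
    return next_hops[zlib.crc32(material) % len(next_hops)]


paths = ["path-a", "path-b", "path-c"]
key = (0x123456 << 8) | 0x07
print(ecmp_next_hop("192.0.2.1", "198.51.100.2", key, paths))
```

Hashing the whole Key, rather than the VSID alone, lets the FlowID spread different flows of the same virtual subnet across paths even when the outer addresses are identical.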

4.2. Broadcast and Multicast Traffic

To support broadcast and multicast traffic inside a virtual subnet, one or more administratively scoped multicast addresses [8] [9] can be assigned for the VSID. All multicast or broadcast traffic originating from within a VSID is encapsulated and sent to the assigned multicast address. From an administrative standpoint, it is possible for network operators to configure a PA multicast address for each multicast address that is used inside a VSID; this facilitates optimal multicast handling. Depending on the hardware capabilities of the physical network devices and the physical network architecture, multiple virtual subnets may use the same physical IP multicast address.

Alternatively, based upon the configuration at the NVE, broadcast and multicast in the virtual subnet can be supported using N-way unicast. In N-way unicast, the sender NVE would send one encapsulated packet to every NVE in the virtual subnet. The sender NVE can encapsulate and send the packet as described in Section 4.3 ("Unicast Traffic"). This alleviates the need for multicast support in the physical network.
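
N-way unicast replication can be sketched as below. The function name is hypothetical, and the sketch assumes the sender already knows the PAs of the NVEs in the virtual subnet and skips its own PA and duplicate entries; neither detail is stated in this document.

```python
from typing import Iterable, List, Tuple


def n_way_unicast_copies(local_pa: str, member_pas: Iterable[str],
                         inner_frame: bytes) -> List[Tuple[str, bytes]]:
    """Return one (destination PA, inner frame) pair per remote NVE."""
    copies: List[Tuple[str, bytes]] = []
    seen = set()
    for pa in member_pas:
        if pa == local_pa or pa in seen:
            continue  # assumption: do not send to ourselves or twice
        seen.add(pa)
        copies.append((pa, inner_frame))
    return copies


members = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
for pa, frame in n_way_unicast_copies("192.0.2.1", members, b"broadcast"):
    print(pa, frame)
```

Each returned pair would then be encapsulated and sent as an ordinary unicast NVGRE packet, so the cost grows with the number of NVEs in the virtual subnet.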

4.3. Unicast Traffic

The NVGRE endpoint encapsulates a Layer 2 packet in GRE using the source PA associated with the endpoint and the destination PA corresponding to the location of the destination endpoint. As outlined earlier, there can be one or more PAs associated with an endpoint, and policy will control which ones get used for communication. The encapsulated GRE packet is bridged and routed normally by the physical network to the destination PA. Bridging uses the outer Ethernet encapsulation for scope on the LAN. The only requirement is bidirectional IP connectivity from the underlying physical network. On the destination, the NVGRE endpoint decapsulates the GRE packet to recover the original Layer 2 frame. Traffic flows similarly on the reverse path.

4.4. IP Fragmentation

Section 5.1 of RFC 2003 [12] specifies mechanisms for handling fragmentation when encapsulating IP within IP. The subset of mechanisms NVGRE selects is intended to ensure that NVGRE-encapsulated frames are not fragmented after encapsulation en route to the destination NVGRE endpoint and that traffic sources can leverage Path MTU discovery.

A sender NVE MUST NOT fragment NVGRE packets. A receiver NVE MAY discard fragmented NVGRE packets. It is RECOMMENDED that the MTU of the physical network accommodates the larger frame size due to encapsulation. Path MTU discovery or configuration via the control plane can be used to meet this requirement.
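
As a worked example of the larger frame size, the following sketch adds up the headers that sit between the outer Ethernet header and the tenant's IP packet. It assumes an outer IPv4 header without options (20 bytes) or an outer IPv6 header without extension headers (40 bytes), the 8-byte GRE header with Key, and an untagged 14-byte inner Ethernet header; the function name is hypothetical.

```python
GRE_HEADER_WITH_KEY = 8      # flags + Protocol Type (4) and Key (4)
INNER_ETHERNET_HEADER = 14   # untagged; the inner FCS is not encapsulated
OUTER_IPV4_HEADER = 20       # assumes no IPv4 options
OUTER_IPV6_HEADER = 40       # assumes no extension headers


def required_physical_mtu(tenant_mtu: int, outer_ipv6: bool = False) -> int:
    """IP MTU the physical network needs to carry a full tenant packet."""
    outer_ip = OUTER_IPV6_HEADER if outer_ipv6 else OUTER_IPV4_HEADER
    return tenant_mtu + INNER_ETHERNET_HEADER + GRE_HEADER_WITH_KEY + outer_ip


print(required_physical_mtu(1500))                    # 1542
print(required_physical_mtu(1500, outer_ipv6=True))   # 1562
```

Under these assumptions, a tenant MTU of 1500 bytes needs a physical network MTU of at least 1542 bytes with IPv4 PAs, or 1562 bytes with IPv6 PAs.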

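The size of the "larger frame" can be worked out with simple arithmetic. The sketch below is illustrative only: it assumes an untagged inner Ethernet header (14 bytes), a GRE header with the Key field (8 bytes), and an outer IPv4 or IPv6 header without options or extension headers. The function and constant names are hypothetical.

```python
OUTER_IP_HEADER = {"ipv4": 20, "ipv6": 40}  # bytes, no options or extensions
GRE_HEADER_WITH_KEY = 8                      # 4 bytes base GRE + 4 bytes Key
INNER_ETHERNET_HEADER = 14                   # untagged inner frame


def required_physical_mtu(tenant_mtu, outer_family="ipv4"):
    """IP MTU the physical network must offer so no fragmentation occurs."""
    return (tenant_mtu + INNER_ETHERNET_HEADER
            + GRE_HEADER_WITH_KEY + OUTER_IP_HEADER[outer_family])


def max_tenant_mtu(physical_mtu, outer_family="ipv4"):
    """Largest tenant IP MTU that fits when the physical MTU is fixed."""
    return (physical_mtu - OUTER_IP_HEADER[outer_family]
            - GRE_HEADER_WITH_KEY - INNER_ETHERNET_HEADER)
```

Under these assumptions, a tenant MTU of 1500 bytes needs a physical MTU of 1542 bytes with an IPv4 outer header and 1562 bytes with an IPv6 outer header. Conversely, a physical network fixed at 1500 bytes leaves 1458 bytes for the tenant over IPv4.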

4.5. Address/Policy Management and Routing

Address acquisition is beyond the scope of this document; addresses can be obtained statically, dynamically, or using stateless address autoconfiguration. CA and PA space can be either IPv4 or IPv6. In fact, the address families don't have to match; for example, a CA can be IPv4 while the PA is IPv6, and vice versa.

4.6. Cross-Subnet, Cross-Premise Communication

One application of this framework is that it provides a seamless path for enterprises looking to expand their virtual machine hosting capabilities into public clouds. Enterprises can bring their entire IP subnet(s) and isolation policies, thus making the transition to or from the cloud simpler. It is possible to move portions of an IP subnet to the cloud; however, that requires additional configuration on the enterprise network and is not discussed in this document. Enterprises can continue to use existing communications models like site-to-site VPN to secure their traffic.

A VPN gateway is used to establish a secure site-to-site tunnel over the Internet, and all the enterprise services running in virtual machines in the cloud use the VPN gateway to communicate back to the enterprise. For simplicity, we use a VPN gateway configured as a VM (shown in Figure 2) to illustrate cross-subnet, cross-premise communication.

   +-----------------------+        +-----------------------+
   |       Server 1        |        |       Server 2        |
   | +--------+ +--------+ |        | +-------------------+ |
   | | VM1    | | VM2    | |        | |    VPN Gateway    | |
   | | IP=CA1 | | IP=CA2 | |        | | Internal  External| |
   | |        | |        | |        | |  IP=CAg   IP=GAdc | |
   | +--------+ +--------+ |        | +-------------------+ |
   |       Hypervisor      |        |     | Hypervisor| ^   |
   +-----------------------+        +-------------------:---+
               | IP=PA1                   | IP=PA4    | :
               |                          |           | :
               |     +-------------------------+      | : VPN
               +-----|     Layer 3 Network     |------+ : Tunnel
                     +-------------------------+        :
                                  |                     :
        +-----------------------------------------------:--+
        |                                               :  |
        |                     Internet                  :  |
        |                                               :  |
        +-----------------------------------------------:--+
                                  |                     v
                                  |   +-------------------+
                                  |   |    VPN Gateway    |
                                  |---|                   |
                             IP=GAcorp| External IP=GAcorp|
                                      +-------------------+
                                                |
                                    +-----------------------+
                                    |  Corp Layer 3 Network |
                                    |      (In CA Space)    |
                                    +-----------------------+
                                                |
                                   +---------------------------+
                                   |       Server X            |
                                   | +----------+ +----------+ |
                                   | | Corp VMe1| | Corp VMe2| |
                                   | |  IP=CAe1 | |  IP=CAe2 | |
                                   | +----------+ +----------+ |
                                   |         Hypervisor        |
                                   +---------------------------+
        

Figure 2: Cross-Subnet, Cross-Premise Communication

The packet flow is similar to the unicast traffic flow between VMs; the key difference in this case is that the packet needs to be sent to a VPN gateway before it gets forwarded to the destination. As part of routing configuration in the CA space, a per-tenant VPN gateway is provisioned for communication back to the enterprise. The example illustrates an outbound connection between VM1 inside the data center and VMe1 inside the enterprise network. When the outbound packet from CA1 to CAe1 reaches the hypervisor on Server 1, the NVE in Server 1 can perform the equivalent of a route lookup on the packet. The cross-premise packet will match the default gateway rule, as CAe1 is not part of the tenant virtual network in the data center. The virtualization policy will indicate that the packet is to be encapsulated and sent to the PA of the tenant VPN gateway (PA4), which runs as a VM on Server 2. The packet is decapsulated on Server 2 and delivered to the VM gateway. The gateway in turn validates the packet and sends it on the site-to-site VPN tunnel back to the enterprise network. As the communication here is external to the data center, the PA address for the VPN tunnel is globally routable. The outer header of this packet is sourced from GAdc and destined to GAcorp. This packet is routed through the Internet to the enterprise VPN gateway, which is the other end of the site-to-site tunnel; at that point, the VPN gateway decapsulates the packet and sends it inside the enterprise, where CAe1 is routable on the network. The reverse path is similar once the packet reaches the enterprise VPN gateway.

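The lookup step on Server 1 can be sketched as follows. This is only an illustration: the 10.1.1.0/24 tenant subnet, the concrete CA values, and the `lookup_destination_pa` function are assumptions made for the example; only the labels PA1, PA4, CA1, CA2, and CAg come from Figure 2.

```python
import ipaddress

# Hypothetical per-tenant policy held by the NVE on Server 1.
TENANT_SUBNET = ipaddress.ip_network("10.1.1.0/24")
CA_TO_PA = {
    ipaddress.ip_address("10.1.1.11"): "PA1",  # CA1 (VM1 on Server 1)
    ipaddress.ip_address("10.1.1.12"): "PA1",  # CA2 (VM2 on Server 1)
    ipaddress.ip_address("10.1.1.1"): "PA4",   # CAg (VPN gateway VM)
}
DEFAULT_GATEWAY_CA = ipaddress.ip_address("10.1.1.1")


def lookup_destination_pa(dst_ca):
    """Return the PA to which a packet for dst_ca should be tunneled.

    Returns None when the CA lies in the tenant subnet but no mapping
    is known for it.
    """
    dst = ipaddress.ip_address(dst_ca)
    if dst in TENANT_SUBNET:
        return CA_TO_PA.get(dst)         # ordinary intra-subnet unicast
    return CA_TO_PA[DEFAULT_GATEWAY_CA]  # default gateway rule
```

With this table, a destination such as 172.16.5.20 (standing in for CAe1) falls outside the tenant subnet and resolves to PA4, so the packet is tunneled to the VPN gateway on Server 2.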

4.7. Internet Connectivity

To enable connectivity to the Internet, an Internet gateway is needed that bridges the virtualized CA space to the public Internet address space. The gateway needs to perform translation between the virtualized world and the Internet. For example, the NVGRE endpoint can be part of a load balancer or a NAT that replaces the VPN Gateway on Server 2 shown in Figure 2.

4.8. Management and Control Planes

There are several protocols that can manage and distribute policy; however, they are outside the scope of this document. Implementations SHOULD choose a mechanism that meets their scale requirements.

4.9. NVGRE-Aware Devices

One example of a typical deployment consists of virtualized servers deployed across multiple racks connected by one or more layers of Layer 2 switches, which in turn may be connected to a Layer 3 routing domain. Even though routing in the physical infrastructure will work without any modification with NVGRE, devices that perform specialized processing in the network need to be able to parse GRE to get access to tenant-specific information. Devices that understand and parse the VSID can provide rich multi-tenant-aware services inside the data center. As outlined earlier, it is imperative to exploit multiple paths inside the network through techniques such as ECMP. The Key field (a 32-bit field, including both the VSID and the optional FlowID) can provide additional entropy to the switches to exploit path diversity inside the network. A diverse ecosystem is expected to emerge as more and more devices become multi-tenant aware. In the interim, without requiring any hardware upgrades, there are alternatives to exploit path diversity with GRE by associating multiple PAs with NVGRE endpoints, with policy controlling the choice of which PA to use.

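To make the entropy argument concrete, the sketch below shows one way a switch might fold the 32-bit Key field into its ECMP hash. It is an assumption-laden illustration, not a description of any real device: the `ecmp_next_hop` function, the use of CRC-32 as the hash, and the path names are all hypothetical.

```python
import struct
import zlib


def ecmp_next_hop(src_pa, dst_pa, gre_key, next_hops):
    """Pick a next hop by hashing the outer addresses plus the GRE Key.

    Without the Key, every flow between the same pair of PAs would hash
    to the same path; including it lets flows spread across paths.
    """
    material = src_pa.encode() + dst_pa.encode() + struct.pack("!I", gre_key)
    return next_hops[zlib.crc32(material) % len(next_hops)]


paths = ["spine-1", "spine-2", "spine-3", "spine-4"]
vsid = 0x0012AB
# Same pair of PAs, same VSID, different FlowIDs in the low 8 bits.
chosen = {
    ecmp_next_hop("192.0.2.11", "192.0.2.12", (vsid << 8) | flow_id, paths)
    for flow_id in range(256)
}
```

Because the hash is deterministic, packets that carry the same Key between the same PAs always take the same path, which keeps a flow in order while different FlowIDs can land on different paths.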

It is expected that communication can span multiple data centers and also cross the virtual/physical boundary. Typical scenarios that require virtual-to-physical communication include access to storage and databases. Scenarios demanding lossless Ethernet functionality may not be amenable to NVGRE, as traffic is carried over an IP network. NVGRE endpoints mediate between the network-virtualized and non-network-virtualized environments. This functionality can be incorporated into Top-of-Rack switches, storage appliances, load balancers, routers, etc., or built as a stand-alone appliance.

It is imperative to consider the impact of any solution on host performance. Today's server operating systems employ sophisticated acceleration techniques such as checksum offload, Large Send Offload (LSO), Receive Segment Coalescing (RSC), Receive Side Scaling (RSS), Virtual Machine Queue (VMQ), etc. These technologies should become NVGRE aware. IPsec Security Associations (SAs) can be offloaded to the NIC so that computationally expensive cryptographic operations are performed at line rate in the NIC hardware. These SAs are based on the IP addresses of the endpoints. As each packet on the wire gets translated, the NVGRE endpoint SHOULD intercept the offload requests and do the appropriate address translation. This will ensure that IPsec continues to be usable with network virtualization while taking advantage of hardware offload capabilities for improved performance.

4.10. Network Scalability with NVGRE

One of the key benefits of using NVGRE is the IP address scalability and, in turn, the MAC address table scalability that can be achieved. An NVGRE endpoint can use one PA to represent multiple CAs. This lowers the burden on the MAC address table sizes at the Top-of-Rack switches. One obvious benefit is in the context of server virtualization, which has increased the demands on the network infrastructure. By embedding an NVGRE endpoint in a hypervisor, it is possible to scale significantly. This framework enables location information to be preconfigured inside an NVGRE endpoint, thus allowing broadcast ARP traffic to be proxied locally. This approach can scale to large-sized virtual subnets. These virtual subnets can be spread across multiple Layer 3 physical subnets. It allows workloads to be moved around without imposing a huge burden on the network control plane. By eliminating most broadcast traffic and converting others to multicast, the routers and switches can function more optimally by building efficient multicast trees. By using server and network capacity efficiently, it is possible to drive down the cost of building and managing data centers.

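Local ARP proxying from preconfigured location information can be pictured with the following sketch. The table contents, the `proxy_arp` function, and the choice to return None for unknown targets are assumptions for illustration; the document does not prescribe how an endpoint stores or uses this information.

```python
# Hypothetical preconfigured location table for one virtual subnet:
# CA -> MAC address of the endpoint that owns it (documentation values).
LOCATION_TABLE = {
    "10.1.1.11": "00:00:5e:00:53:11",
    "10.1.1.12": "00:00:5e:00:53:12",
}


def proxy_arp(target_ca):
    """Answer an ARP request from local state instead of broadcasting it.

    Returns the MAC address to place in the ARP reply, or None when the
    target is unknown and some other policy has to decide what happens.
    """
    return LOCATION_TABLE.get(target_ca)
```

An ARP request answered this way never leaves the host, which is how most broadcast traffic in the virtual subnet can be eliminated.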

5. Security Considerations

This proposal extends the Layer 2 subnet across the data center and increases the scope for spoofing attacks. Mitigations of such attacks are possible with authentication/encryption using IPsec or any other IP-based mechanism. The control plane for policy distribution is expected to be secured by using any of the existing security protocols. Further, management traffic can be isolated in a separate subnet/VLAN.

The checksum in the GRE header is not supported. The mitigation of this is to deploy an NVGRE-based solution in a network that provides error detection along the NVGRE packet path, for example, using Ethernet Cyclic Redundancy Check (CRC) or IPsec or any other error detection mechanism.

6. Normative References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.

[2] IANA, "IEEE 802 Numbers", <http://www.iana.org/assignments/ieee-802-numbers>.

[3] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, DOI 10.17487/RFC2784, March 2000, <http://www.rfc-editor.org/info/rfc2784>.

[4] Dommety, G., "Key and Sequence Number Extensions to GRE", RFC 2890, DOI 10.17487/RFC2890, September 2000, <http://www.rfc-editor.org/info/rfc2890>.

[5] IEEE, "IEEE Standard for Local and metropolitan area networks--Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks", IEEE Std 802.1Q.

[6] Greenberg, A., et al., "VL2: A Scalable and Flexible Data Center Network", Communications of the ACM, DOI 10.1145/1897852.1897877, 2011.

[7] Greenberg, A., et al., "The Cost of a Cloud: Research Problems in Data Center Networks", ACM SIGCOMM Computer Communication Review, DOI 10.1145/1496091.1496103, 2009.

[8] Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture", RFC 4291, DOI 10.17487/RFC4291, February 2006, <http://www.rfc-editor.org/info/rfc4291>.

[9] Meyer, D., "Administratively Scoped IP Multicast", BCP 23, RFC 2365, DOI 10.17487/RFC2365, July 1998, <http://www.rfc-editor.org/info/rfc2365>.

[10] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., Kreeger, L., and M. Napierala, "Problem Statement: Overlays for Network Virtualization", RFC 7364, DOI 10.17487/RFC7364, October 2014, <http://www.rfc-editor.org/info/rfc7364>.

[11] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for Data Center (DC) Network Virtualization", RFC 7365, DOI 10.17487/RFC7365, October 2014, <http://www.rfc-editor.org/info/rfc7365>.

[12] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI 10.17487/RFC2003, October 1996, <http://www.rfc-editor.org/info/rfc2003>.

[13] Touch, J. and R. Perlman, "Transparent Interconnection of Lots of Links (TRILL): Problem and Applicability Statement", RFC 5556, DOI 10.17487/RFC5556, May 2009, <http://www.rfc-editor.org/info/rfc5556>.

Contributors

Murari Sridharan
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: muraris@microsoft.com

Albert Greenberg
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: albert@microsoft.com

Narasimhan Venkataramiah
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: navenkat@microsoft.com

Kenneth Duda
Arista Networks, Inc.
5470 Great America Pkwy
Santa Clara, CA 95054
United States
Email: kduda@aristanetworks.com

Ilango Ganga
Intel Corporation
2200 Mission College Blvd.
M/S: SC12-325
Santa Clara, CA 95054
United States
Email: ilango.s.ganga@intel.com

Geng Lin
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Email: genglin@google.com

Mark Pearson
Hewlett-Packard Co.
8000 Foothills Blvd.
Roseville, CA 95747
United States
Email: mark.pearson@hp.com

Patricia Thaler
Broadcom Corporation
3151 Zanker Road
San Jose, CA 95134
United States
Email: pthaler@broadcom.com

Chait Tumuluri
Emulex Corporation
3333 Susan Street
Costa Mesa, CA 92626
United States
Email: chait@emulex.com

Authors' Addresses

Pankaj Garg (editor)
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: pankajg@microsoft.com

Yu-Shun Wang (editor)
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: yushwang@microsoft.com