Network Working Group                                           J. Salim
Request for Comments: 3549                                 Znyx Networks
Category: Informational                                      H. Khosravi
                                                                   Intel
                                                                A. Kleen
                                                                    Suse
                                                            A. Kuznetsov
                                                              INR/Swsoft
                                                               July 2003
        

Linux Netlink as an IP Services Protocol

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This document describes Linux Netlink, which is used in Linux both as an intra-kernel messaging system and as a messaging system between kernel and user space. The focus of this document is to describe Netlink's functionality as a protocol between a Forwarding Engine Component (FEC) and a Control Plane Component (CPC), the two components that define an IP service. As a result of this focus, this document ignores other uses of Netlink, including its use as an intra-kernel messaging system, as an inter-process communication (IPC) scheme, or as a configuration tool for other non-networking or non-IP network services (such as DECnet).

This document is intended as informational in the context of prior art for the ForCES IETF working group.

Table of Contents

   1.  Introduction
       1.1. Definitions
            1.1.1.  Control Plane Components (CPCs)
            1.1.2.  Forwarding Engine Components (FECs)
            1.1.3.  IP Services
   2.  Netlink Architecture
       2.1. Netlink Logical Model
       2.2. Message Format
       2.3. Protocol Model
            2.3.1.  Service Addressing
            2.3.2.  Netlink Message Header
            2.3.3.  FE System Services' Templates
   3.  Currently Defined Netlink IP Services
       3.1. IP Service NETLINK_ROUTE
            3.1.1.  Network Route Service Module
            3.1.2.  Neighbor Setup Service Module
            3.1.3.  Traffic Control Service
       3.2. IP Service NETLINK_FIREWALL
       3.3. IP Service NETLINK_ARPD
   4.  References
       4.1. Normative References
       4.2. Informative References
   5.  Security Considerations
   6.  Acknowledgements
   Appendix 1:  Sample Service Hierarchy
   Appendix 2:  Sample Protocol for the Foo IP Service
   Appendix 2a: Interacting with Other IP services
   Appendix 3:  Examples
   Authors' Addresses
   Full Copyright Statement
        
1. Introduction

The concept of IP Service control-forwarding separation was first introduced in the early 1990s by the BSD 4.4 routing sockets [9]. The focus at that time was a simple IP(v4) forwarding service and how the CPC, either via a command line configuration tool or a dynamic route daemon, could control forwarding tables for that IPv4 forwarding service.

The IP world has evolved considerably since those days. Linux Netlink, when observed from a service provisioning and management point of view, takes routing sockets one step further by breaking the barrier of focus around IPv4 forwarding. Since the Linux 2.1 kernel, Netlink has been providing the IP service abstraction to a few services other than the classical RFC 1812 IPv4 forwarding.

The motivation for this document is not to list every possible service for which Netlink is applied. In fact, we leave out a lot of services (multicast routing, tunneling, policy routing, etc.). Neither is this document intended to be a tutorial on Netlink. The idea is to explain the overall Netlink view with a special focus on the mandatory building blocks within the ForCES charter (i.e., IPv4 and QoS). This document also serves to capture prior art for many mechanisms that are useful within the context of ForCES. The text is limited to a subset of what is available in kernel 2.4.6, the newest kernel when this document was first written. It is also limited to IPv4 functionality.

We first give some concept definitions and then describe how Netlink fits in.

1.1. Definitions

A Control Plane (CP) is an execution environment that may have several sub-components, which we refer to as CPCs. Each CPC provides control for a different IP service being executed by a Forwarding Engine (FE) component. This relationship means that there might be several CPCs on a physical CP, if it is controlling several IP services. In essence, the cohesion between a CP component and an FE component is the service abstraction.

1.1.1. Control Plane Components (CPCs)

Control Plane Components encompass signalling protocols, with diversity ranging from dynamic routing protocols, such as OSPF [5], to tag distribution protocols, such as CR-LDP [7]. Classical management protocols and activities also fall under this category. These include SNMP [6], COPS [4], and proprietary CLI/GUI configuration mechanisms. The purpose of the control plane is to provide an execution environment for the above-mentioned activities with the ultimate goal being to configure and manage the second Network Element (NE) component: the FE. The result of the configuration defines the way that packets traversing the FE are treated.

1.1.2. Forwarding Engine Components (FECs)

The FE is the entity of the NE that incoming packets (from the network into the NE) first encounter.

The FE's service-specific component massages the packet to provide it with a treatment to achieve an IP service, as defined by the Control Plane Components for that IP service. Different services will utilize different FECs. Service modules may be chained to achieve a more complex service (refer to the Linux FE model, described later). When built for providing a specific service, the FE service component will adhere to a forwarding model.

1.1.2.1. Linux IP Forwarding Engine Model
                        ____      +---------------+
                   +->-| FW |---> | TCP, UDP, ... |
                   |   +----+     +---------------+
                   |                   |
                   ^                   v
                   |                  _|_
                   +----<----+       | FW |
                             |       +----+
                             ^         |
                             |         Y
                           To host    From host
                            stack     stack
                             ^         |
                             |_____    |
Ingress                            ^   Y
device   ____    +-------+        +|---|--+   ____   +--------+ Egress
->----->| FW |-->|Ingress|-->---->| Forw- |->| FW |->| Egress | device
        +----+   |  TC   |        |  ard  |  +----+  |   TC   |-->
                 +-------+        +-------+          +--------+
        

The figure above shows the Linux FE model per device. The only mandatory part of the datapath is the Forwarding module, which is RFC 1812 conformant. The different Firewall (FW), Ingress Traffic Control, and Egress Traffic Control building blocks are not mandatory in the datapath and may even be used to bypass the RFC 1812 module. These modules are shown as simple blocks in the datapath but, in fact, could be multiple cascaded, independent submodules within the indicated blocks. More information can be found at [10] and [11].

Packets arriving at the ingress device first pass through a firewall module. Packets may be dropped, munged, etc., by the firewall module. The incoming packet, depending on set policy, may then be passed via an Ingress Traffic Control module. Metering and policing activities are contained within the Ingress TC module. Packets may be dropped, depending on metering results and policing policies, at this module. Next, the packet is subjected to the only non-optional module, the RFC 1812-conformant Forwarding module. The packet may be dropped if it is nonconformant (to the many RFCs complementing 1812 and 1122). This module is a juncture point at which packets destined to the forwarding NE may be sent up to the host stack.

Packets that are not for the NE may further traverse a policy routing submodule (within the forwarding module), if so provisioned. Another firewall module is walked next. The firewall module can drop or munge/transform packets, depending on the configured sub-modules encountered and their policies. If all goes well, the Egress TC module is accessed next.

The Egress TC may drop packets for policing, scheduling, congestion control, or rate control reasons. Egress queues exist at this point and any of the drops or delays may happen before or after the packet is queued. All is dependent on configured module algorithms and policies.

1.1.3. IP Services

An IP service is the treatment of an IP packet within the NE. This treatment is provided by a combination of both the CPC and the FEC.

The time span of the service is from the moment when the packet arrives at the NE to the moment that it departs. In essence, an IP service in this context is a Per-Hop Behavior. CP components running on NEs define the end-to-end path control for a service by running control/signaling protocol/management-applications. These distributed CPCs unify the end-to-end view of the IP service. As noted above, these CP components then define the behavior of the FE (and therefore the NE) for a described packet.

A simple example of an IP service is the classical IPv4 Forwarding. In this case, control components, such as routing protocols (OSPF, RIP, etc.) and proprietary CLI/GUI configurations, modify the FE's forwarding tables in order to offer the simple service of forwarding packets to the next hop. Traditionally, NEs offering this simple service are known as routers.

In the diagram below, we show a simple FE<->CP setup to provide an example of the classical IPv4 service with an extension to do some basic QoS egress scheduling and illustrate how the setup fits in this described model.

                           Control Plane (CP)
                          .------------------------------------
                          |    /^^^^^^\      /^^^^^^\         |
                          |   |        |    | COPS  |-\       |
                          |   | ospfd  |    |  PEP  |  \      |
                          |   \       /      \_____/    |     |
                        /------\_____/         |       /      |
                        | |        |           |     /        |
                        | |_________\__________|____|_________|
                        |           |          |    |
                       ******************************************
         Forwarding    ************* Netlink  layer ************
         Engine (FE)   *****************************************
          .-------------|-----------|----------|---|-------------
          |       IPv4 forwarding   |              |             |
          |       FE Service       /               /             |
          |       Component       /               /              |
          |       ---------------/---------------/---------      |
          |       |             |               /         |      |
   packet |       |     --------|--        ----|-----     |   packet
   in     |       |     |  IPv4    |      | Egress   |    |    out
   -->--->|------>|---->|Forwarding|----->| QoS      |--->| ---->|->
          |       |     |          |      | Scheduler|    |      |
          |       |     -----------        ----------     |      |
          |       |                                       |      |
          |        ---------------------------------------       |
          |                                                      |
          -------------------------------------------------------
        

The above diagram illustrates ospfd, an OSPF protocol control daemon, and a COPS Policy Enforcement Point (PEP) as distinct CPCs. The IPv4 FE component includes the IPv4 Forwarding service module as well as the Egress Scheduling service module. Another service might add a policy forwarder between the IPv4 forwarder and the QoS egress scheduler. A simpler classical service would have constituted only the IPv4 forwarder.

Over the years, it has become important to add additional services to routers to meet emerging requirements. More complex services extending classical forwarding have been added and standardized. These newer services might go beyond the layer 3 contents of the packet header. However, the name "router", although a misnomer, is still used to describe these NEs. Services (which may look beyond the classical L3 service headers) include firewalling, QoS in Diffserv and RSVP, NAT, policy-based routing, etc. Newer control protocols or management activities are introduced with these new services.

One extreme definition of an IP service is something for which a service provider would be able to charge.

2. Netlink Architecture

Control of IP service components is defined by using templates.

The FEC and CPC participate to deliver the IP service by communicating using these templates. The FEC might continuously get updates from the Control Plane Component on how to operate the service (e.g., for v4 forwarding or for route additions or deletions).

The interaction between the FEC and the CPC, in the Netlink context, defines a protocol. Netlink provides mechanisms for the CPC (residing in user space) and the FEC (residing in kernel space) to have their own protocol definition -- kernel space and user space just mean different protection domains. Therefore, a wire protocol is needed to communicate. The wire protocol is normally provided by some privileged service that is able to copy between multiple protection domains. We will refer to this service as the Netlink service. The Netlink service can also be encapsulated in a different transport layer, if the CPC executes on a different node than the FEC. The FEC and CPC, using Netlink mechanisms, may choose to define a reliable protocol between each other. By default, however, Netlink provides unreliable communication.

Note that the FEC and CPC can both live in the same memory protection domain and use the connect() system call to create a path to the peer and talk to each other. We will not discuss this mechanism further other than to say that it is available. Throughout this document, we will refer interchangeably to the FEC to mean kernel space and the CPC to mean user space. This denomination is not meant, however, to restrict the two components to these protection domains or to the same compute node.

Note: Netlink allows participation in IP services by both service components.

2.1. Netlink Logical Model

In the diagram below we show a simple FEC<->CPC logical relationship. We use the IPv4 forwarding FEC (NETLINK_ROUTE, which is discussed further below) as an example.

                    Control Plane (CP)
                   .------------------------------------
                   |    /^^^^^\        /^^^^^\          |
                   |   |       |      / CPC-2 \         |
                   |   | CPC-1 |     | COPS   |         |
                   |   | ospfd |     |  PEP   |         |
                   |   |      /       \____ _/          |
                   |    \____/            |             |
                   |      |               |             |
                ****************************************|
                ************* BROADCAST WIRE  ************
   FE---------- *****************************************.
   |      IPv4 forwarding |    |           |             |
   |               FEC    |    |           |             |
   |       --------------/ ----|-----------|--------     |
   |       |            /      |           |       |     |
   |       |     .-------.  .-------.   .------.   |     |
   |       |     |Ingress|  | IPv4  |   |Egress|   |     |
   |       |     |police |  |Forward|   | QoS  |   |     |
   |       |     |_______|  |_______|   |Sched |   |     |
   |       |                             ------    |     |
   |        ---------------------------------------      |
   |                                                     |
    -----------------------------------------------------
        

Netlink logically models FECs and CPCs in the form of nodes interconnected to each other via a broadcast wire.

The wire is specific to a service. The example above shows the broadcast wire belonging to the extended IPv4 forwarding service.

Nodes (CPCs or FECs as illustrated above) connect to the wire and register to receive specific messages. CPCs may connect to multiple wires if it helps them to control the service better. All nodes (CPCs and FECs) dump packets on the broadcast wire. Packets can be discarded by the wire if they are malformed or not specifically formatted for the wire. Dropped packets are not seen by any of the nodes. The Netlink service may signal an error to the sender if it detects a malformed Netlink packet.

Packets sent on the wire can be broadcast, multicast, or unicast. FECs or CPCs register for specific messages of interest for processing or just monitoring purposes.
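
The delivery rule implied by this logical model can be sketched in C as a toy, in-memory model. This is not how the kernel implements the wire: the types wire_node and wire_dest, the constant WIRE_ALL_GROUPS, and the function wire_delivers() are hypothetical names introduced only for illustration. The sketch assumes that registration is expressed as a bitmask of message groups and that a packet with no target group is unicast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node (CPC or FEC) attached to one service's wire. */
struct wire_node {
    uint32_t id;      /* illustrative unicast address of the node */
    uint32_t groups;  /* bitmask of message groups the node registered for */
};

/* Hypothetical destination carried with a packet placed on the wire. */
struct wire_dest {
    uint32_t id;      /* unicast target; consulted only when groups == 0 */
    uint32_t groups;  /* target groups; WIRE_ALL_GROUPS means broadcast */
};

#define WIRE_ALL_GROUPS UINT32_MAX

/* Returns true if the wire would hand the packet to this node. */
bool wire_delivers(const struct wire_node *node, const struct wire_dest *dst)
{
    assert(node != NULL && dst != NULL);
    if (dst->groups != 0)
        return (node->groups & dst->groups) != 0;  /* multicast or broadcast */
    return node->id == dst->id;                    /* unicast */
}
```

In this model, a node that registered for group bit 0x1 receives packets addressed to that group and broadcast packets, but not unicast packets addressed to another node.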

Appendices 1 and 2 have a high level overview of this interaction.

2.2. Message Format

There are three levels to a Netlink message: The general Netlink message header, the IP service specific template, and the IP service specific data.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                   Netlink message header                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  IP Service Template                          |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  IP Service specific data in TLVs             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

The Netlink message is used to communicate between the FEC and CPC for parameterization of the FECs, asynchronous event notification of FEC events to the CPCs, and statistics querying/gathering (typically by a CPC).

The Netlink message header is generic for all services, whereas the IP Service Template header is specific to a service. Each IP Service then carries parameterization data (CPC->FEC direction) or response (FEC->CPC direction). These parameterizations are in TLV (Type-Length-Value) format and are unique to the service.
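
As a minimal sketch of the three levels, the following C function lays out one message for the NETLINK_ROUTE service using the Linux user-space headers. The choice of RTM_NEWROUTE, struct rtmsg as the service template, and RTA_DST as the single TLV is an assumption made for illustration, and build_route_msg() is a hypothetical helper name. The message is only built in memory and is not sent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Lay out header + template + one TLV in buf (which must be 4-byte aligned).
 * Returns the total message length, or 0 on a bad address or short buffer.
 */
size_t build_route_msg(void *buf, size_t buflen, const char *dst,
                       unsigned char prefix_len)
{
    struct in_addr addr;
    size_t total = NLMSG_SPACE(sizeof(struct rtmsg)) + RTA_SPACE(sizeof(addr));

    assert(buf != NULL && dst != NULL);
    if (inet_pton(AF_INET, dst, &addr) != 1 || buflen < total)
        return 0;
    memset(buf, 0, total);

    /* Level 1: the generic Netlink message header. */
    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    nlh->nlmsg_type = RTM_NEWROUTE;
    nlh->nlmsg_flags = NLM_F_REQUEST;

    /* Level 2: the IP service template (here, the route template). */
    struct rtmsg *rtm = NLMSG_DATA(nlh);
    rtm->rtm_family = AF_INET;
    rtm->rtm_dst_len = prefix_len;

    /* Level 3: service-specific data as a TLV, placed after the template. */
    struct rtattr *rta =
        (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
    rta->rta_type = RTA_DST;
    rta->rta_len = RTA_LENGTH(sizeof(addr));
    memcpy(RTA_DATA(rta), &addr, sizeof(addr));

    /* The header length covers the template and all TLVs. */
    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
    return nlh->nlmsg_len;
}
```

The alignment macros (NLMSG_ALIGN, RTA_ALIGN) keep every level on a 4-byte boundary, which is why the TLV offset is computed from the aligned header length rather than from the raw structure sizes.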

The different parts of the Netlink message are discussed in the following sections.

2.3. Protocol Model

This section expands on how Netlink provides the mechanism for service-oriented FEC and CPC interaction.

2.3.1. Service Addressing

Access is provided by first connecting to the service on the FE. The connection is achieved by making a socket() system call to the PF_NETLINK domain. Each FEC is identified by a protocol number. One may open either SOCK_RAW or SOCK_DGRAM type sockets, although Netlink does not distinguish between the two. The socket connection provides the basis for the FE<->CP addressing.

Connecting to a service is followed (at any point during the life of the connection) by either issuing a service-specific command (from the CPC to the FEC, mostly for configuration purposes), issuing a statistics-collection command, or subscribing/unsubscribing to service events. Closing the socket terminates the transaction. Refer to Appendices 1 and 2 for examples.
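
A minimal sketch of the connect and close steps, assuming a Linux host where Python exposes AF_NETLINK. The function names are illustrative only; the protocol number 0 for NETLINK_ROUTE comes from the Linux headers, not from the text above.

```python
import socket

NETLINK_ROUTE = 0   # protocol number of the routing/link service on Linux

def open_service(protocol=NETLINK_ROUTE, groups=0):
    """Open a Netlink socket to one service (FEC) and return it.

    Linux-only: AF_NETLINK is not defined on other platforms.
    """
    family = getattr(socket, "AF_NETLINK", None)
    if family is None:
        raise OSError("AF_NETLINK sockets are not available on this platform")
    # SOCK_RAW and SOCK_DGRAM behave the same for Netlink.
    sock = socket.socket(family, socket.SOCK_RAW, protocol)
    # The address is (pid, multicast group mask); pid 0 lets the kernel pick an ID.
    sock.bind((0, groups))
    return sock

def close_service(sock):
    """Closing the socket ends the session with the service."""
    sock.close()
```

A caller would wrap its commands between open_service() and close_service(); a non-zero groups mask is how this sketch subscribes to service events.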

2.3.2. Netlink Message Header

Netlink messages consist of a byte stream with one or multiple Netlink headers and an associated payload. If the payload is too big to fit into a single message, it can be split over multiple Netlink messages, collectively called a multipart message. For multipart messages, the first and all following headers have the NLM_F_MULTI Netlink header flag set, except for the last header, which has the Netlink header type NLMSG_DONE.

The Netlink message header is shown below.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Length                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Type              |           Flags              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Process ID (PID)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

The fields in the header are:

Length: 32 bits The length of the message in bytes, including the header.

Type: 16 bits
This field describes the message content. It can be one of the standard message types:

   NLMSG_NOOP    Message is ignored.
   NLMSG_ERROR   The message signals an error and the payload
                 contains a nlmsgerr structure.  This can be looked
                 at as a NACK and typically it is from FEC to CPC.
   NLMSG_DONE    Message terminates a multipart message.

Individual IP services specify more message types, e.g., NETLINK_ROUTE service specifies several types, such as RTM_NEWLINK, RTM_DELLINK, RTM_GETLINK, RTM_NEWADDR, RTM_DELADDR, RTM_NEWROUTE, RTM_DELROUTE, etc.
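
The sketch below shows one way a receiver might walk a buffer of messages and act on the standard types: NLMSG_NOOP is skipped, and NLMSG_DONE ends a multipart reply. The numeric values (1, 2, 3 for the standard types, 0x2 for NLM_F_MULTI) are those of the Linux headers; the function names are hypothetical, and NLMSG_ERROR handling is deliberately left to the caller.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")   # length, type, flags, sequence number, PID
NLMSG_NOOP, NLMSG_ERROR, NLMSG_DONE = 1, 2, 3
NLM_F_MULTI = 0x2

def iter_messages(buf):
    """Yield (type, flags, seq, pid, payload) for each message in buf."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, mtype, flags, seq, pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break                            # truncated or corrupt message
        yield mtype, flags, seq, pid, buf[offset + NLMSG_HDR.size:offset + length]
        offset += (length + 3) & ~3          # messages are 4-byte aligned

def collect_reply(buf):
    """Return (parts, finished); finished is True once the reply is complete."""
    parts = []
    for mtype, flags, _seq, _pid, payload in iter_messages(buf):
        if mtype == NLMSG_NOOP:
            continue                         # ignored by definition
        if mtype == NLMSG_DONE:
            return parts, True               # end of a multipart message
        parts.append((mtype, payload))       # includes NLMSG_ERROR, if any
        if not flags & NLM_F_MULTI:
            return parts, True               # single-part reply
    return parts, False                      # more parts expected
```

When finished is False, the caller would read again from the socket and keep appending parts until NLMSG_DONE arrives.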

Flags: 16 bits
The standard flag bits used in Netlink are:

   NLM_F_REQUEST   Must be set on all request messages (typically
                   from user space to kernel space).
   NLM_F_MULTI     Indicates the message is part of a multipart
                   message terminated by NLMSG_DONE.
   NLM_F_ACK       Request for an acknowledgment on success.
                   Typical direction of request is from user space
                   (CPC) to kernel space (FEC).
   NLM_F_ECHO      Echo this request.  Typical direction of request
                   is from user space (CPC) to kernel space (FEC).

Additional flag bits for GET requests on config information in the FEC:

   NLM_F_ROOT      Return the complete table instead of a single
                   entry.
   NLM_F_MATCH     Return all entries matching criteria passed in
                   message content.
   NLM_F_ATOMIC    Return an atomic snapshot of the table being
                   referenced.  This may require special privileges
                   because it has the potential to interrupt service
                   in the FE for a longer time.

Convenience macros for flag bits:

   NLM_F_DUMP      This is NLM_F_ROOT or'ed with NLM_F_MATCH.

Additional flag bits for NEW requests:

   NLM_F_REPLACE   Replace existing matching config object with this
                   request.
   NLM_F_EXCL      Don't replace the config object if it already
                   exists.
   NLM_F_CREATE    Create config object if it doesn't already exist.
   NLM_F_APPEND    Add to the end of the object list.

For those familiar with BSDish use of such operations in route sockets, the equivalent translations are:

- BSD ADD operation equates to NLM_F_CREATE or-ed with NLM_F_EXCL
- BSD CHANGE operation equates to NLM_F_REPLACE
- BSD Check operation equates to NLM_F_EXCL
- BSD APPEND equivalent is actually mapped to NLM_F_CREATE
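
The BSD equivalences can be written down as flag combinations. This is only a sketch: the numeric values are those of the Linux headers, while the table name and the helper new_request_flags are invented for illustration.

```python
NLM_F_REQUEST = 0x001
NLM_F_ACK     = 0x004
# Modifiers for NEW requests.
NLM_F_REPLACE = 0x100
NLM_F_EXCL    = 0x200
NLM_F_CREATE  = 0x400
NLM_F_APPEND  = 0x800
# Modifiers for GET requests reuse the same bit positions.
NLM_F_ROOT    = 0x100
NLM_F_MATCH   = 0x200
NLM_F_DUMP    = NLM_F_ROOT | NLM_F_MATCH

# BSD route-socket operations expressed as NEW-request modifiers,
# following the list above.
BSD_EQUIVALENTS = {
    "add":    NLM_F_CREATE | NLM_F_EXCL,
    "change": NLM_F_REPLACE,
    "check":  NLM_F_EXCL,
    "append": NLM_F_CREATE,
}

def new_request_flags(operation, want_ack=True):
    """Header flags for a NEW request that mimics the given BSD operation."""
    flags = NLM_F_REQUEST | BSD_EQUIVALENTS[operation]
    if want_ack:
        flags |= NLM_F_ACK
    return flags
```

Because the GET and NEW modifiers share bit positions, the message type decides how a given bit is interpreted.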

Sequence Number: 32 bits The sequence number of the message.

Process ID (PID): 32 bits The PID of the process sending the message. The PID is used by the kernel to multiplex to the correct sockets. A PID of zero is used when sending messages to user space from the kernel.
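
Putting the five fields together, the header is 16 bytes in host byte order. The sketch below packs and unpacks it with Python's struct module; build_message and parse_header are illustrative names, not part of any Netlink library.

```python
import struct

# Length (32), Type (16), Flags (16), Sequence Number (32), PID (32)
NLMSG_HDR = struct.Struct("=IHHII")

def build_message(msg_type, flags, seq, pid, payload=b""):
    """Prefix payload with a Netlink header; Length covers header plus payload."""
    length = NLMSG_HDR.size + len(payload)
    return NLMSG_HDR.pack(length, msg_type, flags, seq, pid) + payload

def parse_header(data):
    """Return the header fields of the first message in data as a dict."""
    length, msg_type, flags, seq, pid = NLMSG_HDR.unpack_from(data)
    return {"length": length, "type": msg_type, "flags": flags,
            "seq": seq, "pid": pid}
```

A message built with a 16-byte service template as payload is therefore 32 bytes long, and its Length field says so.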

2.3.2.1. Mechanisms for Creating Protocols

One could create a reliable protocol between an FEC and a CPC by using the combination of sequence numbers, ACKs, and retransmit timers. Both sequence numbers and ACKs are provided by Netlink; timers are provided by Linux.

One could create a heartbeat protocol between the FEC and CPC by using the ECHO flags and the NLMSG_NOOP message.
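
One possible shape for the reliability mechanism is sketched below: each request gets a sequence number, stays in a pending table until its ACK arrives, and is offered for retransmission when its timer expires. The class name, the timeout, and the retry limit are assumptions made for this example; nothing here is prescribed by Netlink itself.

```python
import time

class ReliableSender:
    """Tracks unacknowledged requests and decides when to resend them."""

    def __init__(self, timeout=1.0, max_retries=3):
        self.timeout = timeout
        self.max_retries = max_retries
        self.next_seq = 1
        self.pending = {}                # seq -> [payload, deadline, retries left]

    def register(self, payload, now=None):
        """Assign a sequence number to a request that is about to be sent."""
        now = time.monotonic() if now is None else now
        seq = self.next_seq
        self.next_seq = ((self.next_seq + 1) & 0xFFFFFFFF) or 1   # 32-bit wrap
        self.pending[seq] = [payload, now + self.timeout, self.max_retries]
        return seq

    def acknowledge(self, seq):
        """Forget a request once its ACK arrives; False if seq is unknown."""
        return self.pending.pop(seq, None) is not None

    def due(self, now=None):
        """Return (seq, payload) pairs to retransmit; drop exhausted requests."""
        now = time.monotonic() if now is None else now
        resend = []
        for seq, entry in list(self.pending.items()):
            payload, deadline, retries = entry
            if now < deadline:
                continue
            if retries == 0:
                del self.pending[seq]    # give up; the caller may report failure
                continue
            entry[1], entry[2] = now + self.timeout, retries - 1
            resend.append((seq, payload))
        return resend
```

A heartbeat could reuse the same table, registering a periodic NLMSG_NOOP request with the ECHO flag and treating an exhausted entry as a lost peer.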

2.3.2.2. The ACK Netlink Message

This message is actually used to denote both an ACK and a NACK. Typically, the direction is from FEC to CPC (in response to an ACK request message). However, the CPC should be able to send ACKs back to FEC when requested. The semantics for this are IP service specific.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Netlink message header                  |
   |                       type = NLMSG_ERROR                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Error code                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       OLD Netlink message header              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Error code: integer (typically 32 bits)

An error code of zero indicates that the message is an ACK response. An ACK response message contains the original Netlink message header, which can be used to compare against (sent sequence numbers, etc).

A non-zero error code message is equivalent to a Negative ACK (NACK). In such a situation, the Netlink data that was sent down to the kernel is returned appended to the original Netlink message header. An error code printable via the perror() is also set (not in the message header, rather in the executing environment state variable).
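
A sketch of decoding such a message: the payload starts with the error code, followed by the header of the request being answered. It assumes a 32-bit signed error code and the Linux convention that a NACK carries a negated errno value; both are properties of the Linux implementation rather than statements from the text above, and the function names are hypothetical.

```python
import errno
import os
import struct

NLMSG_HDR = struct.Struct("=IHHII")   # length, type, flags, sequence number, PID
ERR_CODE = struct.Struct("=i")        # signed error code; 0 means ACK
NLMSG_ERROR = 2

def parse_ack(message):
    """Decode one NLMSG_ERROR message.

    Returns (is_ack, error_code, original_header), where original_header is
    the (length, type, flags, seq, pid) tuple of the request being answered.
    """
    _length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(message, 0)
    if msg_type != NLMSG_ERROR:
        raise ValueError("not an NLMSG_ERROR message")
    (code,) = ERR_CODE.unpack_from(message, NLMSG_HDR.size)
    original = NLMSG_HDR.unpack_from(message, NLMSG_HDR.size + ERR_CODE.size)
    return code == 0, code, original

def describe(code):
    """Readable text for an error code, assuming Linux's negated errno values."""
    if code == 0:
        return "ACK"
    return "%s: %s" % (errno.errorcode.get(-code, "UNKNOWN"), os.strerror(-code))
```

Comparing the sequence number in original_header with the one that was sent tells the caller which request the ACK or NACK belongs to.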

2.3.3. FE System Services' Templates

These are services that are offered by the system for general use by other services. They include the ability to configure, gather statistics and listen to changes in shared resources. IP address management, link events, etc. fit here. We create this section for these services for logical separation, despite the fact that they are accessed via the NETLINK_ROUTE FEC. The reason that they exist within NETLINK_ROUTE is due to historical cruft: the BSD 4.4 Route Sockets implemented them as part of the IPv4 forwarding sockets.

2.3.3.1. Network Interface Service Module

This service provides the ability to create, remove, or get information about a specific network interface. The network interface can be either physical or virtual and is network protocol independent (e.g., an x.25 interface can be defined via this message). The Interface service message template is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Family    |   Reserved  |          Device Type              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Interface Index                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Device Flags                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Change Mask                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Family: 8 bits This is always set to AF_UNSPEC.

Device Type: 16 bits This defines the type of the link. The link could be Ethernet, a tunnel, etc. We are interested only in IPv4, although the link type is L3 protocol-independent.

Interface Index: 32 bits Uniquely identifies interface.

Device Flags: 32 bits

   IFF_UP            Interface is administratively up.
   IFF_BROADCAST     Valid broadcast address set.
   IFF_DEBUG         Internal debugging flag.
   IFF_LOOPBACK      Interface is a loopback interface.
   IFF_POINTOPOINT   Interface is a point-to-point link.
   IFF_RUNNING       Interface is operationally up.
   IFF_NOARP         No ARP protocol needed for this interface.
   IFF_PROMISC       Interface is in promiscuous mode.
   IFF_NOTRAILERS    Avoid use of trailers.
   IFF_ALLMULTI      Receive all multicast packets.
   IFF_MASTER        Master of a load balancing bundle.
   IFF_SLAVE         Slave of a load balancing bundle.
   IFF_MULTICAST     Supports multicast.
   IFF_PORTSEL       Is able to select media type via ifmap.
   IFF_AUTOMEDIA     Auto media selection active.
   IFF_DYNAMIC       Interface was dynamically created.

Change Mask: 32 bits Reserved for future use. Must be set to 0xFFFFFFFF.

   Applicable attributes:
          Attribute            Description
          ..........................................................
          IFLA_UNSPEC          Unspecified.
          IFLA_ADDRESS         Hardware address interface L2 address.
          IFLA_BROADCAST       Hardware address L2 broadcast
                               address.
          IFLA_IFNAME          ASCII string device name.
          IFLA_MTU             MTU of the device.
          IFLA_LINK            ifindex of link to which this device
                               is bound.
          IFLA_QDISC           ASCII string defining egress root
                               queuing discipline.
          IFLA_STATS           Interface statistics.
        

Netlink message types specific to this service: RTM_NEWLINK, RTM_DELLINK, and RTM_GETLINK
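
To make the template concrete, the sketch below builds an RTM_GETLINK request for all interfaces and decodes the fixed part of a reply. The struct format mirrors the diagram above (Family, a reserved byte, Device Type, Interface Index, Device Flags, Change Mask). Numeric constants are taken from the Linux headers, only a few device flags are listed, and the function names are illustrative.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")     # length, type, flags, sequence number, PID
IFINFOMSG = struct.Struct("=BxHiII")    # family, reserved, type, index, flags, change
AF_UNSPEC = 0
RTM_GETLINK = 18
NLM_F_REQUEST, NLM_F_ROOT, NLM_F_MATCH = 0x001, 0x100, 0x200
CHANGE_MASK = 0xFFFFFFFF                # value required by the field description

# A subset of the device flags, keyed by bit value.
IFF_NAMES = {0x1: "IFF_UP", 0x2: "IFF_BROADCAST", 0x8: "IFF_LOOPBACK",
             0x40: "IFF_RUNNING", 0x1000: "IFF_MULTICAST"}

def build_getlink_dump(seq, pid=0):
    """RTM_GETLINK request asking for the complete interface table."""
    body = IFINFOMSG.pack(AF_UNSPEC, 0, 0, 0, CHANGE_MASK)
    flags = NLM_F_REQUEST | NLM_F_ROOT | NLM_F_MATCH
    header = NLMSG_HDR.pack(NLMSG_HDR.size + len(body), RTM_GETLINK, flags, seq, pid)
    return header + body

def parse_ifinfo(payload):
    """Decode the interface template at the start of an RTM_NEWLINK payload."""
    family, dev_type, index, flags, change = IFINFOMSG.unpack_from(payload)
    names = [name for bit, name in sorted(IFF_NAMES.items()) if flags & bit]
    return {"family": family, "type": dev_type, "index": index,
            "flags": names, "change": change}
```

The IFLA_* attributes listed above would follow the 16-byte template as TLVs.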

2.3.3.2. IP Address Service Module

This service provides the ability to add, remove, or receive information about an IP address associated with an interface. The address provisioning service message template is shown below.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Family    |     Length    |     Flags     |    Scope      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Interface Index                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Family: 8 bits Address Family: AF_INET for IPv4; and AF_INET6 for IPv6.

Length: 8 bits The length of the address mask.

Flags: 8 bits

   IFA_F_SECONDARY    For secondary address (alias interface).
   IFA_F_PERMANENT    For a permanent address set by the user.  When
                      this is not set, it means the address was
                      dynamically created (e.g., by stateless
                      autoconfiguration).
   IFA_F_DEPRECATED   Defines deprecated (IPv6) address.
   IFA_F_TENTATIVE    Defines tentative (IPv6) address (duplicate
                      address detection is still in progress).

Scope: 8 bits
The address scope in which the address stays valid.

   SCOPE_UNIVERSE     Global scope.
   SCOPE_SITE         (IPv6 only) Only valid within this site.
   SCOPE_LINK         Valid only on this device.
   SCOPE_HOST         Valid only on this host.

   Applicable attributes:
          Attribute            Description
          ..........................................................
          IFA_UNSPEC           Unspecified.
          IFA_ADDRESS          Raw protocol address of interface.
          IFA_LOCAL            Raw protocol local address.
          IFA_LABEL            ASCII string name of the interface.
          IFA_BROADCAST        Raw protocol broadcast address.
          IFA_ANYCAST          Raw protocol anycast address.
          IFA_CACHEINFO        Cache address information.

Netlink messages specific to this service: RTM_NEWADDR, RTM_DELADDR, and RTM_GETADDR.
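
As an illustration, the body of an RTM_NEWADDR request for an IPv4 address could be assembled as below: the 8-byte template followed by IFA_LOCAL and IFA_ADDRESS attributes. Flag and attribute numbers are those of the Linux headers; the helper name, the default arguments, and the address in the usage note are assumptions for this sketch, and the Netlink message header still has to be prepended.

```python
import socket
import struct

IFADDRMSG = struct.Struct("=BBBBI")   # family, prefix length, flags, scope, index
IFA_F_SECONDARY, IFA_F_DEPRECATED = 0x01, 0x20
IFA_F_TENTATIVE, IFA_F_PERMANENT = 0x40, 0x80
IFA_ADDRESS, IFA_LOCAL = 1, 2
SCOPE_UNIVERSE = 0

def build_addr_body(address, prefix_len, ifindex,
                    flags=IFA_F_PERMANENT, scope=SCOPE_UNIVERSE):
    """Address template plus IFA_LOCAL and IFA_ADDRESS TLVs for an IPv4 address."""
    raw = socket.inet_aton(address)                  # 4 bytes, network order
    template = IFADDRMSG.pack(socket.AF_INET, prefix_len, flags, scope, ifindex)
    # Each TLV: 16-bit length (header + value), 16-bit type, then the value.
    tlvs = b"".join(struct.pack("=HH", 4 + len(raw), attr_type) + raw
                    for attr_type in (IFA_LOCAL, IFA_ADDRESS))
    return template + tlvs
```

For example, build_addr_body("192.0.2.1", 24, 2) describes 192.0.2.1/24 on the interface whose index is 2.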

3. Currently Defined Netlink IP Services

Although there are many other IP services defined that are using Netlink, as mentioned earlier, we will talk only about a handful of those integrated into kernel version 2.4.6. These are:

NETLINK_ROUTE, NETLINK_FIREWALL, and NETLINK_ARPD.

3.1. IP Service NETLINK_ROUTE

This service allows CPCs to modify the IPv4 routing table in the Forwarding Engine. It can also be used by CPCs to receive routing updates, as well as to collect statistics.

3.1.1. Network Route Service Module

This service provides the ability to create, remove or receive information about a network route. The service message template is shown below.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Family    |  Src length   |  Dest length  |     TOS       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Table ID   |   Protocol    |     Scope     |     Type      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Flags                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Family: 8 bits Address Family: AF_INET for IPv4; and AF_INET6 for IPv6.

Src length: 8 bits Prefix length of source IP address.

Dest length: 8 bits Prefix length of destination IP address.

TOS: 8 bits
The 8-bit TOS (should be deprecated to make room for DSCP).

Table ID: 8 bits
Table identifier. Up to 255 route tables are supported.

   RT_TABLE_UNSPEC    An unspecified routing table.
   RT_TABLE_DEFAULT   The default table.
   RT_TABLE_MAIN      The main table.
   RT_TABLE_LOCAL     The local table.

The user may assign arbitrary values between RT_TABLE_UNSPEC(0) and RT_TABLE_DEFAULT(253).

   Protocol: 8 bits
   Identifies what/who added the route.
                 Protocol          Route origin.
                 ..............................................
                 RTPROT_UNSPEC     Unknown.
                 RTPROT_REDIRECT   By an ICMP redirect.
                 RTPROT_KERNEL     By the kernel.
                 RTPROT_BOOT       During bootup.
                 RTPROT_STATIC     By the administrator.
        

Values larger than RTPROT_STATIC(4) are not interpreted by the kernel; they are just for user information. They may be used to tag the source of routing information or to distinguish between multiple routing daemons. See <linux/rtnetlink.h> for the routing daemon identifiers that are already assigned.

Scope: 8 bits
Route scope (valid distance to destination).

   RT_SCOPE_UNIVERSE   Global route.
   RT_SCOPE_SITE       Interior route in the local autonomous system.
   RT_SCOPE_LINK       Route on this link.
   RT_SCOPE_HOST       Route on the local host.
   RT_SCOPE_NOWHERE    Destination does not exist.

The values between RT_SCOPE_UNIVERSE(0) and RT_SCOPE_SITE(200) are available to the user.

Type: 8 bits The type of route.

                 Route type        Description
                 ----------------------------------------------------
                 RTN_UNSPEC        Unknown route.
                 RTN_UNICAST       A gateway or direct route.
                 RTN_LOCAL         A local interface route.
                 RTN_BROADCAST     A local broadcast route
                                   (sent as a broadcast).
                 RTN_ANYCAST       An anycast route.
                 RTN_MULTICAST     A multicast route.
                 RTN_BLACKHOLE     A silent packet dropping route.
                 RTN_UNREACHABLE   An unreachable destination.
                                   Packets dropped and host
                                   unreachable ICMPs are sent to the
                                   originator.
                 RTN_PROHIBIT      A packet rejection route.  Packets
                                   are dropped and communication
                                   prohibited ICMPs are sent to the
                                   originator.
                 RTN_THROW         When used with policy routing,
                                   continue routing lookup in another
                                   table.  Under normal routing,
                                   packets are dropped and net
                                   unreachable ICMPs are sent to the
                                   originator.
                 RTN_NAT           A network address translation
                                   rule.
                 RTN_XRESOLVE      Refer to an external resolver (not
                                   implemented).
        

Flags: 32 bits
Further qualify the route.

   RTM_F_NOTIFY     If the route changes, notify the user.
   RTM_F_CLONED     Route is cloned from another route.
   RTM_F_EQUALIZE   Allow randomization of next hop path in
                    multi-path routing (currently not implemented).

   Attributes applicable to this service:
                 Attribute       Description
                 ---------------------------------------------------
                 RTA_UNSPEC      Ignored.
                 RTA_DST         Protocol address for route
                                 destination address.
                 RTA_SRC         Protocol address for route source
                                 address.
                 RTA_IIF         Input interface index.
                 RTA_OIF         Output interface index.
                 RTA_GATEWAY     Protocol address for the gateway of
                                 the route
                 RTA_PRIORITY    Priority of route.
                 RTA_PREFSRC     Preferred source address in cases
                                 where more than one source address
                                 could be used.
                 RTA_METRICS     Route metrics attributed to route
                                 and associated protocols (e.g.,
                                 RTT, initial TCP window, etc.).
                 RTA_MULTIPATH   Multipath route next hop's
                                 attributes.
                 RTA_PROTOINFO   Firewall based policy routing
                                 attribute.
                 RTA_FLOW        Route realm.
                 RTA_CACHEINFO   Cached route information.
        

Additional Netlink message types applicable to this service: RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE
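
The following sketch assembles the body of an RTM_NEWROUTE request for a static IPv4 unicast route in the main table: the 12-byte route template followed by RTA_DST, RTA_GATEWAY, and RTA_OIF attributes. Constants are those of the Linux headers, the helper names and the addresses in the usage note are made up for the example, and the Netlink header (for instance with NLM_F_CREATE or-ed with NLM_F_EXCL for an add) is left out.

```python
import socket
import struct

# family, src length, dest length, TOS, table ID, protocol, scope, type, flags
RTMSG = struct.Struct("=BBBBBBBBI")
RT_TABLE_MAIN = 254
RTPROT_STATIC = 4
RT_SCOPE_UNIVERSE = 0
RTN_UNICAST = 1
RTA_DST, RTA_OIF, RTA_GATEWAY = 1, 4, 5

def attr(attr_type, value):
    """One TLV: 16-bit length, 16-bit type, value, padding to 4 bytes."""
    length = 4 + len(value)
    return struct.pack("=HH", length, attr_type) + value + b"\0" * (-length % 4)

def build_route_body(dst, dst_len, gateway, oif):
    """Route template plus destination, gateway, and output interface TLVs."""
    template = RTMSG.pack(socket.AF_INET, 0, dst_len, 0, RT_TABLE_MAIN,
                          RTPROT_STATIC, RT_SCOPE_UNIVERSE, RTN_UNICAST, 0)
    return (template
            + attr(RTA_DST, socket.inet_aton(dst))
            + attr(RTA_GATEWAY, socket.inet_aton(gateway))
            + attr(RTA_OIF, struct.pack("=i", oif)))
```

For example, build_route_body("198.51.100.0", 24, "192.0.2.254", 2) describes a route to 198.51.100.0/24 via 192.0.2.254 out of interface index 2.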

3.1.2. Neighbor Setup Service Module

This service provides the ability to add, remove, or receive information about a neighbor table entry (e.g., an ARP entry or an IPv6 neighbor discovery entry). The service message template is shown below.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Family    |    Reserved1  |           Reserved2           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Interface Index                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           State             |     Flags     |     Type      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Family: 8 bits Address family: AF_INET for IPv4 and AF_INET6 for IPv6.

Interface Index: 32 bits The unique interface index.

State: 16 bits A bitmask of the following states:

   NUD_INCOMPLETE   Still attempting to resolve.
   NUD_REACHABLE    A confirmed working cache entry.
   NUD_STALE        An expired cache entry.
   NUD_DELAY        Neighbor no longer reachable.  Traffic sent,
                    waiting for confirmation.
   NUD_PROBE        A cache entry that is currently being
                    re-solicited.
   NUD_FAILED       An invalid cache entry.
   NUD_NOARP        A device which does not do neighbor discovery
                    (ARP).
   NUD_PERMANENT    A static entry.

Flags: 8 bits

   NTF_PROXY        A proxy ARP entry.
   NTF_ROUTER       An IPv6 router.

   Attributes applicable to this service:
                 Attributes      Description
                 ------------------------------------
                 NDA_UNSPEC      Unknown type.
                 NDA_DST         A neighbor cache network layer
                                 destination address.
                 NDA_LLADDR      A neighbor cache link layer
                                 address.
                 NDA_CACHEINFO   Cache statistics.
        

Additional Netlink message types applicable to this service: RTM_NEWNEIGH, RTM_DELNEIGH, and RTM_GETNEIGH.
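
As a rough illustration of the template above, the following Python sketch packs and unpacks the 12-byte neighbor header with the standard struct module. The numeric values of the NUD_* and NTF_* constants are not given in this document; they are assumed here to match those in the Linux header <linux/neighbour.h>. The helper names are hypothetical.

```python
import socket
import struct

# Neighbor header from the template above, in host byte order, no padding:
# family (8 bits), reserved1 (8), reserved2 (16), interface index (32),
# state (16), flags (8), type (8).
NDMSG = struct.Struct("=BBHiHBB")

# Assumed values, as defined in <linux/neighbour.h>.
NUD_STATES = {
    0x01: "NUD_INCOMPLETE",
    0x02: "NUD_REACHABLE",
    0x04: "NUD_STALE",
    0x08: "NUD_DELAY",
    0x10: "NUD_PROBE",
    0x20: "NUD_FAILED",
    0x40: "NUD_NOARP",
    0x80: "NUD_PERMANENT",
}
NTF_PROXY = 0x08
NTF_ROUTER = 0x80


def pack_ndmsg(family, ifindex, state, flags=0, ndm_type=0):
    """Build the fixed neighbor header; both reserved fields are zero."""
    return NDMSG.pack(family, 0, 0, ifindex, state, flags, ndm_type)


def decode_state(state):
    """Return the names of all state bits set in the bitmask."""
    return [name for bit, name in sorted(NUD_STATES.items()) if state & bit]


header = pack_ndmsg(socket.AF_INET, ifindex=4, state=0x02 | 0x80)
family, _, _, ifindex, state, flags, ndm_type = NDMSG.unpack(header)
print(len(header), ifindex, decode_state(state))
```

In a real message this header would follow the Netlink header and be followed by the NDA_* attributes listed above.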

3.1.3. Traffic Control Service

This service provides the ability to provision, query, or listen to events under the auspices of traffic control. These include queuing disciplines (schedulers and queue treatment algorithms, e.g., a priority-based scheduler or the RED algorithm) and classifiers. The Linux Traffic Control Service is very flexible and allows for hierarchical cascading of the different blocks for traffic resource sharing.

          ++    ++                 +-----+   +-------+   ++     ++ .++
          || .  ||     +------+    |     |-->| Qdisc |-->||     ||  ||
          ||    ||---->|Filter|--->|Class|   +-------+   ||-+   ||  ||
          ||    ||  |  +------+    |     +---------------+| |   ||  ||
          || .  ||  |              +----------------------+ |   || .||
          || .  ||  |  +------+                             |   ||  ||
          ||    ||  +->|Filter|-_  +-----+   +-------+   ++ |   || .||
          || -->||  |  +------+  ->|     |-->| Qdisc |-->|| |   ||->||
          || .  ||  |              |Class|   +-------+   ||-+-->|| .||
   ->dev->||    ||  |  +------+ _->|     +---------------+|     ||  ||
          ||    ||  +->|Filter|-   +----------------------+     || .||
          ||    ||     +------+                                 || .||
          || .  |+----------------------------------------------+|  ||
          ||    |          Parent Queuing discipline             | .||
          || .  +------------------------------------------------+ .||
          || . . .. . . .. . .                 . .. .. .. .      .. ||
          |+--------------------------------------------------------+|
          |                 Parent Queuing discipline                |
          |                  (attached to egress device)             |
          +----------------------------------------------------------+
        

The above diagram shows an example of the Egress TC block. We try to be very brief here. For more information, please refer to [11]. A packet first goes through a filter that is used to identify a class to which the packet may belong. A class is essentially a terminal queuing discipline and has a queue associated with it. The queue may be subject to a simple algorithm, like FIFO, or a more complex one, like RED or a token bucket. The outermost queuing discipline, which is referred to as the parent, is typically associated with a scheduler. Within this scheduler hierarchy, however, there may be other scheduling algorithms, making the Linux Egress TC very flexible.

The service message template that makes this possible is shown below. This template is used in both the ingress and the egress queuing disciplines (refer to the egress traffic control model in the FE model section). Each of the specific components of the model has unique attributes that describe it best. The common attributes are described below.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Family    |  Reserved1    |         Reserved2             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Interface Index                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Qdisc handle                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Parent Qdisc                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        TCM Info                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Family: 8 bits Address family: AF_INET for IPv4 and AF_INET6 for IPv6.

Interface Index: 32 bits The unique interface index.

Qdisc handle: 32 bits Unique identifier for an instance of a queuing discipline. Typically, this is split into major:minor numbers of 16 bits each. The major number would also be the major number of the parent of this instance.

Parent Qdisc: 32 bits Used in hierarchical layering of queuing disciplines. If this value and the Qdisc handle are the same and equal to TC_H_ROOT, then the defined qdisc is the topmost layer, known as the root qdisc.

TCM Info: 32 bits Typically set to 1 by the FE, except when the Qdisc instance is in use, in which case it is set to imply a reference count. From the CPC towards the FEC, this is typically set to 0, except when used in the context of filters. In that case, this 32-bit field is split into a 16-bit priority field and a 16-bit protocol field. The protocol is defined in the kernel source file <include/linux/if_ether.h>; the most commonly used one is ETH_P_IP (the IP protocol).

The priority is used for conflict resolution when filters intersect in their expressions.
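
The handle and TCM Info encodings described above can be sketched in Python as follows. This is only an illustration: the value of TC_H_ROOT is assumed from <linux/pkt_sched.h>, and the placement of the priority in the upper 16 bits with the protocol in network byte order in the lower 16 bits is an assumption based on how the Linux tc utility fills the field, since the text above does not state the order. The function names are hypothetical.

```python
import socket

TC_H_ROOT = 0xFFFFFFFF   # assumed value, as in <linux/pkt_sched.h>
ETH_P_IP = 0x0800        # as in <linux/if_ether.h>


def make_handle(major, minor):
    """Combine 16-bit major and minor numbers into a 32-bit handle."""
    return ((major & 0xFFFF) << 16) | (minor & 0xFFFF)


def split_handle(handle):
    """Return (major, minor) for a 32-bit handle."""
    return (handle >> 16) & 0xFFFF, handle & 0xFFFF


def make_filter_info(priority, protocol):
    """TCM Info for a filter: priority in the upper 16 bits (assumed),
    protocol in network byte order in the lower 16 bits (assumed)."""
    return ((priority & 0xFFFF) << 16) | socket.htons(protocol)


def split_filter_info(info):
    """Return (priority, protocol) with the protocol in host byte order."""
    return (info >> 16) & 0xFFFF, socket.ntohs(info & 0xFFFF)


# The handle written "100:1" uses hexadecimal major and minor numbers.
handle = make_handle(0x100, 0x1)
print(hex(handle), split_handle(handle))
print(split_filter_info(make_filter_info(10, ETH_P_IP)))
```

With these helpers, the handle 100:1 and parent 100:0 used in Appendix 3 come out as 0x1000001 and 0x1000000.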

   Generic attributes applicable to this service:
                Attribute        Description
                ------------------------------------
                TCA_KIND         Canonical name of FE component.
                TCA_STATS        Generic usage statistics of FEC.
                TCA_RATE         Rate estimator being attached to
                                 FEC.  Takes snapshots of stats to
                                 compute rate.
                TCA_XSTATS       Specific statistics of FEC.
                TCA_OPTIONS      Nested FEC-specific attributes.
        

Appendix 3 has an example of configuring an FE component for a FIFO Qdisc.

Additional Netlink message types applicable to this service: RTM_NEWQDISC, RTM_DELQDISC, RTM_GETQDISC, RTM_NEWTCLASS, RTM_DELTCLASS, RTM_GETTCLASS, RTM_NEWTFILTER, RTM_DELTFILTER, and RTM_GETTFILTER.

3.2. IP Service NETLINK_FIREWALL

This service allows CPCs to receive, manipulate, and re-inject packets via the IPv4 firewall service modules in the FE. A firewall rule is first inserted to activate packet redirection. The CPC informs the FEC whether it would like to receive just the metadata on the packet or the actual data as well and, if the actual data is desired, the maximum data length to be redirected. The redirected packets are still stored in the FEC, awaiting a verdict from the CPC. The verdict could constitute a simple accept or drop decision on the packet, in which case the verdict is imposed on the packet still sitting in the FEC. The verdict may also include a modified packet to be sent on as a replacement.

Two types of messages can be sent from the CPC to the FEC: Mode messages and Verdict messages. Mode messages are sent immediately to the FEC to describe what the CPC would like to receive. Verdict messages are sent to the FEC after a decision has been made on the fate of a received packet. The formats are described below.

The mode message is described first.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Mode    |    Reserved1  |           Reserved2             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Range                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Mode: 8 bits Control information on the packet to be sent to the CPC. The different types are:

   IPQ_COPY_META     Copy only packet metadata to CPC.
   IPQ_COPY_PACKET   Copy packet metadata and packet payloads to
                     CPC.

Range: 32 bits If IPQ_COPY_PACKET, this defines the maximum length to copy.
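
A minimal sketch of packing the mode message, following the 8-byte template above, might look like this in Python. The numeric values of the IPQ_COPY_* constants are assumptions taken from <linux/netfilter_ipv4/ip_queue.h>, and the helper name is hypothetical. The kernel's own structure may declare the range with a wider type on 64-bit hosts, so the size used here reflects the template rather than any particular kernel build.

```python
import struct

# Assumed values, as in <linux/netfilter_ipv4/ip_queue.h>.
IPQ_COPY_NONE = 0
IPQ_COPY_META = 1
IPQ_COPY_PACKET = 2

# Mode (8 bits), Reserved1 (8), Reserved2 (16), Range (32), host byte order.
MODE_MSG = struct.Struct("=BBHI")


def build_mode_msg(mode, copy_range=0):
    """Pack a mode message; the range only matters for IPQ_COPY_PACKET."""
    if mode != IPQ_COPY_PACKET:
        copy_range = 0
    return MODE_MSG.pack(mode, 0, 0, copy_range)


msg = build_mode_msg(IPQ_COPY_PACKET, copy_range=2048)
print(len(msg), MODE_MSG.unpack(msg))
```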

A packet and its associated metadata, as received by the CPC in user space, look as follows.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Packet ID                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Mark                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       timestamp_m                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       timestamp_u                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          hook                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       indev_name                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       outdev_name                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           hw_protocol       |        hw_type                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         hw_addrlen          |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       hw_addr                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       data_len                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Payload . . .                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Packet ID: 32 bits The unique packet identifier as passed to the CPC by the FEC.

Mark: 32 bits The internal metadata value set to identify the rule by which the packet was picked.

timestamp_m: 32 bits Packet arrival time (seconds)

timestamp_u: 32 bits Packet arrival time (microseconds, in addition to the seconds in timestamp_m)

hook: 32 bits The firewall module from which the packet was picked.

indev_name: 128 bits ASCII name of incoming interface.

outdev_name: 128 bits ASCII name of outgoing interface.

hw_protocol: 16 bits Hardware protocol, in network order.

hw_type: 16 bits Hardware type.

hw_addrlen: 8 bits Hardware address length.

hw_addr: 64 bits Hardware address.

data_len: 32 bits Length of packet data.

Payload: size defined by data_len The payload of the packet received.

The Verdict message format is as follows.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Value                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Packet ID                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Data Length                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Payload . . .                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        

Value: 32 bits

This is the verdict to be imposed on the packet still sitting in the FEC. Verdicts could be:

   NF_ACCEPT   Accept the packet and let it continue its traversal.
   NF_DROP     Drop the packet.

Packet ID: 32 bits The packet identifier as passed to the CPC by the FEC.

Data Length: 32 bits The data length of the modified packet (in bytes). If the packet is not modified, this is set to 0.

Payload: Size as defined by the Data Length field.
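
The verdict layout above can be sketched in Python as shown below. The values of NF_DROP and NF_ACCEPT are assumed from <linux/netfilter.h>, the field widths follow the template rather than a specific kernel structure, and the helper names are hypothetical.

```python
import struct

NF_DROP = 0     # assumed values, as in <linux/netfilter.h>
NF_ACCEPT = 1

# Value (32 bits), Packet ID (32), Data Length (32), host byte order.
VERDICT_HDR = struct.Struct("=III")


def build_verdict(value, packet_id, payload=b""):
    """Pack a verdict; a non-empty payload replaces the queued packet."""
    return VERDICT_HDR.pack(value, packet_id, len(payload)) + payload


def parse_verdict(message):
    """Split a verdict message into (value, packet_id, payload)."""
    value, packet_id, data_len = VERDICT_HDR.unpack_from(message)
    start = VERDICT_HDR.size
    return value, packet_id, message[start:start + data_len]


accept = build_verdict(NF_ACCEPT, packet_id=7)
replace = build_verdict(NF_ACCEPT, packet_id=8, payload=b"\x45\x00")
print(len(accept), parse_verdict(replace))
```

An unmodified packet is accepted or dropped with an empty payload, so Data Length is 0 and the message is just the 12-byte header.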

3.3. IP Service NETLINK_ARPD

This service is used by CPCs for managing the neighbor table in the FE. The message format used between the FEC and CPC is described in the section on the Neighbor Setup Service Module.

The CPC service is expected to participate in neighbor solicitation protocol(s).

A neighbor message of type RTM_NEWNEIGH is sent towards the CPC by the FE to inform the CPC of changes that might have happened on that neighbor's entry (e.g., a neighbor being perceived as unreachable).

RTM_GETNEIGH is used to solicit the CPC for information on a specific neighbor.

4. References
4.1. Normative References

[1] Braden, R., Clark, D. and S. Shenker, "Integrated Services in the Internet Architecture: an Overview", RFC 1633, June 1994.

[2] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, June 1995.

[3] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.

[4] Durham, D., Boyle, J., Cohen, R., Herzog, S., Rajan, R. and A. Sastry, "The COPS (Common Open Policy Service) Protocol", RFC 2748, January 2000.

[5] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

[6] Case, J., Fedor, M., Schoffstall, M. and C. Davin, "Simple Network Management Protocol (SNMP)", STD 15, RFC 1157, May 1990.

[7] Andersson, L., Doolan, P., Feldman, N., Fredette, A. and B. Thomas, "LDP Specification", RFC 3036, January 2001.

[8] Bernet, Y., Blake, S., Grossman, D. and A. Smith, "An Informal Management Model for DiffServ Routers", RFC 3290, May 2002.

4.2. Informative References
4.2. 资料性引用

[9] Wright, G. R. and W. R. Stevens, "TCP/IP Illustrated, Volume 2", Chapter 20, June 1995.

   [10] http://www.netfilter.org
        
   [11] http://diffserv.sourceforge.net
        
5. Security Considerations

Netlink lives in a trusted environment of a single host separated by kernel and user space. Linux capabilities ensure that only someone with CAP_NET_ADMIN capability (typically, the root user) is allowed to open sockets.

6. Acknowledgements

1) Andi Kleen, for man pages on netlink and rtnetlink.

2) Alexey Kuznetsov is credited for extending Netlink to the IP service delivery model. The original Netlink character device was written by Alan Cox.

3) Jeremy Ethridge, for taking the role of someone who did not understand Netlink and reviewing the document to make sure that it made sense.

Appendix 1: Sample Service Hierarchy

In the diagram below we show a simple IP service, foo, and the interaction it has between CP and FE components for the service (labels 1-3).

The diagram is also used to demonstrate CP<->FE addressing. In this section, we illustrate only the addressing semantics. In Appendix 2, the diagram is referenced again to define the protocol interaction between service foo's CPC and FEC (labels 4-10).

     CP
    [--------------------------------------------------------.
    |   .-----.                                              |
    |  |                         . -------.                  |
    |  |  CLI   |               /           \                |
    |  |        |              | CP protocol |               |
    |         /->> -.          |  component  | <-.           |
    |    __ _/      |          |   For       |   |           |
    |                |         | IP service  |   ^           |
    |                Y         |    foo      |   |           |
    |                |           ___________/    ^           |
    |                Y   1,4,6,8,9 /  ^ 2,5,10   | 3,7       |
     --------------- Y------------/---|----------|-----------
                     |           ^    |          ^
                   **|***********|****|**********|**********
                   ************* Netlink  layer ************
                   **|***********|****|**********|**********
           FE        |           |    ^          ^
           .-------- Y-----------Y----|--------- |----.
           |                    |              /      |
           |                    Y            /        |
           |          . --------^-------.  /          |
           |          |FE component/module|/          |
           |          |  for IP Service   |           |
    --->---|------>---|     foo           |----->-----|------>--
           |           -------------------            |
           |                                          |
           |                                          |
            ------------------------------------------
        

The control plane protocol for IP service foo does the following to connect to its FE counterpart. The steps below are also numbered above in the diagram.

1) Connect to the IP service foo by opening a socket. A typical connection would be made via a call to socket(AF_NETLINK, SOCK_RAW, NETLINK_FOO).

2) Bind to listen to specific asynchronous events for service foo.

3) Bind to listen to specific asynchronous FE events.
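
Steps 1 to 3 above might be sketched in Python as follows. NETLINK_FOO is a hypothetical protocol, so a caller would have to substitute a real one such as socket.NETLINK_ROUTE. The RTMGRP_* group bits are assumed from <linux/rtnetlink.h> and apply to NETLINK_ROUTE only. Using one socket per channel in the diagram is also an assumption, as are the function names.

```python
import socket

# Multicast group bits for NETLINK_ROUTE, assumed from <linux/rtnetlink.h>.
RTMGRP_LINK = 0x1
RTMGRP_NEIGH = 0x4
RTMGRP_TC = 0x8


def open_netlink(protocol, groups=0):
    """Open a Netlink socket (step 1) and bind it to the multicast groups
    whose asynchronous events should be received (steps 2 and 3)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, protocol)
    try:
        # The address is a (pid, groups) pair; pid 0 lets the kernel choose.
        sock.bind((0, groups))
    except OSError:
        sock.close()
        raise
    return sock


def listen_mask(*groups):
    """OR individual group bits into the mask passed to bind()."""
    mask = 0
    for group in groups:
        mask |= group
    return mask
```

On a Linux host, a call such as open_netlink(socket.NETLINK_ROUTE, listen_mask(RTMGRP_NEIGH, RTMGRP_TC)) would return a socket on which neighbor and traffic control events can be read with recv().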

Appendix 2: Sample Protocol for the Foo IP Service

Our example IP service foo is used again to demonstrate how one can deploy a simple IP service control using Netlink.

These steps are continued from Appendix 1 (hence the numbering).

4) Query for current config of FE component.

5) Receive response to (4) via channel on (3).

6) Query for current state of IP service foo.

7) Receive response to (6) via channel on (2).

8) Register the protocol-specific packets you would like the FE to forward to you.

9) Send service-specific foo commands and receive responses for them, if needed.
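
Receiving the responses in steps 5, 7, and 9 amounts to walking the Netlink messages in a receive buffer and matching them to a request by sequence number. The sketch below shows one way this could be done in Python. It assumes the 16-byte Netlink header (length, type, flags, sequence number, process ID), 4-byte message alignment, and the NLMSG_ERROR and NLMSG_DONE values from <linux/netlink.h>. All function names are hypothetical.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")   # length, type, flags, sequence, pid
NLMSG_ERROR = 0x2   # assumed values, as in <linux/netlink.h>
NLMSG_DONE = 0x3


def align4(n):
    """Round n up to the next multiple of four."""
    return (n + 3) & ~3


def pack_message(mtype, seq, payload=b"", flags=0, pid=0):
    """Build one padded message; used here only to create sample input."""
    raw = NLMSG_HDR.pack(NLMSG_HDR.size + len(payload), mtype, flags, seq, pid)
    raw += payload
    return raw + b"\0" * (align4(len(raw)) - len(raw))


def iter_messages(buf):
    """Yield (type, flags, seq, pid, payload) for each message in buf."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, mtype, flags, seq, pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break   # truncated or malformed message
        payload = buf[offset + NLMSG_HDR.size:offset + length]
        yield mtype, flags, seq, pid, payload
        offset += align4(length)


def responses_for(buf, seq):
    """Collect (type, payload) pairs answering request seq, up to NLMSG_DONE."""
    out = []
    for mtype, _flags, mseq, _pid, payload in iter_messages(buf):
        if mseq != seq:
            continue   # asynchronous event or reply to another request
        if mtype == NLMSG_DONE:
            break
        out.append((mtype, payload))
    return out
```

A caller would pass the bytes returned by recv() on the channel opened in step 2 or 3, together with the sequence number it placed in the query.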

Appendix 2a: Interacting with Other IP services

The diagram in Appendix 1 shows another control component configuring the same service. In this case, it is a proprietary Command Line Interface (CLI). The CLI may or may not be using the Netlink protocol to communicate with the foo component. If the CLI issues commands that affect the policy of the FEC for service foo, then the foo CPC is notified. It could then make algorithmic decisions based on this input. For example, if an FE allowed one service to delete policies installed by a different service, and a policy that foo installed was deleted by service bar, there might be a need to propagate this to all the peers of service foo.

Appendix 3: Examples

In this example, we show a simple configuration Netlink message sent from a TC CPC to an egress TC FIFO queue. This queue algorithm is based on packet counting and drops packets when the queue length exceeds the limit of 100 packets. We assume that the queue is in a hierarchical setup with a parent 100:0 and a classid of 100:1 and that it is to be installed on a device with an ifindex of 4.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Length (52)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type (RTM_NEWQDISC)           | Flags (NLM_F_EXCL |         |
   |                               |NLM_F_CREATE | NLM_F_REQUEST)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Sequence Number(arbitrary number)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Process ID (0)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Family(AF_INET)|  Reserved1    |         Reserved2           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Interface Index  (4)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Qdisc handle  (0x1000001)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Parent Qdisc   (0x1000000)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        TCM Info  (0)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Type (TCA_KIND)   |           Length(4)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Value ("pfifo")                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Type (TCA_OPTIONS) |          Length(4)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Value (limit=100)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
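
As a rough illustration, the message above could be assembled byte by byte as follows. This is a minimal sketch, not a definitive encoder. It assumes the Linux layouts of struct nlmsghdr, struct tcmsg, and struct rtattr, and the numeric constants from the Linux headers. The helpers tc_handle, attribute, and build_newqdisc are hypothetical names introduced here, and the sequence number 1 is arbitrary. Note that struct rtattr stores the length before the type, and that the length covers the 4-byte attribute header plus the value. Under those assumptions the message comes to 56 bytes, whereas the diagram draws each attribute value as a single 32-bit word and shows 52.

```python
import struct

# Values as defined in the Linux headers (netlink.h, rtnetlink.h, socket.h).
RTM_NEWQDISC = 36
NLM_F_REQUEST = 0x001
NLM_F_EXCL = 0x200
NLM_F_CREATE = 0x400
TCA_KIND = 1
TCA_OPTIONS = 2
AF_INET = 2


def tc_handle(text):
    """Convert a 'major:minor' handle (hex digits) to its 32-bit form."""
    major, minor = text.split(":")
    return (int(major, 16) << 16) | int(minor, 16)


def attribute(attr_type, payload):
    """Encode one attribute, padded to a 4-byte boundary.

    The length field counts the 4-byte header plus the payload,
    but not the padding.
    """
    length = 4 + len(payload)
    padding = b"\0" * (-length % 4)
    return struct.pack("=HH", length, attr_type) + payload + padding


def build_newqdisc(ifindex, handle, parent, limit, seq):
    """Build an RTM_NEWQDISC request for a pfifo queue."""
    # TC header: family, two reserved fields, ifindex, handle, parent, info.
    tcmsg = struct.pack("=BBHiIII", AF_INET, 0, 0, ifindex,
                        tc_handle(handle), tc_handle(parent), 0)
    # Queue kind as a NUL-terminated string, then the packet limit.
    attrs = attribute(TCA_KIND, b"pfifo\0")
    attrs += attribute(TCA_OPTIONS, struct.pack("=I", limit))
    flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL
    total = 16 + len(tcmsg) + len(attrs)
    # Netlink header: length, type, flags, sequence number, process ID.
    header = struct.pack("=IHHII", total, RTM_NEWQDISC, flags, seq, 0)
    return header + tcmsg + attrs


msg = build_newqdisc(ifindex=4, handle="100:1", parent="100:0",
                     limit=100, seq=1)
print(len(msg), msg.hex())
```

The "=" prefix in the format strings selects native byte order with no implicit alignment, which matches the host-byte-order convention of Netlink on a single machine.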
        

Authors' Addresses

   Jamal Hadi Salim
   Znyx Networks
   Ottawa, Ontario
   Canada

   EMail: hadi@znyx.com
        

   Hormuzd M Khosravi
   Intel
   2111 N.E. 25th Avenue JF3-206
   Hillsboro OR 97124-5961
   USA

   Phone: +1 503 264 0334
   EMail: hormuzd.m.khosravi@intel.com
        

   Andi Kleen
   SuSE
   Stahlgruberring 28
   81829 Muenchen
   Germany

   EMail: ak@suse.de
        

   Alexey Kuznetsov
   INR/Swsoft
   Moscow
   Russia

   EMail: kuznet@ms2.inr.ac.ru
        

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.
