Internet Engineering Task Force (IETF)                         E. Boschi
Request for Comments: 6235                                   B. Trammell
Category: Experimental                                        ETH Zurich
ISSN: 2070-1721                                                 May 2011
IP Flow Anonymization Support
Abstract
This document describes anonymization techniques for IP flow data and the export of anonymized data using the IP Flow Information Export (IPFIX) protocol. It categorizes common anonymization schemes and defines the parameters needed to describe them. It provides guidelines for the implementation of anonymized data export and storage over IPFIX, and describes an information model and Options-based method for anonymization metadata export within the IPFIX protocol or storage in IPFIX Files.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for examination, experimental implementation, and evaluation.
This document defines an Experimental Protocol for the Internet community. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6235.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
   1. Introduction
      1.1. IPFIX Protocol Overview
      1.2. IPFIX Documents Overview
      1.3. Anonymization within the IPFIX Architecture
      1.4. Supporting Experimentation with Anonymization
   2. Terminology
   3. Categorization of Anonymization Techniques
   4. Anonymization of IP Flow Data
      4.1. IP Address Anonymization
           4.1.1. Truncation
           4.1.2. Reverse Truncation
           4.1.3. Permutation
           4.1.4. Prefix-Preserving Pseudonymization
      4.2. MAC Address Anonymization
           4.2.1. Truncation
           4.2.2. Reverse Truncation
           4.2.3. Permutation
           4.2.4. Structured Pseudonymization
      4.3. Timestamp Anonymization
           4.3.1. Precision Degradation
           4.3.2. Enumeration
           4.3.3. Random Shifts
      4.4. Counter Anonymization
           4.4.1. Precision Degradation
           4.4.2. Binning
           4.4.3. Random Noise Addition
      4.5. Anonymization of Other Flow Fields
           4.5.1. Binning
           4.5.2. Permutation
   5. Parameters for the Description of Anonymization Techniques
      5.1. Stability
      5.2. Truncation Length
      5.3. Bin Map
      5.4. Permutation
      5.5. Shift Amount
   6. Anonymization Export Support in IPFIX
      6.1. Anonymization Records and the Anonymization Options
           Template
      6.2. Recommended Information Elements for Anonymization
           Metadata
           6.2.1. informationElementIndex
           6.2.2. anonymizationTechnique
           6.2.3. anonymizationFlags
   7. Applying Anonymization Techniques to IPFIX Export and Storage
      7.1. Arrangement of Processes in IPFIX Anonymization
      7.2. IPFIX-Specific Anonymization Guidelines
           7.2.1. Appropriate Use of Information Elements for
                  Anonymized Data
           7.2.2. Export of Perimeter-Based Anonymization Policies
           7.2.3. Anonymization of Header Data
           7.2.4. Anonymization of Options Data
           7.2.5. Special-Use Address Space Considerations
           7.2.6. Protecting Out-of-Band Configuration and
                  Management Data
   8. Examples
   9. Security Considerations
   10. IANA Considerations
   11. Acknowledgments
   12. References
       12.1. Normative References
       12.2. Informative References
The standardization of an IP Flow Information Export (IPFIX) protocol [RFC5101] and associated representations removes a technical barrier to the sharing of IP flow data across organizational boundaries and with network operations, security, and research communities for a wide variety of purposes. However, with wider dissemination come greater risks to the privacy of the users of networks under measurement, and to the security of those networks. While it is not a complete solution to the issues posed by distribution of IP flow information, anonymization (i.e., the deletion or transformation of information that is considered sensitive and that could be used to reveal the identity of subjects involved in a communication) is an important tool for the protection of privacy within network measurement infrastructures.
This document presents a mechanism for representing anonymized data within IPFIX and guidelines for using it. It is not intended as a general statement on the applicability of specific flow data anonymization techniques to specific situations or as a recommendation of any particular application of anonymization to flow data export. Exporters or publishers of anonymized data must take care that the applied anonymization technique is appropriate for the data source, the purpose, and the risk of deanonymization of a given application.
It begins with a categorization of anonymization techniques. It then describes the applicability of each technique to commonly anonymizable fields of IP flow data, organized by information element data type and semantics as in [RFC5102]; enumerates the parameters required by each of the applicable anonymization techniques; and provides guidelines for the use of each of these techniques in accordance with current best practices in data protection. Finally, it specifies a mechanism for exporting anonymized data and binding anonymization metadata to Templates and Options Templates using IPFIX Options.
In the IPFIX protocol, { type, length, value } tuples are expressed in Templates containing { type, length } pairs, specifying which { value } fields are present in data records conforming to the Template, giving great flexibility as to what data is transmitted. Since Templates are sent very infrequently compared with Data Records, this results in significant bandwidth savings. Various different data formats may be transmitted simply by sending new Templates specifying the { type, length } pairs for the new data format. See [RFC5101] for more information.
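The Template mechanism described above can be illustrated with a minimal sketch. It is not the IPFIX wire format of [RFC5101]: Message and Set headers, Template IDs, and variable-length elements are all omitted. The Information Element names follow [RFC5102], but the particular Template, the helper name decode_record, and the sample values are assumptions made only for this illustration.

```python
import struct

# Hypothetical Template: a list of { type, length } pairs, with the type
# given as an Information Element name and the length in octets.
TEMPLATE = [
    ("sourceIPv4Address", 4),
    ("destinationIPv4Address", 4),
    ("sourceTransportPort", 2),
    ("destinationTransportPort", 2),
    ("octetDeltaCount", 8),
]


def decode_record(template, data):
    """Split one fixed-length Data Record into values using the Template.

    The record itself carries only the { value } fields; the Template
    says which field each run of octets belongs to.
    """
    record, offset = {}, 0
    for name, length in template:
        record[name] = int.from_bytes(data[offset:offset + length], "big")
        offset += length
    return record


# A sample record: 20 octets of values, no per-field type or length.
raw = struct.pack(
    "!4s4sHHQ",
    bytes([192, 0, 2, 1]),      # source address (documentation range)
    bytes([198, 51, 100, 7]),   # destination address (documentation range)
    49152,                      # source port
    443,                        # destination port
    1500,                       # octet count
)
rec = decode_record(TEMPLATE, raw)
```

Because the { type, length } pairs are sent once in the Template and not with every record, each record in this sketch costs 20 octets rather than 20 octets plus per-field descriptors.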
The IPFIX information model [RFC5102] defines a large number of standard Information Elements (IEs) that provide the necessary { type } information for Templates. The use of standard elements enables interoperability among different vendors' implementations. Additionally, non-standard enterprise-specific elements may be defined for private use.
"Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information" [RFC5101] and its associated documents define the IPFIX protocol, which provides network engineers and administrators with access to IP traffic flow information.
"Architecture for IP Flow Information Export" [RFC5470] defines the architecture for the export of measured IP flow information out of an IPFIX Exporting Process to an IPFIX Collecting Process, and the basic terminology used to describe the elements of this architecture, per the requirements defined in "Requirements for IP Flow Information Export" [RFC3917]. The IPFIX Protocol document [RFC5101] then covers the details of the method for transporting IPFIX Data Records and Templates via a congestion-aware transport protocol from an IPFIX Exporting Process to an IPFIX Collecting Process.
"Information Model for IP Flow Information Export" [RFC5102] describes the Information Elements used by IPFIX, including details on Information Element naming, numbering, and data type encoding. Finally, "IP Flow Information Export (IPFIX) Applicability" [RFC5472] describes the various applications of the IPFIX protocol and their use of information exported via IPFIX and relates the IPFIX architecture to other measurement architectures and frameworks.
Additionally, "Specification of the IP Flow Information Export (IPFIX) File Format" [RFC5655] describes a file format based upon the IPFIX protocol for the storage of flow data.
This document references the Protocol and Architecture documents for terminology and extends the IPFIX Information Model to provide new Information Elements for anonymization metadata. The anonymization techniques described herein are equally applicable to the IPFIX protocol and data stored in IPFIX Files.
According to [RFC5470], IPFIX Message anonymization is optionally performed as the final operation before handing the Message to the transport protocol for export. While no provision is made in the architecture for anonymization metadata as in Section 6, this arrangement does allow for the rewriting necessary for comprehensive anonymization of IPFIX export as in Section 7. The development of the IPFIX Mediation [RFC6183] framework and the IPFIX File Format [RFC5655] expands upon this initial architectural allowance for anonymization by adding to the list of places where anonymization may be applied. The former specifies IPFIX Mediators, which rewrite existing IPFIX Messages, and the latter specifies a method for storage of IPFIX data in files.
More detail on the applicable architectural arrangements for anonymization can be found in Section 7.1.
The status of this document is Experimental, reflecting the experimental nature of anonymization export support. Research on network trace anonymization techniques and attacks against them is ongoing. Indeed, there is increasing evidence that anonymization applied to network trace or flow data on its own is insufficient for many data protection applications as in [Bur10]. Therefore, this document explicitly does not recommend any particular technique or implementation thereof.
The intention of this document is to provide a common basis for interoperable exchange of anonymized data, furthering research in this area, both on anonymization techniques themselves and on the application of anonymized data to network measurement. To that end, the classification in Section 3 and anonymization export support in Section 6 can be used to describe and export information even about data anonymized using techniques that are unacceptably weak for general application to production datasets on their own.
While the specification herein is designed to be independent of the anonymization techniques applied and the implementation thereof, open research in this area may necessitate future updates to the specification. Assuming the future successful application of this specification to anonymized data publication and exchange, it may be brought back to the IPFIX working group for further development and publication on the Standards Track.
Terms used in this document that are defined in the Terminology section of the IPFIX Protocol [RFC5101] document are to be interpreted as defined there. In addition, this document defines the following terms:
Anonymization Record: A record, defined by the Anonymization Options Template in Section 6.1, that defines the properties of the anonymization applied to a single Information Element within a single Template or Options Template.
Anonymized Data Record: A Data Record within a Data Set containing at least one Information Element with anonymized values. The Information Element(s) within the Template or Options Template describing this Data Record SHOULD have a corresponding Anonymization Record.
Intermediate Anonymization Process: An intermediate process that takes Data Records and transforms them into Anonymized Data Records.
Note that there is an explicit difference in this document between a "Data Set" (which is defined as in [RFC5101]) and a "data set". When in lower case, this term refers to any collection of data (usually, within the context of this document, flow or packet data) that may contain identifying information and is therefore subject to anonymization.
Note also that when the term Template is used in this document, unless otherwise noted, it applies both to Templates and Options Templates as defined in [RFC5101]. Specifically, Anonymization Records may apply to both Templates and Options Templates.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
Anonymization, as described by this document, is the modification of a dataset in order to protect the identity of the people or entities described by the dataset from disclosure. With respect to network traffic data, anonymization generally attempts to preserve some set of properties of the network traffic useful for a given application or applications, while ensuring the data cannot be traced back to the specific networks, hosts, or users generating the traffic.
Anonymization may be broadly classified according to two properties: recoverability and countability. All anonymization techniques map the real space of identifiers or values into a separate, anonymized space, according to some function. A technique is said to be recoverable when the function used is invertible or can otherwise be reversed and a real identifier can be recovered from a given replacement identifier. "Recoverability" as used within this categorization does not refer to recoverability under attack; that is, techniques wherein the function used can only be reversed using additional information, such as an encryption key, or knowledge of injected traffic within the dataset, are not considered to be recoverable.
Countability compares the dimension of the anonymized space (N) to the dimension of the real space (M), and denotes how the count of unique values is preserved by the anonymization function. If the anonymized space is smaller than the real space, then the function is said to generalize the input, mapping more than one input point to each anonymous value (e.g., as with aggregation). By definition, generalization is not recoverable.
If the dimensions of the anonymized and real spaces are the same, such that the count of unique values is preserved, then the function is said to be a direct substitution function. If the dimension of the anonymized space is larger, such that each real value maps to a set of anonymized values, then the function is said to be a set substitution function. Note that with set substitution functions, the sets of anonymized values are not necessarily disjoint. Either direct or set substitution functions are said to be one-way if there exists no non-brute force method for recovering the real data point from an anonymized one in isolation (i.e., if the only way to recover the data point is to attack the anonymized data set as a whole, e.g., through fingerprinting or data injection).
This classification is summarized in the table below.
+------------------------+-----------------+------------------------+
| Recoverability /       | Recoverable     | Non-recoverable        |
| Countability           |                 |                        |
+------------------------+-----------------+------------------------+
| N < M                  | N.A.            | Generalization         |
| N = M                  | Direct          | One-way Direct         |
|                        | Substitution    | Substitution           |
| N > M                  | Set             | One-way Set            |
|                        | Substitution    | Substitution           |
+------------------------+-----------------+------------------------+
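The classification can also be read as a small decision procedure. The following sketch is only an illustration of the table; the function name classify and its parameters are hypothetical and are not defined by this document. N is the size of the anonymized space, M the size of the real space.

```python
def classify(n_anonymized, m_real, recoverable):
    """Return the category for a technique, following the table above.

    n_anonymized -- size of the anonymized space (N)
    m_real       -- size of the real space (M)
    recoverable  -- True if the mapping can be reversed without
                    attacking the dataset as a whole
    """
    if n_anonymized < m_real:
        # Several real values share one anonymized value, so the
        # original value cannot be recovered by definition.
        if recoverable:
            raise ValueError("generalization is not recoverable")
        return "Generalization"
    if n_anonymized == m_real:
        kind = "Direct Substitution"
    else:
        kind = "Set Substitution"
    return kind if recoverable else "One-way " + kind


# IPv4 addresses truncated to /24: 2**24 outputs for 2**32 inputs.
truncation = classify(2**24, 2**32, recoverable=False)
# A keyed permutation of the IPv4 space, treated here as one-way.
permutation = classify(2**32, 2**32, recoverable=False)
```

In this sketch, truncation evaluates to "Generalization" and permutation to "One-way Direct Substitution".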
In anonymizing IP flow data as treated by this document, the goal is generally two-way address untraceability: to remove the ability to assert that endpoint X contacted endpoint Y at time T. Address untraceability is important as IP addresses are the most suitable field in IP flow records to identify real-world entities. Each IP address is associated with an interface on a network host and can potentially be identified with a single user. Additionally, IP addresses are structured identifiers; that is, partial IP address prefixes may be used to identify networks just as full IP addresses identify hosts. This leads IP flow data anonymization to be concerned first and foremost with IP address anonymization.
Any form of aggregation that combines flows from multiple endpoints into a single record (e.g., aggregation by subnetwork, aggregation removing addressing completely) may also provide address untraceability; however, anonymization by aggregation is out of scope for this document. Additionally, of potential interest in this problem space but out of scope are anonymization techniques that are applied over multiple fields or multiple records in a way that introduces dependencies among anonymized fields or records. This document is concerned solely with anonymization techniques applied at the resolution of single fields within a flow record.
Even so, attacks against these anonymization techniques use entire flows and relationships between hosts and flows within a given dataset. Therefore, fields that may not necessarily be identifying by themselves may be anonymized in order to increase the anonymity of the dataset as a whole.
Due to the restricted semantics of IP flow data, there is a relatively limited set of specific anonymization techniques available on flow data, though each falls into the broad categories discussed in the previous section. Each type of field that may commonly appear in a flow record may have its own applicable specific techniques.
As with IP addresses, Media Access Control (MAC) addresses uniquely identify devices on the network; while they are not often available in traffic data collected at Layer 3, and cannot be used to locate devices within the network, some traces may contain sub-IP data including MAC address data. Hardware addresses may be mappable to device serial numbers, and to the entities or individuals who purchased the devices, when combined with external databases. MAC addresses are also often used in constructing IPv6 addresses (see Section 2.5.1 of [RFC4291]) and as such may be used to reconstruct the low-order bits of anonymized IPv6 addresses in certain circumstances. Therefore, MAC address anonymization is also important.
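The link between MAC addresses and IPv6 addresses can be made concrete with a short sketch of the Modified EUI-64 construction from [RFC4291]: the universal/local bit of the first octet is inverted and the octets FF FE are inserted in the middle of the 48-bit MAC address. The function names and the sample addresses below are hypothetical. The sketch illustrates why an unanonymized MAC address can reveal the low-order 64 bits of an anonymized IPv6 address (and vice versa); it applies only to addresses that were actually formed this way.

```python
import ipaddress


def mac_to_interface_id(mac: str) -> int:
    """Build the Modified EUI-64 interface identifier for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02                       # invert the universal/local bit
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]
    return int.from_bytes(eui64, "big")


def interface_id_to_mac(addr: str):
    """Recover the MAC from an IPv6 address, or None if the low-order
    64 bits do not look like a Modified EUI-64 identifier."""
    low64 = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)
    iid = low64.to_bytes(8, "big")
    if iid[3:5] != b"\xff\xfe":
        return None
    octets = bytearray(iid[:3] + iid[5:])
    octets[0] ^= 0x02                       # undo the bit inversion
    return ":".join(f"{o:02x}" for o in octets)


iid = mac_to_interface_id("00:1b:21:3c:4d:5e")
mac = interface_id_to_mac("2001:db8::21b:21ff:fe3c:4d5e")
```

Here iid equals 0x021b21fffe3c4d5e and mac equals "00:1b:21:3c:4d:5e", so a dataset that anonymizes only one of the two fields leaks the other.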
Port numbers identify abstract entities (applications) as opposed to real-world entities, but they can be used to classify hosts and user behavior. Passive port fingerprinting, both of well-known and ephemeral ports, can be used to determine the operating system running on a host. Relative data volumes by port can also be used to determine the host's function (workstation, web server, etc.); this information can be used to identify hosts and users.
While not identifiers in and of themselves, timestamps and counters can reveal the behavior of the hosts and users on a network. Any given network activity is recognizable by a pattern of relative time differences and data volumes in the associated sequence of flows, even without host address information. Therefore, they can be used to identify hosts and users. Timestamps and counters are also vulnerable to traffic injection attacks, where traffic with a known pattern is injected into a network under measurement, and this pattern is later identified in the anonymized dataset.
The simplest and most extreme form of anonymization, which can be applied to any field of a flow record, is black-marker anonymization, or complete deletion of a given field. Note that black-marker anonymization is equivalent to simply not exporting the field(s) in question.
While black-marker anonymization completely protects the data in the deleted fields from the risk of disclosure, it also reduces the utility of the anonymized dataset as a whole. Techniques that retain some information while reducing (though not eliminating) the disclosure risk will be extensively discussed in the following sections; note that the techniques specifically applicable to IP addresses, timestamps, ports, and counters will be discussed in separate sections.
Since IP addresses are the most common identifiers within flow data that can be used to directly identify a person, organization, or host, most of the work on flow and trace data anonymization has gone into IP address anonymization techniques. Indeed, the aim of most attacks against anonymization is to recover the map from anonymized IP addresses to original IP addresses, thereby identifying the original hosts. Therefore, there is a wide range of IP address anonymization schemes that fit into the following categories.
+------------------------------------+---------------------+
| Scheme                             | Action              |
+------------------------------------+---------------------+
| Truncation                         | Generalization      |
| Reverse Truncation                 | Generalization      |
| Permutation                        | Direct Substitution |
| Prefix-preserving Pseudonymization | Direct Substitution |
+------------------------------------+---------------------+
Truncation removes "n" of the least significant bits from an IP address, replacing them with zeroes. In effect, it replaces a host address with a network address for some fixed netblock; for IPv4 addresses, 8-bit truncation corresponds to replacement with a /24 network address. Truncation is a non-reversible generalization scheme. Note that while truncation is effective for making hosts non-identifiable, it preserves information that can be used to identify an organization, a geographic region, a country, or a continent.
Truncation to an address length of 0 is equivalent to black-marker anonymization. Complete removal of IP address information is only recommended for analysis tasks that have no need to separate flow data by host or network; e.g., as a first stage to per-application (port) or time-series total volume analyses.
Reverse truncation removes "n" of the most significant bits from an IP address, replacing them with zeroes. Reverse truncation is a non-reversible generalization scheme. Reverse truncation is effective for making networks unidentifiable, partially or completely removing information that can be used to identify an organization, a geographic region, a country, or a continent (or Regional Internet Registry (RIR) region of responsibility). However, it may cause ambiguity when applied to data collected from more than one network, since it treats all the hosts with the same address on different networks as if they are the same host. It is not particularly useful when publishing data where the network of origin is known or can be easily guessed by virtue of the identity of the publisher.
Like truncation, reverse truncation to an address length of 0 is equivalent to black-marker anonymization.
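As an illustration only, truncation and reverse truncation of IP addresses can be sketched in Python with the standard ipaddress module. The function names truncate and reverse_truncate are chosen for this sketch and are not defined by this document; "n" is assumed to lie between 0 and the address length in bits.

```python
import ipaddress


def truncate(addr: str, n: int) -> str:
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    mask = ((1 << width) - 1) ^ ((1 << n) - 1)
    return str(type(ip)(int(ip) & mask))


def reverse_truncate(addr: str, n: int) -> str:
    """Zero the n most significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen
    mask = (1 << (width - n)) - 1
    return str(type(ip)(int(ip) & mask))
```

With these definitions, truncate("192.0.2.77", 8) gives the /24 network address 192.0.2.0, and reverse_truncate("192.0.2.77", 24) gives 0.0.0.77. Passing the full address length as "n" zeroes the whole address, which corresponds to the black-marker case.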
Permutation is a direct substitution technique, replacing each IP address with an address selected from the set of possible IP addresses, such that each anonymized address represents a unique original address. The selection function is often random, though it is not necessarily so. Permutation does not preserve any structural information about a network, but it does preserve the unique count of IP addresses. Any application that requires more structure than host-uniqueness will not be able to use permuted IP addresses.
There are many variations of permutation functions, each of which has trade-offs in performance, security, and guarantees of non-collision; evaluating these trade-offs is implementation independent. However, in general, permutation functions applied to anonymization SHOULD be difficult to reverse without knowing the parameters (e.g., a secret key for Hashed Message Authentication Code (HMAC)). Given the relatively small space of IPv4 addresses in particular, hash functions applied without additional parameters could be reversed through brute force if the hash function is known, and SHOULD NOT be used as permutation functions. Permutation functions may guarantee non-collision (i.e., that each anonymized address represents a unique original address), but need not; however, the probability of collision SHOULD be low. Nevertheless, we treat even permutations with low but nonzero collision probability as a direct substitution. Beyond these guidelines, recommendations for specific permutation functions are out of scope for this document.
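A minimal sketch of a keyed substitution of the kind described above, assuming HMAC-SHA-256 with a secret key and keeping only the first 32 bits of the output. This is one possible construction, not a recommendation of this document; the name pseudonymize_ipv4 is hypothetical. Because the output is cut down to 32 bits, two original addresses can collide, so it is a substitution with nonzero collision probability rather than a guaranteed permutation.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ipv4(key: bytes, addr: str) -> str:
    """Map an IPv4 address to a keyed pseudonym (collisions are possible)."""
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Keep the first 32 bits of the MAC as the substituted address.
    return str(ipaddress.IPv4Address(digest[:4]))
```

The same key always yields the same mapping, so the key plays the role of the secret parameter. A collision-free alternative would keep an explicit table of assigned addresses, at the cost of per-address state.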
Prefix-preserving pseudonymization is a direct substitution technique, like permutation but further restricted such that the structure of subnets is preserved at each level while anonymizing IP addresses. If two real IP addresses match on a prefix of "n" bits, the two anonymized IP addresses will match on a prefix of "n" bits as well. This is useful when relationships among networks must be preserved for a given analysis task, but introduces structure into the anonymized data that can be exploited in attacks against the anonymization technique.
Scanning in Internet background traffic can cause particular problems with this technique: if a scanner uses a predictable and known sequence of addresses, this information can be used to reverse the substitution. The low-order portion of the address can be left unanonymized as a partial defense against this attack.
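The prefix-preserving property can be illustrated with a bitwise sketch: each output bit is the input bit, possibly flipped, where the flip decision depends only on the more significant input bits and a secret key. The use of HMAC-SHA-256 as the keyed function and the name prefix_preserving are assumptions of this sketch, not part of this document.

```python
import hashlib
import hmac
import ipaddress


def prefix_preserving(key: bytes, addr: str) -> str:
    """Anonymize an IPv4 address so that shared prefixes stay shared."""
    ip = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = ip >> (32 - i)  # the i most significant bits (0 when i == 0)
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (ip >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))
```

Two addresses that agree on their first "n" bits see identical flip decisions for those bits, so their anonymized forms agree on exactly the same "n" bits. The same structure is what an attacker with knowledge of some original addresses can exploit.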
Flow data containing sub-IP information can also contain identifying information in the form of the hardware (MAC) address. While MAC address information cannot be used to locate a node within a network, it can be used to directly and uniquely identify a specific device. Vendors or organizations within the supply chain may then have the information necessary to identify the entity or individual that purchased the device.
MAC address information is not as structured as IP address information. EUI-48 and EUI-64 MAC addresses contain an Organizationally Unique Identifier (OUI) in the three most significant bytes of the address; this OUI additionally contains bits noting whether the address is locally or globally administered. Beyond this, there is no standard relationship among the OUIs assigned to a given vendor.
Note that MAC address information also appears within IPv6 addresses, as the EUI-64 address, or EUI-48 address encoded as an EUI-64 address, is used as the least significant 64 bits of the IPv6 address in the case of link-local addressing or stateless autoconfiguration; the considerations and techniques in this section may then apply to such IPv6 addresses as well.
+-----------------------------+---------------------+
| Scheme                      | Action              |
+-----------------------------+---------------------+
| Truncation                  | Generalization      |
| Reverse Truncation          | Generalization      |
| Permutation                 | Direct Substitution |
| Structured Pseudonymization | Direct Substitution |
+-----------------------------+---------------------+
Truncation removes "n" of the least significant bits from a MAC address, replacing them with zeroes. In effect, it retains bits of the OUI, which identifies the manufacturer, while removing the least significant bits identifying the particular device. Truncation of 24 bits of an EUI-48 or 40 bits of an EUI-64 address zeroes out the device identifier while retaining the OUI.
Truncation is effective for making device manufacturers partially or completely identifiable within a dataset while deleting unique host identifiers; this can be used to retain and aggregate MAC-layer behavior by vendor.
Truncation to an address length of 0 is equivalent to black-marker anonymization.
Reverse truncation removes "n" of the most significant bits from a MAC address, replacing them with zeroes. Reverse truncation is a non-reversible generalization scheme. This has the effect of removing bits of the OUI, which identify manufacturers, before removing the least significant bits. Reverse truncation of 24 bits zeroes out the OUI.
Reverse truncation is effective for making device manufacturers partially or completely unidentifiable within a dataset. However, it may cause ambiguity by introducing the possibility of truncated MAC address collision. Also, note that the utility of removing manufacturer information is not particularly well covered by the literature.
Reverse truncation to an address length of 0 is equivalent to black-marker anonymization.
Permutation is a direct substitution technique, replacing each MAC address with an address selected from the set of possible MAC addresses, such that each anonymized address represents a unique original address. The selection function is often random, though it is not necessarily so. Permutation does not preserve any structural information about a network, but it does preserve the unique count of devices on the network. Any application that requires more structure than host-uniqueness will not be able to use permuted MAC addresses.
There are many variations of permutation functions, each of which has trade-offs in performance, security, and guarantees of non-collision; evaluating these trade-offs is implementation independent. However, in general, permutation functions applied to anonymization SHOULD be difficult to reverse without knowing the parameters (e.g., a secret key for HMAC). While the EUI-48 space is larger than the IPv4 address space, hash functions applied without additional parameters could be reversed through brute force if the hash function is known, and SHOULD NOT be used as permutation functions. Permutation functions may guarantee non-collision (i.e., that each anonymized address represents a unique original address), but need not; however, the probability of collision SHOULD be low. Nevertheless, we treat even permutations with low but nonzero collision probability as a direct substitution. Beyond these guidelines, recommendations for specific permutation functions are out of scope for this document.
Structured pseudonymization for MAC addresses is a direct substitution technique, like permutation, but restricted such that the OUI (the most significant three bytes) is permuted separately from the node identifier, the remainder. This is useful when the uniqueness of OUIs must be preserved for a given analysis task, but introduces structure into the anonymized data that can be exploited in attacks against the anonymization technique.
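The MAC address schemes above can be sketched as follows. This is an illustrative sketch under stated assumptions: addresses are given as colon- or hyphen-separated hexadecimal strings, the keyed function is HMAC-SHA-256, the node identifier is substituted independently of the OUI, and the names truncate_mac and structured_pseudonym are hypothetical. Since the keyed outputs are cut down to the field size, collisions are possible.

```python
import hashlib
import hmac


def _parse(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def _format(raw: bytes) -> str:
    return ":".join(f"{octet:02x}" for octet in raw)


def truncate_mac(mac: str, n: int) -> str:
    """Zero the n least significant bits of an EUI-48 or EUI-64 address."""
    raw = _parse(mac)
    value = (int.from_bytes(raw, "big") >> n) << n
    return _format(value.to_bytes(len(raw), "big"))


def structured_pseudonym(key: bytes, mac: str) -> str:
    """Substitute the OUI and the node identifier separately."""
    raw = _parse(mac)
    oui = hmac.new(key, b"oui" + raw[:3], hashlib.sha256).digest()[:3]
    node_len = len(raw) - 3
    node = hmac.new(key, b"node" + raw[3:], hashlib.sha256).digest()[:node_len]
    return _format(oui + node)
```

For example, truncate_mac("00:1b:21:3c:4d:5e", 24) returns "00:1b:21:00:00:00", keeping the OUI and dropping the device identifier. Under structured_pseudonym, two devices with the same original OUI share the same substituted OUI.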
The particular time at which a flow began or ended is not particularly identifiable information, but it can be used as part of attacks against other anonymization techniques or for user profiling, e.g., as in [Mur07]. Timestamps can be used in traffic injection attacks, which use known information about a set of traffic generated or otherwise known by an attacker to recover mappings of other anonymized fields, as well as to identify certain activity by response delay and size fingerprinting, which compares response sizes and inter-flow times in anonymized data to known values. Note that these attacks have been shown to be relatively robust against timestamp anonymization techniques (see [Bur10]), so the techniques presented in this section are relatively weak and should be used with care.
+-----------------------+----------------------------+
| Scheme                | Action                     |
+-----------------------+----------------------------+
| Precision Degradation | Generalization             |
| Enumeration           | Direct or Set Substitution |
| Random Shifts         | Direct Substitution        |
+-----------------------+----------------------------+
Precision Degradation is a generalization technique that removes the most precise components of a timestamp, accounting for all events occurring in each given interval (e.g., one millisecond for millisecond level degradation) as simultaneous. This has the effect of potentially collapsing many timestamps into one. With this technique, time precision is reduced and sequencing may be lost, but the information regarding at which time the event occurred is preserved. The anonymized data may not be generally useful for applications that require strict sequencing of flows.
Note that flow meters with low time precision (e.g., second precision, or millisecond precision on high-capacity networks) perform the equivalent of precision degradation anonymization by their design.
Also, note that degradation to a very low precision (e.g., on the order of minutes, hours, or days) is commonly used in analyses operating on time-series aggregated data, and may also be described as binning; though the time scales are longer and applicability more restricted, in principle, this is the same operation.
Precision degradation to infinitely low precision is equivalent to black-marker anonymization. Removal of timestamp information is only recommended for analysis tasks that have no need to separate flows in time, for example, for counting total volumes or unique occurrences of other flow keys in an entire dataset.
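Precision degradation of a timestamp amounts to rounding down to the start of an interval. A minimal sketch, assuming timestamps are integers in milliseconds; the name degrade_timestamp is hypothetical:

```python
def degrade_timestamp(ts_ms: int, interval_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its interval."""
    return ts_ms - (ts_ms % interval_ms)
```

With interval_ms set to 1000, every event within the same second maps to the same value, so ordering within that second is lost while the approximate time of the event is kept. Larger intervals (minutes, hours, days) correspond to the binning used for time-series aggregation.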
Enumeration is a substitution function that retains the chronological order in which events occurred while eliminating time information. Timestamps are substituted by equidistant timestamps (or numbers) starting from a randomly chosen start value. The resulting data is useful for applications requiring strict sequencing, but not for those requiring good timing information (e.g., delay- or jitter-measurement for quality-of-service (QoS) applications or service-level agreement (SLA) validation).
Note that enumeration is functionally equivalent to precision degradation in any environment into which traffic can be regularly injected to serve as a clock at the precision of the frequency of the injected flows.
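Enumeration can be sketched as replacing each distinct timestamp by its rank, offset by a start value. The choice to give equal timestamps the same substituted value, and the name enumerate_timestamps, are assumptions of this sketch; in practice the start value would be chosen at random.

```python
def enumerate_timestamps(timestamps, start, step=1):
    """Replace timestamps by equidistant values that keep chronological order."""
    ranks = {t: start + i * step for i, t in enumerate(sorted(set(timestamps)))}
    return [ranks[t] for t in timestamps]
```

For the input [50, 10, 50, 30] with a start value of 7, the result is [9, 7, 9, 8]: the order of events is kept, but the intervals between them are no longer recoverable.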
Random time shifts add a random offset to every timestamp within a dataset. Therefore, this reversible substitution technique retains duration and inter-event interval information as well as the chronological order of flows. Random time shifts are quite weak and relatively easy to reverse in the presence of external knowledge about traffic on the measured network.
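A random time shift applies one offset to the whole dataset, which is why durations and inter-event intervals survive. A minimal sketch, assuming integer timestamps and a symmetric bound max_shift on the offset; both the bound and the name random_shift are hypothetical:

```python
import secrets


def random_shift(timestamps, max_shift):
    """Add one randomly chosen offset to every timestamp in the dataset."""
    offset = secrets.randbelow(2 * max_shift + 1) - max_shift
    return [t + offset for t in timestamps]
```

Because the differences between shifted timestamps equal the original differences, a single flow whose true time is known is enough to recover the offset, which illustrates how weak this technique is.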
Counters (such as packet and octet volumes per flow) are subject to the same fingerprinting and injection attacks against anonymization, and the same use in user profiling, as timestamps are. Datasets with anonymized counters are useful only for analysis tasks for which relative or imprecise magnitudes of activity are useful. Counter information can also be completely removed, but this is only recommended for analysis tasks that have no need to evaluate the removed counter, for example, for counting only unique occurrences of other flow keys.
+-----------------------+----------------------------+
| Scheme                | Action                     |
+-----------------------+----------------------------+
| Precision Degradation | Generalization             |
| Binning               | Generalization             |
| Random noise addition | Direct or Set Substitution |
+-----------------------+----------------------------+
As with precision degradation in timestamps, precision degradation of counters removes lower-order bits of the counters, treating all the counters in a given range as having the same value. Depending on the precision reduction, this loses information about the relationships between sizes of similarly sized flows, but keeps relative magnitude information. Precision degradation to an infinitely low precision is equivalent to black-marker anonymization.
Binning can be seen as a special case of precision degradation; the operation is identical, except that in precision degradation the counter ranges are uniform, while in binning they need not be. For example, consider separating unopened TCP connections from potentially opened TCP connections. Here, packet counters per flow would be binned into two bins, one for 1-2 packet flows, and one for flows with 3 or more packets. Binning schemes are generally chosen to keep precisely the amount of information required in a counter for a given analysis task. Note that, also unlike precision degradation, the bin label need not be within the bin's range. Binning counters to a single bin is equivalent to black-marker anonymization.
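The difference between the two counter generalizations can be sketched as follows. The function names, the representation of bins by their lower edges, and the bin labels are assumptions of this sketch.

```python
import bisect


def degrade_counter(value: int, low_bits: int) -> int:
    """Precision degradation: zero the low_bits lowest-order bits."""
    return (value >> low_bits) << low_bits


def bin_counter(value: int, lower_edges, labels):
    """Binning: lower_edges[i] is the smallest value of bin i + 1."""
    return labels[bisect.bisect_right(lower_edges, value)]
```

For the TCP example above, bin_counter(packets, [3], ["unopened", "potentially opened"]) places flows of 1 or 2 packets in the first bin and flows of 3 or more packets in the second. The labels are strings here to show that a bin label need not lie within the bin's range.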
Random noise addition adds a random amount to a counter in each flow; this is used to keep relative magnitude information and minimize the disruption to size relationship information while avoiding fingerprinting attacks against anonymization. Note that there is no guarantee that random noise addition will maintain ranking order by a counter among members of a set. Random noise addition is particularly useful when the derived analysis data will not be presented in such a way as to require the lower-order bits of the counters.
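A minimal sketch of random noise addition, assuming uniformly distributed integer noise within a symmetric bound and clamping at zero so that counters stay non-negative. The noise distribution, the clamping, and the name add_noise are choices made for this sketch, not requirements of this document.

```python
import random


def add_noise(value: int, max_noise: int, rng=random.SystemRandom()) -> int:
    """Add uniform integer noise in [-max_noise, max_noise] to a counter."""
    noisy = value + rng.randint(-max_noise, max_noise)
    return max(noisy, 0)
```

Two counters that differ by less than twice max_noise can swap order after noise is added, which is why ranking by counter is not guaranteed to be preserved.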
Other fields, particularly port numbers and protocol numbers, can be used to partially identify the applications that generated the traffic in a given flow trace. This information can be used in fingerprinting attacks, and may be of interest on its own (e.g., to reveal that a certain application with suspected vulnerabilities is running on a given network). These fields are generally anonymized using one of two techniques.
+-------------+---------------------+
| Scheme      | Action              |
+-------------+---------------------+
| Binning     | Generalization      |
| Permutation | Direct Substitution |
+-------------+---------------------+
Binning is a generalization technique mapping a set of potentially non-uniform ranges into a set of arbitrarily labeled bins. Common bin arrangements depend on the field type and the analysis application. For example, an IP protocol bin arrangement may preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all other protocols into a single bin, to mitigate the use of uncommon protocols in fingerprinting attacks. Another example arrangement may bin source and destination ports into low (0-1023) and high (1024-65535) bins in order to tell service from ephemeral ports without identifying individual applications.
Binning other flow key fields to a single bin is equivalent to black-marker anonymization. Removal of other flow key information is only recommended for analysis tasks that have no need to differentiate flows on the removed keys, for example, for total traffic counts or unique counts of other flow keys.
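The two example bin arrangements for protocol and port fields can be written down directly. The labels "other", "low", and "high" are arbitrary and chosen for this sketch, as are the function names.

```python
KEPT_PROTOCOLS = {1, 6, 17}  # ICMP, TCP, UDP
OTHER_PROTOCOL = "other"     # arbitrary label for every remaining protocol


def bin_protocol(protocol: int):
    """Keep ICMP, TCP, and UDP; place every other protocol in one bin."""
    return protocol if protocol in KEPT_PROTOCOLS else OTHER_PROTOCOL


def bin_port(port: int) -> str:
    """Separate service ports (0-1023) from higher ports (1024-65535)."""
    return "low" if port <= 1023 else "high"
```

A mapping that returned the same label for every input would place all values in a single bin, which corresponds to the black-marker case.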
Permutation is a direct substitution technique, replacing each value with a value selected from the set of possible values, such that each anonymized value represents a unique original value. This is used to preserve the count of unique values without preserving information about, or the ordering of, the values themselves.
While permutation ideally guarantees that each anonymized value represents a unique original value, this may require significant state in the Intermediate Anonymization Process. Therefore, permutation may be implemented by hashing for performance reasons, with hash functions that may have relatively small collision probabilities. Such techniques are still essentially direct substitution techniques, despite the nonzero error probability.
This section details the abstract parameters used to describe the anonymization techniques examined in the previous section, on a per-parameter basis. These parameters and their export safety inform the design of the IPFIX anonymization metadata export specified in the following section.
A stable anonymization will always map a given value in the real space to a given value in the anonymized space, while an unstable anonymization will change this mapping over time; a completely unstable anonymization is essentially indistinguishable from black-marker anonymization. Any given anonymization technique may be applied with a varying range of stability. Stability is important for assessing the comparability of anonymized information in different datasets, or in the same dataset over different time periods. In practice, an anonymization may also be stable for every dataset published by a particular producer to a particular consumer, stable for a stated time period within a dataset or across datasets, or stable only for a single dataset.
If no information about stability is available, users of anonymized data MAY assume that the techniques used are stable across the entire dataset, but unstable across datasets. Note that stability presents a risk-utility trade-off, as completely stable anonymization can be used for longer-term trend analysis tasks but also presents more risk of attack given the stable mapping. Information about the stability of a mapping SHOULD be exported along with the anonymized data.
Truncation and precision degradation are described by the truncation length or the amount of data still remaining in the anonymized field after anonymization.
Truncation length can generally be inferred from a given dataset, and need not be specially exported or protected. For bit-level truncation, the truncated bits are generally inferable by the least significant bit set for an instance of an Information Element described by a given Template (or the most significant bit set, in the case of reverse truncation). For precision degradation, the truncation is inferable from the maximum precision given. Note that while this inference method is generally applicable, it is data dependent: there is no guarantee that it will recover the exact truncation length used to prepare the data.
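The inference described above can be sketched for truncated IP addresses by finding the lowest bit that is set in any address of the dataset. The function name infer_truncated_bits is hypothetical, and the input is assumed to be a non-empty list of address strings of one address family.

```python
import ipaddress


def infer_truncated_bits(addresses) -> int:
    """Estimate how many low-order bits were zeroed across a dataset."""
    combined = 0
    for addr in addresses:
        combined |= int(ipaddress.ip_address(addr))
    if combined == 0:
        # Every address is all zeroes: consistent with full truncation.
        return ipaddress.ip_address(addresses[0]).max_prefixlen
    # Index of the least significant bit set in any address.
    return (combined & -combined).bit_length() - 1
```

The data dependence is easy to see: for the two addresses 192.0.2.0 and 198.51.100.0, both produced by 8-bit truncation, the lowest set bit is bit 9, so the function reports 9. Adding 203.0.113.0 to the dataset brings the estimate down to the true value of 8.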
In the special case of IP address export with variable (per-record) truncation, the truncation MAY be expressed by exporting the prefix length alongside the address.
Binning is described by the specification of a bin mapping function. This function can be generally expressed in terms of an associative array that maps each point in the original space to a bin, although from an implementation standpoint most bin functions are much simpler and more efficient.
Since the bin map for a bin mapping function is in essence the bin mapping key, and can be used to partially deanonymize binned data, depending on the degree of generalization, information about the bin mapping function SHOULD NOT be exported.
Like binning, permutation is described by the specification of a permutation function. In the general case, this can be expressed in terms of an associative array that maps each point in the original space to a point in the anonymized space. Unlike binning, each point in the anonymized space corresponds to a single, unique point in the original space.
Since the parameters of the permutation function are in essence key-like (indeed, for cryptographic permutation functions, they are the keys themselves), information about the permutation function or its parameters SHOULD NOT be exported.
Shifting requires an amount by which to shift each value. Since the shift amount is the only key to a shift function, and can be used to trivially deanonymize data protected by shifting, information about the shift amount SHOULD NOT be exported.
Anonymized data exported via IPFIX SHOULD be annotated with anonymization metadata, which details which fields described by which Templates are anonymized, and provides appropriate information on the anonymization techniques used. This metadata SHOULD be exported in Data Records described by the recommended Options Templates described in this section; these Options Templates use the additional Information Elements described in the following subsection.
Note that fields anonymized using the black-marker (removal) technique do not require any special metadata support: black-marker anonymized fields SHOULD NOT be exported at all, by omitting the corresponding Information Elements from the Template describing the Data Set. In the case where application requirements dictate that a black-marker anonymized field must remain in a Template, then an Exporting Process MAY export black-marker anonymized fields with their native length as all-zeros, but only in cases where enough contextual information exists within the record to differentiate a black-marker anonymized field exported in this way from a real zero value.
The Anonymization Options Template describes Anonymization Records, which allow anonymization metadata to be exported inline over IPFIX or stored in an IPFIX File, by binding information about anonymization techniques to Information Elements within defined Templates or Options Templates. IPFIX Exporting Processes SHOULD export anonymization records for any Template describing exported anonymized Data Records; IPFIX Collecting Processes and processes downstream from them MAY use anonymization records to treat anonymized data differently depending on the applied technique.
Anonymization Records contain ancillary information bound to a Template, so many of the considerations for Templates apply to Anonymization Records as well. First, reliability is important: an Exporting Process SHOULD export Anonymization Records after the Templates they describe have been exported, and SHOULD export anonymization records reliably if supported by the underlying transport (i.e., without partial reliability when using Stream Control Transmission Protocol (SCTP)).
Anonymization Records MUST be handled by Collecting Processes as scoped to the Template to which they apply within the Transport Session in which they are sent. When a Template is withdrawn via a Template Withdrawal Message or expires during a UDP transport session, the accompanying Anonymization Records are withdrawn or expire as well and do not apply to subsequent Templates with the same Template ID within the Session unless re-exported.
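As a non-normative illustration, the scoping rule above can be sketched as collector-side state keyed by Template ID within a single Transport Session. The class and method names here are hypothetical and are not defined by this document or by IPFIX; Information Element 8 is sourceIPv4Address, and technique value 6 denotes Structured Permutation.

```python
from dataclasses import dataclass, field


@dataclass
class AnonymizationRecord:
    information_element_id: int
    technique: int
    flags: int = 0


@dataclass
class TransportSession:
    """Hypothetical per-session state held by a Collecting Process."""
    # Template ID -> Anonymization Records scoped to that Template
    records: dict = field(default_factory=dict)

    def add_record(self, template_id, record):
        self.records.setdefault(template_id, []).append(record)

    def withdraw_template(self, template_id):
        # Withdrawal or expiry of a Template also discards its records.
        self.records.pop(template_id, None)

    def lookup(self, template_id):
        return list(self.records.get(template_id, []))


session = TransportSession()
session.add_record(256, AnonymizationRecord(8, technique=6, flags=0b01))
before = session.lookup(256)
session.withdraw_template(256)
after = session.lookup(256)   # empty until records are re-exported
```

Because the state lives on the session object, a new Transport Session starts with no anonymization metadata, which matches the requirement that records be re-exported per Session.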
The Stability Class within the anonymizationFlags IE can be used to declare that a given anonymization technique's mapping will remain stable across multiple sessions, but this does not mean that anonymization technique information given in the Anonymization Records themselves persists across Sessions. Each new Transport Session MUST contain new Anonymization Records for each Template describing anonymized Data Sets.
SCTP per-stream export [IPFIX-PERSTREAM] may be used to ease management of Anonymization Records if appropriate for the application.
The fields of the Anonymization Options Template are as follows:
+-------------------------+-----------------------------------------+
| IE                      | Description                             |
+-------------------------+-----------------------------------------+
| templateId [scope]      | The Template ID of the Template or      |
|                         | Options Template containing the         |
|                         | Information Element described by this   |
|                         | anonymization record. This Information  |
|                         | Element MUST be defined as a Scope      |
|                         | Field.                                  |
| informationElementId    | The Information Element identifier of   |
| [scope]                 | the Information Element described by    |
|                         | this anonymization record. This         |
|                         | Information Element MUST be defined as  |
|                         | a Scope Field. Exporting Processes      |
|                         | MUST clear the Enterprise bit of the    |
|                         | informationElementId and Collecting     |
|                         | Processes SHOULD ignore it; information |
|                         | about enterprise-specific Information   |
|                         | Elements is exported via the            |
|                         | privateEnterpriseNumber Information     |
|                         | Element.                                |
| privateEnterpriseNumber | The Private Enterprise Number of the    |
| [scope] [optional]      | enterprise-specific Information Element |
|                         | described by this anonymization record. |
|                         | This Information Element MUST be        |
|                         | defined as a Scope Field if present. A  |
|                         | privateEnterpriseNumber of 0 signifies  |
|                         | that the Information Element is         |
|                         | IANA-registered.                        |
| informationElementIndex | The Information Element index of the    |
| [scope] [optional]      | instance of the Information Element     |
|                         | described by this anonymization record  |
|                         | identified by the informationElementId  |
|                         | within the Template. Optional; need     |
|                         | only be present when describing         |
|                         | Templates that have multiple instances  |
|                         | of the same Information Element. This   |
|                         | Information Element MUST be defined as  |
|                         | a Scope Field if present. This          |
|                         | Information Element is defined in       |
|                         | Section 6.2.                            |
| anonymizationFlags      | Flags describing the mapping stability  |
|                         | and specialized modifications to the    |
|                         | Anonymization Technique in use. SHOULD  |
|                         | be present. This Information Element    |
|                         | is defined in Section 6.2.3.            |
| anonymizationTechnique  | The technique used to anonymize the     |
|                         | data. MUST be present. This             |
|                         | Information Element is defined in       |
|                         | Section 6.2.2.                          |
+-------------------------+-----------------------------------------+
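As a non-normative sketch, the minimal form of the Anonymization Options Template (the two mandatory scope fields plus anonymizationFlags and anonymizationTechnique) could be serialized as shown below. The function names are hypothetical. The Set ID of 3 and the Options Template Record layout are taken from [RFC5101], and the identifiers 145 (templateId) and 303 (informationElementId) are assumed from the IANA IPFIX registry rather than from this section; the optional scope fields and Set padding are omitted for brevity.

```python
import struct

OPTIONS_TEMPLATE_SET_ID = 3  # Set ID for Options Template Sets (RFC 5101)

# (Information Element ID, field length in octets); scope fields come first.
SCOPE_FIELDS = [
    (145, 2),  # templateId (assumed registry value)
    (303, 2),  # informationElementId (assumed registry value)
]
OPTION_FIELDS = [
    (285, 2),  # anonymizationFlags
    (286, 2),  # anonymizationTechnique
]


def build_options_template_set(template_id):
    """Serialize one Options Template Record inside an Options Template Set."""
    fields = SCOPE_FIELDS + OPTION_FIELDS
    record = struct.pack("!HHH", template_id, len(fields), len(SCOPE_FIELDS))
    for ie_id, length in fields:
        record += struct.pack("!HH", ie_id, length)
    # Set Header: Set ID and total Set length, including the 4-octet header.
    return struct.pack("!HH", OPTIONS_TEMPLATE_SET_ID, 4 + len(record)) + record


def build_anonymization_record(described_template_id, ie_id, flags, technique):
    """Serialize one Anonymization Record matching the template above."""
    # The Enterprise bit of informationElementId is cleared on export.
    return struct.pack("!HHHH", described_template_id, ie_id & 0x7FFF,
                       flags, technique)


template_set = build_options_template_set(257)
record = build_anonymization_record(256, 8, 0b01, 6)
```

Here the record declares that sourceIPv4Address (Information Element 8) in Template 256 is anonymized by Structured Permutation (6) with Session stability.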
Description: A zero-based index of an Information Element referenced by informationElementId within a Template referenced by templateId; used to disambiguate scope for templates containing multiple identical Information Elements.
Abstract Data Type: unsigned16
Data Type Semantics: identifier
ElementId: 287
Status: Current
Description: A description of the anonymization technique applied to a referenced Information Element within a referenced Template. Each technique may be applicable only to certain Information Elements and recommended only for certain Information Elements; these restrictions are noted in the table below.
+-------+---------------------------+-----------------+-------------+
| Value | Description               | Applicable to   | Recommended |
|       |                           |                 | for         |
+-------+---------------------------+-----------------+-------------+
| 0     | Undefined: the Exporting  | all             | all         |
|       | Process makes no          |                 |             |
|       | representation as to      |                 |             |
|       | whether or not the        |                 |             |
|       | defined field is          |                 |             |
|       | anonymized. While the     |                 |             |
|       | Collecting Process MAY    |                 |             |
|       | assume that the field is  |                 |             |
|       | not anonymized, it is not |                 |             |
|       | guaranteed not to be.     |                 |             |
|       | This is the default       |                 |             |
|       | anonymization technique.  |                 |             |
| 1     | None: the values exported | all             | all         |
|       | are real.                 |                 |             |
| 2     | Precision                 | all             | all         |
|       | Degradation/Truncation:   |                 |             |
|       | the values exported are   |                 |             |
|       | anonymized using simple   |                 |             |
|       | precision degradation or  |                 |             |
|       | truncation. The new       |                 |             |
|       | precision or number of    |                 |             |
|       | truncated bits is         |                 |             |
|       | implicit in the exported  |                 |             |
|       | data and can be deduced   |                 |             |
|       | by the Collecting         |                 |             |
|       | Process.                  |                 |             |
| 3     | Binning: the values       | all             | all         |
|       | exported are anonymized   |                 |             |
|       | into bins.                |                 |             |
| 4     | Enumeration: the values   | all             | timestamps  |
|       | exported are anonymized   |                 |             |
|       | by enumeration.           |                 |             |
| 5     | Permutation: the values   | all             | identifiers |
|       | exported are anonymized   |                 |             |
|       | by permutation.           |                 |             |
| 6     | Structured Permutation:   | addresses       |             |
|       | the values exported are   |                 |             |
|       | anonymized by             |                 |             |
|       | permutation, preserving   |                 |             |
|       | bit-level structure as    |                 |             |
|       | appropriate; this         |                 |             |
|       | represents                |                 |             |
|       | prefix-preserving IP      |                 |             |
|       | address anonymization or  |                 |             |
|       | structured MAC address    |                 |             |
|       | anonymization.            |                 |             |
| 7     | Reverse Truncation: the   | addresses       |             |
|       | values exported are       |                 |             |
|       | anonymized using reverse  |                 |             |
|       | truncation. The number    |                 |             |
|       | of truncated bits is      |                 |             |
|       | implicit in the exported  |                 |             |
|       | data, and can be deduced  |                 |             |
|       | by the Collecting         |                 |             |
|       | Process.                  |                 |             |
| 8     | Noise: the values         | non-identifiers | counters    |
|       | exported are anonymized   |                 |             |
|       | by adding random noise to |                 |             |
|       | each value.               |                 |             |
| 9     | Offset: the values        | all             | timestamps  |
|       | exported are anonymized   |                 |             |
|       | by adding a single offset |                 |             |
|       | to all values.            |                 |             |
+-------+---------------------------+-----------------+-------------+
Abstract Data Type: unsigned16
Data Type Semantics: identifier
ElementId: 286
Status: Current
Description: A flag word describing specialized modifications to the anonymization policy in effect for the anonymization technique applied to a referenced Information Element within a referenced Template. When flags are clear (0), the normal policy (as described by anonymizationTechnique) applies without modification.
 MSB 14  13  12  11  10   9   8   7   6   5   4   3   2   1  LSB
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|                   Reserved                    |LOR|PmA|   SC  |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

anonymizationFlags IE
+--------+----------+-----------------------------------------------+
| bit(s) | name     | description                                   |
| (LSB = |          |                                               |
| 0)     |          |                                               |
+--------+----------+-----------------------------------------------+
| 0-1    | SC       | Stability Class: see the Stability Class      |
|        |          | table below, and Section 5.1.                 |
| 2      | PmA      | Perimeter Anonymization: when set (1), source |
|        |          | Information Elements as described in          |
|        |          | [RFC5103] are interpreted as external         |
|        |          | addresses, and destination Information        |
|        |          | Elements as described in [RFC5103] are        |
|        |          | interpreted as internal addresses, for the    |
|        |          | purposes of associating                       |
|        |          | anonymizationTechnique to Information         |
|        |          | Elements only; see Section 7.2.2 for details. |
|        |          | This bit MUST NOT be set when associated with |
|        |          | a non-endpoint (i.e., neither source nor      |
|        |          | destination) Information Element. SHOULD be   |
|        |          | consistent within a record (i.e., if a source |
|        |          | Information Element has this flag set, the    |
|        |          | corresponding destination element SHOULD have |
|        |          | this flag set, and vice versa.)               |
| 3      | LOR      | Low-Order Unchanged: when set (1), the        |
|        |          | low-order bits of the anonymized Information  |
|        |          | Element contain real data. This modification  |
|        |          | is intended for the anonymization of          |
|        |          | network-level addresses while leaving         |
|        |          | host-level addresses intact in order to       |
|        |          | preserve host-level structure, which could    |
|        |          | otherwise be used to reverse anonymization.   |
|        |          | MUST NOT be set when associated with a        |
|        |          | truncation-based anonymizationTechnique.      |
| 4-15   | Reserved | Reserved for future use: SHOULD be cleared    |
|        |          | (0) by the Exporting Process and MUST be      |
|        |          | ignored by the Collecting Process.            |
+--------+----------+-----------------------------------------------+
The Stability Class portion of this flags word describes the stability class of the anonymization technique applied to a referenced Information Element within a referenced Template. Stability classes refer to the stability of the parameters of the anonymization technique, and therefore the comparability of the mapping between the real and anonymized values over time. This determines which anonymized datasets may be compared with each other. Values are as follows:
+-----+-----+-------------------------------------------------------+
| Bit | Bit | Description                                           |
| 1   | 0   |                                                       |
+-----+-----+-------------------------------------------------------+
| 0   | 0   | Undefined: the Exporting Process makes no             |
|     |     | representation as to how stable the mapping is, or    |
|     |     | over what time period values of this field will       |
|     |     | remain comparable; while the Collecting Process MAY   |
|     |     | assume Session level stability, Session level         |
|     |     | stability is not guaranteed. Processes SHOULD assume  |
|     |     | this is the case in the absence of stability class    |
|     |     | information; this is the default stability class.     |
| 0   | 1   | Session: the Exporting Process will ensure that the   |
|     |     | parameters of the anonymization technique are stable  |
|     |     | during the Transport Session. All the values of the   |
|     |     | described Information Element for each Record         |
|     |     | described by the referenced Template within the       |
|     |     | Transport Session are comparable. The Exporting       |
|     |     | Process SHOULD endeavor to ensure at least this       |
|     |     | stability class.                                      |
| 1   | 0   | Exporter-Collector Pair: the Exporting Process will   |
|     |     | ensure that the parameters of the anonymization       |
|     |     | technique are stable across Transport Sessions over   |
|     |     | time with the given Collecting Process, but may use   |
|     |     | different parameters for different Collecting         |
|     |     | Processes. Data exported to different Collecting      |
|     |     | Processes are not comparable.                         |
| 1   | 1   | Stable: the Exporting Process will ensure that the    |
|     |     | parameters of the anonymization technique are stable  |
|     |     | across Transport Sessions over time, regardless of    |
|     |     | the Collecting Process to which it is sent.           |
+-----+-----+-------------------------------------------------------+
Abstract Data Type: unsigned16
Data Type Semantics: flags
ElementId: 285
Status: Current
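The bit layout of the anonymizationFlags word can be illustrated with a short, non-normative sketch. The constant and function names are hypothetical; the bit positions and Stability Class values follow the tables above. Reserved bits (4-15) are left clear on encoding and ignored on decoding.

```python
SC_MASK = 0x0003  # bits 0-1: Stability Class
PMA_BIT = 0x0004  # bit 2: Perimeter Anonymization
LOR_BIT = 0x0008  # bit 3: Low-Order Unchanged

STABILITY_CLASSES = {
    0: "Undefined",
    1: "Session",
    2: "Exporter-Collector Pair",
    3: "Stable",
}


def encode_flags(stability_class, perimeter=False, low_order_unchanged=False):
    """Build an anonymizationFlags value with all Reserved bits cleared."""
    if stability_class not in STABILITY_CLASSES:
        raise ValueError("stability class must be in the range 0-3")
    flags = stability_class
    if perimeter:
        flags |= PMA_BIT
    if low_order_unchanged:
        flags |= LOR_BIT
    return flags


def decode_flags(flags):
    """Interpret an anonymizationFlags value, ignoring Reserved bits."""
    return {
        "stability_class": STABILITY_CLASSES[flags & SC_MASK],
        "perimeter": bool(flags & PMA_BIT),
        "low_order_unchanged": bool(flags & LOR_BIT),
    }
```

For example, `encode_flags(3, perimeter=True)` yields 7 (Stable with PmA set).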
When exporting or storing anonymized flow data using IPFIX, certain interactions between the IPFIX protocol and the anonymization techniques in use must be considered; these are treated in the subsections below.
Anonymization may be applied to IPFIX data at three stages within the collection infrastructure: on initial export, at a mediator, or after collection, as shown in Figure 1. Each of these locations has specific considerations and applicability.
+==========================================+
| Exporting Process                        |
+==========================================+
  |                                      |
  |    (Anonymized at Original Exporter) |
  V                                      |
+=============================+          |
| Mediator                    |          |
+=============================+          |
  |                                      |
  | (Anonymizing Mediator)               |
  V                                      V
+==========================================+
| Collecting Process                       |
+==========================================+
        |
        | (Anonymizing CP/File Writer)
        V
+--------------------+
| IPFIX File Storage |
+--------------------+
Figure 1: Potential Anonymization Locations
Anonymization is generally performed before the wider dissemination or repurposing of a dataset, e.g., adapting operational measurement data for research. Therefore, direct anonymization of flow data on initial export is only applicable in certain restricted circumstances: when the Exporting Process (EP) is "publishing" data to a Collecting Process (CP) directly, and the Exporting Process and Collecting Process are operated by different entities. Note that certain guidelines in Section 7.2.3 with respect to timestamp anonymization may not apply in this case, as the Collecting Process may be able to deduce certain timing information from the time at which each Message is received.
A much more flexible arrangement is to anonymize data within a Mediator [RFC6183]. Here, original data is sent to a Mediator, which performs the anonymization function and re-exports the anonymized data. Such a Mediator could be located at the administrative domain boundary of the initial Exporting Process operator, exporting anonymized data to other consumers outside the organization. In this case, the original Exporter SHOULD use TLS [RFC5246] as specified in [RFC5101] to secure the channel to the Mediator, and the Mediator should follow the guidelines in Section 7.2, to mitigate the risk of original data disclosure.
When data is to be published as an anonymized dataset in an IPFIX File [RFC5655], the anonymization may also be done at the final Collecting Process before storage and dissemination. In this case, the Collector should follow the guidelines in Section 7.2, especially as regards File-specific Options in Section 7.2.4.
In each of these data flows, the anonymization of records is undertaken by an Intermediate Anonymization Process (IAP); the data flows into and out of this IAP are shown in Figure 2 below.
packets --+                     +- IPFIX Messages -+
          |                     |                  |
          V                     V                  V
+==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader |
+==================+ +====================+ +=============+
          |      Non-anonymized |          Records |
          V                     V                  V
+=========================================================+
|        Intermediate Anonymization Process (IAP)         |
+=========================================================+
          | Anonymized          ^       Anonymized |
          | Records             |          Records |
          V                     |                  V
+===================+     Anonymization     +=============+
| Exporting Process |<--- Parameters ------>| File Writer |
+===================+                       +=============+
          |                                        |
          +------------> IPFIX Messages <----------+
Figure 2: Data Flows through the Anonymization Process
Anonymization parameters must also be available to the Exporting Process and/or File Writer in order to ensure header data is also appropriately anonymized as in Section 7.2.3.
Following each of the data flows through the IAP, we describe five basic types of anonymization arrangements within this framework in Figure 3. In addition to the three arrangements described in detail above, anonymization can also be done at a collocated Metering Process (MP) and File Writer (FW) (see Section 7.3.2 of [RFC5655]), or at a file manipulator, which combines a File Writer with a File Reader (FR) (see Section 7.3.7 of [RFC5655]).
         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| EP |-> Anonymization on Original Exporter
         +----+  +-----+  +----+
         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| FW |-> Anonymizing collocated MP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| EP |-> Anonymizing Mediator (Masq. Proxy)
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| FW |-> Anonymizing collocated CP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | FR |->| IAP |->| FW |-> Anonymizing file manipulator
File     +----+  +-----+  +----+
Figure 3: Possible Anonymization Arrangements in the IPFIX Architecture
Note that anonymization may occur at more than one location within a given collection infrastructure, to provide varying levels of anonymization, disclosure risk, or data utility for specific purposes.
In implementing and deploying the anonymization techniques described in this document, implementors should note that IPFIX already provides features that support anonymized data export, and use these where appropriate. Care must also be taken that data structures supporting the operation of the protocol itself do not leak data that could be used to reverse the anonymization applied to the flow data. Such data structures may appear in the header, or within the data stream itself, especially as options data. Each of these and their impact on specific anonymization techniques is noted in a separate subsection below.
Note, as in Section 6 above, that black-marker anonymized fields SHOULD NOT be exported at all; the absence of the field in a given Data Set is implicitly declared by not including the corresponding Information Element in the Template describing that Data Set.
When using precision degradation of timestamps, Exporting Processes SHOULD export timing information using Information Elements of an appropriate precision, as explained in Section 4.5 of [RFC5153]. For example, timestamps measured in millisecond-level precision and degraded to second-level precision should use flowStartSeconds and flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.
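The timestamp example above can be sketched as follows. This is a non-normative illustration: the function name is hypothetical, and it assumes that degradation is performed by simple truncation toward the start of the second.

```python
def degrade_to_seconds(flow_start_ms, flow_end_ms):
    """Map millisecond timestamps to second-precision Information Elements."""
    return {
        "flowStartSeconds": flow_start_ms // 1000,
        "flowEndSeconds": flow_end_ms // 1000,
    }


degraded = degrade_to_seconds(1304208000123, 1304208001999)
```

Exporting the result under flowStartSeconds and flowEndSeconds, rather than as zero-padded millisecond values, lets the Template itself convey the reduced precision.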
When exporting anonymized data and anonymization metadata, Exporting Processes SHOULD ensure that the Information Element and the declared anonymization technique are compatible. Specifically, the applicable and recommended Information Element types and semantics for each technique are noted in the description of the anonymizationTechnique Information Element in Section 6.2.2. In this description, a timestamp is an Information Element with the data type dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or dateTimeNanoseconds; an address is an Information Element with the data type ipv4Address, ipv6Address, or macAddress; and an identifier is an Information Element with identifier data type semantics. Exporting Processes MUST NOT export Anonymization Options records binding techniques to Information Elements to which they are not applicable, and SHOULD NOT export Anonymization Options records binding techniques to Information Elements for which they are not recommended.
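An Exporting Process could enforce the "applicable to" restriction before emitting an Anonymization Record. The following non-normative sketch encodes only the "Applicable to" column of the anonymizationTechnique table; the function and constant names are hypothetical, and the "Recommended for" column is deliberately not modeled.

```python
ADDRESS_TYPES = {"ipv4Address", "ipv6Address", "macAddress"}

# anonymizationTechnique value -> "Applicable to" class from the table
APPLICABLE_TO = {
    0: "all", 1: "all", 2: "all", 3: "all", 4: "all", 5: "all",
    6: "addresses",        # Structured Permutation
    7: "addresses",        # Reverse Truncation
    8: "non-identifiers",  # Noise
    9: "all",
}


def is_applicable(technique, data_type, semantics):
    """Return True if the technique may be bound to an IE of this type."""
    scope = APPLICABLE_TO[technique]
    if scope == "addresses":
        return data_type in ADDRESS_TYPES
    if scope == "non-identifiers":
        return semantics != "identifier"
    return True  # scope == "all"
```

A record for which `is_applicable` returns False falls under the MUST NOT above and would simply not be exported.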
Data collected from a single network may require different anonymization policies for addresses internal and external to the network. For example, internal addresses could be subject to simple permutation, while external addresses could be aggregated into networks by truncation. When exporting anonymized perimeter bidirectional flow (biflow) data as in Section 5.2 of [RFC5103], this arrangement may be easily represented by specifying one technique for source endpoint information (which represents the external endpoint in a perimeter biflow) and one technique for destination endpoint information (which represents the internal address in a perimeter biflow).
However, it can also be useful to represent perimeter-based anonymization policies with unidirectional flow (uniflow), or non-perimeter biflow data. In this case, the Perimeter Anonymization bit (bit 2) in the anonymizationFlags Information Element describing the anonymized address Information Elements can be set to change the meaning of "source" and "destination" of Information Elements to mean "external" and "internal" as with perimeter biflows, but only with respect to anonymization policies.
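As a non-normative sketch of how a consumer might interpret such records, the lookup below treats the record bound to the source address Information Element as the policy for external addresses and the record bound to the destination address Information Element as the policy for internal addresses whenever the PmA bit is set. The internal prefix, the policy table, and the function name are hypothetical; Information Elements 8 and 12 are sourceIPv4Address and destinationIPv4Address.

```python
import ipaddress

PMA_BIT = 0x0004
TRUNCATION, PERMUTATION = 2, 5

# Hypothetical records: IE ID -> (anonymizationTechnique, anonymizationFlags)
POLICY = {
    8: (TRUNCATION, PMA_BIT),    # "source" record: external addresses
    12: (PERMUTATION, PMA_BIT),  # "destination" record: internal addresses
}
INTERNAL_NET = ipaddress.ip_network("192.0.2.0/24")  # assumed internal prefix


def technique_for(address):
    """Pick the declared technique by perimeter side, not flow direction."""
    internal = ipaddress.ip_address(address) in INTERNAL_NET
    technique, _flags = POLICY[12 if internal else 8]
    return technique
```

With this table, an internal host is reported as permuted and an external host as truncated, regardless of which of them appears as the source of a given uniflow.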
Each IPFIX Message contains a Message Header. This header contains two fields that may be used to break certain anonymization techniques: the Export Time and the Observation Domain ID.
Exporting Processes exporting IPFIX Messages containing anonymized timestamp data, where the original Export Time Message Header field has some relationship to the anonymized timestamps, SHOULD anonymize the Export Time header field so that the Export Time is consistent with the anonymized timestamp data. Otherwise, relationships between export and flow time could be used to partially or totally reverse timestamp anonymization. When anonymizing timestamps and the Export Time header field, Exporting Processes SHOULD avoid times too far in the past or future; while [RFC5101] does not make any allowance for Export Time error detection, Collecting Processes may reasonably interpret Messages with seemingly nonsensical Export Times as erroneous. Specific limits are implementation dependent, but anonymizing the Export Time header field may therefore cause interoperability problems.
导出包含匿名时间戳数据的IPFIX消息时,如果原始导出时间消息头与匿名时间戳有某种关系,则应匿名导出时间头字段,以便导出时间与匿名时间戳数据一致。否则,导出和流时间之间的关系可用于部分或完全反转时间戳匿名化。匿名化时间戳和导出时间头字段时,应避免过去或将来的时间过长;虽然[RFC5101]没有考虑导出时间错误检测,但收集过程可能会将导出时间看似无意义的消息解释为错误。具体限制取决于实现,但在匿名化导出时间标头字段时,此问题可能会导致互操作性问题。
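As an illustration of keeping the Export Time consistent, the following sketch assumes that flow timestamps are anonymized by adding one constant offset (only one of several possible techniques) and applies the same offset to the Export Time. The function names and the `max_skew` limit are hypothetical; real limits are implementation dependent, as noted above.

```python
def anonymize_export_time(export_time: int, offset: int) -> int:
    """Shift the Export Time by the same constant offset (in seconds)
    that was applied to the flow timestamps."""
    return export_time + offset


def is_plausible(export_time: int, now: int, max_skew: int) -> bool:
    """Check that an Export Time lies within max_skew seconds of 'now',
    so that a Collecting Process is less likely to treat it as erroneous."""
    return abs(export_time - now) <= max_skew


# Values taken from Figure 7: Export Time and the first flow timestamp.
original_export = 1271227717
original_flow = 1271227681
offset = -3600  # assumed offset of one hour into the past

shifted_export = anonymize_export_time(original_export, offset)
shifted_flow = original_flow + offset
```

Because both values move by the same amount, the 36-second gap between flow start and export is preserved and reveals nothing about the offset itself.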
The similarity in size between an Observation Domain ID and an IPv4 address (32 bits) may lead to a temptation to use an IPv4 interface address on the Metering or Exporting Process as the Observation Domain ID. If this address bears some relation to the IP addresses in the flow data (e.g., shares a network prefix with internal addresses) and the IP addresses in the flow data are anonymized in a structure-preserving way, then the Observation Domain ID may be used to break the IP address anonymization. Use of an IPv4 interface address on the Metering or Exporting Process as the Observation Domain ID is NOT RECOMMENDED in this case.
观察域ID和IPv4地址(32位)之间大小的相似性可能导致在计量或导出过程中使用IPv4接口地址作为观察域ID。如果此地址与流数据中的IP地址有某种关系(例如,与内部地址共享网络前缀)并且流数据中的IP地址以保持结构的方式匿名化,然后可以使用观察域ID来中断IP地址匿名化。在这种情况下,不建议在计量或导出过程中使用IPv4接口地址作为观察域ID。
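An Exporting Process could guard against this configuration mistake with a check along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, and the internal prefix is supplied by the operator.

```python
import ipaddress


def domain_id_shares_prefix(domain_id: int, internal_net: str) -> bool:
    """Interpret a 32-bit Observation Domain ID as an IPv4 address and
    report whether it falls inside the given internal network prefix."""
    candidate = ipaddress.IPv4Address(domain_id)
    return candidate in ipaddress.ip_network(internal_net)


# An Observation Domain ID that was derived from an interface address.
leaky_id = int(ipaddress.IPv4Address("198.51.100.1"))
# A small arbitrary identifier, as used in Figure 7 (domain 1).
safe_id = 1
```

Here `domain_id_shares_prefix(leaky_id, "198.51.100.0/24")` is true, signalling that the identifier would expose the internal prefix, while the small integer does not.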
IPFIX uses the Options mechanism to export, among other things, metadata about exported flows and the flow collection infrastructure. As with the IPFIX Message Header, certain Options recommended in [RFC5101] and [RFC5655] containing flow timestamps and network addresses of Exporting and Collecting Processes may be used to break certain anonymization techniques. When using these Options alongside anonymized data export and storage, values within the Options that could be used to break the anonymization SHOULD themselves be anonymized or omitted.
IPFIX使用选项机制导出有关导出流和流收集基础结构的元数据。与IPFIX消息头一样,[RFC5101]和[RFC5655]中推荐的包含导出和收集过程的流时间戳和网络地址的某些选项可用于中断某些匿名化技术。在匿名数据导出和存储过程中使用这些选项时,可用于中断匿名化的选项中的值本身应匿名化或省略。
The Exporting Process Reliability Statistics Options Template, recommended in [RFC5101], contains an Exporting Process ID field, which may be an exportingProcessIPv4Address Information Element or an exportingProcessIPv6Address Information Element. If the Exporting Process address bears some relation to the IP addresses in the flow data (e.g., shares a network prefix with internal addresses) and the IP addresses in the flow data are anonymized in a structure-preserving way, then the Exporting Process address may be used to break the IP address anonymization. Exporting Processes exporting anonymized data in this situation SHOULD mitigate the risk of attack either by omitting Options described by the Exporting Process Reliability Statistics Options Template or by anonymizing the Exporting Process address using a similar technique to that used to anonymize the IP addresses in the exported data.
[RFC5101]中推荐的导出过程可靠性统计信息选项模板包含一个导出过程ID字段,该字段可以是exportingProcessIPv4Address信息元素或exportingProcessIPv6Address信息元素。如果导出过程地址与流数据中的IP地址有某种关系(例如,与内部地址共享网络前缀),并且流数据中的IP地址以保留结构的方式匿名化,则导出过程地址可用于破解IP地址匿名化。在这种情况下导出匿名化数据的导出过程应通过省略导出过程可靠性统计信息选项模板所描述的选项,或使用与匿名化导出数据中IP地址所用技术类似的技术对导出过程地址进行匿名化,来降低攻击风险。
Similarly, the Export Session Details Options Template and Message Details Options Template specified for the IPFIX File Format [RFC5655] may contain the exportingProcessIPv4Address Information Element or the exportingProcessIPv6Address Information Element to identify an Exporting Process from which a flow record was received, and the collectingProcessIPv4Address Information Element or the collectingProcessIPv6Address Information Element to identify the Collecting Process which received it. If the Exporting Process or Collecting Process address bears some relation to the IP addresses in the dataset (e.g., shares a network prefix with internal addresses) and the IP addresses in the dataset are anonymized in a structure-preserving way, then the Exporting Process or Collecting Process address may be used to break the IP address anonymization. Since these Options Templates are primarily intended for storing IPFIX Transport Session data for auditing, replay, and testing purposes, it is NOT RECOMMENDED that storage of anonymized data include these Options Templates in order to mitigate the risk of attack.
类似地,为IPFIX文件格式[RFC5655]指定的导出会话详细信息选项模板和消息详细信息选项模板可能包含exportingProcessIPv4Address信息元素或exportingProcessIPv6Address信息元素,以标识从中接收流记录的导出过程,以及collectingProcessIPv4Address信息元素或collectingProcessIPv6Address信息元素,以标识接收它的收集过程。如果导出过程或收集过程地址与数据集中的IP地址有某种关系(例如,与内部地址共享网络前缀),并且数据集中的IP地址以保留结构的方式匿名化,则导出过程或收集过程地址可用于破解IP地址匿名化。由于这些选项模板主要用于存储IPFIX传输会话数据,以便进行审核、重放和测试,因此不建议在存储匿名化数据时包含这些选项模板,以降低攻击风险。
The Message Details Options Template specified for the IPFIX File Format [RFC5655] also contains the collectionTimeMilliseconds Information Element. As with the Export Time Message Header field, if the exported dataset contains anonymized timestamp information, and the collectionTimeMilliseconds Information Element in a given Message has some relationship to the anonymized timestamp information, then this relationship can be exploited to reverse the timestamp anonymization. Since this Options Template is primarily intended for storing IPFIX Transport Session data for auditing, replay, and testing purposes, it is NOT RECOMMENDED that storage of anonymized data include this Options Template in order to mitigate the risk of attack.
为IPFIX文件格式[RFC5655]指定的消息详细信息选项模板还包含collectionTimeMilliseconds信息元素。与导出时间消息头字段一样,如果导出的数据集包含匿名化的时间戳信息,并且给定消息中的collectionTimeMilliseconds信息元素与匿名化的时间戳信息存在某种关系,则可以利用此关系来逆转时间戳匿名化。由于此选项模板主要用于存储IPFIX传输会话数据,以便进行审核、重放和测试,因此不建议在存储匿名化数据时包含此选项模板,以降低攻击风险。
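The Options guidance in the preceding paragraphs can be summarized as a screening step applied to each Options Template before export or storage. The sketch below is illustrative only: it works on Information Element names exactly as they are written in this section, and the function and set names are hypothetical.

```python
# Information Elements named in this section that can reveal addresses
# of the collection infrastructure or the true collection time.
ADDRESS_ELEMENTS = {
    "exportingProcessIPv4Address",
    "exportingProcessIPv6Address",
    "collectingProcessIPv4Address",
    "collectingProcessIPv6Address",
}
TIME_ELEMENTS = {"collectionTimeMilliseconds"}


def risky_option_fields(field_names, addresses_anonymized, timestamps_anonymized):
    """Return the Options Template fields that should be anonymized or
    omitted, given which kinds of data are anonymized in the flow records."""
    risky = []
    for name in field_names:
        if addresses_anonymized and name in ADDRESS_ELEMENTS:
            risky.append(name)
        elif timestamps_anonymized and name in TIME_ELEMENTS:
            risky.append(name)
    return risky
```

A File Writer could call this with the field list of a Message Details Options Template and either drop the template or anonymize the returned fields.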
Since the Time Window Options Template specified for the IPFIX File Format [RFC5655] refers to the timestamps within the dataset to provide partial table of contents information for an IPFIX File, Options described by this Template SHOULD be written using the anonymized timestamps instead of the original ones.
由于为IPFIX文件格式[RFC5655]指定的时间窗口选项模板引用数据集中的时间戳,以提供IPFIX文件的部分目录信息,因此该模板描述的选项应使用匿名时间戳而不是原始时间戳编写。
When anonymizing IP addresses in data for transport or storage using IPFIX, and the analysis purpose permits doing so, it is RECOMMENDED to either filter out or leave unanonymized any data containing the special-use IPv4 addresses enumerated in [RFC5735] or the special-use IPv6 addresses enumerated in [RFC5156]. Data containing these addresses (e.g., 0.0.0.0, and 169.254.0.0/16 for link-local autoconfiguration in IPv4 space) are often associated with specific, well-known behavioral patterns. Detection of these patterns in anonymized data can lead to deanonymization of these special-use addresses, which increases the chance of a complete reversal of anonymization by an attacker, especially for prefix-preserving techniques.
使用IPFIX对用于传输或存储的数据中的IP地址进行匿名化时,如果分析目的允许,建议过滤掉包含[RFC5735]中列举的特殊用途IPv4地址或[RFC5156]中列举的特殊用途IPv6地址的数据,或者不对这些数据进行匿名化。包含这些地址的数据(例如0.0.0.0,以及IPv4空间中用于链路本地自动配置的169.254.0.0/16)通常与特定的、众所周知的行为模式相关联。在匿名化数据中检测到这些模式可能导致这些特殊用途地址被去匿名化,从而增加攻击者完全逆转匿名化的可能性,对于保留前缀的技术尤其如此。
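A filter for special-use addresses might look like the following sketch. The prefix list here is deliberately a small, non-exhaustive subset chosen for illustration; a real deployment would take the full lists from [RFC5735] and [RFC5156]. The record layout (a dict with `sip` and `dip` keys) and the function names are assumptions of this example.

```python
import ipaddress

# Illustrative subset only: "this network", loopback, link-local, multicast.
SPECIAL_USE_V4 = [
    ipaddress.ip_network(prefix)
    for prefix in ("0.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "224.0.0.0/4")
]


def is_special_use(addr: str) -> bool:
    """Report whether an address falls in one of the listed special-use prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in SPECIAL_USE_V4)


def drop_special_use(records):
    """Filter out flow records whose source or destination is special-use."""
    return [
        r for r in records
        if not (is_special_use(r["sip"]) or is_special_use(r["dip"]))
    ]
```

Filtering before anonymization keeps the well-known behavior of these addresses from serving as a known-plaintext foothold against the remaining data.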
Special care should be taken when exporting or sharing anonymized data to avoid information leakage via the configuration or management planes of the IPFIX Device containing the Exporting Process or the File Writer. For example, adding noise to counters is useless if the receiver can deduce the values in the counters from Simple Network Management Protocol (SNMP) information, and concealing the network under test is similarly useless if such information is available in a configuration document. As the specifics of these concerns are largely implementation and deployment dependent, specific mitigation is out of scope for this document. The general ground rule is that information of similar type to that anonymized SHOULD NOT be made available to the receiver by any means, whether in the Data Records, in IPFIX protocol structures such as Message Headers, or out of band.
导出或共享匿名数据时应特别小心,以避免通过包含导出过程或文件编写器的IPFIX设备的配置或管理平面泄漏信息。例如,如果接收器可以从简单网络管理协议(SNMP)信息推断计数器中的值,则向计数器添加噪声是无用的,如果配置文档中有此类信息,则隐藏被测网络也是无用的。由于这些问题的具体细节在很大程度上取决于实施和部署,因此具体的缓解措施不在本文档的范围之内。一般的基本原则是,不应通过任何方式向接收方提供与匿名化信息类型类似的信息,无论是在数据记录中、在IPFIX协议结构(如消息头)中还是在带外。
In this example, consider the export or storage of an anonymized IPv4 dataset from a single network described by a simple Template containing a timestamp in seconds, a five-tuple, and packet and octet counters. The Template describing each record in this Data Set is shown in Figure 4.
在这个例子中,考虑一个匿名的IPv4数据集的导出或存储,它由一个简单的模板描述,包含一个以秒为单位的时间戳,一个五元组,以及包和八位字节计数器。描述此数据集中每个记录的模板如图4所示。
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Set ID = 2          |          Length = 40          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Template ID = 256       |        Field Count = 8        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| flowStartSeconds        150 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| sourceIPv4Address         8 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationIPv4Address   12 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| sourceTransportPort       7 |       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationTransportPort 11 |       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| packetDeltaCount          2 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| octetDeltaCount           1 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| protocolIdentifier        4 |       Field Length = 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Example Flow Template
图4:示例流模板
Suppose that this Data Set is anonymized according to the following policy:
假设此数据集根据以下策略匿名化:
o IP addresses within the network are protected by reverse truncation.
o 网络中的IP地址受反向截断保护。
o IP addresses outside the network are protected by prefix-preserving anonymization.
o 网络外的IP地址通过保留前缀的匿名化进行保护。
o Octet counts are exported using degraded precision in order to provide minimal protection against fingerprinting attacks.
o 八位字节计数使用降级精度导出,以提供针对指纹攻击的最低保护。
o All other fields are exported unanonymized.
o 所有其他字段均不经匿名化直接导出。
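Two of the techniques in this policy are simple enough to sketch directly. The parameters below are not stated by the policy itself; they are inferred from the example values in Figures 7 and 8, where the internal address keeps only its low-order 8 bits and octet counts appear rounded to the nearest 100. Prefix-preserving anonymization of external addresses is considerably more involved and is not attempted here. Function names are hypothetical.

```python
import ipaddress


def reverse_truncate(addr: str, keep_bits: int = 8) -> str:
    """Reverse truncation: zero the high-order bits of an IPv4 address,
    keeping only the low-order keep_bits (assumed 8, per Figure 8)."""
    mask = (1 << keep_bits) - 1
    value = int(ipaddress.IPv4Address(addr)) & mask
    return str(ipaddress.IPv4Address(value))


def degrade_precision(count: int, step: int = 100) -> int:
    """Precision degradation: round a counter to the nearest multiple
    of step (assumed 100, per the octet counts in Figure 8)."""
    return (count + step // 2) // step * step
```

Applied to the example data, the internal address 198.51.100.7 becomes 0.0.0.7, and the octet counts 74, 2896, and 2037 become 100, 2900, and 2000.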
In order to export Anonymization Records for this Template and policy, first, the Anonymization Options Template shown in Figure 5 is exported. For this example, the optional privateEnterpriseNumber and informationElementIndex Information Elements are omitted, because they are not used.
为了导出此模板和策略的匿名化记录,首先导出图5所示的匿名化选项模板。在本例中,可选的privateEnterpriseNumber和informationElementIndex信息元素未被使用,因此被省略。
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Set ID = 3          |          Length = 26          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Template ID = 257       |        Field Count = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Scope Field Count = 2     |0| templateID              145 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0| informationElementId    303 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0| anonymizationFlags      285 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0| anonymizationTechnique  286 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: Example Anonymization Options Template
图5:匿名化选项模板示例
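The byte layout of Figure 5 can be reproduced with Python's `struct` module. This is a sketch for checking the arithmetic of the figure, not a complete IPFIX encoder: it emits no padding and handles only IANA-registered Information Elements (enterprise bit clear). The function name is hypothetical.

```python
import struct

# (Information Element ID, field length) pairs from Figure 5.
# The first two are scope fields: templateId and informationElementId.
FIELDS = [(145, 2), (303, 2), (285, 2), (286, 2)]

OPTIONS_TEMPLATE_SET_ID = 3


def encode_options_template_set(template_id, scope_count, fields):
    """Encode an Options Template Set containing a single Options Template."""
    # Options Template Record header: Template ID, Field Count, Scope Field Count.
    body = struct.pack("!HHH", template_id, len(fields), scope_count)
    for ie_id, length in fields:
        body += struct.pack("!HH", ie_id, length)
    # Set header: Set ID and total Set length, including this 4-octet header.
    return struct.pack("!HH", OPTIONS_TEMPLATE_SET_ID, 4 + len(body)) + body


options_set = encode_options_template_set(257, 2, FIELDS)
```

The result is 26 octets long (4 for the Set header, 6 for the record header, and 16 for the four field specifiers), which agrees with the Length field in Figure 5.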
Following the Anonymization Options Template comes a Data Set containing Anonymization Records. This Data Set has an entry for each Information Element Specifier in Template 256 describing the flow records. This Data Set is shown in Figure 6. Note that sourceIPv4Address and destinationIPv4Address have the Perimeter Anonymization (0x0004) flag set in anonymizationFlags, meaning that the source address should be treated as network-external and the destination address as network-internal.
在匿名化选项模板之后是一个包含匿名化记录的数据集。对于模板256中描述流记录的每个信息元素说明符,此数据集都有一个条目。该数据集如图6所示。请注意,sourceIPv4Address和destinationIPv4Address在匿名标志中设置了周界匿名化(0x0004)标志,这意味着源地址应视为网络外部地址,目标地址应视为网络内部地址。
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Set ID = 257          |          Length = 68          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |    flowStartSeconds IE 150    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymized 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |    sourceIPv4Address IE 8     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Session SC 0x0005  |   Structured Permutation 6    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | destinationIPv4Address IE 12  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Perimeter, Stable 0x0007    |     Reverse Truncation 7      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |   sourceTransportPort IE 7    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymized 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |   dest.TransportPort IE 11    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymized 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |     packetDeltaCount IE 2     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymized 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |     octetDeltaCount IE 1      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Stable 0x0003         |    Precision Degradation 2    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |    protocolIdentifier IE 4    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymized 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: Example Anonymization Records
图6:匿名化记录示例
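The Anonymization Records of Figure 6 are four 16-bit values each, so the whole Data Set can be packed in a few lines. As before, this is a sketch for checking the figure rather than a full encoder; the function name is hypothetical and the tuples simply restate the values in the figure.

```python
import struct

# (templateId, informationElementId, anonymizationFlags, anonymizationTechnique)
# as listed in Figure 6, one tuple per field of Template 256.
RECORDS = [
    (256, 150, 0x0000, 1),  # flowStartSeconds: not anonymized
    (256, 8, 0x0005, 6),    # sourceIPv4Address: structured permutation
    (256, 12, 0x0007, 7),   # destinationIPv4Address: reverse truncation
    (256, 7, 0x0000, 1),    # sourceTransportPort: not anonymized
    (256, 11, 0x0000, 1),   # destinationTransportPort: not anonymized
    (256, 2, 0x0000, 1),    # packetDeltaCount: not anonymized
    (256, 1, 0x0003, 2),    # octetDeltaCount: precision degradation
    (256, 4, 0x0000, 1),    # protocolIdentifier: not anonymized
]


def encode_anonymization_records(set_id, records):
    """Encode a Data Set of Anonymization Records described by Template set_id."""
    body = b"".join(struct.pack("!HHHH", *record) for record in records)
    # Set header: Set ID and total Set length, including this 4-octet header.
    return struct.pack("!HH", set_id, 4 + len(body)) + body


anon_set = encode_anonymization_records(257, RECORDS)
```

Eight records of 8 octets plus the 4-octet Set header give the Length of 68 shown in the figure.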
Following the Anonymization Records come the Data Sets containing the anonymized data, exported according to the Template in Figure 4. Bringing it all together, consider an IPFIX Message containing three real data records and the necessary templates to export them, shown in Figure 7. (Note that the scale of this message is 8 bytes per line, for compactness; lines of dots '. . . . . ' represent shifting of the example bit structure for clarity.)
           1         2         3         4         5         6
 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x000a        | length 135    | export time 1271227717        | msg
| sequence 0                    | domain 1                      | hdr
| SetID 2       | length 40     | tid 256       | fields 8      | tmpl
| IE 150        | length 4      | IE 8          | length 4      | set
| IE 12         | length 4      | IE 7          | length 2      |
| IE 11         | length 2      | IE 2          | length 4      |
| IE 1          | length 4      | IE 4          | length 1      |
| SetID 256     | length 79     | time 1271227681               | data
| sip 192.0.2.3                 | dip 198.51.100.7              | set
| sp 53         | dp 53         | packets 1                     |
| bytes 74                      | prt 17  | . . . . . . . . . . .
| time 1271227682               | sip 198.51.100.7              |
| dip 192.0.2.88                | sp 5091       | dp 80         |
| packets 60                    | bytes 2896                    |
| prt 6   | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time 1271227683               | sip 198.51.100.7              |
| dip 203.0.113.9               | sp 5092       | dp 80         |
| packets 44                    | bytes 2037                    |
| prt 6   |
+---------+
Figure 7: Example Real Message
图7:真实消息示例
The corresponding anonymized message is then shown in Figure 8. The Options Template Set describing Anonymization Records and the Anonymization Records themselves are added; IP addresses and byte counts are anonymized as declared.
相应的匿名消息如图8所示。增加描述匿名记录的选项模板集和匿名记录本身;IP地址和字节计数按声明进行匿名化。
           1         2         3         4         5         6
 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x000a        | length 233    | export time 1271227717        | msg
| sequence 0                    | domain 1                      | hdr
| SetID 2       | length 40     | tid 256       | fields 8      | tmpl
| IE 150        | length 4      | IE 8          | length 4      | set
| IE 12         | length 4      | IE 7          | length 2      |
| IE 11         | length 2      | IE 2          | length 4      |
| IE 1          | length 4      | IE 4          | length 1      |
| SetID 3       | length 30     | tid 257       | fields 4      | opt
| scope 2       | . . . . . . . . . . . . . . . . . . . . . . . . tmpl
| IE 145        | length 2      | IE 303        | length 2      | set
| IE 285        | length 2      | IE 286        | length 2      |
| SetID 257     | length 68     | . . . . . . . . . . . . . . . . anon
| tid 256       | IE 150        | flags 0       | tech 1        | recs
| tid 256       | IE 8          | flags 5       | tech 6        |
| tid 256       | IE 12         | flags 7       | tech 7        |
| tid 256       | IE 7          | flags 0       | tech 1        |
| tid 256       | IE 11         | flags 0       | tech 1        |
| tid 256       | IE 2          | flags 0       | tech 1        |
| tid 256       | IE 1          | flags 3       | tech 2        |
| tid 256       | IE 4          | flags 0       | tech 1        |
| SetID 256     | length 79     | time 1271227681               | data
| sip 254.202.119.209           | dip 0.0.0.7                   | set
| sp 53         | dp 53         | packets 1                     |
| bytes 100                     | prt 17  | . . . . . . . . . . .
| time 1271227682               | sip 0.0.0.7                   |
| dip 254.202.119.6             | sp 5091       | dp 80         |
| packets 60                    | bytes 2900                    |
| prt 6   | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time 1271227683               | sip 0.0.0.7                   |
| dip 2.19.199.176              | sp 5092       | dp 80         |
| packets 60                    | bytes 2000                    |
| prt 6   |
+---------+
Figure 8: Corresponding Anonymized Message
图8:相应的匿名消息
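The length fields in Figures 7 and 8 can be checked with simple arithmetic. The sketch below assumes the 16-octet IPFIX Message Header visible in the first two rows of each figure and sums the Set lengths exactly as the figures state them; the function name is hypothetical.

```python
MESSAGE_HEADER_LEN = 16  # version, length, export time, sequence, domain

# One flow record per the Template in Figure 4:
# time, sip, dip (4 each), sp, dp (2 each), packets, bytes (4 each), protocol (1).
RECORD_LEN = 4 + 4 + 4 + 2 + 2 + 4 + 4 + 1

# Data Set: 4-octet Set header plus three records.
DATA_SET_LEN = 4 + 3 * RECORD_LEN


def message_length(set_lengths):
    """Total IPFIX Message length: header plus all contained Sets."""
    return MESSAGE_HEADER_LEN + sum(set_lengths)


# Figure 7: Template Set (40) and Data Set (79).
real_length = message_length([40, DATA_SET_LEN])

# Figure 8: Template Set (40), Options Template Set (30, as the figure
# states), Anonymization Records (68), and Data Set (79).
anonymized_length = message_length([40, 30, 68, DATA_SET_LEN])
```

These sums reproduce the 135 and 233 octets given in the two Message Headers. Note that the 233-octet total depends on the Options Template Set length of 30 shown in Figure 8, whereas Figure 5 gives that Set a Length of 26; with 26 the total would be 229.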
This document provides guidelines for exporting metadata about anonymized data in IPFIX, or storing metadata about anonymized data in IPFIX Files. It is not intended as a general statement on the applicability of specific flow data anonymization techniques. Exporters or publishers of anonymized data must take care that the applied anonymization technique is appropriate for the data source, the purpose, and the risk of deanonymization of a given application.
本文档提供了在IPFIX中导出有关匿名化数据的元数据,或在IPFIX文件中存储有关匿名化数据的元数据的指南。它并非关于特定流数据匿名化技术适用性的一般性声明。匿名化数据的导出者或发布者必须注意,所应用的匿名化技术应与给定应用的数据源、目的和去匿名化风险相适应。
Research in anonymization techniques, and techniques for deanonymization, is ongoing, and currently "safe" anonymization techniques may be rendered unsafe by future developments.
匿名化技术和去匿名化技术的研究仍在进行中,目前“安全”的匿名化技术可能因未来的发展而变得不安全。
We note specifically that anonymization is not a replacement for encryption for confidentiality. It is only appropriate for protecting identifying information in data to be used for purposes in which the protected data is irrelevant. Confidentiality in export is best served by using TLS [RFC5246] or Datagram Transport Layer Security (DTLS) [RFC4347] as in the Security Considerations section of [RFC5101], and in long-term storage by implementation-specific protection applied as in the Security Considerations section of [RFC5655]. Indeed, confidentiality and anonymization are not mutually exclusive, as encryption for confidentiality may be applied to anonymized data export or storage, as well, when the anonymized data is not intended for public release.
我们特别注意到,匿名化并不能代替加密来保密。它仅适用于保护数据中的标识信息,以用于与受保护数据无关的目的。导出中的机密性最好通过使用TLS[RFC5246]或数据报传输层安全性(DTLS)[RFC4347]来实现,如[RFC5101]的安全注意事项部分所述,并通过在[RFC5655]的安全注意事项部分所应用的实现特定保护来实现。事实上,保密性和匿名性并不相互排斥,因为当匿名数据不打算公开发布时,保密性加密也可应用于匿名数据导出或存储。
We note as well that care should be taken even with well-anonymized data, and anonymized data should still be treated as privacy sensitive. Anonymization reduces the risk of misuse, but is not a complete solution to the problem of protecting end-user privacy in network flow trace analysis.
我们还注意到,即使是匿名数据也应谨慎处理,匿名数据仍应视为隐私敏感数据。匿名化降低了误用的风险,但并不能完全解决网络流跟踪分析中保护最终用户隐私的问题。
When using pseudonymization techniques that have a mutable mapping, there is an inherent trade-off in the stability of the map between long-term comparability and security of the dataset against deanonymization. In general, deanonymization attacks are more effective given more information, so the longer a given mapping is valid, the more information can be applied to deanonymization. The specific details of this are technique-dependent and therefore out of the scope of this document.
当使用具有可变映射的假名化技术时,映射的稳定性需要在数据集的长期可比性与抵御去匿名化的安全性之间进行权衡。一般来说,可用信息越多,去匿名化攻击就越有效,因此给定映射有效的时间越长,可用于去匿名化的信息就越多。具体细节取决于技术,因此不在本文档范围内。
When releasing anonymized data, publishers need to ensure that data that could be used in deanonymization is not leaked through a side channel. The entire workflow (hardware, software, operational policies and procedures, etc.) for handling anonymized data must be evaluated for risk of data leakage. While most of these possible side channels are out of scope for this document, guidelines for reducing the risk of information leakage specific to the IPFIX export protocol are provided in Section 7.2.
发布匿名数据时,发布者需要确保可以用于非匿名化的数据不会通过旁道泄漏。必须评估处理匿名数据的整个工作流程(硬件、软件、操作策略和程序等)的数据泄漏风险。虽然这些可能的侧通道中的大多数超出了本文件的范围,但第7.2节提供了降低IPFIX导出协议特定信息泄漏风险的指南。
Note also that the Security Considerations section of [RFC5101] applies to the export of anonymized data, and the Security Considerations section of [RFC5655] applies to the storage of anonymized data and to the publication of anonymized traces.
还请注意,[RFC5101]的安全注意事项部分也适用于匿名数据的导出,[RFC5655]的安全注意事项部分适用于匿名数据的存储或匿名跟踪的发布。
This document specifies the creation of several new IPFIX Information Elements in the IPFIX Information Element registry available from the IANA site (http://www.iana.org), as defined in Section 6.2. IANA has assigned the following Information Element numbers for their respective Information Elements as specified below:
本文档指定在IANA站点提供的IPFIX信息元素注册表中创建几个新的IPFIX信息元素(http://www.iana.org),定义见第6.2节。IANA为其各自的信息元素分配了以下信息元素编号,具体如下:
o Information Element number 285 for the anonymizationFlags Information Element.
o 匿名标志信息元素的信息元素编号285。
o Information Element number 286 for the anonymizationTechnique Information Element.
o 匿名技术信息元素的信息元素编号286。
o Information Element number 287 for the informationElementIndex Information Element.
o 信息元素索引信息元素的信息元素编号287。
We thank Paul Aitken and John McHugh for their comments and insight, and Carsten Schmoll, Benoit Claise, Lothar Braun, Dan Romascanu, Stewart Bryant, and Sean Turner for their reviews. Special thanks to the FP7 PRISM and DEMONS projects for their material support of this work.
我们感谢Paul Aitken和John McHugh的评论和见解,感谢Carsten Schmoll、Benoit Claise、Lothar Braun、Dan Romascanu、Stewart Bryant和Sean Turner的评论。特别感谢FP7 PRISM和DEMONS项目对这项工作的物质支持。
[RFC5101] Claise, B., "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information", RFC 5101, January 2008.
[RFC5101]Claise,B.,“用于交换IP流量信息的IP流量信息导出(IPFIX)协议规范”,RFC 5101,2008年1月。
[RFC5102] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J. Meyer, "Information Model for IP Flow Information Export", RFC 5102, January 2008.
[RFC5102]Quittek,J.,Bryant,S.,Claise,B.,Aitken,P.,和J.Meyer,“IP流信息导出的信息模型”,RFC 5102,2008年1月。
[RFC5103] Trammell, B. and E. Boschi, "Bidirectional Flow Export Using IP Flow Information Export (IPFIX)", RFC 5103, January 2008.
[RFC5103]Trammell,B.和E.Boschi,“使用IP流量信息导出(IPFIX)的双向流量导出”,RFC 5103,2008年1月。
[RFC5655] Trammell, B., Boschi, E., Mark, L., Zseby, T., and A. Wagner, "Specification of the IP Flow Information Export (IPFIX) File Format", RFC 5655, October 2009.
[RFC5655]Trammell,B.,Boschi,E.,Mark,L.,Zseby,T.,和A.Wagner,“IP流信息导出(IPFIX)文件格式规范”,RFC 5655,2009年10月。
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2119]Bradner,S.,“RFC中用于表示需求水平的关键词”,BCP 14,RFC 2119,1997年3月。
[RFC5735] Cotton, M. and L. Vegoda, "Special Use IPv4 Addresses", BCP 153, RFC 5735, January 2010.
[RFC5735]Cotton,M.和L.Vegoda,“特殊用途IPv4地址”,BCP 153,RFC 5735,2010年1月。
[RFC5156] Blanchet, M., "Special-Use IPv6 Addresses", RFC 5156, April 2008.
[RFC5156]Blanchet,M.,“特殊用途IPv6地址”,RFC 5156,2008年4月。
[RFC5470] Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek, "Architecture for IP Flow Information Export", RFC 5470, March 2009.
[RFC5470]Sadasivan,G.,Brownlee,N.,Claise,B.,和J.Quittek,“IP流信息导出架构”,RFC 5470,2009年3月。
[RFC5472] Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP Flow Information Export (IPFIX) Applicability", RFC 5472, March 2009.
[RFC5472]Zseby,T.,Boschi,E.,Brownlee,N.,和B.Claise,“IP流信息导出(IPFIX)适用性”,RFC 5472,2009年3月。
[RFC6183] Kobayashi, A., Claise, B., Muenz, G., and K. Ishibashi, "IP Flow Information Export (IPFIX) Mediation: Framework", RFC 6183, April 2011.
[IPFIX-PERSTREAM] Claise, B., Aitken, P., Johnson, A., and G. Muenz, "IPFIX Export per SCTP Stream", Work in Progress, May 2010.
[RFC5153] Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P. Aitken, "IP Flow Information Export (IPFIX) Implementation Guidelines", RFC 5153, April 2008.
[RFC3917] Quittek, J., Zseby, T., Claise, B., and S. Zander, "Requirements for IP Flow Information Export (IPFIX)", RFC 3917, October 2004.
[RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture", RFC 4291, February 2006.
[RFC4347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer Security", RFC 4347, April 2006.
[RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.
[Bur10] Burkhart, M., Schatzmann, D., Trammell, B., and E. Boschi, "The Role of Network Trace Anonymization Under Attack", ACM Computer Communications Review, vol. 40, no. 1, pp. 6-11, January 2010.
[Mur07] Murdoch, S. and P. Zielinski, "Sampled Traffic Analysis by Internet-Exchange-Level Adversaries", Proceedings of the 7th Workshop on Privacy Enhancing Technologies, Ottawa, Canada, June 2007.
Authors' Addresses
Elisa Boschi
Swiss Federal Institute of Technology Zurich
Gloriastrasse 35
8092 Zurich
Switzerland
EMail: boschie@tik.ee.ethz.ch
Brian Trammell
Swiss Federal Institute of Technology Zurich
Gloriastrasse 35
8092 Zurich
Switzerland
Phone: +41 44 632 70 13
EMail: trammell@tik.ee.ethz.ch