Network Working Group                                          V. Paxson
Request for Comments: 2330               Lawrence Berkeley National Lab
Category: Informational                                         G. Almes
                                             Advanced Network & Services
                                                              J. Mahdavi
                                                               M. Mathis
                                         Pittsburgh Supercomputer Center
                                                                May 1998
Framework for IP Performance Metrics
1. STATUS OF THIS MEMO

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
2. COPYRIGHT NOTICE

Copyright (C) The Internet Society (1998). All Rights Reserved.
Table of Contents

   1. STATUS OF THIS MEMO
   2. COPYRIGHT NOTICE
   3. INTRODUCTION
   4. CRITERIA FOR IP PERFORMANCE METRICS
   5. TERMINOLOGY FOR PATHS AND CLOUDS
   6. FUNDAMENTAL CONCEPTS
      6.1 Metrics
      6.2 Measurement Methodology
      6.3 Measurements, Uncertainties, and Errors
   7. METRICS AND THE ANALYTICAL FRAMEWORK
   8. EMPIRICALLY SPECIFIED METRICS
   9. TWO FORMS OF COMPOSITION
      9.1 Spatial Composition of Metrics
      9.2 Temporal Composition of Formal Models and Empirical Metrics
   10. ISSUES RELATED TO TIME
      10.1 Clock Issues
      10.2 The Notion of "Wire Time"
   11. SINGLETONS, SAMPLES, AND STATISTICS
      11.1 Methods of Collecting Samples
         11.1.1 Poisson Sampling
         11.1.2 Geometric Sampling
         11.1.3 Generating Poisson Sampling Intervals
      11.2 Self-Consistency
      11.3 Defining Statistical Distributions
      11.4 Testing For Goodness-of-Fit
   12. AVOIDING STOCHASTIC METRICS
   13. PACKETS OF TYPE P
   14. INTERNET ADDRESSES VS. HOSTS
   15. STANDARD-FORMED PACKETS
   16. ACKNOWLEDGEMENTS
   17. SECURITY CONSIDERATIONS
   18. APPENDIX
   19. REFERENCES
   20. AUTHORS' ADDRESSES
   21. FULL COPYRIGHT STATEMENT
3. INTRODUCTION

The purpose of this memo is to define a general framework for particular metrics to be developed by the IETF's IP Performance Metrics effort, begun by the Benchmarking Methodology Working Group (BMWG) of the Operational Requirements Area, and being continued by the IP Performance Metrics Working Group (IPPM) of the Transport Area.
We begin by laying out several criteria for the metrics that we adopt. These criteria are designed to promote an IPPM effort that will maximize an accurate common understanding by Internet users and Internet providers of the performance and reliability both of end-to-end paths through the Internet and of specific 'IP clouds' that comprise portions of those paths.
We next define some Internet vocabulary that will allow us to speak clearly about Internet components such as routers, paths, and clouds.
We then define the fundamental concepts of 'metric' and 'measurement methodology', which allow us to speak clearly about measurement issues. Given these concepts, we proceed to discuss the important issue of measurement uncertainties and errors, and develop a key, somewhat subtle notion of how they relate to the analytical framework shared by many aspects of the Internet engineering discipline. We then introduce the notion of empirically defined metrics, and finish this part of the document with a general discussion of how metrics can be 'composed'.
The remainder of the document deals with a variety of issues related to defining sound metrics and methodologies: how to deal with imperfect clocks; the notion of 'wire time' as distinct from 'host time'; how to aggregate sets of singleton metrics into samples and derive sound statistics from those samples; why it is recommended to avoid thinking about Internet properties in probabilistic terms (such as the probability that a packet is dropped), since these terms often include implicit assumptions about how the network behaves; the utility of defining metrics in terms of packets of a generic type; the benefits of preferring IP addresses to DNS host names; and the notion of 'standard-formed' packets. An appendix discusses the Anderson-Darling test for gauging whether a set of values matches a given statistical distribution, and gives C code for an implementation of the test.
In some sections of the memo, we will surround some commentary text with the brackets {Comment: ... }. We stress that this commentary is only commentary, and is not itself part of the framework document or a proposal of particular metrics. In some cases this commentary will discuss some of the properties of metrics that might be envisioned, but the reader should assume that any such discussion is intended only to shed light on points made in the framework document, and not to suggest any specific metrics.
4. CRITERIA FOR IP PERFORMANCE METRICS

The overarching goal of the IP Performance Metrics effort is to achieve a situation in which users and providers of Internet transport service have an accurate common understanding of the performance and reliability of the Internet component 'clouds' that they use/provide.
To achieve this, performance and reliability metrics for paths through the Internet must be developed. In several IETF meetings criteria for these metrics have been specified:
+ The metrics must be concrete and well-defined,
+ A methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, it should result in the same measurements,
+ The metrics must exhibit no bias for IP clouds implemented with identical technology,
+ The metrics must exhibit understood and fair bias for IP clouds implemented with non-identical technology,
+ The metrics must be useful to users and providers in understanding the performance they experience or provide,
+ The metrics must avoid inducing artificial performance goals.
5. TERMINOLOGY FOR PATHS AND CLOUDS

The following list defines terms that need to be precise in the development of path metrics. We begin with low-level notions of 'host', 'router', and 'link', then proceed to define the notions of 'path', 'IP cloud', and 'exchange' that allow us to segment a path into relevant pieces.
host
   A computer capable of communicating using the Internet protocols; includes "routers".
link
   A single link-level connection between two (or more) hosts; includes leased lines, ethernets, frame relay clouds, etc.
router
   A host which facilitates network-level communication between hosts by forwarding IP packets.
path
   A sequence of the form < h0, l1, h1, ..., ln, hn >, where n >= 0, each hi is a host, each li is a link between h(i-1) and hi, and each of h1 ... h(n-1) is a router. A pair <li, hi> is termed a 'hop'. In an appropriate operational configuration, the links and routers in the path facilitate network-layer communication of packets from h0 to hn. Note that path is a unidirectional concept.
subpath
   Given a path, a subpath is any subsequence of the given path which is itself a path. (Thus, the first and last elements of a subpath are hosts.)
cloud
   An undirected (possibly cyclic) graph whose vertices are routers and whose edges are links that connect pairs of routers. Formally, ethernets, frame relay clouds, and other links that connect more than two routers are modelled as fully-connected meshes of graph edges. Note that to connect to a cloud means to connect to a router of the cloud over a link; this link is not itself part of the cloud.
exchange
   A special case of a link, an exchange directly connects either a host to a cloud or one cloud to another cloud.
cloud subpath
   A subpath of a given path, all of whose hosts are routers of a given cloud.
path digest
   A sequence of the form < h0, e1, C1, ..., en, hn >, where n >= 0, h0 and hn are hosts, each of e1 ... en is an exchange, and each of C1 ... C(n-1) is a cloud subpath.
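The path and hop definitions above can be made concrete as a small data structure. The following Python sketch is illustrative only: the Host and Link classes, the choice to record which hosts a link attaches by host name, and the is_path and hops helpers are hypothetical and not part of this framework.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class Host:
    name: str
    is_router: bool = False


@dataclass(frozen=True)
class Link:
    name: str
    attached: FrozenSet[str]  # names of hosts on this link (two, or more for an ethernet)


Element = Union[Host, Link]


def is_path(seq: List[Element]) -> bool:
    """Check that seq has the form < h0, l1, h1, ..., ln, hn > with n >= 0."""
    if len(seq) % 2 == 0:  # need n + 1 hosts interleaved with n links
        return False
    hosts, links = seq[0::2], seq[1::2]
    if not all(isinstance(h, Host) for h in hosts):
        return False
    if not all(isinstance(lk, Link) for lk in links):
        return False
    # Each li must be a link between h(i-1) and hi.
    for i, lk in enumerate(links, start=1):
        if not {hosts[i - 1].name, hosts[i].name} <= lk.attached:
            return False
    # Every intermediate host h1 ... h(n-1) must be a router.
    return all(h.is_router for h in hosts[1:-1])


def hops(seq: List[Element]) -> List[Tuple[Link, Host]]:
    """Return the hops <li, hi> of a path; the hop count n is len(hops(seq))."""
    return list(zip(seq[1::2], seq[2::2]))
```

For example, with hosts a and b joined through router r1 by links l1 and l2, the sequence [a, l1, r1, l2, b] is a path with two hops, while the same sequence with a non-router in the middle is not.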
6. FUNDAMENTAL CONCEPTS

6.1 Metrics

In the operational Internet, there are several quantities related to the performance and reliability of the Internet whose values we would like to know. When such a quantity is carefully specified, we term the quantity a metric. We anticipate that there will be separate RFCs for each metric (or for each closely related group of metrics).
In some cases, there might be no obvious means to effectively measure the metric; this is allowed, and even understood to be very useful in some cases. It is required, however, that the specification of the metric be as clear as possible about what quantity is being specified. Thus, difficulty in practical measurement is sometimes allowed, but ambiguity in meaning is not.
Each metric will be defined in terms of standard units of measurement. The international metric system will be used, with the following points specifically noted:
+ When a unit is expressed in simple meters (for distance/length) or seconds (for duration), appropriate related units based on thousands or thousandths of acceptable units are acceptable. Thus, distances expressed in kilometers (km) and durations expressed in milliseconds (ms) or microseconds (us) are allowed, but not centimeters (because the prefix is not in terms of thousands or thousandths).
+ When a unit is expressed in a combination of units, appropriate related units based on thousands or thousandths of acceptable units are acceptable, but all such thousands/thousandths must be grouped at the beginning. Thus, kilometers per second (km/s) is allowed, but meters per millisecond is not.
+ The unit of information is the bit.
+ When metric prefixes are used with bits or with combinations including bits, those prefixes will have their metric meaning (powers of 1000), and not the meaning conventional with computer storage (powers of 1024). In any RFC that defines a metric whose units include bits, this convention will be followed and will be repeated to ensure clarity for the reader.
+ When a time is given, it will be expressed in UTC.
Note that these points apply to the specifications for metrics and not, for example, to packet formats, where octets will likely be used in preference to, or in addition to, bits.
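As a small illustration of the bit-prefix rule above, the sketch below converts prefixed bit quantities using decimal multipliers only. The names SI_BIT_PREFIXES, to_bits and octets_to_bits are hypothetical helpers, not something this memo defines.

```python
# Metric (decimal) prefixes only: a kilobit is 1000 bits, never 1024.
SI_BIT_PREFIXES = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def to_bits(value: float, prefix: str = "") -> float:
    """Convert a quantity such as 1.5 Mbit to bits using metric prefixes."""
    try:
        return value * SI_BIT_PREFIXES[prefix]
    except KeyError:
        raise ValueError(f"unsupported prefix {prefix!r}") from None


def octets_to_bits(octets: int) -> int:
    """Packet formats count octets; metrics are specified in bits."""
    return 8 * octets
```

For example, a 1500-octet packet is 12000 bits, so on a 10 Mbit/s (10,000,000 bits/s) link it takes 0.0012 s, or 1.2 ms, to transmit.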
Finally, we note that some metrics may be defined purely in terms of other metrics; such metrics are called 'derived metrics'.
6.2 Measurement Methodology

For a given set of well-defined metrics, a number of distinct measurement methodologies may exist. A partial list includes:
+ Direct measurement of a performance metric using injected test traffic. Example: measurement of the round-trip delay of an IP packet of a given size over a given route at a given time.
+ Projection of a metric from lower-level measurements. Example: given accurate measurements of propagation delay and bandwidth for each step along a path, projection of the complete delay for the path for an IP packet of a given size.
+ Estimation of a constituent metric from a set of more aggregated measurements. Example: given accurate measurements of delay for a given one-hop path for IP packets of different sizes, estimation of propagation delay for the link of that one-hop path.
+ Estimation of a given metric at one time from a set of related metrics at other times. Example: given an accurate measurement of flow capacity at a past time, together with a set of accurate delay measurements for that past time and the current time, and given a model of flow dynamics, estimation of the flow capacity that would be observed at the current time.
This list is by no means exhaustive. The purpose is to point out the variety of measurement techniques.
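The second and third methodologies in the list above can be sketched under a deliberately simple assumption: that a packet of P bits crossing a hop sees a fixed propagation delay plus P/B for a link speed B, with no queueing. Under that assumption, a least-squares line through delay-versus-size measurements on a one-hop path has the propagation delay as its intercept (which would also absorb any fixed per-packet latency) and 1/B as its slope. The functions fit_one_hop and project_path_delay are hypothetical illustrations, not methods defined by this memo.

```python
from typing import List, Tuple


def fit_one_hop(samples: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Estimate (propagation_delay_s, bandwidth_bps) for a one-hop path.

    samples: (packet_size_octets, measured_delay_s) pairs, assumed to follow
    delay = propagation + bits / bandwidth (no queueing).
    """
    xs = [8.0 * size for size, _ in samples]  # packet sizes in bits
    ys = [delay for _, delay in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct packet sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # seconds per bit, i.e. 1 / bandwidth
    intercept = mean_y - slope * mean_x  # delay of a zero-length packet
    return intercept, 1.0 / slope


def project_path_delay(hops: List[Tuple[float, float]], size_octets: int) -> float:
    """Project end-to-end delay from per-hop (propagation_s, bandwidth_bps) values."""
    bits = 8 * size_octets
    return sum(prop + bits / bw for prop, bw in hops)
```

With noise-free samples of 100, 500 and 1500 octets taking 10.8 ms, 14 ms and 22 ms, the fit recovers a 10 ms propagation delay and a 1 Mbit/s link; real measurements would include queueing and timing error, so the fit would only approximate these values.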
When a given metric is specified, a given measurement approach might be noted and discussed. That approach, however, is not formally part of the specification.
A methodology for a metric should have the property that it is repeatable: if the methodology is used multiple times under identical conditions, it should result in consistent measurements.
Backing off a little from the word 'identical' in the previous paragraph, we could more accurately use the word 'continuity' to describe a property of a given methodology: a methodology for a given metric exhibits continuity if, for small variations in conditions, it results in small variations in the resulting measurements. Slightly more precisely, for every positive epsilon, there exists a positive delta, such that if two sets of conditions are within delta of each other, then the resulting measurements will be within epsilon of each other. At this point, this should be taken as a heuristic driving our intuition about one kind of robustness property rather than as a precise notion.
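Written out, the heuristic has the familiar epsilon-delta shape. Here c_1 and c_2 stand for two sets of conditions, d(c_1, c_2) for some measure of how far apart they are, and m(c) for the measurement the methodology yields under conditions c. The distance d is an assumption of this sketch; the memo deliberately does not define one.

```latex
\forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad
  d(c_1, c_2) < \delta \;\Longrightarrow\; \lvert m(c_1) - m(c_2) \rvert < \varepsilon
```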
A metric that has at least one methodology that exhibits continuity is said itself to exhibit continuity.
Note that some metrics, such as hop-count along a path, are integer-valued and therefore cannot exhibit continuity in quite the sense given above.
Note further that, in practice, it may not be practical to know (or be able to quantify) the conditions relevant to a measurement at a given time. For example, since the instantaneous load (in packets to be served) at a given router in a high-speed wide-area network can vary widely over relatively brief periods and will be very hard for an external observer to quantify, various statistics of a given metric may be more repeatable, or may better exhibit continuity. In that case those particular statistics should be specified when the metric is specified.
Finally, some measurement methodologies may be 'conservative' in the sense that the act of measurement does not modify, or only slightly modifies, the value of the performance metric the methodology attempts to measure. {Comment: for example, in a wide-area high-speed network under modest load, a test using several small 'ping' packets to measure delay would likely not interfere (much) with the delay properties of that network as observed by others. The corresponding statement about tests using a large flow to measure flow capacity would likely not hold, since the test flow itself consumes much of the capacity being measured.}
6.3 Measurements, Uncertainties, and Errors

Even the very best measurement methodologies for the most well-behaved metrics will exhibit errors. Those who develop such measurement methodologies, however, should strive to:
+ minimize their uncertainties/errors,
+ understand and document the sources of uncertainty/error, and
+ quantify the amounts of uncertainty/error.
For example, when developing a method for measuring delay, understand how any errors in your clocks introduce errors into your delay measurement, and quantify this effect as well as you can. In some cases, this will result in a requirement that a clock be at least up to a certain quality if it is to be used to make a certain measurement.
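To make the clock example above concrete: if the source stamps a packet with its clock and the destination stamps its arrival with a different clock, the measured one-way delay equals the true delay plus the difference of the two clocks' errors, so its error is at most the sum of their magnitudes. The sketch below assumes known bounds on each clock's offset from true time and counts each clock's resolution as one further tick of possible error, which is a conservative choice; the function names are hypothetical.

```python
def one_way_delay(send_ts: float, recv_ts: float) -> float:
    """Delay from the source's send timestamp and the destination's receive timestamp."""
    return recv_ts - send_ts


def delay_error_bound(src_offset: float, dst_offset: float,
                      src_resolution: float, dst_resolution: float) -> float:
    """Conservative bound (seconds) on the error of one_way_delay.

    src_offset / dst_offset: bounds on each clock's offset from true time.
    src_resolution / dst_resolution: each clock's tick size; every timestamp
    is assumed to be off by at most one tick.
    measured = true + (dst_error - src_error), so the error magnitudes add.
    """
    return abs(src_offset) + abs(dst_offset) + src_resolution + dst_resolution
```

With offsets bounded by 1 ms and 2 ms and 1 us clock resolution, a measured delay of about 50 ms carries an uncertainty of about 3 ms (roughly 6% of the value), which may be too large for some metrics and so implies a minimum clock quality.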
As a second example, consider the timing error due to measurement overheads within the computer making the measurement, as opposed to delays due to the Internet component being measured. The former is a measurement error, while the latter reflects the metric of interest. Note that one technique that can help avoid this overhead is the use of a packet filter/sniffer, running on a separate computer that records network packets and timestamps them accurately (see the discussion of 'wire time' below). The resulting trace can then be analyzed to assess the test traffic, minimizing the effect of measurement host delays, or at least allowing those delays to be accounted for. We note that this technique may prove beneficial even if the packet filter/sniffer runs on the same machine, because such measurements generally provide 'kernel-level' timestamping as opposed to less-accurate 'application-level' timestamping.
Finally, we note that derived metrics (defined above) or metrics that exhibit spatial or temporal composition (defined below) offer particular occasion for the analysis of measurement uncertainties, namely how the uncertainties propagate (conceptually) due to the derivation or composition.
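For a derived metric formed as a sum, such as a path delay built from subpath delays, two standard rules from general measurement practice describe how component uncertainties propagate: if the component errors can all line up, their bounds add; if they are independent and expressed as standard uncertainties, they combine as a root-sum-square. These rules are not prescribed by this memo, and the function names are hypothetical.

```python
import math
from typing import Sequence


def worst_case_uncertainty(component_bounds: Sequence[float]) -> float:
    """Bound on the error of a sum when every component's error may share a sign."""
    return sum(abs(u) for u in component_bounds)


def independent_uncertainty(component_std: Sequence[float]) -> float:
    """Standard uncertainty of a sum of independent components (root-sum-square)."""
    return math.sqrt(sum(u * u for u in component_std))
```

For two subpaths with uncertainties of 3 ms and 4 ms, the worst-case bound on the composed path delay is 7 ms, while the independent-error estimate is 5 ms.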
7. METRICS AND THE ANALYTICAL FRAMEWORK

As the Internet has evolved from the early packet-switching studies of the 1960s, the Internet engineering community has evolved a common analytical framework of concepts. This analytical framework, or A-frame, used by designers and implementers of protocols, by those involved in measurement, and by those who study computer network performance using the tools of simulation and analysis, is of great advantage to our work. A major objective here is to generate network characterizations that are consistent in both analytical and practical settings, since this will maximize the chances that non-empirical network study can be better correlated with, and used to further our understanding of, real network behavior.
Whenever possible, therefore, we would like to develop and leverage off of the A-frame. Thus, whenever a metric to be specified is understood to be closely related to concepts within the A-frame, we will attempt to specify the metric in the A-frame's terms. In such a specification we will develop the A-frame by precisely defining the concepts needed for the metric, then leverage off of the A-frame by defining the metric in terms of those concepts.
Such a metric will be called an 'analytically specified metric' or, more simply, an analytical metric.
{Comment: Examples of such analytical metrics might include:
propagation time of a link
   The time, in seconds, required by a single bit to travel from the output port on one Internet host across a single link to another Internet host.
bandwidth of a link for packets of size k
   The capacity, in bits/second, where only those bits of the IP packet are counted, for packets of size k bytes.
route
   The path, as defined in Section 5, from A to B at a given time.
hop count of a route
   The value 'n' of the route path. }
Note that we make no a priori list of just what A-frame concepts will emerge in these specifications, but we do encourage their use and urge that they be carefully specified so that, as our set of metrics develops, so will a specified set of A-frame concepts that are technically consistent with each other and consonant with the common understanding of those concepts within the general Internet community.
These A-frame concepts will be intended to abstract from actual Internet components in such a way that:
+ the essential function of the component is retained,
+ properties of the component relevant to the metrics we aim to create are retained,
+ a subset of these component properties is potentially defined as analytical metrics, and
+ those properties of actual Internet components not relevant to defining the metrics we aim to create are dropped.
For example, when considering a router in the context of packet forwarding, we might model the router as a component that receives packets on an input link, queues them on a FIFO packet queue of finite size, employs tail-drop when the packet queue is full, and forwards them on an output link. The transmission speed (in bits/second) of the input and output links, the latency in the router (in seconds), and the maximum size of the packet queue (in bits) are relevant analytical metrics.
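The router abstraction just described can be sketched directly. This Python class is a hypothetical illustration that models only the queueing behaviour (a finite FIFO packet queue whose maximum size is measured in bits, with tail-drop when it is full); it deliberately omits link transmission speeds and router latency, which would be separate parameters of the abstraction.

```python
from collections import deque
from typing import Deque


class TailDropRouter:
    """Router abstraction: a finite FIFO packet queue with tail-drop."""

    def __init__(self, queue_limit_bits: int):
        self.queue_limit_bits = queue_limit_bits  # maximum size of the packet queue, in bits
        self.queue: Deque[int] = deque()          # sizes (in bits) of queued packets
        self.queued_bits = 0
        self.dropped = 0

    def enqueue(self, packet_bits: int) -> bool:
        """Accept a packet from the input link, or tail-drop it if the queue would overflow."""
        if self.queued_bits + packet_bits > self.queue_limit_bits:
            self.dropped += 1  # tail-drop: the arriving packet is discarded
            return False
        self.queue.append(packet_bits)
        self.queued_bits += packet_bits
        return True

    def dequeue(self) -> int:
        """Forward the oldest queued packet onto the output link (FIFO order)."""
        packet_bits = self.queue.popleft()
        self.queued_bits -= packet_bits
        return packet_bits
```

With a 24000-bit queue, two 12000-bit packets fill it, so a third arrival is dropped until one packet has been forwarded.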
In some cases, such analytical metrics used in relation to a router will be very closely related to specific metrics of the performance of Internet paths. For example, an obvious formula (L + P/B) involving the latency in the router (L), the packet size (in bits) (P), and the transmission speed of the output link (B) might closely approximate the increase in packet delay due to the insertion of a given router along a path.
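As a worked instance of that formula, take illustrative values that are assumptions rather than measurements: router latency L = 50 us, a 1500-octet packet so that P = 12000 bits, and an output link of B = 100 Mbit/s = 10^8 bits/s.

```latex
\Delta d \;\approx\; L + \frac{P}{B}
  \;=\; 50\ \mu\mathrm{s} + \frac{12\,000\ \mathrm{bit}}{10^{8}\ \mathrm{bit/s}}
  \;=\; 50\ \mu\mathrm{s} + 120\ \mu\mathrm{s}
  \;=\; 170\ \mu\mathrm{s}
```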
We stress, however, that well-chosen and well-specified A-frame concepts and their analytical metrics will support more general metric creation efforts in less obvious ways.
{Comment: for example, when considering the flow capacity of a path, it may be of real value to be able to model each of the routers along the path as packet forwarders as above. Techniques for estimating the flow capacity of a path might use the maximum packet queue size as a parameter in decidedly non-obvious ways. For example, as the maximum queue size increases, so will the ability of the router to continuously move traffic along an output link despite fluctuations in traffic from an input link. Estimating this increase, however, remains a research topic.}
Note that, when we specify A-frame concepts and analytical metrics, we will inevitably make simplifying assumptions. The key role of these concepts is to abstract the properties of the Internet components relevant to given metrics. Judgement is required to avoid making assumptions that bias the modeling and metric effort toward one kind of design.
{Comment: for example, routers might not use tail-drop, even though tail-drop might be easier to model analytically.}
Finally, note that different elements of the A-frame might well make different simplifying assumptions. For example, the abstraction of a router used to further the definition of path delay might treat the router's packet queue as a single FIFO queue, but the abstraction of a router used to further the definition of the handling of an RSVP-enabled packet might treat the router's packet queue as supporting bounded delay, which is a contradictory assumption. This is not to say that we make contradictory assumptions at the same time, but that two different parts of our work might refine the simpler base concept in two divergent ways for different purposes.
{Comment: in more mathematical terms, we would say that the A-frame taken as a whole need not be consistent; but the set of particular A-frame elements used to define a particular metric must be.}
8. EMPIRICALLY SPECIFIED METRICS

There are useful performance and reliability metrics that do not fit so neatly into the A-frame, usually because the A-frame lacks the detail or power for dealing with them. For example, "the best flow capacity achievable along a path using an RFC-2001-compliant TCP" would be valuable to measure, but we have no analytical framework of sufficient richness to allow us to cast that flow capacity as an analytical metric.
These notions can still be well specified by instead describing a reference methodology for measuring them.
Such a metric will be called an 'empirically specified metric', or more simply, an empirical metric.
Such empirical metrics should have three properties:
+ we should have a clear definition for each in terms of Internet components,
+ we should have at least one effective means to measure them, and
+ to the extent possible, we should have a (necessarily incomplete) understanding of the metric in terms of the A-frame so that we can use our measurements to reason about the performance and reliability of A-frame components and of aggregations of A-frame components.
9. TWO FORMS OF COMPOSITION

9.1 Spatial Composition of Metrics

In some cases, it may be realistic and useful to define metrics in such a fashion that they exhibit spatial composition.
By spatial composition, we mean a characteristic of some path metrics, in which the metric as applied to a (complete) path can also be defined for various subpaths, and in which the appropriate A-frame concepts for the metric suggest useful relationships between the metric applied to these various subpaths (including the complete path, the various cloud subpaths of a given path digest, and even single routers along the path). The effectiveness of spatial composition depends:
通过空间组合,我们指的是某些路径度量的一种特性:应用于(完整)路径的度量也可以针对各种子路径来定义,并且该度量的相应A框架概念表明,应用于这些不同子路径(包括完整路径、给定路径摘要的各个云子路径,甚至路径上的单个路由器)的度量之间存在有用的关系。空间组合的有效性取决于:
+ on the usefulness in analysis of these relationships as applied to the relevant A-frame components, and
+ on the practical use of the corresponding relationships as applied to metrics and to measurement methodologies.
+ 这些关系在应用于相关A框架组件时对分析的有用性,以及
+ 相应关系在应用于度量和测量方法时的实际用途。
{Comment: for example, consider some metric for delay of a 100-byte packet across a path P, and consider further a path digest <h0, e1, C1, ..., en, hn> of P. The definition of such a metric might include a conjecture that the delay across P is very nearly the sum of the corresponding metric across the exchanges (ei) and clouds (Ci) of the given path digest. The definition would further include a note on how a corresponding relation applies to relevant A-frame components, both for the path P and for the exchanges and clouds of the path digest.}
{注释:例如,考虑某个度量,用于衡量一个100字节的数据包跨越路径P的延迟,并进一步考虑P的一个路径摘要<h0, e1, C1, ..., en, hn>。此类度量的定义可能包括一个猜想:跨越P的延迟几乎等于给定路径摘要中各交换点(ei)和云(Ci)上相应度量之和。该定义还将进一步包括一个注释,说明对应关系如何应用于相关的A框架组件,既包括路径P,也包括路径摘要中的交换点和云。}
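A minimal Python sketch of the additive conjecture in the comment above, assuming per-exchange and per-cloud delays have already been measured. The function names and delay values are hypothetical; they only illustrate how a composed estimate could be checked against a direct measurement across P.

```python
def composed_delay(exchange_delays, cloud_delays):
    """Additive conjecture: delay across P is close to the sum of the delays
    across every exchange e_i and cloud C_i of the path digest."""
    return sum(exchange_delays) + sum(cloud_delays)


def composition_error(measured_path_delay, exchange_delays, cloud_delays):
    """Direct measurement minus composed estimate. A large magnitude is
    evidence that the conjecture does not hold for this path."""
    return measured_path_delay - composed_delay(exchange_delays, cloud_delays)


# Hypothetical digest <h0, e1, C1, e2, C2, e3, h3>: three exchanges, two clouds.
exchanges = [0.001, 0.002, 0.001]  # seconds, illustrative only
clouds = [0.010, 0.025]            # seconds, illustrative only
estimate = composed_delay(exchanges, clouds)
residual = composition_error(0.041, exchanges, clouds)
```

Here the composed estimate is 39 ms, and a hypothetical direct measurement of 41 ms leaves a 2 ms residual that the definition's "analysis of how the conjecture could be incorrect" would need to account for.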
When the definition of a metric includes a conjecture that the metric across the path is related to the metric across the subpaths of the path, that conjecture constitutes a claim that the metric exhibits spatial composition. The definition should then include:
当度量的定义包括一个假设,即路径上的度量与路径子路径上的度量相关时,该假设构成了该度量呈现空间组成的主张。然后,定义应包括:
+ the specific conjecture applied to the metric,
+ a justification of the practical utility of the composition in terms of making accurate measurements of the metric on the path,
+ a justification of the usefulness of the composition in terms of making analysis of the path using A-frame concepts more effective, and
+ an analysis of how the conjecture could be incorrect.
+ 应用于该度量的具体猜想;
+ 从在路径上准确测量该度量的角度,论证该组合的实际效用;
+ 从使用A框架概念更有效地分析路径的角度,论证该组合的有用性;以及
+ 对该猜想可能不正确之处的分析。
In some cases, it may be realistic and useful to define metrics in such a fashion that they exhibit temporal composition.
在某些情况下,以这样一种方式来定义度量可能是现实和有用的,即它们显示出时间组合。
By temporal composition, we mean a characteristic of some path metrics, in which the metric as applied to a path at a given time T is also defined for various times t0 < t1 < ... < tn < T, and in which the appropriate A-frame concepts for the metric suggest useful relationships between the metric applied at times t0, ..., tn and the metric applied at time T. The effectiveness of temporal composition depends:
所谓时间组合,我们指的是某些路径度量的一种特性:在给定时间T应用于路径的度量,也针对不同时间t0 < t1 < ... < tn < T进行定义,并且该度量的相应A框架概念表明,在时间t0、...、tn应用的度量与在时间T应用的度量之间存在有用的关系。时间组合的有效性取决于:
+ on the usefulness in analysis of these relationships as applied to the relevant A-frame components, and
+ on the practical use of the corresponding relationships as applied to metrics and to measurement methodologies.
+ 这些关系在应用于相关A框架组件时对分析的有用性,以及
+ 相应关系在应用于度量和测量方法时的实际用途。
{Comment: for example, consider a metric for the expected flow capacity across a path P during the five-minute period surrounding the time T, and suppose further that we have the corresponding values for each of the four previous five-minute periods t0, t1, t2, and t3. The definition of such a metric might include a conjecture that the flow capacity at time T can be estimated from a certain kind of extrapolation from the values of t0, ..., t3. The definition would further include a note on how a corresponding relation applies to relevant A-frame components.
{注释:例如,考虑一个度量,用于衡量时间T前后五分钟内路径P上的预期流量容量,并进一步假设我们已有之前四个五分钟时段t0、t1、t2和t3中每一个的对应值。此类度量的定义可能包括一个猜想:时间T的流量容量可以通过对t0、...、t3的值进行某种外推来估计。该定义还将包括一个注释,说明对应关系如何应用于相关的A框架组件。
Note: any (spatial or temporal) compositions involving flow capacity are likely to be subtle, and temporal compositions are generally more subtle than spatial compositions, so the reader should understand that the foregoing example is intentionally naive.}
注:任何涉及流量容量的(空间或时间)组合都可能很微妙,而时间组合通常比空间组合更微妙,因此读者应该理解,前面的示例是有意简化的。}
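Because the comment leaves the "certain kind of extrapolation" unspecified, the Python sketch below simply assumes a least-squares straight line through the four previous five-minute values and evaluates it one period ahead. It is deliberately as naive as the example it illustrates, and `extrapolate_linear` is a hypothetical helper.

```python
def extrapolate_linear(values):
    """Naively predict the value for the next period by fitting a
    least-squares line to equally spaced past values (indices 0..n-1)
    and evaluating that line at index n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two past periods")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)


# Hypothetical flow-capacity values for t0..t3 (arbitrary units).
forecast = extrapolate_linear([10, 12, 14, 16])
```

With a steady upward trend of 2 units per period, the sketch predicts 18 for the period around T; real flow capacity rarely behaves this simply.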
When the definition of a metric includes a conjecture that the metric across the path at a given time T is related to the metric across the path for a set of other times, that conjecture constitutes a claim that the metric exhibits temporal composition. The definition should then include:
当度量的定义包括一个假设,即在给定时间T跨越路径的度量与跨越路径的度量在一组其他时间相关时,该假设构成了该度量呈现时间组成的主张。然后,定义应包括:
+ the specific conjecture applied to the metric,
+ a justification of the practical utility of the composition in terms of making accurate measurements of the metric on the path, and
+ a justification of the usefulness of the composition in terms of making analysis of the path using A-frame concepts more effective.
+ 应用于该度量的具体猜想;
+ 从在路径上准确测量该度量的角度,论证该组合的实际效用;以及
+ 从使用A框架概念更有效地分析路径的角度,论证该组合的有用性。
Measurements of time lie at the heart of many Internet metrics. Because of this, it will often be crucial when designing a methodology for measuring a metric to understand the different types of errors and uncertainties introduced by imperfect clocks. In this section we define terminology for discussing the characteristics of clocks and touch upon related measurement issues which need to be addressed by any sound methodology.
时间测量是许多互联网指标的核心。因此,在设计测量某个指标的方法时,了解由不完美时钟引入的不同类型的误差和不确定性通常是至关重要的。在本节中,我们定义了用于讨论时钟特性的术语,并简要讨论任何可靠的方法都需要解决的相关测量问题。
The Network Time Protocol (NTP; RFC 1305) defines a nomenclature for discussing clock characteristics, which we will also use when appropriate [Mi92]. The main goal of NTP is to provide accurate timekeeping over fairly long time scales, such as minutes to days, while for measurement purposes often what is more important is short-term accuracy, between the beginning of the measurement and the end, or over the course of gathering a body of measurements (a sample). This difference in goals sometimes leads to different definitions of terminology as well, as discussed below.
网络时间协议(NTP;RFC 1305)定义了讨论时钟特性的术语,我们也将在适当时使用[Mi92]。NTP的主要目标是在相当长的时间范围内(如分钟到天)提供准确的计时,而对于测量目的而言,更重要的是在测量开始到结束之间,或在收集测量主体(样本)的过程中,提供短期的准确度。目标的这种差异有时也会导致术语的不同定义,如下所述。
To begin, we define a clock's "offset" at a particular moment as the difference between the time reported by the clock and the "true" time as defined by UTC. If the clock reports a time Tc and the true time is Tt, then the clock's offset is Tc - Tt.
首先,我们将时钟在特定时刻的“偏移”定义为时钟报告的时间与UTC定义的“真实”时间之间的差值。如果时钟报告时间Tc,且真实时间为Tt,则时钟的偏移量为Tc-Tt。
We will refer to a clock as "accurate" at a particular moment if the clock's offset is zero, and more generally a clock's "accuracy" is how close the absolute value of the offset is to zero. For NTP, accuracy also includes a notion of the frequency of the clock; for our purposes, we instead incorporate this notion into that of "skew", because we define accuracy in terms of a single moment in time rather than over an interval of time.
如果时钟的偏移量为零,我们将在特定时刻将时钟称为“准确”,更一般地说,时钟的“准确度”是指偏移量的绝对值接近零的程度。对于NTP,精度还包括时钟频率的概念;出于我们的目的,我们将这一概念纳入了“倾斜”的概念中,因为我们将准确度定义为时间上的单个时刻,而不是时间间隔。
A clock's "skew" at a particular moment is the frequency difference (first derivative of its offset with respect to true time) between the clock and true time.
时钟在特定时刻的“倾斜”是时钟与真实时间之间的频率差(即其偏移量相对于真实时间的一阶导数)。
As noted in RFC 1305, real clocks exhibit some variation in skew. That is, the second derivative of the clock's offset with respect to true time is generally non-zero. In keeping with RFC 1305, we define this quantity as the clock's "drift".
如RFC1305中所述,实际时钟在倾斜方面表现出一些变化。也就是说,时钟偏移量相对于真实时间的二阶导数通常为非零。根据RFC1305,我们将该量定义为时钟的“漂移”。
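Given readings from a clock under test paired with true (UTC) times, offset, skew, and drift as defined above can be estimated with finite differences. This Python sketch assumes such paired readings are available, which in practice requires a trusted reference; the helper names and the simulated clock are hypothetical.

```python
def offsets(clock_times, true_times):
    """Offset at each moment: Tc - Tt."""
    return [tc - tt for tc, tt in zip(clock_times, true_times)]


def derivative(values, times):
    """Finite-difference derivative of `values` with respect to `times`,
    evaluated at interval midpoints. Returns (midpoints, slopes)."""
    mids = [(times[i] + times[i + 1]) / 2 for i in range(len(times) - 1)]
    slopes = [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
              for i in range(len(values) - 1)]
    return mids, slopes


def offset_skew_drift(clock_times, true_times):
    off = offsets(clock_times, true_times)
    skew_t, skew = derivative(off, true_times)  # first derivative of offset
    _, drift = derivative(skew, skew_t)         # second derivative of offset
    return off, skew, drift


# Hypothetical clock: 0.5 s ahead at t = 0 and gaining 1 part in 10,000.
true_times = [0.0, 10.0, 20.0, 30.0]
clock_times = [t + 0.5 + 1e-4 * t for t in true_times]
off, skew, drift = offset_skew_drift(clock_times, true_times)
```

The simulated clock has a constant skew of 1e-4 and therefore (up to rounding) zero drift; a real clock would show small, varying drift values.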
A clock's "resolution" is the smallest unit by which the clock's time is updated. It gives a lower bound on the clock's uncertainty. (Note that clocks can have very fine resolutions and yet be wildly inaccurate.) Resolution is defined in terms of seconds. However, resolution is relative to the clock's reported time and not to true time, so for example a resolution of 10 ms only means that the clock updates its notion of time in 0.01 second increments, not that this is the true amount of time between updates.
时钟的“分辨率”是时钟时间更新的最小单位。它给出了时钟不确定性的下限。(请注意,时钟可以具有非常精细的分辨率,但却非常不准确。)分辨率是以秒为单位定义的。但是,分辨率与时钟的报告时间有关,而与真实时间无关,因此,例如,10 ms的分辨率仅意味着时钟以0.01秒的增量更新其时间概念,而不是更新之间的真实时间量。
{Comment: Systems differ on how an application interface to the clock reports the time on subsequent calls during which the clock has not advanced. Some systems simply return the same unchanged time as given for previous calls. Others may add a small increment to the reported time to maintain monotone-increasing timestamps. For systems that do the latter, we do *not* consider these small increments when defining the clock's resolution. They are instead an impediment to assessing the clock's resolution, since a natural method for doing so is to repeatedly query the clock to determine the smallest non-zero difference in reported times.}
{注释:当时钟尚未前进时,对于应用程序通过时钟接口进行的后续调用会报告什么时间,不同系统的做法有所不同。一些系统只是返回与先前调用相同的、未改变的时间。其他系统可能会在报告的时间上加一个小增量,以保持单调递增的时间戳。对于采用后一种做法的系统,我们在定义时钟分辨率时*不*考虑这些小增量。相反,这些小增量会妨碍对时钟分辨率的评估,因为评估分辨率的一种自然方法是反复查询时钟,以确定报告时间之间的最小非零差值。}
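The "natural method" mentioned in the comment, repeatedly querying the clock and keeping the smallest non-zero difference, can be sketched in Python as follows. The clock is passed in as a callable so the sketch can be exercised against a simulated clock (hypothetical, reporting integer microseconds); as the comment warns, a system that adds small increments to keep timestamps monotone will make this estimate too small.

```python
import time


def estimate_resolution(read_clock=time.time, samples=100_000):
    """Smallest non-zero difference between successive clock readings,
    or None if the clock never advanced during the run."""
    smallest = None
    previous = read_clock()
    for _ in range(samples):
        now = read_clock()
        delta = now - previous
        if delta > 0 and (smallest is None or delta < smallest):
            smallest = delta
        previous = now
    return smallest


# Simulated clock (hypothetical) that reports integer microseconds in 10 ms steps.
readings = iter([0, 0, 10_000, 10_000, 10_000, 20_000, 30_000, 30_000, 30_000])
simulated = estimate_resolution(lambda: next(readings), samples=8)
```

On a real host, `estimate_resolution()` can be compared with the resolution the platform claims, for example `time.get_clock_info("time").resolution` in Python.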
It is expected that a clock's resolution changes only rarely (for example, due to a hardware upgrade).
预计时钟的分辨率变化很少(例如,由于硬件升级)。
There are a number of interesting metrics for which some natural measurement methodologies involve comparing times reported by two different clocks. An example is one-way packet delay [AK97]. Here, the time required for a packet to travel through the network is measured by comparing the time reported by a clock at one end of the packet's path, corresponding to when the packet first entered the network, with the time reported by a clock at the other end of the path, corresponding to when the packet finished traversing the network.
有许多有趣的指标,一些自然测量方法涉及比较两个不同时钟报告的时间。一个例子是单向分组延迟[AK97]。这里,通过比较分组路径一端的时钟报告的时间(对应于分组首次进入网络的时间)与路径另一端的时钟报告的时间(对应于分组完成穿越网络的时间)来测量分组穿越网络所需的时间。
We are thus also interested in terminology for describing how two clocks C1 and C2 compare. To do so, we introduce terms related to those above in which the notion of "true time" is replaced by the time as reported by clock C1. For example, clock C2's offset relative to C1 at a particular moment is Tc2 - Tc1, the instantaneous difference in time reported by C2 and C1. To disambiguate between the use of the terms to compare two clocks versus the use of the terms to compare to true time, we will in the former case use the phrase "relative". So the offset defined earlier in this paragraph is the "relative offset" between C2 and C1.
因此,我们也对描述两个时钟C1和C2如何比较的术语感兴趣。为此,我们引入了与上述术语相关的术语,其中“真实时间”的概念由时钟C1报告的时间代替。例如,时钟C2在特定时刻相对于C1的偏移量为Tc2-Tc1,即C2和C1报告的瞬时时间差。为了消除使用术语比较两个时钟与使用术语比较真实时间之间的歧义,我们将在前一种情况下使用短语“相对”。因此,本段前面定义的偏移量是C2和C1之间的“相对偏移量”。
When comparing clocks, the analog of "resolution" is not "relative resolution", but instead "joint resolution", which is the sum of the resolutions of C1 and C2. The joint resolution then indicates a conservative lower bound on the accuracy of any time intervals computed by subtracting timestamps generated by one clock from those generated by the other.
在比较时钟时,与“分辨率”相对应的概念不是“相对分辨率”,而是“联合分辨率”,即C1和C2的分辨率之和。联合分辨率给出了一个保守的下限,用于衡量通过将一个时钟生成的时间戳与另一个时钟生成的时间戳相减而计算出的任何时间间隔的精度。
If two clocks are "accurate" with respect to one another (their relative offset is zero), we will refer to the pair of clocks as "synchronized". Note that clocks can be highly synchronized yet arbitrarily inaccurate in terms of how well they tell true time. This point is important because for many Internet measurements, synchronization between two clocks is more important than the accuracy of the clocks. The same is somewhat true of skew: as long as the absolute skew is not too great, minimal relative skew is more important, since relative skew can induce systematic trends in packet transit times measured by comparing timestamps produced by the two clocks.
如果两个时钟彼此“准确”(它们的相对偏移量为零),我们将这对时钟称为“同步”的。请注意,时钟可以高度同步,但就它们反映真实时间的程度而言可以任意不准确。这一点很重要,因为对于许多互联网测量而言,两个时钟之间的同步比时钟的准确度更重要。倾斜在某种程度上也是如此:只要绝对倾斜不是太大,最小化相对倾斜就更为重要,因为相对倾斜会在通过比较两个时钟产生的时间戳而测得的数据包传输时间中引入系统性趋势。
These distinctions arise because for Internet measurement what is often most important are differences in time as computed by comparing the output of two clocks. The process of computing the difference removes any error due to clock inaccuracies with respect to true time; but it is crucial that the differences themselves accurately reflect differences in true time.
产生这些区别是因为对于互联网测量来说,最重要的是通过比较两个时钟的输出计算出的时间差。计算差分的过程消除了由于相对于真实时间的时钟不准确而引起的任何误差;但至关重要的是,差异本身要准确反映真实时间的差异。
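The relative terms and the joint resolution translate directly into code. The Python sketch below uses hypothetical helper names and integer-microsecond timestamps to avoid rounding noise. It also shows why synchronization matters more than accuracy for one-way delay: an offset common to both clocks cancels, while a relative offset appears in full as error.

```python
def relative_offset(tc2, tc1):
    """Offset of clock C2 relative to C1 at the same moment: Tc2 - Tc1."""
    return tc2 - tc1


def joint_resolution(res1, res2):
    """Sum of the two clocks' resolutions: a conservative lower bound on the
    accuracy of intervals computed from one C1 and one C2 timestamp."""
    return res1 + res2


def one_way_delay(send_ts_c1, recv_ts_c2):
    """Receive timestamp (from C2) minus send timestamp (from C1)."""
    return recv_ts_c2 - send_ts_c1


# True times in microseconds (illustrative): packet sent at 1 s, 50 ms in flight.
true_send, true_recv = 1_000_000, 1_050_000

# Both clocks 2 s ahead of true time: inaccurate, but synchronized.
sync_delay = one_way_delay(true_send + 2_000_000, true_recv + 2_000_000)

# C2 a further 3 ms ahead of C1: the relative offset shows up as delay error.
c2_ahead = 2_003_000
offset_delay = one_way_delay(true_send + 2_000_000, true_recv + c2_ahead)
rel = relative_offset(true_send + c2_ahead, true_send + 2_000_000)
```

The synchronized pair reports the true 50 ms despite both clocks being 2 s wrong, whereas a 3 ms relative offset turns into a 53 ms reading.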
Measurement methodologies will often begin with the step of assuring that two clocks are synchronized and have minimal skew and drift. {Comment: An effective way to assure these conditions (and also clock accuracy) is by using clocks that derive their notion of time from an external source, rather than only the host computer's clock. (These latter are often subject to large errors.) It is further preferable that the clocks directly derive their time, for example by having immediate access to a GPS (Global Positioning System) unit.}
测量方法通常从确保两个时钟同步、并具有最小的倾斜和漂移这一步骤开始。{注释:确保这些条件(以及时钟精度)的一种有效方法是使用从外部来源获得时间概念的时钟,而不是仅使用主机自身的时钟(后者通常会有较大误差)。更可取的是让时钟直接获得其时间,例如可以直接访问GPS(全球定位系统)单元。}
Two important concerns arise if the clocks indirectly derive their time using a network time synchronization protocol such as NTP:
如果时钟使用网络时间同步协议(如NTP)间接获得其时间,则会出现两个重要问题:
+ First, NTP's accuracy depends in part on the properties (particularly delay) of the Internet paths used by the NTP peers, and these might be exactly the properties that we wish to measure, so it would be unsound to use NTP to calibrate such measurements.
+ Second, NTP focuses on clock accuracy, which can come at the expense of short-term clock skew and drift. For example, when a host's clock is indirectly synchronized to a time source, if the synchronization intervals occur infrequently, then the host will sometimes be faced with the problem of how to reconcile its current, incorrect time, Ti, with a considerably different, more accurate time it has just learned, Ta. Two general ways in which this is done are to either immediately set the current time to Ta, or to adjust the local clock's update frequency (hence, its skew) so that at some point in the future the local time Ti' will agree with the more accurate time Ta'. The first mechanism introduces discontinuities and can also violate common assumptions that timestamps are monotone increasing. If the host's clock is set backward in time, sometimes this can be easily detected. If the clock is set forward in time, this can be harder to detect. The skew induced by the second mechanism can lead to considerable inaccuracies when computing differences in time, as discussed above.
+ 首先,NTP的准确性部分取决于NTP对等方所使用的互联网路径的属性(特别是延迟),而这些可能正是我们希望测量的属性,因此使用NTP来校准此类测量是不合理的。
+ 其次,NTP关注时钟的准确度,这可能以短期的时钟倾斜和漂移为代价。例如,当主机的时钟间接地与时间源同步时,如果同步不频繁发生,那么主机有时将面临如何将其当前不正确的时间Ti与刚刚获得的、相差很大但更准确的时间Ta相协调的问题。通常有两种做法:一是立即将当前时间设置为Ta,二是调整本地时钟的更新频率(因而也就是其倾斜),使得在将来的某个时刻,本地时间Ti'将与更准确的时间Ta'一致。第一种机制会引入不连续性,还可能违反时间戳单调递增这一常见假设。如果主机的时钟被向后调整,有时很容易检测到;如果时钟被向前调整,则可能更难检测到。如上所述,第二种机制所引起的倾斜在计算时间差时可能导致相当大的误差。
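Detecting a backward step amounts to scanning for timestamps that decrease, as in the Python sketch below (a hypothetical helper, assuming timestamps are recorded in the order the events occurred). A forward step leaves no such signature: it looks like a long gap between readings, which is why it is harder to detect.

```python
def backward_steps(timestamps):
    """Indices i at which timestamps[i] < timestamps[i - 1], i.e. points
    where successive readings violate the monotone-increase assumption."""
    return [i for i in range(1, len(timestamps))
            if timestamps[i] < timestamps[i - 1]]


# Illustrative readings: the clock is set backward between the third and fourth.
stepped = [10, 20, 30, 22, 32, 42]
```

`backward_steps(stepped)` flags index 3; a sequence that is merely stretched by a forward step, such as `[10, 20, 90, 100]`, is not flagged.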
To illustrate why skew is a crucial concern, consider samples of one-way delays between two Internet hosts made at one minute intervals. The true transmission delay between the hosts might plausibly be on the order of 50 ms for a transcontinental path. If the skew between the two clocks is 0.01%, that is, 1 part in 10,000, then after 10 minutes of observation the error introduced into the measurement is 60 ms. Unless corrected, this error is enough to completely wipe out any accuracy in the transmission delay measurement. Finally, we note that assessing skew errors between unsynchronized network clocks is an open research area. (See [Pa97] for a discussion of detecting and compensating for these sorts of errors.) This shortcoming makes use of a solid, independent clock source such as GPS especially desirable.
为了说明为什么倾斜是一个至关重要的问题,考虑以一分钟为间隔对两台互联网主机之间的单向延迟进行采样。对于横贯大陆的路径,主机之间的真实传输延迟可能大约为50毫秒。如果两个时钟之间的倾斜为0.01%,即万分之一,那么观察10分钟后,引入测量的误差为60毫秒。除非加以校正,否则该误差足以完全抵消传输延迟测量的任何准确性。最后,我们注意到,评估非同步网络时钟之间的倾斜误差是一个尚待研究的领域。(有关检测和补偿此类误差的讨论,请参见[Pa97]。)这一不足使得使用可靠、独立的时钟源(如GPS)尤其可取。
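The arithmetic in this example is easy to reproduce. Assuming a constant relative skew (no drift), the error in a two-clock time difference grows linearly with elapsed time; the Python sketch below tabulates it for one-minute samples.

```python
def skew_error(relative_skew, elapsed_seconds):
    """Error accumulated after elapsed_seconds at constant relative skew."""
    return relative_skew * elapsed_seconds


SKEW = 1e-4  # 0.01%, i.e. 1 part in 10,000
# Error in milliseconds for the k-th one-minute sample, k = 0..10.
errors_ms = [skew_error(SKEW, 60 * k) * 1000 for k in range(11)]
```

Each additional minute adds about 6 ms, and `errors_ms[10]` is about 60 ms, already larger than the assumed 50 ms transcontinental delay.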
Internet measurement is often complicated by the use of Internet hosts themselves to perform the measurement. These hosts can introduce delays, bottlenecks, and the like that are due to hardware or operating system effects and have nothing to do with the network behavior we would like to measure. This problem is particularly acute when timestamping of network events occurs at the application level.
由于使用互联网主机本身进行测量,互联网测量往往变得复杂。这些主机可能会引入延迟、瓶颈等,这些都是由于硬件或操作系统的影响造成的,与我们想要测量的网络行为无关。当网络事件的时间戳发生在应用程序级别时,这个问题尤其严重。
In order to provide a general way of talking about these effects, we introduce two notions of "wire time". These notions are only defined in terms of an Internet host H observing an Internet link L at a particular location:
为了提供讨论这些效应的一般方法,我们引入了“连线时间”的两个概念。这些概念仅根据在特定位置观察因特网链路L的因特网主机H来定义:
+ For a given packet P, the 'wire arrival time' of P at H on L is the first time T at which any bit of P has appeared at H's observational position on L.
+ 对于给定的数据包P,P在L上的H处的“有线到达时间”是P的任何比特在L上的H的观测位置出现的第一时间T。
+ For a given packet P, the 'wire exit time' of P at H on L is the first time T at which all the bits of P have appeared at H's observational position on L.
+ 对于给定的数据包P,P在L上的H处的“连线退出时间”是P的所有比特在L上的H的观测位置出现的第一时间T。
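For a packet seen at a single observation point, the gap between the two wire times is roughly the time the link takes to carry the packet's bits past that point. The Python sketch below assumes a single serial channel running at a constant bit rate, which is a simplification (see the discussion of multi-channel links that follows); the function name is hypothetical.

```python
def approx_wire_exit_time(wire_arrival_time, packet_bytes, link_bps):
    """Wire arrival time (first bit seen) plus the time needed for the rest
    of the packet's bits to pass the observation point at link_bps."""
    return wire_arrival_time + packet_bytes * 8 / link_bps


# A 1500-byte packet whose first bit is seen at t = 0 on a 10 Mb/s link.
exit_time = approx_wire_exit_time(0.0, 1500, 10_000_000)
```

Under these assumptions the wire exit time is about 1.2 ms after the wire arrival time, a difference that matters when delays of a few milliseconds are being measured.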
Note that intrinsic to the definition is the notion of where on the link we are observing. This distinction is important because for large-latency links, we may obtain very different times depending on exactly where we are observing the link. We could allow the observational position to be an arbitrary location along the link; however, we define it to be in terms of an Internet host because we anticipate in practice that, for IPPM metrics, all such timing will be constrained to be performed by Internet hosts, rather than specialized hardware devices that might be able to monitor a link at locations where a host cannot. This definition also takes care of the problem of links that are comprised of multiple physical channels. Because these multiple channels are not visible at the IP layer, they cannot be individually observed in terms of the above definitions.
请注意,定义的内在含义是我们正在观察链接的位置。这一区别很重要,因为对于大延迟链路,我们可能会获得非常不同的时间,具体取决于我们观察链路的位置。我们可以允许观测位置是沿着链路的任意位置;然而,我们将其定义为互联网主机,因为我们在实践中预计,对于IPPM度量,所有此类计时将被限制由互联网主机执行,而不是由能够在主机无法监控的位置监控链路的专用硬件设备执行。此定义还考虑了由多个物理通道组成的链路的问题。由于这些多个通道在IP层不可见,因此无法根据上述定义单独观察它们。
It is possible, though one hopes uncommon, that a packet P might make multiple trips over a particular link L, due to a forwarding loop. These trips might even overlap, depending on the link technology. Whenever this occurs, we define a separate wire time associated with each instance of P seen at H's position on the link. This definition is worth making because it serves as a reminder that notions like *the* unique time a packet passes a point in the Internet are inherently slippery.
尽管人们希望不常见,但由于转发循环,分组P可能在特定链路L上进行多次旅行。这些行程甚至可能重叠,这取决于链路技术。每当发生这种情况时,我们都会定义一个单独的连线时间,该时间与在链接上H的位置看到的每个P实例相关联。这个定义值得一提,因为它提醒我们,像“数据包在互联网上通过某个点的唯一时间”这样的概念本质上是不稳定的。
The term wire time has historically been used to loosely denote the time at which a packet appeared on a link, without exactly specifying whether this refers to the first bit, the last bit, or some other consideration. This informal definition is generally already very useful, as it is usually used to make a distinction between when the packet's propagation delays begin and cease to be due to the network rather than the endpoint hosts.
术语wire time过去被用来松散地表示数据包出现在链路上的时间,而没有确切地指定这是指第一位、最后一位还是其他一些考虑因素。这种非正式的定义通常已经非常有用,因为它通常用于区分数据包的传播延迟何时开始和何时停止是由于网络而不是端点主机造成的。
When appropriate, metrics should be defined in terms of wire times rather than host endpoint times, so that the metric's definition highlights the issue of separating delays due to the host from those due to the network.
在适当的情况下,应根据连线时间而不是主机端点时间来定义度量,以便度量的定义突出区分主机造成的延迟与网络造成的延迟这一问题。
We note that one potential difficulty when dealing with wire times concerns IP fragments. It may be the case that, due to fragmentation, only a portion of a particular packet passes by H's location. Such fragments are themselves legitimate packets and have well-defined wire times associated with them; but the larger IP packet corresponding to their aggregate may not.
我们注意到,在处理连线时间时,一个潜在的困难与IP分片有关。可能的情况是,由于分片,只有特定数据包的一部分经过H的位置。这些分片本身就是合法的数据包,并且具有与之相关联的、定义明确的连线时间;但是,与它们的聚合相对应的较大IP数据包可能没有。
We also note that these notions have not, to our knowledge, been previously defined in exact terms for Internet traffic. Consequently, we may find with experience that these definitions require some adjustment in the future.
我们还注意到,据我们所知,这些概念以前没有被准确地定义为互联网流量。因此,根据经验,我们可能会发现这些定义在将来需要进行一些调整。
{Comment: It can sometimes be difficult to measure wire times. One technique is to use a packet filter to monitor traffic on a link. The architecture of these filters often attempts to associate with each packet a timestamp as close to the wire time as possible. We note however that one common source of error is to run the packet filter on one of the endpoint hosts. In this case, it has been observed that some packet filters receive for some packets timestamps corresponding to when the packet was *scheduled* to be injected into the network, rather than when it actually was *sent* out onto the network (wire time). There can be a substantial difference between these two times. A technique for dealing with this problem is to run the packet filter on a separate host that passively monitors the given link. This can be problematic however for some link technologies. See [Pa97] for a discussion of the sorts of errors packet filters can exhibit. Finally, we note that packet filters will often only capture the first fragment of a fragmented IP packet, due to the use of filtering on fields in the IP and transport protocol headers. As we generally desire our measurement methodologies to avoid the complexity of creating fragmented traffic, one strategy for dealing with their presence as detected by a packet filter is to flag that the measured traffic has an unusual form and abandon further analysis of the packet timing.}
{注释:有时很难测量连线时间。一种技术是使用数据包过滤器来监控链路上的流量。这些过滤器的体系结构通常会尝试为每个数据包关联一个尽可能接近连线时间的时间戳。但是,我们注意到,一个常见的误差来源是在某个端点主机上运行数据包过滤器。在这种情况下,已观察到一些数据包过滤器为某些数据包获得的时间戳对应于该数据包被*安排*注入网络的时间,而不是它实际被*发送*到网络上的时间(连线时间)。这两个时间之间可能存在很大差异。解决此问题的一种技术是在一台单独的主机上运行数据包过滤器,以被动方式监视给定链路。但是,对于某些链路技术,这可能会有问题。有关数据包过滤器可能出现的各种误差的讨论,请参见[Pa97]。最后,我们注意到,由于过滤是基于IP和传输协议头中的字段进行的,数据包过滤器通常只捕获分片IP数据包的第一个分片。由于我们通常希望测量方法避免产生分片流量所带来的复杂性,因此,当数据包过滤器检测到分片时,一种处理策略是标记所测量的流量具有异常形式,并放弃对数据包定时的进一步分析。}
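One way to implement "flag and abandon" is to inspect the IPv4 header of each captured packet: a packet is a fragment if its More Fragments flag is set or its fragment offset is non-zero (both live in the 16-bit field at bytes 6 and 7 of the IPv4 header). The Python sketch below assumes raw IPv4 headers are available from the capture; the function names are hypothetical.

```python
import struct

MF_FLAG = 0x2000           # More Fragments bit of the flags/fragment-offset field
FRAG_OFFSET_MASK = 0x1FFF  # 13-bit fragment offset (units of 8 bytes)


def is_fragment(ipv4_header: bytes) -> bool:
    """True if this IPv4 header belongs to a fragment of a larger datagram."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (field,) = struct.unpack("!H", ipv4_header[6:8])
    return bool(field & MF_FLAG) or (field & FRAG_OFFSET_MASK) != 0


def timing_usable(headers) -> bool:
    """False if any captured packet is a fragment, signalling that the
    traffic has an unusual form and timing analysis should be abandoned."""
    return not any(is_fragment(h) for h in headers)
```

Note that the Don't Fragment bit (0x4000) alone does not make a packet a fragment, so ordinary unfragmented traffic passes the check.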
With experience we have found it useful to introduce a separation between three distinct -- yet related -- notions:
根据经验,我们发现在三个不同但相关的概念之间引入分离是有用的:
+ By a 'singleton' metric, we refer to metrics that are, in a sense, atomic. For example, a single instance of "bulk throughput capacity" from one host to another might be defined as a singleton metric, even though the instance involves measuring the timing of a number of Internet packets.
+ By a 'sample' metric, we refer to metrics derived from a given singleton metric by taking a number of distinct instances together. For example, we might define a sample metric of one-way delays from one host to another as an hour's worth of measurements, each made at Poisson intervals with a mean spacing of one second.
+ 所谓“单例”度量,我们指的是在某种意义上具有原子性的度量。例如,从一台主机到另一台主机的“批量吞吐容量”的单个实例可以被定义为单例度量,即使该实例涉及测量多个互联网数据包的定时。
+ 所谓“样本”度量,我们指的是通过将多个不同实例放在一起,从给定的单例度量派生出来的度量。例如,我们可以将从一台主机到另一台主机的单向延迟的样本度量定义为一小时的测量值,每次测量以泊松间隔进行,平均间隔为1秒。
+ By a 'statistical' metric, we refer to metrics derived from a given sample metric by computing some statistic of the values defined by the singleton metric on the sample. For example, the mean of all the one-way delay values on the sample given above might be defined as a statistical metric.
+ 通过“统计”度量,我们指的是从给定样本度量派生的度量,通过计算样本上的单例度量定义的值的一些统计信息。例如,上面给出的样本上所有单向延迟值的平均值可以定义为统计度量。
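The three notions compose naturally in code: a function producing one singleton, a function that gathers singletons at Poisson times into a sample, and a statistic computed over the sample. The Python sketch below follows the one-way delay example above; the `measure` callback and its constant 50 ms delay are hypothetical stand-ins for a real measurement.

```python
import random
import statistics


def singleton_delay(send_time, recv_time):
    """Singleton: one one-way delay value."""
    return recv_time - send_time


def poisson_sample(measure, duration, mean_spacing, rng):
    """Sample: singletons measured at Poisson times within [0, duration)."""
    values = []
    t = rng.expovariate(1 / mean_spacing)
    while t < duration:
        values.append(measure(t))
        t += rng.expovariate(1 / mean_spacing)
    return values


def mean_delay(sample):
    """Statistic: the mean of the singleton values in the sample."""
    return statistics.mean(sample)


# Hypothetical measurement: every packet sent at t arrives 50 ms later.
def measure(t):
    return singleton_delay(t, t + 0.050)


rng = random.Random(1)
hour = poisson_sample(measure, duration=3600, mean_spacing=1.0, rng=rng)
avg = mean_delay(hour)
```

Because each layer only consumes the one below it, the same `poisson_sample` and statistic functions could be reused unchanged for a different singleton metric.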
By applying these notions of singleton, sample, and statistic in a consistent way, we will be able to reuse lessons learned about how to define samples and statistics on various metrics. The orthogonality among these three notions will thus make all our work more effective and more intelligible to the community.
通过以一致的方式应用单例、样本和统计这三个概念,我们将能够复用在如何为各种度量定义样本和统计方面所获得的经验。因此,这三个概念之间的正交性将使我们的所有工作更有效,也更容易为社区所理解。
In the remainder of this section, we will cover some topics in sampling and statistics that we believe will be important to a variety of metric definitions and measurement efforts.
在本节的其余部分中,我们将介绍抽样和统计中的一些主题,我们认为这些主题对各种度量定义和度量工作都很重要。
The main reason for collecting samples is to see what sort of variations and consistencies are present in the metric being measured. These variations might be with respect to different points in the Internet, or different measurement times. When assessing variations based on a sample, one generally makes an assumption that the sample is "unbiased", meaning that the process of collecting the measurements in the sample did not skew the sample so that it no longer accurately reflects the metric's variations and consistencies.
采集样本的主要原因是为了查看所测量的度量中存在何种变化和一致性。这些变化可能与互联网上的不同点有关,也可能与不同的测量时间有关。在评估基于样本的变化时,通常假设样本是“无偏的”,这意味着在样本中收集测量值的过程不会使样本倾斜,从而使其不再准确反映度量的变化和一致性。
One common way of collecting samples is to make measurements separated by fixed amounts of time: periodic sampling. Periodic sampling is particularly attractive because of its simplicity, but it suffers from two potential problems:
采集样本的一种常见方法是按固定时间间隔进行测量:定期采样。定期采样因其简单性而特别具有吸引力,但它存在两个潜在问题:
+ If the metric being measured itself exhibits periodic behavior, then there is a possibility that the sampling will observe only part of the periodic behavior if the periods happen to agree (either directly, or if one is a multiple of the other). Related to this problem is the notion that periodic sampling can be easily anticipated. Predictable sampling is susceptible to manipulation if there are mechanisms by which a network component's behavior can be temporarily changed such that the sampling only sees the modified behavior.
+ The act of measurement can perturb what is being measured (for example, injecting measurement traffic into a network alters the congestion level of the network), and repeated periodic perturbations can drive a network into a state of synchronization (cf. [FJ94]), greatly magnifying what might individually be minor effects.
+ 如果被测量的度量本身表现出周期性行为,那么当两个周期恰好一致时(直接一致,或一个是另一个的倍数),采样可能只观察到周期性行为的一部分。与这个问题相关的是,定期采样很容易被预测。如果存在某种机制可以临时改变网络组件的行为,使采样仅看到修改后的行为,那么可预测的采样就容易受到操纵。
+ 测量行为可能会干扰被测量的对象(例如,将测量流量注入网络会改变网络的拥塞水平),而重复的周期性干扰可能使网络进入同步状态(参见[FJ94]),从而大大放大单独来看可能很小的影响。
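The first problem is easy to reproduce. In the Python sketch below, a hypothetical metric varies as a sine wave with a 10-second period. Sampling every 10 seconds always lands at the same phase and sees a constant, while samples at Poisson times with the same mean spacing see the full range of the behavior.

```python
import math
import random

PERIOD = 10.0  # seconds; period of the hypothetical behavior


def metric(t):
    """Hypothetical metric with periodic behavior (a sine wave)."""
    return math.sin(2 * math.pi * t / PERIOD)


def periodic_times(spacing, count, start=2.5):
    """Sampling times start, start + spacing, start + 2 * spacing, ..."""
    return [start + k * spacing for k in range(count)]


def poisson_times(mean_spacing, count, rng):
    """Sampling times separated by exponentially distributed intervals."""
    times, t = [], 0.0
    for _ in range(count):
        t += rng.expovariate(1 / mean_spacing)
        times.append(t)
    return times


rng = random.Random(7)
periodic_values = [metric(t) for t in periodic_times(PERIOD, 50)]
poisson_values = [metric(t) for t in poisson_times(PERIOD, 50, rng)]
```

Every periodic sample reports the peak value of 1, so any estimate built from them would badly misrepresent the metric.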
A more sound approach is based on "random additive sampling": samples are separated by independent, randomly generated intervals that have a common statistical distribution G(t) [BM92]. The quality of this sampling depends on the distribution G(t). For example, if G(t) generates a constant value g with probability one, then the sampling reduces to periodic sampling with a period of g.
一种更合理的方法是基于“随机加性抽样”:样本之间由独立、随机生成的间隔分隔,这些间隔具有共同的统计分布G(t)[BM92]。该抽样的质量取决于分布G(t)。例如,如果G(t)以概率1生成常数值g,则该抽样退化为周期为g的周期抽样。
Random additive sampling gains significant advantages. In general, it avoids synchronization effects and yields an unbiased estimate of the property being sampled. The only significant drawbacks with it are:
随机加性抽样具有显著的优势。一般来说,它可以避免同步效应,并对采样的属性进行无偏估计。它唯一的显著缺点是:
+ it complicates frequency-domain analysis, because the samples do not occur at fixed intervals as assumed by Fourier-transform techniques; and
+ unless G(t) is the exponential distribution (see below), sampling still remains somewhat predictable, as discussed for periodic sampling above.
+ 它使频域分析复杂化,因为样本不是以傅立叶变换技术所假设的固定间隔出现的;以及
+ 除非G(t)是指数分布(见下文),否则采样仍然在某种程度上是可预测的,正如上文针对周期采样所讨论的那样。
It can be proved that if G(t) is an exponential distribution with rate lambda, that is
可以证明,如果G(t)是速率为lambda的指数分布,即
G(t) = 1 - exp(-lambda * t)
G(t) = 1 - exp(-lambda * t)
then the arrival of new samples *cannot* be predicted (and, again, the sampling is unbiased). Furthermore, the sampling is asymptotically unbiased even if the act of sampling affects the network's state. Such sampling is referred to as "Poisson sampling". It is not prone to inducing synchronization, it can be used to accurately collect measurements of periodic behavior, and it is not prone to manipulation by anticipating when new samples will occur.
然后,新样本的到达*无法*预测(同样,采样是无偏的)。此外,即使采样行为影响网络状态,采样也是渐近无偏的。这种抽样称为“泊松抽样”。它不容易引起同步,可以用来准确地收集周期性行为的测量值,也不容易通过预测新样本何时出现而进行操纵。
Because of these valuable properties, we in general prefer that samples of Internet measurements are gathered using Poisson sampling. {Comment: We note, however, that there may be circumstances that favor use of a different G(t). For example, the exponential distribution is unbounded, so its use will on occasion generate lengthy spaces between sampling times. We might instead desire to bound the longest such interval to a maximum value dT, to speed the convergence of the estimation derived from the sampling. This could be done by using
由于这些有价值的特性,我们通常倾向于使用泊松抽样收集互联网测量的样本。{注释:然而,我们注意到,可能存在有利于使用不同G(t)的情况。例如,指数分布是无界的,因此它的使用有时会在采样时间之间产生较长的间隔。我们可能希望将最长的间隔限制为最大值dT,以加快从采样得出的估计的收敛。这可以通过使用
G(t) = Unif(0, dT)
G(t) = Unif(0, dT)
that is, the uniform distribution between 0 and dT. This sampling, of course, becomes highly predictable if an interval of nearly length dT has elapsed without a sample occurring.}
即0和dT之间的均匀分布。当然,如果在没有样本发生的情况下,经过了接近长度dT的间隔,则此采样变得高度可预测。}
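Random additive sampling is a single loop with a pluggable interval generator playing the role of G(t). The Python sketch below, with a hypothetical helper name and illustrative parameters, shows the three choices discussed above: a constant interval (periodic sampling), exponential intervals (Poisson sampling), and Unif(0, dT). Setting dT to twice the mean spacing keeps the mean interval the same, since Unif(0, dT) has mean dT/2.

```python
import random


def additive_sampling_times(draw_interval, count, start=0.0):
    """Random additive sampling: successive sampling times are separated by
    independent intervals, each produced by draw_interval() (drawn from G(t))."""
    times, t = [], start
    for _ in range(count):
        t += draw_interval()
        times.append(t)
    return times


rng = random.Random(42)
g = 30.0      # mean spacing in seconds (illustrative)
lam = 1 / g   # Poisson rate lambda
dT = 2 * g    # bound for the uniform alternative; Unif(0, dT) has mean dT / 2

periodic = additive_sampling_times(lambda: g, 5)                    # G constant
poisson = additive_sampling_times(lambda: rng.expovariate(lam), 5)  # exponential G
bounded = additive_sampling_times(lambda: rng.uniform(0, dT), 5)    # G = Unif(0, dT)
```

Only the interval generator changes between the three schemes, which makes it easy to compare their behavior on the same measurement task.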
In its purest form, Poisson sampling is done by generating independent, exponentially distributed intervals and gathering a single measurement after each interval has elapsed. It can be shown that if starting at time T one performs Poisson sampling over an interval dT, during which a total of N measurements happen to be made, then those measurements will be uniformly distributed over the interval [T, T+dT]. So another way of conducting Poisson sampling is to pick dT and N and generate N random sampling times uniformly over the interval [T, T+dT]. The two approaches are equivalent, except if N and dT are externally known. In that case, the property of not being able to predict measurement times is weakened (the other properties still hold). The N/dT approach has an advantage that dealing with fixed values of N and dT can be simpler than dealing with a fixed lambda but variable numbers of measurements over variably-sized intervals.
在最纯粹的形式中,泊松采样是通过生成独立的、指数分布的间隔,并在每个间隔结束后进行一次测量来完成的。可以证明,如果从时间T开始在区间dT上执行泊松采样,并且在此期间恰好总共进行了N次测量,那么这些测量将均匀分布在区间[T, T+dT]上。因此,进行泊松采样的另一种方法是选取dT和N,并在区间[T, T+dT]上均匀地生成N个随机采样时间。这两种方法是等效的,除非N和dT是外部已知的。在这种情况下,无法预测测量时间这一特性被削弱(其他特性仍然成立)。N/dT方法的一个优点是,处理固定的N和dT可能比处理固定的lambda但在大小不定的区间上进行数量不定的测量更简单。
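The N/dT form of Poisson sampling needs only a uniform generator: draw N times independently and uniformly on [T, T+dT] and take them in order. A minimal Python sketch, with a hypothetical function name and illustrative parameters:

```python
import random


def uniform_poisson_times(T, dT, N, rng):
    """N sampling times placed independently and uniformly over [T, T + dT],
    returned in chronological order."""
    return sorted(rng.uniform(T, T + dT) for _ in range(N))


# One hour starting at T = 100 s, with 120 measurements (about one every 30 s).
times = uniform_poisson_times(T=100.0, dT=3600.0, N=120, rng=random.Random(3))
```

As the text notes, if N and dT become known to an outside party, the sampling times are somewhat less unpredictable than with the pure exponential-interval form.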
Closely related to Poisson sampling is "geometric sampling", in which external events are measured with a fixed probability p. For example, one might capture all the packets over a link but only record the packet to a trace file if a randomly generated number uniformly distributed between 0 and 1 is less than a given p. Geometric sampling has the same properties of being unbiased and not predictable in advance as Poisson sampling, so if it fits a particular Internet measurement task, it too is sound. See [CPB93] for more discussion.
与泊松抽样密切相关的是“几何抽样”,其中外部事件以固定概率p进行测量。例如,如果均匀分布在0和1之间的随机生成的数字小于给定的p,则可以捕获链路上的所有数据包,但仅将数据包记录到跟踪文件。几何采样与泊松采样具有相同的无偏性和不可预测性,因此,如果它适合特定的互联网测量任务,它也是可靠的。有关更多讨论,请参见[CPB93]。
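Geometric sampling is a per-event coin flip. In the Python sketch below, `packets` stands in for whatever a capture loop delivers (a hypothetical simplification), and each packet is recorded only if a uniform random number in [0, 1) falls below p.

```python
import random


def geometric_sample(packets, p, rng):
    """Record each observed packet independently with probability p."""
    return [pkt for pkt in packets if rng.random() < p]


# Illustrative run: 100,000 observed packets, recording about 10% of them.
kept = geometric_sample(range(100_000), 0.1, random.Random(5))
```

Because each decision is independent of all earlier ones, an observer cannot anticipate which packet will be recorded next.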
To generate Poisson sampling intervals, one first determines the rate lambda at which the singleton measurements will on average be made (e.g., for an average sampling interval of 30 seconds, we have lambda = 1/30, if the units of time are seconds). One then generates a series of exponentially-distributed (pseudo) random numbers E1, E2, ..., En. The first measurement is made at time E1, the next at time E1+E2, and so on.
One technique for generating exponentially-distributed (pseudo) random numbers is based on the ability to generate U1, U2, ..., Un, (pseudo) random numbers that are uniformly distributed between 0 and 1. Many computers provide libraries that can do this. Given such Ui, to generate Ei one uses:
Ei = -log(Ui) / lambda
where log(Ui) is the natural logarithm of Ui. {Comment: This technique is an instance of the more general "inverse transform" method for generating random numbers with a given distribution.}
Implementation details:
There are at least three different methods for approximating Poisson sampling, which we describe here as Methods 1 through 3. Method 1 is the easiest to implement and has the most error, and Method 3 is the most difficult to implement and has the least error (potentially none).
Method 1 is to proceed as follows:
1. Generate E1 and wait that long.
2. Perform a measurement.
3. Generate E2 and wait that long.
4. Perform a measurement.
5. Generate E3 and wait that long.
6. Perform a measurement ...
The problem with this approach is that the "Perform a measurement" steps themselves take time, so the sampling is not done at times E1, E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount of time required for the i'th measurement. If Mi is very small compared to 1/lambda then the potential error introduced by this technique is likewise small. As Mi becomes a non-negligible fraction of 1/lambda, the potential error increases.
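To make the drift in Method 1 concrete, the following is an offline simulation rather than a real scheduler. It takes hypothetical arrays of generated intervals e[] and observed measurement durations m[] and computes when Method 1 actually starts each measurement.

```c
#include <stddef.h>

/* Method 1 sketch: each wait begins only after the previous measurement
 * has finished, so start[i] = start[i-1] + m[i-1] + e[i], and the
 * measurement durations accumulate as drift from the intended
 * schedule E1, E1+E2, ... */
void method1_start_times(const double *e, const double *m,
                         double *start, size_t n)
{
    double t = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            t += m[i - 1];   /* time spent on the previous measurement */
        t += e[i];           /* the exponential wait itself */
        start[i] = t;
    }
}
```

With e = {1, 2, 3} and m = {0.5, 0.5, 0.5}, the measurements start at 1, 3.5 and 7 rather than the intended 1, 3 and 6.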
Method 2 attempts to correct this error by taking into account the amount of time required by the measurements (i.e., the Mi's) and adjusting the waiting intervals accordingly:
1. Generate E1 and wait that long.
2. Perform a measurement and measure M1, the time it took to do so.
3. Generate E2 and wait for a time E2-M1.
4. Perform a measurement and measure M2 ...
This approach works fine as long as E{i+1} >= Mi. But if E{i+1} < Mi then it is impossible to wait the proper amount of time. (Note that this case corresponds to needing to perform two measurements simultaneously.)
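A matching offline simulation of Method 2, under the same assumptions (hypothetical arrays e[] and m[]), shows both the correction and the failure case. When E{i+1} < Mi, the wait is clamped to zero and the measurement starts late.

```c
#include <stddef.h>

/* Method 2 sketch: after measurement i-1 (which took m[i-1]), wait
 * e[i] - m[i-1] before the next one, so it starts at start[i-1] + e[i].
 * If e[i] < m[i-1] a negative wait would be needed; the wait is clamped
 * to zero, the measurement starts late, and it is counted.
 * Returns the number of late measurements. */
size_t method2_start_times(const double *e, const double *m,
                           double *start, size_t n)
{
    size_t late = 0;
    double t = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0) {
            t = e[0];
        } else if (e[i] >= m[i - 1]) {
            t += e[i];        /* m[i-1] measuring + (e[i] - m[i-1]) waiting */
        } else {
            t += m[i - 1];    /* cannot wait a negative time: start late */
            late++;
        }
        start[i] = t;
    }
    return late;
}
```

With e = {1, 2, 0.25} and m = {0.5, 0.5, 0.5}, the second measurement starts on schedule at 3. The third starts at 3.5 instead of 3.25 and is reported as late.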
Method 3 generates a schedule of measurement times E1, E1+E2, etc., and then sticks to it:
1. Generate E1, E2, ..., En.
2. Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei.
3. Arrange that at times T1, T2, ..., Tn, a measurement is made.
By allowing simultaneous measurements, Method 3 avoids the shortcomings of Methods 1 and 2. If, however, simultaneous measurements interfere with one another, then Method 3 does not gain any benefit and may actually prove worse than Methods 1 or 2.
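The Method 3 schedule is a running sum of the Ei. The sketch below also counts how many scheduled measurements would begin while an earlier one is still in progress, which indicates how often an implementation would have to run measurements concurrently. It assumes an estimate of each duration m[i] is available in advance. That is an assumption, since in practice Mi is only known after the measurement. The function name is hypothetical.

```c
#include <stddef.h>

/* Method 3 sketch: sched[i] = e[0] + ... + e[i]. Also returns how many
 * scheduled measurements start while some earlier measurement (using the
 * assumed durations m[]) would still be running. */
size_t method3_schedule(const double *e, const double *m,
                        double *sched, size_t n)
{
    size_t overlapping = 0;
    double t = 0.0, busy_until = 0.0;
    for (size_t i = 0; i < n; i++) {
        t += e[i];
        sched[i] = t;
        if (i > 0 && t < busy_until)
            overlapping++;                 /* would run concurrently */
        if (t + m[i] > busy_until)
            busy_until = t + m[i];         /* latest end time so far */
    }
    return overlapping;
}
```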
For Internet phenomena, it is not known to what degree the inaccuracies of these methods are significant. If the Mi's are much less than 1/lambda, then any of the three should suffice. If the Mi's are less than 1/lambda but perhaps not greatly less, then Method 2 is preferred to Method 1. If simultaneous measurements do not interfere with one another, then Method 3 is preferred, though it can be considerably harder to implement.
A fundamental requirement for a sound measurement methodology is that measurement be made using as few unconfirmed assumptions as possible. Experience has painfully shown how easy it is to make an (often implicit) assumption that turns out to be incorrect. An example is incorporating into a measurement the reading of a clock synchronized to a highly accurate source. It is easy to assume that the clock is therefore accurate; but due to software bugs, a loss of power in the source, or a loss of communication between the source and the clock, the clock could actually be quite inaccurate.
This is not to argue that one must not make *any* assumptions when measuring, but rather that, to the extent that is practical, assumptions should be tested. One powerful way of doing so involves checking for self-consistency. Such checking applies both to the observed value(s) of the measurement *and the values used by the measurement process itself*. A simple example of the former is that when computing a round trip time, one should check to see if it is negative. Since negative time intervals are non-physical, if it is ever negative, that finding immediately flags an error. *These sorts of errors should then be investigated!* It is crucial to determine where the error lies, because only by doing so diligently can we build up faith in a methodology's fundamental soundness. For example, it could be that the round trip time is negative because during the measurement the clock was set backward in the process of synchronizing it with another source. But it could also be that the measurement program accesses uninitialized memory in one of its computations and, only very rarely, that leads to a bogus computation. This second error is more serious if the same program is used by others to perform the same measurement, since then they too will suffer from incorrect results. Furthermore, once uncovered, it can be completely fixed.
A more subtle example of testing for self-consistency comes from gathering samples of one-way Internet delays. If one has a large sample of such delays, it may well be highly telling to, for example, fit a line to the pairs of (time of measurement, measured delay), to see if the resulting line has a clearly non-zero slope. If so, a possible interpretation is that one of the clocks used in the measurements is skewed relative to the other. Another interpretation is that the slope is actually due to genuine network effects. Determining which is indeed the case will often be highly illuminating. (See [Pa97] for a discussion of distinguishing between relative clock skew and genuine network effects.) Furthermore, if making this check is part of the methodology, then a finding that the long-term slope is very near zero is positive evidence that the measurements are probably not biased by a difference in skew.
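One simple way to fit such a line is ordinary least squares, sketched below. This is an illustrative choice, not necessarily the technique used in [Pa97]. Delay samples often contain large queueing outliers, so a more robust line fit may be preferable in practice. The function name is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Ordinary least-squares slope of delay[i] against measurement time t[i].
 * A clearly nonzero slope over a long sample suggests relative clock skew
 * (or a genuine network trend) worth investigating. Returns NAN when the
 * slope is undefined: fewer than two points, or all t[i] equal. */
double delay_trend_slope(const double *t, const double *delay, size_t n)
{
    if (n < 2)
        return NAN;
    double mt = 0.0, md = 0.0;
    for (size_t i = 0; i < n; i++) {
        mt += t[i];
        md += delay[i];
    }
    mt /= (double)n;
    md /= (double)n;
    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxy += (t[i] - mt) * (delay[i] - md);
        sxx += (t[i] - mt) * (t[i] - mt);
    }
    return sxx > 0.0 ? sxy / sxx : NAN;
}
```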
A final example illustrates checking the measurement process itself for self-consistency. Above we outline Poisson sampling techniques, based on generating exponentially-distributed intervals. A sound measurement methodology would include testing the generated intervals to see whether they are indeed exponentially distributed (and also to see if they suffer from correlation). In the appendix we discuss and give C code for one such technique, a general-purpose, well-regarded goodness-of-fit test called the Anderson-Darling test.
Finally, we note that what is truly relevant for Poisson sampling of Internet metrics is often not when the measurements began but the wire times corresponding to the measurement process. These could well be different, due to complications on the hosts used to perform the measurement. Thus, even those with complete faith in their pseudo-random number generators and subsequent algorithms are encouraged to consider how they might test the assumptions of each measurement procedure as much as possible.
One way of describing a collection of measurements (a sample) is as a statistical distribution -- informally, as percentiles. There are several slightly different ways of doing so. In this section we give a standard definition to make these descriptions uniform.
The "empirical distribution function" (EDF) of a set of scalar measurements is a function F(x) which for any x gives the fractional proportion of the total measurements that were <= x. If x is less than the minimum value observed, then F(x) is 0. If it is greater than or equal to the maximum value observed, then F(x) is 1.
For example, given the 6 measurements:
-2, 7, 7, 4, 18, -5
Then F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6, F(7) = 5/6, F(18) = 1, F(239) = 1.
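The EDF definition maps directly onto a counting loop. This sketch (the function name edf is illustrative) reproduces the example values above for the six measurements.

```c
#include <stddef.h>

/* Empirical distribution function: the fraction of the n measurements
 * v[0..n-1] that are <= x. Gives 0 below the minimum and 1 at or above
 * the maximum. The caller must pass n > 0. */
double edf(const double *v, size_t n, double x)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] <= x)
            count++;
    return (double)count / (double)n;
}
```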
Note that we can recover the different measured values and how many times each occurred from F(x) -- no information regarding the range in values is lost. Summarizing measurements using histograms, on the other hand, in general loses information about the different values observed, so the EDF is preferred.
Using either the EDF or a histogram, however, we do lose information regarding the order in which the values were observed. Whether this loss is potentially significant will depend on the metric being measured.
We will use the term "percentile" to refer to the smallest value of x for which F(x) >= a given percentage. So the 50th percentile of the example above is 4, since F(4) = 3/6 = 50%; the 25th percentile is -2, since F(-5) = 1/6 < 25%, and F(-2) = 2/6 >= 25%; the 100th percentile is 18; and the 0th percentile is -infinity. The 15th percentile, by contrast, is -5, since F(x) = 0 for every x < -5 while F(-5) = 1/6 >= 15%.
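A sketch of the percentile definition in C follows. It relies on one observation: for a sample sorted in ascending order, the smallest value with F(x) >= pct/100 is the k'th value for the smallest k with k/N >= pct/100. The function name, the handling of an empty sample, and the clamping of percentages above 100 are choices of this sketch.

```c
#include <math.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Percentile as defined in the text: the smallest sample value x with
 * F(x) >= pct/100. Sorts v[0..n-1] in place. For a sorted sample this is
 * v[k-1] for the smallest k in 1..n with k/n >= pct/100, tested here as
 * 100*k >= pct*n to avoid an extra division. */
double percentile(double *v, size_t n, double pct)
{
    if (pct <= 0.0)
        return -INFINITY;   /* F(x) >= 0 holds for every x */
    if (n == 0)
        return NAN;         /* empty sample: undefined */
    qsort(v, n, sizeof v[0], cmp_double);
    for (size_t k = 1; k <= n; k++)
        if (100.0 * (double)k >= pct * (double)n)
            return v[k - 1];
    return v[n - 1];        /* pct > 100: clamp to the maximum */
}
```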
Care must be taken when using percentiles to summarize a sample, because they can lend an unwarranted appearance of more precision than is really available. Any such summary must include the sample size N, because any percentile difference finer than 1/N is below the resolution of the sample.
See [DS86] for more details regarding EDF's.
We close with a note on the common (and important!) notion of median. In statistics, the median of a distribution is defined to be the point X for which the probability of observing a value <= X is equal to the probability of observing a value > X. When estimating the median of a set of observations, the estimate depends on whether the number of observations, N, is odd or even:
+ If N is odd, then the 50th percentile as defined above is used as the estimated median.
+ If N is even, then the estimated median is the average of the central two observations; that is, if the observations are sorted in ascending order and numbered from 1 to N, where N = 2*K, then the estimated median is the average of the (K)'th and (K+1)'th observations.
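The median estimate can be sketched the same way. The function below (a hypothetical name) sorts its input in place.

```c
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Estimated median; sorts v[0..n-1] in place and requires n > 0.
 * Odd n: the middle observation, which equals the 50th percentile.
 * Even n = 2K: the mean of the K'th and (K+1)'th smallest observations. */
double estimated_median(double *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_double);
    if (n % 2 == 1)
        return v[n / 2];
    return (v[n / 2 - 1] + v[n / 2]) / 2.0;
}
```

For the six example measurements, the central two sorted values are 4 and 7, so the estimated median is 5.5.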
Usually the term "estimated" is dropped from the phrase "estimated median" and this value is simply referred to as the "median".
For some forms of measurement calibration we need to test whether a set of numbers is consistent with those numbers having been drawn from a particular distribution. For example, to apply a self-consistency check to measurements made using a Poisson process, one test is to see whether the spacing between the sampling times does indeed reflect an exponential distribution; or, if the N/dT approach discussed above was used, whether the times are uniformly distributed across [T, T+dT].
{Comment: There are at least three possible sets of values we could test: the scheduled packet transmission times, as determined by use of a pseudo-random number generator; user-level timestamps made just before or after the system call for transmitting the packet; and wire times for the packets as recorded using a packet filter. All three of these are potentially informative: failures for the scheduled times to match an exponential distribution indicate inaccuracies in the random number generation; failures for the user-level times indicate inaccuracies in the timers used to schedule transmission; and failures for the wire times indicate inaccuracies in actually transmitting the packets, perhaps due to contention for a shared resource.}
There are a large number of statistical goodness-of-fit techniques for performing such tests. See [DS86] for a thorough discussion. That reference recommends the Anderson-Darling EDF test as being a good all-purpose test, as well as one that is especially good at detecting deviations from a given distribution in the lower and upper tails of the EDF.
It is important to understand that the nature of goodness-of-fit tests is that one first selects a "significance level", which is the probability that the test will erroneously declare that the EDF of a given set of measurements fails to match a particular distribution when in fact the measurements do indeed reflect that distribution.
Unless otherwise stated, IPPM goodness-of-fit tests are done using 5% significance. This means that if the test is applied to 100 samples and 5 of those samples are deemed to have failed the test, then the samples are all consistent with the distribution being tested. If significantly more of the samples fail the test, then the assumption that the samples are consistent with the distribution being tested must be rejected. If significantly fewer of the samples fail the test, then the samples have potentially been doctored too well to fit the distribution. Similarly, some goodness-of-fit tests (including Anderson-Darling) can detect whether it is likely that a given sample was doctored. We also use a significance of 5% for this case; that is, the test will report that a given honest sample is "too good to be true" 5% of the time, so if the test reports this finding significantly more often than one time out of twenty, it is an indication that something unusual is occurring.
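The text leaves "significantly more" informal. One way to make it concrete, which is an assumption of this sketch and not part of the framework, is a binomial tail. If each of n independent samples fails with probability 0.05 when the distribution is correct, the chance of seeing k or more failures is P(X >= k) for X ~ Binomial(n, 0.05). The function name is hypothetical.

```c
#include <stddef.h>

/* P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, by summing the pmf with
 * the recurrence pmf(j+1) = pmf(j) * (n-j)/(j+1) * p/(1-p).
 * Intended for modest n (e.g. hundreds): for very large n the starting
 * term (1-p)^n underflows to zero. */
double binomial_upper_tail(unsigned n, double p, unsigned k)
{
    double pmf = 1.0;
    for (unsigned i = 0; i < n; i++)
        pmf *= 1.0 - p;                  /* P(X = 0) = (1-p)^n */
    double tail = 0.0;
    for (unsigned j = 0; j <= n; j++) {
        if (j >= k)
            tail += pmf;                 /* pmf currently holds P(X = j) */
        pmf *= (double)(n - j) / (double)(j + 1) * p / (1.0 - p);
    }
    return tail;
}
```

For 100 samples, P(X >= 5) is a bit over one half, so five failures is unremarkable. A count in the mid-teens would be very unlikely under the 5% assumption.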
The appendix gives sample C code for implementing the Anderson-Darling test, as well as further discussing its use.
See [Pa94] for a discussion of goodness-of-fit and closeness-of-fit tests in the context of network measurement.
When defining metrics applying to a path, subpath, cloud, or other network element, we in general do not define them in stochastic terms (probabilities). We instead prefer a deterministic definition. So, for example, rather than defining a metric about a "packet loss probability between A and B", we would define a metric about a "packet loss rate between A and B". (A measurement given by the first definition might be "0.73", and by the second "73 packets out of 100".)
We emphasize that the above distinction concerns the *definitions* of *metrics*. It is not intended to apply to what sort of techniques we might use to analyze the results of measurements.
The reason for this distinction is as follows. When definitions are made in terms of probabilities, there are often hidden assumptions in the definition about a stochastic model of the behavior being measured. The fundamental goal with avoiding probabilities in our metric definitions is to avoid biasing our definitions by these hidden assumptions.
For example, an easy hidden assumption to make is that packet loss in a network component due to queueing overflows can be described as something that happens to any given packet with a particular probability. In today's Internet, however, queueing drops are actually usually *deterministic*, and assuming that they should be described probabilistically can obscure crucial correlations between queueing drops among a set of packets. So it's better to explicitly note stochastic assumptions, rather than have them sneak into our definitions implicitly.
This does *not* mean that we abandon stochastic models for *understanding* network performance! It only means that when defining IP metrics we avoid terms such as "probability" in favor of terms like "proportion" or "rate". We will still use, for example, random sampling in order to estimate probabilities used by stochastic models related to the IP metrics. We also do not rule out the possibility of stochastic metrics when they are truly appropriate (for example, perhaps to model transmission errors caused by certain types of line noise).
A fundamental property of many Internet metrics is that the value of the metric depends on the type of IP packet(s) used to make the measurement. Consider an IP-connectivity metric: one obtains different results depending on whether one is interested in connectivity for packets destined for well-known TCP ports or unreserved UDP ports, or those with invalid IP checksums, or those with TTL's of 16, for example. In some circumstances these distinctions will be highly interesting (for example, in the presence of firewalls, or RSVP reservations).
Because of this distinction, we introduce the generic notion of a "packet of type P", where in some contexts P will be explicitly defined (i.e., exactly what type of packet we mean), partially defined (e.g., "with a payload of B octets"), or left generic. Thus we may talk about generic IP-type-P-connectivity or more specific IP-port-HTTP-connectivity. Some metrics and methodologies may be fruitfully defined using generic type P definitions which are then made specific when performing actual measurements.
Whenever a metric's value depends on the type of the packets involved in the metric, the metric's name will include either a specific type or a phrase such as "type-P". Thus we will not define an "IP-connectivity" metric but instead an "IP-type-P-connectivity" metric and/or perhaps an "IP-port-HTTP-connectivity" metric. This naming convention serves as an important reminder that one must be conscious of the exact type of traffic being measured.
A closely related note: it would be very useful to know if a given Internet component treats equally a class C of different types of packets. If so, then any one of those types of packets can be used for subsequent measurement of the component. This suggests we devise a metric or suite of metrics that attempt to determine C.
When considering a metric for some path through the Internet, it is often natural to think about it as being for the path from Internet host H1 to host H2. A definition in these terms, though, can be ambiguous, because Internet hosts can be attached to more than one network. In this case, the result of the metric will depend on which of these networks is actually used.
Because of this ambiguity, such metrics should usually be defined in terms of Internet IP addresses instead. For the common case of a unidirectional path through the Internet, we will use the term "Src" to denote the IP address of the beginning of the path, and "Dst" to denote the IP address of the end.
Unless otherwise stated, all metric definitions that concern IP packets include an implicit assumption that the packet is *standard formed*. A packet is standard formed if it meets all of the following criteria:
+ Its length as given in the IP header corresponds to the size of the IP header plus the size of the payload.
+ It includes a valid IP header: the version field is 4 (later, we will expand this to include 6); the header length is >= 5; the checksum is correct.
+ It is not an IP fragment.
+ The source and destination addresses correspond to the hosts in question.
+ Either the packet possesses sufficient TTL to travel from the source to the destination if the TTL is decremented by one at each hop, or it possesses the maximum TTL of 255.
+ It does not contain IP options unless explicitly noted.
+ If a transport header is present, it too contains a valid checksum and other valid fields.
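Several of the header-level criteria above can be checked mechanically on a captured IPv4 packet. The sketch below covers the version, header-length, total-length, fragment, option and header-checksum rules. It does not check TTL sufficiency, address correspondence or transport-header validity, which need context (path length, intended hosts, transport protocol) that the function does not have. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over len bytes (len is even for IPv4 headers).
 * Computed over a header whose checksum field is correct, it returns 0. */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);   /* fold carries back in */
    return (uint16_t)~sum;
}

/* Returns 1 if the IPv4 packet pkt[0..pkt_len-1] passes the header-level
 * standard-formed checks listed in the lead-in, 0 otherwise. */
int ipv4_header_standard_formed(const uint8_t *pkt, size_t pkt_len)
{
    if (pkt_len < 20)
        return 0;
    unsigned version = pkt[0] >> 4;
    unsigned ihl = pkt[0] & 0x0Fu;                  /* in 32-bit words */
    unsigned total_len = ((unsigned)pkt[2] << 8) | pkt[3];
    unsigned frag = ((unsigned)pkt[6] << 8) | pkt[7];

    if (version != 4 || ihl < 5)
        return 0;
    if (ihl > 5)
        return 0;            /* IP options present: not allowed by default */
    if (total_len != pkt_len)
        return 0;            /* must equal header size plus payload size */
    if (frag & 0x3FFFu)
        return 0;            /* MF flag or nonzero offset: a fragment */
    if (inet_checksum(pkt, (size_t)ihl * 4) != 0)
        return 0;
    return 1;
}
```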
We further require that if a packet is described as having a "length of B octets", then 0 <= B <= 65535; and if B is the payload length in octets, then B <= (65535-IP header size in octets).
So, for example, one might imagine defining an IP connectivity metric as "IP-type-P-connectivity for standard-formed packets with the IP TOS field set to 0", or, more succinctly, "IP-type-P-connectivity with the IP TOS field set to 0", since standard-formed is already implied by convention.
A particular type of standard-formed packet often useful to consider is the "minimal IP packet from A to B" - this is an IP packet with the following properties:
+ It is standard-formed.
+ Its data payload is 0 octets.
+ It contains no options.
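A minimal IP packet can be built by hand. This sketch fills in a 20-octet IPv4 header with no options and no payload. The function name is hypothetical, and the TTL of 255 and all-zero identification field are assumptions of the sketch. The protocol number is a parameter because the text deliberately leaves it unspecified.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum; returns 0 when computed over a correct header. */
uint16_t ip_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill hdr[0..19] with a minimal IPv4 packet from src to dst (addresses
 * in host byte order). TTL 255 satisfies the standard-formed TTL rule
 * regardless of path length. */
void build_minimal_ipv4(uint8_t hdr[20], uint32_t src, uint32_t dst,
                        uint8_t protocol)
{
    for (int i = 0; i < 20; i++)
        hdr[i] = 0;
    hdr[0] = 0x45;          /* version 4, header length 5 words */
    hdr[3] = 20;            /* total length: header only, no payload */
    hdr[8] = 255;           /* TTL */
    hdr[9] = protocol;
    for (int i = 0; i < 4; i++) {
        hdr[12 + i] = (uint8_t)(src >> (24 - 8 * i));
        hdr[16 + i] = (uint8_t)(dst >> (24 - 8 * i));
    }
    uint16_t c = ip_checksum(hdr, 20);   /* checksum field is still 0 */
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xFF);
}
```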
(Note that we do not define its protocol field, as different values may lead to different treatment by the network.)
When defining IP metrics we keep in mind that no packet smaller or simpler than this can be transmitted over a correctly operating IP network.
The comments of Brian Carpenter, Bill Cerveny, Padma Krishnaswamy, Jeff Sedayao and Howard Stanislevic are appreciated.
This document concerns definitions and concepts related to Internet measurement. We discuss measurement procedures only in high-level terms, regarding principles that lend themselves to sound measurement. As such, the topics discussed do not affect the security of the Internet or of applications which run on it.
That said, it should be recognized that conducting Internet measurements can raise both security and privacy concerns. Active techniques, in which traffic is injected into the network, can be abused for denial-of-service attacks disguised as legitimate measurement activity. Passive techniques, in which existing traffic is recorded and analyzed, can expose the contents of Internet traffic to unintended recipients. Consequently, the definition of each metric and methodology must include a corresponding discussion of security considerations.
Below we give routines written in C for computing the Anderson-Darling test statistic (A2) for determining whether a set of values is consistent with a given statistical distribution. Externally, the two main routines of interest are:
double exp_A2_known_mean(double x[], int n, double mean)
double unif_A2_known_range(double x[], int n, double min_val, double max_val)
Both take as their first argument, x, the array of n values to be tested. (Upon return, the elements of x are sorted.) The remaining parameters characterize the distribution to be used: either the mean (1/lambda), for an exponential distribution, or the lower and upper bounds, for a uniform distribution. The names of the routines stress that these values must be known in advance, and *not* estimated from the data (for example, by computing its sample mean). Estimating the parameters from the data *changes* the significance level of the test statistic. While [DS86] gives alternate significance tables for some instances in which the parameters are estimated from the data, for our purposes we expect that we should indeed know the parameters in advance, since what we will be testing are generally values such as packet sending times that we wish to verify follow a known distribution.
Both routines return a significance level, as described earlier. This is a value between 0 and 1. The correct use of the routines is to pick in advance the threshold for the significance level to test; generally, this will be 0.05, corresponding to 5%, as also described above. Subsequently, if the routines return a value strictly less than this threshold, then the data are deemed to be inconsistent with the presumed distribution, *subject to an error corresponding to the significance level*. That is, for a significance level of 5%, 5% of the time data that is indeed drawn from the presumed distribution will be erroneously deemed inconsistent.
Thus, it is important to bear in mind that if these routines are used frequently, then one will indeed encounter occasional failures, even if the data is unblemished.
Another important point concerning significance levels is that it is unsound to compare them in order to determine which of two sets of values is a "better" fit to a presumed distribution. Such testing should instead be done using "closeness-of-fit metrics" such as the lambda^2 metric described in [Pa94].
While the routines provided are for exponential and uniform distributions with known parameters, it is generally straightforward to write comparable routines for any distribution with known parameters. The heart of the A2 tests lies in a statistic computed for testing whether a set of values is consistent with a uniform distribution between 0 and 1, which we term Unif(0, 1). If we wish to test whether a set of values, X, is consistent with a given distribution whose cumulative distribution function is G(x), we first compute Y = G(X). If X is indeed distributed according to G(x), then Y will be distributed according to Unif(0, 1); so by testing Y for consistency with Unif(0, 1), we also test X for consistency with G(x).
We note, however, that the process of computing Y above might yield values of Y outside the range (0..1). Such values should not occur if X is indeed distributed according to G(x), but easily can occur if it is not. In the latter case, we need to avoid computing the central A2 statistic, since floating-point exceptions may occur if any of the values lie outside (0..1). Accordingly, the routines check for this possibility, and if encountered, return a raw A2 statistic of -1. The routine that converts the raw A2 statistic to a significance level likewise propagates this value, returning a significance level of -1. So, any use of these routines must be prepared for a possible negative significance level.
The last important point regarding use of the A2 statistic concerns n, the number of values being tested. If n < 5, the test is not meaningful, and in this case a significance level of -1 is returned.
关于A2统计数据使用的最后一个要点涉及n,即被测试值的数量。如果n<5,则测试没有意义,在这种情况下,显著性水平返回-1。
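A caller can meet the requirement above by treating a negative return value as a third outcome. Folding it into a plain pass/fail comparison would hide it. The enum and classify_A2() below are a hypothetical sketch. They assume that alpha is one of the levels tabulated in A2_significance(), for example 0.05. Under that assumption, a returned significance below alpha corresponds to A2 exceeding that level's critical value.

```c
#include <assert.h>

/* Hypothetical helper: interpret the value returned by A2_significance(),
 * exp_A2_known_mean() or unif_A2_known_range(). */
enum a2_outcome {
    A2_NOT_TESTABLE, /* -1: fewer than 5 values, or a transformed value
                        fell outside (0, 1) */
    A2_REJECTED,     /* significance below the chosen level alpha */
    A2_CONSISTENT    /* no evidence against the distribution at alpha */
};

enum a2_outcome classify_A2(double significance, double alpha)
{
    assert(alpha > 0.0 && alpha < 1.0);
    if (significance < 0.0)
        return A2_NOT_TESTABLE;
    return significance < alpha ? A2_REJECTED : A2_CONSISTENT;
}
```

A bare `if (sig < 0.05)` check would silently count the -1 sentinel as a rejection. Checking for the negative value first keeps "could not test" separate from "tested and failed".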
On the other hand, for "real" data the test *gains* power as n becomes larger. It is well known in the statistics community that real data almost never exactly matches a theoretical distribution, even in cases such as rolling dice a great many times (see [Pa94] for a brief discussion and references). The A2 test is sensitive enough that, for sufficiently large sets of real data, the test will almost always fail, because it will manage to detect slight imperfections in the fit of the data to the distribution.
另一方面,对于“真实”数据,随着n变大,测试*获得*功率。统计界众所周知,即使在掷骰子很多次的情况下,真实数据几乎永远不会与理论分布完全匹配(有关简要讨论和参考资料,请参见[Pa94])。A2测试足够敏感,对于足够大的真实数据集,测试几乎总是失败,因为它将设法检测数据与分布的拟合中的轻微缺陷。
For example, we have found that when testing 8,192 measured wire times for packets sent at Poisson intervals, the measurements almost always fail the A2 test. On the other hand, testing 128 measurements failed at 5% significance only about 5% of the time, as expected. Thus, in general, when the test fails, care must be taken to understand why it failed.
例如,我们发现,当测试以泊松间隔发送的数据包的8192测得的连线时间时,测量几乎总是无法通过A2测试。另一方面,测试128个测量值在5%的显著性下失败,正如预期的那样,只有大约5%的时间失败。因此,一般来说,当测试失败时,必须小心理解它失败的原因。
The remainder of this appendix gives C code for the routines mentioned above.
本附录的其余部分给出了上述例程的C代码。
/* Routines for computing the Anderson-Darling A2 test statistic.
 *
 * Implemented based on the description in "Goodness-of-Fit
 * Techniques," R. D'Agostino and M. Stephens, editors,
 * Marcel Dekker, Inc., 1986.
 */
/* Routines for computing the Anderson-Darling A2 test statistic. * * Implemented based on the description in "Goodness-of-Fit * Techniques," R. D'Agostino and M. Stephens, editors, * Marcel Dekker, Inc., 1986. */
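/* drand48(), used by random_exponential() below, is an X/Open
 * (POSIX XSI) function; request its declaration explicitly. */
#define _XOPEN_SOURCE 600

#include <stdio.h>
#include <stdlib.h>
#include <math.h>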
#include <stdio.h> #include <stdlib.h> #include <math.h>
/* Returns the raw A^2 test statistic for n sorted samples
 * z[0] .. z[n-1], for z ~ Unif(0,1).
 */
extern double compute_A2(double z[], int n);
/* Returns the raw A^2 test statistic for n sorted samples * z[0] .. z[n-1], for z ~ Unif(0,1). */ extern double compute_A2(double z[], int n);
/* Returns the significance level associated with an A^2 test
 * statistic value of A2, assuming no parameters of the tested
 * distribution were estimated from the data.
 */
extern double A2_significance(double A2);
/* Returns the significance level associated with a A^2 test * statistic value of A2, assuming no parameters of the tested * distribution were estimated from the data. */ extern double A2_significance(double A2);
/* Returns the A^2 significance level for testing n observations
 * x[0] .. x[n-1] against an exponential distribution with the
 * given mean.
 *
 * SIDE EFFECT: on return, x[0..n-1] are sorted and overwritten
 * with their values transformed to Unif(0,1).
 */
extern double exp_A2_known_mean(double x[], int n, double mean);
/* Returns the A^2 significance level for testing n observations * x[0] .. x[n-1] against an exponential distribution with the * given mean. * * SIDE EFFECT: the x[0..n-1] are sorted upon return. */ extern double exp_A2_known_mean(double x[], int n, double mean);
/* Returns the A^2 significance level for testing n observations
 * x[0] .. x[n-1] against the uniform distribution [min_val, max_val].
 *
 * SIDE EFFECT: on return, x[0..n-1] are sorted and overwritten
 * with their values transformed to Unif(0,1).
 */
extern double unif_A2_known_range(double x[], int n, double min_val,
                                  double max_val);
/* Returns the A^2 significance level for testing n observations * x[0] .. x[n-1] against the uniform distribution [min_val, max_val]. * * SIDE EFFECT: the x[0..n-1] are sorted upon return. */ extern double unif_A2_known_range(double x[], int n, double min_val, double max_val);
/* Returns a pseudo-random number distributed according to an
 * exponential distribution with the given mean.
 */
extern double random_exponential(double mean);
/* Returns a pseudo-random number distributed according to an * exponential distribution with the given mean. */ extern double random_exponential(double mean);
/* Helper function used by qsort() to sort double-precision
 * floating-point values.
 */
static int
compare_double(const void *v1, const void *v2)
{
    double d1 = *(const double *) v1;
    double d2 = *(const double *) v2;
/* Helper function used by qsort() to sort double-precision * floating-point values. */ static int compare_double(const void *v1, const void *v2) { double d1 = *(double *) v1; double d2 = *(double *) v2;
    if (d1 < d2)
        return -1;
    else if (d1 > d2)
        return 1;
    else
        return 0;
}
if (d1 < d2) return -1; else if (d1 > d2) return 1; else return 0; }
double
compute_A2(double z[], int n)
{
    int i;
    double sum = 0.0;
double compute_A2(double z[], int n) { int i; double sum = 0.0;
    if ( n < 5 )
        /* Too few values. */
        return -1.0;
if ( n < 5 ) /* Too few values. */ return -1.0;
    /* If any of the values are outside the range (0, 1) then
     * fail immediately (and avoid a possible floating point
     * exception in the code below).
     */
    for (i = 0; i < n; ++i)
        if ( z[i] <= 0.0 || z[i] >= 1.0 )
            return -1.0;
/* If any of the values are outside the range (0, 1) then * fail immediately (and avoid a possible floating point * exception in the code below). */ for (i = 0; i < n; ++i) if ( z[i] <= 0.0 || z[i] >= 1.0 ) return -1.0;
    /* Page 101 of D'Agostino and Stephens. */
    for (i = 1; i <= n; ++i) {
        sum += (2 * i - 1) * log(z[i-1]);
        sum += (2 * n + 1 - 2 * i) * log(1.0 - z[i-1]);
    }
    return -n - (1.0 / n) * sum;
}
/* Page 101 of D'Agostino and Stephens. */ for (i = 1; i <= n; ++i) { sum += (2 * i - 1) * log(z[i-1]); sum += (2 * n + 1 - 2 * i) * log(1.0 - z[i-1]); } return -n - (1.0 / n) * sum; }
double
A2_significance(double A2)
{
    /* Page 105 of D'Agostino and Stephens. */
    if (A2 < 0.0)
        return A2;    /* Bogus A2 value - propagate it. */
double A2_significance(double A2) { /* Page 105 of D'Agostino and Stephens. */ if (A2 < 0.0) return A2; /* Bogus A2 value - propagate it. */
    /* Check for possibly doctored values. */
    if (A2 <= 0.201)
        return 0.99;
    else if (A2 <= 0.240)
        return 0.975;
    else if (A2 <= 0.283)
        return 0.95;
    else if (A2 <= 0.346)
        return 0.90;
    else if (A2 <= 0.399)
        return 0.85;
/* Check for possibly doctored values. */ if (A2 <= 0.201) return 0.99; else if (A2 <= 0.240) return 0.975; else if (A2 <= 0.283) return 0.95; else if (A2 <= 0.346) return 0.90; else if (A2 <= 0.399) return 0.85;
    /* Now check for possible inconsistency. */
    if (A2 <= 1.248)
        return 0.25;
    else if (A2 <= 1.610)
        return 0.15;
    else if (A2 <= 1.933)
        return 0.10;
    else if (A2 <= 2.492)
        return 0.05;
    else if (A2 <= 3.070)
        return 0.025;
    else if (A2 <= 3.880)
        return 0.01;
    else if (A2 <= 4.500)
        return 0.005;
    else if (A2 <= 6.000)
        return 0.001;
    else
        return 0.0;
}
/* Now check for possible inconsistency. */ if (A2 <= 1.248) return 0.25; else if (A2 <= 1.610) return 0.15; else if (A2 <= 1.933) return 0.10; else if (A2 <= 2.492) return 0.05; else if (A2 <= 3.070) return 0.025; else if (A2 <= 3.880) return 0.01; else if (A2 <= 4.500) return 0.005; else if (A2 <= 6.000) return 0.001; else return 0.0; }
double
exp_A2_known_mean(double x[], int n, double mean)
{
    int i;
    double A2;
double exp_A2_known_mean(double x[], int n, double mean) { int i; double A2;
    /* Sort the first n values. */
    qsort(x, n, sizeof(x[0]), compare_double);
/* Sort the first n values. */ qsort(x, n, sizeof(x[0]), compare_double);
    /* Assuming they match an exponential distribution, transform
     * them to Unif(0,1).
     */
    for (i = 0; i < n; ++i) {
        x[i] = 1.0 - exp(-x[i] / mean);
    }
/* Assuming they match an exponential distribution, transform * them to Unif(0,1). */ for (i = 0; i < n; ++i) { x[i] = 1.0 - exp(-x[i] / mean); }
    /* Now make the A^2 test to see if they're truly uniform. */
    A2 = compute_A2(x, n);
    return A2_significance(A2);
}
/* Now make the A^2 test to see if they're truly uniform. */ A2 = compute_A2(x, n); return A2_significance(A2); }
double
unif_A2_known_range(double x[], int n, double min_val, double max_val)
{
    int i;
    double A2;
    double range = max_val - min_val;
double unif_A2_known_range(double x[], int n, double min_val, double max_val) { int i; double A2; double range = max_val - min_val;
    /* Sort the first n values. */
    qsort(x, n, sizeof(x[0]), compare_double);
/* Sort the first n values. */ qsort(x, n, sizeof(x[0]), compare_double);
    /* Transform Unif(min_val, max_val) to Unif(0,1). */
    for (i = 0; i < n; ++i)
        x[i] = (x[i] - min_val) / range;
/* Transform Unif(min_val, max_val) to Unif(0,1). */ for (i = 0; i < n; ++i) x[i] = (x[i] - min_val) / range;
    /* Now make the A^2 test to see if they're truly uniform. */
    A2 = compute_A2(x, n);
    return A2_significance(A2);
}
/* Now make the A^2 test to see if they're truly uniform. */ A2 = compute_A2(x, n); return A2_significance(A2); }
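double
random_exponential(double mean)
{
    /* Inverse-transform sampling: drand48() returns a value u in
     * [0, 1), so log1p(-u) = log(1 - u) is finite and <= 0, and the
     * result is a non-negative exponential variate.
     */
    return -mean * log1p(-drand48());
}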
double random_exponential(double mean) { return -mean * log1p(-drand48()); }
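For reference, the loop in compute_A2() evaluates a rearranged form of the commonly quoted computing formula for A^2 (see [DS86]). In the formula below, z_(1) <= ... <= z_(n) are the sorted values, and the second sum is reindexed with j = n+1-i:

```latex
A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left[\ln z_{(i)} + \ln\bigl(1 - z_{(n+1-i)}\bigr)\right]

\sum_{i=1}^{n} (2i-1)\ln\bigl(1 - z_{(n+1-i)}\bigr) = \sum_{j=1}^{n} (2n+1-2j)\ln\bigl(1 - z_{(j)}\bigr)

A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}\left[(2i-1)\ln z_{(i)} + (2n+1-2i)\ln\bigl(1 - z_{(i)}\bigr)\right]
```

The last line is the sum accumulated in compute_A2(), with z[i-1] holding z_(i). Each logarithm requires 0 < z_(i) < 1, which is why compute_A2() rejects values outside (0, 1) before entering the loop. As a small check, n = 5 and z = 0.1, 0.3, 0.5, 0.7, 0.9 give A^2 of approximately 0.130. That is below 0.201, so A2_significance() would return 0.99.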
[AK97] G. Almes and S. Kalidindi, "A One-way Delay Metric for IPPM", Work in Progress, November 1997.
[AK97]G.Almes和S.Kalidini,“IPPM的单向延迟度量”,正在进行的工作,1997年11月。
[BM92] I. Bilinskis and A. Mikelsons, Randomized Signal Processing, Prentice Hall International, 1992.
[BM92]I.Bilinskis和A.Mikelsons,随机信号处理,普伦蒂斯霍尔国际,1992年。
[DS86] R. D'Agostino and M. Stephens, editors, Goodness-of-Fit Techniques, Marcel Dekker, Inc., 1986.
[DS86]R.D'Agostino和M.Stephens,《拟合优度技术》编辑,Marcel Dekker,Inc.,1986年。
[CPB93] K. Claffy, G. Polyzos, and H-W. Braun, "Application of Sampling Methodologies to Network Traffic Characterization," Proc. SIGCOMM '93, pp. 194-203, San Francisco, September 1993.
[CPB93]K.Claffy,G.Polyzos和H-W.Braun,“抽样方法在网络流量表征中的应用”,Proc。SIGCOMM 93,pp.194-203,旧金山,1993年9月。
[FJ94] S. Floyd and V. Jacobson, "The Synchronization of Periodic Routing Messages," IEEE/ACM Transactions on Networking, 2(2), pp. 122-136, April 1994.
[FJ94]S.Floyd和V.Jacobson,“定期路由消息的同步”,IEEE/ACM网络事务,2(2),第122-136页,1994年4月。
[Mi92] D. Mills, "Network Time Protocol (Version 3) Specification, Implementation and Analysis", RFC 1305, March 1992.
[Mi92]Mills,D.,“网络时间协议(第3版)规范、实施和分析”,RFC13051992年3月。
[Pa94] V. Paxson, "Empirically-Derived Analytic Models of Wide-Area TCP Connections," IEEE/ACM Transactions on Networking, 2(4), pp. 316-336, August 1994.
[Pa94]V.Paxson,“广域TCP连接的经验推导分析模型”,IEEE/ACM网络交易,2(4),第316-336页,1994年8月。
[Pa96] V. Paxson, "Towards a Framework for Defining Internet Performance Metrics," Proceedings of INET '96, ftp://ftp.ee.lbl.gov/papers/metrics-framework-INET96.ps.Z
[Pa96] V. Paxson, "Towards a Framework for Defining Internet Performance Metrics," Proceedings of INET '96, ftp://ftp.ee.lbl.gov/papers/metrics-framework-INET96.ps.Z
[Pa97] V. Paxson, "Measurements and Analysis of End-to-End Internet Dynamics," Ph.D. dissertation, U.C. Berkeley, 1997, ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz.
[Pa97]V.Paxson,“端到端互联网动态的测量和分析”,博士。博士论文,加州大学伯克利分校,1997年,ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz.
Vern Paxson
MS 50B/2239
Lawrence Berkeley National Laboratory
University of California
Berkeley, CA 94720
USA
Vern Paxson MS 50B/2239劳伦斯伯克利国家实验室加利福尼亚大学伯克利,CA 94720美国
Phone: +1 510/486-7504
EMail: vern@ee.lbl.gov
Phone: +1 510/486-7504 EMail: vern@ee.lbl.gov
Guy Almes
Advanced Network & Services, Inc.
200 Business Park Drive
Armonk, NY 10504
USA
Guy Almes Advanced Network&Services,Inc.美国纽约州阿蒙克商业园区大道200号,邮编10504
Phone: +1 914/765-1120
EMail: almes@advanced.org
Phone: +1 914/765-1120 EMail: almes@advanced.org
Jamshid Mahdavi
Pittsburgh Supercomputing Center
4400 5th Avenue
Pittsburgh, PA 15213
USA
美国宾夕法尼亚州匹兹堡第五大道4400号杰姆希德·马赫达维匹兹堡超级计算中心,邮编15213
Phone: +1 412/268-6282
EMail: mahdavi@psc.edu
Phone: +1 412/268-6282 EMail: mahdavi@psc.edu
Matt Mathis
Pittsburgh Supercomputing Center
4400 5th Avenue
Pittsburgh, PA 15213
USA
美国宾夕法尼亚州匹兹堡第五大道4400号马特·马蒂斯匹兹堡超级计算中心,邮编15213
Phone: +1 412/268-3319
EMail: mathis@psc.edu
Phone: +1 412/268-3319 EMail: mathis@psc.edu
Copyright (C) The Internet Society (1998). All Rights Reserved.
版权所有(C)互联网协会(1998年)。版权所有。
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
本文件及其译本可复制并提供给他人,对其进行评论或解释或协助其实施的衍生作品可全部或部分编制、复制、出版和分发,不受任何限制,前提是上述版权声明和本段包含在所有此类副本和衍生作品中。但是,不得以任何方式修改本文件本身,例如删除版权通知或对互联网协会或其他互联网组织的引用,除非出于制定互联网标准的需要,在这种情况下,必须遵循互联网标准过程中定义的版权程序,或根据需要将其翻译成英语以外的其他语言。
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
上述授予的有限许可是永久性的,互联网协会或其继承人或受让人不会撤销。
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
本文件和其中包含的信息是按“原样”提供的,互联网协会和互联网工程任务组否认所有明示或暗示的保证,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。