RFC 6562: Guidelines for the Use of Variable Bit Rate Audio with Secure RTP 中文翻译

URL : https://datatracker.ietf.org/doc/html/rfc6562
标题 : RFC 6562
翻译类型 : 自动生成

Internet Engineering Task Force (IETF)                        C. Perkins
Request for Comments: 6562                         University of Glasgow
Category: Standards Track                                      JM. Valin
ISSN: 2070-1721                                      Mozilla Corporation
                                                              March 2012

Internet Engineering Task Force (IETF)                        C. Perkins
Request for Comments: 6562                         University of Glasgow
Category: Standards Track                                      JM. Valin
ISSN: 2070-1721                                      Mozilla Corporation
                                                              March 2012

Guidelines for the Use of Variable Bit Rate Audio with Secure RTP

带安全RTP的可变比特率音频使用指南

Abstract

摘要

This memo discusses potential security issues that arise when using variable bit rate (VBR) audio with the secure RTP profile. Guidelines to mitigate these issues are suggested.

本备忘录讨论了在安全RTP配置文件中使用可变比特率（VBR）音频时可能出现的安全问题。提出了缓解这些问题的指导方针。

Status of This Memo

关于下段备忘

This is an Internet Standards Track document.

这是一份互联网标准跟踪文件。

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

本文件是互联网工程任务组（IETF）的产品。它代表了IETF社区的共识。它已经接受了公众审查，并已被互联网工程指导小组（IESG）批准出版。有关互联网标准的更多信息，请参见RFC 5741第2节。

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6562.

有关本文件当前状态、任何勘误表以及如何提供反馈的信息，请访问http://www.rfc-editor.org/info/rfc6562.

版权公告

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

本文件受BCP 78和IETF信托有关IETF文件的法律规定的约束(http://trustee.ietf.org/license-info)自本文件出版之日起生效。请仔细阅读这些文件，因为它们描述了您对本文件的权利和限制。从本文件中提取的代码组件必须包括信托法律条款第4.e节中所述的简化BSD许可证文本，并提供简化BSD许可证中所述的无担保。

Table of Contents

   1.  Introduction ...................................................2
   2.  Scenario-Dependent Risk ........................................2
   3.  Guidelines for Use of VBR Audio with SRTP ......................3
   4.  Guidelines for Use of Voice Activity Detection with SRTP .......3
   5.  Padding the Output of VBR Codecs ...............................4
   6.  Security Considerations ........................................5
   7.  Acknowledgements ...............................................5
   8.  References .....................................................5
       8.1. Normative References ......................................5
       8.2. Informative References ....................................6

   1.  Introduction ...................................................2
   2.  Scenario-Dependent Risk ........................................2
   3.  Guidelines for Use of VBR Audio with SRTP ......................3
   4.  Guidelines for Use of Voice Activity Detection with SRTP .......3
   5.  Padding the Output of VBR Codecs ...............................4
   6.  Security Considerations ........................................5
   7.  Acknowledgements ...............................................5
   8.  References .....................................................5
       8.1. Normative References ......................................5
       8.2. Informative References ....................................6

1. Introduction

1. 介绍

The Secure RTP (SRTP) framework [RFC3711] is a widely used framework for securing RTP sessions [RFC3550]. SRTP provides the ability to encrypt the payload of an RTP packet, and optionally add an authentication tag, while leaving the RTP header and any header extension in the clear. A range of encryption transforms can be used with SRTP, but none of the predefined encryption transforms use any padding; the RTP and SRTP payload sizes match exactly.

安全RTP（SRTP）框架[RFC3711]是用于保护RTP会话的广泛使用的框架[RFC3550]。SRTP提供了对RTP数据包的有效负载进行加密的能力，并可以选择添加身份验证标签，同时保留RTP报头和任何报头扩展。一系列加密转换可以与SRTP一起使用，但预定义的加密转换都不使用任何填充；RTP和SRTP有效负载大小完全匹配。

When using SRTP with voice streams compressed using variable bit rate (VBR) codecs, the length of the compressed packets will depend on the characteristics of the speech signal. This variation in packet size will leak a small amount of information about the contents of the speech signal. This is potentially a security risk for some applications. For example, [spot-me] shows that known phrases in an encrypted call using the Speex codec in VBR mode can be recognized with high accuracy in certain circumstances, and [fon-iks] shows that approximate transcripts of encrypted VBR calls can be derived for some codecs without breaking the encryption. How significant these results are, and how they generalize to other codecs, is still an open question. This memo discusses ways in which such traffic analysis risks may be mitigated.

当SRTP与使用可变比特率（VBR）编解码器压缩的语音流一起使用时，压缩数据包的长度将取决于语音信号的特征。数据包大小的这种变化将泄漏有关语音信号内容的少量信息。这对某些应用程序来说可能存在安全风险。例如，[spot me]表明，在某些情况下，在VBR模式下使用Speex编解码器的加密呼叫中的已知短语可以高精度识别，[fon iks]表明，在不破坏加密的情况下，可以为某些编解码器导出加密VBR呼叫的近似转录本。这些结果有多重要，以及它们如何推广到其他编解码器，仍然是一个悬而未决的问题。本备忘录讨论了缓解此类流量分析风险的方法。

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

本文件中的关键词“必须”、“不得”、“要求”、“应”、“不应”、“应”、“不应”、“建议”、“可”和“可选”应按照RFC 2119[RFC2119]中所述进行解释。

2. Scenario-Dependent Risk

2. 情景相关风险

Whether the information leaks and attacks discussed in [spot-me], [fon-iks], and similar works are significant is highly dependent on the application and use scenario. In the worst case, using the rate information to recognize a prerecorded message knowing the set of all possible messages would lead to near-perfect accuracy. Even when the

[spot me]、[fon iks]和类似作品中讨论的信息泄漏和攻击是否重要，在很大程度上取决于应用程序和使用场景。在最坏的情况下，使用速率信息识别预先记录的消息，知道所有可能的消息集，将导致近乎完美的准确性。即使

audio is not prerecorded, there is a real possibility of being able to recognize contents from encrypted audio when the dialog is highly structured (e.g., when the eavesdropper knows that only a handful of possible sentences are possible), and thus contain only little information. Recognizing unconstrained conversational speech from the rate information alone is unreliable and computationally expensive at present, but does appear possible in some circumstances. These attacks are only likely to improve over time.

音频不是预先录制的，当对话高度结构化时（例如，当窃听者知道只有少数可能的句子是可能的），并且因此只包含很少的信息时，确实有可能从加密音频中识别内容。目前，仅从速率信息识别无约束的会话语音是不可靠的，并且计算成本很高，但在某些情况下似乎是可能的。这些攻击只可能随着时间的推移而改善。

In practical SRTP scenarios, how significant the information leak is when compared to other SRTP-related information must be considered, such as the fact that the source and destination IP addresses are available.

在实际的SRTP场景中，必须考虑与其他SRTP相关信息相比，信息泄漏的严重程度，例如源和目标IP地址可用的事实。

3. Guidelines for Use of VBR Audio with SRTP

3. 带SRTP的VBR音频使用指南

It is the responsibility of the application designer to determine the appropriate trade-off between security and bandwidth overhead. As a general rule, VBR codecs should be considered safe in the context of low-value encrypted unstructured calls. However, applications that make use of prerecorded messages where the contents of such prerecorded messages may be of any value to an eavesdropper (i.e., messages beyond standard greeting messages) SHOULD NOT use codecs in VBR mode. Interactive voice response (IVR) applications would be particularly vulnerable since an eavesdropper could easily use the rate information to recognize the prompts being played out. Applications conveying highly sensitive unstructured information SHOULD NOT use codecs in VBR mode.

应用程序设计者负责确定安全性和带宽开销之间的适当权衡。作为一般规则，VBR编解码器在低值加密非结构化调用的上下文中应被视为安全的。但是，如果使用预录消息的应用程序的预录消息的内容可能对窃听者有任何价值（即，超出标准问候消息的消息），则不应在VBR模式下使用编解码器。交互式语音应答（IVR）应用程序尤其容易受到攻击，因为窃听者可以轻松地使用速率信息来识别正在播放的提示。传输高度敏感的非结构化信息的应用程序不应在VBR模式下使用编解码器。

It is safe to use variable rate coding to adapt the output of a voice codec to match characteristics of a network channel, provided this adaptation is done in a way that does not expose any information on the speech signal. For example, VBR audio can be used for congestion control purposes, where the variation is driven by the available network bandwidth, not by the input speech (i.e., the packet sizes and spacing are constant unless the network conditions change). VBR speech codecs can safely be used in this fashion with SRTP while avoiding leaking information on the contents of the speech signal that might be useful for traffic analysis.

使用变速率编码来调整语音编解码器的输出以匹配网络信道的特征是安全的，前提是这种调整是以不公开语音信号的任何信息的方式进行的。例如，VBR音频可用于拥塞控制目的，其中变化由可用网络带宽驱动，而不是由输入语音驱动（即，除非网络条件改变，否则数据包大小和间隔是恒定的）。VBR语音编解码器可以安全地以这种方式与SRTP一起使用，同时避免泄漏可能对流量分析有用的语音信号内容信息。

4. Guidelines for Use of Voice Activity Detection with SRTP

4. SRTP语音活动检测使用指南

Many speech codecs employ some form of voice activity detection (VAD) to either suppress output frames, or generate some form of lower-rate comfort noise frames, during periods when the speaker is not active. If VAD is used on an encrypted speech signal, then some information

许多语音编解码器采用某种形式的语音活动检测（VAD）来抑制输出帧，或在扬声器不活动期间生成某种形式的低速舒适噪声帧。如果VAD用于加密语音信号，则

about the characteristics of that speech signal can be determined by watching the patterns of voice activity. This information leakage is less than with VBR coding since there are only two rates possible.

关于语音信号的特征，可以通过观察语音活动的模式来确定。这种信息泄漏比VBR编码少，因为只有两种可能的速率。

The information leakage due to VAD in SRTP audio sessions can be much reduced if the sender adds an unpredictable "overhang" period to the end of active speech intervals, obscuring their actual length. An RTP sender using VAD with encrypted SRTP audio SHOULD insert such an overhang period at the end of each talkspurt, delaying the start of the silence/comfort noise by a random interval. The length of the overhang applied to each talkspurt must be randomly chosen in such a way that it is computationally infeasible for an attacker to reliably estimate the length of that talkspurt. This may be more important for short talkspurts, since it seems easier to distinguish between different single word responses based on the exact word length, than to glean meaning from the duration of a longer phrase. The audio data comprising the overhang period must be packetized and transmitted in RTP packets in a manner that is indistinguishable from the other data in the talkspurt.

如果发送方在活动语音间隔的末尾添加了一个不可预测的“悬垂”周期，从而掩盖了其实际长度，则SRTP音频会话中由于VAD而导致的信息泄漏可以大大减少。使用带有加密SRTP音频的VAD的RTP发送方应在每个TalkSput结束时插入这样一个悬垂周期，以随机间隔延迟静音/舒适噪音的开始。应用于每个talkspurt的悬垂长度必须随机选择，以便攻击者在计算上无法可靠地估计该talkspurt的长度。这对于简短的谈话来说可能更为重要，因为根据确切的单词长度来区分不同的单个单词反应似乎比从较长短语的持续时间中收集意义更容易。包含悬垂周期的音频数据必须以与TalkSput中的其他数据无法区分的方式打包并以RTP分组传输。

The overhang period SHOULD have an exponentially decreasing probability distribution function. This ensures a long tail, while being easy to compute. It is RECOMMENDED to use an overhang with a "half life" of a few hundred milliseconds (this should be sufficient to obscure the presence of interword pauses and the lengths of single words spoken in isolation, for example, the digits of a credit card number clearly enunciated for an automated system, but not so long as to significantly reduce the effectiveness of VAD for detecting listening pauses). Despite the overhang (and no matter what the duration is), there is still a small amount of information leaked about the start time of the talkspurt due to the fact that we cannot apply an overhang to the start of a talkspurt without unacceptably affecting intelligibility. For that reason, VAD SHOULD NOT be used in encrypted IVR applications where the content of prerecorded messages may be of any value to an eavesdropper.

悬垂期应具有指数递减的概率分布函数。这确保了长尾，同时易于计算。建议使用“半衰期”为几百毫秒的悬挑（这应足以掩盖单词间停顿的存在和单独说出的单个单词的长度，例如，为自动化系统清晰地说出的信用卡号码的数字，但不足以显著降低VAD检测听力停顿的有效性）。尽管（无论持续时间是多少），仍有少量关于TalkSport开始时间的信息泄漏，这是因为我们无法在TalkSput开始时应用悬垂，而不会对可理解性造成不可接受的影响。因此，VAD不应用于预录消息内容可能具有任何价值的加密IVR应用程序中给窃听者。

The application of a random overhang period to each talkspurt will reduce the effectiveness of VAD in SRTP sessions when compared to non-SRTP sessions. However, it is still expected that the use of VAD will provide significant bandwidth savings for many encrypted sessions.

与非SRTP会话相比，在SRTP会话中对每个talkspurt应用随机悬垂周期将降低VAD的有效性。然而，人们仍然期望使用VAD将为许多加密会话节省大量带宽。

5. Padding the Output of VBR Codecs

5. 填充VBR编解码器的输出

For scenarios where VBR is considered unsafe, a constant bit rate (CBR) codec SHOULD be negotiated and used instead, or the VBR codec SHOULD be operated in a CBR mode. However, if the codec does not support CBR, RTP padding SHOULD be used to reduce the information

对于认为VBR不安全的场景，应协商并使用恒定比特率（CBR）编解码器，或者VBR编解码器应在CBR模式下运行。但是，如果编解码器不支持CBR，则应使用RTP填充来减少信息

leak to an insignificant level. Packets may be padded to a constant size or to a small range of sizes ([spot-me] achieves good results by padding to the next multiple of 16 octets, but the amount of padding needed to hide the variation in packet size will depend on the codec and the sophistication of the attacker) or may be padded to a size that varies with time. The most secure and RECOMMENDED option is to pad all packets throughout the call to the same size.

泄漏到微不足道的程度。数据包可以填充到一个恒定的大小或一个小的大小范围（[spot me]通过填充到下一个16个八位字节的倍数来获得良好的结果，但是隐藏数据包大小变化所需的填充量将取决于编解码器和攻击者的复杂程度），或者可以填充到随时间变化的大小。最安全和推荐的选择是将整个呼叫中的所有数据包填充到相同的大小。

In the case where the size of the padded packets varies in time, the same concerns as for VAD apply. That is, the padding SHOULD NOT be reduced without waiting for a certain (random) time. The RECOMMENDED "hold time" is the same as the one for VAD.

在填充数据包的大小随时间变化的情况下，与VAD相同的关注点也适用。也就是说，不应在不等待特定（随机）时间的情况下减少填充。建议的“保持时间”与VAD相同。

Note that SRTP encrypts the count of the number of octets of padding added to a packet, but not the bit in the RTP header that indicates that the packet has been padded. For this reason, it is RECOMMENDED to add at least one octet of padding to all packets in a media stream, so an attacker cannot tell which packets needed padding.

请注意，SRTP加密添加到数据包的填充八进制数的计数，但不加密RTP报头中指示数据包已填充的位。因此，建议在媒体流中的所有数据包中至少添加一个八位字节的填充，以便攻击者无法判断哪些数据包需要填充。

6. Security Considerations

6. 安全考虑

This entire memo is about security. The security considerations of [RFC3711] also apply.

整个备忘录都是关于安全的。[RFC3711]的安全注意事项也适用。

7. Acknowledgements

7. 致谢

ZRTP [RFC6189] contains similar recommendations; the purpose of this memo is to highlight these issues to a wider audience, since they are not specific to ZRTP. Thanks are due to Phil Zimmermann, Stefan Doehla, Mats Naslund, Gregory Maxwell, David McGrew, Mark Baugher, Koen Vos, Ingemar Johansson, and Stephen Farrell for their comments and feedback on this memo.

ZRTP[RFC6189]包含类似的建议；本备忘录的目的是向更广泛的受众强调这些问题，因为它们不是ZRTP特有的。感谢Phil Zimmermann、Stefan Doehla、Mats Naslund、Gregory Maxwell、David McGrew、Mark Baugher、Koen Vos、Ingemar Johansson和Stephen Farrell对本备忘录的评论和反馈。

8. References

8. 工具书类

8.1. Normative References

8.1. 规范性引用文件

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2119]Bradner，S.，“RFC中用于表示需求水平的关键词”，BCP 14，RFC 2119，1997年3月。

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC3550]Schulzrinne，H.，Casner，S.，Frederick，R.，和V.Jacobson，“RTP：实时应用的传输协议”，STD 64，RFC 35502003年7月。

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[RFC3711]Baugher，M.，McGrew，D.，Naslund，M.，Carrara，E.，和K.Norrman，“安全实时传输协议（SRTP）”，RFC 37112004年3月。

8.2. Informative References

8.2. 资料性引用

[RFC6189] Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media Path Key Agreement for Unicast Secure RTP", RFC 6189, April 2011.

[RFC6189]Zimmermann，P.，Johnston，A.，和J.Callas，“ZRTP：单播安全RTP的媒体路径密钥协议”，RFC 6189，2011年4月。

[fon-iks] White, A., Matthews, A., Snow, K., and F. Monrose, "Phonotactic Reconstruction of Encrypted VoIP Conversations: Hookt on fon-iks", Proceedings of the IEEE Symposium on Security and Privacy 2011, May 2011.

[fon-iks]White，A.，Matthews，A.，Snow，K.，和F.Monrose，“加密VoIP对话的语音重建：fon-iks上的Hocket”，2011年5月IEEE安全和隐私研讨会论文集。

[spot-me] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. Masson, "Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversation", Proceedings of the IEEE Symposium on Security and Privacy 2008, May 2008.

[spot me]Wright，C.，Ballard，L.，Coull，S.，Monrose，F.，和G.Masson，“如果可以的话，找到我：在加密VoIP对话中发现口语短语”，2008年5月IEEE安全和隐私研讨会论文集。

Authors' Addresses

作者地址

Colin Perkins University of Glasgow School of Computing Science Glasgow G12 8QQ UK

柯林帕金斯格拉斯哥大学计算科学学院格拉斯哥G128QQ英国

   EMail: csp@csperkins.org

   EMail: csp@csperkins.org

Jean-Marc Valin Mozilla Corporation 650 Castro Street Mountain View, CA 94041 USA

美国加利福尼亚州山景城卡斯特罗街650号Jean-Marc Valin Mozilla Corporation，邮编94041

   Phone: +1 650 903-0800
   EMail: jmvalin@jmvalin.ca

   Phone: +1 650 903-0800
   EMail: jmvalin@jmvalin.ca